This module provides Japanese natural language processing using PyKNP, the Python interface to JUMAN++ and KNP.
Overview
The NLP module enables:
- Morphological Analysis: Break down Japanese text into morphemes using JUMAN++
- Ruby Annotation: Automatically add furigana (reading aids) to kanji characters
- Text Formatting: Convert between halfwidth and fullwidth characters
- HTML Integration: Process HTML content and add ruby annotations to Japanese text
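As a rough illustration of the ruby annotation step, the sketch below wraps kanji-bearing morphemes in HTML ruby tags. The helper names to_ruby and analyze are hypothetical and not taken from this module. The sketch assumes the analyzer yields (surface, reading) pairs, which PyKNP exposes as the midasi and yomi attributes of each morpheme, and the kanji check covers only the main CJK ideograph block.

```python
import html
import re

# Assumption: "kanji" means the main CJK Unified Ideographs block only.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def to_ruby(pairs):
    """Wrap morphemes that contain kanji in <ruby> tags; pass others through."""
    parts = []
    for surface, reading in pairs:
        if KANJI.search(surface):
            parts.append(
                f"<ruby>{html.escape(surface)}<rt>{html.escape(reading)}</rt></ruby>"
            )
        else:
            parts.append(html.escape(surface))
    return "".join(parts)


def analyze(text):
    """Hypothetical helper: (surface, reading) pairs from JUMAN++ via PyKNP.

    Needs the pyknp package and the jumanpp binary, so it is not called here.
    """
    from pyknp import Juman

    return [(m.midasi, m.yomi) for m in Juman().analysis(text).mrph_list()]


# Hand-written pairs stand in for real analyzer output.
pairs = [("日本語", "にほんご"), ("を", "を"), ("勉強", "べんきょう"), ("する", "する")]
print(to_ruby(pairs))
```

Annotating whole morphemes keeps the sketch short; a fuller implementation might strip trailing okurigana so that only the kanji portion carries a reading.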
Key Features
- Automatic kanji-to-hiragana annotation
- Character width detection and conversion
- HTML content processing with Beautiful Soup
- Interactive display support for Jupyter notebooks
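Character width detection and conversion can be sketched with the standard library alone. The function names below are illustrative assumptions, not this module's API. The conversion relies on the fixed offset between ASCII and the Unicode fullwidth forms block, and detection uses the East Asian Width property.

```python
import unicodedata

# Fullwidth forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed offset.
OFFSET = 0xFEE0
HALF_TO_FULL = {code: code + OFFSET for code in range(0x21, 0x7F)}
HALF_TO_FULL[0x20] = 0x3000  # ASCII space maps to the ideographic space
FULL_TO_HALF = {full: half for half, full in HALF_TO_FULL.items()}


def to_fullwidth(text):
    """Convert ASCII characters to their fullwidth counterparts."""
    return text.translate(HALF_TO_FULL)


def to_halfwidth(text):
    """Convert fullwidth ASCII variants back to plain ASCII."""
    return text.translate(FULL_TO_HALF)


def is_fullwidth(ch):
    """Return True if a single character is classed as Fullwidth or Wide."""
    return unicodedata.east_asian_width(ch) in ("F", "W")


print(to_fullwidth("ABC 123"))
print(to_halfwidth("ＡＢＣ　１２３"))
print(is_fullwidth("漢"), is_fullwidth("A"))
```

This sketch leaves halfwidth katakana untouched. If those should be widened as well, unicodedata.normalize("NFKC", text) is one option, with the caveat that it also folds fullwidth ASCII back to halfwidth.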
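Install JUMAN++
JUMAN++ is a separate native program that PyKNP runs as the jumanpp command, so it must be installed and available on the PATH in addition to the Python packages below:
pip install pyknp
pip install beautifulsoup4
pip install lxml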
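After installing, a quick check like the one below can confirm that the pieces are in place. This is a minimal sketch: check_environment is a hypothetical helper, and it assumes the analyzer binary is named jumanpp, which is the command PyKNP looks for by default.

```python
import importlib.util
import shutil


def check_environment():
    """Report missing Python packages and whether jumanpp is on the PATH."""
    # bs4 is the import name of the beautifulsoup4 package.
    required = ("pyknp", "bs4", "lxml")
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    has_jumanpp = shutil.which("jumanpp") is not None
    return missing, has_jumanpp


missing, has_jumanpp = check_environment()
print("Missing Python packages:", missing or "none")
print("jumanpp on PATH:", has_jumanpp)
```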