This module provides Japanese natural language processing capabilities using PyKNP (Python interface for JUMAN++ and KNP).

Overview

The NLP module enables:

  • Morphological Analysis: Break down Japanese text into morphemes using JUMAN++
  • Ruby Annotation: Automatically add furigana (reading aids) to kanji characters
  • Text Formatting: Convert between halfwidth and fullwidth characters
  • HTML Integration: Process HTML content and add ruby annotations to Japanese text

Key Features

  • Automatic kanji-to-hiragana annotation
  • Character width detection and conversion
  • HTML content processing with Beautiful Soup
  • Interactive display support for Jupyter notebooks

Installation Requirements

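Japanese morphological analysis needs both JUMAN++ and PyKNP. PyKNP is only the Python interface: it runs the JUMAN++ analyzer as an external program. JUMAN++ must therefore be installed separately (pip does not install it), and its jumanpp executable must be on your PATH. Once JUMAN++ is installed, install PyKNP:

pip install pyknp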
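PyKNP and JUMAN++ are installed separately, so it can help to confirm that both are present before running any analysis. The following is a minimal sketch, not part of this module. check_nlp_dependencies is a hypothetical helper, and it assumes JUMAN++ is installed under its usual executable name, jumanpp.

```python
import importlib.util
import shutil


def check_nlp_dependencies() -> dict:
    """Hypothetical helper: report whether PyKNP and the JUMAN++ binary are available."""
    return {
        # PyKNP: the Python package installed by `pip install pyknp`
        "pyknp": importlib.util.find_spec("pyknp") is not None,
        # JUMAN++: assumed to be installed as an executable named `jumanpp` on PATH
        "jumanpp": shutil.which("jumanpp") is not None,
    }


if __name__ == "__main__":
    for name, found in check_nlp_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```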

Core Text Processing Functions

These functions handle Japanese text analysis and ruby annotation generation using the JUMAN++ morphological analyzer.
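Among the functions in this section, the two width helpers need only the standard library. The sketch below approximates them using Unicode East Asian Width properties (unicodedata.east_asian_width) and the fixed code-point offset between ASCII and its fullwidth forms. This is an assumption about the approach, not a copy of the module's implementation, and the set of characters the module treats as halfwidth may differ.

```python
import re
import unicodedata

# Printable ASCII U+0021..U+007E sits exactly 0xFEE0 below its fullwidth form
# U+FF01..U+FF5E; the ASCII space maps to the ideographic space U+3000.
_ASCII_TO_FULLWIDTH = {code: code + 0xFEE0 for code in range(0x21, 0x7F)}
_ASCII_TO_FULLWIDTH[0x20] = 0x3000

# Runs of characters from the halfwidth katakana and punctuation block U+FF61..U+FF9F.
_HALFWIDTH_KANA_RUN = re.compile("[\uFF61-\uFF9F]+")


def is_halfwidth(text: str) -> bool:
    """Sketch: True if every character is narrow ('Na') or halfwidth ('H')."""
    return all(unicodedata.east_asian_width(ch) in ("Na", "H") for ch in text)


def halfwidth_to_fullwidth(text: str) -> str:
    """Sketch: convert halfwidth ASCII and halfwidth katakana to fullwidth forms."""
    text = text.translate(_ASCII_TO_FULLWIDTH)
    # NFKC each katakana run as a whole so that a base kana followed by
    # ﾞ or ﾟ composes into a single fullwidth character (ｶﾞ -> ガ).
    return _HALFWIDTH_KANA_RUN.sub(lambda m: unicodedata.normalize("NFKC", m.group()), text)
```

Control characters such as newlines and tabs have East Asian Width "N" (neutral), so this is_halfwidth returns False for text containing them. Adjust the accepted categories if that matters for your input.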

is_halfwidth

is_halfwidth(text)

Determine whether the text consists entirely of halfwidth characters.

Parameters: text (str): the input text
Returns: True if every character is halfwidth, otherwise False

halfwidth_to_fullwidth

halfwidth_to_fullwidth(text)

Convert the halfwidth characters in text to their fullwidth equivalents.

annotate

annotate(text)

Add ruby annotations (furigana) to the kanji in text, using the readings produced by JUMAN++ morphological analysis.

HTML Processing

Additional dependencies for HTML content processing:

pip install beautifulsoup4
pip install lxml

HTML Content Annotation

annotate_html processes HTML content and adds ruby annotations to Japanese text within HTML elements.

annotate_html

annotate_html(content, interactive=False)
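Both annotate and annotate_html ultimately turn each morpheme's surface form and reading into ruby markup. The sketch below shows one way to do that step, assuming HTML <ruby>/<rt> output. ruby_for_token, ruby_markup, and morphemes_from_jumanpp are hypothetical names, not this module's API. Only morphemes_from_jumanpp touches PyKNP, through its Juman class (analysis, mrph_list, and each morpheme's midasi surface form and yomi reading). Everything else is plain Python.

```python
import html
import re

# Kanji test: CJK Unified Ideographs (plus Extension A) and the iteration mark 々.
# A simplification; rarer CJK blocks are not covered.
_KANJI = re.compile("[\u3400-\u4DBF\u4E00-\u9FFF\u3005]")


def ruby_for_token(surface: str, reading: str) -> str:
    """Hypothetical: wrap the kanji part of one morpheme in <ruby>, keeping shared kana outside."""
    if not _KANJI.search(surface) or not reading or surface == reading:
        return html.escape(surface)
    limit = min(len(surface), len(reading))
    # Count characters shared at the start (お茶 / おちゃ) ...
    start = 0
    while start < limit and surface[start] == reading[start]:
        start += 1
    # ... and at the end (食べる / たべる), without overlapping the shared start.
    end = 0
    while end < limit - start and surface[-1 - end] == reading[-1 - end]:
        end += 1
    base = surface[start:len(surface) - end]
    rt = reading[start:len(reading) - end]
    if not base or not rt:
        return html.escape(surface)
    return (html.escape(surface[:start])
            + f"<ruby>{html.escape(base)}<rt>{html.escape(rt)}</rt></ruby>"
            + html.escape(surface[len(surface) - end:]))


def ruby_markup(morphemes) -> str:
    """Hypothetical: join (surface, reading) pairs into one annotated string."""
    return "".join(ruby_for_token(surface, reading) for surface, reading in morphemes)


def morphemes_from_jumanpp(text: str) -> list:
    """Hypothetical: (surface, reading) pairs; needs PyKNP and a working jumanpp binary."""
    from pyknp import Juman  # imported here so the functions above work without PyKNP

    result = Juman().analysis(text)
    return [(m.midasi, m.yomi) for m in result.mrph_list()]
```

With JUMAN++ installed, print(ruby_markup(morphemes_from_jumanpp("日本語を勉強する"))) annotates a sentence end to end. The exact output depends on the analysis. Stripping kana shared with the reading keeps okurigana, such as the べる in 食べる, outside the <rt>. Words with kana between two kanji (取り扱い) get a single annotation over the whole inner span, which is coarser than per-kanji furigana. For HTML input, a function like annotate_html would presumably apply this markup only to the text nodes of the parsed document.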