Go port of TinySegmenter 0.2. A lightweight library for segmenting Japanese text into words.
This implementation adds the following features:
- the NN and BC2/NM features added on top of the original, which prevent consecutive numbers from being split.
- preserves strings that should not be segmented.
- preserves URLs and e-mail addresses as single tokens.
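
To illustrate the idea behind the URL/e-mail feature, here is a minimal, standard-library-only sketch. It is not the library's actual implementation: the pattern, the `span` type, and `splitProtected` are hypothetical names introduced for this example. The approach assumed here is to cut the input into protected and unprotected spans first, so that a segmenter only ever sees the unprotected parts.

```go
package main

import (
	"fmt"
	"regexp"
)

// protectedPattern is a hypothetical, deliberately simple pattern for
// ASCII URLs and e-mail addresses. It is an assumption for this sketch,
// not the pattern used by go-tinysegmenter.
var protectedPattern = regexp.MustCompile(
	`https?://[!-~]+|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// span is a piece of the input. A Protected span should be emitted
// as a single token instead of being segmented further.
type span struct {
	Text      string
	Protected bool
}

// splitProtected cuts text into unprotected and protected spans,
// preserving their original order.
func splitProtected(text string) []span {
	var spans []span
	last := 0
	for _, loc := range protectedPattern.FindAllStringIndex(text, -1) {
		if loc[0] > last {
			// Ordinary text before the match.
			spans = append(spans, span{Text: text[last:loc[0]]})
		}
		spans = append(spans, span{Text: text[loc[0]:loc[1]], Protected: true})
		last = loc[1]
	}
	if last < len(text) {
		// Ordinary text after the final match.
		spans = append(spans, span{Text: text[last:]})
	}
	return spans
}

func main() {
	for _, s := range splitProtected("詳細はhttps://example.com/aを参照") {
		fmt.Printf("%q protected=%v\n", s.Text, s.Protected)
	}
}
```

In a real pipeline, each unprotected span would be passed to the segmenter and each protected span appended to the result unchanged.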
go get github.com/mattn/go-tinysegmenter

package main

import (
	"fmt"

	"github.com/mattn/go-tinysegmenter"
)

func main() {
	ts := tinysegmenter.New()
	result := ts.Segment("私の名前は中野です")
	fmt.Println(result) // [私 の 名前 は 中野 です]
}

Modified BSD License (same as original TinySegmenter)
- TinySegmenter by Taku Kudo
Yasuhiro Matsumoto (a.k.a mattn)