
go-tinysegmenter

A Go port of TinySegmenter 0.2, a lightweight library for segmenting Japanese text into words.
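TinySegmenter does not use a dictionary. It scores each position between two characters with learned weights for features built from the nearby characters and their character types. The sketch below reproduces the character classes of the original JavaScript TinySegmenter (M kanji numerals, H kanji, I hiragana, K katakana, A Latin letters, N digits, O other) and lists the type pair on each side of every boundary. The names `charType` and `boundaryTypes` are hypothetical, and go-tinysegmenter's internal classes may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// kanjiNumerals lists the characters the original TinySegmenter classifies as "M".
const kanjiNumerals = "一二三四五六七八九十百千万億兆"

// charType returns a TinySegmenter-style character class for r.
// The classes mirror the original JavaScript TinySegmenter (assumption:
// go-tinysegmenter may define them differently):
// M kanji numeral, H kanji, I hiragana, K katakana,
// A Latin letter, N digit, O anything else.
func charType(r rune) string {
	switch {
	case strings.ContainsRune(kanjiNumerals, r): // checked before the general kanji range
		return "M"
	case (r >= '一' && r <= '龠') || strings.ContainsRune("々〆ヵヶ", r):
		return "H"
	case r >= 'ぁ' && r <= 'ん':
		return "I"
	case (r >= 'ァ' && r <= 'ヴ') || r == 'ー' || (r >= 'ｱ' && r <= 'ﾝ') || r == 'ﾞ' || r == 'ｰ':
		return "K"
	case (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
		(r >= 'ａ' && r <= 'ｚ') || (r >= 'Ａ' && r <= 'Ｚ'):
		return "A"
	case (r >= '0' && r <= '9') || (r >= '０' && r <= '９'):
		return "N"
	}
	return "O"
}

// boundaryTypes returns, for each position between two adjacent runes,
// the classes of the rune before and after that position concatenated.
func boundaryTypes(s string) []string {
	rs := []rune(s)
	var out []string
	for i := 1; i < len(rs); i++ {
		out = append(out, charType(rs[i-1])+charType(rs[i]))
	}
	return out
}

func main() {
	fmt.Println(boundaryTypes("2024年")) // [NN NN NN NH]
}
```

Under this reading, every boundary inside "2024" has the type pair NN, which is presumably the kind of pair that digit-related features key on; that interpretation is an assumption rather than something the README states.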

This implementation adds the following features:

  • Additional NN and BC2/NM features (not in the original), which prevent consecutive numbers from being split.
  • Preservation of strings that should not be segmented.
  • Preservation of URLs and e-mail addresses as single tokens.
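One common way to keep URLs and e-mail addresses whole is a pre-pass that cuts them out with a regular expression and sends only the remaining text to the segmenter. The sketch below illustrates that approach; it is not necessarily how go-tinysegmenter does it. The patterns and the names `protectedRe`, `segmentProtected` and `runeSplit` are hypothetical, and the `segment` parameter assumes a function of type `func(string) []string`, which is what the `ts.Segment` output in the usage example suggests.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// protectedRe matches spans that should survive segmentation intact.
// Illustrative patterns only (assumption): an http(s) URL made of printable
// ASCII, or a simple e-mail address.
var protectedRe = regexp.MustCompile(`https?://[!-~]+|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// segmentProtected keeps every protectedRe match as one token and passes
// the text between matches to segment.
func segmentProtected(s string, segment func(string) []string) []string {
	var out []string
	last := 0
	for _, loc := range protectedRe.FindAllStringIndex(s, -1) {
		if loc[0] > last {
			out = append(out, segment(s[last:loc[0]])...)
		}
		out = append(out, s[loc[0]:loc[1]]) // protected span, kept whole
		last = loc[1]
	}
	if last < len(s) {
		out = append(out, segment(s[last:])...)
	}
	return out
}

// runeSplit is a toy stand-in segmenter that splits text into single
// characters, so the example runs without the real library.
func runeSplit(s string) []string {
	return strings.Split(s, "")
}

func main() {
	fmt.Println(segmentProtected("詳細はhttps://example.comへ", runeSplit))
	// [詳 細 は https://example.com へ]
}
```

The same pre-pass could also carry a user-supplied list of strings that should not be segmented, by adding them as alternatives to the pattern.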

Installation

go get github.com/mattn/go-tinysegmenter

Usage

package main

import (
    "fmt"
    "github.com/mattn/go-tinysegmenter"
)

func main() {
    ts := tinysegmenter.New()
    result := ts.Segment("私の名前は中野です")
    fmt.Println(result) // [私 の 名前 は 中野 です]
}

License

Modified BSD License (same as original TinySegmenter)

Original Implementation

Author

Yasuhiro Matsumoto (a.k.a mattn)
