A fancy plural for corpus ;) Also, a collection of handy but not especially mutually integrated tools for dealing with linguistic data. It abstracts away functionality that is often needed in day-to-day work at the Czech National Corpus, without aspiring to be a fully featured or consistent NLP framework.
Currently available sub-packages are:
- morphodita: tokenizing and tagging raw textual data using MorphoDiTa
- vertical: parsing corpora in the vertical format originally devised for CWB and also used by (No)SketchEngine
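
For readers unfamiliar with the vertical format mentioned above, here is a rough, standalone sketch of what parsing it involves. It does not use or reproduce corpy's own API. The names `Token` and `parse_vertical`, the three-column layout (word, lemma, tag) and the `<s>` sentence structure are all assumptions made for illustration; real corpora differ in their attributes and structures.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Structural lines look like XML tags on a line of their own, e.g. <doc id="x"> or </s>.
STRUCT_RE = re.compile(r"^</?\w+[^>]*>$")


class Token(NamedTuple):
    # Hypothetical positional attributes; actual corpora may have more or fewer.
    word: str
    lemma: str
    tag: str


def parse_vertical(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per <s>...</s> block (illustrative only)."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line == "</s>":
            # End of a sentence: hand over what was collected so far.
            yield sentence
            sentence = []
        elif STRUCT_RE.match(line):
            # Any other structure (including the opening <s>) is skipped here.
            continue
        else:
            # A token line: tab-separated positional attributes.
            word, lemma, tag = line.split("\t")[:3]
            sentence.append(Token(word, lemma, tag))


# Made-up sample data, not taken from any real corpus.
SAMPLE = """<doc id="demo">
<s>
Hello\thello\tUH
world\tworld\tNN
</s>
</doc>
"""

sentences = list(parse_vertical(SAMPLE.splitlines()))
for sent in sentences:
    print(" ".join(tok.word for tok in sent))
```

The sketch ignores structure attributes and anything outside sentences, which a real parser would usually need to keep track of.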
$ pip3 install git+https://github.com/dlukes/corpy

Only recent versions of Python 3 are supported by design.
Copyright © 2016--present ÚČNK/David Lukeš
Distributed under the GNU General Public License v3.