GitHub / proycon / python-ucto
This is a Python binding to the Ucto tokeniser. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
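To illustrate why tokenisation is less trivial than splitting on whitespace, here is a minimal sketch of a regular-expression-based tokeniser using only Python's standard library. It does not use python-ucto and does not reproduce Ucto's actual rules; the pattern and the `tokenize` helper are hypothetical names chosen for this example.

```python
import re

# Illustrative pattern only: these are NOT Ucto's actual rules.
# Alternatives are tried left to right, so more specific ones come first.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+            # a URL is kept as a single token
  | \w+(?:[-']\w+)*         # a word, allowing internal hyphens and apostrophes
  | [^\w\s]                 # any other single non-space character (punctuation)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens found in text (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)


sentence = "Don't panic, it's state-of-the-art."

# Naive whitespace splitting leaves punctuation glued to the words.
print(sentence.split())

# The rule-based version separates punctuation but keeps contractions
# and hyphenated compounds together.
print(tokenize(sentence))
```

A real tokeniser such as Ucto layers many more language-specific rules (abbreviations, dates, sentence boundaries) on top of this basic idea, which is why its rules live in per-language configuration rather than a single pattern.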
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/proycon%2Fpython-ucto
Stars: 29
Forks: 5
Open issues: 5
License: None
Language: Cython
Size: 87.9 KB
Dependencies parsed at: Pending
Created at: about 11 years ago
Updated at: 6 months ago
Pushed at: 6 months ago
Last synced at: 3 days ago
Commit Stats
Commits: 138
Authors: 1
Mean commits per author: 138.0
Development Distribution Score: 0.0
More commit stats: https://commits.ecosyste.ms/hosts/GitHub/repositories/proycon/python-ucto
Topics: computational-linguistics, folia, nlp, nlp-library, python, text-processing, tokenizer