An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: chinese-tokenizer

supercoderhawk/DeepLearning_NLP

A natural language processing library based on deep learning

Language: Python - Size: 12.2 MB - Last synced at: 14 days ago - Pushed at: over 6 years ago - Stars: 156 - Forks: 40

lionsoul2014/friso

High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easily embedded in other programs such as MySQL, PostgreSQL, and PHP.

Language: C - Size: 3.07 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 497 - Forks: 91

howl-anderson/MicroTokenizer

MicroTokenizer: A lightweight, full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

Language: Python - Size: 174 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 150 - Forks: 22

samzshi0529/HanziNLP

A lightweight NLP package for Chinese text: preprocessing, tokenization, Chinese fonts, word embeddings, text similarity, and sentiment analysis

Language: Python - Size: 212 MB - Last synced at: 22 days ago - Pushed at: 6 months ago - Stars: 26 - Forks: 3

howl-anderson/Chinese_tokenizer_benchmark

Chinese tokenizer benchmark: a benchmark of Chinese word segmentation software

Language: Python - Size: 141 MB - Last synced at: 23 days ago - Pushed at: over 6 years ago - Stars: 23 - Forks: 5

HuangStomach/the-imp

Chinese tokenizer based on nodejieba and pullword

Language: JavaScript - Size: 78.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

howl-anderson/PaddleTokenizer

A deep neural network (DNN) Chinese tokenizer implemented with PaddlePaddle

Language: JavaScript - Size: 1.62 MB - Last synced at: 23 days ago - Pushed at: almost 5 years ago - Stars: 15 - Forks: 2

volgachen/Chinese-Tokenization

UCAS Homework for NLP

Language: Python - Size: 39.6 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1