An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: chinese-text-segmentation

ssb22/adjuster

Web Adjuster + Annotator Generator

Language: Python - Size: 5.53 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

wolfgarbe/SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: C# - Size: 12 MB - Last synced at: 4 days ago - Pushed at: 25 days ago - Stars: 3,230 - Forks: 303

fukuball/jieba-php

"結巴"中文分詞:做最好的 PHP 中文分詞、中文斷詞組件。 / "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.

Language: PHP - Size: 38.4 MB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 1,343 - Forks: 260

oscarsun72/TextForCtext

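Input tool for the Chinese Text Project (中國哲學書電子化計劃): faster typing and typesetting for a better input experience. One treasure of the study that beats the four: C# + Word VBA tools for literature and history, programmed by a PhD in Chinese.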

Language: C# - Size: 302 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 20 - Forks: 0

ReubenBond/HanBaoBao

Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)

Language: Java - Size: 90 MB - Last synced at: 18 days ago - Pushed at: about 3 years ago - Stars: 30 - Forks: 5

ssb22/CedPane

Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)

Size: 35.1 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 4 - Forks: 1

mammothb/symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: Python - Size: 5.93 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 822 - Forks: 122

hankcs/hanlp-lucene-plugin

HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr

Language: Java - Size: 73.2 KB - Last synced at: 18 days ago - Pushed at: over 4 years ago - Stars: 297 - Forks: 97

lionsoul2014/jcseg

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, plus keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

Language: Java - Size: 21.1 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 923 - Forks: 212

yongzhuo/Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.

Language: Python - Size: 379 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 341 - Forks: 50

koth/kcws

Deep learning Chinese word segmentation

Language: C++ - Size: 13.4 MB - Last synced at: 16 days ago - Pushed at: almost 7 years ago - Stars: 2,081 - Forks: 645

supercoderhawk/DNN_CWS

Chinese word segmentation implemented with deep learning

Language: Python - Size: 46.2 MB - Last synced at: 8 days ago - Pushed at: over 7 years ago - Stars: 60 - Forks: 32

amutu/zhparser

zhparser is a PostgreSQL extension for full-text search of Chinese-language text

Language: C - Size: 5.75 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 723 - Forks: 86

Jarod-Wingfield/Sentimental-Response-of-COVID-19-Outbreak-in-Guangzhou-China-Based-on-Weibo-Night-Comments

This is a practical exercise in processing Chinese text using R packages.

Language: R - Size: 11.1 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

fumiama/jieba

A performance-optimized version of Jiebago that supports loading dictionaries from an io.Reader

Language: Go - Size: 9.21 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 2

konhay/sector-attention-index

Specifically built for the research proposal "Estimating sector attention index with deep learning methods: example of Chinese stock market", Jan. 4, 2024.

Language: Python - Size: 864 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

numb3r3/text_utils

Text Pre-processing toolkit

Language: Python - Size: 22.5 MB - Last synced at: 22 days ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 1

zhangsoledad/solr-ik 📦

solr-ik

Language: Java - Size: 3.97 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 0

blueshen/ik-analyzer

Tokenizer supporting Lucene 5/6/7/8/9+ versions, LTS

Language: Java - Size: 1.21 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 193 - Forks: 75

duduscript/split

Chinese word segmentation program

Language: Python - Size: 71 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 1

JherezTaylor/f360-textmining-test

Python code for text mining test

Language: Jupyter Notebook - Size: 9.88 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

blueshen/ik-rs

ik-analyzer for Rust; Chinese tokenizer for tantivy

Language: Rust - Size: 1.18 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 2

yingrui/mahjong

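Open-source Chinese word segmentation toolkit: Chinese word segmentation Web API, Chinese word segmentation for Lucene, and mixed Chinese-English segmentation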

Language: Scala - Size: 27.9 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 42 - Forks: 19

qinwf/jiebaR

Chinese text segmentation with R (documentation updated 🎉: https://qinwenfeng.com/jiebaR/)

Language: C++ - Size: 21.6 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 337 - Forks: 110

qiaofei32/dnn-lstm-word-segment

Chinese word segmentation based on deep learning and an LSTM neural network

Language: Python - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 23 - Forks: 15

deminy/jieba-php Fork of fukuball/jieba-php 📦

PHP version of "Jieba Chinese word segmentation"

Language: PHP - Size: 36.3 MB - Last synced at: about 1 month ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 1

Colearo/HuhuSeg

Simple Chinese segmenter, keyword extractor, and other examples

Language: Python - Size: 14.4 MB - Last synced at: 13 days ago - Pushed at: almost 7 years ago - Stars: 8 - Forks: 1

ChiChou/zhparser-docker

PostgreSQL with zhparser

Size: 4.88 KB - Last synced at: 9 days ago - Pushed at: over 8 years ago - Stars: 9 - Forks: 5

smilepy/jieba Fork of fxsjy/jieba

Jieba Chinese word segmentation

Language: Python - Size: 42.2 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

jk195417/chinese-segmentation-as-service

Uses Flask to expose jieba, SnowNLP, and pkuseg as an HTTP API web service.

Language: Python - Size: 2.87 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

stephanoskomnenos/vscode-jieba

Chinese word segmentation extension based on jieba-rs

Language: TypeScript - Size: 4.22 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 1

fg607/ChatterBot Fork of gunthercox/ChatterBot

Chinese-adapted version of ChatterBot, supporting Chinese word segmentation search and Chinese stop words

Language: Python - Size: 4.13 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 5

ericlingit/jieba-go

A copycat implementation of jieba as a learning exercise.

Language: Go - Size: 247 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

wycm/xuexin-ocr

Content recognition for student-status and academic-qualification images from Xuexin (学信网)

Language: Python - Size: 173 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 3

FlyingOE/q_BosonNLP

Wrapper for BosonNLP online API

Size: 19.5 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 1

secsilm/text-segmentation-trap

Some sentences that word segmentation tools tend to segment incorrectly.

Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

davidlorente78/Recogzi

Language: F# - Size: 21.9 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

jason2506/esapp

An unsupervised Chinese word segmentation tool.

Language: C++ - Size: 254 KB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 12 - Forks: 2

smart-lands-com/smla-cut

Chinese text segmentation

Language: Python - Size: 15.6 KB - Last synced at: 20 days ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

butlerwilson/cwseg

Chinese word segmentation: an implementation of a statistics-based segmentation system

Language: C++ - Size: 3.35 MB - Last synced at: almost 2 years ago - Pushed at: about 9 years ago - Stars: 0 - Forks: 0

pengjinning/kcws Fork of koth/kcws

Deep learning Chinese word segmentation

Language: C++ - Size: 10.3 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0