Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: chinese-word-segmentation

baidu/lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

Language: C++ - Size: 63.6 MB - Last synced: about 19 hours ago - Pushed: almost 3 years ago - Stars: 3,765 - Forks: 588

ownthink/Jiagu

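Jiagu deep learning natural language processing tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering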

Language: Python - Size: 166 MB - Last synced: 2 days ago - Pushed: about 2 years ago - Stars: 3,212 - Forks: 610

oscarsun72/TextForCtext

For text input to the Chinese Text Project (中國哲學書電子化計劃)

Language: C# - Size: 96.4 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 11 - Forks: 0

messense/jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust

Language: Rust - Size: 2.86 MB - Last synced: 10 days ago - Pushed: 19 days ago - Stars: 686 - Forks: 40

dongrixinyu/jiojio

A convenient Chinese word segmentation tool

Language: Python - Size: 507 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 34 - Forks: 5

lancopku/pkuseg-python

The pkuseg toolkit for multi-domain Chinese word segmentation

Language: Python - Size: 4.18 MB - Last synced: 14 days ago - Pushed: over 1 year ago - Stars: 6,430 - Forks: 975

howl-anderson/MicroTokenizer

A micro Chinese word segmentation engine with a comprehensive set of algorithms | A micro tokenizer for Chinese

Language: Python - Size: 174 MB - Last synced: 9 days ago - Pushed: over 1 year ago - Stars: 143 - Forks: 22

mammothb/symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: Python - Size: 5.76 MB - Last synced: 14 days ago - Pushed: about 2 months ago - Stars: 766 - Forks: 116

usaoc/chissor

GUI application for Chinese word segmentation

Language: Rust - Size: 18 MB - Last synced: 19 days ago - Pushed: 20 days ago - Stars: 0 - Forks: 0

hankcs/pyhanlp

Chinese word segmentation

Language: Python - Size: 235 KB - Last synced: 23 days ago - Pushed: 5 months ago - Stars: 3,068 - Forks: 800

monpa-team/monpa

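MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition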

Language: Python - Size: 8.24 MB - Last synced: 24 days ago - Pushed: almost 2 years ago - Stars: 244 - Forks: 26

wolfgarbe/SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: C# - Size: 12.1 MB - Last synced: 30 days ago - Pushed: about 1 month ago - Stars: 3,031 - Forks: 280

didi/ChineseNLP

Datasets and SOTA results for every field of Chinese NLP

Language: HTML - Size: 875 KB - Last synced: 27 days ago - Pushed: about 2 years ago - Stars: 1,770 - Forks: 276

Embedding/Chinese-Word-Vectors

100+ Chinese Word Vectors: over a hundred kinds of pre-trained Chinese word vectors

Language: Python - Size: 1.42 MB - Last synced: about 1 month ago - Pushed: 6 months ago - Stars: 11,533 - Forks: 2,303

lionsoul2014/jcseg

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

Language: Java - Size: 21.1 MB - Last synced: 23 days ago - Pushed: 8 months ago - Stars: 905 - Forks: 212

Kyubyong/g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

Language: Python - Size: 21.8 MB - Last synced: 12 days ago - Pushed: almost 5 years ago - Stars: 231 - Forks: 30

coldcolacos/iepy_demo

Support Chinese word segmentation based on Information Extraction in Python.

Language: Python - Size: 25.6 MB - Last synced: 4 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

AniChikage/SegmentationCN

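A program for segmenting Chinese sentences. It implements forward, backward, and bidirectional segmentation based on the maximum matching algorithm and provides an interface, described in detail below.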

Language: C# - Size: 59.6 KB - Last synced: 4 months ago - Pushed: almost 7 years ago - Stars: 2 - Forks: 0

HuangStomach/the-imp

Chinese tokenizer based on nodejieba and pullword

Language: JavaScript - Size: 78.1 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

felipemarinho97/pinyin-word-api

Get audio pronunciation for Chinese words, e.g. /api/可以

Language: JavaScript - Size: 14 MB - Last synced: about 2 months ago - Pushed: almost 3 years ago - Stars: 6 - Forks: 2

bhrdj/rongzi

rongzi = banyan-like Chinese character-component paths. 永遠:永詠言這辶遠, 青鳥:青靖立鴗鳥

Language: Jupyter Notebook - Size: 1.77 MB - Last synced: 6 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

NLPIR-team/NLPIR-ICTCLAS

The Java Package of NLPIR-ICTCLAS.

Language: Java - Size: 17.1 MB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 17 - Forks: 10

lionsoul2014/friso

High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular and easily embedded in other programs such as MySQL, PostgreSQL, and PHP.

Language: C - Size: 3.07 MB - Last synced: 7 months ago - Pushed: 8 months ago - Stars: 451 - Forks: 93

AiningWang/Chinese-Words-Segmentation

Chinese word segmentation algorithm based on entropy (entropy-based Chinese word segmentation that requires no corpus)

Language: Python - Size: 21.1 MB - Last synced: 7 months ago - Pushed: about 6 years ago - Stars: 13 - Forks: 1

hemingkx/WordSeg

A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese Word Segmentation.

Language: Python - Size: 4.2 MB - Last synced: 7 months ago - Pushed: almost 2 years ago - Stars: 164 - Forks: 40

llhthinker/MachineLearningLab

Some experiments about Machine Learning

Language: Python - Size: 74.2 MB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 107 - Forks: 193

kindkindcom/Python-GCSpeechToText_SRT_for_Chinese

Google Cloud SpeechToText, SRT subtitles for Chinese

Language: Python - Size: 16.6 KB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 2

binaryoung/jieba-php

The Jieba Chinese Word Segmentation Implemented in PHP

Language: PHP - Size: 6.9 MB - Last synced: 25 days ago - Pushed: almost 4 years ago - Stars: 19 - Forks: 8

messense/cppjieba-cabi

Idiomatic C ABI for CppJieba

Language: C++ - Size: 32.2 KB - Last synced: 10 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

messense/cjieba-py

Python cffi binding to CppJieba

Language: Python - Size: 4.06 MB - Last synced: 6 days ago - Pushed: over 3 years ago - Stars: 15 - Forks: 0

hankcs/sub-character-cws

Sub-Character Representation Learning

Language: Python - Size: 42.1 MB - Last synced: 9 months ago - Pushed: almost 6 years ago - Stars: 25 - Forks: 2

jk195417/chinese-segmentation-as-service

Uses Flask to expose jieba, SnowNLP, and pkuseg as an HTTP API web service.

Language: Python - Size: 2.87 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 4 - Forks: 1

Ailln/simple-jieba

✂️ A simple version of jieba word segmentation implemented in 100 lines

Language: Python - Size: 1.95 MB - Last synced: 7 days ago - Pushed: almost 2 years ago - Stars: 3 - Forks: 1

yizhiru/thulac4j

Chinese Word Segmentation Tool, a Java implementation of THULAC.

Language: Java - Size: 17.4 MB - Last synced: 10 months ago - Pushed: about 3 years ago - Stars: 86 - Forks: 32

xtea/chinese_medical_words

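A manually curated corpus of medical industry vocabulary and terminology. Can be used to train NLP models of all kinds, such as speech recognition and dialogue systems.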

Size: 1.33 MB - Last synced: 10 months ago - Pushed: about 4 years ago - Stars: 85 - Forks: 31

NLPIR-team/elasticsearch-analysis-ictclas

Elasticsearch analysis plugin of ICTCLAS

Language: Java - Size: 25.9 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0

supercoderhawk/DeepLearning_NLP

A natural language processing library based on deep learning

Language: Python - Size: 12.2 MB - Last synced: 6 months ago - Pushed: over 5 years ago - Stars: 155 - Forks: 41

fumiama/jieba

A performance-optimized version of Jiebago that supports loading dictionaries from an io.Reader

Language: Go - Size: 9.21 MB - Last synced: 9 months ago - Pushed: over 1 year ago - Stars: 11 - Forks: 2

brynne8/wordseg

Chinese Word Segmentation in Lua

Language: Lua - Size: 1.6 MB - Last synced: 25 days ago - Pushed: over 3 years ago - Stars: 6 - Forks: 1

jsrpy/Chinese-NLP-Jieba

This is an introduction to Chinese word segmentation using Jieba.

Language: Jupyter Notebook - Size: 2.83 MB - Last synced: 5 months ago - Pushed: almost 6 years ago - Stars: 9 - Forks: 1

bububa/jiagu

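Jiagu deep learning natural language processing tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering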

Language: Go - Size: 88.4 MB - Last synced: 12 months ago - Pushed: almost 3 years ago - Stars: 11 - Forks: 5

akibcmi/SAMS

Python code for the paper "Attention Is All You Need for Chinese Word Segmentation".

Language: Python - Size: 46.9 KB - Last synced: 12 months ago - Pushed: over 3 years ago - Stars: 6 - Forks: 3

izackwu/ChineseWordSegmentationSystem 📦

A Chinese word segmentation system, mainly based on HMM and Maximum Matching, with a local website built with Flask as the UI.

Language: Python - Size: 14.4 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 5 - Forks: 0

moronism189/chinese-nlp-stepbystep

From jieba word segmentation to BERT-wwm: a step-by-step guide into the world of Chinese NLP

Language: Jupyter Notebook - Size: 1.98 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 6 - Forks: 0

fudannlp16/CWS_Dict

Source code for the paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018

Language: Python - Size: 39.3 MB - Last synced: 11 months ago - Pushed: over 6 years ago - Stars: 91 - Forks: 32

GanjinZero/GTS

Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]

Language: Python - Size: 144 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 13 - Forks: 1

kuangmeng/GraduationProject

Graduation project: a high-accuracy word segmentation system designed for rapid porting across domains

Language: Java - Size: 12.6 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 6 - Forks: 4

yihong-chen/chinese-word-segmentation

Simple Chinese word segmentation with experiments on the PKU dataset

Language: Jupyter Notebook - Size: 23.2 MB - Last synced: 10 days ago - Pushed: about 6 years ago - Stars: 9 - Forks: 1

fg607/ChatterBot Fork of gunthercox/ChatterBot

A Chinese-adapted version of ChatterBot, supporting Chinese word segmentation search and Chinese stop words

Language: Python - Size: 4.13 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 11 - Forks: 5

voidism/pywordseg

Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816

Language: Python - Size: 237 MB - Last synced: 14 days ago - Pushed: almost 3 years ago - Stars: 39 - Forks: 7

Chanrom/nlpcc2016-chinese-weibo-segmentation

The 1st-place solution (closed and semi-open tracks) in NLPCC 2016 Chinese Weibo Segmentation

Language: Python - Size: 7.41 MB - Last synced: 9 months ago - Pushed: almost 6 years ago - Stars: 4 - Forks: 1

ShenDezhou/CRF

A Conditional Random Field Model based Chinese Word Segmentation Project.

Language: Python - Size: 10.2 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 1 - Forks: 1

Riccorl/chinese-word-segmentation-pytorch

Chinese Word Segmentation task based on BERT and implemented in PyTorch

Language: Python - Size: 133 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 12 - Forks: 2

NLPIR-team/nlpir-analysis-cn-ictclas

Lucene/Solr Analyzer Plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, which lets you change the Lucene/Solr version for compatibility with the corresponding release.

Language: Java - Size: 29.9 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 71 - Forks: 27

Edward-Sun/GapBasedCWS

The implementation of https://arxiv.org/abs/1712.09509

Language: Python - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 2 - Forks: 0

elsheikh21/chinese-word-segmentation

Implementing the SOTA for Chinese Word Segmentation using Keras

Language: Python - Size: 1.95 MB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 3 - Forks: 0

dalinvip/PyTorch_Chinese_word_segmentation

Chinese word segmentation with a neural seq2seq model implemented in PyTorch

Language: Python - Size: 59.6 KB - Last synced: 12 months ago - Pushed: over 6 years ago - Stars: 10 - Forks: 3

Hoiy/berserker

Berserker - BERt chineSE woRd toKenizER

Language: Python - Size: 174 KB - Last synced: 3 months ago - Pushed: about 5 years ago - Stars: 17 - Forks: 1

supercoderhawk/DNN_CWS

Chinese word segmentation implemented with deep learning

Language: Python - Size: 46.2 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 61 - Forks: 32

supercoderhawk/DeepNLP

A natural language processing library based on deep learning

Language: Python - Size: 11.9 MB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 37 - Forks: 19

jcyk/greedyCWS

Source code for an ACL2017 paper on Chinese word segmentation

Language: Python - Size: 48.2 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 90 - Forks: 23

wangjksjtu/multi-embedding-cws

Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019

Language: Python - Size: 26.3 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 15 - Forks: 5

mathsyouth/awesome-word-segmentation

A curated list of resources dedicated to word segmentation

Size: 9.77 KB - Last synced: 4 days ago - Pushed: over 5 years ago - Stars: 12 - Forks: 1

dhchenx/ner-kit

A toolkit for simple NLP APIs based on Stanza

Language: Python - Size: 56.6 KB - Last synced: 5 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

egrcc/Cross-Domain-CWS

Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"

Language: Python - Size: 3.27 MB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 14 - Forks: 2

richardcsuwandi/hmm-word-segmentation

Chinese word segmentation using Hidden Markov Model (HMM)

Language: Jupyter Notebook - Size: 631 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

chengdu/yuet

Tiny speech synthesis engine of Cantonese for offline embedded system.

Size: 5.86 KB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 9 - Forks: 1

napoler/tkit-seg

Multi-domain Chinese word segmentation tool

Language: HTML - Size: 21.1 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

WindomZ/gcws

gcws is CWS (Chinese Word Segmentation) for golang - an open-source integration of Chinese word segmenters

Language: Go - Size: 19.5 KB - Last synced: 27 days ago - Pushed: about 6 years ago - Stars: 4 - Forks: 0

limchiahooi/nlp-chinese

This repo contains my Natural Language Processing (NLP) in Chinese project.

Language: Jupyter Notebook - Size: 40.4 MB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 1 - Forks: 0

secsilm/text-segmentation-trap

Some sentences that word segmentation tools tend to segment incorrectly.

Language: Jupyter Notebook - Size: 49.8 KB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

INotWant/crf

Chinese word segmentation tool based on CRF

Language: Java - Size: 17.6 KB - Last synced: about 1 year ago - Pushed: almost 5 years ago - Stars: 5 - Forks: 4

kemingy/handict

Language: Python - Size: 2.1 MB - Last synced: 24 days ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

guangzhixie/Chinese-word-segmentation

Chinese word segmentation

Language: Jupyter Notebook - Size: 14.2 MB - Last synced: 11 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 1

Steven0038/NLP_course_report

PKU NLP course report, based on HMM POS tagging and NER

Language: Python - Size: 26.9 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

izackwu/SegmentationCRF

Chinese word segmentation with CRF++.

Language: Python - Size: 65.6 MB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 4 - Forks: 0

Saltychtao/CWS

Chinese word segmenter based on bi-LSTM network

Language: Python - Size: 30.4 MB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 5 - Forks: 1

heropoo/cws

Chinese Word Segmentation

Size: 4.88 KB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

JT501/SCWS Fork of vanry/SCWS

A Laravel extension package for the Simple Chinese Word Segmentation (SCWS) system.

Language: PHP - Size: 16.6 KB - Last synced: 10 days ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

JT501/laravel-scout-tntsearch-driver-chinese Fork of teamtnt/laravel-scout-tntsearch-driver

A Laravel Scout TNTSearch driver package with support for SCWS Chinese word segmentation

Language: PHP - Size: 118 KB - Last synced: 9 days ago - Pushed: over 4 years ago - Stars: 0 - Forks: 1

dalinvip/pytorch_Joint-Word-Segmentation-And-POS-Tagging-old

pytorch_seq2seq_wordseg_and_postag

Language: Python - Size: 1.93 MB - Last synced: 12 months ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

zhengyuan-liu/DNN-for-CWS

Deep neural network for Chinese word segmentation

Language: Python - Size: 30.9 MB - Last synced: about 1 year ago - Pushed: about 5 years ago - Stars: 3 - Forks: 1

Saltychtao/njuseg

Language: Python - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 2 - Forks: 1

dalinvip/pytorch_seq2seq_wordseg_and_postag_version2

pytorch_seq2seq_wordseg_and_postag_version2

Language: Python - Size: 2.06 MB - Last synced: 12 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 1

supercoderhawk/CWS_LSTM Fork of FudanNLP/CWS_LSTM

Long Short-Term Memory Neural Networks for Chinese Word Segmentation

Language: Python - Size: 49.8 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

guangzhixie/Chinese-word-segmenter-Java

Language: Java - Size: 18.6 KB - Last synced: 11 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

zhuangh/kcws Fork of koth/kcws

Deep Learning Chinese Word Segmentation

Language: C++ - Size: 13.4 MB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

Related Keywords
chinese-word-segmentation (87), chinese-nlp (17), natural-language-processing (15), nlp (15), word-segmentation (9), chinese-text-segmentation (9), chinese (9), cws (8), tensorflow (7), named-entity-recognition (7), jieba (6), crf (6), deep-learning (6), segmentation (5), tokenizer (5), python (5), pos-tagging (5), pytorch (5), ner (4), bert (4), chinese-tokenizer (4), jieba-chinese (4), seq2seq (3), pinyin (3), embeddings (3), ictclas (3), text-segmentation (3), classification (3), lstm (3), chinese-language (3), chinese-characters (3), pos (3), java (2), spellcheck (2), spelling (2), spelling-correction (2), symspell (2), part-of-speech-tagger (2), transformer (2), python3 (2), seq2seq-batch (2), text-analysis (2), lucene-analyzer (2), mmseg (2), chinese-traditional (2), nlpir (2), nlp-machine-learning (2), scws (2), approximate-string-matching (2), damerau-levenshtein (2), edit-distance (2), laravel (2), spell-check (2), levenshtein-distance (2), levenshtein (2), fuzzy-search (2), fuzzy-matching (2), chinese-simplified (2), solr (1), neural-network (1), bert-embeddings (1), token (1), unsupervised-features (1), nlpcc2016 (1), lucene (1), character-level-elmo (1), chinese-stop-words (1), chatterbot (1), nlp-datasets (1), chatbot (1), elasticsearch-analysis (1), elasticsearch-plugin (1), njunlp (1), relation-extraction (1), segment (1), postagging (1), unsupervised (1), graph-cut (1), bert-wwm (1), hmm-viterbi-algorithm (1), flask-application (1), clustering (1), lua (1), jieba-analysis (1), golang-package (1), golang-library (1), golang (1), word-embedding (1), hmm (1), viterbi-algorithm (1), cantonese (1), chinese-speech-synthesis (1), hong-kong (1), jyutping (1), tntsearch (1), search (1), mandarin (1), scout (1), natural-language-understanding (1), text-to-speech (1)