chinese-nlp | Topic | Ecosyste.ms: Repos

Topic: "chinese-nlp"

pwxcoo/chinese-xinhua

Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

Language: Python - Size: 34.6 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 11,163 - Forks: 2,616

brightmart/nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

Size: 3.91 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 9,635 - Forks: 1,558

LianjiaTech/BELLE

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

Language: HTML - Size: 18 MB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 8,130 - Forks: 769

crownpku/Awesome-Chinese-NLP

A curated list of resources for Chinese NLP

Size: 317 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 7,864 - Forks: 1,717

lyogavin/airllm

AirLLM 70B inference with single 4GB GPU

Language: Jupyter Notebook - Size: 3.22 MB - Last synced at: 4 days ago - Pushed at: 5 months ago - Stars: 5,757 - Forks: 457

HIT-SCIR/ltp

Language Technology Platform

Language: Python - Size: 15.6 MB - Last synced at: 4 days ago - Pushed at: 25 days ago - Stars: 5,089 - Forks: 1,051

IDEA-CCNL/Fengshenbang-LM

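Fengshenbang-LM (封神榜大模型) is an open-source system of large models led by the Center for Cognitive Computing and Natural Language at the IDEA research institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.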

Language: Python - Size: 84.5 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 4,106 - Forks: 380

baidu/lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance

Language: C++ - Size: 63.6 MB - Last synced at: 16 days ago - Pushed at: almost 4 years ago - Stars: 3,921 - Forks: 596

esbatmop/MNBVC

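MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.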

Size: 681 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3,812 - Forks: 265

fastnlp/fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Language: Python - Size: 35.1 MB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 3,126 - Forks: 450

CVI-SZU/Linly

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Language: Python - Size: 7.27 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 3,049 - Forks: 234

crownpku/Information-Extraction-Chinese

Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT

Language: Python - Size: 78.9 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 2,255 - Forks: 808

thunlp/THULAC-Python

An Efficient Lexical Analyzer for Chinese

Language: Python - Size: 78.1 KB - Last synced at: 12 days ago - Pushed at: about 3 years ago - Stars: 2,059 - Forks: 335

didi/ChineseNLP

Datasets and SOTA results for every field of Chinese NLP

Language: HTML - Size: 875 KB - Last synced at: 21 days ago - Pushed at: about 3 years ago - Stars: 1,804 - Forks: 271

HIT-SCIR/pyltp Fork of HuangFJ/pyltp

pyltp: the python extension for LTP

Language: C++ - Size: 8.76 MB - Last synced at: 12 days ago - Pushed at: almost 3 years ago - Stars: 1,544 - Forks: 350

baidu/DDParser

Baidu's open-source dependency parsing system

Language: Python - Size: 354 KB - Last synced at: 19 days ago - Pushed at: about 2 years ago - Stars: 986 - Forks: 162

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

Language: Java - Size: 21.1 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 923 - Forks: 212

OYE93/Chinese-NLP-Corpus

Collections of Chinese NLP corpus

Language: Python - Size: 7.14 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 897 - Forks: 210

Doragd/Chinese-Chatbot-PyTorch-Implementation

Another Chinese chatbot implemented in PyTorch, a sub-module of an intelligent work order processing robot. 👩‍🔧

Language: Python - Size: 81.6 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 890 - Forks: 194

thunlp/THULAC

An Efficient Lexical Analyzer for Chinese

Language: C++ - Size: 93.8 KB - Last synced at: 4 days ago - Pushed at: almost 2 years ago - Stars: 806 - Forks: 173

ECNU-ICALK/EduChat

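An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue large language model (general-purpose base model, GPU deployment, data cleaning). With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM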

Language: Jupyter Notebook - Size: 210 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 785 - Forks: 86

amutu/zhparser

zhparser is a PostgreSQL extension for full-text search of Chinese text

Language: C - Size: 5.75 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 723 - Forks: 86

howl-anderson/Chinese_models_for_SpaCy

Models for SpaCy that support Chinese

Language: Jupyter Notebook - Size: 709 KB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 661 - Forks: 111

ydli-ai/CSL

[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset

Language: Python - Size: 3.97 MB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 623 - Forks: 59

rime/rime-cantonese

Rime Cantonese input schema | Cantonese romanization input scheme

Language: Python - Size: 88.5 MB - Last synced at: 12 days ago - Pushed at: 2 months ago - Stars: 580 - Forks: 64

nonamestreet/weixin_public_corpus

WeChat official accounts corpus

Size: 1.37 GB - Last synced at: 10 months ago - Pushed at: over 6 years ago - Stars: 568 - Forks: 165

crownpku/Small-Chinese-Corpus

Some useful small Chinese corpus datasets

Size: 92.4 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 529 - Forks: 162

modelscope/AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

Language: Python - Size: 5.03 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 434 - Forks: 41

Walleclipse/ChineseAddress_OCR

OCR for Chinese addresses in photographed documents, implemented using CTPN+CTC+Address Correction.

Language: Python - Size: 241 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 351 - Forks: 134

thunlp/THULAC-Java

An Efficient Lexical Analyzer for Chinese

Language: Java - Size: 332 KB - Last synced at: 20 days ago - Pushed at: over 7 years ago - Stars: 332 - Forks: 111

jayeew/Chinese-ChatBot

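A Chinese chatbot trained on 100,000 dialogue pairs with an attention mechanism; it generates a meaningful reply to most general questions. The trained model has been uploaded and can be run directly.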

Language: Jupyter Notebook - Size: 131 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 315 - Forks: 70

boat-group/fancy-nlp

NLP for humans. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.

Language: Python - Size: 769 KB - Last synced at: 22 days ago - Pushed at: over 2 years ago - Stars: 284 - Forks: 40

linonetwo/segmentit

A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment

Language: JavaScript - Size: 3.18 MB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 283 - Forks: 16

guhhhhaa/4675-scifi

A Chinese NLP corpus of Chinese science fiction: a text corpus of about 4,675 Chinese science fiction novels

Size: 113 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 277 - Forks: 50

iflytek/cino

CINO: Pre-trained Language Models for Chinese Minority Languages

Language: Python - Size: 21.7 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 242 - Forks: 30

Kyubyong/g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

Language: Python - Size: 21.8 MB - Last synced at: 20 days ago - Pushed at: almost 6 years ago - Stars: 240 - Forks: 31

howl-anderson/WeatherBot

A Rasa-based Chinese weather inquiry chatbot with a Web UI

Size: 97.6 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 237 - Forks: 68

zake7749/Gossiping-Chinese-Corpus

Chinese question-answer corpus from the PTT Gossiping board

Language: Jupyter Notebook - Size: 116 MB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 231 - Forks: 36

thunlp/THUCTC

An Efficient Chinese Text Classifier

Language: Java - Size: 1.67 MB - Last synced at: 1 day ago - Pushed at: over 6 years ago - Stars: 207 - Forks: 68

HIT-SCIR/ltp4j Fork of ruoshui1126/ltp4j

ltp4j: Language Technology Platform For Java

Language: C++ - Size: 12.7 MB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 162 - Forks: 82

chenmingxiang110/Chinese-automatic-speech-recognition

Chinese speech recognition

Language: Jupyter Notebook - Size: 1.58 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 158 - Forks: 62

howl-anderson/MicroTokenizer

MicroTokenizer: A lightweight, full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

Language: Python - Size: 174 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 150 - Forks: 22

yaoxiaoyuan/mimix

Mimix: A Text Generation Tool and Pretrained Chinese Models

Language: Python - Size: 6.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 144 - Forks: 16

ksOAn6g5/TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Language: Python - Size: 3.98 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 9

Aguila-team/Chinese_NLU_by_using_RASA_NLU

Use RASA NLU to build a Chinese Natural Language Understanding System (NLU)

Language: Python - Size: 52.7 KB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 125 - Forks: 32

thunlp/THUCKE

THU Chinese Keyphrase Extraction Toolkit

Language: C++ - Size: 44.9 KB - Last synced at: 1 day ago - Pushed at: about 7 years ago - Stars: 125 - Forks: 19

tim5go/zhopenie

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)

Language: Python - Size: 89.8 KB - Last synced at: 10 months ago - Pushed at: almost 8 years ago - Stars: 119 - Forks: 26

limccn/cacl2

Lexicon for Chinese lexical analysis; a Chinese word segmentation lexicon

Language: Python - Size: 291 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 117 - Forks: 22

Isaac-JL-Chen/rouge_chinese Fork of pltrdy/rouge

Python ROUGE Score Implementation for Chinese Language Task (official rouge score)

Language: Python - Size: 90.8 KB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 101 - Forks: 5

crazywhalecc/idiom-database

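An idiom (chengyu) database and idiom-chain database with 30,000+ idioms; the first and last pinyin syllables can be used directly to build your own idiom-chain game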

Size: 12.2 MB - Last synced at: 2 days ago - Pushed at: about 4 years ago - Stars: 93 - Forks: 22

xtea/chinese_medical_words

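A manually compiled corpus of medical vocabulary and terminology. It can be used to train various NLP models, such as those for speech recognition and dialogue systems.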

Size: 1.33 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 85 - Forks: 31

Nrgeup/chinese_semantic_role_labeling

Chinese semantic role labeling based on Bi-LSTM and CRF

Language: Python - Size: 26 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 83 - Forks: 23

taishan1994/pytorch_bert_event_extraction

Chinese event extraction based on PyTorch and BERT

Language: Python - Size: 5.81 MB - Last synced at: 21 days ago - Pushed at: almost 3 years ago - Stars: 72 - Forks: 4

YJiangcm/Chinese-sentence-pair-modeling

Use deep models including BiLSTM, ABCNN, ESIM, RE2, BERT, etc. and evaluate on 5 Chinese NLP datasets: LCQMC, BQ Corpus, ChineseSTS, OCNLI, CMNLI

Language: Jupyter Notebook - Size: 13.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 70 - Forks: 14

old-wang-95/easy-bert

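easy-bert is a Chinese NLP tool that provides ways to call and tune many BERT variants, so you can get started quickly; its clear design and code comments also make it well suited for learning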

Language: Python - Size: 9.05 MB - Last synced at: 9 months ago - Pushed at: over 2 years ago - Stars: 68 - Forks: 12

Abbey4799/CuteGPT

An open-source conversational language model developed by the Knowledge Works Research Laboratory at Fudan University.

Language: Python - Size: 276 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 3

abner-wong/textrank

Keyword extraction and summarization for Chinese text using TextRank

Language: Python - Size: 10.7 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 60 - Forks: 16

aplmikex/deduplication_mnbvc

Text deduplication

Language: Python - Size: 104 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 57 - Forks: 6

zhongbin1/bert_tokenization_for_java

This is a Java version of the Chinese tokenization described in BERT.

Language: Java - Size: 67.4 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 54 - Forks: 8

FerdinandZhong/punctuator

A small seq2seq punctuator tool based on DistilBERT

Language: Python - Size: 119 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 51 - Forks: 8

JaniceZhao/Douban-Dushu-Dataset

A dataset containing 37 million Douban Dushu comments

Size: 66.4 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 51 - Forks: 6

DreamerGPT/DreamerGPT

🌱 DreamerGPT (梦想家, "Dreamer"): instruction fine-tuning for Chinese large language models

Language: Python - Size: 8.93 MB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 50 - Forks: 2

secsilm/zi-dataset

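A Chinese character dataset with information about each character, such as stroke count, radical, pinyin, and English definitions/synonyms.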

Size: 1.57 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 50 - Forks: 8

guhhhhaa/wula-scifi

A Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan of the Ula (乌拉) science fiction website

Size: 199 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 49 - Forks: 9