Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: chinese-nlp

baidu/lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

Language: C++ - Size: 63.6 MB - Last synced: about 12 hours ago - Pushed: almost 3 years ago - Stars: 3,765 - Forks: 588

baidu/DDParser

Baidu's open-source dependency parsing system

Language: Python - Size: 354 KB - Last synced: 1 day ago - Pushed: over 1 year ago - Stars: 952 - Forks: 164

ECNU-ICALK/EduChat

An open-source Chinese-English educational dialogue large language model from ICALK, East China Normal University (general base model, GPU deployment, data cleaning). With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM

Language: Python - Size: 210 MB - Last synced: about 3 hours ago - Pushed: 5 months ago - Stars: 610 - Forks: 58

crownpku/Awesome-Chinese-NLP

A curated list of resources for Chinese NLP

Size: 317 KB - Last synced: 1 day ago - Pushed: 10 months ago - Stars: 7,679 - Forks: 1,705

pwxcoo/chinese-xinhua

Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.

Language: Python - Size: 34.6 MB - Last synced: 3 days ago - Pushed: 5 months ago - Stars: 10,663 - Forks: 2,510

HIT-SCIR/ltp

Language Technology Platform

Language: Python - Size: 15.5 MB - Last synced: 4 days ago - Pushed: 10 days ago - Stars: 4,810 - Forks: 1,030

cingtiye/Awesome-Open-domain-Dialogue-Models

Awesome Open-domain Dialogue Models: a collection of high-quality open-domain dialogue models

Size: 25.4 KB - Last synced: 2 days ago - Pushed: about 1 year ago - Stars: 29 - Forks: 2

CVI-SZU/Linly

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Language: Python - Size: 7.27 MB - Last synced: 3 days ago - Pushed: 27 days ago - Stars: 2,991 - Forks: 230

modelscope/AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

Language: Python - Size: 5.03 MB - Last synced: 3 days ago - Pushed: 6 months ago - Stars: 367 - Forks: 33

OYE93/Chinese-NLP-Corpus

Collections of Chinese NLP corpus

Language: Python - Size: 7.14 MB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 848 - Forks: 207

crownpku/Information-Extraction-Chinese

Chinese named entity recognition with IDCNN/biLSTM+CRF, and relation extraction with biGRU+2ATT

Language: Python - Size: 78.9 MB - Last synced: 9 days ago - Pushed: 3 months ago - Stars: 2,196 - Forks: 814

esbatmop/MNBVC

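MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.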

Size: 218 KB - Last synced: 9 days ago - Pushed: 11 days ago - Stars: 3,001 - Forks: 206

rime/rime-cantonese

Rime Cantonese input schema (Cantonese romanization input scheme)

Language: Python - Size: 96.7 MB - Last synced: 9 days ago - Pushed: 3 months ago - Stars: 497 - Forks: 56

boat-group/fancy-nlp

NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.

Language: Python - Size: 769 KB - Last synced: 8 days ago - Pushed: over 1 year ago - Stars: 282 - Forks: 42

howl-anderson/Chinese_models_for_SpaCy

Models for SpaCy that support Chinese

Language: Jupyter Notebook - Size: 709 KB - Last synced: 10 days ago - Pushed: almost 4 years ago - Stars: 632 - Forks: 110

aplmikex/deduplication_mnbvc

Text deduplication

Language: Python - Size: 104 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 57 - Forks: 6

voidism/Chinese_Sentence_Dependency_Analyzer

Uses Word2vec's center and context vectors to analyze collocation relations between Chinese words, and greedily attempts to extract dependency relations within a sentence (without much success).

Language: Python - Size: 5.64 MB - Last synced: 13 days ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

HIT-SCIR/pyltp Fork of HuangFJ/pyltp

pyltp: the python extension for LTP

Language: C++ - Size: 8.76 MB - Last synced: 4 days ago - Pushed: almost 2 years ago - Stars: 1,523 - Forks: 353

dongrixinyu/jiojio

A convenient Chinese word segmentation tool

Language: Python - Size: 507 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 34 - Forks: 5

crownpku/Small-Chinese-Corpus

Some useful small Chinese corpus datasets

Size: 92.4 MB - Last synced: 9 days ago - Pushed: about 4 years ago - Stars: 526 - Forks: 161

howl-anderson/MicroTokenizer

A micro tokenizer for Chinese: a tiny Chinese word segmentation engine with comprehensive algorithm coverage

Language: Python - Size: 174 MB - Last synced: 9 days ago - Pushed: over 1 year ago - Stars: 143 - Forks: 22

Isaac-JL-Chen/rouge_chinese Fork of pltrdy/rouge

Python ROUGE Score Implementation for Chinese Language Task (official rouge score)

Language: Python - Size: 90.8 KB - Last synced: 11 days ago - Pushed: over 1 year ago - Stars: 66 - Forks: 3

lyogavin/Anima

33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU

Language: Jupyter Notebook - Size: 3.15 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 2,790 - Forks: 218

Chunshan-Theta/markovify_zh

A Markov chain sentence generator with jieba added

Language: Python - Size: 23 MB - Last synced: 23 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

old-wang-95/easy-bert

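easy-bert is a Chinese NLP tool that provides ways to call and tune many BERT variants and is very quick to pick up; its clear design and code comments also make it well suited for learning.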

Language: Python - Size: 9.05 MB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 57 - Forks: 11

thunlp/THULAC-Java

An Efficient Lexical Analyzer for Chinese

Language: Java - Size: 332 KB - Last synced: 17 days ago - Pushed: over 6 years ago - Stars: 324 - Forks: 114

iflytek/cino

CINO: Pre-trained Language Models for Chinese Minority Languages

Language: Python - Size: 21.7 MB - Last synced: 24 days ago - Pushed: about 1 year ago - Stars: 191 - Forks: 25

thunlp/THULAC-Python

An Efficient Lexical Analyzer for Chinese

Language: Python - Size: 78.1 KB - Last synced: 25 days ago - Pushed: over 2 years ago - Stars: 1,962 - Forks: 334

celtics1863/envtext

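A Chinese text analysis package for the environmental domain with a purely neural-network architecture. It supports models such as EnvBert, LSTM, RNN, and word2vec, as well as custom models. Downstream tasks include classification, regression, multiple choice, sentiment analysis, and named entity recognition; special topics include climate change text analysis and environmental knowledge graphs. The interface is optimized for domain research, so models can be used in one step.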

Language: Python - Size: 408 MB - Last synced: 5 days ago - Pushed: about 1 year ago - Stars: 26 - Forks: 4

thunlp/THULAC

An Efficient Lexical Analyzer for Chinese

Language: C++ - Size: 93.8 KB - Last synced: 25 days ago - Pushed: 11 months ago - Stars: 769 - Forks: 170

yaoxiaoyuan/mimix

Mimix: A Text Generation Tool and Pretrained Chinese Models

Language: Python - Size: 6.2 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 144 - Forks: 16

LianjiaTech/BELLE

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue large language model)

Language: HTML - Size: 18 MB - Last synced: 27 days ago - Pushed: about 2 months ago - Stars: 7,489 - Forks: 730

SleepingMonster/Keras_BiLSTM-CRF_Chinese_Sequence_Annotation

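Sun Yat-sen University natural language processing project: Chinese word segmentation (sequence labeling / named entity recognition). Implemented in Keras with a BiLSTM+CRF framework.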

Language: Jupyter Notebook - Size: 15 MB - Last synced: 28 days ago - Pushed: over 3 years ago - Stars: 13 - Forks: 4

didi/ChineseNLP

Datasets and SOTA results for every field of Chinese NLP

Language: HTML - Size: 875 KB - Last synced: 27 days ago - Pushed: about 2 years ago - Stars: 1,770 - Forks: 276

open-chinese/chinese-word-structure

Studies the structure of all Chinese characters to provide a complete solution to Chinese character structure problems in NLP.

Size: 202 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 9 - Forks: 2

brightmart/nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

Size: 4.01 MB - Last synced: about 1 month ago - Pushed: 11 months ago - Stars: 9,089 - Forks: 1,526

lionsoul2014/jcseg

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

Language: Java - Size: 21.1 MB - Last synced: 22 days ago - Pushed: 8 months ago - Stars: 905 - Forks: 212

fastnlp/fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Language: Python - Size: 35.1 MB - Last synced: about 1 month ago - Pushed: 11 months ago - Stars: 3,032 - Forks: 454

howl-anderson/WeatherBot

A Rasa-based Chinese weather inquiry chatbot with a web UI

Size: 97.6 MB - Last synced: 10 days ago - Pushed: about 5 years ago - Stars: 234 - Forks: 68

nonamestreet/weixin_public_corpus

WeChat official account corpus

Size: 1.37 GB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 558 - Forks: 165

duduscript/split

Chinese word segmentation program

Language: Python - Size: 71 MB - Last synced: about 2 months ago - Pushed: about 7 years ago - Stars: 0 - Forks: 1

JherezTaylor/f360-textmining-test

Python code for text mining test

Language: Jupyter Notebook - Size: 9.88 MB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0

hscspring/pnlp

NLP pre-/post-processing tools.

Language: Python - Size: 230 KB - Last synced: 14 days ago - Pushed: 4 months ago - Stars: 27 - Forks: 7

chatopera/chop

Chinese Tokenizer module for Python

Language: Python - Size: 9.32 MB - Last synced: 9 days ago - Pushed: almost 6 years ago - Stars: 17 - Forks: 8

tim5go/cnn-question-classification-keras

Chinese Question Classifier (Keras Implementation) on BQuLD

Language: Python - Size: 693 KB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 30 - Forks: 14

Kyubyong/g2pC

g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese

Language: Python - Size: 21.8 MB - Last synced: 12 days ago - Pushed: almost 5 years ago - Stars: 231 - Forks: 30

weather-bot/chrono

JavaScript natural language time module (fork, enhanced for Chinese)

Language: JavaScript - Size: 8.41 MB - Last synced: 4 days ago - Pushed: almost 6 years ago - Stars: 8 - Forks: 2

tim5go/zhopenie

Chinese Open Information Extraction (Tree-based Triple Relation Extraction Module)

Language: Python - Size: 89.8 KB - Last synced: 20 days ago - Pushed: almost 7 years ago - Stars: 120 - Forks: 26

amutu/zhparser

zhparser is a PostgreSQL extension for full-text search of Chinese language

Language: C - Size: 5.75 MB - Last synced: 2 months ago - Pushed: 3 months ago - Stars: 641 - Forks: 85

Abbey4799/CuteGPT

An open-source conversational language model developed by the Knowledge Works Research Laboratory at Fudan University.

Language: Python - Size: 276 KB - Last synced: about 2 months ago - Pushed: 7 months ago - Stars: 60 - Forks: 2

hailiang-wang/hanlp-api Fork of beyai/node-hanlp

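Chinese word segmentation, named entity recognition, keyword extraction, automatic summarization, phrase extraction, pinyin conversion, Simplified-Traditional conversion, text recommendation, dependency parsing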

Language: JavaScript - Size: 1.27 MB - Last synced: 1 day ago - Pushed: about 7 years ago - Stars: 42 - Forks: 13

thunlp/THUCTC

An Efficient Chinese Text Classifier

Language: Java - Size: 1.67 MB - Last synced: 2 months ago - Pushed: over 5 years ago - Stars: 196 - Forks: 67

HIT-SCIR/ltp4j Fork of ruoshui1126/ltp4j

ltp4j: Language Technology Platform For Java

Language: C++ - Size: 12.7 MB - Last synced: 4 days ago - Pushed: about 3 years ago - Stars: 162 - Forks: 82

IDEA-CCNL/Fengshenbang-LM

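Fengshenbang-LM (封神榜大模型) is an open-source large model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.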

Language: Python - Size: 84.5 MB - Last synced: 3 months ago - Pushed: 5 months ago - Stars: 3,770 - Forks: 349

linonetwo/segmentit

A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment

Language: JavaScript - Size: 3.18 MB - Last synced: 2 months ago - Pushed: about 1 year ago - Stars: 240 - Forks: 15

Aguila-team/Chinese_NLU_by_using_RASA_NLU

Use RASA NLU to build a Chinese Natural Language Understanding (NLU) system

Language: Python - Size: 52.7 KB - Last synced: 10 days ago - Pushed: over 1 year ago - Stars: 125 - Forks: 32

SUFE-AIFLM-Lab/StatChat

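StatChat is a digital intelligent learning assistant dedicated to knowledge question answering in statistics and related application fields (finance, economics, business analytics, data science, etc.).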

Size: 1.78 MB - Last synced: 3 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

ericchw/Youth_Discord_NLP_Chatbot

A Python AI chatbot with an emotion detection model. The frontend uses PHP, the API uses Flask, and the database uses PostgreSQL. In collaboration with CyberYouth from SJS. @HKMU 2022-2023 FYP

Language: CSS - Size: 24.6 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

zake7749/Gossiping-Chinese-Corpus

Chinese question-answer corpus from the PTT Gossiping board

Language: Jupyter Notebook - Size: 116 MB - Last synced: about 2 months ago - Pushed: over 3 years ago - Stars: 226 - Forks: 36

esun-ai/phonetic_mlm

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Language: Python - Size: 24.3 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 15 - Forks: 5

FerdinandZhong/punctuator

A small seq2seq punctuator tool based on DistilBERT

Language: Python - Size: 38.8 MB - Last synced: about 1 month ago - Pushed: 9 months ago - Stars: 47 - Forks: 7

kevinhu/hotpot

A lightweight Chinese-English dictionary

Language: JavaScript - Size: 119 MB - Last synced: 27 days ago - Pushed: over 1 year ago - Stars: 3 - Forks: 2

TheOne1006/m3e-server

m3e api

Language: Python - Size: 20.5 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

VilTea/2-gram

2-gram Chinese word segmentation

Language: Python - Size: 13.9 MB - Last synced: 5 months ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 2

falcondai/chinese-char-lm

Explores Chinese language models with sub-character-level visual information

Language: Python - Size: 77.1 KB - Last synced: 27 days ago - Pushed: over 5 years ago - Stars: 16 - Forks: 3

ksOAn6g5/TaiSu

TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)

Language: Python - Size: 3.98 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 129 - Forks: 9

guhhhhaa/4675-scifi

Chinese NLP corpus of Chinese science fiction: a text corpus of about 4,675 Chinese science fiction novels.

Size: 113 MB - Last synced: 6 months ago - Pushed: over 1 year ago - Stars: 277 - Forks: 50

jayeew/Chinese-ChatBot

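A Chinese chatbot trained on 100,000 dialogue pairs using an attention mechanism; it generates a meaningful reply to most general questions. The trained model has been uploaded and can be run directly.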

Language: Jupyter Notebook - Size: 131 MB - Last synced: 6 months ago - Pushed: 8 months ago - Stars: 315 - Forks: 70

IndexFziQ/LongLM-Eyas

Implementation of IIE-NLP-Eyas@OutGen: Chinese Outline-guided Story Generation via a Progressive Plot-Event-Story Framework

Language: Python - Size: 4.92 MB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0

ydli-ai/CSL

[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset

Language: Python - Size: 3.97 MB - Last synced: 7 months ago - Pushed: 11 months ago - Stars: 435 - Forks: 49

zhongbin1/bert_tokenization_for_java

This is a Java version of the Chinese tokenization described in BERT.

Language: Java - Size: 67.4 KB - Last synced: 7 months ago - Pushed: over 1 year ago - Stars: 54 - Forks: 8

chenmingxiang110/Chinese-automatic-speech-recognition

Chinese speech recognition

Language: Jupyter Notebook - Size: 1.58 MB - Last synced: 6 months ago - Pushed: 11 months ago - Stars: 158 - Forks: 62

elisa-aleman/StanfordCoreNLP_Chinese

Chinese implementation of the official Python interface to the Stanford CoreNLP Java server application, for parsing, tokenizing, part-of-speech tagging, etc. of Chinese texts.

Language: Python - Size: 56.6 KB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 29 - Forks: 11

thunlp/THUCKE

THU Chinese Keyphrase Extraction Toolkit

Language: C++ - Size: 44.9 KB - Last synced: 6 months ago - Pushed: about 6 years ago - Stars: 121 - Forks: 17

GotoRyusuke/ChineseFundReports

Repo for the code of the ChineseFundReports project

Language: Python - Size: 95.7 KB - Last synced: 7 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Doragd/Chinese-Chatbot-PyTorch-Implementation

Another Chinese chatbot implemented in PyTorch, a sub-module of an intelligent work order processing robot. 👩‍🔧

Language: Python - Size: 81.6 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 809 - Forks: 185

Platanus-hy/sememes_codriven_text_matching

Co-Driven Recognition of Semantic Consistency via the Fusion of Transformer and HowNet Sememes Knowledge

Language: Python - Size: 9.81 MB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 3 - Forks: 0

Walleclipse/ChineseAddress_OCR

OCR for Chinese addresses in photographed documents, implemented using CTPN + CTC + address correction.

Language: Python - Size: 241 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 351 - Forks: 134

colibrisson/CHAT_models

Automatic transcription models for Chinese historical documents trained with the kraken OCR engine

Size: 80.4 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 3 - Forks: 0

abner-wong/textrank

Keyword extraction and summarization for Chinese text using TextRank

Language: Python - Size: 10.7 KB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 60 - Forks: 16

messense/cppjieba-cabi

Idiomatic C ABI for CppJieba

Language: C++ - Size: 32.2 KB - Last synced: 9 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

DreamerGPT/DreamerGPT

🌱 DreamerGPT (梦想家): instruction fine-tuning for Chinese large language models

Language: Python - Size: 8.93 MB - Last synced: 7 months ago - Pushed: 11 months ago - Stars: 45 - Forks: 2

thinkwee/eda_zh_bert

Chinese version code for the paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks"

Language: Python - Size: 7.81 KB - Last synced: 9 months ago - Pushed: almost 5 years ago - Stars: 9 - Forks: 1

paladin-t/tokenizer

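A simple Chinese word segmentation algorithm, usable for profanity filtering in online games, search engine document parsing, natural language processing, and other settings that require Chinese word segmentation.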

Language: Python - Size: 3.91 KB - Last synced: 9 months ago - Pushed: almost 6 years ago - Stars: 7 - Forks: 5

patrick-tssn/CDBert

[ACL2023] Shuo Wen Jie Zi is a new learning paradigm that enhances the semantic understanding ability of Chinese PLMs with dictionary knowledge and the structure of Chinese characters

Language: Python - Size: 5.05 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 3 - Forks: 0

BOT-Man-JL/Random-Master

A Hackathon Project by Team Dimension, aiming to help people to make choices.

Language: C# - Size: 11.2 MB - Last synced: 9 months ago - Pushed: over 7 years ago - Stars: 6 - Forks: 0

pvalorconsultoria/nlp_lab_usp

Final project for the USP Natural Language Processing Laboratory 2023. An example of how to fine-tune a language model trained on Mandarin and adapt it to Portuguese, with some caveats xD

Language: Jupyter Notebook - Size: 131 KB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

limccn/cacl2

Lexicon for Chinese lexical analysis (a Chinese word segmentation lexicon)

Language: Python - Size: 291 MB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 94 - Forks: 18

xtea/chinese_medical_words

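A manually curated corpus of medical industry vocabulary and terminology. It can be used to train various NLP models, such as speech recognition and dialogue systems.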

Size: 1.33 MB - Last synced: 9 months ago - Pushed: about 4 years ago - Stars: 85 - Forks: 31

LehaoLin/sentiment-zh_cn_web Fork of thisandagain/sentiment

AFINN-based sentiment analysis for any JS environment. A Chinese sentiment analysis package usable in any JS environment.

Language: JavaScript - Size: 145 KB - Last synced: 3 days ago - Pushed: almost 3 years ago - Stars: 3 - Forks: 0

YJiangcm/Chinese-sentence-pair-modeling

Use deep models including BiLSTM, ABCNN, ESIM, RE2, BERT, etc. and evaluate on 5 Chinese NLP datasets: LCQMC, BQ Corpus, ChineseSTS, OCNLI, CMNLI

Language: Jupyter Notebook - Size: 13.5 MB - Last synced: 11 months ago - Pushed: about 2 years ago - Stars: 70 - Forks: 14

hikaru-k-bit/ipa-cyrillic-chinese

A web application that converts words, phrases, and sentences written in the International Phonetic Alphabet (IPA) into the Palladius system.

Language: HTML - Size: 994 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

minaxixi/ai-couplet

Use an AI model to write couplets with TensorFlow 2

Language: Python - Size: 75.2 KB - Last synced: 8 months ago - Pushed: almost 4 years ago - Stars: 20 - Forks: 5

HITsz-TMG/Hansel

Code and data of WSDM 2023 paper "Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark".

Size: 3.7 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 9 - Forks: 1

thunlp/THULAC.so

An Efficient Lexical Analyzer for Chinese

Language: C++ - Size: 47.9 KB - Last synced: 12 months ago - Pushed: over 4 years ago - Stars: 35 - Forks: 20

thunlp/THULAC.NET

An Efficient Lexical Analyzer for Chinese

Size: 1000 Bytes - Last synced: 12 months ago - Pushed: over 6 years ago - Stars: 3 - Forks: 1

piglaker/SpecialEdition

An NLP project for the Chinese spell checking task, released at ACL 2023.

Language: Python - Size: 122 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 5 - Forks: 1

acaGPT/gopublic

AI-assisted development of data analysis projects

Size: 8.79 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 3 - Forks: 0

richard-peng-xia/KD-CGEC

Code for Chinese grammatical error correction based on knowledge distillation

Language: Python - Size: 29 MB - Last synced: 12 months ago - Pushed: over 1 year ago - Stars: 8 - Forks: 0

jsrpy/Chinese-NLP-Jieba

This is an introduction to Chinese word segmentation using Jieba.

Language: Jupyter Notebook - Size: 2.83 MB - Last synced: 5 months ago - Pushed: almost 6 years ago - Stars: 9 - Forks: 1

Related Keywords
chinese-nlp (163), nlp (67), chinese (26), chinese-word-segmentation (17), bert (16), natural-language-processing (16), python (15), deep-learning (13), nlp-machine-learning (10), pytorch (8), corpus (8), keras (8), dataset (8), chinese-text-segmentation (7), machine-learning (7), chatbot (7), word-segmentation (7), llama (6), crf (6), text-mining (5), language-model (5), seq2seq (5), chinese-language (5), ner (5), named-entity-recognition (5), chatgpt (4), lstm (4), corpus-data (4), question-answering (4), nlp-datasets (4), segmentation (4), text-classification (4), tokenizer (4), ai (4), transformers (4), llm (4), chinese-simplified (4), java (4), chinese-characters (4), python3 (3), chinese-traditional (3), datasets (3), chinese-corpus (3), machine-translation (3), linguistics (3), open-models (3), bert-chinese (3), hanlp (3), chinese-translation (3), chinese-chatbot (3), classification (3), lora (3), summarization (3), text-processing (3), ocr (3), chinese-segmenter (3), chinese-dataset (3), relation-extraction (3), sequence-labeling (3), part-of-speech-tagger (2), multimodal (2), nlp-parsing (2), nlu (2), pytorch-nlp (2), torch (2), nlp-library (2), knowledge-distillation (2), text-analysis (2), computational-linguistics (2), grammatical-error-correction (2), instruction-set (2), gpt (2), instruct-gpt (2), data-cleaning (2), data-visualization (2), sentiment-analysis (2), srl (2), stanford-corenlp (2), entity-linking (2), bilstm-crf-model (2), database (2), cnn (2), chinese-idiom (2), chrono (2), time (2), spelling-correction (2), postgresql (2), pretrained-models (2), instruction-tuning (2), chinese-ner (2), pos-tagging (2), huggingface (2), information-extraction (2), tensorflow (2), rnn (2), natural-language-understanding (2), tokenization (2), dictionary (2), json (2), data (2)