An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: word-segmentation

wolfgarbe/SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: C# - Size: 12 MB - Last synced at: 4 days ago - Pushed at: 25 days ago - Stars: 3,230 - Forks: 303
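The description above names the Symmetric Delete idea. What follows is a minimal Python sketch of that idea, not SymSpell's actual code. It precomputes every string reachable by deleting up to `max_distance` characters from each dictionary word. At lookup time it generates the same deletes for the input and intersects the two sets. It then confirms candidates with a restricted Damerau-Levenshtein (optimal string alignment) distance. The helper names `deletes`, `build_index`, `osa_distance` and `lookup` are hypothetical. Word frequencies, prefix limits and the other optimizations the real library uses are deliberately left out.

```python
def deletes(word, max_distance):
    """All strings obtainable from `word` by deleting up to `max_distance` chars."""
    result = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        result |= next_frontier
        frontier = next_frontier
    return result


def build_index(dictionary, max_distance):
    """Map each delete-variant to the dictionary words that produce it."""
    index = {}
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def osa_distance(a, b):
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def lookup(term, index, max_distance):
    """Return (distance, word) pairs within max_distance, closest first."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    scored = ((osa_distance(term, c), c) for c in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world"], 2)
print(lookup("helo", index, 2))  # [(1, 'hello'), (1, 'help')]
```

The gain comes from doing the expensive work (generating deletes) once per dictionary word at build time. A query then needs only deletes of the input and hash lookups, never inserts or substitutions over the whole alphabet.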

PyThaiNLP/pythainlp

Thai natural language processing in Python

Language: Python - Size: 65.7 MB - Last synced at: 3 days ago - Pushed at: 7 days ago - Stars: 1,027 - Forks: 277

google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ - Size: 23.9 MB - Last synced at: 6 days ago - Pushed at: 22 days ago - Stars: 10,799 - Forks: 1,216

bab2min/Kiwi

Kiwi (Intelligent Korean Morphological Analyzer)

Language: C++ - Size: 396 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 575 - Forks: 51

monpa-team/monpa

MONPA (罔拍) is a multi-task model for Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.

Language: Python - Size: 8.25 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 246 - Forks: 25

ku-nlp/jumanpp

Juman++ (a Morphological Analyzer Toolkit)

Language: C++ - Size: 3.78 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 387 - Forks: 44

yaoguangluo/ChromosomeDNA

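"DNA Meta-Base Catalysis and Peptide Computing": in evolutionary computation, PDE metabolic optimization, which encodes software function files with DNA semantic meta-base indexes, is an effective mode of evolution.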

Language: Java - Size: 676 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 7 - Forks: 2

chengchingwen/BytePairEncoding.jl

Julia implementation of Byte Pair Encoding for NLP

Language: Julia - Size: 2.28 MB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 3
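To show what "Byte Pair Encoding" refers to, here is a small Python sketch of the classic merge-learning loop. It is not BytePairEncoding.jl's API. It repeatedly counts adjacent symbol pairs, weighted by word frequency, and merges the most frequent pair into a new symbol. The function names `learn_bpe`, `merge_pair` and `apply_bpe` are hypothetical. Ties are broken by picking the lexicographically smallest pair, an arbitrary choice made only for determinism. The end-of-word marker used in many BPE variants is omitted for brevity.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} mapping."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # most frequent pair; ties go to the lexicographically smallest pair
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)                       # [('e', 's'), ('es', 't'), ('l', 'o')]
print(apply_bpe("lowest", merges))  # ['lo', 'w', 'est']
```

Note that "lowest" never appears in the training counts, yet it still segments into known subword units. That property is why BPE is used for open-vocabulary tokenization.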

baidu/lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance

Language: C++ - Size: 63.6 MB - Last synced at: 12 days ago - Pushed at: almost 4 years ago - Stars: 3,921 - Forks: 596

modelscope/AdaSeq

AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models

Language: Python - Size: 5.03 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 434 - Forks: 41

ckiplab/ckip-transformers

CKIP Transformers

Language: Python - Size: 232 KB - Last synced at: 15 days ago - Pushed at: about 2 years ago - Stars: 723 - Forks: 76

cbaziotis/ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).

Language: Python - Size: 659 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 667 - Forks: 92

JayYip/m3tl

BERT for Multitask Learning

Language: Jupyter Notebook - Size: 29.1 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 547 - Forks: 125

bab2min/kiwipiepy

Python API for Kiwi

Language: Python - Size: 163 MB - Last synced at: 10 days ago - Pushed at: about 2 months ago - Stars: 312 - Forks: 32

taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

Language: Python - Size: 39.4 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 397 - Forks: 23

mammothb/symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: Python - Size: 5.93 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 822 - Forks: 122

dnanhkhoa/python-vncorenlp

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

Language: Python - Size: 40 KB - Last synced at: 9 days ago - Pushed at: over 6 years ago - Stars: 56 - Forks: 18

VKCOM/YouTokenToMe 📦

Unsupervised text tokenizer focused on computational efficiency

Language: C++ - Size: 192 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 966 - Forks: 103

eskriett/spell

Spelling correction and string segmentation written in Go

Language: Go - Size: 50.8 KB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 27 - Forks: 5

Systemcluster/kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.

Language: Rust - Size: 27.3 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 19 - Forks: 0

peterolson/hanzi-tools

Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.

Language: JavaScript - Size: 2.51 MB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 111 - Forks: 19

yongzhuo/Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit that supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.

Language: Python - Size: 379 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 341 - Forks: 50

ikegami-yukino/mecab Fork of taku910/mecab 📦

This repository is archived! The maintained MeCab can be found at https://github.com/shogo82148/mecab

Language: C++ - Size: 84.2 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 254 - Forks: 16

viig99/SymSpellCppPy

Fast SymSpell written in C++ and exposed to Python via pybind11

Language: C++ - Size: 8.31 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 42 - Forks: 7

MighTguY/customized-symspell

Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm

Language: Java - Size: 8.6 MB - Last synced at: 19 days ago - Pushed at: over 4 years ago - Stars: 67 - Forks: 18

ckiplab/ckipnlp

CKIP CoreNLP Toolkits

Language: Python - Size: 573 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 119 - Forks: 15

Ailln/nlp-roadmap

🗺️ A learning roadmap for natural language processing

Size: 135 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 109 - Forks: 12

giganticode/codeprep

A toolkit for pre-processing large source code corpora

Language: Python - Size: 1.56 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 47 - Forks: 11

KrakenAI/SynThai

Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning

Language: Python - Size: 35.2 KB - Last synced at: 11 minutes ago - Pushed at: almost 8 years ago - Stars: 40 - Forks: 16

hnthap/vietnamese-word-segment

Vietnamese word segmentation package.

Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

fudannlp16/CWS_Dict

Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018

Language: Python - Size: 39.3 MB - Last synced at: about 5 hours ago - Pushed at: about 7 years ago - Stars: 90 - Forks: 32

dalinvip/pytorch_Joint-Word-Segmentation-and-POS-Tagging

Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging

Language: Python - Size: 293 KB - Last synced at: about 19 hours ago - Pushed at: about 6 years ago - Stars: 35 - Forks: 11

yuanhao-chen-nyoeghau/shanghainese-tts

Shanghainese TTS

Language: Jupyter Notebook - Size: 1.98 GB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 5

JayYip/cws-tensorflow

A Chinese word segmentation model based on TensorFlow

Language: Python - Size: 2.47 MB - Last synced at: 18 days ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 3

echogarden-project/text-segmentation

A library for multilingual word, phrase and sentence segmentation.

Language: TypeScript - Size: 22.5 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

hellonlp/hellonlp

NLP tools: word segmentation, sentence segmentation, and new-word discovery

Language: Python - Size: 43.9 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 8

echogarden-project/icu-segmentation-wasm

WebAssembly port of the ICU library's character, word, line-break, and sentence segmentation methods.

Language: C - Size: 27.1 MB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ruanchaves/hashformers

Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).

Language: Python - Size: 23.6 MB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 70 - Forks: 5

hrishikeshrt/vaiyyakarana

Vaiyyākaraṇaḥ is a Telegram bot that offers various tools for a Sanskrit learner including stem (प्रातिपदिकम्) finder, root (धातुः) finder, declension (सुबन्ताः) generator, conjugation (तिङन्ताः) generator, and compound word (सन्धिसमासौ) splitter.

Language: Python - Size: 9.63 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 12 - Forks: 1

jidasheng/bi-lstm-crf

A PyTorch implementation of the BI-LSTM-CRF model.

Language: Python - Size: 12.7 KB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 246 - Forks: 48

paceheart/butthead-headbutt

Find all compound words that work both ways, like "butthead" and "headbutt"

Language: Python - Size: 3.91 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
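The task in the description can be sketched in a few lines of Python. This sketch is not taken from the repository. For each word, it tries every split into two dictionary words and keeps the split when the swapped concatenation is also a dictionary word. The function name `reversible_compounds` and the `min_part` length cutoff are assumptions for illustration. The cutoff skips trivial one-letter halves.

```python
def reversible_compounds(words, min_part=2):
    """Return sorted pairs (w1, w2) where w1 = a + b and w2 = b + a are both words."""
    vocab = set(words)
    found = set()
    for word in vocab:
        for i in range(min_part, len(word) - min_part + 1):
            left, right = word[:i], word[i:]
            # left != right excludes words whose swap is just the word itself
            if left != right and left in vocab and right in vocab and right + left in vocab:
                found.add(tuple(sorted((word, right + left))))
    return sorted(found)


words = ["butt", "head", "butthead", "headbutt", "foot", "ball", "football"]
print(reversible_compounds(words))  # [('butthead', 'headbutt')]
```

Storing each pair in sorted order means "butthead" and "headbutt" are reported once, even though both directions are discovered.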

bnosac/sentencepiece

R package for Byte Pair Encoding / Unigram modelling based on SentencePiece

Language: C++ - Size: 4.56 MB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 25 - Forks: 6

apdullahyayik/TrTokenizer

🧩 A simple sentence tokenizer.

Language: Python - Size: 480 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 1

hankcs/iparser

Yet another dependency parser, integrated with tokenizer, tagger and visualization tool.

Language: Python - Size: 69.3 KB - Last synced at: 12 days ago - Pushed at: about 7 years ago - Stars: 11 - Forks: 2

vncorenlp/VnCoreNLP

A Vietnamese natural language processing toolkit (NAACL 2018)

Language: Java - Size: 232 MB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 570 - Forks: 141

NoerNova/ShanNLP

ShanNLP experimental project inspired by PyThaiNLP

Language: Python - Size: 630 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 8 - Forks: 1

Ailln/simple-jieba

✂️ A simple version of jieba word segmentation implemented in 100 lines

Language: Python - Size: 1.95 MB - Last synced at: 7 days ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 1

salsowelim/tawseem

NLP crowdsourcing platform for word-level annotations

Language: Go - Size: 716 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 13 - Forks: 1

jacksonllee/pycantonese

Cantonese Linguistics and NLP

Language: Python - Size: 15.1 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 335 - Forks: 38

crackcell/gonlpir

Golang wrapper for NLPIR/ICTCLAS2015.

Language: Go - Size: 79.8 MB - Last synced at: 10 months ago - Pushed at: over 8 years ago - Stars: 23 - Forks: 6

ankane/youtokentome-ruby 📦

High performance unsupervised text tokenization for Ruby

Language: Ruby - Size: 31.3 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 1

taishi-i/toiro

A comparison tool of Japanese tokenizers

Language: Python - Size: 1.04 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 115 - Forks: 8

wchan757/Cantonese_Word_Segmentation

Dictionary for Cantonese word segmentation

Size: 826 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 33 - Forks: 5

ikegami-yukino/rakutenma-python

Rakuten MA (Python version)

Language: Python - Size: 24.1 MB - Last synced at: 14 days ago - Pushed at: almost 8 years ago - Stars: 22 - Forks: 1

seanghay/khmersegment

A Khmer word segmentation tool built for NIPTICT (now CADT) Khmer Word Segmentation CRF model.

Language: Python - Size: 12.7 KB - Last synced at: about 23 hours ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

ye-kyaw-thu/sylbreak

Syllable segmentation tool for Myanmar language (Burmese) by Ye.

Language: HTML - Size: 2.97 MB - Last synced at: 9 months ago - Pushed at: over 1 year ago - Stars: 55 - Forks: 19

PyThaiNLP/pylexto 📦

LexTo with Python 2 & 3 Wrapper

Language: Java - Size: 189 KB - Last synced at: 12 months ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 3

crusnic-corp/BN-DRISHTI

Line and Word Segmentation for Bangla Handwritten Text Recognition

Language: Jupyter Notebook - Size: 169 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 2

stevenay/myan-word-breaker

Myanmar Word Segmentation Tool

Language: Python - Size: 859 KB - Last synced at: 9 months ago - Pushed at: over 6 years ago - Stars: 29 - Forks: 9

ckiplab/ckip-classic

CKIP Classic Word Segmentation and Sentence Parsing Tools

Language: Python - Size: 356 KB - Last synced at: 20 days ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 2

ljdyer/Naive-Bayes-Space-Restorer

Train Naive Bayes-based statistical machine learning models for restoring spaces to unsegmented sequences of input characters

Language: Python - Size: 18 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

fly-studio/word_rpc_server

A word segmentation RPC server via HanLP, ansj_seg

Language: Java - Size: 36.1 KB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ThuraAung1601/mmCRFseg

mmCRFseg: Word Segmentation for Myanmar Language using Conditional Random Fields

Language: Jupyter Notebook - Size: 611 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 1

mwhirls/bunsetsu

A wrapper library around https://github.com/takuyaa/kuromoji.js that intelligently groups Japanese morphemes into words

Language: TypeScript - Size: 372 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

waf/thai-word-split

Experiments in Thai word splitting

Language: HTML - Size: 1.95 KB - Last synced at: 28 days ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

qiaofei32/dnn-lstm-word-segment

Chinese Word Segmentation Based on Deep Learning and LSTM Neural Networks

Language: Python - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 23 - Forks: 15

arxiver/Onepiecelang

Text segmentation solution using natural language processing.

Language: Jupyter Notebook - Size: 1010 KB - Last synced at: 8 days ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

kdrl/WNE

C++ implementation of the paper "Word-like n-gram embedding". EMNLP 2018 Workshop on Noisy User-generated Text.

Language: C++ - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

naetherm/NLP

Some of the NLP projects I have worked on to strengthen my experience in the research field of NLP.

Language: Python - Size: 45.2 MB - Last synced at: 27 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

khanhtran2000/FPT.AI_2020

My work during internship at FPT.AI 2020

Language: Jupyter Notebook - Size: 778 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

Cater5009/Chinese-word-segmentation

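Chinese word segmentation implemented with MM (maximum matching), RMM (reverse maximum matching), BM (bidirectional matching), and CRF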

Language: Roff - Size: 61.9 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 3
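The dictionary-based methods named above (MM, RMM, BM) can be illustrated with a minimal Python sketch. It is not the repository's code. Forward matching greedily takes the longest dictionary word starting at the left. Reverse matching does the same from the right. The bidirectional variant here applies one common heuristic: prefer the result with fewer tokens, then the one with fewer single-character tokens, then the reverse result. The function names, the `max_len` window and the toy vocabulary are assumptions. Unknown characters fall back to single-character tokens.

```python
def forward_max_match(text, vocab, max_len):
    """MM: scan left to right, taking the longest dictionary match at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # single chars are the fallback
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """RMM: scan right to left, taking the longest dictionary match ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text, vocab, max_len):
    """BM: run both directions and pick one result with a simple heuristic."""
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    fwd_singles = sum(1 for t in fwd if len(t) == 1)
    bwd_singles = sum(1 for t in bwd if len(t) == 1)
    return fwd if fwd_singles < bwd_singles else bwd


# "研究生命起源" = "study the origin of life"; 研究 = study, 研究生 = graduate student,
# 生命 = life, 起源 = origin
vocab = {"研究", "研究生", "生命", "命", "起源"}
print(forward_max_match("研究生命起源", vocab, 3))        # ['研究生', '命', '起源']
print(bidirectional_max_match("研究生命起源", vocab, 3))  # ['研究', '生命', '起源']
```

The example shows the classic ambiguity that motivates combining directions. Forward matching greedily grabs "研究生" (graduate student) and strands "命". Reverse matching recovers the intended reading.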

Nguyendat-bit/VieTokenizer

Vietnamese Tokenizer package based on deep learning methods

Language: Python - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

fastcws/fastcws

A lightweight, high-performance Chinese word segmentation project

Language: C++ - Size: 524 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 190 - Forks: 8

GargNishant/OCR_Neural_Networks Fork of argman/EAST 📦

OCR using the Tesseract engine on top of the TensorFlow EAST model

Language: C++ - Size: 1.98 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

sgrpanchal31/SymSpell

This repo contains the Python 3 compatible code for SymSpell algorithm

Language: Python - Size: 17.6 KB - Last synced at: 4 months ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

messense/cjieba-py

Python cffi binding to CppJieba

Language: Python - Size: 4.06 MB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 15 - Forks: 0

jcyk/CWS

Source code for an ACL 2016 paper on Chinese word segmentation

Language: Python - Size: 44.9 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 80 - Forks: 26

Socret360/joint-khmer-word-segmentation-and-pos-tagging

A Keras implementation of a deep learning network to simultaneously perform Word Segmentation and Part-of-Speech (POS) Tagging introduced by Bouy et al. in the paper Joint Khmer Word Segmentation and Part-of-Speech Tagging Using Deep Learning.

Language: Python - Size: 10.3 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 1

shinjinighosh/F20-9.660-Word-Segmentation

Word Segmentation Final Project for 9.660 Computational Cognitive Science, Fall 2020

Language: TeX - Size: 17.6 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

jp-myk/lm-decoder

Language Model Decoder is a transducer from a sentence to a word/reading sequence.

Language: C++ - Size: 763 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 3

seanghay/mini-KhmerNLP

A mini version of KhmerNLP with LSTM only

Language: Python - Size: 49.8 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

wolfgarbe/WordSegmentationDP

Word Segmentation with Dynamic Programming

Language: C# - Size: 1.21 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 5
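"Word Segmentation with Dynamic Programming" generally means choosing the split of an unspaced string that maximizes the product of word probabilities. The Python sketch below shows that general technique under a unigram model. It is not necessarily this repository's exact method. The `segment` function, the toy `word_counts`, the `max_word_len` limit and the unknown-word penalty (probability shrinking by a factor of 10 per character) are all assumptions. `best[i]` holds the cheapest cost (sum of negative log probabilities) for the prefix `text[:i]`, plus where its last word starts.

```python
import math


def segment(text, word_counts, max_word_len=20):
    """Split `text` into the most probable word sequence under a unigram model."""
    total = sum(word_counts.values())

    def word_cost(word):
        count = word_counts.get(word)
        if count:
            return -math.log(count / total)
        # hypothetical smoothing: unknown words get rarer as they get longer
        return -math.log(10.0 / (total * 10 ** len(word)))

    # best[i] = (lowest cost of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            cost = best[start][0] + word_cost(text[start:end])
            if cost < best[end][0]:
                best[end] = (cost, start)

    # walk the back-pointers from the end to recover the words
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


counts = {"the": 50, "cat": 10, "sat": 8, "on": 20, "mat": 5, "a": 30, "at": 15}
print(segment("thecatsatonthemat", counts))
# ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

Working with summed negative log probabilities instead of multiplied probabilities avoids floating-point underflow on long inputs. It also turns the search into a shortest-path problem, which the `O(n * max_word_len)` loop solves exactly.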

Jyutt/jieba-hs

Haskell Implementation of the Jieba Chinese Segmentation Algorithm

Language: Haskell - Size: 969 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

hongquan/ViStickedWord

Library to split stuck-together Vietnamese words

Language: Python - Size: 43 KB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

wolfgarbe/WordSegmentationTM

Fast Word Segmentation with Triangular Matrix

Language: C# - Size: 1.22 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 67 - Forks: 12

maris205/dnasearchengine

Segmenting DNA sequences into ‘words’: https://arxiv.org/pdf/1202.2518.pdf

Language: C++ - Size: 20.7 MB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

KOLANICH-libs/WordSplitAbs.py

An abstraction layer around word splitters for python

Language: Python - Size: 13.7 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

dogterbox/thai-word-segmentation

Thai word segmentation using deep learning

Language: Jupyter Notebook - Size: 19.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 6 years ago - Stars: 10 - Forks: 1

fastcws/tagged-wiki2019zh

The 2019 Chinese Wikipedia corpus annotated with 4-tag labels, tagged using HanLP

Language: Python - Size: 1.95 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

rust-han/han-segment

A Chinese word segmentation system based on a hidden Markov model and forward maximum matching

Language: Rust - Size: 1.97 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 3

harshavkumar/word_segmentation

Word segmentation for handwritten text recognition

Language: Python - Size: 1.99 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 5

undertheseanlp/word_tokenize 📦

Vietnamese Word Tokenize

Language: Python - Size: 28.5 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 45 - Forks: 24

ljdyer/Space-Punct-Cap-Restoration

A portal to GitHub repositories associated with the paper "Comparison of Token- and Character-Level Approaches to Restoration of Spaces, Punctuation, and Capitalization in Various Languages"

Language: SCSS - Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

electron0zero/dym-server 📦

"Did you mean" API server

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

lixxin2/uninlp-phd 📦

No longer maintained! Java code for basic natural language processing tasks, including Pinyin-to-Character Conversion, Chinese word segmentation, Part-of-Speech tagging, English chunking, and dependency parsing

Language: Java - Size: 3.42 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 1

levyfan/sentencepiece-jni

Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ - Size: 240 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 28 - Forks: 10

Ronny12301/Separador-de-Palabras

The program reads the words from a plain-text file and splits them into separate files, each named after the words' initial letter. This is useful if you have a very large word dictionary and want to split it into smaller files. I used this code to help create the files for my Wordament Solver.

Language: Java - Size: 1.3 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

cvikasreddy/skt

Sanskrit compound segmentation using seq2seq model

Language: Python - Size: 24.6 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 22 - Forks: 11

labibdotc/Universal-Machine

The purpose of this project is to understand virtual-machine code (and, by extension, machine code) by writing a software implementation of a simple virtual machine. The work tests our ability to design, document, and implement a program with a clean modular structure, and it shows how structural choices can affect a program's performance. (In a later project, we profile it and make it 36x faster.) The primary goal of the design and implementation is clean structure.

Language: C - Size: 57.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

datquocnguyen/RDRsegmenter

A Fast and Accurate Vietnamese Word Segmenter (LREC 2018)

Language: Java - Size: 420 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 69 - Forks: 9

Related Keywords
word-segmentation (135), nlp (51), natural-language-processing (33), python (17), pos-tagging (15), text-segmentation (11), chinese-word-segmentation (10), spelling-correction (10), named-entity-recognition (10), deep-learning (9), vietnamese-nlp (9), segmentation (8), tokenizer (8), vietnamese (8), machine-learning (8), symspell (8), chinese-nlp (7), chinese (7), tensorflow (7), spellcheck (7), crf (6), pytorch (6), bert (6), ner (6), part-of-speech-tagging (6), spell-check (6), nlp-library (6), vietnamese-tokenizer (6), sequence-labeling (5), spelling (5), sentence-segmentation (5), java (5), thai-nlp (5), morphological-analysis (5), golang (5), pos-tagger (5), chinese-text-segmentation (5), language-model (4), part-of-speech-tagger (4), lstm (4), transformers (4), neural-machine-translation (4), word-segmenter (4), thai (4), computational-linguistics (4), levenshtein-distance (4), word-split (4), fuzzy-search (4), damerau-levenshtein (4), sanskrit (4), japanese-language (3), ckip (3), nlp-machine-learning (3), spell-corrector (3), tokenization (3), cws (3), ocr (3), text-classification (3), bpe (3), python3 (3), spelling-corrector (3), spellchecker (3), word-embedding (3), sentiment-analysis (3), approximate-string-matching (3), edit-distance (3), fuzzy-matching (3), thai-language (3), japanese (3), opencv (2), korean (2), sentence-parsing (2), korean-nlp (2), korean-tokenizer (2), jieba (2), jieba-chinese (2), unigram (2), sentencepiece (2), rust (2), cantonese (2), vncorenlp (2), language-learning (2), linguistics (2), dnn (2), word-tokenizing (2), text-processing (2), hanlp (2), seq2seq (2), sanskrit-segmentation (2), java-bindings (2), google-sentencepiece (2), sentence-boundary-detection (2), word-boundary (2), spell-checker (2), character-segmentation (2), levenshtein (2), truecasing (2), punctuation-restoration (2), twitter (2), handwriting-recognition (2)