An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-segmentation

catalyst-team/catalyst

Accelerated deep learning R&D

Language: Python - Size: 52.6 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 3,355 - Forks: 395

sedflix/awesome-topic-segmentation

(yet another not really) awesome topic/text segmentation list

Size: 15.6 KB - Last synced at: 12 days ago - Pushed at: over 6 years ago - Stars: 109 - Forks: 13

ogkalu2/comic-translate

Desktop app for automatically translating comics (BDs, manga, manhwa, fumetti and more) in a variety of formats (image, PDF, EPUB, CBR, CBZ, etc.) and in multiple languages.

Language: Python - Size: 16.8 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,766 - Forks: 159

mammothb/symspellpy

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: Python - Size: 5.76 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 827 - Forks: 124
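
The Symmetric Delete approach named above can be illustrated with a minimal pure-Python sketch. This is not symspellpy's API: `deletes`, `build_index`, `levenshtein`, and `lookup` are hypothetical names. For brevity, candidates are verified with plain Levenshtein distance, whereas SymSpell itself uses a Damerau-Levenshtein variant. The core idea is that only deletions are generated, once up front for every dictionary word and again for the query term. Any shared delete-variant marks a candidate, which is then checked with a real edit distance.

```python
def deletes(word, max_distance):
    """All strings reachable from `word` by deleting up to max_distance characters (word included)."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def build_index(words, max_distance):
    """Precompute a map from each delete-variant to the dictionary words that produce it."""
    index = {}
    for word in words:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(term, index, max_distance):
    """Return (distance, word) pairs within max_distance, sorted by distance then word."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    # Delete-variant matching can over-generate, so verify each candidate.
    scored = ((levenshtein(term, c), c) for c in candidates)
    return sorted((d, c) for d, c in scored if d <= max_distance)
```

With `index = build_index(["hello", "help", "world"], 2)`, calling `lookup("helo", index, 2)` returns `[(1, 'hello'), (1, 'help')]`. The expensive work (enumerating deletes of the dictionary) happens once, so each lookup only enumerates the deletes of a single short term.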

blmoistawinde/HarvestText

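Text mining and preprocessing toolkit (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) based on unsupervised or weakly supervised methods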

Language: Python - Size: 4.27 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 2,515 - Forks: 336

rlayers/pawpaw

Text Processing & Segmentation Framework

Language: Python - Size: 2.52 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

wolfgarbe/SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

Language: C# - Size: 12 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3,242 - Forks: 303

cbaziotis/ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).

Language: Python - Size: 659 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 671 - Forks: 92

viig99/SymSpellCppPy

Fast SymSpell implementation written in C++ and exposed to Python via pybind11

Language: C++ - Size: 8.31 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 43 - Forks: 8

notAI-tech/deepsegment 📦

A sentence segmenter that actually works!

Language: Python - Size: 81.1 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 306 - Forks: 55

ZumingHuang/awesome-ocr-resources

A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).

Size: 10.5 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 418 - Forks: 72

ReubenBond/HanBaoBao

Mandarin Chinese text segmentation and mobile dictionary Android app (Chinese word segmentation)

Language: Java - Size: 90 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 30 - Forks: 5

eskriett/spell

Spelling correction and string segmentation written in Go

Language: Go - Size: 50.8 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 5

google/emoji-segmenter

Emoji Segmenter

Language: C - Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 64 - Forks: 15

DCY1117/MangaQuick

Automatic Manga Translator

Language: Jupyter Notebook - Size: 20 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 35 - Forks: 7

zamgi/lingvo--TextSegmenter

Text segmentation into separate words using a simple unigram model and the Viterbi algorithm

Language: C# - Size: 38 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 2
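
Segmentation with a unigram model and the Viterbi algorithm, as described above, can be sketched as a dynamic program over character positions: `best[i]` holds the lowest cost (negative log-probability) of any segmentation of the first `i` characters, and `back[i]` records where its last word starts. This is an illustrative Python sketch, not the repository's C# code; `segment`, the toy counts, and the length-based penalty for unknown words are assumptions.

```python
import math


def segment(text, unigram_counts, max_word_len=20):
    """Split `text` into the most probable word sequence under a unigram model."""
    total = sum(unigram_counts.values())

    def word_cost(word):
        count = unigram_counts.get(word)
        if count:
            return -math.log(count / total)
        # Assumed penalty for unknown words: grows with length so that
        # long out-of-vocabulary chunks are strongly disfavored.
        return math.log(total * 10 ** len(word))

    n = len(text)
    best = [0.0] + [math.inf] * n  # best[i]: min cost of segmenting text[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            cost = best[start] + word_cost(text[start:end])
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Follow back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

For example, with the toy counts `{"the": 50, "cat": 10, "sat": 8, "on": 30, "mat": 5, "a": 40, "at": 20}`, `segment("thecatsatonthemat", counts)` returns `['the', 'cat', 'sat', 'on', 'the', 'mat']`. Bounding candidate words by `max_word_len` keeps the cost at O(n × max_word_len) rather than quadratic in the input length.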

sigpwned/uax29

Java implementation of UAX#29 text segmentation algorithm

Language: Java - Size: 359 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

nitely/nim-segmentation

Unicode text segmentation (TR29)

Language: Nim - Size: 40 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 10 - Forks: 1

Yannael/automatic-video-chaptering

Automate video chaptering with LLMs and TF-IDF: Transform raw transcripts into well-structured documents

Language: Jupyter Notebook - Size: 2.48 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

npillmayer/uax

Unicode Text Segmentation Algorithms

Language: Go - Size: 1.81 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 2

sobir-git/tajik-text-segmentation

Tajik text segmentation algorithms

Language: Python - Size: 53.7 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Feoramund/ucg

UTF-8 grapheme counting library written in C99.

Language: C - Size: 79.1 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Chayan-halder/WBSUBNdb_text---Bangla-handwritten-text-document-dataset

"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1383 offline handwritten text documents contributed by 190 writers. The dataset is composed of both simple and compound characters.

Size: 1.49 GB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

muhammad-usman-108/ai21-sdk

An npm package specializing in natural language processing, for building AI systems that can understand and generate natural language.

Language: TypeScript - Size: 29.3 KB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

danburzo/ltr

Split text into chars, words, or sentences from the command line.

Language: JavaScript - Size: 15.6 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

arxiver/Onepiecelang

Text segmentation solution using natural language processing.

Language: Jupyter Notebook - Size: 1010 KB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

koomri/text-segmentation

Implementation of the paper: Text Segmentation as a Supervised Learning Task

Language: Python - Size: 4.79 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 227 - Forks: 55

shayneobrien/text-segmentation

Neural and nonneural text segmentation methods.

Language: Jupyter Notebook - Size: 256 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 8 - Forks: 1

rafaelhferreira/grounded_task_segmentation_cta

Repo for the paper "Grounded Complex Task Segmentation for Conversational Assistants" presented at SIGDIAL 2023

Language: Python - Size: 6.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

wolfgarbe/WordSegmentationDP

Word Segmentation with Dynamic Programming

Language: C# - Size: 1.21 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 5

MosesBomera/Feedback-Prize-Evaluating-Student-Writing

Analyzing argumentative writing elements from students in grades 6-12.

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

QuantumWizard888/How-to-add-user-dictionary-to-MeCab

How to add user dictionary to MeCab

Size: 42 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

Jumpst3r/printed-hw-segmentation

Printed and handwritten text segmentation using fully convolutional networks and CRF post-processing

Language: Python - Size: 181 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 6

wolfgarbe/WordSegmentationTM

Fast Word Segmentation with Triangular Matrix

Language: C# - Size: 1.22 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 67 - Forks: 12

Dobatymo/graphseg-python

Language: Python - Size: 5.86 KB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 1

saud00/Line_and_Word_Segmentation_URDU

Line and ligature segmentation with corpus

Language: MATLAB - Size: 48.6 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

DhavalTaunk08/Text-Segmentation-in-Images

This project aimed to perform text segmentation in images using AutoEncoders.

Language: Jupyter Notebook - Size: 2.12 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 0

ReemHal/Semantic-Text-Segmentation-with-Embeddings

Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document into any number of k segments.

Language: Jupyter Notebook - Size: 39.1 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 11

evamaxfield/cue-queue 📦

Transcript segmentation using the average semantic encodings of cue sentences.

Language: Python - Size: 2.88 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

Ronny12301/Separador-de-Palabras

The program reads the words from a plain-text file and splits them into separate files, each named after the words' initial letter. This is useful if you have a very large word dictionary and want to split it into smaller files. I used this code to help create the files for my Wordament Solver.

Language: Java - Size: 1.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

iynehz/perl5-jieba

Perl wrapper for CppJieba (Chinese text segmentation)

Language: SWIG - Size: 38.1 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ravichoudhary98/CRFs_sanskrit_word_segmenation

Word segmentation for Sanskrit text using CRFs.

Language: Jupyter Notebook - Size: 3.85 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

hyunbool/Text-Segmentation

A summary of papers related to text segmentation

Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 0

kushalchauhan98/ticket-segmentation

Data for the ACL 2020 paper "Improving Segmentation for Technical Support Problems"

Size: 1.22 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 2

ravichoudhary98/CRFs_Bengali_word_segmenation

Word segmentation for Bengali text using CRFs.

Language: Jupyter Notebook - Size: 7.61 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

christophsk/segment-string

Demonstration of dynamic programming for segmenting strings into words. Just for fun!

Language: Python - Size: 1 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

sakmanal/ImgAnalysisToolkit

Image Analysis Toolkit for text document Binarization & Segmentation written in TypeScript.

Language: TypeScript - Size: 12 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

AlinaOs/Medieval-Charter-Classification

Program that detects and classifies the segments of medieval royal charters according to their diplomatic formulae.

Language: Java - Size: 259 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

secsilm/text-segmentation-trap

Some sentences that word segmentation tools tend to segment incorrectly.

Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Ekan5h/TextSegmentation

Using Otsu's thresholding for text segmentation on images of sticky notes.

Language: HTML - Size: 916 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1
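
Otsu's thresholding, mentioned above, picks the single global threshold that maximizes the between-class variance of the grayscale histogram. The following dependency-free Python sketch assumes an 8-bit grayscale image flattened into a list of ints in 0..255; `otsu_threshold` is a hypothetical name, and this is not code from the repository.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximizing between-class variance; pixels <= t form the background class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0.0     # sum of intensities at or below t
    weight_bg = 0    # number of pixels at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break     # every pixel is background; later t cannot split the image
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance up to a constant factor of 1 / total**2,
        # which does not change which t maximizes it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var = between
            best_t = t
    return best_t


def binarize(pixels):
    """Map pixels above the Otsu threshold to 1 (foreground) and the rest to 0."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]
```

For a synthetic image with fifty pixels at intensity 10 and fifty at 200, `otsu_threshold` returns 10, so `binarize` marks exactly the bright pixels as foreground. Whether text is the bright or the dark class depends on the image (dark ink on a light sticky note would need the comparison inverted).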

kariminf/langpi

Language processing interface: some tools to process different natural languages

Language: Java - Size: 23.3 MB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

jessestuart/JSTextTiling

OSS Text Segmentation library.

Language: Groovy - Size: 67.4 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mremad/SpokenInputTopicDetection

Language: Python - Size: 46.8 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1

Related Keywords
text-segmentation (53), natural-language-processing (12), word-segmentation (11), nlp (11), segmentation (8), machine-learning (8), deep-learning (7), spelling-correction (7), spell-check (6), spellcheck (6), symspell (6), python (6), unicode (5), computer-vision (5), chinese-text-segmentation (4), spelling (4), image-processing (4), ocr (4), text-detection (3), nlp-machine-learning (3), text-classification (3), spelling-corrector (3), spell-corrector (3), text-processing (3), topic-segmentation (3), chinese-word-segmentation (3), fuzzy-matching (3), fuzzy-search (3), edit-distance (2), damerau-levenshtein (2), approximate-string-matching (2), levenshtein (2), levenshtein-distance (2), tokenization (2), crf (2), punctuation (2), spell-checker (2), spellchecker (2), chinese (2), neural-network (2), pytorch (2), manga (2), viterbi-algorithm (2), sentence-boundary-detection (2), dynamic-programming (2), fully-convolutional-networks (1), printed-handwritten-text (1), spelling-checker (1), graphseq (1), matlab (1), autoencoders (1), paraphrase (1), grammatical-correction (1), ipython-notebook (1), contextual-analysis (1), python3 (1), embeddings (1), artifical-intelligence (1), semantic-segmentation (1), ai21 (1), pyside6 (1), sequence-segmentation (1), conversational-ai (1), wikipedia (1), choi (1), dataset (1), word (1), viterbi (1), unigram-model (1), unigram (1), machine-intelligence (1), dp (1), bigram-model (1), recipes (1), bigram (1), guide (1), mecab (1), text-improvement (1), conditional-random-fields (1), ostu-threshold (1), sauvola-threshold (1), typescript (1), web-workers (1), diplomatic (1), diplomatics (1), history (1), medieval-charters (1), chinese-nlp (1), text-analysis (1), otsu (1), otsu-thresholding (1), thresholding (1), preprocessing (1), stemming (1), word-tokenizing (1), wordnet (1), groovy (1), bilstm (1), deep-neural-networks (1), neural-networks (1)