GitHub topics: text-segmentation
catalyst-team/catalyst
Accelerated deep learning R&D
Language: Python - Size: 52.6 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 3,355 - Forks: 395

sedflix/awesome-topic-segmentation
(yet another not really) awesome topic/text segmentation list
Size: 15.6 KB - Last synced at: 12 days ago - Pushed at: over 6 years ago - Stars: 109 - Forks: 13

ogkalu2/comic-translate
Desktop app for automatically translating comics (BDs, manga, manhwa, fumetti and more) in a variety of formats (image, PDF, EPUB, CBR, CBZ, etc.) and in multiple languages.
Language: Python - Size: 16.8 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 1,766 - Forks: 159

mammothb/symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Language: Python - Size: 5.76 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 827 - Forks: 124

blmoistawinde/HarvestText
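Text mining and preprocessing toolkit (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods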
Language: Python - Size: 4.27 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 2,515 - Forks: 336

rlayers/pawpaw
Text Processing & Segmentation Framework
Language: Python - Size: 2.52 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

wolfgarbe/SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Language: C# - Size: 12 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3,242 - Forks: 303
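The Symmetric Delete idea named in the SymSpell descriptions works like this. Offline, every dictionary word is expanded into all strings reachable by deleting up to `max_dist` characters. At query time, the query gets the same treatment, and any dictionary word that shares a delete variant with it becomes a candidate. Because only deletions are generated, there is no need to enumerate inserts, substitutions, or alphabet-sized transpositions. Below is a minimal sketch of the principle, not the SymSpell or symspellpy API. The functions `deletes`, `levenshtein`, `build_index` and `lookup` and the toy dictionary are hypothetical, and candidates are verified here with plain Levenshtein distance as a simplification.

```python
from itertools import combinations

def deletes(word, max_dist):
    """Every string obtainable by deleting up to max_dist characters from word."""
    out = {word}
    for d in range(1, min(max_dist, len(word)) + 1):
        for drop in combinations(range(len(word)), d):
            out.add("".join(c for i, c in enumerate(word) if i not in drop))
    return out

def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def build_index(words, max_dist):
    """Precompute: map each delete variant to the dictionary words producing it."""
    index = {}
    for w in words:
        for v in deletes(w, max_dist):
            index.setdefault(v, set()).add(w)
    return index

def lookup(query, index, max_dist):
    """Collect words sharing a delete variant with query, then verify distances."""
    candidates = set()
    for v in deletes(query, max_dist):
        candidates |= index.get(v, set())
    scored = ((levenshtein(query, c), c) for c in candidates)
    return sorted(s for s in scored if s[0] <= max_dist)

index = build_index(["hello", "help", "world"], max_dist=1)
print(lookup("helo", index, max_dist=1))  # [(1, 'hello'), (1, 'help')]
```

The expensive work (generating deletes for the whole dictionary) happens once. Each lookup then touches only the small set of delete variants of the query.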

cbaziotis/ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).
Language: Python - Size: 659 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 671 - Forks: 92

viig99/SymSpellCppPy
Fast SymSpell implementation written in C++ and exposed to Python via pybind11
Language: C++ - Size: 8.31 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 43 - Forks: 8

notAI-tech/deepsegment 📦
A sentence segmenter that actually works!
Language: Python - Size: 81.1 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 306 - Forks: 55

ZumingHuang/awesome-ocr-resources
A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).
Size: 10.5 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 418 - Forks: 72

ReubenBond/HanBaoBao
Mandarin Chinese text segmentation and mobile dictionary Android app (Chinese word segmentation)
Language: Java - Size: 90 MB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 30 - Forks: 5

eskriett/spell
Spelling correction and string segmentation written in Go
Language: Go - Size: 50.8 KB - Last synced at: 5 days ago - Pushed at: 10 months ago - Stars: 27 - Forks: 5

google/emoji-segmenter
Emoji Segmenter
Language: C - Size: 35.2 KB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 64 - Forks: 15

DCY1117/MangaQuick
Automatic Manga Translator
Language: Jupyter Notebook - Size: 20 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 35 - Forks: 7

zamgi/lingvo--TextSegmenter
Text segmentation into separate words using a simple unigram model and the Viterbi algorithm
Language: C# - Size: 38 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 2
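Word segmentation with a unigram model and the Viterbi algorithm, as described above, reduces to dynamic programming. Each word costs its negative log-probability. The best segmentation of the first `i` characters is the cheapest best segmentation of some prefix `text[:j]` plus the word `text[j:i]`. The following is a minimal sketch, not code from that repository. The toy `counts` table, the `max_word_len` cutoff and the penalty for unknown words (growing with word length) are all assumptions chosen for illustration.

```python
import math

def segment(text, counts, max_word_len=20):
    """Split text into the most probable word sequence under a unigram model."""
    total = sum(counts.values())

    def cost(word):
        # Negative log-probability of a known word.
        if word in counts:
            return -math.log(counts[word] / total)
        # Assumption: unknown words get a penalty that grows with their length.
        return math.log(total) + (len(word) - 1) * math.log(10)

    n = len(text)
    # best[i] = (cost of best segmentation of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            c = best[j][0] + cost(text[j:i])
            if c < best[i][0]:
                best[i] = (c, j)

    # Backtrack from the end of the string to recover the word boundaries.
    words, i = [], n
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]

counts = {"the": 50, "cat": 10, "sat": 8, "on": 30, "mat": 5}  # hypothetical frequencies
print(segment("thecatsatonthemat", counts))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The same prefix-cost recurrence underlies the other dynamic-programming word segmenters in this list. They differ mainly in how word probabilities and unknown words are scored.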

sigpwned/uax29
Java implementation of the UAX #29 text segmentation algorithm
Language: Java - Size: 359 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

nitely/nim-segmentation
Unicode text segmentation (TR29)
Language: Nim - Size: 40 KB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 10 - Forks: 1

Yannael/automatic-video-chaptering
Automate video chaptering with LLMs and TF-IDF: Transform raw transcripts into well-structured documents
Language: Jupyter Notebook - Size: 2.48 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

npillmayer/uax
Unicode Text Segmentation Algorithms
Language: Go - Size: 1.81 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 2

sobir-git/tajik-text-segmentation
Tajik text segmentation algorithms
Language: Python - Size: 53.7 KB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

Feoramund/ucg
UTF-8 grapheme counting library written in C99.
Language: C - Size: 79.1 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Chayan-halder/WBSUBNdb_text---Bangla-handwritten-text-document-dataset
"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1383 offline handwritten text documents contributed by 190 writers. The dataset is composed of both simple and compound characters.
Size: 1.49 GB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

muhammad-usman-108/ai21-sdk
An npm package specializing in natural language processing (NLP), for building AI systems that can understand and generate natural language.
Language: TypeScript - Size: 29.3 KB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

danburzo/ltr
Split text into chars, words, or sentences from the command line.
Language: JavaScript - Size: 15.6 KB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

arxiver/Onepiecelang
Text segmentation solution using natural language processing.
Language: Jupyter Notebook - Size: 1010 KB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

koomri/text-segmentation
Implementation of the paper "Text Segmentation as a Supervised Learning Task"
Language: Python - Size: 4.79 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 227 - Forks: 55

shayneobrien/text-segmentation
Neural and nonneural text segmentation methods.
Language: Jupyter Notebook - Size: 256 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 8 - Forks: 1

rafaelhferreira/grounded_task_segmentation_cta
Repo for the paper "Grounded Complex Task Segmentation for Conversational Assistants" presented at SIGDIAL 2023
Language: Python - Size: 6.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

wolfgarbe/WordSegmentationDP
Word Segmentation with Dynamic Programming
Language: C# - Size: 1.21 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 20 - Forks: 5

MosesBomera/Feedback-Prize-Evaluating-Student-Writing
Analyzing argumentative writing elements in texts by students in grades 6-12.
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

QuantumWizard888/How-to-add-user-dictionary-to-MeCab
How to add user dictionary to MeCab
Size: 42 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1

Jumpst3r/printed-hw-segmentation
Printed and handwritten text segmentation using fully convolutional networks and CRF post-processing
Language: Python - Size: 181 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 6

wolfgarbe/WordSegmentationTM
Fast Word Segmentation with Triangular Matrix
Language: C# - Size: 1.22 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 67 - Forks: 12

Dobatymo/graphseg-python
Language: Python - Size: 5.86 KB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 1

saud00/Line_and_Word_Segmentation_URDU
Line and ligature segmentation with corpus
Language: MATLAB - Size: 48.6 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 0

DhavalTaunk08/Text-Segmentation-in-Images
This project aimed to perform text segmentation in images using autoencoders.
Language: Jupyter Notebook - Size: 2.12 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 4 - Forks: 0

ReemHal/Semantic-Text-Segmentation-with-Embeddings
Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document into any number of k segments.
Language: Jupyter Notebook - Size: 39.1 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 26 - Forks: 11
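Greedy embedding-based segmentation into k segments, as described above, can be sketched like this. Each sentence is represented by a vector, and a segmentation is scored by summing, over its segments, the norm of the segment's summed vector. Runs of sentences that point in the same direction add up to long vectors and score well. Split points are then added one at a time, each time choosing the one that most improves the score. This is a hedged sketch under stated assumptions, not the repository's code. The 2-D toy vectors stand in for real sentence embeddings (for example, averaged GloVe word vectors). The norm-of-sum objective and the function names `score` and `greedy_segment` are assumptions, and the repository's objective may differ.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def score(vectors, bounds):
    """Sum over segments of the norm of each segment's summed vector.
    Segments whose vectors point the same way (coherent topics) score higher."""
    edges = [0] + sorted(bounds) + [len(vectors)]
    total = 0.0
    for start, end in zip(edges, edges[1:]):
        summed = [sum(col) for col in zip(*vectors[start:end])]
        total += norm(summed)
    return total

def greedy_segment(vectors, k):
    """Greedily add the split point that most improves the score, k - 1 times."""
    bounds = []
    for _ in range(k - 1):
        candidates = [i for i in range(1, len(vectors)) if i not in bounds]
        if not candidates:
            break
        best = max(candidates, key=lambda i: score(vectors, bounds + [i]))
        bounds.append(best)
    return sorted(bounds)

# Hypothetical 2-D "sentence embeddings": three runs about different topics.
sentences = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0]]
print(greedy_segment(sentences, k=3))  # [2, 4]
```

By the triangle inequality, adding a split never lowers this score. The greedy loop therefore always uses its full budget of k - 1 splits, unless there are too few sentences to split.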

evamaxfield/cue-queue 📦
Transcript segmentation using the average semantic encodings of cue sentences.
Language: Python - Size: 2.88 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

Ronny12301/Separador-de-Palabras
The program reads the words from a plain-text file and splits them into separate files, each named after the words' initial letter. This is useful if you have a very large word dictionary and want to split it into smaller files. I used this code to help create the files for my Wordament Solver.
Language: Java - Size: 1.3 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

iynehz/perl5-jieba
Perl wrapper for CppJieba (Chinese text segmentation)
Language: SWIG - Size: 38.1 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ravichoudhary98/CRFs_sanskrit_word_segmenation
Word segmentation for Sanskrit text using CRFs.
Language: Jupyter Notebook - Size: 3.85 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

hyunbool/Text-Segmentation
A summary of papers related to text segmentation
Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 11 - Forks: 0

kushalchauhan98/ticket-segmentation
Data for the ACL 2020 paper - Improving Segmentation for Technical Support Problems
Size: 1.22 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 2

ravichoudhary98/CRFs_Bengali_word_segmenation
Word segmentation for Bengali text using CRFs.
Language: Jupyter Notebook - Size: 7.61 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

christophsk/segment-string
Demonstration of dynamic programming for segmenting strings into words. Just for fun!
Language: Python - Size: 1 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

sakmanal/ImgAnalysisToolkit
Image Analysis Toolkit for text document Binarization & Segmentation written in TypeScript.
Language: TypeScript - Size: 12 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

AlinaOs/Medieval-Charter-Classification
Program that detects and classifies the segments of medieval royal charters according to their diplomatic formulae.
Language: Java - Size: 259 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

secsilm/text-segmentation-trap
Sentences that word segmentation tools tend to segment incorrectly.
Language: Jupyter Notebook - Size: 49.8 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

Ekan5h/TextSegmentation
Using Otsu's thresholding for text segmentation on images of sticky notes.
Language: HTML - Size: 916 KB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1
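Otsu's thresholding, used above to separate text from the sticky-note background, picks the gray level that best splits the intensity histogram into two classes. Specifically, it chooses the level that maximizes the between-class variance w_lo * w_hi * (mean_lo - mean_hi)^2. The following is a minimal pure-Python sketch for 8-bit grayscale pixels, not that repository's code. The sample pixel values are hypothetical. Treating pixels at or below the threshold as ink assumes dark text on a light background.

```python
def otsu_threshold(pixels):
    """Pick the gray level t (0-255) maximizing between-class variance,
    where one class is pixels <= t and the other is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_lo, sum_lo = 0, 0
    best_t, best_score = 0, -1.0
    for t in range(256):
        weight_lo += hist[t]
        sum_lo += t * hist[t]
        weight_hi = total - weight_lo
        if weight_lo == 0:
            continue
        if weight_hi == 0:
            break
        mean_lo = sum_lo / weight_lo
        mean_hi = (sum_all - sum_lo) / weight_hi
        # Proportional to between-class variance (the constant 1/total**2 is omitted).
        score = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical 8-bit grayscale pixels: dark ink strokes on a light note.
pixels = [20, 30, 40] * 10 + [180, 200, 220] * 10
t = otsu_threshold(pixels)
ink_mask = [p <= t for p in pixels]  # assumption: dark pixels are text
print(t, sum(ink_mask))  # 40 30
```

Because the histogram has only 256 bins, a single pass with running sums is enough. The cost is linear in the number of pixels plus a constant 256-step scan.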

kariminf/langpi
Language processing interface: some tools to process different natural languages
Language: Java - Size: 23.3 MB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

jessestuart/JSTextTiling
OSS Text Segmentation library.
Language: Groovy - Size: 67.4 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

mremad/SpokenInputTopicDetection
Language: Python - Size: 46.8 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 1
