GitHub topics: sentence-splitting
mediacloud/sentence-splitter
Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder.
Language: Python - Size: 45.9 KB - Last synced at: about 8 hours ago - Pushed at: almost 3 years ago - Stars: 253 - Forks: 33

vngrs-ai/vnlp
State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.
Language: Python - Size: 392 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 267 - Forks: 17

sentencizer/sentencizer
A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.
Language: Go - Size: 1.85 MB - Last synced at: 7 days ago - Pushed at: 12 days ago - Stars: 42 - Forks: 8

adobe/NLP-Cube
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Language: HTML - Size: 11.1 MB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 560 - Forks: 95

erre-quadro/spikex
SpikeX - SpaCy Pipes for Knowledge Extraction
Language: Python - Size: 3.43 MB - Last synced at: 7 days ago - Pushed at: about 4 years ago - Stars: 399 - Forks: 28

Prismadic/magnet
the small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly
Language: Python - Size: 11.8 MB - Last synced at: 15 days ago - Pushed at: 11 months ago - Stars: 31 - Forks: 3

zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
Language: Python - Size: 2.48 MB - Last synced at: about 18 hours ago - Pushed at: over 1 year ago - Stars: 29 - Forks: 7

jparkerweb/splitter-vs-splitter
🪓 simple app to pit two sentence splitters against one another to understand their differences
Language: JavaScript - Size: 271 KB - Last synced at: 5 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

KorAP/Datok
High-Performance Finite State Tokenizer
Language: Go - Size: 124 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 0

astariul/Sentencize.jl
Smallish library for sentence splitting in Julia
Language: Julia - Size: 256 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 4 - Forks: 3

ZJaume/splitters
A CLI for Rust SRX sentence segmentation rules, packaged as a Python package.
Language: Rust - Size: 68.4 KB - Last synced at: 5 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

mbanon/benchmarks
Several benchmarks on sentence splitting and language identification
Language: Mathematica - Size: 35.7 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

kimryan/Lingua-EN-Sentence
Split text into sentences (a Perl module)
Language: Perl - Size: 29.3 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 3

M4t1ss/chunker
A sentence chunker PHP class + visualizer for Berkeley Parser parse trees
Language: PHP - Size: 23.5 MB - Last synced at: over 2 years ago - Pushed at: about 8 years ago - Stars: 2 - Forks: 0
