An open API service providing repository metadata for many open source software ecosystems.

Topic: "sentence-segmentation"

undertheseanlp/underthesea

Underthesea - Vietnamese NLP Toolkit

Language: Python - Size: 166 MB - Last synced at: about 10 hours ago - Pushed at: about 1 month ago - Stars: 1,517 - Forks: 281

natasha/natasha

Solves basic Russian NLP tasks, API for lower level Natasha projects

Language: Python - Size: 35.7 MB - Last synced at: 16 days ago - Pushed at: 6 months ago - Stars: 1,242 - Forks: 109

segment-any-text/wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Language: Python - Size: 83 MB - Last synced at: 6 days ago - Pushed at: 25 days ago - Stars: 987 - Forks: 56

nlp-uoregon/trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

Language: Python - Size: 1.06 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 749 - Forks: 103

vncorenlp/VnCoreNLP

A Vietnamese natural language processing toolkit (NAACL 2018)

Language: Java - Size: 232 MB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 570 - Forks: 141

bitextor/bitextor

Bitextor generates translation memories from multilingual websites

Language: Python - Size: 177 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 290 - Forks: 43

natasha/razdel

Rule-based token and sentence segmentation for the Russian language

Language: Python - Size: 37.2 MB - Last synced at: 22 days ago - Pushed at: almost 2 years ago - Stars: 264 - Forks: 32

milaan9/Python_Natural_Language_Processing

This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.

Language: Jupyter Notebook - Size: 182 KB - Last synced at: 17 days ago - Pushed at: almost 3 years ago - Stars: 198 - Forks: 174

ckiplab/ckipnlp

CKIP CoreNLP Toolkits

Language: Python - Size: 573 KB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 119 - Forks: 15

PKU-TANGENT/NeuralEDUSeg

A toolkit for discourse segmentation (EDU segmentation).

Language: Python - Size: 130 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 87 - Forks: 34

wikimedia/sentencex

A sentence segmentation library with wide language support optimized for speed and utility.

Language: Python - Size: 132 KB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 61 - Forks: 6

neelkamath/spacy-server 📦

🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec

Language: Python - Size: 87.9 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 58 - Forks: 13

UglyToad/PragmaticSegmenterNet

Port of PragmaticSegmenter for sentence boundary detection

Language: C# - Size: 209 KB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 35 - Forks: 12

sentencizer/sentencizer

A sentence splitting (sentence boundary disambiguation) library for Go. It is rule-based and works out-of-the-box.

Language: Go - Size: 1.83 MB - Last synced at: 5 days ago - Pushed at: 21 days ago - Stars: 31 - Forks: 6

zaemyung/sentsplit

A flexible sentence segmentation library using CRF model and regex rules

Language: Python - Size: 2.48 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 29 - Forks: 7

hellonlp/hellonlp

NLP tools: word segmentation, sentence segmentation, and new-word discovery

Language: Python - Size: 43.9 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 8

mtreviso/deepbond

Deep neural approach to Boundary and Disfluency Detection - Based on my Master's work

Language: Python - Size: 734 KB - Last synced at: 21 days ago - Pushed at: 9 months ago - Stars: 19 - Forks: 2

bureaucratic-labs/models 📦

Pre-trained models for tokenization, sentence segmentation and so on

Language: Python - Size: 8.02 MB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 15 - Forks: 5

superlinear-ai/wtpsplit-lite

✂️ Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models

Language: Python - Size: 128 KB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 0

tc64/spacyss

Sentence Segmentation for spaCy

Language: Python - Size: 12.7 KB - Last synced at: 13 days ago - Pushed at: over 6 years ago - Stars: 9 - Forks: 1

KMiNT21/html2sent

HTML2SENT modifies HTML to improve sentence tokenizer quality

Language: Python - Size: 44.9 KB - Last synced at: 25 days ago - Pushed at: almost 6 years ago - Stars: 8 - Forks: 2

StarlangSoftware/Corpus

Corpus processing library

Language: Java - Size: 3.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 4

mkartawijaya/hasami

A tool to perform sentence segmentation on Japanese text

Language: Python - Size: 19.5 KB - Last synced at: 20 days ago - Pushed at: about 4 years ago - Stars: 6 - Forks: 0

undertheseanlp/sent_tokenize

Vietnamese Sentence Boundary Detection

Language: Python - Size: 1.62 MB - Last synced at: 22 days ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 5

StarlangSoftware/Corpus-CPP

Corpus processing library

Language: C++ - Size: 16.4 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 4 - Forks: 1

seanghay/khmerpunctuate

Punctuation Restoration for Khmer language

Language: Python - Size: 2.69 MB - Last synced at: 19 days ago - Pushed at: 9 months ago - Stars: 4 - Forks: 1

amorgun/russian-nlp-pretrained-models (fork of bureaucratic-labs/models)

Pre-trained models for tokenization, sentence segmentation and so on

Language: Python - Size: 8.02 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 2

StarlangSoftware/Corpus-Py

Corpus processing library

Language: Python - Size: 2.17 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 3 - Forks: 8

mbanon/benchmarks

Several benchmarks on sentence splitting and language identification

Language: Mathematica - Size: 35.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

Michael95-m/mya-sent-break

Rule-based sentence segmentation for the Burmese language

Language: Python - Size: 42 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

StarlangSoftware/Corpus-Js

Corpus Processing Library

Language: TypeScript - Size: 2.14 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

wikimedia/sentencex-go

A sentence segmentation library with wide language support optimized for speed and utility.

Language: Go - Size: 152 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

forestluo/NLDBApplication

NLP applications for NLDB v2 and v3.

Language: C# - Size: 3.34 MB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

eaklykova/syntaxcomp

A Python3 package for extracting syntactic complexity measures from CoNLL-U annotations.

Language: Python - Size: 29.3 KB - Last synced at: 11 days ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

behitek/vncorenlp-wrapper 📦

A Python wrapper for VnCoreNLP

Language: Python - Size: 135 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

maggieezzat/Covid19-Semantic-based-Search

Semantic-based search using word embedding to help the medical community develop answers to high priority scientific questions using Kaggle's CORD-19 dataset. This repository is part of Kaggle's CORD-19 challenge: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Language: Jupyter Notebook - Size: 26.7 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

MeryllEssig/sentence-extractor

Extracts sentences from txt files.

Language: JavaScript - Size: 2.93 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

StarlangSoftware/Corpus-Cy

Corpus Processing Library

Language: Cython - Size: 2.23 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 0

pngo1997/Text-Processing-Tokenization

Simple text analysis and tokenization.

Language: Jupyter Notebook - Size: 185 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

nature-of-eu-rules/data-preprocessing

Document preprocessing scripts for the Nature of EU Rules project

Language: Python - Size: 123 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

luxiant/sentence_segmentation

A rule-based sentence segmenter, inspired by the Ruby pragmatic_segmenter by diasks2 (repo: https://github.com/diasks2/pragmatic_segmenter)

Language: Rust - Size: 35.6 MB - Last synced at: 27 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

echogarden-project/text-segmentation

A library for multilingual word, phrase and sentence segmentation.

Language: TypeScript - Size: 22.5 KB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

echogarden-project/icu-segmentation-wasm

WebAssembly port of the ICU library's character, word, line-break, and sentence segmentation methods.

Language: C - Size: 27.1 MB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

StarlangSoftware/Corpus-Swift

Corpus processing library

Language: Swift - Size: 2.11 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 1

minseok0809/korean-sentence-segementation

AIHub Korean data preprocessing: Korean sentence segmentation

Language: Jupyter Notebook - Size: 2.61 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ZJaume/splitters

A CLI for Rust SRX sentence segmentation rules as a Python package.

Language: Rust - Size: 68.4 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

vladmycode/NLPSplitSentencesDeepLAPI

Reverse engineering technique to access DeepL's advanced natural language processing features.

Language: Python - Size: 3.91 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

StarlangSoftware/Corpus-CS

Corpus processing library

Language: C# - Size: 2.14 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

kwankoravich/NLP_tagging

Deploying CRF model to predict NER and Sentence Segmentation Tagging in Thai corpus via Heroku and Streamlit

Language: Python - Size: 3.79 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

tainvecs/sentence-splitter 📦

A simple sentence splitter based on Regex

Language: Clojure - Size: 12.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 2

NgoJunHaoJason/CZ4045 📦

Natural Language Processing assignments

Language: Jupyter Notebook - Size: 291 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 6

MBadriNarayanan/NaturalLanguageProcessing 📦

Course offered by Udemy. Created and taught by Ankit Mistry, Vijay Gadhave, Data Science & Machine Learning Academy.

Language: Jupyter Notebook - Size: 6.56 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1

chozelinek/wottw

Wrapper of TreeTaggerWrapper

Language: Python - Size: 57.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

YuriyBereguliak/QAGenerator

A simple project that builds custom training and model data for the Apache OpenNLP library. The main task is recognizing Ukrainian text and building helpful questions and theses.

Language: Kotlin - Size: 1.85 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Related Topics
nlp (25), python (14), natural-language-processing (13), sentence-boundary-detection (13), sentence-tokenizer (11), tokenization (11), corpus-processing (8), turkish-sentence-tokenizer (7), turkish-sentence-segmentation (7), named-entity-recognition (7), ner (6), word-segmentation (5), spacy (4), nltk (4), tokenizer (4), sentence-splitting (4), pos-tagging (4), vietnamese-nlp (4), machine-learning (4), dependency-parsing (3), lemmatization (3), sentence-segmenter (3), vietnamese (3), preprocessing (3), vietnamese-tokenizer (3), word-segmenter (2), pretrained-models (2), parsing (2), word-boundary (2), syntax (2), russian (2), stemming (2), vocabulary-matching (2), word-embeddings (2), sentence (2), sentence-splitter (2), xlm-roberta (2), part-of-speech-tagging (2), multilingual (2), text-preprocessing (2), conditional-random-fields (2), russian-specific (2), nlp-machine-learning (2), pos (2), lstm (2), nlp-library (2), pytorch (2), phrase-segmentation (1), syntactic-complexity (1), universal-dependencies (1), conllu (1), ai (1), golang (1), llm (1), rag (1), retrieval-augmented-generation (1), phrase-boundary (1), text-splitter (1), text-splitting (1), european-union (1), html (1), law (1), legislation (1), lexnlp (1), pdf (1), english (1), spanish (1), treetagger (1), treetaggerwrapper (1), xml (1), udpipe (1), text-analysis (1), text-complexity (1), entropy (1), new-word-discovery (1), benchmark (1), language-identification (1), text-mining (1), adapters (1), artificial-intelligence (1), deeplearning (1), language-model (1), morphological-tagging (1), brazilian-portuguese (1), termfrequency (1), tf-idf (1), punctuation-restoration (1), tutor-milaan9 (1), khmer-punct (1), khmer-language (1), wtpsplit (1), khmer (1), embeddings (1), morphology (1), deep-learning (1), visualization (1), character-segmentation (1), reverse-engineering (1), api (1), api-rest (1)