Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: text-normalization
curegit/unicodecheck
Simple tool to check if Unicode text files are Unicode-normalized
Language: Python - Size: 44.9 KB - Last synced: 2 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
NVIDIA/NeMo-text-processing
NeMo text processing for ASR and TTS
Language: Python - Size: 33.5 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 211 - Forks: 67
snakers4/russian_stt_text_normalization 📦
Russian text normalization pipeline for speech-to-text and other applications based on tagging s2s networks
Language: Python - Size: 3.03 MB - Last synced: 6 days ago - Pushed: about 3 years ago - Stars: 115 - Forks: 14
jfilter/clean-text
🧹 Python package for text cleaning
Language: Python - Size: 157 KB - Last synced: 23 days ago - Pushed: 12 months ago - Stars: 915 - Forks: 77
Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.
Language: Jupyter Notebook - Size: 12.7 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0
tomaarsen/TTSTextNormalization
Convert English text from written expressions into spoken forms
Language: Python - Size: 12 MB - Last synced: 1 day ago - Pushed: almost 2 years ago - Stars: 17 - Forks: 3
sugatagh/E-commerce-Text-Classification
Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.
Language: Jupyter Notebook - Size: 10.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 6 - Forks: 3
ikegami-yukino/neologdn
Japanese text normalizer for mecab-neologd
Language: Cython - Size: 450 KB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 263 - Forks: 19
kscanne/caighdean
The translation engine behind the Irish standardizer (Caighdeánaitheoir na Gaeilge), plus Gàidhlig/Gaelg→Gaeilge (Scottish Gaelic/Manx to Irish) translators
Language: Perl - Size: 49.2 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 17 - Forks: 4
weezymatt/text-scrapbook
Welcome to my text scrapbook! Here you will find examples of text tokenization, normalization, n-grams, and lots of text-adjacent stuff.
Language: Jupyter Notebook - Size: 4.79 MB - Last synced: 3 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
esentis/string_extensions
Useful String extensions to save you time in production.
Language: Dart - Size: 627 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 5 - Forks: 2
greenlikeorange/knayi-myscript
Myanmar Language Script Library
Language: JavaScript - Size: 1.99 MB - Last synced: 12 days ago - Pushed: about 1 year ago - Stars: 73 - Forks: 19
khanhtran2000/FPT.AI_2020
My work during internship at FPT.AI 2020
Language: Jupyter Notebook - Size: 778 KB - Last synced: 6 months ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0
speechio/chinese_text_normalization
Chinese text normalization for speech processing
Language: Python - Size: 918 KB - Last synced: 6 months ago - Pushed: about 1 year ago - Stars: 554 - Forks: 135
csebuetnlp/normalizer
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Language: Python - Size: 15.6 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 28 - Forks: 5
ajaytiwari0210/Normalization-of-Social-Media-Text
Size: 8.58 MB - Last synced: 9 months ago - Pushed: over 6 years ago - Stars: 3 - Forks: 1
rafalposwiata/text-normalization
Repository for text normalization research.
Size: 2.51 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 5 - Forks: 0
Isminoula/TextNormSeq2Seq
Code and model files for the paper: I. Lourentzou et al., "Adapting Sequence to Sequence models for Text Normalization in Social Media", ICWSM'19
Language: Python - Size: 40 KB - Last synced: 8 months ago - Pushed: almost 3 years ago - Stars: 35 - Forks: 16
chanmratekoko/MMStringNormalizer
Fork of ayehninnkhine/MMStringNormalizer
Language: Java - Size: 3.91 KB - Last synced: 10 months ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0
cewarman/NTPU_online_text_normalization
An online text normalization tool for Chinese-English mixed text-to-speech system
Language: Python - Size: 81.1 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 5 - Forks: 2
ecomp-shONgit/text-normalisation
JS / Python 3 / PHP library for working with UTF-8 polytonic Greek and Latin
Language: JavaScript - Size: 330 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 10 - Forks: 1
Rumeysakeskin/Preprocessing-Turkish-Text-Data
Preprocessing Turkish text data by cleaning (punctuation, special, accented, and Unicode characters) and normalizing (numbers, abbreviations)
Language: Jupyter Notebook - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
pgolo/sic
Utility for string normalization
Language: Python - Size: 9.31 MB - Last synced: 6 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
mvakili/Tokenizer
Spelling corrector and text normalizer
Language: C# - Size: 15.5 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0
princ3od/VietnamNumber
A .NET (C#) library for converting numbers to Vietnamese.
Language: C# - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
Bonniface/Text-CLeaning-And-Classification
Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n
Language: Jupyter Notebook - Size: 8.34 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
kscanne/droichead
Links between Foclóir Uí Dhónaill and DIL
Language: HTML - Size: 1.13 MB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 0
Amir79Naziri/TextNormalization_Project
Implementing text normalization for the Farsi (Persian) language.
Language: Python - Size: 436 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
vietbt/ViTextnormASR
Our source code for the paper "Transformer-based Joint Learning Approach for Text Normalization in Vietnamese ASR"
Language: Python - Size: 5.46 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
cadia-lvl/althingi-asr
An ASR recipe and speech corpus of Icelandic parliamentary speeches
Language: Shell - Size: 14.3 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 2 - Forks: 0
amogh9594/Sentiment-Analysis
Sentiment-Analysis
Language: Jupyter Notebook - Size: 17.6 KB - Last synced: 11 months ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0
alanbracco/twnorm
Text Normalization on tweets (Tweet Normalization)
Language: Python - Size: 38.9 MB - Last synced: 8 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0
JasperHG90/Phonorm
Phonetic normalization using Recurrent Neural Networks
Language: Jupyter Notebook - Size: 222 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0