Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-normalization

curegit/unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized

Language: Python - Size: 44.9 KB - Last synced: 2 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0

NVIDIA/NeMo-text-processing

NeMo text processing for ASR and TTS

Language: Python - Size: 33.5 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 211 - Forks: 67

snakers4/russian_stt_text_normalization 📦

Russian text normalization pipeline for speech-to-text and other applications based on tagging s2s networks

Language: Python - Size: 3.03 MB - Last synced: 6 days ago - Pushed: about 3 years ago - Stars: 115 - Forks: 14

jfilter/clean-text

🧹 Python package for text cleaning

Language: Python - Size: 157 KB - Last synced: 23 days ago - Pushed: 12 months ago - Stars: 915 - Forks: 77

Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0

tomaarsen/TTSTextNormalization

Convert English text from written expressions into spoken forms

Language: Python - Size: 12 MB - Last synced: 1 day ago - Pushed: almost 2 years ago - Stars: 17 - Forks: 3

sugatagh/E-commerce-Text-Classification

Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.

Language: Jupyter Notebook - Size: 10.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 6 - Forks: 3

ikegami-yukino/neologdn

Japanese text normalizer for mecab-neologd

Language: Cython - Size: 450 KB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 263 - Forks: 19

kscanne/caighdean

The translation engine behind the Irish standardizer (Caighdeánaitheoir na Gaeilge), plus Scottish Gaelic/Manx→Irish translators

Language: Perl - Size: 49.2 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 17 - Forks: 4

weezymatt/text-scrapbook

Welcome to my text scrapbook! Here you will find examples of text tokenization, normalization, n-grams, and lots of text-adjacent stuff.

Language: Jupyter Notebook - Size: 4.79 MB - Last synced: 3 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

esentis/string_extensions

Useful String extensions to save you time in production.

Language: Dart - Size: 627 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 5 - Forks: 2

greenlikeorange/knayi-myscript

Myanmar Language Script Library

Language: JavaScript - Size: 1.99 MB - Last synced: 12 days ago - Pushed: about 1 year ago - Stars: 73 - Forks: 19

khanhtran2000/FPT.AI_2020

My work during internship at FPT.AI 2020

Language: Jupyter Notebook - Size: 778 KB - Last synced: 6 months ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0

speechio/chinese_text_normalization

Chinese text normalization for speech processing

Language: Python - Size: 918 KB - Last synced: 6 months ago - Pushed: about 1 year ago - Stars: 554 - Forks: 135

csebuetnlp/normalizer

This python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.

Language: Python - Size: 15.6 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 28 - Forks: 5

ajaytiwari0210/Normalization-of-Social-Media-Text

Size: 8.58 MB - Last synced: 9 months ago - Pushed: over 6 years ago - Stars: 3 - Forks: 1

rafalposwiata/text-normalization

Repository for text normalization research.

Size: 2.51 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 5 - Forks: 0

Isminoula/TextNormSeq2Seq

Code and model files for the paper: I. Lourentzou et al., "Adapting Sequence to Sequence models for Text Normalization in Social Media", ICWSM'19

Language: Python - Size: 40 KB - Last synced: 8 months ago - Pushed: almost 3 years ago - Stars: 35 - Forks: 16

chanmratekoko/MMStringNormalizer

Fork of ayehninnkhine/MMStringNormalizer

Language: Java - Size: 3.91 KB - Last synced: 10 months ago - Pushed: almost 6 years ago - Stars: 1 - Forks: 0

cewarman/NTPU_online_text_normalization

An online text normalization tool for Chinese-English mixed text-to-speech system

Language: Python - Size: 81.1 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 5 - Forks: 2

ecomp-shONgit/text-normalisation

JS / Python3 / PHP Lib to work with UTF8 polytonic greek and latin

Language: JavaScript - Size: 330 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 10 - Forks: 1

Rumeysakeskin/Preprocessing-Turkish-Text-Data

Preprocessing Turkish text data with cleaning (punctuations, special, accented and unicode characters) and normalizing (numbers, abbreviations)

Language: Jupyter Notebook - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

pgolo/sic

Utility for string normalization

Language: Python - Size: 9.31 MB - Last synced: 6 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

mvakili/Tokenizer

Spelling corrector and text normalizer

Language: C# - Size: 15.5 MB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

princ3od/VietnamNumber

Library that supports converting numbers to Vietnamese for .NET (C#)

Language: C# - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Bonniface/Text-CLeaning-And-Classification

Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n

Language: Jupyter Notebook - Size: 8.34 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

kscanne/droichead

Links between Foclóir Uí Dhónaill (Ó Dónaill's dictionary) and DIL

Language: HTML - Size: 1.13 MB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 0

Amir79Naziri/TextNormalization_Project

Implementing text normalization for Farsi(Persian) language.

Language: Python - Size: 436 KB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

vietbt/ViTextnormASR

Our source code for the paper "Transformer-based Joint Learning Approach for Text Normalization in Vietnamese ASR"

Language: Python - Size: 5.46 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

cadia-lvl/althingi-asr

An ASR recipe and speech corpus of Icelandic parliamentary speeches

Language: Shell - Size: 14.3 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 2 - Forks: 0

amogh9594/Sentiment-Analysis

Sentiment-Analysis

Language: Jupyter Notebook - Size: 17.6 KB - Last synced: 11 months ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

alanbracco/twnorm

Text Normalization on tweets (Tweet Normalization)

Language: Python - Size: 38.9 MB - Last synced: 8 months ago - Pushed: over 5 years ago - Stars: 0 - Forks: 0

JasperHG90/Phonorm

Phonetic normalization using Recurrent Neural Networks

Language: Jupyter Notebook - Size: 222 MB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

Related Keywords
text-normalization 33 nlp 7 natural-language-processing 7 text-cleaning 5 tokenization 3 text-to-speech 3 text-preprocessing 3 tokenizer 2 text-processing 2 sequence-to-sequence 2 nlp-machine-learning 2 unicode 2 irish 2 speech-recognition 2 tweets 2 string-normalization 2 deep-learning 2 kaldi-asr 2 chinese 2 text-classification 2 pytorch 2 myanmar 2 speech-to-text 2 gaeilge 2 java 1 seq2seq 1 encoder-decoder 1 burmese 1 polish-language 1 recurrent-neural-networks 1 numbers-to-text 1 spelling-correction 1 neural-network 1 bengali-text-normalization 1 bangla-text-normalization 1 thrax-gramma 1 sparrowhawk 1 asr 1 word-segmentation 1 word-embeddings 1 internship 1 collection 1 phonetic-algorithms 1 twitter 1 sentiment-classification 1 sentiment-analysis 1 icelandic 1 althingi 1 joint-learning 1 automatic-speech-recognition 1 sean-ghaeilge 1 old-irish 1 etymology 1 dictionary 1 vietnamese 1 number 1 nuget 1 csharp 1 rule-based-nlp 1 turkish-language 1 data-processing 1 romanization 1 polytonic-greek-and-latin 1 greek-trasliteration 1 greek-latin 1 traditional-chinese 1 rule-based 1 online 1 mandarin 1 english 1 myanmar-text-normalizer 1 bio-tagging 1 tf-idf 1 product-categorization 1 e-commerce 1 tts 1 spoken-forms 1 normalization 1 competition 1 text-tokenization 1 text-lemmatization 1 text-extraction 1 stopwords-removal 1 stemming 1 requests 1 lemmatization 1 extract-html 1 data-extraction 1 beautifulsoup 1 user-generated-content 1 scraping 1 python-package 1 python 1 torchscript 1 speech 1 russian-language 1 python3 1 inverse-text-n 1 character-encoding 1 artificial-intelligence 1