An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-normalization

jfilter/clean-text

🧹 Python package for text cleaning

Language: Python - Size: 157 KB - Last synced at: about 6 hours ago - Pushed at: about 2 years ago - Stars: 976 - Forks: 79

davedean/deslopify

A utility that cleans up text by removing or translating common 'slop' patterns from AI-generated text

Language: TypeScript - Size: 221 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

digitalcortex/newline_normalizer

Fast, precise normalization of Unix and DOS newline formats in Rust.

Language: Rust - Size: 254 KB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 0

Seavleu/khmer-utils

A 🇰🇭 utility library for number formatting, currency display, date localization, text normalization, and script transliteration, built for Cambodian developers.

Language: JavaScript - Size: 6.84 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

curegit/unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized

Language: Python - Size: 52.7 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

Agash/TTSTextNormalization

Modern .NET 9 / C# 13 library to normalize text (emojis, currency, numbers, abbreviations, chat slang) for consistent and natural Text-to-Speech (TTS) synthesis, ideal for stream chat/donations.

Language: C# - Size: 138 KB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

NVIDIA/NeMo-text-processing

NeMo text processing for ASR and TTS

Language: Python - Size: 25.4 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 324 - Forks: 104

tomaarsen/TTSTextNormalization

Convert English text from written expressions into spoken forms

Language: Python - Size: 12 MB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 25 - Forks: 3

ducnt18121997/Viet-Text-Normalization

A Python library for text normalization, specifically designed for Vietnamese and English text processing. This library provides comprehensive text normalization capabilities including handling of special characters, numbers, dates, and various text formats.

Language: Python - Size: 26.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

esentis/string_extensions

Useful String extensions to save you time in production.

Language: Dart - Size: 636 KB - Last synced at: 17 minutes ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

ikegami-yukino/neologdn

Japanese text normalizer for mecab-neologd

Language: Cython - Size: 593 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 278 - Forks: 20

karan89200/NLP_Tasks

This repository is dedicated to providing comprehensive resources and code snippets for text preprocessing and various NLP tasks. Whether you're a beginner or an experienced data scientist, you'll find useful tools and techniques here to enhance your natural language processing projects.

Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

spyros-briakos/AI-Research-Assessment-TextNormalization-SongSimilarity

AI-Research-Assessment-TextNormalization-SongSimilarity

Language: Jupyter Notebook - Size: 6.79 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sugatagh/E-commerce-Text-Classification

Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.

Language: Jupyter Notebook - Size: 10.9 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 4

kscanne/caighdean

Translation engine behind the Irish standardizer (Caighdeánaitheoir na Gaeilge), plus the Scottish Gaelic/Manx→Irish translators

Language: Perl - Size: 58.7 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 4

ZRktty/accent-folding (fork of aristus/accent-folding)

A JavaScript library for accent-insensitive text processing, including accent folding and search term highlighting

Language: JavaScript - Size: 1.04 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 1

cewarman/NTPU_online_text_normalization

An online text normalization tool for Chinese-English mixed text-to-speech system

Language: Python - Size: 83 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 2

Loc7/omnivore.schleifenbaum.ch

MT preprocessor

Language: CSS - Size: 591 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

vafaeim/ClipboardTranslator

Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.

Language: Python - Size: 125 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

neelpy/SMS-Text-Normalization-HMM-MEMM

Implementation of the paper on Text normalization by Choudhury et al.

Language: Python - Size: 332 KB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

CAMeL-Lab/codafication

Code, models, and data for "Exploiting Dialect Identification in Automatic Dialectal Text Normalization". ArabicNLP 2024, ACL.

Language: Python - Size: 3.33 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

areeba0/English-to-French-Translation-using-NLTK-and-Hugging-Face-Transformers-MarianMTModel

This repository provides a complete workflow for text processing using Hugging Face Transformers and NLTK. It includes modules for sentence normalization, spelling correction, word embedding generation, positional encoding computation, and English-to-French translation

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

snakers4/russian_stt_text_normalization 📦

Russian text normalization pipeline for speech-to-text and other applications based on tagging s2s networks

Language: Python - Size: 3.03 MB - Last synced at: 6 months ago - Pushed at: about 4 years ago - Stars: 116 - Forks: 15

greenlikeorange/knayi-myscript

Myanmar Language Script Library

Language: JavaScript - Size: 1.99 MB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 76 - Forks: 20

cadia-lvl/althingi-asr

An ASR recipe and speech corpus of Icelandic parliamentary speeches

Language: Shell - Size: 14.3 MB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

vn33/Intensity-Analysis-EmotionClassification

Predict emotions (happiness, anger, sadness) from WhatsApp chat data using machine learning and deep learning models. Includes text normalization, vectorization (TF-IDF, BoW, Word2Vec, GloVe), and model evaluation.

Language: Jupyter Notebook - Size: 3.57 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

vn33/Ecommerce-Product-Categorization

Accurate categorization of eCommerce products improves user experience and boosts search engine visibility. The project goal is to classify products into 14 predefined categories using their descriptions sourced from an eCommerce platform.

Language: Jupyter Notebook - Size: 7.95 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Aayshashukla/SentimentAnalysis

Twitter Sentiment Analysis using Natural Language Processing (NLP)

Language: Jupyter Notebook - Size: 9.39 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

weezymatt/text-scrapbook

Welcome to my text scrapbook! Here you will find examples of text tokenization, normalization, n-grams, and lots of text adjacent stuff.

Language: Jupyter Notebook - Size: 4.79 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

khanhtran2000/FPT.AI_2020

My work during my internship at FPT.AI in 2020

Language: Jupyter Notebook - Size: 778 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

speechio/chinese_text_normalization

Chinese text normalization for speech processing

Language: Python - Size: 918 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 554 - Forks: 135

csebuetnlp/normalizer

This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.

Language: Python - Size: 15.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 28 - Forks: 5

ajaytiwari0210/Normalization-of-Social-Media-Text

Size: 8.58 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 3 - Forks: 1

rafalposwiata/text-normalization

Repository for text normalization research.

Size: 2.51 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

Isminoula/TextNormSeq2Seq

Code and model files for the paper: I. Lourentzou et al., "Adapting Sequence to Sequence models for Text Normalization in Social Media", ICWSM'19

Language: Python - Size: 40 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 35 - Forks: 16

chanmratekoko/MMStringNormalizer (fork of ayehninnkhine/MMStringNormalizer)

Language: Java - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

ecomp-shONgit/text-normalisation

JS / Python3 / PHP library for working with UTF-8 polytonic Greek and Latin

Language: JavaScript - Size: 330 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

Rumeysakeskin/Preprocessing-Turkish-Text-Data

Preprocessing Turkish text data by cleaning (punctuation, special, accented, and Unicode characters) and normalizing (numbers, abbreviations)

Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

pgolo/sic

Utility for string normalization

Language: Python - Size: 9.31 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

mvakili/Tokenizer

Spelling corrector and text normalizer

Language: C# - Size: 15.5 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

princ3od/VietnamNumber

A .NET (C#) library for converting numbers to Vietnamese.

Language: C# - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Bonniface/Text-CLeaning-And-Classification

Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n

Language: Jupyter Notebook - Size: 8.34 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

kscanne/droichead

Links between Foclóir Uí Dhónaill (Ó Dónaill's dictionary) and DIL

Language: HTML - Size: 1.13 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

Amir79Naziri/TextNormalization_Project

Implementing text normalization for the Farsi (Persian) language.

Language: Python - Size: 436 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

vietbt/ViTextnormASR

Our source code for the paper "Transformer-based Joint Learning Approach for Text Normalization in Vietnamese ASR"

Language: Python - Size: 5.46 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

amogh9594/Sentiment-Analysis

Sentiment-Analysis

Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

alanbracco/twnorm

Text Normalization on tweets (Tweet Normalization)

Language: Python - Size: 38.9 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

JasperHG90/Phonorm

Phonetic normalization using Recurrent Neural Networks

Language: Jupyter Notebook - Size: 222 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

Related Keywords
text-normalization (49), nlp (14), natural-language-processing (10), text-to-speech (5), text-classification (5), text-cleaning (5), tokenization (4), deep-learning (4), python (4), text-processing (4), text-preprocessing (3), nlp-machine-learning (3), npm-package (2), chinese (2), unicode (2), tweets (2), tts (2), translation (2), irish (2), sequence-to-sequence (2), artificial-intelligence (2), gaeilge (2), word2vec (2), stemming (2), string-manipulation (2), string-normalization (2), tokenizer (2), lemmatization (2), speech-to-text (2), pytorch (2), myanmar (2), ai (2), word-embeddings (2), kaldi-asr (2), speech-recognition (2), extract-html (1), data-extraction (1), positional-encoding (1), beautifulsoup (1), python3 (1), twitter-data (1), requests (1), text-mining (1), russian-language (1), logistic-regression (1), speech (1), kaggle (1), number-formatting (1), bidirectional-lstm (1), countvectorizer (1), emotion-classification (1), glove-embeddings (1), hyperparameter-tuning (1), icelandic (1), althingi (1), machine-learning (1), tf-idf-vectorizer (1), zawgyi (1), fontdetect (1), fontconvert (1), word2vec-embeddinngs (1), burmese-nlp (1), ecommerce (1), streamlit-webapp (1), torchscript (1), myanmar-text-normalizer (1), greek-latin (1), greek-trasliteration (1), polytonic-greek-and-latin (1), romanization (1), data-processing (1), turkish-language (1), rule-based-nlp (1), csharp (1), nuget (1), number (1), vietnamese (1), dictionary (1), etymology (1), old-irish (1), sean-ghaeilge (1), automatic-speech-recognition (1), joint-learning (1), sentiment-analysis (1), sentiment-classification (1), twitter (1), phonetic-algorithms (1), recurrent-neural-networks (1), stopwords-removal (1), text-extraction (1), text-lemmatization (1), text-tokenization (1), n-grams (1), perl (1), bio-tagging (1), collection (1), internship (1), word-segmentation (1), asr (1), sparrowhawk (1)