GitHub topics: text-normalization
jfilter/clean-text
🧹 Python package for text cleaning
Language: Python - Size: 157 KB - Last synced at: about 6 hours ago - Pushed at: about 2 years ago - Stars: 976 - Forks: 79

davedean/deslopify
A utility that cleans up text by removing or translating common 'slop' patterns from AI-generated text
Language: TypeScript - Size: 221 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

digitalcortex/newline_normalizer
Fast, precise normalization of Unix and DOS newline formats in Rust.
Language: Rust - Size: 254 KB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 0

Seavleu/khmer-utils
A 🇰🇭 utility library for number formatting, currency display, date localization, text normalization, and script transliteration, built for Cambodian developers.
Language: JavaScript - Size: 6.84 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

curegit/unicodecheck
Simple tool to check if Unicode text files are Unicode-normalized
Language: Python - Size: 52.7 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

Agash/TTSTextNormalization
Modern .NET 9 / C# 13 library to normalize text (emojis, currency, numbers, abbreviations, chat slang) for consistent and natural Text-to-Speech (TTS) synthesis, ideal for stream chat/donations.
Language: C# - Size: 138 KB - Last synced at: 4 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

NVIDIA/NeMo-text-processing
NeMo text processing for ASR and TTS
Language: Python - Size: 25.4 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 324 - Forks: 104

tomaarsen/TTSTextNormalization
Convert English text from written expressions into spoken forms
Language: Python - Size: 12 MB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 25 - Forks: 3

ducnt18121997/Viet-Text-Normalization
A Python library for text normalization, specifically designed for Vietnamese and English text processing. This library provides comprehensive text normalization capabilities including handling of special characters, numbers, dates, and various text formats.
Language: Python - Size: 26.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

esentis/string_extensions
Useful String extensions to save you time in production.
Language: Dart - Size: 636 KB - Last synced at: 17 minutes ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

ikegami-yukino/neologdn
Japanese text normalizer for mecab-neologd
Language: Cython - Size: 593 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 278 - Forks: 20

karan89200/NLP_Tasks
This repository is dedicated to providing comprehensive resources and code snippets for text preprocessing and various NLP tasks. Whether you're a beginner or an experienced data scientist, you'll find useful tools and techniques here to enhance your natural language processing projects.
Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

spyros-briakos/AI-Research-Assessment-TextNormalization-SongSimilarity
AI-Research-Assessment-TextNormalization-SongSimilarity
Language: Jupyter Notebook - Size: 6.79 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sugatagh/E-commerce-Text-Classification
Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.
Language: Jupyter Notebook - Size: 10.9 MB - Last synced at: 29 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 4

kscanne/caighdean
The translation engine behind the Irish-language standardizer (Caighdeánaitheoir na Gaeilge), plus Scottish Gaelic/Manx→Irish translators
Language: Perl - Size: 58.7 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 18 - Forks: 4

ZRktty/accent-folding Fork of aristus/accent-folding
A JavaScript library for accent-insensitive text processing, including accent folding and search term highlighting
Language: JavaScript - Size: 1.04 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 1

cewarman/NTPU_online_text_normalization
An online text normalization tool for Chinese-English mixed text-to-speech system
Language: Python - Size: 83 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 2

Loc7/omnivore.schleifenbaum.ch
MT preprocessor
Language: CSS - Size: 591 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

vafaeim/ClipboardTranslator
Clipboard Translator is a lightweight desktop application built with PyQt5 that automatically translates text copied to the clipboard into Persian using the Google Translate API. The application features a modern and minimalistic UI, custom styling, and real-time text normalization and tokenization.
Language: Python - Size: 125 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

neelpy/SMS-Text-Normalization-HMM-MEMM
Implementation of the paper on text normalization by Choudhury et al.
Language: Python - Size: 332 KB - Last synced at: 10 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

CAMeL-Lab/codafication
Code, models, and data for "Exploiting Dialect Identification in Automatic Dialectal Text Normalization". ArabicNLP 2024, ACL.
Language: Python - Size: 3.33 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

areeba0/English-to-French-Translation-using-NLTK-and-Hugging-Face-Transformers-MarianMTModel
This repository provides a complete workflow for text processing using Hugging Face Transformers and NLTK. It includes modules for sentence normalization, spelling correction, word embedding generation, positional encoding computation, and English-to-French translation
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

snakers4/russian_stt_text_normalization 📦
Russian text normalization pipeline for speech-to-text and other applications based on tagging s2s networks
Language: Python - Size: 3.03 MB - Last synced at: 6 months ago - Pushed at: about 4 years ago - Stars: 116 - Forks: 15

greenlikeorange/knayi-myscript
Myanmar Language Script Library
Language: JavaScript - Size: 1.99 MB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 76 - Forks: 20

cadia-lvl/althingi-asr
An ASR recipe and speech corpus of Icelandic parliamentary speeches
Language: Shell - Size: 14.3 MB - Last synced at: 12 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

vn33/Intensity-Analysis-EmotionClassification
Predict emotions (happiness, anger, sadness) from WhatsApp chat data using machine learning and deep learning models. Includes text normalization, vectorization (TF-IDF, BoW, Word2Vec, GloVe), and model evaluation.
Language: Jupyter Notebook - Size: 3.57 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

vn33/Ecommerce-Product-Categorization
Accurate categorization of eCommerce products improves user experience and boosts search engine visibility. The project goal is to classify products into 14 predefined categories using their descriptions sourced from an eCommerce platform.
Language: Jupyter Notebook - Size: 7.95 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Aayshashukla/SentimentAnalysis
Twitter Sentiment Analysis using Natural Language Processing (NLP)
Language: Jupyter Notebook - Size: 9.39 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

weezymatt/text-scrapbook
Welcome to my text scrapbook! Here you will find examples of text tokenization, normalization, n-grams, and lots of text-adjacent stuff.
Language: Jupyter Notebook - Size: 4.79 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

khanhtran2000/FPT.AI_2020
My work during internship at FPT.AI 2020
Language: Jupyter Notebook - Size: 778 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

speechio/chinese_text_normalization
Chinese text normalization for speech processing
Language: Python - Size: 918 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 554 - Forks: 135

csebuetnlp/normalizer
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended to be used for normalizing / cleaning Bengali and English text.
Language: Python - Size: 15.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 28 - Forks: 5

ajaytiwari0210/Normalization-of-Social-Media-Text
Size: 8.58 MB - Last synced at: almost 2 years ago - Pushed at: almost 8 years ago - Stars: 3 - Forks: 1

rafalposwiata/text-normalization
Repository for text normalization research.
Size: 2.51 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

Isminoula/TextNormSeq2Seq
Code and model files for the paper: I. Lourentzou et al., "Adapting Sequence to Sequence models for Text Normalization in Social Media", ICWSM'19
Language: Python - Size: 40 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 35 - Forks: 16

chanmratekoko/MMStringNormalizer Fork of ayehninnkhine/MMStringNormalizer
Language: Java - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

ecomp-shONgit/text-normalisation
JS / Python3 / PHP Lib to work with UTF8 polytonic greek and latin
Language: JavaScript - Size: 330 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 1

Rumeysakeskin/Preprocessing-Turkish-Text-Data
Preprocessing Turkish text data with cleaning (punctuation; special, accented, and Unicode characters) and normalizing (numbers, abbreviations)
Language: Jupyter Notebook - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

pgolo/sic
Utility for string normalization
Language: Python - Size: 9.31 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

mvakili/Tokenizer
Spelling corrector and text normalizer
Language: C# - Size: 15.5 MB - Last synced at: 28 days ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

princ3od/VietnamNumber
Library for converting numbers to Vietnamese for .NET (C#).
Language: C# - Size: 14.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Bonniface/Text-CLeaning-And-Classification
Text classification is a widely used natural language processing task in different business problems. Given a statement or document, the task involves assigning to it an appropriate category from a pre-defined set of categories. The dataset of choice determines the set of categories. Text classification has applications in emotion classification, n
Language: Jupyter Notebook - Size: 8.34 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

kscanne/droichead
Links between Foclóir Uí Dhónaill (Ó Dónaill's dictionary) and DIL
Language: HTML - Size: 1.13 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 0

Amir79Naziri/TextNormalization_Project
Implementing text normalization for the Farsi (Persian) language.
Language: Python - Size: 436 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

vietbt/ViTextnormASR
Our source code for the paper "Transformer-based Joint Learning Approach for Text Normalization in Vietnamese ASR"
Language: Python - Size: 5.46 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

amogh9594/Sentiment-Analysis
Sentiment-Analysis
Language: Jupyter Notebook - Size: 17.6 KB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

alanbracco/twnorm
Text Normalization on tweets (Tweet Normalization)
Language: Python - Size: 38.9 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

JasperHG90/Phonorm
Phonetic normalization using Recurrent Neural Networks
Language: Jupyter Notebook - Size: 222 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0
