GitHub topics: text-processing
finnjest/Realm
Advanced Text Processing Tool
Language: AutoHotkey - Size: 253 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

moltenib/md-to-html
Sed script that converts Markdown to HTML code.
Language: sed - Size: 106 KB - Last synced at: 24 days ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 2

codingkush/ChatSense
ChatSense — A chat analyzer app that quickly summarizes and analyzes WhatsApp chat exports with a clean, easy-to-use interface.
Language: Python - Size: 17.6 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

Pranav-Patel-123/GenAI
Language: TypeScript - Size: 102 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

GateNLP/python-gatenlp
Python text processing, pattern matching, and NLP framework
Language: Jupyter Notebook - Size: 19.4 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 66 - Forks: 8

farhad-here/Persian_Text_Processing
Persian text processing with the Parsivar library
Language: Python - Size: 9.77 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

Pranav-Patel-123/WaY-scrapping
Web and YouTube scraping for a given input: works like a search engine that returns links and text from the web and YouTube.
Language: Python - Size: 6.84 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

Edopramudya/Sentiment-Text-Clustering
This project focuses on preprocessing and clustering text data from a sentiment dataset. The dataset contains text and sentiment labels (positive, negative, neutral), and the text is cleaned before clustering.
Language: Jupyter Notebook - Size: 729 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

acarl005/stripansi
A little Go package for removing ANSI color escape codes from strings.
Language: Go - Size: 1.95 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 135 - Forks: 16

LucasGoncSilva/mosheh
Mosheh, a tool for creating docs for projects, from Python to Python.
Language: Python - Size: 1.46 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 8 - Forks: 1

airbnb/artificial-adversary
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Language: Python - Size: 116 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 402 - Forks: 57

open-i18n/rust-unic
UNIC: Unicode and Internationalization Crates for Rust
Language: Rust - Size: 14.1 MB - Last synced at: 15 days ago - Pushed at: 23 days ago - Stars: 241 - Forks: 24

IoeCmcomc/chiecthuyenngoaixa
A utility library for processing Vietnamese text
Language: Python - Size: 235 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

parksb/ised
An interactive tool for find-and-replace across many files
Language: Rust - Size: 473 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

casics/nostril 📦
Nostril: Nonsense String Evaluator
Language: Python - Size: 143 MB - Last synced at: about 10 hours ago - Pushed at: about 3 years ago - Stars: 195 - Forks: 35

PellaML/Markdown-Renderer
Enhanced Markdown Renderer: A versatile and extensible JavaScript-based Markdown rendering and parsing library, leveraging Abstract Syntax Trees (AST) for efficient processing and customizable output. Open-source and community-driven, with a focus on future improvements and contributions.
Language: JavaScript - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

aditiiprasad/WhatsStat
A fun and insightful WhatsApp chat analyzer that turns your conversations into beautiful stats, juicy graphs, and quirky insights.
Language: Python - Size: 1.27 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

ronnmabunga/scanpad
ScanPad is an OCR-powered notepad that extracts text from images and lets you edit, organize, and export documents. It features a rich text editor, multiple input methods, and a responsive user interface design.
Language: JavaScript - Size: 354 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

victoryosiobe/kingchop
Kingchop ⚔️ is a JavaScript library for tokenizing (chopping) English text. It uses an extensive set of tokenization rules, which you can adjust easily.
Language: JavaScript - Size: 85.9 KB - Last synced at: about 13 hours ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

yui-mhcp/data_processing
Data processing utilities in keras3
Language: Jupyter Notebook - Size: 86.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 1

ucd-dnp/ConTexto
Python library for text mining and NLP
Language: Jupyter Notebook - Size: 34.1 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 49 - Forks: 14

Bikatr7/Kudasai
Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies
Language: Python - Size: 90.4 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 25 - Forks: 4

KashifMoin1410/Text-Sentiment-Analysis
This project analyzes tweet sentiments using both traditional machine learning (Logistic Regression, Ridge, XGBoost) and deep learning (LSTM) models. The workflow covers text preprocessing, feature engineering, model training, and evaluation. Logistic Regression achieved an R² score of 0.80, while the LSTM model reached ~76% validation accuracy.
Language: Jupyter Notebook - Size: 3.58 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Hunter2718/kitten
A safe and modern clone of the Unix cat command written in Rust
Language: Rust - Size: 17.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

MarjovanLier/StringManipulation
PHP 8.3+ string manipulation library with accent removal, UTF-8 conversion, name formatting & date validation. Fully typed, 100% tested, production-ready.
Language: PHP - Size: 351 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

dlite-tools/NLPiper
NLPiper is a package that aggregates different NLP tools and applies their transformations to the target document.
Language: Python - Size: 165 KB - Last synced at: 14 days ago - Pushed at: almost 2 years ago - Stars: 19 - Forks: 1

CyberCRI/refinedoc
Python library for post-extraction refinement of text, such as text derived from PDF extraction.
Language: Python - Size: 19.5 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 2

derek73/python-nameparser
A simple Python module for parsing human names into their individual components
Language: Python - Size: 778 KB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 674 - Forks: 105

ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Language: Python - Size: 537 KB - Last synced at: 30 days ago - Pushed at: 6 months ago - Stars: 329 - Forks: 31

pemistahl/lingua-go
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Language: Go - Size: 226 MB - Last synced at: 30 days ago - Pushed at: 5 months ago - Stars: 1,245 - Forks: 68

amirivojdan/shekar
Simplifying Persian NLP for Everyone
Language: Python - Size: 2.84 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 1

Cicatriiz/text-toolkit
Advanced MCP server providing comprehensive text transformation and formatting tools. TextToolkit offers over 40 specialized utilities for case conversion, encoding/decoding, formatting, analysis, and text manipulation - all accessible directly within your AI assistant workflow.
Language: TypeScript - Size: 1.37 MB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

michaelarutyunov/Earning-Calls-pdf-to-json-transformation
Demonstration of the effective use of LLM to transform earning call transcripts from PDF to JSON
Language: Python - Size: 12.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

haven-jeon/PyKoSpacing
Automatic Korean word spacing with Python
Language: Python - Size: 4.53 MB - Last synced at: 15 days ago - Pushed at: 12 months ago - Stars: 414 - Forks: 114

fastnlp/fastNLP
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Language: Python - Size: 35.1 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 3,132 - Forks: 449

MahtaFetrat/Persian-Informal-Text-Detector
Python package for detecting informal Persian text using regular expressions and rule-based methods
Language: Python - Size: 21.5 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

lykmapipo/US-Inaugural-Addresses
Python scripts to download, process, and analyze US Inaugural Addresses
Language: Python - Size: 4.45 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

kgruiz/PyTokenCounter
A simple Python library for tokenizing text and counting tokens. While currently only supporting OpenAI LLMs, it helps with text processing and managing token limits in AI applications.
Language: Python - Size: 420 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

neshkeev/pgpc
Python Generators based Parser Combinators
Language: Python - Size: 17.6 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

nonoroazoro/diff-match-patch-typescript
🚁 TypeScript port of diff-match-patch.
Language: TypeScript - Size: 564 KB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 11 - Forks: 1

9-5/Chromium-Intelligence
A powerful Chromium extension that leverages multiple AI APIs to assist with various text operations, image analysis, and PDF processing.
Language: JavaScript - Size: 834 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

learnbyexample/cli_text_processing_coreutils
Example based guide for specialized text processing with GNU Coreutils
Language: Shell - Size: 2.98 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 193 - Forks: 9

JhonnySalles/MangaExtractor
Image processing and character recognition, transforming into editable text. Extraction of bubbles in comics/manga.
Language: Python - Size: 140 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

birchb1024/frangipanni
Program to convert lines of text into a tree structure.
Language: Go - Size: 1 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 1,200 - Forks: 30

RMNCLDYO/gemini-ai-toolkit
Unlock the potential of Google's Gemini AI models with this versatile toolkit. It offers seamless chat, text generation, and multimodal interactions, and supports various file types, including PDFs, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.
Language: Python - Size: 313 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 70 - Forks: 15

brothersincode/virastar
Cleaning-up Persian Texts!
Language: JavaScript - Size: 1.3 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 139 - Forks: 15

angelosalatino/cso-classifier
Python library that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).
Language: Python - Size: 19.9 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 90 - Forks: 19

prabhashj07/nepalikit
NepaliKit is a Python library for natural language processing (NLP) tasks in Nepali. It features tokenization (rule-based and SentencePiece), text preprocessing, stopword management, and sentence segmentation. Ideal for developers and researchers working with Nepali text data.
Language: Python - Size: 364 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

yaa110/rake-rs
Multilingual implementation of RAKE algorithm for Rust
Language: Rust - Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 34 - Forks: 8

catatsuy/purl
Streamlining Text Processing
Language: Go - Size: 185 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 217 - Forks: 6

guillaumeast/printui
Minimal string CLI in Go → Clean, measure & reshape terminal text (Unicode/ANSI-aware)
Language: Shell - Size: 19.5 MB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

meefs/entseeker
entseeker is a command-line tool for Named Entity Recognition (NER) and web entity searches in text files. It uses spaCy's NLP capabilities for standard named entities and custom rules for web-related entities.
Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

guillaumeast/strui
Minimal string CLI → Clean, measure & reshape terminal text (Unicode/ANSI-aware)
Language: C++ - Size: 15.6 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

guillaumeast/libstrui
Header-only C++ lib → Clean, measure & reshape terminal text (Unicode/ANSI-aware)
Language: C++ - Size: 21.5 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jhd3197/Tukuy
Tukuy is a robust, extensible data transformation library that leverages a flexible plugin system. It simplifies the manipulation, validation, and extraction of data across multiple formats (text, HTML, JSON, dates, numbers, and more), making it an ideal tool for building data pipelines and cleaning workflows.
Language: Python - Size: 52.7 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

gagolews/stringi
Fast and portable character string processing in R (with the Unicode ICU)
Language: C++ - Size: 210 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 310 - Forks: 49

guillaumeast/stringui
Minimal string toolbox → Clean, measure, reshape (unicode/ansi aware)
Language: C++ - Size: 25.4 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

daac-tools/python-daachorse
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)
Language: Rust - Size: 3.22 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 17 - Forks: 1

andrei-vataselu/data-science-snippets
🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.
Language: Python - Size: 30.3 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

ty70/text_preprocessing_tools
A set of Python tools for preprocessing Japanese text for subtitles or speech synthesis (e.g., ruby removal, kanji stripping).
Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Rohit-Sharma-RS/Useful-py-scripts
Language: Python - Size: 457 KB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

oneai-nlp/oneai-python
Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming texts from any source into structured data to use in code.
Language: Python - Size: 539 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 7

rlayers/pawpaw
Text Processing & Segmentation Framework
Language: Python - Size: 2.52 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

callforpapers-source/doc2term
A fast sentence/word tokenizer, and punctuation remover.
Language: C - Size: 68.4 KB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

guillaumeast/str
Minimal string toolbox → Clean, measure, reshape (unicode/ansi aware)
Language: C++ - Size: 15.6 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

BaseMax/ExtractWord
Extract word(s) from the lines of the file.
Language: PHP - Size: 23.4 KB - Last synced at: 4 days ago - Pushed at: about 6 years ago - Stars: 4 - Forks: 1

paul-j-lucas/wrap
Text reformatter better than fmt(1) or fold(1).
Language: C - Size: 3.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 4

kupolak/textstat
Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.
Language: Ruby - Size: 242 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 34 - Forks: 10

nchern/cli-tools
This repo contains a set of handy command line tools
Language: Go - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

solnic/text_parser
TextParser is an Elixir library for extracting and validating structured tokens from text, such as URLs, hashtags, @-mentions etc.
Language: Elixir - Size: 61.5 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 44 - Forks: 0

mcnemesis/cli_tttt
The Reference Implementation of TEA (Transforming Executable Alphabet) computer programming language
Language: Python - Size: 6.92 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

cainky/ReplaceText
Replaces text based on a dictionary, given user input to specify which direction (keys-to-values or values-to-keys)
Language: Python - Size: 27.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

kk7nc/Text_Classification
Text Classification Algorithms: A Survey
Language: Python - Size: 13.8 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1,811 - Forks: 544

andalugeeks/andaluh-py
Transliterate Spanish (español) spelling to Andalusian (andaluz) spelling proposals using Python
Language: Python - Size: 759 KB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 23 - Forks: 3

shner-elmo/flashtext2
The fastest FlashText library for Python
Language: Python - Size: 1.89 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 22 - Forks: 3

uvaliyev/llmtxt
Tool for concatenating docs into unified text files for LLM processing. Converts structured documentation (HTML/Markdown) into clean training corpora with configurable filtering.
Language: Shell - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

yxshee/summarization-nlp
Reduces a document to a shorter version while retaining its key points. Extractive summarization selects the important content directly from the source.
Language: Jupyter Notebook - Size: 2.52 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

rlan/csv2ical
A CLI tool that converts a CSV file with event details into an iCalendar ICS file. The ICS file can then be imported into apps such as Google Calendar, Microsoft Outlook, and Apple macOS Calendar.
Language: Python - Size: 43.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

xmarva/fnatural
Minimal F# library for functional text processing.
Language: F# - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

R-Mahesh45/HR---Resume-Text-Classification
Text Classification for Resumes: Conducted Exploratory Data Analysis (EDA) on a vast collection of resumes. Organized the data using Bag of Words (BoW) and TF-IDF techniques. Built and evaluated multiple models, with Logistic Regression delivering standout performance. Created Word Clouds and Histograms.
Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 19 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

davedean/deslopify
A utility that cleans up text by removing or translating common 'slop' patterns from AI-generated text
Language: TypeScript - Size: 221 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sadit/TextClassification.jl
A text classification library using the microtc approach
Language: Julia - Size: 569 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Csb-218/Covlet
A Chrome extension that reads job descriptions on platforms like LinkedIn, Wellfound, and Internshala, then generates personalized cover letters designed to get you noticed.
Language: TypeScript - Size: 255 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

golesuman/Doctor-recommendation-system
Provides a REST API for the doctor recommendation system. Given a set of symptoms, it recommends a list of doctors to contact.
Language: Jupyter Notebook - Size: 896 KB - Last synced at: 23 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 4

d3bvstack/c-declare-top
Script to parse .c files and add function declarations at the top of each file
Language: Shell - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Sid3503/sparse-attention
PyTorch-style strided sparse attention with configurable strides, local+global token support, and memory-efficient masking.
Language: Jupyter Notebook - Size: 712 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

robertoaleman/RA_Rabin-Karp_Search_Algorithm_Package
This class uses the Rabin-Karp algorithm to find occurrences of a given pattern within a standard text string.
Language: PHP - Size: 40 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

vmenger/docdeid
Create your own document de-identifier using docdeid, a simple framework independent of language or domain.
Language: Python - Size: 239 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 3

Ramtin-Karbaschi/enHumanizer_Bot
Transform AI-generated text to be indistinguishable from human writing. Features text type detection (academic, business, resume, narrative), preserves original meaning, and eliminates AI patterns through regex-based cleanup. Includes Telegram bot and supports multiple APIs (Gemini, DeepSeek, Hugging Face, ...).
Language: Python - Size: 44.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

juansear/specialization-postgreSQL_4e
Repository for notes and exercise solutions from the PostgreSQL for Everybody specialization offered by the University of Michigan.
Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

vishnumishra/ai-youtube-transcript
A powerful Node.js library for retrieving and processing YouTube video transcripts. Supports multiple languages, translation, formatting options, and proxy handling without requiring an API key or headless browser.
Language: TypeScript - Size: 146 KB - Last synced at: 15 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

kaz-utashiro/App-Greple-xlate
App::Greple::xlate - translation support module for greple
Language: Perl - Size: 6.87 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

k3jph/phonics-in-r
Phonetic Spelling Algorithms in R
Language: R - Size: 443 KB - Last synced at: about 9 hours ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 8

lemon24/linesieve
An unholy blend of grep, sed, awk, and Python.
Language: Python - Size: 138 KB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 0

ikegami-yukino/python-tr
A Pure-Python implementation of the tr algorithm
Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 4

febeling/edit-distance
Levenshtein edit distance in Rust
Language: Rust - Size: 567 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 46 - Forks: 15

LunarisApp/text-tools
A collection of text processing tools
Language: TypeScript - Size: 2.5 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

BenjaminDanker/Data-AI-Prepare
A collection of Python utilities for preparing and transforming text data—PDF extraction, paragraph analysis, embedding generation, URL scraping, CSV conversion, and Astra DB uploads
Language: Python - Size: 473 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

twardoch/split-markdown4gpt
A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.
Language: Python - Size: 78.1 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 24 - Forks: 2

Abraham7016/Small-Swear-Api
A Node.js-based RESTful API that detects, categorizes, and censors profanity and slang in Turkish text.
Language: JavaScript - Size: 15.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0
