An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-processing

finnjest/Realm

Advanced Text Processing Tool

Language: AutoHotkey - Size: 253 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

moltenib/md-to-html

Sed script that converts Markdown to HTML code.

Language: sed - Size: 106 KB - Last synced at: 24 days ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 2

codingkush/ChatSense

ChatSense — A chat analyzer app that quickly summarizes and analyzes WhatsApp chat exports with a clean, easy-to-use interface.

Language: Python - Size: 17.6 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

Pranav-Patel-123/GenAI

Language: TypeScript - Size: 102 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

GateNLP/python-gatenlp

Python text processing, pattern matching, and NLP framework

Language: Jupyter Notebook - Size: 19.4 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 66 - Forks: 8

farhad-here/Persian_Text_Processing

Persian text processing with the parsivar library.

Language: Python - Size: 9.77 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

Pranav-Patel-123/WaY-scrapping

Web and YouTube scraping for a given query: like a search engine, it returns links and text from the web and YouTube.

Language: Python - Size: 6.84 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

Edopramudya/Sentiment-Text-Clustering

This project focuses on preprocessing and clustering text data from a sentiment dataset. The dataset contains text with sentiment labels (positive, negative, neutral), and the text is cleaned before clustering.

Language: Jupyter Notebook - Size: 729 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

acarl005/stripansi

A little Go package for removing ANSI color escape codes from strings.

Language: Go - Size: 1.95 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 135 - Forks: 16
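For illustration only, here is a minimal Python sketch of what removing ANSI color escape codes involves. It is not stripansi's Go API. It assumes the codes to remove are CSI sequences such as the SGR color code ESC[31m, and the regex and the hypothetical strip_ansi helper are assumptions that do not cover every escape type (OSC sequences, for example).

```python
import re

# Assumption: only CSI sequences are stripped. A CSI sequence is ESC '[' followed by
# parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F) and one final byte (0x40-0x7E).
# This covers SGR color codes like "\x1b[31m" but not OSC or other escape types.
_CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(s: str) -> str:
    """Hypothetical helper: return s with ANSI CSI escape sequences removed."""
    return _CSI_RE.sub("", s)


if __name__ == "__main__":
    colored = "\x1b[1;31mError:\x1b[0m file not found"
    print(strip_ansi(colored))  # Error: file not found
```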

LucasGoncSilva/mosheh

Mosheh, a tool for generating documentation for Python projects, written in Python.

Language: Python - Size: 1.46 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 8 - Forks: 1

airbnb/artificial-adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Language: Python - Size: 116 KB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 402 - Forks: 57

open-i18n/rust-unic

UNIC: Unicode and Internationalization Crates for Rust

Language: Rust - Size: 14.1 MB - Last synced at: 15 days ago - Pushed at: 23 days ago - Stars: 241 - Forks: 24

IoeCmcomc/chiecthuyenngoaixa

A utility library for processing Vietnamese text

Language: Python - Size: 235 KB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 4 - Forks: 1

parksb/ised

An interactive tool for find-and-replace across many files

Language: Rust - Size: 473 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 0

casics/nostril 📦

Nostril: Nonsense String Evaluator

Language: Python - Size: 143 MB - Last synced at: about 10 hours ago - Pushed at: about 3 years ago - Stars: 195 - Forks: 35

PellaML/Markdown-Renderer

Enhanced Markdown Renderer: A versatile and extensible JavaScript-based Markdown rendering and parsing library, leveraging Abstract Syntax Trees (AST) for efficient processing and customizable output. Open-source and community-driven, with a focus on future improvements and contributions.

Language: JavaScript - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

aditiiprasad/WhatsStat

A fun and insightful WhatsApp chat analyzer that turns your conversations into beautiful stats, juicy graphs, and quirky insights.

Language: Python - Size: 1.27 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

ronnmabunga/scanpad

ScanPad is an OCR-powered notepad that extracts text from images and lets you edit, organize, and export documents. It features a rich text editor, multiple input methods, and a responsive user interface design.

Language: JavaScript - Size: 354 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

victoryosiobe/kingchop

Kingchop ⚔️ is a JavaScript library for tokenizing ("chopping") English text. It uses an extensive set of tokenization rules, which you can easily adjust.

Language: JavaScript - Size: 85.9 KB - Last synced at: about 13 hours ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

yui-mhcp/data_processing

Data processing utilities in keras3

Language: Jupyter Notebook - Size: 86.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 1

ucd-dnp/ConTexto

Python library for text mining and NLP

Language: Jupyter Notebook - Size: 34.1 MB - Last synced at: 22 days ago - Pushed at: about 1 year ago - Stars: 49 - Forks: 14

Bikatr7/Kudasai

Streamlining Japanese-English Translation with Advanced Preprocessing and Integrated Translation Technologies

Language: Python - Size: 90.4 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 25 - Forks: 4

KashifMoin1410/Text-Sentiment-Analysis

This project analyzes tweet sentiments using both traditional machine learning (Logistic Regression, Ridge, XGBoost) and deep learning (LSTM) models. The workflow covers text preprocessing, feature engineering, model training, and evaluation. Logistic Regression achieved an R² score of 0.80, while the LSTM model reached ~76% validation accuracy.

Language: Jupyter Notebook - Size: 3.58 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Hunter2718/kitten

A safe and modern clone of the Unix cat command written in Rust

Language: Rust - Size: 17.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

MarjovanLier/StringManipulation

PHP 8.3+ string manipulation library with accent removal, UTF-8 conversion, name formatting & date validation. Fully typed, 100% tested, production-ready.

Language: PHP - Size: 351 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

dlite-tools/NLPiper

NLPiper is a package that brings together different NLP tools and applies their transformations to the target document.

Language: Python - Size: 165 KB - Last synced at: 14 days ago - Pushed at: almost 2 years ago - Stars: 19 - Forks: 1

CyberCRI/refinedoc

Python library for post-extraction refinement of text, for example text obtained through PDF extraction.

Language: Python - Size: 19.5 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 2

derek73/python-nameparser

A simple Python module for parsing human names into their individual components

Language: Python - Size: 778 KB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 674 - Forks: 105

ikegami-yukino/jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

Language: Python - Size: 537 KB - Last synced at: 30 days ago - Pushed at: 6 months ago - Stars: 329 - Forks: 31

pemistahl/lingua-go

The most accurate natural language detection library for Go, suitable for short text and mixed-language text

Language: Go - Size: 226 MB - Last synced at: 30 days ago - Pushed at: 5 months ago - Stars: 1,245 - Forks: 68

amirivojdan/shekar

Simplifying Persian NLP for Everyone

Language: Python - Size: 2.84 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 1

Cicatriiz/text-toolkit

Advanced MCP server providing comprehensive text transformation and formatting tools. TextToolkit offers over 40 specialized utilities for case conversion, encoding/decoding, formatting, analysis, and text manipulation - all accessible directly within your AI assistant workflow.

Language: TypeScript - Size: 1.37 MB - Last synced at: 25 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

michaelarutyunov/Earning-Calls-pdf-to-json-transformation

Demonstration of the effective use of an LLM to transform earnings call transcripts from PDF to JSON

Language: Python - Size: 12.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

haven-jeon/PyKoSpacing

Automatic Korean word spacing with Python

Language: Python - Size: 4.53 MB - Last synced at: 15 days ago - Pushed at: 12 months ago - Stars: 414 - Forks: 114

fastnlp/fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

Language: Python - Size: 35.1 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 3,132 - Forks: 449

MahtaFetrat/Persian-Informal-Text-Detector

Python package for detecting informal Persian text using regular expressions and rule-based methods

Language: Python - Size: 21.5 KB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

lykmapipo/US-Inaugural-Addresses

Python scripts to download, process, and analyze US Inaugural Addresses

Language: Python - Size: 4.45 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

kgruiz/PyTokenCounter

A simple Python library for tokenizing text and counting tokens. Although it currently supports only OpenAI LLMs, it helps with text processing and managing token limits in AI applications.

Language: Python - Size: 420 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

neshkeev/pgpc

Parser combinators based on Python generators

Language: Python - Size: 17.6 KB - Last synced at: 11 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

nonoroazoro/diff-match-patch-typescript

🚁 TypeScript port of diff-match-patch.

Language: TypeScript - Size: 564 KB - Last synced at: 4 days ago - Pushed at: 10 months ago - Stars: 11 - Forks: 1

9-5/Chromium-Intelligence

A powerful Chromium extension that leverages multiple AI APIs to assist with various text operations, image analysis, and PDF processing.

Language: JavaScript - Size: 834 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 1

learnbyexample/cli_text_processing_coreutils

Example based guide for specialized text processing with GNU Coreutils

Language: Shell - Size: 2.98 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 193 - Forks: 9

JhonnySalles/MangaExtractor

Image processing and character recognition that turns images into editable text, including extraction of speech bubbles from comics/manga.

Language: Python - Size: 140 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

birchb1024/frangipanni

Program to convert lines of text into a tree structure.

Language: Go - Size: 1 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 1,200 - Forks: 30

RMNCLDYO/gemini-ai-toolkit

Unlock the potential of Google's Gemini AI models with this versatile toolkit. Offering seamless chat, text generation, and multimodal interactions, supporting various file types, including PDFs, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.

Language: Python - Size: 313 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 70 - Forks: 15

brothersincode/virastar

Cleaning-up Persian Texts!

Language: JavaScript - Size: 1.3 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 139 - Forks: 15

angelosalatino/cso-classifier

Python library that classifies content from scientific papers with the topics of the Computer Science Ontology (CSO).

Language: Python - Size: 19.9 MB - Last synced at: 5 days ago - Pushed at: 6 months ago - Stars: 90 - Forks: 19

prabhashj07/nepalikit

NepaliKit is a Python library for natural language processing (NLP) tasks in Nepali. It features tokenization (rule-based and SentencePiece), text preprocessing, stopword management, and sentence segmentation. Ideal for developers and researchers working with Nepali text data.

Language: Python - Size: 364 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

yaa110/rake-rs

Multilingual implementation of RAKE algorithm for Rust

Language: Rust - Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 34 - Forks: 8
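As a rough illustration of the RAKE (Rapid Automatic Keyword Extraction) idea, a minimal Python sketch follows; it is not rake-rs's Rust API. Candidate phrases are runs of words between stopwords and punctuation, each word scores degree/frequency, and a phrase scores the sum of its word scores. The tiny STOPWORDS set, the punctuation regex, and the helper names are hypothetical simplifications; real implementations use much larger, language-specific stopword lists.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text at punctuation, then at stopwords, into candidate keyword phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current: list[str] = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text: str) -> list[tuple[str, float]]:
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words within the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    ranked = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(rake("Compatibility of systems of linear constraints"))
```

Longer content-word phrases accumulate higher scores, which is why RAKE tends to favor multi-word keywords.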

catatsuy/purl

Streamlining Text Processing

Language: Go - Size: 185 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 217 - Forks: 6

guillaumeast/printui

Minimal string CLI in Go → Clean, measure & reshape terminal text (Unicode/ANSI-aware)

Language: Shell - Size: 19.5 MB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

meefs/entseeker

entseeker is a command-line tool for Named Entity Recognition (NER) and web entity searches in text files. It uses spaCy's NLP capabilities for standard named entities and custom rules for web-related entities.

Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

guillaumeast/strui

Minimal string CLI → Clean, measure & reshape terminal text (Unicode/ANSI-aware)

Language: C++ - Size: 15.6 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

guillaumeast/libstrui

Header-only C++ lib → Clean, measure & reshape terminal text (Unicode/ANSI-aware)

Language: C++ - Size: 21.5 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jhd3197/Tukuy

Tukuy is a robust, extensible data transformation library that leverages a flexible plugin system. It simplifies the manipulation, validation, and extraction of data across multiple formats (text, HTML, JSON, dates, numbers, and more), making it an ideal tool for building data pipelines and cleaning workflows.

Language: Python - Size: 52.7 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

gagolews/stringi

Fast and portable character string processing in R (with the Unicode ICU)

Language: C++ - Size: 210 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 310 - Forks: 49

guillaumeast/stringui

Minimal string toolbox → Clean, measure, reshape (Unicode/ANSI-aware)

Language: C++ - Size: 25.4 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

daac-tools/python-daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

Language: Rust - Size: 3.22 MB - Last synced at: 21 days ago - Pushed at: 3 months ago - Stars: 17 - Forks: 1
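To make the Aho-Corasick algorithm concrete, here is a minimal Python sketch of multi-pattern matching with a trie, failure links, and output lists. It deliberately uses plain dictionaries for trie edges, not the compact double-array structure that daachorse is built on, and the function names are hypothetical rather than part of python-daachorse.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie edges, failure links and output lists for the given patterns."""
    goto = [{}]  # goto[state][char] -> next state (trie edges)
    fail = [0]   # failure link of each state
    out = [[]]   # patterns recognized when the automaton is in each state
    for pat in patterns:
        if not pat:
            continue  # assumption: empty patterns are ignored
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    # Breadth-first traversal so shallower failure links are ready before deeper ones.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (the root)
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches that end at the suffix state
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence of every pattern in text."""
    goto, fail, out = build_automaton(patterns)
    s = 0
    hits = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits


if __name__ == "__main__":
    print(search("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns there are; failure links let the automaton fall back to the longest matching suffix instead of restarting.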

andrei-vataselu/data-science-snippets

🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.

Language: Python - Size: 30.3 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

ty70/text_preprocessing_tools

A set of Python tools for preprocessing Japanese text for subtitles or speech synthesis (e.g., ruby removal, kanji stripping).

Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Rohit-Sharma-RS/Useful-py-scripts

Language: Python - Size: 457 KB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

oneai-nlp/oneai-python

Python SDK for One AI APIs. One AI is an NLP-as-a-service platform. Our APIs enable language comprehension in context, transforming texts from any source into structured data to use in code.

Language: Python - Size: 539 KB - Last synced at: 7 days ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 7

rlayers/pawpaw

Text Processing & Segmentation Framework

Language: Python - Size: 2.52 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

callforpapers-source/doc2term

A fast sentence/word tokenizer, and punctuation remover.

Language: C - Size: 68.4 KB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 2

guillaumeast/str

Minimal string toolbox → Clean, measure, reshape (Unicode/ANSI-aware)

Language: C++ - Size: 15.6 KB - Last synced at: about 17 hours ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

BaseMax/ExtractWord

Extract word(s) from the lines of the file.

Language: PHP - Size: 23.4 KB - Last synced at: 4 days ago - Pushed at: about 6 years ago - Stars: 4 - Forks: 1

paul-j-lucas/wrap

Text reformatter better than fmt(1) or fold(1).

Language: C - Size: 3.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 4

kupolak/textstat

Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.

Language: Ruby - Size: 242 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 34 - Forks: 10

nchern/cli-tools

This repo contains a set of handy command line tools

Language: Go - Size: 34.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

solnic/text_parser

TextParser is an Elixir library for extracting and validating structured tokens from text, such as URLs, hashtags, @-mentions etc.

Language: Elixir - Size: 61.5 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 44 - Forks: 0

mcnemesis/cli_tttt

The Reference Implementation of TEA (Transforming Executable Alphabet) computer programming language

Language: Python - Size: 6.92 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

cainky/ReplaceText

Replaces text based on a dictionary, with user input specifying the direction (keys-to-values or values-to-keys)

Language: Python - Size: 27.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

kk7nc/Text_Classification

Text Classification Algorithms: A Survey

Language: Python - Size: 13.8 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 1,811 - Forks: 544

andalugeeks/andaluh-py

Transliterate Spanish (español) spelling into Andalusian (andaluz) spelling proposals using Python

Language: Python - Size: 759 KB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 23 - Forks: 3

shner-elmo/flashtext2

The fastest FlashText library for Python

Language: Python - Size: 1.89 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 22 - Forks: 3

uvaliyev/llmtxt

Tool for concatenating docs into unified text files for LLM processing. Converts structured documentation (HTML/Markdown) into clean training corpora with configurable filtering.

Language: Shell - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

yxshee/summarization-nlp

Reduces a document to a shorter version that retains its key points, using extractive summarization, which selects the most important content from the source text.

Language: Jupyter Notebook - Size: 2.52 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

rlan/csv2ical

A CLI tool that converts a CSV file with event details into an iCalendar ICS file. The ICS file can then be imported into apps like Google Calendar, Microsoft Outlook, Apple macOS Calendar, etc.

Language: Python - Size: 43.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 6 - Forks: 0

xmarva/fnatural

Minimal F# library for functional text processing.

Language: F# - Size: 10.7 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

R-Mahesh45/HR---Resume-Text-Classification

Text Classification for Resumes: Conducted Exploratory Data Analysis (EDA) on a vast collection of resumes. Organized the data using Bag of Words (BoW) and TF-IDF techniques. Built and evaluated multiple models, with Logistic Regression delivering standout performance. Created Word Clouds and Histograms.

Language: Jupyter Notebook - Size: 11.2 MB - Last synced at: 19 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

davedean/deslopify

A utility that cleans up text by removing or translating common 'slop' patterns from AI-generated text

Language: TypeScript - Size: 221 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sadit/TextClassification.jl

A text classification library using the microtc approach

Language: Julia - Size: 569 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Csb-218/Covlet

A Chrome extension that reads job descriptions on platforms like LinkedIn, Wellfound, and Internshala, then generates personalized cover letters designed to get you noticed.

Language: TypeScript - Size: 255 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

golesuman/Doctor-recommendation-system

This provides a REST API for the doctor recommendation system. Given a set of symptoms, it recommends the doctors you should contact.

Language: Jupyter Notebook - Size: 896 KB - Last synced at: 23 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 4

d3bvstack/c-declare-top

Script to parse .c files and add function declarations at the top of each file

Language: Shell - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

Sid3503/sparse-attention

PyTorch-style strided sparse attention with configurable strides, local+global token support, and memory-efficient masking.

Language: Jupyter Notebook - Size: 712 KB - Last synced at: 6 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

robertoaleman/RA_Rabin-Karp_Search_Algorithm_Package

This class uses the Rabin-Karp algorithm to find occurrences of a given pattern within a standard text string.

Language: PHP - Size: 40 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0
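The Rabin-Karp idea can be sketched in a few lines of Python (not this PHP package's class or API): hash the pattern once, slide a rolling hash over each window of the text, and confirm hash hits with a direct comparison because different strings can share a hash. The base 256, the modulus 1_000_000_007, and the rabin_karp name are illustrative assumptions.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # design choice: an empty pattern reports no matches
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes are only candidates; verify to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Roll the window: remove text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Updating the hash in constant time per shift is what gives Rabin-Karp its expected linear running time.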

vmenger/docdeid

Create your own document de-identifier using docdeid, a simple framework independent of language or domain.

Language: Python - Size: 239 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 3

Ramtin-Karbaschi/enHumanizer_Bot

Transform AI-generated text to be indistinguishable from human writing. Features text type detection (academic, business, resume, narrative), preserves original meaning, and eliminates AI patterns through regex-based cleanup. Includes Telegram bot and supports multiple APIs (Gemini, DeepSeek, Hugging Face, ...).

Language: Python - Size: 44.9 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

juansear/specialization-postgreSQL_4e

Repository for notes and exercise solutions from the PostgreSQL for Everybody specialization offered by the University of Michigan.

Language: Python - Size: 6.84 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

vishnumishra/ai-youtube-transcript

A powerful Node.js library for retrieving and processing YouTube video transcripts. Supports multiple languages, translation, formatting options, and proxy handling without requiring an API key or headless browser.

Language: TypeScript - Size: 146 KB - Last synced at: 15 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

kaz-utashiro/App-Greple-xlate

App::Greple::xlate - translation support module for greple

Language: Perl - Size: 6.87 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

k3jph/phonics-in-r

Phonetic Spelling Algorithms in R

Language: R - Size: 443 KB - Last synced at: about 9 hours ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 8

lemon24/linesieve

An unholy blend of grep, sed, awk, and Python.

Language: Python - Size: 138 KB - Last synced at: 15 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 0

ikegami-yukino/python-tr

A Pure-Python implementation of the tr algorithm

Language: Python - Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 15 - Forks: 4

febeling/edit-distance

Levenshtein edit distance in Rust

Language: Rust - Size: 567 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 46 - Forks: 15
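For reference, Levenshtein edit distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal Python sketch of the standard dynamic-programming recurrence follows, keeping only two rows of the table; it is illustrative and not the Rust crate's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row as short as the shorter string
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Keeping two rows reduces memory from O(len(a) * len(b)) to O(min(len(a), len(b))) while the running time stays quadratic.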

LunarisApp/text-tools

A collection of text processing tools

Language: TypeScript - Size: 2.5 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

BenjaminDanker/Data-AI-Prepare

A collection of Python utilities for preparing and transforming text data—PDF extraction, paragraph analysis, embedding generation, URL scraping, CSV conversion, and Astra DB uploads

Language: Python - Size: 473 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

twardoch/split-markdown4gpt

A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.

Language: Python - Size: 78.1 KB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 24 - Forks: 2

Abraham7016/Small-Swear-Api

A Node.js-based RESTful API that detects, categorizes, and censors profanity and slang in Turkish text.

Language: JavaScript - Size: 15.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0