An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-processing

Goldziher/html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter

Language: HTML - Size: 7.41 MB - Last synced at: about 8 hours ago - Pushed at: about 16 hours ago - Stars: 409 - Forks: 40

Puchaczov/Musoq

SQL Syntax without any database

Language: C# - Size: 17 MB - Last synced at: about 9 hours ago - Pushed at: about 15 hours ago - Stars: 497 - Forks: 21

chmln/sd

Intuitive find & replace CLI (sed alternative)

Language: Rust - Size: 414 KB - Last synced at: about 8 hours ago - Pushed at: 8 months ago - Stars: 6,762 - Forks: 151

Cod-e-Codes/prepend

A fast, safe CLI tool for prepending text to files. Buffered I/O, atomic writes, full test suite.

Language: Rust - Size: 14.6 KB - Last synced at: about 5 hours ago - Pushed at: about 15 hours ago - Stars: 1 - Forks: 0

hrishikeshrt/sanskrit-text

Sanskrit Text (Devanagari) Utility Functions

Language: Python - Size: 46.9 KB - Last synced at: about 5 hours ago - Pushed at: about 16 hours ago - Stars: 3 - Forks: 0

Kaviya121/distil-localdoc.py

Generate complete docstrings for your Python code using a local SLM assistant, while keeping your proprietary information secure.

Language: Python - Size: 1.76 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

stellanomia/uroman-rs

A self-contained Rust reimplementation of uroman, a universal romanizer.

Language: Rust - Size: 1.3 MB - Last synced at: about 4 hours ago - Pushed at: about 1 month ago - Stars: 37 - Forks: 1

Taha5125/DocxWriter-JSON

DocxWriter is a Python library for generating professional Word documents from JSON. Automate reports, add tables, lists, images, and apply custom styles, all from clean, structured data.

Language: Python - Size: 23.4 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Fe4rlessxD/parseltongue_mcp

🐍 Utilize the Parseltongue MCP server for 40+ tools to encode, decode, and transform text with ease, inspired by advanced encoding techniques.

Size: 1.34 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

sashko8877/Replace

Simplify plugin development with Replace, a library for efficient placeholder management and context-based updates.

Language: Kotlin - Size: 1.34 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Saleh908/anytext2images

Extract images from any text quickly, preview them in a gallery, and download your selections easily as individual files or a ZIP.

Language: Python - Size: 28.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

sl5net/SL5-aura-service

Your offline, privacy-first voice assistant framework. Transform speech into commands and actions with a powerful, scriptable rule engine.

Language: Python - Size: 344 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 1

shawnacontrary24/DocStripper

🧹 Clean up your documents with DocStripper, the AI-powered tool that removes noise like page numbers and duplicates for clear, tidy text.

Language: Python - Size: 2.33 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

VIETCUTEa/awk-fmd

Streamline and manage financial data using awk for efficient processing and transformation in your data workflows.

Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Dielectricheatingphenylacetamide203/categorized-english-words

Size: 2.08 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

RavyAun/transformer-gesture

Build a Transformer-based gesture recognition system with PyTorch, ONNX, and Gradio for real-time video analysis and efficient inference.

Language: Python - Size: 4.18 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

StarkAj75/kirmanjiku-12

Streamline data management with kirmanjiku-12, a versatile tool designed to enhance efficiency and organization in your projects.

Size: 1.29 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Ajay2292/sonshell

Control your Sony camera remotely with SonShell, a Linux tool that captures images, tweaks settings, and manages files all from a single terminal.

Size: 1.39 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

kantord/headson

head/tail for structured data - summarize/preview JSON/YAML and source code

Language: Rust - Size: 45.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 48 - Forks: 3

cuentafre7297/perl-xvp

Simplify vector processing in Perl with perl-xvp, a powerful library for handling and manipulating vectors efficiently.

Size: 1.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Rokko-Vencht/regex-generator

Generate validated regular expressions easily with this open-source tool. Ideal for developers needing quick regex solutions for forms and text processing.

Language: JavaScript - Size: 1.33 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

albion83/tilelang

Accelerate your GPU/CPU kernel development with Tile Language, a concise and Pythonic DSL for high-performance computing optimizations.

Language: C++ - Size: 8.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

yuvrajpandiya/Piero-EnDe-Coder

A powerful encryption and decryption tool that combines the Vigenère cipher, XOR encryption, and Base64 encoding to secure messages. This tool allows users to encode and decode messages using a secret key, ensuring an extra layer of security.

Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Saffronduck5667/precision-r-comparison

Laboratory 8 - Retrieval Information

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

apakabarfm/syllabreak-swift

Multilingual library for accurate and deterministic hyphenation and syllable counting without relying on dictionaries.

Language: Swift - Size: 218 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

JaweriaAsif745/LawMate

AI-powered contract analysis tool that extracts clauses, highlights risky terms, summarizes documents, and answers questions using NLP. Built with Streamlit + Python.

Language: Python - Size: 1.59 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

coy-flatness463/markdown-translator

🌐 Translate Markdown files to Chinese seamlessly using Python and OpenRouter API, ensuring quality through intelligent splitting and concurrent processing.

Language: Python - Size: 2.47 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Om20kar05/20250913122858-kardenwort

Accelerate language learning by turning any text into context-rich Anki flashcards with Kardenwort, your intelligent offline study companion.

Language: Jupyter Notebook - Size: 13.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

co-r-e/IrukaDark

Thinking without interruption, your full-screen AI assistant | Code, errors, PRs, charts, papers, tech sites. Instantly understand anything on your screen with a single shortcut.

Language: JavaScript - Size: 2.82 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

iluvn01/VFMTok

Leverage vision foundation models to transform visual data into effective tokens for autoregressive generation in this PyTorch implementation.

Language: Python - Size: 2.24 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

uludakar/FreeAIchat-2api

Connect freely with various AI models using our straightforward API, enabling engaging multi-turn conversations and real-time information access.

Language: Python - Size: 1.32 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

ZyyzouuSG/bash-ohz

Enhance your Bash experience with streamlined tools and functions designed for efficient script management and improved productivity.

Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

ashdahdadwd/bbc-basic-eqa

Enhance comprehension with bbc-basic-eqa, a tool for efficient question answering in natural language using BBC's basic datasets.

Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

robbiechen1969/convert-md-to-rt

Convert Markdown to Rich Text on macOS easily. Streamline your workflow by automatically transforming clipboard content into Rich Text format.

Language: Shell - Size: 1.89 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

JeffreyUrban/uniqseq

Stream-based deduplication for repeating sequences

Language: Python - Size: 1.25 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

sepandhaghighi/mytext

MyText: A Minimal AI-Powered Text Rewriting Tool

Language: Python - Size: 152 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8 - Forks: 0

Anathelegend/perl-efz

Simplify data management with Perl EFZ, an efficient tool for file and data manipulation in Perl applications.

Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

phil65/docler

Abstractions & Tools for OCR / document processing

Language: Python - Size: 2.45 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 1

theultimateminecraftgang/MentionAi-Auto-Bot

Automate interactions with the Mention Network API using this Ruby script to generate AI-driven responses and log your Q&A workflows efficiently.

Language: Ruby - Size: 1.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

VitinDM/data-science-snippets

🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.

Language: Python - Size: 30.3 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

pyparsing/pyparsing

Python library for creating PEG parsers

Language: Python - Size: 8.6 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,431 - Forks: 296

Aksherwal/EditNPress

This project is a Flask-based web app that scrapes news text from a news website, cleans it, analyzes it, and shows the filtered text without any ads.

Language: Python - Size: 7.22 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Language: Python - Size: 342 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 8,613 - Forks: 672

mensfeld/llm-docs-builder

Transform and optimize your markdown documentation for Large Language Models (LLMs) and RAG systems. Generate llms.txt automatically.

Language: Ruby - Size: 1.74 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 69 - Forks: 3

apakabarfm/syllabreak-python

Multilingual library for accurate and deterministic hyphenation and syllable counting without relying on dictionaries.

Language: Python - Size: 213 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

AnaPaula04/pii-redaction-demo

Lightweight PII redaction pipeline using Hugging Face NER + regex (Python), 96.5% accuracy

Language: Python - Size: 29.3 KB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

mahammed123-lab/bcpl-lnn

🌐 Build and manage blockchain-proof learning networks with bcpl-lnn, ensuring secure and efficient data sharing and collaboration.

Size: 1.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

halrraiser/Universal-Input-Sanitizer

A lightweight toolkit for sanitizing, masking, and safely encoding user input across multiple languages.

Language: Python - Size: 7.81 KB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

M4UNC/PDF-Package-Analyzer

Analyze PDF files effectively with this Python tool, testing compatibility across libraries to guide optimal PDF processing solutions.

Language: Python - Size: 1.35 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

fmadore/iwac-ai-pipelines

AI pipelines for Omeka S digital collections - OCR correction, entity extraction, and text analysis

Language: Python - Size: 478 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

loderunner/typelit

A type-safe string templating library for TypeScript

Language: TypeScript - Size: 921 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 1

PyThaiNLP/pythainlp

Thai natural language processing in Python

Language: Python - Size: 66 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 1,089 - Forks: 287

BurntSushi/aho-corasick

A fast implementation of Aho-Corasick in Rust.

Language: Rust - Size: 4.72 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 1,164 - Forks: 107

Toutl/textlib

A lightweight Java NLP library for building, cleaning, and vectorizing text corpora.

Language: Java - Size: 30.3 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

codeproexpert1/AutoHumanize-Automated-DOCX-Text-Humanization

A Python automation tool that reads DOCX files, splits large text into chunks, and humanizes content using undetected Chrome.

Language: Python - Size: 63.5 KB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

JeffreyUrban/patterndb-yaml

YAML-based pattern matching with multi-line capabilities for log normalization using syslog-ng patterndb

Language: Python - Size: 422 KB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

catatsuy/purl

Streamlining Text Processing

Language: Go - Size: 247 KB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 229 - Forks: 6

sstadick/hck

A sharp cut(1) clone.

Language: Rust - Size: 515 KB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 722 - Forks: 18

akikareha/himewiki

A simple wiki engine with content filter by AI agent

Language: Go - Size: 284 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

KyryloRud/wc-simd

High-performance C++ clone of Unix wc with SIMD acceleration, runtime CPU dispatch, and multithreaded chunking. Designed to handle files over 100 GB efficiently.

Language: CMake - Size: 98.6 KB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

linuxscout/pyarabic

pyarabic

Language: Python - Size: 1.23 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 469 - Forks: 87

naveedhahamed23/Fanqie-novel-Downloader

Download your favorite novels easily with Fanqie Novel Downloader, a modern tool designed for quick and stylish access across multiple platforms.

Language: Python - Size: 2.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

roblillack/tdoc

CLI tool and Rust crate for rendering, converting, and creating nice text documents (FTML, Markdown, HTML, plaintext)

Language: HTML - Size: 1.39 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

LadioLover/x-ad-copy-analyzer

X Ad Copy Analyzer for automated analysis of advertising content.

Size: 36.8 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

LadioLover/x-sentiment-analysis-bot

X Sentiment Analysis Bot for automating sentiment analysis tasks across social media, websites, and reviews.

Size: 36.8 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

apakabarfm/syllabreak-kotlin

Kotlin library for multilingual syllabification and hyphenation

Language: Kotlin - Size: 231 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

mcnemesis/cli_tttt

TEA GitHub Project | The Reference Implementation of TEA (Transforming Executable Alphabet) computer programming language.

Language: Python - Size: 395 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

elektito/finglish

A Finglish to Persian converter.

Language: Python - Size: 2.28 MB - Last synced at: 6 days ago - Pushed at: about 4 years ago - Stars: 86 - Forks: 21

chill-lime/LineByLinePaster

A line-by-line paste tool for macOS

Language: HTML - Size: 1.46 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

nchern/cli-tools

This repo contains a set of handy command line tools

Language: Go - Size: 82 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

smyrgeorge/lexis

A collection of utilities for translating large documents and books.

Language: Python - Size: 1.62 MB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

SpezioC/Doctor_Ticket

Smart ticket classification system (showcase project)

Language: Python - Size: 23.4 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

ikegami-yukino/jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

Language: Python - Size: 379 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 336 - Forks: 32

ipusiron/modular-text-divider

A modular text splitter for cryptanalysis and pattern analysis

Language: JavaScript - Size: 385 KB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

SujethaJanet-2004/document-search-engine

A lightweight offline search engine built using pure Python. It indexes text documents, performs fast keyword search, highlights matches, and provides document insights like frequency analysis and similarity scores.

Language: Python - Size: 1.48 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

KyleDerZweite/word-salat

Scramble text while keeping first and last letters intact (Cambridge effect). Includes CLI, scoring tools, and AI decoding benchmarks.

Language: Python - Size: 35.2 KB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

muijf/treelog

A customizable tree rendering library for Rust.

Language: Rust - Size: 575 KB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

johnsonjh/g

g: A portable general purpose programmable text editor with calculator and macro facility.

Language: C - Size: 3.25 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 38 - Forks: 2

knime/knime-textprocessing

KNIME - Text Processing Extension (Labs)

Language: Java - Size: 63.9 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 19 - Forks: 10

wenet-e2e/WeTextProcessing

Text Normalization & Inverse Text Normalization

Language: Python - Size: 1010 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 694 - Forks: 90

DevExpress-Examples/winforms-wpf-ai-text-extension

Add AI-powered text processing features to the DevExpress UI text components

Language: Visual Basic .NET - Size: 225 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

ethiopicai/Amharic-Text-Processor

Modular Amharic text preprocessing toolkit with composable processors and pipeline.

Language: Python - Size: 207 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 5 - Forks: 0

Ponniedog/bash-ohz

Enhance your Bash experience with bash-ohz, a collection of useful scripts and tools for streamlined command-line productivity.

Size: 1.29 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

mhakantatlici/Column-Crusher-by-MHT

Column Crusher is a web-based PHP tool that cleans, reformats, splits, merges, fixes, and optimizes spreadsheet columns with one click. It's designed to solve daily data-processing headaches at work and in automation workflows.

Language: HTML - Size: 8.79 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

himkt/konoha

🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.

Language: Python - Size: 1.35 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 260 - Forks: 27

Kinosaur/natural-language-processing

This is one of my Courses from 2/2025 CS

Language: Python - Size: 1.5 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

cspnms/MSchunker

Smart text chunker for LLM preprocessing (sections → paragraphs → sentences → hard splits).

Language: Python - Size: 90.8 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

amirivojdan/shekar

Simplifying Persian NLP for Modern Applications

Language: Python - Size: 23.4 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 52 - Forks: 3

shubhro2002/Drug-Class-Prediction-from-Medical-Text

This project demonstrates a complete, end-to-end multi-label NLP pipeline that predicts drug classes from descriptive medical text. The approach combines text engineering, multi-label modeling, and systematic evaluation, forming a foundation for more advanced biomedical NLP applications.

Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

scripal-git/scripal

universal text processor

Language: C++ - Size: 3.05 MB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 6 - Forks: 1

Nikelroid/huffman-encoder-decoder

A robust Java implementation of the Huffman coding algorithm for text compression, featuring both lossless and customizable lossy compression modes with 7-bit block optimization for enhanced storage efficiency.

Language: Java - Size: 374 KB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

microsoft/browsecloud 📦

A web app to create and browse text visualizations for automated customer listening.

Language: TypeScript - Size: 5.58 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 148 - Forks: 19

ChenghaoMou/text-dedup

All-in-one text de-duplication

Language: Python - Size: 58.9 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 730 - Forks: 74

proycon/pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Language: Python - Size: 12.8 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 480 - Forks: 68

cainky/ReplaceText

Replaces text based on a dictionary, given user input to specify which direction (keys-to-values or values-to-keys)

Language: Python - Size: 36.1 KB - Last synced at: 13 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

pivoshenko/ihroteka-converter

A lightweight package for converting Markdown into Steam-compatible markup

Language: Python - Size: 430 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

mahmoudalshukri/regex-helper

A clean and modern RegEx playground built with Next.js 14, TypeScript, Tailwind CSS, and shadcn/ui. Test, debug, highlight, and understand regular expressions with an intuitive UI, real-time matching, pattern library, and token-by-token regex explanations.

Language: TypeScript - Size: 3.77 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

MIT-LCP/bloatectomy

A Python package for removing duplicate text in clinical notes or other documents

Language: TeX - Size: 7.48 MB - Last synced at: 13 days ago - Pushed at: over 5 years ago - Stars: 39 - Forks: 10

digineo/texd

texd wraps TeX in a web API

Language: Go - Size: 1.1 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 14 - Forks: 1

Mindful-AI-Assistants/4-social-buzz-ai--Natural_Language_Processing-NL-Class_1

4 - Social Buzz: NLP - Class 1: This repository provides resources and practical implementations for Natural Language Processing (NLP) focused on social media data analysis. It includes tutorials and demos on NLP preprocessing techniques such as regex, tokenization, lemmatization, stemming, count vectorization, and stopword removal.

Language: Jupyter Notebook - Size: 12.4 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0