GitHub topics: text-processing
Goldziher/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter
Language: HTML - Size: 7.41 MB - Last synced at: about 8 hours ago - Pushed at: about 16 hours ago - Stars: 409 - Forks: 40
Puchaczov/Musoq
SQL Syntax without any database
Language: C# - Size: 17 MB - Last synced at: about 9 hours ago - Pushed at: about 15 hours ago - Stars: 497 - Forks: 21
chmln/sd
Intuitive find & replace CLI (sed alternative)
Language: Rust - Size: 414 KB - Last synced at: about 8 hours ago - Pushed at: 8 months ago - Stars: 6,762 - Forks: 151
Cod-e-Codes/prepend
A fast, safe CLI tool for prepending text to files. Buffered I/O, atomic writes, full test suite.
Language: Rust - Size: 14.6 KB - Last synced at: about 5 hours ago - Pushed at: about 15 hours ago - Stars: 1 - Forks: 0
hrishikeshrt/sanskrit-text
Sanskrit Text (Devanagari) Utility Functions
Language: Python - Size: 46.9 KB - Last synced at: about 5 hours ago - Pushed at: about 16 hours ago - Stars: 3 - Forks: 0
Kaviya121/distil-localdoc.py
Generate complete docstrings for your Python code using a local SLM assistant, while keeping your proprietary information secure.
Language: Python - Size: 1.76 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
stellanomia/uroman-rs
A self-contained Rust reimplementation of uroman, a universal romanizer.
Language: Rust - Size: 1.3 MB - Last synced at: about 4 hours ago - Pushed at: about 1 month ago - Stars: 37 - Forks: 1
Taha5125/DocxWriter-JSON
DocxWriter is a Python library for generating professional Word documents from JSON. Automate reports, add tables, lists, and images, and apply custom styles, all from clean, structured data.
Language: Python - Size: 23.4 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
Fe4rlessxD/parseltongue_mcp
Use the Parseltongue MCP server and its 40+ tools to encode, decode, and transform text with ease, inspired by advanced encoding techniques.
Size: 1.34 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
sashko8877/Replace
Simplify plugin development with Replace, a library for efficient placeholder management and context-based updates.
Language: Kotlin - Size: 1.34 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
Saleh908/anytext2images
Extract images from any text quickly, preview them in a gallery, and download your selections easily as individual files or a ZIP.
Language: Python - Size: 28.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0
sl5net/SL5-aura-service
Your offline, privacy-first voice assistant framework. Transform speech into commands and actions with a powerful, scriptable rule engine.
Language: Python - Size: 344 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 1
shawnacontrary24/DocStripper
Clean up your documents with DocStripper, the AI-powered tool that removes noise like page numbers and duplicates for clear, tidy text.
Language: Python - Size: 2.33 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
VIETCUTEa/awk-fmd
Streamline and manage financial data using awk for efficient processing and transformation in your data workflows.
Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
Dielectricheatingphenylacetamide203/categorized-english-words
Size: 2.08 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
RavyAun/transformer-gesture
Build a Transformer-based gesture recognition system with PyTorch, ONNX, and Gradio for real-time video analysis and efficient inference.
Language: Python - Size: 4.18 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
StarkAj75/kirmanjiku-12
Streamline data management with kirmanjiku-12, a versatile tool designed to enhance efficiency and organization in your projects.
Size: 1.29 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
Ajay2292/sonshell
Control your Sony camera remotely with SonShell, a Linux tool that captures images, tweaks settings, and manages files, all from a single terminal.
Size: 1.39 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0
kantord/headson
head/tail for structured data - summarize/preview JSON/YAML and source code
Language: Rust - Size: 45.6 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 48 - Forks: 3
cuentafre7297/perl-xvp
Simplify vector processing in Perl with perl-xvp, a powerful library for handling and manipulating vectors efficiently.
Size: 1.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
Rokko-Vencht/regex-generator
Generate validated regular expressions easily with this open-source tool. Ideal for developers needing quick regex solutions for forms and text processing.
Language: JavaScript - Size: 1.33 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
albion83/tilelang
Accelerate your GPU/CPU kernel development with Tile Language, a concise and Pythonic DSL for high-performance computing optimizations.
Language: C++ - Size: 8.24 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
yuvrajpandiya/Piero-EnDe-Coder
A powerful encryption and decryption tool that combines the Vigenère cipher, XOR encryption, and Base64 encoding to secure messages. This tool allows users to encode and decode messages using a secret key, ensuring an extra layer of security.
Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
Saffronduck5667/precision-r-comparison
Laboratory 8 - Retrieval Information
Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0
apakabarfm/syllabreak-swift
Multilingual library for accurate and deterministic hyphenation and syllable counting without relying on dictionaries.
Language: Swift - Size: 218 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0
JaweriaAsif745/LawMate
AI-powered contract analysis tool that extracts clauses, highlights risky terms, summarizes documents, and answers questions using NLP. Built with Streamlit + Python.
Language: Python - Size: 1.59 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
coy-flatness463/markdown-translator
Translate Markdown files to Chinese seamlessly using Python and the OpenRouter API, ensuring quality through intelligent splitting and concurrent processing.
Language: Python - Size: 2.47 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
Om20kar05/20250913122858-kardenwort
Accelerate language learning by turning any text into context-rich Anki flashcards with Kardenwort, your intelligent offline study companion.
Language: Jupyter Notebook - Size: 13.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
co-r-e/IrukaDark
Thinking without interruption, your full-screen AI assistant | Code, errors, PRs, charts, papers, tech sites. Instantly understand anything on your screen with a single shortcut.
Language: JavaScript - Size: 2.82 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0
iluvn01/VFMTok
Leverage vision foundation models to transform visual data into effective tokens for autoregressive generation in this PyTorch implementation.
Language: Python - Size: 2.24 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
uludakar/FreeAIchat-2api
Connect freely with various AI models using our straightforward API, enabling engaging multi-turn conversations and real-time information access.
Language: Python - Size: 1.32 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
ZyyzouuSG/bash-ohz
Enhance your Bash experience with streamlined tools and functions designed for efficient script management and improved productivity.
Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
ashdahdadwd/bbc-basic-eqa
Enhance comprehension with bbc-basic-eqa, a tool for efficient question answering in natural language using BBC's basic datasets.
Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
robbiechen1969/convert-md-to-rt
Convert Markdown to Rich Text on macOS easily. Streamline your workflow by automatically transforming clipboard content into Rich Text format.
Language: Shell - Size: 1.89 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0
JeffreyUrban/uniqseq
Stream-based deduplication for repeating sequences
Language: Python - Size: 1.25 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
sepandhaghighi/mytext
MyText: A Minimal AI-Powered Text Rewriting Tool
Language: Python - Size: 152 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 8 - Forks: 0
Anathelegend/perl-efz
Simplify data management with Perl EFZ, an efficient tool for file and data manipulation in Perl applications.
Size: 1.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
phil65/docler
Abstractions & Tools for OCR / document processing
Language: Python - Size: 2.45 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 1
theultimateminecraftgang/MentionAi-Auto-Bot
Automate interactions with the Mention Network API using this Ruby script to generate AI-driven responses and log your Q&A workflows efficiently.
Language: Ruby - Size: 1.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
VitinDM/data-science-snippets
Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.
Language: Python - Size: 30.3 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
pyparsing/pyparsing
Python library for creating PEG parsers
Language: Python - Size: 8.6 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,431 - Forks: 296
Aksherwal/EditNPress
This project is a Flask-based web app that scrapes news text from a news website, cleans and analyses it, and shows the filtered text without any ads.
Language: Python - Size: 7.22 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python - Size: 342 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 8,613 - Forks: 672
mensfeld/llm-docs-builder
Transform and optimize your markdown documentation for Large Language Models (LLMs) and RAG systems. Generate llms.txt automatically.
Language: Ruby - Size: 1.74 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 69 - Forks: 3
apakabarfm/syllabreak-python
Multilingual library for accurate and deterministic hyphenation and syllable counting without relying on dictionaries.
Language: Python - Size: 213 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
AnaPaula04/pii-redaction-demo
Lightweight PII redaction pipeline using Hugging Face NER + regex (Python), with 96.5% accuracy
Language: Python - Size: 29.3 KB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0
mahammed123-lab/bcpl-lnn
Build and manage blockchain-proof learning networks with bcpl-lnn, ensuring secure and efficient data sharing and collaboration.
Size: 1.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0
halrraiser/Universal-Input-Sanitizer
A lightweight toolkit for sanitizing, masking, and safely encoding user input across multiple languages.
Language: Python - Size: 7.81 KB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0
M4UNC/PDF-Package-Analyzer
Analyze PDF files effectively with this Python tool, testing compatibility across libraries to guide optimal PDF processing solutions.
Language: Python - Size: 1.35 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
fmadore/iwac-ai-pipelines
AI pipelines for Omeka S digital collections - OCR correction, entity extraction, and text analysis
Language: Python - Size: 478 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
loderunner/typelit
A type-safe string templating library for TypeScript
Language: TypeScript - Size: 921 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 1
PyThaiNLP/pythainlp
Thai natural language processing in Python
Language: Python - Size: 66 MB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 1,089 - Forks: 287
BurntSushi/aho-corasick
A fast implementation of Aho-Corasick in Rust.
Language: Rust - Size: 4.72 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 1,164 - Forks: 107
Toutl/textlib
A lightweight Java NLP library for building, cleaning, and vectorizing text corpora.
Language: Java - Size: 30.3 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
codeproexpert1/AutoHumanize-Automated-DOCX-Text-Humanization
A Python automation tool that reads DOCX files, splits large text into chunks, and humanizes content using undetected Chrome.
Language: Python - Size: 63.5 KB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0
JeffreyUrban/patterndb-yaml
YAML-based pattern matching with multi-line capabilities for log normalization using syslog-ng patterndb
Language: Python - Size: 422 KB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
catatsuy/purl
Streamlining Text Processing
Language: Go - Size: 247 KB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 229 - Forks: 6
sstadick/hck
A sharp cut(1) clone.
Language: Rust - Size: 515 KB - Last synced at: 7 days ago - Pushed at: 10 days ago - Stars: 722 - Forks: 18
akikareha/himewiki
A simple wiki engine with content filtering by an AI agent
Language: Go - Size: 284 KB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0
KyryloRud/wc-simd
High-performance C++ clone of Unix wc with SIMD acceleration, runtime CPU dispatch, and multithreaded chunking. Designed to handle files over 100 GB efficiently.
Language: CMake - Size: 98.6 KB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0
linuxscout/pyarabic
pyarabic
Language: Python - Size: 1.23 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 469 - Forks: 87
naveedhahamed23/Fanqie-novel-Downloader
Download your favorite novels easily with Fanqie Novel Downloader, a modern tool designed for quick and stylish access across multiple platforms.
Language: Python - Size: 2.2 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0
roblillack/tdoc
CLI tool and Rust crate for rendering, converting, and creating nice text documents (FTML, Markdown, HTML, plaintext)
Language: HTML - Size: 1.39 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
LadioLover/x-ad-copy-analyzer
X Ad Copy Analyzer for automated analysis of advertising content.
Size: 36.8 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
LadioLover/x-sentiment-analysis-bot
X Sentiment Analysis Bot for automating sentiment analysis tasks across social media, websites, and reviews.
Size: 36.8 MB - Last synced at: 4 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
apakabarfm/syllabreak-kotlin
Kotlin library for multilingual syllabification and hyphenation
Language: Kotlin - Size: 231 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0
mcnemesis/cli_tttt
TEA GitHub Project | The Reference Implementation of TEA (Transforming Executable Alphabet) computer programming language.
Language: Python - Size: 395 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0
elektito/finglish
A Finglish to Persian converter.
Language: Python - Size: 2.28 MB - Last synced at: 6 days ago - Pushed at: about 4 years ago - Stars: 86 - Forks: 21
chill-lime/LineByLinePaster
A line-by-line paste tool for macOS
Language: HTML - Size: 1.46 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0
nchern/cli-tools
This repo contains a set of handy command line tools
Language: Go - Size: 82 KB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0
smyrgeorge/lexis
A collection of utilities for translating large documents and books.
Language: Python - Size: 1.62 MB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0
SpezioC/Doctor_Ticket
Smart ticket classification system (showcase project)
Language: Python - Size: 23.4 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0
ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Language: Python - Size: 379 KB - Last synced at: 9 days ago - Pushed at: 11 days ago - Stars: 336 - Forks: 32
ipusiron/modular-text-divider
A modular text splitter for cryptanalysis and pattern analysis
Language: JavaScript - Size: 385 KB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0
SujethaJanet-2004/document-search-engine
A lightweight offline search engine built using pure Python. It indexes text documents, performs fast keyword search, highlights matches, and provides document insights like frequency analysis and similarity scores.
Language: Python - Size: 1.48 MB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0
KyleDerZweite/word-salat
Scramble text while keeping first and last letters intact (Cambridge effect). Includes CLI, scoring tools, and AI decoding benchmarks.
Language: Python - Size: 35.2 KB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0
muijf/treelog
A customizable tree rendering library for Rust.
Language: Rust - Size: 575 KB - Last synced at: 10 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0
johnsonjh/g
g: A portable general purpose programmable text editor with calculator and macro facility.
Language: C - Size: 3.25 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 38 - Forks: 2
knime/knime-textprocessing
KNIME - Text Processing Extension (Labs)
Language: Java - Size: 63.9 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 19 - Forks: 10
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
Language: Python - Size: 1010 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 694 - Forks: 90
DevExpress-Examples/winforms-wpf-ai-text-extension
Add AI-powered text processing features to the DevExpress UI text components
Language: Visual Basic .NET - Size: 225 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0
ethiopicai/Amharic-Text-Processor
Modular Amharic text preprocessing toolkit with composable processors and pipeline.
Language: Python - Size: 207 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 5 - Forks: 0
Ponniedog/bash-ohz
Enhance your Bash experience with bash-ohz, a collection of useful scripts and tools for streamlined command-line productivity.
Size: 1.29 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0
mhakantatlici/Column-Crusher-by-MHT
Column Crusher is a web-based PHP tool for cleaning, reformatting, splitting, merging, fixing, and optimizing spreadsheet columns with one click. It's designed to solve everyday data-processing headaches at work and in automation workflows.
Language: HTML - Size: 8.79 KB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0
himkt/konoha
An easy-to-use Japanese text processing tool that makes it possible to switch tokenizers with small code changes.
Language: Python - Size: 1.35 MB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 260 - Forks: 27
Kinosaur/natural-language-processing
This is one of my Courses from 2/2025 CS
Language: Python - Size: 1.5 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0
cspnms/MSchunker
Smart text chunker for LLM preprocessing (sections → paragraphs → sentences → hard splits).
Language: Python - Size: 90.8 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0
amirivojdan/shekar
Simplifying Persian NLP for Modern Applications
Language: Python - Size: 23.4 MB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 52 - Forks: 3
shubhro2002/Drug-Class-Prediction-from-Medical-Text
This project demonstrates a complete, end-to-end multi-label NLP pipeline that predicts drug classes from descriptive medical text. The approach combines text engineering, multi-label modeling, and systematic evaluation, forming a foundation for more advanced biomedical NLP applications.
Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0
scripal-git/scripal
universal text processor
Language: C++ - Size: 3.05 MB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 6 - Forks: 1
Nikelroid/huffman-encoder-decoder
A robust Java implementation of the Huffman coding algorithm for text compression, featuring both lossless and customizable lossy compression modes with 7-bit block optimization for enhanced storage efficiency.
Language: Java - Size: 374 KB - Last synced at: 12 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0
microsoft/browsecloud
A web app to create and browse text visualizations for automated customer listening.
Language: TypeScript - Size: 5.58 MB - Last synced at: 10 days ago - Pushed at: about 2 years ago - Stars: 148 - Forks: 19
ChenghaoMou/text-dedup
All-in-one text de-duplication
Language: Python - Size: 58.9 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 730 - Forks: 74
proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Language: Python - Size: 12.8 MB - Last synced at: 12 days ago - Pushed at: about 2 years ago - Stars: 480 - Forks: 68
cainky/ReplaceText
Replaces text based on a dictionary, with user input specifying the direction (keys-to-values or values-to-keys)
Language: Python - Size: 36.1 KB - Last synced at: 13 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0
pivoshenko/ihroteka-converter
A lightweight package for converting Markdown into Steam-compatible markup
Language: Python - Size: 430 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0
mahmoudalshukri/regex-helper
A clean and modern RegEx playground built with Next.js 14, TypeScript, Tailwind CSS, and shadcn/ui. Test, debug, highlight, and understand regular expressions with an intuitive UI, real-time matching, pattern library, and token-by-token regex explanations.
Language: TypeScript - Size: 3.77 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0
MIT-LCP/bloatectomy
A Python package for removing duplicate text in clinical notes or other documents
Language: TeX - Size: 7.48 MB - Last synced at: 13 days ago - Pushed at: over 5 years ago - Stars: 39 - Forks: 10
digineo/texd
texd wraps TeX in a web API
Language: Go - Size: 1.1 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 14 - Forks: 1
Mindful-AI-Assistants/4-social-buzz-ai--Natural_Language_Processing-NL-Class_1
4 - Social Buzz: NLP - Class 1: This repository provides resources and practical implementations for Natural Language Processing (NLP) focused on social media data analysis. It includes tutorials and demos on NLP preprocessing techniques such as regex, tokenization, lemmatization, stemming, count vectorization, and stopword removal.
Language: Jupyter Notebook - Size: 12.4 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0