GitHub topics: text-processing
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
Language: Python - Size: 892 KB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 611 - Forks: 85

hitesh22rana/sourcecollector
A simple tool to consolidate multiple files into a single .txt file. Perfect for feeding your files to AI tools without any fuss.
Language: Go - Size: 27.7 MB - Last synced at: about 15 hours ago - Pushed at: about 17 hours ago - Stars: 4 - Forks: 0

Lips7/Matcher
A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matching, implemented in Rust.
Language: Rust - Size: 36.9 MB - Last synced at: about 16 hours ago - Pushed at: about 18 hours ago - Stars: 17 - Forks: 1

pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Language: Python - Size: 331 MB - Last synced at: about 18 hours ago - Pushed at: about 20 hours ago - Stars: 7,559 - Forks: 624

VitinDM/data-science-snippets
🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.
Language: Python - Size: 30.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Goldziher/html-to-markdown
HTML to markdown converter
Language: Python - Size: 453 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 58 - Forks: 14

ChenghaoMou/text-dedup
All-in-one text de-duplication
Language: Python - Size: 5.77 MB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 700 - Forks: 74

Taha5125/DocxWriter-JSON
DocxWriter is a Python library for generating professional Word documents from JSON. Automate reports, add tables, lists, images, and apply custom styles — all from clean, structured data.
Language: Python - Size: 23.4 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

thomaszilliox1/Automated-Consumer-Goods-Classification
This project is focused on segmenting e-commerce customers using unsupervised machine learning models, specifically clustering algorithms.
Language: Jupyter Notebook - Size: 8.81 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

olympus-terminal/unix-utilities
General-purpose UNIX/Linux command-line utilities
Language: Shell - Size: 25.4 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

digineo/texd
texd wraps TeX in a web API
Language: Go - Size: 1000 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 12 - Forks: 1

iarri/Shadertoy2GM
This JavaScript web app converts GLSL code from shadertoy.com to GameMaker GLSL ES and also outputs the other code needed to run it.
Language: JavaScript - Size: 48.8 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 3

helix-editor/nucleo
A fast and convenient fuzzy matcher library for Rust
Language: Rust - Size: 232 KB - Last synced at: 2 days ago - Pushed at: about 2 months ago - Stars: 1,138 - Forks: 41

CyberCRI/refinedoc
Python library for post-extraction refinement of text, such as text derived from PDF extraction.
Language: Python - Size: 23.4 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 7 - Forks: 2

BurntSushi/aho-corasick
A fast implementation of Aho-Corasick in Rust.
Language: Rust - Size: 4.71 MB - Last synced at: 1 day ago - Pushed at: 10 months ago - Stars: 1,122 - Forks: 103
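
As a rough illustration of what Aho-Corasick does (not the Rust crate's API), the following minimal Python sketch builds a trie of the patterns, adds breadth-first failure links, and then finds every pattern occurrence in a single pass over the text. The function names `build_automaton` and `search` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a set of patterns (illustrative sketch)."""
    goto = [{}]   # goto[state][char] -> next state; state 0 is the root
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognized when a state is reached
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # follow their parent's failure chain until the character can be extended.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target (suffix matches).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one scan of text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

For example, `search("ushers", ["he", "she", "his", "hers"])` reports overlapping matches ("she", "he", and "hers") without rescanning the text. Production implementations such as the crate above add many optimizations that this sketch omits.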

victoria217-bottino/google-news-scraper
📰 Google News Scraper: a Python tool to fetch, decode, and process Google News articles by keyword and time range. Extract clean article text, decode URLs, and perform NLP effortlessly. Perfect for news aggregation, analysis, or building bots. Includes progress tracking with `tqdm` and customizable features for advanced use cases. 🚀
Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 3 - Forks: 1

ds-modules/CUNEIF-102A
UC Berkeley CUNEIF 102A (Sumerian Text Analysis) Fall 2017
Language: Jupyter Notebook - Size: 40.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 6 - Forks: 0

tamtural/premium-file-parser
This parser extracts key financial transaction info from fixed-width carrier-generated premium files and classifies transaction types based on receipt references and refund/chargeback indicators.
Language: Python - Size: 0 Bytes - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

12345far/metrics-calculation-precision-recall
Laboratory 7 - Information Retrieval
Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

teenu/gpu-text-search
Ultra-high-performance GPU-accelerated text search using Metal compute shaders
Language: Swift - Size: 554 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

IG-onGit/TexeT
TexeT is the tool you need to take your interaction and content control to the next level.
Language: Python - Size: 117 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Darko-Martinovic/MeetingTranscriptProcessor
🤖 Intelligent meeting transcript processor that automatically extracts action items using Azure OpenAI and creates Jira tickets. Supports multiple file formats with fallback to rule-based processing when AI is unavailable.
Language: C# - Size: 188 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

DougLau/booky
A tool to analyze English text
Language: Rust - Size: 1.92 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

David-Langat/Information_Retrieval
An Information Retrieval system that processes and ranks news articles. It parses XML files, applies stop-word removal and stemming, and uses TF-IDF and BM25 algorithms to score documents against user queries, sorting them by relevance.
Language: Python - Size: 69.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0
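
The ranking step described above can be sketched with Okapi BM25. This is a minimal Python illustration, not the repository's code: the tiny `STOP_WORDS` set is hypothetical, stemming and XML parsing are omitted, `k1=1.5` and `b=0.75` are common default choices, and the `+1` inside the logarithm follows the Lucene-style IDF variant that keeps weights non-negative.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "of", "and", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def bm25_rank(docs, query, k1=1.5, b=0.75):
    """Return document indices sorted by descending BM25 score for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((score, idx))
    # Stable sort keeps the original order among equal scores.
    return [idx for score, idx in sorted(scores, key=lambda p: -p[0])]
```

With `docs = ["the cat sat on the mat", "dogs and cats", "the cat chased the cat"]`, the query `"cat"` ranks the third document first because it repeats the term in a shorter-than-average text.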

omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London collaboration to standardize text and table data extracted from full text publications. See Open Access publication at: https://doi.org/10.3389/fdgth.2022.788124.
Language: HTML - Size: 57.1 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 21 - Forks: 8

linuxscout/pyarabic
pyarabic
Language: Python - Size: 1.23 MB - Last synced at: about 9 hours ago - Pushed at: over 1 year ago - Stars: 459 - Forks: 88

yuvrajpandiya/Piero-EnDe-Coder
A powerful encryption and decryption tool that combines the Vigenère cipher, XOR encryption, and Base64 encoding to secure messages. This tool allows users to encode and decode messages using a secret key, ensuring an extra layer of security.
Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0
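
One way the three layers mentioned above could be chained is sketched below in Python. The order (Vigenère on letters, then XOR with the repeating key bytes, then Base64 for printable output) and all function names are assumptions for illustration; the project's actual pipeline is not shown in the listing. The key is assumed to be alphabetic. Note that Vigenère and repeating-key XOR are classical ciphers and are easy to break, so this is a teaching sketch only.

```python
import base64
from itertools import cycle


def vigenere(text, key, decrypt=False):
    """Shift ASCII letters by the (alphabetic) key; other characters pass through."""
    shifts = cycle(ord(k) - ord("a") for k in key.lower())
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            s = next(shifts)
            s = -s if decrypt else s
            out.append(chr((ord(ch) - base + s) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def xor_bytes(data, key):
    """XOR each byte with the repeating key bytes (applying it twice undoes it)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key.encode())))


def encode(message, key):
    step1 = vigenere(message, key)                   # layer 1: Vigenère
    step2 = xor_bytes(step1.encode("utf-8"), key)    # layer 2: XOR
    return base64.b64encode(step2).decode("ascii")   # layer 3: Base64 text


def decode(token, key):
    step2 = base64.b64decode(token)                  # undo Base64
    step1 = xor_bytes(step2, key).decode("utf-8")    # undo XOR
    return vigenere(step1, key, decrypt=True)        # undo Vigenère
```

Decoding simply applies the inverse of each layer in reverse order, so `decode(encode(msg, key), key)` returns the original message.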

Automattic/go-search-replace
🚀 Search & replace URLs in WordPress SQL files.
Language: Go - Size: 104 KB - Last synced at: 2 days ago - Pushed at: 23 days ago - Stars: 98 - Forks: 19

pyparsing/pyparsing
Python library for creating PEG parsers
Language: Python - Size: 7.8 MB - Last synced at: 3 days ago - Pushed at: 9 days ago - Stars: 2,358 - Forks: 291

open-korean-text/open-korean-text
Open Korean Text Processor - An Open-source Korean Text Processor
Language: Scala - Size: 32.7 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 634 - Forks: 98

KaizoKonpaku/Hush
AI-Powered Screenshot, Audio Transcription, and Text Processing for macOS, Hidden from Screen Sharing, Packed with Features, and Just 2MB
Language: Swift - Size: 12 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 40 - Forks: 9

andalugeeks/andaluh-py
Transliterate español (Spanish) spelling to andaluz (Andalusian) proposals using Python
Language: Python - Size: 802 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 23 - Forks: 3

fossology/atarashi
Atarashi scans for license statements in open source software, focusing on text statistics. Designed to work stand-alone and with FOSSology.
Language: Python - Size: 46.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 29 - Forks: 29

alihoseiny/word_cloud_fa
A wrapper for wordcloud module for creating Persian word clouds.
Language: Python - Size: 1.76 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 145 - Forks: 13

PyThaiNLP/pythainlp
Thai natural language processing in Python
Language: Python - Size: 65.6 MB - Last synced at: 6 days ago - Pushed at: 12 days ago - Stars: 1,045 - Forks: 280

milliorn/cli-password-generators
Simple command-line applications for generating passwords
Language: Go - Size: 6.87 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

casics/nostril 📦
Nostril: Nonsense String Evaluator
Language: Python - Size: 143 MB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 194 - Forks: 35

dataout-org/hate_crimes_2010_2023
Identifying hate crimes against LGBTQIA+ people in Russia in court rulings
Language: Jupyter Notebook - Size: 20.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

rhiosutoyo/Teaching-Deep-Learning-and-Its-Applications
This course introduces the building blocks of deep learning and provides an overview of various deep learning architectures. It also demonstrates how to solve real-world problems using a practical approach.
Language: Jupyter Notebook - Size: 30.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

fullscreen-triangle/kwasa-kwasa
Semantic computing framework with meta-cognitive orchestration and biomimetic principles
Language: Rust - Size: 9.93 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 0

hyung-hwan/hawk
An AWK interpreter
Language: C - Size: 4.41 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8 - Forks: 1

Willgnner-Santos/DPE-Legal-Doc-Classification-Pipeline
The results are drawn from experiments on the classification of legal documents using LLMs in a real-world institutional setting
Language: Jupyter Notebook - Size: 45.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

rhaberkorn/sciteco
Advanced TECO dialect and interactive screen editor based on Scintilla
Language: C - Size: 3.61 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 51 - Forks: 6

Sumit-807/newsnow
NewsNow offers a clean and elegant interface for reading real-time trending news. 🌐 Dive into the latest updates and enjoy seamless access with GitHub OAuth integration! 🐙
Language: TypeScript - Size: 4.55 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

homeofhx/Text-Purifier
Simple Mac application that filters out specific characters from a given text using regular expressions (regex)
Language: Swift - Size: 1.14 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

alirezatheh/perke
A keyphrase extractor for Persian
Language: Python - Size: 143 KB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 69 - Forks: 8

theveryhim/Frequent-item-sets-And-LSH
Practice exercises on finding frequent item sets and similar items with the PySpark framework
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

davidavidnitish/CoreTex
Discover the CORTEX Anomaly Detection app with real-time AI and facial recognition. Explore its cyberpunk interface and advanced features. 🌐💻
Size: 1000 Bytes - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

kupolak/textstat
Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.
Language: Ruby - Size: 242 KB - Last synced at: 7 days ago - Pushed at: 12 months ago - Stars: 34 - Forks: 10

maqeel019/ATS
A powerful Python-based ATS that parses and ranks PDF resumes on recruiter-defined filters like skills, education, and experience. Handles scanned and complex resumes with detailed scoring and Excel output.
Language: Python - Size: 6.77 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

sstadick/hck
A sharp cut(1) clone.
Language: Rust - Size: 494 KB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 715 - Forks: 18

derek73/python-nameparser
A simple Python module for parsing human names into their individual components
Language: Python - Size: 778 KB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 675 - Forks: 105

twardoch/split-markdown4gpt
A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.
Language: Python - Size: 78.1 KB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 24 - Forks: 2

ProfRandom/Excel-Lambda-Suite
Reusable Excel LAMBDA function library for modeling, simulation, statistics, and advanced spreadsheet design.
Size: 2.02 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

nilskruthoff/pptx-parser
Parses PowerPoint presentations into Markdown syntax
Language: Rust - Size: 145 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

google/diff-match-patch 📦
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Language: Python - Size: 659 KB - Last synced at: 12 days ago - Pushed at: about 1 year ago - Stars: 7,804 - Forks: 1,144

notesjor/corpusexplorer2.0
Corpus linguistics has never been so easy...
Language: C# - Size: 32.5 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 23 - Forks: 3

EDeev/chatping_abobot
A multifunctional Telegram bot for managing groups, with activity analytics, smart mentions, and interactive features
Language: Python - Size: 3.24 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

dnafication/llm-textfix
Sanitize LLM output by detecting and replacing 25+ problematic characters
Language: TypeScript - Size: 0 Bytes - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0
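
The detect-and-replace idea can be sketched in a few lines of Python with `str.maketrans` and `str.translate`. The mapping below is a hypothetical sample of commonly troublesome characters; it does not reproduce the project's actual list of 25+ characters, and the function names are illustrative.

```python
# Hypothetical sample mapping; the project's own character list is not reproduced.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "-",    # en dash and em dash
    "\u2026": "...",                 # horizontal ellipsis
    "\u00a0": " ",                   # no-break space
    "\u200b": "",                    # zero-width space (deleted)
}
TABLE = str.maketrans(REPLACEMENTS)


def find_problems(text):
    """Return the distinct problematic characters present, as code point names."""
    return sorted(f"U+{ord(ch):04X}" for ch in set(text) if ch in REPLACEMENTS)


def sanitize(text):
    """Replace every mapped character in a single translate() pass."""
    return text.translate(TABLE)
```

A translation table keeps the replacement to one pass over the string, and mapping to an empty string deletes invisible characters outright.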

ZeroX-DG/vi-rs
Vietnamese Input Method library
Language: Rust - Size: 385 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 152 - Forks: 15

bugbundle/texdora
A unique Docker image to build LaTeX documentation.
Language: Dockerfile - Size: 226 KB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

paul-j-lucas/wrap
Text reformatter better than fmt(1) or fold(1).
Language: C - Size: 2.94 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 16 - Forks: 4

voidful/TFkit
🤖📇 Handling multiple NLP tasks in one pipeline
Language: Python - Size: 15.9 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 56 - Forks: 6

Arash-Mansourpour/MultiAgent-Chain-of-Expert
MultiAgent Chain of Expert: A Python app using Groq API for dual-model text processing. Gemma analyzes, LLaMA responds, with a modern tkinter GUI. Features history tracking, file I/O, and customizable AI settings. Secure API key handling via .env. MIT License.
Language: Python - Size: 0 Bytes - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

bocaletto-luca/TextEditorQt
This program is a simple text editor with an intuitive user interface, created using the PyQt5 framework for developing desktop applications in Python. The text editor provides many basic features expected from an editor, along with advanced functionalities such as text formatting.
Language: Python - Size: 34.2 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 8 - Forks: 2

blueheron786/line-by-line-quran
Scrapes the Qur'an text from quran.com and generates one page per file, with one line per line of the mushaf. This is the "15 line mushaf" which is also known as the Uthmani and Madini mushaf.
Language: Python - Size: 3.91 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

Hasnat-Aarif-Aslam/NLP-Foundation-Tokens-Ngrams-BoW-TF-IDF-TFIDF
Comprehensive guide to text preprocessing and vectorization techniques for NLP, covering tokenization, n-grams, Bag-of-Words, TF-IDF, and related feature-engineering methods.
Size: 2.93 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0
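
The techniques listed above fit together roughly as follows: tokenize, optionally form n-grams, count terms per document (bag of words), then weight counts by inverse document frequency. This minimal Python sketch uses whitespace tokenization and the unsmoothed textbook weighting tf × ln(N / df); libraries such as scikit-learn default to a smoothed IDF instead. Function names are illustrative, not taken from the repository.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams joined with spaces, e.g. bigrams for n=2."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency of each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # bag of words for this document
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that appear in every document (such as "the" in `["the cat", "the dog"]`) get weight 0, while distinguishing terms like "cat" keep a positive weight.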

himkt/konoha
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Language: Python - Size: 1.35 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 251 - Forks: 28

Moez-lab/parallel-keyword-scanner
High-performance keyword scanner for text and PDF files with multiprocessing and a modern React UI.
Language: TypeScript - Size: 80.1 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

samwega/obsidian-wordsmith Fork of chrisgrieser/obsidian-proofreader
AI-powered context-aware writing assistant for Obsidian. Instantly improve, translate, or generate new text with context-aware AI inline suggestions, custom prompts, and granular review. Supports ALL remote and local models. Enjoy a seamless, keyboard-first workflow for editing, refining, and creative writing—all within your notes.
Language: TypeScript - Size: 986 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

rlayers/pawpaw
Text Processing & Segmentation Framework
Language: Python - Size: 2.52 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 23 - Forks: 4

proycon/pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Language: Python - Size: 12.8 MB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 475 - Forks: 68

alexandersisco/kubun
Python-style slicing for paths and delimiter-separated strings, from your terminal.
Language: Go - Size: 29.3 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

arverma/HindiXlit Fork of AI4Bharat/IndicXlit
Transliteration models from Roman script to Devanagari
Language: Python - Size: 45.8 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

twardoch/wiktra2 Fork of kbatsuren/wiktra
Wiktra: transliteration tool using Wiktionary transliteration modules. Version 2 (fork)
Language: Lua - Size: 1.29 MB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 4 - Forks: 0

Mukeshthenraj/date-extraction-project
Extract and normalize dates from unstructured medical notes using Python and regular expressions.
Language: Python - Size: 40 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0
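
A regex-based extract-and-normalize step might look like the Python sketch below. The two patterns (numeric `MM/DD/YYYY`, assuming US month-first order, and "Month D, YYYY") are hypothetical examples; real clinical notes need many more formats. Matches that do not form a valid calendar date are skipped, and results are returned as ISO 8601 strings in order of appearance.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Assumed formats: 03/14/2019 (month first) and "March 2, 2020" / "Sept. 5 2021".
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
WORDY = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+(\d{1,2}),?\s+(\d{4})\b")


def extract_dates(note):
    """Return normalized ISO dates found in the note, in order of appearance."""
    found = []
    for m in NUMERIC.finditer(note):
        month, day, year = map(int, m.groups())
        try:
            found.append((m.start(), date(year, month, day)))
        except ValueError:
            continue  # e.g. 13/45/2020 is not a real date
    for m in WORDY.finditer(note):
        month = MONTHS[m.group(1).lower()]
        try:
            found.append((m.start(), date(int(m.group(3)), month, int(m.group(2)))))
        except ValueError:
            continue
    return [d.isoformat() for _, d in sorted(found)]
```

Sorting on the match position restores document order even though the two patterns are scanned separately.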

loderunner/typelit
A type-safe string templating library for TypeScript
Language: TypeScript - Size: 381 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 1

dewanakl/aman
🤬 A simple profanity filter using regex. Check, censor, and remove offensive words, including ones written with look-alike character patterns.
Language: PHP - Size: 85 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 2 - Forks: 2

shama-llama/pdf-epub-converter
PDF to EPUB conversion using ML for layout detection
Language: Python - Size: 140 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

binsarjr/chatbot-indonesia
A collection of data to be used for Indonesian-language chatbots, with simple chatbot code written in TypeScript
Language: TypeScript - Size: 559 KB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 35 - Forks: 12

sunsided/merge-whitespace-rs
Procedural macros for merging whitespace in const contexts
Language: Rust - Size: 101 KB - Last synced at: 10 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

weiwei/silabacion
Convert Spanish words into syllables
Language: TypeScript - Size: 1.62 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 8 - Forks: 0

hakatashi/japanese.js
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Language: JavaScript - Size: 283 KB - Last synced at: 12 days ago - Pushed at: almost 5 years ago - Stars: 168 - Forks: 3

znwang25/fuzzychinese
A small package to fuzzy match Chinese words
Language: Python - Size: 1.81 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 88 - Forks: 10

Puchaczov/Musoq
SQL Syntax without any database
Language: C# - Size: 15.7 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 482 - Forks: 21

YULINHEEE/NLP-text-preprocessing-and-classification
Starter code to solve real-world text data problems related to job advertisements. Includes: Word2Vec, phrase embeddings, Text Classification with Logistic Regression, simple text preprocessing, pre-trained embeddings and more.
Language: Jupyter Notebook - Size: 1.21 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

shohanur-shoron/bangla_normalizer
A Python library designed to convert various written forms of Bengali text elements (like numbers, dates, times, currency, percentages, distances, etc.) into their corresponding spoken word representations.
Language: Python - Size: 96.7 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

bithead21/parcel
Parser for C++ programs! Parcel is a simple language for parsing text information and retrieving any data.
Language: C++ - Size: 1.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 2 - Forks: 0

guillaumeast/mentorai
Turn any YouTube channel into a full Custom GPT (avatar, settings, transcripts)
Language: Shell - Size: 61.5 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

sdleffler/qp-trie-rs
An idiomatic and fast QP-trie implementation in pure Rust.
Language: Rust - Size: 80.1 KB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 25

mary-lev/mary-lev.github.io
Just another blog
Language: HTML - Size: 19.9 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

phil65/docler
Abstractions & Tools for OCR / document processing
Language: Python - Size: 2.28 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

brothersincode/virastar
Cleaning up Persian texts!
Language: JavaScript - Size: 1.3 MB - Last synced at: 1 day ago - Pushed at: 2 months ago - Stars: 138 - Forks: 15

hasinhayder/javascript-text-expander
Expands texts as you type, naturally
Language: JavaScript - Size: 12.7 KB - Last synced at: 10 days ago - Pushed at: almost 2 years ago - Stars: 67 - Forks: 19

Lord-Memester/tagger-txt-to-XMP
A python script to convert the .txt files generated by an automatic tagger plugin for Automatic1111's stable diffusion Web UI into XMP sidecar files interpretable by Immich.
Language: Python - Size: 105 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 0

LunarisApp/text-tools
A collection of text processing tools
Language: TypeScript - Size: 2.58 MB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

AlanSteinbarth/Audio2Tekst
A professional audio-to-text converter using OpenAI Whisper. Supports batch processing and export to various formats (TXT, DOCX, PDF). GUI with drag & drop, progress tracking, and transcription quality settings. Ideal for journalists, students, and content creators.
Language: Python - Size: 3.47 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

DineshDhamodharan24/Data_Science_Final_Project
Customer Insights & Recommendation System: Harnessing Decision Tree, Logistic Regression, and Random Forest models for behavior analysis. Utilizing EasyOCR and the Python Imaging Library to extract information from images. Employing NLTK for sentiment analysis on textual data.
Language: Jupyter Notebook - Size: 21.1 MB - Last synced at: 17 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

MonikaBarget/atr-historical-research
Automated Text Recognition in Historical Research
Language: Jupyter Notebook - Size: 2.92 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 14

Romelium/mpatch
A fuzzy patch tool in Rust for applying AI-generated diffs from markdown, ignoring line numbers.
Language: Rust - Size: 0 Bytes - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0
