Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: text-extraction

Repositories

lu4p/cat

Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.

Language: Go - Size: 215 KB - Last synced: about 5 hours ago - Pushed: 6 months ago - Stars: 89 - Forks: 18

TYPO3-Solr/ext-tika

A TYPO3 CMS extension that provides Apache Tika functionality

Language: PHP - Size: 2.11 MB - Last synced: about 18 hours ago - Pushed: 2 days ago - Stars: 6 - Forks: 29

edhou20/Medical-Texts-NLP-Clustering-

Language: Python - Size: 8.79 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

OCR with Tesseract and OpenCV: Extract text from images effortlessly. Preprocess with OpenCV for accuracy. Display results and save output. Easy integration for document digitization and data entry automation.

Language: Python - Size: 1.12 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

miso-belica/jusText

Heuristic based boilerplate removal tool

Language: Python - Size: 1.01 MB - Last synced: 2 days ago - Pushed: 7 days ago - Stars: 683 - Forks: 78

miso-belica/sumy

Module for automatic summarization of text documents and HTML pages.

Language: Python - Size: 1.57 MB - Last synced: 4 days ago - Pushed: 10 days ago - Stars: 3,429 - Forks: 524

bookieio/breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

Language: HTML - Size: 604 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 203 - Forks: 26

py-pdf/benchmarks

Benchmarking PDF libraries

Language: Python - Size: 3.73 MB - Last synced: 7 days ago - Pushed: 7 months ago - Stars: 153 - Forks: 8

nguyen-tho/ID-card-extract-module

Language: Python - Size: 59.5 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 2 - Forks: 0

ropensci/pdftools

Text Extraction, Rendering and Converting of PDF Documents

Language: C++ - Size: 1.08 MB - Last synced: 4 days ago - Pushed: 7 months ago - Stars: 500 - Forks: 69

ICIJ/datashare

A self-hosted search engine for documents.

Language: Java - Size: 313 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 542 - Forks: 48

jmriebold/BoilerPy3 (fork of mercuree/BoilerPy)

Python port of Boilerpipe library

Language: Python - Size: 188 KB - Last synced: 9 days ago - Pushed: 7 months ago - Stars: 76 - Forks: 17

WonhoZhung/ee474

EE474 Term Project

Language: Python - Size: 457 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 3 - Forks: 1

unidoc/unipdf

Golang PDF library for creating and processing PDF files (pure go)

Language: Go - Size: 112 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 2,368 - Forks: 245

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Language: Scala - Size: 39.5 MB - Last synced: 5 days ago - Pushed: 3 months ago - Stars: 133 - Forks: 33

chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Language: Python - Size: 31.5 MB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 1,420 - Forks: 233

OwenOrcan/YiraBot-Crawler

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.

Language: Python - Size: 207 KB - Last synced: 11 days ago - Pushed: 2 months ago - Stars: 13 - Forks: 0

pd3f/pd3f-core

📑 Python Package to reconstruct the original continuous text from PDFs with language models

Language: Jupyter Notebook - Size: 1.31 MB - Last synced: 6 days ago - Pushed: 8 months ago - Stars: 35 - Forks: 8

amenezes/aiopytesseract

A Python asyncio wrapper for Tesseract-OCR.

Language: Python - Size: 2.13 MB - Last synced: 14 days ago - Pushed: 3 months ago - Stars: 15 - Forks: 5

iscc/mobi

Python-based software to unpack KindleGen-generated ebooks

Language: Python - Size: 761 KB - Last synced: 20 days ago - Pushed: over 1 year ago - Stars: 55 - Forks: 8

ckorzen/pdf-text-extraction-benchmark

A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.

Language: TeX - Size: 505 MB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 60 - Forks: 11

weareprestatech/hotpdf

hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six

Language: Python - Size: 16.5 MB - Last synced: 20 days ago - Pushed: about 2 months ago - Stars: 164 - Forks: 8

gamemaker1/office-text-extractor

Yet another library to extract text from MS Office and PDF files

Language: TypeScript - Size: 2.71 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 42 - Forks: 3

abhinaba-ghosh/any-text

Get text content from any file

Language: JavaScript - Size: 226 KB - Last synced: 26 days ago - Pushed: 3 months ago - Stars: 54 - Forks: 8

HilaManor/Scene-Understanding-Based-on-Text-Extraction

An algorithm (and a wrapping system) for finding the geographic location of a set of given photos based on extracting and analyzing text in the images.

Language: Python - Size: 25.3 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1

adbar/trafilatura

Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

Language: Python - Size: 23.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2,688 - Forks: 205

flairNLP/fundus

A very simple news crawler with a funny name

Language: Python - Size: 14.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 38 - Forks: 5

zanachka/dateparser (fork of scrapinghub/dateparser)

python parser for human readable dates

Language: Python - Size: 5.03 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

zanachka/extruct (fork of scrapinghub/extruct)

Extract embedded metadata from HTML markup

Language: Python - Size: 990 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

cdown/srt

A simple library and set of tools for parsing, modifying, and composing SRT files.

Language: Python - Size: 406 KB - Last synced: 30 days ago - Pushed: about 2 months ago - Stars: 422 - Forks: 44

Banner-19/Extraction-and-Analysis-of-Text

The objective is to analyze text content from a list of URLs. This involves extracting article titles and text, then performing natural language processing to generate metrics like sentiment, readability, and word usage. Finally, the results are stored for further analysis or visualization.

Language: Jupyter Notebook - Size: 459 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

IDisposable/IFilterExtractor

A simple component to extract just the text from any file that has an IFilter installed. Available as a C++ COM component and as a C# .NET library.

Language: C++ - Size: 42 KB - Last synced: about 1 month ago - Pushed: about 7 years ago - Stars: 8 - Forks: 5

Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

MRGRD56/textractor-translator

Translate visual novels and other games in real time

Language: TypeScript - Size: 2.17 MB - Last synced: 29 days ago - Pushed: 30 days ago - Stars: 9 - Forks: 0

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 34.5 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 44 - Forks: 11

rmottanet/unchainedtext

UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.

Language: Python - Size: 31.3 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

shixzie/nlp 📦

[UNMAINTAINED] Extract values from strings and fill your structs with nlp.

Language: Go - Size: 50.8 KB - Last synced: 16 days ago - Pushed: over 6 years ago - Stars: 386 - Forks: 33

mohitpg/SimpleTextExtractor

Text extraction using pytorch

Language: Jupyter Notebook - Size: 4.49 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

dotfurther/OpenDiscoverSDK

.NET 6 API for document file format identification, text/metadata/attachment/embedded object/sensitive item (PII/PHI)/entity extraction.

Language: C# - Size: 170 MB - Last synced: about 2 months ago - Pushed: 3 months ago - Stars: 12 - Forks: 0

unidoc/unidoc

This repository has moved! https://github.com/unidoc/unipdf

Language: Go - Size: 29.3 MB - Last synced: 29 days ago - Pushed: almost 5 years ago - Stars: 704 - Forks: 88

fourdigits/wagtail_textract

Text extraction for Wagtail document search

Language: Python - Size: 1.02 MB - Last synced: 19 days ago - Pushed: 7 months ago - Stars: 31 - Forks: 13

dayrev/extractor

Web Page Content Extractor

Language: PHP - Size: 45.9 KB - Last synced: about 2 months ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 0

rajdeep2804/Automated_Invoice_Processing

The number of types of physical documents being digitized is increasing. Medical bills, bank documents, and personal documents are examples. The objective of this repo is to implement and understand such use cases through an example of extracting text information from invoice receipts.

Language: Jupyter Notebook - Size: 16.4 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 5 - Forks: 0

AndyTheFactory/article-extraction-dataset

Article title, authors, date and body extraction dataset.

Language: HTML - Size: 31.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

globality-corp/deboiler

Deboiler - Boilerplate Identification and Removal

Language: Python - Size: 1.42 MB - Last synced: 20 days ago - Pushed: 4 months ago - Stars: 7 - Forks: 0

hscspring/pnlp

NLP pre-/post-processing tools.

Language: Python - Size: 230 KB - Last synced: 19 days ago - Pushed: 4 months ago - Stars: 27 - Forks: 7

Rindhujatreesa/Deep_Learning_Projects

This is the repository for Deep Learning Projects that include Classification using CNN, TensorFlow, Keras, and Text Extraction using PyTesseract

Language: Jupyter Notebook - Size: 9.1 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

hreikin/pdf-toolbox

Extract content from PDFs and convert or create new documents from the content in multiple output formats.

Language: Python - Size: 7.57 MB - Last synced: 2 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0

Lanjkn/Text-Extractor

API to get text from multiple types of files

Language: Python - Size: 4.88 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

FileFormatInfo/ff-pdf2txt

Simple server to extract text from a PDF

Language: Java - Size: 5.99 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 3 - Forks: 2

mciccale/ScholarVista

ScholarVista analyses research papers and extracts/plots information about them. It uses Grobid to extract all the content of the research papers. Then all this data is plotted and displayed using Python.

Language: Python - Size: 3.24 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

sambitdash/PDFIO.jl

PDF Reader Library for Native Julia.

Language: Julia - Size: 24.4 MB - Last synced: 6 days ago - Pushed: 7 months ago - Stars: 122 - Forks: 13

sankeer28/URL-Extractor-and-Downloader

Extracts multiple URLs from text, and if downloadable, downloads them into a ZIP

Language: HTML - Size: 21.5 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

heussd/pdftotext-go

Extract texts + their page numbers from PDF

Language: Go - Size: 206 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 3 - Forks: 0

dataiku/dss-plugin-tesseract-ocr

Dataiku DSS plugin to perform optical character recognition (OCR) using the Tesseract engine.

Language: Python - Size: 2.38 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 1 - Forks: 0

skylander86/lambda-text-extractor

AWS Lambda functions to extract text from various binary formats.

Language: Python - Size: 111 MB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 171 - Forks: 40

pd3f/pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

Language: HTML - Size: 930 KB - Last synced: 3 months ago - Pushed: 7 months ago - Stars: 246 - Forks: 33

whitelok/image-text-localization-recognition

A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition.

Size: 333 KB - Last synced: 3 months ago - Pushed: 8 months ago - Stars: 936 - Forks: 238

ssciwr/AMMICO

AI Media and Misinformation Content Analysis Tool: Analyze text and images

Language: Python - Size: 87.7 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 4 - Forks: 3

ad-freiburg/pdftotext-plus-plus

A fast and accurate command line tool for extracting text from PDF files.

Language: C++ - Size: 18.2 MB - Last synced: about 2 months ago - Pushed: 7 months ago - Stars: 8 - Forks: 0

Jaha96/tesseract-quick-implementation

Tesseract-OCR quick implementation. Linked to a Stack Overflow question.

Language: HTML - Size: 190 MB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

Altabeh/tesseract-ocr-wrapper

This is a highly efficient python wrapper for tesseract-ocr.

Language: Python - Size: 26.4 KB - Last synced: 2 months ago - Pushed: almost 2 years ago - Stars: 16 - Forks: 3

yoshihikoueno/pdfminer-layout-scanner (fork of dpapathanasiou/pdfminer-layout-scanner)

A more complete example of programming with PDFMiner, which continues where the default documentation stops

Language: Python - Size: 26.4 KB - Last synced: about 1 month ago - Pushed: almost 5 years ago - Stars: 8 - Forks: 4

KalyanM45/Optical-Character-Recognition

This project is a Python-based Optical Character Recognition (OCR) application using the EasyOCR library. It provides a convenient way to detect and recognize text in images, making it useful for a wide range of applications such as document processing, image captioning, and text extraction.

Language: Jupyter Notebook - Size: 720 KB - Last synced: 14 days ago - Pushed: 12 months ago - Stars: 3 - Forks: 0

vsymbol/CUTIE

CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

Language: Python - Size: 2.87 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 156 - Forks: 80

andrealenzi11/py-poppleract

Python library and Web service based on Poppler Pdftotext utility and Tesseract OCR for extracting text from PDF documents

Language: Python - Size: 195 KB - Last synced: 3 months ago - Pushed: 5 months ago - Stars: 5 - Forks: 0

Anannyap7/text-extraction-from-image (fork of taritkandpal/text-extraction-from-image)

Handwritten Text Extraction from an Image of a Document using Transformers

Language: Python - Size: 13.7 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

lykmapipo/US-Inaugural-Addresses

Python scripts to download, process, and analyze US Inaugural Addresses

Language: Python - Size: 4.45 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

vaites/php-apache-tika

Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats

Language: PHP - Size: 13.8 MB - Last synced: 21 days ago - Pushed: 9 months ago - Stars: 111 - Forks: 21

sanidhyajadaun/MediLink (fork of prakratisingh/MediLink)

MediLink is a web application that revolutionizes health record management by seamlessly integrating NLP techniques for handwritten text extraction on prescriptions and blockchain technology for secure data storage.

Language: HTML - Size: 146 KB - Last synced: 4 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

ParisaArbab/Data-Modeling

Retrieves data from two different websites, loads it into a PostgreSQL database using Python, and combines it to derive and present new information

Language: Python - Size: 2.79 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

rajesh-bhat/spark-ai-summit-2020-text-extraction

Language: Jupyter Notebook - Size: 105 MB - Last synced: 2 months ago - Pushed: over 3 years ago - Stars: 58 - Forks: 33

ingmarboeschen/JATSdecoder

A text extraction and manipulation toolset for NISO-JATS coded XML files

Language: R - Size: 2.64 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 14 - Forks: 0

dotfurther/OpenDiscoverPlatformCaseStudy

Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.

Size: 5.92 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 10 - Forks: 0

Artaal/License_Plate_Detection_And_Text_Extraction

License plate localizer using pre-trained YOLOv5, combined with text extraction using pre-trained TrOCR

Language: Python - Size: 239 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0

atahanuz/yt2text

Extract text from a YouTube video in a single command, using OpenAI's Whisper speech recognition model.

Size: 10.7 KB - Last synced: 2 days ago - Pushed: 6 months ago - Stars: 1 - Forks: 0

jessp01/zaje

Highlight/colourise command output, logfiles (and anything else really) based on regex pattern matching

Language: Go - Size: 8.04 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 4 - Forks: 0

SapienzaNLP/extend

Entity Disambiguation as text extraction (ACL 2022)

Language: Python - Size: 71.3 KB - Last synced: 6 months ago - Pushed: about 2 years ago - Stars: 148 - Forks: 9

jhw296/BookScanner

A simple book scanner project using PyQt5 (searches for and displays book information via barcode recognition and text extraction)

Language: Python - Size: 83 KB - Last synced: 7 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

Govind-S-B/pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. It also includes a script to query the Chroma DB for similarity search based on user input.

Language: Python - Size: 0 Bytes - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

sharmaroshan/Text-Classification

This is a project assignment in which I learned to classify different texts using clustering techniques. It combines natural language processing and clustering; I used k-means clustering to implement the solution.

Language: HTML - Size: 88.9 KB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 1

nehalAggarwal/Linguista

An application that translates foreign text on signboards, menu cards, etc. by capturing an image of it and translating it into the desired language in real time. Supports Spanish, German, French, Punjabi, Hindi, etc.

Language: Python - Size: 16.5 MB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

anhthuan1999/PhoBERT-Extraction

Extract vectors for sentences and words from one layer or by concatenating multiple layers

Language: Python - Size: 5.86 KB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 0

nezamtrm/Extracting-contents-of-a-table-in-pdf-file-by-pdfplumber

Language: Jupyter Notebook - Size: 20.5 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

nbdy/prntscrngrb

prnt.sc / lightshot crawler, nudity detection and text extraction to a sqlite database

Language: Python - Size: 68.3 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

abs-sayem/nlp

NLP, NLP Basic. Related NLP Projects

Language: Python - Size: 49.3 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

juliandavidmr/text2locale

Extract all the texts of any project with HTML files and generate a KV (Key-Value) file, key = reference key, value = extracted text.

Language: JavaScript - Size: 328 KB - Last synced: 3 months ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

nainiayoub/pdf-text-data-extractor

PDF text data extraction web app with OCR for scanned documents

Language: Python - Size: 56.6 KB - Last synced: 9 months ago - Pushed: 10 months ago - Stars: 43 - Forks: 21

mknz/mirusan 📦

A PDF collection reader with built-in full-text search engine

Language: JavaScript - Size: 2.71 MB - Last synced: 9 months ago - Pushed: almost 7 years ago - Stars: 19 - Forks: 0

FurkanOM/basic-web-crawler

It's a basic web crawler API implementation

Language: JavaScript - Size: 239 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1

SaviorVX/WindowsExplorerPreviewExpander

Registry setting that enables text previews for almost every file type: it lets Windows Explorer treat additional extensions as text so they can be previewed quickly. Useful for quick inspection of DLL, Java, JAR, Python, and other files.

Size: 39.1 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

shelfio/apache-tika-lambda-layer

AWS Lambda layer containing latest version of Apache Tika

Language: Shell - Size: 327 MB - Last synced: 3 months ago - Pushed: 6 months ago - Stars: 13 - Forks: 5

8Altair/Text-to-speech

A program for text-to-speech conversion from a PDF file into an MP3 file.

Language: Python - Size: 22.8 MB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

iamvatsalpatel/CHARUSAT-SceW

An iOS application 📱 that extracts text in real time using the camera 📷 and plays a relevant video based on the text

Language: Swift - Size: 10.5 MB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 2 - Forks: 3

rachhek/pdf-search-assistant

This assistant tool (WIP) will help you search, browse and summarize the answers to your questions from your uploaded PDF using advanced text analytics, semantic search and Large Language Model (LLM)

Language: Bicep - Size: 301 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

fonckchain/pdf-text-converter

Python tool for converting PDF files to text. Simplify your document processing tasks.

Language: Python - Size: 1000 Bytes - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

procesaur/TExASe

Flask application for OCR and extraction of text from documents with support for repository applications

Language: Python - Size: 14.7 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 1 - Forks: 0

MaarkNassef/GraduationProject

HR Assistant: Web application for efficient HR recruitment and resume management. Utilizes OCR for text extraction and similarity analysis to rearrange resumes based on job descriptions. Simplifies the hiring process for HR recruiters and enhances candidate selection.

Language: HTML - Size: 14.8 MB - Last synced: 9 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 1

SECRET-GUEST/transmutation_vscode.ext

A Visual Studio Code tool for easy HTML text extraction and management.

Language: TypeScript - Size: 87.9 KB - Last synced: 18 days ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

Asraf2asif/SummifyAI

Harnesses the power of OpenAI to revolutionize the way you consume information. Say goodbye to information overload and hello to quick and comprehensive understanding. Let our AI-Powered Content Summarizer extract the key insights from any text, allowing you to focus on what matters most.

Language: JavaScript - Size: 585 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

Related Keywords

text-extraction (184), python (45), ocr (23), nlp (23), pdf (23), machine-learning (14), python3 (13), image-processing (12), text-mining (10), html-extraction (10), tesseract-ocr (10), text (9), tesseract (9), deep-learning (9), natural-language-processing (9), golang (8), web-scraping (8), opencv (7), text-processing (7), data-extraction (7), api (6), automation (6), text-recognition (6), tika (6), computer-vision (6), artificial-intelligence (5), text-analysis (5), opencv-python (5), text-detection (5), ocr-recognition (5), text-cleaning (5), optical-character-recognition (5), nltk (5), pdf-files (4), extract-text (4), pypdf2 (4), tensorflow (4), pytesseract (4), ocr-python (4), pytesseract-ocr (4), pdftotext (4), article-extractor (4), corpus (4), php (4), metadata (4), pdf-to-text (4), crawler (4), translation (3), text-extraction-from-image (3), flask (3), news-scraping (3), docker (3), apache-tika (3), ai (3), pdf-library (3), cli (3), pdf-document-processor (3), news (3), text-classification (3), go (3), html2text (3), regex (3), scraper (3), search (3), text-generation (3), scraping-websites (3), dataset (3), pytorch (3), pdf-text-extraction (3), html-parsing (3), html-extractor (3), extraction (3), spacy (3), extractor (3), text-preprocessing (3), scraping (3), news-crawler (3), javascript (3), readability (3), sdk (2), news-aggregator (2), shell (2), data-mining (2), command-line-tool (2), full-text-search (2), windows (2), tika-server (2), parse (2), html-to-markdown (2), corpus-tools (2), corpus-builder (2), flask-application (2), image-recognition (2), html-css-javascript (2), convolutional-neural-networks (2), electron (2), text-extractor (2), parser (2), gensim (2), nodejs (2)