Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: text-extraction
lu4p/cat
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
Language: Go - Size: 215 KB - Last synced: about 5 hours ago - Pushed: 6 months ago - Stars: 89 - Forks: 18
TYPO3-Solr/ext-tika
A TYPO3 CMS extension that provides Apache Tika functionality
Language: PHP - Size: 2.11 MB - Last synced: about 18 hours ago - Pushed: 2 days ago - Stars: 6 - Forks: 29
edhou20/Medical-Texts-NLP-Clustering-
Language: Python - Size: 8.79 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0
real0x0a1/ocr-opencv
OCR with Tesseract and OpenCV: Extract text from images effortlessly. Preprocess with OpenCV for accuracy. Display results and save output. Easy integration for document digitization and data entry automation.
Language: Python - Size: 1.12 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0
miso-belica/jusText
Heuristic-based boilerplate removal tool
Language: Python - Size: 1.01 MB - Last synced: 2 days ago - Pushed: 7 days ago - Stars: 683 - Forks: 78
miso-belica/sumy
Module for automatic summarization of text documents and HTML pages.
Language: Python - Size: 1.57 MB - Last synced: 4 days ago - Pushed: 10 days ago - Stars: 3,429 - Forks: 524
bookieio/breadability
Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)
Language: HTML - Size: 604 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 203 - Forks: 26
py-pdf/benchmarks
Benchmarking PDF libraries
Language: Python - Size: 3.73 MB - Last synced: 7 days ago - Pushed: 7 months ago - Stars: 153 - Forks: 8
nguyen-tho/ID-card-extract-module
Language: Python - Size: 59.5 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 2 - Forks: 0
ropensci/pdftools
Text Extraction, Rendering and Converting of PDF Documents
Language: C++ - Size: 1.08 MB - Last synced: 4 days ago - Pushed: 7 months ago - Stars: 500 - Forks: 69
ICIJ/datashare
A self-hosted search engine for documents.
Language: Java - Size: 313 MB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 542 - Forks: 48
jmriebold/BoilerPy3 (fork of mercuree/BoilerPy)
Python port of Boilerpipe library
Language: Python - Size: 188 KB - Last synced: 9 days ago - Pushed: 7 months ago - Stars: 76 - Forks: 17
WonhoZhung/ee474
EE474 Term Project
Language: Python - Size: 457 MB - Last synced: 13 days ago - Pushed: 13 days ago - Stars: 3 - Forks: 1
unidoc/unipdf
Golang PDF library for creating and processing PDF files (pure Go)
Language: Go - Size: 112 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 2,368 - Forks: 245
archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced: 5 days ago - Pushed: 3 months ago - Stars: 133 - Forks: 33
chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Language: Python - Size: 31.5 MB - Last synced: 10 days ago - Pushed: about 1 month ago - Stars: 1,420 - Forks: 233
OwenOrcan/YiraBot-Crawler
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
Language: Python - Size: 207 KB - Last synced: 11 days ago - Pushed: 2 months ago - Stars: 13 - Forks: 0
pd3f/pd3f-core
Python package to reconstruct the original continuous text from PDFs with language models
Language: Jupyter Notebook - Size: 1.31 MB - Last synced: 6 days ago - Pushed: 8 months ago - Stars: 35 - Forks: 8
amenezes/aiopytesseract
A Python asyncio wrapper for Tesseract-OCR.
Language: Python - Size: 2.13 MB - Last synced: 14 days ago - Pushed: 3 months ago - Stars: 15 - Forks: 5
iscc/mobi
Python-based software to unpack KindleGen-generated ebooks
Language: Python - Size: 761 KB - Last synced: 20 days ago - Pushed: over 1 year ago - Stars: 55 - Forks: 8
ckorzen/pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
Language: TeX - Size: 505 MB - Last synced: 22 days ago - Pushed: over 3 years ago - Stars: 60 - Forks: 11
weareprestatech/hotpdf
hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six
Language: Python - Size: 16.5 MB - Last synced: 20 days ago - Pushed: about 2 months ago - Stars: 164 - Forks: 8
gamemaker1/office-text-extractor
Yet another library to extract text from MS Office and PDF files
Language: TypeScript - Size: 2.71 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 42 - Forks: 3
abhinaba-ghosh/any-text
Get text content from any file
Language: JavaScript - Size: 226 KB - Last synced: 26 days ago - Pushed: 3 months ago - Stars: 54 - Forks: 8
HilaManor/Scene-Understanding-Based-on-Text-Extraction
An algorithm (and a wrapping system) for finding the geographic location of a set of given photos based on extracting and analyzing text in the images.
Language: Python - Size: 25.3 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1
adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Language: Python - Size: 23.2 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2,688 - Forks: 205
flairNLP/fundus
A very simple news crawler with a funny name
Language: Python - Size: 14.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 38 - Forks: 5
zanachka/dateparser (fork of scrapinghub/dateparser)
Python parser for human-readable dates
Language: Python - Size: 5.03 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
zanachka/extruct (fork of scrapinghub/extruct)
Extract embedded metadata from HTML markup
Language: Python - Size: 990 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
cdown/srt
A simple library and set of tools for parsing, modifying, and composing SRT files.
Language: Python - Size: 406 KB - Last synced: 30 days ago - Pushed: about 2 months ago - Stars: 422 - Forks: 44
Banner-19/Extraction-and-Analysis-of-Text
The objective is to analyze text content from a list of URLs. This involves extracting article titles and text, then performing natural language processing to generate metrics like sentiment, readability, and word usage. Finally, the results are stored for further analysis or visualization.
Language: Jupyter Notebook - Size: 459 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
IDisposable/IFilterExtractor
A simple component to extract just the text from any file that has an IFilter installed. Available as a C++ COM component and as a C# .NET library.
Language: C++ - Size: 42 KB - Last synced: about 1 month ago - Pushed: about 7 years ago - Stars: 8 - Forks: 5
Aalaa4444/Text_Processing-and-Unique_Word_Extraction_fromHTML
Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.
Language: Jupyter Notebook - Size: 12.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
MRGRD56/textractor-translator
Translate visual novels and other games in real time
Language: TypeScript - Size: 2.17 MB - Last synced: 29 days ago - Pushed: 30 days ago - Stars: 9 - Forks: 0
docwire/docwire
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality
Language: C++ - Size: 34.5 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 44 - Forks: 11
rmottanet/unchainedtext
UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.
Language: Python - Size: 31.3 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
shixzie/nlp 📦
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.
Language: Go - Size: 50.8 KB - Last synced: 16 days ago - Pushed: over 6 years ago - Stars: 386 - Forks: 33
mohitpg/SimpleTextExtractor
Text extraction using pytorch
Language: Jupyter Notebook - Size: 4.49 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
dotfurther/OpenDiscoverSDK
.NET 6 API for document file format identification, text/metadata/attachment/embedded object/sensitive item (PII/PHI)/entity extraction.
Language: C# - Size: 170 MB - Last synced: about 2 months ago - Pushed: 3 months ago - Stars: 12 - Forks: 0
unidoc/unidoc
This repository has moved! https://github.com/unidoc/unipdf
Language: Go - Size: 29.3 MB - Last synced: 29 days ago - Pushed: almost 5 years ago - Stars: 704 - Forks: 88
fourdigits/wagtail_textract
Text extraction for Wagtail document search
Language: Python - Size: 1.02 MB - Last synced: 19 days ago - Pushed: 7 months ago - Stars: 31 - Forks: 13
dayrev/extractor
Web Page Content Extractor
Language: PHP - Size: 45.9 KB - Last synced: about 2 months ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 0
rajdeep2804/Automated_Invoice_Processing
The number of types of physical documents being digitized is increasing; medical bills, bank documents, and personal documents are examples. The objective of this repo is to implement and understand such use cases through an example of extracting text information from invoice receipts.
Language: Jupyter Notebook - Size: 16.4 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 5 - Forks: 0
AndyTheFactory/article-extraction-dataset
Article title, authors, date and body extraction dataset.
Language: HTML - Size: 31.9 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0
globality-corp/deboiler
Deboiler - Boilerplate Identification and Removal
Language: Python - Size: 1.42 MB - Last synced: 20 days ago - Pushed: 4 months ago - Stars: 7 - Forks: 0
hscspring/pnlp
NLP pre-/post-processing tools.
Language: Python - Size: 230 KB - Last synced: 19 days ago - Pushed: 4 months ago - Stars: 27 - Forks: 7
Rindhujatreesa/Deep_Learning_Projects
This is the repository for Deep Learning Projects that include Classification using CNN, TensorFlow, Keras, and Text Extraction using PyTesseract
Language: Jupyter Notebook - Size: 9.1 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
hreikin/pdf-toolbox
Extract content from PDFs and convert it, or create new documents from it, in multiple output formats.
Language: Python - Size: 7.57 MB - Last synced: 2 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0
Lanjkn/Text-Extractor
API to get text from multiple types of files
Language: Python - Size: 4.88 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
FileFormatInfo/ff-pdf2txt
Simple server to extract text from a PDF
Language: Java - Size: 5.99 MB - Last synced: about 2 months ago - Pushed: over 2 years ago - Stars: 3 - Forks: 2
mciccale/ScholarVista
ScholarVista analyses research papers and extracts/plots information about them. It uses Grobid to extract all the content of the research papers. Then all this data is plotted and displayed using Python.
Language: Python - Size: 3.24 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
sambitdash/PDFIO.jl
PDF Reader Library for Native Julia.
Language: Julia - Size: 24.4 MB - Last synced: 6 days ago - Pushed: 7 months ago - Stars: 122 - Forks: 13
sankeer28/URL-Extractor-and-Downloader
Extracts multiple URLs from text, and if downloadable, downloads them into a ZIP
Language: HTML - Size: 21.5 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
heussd/pdftotext-go
Extract texts + their page numbers from PDF
Language: Go - Size: 206 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 3 - Forks: 0
dataiku/dss-plugin-tesseract-ocr
Dataiku DSS plugin to perform optical character recognition (OCR) using the Tesseract engine.
Language: Python - Size: 2.38 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 1 - Forks: 0
skylander86/lambda-text-extractor
AWS Lambda functions to extract text from various binary formats.
Language: Python - Size: 111 MB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 171 - Forks: 40
pd3f/pd3f
PDF text extraction pipeline: self-hosted, local-first, Docker-based
Language: HTML - Size: 930 KB - Last synced: 3 months ago - Pushed: 7 months ago - Stars: 246 - Forks: 33
whitelok/image-text-localization-recognition
A general list of resources for image text localization and recognition (a collection of papers and implementations for scene text localization and recognition)
Size: 333 KB - Last synced: 3 months ago - Pushed: 8 months ago - Stars: 936 - Forks: 238
ssciwr/AMMICO
AI Media and Misinformation Content Analysis Tool: Analyze text and images
Language: Python - Size: 87.7 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 4 - Forks: 3
ad-freiburg/pdftotext-plus-plus
A fast and accurate command line tool for extracting text from PDF files.
Language: C++ - Size: 18.2 MB - Last synced: about 2 months ago - Pushed: 7 months ago - Stars: 8 - Forks: 0
Jaha96/tesseract-quick-implementation
Tesseract-OCR quick implementation. Linked with stack-overflow question
Language: HTML - Size: 190 MB - Last synced: about 2 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
Altabeh/tesseract-ocr-wrapper
This is a highly efficient python wrapper for tesseract-ocr.
Language: Python - Size: 26.4 KB - Last synced: 2 months ago - Pushed: almost 2 years ago - Stars: 16 - Forks: 3
yoshihikoueno/pdfminer-layout-scanner (fork of dpapathanasiou/pdfminer-layout-scanner)
A more complete example of programming with PDFMiner, which continues where the default documentation stops
Language: Python - Size: 26.4 KB - Last synced: about 1 month ago - Pushed: almost 5 years ago - Stars: 8 - Forks: 4
KalyanM45/Optical-Character-Recognition
This project is a Python-based Optical Character Recognition (OCR) application using the EasyOCR library. It provides a convenient way to detect and recognize text in images, making it useful for a wide range of applications such as document processing, image captioning, and text extraction.
Language: Jupyter Notebook - Size: 720 KB - Last synced: 14 days ago - Pushed: 12 months ago - Stars: 3 - Forks: 0
vsymbol/CUTIE
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)
Language: Python - Size: 2.87 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 156 - Forks: 80
andrealenzi11/py-poppleract
Python library and Web service based on Poppler Pdftotext utility and Tesseract OCR for extracting text from PDF documents
Language: Python - Size: 195 KB - Last synced: 3 months ago - Pushed: 5 months ago - Stars: 5 - Forks: 0
Anannyap7/text-extraction-from-image (fork of taritkandpal/text-extraction-from-image)
Handwritten Text Extraction from an Image of a Document using Transformers
Language: Python - Size: 13.7 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
lykmapipo/US-Inaugural-Addresses
Python scripts to download, process, and analyze US Inaugural Addresses
Language: Python - Size: 4.45 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
vaites/php-apache-tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Language: PHP - Size: 13.8 MB - Last synced: 21 days ago - Pushed: 9 months ago - Stars: 111 - Forks: 21
sanidhyajadaun/MediLink (fork of prakratisingh/MediLink)
MediLink is a web application that revolutionizes health record management by seamlessly integrating NLP techniques for handwritten text extraction on prescriptions and blockchain technology for secure data storage.
Language: HTML - Size: 146 KB - Last synced: 4 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
ParisaArbab/Data-Modeling
Retrieve data from two different websites, load it into a PostgreSQL database using Python, and combine it to derive and present new information
Language: Python - Size: 2.79 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
rajesh-bhat/spark-ai-summit-2020-text-extraction
Language: Jupyter Notebook - Size: 105 MB - Last synced: 2 months ago - Pushed: over 3 years ago - Stars: 58 - Forks: 33
ingmarboeschen/JATSdecoder
A text extraction and manipulation toolset for NISO-JATS coded XML files
Language: R - Size: 2.64 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 14 - Forks: 0
dotfurther/OpenDiscoverPlatformCaseStudy
Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.
Size: 5.92 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 10 - Forks: 0
Artaal/License_Plate_Detection_And_Text_Extraction
License plate localizer using pre-trained YOLOv5, combined with text extraction using pre-trained TrOCR
Language: Python - Size: 239 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
atahanuz/yt2text
Extract text from a YouTube video in a single command, using OpenAI's Whisper speech recognition model.
Size: 10.7 KB - Last synced: 2 days ago - Pushed: 6 months ago - Stars: 1 - Forks: 0
jessp01/zaje
Highlight/colourise command output, logfiles (and anything else really) based on regex pattern matching
Language: Go - Size: 8.04 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 4 - Forks: 0
SapienzaNLP/extend
Entity Disambiguation as text extraction (ACL 2022)
Language: Python - Size: 71.3 KB - Last synced: 6 months ago - Pushed: about 2 years ago - Stars: 148 - Forks: 9
jhw296/BookScanner
A simple book scanner project using PyQt5 (searches for and displays book information via barcode recognition and text extraction)
Language: Python - Size: 83 KB - Last synced: 7 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
Govind-S-B/pdf-to-text-chroma-search
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. A separate script queries the Chroma DB for similarity search based on user input.
Language: Python - Size: 0 Bytes - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
sharmaroshan/Text-Classification
A project assignment in which I learned to classify different texts using clustering techniques. It combines natural language processing with clustering, using k-means to implement the solution.
Language: HTML - Size: 88.9 KB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 1
nehalAggarwal/Linguista
An application that translates foreign text on signboards, menu cards, etc. by capturing an image and translating it into the desired language in real time. Supports Spanish, German, French, Punjabi, Hindi, etc.
Language: Python - Size: 16.5 MB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
anhthuan1999/PhoBERT-Extraction
Extract vectors for sentences and words from a single layer or by concatenating multiple layers
Language: Python - Size: 5.86 KB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 0
nezamtrm/Extracting-contents-of-a-table-in-pdf-file-by-pdfplumber
Language: Jupyter Notebook - Size: 20.5 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
nbdy/prntscrngrb
prnt.sc / lightshot crawler, nudity detection and text extraction to a sqlite database
Language: Python - Size: 68.3 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0
abs-sayem/nlp
NLP, NLP Basic. Related NLP Projects
Language: Python - Size: 49.3 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 1 - Forks: 0
juliandavidmr/text2locale
Extract all the texts of any project with HTML files and generate a KV (Key-Value) file, key = reference key, value = extracted text.
Language: JavaScript - Size: 328 KB - Last synced: 3 months ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
nainiayoub/pdf-text-data-extractor
PDF text data extraction web app with OCR for scanned documents
Language: Python - Size: 56.6 KB - Last synced: 9 months ago - Pushed: 10 months ago - Stars: 43 - Forks: 21
mknz/mirusan 📦
A PDF collection reader with built-in full-text search engine
Language: JavaScript - Size: 2.71 MB - Last synced: 9 months ago - Pushed: almost 7 years ago - Stars: 19 - Forks: 0
FurkanOM/basic-web-crawler
It's a basic web crawler API implementation
Language: JavaScript - Size: 239 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 2 - Forks: 1
SaviorVX/WindowsExplorerPreviewExpander
Registry setting that enables text previews for almost every file: it lets Windows Explorer treat different extensions as text so they can be previewed quickly. Useful for a quick look at DLL, Java, JAR, Python, and other files during analysis.
Size: 39.1 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
shelfio/apache-tika-lambda-layer
AWS Lambda layer containing latest version of Apache Tika
Language: Shell - Size: 327 MB - Last synced: 3 months ago - Pushed: 6 months ago - Stars: 13 - Forks: 5
8Altair/Text-to-speech
A program for text-to-speech conversion from a PDF file into an MP3 file.
Language: Python - Size: 22.8 MB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
iamvatsalpatel/CHARUSAT-SceW
An iOS application that extracts text in real time using the camera and plays a relevant video based on the text
Language: Swift - Size: 10.5 MB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 2 - Forks: 3
rachhek/pdf-search-assistant
This assistant tool (WIP) will help you search, browse, and summarize answers to your questions from your uploaded PDF using advanced text analytics, semantic search, and a large language model (LLM)
Language: Bicep - Size: 301 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
fonckchain/pdf-text-converter
Python tool for converting PDF files to text. Simplify your document processing tasks.
Language: Python - Size: 1000 Bytes - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
procesaur/TExASe
Flask application for OCR and extraction of text from documents with support for repository applications
Language: Python - Size: 14.7 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 1 - Forks: 0
MaarkNassef/GraduationProject
HR Assistant: Web application for efficient HR recruitment and resume management. Utilizes OCR for text extraction and similarity analysis to rearrange resumes based on job descriptions. Simplifies the hiring process for HR recruiters and enhances candidate selection.
Language: HTML - Size: 14.8 MB - Last synced: 9 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 1
SECRET-GUEST/transmutation_vscode.ext
A Visual Studio Code tool for easy HTML text extraction and management.
Language: TypeScript - Size: 87.9 KB - Last synced: 18 days ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
Asraf2asif/SummifyAI
Harnesses the power of OpenAI to revolutionize the way you consume information. Say goodbye to information overload and hello to quick and comprehensive understanding. Let our AI-powered content summarizer extract the key insights from any text, allowing you to focus on what matters most.
Language: JavaScript - Size: 585 KB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0