Topic: "textract"
srcecde/aws-tutorial-code
AWS tutorial code.
Language: Python - Size: 99 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 202 - Forks: 320

danthelion/doc2audiobook
Convert text documents to high fidelity audio(books).
Language: Python - Size: 369 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 198 - Forks: 32

aeksco/aws-pdf-textract-pipeline
Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript
Language: TypeScript - Size: 1.66 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 166 - Forks: 18

likerRr/code4goal-resume-parser
Solution for Code4Goal challenge
Language: JavaScript - Size: 369 KB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 130 - Forks: 72

simonw/s3-ocr
Tools for running OCR against files stored in S3
Language: Python - Size: 40 KB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 119 - Forks: 7

mylukin/Textractor
An efficient class library for extracting the main text from HTML.
Language: PHP - Size: 14.6 KB - Last synced at: 22 days ago - Pushed at: almost 8 years ago - Stars: 50 - Forks: 9

fourdigits/wagtail_textract
Text extraction for Wagtail document search
Language: Python - Size: 1.02 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 14

sergiocorreia/quipucamayoc
dev repo for article
Language: Python - Size: 30.3 MB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 5

NanoNets/ocr-python
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 4

muhimasri/aws-textract-helper
AWS Textract Helper
Language: JavaScript - Size: 258 KB - Last synced at: 17 days ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 5

AvinashDalvi89/list-of-AWS-kickstart-projects
Learn AWS by Doing: Project Ideas
Language: Python - Size: 36.1 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 5

Mkranj/PapersCited
List all unique citations in your document
Language: Python - Size: 285 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

t04glovern/aws-textract-adoption-forms
Using Serverless to consume and process WA Animals adoption forms with Amazon Textract and place that data in DynamoDB
Language: Python - Size: 583 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 3

onify/blueprint-aws-textract-pdf-to-form
Onify Blueprint: Amazon AWS Textract - PDF to form example
Language: JavaScript - Size: 788 KB - Last synced at: 1 day ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 0

slub/textract2page
Convert AWS Textract JSON to PRImA PAGE XML
Language: Python - Size: 76.9 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 3

aws-samples/mask-words-in-image
A tool that can mask words that match regular expression, keywords or PII (Personally Identifiable Information) in an image file.
Language: Python - Size: 237 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 3

machinelearnear/amazon-textract-workbench
Language: Jupyter Notebook - Size: 8.35 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

edelgm6/ledger
Personal accounting tool with Django backend, HTMX+Alpine frontend, and AWS Textract
Language: Python - Size: 930 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 4 - Forks: 2

hupe1980/go-textractor
📄 Amazon Textract response parser written in Go.
Language: Go - Size: 6.24 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

Bmitch44/textract-demo
This repository is a demo for using AWS Textract to get data from scanned PDF files
Language: Python - Size: 49.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

muhimasri/aws-textract-app
Convert an image to an HTML form using Amazon Textract and NodeJS
Language: JavaScript - Size: 403 KB - Last synced at: 17 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

RocktimRajkumar/ATS
An applicant tracking system (ATS) is a software application that enables the electronic handling of recruitment and hiring needs. Corporate recruiters or hiring managers can then search and sort through the resumes in a number of ways, depending on the needs
Language: Python - Size: 1.82 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 3

manuel-lang/Autonomous-Semantic-Search-Engine
Submission for HackDataKIBots 2018 - Web crawler combined with document analysis
Language: Python - Size: 29.3 MB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 4 - Forks: 3

build-on-aws/aiml-like-api-in-your-app
Sample code for adding AI/ML services to your app
Language: Python - Size: 3.43 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 5

sakshi360/Medi-Scanner
This is the repo for submission in AWS Health AI Hackathon hosted on Devpost.
Language: JavaScript - Size: 57.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

AWS-HumanInTheLoop/TabularDocumentDigitization
Human Reviewed Tabular Document Digitization with Amazon Textract and Amazon A2I
Language: Python - Size: 2.56 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 2

simonkeng/pdf_parser
Textual & numeric data extraction with Python using textract, easily shareable with Docker.
Language: C - Size: 15.6 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 1

Devanshu-17/HackScript-Hackathon
AI-powered Invoice and Form Label-Fields Extraction for Document Management using OpenAI & Hugging Face Transformers
Language: Python - Size: 23.4 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

briancullen/aws-textract-parser
Library for converting AWS Textract responses into a more usable structure.
Language: TypeScript - Size: 1.25 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

RodrigoRVieira/theHunterCOTWCompanion
This repository maintains the Visual Studio solution used to build the COTWOCRConsole application, which works as a companion to track harvests during theHunter COTW game :)
Language: C# - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

MoinDalvs/Resume_Classification
Business objective: The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

iann0036/textract-demo
Demonstration of Amazon Textract using its Boto3 library
Language: Python - Size: 4.39 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 2

rhabed/aws-ai-bedrock-textract
Demo for AWS Textract and Bedrock
Language: Python - Size: 15.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

mycielski/textract_study
Analysing expense reports/invoices with AWS Textract and boto3.
Language: Python - Size: 25.4 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

aws-samples/aws-textract-e2e-processing
This repo contains all the code required to do an IDP solution on AWS from document splitting, classification to extraction.
Language: Python - Size: 1.37 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

gzomer/alex-bot
Your personal assistant at work
Language: JavaScript - Size: 15 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

Gv3N/PDF_File_Scanner
A PDF file scanner used to scan PDFs in bulk for automation using the PyPDF2, textract & nltk libraries in Python.
Language: Python - Size: 57.1 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

machinelearnear/extract-info-by-doc-geometry-aws-textract
Language: Jupyter Notebook - Size: 3.13 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

softmatic/vision-data
Sample images and data for vision projects.
Size: 56.4 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

inmote/aws-textract-bounding-boxes
Textract Geometry Tool for annotating PDFs with bounding boxes for recognized LINE elements
Language: C# - Size: 23.4 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

atsuyaw/textlint-launcher4ja 📦
Tool for installing and running textlint
Language: Batchfile - Size: 3.91 KB - Last synced at: 26 days ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

rauanisanfelice/aws-textract
Tool that reads PDF files, performs OCR, and saves the results as JSON.
Language: Python - Size: 1.63 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Cylabeth/ocr
Responsive web app based on OCR for recognizing personal documents. Ironhack WDFT final project.
Language: JavaScript - Size: 809 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Journalisme-UQAM/extractionPDF
Three ways to extract text from PDF files using Python
Language: Python - Size: 16.6 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

joshmenden/bartleby
A simple AWS Lambda function that extracts Key-Value pairs from an image using AWS Textract for people who would "prefer not to."
Language: JavaScript - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

dalinkim/file-organizer
File Organizer SPA using textract to search by textual content
Language: JavaScript - Size: 204 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

afrozas/search-engine
Information Retrieval Course Assignment (CS469 @ BITS Pilani)
Language: Python - Size: 1.79 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

josephgoksu/Document-Analysis-API
Open Source Document Analyzer
Language: Python - Size: 4.62 MB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

anmol111pal/Billify-Invoice-Processor
Language: TypeScript - Size: 14.6 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Carlovo/textraxtras
Extras for Amazon's Textract API.
Language: Python - Size: 812 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

deven-sitapara/pdf2csv
Extract table data from PDFs and convert it to CSV
Language: Python - Size: 23.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ClaytonAllenThompsonII/WebApp
Django Web Application for inventory and invoice management.
Language: Python - Size: 7.15 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pdfix/action-autotag-textract-docker
Autotag PDF documents using AWS Textract Layout Model and PDFix SDK in Docker
Language: Python - Size: 125 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Monnus/ServerlessInvoiceScanner
Return extracted data from any invoice
Language: JavaScript - Size: 405 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

paulo-freitas-junior/dio-bootcamp-nexa-aws-IA
NEXA bootcamp for advanced analysis of text and images using AI on AWS.
Language: Python - Size: 8.18 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jannichorst/Tutorials
Repository for all code related content posted at my blog and YouTube channel.
Language: Jupyter Notebook - Size: 430 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SpAcY001/OCR-NLP-Extraction-Reserach
Contains all the research done on extracting content from scanned documents
Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

abinashsahoo007/Project-Resume-Classification
The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

shreyakapoor08/Personalized-Postcard-Application
The Postcard Application is a digital platform for creating and sending personalized postcards for any occasion. Users can easily design custom postcards with images and text, then send them to friends and family. It's a convenient way to share special moments and greetings, hosted on a reliable cloud infrastructure for seamless performance.
Language: JavaScript - Size: 18.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

aws-samples/winform-amazon-bedrock-document-bot
A conversational document bot Windows Forms desktop application that allows users to upload PDF or Word files and ask questions about their content, with the bot keeping track of the conversation history and providing contextual responses based on the whole conversation.
Language: C# - Size: 108 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OzgenOzan/word-counter-py
An algorithm developed for counting words from documents in Python using pandas and textract. The regex pattern is tweaked to identify Latin characters altogether (such as enzyme and protein names)
Language: Python - Size: 9.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

este6an13/checks-ocr
Software that applies OCR + RAG to extract bank check information
Language: Python - Size: 217 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

tomnotthomas/Docu.ai
Docu.ai: Document Analysis POC for Fintech Company 📈📊
Language: JavaScript - Size: 13.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

NItesh1724/Resume_classification_project
This is my NLP project on resume classification, in which I performed EDA, data cleaning, model building across various models, model evaluation, and model deployment.
Language: Jupyter Notebook - Size: 5.84 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

muzalee/extract-resume
Node.js API for extracting resume info
Language: JavaScript - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Yolo-cell-hash/image-to-speech
Generative AI Multi-Cloud application
Language: Python - Size: 7.27 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Tim-Abwao/named-entity-extractor
Extract named entities from data in files of various formats.
Language: Python - Size: 4.57 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

skaznowiecki/serverless-aws-textract-implementation
Implementation of AWS Textract using Event Driven Architectures
Language: TypeScript - Size: 325 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

VaibhavDongre1311/End_to_end_Resume_Classification__project
Business objective: The document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.
Language: Jupyter Notebook - Size: 5.86 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

proviveknayan/document-keyword-extractor
PDF keyword extraction using Python 3. Extract text from a PDF document and determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text.
Language: Jupyter Notebook - Size: 132 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ShubhamMore4/Project-Resume_Classification
Resume classification is the task that automatically categorizes resumes or CVs into predefined domain categories or classes based on their content. This task is essential for the job recruitment process, particularly when organizations receive a large number of applications for various positions.
Language: Jupyter Notebook - Size: 8.37 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jurest82/Captcha
This repository contains a Python implementation to solve captchas using AWS Textract
Language: Dockerfile - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

wcelery/yolov5-onnxruntime-web (fork of Hyuto/yolov7-onnxruntime-web)
YOLOv5 model for table detection and text recognition with the Textract API
Language: JavaScript - Size: 96 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

drew138/p2
Transcript IA is software for the Julian Leon Ramirez Zuluaga practice that converts paper tables into Excel files using AI
Language: TypeScript - Size: 1.59 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

elina-chertova/AWS_Glue_TextractAnalyzeDocument
Starting a document processing job in AWS Glue
Language: Python - Size: 68.4 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Vraid-Systems/index-images
Perform text extraction and index all text
Language: Python - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Mehul-Raj/DropBox_AWS-Spring-MVC
Language: Java - Size: 154 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

leonardo-bm/textract
Text extraction from a PDF file
Language: Jupyter Notebook - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Goberman15/textractAnalyze
Use Amazon Textract to Analyze Expense from a Recipe
Language: TypeScript - Size: 534 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

iam-sandeep-82/OCR-Extractor
Now easily extract text from any document or image instantly and accurately.
Language: Python - Size: 1.69 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

hashem78/image_to_text_decoder
Decode text/code easily on Windows/Linux
Language: C++ - Size: 127 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

jrnewton/holliston-real-estate-sales
Web scraping real estate sales numbers for the town of Holliston MA
Language: Shell - Size: 412 KB - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

rbsathish/amazon_textract
Extracting text, forms, and tables using Textract
Language: Python - Size: 7.48 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

HowardNTUST/Document-Transformation-with-python
Language: Python - Size: 208 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

PatrickDomnick/gotextractlambda
A Golang Lambda for Textract
Last synced at: about 2 years ago - Stars: 0 - Forks: 0
