An open API service providing repository metadata for many open source software ecosystems.

Topic: "textract"

srcecde/aws-tutorial-code

AWS tutorial code.

Language: Python - Size: 99 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 202 - Forks: 320

danthelion/doc2audiobook

Convert text documents to high fidelity audio(books).

Language: Python - Size: 369 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 198 - Forks: 32

aeksco/aws-pdf-textract-pipeline

:mag: Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript

Language: TypeScript - Size: 1.66 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 166 - Forks: 18

likerRr/code4goal-resume-parser

Solution for Code4Goal challenge

Language: JavaScript - Size: 369 KB - Last synced at: 14 days ago - Pushed at: about 2 years ago - Stars: 130 - Forks: 72

simonw/s3-ocr

Tools for running OCR against files stored in S3

Language: Python - Size: 40 KB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 119 - Forks: 7

mylukin/Textractor

An efficient class library for extracting the main text from HTML.

Language: PHP - Size: 14.6 KB - Last synced at: 22 days ago - Pushed at: almost 8 years ago - Stars: 50 - Forks: 9

fourdigits/wagtail_textract

Text extraction for Wagtail document search

Language: Python - Size: 1.02 MB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 34 - Forks: 14

sergiocorreia/quipucamayoc

dev repo for article

Language: Python - Size: 30.3 MB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 28 - Forks: 5

NanoNets/ocr-python

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

Language: Jupyter Notebook - Size: 5.52 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 4

muhimasri/aws-textract-helper

AWS Textract Helper

Language: JavaScript - Size: 258 KB - Last synced at: 17 days ago - Pushed at: almost 2 years ago - Stars: 12 - Forks: 5

AvinashDalvi89/list-of-AWS-kickstart-projects

Learn AWS by Doing: Project Ideas

Language: Python - Size: 36.1 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 5

Mkranj/PapersCited

List all unique citations in your document

Language: Python - Size: 285 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 0

t04glovern/aws-textract-adoption-forms

Using Serverless to consume and process WA Animals adoption forms with Amazon Textract and place the data in DynamoDB

Language: Python - Size: 583 KB - Last synced at: 5 days ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 3

onify/blueprint-aws-textract-pdf-to-form

Onify Blueprint: Amazon AWS Textract - PDF to form example

Language: JavaScript - Size: 788 KB - Last synced at: 1 day ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 0

slub/textract2page

Convert AWS Textract JSON to PRImA PAGE XML

Language: Python - Size: 76.9 MB - Last synced at: 28 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 3

aws-samples/mask-words-in-image

A tool that can mask words that match regular expression, keywords or PII (Personally Identifiable Information) in an image file.

Language: Python - Size: 237 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 3

machinelearnear/amazon-textract-workbench

Language: Jupyter Notebook - Size: 8.35 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 5 - Forks: 0

edelgm6/ledger

Personal accounting tool with Django backend, HTMX+Alpine frontend, and AWS Textract

Language: Python - Size: 930 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 4 - Forks: 2

hupe1980/go-textractor

📄 Amazon Textract response parser written in Go.

Language: Go - Size: 6.24 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

Bmitch44/textract-demo

This repository is a demo for using AWS Textract to get data from scanned pdf files

Language: Python - Size: 49.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

muhimasri/aws-textract-app

Convert an image to an HTML form using Amazon Textract and NodeJS

Language: JavaScript - Size: 403 KB - Last synced at: 17 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 1

RocktimRajkumar/ATS

:trophy: An applicant tracking system (ATS) is a software application that enables the electronic handling of recruitment and hiring needs. Corporate recruiters or hiring managers can then search and sort through the resumes in a number of ways, depending on the needs

Language: Python - Size: 1.82 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 3

manuel-lang/Autonomous-Semantic-Search-Engine

Submission for HackDataKIBots 2018 - Web crawler combined with document analysis

Language: Python - Size: 29.3 MB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 4 - Forks: 3

build-on-aws/aiml-like-api-in-your-app

Sample code for adding AI/ML services to your app

Language: Python - Size: 3.43 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 5

sakshi360/Medi-Scanner

This is the repo for submission in AWS Health AI Hackathon hosted on Devpost.

Language: JavaScript - Size: 57.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

AWS-HumanInTheLoop/TabularDocumentDigitization

Human Reviewed Tabular Document Digitization with Amazon Textract and Amazon A2I

Language: Python - Size: 2.56 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 2

simonkeng/pdf_parser

Textual & numeric data extraction with Python using textract, easily shareable with Docker.

Language: C - Size: 15.6 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 1

Devanshu-17/HackScript-Hackathon

AI-powered Invoice and Form Label-Fields Extraction for Document Management using OpenAI & Hugging Face Transformers

Language: Python - Size: 23.4 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

briancullen/aws-textract-parser

Library for converting AWS Textract responses into a more usable structure.

Language: TypeScript - Size: 1.25 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

RodrigoRVieira/theHunterCOTWCompanion

This repository maintains the Visual Studio solution used to build the COTWOCRConsole application, a companion for tracking harvests in the game theHunter COTW :)

Language: C# - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

MoinDalvs/Resume_Classification

Business objective: the document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

iann0036/textract-demo

Demonstration of Amazon Textract using its Boto3 library

Language: Python - Size: 4.39 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 2

rhabed/aws-ai-bedrock-textract

Demo for AWS Textract and Bedrock

Language: Python - Size: 15.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

mycielski/textract_study

Analysing expense reports/invoices with AWS Textract and boto3.

Language: Python - Size: 25.4 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

aws-samples/aws-textract-e2e-processing

This repo contains all the code required to build an IDP solution on AWS, from document splitting and classification to extraction.

Language: Python - Size: 1.37 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

gzomer/alex-bot

Your personal assistant at work

Language: JavaScript - Size: 15 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

Gv3N/PDF_File_Scanner

A PDF file scanner for scanning PDFs in bulk for automation, using the PyPDF2, textract, and nltk libraries in Python.

Language: Python - Size: 57.1 MB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

machinelearnear/extract-info-by-doc-geometry-aws-textract

Language: Jupyter Notebook - Size: 3.13 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

softmatic/vision-data

Sample images and data for vision projects.

Size: 56.4 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

inmote/aws-textract-bounding-boxes

Textract Geometry Tool for annotating PDFs with bounding boxes for recognized LINE elements

Language: C# - Size: 23.4 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

atsuyaw/textlint-launcher4ja 📦

textlint installation and execution tool

Language: Batchfile - Size: 3.91 KB - Last synced at: 26 days ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

rauanisanfelice/aws-textract

:robot: Tool that reads PDF files, performs OCR, and saves the results as JSON.

Language: Python - Size: 1.63 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Cylabeth/ocr

Responsive web app based on OCR for recognizing personal documents. Ironhack WDFT final project.

Language: JavaScript - Size: 809 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0

Journalisme-UQAM/extractionPDF

Three ways to extract text from PDF files using Python

Language: Python - Size: 16.6 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

joshmenden/bartleby

A simple AWS Lambda function that extracts Key-Value pairs from an image using AWS Textract for people who would "prefer not to."

Language: JavaScript - Size: 11.7 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

dalinkim/file-organizer

File Organizer SPA using textract to search by textual content

Language: JavaScript - Size: 204 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

afrozas/search-engine

Information Retrieval Course Assignment (CS469 @ BITS Pilani)

Language: Python - Size: 1.79 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

josephgoksu/Document-Analysis-API

Open Source Document Analyzer

Language: Python - Size: 4.62 MB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

anmol111pal/Billify-Invoice-Processor

Language: TypeScript - Size: 14.6 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Carlovo/textraxtras

Extras for Amazon's Textract API.

Language: Python - Size: 812 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

deven-sitapara/pdf2csv

Extract table data from PDFs and convert it to CSV

Language: Python - Size: 23.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ClaytonAllenThompsonII/WebApp

Django Web Application for inventory and invoice management.

Language: Python - Size: 7.15 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

pdfix/action-autotag-textract-docker

Autotag PDF documents using AWS Textract Layout Model and PDFix SDK in Docker

Language: Python - Size: 125 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Monnus/ServerlessInvoiceScanner

Return extracted data from any invoice

Language: JavaScript - Size: 405 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

paulo-freitas-junior/dio-bootcamp-nexa-aws-IA

NEXA bootcamp on advanced text and image analysis using AI on AWS.

Language: Python - Size: 8.18 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

jannichorst/Tutorials

Repository for all code-related content posted on my blog and YouTube channel.

Language: Jupyter Notebook - Size: 430 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SpAcY001/OCR-NLP-Extraction-Reserach

Contains all the research done on extracting content from scanned documents

Language: Jupyter Notebook - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

abinashsahoo007/Project-Resume-Classification

The document classification solution should significantly reduce the manual human effort in the HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

shreyakapoor08/Personalized-Postcard-Application

The Postcard Application is a digital platform for creating and sending personalized postcards for any occasion. Users can easily design custom postcards with images and text, then send them to friends and family. It's a convenient way to share special moments and greetings, hosted on a reliable cloud infrastructure for seamless performance.

Language: JavaScript - Size: 18.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

aws-samples/winform-amazon-bedrock-document-bot

A conversational document bot Windows Forms desktop application that allows users to upload PDF or Word files and ask questions about their content, with the bot keeping track of the conversation history and providing contextual responses based on the whole conversation.

Language: C# - Size: 108 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

OzgenOzan/word-counter-py

An algorithm for counting words in documents, written in Python using pandas and textract. The regex pattern is tweaked to match Latin-character terms as a whole (such as enzyme and protein names).

Language: Python - Size: 9.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

este6an13/checks-ocr

Software that applies OCR + RAG to extract information from bank checks

Language: Python - Size: 217 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

tomnotthomas/Docu.ai

Docu.ai: Document Analysis POC for Fintech Company 📈📊

Language: JavaScript - Size: 13.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

NItesh1724/Resume_classification_project

This is my NLP project on resume classification, in which I performed EDA, data cleaning, model building with various models, model evaluation, and model deployment.

Language: Jupyter Notebook - Size: 5.84 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

muzalee/extract-resume

Node.js API for extracting resume info

Language: JavaScript - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Yolo-cell-hash/image-to-speech

Generative AI Multi-Cloud application

Language: Python - Size: 7.27 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Tim-Abwao/named-entity-extractor

Extract named entities from data in files of various formats.

Language: Python - Size: 4.57 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

skaznowiecki/serverless-aws-textract-implementation

Implementation of AWS Textract using Event Driven Architectures

Language: TypeScript - Size: 325 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

VaibhavDongre1311/End_to_end_Resume_Classification__project

Business objective: the document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention.

Language: Jupyter Notebook - Size: 5.86 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

proviveknayan/document-keyword-extractor

PDF keyword extraction using Python 3. Extract text from a PDF document and determine key phrases in a body of text by analyzing the frequency of each word's appearance and its co-occurrence with other words in the text.

Language: Jupyter Notebook - Size: 132 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ShubhamMore4/Project-Resume_Classification

Resume classification is the task that automatically categorizes resumes or CVs into predefined domain categories or classes based on their content. This task is essential for the job recruitment process, particularly when organizations receive a large number of applications for various positions.

Language: Jupyter Notebook - Size: 8.37 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

jurest82/Captcha

This repository contains a Python implementation to solve captchas using AWS Textract

Language: Dockerfile - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

wcelery/yolov5-onnxruntime-web Fork of Hyuto/yolov7-onnxruntime-web

YOLOv5 model for table detection and text recognition with the Textract API

Language: JavaScript - Size: 96 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

drew138/p2

Transcript IA is software for the Julian Leon Ramirez Zuluaga practice that converts paper tables into Excel files using AI

Language: TypeScript - Size: 1.59 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

elina-chertova/AWS_Glue_TextractAnalyzeDocument

Launching a document processing workflow in AWS Glue

Language: Python - Size: 68.4 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Vraid-Systems/index-images

Perform text extraction and index all text

Language: Python - Size: 8.79 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Mehul-Raj/DropBox_AWS-Spring-MVC

Language: Java - Size: 154 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

leonardo-bm/textract

Text extraction from a PDF file

Language: Jupyter Notebook - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Goberman15/textractAnalyze

Use Amazon Textract to analyze expenses from a receipt

Language: TypeScript - Size: 534 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

iam-sandeep-82/OCR-Extractor

Now easily extract text from any document or image instantly and accurately.

Language: Python - Size: 1.69 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

hashem78/image_to_text_decoder

Decode text/code easily on Windows/Linux

Language: C++ - Size: 127 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

jrnewton/holliston-real-estate-sales

Web scraping real estate sales numbers for the town of Holliston MA

Language: Shell - Size: 412 KB - Last synced at: 5 days ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

rbsathish/amazon_textract

Extracting text, forms, and tables using Textract

Language: Python - Size: 7.48 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

HowardNTUST/Document-Transformation-with-python

Language: Python - Size: 208 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

PatrickDomnick/gotextractlambda

A Golang Lambda for Textract

Last synced at: about 2 years ago - Stars: 0 - Forks: 0