GitHub / NotShrirang / Data-Extractor-App
This Python script extracts structured data, such as Company Identification Numbers (CIN), email addresses, PAN (Permanent Account Number) values, phone numbers, dates, and websites, from PDF files. It uses the PyPDF2 library to read the PDFs and Python's multiprocessing module to process several PDFs in parallel.
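The description above implies a two-stage pipeline: pull the text out of each PDF, then match field patterns against it, with one worker process per file. The sketch below shows one way to do this. It is not the repository's code. The field names, the regular expressions (standard Indian CIN and PAN layouts, 10-digit Indian mobile numbers with an optional `+91`, `dd/mm/yyyy` dates) and the helper functions `extract_fields`, `read_pdf_text`, `process_pdf` and `extract_from_pdfs` are all hypothetical. Only `PyPDF2.PdfReader`, `page.extract_text()` and `multiprocessing.Pool` are real library APIs. The Selenium part suggested by the repository topics is not covered.

```python
"""Sketch: regex-based field extraction from PDFs, parallelised with multiprocessing.

The field names, regexes and helper functions below are hypothetical; they are
not taken from the Data-Extractor-App repository.
"""
import re
import sys
from multiprocessing import Pool

# Hypothetical patterns (assumptions): CIN and PAN follow the standard Indian
# formats; phone numbers are 10-digit Indian mobiles with an optional +91 prefix.
PATTERNS = {
    "cin": re.compile(r"\b[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "pan": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "phone": re.compile(r"(?<!\d)(?:\+91[-\s]?)?[6-9]\d{9}(?!\d)"),
    "date": re.compile(r"\b\d{2}[/-]\d{2}[/-]\d{4}\b"),
    "website": re.compile(r"(?:https?://|www\.)\S+"),
}


def extract_fields(text: str) -> dict[str, list[str]]:
    """Return every match per field, de-duplicated while keeping first-seen order."""
    return {
        name: list(dict.fromkeys(pattern.findall(text)))
        for name, pattern in PATTERNS.items()
    }


def read_pdf_text(path: str) -> str:
    """Concatenate the text layer of every page (scanned PDFs yield little or nothing)."""
    from PyPDF2 import PdfReader  # lazy import: extract_fields works without PyPDF2

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def process_pdf(path: str) -> tuple[str, dict[str, list[str]]]:
    """Worker run in a child process: read one PDF and extract its fields."""
    return path, extract_fields(read_pdf_text(path))


def extract_from_pdfs(paths: list[str]) -> dict[str, dict[str, list[str]]]:
    """Fan the PDFs out across CPU cores, one task per file."""
    with Pool() as pool:
        return dict(pool.map(process_pdf, paths))


if __name__ == "__main__":
    pdf_paths = sys.argv[1:]
    if pdf_paths:
        for pdf, fields in extract_from_pdfs(pdf_paths).items():
            print(pdf, fields)
```

Usage, assuming the sketch is saved as a file of your choosing: pass one or more PDF paths on the command line, e.g. `python sketch.py a.pdf b.pdf`. Separate processes are used instead of threads because PDF parsing and regex matching are CPU-bound, so a process pool sidesteps the GIL. The functions passed to `Pool.map` are defined at module level so they can be pickled for the workers.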
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2FData-Extractor-App
PURL: pkg:github/NotShrirang/Data-Extractor-App
Stars: 1
Forks: 0
Open issues: 0
License: None
Language: Python
Size: 1.58 MB
Dependencies parsed at: Pending
Created at: over 1 year ago
Updated at: over 1 year ago
Pushed at: over 1 year ago
Last synced at: 11 days ago
Topics: multiprocessing, pypdf2, selenium