An open API service providing repository metadata for many open source software ecosystems.

GitHub / NotShrirang / Data-Extractor-App

This Python script extracts structured data from PDF files: Company Identification Numbers (CIN), email addresses, Permanent Account Numbers (PAN), phone numbers, dates, and websites. It uses the PyPDF2 library for PDF processing and multiprocessing to extract from multiple PDFs efficiently.
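The approach described above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the regular expressions, the function names (`extract_fields`, `process_pdf`, `extract_from_pdfs`), and the worker count are all assumptions made for illustration. The patterns assume Indian-style CIN, PAN, and mobile number formats and numeric dates.

```python
import re
from multiprocessing import Pool

# Hypothetical patterns: the repository's real regexes are not shown in the source.
PATTERNS = {
    # 21 characters: L/U, 5 digits, 2-letter state, 4-digit year, 3 letters, 6 digits
    "cin": re.compile(r"\b[LU]\d{5}[A-Z]{2}\d{4}[A-Z]{3}\d{6}\b"),
    # 5 letters, 4 digits, 1 letter
    "pan": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 10-digit mobile number with an optional +91 prefix
    "phone": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
    # Numeric dates such as 12/05/2021 or 12-05-2021
    "date": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b"),
    "website": re.compile(r"\b(?:https?://|www\.)[^\s,;]+", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict mapping each field name to its sorted, de-duplicated matches."""
    return {name: sorted(set(rx.findall(text))) for name, rx in PATTERNS.items()}


def process_pdf(path):
    """Read all page text from one PDF with PyPDF2 and extract the fields."""
    # Imported here so extract_fields() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return None or an empty string for pages without text.
    text = "\n".join((page.extract_text() or "") for page in reader.pages)
    return path, extract_fields(text)


def extract_from_pdfs(paths, workers=4):
    """Process several PDFs in parallel, one file per worker task."""
    with Pool(processes=workers) as pool:
        return dict(pool.map(process_pdf, paths))


if __name__ == "__main__":
    import sys

    # Usage: python extract.py file1.pdf file2.pdf ...
    for pdf_path, fields in extract_from_pdfs(sys.argv[1:]).items():
        print(pdf_path, fields)
```

Keeping the regex step in `extract_fields` separate from the PDF reading makes it possible to check the patterns on plain strings before running them over real files. The `if __name__ == "__main__":` guard matters because multiprocessing re-imports the module in worker processes on some platforms.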

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NotShrirang%2FData-Extractor-App
PURL: pkg:github/NotShrirang/Data-Extractor-App

Stars: 1
Forks: 0
Open issues: 0

License: None
Language: Python
Size: 1.58 MB
Dependencies parsed at: Pending

Created at: over 1 year ago
Updated at: over 1 year ago
Pushed at: over 1 year ago
Last synced at: 11 days ago

Topics: multiprocessing, pypdf2, selenium
