An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: warc-format

commoncrawl/arc2warc-conversion

Experiences converting Common Crawl's ARC files from the 2008-2012 crawls to the WARC format

Size: 24.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

toimik/WarcProtocol

Parser for WARC (aka WebArchive) files

Language: C# - Size: 181 KB - Last synced at: 13 days ago - Pushed at: 10 months ago - Stars: 13 - Forks: 3

edgi-govdata-archiving/eis-WARC-archiver 📦

ARCHIVED: Docker app to crawl URLs and generate WARCs

Language: Python - Size: 28.1 MB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 10 - Forks: 5

pierlauro/MDBubing

From WARC records to MongoDB documents

Language: Java - Size: 145 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 0