Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: warc-files
datacoon/metawarc
metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) archives
Language: Python - Size: 80.1 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 24 - Forks: 0
toimik/WarcProtocol
Parser for WARC (aka WebArchive) files
Language: C# - Size: 180 KB - Last synced: 2 days ago - Pushed: 14 days ago - Stars: 8 - Forks: 3
commoncrawl/cc-pyspark
Process Common Crawl data with Python and Spark
Language: Python - Size: 127 KB - Last synced: 28 days ago - Pushed: about 2 months ago - Stars: 379 - Forks: 84
toimik/CommonCrawl
Common Crawl's processing tools
Language: C# - Size: 85.9 KB - Last synced: 3 days ago - Pushed: about 1 month ago - Stars: 5 - Forks: 0
N0taN3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
Language: JavaScript - Size: 7.99 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 91 - Forks: 23
commoncrawl/ia-web-commons (fork of Aloisius/ia-web-commons)
Web archiving utility library
Language: Java - Size: 7.94 MB - Last synced: 28 days ago - Pushed: 7 months ago - Stars: 9 - Forks: 6
javieraespinosa/lifranum
Discovering French Digital Literature (LIFRANUM ANR project)
Language: Jupyter Notebook - Size: 871 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
sebastian-nagel/warc-crawler
Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr
Language: FLUX - Size: 44.9 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 6 - Forks: 1
nouranHisham/wget_warc_files
Part of my 2022 summer internship; it is mainly about web scraping.
Language: Jupyter Notebook - Size: 46.9 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
hrbrmstr/warc
📇 Tools to Work with the Web Archive Ecosystem in R
Language: R - Size: 2.52 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 21 - Forks: 3
pierlauro/MDBubing
From WARC records to MongoDB documents
Language: Java - Size: 145 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0