Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: warc-files

datacoon/metawarc

metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) archives

Language: Python - Size: 80.1 KB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 24 - Forks: 0

toimik/WarcProtocol

Parser for WARC (aka WebArchive) files

Language: C# - Size: 180 KB - Last synced: 2 days ago - Pushed: 14 days ago - Stars: 8 - Forks: 3

commoncrawl/cc-pyspark

Process Common Crawl data with Python and Spark

Language: Python - Size: 127 KB - Last synced: 28 days ago - Pushed: about 2 months ago - Stars: 379 - Forks: 84

toimik/CommonCrawl

Common Crawl's processing tools

Language: C# - Size: 85.9 KB - Last synced: 3 days ago - Pushed: about 1 month ago - Stars: 5 - Forks: 0

N0taN3rd/node-warc

Parse and create Web ARChive (WARC) files with Node.js

Language: JavaScript - Size: 7.99 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 91 - Forks: 23

commoncrawl/ia-web-commons (fork of Aloisius/ia-web-commons)

Web archiving utility library

Language: Java - Size: 7.94 MB - Last synced: 28 days ago - Pushed: 7 months ago - Stars: 9 - Forks: 6

javieraespinosa/lifranum

Discovering French Digital Literature (LIFRANUM ANR project)

Language: Jupyter Notebook - Size: 871 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

sebastian-nagel/warc-crawler

Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr

Language: FLUX - Size: 44.9 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 6 - Forks: 1

nouranHisham/wget_warc_files

This is part of my 2022 summer internship; it is mainly about web scraping.

Language: Jupyter Notebook - Size: 46.9 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

hrbrmstr/warc

Tools to Work with the Web Archive Ecosystem in R

Language: R - Size: 2.52 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 21 - Forks: 3

pierlauro/MDBubing

From WARC records to MongoDB documents

Language: Java - Size: 145 KB - Last synced: over 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0