An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: web-archives

webrecorder/warcio

Streaming WARC/ARC library for fast web archive IO

Language: Python - Size: 293 KB - Last synced at: about 11 hours ago - Pushed at: 4 months ago - Stars: 409 - Forks: 61

webrecorder/pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

Language: JavaScript - Size: 32 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,490 - Forks: 228

lanl/Zotero-Robust-Links-Extension

Create Robust Links from within Zotero

Language: JavaScript - Size: 192 KB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 19 - Forks: 2

anjackson/sliver

A tool for collecting archival slivers of the web and web archives

Language: Python - Size: 61.5 KB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 1

archivesunleashed/notebooks

Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.

Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 4

N0taN3rd/node-warc

Parse And Create Web ARChive (WARC) files with node.js

Language: JavaScript - Size: 7.99 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 97 - Forks: 22

oduwsdl/MementoEmbed

A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (mementos).

Language: HTML - Size: 32.7 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 3

hrbrmstr/cdx

🕸 Query Web Archive Crawl Indexes ('CDX')

Language: R - Size: 8.79 KB - Last synced at: 10 days ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

cocrawler/cdx_toolkit

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

Language: Python - Size: 209 KB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 169 - Forks: 31

caltechlibrary/eprints2archives

Send records from an EPrints server to the Internet Archive and other web archives

Language: Python - Size: 504 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

zytedata/web-snap

Create "perfect" snapshots of web pages

Language: JavaScript - Size: 790 KB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 32 - Forks: 4

oldweb-today/oldweb-today

Browse emulated browsers connected to old web sites in your browser!

Language: JavaScript - Size: 12.4 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 266 - Forks: 26

wsdookadr/warctools

WARC tools for joining archives, finding and fetching missing resources, accessing metadata, converting to ZIM, and viewing web archives offline

Language: Python - Size: 29.2 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

oduwsdl/offtopic-goldstandard-data

Data for testing the Offtopic detection software

Language: Python - Size: 274 KB - Last synced at: 2 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

helgeho/Tempas2ArchiveSpark

ArchiveSpark DataSpec to analyze the Internet Archive's Web archive through temporal search results returned by Tempas (v2)

Language: Scala - Size: 23.4 KB - Last synced at: 9 days ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

oduwsdl/raintale

A Python utility for publishing a social media story built from archived web pages to multiple services.

Language: Python - Size: 40.5 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

sebastian-nagel/warc-crawler

Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr

Language: FLUX - Size: 44.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

ukwa/ukwa-gsheets-utils

Add-On for Google Sheets to help those working with web archives.

Language: JavaScript - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 3

ukwa/ukwa-ui

A new user interface for the UK Web Archive

Language: Java - Size: 170 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 6

ukwa/waybacks

This module builds our Waybacks in the various different configurations we require.

Language: Java - Size: 23.2 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 2

nchylak/capstone-project

A collection of the scripts and notebooks I wrote as part of my Data Science Bootcamp capstone project

Language: Jupyter Notebook - Size: 520 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

web-archive-group/wadl2017

WADL2017 Web Archive Group team papers

Language: TeX - Size: 1.73 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

tigercosmos/web-archives

Web Archives Collection System

Language: Python - Size: 9.12 MB - Last synced at: 28 days ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

N0taN3rd/node-cdxj

Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with node.js

Language: JavaScript - Size: 128 KB - Last synced at: 5 days ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 1