GitHub topics: web-archives
webrecorder/warcio
Streaming WARC/ARC library for fast web archive IO
Language: Python - Size: 293 KB - Last synced at: about 11 hours ago - Pushed at: 4 months ago - Stars: 409 - Forks: 61

webrecorder/pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
Language: JavaScript - Size: 32 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,490 - Forks: 228

lanl/Zotero-Robust-Links-Extension
Create Robust Links from within Zotero
Language: JavaScript - Size: 192 KB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 19 - Forks: 2

anjackson/sliver
A tool for collecting archival slivers of the web and web archives
Language: Python - Size: 61.5 KB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 1

archivesunleashed/notebooks
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.
Language: Jupyter Notebook - Size: 49.1 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 26 - Forks: 4

N0taN3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
Language: JavaScript - Size: 7.99 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 97 - Forks: 22

oduwsdl/MementoEmbed
A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (mementos).
Language: HTML - Size: 32.7 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 3

hrbrmstr/cdx
🕸 Query Web Archive Crawl Indexes ('CDX')
Language: R - Size: 8.79 KB - Last synced at: 10 days ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

cocrawler/cdx_toolkit
A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine
Language: Python - Size: 209 KB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 169 - Forks: 31

caltechlibrary/eprints2archives
Send records from an EPrints server to the Internet Archive and other web archives
Language: Python - Size: 504 KB - Last synced at: 9 days ago - Pushed at: almost 2 years ago - Stars: 4 - Forks: 0

zytedata/web-snap
Create "perfect" snapshots of web pages
Language: JavaScript - Size: 790 KB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 32 - Forks: 4

oldweb-today/oldweb-today
Browse emulated browsers connected to old web sites in your browser!
Language: JavaScript - Size: 12.4 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 266 - Forks: 26

wsdookadr/warctools
WARC tools for joining archives, finding and fetching missing resources, accessing metadata, converting to ZIM, and viewing web archives offline
Language: Python - Size: 29.2 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

oduwsdl/offtopic-goldstandard-data
Data for testing the Offtopic detection software
Language: Python - Size: 274 KB - Last synced at: 2 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

helgeho/Tempas2ArchiveSpark
ArchiveSpark DataSpec to analyze the Internet Archive's Web archive through temporal search results returned by Tempas (v2)
Language: Scala - Size: 23.4 KB - Last synced at: 9 days ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

oduwsdl/raintale
A Python utility for publishing a social media story built from archived web pages to multiple services.
Language: Python - Size: 40.5 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 11 - Forks: 3

sebastian-nagel/warc-crawler
Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr
Language: FLUX - Size: 44.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

ukwa/ukwa-gsheets-utils
Add-On for Google Sheets to help those working with web archives.
Language: JavaScript - Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 3

ukwa/ukwa-ui
A new user interface for the UK Web Archive
Language: Java - Size: 170 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 6

ukwa/waybacks
This module builds our Waybacks in the various different configurations we require.
Language: Java - Size: 23.2 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 2

nchylak/capstone-project
A collection of the scripts and notebooks I wrote as part of my Data Science Bootcamp capstone project
Language: Jupyter Notebook - Size: 520 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

web-archive-group/wadl2017
WADL2017 Web Archive Group team papers
Language: TeX - Size: 1.73 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

tigercosmos/web-archives
Web Archives Collection System
Language: Python - Size: 9.12 MB - Last synced at: 28 days ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

N0taN3rd/node-cdxj
Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with node.js
Language: JavaScript - Size: 128 KB - Last synced at: 5 days ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 1
