Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: webarchive
cipher387/quickcacheandarchivesearch
Quick Cache and Archive search buttons
Language: JavaScript - Size: 85 KB - Last synced: 3 minutes ago - Pushed: about 1 hour ago - Stars: 33 - Forks: 4
toimik/WarcProtocol
Parser for WARC (aka WebArchive) files
Language: C# - Size: 180 KB - Last synced: about 22 hours ago - Pushed: about 23 hours ago - Stars: 8 - Forks: 3
rumca-js/RSS-Link-Database
Bookmarked archived links
Size: 214 MB - Last synced: about 17 hours ago - Pushed: 1 day ago - Stars: 8 - Forks: 0
karust/gogetcrawl
Extract web archive data using Wayback Machine and Common Crawl
Language: Go - Size: 53.7 KB - Last synced: 3 days ago - Pushed: 11 months ago - Stars: 128 - Forks: 15
chatnoir-eu/chatnoir-resiliparse
A robust web archive analytics toolkit
Language: Cython - Size: 1.87 MB - Last synced: 3 days ago - Pushed: 12 days ago - Stars: 43 - Forks: 8
helgeho/ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Language: Scala - Size: 1.15 MB - Last synced: 3 days ago - Pushed: about 1 month ago - Stars: 141 - Forks: 19
WebarchivCZ/Seeder
Seeder - Czech webarchive curating tool and public site
Language: Python - Size: 11.3 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 15 - Forks: 2
hyponet/webpage-packer
Saves webpages as archive files.
Language: Go - Size: 1.75 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0
sul-dlss-deprecated/WASMetadataExtractor 📦
[DEPRECATED] Extract metadata from web archiving ARC and WARC files; used by was_robot_suite
Language: Java - Size: 87.9 MB - Last synced: 25 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 3
N0taN3rd/node-warc
Parse And Create Web ARChive (WARC) files with node.js
Language: JavaScript - Size: 7.99 MB - Last synced: 26 days ago - Pushed: over 1 year ago - Stars: 91 - Forks: 23
theorm/webarchive-page-downloader
Get archive history of a page and download pages from web.archive.org
Language: TypeScript - Size: 8.79 KB - Last synced: 27 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
vegetableman/vandal
Navigator for Web Archive
Language: JavaScript - Size: 128 MB - Last synced: 3 months ago - Pushed: 6 months ago - Stars: 149 - Forks: 5
mathis2001/WebHackUrls
Simple Python OSINT tool for URL recon using the Wayback Machine.
Language: Python - Size: 44.9 KB - Last synced: 2 months ago - Pushed: 11 months ago - Stars: 37 - Forks: 6
ticky/webarchive
📑 Rust utilities for working with Apple's Web Archive file format
Language: Rust - Size: 506 KB - Last synced: 27 days ago - Pushed: about 2 years ago - Stars: 6 - Forks: 0
minch-dev/DownTheMoon Fork of downthemall/downthemall-legacy
A continuation of legacy XUL version of DownThemAll! ✔️preserves web.archive.org timestamps, ✔️advanced filters for remote directory tree mirroring, ✔️UI is tweaked for better UX
Language: JavaScript - Size: 13 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 5 - Forks: 0
rcarmo/python-webarchive
Create WebKit/Safari .webarchive files on any platform
Language: Python - Size: 7.81 KB - Last synced: 10 days ago - Pushed: over 4 years ago - Stars: 44 - Forks: 4
maxmmueller/404-to-Archive-Redirector
Greasemonkey script that redirects from a 404 page to the Wayback Machine.
Language: JavaScript - Size: 15.6 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
rumca-js/RSS-Link-Database-2023
Link archive for year 2023
Language: HTML - Size: 413 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 2 - Forks: 2
jasonmtroos/ccwarcs
R package to provide access to Common Crawl WARC files via Amazon Web Services
Language: R - Size: 566 KB - Last synced: 5 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
Mixnode/mixnode-warcreader-php
Read Web ARChive (WARC) files in PHP.
Language: PHP - Size: 7.81 KB - Last synced: 16 days ago - Pushed: about 7 years ago - Stars: 21 - Forks: 3
gitdev-bash/webArchiver
An archiving utility with an interface for web servers.
Language: Python - Size: 63.5 KB - Last synced: 6 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0
Fooftilly/RSS_archiver
Download and archive RSS feeds to the Wayback Machine. Save a list of archived feeds in a local DB.
Language: Python - Size: 30.3 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
mijho/crawl-log2xml
Parse a Heritrix crawl.log into an XML sitemap
Language: TypeScript - Size: 104 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0
JanMeritus/WebBEAT
WebBEAT website data extractor
Language: Shell - Size: 43 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 2
WebarchivCZ/extinct-websites
The application serves as an automated solution for identifying and describing dead websites. It then stores them in its own database and makes them available to curators, who work further with the information in it, interpret it, and classify the content.
Language: PHP - Size: 238 KB - Last synced: about 2 months ago - Pushed: 6 months ago - Stars: 2 - Forks: 0
MozillaCZ/forum-archiv 📦
Redirect
Language: HTML - Size: 150 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 1
gonejack/webarchive-to-singlefile
This command-line tool converts a .webarchive file to an .html file with embedded resources.
Language: Go - Size: 21.5 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 7 - Forks: 1
gonejack/html-to-webarchive
This command-line tool converts an .html file to a Safari .webarchive file.
Language: Go - Size: 77.1 KB - Last synced: 10 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
nlnwa/docker-chrome-headless 📦
Language: Shell - Size: 12.7 KB - Last synced: 28 days ago - Pushed: about 6 years ago - Stars: 6 - Forks: 1
DavidCThames/MobileMink-WebService
Web Service to provide memento data to the Mobile Memento app
Language: C - Size: 25.8 MB - Last synced: 12 months ago - Pushed: over 7 years ago - Stars: 3 - Forks: 1
HRN-Projects/common_crawl_with_scrapy
Parsing Huge Web Archive files from Common Crawl data index to fetch any required domain's data concurrently with Python and Scrapy.
Language: Python - Size: 23.9 MB - Last synced: 9 months ago - Pushed: almost 3 years ago - Stars: 4 - Forks: 5
rumca-js/RSS-Link-Database-2022
Link archive for year 2022
Size: 155 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 1 - Forks: 0
ukwa/ukwa-manage
Shepherding our web archives from crawl to access.
Language: Jupyter Notebook - Size: 122 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 10 - Forks: 5
nlnwa/veidemann-harvester 📦
Language: Java - Size: 5.54 MB - Last synced: 28 days ago - Pushed: about 3 years ago - Stars: 6 - Forks: 0
dbmdz/rosetta-openwayback-vpp
Configurable OpenWayback VPP for Rosetta digital preservation system.
Language: Java - Size: 395 KB - Last synced: 9 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 1
eUgEntOptIc44/TinyWeatherForecastGermanyArchiver Fork of tinyweatherforecastgermanygroup/TinyWeatherForecastGermanyArchiver
submit urls.txt to web archive using GitHub Action
Language: Python - Size: 116 KB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 0 - Forks: 0
maddsua/harextract 📦
CLI based web archive extractor with lightning-fast base64 decoder (written in C)
Language: C++ - Size: 2.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
Sicos1977/WebArchiveExtractor
A .NET Standard 2.0 library to extract a Safari web archive to a folder
Language: C# - Size: 270 KB - Last synced: 10 days ago - Pushed: about 3 years ago - Stars: 5 - Forks: 1
mhucka/devilfish
A utility for simultaneously creating full-page PDF snapshots and web archives of web pages in DEVONthink Pro.
Language: AppleScript - Size: 253 KB - Last synced: 13 days ago - Pushed: almost 4 years ago - Stars: 20 - Forks: 1
raviraa/htmltoebook
Convert html web pages to readable ebook
Language: Go - Size: 219 KB - Last synced: 5 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
WebarchivCZ/WA-KAT
Cataloguing tool for the Czech webarchive.
Language: JavaScript - Size: 13.6 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 2 - Forks: 0
RimeCoOfficial/Arya Fork of Caffeine-Devotee/arya
A snapshot of Arya.Rime.co
Language: HTML - Size: 878 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
helgeho/WarcPartitioner
Partition (W)ARC Files by MIME Type and Year
Language: Java - Size: 8.79 KB - Last synced: 3 days ago - Pushed: about 7 years ago - Stars: 1 - Forks: 1
pereslavtsev/memento-client
Time Travel APIs NodeJS library with full support of the Memento protocol.
Language: TypeScript - Size: 294 KB - Last synced: 21 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
ibnesayeed/archival-tests
A set of web archival replay test cases
Language: HTML - Size: 11.7 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
thorkill/dbce
Diff Based Content Extraction is a part of my Bachelor Thesis: Joint Approach to Boilerplate Detection in Web Archives
Language: HTML - Size: 5.53 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 1 - Forks: 1
helgeho/HadoopConcatGz
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
Language: Java - Size: 51.8 KB - Last synced: 3 days ago - Pushed: over 6 years ago - Stars: 9 - Forks: 3
mccallofthewild/alexandrias-revenge
🔥The bold new archive that can’t be burned, bulldozed or battering-rammed #PoweredByArweave
Language: TypeScript - Size: 1.63 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 4 - Forks: 1
pierlauro/MDBubing
From WARC records to MongoDB documents
Language: Java - Size: 145 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
ukwa/waybacks
This module builds our Waybacks in the various different configurations we require.
Language: Java - Size: 23.2 MB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 2
nlnwa/testsites
Docker image with DNS and Apache, serving dummy testsites
Language: Shell - Size: 7.07 MB - Last synced: 28 days ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0
Kulturarvscluster/Getting-started-with-Netarkivet-and-Sparklyr
Some short code snippets and tutorials for getting started with Sparklyr and an ETL for the Danish Netarchive
Size: 9.77 KB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0
elhardoum/scrape-wayback-machine
Language: Python - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 0 - Forks: 1
N0taN3rd/node-cdxj
Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with node.js
Language: JavaScript - Size: 128 KB - Last synced: 4 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 1
nlnwa/crawl-test-site
Language: HTML - Size: 9.35 MB - Last synced: 28 days ago - Pushed: over 7 years ago - Stars: 0 - Forks: 0