Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: webarchive

cipher387/quickcacheandarchivesearch

Quick Cache and Archive search buttons

Language: JavaScript - Size: 85 KB - Last synced: 3 minutes ago - Pushed: about 1 hour ago - Stars: 33 - Forks: 4

toimik/WarcProtocol

Parser for WARC (aka WebArchive) files

Language: C# - Size: 180 KB - Last synced: about 22 hours ago - Pushed: about 23 hours ago - Stars: 8 - Forks: 3

rumca-js/RSS-Link-Database

Bookmarked archived links

Size: 214 MB - Last synced: about 17 hours ago - Pushed: 1 day ago - Stars: 8 - Forks: 0

karust/gogetcrawl

Extract web archive data using Wayback Machine and Common Crawl

Language: Go - Size: 53.7 KB - Last synced: 3 days ago - Pushed: 11 months ago - Stars: 128 - Forks: 15

chatnoir-eu/chatnoir-resiliparse

A robust web archive analytics toolkit

Language: Cython - Size: 1.87 MB - Last synced: 3 days ago - Pushed: 12 days ago - Stars: 43 - Forks: 8

helgeho/ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

Language: Scala - Size: 1.15 MB - Last synced: 3 days ago - Pushed: about 1 month ago - Stars: 141 - Forks: 19

WebarchivCZ/Seeder

Seeder - Czech webarchive curating tool and public site

Language: Python - Size: 11.3 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 15 - Forks: 2

hyponet/webpage-packer

Saves webpages as archive files.

Language: Go - Size: 1.75 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

sul-dlss-deprecated/WASMetadataExtractor 📦

[DEPRECATED] Extract metadata from web archiving ARC and WARC files; used by was_robot_suite

Language: Java - Size: 87.9 MB - Last synced: 25 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 3

N0taN3rd/node-warc

Parse And Create Web ARChive (WARC) files with node.js

Language: JavaScript - Size: 7.99 MB - Last synced: 26 days ago - Pushed: over 1 year ago - Stars: 91 - Forks: 23

theorm/webarchive-page-downloader

Get archive history of a page and download pages from web.archive.org

Language: TypeScript - Size: 8.79 KB - Last synced: 27 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

vegetableman/vandal

Navigator for Web Archive

Language: JavaScript - Size: 128 MB - Last synced: 3 months ago - Pushed: 6 months ago - Stars: 149 - Forks: 5

mathis2001/WebHackUrls

Simple Python OSINT tool for URL recon using the Wayback Machine.

Language: Python - Size: 44.9 KB - Last synced: 2 months ago - Pushed: 11 months ago - Stars: 37 - Forks: 6

ticky/webarchive

📑 Rust utilities for working with Apple's Web Archive file format

Language: Rust - Size: 506 KB - Last synced: 27 days ago - Pushed: about 2 years ago - Stars: 6 - Forks: 0

minch-dev/DownTheMoon Fork of downthemall/downthemall-legacy

A continuation of the legacy XUL version of DownThemAll! ✔️ preserves web.archive.org timestamps, ✔️ advanced filters for remote directory tree mirroring, ✔️ UI tweaked for better UX

Language: JavaScript - Size: 13 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 5 - Forks: 0

rcarmo/python-webarchive

Create WebKit/Safari .webarchive files on any platform

Language: Python - Size: 7.81 KB - Last synced: 10 days ago - Pushed: over 4 years ago - Stars: 44 - Forks: 4

maxmmueller/404-to-Archive-Redirector

Greasemonkey script that redirects from a 404 page to the Wayback Machine.

Language: JavaScript - Size: 15.6 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

rumca-js/RSS-Link-Database-2023

Link archive for year 2023

Language: HTML - Size: 413 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 2 - Forks: 2

jasonmtroos/ccwarcs

R package to provide access to Common Crawl WARC files via Amazon Web Services

Language: R - Size: 566 KB - Last synced: 5 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

Mixnode/mixnode-warcreader-php

Read Web ARChive (WARC) files in PHP.

Language: PHP - Size: 7.81 KB - Last synced: 16 days ago - Pushed: about 7 years ago - Stars: 21 - Forks: 3

gitdev-bash/webArchiver

An archiving utility with an interface for web servers.

Language: Python - Size: 63.5 KB - Last synced: 6 months ago - Pushed: almost 3 years ago - Stars: 1 - Forks: 0

Fooftilly/RSS_archiver

Download and archive RSS feeds to the Wayback Machine. Save a list of archived feeds in a local DB.

Language: Python - Size: 30.3 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

mijho/crawl-log2xml

Parse a Heritrix crawl.log into an XML sitemap

Language: TypeScript - Size: 104 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0

JanMeritus/WebBEAT

WebBEAT website data extractor

Language: Shell - Size: 43 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 2

WebarchivCZ/extinct-websites

The application serves as an automated solution for identifying and describing dead websites. It then stores them in its own database and makes them available to curators, who further work with the information in it, interpret it, and classify the content.

Language: PHP - Size: 238 KB - Last synced: about 2 months ago - Pushed: 6 months ago - Stars: 2 - Forks: 0

MozillaCZ/forum-archiv 📦

Redirect

Language: HTML - Size: 150 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 1

gonejack/webarchive-to-singlefile

This command-line tool converts a .webarchive file to an .html file with embedded resources.

Language: Go - Size: 21.5 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 7 - Forks: 1

gonejack/html-to-webarchive

This command-line tool converts an .html file to a Safari .webarchive file.

Language: Go - Size: 77.1 KB - Last synced: 10 months ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0

nlnwa/docker-chrome-headless 📦

Language: Shell - Size: 12.7 KB - Last synced: 28 days ago - Pushed: about 6 years ago - Stars: 6 - Forks: 1

DavidCThames/MobileMink-WebService

Web Service to provide memento data to the Mobile Memento app

Language: C - Size: 25.8 MB - Last synced: 12 months ago - Pushed: over 7 years ago - Stars: 3 - Forks: 1

HRN-Projects/common_crawl_with_scrapy

Parsing Huge Web Archive files from Common Crawl data index to fetch any required domain's data concurrently with Python and Scrapy.

Language: Python - Size: 23.9 MB - Last synced: 9 months ago - Pushed: almost 3 years ago - Stars: 4 - Forks: 5

rumca-js/RSS-Link-Database-2022

Link archive for year 2022

Size: 155 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 1 - Forks: 0

ukwa/ukwa-manage

Shepherding our web archives from crawl to access.

Language: Jupyter Notebook - Size: 122 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 10 - Forks: 5

nlnwa/veidemann-harvester 📦

Language: Java - Size: 5.54 MB - Last synced: 28 days ago - Pushed: about 3 years ago - Stars: 6 - Forks: 0

dbmdz/rosetta-openwayback-vpp

Configurable OpenWayback VPP for Rosetta digital preservation system.

Language: Java - Size: 395 KB - Last synced: 9 months ago - Pushed: about 1 year ago - Stars: 0 - Forks: 1

eUgEntOptIc44/TinyWeatherForecastGermanyArchiver Fork of tinyweatherforecastgermanygroup/TinyWeatherForecastGermanyArchiver

submit urls.txt to web archive using GitHub Action

Language: Python - Size: 116 KB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 0 - Forks: 0

maddsua/harextract 📦

CLI based web archive extractor with lightning-fast base64 decoder (written in C)

Language: C++ - Size: 2.2 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Sicos1977/WebArchiveExtractor

A .NET Standard 2.0 library to extract a Safari web archive to a folder

Language: C# - Size: 270 KB - Last synced: 10 days ago - Pushed: about 3 years ago - Stars: 5 - Forks: 1

mhucka/devilfish

A utility for simultaneously creating full-page PDF snapshots and web archives of web pages in DEVONthink Pro.

Language: AppleScript - Size: 253 KB - Last synced: 13 days ago - Pushed: almost 4 years ago - Stars: 20 - Forks: 1

raviraa/htmltoebook

Convert html web pages to readable ebook

Language: Go - Size: 219 KB - Last synced: 5 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

WebarchivCZ/WA-KAT

Cataloging tool for the Czech web archive.

Language: JavaScript - Size: 13.6 MB - Last synced: about 1 month ago - Pushed: 10 months ago - Stars: 2 - Forks: 0

RimeCoOfficial/Arya Fork of Caffeine-Devotee/arya

A snapshot of Arya.Rime.co

Language: HTML - Size: 878 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

helgeho/WarcPartitioner

Partition (W)ARC Files by MIME Type and Year

Language: Java - Size: 8.79 KB - Last synced: 3 days ago - Pushed: about 7 years ago - Stars: 1 - Forks: 1

pereslavtsev/memento-client

Time Travel APIs NodeJS library with full support of the Memento protocol.

Language: TypeScript - Size: 294 KB - Last synced: 21 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

ibnesayeed/archival-tests

A set of web archival replay test cases

Language: HTML - Size: 11.7 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

thorkill/dbce

Diff Based Content Extraction is a part of my Bachelor Thesis: Joint Approach to Boilerplate Detection in Web Archives

Language: HTML - Size: 5.53 MB - Last synced: about 1 year ago - Pushed: almost 7 years ago - Stars: 1 - Forks: 1

helgeho/HadoopConcatGz

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

Language: Java - Size: 51.8 KB - Last synced: 3 days ago - Pushed: over 6 years ago - Stars: 9 - Forks: 3

mccallofthewild/alexandrias-revenge

🔥The bold new archive that can’t be burned, bulldozed or battering-rammed #PoweredByArweave

Language: TypeScript - Size: 1.63 MB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 4 - Forks: 1

pierlauro/MDBubing

From WARC records to MongoDB documents

Language: Java - Size: 145 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0

ukwa/waybacks

This module builds our Waybacks in the various different configurations we require.

Language: Java - Size: 23.2 MB - Last synced: about 1 year ago - Pushed: almost 6 years ago - Stars: 2 - Forks: 2

nlnwa/testsites

Docker image with DNS and Apache, serving dummy testsites

Language: Shell - Size: 7.07 MB - Last synced: 28 days ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

Kulturarvscluster/Getting-started-with-Netarkivet-and-Sparklyr

Some short code snippets and tutorials for getting started with Sparklyr and an ETL for the Danish Netarchive

Size: 9.77 KB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0

elhardoum/scrape-wayback-machine

Language: Python - Size: 14.6 KB - Last synced: about 1 year ago - Pushed: over 5 years ago - Stars: 0 - Forks: 1

N0taN3rd/node-cdxj

Parse CDXJ (https://github.com/oduwsdl/ORS/wiki/CDXJ) files with node.js

Language: JavaScript - Size: 128 KB - Last synced: 4 days ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 1

nlnwa/crawl-test-site

Language: HTML - Size: 9.35 MB - Last synced: 28 days ago - Pushed: over 7 years ago - Stars: 0 - Forks: 0