An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: internet-archiving

ArchiveBox/ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Language: Python - Size: 10.9 MB - Last synced at: about 1 hour ago - Pushed at: about 1 month ago - Stars: 23,684 - Forks: 1,253

ArchiveBox/good-karma-kit

😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

Size: 69.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 360 - Forks: 12

ArchiveBox/abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

Language: JavaScript - Size: 177 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 73 - Forks: 4

akamhy/waybackpy

Wayback Machine API interface & a command-line tool

Language: Python - Size: 575 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 519 - Forks: 35

ArchiveBox/docs

Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.

Language: CSS - Size: 7.48 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 16 - Forks: 5

ArchiveBox/archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

Language: JavaScript - Size: 848 KB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 293 - Forks: 29

pirate/wikipedia-mirror

🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

Language: PLpgSQL - Size: 10.2 MB - Last synced at: 17 days ago - Pushed at: 29 days ago - Stars: 446 - Forks: 33

vegetableman/vandal

Navigator for Web Archive

Language: JavaScript - Size: 128 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 155 - Forks: 5

pirate/internet-archiving-talk

🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

Language: JavaScript - Size: 27.6 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 52 - Forks: 5

ArchiveBox/docker-archivebox

Home of the official docker image for ArchiveBox

Size: 93.8 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 48 - Forks: 12

ElektroStudios/SyncCollection-Enhanced (fork of malfunct/SyncCollection)

Downloads an archive collection from Archive.org to your computer.

Language: C# - Size: 195 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

Own-Data-Privateer/hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

Language: Python - Size: 1.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 38 - Forks: 1

ArchiveBox/readability-extractor

JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.

Language: JavaScript - Size: 93.8 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 13

mikwielgus/forum-dl

Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC.

Language: Python - Size: 391 KB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 68 - Forks: 2

ArchiveBox/electron-archivebox

Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

Language: JavaScript - Size: 156 KB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 174 - Forks: 15

ArchiveBox/archivebox-proxy

Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.

Language: Python - Size: 12.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 10 - Forks: 0

ArchiveBox/pip-archivebox

Official Python package for ArchiveBox, the self-hosted internet archiving solution.

Size: 15.4 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 13 - Forks: 2

ArchiveBox/DigestBox

DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.

Language: HTML - Size: 1.75 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 0

ArchiveBox/debian-archivebox

Home of the official apt/deb package for Ubuntu/Debian-based systems.

Language: Python - Size: 3.34 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 5

ArchiveBox/homebrew-archivebox

Homebrew formula for the ArchiveBox self-hosted internet archiving solution.

Language: Ruby - Size: 61.8 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 3

itsliamdowd/WaybackBrowserWindows

Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻

Language: Python - Size: 106 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

Fooftilly/RSS_archiver

Download and archive RSS feeds to the Wayback Machine. Saves a list of archived feeds in a local db.

Language: Python - Size: 30.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Quoorex/archive-file-urls

Submit URLs listed inside a file to website archival services

Language: Python - Size: 17.6 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

itsliamdowd/WaybackBrowserMacOS

Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻

Language: Swift - Size: 32.2 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

gabldotink/sharkive.old 📦

Upload stuff to the Internet Archive using a shell script.

Language: Shell - Size: 104 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

httpreserve/conventoarchiver

Repository for collecting scripts to help capture MyConvento newsroom press-releases from the MyConvento PR management suite. The README provides an analysis of the MyConvento URL architecture for users hoping to develop a solution for themselves.

Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0