GitHub topics: internet-archiving
ArchiveBox/ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Language: Python - Size: 10.9 MB - Last synced at: about 1 hour ago - Pushed at: about 1 month ago - Stars: 23,684 - Forks: 1,253

ArchiveBox/good-karma-kit
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
Size: 69.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 360 - Forks: 12

ArchiveBox/abx-dl
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...
Language: JavaScript - Size: 177 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 73 - Forks: 4

akamhy/waybackpy
Wayback Machine API interface & a command-line tool
Language: Python - Size: 575 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 519 - Forks: 35

ArchiveBox/docs
Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.
Language: CSS - Size: 7.48 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 16 - Forks: 5

ArchiveBox/archivebox-browser-extension
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
Language: JavaScript - Size: 848 KB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 293 - Forks: 29

pirate/wikipedia-mirror
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
Language: PLpgSQL - Size: 10.2 MB - Last synced at: 17 days ago - Pushed at: 29 days ago - Stars: 446 - Forks: 33

vegetableman/vandal
Navigator for Web Archive
Language: JavaScript - Size: 128 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 155 - Forks: 5

pirate/internet-archiving-talk
🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.
Language: JavaScript - Size: 27.6 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 52 - Forks: 5

ArchiveBox/docker-archivebox
Home of the official docker image for ArchiveBox
Size: 93.8 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 48 - Forks: 12

ElektroStudios/SyncCollection-Enhanced (fork of malfunct/SyncCollection)
Downloads an archive collection from Archive.org to your computer.
Language: C# - Size: 195 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 1

Own-Data-Privateer/hoardy-web
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
Language: Python - Size: 1.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 38 - Forks: 1

ArchiveBox/readability-extractor
JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.
Language: JavaScript - Size: 93.8 KB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 13

mikwielgus/forum-dl
Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC
Language: Python - Size: 391 KB - Last synced at: 9 months ago - Pushed at: 10 months ago - Stars: 68 - Forks: 2

ArchiveBox/electron-archivebox
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
Language: JavaScript - Size: 156 KB - Last synced at: 9 months ago - Pushed at: about 2 years ago - Stars: 174 - Forks: 15

ArchiveBox/archivebox-proxy
Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.
Language: Python - Size: 12.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 10 - Forks: 0

ArchiveBox/pip-archivebox
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
Size: 15.4 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 13 - Forks: 2

ArchiveBox/DigestBox
DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.
Language: HTML - Size: 1.75 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 0

ArchiveBox/debian-archivebox
Home of the official apt/deb package for Ubuntu/Debian-based systems.
Language: Python - Size: 3.34 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 17 - Forks: 5

ArchiveBox/homebrew-archivebox
Homebrew formula for the ArchiveBox self-hosted internet archiving solution.
Language: Ruby - Size: 61.8 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 3

itsliamdowd/WaybackBrowserWindows
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
Language: Python - Size: 106 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

Fooftilly/RSS_archiver
Download and archive RSS feeds to the Wayback Machine. Save a list of archived feeds in a local db.
Language: Python - Size: 30.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Quoorex/archive-file-urls
Submit URLs listed inside a file to website archival services
Language: Python - Size: 17.6 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

itsliamdowd/WaybackBrowserMacOS
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
Language: Swift - Size: 32.2 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

gabldotink/sharkive.old 📦
upload stuff to the Internet Archive using a shell script
Language: Shell - Size: 104 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

httpreserve/conventoarchiver
Repository for collecting scripts to help capture MyConvento newsroom press-releases from the MyConvento PR management suite. The README provides an analysis of the MyConvento URL architecture for users hoping to develop a solution for themselves.
Language: Python - Size: 23.4 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0
