An open API service providing repository metadata for many open source software ecosystems.

GitHub / adbar: 37 repositories

Research scientist working on natural language processing, web scraping and text analytics, mostly with Python.

adbar/htmldate

Fast and robust date extraction from web pages, with Python or on the command line

Language: Python - Size: 30.1 MB - Last synced at: about 10 hours ago - Pushed at: 7 months ago - Stars: 135 - Forks: 27

adbar/trafilatura

Python & command-line tool to gather text and metadata on the web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Language: Python - Size: 33.8 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 4,512 - Forks: 302

adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful when speed and efficiency matter

Language: Python - Size: 729 MB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 166 - Forks: 14

adbar/py3langid Fork of saffsd/langid.py

Faster, modernized fork of the language identification tool langid.py

Language: Python - Size: 12.3 MB - Last synced at: 20 days ago - Pushed at: 8 months ago - Stars: 56 - Forks: 9

adbar/courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – deduplication, spam, content and language filters

Language: Python - Size: 547 KB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 141 - Forks: 9

adbar/python-chess Fork of niklasf/python-chess

A chess library for Python, with move generation and validation, PGN parsing and writing, Polyglot opening book reading, Gaviota tablebase probing, Syzygy tablebase probing, and UCI/XBoard engine communication

Language: Python - Size: 12.3 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

adbar/lichess-bot Fork of lichess-bot-devs/lichess-bot

A bridge between Lichess bots and chess engines

Language: Python - Size: 1.34 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

adbar/haystack-integrations Fork of deepset-ai/haystack-integrations

🚀 A list of Haystack Integrations, maintained by the community or deepset.

Size: 10 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

adbar/adbar

Size: 4.88 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

adbar/awesome-digital-humanities Fork of dh-tech/awesome-digital-humanities

Software for humanities scholars using quantitative or computational methods.

Language: HTML - Size: 199 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

adbar/Mastodon-OpenScience Fork of germanrepro/Mastodon-OpenScience

Tool to bulk-follow accounts related to Open Science on Mastodon. Runs at https://germanrepro.github.io/Mastodon-OpenScience/ and is based on the DIY web app to bulk-follow sociological accounts on Mastodon by David Adler, Thomas Haase & Hendrik Erz.

Size: 5.57 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

adbar/awesome-german-open-source-ml Fork of johko/awesome-german-open-source-ml

A list of awesome open source projects in the machine learning field whose developers are mainly based in Germany

Size: 1.24 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

adbar/German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Size: 144 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 440 - Forks: 63

adbar/jusText Fork of miso-belica/jusText

Heuristic-based boilerplate removal tool

Language: Python - Size: 1.05 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

adbar/awesome-web-scraper Fork of duyet/awesome-web-scraper

A collection of awesome web scrapers and crawlers.

Size: 30.3 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

adbar/datatrove Fork of huggingface/datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Language: Python - Size: 16 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

adbar/geokelone

Integrates spatial and textual data processing tools into a modular software package that features preprocessing, geocoding, disambiguation and visualization

Language: Python - Size: 260 KB - Last synced at: 4 months ago - Pushed at: about 6 years ago - Stars: 5 - Forks: 0

adbar/flux-toolchain

Filtering and Language-identification for URL Crawling Seeds (FLUCS), a.k.a. FLUX-Toolchain

Language: Perl - Size: 168 KB - Last synced at: about 1 year ago - Pushed at: about 10 years ago - Stars: 2 - Forks: 1

adbar/tweets-tools

Various tools for working with Twitter data

Language: Python - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: about 9 years ago - Stars: 2 - Forks: 0

adbar/awesome-crawler Fork of BruceDone/awesome-crawler

A collection of awesome web crawlers and spiders in different languages

Size: 57.6 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

adbar/coronakorpus

Material for building a German-language web corpus dedicated to COVID-19

Size: 7.57 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

adbar/german-reddit

Extraction of a German Reddit Corpus

Language: Python - Size: 2.93 KB - Last synced at: about 1 year ago - Pushed at: almost 9 years ago - Stars: 3 - Forks: 0

adbar/zeitcrawler 📦

Automatically exported from code.google.com/p/zeitcrawler

Language: Java - Size: 2.24 MB - Last synced at: about 1 year ago - Pushed at: about 10 years ago - Stars: 1 - Forks: 0

adbar/dwdsmor Fork of zentrum-lexikographie/dwdsmor

SFST/SMOR/DWDS-based German Morphology

Language: XSLT - Size: 5.79 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

adbar/trafilatura_gui 📦

Language: Python - Size: 89.8 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

adbar/equipe-crawler 📦

Automatically exported from code.google.com/p/equipe-crawler

Language: Perl - Size: 797 KB - Last synced at: about 1 year ago - Pushed at: about 10 years ago - Stars: 0 - Forks: 0

adbar/gps-corpus-builder 📦

Automatically exported from code.google.com/p/gps-corpus-builder

Language: Perl - Size: 148 KB - Last synced at: about 1 year ago - Pushed at: about 10 years ago - Stars: 0 - Forks: 0

adbar/corpus-visualizer 📦

Explore, visualize and publish corpora as CSS/XHTML documents

Language: CSS - Size: 113 KB - Last synced at: about 1 year ago - Pushed at: almost 13 years ago - Stars: 0 - Forks: 0

adbar/wee-benchmarking-tool Fork of Nootka-io/wee-benchmarking-tool

Size: 6.84 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

adbar/jlcl-style

Experiments to modernize the LaTeX class of the JLCL

Language: TeX - Size: 1.09 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 3

adbar/archiveis Fork of palewire/archiveis

A simple Python wrapper for the archive.is capturing service

Language: Python - Size: 35.2 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

adbar/btw21 Fork of jfilter/btw21

Visualization of the most frequent words in the German federal election in 2021

Language: Jupyter Notebook - Size: 1.43 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

adbar/python-readability Fork of buriy/python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

Language: HTML - Size: 624 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

adbar/jparser Fork of fxsjy/jparser

A readability parser that can extract titles, content and images from HTML pages

Language: Python - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

adbar/cChardet Fork of PyYoshi/cChardet

Universal character encoding detector

Language: Python - Size: 1.09 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

adbar/vardial-experiments

Experiments conducted for the VarDial shared tasks

Language: Python - Size: 15.6 KB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 1

adbar/toponyms

Old prototype for toponym extraction in historical texts written in German

Size: 229 KB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 0

adbar/url-compressor

Fast pattern-based URL compression for lists of links

Language: Pascal - Size: 105 KB - Last synced at: about 1 year ago - Pushed at: almost 13 years ago - Stars: 1 - Forks: 0

adbar/dateparser Fork of scrapinghub/dateparser

Python parser for human-readable dates

Language: Python - Size: 1000 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

adbar/valency-oriented-chunker

A one-pass FSA valency-oriented chunker for German (proof of concept)

Language: Perl - Size: 122 KB - Last synced at: about 1 year ago - Pushed at: almost 9 years ago - Stars: 0 - Forks: 0

adbar/microblog-explorer

Crawl social networks (identi.ca, Reddit, FriendFeed) to gather internal and external links and identify their language

Language: Python - Size: 281 KB - Last synced at: about 1 year ago - Pushed at: about 12 years ago - Stars: 1 - Forks: 0

adbar/laclos

LAnguage-CLassified OpenSubtitles

Language: Python - Size: 172 KB - Last synced at: about 1 year ago - Pushed at: over 10 years ago - Stars: 0 - Forks: 1