Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / DavidNemeskey / cc_corpus
Tools for compiling corpora from Common Crawl
JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DavidNemeskey%2Fcc_corpus
Stars: 12
Forks: 1
Open Issues: 11
License: lgpl-3.0
Language: Python
Repo Size: 964 KB
Dependencies: 28
Created: over 5 years ago
Updated: 2 days ago
Last pushed: 2 days ago
Last synced: 2 days ago
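
The JSON API URL listed above can be queried directly. The following is a minimal sketch, assuming the endpoint returns a JSON object and needs no authentication; the helper names `repository_url` and `fetch_repository` are hypothetical and are not part of any official client. The response fields are not shown on this page, so the sketch returns the decoded JSON without assuming any field names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the JSON API URL shown on this page.
API_ROOT = "https://repos.ecosyste.ms/api/v1"


def repository_url(host: str, full_name: str) -> str:
    """Build the API URL for a repository (hypothetical helper).

    The owner/name separator must be percent-encoded as %2F,
    matching the URL shown on this page.
    """
    return (
        f"{API_ROOT}/hosts/{quote(host, safe='')}"
        f"/repositories/{quote(full_name, safe='')}"
    )


def fetch_repository(host: str, full_name: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the repository metadata (assumes a JSON object body)."""
    with urlopen(repository_url(host, full_name), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires network access; prints whatever top-level keys the API returns.
    metadata = fetch_repository("GitHub", "DavidNemeskey/cc_corpus")
    print(sorted(metadata))
```

Building the URL in a separate function keeps the percent-encoding step testable without a network call.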
Dependencies
setup.py
pypi
- Boilerplate *
- Easier *
- Just *
- Language *
- Minhash *
- Type *
- Uncommented *
- WARC *
- Will *
- beautifulsoup4 *
- boto3 *
- botocore *
- cld2-cffi *
- datasketch *
- idzip *
- justext *
- langid *
- lxml *
- multiprocessing-logging *
- progress *
- redis *
- requests *
- simplejson *
- tldextract *
- tqdm *
- typing *
- warc *
- warc3-wet *