Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / PlanTL-GOB-ES / corpus-cleaner
Generic toolkit for corpus cleaning
JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlanTL-GOB-ES%2Fcorpus-cleaner
Stars: 3
Forks: 0
Open Issues: 28
License: MIT
Language: Python
Repo Size: 1.63 MB
Dependencies: 59
Created: about 4 years ago
Updated: over 1 year ago
Last pushed: over 1 year ago
Last synced: about 1 year ago
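The JSON API address listed above follows a visible pattern: the host name, then the repository's full name with its slash percent-encoded as `%2F`. A minimal sketch of building that URL and fetching the record, assuming only Python's standard library; the helper names `repo_api_url` and `fetch_repo` are hypothetical, and the shape of the returned JSON is not documented on this page, so the sketch makes no assumptions about its fields:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the JSON API link on this page.
BASE = "https://repos.ecosyste.ms/api/v1"

def repo_api_url(host, full_name):
    """Build the repository endpoint; safe='' also encodes '/' as %2F."""
    return f"{BASE}/hosts/{quote(host, safe='')}/repositories/{quote(full_name, safe='')}"

def fetch_repo(url, timeout=10):
    """Fetch and decode the JSON record (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

url = repo_api_url("GitHub", "PlanTL-GOB-ES/corpus-cleaner")
print(url)
```

Calling `fetch_repo(url)` should return the decoded record as a Python object; inspect its keys before relying on any particular field.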
Dependencies
requirements.txt
pypi
- PyYAML ==6.0
- aiohttp ==3.8.3
- aiosignal ==1.2.0
- alphabet-detector ==0.0.7
- async-timeout ==4.0.2
- attrs ==22.1.0
- beautifulsoup4 ==4.11.1
- chardet ==3.0.4
- charset-normalizer ==2.1.1
- click ==7.1.1
- colorama ==0.4.6
- docopt ==0.6.2
- exceptiongroup ==1.0.0
- fasttext ==0.9.1
- filelock ==3.8.0
- frozenlist ==1.3.1
- ftfy ==5.7
- google ==3.0.0
- grpcio ==1.50.0
- idna ==3.4
- importlib-resources ==5.10.0
- iniconfig ==1.1.1
- joblib ==1.2.0
- jsonschema ==4.17.0
- langid ==1.1.6
- msgpack ==1.0.4
- multidict ==6.0.2
- multiprocessing-logging ==0.3.1
- numpy ==1.22.0
- openfile ==0.0.7
- ordered-set ==3.1.1
- packaging ==21.3
- pip-autoremove ==0.9.1
- pipel ==0.1.1
- pkg_resources ==0.0.0
- pkgutil_resolve_name ==1.3.10
- pluggy ==1.0.0
- protobuf ==3.20.0
- py-spy ==0.3.14
- pybind11 ==2.5.0
- pyparsing ==3.0.9
- pyrsistent ==0.19.1
- pytest ==7.2.0
- ray ==0.8.6
- redis ==3.4.1
- regex ==2020.2.20
- sacremoses ==0.0.38
- selectolax ==0.2.4
- sentence-splitter ==1.4
- six ==1.14.0
- soupsieve ==2.3.2.post1
- textnorm ==1.2
- tomli ==2.0.1
- tqdm ==4.43.0
- warcio ==1.7.3
- wcwidth ==0.1.8
- yarl ==1.8.1
- zipp ==3.10.0
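Every entry in the list above is an exact `==` pin. As a rough illustration of how such pins could be read into a name-to-version mapping, here is a small sketch; the `parse_pins` helper and its regular expression are assumptions made for this example, not part of corpus-cleaner or of Ecosyste.ms, and it handles only the plain `name ==version` form seen here (no extras, markers, or version ranges):

```python
import re

# Matches a fully pinned requirement such as "PyYAML==6.0" or "PyYAML ==6.0".
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*(\S+)\s*$")

def parse_pins(lines):
    """Return {name: version} for each pinned line; skip anything else."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0]  # drop comments
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins

# A few entries copied from the list above, plus a comment line.
sample = [
    "PyYAML ==6.0",
    "soupsieve ==2.3.2.post1",
    "pkgutil_resolve_name ==1.3.10",
    "# not a requirement",
]
pins = parse_pins(sample)
print(pins)
```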
Dockerfile
docker
- ubuntu 20.04 build