GitHub / Anonym0usWork1221 / python-code-docstring-scraper
A multi-threaded GitHub scraper that collects Python code with docstrings from public repositories to build a well-documented dataset for the JaraConverse LLM.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anonym0usWork1221%2Fpython-code-docstring-scraper
PURL: pkg:github/Anonym0usWork1221/python-code-docstring-scraper
Stars: 3
Forks: 0
Open issues: 0
License: MIT
Language: Python
Size: 454 KB
Dependencies: 14
Created at: 11 months ago
Updated at: 5 months ago
Pushed at: 11 months ago
Last synced at: 3 months ago
Topics: causal-language-modeling, data-scraping, dataset, dataset-generation, dataset-scripts, docst, docstring-generator, github-scraper, llm, llm-training, nlp, nlp-machine-learning, python-code, python-dataset, python3, scraper, script
- Deprecated ==1.2.14
- PyGithub ==1.59.1
- PyJWT ==2.8.0
- PyNaCl ==1.5.0
- certifi ==2023.7.22
- cffi ==1.15.1
- charset-normalizer ==3.2.0
- cryptography ==41.0.4
- idna ==3.4
- psutil ==5.9.5
- pycparser ==2.21
- requests ==2.31.0
- urllib3 ==2.0.4
- wrapt ==1.15.0
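
The description above mentions collecting Python code together with its docstrings. As a rough illustration only, and not the repository's actual implementation, the sketch below shows one way to pull documented functions and classes out of a source string using the standard-library `ast` module. The function name `extract_documented_functions`, the record keys, and the `SAMPLE` input are all hypothetical.

```python
import ast


def extract_documented_functions(source: str) -> list[dict]:
    """Return one record per function or class in `source` that has a docstring.

    Hypothetical helper for illustration; not taken from the repository.
    """
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:  # skip definitions without a docstring
                records.append(
                    {
                        "name": node.name,
                        "docstring": doc,
                        "code": ast.get_source_segment(source, node),
                    }
                )
    return records


# Made-up sample input: one documented function, one undocumented.
SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def undocumented(x):
    return x
'''

records = extract_documented_functions(SAMPLE)
for record in records:
    print(record["name"], "->", record["docstring"])
```

Parsing with `ast` rather than regular expressions keeps the extraction robust to formatting differences, at the cost of skipping files that do not parse under the running Python version.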