GitHub / Anonym0usWork1221 / python-code-docstring-scraper
A multi-threaded GitHub scraper that collects Python code with docstrings from public repositories to build a well-documented dataset for the JaraConverse LLM.
JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anonym0usWork1221%2Fpython-code-docstring-scraper
PURL: pkg:github/Anonym0usWork1221/python-code-docstring-scraper
Stars: 3
Forks: 0
Open issues: 0
License: MIT
Language: Python
Size: 454 KB
Dependencies parsed at: Pending
Created at: 11 months ago
Updated at: 5 months ago
Pushed at: 11 months ago
Last synced at: 2 months ago
Topics: causal-language-modeling, data-scraping, dataset, dataset-generation, dataset-scripts, docst, docstring-generator, github-scraper, llm, llm-training, nlp, nlp-machine-learning, python-code, python-dataset, python3, scraper, script