GitHub topics: llm-data-quality
NVIDIA/NeMo-Curator
Scalable data pre-processing and curation toolkit for LLMs
Language: Jupyter Notebook - Size: 7.66 MB - Last synced at: about 21 hours ago - Pushed at: about 22 hours ago - Stars: 879 - Forks: 124

Related Keywords
data (1)
data-curation (1)
data-prep (1)
data-preparation (1)
data-processing (1)
data-processing-pipelines (1)
data-quality (1)
datacuration (1)
datarecipes (1)
deduplication (1)
fast-data-processing (1)
fine-tuning (1)
large-language-models (1)
large-scale-data-processing (1)
llm (1)
llm-data-quality (1)
llmapps (1)
python (1)
semantic-deduplication (1)