An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: llm-training-data

deepakshroff/Capston-Gemini-ChatBot

πŸ‘¨β€πŸ«This project was developed under the guidance of Mr. Lokesh Sir as part of the AI & ML Training Program. It explores LLM integration using Google Gemini APIs with a custom UI built on Streamlit.

Language: Python - Size: 117 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Cre4T3Tiv3/Cre4T3Tiv3

I’m a senior software engineer crafting scalable systems end to end. With 10+ years across fintech, ad-tech, and enterprise SaaS, I deliver production-grade software that fuses robust backend architecture, seamless frontend UX, and cutting-edge AI & ML native tools.

Size: 714 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

vinsblack/The-Stach-Processed-v2

Sample edition of The Stack Enriched, an annotated, secure, and optimized code dataset.

Language: Python - Size: 199 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

BlazeWild/Custom_LLM_DataGen_Template

🔧 Modular pipeline for generating high-quality, domain-specific datasets for LLM fine-tuning: from PDFs and web scraping to synthetic Q&A generation, quality filtering, and training-ready formatting.

Language: Python - Size: 24.4 KB - Last synced at: about 2 hours ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

emailmarketingdataset/Open-Email-Marketing-Dataset

This is the Open Email Marketing Dataset; you can use it without any restrictions.

Size: 172 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0