GitHub topics: llm-training-data
deepakshroff/Capston-Gemini-ChatBot
This project was developed under the guidance of Mr. Lokesh Sir as part of the AI & ML Training Program. It explores LLM integration using Google Gemini APIs with a custom UI built on Streamlit.
Language: Python - Size: 117 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Cre4T3Tiv3/Cre4T3Tiv3
I'm a senior software engineer crafting scalable systems end to end. With 10+ years across fintech, ad-tech, and enterprise SaaS, I deliver production-grade software that fuses robust backend architecture, seamless frontend UX, and cutting-edge AI & ML native tools.
Size: 714 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

vinsblack/The-Stach-Processed-v2
Sample edition of The Stack Enriched, an annotated, secure, and optimized code dataset.
Language: Python - Size: 199 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

BlazeWild/Custom_LLM_DataGen_Template
Modular pipeline for generating high-quality, domain-specific datasets for LLM fine-tuning, from PDFs and web scraping to synthetic Q&A generation, quality filtering, and training-ready formatting.
Language: Python - Size: 24.4 KB - Last synced at: about 2 hours ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

emailmarketingdataset/Open-Email-Marketing-Dataset
This is the Open Email Marketing Dataset; you can use it without any restrictions.
Size: 172 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0
