An open API service providing repository metadata for many open source software ecosystems.

Topic: "llm-datasets"

neo4j-labs/text2cypher

Collection of text2cypher datasets, evaluations, and fine-tuning instructions

Language: Jupyter Notebook - Size: 4.8 MB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 177 - Forks: 22

dsdanielpark/open-llm-datasets

Repository for organizing datasets and papers used in Open LLM.

Size: 3.13 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 93 - Forks: 6

discus-labs/discus

A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

Language: Python - Size: 2.38 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 64 - Forks: 7

asimsinan/LLM-Research

A collection of LLM-related papers, theses, tools, datasets, courses, open source models, and benchmarks

Language: Python - Size: 3.12 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 52 - Forks: 8

amao0o0/awesome-AI-Math-Datasets

A collection of recent open-source math datasets for training and evaluating Math LLMs

Size: 129 KB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 11 - Forks: 0

altunenes/rustysozluk

Efficiently fetch eksisozluk.com entries and perform sentiment analysis on them (Turkish only) using Rust

Language: Rust - Size: 670 KB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

ronniross/asi-core-protocol

A framework to analyze how AGI/ASI might emerge from decentralized, adaptive systems, rather than as the fruit of a single model deployment. It also aims to present orientation as a dynamic and self-evolving Magna Carta, helping to guide the emergence of such phenomena.

Size: 122 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 1

neuralwork/audio2chat

Convert multi-speaker audio files to structured chat data for LLMs

Language: Python - Size: 2.02 MB - Last synced at: 1 day ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

DefinetlyNotAI/LLM_Data 📦

Python source code from a number of very well-known repos, collected in this repo as pure localdocs for training code AI

Language: Python - Size: 208 MB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

tiddly-gittly/TiddlyWiki-LLM-dataset

WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)

Language: TypeScript - Size: 363 KB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

dmeldrum6/LLMDatasetBuilder

LLM-Powered Dataset Creation Tool

Language: HTML - Size: 44.9 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

redblock-ai/parrot-python

PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.

Language: Python - Size: 5.97 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

arian-askari/SOLID

Synthetically Generating Intent-Aware Information-Seeking Dialogues! Useful for various tasks such as training/evaluating User Intent Predictors, with the option of training/evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.

Language: Python - Size: 30.6 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

bot08/aiua-20k

Size: 17.6 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

jsurrea/LLM-Latino

Collection of ETL scripts used to create a dataset of text in Spanish to train Large Language Models.

Language: Python - Size: 23.4 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

aloobun/ccpem-modified

A modified dataset consisting of English dialogs between a user and an assistant discussing movie preferences in natural language.

Size: 15 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
