An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-selection"

princeton-nlp/LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Language: Jupyter Notebook - Size: 366 KB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 433 - Forks: 44

p-lambda/dsir

DSIR large-scale data selection framework for language model training

Language: Python - Size: 642 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 245 - Forks: 19

alon-albalak/data-selection-survey

A Survey on Data Selection for Language Models

Size: 1.53 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 216 - Forks: 11

georgian-io/Transformers-Domain-Adaptation 📦

⛔ [DEPRECATED] Adapt Transformer-based language models to new text domains

Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 87 - Forks: 13

yueyu1030/Patron

[ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

Language: Python - Size: 458 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 19 - Forks: 2

yichengchen24/MIG

Official code for MIG

Language: Python - Size: 10.3 MB - Last synced at: about 14 hours ago - Pushed at: about 16 hours ago - Stars: 16 - Forks: 1

reds-lab/projektor

This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).

Language: Python - Size: 39.8 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 1

Nokia-Bell-Labs/data-centric-federated-learning

Enhancing Efficiency in Multidevice Federated Learning through Data Selection

Language: Python - Size: 1.93 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 3

ArsamAryandoust/DataSelectionMaps 📦

Enhanced spatio-temporal electric load forecasts with less data using active deep learning

Language: Jupyter Notebook - Size: 875 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 4

JoyeBright/DataSelection-NMT

Repository for the experiments in my paper accepted to the CLIN Journal: "Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts"

Language: Perl - Size: 108 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 10 - Forks: 2

lvapeab/sentence-selectioNN

Keras sentence classification

Language: Python - Size: 139 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 10 - Forks: 1

zincware/ZnNL

A Python package for studying neural learning

Language: Python - Size: 7.68 MB - Last synced at: 17 days ago - Pushed at: 4 months ago - Stars: 8 - Forks: 1

allo-media/cynical-selection

Allo-media data selection tool

Language: Python - Size: 14.6 KB - Last synced at: 28 days ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

sterzhang/CORE

CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay (CogSci 2024 Oral)

Language: Python - Size: 2.96 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 4 - Forks: 1

4AI/generative_deduplication

Code for Generative Deduplication For Social Media Data Selection (Findings of EMNLP 2024)

Language: Python - Size: 541 KB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

ippolito-cmu/ChasingRandom

Official Repository for the Paper: Chasing Random: Instruction Selection Strategies Fail to Generalize

Language: Python - Size: 4.98 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

wyu-du/Self-Training-Dialogue-Generation

This repository contains the data and code for the paper "Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation" (EMNLP2022-Findings).

Language: Python - Size: 5.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Bessouat40/pdf-region-picker

A project to select only part of a PDF file. It's useful when you want to extract information with a Python library such as fitz.

Language: JavaScript - Size: 3.92 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

abderrahman-bns/Data-Cleaning-and-Preprocessinng-with-Pandas

Introducing you to the fundamentals of the quintessential Python data analysis library, pandas, and its core data structures – the Series and DataFrame objects.

Language: Jupyter Notebook - Size: 604 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 2

JiaQiSJTU/IterIT

An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection

Language: Python - Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tigureis/Data-Preparation-from-kickstarter-campaigns

Kickstarter Data Prep: A hands-on guide to basic data cleaning and transformation.

Language: Jupyter Notebook - Size: 6.75 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

koudounasalkis/CSI-MIT

This repo contains the code for "Privacy Preserving Data Selection for Bias Mitigation in Speech Models"

Language: Jupyter Notebook - Size: 72.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SyncfusionExamples/cell-and-checkbox-selection-with-vue-grid-rows

A quick-start project that shows how to perform different types of selection in Vue Grid and explains the available selection modes: Row, Cell, and Both. It contains code snippets for cell, checkbox, and toggle selection, and shows how to get the row index of selected cells using row selection events.

Language: JavaScript - Size: 1.57 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Related Topics
instruction-tuning (4), language-model (3), nlp (3), llm (2), natural-language-processing (2), active-learning (2), data (2), deep-learning (2), data-science (2), few-shot-learning (2), machine-learning (2), physics (1), llms (1), prompt-learning (1), fine-tuning (1), cold-start (1), influence (1), acl2023 (1), spoken-language-understanding (1), speech-recognition (1), privacy (1), bias-mitigation (1), asr (1), transfer-learning (1), llama (1), huggingface-transformers (1), huggingface-tokenizers (1), domain-adaptation (1), deprecated (1), survey (1), mistral (1), scaling-law (1), projection (1), performance-prediction (1), vuegrid (1), vue-grid (1), mathematics (1), machinelearning (1), pandas (1), numpy (1), in-domain (1), in-domain-translation (1), data-integration (1), data-construction (1), data-cleaning (1), region-picker (1), pdf (1), machine-translation (1), parsing (1), javascript (1), fitz (1), extract-data (1), data-extraction (1), spatio-temporal-prediction (1), neural-machine-translation (1), sampling (1), remote-sensing (1), load-forecasting (1), synthetic-data (1), post-training (1), importance-resampling (1), data-filtering (1), theano (1), statistical-machine-translation (1), recurrent-neural-networks (1), neural-network (1), lstm (1), keras (1), convolutional-neural-networks (1), cnn (1), uncertainty-estimation (1), transformers (1), text-augmentation (1), self-training (1), few-shot-generation (1), dialogue-generation (1), wearable-devices (1), wearable-computing (1), split-learning (1), flower (1), federated-learning (1), edge-computing (1), data-centric-machine-learning (1), data-centric-ai (1), generative-deduplication (1), emnlp2024 (1), deduplication (1), vue-component (1), vue-cli (1), vue (1), toggleselection (1), toggle-selection (1), tabledata (1), selection (1), rowselection (1), row-selection (1), grid (1), datatable (1), checkboxselection (1), checkbox-selection (1)