Topic: "data-selection"
princeton-nlp/LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Language: Jupyter Notebook - Size: 366 KB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 433 - Forks: 44

p-lambda/dsir
DSIR: a large-scale data selection framework for language model training
Language: Python - Size: 642 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 245 - Forks: 19

alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models
Size: 1.53 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 216 - Forks: 11

georgian-io/Transformers-Domain-Adaptation 📦
[DEPRECATED] Adapt Transformer-based language models to new text domains
Language: Jupyter Notebook - Size: 1.93 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 87 - Forks: 13

yueyu1030/Patron
[ACL 2023] Code for the ACL'23 paper "Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach"
Language: Python - Size: 458 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 19 - Forks: 2

yichengchen24/MIG
Official code for MIG
Language: Python - Size: 10.3 MB - Last synced at: about 14 hours ago - Pushed at: about 16 hours ago - Stars: 16 - Forks: 1

reds-lab/projektor
This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).
Language: Python - Size: 39.8 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 1

Nokia-Bell-Labs/data-centric-federated-learning
Enhancing Efficiency in Multidevice Federated Learning through Data Selection
Language: Python - Size: 1.93 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 3

ArsamAryandoust/DataSelectionMaps 📦
Enhanced spatio-temporal electric load forecasts with less data using active deep learning
Language: Jupyter Notebook - Size: 875 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 4

JoyeBright/DataSelection-NMT
Repository for the experiments in my paper accepted to the CLIN Journal: "Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts"
Language: Perl - Size: 108 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 10 - Forks: 2

lvapeab/sentence-selectioNN
Keras sentence classification
Language: Python - Size: 139 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 10 - Forks: 1

zincware/ZnNL
A Python package for studying neural learning
Language: Python - Size: 7.68 MB - Last synced at: 17 days ago - Pushed at: 4 months ago - Stars: 8 - Forks: 1

allo-media/cynical-selection
Allo-media data selection tool
Language: Python - Size: 14.6 KB - Last synced at: 28 days ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

sterzhang/CORE
CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay (CogSci 2024 Oral)
Language: Python - Size: 2.96 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 4 - Forks: 1

4AI/generative_deduplication
Code for Generative Deduplication For Social Media Data Selection (Findings of EMNLP 2024)
Language: Python - Size: 541 KB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

ippolito-cmu/ChasingRandom
Official Repository for the Paper: Chasing Random: Instruction Selection Strategies Fail to Generalize
Language: Python - Size: 4.98 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

wyu-du/Self-Training-Dialogue-Generation
This repository contains the data and code for the paper "Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation" (EMNLP2022-Findings).
Language: Python - Size: 5.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Bessouat40/pdf-region-picker
A project to select only part of a PDF file. It's useful when you want to extract information with a Python library such as fitz.
Language: JavaScript - Size: 3.92 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

abderrahman-bns/Data-Cleaning-and-Preprocessinng-with-Pandas
Introducing you to the fundamentals of the quintessential Python data analysis library, pandas, and its core data structures – the Series and DataFrame objects.
Language: Jupyter Notebook - Size: 604 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 2

JiaQiSJTU/IterIT
An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection
Language: Python - Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

tigureis/Data-Preparation-from-kickstarter-campaigns
Kickstarter Data Prep: A hands-on guide to basic data cleaning and transformation.
Language: Jupyter Notebook - Size: 6.75 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

koudounasalkis/CSI-MIT
This repo contains the code for "Privacy Preserving Data Selection for Bias Mitigation in Speech Models"
Language: Jupyter Notebook - Size: 72.3 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

SyncfusionExamples/cell-and-checkbox-selection-with-vue-grid-rows
A quick-start project that helps you perform different types of selection in Vue Grid and learn about the different selection modes: Row, Cell, and Both. It contains code snippets for cell, checkbox, and toggle selection, and shows how to get the row index of selected cells using row selection events.
Language: JavaScript - Size: 1.57 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
