An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-cleaning"

cleanlab/cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Language: Python - Size: 16.8 MB - Last synced at: 2 days ago - Pushed at: 20 days ago - Stars: 11,239 - Forks: 876

voxel51/fiftyone

Refine high-quality datasets and visual AI models

Language: Python - Size: 2.03 GB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 10,191 - Forks: 696

johnkerl/miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Language: Go - Size: 201 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 9,562 - Forks: 230

unionai-oss/pandera

A light-weight, flexible, and expressive statistical data testing library

Language: Python - Size: 4.72 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 4,141 - Forks: 366

justmarkham/pandas-videos

Jupyter notebook and datasets from the pandas video series

Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 2,187 - Forks: 1,928

OpenDCAI/DataFlow

Easy data preparation with the latest LLM-based operators and pipelines.

Language: Python - Size: 4.88 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 2,003 - Forks: 141

justmarkham/DAT8

General Assembly's 2015 Data Science course in Washington, DC

Language: Jupyter Notebook - Size: 23 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 1,613 - Forks: 1,067

hi-primus/optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Language: Python - Size: 110 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1,540 - Forks: 233

skrub-data/skrub

Machine learning with dataframes

Language: Python - Size: 15.3 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 1,538 - Forks: 187

sfirke/janitor

simple tools for data cleaning in R

Language: R - Size: 8.2 MB - Last synced at: 30 days ago - Pushed at: about 1 year ago - Stars: 1,438 - Forks: 132

data-forge/data-forge-ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Language: TypeScript - Size: 3.68 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1,383 - Forks: 77

ECNU-ICALK/EduChat

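An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue large language model. (General-purpose base model, GPU deployment, data cleaning.) With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM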

Language: Jupyter Notebook - Size: 242 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 889 - Forks: 103

akanz1/klib

Easy to use Python library of customized functions for cleaning and analyzing data.

Language: Python - Size: 47.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 520 - Forks: 56

schema-inspector/schema-inspector

Schema-Inspector is a simple JavaScript object sanitization and validation module.

Language: JavaScript - Size: 1.85 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 503 - Forks: 45

Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

Language: C++ - Size: 153 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 459 - Forks: 88

encord-team/encord-active

The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.

Language: Python - Size: 264 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 449 - Forks: 26

data-cleaning/validate

Professional data validation for the R environment

Language: R - Size: 6.32 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 428 - Forks: 42

DataWithBaraa/sql-data-warehouse-project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

Language: TSQL - Size: 20.5 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 391 - Forks: 320

jim-schwoebel/voicebook

🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).

Language: Python - Size: 299 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 381 - Forks: 86

msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Language: Python - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 378 - Forks: 27

HKUSTDial/awesome-data-agents

Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"

Language: Python - Size: 57 MB - Last synced at: 17 days ago - Pushed at: 20 days ago - Stars: 322 - Forks: 16

rasgointelligence/feature-engineering-tutorials

Data Science Feature Engineering and Selection Tutorials

Language: Jupyter Notebook - Size: 2.76 MB - Last synced at: 21 days ago - Pushed at: 24 days ago - Stars: 289 - Forks: 101

probcomp/PClean

A domain-specific probabilistic programming language for scalable Bayesian data cleaning

Language: Julia - Size: 1.37 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 227 - Forks: 32

CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering

LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML pages. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!

Language: Python - Size: 39.9 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 225 - Forks: 62

genomoncology/FuzzTypes 📦

Pydantic extension for annotating autocorrecting fields.

Language: Python - Size: 359 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 220 - Forks: 4

BdR76/CSVLint

CSV Lint plug-in for Notepad++ for syntax highlighting, CSV validation, automatic column and datatype detection, fixed-width datasets, changing datetime formats and decimal separators, sorting data, counting unique values, and converting to XML, JSON, SQL, etc. A plugin for data cleaning and working with messy data files.

Language: C# - Size: 13.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 205 - Forks: 18

charlesdedampierre/BunkaTopics

🗺️ Data Cleaning and Textual Data Visualization 🗺️

Language: Python - Size: 229 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 197 - Forks: 18

ajaymache/data-analysis-using-python

Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle

Language: Jupyter Notebook - Size: 49.3 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 193 - Forks: 89

ekstroem/dataMaid

An R package for data screening

Language: HTML - Size: 25.5 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 143 - Forks: 26

Hi-Dolphin/datamax

A powerful multi-format file parsing, data cleaning, and AI annotation toolkit.

Language: Python - Size: 3.37 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 142 - Forks: 17

jim-schwoebel/allie

🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.

Language: Python - Size: 275 MB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 141 - Forks: 35

hi-primus/bumblebee

🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)

Language: Vue - Size: 23 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 141 - Forks: 35

aai-institute/pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

Language: Python - Size: 454 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 140 - Forks: 9

KulikDM/pythresh

Outlier Detection Thresholding

Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 137 - Forks: 5

iam-mhaseeb/Skytrax-Data-Warehouse 📦

A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.

Language: Python - Size: 1.34 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 30

ChrisMuir/refinr

Cluster and merge similar string values: an R implementation of OpenRefine clustering algorithms

Language: C++ - Size: 287 KB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 104 - Forks: 5

opendataval/opendataval

OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)

Language: Python - Size: 23.4 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 99 - Forks: 8

sail-sg/sailcraft

🚢 Data Toolkit for Sailor Language Models

Language: Python - Size: 219 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 94 - Forks: 11

trenton3983/DataCamp

Python-based Jupyter notebooks, notes, and project solutions from DataCamp courses on data science, machine learning, and statistics.

Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: 18 days ago - Pushed at: 21 days ago - Stars: 93 - Forks: 97

awesome-mlops/awesome-ml-monitoring

A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profiling data 🚀

Size: 4.88 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 9

Iqrar99/data-analytics-portfolio

Portfolio of data science and data analyst projects completed by me for academic, self-learning, and hobby purposes.

Language: Jupyter Notebook - Size: 11.8 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 84 - Forks: 22

LoLei/redditcleaner

Cleans Reddit Text Data 📜 🧹

Language: Python - Size: 41 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 82 - Forks: 2

cosbidev/PyTrack

A Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction

Language: Python - Size: 92.3 MB - Last synced at: 13 days ago - Pushed at: 12 months ago - Stars: 76 - Forks: 13

HoloClean/HoloClean-Legacy-deprecated 📦

A Machine Learning System for Data Enrichment.

Language: Python - Size: 179 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 75 - Forks: 22

Renumics/sliceguard

A library for detecting problematic data segments in structured and unstructured data with few lines of code.

Language: Python - Size: 4.28 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 64 - Forks: 3

akvo/akvo-lumen

Make sense of your data

Language: JavaScript - Size: 35.5 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 62 - Forks: 18

rvanasa/pandas-gpt

Power up your data science workflow with ChatGPT.

Language: Jupyter Notebook - Size: 498 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 61 - Forks: 9

sharmaroshan/Drugs-Recommendation-using-Reviews

Analyzes drug descriptions, conditions, and reviews, then recommends drugs for each patient health condition using deep learning models.

Language: Jupyter Notebook - Size: 3.86 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 60 - Forks: 31

ibug-group/fpage

FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild

Language: Python - Size: 3.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 59 - Forks: 8

scottythered/gratefuldata

Grateful Data isn't programming code, but an online tutorial about data acquisition, cleaning and enriching, using publicly accessible data on the band the Grateful Dead as examples. Read the Wiki to find out how to use the sample data.

Size: 4.25 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 55 - Forks: 6

LibraryCarpentry/lc-open-refine

Library Carpentry: OpenRefine

Size: 25.2 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 54 - Forks: 136

LaureBerti/Learn2Clean

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

Language: Python - Size: 34.6 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 52 - Forks: 20

hplt-project/OpusCleaner

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

Language: Python - Size: 7.71 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 51 - Forks: 15

ropensci/taxa

taxonomic classes for R

Language: R - Size: 20.7 MB - Last synced at: 30 days ago - Pushed at: 5 months ago - Stars: 50 - Forks: 12

mrankitgupta/Sales-Insights-Data-Analysis-using-Tableau-and-SQL

Sales insights for an India-based hardware company: a data analysis project performed with Tableau and SQL

Size: 4.95 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 50 - Forks: 12

msberends/clean

Fast and Easy Data Cleaning (in R)

Language: R - Size: 459 KB - Last synced at: 9 months ago - Pushed at: over 5 years ago - Stars: 49 - Forks: 1

sharad461/nepali-translator

Neural Machine Translation on the Nepali-English language pair

Language: Python - Size: 3.85 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 47 - Forks: 16

mramshaw/Data-Cleaning

Data Cleaning with Python

Language: Python - Size: 1.17 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 47 - Forks: 17

Elysian01/Data-Purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 45 - Forks: 6

dssg/pgdedupe

A simple command line interface to the datamade/dedupe library.

Language: Jupyter Notebook - Size: 225 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 42 - Forks: 5

skupriienko/Ukrainian-Stopwords

A list of ~2000 Ukrainian stopwords (with numbers)

Language: Python - Size: 116 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 39 - Forks: 19

TheRoniOne/Cleaner.jl

A toolbox of simple solutions for common data cleaning problems.

Language: Julia - Size: 556 KB - Last synced at: 18 days ago - Pushed at: 21 days ago - Stars: 36 - Forks: 3

Digital-Dermatology/SelfClean

🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors (NeurIPS'24).

Language: Python - Size: 37.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 36 - Forks: 1

ropensci-archive/scrubr 📦

⚠️ ARCHIVED ⚠️ Clean species occurrence records

Language: R - Size: 1.14 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 34 - Forks: 10

chrislicodes/Udacity-Data-Analyst-Nanodegree

Repository for the projects needed to complete the Data Analyst Nanodegree.

Language: Jupyter Notebook - Size: 93.1 MB - Last synced at: almost 3 years ago - Pushed at: almost 7 years ago - Stars: 34 - Forks: 22

jacobmarks/image-quality-issues

FiftyOne Plugin for finding common image quality issues

Language: Python - Size: 147 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 33 - Forks: 3

Sinhaaz/Accenture-Data-Analytics-and-Visualization-Virtual-Internship

Accenture Data Analytics & Visualization Internship

Size: 3.9 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 33 - Forks: 12

zhenglz/dockingML

A package for MD, Docking and Machine learning drug discovery pipeline

Language: Python - Size: 34.7 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 33 - Forks: 20

cleanlab/cleanlab-studio

Client interface to Cleanlab Studio

Language: Python - Size: 3.52 MB - Last synced at: 2 days ago - Pushed at: 11 months ago - Stars: 32 - Forks: 10

datacarpentry/stata-economics

Economics Lesson with Stata

Language: Makefile - Size: 17.1 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 20

datacarpentry/OpenRefine-ecology-lesson

Data Cleaning with OpenRefine for Ecologists

Size: 19.1 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 29 - Forks: 111

sharmaroshan/FIFA-2019-Analysis

A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, and countries, among other things, using data analysis and data visualizations.

Language: Jupyter Notebook - Size: 7.18 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 29 - Forks: 23

HITsz-TMG/YiZhao

YiZhao: A 2TB Open Financial Corpus. Data and tools for generating and inspecting YiZhao, a safe, high-quality, open-source bilingual financial corpus (Chinese and English).

Language: Python - Size: 6.68 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 3

sharmaroshan/Big-Mart-Sales-Prediction

Uses machine learning regression algorithms to predict sales patterns, supported by data analysis and data visualizations.

Language: Jupyter Notebook - Size: 648 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 27 - Forks: 10

sharmaroshan/Churn-Modelling-Dataset

Predicting which customers are going to churn from the organization by examining important attributes and applying machine learning and deep learning.

Language: Jupyter Notebook - Size: 319 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 27 - Forks: 33

irsol/udacity-bertelsmann-data-science-challenge-scholarship-2018

This is a repo for my Bertelsmann Data Science Scholarship Challenge: notes, exercises, quizzes.

Language: Python - Size: 5.63 MB - Last synced at: 9 months ago - Pushed at: over 7 years ago - Stars: 27 - Forks: 26

mhmdkardosha/CAT-Reloaded-2025-Data-Science-Roadmap

Roadmap for Data Science circle associated with CAT Reloaded.

Size: 83 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 26 - Forks: 1

jmcastagnetto/covid-19-data-cleanup 📦

Scripts to clean up data from https://github.com/CSSEGISandData/COVID-19

Language: R - Size: 1010 MB - Last synced at: 9 months ago - Pushed at: almost 5 years ago - Stars: 26 - Forks: 13

datacarpentry/openrefine-socialsci

OpenRefine for Social Science Data

Size: 11.7 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 25 - Forks: 47

umich-dbgroup/foofah

Foofah: programming-by-example data transformation program synthesizer

Language: CSS - Size: 4.31 MB - Last synced at: almost 3 years ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 10

roshansridhar/Multimodal-Sentiment-Analysis

Research on improving text sentiment analysis by adding facial features from video, using machine learning.

Language: Jupyter Notebook - Size: 2.04 MB - Last synced at: almost 3 years ago - Pushed at: almost 8 years ago - Stars: 25 - Forks: 10

jkminder/data2neo

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.

Language: Python - Size: 5.59 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 0

SouGuit/Zomato_Dataset_Analysis

Zomato Data Exploration and Analysis with SQL (SQL SERVER)

Language: TSQL - Size: 1.05 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 8

facultyai/boltzmannclean

Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines

Language: Python - Size: 21.5 KB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 24 - Forks: 9

uzaymacar/exemplary-ml-pipeline

Exemplary, annotated machine learning pipeline for any tabular data problem.

Language: Jupyter Notebook - Size: 104 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 24 - Forks: 7

MigoXLab/awesome-data-quality

A comprehensive collection of data quality resources, tools, papers, and projects across various data types including traditional data, LLM pretraining/fine-tuning data, multimodal data, and more. Essential reference for researchers and practitioners in data-centric AI.

Size: 71.3 KB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 23 - Forks: 4

sharmaroshan/Students-Performance-Analytics

Student performance evaluation using feature engineering, feature extraction, data manipulation, data analysis, and data visualization, and finally applying machine learning classification algorithms to separate students with different grades.

Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 12

data-cleaning/errorlocate

Find and replace erroneous fields in data using validation rules

Language: R - Size: 7.76 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 22 - Forks: 3

the-Hull/datacleanr

Interactive and Reproducible Data Cleaning

Language: R - Size: 24.1 MB - Last synced at: 30 days ago - Pushed at: 8 months ago - Stars: 22 - Forks: 5

catalyst/moodle-local_datacleaner

Reduce, filter, and anonymize Moodle data for non-prod environments

Language: PHP - Size: 3.38 MB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 21 - Forks: 17

meaningTeam/tidy-tunes

Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open source models while minimizing dependencies.

Language: Python - Size: 83 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 21 - Forks: 3

FalconSoft/dataPipe

dataPipe is a data processing and data analytics library for JavaScript. Inspired by LINQ (C#) and Pandas (Python)

Language: TypeScript - Size: 279 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 2

KshitizPandya/Natural-Language-Processing-with-Machine-Learning

This repository builds a basic understanding of Natural Language Processing and Machine Learning tasks around it.

Language: Jupyter Notebook - Size: 2.06 MB - Last synced at: almost 3 years ago - Pushed at: almost 3 years ago - Stars: 21 - Forks: 1

bakdata/dedupe

Java DSL for (online) deduplication

Language: Java - Size: 1.01 MB - Last synced at: 9 months ago - Pushed at: 12 months ago - Stars: 20 - Forks: 2

rubydamodar/The-Ultimate-Pandas-Bootcamp

Welcome to the Pandas for Data Science repository! This course is designed to take you from beginner to proficient in using Pandas, the powerful data manipulation library in Python. Whether you're just starting your data science journey or looking to sharpen your skills, this repository contains all the resources

Language: Jupyter Notebook - Size: 459 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 0

ammarshaikh123/Projects-on-Data-Cleaning-and-Manipulation

This repository contains projects I have worked on for Data Cleaning and Manipulation in Python.

Language: Jupyter Notebook - Size: 8.55 MB - Last synced at: almost 3 years ago - Pushed at: about 6 years ago - Stars: 20 - Forks: 16

Amine-Smahi/R-Learning-Journey

Some of the projects I made when starting to learn R for Data Science at university

Language: R - Size: 63.5 KB - Last synced at: 9 months ago - Pushed at: over 6 years ago - Stars: 20 - Forks: 0

LimaRAF/plantR

An R Package for Managing Species Records from Biological Collections

Language: R - Size: 590 MB - Last synced at: 18 days ago - Pushed at: 20 days ago - Stars: 19 - Forks: 7

BioPsyk/cleansumstats

Convert GWAS sumstat files into a common format with a common reference for positions, rsids and effect alleles.

Language: Shell - Size: 39.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 19 - Forks: 3

Aifred-Health/VulcanAI

A high-level deep learning framework for quickly prototyping networks, with added tools for data visualisation, model interpretability, and performance metrics

Language: Python - Size: 25.8 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 7

Related Topics
data-visualization 362, data-analysis 345, python 302, data-science 273, machine-learning 210, pandas 182, exploratory-data-analysis 104, feature-engineering 86, sql 81, jupyter-notebook 80, numpy 78, data-wrangling 68, eda 68, matplotlib 67, data-preprocessing 67, data 66, python3 63, data-analytics 62, r 55, seaborn 54, powerbi 48, data-processing 44, tableau 44, excel 43, data-mining 43, data-transformation 42, data-manipulation 40, dashboard 39, data-engineering 38, deep-learning 37, statistics 34, visualization 33, csv 32, web-scraping 30, feature-selection 30, logistic-regression 28, data-exploration 28, scikit-learn 26, classification 25, data-modeling 24, etl 24, nlp 23, feature-extraction 23, machine-learning-algorithms 22, business-intelligence 22, sklearn 21, data-preparation 21, predictive-modeling 21, mysql 20, dataset 20, data-quality 20, linear-regression 20, clustering 19, javascript 19, data-cleansing 19, data-collection 18, outlier-detection 18, pivot-tables 18, natural-language-processing 18, streamlit 18, database 17, pandas-dataframe 17, data-visualisation 16, random-forest 16, regression-models 16, data-analysis-python 16, automation 15, preprocessing 15, plotly 15, sentiment-analysis 15, analytics 14, statistical-analysis 14, artificial-intelligence 14, data-validation 14, datascience 13, postgresql 13, webscraping 13, dashboards 13, cross-validation 13, data-profiling 12, data-management 12, dax 12, json 12, data-pipeline 12, beautifulsoup 11, tableau-public 11, flask 11, business-analytics 11, pandas-python 11, hyperparameter-tuning 11, model-evaluation 11, supervised-learning 10, power-bi 10, data-centric-ai 10, matplotlib-pyplot 10, model-building 10, data-curation 10, etl-pipeline 10, datasets 10, regression 10