Topic: "data-cleaning"
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Language: Python - Size: 16.8 MB - Last synced at: 2 days ago - Pushed at: 20 days ago - Stars: 11,239 - Forks: 876
voxel51/fiftyone
Refine high-quality datasets and visual AI models
Language: Python - Size: 2.03 GB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 10,191 - Forks: 696
johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Language: Go - Size: 201 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 9,562 - Forks: 230
unionai-oss/pandera
A light-weight, flexible, and expressive statistical data testing library
Language: Python - Size: 4.72 MB - Last synced at: 11 days ago - Pushed at: 13 days ago - Stars: 4,141 - Forks: 366
justmarkham/pandas-videos
Jupyter notebook and datasets from the pandas video series
Language: Jupyter Notebook - Size: 1.84 MB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 2,187 - Forks: 1,928
OpenDCAI/DataFlow
Easy data preparation with the latest LLM-based operators and pipelines.
Language: Python - Size: 4.88 MB - Last synced at: 5 days ago - Pushed at: 8 days ago - Stars: 2,003 - Forks: 141
justmarkham/DAT8
General Assembly's 2015 Data Science course in Washington, DC
Language: Jupyter Notebook - Size: 23 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 1,613 - Forks: 1,067
hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Language: Python - Size: 110 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1,540 - Forks: 233
skrub-data/skrub
Machine learning with dataframes
Language: Python - Size: 15.3 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 1,538 - Forks: 187
sfirke/janitor
simple tools for data cleaning in R
Language: R - Size: 8.2 MB - Last synced at: 30 days ago - Pushed at: about 1 year ago - Stars: 1,438 - Forks: 132
data-forge/data-forge-ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Language: TypeScript - Size: 3.68 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1,383 - Forks: 77
ECNU-ICALK/EduChat
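An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue large language model (general-purpose base model, GPU deployment, data cleaning). Credits: LLaMA, MOSS, BELLE, Ziya, vLLM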
Language: Jupyter Notebook - Size: 242 MB - Last synced at: 15 days ago - Pushed at: 6 months ago - Stars: 889 - Forks: 103
akanz1/klib
Easy to use Python library of customized functions for cleaning and analyzing data.
Language: Python - Size: 47.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 520 - Forks: 56
schema-inspector/schema-inspector
Schema-Inspector is a simple JavaScript object sanitization and validation module.
Language: JavaScript - Size: 1.85 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 503 - Forks: 45
Desbordante/desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Language: C++ - Size: 153 MB - Last synced at: 3 days ago - Pushed at: 5 days ago - Stars: 459 - Forks: 88
encord-team/encord-active
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Language: Python - Size: 264 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 449 - Forks: 26
data-cleaning/validate
Professional data validation for the R environment
Language: R - Size: 6.32 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 428 - Forks: 42
DataWithBaraa/sql-data-warehouse-project
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
Language: TSQL - Size: 20.5 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 391 - Forks: 320
jim-schwoebel/voicebook
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Language: Python - Size: 299 MB - Last synced at: 8 months ago - Pushed at: about 3 years ago - Stars: 381 - Forks: 86
msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Language: Python - Size: 25.4 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 378 - Forks: 27
HKUSTDial/awesome-data-agents
Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"
Language: Python - Size: 57 MB - Last synced at: 17 days ago - Pushed at: 20 days ago - Stars: 322 - Forks: 16
rasgointelligence/feature-engineering-tutorials
Data Science Feature Engineering and Selection Tutorials
Language: Jupyter Notebook - Size: 2.76 MB - Last synced at: 21 days ago - Pushed at: 24 days ago - Stars: 289 - Forks: 101
probcomp/PClean
A domain-specific probabilistic programming language for scalable Bayesian data cleaning
Language: Julia - Size: 1.37 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 227 - Forks: 32
CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering
LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML pages. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
Language: Python - Size: 39.9 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 225 - Forks: 62
genomoncology/FuzzTypes 📦
Pydantic extension for annotating autocorrecting fields.
Language: Python - Size: 359 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 220 - Forks: 4
BdR76/CSVLint
CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.
Language: C# - Size: 13.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 205 - Forks: 18
charlesdedampierre/BunkaTopics
🗺️ Data Cleaning and Textual Data Visualization 🗺️
Language: Python - Size: 229 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 197 - Forks: 18
ajaymache/data-analysis-using-python
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle
Language: Jupyter Notebook - Size: 49.3 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 193 - Forks: 89
ekstroem/dataMaid
An R package for data screening
Language: HTML - Size: 25.5 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 143 - Forks: 26
Hi-Dolphin/datamax
A powerful multi-format file parsing, data cleaning, and AI annotation toolkit.
Language: Python - Size: 3.37 MB - Last synced at: 30 days ago - Pushed at: about 1 month ago - Stars: 142 - Forks: 17
jim-schwoebel/allie
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Language: Python - Size: 275 MB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 141 - Forks: 35
hi-primus/bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Language: Vue - Size: 23 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 141 - Forks: 35
aai-institute/pyDVL
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
Language: Python - Size: 454 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 140 - Forks: 9
KulikDM/pythresh
Outlier Detection Thresholding
Language: Jupyter Notebook - Size: 14.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 137 - Forks: 5
iam-mhaseeb/Skytrax-Data-Warehouse 📦
A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
Language: Python - Size: 1.34 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 30
ChrisMuir/refinr
Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms
Language: C++ - Size: 287 KB - Last synced at: 29 days ago - Pushed at: almost 2 years ago - Stars: 104 - Forks: 5
opendataval/opendataval
OpenDataVal: a Unified Benchmark for Data Valuation in Python (NeurIPS 2023)
Language: Python - Size: 23.4 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 99 - Forks: 8
sail-sg/sailcraft
🚢 Data Toolkit for Sailor Language Models
Language: Python - Size: 219 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 94 - Forks: 11
trenton3983/DataCamp
Python-based Jupyter notebooks, notes, and project solutions from DataCamp courses on data science, machine learning, and statistics.
Language: Jupyter Notebook - Size: 13.3 MB - Last synced at: 18 days ago - Pushed at: 21 days ago - Stars: 93 - Forks: 97
awesome-mlops/awesome-ml-monitoring
A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profiling data 🚀
Size: 4.88 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 90 - Forks: 9
Iqrar99/data-analytics-portfolio
Portfolio of data science and data analyst projects completed by me for academic, self learning, and hobby purposes.
Language: Jupyter Notebook - Size: 11.8 MB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 84 - Forks: 22
LoLei/redditcleaner
Cleans Reddit Text Data 📜 🧹
Language: Python - Size: 41 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 82 - Forks: 2
cosbidev/PyTrack
a Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
Language: Python - Size: 92.3 MB - Last synced at: 13 days ago - Pushed at: 12 months ago - Stars: 76 - Forks: 13
HoloClean/HoloClean-Legacy-deprecated 📦
A Machine Learning System for Data Enrichment.
Language: Python - Size: 179 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 75 - Forks: 22
Renumics/sliceguard
A library for detecting problematic data segments in structured and unstructured data with a few lines of code.
Language: Python - Size: 4.28 MB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 64 - Forks: 3
akvo/akvo-lumen
Make sense of your data
Language: JavaScript - Size: 35.5 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 62 - Forks: 18
rvanasa/pandas-gpt
Power up your data science workflow with ChatGPT.
Language: Jupyter Notebook - Size: 498 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 61 - Forks: 9
sharmaroshan/Drugs-Recommendation-using-Reviews
Analyzing drug descriptions, conditions, and reviews, then recommending drugs for each patient health condition using deep learning models.
Language: Jupyter Notebook - Size: 3.86 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 60 - Forks: 31
ibug-group/fpage
FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in the Wild
Language: Python - Size: 3.7 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 59 - Forks: 8
scottythered/gratefuldata
Grateful Data isn't programming code, but an online tutorial about data acquisition, cleaning and enriching, using publicly accessible data on the band the Grateful Dead as examples. Read the Wiki to find out how to use the sample data.
Size: 4.25 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 55 - Forks: 6
LibraryCarpentry/lc-open-refine
Library Carpentry: OpenRefine
Size: 25.2 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 54 - Forks: 136
LaureBerti/Learn2Clean
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
Language: Python - Size: 34.6 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 52 - Forks: 20
hplt-project/OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
Language: Python - Size: 7.71 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 51 - Forks: 15
ropensci/taxa
taxonomic classes for R
Language: R - Size: 20.7 MB - Last synced at: 30 days ago - Pushed at: 5 months ago - Stars: 50 - Forks: 12
mrankitgupta/Sales-Insights-Data-Analysis-using-Tableau-and-SQL
Sales insights for an India-based hardware company: a data analysis project performed with Tableau and SQL
Size: 4.95 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 50 - Forks: 12
msberends/clean
Fast and Easy Data Cleaning (in R)
Language: R - Size: 459 KB - Last synced at: 9 months ago - Pushed at: over 5 years ago - Stars: 49 - Forks: 1
sharad461/nepali-translator
Neural Machine Translation on the Nepali-English language pair
Language: Python - Size: 3.85 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 47 - Forks: 16
mramshaw/Data-Cleaning
Data Cleaning with Python
Language: Python - Size: 1.17 MB - Last synced at: 5 months ago - Pushed at: over 1 year ago - Stars: 47 - Forks: 17
Elysian01/Data-Purifier
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.
Language: Jupyter Notebook - Size: 7.51 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 45 - Forks: 6
dssg/pgdedupe
A simple command line interface to the datamade/dedupe library.
Language: Jupyter Notebook - Size: 225 KB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 42 - Forks: 5
skupriienko/Ukrainian-Stopwords
A list of ~2000 Ukrainian stopwords (with numbers)
Language: Python - Size: 116 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 39 - Forks: 19
TheRoniOne/Cleaner.jl
A toolbox of simple solutions for common data cleaning problems.
Language: Julia - Size: 556 KB - Last synced at: 18 days ago - Pushed at: 21 days ago - Stars: 36 - Forks: 3
Digital-Dermatology/SelfClean
🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors (NeurIPS'24).
Language: Python - Size: 37.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 36 - Forks: 1
ropensci-archive/scrubr 📦
⚠️ ARCHIVED ⚠️ Clean species occurrence records
Language: R - Size: 1.14 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 34 - Forks: 10
chrislicodes/Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Language: Jupyter Notebook - Size: 93.1 MB - Last synced at: almost 3 years ago - Pushed at: almost 7 years ago - Stars: 34 - Forks: 22
jacobmarks/image-quality-issues
FiftyOne Plugin for finding common image quality issues
Language: Python - Size: 147 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 33 - Forks: 3
Sinhaaz/Accenture-Data-Analytics-and-Visualization-Virtual-Internship
Accenture Data Analytics & Visualization Internship
Size: 3.9 MB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 33 - Forks: 12
zhenglz/dockingML
A package for MD, Docking and Machine learning drug discovery pipeline
Language: Python - Size: 34.7 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 33 - Forks: 20
cleanlab/cleanlab-studio
Client interface to Cleanlab Studio
Language: Python - Size: 3.52 MB - Last synced at: 2 days ago - Pushed at: 11 months ago - Stars: 32 - Forks: 10
datacarpentry/stata-economics
Economics Lesson with Stata
Language: Makefile - Size: 17.1 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 20
datacarpentry/OpenRefine-ecology-lesson
Data Cleaning with OpenRefine for Ecologists
Size: 19.1 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 29 - Forks: 111
sharmaroshan/FIFA-2019-Analysis
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and related factors using data analysis and data visualization.
Language: Jupyter Notebook - Size: 7.18 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 29 - Forks: 23
HITsz-TMG/YiZhao
YiZhao: A 2TB Open Financial Corpus. Data and tools for generating and inspecting YiZhao, a safe, high-quality, open-source bilingual financial corpus (Chinese and English).
Language: Python - Size: 6.68 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 3
sharmaroshan/Big-Mart-Sales-Prediction
Using machine learning algorithms for regression analysis to predict sales patterns, supported by data analysis and data visualization.
Language: Jupyter Notebook - Size: 648 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 27 - Forks: 10
sharmaroshan/Churn-Modelling-Dataset
Predicting which customers are going to churn from the organization by examining some important attributes and applying machine learning and deep learning.
Language: Jupyter Notebook - Size: 319 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 27 - Forks: 33
irsol/udacity-bertelsmann-data-science-challenge-scholarship-2018
This is a repo for my Bertelsmann Data Science Scholarship Challenge: notes, exercises, quizzes.
Language: Python - Size: 5.63 MB - Last synced at: 9 months ago - Pushed at: over 7 years ago - Stars: 27 - Forks: 26
mhmdkardosha/CAT-Reloaded-2025-Data-Science-Roadmap
Roadmap for Data Science circle associated with CAT Reloaded.
Size: 83 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 26 - Forks: 1
jmcastagnetto/covid-19-data-cleanup 📦
Scripts to cleanup data from https://github.com/CSSEGISandData/COVID-19
Language: R - Size: 1010 MB - Last synced at: 9 months ago - Pushed at: almost 5 years ago - Stars: 26 - Forks: 13
datacarpentry/openrefine-socialsci
OpenRefine for Social Science Data
Size: 11.7 MB - Last synced at: 6 days ago - Pushed at: 9 days ago - Stars: 25 - Forks: 47
umich-dbgroup/foofah
Foofah: programming-by-example data transformation program synthesizer
Language: CSS - Size: 4.31 MB - Last synced at: almost 3 years ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 10
roshansridhar/Multimodal-Sentiment-Analysis
Research on improving text sentiment analysis with facial features from video using machine learning.
Language: Jupyter Notebook - Size: 2.04 MB - Last synced at: almost 3 years ago - Pushed at: almost 8 years ago - Stars: 25 - Forks: 10
jkminder/data2neo
Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.
Language: Python - Size: 5.59 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 0
SouGuit/Zomato_Dataset_Analysis
Zomato Data Exploration and Analysis with SQL (SQL SERVER)
Language: TSQL - Size: 1.05 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 8
facultyai/boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Language: Python - Size: 21.5 KB - Last synced at: 4 days ago - Pushed at: over 5 years ago - Stars: 24 - Forks: 9
uzaymacar/exemplary-ml-pipeline
Exemplary, annotated machine learning pipeline for any tabular data problem.
Language: Jupyter Notebook - Size: 104 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 24 - Forks: 7
MigoXLab/awesome-data-quality
A comprehensive collection of data quality resources, tools, papers, and projects across various data types including traditional data, LLM pretraining/fine-tuning data, multimodal data, and more. Essential reference for researchers and practitioners in data-centric AI.
Size: 71.3 KB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 23 - Forks: 4
sharmaroshan/Students-Performance-Analytics
Student performance evaluation using feature engineering, feature extraction, data manipulation, data analysis, and data visualization, and finally applying machine learning classification algorithms to separate students with different grades
Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 12
data-cleaning/errorlocate
Find and replace erroneous fields in data using validation rules
Language: R - Size: 7.76 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 22 - Forks: 3
the-Hull/datacleanr
Interactive and Reproducible Data Cleaning
Language: R - Size: 24.1 MB - Last synced at: 30 days ago - Pushed at: 8 months ago - Stars: 22 - Forks: 5
catalyst/moodle-local_datacleaner
Reduce, filter, and anonymize moodle data for non-prod environments
Language: PHP - Size: 3.38 MB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 21 - Forks: 17
meaningTeam/tidy-tunes
Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open source models while minimizing dependencies.
Language: Python - Size: 83 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 21 - Forks: 3
FalconSoft/dataPipe
dataPipe is a data processing and data analytics library for JavaScript. Inspired by LINQ (C#) and Pandas (Python)
Language: TypeScript - Size: 279 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 2
KshitizPandya/Natural-Language-Processing-with-Machine-Learning
This repository builds a basic understanding of Natural Language Processing and Machine Learning tasks around it.
Language: Jupyter Notebook - Size: 2.06 MB - Last synced at: almost 3 years ago - Pushed at: almost 3 years ago - Stars: 21 - Forks: 1
bakdata/dedupe
Java DSL for (online) deduplication
Language: Java - Size: 1.01 MB - Last synced at: 9 months ago - Pushed at: 12 months ago - Stars: 20 - Forks: 2
rubydamodar/The-Ultimate-Pandas-Bootcamp
Welcome to the Pandas for Data Science repository! This course is designed to take you from beginner to proficient in using Pandas, the powerful data manipulation library in Python. Whether you're just starting your data science journey or looking to sharpen your skills, this repository contains all the resources
Language: Jupyter Notebook - Size: 459 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 0
ammarshaikh123/Projects-on-Data-Cleaning-and-Manipulation
This repository contains projects I have worked on for Data Cleaning and Manipulation in Python.
Language: Jupyter Notebook - Size: 8.55 MB - Last synced at: almost 3 years ago - Pushed at: about 6 years ago - Stars: 20 - Forks: 16
Amine-Smahi/R-Learning-Journey
Some of the projects I made when starting to learn R for Data Science at the university
Language: R - Size: 63.5 KB - Last synced at: 9 months ago - Pushed at: over 6 years ago - Stars: 20 - Forks: 0
LimaRAF/plantR
An R Package for Managing Species Records from Biological Collections
Language: R - Size: 590 MB - Last synced at: 18 days ago - Pushed at: 20 days ago - Stars: 19 - Forks: 7
BioPsyk/cleansumstats
Convert GWAS sumstat files into a common format with a common reference for positions, rsids and effect alleles.
Language: Shell - Size: 39.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 19 - Forks: 3
Aifred-Health/VulcanAI
A high level deep learning framework for quickly prototyping networks with added tools in data visualisation, model interpretability and performance metrics
Language: Python - Size: 25.8 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 19 - Forks: 7