GitHub topics: data-splitting
drkocoglu/Patter_Recognition_Class
Pattern Recognition - ECE - TTU - Spring 2021
Language: MATLAB - Size: 21.2 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sharejing/Takin
A Python toolkit for file processing, text cleaning, and data splitting.
Language: Python - Size: 2.42 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 7

szcf-weiya/SplitClusterTest.jl
Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"
Language: Julia - Size: 1.02 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 1

aarryasutar/Credit_EDA
This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.
Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

krisssix/BankruptcyPrediction
Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.
Language: Python - Size: 4.88 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

katiebristol/data_splitter
A basic Python script to split a .dat file into individual sample files.
Language: Python - Size: 1.7 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Lefteris-Souflas/Propensity-To-Lapse-Model-Building-Exercise
Analyzed customer churn using transaction data and built an ML model to predict lapses. The dataset includes customer status, collection/redemption info, and program tenure. Delivered a business presentation outlining the modeling approach, findings, and churn reduction strategies.
Size: 1.67 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shenwanxiang/ChemBench
MoleculeNet benchmark dataset & MolMapNet dataset
Language: HTML - Size: 126 MB - Last synced at: 12 months ago - Pushed at: about 3 years ago - Stars: 59 - Forks: 17

Lefteris-Souflas/Spark-Movies-Analytics
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
Language: Jupyter Notebook - Size: 289 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

MadhuBala11/DiabetesPrediction
In this project, I used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perform the model building.
Language: Jupyter Notebook - Size: 3.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

MuhireIghor/health-pro-backend
A sample model for predicting the systolic level of an individual from their age, cholesterol, and blood pressure.
Language: JavaScript - Size: 30.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

JVTupinamba/Kennard-Stone-Mahalanobis
The TensorFlow Kennard-Stone algorithm uses Euclidean distances, so an adaptation is needed when working in a large vector space with unknown correlations between its variables. Such an adaptation may substantially improve neural network performance.
Language: Jupyter Notebook - Size: 50.8 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

UERJ-LIVIA/Kennard-Stone-Mahalanobis
The TensorFlow Kennard-Stone algorithm uses Euclidean distances, so an adaptation is needed when working in a large vector space with unknown correlations between its variables. Such an adaptation may substantially improve neural network performance.
Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
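
For readers unfamiliar with the idea named in the two Kennard-Stone-Mahalanobis entries, here is a minimal sketch of Kennard-Stone sample selection with Mahalanobis distances swapped in for Euclidean ones. It assumes NumPy and is not taken from either repository; the function name `kennard_stone_mahalanobis` and its parameters are hypothetical. It builds the full pairwise distance matrix, so it only suits small datasets.

```python
import numpy as np

def kennard_stone_mahalanobis(X, n_select):
    """Pick n_select row indices of X by Kennard-Stone selection,
    measuring distances with the Mahalanobis metric.
    Hypothetical helper for illustration only."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of rows")

    # Inverse covariance of the features; pinv tolerates a singular covariance.
    inv_cov = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))

    # Pairwise Mahalanobis distances: sqrt((x - y)^T S^-1 (x - y)).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    dist = np.sqrt(np.maximum(d2, 0.0))

    # Step 1: start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # Step 2: repeatedly add the sample whose nearest selected
    # neighbour is farthest away (max-min criterion).
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0  # never re-pick a selected sample
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected
```

The selected indices would typically form the training set, with the remaining rows held out for testing. With uncorrelated, equally scaled features, the Mahalanobis metric is just a rescaled Euclidean distance, so the selection order matches plain Kennard-Stone.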

Nicolas-Bolouri/Cloud-Data-Protection-Analysis
Comparative Analysis of Data Protection Mechanisms in Public Clouds
Size: 1.46 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yuerongz/DUPLEX-data-split-function
Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.
Language: Python - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

omardbaa/Data-Splitter
Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.
Language: Jupyter Notebook - Size: 3.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
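
The ratio-based random split described for Data-Splitter can be sketched as follows. This is an illustrative assumption about the general technique, not the repository's code: `split_by_ratios` and its parameters are hypothetical names, and writing each part out as JSON, a database table, or CSV is left out.

```python
import random

def split_by_ratios(rows, ratios, seed=None):
    """Shuffle rows and cut them into len(ratios) parts whose sizes
    follow the given ratios. Hypothetical helper for illustration only."""
    if not ratios or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be non-negative and sum to a positive value")

    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducible splits

    total = sum(ratios)
    parts, start, acc = [], 0, 0.0
    for r in ratios:
        acc += r
        # Cumulative cut points keep every row in exactly one part.
        end = round(len(rows) * acc / total)
        parts.append(rows[start:end])
        start = end
    return parts
```

For example, `split_by_ratios(records, [0.5, 0.3, 0.2], seed=42)` would return three lists that together contain every record exactly once.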

Dimas263/Preprocessing-Data-into-Train-Test-Val-Data
Python Preprocessing for Sales Project Notebook
Language: Jupyter Notebook - Size: 3.06 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

NabilahSharfina/Ruangguru-Bootcamp
Final project for the DBA program, Ruangguru partner x Studi Independen Bersertifikat Kampus Merdeka, batch 2
Language: Jupyter Notebook - Size: 19 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0
