An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-splitting

drkocoglu/Patter_Recognition_Class

Pattern Recognition - ECE - TTU - Spring 2021

Language: MATLAB - Size: 21.2 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sharejing/Takin

A Python toolkit for file processing, text cleaning, and data splitting.

Language: Python - Size: 2.42 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 7

szcf-weiya/SplitClusterTest.jl

Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"

Language: Julia - Size: 1.02 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 1

aarryasutar/Credit_EDA

This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.

Language: Jupyter Notebook - Size: 1.42 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

krisssix/BankruptcyPrediction

Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.

Language: Python - Size: 4.88 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

katiebristol/data_splitter

A basic Python script to split a .dat file into individual sample files.

Language: Python - Size: 1.7 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Lefteris-Souflas/Propensity-To-Lapse-Model-Building-Exercise

Analyzed customer churn using transaction data. Built ML model to predict lapses. Dataset includes customer status, collection/redemption info, and program tenure. Delivered business presentation outlining modeling approach, findings, and churn reduction strategies.

Size: 1.67 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shenwanxiang/ChemBench

MoleculeNet benchmark dataset & MolMapNet dataset

Language: HTML - Size: 126 MB - Last synced at: 12 months ago - Pushed at: about 3 years ago - Stars: 59 - Forks: 17

Lefteris-Souflas/Spark-Movies-Analytics

Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.

Language: Jupyter Notebook - Size: 289 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

MadhuBala11/DiabetesPrediction

In this project, I used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perform model building.

Language: Jupyter Notebook - Size: 3.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

MuhireIghor/health-pro-backend

A sample model for predicting an individual's systolic level from their age, cholesterol, and blood pressure.

Language: JavaScript - Size: 30.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

JVTupinamba/Kennard-Stone-Mahalanobis

Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables. The adaptation may considerably improve neural network performance.

Language: Jupyter Notebook - Size: 50.8 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

UERJ-LIVIA/Kennard-Stone-Mahalanobis

Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables. The adaptation may considerably improve neural network performance.

Language: Jupyter Notebook - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
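
The two Kennard-Stone-Mahalanobis entries above describe swapping Euclidean distance for a correlation-aware distance. A minimal sketch of that idea follows, assuming NumPy and a plain in-memory feature matrix. The function names `mahalanobis_distance_matrix` and `kennard_stone_split` are hypothetical and are not taken from either repository; the code illustrates the general Kennard-Stone selection rule with Mahalanobis distance, not those projects' implementations.

```python
import numpy as np


def mahalanobis_distance_matrix(X):
    """Pairwise Mahalanobis distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    # Covariance of the features; the pseudo-inverse tolerates singular matrices.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    diff = X[:, None, :] - X[None, :, :]
    squared = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(squared, 0.0))


def kennard_stone_split(X, n_train):
    """Return (train_indices, test_indices) chosen by Kennard-Stone selection."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")

    dist = mahalanobis_distance_matrix(X)

    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    if first == second:  # degenerate case: all samples identical
        second = (first + 1) % n
    selected = [int(first), int(second)]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the candidate whose nearest selected sample is farthest away.
    while len(selected) < n_train:
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)

    return selected, remaining
```

Calling `kennard_stone_split(X, n_train)` returns the indices of the training samples in selection order, followed by the indices left over for testing. The full pairwise distance matrix needs memory quadratic in the number of samples, so this sketch suits small datasets only.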

Nicolas-Bolouri/Cloud-Data-Protection-Analysis

Comparative Analysis of Data Protection Mechanisms in Public Clouds

Size: 1.46 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yuerongz/DUPLEX-data-split-function

Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.

Language: Python - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

omardbaa/Data-Splitter

Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.

Language: Jupyter Notebook - Size: 3.33 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
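
The ratio-based random split described in the Data-Splitter entry above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `split_by_ratios` is a hypothetical name, the rows are assumed to fit in memory, and writing each part to JSON, a database table, or CSV is left out.

```python
import random


def split_by_ratios(rows, ratios, seed=None):
    """Shuffle rows and cut them into consecutive parts sized by ratios."""
    if not ratios or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be non-negative and sum to a positive value")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    total = sum(ratios)
    parts, start, cumulative = [], 0, 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        if i == len(ratios) - 1:
            end = len(shuffled)  # the last part takes whatever is left
        else:
            end = round(len(shuffled) * cumulative / total)
        parts.append(shuffled[start:end])
        start = end
    return parts
```

Cutting at cumulative boundaries, rather than rounding each part's size separately, guarantees that every row lands in exactly one part.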

Dimas263/Preprocessing-Data-into-Train-Test-Val-Data

Python Preprocessing for Sales Project Notebook

Language: Jupyter Notebook - Size: 3.06 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 1

NabilahSharfina/Ruangguru-Bootcamp

Final project for the DBA program run by partner Ruangguru x Studi Independen Bersertifikat Kampus Merdeka, batch 2

Language: Jupyter Notebook - Size: 19 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

Related Keywords
data-splitting (18), python (4), logistic-regression (3), pandas (3), machine-learning (2), support-vector-machine (2), data-analysis (2), cross-validation (2), seaborn (2), numpy (2), matplotlib (2), jupyter-notebook (2), feature-engineering (2), data-preprocessing (2), mahalanobis-distance (2), cov (2), hyperparameter-tuning (2), exploratory-data-analysis (2), data-visualization (2), one-hot-encoding (2), univariate-analysis (1), data-distribution-strategies (1), k-nn (1), data-conversion (1), csv-to-json (1), csv-to-database (1), csv-processing (1), pipeline (1), pyspark (1), data-split-pytorch (1), data-split (1), pyspark-mllib (1), data-protection (1), pyspark-sql (1), spark-session (1), additional-analysis-prediction (1), correlation-analysis (1), data-exploration (1), cryptography (1), cloud-computing (1), datapreparation (1), python3 (1), sklearn (1), outliers (1), numerical-data (1), multivariate-analysis (1), missing-values (1), inner-join (1), handling-outlier (1), handling-missing-value (1), forecasting (1), encoding (1), categorical-data (1), dimas263 (1), dimas-dwi-putra (1), dimas (1), data-processing (1), data-mining (1), data-cleansing (1), sqlalchemy-library (1), sql-server (1), random-data-distributiony (1), python-script (1), pandas-library (1), numpy-library (1), json-processing (1), database-connection (1), data-transformation (1), data-manipulation (1), data-integration (1), data-engineering (1), k-nearest-neighbors (1), feature-selection (1), feature-scaling (1), dataset (1), class-imbalance-handling (1), catboost (1), scikit-learn (1), heatmap (1), dataframe (1), data-cleaning (1), correlation-matrix (1), boxplot (1), binning (1), post-selection (1), fdr (1), text-cleaning (1), nlp (1), file-processing (1), svm-classifier (1), soft-svm-classifier (1), regularization (1), pca (1), naive-bayes-classifier (1), maximum-likelihood-estimation (1), linear-discriminant-analysis (1), kernel-svm-classifier (1), hard-svm-classifier (1), model-evaluation (1), linear-regression (1)