Topic: "pyspark"
sbl-sdsc/mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Language: Jupyter Notebook - Size: 369 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 60 - Forks: 31

tomaztk/Spark-for-data-engineers
Apache Spark for data engineers
Language: Jupyter Notebook - Size: 6.9 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 55 - Forks: 22

kaantas/spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Language: Python - Size: 2.93 KB - Last synced at: about 22 hours ago - Pushed at: over 6 years ago - Stars: 55 - Forks: 34

chenxingqiang/DNN-YouTube-RecSys
This repository provides a comprehensive implementation of a deep neural network-based recommendation system similar to YouTube's. The repo is organized to include the core Python implementation of the model and a Spark-based Scala solution for data generation and model serving.
Language: Python - Size: 1.86 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 54 - Forks: 32

allisonwang-db/pyspark-data-sources
Custom PySpark Data Sources
Language: Python - Size: 665 KB - Last synced at: 25 days ago - Pushed at: about 2 months ago - Stars: 53 - Forks: 9

vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

dimajix/spark-training
Repository used for Spark Trainings
Language: Jupyter Notebook - Size: 9 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 66

Spratiher9/Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Language: HTML - Size: 1.23 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 53 - Forks: 7

luisbelloch/data_processing_course
Some class materials for a data processing course using PySpark
Language: Python - Size: 563 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 51 - Forks: 24

NHSDigital/data-analytics-services
This repo collects the open-source work of the Analytics Service within NHS Digital Data Services
Size: 81.1 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 50 - Forks: 2

jplane/pyspark-devcontainer
A simple VS Code devcontainer setup for local PySpark development
Language: Jupyter Notebook - Size: 318 KB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 50 - Forks: 28

jmcmt87/spark_app_twitter
A data engineering project (Twitter monitor app)
Language: Python - Size: 1.76 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 50 - Forks: 8

shauryashaurya/learn-data-munging
Notes on Data Engineering with Pandas, PySpark, Dask, Ray, Arrow DataFusion, Polars etc.
Language: Jupyter Notebook - Size: 631 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 47 - Forks: 21

idealo/terraform-emr-pyspark 📦
Quickstart PySpark with Anaconda on AWS/EMR using Terraform
Language: HCL - Size: 11.7 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 47 - Forks: 33

sllynn/spark-xgboost
A Python wrapper for XGBoost4J-Spark classes.
Language: Python - Size: 456 KB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 47 - Forks: 13

zero323/pyspark-asyncactions
Asynchronous actions for PySpark
Language: Python - Size: 80.1 KB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 47 - Forks: 2

sparkdq-community/sparkdq
A declarative PySpark framework for row- and aggregate-level data quality validation.
Language: Python - Size: 7.43 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 46 - Forks: 5

criteo/cluster-pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Language: Python - Size: 436 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 45 - Forks: 21

dogukannulu/streaming_data_processing
Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into ElasticSearch and MinIO
Language: Python - Size: 1.81 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 44 - Forks: 17

rafaelpierre/pyjaws
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Language: Python - Size: 3.46 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 43 - Forks: 4

G-Research/spark-dgraph-connector
A connector for Apache Spark and PySpark to Dgraph databases.
Language: Scala - Size: 1.44 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 43 - Forks: 12

ManuelGuerra1987/data-engineering-zoomcamp-notes
Detailed notes and homeworks from 2025 Data Engineering Zoomcamp by Datatalks.Club
Language: Python - Size: 14.8 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 43 - Forks: 0

Wittline/apache-spark-docker
Dockerizing an Apache Spark Standalone Cluster
Language: VBA - Size: 63.7 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 43 - Forks: 27

airscholar/SparkingFlow
This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala and Java as examples.
Language: Java - Size: 94.7 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 27

kaantas/kafka-twitter-spark-streaming
Counting Tweets Per User in Real-Time
Language: Python - Size: 2.93 KB - Last synced at: about 22 hours ago - Pushed at: almost 8 years ago - Stars: 42 - Forks: 15

mrugankray/Big-Data-Cluster
The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.
Language: Shell - Size: 118 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 15

opentargets/gentropy
Open Targets python framework for post-GWAS analysis
Language: Jupyter Notebook - Size: 31.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 39 - Forks: 11

MrPowers/ceja
PySpark phonetic and string matching algorithms
Language: Python - Size: 32.2 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 39 - Forks: 6

godfanmiao/ML-Kaggle-Github-2022
Complete data and open-source code for the book "Python Machine Learning in Practice: From Scratch to Kaggle Competitions (2022 Edition)"
Language: Jupyter Notebook - Size: 168 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 39 - Forks: 11

jubins/Spark-And-MLlib-Projects
This repository contains Spark, MLlib, PySpark and Dataframes projects
Language: Jupyter Notebook - Size: 101 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 39 - Forks: 97

Wh1isper/sparglim
Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!
Language: Python - Size: 151 KB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 37 - Forks: 4

awslabs/amazon-emr-vscode-toolkit
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Language: TypeScript - Size: 907 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 37 - Forks: 5

garystafford/emr-demo
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
Language: Python - Size: 691 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 37 - Forks: 17

san089/Cloudera_Material
Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
Size: 9.02 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 37 - Forks: 30

gmrqs/lasagna
A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive Metastore, Trino and Kafka
Language: Jupyter Notebook - Size: 11.7 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 36 - Forks: 11

josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Language: C - Size: 561 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 36 - Forks: 15

asifahmed90/pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Language: Jupyter Notebook - Size: 93.8 KB - Last synced at: 19 days ago - Pushed at: about 6 years ago - Stars: 36 - Forks: 98

basin-etl/basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Language: TypeScript - Size: 7.08 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 35 - Forks: 3

mjhea0/flask-spark-docker
Just a boilerplate for PySpark and Flask
Language: Python - Size: 206 KB - Last synced at: 3 months ago - Pushed at: almost 7 years ago - Stars: 35 - Forks: 18

Spratiher9/SparkDataset
Instant search for and access to many datasets in Pyspark.
Language: Jupyter Notebook - Size: 14.9 MB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 34 - Forks: 8

AaronYang2333/DSCI_553
USC ✌ 2020 Spring DSCI 553 (Foundations and Applications of Data Mining). Score: 94
Language: ReScript - Size: 265 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 34 - Forks: 21

razevedo1994/razv-data-engineering
Portfolio of projects and studies conducted in data engineering.
Language: Jupyter Notebook - Size: 70.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 33 - Forks: 3

DataEngineering-LATAM/Spark-StudyClub
Apache Spark study group organized by the Data Engineering Latam community
Language: Jupyter Notebook - Size: 4.83 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 24

feng-li/dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Language: Python - Size: 276 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 46

garystafford/pyspark-setup-demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Language: Jupyter Notebook - Size: 3.07 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 33 - Forks: 30

mikulskibartosz/check-engine 📦
Data validation library for PySpark 3.0.0
Language: Python - Size: 94.7 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 4

lbdeoliveira/song-playlist-recommendation
This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.
Language: HTML - Size: 225 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 32 - Forks: 12

ksindi/kafka-compose
🎼 Docker Compose files for various Kafka stacks
Language: Python - Size: 24.4 KB - Last synced at: 7 days ago - Pushed at: over 7 years ago - Stars: 32 - Forks: 10

mikeroyal/Apache-Spark-Guide
Apache Spark Guide
Language: Python - Size: 237 KB - Last synced at: 8 days ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 11

datamole-ai/pysparkdt
An open-source Python library for simplifying local testing of Databricks workflows that use PySpark and Delta tables.
Language: Python - Size: 64.5 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 30 - Forks: 0

astrolabsoftware/spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Language: Scala - Size: 24.8 MB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 30 - Forks: 16

ykursadkaya/pyspark-Docker
PySpark in Docker Containers
Language: Dockerfile - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 30 - Forks: 7

abhishek-ch/streamlit-healthcare-ML-App
Streamlit example showing Scikit Learn & PySpark ML over healthcare data! It's simple!
Language: Python - Size: 3.05 MB - Last synced at: 24 days ago - Pushed at: over 4 years ago - Stars: 30 - Forks: 11

cerndb/SparkDLTrigger
Code and links to the data for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
Language: Jupyter Notebook - Size: 11.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 29 - Forks: 12

isarn/isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Language: Scala - Size: 1.33 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 29 - Forks: 12

Graphlet-AI/graphlet
PyPi module for Graphlet AI Knowledge Graph Factory
Language: Python - Size: 20.4 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 29 - Forks: 1

urigoren/decorators4DS
Useful decorators every Data Scientist should know
Language: Python - Size: 118 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 5

roshankoirala/pySpark_tutorial
Implementation of Spark code in Jupyter notebook. Topics include: RDDs and DataFrame, exploratory data analysis (EDA), handling multiple DataFrames, visualization, Machine Learning
Language: Jupyter Notebook - Size: 202 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 29 - Forks: 26

nikhitmago/lookalike-modelling
Finding customer lookalikes using Machine Learning in PySpark
Language: Python - Size: 17.6 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 29 - Forks: 18

kitestring/DataQuest
Data Science Massive Open Online Course: All the code, notes and supplementary materials generated over the course of my data science learning.
Language: Jupyter Notebook - Size: 71.5 MB - Last synced at: 14 days ago - Pushed at: about 7 years ago - Stars: 29 - Forks: 15

jldbc/gutenberg
A content-based recommender system for books using the Project Gutenberg text corpus
Language: Python - Size: 13.2 MB - Last synced at: 22 days ago - Pushed at: over 8 years ago - Stars: 29 - Forks: 12

guidok91/spark-movies-etl
Spark data pipeline that processes movie ratings data.
Language: Python - Size: 3.81 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 28 - Forks: 12

mozilla/python_mozetl
ETL jobs for Firefox Telemetry
Language: Python - Size: 2.69 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 28 - Forks: 28

jihyeonseong/ESG-AI-investment-by-streamlit
ESG-investment AI
Language: Jupyter Notebook - Size: 32.6 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 28 - Forks: 7

alexott/spark-playground
Playing with different Apache Spark packages
Language: Scala - Size: 48.8 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 13

sbl-sdsc/mmtf-genomics
Methods for mapping genomic data onto 3D protein structure.
Language: Jupyter Notebook - Size: 6.17 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 28 - Forks: 10

datayoga-io/lineage
Generate beautiful documentation for your data pipelines in markdown format
Language: TypeScript - Size: 461 KB - Last synced at: about 2 hours ago - Pushed at: over 3 years ago - Stars: 28 - Forks: 0

SemyonSinchenko/flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
Language: Python - Size: 169 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 27 - Forks: 1

godfanmiao/PyAI-Github-2024
Complete data and open-source code for the book "Python Artificial Intelligence Programming in Practice (2024 Edition)"
Language: Jupyter Notebook - Size: 182 MB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 27 - Forks: 6

Thanaraklee/Real-Time-PySpark
This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, components, and applications for real-time data analysis.
Language: Python - Size: 329 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 27 - Forks: 13

jmcurbelo/pyspark-ingenieria-de-datos
This repository contains the material for the Udemy course "Big Data y Spark: ingeniería de datos con Python y pyspark". In this course, you will learn to use the tools and techniques needed to work with large datasets using the pyspark library.
Language: Python - Size: 1.56 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 27 - Forks: 46

rayyan17/jobAnalytics_and_search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Language: Python - Size: 1.12 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 27 - Forks: 2

Wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Language: Python - Size: 3.61 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 27 - Forks: 13

oliveiraJessica/workshop-spark
Code for Spark workshops with a Docker-based development environment
Language: Jupyter Notebook - Size: 10.8 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 27 - Forks: 1

FavioVazquez/ODSC_India_2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Language: Jupyter Notebook - Size: 214 MB - Last synced at: about 2 months ago - Pushed at: almost 7 years ago - Stars: 27 - Forks: 8

dimajix/docker-jupyter-spark
Docker image for Jupyter notebooks with PySpark
Language: Shell - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 27 - Forks: 25

mrpowers-io/tsumugi-spark
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
Language: Python - Size: 1.64 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 26 - Forks: 6

AsadiAhmad/Ngram-Spark-Wikipedia
Calculating Ngram with PySpark for wikipedia text
Language: Jupyter Notebook - Size: 101 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 26 - Forks: 0

Wathon/data_engineering_with_python-track-datacamp
Data Engineer with Python lecture notes from #datacamp.
Language: Jupyter Notebook - Size: 59.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 22

rusucosmin/courses
Just the stuff from the faculty (homework, projects, lectures)
Language: Jupyter Notebook - Size: 707 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 26 - Forks: 19

kpolley/RELK
RELK -- The Research Elastic Stack (Kafka, Beats, Zookeeper, Logstash, ElasticSearch, Kibana, Spark, & Jupyter -- All in Docker)
Language: Dockerfile - Size: 275 KB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 26 - Forks: 7

datitran/spark-tdd-example
A simple Spark TDD example
Language: Jupyter Notebook - Size: 30.3 KB - Last synced at: about 2 months ago - Pushed at: almost 8 years ago - Stars: 26 - Forks: 12

japila-books/pyspark-internals
The Internals of PySpark
Size: 6.98 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 25 - Forks: 5

ThiagoPanini/terraglue
Providing an easy way to deploy a Glue job in any AWS account using Terraform
Language: Python - Size: 310 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 25 - Forks: 3

rodalbuyeh/pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
Language: Python - Size: 229 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 25 - Forks: 8

brennerh1/databricks-demos
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
Language: Python - Size: 1.06 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 25 - Forks: 52

Morphl-AI/MorphL-Model-User-Search-Intent
Google Cloud Storage connector, pre-processor and model for predicting user search intent based on keywords
Language: Python - Size: 70.3 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 25 - Forks: 4

microsoft/driver_safety_analysis 📦
Estimating driver safety for connected cars
Language: Jupyter Notebook - Size: 20.1 MB - Last synced at: 1 day ago - Pushed at: almost 7 years ago - Stars: 25 - Forks: 15

thinagar-sivadas/spark-fundamentals
Elevate big data skills with Apache Spark's core concepts and examples
Language: Jupyter Notebook - Size: 719 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 24 - Forks: 1

AsadiAhmad/Edit-Distance-Spark
Calculating Edit Distance with PySpark
Language: Jupyter Notebook - Size: 4.88 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 0

zalmane/copybook
Python copybook parser
Language: Python - Size: 37.1 KB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 24 - Forks: 9

pancr9/Netflix-Recommender-System
ITCS 6190 : Cloud Computing for Data Analysis project. Movie Recommendation Engine for Netflix Data with custom functions implementation and library usage.
Language: Python - Size: 1.75 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 24 - Forks: 6

moritzkoerber/covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
Language: Python - Size: 1.31 MB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 5

crflynn/pbspark
protobuf pyspark conversion
Language: Python - Size: 91.8 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 23 - Forks: 5

NHSDigital/artificial-data-generator
Pipelines for generating large volumes of anonymous artificial data that share some of the characteristics of real NHS data
Language: Python - Size: 136 KB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 23 - Forks: 5

mahmoudparsian/machine-learning-course
Machine Learning Course @ Santa Clara University
Size: 194 MB - Last synced at: 5 days ago - Pushed at: about 5 years ago - Stars: 23 - Forks: 16

narenmanoharan/Movie-Recommender-System
Movie recommender system with Collaborative Filtering using PySpark
Language: Python - Size: 895 KB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 23 - Forks: 9

point8/data-science-learning-paths
Practical data science courses
Language: Jupyter Notebook - Size: 270 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 22 - Forks: 3

maprihoda/data-analysis-with-python-and-pyspark
Language: Python - Size: 6.87 MB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 12

soyelherein/pyspark-cicd-template
PySpark data-pipeline testing and CICD
Language: Python - Size: 5.85 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 13
