GitHub topics: apache-spark
kubeflow/spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Language: Go - Size: 25.5 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 2,952 - Forks: 1,411

streamnative/pulsar-spark
Spark Connector to read and write with Pulsar
Language: Scala - Size: 726 KB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 113 - Forks: 51

ac-gomes/spark-iceberg-hive
Language: Jupyter Notebook - Size: 1010 KB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 0 - Forks: 0

teragrep/pth_10
Data Processing Language (DPL) translator for Apache Spark
Language: Java - Size: 1.17 MB - Last synced at: about 17 hours ago - Pushed at: about 18 hours ago - Stars: 1 - Forks: 9

miroslav-reiter/Big_Data_Apache_Spark_Hive_Hadoop_Airflow
🗄️ Materials for online courses and trainings on Big Data, Apache Spark, Hive, Apache Hadoop, Apache Airflow
Language: Jupyter Notebook - Size: 189 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Hazim-HF/Data-Management
This repository covers data management and big data technologies, including databases, querying, and big data processing. Topics include Hadoop (MapReduce, HDFS), Apache Spark, data security, and optimization techniques. Students will learn Spark’s architecture, data distribution, parallel computing, and memory caching to enhance big data solutions.
Language: Jupyter Notebook - Size: 65.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Language: Scala - Size: 75.2 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 1,303 - Forks: 773

chernistry/tabularasa-bi-core
A PoC for BI DE challenges, feat. Java, Apache Spark, Kafka, Spring Boot, PostgreSQL, and Docker for AdTech data processing.
Language: Java - Size: 148 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

LucaCanali/Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also tools for stress testing and measuring CPU performance. Jupyter notebook examples for using various DB systems.
Language: Jupyter Notebook - Size: 34.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 449 - Forks: 152

dsgrid/dsgrid
Python package for working with demand-side grid projects, datasets and queries
Language: Python - Size: 8.76 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 28 - Forks: 5

microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language: C# - Size: 6.44 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 939 - Forks: 211

AndGeo69/StreamingCotiles
A streaming implementation of COTILES algorithm using Apache Spark's Structured Streaming API
Language: Python - Size: 2.74 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

Anonymous0-0paper/SWG
AutoPipe: LLM Assisted Automatic Stream Processing Pipeline Generation
Language: Python - Size: 105 KB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 150 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 4,712 - Forks: 379

sparklyr/sparklyr
R interface for Apache Spark
Language: R - Size: 97 MB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 962 - Forks: 308

AbsaOSS/hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Language: Scala - Size: 1.63 MB - Last synced at: about 20 hours ago - Pushed at: about 21 hours ago - Stars: 45 - Forks: 13

mlflow/mlflow
Open source platform for the machine learning lifecycle
Language: Python - Size: 795 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 20,791 - Forks: 4,576

feathr-ai/feathr
Feathr – A scalable, unified data and AI engineering platform for enterprise
Language: Scala - Size: 29.4 MB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 1,898 - Forks: 232

O2-Czech-Republic/proxima-platform
The Proxima platform.
Language: Java - Size: 9.4 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 21 - Forks: 7

newfront/hitchhikers_guide_to_deltalake_streaming
Don't Panic. This guide will help you when it feels like the end of the world.
Language: Jupyter Notebook - Size: 230 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 23 - Forks: 9

microsoft/SynapseML
Simple and Distributed Machine Learning
Language: Scala - Size: 157 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 5,138 - Forks: 847

graphframes/graphframes
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
Language: Scala - Size: 3.95 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1,058 - Forks: 251

geoHeil/awesome-tools
curated list of awesome tools and libraries for specific domains
Size: 958 KB - Last synced at: 4 days ago - Pushed at: 8 days ago - Stars: 47 - Forks: 11

oceanbase/spark-connector-oceanbase
Apache Spark Connectors for OceanBase.
Language: Scala - Size: 283 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2 - Forks: 4

anaregdesign/openaivec
Pandas extension, Tabular calculation with LLM, Spark UDF Builder
Language: Python - Size: 1.37 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 12 - Forks: 1

srimantapal205/Subject-Wise-Question---Answer
This branch focuses on building Data Engineering Interview Question and Answer
Size: 445 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

Bhargav129/Spark
This repo helps you understand the core components of Apache Spark, starting with a deep dive into the Catalyst Optimizer.
Size: 43.9 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

Azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Language: Scala - Size: 19.6 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 236 - Forks: 178

techsparksguru/data_ai_for_all
Data Analysis, Analytics, Science, AI & ML, LLM etc.
Language: Jupyter Notebook - Size: 23.3 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 14 - Forks: 3

datatweets/airflow-pyspark-k8s
Run Apache Airflow with KubernetesExecutor and PySpark on Kubernetes using Helm charts and Kind for local development
Language: Python - Size: 283 KB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

exacaster/lighter
REST API for Apache Spark on K8S or YARN
Language: Java - Size: 6.68 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 98 - Forks: 23

LucaCanali/sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Language: Scala - Size: 1.96 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 758 - Forks: 151

astrolabsoftware/fink
Fink documentation website
Size: 41.9 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 3 - Forks: 2

ayushlingayat/Spark-Learning
These are my Spark learning notes, a space I revisit often to revise and strengthen my Spark concepts...🐉🀄
Language: Python - Size: 4.88 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

astrolabsoftware/fink-broker
Astronomy Broker based on Apache Spark
Language: Python - Size: 98.4 MB - Last synced at: 12 days ago - Pushed at: 13 days ago - Stars: 70 - Forks: 14

divithraju/divith-aju-Hadoop-Pyspark-pipeline
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.
Language: Python - Size: 4.88 KB - Last synced at: 13 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

dataflint/spark
A modern replacement for the Apache Spark UI
Language: TypeScript - Size: 17.8 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 254 - Forks: 30

miguno/kafka-storm-starter 📦
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Language: Scala - Size: 393 KB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 724 - Forks: 329

GoogleCloudPlatform/dataproc-templates
Dataproc templates and pipelines for solving in-cloud data tasks
Language: Python - Size: 18.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 129 - Forks: 99

aws-samples/iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
Language: Java - Size: 443 KB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 26 - Forks: 5

OKDP/okdp-spark-auth-filter
OAuth2/OIDC authentication filter for Apache Spark Apps/History UIs
Language: Java - Size: 879 KB - Last synced at: 14 days ago - Pushed at: 15 days ago - Stars: 8 - Forks: 8

AdityaSreevatsaK/PySpark-Pipeline
A collection of PySpark projects showcasing scalable data processing, transformation pipelines, and big data analytics using Apache Spark.
Language: Jupyter Notebook - Size: 1.78 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

dotnet/spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Language: C# - Size: 4.87 MB - Last synced at: 15 days ago - Pushed at: about 1 month ago - Stars: 2,058 - Forks: 327

data-tools/big-data-types
A library to transform Scala product types and Schemes from different systems into other Schemes. Any implemented type automatically gets methods to convert it into the rest of the types and vice versa. E.g: a Spark Schema can be transformed into a BigQuery table.
Language: Scala - Size: 3.74 MB - Last synced at: 8 days ago - Pushed at: 17 days ago - Stars: 13 - Forks: 3

opencypher/morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Language: Scala - Size: 29.6 MB - Last synced at: 5 days ago - Pushed at: almost 5 years ago - Stars: 341 - Forks: 62

benedekh/bigdata-projects
Student projects in Big Data field.
Language: Java - Size: 198 KB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 19 - Forks: 12

lw-lin/CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Language: Scala - Size: 9.54 MB - Last synced at: 19 days ago - Pushed at: about 3 years ago - Stars: 3,488 - Forks: 1,410

mrpowers-io/quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Language: Python - Size: 1.98 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 672 - Forks: 98

tansu-io/example-spark
Tansu schema-backed topics, instantly accessible as Apache Iceberg tables in Apache Spark
Language: Just - Size: 13.7 KB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

priyanshubiswas-tech/priyanshubiswas-tech
Data Engineer | Python, SQL, Spark, Hadoop, Airflow, DBT, AWS | Building pipelines, solving data problems, and sharing projects.
Size: 3.86 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

helioribeiro/helioribeiro.github.io
REPOSITORY FOR MY SOFTWARE DEVELOPMENT AND DATA SCIENCE PORTFOLIO.
Language: CSS - Size: 62.9 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 0 - Forks: 0

Salmon-Brain/dead-salmon-brain
Apache Spark based framework for analyzing A/B experiments
Language: Java - Size: 407 KB - Last synced at: 19 days ago - Pushed at: 8 months ago - Stars: 15 - Forks: 0

guidok91/spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
Language: Python - Size: 192 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 13 - Forks: 4

spark-notebook/spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
Language: JavaScript - Size: 15.8 MB - Last synced at: 19 days ago - Pushed at: about 2 years ago - Stars: 3,152 - Forks: 653

CodelyTV/spark-best_practices_and_deploy-course
Deploy Spark course examples
Language: Scala - Size: 82.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 5 - Forks: 1

G-Research/fasttrackml
Experiment tracking server focused on speed and scalability
Language: Go - Size: 5.4 MB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 105 - Forks: 20

ptyadana/SQL-Data-Analysis-and-Visualization-Projects
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Language: Jupyter Notebook - Size: 35.1 MB - Last synced at: 25 days ago - Pushed at: almost 3 years ago - Stars: 1,436 - Forks: 553

san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Language: Python - Size: 1.31 MB - Last synced at: 25 days ago - Pushed at: over 5 years ago - Stars: 1,378 - Forks: 227

JuanParias29/BigDataProcessingProject
This repository contains a large-scale data analysis and processing project based on the CRISP-DM methodology, focused on answering business questions in the education domain.
Language: Jupyter Notebook - Size: 4.3 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
Language: Python - Size: 23.9 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 304 - Forks: 63

rizkipragustono/data_analysis_spark
Exploration: Data Analysis using Spark
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: 18 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

lykmapipo/Python-Joblib-Cookbook
A step-by-step guide to master various aspects of Joblib for parallel computing in Python
Language: Python - Size: 44.9 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

lykmapipo/Scala-Spark-Product-Sales-Analysis
Scala application to process and analyze product sales using Spark
Language: Scala - Size: 125 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 1

lykmapipo/Python-Spark-Log-Analysis
Python scripts to process and analyze log files using PySpark.
Language: Python - Size: 131 KB - Last synced at: 16 days ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

lykmapipo/NYC-TLC-Trip-Data
Python scripts to download, process, and analyze the New York City Taxi and Limousine Commission (TLC) Trip Record Data dataset
Language: Jupyter Notebook - Size: 100 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 1

airscholar/e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Language: Python - Size: 289 KB - Last synced at: 25 days ago - Pushed at: 4 months ago - Stars: 250 - Forks: 123

openscoring/openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Language: Java - Size: 869 KB - Last synced at: 25 days ago - Pushed at: 10 months ago - Stars: 583 - Forks: 171

Hydrospheredata/mist
Serverless proxy for Spark cluster
Language: Scala - Size: 9.96 MB - Last synced at: 23 days ago - Pushed at: over 4 years ago - Stars: 326 - Forks: 72

Devinterview-io/apache-spark-interview-questions
🟣 Apache Spark interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
Size: 31.3 KB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 9 - Forks: 8

mohitsarawgi/Leave-Authorization-system
Facilitates online submission of diverse leave request types and routes them to appropriate authorities for approval. Monitors leave balances accurately to avoid overstepping allocated time off limits, fostering a smooth workflow and improved employee contentment.
Language: JavaScript - Size: 13.2 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 1

tirthajyoti/Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
Language: Jupyter Notebook - Size: 8.97 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 347 - Forks: 271

vesko-vujovic/vesko-vujovic.github.io
Personal blog about Data Engineering
Language: CSS - Size: 10.9 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

infoslack/awesome-kafka
A list about Apache Kafka
Size: 96.7 KB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 582 - Forks: 164

aloneguid/parquet-dotnet
Fully managed Apache Parquet implementation
Language: C# - Size: 121 MB - Last synced at: 27 days ago - Pushed at: 4 months ago - Stars: 732 - Forks: 160

sbl-sdsc/mmtf-pyspark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Language: Python - Size: 524 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 67 - Forks: 27

awslabs/amazon-emr-vscode-toolkit
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Language: TypeScript - Size: 907 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 37 - Forks: 5

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Size: 6.88 MB - Last synced at: 28 days ago - Pushed at: 8 months ago - Stars: 63 - Forks: 38

intel/BigDL
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Language: Jupyter Notebook - Size: 356 MB - Last synced at: 27 days ago - Pushed at: 3 months ago - Stars: 2,674 - Forks: 731

mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language: Java - Size: 397 MB - Last synced at: 26 days ago - Pushed at: 8 months ago - Stars: 1,075 - Forks: 661

mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 601 MB - Last synced at: 17 days ago - Pushed at: 7 months ago - Stars: 158 - Forks: 143

1duo/awesome-ai-infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Size: 11.8 MB - Last synced at: 29 days ago - Pushed at: about 6 years ago - Stars: 416 - Forks: 74

tweag/sparkle
Haskell on Apache Spark.
Language: Haskell - Size: 1.1 MB - Last synced at: 25 days ago - Pushed at: over 2 years ago - Stars: 448 - Forks: 27

big-data-europe/docker-spark
Apache Spark docker image
Language: Shell - Size: 7.78 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 2,055 - Forks: 702

awslabs/amazon-emr-cli
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Language: Python - Size: 150 KB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 14

archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 33

japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
Size: 119 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 420 - Forks: 172

mattjw/sparkql
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
Language: Python - Size: 4.59 MB - Last synced at: 21 days ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 4

anqorithm/RealTime-StockStream
RealTime StockStream is a streamlined simulation system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis.
Language: Python - Size: 5.36 MB - Last synced at: 5 days ago - Pushed at: 4 months ago - Stars: 26 - Forks: 3

kwartile/connected-component
Map Reduce Implementation of Connected Component on Apache Spark
Language: Scala - Size: 26.4 KB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 85 - Forks: 18

hendhamdi/Sentiment-Analysis-Spark-NLP
This project uses a Spark pipeline (PySpark) to analyze the sentiment of user reviews.
Language: HTML - Size: 433 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jlsilva01/projeto-ed-satc
Template repository for developing the final project of the Data Engineering course in the Software Engineering program at UNISATC.
Size: 662 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 3

umbertogriffo/apache-spark-best-practices-and-tuning
https://umbertogriffo.gitbook.io/apache-spark-best-practices-and-tuning/
Size: 1.78 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 2

srafay/Hadoop-hands-on
Learning how to tame the Big Data with Hadoop and related technologies
Language: PigLatin - Size: 96.7 KB - Last synced at: 6 days ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 21

Peippo1/marketing-analytics-pipeline
A scalable marketing analytics pipeline built with Apache Spark and Delta Lake, designed to process, transform, and export data for advanced business insights.
Language: Python - Size: 404 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

lynnlangit/learning-hadoop-and-spark
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Language: HTML - Size: 13.6 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 195 - Forks: 165

ArshTiwari2004/Sahyog
Centralized Disaster Response and Inventory Management System that leverages AI and Google Cloud Technologies to predict disasters, optimize resource management, and provide real-time coordination.
Language: JavaScript - Size: 14.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 3

cerndb/dist-keras 📦
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Language: Python - Size: 54.6 MB - Last synced at: 29 days ago - Pushed at: almost 7 years ago - Stars: 623 - Forks: 167

vishnu812-tech/Data-Engineering-Essentials
GitHub profile for learning new languages and developing projects
Language: Python - Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

alimiheb/ChicagoEnergyUsageAnalysis
The Chicago Energy Usage Analysis project aims to explore energy consumption patterns in Chicago using big data techniques. Leveraging Apache Spark, it processes a dataset of approximately 30,000 records to provide actionable insights for urban planning and energy efficiency initiatives.
Language: Java - Size: 8.68 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

msamij/zig-flow
Data Engineering pipeline.
Language: Java - Size: 555 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
