Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: datalake
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Language: Java - Size: 235 MB - Last synced: about 3 hours ago - Pushed: about 3 hours ago - Stars: 9,614 - Forks: 2,781
linkedin/openhouse
Open Control Plane for Tables in Data Lakehouse
Language: Java - Size: 4.23 MB - Last synced: about 8 hours ago - Pushed: about 9 hours ago - Stars: 256 - Forks: 36
activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Language: Python - Size: 65 MB - Last synced: about 10 hours ago - Pushed: about 10 hours ago - Stars: 7,736 - Forks: 593
zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Language: Java - Size: 438 MB - Last synced: about 13 hours ago - Pushed: about 14 hours ago - Stars: 888 - Forks: 109
amosproj/amos2024ss04-building-information-enhancer
Building Information System for potential energy savings
Language: C# - Size: 3.8 MB - Last synced: about 8 hours ago - Pushed: about 17 hours ago - Stars: 1 - Forks: 0
apache/doris-thirdparty
Self-managed thirdparty dependencies for Apache Doris
Size: 244 MB - Last synced: about 24 hours ago - Pushed: 1 day ago - Stars: 28 - Forks: 25
datastrato/gravitino
World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
Language: Java - Size: 14 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 331 - Forks: 148
Datavault-UK/automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Size: 8.24 MB - Last synced: 1 day ago - Pushed: about 1 month ago - Stars: 460 - Forks: 111
samber/awesome-olap
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
Size: 33.2 KB - Last synced: 2 days ago - Pushed: 8 months ago - Stars: 21 - Forks: 2
prestodb/prestorials
Tutorials and examples of how to deploy Presto and connect it to different data sources
Size: 508 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 15 - Forks: 8
ExpediaGroup/apiary-data-lake
Terraform scripts for deploying Apiary Data Lake
Language: HCL - Size: 635 KB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 18 - Forks: 25
Sinaptik-AI/pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
Language: Python - Size: 4.13 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 11,055 - Forks: 999
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 136 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 4,054 - Forks: 328
mchien15/datascience
Soccer player data analysis and similar-player finder
Language: Jupyter Notebook - Size: 44.4 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 1 - Forks: 0
buoyant-data/oxbow
Collection of AWS Lambdas for creating and managing Delta tables
Language: Rust - Size: 193 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 7 - Forks: 4
leesf/hudi-resources
A collection of Apache Hudi related resources
Size: 23.8 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 519 - Forks: 155
essraahmed/Data-Lake-with-Spark
Data Lake with Spark
Language: Python - Size: 37.1 KB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
Databricks-BR/open_tax
Tax Lakehouse for management support of tax processes, aimed at continuous improvement, identification of failures (Tax Compliance), intelligent models for identifying opportunities (Tax Intelligence), and democratization of tax information.
Language: Python - Size: 4.54 MB - Last synced: 21 days ago - Pushed: 24 days ago - Stars: 1 - Forks: 0
naiborhujosua/Data-Scientist-learning-path-using-databricks
This is the summary of learning Data Science using Databricks
Size: 51.8 KB - Last synced: 8 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Language: Java - Size: 1.1 GB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 5,077 - Forks: 2,345
StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.
Language: Java - Size: 343 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 7,708 - Forks: 1,600
seyed-nouraie/Azure-Security-Data-Lake
A platform for extracting and shipping security value from your data lake to Sentinel.
Size: 174 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 22 - Forks: 2
Rucal-Data-Solutions/datalakefoundation
Datalakehouse Foundation
Language: Scala - Size: 97.7 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 3 - Forks: 0
Phelipe-Sempreboni/tutorials-informations-notes
Repository for tutorials, information and notes on technology in general.
Language: Python - Size: 34.4 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 1 - Forks: 0
leo-project/leofs
The LeoFS Storage System
Language: Erlang - Size: 30 MB - Last synced: 17 days ago - Pushed: almost 4 years ago - Stars: 1,538 - Forks: 155
memiiso/debezium-server-batch 📦
Debezium server batch consumers
Language: Java - Size: 406 KB - Last synced: 18 days ago - Pushed: almost 2 years ago - Stars: 3 - Forks: 2
DataLinkDC/dinky
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Language: Java - Size: 31.5 MB - Last synced: 22 days ago - Pushed: 23 days ago - Stars: 2,797 - Forks: 1,003
GitDataAI/jiaozifs
A Git-like version control file system for data lineage & data collaboration.
Language: Go - Size: 1.66 MB - Last synced: 22 days ago - Pushed: about 1 month ago - Stars: 41 - Forks: 2
apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Language: Java - Size: 62.3 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 676 - Forks: 235
apache/doris-website
Apache Doris Website
Language: TypeScript - Size: 275 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 62 - Forks: 110
paradedb/paradedb
Postgres for Search and Analytics
Language: Rust - Size: 5.23 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 3,727 - Forks: 102
federicopfund/data-engineer
ETL process
Language: Jupyter Notebook - Size: 84.5 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 0
lakesoul-io/LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Language: Java - Size: 33.1 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 2,291 - Forks: 418
japila-books/delta-lake-internals
The Internals of Delta Lake
Size: 168 MB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 175 - Forks: 36
vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced: 23 days ago - Pushed: about 1 year ago - Stars: 53 - Forks: 34
UncoderIO/Uncoder_IO
An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.
Language: Python - Size: 2.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 101 - Forks: 16
aws-samples/aws-insurancelake-infrastructure
This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake ETL with CDK Pipelines project.
Language: Python - Size: 497 KB - Last synced: 5 days ago - Pushed: 2 months ago - Stars: 7 - Forks: 3
aws-samples/aws-insurancelake-etl
This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake Infrastructure project
Language: Python - Size: 5.44 MB - Last synced: 11 days ago - Pushed: about 1 month ago - Stars: 12 - Forks: 5
WeBankFinTech/Streamis
Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Language: Java - Size: 70 MB - Last synced: 16 days ago - Pushed: 25 days ago - Stars: 97 - Forks: 40
dbsystel/datalake-graphql-wrapper
The DataLake GraphQL Wrapper provides a GraphQL API for presto/trino.
Language: TypeScript - Size: 294 KB - Last synced: 19 days ago - Pushed: about 1 year ago - Stars: 16 - Forks: 0
tuancamtbtx/dataplatform-stack
How to build a complete Data Platform -> Here
Language: Python - Size: 7.57 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 2 - Forks: 0
awslabs/aws-orbit-workbench 📦
A Data Platform built for AWS, powered by Kubernetes.
Language: Python - Size: 53.7 MB - Last synced: about 16 hours ago - Pushed: 10 months ago - Stars: 127 - Forks: 26
HamzaKaGit/Data_Engineering_Essentials
This repository covers the important data engineering concepts and skills that will help you become a successful data engineer. You will learn the basics of data engineering, the important algorithms data engineers use, and the roles and responsibilities of a data engineer.
Size: 43.9 KB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0
MehdiTAZI/BigData-Platform
End to end big data project, that aims to show how to implement different big data layers, from the infrastructure layer to the end user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker]
Language: Jupyter Notebook - Size: 85 KB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 6 - Forks: 6
manuzhang/awesome-lakehouse
a curated list of awesome lakehouse frameworks, applications, etc
Size: 22.5 KB - Last synced: 24 days ago - Pushed: 2 months ago - Stars: 4 - Forks: 1
nazish555/Tokyo-Olympics-Azure-Data-Engineering-Project
This project leverages Azure Cloud services like Azure Data Factory, Azure Databricks, and Synapse Analytics to execute a data engineering workflow. Utilizing data sourced from the Olympic API on GitHub, it involves extracting raw data into Azure Data Lake Storage, transforming it with PySpark on Azure Databricks, and analyzing the transformed data
Language: Jupyter Notebook - Size: 337 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
ismailsimsek/iceberg-examples
Apache Iceberg Spark S3 examples
Language: Java - Size: 33.2 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 12 - Forks: 8
logleads/LogverzReleases
LOGVERZ APPLICATION BUNDLE. Logverz is a cutting-edge self-service data platform and instant data lake. The fastest route from AWS S3 to instant reports. The application bundle is the packaged repository incorporating the "LogverzPortalAccess", "LogverzPortal", and "LogverzCore" components.
Language: PowerShell - Size: 92.8 KB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 3 - Forks: 1
pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytical stacks and data engineering ecosystem
Size: 43.9 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 19 - Forks: 1
ianthropos88/Enterprise_Data_Architecture
The pragmatic technology journey for an Enterprise Data Model serving reporting, analytical, advanced data science and other digital use cases with integrated data from a variety of sources.
Size: 657 KB - Last synced: 22 days ago - Pushed: 9 months ago - Stars: 1 - Forks: 0
kimtth/pyspark-tika-text-extraction
🚴♂️⛷Data Lake, Performance tuning for text extraction from a huge amount of files.
Language: Python - Size: 261 MB - Last synced: 24 days ago - Pushed: over 2 years ago - Stars: 5 - Forks: 0
edgBR/delta-lake-polars
Building a poor man's data lake: Exploring the Power of Polars and Delta Lake
Language: Python - Size: 156 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 4 - Forks: 0
izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Language: Dockerfile - Size: 106 KB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 95 - Forks: 40
sanogotech/minIO-trino-hive-docker Fork of sensei23/trino-hive-docker
MinIO trino + hive + minio with postgres in docker compose
Language: Dockerfile - Size: 267 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
rifa8/data-warehouse-submission
Learning about Data Warehouse
Size: 1.19 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
ewerthonk/datalakehouse-northwind
Creating a Simple Data Lakehouse using Delta Lake on Databricks. My 1st Data Engineering Project.
Language: Jupyter Notebook - Size: 559 KB - Last synced: 3 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
micvet/data-eng-project-amazon
The goal of this project was to apply knowledge of the data extraction and processing tools of the Azure platform.
Language: Jupyter Notebook - Size: 78.1 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
autruonggiang/IS353-GCP
Topic: Social network data processing based on Google Cloud Platform technology.
Size: 36.7 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0
PaloAltoNetworks/pan-cortex-data-lake-python
Python idiomatic SDK for Cortex™ Data Lake.
Language: Python - Size: 1.35 MB - Last synced: 28 days ago - Pushed: over 2 years ago - Stars: 41 - Forks: 20
abdullahkhawer/aws-auto-terminate-idle-emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Language: Python - Size: 19.5 KB - Last synced: 22 days ago - Pushed: over 2 years ago - Stars: 26 - Forks: 16
UncoderIO/RootA
Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automated translation into other languages
Size: 250 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 94 - Forks: 5
ExpediaGroup/apiary
Apiary provides modules which can be combined to create a federated cloud data lake
Size: 303 KB - Last synced: 3 months ago - Pushed: over 2 years ago - Stars: 35 - Forks: 8
Gares95/DataLake-Spark
This repository consists of a project to build an ETL pipeline for a data lake hosted on S3 using Spark. This project is based on Udacity's template.
Language: Python - Size: 43.9 KB - Last synced: 4 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
soorajpazeekal/logistics-real-time-poc
A Data engineering based Proof of Concept demonstrating cutting-edge logistics solutions for a US-based Grocery Delivery Platform
Language: Jupyter Notebook - Size: 30.3 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 1
Onniedvin/Python-ETL-Data-Pipeline-with-AWS
Practicing IaC using Terraform and AWS.
Language: Python - Size: 267 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0
FeuerwehrHackathon2024/FireLake
Idea for a platform for data that may be relevant to a fire brigade operation.
Size: 20.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 2 - Forks: 0
KleinYuan/llama2-csv-webapp
Self-hosted/locally hosted Llama 2 based web app to chat with your CSVs (multiple)
Language: Python - Size: 168 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
phammylinh2002/Implementing-a-Data-Lake-Using-MongoDB-Integrated-with-BigQuery
This project is part of my major project at my university; I was responsible for implementing the data lake on MongoDB (the BigQuery integration is an extension of the project).
Language: Python - Size: 2.77 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
vre-hub/vre
VRE infrastructure running at CERN
Language: Shell - Size: 13 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 5 - Forks: 1
dd-Splunk/splunk-datalake
How to combine smart store and ingest actions for a data lake use case
Language: Python - Size: 360 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
vfx-beavers/de-sprint-7
Organizing a Data Lake
Language: Python - Size: 202 KB - Last synced: 4 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
rlevchenko/terraform-azure-data
Terraform script to deploy almost all Azure Data Services
Language: HCL - Size: 26.4 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 35 - Forks: 26
Tanay0510/Data-Lake-with-Spark
Load data from S3, process the data into analytics tables using Spark and load them back into S3. Deployed this Spark process on a cluster using AWS EMR
Language: Python - Size: 418 KB - Last synced: 5 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
KennethanCeyer/awesome-data-pipeline
Awesome list for data pipelines
Size: 200 KB - Last synced: 26 days ago - Pushed: over 1 year ago - Stars: 20 - Forks: 4
kassette-ai/kassette-server
Secured pipelines for your reporting and auditing data
Language: Go - Size: 858 KB - Last synced: 10 days ago - Pushed: 6 months ago - Stars: 7 - Forks: 0
KirillZhul/de-project-sprint-7 Fork of yandex-praktikum/de-project-sprint-7
PySpark, DataLake
Language: Python - Size: 71.3 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
Stefen-Taime/azurePipeline
Azure Data Pipeline
Language: Jupyter Notebook - Size: 95.7 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
bluishglc/serverless-datalake-example
A serverless datalake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it guides you in building a serverless datalake.
Language: Shell - Size: 212 KB - Last synced: 5 months ago - Pushed: over 1 year ago - Stars: 17 - Forks: 4
aessing/demo-mdwh
Modern Data Warehouse demos with Azure Databricks, Azure Data Factory & Azure Dedicated SQL pool (formerly SQL DW)
Size: 48.3 MB - Last synced: 8 days ago - Pushed: over 3 years ago - Stars: 4 - Forks: 1
Sheitak/datalake-jljq
Data lake project for ingesting and transforming financial data, with a BI dashboard proposal
Language: Python - Size: 41 KB - Last synced: 8 days ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
NikiReis/Data-Engineer
Repository for uploading the code challenges and notes from the bootcamp
Language: Jupyter Notebook - Size: 286 KB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 1 - Forks: 0
AbsaOSS/enceladus
Dynamic Conformance Engine
Language: Scala - Size: 7.93 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 28 - Forks: 14
tac0x2a/nayco
Nayco (内湖) is an all-in-one micro DataLake for IoT
Language: JavaScript - Size: 9.63 MB - Last synced: 9 days ago - Pushed: over 1 year ago - Stars: 11 - Forks: 0
AWS-Big-Data-Projects/AWS-Data-Lake
AWS Lake Formation makes it easy to set up, secure, and manage your data lakes. Also covers data discovery using the metadata search capabilities of Lake Formation in the console, with metadata search results restricted by column permissions.
Size: 17.6 KB - Last synced: 29 days ago - Pushed: over 3 years ago - Stars: 16 - Forks: 3
DataTech-Solutions/Threat-Detection-and-Visualization
Threat Detection and Visualization
Language: TSQL - Size: 11.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 25 - Forks: 153
bloomberg/trino Fork of trinodb/trino
Trino, the distributed SQL query engine for big data
Size: 223 MB - Last synced: 27 days ago - Pushed: 2 months ago - Stars: 10 - Forks: 8
epomatti/az-data-services
End-to-end scenario for Azure data services.
Language: HCL - Size: 354 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
liyichencc/incubator-paimon Fork of apache/paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Size: 24.4 MB - Last synced: 24 days ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
CanaanGM/databases-infrastructure
On-demand database deployment, various kinds, adding more as I use them!
Language: Python - Size: 4.02 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0
hbuddana/Azure_Data_Factory_COVID-19_Reporting
Data Engineering Project on Covid19 Reporting – Using Azure Data Factory, Databricks, HDInsight, Azure Data Factory – An End to End ETL pipeline in addition to a Power BI report dashboard.
Language: Jupyter Notebook - Size: 16.6 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced: 7 months ago - Pushed: almost 2 years ago - Stars: 282 - Forks: 27
law-pal/data_pipelines
Data Pipelines for moving and processing data for analytics.
Language: Python - Size: 1000 Bytes - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
pprzetacznik/datalake-aws
Sample data lake pipeline on AWS implemented using Terraform
Language: HCL - Size: 133 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
fortinux/bigdata-book
Book: Fundamentos de Big Data (Fundamentals of Big Data)
Language: Jupyter Notebook - Size: 4.94 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0
3amory99/Sparkify-App-Data-Lake-Using-Apache-Spark-and-S3
Sparkify app, my objective is to assist Sparkify, a music streaming startup, in migrating its data warehouse to a data lake. To achieve this, I have developed an ETL (Extract, Transform, Load) pipeline. This pipeline is designed to extract data from S3, process it using Apache Spark, and subsequently load the processed data into a new S3 storage lo
Language: Jupyter Notebook - Size: 1.02 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
thunchanokbow/audibleBook_Revenue
Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand
Language: Jupyter Notebook - Size: 11.6 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
neuro-ml/tarn
An insanely customizable framework for key-value storage 💾
Language: Python - Size: 336 KB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0
lynnlangit/serverless-architecture
Companion to my LinkedIn Learning 'Serverless Architecture' course
Size: 5.77 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 5 - Forks: 2
lobiouni/EBAC
In this repository I upload the code produced in the data analyst professional training course of the Escola Britânica de Artes Criativas e Tecnologia - EBAC (https://ebaconline.com.br/analista-de-dados).
Language: Jupyter Notebook - Size: 320 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 4 - Forks: 0
mfilipelino/kafka2hdfs
PySpark streaming from Kafka (0.8.2) to HDFS
Language: Python - Size: 5.86 KB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 5 - Forks: 1