Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datalake

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Language: Java - Size: 235 MB - Last synced: about 3 hours ago - Pushed: about 3 hours ago - Stars: 9,614 - Forks: 2,781

linkedin/openhouse

Open Control Plane for Tables in Data Lakehouse

Language: Java - Size: 4.23 MB - Last synced: about 8 hours ago - Pushed: about 9 hours ago - Stars: 256 - Forks: 36

activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Language: Python - Size: 65 MB - Last synced: about 10 hours ago - Pushed: about 10 hours ago - Stars: 7,736 - Forks: 593

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Language: Java - Size: 438 MB - Last synced: about 13 hours ago - Pushed: about 14 hours ago - Stars: 888 - Forks: 109

amosproj/amos2024ss04-building-information-enhancer

Building Information System for potential energy savings

Language: C# - Size: 3.8 MB - Last synced: about 8 hours ago - Pushed: about 17 hours ago - Stars: 1 - Forks: 0

apache/doris-thirdparty

Self-managed thirdparty dependencies for Apache Doris

Size: 244 MB - Last synced: about 24 hours ago - Pushed: 1 day ago - Stars: 28 - Forks: 25

datastrato/gravitino

World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.

Language: Java - Size: 14 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 331 - Forks: 148

Datavault-UK/automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

Size: 8.24 MB - Last synced: 1 day ago - Pushed: about 1 month ago - Stars: 460 - Forks: 111

samber/awesome-olap

A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.

Size: 33.2 KB - Last synced: 2 days ago - Pushed: 8 months ago - Stars: 21 - Forks: 2

prestodb/prestorials

Tutorials and examples of how to deploy Presto and connect it to different data sources

Size: 508 KB - Last synced: 2 days ago - Pushed: 2 days ago - Stars: 15 - Forks: 8

ExpediaGroup/apiary-data-lake

Terraform scripts for deploying Apiary Data Lake

Language: HCL - Size: 635 KB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 18 - Forks: 25

Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

Language: Python - Size: 4.13 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 11,055 - Forks: 999

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 136 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 4,054 - Forks: 328

mchien15/datascience

Soccer Players Data Analyst and Similar Players Finder

Language: Jupyter Notebook - Size: 44.4 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 1 - Forks: 0

buoyant-data/oxbow

Collection of AWS Lambdas for creating and managing Delta tables

Language: Rust - Size: 193 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 7 - Forks: 4

leesf/hudi-resources

A collection of Apache Hudi related resources

Size: 23.8 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 519 - Forks: 155

essraahmed/Data-Lake-with-Spark

Data Lake with Spark

Language: Python - Size: 37.1 KB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

Databricks-BR/open_tax

Tax Lakehouse for management support of fiscal processes, aiming at continuous improvement, identification of failures (Tax Compliance), intelligent models for identifying opportunities (Tax Intelligence), and democratization of tax information.

Language: Python - Size: 4.54 MB - Last synced: 21 days ago - Pushed: 24 days ago - Stars: 1 - Forks: 0

naiborhujosua/Data-Scientist-learning-path-using-databricks

This is the summary of learning Data Science using Databricks

Size: 51.8 KB - Last synced: 8 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

apache/hudi

Upserts, Deletes And Incremental Processing on Big Data.

Language: Java - Size: 1.1 GB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 5,077 - Forks: 2,345

StarRocks/starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.

Language: Java - Size: 343 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 7,708 - Forks: 1,600

seyed-nouraie/Azure-Security-Data-Lake

A platform for extracting and shipping security value from your data lake to Sentinel.

Size: 174 KB - Last synced: 12 days ago - Pushed: 12 days ago - Stars: 22 - Forks: 2

Rucal-Data-Solutions/datalakefoundation

Datalakehouse Foundation

Language: Scala - Size: 97.7 KB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 3 - Forks: 0

Phelipe-Sempreboni/tutorials-informations-notes

Repository for tutorials, information and notes on technology in general.

Language: Python - Size: 34.4 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 1 - Forks: 0

leo-project/leofs

The LeoFS Storage System

Language: Erlang - Size: 30 MB - Last synced: 17 days ago - Pushed: almost 4 years ago - Stars: 1,538 - Forks: 155

memiiso/debezium-server-batch 📦

Debezium server batch consumers

Language: Java - Size: 406 KB - Last synced: 18 days ago - Pushed: almost 2 years ago - Stars: 3 - Forks: 2

DataLinkDC/dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

Language: Java - Size: 31.5 MB - Last synced: 22 days ago - Pushed: 23 days ago - Stars: 2,797 - Forks: 1,003

GitDataAI/jiaozifs

A Git-like version control file system for data lineage & data collaboration.

Language: Go - Size: 1.66 MB - Last synced: 22 days ago - Pushed: about 1 month ago - Stars: 41 - Forks: 2

apache/amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Language: Java - Size: 62.3 MB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 676 - Forks: 235

apache/doris-website

Apache Doris Website

Language: TypeScript - Size: 275 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 62 - Forks: 110

paradedb/paradedb

Postgres for Search and Analytics

Language: Rust - Size: 5.23 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 3,727 - Forks: 102

federicopfund/data-engineer

ETL process

Language: Jupyter Notebook - Size: 84.5 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 0

lakesoul-io/LakeSoul

LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

Language: Java - Size: 33.1 MB - Last synced: 30 days ago - Pushed: 30 days ago - Stars: 2,291 - Forks: 418

japila-books/delta-lake-internals

The Internals of Delta Lake

Size: 168 MB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 175 - Forks: 36

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Language: Python - Size: 1.76 MB - Last synced: 23 days ago - Pushed: about 1 year ago - Stars: 53 - Forks: 34

UncoderIO/Uncoder_IO

An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.

Language: Python - Size: 2.3 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 101 - Forks: 16

aws-samples/aws-insurancelake-infrastructure

This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake ETL with CDK Pipelines project.

Language: Python - Size: 497 KB - Last synced: 5 days ago - Pushed: 2 months ago - Stars: 7 - Forks: 3

aws-samples/aws-insurancelake-etl

This solution helps you deploy ETL processes and data storage resources to create an Insurance Lake using Amazon S3 buckets for storage, AWS Glue for data transformation, and AWS CDK Pipelines. It is originally based on the AWS blog Deploy data lake ETL jobs using CDK Pipelines, and complements the InsuranceLake Infrastructure project

Language: Python - Size: 5.44 MB - Last synced: 11 days ago - Pushed: about 1 month ago - Stars: 12 - Forks: 5

WeBankFinTech/Streamis

Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.

Language: Java - Size: 70 MB - Last synced: 16 days ago - Pushed: 25 days ago - Stars: 97 - Forks: 40

dbsystel/datalake-graphql-wrapper

The DataLake GraphQL Wrapper provides a GraphQL API for presto/trino.

Language: TypeScript - Size: 294 KB - Last synced: 19 days ago - Pushed: about 1 year ago - Stars: 16 - Forks: 0

tuancamtbtx/dataplatform-stack

How to build a complete Data Platform -> Here

Language: Python - Size: 7.57 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 2 - Forks: 0

awslabs/aws-orbit-workbench 📦

A Data Platform built for AWS, powered by Kubernetes.

Language: Python - Size: 53.7 MB - Last synced: about 16 hours ago - Pushed: 10 months ago - Stars: 127 - Forks: 26

HamzaKaGit/Data_Engineering_Essentials

This repository covers the important data engineering concepts and skills that will help you become a successful data engineer. You will learn the basics of data engineering, the important algorithms data engineers use, and the roles and responsibilities of a data engineer.

Size: 43.9 KB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 2 - Forks: 0

MehdiTAZI/BigData-Platform

End to end big data project, that aims to show how to implement different big data layers, from the infrastructure layer to the end user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker]

Language: Jupyter Notebook - Size: 85 KB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 6 - Forks: 6

manuzhang/awesome-lakehouse

a curated list of awesome lakehouse frameworks, applications, etc

Size: 22.5 KB - Last synced: 24 days ago - Pushed: 2 months ago - Stars: 4 - Forks: 1

nazish555/Tokyo-Olympics-Azure-Data-Engineering-Project

This project leverages Azure Cloud services like Azure Data Factory, Azure Databricks, and Synapse Analytics to execute a data engineering workflow. Utilizing data sourced from the Olympic API on GitHub, it involves extracting raw data into Azure Data Lake Storage, transforming it with PySpark on Azure Databricks, and analyzing the transformed data

Language: Jupyter Notebook - Size: 337 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

ismailsimsek/iceberg-examples

Apache Iceberg Spark S3 examples

Language: Java - Size: 33.2 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 12 - Forks: 8

logleads/LogverzReleases

LOGVERZ APPLICATION BUNDLE. Logverz is a cutting-edge self-service data platform and instant data lake. The fastest route from AWS S3 to instant reports. The application bundle is the packaged repository incorporating the "LogverzPortalAccess", "LogverzPortal", and "LogverzCore" components.

Language: PowerShell - Size: 92.8 KB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 3 - Forks: 1

pracdata/awesome-open-source-data-engineering

A curated list of open source tools used in analytical stacks and data engineering ecosystem

Size: 43.9 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 19 - Forks: 1

ianthropos88/Enterprise_Data_Architecture

The pragmatic technology journey for an Enterprise Data Model serving reporting, analytical, advanced data science and other digital use cases with integrated data from a variety of sources.

Size: 657 KB - Last synced: 22 days ago - Pushed: 9 months ago - Stars: 1 - Forks: 0

kimtth/pyspark-tika-text-extraction

🚴‍♂️⛷Data Lake, Performance tuning for text extraction from a huge amount of files.

Language: Python - Size: 261 MB - Last synced: 24 days ago - Pushed: over 2 years ago - Stars: 5 - Forks: 0

edgBR/delta-lake-polars

Building a poor man's data lake: Exploring the Power of Polars and Delta Lake

Language: Python - Size: 156 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 4 - Forks: 0

izhangzhihao/Real-time-Data-Warehouse

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Language: Dockerfile - Size: 106 KB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 95 - Forks: 40

sanogotech/minIO-trino-hive-docker Fork of sensei23/trino-hive-docker

MinIO trino + hive + minio with postgres in docker compose

Language: Dockerfile - Size: 267 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

rifa8/data-warehouse-submission

Learning about Data Warehouse

Size: 1.19 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

ewerthonk/datalakehouse-northwind

Creating a Simple Data Lakehouse using Delta Lake on Databricks. My 1st Data Engineering Project.

Language: Jupyter Notebook - Size: 559 KB - Last synced: 3 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

micvet/data-eng-project-amazon

The goal of this project was to apply knowledge of the Azure platform's data extraction and processing tools.

Language: Jupyter Notebook - Size: 78.1 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

autruonggiang/IS353-GCP

Topic: Social network data processing based on Google Cloud Platform technology.

Size: 36.7 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0

PaloAltoNetworks/pan-cortex-data-lake-python

Python idiomatic SDK for Cortex™ Data Lake.

Language: Python - Size: 1.35 MB - Last synced: 28 days ago - Pushed: over 2 years ago - Stars: 41 - Forks: 20

abdullahkhawer/aws-auto-terminate-idle-emr

AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS based solution using AWS CloudWatch and AWS Lambda using a Python script that is using Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.

Language: Python - Size: 19.5 KB - Last synced: 22 days ago - Pushed: over 2 years ago - Stars: 26 - Forks: 16

UncoderIO/RootA

Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automated translation into other languages

Size: 250 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 94 - Forks: 5

ExpediaGroup/apiary

Apiary provides modules which can be combined to create a federated cloud data lake

Size: 303 KB - Last synced: 3 months ago - Pushed: over 2 years ago - Stars: 35 - Forks: 8

Gares95/DataLake-Spark

This repository consists of a project to build an ETL pipeline for a data lake hosted on S3 using Spark. This project is based on Udacity's template.

Language: Python - Size: 43.9 KB - Last synced: 4 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

soorajpazeekal/logistics-real-time-poc

A Data engineering based Proof of Concept demonstrating cutting-edge logistics solutions for a US-based Grocery Delivery Platform

Language: Jupyter Notebook - Size: 30.3 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 1

Onniedvin/Python-ETL-Data-Pipeline-with-AWS

Practicing IaC using Terraform and AWS.

Language: Python - Size: 267 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

FeuerwehrHackathon2024/FireLake

Idea for a platform for data that may be relevant to a fire department operation.

Size: 20.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 2 - Forks: 0

KleinYuan/llama2-csv-webapp

Self-hosted/locally hosted Llama 2 based web app to chat with your CSVs (multiple)

Language: Python - Size: 168 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

phammylinh2002/Implementing-a-Data-Lake-Using-MongoDB-Integrated-with-BigQuery

This project is part of my major university project, in which I was responsible for implementing the Data Lake on MongoDB (the BigQuery integration is an extension of the project)

Language: Python - Size: 2.77 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

vre-hub/vre

VRE infrastructure running at CERN

Language: Shell - Size: 13 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 5 - Forks: 1

dd-Splunk/splunk-datalake

How to combine smart store and ingest actions for a data lake use case

Language: Python - Size: 360 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

vfx-beavers/de-sprint-7

Organizing a Data Lake

Language: Python - Size: 202 KB - Last synced: 4 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

rlevchenko/terraform-azure-data

Terraform script to deploy almost all Azure Data Services

Language: HCL - Size: 26.4 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 35 - Forks: 26

Tanay0510/Data-Lake-with-Spark

Load data from S3, process the data into analytics tables using Spark and load them back into S3. Deployed this Spark process on a cluster using AWS EMR

Language: Python - Size: 418 KB - Last synced: 5 months ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

KennethanCeyer/awesome-data-pipeline

Awesome list for datapipeline

Size: 200 KB - Last synced: 26 days ago - Pushed: over 1 year ago - Stars: 20 - Forks: 4

kassette-ai/kassette-server

Secured pipelines for your reporting and auditing data

Language: Go - Size: 858 KB - Last synced: 10 days ago - Pushed: 6 months ago - Stars: 7 - Forks: 0

KirillZhul/de-project-sprint-7 Fork of yandex-praktikum/de-project-sprint-7

PySpark, DataLake

Language: Python - Size: 71.3 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

Stefen-Taime/azurePipeline

Azure Data Pipeline

Language: Jupyter Notebook - Size: 95.7 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

bluishglc/serverless-datalake-example

A serverless datalake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. With a series of best practices, it guides you through building a serverless datalake.

Language: Shell - Size: 212 KB - Last synced: 5 months ago - Pushed: over 1 year ago - Stars: 17 - Forks: 4

aessing/demo-mdwh

Modern Data Warehouse Demos with Azure Databricks, Azure Data Factory & Azure Dedicated SQL pool (formerly SQL DW)

Size: 48.3 MB - Last synced: 8 days ago - Pushed: over 3 years ago - Stars: 4 - Forks: 1

Sheitak/datalake-jljq

Data Lake project for ingesting and transforming financial data, with a BI dashboard proposal

Language: Python - Size: 41 KB - Last synced: 8 days ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

NikiReis/Data-Engineer

Repository for uploading code challenges and notes along the bootcamp path

Language: Jupyter Notebook - Size: 286 KB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 1 - Forks: 0

AbsaOSS/enceladus

Dynamic Conformance Engine

Language: Scala - Size: 7.93 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 28 - Forks: 14

tac0x2a/nayco

Nayco (内湖) is an all-in-one micro DataLake for IoT

Language: JavaScript - Size: 9.63 MB - Last synced: 9 days ago - Pushed: over 1 year ago - Stars: 11 - Forks: 0

AWS-Big-Data-Projects/AWS-Data-Lake

AWS Lake Formation makes it easy for you to set up, secure, and manage your data lakes. Also covers data discovery using the metadata search capabilities of Lake Formation in the console, with metadata search results restricted by column permissions.

Size: 17.6 KB - Last synced: 29 days ago - Pushed: over 3 years ago - Stars: 16 - Forks: 3

DataTech-Solutions/Threat-Detection-and-Visualization

Threat Detection and Visualization

Language: TSQL - Size: 11.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 25 - Forks: 153

bloomberg/trino Fork of trinodb/trino

Trino, the distributed SQL query engine for big data

Size: 223 MB - Last synced: 27 days ago - Pushed: 2 months ago - Stars: 10 - Forks: 8

epomatti/az-data-services

End-to-end scenario for Azure data services.

Language: HCL - Size: 354 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0

liyichencc/incubator-paimon Fork of apache/paimon

Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.

Size: 24.4 MB - Last synced: 24 days ago - Pushed: 6 months ago - Stars: 0 - Forks: 0

CanaanGM/databases-infrastructure

On-demand database deployment, various kinds, adding more as I use them!

Language: Python - Size: 4.02 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0

hbuddana/Azure_Data_Factory_COVID-19_Reporting

Data Engineering Project on Covid19 Reporting – Using Azure Data Factory, Databricks, HDInsight, Azure Data Factory – An End to End ETL pipeline in addition to a Power BI report dashboard.

Language: Jupyter Notebook - Size: 16.6 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

cuebook/cuelake

Use SQL to build ELT pipelines on a data lakehouse.

Language: JavaScript - Size: 28 MB - Last synced: 7 months ago - Pushed: almost 2 years ago - Stars: 282 - Forks: 27

law-pal/data_pipelines

Data Pipelines for moving and processing data for analytics.

Language: Python - Size: 1000 Bytes - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

pprzetacznik/datalake-aws

Sample data lake pipeline on AWS implemented using Terraform

Language: HCL - Size: 133 KB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

fortinux/bigdata-book

Book: Fundamentals of Big Data

Language: Jupyter Notebook - Size: 4.94 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 1 - Forks: 0

3amory99/Sparkify-App-Data-Lake-Using-Apache-Spark-and-S3

Sparkify app, my objective is to assist Sparkify, a music streaming startup, in migrating its data warehouse to a data lake. To achieve this, I have developed an ETL (Extract, Transform, Load) pipeline. This pipeline is designed to extract data from S3, process it using Apache Spark, and subsequently load the processed data into a new S3 storage lo

Language: Jupyter Notebook - Size: 1.02 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0

thunchanokbow/audibleBook_Revenue

Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand

Language: Jupyter Notebook - Size: 11.6 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

neuro-ml/tarn

An insanely customizable framework for key-value storage 💾

Language: Python - Size: 336 KB - Last synced: 2 days ago - Pushed: about 1 month ago - Stars: 1 - Forks: 0

lynnlangit/serverless-architecture

Companion to my LinkedIn Learning 'Serverless Architecture' course

Size: 5.77 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 5 - Forks: 2

lobiouni/EBAC

In this repository I upload the code produced in the professional data analyst course at the Escola Britânica de Artes Criativas e Tecnologia - EBAC (https://ebaconline.com.br/analista-de-dados).

Language: Jupyter Notebook - Size: 320 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 4 - Forks: 0

mfilipelino/kafka2hdfs

PySpark streaming from Kafka (0.8.2) to HDFS

Language: Python - Size: 5.86 KB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 5 - Forks: 1