GitHub topics: datalake
mchien15/datascience
Soccer Players Data Analyst and Similar Players Finder
Language: Jupyter Notebook - Size: 44.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

UncoderIO/Roota
Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automated translation into other languages
Size: 271 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 109 - Forks: 8

Phelipe-Sempreboni/informations
Repository for tutorials, information and notes on technology in general.
Size: 63.9 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

felipelaptrin/data-lake
This project is a simple proof of concept to implement a data lake using AWS cloud.
Language: Python - Size: 19.5 KB - Last synced at: 7 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

amosproj/amos2024ss04-building-information-enhancer
Building Information System for potential energy savings
Language: C# - Size: 6.49 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

trannhatnguyen2/BI_DataLake_Azure
Building Data Lake on the Microsoft Azure Cloud Platform
Size: 72.3 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

trannhatnguyen2/BI_Cloud_KienTap
Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration
Language: Jupyter Notebook - Size: 37.6 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Asami1997/Data-Engineering-Nanodegree
Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

vitorjpc10/etl-breweries
Brewery Data Pipeline - This project implements a data pipeline to fetch, transform, and persist brewery data from the Open Brewery DB API into a data lake, following the medallion architecture (bronze, silver, gold layers). The pipeline is orchestrated using Apache Airflow and runs within Docker containers, coordinated via Docker Compose.
Language: Python - Size: 285 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0
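
The bronze/silver/gold flow described above can be sketched in plain Python. This minimal illustration is not the project's Airflow code. The record fields (`id`, `name`, `brewery_type`, `state`) are assumed to resemble Open Brewery DB responses, and the cleaning and aggregation rules are hypothetical.

```python
from collections import Counter

# Hypothetical raw payload shaped like Open Brewery DB records (assumed fields).
RAW = [
    {"id": "1", "name": " Alpha Ales ", "brewery_type": "micro", "state": "Oregon"},
    {"id": "2", "name": "Beta Brew", "brewery_type": "BREWPUB", "state": "Texas"},
    {"id": "1", "name": "Alpha Ales", "brewery_type": "micro", "state": "Oregon"},  # duplicate id
    {"id": "3", "name": "Gamma", "brewery_type": None, "state": "Oregon"},  # incomplete row
]

def to_bronze(raw):
    """Bronze: keep records exactly as received (here, shallow copies)."""
    return [dict(r) for r in raw]

def to_silver(bronze):
    """Silver: drop incomplete rows, deduplicate by id (last wins), normalise text."""
    by_id = {}
    for r in bronze:
        if not r.get("brewery_type") or not r.get("state"):
            continue
        by_id[r["id"]] = {
            "id": r["id"],
            "name": r["name"].strip(),
            "brewery_type": r["brewery_type"].lower(),
            "state": r["state"].strip(),
        }
    return list(by_id.values())

def to_gold(silver):
    """Gold: aggregate brewery counts per (state, brewery_type)."""
    return Counter((r["state"], r["brewery_type"]) for r in silver)

gold = to_gold(to_silver(to_bronze(RAW)))
```

Keeping each layer a separate function mirrors the idea that every layer is persisted and can be rebuilt from the one before it.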

abdullahkhawer/aws-auto-terminate-idle-emr
An AWS based solution using AWS CloudWatch and AWS Lambda based on Python to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 26 - Forks: 16
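
The core decision such a Lambda has to make can be sketched without any AWS calls. This minimal sketch rests on several assumptions. Only the EMR `WAITING` state counts as idle. `last_activity` is a hypothetical timestamp of the last finished step. The 30-minute default threshold is arbitrary. In a real Lambda, cluster state would come from the EMR API (for example boto3's `list_clusters`), and termination would go through `terminate_job_flows`. The repository's actual criteria may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only clusters in the EMR "WAITING" state (no steps running) count as idle.
IDLE_STATES = {"WAITING"}

def should_terminate(state, last_activity, now, idle_threshold=timedelta(minutes=30)):
    """Return True when a cluster has been idle for at least `idle_threshold`.

    `last_activity` is a hypothetical input: when the last step finished.
    """
    if state not in IDLE_STATES:
        return False
    return now - last_activity >= idle_threshold

# Usage with fixed, timezone-aware timestamps.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(should_terminate("WAITING", now - timedelta(minutes=45), now))  # True
print(should_terminate("RUNNING", now - timedelta(hours=3), now))  # False
```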

essraahmed/Data-Lake-with-Spark
Data Lake with Spark
Language: Python - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

naiborhujosua/Data-Scientist-learning-path-using-databricks
This is the summary of learning Data Science using Databricks
Size: 51.8 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

seyed-nouraie/Azure-Security-Data-Lake
A platform for extracting and shipping security value from your data lake to Sentinel.
Size: 174 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 2

memiiso/debezium-server-batch 📦
Debezium server batch consumers
Language: Java - Size: 406 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 2

federicopfund/data-engineer
ETL process
Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

UncoderIO/Uncoder_IO
An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.
Language: Python - Size: 2.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 16

HamzaKaGit/Data_Engineering_Essentials
This repository covers the important data engineering concepts and skills that will help you become a successful data engineer. You will learn the basics of data engineering, the important algorithms used by data engineers, and the roles and responsibilities of a data engineer.
Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

MehdiTAZI/BigData-Platform
End to end big data project, that aims to show how to implement different big data layers, from the infrastructure layer to the end user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker]
Language: Jupyter Notebook - Size: 85 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 6

nazish555/Tokyo-Olympics-Azure-Data-Engineering-Project
This project leverages Azure Cloud services like Azure Data Factory, Azure Databricks, and Synapse Analytics to execute a data engineering workflow. Utilizing data sourced from the Olympic API on GitHub, it involves extracting raw data into Azure Data Lake Storage, transforming it with PySpark on Azure Databricks, and analyzing the transformed data
Language: Jupyter Notebook - Size: 337 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

kimtth/pyspark-tika-text-extraction
🚴♂️⛷Data Lake, Performance tuning for text extraction from a huge amount of files.
Language: Python - Size: 261 MB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

hifxit/dataligo
A library to accelerate ML and ETL pipelines by connecting all data sources
Language: Python - Size: 879 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 47 - Forks: 3

sanogotech/minIO-trino-hive-docker Fork of sensei23/trino-hive-docker
Trino + Hive + MinIO with Postgres in Docker Compose
Language: Dockerfile - Size: 267 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

rifa8/data-warehouse-submission
Learning about Data Warehouse
Size: 1.19 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

micvet/data-eng-project-amazon
The goal of this project was to apply knowledge of the Azure platform's data extraction and processing tools.
Language: Jupyter Notebook - Size: 78.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ac-gomes/data_engineer_with_airflow
This project is an adaptation based on a real test for a Junior Data Engineer position.
Language: Python - Size: 2.5 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

ExpediaGroup/apiary
Apiary provides modules which can be combined to create a federated cloud data lake
Size: 303 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 35 - Forks: 8

Onniedvin/Python-ETL-Data-Pipeline-with-AWS
Practicing IaC using Terraform and AWS.
Language: Python - Size: 267 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

relloyd/halfpipe 📦
Halfpipe is an ELT utility and microservice that streams data into Snowflake from S3, Oracle, SQL Server, Netezza and more. Continuous data integration patterns wrapped into single commands. ODBC support is available and Postgres is on the roadmap.
Language: Go - Size: 64.2 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 3

phammylinh2002/Implementing-a-Data-Lake-Using-MongoDB-Integrated-with-BigQuery
This project is part of my major project at university, where I was responsible for implementing the data lake on MongoDB (the BigQuery integration is an extension of the project).
Language: Python - Size: 2.77 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

FeuerwehrHackathon2024/FireLake
Concept for a platform for data that may be relevant to a fire brigade operation.
Size: 20.3 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

vfx-beavers/de-sprint-7
Organizing a Data Lake
Language: Python - Size: 202 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rlevchenko/terraform-azure-data
Terraform script to deploy almost all Azure Data Services
Language: HCL - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 26

Tanay0510/Data-Lake-with-Spark
Load data from S3, process the data into analytics tables using Spark and load them back into S3. Deployed this Spark process on a cluster using AWS EMR
Language: Python - Size: 418 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

kassette-ai/kassette-server
Secured pipelines for your reporting and auditing data
Language: Go - Size: 858 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

KirillZhul/de-project-sprint-7 Fork of yandex-praktikum/de-project-sprint-7
PySpark, DataLake
Language: Python - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Stefen-Taime/azurePipeline
Azure Data Pipeline
Language: Jupyter Notebook - Size: 95.7 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bluishglc/serverless-datalake-example
A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA, and QuickSight. Through a series of best practices, it guides you in building a serverless data lake.
Language: Shell - Size: 212 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 4

aessing/demo-mdwh
Modern Data Warehouse Demos with Azure Databricks, Azure Data Factory & Azure Dedicated SQL pool (formerly SQL DW)
Size: 48.3 MB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 2

Sheitak/datalake-jljq
Data lake project for ingesting and transforming financial data, with a BI dashboard proposal
Language: Python - Size: 41 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

NikiReis/Data-Engineer
Repository for uploading the code challenges and notes from the bootcamp path
Language: Jupyter Notebook - Size: 286 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

tac0x2a/nayco
Nayco (内湖) is an all-in-one micro data lake for IoT
Language: JavaScript - Size: 9.63 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

AWS-Big-Data-Projects/AWS-Data-Lake
AWS Lake Formation makes it easy to set up, secure, and manage your data lakes. It also supports data discovery through Lake Formation's metadata search in the console, with search results restricted by column permissions.
Size: 17.6 KB - Last synced at: about 17 hours ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

DataTech-Solutions/Threat-Detection-and-Visualization
Threat Detection and Visualization
Language: TSQL - Size: 11.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 153

poshkaran04/stocks_data_transform
Adds additional stock data information using dbt.
Language: Python - Size: 86.4 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

bloomberg/trino Fork of trinodb/trino
Trino, the distributed SQL query engine for big data
Size: 248 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 10 - Forks: 8

epomatti/az-data-services
End-to-end scenario for Azure data services.
Language: HCL - Size: 354 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

liyichencc/incubator-paimon Fork of apache/paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Size: 24.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

epomatti/az-e2e-data-eng-proj
Data engineering with Azure services
Language: HCL - Size: 404 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

CanaanGM/databases-infrastructure
On-demand database deployments of various kinds; I add more as I use them!
Language: Python - Size: 4.02 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

thunchanokbow/AudibleBook-Revenue
Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand
Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

hbuddana/Azure_Data_Factory_COVID-19_Reporting
Data Engineering Project on COVID-19 Reporting: an end-to-end ETL pipeline using Azure Data Factory, Databricks, and HDInsight, plus a Power BI report dashboard.
Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

law-pal/data_pipelines
Data Pipelines for moving and processing data for analytics.
Language: Python - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

pprzetacznik/datalake-aws
Sample data lake pipeline on AWS implemented using Terraform
Language: HCL - Size: 133 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

3amory99/Sparkify-App-Data-Lake-Using-Apache-Spark-and-S3
Sparkify app: my objective is to help Sparkify, a music streaming startup, migrate its data warehouse to a data lake. To achieve this, I developed an ETL (Extract, Transform, Load) pipeline that extracts data from S3, processes it with Apache Spark, and then loads the processed data into a new S3 storage lo
Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

elastacloud/parquet-usql
A custom extractor designed to read parquet for Azure Data Lake Analytics
Language: C# - Size: 1.38 MB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 13 - Forks: 5

lobiouni/EBAC
In this repository I upload the code produced in the professional data analyst course of the Escola Britânica de Artes Criativas e Tecnologia - EBAC (https://ebaconline.com.br/analista-de-dados).
Language: Jupyter Notebook - Size: 320 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

mfilipelino/kafka2hdfs
pyspark streaming kafka(0.8.2) to hdfs
Language: Python - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

RJ-SMTR/wiki
📚 Data and Innovation Documentation
Language: CSS - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ash-0521/Jersey-Ecommerce-store--Database-and-Data-warehousing-ETL
Developed a robust e-store database/data warehouse for seamless management of jersey orders, customer data, employee info, and inventory. It ensures accurate order recording, real-time inventory updates, and smooth multi-user access using an ETL process and OracleSuite.
Size: 5.33 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Mimetis/ProjectY
Project Y is a straightforward Landing Zones automated deployment tool dedicated to data processing.
Language: C# - Size: 5.72 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 5

bobbyngo/Formula1
Formula1 ADF pipeline
Language: Python - Size: 1.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

herry13/airbyte-trino-superset
Data Analytics Platform using Airbyte+Trino+Superset
Language: HCL - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mehroosali/s3-redshift-batch-etl-pipeline
Built a functional Python ETL script whose functions initialize Spark clusters with the PySpark library to extract songs stored in an S3 bucket. Partitioned the songs data by year and artist_id and wrote it as compressed Parquet files to improve load performance. Used Spark's overwrite mode so that each new run of the ETL script overwrites its output in the data lake, avoiding duplicates. Orchestrated an ELT data pipeline that extracts from S3, loads into Redshift for transformation, and loads the output back to S3. Used Airflow hooks to make connection credentials configurable, separating access rights from the code base for security. Used operators to execute the Redshift loading and transformation scripts within an Airflow DAG.
Language: Python - Size: 944 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3
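
Partitioning by `year` and `artist_id` puts each row in a Hive-style directory such as `year=2018/artist_id=AR1`. Overwrite mode replaces the previous output instead of appending to it. In PySpark this is typically written as `df.write.mode("overwrite").partitionBy("year", "artist_id").parquet(path)`. The plain-Python sketch below only models the resulting layout and the replace-not-append behavior of Spark's default static overwrite. The bucket path is hypothetical, and a dict stands in for S3.

```python
from collections import defaultdict

def partition_paths(rows, base, keys=("year", "artist_id")):
    """Group rows into Hive-style partition directories, e.g. base/year=2018/artist_id=AR1."""
    parts = defaultdict(list)
    for row in rows:
        segment = "/".join(f"{k}={row[k]}" for k in keys)
        parts[f"{base}/{segment}"].append(row)
    return dict(parts)

def overwrite(lake, new_partitions):
    """Model a static-mode overwrite: everything previously under the path is replaced."""
    lake.clear()
    lake.update(new_partitions)
    return lake

songs = [
    {"song_id": "S1", "year": 2018, "artist_id": "AR1"},
    {"song_id": "S2", "year": 2018, "artist_id": "AR1"},
    {"song_id": "S3", "year": 2019, "artist_id": "AR2"},
]
base = "s3a://example-bucket/songs"  # hypothetical output location
lake = {}
for _ in range(2):  # re-running the job does not duplicate rows
    overwrite(lake, partition_paths(songs, base))
```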

tfabien/kylo-sandbox-docker
A dockerized Kylo sandbox
Language: Dockerfile - Size: 1.97 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 14

Wellikiandre/Formacao-Engenheiro-de-Dados-Cloud-e-Big-Data-Azure-DataBricks-
Cloud and Big Data Engineer Training (Azure & DataBricks)
Size: 10.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 2

murilobellatini/ifood-data-architect-test
My solution to the iFood Data Architect Test using PySpark, Jupyter and Docker in order to create a local prototype data lake.
Language: Jupyter Notebook - Size: 158 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

CharlieSergeant/airflow-minio-postgres-fastapi
Sample data store project to be hosted on a remote server or cluster. CI/CD uses GitHub Actions to deploy over SSH to a remote server running Docker Compose.
Language: Python - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

gpism/OpenDataCore
Welcome to the fascinating intersection of Web3, Artificial Intelligence (AI), Open Data Core (ODC), and Composable Enterprise Fabric - a nexus of modern technologies that are significantly reshaping the enterprise landscape
Language: Java - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

iBalajiShanmugam/formual1
"Explore Formula 1 data analytics with this project. Leveraging the Ergast API, it utilizes Databricks Spark for ingestion, transformation, and analysis. ADLS acts as the storage layer, while Power BI visualizes the ADLS presentation layer. Uncover insights in the world of Formula 1 through powerful data analytics."
Language: Python - Size: 33.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

cooptimize/Dataverse
Tools and samples to help reporting from Dataverse. Primarily focused on Data Lake based reporting.
Language: TSQL - Size: 110 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 3

vincentnam/docker_datalake
Datalake
Language: JavaScript - Size: 49.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 11

dataalways/CoinMetrics-formula-builder-models
A collection of json files used to automatically create models at https://charts.coinmetrics.io/formulas/
Size: 7.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 0

vforteli/DataLakeFileSystemClientExtension
Extension method for listing paths in parallel with Azure DataLakeFileSystemClient
Language: C# - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
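
Listing paths in parallel can be illustrated with a level-by-level traversal in which every directory at the same depth is listed concurrently. This Python sketch is neither the repository's C# code nor the Azure SDK API. `list_dir` and the in-memory `TREE` are hypothetical stand-ins for a single non-recursive remote listing call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a remote file system: dir path -> (name, is_dir) children.
TREE = {
    "": [("raw", True), ("readme.txt", False)],
    "raw": [("2023", True), ("2024", True)],
    "raw/2023": [("a.parquet", False)],
    "raw/2024": [("b.parquet", False), ("c.parquet", False)],
}

def list_dir(path):
    """One non-recursive listing call (stands in for a single remote request)."""
    return [(f"{path}/{name}" if path else name, is_dir) for name, is_dir in TREE[path]]

def list_all_parallel(root="", max_workers=8):
    """Breadth-first listing: all directories at one depth are listed concurrently."""
    files, frontier = [], [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            for entries in pool.map(list_dir, frontier):
                for path, is_dir in entries:
                    (next_frontier if is_dir else files).append(path)
            frontier = next_frontier
    return sorted(files)
```

Listing one level at a time keeps the number of in-flight requests bounded by `max_workers` while still overlapping the latency of sibling directories.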

ylder/20230514_historicoBolsasCapes
Collection, storage, and analysis of historical data on CAPES scholarship distributions.
Language: Python - Size: 109 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

bihaiyang/datalake-example
Data lake implementation demo, include iceberg on flink, iceberg on spark, hudi on flink, hudi on spark
Language: Java - Size: 924 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

MuhammadHasaanWahid/Datalake-To-Database-Via-DataBricks
This project extracts data from a data lake and then transfers it to Azure SQL Database via Azure Databricks in Python (PySpark).
Language: Jupyter Notebook - Size: 8.22 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Hizaak/benchmark-results-audal
This repository contains all the benchmarks run on the AUDAL project. The final version aims to hold complete, correct, and consistent data from the script runs. More details in the README.
Size: 60.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MuhammadHasaanWahid/Data-Filtering-Pipeline-ETL
This project extracts supply chain data from a CSV file (180k records, more than 40 columns) in an Azure Data Lake Gen2 storage account. It then performs some data analysis with Python (pandas) to find the top 3 countries, filters the data to those countries, and finally writes it back to the data lake as 3 files through an ETL pipeline in ADF.
Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0
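
The top-3 filtering step can be sketched with Python's standard `csv` module. This sketch makes several assumptions. The data has a column named `country` (hypothetical). The input is CSV text already read from storage. Ties in frequency are broken by first appearance, which is how `Counter.most_common` behaves. The project itself uses pandas and ADF.

```python
import csv
import io
from collections import Counter

def split_top_countries(csv_text, column="country", n=3):
    """Return {value: csv_text} for the n most frequent values of `column`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    fieldnames = list(rows[0])
    top = [value for value, _ in Counter(r[column] for r in rows).most_common(n)]
    outputs = {}
    for value in top:
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(r for r in rows if r[column] == value)
        outputs[value] = buf.getvalue()  # one output "file" per country
    return outputs

sample = "order_id,country\n1,US\n2,DE\n3,US\n4,FR\n5,DE\n6,US\n7,IT\n8,FR\n"
files = split_top_countries(sample)
```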

MuhammadHasaanWahid/Data-Cleaning-Pipeline-ETL
This project extracts data from Azure Data Lake Gen2 storage, transforms it, and then transfers it to a SQL database.
Size: 127 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

wmeints/modern-datawarehouse
A set of resource manager templates to quickly deploy a modern data warehouse
Size: 32.2 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

victorskl/genomic-bigdata-spark
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Language: Jupyter Notebook - Size: 172 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

DecisioNaut/sparkling_lakes
Part 3 of Udacity's Data Engineering With AWS Nano-Degree
Language: Python - Size: 7.14 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Balajirvp/DE-Zoomcamp
Code/Notes from the Data Engineering Zoomcamp by DataTalksClub
Language: Jupyter Notebook - Size: 9.55 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 0

leehuwuj/lake-inspector
Inspect your lakehouse data by using PyArrow
Language: Python - Size: 447 KB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

deBroglieeeen/DataQueen
A data lakehouse that accelerates the processing of a company's terabyte-scale (or larger) data. When dashboard data processing takes 30 seconds or more, it loads the data within a few seconds.
Language: TypeScript - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

quyetnn1102/udacity-project3-azuredatalake
Building an Azure Data Lake for Bike Share Data Analytics
Language: Jupyter Notebook - Size: 396 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

aravinthsci/Spark_Delta_Lake
Delta Lake Examples
Language: Jupyter Notebook - Size: 285 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 12 - Forks: 12

SimonJang/s3-query-json
Query JSON documents on S3 with SQL
Language: TypeScript - Size: 331 KB - Last synced at: about 11 hours ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
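
A simple way to query JSON documents with SQL is to load them into an in-memory SQLite table and run ordinary queries against it. This sketch is not the repository's TypeScript implementation. Fetching objects from S3 is omitted, documents are assumed to be JSON lines, and nested values are stored as JSON strings.

```python
import json
import sqlite3

def _quote(name):
    """Quote an identifier so arbitrary JSON keys are safe as column names."""
    return '"' + name.replace('"', '""') + '"'

def query_json_lines(lines, sql):
    """Load JSON-lines documents into an in-memory table `docs`, then run `sql`."""
    docs = [json.loads(line) for line in lines if line.strip()]
    columns = sorted({key for doc in docs for key in doc})
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE docs ({', '.join(map(_quote, columns))})")
        placeholders = ", ".join("?" for _ in columns)
        rows = [
            tuple(
                json.dumps(v) if isinstance(v, (dict, list)) else v  # nested -> JSON text
                for v in (doc.get(c) for c in columns)  # missing keys -> NULL
            )
            for doc in docs
        ]
        conn.executemany(f"INSERT INTO docs VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

documents = [
    '{"city": "Lisbon", "temp": 21}',
    '{"city": "Oslo", "temp": 9}',
    '{"city": "Rome", "temp": 25, "tags": ["sun"]}',
]
warm = query_json_lines(documents, "SELECT city FROM docs WHERE temp > 20 ORDER BY city")
```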

duartejr/bootcamp-covid-research
Second data bootcamp of the Blue Edtech course
Language: Jupyter Notebook - Size: 113 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

martandsingh/ApacheSpark
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We will use PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.
Language: Python - Size: 141 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 47

pactera-ai/data2lake
a tool to form a lake on AWS from your data
Language: Python - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

OMR5221/esbi_stream
Application to ingest data into DB from API
Language: Python - Size: 26.4 KB - Last synced at: about 2 months ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

Dam1029/iceberg-assembly
A collection of the latest articles, materials, demos, and more related to Apache Iceberg
Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 10

r3dlin3/datalake.gen2
A .NET Core sample project to upload a file to Azure Data Lake Storage Gen2
Language: C# - Size: 17.6 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

MarcosMJD/ghcn-d
Data Pipeline from the Global Historical Climatology Network DataSet
Language: Jupyter Notebook - Size: 1.19 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 5

xpertdev/tdameritrade-streaming-deleteme Fork of hackingthemarkets/tdameritrade-streaming 📦
Streaming order book data from TD Ameritrade API
Language: Python - Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

randleon/Information-Architectures
assignments and projects for Yeshiva University's Katz School Information Architectures course, spring 2020
Language: Jupyter Notebook - Size: 1.95 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

fernandito77777/AWSDataAnalyticsPostgreWorkshop
Workshop on RDS PostgreSQL integration, offloading to a data lake, and visualizing the data in QuickSight
Size: 5.63 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

helenamin/databricks_PerthProperties
A Re-do of Perth City Properties project using Azure Data Engineering technologies such as Azure Data Factory (ADF), Azure Data Lake Storage Gen2, Azure Blob Storage, Azure Databricks.
Language: Python - Size: 1.23 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0
