An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: datalake

mchien15/datascience

Soccer Players Data Analyst and Similar Players Finder

Language: Jupyter Notebook - Size: 44.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

UncoderIO/Roota

Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automated translation into other languages

Size: 271 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 109 - Forks: 8

Phelipe-Sempreboni/informations

Repository for tutorials, information and notes on technology in general.

Size: 63.9 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

felipelaptrin/data-lake

This project is a simple proof of concept to implement a data lake using AWS cloud.

Language: Python - Size: 19.5 KB - Last synced at: 7 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

amosproj/amos2024ss04-building-information-enhancer

Building Information System for potential energy savings

Language: C# - Size: 6.49 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

trannhatnguyen2/BI_DataLake_Azure

Building Data Lake on the Microsoft Azure Cloud Platform

Size: 72.3 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

trannhatnguyen2/BI_Cloud_KienTap

Building a Business Intelligence Solution on the Microsoft Azure Cloud Platform with Dynamic ELT Integration

Language: Jupyter Notebook - Size: 37.6 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Asami1997/Data-Engineering-Nanodegree

Language: Jupyter Notebook - Size: 33.1 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

vitorjpc10/etl-breweries

Brewery Data Pipeline - This project implements a data pipeline to fetch, transform, and persist brewery data from the Open Brewery DB API into a data lake, following the medallion architecture (bronze, silver, gold layers). The pipeline is orchestrated using Apache Airflow and runs within Docker containers, coordinated via Docker Compose.

Language: Python - Size: 285 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

abdullahkhawer/aws-auto-terminate-idle-emr

An AWS based solution using AWS CloudWatch and AWS Lambda based on Python to automatically terminate AWS EMR clusters that have been idle for a specified period of time.

Language: Python - Size: 22.5 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 26 - Forks: 16

essraahmed/Data-Lake-with-Spark

Data Lake with Spark

Language: Python - Size: 37.1 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

naiborhujosua/Data-Scientist-learning-path-using-databricks

This is the summary of learning Data Science using Databricks

Size: 51.8 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

seyed-nouraie/Azure-Security-Data-Lake

A platform for extracting and shipping security value from your data lake to Sentinel.

Size: 174 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 2

memiiso/debezium-server-batch 📦

Debezium server batch consumers

Language: Java - Size: 406 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 2

federicopfund/data-engineer

ETL process

Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Language: Python - Size: 1.76 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 34

UncoderIO/Uncoder_IO

An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.

Language: Python - Size: 2.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 101 - Forks: 16

HamzaKaGit/Data_Engineering_Essentials

This repository covers the important data engineering concepts and skills that will help you become a successful data engineer. You will learn the basics of data engineering, the important algorithms data engineers use, and the roles and responsibilities of a data engineer.

Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

MehdiTAZI/BigData-Platform

End to end big data project, that aims to show how to implement different big data layers, from the infrastructure layer to the end user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker]

Language: Jupyter Notebook - Size: 85 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 6

nazish555/Tokyo-Olympics-Azure-Data-Engineering-Project

This project leverages Azure Cloud services like Azure Data Factory, Azure Databricks, and Synapse Analytics to execute a data engineering workflow. Utilizing data sourced from the Olympic API on GitHub, it involves extracting raw data into Azure Data Lake Storage, transforming it with PySpark on Azure Databricks, and analyzing the transformed data

Language: Jupyter Notebook - Size: 337 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

kimtth/pyspark-tika-text-extraction

🚴‍♂️⛷Data Lake, Performance tuning for text extraction from a huge amount of files.

Language: Python - Size: 261 MB - Last synced at: 23 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

hifxit/dataligo

A library to accelerate ML and ETL pipeline by connecting all data sources

Language: Python - Size: 879 KB - Last synced at: 4 months ago - Pushed at: about 2 years ago - Stars: 47 - Forks: 3

sanogotech/minIO-trino-hive-docker Fork of sensei23/trino-hive-docker

MinIO trino + hive + minio with postgres in docker compose

Language: Dockerfile - Size: 267 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

rifa8/data-warehouse-submission

Learning about Data Warehouse

Size: 1.19 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

micvet/data-eng-project-amazon

The goal of this project was to apply knowledge of the data extraction and processing tools of the Azure platform.

Language: Jupyter Notebook - Size: 78.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ac-gomes/data_engineer_with_airflow

This project is an adaptation based on a real test for a Junior Data Engineer position.

Language: Python - Size: 2.5 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

ExpediaGroup/apiary

Apiary provides modules which can be combined to create a federated cloud data lake

Size: 303 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 35 - Forks: 8

Onniedvin/Python-ETL-Data-Pipeline-with-AWS

Practice with IaC using Terraform and AWS.

Language: Python - Size: 267 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

relloyd/halfpipe 📦

Halfpipe is an ELT utility and microservice that streams data into Snowflake from S3, Oracle, SQL Server, Netezza and more. Continuous data integration patterns wrapped into single commands. ODBC support is available and Postgres is on the roadmap.

Language: Go - Size: 64.2 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 3

phammylinh2002/Implementing-a-Data-Lake-Using-MongoDB-Integrated-with-BigQuery

This project is part of my major university project, in which I was responsible for implementing the data lake on MongoDB (the BigQuery integration is an extension of the project).

Language: Python - Size: 2.77 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

FeuerwehrHackathon2024/FireLake

Idea for a platform for data that may be relevant to a fire department operation.

Size: 20.3 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

vfx-beavers/de-sprint-7

Organizing a Data Lake

Language: Python - Size: 202 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rlevchenko/terraform-azure-data

Terraform script to deploy almost all Azure Data Services

Language: HCL - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 26

Tanay0510/Data-Lake-with-Spark

Load data from S3, process the data into analytics tables using Spark and load them back into S3. Deployed this Spark process on a cluster using AWS EMR

Language: Python - Size: 418 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

kassette-ai/kassette-server

Secured pipelines for your reporting and auditing data

Language: Go - Size: 858 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

KirillZhul/de-project-sprint-7 Fork of yandex-praktikum/de-project-sprint-7

PySpark, DataLake

Language: Python - Size: 71.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Stefen-Taime/azurePipeline

Azure Data Pipeline

Language: Jupyter Notebook - Size: 95.7 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bluishglc/serverless-datalake-example

A serverless datalake project and framework based on AWS S3, Glue, Athena, MWAA, and QuickSight. With a series of best practices, it guides you through building a serverless datalake.

Language: Shell - Size: 212 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 4

aessing/demo-mdwh

Modern Data Warehouse Demos with Azure Databricks, Azure Data Factory & Azure Dedicated SQL pool (formerly SQL DW)

Size: 48.3 MB - Last synced at: 4 days ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 2

Sheitak/datalake-jljq

Data Lake project for ingesting and transforming financial data, with a BI dashboard proposal

Language: Python - Size: 41 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

NikiReis/Data-Engineer

Repository for uploading code challenges and notes from the bootcamp

Language: Jupyter Notebook - Size: 286 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

tac0x2a/nayco

Nayco (内湖) is an all-in-one micro DataLake for IoT

Language: JavaScript - Size: 9.63 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 0

AWS-Big-Data-Projects/AWS-Data-Lake

AWS Lake Formation makes it easy for you to set up, secure, and manage your data lakes. Also covers data discovery using the metadata search capabilities of Lake Formation in the console, with metadata search results restricted by column permissions.

Size: 17.6 KB - Last synced at: about 17 hours ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

DataTech-Solutions/Threat-Detection-and-Visualization

Threat Detection and Visualization

Language: TSQL - Size: 11.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 153

poshkaran04/stocks_data_transform

Adds additional stock data information using dbt.

Language: Python - Size: 86.4 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 1

bloomberg/trino Fork of trinodb/trino

Trino, the distributed SQL query engine for big data

Size: 248 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 10 - Forks: 8

epomatti/az-data-services

End-to-end scenario for Azure data services.

Language: HCL - Size: 354 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

liyichencc/incubator-paimon Fork of apache/paimon

Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.

Size: 24.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

epomatti/az-e2e-data-eng-proj

Data engineering with Azure services

Language: HCL - Size: 404 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 1

CanaanGM/databases-infrastructure

On-demand database deployment, various kinds, adding more as I use them!

Language: Python - Size: 4.02 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

thunchanokbow/AudibleBook-Revenue

Manage big data on cloud computing to find a list of best-selling audible books, generate reports and dashboards, and provide products and sales promotions that meet the needs of consumers in Thailand

Language: Jupyter Notebook - Size: 11.6 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

hbuddana/Azure_Data_Factory_COVID-19_Reporting

Data Engineering Project on Covid19 Reporting – Using Azure Data Factory, Databricks, HDInsight, Azure Data Factory – An End to End ETL pipeline in addition to a Power BI report dashboard.

Language: Jupyter Notebook - Size: 16.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

law-pal/data_pipelines

Data Pipelines for moving and processing data for analytics.

Language: Python - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

pprzetacznik/datalake-aws

Sample data lake pipeline on AWS implemented using Terraform

Language: HCL - Size: 133 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

3amory99/Sparkify-App-Data-Lake-Using-Apache-Spark-and-S3

Sparkify app, my objective is to assist Sparkify, a music streaming startup, in migrating its data warehouse to a data lake. To achieve this, I have developed an ETL (Extract, Transform, Load) pipeline. This pipeline is designed to extract data from S3, process it using Apache Spark, and subsequently load the processed data into a new S3 storage lo

Language: Jupyter Notebook - Size: 1.02 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

elastacloud/parquet-usql

A custom extractor designed to read parquet for Azure Data Lake Analytics

Language: C# - Size: 1.38 MB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 13 - Forks: 5

lobiouni/EBAC

In this repository I upload the code produced in the professional data analyst course at the Escola Britânica de Artes Criativas e Tecnologia - EBAC (https://ebaconline.com.br/analista-de-dados).

Language: Jupyter Notebook - Size: 320 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 4 - Forks: 0

mfilipelino/kafka2hdfs

PySpark streaming from Kafka (0.8.2) to HDFS

Language: Python - Size: 5.86 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

RJ-SMTR/wiki

📚 Data and Innovation Documentation

Language: CSS - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ash-0521/Jersey-Ecommerce-store--Database-and-Data-warehousing-ETL

Developed a robust e-Store database and data warehouse for seamless management of jersey orders, customer data, employee info, and inventory. Ensures accurate order recording, real-time inventory updates, and smooth multi-user access using an ETL process and OracleSuite

Size: 5.33 MB - Last synced at: 10 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Mimetis/ProjectY

Project Y is a straightforward Landing Zones automated deployment tool dedicated to data processing.

Language: C# - Size: 5.72 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 5

bobbyngo/Formula1

Formula1 ADF pipeline

Language: Python - Size: 1.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

herry13/airbyte-trino-superset

Data Analytics Platform using Airbyte+Trino+Superset

Language: HCL - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

mehroosali/s3-redshift-batch-etl-pipeline

Built functional python ETL script with functions that initialized spark clusters using pyspark library to extract songs stored in S3 bucket. Partitioned songs data by year and artist_id and compressed in parquet output files to increase load performance. Used the overwrite mode in spark to ensure every new run of ELT script is overwritten in the data lake to avoid duplicates. Orchestrated ELT data pipeline that extracts from S3, loads in redshift for transformation and loads output back to S3. Used hooks in airflow to make connection credentials configurable in order to separate access rights from code base for security. Used operators to execute loading and transformation scripts for redshift with airflow DAG.

Language: Python - Size: 944 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 3

tfabien/kylo-sandbox-docker

A dockerized Kylo sandbox

Language: Dockerfile - Size: 1.97 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 14

Wellikiandre/Formacao-Engenheiro-de-Dados-Cloud-e-Big-Data-Azure-DataBricks-

Data Engineer Training: Cloud and Big Data (Azure & DataBricks)

Size: 10.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 2

murilobellatini/ifood-data-architect-test

My solution to the iFood Data Architect Test using PySpark, Jupyter and Docker in order to create a local prototype data lake.

Language: Jupyter Notebook - Size: 158 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

CharlieSergeant/airflow-minio-postgres-fastapi

Sample data store project to be hosted on a remote server or cluster. CICD using GitHub actions for SSH Deploy to remote server for docker compose.

Language: Python - Size: 26.4 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

gpism/OpenDataCore

Welcome to the fascinating intersection of Web3, Artificial Intelligence (AI), Open Data Core (ODC), and Composable Enterprise Fabric - a nexus of modern technologies that are significantly reshaping the enterprise landscape

Language: Java - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

iBalajiShanmugam/formual1

"Explore Formula 1 data analytics with this project. Leveraging the Ergast API, it utilizes Databricks Spark for ingestion, transformation, and analysis. ADLS acts as the storage layer, while Power BI visualizes the ADLS presentation layer. Uncover insights in the world of Formula 1 through powerful data analytics."

Language: Python - Size: 33.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

cooptimize/Dataverse

Tools and samples to help reporting from Dataverse. Primarily focused on Data Lake based reporting.

Language: TSQL - Size: 110 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 3

vincentnam/docker_datalake

Datalake

Language: JavaScript - Size: 49.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 21 - Forks: 11

dataalways/CoinMetrics-formula-builder-models

A collection of json files used to automatically create models at https://charts.coinmetrics.io/formulas/

Size: 7.81 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 14 - Forks: 0

vforteli/DataLakeFileSystemClientExtension

Extension method for listing paths in parallel with Azure DataLakeFileSystemClient

Language: C# - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ylder/20230514_historicoBolsasCapes

Collection, storage, and analysis of historical data on the distribution of CAPES scholarships.

Language: Python - Size: 109 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

bihaiyang/datalake-example

Data lake implementation demo, including Iceberg on Flink, Iceberg on Spark, Hudi on Flink, and Hudi on Spark

Language: Java - Size: 924 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

MuhammadHasaanWahid/Datalake-To-Database-Via-DataBricks

This project extracts data from a data lake and then transfers it to Azure SQL Database via Azure Databricks in Python (PySpark).

Language: Jupyter Notebook - Size: 8.22 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Hizaak/benchmark-results-audal

This repository contains all the benchmarks run on the AUDAL project. The final version aims to have complete, correct, and consistent data from running the scripts. More details in the README.

Size: 60.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MuhammadHasaanWahid/Data-Filtering-Pipeline-ETL

This project extracts supply chain data from a CSV file with 180k records and more than 40 columns in an Azure Data Lake Gen2 storage account, analyzes it with Python (Pandas) to find the top 3 countries, filters the data for those countries, and writes the result back to the data lake as 3 files through an ETL pipeline in ADF.

Language: Jupyter Notebook - Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

MuhammadHasaanWahid/Data-Cleaning-Pipeline-ETL

This project extracts data from Azure Data Lake Gen2 storage, transforms it, and then transfers it to a SQL database.

Size: 127 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

wmeints/modern-datawarehouse

A set of resource manager templates to quickly deploy a modern data warehouse

Size: 32.2 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 0

victorskl/genomic-bigdata-spark

Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture

Language: Jupyter Notebook - Size: 172 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 0

DecisioNaut/sparkling_lakes

Part 3 of Udacity's Data Engineering With AWS Nano-Degree

Language: Python - Size: 7.14 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Balajirvp/DE-Zoomcamp

Code/Notes from the Data Engineering Zoomcamp by DataTalksClub

Language: Jupyter Notebook - Size: 9.55 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 9 - Forks: 0

leehuwuj/lake-inspector

Inspect your lakehouse data by using PyArrow

Language: Python - Size: 447 KB - Last synced at: about 20 hours ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

deBroglieeeen/DataQueen

A data lakehouse that accelerates enterprise data processing at terabyte scale and above. When data processing for dashboards takes 30 seconds or more, it loads the data within a few seconds.

Language: TypeScript - Size: 1.06 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 0

quyetnn1102/udacity-project3-azuredatalake

Building an Azure Data Lake for Bike Share Data Analytics

Language: Jupyter Notebook - Size: 396 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

aravinthsci/Spark_Delta_Lake

Delta Lake Examples

Language: Jupyter Notebook - Size: 285 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 12 - Forks: 12

SimonJang/s3-query-json

Query JSON documents on S3 with SQL

Language: TypeScript - Size: 331 KB - Last synced at: about 11 hours ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

duartejr/bootcamp-covid-research

Second data bootcamp of the Blue Edtech course

Language: Jupyter Notebook - Size: 113 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 1

martandsingh/ApacheSpark

This repository will help you learn Databricks concepts through examples. It includes all the important topics we need in real-life experience as data engineers. We use PySpark and Spark SQL for development. At the end of the course we also cover a few case studies.

Language: Python - Size: 141 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 71 - Forks: 47

pactera-ai/data2lake

a tool to form a lake on AWS from your data

Language: Python - Size: 1.25 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

OMR5221/esbi_stream

Application to ingest data into DB from API

Language: Python - Size: 26.4 KB - Last synced at: about 2 months ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

Dam1029/iceberg-assembly

A collection of the latest articles, materials, demos, and more related to Apache Iceberg

Size: 51.8 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 25 - Forks: 10

r3dlin3/datalake.gen2

dotnet core sample project to upload a file to Azure Data Lake Storage Gen2

Language: C# - Size: 17.6 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

MarcosMJD/ghcn-d

Data Pipeline from the Global Historical Climatology Network DataSet

Language: Jupyter Notebook - Size: 1.19 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 16 - Forks: 5

xpertdev/tdameritrade-streaming-deleteme Fork of hackingthemarkets/tdameritrade-streaming 📦

Streaming order book data from TD Ameritrade API

Language: Python - Size: 74.2 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

randleon/Information-Architectures

assignments and projects for Yeshiva University's Katz School Information Architectures course, spring 2020

Language: Jupyter Notebook - Size: 1.95 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

fernandito77777/AWSDataAnalyticsPostgreWorkshop

Workshop on RDS PostgreSQL database integration, offloading to a data lake, and visualizing the data in QuickSight

Size: 5.63 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

helenamin/databricks_PerthProperties

A Re-do of Perth City Properties project using Azure Data Engineering technologies such as Azure Data Factory (ADF), Azure Data Lake Storage Gen2, Azure Blob Storage, Azure Databricks.

Language: Python - Size: 1.23 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0