An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: awsglue

muhd-minhaz/AWS-Glue--Data-Copy

A function for copying data in formats such as CSV, Parquet, and Avro from a source S3 bucket to a destination S3 bucket using AWS Glue. It includes the necessary setup for the Glue job, logging, reading data from the source bucket, and writing it to the destination bucket.

Language: Python - Size: 4.88 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

anushreebiswas/AWS-End-to-End-Data-Engineering-Project-with-Music-Data

AWS

Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

safwanhamza/DeFtunes

DeFtunes: a complete data ecosystem for a music streaming case study, built as part of the Data Engineering learning and capstone project for the DeepLearning.ai certificate.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

prestodb/prestorials

Tutorials and examples of how to deploy Presto and connect it to different data sources

Size: 1.11 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 15

vidupriya/AWS-Glue--Data-Copy

A function for copying data in formats such as CSV, Parquet, and Avro from a source S3 bucket to a destination S3 bucket using AWS Glue. It includes the necessary setup for the Glue job, logging, reading data from the source bucket, and writing it to the destination bucket.

Language: Python - Size: 2.93 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

sumit-sinha9/Superstore-Data-Analysis-using-AWS

This project builds a pipeline to analyze Superstore sales data on AWS. It transforms the data to make it ready for exploration, queries the transformed data with SQL to uncover trends and patterns, and creates easy-to-understand visualizations that provide clear insights into Superstore sales performance.

Size: 1.48 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Talha1a/Super-Store-sales-Analysis-using-AWS

This project demonstrates the use of Amazon Web Services (AWS) to analyze superstore sales data. The analysis was performed using Amazon S3 for data storage, AWS Glue for data cataloging, Amazon Athena for SQL-based serverless data querying, and Amazon QuickSight for visualization. The project’s objective was to provide actionable insights into sales trend

Size: 1.18 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

najmaelboutaheri/Data-Engineering-Project-Youtube

This project aims to securely manage, process, and analyze structured and semi-structured YouTube data based on video categories and trending metrics. The architecture leverages AWS services to ingest, store, transform, analyze, and visualize data efficiently and at scale.

Language: Python - Size: 32.2 KB - Last synced at: 27 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Harikishan-AI/Harikishan-AI

I am dedicated to delivering innovative solutions that align with business objectives while ensuring optimal performance, reliability, and security. My strong analytical skills, attention to detail, and problem-solving abilities drive me to create effective and efficient solutions.

Size: 48.8 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Cuchuflim/ETL-S3-to-Redshift

Incremental Data Load from S3 Bucket to Amazon Redshift Using AWS Glue

Language: Python - Size: 13.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

TanishkaMarrott/Real-Time-Streaming-Analytics-with-Kinesis-Flink-and-OpenSearch

This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture handles the complete data lifecycle, from ingestion to actionable insights.

Language: Java - Size: 586 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

catherman/Data-Science-Miscellaneous

AWS S3 and sentiment analysis, basic plotting with Matplotlib, and supervised machine learning with Sklearn.

Language: Jupyter Notebook - Size: 2.96 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

VivekaAryan/Reddit-Data-Pipeline

This project offers a robust data pipeline solution designed to efficiently extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. Leveraging a blend of industry-standard tools and services, the pipeline ensures seamless data processing and integration.

Language: Jupyter Notebook - Size: 1.06 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

bhavanachitragar/Superstore-Data-Analysis-using-AWS

This project builds a pipeline to analyze Superstore sales data on AWS. It transforms the data to make it ready for exploration, queries the transformed data with SQL to uncover trends and patterns, and creates easy-to-understand visualizations that provide clear insights into Superstore sales performance.

Size: 821 KB - Last synced at: about 12 hours ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

nazish555/AWS-Data_Engineering-Spotify_Data

This project showcases a data transformation pipeline that uses AWS Glue and Amazon Athena to process Spotify data from CSV files. It involves loading, transforming, and storing data in an S3 data warehouse, enabling seamless querying through Amazon Athena.

Language: Python - Size: 158 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

parth2050/aws-data-pipeline

An end-to-end data pipeline from a website source to an analytical dashboard on AWS, using Python Flask, ML models, DynamoDB, and other AWS services.

Language: HTML - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

shaundominic/Kafka-Streaming-Project

Uses Apache Kafka to stream real-time data generated by Python and upload it to S3 using s3fs.

Language: Python - Size: 1.95 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

nischaybikramthapa/dbt-athena-tpch

This project demonstrates how to build a downstream data pipeline using dbt on Athena.

Language: Python - Size: 297 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

pawanyoda/create_glue_table_using_gitlab_cicd

Create a Glue table using CI/CD

Size: 3.91 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

shreyask1406/Financial-Market-AWS-Data-Pipeline

AWS Data pipeline

Size: 2.67 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

riship1095/YouTube-ETL

Transformed YouTube’s raw JSON data to Parquet and loaded it into an S3 bucket, using the Glue Data Catalog to store metadata and Athena to query the cleaned data. Developed an ETL process using a Lambda job that is triggered when raw data is loaded into an S3 bucket; the data is then processed and stored in an S3 bucket for analytical purposes.

Language: Python - Size: 9.77 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Undisputed-jay/SpotifyAPI-Data-Engineering-Project

This project uses an ETL (Extract, Transform, and Load) pipeline to extract data from Spotify using its API and load it into a data source (AWS Athena). The entire pipeline will be built using Amazon Web Services (AWS).

Language: Jupyter Notebook - Size: 2.22 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

iqrabismii/Big-Data-Projects-

Projects on Big Data using PySpark and AWS

Language: Jupyter Notebook - Size: 2.05 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0