An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: aws-athena

omurkoc/enhanced-sentiment-analysis

A unique sentiment analysis model on IMDB reviews with custom negation handling. Instead of generic preprocessing, it smartly tags words after negators like "not" (e.g., "not good" → "not_good"), preserving sentiment context. Comparison of models with and without this logic shows improved accuracy and real-world reliability.

Language: Python - Size: 9.77 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0
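
The negation tagging described for this repository can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `tag_negations`, the negator list, and the tokenization rule are all assumptions made for the example.

```python
import re

# Hypothetical negator list; the repository may use a different one.
NEGATORS = {"not", "no", "never"}


def tag_negations(text: str) -> str:
    """Join each negator with the word that follows it ("not good" -> "not_good")."""
    # Lowercase and keep only alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    tagged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATORS and i + 1 < len(tokens):
            # Fuse the negator and the next word into a single feature.
            tagged.append(f"{tokens[i]}_{tokens[i + 1]}")
            i += 2
        else:
            tagged.append(tokens[i])
            i += 1
    return " ".join(tagged)
```

Fusing the pair into one token lets a bag-of-words model treat "not_good" as a feature separate from "good", which is the context-preserving effect the description refers to.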

gps31320779/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Size: 8.79 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

saifuzzuhdi123/apache_kafka_stock_market_data_streaming

This repository provides a clear guide on using Apache Kafka for real-time stock market data streaming. 📈 Explore how to set up producers and consumers, and see practical applications in financial data processing. 🛠️

Language: Jupyter Notebook - Size: 2.46 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

pizofreude/insightflow-retail-economic-pipeline

A data engineering portfolio project using AWS cloud services to analyze correlations between Malaysian retail performance and fuel prices. Features Terraform IaC, ETL/ELT with AWS S3, Glue, SQL analytics via Athena coupled with data transformation via dbt, and workflow orchestration with Kestra.

Language: HCL - Size: 2.38 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

ccao-data/data-architecture

Codebase for CCAO data infrastructure construction and management

Language: R - Size: 31 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 6 - Forks: 4

Omio-saha/Spotify_Data_Pipe_Snowflake

An ETL (Extract, Transform, Load) pipeline project that moves Spotify API data on AWS into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe and finally visualized in Power BI.

Size: 1000 Bytes - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

ghfjd/youtube-veri-analizi-sunum

A presentation I prepared about data analysis

Size: 1000 Bytes - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

The-AI-Alliance/analytics

Repository for the AI Alliance Analytics Stack

Language: Python - Size: 310 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 1

commoncrawl/cc-notebooks

Various Jupyter notebooks about Common Crawl data

Language: Jupyter Notebook - Size: 3.01 MB - Last synced at: 12 days ago - Pushed at: 3 months ago - Stars: 54 - Forks: 11

commoncrawl/cc-index-table

Index Common Crawl archives in tabular format

Language: Java - Size: 205 KB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 122 - Forks: 11

ShreyasShende3/reddit-data-engineering

Built an ETL pipeline using Airflow, then used AWS tools such as S3, Glue, Athena, and Redshift for further processing, storage, and visualization

Language: Python - Size: 119 KB - Last synced at: 26 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

aws-samples/streamlit-application-deployment-on-aws

Streamlit EDA Dashboard Powered by AWS Cloud

Language: Python - Size: 3.99 MB - Last synced at: 19 days ago - Pushed at: about 1 month ago - Stars: 82 - Forks: 33

dbcli/athenacli

AthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.

Language: Python - Size: 995 KB - Last synced at: 30 days ago - Pushed at: about 3 years ago - Stars: 214 - Forks: 32

tokern/piicatcher

Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub

Language: Python - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 311 - Forks: 99

frankndungu/f1-streamlit-data-pipline

A serverless data project showing how to ingest, query, and visualize F1 data using AWS Glue, Athena, and Streamlit.

Language: Python - Size: 217 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

HariSekhon/SQL-scripts

100+ SQL Scripts - PostgreSQL, MySQL, Oracle, Google BigQuery, MariaDB, AWS Athena. DBA, Analytics, DevOps, performance engineering. Google BigQuery ML machine learning classification.

Language: Shell - Size: 620 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 444 - Forks: 124

aws-samples/transactional-datalake-using-amazon-datafirehose-iceberg

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with Amazon Data Firehose and DMS

Language: Python - Size: 546 KB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 11 - Forks: 1

glassechidna/config2jsonlines

Transform AWS Config snapshots to a more AWS Athena-friendly format.

Language: Go - Size: 276 KB - Last synced at: 25 days ago - Pushed at: almost 5 years ago - Stars: 11 - Forks: 3

vsingh55/NBA-Analytics-Data-Lake

A sports analytics data lake leveraging AWS S3 for storage, AWS Glue for data cataloging, and AWS Athena for querying. Python scripts handle data ingestion and manage the infrastructure.

Language: Python - Size: 1.32 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aws-samples/aws-glue-streaming-etl-with-delta-lake

Streaming ETL job cases in AWS Glue that integrate Delta Lake and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 314 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 9 - Forks: 0

aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

Language: Python - Size: 727 KB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 32 - Forks: 2

Danitilahun/Reddit-Data-Engineering

This project automates the extraction, transformation, and loading (ETL) of Reddit data into a Redshift data warehouse using Airflow. Key technologies include Celery, PostgreSQL, S3, Glue, Athena, and Redshift, providing a complete data pipeline solution.

Size: 119 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

classmethod/athena-query

Athena-Query provides a simple interface for getting Athena query results.

Language: TypeScript - Size: 433 KB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 10 - Forks: 6

AlexisRodriguezCS/serverless-data-platform

Serverless data platform using AWS Lambda, S3, DynamoDB, Athena & CDK. Upload files, run SQL, and deploy with CI/CD, fully serverless and production-ready

Size: 2.93 KB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

aws-samples/saas-metering-system-on-aws

This project shows how to implement a simple SaaS metering system on AWS

Language: Python - Size: 971 KB - Last synced at: 19 days ago - Pushed at: 2 months ago - Stars: 11 - Forks: 2

aws-samples/aws-analytics-immersion-day

Describes the concepts of lambda architecture and the actual deployment process, with an example of building a serverless business intelligence system using Amazon Kinesis, S3, Athena, OpenSearch Service, and QuickSight.

Language: Python - Size: 12.9 MB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 14 - Forks: 8

ghdna/athena-express

Athena-Express can simplify executing SQL queries in Amazon Athena AND fetching cleaned-up JSON results in the same synchronous or asynchronous request - well suited for web applications.

Language: JavaScript - Size: 214 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 182 - Forks: 70

JaewonSon37/Mining_Big_Data2

Topic: Exploring the Relationship Between Weather and Taxi Demand in Chicago

Language: Jupyter Notebook - Size: 181 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

VandanaBhumireddygari/Data-Engineering-YouTube-Analysis-Project

This project focuses on securely managing, streamlining, and analyzing structured and semi-structured data from YouTube videos based on categories and trending metrics. The goal is to build a comprehensive ETL system to process and transform raw data into a usable format, store it in a centralized data lake, and scale the solutions.

Language: Python - Size: 59.6 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

tedilabs/terraform-aws-data

🌳 A sustainable Terraform Package which creates resources for Data Services on AWS

Language: HCL - Size: 169 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 14 - Forks: 4

aws-samples/aws-glue-streaming-etl-with-apache-iceberg

Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3

Language: Python - Size: 465 KB - Last synced at: 19 days ago - Pushed at: 10 months ago - Stars: 23 - Forks: 2

dacort/metabase-athena-driver

An Amazon Athena driver for Metabase 0.32 and later

Language: Clojure - Size: 143 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 223 - Forks: 32

segmentio/go-athena

Golang database/sql driver for AWS Athena

Language: Go - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 130 - Forks: 66

enchant3dmango/esdiel

Esdiel (SDL) stands for serverless data lake. In this project, I'm learning to deploy a simple serverless data lake on AWS using Terraform.

Language: HCL - Size: 544 KB - Last synced at: 7 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Tejesvani/IoT-Data-Streaming-and-Analytics

The Smart City Data Streaming Pipeline processes real-time data from IoT devices using Apache Kafka for ingestion and Apache Spark for processing. Data is stored in AWS S3 and analyzed with Glue, Athena, and Redshift. It enhances traffic management, predictive analytics, and urban planning, making cities smarter and more efficient.

Language: Python - Size: 18.6 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

dacort/demo-code

Bits of code I use during live demos

Language: Jupyter Notebook - Size: 774 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 31 - Forks: 24

avegao/aws-athena-node-client

NodeJS AWS Athena client

Language: TypeScript - Size: 589 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 6 - Forks: 1

aws-samples/transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)

Language: Python - Size: 701 KB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

ShubhamMohanty680/Spotify_Snowflake

An ETL (Extract, Transform, Load) pipeline project that moves Spotify API data on AWS into a Snowflake data warehouse. It uses AWS services such as Lambda, S3, and CloudWatch to orchestrate the process. The transformed data is then loaded into Snowflake using Snowpipe and finally visualized in Power BI.

Language: Python - Size: 1.79 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 1

BrianWangila/Sports-Data-Lake-AWS

Automates building an NBA sports data lake with AWS S3, AWS Glue, and AWS Athena, and sets up infrastructure to store and query NBA-related data.

Language: Python - Size: 470 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ShubhamMohanty680/Spotify_end_to_end_data_engineering

An ETL (Extract, Transform, Load) pipeline project built on AWS using the Spotify API.

Language: Jupyter Notebook - Size: 1.44 MB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

AntoineGagne/parthenon

A library to parse Athena structures into Erlang terms

Language: Erlang - Size: 52.7 KB - Last synced at: 13 days ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

zablon-oigo/nba-data-lake

This project automates the creation of a data lake for NBA analytics using AWS services

Language: Python - Size: 12.7 KB - Last synced at: 20 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

SWO-GS/athena-cloudtrail-partitioner 📦

Automate the daily partitioning of your CloudTrail bucket in Athena

Language: JavaScript - Size: 671 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 28 - Forks: 7

reyhanhosavci/youtube-veri-analizi-sunum

A presentation I prepared about data analysis

Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

WinterYukky/athena-view

Language: TypeScript - Size: 256 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

ndomah/AWS-YouTube-Data-Analysis

Analyzed YouTube trending video data using AWS services to build a scalable pipeline for data ingestion, ETL, and storage in a centralized data lake. Created QuickSight dashboards highlighting video views by country, category, and region. Workflow included ingestion, preprocessing, cataloging, and analysis.

Language: Python - Size: 968 KB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

gautamgc17/YouTube-Data-Analytics-AWS-Pipeline

The project aims to build a data engineering pipeline on AWS for analyzing YouTube data based on video categories and trending metrics.

Language: Python - Size: 54.7 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

zapr-oss/zapr-athena-client

ZAPR AWS Athena client is a Python library for running Presto queries on AWS Athena.

Language: Python - Size: 18.6 KB - Last synced at: 14 days ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 2

shahidmalik4/aws-glue-stepfunctions-etl

This project automates an ETL pipeline using AWS Glue, S3, Athena, and Step Functions to transform raw Airbnb data. It cleanses, enriches, and organizes the data into separate raw and transformed databases, enabling efficient querying and analysis via Athena, with automated notifications through SNS.

Language: Python - Size: 3.47 MB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

aryan4codes/StockIO

StockIO is a real-time data streaming solution designed to process and analyze stock market data using Apache Kafka and AWS services.

Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

OElesin/querypal

Web UI for Amazon Athena

Language: Vue - Size: 22.6 MB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 55 - Forks: 26

SadafAsad/LinkedIn-Jobs-Analysis

Unveiling job market trends with Scrapy and AWS

Language: Python - Size: 562 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

AWS-Big-Data-Projects/front-line-concussion-monitoring-system-using-AWS-IoT-and-serverless-data-lakes

A simple, practical, and affordable system for measuring head trauma within the sports environment, subject to the absence of trained medical personnel, built with Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda

Language: Shell - Size: 30.3 KB - Last synced at: 5 days ago - Pushed at: almost 5 years ago - Stars: 12 - Forks: 0

Saurabhkhandebharad/BigData-SK

Analyzed a multicategory e-commerce store using big data techniques on a Kaggle dataset with the help of AWS EC2, AWS S3, PySpark, AWS Glue ETL, AWS Athena, AWS CloudFormation, AWS Lambda and Power BI!

Language: Python - Size: 8.79 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

Tyriek-cloud/NYC-Mobility-Survey-Analysis

An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.

Language: Python - Size: 2.75 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

TimKong21/AWS-Batch-Processing

Big data analysis with AWS services, filtering the Wikiticker dataset with Apache Spark on Amazon EMR, storing data in S3, cataloging with AWS Glue, and querying with Amazon Athena. This end-to-end pipeline exemplifies handling and analyzing big data in the cloud.

Language: Python - Size: 8.01 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

alash3al/xyr

Query any data source using SQL, works with the local filesystem, s3, and more. It should be a very tiny and lightweight alternative to AWS Athena, Presto ... etc.

Language: Go - Size: 85.9 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 65 - Forks: 3

jxareas/Athena-SpringKlient

POC app to show how to query Athena and integrate the AWS SDK in Spring Boot.

Language: Kotlin - Size: 78.1 KB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

DenysGonzaga/glue-athena-cdk-example

A small walkthrough of how to create an AWS Glue Job Pipeline with AWS CDK

Language: Python - Size: 10.7 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

pgrarchives/AWS_DATA_PIPELINE

End to End Data Engineering Pipeline using AWS Cloud Services

Language: Jupyter Notebook - Size: 2.03 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

flemm0/capitol-trades

Politician stock market activity web scraping project

Language: Python - Size: 2.26 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

taupirho/read-big-file-aws-athena-glue

Continuing with my case study on reading a big data file, this is the fifth part of my trilogy :-) on how I got on reading a big'ish file with C, Python, spark-python and spark-scala, AWS Elastic MapReduce and AWS Athena.

Language: Python - Size: 45.9 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

sumanthmalipeddi/spotify_trending_telugu

Collects song, album, and artist details from the Spotify Music Application at specific intervals using the spotipy API and performs ETL operations using Amazon Cloud Services

Language: Jupyter Notebook - Size: 630 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

h-fuzzy-logic/data-analytics-spring

Open data and cloud computing to answer the question: Are we losing our spring days?

Language: Jupyter Notebook - Size: 390 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

stamixthereal/forecast-athena-query-cost

This Python project offers a business-focused solution for analyzing SQL query logs and predicting memory usage, primarily for AWS Athena. It enhances database performance monitoring and optimization, crucial for data-driven enterprises.

Language: Python - Size: 313 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

UdbhavSrivastava/Youtube-Analysis-Piepline

This AWS-based data pipeline manages data from storage in S3 data lakes, through transformation with AWS Glue and Lambda, to refined storage in separate S3 repositories. Using Athena for SQL querying and QuickSight for interactive dashboards, this solution optimizes data processing and visualization, facilitating informed decision-making and insigh

Language: Python - Size: 494 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

tewfik-ghariani/cloud-storage-analyzer

Analyzing and detecting anomalies in S3 Data using Athena JDBC Driver

Language: Python - Size: 2.58 MB - Last synced at: 4 days ago - Pushed at: 12 months ago - Stars: 1 - Forks: 0

mihirkudale/Stock-Market-Real-Time-Data-Engineering-Project

In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.

Language: Jupyter Notebook - Size: 2.46 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

mihirkudale/youtube-analysis-data-engineering-project

This project aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.

Language: Python - Size: 114 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shubhamjais40/AWS-Data-Pipeline-Project-Implementing-Data-Validation-Using-Lambda-based-Gluecrawler-v1.0

This project demonstrates a technology shift at an automobile firm to resolve the data engineering challenge of manual data ops. The AWS cloud services used are an S3 bucket for data lake storage of incoming batches, a Lambda Python script that automates the validation function call, and a Glue crawler that generates a relational table after successful testing.

Language: Python - Size: 347 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Gabyzera/covid_data_lake_analysis

☁️ Data analysis of the AWS COVID-19 data lake

Language: Jupyter Notebook - Size: 7.33 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

fermat01/ETL-Data-Pipeline-using-AWS-EMR-Spark-Glue-Athena

ETL data pipeline using AWS services

Language: Python - Size: 4.07 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

DimaKuriptya/RedditETL

This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.

Language: Python - Size: 14.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

VivekRajyaguru/aws-athena

Language: JavaScript - Size: 9.77 KB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

QuiNovas/lambda-pyathena Fork of laughingman7743/PyAthena

PyAthena is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena.

Language: Python - Size: 318 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

blieusong/aws-cookbook

A set of commands that can help when working with AWS

Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sarah-zhan/data_pipeline_amazon_products

An end-to-end data pipeline built with AWS S3, Glue, Crawler, Athena, and Tableau visualization

Language: Jupyter Notebook - Size: 1.74 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

gakas14/Kafka_streaming_project

This project simulates real-time streaming of movie details using Kafka. It uses technologies such as Python, Amazon EC2, Apache Kafka, Glue, Athena, and SQL.

Language: Jupyter Notebook - Size: 1.51 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

san99tiago/aws-cdk-athena-s3-workflow

AWS CDK-TypeScript project to showcase an Athena-based solution for S3 data analysis.

Language: TypeScript - Size: 3.85 MB - Last synced at: about 21 hours ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 2

shiv-rna/Youtube-Data-Engineering-Pipeline

This project repo 📺 offers a robust solution meticulously crafted to efficiently manage, process, and analyze YouTube video data leveraging the power of AWS services. Whether you're diving into structured statistics or exploring the nuances of trending key metrics, this pipeline is engineered to handle it all with finesse.

Language: Python - Size: 179 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

kingyiusuen/udacity-data-engineering-nanodegree

Projects for Udacity's Data Engineering Nanodegree

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

prathyyyyy/Youtube-ETL-Pipeline-For-Data-Analysis

YouTube ETL pipeline project using PySpark and AWS

Language: Python - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

eljandoubi/aws-human-balance-analytics

Using AWS Glue, AWS S3, Python, and Spark, create or generate Python scripts to build a lakehouse solution in AWS

Language: Python - Size: 1.54 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tracebit-com/cloudtrail-latency-investigation

Jupyter notebook for investigating CloudTrail latency using Athena and matplotlib.

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

nnthanh101/sentiment-analysis

Voice of the Customer (VoC) to enhance customer experience with serverless architecture and sentiment analysis, using Amazon Kinesis, Amazon Athena, Amazon QuickSight, Amazon Comprehend, and ChatGPT-LLMs for sentiment analysis.

Language: JavaScript - Size: 7.78 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 5

exasol/athena-virtual-schema

Virtual Schema for connecting Athena as a data source to Exasol

Language: Java - Size: 66.4 KB - Last synced at: 30 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 1

omkarfadtare/Practical_data_science

These are the handwritten notes on Coursera's Practical data science specialization course.

Size: 82 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

sarutlaa/Spotify-End-to-End-Data-Pipeline

ETL Data Pipeline built using AWS Offerings

Language: Jupyter Notebook - Size: 104 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

RuFerdZ/Medical-X

A linear regression model for predicting US insurance costs. Mainly used to learn about machine learning tools in Amazon Web Services (AWS)

Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 1

tmheo/spark-athena

AWS Athena data source for Apache Spark

Language: Scala - Size: 8.88 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 24 - Forks: 7

daniel-cortez-stevenson/aws-athena-udfs-h3

This connector extends Amazon Athena's capability by adding UDFs (via Lambda) for selected [h3-java](https://github.com/uber/h3-java) Java functions to support geospatial indexing and queries with Uber's [H3](https://h3geo.org/)

Language: Java - Size: 1.11 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 16 - Forks: 1

epomatti/aws-elb-access-logs

Access logs for ELB

Language: HCL - Size: 154 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

imverma/DataEngineering-YouTube-Analysis-Project

An end-to-end solution for managing and analyzing YouTube video data from Kaggle, leveraging AWS services and visualized through QuickSight and Tableau

Language: Python - Size: 61.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

markoshlima/positional-file-process

This project targets legacy applications that process data from positional files. The objective is to read these positional files when they arrive in AWS S3, send the data to a data warehouse such as AWS Redshift, and finally read the results with a business intelligence tool such as AWS QuickSight.

Size: 873 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

tylerdsilva/Brackets-Analytics-Dashboard

User, Event, and Predictive Metric Dashboard on 2GB/month of log files from Brackets IDE

Language: JavaScript - Size: 545 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bhavya1917/Layoffs_Decoded

Demystifying ~400K layoffs to analyze underlying causes and predict future trends of layoffs by different companies.

Language: Jupyter Notebook - Size: 38.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tylerdsilva/Layoffs-Decoded

Demystifying ~400K layoffs to analyze underlying causes and predict future trends

Language: Jupyter Notebook - Size: 38.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

bhavya1917/Brackets_Analytics_Dashboard

User, event, and predictive metric dashboard on log files from Brackets IDE.

Language: JavaScript - Size: 545 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

burtcorp/athena-runner

Runs Athena queries with AWS Lambda and Step Functions

Language: Makefile - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 18 - Forks: 9