GitHub topics: big-data-analytics
ydataai/ydata-profiling
One line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Language: Python - Size: 840 MB - Last synced at: 43 minutes ago - Pushed at: 4 days ago - Stars: 12,986 - Forks: 1,722

bose234/data-storage-project
This repository contains a data storage project focused on analyzing sales and returns using a real-world dataset. It features SQL-based ETL processes, data visualization with Tableau, and a comparison of relational and graph databases. 🐙📊
Language: TSQL - Size: 24.5 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Peippo1/TrendNest
TrendNest is a modular, AI-integrated data pipeline that extracts, cleans, models, and visualizes time-based trends from data. It includes Gemini 1.5 summarisation, CSV export, and a dashboard UI. Built with Python, SQL, and BigQuery support, and fully dockerized for deployment, it is aimed at data engineering and analytics portfolios.
Language: Python - Size: 93.7 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

Lara33779/Alibaba-Cloud-Useful-Resources
This repository shares useful resources, updates, and tips to help you navigate the world of cloud computing with Alibaba Cloud.
Size: 388 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

clairelee-codes/bigdata-analyst-notes
Big Data Analyst Certification
Language: Jupyter Notebook - Size: 56.6 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

v6d-io/v6d
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Language: C++ - Size: 19.4 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 902 - Forks: 124

ingef/conquery
Visual, interactive queries against big databases
Language: Java - Size: 49 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 37 - Forks: 13

HuaTanSang/VreID
Vehicle re-identification - Big data analysis final project
Language: Python - Size: 5.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

dongsuo/vue-data-board
A Data Analysis Board in Vue.
Language: Vue - Size: 10.4 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 1,329 - Forks: 291

MrXujiang/v6.dooring.public
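A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.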
Language: TypeScript - Size: 36 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 659 - Forks: 152

madhurimarawat/Madhurima-Mindscape
This is a personal blog where I share a variety of content, including personal reflections, tech insights, project diaries, and creative photography. Explore different categories such as personal growth, tech insights, and project experiences.
Language: HTML - Size: 27.9 MB - Last synced at: 21 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

lithops-cloud/lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Language: Python - Size: 12.9 MB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 334 - Forks: 114

mahmoudparsian/pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 1,223 - Forks: 478

jdvelasq/courses
Support material for courses, Facultad de Minas (Faculty of Mines), Universidad Nacional de Colombia
Language: Python - Size: 470 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 17 - Forks: 7

ICT-BDA/EasyML
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real-world tasks.
Language: Java - Size: 14.9 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 1,976 - Forks: 439

varshithdupati/yelp-business-analysis
Big Data analysis on Yelp reviews/businesses for Arizona. Using Hadoop, Spark, PySpark.
Language: Jupyter Notebook - Size: 686 KB - Last synced at: 16 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

JuanParias29/BigDataProcessingProject
This repository contains a large-scale data analysis and processing project based on the CRISP-DM methodology, focused on answering business questions in the education domain.
Language: Jupyter Notebook - Size: 4.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

ceredamatteo-lab/GSECA
Gene Set Enrichment Class Analysis for heterogeneous RNA sequencing data
Language: R - Size: 56.4 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 1

theoliverlear/Crypto-Trader
A Spring Boot web app that buys and sells cryptocurrencies from API data sources. Its quick trading and other features allow users to leverage computer power to outperform the market.
Language: Java - Size: 34.2 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

mgunawardhana/learning-RStudio
R is a programming language used for statistical analysis, data visualization, and data science. It is widely used by researchers, data analysts, and scientists around the world.
Language: R - Size: 621 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 33

PriyankaJhaTheDeveloper/YellowTaxiNYC_HiveCaseStudy
This repository shows the Case Study of Yellow Taxi Cabs of NYC, using the Hadoop-Hive ecosystem with HiveQL.
Size: 556 KB - Last synced at: 28 days ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 3

SepidehHayati/Projects
Includes my personal and academic projects, reports, and assignments from the University of Pavia.
Language: Jupyter Notebook - Size: 30 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Irish-C/learner_dashboard
A Group Project for Big Data Analytics with Dash.
Language: Python - Size: 62 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

LatiefDataVisionary/big-data-and-data-analytics-college-task
Language: Jupyter Notebook - Size: 63.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JKA098/Pokemon-Feistiness-MapReduce-Job
This Project aims to implement a **Hadoop MapReduce job in Pseudo-Distributed Mode** to determine the **feistiest Pokémon** based on their **type**. The job processes the Pokémon dataset (`pokemon.csv`) and outputs a CSV file containing Pokémon **type1, type2, name, and feistiness score**.
Language: Python - Size: 220 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

JKA098/CSTADS-2021-22-Substance-Use-Analysis
The **Canadian Student Tobacco, Alcohol and Drugs Survey (CSTADS)** 2021–22 dataset is analyzed to explore:
* Provincial variation in youth **cannabis**, **alcohol**, and **tobacco** use
* The impact of **cannabis legalization**
* Access networks for each substance
* Regional policy implications using **geospatial** and **network** analysis
Size: 4.37 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

adwaiy2912/BDA-Lab
This repository contains weekly lab work and assignments for the Big Data Analytics (BDA) course.
Language: Python - Size: 7.8 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

vinay-ram1999/data-engineer-playground
Language: TypeScript - Size: 9.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

prof-rjimenez/cit_bigdata_basico
Repository for the lab sessions of the introductory Big Data course.
Language: Python - Size: 97.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 4

LatiefDataVisionary/big-data-for-data-science-college-task
Language: Mermaid - Size: 3.73 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

y0nil/kusto.blog
A technical blog about Kusto
Language: HTML - Size: 2.78 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 2

MrHAM17/Spotify_Streaming_Analytics
This is my Semester 7 Big Data Analytics (BDA) lab project. For complete details, see the README file.
Language: Jupyter Notebook - Size: 14.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

exajobs/data-engineering-collection
A collection of awesome software, libraries, Learning Tutorials, documents, books, resources and interesting stuff about Big Data Science & Engineering
Size: 241 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 7 - Forks: 1

rouyang2017/SISSO
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Language: Fortran - Size: 3.88 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 276 - Forks: 85

K-G-PRAJWAL/Big-Data-Engineering
Language: PLpgSQL - Size: 254 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 22 - Forks: 14

caioricciuti/ch-ui
Use CH-UI to work with your data from self-hosted ClickHouse through a user-friendly interface. CH-UI is a modern, feature-rich user interface for ClickHouse databases. It offers an intuitive platform for querying ClickHouse databases, executing queries, and visualizing metrics about your instance.
Language: TypeScript - Size: 24.1 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 346 - Forks: 26

trieu/leo-cdp-free-edition
The binary build of LEO CDP Free Edition for training purposes
Language: HTML - Size: 782 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 38 - Forks: 14

yaoguangluo/ChromosomeDNA
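"DNA Meta-Base Catalysis and Peptide Computation": in evolutionary computation, PDE metabolic optimization, in which software function files are encoded with DNA semantic meta-base indexes, is an effective mode of evolution.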
Language: Java - Size: 676 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 2

Yash22222/Olympic-Games-Analytics-Using-Apache-Spark
The "Olympic Games Analytics Using Apache Spark Databricks" project explores data from the Olympic Games (1896-2016) to identify trends and insights. Using Apache Spark for big data processing and Databricks for visualization, the project analyzes key factors like top-performing countries and athlete attributes, showcasing real-world analytics.
Language: HTML - Size: 18.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Houssam-11/BigData-Architecture
A big data system that predicts pandemic risk (COVID-19) using data analysis, ML modeling, and a real-time dashboard.
Language: Python - Size: 29 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

bydevmar/Master_MASD_FPO
This GitHub repository gathers all the courses, practical sessions (TP), tutorials (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic path and a detailed perspective on my learning in this field.
Language: Jupyter Notebook - Size: 155 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

KayvanShah1/Big-Data-Specialization-Coursera
Repository for the Big Data Specialization from University of California San Diego on Coursera
Language: Python - Size: 20 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

bilgeswe/BigDataManagement
Building a Data Pipeline with Lakehouse Architecture on Microsoft Azure Platform
Language: TSQL - Size: 2.02 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SrLozano/Tinder-Big-Data-Analysis
Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 5

FastTrack-Academy/adolescent-suicide-dashboard
An interactive data visualization and analytics tool designed to analyze risk factors, trends, and disparities in adolescent suicide rates. Using machine learning and open data, this dashboard helps policymakers, educators, and mental health professionals identify patterns and develop prevention strategies to support adolescent well-being. 🚀
Language: HTML - Size: 11.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

chandrahask535/Big-Data-Analysis-to-Identify-Adverse-Effects-of-Covid-19-Vaccines2.0
This project utilizes big data analytics, machine learning, and statistical methods to identify and classify adverse effects of COVID-19 vaccinations. By analyzing large datasets, it aims to uncover patterns and correlations, providing valuable insights into vaccine safety and efficacy.
Language: Python - Size: 5.71 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

ToheedAsghar/Smart-Meters-London-Analytics
This project analyzes the Smart Meters in London dataset, performing data preprocessing, EDA, and predictive modeling to forecast energy usage and identify optimization opportunities. It demonstrates my expertise in transforming raw data into actionable insights for improving energy efficiency using AI and real-world datasets.
Language: Jupyter Notebook - Size: 2.17 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

rapticore/ssvc_ore_miner
SSVC Ore Miner - www.rapticore.com
Language: Python - Size: 433 KB - Last synced at: 21 days ago - Pushed at: 7 months ago - Stars: 9 - Forks: 1

grahman20/ADF
Adaptive Decision Forest (ADF) is an incremental machine learning framework that produces a decision forest to classify new records. ADF can classify new records even if they belong to previously unseen classes. ADF can also identify and handle concept drift without forgetting previously gained knowledge. Moreover, ADF can handle big data if the data can be divided into batches.
Language: Java - Size: 1.63 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

Nico-Curti/PhDthesis
PhD thesis in Applied Physics
Language: TeX - Size: 220 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 8 - Forks: 0

Wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Language: Python - Size: 3.61 MB - Last synced at: 2 months ago - Pushed at: about 3 years ago - Stars: 27 - Forks: 13

yuvrajsaraogi/Unemployment-Analysis-with-Python
Unemployment is measured by the unemployment rate, which is the number of unemployed people as a percentage of the total labour force. The unemployment rate rose sharply during Covid-19, so analyzing it can make a good data science project.
Language: Jupyter Notebook - Size: 244 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

OwenOrcan/YiraBot-Crawler
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
Language: Python - Size: 221 KB - Last synced at: 20 days ago - Pushed at: 7 months ago - Stars: 19 - Forks: 0

JoseRuiz01/AirlineOn-TimePerformanceAnalysis
Airline on-time performance analysis using Spark Machine Learning libraries
Size: 0 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

SohelRana-aiub-Pro/Traffic-Forecasting-Graph-Neural-Networks-LSTM
https://docs.omniverse.nvidia.com/prod_install-guide/prod_install-guide/overview.html
Language: Jupyter Notebook - Size: 1.07 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

deepaiimpactx/BARS
Language: Python - Size: 16.2 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

BasharatWali/Medicine_Rec_System
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

khushi-sabarad/PsyliqIntenshipDataAnalysis
Big Data Analysis Internship. Diabetes Prediction, HR & Employee Data Analysis. Tools: SQL, Power BI and Excel
Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

JosepSampe/lithops Fork of lithops-cloud/lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Language: Python - Size: 12.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

Advaitiyer/advaitiyer.github.io
Data Scientist's Portfolio covering the topics: Big Data Analytics, Information Visualization, Advanced Data Mining, Applied Data Analytics, Financial, and Marketing Analytics, Artificial Intelligence, and Deep Learning.
Language: HTML - Size: 53.2 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ellie991/Titanic-Dataset-Analysis
Big Data Analysis on Titanic Dataset
Language: R - Size: 190 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ellie991/Spark-Spotify-Analysys
SPOTIFY - Big Data Analysis w/ Spark
Language: Python - Size: 11 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

GMAP/DSPBench
a suite of benchmark applications for distributed data stream processing systems
Language: Java - Size: 250 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 28 - Forks: 3

Matt-J-Dong/Top-Towns-To-Take-Over-Tech
Which American cities are the best for tech jobs?
Language: Scala - Size: 12.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 1

Kaustubh-Indulkar/TE-IT-DSBDA-Assignmnets
This repository contains the solutions for a series of assignments covering Data Science And Big Data Analytics concepts.
Language: Jupyter Notebook - Size: 9.71 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

lesleyzhao/Bus_Delays_Analysis
Bus Delays Analysis is a big data analytics project designed to do ETL and analyze bus delays using Scala, Apache Spark, and HDFS.
Language: Scala - Size: 12.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

msche81/2-Jedha_Fullstack
450h Data Scientist training - Collect and store large amounts of data - Build prediction models in Machine Learning and Deep Learning - Deploy your models in real conditions
Language: Jupyter Notebook - Size: 248 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

subhanjandas/RDBMS-to-GraphDB---Big-Data-Analytics-using-Neo4j
This project involves migration from a traditional RDBMS to Neo4j for big data analytics. Using graph database technology, various business-critical questions are addressed, including identifying the employees who sold Tofu, the products sold with Tofu, the total number of products, top 5 products by sales, and the category with the highest sales.
Language: JavaScript - Size: 668 KB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

tashi-2004/Apache-Hadoop-Spark-Hive-CyberAnalytics
This project utilizes Apache Hadoop, Hive, and PySpark to process and analyze the UNSW-NB15 dataset, enabling advanced query analysis, machine learning modeling, and visualization. The project demonstrates efficient data ingestion, processing, and predictive analytics for network security insights.
Language: Jupyter Notebook - Size: 2.62 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

OuchenOussama/hespressence
Kappa Architecture Based Sentiment Analysis System for User Comments
Language: Python - Size: 10.8 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

mohamedsaleh1984/twitter-spark
Fetch data from Twitter and push it through Kafka to Spark then HDFS
Language: Java - Size: 7.82 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Dare-marvel/Big-Data-Analytics--BDA--
💾 Welcome to the Big Data Analytics Repository! 📚✨ Immerse yourself in a carefully curated reservoir of knowledge on Big Data Analytics. 🌐💡 Explore the intricacies of deriving insights from vast datasets and navigating powerful analytics tools. 🚀🔍
Language: Java - Size: 174 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 2

shivatejapecheti/Twitter-Live-Feed-Analysis-and-Streaming-for-Movies
Big data analysis project
Language: Jupyter Notebook - Size: 165 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

AbdullahKhurshid/ecommerce-marketing-analytics
Using Apache Spark for marketing analytics
Language: R - Size: 2.3 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

madhurimarawat/Big-Data-Analytics
This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 1

sxhixho/Preprocessing_Analysis
A project that demonstrates data storage, preprocessing, and analysis using tools like HDFS, Apache Pig, and Hive, executed in an Azure virtual machine environment. The project includes cleaning and aggregating a Spotify dataset and running Hive queries to extract meaningful insights.
Size: 4.24 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

notnayan/WLV_HCK
You're welcome.
Language: Jupyter Notebook - Size: 99.4 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

wandersonlira/state-of-data-brazil-2023
This repository hosts the academic project for the Big Data Topics in Python course. The project analyzes data from the annual "State of Data Brazil" survey, conducted by the Data Hackers community in partnership with Bain & Company.
Language: Jupyter Notebook - Size: 17.1 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 3

Amey-Thakur/BIG-DATA-ANALYTICS-AND-COMPUTATIONAL-LAB-I
CSDLO7032: Big Data Analytics & CSL704: Computational Lab - I <Semester VII>
Language: Jupyter Notebook - Size: 183 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

metatron-app/metatron-discovery
Powerful & Easy way for big data discovery
Language: TypeScript - Size: 93.3 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 444 - Forks: 111

Radhikareddy-chintareddy/Big-Data-Analysis-NY-Weather-Air-Quality-2022
End-to-end workflow showcasing database setup, API development, and interactive data retrieval of large datasets. Includes integration and analysis of 2022 SURFACE HOURLY weather data (global, US, and NY) merged with NY air pollution data from the EPA to uncover actionable insights.
Language: Jupyter Notebook - Size: 3.47 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Radhikareddy-chintareddy/Big-Data-Insights-NYC-Taxi-Trips-2013-
A project showcasing memory-efficient big data processing using Python, focusing on scalable data handling to overcome memory constraints. Includes anomaly detection, efficient visualizations, and actionable insights from the 2013 NYC Taxi Trip dataset.
Language: Jupyter Notebook - Size: 2.49 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

haustcsa/SocialSituSecu
SocialSituSecu is a project exploring social network security, computing, and intelligence based on social situational metadata. It is sponsored by the National Natural Science Foundation of China Grant No.61972133 and the Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province Grant No.204200510021.
Language: Python - Size: 87.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 1

jaanli/american-community-survey
American Community Survey data on people and households
Language: Jupyter Notebook - Size: 142 MB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 19 - Forks: 1

sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics
Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.
Language: Python - Size: 32.2 KB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

AAlkiyumi/Predicting-Hospital-Readmission-Risk
This project aims to create a predictive model that forecasts the likelihood of a patient being readmitted to the hospital within 30 days of discharge.
Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

tabletop-labs/tabletop
A curated selection of tools, libraries and services that help tame your dataflow to productively build ambitious, data driven & reactive applications on a streaming lakehouse
Language: Go - Size: 290 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

bibekbhatta/BusinessAnalytics
Anyone (including beginners) can use these resources to get started with accessing, cleaning, and analysing different kinds of data in Python. No installation required. No registration required.
Language: Jupyter Notebook - Size: 84.1 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

bryanfks-dev/Klempoken-Analysis
Analysis and forecasting model for Klempoken MSMEs
Language: Jupyter Notebook - Size: 6.19 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

mehwishferoz/BDA-project
A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.
Language: Java - Size: 7.42 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

Amey-Thakur/OPTIMIZING-STOCK-TRADING-STRATEGY-WITH-K-MEANS-CLUSTERING
Big Data Analytics [BDA] Mini Project
Language: Jupyter Notebook - Size: 2.55 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

waseemsalami/project-Big-Data-in-behavioral-science-
An exciting Big Data project done during a course I took at the Technion
Language: HTML - Size: 31.8 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

madhurimarawat/Python-Projects
This repository contains the projects that I made in the Python programming language.
Language: Jupyter Notebook - Size: 17.6 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

MSUSAzureAccelerators/Workplace-Intelligence-Accelerator
The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.
Language: TSQL - Size: 22.3 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 6 - Forks: 3

chaaalistaa/Thelookecommerce---Project
Analysis of "TheLook" eCommerce data, with goals such as identifying sales trends, understanding customer behavior, enhancing customer retention, and driving repeat purchases.
Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Ashish7129/Graph_Sampling
Graph Sampling is a python package containing various approaches which samples the original graph according to different sample sizes.
Language: Python - Size: 4.91 MB - Last synced at: 7 months ago - Pushed at: over 4 years ago - Stars: 161 - Forks: 50

BhushanSagar/Telecom-Data-Analysis
Telecom Data Analysis with Apache Hive
Language: HiveQL - Size: 357 KB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
Language: Shell - Size: 6.22 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 0

hatoonguls/Big-Data-Analytics
This repository contains big data analytics projects using Apache Spark, SQL, and machine learning models.
Language: Python - Size: 197 KB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0
