Topic: "big-data-analytics"
ydataai/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Language: Python - Size: 841 MB - Last synced at: about 9 hours ago - Pushed at: 4 days ago - Stars: 12,866 - Forks: 1,707

ICT-BDA/EasyML
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
Language: Java - Size: 14.9 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 1,978 - Forks: 440

dongsuo/vue-data-board
A Data Analysis Board in Vue.
Language: Vue - Size: 10.4 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 1,326 - Forks: 292

mahmoudparsian/pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 1,217 - Forks: 475

v6d-io/v6d
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Language: C++ - Size: 19.3 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 878 - Forks: 124

MrXujiang/v6.dooring.public
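A large-screen data visualization solution that provides a visual editing engine, helping individuals and businesses easily customize their own large-screen visualization applications.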
Language: TypeScript - Size: 36 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 642 - Forks: 140

metatron-app/metatron-discovery
Powerful & Easy way for big data discovery
Language: TypeScript - Size: 93.3 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 444 - Forks: 111

caioricciuti/ch-ui
Use CH-UI to work with data from your self-hosted ClickHouse through a user-friendly interface. CH-UI is a modern and feature-rich user interface for ClickHouse databases. It offers an intuitive platform for querying ClickHouse databases, executing queries, and visualizing metrics about your instance.
Language: TypeScript - Size: 24.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 346 - Forks: 26

lithops-cloud/lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Language: Python - Size: 12.9 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 329 - Forks: 111

rouyang2017/SISSO
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
Language: Fortran - Size: 3.88 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 273 - Forks: 86

Ashish7129/Graph_Sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Language: Python - Size: 4.91 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 161 - Forks: 50

archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala - Size: 39.5 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 143 - Forks: 32

FTiniNadhirah/Coursera-and-EdX-courses-answers
This is about learning courses on Coursera. All the answers given were written by myself.
Language: HTML - Size: 476 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 74 - Forks: 40

panstacks/pandata
The Pandata scalable open-source analysis stack
Size: 1.25 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 64 - Forks: 1

Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Language: Scala - Size: 11.3 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 63 - Forks: 46

drshahizan/BDM
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
Language: Jupyter Notebook - Size: 102 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 48 - Forks: 46

u2i/egis
Egis - a handy Ruby interface for AWS Athena
Language: Ruby - Size: 317 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 41 - Forks: 2

tatsuiman/rpot2
Real-time Packet Observation Tool
Language: Bro - Size: 145 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 6

trieu/leo-cdp-free-edition
The binary build of LEO CDP Free Edition for training purposes
Language: HTML - Size: 782 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 38 - Forks: 14

ingef/conquery
Visual, interactive queries against big databases
Language: Java - Size: 48.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 37 - Forks: 13

maniram-yadav/Big_DataHadoop_Projects
Big data projects implemented by Maniram Yadav
Language: PigLatin - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 33 - Forks: 33

jackkolokasis/teraheap
TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks
Size: 537 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 28 - Forks: 12

GMAP/DSPBench
a suite of benchmark applications for distributed data stream processing systems
Language: Java - Size: 250 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 28 - Forks: 3

Wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
Language: Python - Size: 3.61 MB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 27 - Forks: 13

arakat-community/arakat 📦
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Language: Python - Size: 31.6 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 27 - Forks: 21

eskimo-sh/eskimo
Eskimo is a state-of-the-art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.
Language: Java - Size: 39.9 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 7

airflow-plugins/pandora-plugin
Plugin offering views, operators, sensors, and more developed at Pandora Media.
Language: Python - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 25 - Forks: 6

scalytics/SDE
Scalytics Connect development environment, pre-built
Language: Jupyter Notebook - Size: 34 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 8

K-G-PRAJWAL/Big-Data-Engineering
Language: PLpgSQL - Size: 254 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 21 - Forks: 14

suzumura/graph500
World championship code for Graph500
Language: C - Size: 437 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 7

OwenOrcan/YiraBot-Crawler
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
Language: Python - Size: 221 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 19 - Forks: 0

jaanli/american-community-survey
American Community Survey data on people and households
Language: Jupyter Notebook - Size: 142 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 19 - Forks: 1

Azure/AzureKusto
R interface to Azure Data Explorer, aka Kusto
Language: R - Size: 400 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 5

jdvelasq/courses
Supporting material for courses, Facultad de Minas, Universidad Nacional de Colombia
Language: Python - Size: 470 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 16 - Forks: 7

AWS-Big-Data-Projects/Iot-and-Big-Data-Application-using-aws-and-apache-kafka
IoT and Big Data Analytics using Apache Kafka, Spark, and other AWS services
Language: Python - Size: 18.6 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

seeratawan01/autocapture.js
Build your own analytics - A single library that grabs every click, touch, page-view, and fill, forever.
Language: TypeScript - Size: 554 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 1

klugem/watchdog
Workflow management system for the automated and distributed analysis of large-scale experimental data.
Language: Java - Size: 193 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 4

AWS-Big-Data-Projects/aws-serverless-data-lake-workshop
This workshop is meant to give customers hands-on experience with the mentioned AWS services. The Serverless Data Lake workshop helps customers build a cloud-native and future-proof serverless data lake architecture. It allows hands-on time with AWS big data and analytics services including Amazon Kinesis Services for streaming data ingestion
Language: Jupyter Notebook - Size: 31 MB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 5

Dammonoit/Student-performance-analysis-using-Big-data
This project analyses and correlates student performance with different attributes. Finally, it determines the most suitable algorithm from among several candidates.
Language: Python - Size: 1.48 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 12 - Forks: 11

y0nil/kusto.blog
A technical blog about Kusto
Language: HTML - Size: 2.72 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 11 - Forks: 2

FIWARE/tutorials.Big-Data-Flink
📘 FIWARE 305: Real-time Processing of Context Data using Apache Flink
Language: Shell - Size: 37.5 MB - Last synced at: 25 days ago - Pushed at: 2 months ago - Stars: 10 - Forks: 5

Amey-Thakur/OPTIMIZING-STOCK-TRADING-STRATEGY-WITH-K-MEANS-CLUSTERING
Big Data Analytics [BDA] Mini Project
Language: Jupyter Notebook - Size: 2.55 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 1

XuanyouLiu/US-Real-Estate-Analysis
US Real Estate Rental Price Analysis
Language: Jupyter Notebook - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

SrLozano/Tinder-Big-Data-Analysis
Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech
Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 5

adityakamble49/loss-ratio-prediction
Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage
Language: Jupyter Notebook - Size: 71.8 MB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 10 - Forks: 2

ThinkBigEg/influxDB-grafana-gke
In this tutorial we explain how to get real-time analytics of energy produced and consumed from two solar station simulators, using InfluxDB together with Grafana hosted on Google Kubernetes Engine.
Language: Python - Size: 457 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 1

ys2843/million-song-dataset-analysis
Big Data Analysis on Million Song Dataset
Language: PigLatin - Size: 26.4 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 10 - Forks: 3

big-data-lab-team/accident-prediction-montreal
Language: Jupyter Notebook - Size: 65 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 7

SkipToYourSoul/keep-hungry-stay-foolish
A Personal Work Notebook on Gitbook. 1) A summary of programming knowledge points; 2) Example user-data solutions for big data scenarios
Language: HTML - Size: 5.44 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 9 - Forks: 0

rapticore/ssvc_ore_miner
SSVC Ore Miner - www.rapticore.com
Language: Python - Size: 433 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 8 - Forks: 1

Amey-Thakur/BIG-DATA-ANALYTICS-AND-COMPUTATIONAL-LAB-I
CSDLO7032: Big Data Analytics & CSL704: Computational Lab - I <Semester VII>
Language: Jupyter Notebook - Size: 183 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

adityajain10/pyspark-mlib-based-stock-predictor
PredictorFinc is a scalable supervised machine learning model that predicts stock price change through a Decision Tree Regressor, using data collected every hour for 20 years for 500 companies, obtained via the Alpha Vantage API
Language: CSS - Size: 13.7 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 6

epidataio/epidata-community
EpiData IoT Data Science Platform - Community Edition
Language: Python - Size: 7.56 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 7

N1ghtF1re/Map-of-emergency-incidents
Emergency Map allows you to effectively visualize multi-dimensional information and has an intuitive interface. The developed code is easily modified for use in a variety of areas. The use of color mixing technology enhances the perception and analysis of information
Language: PHP - Size: 71.2 MB - Last synced at: 22 days ago - Pushed at: almost 7 years ago - Stars: 8 - Forks: 5

yaoguangluo/ChromosomeDNA
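"DNA Meta-Base Catalysis and Peptide Computing": in evolutionary computation, optimizing software function files through PDE metabolism with DNA semantic meta-base index encoding is an effective evolutionary approach.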
Language: Java - Size: 676 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 7 - Forks: 2

Amey-Thakur/HADOOP
HADOOP
Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

jimit105/Computer-Engineering-Programs
Programs for various subjects of Computer Engineering
Language: C - Size: 19.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

kaushik03/Modern-Big-Data-Analysis-using-SQL
RDBMS techniques for Big Data analysis
Size: 1.57 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 1

Nico-Curti/PhDthesis
PhD thesis in Applied Physics
Language: TeX - Size: 220 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 7 - Forks: 0

asavinov/bistro
A general-purpose data analysis engine radically changing the way batch and stream data is processed
Language: Java - Size: 2.16 MB - Last synced at: 28 days ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 0

Ren294/Covid-Data-Process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
Language: Shell - Size: 6.22 MB - Last synced at: 15 days ago - Pushed at: 7 months ago - Stars: 6 - Forks: 0

MSUSAzureAccelerators/Workplace-Intelligence-Accelerator
The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.
Language: TSQL - Size: 22.3 MB - Last synced at: 23 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 3

hisah/Multilevel-Streaming-Analytics
A Multilevel Streaming Data Analytics Infrastructure for Predictive Analytics
Size: 51.4 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 5

oprecomp/oprecomp
The Horizon 2020 Open Transprecision Computing project
Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 4

Mgosi/Big-Data-Analysis-using-MapReduce-in-Hadoop
We explore data using Big Data analysis and visualization skills. To do this, we perform three main operations: i) data aggregation from different sources, ii) Big Data analysis using MapReduce, and iii) visualization through Tableau. Data analysis is critical to understanding the data and what we can do with it. Small datasets are easy to process, but for big companies it becomes crucial to identify trends so that any needed changes can be made; Big Data analysis addresses this problem. In this lab, we collect close to 20,000 tweets, 500 New York Times articles, and 500 Common Crawl articles about Entertainment, which is our main topic of discussion. We preprocess this data and feed it to MapReduce to compute word count and word co-occurrence, and from these we find the trends in the collected data. We used Python to perform the data analysis.
Language: Jupyter Notebook - Size: 16.8 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 3

matthew-mcateer/MIT_Policy_Hackathon
Repository for the winning code for the Internet & Cybersecurity track of the MIT Policy Hackathon.
Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: 2 days ago - Pushed at: about 7 years ago - Stars: 6 - Forks: 0

FreeIPCC/FreeWorkPhone
Enterprise phone, work phone, business phone, enterprise data accumulation, top-salesperson phone, customized enterprise phone, smartphone.
Size: 191 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

haustcsa/SocialSituSecu
SocialSituSecu is a project exploring social network security, computing, and intelligence based on social situational metadata. It is sponsored by National Natural Science Foundation of China Grant No.61972133 and by the Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province Grant No.204200510021.
Language: Python - Size: 87.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 1

Ren294/Log-Analysis-Project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
Language: Python - Size: 2.88 MB - Last synced at: 16 days ago - Pushed at: 7 months ago - Stars: 5 - Forks: 1

bydevmar/Master_MASD_FPO
This GitHub repository gathers all the courses, practical work (TP), tutorials (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic journey, offering a detailed perspective on my learning in this field.
Language: Jupyter Notebook - Size: 153 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

noobpk/gemini-web-vulnerability-detection
Gemini-Web Vulnerability Detection (G-WVD) detecting web application vulnerabilities with deep learning
Language: Python - Size: 50.8 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 5 - Forks: 0

KayvanShah1/Big-Data-Specialization-Coursera
Repository for the Big Data Specialization from University of California San Diego on Coursera
Language: Python - Size: 20 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

tabletop-labs/tabletop
A curated selection of tools, libraries and services that help tame your dataflow to productively build ambitious, data driven & reactive applications on a streaming lakehouse
Language: Go - Size: 290 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

Gugo-le/student-performance-predict
Big data was learned using TensorFlow.
Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

AnabelSMRuggiero/NNDescent.cpp
A C++ iteration of PyNNDescent.
Language: C++ - Size: 1.53 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

jofaval/tfm-iabd
Master's Final Degree Project on Artificial Intelligence and Big Data
Language: Shell - Size: 17.1 MB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 4

Thomas-George-T/File-Processing-Comparative-Analytics
Wanna know which languages and execution engines are the quickest or the slowest at processing files? Well here's your answer. 📊 Data Analysis & comparison between the time taken ⌚ for computing word counts in various languages and execution engines for files of different sizes.
Language: Jupyter Notebook - Size: 4.34 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

exajobs/data-engineering-collection
A collection of awesome software, libraries, Learning Tutorials, documents, books, resources and interesting stuff about Big Data Science & Engineering
Size: 241 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

JavadDogani/Multivariate-Cloud-workload-analysis
This repository analyzes the Multivariate workload data of Google Cluster machines.
Language: Python - Size: 880 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

UCLA-SEAL/Semeru
Semeru: A Memory-Disaggregated Managed Runtime (OSDI 2020)
Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

SinghHarshita/Clustering-Algorithms-Spark
K-Means, CURE, and Canopy algorithms are demonstrated using PySpark.
Language: Jupyter Notebook - Size: 150 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

matteocereda/GSECA
Gene Set Enrichment Class Analysis for heterogeneous RNA sequencing data
Language: R - Size: 56.4 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 1

anshul1004/LiveTwitterSentimentAnalysis
Live sentiment analysis of tweets using Kafka
Language: Python - Size: 4.63 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 1

PotatoSpudowski/The-centralized-guide-to-distributed-storage-and-processing-of-big-data
This is a repository containing the code samples that helped me understand the concepts of distributed storage and processing of big data using Apache Spark and Python.
Language: Python - Size: 8.56 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 3

sahith/Link-Prediction-for-Citation-Networks-using-Apache-Spark
Link Prediction is about predicting the future connections in a graph. In this project, Link Prediction means predicting whether two authors will collaborate on a future paper, given the graph of authors who have collaborated on at least one paper together.
Language: Scala - Size: 6.41 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1

JackSnowWolf/EECS_E6893_Big_Data_Analytics_Homework
Homework for Big Data Analytics @ Columbia University
Language: Python - Size: 12.9 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 11

vvittis/FlinkSampling
Reservoir Sampling for Group-By Queries in the Flink Platform. Effectively answering single-aggregate queries.
Language: Java - Size: 69.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

JINHXu/how-much-hate-with-china
Code repository for the paper: How Much Hate with #china? A Preliminary Analysis on China-related Hateful Tweets Two Years After the Covid Pandemic Began
Language: Jupyter Notebook - Size: 6.69 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

TirendazAcademy/Hands-on-Data-Science-with-GCP
Google BigQuery Tutorial
Language: Jupyter Notebook - Size: 366 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

sharma-n/global_event_analytics
Big data analytics using Hadoop on GDELT global news dataset.
Language: Java - Size: 2.66 MB - Last synced at: 10 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

manthanthakker/MapReduceSpark
CS6240 - Large Scale Parallel Processing Course at Northeastern University
Language: Java - Size: 78.2 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 4 - Forks: 0

chandrahask535/Big-Data-Analysis-to-Identify-Adverse-Effects-of-Covid-19-Vaccines2.0
This project utilizes big data analytics, machine learning, and statistical methods to identify and classify adverse effects of COVID-19 vaccinations. By analyzing large datasets, it aims to uncover patterns and correlations, providing valuable insights into vaccine safety and efficacy.
Language: Python - Size: 5.71 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 3 - Forks: 0

nssharmaofficial/market-basket-analysis
Market basket analysis using Pandas and next order prediction by XGBoost
Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 19 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

dgkanatsios/GameAnalyticsEventHubFunctionsCosmosDatalake
Big data reference architecture and implementation for an online multiplayer game
Language: JavaScript - Size: 563 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

GiovanniMerici/Big-Data-in-Linguistics
Supporting code for big-data analysis in linguistics
Language: Jupyter Notebook - Size: 1.99 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

geraked/bigdata
Implementation of Big Data Analytics Algorithms in Python
Language: Jupyter Notebook - Size: 11 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

dwsmith1983/space-filling-curves
Space filling curve library for Spark
Language: Scala - Size: 56.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

BMClab/covid19
BMClab's research on Covid-19
Language: Jupyter Notebook - Size: 15.6 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 3

Bayunova28/SAS_Visual_Data_Mining_Machine_Learning
This repository contains my weekly projects from the Big Data Analytics II course at my college
Language: SAS - Size: 2.66 MB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

Sayan3990/Big-Data-PPT
Research on Innovation of Automobile Marketing Mode Based on Big Data Marketing
Size: 15.3 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1
