An open API service providing repository metadata for many open source software ecosystems.

Topic: "big-data-analytics"

ydataai/ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Language: Python - Size: 841 MB - Last synced at: about 9 hours ago - Pushed at: 4 days ago - Stars: 12,866 - Forks: 1,707

ICT-BDA/EasyML

Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.

Language: Java - Size: 14.9 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 1,978 - Forks: 440

dongsuo/vue-data-board

A Data Analysis Board in Vue.

Language: Vue - Size: 10.4 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 1,326 - Forks: 292

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

Language: Jupyter Notebook - Size: 8.96 MB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 1,217 - Forks: 475

v6d-io/v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

Language: C++ - Size: 19.3 MB - Last synced at: 12 days ago - Pushed at: about 1 month ago - Stars: 878 - Forks: 124

MrXujiang/v6.dooring.public

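A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.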

Language: TypeScript - Size: 36 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 642 - Forks: 140

metatron-app/metatron-discovery

Powerful & Easy way for big data discovery

Language: TypeScript - Size: 93.3 MB - Last synced at: 27 days ago - Pushed at: about 1 year ago - Stars: 444 - Forks: 111

caioricciuti/ch-ui

Use CH-UI to work with data in your self-hosted ClickHouse instance through a user-friendly interface. CH-UI is a modern, feature-rich user interface for ClickHouse databases. It offers an intuitive platform for executing queries and visualizing metrics about your instance.

Language: TypeScript - Size: 24.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 346 - Forks: 26

lithops-cloud/lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

Language: Python - Size: 12.9 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 329 - Forks: 111

rouyang2017/SISSO

A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.

Language: Fortran - Size: 3.88 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 273 - Forks: 86

Ashish7129/Graph_Sampling

Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.

Language: Python - Size: 4.91 MB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 161 - Forks: 50

archivesunleashed/aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Language: Scala - Size: 39.5 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 143 - Forks: 32

FTiniNadhirah/Coursera-and-EdX-courses-answers

This is about learning courses on Coursera. All the answers given were written by myself.

Language: HTML - Size: 476 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 74 - Forks: 40

panstacks/pandata

The Pandata scalable open-source analysis stack

Size: 1.25 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 64 - Forks: 1

Thomas-George-T/Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Language: Scala - Size: 11.3 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 63 - Forks: 46

drshahizan/BDM

Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.

Language: Jupyter Notebook - Size: 102 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 48 - Forks: 46

u2i/egis

Egis - a handy Ruby interface for AWS Athena

Language: Ruby - Size: 317 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 41 - Forks: 2

tatsuiman/rpot2

Real-time Packet Observation Tool

Language: Bro - Size: 145 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 6

trieu/leo-cdp-free-edition

The binary build of LEO CDP Free Edition for training purposes

Language: HTML - Size: 782 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 38 - Forks: 14

ingef/conquery

Visual, interactive queries against big databases

Language: Java - Size: 48.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 37 - Forks: 13

maniram-yadav/Big_DataHadoop_Projects

Big data projects implemented by Maniram yadav

Language: PigLatin - Size: 2.79 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 33 - Forks: 33

jackkolokasis/teraheap

TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks

Size: 537 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 28 - Forks: 12

GMAP/DSPBench

A suite of benchmark applications for distributed data stream processing systems

Language: Java - Size: 250 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 28 - Forks: 3

Wittline/pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.

Language: Python - Size: 3.61 MB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 27 - Forks: 13

arakat-community/arakat 📦

ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform

Language: Python - Size: 31.6 MB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 27 - Forks: 21

eskimo-sh/eskimo

Eskimo is a state of the art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.

Language: Java - Size: 39.9 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 25 - Forks: 7

airflow-plugins/pandora-plugin

Plugin offering views, operators, sensors, and more developed at Pandora Media.

Language: Python - Size: 34.2 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 25 - Forks: 6

scalytics/SDE

Scalytics Connect development environment, pre-built

Language: Jupyter Notebook - Size: 34 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 8

K-G-PRAJWAL/Big-Data-Engineering

Language: PLpgSQL - Size: 254 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 21 - Forks: 14

suzumura/graph500

World championship code for Graph500

Language: C - Size: 437 KB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 7

OwenOrcan/YiraBot-Crawler

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.

Language: Python - Size: 221 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 19 - Forks: 0

jaanli/american-community-survey

American Community Survey data on people and households

Language: Jupyter Notebook - Size: 142 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 19 - Forks: 1

Azure/AzureKusto

R interface to Azure Data Explorer, aka Kusto

Language: R - Size: 400 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 18 - Forks: 5

jdvelasq/courses

Support material for courses, Facultad de Minas, Universidad Nacional de Colombia

Language: Python - Size: 470 MB - Last synced at: about 19 hours ago - Pushed at: about 21 hours ago - Stars: 16 - Forks: 7

AWS-Big-Data-Projects/Iot-and-Big-Data-Application-using-aws-and-apache-kafka

IoT and Big Data Analytics using Apache Kafka, Spark, and other AWS services

Language: Python - Size: 18.6 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 3

seeratawan01/autocapture.js

Build your own analytics: a single library that grabs every click, touch, page view, and fill, forever.

Language: TypeScript - Size: 554 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 13 - Forks: 1

klugem/watchdog

Workflow management system for the automated and distributed analysis of large-scale experimental data.

Language: Java - Size: 193 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 12 - Forks: 4

AWS-Big-Data-Projects/aws-serverless-data-lake-workshop

This workshop is meant to give customers a hands-on experience with mentioned AWS services. Serverless Data Lake workshop helps customers build a cloud-native and future-proof serverless data lake architecture. It allows hands-on time with AWS big data and analytics services including Amazon Kinesis Services for streaming data ingestion

Language: Jupyter Notebook - Size: 31 MB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 5

Dammonoit/Student-performance-analysis-using-Big-data

This project analyses student performance and correlates it with different attributes. Finally, it determines the most suitable algorithm from among several candidates.

Language: Python - Size: 1.48 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 12 - Forks: 11

y0nil/kusto.blog

A technical blog about Kusto

Language: HTML - Size: 2.72 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 11 - Forks: 2

FIWARE/tutorials.Big-Data-Flink

FIWARE 305: Real-time Processing of Context Data using Apache Flink

Language: Shell - Size: 37.5 MB - Last synced at: 25 days ago - Pushed at: 2 months ago - Stars: 10 - Forks: 5

Amey-Thakur/OPTIMIZING-STOCK-TRADING-STRATEGY-WITH-K-MEANS-CLUSTERING

Big Data Analytics [BDA] Mini Project

Language: Jupyter Notebook - Size: 2.55 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 10 - Forks: 1

XuanyouLiu/US-Real-Estate-Analysis

US Real Estate Rental Price Analysis

Language: Jupyter Notebook - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

SrLozano/Tinder-Big-Data-Analysis

Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech

Language: Jupyter Notebook - Size: 21.7 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 5

adityakamble49/loss-ratio-prediction

Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage

Language: Jupyter Notebook - Size: 71.8 MB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 10 - Forks: 2

ThinkBigEg/influxDB-grafana-gke

In this tutorial we explain how to get real-time analytics of the energy produced and consumed by two solar station simulators, using InfluxDB together with Grafana hosted on Google Kubernetes Engine.

Language: Python - Size: 457 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 10 - Forks: 1

ys2843/million-song-dataset-analysis

Big Data Analysis on Million Song Dataset

Language: PigLatin - Size: 26.4 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 10 - Forks: 3

big-data-lab-team/accident-prediction-montreal

Language: Jupyter Notebook - Size: 65 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 7

SkipToYourSoul/keep-hungry-stay-foolish

A Personal Work Notebook on Gitbook. 1) Summary of programming knowledge points; 2) Examples of user-data solutions in big data scenarios

Language: HTML - Size: 5.44 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 9 - Forks: 0

rapticore/ssvc_ore_miner

SSVC Ore Miner - www.rapticore.com

Language: Python - Size: 433 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 8 - Forks: 1

Amey-Thakur/BIG-DATA-ANALYTICS-AND-COMPUTATIONAL-LAB-I

CSDLO7032: Big Data Analytics & CSL704: Computational Lab - I <Semester VII>

Language: Jupyter Notebook - Size: 183 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 2

adityajain10/pyspark-mlib-based-stock-predictor

PredictorFinc is a scalable supervised machine learning model that predicts stock price changes with a Decision Tree Regressor, using hourly data covering 20 years for 500 companies obtained via the Alpha Vantage API

Language: CSS - Size: 13.7 MB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 6

epidataio/epidata-community

EpiData IoT Data Science Platform - Community Edition

Language: Python - Size: 7.56 MB - Last synced at: 11 months ago - Pushed at: almost 2 years ago - Stars: 8 - Forks: 7

N1ghtF1re/Map-of-emergency-incidents

Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easily modified for use in a variety of areas. The use of color mixing enhances the perception and analysis of information.

Language: PHP - Size: 71.2 MB - Last synced at: 22 days ago - Pushed at: almost 7 years ago - Stars: 8 - Forks: 5

yaoguangluo/ChromosomeDNA

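"DNA Meta-Base Catalysis and Peptide Computing": in evolutionary computing, PDE metabolic optimization, in which software function files are index-encoded with DNA semantic meta-bases, is an effective form of evolution.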

Language: Java - Size: 676 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 7 - Forks: 2

Amey-Thakur/HADOOP

HADOOP

Language: Jupyter Notebook - Size: 12.7 KB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 1

jimit105/Computer-Engineering-Programs

Programs for various subjects of Computer Engineering

Language: C - Size: 19.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

kaushik03/Modern-Big-Data-Analysis-using-SQL

RDBMS techniques for Big Data analysis

Size: 1.57 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 7 - Forks: 1

Nico-Curti/PhDthesis

PhD thesis in Applied Physics

Language: TeX - Size: 220 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 7 - Forks: 0

asavinov/bistro

A general-purpose data analysis engine radically changing the way batch and stream data is processed

Language: Java - Size: 2.16 MB - Last synced at: 28 days ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 0

Ren294/Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

Language: Shell - Size: 6.22 MB - Last synced at: 15 days ago - Pushed at: 7 months ago - Stars: 6 - Forks: 0

MSUSAzureAccelerators/Workplace-Intelligence-Accelerator

The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.

Language: TSQL - Size: 22.3 MB - Last synced at: 23 days ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 3

hisah/Multilevel-Streaming-Analytics

A Multilevel Streaming Data Analytics Infrastructure for Predictive Analytics

Size: 51.4 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 5

oprecomp/oprecomp

The Horizon 2020 Open Transprecision Computing project

Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 4

Mgosi/Big-Data-Analysis-using-MapReduce-in-Hadoop

We explore data using big data analysis and visualization skills. To do this, we perform three main operations: i) data aggregation from different sources, ii) big data analysis using MapReduce, and iii) visualization through Tableau. Data analysis is critical to understanding the data and what we can do with it. Small datasets are easy to process, but big companies need to identify trends in order to decide what changes to make; big data analysis addresses this problem. In this lab, we collect close to 20,000 tweets, 500 New York Times articles, and 500 Common Crawl articles about Entertainment, which is our main topic of discussion. We preprocess this data and feed it to MapReduce to compute word count and word co-occurrence, from which we find the trends in the data collected on this topic. We used Python to perform the data analysis.

Language: Jupyter Notebook - Size: 16.8 MB - Last synced at: 5 months ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 3

matthew-mcateer/MIT_Policy_Hackathon

Repository for the winning code for the Internet & Cybersecurity track of the MIT Policy Hackathon.

Language: Jupyter Notebook - Size: 1.01 MB - Last synced at: 2 days ago - Pushed at: about 7 years ago - Stars: 6 - Forks: 0

FreeIPCC/FreeWorkPhone

Enterprise phone, work phone, business phone, enterprise data accumulation, top-seller phone, customized enterprise phone, smartphone.

Size: 191 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

haustcsa/SocialSituSecu

SocialSituSecu is a project exploring social network security, computing, and intelligence based on social situational metadata. It is sponsored by National Natural Science Foundation of China Grant No.61972133 and by the Project of Leading Talents in Science and Technology Innovation for Thousands of People Plan in Henan Province, Grant No.204200510021.

Language: Python - Size: 87.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 1

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

Language: Python - Size: 2.88 MB - Last synced at: 16 days ago - Pushed at: 7 months ago - Stars: 5 - Forks: 1

bydevmar/Master_MASD_FPO

This GitHub repository gathers all the courses, lab work (TP), tutorials (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic path and a detailed perspective on my learning in this field.

Language: Jupyter Notebook - Size: 153 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 5 - Forks: 0

noobpk/gemini-web-vulnerability-detection

Gemini-Web Vulnerability Detection (G-WVD) detecting web application vulnerabilities with deep learning

Language: Python - Size: 50.8 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 5 - Forks: 0

KayvanShah1/Big-Data-Specialization-Coursera

Repository for the Big Data Specialization from University of California San Diego on Coursera

Language: Python - Size: 20 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

tabletop-labs/tabletop

A curated selection of tools, libraries and services that help tame your dataflow to productively build ambitious, data driven & reactive applications on a streaming lakehouse

Language: Go - Size: 290 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

Gugo-le/student-performance-predict

Big data was learned using TensorFlow.

Language: Jupyter Notebook - Size: 1.31 MB - Last synced at: 12 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

AnabelSMRuggiero/NNDescent.cpp

A C++ iteration of PyNNDescent.

Language: C++ - Size: 1.53 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

jofaval/tfm-iabd

Master's Final Degree Project on Artificial Intelligence and Big Data

Language: Shell - Size: 17.1 MB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 4

Thomas-George-T/File-Processing-Comparative-Analytics

Wanna know which languages and execution engines are the quickest or the slowest at processing files? Well here's your answer. 📊 Data Analysis & comparison between the time taken ⌚ for computing word counts in various languages and execution engines for files of different sizes.

Language: Jupyter Notebook - Size: 4.34 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

exajobs/data-engineering-collection

A collection of awesome software, libraries, Learning Tutorials, documents, books, resources and interesting stuff about Big Data Science & Engineering

Size: 241 KB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

JavadDogani/Multivariate-Cloud-workload-analysis

This repository analyzes the Multivariate workload data of Google Cluster machines.

Language: Python - Size: 880 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

UCLA-SEAL/Semeru

Semeru: A Memory-Disaggregated Managed Runtime (OSDI 2020)

Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

SinghHarshita/Clustering-Algorithms-Spark

KMeans, CURE, and Canopy algorithms are demonstrated using PySpark.

Language: Jupyter Notebook - Size: 150 KB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 5 - Forks: 0

matteocereda/GSECA

Gene Set Enrichment Class Analysis for heterogeneous RNA sequencing data

Language: R - Size: 56.4 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 1

anshul1004/LiveTwitterSentimentAnalysis

Live sentiment analysis of tweets using Kafka

Language: Python - Size: 4.63 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 5 - Forks: 1

PotatoSpudowski/The-centralized-guide-to-distributed-storage-and-processing-of-big-data

This is a repository containing my code samples that helped me understand the concepts of distributed storage and processing of Big data using Apache spark and Python.

Language: Python - Size: 8.56 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 3

sahith/Link-Prediction-for-Citation-Networks-using-Apache-Spark

Link Prediction is about predicting the future connections in a graph. In this project, the task is to predict whether two authors will collaborate on a future paper, given the graph of authors who have collaborated on at least one paper together.

Language: Scala - Size: 6.41 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1

JackSnowWolf/EECS_E6893_Big_Data_Analytics_Homework

Homework for Big Data Analytics @ Columbia University

Language: Python - Size: 12.9 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 11

vvittis/FlinkSampling

Reservoir Sampling for Group-By Queries on the Flink Platform. Effectively answers single-aggregate queries.

Language: Java - Size: 69.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 1

JINHXu/how-much-hate-with-china

Code repository for the paper: How Much Hate with #china? A Preliminary Analysis on China-related Hateful Tweets Two Years After the Covid Pandemic Began

Language: Jupyter Notebook - Size: 6.69 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

TirendazAcademy/Hands-on-Data-Science-with-GCP

Google BigQuery Tutorial

Language: Jupyter Notebook - Size: 366 KB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 1

sharma-n/global_event_analytics

Big data analytics using Hadoop on GDELT global news dataset.

Language: Java - Size: 2.66 MB - Last synced at: 10 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 1

manthanthakker/MapReduceSpark

CS6240 - Large Scale Parallel Processing Course at Northeastern University

Language: Java - Size: 78.2 MB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 4 - Forks: 0

chandrahask535/Big-Data-Analysis-to-Identify-Adverse-Effects-of-Covid-19-Vaccines2.0

This project utilizes big data analytics, machine learning, and statistical methods to identify and classify adverse effects of COVID-19 vaccinations. By analyzing large datasets, it aims to uncover patterns and correlations, providing valuable insights into vaccine safety and efficacy.

Language: Python - Size: 5.71 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 3 - Forks: 0

nssharmaofficial/market-basket-analysis

Market basket analysis using Pandas and next order prediction by XGBoost

Language: Jupyter Notebook - Size: 1.12 MB - Last synced at: 19 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 1

dgkanatsios/GameAnalyticsEventHubFunctionsCosmosDatalake

Big data reference architecture and implementation for an online multiplayer game

Language: JavaScript - Size: 563 KB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 2

GiovanniMerici/Big-Data-in-Linguistics

Supporting code for big-data analysis in linguistics

Language: Jupyter Notebook - Size: 1.99 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 0

geraked/bigdata

Implementation of Big Data Analytics Algorithms in Python

Language: Jupyter Notebook - Size: 11 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

dwsmith1983/space-filling-curves

Space filling curve library for Spark

Language: Scala - Size: 56.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 1

BMClab/covid19

BMClab's research on Covid-19

Language: Jupyter Notebook - Size: 15.6 MB - Last synced at: 6 months ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 3

Bayunova28/SAS_Visual_Data_Mining_Machine_Learning

This repository contains my weekly projects from the Big Data Analytics II course at my college

Language: SAS - Size: 2.66 MB - Last synced at: 19 days ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

Sayan3990/Big-Data-PPT

Research on Innovation of Automobile Marketing Mode Based on Big Data Marketing

Size: 15.3 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 1