GitHub topics: mapreduce
benedekh/bigdata-projects
Student projects in the Big Data field.
Language: Java - Size: 225 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 19 - Forks: 12

groda/big_data
Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Language: Jupyter Notebook - Size: 54.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 81 - Forks: 27

cdapio/cdap
An open source framework for building data analytic applications.
Language: Java - Size: 613 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 777 - Forks: 351

PowerJob/PowerJob
Enterprise job scheduling middleware with distributed computing ability.
Language: Java - Size: 18.6 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 7,520 - Forks: 1,314

apache/uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
Language: Java - Size: 13.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 418 - Forks: 160

Hazim-HF/Data-Management
This repository covers data management and big data technologies, including databases, querying, and big data processing. Topics include Hadoop (MapReduce, HDFS), Apache Spark, data security, and optimization techniques. Students will learn Spark’s architecture, data distribution, parallel computing, and memory caching to enhance big data solutions
Language: Jupyter Notebook - Size: 73.6 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

H1ghBre4k3r/rust-map-reduce
A small hobby implementation of MapReduce that I hacked together at 2am.
Language: Rust - Size: 50.8 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

kevwan/mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Language: Go - Size: 44.9 KB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 174 - Forks: 24

hiejulia/Data-pipeline-project
Data pipeline project
Language: Jupyter Notebook - Size: 55.1 MB - Last synced at: about 7 hours ago - Pushed at: 5 months ago - Stars: 35 - Forks: 23

lynnlangit/learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Language: HTML - Size: 13.6 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 196 - Forks: 165

tonyamf/Demonstration_of_Big-Data_Analysis_Pipeline
"Demonstration of a Big Data Program" aims to demonstrate a complete big data analysis pipeline. The central goal is to analyze a dataset of house prices in India to understand the factors influencing the price (descriptive analytics) and to build a model that can predict house prices based on these factors (predictive analytics).
Language: Jupyter Notebook - Size: 43.2 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

MariaSchoinaki/roomie
An implementation of a distributed room booking mobile app that we created during our third year in AUEB's Distributed Systems course. This implementation leverages the MapReduce framework.
Language: Java - Size: 24.1 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 0 - Forks: 2

miozilla/dataprochs
dataprochs :elephant::honeybee: : Dataproc Cluster # Apache # Hadoop # MapReduce # Spark # YARN # HDFS
Language: Shell - Size: 2.21 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

water8394/BigData-Interview
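:dart: :star2: [Big Data interview questions] A collection of big data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop, Hive, Spark, Flink, HBase, Kafka, and ZooKeeper frameworks.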
Size: 6.59 MB - Last synced at: 22 days ago - Pushed at: almost 4 years ago - Stars: 1,622 - Forks: 447

EleniKechrioti/roomie (fork of MariaSchoinaki/roomie)
An implementation of a distributed Airbnb booking mobile app that we created during our third year in AUEB's Distributed Systems course. This implementation leverages the MapReduce framework.
Language: Java - Size: 24.1 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

casangi/graphviper
Dask Based MapReduce for Multi Xarray Datasets.
Language: Python - Size: 2.61 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 1 - Forks: 2

miguno/avro-hadoop-starter 📦
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Language: Java - Size: 650 KB - Last synced at: 3 days ago - Pushed at: over 9 years ago - Stars: 115 - Forks: 83

maengsanha/bigdata
KMU CS Hot Topics in Big Data
Language: Go - Size: 54.5 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 2 - Forks: 0

microsoft/Mobius
C# and F# language binding and extensions to Apache Spark
Language: C# - Size: 6.44 MB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 940 - Forks: 211

zeekling/hadoop_book
Hadoop study notes.
Size: 216 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

CamDavidsonPilon/tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Language: Python - Size: 91.8 KB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 396 - Forks: 54

MadhukarSaiBabu/Aviation-Trend-Analysis-using-MapReduce-and-R
Developed a data-driven solution leveraging Hadoop MapReduce, Hive, and R to analyze air travel data. Identified trends in passenger volume, route utilization, and peak travel periods, providing actionable insights for optimizing airline operations and improving the passenger experience.
Size: 1.34 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

amirkiarafiei/Spark-Statistics-Analysis
Descriptive and Exploratory Statistical functions implemented within a distributed Spark Cluster with Performance Analysis and Visualizations
Language: Python - Size: 1.28 MB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

serihiro/simple_map_reduce
Distributed MapReduce implementation written in Ruby.
Language: Ruby - Size: 257 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

douban/dpark 📦
Python clone of Spark, a MapReduce-like framework in Python
Language: Python - Size: 2.65 MB - Last synced at: 24 days ago - Pushed at: over 4 years ago - Stars: 2,680 - Forks: 530

TmohamedashrafT/High-Availability-Bigdata-Cluster
A highly available, fully distributed big data cluster built with Docker, integrating Hadoop HDFS, YARN, ZooKeeper, HBase, Hive, Spark, and Tez. Designed for scalability, fault tolerance, and seamless data processing in a containerized environment.
Language: Shell - Size: 16.6 KB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

abshek7/Big-data
A repository for documenting the learning related to theory and practical notes of big data computing.
Language: Python - Size: 330 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PasanAbeysekara/Taxi-Pickup-Hotspot-Analysis-using-Hadoop-MapReduce
This project analyzes one month of NYC Yellow Taxi trip data (January 2016) to identify the busiest taxi pickup locations. It utilizes the Hadoop MapReduce framework to process the data and a lookup table to map location IDs to human-readable zone names.
Language: Java - Size: 5.65 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PramithaMJ/job-analysis-MapReduce
Technical Skills Analysis using MapReduce - hadoop
Language: Shell - Size: 883 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mcxiaoxiao/bookdata-visual
MapReduce data analysis and visualization; MapReduce final-term assignment; front end and back end for Dangdang data visualization
Size: 133 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

jathavaan/bds-seoul-hadoop
Language: Python - Size: 81.1 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

PramithaMJ/hadoop-cluster-manager
Complete Apache Hadoop 3.4.1 cluster installation and management toolkit with automated scripts, comprehensive documentation, and production-ready configuration templates for single-node and multi-node deployments.
Language: Shell - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

WilliamZhang20/dask-algorithms
Implemented distributed computing algorithms
Language: Python - Size: 3.91 KB - Last synced at: 16 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

DigitalPebble/behemoth 📦
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Language: Java - Size: 7.45 MB - Last synced at: 13 days ago - Pushed at: about 7 years ago - Stars: 282 - Forks: 59

CocaineCong/tangseng
Tangseng search engine, including full-text search and vector search, based on Go. A search engine and information retrieval system written in Go.
Language: Go - Size: 6.81 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 124 - Forks: 36

MuhamedHekal/Hadoop-HA-Cluster-on-Docker
Hadoop3-HA-Docker is a production-ready, fault-tolerant Hadoop cluster deployed with Docker Compose. It automates the setup of a fully distributed Hadoop ecosystem with high availability (HA) features, designed for reliability, scalability, and real-world big data workloads
Language: Dockerfile - Size: 273 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

collabH/bigdata-growth
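A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.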
Language: Shell - Size: 221 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 1,612 - Forks: 375

heibaiying/BigData-Notes
A beginner's guide to big data :star:
Language: Java - Size: 22.9 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 16,422 - Forks: 4,279

cubefs/compass
Compass is a task diagnosis platform for bigdata
Language: Java - Size: 5.92 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 385 - Forks: 139

donnemartin/data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Language: Python - Size: 46.8 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 28,169 - Forks: 7,978

IwasakiYuuki/data-analysis-platform-infra
Construct an on-premises Hadoop cluster using Ansible
Language: Jinja - Size: 248 KB - Last synced at: 3 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Language: Java - Size: 397 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1,075 - Forks: 661

mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML - Size: 601 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 158 - Forks: 143

arindas/mit-6.824-distributed-systems
Template repository to work on the labs from MIT 6.824 Distributed Systems course.
Language: Go - Size: 1.42 MB - Last synced at: 2 days ago - Pushed at: about 3 years ago - Stars: 60 - Forks: 8

kwartile/connected-component
Map Reduce Implementation of Connected Component on Apache Spark
Language: Scala - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 85 - Forks: 18

grailbio/bigslice
A serverless cluster computing system for the Go programming language
Language: Go - Size: 2.66 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 554 - Forks: 35

srafay/Hadoop-hands-on
Learning how to tame the Big Data with Hadoop and related technologies
Language: PigLatin - Size: 96.7 KB - Last synced at: 3 days ago - Pushed at: over 5 years ago - Stars: 23 - Forks: 21

adwaiy2912/BDA-Lab
Repository contains weekly lab work and assignments for the Big Data Analytics (BDA) course
Language: Python - Size: 7.8 MB - Last synced at: 11 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Language: Python - Size: 44.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 215 - Forks: 93

ggcr/go-mapreduce
MapReduce implementation written in Go. MIT 6.824
Language: Go - Size: 2.56 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

datawhalechina/juicy-bigdata
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | An opening course for the big data technology track 🎉🎉
Language: Python - Size: 27.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 313 - Forks: 43

eecs485staff/madoop
A lightweight MapReduce framework for education
Language: Python - Size: 515 KB - Last synced at: 11 days ago - Pushed at: 3 months ago - Stars: 9 - Forks: 4

limbo-io/fluxion
Orchestration and scheduling platform, with more custom extensions for distributed computation.
Language: Java - Size: 1.11 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

lovnishverma/bigdataecosystem
Complete Big Data Ecosystem on Docker Desktop
Language: Shell - Size: 405 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 1

ReusJimenez/python-data-engineering
Hands-on data engineering labs with Python. ⚙️
Language: Jupyter Notebook - Size: 27.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Tencent/Firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Language: Java - Size: 1.63 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 256 - Forks: 72

whitfin/efflux
Easy Hadoop Streaming and MapReduce interfaces in Rust
Language: Rust - Size: 51.8 KB - Last synced at: 12 days ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 7

grexrr/code-learning-ai
Some concept related practices
Language: Jupyter Notebook - Size: 63.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

tejaswirupa/Big-Data-Systems-Project-Hadoop-Hive-MapReduce-Sqoop-Workflows
Designed and implemented scalable data workflows using Hadoop, Hive, and Sqoop. This project involved log aggregation, airline delay analysis, word frequency processing, and TF-IDF computation across multiple datasets using MapReduce, Hive queries, and Hadoop Streaming.
Size: 3.75 MB - Last synced at: 24 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

asuiu/pyxtension
Pure Python extensions library that includes Scala-like streams, JSON with attribute access syntax, and other commonly used utilities
Language: Python - Size: 334 KB - Last synced at: 21 days ago - Pushed at: 4 months ago - Stars: 46 - Forks: 1

yahiazakaria445/MapReduce-in-bash-scripting
A Bash-Based MapReduce for Distributed File Processing
Language: Shell - Size: 7.81 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

bcongdon/corral
🐎 A serverless MapReduce framework written for AWS Lambda
Language: Go - Size: 1.43 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 694 - Forks: 40

cwensel/cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Language: Java - Size: 32.1 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 350 - Forks: 221

longshilin/Hadoop-MapReduce
MapReduce-based application examples :ear_of_rice:
Language: Java - Size: 30.3 KB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 25 - Forks: 7

LucasUTNFRD/mit6.5840
Distributed systems related projects in Go
Language: Go - Size: 14.4 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

hobbyquaker/mqttDB
JSON Store with MQTT Interface :books::open_file_folder::satellite:
Language: JavaScript - Size: 99.6 KB - Last synced at: 17 days ago - Pushed at: about 7 years ago - Stars: 26 - Forks: 0

Erfanafshar/hadoop-cluster-crime-stats
Distributed crime data analysis using a multi-node Hadoop cluster with MapReduce and HDFS.
Language: Java - Size: 197 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mimecast/dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Language: Go - Size: 12.3 MB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 128 - Forks: 10

cold-bin/mit-6.824-labs
Implementation of the four labs and three challenges from the 2023 offering of MIT 6.824
Language: Go - Size: 9.53 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 0

TurboWay/pybigdata
Working with various big data components using Python
Language: Python - Size: 85 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 63 - Forks: 18

course-files/DistributedDatabases-HDFS-MapReduce-WideColumn
Concepts: Distributed Database Management Systems and Non-Relational Data Models - Setting up Hadoop in a fully distributed mode, using Hadoop Distributed File System (HDFS) and MapReduce (in Java and Python), and using a non-relational database based on a wide-column data model (HBase).
Language: TeX - Size: 220 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

asuiu/streamerate
Iterable Java8 style Streams for Python
Language: Python - Size: 486 KB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 8 - Forks: 3

dayyass/pydfs
Distributed File System written in Python
Language: Python - Size: 61.5 KB - Last synced at: 10 days ago - Pushed at: almost 3 years ago - Stars: 14 - Forks: 0

am-kantox/elixir-iteraptor
Handy enumerable operations implementation.
Language: Elixir - Size: 206 KB - Last synced at: 21 days ago - Pushed at: 5 months ago - Stars: 72 - Forks: 9

niqdev/devops
DevOps
Language: Shell - Size: 9.18 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 19

lokk798/BigData-Quiz-Bank
A comprehensive collection of multiple-choice questions (MCQs) and assessments covering Hadoop, MapReduce, and the broader Big Data ecosystem.
Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

feng-li/Distributed-Statistical-Computing
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
Language: HTML - Size: 49.1 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 106 - Forks: 66

mahmoudparsian/pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Language: Python - Size: 40.5 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 84 - Forks: 44

vitalibo/grapes
Six degrees of separation theory research
Language: Java - Size: 262 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

KhalilKrugerOS/PaymentMethodCounter
INSAT exercise solution that counts how many transactions use Mastercard, using the MapReduce framework on Hadoop
Language: Java - Size: 0 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

InnoFang/subgraph-isomorphism
❄ Implements the common subgraph isomorphism algorithms (i.e. Ullmann, VF2) based on MapReduce on Hadoop
Language: Java - Size: 19.6 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 19 - Forks: 0

ruitianzhong/xdu-distributed-system
Assignments for Distributed Computing and Network Application Design at Xidian University (Spring 2024)
Language: Java - Size: 2.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

5ss90/Market_Basket_Analysis
A data mining project analyzing Instacart's 3 million grocery orders to uncover customer shopping patterns and product associations. Using market basket analysis and the Apriori algorithm, the project reveals key insights about shopping behavior, product combinations, and temporal patterns, providing valuable recommendations for retail strategy
Language: Jupyter Notebook - Size: 203 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

chaokunyang/athena
A task scheduler for spark, flink, mapreduce, java, python, bash
Language: Java - Size: 176 KB - Last synced at: 2 months ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 3

touero/ctenopharyngodon-idella
Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and to learn the basics of HDFS.
Language: Java - Size: 3.75 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 140 - Forks: 0

taovietducofficial/BDA-PROJECT
This project analyzes U.S. traffic accidents using Jupyter Lab and Power BI to identify trends, causes, and risks through data preprocessing, analysis, and visualization.
Language: Jupyter Notebook - Size: 19.5 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

gowri-malla216/Predicting-UEFA-Champions-league-match-outcome-in-Fifa
The project aims to predict the outcome of a UEFA Champions League football match between two teams in a particular year using machine learning, focusing on a player ratings dataset obtained from the FIFA application and each team's past performance against the opponent in that year [2016-2013].
Language: Python - Size: 1.17 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Sabaudian/AMD_Market_Basket_Analysis
Algorithms for Massive Datasets (AMD) -- Market-baskets analysis project
Language: Jupyter Notebook - Size: 2.16 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

edydfang/UW-Madison-CS537
Operating System Projects
Language: C - Size: 1.19 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 6

Young-ook/terraform-aws-emr
Terraform Module: Amazon EMR
Language: HCL - Size: 7.33 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

AashikSharif/Topic-Sensitive-Page-Ranking-algorithm-using-MapReduce
Topic Sensitive Page Ranking algorithm using MapReduce
Language: Python - Size: 10.8 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

MarioInf-Master-CompuerScience-UCM/Gestion_datosInformacion
Working repository for the course "Sistemas de gestión de datos y de la información" (Data and Information Management Systems, academic year 22-23), part of the Master's in Computer Engineering at the Universidad Complutense de Madrid (UCM)
Language: Jupyter Notebook - Size: 313 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

mohammad-malik/wikipedia-naive-search
This repository houses a naïve search engine built with MapReduce that uses a 5 GB CSV file as its dataset. It makes use of the Vector Space Model for Information Retrieval. This was developed as part of an assignment for the course Fundamentals of Big Data Analytics (DS2004).
Language: Python - Size: 992 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

dhchenx/Catla-HS
Catla for Hadoop and Spark (Catla-HS): An open-source system to support tuning MapReduce performance on Hadoop and Spark clusters.
Language: Java - Size: 105 MB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

berksudan/Analysis-on-Big-Data-with-Hadoop
Implementation of Statistical Methods via Hadoop Map-Reduce Library.
Language: Java - Size: 75.2 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

flipkart-incubator/hbase-orm
A production-grade HBase ORM library that makes accessing HBase clean, fast and fun (Can also be used as Bigtable ORM)
Language: Java - Size: 363 KB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 81 - Forks: 41

samuele-lolli/Steam-Recommendation-System
A basic recommendation system built with Scala and Spark.
Language: Scala - Size: 368 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 1 - Forks: 2

nzrsky/FunkObjC 📦
Functional and typed extensions for ObjC 🚀
Language: Objective-C - Size: 169 KB - Last synced at: 26 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

pngo1997/K-Means-K-Median-Clustering-with-Hadoop-MapReduce
Implements K-Means and K-Median Clustering using Hadoop MapReduce on a three-node cluster.
Language: Jupyter Notebook - Size: 3.4 MB - Last synced at: 5 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

EthanWng97/ray-mapreduce-kmeans
📚 Builds a complete MapReduce on top of Ray and implements a clustering algorithm on it.
Language: Python - Size: 16 MB - Last synced at: about 2 months ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 2
