An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: hadoop-mapreduce

jathavaan/bds-seoul-hadoop

language: Python - size: 81.1 KB - last synced: 3 days ago - pushed: 3 days ago - stars: 0 - forks: 0

NitchayaninT/EGCI466_BigData

For a big data processing course. Lectures cover the use of Hadoop, MongoDB, etc.

language: Jupyter Notebook - size: 2.61 MB - last synced: 7 days ago - pushed: 7 days ago - stars: 0 - forks: 0

benedekh/bigdata-projects

Student projects in the Big Data field.

language: Java - size: 198 KB - last synced: 10 days ago - pushed: 10 days ago - stars: 19 - forks: 12

janheinrichmerker/hadoop-ktx

💾 Kotlin Extensions for Apache Hadoop (MapReduce).

language: Kotlin - size: 178 KB - last synced: 7 days ago - pushed: 13 days ago - stars: 1 - forks: 0

SaltFishGC/SteamGameDataAnalysis

Big data course project: Steam game data analysis, presenting results with Hadoop + Hive + Sqoop + MySQL + Spring Boot + ECharts.

language: JavaScript - size: 9.3 MB - last synced: 14 days ago - pushed: 15 days ago - stars: 0 - forks: 0

Lokeshkanna7/An-End-to-End-Big-Data-Pipeline-for-Amazon-Book-Reviews-using-Hadoop-and-Spark

A scalable big data pipeline built with Hadoop and Spark to analyze Amazon book reviews. This project performs sentiment analysis, rating prediction, and fake review detection using PySpark, demonstrating real-world applications of distributed systems and machine learning.

language: Jupyter Notebook - size: 459 KB - last synced: 18 days ago - pushed: 18 days ago - stars: 0 - forks: 0

mahmoudparsian/data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

language: Java - size: 397 MB - last synced: 18 days ago - pushed: 8 months ago - stars: 1,075 - forks: 661

JKA098/Pokemon-Feistiness-Apache-Spark-Job

This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working in **Ubuntu**.

language: Python - size: 184 KB - last synced: about 1 month ago - pushed: about 1 month ago - stars: 0 - forks: 0

JKA098/Pokemon-Feistiness-MapReduce-Job

This project implements a **Hadoop MapReduce job in Pseudo-Distributed Mode** to determine the **feistiest Pokémon** of each **type**. The job processes the Pokémon dataset (`pokemon.csv`) and outputs a CSV file containing each Pokémon's **type1, type2, name, and feistiness score**.

language: Python - size: 220 KB - last synced: about 1 month ago - pushed: about 1 month ago - stars: 0 - forks: 0

taabishhh/LLM_Preprocessing

This project implements a Byte Pair Encoding (BPE) tokenization approach along with a Word2Vec model to generate word embeddings from a text corpus. The implementation leverages Apache Hadoop for distributed processing and includes evaluation metrics for optimal dimensionality of embeddings.

language: Scala - size: 7.37 MB - last synced: 6 days ago - pushed: 7 months ago - stars: 1 - forks: 0

Yousuf1733/Titanic-Dataset-Analysis

Exploratory data analysis of the Titanic dataset, uncovering insights on passenger survival rates based on gender, age, and class. Includes data cleaning, visualization, and findings.

language: Jupyter Notebook - size: 71.3 KB - last synced: about 2 months ago - pushed: about 2 months ago - stars: 0 - forks: 0

groda/big_data

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are self-contained and live—ready to run with a click.

language: Jupyter Notebook - size: 51.9 MB - last synced: 13 days ago - pushed: about 2 months ago - stars: 75 - forks: 26

ArianaPerez-24/Hadoop-MapReduce-de-WordCount

Exercises for counting words, sorting numbers (in ascending order), and solving sudoku.

language: Shell - size: 837 KB - last synced: about 2 months ago - pushed: about 2 months ago - stars: 0 - forks: 0

KeerthanaJ-rec/210701118-CS19P16-DA-Lab

Data Analytics Laboratory

language: R - size: 23.1 MB - last synced: 2 months ago - pushed: 2 months ago - stars: 1 - forks: 0

senthuran16/word-count-streaming-python-hadoop-mapreduce

A word count streaming MapReduce implementation with Python

language: Python - size: 586 KB - last synced: 2 months ago - pushed: over 3 years ago - stars: 0 - forks: 0

sueszli/sparkly-svm

Distributed training of an SVM with sparkML

language: Jupyter Notebook - size: 21.6 MB - last synced: 23 days ago - pushed: 3 months ago - stars: 0 - forks: 0

lokk798/BigData-Quiz-Bank

A comprehensive collection of multiple-choice questions (MCQs) and assessments covering Hadoop, MapReduce, and the broader Big Data ecosystem.

size: 5.86 KB - last synced: 3 months ago - pushed: 3 months ago - stars: 0 - forks: 0

touero/ctenopharyngodon-idella

Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion, and covers the basics of HDFS.

language: Java - size: 3.75 MB - last synced: 21 days ago - pushed: 8 months ago - stars: 140 - forks: 0

imsanjoykb/PySpark-Bootcamp

My practice and project on PySpark

language: Jupyter Notebook - size: 4.52 MB - last synced: 2 months ago - pushed: over 3 years ago - stars: 8 - forks: 3

josericodata/josericodata

Adding a cool README file

size: 87.9 KB - last synced: 3 months ago - pushed: 3 months ago - stars: 0 - forks: 0

elaaatif/JPEG-and-JPEG2000-compression-on-Multi-node-cluster-using-hadoop-and-spark

Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).

size: 14.3 MB - last synced: 2 months ago - pushed: 4 months ago - stars: 2 - forks: 0

groda/hats

Hadoop Ansible Test Suite

language: Shell - size: 33.2 KB - last synced: 8 days ago - pushed: 4 months ago - stars: 0 - forks: 0

nikisetti01/Hadoop-MapReduce-LetterFrequency-Analysis

Simple example of a Hadoop application that counts letters, with an interesting analysis of Romance languages

language: Jupyter Notebook - size: 2.71 MB - last synced: 3 months ago - pushed: 10 months ago - stars: 2 - forks: 2

berksudan/Analysis-on-Big-Data-with-Hadoop

Implementation of Statistical Methods via Hadoop Map-Reduce Library.

language: Java - size: 75.2 MB - last synced: 4 months ago - pushed: 4 months ago - stars: 0 - forks: 0

pngo1997/Big-Data-Mining-Project-PageRank-Hadoop-Streaming

Explores Big Data Processing using Hadoop & MapReduce.

language: Jupyter Notebook - size: 2.54 MB - last synced: 3 months ago - pushed: 4 months ago - stars: 0 - forks: 0

madhurimarawat/Big-Data-Analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

language: Jupyter Notebook - size: 10.7 MB - last synced: 3 months ago - pushed: 4 months ago - stars: 0 - forks: 1

chouaib-629/CustomerSegmentation

Hadoop-based Customer Segmentation project using the Online Retail Dataset. Implements MapReduce for processing and Python for preprocessing to uncover customer purchasing patterns for targeted marketing.

language: Jupyter Notebook - size: 260 KB - last synced: 4 months ago - pushed: 4 months ago - stars: 1 - forks: 0

bytedance/CloudShuffleService

Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark, Flink, and MapReduce.

language: Java - size: 1.23 MB - last synced: 20 days ago - pushed: about 1 year ago - stars: 255 - forks: 58

arkady-emelyanov/hadoop-playground 📦

🐘Yet another Hadoop playground

language: Shell - size: 49.8 KB - last synced: 3 days ago - pushed: about 7 years ago - stars: 2 - forks: 1

developer-sdk/beginner-bigdata-example

Examples of Hadoop, Hive, and Spark jobs

language: Java - size: 3.34 MB - last synced: 2 days ago - pushed: 5 months ago - stars: 1 - forks: 1

Hadeel-Abdeljalil/-Advanced-Topics-in-Database-DBMS--University-Assignment

language: Java - size: 411 KB - last synced: 3 months ago - pushed: 9 months ago - stars: 1 - forks: 0

chouaib-629/MovieRecommendation

A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.

language: Java - size: 320 KB - last synced: 4 months ago - pushed: 5 months ago - stars: 1 - forks: 0

QiushiSun/Distributed-Computing-Systems

2021 Spring (Distributed Computing Systems): Distributed Systems and Programming

language: Java - size: 101 MB - last synced: 2 months ago - pushed: almost 4 years ago - stars: 15 - forks: 1

krishnadey30/Intro-to-Hadoop-and-MapReduce

language: Python - size: 6.54 MB - last synced: 3 months ago - pushed: almost 7 years ago - stars: 2 - forks: 0

krishnadey30/NewsHeadlines

This repository contains code that extracts meaningful information from a news headline dataset.

language: Python - size: 85.9 KB - last synced: 2 months ago - pushed: about 6 years ago - stars: 3 - forks: 2

benjdiasaad/MapReduce_WordCount

Creating a Hadoop Java program: a word occurrence counter. If you want to compile the code manually on the Hadoop virtual machine, you will need to copy this code into the VM.

language: Java - size: 11.7 KB - last synced: 3 months ago - pushed: over 4 years ago - stars: 2 - forks: 0

singhdivyank/MongoHadoop

A MongoDB and Hadoop cheat sheet with some commands and a few questions

language: Python - size: 499 KB - last synced: 3 months ago - pushed: 5 months ago - stars: 0 - forks: 0

Mariam-iftikhar/BigDataProjects

The repository showcases a series of exercises and projects focused on big data processing using Hadoop, HBase, Hive, and Spark with Python. Hosted on AWS EMR, these projects demonstrate efficient data handling and processing techniques, leveraging the power of cloud computing to tackle complex data challenges.

size: 10.4 MB - last synced: 30 days ago - pushed: about 1 year ago - stars: 0 - forks: 0

sephiroth7712/K-Nearest-Neigbours

Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.

language: C++ - size: 19.5 KB - last synced: 2 months ago - pushed: 6 months ago - stars: 0 - forks: 0

mehwishferoz/BDA-project

A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.

language: Java - size: 7.42 MB - last synced: 3 months ago - pushed: 6 months ago - stars: 0 - forks: 0

HabibAroua/Newspaper-analysis

language: Java - size: 12.5 MB - last synced: 7 months ago - pushed: 7 months ago - stars: 1 - forks: 1

SAKET-SK/Semester6-SPPU-Data-Analysis-Lab

I installed Hadoop on a virtual machine, and all assignments are performed on Ubuntu. Refer to this repo to complete the Hadoop assignments. A stable internet connection is recommended while working through them.

language: Rebol - size: 3.24 MB - last synced: 9 days ago - pushed: about 2 years ago - stars: 13 - forks: 6

m-anshu/big-data-coursework

Big Data coursework material

language: Shell - size: 3.27 MB - last synced: 1 day ago - pushed: 8 months ago - stars: 0 - forks: 0

RiccardoRevalor/MapReduce

Collection of exercises on Hadoop and the MapReduce approach

language: Java - size: 71.3 KB - last synced: 8 months ago - pushed: 8 months ago - stars: 0 - forks: 0

chriniko13/apache-hadoop-word-count-example

language: Java - size: 5.86 KB - last synced: 3 months ago - pushed: about 7 years ago - stars: 1 - forks: 0

amaankhan02/maplejuice

A parallel distributed batch processing framework similar to Hadoop MapReduce with a SQL Engine and a distributed file system

language: Go - size: 864 KB - last synced: 9 months ago - pushed: 9 months ago - stars: 0 - forks: 0

MariaDukmak/Hadopy

Easy parallel map-reduce command line tool

language: Python - size: 28.3 KB - last synced: 3 days ago - pushed: about 4 years ago - stars: 7 - forks: 0

Rifat392000/BigDataAnalytics

language: Jupyter Notebook - size: 18.4 MB - last synced: 30 days ago - pushed: 9 months ago - stars: 0 - forks: 0

burhanahmed1/Big-Data-Analytics

Practice tasks in Python programming language using Hadoop, MRJob, PySpark for Big Data Analytics.

language: Jupyter Notebook - size: 40 KB - last synced: 4 months ago - pushed: 12 months ago - stars: 2 - forks: 0

drexly/movie140reviewcorpus

Raw per-movie rating data for Spark, covering the movies that have 140-character reviews out of 164,397 Naver Movie entries

size: 336 MB - last synced: 9 months ago - pushed: over 7 years ago - stars: 7 - forks: 5

prateekkr1/Project-Work

This repository contains some of my personal projects.

language: Jupyter Notebook - size: 3.79 MB - last synced: 10 months ago - pushed: about 5 years ago - stars: 0 - forks: 0

WilliamCallao/HadoopNewsTrends

News trend analysis using Hadoop in a virtualized CentOS environment

language: Python - size: 17.4 MB - last synced: 29 days ago - pushed: 12 months ago - stars: 1 - forks: 0

ZiadSalah2003/BigData-Project

"BigData-Project" is a comprehensive Big Data solution that involves operations such as web crawling, the PageRank algorithm, TF-IDF calculations, and inverted index creation using Hadoop.

size: 83 KB - last synced: 25 days ago - pushed: about 1 year ago - stars: 0 - forks: 0

viseshrp/PageRank-MapReduce-Implementation

The MapReduce-Hadoop implementation of Google's PageRank algorithm

language: Java - size: 206 KB - last synced: 6 days ago - pushed: about 8 years ago - stars: 2 - forks: 0

highoncarbs/hadoopwithpy

:elephant: :heavy_plus_sign: :snake: Learning Hadoop with Python

language: Python - size: 86.6 MB - last synced: 3 months ago - pushed: over 7 years ago - stars: 3 - forks: 0

sharma-n/global_event_analytics

Big data analytics using Hadoop on GDELT global news dataset.

language: Java - size: 2.66 MB - last synced: 11 months ago - pushed: over 5 years ago - stars: 4 - forks: 1

KingJin-web/Hadoop

Learning Hadoop HDFS and MapReduce

language: Java - size: 7.56 MB - last synced: 12 months ago - pushed: almost 4 years ago - stars: 1 - forks: 1

29DCH/Hadoop-HDFS-MapReduce-Examples

Working with HDFS files through the Java API; a MapReduce-based word frequency program and its refactoring; and using the Combiner and Partitioner components in MapReduce programming

language: Java - size: 35.2 KB - last synced: 3 months ago - pushed: almost 3 years ago - stars: 2 - forks: 1

DecioXXIV/BD-StockAnalysis

Repository for the second project of the "Big Data" course (2023/24)

language: Python - size: 36.1 KB - last synced: 12 months ago - pushed: about 1 year ago - stars: 0 - forks: 0

jbw/hadoop-docker-cluster

Hadoop cluster on Docker (single host)

language: Shell - size: 159 KB - last synced: about 1 year ago - pushed: almost 7 years ago - stars: 0 - forks: 0

Coursal/Text-Sentiment-Analysis-In-Hadoop-And-Spark

The source code developed and used for my thesis of the same title, under the guidance of my supervisor, Professor Vasilis Mamalis, for the Department of Informatics and Computer Engineering of the University of West Attica.

language: Java - size: 66.5 MB - last synced: about 1 year ago - pushed: about 4 years ago - stars: 1 - forks: 1

Coursal/Hadoop-Letter-File-Index-Counter

A Hadoop-based Java project that counts the maximum number of word occurrences for each letter in a text file of a folder.

language: Java - size: 213 KB - last synced: about 1 year ago - pushed: over 4 years ago - stars: 0 - forks: 0

Coursal/Hadoop-Examples

Some simple, kinda introductory projects based on Apache Hadoop to be used as guides in order to make the MapReduce model look less weird or boring.

language: Java - size: 340 KB - last synced: about 1 year ago - pushed: about 1 year ago - stars: 5 - forks: 2

StevenMonty/MapReduceSearchEngine

A containerized search engine GUI that communicates with a Hadoop cluster running MapReduce on GCP to create Inverted Indices for search engine queries.

language: Java - size: 15.8 MB - last synced: about 1 year ago - pushed: over 4 years ago - stars: 0 - forks: 0

chicuongdev2002/BigData_Hadoop_MapReduce

Uses Scrapy, Hadoop, and Pig Latin

language: Python - size: 6.99 MB - last synced: about 1 year ago - pushed: about 1 year ago - stars: 0 - forks: 0

subhash26jan96/cluster

This repository contains Hadoop cluster setup code (automated, on-demand, and manual) using Python, Linux, HTML, etc.

language: Python - size: 16.6 KB - last synced: about 1 year ago - pushed: almost 7 years ago - stars: 0 - forks: 1

Jacob12138xieyuan/hadoop-mapreduce-with-python

Hadoop MapReduce algorithm with Hadoop Streaming (Python)

language: Jupyter Notebook - size: 16.6 KB - last synced: about 1 year ago - pushed: about 1 year ago - stars: 1 - forks: 0

VaishnavJois/CLOUDERA

Cloudera commands used for Big Data Analytics

size: 13.7 KB - last synced: about 1 year ago - pushed: over 5 years ago - stars: 0 - forks: 0

raineydavid/big-data-processing

Big Data Processing Notes from Masters in Big Data Science

size: 13.7 MB - last synced: about 1 year ago - pushed: about 6 years ago - stars: 0 - forks: 0

gmarciani/mapreduce-app

Scaffolding for Map/Reduce applications, leveraging Apache Hadoop.

language: Shell - size: 1000 bytes - last synced: about 1 year ago - pushed: almost 8 years ago - stars: 1 - forks: 0

Walrussin/MapReduce-Examples

Analyzing air quality index of eight states

language: Java - size: 35.6 MB - last synced: about 1 year ago - pushed: about 3 years ago - stars: 0 - forks: 0

manoharpalanisamy/Advanced-Map-Reduce

Running MapReduce jobs on a Hadoop cluster with customized parameters

language: Java - size: 21.5 KB - last synced: about 1 year ago - pushed: almost 7 years ago - stars: 0 - forks: 0

andrejanesic/Hadoop-Beginner-Exercise-Football-Data

Hadoop beginner exercise in analyzing European football teams' statistics over the last 20 years. The goal is to determine which team had the highest win percentage.

language: Makefile - size: 453 KB - last synced: about 1 year ago - pushed: over 2 years ago - stars: 0 - forks: 0

zhermin/topkcommonwords

Extracts the Top K Common Words between 2 Text Files using Hadoop's MapReduce

language: Java - size: 84 KB - last synced: about 1 year ago - pushed: over 2 years ago - stars: 0 - forks: 0

MoustafaAMahmoud/BigDataInDepth

Data Engineering Course

language: TeX - size: 78.9 MB - last synced: about 1 year ago - pushed: about 1 year ago - stars: 15 - forks: 9

YMaher99/Parallelizing-the-Feedforward-Operation-of-Neural-Networks-in-Hadoop-MapReduce

Leveraging the MapReduce paradigm, we propose a solution to parallelize the feedforward operation of neural networks in order to speed it up for sufficiently large NN architectures and sufficiently large datasets. Tested using the MNIST dataset; results can be found in the results.html and results.ipynb files.

language: HTML - size: 2.1 MB - last synced: about 1 year ago - pushed: over 2 years ago - stars: 0 - forks: 0

rodrigoorf/HadoopStudies

Repo with a few Hadoop exercises

language: Java - size: 72.3 KB - last synced: about 1 year ago - pushed: over 5 years ago - stars: 0 - forks: 0

emrectn/HadoopTutorial

hadoop

language: Java - size: 15.6 KB - last synced: about 1 year ago - pushed: about 6 years ago - stars: 0 - forks: 0

prabhuvashwin/PageRank-Algorithm-Implementation 📦

Implementation of Google's PageRank algorithm using Java, Hadoop, and MapReduce

language: Java - size: 10.7 KB - last synced: about 1 year ago - pushed: over 7 years ago - stars: 3 - forks: 1

prabhuvashwin/TFIDF-SearchQuery 📦

Implementation of TF-IDF and keyword-based query search, using Java and Apache Hadoop

language: HTML - size: 887 KB - last synced: about 1 year ago - pushed: over 7 years ago - stars: 1 - forks: 1

prabhuvashwin/Credit-Card-Fraud-Detection 📦

Naive Bayes classifier and Logistic Regression classifier to predict whether a transaction is fraudulent or not

language: Java - size: 42.3 MB - last synced: about 1 year ago - pushed: over 7 years ago - stars: 1 - forks: 2

tableMinPark/trendflow

❗ Trend analysis platform - SSAFY 8th cohort specialization project

language: Java - size: 55.6 MB - last synced: about 1 year ago - pushed: about 2 years ago - stars: 0 - forks: 0

Dave-Vedant/BigDataTech

This repository contains small projects related to Hive, Hadoop, and Spark. It is my contribution to learning new technology and sharing my concise knowledge of different big data infrastructures.

language: Scala - size: 533 KB - last synced: about 1 year ago - pushed: almost 5 years ago - stars: 1 - forks: 1

yangboz/mipr Fork of sozykin/mipr

MapReduce Image Processing framework for Hadoop

language: Java - size: 734 KB - last synced: about 1 year ago - pushed: over 8 years ago - stars: 0 - forks: 0

yennanliu/spark_emr_dev

Collection of code for submitting Spark/Hadoop/Hive/Pig tasks to EMR (AWS Elastic MapReduce) | #DE

language: Scala - size: 3.72 MB - last synced: about 1 year ago - pushed: over 5 years ago - stars: 3 - forks: 1

koitoer/hadoop

Hadoop and Big Data Training Resources

language: Java - size: 12.2 MB - last synced: about 1 year ago - pushed: about 8 years ago - stars: 1 - forks: 1

juliengan/TreesAnalysis

Trees Analysis

language: Java - size: 52.3 MB - last synced: about 1 year ago - pushed: about 1 year ago - stars: 1 - forks: 0

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

language: Python - size: 1.76 MB - last synced: about 1 year ago - pushed: about 2 years ago - stars: 53 - forks: 34

aadishgoel/Hadoop-Codes

Neat and Handy Place for all Hadoop codes

language: Java - size: 25.4 KB - last synced: about 1 year ago - pushed: over 7 years ago - stars: 6 - forks: 3

simondelarue/Hadoop_Map_Reduce_from_scratch

Hadoop MapReduce implementation from scratch using Python | Distributed computing | Multi-processing

language: Python - size: 5.39 MB - last synced: 11 months ago - pushed: almost 4 years ago - stars: 2 - forks: 0

ahtezaz123/hadoop-mapreduce-on-wikipedia-articles-

Big Data Analytics Assignment on Hadoop MapReduce

language: Jupyter Notebook - size: 5.54 MB - last synced: about 1 year ago - pushed: about 1 year ago - stars: 0 - forks: 0

juagarmar/Cov-Cor-matrix-via-Rhadoop

Covariance and correlation matrix via Rhadoop (rmr2 and HDFS)

language: R - size: 2.93 KB - last synced: about 1 year ago - pushed: about 8 years ago - stars: 4 - forks: 0

juagarmar/linear-regression-via-Rhadoop

Linear regression via Rhadoop (rmr2 and RHDFS)

language: R - size: 5.86 KB - last synced: about 1 year ago - pushed: about 8 years ago - stars: 3 - forks: 0

nzsaurabh/hadoop_training

Exercises on MapReduce, Pig, Spark, Relational and Non Relational data stores in Hadoop

size: 939 KB - last synced: about 1 year ago - pushed: over 6 years ago - stars: 0 - forks: 0

syscrest/oozie-graphite

Monitor your Oozie server and your Oozie bundles with Graphite

language: Java - size: 621 KB - last synced: about 1 year ago - pushed: about 10 years ago - stars: 7 - forks: 3

Driramohamedfarouk/bigdata-stock-market-pipeline

language: Scala - size: 310 KB - last synced: about 1 year ago - pushed: about 2 years ago - stars: 0 - forks: 1

Rohit9314/my-hadoop

Set up a Hadoop cluster manually and automatically

language: Python - size: 23.4 KB - last synced: about 1 year ago - pushed: almost 8 years ago - stars: 2 - forks: 0

snehpahilwani/WordCount-hadoop

Word Count code written for Hadoop platform (Java Implementation)

language: Java - size: 1.74 MB - last synced: about 1 year ago - pushed: over 7 years ago - stars: 0 - forks: 0

fzehracetin/big-data-project

Big Data Processing and Analytics course term project.

language: JavaScript - size: 8.77 MB - last synced: about 1 year ago - pushed: almost 4 years ago - stars: 0 - forks: 0

jayanthanantharapu/WordCounter

Hadoop job to count words

language: Java - size: 3.91 KB - last synced: about 1 year ago - pushed: almost 8 years ago - stars: 0 - forks: 0