GitHub topics: hadoop-hdfs | Ecosyste.ms: Repos

yousef22609/context-hive

🚀 Drive AI collaboration from Day 0 with Context Hive, a methodology that integrates AI as a key team member from the project's inception.

Language: TypeScript - Size: 778 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

seaweedfs/seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, for billions of files! The blob store has O(1) disk seek and cloud tiering. The Filer supports Cloud Drive, xDC replication, Kubernetes, POSIX FUSE mount, the S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding. The Enterprise version is at seaweedfs.com.

Language: Go - Size: 146 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 26,632 - Forks: 2,506
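The "O(1) disk seek" claim comes from a Haystack-style design. Small files are appended to large volume files, and an in-memory index maps each file key to its offset and size, so reading a file costs one dictionary lookup plus a single seek. The toy below illustrates that idea under this assumption only. It is not SeaweedFS's code or on-disk format, and `Volume`, `put`, and `get` are hypothetical names.

```python
import io


class Volume:
    """Toy append-only volume: all blobs live in one file-like object."""

    def __init__(self):
        self.data = io.BytesIO()  # stands in for one large volume file on disk
        self.index = {}           # in-memory map: file key -> (offset, size)

    def put(self, key: int, blob: bytes) -> None:
        self.data.seek(0, io.SEEK_END)  # always append at the end
        offset = self.data.tell()
        self.data.write(blob)
        self.index[key] = (offset, len(blob))

    def get(self, key: int) -> bytes:
        offset, size = self.index[key]  # O(1) lookup, no disk access
        self.data.seek(offset)          # exactly one seek...
        return self.data.read(size)     # ...then one sequential read


vol = Volume()
vol.put(1, b"hello")
vol.put(2, b"world!")
print(vol.get(2))  # b'world!'
```

Keeping the index in memory is what removes the extra metadata seeks a conventional file system would need for each small file.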

Smart-Shaped/chaM3Leon

By Smart Shaped s.r.l. (https://www.smartshaped.com/)

Language: Java - Size: 1.45 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 29 - Forks: 2

Morphl-AI/MorphL-Community-Edition

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

Language: Python - Size: 143 KB - Last synced at: 11 days ago - Pushed at: about 6 years ago - Stars: 260 - Forks: 29

hadoop-sandbox/hadoop-sandbox-images

Docker image builds for Hadoop sandbox.

Language: Dockerfile - Size: 93.8 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 4

groda/big_data

Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.

Language: Jupyter Notebook - Size: 62.5 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 85 - Forks: 27

mgarralda/hadoop-spark-cluster

Repository containing Docker images for creating a Spark cluster on Hadoop YARN.

Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 3

ngodongnguyen/Hadoop-and-Flask-on-AWS

Language: HTML - Size: 3.16 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

harishpratap/Credit-Card-Fraud-Detection

The Capstone Project - This project revolves around the most widely used tools in the Big Data engineering world and focuses on detecting credit card fraud.

Language: Python - Size: 7.3 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

songwo-vx18484646674/Hadoop-based-popular-video-data-analysis-prediction-and-visualization-system-on-Bilibili

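This system uses Java with the Spring Boot framework and Python with the Django framework (versions built with both frameworks are included), together with Hadoop, HDFS, a Scrapy crawler, a MySQL database, a B/S (browser/server) architecture, and Vue.js. Algorithm highlights: a random forest regression algorithm for prediction and collaborative filtering (cosine similarity) for recommendation.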

Size: 14.6 KB - Last synced at: 13 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
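The entry mentions collaborative filtering with cosine similarity. Below is a minimal item-based sketch that assumes ratings arrive as a `user -> {item: rating}` dict. It is not the repository's implementation. The function names and the sample data are hypothetical.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: dict, user: str, k: int = 2) -> list:
    # Invert user -> item ratings into item -> user rating vectors.
    items = {}
    for u, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[u] = r
    seen = ratings[user]
    scores = {}
    for cand in items:
        if cand in seen:
            continue
        # Score = similarity-weighted sum of the user's own ratings.
        scores[cand] = sum(cosine(items[cand], items[i]) * r for i, r in seen.items())
    return sorted(scores, key=scores.get, reverse=True)[:k]


ratings = {
    "alice": {"v1": 5, "v2": 3},
    "bob": {"v1": 4, "v2": 3, "v3": 5},
    "carol": {"v2": 4, "v4": 2},
}
print(recommend(ratings, "alice"))  # ['v3', 'v4']
```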

letsiki/airflow-orchestrated-cleaning-and-hdfs-loading

[Data engineer assessment] Airflow-orchestrated cleaning and loading of transactional data into hadoop-hdfs

Language: Python - Size: 13.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

hadoop-sandbox/hadoop-sandbox

A fully functional Hadoop YARN cluster as a docker-compose deployment.

Language: Shell - Size: 118 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 23 - Forks: 5

fbraza/scala-dfs-lib

DFS-Lib is a Scala-flavoured API for the Hadoop Java FileSystem API.

Language: Scala - Size: 114 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

HabibAroua/Newspaper-analysis

Language: Java - Size: 12.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 1

varghese25/SQLPractical_UseCases

100 SQL Practical Use Cases: Data Engineering

Language: Python - Size: 723 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

linkedin/dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Language: Java - Size: 297 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 132 - Forks: 34

DuaA-A/Big-Data

A hands-on journey through the Big Data training by NTI. Includes labs, notebooks, and notes on tools like HDFS, Spark, Kafka, Flink, Hive, HBase, and more.

Size: 32.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Amir2244/movies-rating

"movies-rating" is a recommendation system project that leverages distributed frameworks, including services such as Hadoop NameNode, Hadoop DataNode, Spark Master, Spark Worker, and Redis.

Language: Java - Size: 10.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

kshitij-ka/Skycrate

Skycrate is a web-based file management system that uses Hadoop as its file system.

Language: JavaScript - Size: 12 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

vineet-k09/E-Book-Recommendation

BiblioVerse - E-book recommendation project based on Hadoop and React, with Spark

Language: JavaScript - Size: 711 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

AyanantaPramanik/hadoop-banking-transaction-analysis

💳 Scalable banking transaction analysis using Python, HDFS, PySpark & Power BI — from synthetic data generation to real-time insights.

Language: Python - Size: 783 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Asterinos1/Movie-Preference-Analyzer

INF424 Project 2025: Movie Preference Analyzer. A Big Data analytics tool using Apache Spark, developed in Scala and adhering to functional programming principles.

Language: Scala - Size: 4.05 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Diakkoo/docker-hadoop-container

Builds Flask and HDFS containers with Dockerfiles and deploys HDFS clusters with Docker Compose.

Language: Java - Size: 124 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

MasterPandaa/AirBNB_Cloudera_Hadoop

Dataset processing and trend analysis of Airbnb property rental prices using Cloudera Hadoop

Size: 30 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

DJ623/Twitter-Sentiment-Analysis

A mini project for analyzing, classifying, and visualizing the sentiments of tweets

Language: Python - Size: 1.05 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

Ren294/Log-Analysis-Project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

Language: Python - Size: 2.88 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 1
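In the Lambda architecture, a batch layer periodically recomputes views over the full, immutable dataset. A speed layer incrementally maintains views over data that arrived after the last batch run, and a serving layer merges the two at query time. The sketch below illustrates that split by counting HTTP status codes in Common Log Format lines. The function names and sample lines are hypothetical, and this is not the project's pipeline.

```python
from collections import Counter


def status_of(line: str) -> str:
    # Common Log Format ends with: ... "REQUEST" STATUS BYTES
    return line.split()[-2]


def batch_view(all_logs) -> Counter:
    """Batch layer: recompute counts from the complete historical dataset."""
    return Counter(status_of(line) for line in all_logs)


def speed_update(view: Counter, new_line: str) -> None:
    """Speed layer: incrementally count logs not yet covered by a batch run."""
    view[status_of(new_line)] += 1


def query(batch: Counter, realtime: Counter, status: str) -> int:
    """Serving layer: merge both views when answering a query."""
    return batch[status] + realtime[status]


historical = [
    'a - - [01/Jul/1995:00:00:01 -0400] "GET / HTTP/1.0" 200 6245',
    'b - - [01/Jul/1995:00:00:06 -0400] "GET /x HTTP/1.0" 404 0',
]
batch = batch_view(historical)
realtime = Counter()
speed_update(realtime, 'c - - [01/Jul/1995:00:00:09 -0400] "GET / HTTP/1.0" 200 1024')
print(query(batch, realtime, "200"))  # 2
```

When the next batch run absorbs the new lines, the speed-layer view for that period can be discarded.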

OBenner/data-engineering-interview-questions

More than 2,000 data engineer interview questions.

Size: 938 KB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 1,341 - Forks: 477

MarouaHattab/Big-Data-Price-Prediction

A Big Data project that implements a distributed machine learning pipeline for predicting property prices in Tunisia. The solution adapts a high-performing XGBoost model to a Spark environment using Gradient Boosted Trees, achieving 91.5% of the original model's accuracy (R² of 0.6918) while enabling large-scale processing.

Language: Python - Size: 1.3 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

ds2-lab/LambdaFS

λFS: an elastic, high-performance, serverless-function-based metadata service for large-scale distributed file systems (ACM ASPLOS'23)

Language: Java - Size: 173 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 11 - Forks: 2

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: 7 months ago - Pushed at: almost 2 years ago - Stars: 102 - Forks: 25

Houssam-11/BigData-Architecture

A Big Data system that predicts pandemic (COVID-19) risk via data analysis, ML modeling, and a real-time dashboard.

Language: Python - Size: 29 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

TechAlhan826/Hadoop-Tasks

Hadoop MapReduce Tasks Java - Big Data Project 🚀

Language: Java - Size: 275 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

KeerthanaJ-rec/210701118-CS19P16-DA-Lab

Data Analytics Laboratory

Language: R - Size: 23.1 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

Mahmoud-nfz/football-big-data

This is a comprehensive solution for real-time football analytics. It uses Apache Spark running on YARN for both streaming and batch processing, Hadoop HDFS for distributed storage, Kafka for real-time data ingestion, RethinkDB for live data updates, a custom-built search engine, and Next.js for data visualization.

Language: TypeScript - Size: 5.92 MB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 2

vaxdata22/NoSQL-and-Big-Data-demonstration

This is a fun assignment I undertook to explore the world of NoSQL and Big Data technologies.

Size: 8.6 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

kriss024/Hadoop

Hadoop and Hive fundamental commands

Language: Shell - Size: 451 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

IliesChibane/Projet-IoT-Cloud-BigData

Implementation of a pipeline for predicting Parkinson's disease using IoT, cloud, and Big Data tools

Language: Python - Size: 891 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

elaaatif/JPEG-and-JPEG2000-compression-on-Multi-node-cluster-using-hadoop-and-spark

Big Data technologies can be leveraged for efficient, distributed image compression using JPEG2000 (Spark) and JPEG (MapReduce).

Size: 14.3 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

groda/hats

Hadoop Ansible Test Suite

Language: Shell - Size: 33.2 KB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

berksudan/Distributed-Environment-Installation-Guide

Install Hadoop, HDFS, Yarn and Spark on 3 Ubuntu 18.04 Machines

Size: 3.66 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

madhurimarawat/Big-Data-Analytics

This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 8 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 1

arkady-emelyanov/hadoop-playground 📦

🐘Yet another Hadoop playground

Language: Shell - Size: 49.8 KB - Last synced at: 5 months ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 1

chouaib-629/MovieRecommendation

A Hadoop-based Movie Recommendation System using the MovieLens dataset, demonstrating MapReduce for sorting and processing movie ratings.

Language: Java - Size: 320 KB - Last synced at: 8 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

benjdiasaad/MapReduce_WordCount

Creating a Hadoop Java program: a word occurrence counter. If you want to compile the code manually on the Hadoop virtual machine, you will need to copy this code into the VM.

Language: Java - Size: 11.7 KB - Last synced at: 4 months ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0
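The word counter described above follows the classic map, shuffle/sort, reduce pattern. The single-process Python simulation below shows the data flow only. The repository itself is a Hadoop Java program, and these function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def mapper(line: str):
    # Map: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: list):
    # Reduce: sum the partial counts for one word.
    return word, sum(counts)


def word_count(lines):
    pairs = chain.from_iterable(mapper(line) for line in lines)
    # Hadoop sorts keys before the reduce phase; sorted() mimics that here.
    return dict(reducer(w, c) for w, c in sorted(shuffle(pairs).items()))


print(word_count(["the cat", "the dog"]))  # {'cat': 1, 'dog': 1, 'the': 2}
```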

mehwishferoz/BDA-project

A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.

Language: Java - Size: 7.42 MB - Last synced at: 8 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Ren294/Covid-Data-Process

This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.

Language: Shell - Size: 6.22 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

Ibrahim-Maiga/Big-Data-Analysis-with-PySpark

A comprehensive big data analysis examining correlations between temperature changes and societal metrics (crime rates, birth rates, and energy consumption) across the US and Canada. The project leverages multiple database systems and cloud computing to process and analyze large-scale climate and social data.

Language: Jupyter Notebook - Size: 454 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

DevLucho/Spark-Procesamiento-en-batch

This project uses PySpark to analyze student data from a CSV file stored in HDFS.

Language: Python - Size: 93.8 KB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

ANKIT21111/Patient-Alert-ETL

The Patient Alert ETL 🚑 project creates a real-time data pipeline to monitor vital health parameters from IoT devices in hospitals. Using Apache Kafka, Spark, and HBase, it processes streaming data and sends immediate alerts via Amazon SNS when vitals exceed normal thresholds, enhancing patient care through timely interventions.

Language: Python - Size: 5.47 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

IBM/sparksql-for-hbase

Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.

Size: 614 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 69 - Forks: 22

divithraju/divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

aymane-maghouti/Mobile-Data-Hive-Insights

This project demonstrates extracting data from a MySQL database, transferring it with Apache Sqoop, storing it in a Hive data warehouse (the data is actually stored in the Hadoop Distributed File System (HDFS)), analyzing it with Hive Query Language (HiveQL, a SQL-like language), and then visualizing the data in Power BI.

Language: HiveQL - Size: 691 KB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

prateekkr1/Project-Work

This repository contains some of my personal projects.

Language: Jupyter Notebook - Size: 3.79 MB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

jodth07/hadoop-installation

Instructions for setting up Hadoop, HDFS, Java, sbt, Kafka, Scala, Spark, and Flume on Ubuntu 18.04

Language: Shell - Size: 61.5 KB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 8 - Forks: 15

INeerav/sparkini

Base Docker Compose setup for a local data engineering environment

Language: Jupyter Notebook - Size: 34.2 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

KingJin-web/Hadoop

Learning hadoop-hdfs and MapReduce

Language: Java - Size: 7.56 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

xpcosmos/data-lake-prime

This project aims to simulate and configure a Distributed File System using Hadoop HDFS. For this project, 3 machines were created: 1 Master Node and 2 Worker Nodes.

Language: Shell - Size: 815 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

29DCH/Hadoop-HDFS-MapReduce-Examples

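Operating on HDFS files with the Java API, a MapReduce-based word frequency program and its refactoring, and applying the Combiner and Partitioner components in MapReduce programming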

Language: Java - Size: 35.2 KB - Last synced at: 8 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 1
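The Combiner and Partitioner mentioned above sit between the map and reduce phases. A combiner pre-aggregates mapper output locally to cut shuffle traffic. A partitioner decides which reducer receives each key. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces that formula and uses Java's `String.hashCode` as a stand-in for the key's hash. This is an assumption: Hadoop `Text` keys hash their UTF-8 bytes slightly differently. All function names are hypothetical.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() for BMP text: h = 31*h + c, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key: str, num_reducers: int) -> int:
    # HashPartitioner-style: clear the sign bit, then take the modulus.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def combine(pairs):
    """Combiner: sum (word, count) pairs locally, before they cross the network."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return list(totals.items())


def route(pairs, num_reducers: int):
    """Shuffle: send every pair to the reducer chosen by the partitioner."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


mapper_output = [("hadoop", 1), ("hdfs", 1), ("hadoop", 1)]
combined = combine(mapper_output)  # [('hadoop', 2), ('hdfs', 1)]
buckets = route(combined, 3)
```

A combiner is only safe when the reduce operation is associative and commutative, as summation is here.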

viveknigam3003/hadoop-linux-setup

Python scripts to assist setting up Hadoop v1 in Linux and starting a NameNode, DataNodes and Client.

Language: Python - Size: 80.1 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 3 - Forks: 3

StefanoFioravanzo/evolving-wikipedia-graph

Distributed processing of Wikipedia history files using Hadoop and Spark

Language: Scala - Size: 3.57 MB - Last synced at: 8 months ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

dotM87/triaina

Big data project: information storage in HDFS

Size: 2.93 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ANKIT21111/SparNordETL

ETL pipeline for Spar Nord Bank to analyze the refilling frequency of ATMs across Europe

Language: Jupyter Notebook - Size: 4.59 MB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

subhash26jan96/cluster

This repository contains Hadoop cluster code (automated, on-demand, and manual setups) written with Python, Linux, HTML, etc.

Language: Python - Size: 16.6 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 1

varunajmera0/PyHDFS

PyHDFS: Scalable & resilient distributed file system. Components: Zookeeper, NameNode, DataNode, Metadata service, Client. Setup guide for AWS & local. Explore distributed storage!

Language: Python - Size: 630 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ra312/spark_with_scala

A collection of useful scala scripts to work with Hadoop

Language: Scala - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

ais04134/hyperv-hadoop-spark-cluster

Hadoop Ecosystem - Building a Hadoop cluster environment for large-scale frequent pattern mining

Language: Shell - Size: 2.35 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

emrectn/HadoopTutorial

hadoop

Language: Java - Size: 15.6 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

tableMinPark/trendflow

❗ Trend analysis platform - SSAFY 8th cohort specialized project

Language: Java - Size: 55.6 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

prabal03/python-automation-in-linux

Python automation in linux

Language: Python - Size: 16.1 MB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 2

federicopfund/data-engineer

ETL process

Language: Jupyter Notebook - Size: 84.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

vim89/datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark, with a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Language: Python - Size: 1.76 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 53 - Forks: 34

aadishgoel/Hadoop-Codes

Neat and Handy Place for all Hadoop codes

Language: Java - Size: 25.4 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 6 - Forks: 3

nzsaurabh/hadoop_training

Exercises on MapReduce, Pig, Spark, Relational and Non Relational data stores in Hadoop

Size: 939 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

Driramohamedfarouk/bigdata-stock-market-pipeline

Language: Scala - Size: 310 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

venkat-a/Exploratory-Data-Analysis-EDA-using-PySpark

Leverage the power of Apache Spark for large-scale data processing and analysis

Language: Jupyter Notebook - Size: 147 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

41xu/Hadoop-ClassNotes

Some code written while learning Hadoop.

Language: Java - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

oahjimenez/x3ia020_pagerank

PageRank - Pig vs PySpark comparison https://madoc.univ-nantes.fr/mod/assign/view.php?id=1511791

Language: Python - Size: 555 KB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0
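PageRank is the iterative algorithm being compared here. In a Pig or PySpark setting, each iteration is one pass in which every page "maps" its rank, split evenly across its out-links, and each target "reduces" the incoming shares into a new rank. The plain-Python sketch below assumes the common damping factor d = 0.85 and spreads the rank of dangling pages (no out-links) evenly over all pages. It is not the repository's code.

```python
def pagerank(links: dict, d: float = 0.85, iterations: int = 50) -> dict:
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleport term: every page gets (1 - d) / n.
        new = {v: (1 - d) / n for v in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                # "Map": split this page's rank across its out-links.
                share = rank[src] / len(targets)
                for t in targets:
                    new[t] += d * share  # "Reduce": sum incoming shares
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in nodes:
                    new[t] += d * rank[src] / n
        rank = new
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```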

jarlor/TravelWebsite_BigDataAnalysis

Big data analysis of a travel website (partial Ctrip data) - Hadoop course design project (undergraduate course project level)

Language: Java - Size: 639 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

Luissalazarsalinas/Avocado-Yield-Prediction

Freelancer Project - Batch processing data pipeline and machine learning application.

Language: Python - Size: 3.53 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

seyfal/MapReduceGraphComparison

Distributed computational problem-solving project, which aims to perform large-scale graph matching using cloud computing technologies. The project allows users to import two directed graphs and analyze the differences between them.

Language: Scala - Size: 1.76 MB - Last synced at: 23 days ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

JohnLyonX/hadoop

Hadoop Configuration

Language: Shell - Size: 38.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

JUL-13N/Hadoop-map-reduce

This is a word count application implemented with map and reduce functions. I tested it on Amazon Web Services (AWS) using a two-instance cluster consisting of a master (NameNode) node and a slave (DataNode) node.

Language: Java - Size: 11.7 KB - Last synced at: 4 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Lakshmiec/Big-Data-Sentiment-Analysis-of-Amazon-Reviews-for-Seller-and-Brand-Empowerment

Language: Jupyter Notebook - Size: 1.53 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

drewm8080/big_data_management

Contains all homework from the course Foundations of Database Management at USC

Language: Python - Size: 4.75 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

claireboyd/311requests_chicago

Created a simple web app which gives users a summary of the types of 311 requests in their Chicago neighborhood, built with Lambda Architecture principles using Apache's tech stack

Language: HiveQL - Size: 27.5 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword

In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.

Language: Java - Size: 1000 KB - Last synced at: 22 days ago - Pushed at: about 4 years ago - Stars: 6 - Forks: 0
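The entry does not say how sentiment is scored. The sketch below assumes a tiny hypothetical lexicon of positive and negative words. The mapper emits `(keyword, score)` for each comment that contains the keyword, and the reducer sums the scores into an overall label. It is only an illustration of the MapReduce shape, not the repository's Java implementation.

```python
POSITIVE = {"good", "great", "love"}  # hypothetical toy lexicon
NEGATIVE = {"bad", "awful", "hate"}


def mapper(comment: str, keyword: str):
    words = comment.lower().split()
    if keyword.lower() in words:
        # Score = positive word count minus negative word count.
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        yield keyword, score


def reducer(keyword: str, scores: list):
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return keyword, total, label


def keyword_sentiment(comments, keyword: str):
    scores = [s for c in comments for _, s in mapper(c, keyword)]
    return reducer(keyword, scores)


comments = [
    "I love this phone",
    "the phone is bad and awful",
    "great phone",
    "good phone good price",
    "nothing here",
]
print(keyword_sentiment(comments, "phone"))  # ('phone', 2, 'positive')
```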

mikeroyal/Apache-Hadoop-Guide

Apache Hadoop Guide

Size: 141 KB - Last synced at: 7 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 2

marcocolangelo/Big-Data-processing-and-Analytics

The current repository contains all the code developed during the Big Data processing and Analytics laboratories. Data are processed and analyzed using Hadoop and Spark

Language: Java - Size: 6.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

anshul1004/MutualFriends

Implementation of Hadoop and Spark

Language: Java - Size: 23 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0
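A common MapReduce formulation of the mutual friends problem works as follows. For every friend on a person's list, the mapper emits the whole friend list under the sorted pair `(person, friend)`, so both directions of a friendship meet at the same key. The reducer then intersects the two lists. The sketch below simulates this in Python. It is a common textbook approach, not necessarily the one this repository uses, and the sample graph is hypothetical.

```python
from collections import defaultdict


def mapper(person: str, friends: list):
    for friend in friends:
        # Sort the pair so (A, B) and (B, A) share one reducer key.
        yield tuple(sorted((person, friend))), set(friends)


def reducer(pair: tuple, friend_sets: list):
    # Mutual friends appear in both people's friend lists.
    return pair, set.intersection(*friend_sets)


def mutual_friends(graph: dict) -> dict:
    groups = defaultdict(list)
    for person, friends in graph.items():
        for key, value in mapper(person, friends):
            groups[key].append(value)
    # Only symmetric friendships produce two lists to intersect.
    return dict(reducer(k, v) for k, v in groups.items() if len(v) == 2)


graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "D", "E"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "C", "D"],
}
result = mutual_friends(graph)
print(sorted(result[("A", "B")]))  # ['C', 'D']
```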

MarceloJSSantos/acelereracao-global-dev-4-everis-dio

Repository created to store notes and activities completed during training on the Digital Innovation One (DIO) platform for the Data Engineer selection process at Everis.

Language: Jupyter Notebook - Size: 57.3 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 4 - Forks: 2

seyfal/SparkMitMAttackSim

Scalable simulation of MitM attacks using parallel random walks and graph analytics on Spark.

Language: Scala - Size: 76.2 KB - Last synced at: 23 days ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

abroniewski/IdleCompute-Data-Management-Architecture

Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.

Language: Jupyter Notebook - Size: 34.8 MB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 1

benjdiasaad/MapReduce_K-means

Implementation of the k-means clustering algorithm using the Hadoop framework, version 3.1.3 (MapReduce).

Language: Java - Size: 32.2 KB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 2
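In MapReduce implementations of k-means, each iteration is typically one job. The map phase assigns every point to its nearest centroid, and the reduce phase averages the points in each cluster to produce new centroids. A driver repeats the job until the centroids stop moving. The single-process Python sketch below shows that loop under the assumption of Euclidean distance and tuple-valued points. It is not the repository's Hadoop code.

```python
import math
from collections import defaultdict


def nearest(point, centroids) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_iteration(points, centroids):
    # Map: emit (index of nearest centroid, point).
    groups = defaultdict(list)
    for p in points:
        groups[nearest(p, centroids)].append(p)
    # Reduce: the new centroid is the coordinate-wise mean of its points.
    new = list(centroids)  # an empty cluster keeps its old centroid
    for i, members in groups.items():
        new[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return new


def kmeans(points, centroids, max_iter: int = 20):
    # Driver: rerun the "job" until the centroids no longer change.
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final = kmeans(points, [(0, 0), (10, 10)])
```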

lauravoicu/Coursera-Hadoop-Platform-Application

Language: Python - Size: 6.84 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

MatteoM95/Big-data-processing-and-analytics

Exercises on Spark and Hadoop - Done in Distributed architectures for big data processing and analytics course at Politecnico di Torino

Language: Java - Size: 4.94 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 2

HaneefAhamed/Hadoop_Map_Reduce

Hadoop setup and Getting Started with developing Hadoop programs

Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ayush-usf/stack-overflow-logs-hadoop-analysis

Ask Ubuntu log analysis with Hadoop MapReduce 2 (YARN)

Language: Java - Size: 108 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

SubalakshmiShanthosi/PCP1211DALab

Language: TeX - Size: 34.4 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Slimani-CE/big-data-tp1

A lab exercise (TP) that aims to familiarize learners with the Hadoop Distributed File System (HDFS). The specific objectives include starting the Hadoop processes, creating a directory tree in HDFS, and manipulating files with Hadoop commands.

Size: 495 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

asaldelkhosh-learning/hadoop

Learning Hadoop and Map-Reduce!

Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0