Topic: "hadoop-filesystem"
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
Language: Go - Size: 149 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 4,633 - Forks: 371

GoogleCloudDataproc/hadoop-connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Language: Java - Size: 11.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 285 - Forks: 249

linkedin/dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

mmolimar/kafka-connect-fs
Kafka Connect FileSystem Connector
Language: Java - Size: 524 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 110 - Forks: 77

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka
Data Engineering Project with Hadoop HDFS and Kafka
Language: Python - Size: 3.46 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

jingw/pyhdfs
Python HDFS client
Language: Python - Size: 118 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 93 - Forks: 22

longshilin/HDFS-Netdisc
Hadoop-based distributed cloud storage system :palm_tree:
Language: Java - Size: 3.93 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 71 - Forks: 20

palantir/hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Language: Java - Size: 1.48 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 46 - Forks: 36

vivek2319/Learn-Hadoop-and-Spark
This repository focuses on gathering and curating a list of resources for learning Hadoop for FREE.
Language: Python - Size: 211 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 46 - Forks: 39

ExpediaGroup/datasqueeze
Hadoop utility to compact small files
Language: Java - Size: 1.19 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 7

procter-gamble-oss/octopufs
The OctopuFS library helps manage cloud storage, specifically ADLS Gen2. It lets you operate on files (moving, copying, setting ACLs) very efficiently. It is designed to work on Databricks, but should work on other platforms as well.
Language: Scala - Size: 1.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 8

averyzhong/hdfs-over-sftp
SFTP server that works on top of HDFS. It is based on Apache sshd and lets you access and operate on HDFS through the SFTP protocol.
Language: Java - Size: 33.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 11 - Forks: 5

waltherg/distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
Language: Shell - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 4

Tapad/sbt-hadoop-oss 📦
An sbt plugin for publishing artifacts to HDFS.
Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 1

fasouto/webhdfspy
Python wrapper to access Hadoop HDFS REST API
Language: Python - Size: 38.1 KB - Last synced at: 4 days ago - Pushed at: over 8 years ago - Stars: 8 - Forks: 5

christopherkindl/twitter-data-pipeline-using-airflow-and-apache-spark
Data pipeline to process and analyse Twitter data in a distributed fashion using Apache Spark and Airflow in AWS environment
Language: Python - Size: 5.16 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 1

pfisterer/apache-hadoop-helm Fork of mgit-at/helm-hadoop-3
Helm chart for Apache Hadoop using multi-arch docker images
Language: Dockerfile - Size: 104 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

jazzwang/hadoop_labs
MapReduce Java Code Examples to learn Hadoop
Language: Java - Size: 79.1 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 6 - Forks: 1

aadishgoel/Hadoop-Codes
A neat and handy place for all Hadoop code
Language: Java - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 3

TritonDataCenter/hadoop-manta
Hadoop Filesystem Driver for Manta
Language: Java - Size: 172 KB - Last synced at: 18 days ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 6

HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS
In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.
Language: Java - Size: 451 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

HxnDev/Hadoop-MapReduce-to-Find-Average-Length-of-Comments
In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.
Language: Java - Size: 675 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 1

Mohammed-siddiq/hadoop-XMLInputFormatWithMultipleTags
Mahout's XMLInputFormat with support for multiple input and output tags.
Language: Java - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

SarahAyaz/YouTube_Data_Analysis
Analysis of YouTube Data using Hadoop Mapreduce framework in Java.
Language: Java - Size: 24.5 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 2

rshad/OpenCCML
Category: Cloud Computing and Machine Learning Application - Subject: A cloud platform for data processing with machine learning algorithms, built on OpenStack, using Spark for data distribution and the Hadoop filesystem for data storage
Language: Python - Size: 10.2 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

CUBigDataClass/soccer-tweet-analysis
Ingestion pipeline to analyze soccer tweets
Language: Python - Size: 4.2 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 3 - Forks: 1

jaimess/quickorc
An easy way to write Java objects to Apache ORC files.
Language: Java - Size: 30.3 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

mikeroyal/Apache-Hadoop-Guide
Apache Hadoop Guide
Size: 141 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

samarthtambad/big-data-pl
Analysing programming languages by community characteristics on Github and StackOverflow
Language: Scala - Size: 30.3 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

alex-ber/docker-hive Fork of ops-guru/docker-hive
EMR 5.25.0 cluster single node Hadoop docker image. With Amazon Linux, Hadoop 2.8.5 and Hive 2.3.5
Language: Shell - Size: 45.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

NikhilURao/H1B_VisaProject
This repository contains the H1B_Visa Applicants Data Analysis project/case study using Hadoop, undertaken during training at NIIT. The technologies used are MapReduce, Hive, Pig, Sqoop and shell scripting.
Language: Shell - Size: 729 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 5

Niranjankumar-c/DataAnalytics_using_ClickstreamData
Case study completed as part of Big Data training from Analytix Labs
Size: 12.6 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 2

huangyueranbbc/hadoop05_pagerank
PageRank on Hadoop
Language: Java - Size: 39.5 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

Rohit9314/my-hadoop
Set up a Hadoop cluster manually and automatically
Language: Python - Size: 23.4 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 0

vishal2232/Project_1-Spark-using-Scala-API-
Problem statement: get the revenue and number of orders from order_items on a daily basis.
Size: 1.67 MB - Last synced at: almost 2 years ago - Pushed at: over 8 years ago - Stars: 2 - Forks: 0

fbraza/scala-dfs-lib
DFS-Lib is a Scala-flavoured API for the Hadoop Java filesystem API
Language: Scala - Size: 75.2 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

humanbeeng/hadoop-auto-install
A small helper script that can save your valuable time during installation of Apache Hadoop.
Language: Shell - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Evegen55/mastering-spark
mastering spark
Language: Java - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

oykuyildirim/Flume-Service
Getting tweets using Flume service and analyzing tweets
Size: 288 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

AlmazSamatov/SearchEngine
Search Engine implemented with Hadoop Map Reduce using TF/IDF
Language: Java - Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

f2e-awesome/HadoopEcosystem
Hadoop ecosystem
Language: JavaScript - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

dorianbg/EEG_ClientGUI
A Java Swing GUI for building EEG data analysis workflows
Language: Java - Size: 203 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

tertiarycourses/ApacheHadoop
Exercise files for Apache Hadoop Big Data Training
Size: 63.5 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

NthPortal/hdfs-secure-erase 📦
Secure Erase utility for HDFS
Language: Java - Size: 83 KB - Last synced at: 5 months ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

kriss024/Hadoop
Hadoop and Hive fundamental commands
Language: Shell - Size: 451 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

VladRodionov/sidecar
Sidecar is a Hadoop-compatible caching (both reads and writes) file system. It was specifically designed to support faster read/write access to remote cloud storage systems: S3, Google Cloud Storage, Azure Blob Storage, etc.
Language: Java - Size: 504 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Rifat392000/BigDataAnalytics
Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

ondergormez/BLM5127_Big_Data_Analytics
Average Temperature - Hadoop - Mapper - Reducer
Language: Scala - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

codyle50/Airbnb-Big-Data-Management
To develop an Airbnb database and create a pipeline using MongoDB and Hadoop architecture to ease the process of managing, loading, processing, querying, and analyzing Airbnb data based on location
Language: Jupyter Notebook - Size: 377 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Slimani-CE/hadoop-crud-api
A Java API for interacting with the Hadoop Distributed File System (HDFS). This API provides functionality for reading and writing data in HDFS.
Language: Java - Size: 28.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dongma/apache-hbase
apache-hbase imports data from CSV files, including creating a table and fetching the relevant data.
Language: Java - Size: 121 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kumarvna/terraform-azurerm-hdinsight
Terraform module to create managed, full-spectrum, open-source analytics service Azure HDInsight. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.
Language: HCL - Size: 365 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 5

aogunwoolu/Ethereum-analysis
ETH analysis using big data for the QMUL Big Data Processing module. Intended to promote analysis of data retrieved via big data processing
Language: Jupyter Notebook - Size: 960 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sinevmaxim/WebHDFSClient
Big Data project. Web client for HDFS that works in the terminal. It can manipulate both local and Hadoop storage.
Language: Python - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ReeceASharp/Shridoop
A simulated Distributed File-System
Language: Java - Size: 694 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

subhasisgorai/HQuery
HQuery codebase. HQuery provides an easy and effective interface through which business users can interact with Hadoop: submit jobs, check their status, and export the results in the format they prefer.
Language: Java - Size: 3.53 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

cevheri/hadoop-mr-example-currency
Hadoop MapReduce example that reads currency.txt, with a driver, mapper, and reducer
Language: Java - Size: 313 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

cevheri/hadoop.3-config
My Apache Hadoop 3 config files.
Language: Shell - Size: 65.4 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

nuhyurdev/popular-baby-names
Language: PigLatin - Size: 39.1 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

ppramudita/Hadoop-project-Map-Reduce-project-NCDC-data-set
Implement & Evaluate performance of MySQL, Hadoop MapReduce and Sqoop with HDFS for functions like max temperature on NCDC dataset for large data (20GB).
Language: Java - Size: 2.25 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

NilufaYeasmin/MapReduce
This repo contains implementations of Mapreduce program in a large text corpus with Apache Hadoop Environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/
Language: CSS - Size: 3.53 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

pradyumnameena/COL733-Cloud-Computing
Collection of assignments offered under COL733 - Cloud Computing by Prof. Suresh Chand Gupta
Language: Python - Size: 53.4 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

sai-sreenath/Hadoop_Mapreduce_BerkleyGraphDataset
Language: Java - Size: 279 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

rfhussain/Running-a-Spark-Job-on-AWS-Cluster
When dealing with huge datasets, the code is unlikely to run successfully on your personal desktop. You need either a locally installed clustered environment, i.e., Hadoop MapReduce, or a cloud such as AWS. Here is an example of running such a job on the AWS cloud.
Language: Python - Size: 804 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

ManasaPola/Distributed-Parallel_DB
Distributed and Parallel Database Tasks
Language: Python - Size: 1.46 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

amittian/DATA_ANALYSIS-VISUALIZATION-using-Hive-and-TABLEAU
Data analysis & visualization using Hadoop, Hive and Tableau
Size: 2.13 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Prakhar-FF13/Hadoop
This repository contains Hadoop Ecosystem Files (Code, data, readme etc...)
Language: Java - Size: 36.1 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

hordiales/hadoop-cluster-docker Fork of kiwenlau/hadoop-cluster-docker
Run a Hadoop cluster within Docker containers (sequenceiq/hadoop-docker image)
Language: Shell - Size: 1.82 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

yelshater/hadoop-2.3.0
Language: Java - Size: 15.8 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

dhitaj/bdc-sapienza
Assignments of Big Data course during the Spring 2017 semester at Sapienza
Language: Java - Size: 337 KB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

rahulsurti97/distributed_file_system
Hadoop-style file system
Language: Java - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

imdeepanshugpt/Hadoop
Hadoop-Cluster
Language: Python - Size: 887 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

snehpahilwani/WordCount-hadoop
Word Count code written for Hadoop platform (Java Implementation)
Language: Java - Size: 1.74 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

jerinisready/MapReduce-Electricty-Problem-Example
MapReduce Electricity Problem Example
Language: Java - Size: 3.66 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

nitin2407/HadoopMapRExamples
Examples of hadoop implementations with different datasets.
Language: Java - Size: 35.9 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

dennisbachmann/docker-spark-hdfs
A debian:jessie-based Spark + Hadoop DFS Docker container.
Language: Shell - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

daniarherikurniawan/hadoop-0.20
Reproducing a bug about decommission monitor thread spending too much cpu time
Language: Java - Size: 70 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

huangyueranbbc/Hadoop_MapReduce
Language: Java - Size: 32.8 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

huangyueranbbc/Hadoop_HDFS
Language: Java - Size: 32.8 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

DataEngg/Kafka-Spark-Streaming
Spark Streaming via Kafka
Size: 26.3 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 3

harsh306/split-files-
Helps to read from file splits.
Language: Java - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

yangboz/mipr Fork of sozykin/mipr
MapReduce Image Processing framework for Hadoop
Language: Java - Size: 734 KB - Last synced at: about 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0
