An open API service providing repository metadata for many open source software ecosystems.

Topic: "hadoop-filesystem"

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

Language: Go - Size: 149 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 4,633 - Forks: 371

GoogleCloudDataproc/hadoop-connectors

Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.

Language: Java - Size: 11.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 285 - Forks: 249

linkedin/dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Language: Java - Size: 297 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 129 - Forks: 36

mmolimar/kafka-connect-fs

Kafka Connect FileSystem Connector

Language: Java - Size: 524 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 110 - Forks: 77

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

Language: Python - Size: 3.46 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 102 - Forks: 25

jingw/pyhdfs

Python HDFS client

Language: Python - Size: 118 KB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 93 - Forks: 22

longshilin/HDFS-Netdisc

A distributed cloud storage system based on Hadoop :palm_tree:

Language: Java - Size: 3.93 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 71 - Forks: 20

palantir/hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Language: Java - Size: 1.48 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 46 - Forks: 36

vivek2319/Learn-Hadoop-and-Spark

This repository focuses on gathering a curated list of resources to learn Hadoop for FREE.

Language: Python - Size: 211 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 46 - Forks: 39

ExpediaGroup/datasqueeze

Hadoop utility to compact small files

Language: Java - Size: 1.19 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 18 - Forks: 7

procter-gamble-oss/octopufs

The OctopuFS library helps manage cloud storage, ADLSgen2 specifically. It lets you operate on files (moving, copying, setting ACLs) in a very efficient manner. Designed to work on Databricks, but should work on any other platform as well.

Language: Scala - Size: 1.35 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 8

averyzhong/hdfs-over-sftp

SFTP server that works on top of HDFS. It is based on Apache sshd and lets you access and operate on HDFS through the SFTP protocol.

Language: Java - Size: 33.2 KB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 11 - Forks: 5

waltherg/distributable_docker_sql_on_hadoop

Toy Hadoop cluster combining various SQL-on-Hadoop variants

Language: Shell - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 11 - Forks: 4

Tapad/sbt-hadoop-oss 📦

An sbt plugin for publishing artifacts to HDFS.

Language: Scala - Size: 22.5 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 1

fasouto/webhdfspy

Python wrapper to access Hadoop HDFS REST API

Language: Python - Size: 38.1 KB - Last synced at: 4 days ago - Pushed at: over 8 years ago - Stars: 8 - Forks: 5

christopherkindl/twitter-data-pipeline-using-airflow-and-apache-spark

Data pipeline to process and analyse Twitter data in a distributed fashion using Apache Spark and Airflow in AWS environment

Language: Python - Size: 5.16 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 7 - Forks: 1

pfisterer/apache-hadoop-helm Fork of mgit-at/helm-hadoop-3

Helm chart for Apache Hadoop using multi-arch docker images

Language: Dockerfile - Size: 104 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

jazzwang/hadoop_labs

MapReduce Java Code Examples to learn Hadoop

Language: Java - Size: 79.1 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 6 - Forks: 1

aadishgoel/Hadoop-Codes

Neat and Handy Place for all Hadoop codes

Language: Java - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 3

TritonDataCenter/hadoop-manta

Hadoop Filesystem Driver for Manta

Language: Java - Size: 172 KB - Last synced at: 18 days ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 6

HxnDev/Finding-Average-Temperature-of-Each-Year-using-Hadoop-HDFS

In this task, we had to calculate the average temperature for each year from the given dataset using Hadoop HDFS. We had to create a MapReduce function to perform this task.

Language: Java - Size: 451 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

HxnDev/Hadoop-MapReduce-to-Find-Average-Length-of-Comments

In this task, we had to find the average length of comments given in the dataset. It was done using Hadoop MapReduce and Hadoop HDFS.

Language: Java - Size: 675 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 1

Mohammed-siddiq/hadoop-XMLInputFormatWithMultipleTags

Mahout's XMLInputFormat with support for multiple input and output tags.

Language: Java - Size: 8.79 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

SarahAyaz/YouTube_Data_Analysis

Analysis of YouTube data using the Hadoop MapReduce framework in Java.

Language: Java - Size: 24.5 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 3 - Forks: 2

rshad/OpenCCML

Category: Cloud Computing and Machine Learning Application - Subject: A cloud platform for data processing with machine learning algorithms, built on OpenStack, using Spark for data distribution and the Hadoop filesystem for data storage

Language: Python - Size: 10.2 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

CUBigDataClass/soccer-tweet-analysis

Ingestion pipeline to analyze soccer tweets

Language: Python - Size: 4.2 MB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 3 - Forks: 1

jaimess/quickorc

Easy way to write Java objects to Apache ORC files.

Language: Java - Size: 30.3 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 1

mikeroyal/Apache-Hadoop-Guide

Apache Hadoop Guide

Size: 141 KB - Last synced at: 24 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

samarthtambad/big-data-pl

Analysing programming languages by community characteristics on Github and StackOverflow

Language: Scala - Size: 30.3 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

alex-ber/docker-hive Fork of ops-guru/docker-hive

EMR 5.25.0 cluster single node Hadoop docker image. With Amazon Linux, Hadoop 2.8.5 and Hive 2.3.5

Language: Shell - Size: 45.9 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 1

NikhilURao/H1B_VisaProject

This repository contains the H1B_Visa Applicants Data Analysis project/case study using Hadoop, undertaken during training at NIIT. The technologies used are MapReduce, Hive, Pig, Sqoop, and shell scripting.

Language: Shell - Size: 729 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 5

Niranjankumar-c/DataAnalytics_using_ClickstreamData

Case study completed as part of Big Data training from analytix labs

Size: 12.6 MB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 2 - Forks: 2

huangyueranbbc/hadoop05_pagerank

pagerank hadoop

Language: Java - Size: 39.5 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

Rohit9314/my-hadoop

Setup hadoop cluster manually and automatically

Language: Python - Size: 23.4 KB - Last synced at: about 1 year ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 0

vishal2232/Project_1-Spark-using-Scala-API-

Problem statement: get the revenue and number of orders from order_items on a daily basis.

Size: 1.67 MB - Last synced at: almost 2 years ago - Pushed at: over 8 years ago - Stars: 2 - Forks: 0

fbraza/scala-dfs-lib

DFS-Lib is a Scala-flavoured API to the Hadoop Java filesystem API

Language: Scala - Size: 75.2 KB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

humanbeeng/hadoop-auto-install

A small helper script that can save your valuable time during installation of Apache Hadoop.

Language: Shell - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Evegen55/mastering-spark

mastering spark

Language: Java - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

oykuyildirim/Flume-Service

Getting tweets using Flume service and analyzing tweets

Size: 288 KB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

AlmazSamatov/SearchEngine

Search Engine implemented with Hadoop Map Reduce using TF/IDF

Language: Java - Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 1

f2e-awesome/HadoopEcosystem

Hadoop ecosystem

Language: JavaScript - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

dorianbg/EEG_ClientGUI

A Java Swing GUI for building EEG data analysis workflows

Language: Java - Size: 203 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

tertiarycourses/ApacheHadoop

Exercise files for Apache Hadoop Big Data Training

Size: 63.5 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

NthPortal/hdfs-secure-erase 📦

Secure Erase utility for HDFS

Language: Java - Size: 83 KB - Last synced at: 5 months ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

kriss024/Hadoop

Hadoop and Hive fundamental commands

Language: Shell - Size: 451 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

VladRodionov/sidecar

Sidecar is a Hadoop-compatible caching file system (for both reads and writes). It was specifically designed to support faster read/write access to remote cloud storage systems: S3, Google Cloud Storage, Azure Blob Storage, etc.

Language: Java - Size: 504 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Rifat392000/BigDataAnalytics

Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

ondergormez/BLM5127_Big_Data_Analytics

Average Temperature - Hadoop - Mapper - Reducer

Language: Scala - Size: 73.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

codyle50/Airbnb-Big-Data-Management

To develop an Airbnb database and create a pipeline using MongoDB and Hadoop architecture to ease the process of managing, loading, processing, querying, and analyzing Airbnb data based on location

Language: Jupyter Notebook - Size: 377 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Slimani-CE/hadoop-crud-api

A Java API for interacting with the Hadoop Distributed File System (HDFS). This API provides functionality for reading and writing data in HDFS

Language: Java - Size: 28.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dongma/apache-hbase

apache-hbase imports data from CSV files, including creating tables and fetching relevant data.

Language: Java - Size: 121 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kumarvna/terraform-azurerm-hdinsight

Terraform module to create managed, full-spectrum, open-source analytics service Azure HDInsight. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.

Language: HCL - Size: 365 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 5

aogunwoolu/Ethereum-analysis

ETH analysis using big data for the QMUL Big Data Processing module. Intended to promote analysis of data retrieved via big data processing

Language: Jupyter Notebook - Size: 960 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

sinevmaxim/WebHDFSClient

Big Data project. Web client for HDFS that works in the terminal. Can manipulate both local and Hadoop storage

Language: Python - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ReeceASharp/Shridoop

A simulated Distributed File-System

Language: Java - Size: 694 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

subhasisgorai/HQuery

HQuery codebase. HQuery provides an easy and effective interface through which business users can interact with Hadoop, submit jobs, check their status, and eventually export the results in the format they prefer.

Language: Java - Size: 3.53 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

cevheri/hadoop-mr-example-currency

Hadoop MapReduce, Read currency.txt and driver, mapper, and reducer

Language: Java - Size: 313 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

cevheri/hadoop.3-config

My Apache Hadoop 3 config files.

Language: Shell - Size: 65.4 KB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

nuhyurdev/popular-baby-names

Language: PigLatin - Size: 39.1 KB - Last synced at: 5 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

ppramudita/Hadoop-project-Map-Reduce-project-NCDC-data-set

Implement & Evaluate performance of MySQL, Hadoop MapReduce and Sqoop with HDFS for functions like max temperature on NCDC dataset for large data (20GB).

Language: Java - Size: 2.25 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

NilufaYeasmin/MapReduce

This repo contains implementations of a MapReduce program on a large text corpus in an Apache Hadoop environment | Nilufa Yeasmin | https://www.linkedin.com/in/nilufayeasmin/

Language: CSS - Size: 3.53 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

pradyumnameena/COL733-Cloud-Computing

Collection of assignments offered under COL733 - Cloud Computing by Prof. Suresh Chand Gupta

Language: Python - Size: 53.4 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

sai-sreenath/Hadoop_Mapreduce_BerkleyGraphDataset

Language: Java - Size: 279 KB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

rfhussain/Running-a-Spark-Job-on-AWS-Cluster

When dealing with huge datasets, the code is unlikely to execute successfully on your personal desktop. You need either a locally installed clustered environment, e.g. Hadoop MapReduce, or a cloud such as AWS. Here's an example of running such a job on the AWS cloud.

Language: Python - Size: 804 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

ManasaPola/Distributed-Parallel_DB

Distributed and Parallel Database Tasks

Language: Python - Size: 1.46 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

amittian/DATA_ANALYSIS-VISUALIZATION-using-Hive-and-TABLEAU

DATA_ANALYSIS & VISUALIZATION using Hadoop , Hive and TABLEAU

Size: 2.13 MB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Prakhar-FF13/Hadoop

This repository contains Hadoop Ecosystem Files (Code, data, readme etc...)

Language: Java - Size: 36.1 KB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

hordiales/hadoop-cluster-docker Fork of kiwenlau/hadoop-cluster-docker

Run Hadoop Cluster within Docker Containers (sequenceiq/hadoop-docker image)

Language: Shell - Size: 1.82 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

yelshater/hadoop-2.3.0

Language: Java - Size: 15.8 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 1

dhitaj/bdc-sapienza

Assignments of Big Data course during the Spring 2017 semester at Sapienza

Language: Java - Size: 337 KB - Last synced at: 5 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

rahulsurti97/distributed_file_system

Hadoop style file system

Language: Java - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

imdeepanshugpt/Hadoop

Hadoop-Cluster

Language: Python - Size: 887 KB - Last synced at: about 1 month ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

snehpahilwani/WordCount-hadoop

Word Count code written for Hadoop platform (Java Implementation)

Language: Java - Size: 1.74 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

jerinisready/MapReduce-Electricty-Problem-Example

MapReduce Electricity Problem Example

Language: Java - Size: 3.66 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

nitin2407/HadoopMapRExamples

Examples of hadoop implementations with different datasets.

Language: Java - Size: 35.9 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

dennisbachmann/docker-spark-hdfs

A debian:jessie based Spark + HadoopDFS docker container.

Language: Shell - Size: 3.91 KB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

daniarherikurniawan/hadoop-0.20

Reproducing a bug about decommission monitor thread spending too much cpu time

Language: Java - Size: 70 MB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

huangyueranbbc/Hadoop_MapReduce

Language: Java - Size: 32.8 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

huangyueranbbc/Hadoop_HDFS

Language: Java - Size: 32.8 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

DataEngg/Kafka-Spark-Streaming

Spark Streaming via Kafka

Size: 26.3 MB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 3

harsh306/split-files-

Helps to read from file splits.

Language: Java - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0

yangboz/mipr Fork of sozykin/mipr

MapReduce Image Processing framework for Hadoop

Language: Java - Size: 734 KB - Last synced at: about 1 year ago - Pushed at: over 8 years ago - Stars: 0 - Forks: 0