GitHub topics: cloudera-hadoop

Repositories

jigyasaG18/Airline-Performance-And-Passenger-Satisfaction-Project-Using-Big-Data-Analytics

This project analyzes 10 years of U.S. domestic airline data (~3GB) using Hadoop (Cloudera) and Hive for data processing. Power BI dashboards visualize key metrics like delays, on-time rates, air time, and diversions. The solution includes Hive queries, DAX measures, HDFS ingestion scripts, and year-wise insights with recommendations.

Language: HiveQL - Size: 21.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

MasterPandaa/AirBNB_Cloudera_Hadoop

Dataset Processing and Trend Analysis of AirBNB Property Rental Prices Using Cloudera Hadoop

Size: 30 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

AdrianYuu/qualification-big-data-processing

A qualification project for a teaching assistant position at SLC for the COMP6579001 Big Data Processing course.

Language: Jupyter Notebook - Size: 2.11 MB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Rifat392000/BigDataAnalytics

Language: Jupyter Notebook - Size: 18.4 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

akshaydake123/Sentiment-Analysis-on-Twitter-Data

This repository shows how to perform sentiment analysis on tweets using Hive. The tweets are collected from Twitter using Flume. Because the incoming tweets are in JSON format, they must be loaded into Hive using a JSON input format; the Cloudera Hive JSON SerDe is used for this purpose.

Size: 575 KB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0
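
The entry above describes loading JSON tweets and running sentiment analysis on them. As a rough illustration only (not the repository's Hive workflow), the sketch below scores one JSON-encoded tweet against a small word dictionary. The dictionary-based scoring, the `WORD_SCORES` table, and the `score_tweet` helper are all assumptions made for this example, as is reading the tweet body from a `text` field.

```python
import json

# Hypothetical word scores for illustration; a real project would load a
# full sentiment dictionary instead of this four-word table.
WORD_SCORES = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def score_tweet(line):
    """Parse one JSON-encoded tweet and sum the scores of its words.

    Assumes the tweet body is stored under a "text" key; tweets without
    that key score 0.
    """
    tweet = json.loads(line)
    words = tweet.get("text", "").lower().split()
    # Strip common trailing punctuation before the dictionary lookup.
    return sum(WORD_SCORES.get(word.strip(".,!?"), 0) for word in words)


if __name__ == "__main__":
    sample = '{"text": "Great flight, good crew!"}'
    print(score_tweet(sample))
```

In a Hive-based pipeline the same idea would typically be expressed as a join between exploded tweet words and a dictionary table; the Python version only shows the scoring logic in isolation.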

meetgajjarx07/Baseball-analysis-BigData

This project uses the Cloudera platform and Pig queries to analyze and retrieve information on specific baseball performance and statistics problems. By employing big data methods, the analysis offers insights into player performance, game trends, and strategic patterns.

Size: 3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

VaishnavJois/CLOUDERA

Cloudera commands used for Big Data Analytics

Size: 13.7 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

achintya-kumar/BD2017

Otto-von-Guericke-Universität Magdeburg - Big Data, Summer Semester 2017

Language: Java - Size: 28.1 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

Ranjandas/Dirty-CDH-Docker

A quick and dirty CDH cluster skeleton using Docker for testing

Language: Shell - Size: 7.81 KB - Last synced at: over 1 year ago - Pushed at: over 9 years ago - Stars: 6 - Forks: 2

sergevs/ansible-cloudera-hadoop

Ansible playbook to deploy Cloudera Hadoop components to a cluster

Language: Shell - Size: 6.3 MB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 52 - Forks: 41

smartlin5228/CCA175

Language: Java - Size: 107 KB - Last synced at: over 2 years ago - Pushed at: about 8 years ago - Stars: 7 - Forks: 10

Rishi500067313/Twitter-data-stream-into-MySQL-table-using-NiFI

Size: 1.51 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

shubnimkar/Hadoop

This repository includes two versions of Hadoop management tools

Size: 320 MB - Last synced at: 23 days ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

vodkolav/DataEngineerProject

This is my final project for Data Engineer Expert course at Naya College.

Language: Jupyter Notebook - Size: 930 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

JohnnyFoulds/local-hadoop

This project creates a small local Hadoop cluster using Cloudera CDH and CentOS.

Language: Python - Size: 216 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

haspdecrypted/OS-for-Big-Data-and-Hadoop

Getting Started with Hadoop and Big Data

Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

tilakpatidar/cdh5

Docker image for Cloudera Hadoop components (CDH5)

Language: Shell - Size: 51.8 KB - Last synced at: over 2 years ago - Pushed at: almost 8 years ago - Stars: 9 - Forks: 5

Johnny1110/Hadoop_Note

Hadoop study notes

Language: Shell - Size: 8.41 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

guptasaumya/navigator-data-service

Navigator is a data service that prepares the content for travel agencies, ready for exploration in EWNS (East-West-North-South) direction and hence allows them to render content to the end-user based on their desire to travel.

Language: Java - Size: 30.3 MB - Last synced at: 4 months ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

dengshaochun/cdh-tools

Cloudera Hadoop automatic installation

Language: Shell - Size: 923 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 1

kwartile/spark-benchmark

Spark Benchmark suite to evaluate cluster configuration and compare the performance with other big data frameworks.

Language: Scala - Size: 28.3 KB - Last synced at: 8 months ago - Pushed at: over 8 years ago - Stars: 2 - Forks: 0

dorianbg/cloudera-quickstart-installation-guide

How to install Cloudera quickstart

Size: 909 KB - Last synced at: over 2 years ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 3

akshay-madar/MovieTycoon-gcp-based-BI-tool

GCP hosted product for over 1 million movie investors on HSX.com, aiding online movie trading and box-office investments by leveraging Big Data technologies like Hive and Hadoop, and Tableau dashboards

Language: Jupyter Notebook - Size: 1.71 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 1

Ishuan/Page-Rank-Implementation

The goal of this programming assignment is to compute the PageRanks of an input set of hyperlinked Wikipedia documents using Hadoop MapReduce. The PageRank score of a web page serves as an indicator of the importance of the page. Many web search engines (e.g., Google) use PageRank scores in some form to rank the results of user-submitted queries. The goals of this assignment are to: (1) understand the PageRank algorithm and how it works in MapReduce; (2) implement PageRank and execute it on a large corpus of data; (3) examine the output from running PageRank on Simple English Wikipedia to measure the relative importance of pages in the corpus. To run your program on the full Simple English Wikipedia archive, you will need to run it on the dsba-hadoop cluster to which you have access.

Language: Java - Size: 36.1 KB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0
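
The PageRank assignment described in the entry above can be illustrated with a small in-memory sketch. This is not the repository's Java MapReduce code; it only mimics the map and reduce steps in plain Python. The function names, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example, and rank held by pages without out-links is simply dropped rather than redistributed.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each page splits its current rank evenly among its out-links."""
    for page, links in graph.items():
        for target in links:
            yield target, ranks[page] / len(links)


def reduce_phase(pairs, pages, damping=0.85):
    """Reduce step: sum contributions per page, then apply the damping factor."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {page: (1 - damping) / n + damping * sums[page] for page in pages}


def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce rounds over an adjacency-list graph.

    Simplification: pages with no out-links emit nothing, so their rank is
    not redistributed in this sketch.
    """
    pages = set(graph) | {t for links in graph.values() for t in links}
    ranks = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), pages, damping)
    return ranks


if __name__ == "__main__":
    # Tiny hypothetical link graph: A and B both link to C, C links back to A.
    toy_graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(toy_graph).items()):
        print(page, round(rank, 4))
```

On a real cluster each round would be a separate MapReduce job reading the previous round's output, which is why the sketch keeps the map and reduce steps as separate functions.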

Related Keywords

cloudera-hadoop 36 cloudera 15 hadoop 14 hive 11 spark 9 hadoop-mapreduce 5 cloudera-hadoop-framework 4 hbase 4 java 4 oozie 3 pyspark 3 hdfs 3 cdh 3 docker 3 kafka 3 scala 3 sparksql 3 bigdata 3 big-data-analytics 3 big-data 3 apache-spark 2 impala 2 spark-sql 2 hbase-shell 2 hadoop-hdfs 2 mysql 2 python3 2 pig 2 twitter 2 hue 2 mapreduce-java 2 jupyter-notebook 2 zookeeper 2 docker-compose 2 hsx 1 gcp 1 cloud 1 investment 1 movies 1 product 1 sentimental-analysis 1 box-office 1 performance 1 benchmarking-suite 1 benchmark 1 auto-install 1 ansible 1 travel 1 navigator 1 eclipse-ide 1 postgresql 1 wordcount 1 cdh5 1 tf-idf 1 term-frequency 1 mapreduce 1 keywords-builder 1 hadoop-platform 1 document-frequency 1 filter 1 distance 1 hive-hbase 1 hadoop-docker 1 docker-image 1 docker-container 1 digitalocean 1 digital-ocean 1 hipchat 1 devops 1 communication 1 cloudera-manager 1 chatops 1 chatbot 1 cloud-computing 1 trading 1 tableau 1 rotten-tomatoes 1 reviewsanalysis-nlp 1 java-mapreduce 1 hadoop-filesystem 1 google-colab-notebook 1 eclipse 1 clustering 1 big-data-processing 1 powerbi 1 tuning 1 powerbi-dashboard 1 price 1 pipeline 1 oozie-hive 1 powerbi-dashboards 1 powerbi-report 1 google-colab 1 dataset 1 cloudera-vm 1 airbnb-pricing-prediction 1 airbnb-prices 1 airbnb-data 1 airbnb 1 powerbidashboard 1