GitHub topics: hadoop-mapreduce

Repositories

41xu/Hadoop-ClassNotes

Some code written while learning Hadoop.

Language: Java - Size: 6.1 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

biagiocornacchia/bloom-filters-in-mapreduce

Implementation of the MapReduce Bloom filter construction algorithm using the Hadoop and Spark frameworks.

Language: Java - Size: 717 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

JohandeGraaf/PageRank

PageRank algorithm implemented in Hadoop MapReduce.

Language: Java - Size: 2 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

csd-auth-ftw/hadoop-http-logs

A Hadoop application that searches for errors in Apache logs.

Language: Java - Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

JiangtaoXu93/Routing-Project

Given historical airplane on-time performance data, suggests two-hop flights that minimize the chance of missing a connection.

Language: Java - Size: 42.5 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

verma-rahul/MapReduceProjects

Language: Java - Size: 1.29 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

smaddikonda/Hadoop-MapReduce

Parallel Data Processing using Hadoop MapReduce

Language: Makefile - Size: 35.3 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

salonishah11/MapReduce

Contains PageRank algorithm implemented in MapReduce and Spark. Programs for Combiner, NoCombiner and InMapperCombiner patterns along with Secondary Sort algorithm executed on temperature data.

Language: Java - Size: 1.2 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

marcelxyz/twitter-analysis-hadoop

Hadoop implementation of tweet length and frequent hashtag analysis

Language: Java - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

aravind2060/HadoopAndHiveforLargeScaleDataAnalysis

Language: Java - Size: 186 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

nabai-max/mapreduce-wordcount

This project leverages Java and Hadoop MapReduce to analyze text and flight data, focusing on a classic Word Count problem and detailed flight data analysis.

Language: Java - Size: 703 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

nabai-max/Hadoop-MapReduce

Changed readme. This is a Java project that is used for a simple MapReduce Word Count problem.

Language: Java - Size: 711 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

darule0/yarndiff

A rudimentary command-line utility for contrasting Apache YARN container logs.

Language: Shell - Size: 59.6 KB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

shask9/Matrix-Multiplication-Hadoop

Hadoop MapReduce program to compute multiplication of two sparse matrices

Language: Java - Size: 96.7 KB - Last synced at: 7 months ago - Pushed at: about 7 years ago - Stars: 8 - Forks: 5

lalkakonus/ir-hw4

Mail TehnoSphere Information Retrieval HW4

Language: Java - Size: 1.12 MB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

RevanthPosina/YouTube_Analysis_with_Java

YouTube Analysis to find out the top 5 categories with maximum number of videos uploaded and the top 10 rated videos on YouTube

Language: Java - Size: 537 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

WindowsXp-Beta/BookHub

An online bookstore integrated with many fancy technologies.

Language: JavaScript - Size: 6.07 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

10lloydj/NLP-RDF-Inverted-Index

This Map Reduce program should read in a set of RDF/XML documents and output the data in the form: {object}, [(predicate1, position, subject1)...]

Language: Java - Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

AkshayJaitly/CS643-AKSHAY-JAITLY

HADOOP WORDCOUNT ON AWS EC2 INSTANCE

Language: Java - Size: 9.46 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Ruggero1912/mapreduce-bloom-filters

This project investigates how to build Bloom filters using the MapReduce approach in Hadoop and Spark. Different implementations and a further performance analysis are reported.

Language: Jupyter Notebook - Size: 1.74 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

seyfal/MapReduceGraphComparison

Distributed computational problem-solving project, which aims to perform large-scale graph matching using cloud computing technologies. The project allows users to import two directed graphs and analyze the differences between them.

Language: Scala - Size: 1.76 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

j-buitrago/Distributed-processing-AWS

Amazon Web Services to process big data using a Hadoop cluster

Language: Python - Size: 18.6 KB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

hyeonsangjeon/dataplatform

Hadoop3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.

Language: Shell - Size: 549 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 1

mhamadelitawi/Handoop

Hadoop Map-Reduce implementations of many scientific computations

Language: Java - Size: 2.46 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

HarshitDawar55/MapReduce

Programs for MapReduce written in java with least complexity!

Language: Java - Size: 76.2 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

yuliya-akchurina/Big-Data-Programming

Big Data Programming Projects

Language: Python - Size: 57.5 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

DavideBruni/ParallelK-Means

Implementation of Parallel k-means using MapReduce in Hadoop

Language: Jupyter Notebook - Size: 485 KB - Last synced at: 6 months ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

drewm8080/big_data_management

Contains all homework from the course Foundations of Database Management at USC

Language: Python - Size: 4.75 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

hamzahamidi/map-reduce-sample

MapReduce exercises sample

Language: Java - Size: 24.7 MB - Last synced at: 3 months ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 1

HxnDev/Hadoop-MapReduce-to-Analyze-Sentiment-of-Keyword

In this task, we had to write a MapReduce program to analyze the sentiment of a keyword from a list of comments. This was done using Hadoop HDFS.

Language: Java - Size: 1000 KB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 0

mikeroyal/Apache-Hadoop-Guide

Apache Hadoop Guide

Size: 141 KB - Last synced at: 2 months ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 2

marcocolangelo/Big-Data-processing-and-Analytics

The current repository contains all the code developed during the Big Data processing and Analytics laboratories. Data are processed and analyzed using Hadoop and Spark

Language: Java - Size: 6.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

JPThakur361/Sample-Projects

MapReduce divides a task into small parts and assigns them to multiple systems. In technical terms, the MapReduce framework sends the Map and Reduce tasks to appropriate servers in a cluster. Typical applications include sorting, searching, indexing, and TF-IDF. This repository implements a few small pieces of an indexing algorithm.

Language: Java - Size: 1.68 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

Tarunpreetsingh16/hadoop

Data analysis using hadoop.

Size: 63.7 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

PranavPKS/big-data-small-projects

Learning basic concepts of standard big-data technologies

Language: Java - Size: 18.5 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

anshul1004/MutualFriends

Implementation of Hadoop and Spark

Language: Java - Size: 23 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

katipogluMustafa/BigData

Map Reduce

Language: Java - Size: 245 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

gowribhat/sms-corpus-keyword-analysis

Language: Jupyter Notebook - Size: 144 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

yp3722/Distributed-Log-Processing

A distributed system built with Hadoop File System that employs map-reduce approach to analyze large volumes of data to extract insights

Language: Scala - Size: 194 KB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

evaesqmor/WordCountMapReduce

Big Data: Map Reduce Example

Language: Java - Size: 4.88 KB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

thenielfarias/MapReduce-application-with-Hadoop-and-Java-for-WordCount

MapReduce application with Hadoop and Java for WordCount that counts the number of occurrences of each word in the books of the input set.

Language: Java - Size: 784 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

benjdiasaad/MapReduce_K-means

Implementation of the k-means clustering algorithm using the Hadoop framework, version 3.1.3 (MapReduce).

Language: Java - Size: 32.2 KB - Last synced at: 26 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 2

lauravoicu/Coursera-Hadoop-Platform-Application

Language: Python - Size: 6.84 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

MatteoM95/Big-data-processing-and-analytics

Exercises on Spark and Hadoop - Done in Distributed architectures for big data processing and analytics course at Politecnico di Torino

Language: Java - Size: 4.94 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 2

HaneefAhamed/Hadoop_Map_Reduce

Hadoop setup and Getting Started with developing Hadoop programs

Size: 11.7 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ayush-usf/stack-overflow-logs-hadoop-analysis

Ask Ubuntu logs analysis with Hadoop, MapReduce 2 (YARN)

Language: Java - Size: 108 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

SubalakshmiShanthosi/PCP1211DALab

Language: TeX - Size: 34.4 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

NikhilURao/H1B_VisaProject

This repository contains the H1B_Visa Applicants Data Analysis project/case study using Hadoop, undertaken during training at NIIT. The technologies used are MapReduce, Hive, Pig, Sqoop, and shell scripting.

Language: Shell - Size: 729 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 5

ronellsalunke/Titanic-BigData

Java Hadoop MapReduce code for my Big Data Analytics Project using the Titanic dataset

Language: Java - Size: 41 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

rfhussain/Running-a-Spark-Job-on-AWS-Cluster

When dealing with huge datasets, it is often impractical to run the code on your personal desktop. You need either a locally installed cluster environment, such as Hadoop MapReduce, or a cloud platform such as AWS. Here is an example of running such a job on the AWS cloud.

Language: Python - Size: 804 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

toukirnaim08/Python-Hadoop-MapReduce

Python Hadoop/MapReduce Program

Language: Python - Size: 5.52 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

toukirnaim08/HiveQL-Hadoop-MapReduce

A HiveQL script with Hadoop/MapReduce Program to find out the most popular movies for different age groups.

Language: HiveQL - Size: 5.52 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

toukirnaim08/PigScript-Hadoop-MapReduce

PigLatin script and Hadoop/MapReduce Program

Language: PigLatin - Size: 5.52 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

JhonWilderParionaVilca/MapReduce

Example of using MapReduce with Hadoop and Jupyter

Language: Jupyter Notebook - Size: 34.2 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

kartik894/hadoop-sched

Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop’s performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. MapReduce uses speculative execution to improve fault tolerance. The current Hadoop implementation decides whether to run speculative tasks based on the progress rates of running tasks, which does not take into account the absolute progress of each task. The modified Hadoop framework was deployed on six t2.medium EC2 instances in a master-slave configuration.

Language: Java - Size: 1.57 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 2

shubhamwaghe/Scalable-Data-Mining

Scalable Data Mining - Assignment submissions

Language: Python - Size: 3.38 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 0

Goutham88/Parallel-and-Weighted-Itemset-Mining-by-means-of-MapReduce-FrameWork

Mines heavily weighted itemsets (ratings, reviews)

Language: Java - Size: 75.2 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

tirthmehta/Big-Data-Analysis-with-Apache-Hadoop-Pig-Latin

Big Data Analysis of datasets for taking into account the character occurrences.

Language: PigLatin - Size: 1000 Bytes - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0

asaldelkhosh-learning/hadoop

Learning Hadoop and Map-Reduce!

Size: 33.2 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

maphdev/M2_Big_Data_Project

Create a world map with zooms using NASA's geographical data. Use of Hadoop, Spark and HBase.

Language: Java - Size: 48.4 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

xichie/Hadoop

Language: Java - Size: 46.5 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 1 - Forks: 0

dboston1/Reddit-Sentiment-Analysis

Program that performs textual analysis of Reddit data (approx. 300 GB) preprocessed by another team member. Uses Hadoop's MapReduce to classify comments as either positive or negative based on certain keywords, negation, etc.

Language: Java - Size: 2.34 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 5 - Forks: 0

mohammadsadra/iust-cc-401

This repo contains all supplementary items for the Cloud Computing course taught at IUST in Fall 2022.

Size: 340 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

Keerthivasan13/CSCI572-Information_Retrieval_And_Web_Search_Engines

Search Engine projects

Language: Java - Size: 34.5 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 11 - Forks: 17

amitkedia007/Analysis-of-AirBnB-data-Hadoop-Mapreduce

This repo explains how the MapReduce algorithm is applied to Airbnb data to understand consumer satisfaction by region and by country. It demonstrates the use of parallel distributed computing to solve big data problems.

Language: Java - Size: 1.8 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

joshi-aditya/Amazon-Reviews-Dataset-Analysis-MapReduce

Amazon Customer Reviews Dataset Analysis using Hadoop MapReduce, Pig. Semester end project for INFO7250 Engineering of Big Data Systems course.

Language: Java - Size: 1.66 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 6 - Forks: 2

RahulReddy-Arva/Search-Engine-BAsed-on-TFIDF

Developed a basic search engine that ranks documents in decreasing order of their TF-IDF values for the search query provided by the user and retrieves the top 100 documents for each search request. Term Frequency - Inverse Document Frequency is used for information retrieval. This is implemented in a distributed computing environment using Apache Hadoop.

Language: HTML - Size: 870 KB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

Yeema/WordCount

Uses Hadoop to sort vocabulary alphabetically (Aa-Zz)

Language: Java - Size: 2.01 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Yeema/Average_Sort

Calculates the average number of occurrences and sorts the results using multiple reducers

Language: Java - Size: 2.64 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

Yeema/PageRank

Language: Jupyter Notebook - Size: 779 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

Yeema/LSH

find similar articles

Language: Java - Size: 623 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

Arushi2002/Yet_Another_Map_Reduce

Implemented the core concepts of Hadoop's Map Reduce Framework.

Language: Python - Size: 20.5 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Shubham-vish/hadoop-B-Tree

Language: Java - Size: 42 KB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 1

ucapdak/Olympic-Tweets

Assignment for Big Data Processing: A collection of programs for analysing tweets related to the 2012 Olympics.

Language: Java - Size: 223 KB - Last synced at: over 1 year ago - Pushed at: almost 8 years ago - Stars: 1 - Forks: 0

tugrulhkarabulut/hadoop-movie-rating-prediction

Movie rating prediction application

Language: CSS - Size: 3.46 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 4 - Forks: 0

AH-Yussef/Health-Monitor-Big-Data-System

A Health Monitor to simulate receiving and processing large amounts of health metrics from many clients with the goal of efficiently finding aggregate statistics

Language: Java - Size: 319 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

DanMolenhouse/Distributed-Systems-Project5-Hadoop-and-Spark

In this project, we used both Hadoop MapReduce and Spark to do distributed computing. The first task was to perform a series of operations using Mapper and Reducer Java files implemented on a Hadoop server. The second task was to perform similar operations, but on Spark instead.

Language: Java - Size: 70.3 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

suncle1993/hadoop-mapreduce-demo

Hadoop3.1 MapReduce Demo -- Python

Language: Python - Size: 781 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 2

PrateekKumar1709/Ngram-Language-Model-Hadoop-MapReduce

A project to implement language models (n-grams) with Hadoop MapReduce

Language: Python - Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

avulaankith/Matrix-Multiplication-Hadoop

Code for matrix multiplication using the Hadoop framework in Java and the Spark framework in Scala

Language: HTML - Size: 257 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

e-petrachi/AmazonFoodAnalytic

A project comparing Hadoop, Spark, and Hive on similar queries for distributed analysis of a CSV dataset of Amazon food product reviews

Language: Java - Size: 535 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 0

mihir09/Burnol

A search engine that lets users run multi-word queries and displays the top 10 Wikipedia pages matching the query. Wikipedia was scraped using Beautiful Soup, and the data is indexed using Hadoop MapReduce.

Language: Java - Size: 138 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

akarsh3007/HadoopMapRExamples

Language: Java - Size: 35.9 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Grg0rry/MapReduce-Recommendation-System

A recommendation system built on top of Hadoop Distributed File System and MapReduce

Language: Java - Size: 204 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ruxuebu/Java-based-Movie-Recommender

A movie recommendation system implemented in Java, based on item-item collaborative filtering algorithms

Language: Java - Size: 8.79 KB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 4 - Forks: 2

iRahulP/COMP6231

Implementations of all assignments for the COMP6231 (Distributed System Design) course at Concordia University, Winter 2021.

Language: Java - Size: 8.87 MB - Last synced at: almost 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

nish-d/MapReduce_on_Cancer_Database

MapReduce queries on United States Cancer Statistics data. The database can be found at the mentioned link.

Language: Java - Size: 14.3 MB - Last synced at: almost 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

ash-0521/Ensuring-Smiles-using-Spark-ML

The primary objective of this study is to explore the feasibility of using machine learning algorithms to classify health insurance plans based on their coverage for routine dental services. To achieve this, I used six different classification algorithms: LR, DT, RF, GBT, SVM, FM (Tech: PySpark, SQL, Databricks, Zeppelin books, Hadoop, Spark-Submit)

Language: Python - Size: 15.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

Vzzarr/BigData---FineFoodReviews

Language: JavaScript - Size: 2.57 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0

PeterSchuld/UCSanDiego_MicroMasters_DataScience-BigDataAnalyticsUsingSpark

The University of California, San Diego, course DSE230x "Big Data Analytics Using Spark" (Summer 2019): Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform. Part 4 of the »Data Science« MicroMasters® Program on edX. Instructor: Yoav Freund, Professor of CS and Engineering, University of California San Diego.

Size: 6.12 MB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 1

shreyas15/Ranked-File-Search

Information retrieval (IR) is concerned with finding material (e.g., documents) of an unstructured nature (usually text) in response to an information need (e.g., a query) from large collections. One approach to identify relevant documents is to compute scores based on the matches between terms in the query and terms in the documents. For example, a document with words such as ball, team, score, championship is likely to be about sports. It is helpful to define a weight for each term in a document that can be meaningful for computing such a score. I use popular information retrieval metrics such as term frequency, inverse document frequency, and their product, term frequency-inverse document frequency (TF-IDF), that are used to define weights for terms.

Language: Java - Size: 974 KB - Last synced at: almost 2 years ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 0
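
The term-weighting idea described in the shreyas15/Ranked-File-Search entry can be sketched in a few lines. This is a minimal single-machine illustration, not the repository's Hadoop code. It assumes raw term counts for tf and log(N / df) for idf, which is one common variant; the function names `tf_idf` and `score` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not taken from the repository): tf is the raw count of a
    term in a document, and idf is log(N / df), where N is the number of
    documents and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def score(query, doc_weights):
    """Sum the TF-IDF weights of the query terms found in the document."""
    return sum(doc_weights.get(t, 0.0) for t in query.lower().split())


# Hypothetical sample collection.
docs = [
    "ball team score championship team",
    "query index document collection",
    "team document score",
]
weights = tf_idf(docs)
# Document indices ordered from most to least relevant to the query.
ranked = sorted(
    range(len(docs)),
    key=lambda i: score("team championship", weights[i]),
    reverse=True,
)
```

With these sample documents, `ranked` puts document 0 first, because it contains both query terms and "championship" appears in no other document. In a MapReduce setting the term counts and document frequencies would typically be computed in separate jobs, but the weighting arithmetic is the same.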