An open API service providing repository metadata for many open source software ecosystems.

Topic: "sqoop"

heibaiying/BigData-Notes

Big data getting-started guide :star:

Language: Java - Size: 22.9 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 16,422 - Forks: 4,279

apache/sqoop 📦

Mirror of Apache Sqoop

Language: Java - Size: 17.9 MB - Last synced at: 4 days ago - Pushed at: about 4 years ago - Stars: 982 - Forks: 583

sunnyandgood/BigData

💎🔥 Big data study notes

Language: Java - Size: 316 MB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 647 - Forks: 222

WeBankFinTech/Exchangis

Exchangis is a lightweight, highly extensible data exchange platform that supports data transmission between structured and unstructured heterogeneous data sources

Language: Java - Size: 41.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 447 - Forks: 207

bluishglc/bdp

A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype

Language: Java - Size: 403 KB - Last synced at: over 1 year ago - Pushed at: almost 5 years ago - Stars: 184 - Forks: 135

aliyun/aliyun-maxcompute-data-collectors

Language: Java - Size: 93.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 127 - Forks: 64

v5tech/cloud

Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper

Language: Shell - Size: 31.7 MB - Last synced at: about 2 months ago - Pushed at: about 8 years ago - Stars: 55 - Forks: 43

dimajix/spark-training

Repository used for Spark Trainings

Language: Jupyter Notebook - Size: 9 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 53 - Forks: 66

Cigna/ibis

IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.

Language: Python - Size: 749 KB - Last synced at: 7 months ago - Pushed at: about 3 years ago - Stars: 51 - Forks: 15

vivek2319/Learn-Hadoop-and-Spark

This repository focuses on gathering and curating a list of resources to learn Hadoop for FREE.

Language: Python - Size: 211 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 46 - Forks: 39

mrugankray/Big-Data-Cluster

The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is solely intended for usage in a development environment. Do not use it to run any production workloads.

Language: Shell - Size: 118 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 41 - Forks: 15

san089/Cloudera_Material

Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.

Size: 9.02 MB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 37 - Forks: 30

Powerspace/pg2bq

Export PostgreSQL tables to Google BigQuery

Language: Scala - Size: 640 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 36 - Forks: 11

maniram-yadav/Big_DataHadoop_Projects

Big data projects implemented by Maniram yadav

Language: PigLatin - Size: 2.79 MB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 33 - Forks: 33

peiliping/meepo 📦

Data migration between heterogeneous storage systems

Language: Java - Size: 986 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 30 - Forks: 22

Mrkuhuo/bigdata_learning

Study code for big data components

Language: Java - Size: 36.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 7

zenoyang/web-click-flow

Offline log analysis of website clickstreams

Language: Java - Size: 2.98 MB - Last synced at: 2 months ago - Pushed at: almost 7 years ago - Stars: 19 - Forks: 11

tejasjbansal/HELTHCARE-SYSTEM

Data cleaning, pre-processing, and analytics on health care data using Spark and Python.

Language: Jupyter Notebook - Size: 3 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 17 - Forks: 14

ven2day/Bigdata-docker-sandbox

Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop cluster that contains the necessary tools for the BigData domain. It's a collection of Docker containers that you can use directly.

Language: VBA - Size: 79.7 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 17 - Forks: 4

Jayvardhan-Reddy/BigData-Ecosystem-Architecture

Life-cycle: Internal working of HDFS, SQOOP, HIVE, SPARK, HBASE, KAFKA with code.

Language: Shell - Size: 562 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 13 - Forks: 16

conch-stack/conch-bigdata

Big Data

Language: HTML - Size: 14.2 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 3

Stefen-Taime/ETL-Data-Pipeline-RDBMS-TO-HDFS-using-Airflow-Apache-Sqoop-Spark-Postgres-and-Hive

This project aims to move the data from a Relational database system (RDBMS) to a Hadoop file system (HDFS)

Language: Python - Size: 17.7 MB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 10 - Forks: 4

Sathiyarajan/big-data-pipeline

Big Data

Language: Java - Size: 705 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 9 - Forks: 6

Pathairush/rdbms_to_hdfs_data_pipeline

A data pipeline moving data from a Relational database system (RDBMS) to a Hadoop file system (HDFS).

Language: Python - Size: 46.9 MB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 8 - Forks: 1

milindjagre/HDPCD

This repository contains all the documents related to HDPCD certification.

Language: PigLatin - Size: 42 KB - Last synced at: over 2 years ago - Pushed at: almost 8 years ago - Stars: 8 - Forks: 10

lovnishverma/bigdataecosystem

Complete Big Data Ecosystem on Docker Desktop

Language: Shell - Size: 405 KB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 7 - Forks: 1

Pathairush/airflow_hive_spark_sqoop

A Docker setup using Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)

Language: Shell - Size: 24.4 KB - Last synced at: over 2 years ago - Pushed at: about 4 years ago - Stars: 7 - Forks: 4

MehdiTAZI/BigData-Platform

End-to-end big data project that aims to show how to implement different big data layers, from the infrastructure layer to the end-user one. [HADOOP][Spark][Kafka][Cassandra][Ansible][Jupyter][Docker]

Language: Jupyter Notebook - Size: 85 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 6

bhvikp/sqoop-spark-hive

MYSQL | SQOOP | SPARK | HIVE workflow

Language: Scala - Size: 33.2 KB - Last synced at: 8 days ago - Pushed at: almost 7 years ago - Stars: 6 - Forks: 8

TritonDataCenter/hadoop-manta

Hadoop Filesystem Driver for Manta

Language: Java - Size: 172 KB - Last synced at: 13 days ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 6

Madadata/hasoop

Hasoop - Node.js client for Sqoop 2

Language: JavaScript - Size: 171 KB - Last synced at: 28 days ago - Pushed at: over 8 years ago - Stars: 6 - Forks: 1

ghosh17/Predictive-Analysis

Predictive Analysis using Big Data platforms and Machine Learning Libraries

Language: Shell - Size: 410 KB - Last synced at: over 2 years ago - Pushed at: almost 9 years ago - Stars: 6 - Forks: 1

melwinmpk/PizzaOrders_DataPipeline

Pizza Orders data pipeline use case solved with SQL, Sqoop, HDFS, Hive, and Airflow.

Language: Python - Size: 603 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 2

NickkBright/Spark-SqoopCDC

Change data capture implementation using Spark and Sqoop

Language: Scala - Size: 17.6 KB - Last synced at: over 1 year ago - Pushed at: almost 7 years ago - Stars: 5 - Forks: 2

jazzwang/hive_labs

Hive, Sqoop related labs

Language: Shell - Size: 3 MB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 5 - Forks: 1

SaravananJaichandar/Big-Data

A Hadoop repository to portray the use-cases of different hadoop components with real-time projects and their workings explained in detail.

Size: 13 MB - Last synced at: 11 months ago - Pushed at: over 7 years ago - Stars: 5 - Forks: 4

VictoriaGomesDS/Intro_Ecossistema_Hadoop

Size: 233 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 4 - Forks: 0

MarceloJSSantos/acelereracao-global-dev-4-everis-dio

Repository created to store notes and activities completed during training on the Digital Innovation One (DIO) platform for the Data Engineer selection process at Everis.

Language: Jupyter Notebook - Size: 57.3 MB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 2

rktrojan/BigDataHadoop

BigData/Hadoop related codebase including Sqoop, Hive/HQL, Spark, Flink

Language: Jupyter Notebook - Size: 1.74 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

thedatasociety/lab-hadoop

Language: PLpgSQL - Size: 4.6 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 3 - Forks: 7

avatime/gamul-gamul

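🏆 Gamul-Gamul: a web app service that provides food ingredient price information based on consumer prices, using distributed big data processing - 🥇 SSAFY 7th cohort specialization project, Excellence Award, 1st place (2022.10.07)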

Language: Java - Size: 190 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

Jayvardhan-Reddy/BigData_Concepts

The various underlying processes that take place for each concept of the Big-data ecosystem.

Language: Shell - Size: 10.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 5

lazy-apple/BigData_Long

Web crawler + big data project

Language: Java - Size: 183 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

lazy-apple/BigData

Big data e-commerce project

Language: Java - Size: 205 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 3 - Forks: 0

danielsqtang/Data-Ingestion-Shellscript

Scripts for ETL

Language: Shell - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 1

shinde-chandrakant/BANK-ATM-ETL

Spar Nord, a Danish bank, optimizes ATM refill frequency by observing withdrawal behavior and dependent factors. The project builds a batch ETL pipeline to load the data into Redshift Data Mart for analytical queries.

Language: Jupyter Notebook - Size: 2.07 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

terodea/CS-BigData

Learn Big Data tools/frameworks by doing examples, POCs, and pet projects.

Language: Python - Size: 15.7 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 3

melwinmpk/UserTransactions_DataPipeline

User Transactions data pipeline use case solved with SQL, Sqoop, HDFS, and Hive. Implements Slowly Changing Dimensions (SCD) Type 1.

Language: Shell - Size: 5.49 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

melwinmpk/Userreview_Data_Pipeline_Sqoop_HDFS

Solving the Restaurant User Review Data Pipeline Scenarios using Shellscript, Python, Sqoop, HDFS

Language: Python - Size: 3.4 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

kasipavankumar/sqoop-docker

Apache Sqoop using Docker. 🐳

Language: Dockerfile - Size: 20.5 KB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

alvarofpp/4linux-hadoop 📦

Scripts used during the Big Data Analytics with Hadoop course offered by 4Linux

Language: PigLatin - Size: 1.03 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 0

sujeethshetty/big-data

Project, assignments & research related to the Hadoop Ecosystem

Language: Jupyter Notebook - Size: 16.7 MB - Last synced at: 6 months ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

siddharth271101/Stock-Exchange-Analysis

Created a data pipeline using Sqoop to ingest data from SQL Server into a Hive table, and used Hive for feature engineering and analysis.

Language: Shell - Size: 14.5 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 1

NikhilURao/H1B_VisaProject

This repository contains the H1B_Visa Applicants Data Analysis project/case study using Hadoop, undertaken during training at NIIT. MapReduce, Hive, Pig, Sqoop, and shell scripting are the technologies used.

Language: Shell - Size: 729 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 5

aj-22/incremental

Incremental updates in HIVE via CLI and HUE

Language: TSQL - Size: 167 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 2 - Forks: 5

Niranjankumar-c/DataAnalytics_using_ClickstreamData

Case study completed as part of Big Data training from AnalytixLabs

Size: 12.6 MB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 2

vishal2232/Project_1-Spark-using-Scala-API-

Problem statement: get the revenue and number of orders from order_items on a daily basis.

Size: 1.67 MB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 2 - Forks: 0

VladimirZelenokor1/Big-Data-Project---Predicting-Trip-Fares-with-Spark-Hive

A CRISP-DM–based big data pipeline for predicting NYC ride-sharing trip fares: ingesting 2024 TLC data via Sqoop into HDFS/Hive, performing ETL and feature engineering with Spark & PySpark, training and tuning Linear Regression & Gradient Boosted Tree models, and outlining end-to-end deployment.

Language: Java - Size: 906 KB - Last synced at: 18 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

DadaNanjesha/Redshift-ETL-Project

The project covers the complete data pipeline—from importing data from an RDS source to HDFS using Sqoop, processing data with Spark, to executing analytical queries on an AWS Redshift cluster.

Language: Jupyter Notebook - Size: 833 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

thdaraujo/cheat

A handful of cheatsheets and programming tips.

Size: 105 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

Heisenberghj7/Retail-Store-BigData

📊 📑 This project provides step-by-step big data analytics applied to the retail industry, using a variety of big data technologies such as HDFS, Hive, and Spark.

Language: Python - Size: 2.11 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

EddieAmaitum/Spar-Nord-Bank-ETL-and-BI-Project

In this project I build a batch ETL pipeline to read transactional data from Amazon RDS, transform it to a usable format and then load it into an Amazon S3 bucket. The data is then loaded into Redshift Tables, after which I perform analytical queries on the loaded data to gain insights.

Language: Jupyter Notebook - Size: 2.72 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

EddieAmaitum/NYC-Yellow-Taxi-DataOps-with-AWS-Analyzing-TLC-Datasets

Performed business operations using Big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, MapReduce

Language: Python - Size: 5.63 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

sarathchandrikak/ETL-Bank-Transcation

Data Analysis of bank transaction data

Language: Jupyter Notebook - Size: 9.34 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

naimazizi/hive-export

A combination of Apache Spark and Sqoop to extract data from Hive tables into a relational database, integrated into a pipeline using Luigi.

Language: Python - Size: 11.7 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ahpuchend/agdata

Agricultural product data analysis system

Language: JavaScript - Size: 9.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

Psingh12354/Internship_Notes_Cog

Language: Scala - Size: 737 KB - Last synced at: 4 months ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

melwinmpk/SCD_in_Warehouse

Hive project: understand the various types of SCDs and implement these slowly changing dimensions in Hadoop Hive and Spark.

Language: HiveQL - Size: 22.7 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 4

Linho1150/BIGDATA_PROGRAMMING_PROJECT

Analyzed traffic flow around the university through bus arrival times at the bus stop at Myongji University. Uses web crawling (Python), Hadoop, HDFS, Sqoop, Hive, Zeppelin, and Amazon RDS (MySQL).

Language: Python - Size: 11.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

anaghazachariah/sqoop-installation-ubuntu

Size: 22.5 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

RaghuKantamsetti/Hadoop-Use-Case-on-Healthcare

This repository is about processing and storing healthcare data using the Big Data tool Hadoop and its components.

Language: Shell - Size: 42 KB - Last synced at: over 1 year ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 1

AnkitaSinha98/Customer360-Data-Analysis

Big data on various customers is stored and analyzed using Hadoop and other tools such as Hive, ZooKeeper, HBase, and Sqoop. All customer details are analyzed and the results are reported. These results are very useful for companies.

Size: 292 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

alokjani/bigdata-vagrant-devlab

Hadoop Software Development sandbox

Language: Shell - Size: 206 KB - Last synced at: 3 months ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

Sailendra-R-D/Prep-Resource-CCA175

A quick lookup for CCA-175 certification

Size: 27.3 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 2

amittian/BANK-BIG-DATA-ANALYSIS-USING-HADOOP

Banking Data Analysis Using SQL, SQOOP, HIVE, HADOOP, TABLEAU, R, UNIX

Size: 13.7 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

f2e-awesome/HadoopEcosystem

Hadoop ecosystem

Language: JavaScript - Size: 3.91 KB - Last synced at: over 2 years ago - Pushed at: almost 7 years ago - Stars: 1 - Forks: 1

Niranjankumar-c/SqoopCaseStudy-ALabs

Sqoop case studies done during AnalytixLabs Big Data classes

Size: 13.7 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

anirudhgupta22/Microsoft-Azure-HDInsight

Short documentation on Microsoft's Azure HDInsight

Size: 2.02 MB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

ahmedmohamedfoua/Big-Data-Project---Predicting-Trip-Fares-with-Spark-Hive

This repository provides a complete workflow for predicting ride-sharing trip fares in New York City using Spark and Hive. Explore the data, models, and results while leveraging the power of big data! 🐙🚀

Language: Java - Size: 906 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Fradhyle/Voo-ong

The Joeun Computer Academy Big Data 10th cohort final team project

Language: HiveQL - Size: 57.7 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

gilberto-009199/bigdata

BigData workspaces:

Language: Java - Size: 60.4 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

Zain970/ETL-Data-Pipline-Using-Apache-Airflow

Utilize Sqoop to import data from relational databases and ingest files from S3 buckets into HDFS. Apply complex transformations using Apache Spark to prepare data for analysis and reporting. Create and manage Hive tables for structured data storage and query optimization. Load processed data into HBase, making it accessible for various teams and app

Language: Python - Size: 3.91 KB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

tejaswirupa/Big-Data-Systems-Project-Hadoop-Hive-MapReduce-Sqoop-Workflows

Designed and implemented scalable data workflows using Hadoop, Hive, and Sqoop. This project involved log aggregation, airline delay analysis, word frequency processing, and TF-IDF computation across multiple datasets using MapReduce, Hive queries, and Hadoop Streaming.

Size: 3.75 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

raja9283/HadoopSCD

A data pipeline on GCP Dataproc using Sqoop, HDFS, Hive, and PySpark to implement SCD Type 2 for an e-commerce use case. Tracks customer and product changes (e.g., address, price) and their impact on sales, demonstrating scalable data warehousing and processing.

Language: Python - Size: 12.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

ANKIT21111/Patient-Alert-ETL

The Patient Alert ETL 🚑 project creates a real-time data pipeline to monitor vital health parameters from IoT devices in hospitals. Using Apache Kafka, Spark, and HBase, it processes streaming data and sends immediate alerts via Amazon SNS when vitals exceed normal thresholds, enhancing patient care through timely interventions.

Language: Python - Size: 5.47 MB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

ccao-data/service-sqoop-iasworld 📦

Service to continually import iasWorld backend data to Parquet using Apache Sqoop

Language: Shell - Size: 407 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

offthetab/VKAPI-ML-DataHarvester

Pipeline to harvest data via VK API for ML analysis with hadoop and spark

Language: Jupyter Notebook - Size: 6.69 MB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

sebastianruizm/CCA175-Exam-Preparation

Backup of my preparation for Cloudera's CCA175 exam

Language: Python - Size: 42 KB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

shinde-chandrakant/online-advertising-platform

Online Advertising Platform - a comprehensive big data project

Language: Python - Size: 7.73 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

shivananda199/hive-analytics-in-aws-for-e-commerce

A project to create a Hive data warehouse for E-commerce in AWS and perform data analysis.

Language: HiveQL - Size: 344 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

shinde-chandrakant/BigData-Ops-on-TLC-Yellow-Taxi

Analysed New York City's Yellow taxi data set with Big Data tools such as Hadoop, HBase, Sqoop, MapReduce and AWS Cloud Infrastructure.

Language: Python - Size: 7.19 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

faryar251/ecomsales-and-walmartstock-analysis

Performed end-to-end big data analysis on E-Commerce Sales & Walmart Stock data, extracting valuable insights for impactful reporting.

Size: 5.79 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

ashok-amsamani/HIVE-SQOOP-Integration

Steps for moving data from MySQL to Hive using Sqoop, and from Hive to MySQL using Sqoop.

Size: 216 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

fernandadiasm/study

Repository created to gather exercises and notes on technologies and languages.

Language: Shell - Size: 4.01 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

aalexren/iu-bigdata-project

[Innopolis University] Big Data Course 2023. Final Project.

Language: HiveQL - Size: 118 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

sablokgaurav/data_engineering

java_codes

Language: Java - Size: 1.95 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Mehak0310/Real_time_Health_Alert_Notification

Propose a reliable data pipeline solution to capture a high-velocity stream of patient vitals such as body temperature, heartbeat, and blood pressure (BP) coming from IoT devices, and send an instant email notification in case of abnormal vitals.

Language: Python - Size: 3.23 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

Mehak0310/ATM-data-ETL-Pipeline-Sqoop-Pyspark-Redshift

Build a batch ETL pipeline to read transactional data from RDS, transform it, and load it into target dimensions and facts on a Redshift Data Mart (Schema), after which some analytical queries are performed on the loaded data.

Language: Jupyter Notebook - Size: 1.98 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

ZGG2016/sqoop

Size: 5.86 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

abhij215/dataEngineeringNotes

Contains notes on various topics related to data engineering.

Size: 4.88 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0