divithraju | GitHub owners | Ecosyste.ms: Repos

divithraju/divith-aju-Hadoop-Pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

Language: Python - Size: 4.88 KB - Last synced at: 15 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

divithraju/User_behavior_analytics-

Size: 2.93 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-postgreSQL

Size: 10.7 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

divithraju/Github-bot

Language: Python - Size: 360 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Immigration-Data-Engineering

A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)

Language: Jupyter Notebook - Size: 2.5 MB - Last synced at: 26 days ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

divithraju/Divithraju

Config files for my GitHub profile.

Size: 40 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 2 - Forks: 0

divithraju/datacompy Fork of capitalone/datacompy

Pandas, Polars, and Spark DataFrame comparison for humans and more!

Language: Python - Size: 6.49 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

divithraju/pyspark-example-project Fork of AlexIoannides/pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

Language: Python - Size: 764 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

divithraju/awesome-spark Fork of awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

Language: Python - Size: 212 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

divithraju/PySpark Fork of hyunjoonbok/PySpark

PySpark functions and utilities with examples. Assists the ETL process of data modeling.

Size: 3.79 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

divithraju/pyspark-ai Fork of pyspark-ai/pyspark-ai

English SDK for Apache Spark

Size: 6.42 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-Data-Mining

This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment them into clusters for targeted marketing strategies and better customer relationship management.

Language: Python - Size: 3.91 KB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0
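
As a rough illustration of the K-Means technique named in the description above (not the repository's actual code), the sketch below clusters a handful of customers in plain Python. The two features per customer (for example, scaled annual spend and order frequency), the sample values, and the function names `nearest` and `kmeans` are all assumptions made for this example.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New centroid = per-dimension mean of its members; keep the old one if empty.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments stopped changing
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers as (scaled annual spend, scaled order frequency).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (10.0, 10.0), (11.0, 10.5), (10.5, 11.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

The first three customers should end up in one segment and the last three in the other; which segment gets label 0 depends on the random initialization.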

divithraju/divith-raju-Customer-Sales-ETL-Pipeline

This ETL project demonstrates the development of a scalable data pipeline for customer sales analysis. It covers all essential steps, from data extraction to transformation and loading into a database, using Apache Airflow.

Language: Python - Size: 7.81 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-OpenMetadata

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Language: Python - Size: 68.4 KB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

divithraju/divith-raju-SearchEngine-Wikipedia

Search engine optimization. A complete search engine experience built on top of a 75 GB Wikipedia corpus, with subsecond latency for searches. Results contain wiki pages ordered by TF/IDF relevance for the given search word(s). From optimized code to the K-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.

Language: Python - Size: 16.6 KB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0
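
The two techniques named in the description above, K-way merging and TF/IDF ranking, can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the repository's implementation: the function names, the tiny in-memory corpus, and the particular TF/IDF variant (relative term frequency times `log(N / df)`) are assumptions chosen for the example.

```python
import heapq
import math
from collections import Counter


def merge_sorted_runs(runs):
    """K-way merge: combine already-sorted lists of (term, doc_id) pairs into one sorted list."""
    return list(heapq.merge(*runs))


def tf_idf_rank(docs, query_terms):
    """Rank documents by the sum of TF-IDF weights of the query terms they contain."""
    n_docs = len(docs)
    term_counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    scores = {}
    for term in query_terms:
        # Document frequency: number of documents containing the term.
        df = sum(1 for counts in term_counts.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(n_docs / df)
        for doc_id, counts in term_counts.items():
            if term in counts:
                tf = counts[term] / sum(counts.values())
                scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Three sorted index runs, as an indexer might spill to disk (hypothetical data).
runs = [[("a", 1), ("c", 3)], [("b", 2), ("d", 4)], [("a", 2)]]
merged = merge_sorted_runs(runs)

# Hypothetical three-page corpus.
docs = {1: "spark spark hadoop", 2: "hadoop hive", 3: "hive pig"}
ranked = tf_idf_rank(docs, ["spark"])
print(merged)
print(ranked)
```

`heapq.merge` reads its inputs lazily, which is why the same approach scales to index runs that do not fit in memory when each run is streamed from a file.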

divithraju/divith-raju-Hadoop-Connectors-Master

Language: Java - Size: 197 KB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

divithraju/divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

Language: Python - Size: 4.88 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Python

This repository highlights my ability to develop and integrate diverse Python solutions, ranging from API creation and data management to cloud service integration. Each project in this repository serves a specific purpose, demonstrating both fundamental concepts and practical applications that are essential in real-world software development.

Language: Python - Size: 7.81 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-ETL-Airflow-Project

This ETL pipeline project is a practical demonstration of my skills in data engineering and automation using Python and Apache Airflow. By integrating MySQL for data storage and leveraging Airflow for task orchestration, the project simulates a scalable and modular ETL solution often required in enterprise data workflows.

Language: Python - Size: 10.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Pyspark-work

Language: Python - Size: 12.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-PySpark-Projects

Language: Python - Size: 10.7 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Webapplication-Spark-memory-cal

The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this tool, you can avoid common pitfalls and ensure your cluster resources are used efficiently, leading to better performance and lower costs.

Language: Python - Size: 5.86 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0
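
To give a sense of what such a calculator might compute, here is a minimal sketch based on commonly cited sizing rules of thumb, not on this repository's code. The assumptions are: one core and 1 GB per node are reserved for the operating system and daemons, each executor gets five cores, one executor slot is left for the YARN ApplicationMaster, and memory overhead is the larger of 384 MB or 10% of the executor heap. The function name and return keys are hypothetical.

```python
def suggest_executor_config(nodes, cores_per_node, mem_gb_per_node, cores_per_executor=5):
    """Suggest executor count, cores, and heap size from rule-of-thumb heuristics."""
    usable_cores = cores_per_node - 1      # assumption: reserve 1 core per node
    usable_mem_gb = mem_gb_per_node - 1    # assumption: reserve 1 GB per node
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Assumption: leave one executor slot for the YARN ApplicationMaster.
    num_executors = max(1, executors_per_node * nodes - 1)

    # Each executor container must hold heap + overhead,
    # where overhead = max(0.384 GB, 10% of heap).
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = min(container_gb - 0.384, container_gb / 1.10)

    return {
        "num_executors": num_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": int(heap_gb),  # round down to whole GB
    }


# Hypothetical cluster: 10 nodes, 16 cores and 64 GB RAM each.
config = suggest_executor_config(nodes=10, cores_per_node=16, mem_gb_per_node=64)
print(config)
```

For the hypothetical cluster above this yields 29 executors with 5 cores and 19 GB of heap each. Real workloads often need different values, so treat the output as a starting point for tuning.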
