GitHub / divithraju 1 Repository
Big Data Developer | Data Scientist | Developer | Machine Learning | Deep Learning
divithraju/User_behavior_analytics-
Size: 2.93 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 0 - Forks: 0

divithraju/divith-raju-postgreSQL
Size: 10.7 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0

divithraju/Github-bot
Language: Python - Size: 360 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Immigration-Data-Engineering
A Capstone Project that covers several aspects of Data Engineering (Data Exploration, Cleaning, Modeling, Pipelining, Processing)
Language: Jupyter Notebook - Size: 2.5 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

divithraju/Divithraju
Config files for my GitHub profile.
Size: 40 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

divithraju/datacompy Fork of capitalone/datacompy
Pandas, Polars, and Spark DataFrame comparison for humans and more!
Language: Python - Size: 6.49 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

divithraju/pyspark-example-project Fork of AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
Language: Python - Size: 764 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

divithraju/awesome-spark Fork of awesome-spark/awesome-spark
A curated list of awesome Apache Spark packages and resources.
Language: Python - Size: 212 KB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

divithraju/PySpark Fork of hyunjoonbok/PySpark
PySpark functions and utilities with examples. Assists the ETL process of data modeling.
Size: 3.79 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

divithraju/pyspark-ai Fork of pyspark-ai/pyspark-ai
English SDK for Apache Spark
Size: 6.42 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-Data-Mining
This project focuses on customer segmentation using data mining techniques, specifically K-Means clustering, to classify customers into distinct groups based on their purchasing behaviors. The goal is to analyze customer data and segment customers into clusters for targeted marketing strategies and better customer relationship management.
Language: Python - Size: 3.91 KB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0
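The K-Means segmentation described above can be illustrated with a minimal sketch. This is not the repository's code: the `kmeans` helper, the first-k-points initialisation, and the sample customers are all assumptions made for illustration, and a real pipeline would normally scale the features and use a library implementation.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Deterministic initialisation (an assumption): the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (orders per year, average order value)
customers = [(2, 20.0), (3, 25.0), (40, 22.0), (42, 18.0), (4, 300.0), (5, 320.0)]
labels, centroids = kmeans(customers, k=3)
```

Customers that share a label fall into the same segment; the centroids summarise each segment's typical purchasing behavior.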

divithraju/divith-raju-Customer-Sales-ETL-Pipeline
This ETL project was designed to demonstrate the development of a scalable data pipeline for customer sales analysis. It covers all essential steps, from data extraction to transformation and loading into a database, and uses Apache Airflow.
Language: Python - Size: 7.81 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Language: Python - Size: 68.4 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

divithraju/divith-raju-SearchEngine-Wikipedia
Search engine optimization. A complete search engine experience built on top of a 75 GB Wikipedia corpus with subsecond latency for searches. Results contain wiki pages ordered by TF-IDF relevance to the given search word(s). From optimized code to the K-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.
Language: Python - Size: 16.6 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0
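The two techniques named in the description, K-way merging of sorted index runs and TF-IDF ranking, can be sketched in a few lines. This is an illustrative sketch only, not the project's implementation: the function names, the whitespace tokenizer, the `tf * log(N / df)` weighting, and the tiny in-memory corpus are assumptions.

```python
import heapq
import math
from collections import Counter

def merge_sorted_runs(runs):
    """K-way merge of already-sorted (term, doc_id) runs into one sorted list."""
    return list(heapq.merge(*runs))

def rank_by_tf_idf(query_terms, docs):
    """Return doc ids ordered by summed TF-IDF over the query terms."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        tf = Counter(tokens)
        score = sum(
            (tf[t] / len(tokens)) * math.log(n_docs / df[t])
            for t in query_terms
            if df[t]
        )
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sorted index runs, as an external sort might write them to disk.
runs = [[("apple", 1), ("spark", 4)], [("hadoop", 2)], [("apple", 3), ("zebra", 5)]]
merged = merge_sorted_runs(runs)

# Hypothetical three-document corpus.
docs = {
    "A": "spark is a cluster computing engine",
    "B": "hadoop stores data in hdfs",
    "C": "spark and hadoop process big data",
}
ranking = rank_by_tf_idf(["hadoop", "hdfs"], docs)
```

`heapq.merge` reads its inputs lazily, which is why a K-way merge suits index runs too large to hold in memory at once; the sketch wraps it in `list` only to keep the example small.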

divithraju/divith-raju-Hadoop-Connectors-Master
Language: Java - Size: 197 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

divithraju/divith-raju-pipeline-hadoop-pyspark
This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.
Language: Python - Size: 4.88 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

divithraju/divith-aju-Hadoop-Pyspark-pipeline
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.
Language: Python - Size: 4.88 KB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Python
This repository highlights my ability to develop and integrate diverse Python solutions, ranging from API creation and data management to cloud service integration. Each project in this repository serves a specific purpose, demonstrating both fundamental concepts and practical applications that are essential in real-world software development.
Language: Python - Size: 7.81 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-ETL-Airflow-Project
This ETL pipeline project is a practical demonstration of my skills in data engineering and automation using Python and Apache Airflow. By integrating MySQL for data storage and leveraging Airflow for task orchestration, the project simulates a scalable and modular ETL solution often required in enterprise data workflows.
Language: Python - Size: 10.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Pyspark-work
Language: Python - Size: 12.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-PySpark-Projects
Language: Python - Size: 10.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Webapplication-Spark-memory-cal
The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this tool, you can avoid common pitfalls and ensure your cluster resources are used efficiently, leading to better performance and lower costs.
Language: Python - Size: 5.86 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0
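A calculator of this kind can be sketched using a commonly cited sizing rule of thumb. The rule used here is an assumption, not necessarily what the repository implements: reserve one core and 1 GB per node for the operating system, use five cores per executor, leave one executor for the YARN application master, and set aside roughly 10% of each container (at least 384 MB) as memory overhead. The function name and the example cluster are hypothetical.

```python
def spark_executor_sizing(nodes, cores_per_node, mem_gb_per_node, cores_per_executor=5):
    """Suggest executor settings from a rule of thumb (illustrative only)."""
    usable_cores = cores_per_node - 1   # leave one core per node for the OS
    usable_mem_gb = mem_gb_per_node - 1  # leave 1 GB per node for the OS
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # One executor slot is left for the YARN application master.
    total_executors = nodes * executors_per_node - 1
    container_gb = usable_mem_gb / executors_per_node
    # Simplification: overhead taken as 10% of the container, minimum 384 MB.
    overhead_gb = max(0.384, 0.10 * container_gb)
    heap_gb = int(container_gb - overhead_gb)
    return {
        "spark.executor.instances": total_executors,
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": f"{heap_gb}g",
    }

# Hypothetical cluster: 10 nodes, 16 cores and 64 GB of RAM each.
settings = spark_executor_sizing(nodes=10, cores_per_node=16, mem_gb_per_node=64)
```

The returned keys are standard Spark configuration properties, so the dictionary could be passed on as `--conf` values; the numbers are only a starting point and should be tuned against the actual workload.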

divithraju/divith-raju-big-data-projects
divith-raju-big-data-tools
Language: Python - Size: 7.81 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

divithraju/divith-raju-Hadoop-3.3.6-setup-on-Ubuntu
Language: Shell - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

divithraju/Steaming-project-Spark-Kafka-Cassandra
Size: 9.77 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Building-Big-Data-Infrastucture-NoSQL-And-SQL
Big Data Platform on MongoDB Atlas and Heroku PostgreSQL
Language: Python - Size: 197 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

divithraju/divith-raju-Pyspark_Auto.Generate
Language: Python - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

divithraju/divith-raju-Web-Server-Log-Analysis-Pyspark
Playground for PySpark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing web server log data, including incorrectly formatted strings.
Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0
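Parsing log lines while tolerating malformed ones can be sketched as below. This is a plain-Python illustration under stated assumptions, not the repository's code: it assumes the Common Log Format, and the pattern, function name, and sample lines are hypothetical. In PySpark, a function like `parse_log_line` would typically be passed to `rdd.map`, with the `None` results filtered out afterwards.

```python
import re

# Assumed layout: host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)(?: (\S+))?" (\d{3}) (\d+|-)$'
)

def parse_log_line(line):
    """Return a dict for a well-formed line, or None for a malformed one."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    host, _ident, _user, timestamp, method, path, protocol, status, size = match.groups()
    return {
        "host": host,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(status),
        "size": 0 if size == "-" else int(size),  # "-" means no body was sent
    }

# Hypothetical sample input: one valid line and one malformed line.
lines = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    "this line is not a log entry",
]
parsed = [parse_log_line(line) for line in lines]
good = [record for record in parsed if record is not None]
bad_count = sum(record is None for record in parsed)
```

Returning `None` instead of raising keeps a single bad line from failing the whole job and makes it easy to count or inspect the rejects separately.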
