An open API service providing repository metadata for many open source software ecosystems.

GitHub / alireza-heidarii / Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data

A real-time data cleaning pipeline for medical and healthcare data using Apache Spark, SparkNLP, Spark Streaming, and Kafka.

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alireza-heidarii%2FReal-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data
PURL: pkg:github/alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data
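
The JSON API address above percent-encodes the slash between owner and repository name as `%2F`. As a minimal sketch, the same address can be rebuilt from the host and full repository name with the Python standard library. The helper name `repository_api_url` and the constant `API_BASE` are hypothetical, chosen for illustration, and the URL layout is inferred only from the single address shown in this listing.

```python
from urllib.parse import quote

# Assumption: base path taken from the JSON API address in this listing.
API_BASE = "http://repos.ecosyste.ms/api/v1/hosts"


def repository_api_url(host: str, full_name: str) -> str:
    """Build the JSON API address for a repository (hypothetical helper).

    safe="" makes quote() encode "/" as "%2F", so "owner/repo" stays a
    single path segment, matching the address shown in the listing.
    """
    return f"{API_BASE}/{quote(host, safe='')}/repositories/{quote(full_name, safe='')}"


if __name__ == "__main__":
    print(
        repository_api_url(
            "GitHub",
            "alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data",
        )
    )
```

Fetching the resulting address with any HTTP client should return the metadata as JSON. The field names in that response are not shown in this listing, so they are not assumed here.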

Stars: 11
Forks: 0
Open issues: 0

License: apache-2.0
Language: Python
Size: 11.7 KB
Dependencies parsed at: Pending

Created at: about 1 year ago
Updated at: 4 months ago
Pushed at: 5 months ago
Last synced at: 4 months ago

Topics: data-cleaning, data-pipelines, data-preprocessing, healthcare-datasets, kafka, medical-data-analysis, natural-language-processing, parquet, pyspark, python, real-time-data-processing, spark-nlp, spark-streaming
