An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-processing

Soriano-R/disaster-response-pipeline

A data science pipeline for analyzing and responding to disaster-related data

Language: HTML - Size: 60 MB - Last synced at: about 7 hours ago - Pushed at: about 7 hours ago - Stars: 0 - Forks: 0

qbxlvnf11/data-preprocessing-methods

Image, text, and signal data processing methods, a data parser, and other utilities.

Language: Jupyter Notebook - Size: 4.14 MB - Last synced at: about 11 hours ago - Pushed at: about 11 hours ago - Stars: 2 - Forks: 1

cocoindex-io/cocoindex

Real-time data transformation framework for AI. Ultra performant, with incremental processing.

Language: Rust - Size: 7.78 MB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 1,927 - Forks: 130

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Language: Python - Size: 132 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 27,732 - Forks: 623

johnkerl/miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Language: Go - Size: 201 MB - Last synced at: about 12 hours ago - Pushed at: 9 days ago - Stars: 9,329 - Forks: 224

Labs64/labs64.io-auditflow

Labs64.IO - Scalable & Searchable Auditing Solution

Language: Python - Size: 117 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

deepseek-ai/smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Language: Python - Size: 1.77 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 4,700 - Forks: 415

unionai-oss/pandera

A light-weight, flexible, and expressive statistical data testing library

Language: Python - Size: 3.98 MB - Last synced at: about 21 hours ago - Pushed at: 4 days ago - Stars: 3,861 - Forks: 345

ull0sm/Drawer

Drawer automates single-elimination draw systems, ensuring fairness with balanced group allocation and bias-free brackets. Now enhanced with Docker, it eliminates dependency issues for seamless event management.

Language: Python - Size: 495 KB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

apache/incubator-wayang

Apache Wayang (incubating) is the first cross-platform data processing system.

Language: Java - Size: 19.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 222 - Forks: 96

ndjapic/mat7-2024

Materials for the seventh-grade mathematics course in the 2024/2025 school year.

Language: TeX - Size: 1.27 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Efidieeieiddidfkkfkfkf/Generador-De-Oficios

Flask web application that generates personalized official letters in Word from a template, using recipient data stored in a business-directory Excel file.

Language: Python - Size: 14.6 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

olympus-terminal/data-processing

Data analysis and processing tools

Language: Python - Size: 14.6 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Dat09123/btc_address_sorter_by_type

🔎 Ultra-fast Bitcoin address sorter with real-time multiprocessing, address format detection, and low RAM usage. Ideal for forensic research, data analytics, and blockchain intelligence.

Language: Python - Size: 9.77 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

Tyson-cyber/GetMerlin2Api

GetMerlin2Api is a versatile API that allows users to seamlessly integrate Merlin2 software capabilities into their own applications, enabling enhanced project management and collaboration features. With its comprehensive documentation and user-friendly endpoints, developers can easily leverage the power of Merlin2 within their projects for optimal

Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 36.1 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 88 - Forks: 18

TomWright/dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

Language: Go - Size: 8.56 MB - Last synced at: about 20 hours ago - Pushed at: 3 months ago - Stars: 7,485 - Forks: 149

bytewax/bytewax

Python Stream Processing

Language: Python - Size: 12 MB - Last synced at: about 2 hours ago - Pushed at: 3 months ago - Stars: 1,765 - Forks: 82

BADER76/solar-power-measurement

This repository hosts a solar power measurement system that tracks voltage, current, and power using the STM32F103C8T6 microcontroller. The data is visualized in real-time on an OLED display and sent to the ThingSpeak IoT cloud for further analysis. 🌐🌞

Language: C - Size: 18 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

numaproj/numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Language: Go - Size: 45.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,878 - Forks: 133

Naveen-526/Federated-Learning-based-IDS

This repository features a federated learning system designed for intrusion detection in IoT networks, ensuring data privacy while maintaining high accuracy. The project utilizes the Flower framework and includes essential components like data processing, server-client architecture, and SSL certificates for secure communication. 🐙🌐

Language: Jupyter Notebook - Size: 19.5 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

tathithienthanh/WomenFashionProductRecommendationSystem

A recommendation system for women's fashion products on e-commerce platforms

Language: Jupyter Notebook - Size: 49.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Pig85236/45K-Udemy-Course-WordPress-Posts

XML files of 45K+ Udemy courses for WordPress—Share Knowledge, Drive Traffic, & Make Money! 🔥🚀

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4 - Forks: 1

modelscope/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 223 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4,607 - Forks: 243

nxoti/cnpj-data-pipeline

🇧🇷 CNPJ Data Pipeline: a modular, configurable script for processing CNPJ files from Brazil's Federal Revenue Service (Receita Federal do Brasil). 🐙 The project supports multiple databases and enables intelligent processing of more than 50 million companies.

Language: Python - Size: 384 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

crate/cratedb-toolkit

CrateDB Toolkit, an SDK for CrateDB and CrateDB Cloud.

Language: Python - Size: 3.54 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 10 - Forks: 4

johnhany/awesome-list

A list of useful stuff in Machine Learning, Computer Graphics, Software Development, ...

Size: 1.13 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 5

LiberTEM/LiberTEM

Open pixelated STEM framework

Language: Python - Size: 229 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 117 - Forks: 68

allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

Language: Python - Size: 63.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,241 - Forks: 144

seinecle/nocodefunctions-web-app

The code base of the front-end of nocodefunctions.com

Language: Java - Size: 37.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 39 - Forks: 7

drshahizan/HPDP

High performance data processing employs high performance computing (HPC) to process data, which is then translated into information and knowledge. The advent of high-performance computing and data analytics enabled real-time interrogation of extremely large data sets.

Language: Jupyter Notebook - Size: 400 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 121 - Forks: 86

microsoft/GODEL

Large-scale pretrained models for goal-directed dialog

Language: Python - Size: 49.8 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 869 - Forks: 112

deermichel/flowing

🔀 Rusty flow graph processing library

Language: Rust - Size: 17.6 KB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

infoslack/awesome-kafka

A list about Apache Kafka

Size: 96.7 KB - Last synced at: 1 day ago - Pushed at: 3 months ago - Stars: 579 - Forks: 164

KikiBoum4980/2025-One-Billion-Row-Challenge

One Billion Row project updated for 2025

Language: Python - Size: 438 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

louisejuliedelhaye/counting-ocean-particles

A set of simple scripts for processing data on marine suspended particles collected with different sensors

Language: Jupyter Notebook - Size: 7.33 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

venis-majkofci/Log2Csv

A PowerShell script designed to parse and convert unstructured log files into structured CSV format, facilitating easier analysis and processing.

Language: PowerShell - Size: 34.2 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

speedcell4/torchglyph

Data Processor Combinators for Natural Language Processing

Language: Python - Size: 546 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 7 - Forks: 1

ChenghaoMou/text-dedup

All-in-one text de-duplication

Language: Python - Size: 5.77 MB - Last synced at: 4 days ago - Pushed at: 28 days ago - Stars: 688 - Forks: 74

dd-hebert/uv_pro

Command line tool for parsing and processing UV-Vis data from the Agilent 845x Chemstation software.

Language: Python - Size: 4.96 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 2 - Forks: 1

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 726 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 113 - Forks: 51

StatCan/gensol-gseries

Package gseries: R version of the generalized system G-Series. English documentation: https://StatCan.github.io/gensol-gseries/en/ French documentation: https://StatCan.github.io/gensol-gseries/fr/

Language: R - Size: 12.6 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 5 - Forks: 1

AtomGraph/Processor

Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.

Language: Java - Size: 1.51 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 60 - Forks: 7

NVIDIA-NeMo/Curator

Scalable data pre-processing and curation toolkit for LLMs

Language: Jupyter Notebook - Size: 7.99 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 948 - Forks: 138

ObinnaOkoye89/diet-coach-app

A Python-based Diet Coach app that calculates total nutritional values—calories, fats, proteins, carbohydrates, and sugars—based on user-selected foods and quantities. Built using a JSON nutrition dataset for real-time feedback on dietary choices. Ideal for health-conscious users and developers interested in nutrition-focused applications.

Language: Python - Size: 238 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

technologiestiftung/erfrischungskarte-daten

Code for preprocessing and modeling, plus the raw and resulting data, for the 'Erfrischungskarte'.

Language: R - Size: 32.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8 - Forks: 2

arm-university/Arm-Helium-Technology

A reference book on M-Profile Vector Extensions (MVE) for Arm Cortex-M Processors

Size: 9.58 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 12 - Forks: 0

brunocampos01/data-engineering

Language: Python - Size: 165 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 11 - Forks: 2

subhayu99/datasetpipeline

A data processing and analysis pipeline designed to handle various jobs related to data transformation, quality assessment, deduplication, and formatting.

Language: Python - Size: 1.59 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

FaninhoFrade/GSP632-convolution

This repository offers a practical approach to image processing using convolutions and pooling on Google Cloud. 🖼️ Dive into hands-on experiments with SciPy and NumPy to enhance your understanding of deep learning concepts. 💻

Language: Python - Size: 373 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

microsoft/DialoGPT

Large-scale pretraining for dialogue

Language: Python - Size: 43.6 MB - Last synced at: 2 days ago - Pushed at: over 2 years ago - Stars: 2,389 - Forks: 348

ljubogdan/GSP632-convolution

A project for image processing using convolutions and pooling in Google Cloud. Load and process images with SciPy and NumPy, create 3x3 filters, and analyze output effects. Focuses on practical applications of deep learning and computer vision.

Language: Python - Size: 374 KB - Last synced at: 5 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

ictchenbo/SmartETL

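SmartETL: a simple, flexible, configurable, out-of-the-box Python ETL framework with domain-specific features that avoids reinventing the wheel. Provides processing workflows for open data sources such as Wikidata, Wikipedia, and GDELT; supports file formats such as txt/json/csv/excel and databases such as MySQL/PostgreSQL/MongoDB/ClickHouse/ElasticSearch as inputs and outputs; provides processing operators for large language models, Web APIs, and more.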

Language: Python - Size: 4.74 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 17 - Forks: 3

NVIDIA/nvImageCodec

nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface

Language: Jupyter Notebook - Size: 22.3 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 106 - Forks: 8

NVIDIA/DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Language: C++ - Size: 395 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 5,429 - Forks: 639

QuantumRevenant/ListProductsImages

ListProductsImages: C#/.NET 8 console utility for listing and filtering files in directories with advanced rules (folders, regex) and interactive menus. Exports to .txt.

Language: C# - Size: 36.1 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

polyaxon/haupt

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

Language: Python - Size: 1.16 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 453 - Forks: 209

CEA-MetroCarac/SPECTROview

SPECTROview : A Tool for Spectroscopic Data Processing and Visualization.

Language: Python - Size: 208 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

niamoto/niamoto

Niamoto is a command-line application and library focused on processing and publishing botanical data

Language: Python - Size: 11.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 0

dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Language: Elixir - Size: 656 KB - Last synced at: 7 days ago - Pushed at: 16 days ago - Stars: 2,536 - Forks: 167

abhimehro/Seatek_Analysis

R-based analysis tier for Seatek sensor data processing and Excel workbook generation. Part of a three-tier analysis system working in conjunction with a Python-based visualization project.

Language: Python - Size: 50.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

adanSiqueira/modular-data-pipeline

Data pipeline in Python, structured with Object-Oriented Programming (OOP), using pandas for processing, requests for automated downloads, and pathlib for directory handling. Modular and organized to transform raw data (JSON and CSV) into analysis-ready datasets with a single command.

Language: Python - Size: 56.1 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

simsam8/ers_data_processing

This repo contains code for processing and visualizing ERS and AIS data from Fiskeridirektoratet.

Language: Jupyter Notebook - Size: 556 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

tinosingh/multipass

Universal API Wrapper - Turn ANY Python Library into a Robust API

Language: Python - Size: 41 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

senbox-org/snap-engine

ESA Earth Observation Toolbox and Java Development Platform

Language: Java - Size: 1020 MB - Last synced at: 7 days ago - Pushed at: 11 days ago - Stars: 193 - Forks: 102

lispking/fluxus

Fluxus Stream Processing Engine

Language: Rust - Size: 5.02 MB - Last synced at: 7 days ago - Pushed at: 29 days ago - Stars: 150 - Forks: 22

aces/cbrain

CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.

Language: Ruby - Size: 20.4 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 78 - Forks: 51

mech-lang/mech

🦾 Mech is a programming language for building data-driven systems like robots, games, and interfaces. Start here!

Language: Rust - Size: 11.2 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 222 - Forks: 12

kwadwo-Oppong/gdp-natural-cubic-spline-regression

This project investigates economic growth factors, specifically GDP, by applying ordinary least squares (OLS) and a more robust, proposed estimator. It includes data preparation, feature engineering with natural cubic splines, and detailed analysis

Language: Jupyter Notebook - Size: 76.2 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

senbox-org/snap-desktop

Desktop GUI for SNAP based on NetBeans Platform

Language: Java - Size: 77.8 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 142 - Forks: 64

helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

Language: Python - Size: 21.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 222 - Forks: 55

etsap-TIMES/xl2times

Open source tool to convert TIMES models specified in Excel

Language: Python - Size: 931 KB - Last synced at: 4 days ago - Pushed at: 13 days ago - Stars: 18 - Forks: 9

markus-wa/cq

Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more

Language: Clojure - Size: 202 KB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 178 - Forks: 11

rohankharche34/Solar-panel-performance-optimization

Stacked regression ensemble using sensor and environmental data to forecast solar panel efficiency.

Language: Jupyter Notebook - Size: 5.25 MB - Last synced at: 8 minutes ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

havrak/fmcw-surveillance-radar

Repository for my bachelor's thesis on constructing a surveillance radar based on the FMCW SiRad Easy

Language: MATLAB - Size: 148 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1 - Forks: 0

legend-exp/Juleanita.jl

Meta-package for the Julia software stack to analyse teststand data for the LEGEND experiment.

Language: Julia - Size: 1.15 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 2 - Forks: 1

flow-php/etl-adapter-elasticsearch

PHP ETL Adapter: Elasticsearch

Language: PHP - Size: 289 KB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 1

flow-php/etl

PHP - ETL (Extract Transform Load) data processing library

Language: PHP - Size: 3.7 MB - Last synced at: 7 days ago - Pushed at: 13 days ago - Stars: 359 - Forks: 20

tealtools/awesome-apache-pulsar

A curated list of resources about Apache Pulsar.

Size: 367 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 30 - Forks: 3

abrahamkoloboe27/Housing-Price-Prediction

Language: Jupyter Notebook - Size: 937 KB - Last synced at: 13 days ago - Pushed at: 14 days ago - Stars: 2 - Forks: 0

zazuko/barnard59

An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.

Language: JavaScript - Size: 3.66 MB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 33 - Forks: 2

ddeutils/ddeutil-extensions

🏗️ Dynamic data processing & transformation plugins

Language: Python - Size: 604 KB - Last synced at: about 17 hours ago - Pushed at: 14 days ago - Stars: 0 - Forks: 0

karimosman89/iot-predictive-maintenance

This repository will simulate an IoT-based predictive maintenance system designed to monitor industrial equipment through sensors. It will include data ingestion, processing, and machine learning components to predict potential failures, optimizing maintenance schedules and reducing downtime.

Language: Python - Size: 38.1 KB - Last synced at: 5 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

MrGL1TCH/Exportador_csv_a_base_de_datos

Exportador CSV a Base de Datos (CSV to Database Exporter) is a web application designed to simplify and automate importing data from CSV files into MySQL databases. Ideal for users and developers who need a fast, reliable, easy-to-use tool for handling large volumes of data without technical complications.

Language: PHP - Size: 13.7 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

ColasGael/Machine-Learning-for-Solar-Energy-Prediction

Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning

Language: Python - Size: 922 MB - Last synced at: 7 days ago - Pushed at: over 5 years ago - Stars: 267 - Forks: 113

CoreBlader/autobiz-api-extractor

Autobiz API Extractor: this project extracts data from the [Autobiz API](https://corporate.autobiz.com/es/nuestros-productos/autobizapi/), storing it in JSON or CSV files, and analyzes the results. It features a modular structure for easy data extraction, processing, and visualization. 🐙📊

Language: Python - Size: 14.6 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

jwalsh/emacsconf-2024

EmacsConf 2024 conference notes, transcript processing, and analysis tools

Language: Python - Size: 356 KB - Last synced at: 3 days ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

getstrm/pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

Language: Kotlin - Size: 13.1 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 36 - Forks: 1

asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Language: Python - Size: 1.95 MB - Last synced at: 15 days ago - Pushed at: over 3 years ago - Stars: 91 - Forks: 5

pyper-dev/pyper

Concurrent Python made simple

Language: Python - Size: 462 KB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 1,421 - Forks: 28

IBM/ibm-cloud-functions-data-processing-message-hub 📦

Create a serverless, event-driven application with Apache OpenWhisk on IBM Cloud Functions that executes code in response to messages or to handle streams of data records from Apache Kafka or IBM Message Hub.

Language: Shell - Size: 1.55 MB - Last synced at: 13 days ago - Pushed at: about 6 years ago - Stars: 21 - Forks: 26

MDSplus/mdsplus

The MDSplus data management system

Language: Java - Size: 148 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 82 - Forks: 48

Kellybrackets/data-Analytics-projects

A collection of end-to-end analytics projects demonstrating expertise in transforming raw data into actionable business insights using modern analytics tools and methodologies.

Size: 1.95 KB - Last synced at: 18 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

Ambeteco/faster-os

6800% faster "os" module replacement. A drop-in replacement for Python's standard 'os' module. Fully rewritten, optimized, and sped-up functions that replace those in the os.path module.

Language: Python - Size: 1.53 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 2

Siteimprove/alfa

♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale

Language: TypeScript - Size: 52.2 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 115 - Forks: 12

ml6team/fondant

Production-ready data processing made easy and shareable

Language: Python - Size: 23 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 351 - Forks: 27

ElecGeek/HealthMeter

Converts binary (.DAT) files into a more readable format and computes some statistics for health or sport meters.

Language: C++ - Size: 53.7 KB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 0 - Forks: 0

remotesensinginfo/rsgislib

Remote Sensing and GIS Software Library; Python module tools for processing spatial data.

Language: C++ - Size: 140 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 153 - Forks: 28

Siteimprove/alfa-act-r

📋 Acceptance testing of rules authored by the ACT Rules Community Group (@act-rules) and implemented by Alfa

Language: TypeScript - Size: 34.6 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 2

AmirAli104/Text2Excel

A GUI desktop application that can extract data from a text file and put them in an Excel or CSV file using regular expression (regex) patterns

Language: Python - Size: 208 KB - Last synced at: 4 days ago - Pushed at: 21 days ago - Stars: 4 - Forks: 0
