An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: data-infrastructure

StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turning Python scripts into powerful shareable apps.

Language: Python - Size: 79.8 MB - Last synced at: about 2 hours ago - Pushed at: about 2 hours ago - Stars: 3,170 - Forks: 629

cocoindex-io/cocoindex

ETL framework to make your data AI-ready, with real-time incremental updates and support for custom logic that composes like Lego.

Language: Rust - Size: 3.58 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 579 - Forks: 42

uktrade/data-workspace

PostgreSQL-based open source data analysis platform

Language: HCL - Size: 1.89 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 2

ilssaf/data-platform-deployer

CLI tool for automatic data platform deployment

Language: Python - Size: 900 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Noobzik/ATL-Datamart

Business intelligence architecture lab for students at EPSI and DC Paris. The goal is to deploy a data architecture that runs from data retrieval through to presentation as data visualizations, by way of a data lake, a data warehouse, and a data mart.

Language: Python - Size: 465 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 103

CrunchyData/postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.

Language: Go - Size: 63.7 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 4,097 - Forks: 608

uktrade/stream-unzip

Python function to stream unzip all the files in a ZIP archive on the fly

Language: Python - Size: 727 KB - Last synced at: about 3 hours ago - Pushed at: 5 months ago - Stars: 293 - Forks: 14

uktrade/data-workspace-frontend

An open source data analysis platform with features for users with a range of technical skills

Language: Python - Size: 51.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 46 - Forks: 25

uktrade/kibana-paas

Dockerfile and associated files for deploying Kibana in GOV.UK PaaS

Language: Python - Size: 17.6 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

uktrade/pg-sync-roles

Python utility functions to ensure that a PostgreSQL role has certain permissions

Language: Python - Size: 350 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 1

zalando/spilo

Highly available elephant herd: HA PostgreSQL cluster using Docker

Language: Python - Size: 27.9 MB - Last synced at: 11 days ago - Pushed at: 20 days ago - Stars: 1,649 - Forks: 422

uktrade/pg-force-execute

Context manager to run PostgreSQL queries with SQLAlchemy, terminating any other clients that block it

Language: Python - Size: 88.9 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 4 - Forks: 0

uktrade/data-workspace-tools

Dockerfile for Data Workspace on-demand tools and related components

Language: HTML - Size: 4.42 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 0

uktrade/stream-read-ods

Python function to extract data from an ODS spreadsheet on the fly - without having to store the entire file in memory or disk

Language: Python - Size: 152 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 1

uktrade/stream-write-ods

Python function to construct an ODS spreadsheet on the fly - without having to store the entire file in memory or disk

Language: Python - Size: 143 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 0

tensorbase/tensorbase

TensorBase is a new big data warehouse built with a modern approach.

Language: Rust - Size: 32.9 MB - Last synced at: 15 days ago - Pushed at: almost 3 years ago - Stars: 1,447 - Forks: 119

uktrade/sqlite-s3vfs

Python writable virtual filesystem for SQLite on S3

Language: Python - Size: 159 KB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 175 - Forks: 10

uktrade/fargatespawner

Spawns JupyterHub single user servers in Docker containers running in AWS Fargate

Language: Python - Size: 68.4 KB - Last synced at: 16 days ago - Pushed at: 7 months ago - Stars: 48 - Forks: 23

apelullo/yelp_health_data_curation_ops

An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.

Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

abhishek-ch/data-machinelearning-the-boring-way

Build & Learn Data Engineering and Machine Learning over Kubernetes. No-shortcut approach.

Language: Python - Size: 3.33 MB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 11

uktrade/dns-rewrite-proxy

A DNS proxy server that conditionally rewrites and filters A record requests

Language: Python - Size: 116 KB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 30 - Forks: 6

uktrade/activity-stream

Activity Stream is a collector of various interactions between contacts at companies.

Language: Python - Size: 1.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 3

uktrade/stream-read-xbrl

Python package to parse Companies House accounts data in a streaming way

Language: Python - Size: 751 KB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 22 - Forks: 6

uktrade/mobius3

Continuously sync folder to S3, using inotify under the hood

Language: Python - Size: 4.15 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 55 - Forks: 3

uktrade/mbtiles-s3-server

Python server to extract and serve vector tiles on the fly from an mbtiles file on S3

Language: Python - Size: 6.78 MB - Last synced at: 28 days ago - Pushed at: 7 months ago - Stars: 154 - Forks: 4

uktrade/data-workspace-gitlab

Language: Shell - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zalando/PGObserver 📦

A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases

Language: Python - Size: 4.75 MB - Last synced at: 13 days ago - Pushed at: almost 5 years ago - Stars: 316 - Forks: 64

uktrade/stream-sqlite

Python function to extract rows from a SQLite file while iterating over its bytes

Language: Python - Size: 10.4 MB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 22 - Forks: 5

uktrade/pg-bulk-ingest

Python utility function to ingest data into a SQLAlchemy-defined PostgreSQL table

Language: Python - Size: 1010 KB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 36 - Forks: 0

alphagov/consent-api

Service for sharing user consent to cookies across multiple domains

Language: Python - Size: 1.56 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

zalando/nakadi 📦

A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues

Language: Java - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 956 - Forks: 292

LatiefDataVisionary/data-management-and-data-infrastructure-college-task

Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

iTrauco/streaming-data-platform

Skeleton streaming data platform on GCP...

Language: Python - Size: 12.4 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

uktrade/company-matching-service

Language: Python - Size: 130 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 2

uktrade/iterable-subprocess

Python context manager to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed

Language: Python - Size: 84 KB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 7 - Forks: 2

Corey4005/STEMNET-Daily-Files

The purpose of this repository is to create a data infrastructure that will communicate with the STEMNET server at the University of Alabama in Huntsville. In particular, the goal is to give anyone the capability to create clean daily files from all available stations on Linux machines.

Language: Python - Size: 128 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

uktrade/to-file-like-obj

Python utility function to convert an iterable of bytes or str to a readable file-like object

Language: Python - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 13 - Forks: 0
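
The idea behind an entry like this, exposing an iterable of bytes as a readable file-like object, can be sketched with the standard library alone. The class name `IterableReader` below is hypothetical and is not taken from the to-file-like-obj package; this is only a minimal illustration of the technique, assuming the input yields `bytes` chunks.

```python
import io


class IterableReader(io.RawIOBase):
    """Hypothetical helper: wraps an iterable of bytes as a raw readable stream."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._buffer = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Pull chunks until there is something to return, skipping empty ones.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                return 0  # Signals end of stream.
        n = min(len(b), len(self._buffer))
        b[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]
        return n


def to_reader(chunks):
    """Return a buffered file-like object over an iterable of bytes."""
    return io.BufferedReader(IterableReader(chunks))
```

Only one chunk is held at a time, so the source iterable can be far larger than memory; wrapping in `io.BufferedReader` adds the usual `read(size)` and `readline()` behaviour.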

uktrade/s3-dropbox

A simple bearer token authenticated dropbox that drops its payloads into an S3 bucket, designed to run in AWS Lambda via a Function URL

Language: Python - Size: 85.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

uktrade/mirror-git-to-s3

Python functions and CLI to mirror git repositories to S3

Language: Python - Size: 113 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 1

uktrade/factset-data-loader

Download data from FactSet and output to an S3 bucket

Language: Shell - Size: 3.91 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

uktrade/hawk-server-asyncio

Utility function to perform the server-side of Hawk authentication for asyncio HTTP servers

Language: Python - Size: 50.8 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

uktrade/stream-zip

Python function to construct a ZIP archive on the fly

Language: Python - Size: 945 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 113 - Forks: 9

aivanzhang/panda_patrol

Language: Python - Size: 33.2 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 21 - Forks: 0

uktrade/legal-basis-api

Legal Basis for Consent Service API Server

Language: Python - Size: 1.23 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 1

zalando/postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes

Language: Go - Size: 32.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4,335 - Forks: 979

uktrade/uk-trade-quotas-dashboard

Source code for "UK trade quotas dashboard", a prototype for testing purposes only.

Language: Python - Size: 5.47 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

amkrajewski/mpdd-alignn Fork of usnistgov/alignn

MPDD Calculator for Atomistic Line Graph Neural Network Deployment

Language: Python - Size: 151 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 1

uktrade/public-data-api

The source for the Department for International Trade's Public Data API

Language: HTML - Size: 7.12 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 1

uktrade/streamlit-gov-uk-components

A collection of Streamlit components that use or are inspired by the GOV.UK Design System

Language: Shell - Size: 2.58 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

uktrade/jupyters3

Jupyter Notebook Contents Manager for AWS S3

Language: Python - Size: 128 KB - Last synced at: 29 days ago - Pushed at: 7 months ago - Stars: 18 - Forks: 6

uktrade/streampq

Python PostgreSQL adapter to stream results of multi-statement queries without a server-side cursor

Language: Python - Size: 229 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 8 - Forks: 0

uktrade/jwt-postgresql-proxy

Stateless JWT authentication in front of PostgreSQL

Language: Python - Size: 174 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 6 - Forks: 1

uktrade/python-streaming-left-join

Join iterables in code without loading them all in memory: similar to a SQL left join

Language: Python - Size: 41 KB - Last synced at: 16 days ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0
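
A left join over iterables that never loads either side fully into memory can be sketched as a merge join. The function name `streaming_left_join` and its signature are hypothetical and are not the API of the python-streaming-left-join package; the sketch also assumes both inputs are already sorted by the join key.

```python
from itertools import groupby
from operator import itemgetter


def streaming_left_join(left, right, key=itemgetter(0)):
    """Yield (left_row, right_row or None) pairs, like a SQL left join.

    Assumes both iterables are sorted ascending by `key`. Only one group
    of right-hand rows sharing a key is held in memory at a time.
    """
    right_groups = groupby(right, key=key)
    current_key, current_rows = None, []
    exhausted = False

    def advance():
        nonlocal current_key, current_rows, exhausted
        group = next(right_groups, None)
        if group is None:
            exhausted = True
        else:
            # Materialise the group before groupby moves on to the next one.
            current_key, current_rows = group[0], list(group[1])

    advance()
    for left_row in left:
        k = key(left_row)
        # Skip right-hand groups whose key is behind the current left key.
        while not exhausted and current_key < k:
            advance()
        if not exhausted and current_key == k:
            for right_row in current_rows:
                yield left_row, right_row
        else:
            yield left_row, None
```

Left rows with no match are paired with `None`, and a left row matching several right rows is emitted once per match, which mirrors left join semantics.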

uktrade/tidy-json-to-csv

Convert JSON to a set of tidy CSV files

Language: Python - Size: 60.5 KB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 23 - Forks: 1

uktrade/hawk-server 📦

Utility function to perform the server-side of Hawk authentication

Language: Python - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

uktrade/aio-throttle-to-next-second Fork of michalc/aiothrottler 📦

Throttler for asyncio Python that throttles to the next whole second

Language: Python - Size: 51.8 KB - Last synced at: 12 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

alphagov/sde-prototype-haas Fork of Nyzl/HaaS

SDE prototype dummy service - Hexagrams as a Service

Language: HTML - Size: 411 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

GiorgiaAuroraAdorni/virtual-CAT-data-infrastructure

This repository contains the data infrastructure for the Virtual Cross Array Task (CAT) platform designed to assess algorithmic skills among K-12 students.

Language: Java - Size: 939 KB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 1

carbonitech/data-api

Data Virtualization improving accessibility to datasets and enriching those datasets - for the HVAC Industry

Language: Python - Size: 1.9 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

zalando-nakadi/kanadi

Kanadi is a Nakadi client for Scala

Language: Scala - Size: 407 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 29 - Forks: 20

uktrade/vulnerability-priority-list

A command line report on a GitHub organisation's repositories, ordered by priority, and including time-to-SLA for each severity level

Language: Python - Size: 229 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

bizzabo/elasticsearch_to_bigquery_data_pipeline

A generic data pipeline that maps Elasticsearch documents to BigQuery table rows

Language: Kotlin - Size: 627 KB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 3

uktrade/data-workspace-superset

Language: Python - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

uktrade/data-workspace-mlflow

Language: Python - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

zalando-incubator/spark-json-schema

JSON schema parser for Apache Spark

Language: Scala - Size: 78.1 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 81 - Forks: 44

uktrade/countries-of-interest-service

Lightweight API service for querying for companies that have expressed interest in exporting to specific countries

Language: Python - Size: 2.06 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

uktrade/data-engineering-common

Library of common functionality used by data engineering microservices

Language: Python - Size: 84 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

anna-geller/kestra-terraform-examples

Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform

Language: HCL - Size: 746 KB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

uktrade/quicksight-bulk-update-datasets

Command line interface (CLI) to make bulk updates to QuickSight datasets

Language: Python - Size: 114 KB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

thedataengineeringbook/thedataengineeringbook

The Data Engineering Book - a data engineering book by Thai people, for Thai people

Language: JavaScript - Size: 1.54 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 103 - Forks: 43

uktrade/git-lfs-http-mirror

Simple Python server to serve a read only HTTP mirror of git repositories that use Large File Storage (LFS)

Language: Python - Size: 34.2 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

uktrade/postgresql-proxy

Language: Python - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

uktrade/theia-postgres

PostgreSQL plugin for Theia providing explorer, highlighting, diagnostics, and intellisense

Language: TypeScript - Size: 2.35 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

uktrade/dt08-data-tools 📦

Tools which may be useful for data processing and data science applications

Language: Python - Size: 42 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

alphagov/analytics-settings-database Fork of google/analytics-settings-database

Export Google Analytics (GA4 and UA) settings

Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

uktrade/data-flow-metrics

Language: Python - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

SurenNihalani/incubator-iceberg Fork of apache/iceberg

Apache Iceberg (Incubating)

Language: Java - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

realize-engineering/pipebird

Pipebird is open source infrastructure for securely sharing data with customers.

Language: TypeScript - Size: 1.91 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 168 - Forks: 7

yennanliu/data_infra_repo

Collections of POC/dev data infrastructure. | #SE

Language: Python - Size: 7.06 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

alphagov/sde-prototype-govuk

A fake GOV.UK homepage and start pages for SDE prototype services

Language: HTML - Size: 343 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

Jzbonner/dataengineering-db

Information relating to topics on Data Engineering, Data Infrastructure, Data Storing, Data Warehouses and Business Analysis. For those interested in both conceptual theory and use case examples for database design and development.

Size: 1020 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 2

uktrade/mlflow-tracking-server 📦

Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

zalando-incubator/darty 📦

Data dependency manager

Language: Python - Size: 35.2 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 22 - Forks: 3

uktrade/data-store-service

Language: Python - Size: 6.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

uktrade/ecs-new-task-definition

Creates a new task definition of an ECS task

Language: Shell - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

uktrade/kibana-proxy

Language: Python - Size: 15.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

uktrade/data-engineering-sample-app

A sample app showing how to use the data-engineering-common repo to create a lightweight Flask app with Hawk authentication

Language: Python - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0