GitHub topics: data-infrastructure
StructuredLabs/preswald
Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turning Python scripts into powerful shareable apps.
Language: Python - Size: 79.8 MB - Last synced at: about 2 hours ago - Pushed at: about 2 hours ago - Stars: 3,170 - Forks: 629

cocoindex-io/cocoindex
ETL framework to make your data AI-ready, with real-time incremental updates and support for custom logic that composes like Lego.
Language: Rust - Size: 3.58 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 579 - Forks: 42

uktrade/data-workspace
PostgreSQL-based open source data analysis platform
Language: HCL - Size: 1.89 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 2

ilssaf/data-platform-deployer
CLI tool for automatic data platform deployment
Language: Python - Size: 900 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Noobzik/ATL-Datamart
Business intelligence architecture lab for students at EPSI and DC Paris. The goal is to deploy a data architecture that runs from data retrieval through to presentation as data visualizations, by way of a data lake, a data warehouse, and a data mart.
Language: Python - Size: 465 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 103

CrunchyData/postgres-operator
Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
Language: Go - Size: 63.7 MB - Last synced at: 3 days ago - Pushed at: 10 days ago - Stars: 4,097 - Forks: 608

uktrade/stream-unzip
Python function to stream unzip all the files in a ZIP archive on the fly
Language: Python - Size: 727 KB - Last synced at: about 3 hours ago - Pushed at: 5 months ago - Stars: 293 - Forks: 14

uktrade/data-workspace-frontend
An open source data analysis platform with features for users with a range of technical skills
Language: Python - Size: 51.5 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 46 - Forks: 25

uktrade/kibana-paas
Dockerfile and associated files for deploying Kibana in GOV.UK PaaS
Language: Python - Size: 17.6 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

uktrade/pg-sync-roles
Python utility functions to ensure that a PostgreSQL role has certain permissions
Language: Python - Size: 350 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 1

zalando/spilo
Highly available elephant herd: HA PostgreSQL cluster using Docker
Language: Python - Size: 27.9 MB - Last synced at: 11 days ago - Pushed at: 20 days ago - Stars: 1,649 - Forks: 422

uktrade/pg-force-execute
Context manager to run PostgreSQL queries with SQLAlchemy, terminating any other clients that block it
Language: Python - Size: 88.9 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 4 - Forks: 0

uktrade/data-workspace-tools
Dockerfile for Data Workspace on-demand tools and related components
Language: HTML - Size: 4.42 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 0

uktrade/stream-read-ods
Python function to extract data from an ODS spreadsheet on the fly - without having to store the entire file in memory or disk
Language: Python - Size: 152 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 1

uktrade/stream-write-ods
Python function to construct an ODS spreadsheet on the fly - without having to store the entire file in memory or disk
Language: Python - Size: 143 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 0

tensorbase/tensorbase
TensorBase is a new big data warehouse built with modern efforts.
Language: Rust - Size: 32.9 MB - Last synced at: 15 days ago - Pushed at: almost 3 years ago - Stars: 1,447 - Forks: 119

uktrade/sqlite-s3vfs
Python writable virtual filesystem for SQLite on S3
Language: Python - Size: 159 KB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 175 - Forks: 10

uktrade/fargatespawner
Spawns JupyterHub single user servers in Docker containers running in AWS Fargate
Language: Python - Size: 68.4 KB - Last synced at: 16 days ago - Pushed at: 7 months ago - Stars: 48 - Forks: 23

apelullo/yelp_health_data_curation_ops
An AWS-based data pipeline to extract, process, store, and monitor Yelp "health-related" facility data in support of ongoing health system initiatives.
Language: Jupyter Notebook - Size: 1.17 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

abhishek-ch/data-machinelearning-the-boring-way
Build & Learn Data Engineering, Machine Learning over Kubernetes. No Shortcut approach.
Language: Python - Size: 3.33 MB - Last synced at: 18 days ago - Pushed at: over 2 years ago - Stars: 57 - Forks: 11

uktrade/dns-rewrite-proxy
A DNS proxy server that conditionally rewrites and filters A record requests
Language: Python - Size: 116 KB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 30 - Forks: 6

uktrade/activity-stream
Activity Stream is a collector of various interactions between contacts at companies.
Language: Python - Size: 1.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 3

uktrade/stream-read-xbrl
Python package to parse Companies House accounts data in a streaming way
Language: Python - Size: 751 KB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 22 - Forks: 6

uktrade/mobius3
Continuously sync folder to S3, using inotify under the hood
Language: Python - Size: 4.15 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 55 - Forks: 3

uktrade/mbtiles-s3-server
Python server to extract and serve vector tiles on the fly from an mbtiles file on S3
Language: Python - Size: 6.78 MB - Last synced at: 28 days ago - Pushed at: 7 months ago - Stars: 154 - Forks: 4

uktrade/data-workspace-gitlab
Language: Shell - Size: 10.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

zalando/PGObserver 📦
A battle-tested, flexible & comprehensive monitoring solution for your PostgreSQL databases
Language: Python - Size: 4.75 MB - Last synced at: 13 days ago - Pushed at: almost 5 years ago - Stars: 316 - Forks: 64

uktrade/stream-sqlite
Python function to extract rows from a SQLite file while iterating over its bytes
Language: Python - Size: 10.4 MB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 22 - Forks: 5

uktrade/pg-bulk-ingest
Python utility function to ingest data into a SQLAlchemy-defined PostgreSQL table
Language: Python - Size: 1010 KB - Last synced at: 21 days ago - Pushed at: about 2 months ago - Stars: 36 - Forks: 0

alphagov/consent-api
Service for sharing user consent to cookies across multiple domains
Language: Python - Size: 1.56 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

zalando/nakadi 📦
A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
Language: Java - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 956 - Forks: 292

LatiefDataVisionary/data-management-and-data-infrastructure-college-task
Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

iTrauco/streaming-data-platform
Skeleton streaming data platform on GCP...
Language: Python - Size: 12.4 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

uktrade/company-matching-service
Language: Python - Size: 130 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 2

uktrade/iterable-subprocess
Python context manager to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed
Language: Python - Size: 84 KB - Last synced at: 13 days ago - Pushed at: 7 months ago - Stars: 7 - Forks: 2

Corey4005/STEMNET-Daily-Files
The purpose of this repository is to create a data infrastructure that will communicate with the STEMNET server at the University of Alabama in Huntsville. In particular, the goal is to give anyone the capability to create clean daily files from all available stations on Linux machines.
Language: Python - Size: 128 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

uktrade/to-file-like-obj
Python utility function to convert an iterable of bytes or str to a readable file-like object
Language: Python - Size: 33.2 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 13 - Forks: 0

uktrade/s3-dropbox
A simple bearer token authenticated dropbox that drops its payloads into an S3 bucket, designed to run in AWS Lambda via a Function URL
Language: Python - Size: 85.9 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

uktrade/mirror-git-to-s3
Python functions and CLI to mirror git repositories to S3
Language: Python - Size: 113 KB - Last synced at: 28 days ago - Pushed at: 5 months ago - Stars: 3 - Forks: 1

uktrade/factset-data-loader
Download data from FactSet and output to an S3 bucket
Language: Shell - Size: 3.91 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

uktrade/hawk-server-asyncio
Utility function to perform the server-side of Hawk authentication for asyncio HTTP servers
Language: Python - Size: 50.8 KB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

uktrade/stream-zip
Python function to construct a ZIP archive on the fly
Language: Python - Size: 945 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 113 - Forks: 9

aivanzhang/panda_patrol
Language: Python - Size: 33.2 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 21 - Forks: 0

uktrade/legal-basis-api
Legal Basis for Consent Service API Server
Language: Python - Size: 1.23 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 1

zalando/postgres-operator
Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
Language: Go - Size: 32.9 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4,335 - Forks: 979

uktrade/uk-trade-quotas-dashboard
Source code for "UK trade quotas dashboard", a prototype for testing purposes only.
Language: Python - Size: 5.47 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

amkrajewski/mpdd-alignn Fork of usnistgov/alignn
MPDD Calculator for Atomistic Line Graph Neural Network Deployment
Language: Python - Size: 151 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 4 - Forks: 1

uktrade/public-data-api
The source for the Department for International Trade's Public Data API
Language: HTML - Size: 7.12 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 1

uktrade/streamlit-gov-uk-components
A collection of Streamlit components that use or are inspired by the GOV.UK Design System
Language: Shell - Size: 2.58 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

uktrade/jupyters3
Jupyter Notebook Contents Manager for AWS S3
Language: Python - Size: 128 KB - Last synced at: 29 days ago - Pushed at: 7 months ago - Stars: 18 - Forks: 6

uktrade/streampq
Python PostgreSQL adapter to stream results of multi-statement queries without a server-side cursor
Language: Python - Size: 229 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 8 - Forks: 0

uktrade/jwt-postgresql-proxy
Stateless JWT authentication in front of PostgreSQL
Language: Python - Size: 174 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 6 - Forks: 1

uktrade/python-streaming-left-join
Join iterables in code without loading them all in memory: similar to a SQL left join
Language: Python - Size: 41 KB - Last synced at: 16 days ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

uktrade/tidy-json-to-csv
Convert JSON to a set of tidy CSV files
Language: Python - Size: 60.5 KB - Last synced at: 27 days ago - Pushed at: 7 months ago - Stars: 23 - Forks: 1

uktrade/hawk-server 📦
Utility function to perform the server-side of Hawk authentication
Language: Python - Size: 42 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

uktrade/aio-throttle-to-next-second Fork of michalc/aiothrottler 📦
Throttler for asyncio Python that throttles to the next whole second
Language: Python - Size: 51.8 KB - Last synced at: 12 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

alphagov/sde-prototype-haas Fork of Nyzl/HaaS
SDE prototype dummy service - Hexagrams as a Service
Language: HTML - Size: 411 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

GiorgiaAuroraAdorni/virtual-CAT-data-infrastructure
This repository contains the data infrastructure for the Virtual Cross Array Task (CAT) platform designed to assess algorithmic skills among K-12 students.
Language: Java - Size: 939 KB - Last synced at: 10 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 1

carbonitech/data-api
Data Virtualization improving accessibility to datasets and enriching those datasets - for the HVAC Industry
Language: Python - Size: 1.9 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

zalando-nakadi/kanadi
Kanadi is a Nakadi client for Scala
Language: Scala - Size: 407 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 29 - Forks: 20

uktrade/vulnerability-priority-list
A command line report on a GitHub organisation's repositories, ordered by priority, and including time-to-SLA for each severity level
Language: Python - Size: 229 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 3 - Forks: 0

bizzabo/elasticsearch_to_bigquery_data_pipeline
A generic data pipeline which will map Elasticsearch documents to Bigquery table rows
Language: Kotlin - Size: 627 KB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 3

uktrade/data-workspace-superset
Language: Python - Size: 36.1 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

uktrade/data-workspace-mlflow
Language: Python - Size: 25.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

zalando-incubator/spark-json-schema
JSON schema parser for Apache Spark
Language: Scala - Size: 78.1 KB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 81 - Forks: 44

uktrade/countries-of-interest-service
Lightweight API service for querying for companies that have expressed interest in exporting to specific countries
Language: Python - Size: 2.06 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

uktrade/data-engineering-common
Library of common functionality used by data engineering microservices
Language: Python - Size: 84 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

anna-geller/kestra-terraform-examples
Bring Infrastructure as Code best practices to your data workflows with Kestra and Terraform
Language: HCL - Size: 746 KB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 5 - Forks: 0

uktrade/quicksight-bulk-update-datasets
Command line interface (CLI) to make bulk updates to Quicksight datasets
Language: Python - Size: 114 KB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

thedataengineeringbook/thedataengineeringbook
The Data Engineering Book - a data engineering book by Thais, for Thais
Language: JavaScript - Size: 1.54 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 103 - Forks: 43

uktrade/git-lfs-http-mirror
Simple Python server to serve a read only HTTP mirror of git repositories that use Large File Storage (LFS)
Language: Python - Size: 34.2 KB - Last synced at: 20 days ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

uktrade/postgresql-proxy
Language: Python - Size: 17.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

uktrade/theia-postgres
PostgreSQL plugin for Theia providing explorer, highlighting, diagnostics, and intellisense
Language: TypeScript - Size: 2.35 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

uktrade/dt08-data-tools 📦
Tools which may be useful for data processing and data science applications
Language: Python - Size: 42 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

alphagov/analytics-settings-database Fork of google/analytics-settings-database
Export Google Analytics (GA4 and UA) settings
Language: Python - Size: 47.9 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

uktrade/data-flow-metrics
Language: Python - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

SurenNihalani/incubator-iceberg Fork of apache/iceberg
Apache Iceberg (Incubating)
Language: Java - Size: 4.53 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

realize-engineering/pipebird
Pipebird is open source infrastructure for securely sharing data with customers.
Language: TypeScript - Size: 1.91 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 168 - Forks: 7

yennanliu/data_infra_repo
Collections of POC/dev data infrastructure. | #SE
Language: Python - Size: 7.06 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

alphagov/sde-prototype-govuk
A fake GOV.UK homepage and start pages for SDE prototype services
Language: HTML - Size: 343 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

Jzbonner/dataengineering-db
Information relating to topics on Data Engineering, Data Infrastructure, Data Storing, Data Warehouses and Business Analysis. For those interested in both conceptual theory and use case examples for database design and development.
Size: 1020 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 2

uktrade/mlflow-tracking-server 📦
Language: Python - Size: 4.88 KB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

zalando-incubator/darty 📦
Data dependency manager
Language: Python - Size: 35.2 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 22 - Forks: 3

uktrade/data-store-service
Language: Python - Size: 6.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

uktrade/ecs-new-task-definition
Creates a new task definition of an ECS task
Language: Shell - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 0 - Forks: 0

uktrade/kibana-proxy
Language: Python - Size: 15.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

uktrade/data-engineering-sample-app
A sample app showing how to use the data-engineering-common repo to create a lightweight Flask app with Hawk authentication
Language: Python - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0
