Topic: "data-integration"
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Language: Python - Size: 374 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 39,930 - Forks: 14,972

Avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
Language: Python - Size: 150 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 18,048 - Forks: 1,881

airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Language: Python - Size: 666 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 18,044 - Forks: 4,503

dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
Language: Python - Size: 1.26 GB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 13,081 - Forks: 1,668

apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Language: Java - Size: 42.4 MB - Last synced at: about 1 hour ago - Pushed at: about 2 hours ago - Stars: 8,478 - Forks: 1,966

mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Language: Python - Size: 233 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8,300 - Forks: 841

cloudquery/cloudquery
The developer-first cloud governance platform
Language: Go - Size: 172 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 6,083 - Forks: 528

apache/flink-cdc
Flink CDC is a streaming data integration tool
Language: Java - Size: 40.9 MB - Last synced at: 2 days ago - Pushed at: 12 days ago - Stars: 6,051 - Forks: 2,010

apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Language: Java - Size: 1.74 GB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 5,756 - Forks: 2,396

infinyon/fluvio
🦀 event stream processing for developers to stream and process data in motion to power responsive data intensive applications.
Language: Rust - Size: 34.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,916 - Forks: 514

jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Language: TypeScript - Size: 43 MB - Last synced at: about 7 hours ago - Pushed at: 5 days ago - Stars: 4,292 - Forks: 312

rudderlabs/rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
Language: Go - Size: 308 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4,176 - Forks: 330

DTStack/chunjun
A data integration framework
Language: Java - Size: 126 MB - Last synced at: 13 days ago - Pushed at: 2 months ago - Stars: 4,046 - Forks: 1,699

seandavi/awesome-single-cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Size: 1.43 MB - Last synced at: 12 days ago - Pushed at: 20 days ago - Stars: 3,358 - Forks: 1,016

bruin-data/ingestr
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Language: Python - Size: 167 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2,951 - Forks: 79

apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Language: Go - Size: 38.3 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 2,721 - Forks: 577

mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Language: Python - Size: 3.29 MB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 2,078 - Forks: 100

bytedance/bitsail
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Language: Java - Size: 26.4 MB - Last synced at: 27 days ago - Pushed at: over 1 year ago - Stars: 1,654 - Forks: 331

apache/hop
Hop Orchestration Platform
Language: Java - Size: 197 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,130 - Forks: 367

kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we bring third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
Language: JavaScript - Size: 7.79 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 792 - Forks: 54

apache/seatunnel-web
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Language: Java - Size: 17.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 677 - Forks: 302

artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Language: Go - Size: 3.85 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 651 - Forks: 33

immunogenomics/harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
Language: R - Size: 52.9 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 568 - Forks: 102

leesf/hudi-resources
A collection of Apache Hudi resources
Size: 23.7 MB - Last synced at: about 1 hour ago - Pushed at: about 3 hours ago - Stars: 551 - Forks: 161

saeyslab/nichenetr
NicheNet: predict active ligand-target links between interacting cells
Language: R - Size: 152 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 534 - Forks: 124

ConduitIO/conduit
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Language: Go - Size: 12.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 490 - Forks: 51

theislab/scarches
Reference mapping for single-cell genomics
Language: Jupyter Notebook - Size: 825 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 347 - Forks: 52

gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
Language: Python - Size: 1.42 MB - Last synced at: 14 days ago - Pushed at: 16 days ago - Stars: 344 - Forks: 26

CategoricalData/CQL
Categorical Query Language IDE
Language: Java - Size: 145 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 299 - Forks: 22

cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript - Size: 28 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 285 - Forks: 28

hetio/hetionet
Hetionet: an integrative network of disease
Language: HTML - Size: 380 MB - Last synced at: 28 days ago - Pushed at: about 2 years ago - Stars: 282 - Forks: 69

pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
Size: 219 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 274 - Forks: 29

CommonCoreOntology/CommonCoreOntologies
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
Language: Makefile - Size: 16.5 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 230 - Forks: 61

dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Language: JavaScript - Size: 281 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 226 - Forks: 33

slowkow/harmonypy
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Language: Python - Size: 2.77 MB - Last synced at: 6 days ago - Pushed at: 10 months ago - Stars: 217 - Forks: 22

morph-kgc/morph-kgc
Powerful RDF Knowledge Graph Generation with RML Mappings
Language: Python - Size: 32.8 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 209 - Forks: 39

opensanctions/nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Language: Python - Size: 2.54 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 209 - Forks: 38

mara/mara-example-project-2
An example mini data warehouse for python project stats, template for new projects
Language: Python - Size: 24 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 178 - Forks: 39

ceumicrodata/mETL
mito ETL tool
Language: Python - Size: 7.43 MB - Last synced at: 4 days ago - Pushed at: almost 4 years ago - Stars: 163 - Forks: 41

mims-harvard/scikit-fusion
scikit-fusion: Data fusion via collective latent factor models
Language: Python - Size: 9.28 MB - Last synced at: 26 days ago - Pushed at: almost 2 years ago - Stars: 147 - Forks: 44

google/megalista 📦
First Party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products (Google Ads, Campaign Manager, Google Analytics).
Language: Python - Size: 1.34 MB - Last synced at: 22 days ago - Pushed at: 4 months ago - Stars: 137 - Forks: 55

genular/pandora
PANDORA 💻
Language: Vue - Size: 16.4 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 135 - Forks: 21

SDM-TIB/SDM-RDFizer
An Efficient RML-Compliant Engine for Knowledge Graph Construction
Language: Python - Size: 21 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 118 - Forks: 25

starlake-ai/starlake
Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Language: Scala - Size: 170 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 107 - Forks: 23

olehmberg/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Language: Java - Size: 18.6 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 105 - Forks: 32

thedataengineeringbook/thedataengineeringbook
The Data Engineering Book - a data engineering book by Thai people, for Thai people
Language: JavaScript - Size: 1.54 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 103 - Forks: 43

Teichlab/cellhint
A tool for semi-automatic cell type harmonization and integration
Language: Python - Size: 6.78 MB - Last synced at: 28 days ago - Pushed at: about 2 months ago - Stars: 102 - Forks: 14

runprism/prism
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Language: Python - Size: 2.42 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 85 - Forks: 2

SysBioChalmers/GECKO
Toolbox for including enzyme constraints on a genome-scale model.
Language: MATLAB - Size: 107 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 70 - Forks: 52

saezlab/cosmosR
COSMOS (Causal Oriented Search of Multi-Omic Space) is a method that integrates phosphoproteomics, transcriptomics, and metabolomics data sets.
Language: R - Size: 53.2 MB - Last synced at: 23 days ago - Pushed at: 2 months ago - Stars: 60 - Forks: 16

munchy-bytes/SchemaMapper
A .NET class library that allows you to import data from different sources into a unified destination
Language: C# - Size: 5.9 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 60 - Forks: 16

jupyter-naas/drivers
Low-code Python library enabling access to APIs, tools, data sources in seconds.
Language: Python - Size: 1.53 MB - Last synced at: 28 days ago - Pushed at: 10 months ago - Stars: 59 - Forks: 12

siyul-park/uniflow
A high-performance, extremely flexible, and easily extensible universal workflow engine.
Language: Go - Size: 2.94 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 51 - Forks: 5

CogStack/CogStack-NiFi
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Language: Python - Size: 74.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 50 - Forks: 20

DP6/marketing-data-sync (fork of google/megalista)
First Party data integration solution built for marketing teams to enable audience and conversion onboarding into Google Marketing products and Facebook Ads.
Language: Python - Size: 959 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 49 - Forks: 6

linkml/linkml-model
Link Modeling Language (LinkML) model
Language: Python - Size: 12.7 MB - Last synced at: 23 days ago - Pushed at: 24 days ago - Stars: 48 - Forks: 20

datasphere-oss/datasphere-integration
A data-centric integration platform
Language: Java - Size: 20.7 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 48 - Forks: 17

umer7/Data-Warehouse-Concepts-Design-and-Data-Integration
Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (Coursera): notes, assignments, quizzes, and research papers
Size: 35 MB - Last synced at: 6 months ago - Pushed at: almost 7 years ago - Stars: 45 - Forks: 32

neuroforgede/nfcompose
Build REST APIs/Integrations in minutes instead of hours - NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by postgres and support automatic consumer webhook notifications on data changes out of the box.
Language: Python - Size: 2.57 MB - Last synced at: about 6 hours ago - Pushed at: 24 days ago - Stars: 39 - Forks: 3

Azure/data-product-batch
Template to deploy a Data Product for Batch data processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
Language: Bicep - Size: 11.3 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 22

mara/mara-etl-tools
Utilities for creating ETL pipelines with mara
Language: PLpgSQL - Size: 54.7 KB - Last synced at: about 23 hours ago - Pushed at: almost 3 years ago - Stars: 36 - Forks: 4

Azure/data-product-streaming
Template to deploy a Data Product for data stream processing into a Data Landing Zone of the Data Management & Analytics Scenario (formerly Enterprise-Scale Analytics). The Data Product template can be used by cross-functional teams to ingest, provide and create new data assets within the platform.
Language: Bicep - Size: 12.1 MB - Last synced at: about 1 year ago - Pushed at: almost 2 years ago - Stars: 35 - Forks: 12

AltschulerWu-Lab/MUSE
MUSE is a deep learning approach characterizing tissue composition through combined analysis of morphologies and transcriptional states for spatially resolved transcriptomics data.
Language: Jupyter Notebook - Size: 153 MB - Last synced at: 19 days ago - Pushed at: about 3 years ago - Stars: 34 - Forks: 8

selbouhaddani/OmicsPLS
R package for high-dimensional data analysis and integration with O2PLS!
Language: HTML - Size: 31.4 MB - Last synced at: 28 days ago - Pushed at: about 1 month ago - Stars: 32 - Forks: 8

DerwenAI/ERKG
Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph
Size: 13.9 MB - Last synced at: 9 days ago - Pushed at: 9 months ago - Stars: 32 - Forks: 6

JonnyTran/OpenOmics
A bioinformatics API to interface with public multi-omics bio databases for wicked fast data integration.
Language: Python - Size: 68.5 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 32 - Forks: 11

oeg-upm/mapeathor
Translator of spreadsheet mappings into R2RML, RML or YARRRML
Language: Python - Size: 58.8 MB - Last synced at: 30 days ago - Pushed at: 12 months ago - Stars: 32 - Forks: 10

dhimmel/integrate
Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
Language: Jupyter Notebook - Size: 565 MB - Last synced at: 29 days ago - Pushed at: over 7 years ago - Stars: 32 - Forks: 17

zazuko/barnard59
An intuitive and flexible RDF pipeline solution designed to simplify and automate ETL processes for efficient data management.
Language: JavaScript - Size: 3.66 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 30 - Forks: 2

linkedin/data-integration-library
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
Language: Java - Size: 1.51 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 30 - Forks: 14

YangLabHKUST/Portal
Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets
Language: Python - Size: 119 KB - Last synced at: 22 days ago - Pushed at: almost 2 years ago - Stars: 30 - Forks: 6

DTUComputeStatisticsAndDataAnalysis/MBPLS
(Multiblock) Partial Least Squares Regression for Python
Language: Python - Size: 16.6 MB - Last synced at: 8 days ago - Pushed at: over 5 years ago - Stars: 30 - Forks: 7

thymeflow/thymeflow
Installer for Thymeflow, a personal knowledge management system.
Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: about 7 years ago - Stars: 30 - Forks: 5

raamana/pyradigm
Research data management in biomedical and machine learning applications
Language: Python - Size: 7.25 MB - Last synced at: 8 days ago - Pushed at: about 2 years ago - Stars: 29 - Forks: 12

cthoyt/doctoral-thesis
📖 Generation and Applications of Knowledge Graphs in Systems and Networks Biology
Language: TeX - Size: 68.6 MB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 29 - Forks: 2

ginkgobioworks/geckopy
Enzyme-constrained genome-scale models in Python
Language: Python - Size: 4.84 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 26 - Forks: 7

dosorio/rPanglaoDB
An R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database into a Seurat object.
Language: HTML - Size: 2.24 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 26 - Forks: 3

cloudquery/plugin-sdk
CloudQuery Go SDK for source and destination plugins
Language: Go - Size: 18.2 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 25 - Forks: 25

glasgowcompbio/pyMultiOmics
Python toolbox for multi-omics data mapping and analysis
Language: Jupyter Notebook - Size: 45.9 MB - Last synced at: 22 days ago - Pushed at: about 2 years ago - Stars: 24 - Forks: 5

davidfoerster/schema-matching
Match schema attributes of relational databases by value similarity. As a study assignment, this isn't well documented, but you can contact me for questions and I may even add docs, if I sense enough interest.
Language: Python - Size: 271 KB - Last synced at: 17 days ago - Pushed at: over 5 years ago - Stars: 24 - Forks: 8

JinmiaoChenLab/FastIntegration
FastIntegrate integrates thousands of scRNA-seq datasets and outputs batch-corrected values for downstream analysis
Language: R - Size: 2.37 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 23 - Forks: 4

shuxiaoc/mario-py
MARIO: single-cell proteomic data matching and integration using both shared and distinct features
Language: Jupyter Notebook - Size: 660 MB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 23 - Forks: 2

abcsys/libem
Compound AI toolchain for fast and accurate entity matching, powered by LLMs.
Language: Python - Size: 3.54 MB - Last synced at: about 13 hours ago - Pushed at: about 2 months ago - Stars: 22 - Forks: 4

yezhengSTAT/ADTnorm
ADTnorm normalizes the cell surface protein measurement of CITE-seq data, facilitating data integration across batches and across studies.
Language: R - Size: 48.2 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 22 - Forks: 5

bio2bel/bio2bel
A Python framework for integrating biological databases and structured data sources in Biological Expression Language (BEL)
Language: Python - Size: 417 KB - Last synced at: about 15 hours ago - Pushed at: over 3 years ago - Stars: 21 - Forks: 5

JohnnyBravo75/DataBridge.NET
Configurable data bridge for permanent ETL jobs
Language: C# - Size: 11.1 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 20 - Forks: 10

Amine-Smahi/R-Learning-Journey
Some of the projects I made when starting to learn R for Data Science at the university
Language: R - Size: 63.5 KB - Last synced at: about 1 month ago - Pushed at: almost 6 years ago - Stars: 20 - Forks: 0

CloudFormations/CF.Cumulus
A cloud data platform product to accelerate time to insights. Our open-source framework is designed for the real world. Stripping away the complexity, giving you the power to build, scale, and manage your dataflows with ease, accelerating data delivery.
Language: TSQL - Size: 10.5 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 19 - Forks: 11

caokai1073/Pamona
The software of Pamona, a partial manifold alignment algorithm.
Language: Jupyter Notebook - Size: 41.4 MB - Last synced at: 26 days ago - Pushed at: about 4 years ago - Stars: 19 - Forks: 3

NPLinker/nplinker
A Python framework for microbial natural products data mining by integrating genomics and metabolomics data
Language: Python - Size: 116 MB - Last synced at: 20 days ago - Pushed at: 27 days ago - Stars: 18 - Forks: 13

oeg-upm/gtfs-bench
GTFS-Madrid-Bench: A Benchmark for Knowledge Graph Construction Engines
Language: Python - Size: 197 MB - Last synced at: 14 minutes ago - Pushed at: about 2 months ago - Stars: 18 - Forks: 13

scify/jedai-ui
UI for JedAI Toolkit
Language: Java - Size: 1.09 MB - Last synced at: 29 days ago - Pushed at: almost 3 years ago - Stars: 17 - Forks: 5

cutterkom/remove-na-lgbtiq-queer-knowledge-graph
A knowledge graph on queer history
Language: R - Size: 9.45 MB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 16 - Forks: 1

alexkychen/assignPOP
Population Assignment using Genetic, Non-genetic or Integrated Data in a Machine-learning Framework. Methods in Ecology and Evolution. 2018;9:439–446.
Language: R - Size: 8.81 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 4

NYXFLOWER/GripNet
GripNet: Graph Information Propagation on Supergraph for Heterogeneous Graphs (Pattern Recognition, 2023)
Language: Python - Size: 88 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 2

MeltanoLabs/Singer-Working-Group
Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
Size: 28.3 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 4

KarrLab/datanator
Toolkit for discovering and aggregating data for whole-cell modeling
Language: Python - Size: 73.9 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 4

michaelbironneau/analyst
A declarative, SQL-like DSL for data integration tasks.
Language: Go - Size: 4.05 MB - Last synced at: 11 months ago - Pushed at: almost 7 years ago - Stars: 14 - Forks: 2

cognitedata/python-extractor-utils
Framework for developing extractors in Python
Language: Python - Size: 1.24 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 13 - Forks: 5

lisad/phaser
The missing layer for complex data batch integration pipelines
Language: Python - Size: 548 KB - Last synced at: 18 days ago - Pushed at: 2 months ago - Stars: 13 - Forks: 1
