An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: ingestion

cyclotruc/gitingest

Replace 'hub' with 'ingest' in any github url to get a prompt-friendly extract of a codebase

Language: Python - Size: 505 KB - Last synced at: about 5 hours ago - Pushed at: about 1 month ago - Stars: 8,890 - Forks: 698
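
The URL swap in the description above can be sketched in a few lines of Python. This is a minimal illustration, not code from the gitingest project: the helper name `to_gitingest_url` is hypothetical, and the only behavior taken from the description is that 'hub' becomes 'ingest' in the host of a GitHub URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Replace 'hub' with 'ingest' in the host of a github.com URL.

    Hypothetical helper for illustration; the path, query, and fragment
    are passed through unchanged.
    """
    parts = urlsplit(github_url)
    host = parts.netloc.lower()
    # Only rewrite URLs that actually point at GitHub.
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url!r}")
    # 'github.com' -> 'gitingest.com' (first occurrence of 'hub' only).
    new_host = host.replace("hub", "ingest", 1)
    return urlunsplit(
        (parts.scheme, new_host, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_gitingest_url("https://github.com/cyclotruc/gitingest"))
```

Restricting the rewrite to the host keeps repository or file names that happen to contain "hub" from being altered.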

giulianoc/CatraMMS

Media Management System: ingestion, processing, encoding, delivery, ...

Language: C++ - Size: 89.8 MB - Last synced at: about 17 hours ago - Pushed at: 1 day ago - Stars: 39 - Forks: 15

getlago/lago

Open Source Metering and Usage Based Billing API ⭐️ Consumption tracking, Subscription management, Pricing iterations, Payment orchestration & Revenue analytics

Language: Go - Size: 132 MB - Last synced at: 1 day ago - Pushed at: 5 days ago - Stars: 7,686 - Forks: 377

jitsucom/bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

Language: Go - Size: 5.65 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 176 - Forks: 28

opensearch-project/data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.

Language: Java - Size: 133 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 304 - Forks: 232

StarlightSearch/EmbedAnything

Production-ready Inference, Ingestion and Indexing built in Rust 🦀

Language: Rust - Size: 37.1 MB - Last synced at: 5 days ago - Pushed at: 7 days ago - Stars: 585 - Forks: 51

netboxlabs/diode

Diode data model and ingestion services for NetBox, from NetBox Labs

Language: Go - Size: 2.2 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 97 - Forks: 8

lcandy2/gitingest-extension

✨ An extension that helps you open Gitingest to turn any Git repository into a prompt-friendly text ingest for LLMs.

Language: TypeScript - Size: 111 KB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 148 - Forks: 13

NASA-PDS/nucleus

Nucleus is a software platform used to create workflows for the Planetary Data System (PDS).

Language: HCL - Size: 15.8 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

mrsimonemms/gobblr

Make your development databases gobble up known data

Language: Go - Size: 138 KB - Last synced at: 4 days ago - Pushed at: 11 days ago - Stars: 6 - Forks: 0

apache/gobblin

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

Language: Java - Size: 127 MB - Last synced at: 1 day ago - Pushed at: 12 days ago - Stars: 2,237 - Forks: 750

ryhkml/ytingest

Extract YouTube video, feed it to any LLM as knowledge

Language: C - Size: 191 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

netboxlabs/diode-netbox-plugin

Official NetBox Labs plugin for NetBox for Diode

Language: Python - Size: 443 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 46 - Forks: 11

akram0zaki/breach-ingestor

A resilient, prefix-sharded ingestion pipeline for large static breach dumps (e.g. AntiPublic), optimized for low-resource environments (e.g., Raspberry Pi + NAS/SSD).

Language: JavaScript - Size: 25.4 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

PerisN/Data-Ingestion-Pipeline

Python-based data pipeline that extracts CSV files from a ZIP archive, converts them to Parquet format, and ingests them into a PostgreSQL database. Ideal for automating ETL workflows with minimal configuration.

Language: Jupyter Notebook - Size: 104 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

datainsider-co/rocket-bi

A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica

Language: TypeScript - Size: 69.5 MB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 111 - Forks: 31

alekLukanen/ChapterhouseDB-v1

Allows you to create simple data streaming warehouses written in Golang using Apache Parquet and Arrow.

Language: Go - Size: 189 KB - Last synced at: 8 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

ylem-co/ylem

Ylem is an open-source platform for real-time data streaming orchestration

Language: JavaScript - Size: 5.87 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 71 - Forks: 0

7-docs/7-docs

Use local files or a public GitHub repository as a source and ask questions about it through ChatGPT

Language: TypeScript - Size: 363 KB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 119 - Forks: 9

emcd/python-mimeogram

Exchange collections of files with Large Language Models.

Language: Python - Size: 520 KB - Last synced at: 24 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

Dicklesworthstone/automatic_log_collector_and_analyzer

Replace Splunk in your small company with this one weird trick!

Language: Python - Size: 824 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 407 - Forks: 37

jrcichra/ingestd

HTTP server that easily ingests data into a database

Language: Go - Size: 388 KB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

tagbase/tagbase-server

Tagbase is a data lifecycle management system for electronic timeseries sensor data. It supports different types of data and works with equipment from various manufacturers.

Language: Python - Size: 2.18 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 7 - Forks: 2

beebeeep/chafka

Real-time Kafka to ClickHouse ingestion service

Language: Rust - Size: 119 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 33 - Forks: 0

souzomain/logflow

LogFlow is an ETL (Extract, Transform, Load) application specialized in log processing

Language: Python - Size: 3.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

samber/go-quickwit

🍱 A Go ingestion client for Quickwit

Language: Go - Size: 26.4 KB - Last synced at: 8 days ago - Pushed at: 11 months ago - Stars: 3 - Forks: 2

EricZoop/vsingest

Transform any codebase or techstack in Visual Studio to prompt-friendly text for LLMs!

Language: JavaScript - Size: 59 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

nathadriele/spotify-data-pipeline

This project implements a full-stack data engineering solution that connects to the Spotify Web API to extract a user’s recently played tracks, stores the data in a PostgreSQL database, applies transformations using dbt, and delivers actionable insights via Metabase dashboards.

Language: Python - Size: 1.98 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

endernoke/linkedingest

Turn LinkedIn profiles into AI-friendly text ingests.

Language: JavaScript - Size: 354 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 3

FellowTraveler/ngest

Python script for ingesting various files into a semantic graph. For text, images, cpp, python, rust, javascript, and PDFs.

Language: Python - Size: 2.98 MB - Last synced at: about 2 months ago - Pushed at: 9 months ago - Stars: 25 - Forks: 2

vertica/PSTL

Parallel Streaming Transformation Loader

Language: Java - Size: 106 MB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 9 - Forks: 6

jmfeck/bigquery-local-framework

This repo provides tools to manage BigQuery operations locally, simplifying tasks like uploading flat files, running SQL queries, and downloading tables. It offers a unified interface for local BigQuery interactions, enabling more efficient interaction with it.

Language: Python - Size: 44.9 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

abakermi/gitllm

A powerful GitHub repository analysis tool that helps you process and analyze repository content efficiently. Built with Next.js, Cloudflare Workers, and modern web technologies

Language: TypeScript - Size: 3.55 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

aymane-maghouti/Big-Data-Project

This project aims to predict smartphone prices using a combination of batch and stream processing techniques in a Big Data environment. The architecture follows the Lambda Architecture pattern, providing both real-time and batch processing capabilities to users.

Language: Python - Size: 960 KB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 2

jgperrin/net.jgp.labs.spark

Apache Spark examples exclusively in Java

Language: Java - Size: 1.75 MB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 101 - Forks: 49

xycloo/rs-ingest

Single and multi-threaded custom ingestion crate for Stellar Futurenet, written in Rust.

Language: Rust - Size: 101 KB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

Jaebum0505/subscription-tracker-api

Skip the basic CRUD—this Backend Crash Course is all about building a production-ready Subscription Management System with real users, real money, and real business logic. You'll learn JWT authentication, database modeling, API architecture, security, automated workflows, and much more!

Size: 1.95 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Clarifai/clarifai-python-datautils

Extract, transform, and load unstructured data into Clarifai's AI platform

Language: Python - Size: 1.02 MB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

rapidomize/rapidomize

Rapidly Access, Process, Analyze & Visualize Your Data

Size: 14.6 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

ahammadnafiz/RepoRAG

A fully interactive tool designed to streamline your GitHub repository prompt generation process and facilitate RAG (Retrieval-Augmented Generation) workflows

Language: Python - Size: 222 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

jgperrin/net.jgp.books.spark.ch09

Spark in Action, 2e - chapter 9 - Advanced ingestion: finding data sources and building your own

Language: Java - Size: 26.9 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 18 - Forks: 14

akornatskyy/sample-etl-flink-java

The sample ingests multiline gzipped files of popular books into postgres.

Language: Java - Size: 61.5 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

BonnardValentin/nmemo-foundation

Nmemo Foundation is a minimal, domain-driven platform for ingesting and retrieving documents with PostgreSQL and a modular architecture. It’s designed for easy expansion to other data sources and advanced search features

Language: Python - Size: 22.5 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

Azure/azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Language: Scala - Size: 19.6 MB - Last synced at: 1 day ago - Pushed at: 4 months ago - Stars: 235 - Forks: 178

garethcmurphy/SciCat-Data-Ingestion-with-TypeScript

SciCat Data Ingestion with TypeScript 📥✨ This repository provides a TypeScript-based tool for importing and ingesting data into SciCat, the science data catalog used at the European Spallation Source (ESS). Features ✨: Data Ingestion: automates data import into SciCat. TypeScript Implementation: ensures ty

Language: TypeScript - Size: 15.6 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

vicentedpsantos/repo2text

repo2text is a command-line tool that converts the content of a Git repository into a structured text file. It extracts all committed files and outputs them in a format suitable for easy ingestion by AI tools like ChatGPT. Ideal for sharing or analyzing repository contents in AI-driven conversations. 🤖

Language: Rust - Size: 23.4 KB - Last synced at: 30 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

lopezj1/youtube_fishing

This project ingests YouTube video data related to fishing, stores it in MongoDB, and provides visualizations through Metabase for analysis.

Language: Python - Size: 3.91 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

averemee-si/oralog

Ingestion tool for various database logs

Language: Java - Size: 126 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

rodrigo85/dms_ingestion

This project demonstrates how to use Apache Airflow to orchestrate AWS Database Migration Service (DMS) tasks for data ingestion. The solution leverages the power of Airflow for workflow automation and AWS DMS for seamless data migration to Amazon S3.

Language: Python - Size: 881 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

AbsaOSS/hyperdrive

Extensible streaming ingestion pipeline on top of Apache Spark

Language: Scala - Size: 1.64 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 44 - Forks: 13

Scody0/SQL-Injection-Training-Site

SQL Injection Training Site

Language: HTML - Size: 15.6 KB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

zezs/Langchain-Docs---AI-Chat-Assistant

This repository is dedicated to learning LangChain by creating a generative AI application. This web application uses Pinecone as a vector store to answer questions related to LangChain, utilizing sources from the official LangChain documentation.

Language: Python - Size: 1.21 MB - Last synced at: 2 days ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

lmolas/http-ingestor

Go implementation for handling huge amounts of http uploads

Language: Go - Size: 5.86 KB - Last synced at: over 2 years ago - Pushed at: almost 6 years ago - Stars: 0 - Forks: 0

rachita27/AUTOMATING

Automating Ingestion Excel Files On To Azure Data Studio (SQL-Server)

Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: 28 days ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 0

kharigardner/pyfivetran

Simple python interface for the Fivetran API. Powered by HTTPx.

Language: Python - Size: 139 KB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 2 - Forks: 0

ClarityNLP/ingest-api

Ingest data into Solr from a variety of sources

Language: JavaScript - Size: 1.69 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

ClarityNLP/ingest-client

React client for Solr ingest

Language: JavaScript - Size: 1.87 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

postlang/posthog-llm-examples

Upload data to PostHog-LLM

Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

apivideo/ingest.new

A simple demo application for uploading, ingesting, embedding videos and converting them to mp4s. From api.video (https://api.video)

Language: JavaScript - Size: 739 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 0

CocoaPriest/AssistAI

macOS app to chat with your local documents

Language: Swift - Size: 164 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

streamsqldb/streamsql-js

The javascript ingestion API for streamsql.

Language: TypeScript - Size: 59.6 KB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

marceloboeira/crowd

👥 [WIP] An experimental Highly Available Reverse Proxy for Massive Asynchronous Message Consumption

Language: Go - Size: 559 KB - Last synced at: 2 months ago - Pushed at: almost 6 years ago - Stars: 6 - Forks: 1

se02035/azure-eventhub-ingestor

This repo contains samples of a single-process, high-performance Event Hub ingestor (using C#)

Language: C# - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

CocoaPriest/bubbleai

FastAPI server to process client requests for both ingestion and inference

Language: Python - Size: 48.8 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Azure/Azure-AppServices-Diagnostics-KustoIngestor

Azure App Service Diagnostics Kusto Ingestor gives developers the ability to write custom logic, which may not be possible within a single query, before logs in Kusto are aggregated and ingested. Supported ingestion mechanisms are ingest from query and ingest from DataTable.

Language: C# - Size: 39.1 KB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 2 - Forks: 3

abideenml/RealTime-StarRatingPrediction-with-AWSKinesis

This repository contains an end-to-end, real-time 🕰️ machine learning pipeline to predict the star ⭐️ rating of product reviews. The project uses AWS SageMaker, Kinesis, Lambda, S3, Redshift, Athena, and Step Functions. Deployment of multiple models for A/B testing and bandit testing is also included.

Language: Jupyter Notebook - Size: 15.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

tweag/lagoon

Data centralization tool

Language: Haskell - Size: 341 KB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 35 - Forks: 1

mahmudie/data_engineering_projects

List of my data engineering projects

Language: Jupyter Notebook - Size: 384 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

biocaddie/foundry-nlp-enhancer

NLP enhancer plugin for Foundry-ES pipeline management system. The service that enhances elasticsearch functionality with NLP elements.

Language: Java - Size: 156 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

coxwave/impaction-ai-sdk-python

Server-side impaction.ai Data Ingestion SDK for Python

Language: Python - Size: 78.1 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

coxwave/impaction-ai-sdk-node

Server-side impaction.ai Data Ingestion SDK for Node.js

Language: TypeScript - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

dativebase/dailp-ingest-clj

DAILP Ingest (of Cherokee language data from Google Sheets)

Language: Clojure - Size: 197 KB - Last synced at: about 1 year ago - Pushed at: about 5 years ago - Stars: 2 - Forks: 0

jacobmarks/twilio-automation-plugin

Automate data ingestion into FiftyOne with Twilio

Language: Python - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

3amory99/Building-Sales-Data-Mart-Using-ETL-SSIS

Using the AdventureWorks2022 dataset, I built a Sales Data Mart with SQL Server Integration Services (SSIS) and SQL Server data modeling. This data mart offers several benefits, making it a valuable component for data management and analytics wi

Language: TSQL - Size: 1.87 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

rconjoe/etl.ts (fork of smartive/proc-that)

smartive/proc-that forked to play with

Size: 406 KB - Last synced at: about 13 hours ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

mbsuraj/postgresql_ingestion_script

Ingest data in any format into a PostgreSQL database

Language: Python - Size: 15.6 KB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 1 - Forks: 1

Grokery/grokerylab

A data pipeline management platform

Language: JavaScript - Size: 32.3 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

ldaniels528/transgress

A distributed processing/orchestration server and ETL for NodeJS

Language: Scala - Size: 685 KB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

seb7887/janus

Data ingestion service written in Go

Language: Go - Size: 110 KB - Last synced at: 12 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

c-drault/ingest-csv-for-elasticsearch 📦

Lab n°2 of "Applications of Big-Data" @ Efrei Paris

Language: Jupyter Notebook - Size: 5.86 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

woodRock/psychic-invention

NZODN Data Ingestion Project

Language: Shell - Size: 5.34 MB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

crosslibs/incremental-ingestion-using-airflow

Periodically ingest incremental updates (inserts / deletes) into BigQuery using Cloud Composer / Airflow orchestration workflow

Language: Python - Size: 7.81 KB - Last synced at: 10 months ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 1

gpism/OpenDataCore

Welcome to the fascinating intersection of Web3, Artificial Intelligence (AI), Open Data Core (ODC), and Composable Enterprise Fabric - a nexus of modern technologies that are significantly reshaping the enterprise landscape

Language: Java - Size: 14.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

phphoebe/Python-Data-Analysis-with-NumPy-and-Pandas

NumPy & Pandas for data science, data analysis & business intelligence, with practical, hands-on Python projects

Language: Jupyter Notebook - Size: 54.2 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

azuregig/work_with_OrdnanceSurvey_data

Sample Azure Data Factory pipeline for ingesting Data Packages directly from the Download API of the Ordnance Survey Data Hub into Azure Storage.

Size: 2.21 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 3

luccayz/dataengineer_project_001

Download files from the web with Python. Insert data from a DataFrame into the Azure cloud with Azure SQL Database. Transform the data with Azure Data Factory.

Language: Jupyter Notebook - Size: 11.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

helenamin/deb-finalProject-group3

End-To-End-Solution-DataEngineering-FinalProject

Language: Python - Size: 2.87 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 3

phdata/pipeforge 📦

Language: Scala - Size: 12.9 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 6

sorcero/ingestum

Read-only mirror of https://gitlab.com/sorcero/community/ingestum

Language: Python - Size: 2.54 MB - Last synced at: 26 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 0

snollygolly/borrow-bot

💰 A bot for maximizing the borrow subreddit

Language: JavaScript - Size: 860 KB - Last synced at: about 1 month ago - Pushed at: over 8 years ago - Stars: 27 - Forks: 0

italia/daf-replicate-ingestion 📦

Microservice to ingest data from Replicate and push it into DAF. Warning: this repo is deprecated.

Language: Java - Size: 151 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 6

zalando-zmon/zmon-data-service 📦

Receiving end of new worker to push data across DC boundaries

Language: Java - Size: 555 KB - Last synced at: about 2 months ago - Pushed at: about 5 years ago - Stars: 6 - Forks: 2

timxor/bitcoind-data-ingestion

crypto payments bitcoind data ingestion

Language: JavaScript - Size: 90.8 KB - Last synced at: 17 days ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

san089/Yelp_Project

This project creates a data lake for the Yelp dataset and uses it to build an analytical sandbox for data science purposes and a data warehouse for reporting purposes.

Language: Jupyter Notebook - Size: 351 KB - Last synced at: 3 months ago - Pushed at: almost 6 years ago - Stars: 2 - Forks: 2

Cigna/ibis

IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.

Language: Python - Size: 749 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 51 - Forks: 15

padogrid/bundle-hazelcast-3n4n5-app-pado_dbsched-perf_test_dbsched-docker-mysql

The dbsched bundle is preconfigured with the Pado scheduler to periodically execute jobs that dump database tables to CSV files from which it automatically extracts column information to generate the corresponding VersionedPortable classes. It then transforms the CSV records to objects using the generated classes before ingesting them into Hazelcast.

Language: Shell - Size: 744 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Kinjuriu/python-ingestion

Data Ingestion, reading files, working with databases, troubleshooting data, calling APIs and schemas

Language: Julia - Size: 18.2 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

magengit/magen-in

Ingestion Server for Magen Data Leak Prevention Software

Language: Python - Size: 537 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 1

projectkeas/ingestion

The core ingestion API for KEAS

Language: Go - Size: 70.3 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

Soumyadeep-github/Data-Ingestion

The aim of this project is to automate data ingestion from flat files such as CSV and compressed GZIP files into a database such as Postgres. The entire setup is automated with Docker and runs fast because it uses multiprocessing.

Language: Python - Size: 33.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2