An open API service providing repository metadata for many open source software ecosystems.

Topic: "data-processing"

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Language: Python - Size: 133 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 50,011 - Forks: 1,454

onceupon/Bash-Oneliner

A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.

Size: 919 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 10,595 - Forks: 641

johnkerl/miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Language: Go - Size: 201 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 9,521 - Forks: 227

TomWright/dasel

Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

Language: Go - Size: 10.1 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 7,664 - Forks: 157

NVIDIA/DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Language: C++ - Size: 398 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 5,551 - Forks: 649

datajuicer/data-juicer

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python - Size: 560 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 5,517 - Forks: 289

deepseek-ai/smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Language: Python - Size: 1.77 MB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 4,839 - Forks: 431

unionai-oss/pandera

A light-weight, flexible, and expressive statistical data testing library

Language: Python - Size: 4.46 MB - Last synced at: 16 days ago - Pushed at: 20 days ago - Stars: 4,073 - Forks: 363

cocoindex-io/cocoindex

Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!

Language: Rust - Size: 78.3 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3,258 - Forks: 262

dashbitco/broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Language: Elixir - Size: 665 KB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 2,586 - Forks: 169

microsoft/DialoGPT

Large-scale pretraining for dialogue

Language: Python - Size: 43.6 MB - Last synced at: about 1 month ago - Pushed at: about 3 years ago - Stars: 2,412 - Forks: 346

numaproj/numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Language: Rust - Size: 52.7 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 2,411 - Forks: 147

asyml/texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Language: Python - Size: 13.6 MB - Last synced at: 20 days ago - Pushed at: about 4 years ago - Stars: 2,391 - Forks: 369

bytewax/bytewax

Python Stream Processing

Language: Python - Size: 12 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 1,861 - Forks: 96

python-bonobo/bonobo

Extract Transform Load for Python 3.5+

Language: Python - Size: 1.46 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 1,600 - Forks: 142

pyper-dev/pyper

Concurrent Python made simple

Language: Python - Size: 462 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 1,503 - Forks: 30

OpenDCAI/DataFlow

Easy data preparation with the latest LLM-based operators and pipelines.

Language: Python - Size: 4.58 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 1,476 - Forks: 102

GoogleCloudPlatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Language: Jupyter Notebook - Size: 6.51 MB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 1,387 - Forks: 727

allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

Language: Python - Size: 62.9 MB - Last synced at: 5 days ago - Pushed at: 14 days ago - Stars: 1,343 - Forks: 154

NVIDIA-NeMo/Curator

Scalable data preprocessing and curation toolkit for LLMs

Language: Python - Size: 18.6 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1,219 - Forks: 188

run-house/kubetorch

Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.

Language: Python - Size: 31.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,093 - Forks: 45

microsoft/GODEL

Large-scale pretrained models for goal-directed dialog

Language: Python - Size: 49.8 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 883 - Forks: 112

GoogleCloudPlatform/DataflowJavaSDK 📦

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Size: 12.9 MB - Last synced at: about 1 month ago - Pushed at: almost 5 years ago - Stars: 852 - Forks: 320

benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

Language: Pascal - Size: 2.09 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 818 - Forks: 45

asyml/texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Language: Python - Size: 3.08 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 747 - Forks: 113

hstreamdb/hstream

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

Language: Haskell - Size: 6.28 MB - Last synced at: 17 days ago - Pushed at: 11 months ago - Stars: 727 - Forks: 55

ChenghaoMou/text-dedup

All-in-one text de-duplication

Language: Python - Size: 58.9 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 714 - Forks: 73

SebKrantz/collapse

Advanced and Fast Data Transformation in R

Language: C - Size: 111 MB - Last synced at: about 19 hours ago - Pushed at: about 20 hours ago - Stars: 693 - Forks: 35

jofpin/synthBTC

A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.

Language: JavaScript - Size: 6.46 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 672 - Forks: 398

infoslack/awesome-kafka

A list about Apache Kafka

Size: 96.7 KB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 583 - Forks: 165

kousun12/eternal

👾~ music, eternal ~ 👾

Language: JavaScript - Size: 91.4 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 555 - Forks: 34

Puchaczov/Musoq

SQL Syntax without any database

Language: C# - Size: 17.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 493 - Forks: 21

constellation-rs/amadeus

Harmonious distributed data analysis in Rust.

Language: Rust - Size: 2.46 MB - Last synced at: 22 days ago - Pushed at: over 4 years ago - Stars: 482 - Forks: 25

polyaxon/haupt

Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon

Language: Python - Size: 1.18 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 451 - Forks: 210

maykulkarni/Machine-Learning-Notebooks

Machine Learning notebooks for refreshing concepts.

Language: Jupyter Notebook - Size: 13.2 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 420 - Forks: 218

msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Language: Python - Size: 25.4 KB - Last synced at: 3 months ago - Pushed at: about 3 years ago - Stars: 377 - Forks: 27

flow-php/etl

PHP - ETL (Extract Transform Load) data processing library

Language: PHP - Size: 3.73 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 366 - Forks: 22

ml6team/fondant

Production-ready data processing made easy and shareable

Language: Python - Size: 23 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 354 - Forks: 27

lithops-cloud/lithops

A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀

Language: Python - Size: 12.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 348 - Forks: 114

keithorange/PatternPy

📈 PatternPy: A Python package revolutionizing trading analysis with high-speed pattern recognition, leveraging Pandas & Numpy. Effortlessly spot Head & Shoulders, Tops & Bottoms, Supports & Resistances. For experts & beginners. #TradingMadeEasy 🔥

Language: Python - Size: 404 KB - Last synced at: 7 months ago - Pushed at: over 1 year ago - Stars: 335 - Forks: 78

matousc89/padasip

Python Adaptive Signal Processing

Language: Python - Size: 5.93 MB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 314 - Forks: 52

PytLab/VASPy

Manipulating VASP files with Python.

Language: Python - Size: 21.1 MB - Last synced at: 21 days ago - Pushed at: over 3 years ago - Stars: 289 - Forks: 99

alttch/rapidtables

Super fast list of dicts to pre-formatted tables conversion library for Python 2/3

Language: Python - Size: 240 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 288 - Forks: 9

streamnative/pulsar-flink 📦

Elastic data processing with Apache Pulsar and Apache Flink

Language: Java - Size: 2.16 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 279 - Forks: 120

ColasGael/Machine-Learning-for-Solar-Energy-Prediction

Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning

Language: Python - Size: 922 MB - Last synced at: 9 days ago - Pushed at: about 6 years ago - Stars: 279 - Forks: 120

svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Language: Python - Size: 3.45 MB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 271 - Forks: 45

Yord/pxi

🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.

Language: JavaScript - Size: 19.6 MB - Last synced at: about 1 month ago - Pushed at: about 5 years ago - Stars: 268 - Forks: 3

scramjetorg/scramjet

Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.

Size: 2.71 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 252 - Forks: 20

airscholar/e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

Language: Python - Size: 289 KB - Last synced at: 6 months ago - Pushed at: 9 months ago - Stars: 250 - Forks: 123

asyml/forte

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

Language: Python - Size: 17.8 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 248 - Forks: 59

mech-lang/mech

🦾 Mech is a programming language for building data-driven systems like robots, games, and interfaces. Start here!

Language: Rust - Size: 18.5 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 245 - Forks: 14

apache/incubator-wayang

Apache Wayang (incubating) is the first cross-platform data processing system.

Language: Java - Size: 19.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 234 - Forks: 107

helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python

Language: Python - Size: 22.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 225 - Forks: 57

senbox-org/snap-engine

ESA Earth Observation Toolbox and Java Development Platform

Language: Java - Size: 1.01 GB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 199 - Forks: 105

LibreCat/Catmandu

Catmandu - a data processing toolkit

Language: Perl - Size: 53.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 191 - Forks: 36

hxz393/BrutalityExtractor

Multiprocess decompression software for high-performance systems

Language: Python - Size: 4.91 MB - Last synced at: 3 months ago - Pushed at: about 2 years ago - Stars: 187 - Forks: 12

markus-wa/cq

Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more

Language: Clojure - Size: 202 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 183 - Forks: 11

fluxus-labs/fluxus

Fluxus Stream Processing Engine

Language: Rust - Size: 5.07 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 166 - Forks: 22

remotesensinginfo/rsgislib

Remote Sensing and GIS Software Library; Python module tools for processing spatial data.

Language: C++ - Size: 140 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 153 - Forks: 28

senbox-org/snap-desktop

Desktop GUI for SNAP based on NetBeans Platform

Language: Java - Size: 77.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 143 - Forks: 66

iam-mhaseeb/Skytrax-Data-Warehouse 📦

A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.

Language: Python - Size: 1.34 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 137 - Forks: 30

tollwerk/data-processing-agreements

Collection of Data Processing Agreement (DPA) and GDPR compliance resources

Language: SCSS - Size: 98.6 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 134 - Forks: 24

Nonanti/PipeFlow

High-performance ETL pipeline library for .NET. Process CSV, JSON, Excel, and SQL data with minimal memory usage through streaming operations.

Language: C# - Size: 4.01 MB - Last synced at: 24 days ago - Pushed at: about 2 months ago - Stars: 132 - Forks: 10

luckylittle/blinkist-m4a-downloader

Grabs all of the audio files from all of the Blinkist books

Language: Go - Size: 101 KB - Last synced at: 7 months ago - Pushed at: almost 3 years ago - Stars: 132 - Forks: 25

kfultz07/go-dataframe

A simple package to abstract away the process of creating usable DataFrames for data analytics. This package is heavily inspired by the amazing Python library, Pandas.

Language: Go - Size: 3.96 MB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 130 - Forks: 8

thu-coai/cotk

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

Language: Python - Size: 10.5 MB - Last synced at: 2 months ago - Pushed at: about 5 years ago - Stars: 128 - Forks: 26

LiberTEM/LiberTEM

Open pixelated STEM framework

Language: Python - Size: 230 MB - Last synced at: 17 days ago - Pushed at: 30 days ago - Stars: 122 - Forks: 68

NVIDIA/nvImageCodec

nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface

Language: Jupyter Notebook - Size: 30.6 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 122 - Forks: 9

drshahizan/HPDP

High performance data processing employs high performance computing (HPC) to process data, which is then translated into information and knowledge. The advent of high-performance computing and data analytics enabled real-time interrogation of extremely large data sets.

Language: Jupyter Notebook - Size: 527 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 122 - Forks: 89

Siteimprove/alfa

♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale

Language: HTML - Size: 64.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 121 - Forks: 15

utdemir/distributed-dataset

A distributed data processing framework in Haskell.

Language: Haskell - Size: 875 KB - Last synced at: about 2 months ago - Pushed at: over 5 years ago - Stars: 117 - Forks: 5

streamnative/pulsar-spark

Spark Connector to read and write with Pulsar

Language: Scala - Size: 722 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 116 - Forks: 53

zengwangfa/2019-Electronic-Design-Competition

[Electronic Design Contest] 2019 National Undergraduate Electronic Design Contest (Problem F): paper sheet counting device (based on STM32F407 & FDC2214 & USART HMI)

Language: C - Size: 80.9 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 105 - Forks: 41

kubeflow/mcp-apache-spark-history-server

MCP Server for Apache Spark History Server. The bridge between Agentic AI and Apache Spark.

Language: Python - Size: 2.33 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 102 - Forks: 33

whoiskatrin/financial-statement-pdf-extractor

Python script to extract as much structured information as possible from annual/quarterly reports.

Language: Python - Size: 17.6 KB - Last synced at: 8 months ago - Pushed at: almost 2 years ago - Stars: 99 - Forks: 24

akashlevy/Deep-Learn-Oil

Deep learning tools for predicting oil well data

Language: Python - Size: 512 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 97 - Forks: 55

docwire/docwire

DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing is possible for security and confidentiality

Language: C++ - Size: 36.4 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 94 - Forks: 24

asavinov/prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Language: Python - Size: 1.95 MB - Last synced at: about 1 month ago - Pushed at: almost 4 years ago - Stars: 91 - Forks: 5

MDSplus/mdsplus

The MDSplus data management system

Language: Java - Size: 196 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 87 - Forks: 49

DRUMNICORN/Visio

Visio is an AI-powered IDE concept that turns software development into a visual, code-free experience, making programming accessible to everyone.

Size: 1020 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 83 - Forks: 5

aces/cbrain

CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.

Language: Ruby - Size: 20.5 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 80 - Forks: 51

pauliacomi/pyGAPS

A framework for processing adsorption data and isotherm fitting

Language: Python - Size: 26.4 MB - Last synced at: 10 days ago - Pushed at: 9 months ago - Stars: 78 - Forks: 26

vortex-exoplanet/VIP

VIP is a Python package/library for angular, reference star and spectral differential imaging for exoplanet/disk detection through high-contrast imaging.

Language: Python - Size: 330 MB - Last synced at: 20 days ago - Pushed at: 28 days ago - Stars: 77 - Forks: 62

duoan/ijcai18-mama-ads-competition

IJCAI-18 Alimama search ad conversion prediction: preliminary-round solution

Language: Jupyter Notebook - Size: 1.11 MB - Last synced at: over 1 year ago - Pushed at: over 7 years ago - Stars: 72 - Forks: 22

alirezatheh/perke

A keyphrase extractor for Persian

Language: Python - Size: 143 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 69 - Forks: 8

JusperLee/LRS3-For-Speech-Separation

Multi-modal speech separation task data generation script on LRS3 data set.

Language: MATLAB - Size: 3.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 69 - Forks: 14

p-ranav/pipeline

Pipelines for Modern C++

Language: C++ - Size: 245 KB - Last synced at: 7 months ago - Pushed at: about 5 years ago - Stars: 67 - Forks: 8

UrbanOS-Public/smartcitiesdata

The core micro services of UrbanOS as an umbrella project with component documentation

Language: Elixir - Size: 14.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 64 - Forks: 11

NVIDIA-AI-IOT/deepstream_libraries

DeepStream Libraries offer CVCUDA, NvImageCodec, and PyNvVideoCodec modules as Python APIs for seamless integration into custom frameworks.

Language: Python - Size: 246 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 63 - Forks: 1

unidentifieddeveloper/blaze

A blazing fast exporter for your Elasticsearch data.

Language: C++ - Size: 34.2 KB - Last synced at: 6 months ago - Pushed at: 11 months ago - Stars: 62 - Forks: 9

AtomGraph/Processor

Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.

Language: Java - Size: 1.51 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 60 - Forks: 7

BjoernKW/ZenQuery

Enterprise backend as a service

Language: Java - Size: 5.84 MB - Last synced at: over 2 years ago - Pushed at: about 7 years ago - Stars: 60 - Forks: 15

josephmachado/online_store

End to end data engineering project

Language: Python - Size: 1.53 MB - Last synced at: 5 months ago - Pushed at: about 3 years ago - Stars: 57 - Forks: 18

wq/itertable

⇔ IterTable is a Pythonic API for iterating through tabular data formats, including CSV, XLSX, XML, and JSON.

Language: Python - Size: 248 KB - Last synced at: 8 days ago - Pushed at: over 2 years ago - Stars: 53 - Forks: 11

31z4/storm-docker 📦

Docker image packaging for Apache Storm

Language: Dockerfile - Size: 81.1 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 52 - Forks: 27

TirendazAcademy/Data-Visualization-with-Python

Data Visualization Tutorial | Matplotlib | Seaborn | Pandas

Language: Jupyter Notebook - Size: 25.5 MB - Last synced at: 7 months ago - Pushed at: over 2 years ago - Stars: 51 - Forks: 34

luisbelloch/data_processing_course

Some class materials for a data processing course using PySpark

Language: Python - Size: 563 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 51 - Forks: 24

Samson-Mano/Fast_Fourier_Transform

C# implementation of Cooley–Tukey's FFT algorithm.

Language: C# - Size: 1.44 MB - Last synced at: 7 months ago - Pushed at: over 2 years ago - Stars: 48 - Forks: 17

adelekuzmiakova/CS229-machine-learning-solar-energy-predictions

Predicting solar energy using machine learning (LSTM, PCA, boosting). This is our CS 229 project from autumn 2017. Report and poster are included.

Language: Python - Size: 922 MB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 48 - Forks: 12

gabyx/ExecutionGraph

Fast Generic Execution Graph/Network

Language: C++ - Size: 24.8 MB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 46 - Forks: 7