Topic: "big data"
jtholmi/wit_io
WITio: A MATLAB data evaluation toolbox to script broader insights into big data from WITec microscopes
Last synced at: over 2 years ago - Stars: 2 - Forks: 0

nescience/machine_learning
New machine learning algorithms based on the minimum nescience principle
Last synced at: over 2 years ago - Stars: 2 - Forks: 0


datahackformation/community/workshops/workshop_1_druid_gcpcertification
Workshop given by Jesús Méndez (https://pe.linkedin.com/in/jmendezgal) and Antonio Cachuán (https://linkedin.com/in/antoniocachuan/) on Apache Druid, getting certified in GCP, and our Data Engineering Program
Last synced at: over 2 years ago - Stars: 1 - Forks: 0
datahackformation/community/workshops/workshop_2_bigdata_hadoop
Big Data workshop led by Jimmy Farfán, instructor of the online course "Desarrollo de Aplicaciones de Big Data en Hadoop" (Big Data Application Development in Hadoop). For more information or any questions, you can find us on Facebook as Data Hack Formation.
Last synced at: over 2 years ago - Stars: 1 - Forks: 0
gmarciani/flink-app
Scaffolding for data stream processing applications, leveraging Apache Flink.
Last synced at: over 2 years ago - Stars: 1 - Forks: 0

gmarciani/mapreduce-app
Scaffolding for Map/Reduce applications, leveraging Apache Hadoop.
Last synced at: over 2 years ago - Stars: 1 - Forks: 0

neuroscience-lab/bndf
Structured big data framework based on Apache Spark for storing and manipulating large-scale, multi-channel neurophysiological recording data
Last synced at: over 2 years ago - Stars: 1 - Forks: 0
neuroscience-lab/bndfcluster
BNDF Private cluster
Last synced at: over 2 years ago - Stars: 1 - Forks: 2
v-bootcamp-bd-ml/big-data-processing.spark-y-scala.practice
Practical assignment for the Big Data Processing (Spark and Scala) module of Keepcoding's V Bootcamp BD & ML
Last synced at: over 2 years ago - Stars: 1 - Forks: 0
amit-kamat/Map-Reduce-Ukraine
This project aggregates trending data from Ukraine-based Twitter accounts. The raw aggregated data is cleansed before analysis using some big data methods. The purpose of this project is to familiarize myself with the workings of Hadoop, covering both HDFS and the Map-Reduce infrastructure.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

arnimjenett/fsdb
File-system-based database for the management of big image data.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

contactprincebansal/pyspark-azure-hdinsight-sample
Deploying PySpark Jobs on Azure HDInsight Spark Cluster (CI/CD)
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

dars1608/geographically-weighted-regression-in-apache-spark
Implementation of Geographically Weighted Regression (GWR) using Apache Spark, Spark ML and Apache Sedona.
Last synced at: 9 months ago - Stars: 0 - Forks: 0
dars1608/geospatial-index-distributed
Spatial join of geospatial data from Kafka streams using Apache Spark (Spark Streaming).
Last synced at: 9 months ago - Stars: 0 - Forks: 0
dephekt/crawler
A Python app for scanning large data sets of URLs for a given signature and storing the results in an ElasticSearch index. Useful for CERTs and security researchers, and possibly others.
Last synced at: about 2 years ago - Stars: 0 - Forks: 0
erichgatejen/autohit-2003
XML-based testing platform.
Last synced at: about 2 years ago - Stars: 0 - Forks: 0
erichgatejen/dadadJ
dadadJ data operating environment
Last synced at: about 2 years ago - Stars: 0 - Forks: 0
jal7/DataScience
A Learning Path for Data Science professional development
Last synced at: 10 months ago - Stars: 0 - Forks: 1
jbferet/bigRaster
The bigRaster package handles large rasters that can be processed in chunks. This includes computing spectral indices, applying regression models, stacking individual rasters into larger rasters...
Last synced at: almost 2 years ago - Stars: 0 - Forks: 0
kaelta/kwn
Studying the effects of music on the growth of plants through an IoT automated farming solution.
Last synced at: over 1 year ago - Stars: 0 - Forks: 0

knowledge-bases/data-science
https://data.rtfm.page
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leliac/ganymede
Execute Hadoop and Spark applications on the BigData@Polito cluster with a single command
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

leo-plese/big-data-algorithms/apache-hadoop-framework-hdfs-mapreduce-programming-model
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leo-plese/big-data-algorithms/collaborative-filtering-recommendation-system-user-user-item-item-cf-approach
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leo-plese/big-data-algorithms/datar-gionis-indyk-motwani-dgim-algorithm-for-approximating-number-of-ones-in-data-bit-stream
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leo-plese/big-data-algorithms/girvan-newman-gn-algorithm-using-betweenness-centrality-for-detecting-communities-in-social-network-graphs
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leo-plese/big-data-algorithms/large-graphs-algorithms-node-rank-closest-black-node-tasks
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leo-plese/big-data-algorithms/pcy-park-chen-yu-algorithm-for-finding-frequent-itemsets
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
leo-plese/big-data-algorithms/simhash-algorithm-for-text-hashing-finding-near-duplicates-and-lsh-locality-sensitive-hashing-algorithm-for-finding-similar-items
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
maspadaru/taskmaster
Taskmaster is a lightweight open-source software framework that aims to simplify the distribution of big data processing and analysis tasks across multiple worker nodes.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
migandr/hadoop-premier-league
This project was an exercise for the Master in Big Data Engineering and Data Science at "Universidad Autónoma de Madrid". See the readme.md for more information.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0


rvalfo/shluker
Sentiment analysis with NLTK on a Twitter stream for a given word. Includes configuration scripts for MongoDB and Twitter streaming.
Last synced at: over 2 years ago - Stars: 0 - Forks: 1
rychly-edu/theses/dist-forensic-digital-data-repo
Distributed storage for digital forensic data with data/metadata repository, API for queries and incoming/outgoing data, indexing, plug-in system for yet unsupported data-types, etc.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
samy_benslimane/nf26-project
Analysis of aviation data from ASOS (https://mesonet.agron.iastate.edu/request/download.phtml) to highlight some patterns
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
siddie/stackexchange-dump-spark-research-tools
Stack Exchange releases "data dumps" of all its publicly available content roughly every three months via archive.org. This project is an example and a framework for building ETL for this data with Apache Spark and Java.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

stefano.slobodiuk/open-data-for-bike-marstefo
The unofficial Bike Sharing analytics service for Udine (aka Bike MarStefo) makes its dataset freely available for everyone to download.
Last synced at: 10 months ago - Stars: 0 - Forks: 0

therackio/big-data/binaries/apache-hadoop-bin-arm64
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
therackio/big-data/binaries/apache-hive-bin-arm64
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
therackio/big-data/binaries/apache-hudi-bin-arm64
Apache Hudi (https://hudi.apache.org/), compiled on ARM64.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
therackio/big-data/binaries/apache-spark-bin-arm64
Last synced at: over 2 years ago - Stars: 0 - Forks: 0
tymyrddin/seedlings
Deep learning (CNN) deployment pipeline
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

vqphuynh/dp3-algorithm
DP3 is an algorithm for distributed and shared-memory parallel frequent itemset mining.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0