An open API service providing repository metadata for many open source software ecosystems.

GitHub / michael-pupulin / Scala_Spark_and_SQL

I do some basic statistics and machine learning work on a dataset of tornado events across the United States. The dataset is nowhere near big enough to warrant using Spark over something like R, but I was looking for practice. I use basic Spark SQL queries to find out which years and states saw the most tornadoes and the most F5 tornadoes. Then I use Spark's MLlib to fit a linear regression of tornado counts against time.
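
The workflow described above could look roughly like the following sketch. It is not taken from the repository: the column names `yr`, `st`, and `mag`, the view name `tornadoes`, and the object name `TornadoSketch` are assumptions made for illustration, and the rows are synthetic placeholders rather than real tornado records. It assumes Spark's `spark-sql` and `spark-mllib` modules are on the classpath.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Hypothetical object name; the repository's own layout may differ.
object TornadoSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TornadoSketch")
      .master("local[*]") // local mode is enough for a small dataset
      .getOrCreate()
    import spark.implicits._

    // Synthetic placeholder rows (year, state, F-scale magnitude).
    // These are NOT real records; column names are assumed.
    val tornadoes = Seq(
      (2009, "OK", 1), (2009, "KS", 5), (2010, "OK", 2),
      (2010, "TX", 0), (2010, "KS", 3), (2011, "AL", 5),
      (2011, "AL", 4), (2011, "OK", 5), (2011, "MO", 5)
    ).toDF("yr", "st", "mag")
    tornadoes.createOrReplaceTempView("tornadoes")

    // Tornado count per year, busiest years first.
    val perYear = spark.sql(
      """SELECT yr, COUNT(*) AS n
        |FROM tornadoes
        |GROUP BY yr
        |ORDER BY n DESC""".stripMargin)
    perYear.show()

    // F5 tornado count per state, most affected states first.
    val f5PerState = spark.sql(
      """SELECT st, COUNT(*) AS n_f5
        |FROM tornadoes
        |WHERE mag = 5
        |GROUP BY st
        |ORDER BY n_f5 DESC""".stripMargin)
    f5PerState.show()

    // MLlib estimators read a vector "features" column and a numeric
    // "label" column, so the year is assembled into a one-element vector
    // and the count is cast to double.
    val training = new VectorAssembler()
      .setInputCols(Array("yr"))
      .setOutputCol("features")
      .transform(perYear.withColumn("label", $"n".cast("double")))

    // Ordinary least squares with default settings: count ~ year.
    val model = new LinearRegression().fit(training)
    println(s"slope = ${model.coefficients(0)}, intercept = ${model.intercept}")

    spark.stop()
  }
}
```

With a real dataset, the in-memory `Seq` would presumably be replaced by something like `spark.read.option("header", "true").option("inferSchema", "true").csv(path)`, where `path` is whatever file the data lives in. The slope of the fitted line gives the estimated change in tornado count per year.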

JSON API: http://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michael-pupulin%2FScala_Spark_and_SQL

Stars: 0
Forks: 0
Open issues: 0

License: None
Language: Scala
Size: 30.3 KB
Dependencies parsed at: Pending

Created at: almost 3 years ago
Updated at: 11 months ago
Pushed at: almost 3 years ago
Last synced at: 11 months ago

Topics: scala, spark, spark-mllib, spark-sql
