An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: apache-spark-library

bayudwiyansatria/library-java-apache-spark

Apache Spark libraries. Apache Spark is built on the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. The DataFrame API was released as an abstraction on top of the RDD, followed by the Dataset API. In Spark 1.x, the RDD was the primary application programming interface (API); as of Spark 2.x, use of the Dataset API is encouraged, although the RDD API is not deprecated. RDDs still underlie the Dataset API.

Language: Java - Size: 70.2 MB - Last synced at: 2 months ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 1