Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub / commoncrawl / cc-index-table
Index Common Crawl archives in tabular format
JSON API: https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Fcc-index-table
Stars: 95
Forks: 9
Open Issues: 8
License: apache-2.0
Language: Java
Repo Size: 155 KB
Dependencies: 9
Created: over 6 years ago
Updated: about 1 month ago
Last pushed: 9 months ago
Last synced: about 1 month ago
Topics: apache-parquet, aws-athena, columnar-storage, commoncrawl, spark, sql
Dependencies (pom.xml, Maven):
- software.amazon.awssdk:bom 2.17.177 import
- org.apache.spark:spark-core_2.12 3.2.1 provided
- org.apache.spark:spark-sql_2.12 3.2.1 provided
- com.github.crawler-commons:crawler-commons 1.2
- com.google.code.gson:gson 2.8.9
- commons-cli:commons-cli 1.2
- org.slf4j:slf4j-api 1.7.36
- software.amazon.awssdk:s3
- org.junit.jupiter:junit-jupiter-engine 5.8.2 test
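The JSON API link above addresses the repository by its full `owner/name`, with the slash percent-encoded as `%2F`. The following is a minimal sketch, assuming the base path shown in that link stays stable, of how such a URL could be built for any GitHub repository name. `RepoApiUrl` and `repoUrl` are hypothetical names used only for illustration. The sketch only constructs the URL and does not call the service.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RepoApiUrl {
    // Base path copied from the JSON API link on this page (assumed stable).
    static final String BASE =
        "https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/";

    // Hypothetical helper: percent-encode "owner/name" so '/' becomes %2F.
    // URLEncoder uses form encoding (a space would become '+'), which is
    // acceptable here because GitHub repository names contain no spaces.
    static String repoUrl(String fullName) {
        return BASE + URLEncoder.encode(fullName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the same URL as the JSON API link for this repository.
        System.out.println(repoUrl("commoncrawl/cc-index-table"));
    }
}
```

`URLEncoder.encode(String, Charset)` requires Java 10 or later. Letters, digits, `-`, `_`, `.` and `*` pass through unchanged, so only the slash is rewritten for this repository name.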