Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
Package Usage: clojars: subotai
Mining HTML documents
25 versions
Latest release: over 4 years ago
1,854 downloads total
View more package details: https://packages.ecosyste.ms/registries/clojars.org/packages/subotai
View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/shriphani%2Fsubotai
Dependent Repos: 8
shriphani/topix_crawler
Topix crawler - 0.2.16 project.clj
Size: 1.05 MB - Last synced: about 1 year ago - Pushed: over 9 years ago
shriphani/kba-2013-clj
Clojure tools to work with the 2013 streamcorpus - 0.2.12 project.clj
Size: 203 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago
shriphani/web-corpus
Clueweb web corpus pipeline - 0.2.12 project.clj
Size: 168 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago
shriphani/process-common-crawl
Process the common crawl dataset for clueweb - 0.2.12 project.clj
Size: 176 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago
shriphani/heritrix-utils
Simple utils to process corpora downloaded by heritrix and index them with berkeleydb - 0.2.17 project.clj
Size: 102 KB - Last synced: about 1 year ago - Pushed: over 9 years ago
shriphani/index-page-crawler
Follow pagination and get pages - 0.2.16 project.clj
Size: 93.8 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago
shriphani/structural-cluster-corpus
Structural clustering project codebase - 0.2.16 project.clj
Size: 297 KB - Last synced: about 1 year ago - Pushed: over 9 years ago
shriphani/untargeted-demo
Demo of untargeted crawling - 0.3.4 project.clj
Size: 152 KB - Last synced: about 1 year ago - Pushed: over 9 years ago