Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

Package Usage: clojars: subotai

Mining HTML documents
25 versions
Latest release: over 4 years ago
1,854 downloads total

View more package details: https://packages.ecosyste.ms/registries/clojars.org/packages/subotai

View more repository details: https://repos.ecosyste.ms/hosts/GitHub/repositories/shriphani%2Fsubotai
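The two links above follow a visible pattern: the repository's full name, `shriphani/subotai`, appears percent-encoded as `shriphani%2Fsubotai` so that it fits in a single path segment. A minimal sketch of how such links could be assembled is shown here. The helper names `package_url` and `repository_url` are hypothetical, and the path layout is inferred only from the two URLs on this page, not from any official API documentation.

```python
from urllib.parse import quote

# Base paths copied from the two links on this page; treating them as a
# general pattern is an assumption, not documented behaviour.
PACKAGES_BASE = "https://packages.ecosyste.ms/registries"
REPOS_BASE = "https://repos.ecosyste.ms/hosts"


def package_url(registry: str, package: str) -> str:
    """Build a package-details link (hypothetical helper)."""
    return f"{PACKAGES_BASE}/{quote(registry, safe='')}/packages/{quote(package, safe='')}"


def repository_url(host: str, full_name: str) -> str:
    """Build a repository-details link (hypothetical helper).

    safe='' makes quote() encode the "/" in "owner/name" as %2F,
    so the full name stays within one path segment.
    """
    return f"{REPOS_BASE}/{quote(host, safe='')}/repositories/{quote(full_name, safe='')}"


if __name__ == "__main__":
    print(package_url("clojars.org", "subotai"))
    print(repository_url("GitHub", "shriphani/subotai"))
```

Passing `safe=''` matters here: by default `quote()` leaves `/` unescaped, which would split the owner and repository name into two path segments.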

Dependent Repos: 8

shriphani/topix_crawler
Topix crawler
  • 0.2.16 project.clj

Size: 1.05 MB - Last synced: about 1 year ago - Pushed: over 9 years ago

shriphani/kba-2013-clj
Clojure tools to work with the 2013 streamcorpus
  • 0.2.12 project.clj

Size: 203 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago

shriphani/web-corpus
Clueweb web corpus pipeline
  • 0.2.12 project.clj

Size: 168 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago

shriphani/process-common-crawl
Process the common crawl dataset for clueweb
  • 0.2.12 project.clj

Size: 176 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago

shriphani/heritrix-utils
Simple utils to process corpora downloaded by heritrix and index them with berkeleydb
  • 0.2.17 project.clj

Size: 102 KB - Last synced: about 1 year ago - Pushed: over 9 years ago

shriphani/index-page-crawler
Follow pagination and get pages
  • 0.2.16 project.clj

Size: 93.8 KB - Last synced: about 1 year ago - Pushed: almost 10 years ago

shriphani/structural-cluster-corpus
Structural clustering project codebase
  • 0.2.16 project.clj

Size: 297 KB - Last synced: about 1 year ago - Pushed: over 9 years ago

shriphani/untargeted-demo
Demo of untargeted crawling
  • 0.3.4 project.clj

Size: 152 KB - Last synced: about 1 year ago - Pushed: over 9 years ago