Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: nutch

apache/nutch-site

Apache Nutch Website

Language: CSS - Size: 36.7 MB - Last synced: 4 minutes ago - Pushed: 2 days ago - Stars: 0 - Forks: 2

apache/nutch

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 131 MB - Last synced: 4 minutes ago - Pushed: about 7 hours ago - Stars: 2,818 - Forks: 1,248

daijiale/OCR_FontsSearchEngine

An OCR search engine built with Tesseract, Nutch, Solr, and PHP

Language: JavaScript - Size: 94.3 MB - Last synced: 24 days ago - Pushed: over 5 years ago - Stars: 111 - Forks: 20

USCDataScience/sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Language: Java - Size: 23.1 MB - Last synced: 25 days ago - Pushed: about 1 year ago - Stars: 409 - Forks: 142

jgimeno/solr-nutch-orchestrator

Quickly and easily launch Apache Solr linked with Apache Nutch in separate Docker containers.

Size: 5.86 KB - Last synced: about 2 months ago - Pushed: over 8 years ago - Stars: 4 - Forks: 1

nasa-jpl-memex/memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data

Language: Python - Size: 14 MB - Last synced: 3 months ago - Pushed: over 8 years ago - Stars: 120 - Forks: 69

Saitejakatineni/SearchEngine

Developed as part of an Information Retrieval coursework, this project showcases a search engine that efficiently indexes and retrieves information from a given dataset.

Language: Python - Size: 27.1 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

hseghetti/simple-crawler

Simple crawler using Apache Nutch and Elasticsearch

Language: Shell - Size: 7.81 KB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 4 - Forks: 1

mehroosali/Information-Retrieval-Search-Engine

Search Engine project for Information Retrieval class.

Language: Python - Size: 38.2 MB - Last synced: 9 months ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

apache/nutch-webapp

Apache Nutch is an extensible and scalable web crawler

Language: Java - Size: 124 KB - Last synced: 3 months ago - Pushed: 10 months ago - Stars: 6 - Forks: 6

AzeemQidwai/nutch-solr-mongodb

DataHarvest: Dockerized Web Crawling, Indexing, and Storage Solution

Language: Python - Size: 138 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

yegor256/nutch-in-java 📦

How to use Apache Nutch without command line

Language: Java - Size: 79.1 KB - Last synced: 8 days ago - Pushed: over 1 year ago - Stars: 13 - Forks: 6

nasa-jpl-memex/nutch-python Fork of chrismattmann/nutch-python

Python port of Nutch that allows controlling Apache Nutch via its REST API.

Language: Python - Size: 64.5 KB - Last synced: about 1 year ago - Pushed: over 8 years ago - Stars: 5 - Forks: 2

rootcss/nutch-cassandra-docker Fork of meabed/nutch-cassandra-docker

Nutch with Cassandra and Elasticsearch on Docker

Language: Shell - Size: 32.2 KB - Last synced: 9 months ago - Pushed: almost 7 years ago - Stars: 1 - Forks: 0

basraven/nutch-solr-integration

An ultra-small PoC showing how to combine Apache Nutch and Apache Solr: crawling web pages and storing the results in Solr for querying

Size: 17.6 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 9 - Forks: 7

cc-archive/discovered 📦

Based on Apache Nutch

Language: Java - Size: 145 MB - Last synced: 24 days ago - Pushed: about 3 years ago - Stars: 0 - Forks: 0

openturing/turing-nutch 📦

✨ 🧬 Apache Nutch Plugin for Viglet Turing Search

Language: Java - Size: 188 KB - Last synced: 23 days ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

AGMLab/giranking

Link ranking with Apache Giraph for Apache Nutch

Language: Java - Size: 97.7 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 7 - Forks: 3

RonnyFalconeri/CrawlingSpider

A simple web crawler inside a Docker container using Apache Nutch 1 and Solr.

Language: Dockerfile - Size: 23.4 KB - Last synced: about 1 year ago - Pushed: over 3 years ago - Stars: 2 - Forks: 0

nbro/FinancialNewsSearchEngine

A very simple search engine "specialised" in searching financial news.

Language: Shell - Size: 61.5 MB - Last synced: about 1 year ago - Pushed: over 7 years ago - Stars: 5 - Forks: 6

mversellie/void-engine-backend

REST service for a Spring/Solr-backed search engine.

Language: Java - Size: 72.3 KB - Last synced: 11 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

AkhilSourav/Distributed-Crawler

A distributed web crawler

Size: 3.07 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

SC-CS-KS/KS-SearchEngine

Search engine knowledge systems (搜索引擎知识体系).

Size: 1.84 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

dice-group/ldcbench-nutch-system-adapter

Apache Nutch system adapter for ORCA

Language: Java - Size: 13.1 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

asioso/elastic-6-nutch

Nutch 1.x Indexer Plugin that runs against ES6.7

Language: Java - Size: 21.5 KB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 3 - Forks: 2

BalestraPatrick/AppleSearch

A Vapor app consisting of a simple search engine built for my information retrieval course project.

Language: Swift - Size: 1.04 MB - Last synced: 9 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 0

BeccaLiu/FBI-vault-spatial-search

A spatial search website that allows users to search documents from the FBI Vault website. It extracts the most frequently occurring location in each document, loads the geo-tagged data into Apache Solr to index the documents, and visualizes search results using the Google Maps API.

Language: Java - Size: 172 KB - Last synced: about 1 year ago - Pushed: over 9 years ago - Stars: 2 - Forks: 0

rahmanidashti/nutch-element-filter Fork of kaqqao/nutch-element-selector

Nutch 2.3.1 plugin for Whitelisting/Blacklisting specific HTML elements

Language: Java - Size: 26.4 KB - Last synced: about 1 year ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0