Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: tika
dadoonet/fscrawler
Elasticsearch File System Crawler (FS Crawler)
Language: Java - Size: 14.6 MB - Last synced: about 5 hours ago - Pushed: about 5 hours ago - Stars: 1,309 - Forks: 294
apache/tika-helm
A Helm chart to deploy Apache Tika on Kubernetes.
Language: Smarty - Size: 86.9 KB - Last synced: about 11 hours ago - Pushed: about 19 hours ago - Stars: 20 - Forks: 15
OpenSextant/Xponents
Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.
Language: Java - Size: 78.5 MB - Last synced: 24 days ago - Pushed: 25 days ago - Stars: 42 - Forks: 7
sesam-community/content-extractor Fork of sesam-io/content-extraction-service
Extract textual information using the Apache Tika library from JSON streams
Language: Java - Size: 23.4 KB - Last synced: 2 days ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0
apache/tika-docker
Convenience Docker images for Apache Tika Server
Language: Shell - Size: 95.7 KB - Last synced: about 10 hours ago - Pushed: about 1 month ago - Stars: 102 - Forks: 58
albertus82/extfix
File Extension Fix Tool - Find and rename files with wrong extensions.
Language: Java - Size: 10.9 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 0 - Forks: 0
shelfio/tika-text-extract
Extract text from a document by Apache Tika
Language: TypeScript - Size: 318 KB - Last synced: 4 days ago - Pushed: 5 days ago - Stars: 15 - Forks: 4
kairohm/tikatree
Directory tree metadata parser using Apache Tika
Language: Python - Size: 42 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 3 - Forks: 0
TYPO3-Solr/ext-tika
A TYPO3 CMS extension that provides Apache Tika functionality
Language: PHP - Size: 2.07 MB - Last synced: 3 days ago - Pushed: 7 days ago - Stars: 6 - Forks: 29
kestra-io/plugin-tika
Language: Java - Size: 3.41 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 2 - Forks: 2
AlexsJones/kubernetes-apache-tika
Apache Tika, the attachment processor
Language: Shell - Size: 3.91 KB - Last synced: 8 days ago - Pushed: over 5 years ago - Stars: 1 - Forks: 1
commitd/krill
Improved HTML output for Tika extraction
Language: Java - Size: 1.92 MB - Last synced: 9 days ago - Pushed: over 1 year ago - Stars: 4 - Forks: 2
rse/tika-server
Apache Tika Server as a Background Service in Node.js
Language: JavaScript - Size: 75.2 KB - Last synced: 8 days ago - Pushed: about 1 month ago - Stars: 18 - Forks: 5
hmmh/typo3-solr-file-indexer
TYPO3 Extension: solr_file_indexer
Language: PHP - Size: 466 KB - Last synced: 13 days ago - Pushed: 7 months ago - Stars: 9 - Forks: 6
bcgov/nr-bcws-opensearch
opensearch related code
Language: Java - Size: 395 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 1 - Forks: 7
ICIJ/extract
A cross-platform command line tool for parallelised content extraction and analysis.
Language: Java - Size: 69.4 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 233 - Forks: 30
kressi/search-media
Parse media files with Apache Tika, add documents to Lucene index and query this index.
Language: Scala - Size: 30.3 MB - Last synced: 16 days ago - Pushed: about 7 years ago - Stars: 1 - Forks: 0
quarkiverse/quarkus-tika
Quarkus Tika extension
Language: Java - Size: 619 KB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 10 - Forks: 12
riccardo1980/simple-extractor
Simple test for document extractor
Language: Java - Size: 16.6 KB - Last synced: 18 days ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
shebinleo/pdf2html
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.
Language: JavaScript - Size: 939 KB - Last synced: 1 day ago - Pushed: 4 months ago - Stars: 138 - Forks: 29
juanpablo-santos/jspwiki-tika-searchprovider
Apache JSPWiki tika search provider integration sample
Size: 7.81 KB - Last synced: 20 days ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0
fedelemantuano/tika-app-python
Python bindings for Apache Tika
Language: Python - Size: 244 KB - Last synced: 8 days ago - Pushed: over 3 years ago - Stars: 20 - Forks: 7
M-Haertling/WorkforceResearchGuide
This is a UTDallas senior design project developed for Alliance Data. Its purpose is to provide a more robust system for searching through a document repository. This is achieved through high level indexing and the addition of a tagging system. This is a Maven project. Third party libraries used include Apache Lucene, Apache Tika, and SQLite.
Language: Perl - Size: 43.3 MB - Last synced: 22 days ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0
chrismattmann/tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Language: Python - Size: 3.2 MB - Last synced: 8 days ago - Pushed: about 2 months ago - Stars: 102 - Forks: 59
DFKI/leechcrawler
Incremental crawling capabilities for Apache Tika. Crawl content out of, e.g., file systems, http(s) sources (web crawling), imap(s) servers, or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.
Language: Java - Size: 95.2 MB - Last synced: about 1 month ago - Pushed: 5 months ago - Stars: 8 - Forks: 5
apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Language: Java - Size: 231 MB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 2,137 - Forks: 740
sarbanandabhikkhu/tipitaka-xml
Roman Tipitaka (CSCD)
Language: JavaScript - Size: 55.6 MB - Last synced: 29 days ago - Pushed: 30 days ago - Stars: 1 - Forks: 0
USCDataScience/sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Language: Java - Size: 23.1 MB - Last synced: 25 days ago - Pushed: about 1 year ago - Stars: 409 - Forks: 142
chrismattmann/MLwithTensorFlow2ed
Code for Machine Learning with TensorFlow: 2nd Edition Published by Manning Publications
Language: Jupyter Notebook - Size: 546 MB - Last synced: 8 days ago - Pushed: over 1 year ago - Stars: 134 - Forks: 68
abhayalekal74/NLP-Information-Extraction
Extracting information from PDF files.
Language: Python - Size: 3.78 MB - Last synced: about 1 month ago - Pushed: about 5 years ago - Stars: 1 - Forks: 0
chrismattmann/imagecat
ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (images, but could be extended to other files) in place, and to extract metadata and OCR information from those files/images using Tika and Tesseract OCR.
Language: Java - Size: 175 MB - Last synced: 8 days ago - Pushed: over 5 years ago - Stars: 94 - Forks: 40
alexferl/tika
Golang client for Apache Tika
Language: Go - Size: 11.7 KB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 6 - Forks: 1
StegarescuAnaMaria/Java_Indexer_and_Searcher
This project is a simulation of a search engine which outputs the path of the documents based on the search string query input.
Language: Java - Size: 15.6 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
nasa-jpl-memex/memex-explorer
Viewers for statistics and dashboarding of Domain Search Engine data
Language: Python - Size: 14 MB - Last synced: about 3 hours ago - Pushed: over 8 years ago - Stars: 121 - Forks: 69
sergio11/struts2-hibernate
This project demonstrates building a web application with Struts2, Apache Tika, Hibernate, and Wildfly 10. 🚀 Users can upload PDF files, extract text content using Apache Tika, and store metadata in a database using Hibernate. 🔒 Additionally, the project provides instructions for setting up a JDBC Realm on Wildfly 10 for enhanced security.
Language: Java - Size: 140 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
USCDataScience/tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
Size: 21.5 KB - Last synced: 8 days ago - Pushed: about 1 month ago - Stars: 20 - Forks: 6
KevM/tikaondotnet
Use the Java Tika text extraction library on the .NET platform
Language: Rich Text Format - Size: 155 MB - Last synced: 11 days ago - Pushed: 27 days ago - Stars: 193 - Forks: 73
wbicode/TikaService
A Windows service wrapper for the Tika JSR 311 network server.
Language: Batchfile - Size: 305 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 1 - Forks: 0
Dimous/tsundoku
Book Management System for e-bibliomaniacs
Language: Java - Size: 89.8 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
tspannhw/nifi-extracttext-processor
Apache NiFi Custom Processor Extracting Text From Files with Apache Tika
Language: Java - Size: 891 KB - Last synced: 24 days ago - Pushed: 9 months ago - Stars: 34 - Forks: 29
vaites/php-apache-tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Language: PHP - Size: 13.8 MB - Last synced: 15 days ago - Pushed: 8 months ago - Stars: 111 - Forks: 21
sergio11/document_search_engine_architecture
📄🚀 Unleash a powerful Document Search Engine with Apache NiFi for lightning-fast, comprehensive text indexing and search.
Language: Java - Size: 13.4 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 22 - Forks: 9
arquivo/dspace-link-extractor
Extracts links from DSpace repositories
Language: Java - Size: 62.9 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0
welle/JTika
Quick and dirty project to generate a Java enumeration class for all MIME types in Apache Tika.
Language: Java - Size: 624 KB - Last synced: 6 months ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0
nasa-jpl-memex/image_space
Interactive Image similarity and Visual Search and Retrieval application
Language: JavaScript - Size: 2.25 MB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 93 - Forks: 46
alexoley/ReadWithMeBot
Telegram bot available under the username @ReadWithMeBot
Language: Kotlin - Size: 151 KB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
phantom0301/MetaSpider
Web crawler for rich text meta information based on Python and Tika
Language: Python - Size: 9.77 KB - Last synced: 7 months ago - Pushed: almost 6 years ago - Stars: 3 - Forks: 2
tirthmehta/Apache-Solr-based-Web-Search-Engine
Deployment of a search engine utilizing Apache Solr, Apache Tika and spelling correction programs.
Size: 14.6 KB - Last synced: 7 months ago - Pushed: almost 7 years ago - Stars: 0 - Forks: 0
mrcsparker/ruby_tika_app
A Ruby wrapper for the Tika jar (tika-app.jar) that extracts text from files in many formats (PDF, XLS, DOC, etc.)
Language: DIGITAL Command Language - Size: 415 MB - Last synced: 6 days ago - Pushed: over 1 year ago - Stars: 26 - Forks: 20
Keerthivasan13/CSCI572-Information_Retrieval_And_Web_Search_Engines
Search Engine projects
Language: Java - Size: 34.5 MB - Last synced: 7 months ago - Pushed: almost 4 years ago - Stars: 11 - Forks: 17
nasa-jpl-memex/GeoPath-Clustering
To cluster geo paths that travel very similar paths
Language: HTML - Size: 10.5 MB - Last synced: 7 months ago - Pushed: almost 6 years ago - Stars: 5 - Forks: 7
nasa-jpl-memex/GeoParser
Extract and Visualize location from any file
Language: JavaScript - Size: 159 MB - Last synced: 8 days ago - Pushed: about 1 year ago - Stars: 53 - Forks: 23
liquidinvestigations/hoover-snoop2
Processing system for the search engine service in Liquid Investigations.
Language: Python - Size: 1.74 MB - Last synced: 26 days ago - Pushed: about 1 month ago - Stars: 6 - Forks: 5
catalyst/moodle-search_elastic
An Elasticsearch engine plugin for Moodle's Global Search
Language: PHP - Size: 1.35 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 13 - Forks: 15
httpreserve/tikalinkextract
Tika based link (URL) extractor for httpreserve
Language: HTML - Size: 171 MB - Last synced: 2 days ago - Pushed: almost 3 years ago - Stars: 8 - Forks: 1
lagenorhynque/tika
git diff settings for Microsoft Office files
Language: Shell - Size: 65.8 MB - Last synced: 26 days ago - Pushed: over 6 years ago - Stars: 10 - Forks: 1
whentotrade/Noggle.TikaOnDotNet
.NET Tika Wrapper
Language: Rich Text Format - Size: 95.1 MB - Last synced: 12 days ago - Pushed: almost 5 years ago - Stars: 2 - Forks: 1
sergeyt/pandora
A small Pandora's box for prototyping your app with a ready-to-use backend. This is just my compilation of different solutions occasionally applied in hackathons and challenges.
Language: Go - Size: 1.82 MB - Last synced: 30 days ago - Pushed: 3 months ago - Stars: 26 - Forks: 8
luisbalru/Information-Retrieval
Language: Java - Size: 2.02 MB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 1 - Forks: 1
sbelassa/SMIR
smart multimodal information retrieval project
Language: HTML - Size: 26.2 MB - Last synced: 9 months ago - Pushed: about 7 years ago - Stars: 0 - Forks: 0
Journalisme-UQAM/extractionPDF
Three ways to extract text from PDF files using Python
Language: Python - Size: 16.6 KB - Last synced: 9 months ago - Pushed: about 4 years ago - Stars: 1 - Forks: 1
khanium/couchbase-fts-binary
Demo project for uploading binary documents into Couchbase and indexing their metadata & content
Language: JavaScript - Size: 21.7 MB - Last synced: 9 months ago - Pushed: over 1 year ago - Stars: 3 - Forks: 3
public-law/oregon-law-parser
Distill information about amendments to the Oregon Revised Statutes.
Language: Haskell - Size: 50.1 MB - Last synced: 9 days ago - Pushed: 7 months ago - Stars: 17 - Forks: 3
puthurr/tika-docker
Contains a custom tika 1.x server docker image.
Language: Dockerfile - Size: 245 MB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
ipfs-search/ipfs-tika 📦
Java web application taking IPFS hashes, extracting (textual) content and metadata through Apache's Tika.
Language: Java - Size: 52.7 KB - Last synced: 20 minutes ago - Pushed: over 2 years ago - Stars: 30 - Forks: 5
hungneox/tika-php
A PHP client for Apache Tika
Language: PHP - Size: 11.7 KB - Last synced: 10 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 0
procesaur/TExASe
Flask application for OCR and extraction of text from documents with support for repository applications
Language: Python - Size: 14.7 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 1 - Forks: 0
thecogworks/Cogworks.ExamineFileIndexer
An examine indexer that uses Apache Tika.
Language: C# - Size: 23.1 MB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 7 - Forks: 6
CogStack/CogStack-Pipeline 📦
Distributed, fault tolerant batch processing for Natural Language Applications and Search, using remote partitioning
Language: Java - Size: 25.6 MB - Last synced: 9 months ago - Pushed: over 1 year ago - Stars: 41 - Forks: 13
ropensci/rtika
R Interface to Apache Tika
Language: R - Size: 133 MB - Last synced: 3 months ago - Pushed: about 1 year ago - Stars: 54 - Forks: 8
schopenhauer/tikka
Flask-based file drop on steroids, powered by Apache Tika
Language: Python - Size: 4.88 KB - Last synced: 9 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
codingstar77/Automated-College-Result-Management-System-
Automatically parses PDF results provided by Pune University into the database, generates reports, and notifies students of their results by email
Language: Java - Size: 504 KB - Last synced: about 1 year ago - Pushed: about 6 years ago - Stars: 2 - Forks: 1
catalyst/moodle-search_postgresfulltext
Moodle search engine implemented using Postgres full text indexing
Language: PHP - Size: 51.8 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 7
scotthaleen/py-tika-socket-server
Language: Clojure - Size: 133 KB - Last synced: about 1 year ago - Pushed: over 8 years ago - Stars: 0 - Forks: 1
Sotera/newman
Quickly analyze and explore email with advanced analytics and visualization.
Language: JavaScript - Size: 266 MB - Last synced: 9 months ago - Pushed: over 2 years ago - Stars: 50 - Forks: 14
mixpeek/top-ocr-libraries
Most popular open source OCR libraries listed by accuracy and speed
Size: 4.88 KB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 1 - Forks: 1
krish-kunal/task
Helps parse bank statements (PDF)
Language: Python - Size: 34.4 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 3 - Forks: 0
izveigor/X-MAS-HACK
A web application that predicts the type of a document from its content 📝
Language: TypeScript - Size: 883 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0
cloudogu/spotter
Content-Type and language recognition library
Language: Java - Size: 246 KB - Last synced: 26 days ago - Pushed: 9 months ago - Stars: 4 - Forks: 2
Anthonyive/DSCI-550-Assignment-1 📦
📧 Analysis of Cyber Phishing Emails: Fraudulent Emails and Social Engineering.
Language: Jupyter Notebook - Size: 70.4 MB - Last synced: about 2 months ago - Pushed: about 3 years ago - Stars: 5 - Forks: 2
Anthonyive/DSCI-550-Assignment-2 📦
👨‍🦰 Large Scale Active Social Engineering Defense (ASED): Multimedia and Social Engineering
Language: HTML - Size: 154 MB - Last synced: about 2 months ago - Pushed: about 3 years ago - Stars: 6 - Forks: 2
mkalus/tika-page-extractor 📦
Tika per page PDF extractor server returning content as JSON.
Language: Java - Size: 19.5 KB - Last synced: about 1 year ago - Pushed: about 8 years ago - Stars: 6 - Forks: 3
chrismattmann/trec-dd-polar
A dataset downloaded from the deep and scientific web across three major Polar data centers for use in research.
Language: Shell - Size: 85 KB - Last synced: 8 days ago - Pushed: over 6 years ago - Stars: 13 - Forks: 7
TheoGicquel/L3-IrisaParser
Parse scientific papers using python
Language: Python - Size: 249 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
chrismattmann/drat
The Distributed Release Audit Tool (DRAT) for code analysis and verification.
Language: JavaScript - Size: 94.7 MB - Last synced: 8 days ago - Pushed: 10 months ago - Stars: 8 - Forks: 1
sarbanandabhikkhu/DhammaChakka
Early Buddhist texts from the Tipitaka (Tripitaka). Suttas (sutras) with the Buddha's teachings on mindfulness, insight, wisdom, and meditation.
Language: JavaScript - Size: 6.31 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
nguyenhiepvan/tika_server_forever Fork of vuthaihoc/tika_server_forever
Run tika server forever with health check process
Language: Shell - Size: 76.7 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
lguberan/LuceneFx
Tiny unofficial JavaFX demo application for Apache's Lucene and Tika.
Language: Java - Size: 79.1 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
jettdc/semester-search
Semester Search is a utility for quickly searching through downloadable class materials so that you can spend more time learning and less time clicking through dozens of links on your professors' websites.
Language: Go - Size: 66.5 MB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
jwo29/spring-boot-camunda
spring-boot-camunda
Language: Java - Size: 741 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0
chrisbratlien/aws-bucketeer
Apache Solr/Tika index/search plus SHA256 content-based addressing for files stored into AWS S3 buckets
Language: PHP - Size: 150 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0
EricLondon/Docker-Rails-Tika-Elasticsearch
Docker Rails Tika Elasticsearch
Language: Ruby - Size: 147 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 1 - Forks: 0
Slvkelevra/information-retrieval-system
Information retrieval system for documents.
Language: HTML - Size: 78.9 MB - Last synced: 8 months ago - Pushed: about 2 years ago - Stars: 0 - Forks: 0
graboskyc/MQTTtoRealm
A C# console app that acts as an MQTT broker and writes messages to MongoDB Realm
Language: C# - Size: 116 KB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0
wbicode/TikaService-Installer
A Windows Installer (MSI) for the Windows service wrapper of the Tika JSR 311 network server.
Language: C# - Size: 80.1 KB - Last synced: about 1 year ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0
FrodeRanders/disksearch
Indexes a directory hierarchy and provides a crude search interface onto that index
Language: Java - Size: 25.4 KB - Last synced: 16 days ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
opensemanticsearch/tesseract-ocr-cache
Tesseract OCR wrapper for Apache Tika and/or Open Semantic ETL that caches the OCR results, so Tika-Server or Open Semantic ETL does not have to rerun slow and expensive OCR on the same images
Language: Python - Size: 32.2 KB - Last synced: 6 months ago - Pushed: over 2 years ago - Stars: 5 - Forks: 1
puthurr/tika-fork Fork of apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Language: Java - Size: 227 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0
opensemanticsearch/tika-server.deb
Apache Tika Server as Debian GNU/Linux and Ubuntu Linux package
Language: Dockerfile - Size: 47.4 MB - Last synced: 6 months ago - Pushed: over 1 year ago - Stars: 5 - Forks: 8
mrspaceman/elibraryserver
Language: Java - Size: 4.88 KB - Last synced: 24 days ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0