An open API service providing repository metadata for many open source software ecosystems.

Topic: "apache-tika"

tspannhw/OpenSourceComputerVision

Open Source Computer Vision with TensorFlow, MiNiFi, Apache NiFi, OpenCV, Apache Tika and Python. To process images from IoT devices such as Raspberry Pis, NVIDIA Jetson TX1, NanoPi Duos and others equipped with attached cameras or external USB webcams, we use Python to interface via OpenCV and PiCamera. From there we run image processing at the edge on these IoT devices using OpenCV and TensorFlow to determine attributes and image analytics. Apache MiNiFi coordinates running these Python scripts and decides when and what to send from that analysis, along with the image, to a remote Apache NiFi server for additional processing. The Apache NiFi cluster routes the images to one processing path and the JSON-encoded metadata to another flow. The JSON data (with its schema referenced from a central Schema Registry) is routed using Record Processing and SQL; this data is enriched and augmented before conversion to Avro to be sent via Apache Kafka to SAM. Streaming Analytics Manager (SAM) then does deeper processing on this stream and others, including weather and Twitter, to determine what should be done with this data. References: https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html https://github.com/tspannhw/rpi-noir-screen https://community.hortonworks.com/articles/77988/ingest-remote-camera-images-from-raspberry-pi-via.html https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html

Language: Python - Size: 419 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 43 - Forks: 17

Deep2018530/FileParseUtil

Extracts the text content of Word (doc, docx), Excel, PDF, PPT, CSV, and TXT files, and can also extract the table of contents from Word and PDF files.

Language: Java - Size: 243 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 17

fedelemantuano/tika-app-python

Python bindings for Apache Tika

Language: Python - Size: 244 KB - Last synced at: 2 days ago - Pushed at: over 4 years ago - Stars: 21 - Forks: 5

USCDataScience/tika-dockers

A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video

Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 6

greed2411/tokyo

tokyo, a REST API that, when given any type of document 📄, identifies the MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.

Language: Clojure - Size: 19.5 KB - Last synced at: 2 days ago - Pushed at: almost 5 years ago - Stars: 18 - Forks: 0

shelfio/tika-text-extract

Extract text from a document using Apache Tika

Language: TypeScript - Size: 346 KB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 17 - Forks: 6

shelfio/apache-tika-lambda-layer

AWS Lambda layer containing the latest version of Apache Tika

Language: Shell - Size: 327 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 14 - Forks: 6

IBM/visualize-unstructured-data-with-watson 📦

Visualize unstructured data using Watson NLU

Language: CoffeeScript - Size: 855 KB - Last synced at: 2 days ago - Pushed at: almost 4 years ago - Stars: 10 - Forks: 14

tspannhw/ApacheDeepLearning101

ApacheDeepLearning101

Language: Python - Size: 16.9 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 8 - Forks: 0

tspannhw/nifi-langdetect-processor

Apache NiFi + Apache Tika + OptimaizeLangDetector

Language: Java - Size: 78.3 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

alexferl/tika 📦

Golang client for Apache Tika

Language: Go - Size: 11.7 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 6 - Forks: 1

saidsef/tika-document-to-text

Apache Tika - Toolkit detects and extracts metadata

Language: JavaScript - Size: 604 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 3

kimtth/pyspark-tika-text-extraction

🚴‍♂️⛷ Data Lake: performance tuning for text extraction from a huge number of files.

Language: Python - Size: 261 MB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

tspannhw/nifi-processors

All my processors (NARs) in one place

Size: 36.2 MB - Last synced at: about 1 year ago - Pushed at: almost 6 years ago - Stars: 5 - Forks: 0

immontilla/file-uploading-web-app

A file-uploading web app built with security in mind

Language: Java - Size: 859 KB - Last synced at: 14 days ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 1

kairohm/tikatree

Directory tree metadata parser using Apache Tika

Language: Python - Size: 42 KB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

fraponyo94/Text-Extraction-Scanned-Pdf

Text extraction from scanned PDF documents in Java

Language: Java - Size: 2.64 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 4

MaxSquared-WebCraft/findit

Document management system implemented with microservices

Language: TypeScript - Size: 35.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

cfsimplicity/lucee-tika

Lucee wrapper for Apache Tika

Language: ColdFusion - Size: 44.4 MB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 1

OmarAssadi/matroska-tika

Tika detector for MKV and WebM

Language: Java - Size: 66.4 KB - Last synced at: 6 days ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

yashajoshi/PDF-Search-Engine-for-UN-agencies-and-NGOs-

A simple information retrieval system, a PDF Search Engine for UN agencies and NGOs.

Language: Jupyter Notebook - Size: 1.4 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 1

USCDataScience/tika-dl-models

A place to release saved machine learning models for tika-dl

Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

BeccaLiu/FBI-vault-spatial-search

A spatial search website that allows users to search documents from the FBI Vault website. It extracts the most frequently occurring location in each document, loads the geo-tagged data into Apache Solr to index the documents, and visualizes search results using the Google Maps API.

Language: Java - Size: 172 KB - Last synced at: 6 months ago - Pushed at: over 10 years ago - Stars: 2 - Forks: 0

withzombies/tika-magic

A permissively licensed crate to detect MIME types

Language: Rust - Size: 65.7 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

nenadjakic/ocr-studio

This application is designed for managing OCR (Optical Character Recognition) tasks. It allows users to define, schedule, and execute OCR tasks through a REST API. The core technologies used are Spring Framework, MongoDB, and Tesseract OCR.

Language: Kotlin - Size: 189 KB - Last synced at: 4 days ago - Pushed at: 27 days ago - Stars: 1 - Forks: 0

baughmann/tikara

The metadata and text content extractor for almost every file type.

Language: Python - Size: 161 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

todd-gavin/DSCI550-PixstoryMediaExtractionAndAnalysis

Extraction and analysis of the PixStory social media dataset using language detection, language translation, the Tika GeoTopic parser, Tika image object recognition and image caption generation, and PyTorch Detoxify.

Language: Jupyter Notebook - Size: 349 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

ergottli/text_recognition_container

Language: Python - Size: 22.1 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

orijtech/tikago

Apache Tika adapter in Go

Language: Go - Size: 48 MB - Last synced at: 4 days ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 0

kevv1m/tikara

The metadata and text content extractor for almost every file type.

Size: 1000 Bytes - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

glebshur/song-microservice

microservice web application for uploading and downloading audio files

Language: Java - Size: 3.85 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

DALAI-project/Document-analysis_API

This API uses Annif as a local server and includes an NER component. It also includes Tesseract and uses Apache Tika for language detection, and it has limited multilingual support.

Language: Python - Size: 22.5 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

todd-gavin/DSCI550-PixstoryDataAnalysis

Analysis of PixStory social media data combined with Snapchat, COVID-19, and YouTube data. This project uses the Apache Tika Clustering software to cluster certain social media posts together.

Language: Jupyter Notebook - Size: 227 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

gctools-outilsgc/apache-solr-search

This repository holds everything that is required to run the Apache Solr Engine and its functionality to crawl documents

Language: JavaScript - Size: 3.27 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

RyanQuey/es-index-onedrive

Apache Tika integration built in Scala for indexing OneDrive files into Elasticsearch.

Language: Scala - Size: 189 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

BogdanKandra/romanian-information-retrieval-system

Information Retrieval system for indexing and searching files stored on disk, with support for Romanian language

Language: Java - Size: 116 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

jhecking/tika-lambda Fork of cmaxwellau/tika-lambda

Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3

Language: Java - Size: 104 KB - Last synced at: 3 days ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

bjverde/cargaCorretagem

PHP application for testing the loading of PDF files, using docker-compose and Apache Tika.

Language: PHP - Size: 7.86 MB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

aswath86/AWS-lambda-S3-to-Elastic-Indexing-Connector

AWS Lambda code to index S3 buckets into Elasticsearch

Language: Java - Size: 3.91 KB - Last synced at: almost 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

sidmishraw/Broodmother

[SLOW][WIP] Broodmother is a high performance, distributed, search engine using Apache Tika, Apache Solr, Akka, Neo4j, and Spring.

Language: Java - Size: 64.5 KB - Last synced at: 2 months ago - Pushed at: about 7 years ago - Stars: 0 - Forks: 0

immontilla/secure-file-uploader

Secure file uploader web application

Size: 2.93 KB - Last synced at: 14 days ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

ldkhanh/simple-search

Using Apache Lucene, TIKI, Solr

Language: PHP - Size: 12.4 MB - Last synced at: about 2 years ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

sidmishraw/autobot

PDF parsing and extraction utility using Apache Tika

Language: Java - Size: 48.3 MB - Last synced at: 2 months ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

saxenaj/DocContentIndexing

Language: Java - Size: 10.1 MB - Last synced at: almost 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0