An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: tika-python

chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

Language: Python - Size: 31.5 MB - Last synced at: 19 days ago - Pushed at: about 2 months ago - Stars: 1,587 - Forks: 240

stumpylog/tika-client

A modern Python REST client for Apache Tika server

Language: Python - Size: 2.17 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 6

chrismattmann/tika-similarity

Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on metadata features.

Language: Python - Size: 3.22 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 107 - Forks: 60

chrismattmann/drat

The Distributed Release Audit Tool (DRAT) for code analysis and verification.

Language: JavaScript - Size: 94.7 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

abhayalekal74/NLP-Information-Extraction

Extracting information from PDF files.

Language: Python - Size: 3.78 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

mthompson64/DSCI550_Assignment3

USC DSCI 550 Assignment 3 - Spring 2021

Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

USCDataScience/tika-dockers

A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video

Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 6

kimtth/pyspark-tika-text-extraction

🚴‍♂️⛷ Data Lake: performance tuning for text extraction from a huge number of files.

Language: Python - Size: 261 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

nasa-jpl-memex/image_space

Interactive Image similarity and Visual Search and Retrieval application

Language: JavaScript - Size: 2.25 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 93 - Forks: 46

nipun-goyal/DocuMeta-The-Art-of-Generating-Metadata

This project showcases the application of LDA topic modelling and KMeans clustering for extracting information from PDF documents.

Language: Jupyter Notebook - Size: 1.05 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

izveigor/X-MAS-HACK

A web application that predicts a document's type from its content 📝

Language: TypeScript - Size: 883 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

pmagtulis/practice-notebooks

Compilation of my coding practice notebooks tackling different stuff from simple Python to scraping and pandas.

Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

skupriienko/Pyxtract

Python module for extracting text from URLs and PDFs

Language: Jupyter Notebook - Size: 5.16 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

opensemanticsearch/tika-python.deb

tika-python as a Debian GNU/Linux and Ubuntu Linux package

Size: 6.84 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 1