GitHub topics: tika-python
chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
Language: Python - Size: 31.5 MB - Last synced at: 19 days ago - Pushed at: about 2 months ago - Stars: 1,587 - Forks: 240

stumpylog/tika-client
A modern Python REST client for Apache Tika server
Language: Python - Size: 2.17 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 6

chrismattmann/tika-similarity
Tika-Similarity uses the Tika-Python package (a Python binding to Apache Tika) to compute file similarity based on metadata features.
Language: Python - Size: 3.22 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 107 - Forks: 60

chrismattmann/drat
The Distributed Release Audit Tool (DRAT) for code analysis and verification.
Language: JavaScript - Size: 94.7 MB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

abhayalekal74/NLP-Information-Extraction
Extracting information from PDF files.
Language: Python - Size: 3.78 MB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

mthompson64/DSCI550_Assignment3
USC DSCI 550 Assignment 3 - Spring 2021
Language: Jupyter Notebook - Size: 53.4 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 0 - Forks: 0

USCDataScience/tika-dockers
A suite of Machine Learning / Deep Learning Dockerfiles to allow Apache Tika to extract objects and to produce textual captions for images and video
Size: 21.5 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 6

kimtth/pyspark-tika-text-extraction
🚴‍♂️⛷ Data Lake: performance tuning for text extraction from a huge number of files.
Language: Python - Size: 261 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 0

nasa-jpl-memex/image_space
Interactive image similarity, visual search, and retrieval application
Language: JavaScript - Size: 2.25 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 93 - Forks: 46

nipun-goyal/DocuMeta-The-Art-of-Generating-Metadata
This project showcases the application of LDA topic modelling and k-means clustering to extracting information from PDF documents.
Language: Jupyter Notebook - Size: 1.05 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

izveigor/X-MAS-HACK
A web application that predicts a document's type from its content 📝
Language: TypeScript - Size: 883 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

pmagtulis/practice-notebooks
A compilation of my coding practice notebooks, covering topics from basic Python to web scraping and pandas.
Language: Jupyter Notebook - Size: 11.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

skupriienko/Pyxtract
Python module for extracting text from URLs and PDFs
Language: Jupyter Notebook - Size: 5.16 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 1

opensemanticsearch/tika-python.deb
tika-python packaged for Debian GNU/Linux and Ubuntu Linux
Size: 6.84 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 1
