An open API service providing repository metadata for many open source software ecosystems.

Topic: "pdf-parser"

opendatalab/MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

Language: Python - Size: 124 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 34,114 - Forks: 2,746

py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Language: Python - Size: 20.5 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 9,079 - Forks: 1,470

dromara/yft-design

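An open-source version of 稿定设计 (Gaoding Design) based on fabric.js. A beautiful and powerful online design tool with poster design and image editing features. Suitable for many scenarios, such as poster generation, e-commerce product image creation, long-form article image design, and video/WeChat official account cover editing.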

Language: TypeScript - Size: 50.8 MB - Last synced at: 10 days ago - Pushed at: 20 days ago - Stars: 1,285 - Forks: 256

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

Language: Rust - Size: 2.88 MB - Last synced at: 18 days ago - Pushed at: 5 months ago - Stars: 1,101 - Forks: 46

adithya-s-k/marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

Language: Python - Size: 35 MB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 854 - Forks: 96

drmingler/docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.

Language: Python - Size: 3.48 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 585 - Forks: 58

liweiphys/layra

LAYRA is a ready-to-use visual RAG system with a complete web UI built with Next.js and FastAPI, preserving document layout, tables, paragraphs, and graphical elements without any structural fragmentation.

Language: TypeScript - Size: 2.61 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 427 - Forks: 42

titipata/scipdf_parser

Python PDF parser for scientific publications: content and figures

Language: Python - Size: 29.2 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 403 - Forks: 64

iamarunbrahma/vision-parse

Parse PDFs into markdown using Vision LLMs

Language: Python - Size: 374 KB - Last synced at: 11 days ago - Pushed at: 4 months ago - Stars: 373 - Forks: 51

lazyFrogLOL/llmdocparser

A package for parsing PDFs and analyzing their content using LLMs.

Language: Python - Size: 1.21 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 269 - Forks: 9

michelcrypt4d4mus/pdfalyzer

Analyze PDFs. With colors. And Yara.

Language: Python - Size: 93.5 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 265 - Forks: 19

ispras/dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

Language: Python - Size: 235 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 237 - Forks: 28

sypht-team/sypht-python-client

A python client for the Sypht API

Language: Python - Size: 165 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 162 - Forks: 5

codereverser/casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

Language: Python - Size: 7.85 MB - Last synced at: about 10 hours ago - Pushed at: 3 months ago - Stars: 145 - Forks: 66

sypht-team/sypht-java-client

A Java client for the Sypht API

Language: Java - Size: 108 KB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 87 - Forks: 1

BitMiracle/Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

Language: Visual Basic .NET - Size: 53.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 78 - Forks: 39

datalogics/adobe-pdf-library-samples

Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library

Size: 43.3 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 77 - Forks: 62

autokent/pdf-parse

Pure JavaScript cross-platform module to extract text from PDFs.

Last synced at: 29 days ago - Stars: 66 - Forks: 53

drmingler/smart-llm-loader

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

Language: Python - Size: 1.09 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 65 - Forks: 2

oidlabs-com/Lexoid

Multimodal document parser for high quality data understanding and extraction

Language: Python - Size: 46.7 MB - Last synced at: about 9 hours ago - Pushed at: about 20 hours ago - Stars: 62 - Forks: 8

davendw49/sciparser

PDF parsing toolkit for preparing academic text corpus

Language: Python - Size: 113 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 45 - Forks: 2

k16shikano/hpdft

tools to poke pdf using haskell

Language: Haskell - Size: 403 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 43 - Forks: 0

ashutoshvarma/pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

Language: Cython - Size: 12.2 MB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 42 - Forks: 17

SimpleApp/PDFParser

Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser

Language: Swift - Size: 146 KB - Last synced at: 6 months ago - Pushed at: almost 6 years ago - Stars: 37 - Forks: 10

sypht-team/sypht-golang-client

A Golang client for the Sypht API

Language: Go - Size: 73.2 KB - Last synced at: about 2 months ago - Pushed at: almost 5 years ago - Stars: 33 - Forks: 0

lesterchan/linkedin-pdf-resume-parser

Parse LinkedIn PDF Resume and extract out name, email, education and work experiences.

Language: PHP - Size: 289 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 25 - Forks: 11

sylphxltd/pdf-reader-mcp

An MCP server built with Node.js/TypeScript that allows AI agents to securely read PDF files (local or URL) and extract text, metadata, or page counts. Uses pdf-parse.

Language: TypeScript - Size: 1.15 MB - Last synced at: 7 days ago - Pushed at: 14 days ago - Stars: 24 - Forks: 3

ridi/content-parser

Content data parser for Ridibooks services

Language: JavaScript - Size: 49.2 MB - Last synced at: 13 days ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 7

dunso/pdf-parser

Convert PDF content and layout information with pdf.js

Language: JavaScript - Size: 2.18 MB - Last synced at: 10 days ago - Pushed at: over 5 years ago - Stars: 21 - Forks: 7

lucasjvds/Scanipy

Scanipy stands for "scan it with Python"—it's your smart Python library for scanning and parsing complex PDF files like books, reports, articles, and academic papers. Utilizing cutting-edge Deep Learning algorithms, Scanipy transforms your PDFs into a treasure trove of extractable information: tables, images, equations, and text.

Language: Python - Size: 273 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 1

tarfin-labs/easy-pdf

PDF wrapper for Laravel

Language: PHP - Size: 204 KB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 17 - Forks: 3

tuffstuff9/nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

Language: TypeScript - Size: 44.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 2

adrienjoly/HsbcStatementParser

Transforms PDF bank statements from HSBC into a list of operations in JSON or TSV format.

Language: JavaScript - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: over 9 years ago - Stars: 17 - Forks: 6

nlitsme/pyPdfCrack

Investigation into PDF encryption

Language: Python - Size: 34.2 KB - Last synced at: about 2 months ago - Pushed at: almost 2 years ago - Stars: 16 - Forks: 7

sypht-team/sypht-node-client

A Node.js client for the Sypht API

Language: JavaScript - Size: 62.5 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 13 - Forks: 4

VishwaGauravIn/pdf-parser-client-side

A lightweight, easy-to-use package to parse text from PDF files on the client side without any server dependency.

Language: TypeScript - Size: 26.4 KB - Last synced at: 7 days ago - Pushed at: 12 months ago - Stars: 12 - Forks: 0

sypht-team/sypht-kotlin-client

A Kotlin client for the Sypht API

Language: Kotlin - Size: 136 KB - Last synced at: about 2 months ago - Pushed at: about 2 years ago - Stars: 11 - Forks: 2

yintellect/auto-law-review

Automate the case review on legal case documents.

Language: Jupyter Notebook - Size: 30.5 MB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 11 - Forks: 3

easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

Language: Jupyter Notebook - Size: 85.9 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 4

clarekang/form-pdf2json

NodeJS library to convert JSON to PDF or vice versa

Language: JavaScript - Size: 2.67 MB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 2

shine-jayakumar/Extract-Data-From-PDF-In-Python

Batch-convert pdf to text, extract data from pdf in python

Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 4

datalogics/apdfl-csharp-dotnet-samples

Sample code for the Datalogics .NET interface of the Adobe PDF Library

Language: C# - Size: 315 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 9

datalogics/apdfl-cplusplus-samples

Sample code for the Datalogics C++ interface of the Adobe PDF Library

Language: C++ - Size: 11.1 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 8 - Forks: 7

Daniel-Alvarenga/Boot Fork of VitorCarvalho67/Boot

Digital platform tailored for the educational environment, designed to facilitate the dissemination of internship opportunities and promote student engagement

Language: Vue - Size: 8.16 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

ashutoshvarma/libxpdf

Static library built from the source of www.xpdfreader.com with most dependencies built in

Language: C++ - Size: 613 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 8 - Forks: 4

aidayang/MinerU-OneClick

An install-free, one-click-launch bundle for MinerU

Size: 49.8 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

sypht-team/sypht-csharp-client

A C# / .NET client for the Sypht API

Language: C# - Size: 65.4 KB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 1

sypht-team/sypht-elixir-client

An Elixir client for the Sypht API https://sypht.com

Language: Elixir - Size: 47.9 KB - Last synced at: 11 days ago - Pushed at: about 5 years ago - Stars: 6 - Forks: 0

bkawan/pdf-parser

Language: Python - Size: 3.25 MB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 5 - Forks: 0

sidmishraw/cs-267-project

PDF-Parser, Apriori, and Simplicial Complex algorithm implementations

Language: Python - Size: 10.8 MB - Last synced at: about 2 months ago - Pushed at: about 8 years ago - Stars: 5 - Forks: 0

datalogics/apdfl-java-maven-samples

Sample code for the Datalogics Java interface of the Adobe PDF Library, set up to build with Maven

Language: Java - Size: 1.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 4 - Forks: 11

ishaangupta-YB/nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and a custom drag-and-drop file uploader. Ideal for developers seeking a ready-to-use solution for PDF content extraction in their Next.js projects.

Language: TypeScript - Size: 200 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 3

aleff-github/PDF-Parser-VirusTotal-Based 📦

PDF Parser based on VirusTotal API

Language: Python - Size: 709 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

sypht-team/sypht-ruby-client

A Ruby client for the Sypht API

Language: Ruby - Size: 53.7 KB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 0

datalogics/apdfl-csharp-dotnet-framework-samples

Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library

Language: C# - Size: 563 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 9

FlazeFy/Gudangku-Laravel

GudangKu helps you manage your belongings, from home supplies and food stock to furniture. Set reminders for cleaning or for restocking your home supplies. The app can also generate reports to create shopping or maintenance lists. Start organizing your inventory with GudangKu’s features. Created using Laravel

Language: PHP - Size: 1.31 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 3 - Forks: 0

Stravah/eosin

Custom Bank Statement Parsing based on pure text positioning.

Language: Python - Size: 5.22 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 3 - Forks: 1

eli64s/pdflex

CLI for merging PDF contexts.

Language: Python - Size: 465 KB - Last synced at: 21 days ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

tomrule007/pdf-template-parse

JS Front-end PDF parser with template engine to convert pdf documents into organized data objects

Language: JavaScript - Size: 80.1 KB - Last synced at: 5 days ago - Pushed at: over 4 years ago - Stars: 3 - Forks: 1

cschen1205/spring-pdf-search-engine

PDF Search Engine implemented in Java and Spring Boot

Language: Java - Size: 77 MB - Last synced at: about 2 months ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 5

seinecle/nocodefunctions-io

io for nocodefunctions: csv, txt, pdf, and xlsx so far

Language: Java - Size: 240 KB - Last synced at: about 1 hour ago - Pushed at: about 10 hours ago - Stars: 2 - Forks: 0

bansalsahab/Parser

pdf heading parser

Language: Python - Size: 12.7 KB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

ashot-israelyan/nextjs-pdf-openai-chat

A demo AI application for uploading PDF files and chatting with ChatGPT about their content

Language: TypeScript - Size: 2.42 MB - Last synced at: 7 months ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

leandroroser/prettyparser

Parallel processing and parsing of PDF and TXT files, and of Python objects containing text (str, list), using rules (regular expressions).

Language: Python - Size: 106 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

Aumlo123/pdfdoom

DOOM in a PDF (as ascii art)

Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Besthope-Official/predoc

Document preprocessing service for RAG (Retrieval-Augmented Generation)

Language: Python - Size: 102 KB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 2

datalogics/apdfl-vb-dotnet-samples

Adobe PDF Library Samples in Visual Basic for .NET

Language: Visual Basic .NET - Size: 174 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 4

judaicalink/rdf_generator

A library to generate rdf files in turtle format for Judaicalink.

Language: Python - Size: 27.3 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

J-sephB-lt-n/pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

Language: Python - Size: 65.4 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 1 - Forks: 1

aqiftekhar/OpenAIChatBot

A healthcare chatbot built with OpenAI that also receives PDF documents and images and prescribes based on a summary

Language: TypeScript - Size: 73.2 KB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

greatjourney589/rogu-platform

React & Firebase platform for e-commerce and games

Language: JavaScript - Size: 176 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Kanchii/avenue-brokerage-to-excel

A simple script to convert Avenue's brokerage statements to Excel, extracting some data

Language: Python - Size: 11.7 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Siddhantsingh1230/SnapCV

A Simple NLP Web App to create summaries of your CVs

Language: CSS - Size: 343 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

BuildmodeOne/canisius-parser

A pdf parser to extract the meal plan from the "Katholische Canisiusstiftung" in Ingolstadt

Language: TypeScript - Size: 634 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 1 - Forks: 0

102AMIT/bank_statement_parser

Language: JavaScript - Size: 35 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

thinkoid/ypdf

A PDF parser.

Language: C++ - Size: 147 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

danielphan-dp/pdf-tools

PDF tools in Python. Including scripts to process multiple files at once.

Language: Python - Size: 15.3 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

minjunk/welstory-menu-pdf-parser 📦

Welstory menu PDF parser

Language: TypeScript - Size: 130 KB - Last synced at: 9 days ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 2

bratergit/hacktoberfest2020

Hacktoberfest 2020 - Build a desktop program that runs in the terminal and, given a PDF from Toro Investimentos with the day's brokerage notes, shows the income tax calculation for day trading of the mini dollar and the Bovespa mini index.

Language: JavaScript - Size: 92.8 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 2

eliottvincent/cep Fork of zarov/cep

📜 parse your Caisse d'Épargne PDF statements to CSV!

Language: Python - Size: 134 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

vnyk/Pdf-Parser-Python

PDF parser that extracts the information from a PDF file into a string and can store the extracted information in MySQL

Language: Python - Size: 1000 Bytes - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 3

Polyte/OMS_OCR

This is an image/PDF OCR reader. Use it to extract text from either an image or a PDF file. This project uses Tesseract.js and PDF-Parser to do OCR.

Language: TypeScript - Size: 69.9 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

Stranger123444/u

An interactive command-line tool designed to quickly navigate directories and perform various file operations efficiently. Its simple syntax and intuitive commands make it a favorite among developers for streamlining workflow tasks.

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

dadicharan/Log-Analyzer

Log Analyzer with AI is a Streamlit-based tool for AI-powered log analysis. It supports CSV log uploads, data visualization (Plotly & Matplotlib), and anomaly detection using DeepSeek LLM via Ollama API. Users can explore logs, detect patterns, and gain AI-driven insights. 🚀 Python, Pandas, Streamlit, AI

Language: Python - Size: 13.7 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

datalogics/apdfl-kotlin-samples

Adobe PDF Library Samples in Kotlin

Language: Kotlin - Size: 139 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 6

code-418-dpr/SportHub-parser

Parser for the PDF file of the Unified Calendar Plan (ЕКП) of the Russian Ministry of Sport, for the SportHub project

Language: Python - Size: 4.08 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

diegoabeltran16/OpenPages-pipeline

Open-source tool for turning technical documents into AI-ready formats. Built for better access to knowledge.

Language: Python - Size: 1.78 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

aescarias/pdfnaut

A Python library for exploring PDFs with ease.

Language: Python - Size: 773 KB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 0 - Forks: 0

chinmaymisra/personal-finance-tracker

Upload Axis Bank statements as PDFs, automatically parse transactions, and view them cleanly in a modern UI. Handles invalid files and non-supported banks gracefully. Built using React (Vite) and FastAPI.

Language: Python - Size: 143 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

sankeer28/PDF-Searcher

Live website to parse multiple PDFs using PDF.js to find matching text

Language: JavaScript - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

luccaHirae/invoice-extract-server

API for extracting data from invoices

Language: TypeScript - Size: 75.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

FayazK/Document-Metadata-Extractor

A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.

Language: Python - Size: 11.7 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cuiyuheng/olmocr Fork of allenai/olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Size: 30.9 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

dills122/cardboard-crack

Web app for parsing/viewing Soccer Card Checklists

Language: JavaScript - Size: 1.3 MB - Last synced at: 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

race-tech/f1-data-updater

A repository made to automatically update the F1 database used in the f1-api.

Language: Rust - Size: 194 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

RiccardoSenica/pdf-text-parsing

PDF-parsing demo

Language: TypeScript - Size: 167 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Alapipapi/MinerU Fork of opendatalab/MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

Language: Python - Size: 103 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

cuiyuheng/docling Fork of docling-project/docling

🥚 Transform PDF to JSON or Markdown with ease and speed 🐣

Size: 28.5 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

yvnggodemis/pdf-parse

PDF Parser built in Rust

Language: Rust - Size: 146 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

antea-p/flashcard_maker

Flashcard maker written in TypeScript, utilizing OpenAI API to create great cloze flashcards.

Language: TypeScript - Size: 25.4 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0