GitHub topics: document-processing
Daniel-codi/Concept_Curve_Embeddings_Indexation
Code to give any AI unlimited-context persistent memory. The example is software that lets any AI read the Uniform Commercial Code of Michigan, a document of 220,000 tokens.
Language: JavaScript - Size: 20.3 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 0 - Forks: 0

jmanhype/DSPy-Multi-Document-Agents
An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.
Language: Python - Size: 135 KB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 29 - Forks: 2

felixdittrich92/docling-OCR-OnnxTR
OnnxTR OCR plugin for Docling
Language: Python - Size: 1.47 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

ucbepic/TWIX
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents
Language: Python - Size: 177 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 171 - Forks: 7

ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
Language: Python - Size: 66.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,937 - Forks: 185

kevv1m/tikara
The metadata and text content extractor for almost every file type.
Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

awslabs/project-lakechain
⚡ Cloud-native, AI-powered, document processing pipelines on AWS.
Language: TypeScript - Size: 177 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 177 - Forks: 26

mancrurod/Resume-Optimization
Personal project that automates resume adaptation using LLMs. Converts .docx resumes to Markdown, tailors them to job descriptions with GPT-4o-mini or Gemini, and exports clean HTML and PDF resumes — with built-in editing and logging features.
Language: Python - Size: 71.3 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

abdullahshafiq-20/ResumeTex
ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaTeX syntax.
Language: JavaScript - Size: 560 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 19 - Forks: 1

diegoabeltran16/OpenPages-pipeline
Open-source tool for turning technical documents into AI-ready formats. Built for better access to knowledge.
Language: Python - Size: 1.78 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

AshrafulAlamShaqib/pdf-page-counter
Offline web app to count pages in PDF files using PDF.js
Language: JavaScript - Size: 0 Bytes - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

AhmedZeyadTareq/Smart-markdown-Extractor
A smart AI-powered application to extract, reorganize, and interact with file content, converting it into clean Markdown format using OpenAI and Streamlit.
Language: Python - Size: 5.86 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

awslabs/rhubarb
A Python framework for multi-modal document understanding with Amazon Bedrock
Language: Python - Size: 31.7 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 82 - Forks: 6

iamarunbrahma/pdf-to-markdown
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.
Language: Python - Size: 69.3 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 74 - Forks: 7

0x22B9/ai-telegram-bot
AI Telegram bot using Gemini for chat, audio, and docs, with HuggingFace image gen. Deploy on Fly.io. Try it now!
Language: Python - Size: 233 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

formkiq/formkiq-core
A full-featured Document Management Platform / Document Layer for your application, providing storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. Please 🌟 star to support our work!
Language: Java - Size: 20.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 128 - Forks: 18

enoch3712/ExtractThinker
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
Language: Python - Size: 20.3 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,205 - Forks: 118

credeed/credeed-pdf-to-markdown
Convert PDF to Markdown using AI, so that agents can understand documents.
Size: 0 Bytes - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 0 - Forks: 0

CentralFloridaAttorney/zmongo_retriever
Use data from MongoDB in LangChain, Llama and OpenAI
Language: Python - Size: 27.3 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 4 - Forks: 1

aws-samples/sample-document-processing-with-amazon-bedrock-data-automation
This repository contains examples for customers to get started using Amazon Bedrock Data Automation. The samples focus mainly on document processing use cases
Language: Jupyter Notebook - Size: 9.09 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 5 - Forks: 2

eklem/stopword-trainer
A module for creating stopword lists for any language, based on a set of documents.
Language: JavaScript - Size: 6.16 MB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 15 - Forks: 0

gs-ai/PDFProfessor
PDF Professor 2.0 extracts and processes PDF text, analyzed by Ollama for summarization, data extraction, and insights. More coming soon!
Language: Python - Size: 1.95 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

AmadeusITGroup/docs2vecs
CLI that helps with splitting docs, embedding them, and exposing them seamlessly
Language: Python - Size: 1.51 MB - Last synced at: 25 days ago - Pushed at: 26 days ago - Stars: 3 - Forks: 5

Node0/timbermill
OCR-powered chat session renderer that slices long conversations into paginated, searchable PDFs
Size: 3.91 KB - Last synced at: 5 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

swiss-ai-center/layout-analysis-service
Layout Analysis Service detects parts of an image-based document using PP-PicoDet.
Language: Python - Size: 9.99 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai
This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.
Language: Jupyter Notebook - Size: 3.18 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

aws-solutions/enhanced-document-understanding-on-aws
Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.
Language: JavaScript - Size: 61.7 MB - Last synced at: 27 days ago - Pushed at: about 1 month ago - Stars: 37 - Forks: 14

jromero132/pdf-splitter
PDF Splitter is a Python tool that takes a multi-page PDF file and splits it into individual PDF files, one for each page of the original document.
Language: Python - Size: 2.93 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

souvik03-136/TenderBot
Task
Language: Python - Size: 127 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

jromero132/pdf-merger
A Python utility for merging multiple PDFs and images into a single PDF file. This tool maintains aspect ratios, centers content on custom-sized pages (default A4), and supports recursive directory processing. Perfect for organizing documents and creating cohesive PDF compilations.
Language: Python - Size: 2.93 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

easytocloud/Mac-letterhead
A macOS utility for merging letterhead templates with PDF and Markdown documents using a drag-and-drop interface
Language: Python - Size: 3.2 MB - Last synced at: 10 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

QuiddityAI/PDFerret
An all-in-one converter to make your files LLM-understandable
Language: HTML - Size: 32.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

kili-technology/awesome-datasets
A comprehensive list of annotated training datasets classified by use case.
Size: 24.9 MB - Last synced at: 6 days ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 6

JDM-Github/debahra-efficio
DEHBARA (Efficio) is a React and Express-based web application designed to streamline service requests for DTI, SSS, and other document processing needs. It simplifies the process of requesting official papers and services, integrating cloud storage for efficient data management.
Language: TypeScript - Size: 13.3 MB - Last synced at: 27 days ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

parsee-ai/parsee-core
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
Language: Python - Size: 1.24 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 66 - Forks: 1

FayazK/Document-Metadata-Extractor
A Python tool that uses Google's Gemini AI to automatically extract structured metadata from PDF and DOCX documents, saving results to Excel for easy analysis and organizing raw responses as JSON files.
Language: Python - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

aswinpradeepc/llmsearch
AI-powered search tool for querying financial reports, mutual fund documents, and market research using natural language. Built with FastAPI, Streamlit, OpenAI embeddings, and Pinecone vector search.
Language: Python - Size: 17.6 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

jcaperella29/Document_cleaning_CLI
A deep learning-based pipeline for cleaning scanned document images. Automatically removes noise, enhances text clarity, and optimizes images for OCR. 🚀
Language: MATLAB - Size: 94.5 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

unix-ami/Invertify
Invertify is a tool for inverting the colors of PDF files, perfect for creating dark mode versions of documents.
Language: Python - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

adhikaritusharAAA/Document_cleaning_CLI
A deep learning-based pipeline for cleaning scanned document images. Automatically removes noise, enhances text clarity, and optimizes images for OCR. 🚀
Language: Python - Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Swiftgum/swiftgum
The user data connection layer for AI applications. Transform any source into LLM-ready markdown. Focus on your AI, not integrations.
Language: TypeScript - Size: 3.05 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

KrzysztofTybinka/DocMiner
RAG API with an OCR feature and the option to use local embeddings and language models for secure, offline document processing and intelligent retrieval.
Language: C# - Size: 547 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Md-Emon-Hasan/LangChain
Powerful framework for building applications with Large Language Models (LLMs), enabling seamless integration with memory, agents, and external data sources.
Language: Jupyter Notebook - Size: 737 KB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

abdur75648/urdu-text-detection
Text line detection for Urdu OCR (UTRNet)
Language: Python - Size: 48.5 MB - Last synced at: 12 days ago - Pushed at: 7 months ago - Stars: 6 - Forks: 1

dhlab-epfl/dhSegment
Generic framework for historical document processing
Language: Python - Size: 5.89 MB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 374 - Forks: 115

baughmann/tikara
The metadata and text content extractor for almost every file type.
Language: Python - Size: 161 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

kallebysantos/ocrlot
A distributed OCR engine 🐆
Language: Elixir - Size: 291 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Jayanth-MKV/advanced-rag-cookbooks
Advanced RAG Techniques and Projects
Language: HTML - Size: 1.71 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

adibshirazi/PDFMerger
PDF Merger Tool
Language: TypeScript - Size: 13.7 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

acsenrafilho/cucaracha
A bureaucratic cockroach (cucaracha) assistant to help with document processing and analysis
Language: Python - Size: 6.44 MB - Last synced at: 3 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 1

oeo/processor-rs
High-performance document processing pipeline in Rust. Extracts text, performs OCR, and optimizes images from PDFs and other document formats with parallel processing and memory efficiency.
Language: Rust - Size: 42 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

qlfv/Docling-Testing
Repository for testing and demonstrating the capabilities of Docling for document conversion.
Language: HTML - Size: 18.4 MB - Last synced at: 24 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 2

Huang-lab/figure-extractor
Flask-based service using PDFFigures 2.0 to extract figures and tables from scholarly PDFs. Features REST API, CLI, Docker support, and JSON metadata output (~1.5s/page processing). Designed for document processing and RAG pipelines.
Language: Python - Size: 16.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

drgsn/filefusion
FileFusion is a powerful file concatenation tool designed specifically for Large Language Model (LLM)
Language: Go - Size: 173 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 0

LF3551/AutoDocMark
AutoDocMark: Streamline Document-to-Markdown Workflows
Language: Python - Size: 112 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

steindani/pandoc-include
An include filter for Pandoc
Language: Haskell - Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 62 - Forks: 20

BjornMelin/pdfusion
A lightweight Python utility for effortlessly merging multiple PDF files into a single document.
Language: Python - Size: 40 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

jayllfpt/table2html
A Python package that converts table images into HTML format using Object Detection model and OCR.
Language: Python - Size: 365 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

terilios/file-upload-embeddings
Enterprise-grade document intelligence platform leveraging vector embeddings and LLMs for advanced document processing, semantic search, and information retrieval.
Language: Python - Size: 173 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

MBAigner/PDFSegmenter
This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.
Language: Python - Size: 399 KB - Last synced at: 15 days ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 3

maemresen/mae-ghostscript
mae-ghostscript is a Docker-based tool for compressing PDF files efficiently using Ghostscript. This containerized solution simplifies the process of PDF compression, providing a consistent environment that works across different platforms. Users can run the container by mounting their local directories and specifying the PDF to compress.
Language: Shell - Size: 12.7 KB - Last synced at: 2 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

towfique-elahe/pdf-to-structured-csv
A Python-based tool for extracting structured data from PDFs using OCR and regex, and exporting it to CSV. Ideal for processing invoices, logs, or scanned documents into organized, usable datasets.
Language: Jupyter Notebook - Size: 27.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 0

digiparser/digiparser-website
DigiParser | Extract data from documents and emails
Language: TypeScript - Size: 88.1 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

cburschka/lyx
Unofficial mirror of git://git.lyx.org/lyx.git (updates daily; not affiliated with lyx.org)
Language: C++ - Size: 616 MB - Last synced at: 6 days ago - Pushed at: about 2 years ago - Stars: 36 - Forks: 7

deBUGger404/RAG-Powered-GPT-4-Chatbot
🚀 Revolutionize your data interaction with a cutting-edge chatbot built on Retrieval-Augmented Generation (RAG) and OpenAI’s GPT-4. Upload documents, create custom knowledge bases, and get precise, contextual answers. Ideal for research, business operations, customer support, and more!
Language: HTML - Size: 23.4 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 0 - Forks: 1

Shahrom-S/BarsAI
AI assistant
Language: Python - Size: 11.2 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

kaypro283/document-merger-analyzer
Automate merging of DOC, DOCX, and PDF files with word frequency analysis. Streamlines document consolidation for large-scale projects.
Language: Python - Size: 7.81 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

pratheeshkumar99/Document-based-Question-Answering-System
This project demonstrates a Retrieval-Augmented Generation (RAG) system for question answering. It integrates OpenAI’s GPT-4 model with FAISS for vector similarity search, enabling the system to provide accurate and contextually relevant answers based on a given document or dataset.
Language: Python - Size: 13.7 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

caltechlibrary/popstar
Phone-Oriented Processing SofTware for ARchives
Language: Makefile - Size: 49.2 MB - Last synced at: 29 days ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

swiss-ai-center/document-vectorizer-service
Service to vectorize documents into a FAISS vectorstore.
Language: Python - Size: 557 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

SDpDas/Document_annotate_tool
Adds an annotation to each element in a document and identifies what it is.
Language: Python - Size: 292 KB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

ArtemZarubin/XmlDocumentProcessor
XmlDocumentProcessor: A .NET component for XML document processing. It analyzes XML content, performs keyword-based queries, and transforms data into HTML. Emphasizes design patterns like Strategy pattern, with a focus on class diagramming. Implements penalty for non-compliance.
Language: C# - Size: 19.5 KB - Last synced at: 2 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

dayang4321/MSc-Team-Project-CMPU9010-2023-24-Group-3
TU Dublin Computer Science MSc. Final Project Group 3 - Accessibilator
Language: Jupyter Notebook - Size: 100 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

Jackojc/old-wotpp 📦
A document preprocessor that works in conjunction with tools like groff/troff & refer.
Language: C++ - Size: 60.5 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 1 - Forks: 0

rina-reimer/uwb-hacks-ai-local
AI-powered chatbot designed to simplify the job search process
Language: TypeScript - Size: 443 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

thoth2357/Watermark-removal
Program that helps remove watermarks from a PDF document
Language: Python - Size: 3.91 KB - Last synced at: 3 months ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

afrozas/proceedings
Semantic extraction from conference proceedings.
Language: Python - Size: 1.06 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 31 - Forks: 1

johnsirmon/clearcouncil
ClearCouncil: Automated tools for collecting, organizing, and embedding publicly available local state county council documents (minutes, agendas) into LLMs. Python, JS, and wget scripts included for easy data retrieval and integration.
Language: Python - Size: 71.3 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 1

cemonal/Pdf2xNet
Pdf2xNet is a .NET library for seamless integration with Xpdf tools, enabling easy conversion of PDF documents to text, images, and HTML formats within your .NET applications.
Language: C# - Size: 11.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

greed2411/tokyo
tokyo is a REST API that, given any type of document 📄, identifies its MIME type 🧐, suggests an extension 🦔, and also extracts text 💪.
Language: Clojure - Size: 19.5 KB - Last synced at: 4 days ago - Pushed at: almost 5 years ago - Stars: 18 - Forks: 0

m4nd0mb3/document-templater
Document Templater is a powerful tool for automated document generation. Streamline the process of creating standard documents, such as contracts, reports, and forms, using predefined templates. This repository contains the source code for Document Templater, allowing you to easily integrate this functionality into your projects and automate docs.
Language: JavaScript - Size: 579 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 5 - Forks: 0

Oneirocom/generative-intent-detection
Generative intent detection with Magick
Language: TypeScript - Size: 42 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 2 - Forks: 0

fonckchain/pdf-text-converter
Python tool for converting PDF files to text. Simplify your document processing tasks.
Language: Python - Size: 1000 Bytes - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

x1ao4/doc-merger
Merge two relatively incomplete documents into one complete document via a Python script
Language: Python - Size: 22.5 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

jackvaughan09/phil (fork of hudnash/phil)
Minimize the time requirement of audit report analysis with a containerized file conversion and scraping system
Language: Jupyter Notebook - Size: 106 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

anne27/Information-Retrieval
An implementation of basic IR techniques from scratch.
Language: Python - Size: 27.8 MB - Last synced at: 11 months ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 0

NinjaRocks/Data2Xml
Data2Xml is a .NET 6.0 library that maps data to XML using a list of XPath expressions. Supports data sets from APIs and SQL.
Language: C# - Size: 58.6 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

joseferrerh/invoices-leanautomation
This set of robots supports automatically obtaining information from invoices using the docDigitizer API and keeping track of the processed invoices in an Airtable repository
Language: RobotFramework - Size: 403 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

RPetitpierre/Generic_Semantic_Segmentation_of_Historical_Maps
Language: Jupyter Notebook - Size: 94.4 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

jeanbaptisteb/doccleaner
A Python command-line utility intended for automating some copyediting tasks in documents. It allows editing zipped, XML-based files (e.g. docx, odt, or epub), through XSLT stylesheets. Can be rather easily extended with your own custom xsl stylesheets.
Language: XSLT - Size: 81.1 KB - Last synced at: about 1 year ago - Pushed at: almost 7 years ago - Stars: 6 - Forks: 2

zyrolasting/dynamic-xml
Apply keyword procedures in a given Racket namespace using X-expressions.
Language: Racket - Size: 5.86 KB - Last synced at: 7 days ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 0

trehman65/backtoschool
School/College Stationery List OCR and Parsing
Language: C++ - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: about 8 years ago - Stars: 0 - Forks: 0
