An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: semi-structured-data

RomualdRousseau/Archery

Framework to manipulate semi-structured documents and extract data from them

Language: Java - Size: 194 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 1 - Forks: 1

snap-stanford/stark

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (NeurIPS D&B 2024)

Language: Python - Size: 8.78 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 310 - Forks: 37

Amur-N/Semi-structured-Dataset-Collection

An open collection of 100+ semi-structured textual datasets (LOG, TXT, CSV datasets, etc.).

Language: PHP - Size: 945 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

BartJongejan/Bracmat

Programming language for symbolic computation with an unusual combination of pattern matching features: tree patterns, associative patterns, and expressions embedded in patterns.

Language: C - Size: 23.9 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 47 - Forks: 5

VorTECHsa/refinery

Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.

Language: Kotlin - Size: 387 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 50 - Forks: 6

kuhumcst/texton-Java

Web-based workflow management system that computes candidate tool workflows given input file(s) and the user's requirements regarding the output, then runs the workflow the user selects from the list of candidates. Implemented in Bracmat (~75%) and Java (~25%).

Language: Java - Size: 13.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 2

ansh-info/stark-agent

STaRK: an agentic AI benchmark designed to evaluate how well LLMs and retrieval systems work with semi-structured knowledge bases.

Language: Python - Size: 2.53 MB - Last synced at: 10 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

taehyounpark/queryosity

Coherent data analysis library

Language: C++ - Size: 5.27 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 0

Dibyakanti/AutoTNLI-code

This repository contains the official code for the paper "Realistic Data Augmentation Framework for Enhancing Tabular Reasoning".

Language: HTML - Size: 3.99 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 6 - Forks: 1

ropensci/EndoMineR

Extraction of various kinds of endoscopic and pathological data

Language: R - Size: 50.8 MB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 13 - Forks: 4

RomualdRousseau/Any2Json-Parquet

Any2Json Parquet Plugin

Language: Java - Size: 36.4 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2Json-Pdf

Any2Json PDF Plugin

Language: Java - Size: 27.8 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2Json-Dbf

Any2Json Dbf Plugin

Language: Java - Size: 29.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2Json-Net-Classifier

Any2Json Net Classifier Plugin

Language: Java - Size: 647 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

RomualdRousseau/Any2Json-Layex-Parser

Any2Json Layex Parser Plugin

Language: Java - Size: 495 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2json-Llm-Classifier

Any2Json LLM Classifier Plugin

Language: Java - Size: 409 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2Json-Excel

Any2Json Excel Plugin

Language: Java - Size: 56.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/PyAny2Json

Python binding of Any2Json

Language: Python - Size: 4.32 MB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2Json-Examples

Examples that demonstrate how to use Any2Json to load documents from "real life".

Language: Java - Size: 114 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

meaghancoconnor/prerequiste_checks

A Python program that parses student transcript data to determine eligibility

Language: Python - Size: 5.86 KB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

RomualdRousseau/Any2Json-Documents

Documentation on how to use Any2Json to load documents from "real life".

Language: TeX - Size: 3.55 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

eternalchaoticinflation/YoutubeChannelReader

Language: Java - Size: 72.3 KB - Last synced at: about 1 year ago - Pushed at: about 8 years ago - Stars: 1 - Forks: 0

RomualdRousseau/Any2Json-Models

Repository of basic Models for Any2Json

Size: 27.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Info-Sync/InfoSync

Implementation of the semi-structured inference model in our ACL 2023 paper: INFOSYNC: Information Synchronization across Multilingual Semi-structured Tables.

Language: HTML - Size: 244 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

utahnlp/infotabs-code

Implementation of the semi-structured inference model in our ACL 2020 paper, INFOTABS: Inference on Tables as Semi-structured Data.

Language: Python - Size: 127 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 17 - Forks: 7

utahnlp/knowledge_infotabs

Repository containing code for the NAACL 2021 paper (Incorporating External Knowledge to Enhance Tabular Reasoning)

Language: Python - Size: 16.3 MB - Last synced at: about 1 year ago - Pushed at: almost 4 years ago - Stars: 17 - Forks: 5

mansakondo/activemodel-embedding

An ActiveModel extension to model your semi-structured data using embedded associations

Language: Ruby - Size: 200 KB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

MaimoonaKhilji/Hive-Queries

Hive queries

Size: 870 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

rub-ksv/MyFixit-Annotator

A semi-automatic web-based annotation tool for the MyFixit dataset

Language: CSS - Size: 3.91 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

rub-ksv/MyFixit-Dataset

A dataset for extracting information from repair manuals

Language: Python - Size: 41.3 MB - Last synced at: over 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 3

cyk1337/UrbanDict

Urban Dict spelling variant dataset. Source code for "How to Evaluate Word Representations of Informal Domain?"

Language: Jupyter Notebook - Size: 118 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 3 - Forks: 0

sebastiz/EndoMineR (fork of ropensci/EndoMineR)

Extraction of various kinds of endoscopic and pathological data

Language: R - Size: 52.9 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

Promeos/Am_I_Speaking_Your_Language (fork of NLP-Darden-Project-Team-6/NLP-3)

❗️WIP❗️ Using semi-structured data from GitHub, I predicted the programming language of a repository with X% accuracy using a [Place Holder] Classifier Model. The model outperformed the baseline by X%.

Language: Jupyter Notebook - Size: 12.6 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

ngmy/eloquent-serialized-lob

Eloquent Serialized LOB is a trait for Laravel Eloquent models that enables the Serialized LOB pattern

Language: PHP - Size: 306 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

patrikken/PrefTwig2Stack

Standalone Java application for querying XML documents using requests with preferences (GTP requests with preferences)

Language: Java - Size: 30.3 MB - Last synced at: 3 months ago - Pushed at: over 5 years ago - Stars: 2 - Forks: 0