GitHub topics: corpus-generator
johentsch/ms3
A parser for annotated MuseScore 3 files.
Language: Python - Size: 105 MB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 47 - Forks: 3

user1342/AutoCorpus
AutoCorpus is a tool backed by a large language model (LLM) for automatically generating corpus files for fuzzing.
Language: Python - Size: 390 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 70 - Forks: 10

writecrow/crow_backend
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
Language: PHP - Size: 2.41 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 1 - Forks: 0

biomedicalinformaticsgroup/cadmus
A full-text article retrieval pipeline for biomedical literature.
Language: Python - Size: 271 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 2

bitextor/bitextor
Bitextor generates translation memories from multilingual websites
Language: Python - Size: 177 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 290 - Forks: 43

patasmith/corpusmaker
Create a corpus for fine-tuning an OpenAI model
Language: Python - Size: 201 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

uma-pi1/OPIEC-pipeline
Language: Java - Size: 59.3 MB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 14 - Forks: 2

mohabmes/Sinai-corpus
A clean Fusha Arabic tagged corpus.
Language: Python - Size: 53.9 MB - Last synced at: 20 days ago - Pushed at: over 4 years ago - Stars: 10 - Forks: 0

Pendulun/WebCrawler
Language: Python - Size: 648 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

apple-fritter/weechat.driftwood
Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.
Language: Python - Size: 31.3 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

apple-fritter/scrimshaw
Scrimshaw parses IRC logs stored in the driftwood format for quotes attributable to a given user. Written in Rust.
Language: Rust - Size: 112 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

phueb/MissingAdjunct
Generate pseudo-English sentences for research in semantic composition
Language: Python - Size: 18.7 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

FerreroJeremy/Plagiarized-Corpus-Generator
A corpus builder for evaluation of plagiarism detection tools
Language: PHP - Size: 242 KB - Last synced at: about 2 years ago - Pushed at: over 8 years ago - Stars: 2 - Forks: 0

felipetovarhenao/exquisitecorpus
A set of corpus-based sampling and analysis Max for Live (M4L) devices
Language: Max - Size: 22.8 MB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 1

ibnmalik/golden-corpus-arabic
A golden Arabic corpus built for testing Assem's arabicstemmer and other Arabic stemmers
Language: Python - Size: 17.6 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 8

saganoren/ukr-twi-corpus
A corpus of Ukrainian Twitter texts + instructions for downloading and filtering texts.
Language: Jupyter Notebook - Size: 49.5 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 11 - Forks: 3

thecsw/katya-dev
Katya, or The Liberated Corpus: a text corpus that allows you to request and scrape any web resource!
Language: Go - Size: 22.8 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

phueb/GroundedLang
A prototype for generating language in a grounded simulation of a simple hunter-gatherer world
Language: Python - Size: 139 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

ishalyminov/babi_tools
Augmentation scripts for the bAbI Dialog Tasks dataset
Language: Python - Size: 101 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 14 - Forks: 5
