Topic: "parallel-data"

thammegowda/mtdata

A tool that locates, downloads, and extracts machine translation corpora

Language: Python - Size: 6.36 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 154 - Forks: 23

PartitionedArrays/PartitionedArrays.jl

Large-scale, distributed, sparse linear algebra in Julia.

Language: Julia - Size: 5.58 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 125 - Forks: 21

VinAIResearch/PhoMT

PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)

Size: 11.7 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 43 - Forks: 4

Elbria/xling-SemDiv

Code and data for the EMNLP 2020 paper: "Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank"

Language: Python - Size: 9.36 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 3

lormaechea/wivico

Wikipedia-Vikidia Corpus (WiViCo) - A general-purpose parallel sentence simplification dataset for French

Size: 21.7 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 2 - Forks: 1

Datsede04/Amharic-corps-collector-bot

A Telegram Bot for Amharic Speech Data Collection

Language: JavaScript - Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 3