Topic: "parallel-data"
thammegowda/mtdata
A tool that locates, downloads, and extracts machine translation corpora
Language: Python - Size: 6.36 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 154 - Forks: 23

PartitionedArrays/PartitionedArrays.jl
Large-scale, distributed, sparse linear algebra in Julia.
Language: Julia - Size: 5.58 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 125 - Forks: 21

VinAIResearch/PhoMT
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)
Size: 11.7 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 43 - Forks: 4

Elbria/xling-SemDiv
Code and data for the EMNLP 2020 paper: "Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank"
Language: Python - Size: 9.36 MB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 6 - Forks: 3

lormaechea/wivico
Wikipedia-Vikidia Corpus (WiViCo) - A general-purpose parallel sentence simplification dataset for French
Size: 21.7 MB - Last synced at: 19 days ago - Pushed at: 20 days ago - Stars: 2 - Forks: 1

Datsede04/Amharic-corps-collector-bot
A Telegram Bot for Amharic Speech Data Collection
Language: JavaScript - Size: 43.9 KB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 3
