
Topic: "parallel-corpus"

NiuTrans/Classical-Modern

A very comprehensive Classical Chinese (ancient Chinese) to Modern Chinese parallel corpus

Language: Python - Size: 400 MB - Last synced at: about 23 hours ago - Pushed at: about 1 year ago - Stars: 1,331 - Forks: 306

kirralabs/indonesian-NLP-resources

Data resources for Indonesian-language NLP

Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 228 - Forks: 50

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.

Language: Python - Size: 2.05 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 144 - Forks: 45

Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

Language: Python - Size: 4.32 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 104 - Forks: 20

bfsujason/bertalign

Multilingual sentence alignment using sentence embeddings

Language: Python - Size: 296 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 97 - Forks: 44

FerreroJeremy/Cross-Language-Dataset

A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection

Size: 657 MB - Last synced at: about 2 years ago - Pushed at: almost 8 years ago - Stars: 60 - Forks: 25

tsuruoka-lab/BSD

The Business Scene Dialogue corpus

Size: 2.91 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 55 - Forks: 6

matbahasa/TALPCo

TUFS Asian Language Parallel Corpus

Language: TeX - Size: 1.57 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 48 - Forks: 13

sharad461/nepali-translator

Neural Machine Translation on the Nepali-English language pair

Language: Python - Size: 3.85 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 47 - Forks: 16

ShathaTm/LK-Hadith-Corpus

Leeds University and King Saud University (LK) Hadith Corpus

Language: Python - Size: 13.4 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 38 - Forks: 18

asmelashteka/HornMT

Machine translation (MT) benchmark dataset for languages in the Horn of Africa.

Size: 3.91 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 37 - Forks: 11

Caucasus-Rosetta/Lingua-Corpus

Multilingual and monolingual corpora focused on Caucasus languages, for Natural Language Processing (NLP)

Language: Python - Size: 25.2 GB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 35 - Forks: 6

Kartikaggarwal98/Indian_ParallelCorpus

Curated list of publicly available parallel corpora for Indian languages

Size: 8.79 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 28 - Forks: 1

BramVanroy/astred

An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For instance useful for comparing a translation with the original text, to find differences and similarities between two different translations, or to see how a machine translation differs from a reference translation.

Language: Python - Size: 257 KB - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 22 - Forks: 0

korenyoni/opus-api

OPUS (opus.nlpl.eu) Python3 API

Language: Python - Size: 117 KB - Last synced at: 29 days ago - Pushed at: 6 months ago - Stars: 18 - Forks: 5

priyanshu2103/Sanskrit-Hindi-Machine-Translation

Machine Translation from Sanskrit to Hindi using Unsupervised and Supervised Learning

Language: Jupyter Notebook - Size: 9.69 MB - Last synced at: 10 months ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 12

KurdishBLARK/InterdialectCorpus

A parallel corpus of Sorani, Kurmanji and English

Size: 21.7 MB - Last synced at: 16 days ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 3

cfiltnlp/IITB-English-Hindi-PC

The IIT Bombay English-Hindi Parallel Corpus

Language: Jupyter Notebook - Size: 501 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 11 - Forks: 2

YerevaNN/PARASITE

🪱 PARASITE || A parallel sentence data preprocessing toolkit. Originally developed as a part of the `en-ru` winner submission of WMT20 Biomedical Translation Task.

Language: Python - Size: 426 KB - Last synced at: over 1 year ago - Pushed at: almost 4 years ago - Stars: 11 - Forks: 6

tanloong/interlaced.nvim

Neovim plugin for aligning bilingual parallel texts

Language: Lua - Size: 250 KB - Last synced at: 16 days ago - Pushed at: 28 days ago - Stars: 10 - Forks: 0

spraakbanken/swell-editor

Editor for normalising learner texts (error annotation and tagging).

Language: TypeScript - Size: 4.18 MB - Last synced at: 23 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 3

soumendrak/MTEnglish2Odia 📦

Machine Translation from English to Odia language.

Language: Jupyter Notebook - Size: 62.6 MB - Last synced at: 10 months ago - Pushed at: almost 4 years ago - Stars: 9 - Forks: 7

michmech/irish-sentence-bank

4,500 sentences in Irish, tokenized, manually lemmatized, translated into English.

Size: 400 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 9 - Forks: 1

stibiumghost/tajik-to-persian-transliteration

Tajik-to-Persian transliteration project

Language: Jupyter Notebook - Size: 45.6 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 8 - Forks: 1

farshadjafari/parallel_corpus_generator

Python application that generates a parallel corpus for any language pair; can be used for training NMT (Neural Machine Translation) systems

Language: Python - Size: 33.8 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

UUDigitalHumanitieslab/timealign

Parallel corpus annotation and visualization

Language: Python - Size: 6.16 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 1

UUDigitalHumanitieslab/perfectextractor

Extracting present perfects (and related forms) from parallel corpora

Language: Python - Size: 1.26 MB - Last synced at: 17 days ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 2

rggdmonk/hadal

A simple and efficient tool for mining and aligning sentences with pre-trained models.

Language: Python - Size: 680 KB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 6 - Forks: 0

shashanksiripragada/pib-crawl

Code to extract a multilingual parallel corpus from the Press Information Bureau (PIB) website.

Language: Python - Size: 1.03 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 6 - Forks: 4

x39826/Pali_Tripitaka

Pali Buddhist scriptures of 15 countries and their parallel corpus

Language: Python - Size: 56.6 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 6 - Forks: 4

PyThaiNLP/Thai-Lao-Parallel-Corpus

Thai Lao Parallel corpus

Size: 524 KB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

tldr-pages/tldr-translation-pairs-gen

Generates a structured dataset in various formats derived from tldr-pages.

Language: TypeScript - Size: 365 KB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 4 - Forks: 3

Deeptiman/php-dom-parser-translation-tool

A simple DOM parser and translation tool using PHP, HTML, and MySQL. The translation model supports English-to-Odia translation, backed by a built-in dictionary.

Language: PHP - Size: 4.62 MB - Last synced at: about 2 months ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

OdiaWikimedia/English-Odia 📦

English to Odia/Oriya parallel corpus of phrases

Language: Jupyter Notebook - Size: 4 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 4 - Forks: 4

x39826/Multilang_Translator_For_Pali_Tripitaka

Parallel corpus and multilingual machine translation system of the Pali Buddhist scriptures in 15 countries

Language: Python - Size: 43.7 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 1

time-in-translation/preprocess-corpora

Creating (parallel) corpora from scratch using Uplug tooling

Language: Python - Size: 770 KB - Last synced at: 5 days ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 1

gederajeg/constructional-equivalence

Repository of supplementary materials and RStudio project for the paper on a corpus-based approach to measuring constructional equivalence.

Language: TeX - Size: 2.53 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

tm4roon/jawikinews-headline-dataset

A parallel corpus of article-headline pairs obtained from Japanese Wikinews.

Language: Jupyter Notebook - Size: 7.9 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 0

davidwarrior22/machine-translation-for-african-languages

This repository focuses on developing machine translation and NLP tools specifically for African languages. Join us in addressing the challenges and opportunities in this vital area of language technology! 🛠️🌍

Language: TeX - Size: 177 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

AlexsandroANP/heiwukong-multilingual-corpora

"Hei Wukong" (黑悟空) multilingual (parallel) corpus, including the implementation code and a Streamlit search project

Language: Python - Size: 5.52 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

Nexdata-AI/5310000-Groups-Chinese-Germany-Parallel-Corpus-Data

5310000-Groups-Chinese-Germany-Parallel-Corpus-Data

Size: 1.95 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/9830000-Groups-Chinese-Japanese-Parallel-Corpus-Data

Chinese-Japanese Parallel Corpus Data

Size: 1.08 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/6020000-Groups-Chinese-French-Parallel-Corpus-Data

6020000-Groups-Chinese-French-Parallel-Corpus-Data

Size: 1.15 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/1000000-Groups-Chinese-Russian-Parallel-Corpus-Data

1000000-Groups-Chinese-Russian-Parallel-Corpus-Data

Size: 1.16 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/1080000-Groups-English-Russian-Parallel-Corpus-Data

English and Russian parallel corpus

Size: 168 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/7440000-Groups-Chinese-Hindi-Parallel-Corpus-Data

7440000-Groups-Chinese-Hindi-Parallel-Corpus-Data

Size: 189 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/10030000-Groups-Chinese-Portuguese-Parallel-Corpus-Data

10030000-Groups-Chinese-Portuguese-Parallel-Corpus-Data

Size: 235 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/7290000-Groups--Chinese--Vietnamese-Parallel-Corpus-Data

7290000-Groups-Chinese-Vietnamese-Parallel-Corpus-Data

Size: 219 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/750000-Groups-Chinese-Burmese-Parallel-Corpus-Data

750000-Groups-Chinese-Burmese-Parallel-Corpus-Data

Size: 128 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/4720000-Groups-Chinese-Uighur-Parallel-Corpus-Data

4720000-Groups-Chinese-Uighur-Parallel-Corpus-Data

Size: 153 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/850000-Groups-English-Japanese-Parallel-Corpus-Data

English Japanese Parallel Corpus Data

Size: 187 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/3140000-Groups-Chinese-Spanish-Parallel-Corpus-Data

3140000-Groups-Chinese-Spanish-Parallel-Corpus-Data

Size: 223 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/12820000-Groups-Chinese-Korean-Parallel-Corpus-Data

Chinese-Korean-Parallel-Corpus-Data

Size: 147 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/1140000-Groups-Chinese-Hebrew-Parallel-Corpus-Data

1140000-Groups-Chinese-Hebrew-Parallel-Corpus-Data

Size: 158 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/5010000-Groups-Chinese-Tibetan-Parallel-Corpus-Data

5010000-Groups-Chinese-Tibetan-Parallel-Corpus-Data

Size: 142 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/APY230328001_980000-Groups-Chinese-Urdu-Parallel-Corpus-Data

APY230328001_980000-Groups-Chinese-Urdu-Parallel-Corpus-Data

Size: 186 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/1980000-Groups-Chinese-Polish-Parallel-Corpus-Data

1980000-Groups-Chinese-Polish-Parallel-Corpus-Data

Size: 257 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/3060000-Groups-Chinese-English-Parallel-Corpus-Data

Chinese-English-Parallel-Corpus-Data

Size: 141 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/100000-Groups-Chinese-Uighur-Parallel-Corpus-Data

100000-Groups-Chinese-Uighur-Parallel-Corpus-Data

Size: 1000 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/380000-Groups-Japanese-English-Parallel-Corpus-Data

Japanese and English parallel corpus

Size: 1.57 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/1340000-Groups-English-Korean-Parallel-Corpus-Data

English and Korean parallel corpus

Size: 161 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Nexdata-AI/380000-Groups-Uighur-Chinese-Parallel-Corpus-Data

380000-Groups-Uighur-Chinese-Parallel-Corpus-Data

Size: 1020 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

umoqnier/Esquite (fork of ElotlMX/Esquite)

Framework for parallel corpora

Language: Python - Size: 6.9 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

xavier-gz/SLI_Galician_Corpora

Corpora for Galician language

Size: 365 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

time-in-translation/ParTy2OPUS

ParTy2OPUS converts documents from the ParTy corpus to the OPUS format

Language: Python - Size: 3.91 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ikros98/ULMFiT-for-Italian

ULMFiT model for the Italian language / creation of a parallel corpus

Language: Jupyter Notebook - Size: 13.7 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

OdiaNLP/wikipedia-corpus 📦

Odia Wikipedia monolingual corpus extraction

Language: Python - Size: 1.02 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0