GitHub topics: machine-translation-data-processing
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
Language: Python - Size: 4.31 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 276 - Forks: 40

lt3/nfr
Neural Fuzzy Repair (NFR) is a data augmentation pipeline that integrates fuzzy matches (i.e., similar translations) into neural machine translation.
Language: Python - Size: 34 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 11 - Forks: 2

ELDAELRA/elda_cmtk
ELDA Crawled Data Management Toolkit
Language: OCaml - Size: 166 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 3 - Forks: 1

geovedi/nmt-playground
Personal NMT Playground
Language: Python - Size: 95.7 MB - Last synced at: 7 days ago - Pushed at: almost 8 years ago - Stars: 2 - Forks: 1

MarsPanther/machine-translation-research
Just trying to translate from Amharic to English
Language: Shell - Size: 888 KB - Last synced at: almost 2 years ago - Pushed at: about 7 years ago - Stars: 1 - Forks: 2

moodser/splitter-transliteration
Python script to split the text generated by 'wikipedia parallel title extractor' into separate text files (separate file for each language)
Language: Python - Size: 10.7 KB - Last synced at: almost 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

alphadl/corpus_filter
Scripts for filtering machine translation parallel corpora
Language: Python - Size: 313 KB - Last synced at: almost 2 years ago - Pushed at: about 6 years ago - Stars: 8 - Forks: 2

ShristiK/Cross-Lingual-Document-Translator
Translator developed and trained on a provided corpus using an IBM model
Language: Jupyter Notebook - Size: 56.8 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

deepak2233/Nueral_Machine_Translation_Eng_to_Hin
Using a Seq2Seq LSTM-based model along with attention
Language: Jupyter Notebook - Size: 579 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 1 - Forks: 0

vzhomeexperiments/cloud_translate
Repository for automatic file translation using the Google Translate API and R statistical software
Language: R - Size: 40 KB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 6

mrsumitbd/SOParallelCorpusReplication
Replication package for SO processing for bitext
Language: Python - Size: 434 KB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 1

erayyildiz/parallel-sentence-quality-filter
Parallel sentence quality filter based on text classification methods
Language: Perl - Size: 1.21 MB - Last synced at: over 2 years ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0
