GitHub / cisnlp 3 Repositories
Deep Natural Language Processing Group at Center for Language and Information Processing, University of Munich (LMU)
cisnlp/GlotScript
🖋 Resource and Tool for Writing System Identification -- LREC 2024
Language: Python - Size: 128 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 14 - Forks: 2

cisnlp/MEXA
🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
Language: Python - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 0

cisnlp/GlotCC
🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
Language: Jupyter Notebook - Size: 2.31 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 0

cisnlp/GlotWeb
🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction.
Language: Python - Size: 1.59 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 0

cisnlp/cisnlp.github.io
Homepage of cisnlp
Language: SCSS - Size: 47.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

cisnlp/simalign
Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
Language: Python - Size: 136 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 361 - Forks: 48

cisnlp/Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
Language: Python - Size: 151 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 100 - Forks: 4

cisnlp/code-specific-neurons
How Programming Concepts and Neurons Are Shared in Code Language Models
Language: Jupyter Notebook - Size: 2.54 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

cisnlp/manchu-in-context-mt
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
Language: Python - Size: 1.53 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cisnlp/GlotLID
Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
Language: Python - Size: 409 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 92 - Forks: 7

cisnlp/ungoliant Fork of oscar-project/ungoliant
🕷 The pipeline for the OSCAR/GlotCC corpus
Language: Rust - Size: 4.4 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

cisnlp/oscar-io Fork of oscar-project/oscar-io
Readers/Writers for GlotCC/OSCAR corpus
Language: Rust - Size: 726 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

cisnlp/oscar-tools Fork of oscar-project/oscar-tools
The original tooling for the GlotCC/OSCAR corpus rewritten in Rust
Language: Rust - Size: 173 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

cisnlp/Taxi1500
Language: Python - Size: 49.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

cisnlp/analogical_reasoning
Language: JavaScript - Size: 2.03 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

cisnlp/lohoravens-webpage
Language: JavaScript - Size: 57.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

cisnlp/MaskLID
MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024
Language: Python - Size: 12.7 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

cisnlp/TransMI
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
Language: Python - Size: 278 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

cisnlp/XAMPLER
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Language: Python - Size: 70.3 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

cisnlp/graph-align
Code for the EMNLP graph-align paper
Language: Python - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

cisnlp/GlotStoryBook
Children's storybooks for 180 languages.
Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

cisnlp/Spatial_Schemas
Language: JavaScript - Size: 2.78 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

cisnlp/mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
Language: Python - Size: 35.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

cisnlp/ColexificationNet
Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

cisnlp/TransliCo
Language: Python - Size: 47.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

cisnlp/ofa
A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
Language: Python - Size: 70.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

cisnlp/bias-in-nlp
Literature overview: gender bias in natural language processing
Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 10 - Forks: 0

cisnlp/parcoure
ParCourE - Parallel Corpus Explorer
Language: Python - Size: 329 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 10 - Forks: 0

cisnlp/semi-markov-crf
Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"
Language: Python - Size: 31.1 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 16 - Forks: 4
