An open API service providing repository metadata for many open source software ecosystems.

GitHub / cisnlp 3 Repositories

Deep Natural Language Processing Group at the Center for Information and Language Processing, University of Munich (LMU)

cisnlp/GlotScript

🖋 Resource and Tool for Writing System Identification -- LREC 2024

Language: Python - Size: 128 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 14 - Forks: 2

cisnlp/MEXA

🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

Language: Python - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 10 - Forks: 0

cisnlp/GlotCC

🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024

Language: Jupyter Notebook - Size: 2.31 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 18 - Forks: 0

cisnlp/GlotWeb

🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction.

Language: Python - Size: 1.59 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 0

cisnlp/cisnlp.github.io

Homepage of cisnlp

Language: SCSS - Size: 47.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

cisnlp/simalign

Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)

Language: Python - Size: 136 KB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 361 - Forks: 48

cisnlp/Glot500

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023

Language: Python - Size: 151 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 100 - Forks: 4

cisnlp/code-specific-neurons

How Programming Concepts and Neurons Are Shared in Code Language Models

Language: Jupyter Notebook - Size: 2.54 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

cisnlp/manchu-in-context-mt

Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu

Language: Python - Size: 1.53 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

cisnlp/GlotLID

Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

Language: Python - Size: 409 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 92 - Forks: 7

cisnlp/ungoliant Fork of oscar-project/ungoliant

🕷 The pipeline for the OSCAR/GlotCC corpus

Language: Rust - Size: 4.4 MB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 3 - Forks: 0

cisnlp/oscar-io Fork of oscar-project/oscar-io

Readers/Writers for GlotCC/OSCAR corpus

Language: Rust - Size: 726 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

cisnlp/oscar-tools Fork of oscar-project/oscar-tools

The original tooling for the GlotCC/OSCAR corpus rewritten in Rust

Language: Rust - Size: 173 KB - Last synced at: 4 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

cisnlp/Taxi1500

Language: Python - Size: 49.6 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 0

cisnlp/analogical_reasoning

Language: JavaScript - Size: 2.03 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

cisnlp/lohoravens-webpage

Language: JavaScript - Size: 57.3 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 0

cisnlp/MaskLID

MaskLID: Code-Switching Language Identification through Iterative Masking -- ACL 2024

Language: Python - Size: 12.7 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 2 - Forks: 0

cisnlp/TransMI

TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data

Language: Python - Size: 278 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

cisnlp/XAMPLER

XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples

Language: Python - Size: 70.3 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

cisnlp/graph-align

Code for the EMNLP graph-align paper

Language: Python - Size: 4 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 0

cisnlp/GlotStoryBook

Children's storybooks for 180 languages.

Language: Jupyter Notebook - Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

cisnlp/Spatial_Schemas

Language: JavaScript - Size: 2.78 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

cisnlp/mPLM-Sim

mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models

Language: Python - Size: 35.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

cisnlp/ColexificationNet

Language: Jupyter Notebook - Size: 12.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

cisnlp/TransliCo

Language: Python - Size: 47.9 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

cisnlp/ofa

A framework that wisely initializes unseen subword embeddings in PLMs for efficient large-scale continued pretraining

Language: Python - Size: 70.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

cisnlp/bias-in-nlp

Literature overview: gender bias in natural language processing

Language: Python - Size: 13.7 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 10 - Forks: 0

cisnlp/parcoure

ParCourE - Parallel Corpus Explorer

Language: Python - Size: 329 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 10 - Forks: 0

cisnlp/semi-markov-crf

Code for paper "Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging"

Language: Python - Size: 31.1 MB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 16 - Forks: 4