An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: glotcc

cisnlp/GlotLID

💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023

Language: Python - Size: 438 KB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 136 - Forks: 8

cisnlp/GlotCC

🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024

Language: Jupyter Notebook - Size: 2.31 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 18 - Forks: 0

cisnlp/ungoliant (fork of oscar-project/ungoliant)

🕷 The pipeline for the OSCAR/GlotCC corpus

Language: Rust - Size: 4.4 MB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 3 - Forks: 0

cisnlp/oscar-io (fork of oscar-project/oscar-io)

Readers/writers for the GlotCC/OSCAR corpus

Language: Rust - Size: 726 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

cisnlp/oscar-tools (fork of oscar-project/oscar-tools)

The original tooling for the GlotCC/OSCAR corpus, rewritten in Rust

Language: Rust - Size: 173 KB - Last synced at: 3 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0