Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: dsfsi-datasets
dsfsi/gov-za-multilingual
The data set contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
Language: Jupyter Notebook - Size: 1.32 GB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 3 - Forks: 0
dsfsi/Higher_Education_EDA
This is an exploratory data analysis (EDA) repository for education researchers and practitioners
Language: Jupyter Notebook - Size: 1.24 MB - Last synced: 4 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 1
dsfsi/covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Language: Jupyter Notebook - Size: 87.4 MB - Last synced: 3 months ago - Pushed: 6 months ago - Stars: 256 - Forks: 200
dsfsi/vukuzenzele-nlp (fork of dsfsi/dsfsi-dataset-template)
The dataset contains editions of the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.
Language: Jupyter Notebook - Size: 5.28 GB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 6 - Forks: 3
dsfsi/PuoBERTa
A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.
Language: Makefile - Size: 3.56 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 3 - Forks: 0
dsfsi/izindaba-zesizulu
Categorised isiZulu News. Source data is the isiZulu news from the SABC social media posts.
Size: 2.15 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
dsfsi/embedding-eval-data
Embedding Evaluation Data for South African Languages
Size: 8.79 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 1 - Forks: 0
dsfsi/dlindaba-2019-uber
Uber Rider Rating Data from DLIndaba 2019
Size: 11.7 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
dsfsi/sa-parliament
South African Member Of Parliament Data
Language: Python - Size: 61.5 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 5
dsfsi/StatsSA-Language
StatsSA statistical language glossary in machine-readable format
Language: Jupyter Notebook - Size: 30.3 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 2
dsfsi/za-bank-risk
This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks to identify financial risk. It leveraged work by the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE–FRSE) of El-Haj et al., which created a corpus of annual reports of United Kingdom (UK) companies.
Language: Jupyter Notebook - Size: 1.31 GB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 1 - Forks: 0
dsfsi/project-state-capture
Zondo Commission or State Capture Commission Transcripts
Size: 50.3 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0
dsfsi/za-terminology
DSFSI South African Terminology Lists and Lexicon Project
Language: Makefile - Size: 1.27 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
dsfsi/healthfacilitymap
South African Health Facility map. Created to aid in covid19za responses
Language: JavaScript - Size: 485 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 1
dsfsi/gov-za-sona-multilingual
Language: Python - Size: 2.35 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 0 - Forks: 0
dsfsi/za-fake-news-2020
Dataset of South African Disinformation [Fake News] Website Data collected in 2020
Size: 15.6 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 3
dsfsi/za-isizulu-siswati-news-2022
IsiZulu News (articles and headlines) and Siswati News (headlines) Corpora - za-isizulu-siswati-news-2022
Size: 292 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 1 - Forks: 0
dsfsi/PuoData
Curated corpora for Setswana. Used to train PuoBERTa.
Size: 8.32 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 2 - Forks: 0