Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: corpus
kscanne/gaelg
NLP resources for Manx Gaelic, mainly in support of the gv2ga MT engine
Language: Perl - Size: 11.3 MB - Last synced: about 3 hours ago - Pushed: about 5 hours ago - Stars: 3 - Forks: 1
s-lilo/brat-peek
Framework for working with brat-annotated .ann files
Language: Python - Size: 146 KB - Last synced: about 10 hours ago - Pushed: about 12 hours ago - Stars: 7 - Forks: 2
luciamariaalvarezcrespo/GalMisoCorpus2023
Galician corpus for misogyny detection
Language: Python - Size: 4.22 MB - Last synced: about 11 hours ago - Pushed: about 13 hours ago - Stars: 6 - Forks: 0
endymecy/awesome-deeplearning-resources
Deep learning and deep reinforcement learning research papers and some code
Size: 290 MB - Last synced: about 4 hours ago - Pushed: 2 months ago - Stars: 2,824 - Forks: 664
sagesolar/Corpus-of-Taylor-Swift
This is a dataset consisting of all song lyric words found on all of Taylor Swift's studio albums (up to and including TTPD), as well as a selection of other songs written by her.
Size: 5.93 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 4 - Forks: 0
DFKI-NLP/product-corpus
This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the relation CompanyProvidesProduct.
Size: 22 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 12 - Forks: 2
open-discourse/open-discourse
Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Language: Python - Size: 1.68 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 81 - Forks: 7
dracor-org/gerdracor
German Drama Corpus
Language: CSS - Size: 133 MB - Last synced: 1 day ago - Pushed: 1 day ago - Stars: 8 - Forks: 10
centre-for-educational-technology/evkk
ELLE - Estonian language learning and analysis environment for learners, educators and linguists
Language: JavaScript - Size: 109 MB - Last synced: about 9 hours ago - Pushed: 1 day ago - Stars: 1 - Forks: 3
saferwall/malware-souk
Collaborative malware exchange repository.
Language: Python - Size: 55.8 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 27 - Forks: 7
amir-zeldes/gum
Repository for the Georgetown University Multilayer Corpus (GUM)
Language: Python - Size: 1.06 GB - Last synced: 3 days ago - Pushed: 5 days ago - Stars: 86 - Forks: 51
fendouai/Awesome-Chatbot
Awesome Chatbot projects, corpora, papers, and tutorials, including Chinese chatbots
Language: Python - Size: 12.7 KB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 2,006 - Forks: 406
quanteda/quanteda
An R package for the Quantitative Analysis of Textual Data
Language: R - Size: 741 MB - Last synced: about 20 hours ago - Pushed: 2 days ago - Stars: 825 - Forks: 186
mhbashari/awesome-persian-nlp-ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Size: 192 KB - Last synced: about 7 hours ago - Pushed: 6 months ago - Stars: 700 - Forks: 113
khashashin/chechen_corpora
This repository contains the source code for the Chechen Language Corpora website.
Language: TypeScript - Size: 4.43 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 6 - Forks: 2
esteeschwarz/SPUND-LX
Linguistics essays
Language: HTML - Size: 74.6 MB - Last synced: 3 days ago - Pushed: 4 days ago - Stars: 0 - Forks: 0
UniversalDependencies/UD_Portuguese-Bosque
The Universal Dependencies (UD) Portuguese treebank.
Language: Common Lisp - Size: 209 MB - Last synced: 4 days ago - Pushed: 4 days ago - Stars: 46 - Forks: 11
PyThaiNLP/thaigov-v2-corpus
Thai news dataset from the Thai government website.
Language: Jupyter Notebook - Size: 89.3 MB - Last synced: 8 days ago - Pushed: 8 days ago - Stars: 11 - Forks: 1
sparkfish/shabby-pages
ShabbyPages is a state-of-the-art corpus of born-digital document images, with both ground-truth and distorted versions, suitable for training models to reverse distortions and recover the original, denoised documents.
Language: Jupyter Notebook - Size: 84.2 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 42 - Forks: 5
ko-nlp/Korpora
Korean corpus repository
Language: Python - Size: 3.32 MB - Last synced: 5 days ago - Pushed: over 1 year ago - Stars: 651 - Forks: 77
franciellevargas/HausaHate
HausaHate is a benchmark dataset for the Hausa hate speech detection task. It was extracted from West African Facebook pages and comprises 2,000 comments annotated with a binary class (offensive and non-offensive) and hate speech targets (race, gender, and none).
Size: 894 KB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
spottolaq/corpus-spotted-2020
This repository houses a comprehensive collection of 14,701 Instagram posts authored by Italian university students between January 2020 and December 2020. These posts offer invaluable insights into the experiences and reflections of students during the challenging period of the COVID-19 lockdown in Italy.
Size: 16.6 KB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0
oroszgy/awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
Size: 110 KB - Last synced: 6 days ago - Pushed: 6 months ago - Stars: 207 - Forks: 18
gauravcodepro/pubmed-abstract-fetcher
This function prepares the abstract and ID information for all the PubMed articles that you want to read and cite. I wrote it using a web-scraping approach; it is blazing fast and parses better than NCBI E-utilities.
Language: Python - Size: 27.3 KB - Last synced: 2 days ago - Pushed: 8 days ago - Stars: 1 - Forks: 0
OYE93/Chinese-NLP-Corpus
Collection of Chinese NLP corpora
Language: Python - Size: 7.14 MB - Last synced: 5 days ago - Pushed: over 3 years ago - Stars: 848 - Forks: 207
dkalpakchi/awesome-swedish-nlp
A curated list of resources for natural language processing (NLP) in Swedish
Size: 25.4 KB - Last synced: about 6 hours ago - Pushed: over 1 year ago - Stars: 19 - Forks: 1
qundao/corpus
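Collection of corpus data and lexicons: Chinese and English stop words, sentiment analysis, classification dictionaries, and sensitive-word lists (prohibited words, censored words)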
Size: 19.6 MB - Last synced: 7 days ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
EmilStenstrom/suc_to_iob
Convert the SUC 3.0 corpus from a custom format to IOB2 for use in training NER applications
Language: Python - Size: 25.4 KB - Last synced: 8 days ago - Pushed: over 2 years ago - Stars: 5 - Forks: 2
kan-bayashi/VCTKCorpusFullContextLabel
Full context label for VCTK Corpus.
Size: 54.3 MB - Last synced: 8 days ago - Pushed: about 4 years ago - Stars: 2 - Forks: 0
kamaravichow/text-summariser-python
Simple text summariser using NLTK in Python
Language: Jupyter Notebook - Size: 15.6 KB - Last synced: 8 days ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0
yilunzhu/ontogum
Repository for the OntoGUM Corpus
Language: Python - Size: 8.36 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 6 - Forks: 0
bertvandepoel/snelSLiM
A set of linguistic tools in Go and a web interface in PHP for quick Stable Lexical Marker Analysis
Language: JavaScript - Size: 4.2 MB - Last synced: 8 days ago - Pushed: almost 3 years ago - Stars: 3 - Forks: 0
mikahama/SemFi
Semantic relations for Finnish words
Language: HTML - Size: 79.1 KB - Last synced: 8 days ago - Pushed: 5 months ago - Stars: 2 - Forks: 2
rcarmo/newsfeed-corpus
A Dockerized RSS feed fetcher for NLP work, using asyncio
Language: JavaScript - Size: 794 KB - Last synced: 8 days ago - Pushed: over 1 year ago - Stars: 20 - Forks: 2
dariusk/corpora
A collection of small corpora of interesting data for the creation of bots and similar projects.
Language: JavaScript - Size: 2.94 MB - Last synced: 7 days ago - Pushed: 3 months ago - Stars: 4,852 - Forks: 1,299
franciellevargas/SentiAspect-pt
SentiAspect-pt comprises 180 product reviews annotated for implicit and explicit fine-grained opinions, which are hierarchically organized for aspect-based sentiment analysis and opinion summarization applications.
Size: 1.48 MB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 5 - Forks: 1
mesolitica/malaysian-dataset
We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
Language: Jupyter Notebook - Size: 1.32 GB - Last synced: 9 days ago - Pushed: 9 days ago - Stars: 282 - Forks: 101
quasilyte/phpcorpus
A collection of various PHP code; useful for PHP tool writers to get some insight into what "real-world" PHP code looks like
Size: 10.7 KB - Last synced: 10 days ago - Pushed: about 2 years ago - Stars: 1 - Forks: 1
quasilyte/eldb
Emacs Lisp corpus. Code collected from many projects for you to query!
Language: Emacs Lisp - Size: 347 KB - Last synced: 10 days ago - Pushed: over 6 years ago - Stars: 0 - Forks: 0
proycon/colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate, and query pattern models.
Language: C++ - Size: 10.1 MB - Last synced: 3 days ago - Pushed: 6 months ago - Stars: 123 - Forks: 19
Taiwan-Social-Media-Corpus/blacklab-demo
A repo that demonstrates how to build a BlackLab corpus via Docker and Nginx.
Language: Shell - Size: 184 KB - Last synced: 11 days ago - Pushed: 11 days ago - Stars: 0 - Forks: 0
ThinamXx/WordFrequency_using_NLTK
In this repository, I use NLP to determine the most frequent words in Herman Melville's novel Moby Dick and how often they occur.
Language: HTML - Size: 1.04 MB - Last synced: 12 days ago - Pushed: over 3 years ago - Stars: 1 - Forks: 0
seanpm2001/DroppedText_Corpus
A text corpus collection for the DroppedText language.
Size: 2.14 MB - Last synced: 12 days ago - Pushed: over 1 year ago - Stars: 3 - Forks: 2
german-asr/megs
A merged version of multiple open-source German speech datasets.
Language: Jupyter Notebook - Size: 235 KB - Last synced: 6 days ago - Pushed: 6 days ago - Stars: 28 - Forks: 3
kgjerde/corporaexplorer
An R package for dynamic exploration of text collections
Language: R - Size: 5.89 MB - Last synced: 7 days ago - Pushed: 11 months ago - Stars: 63 - Forks: 4
innerNULL/mia
My Implementations' Archive
Language: Python - Size: 1.6 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 0
julienijs/keep_V-ing
The grammaticalization of "keep"
Language: R - Size: 5 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 0
clarin-eric/ParlaMint
ParlaMint: Comparable Parliamentary Corpora
Language: XSLT - Size: 2.1 GB - Last synced: 21 days ago - Pushed: 23 days ago - Stars: 37 - Forks: 50
grammarly/ua-gec
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Language: Macaulay2 - Size: 18 MB - Last synced: about 16 hours ago - Pushed: 3 months ago - Stars: 254 - Forks: 21
writecrow/crow_backend
The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing
Language: PHP - Size: 2.72 MB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 1 - Forks: 0
KurdishBLARK/KTC
Kurdish Textbooks Corpus
Size: 1.92 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 6 - Forks: 0
crownpku/Small-Chinese-Corpus
Some useful small Chinese corpus datasets
Size: 92.4 MB - Last synced: 8 days ago - Pushed: about 4 years ago - Stars: 526 - Forks: 161
kunansy/RNC
API for Russian National Corpus
Language: Python - Size: 745 KB - Last synced: 6 days ago - Pushed: 10 months ago - Stars: 7 - Forks: 0
alexeykosh/lingcorpora.py
API for corpora
Language: Python - Size: 146 KB - Last synced: 17 days ago - Pushed: almost 5 years ago - Stars: 8 - Forks: 11
NiuTrans/Classical-Modern
A very comprehensive parallel corpus of Classical Chinese (ancient texts) and Modern Chinese
Language: Python - Size: 400 MB - Last synced: 18 days ago - Pushed: 18 days ago - Stars: 891 - Forks: 194
adliska/parallel_text_cleaning
Code for my BSc thesis: Cleaning of Parallel Texts for Machine Translation
Language: Java - Size: 15.6 KB - Last synced: 19 days ago - Pushed: about 8 years ago - Stars: 0 - Forks: 0
several27/FakeNewsCorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Size: 442 KB - Last synced: 20 days ago - Pushed: over 4 years ago - Stars: 373 - Forks: 96
philipperemy/japanese-words-to-vectors
Word2vec (word to vectors) approach for Japanese language using Gensim and Mecab.
Language: Python - Size: 333 KB - Last synced: 8 days ago - Pushed: over 2 years ago - Stars: 83 - Forks: 19
agaraman0/Fundamentals-Of-NLP
Natural Language Processing
Language: Jupyter Notebook - Size: 34.2 KB - Last synced: 22 days ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 0
chatopera/efaqa-corpus-zh
Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus
Language: Python - Size: 204 KB - Last synced: 22 days ago - Pushed: 4 months ago - Stars: 547 - Forks: 80
christos-c/bible-corpus
A multilingual parallel corpus created from translations of the Bible.
Size: 138 MB - Last synced: 21 days ago - Pushed: about 2 months ago - Stars: 163 - Forks: 45
PenguinCabinet/mama-katu-DM-corpus
A corpus of Japanese spam messages inviting "Mama Katu" arrangements.
Language: Python - Size: 44.9 KB - Last synced: 22 days ago - Pushed: 12 months ago - Stars: 42 - Forks: 12
744189447/tfidf
A Go library for Chinese and English tag extraction: it segments Chinese words with Jieba, extracts corpus tags by TF-IDF weight, and stores the corpus set in BoltDB.
Language: Go - Size: 13.7 KB - Last synced: 22 days ago - Pushed: almost 2 years ago - Stars: 0 - Forks: 0
gambolputty/german-nouns
A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.
Language: Python - Size: 21 MB - Last synced: 20 days ago - Pushed: about 2 months ago - Stars: 127 - Forks: 18
techiaith/corpws-meincnodi-rhannau-ymadrodd
Corpws ar gyfer meincnodi tagwyr rhannau ymadrodd Cymraeg | A corpus for benchmarking Welsh part-of-speech taggers
Size: 104 KB - Last synced: 23 days ago - Pushed: about 2 years ago - Stars: 0 - Forks: 1
sorinmarti/fruechtekorb
This is a text corpus management system for the German Linguistics Department of the University of Basel.
Language: PHP - Size: 531 KB - Last synced: 23 days ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0
Superar/Puntuguese
Language: Python - Size: 3.61 MB - Last synced: 22 days ago - Pushed: 22 days ago - Stars: 3 - Forks: 0
aquilax/bg-words-dict
A list of words in the Bulgarian language.
Language: Shell - Size: 7.82 MB - Last synced: 23 days ago - Pushed: about 3 years ago - Stars: 5 - Forks: 2
NCBI-Hackathons/ClusterDuck
Disease Clustering from Literature Based on Minimal Training Data
Language: Python - Size: 274 KB - Last synced: 23 days ago - Pushed: over 1 year ago - Stars: 7 - Forks: 6
hugovk/everyfinnishword
Every Finnish word
Size: 1.01 MB - Last synced: 8 days ago - Pushed: almost 9 years ago - Stars: 28 - Forks: 1
GateNLP/corpusconversion-tiger
Tool to convert the German Tiger corpus and other corpora in Tiger format to GATE
Language: Groovy - Size: 6.84 KB - Last synced: 24 days ago - Pushed: over 7 years ago - Stars: 0 - Forks: 0
fbkarsdorp/concy
Simple Concordance Tool
Language: Python - Size: 1.95 KB - Last synced: 24 days ago - Pushed: over 6 years ago - Stars: 2 - Forks: 0
tensorlayer/seq2seq-chatbot
Chatbot in 200 lines of code using TensorLayer
Language: Python - Size: 14.7 MB - Last synced: 24 days ago - Pushed: over 2 years ago - Stars: 836 - Forks: 316
linhd-postdata/disco
Diachronic Spanish Sonnet Corpus. Canonical and minor authors in Spanish (Europe and America): 15th to 19th century
Size: 12.2 MB - Last synced: 24 days ago - Pushed: almost 6 years ago - Stars: 4 - Forks: 0
lxs602/Chinese-Mandarin-Dictionaries
Chinese / Chinese-English dictionaries (in Simplified and Traditional Chinese).
Language: HTML - Size: 7.49 GB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 109 - Forks: 18
adbar/trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Language: Python - Size: 23.2 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 2,688 - Forks: 205
PyThaiNLP/Thai-Lao-Parallel-Corpus
Thai Lao Parallel corpus
Size: 524 KB - Last synced: 8 days ago - Pushed: over 2 years ago - Stars: 5 - Forks: 1
MiMoText/roman18
Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800)
Language: HTML - Size: 440 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 17 - Forks: 7
INL/corpus-frontend
BlackLab Frontend, a feature-rich corpus search interface for BlackLab.
Language: TypeScript - Size: 13.3 MB - Last synced: 27 days ago - Pushed: 28 days ago - Stars: 15 - Forks: 6
INL/BlackLab
Linguistic search for large annotated text corpora, based on Apache Lucene
Language: Java - Size: 25.2 MB - Last synced: 28 days ago - Pushed: 29 days ago - Stars: 97 - Forks: 51
AMS21/DLXEmu-Corpus
Corpus storage for DLXEmu fuzzers
Size: 1.74 GB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 0 - Forks: 0
erc-dharma/tfb-daksinakosala-epigraphy
DHARMA project Task Force B, Dakṣiṇa Kosala epigraphic corpus being prepared by Natasja Bosma.
Language: HTML - Size: 1.55 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
flairNLP/fundus
A very simple news crawler with a funny name
Language: Python - Size: 14.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 38 - Forks: 5
CanCLID/canto-filter
Cantonese corpus filter
Language: Python - Size: 21.5 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 29 - Forks: 2
zjunlp/IEPile
IEPile: A Large-Scale Information Extraction Corpus
Language: Python - Size: 2.07 MB - Last synced: 28 days ago - Pushed: 29 days ago - Stars: 61 - Forks: 4
lucasjinreal/weibo_terminater
Final Weibo crawler: scrape anything from Weibo, including comments, Weibo posts, followers, anything. The Terminator
Language: Python - Size: 162 KB - Last synced: 26 days ago - Pushed: over 4 years ago - Stars: 2,312 - Forks: 460
howl-anderson/MITIE_Chinese_Wikipedia_corpus
Pre-trained Wikipedia corpus by MITIE
Size: 5.86 KB - Last synced: 8 days ago - Pushed: over 5 years ago - Stars: 52 - Forks: 9
unendin/Trump_Campaign_Corpus
Corpus of campaign speeches, interviews, debates, statements and tweets by Donald Trump
Size: 22.1 MB - Last synced: 29 days ago - Pushed: almost 7 years ago - Stars: 14 - Forks: 6
SMIL-SPCRAS/DAVIS
Official repo for "Audio-Visual Speech Recognition In-the-Wild: Multi-Angle Vehicle Cabin Corpus and Attention-based Method" in ICASSP 2024
Language: JavaScript - Size: 5.82 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 0
chatopera/insuranceqa-corpus-zh
Insurance industry corpus for chatbots
Language: Python - Size: 533 MB - Last synced: 26 days ago - Pushed: 6 months ago - Stars: 989 - Forks: 338
islamAndAi/QURAN-NLP
Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP
Language: Jupyter Notebook - Size: 105 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 42 - Forks: 9
gunthercox/chatterbot-corpus
A multilingual dialog corpus
Language: Python - Size: 536 KB - Last synced: 27 days ago - Pushed: 3 months ago - Stars: 1,339 - Forks: 1,149
CBLUEbenchmark/CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Language: Python - Size: 1.61 MB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 667 - Forks: 118
nevenjovanovic/croatiae-auctores-latini-textus
XML texts of Croatian Latin authors (published as CroALa digital collection)
Language: XQuery - Size: 39.4 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 6 - Forks: 4
FLAGlab/SimCorp
This corpus contains several datasets of behaviorally equivalent C/C++ programs for evaluating their semantic similarity. The datasets are: 6 Type-4 scenarios extracted from BigCloneBench; 10 programs for sorting, aggregation, and search algorithms; and 566 programs extracted from Codeforces that solve 5 different problems.
Language: C++ - Size: 9.06 MB - Last synced: 23 days ago - Pushed: almost 2 years ago - Stars: 2 - Forks: 0
CLUEbenchmark/CLUECorpus2020
Large-scale Pre-training Corpus for Chinese: 100 GB of Chinese pre-training data
Size: 308 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 877 - Forks: 78
PlexPt/chatgpt-corpus
ChatGPT Chinese corpus: dialogue, fiction, and customer-service corpora for training large models
Size: 151 MB - Last synced: about 1 month ago - Pushed: 11 months ago - Stars: 722 - Forks: 122
utopiaio/eKeyboard
Make typing Amharic [on mobile] great [again].
Language: JavaScript - Size: 3.39 MB - Last synced: 25 days ago - Pushed: almost 7 years ago - Stars: 14 - Forks: 3
liulalemx/felig-toolkit
A toolset for Amharic language pre-processing. Includes an Amharic stemmer, transliterator, stopword remover, lexical analyzer, corpus indexer, and term weighter.
Language: TypeScript - Size: 7.41 MB - Last synced: 30 days ago - Pushed: 12 months ago - Stars: 28 - Forks: 3
chakki-works/CoARiJ
Corpus of Annual Reports in Japan
Language: Python - Size: 37.1 KB - Last synced: 10 days ago - Pushed: over 3 years ago - Stars: 80 - Forks: 7