Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: corpus-linguistics

Repositories

ssciwr/argumentation-management

Annotator combining different NLP pipelines.

Language: Python - Size: 3.68 MB - Last synced: 5 days ago - Pushed: 8 months ago - Stars: 0 - Forks: 1

partigabor/corpus

A corpus and computational linguistic workspace

Language: Jupyter Notebook - Size: 228 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

keeleleek/votic-corpora

Votic language corpora

Language: XQuery - Size: 951 KB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 2 - Forks: 0

fau-klue/docker-corpus-tool

Docker Images for IMS Open Corpus Workbench and UCS Toolkit

Language: Dockerfile - Size: 16.6 KB - Last synced: 9 months ago - Pushed: about 1 year ago - Stars: 4 - Forks: 1

catlism/catlism.github.io

Companion website for "Corpus Approaches to Language in Social Media" - source and build versions

Language: HTML - Size: 35.9 MB - Last synced: 9 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

elmiram/russian_learner_corpus

Russian Learner Corpus, a platform for corpus search and annotation

Language: JavaScript - Size: 3.91 MB - Last synced: 9 months ago - Pushed: over 5 years ago - Stars: 3 - Forks: 3

julienijs/Predictability_of_Complexity

How predictable is linguistic complexity?

Language: R - Size: 2.18 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

MarsPanther/Amharic-English-Machine-Translation-Corpus

Amharic English Machine Translation Corpus prepared through website crawling and custom preprocessing.

Language: Python - Size: 7.42 MB - Last synced: 10 months ago - Pushed: almost 6 years ago - Stars: 29 - Forks: 20

MarsPanther/crawl-for-parallel-corpora

Simple bs4-based web crawl for a corpus in need of statistical machine translation

Language: Python - Size: 5.86 KB - Last synced: 10 months ago - Pushed: almost 3 years ago - Stars: 10 - Forks: 5

Meidad1/Textual_Database_Generator

A tool which automatically produces a database of Khoekhoe language articles, using web scraping tools. This tool is used in a linguistic research project of the Namibian Khoekhoe language and its varieties across the Kalahari Basin area (HUJI).

Language: Python - Size: 15.6 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

gederajeg/anger-in-indonesian

Repository of the dataset, R Markdown Notebook with code, and other ancillary materials for a book chapter on the conceptualisation of ANGER in Indonesian.

Language: TeX - Size: 3.11 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 0 - Forks: 0

annadmitrieva/collocations_thesis

Code for my Master's thesis

Language: Jupyter Notebook - Size: 1.58 MB - Last synced: 10 months ago - Pushed: almost 5 years ago - Stars: 0 - Forks: 0

juletx/corpus-linguistics

Corpus Linguistics slides, labs, assignments and data

Language: R - Size: 35.9 MB - Last synced: 10 months ago - Pushed: about 2 years ago - Stars: 3 - Forks: 0

sfu-discourse-lab/MDA-OnlineComments

Supplementary materials for: Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register (version 1.0)

Language: R - Size: 410 KB - Last synced: 10 months ago - Pushed: over 3 years ago - Stars: 0 - Forks: 0

SpydazWebAI-NLP/BasicCorpus2023

A basic corpus object giving positional encoding/decoding. A fully loaded corpus = Corpus > Document > Sentences > Clauses > Words

Language: Visual Basic .NET - Size: 2.63 MB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 1 - Forks: 1

chrisdrymon/Treebanks

Treebanks modified from PROIEL and Perseus.

Size: 96.5 MB - Last synced: 10 months ago - Pushed: about 6 years ago - Stars: 0 - Forks: 0

chrisdrymon/CL-Mishmash

Non-Machine Learning Computational Linguistic Projects

Language: Python - Size: 43 KB - Last synced: 10 months ago - Pushed: about 6 years ago - Stars: 3 - Forks: 1

daleman/tesis

Towards a computational method for detecting contrastive lexicon

Language: Jupyter Notebook - Size: 300 MB - Last synced: 10 months ago - Pushed: about 6 years ago - Stars: 3 - Forks: 1

dlukes/shiny-mda

A Shiny app for visualizing Multi-Dimensional Analysis results

Language: R - Size: 398 KB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

infraling/atomic

Software for multi-level annotation of linguistic corpora

Language: Java - Size: 9.77 MB - Last synced: 10 months ago - Pushed: over 4 years ago - Stars: 17 - Forks: 5

ml4ai/nli4wills-corpus

Legal Will Statements for Natural Language Inference

Language: Python - Size: 2.85 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 4 - Forks: 0

jaypmorgan/DPnorm

DPnorm Calculator

Language: Julia - Size: 26.4 KB - Last synced: 10 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

Linguista/Frequency-List-Wizard

Frequency List Wizard is a command-line program that does various useful things with... frequency lists.

Language: Perl - Size: 4.95 MB - Last synced: 10 months ago - Pushed: almost 8 years ago - Stars: 2 - Forks: 0

Linguista/FreeLing-es_CL

Linguistic resources for adapting FreeLing to Chilean Spanish

Language: Makefile - Size: 7.99 MB - Last synced: 10 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

Linguista/CQPweb-Instabox

Script that sets up and configures an entire CQPweb server installation

Language: Shell - Size: 731 KB - Last synced: 10 months ago - Pushed: over 4 years ago - Stars: 10 - Forks: 1

drgriffis/text-essence

Preprocessing and analysis for training SNOMED-CT concept embeddings from CORD-19 corpus

Language: Python - Size: 392 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 12 - Forks: 0

petar-popovic-bg/Jerteh

This package provides utility classes and static methods for Python that make use of third-party software commonly used in text processing, such as Unitex-GramLab, TreeTagger, Apache-Tika and Google-Tesseract.

Language: Python - Size: 82 KB - Last synced: 25 days ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0

gisly/evenki-corpus

evenki-corpus

Language: Python - Size: 39.5 MB - Last synced: 10 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

steve3p0/LING576

Portland State University LING 575: CORPUS LINGUISTICS code repo for winter term 2020.

Language: Python - Size: 2.19 MB - Last synced: 10 months ago - Pushed: about 4 years ago - Stars: 0 - Forks: 0

Text-Mining/Ferdowsi-Annotated-Academic-Linguistic-Corpus

Two linguistic corpora drawn from the collection of articles of Ferdowsi University of Mashhad

Size: 57.6 MB - Last synced: 10 months ago - Pushed: about 3 years ago - Stars: 1 - Forks: 1

JaviAgua/EsLiPro

Estimations of Linguistic Productivity

Language: Python - Size: 71.3 KB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 2 - Forks: 0

writecrow/corpus_text_processor

A desktop application for preparing files for use in a corpus

Language: Python - Size: 27.4 MB - Last synced: about 2 months ago - Pushed: 5 months ago - Stars: 6 - Forks: 4

MiDiTeS/IntroToRforLinguistics02

R Course for Corpus Linguistics Research

Language: R - Size: 35.6 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 4 - Forks: 0

McTwiszt/string_similarity

Calculates the distance between a string cell of a data frame and the cell below it, and adds a new column with the distance for each cell. It also adds a column called SimSum that makes it possible to see the context above and below each row for a given threshold. This facilitates preprocessing of corpus data. Filter the SimSum column in a Calc program by > 0.

Language: R - Size: 1.95 KB - Last synced: 4 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

acoli-repo/conll

ACoLi CoNLL libraries: Several tools for processing, manipulating and transforming TSV formats (CoNLL-RDF, CoNLL-Merge, CQP4RDF)

Size: 74.2 KB - Last synced: 10 months ago - Pushed: over 2 years ago - Stars: 5 - Forks: 1

engisalor/corpusama

A language corpus creation tool for ReliefWeb

Language: Python - Size: 396 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 0 - Forks: 1

jonathandunn/common_crawl_corpus

Scripts for building a geo-located web corpus using Common Crawl data

Language: Python - Size: 238 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 7 - Forks: 0

aviiciii/tamil-word-frequency

Repo that analyses the frequency of words in Tamil

Language: Python - Size: 1.66 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

zelewskap/BA_heuristics

Heuristics and cognitive biases in public discourse on climate change - linguistic data analysis

Language: Jupyter Notebook - Size: 3.18 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

jonathandunn/corpus_similarity

Measure the similarity of text corpora for 74 languages

Language: Python - Size: 6.08 MB - Last synced: 10 days ago - Pushed: 4 months ago - Stars: 10 - Forks: 3

cognitive-metascience/word_sketch

Open source Python package to produce word sketches inspired by Sketch Engine (to make reproducible analyses)

Language: GLSL - Size: 25.7 MB - Last synced: 11 months ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

apple-fritter/weechat.driftwood

Natively log WeeChat channel and private messages, CTCP, and notices, in the driftwood standard. Written in Python.

Language: Python - Size: 31.3 KB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 0 - Forks: 0

lirondos/discursos-de-navidad

A corpus of the Christmas speeches delivered by the head of state of Spain from 1937 to 2021

Language: HTML - Size: 3.27 MB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 17 - Forks: 4

versotym/phoebeConverter

Converter from PhoEBE (phonetic notation used in Corpus of Czech Verse) to IPA, X-SAMPA, and Czech Phonetic Transcription.

Language: PHP - Size: 9.77 KB - Last synced: 10 months ago - Pushed: over 6 years ago - Stars: 5 - Forks: 0

sinaahmadi/ZazaGoraniCorpus

A corpus for the Zazaki and Gorani languages

Size: 26.3 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 4 - Forks: 0

Aniezka/CAF

Supplementary material for "Correlations between accuracy, complexity, and task type: Learner corpus research"

Language: R - Size: 3.2 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

gederajeg/database-verba-bahasa-indonesia

VerbInd: A corpus-based database of Indonesian verbs

Language: HTML - Size: 29.8 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

angeldollface/crawly.rs

Search for specific words and surrounding text in a dataset of text (files).

Language: Rust - Size: 578 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

dohliam/more-stoplists

stoplists for African languages generated from the ASP corpus

Language: Ruby - Size: 91.8 KB - Last synced: 8 months ago - Pushed: over 8 years ago - Stars: 11 - Forks: 3

IgnatiusEzeani/IgnatiusEzeani.github.io

This is the website repo of Dr Ignatius Ezeani

Language: JavaScript - Size: 38.8 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

dlgranadosm/Text-processing-for-annotation

Language: R - Size: 17.6 KB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

upunaprosk/corpora-manipulation (fork of ivantor0/corpora-manipulation)

Tool for converting error corpora to parallel datasets

Language: Python - Size: 145 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

elenlefoll/TextbookEnglish

Online Appendix: Data and code from Elen Le Foll's PhD thesis

Language: HTML - Size: 250 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 1 - Forks: 0

wiragotama/TIARA-annotationTool

An Interactive Tool for Annotating Discourse Structure and Text Improvement

Language: JavaScript - Size: 51.5 MB - Last synced: 9 months ago - Pushed: over 2 years ago - Stars: 16 - Forks: 3

MagneticMule/word-lists

An assortment of word-lists and micro dictionaries in English. Especially suited to English language learning tasks.

Size: 135 KB - Last synced: about 1 year ago - Pushed: over 8 years ago - Stars: 1 - Forks: 0

sylvainloiseau/rcqp

R interface for the CQP corpus indexation/query software

Language: C - Size: 2.02 MB - Last synced: 3 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0

sylvainloiseau/interlineaR

Importing into R interlinearized corpora and associated dictionaries.

Language: R - Size: 752 KB - Last synced: 3 months ago - Pushed: over 3 years ago - Stars: 4 - Forks: 2

poethan/AlphaMWE

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

Size: 265 KB - Last synced: 10 months ago - Pushed: about 1 year ago - Stars: 3 - Forks: 2

Garrett-Webb/trumptweets

Analyze Trump's nonsense, feed in a topic, and generate a new tweet based on a custom corpus.

Language: Python - Size: 6.67 MB - Last synced: about 1 year ago - Pushed: almost 3 years ago - Stars: 6 - Forks: 1

antcont/GeLeCo

A large German Legal Corpus of laws, administrative regulations and court decisions issued in Germany at federal level. Query the corpus: corpora.dipintra.it

Language: Python - Size: 1.49 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

giocoal/word-embedding-italian-literature

Using distributional semantics (word2vec family algorithms and the CADE framework) to learn word embeddings from the Italian literary corpora we generated.

Language: Python - Size: 21.4 MB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 6 - Forks: 2

garaupere/TextGrid2TXT

Size: 21.5 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

gederajeg/happiness-malay-indonesian

Repository of the dataset and R code for the study of HAPPINESS metaphors in Classical Malay and Indonesian (published in Review of Cognitive Linguistics)

Language: HTML - Size: 3.66 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

dohliam/ebook-corpus

Ebook Corpus - A parser and extractor for electronic books

Language: Ruby - Size: 39.1 KB - Last synced: over 1 year ago - Pushed: almost 5 years ago - Stars: 4 - Forks: 0

motazsaad/arabic-hatespeech-data

Arabic hate speech data

Size: 4 MB - Last synced: over 1 year ago - Pushed: almost 4 years ago - Stars: 7 - Forks: 3

CaterinaBi/interrogatives-corpus-work

Paper that Lena Baunaz and I are working on as part of my SNSF-funded 'Focus in diachrony' research project at the University of Cambridge, UK.

Language: Jupyter Notebook - Size: 29.7 MB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

CaterinaBi/my-academic-work

Papers and books I published during my 7 years of research at the Universities of Geneva and Cambridge.

Size: 1.95 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

CaterinaBi/pypelet

A cooperative project for the creation of an open-source corpus of spoken interactions in Romance.

Size: 3.37 MB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0

Esukhia/derge-kangyur-old 📦

DEPRECATED - replaced by https://github.com/Esukhia/derge-kangyur

Language: Python - Size: 1.2 GB - Last synced: over 1 year ago - Pushed: about 5 years ago - Stars: 5 - Forks: 13

markanewman/linguisticmeasures 📦

Tools for calculating linguistic measures and other useful utilities

Language: Python - Size: 47.9 KB - Last synced: 12 months ago - Pushed: over 4 years ago - Stars: 0 - Forks: 0

digitallinguistics/data-format

The Data Format for Digital Linguistics (DaFoDiL)

Language: JavaScript - Size: 2.57 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 19 - Forks: 0

acoli-repo/powla

Represent any linguistic annotation in RDF, as Linked Data and/or OWL2/DL

Language: Java - Size: 19.4 MB - Last synced: over 1 year ago - Pushed: over 2 years ago - Stars: 1 - Forks: 0

rezonators/rezonateR

rezonateR (say "resonate R") uses R to analyze data created by Rezonator

Language: R - Size: 257 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 4 - Forks: 0

leoalenc/nheengatu

Tools and resources for the computational processing of the Nheengatu language

Language: Grammatical Framework - Size: 80.1 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 3 - Forks: 0

matbahasa/ETA

Easy Text Annotator

Language: JavaScript - Size: 570 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

mars-aria/ur_not_alone_phrase_finder

For a corpus linguistics project, I created an information retrieval program called "You Are Not Alone". My phrase_finder() function searches for a self-identifying phrase in 4 large classic texts (The Souls of Black Folk, Jane Eyre, The Strange Case of Dr. Jekyll & Mr. Hyde, and Frankenstein). Standpoint: “So Matilda’s strong young mind continued to grow, nurtured by the voices of all those authors who had sent their books out into the world like ships on the sea. These books gave Matilda a hopeful and comforting message: You are not alone.” ~ from Matilda by Roald Dahl 📖

Language: Python - Size: 4.88 KB - Last synced: over 1 year ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0