GitHub topics: linguistic-analysis

Repositories

sillsdev/FieldWorks

FieldWorks is a suite of software tools for language and cultural data, with support for complex scripts.

Language: C# - Size: 1010 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 93 - Forks: 35

matthias-stemmler/annimate

Your Friendly ANNIS Match Exporter

Language: TypeScript - Size: 5.69 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 11 - Forks: 0

livingtongues/living-dictionaries

Speeding the availability of language resources for endangered languages. Tools such as this have the power to shift how we think about endangered languages. Rather than perceiving them as being antiquated, difficult to learn and on the brink of vanishing, we see them as modern, easily accessible for learning online in text and audio formats.

Language: TypeScript - Size: 19 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 37 - Forks: 2

korpling/graphANNIS

This is a new backend implementation of the ANNIS linguistic search and visualization system.

Language: Rust - Size: 15.4 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 17 - Forks: 1

milosen/arc

ARC: A tool for creating artificial languages with rhythmicity control

Language: Jupyter Notebook - Size: 11 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 2 - Forks: 0

DmitryRyumin/INTERSPEECH-2023-24-Papers

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

Size: 11.4 MB - Last synced at: 7 days ago - Pushed at: 5 months ago - Stars: 669 - Forks: 42

rcverse/another-noun-phrase-extractor

Extract complete noun phrases with structural output and customisation

Language: Python - Size: 1.4 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

audreycs/ImpScore

A repository for the paper "ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences", accepted to ICLR 2025.

Language: Python - Size: 5.02 MB - Last synced at: 18 days ago - Pushed at: 3 months ago - Stars: 7 - Forks: 0

NEU-DSG/dailp-encoding

Digital Archive of American Indian Languages Preservation and Perseverance

Language: TypeScript - Size: 9.81 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 21 - Forks: 3

hoangsonww/Amazon-Reviews-Analysis

🧐 This project analyzes Amazon Fine Food Reviews to investigate whether negative reviews are more emotionally intense and lexically repetitive than positive ones. Using R, we apply sentiment analysis and lexical diversity metrics to uncover patterns in consumer review language.

Language: R - Size: 209 KB - Last synced at: 25 days ago - Pushed at: 27 days ago - Stars: 17 - Forks: 12

public-law/readability

How readable is your text? Provide a text input and get its grade level. Validated against the source data.

Language: Python - Size: 92.8 KB - Last synced at: about 14 hours ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 1

azagniotov/language-detection

This is a refined and re-implemented version of the archived plugin for ElasticSearch elasticsearch-langdetect, which itself builds upon the original work by Nakatani Shuyo, found at https://github.com/shuyo/language-detection. The aforementioned implementation by Nakatani Shuyo serves as the default language detection component within Apache Solr.

Language: Java - Size: 18.2 MB - Last synced at: 1 day ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

kivanc57/hapax_analysis

This project processes text files to identify hapax legomena (words that appear only once) and saves the results in an Excel file. It uses tokenization, optional lemmatization, and frequency analysis to extract and list these rare words.

Language: Go - Size: 140 KB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0
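
As an illustration of the hapax legomena idea described in the entry above, here is a minimal Python sketch. It is not taken from the kivanc57/hapax_analysis repository: the function name `hapax_legomena` and the regex tokenizer are assumptions made for this example, and the lemmatization and Excel export steps mentioned in the description are left out.

```python
import re
from collections import Counter


def hapax_legomena(text):
    """Return the words that occur exactly once in `text`, sorted alphabetically.

    Hypothetical helper for illustration only; not the repository's API.
    """
    # Naive tokenization (an assumption): lowercase the text, then keep runs
    # of letters, optionally joined by a single apostrophe (e.g. "don't").
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    # Frequency analysis: count how often each token occurs.
    counts = Counter(tokens)
    # A hapax legomenon is a token whose count is exactly 1.
    return sorted(word for word, n in counts.items() if n == 1)


sample = "the cat sat on the mat and the dog sat too"
print(hapax_legomena(sample))
# prints ['and', 'cat', 'dog', 'mat', 'on', 'too']
```

A real pipeline would likely lemmatize before counting, so that inflected forms such as "cats" and "cat" are not each reported as separate hapaxes.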

THU-KEG/ChatLog

⏳ ChatLog: Recording and Analysing ChatGPT Across Time

Language: Jupyter Notebook - Size: 6.17 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 98 - Forks: 3

jcvasquezc/phonet

Keras-based Python framework to compute phonological posterior probabilities from audio files

Language: Python - Size: 23 MB - Last synced at: 11 days ago - Pushed at: over 2 years ago - Stars: 43 - Forks: 18

mmmaurer/elfen

A Python package to efficiently extract linguistic features for text/NLP datasets

Language: Python - Size: 5.69 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

AlaaAlzahrani/Jiwar

Jiwar: A calculator for orthographic, phonological and phonographic neighborhood measures. Supports 40+ languages.

Language: Python - Size: 120 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

LSYS/LexicalRichness

😸 💬 A module to compute textual lexical richness (aka lexical diversity).

Language: Python - Size: 3.46 MB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 106 - Forks: 20

julienijs/Predictability-of-language-change

How predictable is language change?

Language: R - Size: 8.62 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

sociocom/limco

limco: a linguistic measure collection

Language: Python - Size: 1.17 MB - Last synced at: 11 days ago - Pushed at: 15 days ago - Stars: 2 - Forks: 3

brucewlee/lingfeat

[EMNLP 2021] LingFeat - A Comprehensive Linguistic Features Extraction ToolKit for Readability Assessment

Language: Python - Size: 56.9 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 127 - Forks: 16

michal-owsiak/swps-university-research-part-II

Python-based linguistic analysis project including natural language processing (NLP) techniques.

Language: Jupyter Notebook - Size: 12.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

arjo129/LangCluster

A visualization of cognates in various languages and how they spread

Language: Python - Size: 363 KB - Last synced at: about 2 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 2

fidelisrafael/esperanto-analyzer

Morphological and syntactic analysis of Esperanto sentences

Language: Python - Size: 209 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 32 - Forks: 1

Halvani/Constituent-Treelib

A lightweight Python library for constructing, processing, and visualizing constituent trees.

Language: Jupyter Notebook - Size: 2.67 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 67 - Forks: 12

nykolai-d/concreteness-score-of-word

This code takes an English word as input and returns its concreteness score and part of speech, using as reference the Brysbaert, M. et al. (2014) concreteness ratings for 40 thousand generally known English word lemmas.

Language: Jupyter Notebook - Size: 1.36 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Abe-Alefew/LexiLink

The aim of this mini-project is to analyze the textual and phonemic similarities between the Afan Oromo and Somali languages by examining word frequency, overlap, and phonemic distribution.

Language: Python - Size: 75.2 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

nikisetti01/Hadoop-MapReduce-LetterFrequency-Analysis

Simple example of a Hadoop application that counts letters, with an interesting Romance language analysis

Language: Jupyter Notebook - Size: 2.71 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 2 - Forks: 2

kivanc57/RQuests

The RQuest project uses R to analyze textual data, focusing on tasks like calculating word lengths, comparing languages, and extracting linguistic features with udpipe. It includes statistical methods, visualizations, and stochastic simulations, showcasing diverse approaches to text modeling.

Language: R - Size: 6.57 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

ashithapallath/Name-Nationality-Classifier-Using-DeepLearning

This project implements a deep learning-based classifier to identify whether a name is Indian or Non-Indian. By leveraging advanced neural networks to analyze name patterns, the classifier offers accurate predictions, with applications in demographic studies, personalized services, and more.

Language: Jupyter Notebook - Size: 15.6 KB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

LanguageMachines/foliatest

Test suite for libfolia

Language: C++ - Size: 847 KB - Last synced at: 10 days ago - Pushed at: 6 months ago - Stars: 0 - Forks: 2

unrealtecellp/life

Linguistic Field Data Management and Analysis System [LiFE]

Language: Python - Size: 295 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 5 - Forks: 1

radu-macphee96/Star-Trek-Coding-Script

Star Trek: Exolinguistic Comprehensive Translation Matrix

Size: 5.86 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Sl1mb0/tran-scraper

Project aimed at scraping, cleaning, and analyzing the type and frequency of questions in linguistic transcripts.

Language: Shell - Size: 12.4 MB - Last synced at: 10 months ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 0

Deeptiman/php-dom-parser-translation-tool

A simple DOM parser and translation tool using PHP, HTML, and MySQL. The translation model supports English-to-Odia translation, backed by a built-in dictionary.

Language: PHP - Size: 4.62 MB - Last synced at: 2 months ago - Pushed at: about 4 years ago - Stars: 4 - Forks: 1

TALP-UPC/saga

SAGA - Phonetic transcription software for all Spanish variants.

Language: C - Size: 466 KB - Last synced at: 6 months ago - Pushed at: over 4 years ago - Stars: 12 - Forks: 6

AndreasBlombach/Possessiver_Dativ

Data and analyses on the possessive dative

Language: HTML - Size: 18.5 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

SondreWold/lexical_complexity_estimation

Code related to the LREC-COLING 2024 paper "Estimating Lexical Complexity from Document-Level Distributions"

Language: Python - Size: 970 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

Vivek-Tate/Language-Model

The Language Model project is a Java-based language and N-gram model developed for the COM6516 module. It predicts up to two words based on a single word input and provides detailed text analysis statistics. Demonstrating advanced object-oriented programming and design principles, it is a valuable tool for predictive text input and linguistic analysis.

Language: Java - Size: 6.48 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

nickduran/align-linguistic-alignment

Python library for extracting quantitative, reproducible metrics of multi-level alignment between two speakers in naturalistic language corpora.

Language: Python - Size: 54.8 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 38 - Forks: 11

0ldriku/CAF-Annotator

Audio annotation tool designed for second language acquisition (SLA) researchers

Language: Jupyter Notebook - Size: 55.4 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

sadielbartholomew/cf-standard-names-linguistics

Lexical & semantic analysis of the CF Conventions Standard Names

Language: Python - Size: 51.2 MB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 0

Anastassssiia/Surname-analysis

Our project aims to study surnames of Bulgarian-Gagauz origin. Users will be able to analyze their surnames and learn more about their identity. In addition, the tool will allow researchers to study a whole pool of surnames at once.

Language: Jupyter Notebook - Size: 26.4 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

Itabashi-don/Shiina

I'm Shii-chan, a high school girl living in Itabashi! ( ˙꒳˙ )

Language: JavaScript - Size: 4.25 MB - Last synced at: 8 days ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 0

julienijs/Linguistic-complexity

Measuring linguistic complexity through information theory

Language: Python - Size: 5.31 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

AtharvaKatre/Numbers-Prophecy

An experiment to demonstrate the biases and predictability of our world.

Language: Python - Size: 5.06 MB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 2

spottolaq/corpus-spotted-2020

This repository houses a comprehensive collection of 14,701 Instagram posts authored by Italian university students between January 2020 and December 2020. These posts offer invaluable insights into the experiences and reflections of students during the challenging period of the COVID-19 lockdown in Italy.

Size: 16.6 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

EvgeniaViskovatykh/Quantitative-analysis-of-semantic-shift

Language: Jupyter Notebook - Size: 14.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

sorinmarti/textanalyzer

Java Software to analyze text files.

Language: Java - Size: 268 KB - Last synced at: about 1 year ago - Pushed at: over 7 years ago - Stars: 0 - Forks: 0

ggeraldina/nominative_field_v2.0

Constructing the nominative field of a concept (2017-2018)

Language: HTML - Size: 16.8 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

morganlee123/2GPTEmpathicDialogues

Code, analyses, and data for 'A Linguistic Comparison between Human and ChatGPT-Generated Conversations'

Language: Python - Size: 9.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

katreparitosh/Discourse-Analytics-of-Political-Speech-Transcripts

Political Discourse Analysis (PDA) of Political Speech Transcripts using Natural Language Processing (NLP)

Language: Jupyter Notebook - Size: 22.7 MB - Last synced at: about 1 year ago - Pushed at: about 4 years ago - Stars: 15 - Forks: 1

kingsdigitallab/dral-django

Distant Reading across Languages

Language: HTML - Size: 21.6 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 1

alschmut/code2semantics

Parse software-code for semantic identifier names

Language: Python - Size: 742 KB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

Ghozayel/Lextale

The LexTALE package calculates the % correctAv score for the English, German, and Dutch versions of the LexTALE test.

Language: R - Size: 204 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 4 - Forks: 4

CoCoLabErica/LIWC2015

A program built on LIWCalike and quanteda to produce LIWC2015 results

Language: R - Size: 12.7 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

amadeusferro/English-language-reading-assistant

Introducing an English reading assistant—a web application using NLP to enhance understanding of English documents. It allows users to upload English PDFs, employing NLP to highlight recurring words and their definitions.

Language: Python - Size: 327 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

jtanwk/nytcrossword

An exploration of New York Times crossword answers from 1994-2017, i.e. the Will Shortz era.

Language: HTML - Size: 7.43 MB - Last synced at: 7 months ago - Pushed at: over 6 years ago - Stars: 122 - Forks: 8

Halvani/TextUnitLib

A Python library that allows easy extraction of a variety of text units within texts...

Language: Python - Size: 31.3 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

phughesmcr/LIWCjs-Dictionary

Parse and manipulate multiple LIWC dictionary files.

Language: TypeScript - Size: 172 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 1 - Forks: 1

kthomas4031/Author-Detector

Detects the author based on linguistic signatures

Language: Java - Size: 1.24 MB - Last synced at: over 1 year ago - Pushed at: over 6 years ago - Stars: 0 - Forks: 0

robert1ridley/linguisticBias

This is the code and data used to produce the results from the EMNLP 2023 paper Addressing Linguistic Bias through a Contrastive Analysis of Academic Writing in the NLP Domain.

Language: Python - Size: 23 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

zyocum/phoible-notebook

Exploratory notebook for inspecting the PHOIBLE data set.

Language: Jupyter Notebook - Size: 322 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

parthNJ/Research---Teenage-development-observed-via-their-twitter-tweets

This is a research paper testing the hypothesis that teenage development can be observed in teenagers' tweets. Using a dataset of teenagers on Twitter, I was able to confirm that as we develop as humans, that development also shows up on our social media in linguistic aspects such as spelling, maturity, bad word usage, acronym usage, and more. Please see the finalpaper for a more detailed explanation. The paper is also soon to be published.

Language: Python - Size: 755 KB - Last synced at: over 1 year ago - Pushed at: about 6 years ago - Stars: 3 - Forks: 0