subword | Topic | Ecosyste.ms: Repos

Topic: "subword"

scarletcho/KoLM

Korean text normalization and language preparation package for LM in Kaldi-based ASR system

Language: Python - Size: 136 KB - Last synced at: 20 days ago - Pushed at: about 5 years ago - Stars: 60 - Forks: 20

zouharvi/tokenization-scorer

Simple-to-use scoring function for arbitrarily tokenized texts.

Language: Python - Size: 42 KB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 39 - Forks: 4

cooelf/subMrc

Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)

Language: Python - Size: 3.4 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 15 - Forks: 6

andreasgrv/johnny

johnny - a neural network graph based DEPendency Parser

Language: Python - Size: 154 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 1

wang-h/FMDL

Unsupervised Word Segmentation using Minimum Description Length for Neural Machine Translation (NMT)

Language: C++ - Size: 508 KB - Last synced at: about 1 year ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 1

cooelf/subword_seg

Effective Subword Segmentation for Text Comprehension (TASLP 2019)

Language: C++ - Size: 8.14 MB - Last synced at: about 1 month ago - Pushed at: over 6 years ago - Stars: 4 - Forks: 4

explanare/char-iit

A causal intervention framework to learn robust and interpretable character representations inside subword-based language models

Language: Jupyter Notebook - Size: 23.5 MB - Last synced at: 2 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

jluo41/NLPText

Language: Jupyter Notebook - Size: 24.6 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

burcgokden/BERT-Subword-Tokenizer-Wrapper

A framework for generating subword vocabulary from a tensorflow dataset and building custom BERT tokenizer models.

Language: Python - Size: 12.7 KB - Last synced at: about 2 months ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 0

scarletcho/subword-mikolov

An implementation of the subword division algorithm proposed by T. Mikolov (2012).

Language: HTML - Size: 33.2 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 1 - Forks: 1

TiMauzi/dawg

The concept of DAWGs is based on: Blumer, A. et al. (1985). The smallest automaton recognizing the subwords of a text. Theoretical Computer Science, 40, 31–55.

Language: Java - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Tokenization splits a piece of text into smaller units called tokens, which can be words, characters, or subwords. It is therefore broadly classified into three types: word, character, and subword (character n-gram) tokenization.

Language: Jupyter Notebook - Size: 60.5 KB - Last synced at: 2 months ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0
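
As a rough illustration of the three tokenization types named in the entry above (this is not code from the listed repository), a minimal Python sketch might look like the following. The function names are hypothetical, and the subword splitter uses fixed-length character n-grams as a simple stand-in; learned subword schemes such as BPE or WordPiece build their vocabularies from data instead.

```python
import re


def word_tokens(text):
    """Split text into word tokens, keeping punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text):
    """Split text into individual characters (whitespace included)."""
    return list(text)


def char_ngram_tokens(word, n=3):
    """Split a single word into overlapping character n-grams.

    Assumption: fixed-length n-grams stand in for subword units here.
    Words no longer than n are returned as a single token.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    print(word_tokens("unhappy cats!"))
    print(char_tokens("cat"))
    print(char_ngram_tokens("unhappy"))
```

Word tokens keep the vocabulary readable but cannot represent unseen words; character tokens cover everything at the cost of long sequences; subword units sit between the two.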

kkaryl/AI6127-Deep_NLP

This repository contains source code implementation of assignments for NTU's MSAI course AI6127 on Deep Neural Networks for Natural Language Processing (2019 Sem 2).

Language: Jupyter Notebook - Size: 734 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 1

Scitator/subword-nmt (fork of rsennrich/subword-nmt)

Subword Neural Machine Translation

Language: Python - Size: 53.7 KB - Last synced at: 4 days ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0
