An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: tokenize

USSarmy/wink

A tool to Win the week

Size: 1000 Bytes - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

stdlib-js/string-base-format-interpolate

Generate string from a token array by interpolating values.

Language: JavaScript - Size: 536 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

flex-development/fsm-tokenizer

finite state machine tokenizer

Language: TypeScript - Size: 3.22 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 3 - Forks: 0

micromark/micromark

small, safe, and great commonmark (optionally gfm, mdx) compliant markdown parser

Language: JavaScript - Size: 2.02 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 1,949 - Forks: 70

syntax-tree/mdast-util-from-markdown

mdast utility to parse markdown

Language: JavaScript - Size: 266 KB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 244 - Forks: 21

privateai/deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask, and synthesize PII in text.

Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 81 - Forks: 1

winkjs/wink-nlp

Developer friendly Natural Language Processing ✨

Language: JavaScript - Size: 26.8 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 1,267 - Forks: 59

wooorm/markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions

Language: Rust - Size: 2.32 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1,137 - Forks: 59

TI-Toolkit/tivars_lib_py

A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files

Language: Python - Size: 3.61 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 19 - Forks: 1

stdlib-js/string-base-format-tokenize

Tokenize a string into an array of string parts and format identifier objects.

Language: JavaScript - Size: 338 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 2 - Forks: 0

winkjs/wink-nlp-utils

NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.

Language: JavaScript - Size: 2.98 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 127 - Forks: 11

SkyflowFoundry/bulk_insert_tokenization

Python scripts for tokenizing data - from CSV or Postgres - in the Skyflow Privacy Vault.

Language: Python - Size: 35.2 KB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 1

alasdairforsythe/tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

Language: Go - Size: 734 KB - Last synced at: 13 days ago - Pushed at: 10 months ago - Stars: 576 - Forks: 21

sina-al/pynlp 📦

A pythonic wrapper for Stanford CoreNLP.

Language: Python - Size: 79.1 KB - Last synced at: 26 days ago - Pushed at: almost 7 years ago - Stars: 108 - Forks: 11

seemueller-io/toak

instantly tokenize a git repository

Language: TypeScript - Size: 759 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

here-be/snapdragon

snapdragon is an extremely pluggable, powerful and easy-to-use parser-renderer factory.

Language: JavaScript - Size: 207 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 226 - Forks: 25

bent10/stopmarkdown

Extracts plain text from Markdown strings. It's useful for Natural Language Processing (NLP) tasks.

Language: TypeScript - Size: 234 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

jonschlinkert/tokenize-comment

Uses snapdragon to tokenize a single JavaScript block comment into an object, with description, tags, and code example sections that can be passed to any other comment parsers for further parsing.

Language: JavaScript - Size: 355 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 14 - Forks: 6

bent10/attributes-parser

Tokenize and parse attributes string into meaningful tokens and key-value pairs.

Language: TypeScript - Size: 200 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

SiddiqSoft/string2map

Header-only C++17 library that parses a string of delimited key-value pairs into a map container, with support for converting between std::string and std::wstring.

Language: C++ - Size: 96.7 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

akb89/witokit

A Python toolkit to generate a tokenized dump of Wikipedia for NLP

Language: Python - Size: 47.9 KB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 1

hexydec/htmldoc

A token-based HTML document parser and minifier written in PHP. Extract attribute values and text using CSS selectors.

Language: PHP - Size: 505 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 21 - Forks: 3

myint/untokenize

Transforms tokens into original source code (while preserving whitespace)

Language: Python - Size: 16.6 KB - Last synced at: 17 days ago - Pushed at: almost 6 years ago - Stars: 8 - Forks: 0

jonschlinkert/extract-comments

Extract JavaScript code comments from a string or glob of files.

Language: JavaScript - Size: 390 KB - Last synced at: 5 days ago - Pushed at: over 6 years ago - Stars: 49 - Forks: 10

bent10/stophtml

Extracts plain text from an HTML string. It's useful for Natural Language Processing (NLP) tasks.

Language: TypeScript - Size: 88.9 KB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

bent10/nomark

Transform hypertext strings (e.g., HTML, Markdown) into plain text for natural language processing (NLP) normalization

Language: TypeScript - Size: 176 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

chayanforyou/bkash-pgwclient-demo-flutter

bKash payment gateway integration in flutter

Language: Dart - Size: 34 MB - Last synced at: 30 days ago - Pushed at: 7 months ago - Stars: 41 - Forks: 28

paceaux/methodius

A utility for analyzing text on the web

Language: TypeScript - Size: 889 KB - Last synced at: 24 days ago - Pushed at: 8 months ago - Stars: 5 - Forks: 2

dragonofmercy/Tokenize2 📦

Tokenize2 is a plugin that lets your users select multiple items from a predefined list or via AJAX, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field for messages on Facebook or adding tags on Tumblr.

Language: JavaScript - Size: 322 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 25

flex-development/docast-util-from-docs

docast utility to parse docblocks

Language: TypeScript - Size: 2.14 MB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

hugohiraoka/Airline_Tweets_Sentiment_Analysis

A Natural Language Processing model to perform Sentiment Analysis of US Airline Customers

Language: HTML - Size: 22.9 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jonschlinkert/babel-extract-comments

Uses babel to extract JavaScript code comments from a string. Returns an array of comment objects, with line, column, index, comment type and comment string.

Language: JavaScript - Size: 271 KB - Last synced at: 27 days ago - Pushed at: almost 7 years ago - Stars: 14 - Forks: 2

kopcho/Sales180

PKH Events

Size: 14.6 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

cygig/SerialConfigCommand

SerialConfigCommand lets users easily issue commands, with or without values, via the Serial Monitor. Example: "LED=255", "Lock=1", "Start". Compatible with the Arduino String() class and character arrays.

Language: C++ - Size: 480 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

SeanZLiu/NLPLab1

2020Fall NLP Lab1

Language: Python - Size: 44.2 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

BaseMax/go-lexer-token-simple

Simple Go lexer: lexes its own syntax, read from a file.

Language: Go - Size: 17.6 KB - Last synced at: about 24 hours ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

fcf-framework/fcf-framework-core

A package of basic functions and classes required for the framework to work

Language: JavaScript - Size: 550 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

begin/parsers-compilers

Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?

Size: 11.7 KB - Last synced at: 7 days ago - Pushed at: about 8 years ago - Stars: 22 - Forks: 1

queckezz/json-tokenize

Splits a JSON string into an annotated list of tokens

Language: JavaScript - Size: 209 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 5 - Forks: 1

parridhi/Interactive-ChatBot

An interactive chatbot that engages with users in real-time conversations, providing personalized responses. With its advanced Natural Language Processing (NLP) capabilities, it can interpret user queries accurately. Created for websites catering to a small niche.

Language: Python - Size: 32.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

schemed-js/core

Modular TypeScript template engine

Language: TypeScript - Size: 180 KB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

spydaz/ClassTokenizer

Basic tokenizer: creates tokens, enabling the creation of a personal syntax, removal of unwanted characters, etc.

Language: Visual Basic .NET - Size: 16.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 2

here-be/snapdragon-scanner

Easily scan a string with an object of regex patterns to produce an array of tokens. ~100 sloc.

Language: JavaScript - Size: 12.7 KB - Last synced at: 24 days ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 1

OpenVoiceOS/quebra_frases

Chunks strings into byte-sized pieces

Language: Python - Size: 35.2 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 3

carlosplanchon/tokenizesentences

Python 3 module to tokenize English sentences.

Language: Python - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 6 - Forks: 0

GruDev325/NFTSwap-frontend

NFTSwaps is a cross-chain and permissionless platform to tokenize NFTs and make them tradable on AMMs such as PancakeSwap or BakerySwap through the NFTSwaps UI.

Language: JavaScript - Size: 61.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 4

q-m/tokkens-ruby 📦

Basic text to numbers tokenizer for machine learning

Language: Ruby - Size: 36.1 KB - Last synced at: 23 days ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 0

blinky-z/REPL

A Read–Eval–Print Loop (REPL) and Better Bash Lang & Compiler written in C++

Language: C++ - Size: 344 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

fooSynaptic/transfromer_NN_Block

Transformer NN block implemented for machine translation, text classification, natural language inference, and machine reading comprehension models.

Language: Python - Size: 1.32 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 0

rtflynn/NLP-Sentiment

Sentiment analysis for Amazon product reviews using NLTK, Scikit-Learn, and Keras. Using hyperparameter search and LSTM, our best model achieves ~96% accuracy.

Language: Python - Size: 101 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 5

asmeurer/brown-water-python

More detailed documentation for the Python tokenize module

Size: 13.2 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

YongWookHa/kor-text-preprocess

Korean text data preprocessing toolkit for NLP

Language: Python - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 16 - Forks: 2

turkeruzun/twitter-data-analysis-nlp

Analyze Tweets Sentiment Using NLP

Language: Jupyter Notebook - Size: 2.81 MB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

tcanich/stringmod

Fortran 2008 string library

Language: Fortran - Size: 9.77 KB - Last synced at: 28 days ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kelvng/Natural-Language-Processing

Language: Jupyter Notebook - Size: 123 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 2

linguistic-dev/n-gram-extractor

A PHP Library to extract n-grams from a text. Simple preprocessing tools (cleaning, tokenizing) included.

Language: PHP - Size: 28.3 KB - Last synced at: 2 days ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

cNiev/W-tfis-ART

Create a practical definition of art (very few words) and work for its universal acceptance.

Size: 2.93 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

prabormukherjee/Language_translator

Language translation from English to French

Language: Jupyter Notebook - Size: 19.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

rryi/Tokens.jl

parse text into tokens, build memory-efficient token lists and trees,

Language: Julia - Size: 1.89 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

hexacta/tokenizer

Split text into tokens.

Language: JavaScript - Size: 34.2 KB - Last synced at: 9 months ago - Pushed at: about 8 years ago - Stars: 2 - Forks: 0

poyo46/jadoc

Tokenizes Japanese documents to enable CRUD operations.

Language: Python - Size: 105 KB - Last synced at: about 18 hours ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

aminya/TokenizeMeta.jl

Utilities for evaluated Tokens

Language: Julia - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

lensvol/tokelor

Visualize the Python token stream produced by the tokenize module.

Language: Python - Size: 192 KB - Last synced at: 16 days ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

hamedzarei/nlp-simple-punctuation-correction

Simple regex for correcting punctuation

Language: Python - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

xpepermint/ngramablejs

String ngram splitter.

Language: TypeScript - Size: 291 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

sigdev2/lazy_py

Lazy calculations for Python based on iterators

Language: Python - Size: 351 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

mugendi/wordize

Language: JavaScript - Size: 5.86 KB - Last synced at: 18 days ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

here-be/snapdragon-location

Adds a location object to snapdragon token or AST node.

Language: JavaScript - Size: 25.4 KB - Last synced at: 7 days ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

here-be/snapdragon-stack

Snapdragon utility for creating a stack.

Language: JavaScript - Size: 18.6 KB - Last synced at: 24 days ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 2

icai/token-sort

Token(ize) Sort, also supports weight sort

Language: JavaScript - Size: 73.2 KB - Last synced at: 8 days ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

mk60991/NLP-tutorial

Language: Python - Size: 321 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

here-be/snapdragon-token

Create a snapdragon token. Used by the snapdragon lexer, but can also be used by plugins.

Language: JavaScript - Size: 23.4 KB - Last synced at: about 3 hours ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

clitetailor/tokenize-monster

Yet another powerful tokenizer in js.

Language: JavaScript - Size: 101 KB - Last synced at: 9 days ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0