GitHub topics: tokenize
USSarmy/wink
A tool to Win the week
Size: 1000 Bytes - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

stdlib-js/string-base-format-interpolate
Generate string from a token array by interpolating values.
Language: JavaScript - Size: 536 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

flex-development/fsm-tokenizer
finite state machine tokenizer
Language: TypeScript - Size: 3.22 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 3 - Forks: 0

micromark/micromark
small, safe, and great commonmark (optionally gfm, mdx) compliant markdown parser
Language: JavaScript - Size: 2.02 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 1,949 - Forks: 70

syntax-tree/mdast-util-from-markdown
mdast utility to parse markdown
Language: JavaScript - Size: 266 KB - Last synced at: 10 days ago - Pushed at: 2 months ago - Stars: 244 - Forks: 21

privateai/deid-examples
Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask, and synthesize PII in text.
Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 81 - Forks: 1

winkjs/wink-nlp
Developer friendly Natural Language Processing ✨
Language: JavaScript - Size: 26.8 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 1,267 - Forks: 59

wooorm/markdown-rs
CommonMark compliant markdown parser in Rust with ASTs and extensions
Language: Rust - Size: 2.32 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1,137 - Forks: 59

TI-Toolkit/tivars_lib_py
A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files
Language: Python - Size: 3.61 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 19 - Forks: 1

stdlib-js/string-base-format-tokenize
Tokenize a string into an array of string parts and format identifier objects.
Language: JavaScript - Size: 338 KB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 2 - Forks: 0

winkjs/wink-nlp-utils
NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
Language: JavaScript - Size: 2.98 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 127 - Forks: 11

SkyflowFoundry/bulk_insert_tokenization
Python scripts for tokenizing data - from CSV or Postgres - in the Skyflow Privacy Vault.
Language: Python - Size: 35.2 KB - Last synced at: 24 days ago - Pushed at: 25 days ago - Stars: 0 - Forks: 1

alasdairforsythe/tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
Language: Go - Size: 734 KB - Last synced at: 13 days ago - Pushed at: 10 months ago - Stars: 576 - Forks: 21

sina-al/pynlp 📦
A pythonic wrapper for Stanford CoreNLP.
Language: Python - Size: 79.1 KB - Last synced at: 26 days ago - Pushed at: almost 7 years ago - Stars: 108 - Forks: 11

seemueller-io/toak
instantly tokenize a git repository
Language: TypeScript - Size: 759 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

here-be/snapdragon
snapdragon is an extremely pluggable, powerful and easy-to-use parser-renderer factory.
Language: JavaScript - Size: 207 KB - Last synced at: 3 days ago - Pushed at: about 2 years ago - Stars: 226 - Forks: 25

bent10/stopmarkdown
Extracts plain text from Markdown strings. It's useful for Natural Language Processing (NLP) tasks.
Language: TypeScript - Size: 234 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

jonschlinkert/tokenize-comment
Uses snapdragon to tokenize a single JavaScript block comment into an object, with description, tags, and code example sections that can be passed to any other comment parsers for further parsing.
Language: JavaScript - Size: 355 KB - Last synced at: 6 days ago - Pushed at: almost 2 years ago - Stars: 14 - Forks: 6

bent10/attributes-parser
Tokenize and parse attributes string into meaningful tokens and key-value pairs.
Language: TypeScript - Size: 200 KB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 1

SiddiqSoft/string2map
Header-only C++17 library to parse a string containing delimited key-value pairs into a map container, with support for converting between std::string and std::wstring.
Language: C++ - Size: 96.7 KB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

akb89/witokit
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
Language: Python - Size: 47.9 KB - Last synced at: 14 days ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 1

hexydec/htmldoc
A token-based HTML document parser and minifier written in PHP. Extract attribute values and text using CSS selectors.
Language: PHP - Size: 505 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 21 - Forks: 3

myint/untokenize
Transforms tokens into original source code (while preserving whitespace)
Language: Python - Size: 16.6 KB - Last synced at: 17 days ago - Pushed at: almost 6 years ago - Stars: 8 - Forks: 0

jonschlinkert/extract-comments
Extract JavaScript code comments from a string or glob of files.
Language: JavaScript - Size: 390 KB - Last synced at: 5 days ago - Pushed at: over 6 years ago - Stars: 49 - Forks: 10

bent10/stophtml
Extracts plain text from an HTML string. It's useful for Natural Language Processing (NLP) tasks.
Language: TypeScript - Size: 88.9 KB - Last synced at: 2 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

bent10/nomark
Transform hypertext strings (e.g., HTML, Markdown) into plain text for natural language processing (NLP) normalization
Language: TypeScript - Size: 176 KB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

chayanforyou/bkash-pgwclient-demo-flutter
bKash payment gateway integration in Flutter
Language: Dart - Size: 34 MB - Last synced at: 30 days ago - Pushed at: 7 months ago - Stars: 41 - Forks: 28

paceaux/methodius
A utility for analyzing text on the web
Language: TypeScript - Size: 889 KB - Last synced at: 24 days ago - Pushed at: 8 months ago - Stars: 5 - Forks: 2

dragonofmercy/Tokenize2 📦
Tokenize2 is a plugin that lets your users select multiple items from a predefined list or via Ajax, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field of a message on Facebook or adding tags on Tumblr.
Language: JavaScript - Size: 322 KB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 83 - Forks: 25

flex-development/docast-util-from-docs
docast utility to parse docblocks
Language: TypeScript - Size: 2.14 MB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

hugohiraoka/Airline_Tweets_Sentiment_Analysis
A Natural Language Processing model to perform Sentiment Analysis of US Airline Customers
Language: HTML - Size: 22.9 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

jonschlinkert/babel-extract-comments
Uses babel to extract JavaScript code comments from a string. Returns an array of comment objects, with line, column, index, comment type and comment string.
Language: JavaScript - Size: 271 KB - Last synced at: 27 days ago - Pushed at: almost 7 years ago - Stars: 14 - Forks: 2

kopcho/Sales180
PKH Events
Size: 14.6 KB - Last synced at: 2 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

cygig/SerialConfigCommand
SerialConfigCommand lets users easily issue commands, with or without values, via the Serial Monitor. Examples: "LED=255", "Lock=1", "Start". Compatible with the Arduino String() class and character arrays.
Language: C++ - Size: 480 KB - Last synced at: about 1 year ago - Pushed at: almost 3 years ago - Stars: 2 - Forks: 0

SeanZLiu/NLPLab1
2020Fall NLP Lab1
Language: Python - Size: 44.2 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

BaseMax/go-lexer-token-simple
Simple Go lexer: lexes its own syntax, read from a file.
Language: Go - Size: 17.6 KB - Last synced at: about 24 hours ago - Pushed at: about 4 years ago - Stars: 2 - Forks: 0

fcf-framework/fcf-framework-core
A package of basic functions and classes required for the framework to work
Language: JavaScript - Size: 550 KB - Last synced at: 7 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

begin/parsers-compilers
Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?
Size: 11.7 KB - Last synced at: 7 days ago - Pushed at: about 8 years ago - Stars: 22 - Forks: 1

queckezz/json-tokenize
Splits a JSON string into an annotated list of tokens
Language: JavaScript - Size: 209 KB - Last synced at: about 1 month ago - Pushed at: about 8 years ago - Stars: 5 - Forks: 1

parridhi/Interactive-ChatBot
An interactive chatbot that engages with users in real-time conversations, providing personalized responses. With its advanced Natural Language Processing (NLP) capabilities, it can interpret user queries accurately. Created for websites catering to a small niche.
Language: Python - Size: 32.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

schemed-js/core
Modular TypeScript template engine
Language: TypeScript - Size: 180 KB - Last synced at: 6 days ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 0

spydaz/ClassTokenizer
Basic Tokenizer - Creates tokens - enabling for creation of personal syntax; removal of unwanted characters etc
Language: Visual Basic .NET - Size: 16.6 KB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 2

here-be/snapdragon-scanner
Easily scan a string with an object of regex patterns to produce an array of tokens. ~100 sloc.
Language: JavaScript - Size: 12.7 KB - Last synced at: 24 days ago - Pushed at: over 6 years ago - Stars: 7 - Forks: 1

OpenVoiceOS/quebra_frases
Chunks strings into byte-sized pieces
Language: Python - Size: 35.2 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 3

carlosplanchon/tokenizesentences
Python 3 module to tokenize English sentences.
Language: Python - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 6 years ago - Stars: 6 - Forks: 0

GruDev325/NFTSwap-frontend
NFTSwaps is a cross-chain and permissionless platform to tokenize NFTs and make them tradable on AMMs such as PancakeSwap or BakerySwap through the NFTSwaps UI.
Language: JavaScript - Size: 61.9 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 4

q-m/tokkens-ruby 📦
Basic text to numbers tokenizer for machine learning
Language: Ruby - Size: 36.1 KB - Last synced at: 23 days ago - Pushed at: over 8 years ago - Stars: 1 - Forks: 0

blinky-z/REPL
A Read–Eval–Print Loop (REPL) and Better Bash Lang & Compiler written in C++
Language: C++ - Size: 344 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 0

fooSynaptic/transfromer_NN_Block
Transformer NN block implemented for machine translation, text classification, natural language inference, and machine reading comprehension models.
Language: Python - Size: 1.32 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 0

rtflynn/NLP-Sentiment
Sentiment analysis for Amazon product reviews using NLTK, Scikit-Learn, and Keras. Using hyperparameter search and LSTM, our best model achieves ~96% accuracy.
Language: Python - Size: 101 KB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 6 - Forks: 5

asmeurer/brown-water-python
More detailed documentation for the Python tokenize module
Size: 13.2 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 9 - Forks: 1

YongWookHa/kor-text-preprocess
Korean text data preprocessing toolkit for NLP
Language: Python - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 16 - Forks: 2

turkeruzun/twitter-data-analysis-nlp
Analyze Tweets Sentiment Using NLP
Language: Jupyter Notebook - Size: 2.81 MB - Last synced at: 12 months ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

tcanich/stringmod
Fortran 2008 string library
Language: Fortran - Size: 9.77 KB - Last synced at: 28 days ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

kelvng/Natural-Language-Processing
Language: Jupyter Notebook - Size: 123 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 2

linguistic-dev/n-gram-extractor
A PHP Library to extract n-grams from a text. Simple preprocessing tools (cleaning, tokenizing) included.
Language: PHP - Size: 28.3 KB - Last synced at: 2 days ago - Pushed at: over 7 years ago - Stars: 3 - Forks: 0

cNiev/W-tfis-ART
Create a practical definition of art (very few words) and work for its universal acceptance.
Size: 2.93 KB - Last synced at: 5 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

prabormukherjee/Language_translator
Language translation from English to French
Language: Jupyter Notebook - Size: 19.1 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 1

rryi/Tokens.jl
parse text into tokens, build memory-efficient token lists and trees,
Language: Julia - Size: 1.89 MB - Last synced at: 3 months ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

hexacta/tokenizer
Split text into tokens.
Language: JavaScript - Size: 34.2 KB - Last synced at: 9 months ago - Pushed at: about 8 years ago - Stars: 2 - Forks: 0

poyo46/jadoc
Tokenizes Japanese documents to enable CRUD operations.
Language: Python - Size: 105 KB - Last synced at: about 18 hours ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

aminya/TokenizeMeta.jl
Utilities for evaluated Tokens
Language: Julia - Size: 11.7 KB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

lensvol/tokelor
Visualize Python token stream produced by tokenize module.
Language: Python - Size: 192 KB - Last synced at: 16 days ago - Pushed at: about 5 years ago - Stars: 1 - Forks: 0

hamedzarei/nlp-simple-punctuation-correction
Simple regex for correcting punctuation
Language: Python - Size: 9.77 KB - Last synced at: 3 months ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 0

xpepermint/ngramablejs
String ngram splitter.
Language: TypeScript - Size: 291 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 2 - Forks: 1

sigdev2/lazy_py
Lazy calculations for Python based on iterators
Language: Python - Size: 351 KB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 0 - Forks: 0

mugendi/wordize
Language: JavaScript - Size: 5.86 KB - Last synced at: 18 days ago - Pushed at: over 7 years ago - Stars: 1 - Forks: 0

here-be/snapdragon-location
Adds a location object to snapdragon token or AST node.
Language: JavaScript - Size: 25.4 KB - Last synced at: 7 days ago - Pushed at: over 7 years ago - Stars: 2 - Forks: 2

here-be/snapdragon-stack
Snapdragon utility for creating a stack.
Language: JavaScript - Size: 18.6 KB - Last synced at: 24 days ago - Pushed at: over 7 years ago - Stars: 4 - Forks: 2

icai/token-sort
Token(ize) Sort; also supports weight sort
Language: JavaScript - Size: 73.2 KB - Last synced at: 8 days ago - Pushed at: almost 5 years ago - Stars: 1 - Forks: 0

mk60991/NLP-tutorial
Language: Python - Size: 321 KB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 0 - Forks: 0

here-be/snapdragon-token
Create a snapdragon token. Used by the snapdragon lexer, but can also be used by plugins.
Language: JavaScript - Size: 23.4 KB - Last synced at: about 3 hours ago - Pushed at: about 7 years ago - Stars: 2 - Forks: 1

clitetailor/tokenize-monster
Yet another powerful tokenizer in js.
Language: JavaScript - Size: 101 KB - Last synced at: 9 days ago - Pushed at: almost 8 years ago - Stars: 0 - Forks: 0
