GitHub topics: tokenize

Repositories

USSarmy/wink

A tool to Win the week

Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 0 - Forks: 0

flex-development/fsm-tokenizer

finite state machine tokenizer

Language: TypeScript - Size: 2.2 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 0

micromark/micromark

small, safe, and great commonmark (optionally gfm, mdx) compliant markdown parser

Language: JavaScript - Size: 2.02 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 2,069 - Forks: 74

geoffsee/repo-tokenizer

it's a tool, it's a library, it's regular expressions!

Language: TypeScript - Size: 2.92 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 0 - Forks: 0

syntax-tree/mdast-util-from-markdown

mdast utility to parse markdown

Language: JavaScript - Size: 266 KB - Last synced at: 6 days ago - Pushed at: 8 months ago - Stars: 268 - Forks: 24

sina-al/pynlp 📦

A pythonic wrapper for Stanford CoreNLP.

Language: Python - Size: 85 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 108 - Forks: 11

bent10/stopmarkdown

Extracts plain text from Markdown strings. It's useful for Natural Language Processing (NLP) tasks.

Language: TypeScript - Size: 191 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

wooorm/markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions

Language: Rust - Size: 2.32 MB - Last synced at: 23 days ago - Pushed at: 6 months ago - Stars: 1,334 - Forks: 70

hexydec/htmldoc

A token based HTML Document parser and minifier written in PHP. Extract attribute values and text using CSS selectors.

Language: PHP - Size: 534 KB - Last synced at: 5 days ago - Pushed at: 24 days ago - Stars: 24 - Forks: 4

TI-Toolkit/tivars_lib_py

A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files

Language: Python - Size: 3.94 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 21 - Forks: 1

privateai/deid-examples

Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

Language: Jupyter Notebook - Size: 37.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 84 - Forks: 1

here-be/snapdragon

snapdragon is an extremely pluggable, powerful and easy-to-use parser-renderer factory.

Language: JavaScript - Size: 207 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 225 - Forks: 25

here-be/snapdragon-scanner

Easily scan a string with an object of regex patterns to produce an array of tokens. ~100 sloc.

Language: JavaScript - Size: 12.7 KB - Last synced at: 10 days ago - Pushed at: almost 7 years ago - Stars: 8 - Forks: 1

paceaux/methodius

A utility for analyzing text on the web

Language: TypeScript - Size: 362 KB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 5 - Forks: 2

alasdairforsythe/tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript

Language: Go - Size: 734 KB - Last synced at: about 2 months ago - Pushed at: over 1 year ago - Stars: 600 - Forks: 20

winkjs/wink-nlp-utils

NLP functions for amplifying negations, managing elisions, creating n-grams, stems, and phonetic codes from tokens, and more.

Language: JavaScript - Size: 2.98 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 132 - Forks: 11

winkjs/wink-nlp

Developer friendly Natural Language Processing ✨

Language: JavaScript - Size: 27 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 1,298 - Forks: 61

Tokenize2 is a plugin that lets your users select multiple items from a predefined list or via Ajax, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field of a Facebook message or adding tags on Tumblr.

Language: JavaScript - Size: 322 KB - Last synced at: 11 days ago - Pushed at: almost 3 years ago - Stars: 84 - Forks: 24

bent10/nomark

Transform hypertext strings (e.g., HTML, Markdown) into plain text for natural language processing (NLP) normalization

Language: TypeScript - Size: 95.7 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

bent10/stophtml

Extracts plain text from an HTML string. It's useful for Natural Language Processing (NLP) tasks.

Language: TypeScript - Size: 96.7 KB - Last synced at: 11 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

bent10/attributes-parser

Tokenize and parse an attributes string into meaningful tokens and key-value pairs.

Language: TypeScript - Size: 208 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 1

stdlib-js/string-base-format-interpolate

Generate a string from a token array by interpolating values.

Language: JavaScript - Size: 549 KB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

fcf-framework/fcf-framework-core

A package of basic functions and classes required for the framework to work

Language: JavaScript - Size: 552 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

jonschlinkert/extract-comments

Extract JavaScript code comments from a string or glob of files.

Language: JavaScript - Size: 390 KB - Last synced at: about 1 month ago - Pushed at: almost 7 years ago - Stars: 50 - Forks: 10

stdlib-js/string-base-format-tokenize

Tokenize a string into an array of string parts and format identifier objects.

Language: JavaScript - Size: 342 KB - Last synced at: 14 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

chayanforyou/bkash-pgwclient-demo-flutter

bKash payment gateway integration in Flutter

Language: Dart - Size: 34 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 28

SkyflowFoundry/bulk_insert_tokenization

Python scripts for tokenizing data - from CSV or Postgres - in the Skyflow Privacy Vault.

Language: Python - Size: 39.1 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

fundiprotocol/FundiFactoryFoundryStarterKit

Build and tokenize your own smart contract factory using Fundi, OpenZeppelin, and Chainlink contracts with the Foundry framework on Ethereum/Base Sepolia

Language: Solidity - Size: 4.56 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

regexhq/quoted-string-regex

JavaScript regular expression for matching a quoted string literal.

Language: JavaScript - Size: 8.79 KB - Last synced at: 6 days ago - Pushed at: almost 8 years ago - Stars: 6 - Forks: 0

jonschlinkert/tokenize-comment

Uses snapdragon to tokenize a single JavaScript block comment into an object, with description, tags, and code example sections that can be passed to any other comment parsers for further parsing.

Language: JavaScript - Size: 355 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 14 - Forks: 6

SiddiqSoft/string2map

Header only C++17 library to parse a string containing delimited key-value pairs into a map container including the feature to convert from std::string to std::wstring and vice-versa.

Language: C++ - Size: 96.7 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

akb89/witokit

A Python toolkit to generate a tokenized dump of Wikipedia for NLP

Language: Python - Size: 47.9 KB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 11 - Forks: 1

myint/untokenize

Transforms tokens into original source code (while preserving whitespace)

Language: Python - Size: 16.6 KB - Last synced at: 4 days ago - Pushed at: over 6 years ago - Stars: 8 - Forks: 1

flex-development/docast-util-from-docs

docast utility to parse docblocks

Language: TypeScript - Size: 2.14 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

hugohiraoka/Airline_Tweets_Sentiment_Analysis

A Natural Language Processing model to perform Sentiment Analysis of US Airline Customers

Language: HTML - Size: 22.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

gravity182/REPL

A Read–Eval–Print Loop (REPL) and Better Bash Lang & Compiler written in C++

Language: C++ - Size: 344 KB - Last synced at: 6 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

jonschlinkert/babel-extract-comments

Uses babel to extract JavaScript code comments from a string. Returns an array of comment objects, with line, column, index, comment type and comment string.

Language: JavaScript - Size: 271 KB - Last synced at: 4 days ago - Pushed at: over 7 years ago - Stars: 14 - Forks: 2

kopcho/Sales180

PKH Events

Size: 14.6 KB - Last synced at: 8 months ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

cygig/SerialConfigCommand

SerialConfigCommand lets users easily issue commands, with or without values, via the Serial Monitor. Example: "LED=255", "Lock=1", "Start". Compatible with the Arduino String() class and character arrays.

Language: C++ - Size: 480 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

SeanZLiu/NLPLab1

2020Fall NLP Lab1

Language: Python - Size: 44.2 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

BaseMax/go-lexer-token-simple

Simple Go lexer: lexes its own syntax and reads it from a file.

Language: Go - Size: 17.6 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

begin/parsers-compilers

Lexers, tokenizers, parsers, compilers, renderers, stringifiers... What's the difference, and how do they work?

Size: 11.7 KB - Last synced at: 23 days ago - Pushed at: over 8 years ago - Stars: 22 - Forks: 1

queckezz/json-tokenize

Splits a JSON string into an annotated list of tokens

Language: JavaScript - Size: 209 KB - Last synced at: 6 months ago - Pushed at: over 8 years ago - Stars: 5 - Forks: 1

parridhi/Interactive-ChatBot

An interactive chatbot that engages with users in real-time conversations, providing personalized responses. With its advanced Natural Language Processing (NLP) capabilities, it can interpret user queries accurately. Created for websites catering to a small niche.

Language: Python - Size: 32.2 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

schemed-js/core

Modular TypeScript template engine

Language: TypeScript - Size: 180 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

spydaz/ClassTokenizer

Basic tokenizer that creates tokens, enabling the creation of a personal syntax, removal of unwanted characters, etc.

Language: Visual Basic .NET - Size: 16.6 KB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

OpenVoiceOS/quebra_frases

Chunks strings into byte-sized pieces

Language: Python - Size: 35.2 KB - Last synced at: 3 months ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 3

carlosplanchon/tokenizesentences

Python 3 module to tokenize English sentences.

Language: Python - Size: 21.5 KB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 6 - Forks: 1

GruDev325/NFTSwap-frontend

NFTSwaps is a cross-chain and permissionless platform to tokenize NFTs and make them tradable on AMMs such as PancakeSwap or BakerySwap through the NFTSwaps UI.

Language: JavaScript - Size: 61.9 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 6 - Forks: 4