framagit.org topics: natural language processing
nlp/substitutionstring
Modify strings without losing information. Useful for cleaning, normalizing, de-noising, filtering, or any other task that involves inserting text into or deleting text from a string.
Last synced at: 14 days ago - Stars: 0 - Forks: 0

nlp/iamtokenizing
Tokenizer classes for several NLP tasks: splitting a text on whitespace, with a regular expression, ... This package is based on the tokenspan package; see https://framagit.org/nlp/tokenspan
Last synced at: 17 days ago - Stars: 0 - Forks: 0

nlp/extractionstring
Extract parts of a string in a versatile way, without destroying information from the parent string. Discontinuous parts of a string can be collected as a single ExtractionString, and several string-splitting strategies can be applied to the same string at the same time.
Last synced at: 25 days ago - Stars: 0 - Forks: 0

nlp/iambagging
Bag-of-words tools for representing natural language documents, along with a few graph representations of a document. The main benefit of this module is that it is agnostic to the preprocessing, normalization, cleaning, and tokenization protocols used.
Last synced at: almost 2 years ago - Stars: 0 - Forks: 0

nlp/iamnormalizing
Tools that normalize a text in a non-destructive way.
Last synced at: over 2 years ago - Stars: 0 - Forks: 0

Quent--y/extension-mozilla
Spelling dictionary for Mozilla Firefox, based on the Hunspell dictionary (https://gitlab.com/taissou/hunspell-files-for-occitan-lengadocian/-/tree/master/Files)
Last synced at: about 2 years ago - Stars: 0 - Forks: 0

nlp/tokenspan
Deprecated since September 2022. See https://framagit.org/nlp/extractionstring for improved tools to extract any substring from a parent string without losing information from the parent.
Last synced at: 4 days ago - Stars: 0 - Forks: 0