Tokenization

Definition & Meaning

Last updated 23 months ago

What is Tokenization?

Tokenization is the process of breaking a string of text into pieces such as words, phrases, keywords, symbols, and other elements known as tokens. Tokens may be individual characters, words, or even whole sentences. During tokenization, some characters, such as punctuation marks, are often discarded. The tokens then become the input for further processing such as parsing and text mining.

Tokenization is used in computer science, where it plays a central role in lexical analysis.
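The definition above can be sketched with a minimal word tokenizer. This is an illustrative example, not a reference implementation; it simply extracts runs of alphanumeric characters and discards punctuation, as the article describes:

```python
import re

def tokenize(text):
    """Split text into word tokens, discarding punctuation and whitespace."""
    # \w+ matches contiguous runs of letters, digits, and underscores;
    # everything between those runs (spaces, commas, periods) is dropped.
    return re.findall(r"\w+", text)

print(tokenize("Hello, world! Tokenization splits text."))
```

The resulting list of tokens could then be handed to a parser or a text-mining pipeline as its input.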

What Does Tokenization Mean?

Tokenization relies largely on simple heuristics to separate tokens, following a few rules:

  • Tokens or words are separated by whitespace, punctuation marks, or line breaks
  • Whitespace or punctuation marks may or may not be kept as tokens, depending on the need
  • All characters within a contiguous string are part of the same token. Tokens can consist of alphabetic characters, alphanumeric characters, or numeric characters only.

Tokens themselves can also act as separators. For example, in most programming languages, identifiers can be written next to arithmetic operators without any whitespace. Although this may look like a single word or token, the grammar of the language treats the arithmetic operator (itself a token) as a separator, so even when multiple tokens are run together, they can still be split apart at the operator.
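A toy lexer illustrates this: the operators below double as token boundaries, so an expression written without spaces still breaks cleanly into tokens. The token classes (identifiers, integers, four arithmetic operators) are an assumed minimal grammar for illustration:

```python
import re

# A token is an identifier, an integer literal, or a single
# arithmetic operator. The operator both *is* a token and
# *separates* the tokens on either side of it.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[+\-*/]")

def lex(source):
    """Return the token stream of a simple arithmetic expression."""
    return TOKEN_RE.findall(source)

print(lex("x+y*2"))  # no whitespace, yet five distinct tokens
```

Even though `x+y*2` contains no whitespace, the lexer recovers `x`, `+`, `y`, `*`, and `2` as separate tokens, because the operators themselves mark where one token ends and the next begins.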


tech-term.com © 2023 All rights reserved