Tokenization is the process of breaking a string of text into smaller pieces, such as words, phrases, terms, and symbols, known as tokens. Tokens may be individual characters, words, phrases, or even whole sentences. During tokenization, some characters, such as punctuation marks, are typically discarded. The resulting tokens become the input for further processing such as parsing and text mining.
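As a minimal sketch of this idea, the hypothetical `tokenize` function below uses a regular expression to pull out word tokens while discarding punctuation (one simple approach among many):

```python
import re

def tokenize(text):
    """Split text into word tokens, discarding punctuation marks."""
    # Keep runs of letters, digits, and apostrophes; everything else
    # (spaces, commas, periods, etc.) is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)

print(tokenize("Hello, world! Tokenization is useful."))
# ['Hello', 'world', 'Tokenization', 'is', 'useful']
```

Note how the punctuation never appears in the output: the commas and periods serve only to mark token boundaries and are then thrown away.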
Tokenization is widely used in computer science, where it plays a central role in lexical analysis.
Tokenization typically relies on simple heuristics to separate tokens, for example splitting on whitespace and discarding punctuation.
Tokens themselves can also act as separators. For example, in most programming languages, identifiers can be placed directly next to arithmetic operators without any whitespace between them. Although such an expression may appear to be a single word, the grammar of the language treats the arithmetic operator as a token in its own right, so even when multiple tokens are run together, the operator still marks the boundary between them.
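A sketch of this behavior, using a small illustrative lexer (the patterns and the `lex` function are assumptions for this example, not any particular language's real lexer): the operators in `total=price+tax*2` are tokens themselves, yet they also separate the identifiers around them even though the source contains no whitespace.

```python
import re

# Candidate token patterns, tried in order at each position:
# identifiers, then integer literals, then single-character operators.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[+\-*/=()]")

def lex(source):
    """Split source code into tokens; operators double as separators."""
    return TOKEN_RE.findall(source)

print(lex("total=price+tax*2"))
# ['total', '=', 'price', '+', 'tax', '*', '2']
```

Even though `total=price+tax*2` contains no spaces, the lexer recovers seven distinct tokens, because each operator character both is a token and terminates the identifier or number before it.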
tech-term.com© 2023 All rights reserved