Token (NLP)

NLP & Language Models

Smallest processing unit for language models.


A token is the smallest unit of text a language model processes: usually a whole word or a word piece, sometimes a single character or symbol.

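  • Types: Whole words, subwords (word pieces), individual characters or bytes, and punctuation marks.
  • Impact: Context length is measured in tokens, so the tokenizer determines how much text fits into a model's context window, and longer token sequences take more computation to process.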
  • Example: 'ChatGPT' may be split into 'Chat' and 'GPT'.
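The splitting idea can be illustrated with a small sketch. It assumes a hypothetical toy vocabulary (`VOCAB`) and uses greedy longest-match splitting, which is similar in spirit to WordPiece inference; BPE and Unigram tokenizers work differently, and real tokenizers learn vocabularies of tens of thousands of entries and mark word boundaries, which this sketch omits. The function names are illustrative only, and the resulting split of 'ChatGPT' does not reflect any particular production tokenizer.

```python
import re

# Hypothetical toy vocabulary for illustration only; real tokenizers
# learn their vocabularies from large text corpora.
VOCAB = {"Chat", "GPT", "token", "s", "split", "text", "into", "!", ",", "."}


def pretokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_word(word: str, vocab: set[str]) -> list[str]:
    """Greedily split one word into the longest matching vocabulary pieces.

    Falls back to a single-character token when no vocabulary entry matches.
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            candidate = word[start:end]
            if candidate in vocab:
                pieces.append(candidate)
                start = end
                break
        else:
            # No vocabulary entry matched: emit one character as its own token.
            pieces.append(word[start])
            start += 1
    return pieces


def tokenize(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Pre-tokenize the text, then split each piece into subword tokens."""
    tokens = []
    for word in pretokenize(text):
        tokens.extend(split_word(word, vocab))
    return tokens


if __name__ == "__main__":
    tokens = tokenize("ChatGPT splits text into tokens!")
    print(tokens)  # ['Chat', 'GPT', 'split', 's', 'text', 'into', 'token', 's', '!']
    print(len(tokens), "tokens")  # 9 tokens
```

The token count, not the character count, is what a context window limits, so a vocabulary that covers common words with single entries lets more text fit into the same context length.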