Token
A token is the fundamental unit of text that a large language model processes. Tokenization splits text into chunks (sometimes whole words, sometimes subwords, sometimes individual characters) that the model can work with. In English text, one token is roughly 3-4 characters or 0.75 words. In code, tokens correspond to keywords, operators, whitespace, and whole or partial identifiers such as variable names.
How tokenization works for code
LLMs use a tokenizer, commonly based on byte pair encoding (BPE), to convert raw text into a sequence of integer IDs. Each ID maps to a token in the model's vocabulary. Common words like "the" or "function" are usually single tokens, while uncommon or long words get split into multiple tokens. In code, common keywords (if, return, const) are typically single tokens, but unusual variable names may be split into several subword tokens. This affects both cost (you pay per token) and context limits (the model can only process a fixed number of tokens at once).
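The lookup from text to vocabulary entries to integer IDs can be sketched in a few lines. This is a toy under stated assumptions, not a real tokenizer: `TOY_VOCAB`, `tokenize`, `to_ids`, and the character fallback offset are all hypothetical, and greedy longest-match stands in for the learned merge rules that an actual BPE tokenizer applies.

```python
# Hypothetical toy vocabulary. Real vocabularies hold tens of thousands of entries.
TOY_VOCAB = ["const", "return", "if", "fetch", "User", "(", ");", ")", ";", " ", "id"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(TOY_VOCAB)}
CHAR_FALLBACK_OFFSET = 1000  # assumed ID range for characters missing from the vocabulary


def tokenize(text):
    """Split text by greedy longest-match against TOY_VOCAB (a stand-in for BPE)."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = None
        for tok in TOY_VOCAB:
            if text.startswith(tok, pos) and (match is None or len(tok) > len(match)):
                match = tok
        if match is None:
            match = text[pos]  # unknown text falls back to a single character
        tokens.append(match)
        pos += len(match)
    return tokens


def to_ids(tokens):
    """Map each token to an integer ID, using the fallback range for unknown characters."""
    return [TOKEN_TO_ID.get(tok, CHAR_FALLBACK_OFFSET + ord(tok[0])) for tok in tokens]


print(tokenize("return"))          # ['return'] : a common keyword stays whole
print(tokenize("fetchUser(id);"))  # ['fetch', 'User', '(', 'id', ');']
print(tokenize("qzx"))             # ['q', 'z', 'x'] : an unusual name splits up
```

The unusual name costs three tokens while the keyword costs one, which is the effect described above, only exaggerated by the tiny vocabulary.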
Why tokens matter for developers
- Cost: API pricing is per token. A 1,000-line file can consume several thousand tokens each time it is processed.
- Context window: the model's token limit determines how much code it can "see" at once.
- Speed: more tokens = longer generation time. Concise prompts get faster responses.
- Code density: code is more token-dense than natural language. A line of Python typically uses more tokens than a line of English prose.
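As a rough illustration of the cost and context-window points, the sketch below estimates a token count from character length and checks it against a budget. It assumes the 4-characters-per-token rule of thumb for English text; the function names, the price, and the context window size are hypothetical placeholders, not any provider's actual figures.

```python
import math

CHARS_PER_TOKEN = 4  # rule-of-thumb assumption; code is usually denser than this


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Very rough token estimate from character count. Not a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(token_count, price_per_million_tokens):
    """Cost in the same currency unit as the (hypothetical) per-million-token price."""
    return token_count * price_per_million_tokens / 1_000_000


def fits_in_context(token_count, context_window, reserved_for_output=0):
    """True if the prompt plus the tokens reserved for the reply fit in the window."""
    return token_count + reserved_for_output <= context_window


prompt = "Hello world"
tokens = estimate_tokens(prompt)
print(tokens)                                                   # 3
print(estimate_cost(1_000_000, price_per_million_tokens=3.0))   # 3.0 (placeholder price)
print(fits_in_context(900, context_window=1000, reserved_for_output=200))  # False
```

For exact numbers, use the tokenizer or token-counting facility that matches the model you are calling; a character-based estimate is only good for quick budgeting.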
// Tokenization example (using tiktoken-like tokenizer)
// English: "Hello world" → ["Hello", " world"] = 2 tokens
// Code: more tokens per line
// "const result = await fetchUser(id);"
// → ["const", " result", " =", " await", " fetch", "User", "(", "id", ");"]
// = 9 tokens
// Approximate token counts for code files:
// 100-line TypeScript file: ~800 tokens
// 500-line Python module: ~3,500 tokens
// 1,000-line Java class: ~8,000 tokens
Claude Code manages tokens automatically, reading files on demand rather than loading everything upfront. You do not need to manually count tokens, but understanding the concept helps you write more cost-effective prompts.
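The approximate file counts listed above work out to about 7-8 tokens per line. The sketch below makes that arithmetic explicit and uses it to extrapolate to other file sizes. The figures are copied from the list, the helper names are hypothetical, and real counts depend on the tokenizer and on how long the lines are.

```python
# (lines, tokens) pairs copied from the approximate counts listed above.
SAMPLE_FILES = {
    "TypeScript": (100, 800),
    "Python": (500, 3500),
    "Java": (1000, 8000),
}


def tokens_per_line(lines, tokens):
    """Average token density implied by a (lines, tokens) pair."""
    return tokens / lines


def estimate_file_tokens(line_count, language):
    """Extrapolate a token count for a file, assuming the sample density holds."""
    lines, tokens = SAMPLE_FILES[language]
    return round(line_count * tokens_per_line(lines, tokens))


for language, (lines, tokens) in SAMPLE_FILES.items():
    print(language, tokens_per_line(lines, tokens))
# TypeScript 8.0
# Python 7.0
# Java 8.0

print(estimate_file_tokens(250, "Python"))  # 1750
```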