Tokens Explained
Learn what tokens are, why they matter for billing and performance, and how to optimize token usage in your AI applications.
What is a Token?
A token is a small chunk of text that a language model processes. Tokens can be:
Whole words: "hello" = 1 token
Parts of words: "running" = might be 2 tokens ("run" + "ning")
Single characters: "!" = 1 token
Spaces and punctuation: " " or "," = 1 token
Examples
"Hello, world!" ≈ 4 tokens
["Hello", ",", " world", "!"]
"The quick brown fox" ≈ 5 tokens
["The", " quick", " brown", " fox"]
"Artificial Intelligence" ≈ 3 tokens
["Art", "ificial", " Intelligence"]
"I'm coding in JavaScript" ≈ 6 tokens
["I", "'m", " coding", " in", " JavaScript"]Rule of Thumb
English: ~1 token ≈ 4 characters or 0.75 words
Code: ~1 token ≈ 3-4 characters
Other languages: May use more tokens per word
Quick estimate: 100 words ≈ 133 tokens (see the sketch below)
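These heuristics are easy to turn into a quick helper for back-of-the-envelope planning. A minimal sketch (the 4-characters-per-token ratio is only an approximation and varies with content):

// Rough token estimate using the ~4 characters per token heuristic.
// Approximation only; actual counts depend on the model's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

estimateTokens("Hello, world!"); // ≈ 4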
Why Tokens Matter
1. Billing is Based on Tokens
Every API request costs money based on the number of tokens:
Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Example:
Request: "Summarize this article..." (50 tokens input)
Response: "This article discusses..." (100 tokens output)
Using Standard profile:
- Input: 50 × $0.25/M = $0.0000125
- Output: 100 × $2.50/M = $0.00025
Total: $0.0002625 (0.26 credits)

2. Context Limits are in Tokens
Each model profile has a maximum token capacity:
LITE: 100,000 tokens
STANDARD: 200,000 tokens
DEEPTHINK: 400,000 tokens
DEV: 10,000 tokens
The context includes:
System messages
Conversation history
User prompt
AI response
If you exceed the limit, the API returns an error.
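Before sending a request, you can therefore sanity-check that everything, including room for the response, fits. A minimal pre-flight sketch using the rough character-based estimate from above (the limit shown is the STANDARD profile's; adjust per profile):

// Rough pre-flight check against the STANDARD profile's 200,000-token limit.
const CONTEXT_LIMIT = 200_000;

function fitsInContext(messages, maxOutputTokens) {
  // ~4 characters per token is an approximation, not an exact count.
  const inputEstimate = messages.reduce(
    (sum, m) => sum + Math.ceil(m.content.length / 4),
    0
  );
  // The response counts toward the context, so reserve space for it.
  return inputEstimate + maxOutputTokens <= CONTEXT_LIMIT;
}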
3. Response Time Depends on Tokens
More tokens = longer processing time:
Input tokens: Time to understand your request
Output tokens: Time to generate the response
Tip: Limit max_tokens for faster responses when you don't need long outputs.
Input vs Output Tokens
Input Tokens
Everything you send to the API:
System messages
User messages
Assistant messages (conversation history)
Function/tool definitions
const completion = await client.chat.completions.create({
model: "standard",
messages: [
{ role: "system", content: "You are helpful." }, // Input
{ role: "user", content: "What is the capital of France?" }, // Input
],
});

Output Tokens
Everything the AI generates:
Assistant's response content
Reasoning tokens (for o1-style models)
// The response content counts as output tokens
console.log(completion.choices[0].message.content);
// "The capital of France is Paris."Pricing Difference
Output tokens cost significantly more than input tokens:
Profile      Input        Output       Output/Input
lite         $0.05/M      $0.50/M      10x
standard     $0.25/M      $2.50/M      10x
deepthink    $1.25/M      $12.50/M     10x
For the most up-to-date pricing, please refer to the Profile Dashboard (requires login).
Implication: Long outputs cost more than long inputs.
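To see the effect in practice, compare two requests of the same total size on the standard profile, using the prices listed above (a rough sketch):

// standard profile: $0.25/M input, $2.50/M output
const cost = (inputTokens, outputTokens) =>
  (inputTokens * 0.25 + outputTokens * 2.5) / 1_000_000;

cost(1000, 100); // $0.0005   (input-heavy)
cost(100, 1000); // $0.002525 (output-heavy, ~5x the cost)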
Reasoning Tokens (Thinking Tokens)
Some advanced AI models use reasoning tokens (also called "thinking tokens") during their internal processing before generating the final response.
What Are Reasoning Tokens?
Reasoning tokens represent the model's internal "thinking" process:
Internal computation: The model reasons through the problem step-by-step
Not visible: These tokens don't appear in the final response
Counted as input: Added to your input token count for billing purposes
Quality improvement: Help the model provide better, more accurate responses
Which Models Use Reasoning Tokens?
Primarily advanced reasoning models like:
OpenAI's 5 series
Other models with chain-of-thought capabilities
In Halfred, this primarily affects the DEEPTHINK profile.
Key Points
Transparent billing: Reasoning tokens are always included in the prompt_tokens count
Variable usage: Complex questions may generate more reasoning tokens
Value proposition: You pay for the improved reasoning quality
Profile-specific: Most common in DEEPTHINK, rare in LITE/STANDARD
Monitoring Reasoning Tokens
Check the usage object in API responses:
console.log(`Prompt tokens: ${completion.usage.prompt_tokens}`);
console.log(`Completion tokens: ${completion.usage.completion_tokens}`);
// If prompt_tokens is much higher than your input length,
// the model used reasoning tokens

💡 Tip: For cost-sensitive applications, consider using STANDARD or LITE profiles which typically don't use reasoning tokens.
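Because Halfred folds reasoning tokens into prompt_tokens, you can roughly gauge how many were used by comparing the billed count with your own pre-request estimate. A sketch (estimateInputTokens is a hypothetical helper, e.g. the tiktoken-based counter shown in the next section):

// estimateInputTokens is a hypothetical helper that counts your input locally.
const estimated = estimateInputTokens(messages);
const reasoningEstimate = completion.usage.prompt_tokens - estimated;

if (reasoningEstimate > 0) {
  console.log(`~${reasoningEstimate} reasoning tokens used`);
}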
How Halfred Counts Tokens
Understanding how Halfred counts tokens ensures accurate billing and transparency.
Token Counting Methods
Halfred uses two different approaches depending on the stage of your request:
1. Estimation (Before Request)
For pre-request estimates, Halfred uses the tiktoken library, which implements the industry-standard tokenization method used by most AI providers. This allows you to:
Plan API calls and estimate costs
Validate that content fits within context limits
Budget your token usage in advance
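A minimal sketch of such a pre-request estimate with the js-tiktoken package (cl100k_base is an assumption here; pick the encoding that matches your profile's underlying model):

import { getEncoding } from "js-tiktoken";

// cl100k_base is assumed; the actual model may use a different encoding.
const enc = getEncoding("cl100k_base");

const text = "What is the capital of France?";
console.log(`Estimated tokens: ${enc.encode(text).length}`);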
2. Actual Billing (After Request)
For billing purposes, Halfred prioritizes the token count returned by the model provider itself:
Primary Method: Halfred uses the exact token count reported by the model provider (OpenAI, Anthropic, Google, etc.) in their API response
Fallback Method (Rare): If the provider doesn't return a token count, Halfred applies the same tiktoken-based estimation method
Why Use Provider Counts?
Using the provider's actual token count ensures:
Accuracy: Reflects the exact tokens processed by the specific model
Consistency: Matches how the underlying provider bills
Transparency: You see the same counts the provider reports
No markup: Token counts are passed through directly with no added charges
Billing Transparency
Every API response includes a usage object showing:
prompt_tokens: Input tokens (including reasoning tokens if applicable)
completion_tokens: Output tokens generated
total_tokens: Sum of input and output tokens
These are the exact numbers used to calculate your bill. All requests are logged with their token counts in your dashboard for full audit trail.
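For example (values illustrative, matching the billing example above):

console.log(completion.usage);
// { prompt_tokens: 50, completion_tokens: 100, total_tokens: 150 }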
💡 Best Practice: Always check the usage field in responses to track your actual token consumption and costs.
Optimizing Token Usage
1. Write Concise Prompts
// ❌ Verbose (32 tokens)
"I would really appreciate it if you could help me by providing a detailed explanation of how artificial intelligence works";
// ✅ Concise (11 tokens)
"Explain how artificial intelligence works";2. Limit System Messages
// ❌ Long system message
{
role: "system",
content: "You are a highly knowledgeable assistant who always provides detailed, comprehensive, and well-researched answers..."
}
// ✅ Concise system message
{
role: "system",
content: "You are a knowledgeable assistant."
}

3. Use max_tokens
Control output length:
await client.chat.completions.create({
model: "standard",
messages: [...],
max_tokens: 100 // Limit to 100 output tokens
});

4. Avoid Unnecessary Conversation History
// ❌ Sending entire history every time
const allMessages = [
/* 100 previous messages */
];
// ✅ Send only relevant recent messages
const relevantMessages = allMessages.slice(-10);

5. Choose Appropriate Profile
Don't use a large context if you don't need it:
// ❌ Overkill for short prompts
model: "deepthink"; // 400K context for a 10-token prompt
// ✅ Appropriate for the task
model: "lite"; // 100K context is more than enoughSpecial Token Considerations
Code
Code typically uses more tokens than natural language:
// Natural language: ~15 tokens
"Create a function that adds two numbers";
// Code: ~25 tokens
function add(a, b) {
return a + b;
}

JSON
JSON structure adds extra tokens:
{
"name": "John",
"age": 30
}

This is ~10 tokens due to brackets, quotes, and formatting.
Different Languages
Non-English languages may use more tokens:
English: ~133 tokens per 100 words
Spanish: ~140 tokens per 100 words
Chinese: ~170 tokens per 100 words
Arabic: ~180 tokens per 100 words
Monitoring Token Usage
Track Per-Request
const completion = await client.chat.completions.create({
model: "standard",
messages: [...]
});
const { prompt_tokens, completion_tokens, total_tokens } = completion.usage;
console.log(`Input: ${prompt_tokens} tokens`);
console.log(`Output: ${completion_tokens} tokens`);
console.log(`Total: ${total_tokens} tokens`);
// standard profile pricing: $0.25/M input, $2.50/M output
const cost = (prompt_tokens * 0.25 + completion_tokens * 2.50) / 1_000_000;
console.log(`Cost: $${cost.toFixed(6)}`);

Common Token Errors
Error: Context Length Exceeded
Error: This model's maximum context length is 200000 tokens.
However, your messages resulted in 250000 tokens.

Solutions:
Reduce message history (see the trimming sketch below)
Shorten your prompt
Use a profile with larger context (deepthink)
Summarize earlier conversation
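A minimal sketch of the first solution, trimming history to a token budget (assumes messages[0] is the system message and uses the rough 4-characters-per-token estimate):

// Keep the system message, then add the most recent messages
// until a rough token budget is reached.
function trimHistory(messages, budgetTokens) {
  const [system, ...rest] = messages; // assumes messages[0] is the system message
  const kept = [];
  let used = Math.ceil(system.content.length / 4);
  for (const msg of rest.reverse()) {
    const estimate = Math.ceil(msg.content.length / 4);
    if (used + estimate > budgetTokens) break;
    kept.unshift(msg); // preserve chronological order
    used += estimate;
  }
  return [system, ...kept];
}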
Error: max_tokens Too Large
Error: max_tokens value exceeds available context

Solution: Reduce max_tokens or shorten input
Frequently Asked Questions
How can I reduce token costs?
Write concise prompts
Limit conversation history
Use max_tokens to control output
Choose the right profile for the task
Cache common responses
Do emojis count as tokens?
Yes! Emojis typically count as 1-2 tokens each: 😀 = 1-2 tokens
Are tokens counted before or after processing?
You can estimate token counts before the request is sent to stay within limits, but actual billing uses the token counts returned by the provider with the response.
Can I see token counts before making a request?
Yes, use tokenizer libraries (like tiktoken) to count tokens beforehand.
Do system messages count toward the limit?
Yes, everything counts: system messages, user messages, assistant messages, and the generated response.
What happens if I hit the token limit mid-response?
The response will be truncated, and finish_reason will be "length" instead of "stop".
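You can detect and handle this in code (a minimal sketch):

const choice = completion.choices[0];

if (choice.finish_reason === "length") {
  // Output was cut off by max_tokens or the context limit.
  console.warn("Response truncated; consider raising max_tokens.");
}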
Next Steps
Learn about Pricing
Understand Model Profiles
Optimize with Best Practices
Start building with our Quick Start Guide
Support
Questions about tokens?
Email: [email protected]
Discord: Join our community
Dashboard: Monitor usage at halfred.ai