Model Profiles
Understand Halfred's model profiles - Lite, Standard, DeepThink, and Dev. Choose the right profile for your use case and budget.
Why Profiles?
The AI landscape is complex and constantly evolving:
Too many choices: OpenAI, Anthropic, Google, Mistral, and more all offer multiple models
Constant updates: New models launch frequently, making comparisons difficult
Performance varies: The "best" model depends on your specific task
Price vs quality: Balancing cost and performance requires expertise
Halfred solves this by abstracting models into profiles based on what actually matters: your use case, budget, and quality requirements.
Available Profiles
Note: The specific models associated with each profile may evolve over time. Halfred continuously evaluates and updates the underlying models to ensure optimal performance and cost-efficiency while maintaining consistent profile characteristics. Your API calls remain compatible—only the quality and speed may improve.
LITE - Fast & Cost-Effective
Best for: Simple tasks, UI assistants, high-volume low-stakes applications
{
model: "lite",
// Automatically routes to the best lightweight model
}
Characteristics:
Speed: Optimized for low-latency responses
Cost: $0.50 per million output tokens
Context: Up to 100,000 tokens
Use cases:
Short form content generation
Simple Q&A and chatbots
UI helpers and autocomplete
Budget-sensitive workflows
High-volume batch processing
Supported Models (automatically selected):
GPT-4.1-nano
Gemini 2.5 Flash Lite
Mistral Tiny
When NOT to use:
Complex reasoning tasks
Long document analysis
Tasks requiring deep understanding
Critical business decisions
STANDARD - Balanced Performance (Recommended)
Best for: Most production applications, general-purpose AI tasks
{
model: "standard",
// Halfred selects the best value model for your request
}
Characteristics:
Balance: Optimal mix of speed, intelligence, and cost
Cost: $2.50 per million output tokens
Context: Up to 200,000 tokens
Use cases:
Customer support chatbots
Content creation and editing
Data extraction and summarization
Code generation and assistance
General business applications
Supported Models (automatically selected):
GPT-5-mini
Gemini 2.5 Flash
Mistral Medium
Why it's recommended:
Handles 90% of production use cases
Best price-to-performance ratio
Reliable and consistent quality
Good for most document sizes
DEEPTHINK - Advanced Reasoning
Best for: Complex analysis, long documents, research, strategic tasks
{
model: "deepthink",
// Routes to the most advanced models available
}
Characteristics:
Intelligence: Highest reasoning and comprehension capabilities
Cost: $12.50 per million output tokens
Context: Up to 400,000 tokens
Use cases:
Complex data analysis
Strategic planning and decision support
Research and technical writing
Long-form content with deep context
Legal and medical document analysis
Code review and architecture planning
Supported Models (automatically selected):
GPT-5
Claude Sonnet 4.5
Gemini 2.5 Pro
When to use:
Task requires nuanced understanding
Working with large documents (100K+ tokens)
Need for step-by-step reasoning
High-stakes business decisions
R&D and innovation projects
DEV - Free Development Tier
Best for: Development, testing, prototyping, CI/CD pipelines
{
model: "dev",
// Free tier - no credits consumed
}
Characteristics:
Cost: Free (no credits consumed)
Context: Up to 10,000 tokens (limited)
Use cases:
Local development and testing
Prototyping new features
CI/CD integration tests
Learning and experimentation
Debugging
Supported Models:
GPT-4.1-nano
Gemini 2.5 Flash Lite
Mistral Tiny
Limitations:
Lower accuracy and stability
Smaller context window
May use older model versions
Not recommended for production
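Because the DEV profile consumes no credits, it works well as a free smoke test in CI. Here is a minimal sketch of such a check, assuming the same OpenAI-compatible client used throughout this page; `smokeTest` is an illustrative name, not part of the Halfred SDK.

```javascript
// Hypothetical CI smoke test using the free DEV profile (no credits consumed).
async function smokeTest(client) {
  const completion = await client.chat.completions.create({
    model: "dev", // free tier: suitable for pipelines and local testing
    messages: [{ role: "user", content: "Reply with OK" }],
  });
  // Fail the pipeline if the API returned no usable content.
  if (!completion.choices?.[0]?.message?.content) {
    throw new Error("Halfred smoke test failed: empty completion");
  }
  return true;
}
```

Run it as a step in your test suite; a thrown error fails the build without spending any credits.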
Comparison Table
| Profile   | Price ($/M output) | Context | Speed        | Quality          | Best For                  |
|-----------|--------------------|---------|--------------|------------------|---------------------------|
| LITE      | $0.50              | 100K    | ⚡⚡⚡ Fast    | ⭐⭐ Good          | Simple tasks, high volume |
| STANDARD  | $2.50              | 200K    | ⚡⚡ Balanced  | ⭐⭐⭐ Great        | Most production apps      |
| DEEPTHINK | $12.50             | 400K    | ⚡ Thorough   | ⭐⭐⭐⭐ Excellent   | Complex reasoning         |
| DEV       | $0.00              | 10K     | ⚡⚡ Moderate  | ⭐ Basic          | Testing only              |
For the most up-to-date pricing, please refer to the Profile Dashboard (requires login).
How Profile Selection Works
When you specify a profile, Halfred:
Selects the best model: Chooses from available models matching the profile's performance and cost characteristics
Checks availability: Ensures the selected model service is available
Handles failover: Automatically switches to another model in the profile if the primary is unavailable
Returns results: You get the response without worrying about the details
// You specify the profile
const completion = await client.chat.completions.create({
model: "standard", // Your choice
messages: [...]
});
// Halfred returns which model was actually used
console.log(completion.model); // e.g., "gpt-5-mini"
console.log(completion.provider); // e.g., "openai"
console.log(completion.profile); // "standard"
Default Profile Configuration
You have two ways to specify which profile to use:
Option 1: Specify Profile Per Request
Add the model attribute to each API request:
const completion = await client.chat.completions.create({
model: "lite", // Explicitly choose profile
messages: [{ role: "user", content: "Hello!" }],
});
Option 2: Use Default Profile (Recommended for Consistent Workloads)
If you omit the model attribute, Halfred uses your default profile configured in the dashboard:
const completion = await client.chat.completions.create({
// No model specified - uses your default profile
messages: [{ role: "user", content: "Hello!" }],
});
Configure your default profile: Dashboard > Profile
When to Use Default Profile
Using a default profile is particularly useful when:
Consistent needs: Most of your application uses the same profile (e.g., 90% of requests are STANDARD)
Simplified code: Reduce code duplication by not specifying the model on every request
Easy adaptation: Adjust costs or performance across your entire project by changing one setting
Example Use Case: If you notice increasing costs due to traffic, you can switch your default profile from STANDARD to LITE in the dashboard—instantly affecting all requests that don't explicitly specify a model, without changing any code.
// Most requests use the default profile
await client.chat.completions.create({
messages: [{ role: "user", content: "Standard task" }],
});
// Override for specific high-priority requests
await client.chat.completions.create({
model: "deepthink", // Override default for complex analysis
messages: [{ role: "user", content: "Analyze this complex scenario..." }],
});
Choosing the Right Profile
Decision Tree
Is this for production?
├─ No → Use DEV (free)
└─ Yes → Continue...
What's your budget priority?
├─ Cost is critical → Use LITE
├─ Balanced → Continue...
└─ Quality over cost → Continue...
How complex is the task?
├─ Simple (FAQ, UI text, short replies) → Use LITE
├─ Moderate (chatbot, content, summaries) → Use STANDARD
└─ Complex (analysis, research, strategy) → Use DEEPTHINK
Examples by Use Case
E-Commerce Chatbot
Profile: standard
Needs: Good understanding, moderate context
Why: Handles product questions, order status, general support
Cost-effective for volume
Legal Document Analysis
Profile: deepthink
Needs: Deep comprehension, large context window
Why: Requires nuanced understanding of complex documents
Worth the higher cost for accuracy
UI Autocomplete
Profile: lite
Needs: Speed, simple predictions
Why: High-volume, low-stakes suggestions
Fast responses matter most
Code Review
Profile: deepthink
Needs: Deep analysis, security awareness
Why: Requires understanding of code patterns and best practices
High-stakes quality checks
Blog Post Generator
Profile: standard
Needs: Creativity, moderate length
Why: Good balance of quality and cost
Suitable for most content needs
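The decision tree above can be sketched as a tiny helper. This is an illustrative function, not part of the Halfred SDK; the input shape (`production`, `costCritical`, `complexity`) is an assumption for the example.

```javascript
// Minimal sketch of the profile decision tree.
function chooseProfile({ production, costCritical, complexity }) {
  if (!production) return "dev";        // not production → free tier
  if (costCritical) return "lite";      // cost is the priority
  switch (complexity) {
    case "simple":   return "lite";     // FAQ, UI text, short replies
    case "moderate": return "standard"; // chatbots, content, summaries
    case "complex":  return "deepthink";// analysis, research, strategy
    default:         return "standard"; // safe default
  }
}

console.log(chooseProfile({ production: true, costCritical: false, complexity: "complex" })); // "deepthink"
```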
Multi-Profile Strategy
Many applications use multiple profiles for different features:
// Simple queries - use lite
async function quickAnswer(question) {
return await client.chat.completions.create({
model: "lite",
messages: [{ role: "user", content: question }],
});
}
// Complex analysis - use deepthink
async function analyzeDocument(document) {
return await client.chat.completions.create({
model: "deepthink",
messages: [
{
role: "user",
content: `Analyze this document: ${document}`,
},
],
});
}
// General chat - use standard
async function chatResponse(history) {
return await client.chat.completions.create({
model: "standard",
messages: history,
});
}
Profile Updates
Halfred regularly updates which models are included in each profile:
New models: Automatically added when they offer better performance
Deprecated models: Gracefully removed with advance notice
Performance tuning: Routing logic improves over time
No code changes needed: Your integration continues to work
This means you benefit from AI advancements without updating your code.
Pricing Considerations
Cost Per Request
The actual cost depends on tokens used:
Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
Example (Standard profile):
Input: 500 tokens @ $0.25/M = $0.000125
Output: 200 tokens @ $2.50/M = $0.0005
Total: $0.000625 per request
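The arithmetic above can be captured in a small helper. `estimateCost` is an illustrative name; the $0.25/M input rate for STANDARD is taken from the example itself.

```javascript
// Estimate request cost from token counts. Prices are per million tokens.
function estimateCost({ inputTokens, outputTokens, inputPricePerM, outputPricePerM }) {
  return (inputTokens * inputPricePerM + outputTokens * outputPricePerM) / 1_000_000;
}

const cost = estimateCost({
  inputTokens: 500,       // prompt size
  outputTokens: 200,      // response size
  inputPricePerM: 0.25,   // STANDARD input price from the example above
  outputPricePerM: 2.5,   // STANDARD output price
});
console.log(cost); // 0.000625
```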
Optimization Tips
Start with lite, upgrade if quality isn't sufficient
Use standard as default, reserve deepthink for specific needs
Cache common responses to avoid repeat requests
Trim prompts to reduce input tokens
Set max_tokens to control output length
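Several of these tips can be combined in one place. A minimal sketch, assuming the OpenAI-compatible client shown elsewhere on this page; `buildRequest`, `cachedCompletion`, and the in-memory cache are illustrative names.

```javascript
// In-memory cache of completions keyed by trimmed prompt.
const responseCache = new Map();

function buildRequest(prompt) {
  return {
    model: "standard",  // sensible default profile
    max_tokens: 300,    // cap output length to control cost
    messages: [{ role: "user", content: prompt.trim() }], // trim to save input tokens
  };
}

async function cachedCompletion(client, prompt) {
  const key = prompt.trim();
  if (responseCache.has(key)) return responseCache.get(key); // avoid repeat requests
  const completion = await client.chat.completions.create(buildRequest(key));
  responseCache.set(key, completion);
  return completion;
}
```

A production cache would also need an eviction policy and should only store deterministic, non-personalized responses.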
Frequently Asked Questions
Can I choose a specific model instead of a profile?
No. Halfred's value proposition is intelligent model selection. If you need a specific model, consider using that provider directly.
What if the model selected isn't optimal for my request?
Halfred's routing improves over time. If you consistently see issues, contact support with examples.
Do all profiles support the same features?
Yes. All profiles support the same features. Halfred only adds features to profiles when all models selected for that profile have consistent behavior and capabilities for that feature.
Can I switch profiles mid-conversation?
Yes! Each request is independent. You can use different profiles for different turns in a conversation.
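For instance, a conversation can start on a cheap profile and escalate when a turn needs deeper reasoning. An illustrative sketch, assuming the same OpenAI-compatible client used in the examples above:

```javascript
// Each turn chooses its own profile; the shared history carries context across turns.
async function runConversation(client) {
  const history = [{ role: "user", content: "Summarize our meeting notes." }];

  // Quick first pass on the cheap profile
  const draft = await client.chat.completions.create({ model: "lite", messages: history });
  history.push({ role: "assistant", content: draft.choices[0].message.content });

  // The follow-up needs deeper reasoning, so switch profiles for this turn
  history.push({ role: "user", content: "Now identify strategic risks in those notes." });
  return client.chat.completions.create({ model: "deepthink", messages: history });
}
```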
Will my costs increase if profiles are updated with better models?
Pricing is tied to the profile, not the underlying model. If a profile's price changes, we'll announce it in advance.
Next Steps
Understand costs: Read Pricing & Credits
Learn about usage: See Tokens Explained
Start building: Follow our Quick Start Guide
Optimize: Check Best Practices
Support
Questions about profiles?
Email: [email protected]
Discord: Join our community
Dashboard: View your profile usage at halfred.ai
Last updated