Model Profiles

Understand Halfred's model profiles - Lite, Standard, DeepThink, and Dev. Choose the right profile for your use case and budget.

Why Profiles?

The AI landscape is complex and constantly evolving:

  • Too many choices: OpenAI, Anthropic, Google, Mistral, and more all offer multiple models

  • Constant updates: New models launch frequently, making comparisons difficult

  • Performance varies: The "best" model depends on your specific task

  • Price vs quality: Balancing cost and performance requires expertise

Halfred solves this by abstracting models into profiles based on what actually matters: your use case, budget, and quality requirements.

Available Profiles

Note: The specific models associated with each profile may evolve over time. Halfred continuously evaluates and updates the underlying models to ensure optimal performance and cost-efficiency while maintaining consistent profile characteristics. Your API calls remain compatible—only the quality and speed may improve.

LITE - Fast & Cost-Effective

Best for: Simple tasks, UI assistants, high-volume low-stakes applications

{
  model: "lite",
  // Automatically routes to the best lightweight model
}

Characteristics:

  • Speed: Optimized for low-latency responses

  • Cost: $0.50 per million output tokens

  • Context: Up to 100,000 tokens

  • Use cases:

    • Short form content generation

    • Simple Q&A and chatbots

    • UI helpers and autocomplete

    • Budget-sensitive workflows

    • High-volume batch processing

Supported Models (automatically selected):

  • GPT-4.1-nano

  • Gemini 2.5 Flash Lite

  • Mistral Tiny

When NOT to use:

  • Complex reasoning tasks

  • Long document analysis

  • Tasks requiring deep understanding

  • Critical business decisions


STANDARD - Balanced Performance

Best for: Most production applications, general-purpose AI tasks

{
  model: "standard",
  // Halfred selects the best value model for your request
}

Characteristics:

  • Balance: Optimal mix of speed, intelligence, and cost

  • Cost: $2.50 per million output tokens

  • Context: Up to 200,000 tokens

  • Use cases:

    • Customer support chatbots

    • Content creation and editing

    • Data extraction and summarization

    • Code generation and assistance

    • General business applications

Supported Models (automatically selected):

  • GPT-5-mini

  • Gemini 2.5 Flash

  • Mistral Medium

Why it's recommended:

  • Handles 90% of production use cases

  • Best price-to-performance ratio

  • Reliable and consistent quality

  • Good for most document sizes


DEEPTHINK - Advanced Reasoning

Best for: Complex analysis, long documents, research, strategic tasks

{
  model: "deepthink",
  // Routes to the most advanced models available
}

Characteristics:

  • Intelligence: Highest reasoning and comprehension capabilities

  • Cost: $12.50 per million output tokens

  • Context: Up to 400,000 tokens

  • Use cases:

    • Complex data analysis

    • Strategic planning and decision support

    • Research and technical writing

    • Long-form content with deep context

    • Legal and medical document analysis

    • Code review and architecture planning

Supported Models (automatically selected):

  • GPT-5

  • Claude Sonnet 4.5

  • Gemini 2.5 Pro

When to use:

  • Task requires nuanced understanding

  • Working with large documents (100K+ tokens)

  • Need for step-by-step reasoning

  • High-stakes business decisions

  • R&D and innovation projects


DEV - Free Development Tier

Best for: Development, testing, prototyping, CI/CD pipelines

{
  model: "dev",
  // Free tier - no credits consumed
}

Characteristics:

  • Cost: Free (no credits consumed)

  • Context: Up to 10,000 tokens (limited)

  • Use cases:

    • Local development and testing

    • Prototyping new features

    • CI/CD integration tests

    • Learning and experimentation

    • Debugging

Supported Models:

  • GPT-4.1-nano

  • Gemini 2.5 Flash Lite

  • Mistral Tiny

Limitations:

  • Lower accuracy and stability

  • Smaller context window

  • May use older model versions

  • Not recommended for production


Comparison Table

| Profile | Price/M tokens (out) | Context Size | Speed | Quality | Best Use Case |
|---|---|---|---|---|---|
| LITE | $0.50 | 100K | ⚡⚡⚡ Fast | ⭐⭐ Good | Simple tasks, high volume |
| STANDARD | $2.50 | 200K | ⚡⚡ Balanced | ⭐⭐⭐ Great | Most production apps |
| DEEPTHINK | $12.50 | 400K | ⚡ Thorough | ⭐⭐⭐⭐ Excellent | Complex reasoning |
| DEV | $0.00 | 10K | ⚡⚡ Moderate | ⭐ Basic | Testing only |

For the most up-to-date pricing, please refer to the Profile Dashboard (requires login).

How Profile Selection Works

When you specify a profile, Halfred:

  1. Selects the best model: Chooses from available models matching the profile's performance and cost characteristics

  2. Checks availability: Ensures the selected model service is available

  3. Handles failover: Automatically switches to another model in the profile if the primary is unavailable

  4. Returns results: You get the response without worrying about the details

// You specify the profile
const completion = await client.chat.completions.create({
  model: "standard",  // Your choice
  messages: [...]
});

// Halfred returns which model was actually used
console.log(completion.model);     // e.g., "gpt-5-mini"
console.log(completion.provider);  // e.g., "openai"
console.log(completion.profile);   // "standard"

Default Profile Configuration

You have two ways to specify which profile to use:

Option 1: Specify Profile Per Request

Add the model attribute to each API request:

const completion = await client.chat.completions.create({
  model: "lite", // Explicitly choose profile
  messages: [{ role: "user", content: "Hello!" }],
});

If you omit the model attribute, Halfred uses your default profile configured in the dashboard:

const completion = await client.chat.completions.create({
  // No model specified - uses your default profile
  messages: [{ role: "user", content: "Hello!" }],
});

Configure your default profile: Dashboard > Profile

When to Use Default Profile

Using a default profile is particularly useful when:

  • Consistent needs: Most of your application uses the same profile (e.g., 90% of requests are STANDARD)

  • Simplified code: Reduce code duplication by not specifying the model on every request

  • Easy adaptation: Adjust costs or performance across your entire project by changing one setting

Example Use Case: If you notice increasing costs due to traffic, you can switch your default profile from STANDARD to LITE in the dashboard—instantly affecting all requests that don't explicitly specify a model, without changing any code.

// Most requests use the default profile
await client.chat.completions.create({
  messages: [{ role: "user", content: "Standard task" }],
});

// Override for specific high-priority requests
await client.chat.completions.create({
  model: "deepthink", // Override default for complex analysis
  messages: [{ role: "user", content: "Analyze this complex scenario..." }],
});

Choosing the Right Profile

Decision Tree

Is this for production?
├─ No → Use DEV (free)
└─ Yes → Continue...

    What's your budget priority?
    ├─ Cost is critical → Use LITE
    ├─ Balanced → Continue...
    └─ Quality over cost → Continue...

        How complex is the task?
        ├─ Simple (FAQ, UI text, short replies) → Use LITE
        ├─ Moderate (chatbot, content, summaries) → Use STANDARD
        └─ Complex (analysis, research, strategy) → Use DEEPTHINK
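
The decision tree above can be expressed as a small helper function. This is an illustrative sketch only; `chooseProfile` and its inputs are not part of the Halfred SDK:

```javascript
// Sketch of the decision tree above. Names and thresholds are illustrative.
function chooseProfile({ production, costCritical, complexity }) {
  if (!production) return "dev";           // free tier for testing
  if (costCritical) return "lite";         // cost is the priority
  switch (complexity) {
    case "simple":   return "lite";        // FAQ, UI text, short replies
    case "moderate": return "standard";    // chatbots, content, summaries
    case "complex":  return "deepthink";   // analysis, research, strategy
    default:         return "standard";    // safe default for production
  }
}
```

The returned string can be passed directly as the `model` attribute of a request.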

Examples by Use Case

E-Commerce Chatbot

Profile: standard

  • Needs: Good understanding, moderate context

  • Why: Handles product questions, order status, general support

  • Cost-effective for volume

Legal Document Analysis

Profile: deepthink

  • Needs: Deep comprehension, large context window

  • Why: Requires nuanced understanding of complex documents

  • Worth the higher cost for accuracy

UI Autocomplete

Profile: lite

  • Needs: Speed, simple predictions

  • Why: High-volume, low-stakes suggestions

  • Fast responses matter most

Code Review

Profile: deepthink

  • Needs: Deep analysis, security awareness

  • Why: Requires understanding of code patterns and best practices

  • High-stakes quality checks

Blog Post Generator

Profile: standard

  • Needs: Creativity, moderate length

  • Why: Good balance of quality and cost

  • Suitable for most content needs

Multi-Profile Strategy

Many applications use multiple profiles for different features:

// Simple queries - use lite
async function quickAnswer(question) {
  return await client.chat.completions.create({
    model: "lite",
    messages: [{ role: "user", content: question }],
  });
}

// Complex analysis - use deepthink
async function analyzeDocument(document) {
  return await client.chat.completions.create({
    model: "deepthink",
    messages: [
      {
        role: "user",
        content: `Analyze this document: ${document}`,
      },
    ],
  });
}

// General chat - use standard
async function chatResponse(history) {
  return await client.chat.completions.create({
    model: "standard",
    messages: history,
  });
}

Profile Updates

Halfred regularly updates which models are included in each profile:

  • New models: Automatically added when they offer better performance

  • Deprecated models: Gracefully removed with advance notice

  • Performance tuning: Routing logic improves over time

  • No code changes needed: Your integration continues to work

This means you benefit from AI advancements without updating your code.

Pricing Considerations

Cost Per Request

The actual cost depends on tokens used:

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Example (Standard profile):

  • Input: 500 tokens @ $0.25/M = $0.000125

  • Output: 200 tokens @ $2.50/M = $0.0005

  • Total: $0.000625 per request
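
The calculation above can be reproduced with a one-line helper. The rates used here are the example prices from this section; always check the dashboard for current pricing:

```javascript
// Per-request cost in USD, given per-million-token rates.
function requestCost(inputTokens, outputTokens, inputPricePerM, outputPricePerM) {
  return (inputTokens / 1e6) * inputPricePerM +
         (outputTokens / 1e6) * outputPricePerM;
}

// Standard profile example: 500 input tokens @ $0.25/M, 200 output tokens @ $2.50/M
console.log(requestCost(500, 200, 0.25, 2.5).toFixed(6)); // "0.000625"
```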

Optimization Tips

  1. Start with lite, upgrade if quality isn't sufficient

  2. Use standard as default, reserve deepthink for specific needs

  3. Cache common responses to avoid repeat requests

  4. Trim prompts to reduce input tokens

  5. Set max_tokens to control output length
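
Tips 3 and 5 combined in a minimal sketch: a plain `Map` cache in front of the API plus a `max_tokens` cap. A production cache would need eviction and TTLs, and `client` is assumed to be an initialized Halfred SDK client:

```javascript
// Cache identical prompts and cap output length to control cost.
const responseCache = new Map();

async function cachedCompletion(client, prompt) {
  if (responseCache.has(prompt)) {
    return responseCache.get(prompt); // tip 3: avoid repeat requests
  }

  const completion = await client.chat.completions.create({
    model: "standard",
    max_tokens: 300, // tip 5: cap output tokens
    messages: [{ role: "user", content: prompt }],
  });

  responseCache.set(prompt, completion);
  return completion;
}
```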

Frequently Asked Questions

Can I choose a specific model instead of a profile?

No. Halfred's value proposition is intelligent model selection. If you need a specific model, consider using that provider directly.

What if the model selected isn't optimal for my request?

Halfred's routing improves over time. If you consistently see issues, contact support with examples.

Do all profiles support the same features?

Yes. All profiles support the same features. Halfred only adds features to profiles when all models selected for that profile have consistent behavior and capabilities for that feature.

Can I switch profiles mid-conversation?

Yes! Each request is independent. You can use different profiles for different turns in a conversation.

Will my costs increase if profiles are updated with better models?

Pricing is tied to the profile, not the underlying model. If a profile's price changes, we'll announce it in advance.
