Model Profiles

Understand Halfred's model profiles - Lite, Standard, DeepThink, and Dev. Choose the right profile for your use case and budget.

Why Profiles?

The AI landscape is complex and constantly evolving:

  • Too many choices: OpenAI, Anthropic, Google, Mistral, and more all offer multiple models

  • Constant updates: New models launch frequently, making comparisons difficult

  • Performance varies: The "best" model depends on your specific task

  • Price vs quality: Balancing cost and performance requires expertise

Halfred solves this by abstracting models into profiles based on what actually matters: your use case, budget, and quality requirements.

Available Profiles

Note: The specific models associated with each profile may evolve over time. Halfred continuously evaluates and updates the underlying models to ensure optimal performance and cost-efficiency while maintaining consistent profile characteristics. Your API calls remain compatible—only the quality and speed may improve.

LITE - Fast & Cost-Effective

Best for: Simple tasks, UI assistants, high-volume low-stakes applications

{
  model: "lite",
  // Automatically routes to the best lightweight model
}

Characteristics:

  • Speed: Optimized for low-latency responses

  • Cost: $0.50 per million output tokens

  • Context: Up to 100,000 tokens

  • Use cases:

    • Short form content generation

    • Simple Q&A and chatbots

    • UI helpers and autocomplete

    • Budget-sensitive workflows

    • High-volume batch processing

Supported Models (automatically selected):

  • GPT-4.1-nano

  • Gemini 2.5 Flash Lite

  • Mistral Tiny

When NOT to use:

  • Complex reasoning tasks

  • Long document analysis

  • Tasks requiring deep understanding

  • Critical business decisions


STANDARD - Balanced Performance

Best for: Most production applications, general-purpose AI tasks

{
  model: "standard",
  // Halfred selects the best value model for your request
}

Characteristics:

  • Balance: Optimal mix of speed, intelligence, and cost

  • Cost: $2.50 per million output tokens

  • Context: Up to 200,000 tokens

  • Use cases:

    • Customer support chatbots

    • Content creation and editing

    • Data extraction and summarization

    • Code generation and assistance

    • General business applications

Supported Models (automatically selected):

  • GPT-5-mini

  • Gemini 2.5 Flash

  • Mistral Medium

Why it's recommended:

  • Handles 90% of production use cases

  • Best price-to-performance ratio

  • Reliable and consistent quality

  • Good for most document sizes


DEEPTHINK - Advanced Reasoning

Best for: Complex analysis, long documents, research, strategic tasks

{
  model: "deepthink",
  // Routes to the most advanced models available
}

Characteristics:

  • Intelligence: Highest reasoning and comprehension capabilities

  • Cost: $12.50 per million output tokens

  • Context: Up to 400,000 tokens

  • Use cases:

    • Complex data analysis

    • Strategic planning and decision support

    • Research and technical writing

    • Long-form content with deep context

    • Legal and medical document analysis

    • Code review and architecture planning

Supported Models (automatically selected):

  • GPT-5

  • Claude Sonnet 4.5

  • Gemini 2.5 Pro

When to use:

  • Task requires nuanced understanding

  • Working with large documents (100K+ tokens)

  • Need for step-by-step reasoning

  • High-stakes business decisions

  • R&D and innovation projects


DEV - Free Development Tier

Best for: Development, testing, prototyping, CI/CD pipelines

{
  model: "dev",
  // Free tier - no credits consumed
}

Characteristics:

  • Cost: Free (no credits consumed)

  • Context: Up to 10,000 tokens (limited)

  • Use cases:

    • Local development and testing

    • Prototyping new features

    • CI/CD integration tests

    • Learning and experimentation

    • Debugging

Supported Models:

  • GPT-4.1-nano

  • Gemini 2.5 Flash Lite

  • Mistral Tiny

Limitations:

  • Lower accuracy and stability

  • Smaller context window

  • May use older model versions

  • Not recommended for production


Comparison Table

| Profile | Price/M tokens (out) | Context Size | Speed | Quality | Best Use Case |
|---|---|---|---|---|---|
| LITE | $0.50 | 100K | ⚡⚡⚡ Fast | ⭐⭐ Good | Simple tasks, high volume |
| STANDARD | $2.50 | 200K | ⚡⚡ Balanced | ⭐⭐⭐ Great | Most production apps |
| DEEPTHINK | $12.50 | 400K | ⚡ Thorough | ⭐⭐⭐⭐ Excellent | Complex reasoning |
| DEV | $0.00 | 10K | ⚡⚡ Moderate | ⭐ Basic | Testing only |

For the most up-to-date pricing, please refer to the Profile Dashboard (requires login).

How Profile Selection Works

When you specify a profile, Halfred:

  1. Selects the best model: Chooses from available models matching the profile's performance and cost characteristics

  2. Checks availability: Ensures the selected model service is available

  3. Handles failover: Automatically switches to another model in the profile if the primary is unavailable

  4. Returns results: You get the response without worrying about the details

// You specify the profile
const completion = await client.chat.completions.create({
  model: "standard",  // Your choice
  messages: [...]
});

// Halfred returns which model was actually used
console.log(completion.model);     // e.g., "gpt-5-mini"
console.log(completion.provider);  // e.g., "openai"
console.log(completion.profile);   // "standard"

Default Profile Configuration

You have two ways to specify which profile to use:

Option 1: Specify Profile Per Request

Add the model attribute to each API request:

const completion = await client.chat.completions.create({
  model: "lite", // Explicitly choose profile
  messages: [{ role: "user", content: "Hello!" }],
});

If you omit the model attribute, Halfred uses your default profile configured in the dashboard:

const completion = await client.chat.completions.create({
  // No model specified - uses your default profile
  messages: [{ role: "user", content: "Hello!" }],
});

Configure your default profile: Dashboard > Profile

When to Use Default Profile

Using a default profile is particularly useful when:

  • Consistent needs: Most of your application uses the same profile (e.g., 90% of requests are STANDARD)

  • Simplified code: Reduce code duplication by not specifying the model on every request

  • Easy adaptation: Adjust costs or performance across your entire project by changing one setting

Example Use Case: If you notice increasing costs due to traffic, you can switch your default profile from STANDARD to LITE in the dashboard—instantly affecting all requests that don't explicitly specify a model, without changing any code.

// Most requests use the default profile
await client.chat.completions.create({
  messages: [{ role: "user", content: "Standard task" }],
});

// Override for specific high-priority requests
await client.chat.completions.create({
  model: "deepthink", // Override default for complex analysis
  messages: [{ role: "user", content: "Analyze this complex scenario..." }],
});

Choosing the Right Profile

Decision Tree

Is this for production?
├─ No → Use DEV (free)
└─ Yes → Continue...

    What's your budget priority?
    ├─ Cost is critical → Use LITE
    ├─ Balanced → Continue...
    └─ Quality over cost → Continue...

        How complex is the task?
        ├─ Simple (FAQ, UI text, short replies) → Use LITE
        ├─ Moderate (chatbot, content, summaries) → Use STANDARD
        └─ Complex (analysis, research, strategy) → Use DEEPTHINK
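
The decision tree above can be expressed as a small helper function. This is an illustrative sketch only; `chooseProfile` and its inputs are not part of the Halfred SDK:

```javascript
// Sketch of the decision tree above. Names and thresholds are illustrative.
function chooseProfile({ production, costCritical, complexity }) {
  if (!production) return "dev";           // free tier for testing
  if (costCritical) return "lite";         // cost is the priority
  switch (complexity) {
    case "simple":   return "lite";        // FAQ, UI text, short replies
    case "moderate": return "standard";    // chatbots, content, summaries
    case "complex":  return "deepthink";   // analysis, research, strategy
    default:         return "standard";    // safe default for production
  }
}
```

The returned string can be passed directly as the `model` attribute of a request.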

Examples by Use Case

E-Commerce Chatbot

Profile: standard

  • Needs: Good understanding, moderate context

  • Why: Handles product questions, order status, general support

  • Cost-effective for volume

Legal Document Analysis

Profile: deepthink

  • Needs: Deep comprehension, large context window

  • Why: Requires nuanced understanding of complex documents

  • Worth the higher cost for accuracy

UI Autocomplete

Profile: lite

  • Needs: Speed, simple predictions

  • Why: High-volume, low-stakes suggestions

  • Fast responses matter most

Code Review

Profile: deepthink

  • Needs: Deep analysis, security awareness

  • Why: Requires understanding of code patterns and best practices

  • High-stakes quality checks

Blog Post Generator

Profile: standard

  • Needs: Creativity, moderate length

  • Why: Good balance of quality and cost

  • Suitable for most content needs

Multi-Profile Strategy

Many applications use multiple profiles for different features:

// Simple queries - use lite
async function quickAnswer(question) {
  return await client.chat.completions.create({
    model: "lite",
    messages: [{ role: "user", content: question }],
  });
}

// Complex analysis - use deepthink
async function analyzeDocument(document) {
  return await client.chat.completions.create({
    model: "deepthink",
    messages: [
      {
        role: "user",
        content: `Analyze this document: ${document}`,
      },
    ],
  });
}

// General chat - use standard
async function chatResponse(history) {
  return await client.chat.completions.create({
    model: "standard",
    messages: history,
  });
}

Profile Updates

Halfred regularly updates which models are included in each profile:

  • New models: Automatically added when they offer better performance

  • Deprecated models: Gracefully removed with advance notice

  • Performance tuning: Routing logic improves over time

  • No code changes needed: Your integration continues to work

This means you benefit from AI advancements without updating your code.

Pricing Considerations

Cost Per Request

The actual cost depends on tokens used:

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

Example (Standard profile):

  • Input: 500 tokens @ $0.25/M = $0.000125

  • Output: 200 tokens @ $2.50/M = $0.0005

  • Total: $0.000625 per request
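
The calculation above can be reproduced with a one-line helper. The rates used here are the example prices from this section; always check the dashboard for current pricing:

```javascript
// Per-request cost in USD, given per-million-token rates.
function requestCost(inputTokens, outputTokens, inputPricePerM, outputPricePerM) {
  return (inputTokens / 1e6) * inputPricePerM +
         (outputTokens / 1e6) * outputPricePerM;
}

// Standard profile example: 500 input tokens @ $0.25/M, 200 output tokens @ $2.50/M
console.log(requestCost(500, 200, 0.25, 2.5).toFixed(6)); // "0.000625"
```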

Optimization Tips

  1. Start with lite, upgrade if quality isn't sufficient

  2. Use standard as default, reserve deepthink for specific needs

  3. Cache common responses to avoid repeat requests

  4. Trim prompts to reduce input tokens

  5. Set max_tokens to control output length
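
Tips 3 and 5 combined in a minimal sketch: a plain `Map` cache in front of the API plus a `max_tokens` cap. A production cache would need eviction and TTLs, and `client` is assumed to be an initialized Halfred SDK client:

```javascript
// Cache identical prompts and cap output length to control cost.
const responseCache = new Map();

async function cachedCompletion(client, prompt) {
  if (responseCache.has(prompt)) {
    return responseCache.get(prompt); // tip 3: avoid repeat requests
  }

  const completion = await client.chat.completions.create({
    model: "standard",
    max_tokens: 300, // tip 5: cap output tokens
    messages: [{ role: "user", content: prompt }],
  });

  responseCache.set(prompt, completion);
  return completion;
}
```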

Frequently Asked Questions

Can I choose a specific model instead of a profile?

No. Halfred's value proposition is intelligent model selection. If you need a specific model, consider using that provider directly.

What if the model selected isn't optimal for my request?

Halfred's routing improves over time. If you consistently see issues, contact support with examples.

Do all profiles support the same features?

Yes. All profiles support the same features. Halfred only adds features to profiles when all models selected for that profile have consistent behavior and capabilities for that feature.

Can I switch profiles mid-conversation?

Yes! Each request is independent. You can use different profiles for different turns in a conversation.

Will my costs increase if profiles are updated with better models?

Pricing is tied to the profile, not the underlying model. If a profile's price changes, we'll announce it in advance.
