Pricing & Credits

Understand Halfred's transparent prepaid credit system, profile pricing, cost calculation, and optimization strategies.

How It Works

1. Add Credits

Purchase credits in advance through your dashboard:

$10 = 10 credits
$50 = 50 credits
$100 = 100 credits
Custom amounts available

2. Credits Are Consumed

Each API request consumes credits based on:

Model profile used (lite, standard, or deepthink)
Number of tokens processed (input + output)
Current pricing for that profile

3. Monitor Usage

Track your credit balance and usage in real-time:

Dashboard shows current balance
Each API response includes token usage
Historical usage reports available
Set up low-balance alerts

Profile Pricing

Credits are consumed based on token usage and the profile you choose:

| Profile | Price per Million Output Tokens | Price per Million Input Tokens |
| --- | --- | --- |
| LITE | $0.50 | $0.05 |
| STANDARD | $2.50 | $0.25 |
| DEEPTHINK | $12.50 | $1.25 |
| DEV | $0.00 (Free) | $0.00 (Free) |

For the most up-to-date pricing, please refer to the Profile Dashboard (requires login).

For every profile, the input token price is 10% of the output token price.

Calculating Costs

Formula

Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
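The formula can be sketched as a small helper. This is a minimal illustration rather than an official utility: `PRICES` and `estimateCost` are hypothetical names, and the prices are copied from the Profile Pricing table above, so they may drift from the current dashboard values.

```javascript
// Prices in USD per million tokens, copied from the Profile Pricing table.
// PRICES and estimateCost are illustrative names, not part of any Halfred SDK.
const PRICES = {
  lite: { input: 0.05, output: 0.5 },
  standard: { input: 0.25, output: 2.5 },
  deepthink: { input: 1.25, output: 12.5 },
  dev: { input: 0, output: 0 },
};

// Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price),
// with both prices expressed per million tokens.
function estimateCost(profile, inputTokens, outputTokens) {
  const price = PRICES[profile];
  if (!price) {
    throw new Error(`Unknown profile: ${profile}`);
  }
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

console.log(estimateCost("standard", 20, 750).toFixed(6)); // prints "0.001880"
```

Keeping the prices in one lookup object means a pricing change only needs to be applied in one place.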

Input Tokens & Thinking Models

Some advanced models use reasoning tokens (also called "thinking tokens") during their internal processing. These tokens are counted as input tokens and billed accordingly.

What are reasoning tokens?

Internal "thinking" the model does before generating the final response
Used by models like OpenAI's gpt-5 series for complex reasoning tasks
Not visible in the final output but improve response quality

Cost impact:

Reasoning tokens are added to your input token count
More complex questions may generate more reasoning tokens
Primarily affects the DEEPTHINK profile, which uses advanced reasoning models

Example:

User prompt: "Solve this complex math problem..."
- Visible input tokens: 10
- Reasoning tokens (internal): 500
- Total input tokens billed: 510
- Output tokens: 100

Cost with DEEPTHINK profile:
- Input: 510 × $1.25/M = $0.0006375
- Output: 100 × $12.50/M = $0.00125
- Total: ~$0.00189 (about 0.00189 credits, since 1 credit = $1)

💡 Tip: Check the usage field in API responses to see the exact breakdown of prompt tokens (input + reasoning) and completion tokens (output).
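The reasoning-token arithmetic above can be reproduced with a short sketch. It assumes, as this section describes, that reasoning tokens are billed at the input rate; `deepthinkCost` and its parameter names are hypothetical and chosen only for illustration.

```javascript
// DEEPTHINK prices in USD per million tokens, from the Profile Pricing table.
const DEEPTHINK_INPUT_PRICE = 1.25;
const DEEPTHINK_OUTPUT_PRICE = 12.5;

// Hypothetical helper: adds reasoning tokens to the visible input tokens
// before applying the input price, as described in this section.
function deepthinkCost({ visibleInputTokens, reasoningTokens, outputTokens }) {
  const billedInputTokens = visibleInputTokens + reasoningTokens;
  const inputCost = (billedInputTokens * DEEPTHINK_INPUT_PRICE) / 1_000_000;
  const outputCost = (outputTokens * DEEPTHINK_OUTPUT_PRICE) / 1_000_000;
  return { billedInputTokens, inputCost, outputCost, total: inputCost + outputCost };
}

const breakdown = deepthinkCost({
  visibleInputTokens: 10,
  reasoningTokens: 500,
  outputTokens: 100,
});
console.log(breakdown.billedInputTokens); // prints 510
```

In this example the reasoning tokens account for nearly all of the billed input, which is why short prompts on DEEPTHINK can still cost more than their visible length suggests.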

Examples

Example 1: Simple Question (Lite Profile)

Request: "What is 2+2?"
- Input: ~10 tokens
- Output: ~5 tokens

Cost calculation:
- Input: 10 tokens × $0.05 / 1M = $0.0000005
- Output: 5 tokens × $0.50 / 1M = $0.0000025
Total: ~$0.000003 (0.000003 credits)

Example 2: Content Generation (Standard Profile)

Request: "Write a 500-word blog post about coffee"
- Input: ~20 tokens
- Output: ~750 tokens

Cost calculation:
- Input: 20 tokens × $0.25 / 1M = $0.000005
- Output: 750 tokens × $2.50 / 1M = $0.001875
Total: ~$0.00188 (0.00188 credits)

Example 3: Document Analysis (DeepThink Profile)

Request: Analyze a 10-page legal document
- Input: ~5,000 tokens
- Output: ~1,000 tokens

Cost calculation:
- Input: 5,000 tokens × $1.25 / 1M = $0.00625
- Output: 1,000 tokens × $12.50 / 1M = $0.0125
Total: $0.01875 (0.01875 credits)

Token-to-Credit Conversion

1 credit = $1 USD

This makes cost calculation straightforward:

$10 = 10 credits
$100 = 100 credits
$1,000 = 1,000 credits
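Because credits and dollars map one to one, a balance can be divided directly by a per-request cost to estimate how far it will go. The helper below is a rough sketch with a hypothetical name (`requestsCoveredBy`); it assumes every request costs about the same, which real workloads rarely do.

```javascript
// 1 credit = $1 USD, so a credit balance can be divided by a dollar cost directly.
// requestsCoveredBy is an illustrative name, not a Halfred API.
function requestsCoveredBy(creditBalance, costPerRequestUsd) {
  if (costPerRequestUsd <= 0) {
    return Infinity; // Free requests (for example, the DEV profile) never drain the balance.
  }
  return Math.floor(creditBalance / costPerRequestUsd);
}

// 10 credits spent on requests costing about $0.00188 each (Example 2 above).
console.log(requestsCoveredBy(10, 0.00188)); // prints 5319
```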

Cost Optimization Strategies

1. Choose the Right Profile

Don't pay for more than you need:

// ❌ Expensive for simple tasks
await client.chat.completions.create({
  model: "deepthink",
  messages: [{ role: "user", content: "Hello!" }],
});

// ✅ Cost-effective
await client.chat.completions.create({
  model: "lite",
  messages: [{ role: "user", content: "Hello!" }],
});

2. Reduce Input Tokens

Shorter prompts = lower costs:

// ❌ Verbose
const verbosePrompt = "I would like you to please help me by generating a summary of the following text that I am about to provide to you...";

// ✅ Concise
const concisePrompt = "Summarize this text:";

3. Limit Output Tokens

Control response length:

await client.chat.completions.create({
  model: "standard",
  messages: [{ role: "user", content: prompt }],
  max_tokens: 100, // Limit the response to at most 100 output tokens
});
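Capping output tokens also puts an upper bound on what a request can cost, provided the input token count is known. The sketch below uses a hypothetical helper (`maxRequestCost`) and assumes prices are passed in USD per million tokens, as listed in the Profile Pricing table.

```javascript
// Upper bound on the cost of one request when max_tokens caps the output.
// maxRequestCost is an illustrative helper, not part of any SDK.
function maxRequestCost(inputTokens, maxTokens, inputPricePerM, outputPricePerM) {
  return (inputTokens * inputPricePerM + maxTokens * outputPricePerM) / 1_000_000;
}

// STANDARD profile, 200 input tokens, max_tokens: 100.
const bound = maxRequestCost(200, 100, 0.25, 2.5);
console.log(bound.toFixed(4)); // prints "0.0003"
```

The actual charge is lower whenever the model stops before reaching the cap.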

4. Cache Common Responses

Store frequently requested responses:

const cache = new Map();

async function getCachedCompletion(prompt) {
  if (cache.has(prompt)) {
    return cache.get(prompt); // No API call = no cost
  }

  const response = await client.chat.completions.create({
    model: "standard",
    messages: [{ role: "user", content: prompt }],
  });

  cache.set(prompt, response);
  return response;
}

5. Use DEV Profile for Testing

Don't consume credits during development:

// In development
const model = process.env.NODE_ENV === 'production' ? 'standard' : 'dev';

await client.chat.completions.create({
  model,
  messages: [{ role: "user", content: prompt }],
});

6. Batch Similar Requests

Process multiple items in one request:

// ❌ Multiple expensive requests
for (const item of items) {
  await client.chat.completions.create({
    model: "standard",
    messages: [{ role: "user", content: `Process: ${item}` }],
  });
}

// ✅ Single batched request
await client.chat.completions.create({
  model: "standard",
  messages: [
    {
      role: "user",
      content: `Process these items: ${items.join(", ")}`,
    },
  ],
});
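For long item lists, a single prompt can grow too large to be practical. One possible middle ground, sketched below, is to split the items into fixed-size batches and build one prompt per batch. `chunk` and `buildBatchPrompts` are hypothetical helpers, and the batch size is an assumption to tune for your own data.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one prompt per batch, reusing the prompt wording from the example above.
function buildBatchPrompts(items, size) {
  return chunk(items, size).map(
    (batch) => `Process these items: ${batch.join(", ")}`
  );
}

console.log(buildBatchPrompts(["a", "b", "c"], 2));
// prints [ 'Process these items: a, b', 'Process these items: c' ]
```

Each resulting prompt can then be sent as its own request, so the number of API calls drops from one per item to one per batch.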

Monitoring Usage

In Your Dashboard

Current Balance: See credits remaining
Usage History: Track daily/weekly/monthly consumption
Alerts: Get notified when balance is low

In API Responses

Every response includes usage information:

{
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 100,
    "total_tokens": 150
  },
  "profile": "standard",
  "model": "gpt-5-mini"
}

Programmatic Monitoring

const completion = await client.chat.completions.create({
  model: "standard",
  messages: [{ role: "user", content: prompt }],
});

// STANDARD profile prices: $0.25 per million input tokens, $2.50 per million output tokens
const cost = (
  completion.usage.prompt_tokens * 0.25 / 1_000_000 +
  completion.usage.completion_tokens * 2.50 / 1_000_000
);

console.log(`Request cost: $${cost.toFixed(6)}`);
console.log(`Tokens used: ${completion.usage.total_tokens}`);
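To follow spending across many requests rather than one, the per-request calculation can be accumulated. The tracker below is a minimal sketch: `createUsageTracker` is a hypothetical name, and it assumes every recorded request uses the same profile, so one pair of prices applies throughout.

```javascript
// Accumulates token usage across requests and reports the running cost.
// Prices are in USD per million tokens; createUsageTracker is an illustrative helper.
function createUsageTracker(inputPricePerM, outputPricePerM) {
  let promptTokens = 0;
  let completionTokens = 0;

  return {
    // Pass the `usage` object from an API response.
    record(usage) {
      promptTokens += usage.prompt_tokens;
      completionTokens += usage.completion_tokens;
    },
    totals() {
      const cost =
        (promptTokens * inputPricePerM + completionTokens * outputPricePerM) /
        1_000_000;
      return { promptTokens, completionTokens, cost };
    },
  };
}

// Two STANDARD requests with the usage shown in the sample response above.
const tracker = createUsageTracker(0.25, 2.5);
tracker.record({ prompt_tokens: 50, completion_tokens: 100, total_tokens: 150 });
tracker.record({ prompt_tokens: 50, completion_tokens: 100, total_tokens: 150 });
console.log(tracker.totals().cost.toFixed(6)); // prints "0.000525"
```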

Billing & Invoices

Prepaid System

No subscriptions: Pay only for what you use
No auto-renewal: Credits don't expire but must be added manually
Full control: You can't accidentally overspend

Adding Credits

Log in to your dashboard at halfred.ai
Navigate to Credits
Click "Top up Credits"
Choose an amount or enter a custom value (minimum $5)
Complete the secure payment via Stripe

Credit Balance & Limits

What Happens When Credits Run Out?

When your balance reaches zero:

API requests will return a 429 Insufficient Credits error
No requests will be processed until credits are added
No charges or overdrafts - you stay in control
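Application code can treat this error as a signal to pause and top up instead of crashing. The wrapper below is a sketch under two assumptions that are not confirmed here: that the client throws an error object exposing the HTTP status as `status` (as OpenAI-compatible SDKs commonly do), and that no retry is wanted. `completeOrReport` is a hypothetical name. Since 429 is also the standard HTTP status for rate limiting, the status alone may not identify the cause, so inspecting the error message may be necessary.

```javascript
// Wraps a completion call and reports a 429 response instead of throwing.
// `createCompletion` is any async function that performs the request, for example
// (params) => client.chat.completions.create(params).
async function completeOrReport(createCompletion, params) {
  try {
    const completion = await createCompletion(params);
    return { ok: true, completion };
  } catch (err) {
    // Assumption: the thrown error carries the HTTP status code in `status`.
    if (err && err.status === 429) {
      return { ok: false, error: err };
    }
    throw err; // Anything else is unexpected and should surface normally.
  }
}
```

A caller can check `result.ok` and, when it is false, notify whoever manages the account that credits need to be added.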

Frequently Asked Questions

Do credits expire?

No, credits never expire. Use them at your own pace.

Can I get a refund?

Yes, unused credits can be refunded within 30 days of purchase.

What happens if I run out mid-request?

Requests are checked before processing. You won't be charged if you have insufficient credits.

Can I set spending limits?

Yes, you control your balance. We never charge beyond your prepaid amount.

Are there hidden fees?

No. The only cost is the credits you purchase and consume.

How does pricing compare to using providers directly?

Halfred adds a small markup for routing intelligence, unified billing, and multi-provider access. For most users, the convenience and time savings outweigh the minimal additional cost.

Yes. Credits are tied to your account. Currently, Halfred operates with one project per account, one API key, and one shared credit pool.

Do you offer free trials?

Yes! Use the DEV profile for free testing. No credit card required for signup.

Support

Questions about pricing or billing?

Email: [email protected]
Dashboard: halfred.ai/billing

Next Steps

Learn about Model Profiles
Understand Tokens
Optimize with Best Practices
Start building with our Quick Start Guide


Last updated 1 month ago