Halfred API Reference

Complete REST API reference for Halfred including chat completions, models endpoint, request parameters, and response formats.

Base URL

https://api.halfred.ai/v1/

Authentication

All API requests require authentication using an API key. Include your API key in the Authorization header:

Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
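
If you are calling the API from Python, a minimal sketch of setting this header looks like the following (it assumes the requests package and an environment variable named HALFRED_API_KEY, both of which are choices made for this example):

import os
import requests

# Load the key server-side from the environment; never hard-code it or ship it to clients.
api_key = os.environ["HALFRED_API_KEY"]

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# Any authenticated call uses the same header, for example listing models:
response = requests.get("https://api.halfred.ai/v1/models", headers=headers)
response.raise_for_status()
print(response.json())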

Getting an API Key

First Time:

  1. Sign up for a Halfred account at halfred.ai using Google, GitHub, or email

  2. Upon first login, your API key is automatically generated and displayed

  3. Copy and store it securely - you'll need it for all API requests

Regenerating Your Key:

  1. Go to your Dashboard

  2. Navigate to the "Your Key" section

  3. Click the three dots menu (⋮) and select "Revoke Key & Get New One"

  4. Copy your new API key immediately

⚠️ Important: Revoking a key is immediate. Update all applications using the old key to avoid service interruption.

API Key Security

  • Never expose keys: Keep your API keys secure and never expose them in client-side code or public repositories

  • Server-side only: API keys should only be used in server-side applications

  • Monitor usage: Track your API key usage and request logs in the dashboard

  • Revoke if compromised: Immediately regenerate your key if you suspect it has been compromised

Models

Halfred offers different model profiles optimized for various use cases:

Available Profiles

Profile   | Description
LITE      | Lightweight and fast AI for everyday simple tasks
STANDARD  | Balanced performance - let Halfred select the best value model
DEEPTHINK | Advanced reasoning with extended context
DEV       | Free calls for development and testing

📖 Learn more: Model Profiles Guide - Detailed information on pricing, context sizes, use cases, and how to choose the right profile.

Endpoints


POST /chat/completions

Generate AI completions for chat conversations.

Request

POST https://api.halfred.ai/v1/chat/completions
Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Type: application/json

Request Body

Conversation Messages

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "model": "standard",
  "temperature": 0.7,
  "response_format": {
    "type": "json_object"
  },
  "max_tokens": 150
}

Request Parameters

Parameter             | Type    | Required | Description
messages              | array   | Yes      | Array of conversation messages
messages[].role       | string  | Yes      | Message role: "system", "user", or "assistant"
messages[].content    | string  | Yes      | Message content
model                 | string  | No       | Model profile to use (defaults to project default or "lite")
temperature           | number  | No       | Sampling temperature between 0 and 2 (default varies by model)
response_format       | object  | No       | Response format specification
response_format.type  | string  | No       | Format type (e.g., "json_object")
stream                | boolean | No       | If true, sends partial message deltas (not yet supported)
max_completion_tokens | integer | No       | Upper bound for completion tokens (including reasoning tokens)
max_tokens            | integer | No       | Maximum number of tokens to generate in the chat completion

Message Roles

  • system: Sets the behavior and context for the assistant

  • user: Messages from the user/human

  • assistant: Previous responses from the AI (for conversation history)
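
As a sketch of how these roles fit together, the following Python snippet sends a short multi-turn conversation to /chat/completions (it assumes the requests package and a HALFRED_API_KEY environment variable, as in the authentication example above):

import os
import requests

url = "https://api.halfred.ai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['HALFRED_API_KEY']}",
    "Content-Type": "application/json",
}

# The system message sets behavior; user and assistant messages carry the history.
payload = {
    "model": "standard",
    "temperature": 0.7,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "What is its population?"},
    ],
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])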

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "provider": "openai",
  "model": "gpt-4o",
  "profile": "standard",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

Response Fields

Field                     | Type   | Description
id                        | string | Unique identifier for the completion
object                    | string | Always "chat.completion"
created                   | number | Unix timestamp of when the completion was created
provider                  | string | The underlying AI provider used
model                     | string | The specific model that generated the response
profile                   | string | The Halfred profile used
choices                   | array  | Array of completion choices (typically one)
choices[].index           | number | Choice index (usually 0)
choices[].message         | object | The generated message
choices[].message.role    | string | Always "assistant"
choices[].message.content | string | The generated response content
choices[].finish_reason   | string | Reason completion finished: "stop", "length", "content_filter", or "tool_calls"
usage                     | object | Token usage information
usage.prompt_tokens       | number | Number of tokens in the prompt
usage.completion_tokens   | number | Number of tokens in the completion
usage.total_tokens        | number | Total tokens used (prompt + completion)
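
As a minimal sketch of reading these fields in Python, the helper below assumes completion is the parsed JSON body returned by the endpoint above:

def summarize_completion(completion: dict) -> None:
    # Print the key fields of a /chat/completions response body.
    choice = completion["choices"][0]
    usage = completion["usage"]
    print("Profile:", completion["profile"], "-> model:", completion["model"])
    print("Assistant:", choice["message"]["content"])
    print("Finish reason:", choice["finish_reason"])
    print("Tokens:", usage["prompt_tokens"], "prompt +",
          usage["completion_tokens"], "completion =", usage["total_tokens"], "total")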


GET /models

Retrieve a list of available models and their details.

Request

GET https://api.halfred.ai/v1/models
Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

Response

{
  "object": "list",
  "data": [
    {
      "id": "halfred-standard",
      "object": "model",
      "created": "2024-01-15T10:30:00.000Z",
      "owned_by": "halfred"
    },
    {
      "id": "standard",
      "object": "model",
      "created": "2024-01-15T10:30:00.000Z",
      "owned_by": "halfred"
    },
    {
      "id": "halfred-deepthink",
      "object": "model",
      "created": "2024-01-15T10:30:00.000Z",
      "owned_by": "halfred"
    },
    {
      "id": "deepthink",
      "object": "model",
      "created": "2024-01-15T10:30:00.000Z",
      "owned_by": "halfred"
    },
    (...)
  ]
}

Response Fields

Field           | Type   | Description
object          | string | Always "list"
data            | array  | Array of available models
data[].id       | string | Model identifier (can be used in chat completions)
data[].object   | string | Always "model"
data[].created  | string | ISO 8601 timestamp when the model was created
data[].owned_by | string | Always "halfred"

Model ID Formats

Models are available in two formats:

  • halfred-{profile} (e.g., halfred-standard)

  • {profile} (e.g., standard)

Both formats are equivalent and can be used interchangeably.
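
A short Python sketch that lists the models and collapses the two equivalent ID formats into plain profile names (it assumes the requests package, Python 3.9+ for str.removeprefix, and a HALFRED_API_KEY environment variable):

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['HALFRED_API_KEY']}"}
models = requests.get("https://api.halfred.ai/v1/models", headers=headers).json()

# Strip the optional "halfred-" prefix so each profile appears once.
profiles = sorted({m["id"].removeprefix("halfred-") for m in models["data"]})
print(profiles)  # expected to include entries such as "standard" and "deepthink"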


Error Handling

The API uses standard HTTP status codes and returns detailed error information in JSON format.

Error Response Format

{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
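
A minimal sketch of surfacing this structure from Python (assuming the requests package; the field access follows the format shown above):

import requests

response = requests.post(
    "https://api.halfred.ai/v1/chat/completions",
    headers={"Authorization": "Bearer halfred_invalid_key_for_demo"},  # deliberately invalid
    json={"messages": [{"role": "user", "content": "Hello"}]},
)

if response.status_code != 200:
    error = response.json().get("error", {})
    print(response.status_code, error.get("type"), error.get("code"), error.get("message"))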

Common Error Codes

HTTP Status | Error Type           | Description
400         | invalid_request      | Bad request - check your parameters
401         | authentication_error | Invalid or missing API key
403         | permission_error     | API key doesn't have required permissions
404         | not_found            | Resource not found
429         | rate_limit_error     | Too many requests or insufficient credits
500         | server_error         | Internal server error

Specific Error Scenarios

Authentication Errors

  • Missing Authorization header: Include Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx

  • Invalid API key: Check that your API key is correct and not revoked

  • Expired API key: Generate a new API key if yours has expired

Request Errors

  • Missing or empty messages: You must provide a non-empty messages array

  • Invalid model: Use a valid model ID from the /models endpoint

  • Context limit exceeded: Reduce message length or use a profile with larger context

  • Streaming not supported: The stream parameter is not yet supported

Usage Errors

  • Insufficient credits: Add credits to your account

  • Profile usage limit exceeded: Upgrade your plan or wait for limit reset

  • Rate limit exceeded: Reduce request frequency

Code Examples

cURL

# Get available models
curl -X GET "https://api.halfred.ai/v1/models" \
  -H "Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Chat completion
curl -X POST "https://api.halfred.ai/v1/chat/completions" \
  -H "Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "model": "standard",
    "temperature": 0.7
  }'

SDK Implementation

For easier integration, we recommend using our official SDKs instead of making direct HTTP requests.

For other languages, see our OpenAI SDK Compatibility Guide.
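
If your language already has an OpenAI-style client, a sketch along the following lines may work against Halfred; the base_url value and the degree of compatibility are assumptions to confirm in the OpenAI SDK Compatibility Guide:

import os
from openai import OpenAI  # assumes the openai package is installed

client = OpenAI(
    api_key=os.environ["HALFRED_API_KEY"],
    base_url="https://api.halfred.ai/v1",  # assumed value; verify in the compatibility guide
)

completion = client.chat.completions.create(
    model="standard",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}],
)
print(completion.choices[0].message.content)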

Best Practices

Model Selection

  • Use LITE for simple, fast responses and cost-sensitive applications

  • Use STANDARD for most production applications - it automatically selects the best model

  • Use DEEPTHINK for complex reasoning, long documents, or strategic tasks

  • Use DEV only for development and testing (free but not production-ready)

Message Formatting

  • Always use the messages array for all requests

  • Include system messages to set context and behavior

  • Keep conversation history to maintain context across turns

  • For single prompts, use a messages array with one user message

Error Handling

  • Always check HTTP status codes

  • Parse error responses to understand issues

  • Implement retry logic for transient errors (5xx codes); see the sketch after this list

  • Handle rate limiting gracefully
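
A minimal retry sketch along these lines (assuming the requests package and a HALFRED_API_KEY environment variable; the backoff schedule is illustrative, not prescribed by the API):

import os
import time
import requests

def post_with_retry(payload: dict, max_attempts: int = 4) -> dict:
    # POST to /chat/completions, retrying 429 and 5xx responses with exponential backoff.
    url = "https://api.halfred.ai/v1/chat/completions"
    headers = {"Authorization": f"Bearer {os.environ['HALFRED_API_KEY']}"}
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 429 or response.status_code >= 500:
            time.sleep(2 ** attempt)  # back off, then retry
            continue
        response.raise_for_status()  # surface other 4xx errors immediately
        return response.json()
    raise RuntimeError("Request failed after retries")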

Performance

  • Cache responses when appropriate

  • Use appropriate temperature settings (lower for factual, higher for creative)

  • Monitor token usage to optimize costs

  • Consider using streaming for long responses (contact support for streaming access)

Support

For additional help, contact support.
