Halfred API Reference
Complete REST API reference for Halfred including chat completions, models endpoint, request parameters, and response formats.
Base URL
https://api.halfred.ai/v1
Authentication
All API requests require authentication using an API key. Include your API key in the Authorization header:
Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Getting an API Key
First Time:
Sign up for a Halfred account at halfred.ai using Google, GitHub, or email
Upon first login, your API key is automatically generated and displayed
Copy and store it securely - you will need it for all API requests
Regenerating Your Key: If you need to regenerate your API key:
Go to your Dashboard
Navigate to the "Your Key" section
Click the three dots menu (⋮) and select "Revoke Key & Get New One"
Copy your new API key immediately
⚠️ Important: Revoking a key is immediate. Update all applications using the old key to avoid service interruption.
API Key Security
Never expose keys: Keep your API keys secure and never expose them in client-side code or public repositories
Server-side only: API keys should only be used in server-side applications
Monitor usage: Track your API key usage and request logs in the dashboard
Revoke if compromised: Immediately regenerate your key if you suspect it has been compromised
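Following the guidance above, the key belongs on the server and outside source control. A minimal Python sketch, assuming the key is stored in an environment variable (the variable name HALFRED_API_KEY is our convention, not mandated by the API):

```python
import os

def auth_headers() -> dict:
    """Build request headers, reading the API key from the environment
    so it never appears in source code or a repository."""
    api_key = os.environ["HALFRED_API_KEY"]  # hypothetical variable name
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Loading the key at request time (rather than hard-coding it) also makes key rotation a configuration change instead of a code change.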
Models
Halfred offers different model profiles optimized for various use cases:
Available Profiles
LITE
Lightweight and fast AI for everyday simple tasks
STANDARD
Balanced performance - let Halfred select the best value model
DEEPTHINK
Advanced reasoning with extended context
DEV
Free calls for development and testing
📖 Learn more: Model Profiles Guide - Detailed information on pricing, context sizes, use cases, and how to choose the right profile.
Endpoints
POST /chat/completions
Generate AI completions for chat conversations.
Request
POST https://api.halfred.ai/v1/chat/completions
Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Content-Type: application/json
Request Body
Conversation Messages
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of France?"
}
],
"model": "standard",
"temperature": 0.7,
"response_format": {
"type": "json_object"
},
"max_tokens": 150
}
Request Parameters
messages (array, required): Array of conversation messages
messages[].role (string, required): Message role: "system", "user", or "assistant"
messages[].content (string, required): Message content
model (string, optional): Model profile to use (defaults to project default or "lite")
temperature (number, optional): Sampling temperature between 0 and 2 (default varies by model)
response_format (object, optional): Response format specification
response_format.type (string, optional): Format type (e.g., "json_object")
stream (boolean, optional): If true, sends partial message deltas (not yet supported)
max_completion_tokens (integer, optional): Upper bound for completion tokens (including reasoning tokens)
max_tokens (integer, optional): Maximum number of tokens to generate in the chat completion
Message Roles
system: Sets the behavior and context for the assistant
user: Messages from the user/human
assistant: Previous responses from the AI (for conversation history)
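A request body combining the roles above can be assembled as follows. This is an illustrative Python sketch; the helper name and default values are ours, not part of the API:

```python
import json

def build_chat_body(user_prompt: str,
                    system_prompt: str = "You are a helpful assistant.",
                    model: str = "standard",
                    temperature: float = 0.7) -> str:
    """Assemble a /chat/completions request body: a system message to set
    behavior, then the user message, plus optional parameters."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "model": model,
        "temperature": temperature,
    })
```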
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1677652288,
"provider": "openai",
"model": "gpt-4o",
"profile": "standard",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 8,
"total_tokens": 20
}
}
Response Fields
id (string): Unique identifier for the completion
object (string): Always "chat.completion"
created (number): Unix timestamp of when the completion was created
provider (string): The underlying AI provider used
model (string): The specific model that generated the response
profile (string): The Halfred profile used
choices (array): Array of completion choices (typically one)
choices[].index (number): Choice index (usually 0)
choices[].message (object): The generated message
choices[].message.role (string): Always "assistant"
choices[].message.content (string): The generated response content
choices[].finish_reason (string): Reason the completion finished: "stop", "length", "content_filter", or "tool_calls"
usage (object): Token usage information
usage.prompt_tokens (number): Number of tokens in the prompt
usage.completion_tokens (number): Number of tokens in the completion
usage.total_tokens (number): Total tokens used (prompt + completion)
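Reading a response typically means checking finish_reason and pulling out the content and token usage. A small sketch over the field layout documented above (the helper is ours, for illustration only):

```python
def extract_reply(completion: dict) -> tuple:
    """Return (content, truncated, total_tokens) from a chat.completion
    response; truncated is True when finish_reason is "length"."""
    choice = completion["choices"][0]
    truncated = choice["finish_reason"] == "length"
    return (choice["message"]["content"],
            truncated,
            completion["usage"]["total_tokens"])
```

A truncated reply usually means max_tokens was too low for the request.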
GET /models
Retrieve a list of available models and their details.
Request
GET https://api.halfred.ai/v1/models
Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Response
{
"object": "list",
"data": [
{
"id": "halfred-standard",
"object": "model",
"created": "2024-01-15T10:30:00.000Z",
"owned_by": "halfred"
},
{
"id": "standard",
"object": "model",
"created": "2024-01-15T10:30:00.000Z",
"owned_by": "halfred"
},
{
"id": "halfred-deepthink",
"object": "model",
"created": "2024-01-15T10:30:00.000Z",
"owned_by": "halfred"
},
{
"id": "deepthink",
"object": "model",
"created": "2024-01-15T10:30:00.000Z",
"owned_by": "halfred"
},
(...)
]
}
Response Fields
object (string): Always "list"
data (array): Array of available models
data[].id (string): Model identifier (can be used in chat completions)
data[].object (string): Always "model"
data[].created (string): ISO 8601 timestamp when the model was created
data[].owned_by (string): Always "halfred"
Model ID Formats
Models are available in two formats:
halfred-{profile} (e.g., halfred-standard)
{profile} (e.g., standard)
Both formats are equivalent and can be used interchangeably.
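Since the two formats are interchangeable, normalizing to one of them keeps logging and billing code simple. A one-line helper (our own, not part of any SDK) that strips the optional prefix:

```python
def profile_name(model_id: str) -> str:
    """Map either equivalent model ID format ("halfred-standard" or
    "standard") to the bare profile name."""
    prefix = "halfred-"
    return model_id[len(prefix):] if model_id.startswith(prefix) else model_id
```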
Error Handling
The API uses standard HTTP status codes and returns detailed error information in JSON format.
Error Response Format
{
"error": {
"message": "Invalid API key",
"type": "authentication_error",
"param": null,
"code": "invalid_api_key"
}
}
Common Error Codes
400 invalid_request: Bad request - check your parameters
401 authentication_error: Invalid or missing API key
403 permission_error: API key doesn't have required permissions
404 not_found: Resource not found
429 rate_limit_error: Too many requests or insufficient credits
500 server_error: Internal server error
Specific Error Scenarios
Authentication Errors
Missing Authorization header: Include Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Invalid API key: Check that your API key is correct and not revoked
Expired API key: Generate a new API key if yours has expired
Request Errors
Missing or empty messages: You must provide a non-empty messages array
Invalid model: Use a valid model ID from the /models endpoint
Context limit exceeded: Reduce message length or use a profile with larger context
Streaming not supported: The stream parameter is not yet supported
Usage Errors
Insufficient credits: Add credits to your account
Profile usage limit exceeded: Upgrade your plan or wait for limit reset
Rate limit exceeded: Reduce request frequency
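A sketch of parsing the error envelope above and deciding whether a retry makes sense. Treating 429 and 5xx responses as transient is our convention for illustration, not an API guarantee:

```python
def parse_error(status: int, body: dict) -> tuple:
    """Extract the error message from the JSON error envelope and flag
    whether a retry is sensible: 429 and 5xx are transient, while other
    4xx errors need a fix on the caller's side."""
    err = body.get("error", {})
    retryable = status == 429 or status >= 500
    return err.get("message", "unknown error"), retryable
```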
Code Examples
cURL
# Get available models
curl -X GET "https://api.halfred.ai/v1/models" \
-H "Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Chat completion
curl -X POST "https://api.halfred.ai/v1/chat/completions" \
-H "Authorization: Bearer halfred_xxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "Explain quantum computing in simple terms"
}
],
"model": "standard",
"temperature": 0.7
}'
SDK Implementation
For easier integration, we recommend using our official SDKs instead of making direct HTTP requests:
Node.js / TypeScript SDK - Full implementation examples with TypeScript support
Python SDK - Complete Python implementation with async support
For other languages, see our OpenAI SDK Compatibility Guide.
Best Practices
Model Selection
Use LITE for simple, fast responses and cost-sensitive applications
Use STANDARD for most production applications - it automatically selects the best model
Use DEEPTHINK for complex reasoning, long documents, or strategic tasks
Use DEV only for development and testing (free but not production-ready)
Message Formatting
Always use the messages array for all requests
Include system messages to set context and behavior
Keep conversation history to maintain context across turns
For single prompts, use a messages array with one user message
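For multi-turn conversations, the history grows by one user/assistant pair per exchange. A minimal illustrative helper (names are ours):

```python
def append_turn(history: list, user_msg: str, assistant_msg: str) -> list:
    """Record one completed exchange so the next request carries the
    full conversation context."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_msg})
    return history
```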
Error Handling
Always check HTTP status codes
Parse error responses to understand issues
Implement retry logic for transient errors (5xx codes)
Handle rate limiting gracefully
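The retry advice above can be sketched as exponential backoff with jitter. The send callable and the transient-status thresholds are assumptions for illustration:

```python
import random
import time

def with_retries(send, max_attempts: int = 4, base_delay: float = 0.5):
    """Call send() until it returns a non-transient status, sleeping with
    exponential backoff plus jitter between attempts. send is expected
    to return an (http_status, body) pair."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429 and status < 500:
            return status, body  # success or a non-retryable client error
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    return status, body  # give up after the final attempt
```

Jitter spreads out retries from concurrent clients so they do not all hit the API again at the same instant.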
Performance
Cache responses when appropriate
Use appropriate temperature settings (lower for factual, higher for creative)
Monitor token usage to optimize costs
Consider using streaming for long responses (contact support for streaming access)
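The caching suggestion above can be as simple as memoizing on the request parameters. Illustrative only; a production cache should also consider TTLs and the fact that nonzero temperatures produce varying outputs:

```python
_cache: dict = {}

def cached_completion(send, prompt: str, model: str, temperature: float):
    """Memoize responses for identical requests. Most useful with
    temperature 0, where repeated prompts yield stable answers."""
    key = (model, prompt, temperature)
    if key not in _cache:
        _cache[key] = send(prompt, model, temperature)
    return _cache[key]
```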
Support
For additional help:
Check our documentation
Contact support at [email protected]
Join our community Discord