# Models API

Run inference against hosted AI models. Currently powered by Ollama.
## Available Models

| Model | Parameters | Provider | Description |
|---|---|---|---|
| `phi3:3.8b` | 3.8B | Ollama | Microsoft Phi-3 Mini — compact, efficient model for general tasks |

More models will be added over time.
## List Available Models

```
GET /api/models
```

Public endpoint — no authentication required.
### Response

```json
{
  "models": [
    {
      "name": "phi3:3.8b",
      "provider": "ollama",
      "parameterSize": "3.8B",
      "description": "Microsoft Phi-3 Mini — compact, efficient model for general tasks"
    }
  ]
}
```
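For client code, the response can be mapped into typed records. The following is a minimal sketch (the `ModelInfo` class and `parse_models` helper are illustrative names, not part of the API) that assumes only the response shape shown above:

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    """One entry from the GET /api/models response."""
    name: str
    provider: str
    parameter_size: str
    description: str


def parse_models(payload: dict) -> list[ModelInfo]:
    """Convert the raw JSON payload into typed records.

    An empty or missing "models" key yields an empty list.
    """
    return [
        ModelInfo(
            name=m["name"],
            provider=m["provider"],
            parameter_size=m["parameterSize"],
            description=m["description"],
        )
        for m in payload.get("models", [])
    ]
```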
## Run Inference

```
POST /api/models/infer
```

Requires: `Authorization: Bearer YOUR_JWT_TOKEN`
### Request Body

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | Model identifier (e.g. `phi3:3.8b`) |
| `prompt` | string | Yes | — | The prompt to send |
| `systemPrompt` | string | No | — | System instructions for the model |
| `temperature` | number | No | 0.7 | Sampling temperature (0.0–2.0) |
| `maxTokens` | number | No | 1024 | Maximum tokens to generate (1–8192) |
| `stream` | boolean | No | false | Whether to stream the response |
### Example

```bash
curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "Write a Python function to calculate fibonacci numbers",
    "systemPrompt": "You are a helpful programming assistant.",
    "temperature": 0.3,
    "maxTokens": 2048
  }'
```
### Response

```json
{
  "id": "inference-uuid",
  "model": "phi3:3.8b",
  "response": "Here's a Python function to calculate Fibonacci numbers:\n\n```python\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    a, b = 0, 1\n    for _ in range(2, n + 1):\n        a, b = b, a + b\n    return b\n```\n\nThis iterative approach runs in O(n) time...",
  "tokensUsed": 187,
  "durationMs": 2340
}
```
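The `tokensUsed` and `durationMs` fields together give a rough generation throughput, which can be useful for monitoring. A small sketch (the `tokens_per_second` helper is illustrative, not part of the API):

```python
def tokens_per_second(response: dict) -> float:
    """Rough generation throughput derived from an inference response."""
    return response["tokensUsed"] / (response["durationMs"] / 1000.0)
```

For the sample response above, 187 tokens in 2340 ms works out to roughly 80 tokens per second.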
## Health Check

```
GET /api/models/health
```

Public endpoint — verify model provider connectivity.

### Response

```json
{
  "ollama": true
}
```