# Guide: Run AI Inference

Use the Cognitera API to run inference against hosted AI models.
## Available Models

Currently, the following model is available:

| Model | Description |
|---|---|
| phi3:3.8b | Microsoft Phi-3 Mini — 3.8 billion parameters, great for general tasks |
## Basic Inference

```bash
curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "What is machine learning?"
  }'
```
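For reference, the Node.js and Python examples below read `response`, `tokensUsed`, and `durationMs` from the JSON body, so a successful response includes at least those fields. The values here are illustrative, and the real response may carry additional fields:

```json
{
  "response": "Machine learning is a subset of artificial intelligence...",
  "tokensUsed": 142,
  "durationMs": 1850
}
```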
## With System Prompt

Control the model's behavior with a system prompt:

```bash
curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "Explain gradient descent",
    "systemPrompt": "You are a machine learning professor. Explain concepts clearly with examples.",
    "temperature": 0.5,
    "maxTokens": 2048
  }'
```
## Parameters

### Temperature

Controls randomness in the output:

- 0.0–0.3: More focused, deterministic responses
- 0.5–0.7: Balanced creativity (default: 0.7)
- 1.0–2.0: More creative, diverse responses

### Max Tokens

Limits the length of the generated response (1–8192). Default: 1024.
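If you validate parameters client-side before sending a request, the documented ranges can be enforced with a small helper. This is a sketch, not part of the API client; how the API itself handles out-of-range values is not specified in this guide:

```python
def clamp_params(temperature=0.7, max_tokens=1024):
    """Clamp inference parameters to the documented ranges:
    temperature 0.0-2.0 (default 0.7), maxTokens 1-8192 (default 1024)."""
    temperature = min(max(temperature, 0.0), 2.0)
    max_tokens = min(max(max_tokens, 1), 8192)
    return {'temperature': temperature, 'maxTokens': max_tokens}
```

For example, `clamp_params(temperature=3.0)` returns `{'temperature': 2.0, 'maxTokens': 1024}`.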
## Node.js Example

```javascript
const response = await fetch('https://intelligence.cognitera.ai/api/models/infer', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`,
  },
  body: JSON.stringify({
    model: 'phi3:3.8b',
    prompt: 'Write a haiku about programming',
    temperature: 0.9,
  }),
});

const data = await response.json();
console.log(data.response);
console.log(`Tokens used: ${data.tokensUsed}`);
```
## Python Example

```python
import requests

response = requests.post(
    'https://intelligence.cognitera.ai/api/models/infer',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'model': 'phi3:3.8b',
        'prompt': 'Explain the transformer architecture',
        'temperature': 0.3,
        'maxTokens': 4096,
    },
)

data = response.json()
print(data['response'])
print(f"Tokens used: {data['tokensUsed']}, Duration: {data['durationMs']}ms")
```
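Neither example above checks for HTTP errors before parsing the body. One way to structure that in Python is a small wrapper; this is a sketch under the assumptions in this guide, and `build_payload` and `run_inference` are hypothetical helper names, not part of the API:

```python
API_URL = 'https://intelligence.cognitera.ai/api/models/infer'

def build_payload(prompt, model='phi3:3.8b', system_prompt=None,
                  temperature=None, max_tokens=None):
    """Assemble the request body, omitting optional fields left unset."""
    payload = {'model': model, 'prompt': prompt}
    if system_prompt is not None:
        payload['systemPrompt'] = system_prompt
    if temperature is not None:
        payload['temperature'] = temperature
    if max_tokens is not None:
        payload['maxTokens'] = max_tokens
    return payload

def run_inference(token, **kwargs):
    """POST the payload and raise on 4xx/5xx instead of parsing an error body."""
    import requests  # imported here so build_payload stays dependency-free
    resp = requests.post(
        API_URL,
        headers={'Authorization': f'Bearer {token}'},
        json=build_payload(**kwargs),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

Omitting unset optional fields keeps the request minimal, so the server-side defaults documented above (temperature 0.7, maxTokens 1024) apply unchanged.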
## Cost

Inference is billed per token generated. See Pricing for current rates.