Guide: Run AI Inference

Use the Cognitera API to run inference against hosted AI models.

Available Models

Currently, the following model is available:

Model       Description
phi3:3.8b   Microsoft Phi-3 Mini — 3.8 billion parameters, great for general tasks

Basic Inference

Send a POST request to the inference endpoint with a model and a prompt:
curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "What is machine learning?"
  }'
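The same request can be issued from any HTTP client. As a minimal sketch, the helper below assembles the URL, headers, and JSON body for this endpoint (the `build_infer_request` name is illustrative, not part of the API):

```python
# Assemble the pieces of an inference request.
# build_infer_request is an illustrative helper, not part of the Cognitera API.

API_URL = "https://intelligence.cognitera.ai/api/models/infer"

def build_infer_request(token, prompt, model="phi3:3.8b", **options):
    """Return (url, headers, payload) for a POST to the inference endpoint."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "prompt": prompt, **options}
    return API_URL, headers, payload

url, headers, payload = build_infer_request("YOUR_JWT_TOKEN", "What is machine learning?")
```

Extra keyword arguments (e.g. `temperature=0.5`) are merged into the payload, so the same helper covers the system-prompt example below.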

With System Prompt

Control the model's behavior with a system prompt:

curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "Explain gradient descent",
    "systemPrompt": "You are a machine learning professor. Explain concepts clearly with examples.",
    "temperature": 0.5,
    "maxTokens": 2048
  }'

Parameters

Temperature

Controls randomness in the output:

  • 0.0–0.3: More focused, deterministic responses
  • 0.5–0.7: Balanced creativity (default: 0.7)
  • 1.0–2.0: More creative, diverse responses

Max Tokens

Limits the length of the generated response (1–8192). Default: 1024.
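The documented ranges and defaults can be enforced client-side before a request is sent. The clamping helper below is a sketch under those documented bounds (the `clamp_params` name is my own; the server presumably validates as well):

```python
# Clamp parameters to the documented ranges before sending a request.
# clamp_params is an illustrative client-side helper, not part of the API.

def clamp_params(temperature=0.7, max_tokens=1024):
    """Clamp temperature to 0.0-2.0 and maxTokens to 1-8192.

    Defaults match the documented defaults (0.7 and 1024).
    """
    temperature = min(max(temperature, 0.0), 2.0)
    max_tokens = min(max(int(max_tokens), 1), 8192)
    return {"temperature": temperature, "maxTokens": max_tokens}
```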

Node.js Example

const response = await fetch('https://intelligence.cognitera.ai/api/models/infer', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`,
  },
  body: JSON.stringify({
    model: 'phi3:3.8b',
    prompt: 'Write a haiku about programming',
    temperature: 0.9,
  }),
});

const data = await response.json();
console.log(data.response);
console.log(`Tokens used: ${data.tokensUsed}`);

Python Example

import requests

response = requests.post(
    'https://intelligence.cognitera.ai/api/models/infer',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'model': 'phi3:3.8b',
        'prompt': 'Explain the transformer architecture',
        'temperature': 0.3,
        'maxTokens': 4096,
    },
)

data = response.json()
print(data['response'])
print(f"Tokens used: {data['tokensUsed']}, Duration: {data['durationMs']}ms")

Cost

Inference is billed per token generated. See Pricing for current rates.
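Since billing is per generated token, you can estimate a request's cost from the `tokensUsed` field in the response. The function below is a sketch; the rate is a parameter you must fill in from the Pricing page, not an actual Cognitera price:

```python
# Estimate cost from tokens generated. The rate is a caller-supplied placeholder;
# consult the Pricing page for actual per-token rates.

def estimate_cost(tokens_used, rate_per_1k_tokens):
    """Return the cost of tokens_used tokens billed at rate_per_1k_tokens per 1000 tokens."""
    return tokens_used / 1000 * rate_per_1k_tokens

# e.g. estimate_cost(data["tokensUsed"], rate_per_1k_tokens=YOUR_RATE)
```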