# Guide: Run AI Inference

Use the Cognitera API to run inference against hosted AI models.
## Available Models

Currently, the following model is available:

| Model | Description |
|---|---|
| phi3:3.8b | Microsoft Phi-3 Mini — 3.8 billion parameters, great for general tasks |
## Basic Inference

```bash
curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "What is machine learning?"
  }'
```
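For reference, the Node.js and Python examples below read `response`, `tokensUsed`, and `durationMs` from the JSON body, so a successful response includes at least those fields. The values here are illustrative, and the real response may carry additional fields:

```json
{
  "response": "Machine learning is a subset of artificial intelligence...",
  "tokensUsed": 142,
  "durationMs": 1850
}
```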
## With System Prompt

Control the model's behavior with a system prompt:

```bash
curl -X POST https://intelligence.cognitera.ai/api/models/infer \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3:3.8b",
    "prompt": "Explain gradient descent",
    "systemPrompt": "You are a machine learning professor. Explain concepts clearly with examples.",
    "temperature": 0.5,
    "maxTokens": 2048
  }'
```
## Parameters

### Temperature

Controls randomness in the output:

- 0.0–0.3: More focused, deterministic responses
- 0.5–0.7: Balanced creativity (default: 0.7)
- 1.0–2.0: More creative, diverse responses

### Max Tokens

Limits the length of the generated response (1–8192). Default: 1024.
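If you validate parameters client-side before sending a request, the documented ranges can be enforced with a small helper. This is a sketch, not part of the API client; how the API itself handles out-of-range values is not specified in this guide:

```python
def clamp_params(temperature=0.7, max_tokens=1024):
    """Clamp inference parameters to the documented ranges:
    temperature 0.0-2.0 (default 0.7), maxTokens 1-8192 (default 1024)."""
    temperature = min(max(temperature, 0.0), 2.0)
    max_tokens = min(max(max_tokens, 1), 8192)
    return {'temperature': temperature, 'maxTokens': max_tokens}
```

For example, `clamp_params(temperature=3.0)` returns `{'temperature': 2.0, 'maxTokens': 1024}`.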
## Node.js Example

```javascript
const response = await fetch('https://intelligence.cognitera.ai/api/models/infer', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`,
  },
  body: JSON.stringify({
    model: 'phi3:3.8b',
    prompt: 'Write a haiku about programming',
    temperature: 0.9,
  }),
});

const data = await response.json();
console.log(data.response);
console.log(`Tokens used: ${data.tokensUsed}`);
```
## Python Example

```python
import requests

response = requests.post(
    'https://intelligence.cognitera.ai/api/models/infer',
    headers={'Authorization': f'Bearer {token}'},
    json={
        'model': 'phi3:3.8b',
        'prompt': 'Explain the transformer architecture',
        'temperature': 0.3,
        'maxTokens': 4096,
    },
)

data = response.json()
print(data['response'])
print(f"Tokens used: {data['tokensUsed']}, Duration: {data['durationMs']}ms")
```
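Neither example above checks for HTTP errors before parsing the body. One way to structure that in Python is a small wrapper; this is a sketch under the assumptions in this guide, and `build_payload` and `run_inference` are hypothetical helper names, not part of the API:

```python
API_URL = 'https://intelligence.cognitera.ai/api/models/infer'

def build_payload(prompt, model='phi3:3.8b', system_prompt=None,
                  temperature=None, max_tokens=None):
    """Assemble the request body, omitting optional fields left unset."""
    payload = {'model': model, 'prompt': prompt}
    if system_prompt is not None:
        payload['systemPrompt'] = system_prompt
    if temperature is not None:
        payload['temperature'] = temperature
    if max_tokens is not None:
        payload['maxTokens'] = max_tokens
    return payload

def run_inference(token, **kwargs):
    """POST the payload and raise on 4xx/5xx instead of parsing an error body."""
    import requests  # imported here so build_payload stays dependency-free
    resp = requests.post(
        API_URL,
        headers={'Authorization': f'Bearer {token}'},
        json=build_payload(**kwargs),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

Omitting unset optional fields keeps the request minimal, so the server-side defaults documented above (temperature 0.7, maxTokens 1024) apply unchanged.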
## Cost

Inference is billed per token generated. See Pricing for current rates.