Guide: Submit GPU Jobs

Run compute-intensive workloads on our GPU cluster.

Overview

GPU jobs let you execute arbitrary compute tasks (training, fine-tuning, batch inference, etc.) on our hardware. Jobs are queued and assigned to available GPUs automatically.

Submit a Job

curl -X POST https://intelligence.cognitera.ai/api/gpu/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "python train.py --epochs 10 --batch-size 32",
    "dockerImage": "nvidia/cuda:12.0-base"
  }'

Response

{
  "id": "job-uuid",
  "status": "queued",
  "queuePosition": 1
}
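The same request can be made from Python. This is a minimal sketch using only the standard library; the `submit_job` helper and its injectable `opener` parameter are illustrative conveniences, not part of the API:

```python
import json
import urllib.request

API_URL = "https://intelligence.cognitera.ai/api/gpu/jobs"

def submit_job(command, docker_image, token, opener=urllib.request.urlopen):
    """POST a job to the GPU queue and return the parsed JSON response.

    `opener` defaults to a real HTTP call but can be swapped out
    (e.g. with a stub) when testing without network access.
    """
    body = json.dumps({"command": command, "dockerImage": docker_image}).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return json.load(opener(req))
```

The returned dict carries the `id` you need for the monitoring call below.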

Monitor Your Job

curl https://intelligence.cognitera.ai/api/gpu/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
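In practice you will poll this endpoint until the job settles. A hedged sketch: `fetch_status` stands in for whatever function performs the GET above, and the terminal states match the lifecycle described in this guide:

```python
import time

# Terminal states per the job lifecycle documented below.
TERMINAL = {"completed", "failed"}

def poll_job(fetch_status, interval=5.0, max_polls=720):
    """Call fetch_status() repeatedly until the job reaches a terminal state.

    fetch_status should return the parsed JSON from GET /api/gpu/jobs/JOB_ID.
    Raises TimeoutError if the job is still pending after max_polls attempts.
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

With the defaults this polls every 5 seconds for up to an hour; tune both numbers to your job length.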

Job Lifecycle

queued → running → completed
                 ↘ failed
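The lifecycle above can be modeled as a small transition table, useful for sanity-checking statuses in client code. This is an illustrative model based on the diagram, not an official API guarantee:

```python
# Allowed transitions, per the lifecycle diagram: a job fails only
# from the running state (an assumption inferred from the diagram).
TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}

def can_transition(src, dst):
    """True if the lifecycle permits moving from status src to dst."""
    return dst in TRANSITIONS.get(src, set())
```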

Check Cluster Availability

Before submitting, check if GPUs are available:

curl https://intelligence.cognitera.ai/api/gpu/status \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
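You can gate submission on the parsed status response. Note the field name `availableGpus` below is an assumption about the response shape; check the actual payload returned by your cluster before relying on it:

```python
def has_free_gpu(status):
    """Decide whether to submit immediately, given the /api/gpu/status response.

    Assumes the response includes an `availableGpus` count -- a guess,
    not a documented field; adjust to the real schema.
    """
    return status.get("availableGpus", 0) > 0
```

If no GPU is free you can still submit; the job simply waits in the FIFO queue.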

Tips

  • Docker Images: Specify a Docker image that includes the frameworks you need (PyTorch, TensorFlow, etc.).
  • Queue Priority: Jobs are processed first-in, first-out (FIFO). If no GPU is available, your job waits in the queue until one frees up.
  • Cost: You're billed for GPU-seconds while your job is running. See Pricing.
  • Timeouts: Long-running jobs should checkpoint periodically so they can resume after an interruption.
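The checkpointing tip can be sketched as a toy loop that records the last finished epoch and resumes from it on restart. The JSON checkpoint file and the `train` helper are illustrative; a real training script would also save model weights and optimizer state:

```python
import json
import os

def train(epochs, ckpt_path="checkpoint.json"):
    """Toy training loop that resumes from the last checkpointed epoch.

    Returns the epoch it resumed from (0 on a fresh run).
    """
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["epoch"] + 1  # resume after the last saved epoch
    for epoch in range(start, epochs):
        # ... one epoch of real work (and saving weights) would go here ...
        with open(ckpt_path, "w") as f:
            json.dump({"epoch": epoch}, f)
    return start
```

If the job is killed mid-run, resubmitting it picks up where the last completed epoch left off instead of starting over.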

Example: Fine-Tune a Model

curl -X POST https://intelligence.cognitera.ai/api/gpu/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "python finetune.py --model phi3 --dataset my_data.jsonl --output /output/model",
    "dockerImage": "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
  }'