Guide: Submit GPU Jobs

Run compute-intensive workloads on our GPU cluster.

Overview

GPU jobs let you execute arbitrary compute tasks (training, fine-tuning, batch inference, etc.) on our hardware. Jobs are queued and assigned to available GPUs automatically.

Submit a Job

curl -X POST https://intelligence.cognitera.ai/api/gpu/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "python train.py --epochs 10 --batch-size 32",
    "dockerImage": "nvidia/cuda:12.0-base"
  }'

Response

{
  "id": "job-uuid",
  "status": "queued",
  "queuePosition": 1
}
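The same request can be made from Python. This is a minimal sketch using only the standard library; the `submit_job` helper and its injectable `opener` parameter are illustrative conveniences, not part of the API:

```python
import json
import urllib.request

API_URL = "https://intelligence.cognitera.ai/api/gpu/jobs"

def submit_job(command, docker_image, token, opener=urllib.request.urlopen):
    """POST a job to the GPU queue and return the parsed JSON response.

    `opener` defaults to a real HTTP call but can be swapped out
    (e.g. with a stub) when testing without network access.
    """
    body = json.dumps({"command": command, "dockerImage": docker_image}).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return json.load(opener(req))
```

The returned dict carries the `id` you need for the monitoring call below.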

Monitor Your Job

curl https://intelligence.cognitera.ai/api/gpu/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
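In practice you will poll this endpoint until the job settles. A hedged sketch: `fetch_status` stands in for whatever function performs the GET above, and the terminal states match the lifecycle described in this guide:

```python
import time

# Terminal states per the job lifecycle documented below.
TERMINAL = {"completed", "failed"}

def poll_job(fetch_status, interval=5.0, max_polls=720):
    """Call fetch_status() repeatedly until the job reaches a terminal state.

    fetch_status should return the parsed JSON from GET /api/gpu/jobs/JOB_ID.
    Raises TimeoutError if the job is still pending after max_polls attempts.
    """
    for _ in range(max_polls):
        job = fetch_status()
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

With the defaults this polls every 5 seconds for up to an hour; tune both numbers to your job length.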

Job Lifecycle

queued → running → completed
                 ↘ failed
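The lifecycle above can be modeled as a small transition table, useful for sanity-checking statuses in client code. This is an illustrative model based on the diagram, not an official API guarantee:

```python
# Allowed transitions, per the lifecycle diagram: a job fails only
# from the running state (an assumption inferred from the diagram).
TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}

def can_transition(src, dst):
    """True if the lifecycle permits moving from status src to dst."""
    return dst in TRANSITIONS.get(src, set())
```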

Check Cluster Availability

Before submitting, check if GPUs are available:

curl https://intelligence.cognitera.ai/api/gpu/status \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
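You can gate submission on the parsed status response. Note the field name `availableGpus` below is an assumption about the response shape; check the actual payload returned by your cluster before relying on it:

```python
def has_free_gpu(status):
    """Decide whether to submit immediately, given the /api/gpu/status response.

    Assumes the response includes an `availableGpus` count -- a guess,
    not a documented field; adjust to the real schema.
    """
    return status.get("availableGpus", 0) > 0
```

If no GPU is free you can still submit; the job simply waits in the FIFO queue.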

Tips

  • Docker Images: Specify a Docker image that includes the frameworks you need (PyTorch, TensorFlow, etc.).
  • Queue Priority: Jobs are processed first-in, first-out (FIFO). If no GPU is available, your job waits in the queue until one frees up.
  • Cost: You're billed for GPU-seconds while your job is running. See Pricing.
  • Timeouts: Long-running jobs should checkpoint periodically so they can resume after an interruption.
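The checkpointing tip can be sketched as a toy loop that records the last finished epoch and resumes from it on restart. The JSON checkpoint file and the `train` helper are illustrative; a real training script would also save model weights and optimizer state:

```python
import json
import os

def train(epochs, ckpt_path="checkpoint.json"):
    """Toy training loop that resumes from the last checkpointed epoch.

    Returns the epoch it resumed from (0 on a fresh run).
    """
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["epoch"] + 1  # resume after the last saved epoch
    for epoch in range(start, epochs):
        # ... one epoch of real work (and saving weights) would go here ...
        with open(ckpt_path, "w") as f:
            json.dump({"epoch": epoch}, f)
    return start
```

If the job is killed mid-run, resubmitting it picks up where the last completed epoch left off instead of starting over.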

Example: Fine-Tune a Model

curl -X POST https://intelligence.cognitera.ai/api/gpu/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "python finetune.py --model phi3 --dataset my_data.jsonl --output /output/model",
    "dockerImage": "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
  }'