Guide: Submit GPU Jobs
Run compute-intensive workloads on our GPU cluster.
Overview
GPU jobs let you execute arbitrary compute tasks (training, fine-tuning, batch inference, etc.) on our hardware. Jobs are queued and assigned to available GPUs automatically.
Submit a Job
```shell
curl -X POST https://intelligence.cognitera.ai/api/gpu/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "python train.py --epochs 10 --batch-size 32",
    "dockerImage": "nvidia/cuda:12.0-base"
  }'
```
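If you prefer to submit jobs from Python, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: the endpoint, headers, and body mirror the curl example above, and the `GPU_API_TOKEN` environment variable is an assumption for where you keep your JWT.

```python
import json
import os
import urllib.request

API_BASE = "https://intelligence.cognitera.ai/api"


def build_job_payload(command, docker_image):
    """Assemble the JSON body expected by POST /gpu/jobs."""
    return {"command": command, "dockerImage": docker_image}


def submit_job(payload, token):
    """POST the job and return the parsed response dict."""
    req = urllib.request.Request(
        f"{API_BASE}/gpu/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (requires a valid token in GPU_API_TOKEN):
#   payload = build_job_payload("python train.py --epochs 10 --batch-size 32",
#                               "nvidia/cuda:12.0-base")
#   job = submit_job(payload, os.environ["GPU_API_TOKEN"])
```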
Response
```json
{
  "id": "job-uuid",
  "status": "queued",
  "queuePosition": 1
}
```
Monitor Your Job
```shell
curl https://intelligence.cognitera.ai/api/gpu/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```
Job Lifecycle
```
queued → running → completed
                 ↘ failed
```
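The lifecycle above can be captured as a small transition table, which is handy for validating status updates on your side. This is a sketch of the states shown here; it assumes there is no requeue or retry transition, which the guide does not mention.

```python
# Allowed transitions in the job lifecycle (assumption: no requeue/retry).
TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}


def can_transition(src, dst):
    """True if a job may legally move from status `src` to status `dst`."""
    return dst in TRANSITIONS.get(src, set())
```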
Check Cluster Availability
Before submitting, check if GPUs are available:
```shell
curl https://intelligence.cognitera.ai/api/gpu/status \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
```
Tips
- Docker Images: Specify a Docker image that includes the frameworks you need (PyTorch, TensorFlow, etc.).
- Queue Priority: Jobs are processed first-in, first-out (FIFO). If no GPU is available, your job waits in the queue.
- Cost: You're billed per GPU-second while your job is running. See Pricing.
- Timeouts: Long-running jobs should checkpoint their progress so they can resume after an interruption.
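Because billing is per GPU-second of runtime, a back-of-the-envelope cost estimate is just duration times rate. The rate below is an invented placeholder; the Pricing page has the real numbers.

```python
def estimate_cost(gpu_seconds, rate_per_gpu_second):
    """Estimate job cost: billed GPU-seconds times the per-second rate.

    The rate is a placeholder argument; see the Pricing page for real rates.
    """
    return gpu_seconds * rate_per_gpu_second


# e.g. a 2-hour job at a hypothetical $0.0005 per GPU-second:
#   estimate_cost(2 * 3600, 0.0005)  → 3.6 (dollars)
```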
Example: Fine-Tune a Model
```shell
curl -X POST https://intelligence.cognitera.ai/api/gpu/jobs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "command": "python finetune.py --model phi3 --dataset my_data.jsonl --output /output/model",
    "dockerImage": "pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime"
  }'
```
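As the Timeouts tip suggests, a long fine-tuning run should checkpoint so it can resume if interrupted. Here is a minimal, framework-agnostic sketch your training script could adapt; the `/output` checkpoint path is an assumption mirroring the `--output /output/model` flag above, and a real PyTorch script would use `torch.save`/`torch.load` instead of JSON.

```python
import json
import os


def save_checkpoint(epoch, state, path="/output/checkpoint.json"):
    """Persist progress atomically: write to a temp file, then rename,
    so a job killed mid-write can't leave a corrupt checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems


def load_checkpoint(path="/output/checkpoint.json"):
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "state": {}}


# Training loop shape:
#   ckpt = load_checkpoint()
#   for epoch in range(ckpt["epoch"], total_epochs):
#       ...train one epoch...
#       save_checkpoint(epoch + 1, {"lr": current_lr})
```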