Updated at 2016-12-21 08:26

Slurm is an open source cluster management and job scheduling system.

Slurm software consists of two parts:

  • slurmd: daemon running on each compute node.
  • slurmctld: daemon running on the management node.
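
Both daemons read a shared slurm.conf. A minimal sketch, assuming hostnames that match the examples below (adev0 as the management node is an assumption; CPUs=8 follows from the scontrol output further down, 40 CPUs over 5 nodes):

```
# slurm.conf (minimal sketch; adev0 is an assumption)
ClusterName=adev
SlurmctldHost=adev0                        # runs slurmctld
NodeName=adev[1-15] CPUs=8 State=UNKNOWN   # each runs slurmd
```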

Slurm logical entities:

  • Node: a compute resource
  • Partition: possibly overlapping group of nodes
  • Job: allocation of resources assigned to a user for a specific amount of time
  • Job Step: set of possibly parallel tasks within a job

Partitions can be considered job queues; each has an assortment of constraints such as:

  • job size limit
  • job time limit
  • users permitted to use it
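
These constraints are set per partition in slurm.conf. A sketch, using the values that scontrol reports below (MaxTime=30 is in minutes, i.e. 00:30:00):

```
# slurm.conf partition entries (sketch; values taken from the scontrol output below)
PartitionName=debug Nodes=adev[1-5]  Default=YES MaxTime=30       MaxNodes=26 AllowGroups=ALL State=UP
PartitionName=batch Nodes=adev[6-15] Default=NO  MaxTime=16:00:00 MaxNodes=26 AllowGroups=ALL State=UP
```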


Slurm commands:

  • sacct: report job or step information about active or completed jobs.
  • salloc: used to allocate resources for a job in real time.
  • sattach: attach STDIN, STDOUT and STDERR to a currently running job or job step.
  • sbatch: used to submit a job script with multiple parallel sruns for later execution.
  • sbcast: used to transfer a file from local disk to the local disks of the nodes allocated to a job.
  • scancel: used to send arbitrary signals to all processes associated with a job or step, usually to cancel a pending or running job.
  • scontrol: administrative tool used to view or modify Slurm state.
  • sinfo: reports state of partitions and nodes managed by Slurm.
  • smap: reports state of jobs, partitions and nodes as graphical display.
  • squeue: reports state of jobs and steps.
  • srun: used to submit a job or step for execution in real time.
  • strigger: set, get, or view event triggers.
  • sview: graphical UI to get and update state information for jobs, partitions and nodes.
# Here we can see that we have 2 partitions, with 5 and 10 nodes.
sinfo
PARTITION AVAIL  TIMELIMIT NODES  STATE NODELIST
debug*       up      30:00     2  down* adev[1-2]
debug*       up      30:00     3   idle adev[3-5]
batch        up      30:00     3  down* adev[6,13,15]
batch        up      30:00     3  alloc adev[7-8,14]
batch        up      30:00     4   idle adev[9-12]

# We have 3 jobs in the queue, 2 of them running and one pending.
squeue
JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
65646     batch  chem  mike  R 24:19     2 adev[7-8]
65647     batch   bio  joan  R  0:09     1 adev14
65648     batch  math  phil PD  0:00     6 (Resources)

# scontrol shows more details
scontrol show partition
PartitionName=debug TotalNodes=5 TotalCPUs=40 RootOnly=NO
   Default=YES OverSubscribe=FORCE:4 PriorityTier=1 State=UP
   MaxTime=00:30:00 Hidden=NO
   MinNodes=1 MaxNodes=26 DisableRootJobs=NO AllowGroups=ALL
   Nodes=adev[1-5] NodeIndices=0-4

PartitionName=batch TotalNodes=10 TotalCPUs=80 RootOnly=NO
   Default=NO OverSubscribe=FORCE:4 PriorityTier=1 State=UP
   MaxTime=16:00:00 Hidden=NO
   MinNodes=1 MaxNodes=26 DisableRootJobs=NO AllowGroups=ALL
   Nodes=adev[6-15] NodeIndices=5-14

# Execute /bin/hostname on three nodes (-N3).
srun -N3 -l /bin/hostname
0: adev3
1: adev4
2: adev5

# Execute /bin/hostname in four tasks (-n4), one CPU per task by default.
srun -n4 -l /bin/hostname
0: adev3
1: adev3
2: adev3
3: adev3

# Queue a script for later execution on specific nodes.
cat my.script
#!/bin/sh
#SBATCH --time=1
srun -l /bin/hostname
srun -l /bin/pwd

sbatch -n4 -w "adev[9-10]" -o my.stdout my.script
sbatch: Submitted batch job 469

cat my.stdout
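
Job scripts usually carry more #SBATCH directives than the minimal example above. A sketch (the job name and output path are assumptions):

```
#!/bin/sh
#SBATCH --job-name=myjob        # name shown by squeue (assumption)
#SBATCH --output=myjob.%j.out   # %j expands to the job id
#SBATCH --ntasks=4              # total tasks, like sbatch -n4
#SBATCH --nodes=2               # like sbatch -N2
#SBATCH --time=00:10:00         # wall-clock limit
srun -l /bin/hostname
```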

# 1) allocate resources
# 2) transfer the executable a.out to /tmp/joe.a.out on each node's local storage
# 3) run it on nodes
# 4) delete it from nodes
# 5) exit
salloc -N1024 bash
sbcast a.out /tmp/joe.a.out
srun /tmp/joe.a.out
srun rm /tmp/joe.a.out
exit

# Submit a batch job, get its status and cancel it.
sbatch test
squeue
scancel 473