Getting Job Efficiency

To optimize cluster usage, your reservation should match the memory, the CPU and the expected run time actually needed by your job.

You can check the memory and CPU usage of either running or completed jobs to calculate their efficiency.
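These three limits are the ones requested in the job script header, for example (the values below are purely illustrative and must be adapted to your job):

# Illustrative resource request -- adapt the values to your job's real needs
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=132m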

Running jobs

For running jobs, use squeue to get the name of the compute node where your job is running.

$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
         202232023     batch    numpy ceciuser  R       0:36      1 NODEID

In this case, it is NODEID. Then use the top command through an SSH connection to the compute node to get the CPU usage and the process ID:

$ ssh -t NODEID top -u $USER -b -n 1
Warning: Permanently added NODEID,192.168.1.107 (ECDSA) to the list of known hosts.
top - 17:23:27 up 262 days,  2:01,  0 users,  load average: 29.59, 30.05, 32.05
Tasks: 478 total,  26 running, 238 sleeping,   0 stopped,   0 zombie
%Cpu(s): 57.4 us,  0.6 sy,  0.0 ni, 41.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26379747+total, 21854316+free, 16868692 used, 28385624 buff/cache
KiB Swap:  4194300 total,  3826884 free,   367416 used. 24502859+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  11154 ceciuser  20   0 1504856 1.027g  14964 R 752.9  0.4   6:40.75 python
 11233  ceciuser  20   0  172696   5092   4164 R    5.9   0.0   0:00.02 top
 11130  ceciuser  20   0  113856   3668   2888 S    0.0   0.0   0:00.01 slurm_scr+
 11232  ceciuser  20   0  168620   4596   3280 S    0.0   0.0   0:00.00 sshd

In this example, the CPU usage is 752.9% and the process ID (PID) is 11154. Use the PID to get the memory usage with this command:

$ ssh NODEID cat /proc/11154/status | grep VmRSS
Warning: Permanently added NODEID,192.168.1.107 (ECDSA) to the list of known hosts.
VmRSS:     1076752 kB

The CPU efficiency is 752.9 ÷ 800 × 100 = 94.1% and the memory efficiency is (1076752 ÷ 1024) ÷ 1056 × 100 = 99.6%.

In this example, the reservation was 8 CPUs and 1056 MB of memory (8 × 132 MB):

#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=132m
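With the numbers collected above, the same arithmetic can be done directly in the shell. The following awk one-liner is only a sketch, with the CPU percentage, the VmRSS value, the CPU count and the memory request of this example hard-coded:

$ awk 'BEGIN { printf "CPU eff: %.1f%%  MEM eff: %.1f%%\n", 752.9/(8*100)*100, (1076752/1024)/1056*100 }'
CPU eff: 94.1%  MEM eff: 99.6%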

Completed jobs

For completed jobs, you can use the seff command to get the efficiency, using the job ID as argument.

$ seff 202232023
Job ID: 202232023
Cluster: clusername
User/Group: ceciuser/ceciuser
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 8
CPU Utilized: 00:16:25
CPU Efficiency: 76.95% of 00:21:20 core-walltime
Job Wall-clock time: 00:02:40
Memory Utilized: 1.03 GB
Memory Efficiency: 99.91% of 1.03 GB

For this example job, after completion, we get 76.95% CPU efficiency and 99.91% memory efficiency.

Final remarks

You can also get the efficiency of a running or completed job with the sacct command.

$ sacct -j 202232023 -o "User,JobID%20,ReqMem,ReqCPUS,TotalCPU,Elapsed,MaxRSS,State"
     User                JobID     ReqMem  ReqCPUS   TotalCPU    Elapsed     MaxRSS      State
--------- -------------------- ---------- -------- ---------- ---------- ---------- ----------
 ceciuser            202232023      132Mc        8  16:25.207   00:02:40             COMPLETED
               202232023.batch      132Mc        8  16:25.207   00:02:40   1080420K  COMPLETED
  • MaxRSS/(ReqMem×ReqCPUS) gives the memory efficiency, based on the maximum resident set size. In this example, 1080420 ÷ (132 × 8 × 1024) × 100 = 99.91%.
  • TotalCPU/(Elapsed×ReqCPUS) gives the CPU efficiency. In this example, (16×60+25.207) ÷ ((2×60+40)×8) × 100 ≈ 77%, consistent with the 76.95% reported by seff.
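The same arithmetic can be reproduced with a small awk sketch; the values below are hard-coded from the sacct output of this example (the fractional seconds of TotalCPU are dropped):

$ awk 'BEGIN {
    maxrss  = 1080420          # MaxRSS of the batch step, in kB
    reqmem  = 132 * 8 * 1024   # ReqMem (132 MB per CPU) x ReqCPUS, in kB
    totcpu  = 16*60 + 25       # TotalCPU, in seconds
    elapsed = (2*60 + 40) * 8  # Elapsed x ReqCPUS, in core-seconds
    printf "MEM eff: %.2f%%  CPU eff: %.2f%%\n", maxrss/reqmem*100, totcpu/elapsed*100 }'
MEM eff: 99.91%  CPU eff: 76.95%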

Note

Slurm samples the memory usage periodically to record the “Maximum resident set size” of all tasks in the job. If your code has a short memory-usage peak between two samples, Slurm will not see it and the reported value will be underestimated.

If your memory efficiency is low, set the requested memory a little larger than the MaxRSS. Also, check whether the memory usage can be estimated from a parameter, such as the grid size, the matrix size, or the size of the large block of data read from a file.
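As a purely hypothetical illustration of such an estimate: a dense double-precision N×N matrix occupies 8·N² bytes, so for N = 10000 a single matrix already needs about 762 MiB, before any other allocation of the program:

$ echo $(( 8 * 10000 * 10000 / 1024 / 1024 )) MiB
762 MiB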

Several causes can explain a low CPU efficiency:

  • Some jobs have a pre-compute or post-compute part that uses less CPU. Check whether it is possible to split the calculations into several dependent jobs (see the sketch at the end of this section).

  • Some jobs create multiple threads, but only some of them are in “Running” status while the others are in “Sleep” status. This can be checked periodically while the job is running with the ssh -t NODEID top -H -n 1 -p PID command, where PID is the process ID (see above).

    $ ssh -t NODEID top -H -n 1 -p 11154
    Warning: Permanently added NODEID,192.168.1.107 (ECDSA) to the list of known hosts.
    top - 09:36:43 up 62 days, 22:10,  1 user,  load average: 23.14, 23.09, 23.08
    Threads:   4 total,   3 running,   1 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 64.8 us,  0.8 sy,  0.0 ni, 28.2 id,  6.2 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 26378952+total, 18337344+free, 46629468 used, 33786600 buff/cache
    KiB Swap:  4194300 total,  4111604 free,    82696 used. 21539291+avail Mem
    
       PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    113551 ceciuser  20   0 4826492   3.1g   2364 R 99.9  1.2 653:34.06 python
     11154 ceciuser  20   0 4826492   3.1g   2364 D 99.9  1.2   1106:26 python
    113552 ceciuser  20   0 4826492   3.1g   2364 R 99.9  1.2 653:18.76 python
    113553 ceciuser  20   0 4826492   3.1g   2364 R 99.9  1.2 652:35.98 python
    

    In this example, 3 threads are running and 1 is sleeping, as indicated by the Threads line of the output.

    You can then reduce the number of reserved CPUs to the number of running threads.
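As a sketch of the first point above, CPU-light pre- and post-processing can be submitted as separate jobs chained with the --dependency option of sbatch, each with its own CPU reservation. The script names and core counts below are hypothetical:

$ PRE=$(sbatch --parsable --cpus-per-task=1 pre_process.sh)
$ MAIN=$(sbatch --parsable --cpus-per-task=8 --dependency=afterok:$PRE compute.sh)
$ sbatch --cpus-per-task=1 --dependency=afterok:$MAIN post_process.sh

The --parsable option makes sbatch print only the job ID, so it can be captured and reused in the dependency of the next submission.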