Advanced TORQUE Administration

© Cluster Resources, Inc.

Nick Ihli, Sales Engineer

Josh Butikofer, Director of Grid Technologies

Scott Jackson, VP Software Engineering

TORQUE Resource Manager

• General Overview

• Routing Queues

• Job Arrays

• Node Health

• Handling Failures

• Checkpoint/Restart

• High Throughput

• Tuning for Scale

• New Capabilities

• On the Horizon

Node States

• States

– down (down)

– offline (drained)

– job-exclusive (busy)

– free (idle/running)

• Changing node state

– Offline

• pbsnodes -o <nodename>

– Online

• pbsnodes -c <nodename>

• Viewing nodes of a particular state

• pbsnodes -l
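The listing from pbsnodes -l can be summarised with a small filter. This is a minimal sketch that assumes each output line has the form "<nodename> <state>". The function name count_node_states is hypothetical and not part of TORQUE.

```shell
#!/bin/sh
# Hypothetical helper: tally nodes per state from "pbsnodes -l"-style input.
# Assumes every non-empty line looks like "<nodename> <state>".
count_node_states() {
    awk 'NF >= 2 { count[$2]++ }
         END { for (s in count) printf "%s %d\n", s, count[s] }' | sort
}

# Usage on a live cluster (not run here):
#   pbsnodes -l | count_node_states
```

Reading from standard input keeps the filter independent of the cluster, so it can be tried on saved output first.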

Example Job Script

#!/bin/sh

#PBS -N ds14FeedbackDefaults

#PBS -S /bin/sh

#PBS -l nodes=1:ppn=2,walltime=240:00:00

#PBS -M [email protected]

#PBS -m ae

source ~/.bashrc

cat $PBS_NODEFILE

echo $PBS_JOBID

Job Submission Options

Option Description

-d Working directory path to be used for the job

-e Path for standard error

-I Interactive job

-l Resources required by the job

-m Mail options

-N Job name

-o Path for standard output

-q Destination queue

Monitoring Jobs

• qstat

-f detailed job information

> qstat

Job id Name User Time Use S Queue

---------------- ---------------- ---------------- -------- - -----

4807 scatter user01 12:56:34 R batch
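For scripted monitoring, the default qstat listing can be filtered by its state column. This is a sketch that assumes the six-column layout shown above, with two header lines. The function name running_jobs is hypothetical.

```shell
#!/bin/sh
# List the IDs of running jobs (state "R") from default qstat output on stdin.
# Assumes the column layout shown on the slide: the state is the
# second-to-last field and the first two lines are headers.
running_jobs() {
    awk 'NR > 2 && $(NF-1) == "R" { print $1 }'
}

# Usage on a live cluster (not run here):
#   qstat | running_jobs
```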

Job Management

• qdel

-m sends a comment

$ qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
4807             scatter          user01           12:56:34 R batch
...
$ qdel -m "hey! Stop abusing the NFS servers" 4807
$

• qalter

- modify job options

Routing Queues

qmgr -c "create queue route"
qmgr -c "set queue route queue_type=Route"
qmgr -c "set queue route route_destinations=short"
qmgr -c "set queue route route_destinations+=med"
qmgr -c "set queue route route_destinations+=long"
qmgr -c "set queue route started=true"
qmgr -c "set queue route enabled=true"

qmgr -c "set server default_queue=route"
qmgr -c "set server resources_default.ncpus=1"
qmgr -c "set server resources_default.walltime=12:00:00"

qmgr -c "create queue short"
qmgr -c "set queue short queue_type=execution"
qmgr -c "set queue short started=true"
qmgr -c "set queue short enabled=true"
qmgr -c "set queue short resources_max.walltime=1:00:00"
qmgr -c "set queue short priority=10000"

qmgr -c "create queue med"
qmgr -c "set queue med queue_type=execution"
qmgr -c "set queue med started=true"
qmgr -c "set queue med enabled=true"
qmgr -c "set queue med resources_min.walltime=1:00:00"
qmgr -c "set queue med resources_max.walltime=12:00:00"
qmgr -c "set queue med priority=1000"

qmgr -c "create queue long"
qmgr -c "set queue long queue_type=execution"
qmgr -c "set queue long started=true"
qmgr -c "set queue long enabled=true"
qmgr -c "set queue long resources_min.walltime=12:00:00"
qmgr -c "set queue long priority=1"
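The effect of this configuration can be reasoned about offline. The sketch below mimics how the route queue would pick a destination for a requested walltime, trying short, med and long in the order they were added. It assumes the resources_min and resources_max limits are inclusive bounds. The function names are hypothetical, and the real decision is made by pbs_server.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds (awk avoids octal pitfalls with leading zeros).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Report the first destination (short, med, long) whose walltime limits
# admit the job. Assumption: min and max limits are inclusive.
route_for_walltime() {
    secs=$(to_seconds "$1")
    hour=3600
    if [ "$secs" -le "$hour" ]; then
        echo short          # resources_max.walltime=1:00:00
    elif [ "$secs" -le $((12 * hour)) ]; then
        echo med            # between 1:00:00 and 12:00:00
    else
        echo long           # resources_min.walltime=12:00:00
    fi
}
```

Under these assumptions a job that relies on the server default walltime of 12:00:00 lands in med, because med is tried before long.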

Job Arrays

• Creation of multiple jobs with one qsub command

• Reference entire set of jobs as one group

> qsub -t 0-100 job_script
1098.hostname
> qstat
1098-0.hostname ...
1098-1.hostname ...
1098-2.hostname ...
1098-3.hostname ...
1098-4.hostname ...
...
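When post-processing array results it helps to enumerate the expected sub-job IDs. This sketch uses the "<id>-<index>.<host>" form shown in the qstat listing above. Other TORQUE releases may print array IDs differently, and the function name array_job_ids is hypothetical.

```shell
#!/bin/sh
# Print the sub-job IDs of an array in the "<id>-<index>.<host>" form
# taken from the slide. Arguments: base id, host, first index, last index.
array_job_ids() {
    base=$1 host=$2 i=$3 last=$4
    while [ "$i" -le "$last" ]; do
        echo "${base}-${i}.${host}"
        i=$((i + 1))
    done
}
```

Both ends of the range are included, so -t 0-100 corresponds to 101 sub-jobs.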

Node Health

Prologue/Epilogue Scripts

• Perform node health checks, clean up or prepare a system, etc.

• Must be available on all compute nodes

• Located in $PBS_HOME/mom_priv/

• Available arguments – on next slide

Prologue – Available Arguments

• job id

• job execution user name

• job execution group name

• job name

• list of requested resource limits

• job execution queue

• job account

Epilogue – Available Arguments

• job id

• job execution user name

• job execution group name

• job name

• session id

• list of requested resource limits

• list of resources used by job

• job execution queue

• job account
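An epilogue script might consume these arguments as follows. This is a minimal sketch that assumes the arguments arrive as positional parameters in the order listed above. The function name epilogue_summary and the log format are hypothetical.

```shell
#!/bin/sh
# Hypothetical epilogue body. The nine positional arguments are assumed to
# arrive in the order listed on the slide.
epilogue_summary() {
    jobid=$1 user=$2 group=$3 jobname=$4 session=$5
    limits=$6 used=$7 queue=$8 account=$9
    echo "job=$jobid user=$user group=$group name=$jobname queue=$queue used=$used"
}

# In a real epilogue script the call would be:
#   epilogue_summary "$@"
```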

Compute Node Health Check

• Configured via the pbs_mom config file using the parameters:

– $node_check_script

– $node_check_interval

• Example Health Check Script

#!/bin/sh
/bin/mount | grep global
if [ $? != "0" ]
then
  echo "ERROR cannot locate filesystem global"
fi

http://www.clusterresources.com/wiki/doku.php?id=torque:10.2_compute_node_health_check
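The same check can be written so that it is testable without the real filesystem. The variant below reads a mount table on standard input and takes the filesystem name as an argument. The function name check_mounted is hypothetical. In the pbs_mom config file, $node_check_script would point at the installed script and $node_check_interval would set how often it runs. Both values are site-specific.

```shell
#!/bin/sh
# Variant of the slide's check that reads a mount table on stdin.
# Prints an ERROR line when the named filesystem does not appear.
check_mounted() {
    fs=$1
    if ! grep -q "$fs"; then
        echo "ERROR cannot locate filesystem $fs"
    fi
}

# On a compute node (not run here):
#   /bin/mount | check_mounted global
```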

Node Health Script and Moab

• Create triggers based on the failure the health script reports

• Trigger will perform an action

– Offline node, email admin, display message in Moab diagnostic commands

HA – High Availability

• Multiple server host machines

– One server locks the server.lock file

– Other server spins in a loop until lock clears

• pbs_server -ha

Handling Failures

Job Failures

• keep_completed option

– Specifies # of seconds job information should be kept after job has completed

– Set on the queue or server by qmgr

– If set on both levels, queue value is used
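The precedence rule can be stated as a few lines of shell. This is a sketch only. The function name effective_keep_completed and the use of an empty string for "not set" are conventions of this example, and the queue name and values in the comments are illustrative.

```shell
#!/bin/sh
# Resolve which keep_completed value applies to a job, following the rule
# on the slide: a queue-level setting overrides the server-level one.
# An empty first argument stands for "not set on the queue".
effective_keep_completed() {
    queue_value=$1
    server_value=$2
    if [ -n "$queue_value" ]; then
        echo "$queue_value"
    else
        echo "$server_value"
    fi
}

# The values themselves would be set with qmgr, for example:
#   qmgr -c "set server keep_completed = 300"
#   qmgr -c "set queue long keep_completed = 600"
```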

Job Failures – tracejob

• tracejob

– tracejob [-n <DAYS>] <JOBID>

05/28/2008 11:41:31 S enqueuing into route, state 1 hop 1
05/28/2008 11:41:31 S dequeuing from route, state QUEUED
05/28/2008 11:41:31 S enqueuing into long, state 1 hop 1
05/28/2008 11:41:31 S Job Queued at request of torque@mele, owner =
                      torque@mele, job name = STDIN, queue = long
05/28/2008 11:41:31 A queue=route
05/28/2008 11:41:31 A queue=long
05/28/2008 11:42:20 S Job Modified at request of root@mele
05/28/2008 11:42:20 S Job Run at request of root@mele
05/28/2008 11:42:20 S Job Modified at request of root@mele
05/28/2008 11:42:20 A user=torque group=torque jobname=STDIN queue=long
                      ctime=1211996491 qtime=1211996491 etime=1211996491
                      start=1211996540 owner=torque@mele exec_host=pala/1
                      Resource_List.ncpus=1 Resource_List.neednodes=pala
                      Resource_List.nodect=1 Resource_List.nodes=1:ppn=1
                      Resource_List.walltime=122:00:00
05/28/2008 11:44:00 S Exit_status=0 resources_used.cput=00:00:00
                      resources_used.mem=3104kb resources_used.vmem=11496kb
                      resources_used.walltime=00:01:34
05/28/2008 11:44:00 A user=torque group=torque jobname=STDIN queue=long
                      ctime=1211996491 qtime=1211996491 etime=1211996491
                      start=1211996540 owner=torque@mele exec_host=pala/1
                      Resource_List.ncpus=1 Resource_List.nodect=1
                      Resource_List.nodes=1:ppn=1
                      Resource_List.walltime=122:00:00 session=16069
                      end=1211996640 Exit_status=0
                      resources_used.cput=00:00:00 resources_used.mem=3104kb
                      resources_used.vmem=11496kb
                      resources_used.walltime=00:01:34
05/28/2008 11:44:09 S Post job file processing error
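When triaging many failed jobs, the exit status can be pulled out of the trace automatically. The filter below is a sketch that assumes the status appears as "Exit_status=<number>", as in the trace above. The function name trace_exit_status is hypothetical.

```shell
#!/bin/sh
# Extract the job's exit status from tracejob output on stdin.
# Prints the first Exit_status value found, or nothing if there is none.
trace_exit_status() {
    sed -n 's/.*Exit_status=\([-0-9]*\).*/\1/p' | head -n 1
}

# Usage on a live cluster (not run here):
#   tracejob -n 3 <JOBID> | trace_exit_status
```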

Cleaning up

• qdel -p

– Purge jobs that cannot be properly deleted

• qrerun [-f]

– Requeue the specified job so that it runs again; -f forces the rerun

Checkpoint/Restart

Berkeley Lab Checkpoint/Restart (BLCR)

• Kernel level package – no changes needed to application code

• Allows programs running on Linux to be "checkpointed"

— http://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml

TORQUE/BLCR Integration (Beta)

• BLCR must be installed into the kernel

• Provides 3 command line utilities:

– cr_run – runs a subprocess with checkpoint library loaded

– cr_checkpoint – causes a process, all processes within a process group, or all processes within a session, to be checkpointed

– cr_restart - restarts a process from a checkpoint file created with cr_checkpoint

TORQUE pbs_mom configuration

• checkpoint_interval

– How often periodic job checkpoints will be taken (minutes)

• checkpoint_script

– Path to BLCR checkpoint script

• restart_script

– Path to BLCR restart script

• checkpoint_run_exe

– Path to cr_run
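A configuration fragment for these four parameters could be generated as below. The sketch assumes the parameters take the same "$" prefix as $node_check_script in the pbs_mom config file. All paths are site-specific placeholders and the function name mom_blcr_config is hypothetical.

```shell
#!/bin/sh
# Emit the four BLCR-related pbs_mom config lines for the given values.
# Assumption: parameters use the "$name value" form of the pbs_mom config file.
# Arguments: interval (minutes), checkpoint script, restart script, cr_run path.
mom_blcr_config() {
    interval=$1 ckpt_script=$2 restart_script=$3 cr_run=$4
    printf '$checkpoint_interval %s\n' "$interval"
    printf '$checkpoint_script %s\n' "$ckpt_script"
    printf '$restart_script %s\n' "$restart_script"
    printf '$checkpoint_run_exe %s\n' "$cr_run"
}
```

The single quotes keep the leading "$" literal, so the shell does not try to expand the parameter names.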

Starting a Checkpointable Job

• Use -c and other arguments to control checkpointing behavior

– enabled

• Checkpointing allowed but must be explicitly invoked by either qhold or qchkpt

– shutdown

• Checkpointing at pbs_mom shutdown

– periodic

• Enable periodic checkpointing

– interval=minutes

• Checkpoint interval in minutes

– depth=number

• Number of checkpoint images to be kept

– dir=path

• Checkpoint directory (default is /var/spool/torque/checkpoint)
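Several of these options can be combined in one submission. The sketch below only builds and prints the command line. It assumes the options are passed to -c as a comma-separated list. The function name checkpoint_qsub_cmd and the option values in the usage line are hypothetical.

```shell
#!/bin/sh
# Join checkpoint options into a single -c argument and print (not run)
# the resulting qsub command. Assumption: options are comma-separated.
# Arguments: job script, then one or more checkpoint options.
checkpoint_qsub_cmd() {
    script=$1
    shift
    opts=$(IFS=,; echo "$*")   # "$*" joins arguments with the first IFS char
    echo "qsub -c $opts $script"
}

# Example:
#   checkpoint_qsub_cmd job_script enabled periodic interval=10 depth=2
```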

Checkpointing and Restarting

• qsub -c [argument]

• qhold or qchkpt

• qrls

• Checkpoint and restart through Moab preemption policies

High Throughput

Asynchronous Job Start

• qrun -a

– Reply from pbs_server returns immediately

– Reply returns before node assignments

– Reply returns before job is started on pbs_mom

No “neednodes”

• Typical job submission runs through a submission, start and modify

• No “neednodes” removes the modify step

• EnableMsubQuickSubmit

Disable Authentication

• Faster submission as user authentication is turned off

• More practical in data centers and highly trusted environments

• Edit src/include/libpbs.h: #define ENABLE_TRUSTED_AUTH TRUE

• Save, recompile, reinstall pbs_server

Unix Domain Sockets

• ./configure --enable-unixsockets

– Enables the use of Unix domain sockets instead of internet sockets

– Faster communication for messages within the same machine

– Is now the default TORQUE behaviour – as of 2.3.0

Autorun (Experimental)

• TORQUE server finds first available node

• Runs job - bypasses the scheduler

• If failure happens, scheduler takes over the job

Tuning for Scale

• tcp_timeout

– Default=6

• >300 nodes - build TORQUE using TCP rather than the default of RPP

– --disable-rpp

• End user command caching

– Reduce the server load caused by excessive use of client commands by end users
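One way to approximate end user command caching is a wrapper that reuses recent output instead of contacting pbs_server again. This is a minimal sketch, not a TORQUE feature. The function name cached_cmd, the cache file layout and the 30-second lifetime in the usage line are assumptions, and it relies on date +%s being available.

```shell
#!/bin/sh
# Run a command at most once per TTL seconds, replaying cached output
# otherwise. Cache layout (an assumption of this sketch): first line is the
# epoch timestamp of the run, remaining lines are the command's output.
# Arguments: ttl in seconds, cache file, then the command and its arguments.
cached_cmd() {
    ttl=$1
    cache=$2
    shift 2
    now=$(date +%s)
    if [ -f "$cache" ]; then
        stamp=$(head -n 1 "$cache")
        if [ $((now - stamp)) -lt "$ttl" ]; then
            tail -n +2 "$cache"      # still fresh: replay without re-running
            return 0
        fi
    fi
    out=$("$@") || return $?
    { echo "$now"; printf '%s\n' "$out"; } > "$cache"
    printf '%s\n' "$out"
}

# Example wrapper for users (not run here):
#   cached_cmd 30 "$HOME/.qstat.cache" qstat
```

Users who poll qstat in a loop then hit the server at most once per interval, at the cost of seeing output that may be slightly stale.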

Tuning for Scale Cont’d

• job_stat_rate

• poll_jobs

• pbs_tcp_timeout

• Moab specific

– JOBAGGREGATIONTIME

– RM TIMEOUT

• --disable-filesync

• Network ARP cache

New Capabilities

• Cpusets

On the Horizon

• Scheduler synch

• Ensure a stable branch

• Improved Documentation

• Fulltime and Community Developers