A.6. Cray, Torque PBS, and ALPS

The Cray Linux Environment sometimes uses the Torque scheduler and the ALPS launcher to spawn jobs. In this environment, you typically write a batch script and submit it with the qsub command. The script invokes the ALPS tool aprun to launch your binary across the cluster.

By default, aprun optimizes the launch by pushing the application binary you supply to a RAM disk on each compute node. This staging mechanism improves launch time for normal runs.

When aprun launches the Freja sampler, the primary sampler binary is staged in the same way. However, ALPS does not know about the other Freja binaries, or even about your application, so those binaries are not staged. Consequently, execution fails because they cannot be located.

In the Cray environment, you must therefore modify your batch script to invoke aprun with the -b flag. This disables the staging mechanism altogether, so the normal rules for launching applications apply again.

 

The aprun command accepts the following options:

-b

Bypasses the transfer of the application executable to compute nodes. By default, the executable is transferred to the compute nodes as part of the aprun process of launching an application. You would likely use the -b option only if the executable to be launched was part of a file system accessible from the compute node. For more information, see the EXAMPLES section.

 
 --From: aprun(1) manual page
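
Because -b makes the launch depend on ordinary lookup rather than staging, it can be useful to confirm in the batch script that the binaries resolve before aprun is called. The following is a minimal sketch, assuming a POSIX shell. The helper name require_on_path is hypothetical and is not part of Freja or ALPS. It only checks the node on which the batch script itself runs, so it cannot prove that the same path is visible from the compute nodes.

```shell
# Hypothetical helper: succeed only if every named command resolves
# through PATH in the current shell.
require_on_path() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found: $cmd" >&2
            return 1
        fi
    done
    return 0
}

# Example use in a batch script, after PATH has been extended:
#   require_on_path sample || exit 1
```

Failing early in this way gives a clearer error in the job output than a launch that aborts on the compute nodes.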

The proper way to launch a Freja sampling run in a Cray Linux Environment with Torque and ALPS is to create a batch file:

my-job.pbs:

    #!/bin/bash
    #PBS -N my-job
    #PBS -l mppwidth=64
    #PBS -l mppnppn=32
    #PBS -l walltime=00:10:00
    
    # set PATH to include the Freja bin directory
    PATH=$PATH:installation_directory/bin

    # change to the directory the job was submitted from
    cd "$PBS_O_WORKDIR"

    aprun -b -n 64 -N 32 sample -g 1 -o my-job-samplefiles/process-%r.smp \
          -r ./my-job arg1 arg2

and invoke this script using:

$ qsub my-job.pbs

Sample files appear in the directory $PBS_O_WORKDIR/my-job-samplefiles.
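
After the job finishes, a quick count of the sample files can show whether every rank produced output. This is a sketch only: it assumes that the sampler writes one .smp file per rank and that %r in the -o pattern is replaced by the rank, which the example above suggests but does not state. The helper name count_sample_files is hypothetical.

```shell
# Hypothetical helper: print the number of *.smp files in a directory.
count_sample_files() {
    dir=$1
    count=0
    for f in "$dir"/*.smp; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example use from the submission directory:
#   count_sample_files my-job-samplefiles
```

Under the assumption above, the 64-rank job in my-job.pbs would be expected to yield a count of 64.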

If you launch only one rank, omit -g 1 or change it to -g 0.
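
If the same batch script is reused with different rank counts, the choice of -g value can be derived from the rank count instead of being edited by hand. The sketch below follows the rule stated above and nothing more. The helper name sampler_group_flag and the variable RANKS are hypothetical.

```shell
# Hypothetical helper: print the sampler -g option for a given rank
# count (-g 0 for a single rank, -g 1 for more than one).
sampler_group_flag() {
    if [ "$1" -gt 1 ]; then
        echo "-g 1"
    else
        echo "-g 0"
    fi
}

# Example use in the batch script (the command substitution is left
# unquoted on purpose so that "-g" and its value become two words):
#   RANKS=64
#   aprun -b -n "$RANKS" -N 32 sample $(sampler_group_flag "$RANKS") \
#         -o my-job-samplefiles/process-%r.smp -r ./my-job arg1 arg2
```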