Scientific Computing

Tid-bits, FAQs and How-To's designed to help students survive Graduate School, when the world of computers is against them.

Friday, October 18, 2013

HowTo: Launch Multiple Jobs in One Batch Submission Script, Pt 1

Many Long Jobs at Once


Recently, we have been using supercomputer resources that give higher priority to larger jobs (e.g. those with a larger node count).  Workloads that suit this well include quantum chemistry simulations that explore a potential energy surface, for example along a reaction coordinate, or many replicates of a stochastic simulation.  Bundling such runs into one large submission proves to be an interesting yet relatively simple problem to solve.  A sample batch script for the Portable Batch System (PBS) looks like this:


#!/bin/sh
#PBS -N [JobName]
#PBS -M [email@site.domain]
#PBS -m abe
#PBS -q [QueueName]
#PBS -l nodes=<nodes>:ppn=<processors per node>,walltime=<HH>:<MM>:<SS>

# Move to the working directory
WORKING=[working directory]
cd $WORKING

# Calculate processor variables
NODESPERJOB=<nodesperjob>
PROCSPERJOB=<ppn>

# Split the available nodes
uniq "$PBS_NODEFILE" > allNodes.txt
split -l $NODESPERJOB allNodes.txt nodefile
ls -1 nodefile?? > nodefiles.txt

# Define some things
PROGRAM=/uufs/chpc.utah.edu/common/home/u0554548/Scratch/Builds/opt/bin/StandAlone/sus

count=0
for f in $(cat nodefiles.txt)
do
  # Launch each job in the background on its own group of nodes
  mpirun -np $PROCSPERJOB --hostfile $f $PROGRAM <arguments> > outfile$count.log &
  count=$((count + 1))
done

wait

# Cleanup temporary files
rm allNodes.txt
rm nodefile??
rm nodefiles.txt


This script assumes that the number of nodes each job requires evenly divides the total number of nodes requested; otherwise the last job is left with fewer nodes than the rest.  Here, the placeholders denoted by <> (those that are nonstandard) should be set to integer values and those denoted with [] should be replaced with strings where appropriate.  Next week I will write a post on how to run a whole boatload of (perhaps short) jobs on a small number of nodes using a bash queue/bash array approach.  Until then, hope this helps!
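
P.S. If you want to see what the node-splitting step does before burning queue time, here is a minimal sketch you can run on any machine. It is not part of the batch script above: the file fake_nodefile and the node names node01 through node04 are made up to stand in for $PBS_NODEFILE, and the divisibility check is an extra guard I am assuming you would want, not something PBS does for you.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Hypothetical stand-in for $PBS_NODEFILE: 4 nodes with 2 processors each.
# PBS lists a node once per processor, which is why the script calls uniq.
for n in node01 node02 node03 node04
do
  echo "$n"
  echo "$n"
done > fake_nodefile

NODESPERJOB=2

# Collapse the repeated entries to one line per node
uniq fake_nodefile > allNodes.txt
TOTALNODES=$(wc -l < allNodes.txt)

# Stop early if the last group would come up short on nodes
if [ $((TOTALNODES % NODESPERJOB)) -ne 0 ]
then
  echo "ERROR: cannot split the allocation into groups of $NODESPERJOB nodes" >&2
  exit 1
fi

# One host file per job: nodefileaa, nodefileab, ...
split -l "$NODESPERJOB" allNodes.txt nodefile
NJOBS=$((TOTALNODES / NODESPERJOB))
echo "$NJOBS jobs of $NODESPERJOB nodes each"
```

With the values above, nodefileaa holds node01 and node02 and nodefileab holds node03 and node04, so each mpirun call in the loop gets its own pair of nodes. Change NODESPERJOB to 3 and the check trips instead of quietly handing the last job a single node.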
