User's Guide

In the following we will refer to a tool called nohup. This name is misleading, as we do not use the actual nohup tool, for several reasons, the most significant being that nohup sometimes hangs up anyway. Instead we start a background job in a subshell, i.e.

$ (./job_name parameters & )

This essentially has the desired effect, namely that the program keeps running when the shell is closed. Throughout this document we will refer to this as nohup rather than as starting a background job in a subshell. For more information on nohup alternatives see What to do when nohup hangs up anyway.
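
If you also want the program's output to survive the session, the same pattern combines with ordinary shell redirection; output.log below is just an example file name:

$ (./job_name parameters > output.log 2>&1 &)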

Quick start

This section is intended to get you started quickly with BatchQ; consequently, few or no explanations of the commands/scripts will be given. If you would like a full explanation of how BatchQ works, skip this section. If you choose to read the quick start, read all of it, regardless of whether you prefer Python or bash.

Command line

First, create configurations for the machines you want to access. This is not strictly necessary, but it is convenient (more details are given in the following sections). Open bash and type:

$ q configuration my_server_configuration  --working_directory="Submission" --command="./script"  --input_directory="." --port=22 --server="server.address.com" --global
$ q configuration your_name  --username="your_default_user"  --global

In the above, change server.address.com to the address of the server you wish to access. Also, change the username in the second line to your default username. Next, create a new directory MyFirstSubmission and download the script sleepy:

$ mkdir MyFirstSubmission
$ cd MyFirstSubmission
$ wget https://raw.github.com/troelsfr/BatchQ/master/scripts/sleepy
$ chmod +x sleepy

The job sleepy runs for 100 seconds, echoing “Hello world” once every second. Submit it to server.address.com using the command:

$ q [batch_system] job@my_server_configuration,your_name --command="./sleepy"

Here batch_system should be either nohup, ssh-nohup or lsf. Check the status of the job with

$ q [batch_system] job@my_server_configuration,your_name --command="./sleepy"
Job is running.

After 100 seconds you get:

$ q [batch_system] job@my_server_configuration,your_name --command="./sleepy"
Job has finished.
Retrieving results.

At this point new files should appear in your current directory:

$ ls
sleepy   sleepy.data

To see the logs of the submission, type:

$ q [batch_system] stdout@my_server_configuration,your_name --command="./sleepy"
This is the sleepy stdout.
$ q [batch_system] stderr@my_server_configuration,your_name --command="./sleepy"
This is the sleepy stderr.
$ q [batch_system] log@my_server_configuration,your_name --command="./sleepy"
(...)

The output of the last command will differ depending on which submission system you use. Finally, we clean up on the server:

$ q [batch_system] delete@my_server_configuration,your_name --command="./sleepy"
True

Congratulations! You have submitted your first job using the command line tool.

Python

Next, open an editor and enter the following Python code:

from batchq.queues import  LSFBSub
from batchq.core.batch import DescriptorQ

class ServerDescriptor(DescriptorQ):
  queue = LSFBSub
  username = "default_user"
  server="server.address.com"
  port=22
  prior = "module load open_mpi goto2 python hdf5 cmake mkl\nexport PATH=$PATH:$HOME/opt/alps/bin"
  working_directory = "Submission"

desc1 = ServerDescriptor(username="tronnow",command="./sleepy 1", input_directory=".", output_directory=".")
desc2 = ServerDescriptor(desc1, command="./sleepy 2", input_directory=".", output_directory=".")

print "Handling job 1"
desc1.job()
print "Handling job 2"
desc2.job()

and save it as job_submitter.py in MyFirstSubmission. Note that in the above code we pass the first descriptor when constructing the second, in order to reuse the queue defined for desc1 in desc2. Go back to the shell and type:

$ python job_submitter.py

Rerun the script to get the status of the jobs and to pull the results of finished jobs. Your second submission was done with Python.

Note

If you choose the same input and output directory, you will run into problems when running this script several times, as the hash sum of the input directory changes once the results have been pulled. This means that you may accidentally resubmit a finished job.

This can be overcome either by separating input_directory and output_directory, or by setting the submission id manually:

from batchq.queues import LSFBSub
from batchq.core.batch import DescriptorQ

class ServerDescriptor(DescriptorQ):
  queue = LSFBSub
  username = "default_user"
  server="server.address.com"
  port=22
  prior = "module load open_mpi goto2 python hdf5 cmake mkl\nexport PATH=$PATH:$HOME/opt/alps/bin"
  working_directory = "Submission"
  
desc1 = ServerDescriptor(username="tronnow",command="./sleepy 1", input_directory=".", output_directory=".", overwrite_submission_id="simu1")
desc2 = ServerDescriptor(desc1, command="./sleepy 2", input_directory=".", output_directory=".", overwrite_submission_id="simu2")

print "Handling job 1"
desc1.job()
print "Handling job 2"
desc2.job()
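
Alternatively, a minimal sketch of the first option, separating the two directories (the directory names input and output below are just example names):

from batchq.queues import LSFBSub
from batchq.core.batch import DescriptorQ

class ServerDescriptor(DescriptorQ):
  queue = LSFBSub
  username = "default_user"
  server="server.address.com"
  port=22
  working_directory = "Submission"

# Pulled results land in output/, so the hash of input/ stays stable.
desc = ServerDescriptor(command="./sleepy 1", input_directory="input", output_directory="output")

print "Handling job"
desc.job()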

To shorten the above, you may use your previously defined configurations:

from batchq.queues import  LSFBSub
from batchq.core.batch import DescriptorQ, load_queue

q = load_queue(LSFBSub, "my_server_configuration,your_name")  
desc1 = DescriptorQ(q, command="./sleepy 1", input_directory=".", output_directory=".", overwrite_submission_id="simu1")
desc2 = DescriptorQ(q, command="./sleepy 2", input_directory=".", output_directory=".", overwrite_submission_id="simu2")

print "Handling job 1"
desc1.job()
print "Handling job 2"
desc2.job()

We can now generalise this to arbitrarily many jobs:

from batchq.queues import  LSFBSub
from batchq.core.batch import DescriptorQ, load_queue

q = load_queue(LSFBSub, "my_server_configuration,your_name") 
for i in range(1,10):
    desc = DescriptorQ(q, command="./sleepy %d" %i, input_directory=".", output_directory=".", overwrite_submission_id="simu%d" %i)

    print "Handling job %d" %i
    desc.job()

If we know that the output files do not overwrite one another, it is only necessary to keep one copy of the input folder on the server. This can be done by specifying the subdirectory parameter:

from batchq.queues import  LSFBSub
from batchq.core.batch import DescriptorQ, load_queue

q = load_queue(LSFBSub, "my_server_configuration,your_name") 
for i in range(1,15):
    desc = DescriptorQ(q, command="./sleepy %d" %i, input_directory=".", output_directory=".", overwrite_submission_id="simu%d" %i, subdirectory="mysimulation")
    print "Handling job %d" %i
    desc.job()

Using the command line tool

The following section will treat usage of BatchQ from the command line.

Available modules

The modules available to BatchQ will vary from system to system depending on whether custom modules have been installed. Modules are divided into four categories: functions, queues, pipelines and templates. The general syntax of the Q command is:

$ q [function/queue/template] [arguments]

The following functions are available both through the console interface and from Python. They are standard modules included in BatchQ that provide information about other modules.

Submitting jobs

The BatchQ command line interface provides you with two predefined submission modules: nohup and lsf. The nohup module is available on every machine, while lsf requires the LSF batch system on the server.

To submit a job type:

$ cd /path/to/input/directory
$ q lsf submit -i --username=user --working_directory="Submission" --command="./script" --input_directory="." --port=22 --server="server.address.com"

The above command will attempt to log on to server.address.com as user user through port 22. It then creates a working directory called Submission in the entrance folder (usually your home directory on the server) and transfers all files from your input_directory to this folder. The command is then submitted to LSF and the SSH connection is terminated.

Once you have automated the submission process, you will want to store the configuration parameters in a file in order to shorten the commands needed to operate on your submissions. Using the example from before, this can be done as:

$ q configuration brutus -i --username=user --working_directory="Submission" --command="./script" --input_directory="." --port=22 --server="server.address.com"

The above code creates a configuration named “brutus” which contains the instructions for submitting your job on “server.address.com”. Having created a configuration file, you can now submit jobs and check their status with:

$ q lsf submit@brutus
True
$ q lsf pid@brutus
12452

This keeps things short and simple. You will need to create a configuration file for each server to which you want to submit jobs. If for one reason or another you temporarily want to change a parameter of your configuration, say the working_directory, this can be done by adding a long parameter:

$ q lsf submit@brutus --working_directory="Submission2"
True

You can also pass several configurations to the BatchQ command line tool at once by separating their names with commas.
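
For example, combining the brutus configuration with the your_name configuration created in the quick start:

$ q lsf submit@brutus,your_name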

Checking the status of a job, retrieving results and deleting the working directory of a simulation are now equally simple:

$ q lsf status@brutus
DONE

$ q lsf recv@brutus
True

$ q lsf delete@brutus
True

The retrieve command only fetches files that do not exist locally, or that differ from those in the input directory.

Finally, the Q system implements fully automated job submission, meaning that the system will try to determine the state of your job and take action accordingly. For fast job submission and status checking write:

$ q lsf job@brutus,config
Uploading input directory.
Submitted job on brutus.ethz.ch

$ q lsf job@brutus,config
Job pending on brutus.ethz.ch

$ q lsf job@brutus,config
Job running on brutus.ethz.ch

$ q lsf job@brutus,config
Job finished on brutus.ethz.ch
Retrieving results.

Do you want to remove the directory 'Submission2' on brutus.ethz.ch (Y/N)? Y
Deleted Submission2 on brutus.ethz.ch

You can equally submit the job on your local machine using nohup instead of lsf.

A few words on job hashing

When submitting a job, BatchQ generates a hash for your job. The hash includes the following:

  • An MD5/SHA sum of the input directory
  • The name of the server to which the job is submitted
  • The submitted command (including parameters)
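
As an illustration only (this is not BatchQ's actual implementation), a hash built from these three ingredients could be computed along the following lines:

import hashlib
import os

def job_hash(input_directory, server, command):
    # Illustrative sketch: combine an MD5 sum of the input files
    # with the server name and the submitted command.
    digest = hashlib.md5()
    for root, dirs, files in os.walk(input_directory):
        dirs.sort()                      # walk subdirectories in a fixed order
        for name in sorted(files):       # hash files in a fixed order
            with open(os.path.join(root, name), "rb") as handle:
                digest.update(handle.read())
    digest.update(server)
    digest.update(command)
    return digest.hexdigest()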

It is not recommended, but nevertheless possible, to overwrite the hash key. This can be done by adding --overwrite_submission_id="your_custom_id", which can be useful in some cases. For instance, you might want to work on your source code during development; this would change the MD5 of your input directory, and BatchQ would be incapable of recognising your job submission. BatchQ ships with a configuration for debugging which can be invoked by:

$ q lsf submit@brutus,debug

The debug configuration is only suitable for debugging, as the submission id is fixed to debug.

Another scenario where you may want to change the hashing routine is when you store your output data in your input directory. Submitting several jobs and pulling results will change the hash of the input directory over time. To overcome this issue, add eio (short for equal input/output directory) to your configuration:

$ q lsf submit@brutus,eio

The eio flag will overwrite your output_directory with the value of your input_directory and change the hashing routine to only include the command and server name.

Example: Submitting ALPS jobs using nohup

Example: Submitting ALPS jobs using LSF

Example: Submitting multiple jobs from one submission directory

In some cases one may not want to copy the same directory to the server several times, as this may take up vast amounts of space. If the simulation output depends only on the command line parameters (as is the case for ALPS spinmc), one can use the eio configuration to submit several commands reusing the same submission directory:

$ q lsf job@brutus,eio --working_directory="Submission" --command="spinmc TODO1"
$ q lsf job@brutus,eio --working_directory="Submission" --command="spinmc TODO2"
$ q lsf job@brutus,eio --working_directory="Submission" --command="spinmc TODO3"

Using Python

Submitting jobs

Retrieving results

Q descriptors, Q holders and Q functions

The BatchQ user API is based on three main concepts: Q descriptors, Q holders (queues) and Q functions. Usually, Q functions are members of instances of Q holder classes, while Q descriptors are reference objects used to ensure that you do not open more SSH connections than necessary. Descriptors link a set of input configuration parameters to a given queue. An example could be:

class ServerDescriptor(DescriptorQ):
      queue = LSFBSub
      username = "user"
      server="server.address.com"
      port=22
      options = ""
      prior = "module load open_mpi goto2 python hdf5 cmake mkl\nexport PATH=$PATH:$HOME/opt/alps/bin"
      post = ""
      working_directory = "Submission"

The descriptor ServerDescriptor implements all Q functions and properties defined in the class LSFBSub. However, the descriptor ensures that all queue parameters are set according to those given in the descriptor definition before executing a command on the queue. Therefore, if you have two descriptor instances that share a queue

queue = LSFBSub()
desc1 = DescriptorQ(queue)
desc1.update_configuration(working_directory = "Submission1")
desc2 = DescriptorQ(queue)
desc2.update_configuration(working_directory = "Submission2")

you are ensured that you are working in the correct directory by using the descriptor instead of the queue directly. Notice that in order to update the queue properties through the descriptor one needs to use update_configuration rather than assigning to a descriptor property (i.e. desc2.working_directory in the above example). The reason for this is that any method or property of a descriptor object is a “reference” to the corresponding queue method or property. The only methods that are not redirected are the methods implemented by the descriptor itself:

desc1 = ServerDescriptor()
desc2 = ServerDescriptor(desc1)
desc2.update_configuration(working_directory = "Submission2")

In general, when copying descriptors, make sure to do a shallow copy as you do not want to make a deep copy of the queue object.
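
To make the distinction concrete, here is a minimal sketch using the standard copy module; it assumes the descriptor behaves like an ordinary Python object under copying, and the constructor idiom shown above (passing desc1) achieves the same result:

import copy

desc_shared = copy.copy(desc1)       # shallow copy: desc_shared reuses desc1's queue
# desc_clone = copy.deepcopy(desc1)  # a deep copy would duplicate the queue (and its SSH connection); avoid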

Example: Submitting ALPS jobs using nohup

The BatchQ package comes with a preprogrammed package for ALPS. This enables easy and fast scripting for submitting background jobs on local and remote machines. Our starting point is the Spin MC example from the ALPS documentation:

import pyalps
import matplotlib.pyplot as plt
import pyalps.plot
import sys

print "Starting"

parms = []
for t in [1.5,2,2.5]:
   parms.append(
       { 
         'LATTICE'        : "square lattice", 
         'T'              : t,
         'J'              : 1 ,
         'THERMALIZATION' : 1000,
         'SWEEPS'         : 100000,
         'UPDATE'         : "cluster",
         'MODEL'          : "Ising",
         'L'              : 8
       }
   )

input_file = pyalps.writeInputFiles('parm1',parms)
desc = pyalps.runApplication('spinmc',input_file,Tmin=5,writexml=True)

result_files = pyalps.getResultFiles(prefix='parm1')
print result_files
print pyalps.loadObservableList(result_files)
data = pyalps.loadMeasurements(result_files,['|Magnetization|','Magnetization^2'])
print data
plotdata = pyalps.collectXY(data,'T','|Magnetization|')
plt.figure()
pyalps.plot.plot(plotdata)
plt.xlim(0,3)
plt.ylim(0,1)
plt.title('Ising model')
plt.show()
print pyalps.plot.convertToText(plotdata)
print pyalps.plot.makeGracePlot(plotdata)
print pyalps.plot.makeGnuplotPlot(plotdata)
binder = pyalps.DataSet()
binder.props = pyalps.dict_intersect([d[0].props for d in data])
binder.x = [d[0].props['T'] for d in data]
binder.y = [d[1].y[0]/(d[0].y[0]*d[0].y[0]) for d in data]
print binder
plt.figure()
pyalps.plot.plot(binder)
plt.xlabel('T')
plt.ylabel('Binder cumulant')
plt.show()

After introducing a few small changes, the script now uses BatchQ for submission:

from batchq.contrib.alps import runApplicationBackground, LSFBSub, DescriptorQ
import pyalps
import matplotlib.pyplot as plt
import pyalps.plot
import sys

parms = []
for t in [1.5,2,2.5]:
   parms.append(
       { 
         'LATTICE'        : "square lattice", 
         'T'              : t,
         'J'              : 1 ,
         'THERMALIZATION' : 1000,
         'SWEEPS'         : 100000,
         'UPDATE'         : "cluster",
         'MODEL'          : "Ising",
         'L'              : 8
       }
   )

input_file = pyalps.writeInputFiles('parm1',parms)

class Brutus(DescriptorQ):
  queue = LSFBSub
  username = "tronnow"
  server="brutus.ethz.ch"
  port=22
  options = ""
  prior = "module load open_mpi goto2 python hdf5 cmake mkl\nexport PATH=$PATH:$HOME/opt/alps/bin"
  post = ""
  working_directory = "Submission"

desc = runApplicationBackground('spinmc',input_file,Tmin=5,writexml=True, descriptor = Brutus(), force_resubmit = False )


if not desc.finished():
    print "Your simulations have not yet ended, please run this command again later."
else:
    if desc.failed():
        print "Your submission has failed"
        sys.exit(-1)
    result_files = pyalps.getResultFiles(prefix='parm1')
    print result_files
    print pyalps.loadObservableList(result_files)
    data = pyalps.loadMeasurements(result_files,['|Magnetization|','Magnetization^2'])
    print data
    plotdata = pyalps.collectXY(data,'T','|Magnetization|')
    plt.figure()
    pyalps.plot.plot(plotdata)
    plt.xlim(0,3)
    plt.ylim(0,1)
    plt.title('Ising model')
    plt.show()
    print pyalps.plot.convertToText(plotdata)
    print pyalps.plot.makeGracePlot(plotdata)
    print pyalps.plot.makeGnuplotPlot(plotdata)
    binder = pyalps.DataSet()
    binder.props = pyalps.dict_intersect([d[0].props for d in data])
    binder.x = [d[0].props['T'] for d in data]
    binder.y = [d[1].y[0]/(d[0].y[0]*d[0].y[0]) for d in data]
    print binder
    plt.figure()
    pyalps.plot.plot(binder)
    plt.xlabel('T')
    plt.ylabel('Binder cumulant')
    plt.show()

Executing this code submits the program as a background process on the local machine. It can easily be extended to support SSH and LSF by changing the queue object:

# TODO: Give example

Example: Submitting multiple jobs from one submission directory

Using VisTrails

Submitting jobs

Retrieving results

Example: Submitting ALPS jobs using nohup

Example: Submitting ALPS jobs using LSF