Will work for science.: parallelization

Here is a little sneak-preview of the upcoming release of SHIELD-HIT12A.

Pimp my Niva... (artist's impression of SHIELD-HIT12A).

SHIELD-HIT12A is a Monte Carlo particle transport code capable of transporting heavy ions through arbitrary media. The -A fork was made in 2010 from SHIELD-HIT08, and since then we have added plenty of new features, solved many bugs, increased calculation speed and optimized the nuclear models to new data on carbon-12 fragmentation.

A free demo version where the random seed and statistics are fixed to 10.000 particles can be downloaded from the project development page. There are builds for Linux and Windows systems (32- and 64 bit). It is a beta-release, but we would like to bring this demo version to a broader community, and thereby hopefully also fix some more bugs before we release the full version.

The new features in SHIELD-HIT12A are:

New simplified material and beam parameter parser in free format and extensible without breaking downward compability.
Including 279 ICRU default materials and elements, it has never been so easy to specify a material.
New beam model: divergence and focus distance can now be specified (thanks to Uli Weber from Marburg).
Arbitrary starting beam directions now possible.
New routine for Vavilov straggling, 5-6 times faster than the original one by Rotondi and Montagna which was used in Geant3.21. In total, this alone means a speed improvement of roughly 30-40%.
Ripple filter has two modes of operation, Monte Carlo type or Modulus type.
Logarithmic energy binning in SPC files for TRiP
Full howto for generating DDD files for TRiP
Now only three input files are needed to setup a run.
Improved documentation.
Scoring by zones using detect.dat (complementary to Cartesian mesh and cylindrical scoring)
Alanine response model included, so SHIELD-HIT12A can directly calculate the dose equivalent response in alanine.
Flat circular and square beams can be defined
Neutron data for natural Argon was added (needed for detailed simulations of air)
Another ton of bug fixes.

Of course SHIELD-HIT12A includes the features from SHIELD-HIT10A (which was never released):

Totally new (parallelizable) scoring system:

Arbitrary Mesh and Cylindrical scoring
Lots of detectors such as energy, fluence, dose-averaged LET, track-averaged LET, average velocity (beta), dose to medium (where medium can be changed if you want to calculate stopping power ratios) etc....

finally SHIELD-HIT10A is parallelizable

New random number generator, which gives a massive performance boost
New adjusted inelastic cross sections for carbon ions based on recent data
Fine tuning of the fermi-breakup parameters
SHIELD-HIT10A can be configured without accessing the source code anymore, so no programming knowledge required to use SHIELD-HIT10A.
SHIELD-HIT10A is installable
Runs on linux again, even when compiling with code optimizations, ok with GNU gfortran, Intel and Portland compilers.
Many bug fixes

Enjoy! And please drop me a line when you find bugs in the software and errors in the manual, they are there, but we hid them well. :o)

----

(Single PC version)

Imagine you got a desktop or laptop PC with 4 or perhaps even 8 CPU cores available, and you want to run the Monte Carlo particle transport program FLUKA on it using all CPU cores.
The FLUKA execution script rfluka however was designed to run in "serial" mode. That is, if you request to repeat your simulation a lot of times (say, 100) issuing the command rfluka -N0 -M100 example, each process is launched serially, instead of utilizing all available cores on your PC.

A solution can be to use a job queuing system and a scheduler. Here, I'll present one way to do it on a Debian based Linux system. Ubuntu might work just as well, since Ubuntu is very similar to Debian. A feature of the method presented here, is that it can easily be extended to cover several PCs on your network, so you can use the computing power of your colleagues when they do not use their PCs (e.g. at night). However, this post will try to make it very simple, namely set it just on your own PC. In less than 10 minutes you'll have it up and running...

The idea is to use TORQUE in a very minimal configuration. There will be no fuzz with Maui or similar schedulers, we will only use packages we can get from the Debian/Ubuntu software repositories.
In order to be friendly to all the Ubuntu users out there, all commands issued as root are here prefixed with the "sudo" command. As a Debian user you can become root using the "su" command first.

First install these packages:

$ sudo apt-get install torque-server torque-scheduler 
$ sudo apt-get install torque-common torque-mom libtorque2

and either

$ sudo apt-get install torque-client

$ sudo apt-get install torque-client-x11

after installation we need to setup torque properly. I here assume that your PC hostname cannot be resolved by DNS, which is quite common on small local networks. You can test whether your hostname can be resolved by the "host" command. Assuming your PC has the name "kepler", you may get an answer like:

$ host $HOSTNAME
Host kepler not found: 3(NXDOMAIN)

this means you may need to edit the /etc/hosts file, so your PC can associate an IP number with your hostname. Debian like distros may have a propensity to assign the hostname to 127.0.1.1 which will not work with torque. Instead I looked up my IP number (which in my case is pretty static) using /sbin/ifconfig, and edited the /etc/hosts accordingly, using your favourite text editor (emacs, gedit, vi...)
My /etc/hosts file ended up looking like this:

127.0.0.1 localhost
#127.0.1.1 kepler.lan kepler
192.168.1.108   kepler

If your hostname of your PC can be resolved, you can ommit the last line, but under all circumstances you must comment out the line starting with 127.0.1.1.

Once this is done, execute the following commands to configure torque:

$ sudo echo $HOSTNAME > /etc/torque/server_name
$ sudo echo $HOSTNAME > /var/spool/torque/server_name
$ sudo pbs_server -t create
$ sudo echo $HOSTNAME np=`grep proc /proc/cpuinfo | wc -l` > /var/spool/torque/server_priv/nodes 
$ sudo qterm
$ sudo pbs_server
$ sudo pbs_mom

(Update: If qterm fails, you probably have a problem with your /etc/hosts file. You can still kill the server with $killall -r "pbs_*".)

Now let's see if things are running as expected:

$ pbsnodes -a
kepler
     state = free
     np = 4
     ntype = cluster
     status = rectime=1326926041,varattr=,jobs=,state=free,netload=3304768553,gres=,loadave=0.09,ncpus=4,physmem=3988892kb,availmem=6643852kb,totmem=7876584kb,idletime=2518,nusers=2,nsessions=8,sessions=1183 1760 2170 2271 2513 15794 16067 16607,uname=Linux kepler 3.1.0-1-amd64 #1 SMP Tue Jan 10 05:01:58 UTC 2012 x86_64,opsys=linux

and also

$sudo momctl -d 0 -h $HOSTNAME

Host: kepler/kepler   Version: 2.4.16   PID: 16835
Server[0]: kepler (192.168.1.108:15001)
  Last Msg From Server:   279 seconds (CLUSTER_ADDRS)
  Last Msg To Server:     9 seconds
HomeDirectory:          /var/spool/torque/mom_priv
MOM active:             280 seconds
LogLevel:               0 (use SIGUSR1/SIGUSR2 to adjust)
NOTE:  no local jobs detected

Now setup a queue, which here is called "batch".

$ sudo qmgr -c 'create queue batch'
$ sudo qmgr -c 'set queue batch queue_type = Execution'
$ sudo qmgr -c 'set queue batch resources_default.nodes = 1'
$ sudo qmgr -c 'set queue batch resources_default.walltime = 01:00:00'
$ sudo qmgr -c 'set queue batch enabled = True'
$ sudo qmgr -c 'set queue batch started = True'
$ sudo qmgr -c 'set server default_queue = batch'
$ sudo qmgr -c 'set server scheduling = True'

[update: you may want to increase walltime to 10:00:00 so jobs dont stop after 1 hour]

and start the scheduler:

$ sudo pbs_sched

The rest of the commands can be issued as a normal user (i.e. non-root).

Let's see if all servers are running:

$ ps -e | grep pbs
 1286 ?        00:00:00 pbs_mom
 1293 ?        00:00:00 pbs_server
 2174 ?        00:00:00 pbs_sched

Anything in the queue?

$ qstat
$

Nope, it's empty.

Lets try to submit a simple job

echo "sleep 20" | qsub

and within the next 20 seconds you can test, if its in the queue:

$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.kepler                 STDIN            bassler                0 R batch

Great, now were ready to rock 'n roll! This is really a minimalistic setup, which just works. For more bells and whistles, check the torque manual.

All we need, is a simple FLUKA job submission script: rtfluka.sh

#!/bin/bash
#
# how to use this
# change to directory with the files you want to run
# and enter:
# $ qsub -V -t 0-9 -d . rtfluka.sh
#
#PBS -N FLUKA_JOB
#
start="$PBS_ARRAYID"
let stop="$start+1"
stop_pad=`printf "%03i\n" $stop`
#
# Init new random number sequence for each calculation. 
# This may be a poor solution.
cp $FLUPRO/random.dat ranexample$stop_pad
sed -i '/RANDOMIZE        1.0/c\RANDOMIZE        1.0 '"${RANDOM}"'.0 \' example.inp
$FLUPRO/flutil/rfluka -N$start -M$stop example -e flukadpm3

Update: Note that your RANDOMIZE card in your own .inp file must match the sed regular expression above, else you may repeat the exact same simulation over and over again...

Let's submit 10 jobs:

$ qsub -V -t 0-9 -d . rtfluka.sh

And watch the blinkenlichts.

$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
15-0.kepler               FLUKA_JOB-0      bassler                0 R batch          
15-1.kepler               FLUKA_JOB-1      bassler                0 R batch          
15-2.kepler               FLUKA_JOB-2      bassler                0 R batch          
15-3.kepler               FLUKA_JOB-3      bassler                0 R batch          
15-4.kepler               FLUKA_JOB-4      bassler                0 Q batch          
15-5.kepler               FLUKA_JOB-5      bassler                0 Q batch          
15-6.kepler               FLUKA_JOB-6      bassler                0 Q batch          
15-7.kepler               FLUKA_JOB-7      bassler                0 Q batch          
15-8.kepler               FLUKA_JOB-8      bassler                0 Q batch          
15-9.kepler               FLUKA_JOB-9      bassler                0 Q batch

Surely, this can be improved a lot, suggestions are most welcome in the comments below. One problem is for instance, that the random number seed is limited to a 16 bit integer, which only covers a very small fraction of the possible seeds for the RANDOMIZE card.
Update: There is also a very small risk that the same seed occasionally is used twice (or more often). Alternatively one could just add a random number to a starting seed after each run. (Any MC random number experts out there?)

Output data can be processed in regular ways, using flair.
Alternatively you may use some of the scripts in the auflukatools package, which for instance can do the merging of USRBIN output with a single command. Auflukatools also includes rtfluka.sh as well as a CONDOR job submission script rcfluka.py, which is better suited for heterogenous clusters.

Finally, here is a job script for SHIELD_HITxxA, (which is even shorter):

#!/bin/bash
#
# how to use
# change to directory you want to run
# $ qsub -V -t 0-9 -d . rtshield.sh
#
#PBS -N SHIELD_JOB
shield_exe  -N$PBS_ARRAYID

Enjoy!

Totally unrelated: englishrussia.com just posted some nice pics from the Budker institute for Nuclear Physics in Novosibirsk, Russia. Certainly worth visiting, have a look at:
http://englishrussia.com/2012/01/21/the-budker-institute-of-nuclear-physics/
:-) Heaps of pioneering accelerator technology was developed there, such as electron cooling, the first collider, lithium lenses (e.g. for capturing antiprotons), and they supplied the conventional magnets for the beam transfer lines to the LHC at CERN. I visited the center many years ago but my pics are not as good. :-/ The German wiki about Budker himself, is also worth reading.

Will work for science.

Roy and Niels

Wednesday, November 21, 2012

SHIELD-HIT12A demo version released

Sunday, January 22, 2012

The Quick and Dirty Guide for Parallelizing FLUKA