[Wien] Wien on Cray XT4

Wed Oct 10 14:20:24 CEST 2007

Thank you very much for your report, which may also be very useful for
other users.

It is clear that there are always many solutions to one and the same problem.

Your solution is quite tricky; maybe it would have been easier to change just:

x_lapw:    one line
    set t=time      to your "yod command"

*para_lapw:
        redefine $exe   to your "yod commands"
        (and sumpara in lapw2para)

I think all other scripts call x_lapw (and not an executable directly).

Otherwise I fully agree with your statements at the end: using an expensive
machine like this Cray makes sense only for bigger, mpi-parallel cases,
which may benefit from fast communication; smaller k-point-parallel
cases should be run on a cheap PC cluster.

Thanks and regards

Matyáš Novák wrote:
> Dear Wien users,
> 
> I have installed WIEN on a Cray XT4 cluster
> (http://www.csc.fi/english/pages/louhi_guide/hardware/index_html,
> http://www.cray.com/products/xt4/), so I am sharing my experience.
> 
> Best Regards,
> Matyas Novak
> 
> ---------------------------------------------------------------------
> 
> There are two sorts of nodes in the Cray XT4: a few login (or service)
> nodes and many compute nodes (called catamount nodes after their
> operating system). Each node contains one dual-core Opteron and some
> memory (2 GB in my case).
> 
> The main problem is that the catamount microkernel on the compute nodes
> does not allow running scripts or launching subprocesses. The only way
> (that I found) to run a process on a compute node is the yod command,
> issued from a service node.
> 
> So the question is: the *_lapw scripts must run on a service node, so
> how can the executables be run on the compute nodes? (Some kind of "ssh
> replacement script" is not possible, because many executables are
> called directly from the *_lapw scripts.)
> 
> I think one of the better ways to set up Wien on the Cray XT4 (without
> rewriting all *_lapw scripts) is to replace every executable in Wien by
> a script that calls yod, which in turn runs the executable itself on
> the compute nodes. The steps are:
> 
> 1) Compile Wien
> My compiler switches for the PGI Fortran compiler 7.0 are
> -fastsse -Mfree
> with the exception of lapw1 and lapw2, which need
> -O1 -Mfree
> because -fastsse or -O2 causes wrong results in lapw1 and a
> segmentation fault in lapw2.
> 
> 2) Split Wien into two directories: the first holds the scripts and
> configuration files (<script directory>), the second all the compiled
> executables (<executables directory>).
> The splitting can be done e.g. with Midnight Commander (the largest
> script is x_lapw; all larger files are executables) or with a simple
> shell script.
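
The "simple shell script" for step 2 is not given in the post. A minimal
sketch of one possibility follows, assuming the 100 kB size boundary that
step 3 relies on. The function name split_wien and its arguments are
hypothetical and not part of WIEN.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN):
#   split_wien SRC_DIR EXE_DIR [THRESHOLD_BYTES]
# Moves every regular file of at least THRESHOLD_BYTES (default 100000,
# the boundary assumed in step 3) from SRC_DIR into EXE_DIR.
split_wien() {
    src=$1; exe=$2; limit=${3:-100000}
    mkdir -p "$exe" || return 1
    for f in "$src"/*; do
        # skip symlinks, directories and anything else that is not a plain file
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        # arithmetic expansion strips padding some wc versions print
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -ge "$limit" ]; then
            mv "$f" "$exe"/
        fi
    done
}
# Example call (placeholder paths): split_wien /path/to/wien /path/to/wien_exe
```

It is worth listing both directories afterwards, since the size boundary
is only a heuristic.
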
> 
> 3) In the <script directory>, replace each (removed) executable by a
> symbolic link to the following script:
> 
> yod -sz 1 <executables directory>/`basename $0` "$@"
> 
> (We will call this script YodScript.)
> The links can easily be created by the following script (it relies on
> the fact that the largest script is smaller than 100 kB and the
> smallest compiled executable is larger than 100 kB):
> 
> 
> ls -l <executables directory> | awk '
> { if(NF >= 9) next;             # skip symlinks
>   if(int($5) < 100000) next;    # skip scripts
>   system("ln -s <pathAndNameOfYodScript> <script directory>/" $8) }'
> 
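The awk one-liner depends on the column layout of "ls -l" (file name in
field 8), which varies with the locale. A sketch of a variant that does
not parse ls output is given below. It assumes the executables directory
already contains only the compiled binaries (step 2); the function name
link_wrappers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN):
#   link_wrappers EXE_DIR SCRIPT_DIR YODSCRIPT
# For every regular file in EXE_DIR, create a symlink of the same name in
# SCRIPT_DIR that points to YODSCRIPT.
link_wrappers() {
    exe=$1; scripts=$2; wrapper=$3
    for f in "$exe"/*; do
        # only real files; symlinks and subdirectories are skipped
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        name=$(basename "$f")
        # plain "ln -s" fails if the name already exists, so nothing is
        # overwritten silently
        ln -s "$wrapper" "$scripts/$name"
    done
}
# Example call (placeholder paths):
#   link_wrappers /path/to/wien_exe /path/to/wien /path/to/YodScript
```
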
> 4) Set shared memory mode in Wien (so the YodScript won't be called by ssh)
> 
> 5) Generate a proper .machines file:
> Because yod automatically launches the processes on the proper hosts,
> the .machines file in fact only needs to tell Wien the number of
> assigned processors. So the .machines file can contain "dummy" machine
> names; the only important thing is the right number of machines.
> The .machines file can be generated by this script (the argument is the
> number of assigned nodes).
> 
> #!/bin/sh
> hnm=`hostname`
> for i in `seq 1 $1` ; do
> echo "1:$hnm-$i";
> done
> echo "extrafine: 1"
> 
> 6) There can be memory problems on the catamount nodes: the catamount
> microkernel needs to know how to divide memory between heap and stack
> (the default behavior does not seem to be optimal for Wien). The yod
> switches '-heap' and '-stack' (see man yod) help; add them to
> YodScript. I have had good experience with -heap 18000000
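
With the heap switch from step 6 added, a complete YodScript might be
generated as sketched below. The yod switches -sz and -heap and the value
18000000 are taken from the post; their exact syntax should be checked
against "man yod". The generator function write_yodscript is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN):
#   write_yodscript OUT_FILE EXE_DIR HEAP_BYTES
# Writes a wrapper that starts the binary with the same name as the
# symlink it was invoked through, via yod, on a compute node.
write_yodscript() {
    out=$1; exe=$2; heap=$3
    # unquoted heredoc: $exe and $heap expand now, \$ stays literal
    cat > "$out" <<EOF
#!/bin/sh
# Generated wrapper: \$0 is the symlink name, e.g. lapw1
exec yod -sz 1 -heap $heap "$exe/\$(basename "\$0")" "\$@"
EOF
    chmod +x "$out"
}
# Example call (placeholder paths):
#   write_yodscript /path/to/YodScript /path/to/wien_exe 18000000
```

Quoting "$@" in the generated wrapper keeps arguments intact even if they
contain spaces.
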
> 
> A disadvantage of the Cray architecture is that there is no way to run
> two independent executables (without mpi) on one node, so a whole node
> (two cores) must be used for one process and one core stays
> "unemployed". (There is a workaround using the yod -F parameter, but it
> would require a complete rewrite of the *_para scripts.) However, the
> Cray is built for mpi jobs (or, more generally, parallel jobs that
> require a lot of communication between nodes), and running other jobs
> on it is a bit of a waste of pricey hardware. So if you want to use
> both cores, you must use mpi.
> 
> (There is a fast Lustre filesystem on the Cray, but I found no
> difference from NFS as far as running Wien is concerned.)
> 
> MPI
> -----
> 
> On the Cray where I installed WIEN, no specific compiler option was
> needed to use mpi. But one must ensure the proper startup of the mpi
> processes, so the previous steps must be modified as follows:
> 
> 1) Enable mpi in ./siteconfig (parallel execution) and compile the mpi
> version of WIEN.
> (For me it was sufficient to add only the -DParallel switch.)
> 
> 2) Generate a proper .machines file:
> Same as before, but the structure is a bit more complicated and one
> should also generate a line for lapw0 (this is not strictly necessary,
> because lapw0 does not scale very well). The script can look like the
> following (the arguments are the number of chunks and the number of
> processors per chunk).
> 
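> #!/bin/sh
> # $1 = number of chunks, $2 = number of processors per chunk
> hnm=`hostname`
> seq=`seq 1 $2`
> lapw0=lapw0:
> for i in `seq 1 $1` ; do
>     t=1:                          # one line per chunk
>     for y in $seq ; do
>         t="$t $hnm-$i-$y:1";
>         lapw0="$lapw0 $hnm:1"     # lapw0 line lists every processor
>     done
>     echo $t
> done
> echo $lapw0
> echo "extrafine: 1"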
> 
> Note that to use two chunks of nine processors each, ten nodes must be
> allocated, because yod cannot launch two different mpi jobs on one
> node (so each chunk requires five nodes).
> lapw2_vector_split did not work for me.
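
The node count in the example above follows from rounding each chunk up
to whole nodes. A small sketch of that arithmetic, assuming two cores per
node as on the machine described (the function name nodes_needed is
hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: nodes_needed CHUNKS PROCS_PER_CHUNK [CORES_PER_NODE]
# Each chunk occupies whole nodes, so its processor count is rounded up
# to a multiple of the cores per node (default 2, as on the XT4 above).
nodes_needed() {
    chunks=$1; procs=$2; cores=${3:-2}
    echo $(( chunks * ( (procs + cores - 1) / cores ) ))
}

nodes_needed 2 9    # two chunks of nine processors: prints 10
```
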
> 
> 3) Set the proper mpirun command
> There is probably a simple (but not so flexible) way: set MPIRUN to
> 
> MPIRUN=yod -sz $WIEN_MPISIZE <executables directory>/_EXEC_ $@
> 
> but I have not tried it. I use the same approach as for non-mpi jobs:
> replace all mpi executables (in <script directory>) by a symbolic link
> to the following script:
> 
> yod -sz $WIEN_MPISIZE <executables directory>/`basename $0` "$@"
> 
> and set MPIRUN to
> MPIRUN=_EXEC_
> 
> All mpi chunks should have the same size, and the number of cores in
> one chunk must be stored in the exported variable $WIEN_MPISIZE (in
> csh-family shells, set it with setenv). The script for replacing both
> mpi and non-mpi executables can look like this:
> 
> ls -l <executables directory> | awk '
> { if(NF >= 9) next;             # skip symlinks
>   if(int($5) < 100000) next;    # skip scripts
>   if($8 ~ /mpi/)
>      system("ln -s <pathAndNameOfMpiYodScript> <script directory>/" $8)
>   else
>      system("ln -s <pathAndNameOfYodScript> <script directory>/" $8) }'

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671             FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------