[Wien] Trouble setting up parallel jobs

PGanesh pganesh at ciw.edu
Fri Dec 19 19:53:05 CET 2008


Hi,

We have installed WIEN2k on our local cluster (which runs ROCKS), and it 
runs beautifully in serial. But when I run it in a parallel environment, 
I get the following errors in jobs.err:

Got 11 slots.
Without hostfile option, hostnames must be specified on command line.
usage: mpirun_rsh [-verbose] [-v] [-rsh|-ssh] [-paramfile=pfile] 
[-timeout=N][-debug] -[tv] [-xterm] [-show] -np N (-machinefile mfile | 
-hostfile hfile | h1 h2 ... hN) [a.out args]
Where:
        verbose    => verbose
        v          => Show version and exit
        rsh        => to use rsh for connecting
        ssh        => to use ssh for connecting
        paramfile  => file containing run-time MVICH parameters
        debug      => run each process under the control of gdb
        tv         => run each process under the control of totalview
        xterm      => run remote processes under xterm
        show       => show command for remote execution but dont run it
        np         => specify the number of processes
        h1 h2...   => names of hosts where processes should run
or      hostfile   => name of file contining hosts, one per line
or      machinefile   => name of file contining host and MPI binary, one per
                line. If MPI binary is empty for 1 or many hosts then
                the default is executed
        timeout    => Timeout for child processes to terminate
        a.out      => name of (default) MPI binary.
                It is a mandatory parameter if machinefile is not specified
                OR if machinefile has empty MPI Binary entries for 1 or
                more hosts
        args       => arguments for MPI binary

And this is what I get in my *.dayfile:

Calculating BSCCO in /home/pganesh/WIEN2k/BSCCO
on compute-0-27.local with PID 20199

    start       (Fri Dec 19 13:43:53 EST 2008) with lapw0 (40/99 to go)

    cycle 1     (Fri Dec 19 13:43:53 EST 2008)  (40/99 to go)

 >   lapw0 -p    (13:43:53) starting parallel lapw0 at Fri Dec 19 13:43:53 EST 2008
-------- .machine1 : 3 processors
compute-0-27 compute-0-2 compute-0-17
--------
0.008u 0.032s 0:00.20 15.0%     0+0k 0+0io 6pf+0w
error: command   /home/pganesh/WIEN2k/lapw0para lapw0.def   failed

 >   stop error

I copied the script from the WIEN2k website that generates the .machines 
file and then executes the command:  run_lapw -NI  -p  -fc 3

This is what my .machines file looks like:

#
lapw0:compute-0-27  compute-0-2  compute-0-17  
1:compute-0-27
1:compute-0-27
1:compute-0-27
1:compute-0-27
1:compute-0-2
1:compute-0-2
1:compute-0-2
1:compute-0-2
1:compute-0-17
1:compute-0-17
1:compute-0-17
granularity:1
extrafine:1
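
For reference, the layout of this file can be reproduced with the rough 
sketch below. It is not the actual script from the website, only an 
illustration in Python. The function name build_machines and the 
per-host slot counts (4, 4 and 3, which add up to the 11 slots reported 
in jobs.err) are my own assumptions, read off the file above.

```python
def build_machines(slots):
    """Return .machines text for a list of (hostname, n_slots) pairs.

    Hypothetical helper: the name and the input format are assumptions
    made for this illustration only.
    """
    hosts = [host for host, _ in slots]
    lines = ["#"]
    # One lapw0 line that lists every host once.
    lines.append("lapw0:" + " ".join(hosts))
    # One "1:<host>" line per granted slot on that host.
    for host, n_slots in slots:
        lines.extend(f"1:{host}" for _ in range(n_slots))
    lines.append("granularity:1")
    lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Slot counts assumed from the .machines file shown above.
    granted = [("compute-0-27", 4), ("compute-0-2", 4), ("compute-0-17", 3)]
    print(build_machines(granted), end="")
```

Run with the three hosts above, this gives a file with the same 
structure as mine: one lapw0 line, eleven "1:" lines, and the 
granularity and extrafine settings.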


Thank you for the help.

regards,
Ganesh

