[Wien] problem in parallel mode calculation

Gavin Abo gsabo at crimson.ua.edu
Tue Mar 14 04:25:05 CET 2017


The .machines file looks fine to me, but one of the others might see 
something that I didn't notice.  (The WIEN2k command is missing at the 
bottom of the job file, but it was likely lost in the copy and paste.)

The main problem seems to be the "bash: lapw1: command not found" error, 
unless something happened earlier that is not shown.  Tracking down 
error messages is more complicated for a parallel calculation than for 
a serial one.  A serial calculation can print its standard output and 
error to a terminal on a desktop, whereas a parallel calculation on a 
cluster with a queue system can put them in a standard output file (-o) 
and a standard error file (-e), or in a combined output/error file (-j), 
with user-specified name(s) [1,2].  They can also be written to hidden 
dot files like .time* or .stdout*, as mentioned before [3,4,5].
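
As a rough sketch of the above, assuming a Torque/PBS queue: the file 
names wien2k_job.out and wien2k_job.err and the helper show_hidden_logs 
are made up for illustration and are not part of WIEN2k.  The directives 
name the output files, and the helper prints the tail of any hidden 
.time* or .stdout* files found in a directory.

```shell
#!/bin/bash
# Torque/PBS directives (file names are placeholders):
#PBS -o wien2k_job.out
#PBS -e wien2k_job.err
# Alternative: merge stderr into stdout instead of using -e
##PBS -j oe

# Hypothetical helper: show the last lines of the hidden files
# (.time*, .stdout*) found in the given directory.
show_hidden_logs() {
    dir="${1:-.}"
    found=0
    for f in "$dir"/.time* "$dir"/.stdout*; do
        # Skip patterns that matched nothing
        [ -e "$f" ] || continue
        found=1
        echo "== $f =="
        tail -n 20 "$f"
    done
    [ "$found" -eq 1 ] || echo "no .time* or .stdout* files in $dir"
}
```

Calling show_hidden_logs "$PBS_O_WORKDIR" at the end of the job file (or 
show_hidden_logs . from the case directory afterwards) would then show 
whatever those files contain.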

The "lapw1: command not found" error might occur because $WIENROOT 
didn't get added to the PATH on one of the nodes [ 
http://www.supercluster.org/pipermail/torqueusers/2010-March/010143.html 
].  Did you try checking whether the path to WIEN2k is in the PATH, for 
example in PBS_O_PATH as shown by qstat -f jobid [ 
http://stackoverflow.com/questions/21248406/sleep-command-not-found-in-torque-pbs-but-works-in-shell 
]?
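
One way to make that check from inside the job file is sketched below. 
The helper path_contains is a made-up name, and the sketch assumes that 
"module load wien2k" is what sets $WIENROOT on your cluster, which I 
cannot confirm.  From the login node, the PBS_O_PATH value should appear 
in the Variable_List section of the qstat -f jobid output.

```shell
#!/bin/bash
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries of the PATH-like string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Place these lines after "module load wien2k" in the job file.
if path_contains "$WIENROOT" "$PATH"; then
    echo "WIENROOT ($WIENROOT) is in PATH on $(hostname)"
else
    echo "WIENROOT ($WIENROOT) is NOT in PATH on $(hostname)"
fi

# Prints the full path of lapw1 if the shell can find it.
command -v lapw1 || echo "lapw1 not found on $(hostname)"
```

This only tests the node that runs the job file; the other nodes still 
need to be checked separately.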

Did you try to ssh into each of the nodes and check whether lapw1 is 
visible there?  For example,

ssh n024
ls -l $WIENROOT/lapw1

ssh n225
ls -l $WIENROOT/lapw1

...
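
The same check can be looped over all nodes, as in the sketch below. 
The function check_nodes and the REMOTE variable are made up for 
illustration; REMOTE exists only so that the loop can be dry-run without 
ssh.  Note that the command runs in a non-interactive shell on each 
node, so $WIENROOT may be unset there even when it is set after an 
interactive login.  As far as I know, that is also how the parallel 
scripts start lapw1 on remote nodes, so a "NOT found" here would be 
consistent with the error you got.

```shell
#!/bin/bash
# Command used to reach a node; defaults to ssh.
REMOTE="${REMOTE:-ssh}"

# Hypothetical helper: report for each node name given as an argument
# whether $WIENROOT/lapw1 can be listed there.
check_nodes() {
    for node in "$@"; do
        # Single quotes: $WIENROOT is expanded on the remote node.
        if $REMOTE "$node" 'ls -l "$WIENROOT/lapw1"' \
               >/dev/null 2>&1 </dev/null; then
            echo "$node: lapw1 found"
        else
            echo "$node: lapw1 NOT found"
        fi
    done
}
```

For example, check_nodes n024 n225 from a login node, or 
check_nodes $(sort -u "$PBS_NODEFILE") from inside the job file.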

Above, I'm just guessing about the commands/configuration for your 
system, but the administrator or helpdesk for your cluster should know 
everything about your system and be able to help you much better with 
resolving the command not found error.

[1] http://beige.ucs.indiana.edu/I590/node39.html
[2] https://wikis.nyu.edu/display/NYUHPC/Tutorial+-+Submitting+a+job+using+qsub
[3] http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13598.html
[4] http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14148.html
[5] http://zeus.theochem.tuwien.ac.at/pipermail/wien/2017-March/026109.html

On 3/13/2017 1:25 PM, shaymlal dayananda wrote:
> Dear developers and users
>
> I was trying to do a volume optimization and scf calculation with spin 
> polarization in parallel mode. But my both the jobs crashes and I got 
> the following error file. However both cases run correctly when 
> parallel mode is removed.
> ............................................................................
> 'LAPW2' - can't open unit: 30
>  'LAPW2' -        filename: case.energyup_1
> **  testerror: Error in Parallel LAPW2
> .................................................................................
> Also in STDOUT , I see the following particular errors. (
>
> .......................................................................
> bash: lapw1: command not found
> ...
> ....
> .....
> FERMI - Error
> grep: *scf1dn*: No such file or directory
> 0.381u 0.507s 1:12.66 1.2%    0+0k 128+1736io 1pf+0w
> Test-TiC-VOl-parallel.scf1dn_1: No such file or directory.
> .............................................................................
>
>
> I copied my machine file and the job file here. But I think this is 
> not correct and I am not sure whether I needs to have lines for lapw2 
> and lapwsp separately. Any help to get corrected this is highly 
> appreciated.
>
> ".machnes" file
> .............................
> #
> lapw0:n024  n225  n220  n218  n045  n044  n043  n043
> 1:n024
> 1:n225
> 1:n220
> 1:n218
> 1:n045
> 1:n044
> 1:n043
> 1:n043
> granularity:1
> extrafine:1
>
> ......................................................
>
> job file is copied below.
>
>
> # example for 8 nodes
> #PBS -l procs=8
> #PBS -l pmem=2048mb
> #PBS -l walltime=4:00:00
>
> module load wien2k
>
> # change into your working directory
> cd $PBS_O_WORKDIR
> #start creating .machines
> cat $PBS_NODEFILE |cut -c1-6 >.machines_current
> aa=`cat .machines_current | wc -l`
> echo '#' > .machines
>
> # example for an MPI parallel lapw0
> echo -n 'lapw0:' >> .machines
> i=1
> while [ $i -lt $aa ]
> do
> echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >>.machines
> i=$((i+1))
> done
> echo  `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >>.machines
>
> #example for k-point parallel lapw1/2
> i=1
> while [ $i -le $aa ]
> do
> echo -n '1:' >>.machines
> head -$i .machines_current |tail -1 >> .machines
> i=$((i+1))
> done
>
> echo 'granularity:1' >>.machines
> echo 'extrafine:1' >>.machines
>
> #define here your WIEN2k command
>
>
> ....................................................................
>
>
> Thank you
>
> Chami