<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#ffffff">
Dear all,<br>
<br>
<br>
Sorry for this long post. I tried to be as complete as possible...<br>
<br>
<br>
I am having problems running WIEN2k (v. 10) in parallel. Note that the
serial executables work fine.<br>
I have compiled WIEN2k successfully (at least, I think so) on an SGI
machine with Intel processors.<br>
The configuration I use is the following:<br>
intel/10.1.017 mkl/10.0.3.020 mvapich/1.0.1 fftw/2.1.5<br>
<br>
At first, I compiled with a more recent configuration: <br>
intel/11.1.059 mkl/11.1.059 and the SGI MPT 1.26 libraries,<br>
but it seems that the mpirun command in that setup does not accept the
-machinefile option.<br>
<br>
<br>
The cluster consists of nodes with two quad-core processors each,
so 8 cores per node.<br>
<br>
<br>
The queuing system is PBS. My script is:<br>
<br>
<font color="#3333ff">##### PBS SCRIPT<br>
#!/usr/bin/csh -v<br>
#PBS -S /usr/bin/csh<br>
#PBS -l select=1:ncpus=8:mpiprocs=8<br>
#PBS -l walltime=00:10:00<br>
#PBS -N scf_wien2k<br>
#PBS -j eo<br>
<br>
module load intel/10.1.017 mvapich/1.0.1 fftw/2.1.5<br>
cd $PBS_O_WORKDIR<br>
<br>
cat $PBS_NODEFILE > .machines_current<br>
set aa=`wc .machines_current`<br>
<br>
# for MPI<br>
echo -n 'lapw0:' > .machines<br>
set i=1<br>
while ($i < $aa[1] )<br>
echo -n `cat $PBS_NODEFILE |head -$i | tail -1` ' ' >>
.machines<br>
@ i ++<br>
end<br>
echo `cat $PBS_NODEFILE |head -$i|tail -1` ' ' >> .machines<br>
<br>
# for k-points<br>
set i=1<br>
while ($i <= $aa[1] )<br>
echo -n '1:' >> .machines<br>
head -$i .machines_current |tail -1 >> .machines<br>
@ i ++<br>
end<br>
echo 'granularity:1' >> .machines<br>
echo 'extrafine:1' >> .machines<br>
<br>
setenv WIENROOT /work/cdeck/wien2k_10<br>
setenv W2WEB_CASE_BASEDIR /work/cdeck/WIEN2k<br>
setenv STRUCTEDIT_PATH $WIENROOT/SRC_structeditor/bin<br>
<br>
$WIENROOT/run_lapw -p -i 60 -ec 0.0001 >& wien2k.log<br>
<br>
##### END OF PBS SCRIPT<br>
</font><br>
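For readability, the .machines construction in the script above can be
summarized in a few lines. This is only a minimal sketch in POSIX sh (not
the csh used in my job script), assuming the node file lists one host name
per line, as $PBS_NODEFILE does; the function name make_machines is
hypothetical and is not part of WIEN2k or PBS. Unlike my script, it does
not leave a trailing space at the end of the lapw0 line.<br>
<br>
<pre>
```shell
#!/bin/sh
# Sketch only: print a WIEN2k .machines layout for a PBS-style node file.
# Assumption: the node file has one host name per line (like $PBS_NODEFILE).
# make_machines is a hypothetical helper name, not a WIEN2k or PBS command.
make_machines() {
    nodefile="$1"

    # MPI line for lapw0: every host entry on one line, space separated.
    printf 'lapw0:%s\n' "$(paste -s -d ' ' "$nodefile")"

    # k-point lines: one "1:host" line per entry in the node file.
    sed 's/^/1:/' "$nodefile"

    # Same trailing options as in the job script.
    printf 'granularity:1\nextrafine:1\n'
}
```
</pre>
Inside the job script it would be called as
make_machines "$PBS_NODEFILE" &gt; .machines<br>
<br>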
The .machines file created by the script looks like this:<br>
<br>
<font color="#3333ff">lapw0:r14i0n12 r14i0n12 r14i0n12 r14i0n12
r14i0n12 r14i0n12 r14i0n12 r14i0n12 <br>
1:r14i0n12<br>
1:r14i0n12<br>
1:r14i0n12<br>
1:r14i0n12<br>
1:r14i0n12<br>
1:r14i0n12<br>
1:r14i0n12<br>
1:r14i0n12<br>
granularity:1<br>
extrafine:1<br>
</font><br>
I also tried the following:<br>
<br>
<font color="#3333ff">lapw0:r14i0n12:8<br>
</font><font color="#3333ff">1:r14i0n12:8<br>
</font><font color="#3333ff">granularity:1<br>
extrafine:1</font><br>
<br>
But it does not work either.<br>
<br>
Now here is the content of TiC.dayfile:<br>
<br>
<font color="#3333ff">Calculating TiC in /work/cdeck/WIEN2k/TiC<br>
on r14i0n12 with PID 13222<br>
using WIEN2k_10.1 (Release 7/6/2010) in /work/cdeck/wien2k_10<br>
<br>
<br>
start (Thu Jul 22 22:49:58 CEST 2010) with lapw0 (60/99 to go)<br>
<br>
cycle 1 (Thu Jul 22 22:49:58 CEST 2010) (60/99 to go)<br>
<br>
> lapw0 -p (22:49:58) starting parallel lapw0 at Thu Jul 22
22:49:58 CEST 2010<br>
-------- .machine0 : 8 processors<br>
0.032u 0.080s 0:04.95 2.2% 0+0k 0+0io 37pf+0w<br>
error: command /work/cdeck/wien2k_10/lapw0para lapw0.def failed<br>
<br>
> stop error</font><br>
<br>
And here is the content of the log file, which I named "wien2k.log" in the PBS script:<br>
<br>
<font color="#3333ff">Exit code -3 signaled from r14i0n12<br>
Killing remote processes...2 - MPI_COMM_RANK : Null communicator<br>
[2] [] Aborting Program!<br>
[6] [] Aborting Program!<br>
5 - MPI_COMM_RANK : Null communicator<br>
4 - MPI_COMM_RANK : Null communicator<br>
3 - MPI_COMM_RANK : Null communicator<br>
[4] [] Aborting Program!<br>
[5] [] Aborting Program!<br>
[3] [] Aborting Program!<br>
7 - MPI_COMM_RANK : Null communicator<br>
[7] [] Aborting Program!<br>
Abort signaled by rank 6: Aborting program !<br>
Abort signaled by rank 2: Aborting program !<br>
Abort signaled by rank 3: Aborting program !<br>
Abort signaled by rank 4: Aborting program !<br>
Abort signaled by rank 5: Aborting program !<br>
Abort signaled by rank 7: Aborting program !<br>
1 - MPI_COMM_RANK : Null communicator<br>
[1] [] Aborting Program!<br>
Abort signaled by rank 1: Aborting program !<br>
0 - MPI_COMM_RANK : Null communicator<br>
[0] [] Aborting Program!<br>
Abort signaled by rank 0: Aborting program !<br>
MPI process terminated unexpectedly<br>
DONE<br>
<br>
> stop error<br>
</font><br>
The lapw0.error file contains: Error in LAPW0<br>
<br>
<br>
Could anyone give me a hint on what is wrong?<br>
<br>
Thank you<br>
Pascal<br>
<br>
<br>
<br>
<pre class="moz-signature" cols="72">--
Dr. Pascal Boulet, computational chemist
University of Aix-Marseille I
Laboratoire Chimie Provence, UMR 6264
Group of Theoretical Chemistry
Avenue Normandie-Niemen
13397 Marseille Cedex 20
France
**********
Tel. (+33) (0)491.63.71.17
Fax. (+33) (0)491.63.71.11
**********
<a class="moz-txt-link-freetext" href="http://www.lc-provence.fr">http://www.lc-provence.fr</a>
<a class="moz-txt-link-freetext" href="https://sites.google.com/a/univ-provence.fr/pb-comput-chem">https://sites.google.com/a/univ-provence.fr/pb-comput-chem</a>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
</pre>
</body>
</html>