<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
See below for my comments.<br>
<br>
<blockquote
cite="mid:CAFZG4C4qKLt8aUZvja616wH+8mC8jF0cL0kd2MTtRh3TyX5guQ@mail.gmail.com"
type="cite">
<div dir="ltr">Thanks for all the information and suggestions.
<div>
<div><br>
</div>
<div>I have tried changing <span style="color:rgb(80,0,80)">-lmkl_blacs_intelmpi_lp64
to </span><span style="color:rgb(80,0,80)">-lmkl_blacs_lp64
and recompiling. However, I got the following error message
in the screen output:</span></div>
<div><span style="color:rgb(80,0,80)"><br>
</span></div>
<div>
<div><font color="#500050"> LAPW0 END</font></div>
<div><font color="#500050">[cli_14]: [cli_15]: [cli_6]:
aborting job:</font></div>
<div><font color="#500050">Fatal error in PMPI_Comm_size:</font></div>
<div><font color="#500050">Invalid communicator, error
stack:</font></div>
<div><font color="#500050">PMPI_Comm_size(110):
MPI_Comm_size(comm=0x5b, size=0x7f190c) failed</font></div>
<div><font color="#500050">PMPI_Comm_size(69).: Invalid
communicator</font></div>
<div><font color="#500050">
<div>aborting job:</div>
<div>Fatal error in PMPI_Comm_size:</div>
<div>Invalid communicator, error stack:</div>
<div>PMPI_Comm_size(110): MPI_Comm_size(comm=0x5b,
size=0x7f190c) failed</div>
<div>PMPI_Comm_size(69).: Invalid communicator</div>
<div>.......<br>
</div>
</font></div>
</div>
<div>
<div><font color="#500050">[z0-5:mpispawn_0][readline]
Unexpected End-Of-File on file descriptor 20. MPI
process died?</font></div>
<div><font color="#500050">[z0-5:mpispawn_0][mtpmi_processops]
Error while reading PMI socket. MPI process died?</font></div>
<div>
<div>[z0-5:mpispawn_0][child_handler] MPI process (rank:
14, pid: 11260) exited with status 1</div>
<div>[z0-5:mpispawn_0][child_handler] MPI process (rank:
3, pid: 11249) exited with status 1</div>
<div>[z0-5:mpispawn_0][child_handler] MPI process (rank:
6, pid: 11252) exited with status 1</div>
<div>.....</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
This is probably because you are now linking the wrong BLACS library.
-lmkl_blacs_lp64 is for the old MPICH, but MVAPICH2 is a variant of
MPICH3, which is why -lmkl_blacs_intelmpi_lp64 is the appropriate
choice here.<br>
<br>
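For reference, the SCALAPACK/BLACS part of the parallel linking options
set by siteconfig_lapw (the RP_LIBS entry; the exact name and any FFTW
entries depend on your WIEN2k and MKL versions, so treat this only as a
sketch) should then contain something like:<br>
<br>
-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 $(R_LIBS)<br>
<br>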
<blockquote
cite="mid:CAFZG4C4qKLt8aUZvja616wH+8mC8jF0cL0kd2MTtRh3TyX5guQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div style="color:rgb(80,0,80)">Previously I compiled the
program with -lmkl_blacs_intelmpi_lp64 and the MPI
parallelization on a single node seems to be working. I
noticed that during the run the *.error files have finite
sizes, but when I re-examined them after the job finished
there were no errors written inside (and the files are
0 kB now). Does this indicate that MPI is not running
properly at all, even on a single node? However, I have
checked the output and it agrees with the non-MPI
results (for some simple cases).</div>
</div>
</div>
</div>
</blockquote>
<br>
It sounds like it is working fine on a single node. At least for now,
stay with -lmkl_blacs_intelmpi_lp64, since that is the library that works there.<br>
<br>
As I asked before, did you give us all the error information from the
case.dayfile and from standard output? It is not entirely clear from
your previous posts, but it looks to me like you provided information
from the case.dayfile and the error files (cat
*.error), but maybe not from standard output. Are you still
using the PBS script from your old post at
<a class="moz-txt-link-freetext" href="http://www.mail-archive.com/wien%40zeus.theochem.tuwien.ac.at/msg11770.html">http://www.mail-archive.com/wien%40zeus.theochem.tuwien.ac.at/msg11770.html</a>
? In that script, the standard output is written to a file called
wien2k_output.<br>
<br>
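If so, could you post what actually ends up in that file for a failing
run? For example, something like the following should show the relevant
part (assuming the job was started from the case directory and the
script still writes to that file name):<br>
<br>
grep -i error wien2k_output<br>
tail -n 50 wien2k_output<br>
<br>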
When it runs fine on a single node, is it always the same node
(say z1-17), or does it also work on other nodes (like z1-18)?<br>
<br>
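If your queue allows it, you can usually force a test run onto a
specific host with a Torque/PBS resource request like the line below
(the host name and the ppn value are only examples; adjust them to
your cluster):<br>
<br>
#PBS -l nodes=z1-18:ppn=8<br>
<br>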
<blockquote
cite="mid:CAFZG4C4qKLt8aUZvja616wH+8mC8jF0cL0kd2MTtRh3TyX5guQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div style="color:rgb(80,0,80)">I also tried changing
mpirun to mpiexec, as suggested by Prof. Marks, by setting:<br>
</div>
<div><font color="#500050">setenv WIEN_MPIRUN
"/usr/local/mvapich2-icc/bin/mpiexec -np _NP_ -f _HOSTS_
_EXEC_"</font></div>
<div><span style="color:rgb(80,0,80)">in
parallel_options. In this case, the program does not run
but also does not terminate (qstat on the cluster just
shows a running status with a time of 00:00:00).</span></div>
</div>
</div>
</div>
</blockquote>
<br>
At least for now, stay with mpirun since it works on a single node.<br>
<br>
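For comparison, the mpirun form of that line in parallel_options
usually looks something like the following (the path is only a guess
based on your mpiexec path, and -machinefile is the usual WIEN2k
default; adjust both to your installation):<br>
<br>
setenv WIEN_MPIRUN "/usr/local/mvapich2-icc/bin/mpirun -np _NP_
-machinefile _HOSTS_ _EXEC_"<br>
<br>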
<br>
</body>
</html>