<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Thank you very much for all your responses.<br>
<br>
I did some more testing to provide more information.<br>
<br>
1. I recompiled the code (since Dr. Gavin had no problems with my
calculation, I thought it might have been a compilation issue), but
nothing changed.<br>
<br>
2. Adding "x" to the first line of the opticpara script (-fx instead of -f) shows that the script loops endlessly on:<br>
<br>
<font face="monospace">while ( 0 < 2 )<br>
set p = 1<br>
if ( 0 && 0 ) set p = 2<br>
while ( 1 < = 0 )<br>
end</font><br>
<br>
which corresponds to lines 213-246 (in opticpara):<br>
<font face="monospace"><br>
while ($loop < $maxproc)<br>
set p = 1<br>
if ($?residue && $?resok) set p = 2<br>
while ($p <= $#machine)<br>
end</font><br>
<br>
I tracked the problem down to line 126:<br>
<br>
<font face="monospace">set machine = `grep -v $init .processes
|grep : | grep -v $res | cut -f2 -d: | xargs`</font><br>
<br>
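returns nothing (the output of this command is blank), so $#machine is 0
and the inner while loop is never entered; presumably that is why the
outer loop never finishes.<br>
It is supposed to take the second column from my .processes file
(excluding the init:* lines), which in my case is empty:<br>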
<font face="monospace"><br>
init:wn0975.ib.trojan.kdm.wcss.pl<br>
init:wn1016.ib.trojan.kdm.wcss.pl<br>
1 : : 143 : 1 : 1 : 0<br>
2 : : 143 : 1 : 2 : 0</font><br>
<br>
What is supposed to be in that column? Shouldn't it be the node names?
.processes is generated automatically from .machines, and my
.machines looks OK (it worked for previous calculations):<br>
<font face="monospace"><br>
granularity:1<br>
extrafine:1<br>
1:wn0975.ib.trojan.kdm.wcss.pl:1<br>
1:wn1016.ib.trojan.kdm.wcss.pl:1</font><br>
<br>
There is line 125:<br>
<br>
<font face="monospace">set machine = `grep $init .processes |cut
-f2 -d: | xargs`</font><br>
<br>
which is commented out, but it would seem to make more sense here. I
commented out line 126 and uncommented line 125, and it seems to work
now, but I don't know whether this has any other consequences. Can I
leave it like that? Someone wiser than me commented out that line, and
they probably had a reason for doing so.<br>
<br>
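For reference, here is a minimal sh sketch that reproduces the difference
between the two lines outside opticpara. It assumes that $init is "init:"
and $res is "residue:" (my guess; I have not verified how opticpara sets
them), and it rebuilds the .processes content shown above in a temporary
directory:<br>
<pre>
```shell
#!/bin/sh
# Hypothetical reproduction; not part of WIEN2k.
# Assumption: opticpara sets init to "init:" and res to "residue:".
init="init:"
res="residue:"

# Work in a scratch directory so no real case files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# .processes as posted: the host field (second column) is blank.
printf '%s\n' \
  'init:wn0975.ib.trojan.kdm.wcss.pl' \
  'init:wn1016.ib.trojan.kdm.wcss.pl' \
  '1 : : 143 : 1 : 1 : 0' \
  '2 : : 143 : 1 : 2 : 0' > .processes

# Pipeline from line 126: host column of the non-init lines.
machine=$(grep -v "$init" .processes | grep : | grep -v "$res" | cut -f2 -d: | xargs)

# Pipeline from line 125: host names taken from the init: lines.
machine_init=$(grep "$init" .processes | cut -f2 -d: | xargs)

echo "line 126: [$machine]"
echo "line 125: [$machine_init]"
```
</pre>
With the blank host column, the line-126 pipeline leaves the machine list
empty, while the line-125 pipeline returns the two node names from the
init: lines.<br>
<br>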
I'm not really sure what to do next. Any help would be appreciated.
Please tell me if there is any other info that you might need.<br>
<br>
Best regards,<br>
<br>
Maciej Polak<br>
<br>
<br>
<br>
P.S. The answers to your other questions:<br>
1. All the files that are created after "x optic -p" is executed:<br>
<font face="monospace"><br>
-rw------- 1 mpolak grant045 172 04-23 02:53 .script<br>
-rw------- 1 mpolak grant045 17 04-23 02:53
.running.100962.wn0926.2304025353<br>
-rw------- 1 mpolak grant045 8 04-23 02:53 .processes2<br>
-rw------- 1 mpolak grant045 7793 04-23 02:53 :parallel<br>
-rw------- 1 mpolak grant045 8 04-23 02:53 .opticpara<br>
-rw------- 1 mpolak grant045 28 04-23 02:53 optic.error<br>
-rw------- 1 mpolak grant045 1475 04-23 02:53 optic.def<br>
-rw------- 1 mpolak grant045 1495 04-23 02:53 optic_2.def<br>
-rw------- 1 mpolak grant045 1495 04-23 02:53 optic_1.def<br>
-rw------- 1 mpolak grant045 1115 04-23 02:53 .mist<br>
-rw------- 1 mpolak grant045 2449 04-23 02:53 :log<br>
-rw------- 1 mpolak grant045 5 04-23 02:53 .lapw1para<br>
-rw------- 1 mpolak grant045 0 04-23 02:53 lapw1.error</font><br>
<br>
2. "ps -ef | grep optic" gives:<br>
<br>
<font face="monospace">mpolak 102451 97092 0 03:04 ?
00:00:00 /bin/csh -f /home/mpolak/WIEN2k/x optic -p<br>
mpolak 102465 102451 11 03:04 ? 00:00:03 /bin/csh -fx
/home/mpolak/WIEN2k/opticpara optic.def<br>
</font><br>
<br>
<br>
<div class="moz-cite-prefix">On 04/22/2016 07:27 AM, Peter Blaha
wrote:<br>
</div>
<blockquote cite="mid:5719B658.90404@theochem.tuwien.ac.at"
type="cite">First, one needs detailed information about which files are
really generated in order to see where it gets stuck.
<br>
ls -alsrt list the files with full information (empty or
non-empty files, date+time of last write).
<br>
<br>
Then you should do a ps -ef and see what is running in connection
with optic (maybe add |grep optic)
<br>
<br>
If it does not start the parallel optic calculations, you may edit
opticpara and replace -f by -fx in the first line of this
script.
<br>
<br>
It will give you a very lengthy, hard to read output, but
basically this should help to find the exact position/reason where
it got stuck.
<br>
<br>
PS: I guess you have tried to reproduce this in a fresh directory?
<br>
<br>
On 22.04.2016 at 05:08, Gavin Abo wrote:
<br>
<blockquote type="cite">If you haven't already done so, I would
suggest looking at the content
<br>
in the files .timeop_1, .timeop_2, ... , and .timeop_X (e.g.,
while in
<br>
the case directory: cat .timeop_*), because an error message
might be
<br>
logged in these files for a parallel optic calculation.
<br>
<br>
On 4/21/2016 3:44 PM, Maciej Polak wrote:
<br>
<blockquote type="cite">Dear WIEN2k Community,
<br>
<br>
I want to calculate the joint density of states but I ran into
some
<br>
problems with parallel execution of x optic. I use only
K-point
<br>
parallelization and run the newest 14.2 version of WIEN2k.
<br>
<br>
When I do sequential calculations, it all works fine. But for
bigger
<br>
cases, and many K-points it is impossible to finish on one
CPU. After
<br>
I add the -p flag to the relevant procedures, the last output
I see
<br>
is: running OPTIC in parallel mode. From then, nothing
happens. The
<br>
optic_X.def files are generated, and an optic.error file
containing
<br>
"Error in Parallel OPTIC", nothing else. The code just stands
still
<br>
after that, no activity on CPUs.
<br>
<br>
A simple minimalistic example to reproduce the error:
<br>
<br>
init_lapw -bw -vxc 5 -rkmax 7 -numk 1000 -red 2
<br>
run_lapw -p
<br>
x kgen <<< 10000
<br>
x lapw1 -p
<br>
x lapw2 -fermi -p
<br>
x optic -p
<br>
<br>
The same set of calculations, without the -p flag, would work
just
<br>
fine. However, when I generate a bigger k-mesh and have a
large number
<br>
of atoms it is absolutely impossible to perform the
calculations on a
<br>
single core.
<br>
<br>
Regular k-point calculations (geometry optimization,
bandstructures,
<br>
etc.) work perfectly.
<br>
<br>
I attached my *.struct and *.inop, but they are not the
problem in
<br>
this case, since they work with sequential version as
intended. This
<br>
is just a super simple FCC Si calculation just for testing.
<br>
<br>
I would really appreciate any help. I tried to read through
the
<br>
mailing list, but couldn't find a similar problem.
<br>
<br>
Best regards,
<br>
<br>
Maciej Polak
<br>
Wroclaw University of Science and Technology
<br>
</blockquote>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>