[Wien] Problem with parallel OPTIC
Maciej Polak
maciej.polak at pwr.edu.pl
Sat Apr 23 03:38:10 CEST 2016
Thank you very much for all your responses.
I did some more testing to provide more information.
1. I tried a new compilation (since Dr. Gavin had no problems with my
calculation, I thought it might be a compilation issue), but
nothing changed.
2. Adding "x" to the first line of the opticpara script (-fx instead of -f) shows that the script loops on:
while ( 0 < 2 )
set p = 1
if ( 0 && 0 ) set p = 2
while ( 1 < = 0 )
end
which corresponds to lines 213-246 (in opticpara):
while ($loop < $maxproc)
set p = 1
if ($?residue && $?resok) set p = 2
while ($p <= $#machine)
end
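The trace above can be re-enacted in plain sh to see why the script never leaves the outer loop. This is only a sketch under assumptions: I assume the loop counter is advanced only inside the inner loop (which is what the trace suggests), the real body of lines 213-246 is not reproduced, and the names nmachine and guard are hypothetical helpers that do not exist in opticpara.

```shell
#!/bin/sh
# Bounded re-enactment of the loop structure seen in the -x trace.
# Assumption: $loop is only incremented inside the inner loop.
machine=""            # empty host list, as in my case
set -- $machine       # split the list into positional parameters
nmachine=$#           # plays the role of csh's $#machine (0 here)
maxproc=2
loop=0
guard=0               # hypothetical safety counter so this demo terminates

while [ "$loop" -lt "$maxproc" ] && [ "$guard" -lt 5 ]; do
    p=1
    while [ "$p" -le "$nmachine" ]; do
        loop=$((loop + 1))    # never reached when the host list is empty
        p=$((p + 1))
    done
    guard=$((guard + 1))
done

echo "loop=$loop guard=$guard"
```

Without the guard the outer condition stays true forever, which would match the hang with no CPU activity from the optic processes themselves.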
I tracked down that line 126:
set machine = `grep -v $init .processes |grep : | grep -v $res | cut -f2 -d: | xargs`
gives me nothing (the output of this command is just blank).
It is supposed to take the second colon-separated field from the lines of
my .processes file that are not init:* lines, and in my case that field is empty:
init:wn0975.ib.trojan.kdm.wcss.pl
init:wn1016.ib.trojan.kdm.wcss.pl
1 : : 143 : 1 : 1 : 0
2 : : 143 : 1 : 2 : 0
What is supposed to be in that column? Shouldn't it contain the node names?
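To double-check this outside of opticpara, the two "set machine" pipelines (lines 125 and 126) can be replayed in plain sh on a copy of the .processes file shown above. This is a sketch, not the script itself: the values "init:" and "residue:" for $init and $res are my assumption about how opticpara sets them, and the scratch directory is only there so that the real case directory is not touched.

```shell
#!/bin/sh
# Rebuild the .processes file quoted above in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/.processes" <<'EOF'
init:wn0975.ib.trojan.kdm.wcss.pl
init:wn1016.ib.trojan.kdm.wcss.pl
1 : : 143 : 1 : 1 : 0
2 : : 143 : 1 : 2 : 0
EOF

# Assumed values; check the "set init" / "set res" lines in your opticpara.
init="init:"
res="residue:"

# Line 126 (active): field 2 of the non-init, non-residue lines.
machine_126=$(grep -v "$init" "$tmpdir/.processes" | grep : | grep -v "$res" | cut -f2 -d: | xargs)

# Line 125 (commented out): field 2 of the init lines.
machine_125=$(grep "$init" "$tmpdir/.processes" | cut -f2 -d: | xargs)

echo "line 126: [$machine_126]"
echo "line 125: [$machine_125]"
rm -rf "$tmpdir"
```

With my file, the line 126 variant yields an empty list because the second field of the numbered lines holds only a blank, while the line 125 variant yields the two node names from the init lines.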
.processes is generated automatically from .machines, and my .machines
file looks OK (it has worked for previous calculations):
granularity:1
extrafine:1
1:wn0975.ib.trojan.kdm.wcss.pl:1
1:wn1016.ib.trojan.kdm.wcss.pl:1
There is line 125:
set machine = `grep $init .processes |cut -f2 -d: | xargs`
which is commented out, but it would make more sense to use it here. I
commented out line 126 and uncommented line 125, and it seems to work now,
but I don't know whether this has any other consequences. Can I leave it
like that? Someone wiser than me commented out that line, and they
probably had a reason for doing so.
I'm not really sure what to do next. Any help would be appreciated.
Please tell me if there is any other info that you might need.
Best regards,
Maciej Polak
P.S. the answers to your other questions:
1. All the files that are created after "x optic -p" is executed:
-rw------- 1 mpolak grant045 172 04-23 02:53 .script
-rw------- 1 mpolak grant045 17 04-23 02:53 .running.100962.wn0926.2304025353
-rw------- 1 mpolak grant045 8 04-23 02:53 .processes2
-rw------- 1 mpolak grant045 7793 04-23 02:53 :parallel
-rw------- 1 mpolak grant045 8 04-23 02:53 .opticpara
-rw------- 1 mpolak grant045 28 04-23 02:53 optic.error
-rw------- 1 mpolak grant045 1475 04-23 02:53 optic.def
-rw------- 1 mpolak grant045 1495 04-23 02:53 optic_2.def
-rw------- 1 mpolak grant045 1495 04-23 02:53 optic_1.def
-rw------- 1 mpolak grant045 1115 04-23 02:53 .mist
-rw------- 1 mpolak grant045 2449 04-23 02:53 :log
-rw------- 1 mpolak grant045 5 04-23 02:53 .lapw1para
-rw------- 1 mpolak grant045 0 04-23 02:53 lapw1.error
2. "ps -ef | grep optic" gives:
mpolak 102451 97092 0 03:04 ? 00:00:00 /bin/csh -f /home/mpolak/WIEN2k/x optic -p
mpolak 102465 102451 11 03:04 ? 00:00:03 /bin/csh -fx /home/mpolak/WIEN2k/opticpara optic.def
On 04/22/2016 07:27 AM, Peter Blaha wrote:
> First, one needs detailed information about which files are really
> generated in order to see where it gets stuck.
> ls -alsrt lists the files with full information (empty or non-empty
> files, date+time of last write).
>
> Then you should do a ps -ef and see what is running in connection
> with optic (maybe add |grep optic)
>
> If it does not start the parallel optic calculations, you may edit
> opticpara and replace -f by -fx in the first line of this script.
>
> It will give you a very lengthy, hard to read output, but basically
> this should help to find the exact position/reason where it got stuck.
>
> PS: I guess you have tried to reproduce this in a fresh directory?
>
> On 22.04.2016 at 05:08, Gavin Abo wrote:
>> If you haven't already done so, I would suggest looking at the content
>> in the files .timeop_1, .timeop_2, ... , and .timeop_X (e.g., while in
>> the case directory: cat .timeop_*), because an error message might be
>> logged in these files for a parallel optic calculation.
>>
>> On 4/21/2016 3:44 PM, Maciej Polak wrote:
>>> Dear WIEN2k Community,
>>>
>>> I want to calculate the joint density of states but I ran into some
>>> problems with parallel execution of x optic. I use only K-point
>>> parallelization and run the newest 14.2 version of WIEN2k.
>>>
>>> When I do sequential calculations, it all works fine. But for bigger
>>> cases with many k-points it is impossible to finish on one CPU. After
>>> I add the -p flag to the relevant procedures, the last output I see
>>> is: running OPTIC in parallel mode. After that, nothing happens. The
>>> optic_X.def files are generated, along with an optic.error file that
>>> contains only "Error in Parallel OPTIC". The code just stands still
>>> after that, with no activity on the CPUs.
>>>
>>> A simple minimalistic example to reproduce the error:
>>>
>>> init_lapw -bw -vxc 5 -rkmax 7 -numk 1000 -red 2
>>> run_lapw -p
>>> x kgen <<< 10000
>>> x lapw1 -p
>>> x lapw2 -fermi -p
>>> x optic -p
>>>
>>> The same set of calculations, without the -p flag, would work just
>>> fine. However, when I generate a bigger k-mesh and have a large number
>>> of atoms it is absolutely impossible to perform the calculations on a
>>> single core.
>>>
>>> Regular k-point calculations (geometry optimization, bandstructures,
>>> etc.) work perfectly.
>>>
>>> I attached my *.struct and *.inop, but they are not the problem in
>>> this case, since they work as intended with the sequential version.
>>> This is just a very simple FCC Si calculation for testing.
>>>
>>> I would really appreciate any help. I tried to read through the
>>> mailing list, but couldn't find a similar problem.
>>>
>>> Best regards,
>>>
>>> Maciej Polak
>>> Wroclaw University of Science and Technology
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>