[Wien] Problem with parallel OPTIC

Peter Blaha pblaha at theochem.tuwien.ac.at
Sun Apr 24 21:31:09 CEST 2016


When using wien2k_14.2 execute the following:

x lapw1 -p -d

It will just recreate the .processes file.

The 2nd field in lines 3 and 4 of .processes should contain the name of the machine.

It works fine for me.
If it does not work for you, try to use just   wn1016   without the domain.
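
A quick way to check this outside of opticpara is to run the same kind of
field extraction by hand. The following is only a sketch under assumptions:
the "broken" file is copied from the .processes quoted below, while the
"expected" file (short host names in field 2) is an illustration of what the
file might look like when it is correct, not actual WIEN2k output. The helper
name second_fields and the temporary directory are made up for this example,
and the filter is simplified: it drops lines starting with "init" and omits
the residue filter that the real script applies via $init and $res.

```shell
#!/bin/sh
# Sketch: mimic the field-2 extraction that opticpara applies to .processes.
# second_fields is a hypothetical helper, not part of WIEN2k.

second_fields() {
    # Drop init: lines, keep lines containing a colon,
    # cut out field 2 and let xargs squeeze the whitespace.
    grep -v '^init' "$1" | grep : | cut -f2 -d: | xargs
}

tmp=$(mktemp -d)

# .processes as reported in this thread: field 2 is blank.
cat > "$tmp/broken" <<'EOF'
init:wn0975.ib.trojan.kdm.wcss.pl
init:wn1016.ib.trojan.kdm.wcss.pl
1 :  :  143 : 1 : 1 : 0
2 :  :  143 : 1 : 2 : 0
EOF

# Assumed layout of a healthy file: field 2 holds the machine name.
cat > "$tmp/expected" <<'EOF'
init:wn0975.ib.trojan.kdm.wcss.pl
init:wn1016.ib.trojan.kdm.wcss.pl
1 : wn0975 :  143 : 1 : 1 : 0
2 : wn1016 :  143 : 1 : 2 : 0
EOF

broken_out=$(second_fields "$tmp/broken")
expected_out=$(second_fields "$tmp/expected")

echo "broken:   '$broken_out'"
echo "expected: '$expected_out'"

rm -rf "$tmp"
```

With the blank field the extracted machine list is empty. That is consistent
with the trace quoted below, where the inner loop condition "1 <= 0" is never
true and the outer loop keeps spinning.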

On 23.04.2016 at 03:38, Maciej Polak wrote:
> Thank you very much for all your responses.
>
> I did some more testing to provide more information.
>
> 1. I tried a new compilation (since dr Gavin had no problems with my
> calculation, I thought it might have been a compilation issue) but
> nothing changed.
>
> 2. Adding "x" to the opticpara script (csh -fx) shows that it loops on:
>
> while ( 0 < 2 )
> set p = 1
> if ( 0 && 0 ) set p = 2
> while ( 1 < = 0 )
> end
>
> which corresponds to lines 213-246 (in opticpara):
>
> while ($loop < $maxproc)
>    set p = 1
>    if ($?residue && $?resok) set p = 2
>    while ($p <= $#machine)
> end
>
> I tracked down that line 126:
>
> set machine  = `grep -v $init .processes |grep : | grep -v $res | cut -f2 -d: | xargs`
>
> gives me nothing (the output of this command is just blank).
> It is supposed to take the second column from my .processes file
> (without the init:* lines), which in my case is empty:
>
> init:wn0975.ib.trojan.kdm.wcss.pl
> init:wn1016.ib.trojan.kdm.wcss.pl
> 1 :  :  143 : 1 : 1 : 0
> 2 :  :  143 : 1 : 2 : 0
>
> What is supposed to be in that column? Isn't that the node names?
> .processes is generated automatically from .machines, and my machines
> looks OK (and it works for previous calculations):
>
> granularity:1
> extrafine:1
> 1:wn0975.ib.trojan.kdm.wcss.pl:1
> 1:wn1016.ib.trojan.kdm.wcss.pl:1
>
> There is line 125:
>
> set machine  = `grep $init .processes |cut -f2 -d: | xargs`
>
> which is commented, but it would make more sense to use it here. I
> commented line 126, uncommented 125 and it seems to work now, but I
> don't know if it has any other consequences. Can I leave it like that?
> Someone wiser than me commented that line, and they probably had some
> reason for doing so.
>
> I'm not really sure what to do next. Any help would be appreciated.
> Please tell me if there is any other info that you might need.
>
> Best regards,
>
> Maciej Polak
>
>
>
> P.S. the answers to your other questions:
> 1. All the files that are created after "x optic -p" is executed:
>
> -rw------- 1 mpolak grant045     172 04-23 02:53 .script
> -rw------- 1 mpolak grant045      17 04-23 02:53 .running.100962.wn0926.2304025353
> -rw------- 1 mpolak grant045       8 04-23 02:53 .processes2
> -rw------- 1 mpolak grant045    7793 04-23 02:53 :parallel
> -rw------- 1 mpolak grant045       8 04-23 02:53 .opticpara
> -rw------- 1 mpolak grant045      28 04-23 02:53 optic.error
> -rw------- 1 mpolak grant045    1475 04-23 02:53 optic.def
> -rw------- 1 mpolak grant045    1495 04-23 02:53 optic_2.def
> -rw------- 1 mpolak grant045    1495 04-23 02:53 optic_1.def
> -rw------- 1 mpolak grant045    1115 04-23 02:53 .mist
> -rw------- 1 mpolak grant045    2449 04-23 02:53 :log
> -rw------- 1 mpolak grant045       5 04-23 02:53 .lapw1para
> -rw------- 1 mpolak grant045       0 04-23 02:53 lapw1.error
>
> 2. "ps -ef | grep optic" gives:
>
> mpolak   102451  97092  0 03:04 ?        00:00:00 /bin/csh -f /home/mpolak/WIEN2k/x optic -p
> mpolak   102465 102451 11 03:04 ?        00:00:03 /bin/csh -fx /home/mpolak/WIEN2k/opticpara optic.def
>
>
>
> On 04/22/2016 07:27 AM, Peter Blaha wrote:
>> First one needs detailed information on which files are really
>> generated, in order to see where it gets stuck.
>> ls -alsrt  lists the files with full information  (empty or non-empty
>> files, date+time of last write).
>>
>> Then you should do a ps -ef  and see what is running in connection
>> with optic  (maybe add |grep optic)
>>
>> If it does not start the parallel optic calculations, you may edit
>> opticpara and replace   -f by -fx in the first line of this script.
>>
>> It will give you a very lengthy, hard to read output, but basically
>> this should help to find the exact position/reason where it got stuck.
>>
>> PS: I guess you have tried to reproduce this in a fresh directory?
>>
>> On 22.04.2016 at 05:08, Gavin Abo wrote:
>>> If you haven't already done so, I would suggest looking at the content
>>> in the files .timeop_1, .timeop_2, ... , and .timeop_X (e.g., while in
>>> the case directory: cat .timeop_*), because an error message might be
>>> logged in these files for a parallel optic calculation.
>>>
>>> On 4/21/2016 3:44 PM, Maciej Polak wrote:
>>>> Dear WIEN2k Community,
>>>>
>>>> I want to calculate the joint density of states but I ran into some
>>>> problems with parallel execution of x optic. I use only K-point
>>>> parallelization and run the newest 14.2 version of WIEN2k.
>>>>
>>>> When I do sequential calculations, it all works fine. But for bigger
>>>> cases, and many K-points it is impossible to finish on one CPU. After
>>>> I add the -p flag to the relevant procedures, the last output I see
>>>> is: running OPTIC in parallel mode. From then, nothing happens. The
>>>> optic_X.def files are generated, and an optic.error file containing
>>>> "Error in Parallel OPTIC", nothing else. The code just stands still
>>>> after that, no activity on CPUs.
>>>>
>>>> A simple minimalistic example to reproduce the error:
>>>>
>>>> init_lapw -bw -vxc 5 -rkmax 7 -numk 1000 -red 2
>>>> run_lapw -p
>>>> x kgen <<< 10000
>>>> x lapw1 -p
>>>> x lapw2 -fermi -p
>>>> x optic -p
>>>>
>>>> The same set of calculations, without the -p flag, would work just
>>>> fine. However, when I generate a bigger k-mesh and have a large number
>>>> of atoms it is absolutely impossible to perform the calculations on a
>>>> single core.
>>>>
>>>> Regular k-point calculations (geometry optimization, bandstructures,
>>>> etc.) work perfectly.
>>>>
>>>> I attached my *.struct and *.inop, but they are not the problem in
>>>> this case, since they work with sequential version as intended. This
>>>> is just a super simple FCC Si calculation just for testing.
>>>>
>>>> I would really appreciate any help. I tried to read through the
>>>> mailing list, but couldn't find a similar problem.
>>>>
>>>> Best regards,
>>>>
>>>> Maciej Polak
>>>> Wroclaw University of Science and Technology
>>> _______________________________________________
>>> Wien mailing list
>>> Wien at zeus.theochem.tuwien.ac.at
>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>> SEARCH the MAILING-LIST at:
>>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>
>
>
>

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--------------------------------------------------------------------------

