[Wien] parallel wien2k

Yurko Natanzon yurko.natanzon at gmail.com
Tue Feb 23 13:11:02 CET 2010


Try removing the "lapw0" line from the .machines file so that it reads:

1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1

If that does not work, also try running lapw0 in serial mode:
lapw0:nx1
1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1
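
To double-check what a given .machines file actually requests (how many k-point jobs, whether lapw0 has its own line), a small parser helps. This is only a minimal sketch in Python, not a WIEN2k tool. It assumes the simple `weight:host`, `lapw0:hosts` and `keyword:value` lines used in this thread, and the names `parse_machines` and `MachinesSummary` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MachinesSummary:
    lapw0_hosts: list = field(default_factory=list)  # hosts on a "lapw0:" line
    kpoint_jobs: list = field(default_factory=list)  # (weight, [hosts]) per "N:host" line
    settings: dict = field(default_factory=dict)     # any other "key:value" line


def parse_machines(text):
    """Summarize .machines text using a simplified reading of its syntax."""
    summary = MachinesSummary()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no settings
        key, _, value = line.partition(":")
        key = key.strip()
        if key.isdigit():
            # "weight:host [host ...]" is one k-point parallel job;
            # several hosts on one line are kept together as one job.
            summary.kpoint_jobs.append((int(key), value.split()))
        elif key == "lapw0":
            summary.lapw0_hosts = value.split()
        else:
            summary.settings[key] = value.strip()
    return summary


if __name__ == "__main__":
    # The second .machines file suggested above (lapw0 only on nx1).
    example = "lapw0:nx1\n1:nx1\n1:nx1\n1:nx62\n1:nx62\ngranularity:1\nextrafine:1\n"
    s = parse_machines(example)
    print(len(s.kpoint_jobs), s.lapw0_hosts, s.settings)
    # prints: 4 ['nx1'] {'granularity': '1', 'extrafine': '1'}
```

Any other `keyword:` line, such as the `lapw1:` line from the original file, simply ends up in `settings`; the sketch does not judge whether WIEN2k accepts it.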

Also, take a look at the scripts that generate a proper .machines
file: http://www.wien2k.at/reg_user/faq/pbs.html
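
The scripts on that FAQ page are the reference. Purely to illustrate the idea, here is a minimal Python sketch (not one of those scripts) that builds a k-point-parallel .machines file from the host list PBS provides in $PBS_NODEFILE. It assumes one host name per line in that file and writes one `1:host` line per entry; the function names are hypothetical, and the optional `lapw0:` line mirrors the second variant above.

```python
import os


def hosts_from_nodefile(nodefile):
    """Read host names, one per line, skipping blank lines."""
    with open(nodefile) as fh:
        return [line.strip() for line in fh if line.strip()]


def format_machines(hosts, lapw0_host=None, granularity=1, extrafine=1):
    """Build .machines text with one "1:host" k-point line per host entry.

    A host listed several times in the node file gets several k-point
    lines, as in the .machines files in this thread.
    """
    lines = []
    if lapw0_host is not None:
        lines.append(f"lapw0:{lapw0_host}")  # optional lapw0 line on one host
    lines += [f"1:{host}" for host in hosts]
    lines.append(f"granularity:{granularity}")
    lines.append(f"extrafine:{extrafine}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        with open(".machines", "w") as fh:
            fh.write(format_machines(hosts_from_nodefile(nodefile)))
    else:
        # Not inside a PBS job: just print the file for the hosts in this thread.
        print(format_machines(["nx1", "nx1", "nx62", "nx62"]), end="")
```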

regards,
Yurko

On 23 February 2010 06:24, Zhiyong Zhang <zyzhang at stanford.edu> wrote:
> OK. Here are some more clues about the problem:
>
> forrtl: severe (64): input conversion error, unit 19, file /home/zzhang/wien2k-runs/lapw/TiC/TiC.vns
> Image              PC                Routine            Line        Source
> lapw1              00000000004E6F1E  Unknown               Unknown  Unknown
> lapw1              00000000004E611A  Unknown               Unknown  Unknown
> lapw1              000000000049FB76  Unknown               Unknown  Unknown
> lapw1              000000000046D75A  Unknown               Unknown  Unknown
> lapw1              000000000046CD76  Unknown               Unknown  Unknown
> lapw1              0000000000486885  Unknown               Unknown  Unknown
> lapw1              00000000004540F8  rdswar_                    29  rdswar_tmp_.F
> lapw1              0000000000435FD3  inilpw_                   393  inilpw.f
> lapw1              0000000000438224  MAIN__                     41  lapw1_tmp_.F
> lapw1              0000000000404422  Unknown               Unknown  Unknown
> libc.so.6          0000003E1251C40B  Unknown               Unknown  Unknown
> lapw1              000000000040436A  Unknown               Unknown  Unknown
>
> I checked TiC.vns from the parallel calculation and found the following (please note the NaN entry):
>
>     TOTAL POTENTIAL IN INTERSTITIAL
>
>                136 NUMBER OF PW
>       0    0    0 NaN                0.000000000000E+00
>      -1   -1   -1 0.966480192428E-08 0.000000000000E+00
>       0    0   -2 0.237305964226E-06 0.000000000000E+00
>       0   -2   -2 0.383070560427E-08 0.000000000000E+00
>      -1   -1   -3-0.108089242452E-08 0.000000000000E+00
>
> However, in TiC.vns from the serial run, which seems to have worked fine, I found the following:
>
>     TOTAL POTENTIAL IN INTERSTITIAL
>
>                136 NUMBER OF PW
>       0    0    0-0.227173083856E-01 0.000000000000E+00
>      -1   -1   -1 0.114592956480E-02 0.000000000000E+00
>       0    0   -2-0.115420958078E-01 0.000000000000E+00
>       0   -2   -2 0.184312999415E-01 0.000000000000E+00
>      -1   -1   -3-0.137802961139E-03 0.000000000000E+00
>      -2   -2   -2-0.285539143809E-02 0.000000000000E+00
>
> Does anybody have any clue about the problem?
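
The traceback shows lapw1 failing while reading TiC.vns (unit 19), and TiC.vns is written by lapw0, so it is worth checking right after "lapw0 -p" whether the file already contains bad numbers. A minimal Python sketch for that check, assuming the non-finite values appear as `NaN` or `Inf`/`Infinity` tokens (other forms, such as Fortran's `*****` overflow fields, are not caught); the script name check_nan.py is hypothetical:

```python
import re
import sys

# A "NaN", "Inf" or "Infinity" token that is not part of a longer word.
BAD_VALUE = re.compile(r"(?<![A-Za-z])(nan|inf(inity)?)(?![A-Za-z])", re.IGNORECASE)


def find_bad_values(lines):
    """Return (line_number, text) for each line holding a NaN/Inf token."""
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start=1)
            if BAD_VALUE.search(line)]


if __name__ == "__main__":
    # Usage: python check_nan.py TiC.vns [other files ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for n, text in find_bad_values(fh):
                print(f"{path}:{n}: {text}")
```

Run on both the serial and the parallel TiC.vns, only the parallel one should report the line with the NaN.
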
>
> Thanks again,
>
> Zhiyong
>
>
> ----- Original Message -----
> From: "Zhiyong Zhang" <zyzhang at stanford.edu>
> To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> Sent: Monday, February 22, 2010 9:15:44 PM GMT -08:00 US/Canada Pacific
> Subject: Re: [Wien] parallel wien2k
>
> Hello Ricardo and All,
>
> Thank you for the information. I think you are right that part of the problem is that no forces are printed. The example I am using is TiC from the user guide. When I used "run_lapw -i 40 0.001 -I" in serial mode, it worked fine.
>
> The failure of "/home/zzhang/wien2k/lapw1para lapw1.def" seems to be due to the .machines file. If I remove the "lapw1:nx1  nx1  nx62  nx62" line from the .machines file and use the following .machines file:
>
> lapw0:nx1  nx1  nx62  nx62
> 1:nx1
> 1:nx1
> 1:nx62
> 1:nx62
> granularity:1
> extrafine:1
>
> Then LAPW1 runs in parallel.
>
> Does this mean that lapw1/2 can only be run in k-point parallel mode, not in fine-grained MPI mode?
>
> However, I still got the following error in TiC.dayfile:
>
> 4 number_of_parallel_jobs
>     nx1(11) 0.226u 0.017s 0.31 76.18%      0+0k 0+0io 0pf+0w
>     nx1(11) 0.224u 0.009s 0.31 73.04%      0+0k 0+0io 0pf+0w
>     nx62(11) 0.222u 0.008s 0.32 71.21%      0+0k 0+0io 0pf+0w
>     nx62(11) 0.222u 0.010s 0.26 88.21%      0+0k 0+0io 0pf+0w
>     nx1(1) 0.224u 0.008s 0.26 88.89%      0+0k 0+0io 0pf+0w
>     nx1(1) 0.223u 0.008s 0.26 88.17%      0+0k 0+0io 0pf+0w
>     nx62(1) 0.222u 0.009s 0.26 86.19%      0+0k 0+0io 0pf+0w
> **  LAPW1 crashed!
> 0.062u 0.436s 0:11.45 4.2%      0+0k 0+0io 0pf+0w
> error: command   /home/zzhang/wien2k/lapw1para lapw1.def   failed
>
> Which files should I read to find possible causes of the crash? I looked at the *.error files but cannot find anything useful.
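
A minimal Python sketch for a first look: it lists only the non-empty `*.error` files in the case directory and prints them. It assumes the error files sit in the directory it is run from (or the one given as argument) and that an empty .error file means that step reported nothing; messages the Fortran runtime writes elsewhere, such as the `forrtl` traceback quoted above, are not read. The script name show_errors.py and the function name are hypothetical.

```python
import sys
from pathlib import Path


def nonempty_error_files(case_dir="."):
    """Return the non-empty *.error files in case_dir, sorted by name."""
    return sorted(p for p in Path(case_dir).glob("*.error")
                  if p.is_file() and p.stat().st_size > 0)


if __name__ == "__main__":
    # Usage: python show_errors.py [case_directory]
    case_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in nonempty_error_files(case_dir):
        print(f"==> {path.name} <==")
        print(path.read_text(errors="replace"), end="")
```
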
>
> Best,
> Zhiyong
>
>
>
> ----- Original Message -----
> From: "Ricardo Faccio" <rfaccio at fq.edu.uy>
> To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> Sent: Monday, February 22, 2010 8:28:35 PM GMT -08:00 US/Canada Pacific
> Subject: Re: [Wien] parallel wien2k
>
> Hi Zhiyong
> What is your test case? Remember that forces are printed if you have atoms
> located in general positions. For example, Fe in the bcc structure will
> not print forces, since all atoms have the same symmetric environment.
> Regards
> Ricardo
>
> --
>  ------------------------------------------------------------------------------
>    Dr. Ricardo Faccio
>
>  Mail: Cryssmat-Lab., Cátedra de Física, DETEMA
>  Facultad de Química, Universidad de la República
>       Av. Gral. Flores 2124, C.C. 1157
>       C.P. 11800, Montevideo, Uruguay.
>  E-mail: rfaccio at fq.edu.uy
>  Phone: 598 2 9241860 Int. 109
>             598 2 9290705
>  Fax:    598 2 9241906
>  Web:  http://cryssmat.fq.edu.uy/ricardo/ricardo.htm
>
>> Dear All,
>>
>> I am trying to test WIEN2k in parallel mode and have run into a problem. I
>> am using
>>
>> run_lapw -p -i 40 -fc 0.001 -I
>>
>> If I use a value of 0.001 for the -fc option, as above, I get the following
>> error:
>>
>> Force-convergence not possible. Forces not present.
>>
>> If I do not give a value for the -fc option, but use "run_lapw -p -i 40
>> -fc -I" instead, lapw0 finishes without a problem, but the program does not
>> branch to lapw1. An error message is generated at the test
>>
>> if ($fcut == "0") goto lapw1
>>
>> I was able to run "run_lapw -p -i 40 -I" without the "-fc" option at all;
>> it finished "lapw0 -p" and started "lapw1 -p", but then I got the
>> following error:
>>
>> error: command   /home/zzhang/wien2k/lapw1para lapw1.def   failed
>>
>> Has anybody had a similar problem, and does anyone know how to fix it?
>>
>> The program prints:
>>
>> running LAPW1 in parallel mode (using .machines)
>>
>> and the .machines file is as follows:
>>
>> #
>> lapw0:nx1  nx1  nx62  nx62
>> lapw1:nx1  nx1  nx62  nx62
>> lapw2:nx1  nx1  nx62  nx62
>> 1:nx1
>> 1:nx1
>> 1:nx62
>> 1:nx62
>> granularity:1
>> extrafine:1
>>
>> Thanks,
>> Zhiyong
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>
>
>



-- 
Yurko (aka Yuriy, Iurii, Jurij etc) Natanzon
PhD student
Department for Structural Research (NZ31)
Henryk Niewodniczański Institute of Nuclear Physics
Polish Academy of Sciences
ul. Radzikowskiego 152,
31-342 Krakow, Poland
E-mail: Yurii.Natanzon at ifj.edu.pl, yurko.natanzon at gmail.com

