[Wien] parallel wien2k
Yurko Natanzon
yurko.natanzon at gmail.com
Tue Feb 23 13:11:02 CET 2010
Try removing the "lapw0" line from the .machines file, so that it reads:
1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1
If that does not work, also try running lapw0 in serial mode:
lapw0:nx1
1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1
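To tell whether either variant helps, it may be enough to check the case.vns file that lapw0 writes before lapw1 tries to read it (the quoted message below shows a NaN in TiC.vns from the parallel run). Here is a minimal Python sketch for that check. The function name find_nan_lines is hypothetical and not part of WIEN2k; the script only scans text for NaN tokens and does not parse the file format.

```python
import re
import sys

# Matches "NaN" in any letter case when it is not part of a longer word.
NAN_TOKEN = re.compile(r"(?<![A-Za-z])nan(?![A-Za-z])", re.IGNORECASE)


def find_nan_lines(path):
    """Return (line_number, text) for every line that contains a NaN token."""
    hits = []
    with open(path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if NAN_TOKEN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage (assumed): python check_nan.py TiC.vns
    for path in sys.argv[1:]:
        for number, text in find_nan_lines(path):
            print(f"{path}:{number}: {text}")
```

If the script prints nothing for the .vns file of the parallel run, the potential from lapw0 is at least free of NaN entries.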
Also, take a look at the scripts that generate a proper .machines
file: http://www.wien2k.at/reg_user/faq/pbs.html
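As a rough illustration of what such a script does (this is not the script from the FAQ page), here is a small Python sketch that writes a k-point-parallel .machines file in the same layout as the examples above. The names build_machines and read_hosts are hypothetical, and reading the host list from the file named by $PBS_NODEFILE is an assumption about the queuing system.

```python
import os


def read_hosts(nodefile):
    """Read one host name per line, skipping blank lines."""
    with open(nodefile) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_machines(hosts, lapw0_host=None, granularity=1, extrafine=1):
    """Return .machines text with one '1:host' line per k-point job.

    If lapw0_host is given, a 'lapw0:host' line is written first;
    otherwise no lapw0 line is written at all.
    """
    lines = []
    if lapw0_host is not None:
        lines.append(f"lapw0:{lapw0_host}")
    lines.extend(f"1:{host}" for host in hosts)
    lines.append(f"granularity:{granularity}")
    lines.append(f"extrafine:{extrafine}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: under PBS the allocated hosts are listed in $PBS_NODEFILE.
    nodefile = os.environ.get("PBS_NODEFILE")
    hosts = read_hosts(nodefile) if nodefile else ["nx1", "nx1", "nx62", "nx62"]
    print(build_machines(hosts, lapw0_host=hosts[0]), end="")
```

Redirecting the output to .machines in the case directory gives a file equivalent to the second example above when the hosts are nx1, nx1, nx62, nx62.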
regards,
Yurko
On 23 February 2010 06:24, Zhiyong Zhang <zyzhang at stanford.edu> wrote:
> OK. Here are some more clues about the problem:
>
> forrtl: severe (64): input conversion error, unit 19, file /home/zzhang/wien2k-runs/lapw/TiC/TiC.vns
> Image PC Routine Line Source
> lapw1 00000000004E6F1E Unknown Unknown Unknown
> lapw1 00000000004E611A Unknown Unknown Unknown
> lapw1 000000000049FB76 Unknown Unknown Unknown
> lapw1 000000000046D75A Unknown Unknown Unknown
> lapw1 000000000046CD76 Unknown Unknown Unknown
> lapw1 0000000000486885 Unknown Unknown Unknown
> lapw1 00000000004540F8 rdswar_ 29 rdswar_tmp_.F
> lapw1 0000000000435FD3 inilpw_ 393 inilpw.f
> lapw1 0000000000438224 MAIN__ 41 lapw1_tmp_.F
> lapw1 0000000000404422 Unknown Unknown Unknown
> libc.so.6 0000003E1251C40B Unknown Unknown Unknown
> lapw1 000000000040436A Unknown Unknown Unknown
>
> I checked the TiC.vns in the parallel calculation and found the following (Please note the NaN entries):
>
> TOTAL POTENTIAL IN INTERSTITIAL
>
> 136 NUMBER OF PW
> 0 0 0 NaN 0.000000000000E+00
> -1 -1 -1 0.966480192428E-08 0.000000000000E+00
> 0 0 -2 0.237305964226E-06 0.000000000000E+00
> 0 -2 -2 0.383070560427E-08 0.000000000000E+00
> -1 -1 -3-0.108089242452E-08 0.000000000000E+00
>
> However, in the TiC.vns from the serial run, which seems to have worked fine, I found the following:
>
> TOTAL POTENTIAL IN INTERSTITIAL
>
> 136 NUMBER OF PW
> 0 0 0-0.227173083856E-01 0.000000000000E+00
> -1 -1 -1 0.114592956480E-02 0.000000000000E+00
> 0 0 -2-0.115420958078E-01 0.000000000000E+00
> 0 -2 -2 0.184312999415E-01 0.000000000000E+00
> -1 -1 -3-0.137802961139E-03 0.000000000000E+00
> -2 -2 -2-0.285539143809E-02 0.000000000000E+00
>
> Does anybody have any clue about the problem?
>
> Thanks again,
>
> Zhiyong
>
>
> ----- Original Message -----
> From: "Zhiyong Zhang" <zyzhang at stanford.edu>
> To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> Sent: Monday, February 22, 2010 9:15:44 PM GMT -08:00 US/Canada Pacific
> Subject: Re: [Wien] parallel wien2k
>
> Hello Ricardo and All,
>
> Thank you for the information. I think you are right that part of the problem is that no forces are printed. The example I am using is the TiC case from the user guide. When I used "run_lapw -i 40 0.001 -I" in serial mode, it worked fine.
>
> The problem "/home/zzhang/wien2k/lapw1para lapw1.def" seems to be due to the .machines file definition. If I remove the "lapw1:nx1 nx1 nx62 nx62" line from the .machines file and use the following .machines file,
>
> lapw0:nx1 nx1 nx62 nx62
> 1:nx1
> 1:nx1
> 1:nx62
> 1:nx62
> granularity:1
> extrafine:1
>
> then LAPW1 can run in parallel.
>
> Does this mean that lapw1/2 can only be run in k-point parallel mode, not fine grain MPI mode?
>
> However, I still got the following error in TiC.dayfile:
>
> 4 number_of_parallel_jobs
> nx1(11) 0.226u 0.017s 0.31 76.18% 0+0k 0+0io 0pf+0w
> nx1(11) 0.224u 0.009s 0.31 73.04% 0+0k 0+0io 0pf+0w
> nx62(11) 0.222u 0.008s 0.32 71.21% 0+0k 0+0io 0pf+0w
> nx62(11) 0.222u 0.010s 0.26 88.21% 0+0k 0+0io 0pf+0w
> nx1(1) 0.224u 0.008s 0.26 88.89% 0+0k 0+0io 0pf+0w
> nx1(1) 0.223u 0.008s 0.26 88.17% 0+0k 0+0io 0pf+0w
> nx62(1) 0.222u 0.009s 0.26 86.19% 0+0k 0+0io 0pf+0w
> ** LAPW1 crashed!
> 0.062u 0.436s 0:11.45 4.2% 0+0k 0+0io 0pf+0w
> error: command /home/zzhang/wien2k/lapw1para lapw1.def failed
>
> Which files should I read to find possible causes of the crash? I looked at the *.error files but can't seem to find anything useful.
>
> Best,
> Zhiyong
>
>
>
> ----- Original Message -----
> From: "Ricardo Faccio" <rfaccio at fq.edu.uy>
> To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> Sent: Monday, February 22, 2010 8:28:35 PM GMT -08:00 US/Canada Pacific
> Subject: Re: [Wien] parallel wien2k
>
> Hi Zhiyong
> What is your test case? Remember that forces are printed only if you have atoms
> located in general positions. For example, Fe in the bcc space group will
> not print forces, since all atoms have the same symmetric environment.
> Regards
> Ricardo
>
> --
> -------------------------------------------------------------------------
> ----- Dr. Ricardo Faccio
>
> Mail: Cryssmat-Lab., Cátedra de Física, DETEMA
> Facultad de Química, Universidad de la República
> Av. Gral. Flores 2124, C.C. 1157
> C.P. 11800, Montevideo, Uruguay.
> E-mail: rfaccio at fq.edu.uy
> Phone: 598 2 9241860 Int. 109
> 598 2 9290705
> Fax: 598 2 9241906
> Web: http://cryssmat.fq.edu.uy/ricardo/ricardo.htm
>
>> Dear All,
>>
>> I am trying to test wien2k in parallel mode and I ran into a problem. I
>> am using
>>
>> run_lapw -p -i 40 -fc 0.001 -I
>>
>> If I use a value of 0.001 for the -fc option above, I get the following
>> error:
>>
>> Force-convergence not possible. Forces not present.
>>
>> If I do not give a value for the -fc option, but use "run_lapw -p -i 40
>> -fc -I" instead,
>> then lapw0 finishes without a problem but the program doesn't branch to
>> lapw1. An error message is generated when doing the test
>>
>> "if ($fcut == "0") goto lapw1
>>
>> I was able to run "run_lapw -p -i 40 -I", without the "-fc" option at all,
>> and was able to finish "lapw0 -p" and then start "lapw1 -p", but ran into
>> the following error:
>>
>> error: command /home/zzhang/wien2k/lapw1para lapw1.def failed
>>
>> Does anybody have similar problems and know how to fix this?
>>
>> It does the following:
>>
>> running LAPW1 in parallel mode (using .machines)
>>
>> and the .machines file is as follows:
>>
>> #
>> lapw0:nx1 nx1 nx62 nx62
>> lapw1:nx1 nx1 nx62 nx62
>> lapw2:nx1 nx1 nx62 nx62
>> 1:nx1
>> 1:nx1
>> 1:nx62
>> 1:nx62
>> granularity:1
>> extrafine:1
>>
>> Thanks,
>>
>> Zhiyong
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>
>
>
--
Yurko (aka Yuriy, Iurii, Jurij etc) Natanzon
PhD student
Department for Structural Research (NZ31)
Henryk Niewodniczański Institute of Nuclear Physics
Polish Academy of Sciences
ul. Radzikowskiego 152,
31-342 Krakow, Poland
E-mail: Yurii.Natanzon at ifj.edu.pl, yurko.natanzon at gmail.com