[Wien] parallel wien2k
Zhiyong Zhang
zyzhang at stanford.edu
Tue Feb 23 06:24:33 CET 2010
OK. Here are some more clues about the problem:
forrtl: severe (64): input conversion error, unit 19, file /home/zzhang/wien2k-runs/lapw/TiC/TiC.vns
Image PC Routine Line Source
lapw1 00000000004E6F1E Unknown Unknown Unknown
lapw1 00000000004E611A Unknown Unknown Unknown
lapw1 000000000049FB76 Unknown Unknown Unknown
lapw1 000000000046D75A Unknown Unknown Unknown
lapw1 000000000046CD76 Unknown Unknown Unknown
lapw1 0000000000486885 Unknown Unknown Unknown
lapw1 00000000004540F8 rdswar_ 29 rdswar_tmp_.F
lapw1 0000000000435FD3 inilpw_ 393 inilpw.f
lapw1 0000000000438224 MAIN__ 41 lapw1_tmp_.F
lapw1 0000000000404422 Unknown Unknown Unknown
libc.so.6 0000003E1251C40B Unknown Unknown Unknown
lapw1 000000000040436A Unknown Unknown Unknown
I checked TiC.vns from the parallel calculation and found the following (please note the NaN entries):
TOTAL POTENTIAL IN INTERSTITIAL
136 NUMBER OF PW
0 0 0 NaN 0.000000000000E+00
-1 -1 -1 0.966480192428E-08 0.000000000000E+00
0 0 -2 0.237305964226E-06 0.000000000000E+00
0 -2 -2 0.383070560427E-08 0.000000000000E+00
-1 -1 -3-0.108089242452E-08 0.000000000000E+00
However, in TiC.vns from the serial run, which seems to have worked fine, I found the following:
TOTAL POTENTIAL IN INTERSTITIAL
136 NUMBER OF PW
0 0 0-0.227173083856E-01 0.000000000000E+00
-1 -1 -1 0.114592956480E-02 0.000000000000E+00
0 0 -2-0.115420958078E-01 0.000000000000E+00
0 -2 -2 0.184312999415E-01 0.000000000000E+00
-1 -1 -3-0.137802961139E-03 0.000000000000E+00
-2 -2 -2-0.285539143809E-02 0.000000000000E+00
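Before rerunning lapw1, a potential file such as TiC.vns can be checked for non-finite values with a small script. This is a minimal Python sketch, not part of WIEN2k. It assumes only that the file is plain text with Fortran E-format reals, which may be fused to the preceding integer field as in "-1 -1 -3-0.108089242452E-08" above. The names find_nonfinite and scan_file are hypothetical.

```python
import math
import re
from pathlib import Path

# A real in Fortran E/D format, e.g. 0.966480192428E-08. No leading blank is
# required, so values fused to the previous integer field are still found.
_REAL = r"[-+]?\d+\.\d*(?:[EeDd][-+]?\d+)?"
# A non-finite value written as a word: NaN, Inf or Infinity (any case).
_NONFINITE = r"(?<![A-Za-z])[-+]?(?:nan|inf(?:inity)?)(?![A-Za-z])"
NUMBER = re.compile(_REAL + "|" + _NONFINITE, re.IGNORECASE)


def find_nonfinite(lines):
    """Return (line_number, text) for each line holding a NaN or infinite real."""
    flagged = []
    for lineno, line in enumerate(lines, start=1):
        for token in NUMBER.findall(line):
            # Python's float() does not accept the Fortran D exponent.
            value = float(token.replace("D", "E").replace("d", "e"))
            if not math.isfinite(value):
                flagged.append((lineno, line.rstrip()))
                break  # one report per line is enough
    return flagged


def scan_file(path):
    """Scan a text file such as TiC.vns; returns the same pairs as find_nonfinite."""
    return find_nonfinite(Path(path).read_text(errors="replace").splitlines())
```

Usage: `for lineno, text in scan_file("TiC.vns"): print(lineno, text)`. Integer fields (no decimal point) are skipped on purpose, so only the real-valued coefficients are checked.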
Does anybody have any clue about the problem?
Thanks again,
Zhiyong
----- Original Message -----
From: "Zhiyong Zhang" <zyzhang at stanford.edu>
To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
Sent: Monday, February 22, 2010 9:15:44 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Wien] parallel wien2k
Hello Ricardo and All,
Thank you for the information. I think you are right that part of the problem is that no forces are printed. The example I am using is TiC from the user guide. When I used "run_lapw -i 40 0.001 -I" in serial mode, it worked fine.
The "error: command /home/zzhang/wien2k/lapw1para lapw1.def failed" problem seems to be due to the .machines file definition. If I remove the "lapw1:nx1 nx1 nx62 nx62" line from the .machines file and use the following .machines file:
lapw0:nx1 nx1 nx62 nx62
1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1
Then LAPW1 can run in parallel.
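The working .machines file above can also be generated from a host list, which avoids hand-editing mistakes when nodes change. The helper below is a hypothetical Python sketch (not part of WIEN2k) that reproduces exactly the layout shown: an optional lapw0 line, one "1:host" line per k-point job, and the granularity/extrafine lines. It deliberately writes no lapw1/lapw2 lines, since removing them is what let LAPW1 start here, and it does not attempt a fine-grained MPI setup.

```python
def machines_file(kpoint_hosts, lapw0_hosts=None, granularity=1, extrafine=1):
    """Build .machines text for k-point parallel lapw1/lapw2 (one "1:host" line per job)."""
    lines = []
    if lapw0_hosts:
        # Hosts on which lapw0 is run in parallel.
        lines.append("lapw0:" + " ".join(lapw0_hosts))
    # One line per k-point job; the leading "1" is assumed to be the
    # relative speed weight of that host, as in the file above.
    lines.extend(f"1:{host}" for host in kpoint_hosts)
    lines.append(f"granularity:{granularity}")
    lines.append(f"extrafine:{extrafine}")
    return "\n".join(lines) + "\n"
```

For example, `machines_file(["nx1", "nx1", "nx62", "nx62"], lapw0_hosts=["nx1", "nx1", "nx62", "nx62"])` returns the same seven lines as the file quoted above, ready to be written to .machines in the case directory.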
Does this mean that lapw1/2 can only be run in k-point parallel mode, not in fine-grained MPI mode?
However, I still got the following error in TiC.dayfile:
4 number_of_parallel_jobs
nx1(11) 0.226u 0.017s 0.31 76.18% 0+0k 0+0io 0pf+0w
nx1(11) 0.224u 0.009s 0.31 73.04% 0+0k 0+0io 0pf+0w
nx62(11) 0.222u 0.008s 0.32 71.21% 0+0k 0+0io 0pf+0w
nx62(11) 0.222u 0.010s 0.26 88.21% 0+0k 0+0io 0pf+0w
nx1(1) 0.224u 0.008s 0.26 88.89% 0+0k 0+0io 0pf+0w
nx1(1) 0.223u 0.008s 0.26 88.17% 0+0k 0+0io 0pf+0w
nx62(1) 0.222u 0.009s 0.26 86.19% 0+0k 0+0io 0pf+0w
** LAPW1 crashed!
0.062u 0.436s 0:11.45 4.2% 0+0k 0+0io 0pf+0w
error: command /home/zzhang/wien2k/lapw1para lapw1.def failed
Which files should I read to find possible causes of the crash? I looked at the *.error files but couldn't find anything useful.
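When a parallel run leaves many *.error files behind, it helps to list only the ones that actually contain text. This is a minimal Python sketch that assumes nothing beyond the *.error suffix mentioned above; the function name nonempty_error_files is hypothetical. An empty result suggests the message may have gone elsewhere, for example to the terminal or the batch-job output.

```python
from pathlib import Path


def nonempty_error_files(case_dir):
    """Return {filename: contents} for every *.error file in case_dir that is not empty."""
    found = {}
    for path in sorted(Path(case_dir).glob("*.error")):
        # Ignore undecodable bytes rather than failing on a corrupt file.
        text = path.read_text(errors="replace").strip()
        if text:  # skip files that are empty or whitespace-only
            found[path.name] = text
    return found
```

Usage: run `print(nonempty_error_files("/home/zzhang/wien2k-runs/lapw/TiC"))` after the crash.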
Best,
Zhiyong
----- Original Message -----
From: "Ricardo Faccio" <rfaccio at fq.edu.uy>
To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
Sent: Monday, February 22, 2010 8:28:35 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Wien] parallel wien2k
Hi Zhiyong
What is your test case? Remember that forces are only printed if you have atoms
located in general positions. For example, Fe in the bcc structure will
not print forces, since all atoms have the same symmetric environment.
Regards
Ricardo
--
-------------------------------------------------------------------------
Dr. Ricardo Faccio
Mail: Cryssmat-Lab., Cátedra de Física, DETEMA
Facultad de Química, Universidad de la República
Av. Gral. Flores 2124, C.C. 1157
C.P. 11800, Montevideo, Uruguay.
E-mail: rfaccio at fq.edu.uy
Phone: 598 2 9241860 Int. 109
598 2 9290705
Fax: 598 2 9241906
Web: http://cryssmat.fq.edu.uy/ricardo/ricardo.htm
> Dear All,
>
> I am trying to test wien2k in parallel mode and ran into a problem. I
> am using
>
> run_lapw -p -i 40 -fc 0.001 -I
>
> If I use a value of 0.001 for the -fc option above, I get the following
> error:
>
> Force-convergence not possible. Forces not present.
>
> If I do not give a value for the -fc option and use
> "run_lapw -p -i 40 -fc -I" instead,
>
> then lapw0 finishes without a problem, but the program doesn't branch to
> lapw1. An error message is generated at the test
>
> if ($fcut == "0") goto lapw1
>
> I was able to run "run_lapw -p -i 40 -I" without the "-fc" option at all;
> "lapw0 -p" finished and "lapw1 -p" started, but then I got the
> following error:
>
> error: command /home/zzhang/wien2k/lapw1para lapw1.def failed
>
> Does anybody have similar problems and know how to fix this?
>
> It prints the following:
>
> running LAPW1 in parallel mode (using .machines)
>
> and the .machines file is as follows:
>
> #
> lapw0:nx1 nx1 nx62 nx62
> lapw1:nx1 nx1 nx62 nx62
> lapw2:nx1 nx1 nx62 nx62
> 1:nx1
> 1:nx1
> 1:nx62
> 1:nx62
> granularity:1
> extrafine:1
>
> Thanks,
>
> Zhiyong
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>