[Wien] parallel wien2k

Zhiyong Zhang zyzhang at stanford.edu
Tue Feb 23 06:24:33 CET 2010


OK. Here are some more clues about the problem: 

forrtl: severe (64): input conversion error, unit 19, file /home/zzhang/wien2k-runs/lapw/TiC/TiC.vns
Image              PC                Routine            Line        Source
lapw1              00000000004E6F1E  Unknown               Unknown  Unknown
lapw1              00000000004E611A  Unknown               Unknown  Unknown
lapw1              000000000049FB76  Unknown               Unknown  Unknown
lapw1              000000000046D75A  Unknown               Unknown  Unknown
lapw1              000000000046CD76  Unknown               Unknown  Unknown
lapw1              0000000000486885  Unknown               Unknown  Unknown
lapw1              00000000004540F8  rdswar_                    29  rdswar_tmp_.F
lapw1              0000000000435FD3  inilpw_                   393  inilpw.f
lapw1              0000000000438224  MAIN__                     41  lapw1_tmp_.F
lapw1              0000000000404422  Unknown               Unknown  Unknown
libc.so.6          0000003E1251C40B  Unknown               Unknown  Unknown
lapw1              000000000040436A  Unknown               Unknown  Unknown

I checked the TiC.vns in the parallel calculation and found the following (Please note the NaN entries): 

     TOTAL POTENTIAL IN INTERSTITIAL

                136 NUMBER OF PW
       0    0    0 NaN                0.000000000000E+00
      -1   -1   -1 0.966480192428E-08 0.000000000000E+00
       0    0   -2 0.237305964226E-06 0.000000000000E+00
       0   -2   -2 0.383070560427E-08 0.000000000000E+00
      -1   -1   -3-0.108089242452E-08 0.000000000000E+00

However, in the TiC.vns from the serial run, which seems to have worked fine, I found the following: 

     TOTAL POTENTIAL IN INTERSTITIAL

                136 NUMBER OF PW
       0    0    0-0.227173083856E-01 0.000000000000E+00
      -1   -1   -1 0.114592956480E-02 0.000000000000E+00
       0    0   -2-0.115420958078E-01 0.000000000000E+00
       0   -2   -2 0.184312999415E-01 0.000000000000E+00
      -1   -1   -3-0.137802961139E-03 0.000000000000E+00
      -2   -2   -2-0.285539143809E-02 0.000000000000E+00
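
A quick way to locate such entries is to scan the file for NaN tokens rather than split it into columns, because the fixed-width Fortran output glues negative numbers to the preceding field (as in "-3-0.108089242452E-08"). The following is only a minimal sketch in Python; the function name find_nan_lines and the two sample lines (copied from the parallel listing above) are illustrative assumptions, not part of WIEN2k.

```python
import re

# Match "NaN" as a token of its own, not as part of a longer word.
NAN_TOKEN = re.compile(r"(?<![A-Za-z])NaN(?![A-Za-z])", re.IGNORECASE)


def find_nan_lines(lines):
    """Return (line_number, text) for every line that contains a NaN token."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if NAN_TOKEN.search(text):
            hits.append((number, text.rstrip("\n")))
    return hits


# Two lines taken from the parallel-run TiC.vns excerpt above.
sample = [
    "       0    0    0 NaN                0.000000000000E+00\n",
    "      -1   -1   -3-0.108089242452E-08 0.000000000000E+00\n",
]
hits = find_nan_lines(sample)
for number, text in hits:
    print(f"{number}: {text}")
```

For a real file, the same function can be given an open file handle, for example find_nan_lines(open("TiC.vns")).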
 
Does anybody have any clue about the problem? 

Thanks again, 

Zhiyong


----- Original Message -----
From: "Zhiyong Zhang" <zyzhang at stanford.edu>
To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
Sent: Monday, February 22, 2010 9:15:44 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Wien] parallel wien2k

Hello Ricardo and All, 

Thank you for the information. I think you are right that part of the problem is that no forces are printed. The example I am using is TiC from the user guide. When I used "run_lapw -i 40 0.001 -I" in serial mode, it worked fine. 

The problem with "/home/zzhang/wien2k/lapw1para lapw1.def" seems to be due to the .machines file definition. If I remove the "lapw1:nx1  nx1  nx62  nx62" line from the .machines file and use the following .machines file,  

lapw0:nx1  nx1  nx62  nx62  
1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1

then LAPW1 can run in parallel. 
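
To compare two .machines variants line by line, the entries can be grouped by what precedes the colon. This is only a rough sketch in Python: the function names parse_machines and summarize are made up for illustration, the grouping is purely syntactic, and it makes no claim about how the WIEN2k scripts themselves interpret each line.

```python
def parse_machines(text):
    """Split .machines-style text into (key, values) pairs, skipping comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        entries.append((key.strip(), value.split()))
    return entries


def summarize(entries):
    """Group entries by the form of their key (syntactic grouping only)."""
    summary = {"program": [], "weighted": [], "setting": []}
    for key, values in entries:
        if key.startswith("lapw"):
            summary["program"].append((key, values))       # e.g. lapw0:...
        elif key.isdigit():
            summary["weighted"].append((int(key), values))  # e.g. 1:nx1
        else:
            summary["setting"].append((key, values))        # e.g. granularity:1
    return summary


# The .machines file quoted above, with which lapw1 ran in parallel.
working = """lapw0:nx1  nx1  nx62  nx62
1:nx1
1:nx1
1:nx62
1:nx62
granularity:1
extrafine:1
"""
summary = summarize(parse_machines(working))
for group, items in summary.items():
    print(group, items)
```

Running the same functions on the original file would additionally list the "lapw1:" and "lapw2:" lines under "program", which makes the difference between the two files easy to see.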

Does this mean that lapw1/2 can only be run in k-point parallel mode, not in fine-grained MPI mode? 

However, I still got the following error in TiC.dayfile: 

4 number_of_parallel_jobs
     nx1(11) 0.226u 0.017s 0.31 76.18%      0+0k 0+0io 0pf+0w
     nx1(11) 0.224u 0.009s 0.31 73.04%      0+0k 0+0io 0pf+0w
     nx62(11) 0.222u 0.008s 0.32 71.21%      0+0k 0+0io 0pf+0w
     nx62(11) 0.222u 0.010s 0.26 88.21%      0+0k 0+0io 0pf+0w
     nx1(1) 0.224u 0.008s 0.26 88.89%      0+0k 0+0io 0pf+0w
     nx1(1) 0.223u 0.008s 0.26 88.17%      0+0k 0+0io 0pf+0w
     nx62(1) 0.222u 0.009s 0.26 86.19%      0+0k 0+0io 0pf+0w
**  LAPW1 crashed!
0.062u 0.436s 0:11.45 4.2%      0+0k 0+0io 0pf+0w
error: command   /home/zzhang/wien2k/lapw1para lapw1.def   failed
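
One way to narrow this down is to count the timing lines per host and see which job did not report back. The sketch below is in Python and assumes that each finished job writes exactly one line of the form "host(n) ...u ...s ..." to the dayfile; the name count_finished_jobs is hypothetical and the input is the excerpt above.

```python
import re
from collections import Counter

# One timing line per finished job, e.g. "nx62(11) 0.222u 0.008s ...".
JOB_LINE = re.compile(r"^\s*(\S+)\((\d+)\)\s+[\d.]+u\s")


def count_finished_jobs(lines):
    """Count timing lines per (host, number-in-parentheses) pair."""
    counts = Counter()
    for text in lines:
        match = JOB_LINE.match(text)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


dayfile = """4 number_of_parallel_jobs
     nx1(11) 0.226u 0.017s 0.31 76.18%      0+0k 0+0io 0pf+0w
     nx1(11) 0.224u 0.009s 0.31 73.04%      0+0k 0+0io 0pf+0w
     nx62(11) 0.222u 0.008s 0.32 71.21%      0+0k 0+0io 0pf+0w
     nx62(11) 0.222u 0.010s 0.26 88.21%      0+0k 0+0io 0pf+0w
     nx1(1) 0.224u 0.008s 0.26 88.89%      0+0k 0+0io 0pf+0w
     nx1(1) 0.223u 0.008s 0.26 88.17%      0+0k 0+0io 0pf+0w
     nx62(1) 0.222u 0.009s 0.26 86.19%      0+0k 0+0io 0pf+0w
**  LAPW1 crashed!
0.062u 0.436s 0:11.45 4.2%      0+0k 0+0io 0pf+0w
"""
counts = count_finished_jobs(dayfile.splitlines())
for (host, n), seen in sorted(counts.items()):
    print(f"{host}({n}): {seen} line(s)")
```

On this excerpt every pair appears twice except nx62(1), which appears only once, so under the assumption above the second nx62(1) job may be the one that crashed.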

Which files should I read to find possible causes of the crash? I looked at the *.error files but can't seem to find anything useful. 

Best, 
Zhiyong



----- Original Message -----
From: "Ricardo Faccio" <rfaccio at fq.edu.uy>
To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
Sent: Monday, February 22, 2010 8:28:35 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Wien] parallel wien2k

Hi Zhiyong
What is your test case? Remember that forces are printed if you have atoms
located in general positions. For example, Fe in the bcc space group will
not print forces, since all atoms have the same symmetric environment.
Regards
Ricardo

-- 
  -------------------------------------------------------------------------
  Dr. Ricardo Faccio

  Mail: Cryssmat-Lab., Cátedra de Física, DETEMA
  Facultad de Química, Universidad de la República
       Av. Gral. Flores 2124, C.C. 1157
       C.P. 11800, Montevideo, Uruguay.
  E-mail: rfaccio at fq.edu.uy
  Phone: 598 2 9241860 Int. 109
             598 2 9290705
  Fax:    598 2 9241906
  Web:  http://cryssmat.fq.edu.uy/ricardo/ricardo.htm

> Dear All,
>
>
>
> I am trying to test wien2k in parallel mode and I ran into a problem. I
> am using
>
>
>
> run_lapw -p -i 40 -fc 0.001 -I
>
>
>
> If I use a number of 0.001 for the option fc above, I got the following
> error:
>
>
>
> Force-convergence not possible. Forces not present.
>
>
>
> If I do not use a number for the -fc option, but use
> "run_lapw -p -i 40 -fc -I" instead
>
>
>
> Then lapw0 finishes without a problem but the program doesn't branch to
> lapw1. An error message is generated when doing the test
>
>
>
> "if ($fcut == "0") goto lapw1"
>
>
>
> I was able to do "run_lapw -p -i 40 -I", without the "-fc" option at all,
> and was able to finish "lapw0 -p" and then start "lapw1 -p" but ran into the
> following error:
>
>
>
> error: command   /home/zzhang/wien2k/lapw1para lapw1.def   failed
>
>
>
> Does anybody have similar problems and know how to fix this?
>
>
>
> It does the following:
>
>
>
> running LAPW1 in parallel mode (using .machines)
>
>
>
> and the .machines file is as follows:
>
>
>
> #
> lapw0:nx1  nx1  nx62  nx62
> lapw1:nx1  nx1  nx62  nx62
> lapw2:nx1  nx1  nx62  nx62
> 1:nx1
> 1:nx1
> 1:nx62
> 1:nx62
> granularity:1
> extrafine:1
>
>
>
> Thanks,
>
> Zhiyong
>
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>


_______________________________________________
Wien mailing list
Wien at zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien