[Wien] k point parallel calculations

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Feb 25 07:40:29 CET 2015


Looks like a problem with the network / fileserver / NFS.

It looks as if some files are not written properly.

Look at mixer.F, line 168: the mixer is reading something incorrectly there
(which could again be an NFS problem).

Are you sure that there is nothing wrong with case.scf1_*?
Do:   ls -alsrt *scf1_*

Are all files written completely (to the end; check their sizes),
and are their dates/timestamps correct?
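
A minimal bash sketch of this check, assuming the per-job files are in the
current directory and match the same *scf1_* glob. It flags empty files,
reports the smallest and largest file size, and then lists the files oldest
first with ls -alsrt. The function name check_scf1 is hypothetical.

```bash
#!/bin/bash
# Sketch only: sanity-check the per-job case.scf1_* files after a
# k-point parallel lapw1 step. Assumes they are in the current
# directory and match the glob *scf1_* (e.g. case.scf1_1 ... case.scf1_24).
check_scf1() {
  local f size min= max= n=0
  for f in *scf1_*; do
    # If the glob matched nothing, bash leaves the pattern unexpanded.
    [ -e "$f" ] || { echo "no *scf1_* files found"; return 1; }
    n=$((n + 1))
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$f") ))
    # A zero-length file was certainly not written to the end.
    if [ "$size" -eq 0 ]; then echo "EMPTY: $f"; fi
    # Track the smallest and largest file so outliers stand out.
    if [ -z "$min" ] || [ "$size" -lt "$min" ]; then min=$size; fi
    if [ -z "$max" ] || [ "$size" -gt "$max" ]; then max=$size; fi
  done
  echo "$n files, sizes from $min to $max bytes"
  # Oldest first, newest last: a file whose timestamp lags behind the
  # others may be left over from an earlier cycle.
  ls -alsrt -- *scf1_*
}
```

Call it as check_scf1 from the case directory after lapw1 -p has finished.
The sizes need not be identical for every job, so only a clear outlier or a
stale timestamp is a reason for suspicion.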

Try parallelization with fewer nodes.
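
To rerun with fewer parallel jobs, you can write a reduced .machines file.
This is a sketch that assumes the usual k-point-parallel layout from the
WIEN2k user guide: one "1:host" line per lapw1/lapw2 job, plus granularity:1
and extrafine:1. The helper write_machines and the host names node01/node02
are placeholders.

```bash
#!/bin/bash
# Sketch: write a .machines file with one k-point job per argument.
# Host names passed in are placeholders; use your real node names.
write_machines() {
  local host
  {
    echo "granularity:1"
    for host in "$@"; do
      echo "1:$host"   # one k-point parallel job on this host
    done
    echo "extrafine:1"
  } > .machines
}
```

For example, write_machines node01 node01 node02 node02 followed by
run_lapw -p would run 4 parallel jobs instead of 24.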

On 24.02.2015 at 19:19, Priyanka Seth wrote:
> Hello all,
>
> I have been trying to run k-point parallel calculations for some large structures and have had problems with versions 12, 13 and 14, compiled with ifort. In
> all cases, I am running on as many cores as there are k-vectors. Note that the same calculations, started from the same input and run on a single core, run without any
> problems.
>
> v12/v13
> =====
>
> This is the output for versions 12 and 13 (I've removed the node-dependent lines):
>
> LAPW0 END
> LAPW1 END
> LAPW2 - FERMI; weighs written
> LAPW2 END
> SUMPARA END
> CORE  END
> forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
> Image              PC                Routine            Line Source
> mixer              000000000051693D  Unknown               Unknown Unknown
> mixer              0000000000515445  Unknown               Unknown Unknown
> mixer              00000000004BC9E0  Unknown               Unknown Unknown
> mixer              000000000046F4BA  Unknown               Unknown Unknown
> mixer              000000000046ECB0  Unknown               Unknown Unknown
> mixer              0000000000492B76  Unknown               Unknown Unknown
> mixer              000000000049043B  Unknown               Unknown Unknown
> mixer              0000000000407E7E  MAIN__                    168 mixer.F
> mixer              000000000040414C  Unknown               Unknown Unknown
> libc.so.6          00000037C241D994  Unknown               Unknown Unknown
> mixer              0000000000403FC9  Unknown               Unknown Unknown
>
>  >   stop error
>
> Looking at the error files, I have "Error in MIXER" in both versions.
>
> The dayfile ends as follows:
> 1.884u 0.844s 0:09.73 27.9%    0+0k 0+0io 8pf+0w
>  >   lcore    (09:33:51) 0.046u 0.007s 0:00.14 28.5%    0+0k 0+0io 7pf+0w
>  >   mixer    (09:33:51) 0.000u 0.005s 0:00.04 0.0%    0+0k 0+0io 8pf+0w
> error: command   /home/pseth/SOURCES/WIEN2K_v13/mixer mixer.def failed
>
>  >   stop error
>
>
> v14
> ===
>
> I get to the second cycle, but then the calculation crashes with "Error in LAPW1" in lapw1_*.error:
>
>   LAPW2 END
>   SUMPARA END
>   CORE  END
>   MIXER END
> ec cc and fc_conv 0 0 1
> in cycle 2    ETEST: 0   CTEST: 0
>   LAPW0 END
>
> There is nothing obviously wrong looking at the case.scf1_* files or at the dayfile which ends like this:
>
>  >   lapw1  -p           (09:37:40) starting parallel lapw1 at Tue Feb 10 09:37:40 CET 2015
> ->  starting parallel LAPW1 jobs at Tue Feb 10 09:37:40 CET 2015
> running LAPW1 in parallel mode (using .machines)
> 24 number_of_parallel_jobs
> [1] 30405
> [2] 30437
> [3] 30471
> [4] 30507
> [5] 30559
> [6] 30606
> [7] 30653
> [8] 30717
> [9] 30809
> [10] 30916
> [11] 31000
> [12] 31070
> [13] 31192
> [14] 31329
> [15] 31428
> [16] 31504
> [17] 31664
> [18] 31788
> [19] 31871
> [20] 31900
> [21] 31928
> [22] 31956
> [23] 31982
> [24] 32010
> [5]    Done                          ( ( $remote $machine[$p]  ...
>
>
> I understand that this is not much information to go on, but I don't really know where else to look! Has anyone had similar issues? What else would help in diagnosing the
> problem?
>
> Many thanks,
> Priyanka

-- 
-----------------------------------------
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-----------------------------------------

