[Wien] k point parallel calculations

Priyanka Seth priyanka.seth at polytechnique.edu
Tue Feb 24 19:19:14 CET 2015


Hello all,

I have been trying to run k-point parallel calculations for some 
large structures and have been having problems with versions 12, 13 and 
14, all compiled with ifort. In all cases, I am running on the same 
number of cores as k-points. Note that calculations started from the 
same input run without any problems on a single core.

v12/v13
=======

This is the output for versions 12 and 13 (I've removed the 
node-dependent lines):

LAPW0 END
LAPW1 END
LAPW2 - FERMI; weighs written
LAPW2 END
SUMPARA END
CORE  END
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image              PC                Routine            Line Source
mixer              000000000051693D  Unknown               Unknown Unknown
mixer              0000000000515445  Unknown               Unknown Unknown
mixer              00000000004BC9E0  Unknown               Unknown Unknown
mixer              000000000046F4BA  Unknown               Unknown Unknown
mixer              000000000046ECB0  Unknown               Unknown Unknown
mixer              0000000000492B76  Unknown               Unknown Unknown
mixer              000000000049043B  Unknown               Unknown Unknown
mixer              0000000000407E7E  MAIN__                    168 mixer.F
mixer              000000000040414C  Unknown               Unknown Unknown
libc.so.6          00000037C241D994  Unknown               Unknown Unknown
mixer              0000000000403FC9  Unknown               Unknown Unknown

 >   stop error

Looking at the error files, I have "Error in MIXER" in both versions.
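
One way to check all the error files at once is a small script along these lines. This is only a sketch: the function name `nonempty_error_files` is hypothetical and not part of WIEN2k, and it assumes the `*.error` files sit in the case directory and that an empty file means the step reported no error.

```python
from pathlib import Path


def nonempty_error_files(case_dir="."):
    """Return {file name: contents} for every non-empty *.error file.

    Assumption: the *.error files are in case_dir and an empty file
    means that the corresponding step reported no error.
    """
    found = {}
    for path in sorted(Path(case_dir).glob("*.error")):
        # Tolerate odd bytes in the file rather than failing on them.
        text = path.read_text(errors="replace").strip()
        if text:
            found[path.name] = text
    return found


if __name__ == "__main__":
    # Run from inside the case directory.
    for name, text in nonempty_error_files().items():
        print(f"{name}: {text}")
```

Run from the case directory, it prints one line per error file that has content, which makes it easier to see whether only one of the parallel jobs failed or all of them.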

The dayfile ends as follows:
1.884u 0.844s 0:09.73 27.9%    0+0k 0+0io 8pf+0w
 >   lcore    (09:33:51) 0.046u 0.007s 0:00.14 28.5%    0+0k 0+0io 7pf+0w
 >   mixer    (09:33:51) 0.000u 0.005s 0:00.04 0.0%    0+0k 0+0io 8pf+0w
error: command   /home/pseth/SOURCES/WIEN2K_v13/mixer mixer.def failed

 >   stop error


v14
===

I get to the second cycle, but then the calculation crashes with "Error 
in LAPW1" in lapw1_*.error:

  LAPW2 END
  SUMPARA END
  CORE  END
  MIXER END
ec cc and fc_conv 0 0 1
in cycle 2    ETEST: 0   CTEST: 0
  LAPW0 END

I see nothing obviously wrong in the case.scf1_* files or in the 
dayfile, which ends like this:

 >   lapw1  -p           (09:37:40) starting parallel lapw1 at Tue Feb 10 09:37:40 CET 2015
->  starting parallel LAPW1 jobs at Tue Feb 10 09:37:40 CET 2015
running LAPW1 in parallel mode (using .machines)
24 number_of_parallel_jobs
[1] 30405
[2] 30437
[3] 30471
[4] 30507
[5] 30559
[6] 30606
[7] 30653
[8] 30717
[9] 30809
[10] 30916
[11] 31000
[12] 31070
[13] 31192
[14] 31329
[15] 31428
[16] 31504
[17] 31664
[18] 31788
[19] 31871
[20] 31900
[21] 31928
[22] 31956
[23] 31982
[24] 32010
[5]    Done                          ( ( $remote $machine[$p]  ...
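
To see which of the started jobs reported completion, the job-control lines above can be parsed with a short script. This is a rough sketch, not a WIEN2k tool: the function name `job_status` is hypothetical, it assumes the `[n] pid` and `[n]    Done` line shapes shown in this dayfile, and it should be fed only the tail of the dayfile for the last `lapw1 -p` call. A missing `Done` line may simply mean the output was cut off, so the result is a hint rather than proof of which job failed.

```python
import re

# "[5] 30559": job number followed by the process id (shape taken from the dayfile above)
STARTED = re.compile(r"^\[(\d+)\]\s+(\d+)\s*$")
# "[5]    Done   ( ( $remote ...": job number followed by "Done"
DONE = re.compile(r"^\[(\d+)\]\s+Done\b")


def job_status(dayfile_text):
    """Return (started, done, missing) for one parallel lapw1 call.

    started: {job number: pid}, done: set of job numbers,
    missing: sorted job numbers that were started but have no Done line.
    """
    started, done = {}, set()
    for line in dayfile_text.splitlines():
        line = line.strip()
        match = STARTED.match(line)
        if match:
            started[int(match.group(1))] = int(match.group(2))
            continue
        match = DONE.match(line)
        if match:
            done.add(int(match.group(1)))
    missing = sorted(set(started) - done)
    return started, done, missing


if __name__ == "__main__":
    import sys

    # Usage: python job_status.py < tail_of_dayfile.txt
    _, finished, unfinished = job_status(sys.stdin.read())
    print("done:", sorted(finished))
    print("no Done line:", unfinished)
```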


I understand that this is not much information to go on, but I don't 
really know where else to look! Has anyone had similar issues? What else 
would help in diagnosing the problem?

Many thanks,
Priyanka

