[Wien] [SPAM?] Re: k point parallel calculations
Gavin Abo
gsabo at crimson.ua.edu
Wed Feb 25 00:06:55 CET 2015
In addition:
Did you set up and run the calculation from scratch for each WIEN2k
version in its own directory? It is usually not a good idea to mix
initialization and run of a calculation in a single directory with
different WIEN2k versions. In WIEN2k 12/13, I believe the exchange and
correlation potential was specified by a number in case.in0. However,
words (characters) are now being used instead of a number in the 14 version.
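A quick way to see which style a given case.in0 uses is to look at its
first line. The Python sketch below is only an illustration, not part of
WIEN2k. It assumes that the XC switch is the second whitespace-separated
field on the first line (a number such as 13 in the older style, a keyword
such as XC_PBE in the newer one); please compare with your own files. The
function name xc_switch_style is made up for this example.

```python
import sys


def xc_switch_style(first_line):
    """Classify the XC switch on the first line of case.in0.

    Assumption: the switch is the second whitespace-separated field,
    e.g. 'TOT   13 ...' (older style) or 'TOT  XC_PBE ...' (newer style).
    """
    tokens = first_line.split()
    if len(tokens) < 2:
        return "unknown"
    switch = tokens[1]
    if switch.isdigit():
        return "numeric"
    if switch.upper().startswith("XC_"):
        return "keyword"
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_in0.py case.in0
    with open(sys.argv[1]) as fh:
        print(xc_switch_style(fh.readline()))
```

If this reports "numeric" for a directory you are running with version 14
(or "keyword" with 12/13), that would suggest the case.in0 was created by a
different version than the one running the calculation.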
Good, it looks like you have checked the case.dayfile and *.error
files. However, this looks like one of those cases where they
don't provide anything too useful. The other thing to check would be
the terminal output. Since it is failing in lapw1, you would want to
run just that step (x lapw1 -p) in a terminal and see what it prints.
If you are not allowed to run "x lapw1 -p" directly in a terminal and
are required to use a queue system like qsub, the terminal output is
usually written to a user-named file (or sometimes two files, an output
and an error file) instead [
http://stackoverflow.com/questions/9096959/how-to-specify-error-log-file-and-output-file-in-qsub
]. So, if you haven't already done so, you should check the standard
output and error file(s).
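If it helps, here is a minimal Python sketch (again, not a WIEN2k tool)
for collecting that information in one pass: it lists the *.error files
in a directory that are not empty and prints the last lines of any files
named on the command line. The names of the queue output and error files
depend entirely on how your job script is set up, so anything like
job.o12345 below is hypothetical.

```python
import glob
import os
import sys


def nonempty_files(directory, pattern="*.error"):
    """Return sorted names of files matching pattern that are not empty."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths if os.path.getsize(p) > 0)


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, errors="replace") as fh:
        return fh.read().splitlines()[-n:]


if __name__ == "__main__":
    # Run from the case directory; pass the queue stdout/stderr files
    # (hypothetical names, e.g. job.o12345 job.e12345) as arguments.
    for name in nonempty_files("."):
        print("non-empty:", name)
    for path in sys.argv[1:]:
        print("==", path)
        print("\n".join(tail(path)))
```

Whatever lapw1 printed just before it stopped should then show up at the
end of the standard output or error file.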
On 2/24/2015 11:39 AM, Laurence Marks wrote:
>
> I am not certain, but it looks like the mixer error for 12/13 is due
> to a format error in your case.in0. This may be incorrect, please look
> at what is at line 168 of your mixer.F.
>
> In most cases where I have seen errors such as this it is because
> something has gone wrong earlier. Check with "cat *.error" as all
> these files should be empty. Check that your case.clmval and
> case.clmcor are not empty and do not contain NAN. Look at the end of
> the case.output* files to check that the programs really worked.
>
> ___________________________
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu
> MURI4D.numis.northwestern.edu
> Co-Editor, Acta Cryst A
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi
>
> On Feb 24, 2015 12:19 PM, "Priyanka Seth"
> <priyanka.seth at polytechnique.edu> wrote:
>
> Hello all,
>
> I have been trying to run some k-point parallel calculations for some
> large structures and have been having problems with versions 12, 13 and
> 14 compiled with ifort. In all cases, I am running on the same
> number of cores as k vectors. Note that calculations started from the
> same input and run on a single core complete without any problems.
>
> v12/v13
> =====
>
> This is the output for versions 12 and 13 (I've removed the
> node-dependent lines):
>
> LAPW0 END
> LAPW1 END
> LAPW2 - FERMI; weighs written
> LAPW2 END
> SUMPARA END
> CORE END
> forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
> Image PC Routine Line Source
> mixer 000000000051693D Unknown Unknown Unknown
> mixer 0000000000515445 Unknown Unknown Unknown
> mixer 00000000004BC9E0 Unknown Unknown Unknown
> mixer 000000000046F4BA Unknown Unknown Unknown
> mixer 000000000046ECB0 Unknown Unknown Unknown
> mixer 0000000000492B76 Unknown Unknown Unknown
> mixer 000000000049043B Unknown Unknown Unknown
> mixer 0000000000407E7E MAIN__ 168 mixer.F
> mixer 000000000040414C Unknown Unknown Unknown
> libc.so.6 00000037C241D994 Unknown Unknown Unknown
> mixer 0000000000403FC9 Unknown Unknown Unknown
>
> > stop error
>
> Looking at the error files, I have "Error in MIXER" in both versions.
>
> The dayfile ends as follows:
> 1.884u 0.844s 0:09.73 27.9% 0+0k 0+0io 8pf+0w
> > lcore (09:33:51) 0.046u 0.007s 0:00.14 28.5% 0+0k 0+0io 7pf+0w
> > mixer (09:33:51) 0.000u 0.005s 0:00.04 0.0% 0+0k 0+0io 8pf+0w
> error: command /home/pseth/SOURCES/WIEN2K_v13/mixer mixer.def failed
>
> > stop error
>
>
> v14
> ===
>
> I get to the second cycle, but then the calculation crashes with
> "Error in LAPW1" in lapw1_*.error:
>
> LAPW2 END
> SUMPARA END
> CORE END
> MIXER END
> ec cc and fc_conv 0 0 1
> in cycle 2 ETEST: 0 CTEST: 0
> LAPW0 END
>
> There is nothing obviously wrong looking at the case.scf1_* files or
> at the dayfile, which ends like this:
>
> > lapw1 -p (09:37:40) starting parallel lapw1 at Tue Feb 10 09:37:40 CET 2015
> -> starting parallel LAPW1 jobs at Tue Feb 10 09:37:40 CET 2015
> running LAPW1 in parallel mode (using .machines)
> 24 number_of_parallel_jobs
> [1] 30405
> [2] 30437
> [3] 30471
> [4] 30507
> [5] 30559
> [6] 30606
> [7] 30653
> [8] 30717
> [9] 30809
> [10] 30916
> [11] 31000
> [12] 31070
> [13] 31192
> [14] 31329
> [15] 31428
> [16] 31504
> [17] 31664
> [18] 31788
> [19] 31871
> [20] 31900
> [21] 31928
> [22] 31956
> [23] 31982
> [24] 32010
> [5] Done ( ( $remote $machine[$p] ...
>
>
> I understand that this is not much information to go on, but I don't
> really know where else to look! Has anyone had similar issues? What
> else would help in diagnosing the problem?
>
> Many thanks,
> Priyanka
>