[Wien] Restarting HF with SO
Laurence Marks
L-marks at northwestern.edu
Thu May 18 20:03:47 CEST 2017
The -s option does not always work, because sometimes other files need
to be set up first; I think I saw this at least once with an -hf calculation.
In many cases it is safer to restart the current cycle, e.g. do
mkdir Anxiety
cp * Anxiety
runsp_lapw -hf ... -NI
You might get away with using -s lapw1, I am not sure.
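
A minimal sketch of the backup step above, assuming a POSIX shell. The function name backup_case is illustrative and not part of WIEN2k; it copies only regular files, so the backup directory is not copied into itself the way a bare `cp *` would attempt.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): copy every regular file in the
# current case directory into a backup directory before restarting.
backup_case() {
    dest=${1:-Anxiety}          # backup directory name, as in the text above
    mkdir -p "$dest" || return 1
    for f in ./*; do
        # Skip subdirectories (including the backup directory itself).
        [ -f "$f" ] || continue
        cp -p "$f" "$dest"/ || return 1
    done
}
```

After `backup_case Anxiety`, the current cycle can be restarted with the same run command as above; if the restart goes wrong, the files in Anxiety are still intact.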
On Thu, May 18, 2017 at 12:50 PM, Luis Ogando <lcodacal at gmail.com> wrote:
> Dear Prof. Marks,
>
> Thank you very much for your help !
> Unfortunately, I would like to understand why the -s option, designed
> to restart a calculation at the same point where it crashed, does not work.
> Without this, I am afraid that even your suggestion will not help.
> Thank you again,
> Luis
>
>
> 2017-05-18 14:39 GMT-03:00 Laurence Marks <L-marks at northwestern.edu>:
>>
>> I don't have the answer, but in the future you may want to consider doing
>> a set of shorter runs, saving the interim results, something like:
>>
>> for i in 1 2 3 4 ... XX
>> do
>> mkdir -p Safety
>> runsp_lapw -hf ... -i 3 -NI
>> rm -f Safety/*bro*
>> mv *bro* Safety
>> save -f -d Safety
>> cp Safety/*bro* ./ ; cp Safety/*.scf ./
>> done
>>
>> (It would be easier if save_lapw had an option to not delete the *bro*
>> files and retain case.scf -- a simple hack.)
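
A simplified sketch of the chunked-run idea above, assuming a POSIX shell. The function name checkpointed_run is hypothetical, and plain `cp` is swapped in for the save step so that nothing is removed from the working directory. Stopping at the first failing chunk is an assumption of this sketch; whether the run script returns a non-zero status for a chunk that simply has not converged yet should be checked before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): run a command in short chunks and
# snapshot the interim files into Safety/ after each chunk. Plain cp stands in
# for the save step of the loop above.
checkpointed_run() {
    chunks=$1; shift             # number of short runs
    mkdir -p Safety || return 1
    i=1
    while [ "$i" -le "$chunks" ]; do
        # One short run; stop here so the last good snapshot is preserved.
        "$@" || return 1
        rm -f Safety/*bro*       # drop stale mixing files from the last chunk
        for f in ./*bro* ./*.scf; do
            [ -f "$f" ] || continue
            cp -p "$f" Safety/ || return 1
        done
        i=$((i + 1))
    done
}
```

It would be called with the chunk count followed by the same run command and options as in the loop above.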
>>
>> On Thu, May 18, 2017 at 12:27 PM, Luis Ogando <lcodacal at gmail.com> wrote:
>> > Dear Gavin,
>> >
>> > Thank you very much for your answer.
>> > I am using Wien2k 14.2 and, unfortunately, that was the only message I
>> > got from the standard output file (queuing system). The error files and
>> > case.dayfile have no useful information.
>> > The interruption was during the hf execution, after lapw1, which
>> > finished without a problem.
>> > It was not the first time I had to restart the calculation due to a
>> > shutdown. In the other cases, I restarted the calculation from scratch,
>> > but, with a non-parallel calculation, I have to solve this
>> > reinitialization issue or the calculation will never end. So, I would be
>> > glad if someone else could give me another hint.
>> > Thank you again.
>> > All the best,
>> > Luis
>> >
>> >
>> >
>> >
>> > 2017-05-18 11:35 GMT-03:00 Gavin Abo <gsabo at crimson.ua.edu>:
>> >>
>> >> Sorry, those code line numbers are for WIEN2k 16.1. For example, if you
>> >> are using WIEN2k 14.2, the line numbers should be 998 instead of 1354
>> >> and 1006 instead of 1365 in SRC_hf/calc_h.F.
>> >>
>> >>
>> >> On 5/18/2017 8:19 AM, Gavin Abo wrote:
>> >>
>> >> Unfortunately, I think that error message can tell you "why" the
>> >> calculation stopped, but it might not tell you the initial "cause" of
>> >> it. That is likely because the issue that caused it happened earlier in
>> >> the calculation (perhaps lapw1?). The vector file size is smaller than
>> >> the vectorhf_old. I'm not sure if they should be the same size or not.
>> >> If so, perhaps you need to restart the calculation in the lapw1 step
>> >> (-s lapw1) to regenerate the vector file instead of starting with the hf
>> >> step (-s hf), which I believe comes after lapw1 in the calculation, or
>> >> you might just have to start the calculation over from scratch.
>> >>
>> >> In SRC_hf/calc_h_2.F, you should see:
>> >>
>> >> line 1354:
>> >> !_COMPLEX call zheev('V','U',nbf,ham,nbf,enknew,workdiag,2*nbf-1,rworkdiag,info)
>> >>
>> >> line 1365:
>> >> if (info .ne. 0) then
>> >> print *, 'info=', info
>> >> stop 'error in calc_h_2: info not equal to 0'
>> >> endif
>> >>
>> >> From the code above, you can see that there likely should be a little
>> >> more error information available from the "print *, 'info=', info"
>> >> statement that you did not report. I believe this should have been
>> >> printed to the standard output (terminal or std output file if you are
>> >> using a queuing system).
>> >>
>> >> Depending on the value of the info variable, the calculation seems to
>> >> have stopped because it encountered an illegal value or there was a
>> >> convergence problem [1]:
>> >>
>> >> INFO is INTEGER
>> >> = 0: successful exit
>> >> < 0: if INFO = -i, the i-th argument had an illegal value
>> >> > 0: if INFO = i, the algorithm failed to converge; i
>> >> off-diagonal elements of an intermediate tridiagonal
>> >> form did not converge to zero.
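
A small sketch of how the INFO value could be read back from a job output file and interpreted according to the list above, assuming a POSIX shell. Both function names are hypothetical, and the assumed log format (a line containing "info=" followed by an integer) is inferred from the print statement quoted earlier, not verified against real output.

```shell
#!/bin/sh
# Hypothetical helper: describe a LAPACK zheev INFO value, following the
# documentation quoted above.
describe_zheev_info() {
    info=$1
    if [ "$info" -eq 0 ]; then
        echo "successful exit"
    elif [ "$info" -lt 0 ]; then
        echo "argument $(( 0 - info )) had an illegal value"
    else
        echo "failed to converge: $info off-diagonal elements did not converge to zero"
    fi
}

# Pull the last "info=" value out of a job output file and describe it.
# Assumes the Fortran print statement produced a line like " info=   3".
info_from_log() {
    value=$(sed -n 's/.*info= *\(-\{0,1\}[0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    [ -n "$value" ] || { echo "no info= line found" >&2; return 1; }
    describe_zheev_info "$value"
}
```

Calling `info_from_log` on the standard output file of the queuing system would then report which of the three cases applies.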
>> >>
>> >> Perhaps the software developers of the hf code have further insight
>> >> than I currently do into what could resolve the problem.
>> >>
>> >> [1] http://www.netlib.org/lapack/explore-html/df/d9a/group__complex16_h_eeigen_ga70c041fd19635ff621cfd5d804bd7a30.html#ga70c041fd19635ff621cfd5d804bd7a30
>> >>
>> >> On 5/18/2017 5:52 AM, Luis Ogando wrote:
>> >>
>> >> I do not know if it is relevant, but my calculation is complex (-c).
>> >> Thank you again,
>> >> Luis
>> >>
>> >>
>> >> 2017-05-18 8:29 GMT-03:00 Luis Ogando <lcodacal at gmail.com>:
>> >>>
>> >>> Dear Wien2k community,
>> >>>
>> >>> I am trying to calculate the dielectric function for wurtzite GaP
>> >>> using -hf and -so as previously discussed
>> >>> ( http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14603.html ).
>> >>> There was a shutdown of the machine during the hf execution in the
>> >>> first step of the calculation ( run_lapw -hf ... ). When the machine
>> >>> came back, I removed the case.vectorhf (case.vectorhf_old is still
>> >>> there) and case.energyhf. Then, I executed
>> >>>
>> >>> run_lapw -hf -NI -s hf -ec 0.0001 -cc 0.0001 -i 200
>> >>>
>> >>> trying to restart the calculation (non-parallel execution due to the
>> >>> HF x SO issue discussed in the previous messages above).
>> >>> The calculation restarted without a problem, but when the
>> >>> case.vectorhf reached 187MB (less than half of the expected size, see
>> >>> below) I got an error.
>> >>>
>> >>> -rw-r--r-- 1 luisoda luisoda 187M Mai 18 03:51 GaPwurtHSE-DielSO-1.vector
>> >>> -rw-r--r-- 1 luisoda luisoda 187M Mai 18 00:14 GaPwurtHSE-DielSO-1.vectorhf
>> >>> -rw-r--r-- 1 luisoda luisoda 565M Abr 23 21:33 GaPwurtHSE-DielSO-1.vectorhf_old
>> >>>
>> >>> The only related error message I found was:
>> >>>
>> >>> error in calc_h: info not equal to 0
>> >>>
>> >>> I am probably making a mistake when restarting the calculation and I
>> >>> would really appreciate any help with this issue.
>> >>> Many thanks in advance.
>> >>> All the best,
>> >>> Luis
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Wien mailing list
>> >> Wien at zeus.theochem.tuwien.ac.at
>> >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> >> SEARCH the MAILING-LIST at:
>> >> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>> >>
>> >
>>
>>
>>
>> --
>> Professor Laurence Marks
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought", Albert Szent-Gyorgi
>> www.numis.northwestern.edu ; Corrosion in 4D:
>> MURI4D.numis.northwestern.edu
>> Partner of the CFW 100% program for gender equity, www.cfw.org/100-percent
>> Co-Editor, Acta Cryst A
>
>