[Wien] Restarting HF with SO

Thu May 18 20:03:47 CEST 2017

The -s option does not always work, because sometimes other files need
to be set up first; I think I have seen this at least once with an -hf
calculation. In many cases it is safer to restart the current cycle, e.g. do
mkdir Anxiety
cp * Anxiety
runsp_lapw -hf ... -NI

You might get away with using -s lapw1, but I am not sure.
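
The backup step above can be sketched as a small shell function. This is
only an illustration under assumptions: the function name backup_case and
the default timestamped directory name are hypothetical and not part of
WIEN2k, and the restart command is left as a comment because its remaining
options are elided in this thread. Copying only regular files avoids
"cp *" trying to copy the backup directory into itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of WIEN2k): copy every regular file in the
# current case directory into a backup directory before restarting.
backup_case() {
    # Destination defaults to a timestamped name if none is given.
    dest="${1:-Backup_$(date +%Y%m%d_%H%M%S)}"
    mkdir -p "$dest" || return 1
    for f in *; do
        # Skip directories, including the backup directory itself.
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/" || return 1
        fi
    done
    # Report where the copies went.
    echo "$dest"
}

# Usage (restart options as in the message above; "..." stays elided):
#   backup_case Anxiety
#   runsp_lapw -hf ... -NI
```

If the restart goes wrong, the files in the backup directory can be copied
back and another restart point tried.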

On Thu, May 18, 2017 at 12:50 PM, Luis Ogando <lcodacal at gmail.com> wrote:
> Dear Prof. Marks,
>
>    Thank you very much for your help !
>    Unfortunately, I would like to understand why the  -s  option, designed
> to restart a calculation at the same point where it crashed, does not work.
> Without this, I am afraid that even your suggestion will not help.
>    Thank you again,
>                        Luis
>
>
> 2017-05-18 14:39 GMT-03:00 Laurence Marks <L-marks at northwestern.edu>:
>>
>> I don't have the answer, but you may want to contemplate in the future
>> doing something like a set of shorter runs saving the interim results
>>
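>> for i in 1 2 3 4 ... XX
>> do
>>   # -p: do not complain when Safety already exists on later passes
>>   mkdir -p Safety
>>   runsp_lapw -hf ... -i 3 -NI
>>   # keep the current Broyden files aside before saving
>>   rm -f Safety/*bro*
>>   mv *bro* Safety
>>   save_lapw -f -d Safety
>>   # restore the Broyden and scf files so the next pass continues
>>   cp Safety/*bro* ./ ; cp Safety/*.scf ./
>> done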
>>
>> (It would be easier if save_lapw had an option to not delete the *bro*
>> files and retain case.scf -- a simple hack.)
>>
>> On Thu, May 18, 2017 at 12:27 PM, Luis Ogando <lcodacal at gmail.com> wrote:
>> > Dear Gavin,
>> >
>> >    Thank you very much for your answer.
>> >    I am using Wien2k 14.2 and, unfortunately, that was the only
>> > message I got from the standard output file (queuing system). The
>> > error files and case.dayfile have no useful information.
>> >    The interruption was during the  hf  execution, after lapw1, which
>> > finished without a problem.
>> >    It was not the first time I had to restart the calculation due to a
>> > shutdown. In the other cases, I restarted the calculation from
>> > scratch, but, with a non-parallel calculation, I have to solve this
>> > reinitialization issue or the calculation will never end. So, I would
>> > be glad if someone else could give me another hint.
>> >    Thank you again.
>> >    All the best,
>> >                      Luis
>> >
>> >
>> >
>> >
>> > 2017-05-18 11:35 GMT-03:00 Gavin Abo <gsabo at crimson.ua.edu>:
>> >>
>> >> Sorry, those code line numbers are for WIEN2k 16.1.  For example, if
>> >> you are using WIEN2k 14.2, the line numbers should be 998 instead of
>> >> 1354 and 1006 instead of 1365 in SRC_hf/calc_h.F.
>> >>
>> >>
>> >> On 5/18/2017 8:19 AM, Gavin Abo wrote:
>> >>
>> >> Unfortunately, I think that error message can tell you "why" the
>> >> calculation stopped, but it might not tell you the initial "cause" of
>> >> it.  That is likely because the issue that caused it happened earlier
>> >> in the calculation (perhaps lapw1?).  The vector file is smaller than
>> >> vectorhf_old.  I'm not sure if they should be the same size or not.
>> >> If so, perhaps you need to restart the calculation at the lapw1 step
>> >> (-s lapw1) to regenerate the vector file instead of starting with the
>> >> hf step (-s hf), which I believe comes after lapw1, or you might just
>> >> have to start the calculation over from scratch.
>> >>
>> >> In SRC_hf/calc_h_2.F, you should see:
>> >>
>> >> line 1354:
>> >> !_COMPLEX call zheev('V','U',nbf,ham,nbf,enknew,workdiag,2*nbf-1,rworkdiag,info)
>> >>
>> >> line 1365:
>> >>         if (info .ne. 0) then
>> >>           print *, 'info=', info
>> >>           stop 'error in calc_h_2: info not equal to 0'
>> >>         endif
>> >>
>> >> From the code above, you can see that there likely should be a little
>> >> more error information available from the "print *, 'info=', info"
>> >> statement that you did not report.  I believe this should have been
>> >> printed to the standard output (terminal or std output file if you are
>> >> using a queuing system).
>> >>
>> >> Depending on the value of the info variable, the calculation seems to
>> >> have stopped because it encountered an illegal value or there was a
>> >> convergence problem [1]:
>> >>
>> >>         INFO is INTEGER
>> >>           = 0:  successful exit
>> >>           < 0:  if INFO = -i, the i-th argument had an illegal value
>> >>           > 0:  if INFO = i, the algorithm failed to converge; i
>> >>                 off-diagonal elements of an intermediate tridiagonal
>> >>                 form did not converge to zero.
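
The INFO convention quoted above can be turned into a small check on the
value printed by the "print *, 'info=', info" statement. This is a minimal
sketch, assuming the number has already been read from the standard
output; the function name explain_zheev_info is hypothetical and is not
part of WIEN2k or LAPACK.

```shell
#!/bin/sh
# Hypothetical helper: translate a zheev INFO value into the meaning
# given in the LAPACK documentation quoted above.
explain_zheev_info() {
    info="$1"
    # Reject anything that is not an optionally signed integer.
    case "$info" in
        ''|*[!0-9-]*|-|?*-*) echo "not an integer: $info"; return 2 ;;
    esac
    if [ "$info" -eq 0 ]; then
        echo "successful exit"
    elif [ "$info" -lt 0 ]; then
        # INFO = -i means the i-th argument was illegal.
        echo "argument ${info#-} had an illegal value"
    else
        echo "failed to converge: $info off-diagonal elements did not converge to zero"
    fi
}
```

A negative value points at a bad argument in the zheev call, while a
positive value points at a convergence failure in the diagonalization
itself.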
>> >>
>> >> Perhaps the software developers of the hf code have further insight
>> >> than I currently do into what could resolve the problem.
>> >>
>> >> [1]
>> >>
>> >> http://www.netlib.org/lapack/explore-html/df/d9a/group__complex16_h_eeigen_ga70c041fd19635ff621cfd5d804bd7a30.html#ga70c041fd19635ff621cfd5d804bd7a30
>> >>
>> >> On 5/18/2017 5:52 AM, Luis Ogando wrote:
>> >>
>> >>    I do not know if it is relevant, but my calculation is complex (-c).
>> >>    Thank you again,
>> >>                     Luis
>> >>
>> >>
>> >> 2017-05-18 8:29 GMT-03:00 Luis Ogando <lcodacal at gmail.com>:
>> >>>
>> >>> Dear Wien2k community,
>> >>>
>> >>>    I am trying to calculate the dielectric function for wurtzite GaP
>> >>> using -hf and -so as previously discussed (
>> >>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg14603.html
>> >>> ).
>> >>>    There was a shutdown of the machine during the  hf  execution in
>> >>> the first step of the calculation  (  run_lapw -hf ...  ). When the
>> >>> machine came back, I removed the case.vectorhf (case.vectorhf_old is
>> >>> still there) and case.energyhf.  Then, I executed
>> >>>
>> >>> run_lapw -hf -NI -s hf -ec 0.0001 -cc 0.0001 -i 200
>> >>>
>> >>> trying to restart the calculation (non-parallel execution due to the
>> >>> HF x SO issue discussed in the previous messages above).
>> >>>    The calculation restarted without a problem, but when the
>> >>> case.vectorhf reached 187MB (less than half of the expected size,
>> >>> see below) I got an error.
>> >>>
>> >>> -rw-r--r-- 1 luisoda luisoda 187M Mai 18 03:51 GaPwurtHSE-DielSO-1.vector
>> >>> -rw-r--r-- 1 luisoda luisoda 187M Mai 18 00:14 GaPwurtHSE-DielSO-1.vectorhf
>> >>> -rw-r--r-- 1 luisoda luisoda 565M Abr 23 21:33 GaPwurtHSE-DielSO-1.vectorhf_old
>> >>>
>> >>>    The only related error message I found was:
>> >>>
>> >>> error in calc_h: info not equal to 0
>> >>>
>> >>>    I am probably making a mistake when restarting the calculation and
>> >>> I
>> >>> would really appreciate any help with this issue.
>> >>>    Many thanks in advance.
>> >>>    All the best,
>> >>>              Luis
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Wien mailing list
>> >> Wien at zeus.theochem.tuwien.ac.at
>> >> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> >> SEARCH the MAILING-LIST at:
>> >> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>> >>
>> >
>>
>>
>>
>> --
>> Professor Laurence Marks
>> "Research is to see what everybody else has seen, and to think what
>> nobody else has thought", Albert Szent-Gyorgi
>> www.numis.northwestern.edu ; Corrosion in 4D:
>> MURI4D.numis.northwestern.edu
>> Partner of the CFW 100% program for gender equity, www.cfw.org/100-percent
>> Co-Editor, Acta Cryst A
>
>

-- 
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what
nobody else has thought", Albert Szent-Gyorgi
www.numis.northwestern.edu ; Corrosion in 4D: MURI4D.numis.northwestern.edu
Partner of the CFW 100% program for gender equity, www.cfw.org/100-percent
Co-Editor, Acta Cryst A