Hm, interesting, something else to try. As for the two hours per lapw1 cycle, that was mostly because I had mistyped RKMAX: for some weird reason I had entered 9 instead of 7.5. Now that I have set it to 7.5 and enabled ipo in the compilation, the time has gone down to 35 minutes on the slowest nodes. Thanks for the tip about TMPDIR; I will try that as well.<div>
<br></div><div>Marcos<br><br><div class="gmail_quote">On Thu, Jul 29, 2010 at 2:09 PM, Laurence Marks <span dir="ltr"><<a href="mailto:L-marks@northwestern.edu">L-marks@northwestern.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
If it takes 2 hours for a lapw1up/dn, you need to use more resources,<br>
either mpi or nodes.<br>
<br>
As an addendum, you may want to set TMPDIR (maybe FORT_TMPDIR) for<br>
some scratch files which are used by (I think) lapw2, see<br>
<br>
<a href="http://software.intel.com/en-us/forums/showthread.php?t=60212&o=d&s=lr" target="_blank">http://software.intel.com/en-us/forums/showthread.php?t=60212&o=d&s=lr</a><br>
<div><div></div><div class="h5"><br>
2010/7/28 Marcos Veríssimo Alves <<a href="mailto:marcos.verissimo.alves@gmail.com">marcos.verissimo.alves@gmail.com</a>>:<br>
> Hi Laurence,<br>
> I will try all you have said. I didn't know about the -assu buff option - I<br>
> suppose it is valid for ifort, right?<br>
> My scratch is already set. In fact, it was one of the variables I took<br>
> care to set, because I saw the size of the vector files (scary...)<br>
> Finally, no problem with slowing things down a little. I'd rather have<br>
> things slowed down by a few seconds than lose two hours (that is roughly<br>
> the time it takes to run lapw1 -up/dn for my system) plus the hours<br>
> when nothing happens until I realize the job has died... And I still have to<br>
> include U and spin-orbit after that!<br>
> Thanks a lot,<br>
> Marcos<br>
><br>
> On Wed, Jul 28, 2010 at 9:00 PM, Laurence Marks <<a href="mailto:L-marks@northwestern.edu">L-marks@northwestern.edu</a>><br>
> wrote:<br>
>><br>
>> This could be quite a lot of work. Some simpler suggestions:<br>
>><br>
>> 1) In param.inc in SRC_lapw[0-2] change to<br>
>> PARAMETER (restrict_output= 1)<br>
>> This will reduce the size of the log files<br>
>><br>
>> 2) Use -assu buff in your compilation options -- this writes data in<br>
>> big chunks, not line by line, and is much friendlier to file servers.<br>
>><br>
>> 3) Set the environment variable SCRATCH (exporting it from bash should<br>
>> work) so that large data files such as case.vector_X are local.<br>
>><br>
>> 4) In $WIENROOT/parallel_options add (or edit)<br>
>> set sleepy = XX # additional sleep before checking<br>
>> set delay = YY<br>
>><br>
>> where XX, YY are adjusted to try and reduce AFS problems ( 0.5 ? --<br>
>> this will slow things down but...)<br>
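Suggestion 3 can be sketched in bash as follows. This is a minimal sketch, not part of WIEN2k: the function name `pick_scratch` and the candidate directories are hypothetical, and only the `SCRATCH` variable itself comes from the suggestion above. It exports the first candidate directory that exists and is writable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first existing, writable directory
# among the arguments; return 1 if none qualifies.
pick_scratch() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example candidates (assumptions; adjust to the local disks of your nodes).
if scratch_dir=$(pick_scratch "/scratch/$USER" "/tmp"); then
    export SCRATCH="$scratch_dir"
fi
```

If no candidate is usable, `SCRATCH` is left untouched, so a value set elsewhere still applies.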
>><br>
>><br>
>> 2010/7/28 Marcos Veríssimo Alves <<a href="mailto:marcos.verissimo.alves@gmail.com">marcos.verissimo.alves@gmail.com</a>>:<br>
>> > Hi all,<br>
>> > I have managed to run Wien2k on our cluster, with k-point parallelization.<br>
>> > However, it looks like our NFS system (which is actually an AFS one) is<br>
>> > still a bit unstable, since the cluster has been upgraded and re-assembled<br>
>> > very recently. The problem is that the sysadmins have gone on vacation, so<br>
>> > I'll have to find a way of getting around this as best I can until the<br>
>> > beginning of next month.<br>
>> > My current problem is that some nodes of our cluster seem to have been<br>
>> > losing connection with the AFS server intermittently, and from what I see<br>
>> > (please correct me if I'm wrong) all the writing is done over the network<br>
>> > to the home directory. So, if the connection is lost while the energyup<br>
>> > files are being written, lapw2 will crash. Indeed, one of the instances of<br>
>> > lapw1 ended up producing an energyup file of size 0. This in turn<br>
>> > made lapw2 crash, and it happened overnight.<br>
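A failure like the one described above could be caught before lapw2 starts by checking for empty files. A minimal bash sketch, assuming the per-k-point files can be matched with a glob such as `case.energyup*` (the pattern and the function name `find_empty_files` are illustrative, not taken from the WIEN2k scripts):

```shell
# Hypothetical helper: print every regular file among the arguments
# that has zero size. Returns 0 if none is empty, 1 otherwise.
find_empty_files() {
    local f status=0
    for f in "$@"; do
        # -f: the regular file exists; ! -s: its size is zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
            status=1
        fi
    done
    return "$status"
}
```

It could be called as `find_empty_files case.energyup* || echo "empty energy file found" >&2`. Files that do not exist at all are not reported by this sketch; that case would need a separate check.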
>> > My question is this: I would like to make a small (I guess) change in the<br>
>> > scripts, wherever needed. Instead of writing some files (only the ones that<br>
>> > are critical for the execution of the next code) to the home directory,<br>
>> > which is done over AFS, they would be written to the scratch directory,<br>
>> > which is local. Then, at the end of the execution, they would be copied to<br>
>> > the home directory, possibly with a check on the success of the operation.<br>
>> > I don't know if this would be better, but at least the network load would<br>
>> > be concentrated at well-defined points, and it could also make error<br>
>> > control easier.<br>
>> > Since I do not have much knowledge of csh programming (I'm mostly a bash<br>
>> > guy) and the Wien2k scripts are pretty complex beasts with which I am not<br>
>> > very familiar, could you give your opinion on the feasibility of my<br>
>> > suggestions and, if they are not too complex to implement, point out<br>
>> > possible changes and/or places to be changed in the scripts?<br>
>> > Best regards,<br>
>> > Marcos<br>
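The copy-back step proposed above might look like the following in bash. This is only a sketch under stated assumptions: `copy_verified` is a hypothetical helper, not something found in the WIEN2k scripts, and it relies only on the standard `cp`, `cmp` and `basename` utilities. It refuses to copy a missing or empty file, copies it, and then compares source and destination byte by byte so that a failed or truncated copy is reported instead of going unnoticed.

```shell
# Hypothetical helper: copy a file from local scratch to a destination
# directory and verify the result.
# Usage: copy_verified SOURCE_FILE DEST_DIR
copy_verified() {
    local src=$1 dest_dir=$2
    local dest="$dest_dir/$(basename "$src")"

    # Refuse to copy a missing or zero-size file.
    if [ ! -s "$src" ]; then
        echo "copy_verified: $src is missing or empty" >&2
        return 1
    fi

    # Copy, then compare source and destination byte by byte.
    if cp -- "$src" "$dest" && cmp -s -- "$src" "$dest"; then
        return 0
    fi
    echo "copy_verified: copy of $src to $dest failed verification" >&2
    return 1
}
```

A job script could then loop over the files it needs, for example `copy_verified "$SCRATCH/case.vectorup_1" "$HOME/case" || exit 1`, where both paths are placeholders.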
>> > _______________________________________________<br>
>> > Wien mailing list<br>
>> > <a href="mailto:Wien@zeus.theochem.tuwien.ac.at">Wien@zeus.theochem.tuwien.ac.at</a><br>
>> > <a href="http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien" target="_blank">http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien</a><br>
>> ><br>
>> ><br>
>><br>
>><br>
>><br>
>> --<br>
>> Laurence Marks<br>
>> Department of Materials Science and Engineering<br>
>> MSE Rm 2036 Cook Hall<br>
>> 2220 N Campus Drive<br>
>> Northwestern University<br>
>> Evanston, IL 60208, USA<br>
>> Tel: (847) 491-3996 Fax: (847) 491-7820<br>
>> email: L-marks at northwestern dot edu<br>
>> Web: <a href="http://www.numis.northwestern.edu" target="_blank">www.numis.northwestern.edu</a><br>
>> Chair, Commission on Electron Crystallography of IUCR<br>
>> <a href="http://www.numis.northwestern.edu/" target="_blank">www.numis.northwestern.edu/</a><br>
>> Electron crystallography is the branch of science that uses electron<br>
>> scattering and imaging to study the structure of matter.<br>
><br>
><br>
><br>
><br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div>