[Wien] lapwso_mpi error

Md. Fhokrul Islam fislam at hotmail.com
Thu Nov 17 19:47:38 CET 2016


Hi Gavin,


    After I tried your other suggestion of running a small job, which worked
out fine, I thought the problem was not with ifort. But I think you are right. I have
recompiled Wien2k with the -O0 option and, although the job is still running in
the 1st cycle, I didn't get an error at LAPWSO and it is creating the test.vectorso
files as it is supposed to. Hopefully, the problem is resolved now.

Thank you very much for your help.


Regards,
Fhokrul


________________________________
From: Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Gavin Abo <gsabo at crimson.ua.edu>
Sent: Thursday, November 17, 2016 1:07 PM
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] lapwso_mpi error

So you are using the ifort version with the unformatted file read bug.  Based on the Intel page at the link in the previous post below, did you try recompiling lapwso_mpi with -O0, or reverting to one of the versions of ifort that Intel mentioned, to see whether that fixes the problem?

On 11/14/2016 8:34 AM, Md. Fhokrul Islam wrote:

Hi Gavin,


    Thanks for your suggestion. Yes, I am using version 16.0.3.210 of ifort. Debugging such a
big file with 'od' seems difficult, but I will try with a smaller system and see if I get the
same error.



Fhokrul


________________________________
From: Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Gavin Abo <gsabo at crimson.ua.edu>
Sent: Sunday, November 13, 2016 11:40 PM
To: A Mailing list for WIEN2k users
Subject: Re: [Wien] lapwso_mpi error

Ok, I agree that it is likely not due to the set up of the scratch directory.

What version of ifort was used?  If you happened to use 16.0.3.210, maybe it is caused by an ifort bug [ https://software.intel.com/en-us/articles/read-failure-unformatted-file-io-psxe-16-update-3 ].


Perhaps you can use the Linux "od" command to try to troubleshoot and identify what the data mismatch is between the writing and reading of the 3Mn.vectordn_1 file, similar to what is described on the web pages at:

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/269993

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/270436

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/268503


Though, it might be harder to diagnose with the large 3Mn.vectordn_1, which looks to be about 12 GB.  So you may want to set up an MPI SO calculation that creates a smaller case.vectordn_1 for that.
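
As a rough illustration of the kind of check that an "od" inspection makes possible, the sketch below walks the record-length markers of a Fortran sequential unformatted file and reports the first place where the leading and trailing markers disagree. It assumes the default layout used by ifort and gfortran (a 4-byte length before and after each record), little-endian byte order, and no records split into subrecords; the function name scan_records is made up for this example, and whether case.vectordn_1 follows exactly this layout is an assumption to verify on a small case first.

```python
import struct


def scan_records(path, marker_fmt="<i", max_records=None):
    """Walk a Fortran sequential unformatted file record by record.

    Assumes each record is stored as <length><payload><length>, with the
    length encoded according to marker_fmt (4-byte little-endian by default).
    Returns a list of (byte_offset, payload_length) for every consistent
    record, and raises ValueError at the first inconsistency.
    """
    size = struct.calcsize(marker_fmt)
    records = []
    with open(path, "rb") as f:
        while max_records is None or len(records) < max_records:
            offset = f.tell()
            head = f.read(size)
            if not head:
                break  # clean end of file
            if len(head) < size:
                raise ValueError(f"truncated leading marker at byte {offset}")
            (n,) = struct.unpack(marker_fmt, head)
            if n < 0:
                # A negative marker can mean a subrecord or corruption;
                # this sketch does not try to tell the two apart.
                raise ValueError(f"negative marker {n} at byte {offset}")
            f.seek(n, 1)  # skip the payload without reading it
            tail = f.read(size)
            if len(tail) < size:
                raise ValueError(f"record at byte {offset} is truncated")
            (m,) = struct.unpack(marker_fmt, tail)
            if m != n:
                raise ValueError(
                    f"marker mismatch at byte {offset}: head={n}, tail={m}"
                )
            records.append((offset, n))
    return records
```

Because the payloads are skipped with seek rather than read, this stays cheap even on a file of several GB. The first marker alone can also be printed with something like "od -A d -t d4 -N 4 case.vectordn_1" for comparison.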

On 11/13/2016 7:30 AM, Md. Fhokrul Islam wrote:

Hi Gavin,


   In my .bashrc, SCRATCH is defined as ./ so if I use the command
echo $SCRATCH, it always returns ./

For large jobs, I use a local temporary directory that is associated with each node
in our system and is given by $SNIC_TMP.  This temporary directory is created
on the fly, so I set SCRATCH to $SNIC_TMP in my job submission script. As I said,
this setup works fine if I do MPI calculations without spin-orbit, and I get converged
results. But if I submit the job after initializing with spin-orbit, it crashes at lapwso.
So I think the problem is probably not due to the setup of the scratch directory; it
has something to do with the MPI version of LAPWSO.



Thanks for your comment.


Fhokrul