[Wien] MPI Problem
Laurence Marks
L-marks at northwestern.edu
Sat May 4 05:32:33 CEST 2013
Please look at the end of case.outputup_*, which gives the real CPU and
wall times, and post those. It may be that the times being reported are
misleading.
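
One way to collect those trailing lines is sketched below in Python. It assumes the files sit in the current directory and match a glob such as case.outputup_* (with "case" replaced by the actual case name). The helper names tail_lines and print_tails and the default of 20 lines are illustrative choices, not part of WIEN2k.

```python
import glob
from collections import deque


def tail_lines(path, n=20):
    """Return the last n lines of a text file, keeping only n lines in memory."""
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=n))


def print_tails(pattern, n=20):
    """Print the last n lines of every file matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        print(f"==== {path} ====")
        print("".join(tail_lines(path, n)), end="")


if __name__ == "__main__":
    # Hypothetical pattern: replace "case" with the real case name.
    print_tails("case.outputup_*")
```

The printed tails can then be pasted into a reply as they are.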
In addition, I do not understand why you are seeing an error and the script
is continuing - it should not. Maybe some of the tasks are not working or
there are bugs in the csh. It may be useful to post the dayfile.
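
Before posting, it can help to list which steps reported an error. The Python sketch below is only an illustration: the sample text is an excerpt of the STOP lines quoted further down in this message, the function name failed_steps is hypothetical, and nothing is assumed about the real dayfile layout beyond it being plain text that can be searched for the word "error".

```python
import re


def failed_steps(lines):
    """Return (line_number, text) for every line that mentions an error."""
    pattern = re.compile(r"\berror\b", re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


# Excerpt of the STOP messages quoted in this thread.
sample = """\
 STOP LAPW0 END
 STOP LAPW1 - Error
 STOP LAPW1 END
 STOP LAPW2 - FERMI; weighs written
 STOP MIXER END
""".splitlines()

for number, text in failed_steps(sample):
    print(f"line {number}: {text}")
```

The same function can be applied to the lines of any log file read with open(...).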
---------------------------
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi
On May 3, 2013 6:47 PM, "Oliver Albertini" <ora at georgetown.edu> wrote:
> Thanks to you both for the suggestions. The OS was recently updated
> beyond those versions mentioned in the link (now 6100-08).
>
> Adding the iostat statement to all the errclr.f files prevents the
> program from stopping altogether, although error messages still appear in
> the output:
>
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW1 - Error
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 - Error
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW2 - FERMI; weighs written
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP SUMPARA END
> STOP LAPW2 - FERMI; weighs written
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP SUMPARA END
> STOP CORE END
> STOP CORE END
> STOP MIXER END
>
>
> These errors are more prevalent at higher processor counts. After
> completing a few runs with more processors, the run times have steadily
> increased:
>
> run      real        user       sys
> serial   6m43.33s    6m19.18s   0m13.59s
> 2proc    10m36.03s   1m4.68s    0m47.79s
> 4proc    11m11.25s   1m5.24s    0m52.17s
> 8proc    11m39.17s   1m6.18s    1m10.65s
> 16proc   14m31.16s   1m7.95s    2m7.63s
>
> I have looked into various IBM Parallel Operating Environment (poe)
> environment variables (MP_SHARED_MEMORY, MP_IO_BUFFER_SIZE, MP_EAGER_LIMIT),
> but none of them seems to improve performance. Any ideas why this is
> getting slower?
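
As a rough check on the figures posted above, the Python sketch below converts each time to seconds and compares user+sys with the wall-clock (real) time. The helper name to_seconds is illustrative. The interpretation rests on an assumption: the user and sys values may cover only the processes that the timing command can account for, so a low ratio could mean either waiting (I/O, communication) or CPU time spent in tasks that were not counted.

```python
import re


def to_seconds(stamp):
    """Convert a time-style string such as '6m43.33s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time string: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (label, real, user, sys) exactly as posted above.
runs = [
    ("serial", "6m43.33s", "6m19.18s", "0m13.59s"),
    ("2proc", "10m36.03s", "1m4.68s", "0m47.79s"),
    ("4proc", "11m11.25s", "1m5.24s", "0m52.17s"),
    ("8proc", "11m39.17s", "1m6.18s", "1m10.65s"),
    ("16proc", "14m31.16s", "1m7.95s", "2m7.63s"),
]

for label, real, user, sys_time in runs:
    wall = to_seconds(real)
    cpu = to_seconds(user) + to_seconds(sys_time)
    print(f"{label:>6}: wall {wall:7.2f} s, user+sys {cpu:7.2f} s, "
          f"ratio {cpu / wall:.2f}")
```

For the serial run the ratio comes out near 0.97, while for the parallel runs it is roughly 0.2, so most of the parallel wall time is not reflected in the reported CPU time. That fits the remark that the reported times may be misleading, and the per-task times at the end of the output files should show where the time actually goes.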
>
>
> On Thu, May 2, 2013 at 8:49 PM, Gavin Abo <gsabo at crimson.ua.edu> wrote:
>
>>
>>> STOP LAPW0 END
>>> "inilpw.f", line 233: 1525-142 The CLOSE statement on unit 200 cannot be
>>> completed because an errno value of 2 (A file or directory in the path name
>>> does not exist.) was received while closing the file. The program will
>>> stop.
>>> STOP LAPW1 END
>>>
>> If this is on operating system AIX 6.1
>> [http://zeus.theochem.tuwien.ac.at/pipermail/wien/2013-March/018560.html],
>> the following link mentions that a fix might be needed for some release
>> levels:
>>
>> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ23555