[Wien] MPI Problem

Laurence Marks L-marks at northwestern.edu
Sat May 4 05:32:33 CEST 2013


Please have a look at the end of case.outputup_* which gives the real cpu
and wall times and post those. It may be that the times being reported are
misleading.
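
As a rough way to see how misleading the reported times might be, the
sketch below compares CPU time (user + sys) with wall time (real) for the
figures quoted further down in this message. It is only an illustration:
the helper names to_seconds and cpu_fraction are made up here, and it
assumes the stamps have the "6m43.33s" form shown in the quoted output.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a time-style stamp such as '6m43.33s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def cpu_fraction(real: str, user: str, system: str) -> float:
    """Return (user + sys) / real, i.e. CPU time as a share of wall time."""
    return (to_seconds(user) + to_seconds(system)) / to_seconds(real)


# (real, user, sys) copied from the quoted message; labels are the poster's.
runs = {
    "serial": ("6m43.33s", "6m19.18s", "0m13.59s"),
    "2proc": ("10m36.03s", "1m4.68s", "0m47.79s"),
    "4proc": ("11m11.25s", "1m5.24s", "0m52.17s"),
    "8proc": ("11m39.17s", "1m6.18s", "1m10.65s"),
    "16proc": ("14m31.16s", "1m7.95s", "2m7.63s"),
}

for label, (real, user, system) in runs.items():
    print(f"{label:>7}: CPU/wall = {cpu_fraction(real, user, system):.2f}")
```

For the serial run the ratio comes out near 0.97, while for the parallel
runs it is roughly 0.2, so most of the wall time there is not accounted for
as CPU time in these figures. Whether that gap is waiting, I/O, or CPU time
of MPI tasks that the timing command does not see cannot be decided from
these numbers alone, which is why the per-task times at the end of
case.outputup_* are worth posting.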

In addition, I do not understand why you are seeing an error and the script
is continuing - it should not. Maybe some of the tasks are not working or
there are bugs in the csh. It may be useful to post the dayfile.

---------------------------
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On May 3, 2013 6:47 PM, "Oliver Albertini" <ora at georgetown.edu> wrote:

>  Thanks to you both for the suggestions. The OS was recently updated
> beyond those versions mentioned in the link (now 6100-08).
>
>  Adding the iostat statement to all the errclr.f files prevents the
> program from stopping altogether, although error messages still appear in the
> output:
>
>  STOP  LAPW0 END
> STOP  LAPW0 END
> STOP  LAPW0 END
> STOP  LAPW0 END
> STOP  LAPW0 END
> STOP LAPW1 - Error
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP LAPW1 - Error
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP LAPW2 - FERMI; weighs written
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  SUMPARA END
> STOP LAPW2 - FERMI; weighs written
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  SUMPARA END
> STOP  CORE  END
> STOP  CORE  END
> STOP  MIXER END
>
>
>  These errors are more prevalent at higher processor counts. Across
> several runs with increasing processor counts, the wall times have steadily
> increased:
>
>  run      real        user       sys
>  serial   6m43.33s    6m19.18s   0m13.59s
>  2proc    10m36.03s   1m4.68s    0m47.79s
>  4proc    11m11.25s   1m5.24s    0m52.17s
>  8proc    11m39.17s   1m6.18s    1m10.65s
>  16proc   14m31.16s   1m7.95s    2m7.63s
>
>  I have looked into various IBM Parallel Operating Environment (poe)
> environment variables (MP_SHARED_MEMORY, MP_IO_BUFFER_SIZE, MP_EAGER_LIMIT),
> but none of them seems to improve performance. Any ideas why this is
> getting slower?
>
>
> On Thu, May 2, 2013 at 8:49 PM, Gavin Abo <gsabo at crimson.ua.edu> wrote:
>
>>
>>> STOP  LAPW0 END
>>> "inilpw.f", line 233: 1525-142 The CLOSE statement on unit 200 cannot be
>>> completed because an errno value of 2 (A file or directory in the path name
>>> does not exist.) was received while closing the file.  The program will
>>> stop.
>>> STOP  LAPW1 END
>>>
>>  If this is on operating system AIX 6.1
>> [http://zeus.theochem.tuwien.ac.at/pipermail/wien/2013-March/018560.html],
>> the following link mentions that a fix might be needed for some release
>> levels:
>>
>> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ23555
>
>