[Wien] MPI Problem
Laurence Marks
L-marks at northwestern.edu
Sat May 4 15:07:24 CEST 2013
Your .machines file looks OK. I assume that you added the A*** in
front for emailing; note that Wien2k does not use a hosts file
itself. I guess that you are using a server at IBM in Almaden.
Unfortunately, very few people that I know of run Wien2k on IBM/AIX
machines, which will make it very hard for anyone to give useful
advice remotely by guessing.
I suggest that you download the benchmarks from
http://www.wien2k.at/reg_user/benchmark/
run them, and compare the times. Beyond that, get help from someone
at IBM who knows the poe command, or try something more standard such
as Open MPI, which many people know.
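
As a rough aid for comparing times, here is a minimal Python sketch that
converts the real/user/sys values quoted further down in this thread to
seconds and reports how much of the wall time is covered by CPU time.
The function names (to_seconds, cpu_fraction, summarise) are my own and
are not part of Wien2k; the numbers are copied from the quoted message.

```python
import re

def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '6m43.33s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# (label, real, user, sys) copied from the timings quoted in this thread
RUNS = [
    ("serial", "6m43.33s", "6m19.18s", "0m13.59s"),
    ("2proc", "10m36.03s", "1m4.68s", "0m47.79s"),
    ("4proc", "11m11.25s", "1m5.24s", "0m52.17s"),
    ("8proc", "11m39.17s", "1m6.18s", "1m10.65s"),
    ("16proc", "14m31.16s", "1m7.95s", "2m7.63s"),
]

def cpu_fraction(real, user, sys_):
    """Fraction of wall time accounted for by user + sys CPU time."""
    return (to_seconds(user) + to_seconds(sys_)) / to_seconds(real)

def summarise(runs):
    """Return (label, wall seconds, CPU fraction) for each run."""
    return [
        (label, to_seconds(real), cpu_fraction(real, user, sys_))
        for label, real, user, sys_ in runs
    ]

if __name__ == "__main__":
    for label, wall, frac in summarise(RUNS):
        print(f"{label:>7}: wall {wall:7.2f} s, (user+sys)/real = {frac:.2f}")
```

In the serial run almost all of the wall time is CPU time, while in the
parallel runs only a small part is. One possible reading (an assumption,
not something verified on this machine) is that the user/sys values only
cover the launching process, which is why the times at the end of
case.outputup_* are the better ones to compare.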
On Fri, May 3, 2013 at 10:32 PM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> Please have a look at the end of case.outputup_* which gives the real cpu
> and wall times and post those. It may be that the times being reported are
> misleading.
>
> In addition, I do not understand why you are seeing an error and the script
> is continuing - it should not. Maybe some of the tasks are not working or
> there are bugs in the csh. It may be useful to post the dayfile.
>
> ---------------------------
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
>
> On May 3, 2013 6:47 PM, "Oliver Albertini" <ora at georgetown.edu> wrote:
>>
>> Thanks to you both for the suggestions. The OS was recently updated beyond
>> those versions mentioned in the link (now 6100-08).
>>
>> Adding the iostat statement to all the errclr.f files prevents the program
>> from stopping altogether, although error messages still appear in the output:
>>
>> STOP LAPW0 END
>> STOP LAPW0 END
>> STOP LAPW0 END
>> STOP LAPW0 END
>> STOP LAPW0 END
>> STOP LAPW1 - Error
>> STOP LAPW1 END
>> STOP LAPW1 END
>> STOP LAPW1 END
>> STOP LAPW1 END
>> STOP LAPW1 - Error
>> STOP LAPW1 END
>> STOP LAPW1 END
>> STOP LAPW1 END
>> STOP LAPW1 END
>> STOP LAPW2 - FERMI; weighs written
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP SUMPARA END
>> STOP LAPW2 - FERMI; weighs written
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP LAPW2 END
>> STOP SUMPARA END
>> STOP CORE END
>> STOP CORE END
>> STOP MIXER END
>>
>>
>> which are more prevalent when using higher processor counts. After
>> completing a few runs with more processors, the times have continually
>> increased:
>>
>> run       real         user        sys
>> serial    6m43.33s     6m19.18s    0m13.59s
>> 2proc     10m36.03s    1m4.68s     0m47.79s
>> 4proc     11m11.25s    1m5.24s     0m52.17s
>> 8proc     11m39.17s    1m6.18s     1m10.65s
>> 16proc    14m31.16s    1m7.95s     2m7.63s
>>
>> After looking into various IBM Parallel Operating Environment (poe)
>> environment variables (MP_SHARED_MEMORY, MP_IO_BUFFER_SIZE, MP_EAGER_LIMIT),
>> it seems that none of them improves performance. Any ideas why this is
>> getting slower?
>>
>>
>> On Thu, May 2, 2013 at 8:49 PM, Gavin Abo <gsabo at crimson.ua.edu> wrote:
>>>
>>>
>>>> STOP LAPW0 END
>>>> "inilpw.f", line 233: 1525-142 The CLOSE statement on unit 200 cannot be
>>>> completed because an errno value of 2 (A file or directory in the path name
>>>> does not exist.) was received while closing the file. The program will
>>>> stop.
>>>> STOP LAPW1 END
>>>
>>> If this is on operating system AIX 6.1
>>> [http://zeus.theochem.tuwien.ac.at/pipermail/wien/2013-March/018560.html],
>>> the following link mentions that a fix might be needed for some release
>>> levels:
>>>
>>> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ23555
>>
>>
>
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Gyorgi