[Wien] MPI Problem

Laurence Marks L-marks at northwestern.edu
Sat May 4 15:07:24 CEST 2013


Your .machines file looks OK (I assume that you added the
A*** in front for emailing), but note that Wien2k does not use a hosts
file itself. I guess that you are using a server at IBM in Almaden.
Unfortunately very few people that I know of are running Wien2k on
IBM/AIX machines, which is going to make it very hard for anyone to
give useful advice remotely; it would mostly be guesswork.

I suggest that you download the benchmarks from
http://www.wien2k.at/reg_user/benchmark/, run them, and compare
the times. Beyond that, get help from someone at IBM who knows the poe
command, or try something more standard such as Open MPI, which many
people know.
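
As a rough aid for comparing runs, the `time` figures quoted further down in this thread can be reduced to two numbers per run: the speedup relative to the serial run, and the ratio of CPU time (user + sys) to wall time. The following is only a minimal sketch in Python; the names `TIMINGS`, `to_seconds` and `summarise` are made up for illustration and are not part of Wien2k. It also assumes that `time` was wrapped around the launching command, in which case CPU time used by tasks that poe starts elsewhere may not be counted at all.

```python
import re

# Timings copied from the `time` output quoted later in this thread.
# Keys are the number of processes (1 = the serial run).
TIMINGS = {
    1:  {"real": "6m43.33s",  "user": "6m19.18s", "sys": "0m13.59s"},
    2:  {"real": "10m36.03s", "user": "1m4.68s",  "sys": "0m47.79s"},
    4:  {"real": "11m11.25s", "user": "1m5.24s",  "sys": "0m52.17s"},
    8:  {"real": "11m39.17s", "user": "1m6.18s",  "sys": "1m10.65s"},
    16: {"real": "14m31.16s", "user": "1m7.95s",  "sys": "2m7.63s"},
}


def to_seconds(stamp):
    """Convert a `time`-style string such as '6m43.33s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarise(timings):
    """Return {nproc: (wall seconds, speedup vs. serial, cpu/wall ratio)}."""
    serial_wall = to_seconds(timings[1]["real"])
    summary = {}
    for nproc, t in sorted(timings.items()):
        wall = to_seconds(t["real"])
        cpu = to_seconds(t["user"]) + to_seconds(t["sys"])
        summary[nproc] = (wall, serial_wall / wall, cpu / wall)
    return summary


if __name__ == "__main__":
    for nproc, (wall, speedup, busy) in summarise(TIMINGS).items():
        print(f"{nproc:2d} proc  wall {wall:7.2f} s  "
              f"speedup {speedup:4.2f}  cpu/wall {busy:4.2f}")
```

With these numbers the serial run spends nearly all of its wall time on the CPU, while every parallel run shows a speedup below 1 and a cpu/wall ratio under a quarter. That pattern suggests either that the processes are mostly waiting (startup, I/O or communication) or that the reported CPU times do not include the MPI tasks, which is why the times in case.outputup_* are worth checking.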

On Fri, May 3, 2013 at 10:32 PM, Laurence Marks
<L-marks at northwestern.edu> wrote:
> Please have a look at the end of case.outputup_* which gives the real cpu
> and wall times and post those. It may be that the times being reported are
> misleading.
>
> In addition, I do not understand why you are seeing an error and the script
> is continuing - it should not. Maybe some of the tasks are not working or
> there are bugs in the csh. It may be useful to post the dayfile.
>
> ---------------------------
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought"
> Albert Szent-Gyorgi
>
> On May 3, 2013 6:47 PM, "Oliver Albertini" <ora at georgetown.edu> wrote:
>>
>> Thanks to you both for the suggestions. The OS was recently updated beyond
>> those versions mentioned in the link (now 6100-08).
>>
>> Adding the iostat statement to all the errclr.f files prevents the program
>> from stopping altogether, although error messages still appear in the output:
>>
>> STOP  LAPW0 END
>> STOP  LAPW0 END
>> STOP  LAPW0 END
>> STOP  LAPW0 END
>> STOP  LAPW0 END
>> STOP LAPW1 - Error
>> STOP  LAPW1 END
>> STOP  LAPW1 END
>> STOP  LAPW1 END
>> STOP  LAPW1 END
>> STOP LAPW1 - Error
>> STOP  LAPW1 END
>> STOP  LAPW1 END
>> STOP  LAPW1 END
>> STOP  LAPW1 END
>> STOP LAPW2 - FERMI; weighs written
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  SUMPARA END
>> STOP LAPW2 - FERMI; weighs written
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  LAPW2 END
>> STOP  SUMPARA END
>> STOP  CORE  END
>> STOP  CORE  END
>> STOP  MIXER END
>>
>>
>> These errors are more prevalent at higher processor counts. In addition,
>> across a few runs with more processors, the times have steadily
>> increased:
>>
>> real    6m43.33s
>> user    6m19.18s     serial
>> sys     0m13.59s
>>
>> real    10m36.03s
>> user    1m4.68s      2 proc
>> sys     0m47.79s
>>
>> real    11m11.25s
>> user    1m5.24s      4 proc
>> sys     0m52.17s
>>
>> real    11m39.17s
>> user    1m6.18s      8 proc
>> sys     1m10.65s
>>
>> real    14m31.16s
>> user    1m7.95s      16 proc
>> sys     2m7.63s
>>
>> I have tried various IBM Parallel Operating Environment (poe)
>> environment variables (MP_SHARED_MEMORY, MP_IO_BUFFER_SIZE, MP_EAGER_LIMIT),
>> but none of them seems to improve performance. Any ideas why this is
>> getting slower?
>>
>>
>> On Thu, May 2, 2013 at 8:49 PM, Gavin Abo <gsabo at crimson.ua.edu> wrote:
>>>
>>>
>>>> STOP  LAPW0 END
>>>> "inilpw.f", line 233: 1525-142 The CLOSE statement on unit 200 cannot be
>>>> completed because an errno value of 2 (A file or directory in the path name
>>>> does not exist.) was received while closing the file.  The program will
>>>> stop.
>>>> STOP  LAPW1 END
>>>
>>> If this is on operating system AIX 6.1
>>> [http://zeus.theochem.tuwien.ac.at/pipermail/wien/2013-March/018560.html],
>>> the following link mentions that a fix might be needed for some release
>>> levels:
>>>
>>> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ23555
>>
>>
>




