[Wien] MPI Problem

Laurence Marks L-marks at northwestern.edu
Fri May 3 05:11:02 CEST 2013


I think these are semi-harmless, and you can add ",iostat=i" to the
relevant lines. You may need to add the same to any write statements to
unit 99 in errclr.f.
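For illustration only (the actual open/write statements in errclr.f will
differ, and the file name and status variable below are placeholders), a
unit-99 access guarded with iostat would look roughly like this:

      INTEGER I
C     open the error file; IOSTAT keeps a missing path from aborting the run
      OPEN (99, FILE='case.error', STATUS='UNKNOWN', IOSTAT=I)
C     write the message; a nonzero I is ignored instead of stopping LAPW2
      WRITE (99, '(A)', IOSTAT=I) 'Error in LAPW2'
      CLOSE (99, IOSTAT=I)

If you do want to catch a genuine failure, test I after the statement
instead of ignoring it.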

However, your timing seems strange: ~6.5 minutes serial versus ~9.5 minutes
parallel. Is this CPU time? The WALL time may be more reliable.

---------------------------
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Györgyi
 On May 2, 2013 7:25 PM, "Oliver Albertini" <ora at georgetown.edu> wrote:

>  Dear W2K,
>
>  On an AIX 560 server with 16 processors, I have been running the SCF cycle
> for a NiO supercell (2x2x2) in serial as well as in MPI parallel (one
> k-point). The
> serial version runs fine. When running in parallel, the following error
> appears:
>
>  STOP LAPW2 - FERMI; weighs written
> "errclr.f", line 64: 1525-014 The I/O operation on unit 99 cannot be
> completed because an errno value of 2 (A file or directory in the path name
> does not exist.) was received while opening the file.  The program will
> stop.
>
>  A similar error, which does not stop the program, is the following:
>
>   STOP  LAPW0 END
> "inilpw.f", line 233: 1525-142 The CLOSE statement on unit 200 cannot be
> completed because an errno value of 2 (A file or directory in the path name
> does not exist.) was received while closing the file.  The program will
> stop.
> STOP  LAPW1 END
>
>
> The second error is always there, while the first appears only with more
> than 2 (4, 8, or 16) processors. Running the SCF cycle in serial took ~6.5
> minutes; in parallel with two processors it took ~9.5 minutes. The problem
> occurs regardless of whether MPI/USER_REMOTE is set to 0 or 1.
>
>
>  My compile options:
>
>  FC = xlf90
> MPF = mpxlf90
> CC = xlc -q64
> FOPT =  -O5 -qarch=pwr6 -q64 -qextname=flush:w2k_catch_signal
> FPOPT =  -O5 -qarch=pwr6 -q64 -qfree=f90
> -qextname=flush:w2k_catch_signal:fftw_mpi_execute_dft
> #DParallel = '-WF,-DParallel'
> FGEN = $(PARALLEL)
> LDFLAGS = -L /lapack-3.4.2/ -L /usr/lpp/ppe.poe/lib/ -L /usr/local/lib -I
> /usr/include -q64 -bnoquiet
> R_LIBS     = -llapack -lessl -lfftw3 -lm -lfftw3_essl_64
> RP_LIBS = $(R_LIBS) -lpessl -lmpi -lfftw3_mpi
>
>  WIEN_MPI_RUN='poe _EXEC_ -procs _NP_'
>
>  .machines and host.list attached.
>
>  As always, any advice on this matter would be great,
>
>  Oliver Albertini
>

