[Wien] a parallel error of lapw0 with MBJLDA potential (updated)
wanxiang feng
fengwanxiang at gmail.com
Fri Jun 11 08:13:52 CEST 2010
Thanks for your timely reply!
I know that lapw0_mpi will not speed up a small system like GaAs; it was
just a test case before we calculate some larger systems.
The code now handles the parallel lapw0 run for GaAs correctly, but
another problem arose when we calculated larger systems (3 or 8
inequivalent atoms in the primitive cell).
The calculation cannot proceed past the second call of lapw0, whether or
not lapw0 is run in parallel. The job does not stop: lapw0 (or lapw0_mpi)
keeps running without any error message, but it never finishes, even
after a very long time.
======== case.dayfile =========================================================
start (Fri Jun 11 00:08:00 CST 2010) with lapw0 (1/99 to go)
cycle 1 (Fri Jun 11 00:08:00 CST 2010) (1/99 to go)
> lapw0 -grr -p (00:08:00) starting parallel lapw0 at Fri Jun 11 00:08:00 CST 2010
-------- .machine0 : 16 processors
0.824u 0.444s 0:10.82 11.6% 0+0k 0+0io 0pf+0w
> lapw0 -p (00:08:11) starting parallel lapw0 at Fri Jun 11 00:08:11 CST 2010
-------- .machine0 : 16 processors
=====================================================================================
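To make sure this is really a hang and not just a slow step, we check it
roughly like this (a sketch only, assuming a standard WIEN2k session;
"case" stands for the actual case name):

  x lapw0 -p             # re-run the stuck step by hand

  # from a second shell: when lapw0 hangs, its output stops growing while
  # the processes stay alive and print no error
  ls -l case.output0*
  tail case.dayfile
  ps aux | grep lapw0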
It seems that the code cannot handle systems containing more than two
inequivalent atoms. We suspect there are still some bugs in lapw0
related to the MBJLDA potential.
The attached files can be used as a test example.
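For reference, the inequivalent-atom count can be read straight from the
struct file header. A quick sketch ("case.struct" stands for the actual
file name):

  head -2 case.struct   # the 2nd line lists the lattice type and the
                        # number of inequivalent atoms (3 in the file below)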
Thanks,
Feng.
2010/6/10 Peter Blaha <pblaha at theochem.tuwien.ac.at>:
> Thanks for the report. I could verify the problem with the mpi-parallel
> version for mBJ, and a corrected version is on the web for download.
>
> HOWEVER: Please be aware that lapw0_mpi parallelizes (mainly) over the
> atoms. Thus for GaAs I do not expect any speedup from using more than 2
> processors.
>
> Furthermore: Do NOT blindly use "parallel" calculations. For such small
> systems a sequential calculation (maybe with OMP_NUM_THREADS set to 2) might
> be FASTER than an 8-fold or higher parallel calculation (parallel overhead,
> disk I/O, "summary" steps, slower memory access, ...).
> Always compare the "real timings" of lapw0/1/2 in the dayfiles of a
> sequential and a parallel calculation.
>
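For the timing comparison suggested above, a minimal setup could look like
this (a sketch only; the hostname "node1" and the core counts are
placeholders, not taken from our runs):

  # .machines: give lapw0_mpi only as many cores as there are atoms
  lapw0: node1:2
  1:node1:4
  granularity:1

  # sequential reference run with 2 OpenMP threads (if built with OpenMP)
  setenv OMP_NUM_THREADS 2    # or "export OMP_NUM_THREADS=2" in bash
  x lapw0

  # then compare the wall-clock time printed after each lapw0 line
  grep -A1 lapw0 case.dayfile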
-------------- next part --------------
blebleble s-o calc. M|| 0.00 0.00 1.00
F 3 216
RELA
12.425894 12.425894 12.425894 90.000000 90.000000 90.000000
ATOM -1: X=0.00000000 Y=0.00000000 Z=0.00000000
MULT= 1 ISPLIT=-2
Bi NPT= 781 R0=.000005000 RMT= 2.50000 Z: 83.00000
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
ATOM -2: X=0.25000000 Y=0.25000000 Z=0.25000000
MULT= 1 ISPLIT=-2
Pt NPT= 781 R0=.000005000 RMT= 2.50000 Z: 78.00000
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
ATOM -3: X=0.50000000 Y=0.00000000 Z=0.00000000
MULT= 1 ISPLIT=-2
Lu NPT= 781 R0=.000010000 RMT= 2.50000 Z: 71.00000
LOCAL ROT MATRIX: 1.0000000 0.0000000 0.0000000
0.0000000 1.0000000 0.0000000
0.0000000 0.0000000 1.0000000
8 NUMBER OF SYMMETRY OPERATIONS
0 1 0 0.0000000
-1 0 0 0.0000000
0 0-1 0.0000000
1 A 3 so. oper. type orig. index
0-1 0 0.0000000
1 0 0 0.0000000
0 0-1 0.0000000
2 A 7
-1 0 0 0.0000000
0-1 0 0.0000000
0 0 1 0.0000000
3 A 16
1 0 0 0.0000000
0 1 0 0.0000000
0 0 1 0.0000000
4 A 24
1 0 0 0.0000000
0-1 0 0.0000000
0 0-1 0.0000000
5 B 1
-1 0 0 0.0000000
0 1 0 0.0000000
0 0-1 0.0000000
6 B 9
0 1 0 0.0000000
1 0 0 0.0000000
0 0 1 0.0000000
7 B 18
0-1 0 0.0000000
-1 0 0 0.0000000
0 0 1 0.0000000
8 B 22
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ouput0.rar
Type: application/octet-stream
Size: 25203 bytes
Desc: not available
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20100611/b81dd720/attachment.dll>