[Wien] wien in parallel mode is slower than serial mode

Stefaan Cottenier Stefaan.Cottenier at fys.kuleuven.be
Fri Dec 21 19:07:01 CET 2007


Three remarks,

1) I agree with Florent that it does not make much sense to base
conclusions on such small jobs. Try something that takes at least one
minute of execution time for lapw1.

2) You said you did try a larger test case, with the same results. And
indeed, there is something weird in the dayfiles you sent: lapw1 always
shows 100% CPU occupancy (serial and parallel), while lapw2 goes way
beyond 100% (even up to 400% for the serial run). It looks like you
compiled lapw1 with OMP_NUM_THREADS=1, while for lapw2 it is 4, and
you probably run this on a quad-core CPU...? In that case, lapw2 would
be somewhat parallelized even in a serial run, while lapw1 is not.
This will spoil your timing.
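As an aside, OMP_NUM_THREADS is normally a runtime environment variable
rather than a compile-time setting, so one way to rule this effect out
(a sketch, assuming a bash-like shell) is to pin both programs to a
single thread before timing:

```shell
# Pin any OpenMP-threaded code (e.g. a threaded BLAS/LAPACK) to one core,
# so that a "serial" run is not silently multi-threaded.
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With this in place, a serial lapw2 run should no longer show CPU
occupancy far above 100%, and serial vs. parallel timings become
comparable.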

3) lapw2 writes heavily to disk; lapw1 does so much less. Raw CPU
speed is therefore more important for lapw1, while disk access time
and bus speed matter more for lapw2. That is another reason why
parallelization might affect lapw1 and lapw2 differently.
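For reference, the percentage in those dayfile timing lines is just
(user + system time) / wall-clock time. A small sketch of that
arithmetic, using the values copied from the serial lapw2 line in the
dayfile quoted below (awk assumed available):

```shell
# CPU occupancy as the dayfile reports it: (user + sys) / wall-clock.
# Values taken from the serial lapw2 line: 10.270u 0.097s 0:02.63 393.9%
user=10.270
sys=0.097
wall=2.63
# ~394% occupancy means roughly four cores were busy in a "serial" run
awk -v u="$user" -v s="$sys" -v w="$wall" \
    'BEGIN { printf "%.0f%%\n", 100*(u+s)/w }'
```

Any value well above 100% for a nominally serial step is a sign that
the step is multi-threaded under the hood.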

Stefaan

-------------> Single mode<-----------
Calculating case in /home/nilton/lapw/gaas/case
on slv1.gfba.fis.ufba.br with PID 23327

      start       (Wed Dec 19 10:20:05 BRT 2007) with lapw0 (40/99 to go)

      cycle 1     (Wed Dec 19 10:20:05 BRT 2007)  (40/99 to go)
    lapw0       (10:20:05) 15.541u 0.109s 0:06.11 255.9%        0+0k
0+496io 0pf+0w
    lapw1  -c   (10:20:12) 2.458u 0.094s 0:02.55 99.6%  0+0k 0+8224io 0pf+0w
    lapw2 -c    (10:20:14) 10.270u 0.097s 0:02.63 393.9%        0+0k
0+552io 0pf+0w
    lcore       (10:20:17) 0.021u 0.018s 0:00.04 75.0%  0+0k 0+1152io 0pf+0w
    mixer       (10:20:17) 0.125u 0.016s 0:00.11 118.1% 0+0k 0+808io 0pf+0w
:ENERGY convergence:  1 0.0001 .0000450000000000 :CHARGE convergence:
0 0.0000 .0011826
ec cc and fc_conv 1 1 1
--------------> END SINGLE MODE<--------

----------> Parallel mode<------------
running lapw0 in single mode
15.488u 0.154s 0:06.34 246.5%   0+0k 0+520io 0pf+0w
    lapw1  -c -p        (09:25:00) starting parallel lapw1 at Wed Dec
19 09:25:00 BRT 2007
->  starting parallel LAPW1 jobs at Wed Dec 19 09:25:00 BRT 2007
running LAPW1 in parallel mode (using .machines)
5 number_of_parallel_jobs
       localhost(2) 0.138u 0.019s 0:00.15 93.3%   0+0k 0+640io 0pf+0w
       localhost(9) 0.520u 0.034s 0:00.55 100.0%  0+0k 0+1544io 0pf+0w
       localhost(9) 0.492u 0.035s 0:00.52 100.0%  0+0k 0+1560io 0pf+0w
       localhost(9) 0.492u 0.023s 0:00.51 100.0%  0+0k 0+1552io 0pf+0w
       localhost(18) 0.963u 0.067s 0:01.03 99.0%  0+0k 0+3144io 0pf+0w
     Summary of lapw1para:
     localhost     k=47    user=2.605      wallclock=2.76
2.703u 0.353s 0:07.21 42.3%     0+0k 0+9232io 0pf+0w
    lapw2 -c  -p        (09:25:07) running LAPW2 in parallel mode
        localhost 0.562u 0.015s 0:00.23 247.8% 0+0k 0+520io 0pf+0w
   localhost 3.589u 0.024s 0:01.03 349.5% 0+0k 0+520io 0pf+0w
        localhost 33.012u 1.100s 0:19.64 173.6% 0+0k 0+520io 0pf+0w
        localhost 8.394u 0.263s 0:06.73 128.5% 0+0k 0+520io 0pf+0w
        localhost 30.013u 0.850s 0:17.83 173.0% 0+0k 0+520io 0pf+0w
     Summary of lapw2para:
     localhost     user=75.57      wallclock=45.46
75.714u 2.462s 0:23.15 337.6%   0+0k 0+4208io 0pf+0w
    lcore       (09:25:30) 0.022u 0.021s 0:00.04 100.0% 0+0k 0+1152io 0pf+0w
    mixer       (09:25:31) 0.136u 0.012s 0:00.11 127.2% 0+0k 0+808io 0pf+0w
:ENERGY convergence:  1 0.0001 .0000010000000000 :CHARGE convergence:
0 0.0000 .0009136
ec cc and fc_conv 1 1 1
    stop
----------------------------------------------------------------------








Quoting nilton at ufba.br:

> Dear Florent,
>
> Yes, I did follow your previous remark and it did not work! That is
> the reason why I put the question again. And I sent an e-mail to your
> private e-mail address before, explaining this point. Thanks for your
> consideration and your wish to help me, but it is not enough for me. I
> raised the matrix size of that system by more than 10 times and the
> behavior is the same: lapw1 is faster, but lapw2 is much slower. If
> lapw2 writes to the disk, lapw1 does too. So, why could lapw1 be
> faster and lapw2 not? This doesn't make sense. Also, the total time of
> the parallel run is bigger than that of the serial run, as you can see
> in my previous e-mail.
>
>> My opinion is to use parallel calculations if you win something.
>> Changing from 4 hours to 1 hour by running on 4 nodes instead of 1 is
>> efficient, but running in 15s instead of 60s... I am not sure it is,
>> and furthermore, I'm not sure it will justify the time that people
>> will spend answering your question.
> I do not want people to stop their jobs to answer my question, but if
> someone has experienced this problem and wants to help me, I will
> appreciate it.
> Regards and merry Christmas
> Nilton
>>
>> --
>>  -------------------------------------------------------------------------
>> | Florent BOUCHER                    |                                    |
>> | Institut des Matériaux Jean Rouxel | Mailto:Florent.Boucher at cnrs-imn.fr |
>> | 2, rue de la Houssinière           | Phone: (33) 2 40 37 39 24          |
>> | BP 32229                           | Fax:   (33) 2 40 37 39 95          |
>> | 44322 NANTES CEDEX 3 (FRANCE)      | http://www.cnrs-imn.fr             |
>>  -------------------------------------------------------------------------
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>
>
>
> ----------------------------------------------------------------
> Universidade Federal da Bahia - http://www.portal.ufba.br
>
>



-- 
Stefaan Cottenier
Instituut voor Kern- en Stralingsfysica
K.U.Leuven
Celestijnenlaan 200 D
B-3001 Leuven (Belgium)

tel: + 32 16 32 71 45
fax: + 32 16 32 79 85
e-mail: stefaan.cottenier at fys.kuleuven.be


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm


