[Wien] wien in parallel mode is slower than serial mode
Peter Blaha
pblaha at theochem.tuwien.ac.at
Sat Dec 22 18:03:54 CET 2007
The timings you sent look a bit "unusual": some of them show much more
than 100%. It is difficult to say exactly what the problem could be,
but note:
Quad-core CPUs are probably not an optimal platform, because memory
access is too slow. In addition, your local disk or network may be
overloaded by 4 parallel tasks.
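
To make the percentages in the timing lines easier to interpret: they are the ratio of CPU time (user + system) to wall-clock time, as printed by the shell's time facility. A value well above 100% means one process used more than one core (for example through a threaded library), while a value well below 100% means the process spent part of its time waiting, for example for disk or memory. The following is only a minimal sketch for recomputing that ratio from a timing line; the function names are hypothetical and the line format is assumed to be the one shown in the quoted output.

```python
import re

# Assumed format of a csh-style timing line, e.g.
#   "725.214u 2.050s 16:34.90 73.0% 0+0k 0+159168io 0pf+0w"
TIME_RE = re.compile(r"([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)\s+([\d.]+)%")


def wall_seconds(stamp):
    """Convert 'mm:ss.ss' or 'h:mm:ss' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_utilization(line):
    """Return 100 * (user + system) / wall-clock for one timing line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("no timing information found in: " + line)
    user = float(match.group(1))
    system = float(match.group(2))
    wall = wall_seconds(match.group(3))
    return 100.0 * (user + system) / wall


if __name__ == "__main__":
    # Two lines copied from the quoted parallel run.
    for line in (
        "localhost(6) 725.214u 2.050s 16:34.90 73.0% 0+0k 0+159168io 0pf+0w",
        "81.944u 0.645s 0:39.98 206.5% 0+0k 0+7952io 0pf+0w",
    ):
        print("%.1f%%" % cpu_utilization(line))
```

The recomputed values may differ from the printed ones in the last digit, because the shell may truncate rather than round.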
Where are your files? On a local disk or on an NFS-mounted drive?
If it is NFS, try setting the SCRATCH variable so that it points to a
directory on the local disk.
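
One way to answer the local-disk-or-NFS question on a Linux machine is to look up the filesystem type of the case directory in the mount table. The sketch below is only an illustration under that assumption (a Linux-style `/proc/mounts`); the function name `filesystem_type`, the example paths in the comment, and the set of filesystem types treated as "network" are hypothetical choices, not part of WIEN2k.

```python
import os

# Assumption: these type names indicate a network filesystem.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs"}


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of a Linux-style mount table, where each
    line reads "device mount_point fs_type options ...". Mount points that
    contain escaped characters (such as spaces) are not handled here.
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a prefix of the path.
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # Check the current working directory, e.g. the case directory.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            fs_type = filesystem_type(os.getcwd(), handle.read())
        kind = "network" if fs_type in NETWORK_FS else "probably local"
        print(fs_type, "(" + kind + ")")
```

If the case directory turns out to be on a network filesystem, that would be consistent with the suggestion above to point SCRATCH at a local directory.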
nilton at ufba.br wrote:
> Dear Stefaan Cottenier,
> Thank you very much for your answer.
>
>> 1) I agree with Florent that it does not make much sense to base
>> conclusions on such small jobs. Try something that has at least one
>> minute execution time for lapw1.
> I did that calculation for a large system (please see below) and the
> results did not change. As you can see, my system has a matrix size of
> almost 4000 and the CPU time for lapw1 is almost 16 min, which satisfies
> your remark. I am not using MPI because, according to the WIEN user's
> guide, it is not necessary on a shared-memory system (as I said in my
> first e-mail, I have a Core 2 Quad system).
>
> .....
>> compiled lapw1 with OMP_NUM_THREADS=1, while for lapw2 it is 4, and
>> you probably run this on a quadcore cpu...? In that case, lapw2 would
>> be somewhat parallelized even in a serial run, while lapw1 is not.
> I am afraid I do not get your point. Could you please tell me where I
> can find this variable OMP_NUM_THREADS? I searched in the Makefile and
> in the lapw1 files and did not find it.
> regards,
> Nilton
>
> Here is the output of my system. As you can see, it shows the same
> behavior as before, so to solve my problem I need to understand why
> lapw2 takes so long, even compared with serial mode. In parallel
> mode each instance of lapw1 and lapw2 has to work with a smaller number
> of k-points than in serial mode, so it does not make sense that lapw2
> takes more time for each operation.
>
> ------------------run in parallel mode-----------------------------
>
> :RKM : MATRIX SIZE 3957LOs: 360 RKM= 8.99 WEIGHT= 1.00 PGR:
>
> running lapw0 in single mode
> 81.944u 0.645s 0:39.98 206.5% 0+0k 0+7952io 0pf+0w
>> lapw1 -p (15:30:36) starting parallel lapw1 at Fri Dec 21
>> 15:30:36 BRT 2007
> -> starting parallel LAPW1 jobs at Fri Dec 21 15:30:36 BRT 2007
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
> localhost(6) 725.214u 2.050s 16:34.90 73.0% 0+0k 0+159168io 0pf+0w
> localhost(6) 684.668u 2.080s 15:57.46 71.7% 0+0k 0+152816io 0pf+0w
> localhost(6) 699.967u 2.097s 16:10.79 72.3% 0+0k 8+153688io 0pf+0w
> localhost(6) 688.346u 1.890s 16:02.18 71.7% 0+0k 8+154800io 0pf+0w
> Summary of lapw1para:
> localhost k=24 user=2798.2 wallclock=3885.33
> 2799.483u 9.766s 16:36.02 282.0% 0+0k 16+621200io 0pf+0w
>> lapw2 -p (15:47:12) running LAPW2 in parallel mode
> localhost 1246.379u 41.656s 19:13.26 111.6% 0+0k 0+11392io 0pf+0w
> localhost 952.876u 34.233s 16:58.17 96.9% 0+0k 0+11392io 0pf+0w
> localhost 552.495u 19.128s 11:48.03 80.7% 0+0k 0+11392io 0pf+0w
> localhost 781.333u 27.519s 15:09.69 88.9% 0+0k 0+11392io 0pf+0w
> Summary of lapw2para:
> localhost user=3533.08 wallclock=3789.15
> 3533.914u 122.942s 19:15.69 316.4% 0+0k 8+58696io 0pf+0w
>> lcore (16:06:28) 0.119u 0.061s 0:00.24 70.8% 0+0k 0+4248io 0pf+0w
>> mixer (16:06:28) 1.053u 0.180s 0:00.98 125.5% 0+0k 0+14200io 0pf+0w
> :ENERGY convergence: 1 0.0001 .0000100000000000
> :CHARGE convergence: 0 0.0000 .0005089
> ec cc and fc_conv 1 1 1
>
>> stop
>
>