[Wien] I still have a problem with WIEN2k in parallel mode

Nilton nilton.dantas at gmail.com
Mon Jan 2 19:22:41 CET 2012


Dear L. Marks,

I followed your suggestions, but unfortunately I have had no success yet.

2011/12/28 Laurence Marks <L-marks at northwestern.edu>

> Suggestions, assuming that all your computers are dual quadcores:
> a) Use as .machines file
> 1:bodesking.uefs.br:8
> 1:compute-0-0.local:8
> 1:compute-0-1.local:8


> This will run 3 tasks each using mpi with 8 cores on each computer. If
> they are not dual quadcores but only have (for instance) 4 cores
> change the "8" to "4".
>

I tried it with both 8 and 4. My computer is a Xeon quad-core, but I am
not sure whether it is dual. Here is the output of /proc/cpuinfo:
 ------------------------------------------------------------------------------------------------------------------------------

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 30
model name      : Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz
stepping        : 5
cpu MHz         : 1197.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx
rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4800.07
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: [8]
--------------------------------------------------------------------------------------------------------------------
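
Whether a machine has one or two CPU packages can be read from the
"physical id" and "cpu cores" fields of this file. A minimal sketch,
assuming a Linux /proc/cpuinfo laid out like the listing above; the
function names count_sockets and cores_per_socket are made up for
illustration:

```shell
# count_sockets [FILE]: number of distinct "physical id" values,
# i.e. the number of CPU packages. FILE defaults to /proc/cpuinfo.
count_sockets() {
    grep '^physical id' "${1:-/proc/cpuinfo}" | sort -u | wc -l | tr -d ' '
}

# cores_per_socket [FILE]: value of the first "cpu cores" line.
cores_per_socket() {
    awk -F: '/^cpu cores/ { gsub(/[ \t]/, "", $2); print $2; exit }' \
        "${1:-/proc/cpuinfo}"
}
```

Calling count_sockets and cores_per_socket without arguments on each node
gives the two numbers; their product is the value to put after the last
colon in .machines. The single entry pasted above (physical id 0, cpu
cores 4) only shows one processor, so it does not by itself settle how
many packages there are.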


> b) If this still fails, do "tail *.scf1* " and "tail *.output1*" and
> see if only one failed, or all failed. I assume you are using a
> terminal not just w2web. Have you checked the error files?
>

This fails: only lapw0 finishes. It generates lapw1_1.error, lapw1_2.error
and lapw1_3.error, but they are empty.
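
Empty and non-empty error files can be told apart in one step. A sketch,
assuming the *.error files sit directly in the case directory;
nonempty_errors is a hypothetical helper name:

```shell
# nonempty_errors [DIR]: list the *.error files in DIR (default: current
# directory) that contain at least one byte.
nonempty_errors() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.error' -size +0c -print
}
```

If it prints nothing, every error file is empty, which is the situation
described here.
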
Here is the content of case.dayfile
-------------------------------------------------case.dayfile---------------------------------------------------
Calculating case in
/home/nilton/pesquisa/dftCalc/calWien/gaxtl1-xas/075/case
on bodesking.uefs.br with PID 13581
using WIEN2k_10.1 (Release 7/6/2010) in /home/nilton/wien2k


    start     (Mon Jan  2 14:59:45 BRT 2012) with lapw0 (40/99 to go)

    cycle 1     (Mon Jan  2 14:59:45 BRT 2012)     (40/99 to go)

>   lapw0 -p    (14:59:45) starting parallel lapw0 at Mon Jan  2 14:59:45
BRT 2012
-------- .machine0 : processors
running lapw0 in single mode
14.244u 0.418s 0:14.67 99.8%    0+0k 0+0io 0pf+0w
>   lapw1  -c -p      (15:00:00) starting parallel lapw1 at Mon Jan  2
15:00:00 BRT 2012
->  starting parallel LAPW1 jobs at Mon Jan  2 15:00:00 BRT 2012
running LAPW1 in parallel mode (using .machines)
3 number_of_parallel_jobs
[1] 13841
[1]  + Exit 255                      ( $remote $remotemachine "cd $PWD;$t
$ttt;rm -f .lock_$lockfile[$p]" ) >>  ...
[1] 13871
[1]  + Exit 255                      ( $remote $remotemachine "cd $PWD;$t
$ttt;rm -f .lock_$lockfile[$p]" ) >>  ...
[1] 13898
[1]  + Exit 255                      ( $remote $remotemachine "cd $PWD;$t
$ttt;rm -f .lock_$lockfile[$p]" ) >>  ...


[1] 13749
-----------------------------------------------------------------------------------------------------------------------------------------




> c) Do you have ssh without password setup? For instance you need to be
> able to do "ssh compute-0-0.local" and not be asked for a password. If
> it is not setup, you may have to as many mpi versions need it.
>

Yes, I can ssh without a password.
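
The same check can be repeated for every host named in .machines. A
minimal sketch, assuming the simple "weight:host:ncores" line form quoted
above (other .machines line types are ignored); machines_hosts and
check_ssh are hypothetical names. BatchMode=yes makes ssh fail instead of
prompting, so a host that would ask for a password shows up as FAILED:

```shell
# machines_hosts [FILE]: unique host names from "weight:host:ncores" lines.
# FILE defaults to .machines in the current directory.
machines_hosts() {
    awk -F: '/^[0-9]+:/ { print $2 }' "${1:-.machines}" | sort -u
}

# check_ssh [FILE]: try a non-interactive login on every host and report.
check_ssh() {
    for h in $(machines_hosts "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
            echo "$h: ok"
        else
            echo "$h: FAILED"
        fi
    done
}
```

Running check_ssh in the case directory prints one line per host.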


>
> d) Do "cd $WIENROOT ; cp lapw1para lapw1para_hold" then edit lapw1para
> and change the first line to "#!/bin/csh -xf" . This will give you
> masses of output, and may show an error. If nothing else it will show
> a command such as "mpirun ..." You can then paste this particular
> command and run it at the terminal to get more information.
>
I did. I did not get any error, but there is a strange message (the
output is long; I have the whole of it in a file):
------------------------------------------------------------------------------------------------------------------------------
sleep 1
Pseudo-terminal will not be allocated because stdin is not a terminal.
ssh: cd /home/nilton/pesquisa/dftCalc/calWien/gaxtl1-xas/075/case;time
mpirun -np 4 -machinefile .machine: Name or service not known
end
------------------------------------------------------------------------------------------------------------------------------

It seems that it is looking for a .machine file, but that file does not
exist. I cannot paste this command and run it, because it does not
include the executable.
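
One possible reading of the message, offered only as an assumption: the
text that ssh reports as an unknown name is the command string itself,
which is what would happen if the host variable ($remotemachine in the
dayfile lines above) expanded to nothing. The sketch below is plain sh
rather than the csh of lapw1para, and show_host is a hypothetical
stand-in for the way ssh takes its first non-option argument as the host
name. The directory and command in cmd are placeholders.

```shell
# show_host ARGS...: print the first argument, which is what ssh would
# treat as the host name when no options are given.
show_host() {
    printf '%s\n' "$1"
}

cmd='cd /hypothetical/case/dir; some_command'

# Host variable set: the first argument is the host, as intended.
remotemachine="compute-0-0.local"
show_host $remotemachine "$cmd"     # prints compute-0-0.local

# Host variable empty and unquoted: it disappears from the argument list,
# so the command string moves into the host position.
remotemachine=""
show_host $remotemachine "$cmd"     # prints the command string itself
```

If this is what happens, the thing to look for in the "-xf" trace is the
place where the host name is set before the ssh line.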


Nilton
-- 
Nilton S. Dantas
Universidade Estadual de Feira de Santana
Departamento de Ciências Exatas
Área de Informática
Av. Transnordestina, S/N, Bairro Novo Horizonte
CEP 44036900 - Feira de Santana, Bahia, Brasil
Tel./Fax +55 75 31618086
http://www2.ecomp.uefs.br/ <http://www.uefs.br/portal>