Dear L. Marks,<br><br>I followed your suggestions, but unfortunately I have had no success yet.<br><br><div class="gmail_quote">2011/12/28 Laurence Marks <span dir="ltr"><<a href="mailto:L-marks@northwestern.edu">L-marks@northwestern.edu</a>></span><br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Suggestions, assuming that all your computers are dual quadcores:<br>
a) Use as .machines file<br>
1:bodesking.uefs.br:8<br>
1:compute-0-0.local:8<br>
1:compute-0-1.local:8 </blockquote><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
This will run 3 tasks each using mpi with 8 cores on each computer. If<br>
they are not dual quadcores but only have (for instance) 4 cores<br>
change the "8" to "4".<br></blockquote><div> </div><div>I tried it with both the 8 and the 4 options. My computer is a Xeon quad-core, but I am not sure whether it is dual. Here is the output of /proc/cpuinfo:<br> ------------------------------------------------------------------------------------------------------------------------------ <br>
processor : 0<br>vendor_id : GenuineIntel<br>cpu family : 6<br>model : 30<br>model name : Intel(R) Xeon(R) CPU X3430 @ 2.40GHz<br>stepping : 5<br>cpu MHz : 1197.000<br>
cache size : 8192 KB<br>physical id : 0<br>siblings : 4<br>core id : 0<br>cpu cores : 4<br>apicid : 0<br>fpu : yes<br>fpu_exception : yes<br>cpuid level : 11<br>wp : yes<br>
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm<br>
bogomips : 4800.07<br>clflush size : 64<br>cache_alignment : 64<br>address sizes : 36 bits physical, 48 bits virtual<br>power management: [8]<br>--------------------------------------------------------------------------------------------------------------------<br>
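A minimal sketch of how the "dual or not" question could be checked from this kind of output. It assumes the usual /proc/cpuinfo layout (one blank-line-separated block per logical processor, with "physical id" and "cpu cores" fields as shown above). The function name summarize_cpuinfo and the SAMPLE string are hypothetical, and SAMPLE contains only the single block pasted here, not the full file.<br>

```python
def summarize_cpuinfo(text):
    """Count sockets, physical cores and logical CPUs in /proc/cpuinfo text."""
    sockets = {}   # "physical id" value -> "cpu cores" reported for that socket
    logical = 0    # number of processor blocks seen
    block = {}
    # The extra "" makes sure the last block is flushed even without a trailing blank line.
    for line in text.splitlines() + [""]:
        if line.strip():
            key, _, value = line.partition(":")
            block[key.strip()] = value.strip()
        elif block:
            logical += 1
            socket_id = block.get("physical id", "0")
            sockets[socket_id] = int(block.get("cpu cores", "1"))
            block = {}
    return {
        "sockets": len(sockets),
        "cores": sum(sockets.values()),
        "logical_cpus": logical,
    }


# Illustrative input only: a shortened copy of the single block quoted above.
SAMPLE = """processor : 0
model name : Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
"""

if __name__ == "__main__":
    print(summarize_cpuinfo(SAMPLE))
```

On the real machine the whole file would be passed in instead of SAMPLE. A dual-socket system would show more than one distinct "physical id" across the processor blocks.<br>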
<br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
b) If this still fails, do "tail *.scf1* " and "tail *.output1*" and<br>
see if only one failed, or all failed. I assume you are using a<br>
terminal not just w2web. Have you checked the error files?<br></blockquote><div> <br></div><div>This fails because only lapw0 finishes. It generates lapw1_1,2,3.error, but they are empty.<br>Here is the content of case.dayfile:<br>
-------------------------------------------------case.dayfile---------------------------------------------------<br>Calculating case in /home/nilton/pesquisa/dftCalc/calWien/gaxtl1-xas/075/case<br>on <a href="http://bodesking.uefs.br">bodesking.uefs.br</a> with PID 13581<br>
using WIEN2k_10.1 (Release 7/6/2010) in /home/nilton/wien2k<br><br><br> start (Mon Jan 2 14:59:45 BRT 2012) with lapw0 (40/99 to go)<br><br> cycle 1 (Mon Jan 2 14:59:45 BRT 2012) (40/99 to go)<br><br>> lapw0 -p (14:59:45) starting parallel lapw0 at Mon Jan 2 14:59:45 BRT 2012<br>
-------- .machine0 : processors<br>running lapw0 in single mode<br>14.244u 0.418s 0:14.67 99.8% 0+0k 0+0io 0pf+0w<br>> lapw1 -c -p (15:00:00) starting parallel lapw1 at Mon Jan 2 15:00:00 BRT 2012<br>-> starting parallel LAPW1 jobs at Mon Jan 2 15:00:00 BRT 2012<br>
running LAPW1 in parallel mode (using .machines)<br>3 number_of_parallel_jobs<br>[1] 13841<br>[1] + Exit 255 ( $remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]" ) >> ...<br>
[1] 13871<br>[1] + Exit 255 ( $remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]" ) >> ...<br>[1] 13898<br>[1] + Exit 255 ( $remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]" ) >> ...<br>
<br> <br>[1] 13749<br>----------------------------------------------------------------------------------------------------------------------------------------- <br><br><br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
c) Do you have ssh without password setup? For instance you need to be<br>
able to do "ssh compute-0-0.local" and not be asked for a password. If<br>
it is not setup, you may have to as many mpi versions need it.<br></blockquote><div> </div><div>Yes, I can do ssh without password.<br> <br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
d) Do "cd $WIENROOT ; cp lapw1para lapw1para_hold" then edit lapw1para<br>
and change the first line to "#!/bin/csh -xf" . This will give you<br>
masses of output, and may show an error. If nothing else it will show<br>
a command such as "mpirun ..." You can then paste this particular<br>
command and run it at the terminal to get more information.<br></blockquote><div>I did. I did not get any error, only a strange message:<br>----------------------------------------------------it is a long message. I have the message in a file---------------------------<br>
sleep 1<br>Pseudo-terminal will not be allocated because stdin is not a terminal.
<br>ssh: cd /home/nilton/pesquisa/dftCalc/calWien/gaxtl1-xas/075/case;time mpirun -np 4 -machinefile .machine: Name or service not known
<br>end<br>------------------------------------------------------------------------------------------------------------------------------<br><br>It seems that are looking for .machine file. but It dont exist. I can paste this command because it dont have the exec. file<br>
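One possible reading of the ssh line above, offered as a sketch and not as a confirmed diagnosis: the text before "Name or service not known" is what ssh tried to resolve as a host name, and here it is the command string. That would happen if the host variable in the `$remote $remotemachine "cd $PWD;..."` line from the dayfile expanded to nothing. The Python below only imitates that unquoted substitution. The helpers build_remote_argv and target_host are hypothetical and are not part of WIEN2k or ssh.<br>

```python
import shlex


def build_remote_argv(remote, host, command):
    """Imitate unquoted variable substitution: an empty host contributes no word."""
    line = f"{remote} {host} {shlex.quote(command)}"
    return shlex.split(line)


def target_host(argv):
    """With no options given, the first argument after the program is taken as the host."""
    return argv[1]


if __name__ == "__main__":
    cmd = "cd /some/case/dir;time mpirun -np 4 -machinefile .machine"  # illustrative command
    for host in ("compute-0-0.local", ""):
        argv = build_remote_argv("ssh", host, cmd)
        print(repr(host), "->", repr(target_host(argv)))
```

If this reading is right, the thing to check would be why the host name is empty when the remote command is assembled, for example how the host entries in .machines are being read.<br>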
<br></div></div><br clear="all">Nilton<br>-- <br>Nilton S. Dantas<br>Universidade Estadual de Feira de Santana<br>Departamento de Ciências Exatas<br>Área de Informática<br>Av. Transnordestina, S/N, Bairro Novo Horizonte<br>
CEP 44036900 - Feira de Santana, Bahia, Brasil<br>Tel./Fax +55 75 31618086<br><a href="http://www.uefs.br/portal" target="_blank">http://www2.ecomp.uefs.br/</a><br><br>