<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Dear WIEN2k community members:</div><div><br></div><div>We have encountered a problem when running WIEN2k in parallel (k-point, MPI, or both): the calculations crash at LAPW2. The same calculations run without problems in serial. We have tried to diagnose the problem, recompiled the code with different options, and tested different cases and parameters based on similar problems reported on the mailing list, but the problem persists. We are therefore writing in the hope that someone can offer suggestions. The relevant files are included below for reference. Thank you in advance for your replies! </div><div><br></div><div>This is a TiC example running with both k-point and MPI parallelization on two nodes, <i>r1i0n0</i> and <i>r1i0n1</i> (8 cores/node):</div><div><br></div><div><b>1. </b><b>stdout </b><b>(abridged) </b></div><div>MPI: invalid option -machinefile</div><div>real<span class="Apple-tab-span" style="white-space: pre; ">        </span>0m0.004s</div><div>user<span class="Apple-tab-span" style="white-space: pre; ">        </span>0m0.000s</div><div>sys<span class="Apple-tab-span" style="white-space: pre; ">        </span>0m0.000s</div><div>...</div><div>MPI: invalid option -machinefile</div><div>real<span class="Apple-tab-span" style="white-space: pre; ">        </span>0m0.003s</div><div>user<span class="Apple-tab-span" style="white-space: pre; ">        </span>0m0.000s</div><div>sys<span class="Apple-tab-span" style="white-space: pre; ">        </span>0m0.004s</div><div>TiC.scf1up_1: No such file or directory.</div><div><br></div><div>LAPW2 - Error. 
Check file lapw2.error</div><div>cp: cannot stat `.in.tmp': No such file or directory</div><div>rm: cannot remove `.in.tmp': No such file or directory</div><div>rm: cannot remove `.in.tmp1': No such file or directory</div><div><br></div><div><b>2. TiC.dayfile (abridged) </b></div><div>...</div><div> start <span class="Apple-tab-span" style="white-space: pre; ">        </span>(Sun Oct 31 16:25:06 MDT 2010) with lapw0 (40/99 to go)</div><div> cycle 1 <span class="Apple-tab-span" style="white-space: pre; ">        </span>(Sun Oct 31 16:25:06 MDT 2010) <span class="Apple-tab-span" style="white-space: pre; ">        </span>(40/99 to go)</div><div><br></div><div>> lapw0 -p<span class="Apple-tab-span" style="white-space: pre; ">        </span>(16:25:06) starting parallel lapw0 at Sun Oct 31 16:25:07 MDT 2010</div><div>-------- .machine0 : 16 processors</div><div>invalid "local" arg: -machinefile</div><div><br></div><div>0.436u 0.412s 0:04.63 18.1%<span class="Apple-tab-span" style="white-space: pre; ">        </span>0+0k 2600+0io 1pf+0w</div><div>> lapw1 -up -p <span class="Apple-tab-span" style="white-space: pre; ">        </span>(16:25:12) starting parallel lapw1 at Sun Oct 31 16:25:12 MDT 2010</div><div>-> starting parallel LAPW1 jobs at Sun Oct 31 16:25:12 MDT 2010</div><div>running LAPW1 in parallel mode (using .machines)</div><div>2 number_of_parallel_jobs</div><div> r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1) r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1 r1i0n1(1) r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0 r1i0n0(1) Summary of lapw1para:</div><div> r1i0n0<span class="Apple-tab-span" style="white-space: pre; ">        </span> k=0<span class="Apple-tab-span" style="white-space: pre; ">        </span> user=0<span class="Apple-tab-span" style="white-space: pre; ">        </span> 
wallclock=0</div><div> r1i0n1<span class="Apple-tab-span" style="white-space: pre; ">        </span> k=0<span class="Apple-tab-span" style="white-space: pre; ">        </span> user=0<span class="Apple-tab-span" style="white-space: pre; ">        </span> wallclock=0</div><div>...</div><div>0.116u 0.316s 0:10.48 4.0%<span class="Apple-tab-span" style="white-space: pre; ">        </span>0+0k 0+0io 0pf+0w</div><div>> lapw2 -up -p <span class="Apple-tab-span" style="white-space: pre; ">        </span>(16:25:34) running LAPW2 in parallel mode</div><div>** LAPW2 crashed!</div><div>0.032u 0.104s 0:01.13 11.5%<span class="Apple-tab-span" style="white-space: pre; ">        </span>0+0k 82304+0io 8pf+0w</div><div>error: command /home/xiew/WIEN2k_10/lapw2para -up uplapw2.def failed</div><div><br></div><div><b>3. uplapw2.error </b></div><div>Error in LAPW2</div><div> 'LAPW2' - can't open unit: 18 </div><div> 'LAPW2' - filename: TiC.vspup </div><div> 'LAPW2' - status: old form: formatted </div><div>** testerror: Error in Parallel LAPW2</div><div><br></div><div><div><div><b>4. .machines</b></div><div>#</div><div>1:r1i0n0:8</div><div>1:r1i0n1:8</div><div>lapw0:r1i0n0:8 r1i0n1:8 </div><div>granularity:1</div><div>extrafine:1</div></div></div><div><br></div><div><div><b>5. 
compilers, MPI and options</b></div><div>Intel Compilers and MKL 11.1.046</div><div>Intel MPI 3.2.0.011</div><div><br></div><div>current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback</div><div>current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback</div><div>current:LDFLAGS:$(FOPT) -L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -pthread</div><div>current:DPARALLEL:'-DParallel'</div><div>current:R_LIBS:-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lguide</div><div>current:RP_LIBS:-L/usr/local/intel/Compiler/11.1/046/mkl/lib/em64t -lmkl_scalapack_lp64 /usr/local/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -openmp -lpthread -L/home/xiew/fftw-2.1.5/lib -lfftw_mpi -lfftw $(R_LIBS)</div><div>current:MPIRUN:mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_</div></div><div><br></div><div>Best regards,</div><div>Wei Xie</div><div>Computational Materials Group</div><div>University of Wisconsin-Madison</div><div><br></div></body></html>