[Wien] ** testerror: Error in Parallel LAPW

Peter Blaha peter.blaha at tuwien.ac.at
Wed Jun 21 09:11:17 CEST 2023


> It crashed with the message "Host key verification failed."
>
It seems that your cluster does not allow ssh to an allocated node
(ask your sys admin).

In $WIENROOT/WIEN2k_parallel_options there are variables such as
USE_REMOTE. If it is set to zero, ssh is not used and you can still run
in parallel, but only on a single shared-memory node.
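Whether a job is such a single-node case can be read off the k-point lines of .machines (in the file quoted below, all eight lines are "1:lxbk1177"). A minimal bash sketch, assuming k-point lines of the form "weight:host" or "weight:host:ncores", possibly with several space-separated hosts; the helper names machines_hosts and is_single_host are hypothetical and not part of WIEN2k:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of WIEN2k).
# Assumption: k-point lines in .machines look like "1:host" or "1:host:ncores",
# possibly with several space-separated host entries; all other lines
# (e.g. "granularity:1") are ignored.

# Print each distinct host named on the k-point lines, in order of appearance.
machines_hosts() {
  local file=${1:-.machines}
  awk '
    /^[0-9]+:/ {
      sub(/^[0-9]+:/, "")            # drop the leading "weight:" field
      for (i = 1; i <= NF; i++) {    # remaining fields: host or host:ncores
        split($i, part, ":")
        if (!(part[1] in seen)) { seen[part[1]] = 1; print part[1] }
      }
    }' "$file"
}

# Succeed (exit status 0) only if exactly one distinct host is listed,
# i.e. the single-node case in which USE_REMOTE=0 should suffice.
is_single_host() {
  machines_hosts "$1" | awk 'END { exit (NR != 1) }'
}
```

Usage, in the case directory: is_single_host .machines && echo "single-node run". If it succeeds, setting USE_REMOTE to 0 should be enough for that job.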

To use multiple nodes, you must be able to ssh to the allocated nodes
without a password (or use some other command in place of ssh).
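Whether this works can be tested from inside the batch job before starting run_lapw -p. A minimal bash sketch using the standard OpenSSH options BatchMode and ConnectTimeout; check_passwordless_ssh is a hypothetical helper, and the 5-second timeout is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WIEN2k): check that each host given as an
# argument accepts a non-interactive ssh login.
# BatchMode=yes makes ssh fail instead of prompting for a password or for
# confirmation of an unknown host key; an unknown key then typically produces
# the same "Host key verification failed." message seen in the log.
check_passwordless_ssh() {
  local host status=0
  for host in "$@"; do
    # "true" runs on the remote side; only ssh's exit status matters here.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      status=1
    fi
  done
  return "$status"
}
```

For example, run check_passwordless_ssh lxbk1177 on the allocated node. A FAILED line (with ssh's own message on stderr) suggests the parallel scripts will fail the same way as long as USE_REMOTE is non-zero.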


> Here is the content of the file
> /lustre/ukt/milias/scratch/Wien2k_23.2_job.main.N1.n4.jid3009460/LvO2onQg/.machines:
> 1:lxbk1177
> 1:lxbk1177
> 1:lxbk1177
> 1:lxbk1177
> 1:lxbk1177
> 1:lxbk1177
> 1:lxbk1177
> 1:lxbk1177
>
> The job is running on lxbk1177 with 8 CPUs allocated,
>
> and this is from the log:
>
> running x dstart :
> starting parallel dstart at Tue 20 Jun 2023 05:16:21 PM CEST
> -------- .machine0 : processors
> running dstart in single mode
> STOP DSTART ENDS
> 10.249u 0.322s 0:11.19 94.3%    0+0k 158496+101160io 437pf+0w
>
> running 'run_lapw -p -ec 0.0001 -NI'
> STOP  LAPW0 END
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]  + Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> Host key verification failed.
> [1]    Done                          ( ( $remote $machine[$p] "cd
> $PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
> ;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
> .stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
> .stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
> grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
> LvO2onQg.scf1_1: No such file or directory.
> grep: *scf1*: No such file or directory
> STOP FERMI - Error
> cp: cannot stat '.in.tmp': No such file or directory
> grep: *scf1*: No such file or directory
>
> >   stop error
>
>
>
> file ":parallel"
>
> starting parallel lapw1 at Tue 20 Jun 2023 05:17:08 PM CEST
>     lxbk1177(4)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)
>     lxbk1177(3)      lxbk1177(3)      lxbk1177(3)      lxbk1177(3)
>     Summary of lapw1para:
>   lxbk1177      k=25    user=0  wallclock=0
> <-  done at Tue 20 Jun 2023 05:17:14 PM CEST
> -----------------------------------------------------------------
> ->  starting Fermi on lxbk1177 at Tue 20 Jun 2023 05:17:15 PM CEST
> **  LAPW2 crashed at Tue 20 Jun 2023 05:17:16 PM CEST
> **  check ERROR FILES!
> -----------------------------------------------------------------
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html

-- 
-----------------------------------------------------------------------
Peter Blaha,  Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email:peter.blaha at tuwien.ac.at           
WWW:http://www.imc.tuwien.ac.at       WIEN2k:http://www.wien2k.at
-------------------------------------------------------------------------


More information about the Wien mailing list