[Wien] parallel ssh error

Peter Blaha pblaha at theochem.tuwien.ac.at
Mon Sep 30 08:59:07 CEST 2019


So there is progress: the environment now seems to be accepted in the 
remote shell.

lapw1para (called by x_lapw, which is called by run_lapw -p) creates the 
split klist files (case.klist_1, ...) and the def files (lapw1_1.def, ...).

It uses the $cwd variable and basically executes:

ssh vlsi1 "cd $cwd; lapw1c lapw1_1.def "

Does this work on your computers?
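
A quick manual test from the case directory would be something like this (a minimal sketch: vlsi1 is taken from your output, ~/mycase stands for your actual case directory, and $PWD replaces the csh variable $cwd when you type it in bash):

cd ~/mycase
ssh vlsi1 "cd $PWD; echo \$WIENROOT; which lapw1c; which fixerror_lapw"

If WIENROOT comes back empty or the programs are not found, lapw1para will fail in exactly the way shown below.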



On 9/29/19 7:16 PM, Indranil mal wrote:
> Now echo $WIENROOT prints the WIENROOT location.
> 
> echo $WIENROOT/lapw*
> 
> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi 
> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw 
> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c 
> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara 
> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para 
> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2 
> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi 
> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi 
> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw 
> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c 
> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c 
> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c 
> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc 
> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara 
> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso 
> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi 
> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
> 
> ssh vlsi1 'echo $WIENROOT/lapw*'
> 
> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi 
> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw 
> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c 
> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara 
> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para 
> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2 
> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi 
> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi 
> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw 
> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c 
> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c 
> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c 
> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc 
> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara 
> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso 
> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi 
> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
> 
> 
> However, I am still getting the same error:
> 
>>   stop error
> 
> grep: *scf1*: No such file or directory
> cp: cannot stat '.in.tmp': No such file or directory
> FERMI - Error
> grep: *scf1*: No such file or directory
> Parallel.scf1_1: No such file or directory.
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
>   LAPW0 END
> hup: Command not found.
> 
> 
> and the lapw2 error file shows:
> 
>   'LAPW2' - can't open unit: 30
>   'LAPW2' -        filename: Parallel.energy_1
> **  testerror: Error in Parallel LAPW2
> 
> 
> 
> On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu 
> <mailto:gsabo at crimson.ua.edu>> wrote:
> 
>     The "sudo service sshd restart" step, which I forgot to copy and
>     paste, that is missing is corrected below.
> 
>     On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>
>>     After you set both "SendEnv *" and "AcceptEnv *", did you restart
>>     the sshd service [1]?  The following illustrates steps that might
>>     help you verify that WIENROOT appears on a remote vlsi node:
>>
>>     username at computername:~$ echo $WIENROOT
>>
>>     username at computername:~$ export WIENROOT=/servernode1
>>     username at computername:~$ echo $WIENROOT
>>     /servernode1
>>     username at computername:~$ ssh vlsi
>>     Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic x86_64)
>>     ...
>>     Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>>     username at computername:~$ echo $WIENROOT
>>
>>     username at computername:~$ exit
>>     logout
>>     Connection to vlsi closed.
>>     username at computername:~$ sudo gedit /etc/ssh/ssh_config
>>     [sudo] password for username:
>>
>>     username at computername:~$ sudo gedit /etc/ssh/sshd_config
>>
>>     username at computername:~$ grep SendEnv /etc/ssh/ssh_config
>>         SendEnv LANG LC_* WIENROOT
>>     username at computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>>     AcceptEnv LANG LC_* WIENROOT
>>
>         username at computername:~$ sudo service sshd restart
>>
>>     username at computername:~$ ssh vlsi
>>     ...
>>     username at computername:~$ echo $WIENROOT
>>     /servernode1
>>     username at computername:~$ exit
>>
>>     [1]
>>     https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
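>>
>>     Note that lapw1para runs its commands through a non-interactive
>>     ssh shell, and Ubuntu's default ~/.bashrc returns early for
>>     non-interactive shells, so an export placed after that guard is
>>     not seen by "ssh vlsi 'command'".  The non-interactive check (a
>>     sketch, reusing the vlsi host and value from above) is therefore
>>     the more relevant test:
>>
>>     username at computername:~$ ssh vlsi 'echo $WIENROOT'
>>     /servernode1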
>>
>>     On 9/28/2019 11:22 AM, Indranil mal wrote:
>>>     Sir, I have tried with " SetEnv * ", but still nothing comes from
>>>     the echo command. The user name I posted earlier was a mistake;
>>>     otherwise there is no issue with the user name. In the parallel
>>>     options file I have set taskset to "no", and the remote options
>>>     are 1 1 on the server and client machines.
>>>
>>>
>>>     On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu
>>>     <mailto:gsabo at crimson.ua.edu>> wrote:
>>>
>>>>         Respected Sir, in my Linux (Ubuntu 18.04 LTS), ssh_config and
>>>>         sshd_config already contain the two lines "SendEnv LANG LC_*"
>>>>         and "AcceptEnv LANG LC_*", respectively.
>>>
>>>         The "LANG LC_*" probably only puts just the local language
>>>         variables in the remote environment.  Did you follow the
>>>         previous advice [1] of trying to use "*" to put all variables
>>>         from the local environment?
>>>
>>>         [1]
>>>         https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
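>>>
>>>         For reference, the wildcard form would look like this (a
>>>         minimal sketch of the two config files):
>>>
>>>         # /etc/ssh/ssh_config on the sending machine
>>>         SendEnv *
>>>         # /etc/ssh/sshd_config on the receiving machine
>>>         AcceptEnv *
>>>
>>>         followed by a restart of the sshd service on the receiving
>>>         machine.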
>>>
>>>>         However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>
>>>         That seems to be the main cause of the problem: it should
>>>         not return blank but needs to return "/servernode1", as you
>>>         previously mentioned [2].
>>>
>>>         [2]
>>>         https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>
>>>         Perhaps the message below is a clue.  If you had set the
>>>         WIENROOT variable in .bashrc of your /home/vlsi accounts on
>>>         each system, you likely have to log in and use that same
>>>         /home/vlsi account on the head node, as the output below seems
>>>         to indicate a login to a different /home/niel account.
>>>         Alternatively, setting the WIENROOT variable in .bashrc of
>>>         all /home/niel accounts on each node might work too.
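>>>
>>>         For example (a minimal sketch, assuming the install path from
>>>         your earlier echo output), ~/.bashrc of the account used on
>>>         every node could contain:
>>>
>>>         export WIENROOT=/home/username/WIEN2K
>>>         export PATH=$WIENROOT:$PATH
>>>
>>>         placed before Ubuntu's early "not running interactively"
>>>         return, so that non-interactive ssh commands see it too; the
>>>         "lapw1c: command not found" messages suggest $WIENROOT is
>>>         also missing from the remote PATH.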
>>>
>>>>         The command ssh vlsi1 'pwd $WIENROOT' prints "/home/vlsi",
>>>>         the common home directory, and
>>>>         ssh vlsi1 "env"
>>>>         ...
>>>>         USER=niel
>>>>         PWD=/home/niel
>>>>         HOME=/home/niel
>>>>         ...
>>>>         this is the same as on the server and the other nodes.
>>>>
>>>>         Sir, after changing the parallel options file in $WIENROOT
>>>>         on the server to
>>>>
>>>>         setenv TASKSET "yes"   (changed from "no")
>>>>         if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>>         if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>>>         setenv WIEN_GRANULARITY 1
>>>>         setenv DELAY 0.1
>>>>         setenv SLEEPY 1
>>>>         setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
>>>>         setenv CORES_PER_NODE 1
>>>>
>>>>         the error no longer appears, but the run does not progress:
>>>>         after lapw0 it gets stuck in lapw1.
>>>
>>>         Since it previously threw an informative error message with
>>>         TASKSET set to "no", unlike when it is set to "yes", you
>>>         should probably change it back to "no".
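>>>
>>>         That is, in the parallel options file in $WIENROOT, a sketch
>>>         of only the lines discussed above would be:
>>>
>>>         setenv TASKSET "no"
>>>         if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>         if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1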
>>>
> 

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/TC_Blaha
--------------------------------------------------------------------------

