[Wien] parallel ssh error

Gavin Abo gsabo at crimson.ua.edu
Mon Sep 30 13:41:45 CEST 2019


An additional comment: /home/username/WIEN2k (or ~/WIEN2k) is where I 
have WIEN2k installed, whereas you have installed WIEN2k at 
/servernode1 [1].  In the examples in my previous posts (e.g., [2]) you 
might find some typographical errors where I forgot to replace my 
/home/username/WIEN2k with your /servernode1.

It is best to have WIEN2k set up at a common path on all nodes [3,4] 
(i.e., your vlsi1-vlsi4).  Therefore, I recommend not placing WIEN2k at 
differing locations among your nodes, such as the /home/username/WIEN2K 
you mentioned below.
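
A quick consistency check (a sketch; it assumes your node names are 
vlsi1-vlsi4 and that WIENROOT is set on each node) can be run from the 
head node:

for n in vlsi1 vlsi2 vlsi3 vlsi4; do ssh $n "echo $n: \$WIENROOT"; done

Every node should print the same path (e.g., your /servernode1).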

[1] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19061.html
[3] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17988.html
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09229.html

On 9/30/2019 12:59 AM, Peter Blaha wrote:
> So there is progress, as the environment now seems to be accepted in 
> the remote shell.
>
> lapw1para (called by x_lapw, which is called by run_lapw -p) creates 
> the split klist files (case.klist_1, ...) and the def files 
> (lapw1_1.def, ...).
>
> It uses the $cwd variable and basically executes:
>
> ssh vlsi1 "cd $cwd; lapw1c lapw1_1.def "
>
> Does this work on your computers ?
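>
> A quick manual test of the same mechanism (a sketch; /home/niel/case 
> stands in for your actual case directory, with lapw1_1.def present 
> there) would be:
>
> ssh vlsi1 "cd /home/niel/case; lapw1c lapw1_1.def"
>
> If this prints "bash: lapw1c: command not found", the remote 
> non-interactive shell does not have $WIENROOT in its PATH.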
>
>
>
> On 9/29/19 7:16 PM, Indranil mal wrote:
>> Now echo $WIENROOT gives the WIENROOT location.
>>
>> echo $WIENROOT/lapw*
>>
>> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi 
>> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw 
>> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c 
>> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara 
>> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para 
>> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2 
>> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi 
>> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi 
>> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw 
>> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c 
>> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c 
>> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c 
>> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc 
>> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara 
>> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso 
>> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi 
>> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>>
>> ssh vlsi1 'echo $WIENROOT/lapw*'
>>
>> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi 
>> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw 
>> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c 
>> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara 
>> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para 
>> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2 
>> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi 
>> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi 
>> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw 
>> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c 
>> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c 
>> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c 
>> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc 
>> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara 
>> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso 
>> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi 
>> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>>
>>
>> However, I am still getting the same error:
>>
>>
>>
>>
>>>   stop error
>>
>> grep: *scf1*: No such file or directory
>> cp: cannot stat '.in.tmp': No such file or directory
>> FERMI - Error
>> grep: *scf1*: No such file or directory
>> Parallel.scf1_1: No such file or directory.
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>>   LAPW0 END
>> hup: Command not found.
>>
>>
>> and the lapw2 error file contains:
>>
>>   'LAPW2' - can't open unit: 30
>>   'LAPW2' -        filename: Parallel.energy_1
>> **  testerror: Error in Parallel LAPW2
>>
>>
>>
>> On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu 
>> <mailto:gsabo at crimson.ua.edu>> wrote:
>>
>>     The "sudo service sshd restart" step, which I forgot to copy and
>>     paste, that is missing is corrected below.
>>
>>     On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>>
>>>     After you set both "SendEnv *" and "AcceptEnv *", did you restart
>>>     the sshd service [1]?  The following illustrates steps that might
>>>     help you verify that WIENROOT appears on a remote vlsi node:
>>>
>>>     username at computername:~$ echo $WIENROOT
>>>
>>>     username at computername:~$ export WIENROOT=/servernode1
>>>     username at computername:~$ echo $WIENROOT
>>>     /servernode1
>>>     username at computername:~$ ssh vlsi
>>>     Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic x86_64)
>>>     ...
>>>     Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>>>     username at computername:~$ echo $WIENROOT
>>>
>>>     username at computername:~$ exit
>>>     logout
>>>     Connection to vlsi closed.
>>>     username at computername:~$ sudo gedit /etc/ssh/ssh_config
>>>     [sudo] password for username:
>>>
>>>     username at computername:~$ sudo gedit /etc/ssh/sshd_config
>>>
>>>     username at computername:~$ grep SendEnv /etc/ssh/ssh_config
>>>         SendEnv LANG LC_* WIENROOT
>>>     username at computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>>>     AcceptEnv LANG LC_* WIENROOT
>>>
>>         username at computername:~$ sudo service sshd restart
>>>
>>>     username at computername:~$ ssh vlsi
>>>     ...
>>>     username at computername:~$ echo $WIENROOT
>>>     /servernode1
>>>     username at computername:~$ exit
>>>
>>>     [1] https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
>>>
>>>     On 9/28/2019 11:22 AM, Indranil mal wrote:
>>>>     Sir, I have tried with "SendEnv *"; still nothing is coming with
>>>>     the echo command. The user name I posted earlier was wrong by
>>>>     mistake; otherwise there is no issue with the user name. I have
>>>>     set TASKSET to "no" in the parallel options file, and the remote
>>>>     options are 1 1 on the server and client machines.
>>>>
>>>>
>>>>     On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu
>>>>     <mailto:gsabo at crimson.ua.edu>> wrote:
>>>>
>>>>>         Respected Sir, in my Linux (Ubuntu 18.04 LTS) ssh_config and
>>>>>         sshd_config there are already the two lines "SendEnv LANG
>>>>>         LC_*" and "AcceptEnv LANG LC_*", respectively.
>>>>
>>>>         The "LANG LC_*" probably only puts just the local language
>>>>         variables in the remote environment.  Did you follow the
>>>>         previous advice [1] of trying to use "*" to put all variables
>>>>         from the local environment?
>>>>
>>>>         [1] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
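>>>>
>>>>         For reference, a sketch of that permissive setup ("*" forwards
>>>>         every environment variable the client offers):
>>>>
>>>>         # client side, in /etc/ssh/ssh_config:
>>>>         SendEnv *
>>>>         # server side, in /etc/ssh/sshd_config on each node:
>>>>         AcceptEnv *
>>>>
>>>>         followed by "sudo service sshd restart" on each node.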
>>>>
>>>>>         However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>>
>>>>         That seems to be the main cause of the problem, as it should
>>>>         not return blank output but needs to return "/servernode1", as
>>>>         you previously mentioned [2].
>>>>
>>>>         [2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>>
>>>>         Perhaps the message below is a clue.  If you had set the
>>>>         WIENROOT variable in the .bashrc of your /home/vlsi accounts
>>>>         on each system, you likely have to log in with that same
>>>>         /home/vlsi account on the head node, as the output below seems
>>>>         to indicate a login to a different /home/niel account.
>>>>         Alternatively, setting the WIENROOT variable in the .bashrc of
>>>>         all /home/niel accounts on each node might work too.
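>>>>
>>>>         For example (a sketch, assuming bash and your /servernode1
>>>>         install path), in the .bashrc on every node:
>>>>
>>>>         export WIENROOT=/servernode1
>>>>         export PATH=$WIENROOT:$PATH
>>>>
>>>>         Note that Ubuntu's default .bashrc returns early for
>>>>         non-interactive shells, so these lines likely need to go above
>>>>         that early-return check for "ssh vlsi1 command" to see them.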
>>>>
>>>>>         The command ssh vlsi1 'pwd $WIENROOT' prints "/home/vlsi",
>>>>>         the common home directory, and
>>>>>         ssh vlsi1 "env"
>>>>>         ...
>>>>>         USER=niel
>>>>>         PWD=/home/niel
>>>>>         HOME=/home/niel
>>>>>         ...
>>>>>         this is similar as server, and other nodes.
>>>>>
>>>>>         Sir, after changing the parallel_options file in $WIENROOT on
>>>>>         the server to
>>>>>
>>>>>         setenv TASKSET "yes"   (changed from "no")
>>>>>         if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>>>         if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>>>>         setenv WIEN_GRANULARITY 1
>>>>>         setenv DELAY 0.1
>>>>>         setenv SLEEPY 1
>>>>>         setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
>>>>>         setenv CORES_PER_NODE 1
>>>>>
>>>>>         the error is not coming, but the program does not advance
>>>>>         beyond lapw0; it is stuck in lapw1.
>>>>
>>>>         Since it seemed to throw an appropriate error message with the
>>>>         previous TASKSET setting, unlike when it is set to "yes", you
>>>>         should probably change it back to "no".
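>>>>
>>>>         i.e., restore in $WIENROOT/parallel_options:
>>>>
>>>>         setenv TASKSET "no"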