[Wien] parallel ssh error

Gavin Abo gsabo at crimson.ua.edu
Sun Sep 29 15:02:39 CEST 2019


Checking with "which lapw1c" on each node (vlsi1, vlsi2, vlsi3, and 
vlsi4) is a good idea.  However, since WIENROOT is (blank) [1], it 
probably won't work until that is resolved.

It was mentioned that the WIEN2k .bashrc block was set up on each node by 
running userconfig [2]. So it seems strange that WIENROOT is 
(blank) on the client nodes, since I would expect it to work when 
WIENROOT and PATH are both defined in .bashrc by userconfig:

username at computername:~$ ssh vlsi1
...
username at computername:~$ cd ~/WIEN2k
username at computername:~/WIEN2k$ which lapw1c
username at computername:~/WIEN2k$ grep "export WIENROOT" ~/.bashrc
username at computername:~/WIEN2k$ grep "export PATH" ~/.bashrc
username at computername:~/WIEN2k$ ./userconfig
...
username at computername:~/WIEN2k$ grep "export WIENROOT" ~/.bashrc
export WIENROOT=/servernode1
username at computername:~/WIEN2k$ grep "export PATH" ~/.bashrc
export PATH=$WIENROOT:$STRUCTEDIT_PATH:$WIENROOT/SRC_IRelast/script-elastic:$PATH:.
export PATH=$PATH:$WIENROOT:.
username at computername:~/WIEN2k$ source ~/.bashrc
username at computername:~/WIEN2k$ which lapw1c
/home/username/WIEN2k/lapw1c
username at computername:~/WIEN2k$ exit
logout
Connection to vlsi1 closed.
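
The two grep checks in the session above could be wrapped in a small 
helper so that they are easy to repeat on each node. This is only a 
sketch: has_wien_exports is a hypothetical name of my own (it is not 
part of WIEN2k), and it only looks for the export lines, it does not 
check that the paths they point to are correct.

```shell
# Sketch: report whether a bashrc-style file contains export lines
# for WIENROOT and PATH (the lines userconfig is expected to add).
# has_wien_exports is a hypothetical helper name, not a WIEN2k script.
has_wien_exports() {
    rc="${1:-$HOME/.bashrc}"   # file to inspect, ~/.bashrc by default
    missing=0
    for var in WIENROOT PATH; do
        if grep -q "^[[:space:]]*export $var=" "$rc" 2>/dev/null; then
            echo "$var: exported in $rc"
        else
            echo "$var: no export line found in $rc"
            missing=1
        fi
    done
    return $missing            # 0 only if both export lines were found
}
```

Running "has_wien_exports" with no argument inspects ~/.bashrc of the 
account that is logged in.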

Though, I suppose that a conf file [3] set up by the user to override 
.bashrc, or a job queue scheduler system in use [4], might also cause 
the issue.
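
To repeat the "which lapw1c" check on all nodes in one go, something 
like the following might help. It is a sketch under assumptions: 
check_wien_nodes is a hypothetical helper name, the node names are the 
ones mentioned in this thread, and passwordless ssh to each node is 
assumed to be working already. It runs the commands through a 
non-interactive ssh command, so it shows the WIENROOT and PATH that a 
remotely started command would see.

```shell
# Sketch: show what a non-interactive ssh command sees on each node.
# check_wien_nodes is a hypothetical helper name, not a WIEN2k script.
check_wien_nodes() {
    for node in "$@"; do
        echo "== $node =="
        # Single quotes so that the variables expand on the remote node.
        ssh "$node" 'echo "WIENROOT=$WIENROOT"; echo "PATH=$PATH"; which lapw1c || echo "lapw1c not found in PATH"'
    done
}

# Example (node names taken from this thread):
# check_wien_nodes vlsi1 vlsi2 vlsi3 vlsi4
```

If WIENROOT is printed but the WIEN2k directory does not appear in PATH 
for a node, that would be consistent with the "lapw1c: command not 
found" messages.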

[1] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19052.html
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
[3] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08016.html
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15985.html

On 9/29/2019 6:11 AM, Laurence Marks wrote:
> What does
>
> ssh vlsi1 which lapw1c
> give, what does "cat *.error" give in the case directory?
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what 
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu <http://www.numis.northwestern.edu>
>
> On Sun, Sep 29, 2019, 01:17 Indranil mal <indranil.mal at gmail.com 
> <mailto:indranil.mal at gmail.com>> wrote:
>
>     Now echo $WIENROOT is giving the $WIENROOT location.
>
>     echo $WIENROOT/lapw*
>
>     /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
>     /home/username/WIEN2K/lapw0para
>     /home/username/WIEN2K/lapw0para_lapw /home/username/WIEN2K/lapw1
>     /home/username/WIEN2K/lapw1c /home/username/WIEN2K/lapw1c_mpi
>     /home/username/WIEN2K/lapw1cpara /home/username/WIEN2K/lapw1_mpi
>     /home/username/WIEN2K/lapw1para
>     /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
>     /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
>     /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
>     /home/username/WIEN2K/lapw2para
>     /home/username/WIEN2K/lapw2para_lapw /home/username/WIEN2K/lapw3
>     /home/username/WIEN2K/lapw3c /home/username/WIEN2K/lapw5
>     /home/username/WIEN2K/lapw5c /home/username/WIEN2K/lapw7
>     /home/username/WIEN2K/lapw7c /home/username/WIEN2K/lapwdm
>     /home/username/WIEN2K/lapwdmc /home/username/WIEN2K/lapwdmcpara
>     /home/username/WIEN2K/lapwdmpara
>     /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
>     /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
>     /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>
>     ssh vlsi1 'echo $WIENROOT/lapw*'
>
>     /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
>     /home/username/WIEN2K/lapw0para
>     /home/username/WIEN2K/lapw0para_lapw /home/username/WIEN2K/lapw1
>     /home/username/WIEN2K/lapw1c /home/username/WIEN2K/lapw1c_mpi
>     /home/username/WIEN2K/lapw1cpara /home/username/WIEN2K/lapw1_mpi
>     /home/username/WIEN2K/lapw1para
>     /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
>     /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
>     /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
>     /home/username/WIEN2K/lapw2para
>     /home/username/WIEN2K/lapw2para_lapw /home/username/WIEN2K/lapw3
>     /home/username/WIEN2K/lapw3c /home/username/WIEN2K/lapw5
>     /home/username/WIEN2K/lapw5c /home/username/WIEN2K/lapw7
>     /home/username/WIEN2K/lapw7c /home/username/WIEN2K/lapwdm
>     /home/username/WIEN2K/lapwdmc /home/username/WIEN2K/lapwdmcpara
>     /home/username/WIEN2K/lapwdmpara
>     /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
>     /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
>     /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>
>
>     However getting the same error
>
>
>     	
>
>     >   stop error
>
>     grep: *scf1*: No such file or directory
>     cp: cannot stat '.in.tmp': No such file or directory
>     FERMI - Error
>     grep: *scf1*: No such file or directory
>     Parallel.scf1_1: No such file or directory.
>     bash: fixerror_lapw: command not found
>     bash: lapw1c: command not found
>     bash: fixerror_lapw: command not found
>     bash: lapw1c: command not found
>     bash: fixerror_lapw: command not found
>     bash: lapw1c: command not found
>     bash: fixerror_lapw: command not found
>     bash: lapw1c: command not found
>     bash: fixerror_lapw: command not found
>     bash: lapw1c: command not found
>     bash: fixerror_lapw: command not found
>     bash: lapw1c: command not found
>       LAPW0 END
>     hup: Command not found.
>
>
>     and lapw2 error file
>
>      'LAPW2' - can't open unit: 30
>      'LAPW2' -        filename: Parallel.energy_1
>     **  testerror: Error in Parallel LAPW2
>
>
>
>     On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu
>     <mailto:gsabo at crimson.ua.edu>> wrote:
>
>         The "sudo service sshd restart" step, which I forgot to copy
>         and paste, has been added below.
>
>         On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>
>>         After you set both "SendEnv *" and "AcceptEnv *", did you
>>         restart the sshd service [1]?  The following illustrates
>>         steps that might help you verify that WIENROOT appears on a
>>         remote vlsi node:
>>
>>         username at computername:~$ echo $WIENROOT
>>
>>         username at computername:~$ export WIENROOT=/servernode1
>>         username at computername:~$ echo $WIENROOT
>>         /servernode1
>>         username at computername:~$ ssh vlsi
>>         Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic
>>         x86_64)
>>         ...
>>         Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>>         username at computername:~$ echo $WIENROOT
>>
>>         username at computername:~$ exit
>>         logout
>>         Connection to vlsi closed.
>>         username at computername:~$ sudo gedit /etc/ssh/ssh_config
>>         [sudo] password for username:
>>
>>         username at computername:~$ sudo gedit /etc/ssh/sshd_config
>>
>>         username at computername:~$ grep SendEnv /etc/ssh/ssh_config
>>             SendEnv LANG LC_* WIENROOT
>>         username at computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>>         AcceptEnv LANG LC_* WIENROOT
>>
>            username at computername:~$ sudo service sshd restart
>>
>>         username at computername:~$ ssh vlsi
>>         ...
>>         username at computername:~$ echo $WIENROOT
>>         /servernode1
>>         username at computername:~$ exit
>>
>>         [1]
>>         https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
>>
>>         On 9/28/2019 11:22 AM, Indranil mal wrote:
>>>         Sir, I have tried with "SetEnv *". Still nothing is printed
>>>         by the echo command. The user name I posted was wrong by
>>>         mistake; otherwise there is no issue with the user name. I
>>>         have set TASKSET to "no" in the parallel options file, and the
>>>         remote options are 1 1 on the server and client machines.
>>>
>>>
>>>         On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu
>>>         <mailto:gsabo at crimson.ua.edu>> wrote:
>>>
>>>>             Respected Sir, In my linux(Ubuntu 18.04 LTS) in
>>>>             ssh_config, and in sshd_config there are two line
>>>>             already "SendEnv LANG LC_*" "AcceptEnv LANG LC_*"
>>>>             respectively.
>>>
>>>             The "LANG LC_*" probably only puts just the local
>>>             language variables in the remote environment.  Did you
>>>             follow the previous advice [1] of trying to use "*" to
>>>             put all variables from the local environment?
>>>
>>>             [1]
>>>             https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
>>>
>>>>             However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>
>>>             That seems to be the main cause of the problem as it
>>>             should not return (blank) but needs to return
>>>             "/servernode1" as you previously mentioned [2].
>>>
>>>             [2]
>>>             https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>
>>>             Perhaps the message below is a clue.  If you had set the
>>>             WIENROOT variable in .bashrc of your /home/vlsi accounts
>>>             on each system, you likely have to log in and use that
>>>             same /home/vlsi account on the head node, as the output
>>>             below seems to indicate a login to a different /home/niel
>>>             account. Alternatively, setting the WIENROOT variable in
>>>             .bashrc of all /home/niel accounts on each node might
>>>             work too.
>>>
>>>>             The command ssh vlsi1 'pwd $WIENROOT' prints
>>>>             "/home/vlsi", the common home directory, and
>>>>             ssh vlsi1 "env"
>>>>             ...
>>>>             USER=niel
>>>>             PWD=/home/niel
>>>>             HOME=/home/niel
>>>>             ...
>>>>             This is the same as on the server and the other nodes.
>>>>
>>>>             Sir, after changing the parallel options file in
>>>>             $WIENROOT on the server to
>>>>
>>>>             setenv TASKSET "yes"     (changed from "no")
>>>>             if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>>             if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>>>             setenv WIEN_GRANULARITY 1
>>>>             setenv DELAY 0.1
>>>>             setenv SLEEPY 1
>>>>             setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile
>>>>             _HOSTS_ _EXEC_"
>>>>             setenv CORES_PER_NODE 1
>>>>
>>>>             the error no longer appears, but the program does not
>>>>             advance after lapw0; it gets stuck in lapw1
>>>
>>>             Since it seemed to be throwing an appropriate error
>>>             message when TASKSET was "no", unlike now that it is set
>>>             to "yes", you should probably change it back to "no".
>>>