[Wien] parallel ssh error
Gavin Abo
gsabo at crimson.ua.edu
Sun Sep 29 15:02:39 CEST 2019
Checking with "which lapw1c" on each node (vlsi1, vlsi2, vlsi3, and
vlsi4) is a good idea. However, since WIENROOT is (blank) [1], it
probably won't work until that is resolved.
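As a rough sketch of that per-node check, something like the following could run it on all four nodes in one go. The names check_nodes and REMOTE are hypothetical (they are not part of WIEN2k), and the sketch assumes passwordless ssh to each node:

```shell
#!/bin/sh
# Hypothetical helper, not part of WIEN2k: print WIENROOT and the lapw1c
# location as seen by a non-interactive shell on each node.
# REMOTE is the command used to reach a node (assumed: passwordless ssh).
REMOTE=${REMOTE:-ssh}

check_nodes() {
    for node in "$@"; do
        # Single quotes defer $WIENROOT expansion to the remote shell.
        # </dev/null keeps ssh from reading the caller's stdin.
        if out=$("$REMOTE" "$node" 'echo "WIENROOT=$WIENROOT"; which lapw1c' </dev/null 2>&1); then
            printf '%s: OK\n%s\n' "$node" "$out"
        else
            printf '%s: FAILED\n%s\n' "$node" "$out"
        fi
    done
}

# Usage, with the node names from this thread:
#   check_nodes vlsi1 vlsi2 vlsi3 vlsi4
```

A node is reported as FAILED when "which lapw1c" finds nothing or the connection itself fails, so the printed output should be read to tell the two cases apart.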
It was mentioned that the WIEN2k .bashrc block was set up on each node by
running userconfig [2]. So it seems strange that WIENROOT is
(blank) on the client nodes, since I would expect it to work if
WIENROOT and PATH are both defined by userconfig in .bashrc:
username@computername:~$ ssh vlsi1
...
username@computername:~$ cd ~/WIEN2k
username@computername:~/WIEN2k$ which lapw1c
username@computername:~/WIEN2k$ grep "export WIENROOT" ~/.bashrc
username@computername:~/WIEN2k$ grep "export PATH" ~/.bashrc
username@computername:~/WIEN2k$ ./userconfig
...
username@computername:~/WIEN2k$ grep "export WIENROOT" ~/.bashrc
export WIENROOT=/servernode1
username@computername:~/WIEN2k$ grep "export PATH" ~/.bashrc
export PATH=$WIENROOT:$STRUCTEDIT_PATH:$WIENROOT/SRC_IRelast/script-elastic:$PATH:.
export PATH=$PATH:$WIENROOT:.
username@computername:~/WIEN2k$ source ~/.bashrc
username@computername:~/WIEN2k$ which lapw1c
/home/username/WIEN2k/lapw1c
username@computername:~/WIEN2k$ exit
logout
Connection to vlsi1 closed.
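The two grep checks in the session above could be wrapped in a small function so they are easy to repeat on every node. This is only a sketch; check_bashrc is a hypothetical name and is not shipped with WIEN2k:

```shell
#!/bin/sh
# Hypothetical helper mirroring the two grep checks above: report whether
# a bashrc-style file has export lines for WIENROOT and PATH.
check_bashrc() {
    rc=${1:-$HOME/.bashrc}
    missing=0
    for var in WIENROOT PATH; do
        if grep -q "^[[:space:]]*export $var=" "$rc"; then
            echo "$var: export line found in $rc"
        else
            echo "$var: no export line in $rc"
            missing=1
        fi
    done
    # Non-zero status if either export line is absent.
    return "$missing"
}

# Usage on a node:
#   check_bashrc ~/.bashrc || echo "export lines missing, run userconfig"
```

It only confirms that the lines exist in the file. It does not tell you whether a non-interactive ssh command actually reads them.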
Though I suppose that if something like a conf file [3] was set up by the
user to override .bashrc, or a job queue scheduler system is in use [4],
it might also cause the issue.
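One more candidate in the same category, offered only as an assumption since nothing in this thread confirms it: the stock Ubuntu ~/.bashrc begins with a guard that returns immediately for non-interactive shells. If the userconfig block sits below that guard, an interactive login would see WIENROOT, while ssh vlsi1 'echo $WIENROOT' would not. A rough way to test for that ordering (guard_before_wienroot is a hypothetical name, and the pattern only recognizes the "*) return;;" form of the guard):

```shell
#!/bin/sh
# Hypothetical check, not part of WIEN2k: succeed (status 0) if an early
# "*) return;;" guard for non-interactive shells appears in the file before
# the first "export WIENROOT=" line.
guard_before_wienroot() {
    awk '
        /^[ \t]*\*\)[ \t]*return/      { if (!guard) guard = NR }
        /^[ \t]*export[ \t]+WIENROOT=/ { if (!wr) wr = NR }
        END { exit !(guard && wr && guard < wr) }
    ' "$1"
}

# Usage on a node:
#   guard_before_wienroot ~/.bashrc && echo "guard precedes the WIENROOT export"
```

If the guard does come first, moving the WIEN2k export lines above it in ~/.bashrc on each node is one thing to try.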
[1] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19052.html
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
[3] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08016.html
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15985.html
On 9/29/2019 6:11 AM, Laurence Marks wrote:
> What does
>
> ssh vlsi1 which lapw1c
> give, what does "cat *.error" give in the case directory?
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
>
> On Sun, Sep 29, 2019, 01:17 Indranil mal <indranil.mal at gmail.com> wrote:
>
> Now echo $WIENROOT is giving the $WIENROOT location.
>
> echo $WIENROOT/lapw*
>
> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
> /home/username/WIEN2K/lapw0para
> /home/username/WIEN2K/lapw0para_lapw /home/username/WIEN2K/lapw1
> /home/username/WIEN2K/lapw1c /home/username/WIEN2K/lapw1c_mpi
> /home/username/WIEN2K/lapw1cpara /home/username/WIEN2K/lapw1_mpi
> /home/username/WIEN2K/lapw1para
> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
> /home/username/WIEN2K/lapw2para
> /home/username/WIEN2K/lapw2para_lapw /home/username/WIEN2K/lapw3
> /home/username/WIEN2K/lapw3c /home/username/WIEN2K/lapw5
> /home/username/WIEN2K/lapw5c /home/username/WIEN2K/lapw7
> /home/username/WIEN2K/lapw7c /home/username/WIEN2K/lapwdm
> /home/username/WIEN2K/lapwdmc /home/username/WIEN2K/lapwdmcpara
> /home/username/WIEN2K/lapwdmpara
> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>
> ssh vlsi1 'echo $WIENROOT/lapw*'
>
> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
> /home/username/WIEN2K/lapw0para
> /home/username/WIEN2K/lapw0para_lapw /home/username/WIEN2K/lapw1
> /home/username/WIEN2K/lapw1c /home/username/WIEN2K/lapw1c_mpi
> /home/username/WIEN2K/lapw1cpara /home/username/WIEN2K/lapw1_mpi
> /home/username/WIEN2K/lapw1para
> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
> /home/username/WIEN2K/lapw2para
> /home/username/WIEN2K/lapw2para_lapw /home/username/WIEN2K/lapw3
> /home/username/WIEN2K/lapw3c /home/username/WIEN2K/lapw5
> /home/username/WIEN2K/lapw5c /home/username/WIEN2K/lapw7
> /home/username/WIEN2K/lapw7c /home/username/WIEN2K/lapwdm
> /home/username/WIEN2K/lapwdmc /home/username/WIEN2K/lapwdmcpara
> /home/username/WIEN2K/lapwdmpara
> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>
>
> However, I am still getting the same error:
>
> > stop error
>
> grep: *scf1*: No such file or directory
> cp: cannot stat '.in.tmp': No such file or directory
> FERMI - Error
> grep: *scf1*: No such file or directory
> Parallel.scf1_1: No such file or directory.
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> LAPW0 END
> hup: Command not found.
>
>
> and the lapw2 error file contains:
>
> 'LAPW2' - can't open unit: 30
> 'LAPW2' - filename: Parallel.energy_1
> ** testerror: Error in Parallel LAPW2
>
>
>
> On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu> wrote:
>
> The "sudo service sshd restart" step, which I forgot to copy
> and paste, was missing; it is added below.
>
> On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>
>> After you set both "SendEnv *" and "AcceptEnv *", did you
>> restart the sshd service [1]? The following illustrates
>> steps that might help you verify that WIENROOT appears on a
>> remote vlsi node:
>>
>> username@computername:~$ echo $WIENROOT
>>
>> username@computername:~$ export WIENROOT=/servernode1
>> username@computername:~$ echo $WIENROOT
>> /servernode1
>> username@computername:~$ ssh vlsi
>> Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic x86_64)
>> ...
>> Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>> username@computername:~$ echo $WIENROOT
>>
>> username@computername:~$ exit
>> logout
>> Connection to vlsi closed.
>> username@computername:~$ sudo gedit /etc/ssh/ssh_config
>> [sudo] password for username:
>>
>> username@computername:~$ sudo gedit /etc/ssh/sshd_config
>>
>> username@computername:~$ grep SendEnv /etc/ssh/ssh_config
>> SendEnv LANG LC_* WIENROOT
>> username@computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>> AcceptEnv LANG LC_* WIENROOT
>>
> username@computername:~$ sudo service sshd restart
>>
>> username@computername:~$ ssh vlsi
>> ...
>> username@computername:~$ echo $WIENROOT
>> /servernode1
>> username@computername:~$ exit
>>
>> [1]
>> https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
>>
>> On 9/28/2019 11:22 AM, Indranil mal wrote:
>>> Sir, I have tried with "SetEnv *". Still nothing is returned by
>>> the echo command. I posted the wrong user name by mistake;
>>> otherwise there is no issue with the user name. In the
>>> parallel options file I have set TASKSET to "no", and the remote
>>> options are 1 1 on the server and client machines.
>>>
>>>
>>> On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu> wrote:
>>>
>>>> Respected Sir, on my Linux system (Ubuntu 18.04 LTS),
>>>> ssh_config and sshd_config already contain the lines
>>>> "SendEnv LANG LC_*" and "AcceptEnv LANG LC_*",
>>>> respectively.
>>>
>>> The "LANG LC_*" probably puts only the local
>>> language variables in the remote environment. Did you
>>> follow the previous advice [1] of trying "*" to
>>> pass all variables from the local environment?
>>>
>>> [1]
>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
>>>
>>>> However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>
>>> That seems to be the main cause of the problem as it
>>> should not return (blank) but needs to return
>>> "/servernode1" as you previously mentioned [2].
>>>
>>> [2]
>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>
>>> Perhaps the message below is a clue. If you had set the
>>> WIENROOT variable in .bashrc of your /home/vlsi accounts
>>> on each system, you likely have to log in with that
>>> same /home/vlsi account on the head node, as the output
>>> below seems to indicate a login to a different /home/niel
>>> account. Alternatively, setting the WIENROOT variable in
>>> .bashrc of all /home/niel accounts on each node might
>>> work too.
>>>
>>>> The command ssh vlsi1 'pwd $WIENROOT' prints
>>>> "/home/vlsi", the common home directory, and
>>>> ssh vlsi1 "env"
>>>> ...
>>>> USER=niel
>>>> PWD=/home/niel
>>>> HOME=/home/niel
>>>> ...
>>>> This is the same on the server and the other nodes.
>>>>
>>>> Sir, after changing the parallel options file in
>>>> $WIENROOT on the server (TASKSET changed from "no" to "yes") to
>>>>
>>>> setenv TASKSET "yes"
>>>> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>>> setenv WIEN_GRANULARITY 1
>>>> setenv DELAY 0.1
>>>> setenv SLEEPY 1
>>>> setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
>>>> setenv CORES_PER_NODE 1
>>>>
>>>> the error no longer appears, but the program does not
>>>> advance beyond lapw0; it gets stuck in lapw1.
>>>
>>> Since it seemed to throw an appropriate error message
>>> when TASKSET was "no", unlike when set to "yes", you
>>> should probably change it back to "no".
>>>