[Wien] parallel ssh error

Laurence Marks laurence.marks at gmail.com
Sun Sep 29 14:11:54 CEST 2019


What does

ssh vlsi1 which lapw1c

give? And what does "cat *.error" give in the case directory?
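
A minimal sketch of the first check, assuming a POSIX shell. The helper name check_remote_cmd and the RUN_REMOTE override are made up for illustration (RUN_REMOTE defaults to plain ssh). It asks the remote, non-interactive shell whether a program is on its PATH, which is the lookup that fails in the "bash: lapw1c: command not found" lines quoted below.

```shell
# Hypothetical helper: report whether CMD is found in the PATH of the
# non-interactive shell that "ssh HOST command" starts on HOST.
# RUN_REMOTE is an illustrative override; by default plain ssh is used.
check_remote_cmd() {
    host=$1
    cmd=$2
    # "command -v" prints the resolved path and returns non-zero if not found
    if ${RUN_REMOTE:-ssh} "$host" "command -v $cmd" >/dev/null 2>&1; then
        echo "$host: $cmd found"
    else
        echo "$host: $cmd NOT found in non-interactive PATH"
    fi
}
```

For example, "check_remote_cmd vlsi1 lapw1c" and "check_remote_cmd vlsi1 fixerror_lapw" would cover the two programs named in the error output.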
_____
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody
else has thought", Albert Szent-Gyorgi
www.numis.northwestern.edu

On Sun, Sep 29, 2019, 01:17 Indranil mal <indranil.mal at gmail.com> wrote:

> Now echo $WIENROOT gives the WIENROOT location.
>
> echo $WIENROOT/lapw*
>
> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw
> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c
> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara
> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para
> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw
> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c
> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c
> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c
> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc
> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara
> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>
> ssh vlsi1 'echo $WIENROOT/lapw*'
>
> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw
> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c
> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara
> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para
> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw
> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c
> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c
> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c
> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc
> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara
> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>
>
> However, I am still getting the same error:
>
>
> >   stop error
>
> grep: *scf1*: No such file or directory
> cp: cannot stat '.in.tmp': No such file or directory
> FERMI - Error
> grep: *scf1*: No such file or directory
> Parallel.scf1_1: No such file or directory.
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
>  LAPW0 END
> hup: Command not found.
>
>
> and the lapw2 error file contains:
>
>  'LAPW2' - can't open unit: 30
>
>  'LAPW2' -        filename: Parallel.energy_1
>
> **  testerror: Error in Parallel LAPW2
>
>
> On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu> wrote:
>
>> The missing "sudo service sshd restart" step, which I forgot to copy and
>> paste, is added below.
>> On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>
>> After you set both "SendEnv *" and "AcceptEnv *", did you restart the
>> sshd service [1]?  The following illustrates steps that might help you
>> verify that WIENROOT appears on a remote vlsi node:
>>
>> username at computername:~$ echo $WIENROOT
>>
>> username at computername:~$ export WIENROOT=/servernode1
>> username at computername:~$ echo $WIENROOT
>> /servernode1
>> username at computername:~$ ssh vlsi
>> Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic x86_64)
>> ...
>> Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>> username at computername:~$ echo $WIENROOT
>>
>> username at computername:~$ exit
>> logout
>> Connection to vlsi closed.
>> username at computername:~$ sudo gedit /etc/ssh/ssh_config
>> [sudo] password for username:
>>
>> username at computername:~$ sudo gedit /etc/ssh/sshd_config
>>
>> username at computername:~$ grep SendEnv /etc/ssh/ssh_config
>>     SendEnv LANG LC_* WIENROOT
>> username at computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>> AcceptEnv LANG LC_* WIENROOT
>>
>>    username at computername:~$ sudo service sshd restart
>>
>> username at computername:~$ ssh vlsi
>> ...
>> username at computername:~$ echo $WIENROOT
>> /servernode1
>> username at computername:~$ exit
>>
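
The two grep checks in the steps above could be wrapped as follows. This is only a sketch: env_listed is a hypothetical helper, it compares names literally (apart from a bare "*"), matches the directive name case-sensitively, and does not emulate ssh's own pattern matching for entries such as LC_*.

```shell
# Hypothetical helper: succeed if FILE has a DIRECTIVE line (SendEnv or
# AcceptEnv) that lists VAR literally or lists a bare "*".
# Commented-out lines are ignored because their first field is not DIRECTIVE.
env_listed() {
    file=$1
    directive=$2
    var=$3
    awk -v d="$directive" -v v="$var" '
        $1 == d { for (i = 2; i <= NF; i++) if ($i == v || $i == "*") found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

On the client one would run "env_listed /etc/ssh/ssh_config SendEnv WIENROOT", and on each node "env_listed /etc/ssh/sshd_config AcceptEnv WIENROOT"; both need to succeed (and sshd needs the restart shown above) before the variable is passed through.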
>> [1]
>> https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
>> On 9/28/2019 11:22 AM, Indranil mal wrote:
>>
>> Sir, I have tried with " SetEnv * ". Still nothing comes back from the echo
>> command. The user name I posted earlier was wrong by mistake; otherwise there
>> is no issue with the user name. I have set TASKSET to "no" in the parallel
>> options file, and the remote options are 1 1 on the server and client machines.
>>
>>
>> On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu> wrote:
>>
>>> Respected Sir, in my Linux (Ubuntu 18.04 LTS) there are already two
>>> lines, "SendEnv LANG LC_*" in ssh_config and "AcceptEnv LANG LC_*" in
>>> sshd_config.
>>>
>>> The "LANG LC_*" probably puts only the local language (locale) variables in
>>> the remote environment.  Did you follow the previous advice [1] of trying
>>> to use "*" to put all variables from the local environment?
>>>
>>> [1]
>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
>>>
>>> However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>
>>> That seems to be the main cause of the problem as it should not return
>>> (blank) but needs to return "/servernode1" as you previously mentioned [2].
>>>
>>> [2]
>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>
>>> Perhaps the message below is a clue.  If you had set the WIENROOT
>>> variable in .bashrc of your /home/vlsi accounts on each system, you likely
>>> have to login and use that same /home/vlsi account on the head node as
>>> the output below seems to indicate login to a different /home/niel
>>> account.  Alternatively, setting the WIENROOT variable in .bashrc of all
>>> /home/niel accounts on each node might work too.
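
One further thing that may matter when relying on .bashrc, offered as a possibility rather than a diagnosis: Ubuntu's stock ~/.bashrc starts with a guard that returns early in non-interactive shells, so an export placed below it is not seen by "ssh host command". The sketch below only demonstrates that effect with throwaway files; WIEN_DEMO_ROOT and the function name demo_guard are invented for the illustration, and no real ~/.bashrc is touched.

```shell
# Demonstration only: builds two temporary rc files, one with the export
# below the non-interactive guard and one with it above, then sources each
# from a non-interactive bash (bash -c), as "ssh host command" would use.
demo_guard() {
    guard='case $- in *i*) ;; *) return ;; esac'
    setvar='export WIEN_DEMO_ROOT=/servernode1'
    rc_after=$(mktemp)
    rc_before=$(mktemp)
    printf '%s\n%s\n' "$guard" "$setvar" > "$rc_after"    # export below the guard
    printf '%s\n%s\n' "$setvar" "$guard" > "$rc_before"   # export above the guard
    for rc in "$rc_after" "$rc_before"; do
        # Print the variable as seen after sourcing the rc file
        bash -c '. "$1"; echo "${WIEN_DEMO_ROOT:-unset}"' _ "$rc"
    done
    rm -f "$rc_after" "$rc_before"
}
```

Calling demo_guard prints "unset" for the first file and "/servernode1" for the second, which suggests putting the WIENROOT and PATH lines near the top of .bashrc if this guard is present on the nodes.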
>>>
>>>    The command ssh vlsi1 'pwd $WIENROOT' prints "/home/vlsi", the common
>>> home directory, and
>>> ssh vlsi1 "env"
>>> ...
>>> USER=niel
>>> PWD=/home/niel
>>> HOME=/home/niel
>>> ...
>>> This is the same on the server and the other nodes.
>>>
>>> Sir, after changing the parallel options file in $WIENROOT on the server
>>> to the following (TASKSET changed from "no" to "yes")
>>>
>>> setenv TASKSET "yes"
>>> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>> setenv WIEN_GRANULARITY 1
>>> setenv DELAY 0.1
>>> setenv SLEEPY 1
>>> setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
>>> setenv CORES_PER_NODE 1
>>>
>>> the error no longer appears, but the program does not progress after
>>> lapw0; it is stuck in lapw1.
>>>
>>> Since it seemed to throw an appropriate error message when TASKSET was
>>> "no", unlike now with "yes", you should probably change it back to
>>> "no".
>>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>>