[Wien] parallel ssh error

Indranil mal indranil.mal at gmail.com
Mon Sep 30 18:35:58 CEST 2019


 Thank you, Sir, for your prompt support. It is now working smoothly; the
only remaining message is "hup: Command not found."



On Sun, Sep 29, 2019 at 6:32 PM Gavin Abo <gsabo at crimson.ua.edu> wrote:

> Checking with "which lapw1c" on each node (vlsi1, vlsi2, vlsi3, and vlsi4)
> is a good idea.  However, since WIENROOT is (blank) [1], it probably won't
> work until that is resolved.
>
> It was mentioned that the WIEN2k .bashrc block was set up on each node by
> running userconfig [2].  So it definitely seems strange that WIENROOT is
> (blank) on the client nodes, since I would think it would work if WIENROOT
> and PATH are both defined from userconfig in .bashrc:
>
> username at computername:~$ ssh vlsi1
> ...
> username at computername:~$ cd ~/WIEN2k
> username at computername:~/WIEN2k$ which lapw1c
> username at computername:~/WIEN2k$ grep "export WIENROOT" ~/.bashrc
> username at computername:~/WIEN2k$ grep "export PATH" ~/.bashrc
> username at computername:~/WIEN2k$ ./userconfig
> ...
> username at computername:~/WIEN2k$ grep "export WIENROOT" ~/.bashrc
> export WIENROOT=/servernode1
> username at computername:~/WIEN2k$ grep "export PATH" ~/.bashrc
> export PATH=$WIENROOT:$STRUCTEDIT_PATH:$WIENROOT/SRC_IRelast/script-elastic:$PATH:.
> export PATH=$PATH:$WIENROOT:.
> username at computername:~/WIEN2k$ source ~/.bashrc
> username at computername:~/WIEN2k$ which lapw1c
> /home/username/WIEN2k/lapw1c
> username at computername:~/WIEN2k$ exit
> logout
> Connection to vlsi1 closed.
>
> Though, I suppose if something like a conf file [3] was set up by the user
> to override .bashrc, or a job queue scheduler system is in use [4], it might
> also cause the issue.
> [1]
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19052.html
> [2]
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
> [3]
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08016.html
> [4]
> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg15985.html
>
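The per-node "which lapw1c" check suggested above can be run in one pass. This is a minimal sketch, not part of WIEN2k: check_node, check_all and the REMOTE_SH variable are hypothetical names, and the node names are the ones mentioned in this thread. It runs the command through ssh without an interactive login, since an interactive login may see a different environment.

```shell
# REMOTE_SH is a hypothetical knob for this sketch; it defaults to plain ssh.
REMOTE_SH="${REMOTE_SH:-ssh}"

# check_node NODE: run "which lapw1c" on NODE without an interactive login.
check_node() {
    node="$1"
    if found=$($REMOTE_SH "$node" 'which lapw1c' 2>/dev/null) && [ -n "$found" ]; then
        echo "$node: $found"
    else
        echo "$node: lapw1c not found"
        return 1
    fi
}

# check_all NODE...: check every node; return non-zero if any node fails.
check_all() {
    rc=0
    for n in "$@"; do
        check_node "$n" || rc=1
    done
    return $rc
}
```

It could be called as "check_all vlsi1 vlsi2 vlsi3 vlsi4" from the head node; any node reported as "not found" would be the one to look at first.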
> On 9/29/2019 6:11 AM, Laurence Marks wrote:
>
> What does
>
> ssh vlsi1 which lapw1c
>
> give? And what does "cat *.error" give in the case directory?
> _____
> Professor Laurence Marks
> "Research is to see what everybody else has seen, and to think what nobody
> else has thought", Albert Szent-Gyorgi
> www.numis.northwestern.edu
>
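The "cat *.error" question above can be narrowed to the files that actually contain something. This is a minimal sketch, assuming that a non-empty *.error file marks a failed step (that convention is an assumption here, not something stated in this thread); show_errors is a hypothetical helper name.

```shell
# show_errors DIR: print the name and content of every non-empty *.error
# file in DIR (hypothetical helper; DIR would be the WIEN2k case directory).
show_errors() {
    for f in "$1"/*.error; do
        # -s is true only for existing files with size > 0, so this also
        # skips the unexpanded pattern when no *.error file exists.
        if [ -s "$f" ]; then
            echo "== $f"
            cat "$f"
        fi
    done
}
```

Run from the case directory as "show_errors ." to see only the error files with content.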
> On Sun, Sep 29, 2019, 01:17 Indranil mal <indranil.mal at gmail.com> wrote:
>
>> Now echo $WIENROOT gives the WIENROOT location.
>>
>> echo $WIENROOT/lapw*
>>
>> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
>> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw
>> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c
>> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara
>> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para
>> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
>> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
>> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
>> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw
>> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c
>> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c
>> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c
>> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc
>> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara
>> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
>> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
>> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>>
>> ssh vlsi1 'echo $WIENROOT/lapw*'
>>
>> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
>> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw
>> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c
>> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara
>> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para
>> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
>> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
>> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
>> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw
>> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c
>> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c
>> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c
>> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc
>> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara
>> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
>> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
>> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>>
>>
>> However, I am still getting the same error:
>>
>>
>> >   stop error
>>
>> grep: *scf1*: No such file or directory
>> cp: cannot stat '.in.tmp': No such file or directory
>> FERMI - Error
>> grep: *scf1*: No such file or directory
>> Parallel.scf1_1: No such file or directory.
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>>  LAPW0 END
>> hup: Command not found.
>>
>>
>> and the lapw2 error file contains:
>>
>>  'LAPW2' - can't open unit: 30
>>
>>  'LAPW2' -        filename: Parallel.energy_1
>>
>> **  testerror: Error in Parallel LAPW2
>>
>>
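In the output above WIENROOT is set on the remote side while "lapw1c: command not found" still appears, which points at PATH rather than WIENROOT. One thing that may be worth checking (an assumption, not something confirmed in this thread) is whether the WIEN2k block in ~/.bashrc sits below the guard in Ubuntu's stock .bashrc that returns early for non-interactive shells; exports placed after that guard are not applied to commands started through ssh. The sketch below lists such exports. skipped_exports is a hypothetical helper and only recognises the stock "*) return;;" form of the guard.

```shell
# skipped_exports FILE: print "LINE: text" for every export statement that
# appears after a "*) return;;" guard in FILE (hypothetical helper).
skipped_exports() {
    awk '
        # Stock Ubuntu guard: case $- in  *i*) ;;  *) return;;  esac
        /^[ \t]*\*\)[ \t]*return/ { guard = NR }
        # Once the guard has been seen, report every later export line.
        guard && /^[ \t]*export[ \t]/ { print NR ": " $0 }
    ' "$1"
}
```

Running "skipped_exports ~/.bashrc" on each node and getting the WIENROOT and PATH lines back would suggest trying the WIEN2k block above the guard instead.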
>> On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu> wrote:
>>
>>> The "sudo service sshd restart" step, which I forgot to copy and paste, was
>>> missing; it is added in the corrected version below.
>>> On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>>
>>> After you set both "SendEnv *" and "AcceptEnv *", did you restart the
>>> sshd service [1]?  The following illustrates steps that might help you
>>> verify that WIENROOT appears on a remote vlsi node:
>>>
>>> username at computername:~$ echo $WIENROOT
>>>
>>> username at computername:~$ export WIENROOT=/servernode1
>>> username at computername:~$ echo $WIENROOT
>>> /servernode1
>>> username at computername:~$ ssh vlsi
>>> Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic x86_64)
>>> ...
>>> Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>>> username at computername:~$ echo $WIENROOT
>>>
>>> username at computername:~$ exit
>>> logout
>>> Connection to vlsi closed.
>>> username at computername:~$ sudo gedit /etc/ssh/ssh_config
>>> [sudo] password for username:
>>>
>>> username at computername:~$ sudo gedit /etc/ssh/sshd_config
>>>
>>> username at computername:~$ grep SendEnv /etc/ssh/ssh_config
>>>     SendEnv LANG LC_* WIENROOT
>>> username at computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>>> AcceptEnv LANG LC_* WIENROOT
>>>
>>>    username at computername:~$ sudo service sshd restart
>>>
>>> username at computername:~$ ssh vlsi
>>> ...
>>> username at computername:~$ echo $WIENROOT
>>> /servernode1
>>> username at computername:~$ exit
>>>
>>> [1]
>>> https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
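The two grep checks in the transcript above can be turned into a yes/no test for a given variable. This is a minimal sketch: env_allowed is a hypothetical helper, it only scans for lines that start with the directive, and it ignores Host/Match scoping and negated patterns, so treat its answer as a hint rather than as what ssh will actually do.

```shell
# env_allowed DIRECTIVE VARNAME FILE
# Succeeds if VARNAME matches one of the patterns listed on any
# "DIRECTIVE pattern..." line in FILE (e.g. SendEnv or AcceptEnv).
# The body runs in a subshell so that "set -f" does not leak to the caller.
env_allowed() (
    directive="$1"; var="$2"; file="$3"
    # Keywords are matched case-insensitively; commented lines do not match.
    patterns=$(awk -v d="$directive" '
        tolower($1) == tolower(d) { for (i = 2; i <= NF; i++) print $i }
    ' "$file")
    set -f   # keep patterns such as LC_* from expanding against local files
    for pat in $patterns; do
        case "$var" in
            $pat) exit 0 ;;
        esac
    done
    exit 1
)
```

On the head node this could be called as "env_allowed SendEnv WIENROOT /etc/ssh/ssh_config", and on each vlsi node as "env_allowed AcceptEnv WIENROOT /etc/ssh/sshd_config". As noted above, sshd has to be restarted before a change to sshd_config takes effect.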
>>> On 9/28/2019 11:22 AM, Indranil mal wrote:
>>>
>>> Sir, I have tried with "SetEnv *", but the echo command still returns
>>> nothing. The user name was posted wrongly by mistake; otherwise there is no
>>> issue with the user name. I have set TASKSET to "no" in the parallel options
>>> file, and the remote options (USE_REMOTE and MPI_REMOTE) are 1 1 on the
>>> server and client machines.
>>>
>>>
>>> On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu> wrote:
>>>
>>>> Respected Sir, on my Linux system (Ubuntu 18.04 LTS), ssh_config and
>>>> sshd_config already contain the lines "SendEnv LANG LC_*" and "AcceptEnv
>>>> LANG LC_*", respectively.
>>>>
>>>> The "LANG LC_*" probably puts only the local language variables in
>>>> the remote environment.  Did you follow the previous advice [1] of trying
>>>> to use "*" to put all variables from the local environment?
>>>>
>>>> [1]
>>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
>>>>
>>>> However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>>
>>>> That seems to be the main cause of the problem as it should not return
>>>> (blank) but needs to return "/servernode1" as you previously mentioned [2].
>>>>
>>>> [2]
>>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>>
>>>> Perhaps the message below is a clue.  If you had set the WIENROOT
>>>> variable in .bashrc of your /home/vlsi accounts on each system, you likely
>>>> have to log in and use that same /home/vlsi account on the head node, as
>>>> the output below seems to indicate a login to a different /home/niel
>>>> account.  Alternatively, setting the WIENROOT variable in .bashrc of all
>>>> /home/niel accounts on each node might work too.
>>>>
>>>>    The command ssh vlsi1 'pwd $WIENROOT' prints "/home/vlsi", the common
>>>> home directory, and
>>>> ssh vlsi1 "env"
>>>> ...
>>>> USER=niel
>>>> PWD=/home/niel
>>>> HOME=/home/niel
>>>> ...
>>>> this is the same on the server and the other nodes.
>>>>
>>>> Sir, after changing TASKSET from "no" to "yes" in the parallel options
>>>> file in $WIENROOT on the server, so that it reads
>>>>
>>>> setenv TASKSET "yes"
>>>> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>>> setenv WIEN_GRANULARITY 1
>>>> setenv DELAY 0.1
>>>> setenv SLEEPY 1
>>>> setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
>>>> setenv CORES_PER_NODE 1
>>>>
>>>> the error no longer appears, but the run does not progress after
>>>> lapw0; it gets stuck in lapw1.
>>>>
>>>> Since it seemed to throw an appropriate error message when TASKSET was
>>>> previously set to "no", unlike when set to "yes", you should probably change
>>>> it back to "no".
>>>>
>>> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>