[Wien] parallel ssh error
Gavin Abo
gsabo at crimson.ua.edu
Mon Sep 30 13:41:45 CEST 2019
An additional comment: /home/username/WIEN2k (or ~/WIEN2k) is where I
have WIEN2k installed, whereas you have installed WIEN2k at
/servernode1 [1]. In the examples of my previous posts (e.g., [2]) you
might find some typographical errors where I forgot to replace my
/home/username/WIEN2k with your /servernode1.
It is best to have WIEN2k set up at a common path location on all nodes
[3,4] (i.e., your vlsi1-vlsi4). Therefore, I recommend not keeping
WIEN2k at other locations among your system nodes, such as the
/home/username/WIEN2K that you mentioned below.
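As a minimal sketch of how the common-path setup could be verified
(assuming bash, passwordless ssh, and /servernode1 as the expected
location; check_node, SSH_CMD and EXPECTED are names made up for this
illustration and are not part of WIEN2k), something like the following
asks a node for its WIENROOT and whether lapw1c is found in the
non-interactive shell that ssh starts:

```shell
#!/bin/bash
# Illustrative sketch only: check_node, SSH_CMD and EXPECTED are hypothetical
# names, not WIEN2k tools. SSH_CMD can be overridden for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"
EXPECTED="${EXPECTED:-/servernode1}"

check_node() {
    local node="$1" remote_root
    # Single quotes stop $WIENROOT from expanding locally;
    # it is evaluated by the shell on the remote node.
    remote_root=$("$SSH_CMD" "$node" 'echo $WIENROOT')
    if [ "$remote_root" != "$EXPECTED" ]; then
        echo "$node: WIENROOT is '$remote_root', expected '$EXPECTED'"
        return 1
    fi
    # The parallel scripts call lapw1c by name, so it must be on the
    # PATH of the non-interactive remote shell.
    if ! "$SSH_CMD" "$node" 'command -v lapw1c' >/dev/null; then
        echo "$node: lapw1c is not on the PATH of the remote shell"
        return 1
    fi
    echo "$node: ok"
}
```

It could then be run over the nodes with, e.g.,
for n in vlsi1 vlsi2 vlsi3 vlsi4; do check_node "$n"; done
A node that prints anything other than "ok" would likely still give the
"lapw1c: command not found" error in the parallel run.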
[1]
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
[2]
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19061.html
[3]
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17988.html
[4]
https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09229.html
On 9/30/2019 12:59 AM, Peter Blaha wrote:
> So there is progress as now the environment seems to be accepted in
> the remote shell.
>
> lapw1para (called by x_lapw, which is called by run_lapw -p) creates
> the split klist files (case.klist_1,...) and the def files
> lapw1_1.def,...
>
> It uses the $cwd variable and executes basically:
>
> ssh vlsi1 "cd $cwd; lapw1c lapw1_1.def "
>
> Does this work on your computers ?
>
>
>
> On 9/29/19 7:16 PM, Indranil mal wrote:
>> Now echo $WIENROOT is giving the $WIENROOT location.
>>
>> echo $WIENROOT/lapw*
>>
>> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
>> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw
>> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c
>> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara
>> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para
>> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
>> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
>> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
>> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw
>> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c
>> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c
>> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c
>> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc
>> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara
>> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
>> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
>> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>>
>> ssh vlsi1 'echo $WIENROOT/lapw*'
>>
>> /home/username/WIEN2K/lapw0 /home/username/WIEN2K/lapw0_mpi
>> /home/username/WIEN2K/lapw0para /home/username/WIEN2K/lapw0para_lapw
>> /home/username/WIEN2K/lapw1 /home/username/WIEN2K/lapw1c
>> /home/username/WIEN2K/lapw1c_mpi /home/username/WIEN2K/lapw1cpara
>> /home/username/WIEN2K/lapw1_mpi /home/username/WIEN2K/lapw1para
>> /home/username/WIEN2K/lapw1para_lapw /home/username/WIEN2K/lapw2
>> /home/username/WIEN2K/lapw2c /home/username/WIEN2K/lapw2c_mpi
>> /home/username/WIEN2K/lapw2cpara /home/username/WIEN2K/lapw2_mpi
>> /home/username/WIEN2K/lapw2para /home/username/WIEN2K/lapw2para_lapw
>> /home/username/WIEN2K/lapw3 /home/username/WIEN2K/lapw3c
>> /home/username/WIEN2K/lapw5 /home/username/WIEN2K/lapw5c
>> /home/username/WIEN2K/lapw7 /home/username/WIEN2K/lapw7c
>> /home/username/WIEN2K/lapwdm /home/username/WIEN2K/lapwdmc
>> /home/username/WIEN2K/lapwdmcpara /home/username/WIEN2K/lapwdmpara
>> /home/username/WIEN2K/lapwdmpara_lapw /home/username/WIEN2K/lapwso
>> /home/username/WIEN2K/lapwsocpara /home/username/WIEN2K/lapwso_mpi
>> /home/username/WIEN2K/lapwsopara /home/username/WIEN2K/lapwsopara_lapw
>>
>>
>> However getting the same error
>>
>>
>>
>>
>>> stop error
>>
>> grep: *scf1*: No such file or directory
>> cp: cannot stat '.in.tmp': No such file or directory
>> FERMI - Error
>> grep: *scf1*: No such file or directory
>> Parallel.scf1_1: No such file or directory.
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> bash: fixerror_lapw: command not found
>> bash: lapw1c: command not found
>> LAPW0 END
>> hup: Command not found.
>>
>>
>> and lapw2 error file
>>
>> 'LAPW2' - can't open unit: 30
>> 'LAPW2' - filename: Parallel.energy_1
>> ** testerror: Error in Parallel LAPW2
>>
>>
>>
>> On Sat, Sep 28, 2019 at 11:58 PM Gavin Abo <gsabo at crimson.ua.edu> wrote:
>>
>> The "sudo service sshd restart" step, which I forgot to copy and
>> paste, was missing; it is added below.
>>
>> On 9/28/2019 12:18 PM, Gavin Abo wrote:
>>>
>>> After you set both "SendEnv *" and "AcceptEnv *", did you restart
>>> the sshd service [1]? The following illustrates steps that might
>>> help you verify that WIENROOT appears on a remote vlsi node:
>>>
>>> username at computername:~$ echo $WIENROOT
>>>
>>> username at computername:~$ export WIENROOT=/servernode1
>>> username at computername:~$ echo $WIENROOT
>>> /servernode1
>>> username at computername:~$ ssh vlsi
>>> Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-64-generic x86_64)
>>> ...
>>> Last login: Sat Sep 28 12:04:07 2019 from xxx.x.x.x
>>> username at computername:~$ echo $WIENROOT
>>>
>>> username at computername:~$ exit
>>> logout
>>> Connection to vlsi closed.
>>> username at computername:~$ sudo gedit /etc/ssh/ssh_config
>>> [sudo] password for username:
>>>
>>> username at computername:~$ sudo gedit /etc/ssh/sshd_config
>>>
>>> username at computername:~$ grep SendEnv /etc/ssh/ssh_config
>>> SendEnv LANG LC_* WIENROOT
>>> username at computername:~$ grep AcceptEnv /etc/ssh/sshd_config
>>> AcceptEnv LANG LC_* WIENROOT
>>>
>> username at computername:~$ sudo service sshd restart
>>>
>>> username at computername:~$ ssh vlsi
>>> ...
>>> username at computername:~$ echo $WIENROOT
>>> /servernode1
>>> username at computername:~$ exit
>>>
>>> [1]
>>> https://askubuntu.com/questions/462968/take-changes-in-file-sshd-config-file-without-server-reboot
>>>
>>> On 9/28/2019 11:22 AM, Indranil mal wrote:
>>>> Sir, I have tried with "SetEnv *". Still nothing comes out of the
>>>> echo command. I posted the wrong user name by mistake; otherwise
>>>> there is no issue with the user name. I have set TASKSET to "no"
>>>> in the parallel options file, and the remote options are 1 1 on
>>>> the server and client machines.
>>>>
>>>>
>>>> On Sat, 28 Sep 2019 11:36 Gavin Abo, <gsabo at crimson.ua.edu> wrote:
>>>>
>>>>> Respected Sir, on my Linux (Ubuntu 18.04 LTS) there are already
>>>>> two lines, "SendEnv LANG LC_*" in ssh_config and
>>>>> "AcceptEnv LANG LC_*" in sshd_config, respectively.
>>>>
>>>> The "LANG LC_*" probably puts only the local language
>>>> variables in the remote environment. Did you follow the
>>>> previous advice [1] of trying to use "*" to put all variables
>>>> from the local environment?
>>>>
>>>> [1]
>>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19049.html
>>>>
>>>>> However, ssh vlsi1 'echo $WIENROOT' gives nothing (blank).
>>>>
>>>> That seems to be the main cause of the problem as it should
>>>> not return (blank) but needs to return "/servernode1" as you
>>>> previously mentioned [2].
>>>>
>>>> [2]
>>>> https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg19036.html
>>>>
>>>> Perhaps the message below is a clue. If you had set the
>>>> WIENROOT variable in .bashrc of your /home/vlsi accounts on
>>>> each system, you likely have to log in with that same
>>>> /home/vlsi account on the head node, as the output below seems
>>>> to indicate a login to a different /home/niel account.
>>>> Alternatively, setting the WIENROOT variable in .bashrc of
>>>> all /home/niel accounts on each node might work too.
>>>>
>>>>> The command ssh vlsi1 'pwd $WIENROOT' prints "/home/vlsi",
>>>>> the common home directory, and
>>>>> ssh vlsi1 "env"
>>>>> ...
>>>>> USER=niel
>>>>> PWD=/home/niel
>>>>> HOME=/home/niel
>>>>> ...
>>>>> This is the same on the server and the other nodes.
>>>>>
>>>>> Sir, after changing the parallel options file in $WIENROOT on
>>>>> the server (TASKSET changed from "no" to "yes") to
>>>>>
>>>>> setenv TASKSET "yes"
>>>>> if ( ! $?USE_REMOTE ) setenv USE_REMOTE 1
>>>>> if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 1
>>>>> setenv WIEN_GRANULARITY 1
>>>>> setenv DELAY 0.1
>>>>> setenv SLEEPY 1
>>>>> setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"
>>>>> setenv CORES_PER_NODE 1
>>>>>
>>>>> the error no longer appears, but the program does not advance
>>>>> after lapw0; it gets stuck in lapw1.
>>>>
>>>> Since it threw an appropriate error message when TASKSET was
>>>> set to "no", unlike now with "yes", you should probably
>>>> change it back to "no".