[Wien] Fail to parallel calculation of lapw1 and lapw2 (testpara1 and testpara2)

Gavin Abo gsabo at crimson.ua.edu
Sun Oct 28 14:14:47 CET 2018


What does "ls -al ~/.ssh/config" give you?

That error is reproducible with Ubuntu 18.04.1 LTS:

username at computername:~$ cat ~/.ssh/config
Host *
      HostName 127.0.0.1
      User username
      ForwardX11Trusted yes
      GatewayPorts yes
      GSSAPIAuthentication yes
username at computername:~$ chmod 666 ~/.ssh/config
username at computername:~$ ls -al ~/.ssh/config
-rw-rw-rw- 1 username username 131 Oct 28 06:54 /home/username/.ssh/config
username at computername:~$ ssh localhost
Bad owner or permissions on /home/username/.ssh/config

Setting proper file permissions with chmod (and ownership with chown) does 
indeed seem to fix the problem [ 
https://serverfault.com/questions/253313/ssh-returns-bad-owner-or-permissions-on-ssh-config 
]:

username at computername:~$ chmod 644 ~/.ssh/config
username at computername:~$ ls -al ~/.ssh/config
-rw-r--r-- 1 username username 131 Oct 28 06:54 /home/username/.ssh/config
username at computername:~$ ssh localhost
...

Last login: Sun Oct 28 06:54:48 2018 from 127.0.0.1
username at computername:~$
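
The two conditions behind that error message (a file owned by someone 
other than you or root, or a file writable by group or others) can be 
checked with a small script. This is only a minimal sketch: it assumes 
GNU stat from coreutils (as on Ubuntu and CentOS), and the function name 
check_ssh_config is made up for illustration and is not part of OpenSSH 
or WIEN2k.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH or WIEN2k): report whether an
# ssh config file would likely pass the owner/permission check.
# Assumes GNU stat (coreutils).
check_ssh_config() {
    cfg="${1:-$HOME/.ssh/config}"
    if [ ! -f "$cfg" ]; then
        echo "missing: $cfg"
        return 2
    fi
    mode=$(stat -c '%a' "$cfg")    # octal permission bits, e.g. 644
    owner=$(stat -c '%u' "$cfg")   # numeric uid of the file owner
    # Bad if the owner is neither root nor the current user
    if [ "$owner" -ne 0 ] && [ "$owner" -ne "$(id -u)" ]; then
        echo "bad owner: $cfg"
        return 1
    fi
    # Bad if group or others have write permission (octal bits 022)
    if [ $(( 0$mode & 022 )) -ne 0 ]; then
        echo "bad permissions ($mode): $cfg"
        return 1
    fi
    echo "ok ($mode): $cfg"
}

# Check the default location; "|| true" keeps the script going on failure
check_ssh_config "$HOME/.ssh/config" || true
```

With the 666 permissions from the example above this should report bad 
permissions, and after "chmod 644" (or the stricter "chmod 600") it 
should report ok.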

Also, you might have to change "User localhost" to "User username" and 
HostName may need to be changed from 0.0.0.0 to the loopback address 127.0.0.1 
[ https://en.wikipedia.org/wiki/Localhost ] in your config file, where 
username has to be replaced by your actual user name.
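
To see which user and host name ssh actually resolves for localhost 
after editing the config file, something like the following could be 
used. It is only a sketch: it assumes an OpenSSH client recent enough to 
support "ssh -G" (6.8 or later), and the helper name effective_login is 
invented here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pick the resolved "user" and "hostname" lines
# out of the configuration dump printed by `ssh -G <host>`.
effective_login() {
    awk '$1 == "user" || $1 == "hostname" { print $1 "=" $2 }'
}

if command -v ssh >/dev/null 2>&1; then
    # Print the settings ssh would use for "localhost", without connecting
    ssh -G localhost | effective_login
fi
```

If the user shown is not your own login name, or the hostname is not 
127.0.0.1 (or localhost), the config file probably still needs the 
changes described above. Afterwards, "ssh -o BatchMode=yes localhost true" 
can be used to check that the login works without a password prompt.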

On 10/28/2018 4:04 AM, Woohyeon Baek wrote:
>
> Dear administrators or technicians of WIEN2k,
>
>
>
> Hello. I am a user of WIEN2k v17.1 and have now upgraded to 18.2.
>
>
>
> (The specification of my nodes is 2 CPUs with 56 threads in total 
> (Intel Xeon E5-2696 series) and CentOS 17.)
>
>
> (I had no installation problems with ./siteconfig when I 
> compiled everything with the Intel compilers and the MPI, FFTW, 
> ScaLAPACK, MKL and libxc libraries.)
>
>
>
> I have a problem with the parallel calculation of the lapw1 and lapw2 
> modules through w2web, which I access by tunneling with PuTTY.
>
>
> (The input text and results are given below.)
>
>
> When I tried to calculate my system, it constantly showed an error about 
> *bad owner or permissions* on the config file.
>
>
> When I searched the mailing list archives and Google for a solution, they 
> said that the problem is with permissions. So
>
>
> 1. I already ran the ssh-keygen command and appended the key to 
> authorized_keys, but it did not make any difference.
>
>
> 2. I tried changing the permissions and ownership of the config file with 
> the chmod and chown commands, but it did not work. (I could not find any 
> other solutions.)
>
>
> 3. I checked the *.error files from the testpara1 and testpara2 runs, and 
> they show nothing but "Error" without any further comment.
>
>
>
> When I tried without parallelization for a small system (only 1 job), 
> the calculation worked without problems.
>
>
>
> I also ran testpara for each lapw module, and lapw1 and lapw2 showed 
> errors.
>
>
> It seems lapw1 runs without parallelization and lapw2 does not work.
>
>
>
> I would really appreciate it if there is a way to solve these problems.
>
>
> Thank you very much in advance for your help.
>
>
>
> (I used just 4 threads for the test to keep the output short. Of course I 
> also tried using all threads, but it did not work.)
>
>
>
> *.machines file*
>
> -----------------------------
>
> granularity:1
> 1:localhost:4    (I tried my username but it did not work. I also 
> tried 1:localhost, 1:localhost localhost:1 and 1:localhost 1:localhost)
> lapw0:localhost:2 localhost:2
> dstart:localhost:2 localhost:2
> nlvdw:localhost:2 localhost:2
>
> ------------------------------
>
>
> *~/.ssh/config*
>
> -------------------------
>
> Host    *
> HostName 0.0.0.0   (I also tried my fixed IP but it did not work)
> User localhost
> ForwardX11Trusted yes
> GatewayPorts yes
> GSSAPIAuthentication yes
>
> -------------------------
>
>
> *SCF results*
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> changing 1.in2c
> changing 1.in2_ls
> changing 1.in2_st
> changing 1.in2_sy
> LAPW0 END
> [1] Done mpirun -np 4 -machinefile .machine0 /home/User/software/WIEN2K/lapw0_mpi lapw0.def >> .time00
> DFTD3 END
> *Bad owner or permissions on /home/User/.ssh/config*
> [1] + Exit 255 ( $remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]" ) >> .time1_$loop
> cat: .time1_1: No such file or directory
> cat: .time1_1: No such file or directory
> 1.scf1up_1: No such file or directory.
> cat: No match.
> grep: No match.
> grep: No match.
> grep: No match.
> > stop error
>
> ------------------------------------------------------------------------------------------------------------------------------
>
> *testpara*
>
> ------------------------------------------------------------
>
> #####################################################
> #                     TESTPARA                      #
> #####################################################
>
> Test: LAPW1 in parallel mode (using .machines)
> Granularity set to 1
> Extrafine unset
> weights: 1
> sumw: 1
> k-points: 30
>
>      klist:       30
>      machines:    localhost
>      procs:       1
>      weigh(old):  1
>      sumw:        1
>      granularity: 1
>      weigh(new):  30
>
> Distribution of k-point (under ideal conditions)
> will be:
>
> 1 : localhost(30) 30k
>
> -------------------------------------------------------
>
>
> *testpara1*
>
> -------------------------------------------------------------
>
> #####################################################
> #                     TESTPARA1                     #
> #####################################################
>
> Sun Oct 28 18:12:33 KST 2018
>
>      lapw1para is running
>
> 30 of 30 (100%) k-points distributed
>
> localhost: running
> localhost: not running
> localhost: not running
> localhost: not running
> ------------------------------------------------------
>
> *testpara2*
>
> --------------------------------------------------------------
>
> #####################################################
> #                     TESTPARA2                     #
> #####################################################
>
> Sun Oct 28 18:12:47 KST 2018
>
>      lapw2para exited due to an ERROR
>      Check *.error files
>
> ---------------------------------------------------------------
>
>
>
> Sincerely,
>
>
> Woohyeon Baek
>