[Wien] qtl: error reading parallel vectors

Peter Blaha pblaha at theochem.tuwien.ac.at
Sun Oct 25 06:45:10 CET 2020


qtlpara cannot yet use parallel vectors stored in scratch directories on 
different nodes; for now it requires that all vector files are directly 
accessible.

Both   x lapw2 -p -qtl   and   x qtl -p   actually run in single mode; 
the -p only directs them to read the .processes file and to use all 
the parallel vectors (case.vector_1, case.vector_2, ...).

When using a local SCRATCH directory, the vectors are stored there and 
are ONLY ACCESSIBLE on the corresponding node. Thus it works when using a 
single node (all parallel vector files are accessible on that node), but 
fails when using 2 or more nodes.
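
A quick way to see this on a given node is to count the vector files that 
are visible there. This is only a sketch: the helper name count_vectors is 
hypothetical, and it assumes the files follow the case.vector_N naming 
mentioned above (including variants such as case.vectordn_N for 
spin-polarized runs).

```shell
# Hypothetical helper: count the parallel vector files readable from this node.
# Assumption: files are named <case>.vector_<n>, <case>.vectorup_<n>, etc.
count_vectors() {
    dir=$1
    case_name=$2
    n=0
    for f in "$dir/$case_name".vector*_[0-9]*; do
        # If nothing matches, the unexpanded pattern is not a readable file.
        if [ -r "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, count_vectors "$SCRATCH" VO2 run on each node should give the 
total number of parallel vector files only if all of them are reachable 
from that node.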

lapw2para can overcome this limitation, since it has a line with 
vec2old_lapw, which uses scp to copy all vector files from the different 
nodes to the local machine:

qtl:
echo "calculating QTL's from parallel vectors"
vec2old_lapw -p -local $so -$updn                      # <-------
$exe $def.def $maxproc

In qtlpara, this line is missing:

echo "calculating QTL's from parallel vectors"
$exe $def.def $maxproc


Please insert the vec2old_lapw line into qtlpara just before the $exe line.
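
If you prefer to script the change rather than edit by hand, something 
like the following could be used. It is a sketch only: the function name 
patch_qtlpara is hypothetical, and it assumes the script contains the echo 
and $exe lines exactly as quoted above. It keeps a copy of the original as 
<script>.orig and does nothing if the vec2old_lapw line is already there.

```shell
# Hypothetical helper: insert the vec2old_lapw line before the $exe line
# that follows the "calculating QTL's from parallel vectors" echo.
patch_qtlpara() {
    script=$1
    # Keep a one-time backup of the unpatched script.
    if [ ! -e "$script.orig" ]; then
        cp "$script" "$script.orig" || return 1
    fi
    awk '
        /calculating QTL.s from parallel vectors/ { seen = 1 }
        # Already patched: leave the file unchanged.
        seen && /vec2old_lapw/ { seen = 0 }
        seen && /\$exe \$def\.def \$maxproc/ {
            print "vec2old_lapw -p -local $so -$updn"
            seen = 0
        }
        { print }
    ' "$script" > "$script.new" || return 1
    # Overwrite in place so the file mode (executable bit) is preserved.
    cat "$script.new" > "$script" && rm -f "$script.new"
}
```

It would be called as   patch_qtlpara $WIENROOT/qtlpara_lapw   by someone 
with write permission in $WIENROOT; please check the result by eye.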

On 24.10.2020 at 22:30, Christian Søndergaard Pedersen wrote:
> Hello Gavin
> 
> 
> Thanks for your reply, and apologies for my tardiness.
> 
> 
> [1] All my calculations are run in MPI-parallel on our HPC cluster. I 
> cannot execute any 'x lapw[0,1,2] -p' command in the terminal (on the 
> cluster login node); this results in 'pbsssh: command not found'. 
> However, submitting via the SLURM workload manager works fine. In all my 
> submit scripts, I specify 'setenv SCRATCH /scratch/$USER', which is the 
> proper location of scratch storage on our HPC cluster.
> 
> 
> [2] Without having tried your example for diamond, I can report that 
> 'run_lapw -p' followed by 'x qtl -p -telnes' works without problems for 
> a single cell of Vanadium dioxide. However, for other systems I get the 
> error I specified. The other systems (1) are larger, and (2) use two 
> CPUs instead of a single CPU (.machines files are modified accordingly).
> 
> Checking the qtl.def file for the calculation that _did_ work, I can see 
> that the line specifying '/scratch/chrsop/VO2.vectordn' is _also_ 
> present here, so this is not to blame. This leaves me baffled as to what 
> the error can be - as far as I can tell, I am trying to perform the 
> exact same calculation for different systems. I thought maybe 
> insufficient scratch storage could be to blame, but this would most 
> likely show up in the 'run_lapw' cycles (I believe).
> 
> 
> [3] I am posting here the difference between qtlpara and lapw2para:
> 
> $ grep "single" $WIENROOT/qtlpara_lapw
> testinput .processes single
> $ grep "single" $WIENROOT/lapw2para_lapw
> testinput .processes single
> single:
> echo "running in single mode"
> 
> ... if this is wrong, I kindly request advice on how to fix it, so I can 
> pass it on to our software maintenance guy. If there's anything else I 
> can try please let me know.
> 
> Best regards
> Christian
> 
> 
> ------------------------------------------------------------------------
> *From:* Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Gavin 
> Abo <gsabo at crimson.ua.edu>
> *Sent:* 21 October 2020 07:02:01
> *To:* wien at zeus.theochem.tuwien.ac.at
> *Subject:* Re: [Wien] qtl: error reading parallel vectors
> 
> I'm not sure about the physics of the following WIEN2k 19.2 parallel 
> calculation (with all patches at [1] applied), but mechanically the "x 
> qtl -p -telnes" seems to have run without error.
> 
> 
> I typically have SCRATCH in my .bashrc set to "./" but used another 
> location "/home/username/wiendata/scratch" as seen below.  Does a simple 
> k-point parallel calculation like the one below work on your system?  I 
> haven't tried mpi parallel yet.  On the other hand, I have noticed a 
> possible issue: if one forgets to set up a .machines file and tries 
> to run a parallel calculation, qtlpara_lapw seems to fail to switch 
> over to the serial calculation mode, as shown under [2] below.  
> Comparing, for example, lapw2para_lapw and qtlpara_lapw, as illustrated 
> by [3] below, suggests that qtlpara_lapw may be missing some additional 
> code that could be needed to get that to work.
> 
> 
> username at computername:~/wiendata/diamond$ grep SCRATCH ~/.bashrc
> export SCRATCH=/home/username/wiendata/scratch
> username at computername:~/wiendata/diamond$ ls
> diamond.struct
> username at computername:~/wiendata/diamond$ init_lapw -b
> ...
>    init_lapw finished ok
> username at computername:~/wiendata/diamond$ cat .machines
> 1:localhost
> 1:localhost
> granularity:1
> extrafine:1
> username at computername:~/wiendata/diamond$ run_lapw -p
> ...
> in cycle 11    ETEST: .0001457550000000   CTEST: .0033029
> hup: Command not found.
> STOP  LAPW0 END
> STOP  LAPW1 END
> STOP  LAPW1 END
> STOP LAPW2 - FERMI; weights written
> STOP  LAPW2 END
> STOP  LAPW2 END
> STOP  SUMPARA END
> STOP  CORE  END
> STOP  MIXER END
> ec cc and fc_conv 1 1 1
> 
>  >   stop
> username at computername:~/wiendata/diamond$ cp $WIENROOT/SRC_templates/case.innes diamond.innes
> username at computername:~/wiendata/diamond$ x qtl -p -telnes
> running QTL in parallel mode
> calculating QTL's from parallel vectors
> STOP  QTL END
> 6.4u 0.1s 0:06.59 100.0% 0+0k 0+8024io 0pf+0w
> username at computername:~/wiendata/diamond$ cat diamond.inq
> 0 2.20000000000000000000
> 1
> 1 99 1 0
> 4 0 1 2 3
> username at computername:~/wiendata/diamond$ x telnes3
> STOP TELNES3 DONE
> 3.3u 0.0s 0:03.39 99.7% 0+0k 0+96io 0pf+0w
> 
> 
> [1] https://github.com/gsabo/WIEN2k-Patches/tree/master/19.2
> 
> 
> 
> 
> [2] Error when qtlpara_lapw tries to switch to single mode during "x qtl 
> -p -telnes":
> 
> 
> username at computername:~/wiendata/diamond$ cat .machine
> cat: .machine: No such file or directory
> username at computername:~/wiendata/diamond$ run_lapw -p
> ...
> in cycle 11    ETEST: .0001457550000000   CTEST: .0033029
> hup: Command not found.
> STOP  LAPW0 END
> STOP  LAPW1 END
> STOP  LAPW2 END
> STOP  CORE  END
> STOP  MIXER END
> ec cc and fc_conv 1 1 1
> 
>  >   stop
> username at computername:~/wiendata/diamond$ cp $WIENROOT/SRC_templates/case.innes diamond.innes
> username at computername:~/wiendata/diamond$ x qtl -p -telnes
> single: label not found.
> 0.0u 0.0s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
> error: command   /home/username/WIEN2k/qtlpara qtl.def   failed
> 
> 
> [3] Grep difference between qtlpara_lapw and lapw2para_lapw:
> 
> 
> username at computername:~/wiendata/diamond$ grep "single" $WIENROOT/qtlpara_lapw
> testinput .processes single
> username at computername:~/wiendata/diamond$ grep "single" $WIENROOT/lapw2para_lapw
> testinput .processes single
> single:
> echo "running in single mode"
> 
> 
> On 10/20/2020 12:24 PM, Christian Søndergaard Pedersen wrote:
> 
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:  http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
> 

-- 
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/tc_blaha
--------------------------------------------------------------------------


