[Wien] qtl: error reading parallel vectors
Laurence Marks
laurence.marks at gmail.com
Sun Oct 25 02:46:16 CEST 2020
I think something is wrong with your job submission. I suggest that you
talk to your sysadmin, as there are too many ways your calculations could
have gone wrong; it may take weeks or more of people on the list guessing.
It should be possible to allocate nodes interactively and have them listed
in .machines. Your report that simple commands fail with "pbsssh: command
not found" is very odd. The command "x lapw0 -p" is a very basic one, and
if it fails for multiple cores, something is very wrong.
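
As a rough illustration of the interactive route, the sketch below writes a k-point-parallel .machines file in the same layout as the diamond example quoted further down. The helper name make_machines is hypothetical and not part of WIEN2k, and the salloc/scontrol usage in the comments assumes a SLURM cluster; an MPI-parallel setup would need a different .machines layout.

```shell
#!/bin/bash
# Sketch only: make_machines is a hypothetical helper, not a WIEN2k tool.
# It reads host names from stdin (one per line) and prints a k-point
# parallel .machines file with one "1:<host>" line per host.
make_machines() {
    local host
    while read -r host; do
        if [ -n "$host" ]; then
            echo "1:$host"
        fi
    done
    echo "granularity:1"
    echo "extrafine:1"
}

# Assumed usage inside an interactive SLURM allocation (e.g. from salloc):
#   scontrol show hostnames "$SLURM_JOB_NODELIST" | make_machines > .machines
#   x lapw0 -p
```

If "x lapw0 -p" still fails from inside such an allocation, that narrows the problem to the installation or environment rather than the submit script.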
---
Prof Laurence Marks
"Research is to see what everyone else has seen, and to think what nobody
else has thought", Albert Szent-Györgyi
www.numis.northwestern.edu
On Sat, Oct 24, 2020, 15:30 Christian Søndergaard Pedersen <chrsop at dtu.dk>
wrote:
> Hello Gavin
>
>
> Thanks for your reply, and apologies for my tardiness.
>
>
> [1] All my calculations are run in MPI-parallel on our HPC cluster. I
> cannot execute any 'x lapw[0,1,2] -p' command in the terminal (on the
> cluster login node); this results in 'pbsssh: command not found'. However,
> submitting via the SLURM workload manager works fine. In all my submit
> scripts, I specify 'setenv SCRATCH /scratch/$USER', which is the proper
> location of scratch storage on our HPC cluster.
>
>
> [2] Without having tried your example for diamond, I can report that
> 'run_lapw -p' followed by 'x qtl -p -telnes' works without problems for a
> single cell of Vanadium dioxide. However, for other systems I get the error
> I specified. The other systems (1) are larger, and (2) use two CPUs
> instead of a single CPU (the .machines files are modified accordingly).
>
> Checking the qtl.def file for the calculation that _did_ work, I can see
> that the line specifying '/scratch/chrsop/VO2.vectordn' is _also_ present
> here, so this is not to blame. This leaves me baffled as to what the error
> can be - as far as I can tell, I am trying to perform the exact same
> calculation for different systems. I thought maybe insufficient scratch
> storage could be to blame, but this would most likely show up in the
> 'run_lapw' cycles (I believe).
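
One way to test the scratch hypothesis directly is to check whether the vector files named in a def file exist and are non-empty. The sketch below is only an illustration: check_vector_files is a hypothetical helper, and it assumes each def-file line has the form unit,'filename','status','form',recl, so that the first single-quoted field is the path. Which def file is the relevant one in a parallel run is left open here.

```shell
#!/bin/bash
# Sketch only: check_vector_files is a hypothetical helper, not a WIEN2k tool.
# Assumption: every def-file line looks like
#   unit,'path', 'status', 'form', recl
# so the first single-quoted field is the file name.
check_vector_files() {
    local def="$1" path
    # Extract the first quoted field of each line, keep only vector files.
    sed -n "s/^[^']*'\([^']*\)'.*/\1/p" "$def" | grep vector |
    while read -r path; do
        if [ ! -e "$path" ]; then
            echo "MISSING $path"
        elif [ ! -s "$path" ]; then
            echo "EMPTY $path"
        else
            echo "OK $path"
        fi
    done
}

# Assumed usage, from the case directory:
#   check_vector_files qtl.def
```

If /scratch is local to each compute node, the result is only meaningful when the check runs on the node that wrote the files, for example from within the batch job itself.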
>
>
> [3] I am posting here the difference between qtlpara and lapw2para:
>
> $ grep "single" $WIENROOT/qtlpara_lapw
> testinput .processes single
> $ grep "single" $WIENROOT/lapw2para_lapw
> testinput .processes single
> single:
> echo "running in single mode"
>
> ... if this is wrong, I kindly request advice on how to fix it, so I can
> pass it on to our software maintenance guy. If there's anything else I can
> try please let me know.
>
> Best regards
> Christian
>
>
> ------------------------------
> *From:* Wien <wien-bounces at zeus.theochem.tuwien.ac.at> on behalf of Gavin
> Abo <gsabo at crimson.ua.edu>
> *Sent:* 21 October 2020 07:02:01
> *To:* wien at zeus.theochem.tuwien.ac.at
> *Subject:* Re: [Wien] qtl: error reading parallel vectors
>
>
> I'm not sure about the physics of the following WIEN2k 19.2 parallel
> calculation (with all patches at [1] applied), but mechanically the "x qtl
> -p -telnes" seems to have run without error.
>
>
> I typically have SCRATCH in my .bashrc set to "./" but used another
> location, "/home/username/wiendata/scratch", as seen below. Does a simple
> k-point parallel calculation like the one below work on your system? I
> haven't tried MPI parallel yet. On the other hand, I have noticed a
> possible issue: if one forgets to set up a .machines file and tries to
> run a parallel calculation, qtlpara_lapw seems to fail to switch over
> to the serial calculation mode, as shown under [2] below. Comparing, for
> example, lapw2para_lapw and qtlpara_lapw, as illustrated by [3] below,
> suggests that qtlpara_lapw may be missing some additional code needed
> to get that to work.
>
>
> username at computername:~/wiendata/diamond$ grep SCRATCH ~/.bashrc
> export SCRATCH=/home/username/wiendata/scratch
> username at computername:~/wiendata/diamond$ ls
> diamond.struct
> username at computername:~/wiendata/diamond$ init_lapw -b
> ...
> init_lapw finished ok
> username at computername:~/wiendata/diamond$ cat .machines
> 1:localhost
> 1:localhost
> granularity:1
> extrafine:1
> username at computername:~/wiendata/diamond$ run_lapw -p
> ...
> in cycle 11 ETEST: .0001457550000000 CTEST: .0033029
> hup: Command not found.
> STOP LAPW0 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW2 - FERMI; weights written
> STOP LAPW2 END
> STOP LAPW2 END
> STOP SUMPARA END
> STOP CORE END
> STOP MIXER END
> ec cc and fc_conv 1 1 1
>
> > stop
> username at computername:~/wiendata/diamond$ cp $WIENROOT/SRC_templates/case.innes diamond.innes
> username at computername:~/wiendata/diamond$ x qtl -p -telnes
> running QTL in parallel mode
> calculating QTL's from parallel vectors
> STOP QTL END
> 6.4u 0.1s 0:06.59 100.0% 0+0k 0+8024io 0pf+0w
> username at computername:~/wiendata/diamond$ cat diamond.inq
> 0 2.20000000000000000000
> 1
> 1 99 1 0
> 4 0 1 2 3
> username at computername:~/wiendata/diamond$ x telnes3
> STOP TELNES3 DONE
> 3.3u 0.0s 0:03.39 99.7% 0+0k 0+96io 0pf+0w
>
>
> [1] https://github.com/gsabo/WIEN2k-Patches/tree/master/19.2
>
>
> [2] Error when qtlpara_lapw tries to switch to single mode during "x qtl
> -p -telnes":
>
>
> username at computername:~/wiendata/diamond$ cat .machine
> cat: .machine: No such file or directory
> username at computername:~/wiendata/diamond$ run_lapw -p
> ...
> in cycle 11 ETEST: .0001457550000000 CTEST: .0033029
> hup: Command not found.
> STOP LAPW0 END
> STOP LAPW1 END
> STOP LAPW2 END
> STOP CORE END
> STOP MIXER END
> ec cc and fc_conv 1 1 1
>
> > stop
> username at computername:~/wiendata/diamond$ cp $WIENROOT/SRC_templates/case.innes diamond.innes
> username at computername:~/wiendata/diamond$ x qtl -p -telnes
> single: label not found.
> 0.0u 0.0s 0:00.01 0.0% 0+0k 0+0io 0pf+0w
> error: command /home/username/WIEN2k/qtlpara qtl.def failed
>
>
> [3] Grep difference between qtlpara_lapw and lapw2para_lapw:
>
>
> username at computername:~/wiendata/diamond$ grep "single" $WIENROOT/qtlpara_lapw
> testinput .processes single
> username at computername:~/wiendata/diamond$ grep "single" $WIENROOT/lapw2para_lapw
> testinput .processes single
> single:
> echo "running in single mode"
>
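
The grep output above shows qtlpara_lapw referring to "single" without a matching "single:" line, which is consistent with the csh message "single: label not found." in [2]. The sketch below automates that comparison; has_label is a hypothetical helper and not part of WIEN2k, and it only detects whether the label is present. It does not attempt to patch the script.

```shell
#!/bin/bash
# Sketch only: has_label is a hypothetical helper, not a WIEN2k tool.
# It succeeds if the given csh script contains a line consisting of
# "<label>:" (optionally surrounded by whitespace), i.e. a goto target.
has_label() {
    local script="$1" label="$2"
    grep -Eq "^[[:space:]]*${label}:[[:space:]]*$" "$script"
}

# Assumed usage:
#   for s in "$WIENROOT"/lapw2para_lapw "$WIENROOT"/qtlpara_lapw; do
#       if has_label "$s" single; then
#           echo "$s: defines single:"
#       else
#           echo "$s: no single: label"
#       fi
#   done
```

A script that reports no label would be expected to fail as in [2] whenever it tries to fall back to serial mode, so the result may be worth passing on to whoever maintains the installation.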
>
> On 10/20/2020 12:24 PM, Christian Søndergaard Pedersen wrote:
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
>
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>