[Wien] System configuration
Gavin Abo
gsabo at crimson.ua.edu
Fri May 31 08:43:14 CEST 2019
Keep in mind that with 12 cores in total [1], you might see little to no
benefit from using mpi parallel on the single-node computer that you
have.
You probably saw the siteconfig message:
Do you have MPI, ScaLAPACK, ELPA, or FFTW installed and intend to run
finegrained parallel?
This is useful only for BIG cases (50 atoms and more / unit cell)
and your HARDWARE has at least 16 cores (or is a cluster with
Infiniband)
There are also posts in the mailing list archive about the need for a
Gb or Infiniband network for mpi parallel [2-4].
The "command not found" errors that you have are most likely because
mpirun does not load your .bashrc environment settings for WIENROOT when
ssh connects with a non-interactive shell. One solution might be to
comment out the non-interactive lines [5] in your .bashrc, for example:
# If not running interactively, don't do anything
#case $- in
# *i*) ;;
# *) return;;
#esac
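To see why that guard matters, here is a minimal sketch of the same test as a small shell function. The name shell_kind is made up for illustration; the argument is the flag string that bash keeps in $-.

```shell
# Hypothetical helper: classify a shell from its option flags ($-),
# using the same pattern match as the .bashrc guard.
shell_kind() {
  case "$1" in
    *i*) echo "interactive" ;;      # flags contain "i": .bashrc continues
    *)   echo "non-interactive" ;;  # no "i": the guard returns early
  esac
}
```

Calling shell_kind "$-" in a terminal should report interactive. To check what a remote command actually sees, something like ssh localhost 'echo "$-"; echo "$WIENROOT"; command -v lapw1c' can be run by hand. If the flags lack an "i" and WIENROOT prints empty, the guard is the likely cause.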
However, changing the parallel_options file settings in your case should
be the better solution. The file should be located in your WIENROOT
directory.
Sorry, the values I had you set before are those typically used on a
cluster supercomputer [6], which is where mpi parallel is normally used.
For your PC system, I think you should adjust parallel_options in a text
editor (e.g., gedit) to:
if ( ! $?USE_REMOTE ) setenv USE_REMOTE 0
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0
or you could select Configure Parallel Execution like you did before [7]
to have siteconfig set those by specifying:
Shared Memory Architecture? (y/N):y
From the output in your case.dayfile, it looks like your .machines file
is configured [8] for k-point parallel with two cores. Probably it
contains something like:
1:localhost
1:localhost
granularity:1
extrafine:1
To use mpi parallel, you need to change it [9]. An example of .machines
with four cores:
1:localhost:4
granularity:1
extrafine:1
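The two layouts above differ only in the host lines, so they can be generated by a small helper. This is just a sketch for a single node. The function name make_machines and its arguments are assumptions for illustration and are not part of WIEN2k.

```shell
# Hypothetical helper: print a single-node .machines file.
# usage: make_machines kpoint|mpi NCORES [HOST]
make_machines() {
  local mode=$1 n=$2 host=${3:-localhost} i=0
  case "$mode" in
    kpoint)
      # k-point parallel: one "1:host" line per core
      while [ "$i" -lt "$n" ]; do
        echo "1:$host"
        i=$((i + 1))
      done
      ;;
    mpi)
      # mpi parallel: one line with the core count appended
      echo "1:$host:$n"
      ;;
    *)
      echo "usage: make_machines kpoint|mpi NCORES [HOST]" >&2
      return 1
      ;;
  esac
  echo "granularity:1"
  echo "extrafine:1"
}
```

For example, make_machines mpi 4 > .machines in the case directory writes the four-core example shown above, and make_machines kpoint 2 reproduces the two-core k-point layout.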
If you want dstart and lapw0 to run in parallel as well as lapw1 and
lapw2, then you need to adjust .machines further according to the
WIEN2k usersguide.
You'll have to do your own testing on your system to see if k-point or
mpi parallel is faster [10]. Using OMP_NUM_THREADS might also be more
beneficial than mpi with so few nodes and cores [11].
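For that kind of comparison, a rough wall-clock timer is usually enough. The sketch below is only an illustration. The helper name time_cmd is made up, and it measures whole seconds with date, so it only suits runs that take a while.

```shell
# Hypothetical helper: run a command and print its wall-clock time
# in whole seconds. The command's own output is discarded.
time_cmd() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}
```

One way to use it would be to write the k-point .machines, run time_cmd x_lapw lapw1 -p in the case directory, then repeat with the mpi .machines and compare the two numbers. For the OpenMP test, export OMP_NUM_THREADS=2 (or another thread count) before the run.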
[1] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18649.html
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
[3] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17970.html
[5] https://unix.stackexchange.com/questions/257571/why-does-bashrc-check-whether-the-current-shell-is-interactive
[6] https://en.wikipedia.org/wiki/Supercomputer
[7] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18664.html
[8] http://www.wien2k.at/reg_user/faq/ecss_hliu_051012.pdf
[9] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00985.html
[10] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08702.html
[11] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00992.html
On 5/30/2019 12:42 PM, Indranil mal wrote:
> After following the references now getting the following error
> > stop error
>
> grep: *scf1*: No such file or directory
> cp: cannot stat '.in.tmp': No such file or directory
> FERMI - Error
> grep: *scf1*: No such file or directory
> InBi.scf1_1: No such file or directory.
> [1] + Done ( ( $remote $machine[$p] "cd
> $PWD;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop";
> rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop
> ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \%
> .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print
> stderr " )
> [2] - Done ( ( $remote $machine[$p] "cd
> $PWD;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop";
> rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop
> ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \%
> .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print
> stderr " )
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> LAPW0 END
> LAPW0 END