[Wien] System configuration

Gavin Abo gsabo at crimson.ua.edu
Fri May 31 08:43:14 CEST 2019


Keep in mind that with 12 total cores [1], you might see little to no 
benefit from using mpi parallel with the computer (single node) that you 
have.

You probably saw the siteconfig message:

Do you have MPI, ScaLAPACK, ELPA, or FFTW installed and intend to run
    finegrained parallel?

    This is useful only for BIG cases (50 atoms and more / unit cell)
    and your HARDWARE has at least 16 cores (or is a cluster with Infiniband)

There are also posts in the mailing list archive about the need for a 
Gb or Infiniband network for mpi parallel [2-4].

The "command not found" errors that you have are most likely because 
your .bashrc environment settings for WIENROOT are not loaded when ssh 
connects with a non-interactive shell. One solution might be to comment 
out the non-interactive lines [5] in your .bashrc, for example:

# If not running interactively, don't do anything
#case $- in
#    *i*) ;;
#      *) return;;
#esac
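
To check whether this is really the cause, it can help to look at what 
a non-interactive shell actually sees. The following is only a sketch: 
check_wien_env is a hypothetical helper name (not part of WIEN2k), and 
lapw1c is used as the default only because it is the program named in 
your error output.

```shell
# Hypothetical helper (not part of WIEN2k): report whether the current
# shell has WIENROOT set and can find a WIEN2k program on its PATH.
check_wien_env() {
    prog="${1:-lapw1c}"   # program to look for; default taken from the error output
    if [ -z "${WIENROOT:-}" ]; then
        echo "WIENROOT is not set"
        return 1
    fi
    if ! command -v "$prog" >/dev/null 2>&1; then
        echo "$prog not found in PATH (WIENROOT=$WIENROOT)"
        return 1
    fi
    echo "$prog found (WIENROOT=$WIENROOT)"
}
```

Run it once in your normal terminal and once through ssh. Assuming 
bash on both ends, something like 
ssh localhost "$(declare -f check_wien_env); check_wien_env" 
should work for the second case. If only the ssh run reports a problem, 
the non-interactive guard in .bashrc is the likely culprit.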

However, changing the parallel_options file settings in your case should 
be the better solution.  The file should be located in your WIENROOT 
directory.

Sorry, I had you set the values to those that are typically used on a 
cluster or supercomputer [6], where mpi parallel is normally run.

For your PC system, I think you should adjust parallel_options in a text 
editor (e.g., gedit) to:

if ( ! $?USE_REMOTE ) setenv USE_REMOTE 0
if ( ! $?MPI_REMOTE ) setenv MPI_REMOTE 0

or you could select Configure Parallel Execution like you did before [7] 
to have siteconfig set those by specifying:

Shared Memory Architecture? (y/N):y
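
Whichever route you take, the result can be checked by listing the two 
settings in parallel_options. A minimal sketch, where 
show_remote_settings is a hypothetical helper name and the default path 
$WIENROOT/parallel_options follows from the file location mentioned 
above:

```shell
# Hypothetical helper: print the USE_REMOTE and MPI_REMOTE lines of a
# parallel_options file (default: $WIENROOT/parallel_options).
show_remote_settings() {
    file="${1:-$WIENROOT/parallel_options}"
    grep -E 'setenv (USE_REMOTE|MPI_REMOTE) ' "$file"
}
```

For the shared memory settings given above, both printed lines should 
end in 0.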

From the output in your case.dayfile, it looks like your .machines file 
is configured [8] for k-point parallel with two cores.  Probably it 
contains something like:

1:localhost
1:localhost
granularity:1
extrafine:1

To use mpi parallel, you need to change it [9].  An example of .machines 
with four cores:

1:localhost:4
granularity:1
extrafine:1
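
The two layouts differ only in how the cores are listed. As an 
illustration, a small shell function can write either form. Note that 
make_machines is a hypothetical helper, not a WIEN2k tool, and it only 
reproduces the two single-host patterns shown above:

```shell
# Hypothetical helper: print a single-host .machines file in one of the
# two layouts shown above.
make_machines() {
    mode="$1"     # "kpoint" or "mpi"
    ncores="$2"   # number of cores on localhost to use
    case "$mode" in
        kpoint)
            # one "1:localhost" line per core: k-point parallel
            i=0
            while [ "$i" -lt "$ncores" ]; do
                echo "1:localhost"
                i=$((i + 1))
            done
            ;;
        mpi)
            # a single "1:localhost:N" line: one mpi job on N cores
            echo "1:localhost:$ncores"
            ;;
        *)
            echo "usage: make_machines kpoint|mpi NCORES" >&2
            return 1
            ;;
    esac
    echo "granularity:1"
    echo "extrafine:1"
}
```

For example, "make_machines mpi 4 > .machines" in the case directory 
would write the four-core file above.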

If you want dstart and lapw0 to be parallel too in addition to lapw1 and 
lapw2, then you need to adjust the .machines further according to the 
WIEN2k usersguide.

You'll have to do your own testing on your system to see whether k-point 
or mpi parallel is faster [10].  With so few nodes and processors, using 
OMP_NUM_THREADS might also be more beneficial than mpi [11].
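
For the timing comparison, one simple approach is to run the same 
calculation once per setup and note the wall-clock time. The sketch 
below assumes that run_lapw -p is how you start the parallel SCF cycle 
and that the thread counts are only example values; time_cmd is a 
hypothetical helper name.

```shell
# Hypothetical helper: run a command and report its wall-clock time
# in whole seconds together with its exit status.
time_cmd() {
    label="$1"; shift
    start=$(date +%s)
    if "$@"; then status=0; else status=$?; fi
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit status $status)"
}

# Example use (commented out; assumes a prepared case directory):
# time_cmd "1 thread"  env OMP_NUM_THREADS=1 run_lapw -p
# time_cmd "2 threads" env OMP_NUM_THREADS=2 run_lapw -p
```

Swap the .machines file between runs to compare k-point and mpi 
parallel in the same way.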

[1] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18649.html
[2] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg13632.html
[3] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg09334.html
[4] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg17970.html
[5] https://unix.stackexchange.com/questions/257571/why-does-bashrc-check-whether-the-current-shell-is-interactive
[6] https://en.wikipedia.org/wiki/Supercomputer
[7] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg18664.html
[8] http://www.wien2k.at/reg_user/faq/ecss_hliu_051012.pdf
[9] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00985.html
[10] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg08702.html
[11] https://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/msg00992.html


On 5/30/2019 12:42 PM, Indranil mal wrote:
> After following the references now getting the following error
> >   stop error
>
> grep: *scf1*: No such file or directory
> cp: cannot stat '.in.tmp': No such file or directory
> FERMI - Error
> grep: *scf1*: No such file or directory
> InBi.scf1_1: No such file or directory.
> [1]  + Done                          ( ( $remote $machine[$p] "cd 
> $PWD;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; 
> rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop 
> ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% 
> .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print 
> stderr " )
> [2]  - Done                          ( ( $remote $machine[$p] "cd 
> $PWD;$t $taskset0 $exe ${def}_$loop.def ;fixerror_lapw ${def}_$loop"; 
> rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop; if ( -f .stdout1_$loop 
> ) bashtime2csh.pl_lapw .stdout1_$loop > .temp1_$loop; grep \% 
> .temp1_$loop >> .time1_$loop; grep -v \% .temp1_$loop | perl -e "print 
> stderr " )
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1c: command not found
>  LAPW0 END
>  LAPW0 END