[Wien] error in k-point parallel execution

Stefaan Cottenier Stefaan.Cottenier at fys.kuleuven.ac.be
Thu Jul 29 08:27:19 CEST 2004


> > What I mean is: When you log in on condmat2, do you have the same home
> > directory as on localhost? You can test this with "touch file_of_zero"
> > on localhost, then see if it is also present on condmat2.
>
> I executed the command "touch file_of_zero" on both "localhost" and
> "condmat2" and no output resulted but the shell prompt.

What Torsten meant is this: the command 'touch file_of_zero' creates an
empty file named 'file_of_zero' (so you indeed get no output, only the
prompt). Do 'ls -l file_of_zero' to see whether this file was created on
localhost (it must be there). Then go to condmat2 and do 'ls -l
file_of_zero' again (without running touch first). Does that same file
exist there too? If so, then localhost and condmat2 share the same
directory, as they should. If not, then wien2k will not work in parallel.
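
The test described above can be wrapped in a small shell function. This is
only a sketch: the function name check_shared_home is made up for this
example, and it assumes you can run commands on the other machine without a
password prompt (ssh is used in the usage line, but rsh would work the same
way). It uses a unique marker name instead of 'file_of_zero' so that a
leftover file from an earlier test cannot give a false positive.

```shell
# check_shared_home PREFIX...
# PREFIX is whatever runs a command on the other machine, for example:
#   check_shared_home ssh condmat2
# Returns 0 if a file created in the local home directory is also visible
# on the other machine, 1 if it is not, 2 if the file could not be created.
check_shared_home() {
    marker=".shared_home_test_$$"          # hypothetical marker file name
    ( cd "$HOME" && touch "$marker" ) || return 2
    # A remote shell normally starts in the remote home directory,
    # so a relative file name is enough here.
    if ( cd "$HOME" && "$@" ls -l "$marker" ) >/dev/null 2>&1; then
        rc=0
    else
        rc=1
    fi
    rm -f "$HOME/$marker"                  # clean up the marker again
    return $rc
}
```

Usage: 'check_shared_home ssh condmat2 && echo shared || echo NOT shared'.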

> > What do you get from "echo $SCRATCH" on localhost and condmat2?
>
> I get "./" from both pc's.

If the directories are indeed shared, ./ should be OK. But for this type of
cluster, it is more efficient to set $SCRATCH to /tmp: the large vector
files will then be stored in the local /tmp of each machine, and will not
fill the single disk of localhost (but don't forget to clean the /tmp of
each machine from time to time...)
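
For bash, that could look like the sketch below. The helper
list_old_vectors is a made-up name for this example, and the pattern
'*.vector*' is an assumption about how the vector files are named in your
case directory; check the names before deleting anything. The helper only
lists files, it does not remove them.

```shell
# Put this in the bash start-up file (e.g. ~/.bashrc) on each machine,
# so that every node writes its vector files to its own local /tmp.
export SCRATCH=/tmp

# list_old_vectors DIR DAYS
# Lists files in DIR (not in subdirectories) whose name matches
# '*.vector*' and that were last modified more than DAYS days ago.
list_old_vectors() {
    find "$1" -maxdepth 1 -type f -name '*.vector*' -mtime +"$2" -print
}
```

Usage: 'list_old_vectors "$SCRATCH" 7' on each machine shows the candidates
for cleaning up.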

> > What do you get from "echo $path" on localhost and condmat2?
>
> I get nothing using "echo $path" but using "env" I get:

For your bash, you need 'echo $PATH' (upper case; the lower-case $path is
the csh/tcsh variable).
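
To check on each machine whether a given directory is in $PATH, something
like the following works in bash. The function name path_contains is made
up for this sketch; the directory in the usage line is the one reported by
'which lapw1' in this thread.

```shell
# path_contains DIR
# Returns 0 if DIR is one of the colon-separated entries of $PATH,
# 1 otherwise. Uses only shell built-ins.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

Usage: 'path_contains /home/wien2k/.WIEN_ROOT && echo yes || echo no', run
once on localhost and once on condmat2.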

> > When you log in on condmat2, can you execute lapw1 manually? Try, in a
> > shell: "which lapw1", and if it gives you a response, try "lapw1". What
> > is the result?
>
> I tried "which lapw1" on both PC's and obtained:
> "/home/wien2k/.WIEN_ROOT/lapw1".
>
> I tried "run lapw1" on both PC's in the directory TiC which I had already
> worked with, and obtained:
> ------------
> ERROR: option lapw1 does not exist !

'run lapw1' does not exist. Use 'x lapw1' instead. (But with 'run lapw1' you
effectively invoke the 'run' command, as your output shows.)

Conclusion: it seems you have installed wien2k on each machine separately
and did not properly connect their disks. Therefore serial calculations
work but parallel ones don't. In a good set-up, you install wien2k only
ONCE on a parallel cluster, namely on the disk that is NFS-shared by all
machines (but I can't help you with the technical details of how to do
that).

Stefaan



