[Wien] Error in parallel lapw1
ROBERTO LUIS IGLESIAS PASTRANA
roberto at uniovi.es
Thu Oct 16 10:13:14 CEST 2008
Hello all!
I'm trying to run a parallel calculation on a test system that needs spin polarization. The machine is a 50 dual-node Xeon processor cluster. The Wien2k version is 08_3. I've searched the mailing list for problems similar to mine, but the few possible solutions I found did not help.
I use a small PBS script, based on the one appearing in the FAQ. If you need to take a look at it, let me know. The error is always:
'INILPW' - can't open unit: 18
'INILPW' - filename: test.vsp
'INILPW' - status: old form: formatted
'LAPW1' - INILPW aborted unsuccessfully.
The files test.vspup, test.vspdn and test.vsp_st are there and not empty. I don't see anything peculiar in them.
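For anyone who wants to repeat that check, here is a minimal sketch in Python. It only lists the size of each potential file next to the name that appears in the error message. The function name report_potential_files is hypothetical and not part of Wien2k, and the list of suffixes is just the set of files mentioned in this post.

```python
from pathlib import Path


def report_potential_files(case, directory="."):
    """Map each <case>.<suffix> file name to its size in bytes, or None if absent.

    The suffixes are only those mentioned in this post: the file named in the
    INILPW error (vsp) and the three files found in the directory.
    """
    suffixes = ["vsp", "vspup", "vspdn", "vsp_st"]
    report = {}
    for suffix in suffixes:
        path = Path(directory) / f"{case}.{suffix}"
        report[path.name] = path.stat().st_size if path.is_file() else None
    return report


if __name__ == "__main__":
    # "test" is the case name used in this post; run from the case directory.
    for name, size in report_potential_files("test").items():
        status = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {status}")
```

Running it inside the case directory on the node where lapw1 fails would show whether that node sees the same files as the front end.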
There is a post on the mailing list stating that this problem was solved with a new .machines file, since the original one had only one node:
http://zeus.theochem.tuwien.ac.at/pipermail/wien/2008-September/011427.html
This is not my case, I think. My .machines file is:
1:yed16:1
1:yed16:1
1:yed15:1
1:yed15:1
1:yed14:1
1:yed14:1
1:yed13:1
1:yed13:1
granularity:1
extrafine:1
for which testpara_lapw gives:
[iglesias@jaula02 test]$ testpara_lapw
#####################################################
# TESTPARA #
#####################################################
Test: LAPW1 in parallel mode (using .machines)
Granularity set to 1
Extrafine set
klist: 500
machines: yed16 yed16 yed15 yed15 yed14 yed14 yed13 yed13
procs: 8
weigh(old): 1 1 1 1 1 1 1 1
sumw: 8
granularity: 1
weigh(new): 62 62 62 62 62 62 62 62
Distribution of k-point (under ideal conditions)
will be:
1 : yed16(62) 62k
2 : yed16(62) 62k
3 : yed15(62) 62k
4 : yed15(62) 62k
5 : yed14(62) 62k
6 : yed14(62) 62k
7 : yed13(62) 62k
8 : yed13(62) 62k
9 : yed16(62) 1k
10 : yed16(62) 1k
11 : yed15(62) 1k
12 : yed15(62) 1k
It would therefore process 62 k-points in each of the 8 parallel jobs (496 k-points in total) and then the remaining 4, one per job, whenever the nodes are free.
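The arithmetic behind that distribution can be reproduced with a short Python sketch. This is only an illustration that matches the numbers printed by testpara_lapw above; it is not the actual logic of the Wien2k scripts, and the function name split_kpoints is hypothetical. It assumes equal weights and that leftover k-points are handed out one at a time, following the order of the machine list.

```python
def split_kpoints(n_kpoints, machines):
    """Split k-points evenly over the machine entries.

    Returns (base, extra): base holds one (host, count) job per machine entry,
    extra holds single-k-point jobs for whatever is left over.
    Assumption: equal weights, leftovers assigned in list order.
    """
    per_job, leftover = divmod(n_kpoints, len(machines))
    base = [(host, per_job) for host in machines]
    extra = [(machines[i % len(machines)], 1) for i in range(leftover)]
    return base, extra


# Machine list and k-point count taken from the testpara_lapw output above.
machines = ["yed16", "yed16", "yed15", "yed15",
            "yed14", "yed14", "yed13", "yed13"]
base, extra = split_kpoints(500, machines)

for index, (host, count) in enumerate(base + extra, start=1):
    print(f"{index} : {host} {count}k")
```

With 500 k-points and 8 entries this gives 8 jobs of 62 k-points plus 4 single-k-point jobs, 12 jobs in all.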
Without the script, that is, issuing runsp_lapw -p from a terminal window, there are no problems. That should make use of the same .machines file shown above. With the script, every lapw1 job distributed to the nodes fails with the error shown above. In that case:
[iglesias@jaula02 test]$ testpara1_lapw
#####################################################
# TESTPARA1 #
#####################################################
Wed Oct 15 18:53:22 CEST 2008
lapw1para exited due to an ERROR
Check *.error files
If I try from the terminal window, the output is simply:
[iglesias@jaula02 test]$ lapw1 -p lapw1.def
LAPW1 - Error
On doing:
x lapw1 -p
it crashes again on each node, successively.
Any suggestions? Thanks a lot in advance for your help!
Cheers
Roberto