[Wien] Cannot run kpoint parallel jobs - only serial.
Robert Nichol
nichol.18 at buckeyemail.osu.edu
Wed May 29 21:58:27 CEST 2013
Dear community.
I have been using WIEN2k for a while now and have spent considerable time trying to get the code to work on the Ohio Supercomputer Center's Oakley cluster (PBS queue system).
If I submit the script for k-point parallelization, lapw2 crashes.
If I submit the script for MPI parallelization, I either get an error in lapw0 or lapw0 hangs indefinitely.
It is worth mentioning that OSC uses MVAPICH2, an MPI implementation developed at OSU and derived from MPICH.
**************************************
[osu6811 at oakley02 ti-c-24]$ module avail mvapich
---------------------------------------- /usr/local/share/lmodfiles/intel-12.1 -----------------------------------------
mvapich2/1.7 (default) mvapich2/1.9 mvapich2/1.9a2-profiling
mvapich2/1.7-r5140 mvapich2/1.9-profiling mvapich2/1.9b
mvapich2/1.7-r5140-hwloc mvapich2/1.9a mvapich2/1.9b-profiling
mvapich2/1.8-r5668 mvapich2/1.9a2 mvapich2/1.9rc1
mvapich2/1.8.1 mvapich2/1.9a2-dbg
---------------------------------------- /usr/local/share/lmodfiles/modulefiles ----------------------------------------
hpctoolkit/5.3.2-r3950-mvapich2-1.7 hpctoolkit/5.3.2-r3950-mvapich2-1.9 (default)
hpctoolkit/5.3.2-r3950-mvapich2-1.8
**************************************
This will be a relatively long email, as I have tried to be thorough. I have attached my job script, .machines file, dayfile, and .bashrc.
I have run serial jobs that give reasonable results.
Below is my script for k-point parallel jobs:
**
Attached
**
Example .machines file generated by kp.pbs:
**
Attached
**
There are 47 k-points in case.klist and 47 cores in the .machines file.
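For readers who cannot open the attachments, a k-point-parallel .machines file of this kind can be produced roughly as sketched below. This is only an illustration, not the attached kp.pbs: the function name make_kpoint_machines is hypothetical, and it assumes the PBS node file ($PBS_NODEFILE) lists one host name per allocated core, so that each core gets one "1:host" line.

```shell
#!/bin/bash
# Hypothetical helper (not the attached kp.pbs): print a k-point-parallel
# .machines file for the hosts listed in a PBS node file.
# Assumption: the node file has one host name per allocated core.
make_kpoint_machines() {
  local nodefile=$1 host
  # One "1:host" line per core: each line runs one k-point job on that host.
  while read -r host || [ -n "$host" ]; do
    if [ -n "$host" ]; then
      echo "1:${host}"
    fi
  done < "$nodefile"
  echo "granularity:1"
  echo "extrafine:1"
}

# Typical use inside a PBS job, from the case directory:
#   make_kpoint_machines "$PBS_NODEFILE" > .machines
```

With 47 allocated cores this yields 47 "1:host" lines, matching the 47 k-points in case.klist.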
The lapw0 and lapw1_1 - lapw1_47 error files are all empty.
**************************************
[osu6811 at oakley02 ti-c-24]$ cat lapw2.error
Error in LAPW2
'LAPW2' - can't open unit: 30
'LAPW2' - filename: ti-c-24.energy_11
** testerror: Error in Parallel LAPW2
[osu6811 at oakley02 ti-c-24]$
**************************************
Not only is case.energy_11 missing, but so is case.energy_12, and the same two files are missing in every subsequent group of 12: case.energy_11, _12, _23, _24, _35, _36, and _47 are all missing.
Contents of case.dayfile:
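The list of missing files can be reproduced with a short check such as the one below. It is a minimal sketch: the function name missing_energy_files is hypothetical, and it assumes the k-point parallel energy files are named case.energy_1 through case.energy_N in the current directory.

```shell
#!/bin/bash
# Hypothetical helper: print the names of the parallel energy files
# <case>.energy_1 .. <case>.energy_<n> that do not exist in the
# current directory.
missing_energy_files() {
  local case_name=$1 n=$2 i
  for ((i = 1; i <= n; i++)); do
    if [ ! -e "${case_name}.energy_${i}" ]; then
      echo "${case_name}.energy_${i}"
    fi
  done
}

# Typical use, from the case directory:
#   missing_energy_files ti-c-24 47
```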
**
Attached
**
It is interesting to note this line: "running lapw0 in single mode".
Apparently lapw0 is not run in parallel, and we may owe the successful termination of lapw0 to that fact.
Information on the login node
**************************************
[osu6811 at oakley01 ~]$ lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release: 6.3
Codename: Santiago
[osu6811 at oakley01 ~]$ uname -a
Linux oakley01.osc.edu 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[osu6811 at oakley01 ~]$ cat /proc/version
Linux version 2.6.32-220.7.1.el6.x86_64 (mockbuild at x86-002.build.bos.redhat.com) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Fri Feb 10 15:22:22 EST 2012
[osu6811 at oakley01 ~]$ ifort -v
ifort version 12.1.4
[osu6811 at oakley01 ~]$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
[osu6811 at oakley01 ~]$
**************************************
I am running WIEN2k version 12.1.
My .bash_profile:
**************************************
[osu6811 at oakley02 ~]$ cat .bash_profile
[ -f .bashrc ] && source .bashrc
[osu6811 at oakley02 ~]$
**************************************
I have attached my .bashrc.
Thank you for taking the time. I apologize in advance for the length of this email.
Appreciatively,
Robert Nichol
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kp.pbs
Type: application/octet-stream
Size: 576 bytes
Desc: kp.pbs
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment.dll>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ti-c-24.dayfile
Type: application/octet-stream
Size: 4779 bytes
Desc: ti-c-24.dayfile
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment-0001.dll>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bashrc.txt
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: machines.txt
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment-0001.txt>