[Wien] Cannot run kpoint parallel jobs - only serial.

Robert Nichol nichol.18 at buckeyemail.osu.edu
Wed May 29 21:58:27 CEST 2013


Dear community,

I've been using WIEN for a while now and have spent considerable time trying to get the code to work on the Ohio Supercomputer Center's Oakley cluster (PBS queue system).
If I submit the script for k-point parallelization, lapw2 crashes.
If I submit the script for MPI parallelization, either lapw0 exits with an error or it hangs indefinitely.
It is worth mentioning that OSC uses MVAPICH2, which is an OSU creation derived from MPICH.

**************************************
[osu6811 at oakley02 ti-c-24]$ module avail mvapich

---------------------------------------- /usr/local/share/lmodfiles/intel-12.1 -----------------------------------------
  mvapich2/1.7 (default)      mvapich2/1.9              mvapich2/1.9a2-profiling
  mvapich2/1.7-r5140          mvapich2/1.9-profiling    mvapich2/1.9b
  mvapich2/1.7-r5140-hwloc    mvapich2/1.9a             mvapich2/1.9b-profiling
  mvapich2/1.8-r5668          mvapich2/1.9a2            mvapich2/1.9rc1
  mvapich2/1.8.1              mvapich2/1.9a2-dbg

---------------------------------------- /usr/local/share/lmodfiles/modulefiles ----------------------------------------
  hpctoolkit/5.3.2-r3950-mvapich2-1.7    hpctoolkit/5.3.2-r3950-mvapich2-1.9 (default)
  hpctoolkit/5.3.2-r3950-mvapich2-1.8
**************************************

This will be a relatively long email, as I have tried to be thorough. I have attached my

I have run serial jobs that give reasonable results.

Below is my script for k-point parallel jobs:
**
Attached
**

Example .machines file generated by kp.pbs:
**
Attached
**

There are 47 k-points in case.klist and 47 cores in the .machines file.
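Since kp.pbs and the generated .machines file are only attached, here is a minimal Python sketch of what a k-point-parallel .machines file typically looks like, assuming the layout described in the WIEN2k user's guide: one "1:hostname" line per core, plus "granularity:1" and "extrafine:1". It reads hostnames from $PBS_NODEFILE. The function names kpoint_machines and read_nodefile are hypothetical (not part of WIEN2k), and the real kp.pbs may build the file differently.

```python
import os


def kpoint_machines(hosts, granularity=1, extrafine=True):
    """Return the text of a k-point-parallel .machines file (assumed layout).

    `hosts` holds one hostname per allocated core, in the order the batch
    system lists them (for PBS, the lines of $PBS_NODEFILE). Every core gets
    its own "1:host" line, i.e. weight 1 for each parallel lapw1/lapw2 job.
    """
    lines = [f"granularity:{granularity}"]
    lines += [f"1:{host}" for host in hosts]
    if extrafine:
        # Assumed meaning: distribute leftover k-points one at a time.
        lines.append("extrafine:1")
    return "\n".join(lines) + "\n"


def read_nodefile(path):
    """Read a PBS nodefile (one hostname per core), skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:  # only meaningful inside a running PBS job
        with open(".machines", "w") as fh:
            fh.write(kpoint_machines(read_nodefile(nodefile)))
```

With 47 cores allocated this produces 47 "1:host" lines, matching the 47 k-points, so each parallel job should receive one k-point.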

The lapw0 and lapw1_1 - lapw1_47 error files are all empty.
**************************************
[osu6811 at oakley02 ti-c-24]$ cat lapw2.error
Error in LAPW2
 'LAPW2' - can't open unit: 30
 'LAPW2' -        filename: ti-c-24.energy_11
**  testerror: Error in Parallel LAPW2
[osu6811 at oakley02 ti-c-24]$
**************************************
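To collect every non-empty error file after a failed run (such as lapw2.error above), a small Python helper can scan the case directory. This is a hypothetical sketch that does the same job as inspecting each *.error file by hand; nonempty_error_files is not a WIEN2k tool.

```python
from pathlib import Path


def nonempty_error_files(directory="."):
    """Return the names of *.error files in `directory` that contain
    anything other than whitespace, sorted by name."""
    return [
        path.name
        for path in sorted(Path(directory).glob("*.error"))
        if path.read_text(errors="replace").strip()
    ]


if __name__ == "__main__":
    # Print every non-empty error file in the current case directory.
    for name in nonempty_error_files():
        print(f"==> {name} <==")
        print(Path(name).read_text(errors="replace"))
```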



Not only is case.energy_11 missing, but so is case.energy_12, along with every 12th case.energy_N file after each of those: _11, _12, _23, _24, _35, _36, and _47 are all missing.
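To list exactly which case.energy_N files lapw2 will fail to open, a short Python sketch can compare the directory listing against the expected indices 1..N, where N is the number of parallel lapw1 jobs (47 here). It assumes the non-spin-polarized naming case.energy_N seen in the lapw2.error message; missing_energy_indices is a hypothetical helper.

```python
import os
import re


def missing_energy_indices(filenames, case, n):
    """Return the indices 1..n for which '<case>.energy_<i>' does not
    appear in `filenames`."""
    pattern = re.compile(rf"^{re.escape(case)}\.energy_(\d+)$")
    present = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            present.add(int(match.group(1)))
    return [i for i in range(1, n + 1) if i not in present]


if __name__ == "__main__":
    # Hypothetical invocation for this case: 47 parallel jobs.
    print(missing_energy_indices(os.listdir("."), "ti-c-24", 47))
```

Run in the ti-c-24 directory, this should print [11, 12, 23, 24, 35, 36, 47] if exactly those files are absent.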



Contents of the case.dayfile:
**
Attached
**

Note this line in particular: "running lapw0 in single mode".
Apparently lapw0 is not run in parallel, and we may owe the successful termination of lapw0 to that fact.
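My understanding (an assumption worth checking against the WIEN2k 12.1 user's guide) is that lapw0 is started in parallel only when .machines contains a line of the form lapw0:host:ncores; without one, lapw0 runs as a single process, which would match the "running lapw0 in single mode" message. The Python sketch below reports whether a .machines file has such a line; lapw0_hosts is a hypothetical helper.

```python
from pathlib import Path


def lapw0_hosts(machines_text):
    """Return the host specs listed on the 'lapw0:' line of a .machines
    file, or None when the file has no such line."""
    for raw in machines_text.splitlines():
        line = raw.strip()
        # Commented-out lines start with '#', so they never match here.
        if line.startswith("lapw0:"):
            return line[len("lapw0:"):].split()
    return None


if __name__ == "__main__":
    machines = Path(".machines")
    if machines.exists():
        hosts = lapw0_hosts(machines.read_text())
        if hosts is None:
            print("no lapw0: line, so lapw0 is expected to run in single mode")
        else:
            print("lapw0 hosts:", hosts)
```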




Information on the login node
**************************************
[osu6811 at oakley01 ~]$ lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release:        6.3
Codename:       Santiago
[osu6811 at oakley01 ~]$ uname -a
Linux oakley01.osc.edu 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[osu6811 at oakley01 ~]$ cat /proc/version
Linux version 2.6.32-220.7.1.el6.x86_64 (mockbuild at x86-002.build.bos.redhat.com) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Fri Feb 10 15:22:22 EST 2012
[osu6811 at oakley01 ~]$ ifort -v
ifort version 12.1.4
[osu6811 at oakley01 ~]$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
[osu6811 at oakley01 ~]$
**************************************



I am running WIEN2k version 12.1




My .bash_profile
**************************************
[osu6811 at oakley02 ~]$ cat .bash_profile
[ -f .bashrc ] && source .bashrc
[osu6811 at oakley02 ~]$
**************************************

I have attached my .bashrc.

Thank you for taking the time. I apologize in advance for the length of this email.


Appreciatively,

Robert Nichol


-------------- next part --------------
A non-text attachment was scrubbed...
Name: kp.pbs
Type: application/octet-stream
Size: 576 bytes
Desc: kp.pbs
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment.dll>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ti-c-24.dayfile
Type: application/octet-stream
Size: 4779 bytes
Desc: ti-c-24.dayfile
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment-0001.dll>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bashrc.txt
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: machines.txt
URL: <http://zeus.theochem.tuwien.ac.at/pipermail/wien/attachments/20130529/bbd35ca7/attachment-0001.txt>

