[Wien] parallel wien2k

Laurence Marks L-marks at northwestern.edu
Tue Mar 2 13:39:01 CET 2010


Similar questions about mpi compilations have come up before. We need
more information to be able to help you, and there are some general
things that you should check as well, mainly (though there may be more):
a) How was mpi (here mpich2) compiled, and what is its version?
b) Are all the libraries being picked up on the child nodes (use ldd;
see the check sketched below)?
c) Have you checked your link line against the online mkl link line advisor
(http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/)
-- I am not sure that you have.
d) What is the error message?
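
For b), a minimal check would be something like the following (nx58 is
just taken from your .machines example, and $WIENROOT is assumed to
point to the WIEN2k installation directory as seen from the nodes):

  ssh nx58 "ldd $WIENROOT/lapw0_mpi"

Any library reported as "not found" means that node cannot resolve one
of the mkl/fftw/mpi shared libraries, typically because LD_LIBRARY_PATH
is not set for non-interactive shells on the child nodes.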

On Mon, Mar 1, 2010 at 8:05 PM, Zhiyong Zhang <zyzhang at stanford.edu> wrote:
> Dear All,
>
> I think I have a problem with the compiler options/libraries for the parallel wien2k build. I can run lapw0/lapw1 in k-point parallel mode, but not the mpi versions (lapw0/1_mpi). Here are the options and libraries with which I built wien2k:
>
> RP_LIB(SCALAPACK+PBLAS): -lmkl_scalapack_lp64 -lmkl_solver_lp64_sequential -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_lp64 -Wl,--end-group -lpthread -L/home/zzhang/fftw/fftw-2.1.5/lib -lfftw_mpi -lfftw
>
> FPOPT(par.comp.options): -FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
>
> I used fftw-2.1.5 for the parallel fftw.
>
> Does anybody see a problem with the options I used?
>
> Does anybody have a set of compiler options and libraries for a working lapw0_mpi? I used mpich2 to compile the code and the architecture is Intel x86_64.
>
> Thanks in advance!
>
> Zhiyong
>
> ----- Original Message -----
> From: "Zhiyong Zhang" <zyzhang at stanford.edu>
> To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> Sent: Wednesday, February 24, 2010 5:16:23 PM GMT -08:00 US/Canada Pacific
> Subject: Re: [Wien] parallel wien2k
>
> Dear Laurence and All,
>
> Thank you very much for the information. It has been very helpful in clarifying some of the issues. Based on your input, I believe I was able to prepare the .machines file in the correct format:
>
> .machines:
> #
> lapw0: nx59:2 nx58:2
> 1:nx59
> 1:nx59
> 1:nx58
> 1:nx58
> granularity:1
> extrafine:1
>
> and
>
> .machine0
>
> nx59
> nx59
> nx58
> nx58
>
> However, I still got the same problem in TiC.vns, which presumably resulted in the crash in lapw1para.
>
> Are there any places in the output files where I can look for clues about the problem? For the same calculation, I can run lapw0 in serial mode and lapw1 in k-point parallel mode successfully.
>
> Thanks in advance,
> Zhiyong
> ----- Original Message -----
> From: "Laurence Marks" <L-marks at northwestern.edu>
> To: "A Mailing list for WIEN2k users" <wien at zeus.theochem.tuwien.ac.at>
> Sent: Tuesday, February 23, 2010 4:55:19 AM GMT -08:00 US/Canada Pacific
> Subject: Re: [Wien] parallel wien2k
>
> Several points:
>
> 1) You only use "-fc X" for a structure with variable atomic
> positions, and the TiC example has none, so it will report that there
> are no forces (but this should not stop the calculation).
>
> 2) The "NaN" in your case.vns file means that something went wrong in
> the lapw0 call, which is why lapw1 is crashing. It is safer to delete
> the case.vns file.
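>
> (A quick way to confirm this, assuming the case is called TiC as in
> your earlier mail, is something like
>
> grep -c NaN TiC.vns
>
> in the case directory; if the count is non-zero, remove TiC.vns and
> rerun lapw0 before trying lapw1 again.)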
>
> 3) Are you using single-core or multi-core CPUs? The normal format for
> a parallel lapw0 call (using mpi) is
>
> lapw0: nx1:2 nx62:2 -- please note the space after the ":"; it often matters
>
> To do this you have to have mpi installed and have compiled lapw0_mpi.
> If you do not have it you can use
>
> lapw0: nx1
>
> This will run a serial lapw0 on nx1.
>
> 4) All the above assumes that you have local control of which nodes
> you can use, rather than this being controlled by a queuing system
> such as pbs. If you are using pbs or similar then you have to have a
> script to generate the .machines file (see the sketch below), since
> you do not know in advance which machines you will be given (unless
> you are running interactively).
>
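> As a rough illustration only (this assumes a PBS-style job where
> $PBS_NODEFILE lists the allocated nodes, one line per core; adapt it
> to whatever your queuing system actually provides), such a script
> could look like:
>
> #!/bin/bash
> # build a minimal .machines file from the PBS node list
> echo '#' > .machines
> # one mpi lapw0 line over all nodes, e.g. "lapw0: nx59:2 nx58:2"
> echo -n "lapw0: " >> .machines
> sort $PBS_NODEFILE | uniq -c | awk '{printf "%s:%d ", $2, $1}' >> .machines
> echo "" >> .machines
> # one serial k-point line per allocated core
> awk '{print "1:" $1}' $PBS_NODEFILE >> .machines
> echo "granularity:1" >> .machines
> echo "extrafine:1" >> .machines
>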
> 5) The .machines file you have will run serial (i.e. not mpi) lapw1,
> with 2 k-vectors on nx1 and 2 on nx62. If you want to have these run
> using the mpi-parallel version (i.e. lapw1_mpi) you would need to use
>
> 1:nx1:2
> 1:nx62:2
>
> (Note no space after the ":").
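>
> For reference, a complete .machines along these lines for two
> dual-core nodes (re-using your node names; the ":2" counts are only
> illustrative) would be:
>
> #
> lapw0: nx1:2 nx62:2
> 1:nx1:2
> 1:nx62:2
> granularity:1
> extrafine:1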
>
> Whether it is faster to run with 2 processors on nx1, as opposed to 2
> different k-points, will depend upon your CPUs. For a simple
> calculation such as TiC it will be hard to see much difference, but
> this can matter for larger ones. Be aware that if (for instance) you
> had 4 processors on nx1 it may be a bad idea to use
>
> 1:nx1:2
> 1:nx1:2
>
> because some variants of mpi will launch both lapw1_mpi jobs on the
> same cores (CPU_AFFINITY is often the relevant flag, but this varies
> with mpi flavor).
>
> 2010/2/22 zyzhang <zyzhang at stanford.edu>:
>> Dear All,
>>
>> I am trying to test wien2k in parallel mode and I ran into some problems. I am using
>>
>> run_lapw -p -i 40 -fc 0.001 -I
>>
>> If I use a number of 0.001 for the -fc option above, I get the following error:
>>
>> Force-convergence not possible. Forces not present.
>>
>> If I do not use a number for the -fc option, but use "run_lapw -p -i 40 -fc -I" instead, then lapw0 finishes without a problem but the program doesn't branch to lapw1. An error message is generated at the test
>>
>> if ($fcut == "0") goto lapw1
>>
>> I was able to run "run_lapw -p -i 40 -I", without the "-fc" option at all, and to finish "lapw0 -p" and then start "lapw1 -p", but got the following error:
>>
>> error: command   /home/zzhang/wien2k/lapw1para lapw1.def   failed
>>
>> Does anybody have similar problems and know how to fix this?
>>
>> It does the following:
>>
>> running LAPW1 in parallel mode (using .machines)
>>
>> and the .machines file is as follows:
>>
>> #
>> lapw0:nx1  nx1  nx62  nx62
>> lapw1:nx1  nx1  nx62  nx62
>> lapw2:nx1  nx1  nx62  nx62
>> 1:nx1
>> 1:nx1
>> 1:nx62
>> 1:nx62
>> granularity:1
>> extrafine:1
>>
>> Thanks,
>> Zhiyong



-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.

