[Wien] lapw1para error while running k-point parallel calculation
Peter Blaha
pblaha at theochem.tuwien.ac.at
Thu Nov 27 14:14:01 CET 2008
It seems that you do not have a proper environment when running "ssh hostname ...".
a) Are you sure the name "localhost" works properly? Usually you should put the correct hostname into .machines, so that you can do:
ssh hostname echo $WIENROOT
b) Do you get the proper directory from the above command? Your basic error seems to be:
bash: lapw1c: command not found
c) Can you do:
ssh hostname
cd case_dir (where your files are)
x lapw1
d) The parallel lapw1 works like the above, but does it all at once:
ssh hostname "cd $PWD;lapw1 lapw1.def"
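Steps a) to d) can also be bundled into one non-interactive check that reaches each host the way step d) does. The following is only a minimal POSIX sh sketch, not part of WIEN2k: the script name check_wien_env.sh, the function check_wien_host and the REMOTE_CMD variable are hypothetical, and it assumes passwordless ssh and that the case directory has the same path on every host.

```shell
#!/bin/sh
# check_wien_env.sh: hypothetical helper, NOT part of WIEN2k.
# Repeats checks a)-d) non-interactively: ssh to each host, cd to the
# current case directory and report what the remote shell can see.
# Assumes passwordless ssh and the same directory path on every host.

# Command used to reach a host (overridable, e.g. for testing).
REMOTE_CMD=${REMOTE_CMD:-ssh}

# Single quotes: $WIENROOT, $prog and $path are expanded by the REMOTE shell.
REMOTE_SCRIPT='
echo "WIENROOT=$WIENROOT"
for prog in lapw1 lapw1c fixerror_lapw; do
    if path=$(command -v "$prog"); then
        echo "$prog: $path"
    else
        echo "$prog: NOT FOUND in PATH"
    fi
done'

check_wien_host() {
    # $PWD is expanded locally on purpose, like "cd $PWD" in step d).
    "$REMOTE_CMD" "$1" "cd '$PWD' || exit 1; $REMOTE_SCRIPT"
}

# Check every host given on the command line.
for host in "$@"; do
    echo "== $host =="
    check_wien_host "$host"
done
```

Run it from the case directory, e.g. sh check_wien_env.sh "$(hostname)". If lapw1c or fixerror_lapw is reported as NOT FOUND here although it is found in an interactive login, the WIEN2k settings (PATH, WIENROOT) are apparently not reaching non-interactive ssh shells, which would match the "command not found" messages in the logs.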
ROBERTO LUIS IGLESIAS PASTRANA wrote:
> Hello all!
> I'm trying to get k-point parallelism up and running on my computer, which has an Intel(R) Core(TM)2 Quad Q9300 @ 2.50GHz CPU, runs Ubuntu 8.10, and uses ifort 11.0.069, MKL 10.1.0.015 and Wien2k_08.3. I first tried it with the test_case from the WIEN2k benchmark web page, since I wanted to run a benchmark like the one in the thread starting at:
> http://zeus.theochem.tuwien.ac.at/pipermail/wien/2008-August/011238.html
> I wrote the following .machines file for my 4 processors:
> granularity:1
> 1:localhost
> 1:localhost
> 1:localhost
> 1:localhost
> extrafine:1
> When running x lapw1 -p I get the following error:
> titin at titin-desktop:~/Programas/WIEN2k/titin/benchmark/test_case$ x lapw1 -p
> starting parallel lapw1 at jue nov 27 13:33:33 CET 2008
> -> starting parallel LAPW1 jobs at jue nov 27 13:33:33 CET 2008
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
> [1] 12778
> bash: lapw1c: command not found
> bash: fixerror_lapw: command not found
> [1] Done ( ( $remote $machine[$p] ...
>      localhost(1) 0.000u 0.000s 0.00 0.00% 0+0k 0+0io 0pf+0w
> ** LAPW1 crashed!
> cat: No match.
> 0.100u 0.160s 0:02.97 8.7% 0+0k 0+248io 0pf+0w
> error: command /home/titin/Programas/WIEN2k/lapw1cpara -c lapw1.def failed
> Digging through the WIEN2k mailing list archives, I did not find any problem exactly like mine. There were some posts about the correct links in the WIEN2k root directory, so I checked:
> titin at titin-desktop:~/Programas/WIEN2k$ ls -alsp lapw1*
> 11596 -rwxr-xr-x 1 titin titin 11857076 2008-11-20 19:18 lapw1
> 11492 -rwxr-xr-x 1 titin titin 11747349 2008-11-20 19:18 lapw1c
>     0 lrwxrwxrwx 1 titin titin        9 2008-11-18 19:24 lapw1cpara -> lapw1para
>     0 lrwxrwxrwx 1 titin titin       14 2008-11-18 19:24 lapw1para -> lapw1para_lapw
>    20 -rwxr-xr-x 1 titin titin    16661 2008-11-18 19:24 lapw1para_lapw
> I think this means the links to the parallel versions are OK, doesn't it?
> I also thought the problem might be that test_case has only one k-point in its *.klist file, as Peter suggested in the thread mentioned above:
> http://zeus.theochem.tuwien.ac.at/pipermail/wien/2008-August/011266.html
> Then I tried a bcc Fe unit cell. In this case the error was repeated for each parallel job:
> titin at titin-desktop:~/Programas/WIEN2k/titin/benchmark/bccFe$ x lapw0 -p
> starting parallel lapw0 at jue nov 27 13:11:34 CET 2008
> -------- .machine0 : processors
> running lapw0 in single mode
> LAPW0 END
> 1.448u 0.108s 0:01.55 99.3% 0+0k 16+448io 0pf+0w
> titin at titin-desktop:~/Programas/WIEN2k/titin/benchmark/bccFe$ x lapw1 -p
> starting parallel lapw1 at jue nov 27 13:11:52 CET 2008
> -> starting parallel LAPW1 jobs at jue nov 27 13:11:52 CET 2008
> running LAPW1 in parallel mode (using .machines)
> 4 number_of_parallel_jobs
> [1] 12297
> [2] 12317
> [3] 12337
> bash: lapw1: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1: command not found
> bash: fixerror_lapw: command not found
> [2] - Done ( ( $remote $machine[$p] ...
> [1] - Done ( ( $remote $machine[$p] ...
> [4] 12401
> bash: lapw1: command not found
> bash: fixerror_lapw: command not found
> bash: lapw1: command not found
> bash: fixerror_lapw: command not found
> [4] - Done ( ( $remote $machine[$p] ...
> [3] + Done ( ( $remote $machine[$p] ...
> [1] 12466
> [2] 12486
> bash: lapw1: command not found
> bash: fixerror_lapw: command not found
> [1] - Done ( ( $remote $machine[$p] ...
> bash: lapw1: command not found
> bash: fixerror_lapw: command not found
> [2] Done ( ( $remote $machine[$p] ...
>      localhost(62) 0.000u 0.000s 0.00 0.00% 0+0k 0+0io 0pf+0w
>      localhost(62) 0.000u 0.000s 0.00 0.00% 0+0k 0+0io 0pf+0w
>      localhost(62) 0.000u 0.000s 0.00 0.00% 0+0k 0+0io 0pf+0w
>      localhost(62) 0.000u 0.000s 0.00 0.00% 0+0k 0+0io 0pf+0w
>      localhost(1) 0.000u 0.000s 0.00 0.00% 0+0k 0+0io 0pf+0w
>      localhost(1) 0.004u 0.000s 0.00 400.00% 0+0k 0+0io 0pf+0w
> ** LAPW1 crashed!
> cat: No match.
> 0.276u 0.228s 0:10.02 4.8% 0+0k 128+992io 1pf+0w
> error: command /home/titin/Programas/WIEN2k/lapw1para lapw1.def failed
> Could this have something to do with communication between the four CPUs? I first thought it could be a failure of passwordless ssh login, but running:
> titin at titin-desktop:~$ ssh titin-desktop
> Linux titin-desktop 2.6.27-10-generic #1 SMP Fri Nov 21 19:19:18 UTC 2008 x86_64
>
> The programs included with the Ubuntu system are free software;
> the exact distribution terms for each program are described in the
> individual files in /usr/share/doc/*/copyright.
>
> Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
> applicable law.
>
> To access official Ubuntu documentation, please visit:
> http://help.ubuntu.com/
> Last login: Thu Nov 27 13:07:11 2008 from localhost
> seems to work correctly.
> Maybe I'm asking something rather trivial, but I can't find a solution. Does anybody have an idea? Any suggestions are very welcome. Please let me know if you need any other information.
> Have a nice day!
> Roberto
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
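Following suggestion a) above, the quoted .machines file can be regenerated with the machine's real hostname in place of localhost. A minimal POSIX sh sketch, not part of WIEN2k: make_machines.sh and write_machines are hypothetical names, and the granularity:1 and extrafine:1 lines are simply copied from the quoted file, not recommended settings.

```shell
#!/bin/sh
# make_machines.sh: hypothetical helper, NOT part of WIEN2k.
# Writes a .machines file for k-point parallel lapw1 runs on one
# multi-core machine, using the real hostname instead of "localhost"
# (suggestion a). The layout is copied from the quoted .machines file.

write_machines() {
    njobs=${1:-4}               # number of parallel lapw1 jobs (e.g. 4 cores)
    host=${2:-$(hostname)}      # must be a name for which "ssh host" works
    out=${3:-.machines}         # output file, normally in the case directory
    {
        echo "granularity:1"
        i=0
        while [ "$i" -lt "$njobs" ]; do
            echo "1:$host"      # one line per parallel job, as in the quoted file
            i=$((i + 1))
        done
        echo "extrafine:1"
    } > "$out"
}

# Only act when arguments are given, so the function can also be sourced.
if [ "$#" -gt 0 ]; then
    write_machines "$@"
fi
```

For the quad-core machine in the question, sh make_machines.sh 4 run in the case directory would give the same six lines as the quoted file, with localhost replaced by the output of hostname (presumably titin-desktop). That name should then pass checks a) and b) before trying x lapw1 -p again.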
--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-15671 FAX: +43-1-58801-15698
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------