[Wien] Problem with parallel LAPW1
Laurence Marks
L-marks at northwestern.edu
Thu Sep 27 10:07:45 CEST 2012
The danger of commenting out the line is, as you found (I think), the
possibility of an infinite loop; hence I only recommend commenting it
out for debugging purposes.
It looks like you have not set up passwordless ssh and/or ssh is not
properly configured on your cluster/computer. You may need to
install/reinstall openssh and/or configure it; see for instance
http://www.linuxproblem.org/art_9.html or search for "passwordless
ssh".
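A minimal sketch of the usual OpenSSH key setup, assuming the node is reachable as an010 (the host name from the dayfile quoted below) and that ssh-keygen and ssh-copy-id are installed. The helper name check_passwordless_ssh is made up for illustration. The first manual login also stores the host key, which is the usual remedy for "Host key verification failed".

```shell
#!/bin/sh
# One-time setup, run by hand on the node that starts the job. Left commented
# out here because these commands write to ~/.ssh and contact the remote host:
#   ssh-keygen -t rsa     # accept the default file; empty passphrase
#   ssh-copy-id an010     # appends your public key to ~/.ssh/authorized_keys on an010
#   ssh an010 hostname    # first login: confirm the host key once

# Hypothetical helper: succeeds only if host "$1" accepts a key-based login
# without prompting. BatchMode=yes makes ssh fail instead of asking for a
# password, which mimics how a batch job would call ssh.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage: check_passwordless_ssh an010 && echo "passwordless ssh to an010 works"
```

If the check fails inside the batch job but works interactively, the home directory or ~/.ssh seen by the job may differ from the interactive one.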
N.B., you might also have some issues with how you have defined
"remote" during installation.
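To see what was chosen at installation, the sketch below prints the remote-related switches. It assumes they are stored as setenv lines in $WIENROOT/parallel_options and are named USE_REMOTE and MPI_REMOTE; both the path and the names are assumptions here, so check them against your own installation. The function name show_remote_settings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the remote-execution switches from a
# parallel_options file. The default path and the variable names
# (USE_REMOTE, MPI_REMOTE) are assumptions, not taken from this thread.
show_remote_settings() {
    file="${1:-$WIENROOT/parallel_options}"
    grep -E '^[[:space:]]*setenv[[:space:]]+(USE_REMOTE|MPI_REMOTE)[[:space:]]' "$file"
}

# Usage: show_remote_settings                  # reads $WIENROOT/parallel_options
#        show_remote_settings /path/to/file    # reads an explicit file
```

If the printed values request remote (ssh) execution while all jobs run on a single node such as an010, that mismatch is worth checking first.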
On Thu, Sep 27, 2012 at 2:56 AM, Reza Mahani <kh.mahani at gmail.com> wrote:
>
> Thanks Gavin.
> I tried commenting out that line, but it got stuck in the lapw1 loop and
> gave the following error:
>
> LAPW0 END
> LAPW0 END
> LAPW0 END
> LAPW0 END
> ssh_askpass: exec(/usr/libexec/openssh/gnome-ssh-askpass): No such file or directory
> Host key verification failed.
>
> Regards
> Reza
>
>
>
>
> On Wed, Sep 26, 2012 at 6:21 PM, Gavin Abo <gsabo at crimson.ua.edu> wrote:
>>
>> You might try commenting out
>>
>> !call W2kinit
>>
>> in SRC_lapw1/lapw1.F (line 34 in WIEN2k 12.1) and recompiling, as was mentioned:
>>
>> http://zeus.theochem.tuwien.ac.at/pipermail/wien/2012-August/017575.html
>>
>>
>> On 9/26/2012 10:03 AM, Reza Mahani wrote:
>>
>>
>> Hi Prof Blaha and wien2k users,
>>
>> I have recently installed WIEN2k 12.1 without any errors and was running
>> a test job to check the parallel mode. It gave me the following errors.
>>
>> dayfile content:
>>
>> Calculating GaAs in /lunarc/nobackup/users/reza/WIEN2k/test/GaAs
>> on an010 with PID 21141
>> using WIEN2k_12.1 (Release 22/7/2012) in
>> /lunarc/nobackup/users/reza/Wien2k_12.1
>>
>>
>> start (Wed Sep 26 10:19:05 CEST 2012) with lapw0 (100/99 to go)
>>
>> cycle 1 (Wed Sep 26 10:19:05 CEST 2012) (100/99 to go)
>>
>> > lapw0 -p (10:19:05) starting parallel lapw0 at Wed Sep 26 10:19:06 CEST 2012
>> -------- .machine0 : 8 processors
>> 16.100u 2.127s 0:06.00 303.6% 0+0k 132168+24680io 202pf+0w
>> > lapw1 -c -up -p (10:19:12) starting parallel lapw1 at Wed Sep 26 10:19:12 CEST 2012
>> -> starting parallel LAPW1 jobs at Wed Sep 26 10:19:12 CEST 2012
>> running LAPW1 in parallel mode (using .machines)
>> 1 number_of_parallel_jobs
>> an010 an010 an010 an010 an010 an010 an010 an010(120) Child id
>> 3 SIGSEGV, contact developers
>> Child id 0 SIGSEGV, contact developers
>> Child id 7 SIGSEGV, contact developers
>> Child id 1 SIGSEGV, contact developers
>> Child id 2 SIGSEGV, contact developers
>> Child id 6 SIGSEGV, contact developers
>> Child id 5 SIGSEGV, contact developers
>> Child id 4 SIGSEGV, contact developers
>> 0.341u 0.463s 0:01.42 56.3% 0+0k 1976+5760io 47pf+0w
>> Summary of lapw1para:
>> an010 k=0 user=0 wallclock=0
>> 0.423u 0.884s 0:03.75 34.6% 0+0k 2496+6152io 53pf+0w
>> > lapw1 -c -dn -p (10:19:16) starting parallel lapw1 at Wed Sep 26 10:19:16 CEST 2012
>> -> starting parallel LAPW1 jobs at Wed Sep 26 10:19:16 CEST 2012
>> running LAPW1 in parallel mode (using .machines.help)
>> 1 number_of_parallel_jobs
>> an010 an010 an010 an010 an010 an010 an010 an010(120) Child id
>> 7 SIGSEGV, contact developers
>> Child id 2 SIGSEGV, contact developers
>> Child id 0 SIGSEGV, contact developers
>> Child id 4 SIGSEGV, contact developers
>> Child id 1 SIGSEGV, contact developers
>> Child id 6 SIGSEGV, contact developers
>> Child id 3 SIGSEGV, contact developers
>> Child id 5 SIGSEGV, contact developers
>> 0.123u 0.130s 0:01.18 21.1% 0+0k 0+1448io 15pf+0w
>> Summary of lapw1para:
>> an010 k=0 user=0 wallclock=0
>> 0.209u 0.545s 0:03.70 20.0% 0+0k 0+1832io 15pf+0w
>> > lapw2 -c -up -p (10:19:20) running LAPW2 in parallel mode
>> ** LAPW2 crashed!
>>
>>
>>
>> job.err content:
>>
>> LAPW0 END
>> LAPW0 END
>> epl: Subscript out of range.
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> w2k_dispatch_signal(): received: Segmentation fault
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
>> with errorcode 80.
>>
>> Could you please tell me what causes this problem?
>>
>> Regards
>> Reza
>>
>>
>>
>>
>> _______________________________________________
>> Wien mailing list
>> Wien at zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>
>
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what
nobody else has thought"
Albert Szent-Györgyi