[Wien] error in k-point parallel execution
Mahmoud Payami
mpayami at aeoi.org.ir
Wed Jul 28 13:33:53 CEST 2004
Dear Dr. Andersen,
>
> I would need to see your *lapw1*.error files, unless they just say
> "error in lapw1", but I would suspect they contain more information
> since the one on your initial host finished.
Here is the "lapw1.error" content:
--------------
** Error in Parallel LAPW1
** LAPW1 STOPPED at Wed Jul 28 14:06:03 EDT 2004
** check ERROR FILES!
--------------------------
There exists only one other file, "lapw1_1.error", which is empty; there are no
"lapw1_x.error" files for x=2,3,4,5.
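
For anyone wanting to repeat this check, the following is a minimal Python sketch that lists every "*.error" file in a case directory and separates the empty ones from those that carry a message. The function names are hypothetical and are not part of WIEN; the only assumption taken from this thread is that the error files end in ".error" and sit in the directory where the job was run.

```python
from pathlib import Path


def summarize_error_files(case_dir):
    """Return {filename: stripped text} for every *.error file in case_dir."""
    summary = {}
    for path in sorted(Path(case_dir).glob("*.error")):
        # errors="replace" keeps the scan going if a file holds odd bytes
        summary[path.name] = path.read_text(errors="replace").strip()
    return summary


def nonempty_errors(summary):
    """Keep only the error files that actually contain a message."""
    return {name: text for name, text in summary.items() if text}


if __name__ == "__main__":
    # Hypothetical usage: run from inside the case directory
    for name, text in nonempty_errors(summarize_error_files(".")).items():
        print(f"{name}:\n{text}\n")
```

Files that are missing altogether (here x=2,3,4,5) simply do not appear in the result, which may itself hint that the corresponding remote job never started.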
>
> Do you have a common nfs-mounted home directory?
>
I am a novice in Linux and do not understand what this means. However, I have
installed the "nfs-utils" package on all PCs. Could you please let me know what
I should do in order to meet this condition?
> Is the scratch directory in the same location on all machines?
I have installed Wien on all nodes with the same specifications.
>
> Have you configured $remote in wien2k to ssh?
Yes. I have chosen "ssh" in the "siteconfig_lapw" step.
> Can we see your .machines file?
Here is the ".machines" file content:
-----------
1:localhost
1:condmat2
1:condmat3
1:condmat4
1:condmat5
granularity:1
extrafine:1
-----------------
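
As a rough way to test whether every host in this file accepts a non-interactive login, here is a minimal Python sketch. It assumes the simple "weight:host" line form shown above and treats any key that is not an integer (such as "granularity" or "extrafine") as a setting to skip. The function names are hypothetical. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH client options: the first makes ssh fail instead of stopping at a password prompt, and the second limits the wait for the connection. The default of 15 seconds is an arbitrary choice, picked only to exceed the roughly 10 seconds observed for a login in this thread.

```python
import subprocess


def hosts_from_machines(text):
    """Extract host names from 'weight:host' lines of a .machines file."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if not key.strip().isdigit():
            continue  # skip settings such as granularity and extrafine
        fields = value.split()
        if fields:
            hosts.append(fields[0])  # assumption: one host per line
    return hosts


def ssh_check_command(host, timeout=15):
    """Build an ssh call that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={timeout}", host, "true"]


def unreachable_hosts(hosts, timeout=15):
    """Return (host, stderr) for every host where the ssh check fails."""
    failed = []
    for host in hosts:
        result = subprocess.run(ssh_check_command(host, timeout),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append((host, result.stderr.strip()))
    return failed


if __name__ == "__main__":
    with open(".machines") as handle:
        for host, message in unreachable_hosts(hosts_from_machines(handle.read())):
            print(f"{host}: {message}")
```

A host that still asks for a password, or whose key is not yet in the list of known hosts, should show up in the output with the ssh error message.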
Kind regards,
Mahmoud Payami
> Best regards,
> Torsten Andersen.
>
> Mahmoud Payami wrote:
> > Dear Dr. Torsten Andersen,
> >
> > Thank you very much for your comment. I have reconfigured the hosts and
> > passwordless ssh is possible from master to nodes and vice versa.
> > I receive more or less the same error:
> >
> > ----------------------------
> > LAPW0 END
> > LAPW1 END
> > 0.39user 0.04system 0:00.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+5918minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+204minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+202minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+202minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+204minor)pagefaults 0swaps
> >
> > ---------------------------------------------
> > My own analysis, based on your comment, is that only the part assigned to
> > localhost runs without any problem, while the remote hosts do not
> > contribute.
> > I checked that a passwordless ssh login to a remote host takes about
> > 10 seconds. Could this be some kind of "timeout" error? If so, how can it be
> > fixed?
> >
> > Thank you in advance.
> >
> > Kind regards,
> > Mahmoud Payami
> >
> >
> >
> >
> >
> >
> >
> >>Dear Mr. Payami,
> >>
> >>you can only use nodes for which a key exists in the list of known hosts.
> >>Otherwise it will exit at the prompt for password.
> >>
> >>Mahmoud Payami wrote:
> >>
> >>>Dear Wien Users & Developers,
> >>>
> >>>I noticed that in naming the nodes, one should not use the symbol "_".
> >>>However, when I changed the names and did not use that symbol, I
> >>>encountered the following new error in running scf:
> >>>
> >>>-------------------------------
> >>>0inputs+0outputs (8major+195minor)pagefaults 0swaps
> >>>0.00user 0.00system 0:00.10elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> >>>LAPW1 - Error
> >>>0inputs+0outputs (11major+192minor)pagefaults 0swaps
> >>>0.00user 0.00system 0:00.10elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> >>>LAPW1 - Error
> >>>0inputs+0outputs (0major+11620minor)pagefaults 0swaps
> >>>1.21user 0.20system 0:01.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> >>> LAPW1 END
> >>>0inputs+0outputs (8major+196minor)pagefaults 0swaps
> >>>0.00user 0.00system 0:00.08elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> >>>LAPW1 - Error
> >>>Warning: Permanently added 'condmat1' (RSA) to the list of known hosts.
> >>
> >>Here! This warning tells me that you have not initialized your hosts properly.
> >>
> >>> LAPW0 END
> >>
> >>>--------------------------------------------------------------------------
> >>>
> >>>I would be grateful for any comment.
> >>>
> >>>Kindest regards,
> >>>
> >>>M. Payami
> >>>
> >>
> >>Best regards,
> >>Torsten Andersen.
> >>
> >>--
> >>Dr. Torsten Andersen TA-web: http://deep.at/myspace/
> >>AG Hübner, Department of Physics, Kaiserslautern University
> >>http://cmt.physik.uni-kl.de http://www.physik.uni-kl.de/
> >>
> >>_______________________________________________
> >>Wien mailing list
> >>Wien at zeus.theochem.tuwien.ac.at
> >>http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> >>
> >>
>
> --
> Dr. Torsten Andersen TA-web: http://deep.at/myspace/
> AG Hübner, Department of Physics, Kaiserslautern University
> http://cmt.physik.uni-kl.de http://www.physik.uni-kl.de/