[Wien] error in k-point parallel execution

Mahmoud Payami mpayami at aeoi.org.ir
Wed Jul 28 13:33:53 CEST 2004


Dear Dr. Andersen,

>
> I would need to see your *lapw1*.error files, unless they just say
> "error in lapw1", but I would suspect they contain more information
> since the one on your initial host finished.


Here is the "lapw1.error" content:
--------------
**  Error in Parallel LAPW1
**  LAPW1 STOPPED at Wed Jul 28 14:06:03 EDT 2004
**  check ERROR FILES!
--------------------------
There exists only one other file, "lapw1_1.error", which is empty, and there
are no other "lapw1_x.error" files (x=2,3,4,5).

>
> Do you have a common nfs-mounted home directory?
>
I am a novice in Linux and do not understand what this means. However, I have
installed the "nfs-utils" package on all PCs. Could you please let me know what
I should do in order to meet this condition?

> Is the scratch directory in the same location on all machines?

I have installed WIEN2k on all nodes with the same specifications.

>
> Have you configured $remote in wien2k to ssh?

Yes. I have chosen "ssh" in the "siteconfig_lapw" step.

> Can we see your .machines file?

Here is the ".machines" file content:
-----------
1:localhost
1:condmat2
1:condmat3
1:condmat4
1:condmat5
granularity:1
extrafine:1

-----------------
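
For reference, the host list above can be checked for non-interactive ssh
login with a short script. This is only a minimal sketch and not part of
WIEN2k: it assumes Python 3.7 or later and an OpenSSH client, and it assumes
that k-point hosts are exactly the lines of the form "<weight>:<host>" as in
the file shown here. The function names parse_machines, can_login and report
are hypothetical.

```python
import subprocess


def parse_machines(text):
    """Return host names from lines of the form '<weight>:<host>'."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(":")
        # Lines such as 'granularity:1' start with a keyword, not a number,
        # so they are skipped here (assumption about the file layout).
        if key.isdigit():
            host = rest.split(":")[0].strip()
            if host:
                hosts.append(host)
    return hosts


def can_login(host, timeout=30):
    """True if 'ssh <host> true' succeeds without prompting for input."""
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",               # fail instead of asking for a password
        "-o", f"ConnectTimeout={timeout}",   # give slow connections some time
        host,
        "true",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout + 5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def report(path=".machines"):
    """Print one line per host saying whether the ssh login worked."""
    with open(path) as fh:
        for host in parse_machines(fh.read()):
            print(host, "ok" if can_login(host) else "FAILED")
```

Calling report(".machines") in the case directory on the master node would
print one line per host. A host reported as FAILED either still prompts for a
password or host-key confirmation, or did not answer within the timeout.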

Kind regards,
Mahmoud Payami

> Best regards,
> Torsten Andersen.
>
> Mahmoud Payami wrote:
> > Dear Dr. Torsten Andersen,
> >
> > Thank you very much for your comment. I have reconfigured the hosts and
> > passwordless ssh is possible from master to nodes and vice versa.
> > I receive more or less the same error:
> >
> > ----------------------------
> >  LAPW0 END
> >  LAPW1 END
> > 0.39user 0.04system 0:00.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+5918minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+204minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+202minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+202minor)pagefaults 0swaps
> > LAPW1 - Error
> > 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+204minor)pagefaults 0swaps
> >
> > ---------------------------------------------
> > My own analysis based on your comment is that only the part dedicated to
> > localhost is performed without any problem but the remote hosts did not
> > contribute.
> > I checked, and a password-less ssh login to a remote host takes about
> > 10 seconds. Could this be some kind of "timeout" error? If so, how can it
> > be fixed?
> >
> > Thank you in advance.
> >
> > Kind regards,
> > Mahmoud Payami
> >
> >>Dear Mr. Payami,
> >>
> >>you can only use nodes for which a key exists in the list of known hosts.
> >>Otherwise it will exit at the prompt for a password.
> >>
> >>Mahmoud Payami wrote:
> >>
> >>>Dear Wien Users & Developers,
> >>>
> >>>I noticed that in naming the nodes, one should not use the symbol "_".
> >>>However, when I changed the names and did not use that symbol, I
> >>>encountered the following new error in running scf:
> >>>
> >>>-------------------------------
> >>> 0inputs+0outputs (8major+195minor)pagefaults 0swaps
> >>>0.00user 0.00system 0:00.10elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> >>>LAPW1 - Error
> >>>0inputs+0outputs (11major+192minor)pagefaults 0swaps
> >>>0.00user 0.00system 0:00.10elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> >>>LAPW1 - Error
> >>>0inputs+0outputs (0major+11620minor)pagefaults 0swaps
> >>>1.21user 0.20system 0:01.42elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> >>> LAPW1 END
> >>>0inputs+0outputs (8major+196minor)pagefaults 0swaps
> >>>0.00user 0.00system 0:00.08elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> >>>LAPW1 - Error
> >>>Warning: Permanently added 'condmat1' (RSA) to the list of known hosts.
> >>
> >>Here! This Warning tells me that you have not initiated your hosts properly.
> >>
> >>> LAPW0 END
> >>>---------------------------------------------------------------------------
> >>>
> >>>I would be grateful for any comment.
> >>>
> >>>Kindest regards,
> >>>
> >>>M. Payami
> >>>
> >>
> >>Best regards,
> >>Torsten Andersen.
> >>
> >>-- 
> >>Dr. Torsten Andersen        TA-web: http://deep.at/myspace/
> >>AG Hübner, Department of Physics, Kaiserslautern University
> >>http://cmt.physik.uni-kl.de    http://www.physik.uni-kl.de/
> >>
> >>_______________________________________________
> >>Wien mailing list
> >>Wien at zeus.theochem.tuwien.ac.at
> >>http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien