[Wien] parallel under sge environment

Peter Blaha pblaha at theochem.tuwien.ac.at
Wed Apr 21 07:35:17 CEST 2010


Still the analysis is not complete:

In your job you requested 4 slots.

In your job.error I can see 3 attempts to connect to remote hosts 
(r105-n15,r108-n84 and r103-n2), but not 4.
Furthermore, I see "lapw0 END" twice ???

How does the corresponding .machines file look?

When you request 4-8 slots, do you get them on the same node, or are 
they allocated on different nodes (as suggested in your error log)?
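
For comparison, here is a minimal sketch of how such a .machines file is
usually generated inside the job script, and how to see over how many nodes
the slots are spread. It assumes that SGE provides the granted hosts in
$PE_HOSTFILE, one line per host in the form "hostname slots queue processors",
and that one "1:host" line per slot is wanted for k-point parallel runs. The
function names make_machines and count_nodes are only illustrative; they are
not part of WIEN2k or SGE.

```shell
#!/bin/sh
# Sketch: build a k-point parallel .machines from the SGE host file.
# Assumption: each line of the host file reads "hostname slots queue processors".

# Print one "1:host" line per granted slot, followed by the usual trailer lines.
make_machines() {
    hostfile=$1
    awk '{ for (i = 0; i < $2; i++) print "1:" $1 }' "$hostfile"
    echo "granularity:1"
    echo "extrafine:1"
}

# Print the number of distinct nodes in the allocation.
count_nodes() {
    awk '{ print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Inside the job script one would call, for example:
#   make_machines "$PE_HOSTFILE" > .machines
#   echo "slots are spread over $(count_nodes "$PE_HOSTFILE") node(s)"
#   cat .machines
```

If count_nodes reports 1, all slots are on the same node; otherwise the
k-point jobs have to be started on remote hosts.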

If they are on the same node, you can follow the advice given before and 
disable ssh for k-point parallel. This can be done e.g. by reinstalling 
WIEN2k and saying "shared memory machines" in siteconfig.

-------------------------------------------
However, your error log still did not convince me that ssh is impossible.

On your login node say:

hostname   (which returns the name of your frontend). Then

ssh host   (substitute host by the actual name obtained before).

Can you log in WITHOUT specifying a password ???

If not, your ssh-keygen setup was not successful, but this is a 
prerequisite.
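
The same check can be done non-interactively. The following is only a sketch:
it assumes an OpenSSH client, whose BatchMode=yes option makes ssh fail
instead of asking for a password. The function name check_passwordless_ssh
is made up for this illustration.

```shell
#!/bin/sh
# Sketch: test whether key-based (passwordless) ssh to a host works.
# Assumption: OpenSSH client; BatchMode=yes disables password prompts,
# so ssh returns a non-zero status when no usable key is set up.
check_passwordless_ssh() {
    host=$1
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true 2>/dev/null; then
        echo "passwordless ssh to $host works"
    else
        echo "passwordless ssh to $host FAILED"
        return 1
    fi
}

# On the login node one would call, for example:
#   check_passwordless_ssh "$(hostname)"
```

Note that in batch mode a host whose key has not yet been accepted may also
be reported as a failure, so log in once by hand first.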




On 20.04.2010 21:29, zhaoyh wrote:
> Hello Prof. Blaha and Marks,
>
> The submitting script and the error message have been attached.
>
> The "host" and "hosts" pe are not usable right now. The only one I can
> use is mpi.
>
> Thanks for your help.
>
> Regards,
>
> yonghong
> On Tue, 2010-04-20 at 16:33 +0200, Peter Blaha wrote:
>> Still not clear:
>>
>>> "I cannot use ssh" means that this supercomputer doesn't allow users to
>>> log in to the compute node directly. I have consulted the admin already.
>>> He just asked me to use sge script to submit job. The attachment is the
>>
>> It is "normal" that you cannot ssh to the compute node FROM the login node.
>> So you will never be able to type in
>>        ssh nodexxx
>> but this is NOT necessary anyway!
>>
>> Have you tried to adapt one of the job scripts at the faq-page of www.wien2k.at
>> and after creation of the machines file, put    run_lapw -p into the sge script ??
>>
>> It is not helpful to show the PWSCF script, show the WIEN2k script you have tried.
>> Anyway from your script I can see:
>>
>> #$ -pe mpi 160      # 4 slots (allocated among the available hosts)
>> ##$ -pe host 6         # 6 slots (allocated on a single host max=8)
>> ##$ -pe hosts 16       # 8 slots per host. (numbers of cores should be a multiple of 8)
>>
>> Most likely you need to uncomment the last line (and comment the first one), if you do not
>> want to use mpi. At least it indicates that you have different "pe" environments available.
>>
>> Then you need some lines which generate   .machines  from the nodes assigned to you.
>> (See the templates mentioned above; or, as you said, you already have that.)
>>
>>
>> mpirun -np 160  pwscf -npool 16 < input > out
>>
>> Instead of that line, you put     run_lapw -p
>>
>>
>> My experience says that users who cannot handle k-parallelism will not be
>> able to run mpi-parallel, because this is much more difficult.
>>
>>
>>
>
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien

-- 
Peter Blaha
Inst.Materialchemie, TU Wien
Getreidemarkt 9
A-1060 Vienna
Austria

