[Wien] parallel under sge environment
Peter Blaha
pblaha at theochem.tuwien.ac.at
Wed Apr 21 07:35:17 CEST 2010
The analysis is still not complete:
In your job you requested 4 slots.
In your job.error I can see 3 attempts to connect to remote hosts
(r105-n15, r108-n84 and r103-n2), but not 4.
Furthermore, I see "lapw0 END" twice ???
How does the corresponding .machines file look?
When you request 4-8 slots, do you get them on the same node, or are
they allocated on different nodes (as suggested by your error log)?
If they are on the same node, you can follow the advice given before and
disable ssh for k-point parallelization. This can be done, e.g., by reinstalling
WIEN2k and selecting "shared memory machines" in siteconfig.
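To answer the question about slot placement from inside the job itself, one can look at the host file SGE hands to a parallel environment. A minimal sketch, assuming the standard SGE $PE_HOSTFILE layout of one "<host> <nslots> <queue> <processor-range>" line per host; the function name summarize_slots is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: show how SGE distributed the granted slots.
# Assumes the standard SGE $PE_HOSTFILE format, one line per host:
#   <hostname> <nslots> <queue> <processor-range>
summarize_slots() {
  hostfile=${1:-$PE_HOSTFILE}
  # Sum slots per host (a host may appear on more than one line),
  # print one "host slots" line each, then a final summary line.
  awk '{ slots[$1] += $2; total += $2 }
       END {
         n = 0
         for (h in slots) { print h, slots[h]; n++ }
         print "hosts:", n, "slots:", total
       }' "$hostfile"
}
```

Calling summarize_slots near the top of the job script (its output lands in the job's stdout file) tells you directly whether "hosts:" is 1, in which case the shared-memory setup is enough, or larger, in which case the remote hosts must be reachable.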
-------------------------------------------
However, your error log still did not convince me that ssh is impossible.
On your login node, type:
hostname (which returns the name of your frontend), then
ssh host (substituting host with the actual name obtained before).
Can you log in WITHOUT specifying a password ???
If not, your ssh-keygen setup was not successful, but this is a
prerequisite.
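The same test can be made non-interactive, which is useful inside a batch job where nobody can type a password. A minimal sketch using OpenSSH's BatchMode option, which makes ssh fail instead of prompting; the function name check_ssh is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: check whether passwordless (key-based) ssh works.
# BatchMode=yes disables all prompts, so a password request becomes
# a failure instead of a hang. ConnectTimeout bounds the wait.
check_ssh() {
  host=${1:-$(hostname)}
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
    echo "passwordless ssh to $host: OK"
  else
    echo "passwordless ssh to $host: FAILED" >&2
    return 1
  fi
}
```

Run it without arguments on the login node to test the frontend itself. Note that with BatchMode an unknown host key also counts as a failure, so log in interactively once first to accept the key.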
On 20.04.2010 21:29, zhaoyh wrote:
> Hello Prof. Blaha and Marks,
>
> The submitting script and the error message have been attached.
>
> The "host" and "hosts" pe are not usable right now. The only one I can
> use is mpi.
>
> Thanks for your help.
>
> Regards,
>
> yonghong
> On Tue, 2010-04-20 at 16:33 +0200, Peter Blaha wrote:
>> Still not clear:
>>
>>> "I cannot use ssh" means that this supercomputer doesn't allow users to
>>> log in to the compute nodes directly. I have consulted the admin already.
>>> He just asked me to use an SGE script to submit jobs. The attachment is the
>>
>> It is "normal" that you cannot ssh to the compute node FROM the login node.
>> So you will never be able to type in
>> ssh nodexxx
>> but this is NOT necessary anyway!
>>
>> Have you tried adapting one of the job scripts on the FAQ page of www.wien2k.at
>> and, after creating the .machines file, putting run_lapw -p into the SGE script??
>>
>> It is not helpful to show the PWSCF script; show the WIEN2k script you have tried.
>> Anyway from your script I can see:
>>
>> #$ -pe mpi 160 # 4 slots (allocated among the available hosts)
>> ##$ -pe host 6 # 6 slots (allocated on a single host max=8)
>> ##$ -pe hosts 16 # 8 slots per host. (numbers of cores should be a multiple of 8)
>>
>> Most likely you need to uncomment the last line (and comment out the first one) if you do not
>> want to use MPI. At least it shows that you have several "pe" environments available.
>>
>> Then you need some lines that generate .machines from the nodes assigned to you
>> (see the templates mentioned above; or, as you said, you already have that).
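Such lines could look roughly like the following. A minimal sketch, assuming the SGE $PE_HOSTFILE format ("<host> <nslots> ..." per line) and a k-point-only .machines layout modeled on the example files in the WIEN2k user guide (one "1:host" line per k-point job plus granularity/extrafine); the function name make_machines is hypothetical, and the keywords should be checked against your WIEN2k version's documentation:

```shell
#!/bin/sh
# Hypothetical helper: build a k-point-parallel WIEN2k .machines file
# from the hosts SGE assigned to this job.
# Assumes $PE_HOSTFILE lines of the form "<host> <nslots> <queue> <range>".
make_machines() {
  hostfile=${1:-$PE_HOSTFILE}
  out=${2:-.machines}
  {
    # One "1:<host>" line per granted slot, i.e. one k-point job per core.
    awk '{ for (i = 0; i < $2; i++) print "1:" $1 }' "$hostfile"
    # Distribution settings as in the user-guide examples.
    echo "granularity:1"
    echo "extrafine:1"
  } > "$out"
}
```

In the job script, make_machines would be called (with no arguments) from the case directory just before run_lapw -p. It deliberately writes no lapw0: or MPI lines, since only k-point parallelism is wanted here.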
>>
>>
>> mpirun -np 160 pwscf -npool 16 < input > out
>>
>> Instead of that line, put run_lapw -p
>>
>>
>> My experience is that users who cannot handle k-point parallelism will not be
>> able to run MPI-parallel either, because that is much more difficult.
>>
>>
>>
>
>
>
> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
--
Peter Blaha
Inst.Materialchemie, TU Wien
Getreidemarkt 9
A-1060 Vienna
Austria