[Wien] dstart_mpi error

karima Physique physique.karima at gmail.com
Fri Jul 20 13:05:29 CEST 2018


Dear Dr. Gavin Abo,

Thank you very much for your answers. My system does not have an HFI card,
but I will try the second solution (setting I_MPI_FABRICS to a different fabric).
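
For reference, a minimal sketch of what that second solution could look like,
assuming the Intel MPI runtime from Parallel Studio and a bash shell (the
shm:tcp value and the I_MPI_DEBUG line are illustrative; the right settings
depend on the cluster):

    # Illustrative only: select shared memory within a node and TCP between
    # nodes instead of the Omni-Path (PSM2/HFI) fabric. Put this in ~/.bashrc
    # or in the job script before starting the parallel WIEN2k run.
    export I_MPI_FABRICS=shm:tcp
    # Optional: have Intel MPI report the selected fabric for verification.
    export I_MPI_DEBUG=5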

On Fri, Jul 20, 2018 at 02:54, Gavin Abo <gsabo at crimson.ua.edu> wrote:

> Good to hear that the "unable to get host address" and "unable to connect
> to server" errors are gone after you fixed the hosts file on each node.
>
> Regarding the "no hfi units are available" error, if your system has Intel
> Omni-Path (OP) HFI cards, then maybe they just need to be configured to work [
> https://software.intel.com/en-us/articles/using-intel-omni-path-architecture
> ].  If your system does not have HFI cards, then maybe you need to set the
> I_MPI_FABRICS environment variable on your system to use a different
> fabric such as tcp [
> https://software.intel.com/en-us/mpi-developer-guide-linux-selecting-fabrics
> ].
>
> I, however, am no expert on the Intel Parallel Studio Cluster Edition.
>
> So, if the above doesn't help, another resource is the Intel Forums [
> https://software.intel.com/en-us/forum ]; there, the forum with the topic
> "Intel Clusters and HPC Technology" looks like the most appropriate place to
> reach one of Intel's experts on the Intel Parallel Studio Cluster Edition.
> On 7/19/2018 10:57 AM, Laurence Marks wrote:
>
> As I said, this is in your IB (or similar) fabric.
>
> On Thu, Jul 19, 2018 at 11:54 AM, karima Physique <
> physique.karima at gmail.com> wrote:
>
>> Dear prof. Laurence Marks
>>
>> I note that I am using the latest version of the Intel compilers (Intel
>> Parallel Studio Cluster Edition).
>> I have read about possible solutions, but I did not find one related to
>> Intel. Do you have a solution for this problem?
>>
>> On Thu, Jul 19, 2018 at 16:02, Laurence Marks <L-marks at northwestern.edu>
>> wrote:
>>
>>> See
>>>
>>> https://www.google.com/search?q=no+hfi+units+are+available+(err%3D23)&oq=no+hfi+units+are+available+(err%3D23)&aqs=chrome..69i57.481j0j4&sourceid=chrome&ie=UTF-8
>>>
>>> This appears to be an issue with your local mpi/fabric.
>>>
>>> On Thu, Jul 19, 2018 at 8:03 AM, karima Physique <
>>> physique.karima at gmail.com> wrote:
>>>
>>>> Dear Dr. Gavin Abo,
>>>> Actually, the problem was solved by adding the hostname to the hosts
>>>> file on all the nodes, not only on the master node.
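>>>>
>>>> A hypothetical illustration of the kind of /etc/hosts entries meant
>>>> here, repeated on every node (the names and addresses are placeholders,
>>>> not the actual cluster values):
>>>>
>>>>     192.168.0.1   master
>>>>     192.168.0.2   node1
>>>>     192.168.0.3   node2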
>>>>
>>>> Now the calculation works very well, but at each execution of LAPW0 in
>>>> the SCF cycle I get this error, which does not affect the calculations:
>>>> "calcul.23539PSM2 no hfi units are available (err=23)"
>>>>
>>>> I would be grateful if you could help me solve this problem, even though
>>>> it does not affect the calculations.
>>>>
>>> _______________________________________________
> Wien mailing list
> Wien at zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>