[Wien] dstart_mpi error

Gavin Abo gsabo at crimson.ua.edu
Fri Jul 20 02:54:45 CEST 2018


Good to hear that the "unable to get host address" and "unable to 
connect to server" errors are gone after you fixed the hosts file on 
each node.
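
For anyone checking the same thing, here is a minimal sketch of how name resolution could be verified on each node. The helper name check_hosts and the hostnames node1 and node2 are hypothetical; substitute the names listed in your own hosts file or .machines file.

```shell
# Hypothetical helper: report which of the given hostnames resolve on this node.
check_hosts() {
  status=0
  for h in "$@"; do
    # getent consults /etc/hosts as well as DNS
    if getent hosts "$h" >/dev/null 2>&1; then
      echo "ok: $h"
    else
      echo "unresolved: $h"
      status=1
    fi
  done
  return $status
}

# Example call (placeholder hostnames), to be run on every node:
# check_hosts node1 node2
```

A nonzero return status means at least one name did not resolve on that node.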

Regarding the "no hfi units are available" error: if your system has 
Intel Omni-Path (OP) HFI cards, then they may just need to be configured [ 
https://software.intel.com/en-us/articles/using-intel-omni-path-architecture 
].  If your system does not have HFI cards, then you may need to set 
the I_MPI_FABRICS environment variable to use a different fabric, 
such as tcp [ 
https://software.intel.com/en-us/mpi-developer-guide-linux-selecting-fabrics 
].
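
As a rough sketch of the second option, assuming a bash shell and the fabric names used by the 2018 releases of the Intel MPI Library (newer releases changed the accepted values, so check the documentation for your version), the setting could look like this:

```shell
# Assumption: Intel MPI 2018-style fabric names.
# Use shared memory within a node and TCP between nodes,
# so that the Omni-Path (PSM2/HFI) path is not selected.
export I_MPI_FABRICS=shm:tcp

# Optional: have Intel MPI report which fabric it selected at startup.
export I_MPI_DEBUG=2

# csh/tcsh equivalent of the first setting (e.g. for ~/.cshrc):
#   setenv I_MPI_FABRICS shm:tcp
```

The variable has to be visible to the MPI processes on every node, so it would usually go in the shell startup file on each node rather than only in the current terminal.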

I, however, am no expert on the Intel Parallel Studio Cluster Edition.

So, if the above doesn't help, another resource is the Intel Forums [ 
https://software.intel.com/en-us/forum ].  The forum on "Intel Clusters 
and HPC Technology" looks like the most appropriate one for reaching 
Intel's experts on the Intel Parallel Studio Cluster Edition.

On 7/19/2018 10:57 AM, Laurence Marks wrote:
> As I said, this is in your IB (or similar) fabric.
>
> On Thu, Jul 19, 2018 at 11:54 AM, karima Physique 
> <physique.karima at gmail.com> wrote:
>
>     Dear prof. Laurence Marks
>
>     I note that I am using the latest version of the Intel compilers
>     (Intel Parallel Studio Cluster Edition).
>     I read about possible solutions, but I did not find one
>     related to Intel.
>     Do you have any solution for this problem?
>
>     On Thu, Jul 19, 2018 at 16:02, Laurence Marks
>     <L-marks at northwestern.edu> wrote:
>
>         See
>         https://www.google.com/search?q=no+hfi+units+are+available+(err%3D23)&oq=no+hfi+units+are+available+(err%3D23)&aqs=chrome..69i57.481j0j4&sourceid=chrome&ie=UTF-8
>
>         This appears to be an issue with your local mpi/fabric.
>
>         On Thu, Jul 19, 2018 at 8:03 AM, karima Physique
>         <physique.karima at gmail.com> wrote:
>
>             Dear Dr. Gavin Abo,
>             Actually, the problem was solved by adding the hostname to
>             the hosts file on all the nodes, not only on the
>             master node.
>
>             Now the calculation works very well, but at each execution
>             of LAPW0 in the SCF cycle I get this error, which does not
>             affect the calculations:
>             "calcul.23539PSM2 no hfi units are available (err=23)"
>
>             I would be grateful if you can help me solve this problem
>             even though it does not affect the calculations
>