[Wien] dstart_mpi error
Gavin Abo
gsabo at crimson.ua.edu
Fri Jul 20 02:54:45 CEST 2018
Good to hear that the "unable to get host address" and "unable to
connect to server" errors are gone after you fixed the hosts file on
each node.
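For reference, the kind of entry that usually fixes those errors is a line
for every node in /etc/hosts on every node (the hostnames and addresses
below are made up; substitute the ones from your own cluster):

    # /etc/hosts  (example entries, hypothetical IPs and hostnames)
    192.168.1.10   master
    192.168.1.11   node1
    192.168.1.12   node2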
Regarding the "no hfi units are available" error, if your system has
Intel OP HFI cards, then maybe they just need configured to work [
https://software.intel.com/en-us/articles/using-intel-omni-path-architecture
]. If your system does not have HFI cards, then maybe you need to set
the I_MPI_FABRICS environmental variable on your system to use a
different fabric like tcp [
https://software.intel.com/en-us/mpi-developer-guide-linux-selecting-fabrics
].
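For example, something along these lines might work (a rough sketch,
assuming a bash-type shell and Intel MPI; the fabric names that are
accepted depend on your Intel MPI version, so check the documentation
linked above):

    # fall back to shared memory within a node and TCP between nodes,
    # instead of the Omni-Path/PSM2 fabric that reports "no hfi units"
    export I_MPI_FABRICS=shm:tcp

    # csh/tcsh equivalent:
    # setenv I_MPI_FABRICS shm:tcp

If you export it in your job script (or shell startup file), the mpirun
started by the parallel scripts should inherit it.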
I, however, am no expert on the Intel Parallel Studio Cluster Edition.
So, if the above doesn't help, another resource is the Intel Forums [
https://software.intel.com/en-us/forum ]. There, the "Intel Clusters and
HPC Technology" forum looks like the most appropriate place to reach one
of Intel's experts on the Intel Parallel Studio Cluster Edition.
On 7/19/2018 10:57 AM, Laurence Marks wrote:
> As I said, this is in your IB (or similar) fabric.
>
> On Thu, Jul 19, 2018 at 11:54 AM, karima Physique
> <physique.karima at gmail.com> wrote:
>
> Dear Prof. Laurence Marks,
>
> I note that I am using the latest version of the Intel compilers
> (Intel Parallel Studio Cluster Edition).
> I read about the possible solutions but did not find one related to
> Intel.
> Do you have any solution for this problem?
>
> On Thu, Jul 19, 2018 at 4:02 PM, Laurence Marks
> <L-marks at northwestern.edu> wrote:
>
> See
> https://www.google.com/search?q=no+hfi+units+are+available+(err%3D23)&oq=no+hfi+units+are+available+(err%3D23)&aqs=chrome..69i57.481j0j4&sourceid=chrome&ie=UTF-8
>
> This appears to be an issue with your local mpi/fabric.
>
> On Thu, Jul 19, 2018 at 8:03 AM, karima Physique
> <physique.karima at gmail.com>
> wrote:
>
> Dear Dr. Gavin Abo,
> Actually, the problem was solved by adding the hostname to the
> hosts file on all the nodes, not only on the
> master node.
>
> Now the calculation works very well, but at each execution
> of LAPW0 in the SCF cycle I get this error, without it affecting the
> calculations:
> "calcul.23539PSM2 no hfi units are available (err=23)"
>
> I would be grateful if you could help me solve this problem,
> even though it does not affect the calculations.
>