Markus Grassl on Sun, 04 Dec 2022 21:13:47 +0100


help with MPI and SLURM


Hello,

I hope that someone on the list can help me with the MPI version of PARI/gp in a SLURM environment. I apologise if this is explained in the documentation; if so, I would appreciate pointers to the relevant parts. I have no prior experience with MPI, so I hope I am just making beginner's mistakes.

I compiled PARI from source with the Configure option --mt=mpi and installed it into the directory ${HOME}/local-mpi/.

Before trying to run gp, I allocated 3 nodes in SLURM using

  salloc -N 3 -p partition

Then I tried both "srun" and "mpirun" to execute "gp", and obtained the error messages below.

I suspect that the local MPI installation lacks something that gp needs in order to start. Judging from the first error message, either SLURM or Open MPI probably has to be rebuilt with the PMI settings it indicates.

It would be useful to receive some hints that I could then pass on to our sysadmins.
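
In case it helps with the diagnosis, here is a minimal sketch of the checks I could run on the login node to collect more information. It assumes that "srun --mpi=list" and "ompi_info" behave here as they do in standard SLURM and Open MPI installations, which I have not confirmed on this cluster; the helper name report_tool is made up for illustration.

```shell
#!/bin/sh
# Diagnostic sketch. Assumptions: srun, mpirun and ompi_info are the
# relevant tools on this cluster; report_tool is a hypothetical helper.

# Print where a tool is found on PATH, or say that it is missing.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

for tool in srun mpirun ompi_info; do
    report_tool "$tool"
done

# List the MPI plugin types this SLURM installation offers to srun.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=list
fi

# Show which PMI/PMIx components Open MPI reports, if any.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i pmi || echo "ompi_info: no PMI components listed"
fi
```

The output of these commands is probably what the sysadmins would want to see first, since it shows whether SLURM and Open MPI agree on a PMI flavour.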

Further, does anyone have a sample SLURM script that runs gp non-interactively on multiple nodes?


Thanks

Markus

======================================================================================================

$ srun  ~/local-mpi/bin/gp

The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.


$ mpirun  ~/local-mpi/bin/gp
  [dd-login:164917] PMIX-XFER-VALUE: UNSUPPORTED TYPE 28016
  [dd-login:164917] PMIX ERROR: ERROR in file server/pmix_server.c at line 322