MPI


Message Passing Interface

Distributed-memory parallel programming: the program runs as multiple processes that exchange data by passing messages.

MPI_Init(): initializes the MPI environment; must be called before any other MPI routine.

MPI_Comm_rank(): returns the rank (index) of the calling process within a communicator.

MPI_Finalize(): shuts down the MPI environment; the last MPI call a process makes.

MPI_Comm_size reports the size of the group of processes associated with the specified communicator (a communicator is a group of processes that can communicate with each other).
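
A minimal mpi4py sketch of these calls (mpi4py runs MPI_Init automatically on import and registers MPI_Finalize as an exit hook, so neither appears explicitly; the filename below is just an example):

from mpi4py import MPI

comm = MPI.COMM_WORLD    # communicator containing all launched processes
rank = comm.Get_rank()   # this process's index within the communicator
size = comm.Get_size()   # total number of processes in the communicator
print("hello from rank %d of %d" % (rank, size))

Run with e.g. mpirun -n 4 python hello.py.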

MPI_Send <-> MPI_Recv: blocking point-to-point communication between a pair of processes.
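
A minimal blocking send/receive sketch in mpi4py (the lowercase send/recv pickle arbitrary Python objects; the payload here is just an illustration):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({'x': 1}, dest=1, tag=0)   # blocks until the message is delivered or buffered
elif rank == 1:
    data = comm.recv(source=0, tag=0)    # blocks until the matching send arrives
    print(data)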

Deadlocks

  • When a process makes a call to MPI_Recv, it blocks until a matching send is posted.
  • Similarly, you must assume that a call to MPI_Send blocks until a matching receive is posted; whether it actually does depends on internal buffering, so portable code cannot rely on MPI_Send returning early. If two processes each send to the other before receiving, both can hang; see the sketch after this list.
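
A sketch of the classic deadlock and one way around it, assuming exactly two ranks:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank   # the other rank (assumes a 2-process run)

# Deadlock-prone: if both ranks block in send (i.e. MPI does not buffer the
# messages), neither ever reaches its recv and both processes hang.
# comm.send("payload", dest=peer)
# data = comm.recv(source=peer)

# Safe exchange: sendrecv lets MPI pair the send and the receive internally.
data = comm.sendrecv("message from %d" % rank, dest=peer, source=peer)
print(rank, "received:", data)

Ordering the calls by rank (even ranks send first, odd ranks receive first) avoids the deadlock just as well.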

Reductions

http://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/

MPI_Reduce combines one value from every process with an operation such as a sum, and delivers the result to a single root process.

MPI_Allreduce does the same but delivers the result to every process: MPI_Allreduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
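
A sketch of both in mpi4py, summing one value per rank (the uppercase Reduce/Allreduce operate on buffer-like objects such as NumPy arrays):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.array([float(rank)])   # each rank contributes its own value
total = np.zeros(1)

# Reduce: only the root (rank 0) receives the sum; total is untouched elsewhere.
comm.Reduce(local, total, op=MPI.SUM, root=0)

# Allreduce: every rank receives the sum.
comm.Allreduce(local, total, op=MPI.SUM)
print(rank, total[0])   # every rank prints 0 + 1 + ... + (size-1)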


A good way to parallelize a set of tasks indexed 0, ..., num_tasks-1 using MPI, including the case where each node has several GPUs (each rank is pinned to one GPU, round-robin):



import sys

import tensorflow as tf
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # index of this process among all launched processes
size = comm.Get_size()   # total number of processes
print(rank)
# num_inits_per_task = 1
num_tasks = int(sys.argv[1])   # total number of tasks, passed on the command line

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

num_gpus = len(get_available_gpus())

# Split the tasks evenly: each rank takes a contiguous block of
# num_tasks // size tasks...
num_tasks_per_job = num_tasks // size
tasks = list(range(rank * num_tasks_per_job, (rank + 1) * num_tasks_per_job))

# ...and the first num_tasks % size ranks each pick up one leftover task.
if rank < num_tasks % size:
    tasks.append(size * num_tasks_per_job + rank)

config = tf.ConfigProto()
if num_gpus > 0:
    # Pin this rank to one GPU, round-robin over the GPUs on the node.
    # (device_count={'GPU': n} only caps how many GPUs TensorFlow may use;
    # gpu_options.visible_device_list selects which one.)
    config.gpu_options.visible_device_list = str(rank % num_gpus)
config.gpu_options.allow_growth = True   # allocate GPU memory on demand


tf.enable_eager_execution(config=config)
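
Launched with e.g. mpirun -n 8 python run_tasks.py 100 (the script name is illustrative), each rank then loops over its own tasks list on its assigned GPU.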