Message Passing Interface (MPI)
Distributed-memory parallel programming
Multiple processes
MPI_Init()
MPI_Comm_rank()
MPI_Finalize()
MPI_Comm_size reports the number of processes in the group associated with the specified communicator (a communicator is a group of processes that can communicate with each other).
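A minimal mpi4py sketch of these calls; note that mpi4py calls MPI_Init when the MPI module is imported and registers MPI_Finalize to run at interpreter exit, so neither appears explicitly:

from mpi4py import MPI            # importing MPI initializes the library (MPI_Init)

comm = MPI.COMM_WORLD             # default communicator containing all processes
rank = comm.Get_rank()            # this process's index within the communicator
size = comm.Get_size()            # total number of processes (MPI_Comm_size)
print("process %d of %d" % (rank, size))
# MPI_Finalize runs automatically when the interpreter exits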
MPI_Send <-> MPI_Recv (point-to-point send/receive)
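A minimal send/receive sketch with mpi4py; the lower-case comm.send/comm.recv methods pickle arbitrary Python objects:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({'x': 1.0}, dest=1, tag=0)   # rank 0 sends a Python object to rank 1
elif rank == 1:
    data = comm.recv(source=0, tag=0)      # rank 1 blocks until the message arrives
    print('rank 1 received', data)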
http://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/
MPI_Reduce (combines values from all ranks; the result lands on the root rank)
MPI_Allreduce (same reduction, but the result is returned to every rank)
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
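A sketch of the same reduction in mpi4py, using the upper-case Allreduce on NumPy buffers (assumes numpy is installed):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

sendbuf = np.array([rank], dtype='i')          # each rank contributes its own rank number
recvbuf = np.empty(1, dtype='i')
comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)   # every rank receives the global sum
print(rank, recvbuf[0])                        # 0+1+...+(size-1) on every rank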
A good way to parallelize a set of tasks indexed by an integer using MPI, including the case where several GPUs are available:
import sys
from mpi4py import MPI
import tensorflow as tf
from tensorflow.python.client import device_lib

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print(rank)

# num_inits_per_task = 1
num_tasks = int(sys.argv[1])          # total number of tasks, passed on the command line

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

num_gpus = len(get_available_gpus())

# split the task indices as evenly as possible across the MPI ranks
num_tasks_per_job = num_tasks // size
tasks = list(range(rank * num_tasks_per_job, (rank + 1) * num_tasks_per_job))
if rank < num_tasks % size:           # hand the leftover tasks to the first ranks
    tasks.append(size * num_tasks_per_job + rank)

config = tf.ConfigProto()
if num_gpus > 0:
    # pin each rank to one GPU in round-robin fashion; visible_device_list selects the
    # device, whereas device_count={'GPU': ...} would only cap how many GPUs are visible
    config.gpu_options.visible_device_list = str(rank % num_gpus)
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)
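Launched with mpirun, for example (run_tasks.py is a placeholder name for the script above):

# 4 MPI ranks sharing 100 tasks; each rank pins one of the local GPUs
mpirun -np 4 python run_tasks.py 100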