Distributed Computation¶
High-level Point FMM Interface¶
To perform a distributed point FMM, first construct a
boxtree.distributed.DistributedFMMRunner object. The constructor
distributes the necessary information from the root rank to all worker ranks. Then,
boxtree.distributed.DistributedFMMRunner.drive_dfmm() can be used to
launch the FMM.
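The workflow can be sketched as follows. This is a pseudocode-style sketch, not runnable as-is: `build_global_tree`, `my_traversal_builder`, `my_wrangler_factory`, and `source_weights` are hypothetical application-provided stand-ins, the exact `drive_dfmm()` arguments are an assumption, and running it requires mpi4py and PyOpenCL under an MPI launcher:

```python
# Pseudocode sketch: run under MPI, e.g. `mpiexec -n 4 python fmm_script.py`.
# build_global_tree, my_traversal_builder, my_wrangler_factory and
# source_weights are hypothetical application-provided objects.
from mpi4py import MPI
import pyopencl as cl

from boxtree.distributed import DistributedFMMRunner

comm = MPI.COMM_WORLD
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# The global tree is only significant on the root rank.
global_tree = build_global_tree(queue) if comm.rank == 0 else None

runner = DistributedFMMRunner(
    queue, global_tree, my_traversal_builder, my_wrangler_factory, comm=comm)

# The evaluation may be repeated with the same runner, e.g. once per
# iteration of a linear solver, without rebuilding tree and traversal.
potentials = runner.drive_dfmm(source_weights)
```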
- class boxtree.distributed.DistributedFMMRunner(queue, global_tree, traversal_builder, wrangler_factory, calibration_params=None, comm=<mpi4py.MPI.Intracomm object>)[source]¶
Helper class for setting up and running distributed point FMM.
- __init__(queue, global_tree, traversal_builder, wrangler_factory, calibration_params=None, comm=<mpi4py.MPI.Intracomm object>)[source]¶
Construct a DistributedFMMRunner object.
- Parameters:
global_tree – a boxtree.Tree object. This tree could live in host or device memory, depending on the wrangler. This argument is only significant on the root rank.
traversal_builder – an object which, when called, takes a pyopencl.CommandQueue object and a boxtree.Tree object, and generates a boxtree.traversal.FMMTraversalInfo object from the tree using the command queue.
wrangler_factory – an object which, when called, takes the local traversal and the global traversal objects and returns a boxtree.fmm.ExpansionWranglerInterface object.
calibration_params – Calibration parameters for the cost model, if supplied. The cost model is used for estimating the execution time of each box, which is used for improving load balancing.
comm – MPI communicator.
Distributed Algorithm Overview¶
1. Construct the global tree and traversal lists on the root rank and broadcast them to all worker ranks.
2. Partition boxes into disjoint sets, where the number of sets is the number of MPI ranks. (See Partition Boxes.)
3. Each rank constructs the local tree and traversal lists independently, according to the partition. (See Construct Local Tree and Traversal.)
4. Distribute source weights from the root rank to all worker ranks. (See Distributed Wrangler.)
5. Each rank independently forms multipole expansions from the leaf nodes of its local tree and propagates the partial multipole expansions upward.
6. Communicate multipole expansions so that all ranks have the complete multipole expansions they need.
7. Each rank independently forms local expansions, propagates them downward, and evaluates potentials at the target points in its partition. The calculated potentials are then assembled on the root rank.
For steps 5-7, see Distributed FMM Evaluation.
Note that steps 4-7 may be repeated multiple times with the same tree and traversal objects built in steps 1-3. For example, when iteratively solving a PDE, steps 4-7 are executed for each iteration of the linear solver.
The next sections will cover the interfaces of these steps.
Partition Boxes¶
- boxtree.distributed.partition.partition_work(cost_per_box, traversal, comm)[source]¶
This function assigns a set of responsible boxes to each rank.
If a rank is responsible for a box, it will compute the multipole expansion of the box and evaluate the target potentials in the box.
- Parameters:
cost_per_box – The expected running time of each box. This argument is only significant on the root rank.
traversal – The global traversal object containing all particles. This argument is significant on all ranks.
comm – MPI communicator.
- Returns:
A numpy array containing the responsible boxes of the current rank.
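To illustrate the kind of cost-based partitioning this performs, the following is a minimal sketch, not boxtree's actual algorithm (which additionally respects tree locality): it greedily assigns boxes to ranks so that the total estimated cost per rank stays balanced. The function name is hypothetical.

```python
def greedy_partition(cost_per_box, nranks):
    """Assign each box to the rank with the least accumulated cost.

    Returns a list of box-index lists, one per rank. This is a toy
    stand-in for partition_work, which also accounts for tree locality.
    """
    totals = [0.0] * nranks
    assignment = [[] for _ in range(nranks)]
    # Process expensive boxes first so they are spread out evenly.
    for box in sorted(range(len(cost_per_box)),
                      key=lambda b: -cost_per_box[b]):
        rank = min(range(nranks), key=lambda r: totals[r])
        assignment[rank].append(box)
        totals[rank] += cost_per_box[box]
    return assignment
```

For example, `greedy_partition([5.0, 1.0, 4.0, 2.0], 2)` assigns boxes 0 and 1 to rank 0 and boxes 2 and 3 to rank 1, giving each rank a total cost of 6.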
- class boxtree.distributed.partition.BoxMasks(responsible_boxes: Array, ancestor_boxes: Array, point_src_boxes: Array, multipole_src_boxes: Array)[source]¶
Box masks needed for the distributed calculation. Each of these masks is a PyOpenCL array of length tree.nboxes, whose i-th entry is 1 if box i is set.
- responsible_boxes¶
The current process will evaluate target potentials and multipole expansions in these boxes. Sources and targets in these boxes are needed.
- ancestor_boxes¶
Ancestors of the responsible boxes.
- point_src_boxes¶
The current process needs the sources, but not the targets, in these boxes.
- multipole_src_boxes¶
The current process needs the multipole expansions of these boxes.
- boxtree.distributed.partition.get_box_masks(queue, traversal, responsible_boxes_list)[source]¶
Given the responsible boxes for a rank, this helper function calculates the relevant masks.
- Parameters:
responsible_boxes_list – A numpy array of responsible box indices.
- Returns:
A BoxMasks object of the relevant masks.
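To illustrate what these masks encode, here is a minimal pure-Python sketch (boxtree computes the masks on the device via PyOpenCL) that derives the two simplest masks, responsible and ancestor, from a parent-pointer array. The function name and the parent-pointer input format are illustrative assumptions.

```python
def compute_basic_masks(box_parent_ids, responsible_boxes_list):
    """Toy analogue of get_box_masks for the two simplest masks.

    box_parent_ids[i] is the parent of box i (the root is its own
    parent). Returns (responsible_mask, ancestor_mask) as 0/1 lists
    of length nboxes.
    """
    nboxes = len(box_parent_ids)
    responsible = [0] * nboxes
    for b in responsible_boxes_list:
        responsible[b] = 1
    ancestor = [0] * nboxes
    for b in responsible_boxes_list:
        p = box_parent_ids[b]
        # Walk up to the root, marking each ancestor exactly once.
        while not ancestor[p] and p != b:
            ancestor[p] = 1
            b, p = p, box_parent_ids[p]
    return responsible, ancestor
```

For a 5-box tree with parents `[0, 0, 0, 1, 1]` and responsible box 3, the responsible mask is `[0, 0, 0, 1, 0]` and the ancestor mask marks boxes 1 and 0.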
Construct Local Tree and Traversal¶
- class boxtree.distributed.local_tree.LocalTree(valuedict: Mapping[str, Any] | None = None, exclude: Sequence[str] | None = None, **kwargs: Any)[source]¶
Inherits from boxtree.Tree.
- box_to_user_rank_starts¶
box_id_t [nboxes + 1]
- box_to_user_rank_lists¶
int32 [*]
A CSR-like interaction list storage array, together with box_to_user_rank_starts. For each box, the list of ranks which own targets that use the multipole expansion at this box, via either List 3 or (possibly downward propagated from an ancestor) List 2.
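The starts/lists pair follows the usual CSR convention: the ranks that use box b's multipole expansion are box_to_user_rank_lists[box_to_user_rank_starts[b]:box_to_user_rank_starts[b+1]]. As a sketch of this layout, the following hypothetical helper (not part of boxtree) flattens per-box rank lists into such a pair:

```python
def build_csr_rank_lists(user_ranks_per_box):
    """Flatten per-box rank lists into CSR (starts, lists) arrays.

    user_ranks_per_box[b] is the list of ranks that use box b's
    multipole expansion. Returns (starts, lists), where
    lists[starts[b]:starts[b+1]] recovers user_ranks_per_box[b].
    """
    starts = [0]
    lists = []
    for ranks in user_ranks_per_box:
        lists.extend(sorted(ranks))
        starts.append(len(lists))  # starts has nboxes + 1 entries
    return starts, lists
```

For example, `build_csr_rank_lists([[1], [], [0, 2]])` yields `starts == [0, 1, 1, 3]` and `lists == [1, 0, 2]`; an empty per-box list simply repeats the previous start.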
- boxtree.distributed.local_tree.generate_local_tree(queue, global_traversal, responsible_boxes_list, comm)[source]¶
Generate the local tree for the current rank.
This is an MPI-collective routine on comm.
- Parameters:
queue – a pyopencl.CommandQueue object.
global_traversal – Global boxtree.traversal.FMMTraversalInfo object in host memory.
responsible_boxes_list – a numpy.ndarray object containing the responsible boxes of the current rank.
- Returns:
a tuple (local_tree, src_idx, tgt_idx), where local_tree is an object of class boxtree.distributed.local_tree.LocalTree for the generated local tree, src_idx is the indices of the local sources in the global tree, and tgt_idx is the indices of the local targets in the global tree. src_idx and tgt_idx are needed for distributing source weights from the root rank and for assembling the calculated potentials on the root rank.
- boxtree.distributed.local_traversal.generate_local_travs(queue, local_tree, traversal_builder, merge_close_lists=False)[source]¶
Generate local traversal from local tree.
- Parameters:
queue – a pyopencl.CommandQueue object.
local_tree – the local tree, of class boxtree.tools.ImmutableHostDeviceArray, on which the local traversal object will be constructed.
traversal_builder – a function that takes a pyopencl.CommandQueue and a tree, and returns the traversal object based on the tree.
- Returns:
The generated local traversal object, in device memory.
Distributed Wrangler¶
- class boxtree.distributed.calculation.DistributedExpansionWrangler(context, comm, global_traversal, traversal_in_device_memory, communicate_mpoles_via_allreduce=False)[source]¶
Distributed expansion wrangler base class.
This is an abstract class and should not be directly instantiated. Instead, it is expected that all distributed wranglers should be subclasses of this class.
- __init__(context, comm, global_traversal, traversal_in_device_memory, communicate_mpoles_via_allreduce=False)[source]¶
- distribute_source_weights(src_weight_vecs, src_idx_all_ranks)[source]¶
Used by the distributed implementation for transferring the needed source weights from the root rank to each worker rank in the communicator.
This method needs to be called collectively by all ranks in the communicator.
- Parameters:
src_weight_vecs – a sequence of numpy.ndarray, each with length nsources, representing the weights of sources on the root rank. None on worker ranks.
src_idx_all_ranks – a list of length nranks, including the root rank, where the i-th entry is a numpy.ndarray of indices indicating which entries of src_weight_vecs are to be sent from the root rank to rank i. Each entry can be generated by generate_local_tree(). None on worker ranks.
- Returns:
Received source weights of the current rank, including the root rank.
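The indexing convention can be illustrated with a small plain-Python sketch that performs the same per-rank selection the root carries out before sending (boxtree does this with MPI point-to-point communication); the function name is hypothetical:

```python
def select_source_weights(src_weight_vecs, src_idx_all_ranks):
    """Compute, for each rank, the subset of source weights it needs.

    src_weight_vecs: sequence of per-source weight vectors (on the root).
    src_idx_all_ranks: per-rank lists of global source indices.
    Returns a list whose i-th entry is what rank i would receive: one
    subsetted weight vector per entry of src_weight_vecs.
    """
    return [
        [[vec[i] for i in idx] for vec in src_weight_vecs]
        for idx in src_idx_all_ranks
    ]
```

For example, with one weight vector `[10, 20, 30, 40]` and per-rank indices `[[0, 2], [1, 3]]`, rank 0 receives `[[10, 30]]` and rank 1 receives `[[20, 40]]`.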
- gather_potential_results(potentials, tgt_idx_all_ranks)[source]¶
Used by the distributed implementation for gathering calculated potentials from all worker ranks in the communicator to the root rank.
This method needs to be called collectively by all ranks in the communicator.
- Parameters:
potentials – Calculated potentials on each rank. This argument is significant on all ranks, including the root rank.
tgt_idx_all_ranks – a list of length nranks, where the i-th entry is a numpy.ndarray of the global indices of the potentials from rank i. This argument is only significant on the root rank.
- Returns:
Gathered potentials on the root rank. None on worker ranks.
- communicate_mpoles(mpole_exps, return_stats=False)[source]¶
Based on Algorithm 3: Reduce and Scatter in Lashuk et al. [1].
The main idea is to mimic an allreduce as done on a hypercube network, but to decrease the bandwidth cost by sending only information that is relevant to the rank receiving the message.
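The exchange pattern can be illustrated with a plain-Python simulation of a recursive-doubling (hypercube) allreduce over a power-of-two number of simulated ranks. This sketch sums full values at every step; the actual routine follows the same pattern but reduces bandwidth by sending only the multipole data relevant to each receiving rank.

```python
def hypercube_allreduce(values):
    """Simulate a hypercube (recursive-doubling) allreduce by summation.

    values: one number per simulated rank; len(values) must be a power
    of two. After log2(n) exchange rounds, every rank holds the global
    sum.
    """
    n = len(values)
    assert n & (n - 1) == 0, "rank count must be a power of two"
    vals = list(values)
    bit = 1
    while bit < n:
        # In each round, rank r exchanges with partner r XOR bit,
        # i.e. across one dimension of the hypercube.
        vals = [vals[rank] + vals[rank ^ bit] for rank in range(n)]
        bit <<= 1
    return vals
```

With four simulated ranks holding `[1, 2, 3, 4]`, two rounds suffice and every rank ends up with the global sum 10.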
Distributed FMM Evaluation¶
The distributed version of the FMM evaluation shares the same interface as the
shared-memory version. To evaluate the FMM in a distributed manner, pass a
subclass of boxtree.distributed.calculation.DistributedExpansionWrangler to
boxtree.fmm.drive_fmm().