Parallel Algorithms
Element-wise expression evaluation (“map”)
Evaluating involved expressions on pyopencl.array.Array instances using overloaded operators can be somewhat inefficient, because a new temporary is created for each intermediate result. The module pyopencl.elementwise contains tools to help generate kernels that evaluate multi-stage expressions on one or several operands in a single pass.
- class pyopencl.elementwise.ElementwiseKernel(context: Context, arguments: str | List[DtypedArgument], operation: str, name: str = 'elwise_kernel', options: Any = None, **kwargs: Any)
A kernel that takes a number of scalar or vector arguments and performs an operation specified as a snippet of C on these arguments.
- Parameters:
arguments – a string formatted as a C argument list.
operation – a snippet of C that carries out the desired ‘map’ operation. The current index is available as the variable i. operation may contain the statement PYOPENCL_ELWISE_CONTINUE, which will terminate processing for the current element.
name – the function name as which the kernel is compiled.
options – passed unmodified to pyopencl.Program.build().
preamble – a piece of C source code that gets inserted outside of the function context in the elementwise operation’s kernel source code.
Warning
Using a return statement in operation will lead to incorrect results, as some elements may never get processed. Use PYOPENCL_ELWISE_CONTINUE instead.
Changed in version 2013.1: Added PYOPENCL_ELWISE_CONTINUE.
- __call__(*args, **kwargs) Event
Invoke the generated scalar kernel.
The arguments may either be scalars or pyopencl.array.Array instances.
Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
Here’s a usage example:
import numpy as np
import pyopencl as cl
import pyopencl.array
from pyopencl.elementwise import ElementwiseKernel
n = 10
rng = np.random.default_rng()
a_np = rng.random(n, dtype=np.float32)
b_np = rng.random(n, dtype=np.float32)
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a_g = cl.array.to_device(queue, a_np)
b_g = cl.array.to_device(queue, b_np)
lin_comb = ElementwiseKernel(ctx,
    "float k1, float *a_g, float k2, float *b_g, float *res_g",
    "res_g[i] = k1 * a_g[i] + k2 * b_g[i]",
    "lin_comb")
res_g = cl.array.empty_like(a_g)
lin_comb(2, a_g, 3, b_g, res_g)
# Check on GPU with PyOpenCL Array:
print((res_g - (2 * a_g + 3 * b_g)).get())
# Check on CPU with Numpy:
res_np = res_g.get()
print(res_np - (2 * a_np + 3 * b_np))
print(np.linalg.norm(res_np - (2 * a_np + 3 * b_np)))
(You can find this example as examples/demo_elementwise.py in the PyOpenCL distribution.)
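The warning about return can be illustrated with a host-side model. The sketch below is plain Python, not the code PyOpenCL generates. It assumes, as the warning implies, that one work item handles several elements in a strided loop. The names run_work_item and run_kernel, the work-item count, and the "skip negative inputs" operation are all hypothetical.

```python
import numpy as np

def run_work_item(gid, n_work_items, src, dst, early_exit):
    """Model one work item that handles elements gid, gid + n_work_items, ..."""
    for i in range(gid, len(src), n_work_items):
        if src[i] < 0:
            if early_exit == "return":
                return      # abandons every later element owned by this work item
            continue        # models PYOPENCL_ELWISE_CONTINUE: skips only element i
        dst[i] = 2 * src[i]

def run_kernel(src, n_work_items, early_exit):
    """Run all modeled work items one after another on the host."""
    dst = np.zeros_like(src)
    for gid in range(n_work_items):
        run_work_item(gid, n_work_items, src, dst, early_exit)
    return dst

src = np.array([1, -1, 2, 3, 4, 5, 6, 7])
with_continue = run_kernel(src, 4, "continue")
with_return = run_kernel(src, 4, "return")
```

In this model, work item 1 owns elements 1 and 5. With "return" it stops at the negative element 1, so element 5 is never written, whereas "continue" skips only element 1.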
Sums and counts (“reduce”)
- class pyopencl.reduction.ReductionKernel(ctx: Context, dtype_out: Any, neutral: str, reduce_expr: str, map_expr: str | None = None, arguments: str | List[DtypedArgument] | None = None, name: str = 'reduce_kernel', options: Any = None, preamble: str = '')
A kernel that performs a generic reduction on arrays.
Generate a kernel that takes a number of scalar or vector arguments (at least one vector argument), performs the map_expr on each entry of the vector argument and then the reduce_expr on the outcome of that. neutral serves as an initial value. preamble lets you insert preprocessor directives and other code (such as helper functions) before the actual reduction kernel code.
Vectors in map_expr should be indexed by the variable i. reduce_expr uses the formal values “a” and “b” to indicate two operands of a binary reduction operation. If you do not specify a map_expr, in[i] is automatically assumed and treated as the only input argument.
dtype_out specifies the numpy.dtype in which the reduction is performed and in which the result is returned. neutral is specified as a float or integer formatted as a string. reduce_expr and map_expr are specified as string-formatted operations, and arguments is specified as a string formatted as a C argument list. name specifies the name as which the kernel is compiled. options are passed unmodified to pyopencl.Program.build(). preamble specifies a string of code that is inserted before the actual kernels.
- __init__(ctx: Context, dtype_out: Any, neutral: str, reduce_expr: str, map_expr: str | None = None, arguments: str | List[DtypedArgument] | None = None, name: str = 'reduce_kernel', options: Any = None, preamble: str = '') None
- __call__(*args: Any, **kwargs: Any) Event
Invoke the generated kernel.
wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
With out the resulting single-entry pyopencl.array.Array can be specified. Because offsets are supported one can store results anywhere (e.g. out=a[3]).
Note
The returned pyopencl.Event corresponds only to part of the execution of the reduction. It is not suitable for profiling.
Added in version 2011.1.
Changed in version 2014.2: Added out parameter.
Changed in version 2016.2: range_ and slice_ added.
- Parameters:
range – A slice object. Specifies the range of indices on which the kernel will be executed. May not be given at the same time as slice.
slice – A slice object. Specifies the range of indices on which the kernel will be executed, relative to the first vector-like argument. May not be given at the same time as range.
return_event – a boolean flag used to return an event for the reduction.
- Returns:
the resulting scalar as a single-entry pyopencl.array.Array if return_event is False, otherwise a tuple (scalar_array, event).
Here’s a usage example:
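import numpy
import pyopencl
import pyopencl.array
from pyopencl.reduction import ReductionKernel
ctx = pyopencl.create_some_context()
queue = pyopencl.CommandQueue(ctx)
a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
krnl = ReductionKernel(ctx, numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]*y[i]",
        arguments="__global float *x, __global float *y")
my_dot_prod = krnl(a, b).get()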
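The roles of neutral, map_expr and reduce_expr can be modeled on the host. The sketch below is a sequential Python stand-in, not the kernel PyOpenCL generates. The helper reference_reduction is hypothetical, and float64 is used so that the result is exact.

```python
from functools import reduce

import numpy as np

def reference_reduction(neutral, reduce_fn, map_fn, n):
    """Fold reduce_fn over map_fn(i) for i in range(n), starting from neutral."""
    return reduce(reduce_fn, (map_fn(i) for i in range(n)), neutral)

x = np.arange(400, dtype=np.float64)
y = np.arange(400, dtype=np.float64)

dot = reference_reduction(
    neutral=0.0,
    reduce_fn=lambda a, b: a + b,      # plays the role of reduce_expr="a+b"
    map_fn=lambda i: x[i] * y[i],      # plays the role of map_expr="x[i]*y[i]"
    n=len(x),
)
```

A device reduction combines partial results in a different order than this left-to-right fold, so in float32 the two can differ by rounding.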
Prefix Sums (“scan”)
A prefix sum is a running sum of an array, as provided by e.g. numpy.cumsum():
>>> import numpy as np
>>> a = [1,1,1,1,1,2,2,2,2,2]
>>> np.cumsum(a)
array([ 1, 2, 3, 4, 5, 7, 9, 11, 13, 15])
This is a very simple example of what a scan can do. It turns out that scans are significantly more versatile. They are a basic building block of many non-trivial parallel algorithms. Many of the operations enabled by scans seem difficult to parallelize because of loop-carried dependencies.
See also
- Prefix sums and their applications, by Guy Blelloch.
This article gives an overview of some surprising applications of scans.
- Simple / Legacy Interface
These operations built into PyOpenCL are realized using GenericScanKernel.
Usage Example
This example illustrates the implementation of a simplified version of pyopencl.algorithm.copy_if(), which copies integers from an array into the (variable-size) output if they are greater than 300:
knl = GenericScanKernel(
        ctx, np.int32,
        arguments="__global int *ary, __global int *out",
        input_expr="(ary[i] > 300) ? 1 : 0",
        scan_expr="a+b", neutral="0",
        output_statement="""
            if (prev_item != item) out[item-1] = ary[i];
            """)
out = a.copy()
knl(a, out)
a_host = a.get()
out_host = a_host[a_host > 300]
assert (out_host == out.get()[:len(out_host)]).all()
The value being scanned over is an array of flags indicating whether each array element is greater than 300. These flags are computed by input_expr. The prefix sum over this array gives a running count of array items greater than 300. The output_statement then compares prev_item (the previous item’s scan result, i.e. index) to item (the current item’s scan result, i.e. index). If they differ, i.e. if the predicate was satisfied at this position, then the item is stored in the output at the computed index.
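The same flag, scan and scatter steps can be traced on the host with NumPy. This is a sequential sketch of the idea, not the kernel itself. The function reference_copy_if_gt and the sample data are hypothetical, and prev_item for the first element is assumed to equal the neutral element 0.

```python
import numpy as np

def reference_copy_if_gt(ary, threshold):
    """Host-side model of the scan-based compaction described above."""
    flags = (ary > threshold).astype(np.int32)   # what input_expr computes
    scanned = np.cumsum(flags)                   # inclusive scan with a+b
    out = np.zeros_like(ary)
    for i in range(len(ary)):
        item = scanned[i]
        prev_item = scanned[i - 1] if i > 0 else 0
        if prev_item != item:                    # the running count went up at i
            out[item - 1] = ary[i]               # so element i belongs in the output
    count = int(scanned[-1]) if len(ary) else 0
    return out, count

a_host = np.array([100, 400, 250, 301, 300, 999], dtype=np.int32)
out_host, count = reference_copy_if_gt(a_host, 300)
```

Only the first count entries of out_host are meaningful; the rest keep their initial value.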
This example does not make use of the following advanced features also available in PyOpenCL:
- Segmented scans
- Access to the previous item in input_expr (e.g. for comparisons). See the implementation of pyopencl.algorithm.unique() for an example.
Making Custom Scan Kernels
Added in version 2013.1.
- class pyopencl.scan.GenericScanKernel(ctx: Context, dtype: Any, arguments: str | List[DtypedArgument], input_expr: str, scan_expr: str, neutral: str | None, output_statement: str, is_segment_start_expr: str | None = None, input_fetch_exprs: List[Tuple[str, str, int]] | None = None, index_dtype: Any = None, name_prefix: str = 'scan', options: Any = None, preamble: str = '', devices: Device | None = None)
Generates and executes code that performs prefix sums (“scans”) on arbitrary types, with many possible tweaks.
Usage example:
from pyopencl.scan import GenericScanKernel
knl = GenericScanKernel(
        context, np.int32,
        arguments="__global int *ary",
        input_expr="ary[i]",
        scan_expr="a+b", neutral="0",
        output_statement="ary[i+1] = item;")
a = cl.array.arange(queue, 10000, dtype=np.int32)
knl(a, queue=queue)
- __init__(ctx: Context, dtype: Any, arguments: str | List[DtypedArgument], input_expr: str, scan_expr: str, neutral: str | None, output_statement: str, is_segment_start_expr: str | None = None, input_fetch_exprs: List[Tuple[str, str, int]] | None = None, index_dtype: Any = None, name_prefix: str = 'scan', options: Any = None, preamble: str = '', devices: Device | None = None) None
- Parameters:
ctx – a pyopencl.Context within which the code for this scan kernel will be generated.
dtype – the numpy.dtype with which the scan will be performed. May be a structured type if that type was registered through pyopencl.tools.get_or_register_dtype().
arguments – A string of comma-separated C argument declarations. If arguments is specified, then input_expr must also be specified. All types used here must be known to PyOpenCL (see pyopencl.tools.get_or_register_dtype()).
scan_expr –
The associative, binary operation carrying out the scan, represented as a C string. Its two arguments are available as a and b when it is evaluated. b is guaranteed to be the ‘element being updated’, and a is the increment. Thus, if some data is supposed to just propagate along without being modified by the scan, it should live in b.
This expression may call functions given in the preamble.
Another value available to this expression is across_seg_boundary, a C bool indicating whether this scan update is crossing a segment boundary, as defined by is_segment_start_expr. The scan routine does not implement segmentation semantics on its own. It relies on scan_expr to do this. This value is available (but always false) even for a non-segmented scan.
Note
In early pre-releases of the segmented scan, segmentation semantics were implemented without relying on scan_expr.
input_expr –
A C expression, encoded as a string, resulting in the values to which the scan is applied. This may be used to apply a mapping to values stored in arguments before being scanned. The result of this expression must match dtype. The index intended to be mapped is available as i in this expression. This expression may also use the variables defined by input_fetch_exprs.
This expression may also call functions given in the preamble.
output_statement –
a C statement that writes the output of the scan. It has access to the scan result as item, the preceding scan result item as prev_item, and the current index as i. prev_item in a segmented scan will be the neutral element at a segment boundary, not the immediately preceding item.
Using prev_item in output_statement has a small run-time cost. prev_item enables the construction of an exclusive scan.
For non-segmented scans, output_statement may also reference last_item, which evaluates to the scan result of the last array entry.
is_segment_start_expr –
A C expression, encoded as a string, resulting in a C bool value that determines whether a new scan segment starts at index i. If given, makes the scan a segmented scan. Has access to the current index i, the result of input_expr as a, and in addition may use arguments and input_fetch_exprs variables just like input_expr.
If it returns true, then previous sums will not spill over into the item with index i or subsequent items.
input_fetch_exprs –
a list of tuples (NAME, ARG_NAME, OFFSET). An entry here has the effect of doing the equivalent of the following before input_expr:
ARG_NAME_TYPE NAME = ARG_NAME[i+OFFSET];
OFFSET is allowed to be 0 or -1, and ARG_NAME_TYPE is the type of ARG_NAME.
preamble – A snippet of C that is inserted into the compiled kernel before the actual kernel function. May be used for, e.g. type definitions or include statements.
The first array in the argument list determines the size of the index space over which the scan is carried out, and thus the range of values taken by the index i, which occurs in several of the code fragments described above.
All code fragments further have access to N, the number of elements being processed in the scan.
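How scan_expr and across_seg_boundary together produce a segmented scan can be modeled sequentially. The sketch below is a host-side illustration, not PyOpenCL's parallel implementation. It assumes that in a left-to-right fold the flag is true exactly when element i starts a segment. The function names and the example expression "across_seg_boundary ? b : a+b" are chosen for illustration.

```python
def reference_segmented_scan(values, segment_starts, scan_fn):
    """Sequential model: fold scan_fn left to right, one element at a time."""
    results = []
    for i, b in enumerate(values):
        if i == 0:
            results.append(b)
            continue
        a = results[-1]                          # running result so far
        across_seg_boundary = segment_starts[i]  # what is_segment_start_expr yields
        results.append(scan_fn(a, b, across_seg_boundary))
    return results

def segmented_sum(a, b, across_seg_boundary):
    """Python counterpart of scan_expr="across_seg_boundary ? b : a+b"."""
    return b if across_seg_boundary else a + b

values = [1, 2, 3, 4, 5, 6]
segment_starts = [True, False, False, True, False, False]
scanned = reference_segmented_scan(values, segment_starts, segmented_sum)
plain = reference_segmented_scan(values, [False] * 6, segmented_sum)
```

The running sum restarts at index 3 because the scan expression discards a there. With the flag always false, the same expression yields an ordinary prefix sum.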
- __call__(*args: Any, **kwargs: Any) Event
Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
Note
The returned pyopencl.Event corresponds only to part of the execution of the scan. It is not suitable for profiling.
- Parameters:
queue – queue on which to execute the scan. If not given, the queue of the first pyopencl.array.Array in args is used.
allocator – an allocator for the temporary arrays and results. If not given, the allocator of the first pyopencl.array.Array in args is used.
size – specify the length of the scan to be carried out. If not given, this length is inferred from the first argument.
wait_for – a list of events to wait for.
Debugging aids
- class pyopencl.scan.GenericDebugScanKernel(ctx: Context, dtype: Any, arguments: str | List[DtypedArgument], input_expr: str, scan_expr: str, neutral: str | None, output_statement: str, is_segment_start_expr: str | None = None, input_fetch_exprs: List[Tuple[str, str, int]] | None = None, index_dtype: Any = None, name_prefix: str = 'scan', options: Any = None, preamble: str = '', devices: Device | None = None)
Performs the same function and has the same interface as GenericScanKernel, but uses a dead-simple, sequential scan. Works best on CPU platforms, and helps isolate bugs in scans by removing the potential for issues originating in parallel execution.
Simple / Legacy Interface
- class pyopencl.scan.ExclusiveScanKernel(ctx, dtype, scan_expr, neutral, name_prefix='scan', options=[], preamble='', devices=None)
Generates a kernel that can compute a prefix sum using any associative operation given as scan_expr. scan_expr uses the formal values “a” and “b” to indicate two operands of an associative binary operation. neutral is the neutral element of scan_expr, obeying scan_expr(a, neutral) == a.
dtype specifies the type of the arrays being operated on. name_prefix is used for kernel names to ensure recognizability in profiles and logs. options is a list of compiler options to use when building. preamble specifies a string of code that is inserted before the actual kernels. devices may be used to restrict the set of devices on which the kernel is meant to run (defaults to all devices in the context ctx).
- class pyopencl.scan.InclusiveScanKernel(ctx, dtype, scan_expr, neutral=None, name_prefix='scan', options=[], preamble='', devices=None)
Works like ExclusiveScanKernel.
Changed in version 2013.1: neutral is now always required.
For the array [1, 2, 3], inclusive scan results in [1, 3, 6], and exclusive scan results in [0, 1, 3].
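The difference between the two variants can be reproduced on the host with NumPy. This sketch covers only addition (scan_expr "a+b" with neutral 0); the function names are hypothetical.

```python
import numpy as np

def inclusive_scan(values):
    """Entry i holds the sum of values[0..i], including values[i]."""
    return np.cumsum(values)

def exclusive_scan(values, neutral=0):
    """Entry i holds the sum of values[0..i-1]; entry 0 is the neutral element."""
    values = np.asarray(values)
    result = np.full_like(values, neutral)
    result[1:] = np.cumsum(values)[:-1]   # shift the inclusive result right by one
    return result

data = np.array([1, 2, 3])
inclusive = inclusive_scan(data)
exclusive = exclusive_scan(data)
```

The exclusive result is the inclusive result shifted right by one position, with the neutral element filling the first slot.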
Here’s a usage example:
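import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.scan import InclusiveScanKernel
context = cl.create_some_context()
queue = cl.CommandQueue(context)
knl = InclusiveScanKernel(context, np.int32, "a+b")
n = 2**20-2**18+5
rng = np.random.default_rng(seed=42)
host_data = rng.integers(0, 10, size=n, dtype=np.int32)
dev_data = cl_array.to_device(queue, host_data)
knl(dev_data)
assert (dev_data.get() == np.cumsum(host_data, axis=0)).all()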
Predicated copies (“partition”, “unique”, …)
- pyopencl.algorithm.copy_if(ary, predicate, extra_args=None, preamble='', queue=None, wait_for=None)
Copy the elements of ary satisfying predicate to an output array.
- Parameters:
predicate – a C expression evaluating to a bool, represented as a string. The value to test is available as ary[i], and if the expression evaluates to true, then this value ends up in the output.
extra_args – a list of tuples (name, value) specifying extra arguments to pass to the scan procedure. For version 2013.1, value must be of a numpy sized scalar type. As of version 2013.2, value may also be a pyopencl.array.Array.
preamble – A snippet of C that is inserted into the compiled kernel before the actual kernel function. May be used for, e.g. type definitions or include statements.
wait_for – wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
- Returns:
a tuple (out, count, event) where out is the output array, count is an on-device scalar (fetch to host with count.get()) indicating how many elements satisfied predicate, and event is a pyopencl.Event for dependency management. out is allocated to the same length as ary, but only the first count entries carry meaning.
Added in version 2013.1.
- pyopencl.algorithm.remove_if(ary, predicate, extra_args=None, preamble='', queue=None, wait_for=None)
Copy the elements of ary not satisfying predicate to an output array.
- Parameters:
predicate – a C expression evaluating to a bool, represented as a string. The value to test is available as ary[i], and if the expression evaluates to false, then this value ends up in the output.
extra_args – a list of tuples (name, value) specifying extra arguments to pass to the scan procedure. For version 2013.1, value must be of a numpy sized scalar type. As of version 2013.2, value may also be a pyopencl.array.Array.
preamble – A snippet of C that is inserted into the compiled kernel before the actual kernel function. May be used for, e.g. type definitions or include statements.
wait_for – wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
- Returns:
a tuple (out, count, event) where out is the output array, count is an on-device scalar (fetch to host with count.get()) indicating how many elements did not satisfy predicate, and event is a pyopencl.Event for dependency management.
Added in version 2013.1.
- pyopencl.algorithm.partition(ary, predicate, extra_args=None, preamble='', queue=None, wait_for=None)
Copy the elements of ary into one of two arrays depending on whether they satisfy predicate.
- Parameters:
predicate – a C expression evaluating to a bool, represented as a string. The value to test is available as ary[i].
extra_args – a list of tuples (name, value) specifying extra arguments to pass to the scan procedure. For version 2013.1, value must be of a numpy sized scalar type. As of version 2013.2, value may also be a pyopencl.array.Array.
preamble – A snippet of C that is inserted into the compiled kernel before the actual kernel function. May be used for, e.g. type definitions or include statements.
wait_for – wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
- Returns:
a tuple (out_true, out_false, count, event) where count is an on-device scalar (fetch to host with count.get()) indicating how many elements satisfied the predicate, and event is a pyopencl.Event for dependency management.
Added in version 2013.1.
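The relationship between the outputs of partition can be modeled on the host with NumPy boolean masks. This is a reference sketch, not the device implementation. It assumes that both outputs are allocated at the length of ary (as stated above for copy_if) and that the relative order of elements is preserved. The name reference_partition and the sample data are hypothetical.

```python
import numpy as np

def reference_partition(ary, predicate):
    """Host-side model: split ary by a vectorized boolean predicate."""
    mask = predicate(ary)
    count = int(np.count_nonzero(mask))
    out_true = np.zeros_like(ary)
    out_false = np.zeros_like(ary)
    out_true[:count] = ary[mask]                  # what copy_if would keep
    out_false[:len(ary) - count] = ary[~mask]     # what remove_if would keep
    return out_true, out_false, count

ary = np.array([5, 12, 7, 30, 1, 18], dtype=np.int32)
out_true, out_false, count = reference_partition(ary, lambda v: v > 10)
```

Here count tells how many leading entries of out_true are meaningful, and len(ary) - count does the same for out_false.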
- pyopencl.algorithm.unique(ary, is_equal_expr='a == b', extra_args=None, preamble='', queue=None, wait_for=None)
Copy the elements of ary into the output if is_equal_expr, applied to the array element and its predecessor, yields false.
Works like the UNIX command uniq, with a potentially custom comparison. This operation is often used on sorted sequences.
- Parameters:
is_equal_expr – a C expression evaluating to a bool, represented as a string. The elements being compared are available as a and b. If this expression yields false, the two are considered distinct.
extra_args – a list of tuples (name, value) specifying extra arguments to pass to the scan procedure. For version 2013.1, value must be of a numpy sized scalar type. As of version 2013.2, value may also be a pyopencl.array.Array.
preamble – A snippet of C that is inserted into the compiled kernel before the actual kernel function. May be used for, e.g. type definitions or include statements.
wait_for – wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
- Returns:
a tuple (out, count, event) where out is the output array, count is an on-device scalar (fetch to host with count.get()) indicating how many elements satisfied the predicate, and event is a pyopencl.Event for dependency management.
Added in version 2013.1.
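The uniq-like behavior can be sketched on the host. This is a sequential Python model, not the scan-based device code. It assumes the first element is always kept (it has no predecessor), and which of the two compared elements is bound to a and which to b is also an assumption, which does not matter for a symmetric comparison. The name reference_unique is hypothetical.

```python
import numpy as np

def reference_unique(ary, is_equal=lambda a, b: a == b):
    """Keep each element that differs from its immediate predecessor."""
    kept = [ary[i] for i in range(len(ary))
            if i == 0 or not is_equal(ary[i], ary[i - 1])]
    return np.array(kept, dtype=ary.dtype), len(kept)

ary = np.array([1, 1, 2, 2, 2, 3, 1, 1], dtype=np.int32)
out, count = reference_unique(ary)

# On sorted input, adjacent deduplication yields the set of distinct values.
sorted_out, sorted_count = reference_unique(np.sort(ary))
```

As with uniq, only adjacent duplicates collapse, so the trailing run of 1s in the unsorted input survives as a second 1. Sorting first removes that effect.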
Sorting (radix sort)
- class pyopencl.algorithm.RadixSort(context, arguments, key_expr, sort_arg_names, bits_at_a_time=2, index_dtype=<class 'numpy.int32'>, key_dtype=<class 'numpy.uint32'>, scan_kernel=<class 'pyopencl.scan.GenericScanKernel'>, options=None)
Provides a general radix sort on the compute device.
Added in version 2013.1.
- __call__(*args, **kwargs)
Run the radix sort. In addition to args which must match the arguments specification on the constructor, the following keyword arguments are supported:
- Parameters:
key_bits – specify how many bits (starting from least-significant) there are in the key.
allocator – See the allocator argument of pyopencl.array.empty().
queue – A pyopencl.CommandQueue, defaulting to the one from the first argument array.
wait_for – wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
- Returns:
A tuple (sorted, event). sorted consists of sorted copies of the arrays named in sort_arg_names, in the order of that list. event is a pyopencl.Event for dependency management.
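The roles of key_bits and bits_at_a_time can be illustrated with a least-significant-digit radix sort on the host. The sketch below is a NumPy model of the general technique, not PyOpenCL's device code. The function reference_radix_sort is hypothetical, and each pass uses NumPy's stable argsort as a stand-in for the per-digit reordering.

```python
import numpy as np

def reference_radix_sort(keys, key_bits, bits_at_a_time=2):
    """LSD radix sort: one stable pass per group of bits_at_a_time key bits."""
    keys = np.asarray(keys, dtype=np.uint32)
    order = np.arange(len(keys))
    mask = (1 << bits_at_a_time) - 1
    for shift in range(0, key_bits, bits_at_a_time):
        digits = (keys[order] >> shift) & mask
        # A stable sort on the current digit preserves the order from earlier passes.
        order = order[np.argsort(digits, kind="stable")]
    return keys[order], order

keys = np.array([13, 2, 7, 2, 0, 15, 8], dtype=np.uint32)
sorted_keys, order = reference_radix_sort(keys, key_bits=4)

# The permutation can reorder companion arrays in the same way.
payload = np.array(list("abcdefg"))
sorted_payload = payload[order]
```

With key_bits=4 and two bits per pass, two passes suffice for keys below 16. Key bits above key_bits are never inspected, so keys that differ only in those bits are not ordered with respect to each other.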
Building many variable-size lists
- class pyopencl.algorithm.ListOfListsBuilder(context, list_names_and_dtypes, generate_template, arg_decls, count_sharing=None, devices=None, name_prefix='plb_build_list', options=None, preamble='', debug=False, complex_kernel=False, eliminate_empty_output_lists=False)
Generates and executes code to produce a large number of variable-size lists, simply.
Note
This functionality is provided as a preview. Its interface is subject to change until this notice is removed.
Added in version 2013.1.
Here’s a usage example:
from pyopencl.algorithm import ListOfListsBuilder
builder = ListOfListsBuilder(context, [("mylist", np.int32)], """
        void generate(LIST_ARG_DECL USER_ARG_DECL index_type i)
        {
            int count = i % 4;
            for (int j = 0; j < count; ++j)
            {
                APPEND_mylist(count);
            }
        }
        """, arg_decls=[])
result, event = builder(queue, 2000)
inf = result["mylist"]
assert inf.count == 3000
assert (inf.lists.get()[-6:] == [1, 2, 2, 3, 3, 3]).all()
The function generate above is called once for each “input object”. Each input object can then generate zero or more list entries. The number of these input objects is given to __call__() as n_objects. List entries are generated by calls to APPEND_<list name>(value). Multiple lists may be generated at once.
- __init__(context, list_names_and_dtypes, generate_template, arg_decls, count_sharing=None, devices=None, name_prefix='plb_build_list', options=None, preamble='', debug=False, complex_kernel=False, eliminate_empty_output_lists=False)
- Parameters:
context – A pyopencl.Context.
list_names_and_dtypes – a list of (name, dtype) tuples indicating the lists to be built.
generate_template – a snippet of C as described below.
arg_decls – A string of comma-separated C argument declarations.
count_sharing – A mapping consisting of (child, mother) indicating that mother and child will always have the same number of indices, and the APPEND to mother will always happen before the APPEND to the child.
name_prefix – the name prefix to use for the compiled kernels.
options – OpenCL compilation options for kernels using generate_template.
complex_kernel – If True, prevents vectorization on CPUs.
eliminate_empty_output_lists – A Python list of list names for which the empty output lists are eliminated.
generate_template may use the following C macros/identifiers:
- index_type: expands to the C identifier for the index type used for the calculation
- USER_ARG_DECL: expands to the C declarator for arg_decls
- USER_ARGS: a list of C argument values corresponding to USER_ARG_DECL
- LIST_ARG_DECL: expands to a C argument list representing the data for the output lists. These are prefixed with "plg_" so as to not interfere with user-provided names.
- LIST_ARGS: a list of C argument values corresponding to LIST_ARG_DECL
- APPEND_name(entry): inserts entry into the list name. entry must be a valid C expression of the correct type.
All argument-list related macros have a trailing comma included if they are non-empty.
generate_template must supply a function:
void generate(USER_ARG_DECL LIST_ARG_DECL index_type i)
{
    APPEND_mylist(5);
}
Internally, the generate_template is expanded (at least) twice. Once, for a ‘counting’ stage where the size of all the lists is determined, and a second time, for a ‘generation’ stage where the lists are actually filled. A generate function that has side effects beyond calling APPEND_name is therefore ill-formed.
Changed in version 2018.1: Change eliminate_empty_output_lists argument type from bool to list.
- __call__(queue, n_objects, *args, **kwargs)
- Parameters:
args – arguments corresponding to arg_decls in the constructor. Array-like arguments must be either 1D pyopencl.array.Array objects or pyopencl.MemoryObject objects, of which the latter can be obtained from a pyopencl.array.Array using the pyopencl.array.Array.data attribute.
allocator – optionally, the allocator to use to allocate new arrays.
omit_lists – an iterable of list names that should not be built with this invocation. The kernel code may not call APPEND_name for these omitted lists. If it does, undefined behavior will result. The returned lists dictionary will not contain an entry for names in omit_lists.
wait_for – wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.
- Returns:
a tuple (lists, event), where lists is a mapping from (built) list names to objects which have attributes
- count for the total number of entries in all lists combined.
- lists for the array containing all lists.
- starts for the array of starting indices in lists. starts is built so that it has n+1 entries, so that the i’th entry is the start of the i’th list, and the (i+1)’th entry is the index one past the i’th list’s end, even for the last list.
This implies that all lists are contiguous.
If the list name is specified in the eliminate_empty_output_lists constructor argument, lists has two additional attributes, num_nonempty_lists and nonempty_indices:
- num_nonempty_lists for the number of nonempty lists.
- nonempty_indices for the index of each nonempty list in the input objects.
In this case, starts has num_nonempty_lists + 1 entries. The i’th entry is the start of the i’th nonempty list, which is generated by the object with index nonempty_indices[i].
event is a pyopencl.Event for dependency management.
Changed in version 2016.2: Added omit_lists.
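The two-stage expansion and the layout of starts and lists can be modeled on the host. The sketch below is a sequential Python illustration of the counting and generation stages for the i % 4 example above, not the code PyOpenCL generates. The function reference_build_lists and the append callback are hypothetical.

```python
import numpy as np

def generate(i, append):
    """Python counterpart of the C generate() in the usage example."""
    count = i % 4
    for _ in range(count):
        append(count)

def reference_build_lists(n_objects):
    # Counting stage: run generate() only to learn each list's length.
    counts = np.zeros(n_objects, dtype=np.int64)
    for i in range(n_objects):
        def count_only(_value, i=i):
            counts[i] += 1
        generate(i, count_only)

    # starts has n_objects + 1 entries; starts[i + 1] is one past list i's end.
    starts = np.concatenate(([0], np.cumsum(counts)))

    # Generation stage: run generate() again, now writing the entries.
    lists = np.empty(starts[-1], dtype=np.int32)
    for i in range(n_objects):
        cursor = [starts[i]]
        def write(value, cursor=cursor):
            lists[cursor[0]] = value
            cursor[0] += 1
        generate(i, write)
    return starts, lists

starts, lists = reference_build_lists(2000)
list_7 = lists[starts[7]:starts[8]]   # entries produced by input object 7
```

Because generate runs once per stage, a side effect other than appending would be applied twice, which is why such functions are ill-formed.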
Bitonic Sort
- class pyopencl.bitonic_sort.BitonicSort(context)
Sort an array (or one axis of one) using a sorting network.
Will only work if the axis of the array to be sorted has a length that is a power of 2.
Added in version 2015.2.
- __call__(arr, idx=None, queue=None, wait_for=None, axis=0)
- Parameters:
arr – the array to be sorted. Will be overwritten with the sorted array.
idx – an array of indices to be tracked along with the sorting of arr
queue – a pyopencl.CommandQueue, defaults to the array’s queue if None
wait_for – a list of pyopencl.Event instances or None
axis – the axis of the array by which to sort
- Returns:
a tuple (sorted_array, event)
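The power-of-2 restriction follows from the structure of the sorting network. The sketch below is a textbook bitonic sorting network in sequential Python, offered as an illustration of the technique. It is not the kernel that BitonicSort runs, and the function name bitonic_sort is hypothetical.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of 2 with a bitonic network."""
    a = list(values)
    n = len(a)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of 2")
    k = 2
    while k <= n:               # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:            # distance between compared elements
            for i in range(n):  # these compare-exchanges are independent
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending and a[i] > a[partner]:
                        a[i], a[partner] = a[partner], a[i]
                    elif not ascending and a[i] < a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

result = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
```

The partner index i ^ j only stays within the array for every i when the length is a power of 2. Within one (k, j) stage no element takes part in more than one comparison, which is what makes the network suitable for parallel execution.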