OpenCL Runtime: Programs and Kernels

Program

class pyopencl.Program(context, src)
class pyopencl.Program(context, devices, binaries)

binaries must contain one binary for each entry in devices. If src is a bytes object starting with a valid SPIR-V magic number, it will be handed off to the OpenCL implementation as such, rather than as OpenCL C source code. (SPIR-V support requires OpenCL 2.1.)
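The SPIR-V check described above can be sketched in plain Python. The helper looks_like_spirv is hypothetical and is not PyOpenCL's own code. It assumes only that a SPIR-V module begins with the 32-bit magic word 0x07230203, which may be stored in either byte order.

```python
SPIRV_MAGIC = 0x07230203  # first 32-bit word of every SPIR-V module


def looks_like_spirv(src):
    """Hypothetical helper: True if src is bytes-like and starts with the
    SPIR-V magic word in little- or big-endian byte order."""
    if not isinstance(src, (bytes, bytearray)) or len(src) < 4:
        # str (OpenCL C source) and very short inputs are never SPIR-V
        return False
    head = bytes(src[:4])
    return head in (SPIRV_MAGIC.to_bytes(4, "little"),
                    SPIRV_MAGIC.to_bytes(4, "big"))


print(looks_like_spirv("__kernel void f() {}"))           # False: OpenCL C source
print(looks_like_spirv(b"\x03\x02\x23\x07" + bytes(16)))  # True: little-endian header
```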

Changed in version 2016.2: Add support for SPIR-V.

info

Lower case versions of the program_info constants may be used as attributes on instances of this class to directly query info attributes.

get_info(param)

See program_info for values of param.

get_build_info(device, param)

See program_build_info for values of param.

build(options=[], devices=None, cache_dir=None)

options is a string of compiler flags. Returns self.

If cache_dir is not None, built binaries are cached in an on-disk cache at the given path. If cache_dir is None but the context of this program was created with a cache_dir that is not None, that directory is used instead. If both are None, built binaries are cached in an on-disk cache called pyopencl-compiler-cache-vN-uidNAME-pyVERSION in the directory returned by tempfile.gettempdir(). Setting the environment variable PYOPENCL_NO_CACHE to any non-empty value suppresses this caching. Any options found in the environment variable PYOPENCL_BUILD_OPTIONS are appended to options.

Changed in version 2011.1: options may now also be a list of str.

Changed in version 2013.1: Added PYOPENCL_NO_CACHE. Added PYOPENCL_BUILD_OPTIONS.
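The cache-directory precedence and option handling described for build() can be modeled as two small functions. Both helpers are hypothetical sketches of the documented behavior, not PyOpenCL internals. Three assumptions apply: PYOPENCL_NO_CACHE overrides every cache_dir setting, the environment options are split shell-style, and the default cache name keeps the documented template's placeholders unfilled.

```python
import os
import shlex
import tempfile

# Placeholder for the default cache name; PyOpenCL fills in N, NAME and VERSION.
DEFAULT_CACHE_NAME = "pyopencl-compiler-cache-vN-uidNAME-pyVERSION"


def resolve_cache_dir(cache_dir, context_cache_dir, env=None):
    """Hypothetical helper mirroring the documented lookup order.

    Returns the directory to cache binaries in, or None if caching is off.
    Assumption: PYOPENCL_NO_CACHE overrides every other setting.
    """
    env = os.environ if env is None else env
    if env.get("PYOPENCL_NO_CACHE"):
        return None                      # any non-empty value disables caching
    if cache_dir is not None:
        return cache_dir                 # explicit argument wins
    if context_cache_dir is not None:
        return context_cache_dir         # fall back to the context's setting
    return os.path.join(tempfile.gettempdir(), DEFAULT_CACHE_NAME)


def effective_build_options(options, env=None):
    """Hypothetical helper: accept a str or a list of str (see 2011.1) and
    append PYOPENCL_BUILD_OPTIONS, split shell-style (an assumption)."""
    env = os.environ if env is None else env
    opts = [options] if isinstance(options, str) else list(options)
    opts.extend(shlex.split(env.get("PYOPENCL_BUILD_OPTIONS", "")))
    return " ".join(o for o in opts if o)


print(resolve_cache_dir(None, "/tmp/ctx-cache", env={}))  # /tmp/ctx-cache
print(effective_build_options(
    ["-DN=4"], env={"PYOPENCL_BUILD_OPTIONS": "-cl-fast-relaxed-math"}))
```

Passing env explicitly keeps the sketch testable. Omitting it reads the real process environment.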

compile(self, options=[], devices=None, headers=[])
Parameters: headers – a list of tuples (name, program).

Only available with CL 1.2.

New in version 2011.2.

kernel_name

You may use program.kernel_name to obtain a Kernel object from a program. Note that every lookup of this type produces a new kernel object, so the following will not work:

prg.sum.set_args(a_g, b_g, res_g)
ev = cl.enqueue_nd_range_kernel(queue, prg.sum, a_np.shape, None)

Instead, either use the (recommended, stateless) calling interface:

prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)

or keep the kernel in a temporary variable:

sum_knl = prg.sum
sum_knl.set_args(a_g, b_g, res_g)
ev = cl.enqueue_nd_range_kernel(queue, sum_knl, a_np.shape, None)

Note that the Program has to be built (see build()) in order for this to work simply by attribute lookup.
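The pitfall above happens because attribute lookup builds a fresh object each time. The sketch below reproduces it with hypothetical stand-in classes (FakeProgram, FakeKernel), so it runs without an OpenCL device. It assumes only what the text states: each lookup returns a new kernel.

```python
class FakeKernel:
    """Stand-in for pyopencl.Kernel that just remembers its arguments."""

    def __init__(self, name):
        self.name = name
        self.args = None

    def set_args(self, *args):
        self.args = args


class FakeProgram:
    """Stand-in for a built Program: every kernel lookup makes a new object."""

    def __init__(self, kernel_names):
        self._kernel_names = frozenset(kernel_names)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in self.__dict__.get("_kernel_names", ()):
            return FakeKernel(name)
        raise AttributeError(name)


prg = FakeProgram(["sum"])

# Broken pattern: set_args() configures a kernel that is immediately discarded.
prg.sum.set_args("a_g", "b_g", "res_g")
print(prg.sum.args)   # None: this lookup created yet another kernel

# Working pattern: keep one kernel object in a variable.
sum_knl = prg.sum
sum_knl.set_args("a_g", "b_g", "res_g")
print(sum_knl.args)   # ('a_g', 'b_g', 'res_g')
```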

Note

The program_info attributes live in the same name space and take precedence over Kernel names.
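A lookup order consistent with this note treats info attributes as ordinary attributes and looks up kernels only as a fallback, when ordinary lookup fails. The stand-in class below is hypothetical. It uses num_kernels, a program_info query, as the name that collides with a kernel. A kernel shadowed this way can still be obtained explicitly with pyopencl.Kernel(program, name).

```python
class ShadowingProgram:
    """Hypothetical stand-in: info attributes take precedence over kernels."""

    def __init__(self, kernel_names):
        self._kernel_names = frozenset(kernel_names)

    @property
    def num_kernels(self):
        # Plays the role of a program_info attribute.
        return len(self._kernel_names)

    def __getattr__(self, name):
        # Fallback only: runs after properties and methods have been checked.
        if name in self.__dict__.get("_kernel_names", ()):
            return ("kernel", name)
        raise AttributeError(name)


prg = ShadowingProgram(["num_kernels", "sum"])
print(prg.num_kernels)  # 2: the info attribute wins over the kernel
print(prg.sum)          # ('kernel', 'sum')
```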

all_kernels()

Returns a list of all Kernel objects in the Program.

static from_int_ptr(int_ptr_value, retain=True)

Constructs a pyopencl handle from a C-level pointer (given as the integer int_ptr_value). If retain is True (the default), pyopencl will call clRetainXXX on the provided object. If the previous owner of the object will not release the reference, retain should be set to False, to effectively transfer ownership to pyopencl.

Changed in version 2016.1: retain added
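The effect of retain on reference counts can be modeled with a plain counter. FakeCLObject and FakeHandle below are hypothetical. The increment stands in for clRetainXXX, and the decrement stands in for the release performed when the pyopencl wrapper goes away. In both scenarios the object ends with no outstanding references.

```python
class FakeCLObject:
    """Stand-in for a C-level OpenCL object with a reference count."""

    def __init__(self):
        self.refcount = 1  # the creator (the "previous owner") holds one reference


class FakeHandle:
    """Stand-in for a pyopencl wrapper built by from_int_ptr()."""

    def __init__(self, obj, retain=True):
        self.obj = obj
        if retain:
            obj.refcount += 1  # models clRetainXXX

    def release(self):
        self.obj.refcount -= 1  # models the release done when the wrapper dies


# Shared ownership: the previous owner keeps its reference, so retain=True.
shared = FakeCLObject()
h1 = FakeHandle(shared, retain=True)
h1.release()
shared.refcount -= 1     # the previous owner releases its own reference
print(shared.refcount)   # 0: freed exactly once

# Ownership transfer: the previous owner never releases, so retain=False.
handed_over = FakeCLObject()
h2 = FakeHandle(handed_over, retain=False)
h2.release()
print(handed_over.refcount)  # 0: no leaked reference
```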

int_ptr

Instances of this class are hashable, and two instances of this class may be compared using "==" and "!=". (Hashability was added in version 2011.2.) Two objects are considered the same if the underlying OpenCL object is the same, as established by C pointer equality.
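Comparing by pointer means that two distinct Python wrappers around the same OpenCL object, for example one obtained via from_int_ptr(), compare equal and hash alike. The class below is a hypothetical stand-in that implements that rule. Its int_ptr is just an integer, not a real handle.

```python
class PointerIdentity:
    """Hypothetical stand-in: equality and hashing by C pointer value."""

    def __init__(self, int_ptr):
        self.int_ptr = int_ptr

    def __eq__(self, other):
        if not isinstance(other, PointerIdentity):
            return NotImplemented
        return self.int_ptr == other.int_ptr

    def __hash__(self):
        return hash(self.int_ptr)

    # Python 3 derives != from __eq__ automatically.


a = PointerIdentity(0x1000)
b = PointerIdentity(0x1000)  # a second wrapper around the same C object
c = PointerIdentity(0x2000)

print(a == b, a != c)        # True True
print(len({a, b, c}))        # 2: a and b collapse into one set entry
```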

pyopencl.create_program_with_built_in_kernels(context, devices, kernel_names)

Only available with CL 1.2.

New in version 2011.2.

pyopencl.unload_platform_compiler(platform)

Only available with CL 1.2.

New in version 2011.2.

Kernel

class pyopencl.Kernel(program, name)

info

Lower case versions of the kernel_info constants may be used as attributes on instances of this class to directly query info attributes.

get_info(param)

See kernel_info for values of param.

get_work_group_info(param, device)

See kernel_work_group_info for values of param.

get_arg_info(arg_index, param)

See kernel_arg_info for values of param.

Only available in OpenCL 1.2 and newer.

set_arg(self, index, arg)

arg may be

  • None: This may be passed for __global memory references to pass a NULL pointer to the kernel.

  • Anything that satisfies the Python buffer interface, in particular numpy.ndarray, bytes, or numpy's sized scalars, such as numpy.int32 or numpy.float64.

    Note

    Python's own int or float objects will not work out of the box. See Kernel.set_scalar_arg_dtypes() for a way to make them work. Alternatively, the standard library module struct can be used to convert Python's native number types to binary data in a bytes object.

  • An instance of MemoryObject (e.g. Buffer, Image).

  • An instance of LocalMemory.

  • An instance of Sampler.
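The struct route mentioned in the note turns a Python number into the raw bytes of a sized C scalar. This sketch assumes the host and device share the host's byte order (the "=" prefix) and that the kernel parameters are int and float, which are 4 bytes each in OpenCL C. The resulting bytes object satisfies the buffer interface, so it could be passed to set_arg(). numpy.int32(42) and numpy.float32(1.5) are the equivalent numpy route. The helper names are hypothetical.

```python
import struct
import sys


# "=" selects native byte order with standard sizes and no padding:
# "i" -> 4-byte signed int, "f" -> 4-byte IEEE float.
def as_cl_int(value):
    return struct.pack("=i", value)


def as_cl_float(value):
    return struct.pack("=f", value)


n_bytes = as_cl_int(42)
x_bytes = as_cl_float(1.5)

print(len(n_bytes), len(x_bytes))                                # 4 4
print(n_bytes == (42).to_bytes(4, sys.byteorder, signed=True))  # True
print(struct.unpack("=f", x_bytes)[0])                          # 1.5
```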

set_args(self, *args)

Invoke set_arg() on each element of args in turn.

New in version 0.92.

set_scalar_arg_dtypes(arg_dtypes)

Inform the wrapper about the sized types of scalar Kernel arguments. arg_dtypes must contain one entry per kernel argument. For non-scalars, the entry must be None. For scalars, it must be an object acceptable to the numpy.dtype constructor, indicating that the corresponding scalar argument is of that type.

After invoking this function with the proper information, most suitable number types will automatically be cast to the right type for kernel invocation.
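Here is a rough model of what the dtype information enables. Each scalar argument is converted to its declared size, and non-scalars (entries of None) pass through untouched. With PyOpenCL itself the call would look like knl.set_scalar_arg_dtypes([None, numpy.int32, numpy.float32]). In the hypothetical helper below, struct format characters stand in for numpy dtypes, so the sketch needs only the standard library.

```python
import struct


def cast_scalar_args(arg_fmts, args):
    """Hypothetical stand-in for the casting enabled by set_scalar_arg_dtypes().

    arg_fmts has one entry per kernel argument: None for non-scalars,
    otherwise a struct format character standing in for a numpy dtype.
    """
    if len(arg_fmts) != len(args):
        raise ValueError("need exactly one dtype entry per kernel argument")
    converted = []
    for fmt, arg in zip(arg_fmts, args):
        if fmt is None:
            converted.append(arg)  # buffers, LocalMemory, etc.: unchanged
        else:
            converted.append(struct.pack("=" + fmt, arg))  # sized scalar bytes
    return converted


buf = object()  # placeholder for a Buffer argument
packed = cast_scalar_args([None, "i", "f"], [buf, 3, 1.5])
print(packed[0] is buf, len(packed[1]), len(packed[2]))  # True 4 4
```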

Note

The information set by this routine is attached to a single kernel instance. A new kernel instance is created every time you use program.kernel attribute access. The following will therefore not work:

prg = cl.Program(...).build()
prg.kernel.set_scalar_arg_dtypes(...)
prg.kernel(queue, n_globals, None, args)

__call__(queue, global_size, local_size, *args, global_offset=None, wait_for=None, g_times_l=False)

Set each argument in turn using set_args(), then enqueue a kernel execution using enqueue_nd_range_kernel(). See the documentation for set_arg() for the allowed argument types. Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.

None may be passed for local_size.

If g_times_l is specified, the global size will be multiplied by the local size, which makes the behavior more like Nvidia CUDA. In this case, global_size and local_size also do not have to have the same number of dimensions.
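The g_times_l arithmetic can be sketched as follows. The helper g_times_l_sizes is hypothetical. Padding the shorter tuple with 1s is an assumption about how mismatched dimensions are reconciled. In CUDA terms, global_size then plays the role of the grid dimensions (the number of work-groups) and local_size the role of the block dimensions.

```python
def g_times_l_sizes(global_size, local_size):
    """Hypothetical helper: return (global, local) as used with g_times_l=True.

    Assumption: missing trailing dimensions are treated as 1.
    """
    work_dim = max(len(global_size), len(local_size))
    groups = tuple(global_size) + (1,) * (work_dim - len(global_size))
    loc = tuple(local_size) + (1,) * (work_dim - len(local_size))
    # Each dimension: number of work-groups times work-group size.
    return tuple(g * n for g, n in zip(groups, loc)), loc


print(g_times_l_sizes((4,), (16, 2)))    # ((64, 2), (16, 2))
print(g_times_l_sizes((8, 8), (32, 4)))  # ((256, 32), (32, 4))
```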

Note

__call__() is not thread-safe. It sets the arguments using set_args() and then runs enqueue_nd_range_kernel(). Another thread could race it by doing the same, with undefined results. This issue is inherited from the C-level OpenCL API. The recommended solution is to create a separate kernel (i.e. access prg.kernel_name, which creates a new kernel object) for every thread that may enqueue calls to the kernel.

A solution involving implicit locks was discussed and decided against on the mailing list in October 2012.
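The per-thread approach recommended above can be expressed with threading.local. Each thread creates its own kernel object through attribute lookup and then reuses it. The helper get_thread_kernel and the stand-in _FakeProgram are hypothetical. With PyOpenCL, program would be a built Program, which is hashable and can therefore serve as part of the cache key.

```python
import threading


class _PerThreadKernels(threading.local):
    def __init__(self):
        # threading.local runs __init__ once in every thread that touches it.
        self.cache = {}


_kernels = _PerThreadKernels()


def get_thread_kernel(program, name):
    """Hypothetical helper: one kernel object per (thread, program, name)."""
    key = (program, name)
    if key not in _kernels.cache:
        _kernels.cache[key] = getattr(program, name)  # new kernel per lookup
    return _kernels.cache[key]


class _FakeProgram:
    """Stand-in whose attribute lookups, like prg.kernel_name, return new objects."""

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        return object()


prg = _FakeProgram()
main_knl = get_thread_kernel(prg, "sum")
worker_knls = []
worker = threading.Thread(
    target=lambda: worker_knls.append(get_thread_kernel(prg, "sum")))
worker.start()
worker.join()

print(main_knl is get_thread_kernel(prg, "sum"))  # True: reused within a thread
print(main_knl is worker_knls[0])                 # False: each thread has its own
```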

Changed in version 0.92: local_size was promoted to third positional argument from being a keyword argument. The old keyword argument usage will continue to be accepted with a warning throughout the 0.92 release cycle. This is a backward-compatible change (just barely!) because local_size as third positional argument can only be a tuple or None. tuple instances are never valid Kernel arguments, and None is valid as an argument, but its treatment in the wrapper had a bug (now fixed) that prevented it from working.

Changed in version 2011.1: Added the g_times_l keyword arg.

capture_call(filename, queue, global_size, local_size, *args, global_offset=None, wait_for=None, g_times_l=False)

This method supports the exact same interface as __call__(), but instead of invoking the kernel, it writes a self-contained PyOpenCL program to filename that reproduces this invocation. Data and kernel source code will be packaged up in filename's source code.

This is mainly intended as a debugging aid. For example, it can be used to automate the task of creating a small, self-contained test case for an observed problem. It can also help separate a misbehaving kernel from a potentially large or time-consuming outer code.

To use, simply change:

evt = my_kernel(queue, gsize, lsize, arg1, arg2, ...)

to:

evt = my_kernel.capture_call("bug.py", queue, gsize, lsize, arg1, arg2, ...)

New in version 2013.1.

from_int_ptr(int_ptr_value, retain=True)

Constructs a pyopencl handle from a C-level pointer (given as the integer int_ptr_value). If retain is True (the default), pyopencl will call clRetainXXX on the provided object. If the previous owner of the object will not release the reference, retain should be set to False, to effectively transfer ownership to pyopencl.

Changed in version 2016.1: retain added

int_ptr

Instances of this class are hashable, and two instances of this class may be compared using "==" and "!=". (Hashability was added in version 2011.2.) Two objects are considered the same if the underlying OpenCL object is the same, as established by C pointer equality.

class pyopencl.LocalMemory(size)

A helper class to pass __local memory arguments to kernels.

New in version 0.91.2.

size

The size, in bytes, of the local buffer to be provided.
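Because size is given in bytes, it has to be computed from the element count and the element size. This is a minimal sketch assuming a kernel parameter declared as __local float *, with one element per work-item in a work-group of 256. local_nbytes is a hypothetical helper, and its result is what would be passed as pyopencl.LocalMemory(nbytes).

```python
import struct


def local_nbytes(n_elements, fmt="f"):
    """Hypothetical helper: bytes for n_elements of a C scalar type.

    fmt is a struct format character; with "=" the standard sizes of
    "f", "d" and "i" (4, 8, 4) match OpenCL C float, double and int.
    """
    return n_elements * struct.calcsize("=" + fmt)


work_group_size = 256
print(local_nbytes(work_group_size))        # 1024 bytes of float scratch space
print(local_nbytes(work_group_size, "d"))   # 2048 bytes for double
```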

pyopencl.enqueue_nd_range_kernel(queue, kernel, global_work_size, local_work_size, global_work_offset=None, wait_for=None, g_times_l=False)

Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.

If g_times_l is specified, the global size will be multiplied by the local size, which makes the behavior more like Nvidia CUDA. In this case, global_work_size and local_work_size also do not have to have the same number of dimensions.

Changed in version 2011.1: Added the g_times_l keyword arg.

pyopencl.enqueue_task(queue, kernel, wait_for=None)

Returns a new pyopencl.Event. wait_for may either be None or a list of pyopencl.Event instances for whose completion this command waits before starting execution.