Welcome to loopy’s documentation!

loopy is a code generator for array-based code in the OpenCL/CUDA execution model. Here’s a very simple example of how to double the entries of a vector using loopy:

import numpy as np
import loopy as lp
import pyopencl as cl
import pyopencl.array
from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2  # noqa: F401

# setup
# -----
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 15 * 10**6
a = cl.array.arange(queue, n, dtype=np.float32)

# create
# ------
knl = lp.make_kernel(
        "{ [i]: 0<=i<n }",
        "out[i] = 2*a[i]")

# transform
# ---------
knl = lp.split_iname(knl, "i", 128, outer_tag="g.0", inner_tag="l.0")

# execute
# -------
evt, (out,) = knl(queue, a=a)

This example is included in the loopy distribution as examples/python/hello-loopy.py.

When you run this script, the following kernel is generated, compiled, and executed:

#define lid(N) ((int) get_local_id(N))
#define gid(N) ((int) get_group_id(N))

__kernel void __attribute__ ((reqd_work_group_size(128, 1, 1)))
  loopy_kernel(__global float *restrict out, __global float const *restrict a, int const n)
{

    if ((-1 + -128 * gid(0) + -1 * lid(0) + n) >= 0)
          out[lid(0) + gid(0) * 128] = 2.0f * a[lid(0) + gid(0) * 128];
}

(See the full example for how to print the generated code.)
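To make the generated kernel easier to follow, here is a minimal plain-Python sketch of what it computes. It is not loopy output and does not use loopy or OpenCL. The function name `emulate_split_kernel` and the way the number of work groups is computed are assumptions made for illustration only. The two nested loops stand in for the group index `gid(0)` and the local index `lid(0)` that `split_iname` introduces, and the `if` mirrors the bounds check in the kernel, which keeps the last work group from writing past the end of the array when `n` is not a multiple of 128.

```python
def emulate_split_kernel(a, group_size=128):
    """Sequentially emulate the kernel shown above (illustrative only)."""
    n = len(a)
    out = [0.0] * n

    # Assumption: enough work groups are launched to cover all n entries.
    num_groups = (n + group_size - 1) // group_size

    for gid in range(num_groups):        # plays the role of gid(0)
        for lid in range(group_size):    # plays the role of lid(0)
            # Same bounds check as in the generated code.
            if (-1 + -group_size * gid + -1 * lid + n) >= 0:
                i = lid + gid * group_size
                out[i] = 2.0 * a[i]
    return out


a = [float(i) for i in range(300)]  # 300 is deliberately not a multiple of 128
out = emulate_split_kernel(a)
```

On a real device the iterations of both loops run in parallel as work items. The sequential loops here only show how each `(gid, lid)` pair maps to an index `i` of the original loop.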

Table of Contents

If you’re just starting to learn about loopy, the following paper on loopy may serve as a good introduction.

Please check Installation to get started.

Indices and tables