Welcome to loopy’s documentation!

loopy is a code generator for array-based code in the OpenCL/CUDA execution model. Here’s a very simple example of how to double the entries of a vector using loopy:

import numpy as np
import loopy as lp
import pyopencl as cl
import pyopencl.array
from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2  # noqa: F401

# setup
# -----
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 15 * 10**6
a = cl.array.arange(queue, n, dtype=np.float32)

# create
# ------
knl = lp.make_kernel(
        "{ [i]: 0<=i<n }",
        "out[i] = 2*a[i]")

# transform
# ---------
knl = lp.split_iname(knl, "i", 128, outer_tag="g.0", inner_tag="l.0")

# execute
# -------
# easy, slower:
evt, (out,) = knl(queue, a=a)
# efficient, with caching:
knl_ex = knl.executor(ctx)
evt, (out,) = knl_ex(queue, a=a)

This example is included in the loopy distribution as examples/python/hello-loopy.py.

When you run this script, the following kernel is generated, compiled, and executed:

#define lid(N) ((int) get_local_id(N))
#define gid(N) ((int) get_group_id(N))

__kernel void __attribute__ ((reqd_work_group_size(128, 1, 1)))
  loopy_kernel(__global float *restrict out, __global float const *restrict a, int const n)
{

    if ((-1 + -128 * gid(0) + -1 * lid(0) + n) >= 0)
          out[lid(0) + gid(0) * 128] = 2.0f * a[lid(0) + gid(0) * 128];
}

(See the full example for how to print the generated code.)
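The index arithmetic in the generated kernel can be checked with a plain NumPy sketch that runs the same computation serially on the host. This is an illustration only, not code produced by loopy. The names `emulate_doubling_kernel` and `launch_shape` are hypothetical. The sketch assumes a launch of ceil(n/128) work-groups of 128 work-items each, which is what the `reqd_work_group_size(128, 1, 1)` attribute and the `g.0`/`l.0` tags suggest.

```python
import numpy as np

# Work-group size, matching split_iname(knl, "i", 128, ...) in the example.
GROUP_SIZE = 128


def launch_shape(n: int) -> tuple[int, int]:
    """Return (number of work-groups, in-range work-items in the last group)."""
    num_groups = -(-n // GROUP_SIZE)  # ceil(n / GROUP_SIZE) in integer arithmetic
    active_in_last = n - (num_groups - 1) * GROUP_SIZE
    return num_groups, active_in_last


def emulate_doubling_kernel(a: np.ndarray) -> np.ndarray:
    """Serially run the generated kernel's body once per (group, local) id pair."""
    n = a.shape[0]
    out = np.empty_like(a)
    num_groups, _ = launch_shape(n)
    for gid in range(num_groups):          # plays the role of gid(0)
        for lid in range(GROUP_SIZE):      # plays the role of lid(0)
            # Same guard as the kernel: skip work-items whose index is >= n.
            if -1 + -GROUP_SIZE * gid + -1 * lid + n >= 0:
                i = lid + gid * GROUP_SIZE
                out[i] = 2.0 * a[i]
    return out


if __name__ == "__main__":
    a = np.arange(1000, dtype=np.float32)
    out = emulate_doubling_kernel(a)
    print(launch_shape(a.shape[0]))     # (8, 104): 1000 = 7 * 128 + 104
    print(np.array_equal(out, 2 * a))   # True
```

For the example's n = 15 * 10**6, 128 does not divide n evenly. The launch needs 117188 work-groups, and the last one has only 64 in-range work-items. The `if` guard stops the other 64 from reading or writing past the end of the arrays.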


If you’re only just learning about loopy, consider the following paper on loopy that may serve as a good introduction.

Please check Installation to get started.
