Welcome to loopy’s documentation!
loopy is a code generator for array-based code in the OpenCL/CUDA execution model. Here’s a very simple example of how to double the entries of a vector using loopy:
import numpy as np
import pyopencl as cl
import pyopencl.array
import loopy as lp
from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2  # noqa: F401
# setup
# -----
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
n = 15 * 10**6
a = cl.array.arange(queue, n, dtype=np.float32)
# create
# ------
knl = lp.make_kernel(
        "{ [i]: 0<=i<n }",
        "out[i] = 2*a[i]")
# transform
# ---------
knl = lp.split_iname(knl, "i", 128, outer_tag="g.0", inner_tag="l.0")
# execute
# -------
# easy, slower:
evt, (out,) = knl(queue, a=a)
# efficient, with caching:
knl_ex = knl.executor(ctx)
evt, (out,) = knl_ex(queue, a=a)
This example is included in the loopy distribution as examples/python/hello-loopy.py.
When you run this script, the following kernel is generated, compiled, and executed:
#define lid(N) ((int) get_local_id(N))
#define gid(N) ((int) get_group_id(N))
__kernel void __attribute__ ((reqd_work_group_size(128, 1, 1)))
loopy_kernel(__global float *restrict out, __global float const *restrict a, int const n)
{
  if ((-1 + -128 * gid(0) + -1 * lid(0) + n) >= 0)
    out[lid(0) + gid(0) * 128] = 2.0f * a[lid(0) + gid(0) * 128];
}
(See the full example for how to print the generated code.)
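To make the index arithmetic in the generated kernel easier to follow, here is a minimal plain-Python sketch that emulates it on the host. It is not part of loopy: the names `WORK_GROUP_SIZE`, `num_groups`, and `emulate_kernel` are hypothetical, and the sketch assumes the kernel is launched with just enough work groups of 128 work items to cover all `n` entries (ceiling division).

```python
# Plain-Python emulation of the index arithmetic in the generated kernel.
# WORK_GROUP_SIZE mirrors the 128 passed to split_iname; the names below
# are illustrative and are not part of loopy.

WORK_GROUP_SIZE = 128


def num_groups(n):
    """Work groups needed to cover n entries (assumed: ceiling division)."""
    return (n + WORK_GROUP_SIZE - 1) // WORK_GROUP_SIZE


def emulate_kernel(a):
    """Double every entry of a, visiting indices the way the kernel does."""
    n = len(a)
    out = [0.0] * n
    for gid in range(num_groups(n)):          # outer iname, tagged g.0
        for lid in range(WORK_GROUP_SIZE):    # inner iname, tagged l.0
            # Same guard as the generated code: true exactly when i < n.
            if -1 + -128 * gid + -1 * lid + n >= 0:
                i = lid + gid * 128
                out[i] = 2.0 * a[i]
    return out


a = [float(i) for i in range(300)]   # 300 is not a multiple of 128
out = emulate_kernel(a)
print(num_groups(len(a)), out[:3], out[-1])
```

With 300 entries, three work groups are needed. In the last group only the first 44 work items pass the guard; the remaining 84 fall beyond the end of the array and do nothing, which is why the generated code needs the bounds check whenever `n` is not a multiple of 128.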
Table of Contents
If you are new to loopy, the paper on loopy may serve as a good introduction.
Please check Installation to get started.
- Tutorial
- Reference: Creating Kernels
- Reference: Loopy’s Model of a Kernel
- What Types of Computation can a Loopy Program Express?
- Loop Domain Forest
- Identifiers
- Instructions
- Data: Arguments and Temporaries
- Substitution Rules
- Kernel Options
- Targets
- Helper values
- Libraries: Extending and Interfacing with External Functionality
- The Kernel Object
- Implementation Details: The Base Array
- Translation Units
- Reference: Transforming Kernels
- Dealing with Parameters
- Wrangling inames
- Dealing with Substitution Rules
- Caching, Precomputation and Prefetching
- Influencing data access
- Padding Data
- Manipulating Instructions
- Registering Library Routines
- Modifying Arguments
- Creating Batches of Operations
- Finishing up
- Setting options
- Matching contexts
- Function Interface
- Reference: Other Functionality
- Installation
- User-visible Changes
- Licensing
- Frequently Asked Questions
- Is Loopy specific to OpenCL?
- For what types of codes does loopy work well?
- Can I see some examples?
- Specifying dependencies for groups of instructions is cumbersome. Help?
- What types of transformations can I do?
- In what sense does Loopy support vectorization?
- What is the story with language versioning?
- Uh-oh. I got a scheduling error. Any hints?
- Citing Loopy
- Getting help
- Acknowledgments
- Cross-References to Other Documentation
- Reference: Documentation for Internal API
- 🚀 GitHub
- 💾 Download Releases