Reference: Loopy’s Model of a Kernel

Loop Domain Forest

Example:

{ [i]: 0<=i<n }

A kernel’s iteration domain is given by a list of islpy.BasicSet instances (which parametrically represent multi-dimensional sets of tuples of integers). They define the integer values of the loop variables for which instructions (see below) will be executed. It is written in ISL syntax. loopy calls the loop variables inames. In this case, i is the sole iname. The loop domain is given as a conjunction of affine equality and inequality constraints. Integer divisibility constraints (resulting in strides) are also allowed. In the absence of divisibility constraints, the loop domain is convex.

Note that n in the example is not an iname. It is a domain parameter (see Domain parameters) that is passed to the kernel by the user.

To accommodate some data-dependent control flow, there is not actually a single loop domain, but rather a forest of loop domains (a collection of trees) allowing more deeply nested domains to depend on inames introduced by domains closer to the root.

Here is an example:

{ [l] : 0 <= l <= 2 }
  { [i] : start <= i < end }
  { [j] : start <= j < end }

The i and j domains are “children” of the l domain (visible from indentation). This is also how loopy prints the domain forest, to make the parent/child relationship visible. In the example, the parameters start/end might be read inside of the ‘l’ loop.

The idea is that domains form a forest (a collection of trees), and a “sub-forest” is extracted that covers all the inames for each instruction. Each individual sub-tree is then checked for branching, which is ill-formed. (It is declared ill-formed because intersecting, in the above case, the l, i, and j domains could result in restrictions from the i domain affecting the j domain by way of how i affects l, which would be counterintuitive to say the least.)
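The sub-forest check described above can be pictured with a small sketch. It is not loopy's implementation: the parent table, the home_domain map and the helper names below are hypothetical, and domains are reduced to plain labels. For a set of inames, the sketch collects the domains that introduce them together with all their ancestors, then reports branching if any collected domain has more than one collected child.

```python
# Hypothetical model of the example forest: each domain is a label,
# and `parent` maps a domain to the domain it is nested in (None = root).
parent = {"l": None, "i": "l", "j": "l"}

# Which domain introduces which iname (here, one iname per domain).
home_domain = {"l": "l", "i": "i", "j": "j"}


def sub_forest(inames):
    """Return the domains needed for `inames`, including all ancestors."""
    needed = set()
    for iname in inames:
        dom = home_domain[iname]
        while dom is not None and dom not in needed:
            needed.add(dom)
            dom = parent[dom]
    return needed


def has_branching(domains):
    """True if some domain in `domains` has more than one child in `domains`."""
    child_count = {}
    for dom in domains:
        par = parent[dom]
        if par in domains:
            child_count[par] = child_count.get(par, 0) + 1
    return any(count > 1 for count in child_count.values())


print(sorted(sub_forest({"i"})))              # ['i', 'l']
print(has_branching(sub_forest({"i"})))       # False
print(has_branching(sub_forest({"i", "j"})))  # True: l has children i and j
```

An instruction nested in l and i only touches a single chain of the tree, whereas one that uses both i and j pulls in two siblings under l, which is the branching case the text calls ill-formed.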

Inames

Loops are (by default) entered exactly once. This is necessary to preserve dependency semantics: otherwise, for example, a fetch could happen inside one loop nest, and then the instruction using that fetch could be inside a wholly different loop nest.

ISL syntax

The general syntax of an ISL set is the following:

{[VARIABLES]: CONDITIONS}

VARIABLES is a simple list of identifiers representing loop indices, or, as loopy calls them, inames. Example:

{[i, j, k]: CONDITIONS}

The following constructs are supported for CONDITIONS:

  • Simple conditions: i <= 15, i>0
  • Conjunctions: i > 0 and i <= 15
  • Two-sided conditions: 0 < i <= 15 (equivalent to the previous example)
  • Identical conditions on multiple variables: 0 < i,j <= 15
  • Equality constraints: i = j*3 (Note: =, not ==.)
  • Modulo: i mod 3 = 0
  • Existential quantifiers: (exists l: i = 3*l) (equivalent to the previous example)
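The equivalences claimed in the list above can be checked by brute force. The sketch below does not use ISL or islpy; the predicates are hand-translated into plain Python, and the existential quantifier is approximated by searching a finite window of values, which is an assumption that suffices for the small range tested here.

```python
# Hand-translated versions of the conditions listed above (plain Python,
# not ISL): each predicate says whether an integer point is in the set.

def conjunction(i):
    return i > 0 and i <= 15          # i > 0 and i <= 15


def two_sided(i):
    return 0 < i <= 15                # 0 < i <= 15


def modulo(i):
    return i % 3 == 0                 # i mod 3 = 0


def existential(i, search=range(-100, 101)):
    # (exists l: i = 3*l), with l searched over a finite window
    return any(i == 3 * l for l in search)


points = range(-30, 31)
assert all(conjunction(i) == two_sided(i) for i in points)
assert all(modulo(i) == existential(i) for i in points)

# Identical conditions on multiple variables: 0 < i,j <= 15
square = [(i, j) for i in range(-2, 20) for j in range(-2, 20)
          if 0 < i <= 15 and 0 < j <= 15]
print(len(square))  # 225 points, i.e. 15 * 15
```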

Examples of constructs that are not allowed:

  • Multiplication by non-constants: j*k
  • Disjunction: (i=1) or (i=5) (Note: This may be added in a future version of loopy. For now, loop domains have to be convex.)

Domain parameters

Domain parameters are identifiers being used in loop domains that are not inames, i.e. they do not define loop variables. In the following domain specification, n is a domain parameter:

{[i,j]: 0 <= i,j < n}

Values of domain parameters arise from

Iname Implementation Tags

Tag                       Meaning
None | "for"              Sequential loop
"ord"                     Forced-order sequential loop
"l.N"                     Local (intra-group) axis N (“local”)
"g.N"                     Group-number axis N (“group”)
"unr"                     Unroll
"ilp" | "ilp.unr"         Unroll using instruction-level parallelism
"ilp.seq"                 Realize parallel iname as innermost loop
"like.INAME"              Can be used when tagging inames to tag like another iname
"unused.g" | "unused.l"   Can be used to tag as the next unused group/local axis

(Throughout this table, N must be replaced by an actual, zero-based number.)
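As a reading aid for the table, the following sketch classifies tag strings by their textual form. The classify_tag helper and its description strings are hypothetical and not part of loopy; the sketch only checks the spellings given in the table, including the requirement that N be an actual non-negative number.

```python
import re

# Textual forms taken from the table above; the descriptions on the
# right are made up for this sketch.
_FIXED = {
    "for": "sequential",
    "ord": "forced-order sequential",
    "unr": "unroll",
    "ilp": "ilp (unrolled)",
    "ilp.unr": "ilp (unrolled)",
    "ilp.seq": "ilp (sequential)",
    "unused.g": "next unused group axis",
    "unused.l": "next unused local axis",
}
_AXIS = re.compile(r"^(l|g)\.(0|[1-9][0-9]*)$")


def classify_tag(tag):
    """Map a tag (None or a string) to a description, or raise ValueError."""
    if tag is None:
        return "sequential"
    if tag in _FIXED:
        return _FIXED[tag]
    match = _AXIS.match(tag)
    if match:
        kind = "local" if match.group(1) == "l" else "group"
        return f"{kind} axis {int(match.group(2))}"
    if tag.startswith("like.") and len(tag) > len("like."):
        return f"same tag as iname {tag[len('like.'):]}"
    raise ValueError(f"unrecognized tag: {tag!r}")


print(classify_tag("l.0"))     # local axis 0
print(classify_tag("g.1"))     # group axis 1
print(classify_tag("like.i"))  # same tag as iname i
```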

“ILP” does three things:

  • Restricts loops to be innermost
  • Duplicates reduction storage for any reductions nested around ILP usage
  • Causes a loop (unrolled or not) to be opened/generated for each involved instruction

Instructions

class loopy.InstructionBase(id, depends_on, depends_on_is_final, groups, conflicts_with_groups, no_sync_with, within_inames_is_final, within_inames, priority, boostable, boostable_into, predicates, tags, insn_deps=None, insn_deps_is_final=None, forced_iname_deps=None, forced_iname_deps_is_final=None)

A base class for all types of instruction that can occur in a kernel.

id

An (otherwise meaningless) identifier that is unique within a loopy.kernel.LoopKernel.

Instruction ordering

depends_on

a frozenset of id values of Instruction instances that must be executed before this one. Note that loopy.preprocess_kernel() (usually invoked automatically) augments this by adding dependencies on any writes to temporaries read by this instruction.

May be None to invoke the default.

There are two extensions to this:

depends_on_is_final

A bool determining whether depends_on constitutes the entire list of instruction dependencies.

Defaults to False.
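The automatic augmentation of depends_on can be pictured as follows. This is only a sketch of the single-writer rule described under Textual Assignment Syntax (a reader gains a dependency on a variable's writer when exactly one instruction writes that variable). The Insn record and add_single_writer_deps are hypothetical and far simpler than loopy's actual preprocessing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insn:
    """Hypothetical stand-in for an instruction: id, reads, writes, deps."""
    id: str
    reads: frozenset
    writes: frozenset
    depends_on: frozenset = frozenset()
    depends_on_is_final: bool = False


def add_single_writer_deps(insns):
    """Return new Insn objects with single-writer dependencies added."""
    writers = {}
    for insn in insns:
        for var in insn.writes:
            writers.setdefault(var, set()).add(insn.id)

    result = []
    for insn in insns:
        deps = set(insn.depends_on)
        if not insn.depends_on_is_final:
            for var in insn.reads:
                ids = writers.get(var, set())
                # Only the unambiguous case: exactly one writer of `var`.
                if len(ids) == 1 and insn.id not in ids:
                    deps |= ids
        result.append(replace(insn, depends_on=frozenset(deps)))
    return result


insns = [
    Insn("init", frozenset(), frozenset({"tmp"})),
    Insn("use", frozenset({"tmp"}), frozenset({"out"})),
    Insn("w1", frozenset(), frozenset({"acc"})),
    Insn("w2", frozenset(), frozenset({"acc"})),
    Insn("r", frozenset({"acc"}), frozenset({"res"})),
]
for insn in add_single_writer_deps(insns):
    print(insn.id, sorted(insn.depends_on))
```

In this sketch, use picks up a dependency on init, while r gets none because acc has two writers; that ordering would have to be given explicitly.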

groups

A frozenset of strings indicating the names of ‘instruction groups’ of which this instruction is a part. An instruction group is considered ‘active’ as long as at least one (but not all) of the group’s instructions has been executed.

conflicts_with_groups

A frozenset of strings indicating which instruction groups (see InstructionBase.groups) may not be active when this instruction is scheduled.
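The notion of an ‘active’ group, and how conflicts_with_groups uses it, can be sketched in a few lines. The helper names and the dictionary layout are hypothetical; this is not loopy's scheduler.

```python
def active_groups(group_members, executed):
    """Groups with at least one, but not all, members executed.

    group_members: dict mapping group name -> set of instruction ids
    executed: set of ids of instructions that have already run
    """
    active = set()
    for name, members in group_members.items():
        done = members & executed
        if done and done != members:
            active.add(name)
    return active


def may_schedule(conflicts_with_groups, group_members, executed):
    """True if none of the conflicting groups is currently active."""
    return not (set(conflicts_with_groups)
                & active_groups(group_members, executed))


group_members = {"load": {"load_a", "load_b"}}

print(may_schedule({"load"}, group_members, set()))                 # True
print(may_schedule({"load"}, group_members, {"load_a"}))            # False
print(may_schedule({"load"}, group_members, {"load_a", "load_b"}))  # True
```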

priority

Scheduling priority, an integer. Higher means ‘execute sooner’. Default 0.

Synchronization

no_sync_with

a frozenset of tuples of the form (insn_id, scope), where insn_id refers to id of Instruction instances and scope is one of the following strings:

  • “local”
  • “global”
  • “any”.

An element (insn_id, scope) means “do not consider any variable access conflicting for variables of scope between this instruction and insn_id”. Specifically, loopy will not complain even if it detects that accesses potentially requiring ordering (e.g. by dependencies) exist, and it will not emit barriers to guard any dependencies from this instruction on insn_id that may exist.

Note that no_sync_with allows instruction matching through wildcards and match expressions, just like depends_on.

This data is used specifically by barrier insertion and loopy.check.enforce_variable_access_ordered().
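To make the (insn_id, scope) tuple format concrete, the sketch below converts the textual nosync form (id1@local:id2, see Textual Assignment Syntax) into such a set. The parse_nosync function is hypothetical and is not loopy's parser. It assumes that an entry without an @scope suffix behaves like @any, as the textual syntax section states.

```python
VALID_SCOPES = ("local", "global", "any")


def parse_nosync(value):
    """Turn 'id1@local:id2' into frozenset({('id1','local'), ('id2','any')})."""
    result = set()
    for item in value.split(":"):
        insn_id, sep, scope = item.partition("@")
        if not sep:
            scope = "any"  # no suffix behaves like @any
        if not insn_id or scope not in VALID_SCOPES:
            raise ValueError(f"bad nosync entry: {item!r}")
        result.add((insn_id, scope))
    return frozenset(result)


print(sorted(parse_nosync("id1@local:id2@global")))
# [('id1', 'local'), ('id2', 'global')]
print(parse_nosync("id1") == parse_nosync("id1@any"))  # True
```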

Conditionals

predicates

a frozenset of expressions. The conjunction (logical and) of their truth values (as defined by C) determines whether this instruction should be run.

Iname dependencies

within_inames

A frozenset of inames identifying the loops within which this instruction will be executed.

Tagging

tags

A frozenset of string identifiers that can be used to identify groups of instructions.

Tags starting with exclamation marks (!) are reserved and may have specific meanings defined by loopy or its targets.

__init__(id, depends_on, depends_on_is_final, groups, conflicts_with_groups, no_sync_with, within_inames_is_final, within_inames, priority, boostable, boostable_into, predicates, tags, insn_deps=None, insn_deps_is_final=None, forced_iname_deps=None, forced_iname_deps_is_final=None)

Initialize self. See help(type(self)) for accurate signature.

assignee_var_names()

Return a tuple of tuples of assignee variable names, one for each quantity being assigned to.

assignee_subscript_deps()

Return a list of sets of variable names referred to in the subscripts of the quantities being assigned to, one for each assignee.

with_transformed_expressions(f, *args)

Return a new copy of self where f has been applied to every expression occurring in self. args will be passed as extra arguments (in addition to the expression) to f.

write_dependency_names()

Return a set of dependencies of the left hand side of the assignments performed by this instruction, including written variables and indices.

dependency_names()
copy(**kwargs)

Assignment objects

class loopy.Assignment(assignee, expression, id=None, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, boostable=None, boostable_into=None, tags=None, temp_var_type=None, atomicity=(), priority=0, predicates=frozenset(), insn_deps=None, insn_deps_is_final=None, forced_iname_deps=None, forced_iname_deps_is_final=None)
assignee
expression

The following attributes are only used until loopy.make_kernel() is finished:

temp_var_type

if not None, a type that will be assigned to the new temporary variable created from the assignee

atomicity

A tuple of instances of VarAtomicity. Together, they describe to what extent the assignment is to be carried out in a way that involves atomic operations.

To describe an atomic update, any memory reads of exact occurrences of the left-hand side expression of the assignment in the right-hand side are treated, together with the “memory write” part of the assignment, as part of one single atomic update.

Note

Exact identity of the LHS with RHS subexpressions is required for an atomic update to be recognized. For example, the following update will not be recognized as an atomic update:

z[i] = z[i+1-1] + a {atomic}

loopy may choose to evaluate the right-hand side multiple times as part of a single assignment. It is up to the user to ensure that this retains correct semantics.

For example, the following assignment:

z[i] = f(z[i]) + a {atomic}

may generate the following (pseudo-)code:

DO
    READ ztemp_old = z[i]
    EVALUATE ztemp_new = f(ztemp_old) + a
WHILE compare_and_swap(z[i], ztemp_new, ztemp_old) did not succeed

__init__(assignee, expression, id=None, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, boostable=None, boostable_into=None, tags=None, temp_var_type=None, atomicity=(), priority=0, predicates=frozenset(), insn_deps=None, insn_deps_is_final=None, forced_iname_deps=None, forced_iname_deps_is_final=None)

Initialize self. See help(type(self)) for accurate signature.
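The retry loop in the pseudo-code for atomic assignments above can be emulated in plain Python to see why the right-hand side may be evaluated more than once. This is a sketch, not code generated by loopy: AtomicCell is a hypothetical stand-in that fakes compare-and-swap with a lock, and the choices of f and a are arbitrary.

```python
import threading


class AtomicCell:
    """Hypothetical cell whose compare_and_swap is made atomic by a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_update(cell, f, a):
    """Emulate z[i] = f(z[i]) + a {atomic}; return number of RHS evaluations."""
    evaluations = 0
    while True:
        old = cell.load()            # READ ztemp_old = z[i]
        new = f(old) + a             # EVALUATE ztemp_new = f(ztemp_old) + a
        evaluations += 1
        if cell.compare_and_swap(old, new):
            return evaluations


cell = AtomicCell(0)


def worker():
    for _ in range(1000):
        atomic_update(cell, lambda z: z, 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.load())  # 4000: no increment is lost
```

Whenever another thread changes the cell between the read and the swap, the swap fails and the right-hand side is computed again, which is why f should remain correct under repeated evaluation.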

Textual Assignment Syntax

The general syntax of an instruction is a simple assignment:

LHS[i,j,k] = EXPRESSION

Several extensions of this syntax are defined, as discussed below. They may be combined freely.

You can also use an instruction to declare a new temporary variable. (See Temporary Variables.) See Specifying Types for what types are acceptable. If the LHS has a subscript, bounds on the indices are inferred (which must be constants at the time of kernel creation) and the declared temporary is created as an array. Instructions declaring temporaries have the following form:

<temp_var_type> LHS[i,j,k] = EXPRESSION

You can also create a temporary and ask loopy to determine its type automatically. This uses the following syntax:

<> LHS[i,j,k] = EXPRESSION

Lastly, each instruction may optionally have a number of attributes specified, using the following format:

LHS[i,j,k] = EXPRESSION {attr1,attr2=value1:value2}

These are usually key-value pairs. The following attributes are recognized:

  • id=value sets the instruction’s identifier to value. value must be unique within the kernel. This identifier is used to refer to the instruction after it has been created, such as from dep attributes (see below) or from context matches.

  • id_prefix=value also sets the instruction’s identifier, however uniqueness is ensured by loopy itself, by appending further components (often numbers) to the given id_prefix.

  • inames=i:j:k forces the instruction to reside within the loops over Inames i, j and k (and only those).

    Note

    The default for the inames that the instruction depends on is the inames used in the instruction itself plus the common subset of inames shared by writers of all variables read by the instruction.

    You can add a plus sign (“+”) to the front of this option value to indicate that you would like the inames you specify here to be in addition to the ones found by the heuristic described above.

  • dup=i:j->j_new:k->k_new makes a copy of the inames i, j, and k, with all the same domain constraints as the original inames. A new name of the copy of i will be automatically chosen, whereas the new name of j will be j_new, and the new name of k will be k_new.

    This is a shortcut for calling loopy.duplicate_inames() later (once the kernel is created).

  • dep=id1:id2 creates a dependency of this instruction on the instructions with identifiers id1 and id2. The meaning of this dependency is that the code generated for this instruction is required to appear textually after all of these dependees’ generated code.

    Identifiers here are allowed to be wildcards as defined by the Python function fnmatch.fnmatchcase(). This is helpful in conjunction with id_prefix.

    Note

    Since specifying all possible dependencies is cumbersome and error-prone, loopy employs a heuristic to automatically find dependencies. Specifically, loopy will automatically add a dependency to an instruction reading a variable if there is exactly one instruction writing that variable. (“Variable” here may mean either temporary variable or kernel argument.)

    If each variable in a kernel is only written once, then this heuristic should be able to compute all required dependencies.

    Conversely, if a variable is written by two different instructions, all ordering around that variable needs to be specified explicitly. It is recommended to use get_dot_dependency_graph() to visualize the dependency graph of possible orderings.

    You may use a leading asterisk (“*”) to turn off the single-writer heuristic and indicate that the specified list of dependencies is exhaustive.

  • dep_query=... provides an alternative way of specifying instruction dependencies. The given string is parsed as a match expression object by loopy.match.parse_match(). Upon kernel generation, this match expression is used to match instructions in the kernel and add them as dependencies.

  • nosync=id1:id2 prescribes that no barrier synchronization is necessary for the instructions with identifiers id1 and id2, even if a dependency chain exists and variables are accessed in an apparently racy way.

    Identifiers here are allowed to be wildcards as defined by the Python function fnmatch.fnmatchcase(). This is helpful in conjunction with id_prefix.

    Identifiers (including wildcards) accept an optional @scope suffix, which prescribes that no synchronization at level scope is needed. This does not preclude barriers at levels different from scope. Allowable scope values are:

    • local
    • global
    • any

    As an example, nosync=id1@local:id2@global prescribes that no local synchronization is needed with instruction id1 and no global synchronization is needed with instruction id2.

    nosync=id1@any has the same effect as nosync=id1.

  • nosync_query=... provides an alternative way of specifying nosync, just like dep_query and dep. As with nosync, nosync_query accepts an optional @scope suffix.

  • priority=integer sets the instruction’s priority to the value integer. Instructions with higher priority will be scheduled sooner, if possible. Note that the scheduler may still schedule a lower-priority instruction ahead of a higher-priority one if loop orders or dependencies require it.

  • if=variable1:variable2 Only execute this instruction if all condition variables (which must be scalar variables) evaluate to true (as defined by C).

  • tags=tag1:tag2 Apply tags to this instruction that can then be used for Matching contexts.

  • groups=group1:group2 Make this instruction part of the given instruction groups. See InstructionBase.groups.

  • conflicts_grp=group1:group2 Make this instruction conflict with the given instruction groups. See InstructionBase.conflicts_with_groups.

  • atomic The update embodied by the assignment is carried out atomically. See Assignment.atomicity for precise semantics.
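The attribute format and the wildcard matching used by dep can be illustrated with a short sketch. parse_attributes and resolve_deps are hypothetical helpers, not loopy's parser, and they ignore refinements such as the leading “+” and “*” markers. The only library call is fnmatch.fnmatchcase() from the Python standard library, which the text names as the definition of the wildcards.

```python
from fnmatch import fnmatchcase


def parse_attributes(text):
    """Parse '{id=b, dep=a*:c}' into {'id': ['b'], 'dep': ['a*', 'c']}.

    Attributes without '=' (such as 'atomic') map to an empty list.
    """
    inner = text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attributes must be enclosed in braces")
    attrs = {}
    for part in inner[1:-1].split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        attrs[key.strip()] = value.strip().split(":") if sep else []
    return attrs


def resolve_deps(patterns, known_ids):
    """Expand dependency patterns against the ids present in a kernel."""
    return sorted(insn_id for insn_id in known_ids
                  if any(fnmatchcase(insn_id, pat) for pat in patterns))


attrs = parse_attributes("{id=update, dep=init_*:scale, atomic}")
print(attrs)
# {'id': ['update'], 'dep': ['init_*', 'scale'], 'atomic': []}
print(resolve_deps(attrs["dep"], ["init_a", "init_b", "scale", "other"]))
# ['init_a', 'init_b', 'scale']
```

Matching init_* against generated identifiers is what makes wildcards convenient together with id_prefix, where the exact identifiers are not known when the kernel is written.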

Expressions

Loopy’s expressions are a slight superset of the expressions supported by pymbolic.

  • if(cond, then, else_)

  • a[[ 8*i + j ]]: Linear subscripts. See loopy.symbolic.LinearSubscript.

  • reductions See loopy.symbolic.Reduction.

    • reduce vs simul_reduce
  • complex-valued arithmetic

  • tagging of array access and substitution rule use (“$”) See loopy.symbolic.TaggedVariable.

  • indexof, indexof_vec

  • cast(type, value): No parse syntax currently. See loopy.symbolic.TypeCast.
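Linear subscripts address an array through a single flat index. As a plain-Python illustration (no loopy involved), assume a row-major array a with 8 columns, so that the flat index 8*i + j from the example reaches the same element as the ordinary subscript a[i, j]. The shape and the storage order are assumptions of this sketch.

```python
n_rows, n_cols = 3, 8

# Row-major (C order) storage of a 3 x 8 array as one flat list.
flat = list(range(n_rows * n_cols))


def get_2d(i, j):
    """Ordinary subscript a[i, j], resolved through row-major strides."""
    return flat[i * n_cols + j]


def get_linear(index):
    """Linear subscript a[[index]]: index straight into the storage."""
    return flat[index]


assert all(get_2d(i, j) == get_linear(8 * i + j)
           for i in range(n_rows) for j in range(n_cols))
print(get_linear(8 * 2 + 5))  # 21
```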

TODO: Functions

TODO: Reductions

Function Call Instructions

class loopy.CallInstruction(assignees, expression, id=None, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, boostable=None, boostable_into=None, tags=None, temp_var_types=None, priority=0, predicates=frozenset(), insn_deps=None, insn_deps_is_final=None, forced_iname_deps=None, forced_iname_deps_is_final=None)

An instruction capturing a function call. Unlike Assignment, this instruction supports functions with multiple return values.

assignees

A tuple of left-hand sides for the assignment

expression

The following attributes are only used until loopy.make_kernel() is finished:

temp_var_types

if not None, a type that will be assigned to the new temporary variable created from the assignee

__init__(assignees, expression, id=None, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, boostable=None, boostable_into=None, tags=None, temp_var_types=None, priority=0, predicates=frozenset(), insn_deps=None, insn_deps_is_final=None, forced_iname_deps=None, forced_iname_deps_is_final=None)

Initialize self. See help(type(self)) for accurate signature.

C Block Instructions

class loopy.CInstruction(iname_exprs, code, read_variables=frozenset(), assignees=(), id=None, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, priority=0, boostable=None, boostable_into=None, predicates=frozenset(), tags=None, insn_deps=None, insn_deps_is_final=None)
iname_exprs

A list of tuples (name, expr) of inames or expressions based on them that the instruction needs access to.

code

The C code to be executed.

The code should obey the following rules:

  • It should only write to temporary variables, specifically the temporary variables

Note

Of course, nothing in loopy will prevent you from doing ‘forbidden’ things in your C code. If you ignore the rules and something breaks, you get to keep both pieces.

read_variables

A frozenset of variable names that code reads. This is optional and only used for figuring out dependencies.

assignees

A sequence (typically a tuple) of variable references (with or without subscript) as pymbolic.primitives.Expression instances that code writes to. This is optional and only used for figuring out dependencies.

Atomic Operations

class loopy.MemoryOrdering

Ordering of atomic operations, defined as in C11 and OpenCL.

RELAXED
ACQUIRE
RELEASE
ACQ_REL
SEQ_CST
class loopy.MemoryScope

Scope of atomicity, defined as in OpenCL.

auto

Scope matches the accessibility of the variable.

WORK_ITEM
WORK_GROUP
WORK_DEVICE
ALL_SVM_DEVICES
class loopy.VarAtomicity(var_name)

A base class for the description of how atomic access to var_name shall proceed.

var_name
class loopy.AtomicInit(var_name)

Describes initialization of an atomic variable. A subclass of OrderedAtomic.

ordering

One of the values from MemoryOrdering

scope

One of the values from MemoryScope

class loopy.AtomicUpdate(var_name)

Properties of an atomic update. A subclass of OrderedAtomic.

ordering

One of the values from MemoryOrdering

scope

One of the values from MemoryScope

No-Op Instruction

class loopy.NoOpInstruction(id=None, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, priority=None, boostable=None, boostable_into=None, predicates=None, tags=None)

An instruction that carries out no operation. It is mainly useful as a way to structure dependencies between other instructions.

The textual syntax in a loopy kernel is:

... nop

Barrier Instructions

class loopy.BarrierInstruction(id, depends_on=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, priority=None, boostable=None, boostable_into=None, predicates=None, tags=None, synchronization_kind='global', mem_kind='local')

An instruction that requires synchronization with all concurrent work items of synchronization_kind.

synchronization_kind

A string, "global" or "local".

mem_kind

A string, "global" or "local". Chooses which memory type to synchronize, for targets that require this (e.g. OpenCL).

The textual syntax in a loopy kernel is:

... gbarrier
... lbarrier

Note that the memory type mem_kind can be specified for local barriers:

... lbarrier {mem_kind=global}

Data: Arguments and Temporaries

Kernels operate on two types of data: ‘arguments’ carrying data into and out of a kernel, and temporaries with lifetimes tied to the runtime of the kernel.

Arguments

class loopy.KernelArgument(**kwargs)

Base class for all argument types

class loopy.ValueArg(name, dtype=None, approximately=1000, target=None, is_output_only=False)
get_arg_decl(ast_builder)
update_persistent_hash(key_hash, key_builder)

Custom hash computation function for use with pytools.persistent_dict.PersistentDict.

class loopy.ArrayArg(*args, **kwargs)
name
dtype

The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.

Note that some transformations, such as loopy.add_padding() cannot be performed without knowledge of the exact dtype.

shape

May be one of the following:

  • None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.

  • loopy.auto. The shape will be determined by finding the access footprint.

  • a tuple like numpy.ndarray.shape.

    Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    Any element of the shape tuple not used to compute strides may be None.

dim_tags

See Data Axis Tags.

offset

Offset from the beginning of the buffer to the point from which the strides are counted. May be one of

  • 0 or None
  • a string (that is interpreted as an argument name).
  • a pymbolic expression
  • loopy.auto, in which case an offset argument is added automatically, immediately following this argument. loopy.CompiledKernel is even smarter in its treatment of this case and will compile custom versions of the kernel based on whether the passed arrays have offsets or not.
dim_names

A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.

alignment

Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.

Defaults to None.

If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CTarget.

New in version 2018.1.

__init__(*args, **kwargs)

All of the following (except name) are optional. Specify either strides or shape.

Parameters:
  • name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case multiple arguments, each with identical properties, are created for each name.
  • shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
  • dim_tags – A comma-separated list of tags as understood by parse_array_dim_tag().
  • strides

    May be one of the following:

    • None
    • loopy.auto. The strides will be determined by order and the access footprint.
    • a tuple like numpy.ndarray.shape.

      Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    • A string which can be parsed into the previous form.
  • order – “C” or “F” for C (row-major) or Fortran (column-major) order. Defaults to the default_order argument passed to loopy.make_kernel().
  • for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
  • offset – (See offset)
  • alignment – memory alignment in bytes
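How order relates shape to strides can be sketched as follows. The helper is hypothetical and counts strides in array elements (an assumption of this sketch). It mirrors the usual definitions of C (row-major) and Fortran (column-major) layouts rather than loopy's own stride computation.

```python
def strides_from_shape(shape, order="C"):
    """Element strides for a dense array of the given shape and order."""
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    # In C order the last axis varies fastest; in Fortran order the first.
    axes = range(len(shape))
    walk = reversed(axes) if order == "C" else axes
    strides = [0] * len(shape)
    step = 1
    for axis in walk:
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


print(strides_from_shape((4, 5, 6), "C"))  # (30, 6, 1)
print(strides_from_shape((4, 5, 6), "F"))  # (1, 4, 20)
```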
__eq__(other)

Return self==value.

num_user_axes(require_answer=True)
num_target_axes()
vector_size(target)

Return the size of the vector type used for the array divided by the basic data type.

Note: For 3-vectors, this will be 4.

(supports persistent hashing)

address_space

An attribute of AddressSpace defining the address space in which the array resides.

is_output_only

An instance of bool. If set to True, the argument is recorded as being returned from the kernel.

allowed_extra_kwargs = ['address_space', 'is_output_only']
get_arg_decl(ast_builder, name_suffix, shape, dtype, is_written)
max_target_axes = 1
min_target_axes = 0
class loopy.ConstantArg(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, target=None, alignment=None, **kwargs)
name
dtype

The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.

Note that some transformations, such as loopy.add_padding() cannot be performed without knowledge of the exact dtype.

shape

May be one of the following:

  • None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.

  • loopy.auto. The shape will be determined by finding the access footprint.

  • a tuple like numpy.ndarray.shape.

    Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    Any element of the shape tuple not used to compute strides may be None.

dim_tags

See Data Axis Tags.

offset

Offset from the beginning of the buffer to the point from which the strides are counted. May be one of

  • 0 or None
  • a string (that is interpreted as an argument name).
  • a pymbolic expression
  • loopy.auto, in which case an offset argument is added automatically, immediately following this argument. loopy.CompiledKernel is even smarter in its treatment of this case and will compile custom versions of the kernel based on whether the passed arrays have offsets or not.
dim_names

A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.

alignment

Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.

Defaults to None.

If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CTarget.

New in version 2018.1.

__init__(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, target=None, alignment=None, **kwargs)

All of the following (except name) are optional. Specify either strides or shape.

Parameters:
  • name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case multiple arguments, each with identical properties, are created for each name.
  • shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
  • dim_tags – A comma-separated list of tags as understood by parse_array_dim_tag().
  • strides

    May be one of the following:

    • None
    • loopy.auto. The strides will be determined by order and the access footprint.
    • a tuple like numpy.ndarray.shape.

      Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    • A string which can be parsed into the previous form.
  • order – “C” or “F” for C (row-major) or Fortran (column-major) order. Defaults to the default_order argument passed to loopy.make_kernel().
  • for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
  • offset – (See offset)
  • alignment – memory alignment in bytes
__eq__(other)

Return self==value.

num_user_axes(require_answer=True)
num_target_axes()
vector_size(target)

Return the size of the vector type used for the array divided by the basic data type.

Note: For 3-vectors, this will be 4.

(supports persistent hashing)

get_arg_decl(ast_builder, name_suffix, shape, dtype, is_written)
max_target_axes = 1
min_target_axes = 0
class loopy.ImageArg(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, target=None, alignment=None, **kwargs)
name
dtype

The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.

Note that some transformations, such as loopy.add_padding() cannot be performed without knowledge of the exact dtype.

shape

May be one of the following:

  • None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.

  • loopy.auto. The shape will be determined by finding the access footprint.

  • a tuple like numpy.ndarray.shape.

    Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    Any element of the shape tuple not used to compute strides may be None.

dim_tags

See Data Axis Tags.

offset

Offset from the beginning of the buffer to the point from which the strides are counted. May be one of

  • 0 or None
  • a string (that is interpreted as an argument name).
  • a pymbolic expression
  • loopy.auto, in which case an offset argument is added automatically, immediately following this argument. loopy.CompiledKernel is even smarter in its treatment of this case and will compile custom versions of the kernel based on whether the passed arrays have offsets or not.
dim_names

A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.

alignment

Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.

Defaults to None.

If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CTarget.

New in version 2018.1.

__init__(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, target=None, alignment=None, **kwargs)

All of the following (except name) are optional. Specify either strides or shape.

Parameters:
  • name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case one argument is created per name, all with identical properties.
  • shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
  • dim_tags – A comma-separated list of tags as understood by parse_array_dim_tag().
  • strides

    May be one of the following:

    • None
    • loopy.auto. The strides will be determined by order and the access footprint.
    • a tuple like numpy.ndarray.shape.

      Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    • A string which can be parsed into the previous form.
  • order – “C” or “F” for C (row-major) or Fortran (column-major) order. Defaults to the default_order argument passed to loopy.make_kernel().
  • for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
  • offset – (See offset)
  • alignment – memory alignment in bytes
__eq__(other)

Return self==value.

num_user_axes(require_answer=True)
num_target_axes()
vector_size(target)

Return the size of the vector type used for the array divided by the basic data type.

Note: For 3-vectors, this will be 4.
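The note about 3-vectors matches the storage rule of OpenCL C, where a 3-component vector occupies the space of a 4-component one. A minimal sketch of that rule follows; padded_vector_length is a hypothetical helper for illustration, not a loopy function.

```python
def padded_vector_length(num_components):
    """Number of base-type slots a vector of this length occupies.

    Assumption: 3-component vectors are stored like 4-component ones
    (as in OpenCL C); all other lengths are stored without padding.
    """
    return 4 if num_components == 3 else num_components


for n in (2, 3, 4, 8):
    print(n, "->", padded_vector_length(n))
```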

(supports persistent hashing)

dimensions
get_arg_decl(ast_builder, name_suffix, shape, dtype, is_written)
max_target_axes = 3
min_target_axes = 1

Temporary Variables

Temporary variables model OpenCL’s private and local address spaces. Both have the lifetime of a kernel invocation.

class loopy.temp_var_scope

Deprecated. Use AddressSpace instead.

class loopy.TemporaryVariable(name, dtype=None, shape=(), address_space=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, base_indices=None, storage_shape=None, base_storage=None, initializer=None, read_only=False, _base_storage_access_may_be_aliasing=False, **kwargs)
name
dtype

The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.

Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.

shape

May be one of the following:

  • None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.

  • loopy.auto. The shape will be determined by finding the access footprint.

  • a tuple like numpy.ndarray.shape.

    Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    Any element of the shape tuple not used to compute strides may be None.

dim_tags

See Data Axis Tags.

offset
Offset from the beginning of the buffer to the point from which the strides are counted. May be one of:

  • 0 or None
  • a string (that is interpreted as an argument name).
  • a pymbolic expression
  • loopy.auto, in which case an offset argument is added automatically, immediately following this argument. loopy.CompiledKernel is even smarter in its treatment of this case and will compile custom versions of the kernel based on whether the passed arrays have offsets or not.
dim_names

A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as shape and dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.

alignment

Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.

Defaults to None.

If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CTarget.

New in version 2018.1.

__init__(name, dtype=None, shape=(), address_space=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, base_indices=None, storage_shape=None, base_storage=None, initializer=None, read_only=False, _base_storage_access_may_be_aliasing=False, **kwargs)
__eq__(other)

Return self==value.

num_user_axes(require_answer=True)
num_target_axes()
vector_size(target)

Return the size of the vector type used for the array divided by the basic data type.

Note: For 3-vectors, this will be 4.

(supports persistent hashing)

storage_shape
base_indices
address_space

What memory this temporary variable lives in. One of the values in AddressSpace, or loopy.auto if this is to be automatically determined.

base_storage

The name of a storage array that is to be used to actually hold the data in this temporary. Note that this storage array must not match any existing variable names.

initializer

None or a numpy.ndarray of data to be used to initialize the array.

read_only

A bool indicating whether the variable is read-only, i.e. must not be written during its lifetime. If True, initializer must be given.

_base_storage_access_may_be_aliasing

Whether the temporary is used to alias the underlying base storage. Defaults to False. If False, C-based code generators will declare the temporary as a restrict const pointer to the base storage memory location. If True, the restrict part is omitted on this declaration.

allowed_extra_kwargs = ['storage_shape', 'base_indices', 'address_space', 'base_storage', 'initializer', 'read_only', '_base_storage_access_may_be_aliasing']
copy(**kwargs)
decl_info(target, index_dtype)

Return a list of loopy.codegen.ImplementedDataInfo instances corresponding to the array.

get_arg_decl(ast_builder, name_suffix, shape, dtype, is_written)
max_target_axes = 1
min_target_axes = 0
nbytes
scope
update_persistent_hash(key_hash, key_builder)

Custom hash computation function for use with pytools.persistent_dict.PersistentDict.

Specifying Types

loopy uses the same type system as numpy. (See numpy.dtype) It also uses pyopencl for a registry of user-defined types and their C equivalents. See pyopencl.tools.get_or_register_dtype() and related functions.

For a string representation of types, all numpy types (e.g. float32 etc.) are accepted, in addition to what is registered in pyopencl.

Data Axis Tags

Data axis tags specify how a multi-dimensional array (which is loopy’s main way of storing data) is represented in (linear, 1D) computer memory. This storage format is given as a number of “tags”, as listed in the table below. Each axis of an array has a tag corresponding to it. In the user interface, array dim tags are specified as a tuple of these tags or a comma-separated string containing them, such as the following:

c,vec,sep,c

The interpretation of these tags is order-dependent: they are read from left to right.

Tag            Meaning
c              Nest current axis around the ones that follow
f              Nest current axis inside the ones that follow
N0 ... N9      Specify an explicit nesting level for this axis
stride:EXPR    A fixed stride
sep            Implement this axis by mapping to separate arrays
vec            Implement this axis as entries in a vector

sep and vec obviously require the number of entries in the array along their respective axis to be known at code generation time.

When the above speaks about ‘nesting levels’, this means that axes “nested inside” others are “faster-moving” when viewed from linear memory.

In addition, each tag may be followed by a question mark (?), which indicates that this axis should be omitted if more dimension tags are specified than array axes are present. Axes with question marks are omitted in a left-first manner until the correct number of dimension tags is achieved.
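The left-first omission rule can be sketched in a few lines of plain Python. The helper resolve_optional_tags is hypothetical and not part of loopy; the sketch also assumes that the question mark is simply stripped from optional tags that end up being kept.

```python
def resolve_optional_tags(tags, num_axes):
    """Drop '?'-marked tags from the left until num_axes tags remain."""
    names = [t.strip() for t in tags.split(",")]
    surplus = len(names) - num_axes  # how many tags must be omitted
    kept = []
    for tag in names:
        if tag.endswith("?"):
            if surplus > 0:
                surplus -= 1          # omit this optional axis tag
                continue
            kept.append(tag[:-1])     # keep it, without the marker
        else:
            kept.append(tag)
    if len(kept) != num_axes:
        raise ValueError("cannot match %d axes with tags %r" % (num_axes, tags))
    return kept


print(resolve_optional_tags("c?,c,c", 2))   # two axes: the optional tag is dropped
print(resolve_optional_tags("c?,c,c", 3))   # three axes: all tags are used
```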

Some examples follow, all of which use a three-dimensional array of shape (3, M, 4). For simplicity, we assume that array entries have size one.

  • c,c,c: The axes will have strides (M*4, 4, 1), leading to a C-like / row-major layout.
  • f,f,f: The axes will have strides (1, 3, 3*M), leading to a Fortran-like / column-major layout.
  • sep,c,c: The array will be mapped to three arrays of shape (M, 4), each with strides (4, 1).
  • c,c,vec: The array will be mapped to an array of float4 vectors, with (float4-based) strides of (M, 1).
  • N1,N0,N2: The axes will have strides (M, 1, 3*M).
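The stride values in the c, f and N examples above can be reproduced with a small sketch. It is not loopy's implementation: strides_from_tags and nesting_levels are hypothetical helpers that only handle tag strings consisting entirely of c, entirely of f, or entirely of N<digit> tags, and M = 5 is chosen purely for illustration.

```python
def nesting_levels(tags):
    """Map a comma-separated tag string to one nesting level per axis.

    Level 0 is the innermost (fastest-moving) axis.
    """
    names = [t.strip() for t in tags.split(",")]
    n = len(names)
    if all(t == "c" for t in names):
        # Each axis nests around the ones that follow it.
        return [n - 1 - i for i in range(n)]
    if all(t == "f" for t in names):
        # Each axis nests inside the ones that follow it.
        return list(range(n))
    if all(len(t) == 2 and t[0] == "N" and t[1].isdigit() for t in names):
        return [int(t[1]) for t in names]
    raise ValueError("unsupported tag combination: %r" % tags)


def strides_from_tags(shape, tags):
    """Compute strides (in entries) from axis lengths and nesting levels."""
    levels = nesting_levels(tags)
    if len(levels) != len(shape):
        raise ValueError("need exactly one tag per axis")
    strides = [None] * len(shape)
    stride = 1
    # Visit axes from innermost to outermost nesting level.
    for axis in sorted(range(len(shape)), key=lambda a: levels[a]):
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


M = 5
print(strides_from_tags((3, M, 4), "c,c,c"))     # (M*4, 4, 1) -> (20, 4, 1)
print(strides_from_tags((3, M, 4), "f,f,f"))     # (1, 3, 3*M) -> (1, 3, 15)
print(strides_from_tags((3, M, 4), "N1,N0,N2"))  # (M, 1, 3*M) -> (5, 1, 15)
```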

Substitution Rules

Substitution Rule Objects

class loopy.SubstitutionRule(name, arguments, expression)
name
arguments

A tuple of strings

expression

Textual Syntax for Substitution Rules

Syntax of a substitution rule:

rule_name(arg1, arg2) := EXPRESSION
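To make the three parts of this syntax concrete, the following sketch splits a rule string into the name, arguments and expression fields listed for SubstitutionRule. It is not loopy's parser: Rule and parse_rule are hypothetical stand-ins, and the expression is kept as a plain string here rather than parsed further.

```python
import re
from collections import namedtuple

# Hypothetical stand-in mirroring the (name, arguments, expression) fields.
Rule = namedtuple("Rule", ["name", "arguments", "expression"])

_RULE_RE = re.compile(r"^\s*(\w+)\s*\(([^)]*)\)\s*:=\s*(.+?)\s*$")


def parse_rule(text):
    """Split 'name(arg1, arg2) := EXPRESSION' into its three parts."""
    match = _RULE_RE.match(text)
    if match is None:
        raise ValueError("not a substitution rule: %r" % text)
    name, arg_text, expression = match.groups()
    arguments = tuple(a.strip() for a in arg_text.split(",") if a.strip())
    return Rule(name, arguments, expression)


rule = parse_rule("f(x, y) := x*y + 1")
print(rule.name, rule.arguments, rule.expression)
```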

Kernel Options

class loopy.Options(**kwargs)

Unless otherwise specified, these options are Boolean-valued (i.e. on/off).

Code-generation options

annotate_inames

When generating code for inames, annotate them with comments if it is not immediately apparent which iname is being referred to (such as for inames mapped to constants or OpenCL group/local IDs).

trace_assignments

Generate code that uses printf in kernels to trace the execution of assignment instructions.

trace_assignment_values

Like trace_assignments, but also trace the assigned values.

ignore_boostable_into

Ignore the boostable_into field of the kernel when determining whether an iname duplication is necessary for the kernel to be schedulable.

check_dep_resolution

Whether loopy should issue an error if a dependency expression does not match any instructions in the kernel.

Invocation-related options

skip_arg_checks

Do not do any checking (data type, data layout, shape, etc.) on arguments for a minor performance gain.

no_numpy

Do not check for or accept numpy arrays as arguments.

Defaults to False.

cl_exec_manage_array_events

Within the PyOpenCL executor, respect and update pyopencl.array.Array.event.

Defaults to True.

return_dict

Have kernels return a dict instead of a tuple as output. Specifically, the result of a kernel invocation with this flag is a tuple (evt, out_dict), where out_dict is a dictionary mapping argument names to their output values. This is helpful if arguments are inferred and argument ordering is thus implementation-defined.

See CompiledKernel.__call__().

write_wrapper

Print the generated Python invocation wrapper. Accepts a file name as a value. Writes to sys.stdout if none is given.

write_code

Print the generated code. Accepts a file name or a boolean as a value. Writes to sys.stdout if set to True.

edit_code

Invoke an editor (given by the environment variable EDITOR) on the generated kernel code, allowing for tweaks before the code is passed on to the target for compilation.

build_options

Options to pass to the target compiler when building the kernel. A list of strings.

allow_terminal_colors

A bool. Whether to allow colors in terminal output.

Features

disable_global_barriers
enforce_variable_access_ordered

If True, require that loopy.check.check_variable_access_ordered() passes. Required for language versions 2018.1 and above. This check helps find and eliminate unintentionally unordered access to variables.

If equal to "no_check", then no check is performed.

Targets

class loopy.TargetBase

Base class for all targets, i.e. different combinations of code that loopy can generate.

Objects of this type must be picklable.

class loopy.ASTBuilderBase(target)

An interface for generating (host or device) ASTs.

class loopy.CTarget(fortran_abi=False)

A target for plain “C”, without any parallel extensions.

class loopy.ExecutableCTarget(compiler=None, fortran_abi=False)

An executable CTarget that uses (by default) JIT compilation of C code.

class loopy.CudaTarget(extern_c=True)

A target for Nvidia’s CUDA GPU programming language.

class loopy.OpenCLTarget(atomics_flavor=None)

A target for the OpenCL C heterogeneous compute programming language.

class loopy.PyOpenCLTarget(device=None, pyopencl_module_name='_lpy_cl', atomics_flavor=None)

A code generation target that takes special advantage of pyopencl features such as run-time knowledge of the target device (to generate warnings) and support for complex numbers.

class loopy.ISPCTarget(occa_mode=False)

A code generation target for Intel’s ISPC SPMD programming language, to target Intel’s Knights hardware and modern Intel CPUs with wide vector units.

class loopy.NumbaTarget

A target for plain Python as understood by Numba, without any parallel extensions.

class loopy.NumbaCudaTarget

A target for Numba with CUDA extensions.

Helper values

class loopy.auto

A generic placeholder object for something that should be automatically determined. See, for example, the shape or strides argument of ArrayArg.

class loopy.UniqueName(name)

A tag for a string that identifies a partial identifier that is to be made unique by the UI.

Libraries: Extending and Interfacing with External Functionality

Symbols

Functions

class loopy.PreambleInfo(valuedict=None, exclude=['self'], **kwargs)
kernel
seen_dtypes
seen_functions
seen_atomic_dtypes
codegen_state
class loopy.CallMangleInfo(target_name, result_dtypes, arg_dtypes)
target_name

A string. The name of the function to be called in the generated target code.

result_dtypes

A tuple of LoopyType instances indicating what types of values the function returns.

arg_dtypes

A tuple of LoopyType instances indicating what types of arguments the function actually receives.

Reductions

The Kernel Object

Do not create LoopKernel objects directly. Instead, refer to Reference: Creating Kernels.

class loopy.LoopKernel(domains, instructions, args=None, schedule=None, name='loopy_kernel', preambles=None, preamble_generators=None, assumptions=None, local_sizes=None, temporary_variables=None, iname_to_tags=None, substitutions=None, function_manglers=None, symbol_manglers=[], iname_slab_increments=None, loop_priority=frozenset(), silenced_warnings=None, applied_iname_rewrites=None, cache_manager=None, index_dtype=<class 'numpy.int32'>, options=None, state=0, target=None, overridden_get_grid_sizes_for_insn_ids=None, _cached_written_variables=None)

These correspond more or less directly to arguments of loopy.make_kernel().

Note

This data structure and its attributes should be considered immutable, even if it contains mutable data types. See copy() for an easy way of producing a modified copy.
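The "modified copy" idiom can be illustrated with a frozen dataclass. KernelSketch below is a hypothetical stand-in, not loopy.LoopKernel; the sketch assumes that copy() takes the fields to be replaced as keyword arguments and leaves the original object untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelSketch:
    """Hypothetical stand-in for a kernel-like immutable record."""
    name: str
    instructions: tuple

    def copy(self, **kwargs):
        # Build a new object with the given fields replaced.
        return replace(self, **kwargs)


original = KernelSketch(name="loopy_kernel", instructions=("a[i] = 2*b[i]",))
renamed = original.copy(name="scaled_copy")

print(original.name)  # the original is unchanged
print(renamed.name)   # the copy carries the new name
```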

domains

A list of islpy.BasicSet instances representing the Loop Domain Forest.

instructions

A list of InstructionBase instances, e.g. Assignment. See Instructions.

args

A list of loopy.KernelArgument

schedule

None or a list of loopy.schedule.ScheduleItem

name
preambles
preamble_generators
assumptions

An islpy.BasicSet parameter domain.

local_sizes
temporary_variables

A dict mapping variable names to loopy.TemporaryVariable instances.

iname_to_tags

A dict mapping inames (as strings) to sets of instances of loopy.kernel.data.IndexTag.

New in version 2018.1.

function_manglers
symbol_manglers
substitutions

A mapping from substitution names to SubstitutionRule objects.

iname_slab_increments

A dictionary mapping inames to (lower_incr, upper_incr) tuples. The corresponding iterations at the lower and upper end of the loop are separated out in execution, so that the remaining ‘bulk’ slab can be generated with fewer conditionals.
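One way to picture the slab decomposition is the plain-Python sketch below. The helper split_into_slabs is hypothetical, and reading lower_incr and upper_incr as iteration counts at the start and end of a loop over range(n) is an assumption made for illustration.

```python
def split_into_slabs(n, lower_incr, upper_incr):
    """Split range(n) into (lower, bulk, upper) sub-ranges."""
    lower_end = min(lower_incr, n)
    # The upper slab never reaches back into the lower slab.
    upper_start = max(n - upper_incr, lower_end)
    return (
        range(0, lower_end),
        range(lower_end, upper_start),
        range(upper_start, n),
    )


lower, bulk, upper = split_into_slabs(10, 1, 1)
print(list(lower), list(bulk), list(upper))
```

Boundary checks would then only be needed in the two small edge slabs, while the bulk slab runs unconditionally.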

loop_priority

A frozenset of priority constraints to the kernel. Each such constraint is a tuple of inames. Inames occurring in such a tuple will be scheduled earlier than any iname following in the tuple. This applies only to inames with non-parallel implementation tags.
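As a sketch of what such constraints mean, the following checks a candidate loop nest order against a set of priority tuples. The function respects_priorities is hypothetical and not part of loopy, and "scheduled earlier" is read here as "nested further out", which is an assumption.

```python
def respects_priorities(nest_order, loop_priority):
    """Check a loop nest order (outermost first) against priority tuples."""
    position = {iname: depth for depth, iname in enumerate(nest_order)}
    for constraint in loop_priority:
        # Only inames that actually appear in this nest are compared.
        depths = [position[iname] for iname in constraint if iname in position]
        if depths != sorted(depths):
            return False
    return True


loop_priority = frozenset([("i", "j"), ("j", "k")])
print(respects_priorities(["i", "j", "k"], loop_priority))  # True
print(respects_priorities(["j", "i", "k"], loop_priority))  # False
```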

silenced_warnings
applied_iname_rewrites

A list of past substitution dictionaries that were applied to the kernel. These are stored so that they may be repeated on expressions the user specifies later.

cache_manager
options

An instance of loopy.Options

state

A value from KernelState.

target

A subclass of loopy.TargetBase.

class loopy.KernelState
INITIAL = 0
PREPROCESSED = 1
SCHEDULED = 2

Implementation Detail: The Base Array

All array-like data in loopy (such as ArrayArg and TemporaryVariable) derive from a single, shared base array type, described next.

class loopy.kernel.array.ArrayBase(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, target=None, alignment=None, **kwargs)
name
dtype

The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.

Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.

shape

May be one of the following:

  • None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.

  • loopy.auto. The shape will be determined by finding the access footprint.

  • a tuple like numpy.ndarray.shape.

    Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    Any element of the shape tuple not used to compute strides may be None.

dim_tags

See Data Axis Tags.

offset
Offset from the beginning of the buffer to the point from which the strides are counted. May be one of:

  • 0 or None
  • a string (that is interpreted as an argument name).
  • a pymbolic expression
  • loopy.auto, in which case an offset argument is added automatically, immediately following this argument. loopy.CompiledKernel is even smarter in its treatment of this case and will compile custom versions of the kernel based on whether the passed arrays have offsets or not.
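How offset and strides combine to locate an entry in the underlying buffer can be sketched as follows. The function linear_index is a hypothetical helper, and counting both offset and strides in units of array entries (rather than bytes) is an assumption made for simplicity.

```python
def linear_index(indices, strides, offset=0):
    """Position of an array entry in the flat buffer, counted in entries."""
    if len(indices) != len(strides):
        raise ValueError("need one stride per index")
    return offset + sum(i * s for i, s in zip(indices, strides))


# A (3, 5, 4) array in row-major order has strides (20, 4, 1).
print(linear_index((1, 2, 3), (20, 4, 1)))            # -> 31
print(linear_index((1, 2, 3), (20, 4, 1), offset=7))  # -> 38
```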
dim_names

A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as shape and dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.

alignment

Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.

Defaults to None.

If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CTarget.

New in version 2018.1.

__init__(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, target=None, alignment=None, **kwargs)

All of the following (except name) are optional. Specify either strides or shape.

Parameters:
  • name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case one argument is created per name, all with identical properties.
  • shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
  • dim_tags – A comma-separated list of tags as understood by parse_array_dim_tag().
  • strides

    May be one of the following:

    • None
    • loopy.auto. The strides will be determined by order and the access footprint.
    • a tuple like numpy.ndarray.shape.

      Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.

    • A string which can be parsed into the previous form.
  • order – “C” or “F” for C (row-major) or Fortran (column-major) order. Defaults to the default_order argument passed to loopy.make_kernel().
  • for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
  • offset – (See offset)
  • alignment – memory alignment in bytes
__eq__(other)

Return self==value.

num_user_axes(require_answer=True)
num_target_axes()
vector_size(target)

Return the size of the vector type used for the array divided by the basic data type.

Note: For 3-vectors, this will be 4.

(supports persistent hashing)