Reference: Transforming Kernels

Dealing with Parameters

loopy.fix_parameters(kernel, **value_dict)[source]

Fix the values of the arguments to specific constants.

value_dict consists of name/value pairs, where name will be fixed to be value. name may refer to Domain parameters or Arguments.

loopy.assume(kernel, assumptions)[source]

Include an assumption about Domain parameters in the kernel, e.g. n mod 4 = 0.

Parameters

assumptions – an islpy.BasicSet or a string representation of the assumptions in ISL syntax.

Wrangling inames

loopy.split_iname(kernel, split_iname, inner_length, *, outer_iname=None, inner_iname=None, outer_tag=None, inner_tag=None, slabs=(0, 0), do_tagged_check=True, within=None)[source]

Split split_iname into two inames (an ‘inner’ one and an ‘outer’ one) so that split_iname == inner + outer*inner_length and inner is of constant length inner_length.

Parameters
  • outer_iname – The new iname to use for the ‘outer’ loop. Defaults to a name derived from split_iname + "_outer"

  • inner_iname – The new iname to use for the ‘inner’ (fixed-length) loop. Defaults to a name derived from split_iname + "_inner"

  • inner_length – a positive integer

  • slabs – A tuple (head_it_count, tail_it_count) indicating the number of leading/trailing iterations of outer_iname for which separate code should be generated.

  • outer_tag – The iname tag (see Iname Implementation Tags) to apply to outer_iname.

  • inner_tag – The iname tag (see Iname Implementation Tags) to apply to inner_iname.

  • within – a stack match as understood by loopy.match.parse_match().

Split inames do not inherit tags from their ‘parent’ inames.
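The index relation above can be illustrated without loopy at all. The following is a minimal plain-Python sketch, not loopy's implementation; the helper name split_loop and the explicit bounds guard are assumptions made for illustration.

```python
def split_loop(n, inner_length):
    """Visit every i in [0, n) through an outer/inner loop pair."""
    num_outer = -(-n // inner_length)  # ceiling division
    visited = []
    for outer in range(num_outer):
        for inner in range(inner_length):  # constant trip count
            i = inner + outer * inner_length
            if i < n:  # guard needed when inner_length does not divide n
                visited.append(i)
    return visited


print(split_loop(10, 4))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The split loop nest visits the same indices in the same order as the original loop; only the last outer iteration is partial when inner_length does not divide n.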

loopy.chunk_iname(kernel, split_iname, num_chunks, outer_iname=None, inner_iname=None, outer_tag=None, inner_tag=None, slabs=(0, 0), do_tagged_check=True, within=None)[source]

Split split_iname into two inames (an ‘inner’ one and an ‘outer’ one) so that split_iname == inner + outer*chunk_length and outer is of fixed length num_chunks.

Parameters

within – a stack match as understood by loopy.match.parse_stack_match().

Split inames do not inherit tags from their ‘parent’ inames.

New in version 2016.2.
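For contrast with split_iname, here is a plain-Python sketch in which the outer loop has the fixed length. It is an illustration only: the helper name chunk_loop is hypothetical, and computing chunk_length by ceiling division is an assumption rather than something stated above.

```python
def chunk_loop(n, num_chunks):
    """Visit every i in [0, n) using exactly num_chunks outer iterations."""
    chunk_length = -(-n // num_chunks)  # assumed: ceiling division
    visited = []
    for outer in range(num_chunks):  # fixed trip count
        for inner in range(chunk_length):
            i = inner + outer * chunk_length
            if i < n:  # guard for the partial last chunk
                visited.append(i)
    return visited


print(chunk_loop(10, 3))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```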

loopy.join_inames(kernel, inames, new_iname=None, tag=None, within=None)[source]

In a sense, the inverse of split_iname(). Takes in inames, finds their bounds (all but the first have to be bounded), and combines them into a single loop via analogs of new_iname = i0 * LEN(i1) + i1. The old inames are re-obtained via the appropriate division/modulo operations.
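The joining formula and its inverse can be checked with a short plain-Python sketch for the two-iname case. The helper names join and unjoin are hypothetical and are not part of loopy.

```python
def join(i0, i1, len_i1):
    """Combine two loop indices into one, as in new_iname = i0 * LEN(i1) + i1."""
    return i0 * len_i1 + i1


def unjoin(joined, len_i1):
    """Recover (i0, i1) from the joined index via division and modulo."""
    return divmod(joined, len_i1)


len_i1 = 5
for i0 in range(3):
    for i1 in range(len_i1):
        # Round trip: joining and then un-joining gives back the original pair.
        assert unjoin(join(i0, i1, len_i1), len_i1) == (i0, i1)
```

Note that only LEN(i1) appears in the formula, which is consistent with the requirement that all inames but the first be bounded.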

Parameters
loopy.untag_inames(kernel, iname_to_untag, tag_type)[source]

Remove the tags on iname_to_untag that match tag_type.

Parameters

New in version 2018.1.

loopy.tag_inames(kernel, iname_to_tag, force=False, ignore_nonexistent=False)[source]

Tag an iname

Parameters

iname_to_tag – a list of tuples (iname, new_tag). new_tag is given as an instance of a subclass of pytools.tag.Tag, for example a subclass of loopy.kernel.data.InameImplementationTag. It may also be an iterable of such tags, or a string as shown in Iname Implementation Tags. iname_to_tag may also be a dictionary for backwards compatibility. iname may also be a wildcard using * and ?.

Changed in version 2016.3: Added wildcards.

Changed in version 2018.1: Added iterable of tags
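To give a feel for the wildcard form, the sketch below expands (pattern, tag) pairs against a set of iname names using shell-style matching from the standard library. This is an assumption about how * and ? behave, not loopy's code; expand_wildcard_tags and the iname names are made up for illustration.

```python
from fnmatch import fnmatchcase


def expand_wildcard_tags(all_inames, iname_to_tag):
    """Expand (pattern, tag) pairs into concrete (iname, tag) pairs."""
    result = []
    for pattern, tag in iname_to_tag:
        for iname in sorted(all_inames):
            if fnmatchcase(iname, pattern):  # '*' and '?' as in shell globs
                result.append((iname, tag))
    return result


inames = {"i_inner", "i_outer", "j"}
print(expand_wildcard_tags(inames, [("i_*", "unr")]))
# [('i_inner', 'unr'), ('i_outer', 'unr')]
```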

loopy.duplicate_inames(kernel, inames, within, new_inames=None, suffix=None, tags=None)[source]
Parameters

within – a stack match as understood by loopy.match.parse_stack_match().

loopy.get_iname_duplication_options(kernel, use_boostable_into=None)[source]

List options for duplication of inames, if necessary for schedulability

Returns

a generator listing all options to duplicate inames, if duplication of an iname is necessary to ensure the schedulability of the kernel. Duplication options are returned as tuples (iname, within) as understood by duplicate_inames(). There is no guarantee that the transformed kernel will be schedulable, because multiple duplications of iname may be necessary.

Some kernels require the duplication of inames in order to be schedulable, as the forced iname dependencies define an over-determined problem to the scheduler. Consider the following minimal example:

import loopy as lp

knl = lp.make_kernel(["{[i,j]:0<=i,j<n}"],
                     """
                     mat1[i,j] = mat1[i,j] + 1 {inames=i:j, id=i1}
                     mat2[j] = mat2[j] + 1 {inames=j, id=i2}
                     mat3[i] = mat3[i] + 1 {inames=i, id=i3}
                     """)

In the example, there are four possibilities to resolve the problem:

  • duplicating i in instruction i3

  • duplicating i in instruction i1 and i3

  • duplicating j in instruction i2

  • duplicating i in instruction i2 and i3

Use has_schedulable_iname_nesting() to decide whether an iname needs to be duplicated in a given kernel.

loopy.has_schedulable_iname_nesting(kernel)[source]
Returns

a bool indicating whether the iname nesting of this kernel is schedulable as is. A value of False means that the kernel needs an iname duplication in order to be schedulable.

loopy.prioritize_loops(kernel, loop_priority)[source]

Indicates the textual order in which loops should be entered in the kernel code. Note that this priority has an advisory role only. If the kernel logically requires a different nesting, priority is ignored. Priority is only considered if loop nesting is ambiguous.

prioritize_loops can be used multiple times. If you do so, each given loop_priority specifies a scheduling constraint. The constraints from all calls to prioritize_loops together establish a partial order on the inames (see https://en.wikipedia.org/wiki/Partially_ordered_set).

Parameters

loop_priority – an iterable of inames, or, for brevity, a comma-separated string of inames
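How several priority constraints combine into a partial order can be sketched in plain Python. This is a conceptual illustration only, not how loopy stores or resolves priorities: the helpers priority_pairs and respects are hypothetical, and treating each priority list as implying all of its ordered pairs is an assumption.

```python
from itertools import combinations


def priority_pairs(loop_priority):
    """Turn one priority spec into a set of (outer, inner) ordering pairs."""
    if isinstance(loop_priority, str):
        loop_priority = [name.strip() for name in loop_priority.split(",")]
    return set(combinations(loop_priority, 2))


def respects(nesting, constraints):
    """Check whether a loop nesting (outermost first) obeys all constraints."""
    pos = {iname: depth for depth, iname in enumerate(nesting)}
    return all(pos[a] < pos[b] for a, b in constraints
               if a in pos and b in pos)


# Two separate "calls", each contributing one constraint.
constraints = priority_pairs("i,j") | priority_pairs(["j", "k"])
print(respects(["i", "j", "k"], constraints))  # True
print(respects(["k", "j", "i"], constraints))  # False
```

Inames that no constraint mentions remain unordered relative to the others, which is what makes the result a partial rather than a total order.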

loopy.rename_iname(kernel, old_iname, new_iname, existing_ok=False, within=None)[source]
Parameters
loopy.remove_unused_inames(kernel, inames=None)[source]

Delete those among inames that are unused, i.e. project them out of the domain. If these inames pose implicit restrictions on other inames, these restrictions will persist as existentially quantified variables.

Parameters

inames – may be an iterable of inames or a string of comma-separated inames.

loopy.split_reduction_inward(kernel, inames, within=None)[source]

Takes a reduction of the form:

sum([i,j,k], ...)

and splits it into two nested reductions:

sum([j,k], sum([i], ...))

In this case, inames would have been "i" indicating that the iname i should be made the iname governing the inner reduction.

Parameters

inames – A list of inames, or a comma-separated string that can be parsed into those

loopy.split_reduction_outward(kernel, inames, within=None)[source]

Takes a reduction of the form:

sum([i,j,k], ...)

and splits it into two nested reductions:

sum([i], sum([j,k], ...))

In this case, inames would have been "i" indicating that the iname i should be made the iname governing the outer reduction.

Parameters

inames – A list of inames, or a comma-separated string that can be parsed into those
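Both reduction splits rely on the fact that a sum over several inames can be regrouped into nested sums without changing its value. The plain-Python sketch below checks this for a small, made-up summand f and made-up bounds; none of it is loopy code.

```python
from itertools import product


def f(i, j, k):
    """Arbitrary summand standing in for the '...' in the reduction."""
    return i + 10 * j + 100 * k


I, J, K = range(2), range(3), range(4)

# sum([i,j,k], ...)
flat = sum(f(i, j, k) for i, j, k in product(I, J, K))

# split_reduction_inward with inames="i": sum([j,k], sum([i], ...))
inward = sum(sum(f(i, j, k) for i in I) for j, k in product(J, K))

# split_reduction_outward with inames="i": sum([i], sum([j,k], ...))
outward = sum(sum(f(i, j, k) for j, k in product(J, K)) for i in I)

assert flat == inward == outward
```

The two variants differ only in which level of the nest the chosen iname ends up governing.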

loopy.affine_map_inames(kernel, old_inames, new_inames, equations)[source]

Return a new kernel where the affine transform specified by equations has been applied to the inames.

Parameters
  • old_inames – A list of inames to be replaced by affine transforms of their values. May also be a string of comma-separated inames.

  • new_inames – A list of new inames that are not yet used in kernel, but have their values established in terms of old_inames by equations. May also be a string of comma-separated inames.

  • equations – A list of equations establishing a relationship between old_inames and new_inames. Each equation may be a tuple (lhs, rhs) of expressions or a string, with left and right hand side of the equation separated by =.

loopy.find_unused_axis_tag(kernel, kind, insn_match=None)[source]

For one of the hardware-parallel execution tags, find an unused axis.

Parameters
  • insn_match – An instruction match as understood by loopy.match.parse_match().

  • kind – may be “l” or “g”, or the corresponding tag class name

Returns

a loopy.kernel.data.GroupInameTag or loopy.kernel.data.LocalInameTag that is not being used within the instructions matched by insn_match.

loopy.make_reduction_inames_unique(kernel, inames=None, within=None)[source]
Parameters

New in version 2016.2.

loopy.add_inames_to_insn(kernel, inames, insn_match)[source]
Parameters
  • inames – a frozenset of inames that will be added to the instructions matched by insn_match, or a comma-separated string that parses to such a set.

  • insn_match – An instruction match as understood by loopy.match.parse_match().

Returns

a kernel with inames added to the instructions matched by insn_match.

New in version 2016.3.

loopy.add_inames_for_unused_hw_axes(kernel, within=None)[source]

Returns a kernel with inames added to each instruction corresponding to any hardware-parallel iname tags (loopy.kernel.data.GroupInameTag, loopy.kernel.data.LocalInameTag) unused in the instruction but used elsewhere in the kernel.

Current limitations:

  • Only one iname in the kernel may be tagged with each of the unused hw axes.

  • Occurrence of an l.auto tag when an instruction is missing one of the local hw axes.

Parameters

within – An instruction match as understood by loopy.match.parse_match().

Dealing with Substitution Rules

loopy.extract_subst(kernel, subst_name, template, parameters=(), within=None)[source]
Parameters
  • subst_name – The name of the substitution rule to be created.

  • template – Unification template expression.

  • parameters – An iterable of parameters used in template, or a comma-separated string of the same.

  • within – An instance of loopy.match.MatchExpressionBase or str as understood by loopy.match.parse_match().

All targeted subexpressions must match (‘unify with’) template. The template may contain ‘*’ wildcards that will have to match exactly across all unifications.

loopy.assignment_to_subst(kernel, lhs_name, extra_arguments=(), within=None, force_retain_argument=False)[source]

Extract an assignment (to a temporary variable or an argument) as a substitution rule (see Substitution Rules). The temporary may be an array, in which case the array indices will become arguments to the substitution rule.

Parameters
  • within – a stack match as understood by loopy.match.parse_stack_match().

  • force_retain_argument – If True and if lhs_name is an argument, it is kept even if it is no longer referenced.

This operation will change all usage sites of lhs_name matched by within. If there are further usage sites of lhs_name, then the original assignment to lhs_name as well as the temporary variable is left in place.

loopy.expand_subst(kernel, within=None)[source]

Returns an instance of loopy.LoopKernel with the substitutions referenced in instructions of kernel matched by within expanded.

Parameters

within – a stack match as understood by loopy.match.parse_stack_match().

loopy.find_rules_matching(kernel, pattern)[source]
Pattern

A shell-style glob pattern.

loopy.find_one_rule_matching(program, pattern)[source]

Caching, Precomputation and Prefetching

loopy.precompute(program, *args, **kwargs)[source]
loopy.add_prefetch(program, *args, **kwargs)[source]
loopy.buffer_array(program, *args, **kwargs)[source]
loopy.alias_temporaries(kernel, names, base_name_prefix=None, synchronize_for_exclusive_use=True)[source]

Sets all temporaries given by names to be backed by a single piece of storage.

Parameters
  • synchronize_for_exclusive_use – A bool. If True, this also introduces ordering structures (“groups”) to ensure that the live ranges (i.e. the regions of code where each of the temporaries is used) do not overlap. This will allow two (or more) temporaries to share the same storage space as long as their live ranges do not need to be concurrent.

  • base_name_prefix – an identifier to be used for the common storage area

Changed in version 2016.3: Added synchronize_for_exclusive_use flag. synchronize_for_exclusive_use=True was the previous default behavior.

Influencing data access

loopy.change_arg_to_image(kernel, name)[source]
loopy.tag_array_axes(kernel, ary_names, dim_tags)[source]
Parameters

dim_tags

a tuple of loopy.kernel.array.ArrayDimImplementationTag or a string that parses to one. See loopy.kernel.array.parse_array_dim_tags() for a description of the allowed string format.

For example, dim_tags could be "N2,N0,N1" to determine that the second axis is the fastest-varying, the last is the next-fastest, and the first is the slowest.

Changed in version 2016.2: This function was called tag_data_axes before version 2016.2.
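The "N2,N0,N1" example can be made concrete by computing element strides for a small array. This is a plain-Python sketch under simplifying assumptions (no padding, only N-style tags, strides counted in elements); strides_from_nesting is a hypothetical helper and not part of loopy.

```python
def strides_from_nesting(shape, dim_tags):
    """Derive per-axis element strides from nesting levels such as 'N2,N0,N1'."""
    levels = [int(tag.strip()[1:]) for tag in dim_tags.split(",")]
    strides = [0] * len(shape)
    stride = 1
    # Level 0 is the fastest-varying axis, so assign strides in level order.
    for axis in sorted(range(len(shape)), key=lambda ax: levels[ax]):
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


print(strides_from_nesting((4, 5, 6), "N2,N0,N1"))  # (30, 1, 5)
```

For the shape (4, 5, 6), the second axis gets stride 1 (fastest), the last axis stride 5, and the first axis stride 30 (slowest), matching the description above.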

loopy.remove_unused_arguments(kernel)[source]
loopy.set_array_axis_names(kernel, ary_names, dim_names)[source]

Changed in version 2016.2: This function was called set_array_dim_names before version 2016.2.

loopy.privatize_temporaries_with_inames(kernel, privatizing_inames, only_var_names=None)[source]

This function provides each loop iteration of the privatizing_inames with its own private entry in the temporaries it accesses (possibly restricted to only_var_names).

This is accomplished implicitly as part of generating instruction-level parallelism by the “ILP” tag and accessible separately through this transformation.

Example:

for imatrix, i
    acc = 0
    for k
        acc = acc + a[imatrix, i, k] * vec[k]
    end
end

might become:

for imatrix, i
    acc[imatrix] = 0
    for k
        acc[imatrix] = acc[imatrix] + a[imatrix, i, k] * vec[k]
    end
end

facilitating loop interchange of the imatrix loop.

New in version 2018.1.
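Why the private entry enables interchange can be seen in a plain-Python rendering of the two loop nests. This is a sketch, not loopy output: the out dictionary that records each accumulated value is an addition made here so the results can be compared, and the function names are hypothetical.

```python
def original(a, vec):
    """Scalar accumulator: imatrix must stay outside the k loop."""
    out = {}
    for imatrix in range(len(a)):
        for i in range(len(a[0])):
            acc = 0
            for k in range(len(vec)):
                acc = acc + a[imatrix][i][k] * vec[k]
            out[imatrix, i] = acc  # added for comparison only
    return out


def privatized_and_interchanged(a, vec):
    """One accumulator per imatrix: the imatrix loop can move innermost."""
    out = {}
    nmatrices = len(a)
    for i in range(len(a[0])):
        acc = [0] * nmatrices
        for k in range(len(vec)):
            for imatrix in range(nmatrices):
                acc[imatrix] = acc[imatrix] + a[imatrix][i][k] * vec[k]
        for imatrix in range(nmatrices):
            out[imatrix, i] = acc[imatrix]  # added for comparison only
    return out


a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
vec = [1, 10]
assert original(a, vec) == privatized_and_interchanged(a, vec)
```

With a single scalar acc, moving the imatrix loop inside the k loop would mix contributions from different matrices; giving each imatrix its own slot removes that conflict.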

Padding Data

loopy.split_array_axis(kernel, array_names, axis_nr, count, order='C')[source]
Parameters
  • array_names – a list of names of temporary variables or arguments. May also be a comma-separated string of these.

  • axis_nr – the (zero-based) index of the axis that should be split.

  • count – The group size to use in the split.

  • order – The way the split array axis should be linearized. May be “C” or “F” to indicate C/Fortran (row/column)-major order.

Changed in version 2016.2: There was a more complicated, dumber function called loopy.split_array_dim that had the role of this function in versions prior to 2016.2.
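The effect of splitting an axis on an individual array index can be sketched as follows. This is an illustration in plain Python, not loopy's implementation: split_axis_index is a hypothetical helper, and the assumption that "C" places the outer (group) index before the inner one while "F" reverses the pair is made here for illustration.

```python
def split_axis_index(index, axis_nr, count, order="C"):
    """Map an index tuple to its counterpart after splitting one axis."""
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    outer, inner = divmod(index[axis_nr], count)
    # Assumed convention: C order puts the outer (group) index first.
    pair = (outer, inner) if order == "C" else (inner, outer)
    return index[:axis_nr] + pair + index[axis_nr + 1:]


print(split_axis_index((7, 2), 0, 4))        # (1, 3, 2)
print(split_axis_index((7, 2), 0, 4, "F"))   # (3, 1, 2)
```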

loopy.find_padding_multiple(kernel, variable, axis, align_bytes, allowed_waste=0.1)[source]
loopy.add_padding(kernel, variable, axis, align_bytes)[source]

Manipulating Instructions

loopy.set_instruction_priority(kernel, insn_match, priority)[source]

Set the priority of instructions matching insn_match to priority.

insn_match may be any instruction id match understood by loopy.match.parse_match().

loopy.add_dependency(kernel, insn_match, depends_on)[source]

Add the instruction dependency depends_on to the instructions matched by insn_match.

insn_match and depends_on may be any instruction id match understood by loopy.match.parse_match().

Changed in version 2016.3: Third argument renamed to depends_on for clarity, allowed to be not just ID but also match expression.

loopy.remove_instructions(kernel, insn_ids)[source]

Return a new kernel with instructions in insn_ids removed.

Dependencies across deleted instructions are transitively propagated, i.e. if insn_a depends on insn_b, which depends on insn_c, and insn_b is to be removed, then the returned kernel will have a dependency from insn_a to insn_c.

This also updates no_sync_with for all instructions.

Parameters

insn_ids – An instance of set or str as understood by loopy.match.parse_match() or loopy.match.MatchExpressionBase.
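The transitive propagation of dependencies described above can be sketched on a plain dictionary that maps instruction ids to the ids they depend on. This models only the dependency bookkeeping, not loopy's kernel objects or no_sync_with handling; remove_with_propagation is a hypothetical helper.

```python
def remove_with_propagation(depends_on, to_remove):
    """Drop instructions and reroute dependencies that passed through them."""

    def resolve(dep, seen):
        # A surviving dependency stands for itself.
        if dep not in to_remove:
            return {dep}
        if dep in seen:  # protect against cyclic input
            return set()
        result = set()
        for upstream in depends_on.get(dep, ()):
            result |= resolve(upstream, seen | {dep})
        return result

    new_depends_on = {}
    for insn, deps in depends_on.items():
        if insn in to_remove:
            continue
        new_deps = set()
        for dep in deps:
            new_deps |= resolve(dep, frozenset())
        new_depends_on[insn] = new_deps
    return new_depends_on


deps = {"insn_a": {"insn_b"}, "insn_b": {"insn_c"}, "insn_c": set()}
print(remove_with_propagation(deps, {"insn_b"}))
# {'insn_a': {'insn_c'}, 'insn_c': set()}
```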

loopy.replace_instruction_ids(kernel, replacements)[source]
loopy.tag_instructions(kernel, new_tag, within=None)[source]
loopy.add_nosync(kernel, scope, source, sink, bidirectional=False, force=False, empty_ok=False)[source]

Add a no_sync_with directive between source and sink. no_sync_with is only added if sink depends on source or if the instruction pair is in a conflicting group.

This function does not check for the presence of a memory dependency.

Parameters
  • kernel – The kernel

  • source – Either a single instruction id, or any instruction id match understood by loopy.match.parse_match().

  • sink – Either a single instruction id, or any instruction id match understood by loopy.match.parse_match().

  • scope – A valid no_sync_with scope. See loopy.InstructionBase.no_sync_with for allowable scopes.

  • bidirectional – A bool. If True, add a no_sync_with to both the source and sink instructions, otherwise the directive is only added to the sink instructions.

  • force – A bool. If True, add a no_sync_with directive even without the presence of a dependency edge or conflicting instruction group.

  • empty_ok – If True, do not complain even if no nosync tags were added as a result of the transformation.

Returns

The updated kernel

Changed in version 2018.1: If the transformation adds no nosync directives, it will complain. This used to silently pass. This behavior can be restored using empty_ok.

loopy.add_barrier(kernel, insn_before='', insn_after='', id_based_on=None, tags=None, synchronization_kind='global', mem_kind=None, within_inames=None)[source]

Takes a kernel that needs a barrier and returns a kernel with the barrier inserted. The barrier is placed between the two sets of instructions given by insn_before and insn_after, each of which can be any expression understood by loopy.match.parse_match().

Parameters
  • insn_before – String expression that specifies the instruction(s) before the barrier which is to be added. If None, no dependencies will be added to barrier.

  • insn_after – String expression that specifies the instruction(s) after the barrier which is to be added. If None, no dependencies on the barrier will be added.

  • id_based_on – String on which the id of the barrier will be based.

  • tags – The tag of the group to which the barrier must be added

  • synchronization_kind – Kind of barrier to be added. May be “global” or “local”

  • mem_kind – Type of memory to be synchronized. May be “global” or “local”. Ignored for “global” barriers. If not supplied, defaults to synchronization_kind

  • within_inames – A frozenset of inames identifying the loops within which the barrier will be executed.

Registering Library Routines

loopy.register_reduction_parser(parser)[source]

Register a new loopy.library.reduction.ReductionOperation.

Parameters

parser – A function that receives a string and returns a subclass of ReductionOperation.

loopy.register_preamble_generators(kernel, preamble_generators)[source]
Parameters

preamble_generators – list of functions of signature (preamble_info) generating tuples (sortable_str_identifier, code), where preamble_info is a PreambleInfo.

Returns

kernel with preamble_generators registered

loopy.register_symbol_manglers(kernel, manglers)[source]

Modifying Arguments

loopy.set_argument_order(kernel, arg_names)[source]
Parameters

arg_names – A list (or comma-separated string) of argument names. All arguments must be in this list.

loopy.add_dtypes(prog_or_kernel, dtype_dict)[source]

Specify remaining unspecified argument/temporary variable types.

Parameters

dtype_dict – a mapping from variable names to numpy.dtype instances

loopy.infer_unknown_types(program, expect_completion=False)[source]

Infer types on temporaries and arguments.

loopy.add_and_infer_dtypes(prog, dtype_dict, expect_completion=False, kernel_name=None)[source]
loopy.rename_argument(kernel, old_name, new_name, existing_ok=False)[source]

New in version 2016.2.

loopy.set_temporary_scope(kernel, temp_var_names, scope)[source]
Parameters
  • temp_var_names – a container with membership checking, or a comma-separated string of variables for which the scope is to be set.

  • scope – One of the values from loopy.AddressSpace, or one of the strings "private", "local", or "global".

Creating Batches of Operations

loopy.to_batched(kernel, nbatches, batch_varying_args, batch_iname_prefix='ibatch', sequential=False)[source]

Takes in a kernel that carries out an operation and returns a kernel that carries out a batch of these operations.

Note

If sequential=True, loopy does not batch temporaries that are private or read-only globals unless they are explicitly mentioned in batch_varying_args.

Parameters
  • nbatches – the number of batches. May be a constant non-negative integer or a string, which will be added as an integer argument.

  • batch_varying_args – a list of argument names that vary per-batch. Each such variable will have a batch index added.

  • sequential – A bool. If True, do not duplicate temporary variables for each batch. This automatically tags the batch iname for sequential execution.
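Conceptually, batching wraps a single operation in an extra loop and adds a batch index to the arguments that vary per batch. The plain-Python sketch below shows the idea for a matrix-vector product in which only the vector and the result vary per batch. It is not loopy code, and the function names are hypothetical.

```python
def matvec(a, x):
    """The single operation: y = A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def batched_matvec(a, xs):
    """Batched form: 'a' is shared, 'xs' and the result carry a batch index."""
    nbatches = len(xs)
    return [matvec(a, xs[ibatch]) for ibatch in range(nbatches)]


a = [[1, 2], [3, 4]]
xs = [[1, 0], [0, 1], [1, 1]]
print(batched_matvec(a, xs))  # [[1, 3], [2, 4], [3, 7]]
```

In this picture, xs and the result play the role of batch_varying_args, while a is left without a batch index.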

Finishing up

loopy.preprocess_kernel(program, device=None)
loopy.generate_loop_schedules(kernel, callables_table, debug_args=None)[source]

Warning

This function needs to be called inside (another layer) of a loopy.schedule.MinRecursionLimitForScheduling context manager, and the context manager needs to end after the last reference to the generators has gone out of scope. Otherwise, the high-recursion-limit generator chain may not be successfully garbage-collected and cause an internal error in the Python runtime.

loopy.get_one_linearized_kernel(kernel, callables_table)[source]
loopy.save_and_reload_temporaries(program, entrypoint=None)[source]

Add instructions to save and reload temporary variables that are live across kernel calls.

The basic code transformation turns schedule segments:

t = <...>
<return followed by call>
<...> = t

into this code:

t = <...>
t_save_slot = t
<return followed by call>
t = t_save_slot
<...> = t

where t_save_slot is a newly-created global temporary variable.

Returns

The resulting kernel

class loopy.GeneratedProgram(*args, **kwargs)[source]
name
is_device_program
ast

Once generated, this captures the AST of the overall function definition, including the body.

body_ast

Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.

class loopy.CodeGenerationResult(*args, **kwargs)[source]
host_program
device_programs

A list of GeneratedProgram instances intended to run on the compute device.

implemented_domains

A mapping from instruction ID to a list of islpy.Set objects.

host_preambles
device_preambles
host_code()[source]
device_code()[source]
all_code()[source]
implemented_data_info

a list of loopy.codegen.ImplementedDataInfo objects. Only added at the very end of code generation.

loopy.generate_code_v2(program)[source]

Returns an instance of CodeGenerationResult.

Parameters

program – An instance of loopy.TranslationUnit.

loopy.generate_header(kernel, codegen_result=None)[source]
Parameters
Returns

a list of AST nodes (which may have str called on them to produce a string) representing function declarations for the generated device functions.

Setting options

loopy.set_options(kernel, *args, **kwargs)[source]

Return a new kernel with the options given as keyword arguments, or from a string representation passed in as the first (and only) positional argument.

See also Options.

Matching contexts

TODO: Matching instruction tags

loopy.match.parse_match(expr)[source]

Syntax examples:

  • id:yoink and writes:a_temp

  • id:yoink and (not writes:a_temp or tag:input)

loopy.match.parse_stack_match(smatch)[source]

Syntax example:

... > outer > ... > next > innermost $
insn > next
insn > ... > next > innermost $

... matches an arbitrary number of intervening stack levels.

Each of the entries is a match expression as understood by parse_match().

Match expressions

class loopy.match.MatchExpressionBase[source]
class loopy.match.All[source]
class loopy.match.And(children)[source]
class loopy.match.Or(children)[source]
class loopy.match.Not(child)[source]
class loopy.match.Id(glob)[source]
class loopy.match.ObjTagged(tag: pytools.tag.Tag)[source]

Match if the object is tagged with a given Tag.

Note

These instance-based tags will, in the not-too-distant future, replace the string-based tags matched by Tagged.

class loopy.match.Tagged(glob)[source]

Match a string-based tag using a glob expression.

Note

These string-based tags will, in the not-too-distant future, be replaced by instance-based tags matched by ObjTagged.

class loopy.match.Writes(glob)[source]
class loopy.match.Reads(glob)[source]
class loopy.match.InKernel(glob)[source]
class loopy.match.Iname(glob)[source]