Reference: Transforming Kernels
Dealing with Parameters

loopy.fix_parameters(kernel, **value_dict)
Fix the values of the arguments to specific constants.
value_dict consists of name/value pairs, where name will be fixed to be value. name may refer to Domain parameters or Arguments.

loopy.assume(kernel, assumptions)
Include an assumption about Domain parameters in the kernel, e.g. n mod 4 = 0.
Parameters:
- assumptions – an islpy.BasicSet or a string representation of the assumptions in ISL syntax.

Wrangling inames

loopy.split_iname(kernel, split_iname, inner_length, *, outer_iname=None, inner_iname=None, outer_tag=None, inner_tag=None, slabs=(0, 0), do_tagged_check=True, within=None)
Split split_iname into two inames (an ‘inner’ one and an ‘outer’ one) so that

    split_iname == inner + outer*inner_length

and inner is of constant length inner_length.
Parameters:
- outer_iname – The new iname to use for the ‘outer’ loop. Defaults to a name derived from split_iname + "_outer".
- inner_iname – The new iname to use for the ‘inner’ (fixed-length) loop. Defaults to a name derived from split_iname + "_inner".
- inner_length – a positive integer.
- slabs – A tuple (head_it_count, tail_it_count) indicating the number of leading/trailing iterations of outer_iname for which separate code should be generated.
- outer_tag – The iname tag (see Iname Implementation Tags) to apply to outer_iname.
- inner_tag – The iname tag (see Iname Implementation Tags) to apply to inner_iname.
- within – a stack match as understood by loopy.match.parse_match().
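The relation above can be checked with a small pure-Python sketch. It does not call loopy. The helper split_indices is hypothetical: it lists the (outer, inner) pairs that the split loop nest would visit for a loop 0 <= i < n. It assumes that the outer loop runs ceil(n / inner_length) times and that a conditional guards the ragged final chunk.

```python
def split_indices(n, inner_length):
    """Enumerate (outer, inner) pairs for the loop 0 <= i < n split by inner_length.

    Hypothetical helper illustrating i == inner + outer*inner_length;
    it is not part of loopy.
    """
    n_outer = -(-n // inner_length)  # ceiling division: number of outer trips
    pairs = []
    for outer in range(n_outer):
        for inner in range(inner_length):
            i = inner + outer * inner_length
            if i < n:  # guard for the ragged final chunk
                pairs.append((outer, inner))
    return pairs


pairs = split_indices(10, 4)
# Reassembling i from each pair visits every original index exactly once.
assert sorted(inner + outer * 4 for outer, inner in pairs) == list(range(10))
```

In loopy code, the transformation itself is typically written along the lines of knl = lp.split_iname(knl, "i", 4, outer_tag="g.0", inner_tag="l.0"). Here knl and the iname "i" are assumed to exist.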

loopy.chunk_iname(kernel, split_iname, num_chunks, outer_iname=None, inner_iname=None, outer_tag=None, inner_tag=None, slabs=(0, 0), do_tagged_check=True, within=None)
Split split_iname into two inames (an ‘inner’ one and an ‘outer’ one) so that

    split_iname == inner + outer*chunk_length

and outer is of fixed length num_chunks.
Parameters:
- within – a stack match as understood by loopy.match.parse_stack_match().
New in version 2016.2.
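Unlike split_iname, chunk_iname fixes the number of outer iterations and derives the inner length from it. This is a pure-Python sketch of the resulting grouping. It assumes chunk_length = ceil(n / num_chunks) for a loop 0 <= i < n; chunk_indices is a hypothetical helper, not loopy code.

```python
def chunk_indices(n, num_chunks):
    """Group the indices 0 <= i < n into num_chunks chunks.

    Hypothetical helper; assumes chunk_length = ceil(n / num_chunks), so that
    i == inner + outer*chunk_length with 0 <= outer < num_chunks.
    """
    chunk_length = -(-n // num_chunks)  # ceiling division
    chunks = []
    for outer in range(num_chunks):
        chunk = [inner + outer * chunk_length
                 for inner in range(chunk_length)
                 if inner + outer * chunk_length < n]  # guard the last chunk
        chunks.append(chunk)
    return chunk_length, chunks


chunk_length, chunks = chunk_indices(10, 3)
```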

loopy.join_inames(kernel, inames, new_iname=None, tag=None, within=None)
Combine inames into a single iname, new_iname.
Parameters:
- inames – the inames to join, fastest varying last.
- within – a stack match as understood by loopy.match.parse_stack_match().
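For a rectangular domain, "fastest varying last" means the joined iname behaves like a row-major flattening of the original inames. Below is a pure-Python sketch under that assumption. The helper join_index is hypothetical and is not how loopy implements the transformation.

```python
from itertools import product


def join_index(values, lengths):
    """Combine per-iname values into one joined index, fastest varying last.

    Hypothetical helper assuming a rectangular domain where iname number m
    ranges over 0 <= value < lengths[m].
    """
    joined = 0
    for value, length in zip(values, lengths):
        joined = joined * length + value  # row-major (C-order) flattening
    return joined


lengths = (3, 4)  # e.g. 0 <= i < 3, 0 <= j < 4, with inames ["i", "j"]
joined = [join_index(v, lengths) for v in product(*(range(n) for n in lengths))]
```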

loopy.untag_inames(kernel, iname_to_untag, tag_type)
Remove the tags on iname_to_untag that match tag_type.
Parameters:
- iname_to_untag – iname as string.
- tag_type – a subclass of loopy.kernel.data.IndexTag.
New in version 2018.1.

loopy.tag_inames(kernel, iname_to_tag, force=False, ignore_nonexistent=False)
Tag inames.
Parameters:
- iname_to_tag – a list of tuples (iname, new_tag). new_tag is given as an instance of a subclass of loopy.kernel.data.IndexTag, as an iterable of such instances, or as a string as shown in Iname Implementation Tags. iname_to_tag may also be a dictionary for backwards compatibility. iname may also be a wildcard using * and ?.
Changed in version 2016.3: Added wildcards.
Changed in version 2018.1: Added iterable of tags.
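Wildcard requests such as ("*_inner", "unr") apply one tag to every matching iname. The sketch below resolves such requests in pure Python. It assumes that * and ? behave like shell-style globs, as matched by Python's fnmatch. The helper expand_tag_requests is hypothetical. The corresponding loopy call would look roughly like lp.tag_inames(knl, [("i_outer", "g.0"), ("*_inner", "unr")]).

```python
from fnmatch import fnmatchcase


def expand_tag_requests(all_inames, iname_to_tag):
    """Resolve (pattern, tag) pairs against a set of iname names.

    Hypothetical helper; assumes * and ? behave like shell-style globs.
    """
    resolved = {}
    for pattern, tag in iname_to_tag:
        for iname in sorted(all_inames):
            if fnmatchcase(iname, pattern):
                resolved[iname] = tag
    return resolved


inames = {"i_outer", "i_inner", "j_outer", "j_inner", "k"}
resolved = expand_tag_requests(inames, [("i_outer", "g.0"), ("*_inner", "unr")])
```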

loopy.duplicate_inames(kernel, inames, within, new_inames=None, suffix=None, tags={})
Parameters:
- within – a stack match as understood by loopy.match.parse_stack_match().

loopy.get_iname_duplication_options(kernel, use_boostable_into=None)
List options for duplicating inames, if duplication is necessary for schedulability.
Returns: a generator listing all options to duplicate inames, if duplication of an iname is necessary to ensure the schedulability of the kernel. Duplication options are returned as tuples (iname, within) as understood by duplicate_inames(). There is no guarantee that the transformed kernel will be schedulable, because multiple duplications of an iname may be necessary.
Some kernels require the duplication of inames in order to be schedulable, because the forced iname dependencies pose an over-determined problem to the scheduler. Consider the following minimal example:

    knl = lp.make_kernel(["{[i,j]:0<=i,j<n}"],
        """
        mat1[i,j] = mat1[i,j] + 1 {inames=i:j, id=i1}
        mat2[j] = mat2[j] + 1 {inames=j, id=i2}
        mat3[i] = mat3[i] + 1 {inames=i, id=i3}
        """)

In the example, there are four possibilities to resolve the problem:
- duplicating i in instruction i3
- duplicating i in instruction i1 and i3
- duplicating j in instruction i2
- duplicating i in instruction i2 and i3
Use has_schedulable_iname_nesting() to decide whether an iname needs to be duplicated in a given kernel.

loopy.has_schedulable_iname_nesting(kernel)
Returns: a bool indicating whether this kernel can be scheduled without duplicating an iname.

loopy.prioritize_loops(kernel, loop_priority)
Indicates the textual order in which loops should be entered in the kernel code. Note that this priority has an advisory role only. If the kernel logically requires a different nesting, the priority is ignored. Priority is only considered if loop nesting is ambiguous.
prioritize_loops can be used multiple times. If you do so, each given loop_priority specifies a scheduling constraint. The constraints from all calls to prioritize_loops together establish a partial order on the inames (see https://en.wikipedia.org/wiki/Partially_ordered_set).
Parameters:
- loop_priority – an iterable of inames, or, for brevity, a comma-separated string of inames.
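Several calls such as lp.prioritize_loops(knl, "i,j") followed by lp.prioritize_loops(knl, "j,k") combine into one partial order. The pure-Python sketch below makes this concrete. It assumes that each priority lists inames from outermost to innermost, and it uses Python's graphlib (3.9+) to produce one nesting consistent with all constraints. combine_priorities is hypothetical and does not reproduce loopy's scheduler.

```python
from graphlib import TopologicalSorter


def combine_priorities(*loop_priorities):
    """Merge several loop_priority constraints into one nesting order.

    Hypothetical helper: each priority lists inames from outermost to
    innermost; consecutive pairs become "outer before inner" edges.
    """
    predecessors = {}
    for priority in loop_priorities:
        if isinstance(priority, str):
            priority = [name.strip() for name in priority.split(",")]
        priority = list(priority)
        for outer, inner in zip(priority, priority[1:]):
            predecessors.setdefault(outer, set())
            predecessors.setdefault(inner, set()).add(outer)
    # Any topological order is consistent with the resulting partial order.
    return list(TopologicalSorter(predecessors).static_order())


order = combine_priorities("i,j", ["j", "k"])
```

When the constraints leave two inames unordered, several topological orders are valid. This mirrors the statement that priority only matters where nesting is otherwise ambiguous.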

loopy.rename_iname(kernel, old_iname, new_iname, existing_ok=False, within=None)
Parameters:
- within – a stack match as understood by loopy.match.parse_stack_match().
- existing_ok – execute even if new_iname already exists.

loopy.remove_unused_inames(kernel, inames=None)
Delete those among inames that are unused, i.e. project them out of the domain. If these inames pose implicit restrictions on other inames, these restrictions will persist as existentially quantified variables.
Parameters:
- inames – may be an iterable of inames or a string of comma-separated inames.

loopy.split_reduction_inward(kernel, inames, within=None)
Takes a reduction of the form:

    sum([i,j,k], ...)

and splits it into two nested reductions:

    sum([j,k], sum([i], ...))

In this case, inames would have been "i", indicating that the iname i should be made the iname governing the inner reduction.
Parameters:
- inames – A list of inames, or a comma-separated string that can be parsed into those.

loopy.split_reduction_outward(kernel, inames, within=None)
Takes a reduction of the form:

    sum([i,j,k], ...)

and splits it into two nested reductions:

    sum([i], sum([j,k], ...))

In this case, inames would have been "i", indicating that the iname i should be made the iname governing the outer reduction.
Parameters:
- inames – A list of inames, or a comma-separated string that can be parsed into those.
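Both split_reduction_inward and split_reduction_outward only regroup the terms of the reduction. For an associative and commutative operation such as sum, the result therefore does not change. The pure-Python check below uses an arbitrary integer summand, term, as a stand-in for the "..." above. Integers are used because floating-point sums may differ in rounding when regrouped.

```python
from itertools import product


def term(i, j, k):
    # Arbitrary summand standing in for the "..." of the reduction.
    return (i + 1) * (j + 2) - k


ni, nj, nk = 3, 4, 5

# sum([i,j,k], ...): one flat reduction over all three inames
flat = sum(term(i, j, k) for i, j, k in product(range(ni), range(nj), range(nk)))

# split_reduction_inward with inames="i": sum([j,k], sum([i], ...))
inward = sum(sum(term(i, j, k) for i in range(ni))
             for j, k in product(range(nj), range(nk)))

# split_reduction_outward with inames="i": sum([i], sum([j,k], ...))
outward = sum(sum(term(i, j, k) for j, k in product(range(nj), range(nk)))
              for i in range(ni))
```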

loopy.affine_map_inames(kernel, old_inames, new_inames, equations)
Return a new kernel where the affine transform specified by equations has been applied to the inames.
Parameters:
- old_inames – A list of inames to be replaced by affine transforms of their values. May also be a string of comma-separated inames.
- new_inames – A list of new inames that are not yet used in kernel, but have their values established in terms of old_inames by equations. May also be a string of comma-separated inames.
- equations – A list of equations establishing a relationship between old_inames and new_inames. Each equation may be a tuple (lhs, rhs) of expressions or a string, with the left- and right-hand sides of the equation separated by =.

loopy.find_unused_axis_tag(kernel, kind, insn_match=None)
For one of the hardware-parallel execution tags, find an unused axis.
Parameters:
- insn_match – An instruction match as understood by loopy.match.parse_match().
- kind – may be “l” or “g”, or the corresponding tag class name.
Returns: a loopy.kernel.data.GroupIndexTag or loopy.kernel.data.LocalIndexTag that is not being used within the instructions matched by insn_match.

loopy.make_reduction_inames_unique(kernel, inames=None, within=None)
Parameters:
- inames – if not None, only apply to these inames.
- within – a stack match as understood by loopy.match.parse_stack_match().
New in version 2016.2.

loopy.add_inames_to_insn(kernel, inames, insn_match)
Parameters:
- inames – a frozenset of inames that will be added to the instructions matched by insn_match, or a comma-separated string that parses to such a set.
- insn_match – An instruction match as understood by loopy.match.parse_match().
Returns: a new kernel with inames added to the instructions matched by insn_match.
New in version 2016.3.

loopy.add_inames_for_unused_hw_axes(kernel, within=None)
Returns a kernel with inames added to each instruction corresponding to any hardware-parallel iname tags (loopy.kernel.data.GroupIndexTag, loopy.kernel.data.LocalIndexTag) unused in the instruction but used elsewhere in the kernel.
Current limitations:
- Only one iname in the kernel may be tagged with each of the unused hw axes.
- Occurrence of an l.auto tag is not supported when an instruction is missing one of the local hw axes.
Parameters:
- within – An instruction match as understood by loopy.match.parse_match().

Dealing with Substitution Rules

loopy.extract_subst(kernel, subst_name, template, parameters=())
Parameters:
- subst_name – The name of the substitution rule to be created.
- template – Unification template expression.
- parameters – An iterable of parameters used in template, or a comma-separated string of the same.
All targeted subexpressions must match (‘unify with’) template. The template may contain ‘*’ wildcards that will have to match exactly across all unifications.

loopy.assignment_to_subst(kernel, lhs_name, extra_arguments=(), within=None, force_retain_argument=False)
Extract an assignment (to a temporary variable or an argument) as a substitution rule (see Substitution Rules). The temporary may be an array, in which case the array indices will become arguments to the substitution rule.
Parameters:
- within – a stack match as understood by loopy.match.parse_stack_match().
- force_retain_argument – If True and if lhs_name is an argument, it is kept even if it is no longer referenced.
This operation will change all usage sites of lhs_name matched by within. If there are further usage sites of lhs_name, then the original assignment to lhs_name as well as the temporary variable is left in place.

loopy.expand_subst(kernel, within=None)
Returns an instance of loopy.LoopKernel with the substitutions referenced in instructions of kernel matched by within expanded.
Parameters:
- within – a stack match as understood by loopy.match.parse_stack_match().

loopy.find_rules_matching(kernel, pattern)
Parameters:
- pattern – A shell-style glob pattern.

loopy.find_one_rule_matching(kernel, pattern)

Caching, Precomputation and Prefetching

loopy.precompute(kernel, subst_use, sweep_inames=[], within=None, storage_axes=None, temporary_name=None, precompute_inames=None, precompute_outer_inames=None, storage_axis_to_tag={}, default_tag=<class 'loopy.transform.precompute._not_provided'>, dtype=None, fetch_bounding_box=False, temporary_address_space=None, compute_insn_id=None, **kwargs)
Precompute the expression described in the substitution rule determined by subst_use and store it in a temporary array. A precomputation needs two things to operate: a list of sweep_inames (order irrelevant) and an ordered list of storage_axes (whose order will describe the axis ordering of the temporary array).
Parameters:
- subst_use – Describes what to prefetch.
  The following objects may be given for subst_use:
  - The name of the substitution rule.
  - The tagged name (“name$tag”) of the substitution rule.
  - A list of invocations of the substitution rule. This list of invocations, when swept across sweep_inames, then serves to define the footprint of the precomputation.
  Invocations may be tagged (“name$tag”) to filter out a subset of the usage sites of the substitution rule (namely, those usage sites that use the same tagged name).
  Invocations may be given as a string or as a pymbolic.primitives.Expression object. If only one invocation is to be given, then the only entry of the list may be given directly.
  If the list of invocations generating the footprint is not given, all (tag-matching, if desired) usage sites of the substitution rule are used to determine the footprint.
  The following cases can arise for each sweep axis:
  - The axis is an iname that occurs within arguments specified at usage sites of the substitution rule. This case is assumed covered by the storage axes provided for the argument.
  - The axis is an iname that occurs within the value of the rule, but not within its arguments. A new, dedicated storage axis is allocated for such an axis.
- sweep_inames – A list of inames to be swept. May also equivalently be a comma-separated string.
- within – a stack match as understood by loopy.match.parse_stack_match().
- storage_axes – A list of inames and/or rule argument names/indices to be used as storage axes. May also equivalently be a comma-separated string.
- temporary_name – The temporary variable name to use for storing the precomputed data. If it does not exist, it will be created. If it does exist, its properties (such as size, type) are checked (and updated, if possible) to match its use.
- precompute_inames – A tuple of inames to be used to carry out the precomputation. If the specified inames do not already exist, they will be created. If they do already exist, their loop domain is verified against the one required for this precomputation. This tuple may be shorter than the (provided or automatically found) storage_axes tuple, in which case names will be automatically created. May also equivalently be a comma-separated string.
- precompute_outer_inames – A frozenset of inames within which the compute instruction is nested. If None, make an educated guess. May also be specified as a comma-separated string.
- default_tag – The iname tag to be applied to the inames created to perform the precomputation. The current default will make them local axes and automatically split them to fit the work group size, but this default will disappear in favor of simply leaving them untagged in 2019. For 2018, a warning will be issued if no default_tag is specified.
- compute_insn_id – The ID of the instruction generated to perform the precomputation.
If storage_axes is not specified, it defaults to the arrangement <direct sweep axes><arguments> with the direct sweep axes being the slower-varying indices.
Trivial storage axes (i.e. axes of length 1 with respect to the sweep) are eliminated.

loopy.add_prefetch(kernel, var_name, sweep_inames=[], dim_arg_names=None, default_tag=<class 'loopy.transform.data._not_provided'>, rule_name=None, temporary_name=None, temporary_address_space=None, temporary_scope=None, footprint_subscripts=None, fetch_bounding_box=False, fetch_outer_inames=None)
Prefetch all accesses to the variable var_name, with all accesses being swept through sweep_inames.
Parameters:
- var_name – A string, the name of the variable being prefetched. This may be a ‘tagged variable name’ (such as field$mytag) to restrict the effect of the operation to only variable accesses with a matching tag.
  This may also be a subscripted version of the variable, in which case this access dictates the footprint that is prefetched, e.g. A[:,:] or field[i,j,:,:]. In this case, accesses in the kernel are disregarded.
- sweep_inames – A list of inames, or a comma-separated string of them. This routine ‘sweeps’ all accesses to var_name through all allowed values of the sweep_inames to generate a footprint. All values in this footprint are then stored in a temporary variable, and the original variable accesses are replaced with accesses to this temporary.
- dim_arg_names – List of names representing each fetch axis. These names show up as inames in the generated fetch code.
- default_tag – The implementation tag to assign to the inames driving the prefetch code. Use None to leave them undefined (to assign them later by hand). The current default will make them local axes and automatically split them to fit the work group size, but this default will disappear in favor of simply leaving them untagged in 2019.x. For 2018.x, a warning will be issued if no default_tag is specified.
- rule_name – base name of the generated temporary variable.
- temporary_name – The name of the temporary to be used.
- temporary_address_space – The AddressSpace to use for the temporary.
- footprint_subscripts – A list of tuples indicating the index (i.e. subscript) tuples used to generate the footprint.
  If only one such set of indices is desired, this may also be specified directly by putting an index expression into var_name. Substitutions such as those occurring in dimension splits are recorded and also applied to these indices.
- fetch_bounding_box – To fit within loopy’s execution model, the ‘footprint’ of the fetch currently has to be a convex set. Sometimes this is not the case, e.g. for a high-order stencil:

            o
            o
          ooooo
            o
            o

  The footprint of the stencil when ‘swept’ over a base domain would look like this, and because of the ‘missing corners’, this set is not convex:

          oooooooooo
          oooooooooo
        oooooooooooooo
        oooooooooooooo
        oooooooooooooo
        oooooooooooooo
          oooooooooo
          oooooooooo

  Passing fetch_bounding_box=True gives loopy permission to instead fetch the ‘bounding box’ of the footprint, i.e. this set in the stencil example:

        OOooooooooooOO
        OOooooooooooOO
        oooooooooooooo
        oooooooooooooo
        oooooooooooooo
        oooooooooooooo
        OOooooooooooOO
        OOooooooooooOO

  Note the added corners marked with “O”. The resulting footprint is guaranteed to be convex.
- fetch_outer_inames – The inames within which the fetch instruction is nested. If None, make an easier guess.
This function internally uses extract_subst() and precompute().
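The stencil pictures above can be reproduced with a pure-Python sketch that models a footprint as a set of integer points. It assumes a radius-2 cross stencil, as drawn, swept over a 10-by-4 base domain. That sweep yields the 14-wide, 8-tall shape shown, and taking the bounding box adds exactly the four 2-by-2 corners. The sketch only illustrates the geometry. It does not reflect how loopy computes footprints internally, and both helper names are hypothetical.

```python
def cross_stencil_footprint(nx, ny, radius):
    """Points touched by a cross-shaped stencil swept over an nx-by-ny base box."""
    offsets = [(d, 0) for d in range(-radius, radius + 1)]
    offsets += [(0, d) for d in range(-radius, radius + 1) if d != 0]
    return {(x + dx, y + dy)
            for x in range(nx) for y in range(ny)
            for dx, dy in offsets}


def bounding_box(points):
    """Smallest axis-aligned rectangle of integer points containing points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {(x, y)
            for x in range(min(xs), max(xs) + 1)
            for y in range(min(ys), max(ys) + 1)}


footprint = cross_stencil_footprint(10, 4, 2)
box = bounding_box(footprint)
corners = box - footprint  # the extra "O" points fetched with the bounding box
```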

loopy.buffer_array(kernel, var_name, buffer_inames, init_expression=None, store_expression=None, within=None, default_tag='l.auto', temporary_scope=None, temporary_is_local=None, fetch_bounding_box=False)
Replace accesses to var_name with ones to a temporary, which is created and acts as a buffer. To perform this transformation, the access footprint to var_name is determined and a temporary of a suitable loopy.AddressSpace and shape is created.
By default, the values of the buffered cells in var_name are read prior to any (read/write) use, and the modified values are written out after use has concluded. For special use cases (e.g. additive accumulation), this behavior can be modified using init_expression and store_expression.
Parameters:
- buffer_inames – The inames across which the buffer should be usable, i.e. all possible values of these inames will be covered by the buffer footprint. A tuple of inames or a comma-separated string.
- init_expression – Either None (indicating the prior value of the buffered array should be read) or an expression optionally involving the variable ‘base’ (which references the associated location in the array being buffered).
- store_expression – Either None, False, or an expression involving variables ‘base’ and ‘buffer’ (without array indices). (None indicates that a default storage instruction should be used; False indicates that no storing of the temporary should occur at all.)
- within – If not None, limit the action of the transformation to matching contexts. See loopy.match.parse_stack_match() for syntax.
- temporary_scope – If given, override the choice of AddressSpace for the created temporary.
- default_tag – The default iname implementation tag (see Iname Implementation Tags) to be assigned to the inames used for fetching and storing.
- fetch_bounding_box – If the access footprint is non-convex (resulting in an error), setting this argument to True will force a rectangular (and hence convex) superset of the footprint to be fetched.

loopy.alias_temporaries(kernel, names, base_name_prefix=None, synchronize_for_exclusive_use=True)
Sets all temporaries given by names to be backed by a single piece of storage.
Parameters:
- synchronize_for_exclusive_use – A bool. If True, this also introduces ordering structures (“groups”) to ensure that the live ranges (i.e. the regions of code where each of the temporaries is used) do not overlap. This will allow two (or more) temporaries to share the same storage space as long as their live ranges do not need to be concurrent.
- base_name_prefix – an identifier to be used for the common storage area.
Changed in version 2016.3: Added synchronize_for_exclusive_use flag. synchronize_for_exclusive_use=True was the previous default behavior.

Influencing data access

loopy.change_arg_to_image(kernel, name)

loopy.tag_array_axes(kernel, ary_names, dim_tags)
Parameters:
- dim_tags – a tuple of loopy.kernel.array.ArrayDimImplementationTag or a string that parses to one. See loopy.kernel.array.parse_array_dim_tags() for a description of the allowed string format.
  For example, dim_tags could be "N2,N0,N1" to determine that the second axis is the fastest-varying, the last is the next-fastest, and the first is the slowest.
Changed in version 2016.2: This function was called tag_data_axes before version 2016.2.
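The "N2,N0,N1" example can be made concrete by computing the element strides such a layout implies. The pure-Python sketch below reads the example as "N0 varies fastest, then N1, then N2". It handles only plain "Nk" tags, not the full string format accepted by loopy.kernel.array.parse_array_dim_tags(), and both helpers are hypothetical.

```python
def parse_nesting_tags(spec):
    """Parse a string such as "N2,N0,N1" into nesting levels [2, 0, 1].

    Hypothetical simplification: handles only "Nk" tags.
    """
    return [int(tag.strip()[1:]) for tag in spec.split(",")]


def strides_from_nesting(shape, levels):
    """Element strides when the axis tagged N0 varies fastest, then N1, ..."""
    strides = [0] * len(shape)
    stride = 1
    for axis in sorted(range(len(shape)), key=lambda a: levels[a]):
        strides[axis] = stride
        stride *= shape[axis]
    return strides


strides = strides_from_nesting((2, 3, 4), parse_nesting_tags("N2,N0,N1"))
```

For a (2, 3, 4) array this gives strides [12, 1, 3]. The second axis is contiguous, the last axis steps over it, and the first axis is outermost, matching the description above.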

loopy.remove_unused_arguments(kernel)

loopy.set_array_axis_names(kernel, ary_names, dim_names)
Changed in version 2016.2: This function was called set_array_dim_names before version 2016.2.

loopy.privatize_temporaries_with_inames(kernel, privatizing_inames, only_var_names=None)
This function provides each loop iteration of the privatizing_inames with its own private entry in the temporaries it accesses (possibly restricted to only_var_names).
This is accomplished implicitly as part of generating instruction-level parallelism by the “ILP” tag and accessible separately through this transformation.
Example:

    for imatrix, i
        acc = 0
        for k
            acc = acc + a[imatrix, i, k] * vec[k]
        end
    end

might become:

    for imatrix, i
        acc[imatrix] = 0
        for k
            acc[imatrix] = acc[imatrix] + a[imatrix, i, k] * vec[k]
        end
    end

facilitating loop interchange of the imatrix loop.
New in version 2018.1.
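The following is a plain-Python rendering of the two loop nests above, assuming a is a nested list indexed as a[imatrix][i][k]. Once acc has one entry per imatrix value, the imatrix loop no longer has to enclose the k loop and can be moved innermost. The second function performs that interchange by hand to show it preserves the result. Neither function is loopy-generated code.

```python
def matvecs_shared_acc(a, vec):
    """Original loop nest: one scalar acc reused across imatrix and i."""
    out = [[0] * len(a[0]) for _ in a]
    for imatrix in range(len(a)):
        for i in range(len(a[0])):
            acc = 0
            for k in range(len(vec)):
                acc = acc + a[imatrix][i][k] * vec[k]
            out[imatrix][i] = acc
    return out


def matvecs_privatized_acc(a, vec):
    """acc privatized over imatrix, which lets the imatrix loop move inward."""
    nmatrix, n = len(a), len(a[0])
    out = [[0] * n for _ in range(nmatrix)]
    for i in range(n):
        acc = [0] * nmatrix  # one private entry per imatrix value
        for k in range(len(vec)):
            for imatrix in range(nmatrix):  # interchanged loop
                acc[imatrix] = acc[imatrix] + a[imatrix][i][k] * vec[k]
        for imatrix in range(nmatrix):
            out[imatrix][i] = acc[imatrix]
    return out


a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
vec = [2, 1]
```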

Padding Data

loopy.split_array_axis(kernel, array_names, axis_nr, count, order='C')
Parameters:
- array_names – a list of names of temporary variables or arguments. May also be a comma-separated string of these.
- axis_nr – the (zero-based) index of the axis that should be split.
- count – The group size to use in the split.
- order – The way the split array axis should be linearized. May be “C” or “F” to indicate C/Fortran (row/column)-major order.
Changed in version 2016.2: There was a more complicated, dumber function called loopy.split_array_dim that had the role of this function in versions prior to 2016.2.

loopy.find_padding_multiple(kernel, variable, axis, align_bytes, allowed_waste=0.1)

loopy.add_padding(kernel, variable, axis, align_bytes)

Manipulating Instructions

loopy.set_instruction_priority(kernel, insn_match, priority)
Set the priority of instructions matching insn_match to priority.
insn_match may be any instruction id match understood by loopy.match.parse_match().

loopy.add_dependency(kernel, insn_match, depends_on)
Add the instruction dependency depends_on to the instructions matched by insn_match.
insn_match and depends_on may be any instruction id match understood by loopy.match.parse_match().
Changed in version 2016.3: Third argument renamed to depends_on for clarity, allowed to be not just ID but also match expression.

loopy.remove_instructions(kernel, insn_ids)
Return a new kernel with instructions in insn_ids removed.
Dependencies across (one, for now) deleted instructions are propagated. Behavior is undefined for now for chains of dependencies within the set of deleted instructions.
This also updates no_sync_with for all instructions.

loopy.replace_instruction_ids(kernel, replacements)

loopy.tag_instructions(kernel, new_tag, within=None)

loopy.add_nosync(kernel, scope, source, sink, bidirectional=False, force=False, empty_ok=False)
Add a no_sync_with directive between source and sink. no_sync_with is only added if sink depends on source or if the instruction pair is in a conflicting group.
This function does not check for the presence of a memory dependency.
Parameters:
- kernel – The kernel.
- source – Either a single instruction id, or any instruction id match understood by loopy.match.parse_match().
- sink – Either a single instruction id, or any instruction id match understood by loopy.match.parse_match().
- scope – A valid no_sync_with scope. See loopy.InstructionBase.no_sync_with for allowable scopes.
- bidirectional – A bool. If True, add a no_sync_with to both the source and sink instructions; otherwise the directive is only added to the sink instructions.
- force – A bool. If True, add a no_sync_with directive even without the presence of a dependency edge or conflicting instruction group.
- empty_ok – If True, do not complain even if no nosync tags were added as a result of the transformation.
Returns: The updated kernel.
Changed in version 2018.1: If the transformation adds no nosync directives, it will complain. This used to silently pass. This behavior can be restored using empty_ok.

loopy.add_barrier(kernel, insn_before='', insn_after='', id_based_on=None, tags=None, synchronization_kind='global', mem_kind=None)
Takes a kernel to which a barrier should be added and returns a kernel with the barrier inserted. Given two instruction expressions, it adds a barrier between the instructions they match. The expressions can be any inputs that are understood by loopy.match.parse_match().
Parameters:
- insn_before – String expression that specifies the instruction(s) before the barrier which is to be added.
- insn_after – String expression that specifies the instruction(s) after the barrier which is to be added.
- id_based_on – String on which the id of the barrier will be based.
- tags – The tag of the group to which the barrier must be added.
- synchronization_kind – Kind of barrier to be added. May be “global” or “local”.
- mem_kind – Type of memory to be synchronized. May be “global” or “local”. Ignored for “global” barriers. If not supplied, defaults to synchronization_kind.

Registering Library Routines

loopy.register_reduction_parser(parser)
Register a new loopy.library.reduction.ReductionOperation.
Parameters:
- parser – A function that receives a string and returns a subclass of ReductionOperation.

loopy.register_preamble_generators(kernel, preamble_generators)
Parameters:
- preamble_generators – list of functions of signature (preamble_info) generating tuples (sortable_str_identifier, code), where preamble_info is a PreambleInfo.
Returns: kernel with the preamble generators registered.

loopy.register_symbol_manglers(kernel, manglers)

loopy.register_function_manglers(kernel, manglers)
Parameters:
- manglers – list of functions of signature (kernel, name, arg_dtypes) returning a loopy.CallMangleInfo.
Returns: kernel with manglers registered.

Modifying Arguments

loopy.set_argument_order(kernel, arg_names)
Parameters:
- arg_names – A list (or comma-separated string) of argument names. All arguments must be in this list.

loopy.add_dtypes(kernel, dtype_dict)
Specify remaining unspecified argument/temporary variable types.
Parameters:
- dtype_dict – a mapping from variable names to numpy.dtype instances.

loopy.infer_unknown_types(kernel, expect_completion=False)
Infer types on temporaries and arguments.

loopy.add_and_infer_dtypes(kernel, dtype_dict, expect_completion=False)

loopy.rename_argument(kernel, old_name, new_name, existing_ok=False)
New in version 2016.2.

loopy.set_temporary_scope(kernel, temp_var_names, scope)
Parameters:
- temp_var_names – a container with membership checking, or a comma-separated string of variables for which the scope is to be set.
- scope – One of the values from loopy.AddressSpace, or one of the strings "private", "local", or "global".

Creating Batches of Operations

loopy.to_batched(kernel, nbatches, batch_varying_args, batch_iname_prefix='ibatch', sequential=False)
Takes in a kernel that carries out an operation and returns a kernel that carries out a batch of these operations.
Note: For temporaries in a kernel that are private or read-only globals, if sequential=True, loopy does not batch these variables unless they are explicitly listed in batch_varying_args.
Parameters:
- nbatches – the number of batches. May be a constant non-negative integer or a string, which will be added as an integer argument.
- batch_varying_args – a list of argument names that vary per-batch. Each such variable will have a batch index added.
- sequential – A bool. If True, do not duplicate temporary variables for each batch. This automatically tags the batch iname for sequential execution.
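Conceptually, every argument listed in batch_varying_args gains a batch index, while the other arguments are shared by all batches. The pure-Python sketch below illustrates this for a matrix-vector product. It assumes x and the result are batch-varying, the matrix a is not, and the batch index comes first. The list comprehension plays the role of the ibatch loop. For a hypothetical kernel knl with arguments x and y, the corresponding loopy call would presumably look like lp.to_batched(knl, 3, ["x", "y"]).

```python
def matvec(a, x):
    """The unbatched operation: y = a @ x for a list-of-rows matrix."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


def matvec_batched(a, x_batched):
    """Batched version: x (and the result y) gain a leading batch index.

    a is not batch-varying, so every batch shares it.
    """
    return [matvec(a, x) for x in x_batched]  # plays the role of the ibatch loop


a = [[1, 2], [3, 4]]
y_batched = matvec_batched(a, [[1, 0], [0, 1], [1, 1]])
```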

Finishing up

loopy.preprocess_kernel(kernel, device=None)

loopy.generate_loop_schedules(kernel, debug_args={})
Warning: This function needs to be called inside (another layer) of a loopy.schedule.MinRecursionLimitForScheduling context manager, and the context manager needs to end after the last reference to the generators has gone out of scope. Otherwise, the high-recursion-limit generator chain may not be successfully garbage-collected and cause an internal error in the Python runtime.

loopy.get_one_linearized_kernel(kernel)

loopy.save_and_reload_temporaries(kernel)
Add instructions to save and reload temporary variables that are live across kernel calls.
The basic code transformation turns schedule segments:

    t = <...>
    <return followed by call>
    <...> = t

into this code:

    t = <...>
    t_save_slot = t
    <return followed by call>
    t = t_save_slot
    <...> = t

where t_save_slot is a newly-created global temporary variable.
Returns: The resulting kernel.

class loopy.GeneratedProgram(*args, **kwargs)
- name
- is_device_program
- ast – Once generated, this captures the AST of the overall function definition, including the body.
- body_ast – Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.

class loopy.CodeGenerationResult(*args, **kwargs)
- host_program
- device_programs – A list of GeneratedProgram instances intended to run on the compute device.
- host_preambles
- device_preambles
- host_code()
- device_code()
- all_code()
- implemented_data_info – a list of loopy.codegen.ImplementedDataInfo objects. Only added at the very end of code generation.

loopy.generate_code_v2(kernel)
Returns: a loopy.CodeGenerationResult.

loopy.generate_header(kernel, codegen_result=None)
Parameters:
- kernel – a loopy.LoopKernel.
- codegen_result – an instance of loopy.CodeGenerationResult.
Returns: a list of AST nodes (which may have str called on them to produce a string) representing function declarations for the generated device functions.

Setting options

Matching contexts
TODO: Matching instruction tags

loopy.match.parse_match(expr)
Syntax examples:

    id:yoink and writes:a_temp
    id:yoink and (not writes:a_temp or tag:input)

loopy.match.parse_stack_match(smatch)
Syntax example:

    ... > outer > ... > next > innermost $
    insn > next
    insn > ... > next > innermost $

... matches an arbitrary number of intervening stack levels.
Each of the entries is a match expression as understood by parse_match().

Match expressions

class loopy.match.MatchExpressionBase
class loopy.match.All
class loopy.match.And(children)
class loopy.match.Or(children)
class loopy.match.Not(child)
class loopy.match.Id(glob)
class loopy.match.Tagged(glob)
class loopy.match.Writes(glob)
class loopy.match.Reads(glob)
class loopy.match.Iname(glob)
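A string such as id:yoink and (not writes:a_temp or tag:input) corresponds to a tree of the match expression classes listed above. The pure-Python mirror below is for intuition only. The class names echo loopy.match, but these are simplified, hypothetical re-implementations. They take a toy Insn record instead of loopy's kernel and instruction objects, and glob matching is assumed to follow shell-style rules via fnmatch.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Insn:
    """Hypothetical stand-in for an instruction: id, written names, tags."""
    id: str
    writes: frozenset = field(default_factory=frozenset)
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Id:
    glob: str

    def __call__(self, insn):
        return fnmatchcase(insn.id, self.glob)


@dataclass(frozen=True)
class Writes:
    glob: str

    def __call__(self, insn):
        return any(fnmatchcase(name, self.glob) for name in insn.writes)


@dataclass(frozen=True)
class Tagged:
    glob: str

    def __call__(self, insn):
        return any(fnmatchcase(tag, self.glob) for tag in insn.tags)


@dataclass(frozen=True)
class And:
    children: tuple

    def __call__(self, insn):
        return all(child(insn) for child in self.children)


@dataclass(frozen=True)
class Or:
    children: tuple

    def __call__(self, insn):
        return any(child(insn) for child in self.children)


@dataclass(frozen=True)
class Not:
    child: object

    def __call__(self, insn):
        return not self.child(insn)


# Tree for: id:yoink and (not writes:a_temp or tag:input)
match = And((Id("yoink"), Or((Not(Writes("a_temp")), Tagged("input")))))
```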