Reference: Loopy’s Model of a Kernel
What Types of Computation can a Loopy Program Express?
Loopy programs consist of an a-priori unordered set of statements, operating on \(n\)-dimensional array variables.
Arrays consist of “plain old data” and structures thereof, as describable by a numpy.dtype. The n-dimensional shape of these arrays is given by a tuple of expressions at most affine in parameters that are fixed for the duration of program execution.
Each array variable in the program is either an argument or a temporary
variable. A temporary variable is only live within the program, while
argument variables are accessible outside the program and constitute the
program’s inputs and outputs.
A statement (still called ‘instruction’ in some places, cf. loopy.InstructionBase) encodes an assignment to an entry of an array.
The right-hand side of an assignment consists of an expression that may
consist of arithmetic operations and calls to functions.
If the outermost operation of the RHS expression is a function call,
the RHS value may be a tuple, and multiple (still scalar) arrays appear
as LHS values. (This is the only sense in which tuple types are supported.)
Each statement is parametrized by zero or more loop variables (“inames”). A statement is executed once for each integer point defined by the domain forest for the iname tuple given for that statement (loopy.InstructionBase.within_inames). Each execution of a statement (with specific values of the inames) is called a statement instance. Dependencies between these instances as well as instances of other statements are encoded in the program representation and specify permissible execution orderings. (The semantics of the dependencies are being sharpened.) Assignments (comprising the evaluation of the RHS and the assignment to the LHS) may be specified to be atomic.
The basic building blocks of the domain forest are sets given as conjunctions of equalities and inequalities of quasi-affine expressions on integer tuples, called domains, and represented as instances of islpy.BasicSet. The entries of each integer tuple are either parameters or inames. Each domain may optionally have a parent domain. Parameters of parent-less domains are given by value arguments supplied to the program that will remain unchanged during program execution. Parameters of domains with parents may be
run-time-constant value arguments to the program, or
inames from parent domains, or
scalar, integer temporary variables that are written by statements with iteration domains controlled by a parent domain.
For each tuple of concrete parameter values, the set of iname tuples must be finite. Each iname is defined by exactly one domain.
For a tuple of inames, the domain forest defines an iteration domain by finding all the domains defining the inames involved, along with their parent domains. The resulting tree of domains may contain multiple roots, but no branches. The iteration domain is then constructed by intersecting these domains and constructing the projection of that set onto the space given by the required iname tuple. Observe that, via the parent-child domain mechanism, imperfectly-nested and data-dependent loops become expressible.
The set of functions callable from the language is predefined by the system. Additional functions may be defined by the user by registering them. It is not currently possible to define functions from within Loopy; however, work is progressing on permitting this. Even once this is allowed, recursion will not be permitted.
Loop Domain Forest
Example:
{ [i]: 0<=i<n }
A kernel’s iteration domain is given by a list of islpy.BasicSet instances (which parametrically represent multi-dimensional sets of tuples of integers). They define the integer values of the loop variables for which instructions (see below) will be executed.
It is written in ISL syntax. loopy calls the loop variables inames. In this case, i is the sole iname. The loop domain is given as a conjunction of affine equality and inequality constraints. Integer divisibility constraints (resulting in strides) are also allowed. In the absence of divisibility constraints, the loop domain is convex.
Note that n in the example is not an iname. It is a domain parameter (see Domain parameters) that is passed to the kernel by the user.
To accommodate some data-dependent control flow, there is not actually a single loop domain, but rather a forest of loop domains (a collection of trees) allowing more deeply nested domains to depend on inames introduced by domains closer to the root.
Here is an example:
{ [l] : 0 <= l <= 2 }
  { [i] : start <= i < end }
  { [j] : start <= j < end }
The i and j domains are “children” of the l domain (visible from indentation). This is also how loopy prints the domain forest, to make the parent/child relationship visible. In the example, the parameters start/end might be read inside of the ‘l’ loop.
The idea is that domains form a forest (a collection of trees), and a “sub-forest” is extracted that covers all the inames for each instruction. Each individual sub-tree is then checked for branching, which is ill-formed. (It is declared ill-formed because intersecting, in the above case, the l, i, and j domains could result in restrictions from the i domain affecting the j domain by way of how i affects l, which would be counterintuitive to say the least.)
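The parent/child reading of the forest above can be illustrated with a small plain-Python sketch that does not use loopy or islpy. The BOUNDS table is a hypothetical stand-in for values of start and end that would be written inside the l loop, and iteration_domain is an illustrative helper, not a loopy function.

```python
# Hypothetical per-l bounds, standing in for temporaries written in the l loop.
BOUNDS = {0: (0, 2), 1: (1, 4), 2: (3, 3)}


def l_domain():
    # { [l] : 0 <= l <= 2 }
    return range(0, 3)


def child_domain(l):
    # { [i] : start <= i < end }, where start/end depend on the parent iname l.
    # The j domain has the same form, so one helper serves both children.
    start, end = BOUNDS[l]
    return range(start, end)


def iteration_domain(inames):
    """Enumerate the integer points for a statement nested in `inames`."""
    if inames == ("l",):
        return [(l,) for l in l_domain()]
    if inames in (("l", "i"), ("l", "j")):
        # Only the l domain and ONE child are intersected, so constraints
        # on i never leak into j (and vice versa).
        return [(l, x) for l in l_domain() for x in child_domain(l)]
    # A statement inside both i and j would make the sub-tree branch.
    raise ValueError("branching sub-tree: not handled in this sketch")


points = iteration_domain(("l", "i"))
print(points)
```

Note that the i loop is empty for l = 2, so the number of inner iterations varies with the outer iname. This is the sense in which the loop nest is data-dependent.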
Inames
Loops are (by default) entered exactly once. This is necessary to preserve dependency semantics; otherwise, a fetch could, for example, happen inside one loop nest, and the instruction using that fetch could then be inside a wholly different loop nest.
ISL syntax
The general syntax of an ISL set is the following:
{[VARIABLES]: CONDITIONS}
VARIABLES is a simple list of identifiers representing loop indices, or, as loopy calls them, inames. Example:
{[i, j, k]: CONDITIONS}
The following constructs are supported for CONDITIONS:
Simple conditions: i <= 15, i>0
Conjunctions: i > 0 and i <= 15
Two-sided conditions: 0 < i <= 15 (equivalent to the previous example)
Identical conditions on multiple variables: 0 < i,j <= 15
Equality constraints: i = j*3 (Note: =, not ==.)
Modulo: i mod 3 = 0
Existential quantifiers: (exists l: i = 3*l) (equivalent to the previous example)
Examples of constructs that are not allowed:
Multiplication by non-constants: j*k
Disjunction: (i=1) or (i=5) (Note: This may be added in a future version of loopy. For now, loop domains have to be convex.)
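The equivalences claimed in the list above can be checked by brute force. The following sketch uses plain Python predicates rather than ISL itself; the helper points and the search window of -20 to 20 are assumptions made for illustration.

```python
def points(pred, lo=-20, hi=20):
    """Enumerate the integers in [lo, hi] that satisfy pred (brute force)."""
    return [i for i in range(lo, hi + 1) if pred(i)]


# Conjunction:         { [i] : i > 0 and i <= 15 }
conjunction = points(lambda i: i > 0 and i <= 15)
# Two-sided condition: { [i] : 0 < i <= 15 }
two_sided = points(lambda i: 0 < i <= 15)

# Modulo:      { [i] : 0 < i <= 15 and i mod 3 = 0 }
modulo_form = points(lambda i: 0 < i <= 15 and i % 3 == 0)
# Existential: { [i] : 0 < i <= 15 and (exists l: i = 3*l) }
exists_form = points(
    lambda i: 0 < i <= 15 and any(i == 3 * l for l in range(-20, 21))
)

print(conjunction == two_sided, modulo_form == exists_form)
print(modulo_form)
```

The divisibility constraint yields a strided set (every third integer), which matches the earlier remark that divisibility constraints result in strides.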
Domain parameters
Domain parameters are identifiers being used in loop domains that are not inames, i.e. they do not define loop variables. In the following domain specification, n is a domain parameter:
{[i,j]: 0 <= i,j < n}
Values of domain parameters arise from
being passed to the kernel as Arguments, or
being assigned to Temporary Variables to feed into domains lower in the Loop Domain Forest.
Identifiers
Reserved Identifiers
The identifier prefix _lp_ is reserved for internal usage; when creating inames, argument names, temporary variable names, substitution rule names, instruction IDs, and other identifiers, users should not use names beginning with _lp_. This prefix is used for identifiers created internally when operating on Loopy’s kernel IR. For Loopy developers, further information on name prefixes used within submodules is below.
Identifier Registry
Functionality in loopy must use identifiers beginning with _lp_ for all internally-created identifiers. Additionally, each name beginning with _lp_ must start with one of the reserved prefixes below. New prefixes may be registered by adding them to the table below. New prefixes may not themselves be the prefix of an existing prefix.
Reserved Identifier Prefixes
Reserved Prefix | Usage (module or purpose)
---|---
Note
Existing Loopy code may not yet fully satisfy these naming requirements. Name changes are in progress, and prefixes will be added to this registry as they are created.
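The two registry rules stated above can be expressed as simple string checks. This is a minimal sketch only: the functions can_register and is_valid_internal_name and the registry entry _lp_demo_sched_ are hypothetical and are not part of loopy.

```python
BASE_PREFIX = "_lp_"


def can_register(new_prefix, registry):
    """Check a candidate prefix against the rules stated in the text."""
    if not new_prefix.startswith(BASE_PREFIX):
        return False
    # A new prefix may not itself be the prefix of an existing prefix.
    return not any(existing.startswith(new_prefix) for existing in registry)


def is_valid_internal_name(name, registry):
    """Internal names must start with one of the registered prefixes."""
    return any(name.startswith(prefix) for prefix in registry)


registry = {"_lp_demo_sched_"}  # hypothetical entry, not a real loopy prefix

print(can_register("_lp_demo_", registry))        # shadows an existing prefix
print(can_register("_lp_demo_other_", registry))  # independent prefix
print(is_valid_internal_name("_lp_demo_sched_tmp0", registry))
```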
Instructions
- class loopy.HappensAfter(variable_name: str | None, instances_rel: Map | None)
A class representing a “happens-after” relationship between two statements found in a loopy.LoopKernel. Used to validate that a given kernel transformation respects the data dependencies in a given program.
- variable_name
The name of the variable responsible for the dependency. For backward compatibility purposes, this may be None. In this case, the dependency semantics revert to the deprecated, statement-level dependencies of prior versions of loopy.
- instances_rel
An islpy.Map representing the precise happens-after relationship. The domain and range are sets of statement instances. The instances in the domain are required to execute before the instances in the range.
Map dimensions are named according to the order of appearance of the inames in a loopy program. The dimension names in the range are appended with a prime to signify that the mapped instances are distinct.
As a (deprecated) matter of backward compatibility, this may be None, in which case the semantics revert to the (underspecified) statement-level dependencies of prior versions of loopy.
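To make the role of an instance-level relation concrete, the sketch below replaces the symbolic islpy.Map with an explicit, finite set of (first, then) pairs and checks whether a linear execution order respects it. The statement names write and read, the helper respects, and the relation itself are hypothetical; loopy's actual validation is not shown here.

```python
def respects(order, happens_after_pairs):
    """Return True if, for every (first, then) pair, `first` runs earlier."""
    position = {instance: k for k, instance in enumerate(order)}
    return all(
        position[first] < position[then]
        for first, then in happens_after_pairs
    )


# Statement instances are (statement id, iname value) tuples.
# Hypothetical relation: the write at i must execute before the read at i' = i.
n = 3
pairs = {(("write", i), ("read", i)) for i in range(n)}

# Two candidate orderings of the same six statement instances.
fused = [(stmt, i) for i in range(n) for stmt in ("write", "read")]
swapped = [(stmt, i) for i in range(n) for stmt in ("read", "write")]

print(respects(fused, pairs), respects(swapped, pairs))
```

A transformation that turned the first ordering into the second would violate the relation, which is the kind of error such a check is meant to expose.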
- class loopy.InstructionBase(id: str | None, happens_after: Mapping[str, HappensAfter] | FrozenSet[str] | str | None, depends_on_is_final: bool | None, groups: FrozenSet[str] | None, conflicts_with_groups: FrozenSet[str] | None, no_sync_with: FrozenSet[Tuple[str, str]] | None, within_inames_is_final: bool | None, within_inames: FrozenSet[str] | None, priority: int | None, predicates: FrozenSet[str] | None, tags: FrozenSet[Tag] | None, *, depends_on: FrozenSet[str] | str | None = None)
A base class for all types of instruction that can occur in a kernel.
- id
An (otherwise meaningless) identifier that is unique within a loopy.LoopKernel.
Instruction ordering
- depends_on
a frozenset of id values of InstructionBase instances that must be executed before this one. Note that loopy.preprocess_kernel() (usually invoked automatically) augments this by adding dependencies on any writes to temporaries read by this instruction.
May be None to invoke the default.
There are two extensions to this:
You may use * as a wildcard in the given IDs. This will be expanded to all matching instruction IDs during loopy.make_kernel().
Instead of an instruction ID, you may pass an instance of loopy.match.MatchExpressionBase into the depends_on frozenset. The given expression will be used to add any matching instructions in the kernel to depends_on during loopy.make_kernel(). Note that this is not meant as a user-facing interface.
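The wildcard expansion can be pictured with the standard-library function fnmatch.fnmatchcase(), which the textual syntax section names as the definition of wildcards in instruction identifiers. The helper expand_depends_on and the instruction IDs below are hypothetical; this sketch does not reproduce what loopy.make_kernel() does internally.

```python
from fnmatch import fnmatchcase


def expand_depends_on(patterns, all_ids):
    """Expand wildcard patterns into the frozenset of matching IDs."""
    return frozenset(
        insn_id
        for insn_id in all_ids
        if any(fnmatchcase(insn_id, pattern) for pattern in patterns)
    )


all_ids = ["init_a", "init_b", "update", "Init_c"]  # hypothetical IDs
deps = expand_depends_on({"init_*"}, all_ids)
print(sorted(deps))
```

Because fnmatchcase() is case-sensitive, Init_c is not matched by init_*.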
- depends_on_is_final
A bool determining whether depends_on constitutes the entire list of instruction dependencies. If not marked final, various semi-broken heuristics will try to add further dependencies.
Defaults to False.
- groups
A frozenset of strings indicating the names of ‘instruction groups’ of which this instruction is a part. An instruction group is considered ‘active’ as long as at least one (but not all) of its instructions have been executed.
- conflicts_with_groups
A frozenset of strings indicating which instruction groups (see groups) may not be active when this instruction is scheduled.
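The interplay of groups and conflicts_with_groups can be sketched as a set computation. The helpers active_groups and may_schedule, the group name g, and the instruction IDs are hypothetical; the sketch only restates the definition of an ‘active’ group given above and is not loopy's scheduler.

```python
def active_groups(group_members, executed):
    """Groups with at least one, but not all, member instructions executed."""
    return {
        name
        for name, members in group_members.items()
        if 0 < len(members & executed) < len(members)
    }


def may_schedule(conflicts_with_groups, group_members, executed):
    """An instruction may run only if no conflicting group is active."""
    return not (conflicts_with_groups & active_groups(group_members, executed))


group_members = {"g": {"load", "use"}}  # hypothetical group and instruction IDs

print(may_schedule({"g"}, group_members, set()))            # group not started
print(may_schedule({"g"}, group_members, {"load"}))         # group active
print(may_schedule({"g"}, group_members, {"load", "use"}))  # group finished
```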
- priority
Scheduling priority, an integer. Higher means ‘execute sooner’. Default 0.
Synchronization
- no_sync_with
a frozenset of tuples of the form (insn_id, scope), where insn_id refers to id of InstructionBase instances and scope is one of the following strings:
“local”
“global”
“any”
An element (insn_id, scope) means “do not consider any variable access conflicting for variables of scope between this instruction and insn_id”. Specifically, loopy will not complain even if it detects that accesses potentially requiring ordering (e.g. by dependencies) exist, and it will not emit barriers to guard any dependencies from this instruction on insn_id that may exist.
Note that no_sync_with allows instruction matching through wildcards and match expressions, just like depends_on.
This data is used specifically by barrier insertion and loopy.check.check_variable_access_ordered().
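The relation between the textual nosync=id1@local:id2 attribute (described under Textual Assignment Syntax) and the (insn_id, scope) tuples stored in no_sync_with can be sketched as follows. The function parse_nosync is hypothetical and is not loopy's parser; it assumes only what the text states, namely that a missing @scope suffix behaves like @any.

```python
VALID_SCOPES = ("local", "global", "any")


def parse_nosync(value):
    """Turn 'id1@local:id2' into a frozenset of (insn_id, scope) tuples."""
    result = set()
    for item in value.split(":"):
        insn_id, separator, scope = item.partition("@")
        if not separator:
            scope = "any"  # nosync=id1 has the same effect as nosync=id1@any
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
        result.add((insn_id, scope))
    return frozenset(result)


parsed = parse_nosync("id1@local:id2@global:id3")
print(sorted(parsed))
```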
Conditionals
- predicates
a frozenset of expressions. The conjunction (logical and) of their truth values (as defined by C) determines whether this instruction should be run.
Iname dependencies
- within_inames
A frozenset of inames identifying the loops within which this instruction will be executed.
Tagging
- tags
A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).
- assignee_var_names()
Return a tuple of assignee variable names, one for each quantity being assigned to.
- assignee_subscript_deps()
Return a list of sets of variable names referred to in the subscripts of the quantities being assigned to, one for each assignee.
- with_transformed_expressions(f, assignee_f=None)
Return a new copy of self where f has been applied to every expression occurring in self. args will be passed as extra arguments (in addition to the expression) to f.
If assignee_f is passed, then left-hand sides of assignments are passed to it. If it is not given, it defaults to the same as f.
- write_dependency_names()
Return a set of dependencies of the left hand side of the assignments performed by this instruction, including written variables and indices.
Inherits from pytools.tag.Taggable.
Assignment objects
- class loopy.Assignment(assignee: str | int | numpy.integer | float | complex | numpy.inexact | bool | numpy.bool | Expression | tuple[ExpressionT, ...], expression: str | int | numpy.integer | float | complex | numpy.inexact | bool | numpy.bool | Expression | tuple[ExpressionT, ...], id: str | None = None, happens_after: typing.Mapping[str, loopy.kernel.instruction.HappensAfter] | typing.FrozenSet[str] | str | None = None, depends_on_is_final: bool | None = None, groups: typing.FrozenSet[str] | None = None, conflicts_with_groups: typing.FrozenSet[str] | None = None, no_sync_with: typing.FrozenSet[typing.Tuple[str, str]] | None = None, within_inames_is_final: bool | None = None, within_inames: typing.FrozenSet[str] | None = None, priority: int | None = None, predicates: typing.FrozenSet[str] | None = None, tags: typing.FrozenSet[pytools.tag.Tag] | None = None, temp_var_type: typing.Type[loopy.kernel.instruction._not_provided] | None | loopy.tools.Optional | loopy.types.LoopyType = <class 'loopy.kernel.instruction._not_provided'>, atomicity: typing.Tuple[loopy.kernel.instruction.VarAtomicity, ...] = (), *, depends_on: typing.FrozenSet[str] | str | None = None)
- assignee
- expression
The following attributes are only used until loopy.make_kernel() is finished:
- temp_var_type
A loopy.Optional. If not empty, contains the type that will be assigned to the new temporary variable created from the assignment.
- atomicity
A tuple of instances of VarAtomicity. Together, they describe to what extent the assignment is to be carried out in a way that involves atomic operations.
To describe an atomic update, any memory reads of exact occurrences of the left-hand side expression of the assignment in the right-hand side are treated, together with the “memory write” part of the assignment, as part of one single atomic update.
Note
Exact identity of the LHS with RHS subexpressions is required for an atomic update to be recognized. For example, the following update will not be recognized as an atomic update:
z[i] = z[i+1-1] + a {atomic}
loopy may choose to evaluate the right-hand side multiple times as part of a single assignment. It is up to the user to ensure that this retains correct semantics.
For example, the following assignment:
z[i] = f(z[i]) + a {atomic}
may generate the following (pseudo-)code:
DO
    READ ztemp_old = z[i]
    EVALUATE ztemp_new = f(ztemp_old) + a
WHILE compare_and_swap(z[i], ztemp_new, ztemp_old) did not succeed
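The retry loop in the pseudo-code can be simulated in plain Python. This is only a sketch of the pattern: the Cell class emulates a compare-and-swap primitive with a lock, and the names Cell, atomic_update, and worker are hypothetical. It is not the code that loopy generates.

```python
import threading


class Cell:
    """A single memory location with an emulated compare-and-swap."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, new, expected):
        # Atomically store `new` only if the cell still holds `expected`.
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False


def atomic_update(cell, f, a):
    """Carry out cell = f(cell) + a as one atomic update."""
    while True:
        old = cell.value   # READ
        new = f(old) + a   # EVALUATE (may run more than once)
        if cell.compare_and_swap(new, old):
            return         # the swap succeeded, so the loop ends


def worker(cell):
    for _ in range(1000):
        atomic_update(cell, lambda v: v, 1)


z = Cell(0)
threads = [threading.Thread(target=worker, args=(z,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(z.value)
```

Since f may be evaluated again whenever the swap fails, it should be free of side effects, which is the user responsibility mentioned in the note above.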
Textual Assignment Syntax
The general syntax of an instruction is a simple assignment:
LHS[i,j,k] = EXPRESSION
Several extensions of this syntax are defined, as discussed below. They may be combined freely.
You can also use an instruction to declare a new temporary variable. (See Temporary Variables.) See Specifying Types for what types are acceptable. If the LHS has a subscript, bounds on the indices are inferred (which must be constants at the time of kernel creation) and the declared temporary is created as an array. Instructions declaring temporaries have the following form:
<temp_var_type> LHS[i,j,k] = EXPRESSION
You can also create a temporary and ask loopy to determine its type automatically. This uses the following syntax:
<> LHS[i,j,k] = EXPRESSION
Lastly, each instruction may optionally have a number of attributes specified, using the following format:
LHS[i,j,k] = EXPRESSION {attr1,attr2=value1:value2}
These are usually key-value pairs. The following attributes are recognized:
id=value sets the instruction’s identifier to value. value must be unique within the kernel. This identifier is used to refer to the instruction after it has been created, such as from dep attributes (see below) or from context matches.
id_prefix=value also sets the instruction’s identifier; however, uniqueness is ensured by loopy itself, by appending further components (often numbers) to the given id_prefix.
inames=i:j:k forces the instruction to reside within the loops over Inames i, j and k (and only those).
Note
The default for the inames that the instruction depends on is the inames used in the instruction itself plus the common subset of inames shared by writers of all variables read by the instruction.
You can add a plus sign (“+”) to the front of this option value to indicate that you would like the inames you specify here to be in addition to the ones found by the heuristic described above.
dup=i:j->j_new:k->k_new makes a copy of the inames i, j, and k, with all the same domain constraints as the original inames. A new name of the copy of i will be automatically chosen, whereas the new name of j will be j_new, and the new name of k will be k_new.
This is a shortcut for calling loopy.duplicate_inames() later (once the kernel is created).
dep=id1:id2 creates a dependency of this instruction on the instructions with identifiers id1 and id2. The meaning of this dependency is that the code generated for this instruction is required to appear textually after all of these dependees’ generated code.
Identifiers here are allowed to be wildcards as defined by the Python function fnmatch.fnmatchcase(). This is helpful in conjunction with id_prefix.
Note
Since specifying all possible dependencies is cumbersome and error-prone, loopy employs a heuristic to automatically find dependencies. Specifically, loopy will automatically add a dependency to an instruction reading a variable if there is exactly one instruction writing that variable. (“Variable” here may mean either temporary variable or kernel argument.)
If each variable in a kernel is only written once, then this heuristic should be able to compute all required dependencies.
Conversely, if a variable is written by two different instructions, all ordering around that variable needs to be specified explicitly. It is recommended to use get_dot_dependency_graph() to visualize the dependency graph of possible orderings.
You may use a leading asterisk (“*”) to turn off the single-writer heuristic and indicate that the specified list of dependencies is exhaustive.
dep_query=... provides an alternative way of specifying instruction dependencies. The given string is parsed as a match expression object by loopy.match.parse_match(). Upon kernel generation, this match expression is used to match instructions in the kernel and add them as dependencies.
nosync=id1:id2 prescribes that no barrier synchronization is necessary for the instructions with identifiers id1 and id2, even if a dependency chain exists and variables are accessed in an apparently racy way.
Identifiers here are allowed to be wildcards as defined by the Python function fnmatch.fnmatchcase(). This is helpful in conjunction with id_prefix.
Identifiers (including wildcards) accept an optional @scope suffix, which prescribes that no synchronization at level scope is needed. This does not preclude barriers at levels different from scope. Allowable scope values are:
local
global
any
As an example, nosync=id1@local:id2@global prescribes that no local synchronization is needed with instruction id1 and no global synchronization is needed with instruction id2.
nosync=id1@any has the same effect as nosync=id1.
nosync_query=... provides an alternative way of specifying nosync, just like dep_query and dep. As with nosync, nosync_query accepts an optional @scope suffix.
priority=integer sets the instruction’s priority to the value integer. Instructions with higher priority will be scheduled sooner, if possible. Note that the scheduler may still schedule a lower-priority instruction ahead of a higher-priority one if loop orders or dependencies require it.
if=variable1:variable2 Only execute this instruction if all condition variables (which must be scalar variables) evaluate to true (as defined by C).
tags=tag1:tag2 Apply tags to this instruction that can then be used for Matching contexts.
groups=group1:group2 Make this instruction part of the given instruction groups. See InstructionBase.groups.
conflicts_grp=group1:group2 Make this instruction conflict with the given instruction groups. See InstructionBase.conflicts_with_groups.
atomic The update embodied by the assignment is carried out atomically. See Assignment.atomicity for precise semantics.
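The single-writer heuristic described under dep can be sketched in a few lines. The function single_writer_deps and the instruction IDs are hypothetical, and the exclusion of self-dependencies is an assumption of this sketch rather than documented behavior.

```python
def single_writer_deps(reads, writes):
    """Map each instruction ID to the dependencies the heuristic would add.

    `reads` and `writes` map instruction IDs to sets of variable names.
    """
    writers = {}
    for insn_id, variables in writes.items():
        for var in variables:
            writers.setdefault(var, set()).add(insn_id)

    deps = {}
    for insn_id, variables in reads.items():
        deps[insn_id] = set()
        for var in variables:
            var_writers = writers.get(var, set())
            # Add a dependency only if exactly one instruction writes var.
            # Skipping self-dependencies is an assumption of this sketch.
            if len(var_writers) == 1 and insn_id not in var_writers:
                deps[insn_id] |= var_writers
    return deps


# Hypothetical kernel: "a" has one writer, "b" has two.
writes = {"w1": {"a"}, "w2": {"b"}, "w3": {"b"}, "r": {"c"}}
reads = {"w1": set(), "w2": set(), "w3": set(), "r": {"a", "b"}}

deps = single_writer_deps(reads, writes)
print(deps["r"])
```

Here r gains a dependency on w1 only. The ordering of r relative to w2 and w3 would have to be given explicitly, for example with dep=w2:w3.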
Expressions
Loopy’s expressions are a slight superset of the expressions supported by pymbolic.
if(cond, then, else_)
a[[ 8*i + j ]]: Linear subscripts. See loopy.symbolic.LinearSubscript.
reductions: See loopy.symbolic.Reduction.
reduce vs simul_reduce
complex-valued arithmetic
tagging of array access and substitution rule use (“$”). See loopy.symbolic.TaggedVariable.
indexof, indexof_vec
cast(type, value): No parse syntax currently. See loopy.symbolic.TypeCast.
If constants in expressions are subclasses of numpy.generic, generated code will contain literals of exactly that type, making them explicitly typed. Constants given as Python types such as int, float or complex are called implicitly typed and adapt to the type of the expected result.
TODO: Functions
TODO: Reductions
Function Call Instructions
- class loopy.CallInstruction(assignees, expression, id=None, happens_after=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, tags=None, temp_var_types=None, priority=0, predicates=frozenset({}), depends_on=None)
An instruction capturing a function call. Unlike Assignment, this instruction supports functions with multiple return values.
- expression
The following attributes are only used until loopy.make_kernel() is finished:
- temp_var_types
A tuple of loopy.Optional. If an entry is not empty, it contains the type that will be assigned to the new temporary variable created from the assignment.
C Block Instructions
- class loopy.CInstruction(iname_exprs, code, read_variables=frozenset({}), assignees=(), id=None, happens_after=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, priority=0, predicates=frozenset({}), tags=None, depends_on=None)
- iname_exprs
A tuple of tuples (name, expr) of inames or expressions based on them that the instruction needs access to.
- code
The C code to be executed.
The code should obey the following rules:
It should only write to temporary variables, specifically the temporary variables
Note
Of course, nothing in loopy will prevent you from doing ‘forbidden’ things in your C code. If you ignore the rules and something breaks, you get to keep both pieces.
- read_variables
A frozenset of variable names that code reads. This is optional and only used for figuring out dependencies.
- assignees
A sequence (typically a tuple) of variable references (with or without subscript) as pymbolic.primitives.Expression instances that code writes to. This is optional and only used for figuring out dependencies.
Atomic Operations
- class loopy.MemoryOrdering
Ordering of atomic operations, defined as in C11 and OpenCL.
- RELAXED
- ACQUIRE
- RELEASE
- ACQ_REL
- SEQ_CST
- class loopy.MemoryScope
Scope of atomicity, defined as in OpenCL.
- auto
Scope matches the accessibility of the variable.
- WORK_ITEM
- WORK_GROUP
- WORK_DEVICE
- ALL_SVM_DEVICES
- class loopy.VarAtomicity(var_name)
A base class for the description of how atomic access to var_name shall proceed.
- var_name
- class loopy.OrderedAtomic(var_name)
Properties of an atomic operation. A subclass of VarAtomicity.
- ordering
One of the values from MemoryOrdering
- scope
One of the values from MemoryScope
- class loopy.AtomicInit(var_name)
Describes initialization of an atomic variable. A subclass of OrderedAtomic.
- ordering
One of the values from MemoryOrdering
- scope
One of the values from MemoryScope
- class loopy.AtomicUpdate(var_name)
Properties of an atomic update. A subclass of OrderedAtomic.
- ordering
One of the values from MemoryOrdering
- scope
One of the values from MemoryScope
No-Op Instruction
- class loopy.NoOpInstruction(id=None, happens_after=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, priority=None, predicates=None, tags=None, depends_on=None)
An instruction that carries out no operation. It is mainly useful as a way to structure dependencies between other instructions.
The textual syntax in a loopy kernel is:
... nop
Barrier Instructions
- class loopy.BarrierInstruction(id, happens_after=None, depends_on_is_final=None, groups=None, conflicts_with_groups=None, no_sync_with=None, within_inames_is_final=None, within_inames=None, priority=None, predicates=None, tags=None, synchronization_kind='global', mem_kind='local', depends_on=None)
An instruction that requires synchronization with all concurrent work items of synchronization_kind.
- synchronization_kind
A string, "global" or "local".
- mem_kind
A string, "global" or "local". Chooses which memory type to synchronize, for targets that require this (e.g. OpenCL)
The textual syntax in a loopy kernel is:
... gbarrier
... lbarrier
Note that the memory type mem_kind can be specified for local barriers:
... lbarrier {mem_kind=global}
Data: Arguments and Temporaries
Kernels operate on two types of data: ‘arguments’ carrying data into and out of a kernel, and temporaries with lifetimes tied to the runtime of the kernel.
Arguments
- class loopy.KernelArgument(**kwargs)
Base class for all argument types.
- name
- dtype
- is_output
- is_input
- class loopy.ValueArg(name: str, dtype: ToLoopyTypeConvertible | None = None, approximately: int = 1000, is_output: bool = False, is_input: bool = True, tags: frozenset[Tag] | None = None)
- class loopy.ArrayArg(*args, **kwargs)
- dtype: LoopyType | None
The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.
Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.
- shape: ShapeType | Type[auto] | None
May be one of the following:
None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.
loopy.auto. The shape will be determined by finding the access footprint.
a tuple like numpy.ndarray.shape.
Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
Any element of the shape tuple not used to compute strides may be None.
- dim_tags: Sequence[ArrayDimImplementationTag] | None
See Data Axis Tags.
- offset: ExpressionT | str | None
Offset from the beginning of the buffer to the point from which the strides are counted, in units of the dtype. May be one of
0 or None
a string (that is interpreted as an argument name).
a pymbolic expression
loopy.auto, in which case an offset argument is added automatically, immediately following this argument.
- dim_names: Tuple[str, ...] | None
A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.
- alignment: int | None
Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.
Defaults to None.
If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CFamilyTarget.
Added in version 2018.1.
- tags: FrozenSet[Tag]
A (possibly empty) frozenset of instances of pytools.tag.Tag intended for consumption by an application.
Added in version 2020.2.2.
- __init__(*args, **kwargs)
All of the following (except name) are optional. Specify either strides or shape.
- Parameters:
name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case multiple arguments with identical properties are created, one for each name.
shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
dim_tags – A comma-separated list of tags as understood by loopy.kernel.array.parse_array_dim_tags().
strides – May be one of the following:
None
loopy.auto. The strides will be determined by order and the access footprint.
a tuple like numpy.ndarray.shape.
Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
A string which can be parsed into the previous form.
order – “C” or “F” for C (row major) or Fortran (column major). Defaults to the default_order argument passed to loopy.make_kernel().
for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
offset – (See offset)
alignment – memory alignment in bytes
tags – An instance of or an Iterable of instances of pytools.tag.Tag.
- vector_size(target: TargetBase) int [source]¶
Return the size of the vector type used for the array divided by the basic data type.
Note: For 3-vectors, this will be 4.
(supports persistent hashing)
- address_space¶
An attribute of
AddressSpace
defining the address space in which the array resides.
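The interplay of the shape, order, and offset parameters described above can be illustrated with a small pure-Python sketch. It does not call loopy; the helper names `strides_from_shape` and `flat_index` are hypothetical, and the sketch assumes densely packed arrays with strides and offsets counted in units of the dtype (array elements, not bytes).

```python
def strides_from_shape(shape, order="C"):
    """Return per-axis strides, counted in array elements (not bytes).

    Hypothetical helper for illustration; assumes a densely packed array.
    """
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    strides = [0] * len(shape)
    step = 1
    # In C order the last axis varies fastest; in Fortran order the first does.
    axes = range(len(shape) - 1, -1, -1) if order == "C" else range(len(shape))
    for axis in axes:
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


def flat_index(index, strides, offset=0):
    """Element position of a multi-index, counted from the buffer start."""
    return offset + sum(i * s for i, s in zip(index, strides))


c_strides = strides_from_shape((3, 4, 5), order="C")
f_strides = strides_from_shape((3, 4, 5), order="F")
position = flat_index((1, 2, 3), c_strides, offset=7)
```

Because the offset is expressed in the same units as the strides, it simply shifts every computed position by a constant number of elements.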
- class loopy.ConstantArg(*args, **kwargs)[source]¶
-
- dtype: LoopyType | None¶
The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.
Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.
- shape: ShapeType | Type[auto] | None¶
May be one of the following:
None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.
loopy.auto. The shape will be determined by finding the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
Any element of the shape tuple not used to compute strides may be None.
- dim_tags: Sequence[ArrayDimImplementationTag] | None¶
See Data Axis Tags.
- offset: ExpressionT | str | None¶
Offset from the beginning of the buffer to the point from which the strides are counted, in units of the dtype. May be one of:
0 or None
a string (that is interpreted as an argument name)
a pymbolic expression
loopy.auto, in which case an offset argument is added automatically, immediately following this argument.
- dim_names: Tuple[str, ...] | None¶
A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.
- alignment: int | None¶
Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.
Defaults to None.
If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CFamilyTarget.
Added in version 2018.1.
- tags: FrozenSet[Tag]¶
A (possibly empty) frozenset of instances of pytools.tag.Tag intended for consumption by an application.
Added in version 2020.2.2.
- __init__(*args, **kwargs)[source]¶
All of the following (except name) are optional. Specify either strides or shape.
- Parameters:
name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case one argument with identical properties is created for each name.
shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
dim_tags – A comma-separated list of tags as understood by loopy.kernel.array.parse_array_dim_tags().
strides – May be one of the following:
None
loopy.auto. The strides will be determined by order and the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
A string which can be parsed into the previous form.
order – “C” or “F” for C (row major) or Fortran (column major). Defaults to the default_order argument passed to loopy.make_kernel().
for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
offset – (See offset)
alignment – memory alignment in bytes
tags – An instance of or an Iterable of instances of pytools.tag.Tag.
- vector_size(target: TargetBase) int [source]¶
Return the size of the vector type used for the array divided by the basic data type.
Note: For 3-vectors, this will be 4.
(supports persistent hashing)
- class loopy.ImageArg(*args, **kwargs)[source]¶
-
- dtype: LoopyType | None¶
The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.
Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.
- shape: ShapeType | Type[auto] | None¶
May be one of the following:
None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.
loopy.auto. The shape will be determined by finding the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
Any element of the shape tuple not used to compute strides may be None.
- dim_tags: Sequence[ArrayDimImplementationTag] | None¶
See Data Axis Tags.
- offset: ExpressionT | str | None¶
Offset from the beginning of the buffer to the point from which the strides are counted, in units of the dtype. May be one of:
0 or None
a string (that is interpreted as an argument name)
a pymbolic expression
loopy.auto, in which case an offset argument is added automatically, immediately following this argument.
- dim_names: Tuple[str, ...] | None¶
A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.
- alignment: int | None¶
Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.
Defaults to None.
If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CFamilyTarget.
Added in version 2018.1.
- tags: FrozenSet[Tag]¶
A (possibly empty) frozenset of instances of pytools.tag.Tag intended for consumption by an application.
Added in version 2020.2.2.
- __init__(*args, **kwargs)[source]¶
All of the following (except name) are optional. Specify either strides or shape.
- Parameters:
name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case one argument with identical properties is created for each name.
shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
dim_tags – A comma-separated list of tags as understood by loopy.kernel.array.parse_array_dim_tags().
strides – May be one of the following:
None
loopy.auto. The strides will be determined by order and the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
A string which can be parsed into the previous form.
order – “C” or “F” for C (row major) or Fortran (column major). Defaults to the default_order argument passed to loopy.make_kernel().
for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
offset – (See offset)
alignment – memory alignment in bytes
tags – An instance of or an Iterable of instances of pytools.tag.Tag.
- vector_size(target: TargetBase) int [source]¶
Return the size of the vector type used for the array divided by the basic data type.
Note: For 3-vectors, this will be 4.
(supports persistent hashing)
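The note on vector_size (a 3-vector reports a size of 4) matches the OpenCL C convention that 3-component vector types occupy the storage of 4 components. The rule can be sketched as below; `padded_vector_size` is a hypothetical helper written for illustration, not a loopy function, and the set of accepted lengths is an assumption based on the OpenCL vector widths.

```python
def padded_vector_size(n_components):
    """Storage size of an OpenCL-style vector, in units of its scalar type.

    Hypothetical helper; accepted lengths follow OpenCL C (2, 3, 4, 8, 16).
    """
    if n_components not in (2, 3, 4, 8, 16):
        raise ValueError("unsupported vector length")
    # A 3-component vector is stored like a 4-component one.
    return 4 if n_components == 3 else n_components


sizes = {n: padded_vector_size(n) for n in (2, 3, 4, 8, 16)}
```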
Temporary Variables¶
Temporary variables model OpenCL’s private
and local
address spaces. Both
have the lifetime of a kernel invocation.
- class loopy.AddressSpace(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Storage location of a variable.
- PRIVATE¶
- LOCAL¶
- GLOBAL¶
- class loopy.TemporaryVariable(name: str, dtype: ToLoopyTypeConvertible = None, shape: Union[ShapeType, Type['auto'], None] = <class 'loopy.typing.auto'>, address_space: Union[AddressSpace, Type[auto], None] = None, dim_tags: Optional[Sequence[ArrayDimImplementationTag]] = None, offset: Union[ExpressionT, str, None] = 0, dim_names: Optional[Tuple[str, ...]] = None, strides: Optional[Tuple[ExpressionT, ...]] = None, order: str | None = None, base_indices: Optional[Tuple[ExpressionT, ...]] = None, storage_shape: ShapeType | None = None, base_storage: Optional[str] = None, initializer: Optional[np.ndarray] = None, read_only: bool = False, _base_storage_access_may_be_aliasing: bool = False, **kwargs: Any)[source]¶
-
- dtype: LoopyType | None¶
The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.
Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.
- shape: ShapeType | Type[auto] | None¶
May be one of the following:
None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.
loopy.auto. The shape will be determined by finding the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
Any element of the shape tuple not used to compute strides may be None.
- dim_tags: Sequence[ArrayDimImplementationTag] | None¶
See Data Axis Tags.
- offset: ExpressionT | str | None¶
Offset from the beginning of the buffer to the point from which the strides are counted, in units of the dtype. May be one of:
0 or None
a string (that is interpreted as an argument name)
a pymbolic expression
loopy.auto, in which case an offset argument is added automatically, immediately following this argument.
- dim_names: Tuple[str, ...] | None¶
A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.
- alignment: int | None¶
Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.
Defaults to None.
If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CFamilyTarget.
Added in version 2018.1.
- tags: FrozenSet[Tag]¶
A (possibly empty) frozenset of instances of pytools.tag.Tag intended for consumption by an application.
Added in version 2020.2.2.
- __init__(name: str, dtype: ToLoopyTypeConvertible = None, shape: Union[ShapeType, Type['auto'], None] = <class 'loopy.typing.auto'>, address_space: Union[AddressSpace, Type[auto], None] = None, dim_tags: Optional[Sequence[ArrayDimImplementationTag]] = None, offset: Union[ExpressionT, str, None] = 0, dim_names: Optional[Tuple[str, ...]] = None, strides: Optional[Tuple[ExpressionT, ...]] = None, order: str | None = None, base_indices: Optional[Tuple[ExpressionT, ...]] = None, storage_shape: ShapeType | None = None, base_storage: Optional[str] = None, initializer: Optional[np.ndarray] = None, read_only: bool = False, _base_storage_access_may_be_aliasing: bool = False, **kwargs: Any) None [source]¶
- Parameters:
dtype – loopy.auto or a numpy.dtype
shape – loopy.auto or a shape tuple
base_indices – loopy.auto or a tuple of base indices
- vector_size(target: TargetBase) int [source]¶
Return the size of the vector type used for the array divided by the basic data type.
Note: For 3-vectors, this will be 4.
(supports persistent hashing)
- base_indices: Tuple[ExpressionT, ...] | None¶
- address_space: AddressSpace | Type[auto]¶
- base_storage: str | None¶
The name of a storage array that is to be used to actually hold the data in this temporary, or None. If this is neither None nor the name of an existing variable, a variable of this name and appropriate size will be created.
- initializer: ndarray | None¶
None or a
numpy.ndarray
of data to be used to initialize the array.
- read_only: bool¶
A
bool
indicating whether the variable may be written during its lifetime. If True, initializer must be given.
- _base_storage_access_may_be_aliasing: bool¶
Whether the temporary is used to alias the underlying base storage. Defaults to False. If False, C-based code generators will declare the temporary as a
restrict
const pointer to the base storage memory location. If True, the restrict part is omitted on this declaration.
Specifying Types¶
loopy uses the same type system as numpy. (See numpy.dtype.) It also uses pyopencl for a registry of user-defined types and their C equivalents. See pyopencl.tools.get_or_register_dtype() and related functions.
For a string representation of types, all numpy types (e.g. float32) are accepted, in addition to what is registered in pyopencl.
Substitution Rules¶
Substitution Rule Objects¶
Textual Syntax for Substitution Rules¶
Syntax of a substitution rule:
rule_name(arg1, arg2) := EXPRESSION
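To make the three parts of this syntax concrete, the following sketch splits a rule string into its name, argument names, and right-hand side with a regular expression. It is an illustration only: `split_rule` is a hypothetical helper, it is not how loopy parses rules, and it handles only the parenthesized form shown above.

```python
import re

# Matches "name(arg, ...) := expression"; illustrative only.
_RULE_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*\((?P<args>[^)]*)\)\s*:=\s*(?P<expr>.+?)\s*$"
)


def split_rule(text):
    """Split a textual substitution rule into (name, args, expression)."""
    match = _RULE_RE.match(text)
    if match is None:
        raise ValueError(f"not a substitution rule: {text!r}")
    # Drop empty entries so that "f() := 1" yields an empty argument tuple.
    args = tuple(a.strip() for a in match.group("args").split(",") if a.strip())
    return match.group("name"), args, match.group("expr")


name, args, expr = split_rule("f(x, y) := x + 2*y")
```

The expression part is kept as an unparsed string here; turning it into an expression tree is left out of the sketch.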
Kernel Options¶
- class loopy.Options(**kwargs: Any)[source]¶
Unless otherwise specified, these options are Boolean-valued (i.e. on/off).
Code-generation options
- annotate_inames¶
When generating code for inames, annotate them with comments if it is not immediately apparent which iname is being referred to (such as for inames mapped to constants or OpenCL group/local IDs).
- trace_assignments¶
Generate code that uses printf in kernels to trace the execution of assignment instructions.
- trace_assignment_values¶
Like
trace_assignments
, but also trace the assigned values.
- check_dep_resolution¶
Whether loopy should issue an error if a dependency expression does not match any instructions in the kernel.
Invocation-related options
- skip_arg_checks¶
Do not do any checking (data type, data layout, shape, etc.) on arguments for a minor performance gain.
Changed in version 2021.1: This now defaults to the same value as the optimize sub-flag from sys.flags. This flag can be controlled (i.e. set to True) by running Python with the -O flag.
- cl_exec_manage_array_events¶
Within the PyOpenCL executor, respect and update pyopencl.array.Array.events.
Defaults to True.
- return_dict¶
Have kernels return a dict instead of a tuple as output. Specifically, the result of a kernel invocation with this flag is a tuple (evt, out_dict), where out_dict is a dictionary mapping argument names to their output values. This is helpful if arguments are inferred and argument ordering is thus implementation-defined.
- write_wrapper¶
Print the generated Python invocation wrapper. Accepts a file name as a value. Writes to
sys.stdout
if none is given.
- write_code¶
Print the generated code. Accepts a file name or a boolean as a value. Writes to
sys.stdout
if set to True.
- edit_code¶
Invoke an editor (given by the environment variable
EDITOR
) on the generated kernel code, allowing for tweaks before the code is passed on to the target for compilation.
- allow_fp_reordering¶
Allow re-ordering of floating point arithmetic. Re-ordering may give different results as floating point arithmetic is not associative in addition and multiplication. Default is True. Note that the implementation of this option is currently incomplete.
- build_options¶
Options to pass to the target compiler when building the kernel. A list of strings.
Features
- disable_global_barriers¶
- enforce_variable_access_ordered¶
If True, require that loopy.check.check_variable_access_ordered() passes. Required for language versions 2018.1 and above. This check helps find and eliminate unintentionally unordered access to variables.
If equal to "no_check", then no check is performed.
- enforce_array_accesses_within_bounds¶
If True, require that check_bounds() passes. If False, then check_bounds() raises a warning for any out-of-bounds accesses.
If equal to "no_check", then no check is performed.
- insert_gbarriers¶
If True, loopy inserts global barriers, based on the memory dependencies between variables in the global address space, to avoid RAW, WAR and WAW races.
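As a small illustration of the skip_arg_checks default described above, the optimize sub-flag of sys.flags can be inspected directly from Python. The helper name `default_skip_arg_checks` is hypothetical, and converting the flag (0, 1, or 2) to a Boolean is an assumption of this sketch rather than something stated in the reference.

```python
import sys


def default_skip_arg_checks():
    """Mirror the interpreter's optimize flag as an on/off value.

    sys.flags.optimize is 0 normally, 1 under -O, and 2 under -OO.
    Hypothetical helper; the bool() conversion is an assumption.
    """
    return bool(sys.flags.optimize)


skip_by_default = default_skip_arg_checks()
```

Running the interpreter as `python -O script.py` would make this value True, which per the description above corresponds to argument checking being skipped by default.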
Targets¶
- class loopy.TargetBase[source]¶
Base class for all targets, i.e. different combinations of code that loopy can generate.
Objects of this type must be picklable.
- class loopy.CFamilyTarget(fortran_abi=False)[source]¶
A target for “least-common denominator C”, without any parallel extensions, and without use of any C99 specifics. Intended to be usable as a common base for C99, C++, OpenCL, CUDA, and the like.
- class loopy.CTarget(fortran_abi=False)[source]¶
This target may emit code using all features of C99. For a target base supporting “least-common-denominator” C, see
CFamilyTarget
.
- class loopy.ExecutableCTarget(compiler=None, fortran_abi=False)[source]¶
An executable CFamilyTarget that uses (by default) JIT compilation of C code.
- class loopy.OpenCLTarget(atomics_flavor=None, use_int8_for_bool=True)[source]¶
A target for the OpenCL C heterogeneous compute programming language.
- class loopy.PyOpenCLTarget(device=None, *, pyopencl_module_name: str = '_lpy_cl', atomics_flavor=None, use_int8_for_bool: bool = True, limit_arg_size_nbytes: int | None = None, pointer_size_nbytes: int | None = None)[source]¶
A code generation target that takes special advantage of
pyopencl
features such as run-time knowledge of the target device (to generate warnings) and support for complex numbers.
- class loopy.ISPCTarget(fortran_abi=False)[source]¶
A code generation target for Intel’s ISPC SPMD programming language, to target Intel’s Knights hardware and modern Intel CPUs with wide vector units.
References to Canonical Names¶
- class loopy.target.TargetBase[source]¶
See
loopy.TargetBase
.
Helper values¶
- class loopy.auto[source]¶
A generic placeholder object for something that should be automatically determined. See, for example, the shape or strides argument of
ArrayArg
.
Libraries: Extending and Interfacing with External Functionality¶
Symbols¶
Functions¶
- class loopy.PreambleInfo(kernel: loopy.kernel.LoopKernel, seen_dtypes: Set[loopy.types.LoopyType], seen_functions: Set[loopy.codegen.SeenFunction], seen_atomic_dtypes: Set[loopy.types.LoopyType], codegen_state: loopy.codegen.CodeGenerationState)[source]¶
- class loopy.CallMangleInfo(target_name, result_dtypes, arg_dtypes)[source]¶
- target_name¶
A string. The name of the function to be called in the generated target code.
- result_dtypes¶
A tuple of
loopy.types.LoopyType
instances indicating what types of values the function returns.
- arg_dtypes¶
A tuple of
loopy.types.LoopyType
instances indicating what types of arguments the function actually receives.
Reductions¶
The Kernel Object¶
Do not create LoopKernel
objects directly. Instead, refer to
Reference: Creating Kernels.
- class loopy.LoopKernel(domains: Sequence[BasicSet], instructions: Sequence[InstructionBase], args: Sequence[KernelArgument], assumptions: BasicSet, temporary_variables: Mapping[str, TemporaryVariable], inames: Mapping[str, Iname], substitutions: Mapping[str, SubstitutionRule], options: Options, target: TargetBase, tags: FrozenSet[Tag], state: KernelState = KernelState.INITIAL, name: str = 'loopy_kernel', preambles: Sequence[Tuple[int, str]] = (), preamble_generators: Sequence[Callable[[PreambleInfo], Iterator[Tuple[int, str]]]] = (), symbol_manglers: Sequence[Callable[[LoopKernel, str], Tuple[LoopyType, str] | None]] = (), linearization: Sequence[ScheduleItem] | None = None, iname_slab_increments: Mapping[str, Tuple[int, int]] = <factory>, loop_priority: FrozenSet[Tuple[str, ...]] = <factory>, applied_iname_rewrites: Tuple[Dict[str, int | numpy.integer | float | complex | numpy.inexact | bool | numpy.bool | Expression | tuple[ExpressionT, ...]], ...] = (), index_dtype: NumpyType = np:dtype('int32'), silenced_warnings: FrozenSet[str] = frozenset({}), overridden_get_grid_sizes_for_insn_ids: Callable[[FrozenSet[str], Dict[str, InKernelCallable], bool], Tuple[Tuple[int, ...], Tuple[int, ...]]] | None = None)[source]¶
These correspond more or less directly to arguments of loopy.make_kernel().
Note
This data structure and its attributes should be considered immutable, even if it contains mutable data types. See copy() for an easy way of producing a modified copy.
- domains: Sequence[BasicSet]¶
Represents the Loop Domain Forest.
- instructions: Sequence[InstructionBase]¶
See Instructions.
- args: Sequence[KernelArgument]¶
- linearization: Sequence[ScheduleItem] | None = None¶
- assumptions: BasicSet¶
Must be an islpy.BasicSet parameter domain.
- temporary_variables: Mapping[str, TemporaryVariable]¶
- substitutions: Mapping[str, SubstitutionRule]¶
- iname_slab_increments: Mapping[str, Tuple[int, int]]¶
A mapping from inames to (lower_incr, upper_incr) tuples that will be separated out in the execution to generate ‘bulk’ slabs with fewer conditionals.
- loop_priority: FrozenSet[Tuple[str, ...]]¶
A frozenset of priority constraints to the kernel. Each such constraint is a tuple of inames. Inames occurring in such a tuple will be scheduled earlier than any iname following in the tuple. This applies only to inames with non-parallel implementation tags.
- applied_iname_rewrites: Tuple[Dict[str, int | integer | float | complex | inexact | bool | bool | Expression | tuple[ExpressionT, ...]], ...] = ()¶
A list of past substitution dictionaries that were applied to the kernel. These are stored so that they may be repeated on expressions the user specifies later.
- state: KernelState = 0¶
- target: TargetBase¶
- __call__(*args, **kwargs)[source]¶
Execute the
LoopKernel
.
- copy(**kwargs: Any) LoopKernel [source]¶
- tagged(tags: Iterable[Tag] | Tag | None) Self [source]¶
Return a copy of self with the specified tag or tags added to the set of tags. If the resulting set of tags violates the rules on pytools.tag.UniqueTag, an error is raised.
- Parameters:
tags – An instance of Tag or an iterable with instances therein.
- without_tags(tags: Iterable[Tag] | Tag | None, verify_existence: bool = True) Self [source]¶
Return a copy of self without the specified tags.
- Parameters:
tags – An instance of Tag or an iterable with instances therein.
verify_existence – If set to True, this method raises an exception if not all tags specified for removal are present in the original set of tags. Default True.
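The loop_priority attribute of LoopKernel described above is a frozenset of iname tuples, each constraining relative order. The sketch below checks a candidate loop nest against such constraints in plain Python. It is not loopy's scheduler: `respects_priority` is a hypothetical helper, and it assumes that "scheduled earlier" means "placed further out in the sequential loop nest".

```python
def respects_priority(nesting, loop_priority):
    """Check an outer-to-inner iname order against priority constraints.

    Hypothetical helper; assumes 'earlier' means 'further out in the nest'.
    """
    position = {iname: depth for depth, iname in enumerate(nesting)}
    for constraint in loop_priority:
        # Only inames that actually occur in this nest can be compared.
        present = [position[iname] for iname in constraint if iname in position]
        if present != sorted(present):
            return False
    return True


priorities = frozenset({("i", "j"), ("j", "k")})
ok = respects_priority(["i", "j", "k"], priorities)
bad = respects_priority(["j", "i", "k"], priorities)
```

Storing the constraints as a frozenset of tuples fits the note above that the kernel and its attributes should be treated as immutable.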
Implementation Details: The Base Array¶
All array-like data in loopy (such as ArrayArg and TemporaryVariable) derives from a single, shared base array type, described next.
- class loopy.kernel.array.ArrayBase(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, alignment=None, tags=None, **kwargs)[source]¶
-
- dtype: LoopyType | None¶
The loopy.types.LoopyType of the array. If this is None, loopy will try to continue without knowing the type of this array, where the idea is that precise knowledge of the type will become available at invocation time. Calling the kernel (via loopy.LoopKernel.__call__()) automatically adds this type information based on invocation arguments.
Note that some transformations, such as loopy.add_padding(), cannot be performed without knowledge of the exact dtype.
- shape: ShapeType | Type[auto] | None¶
May be one of the following:
None. In this case, no shape is intended to be specified, only the strides will be used to access the array. Bounds checking will not be performed.
loopy.auto. The shape will be determined by finding the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
Any element of the shape tuple not used to compute strides may be None.
- dim_tags: Sequence[ArrayDimImplementationTag] | None¶
See Data Axis Tags.
- offset: ExpressionT | str | None¶
Offset from the beginning of the buffer to the point from which the strides are counted, in units of the dtype. May be one of:
0 or None
a string (that is interpreted as an argument name)
a pymbolic expression
loopy.auto, in which case an offset argument is added automatically, immediately following this argument.
- dim_names: Tuple[str, ...] | None¶
A tuple of strings providing names for the array axes, or None. If given, must have the same number of entries as dim_tags. These do not live in any particular namespace (i.e. collide with no other names) and serve a purely informational/documentational purpose. On occasion, they are used to generate more informative names than could be achieved by axis numbers.
- alignment: int | None¶
Memory alignment of the array in bytes. For temporary arrays, this ensures they are allocated with this alignment. For arguments, this entails a promise that the incoming array obeys this alignment restriction.
Defaults to None.
If an integer N is given, the array would be declared with __attribute__((aligned(N))) in code generation for loopy.CFamilyTarget.
Added in version 2018.1.
- tags: FrozenSet[Tag]¶
A (possibly empty) frozenset of instances of pytools.tag.Tag intended for consumption by an application.
Added in version 2020.2.2.
- __init__(name, dtype=None, shape=None, dim_tags=None, offset=0, dim_names=None, strides=None, order=None, for_atomic=False, alignment=None, tags=None, **kwargs)[source]¶
All of the following (except name) are optional. Specify either strides or shape.
- Parameters:
name – When passed to loopy.make_kernel, this may contain multiple names separated by commas, in which case one argument with identical properties is created for each name.
shape – May be any of the things specified under shape, or a string which can be parsed into the previous form.
dim_tags – A comma-separated list of tags as understood by loopy.kernel.array.parse_array_dim_tags().
strides – May be one of the following:
None
loopy.auto. The strides will be determined by order and the access footprint.
a tuple like numpy.ndarray.shape. Each entry of the tuple is also allowed to be a pymbolic expression involving kernel parameters, or a (potentially comma-separated) string that can be parsed to such an expression.
A string which can be parsed into the previous form.
order – “C” or “F” for C (row major) or Fortran (column major). Defaults to the default_order argument passed to loopy.make_kernel().
for_atomic – Whether the array is declared for atomic access, and, if necessary, using atomic-capable data types.
offset – (See offset)
alignment – memory alignment in bytes
tags – An instance of or an Iterable of instances of pytools.tag.Tag.
- vector_size(target: TargetBase) int [source]¶
Return the size of the vector type used for the array divided by the basic data type.
Note: For 3-vectors, this will be 4.
(supports persistent hashing)