Reference: Documentation for Internal API¶
Targets¶
See also Targets.
- class loopy.target.c.POD(ast_builder, dtype, name)[source]¶
A simple declarator: The type is given as a
numpy.dtype
and the name is given as a string.
- class loopy.target.c.ScopingBlock(contents: Sequence[Generable] | None = None)[source]¶
A block that is mandatory for scoping and may not be simplified away by
loopy.codegen.result.merge_codegen_results()
.
- class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state, fortran_abi=False, type_inf_mapper=None)[source]¶
Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.
Note
All mapper methods take an extra argument called type_context. The purpose of type_context is to inform the method about the expected type for untyped expressions such as Python scalars. The type of the expression takes precedence over type_context.
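The precedence rule in the note above can be illustrated with a small stand-alone sketch. It does not use loopy; the function names and the single-character codes "f" (single precision) and "d" (double precision) are assumptions made for illustration only.

```python
def effective_type(own_type, type_context):
    """Return the type to use: an expression's own type wins over the context."""
    return own_type if own_type is not None else type_context


def render_float(value, own_type, type_context):
    """Toy stand-in: render a float as a C literal (hypothetical helper)."""
    kind = effective_type(own_type, type_context)
    if kind == "f":
        # Single precision: C float literals carry an 'f' suffix.
        return repr(float(value)) + "f"
    if kind == "d":
        # Double precision: plain C floating-point literal.
        return repr(float(value))
    raise ValueError(f"unsupported type code: {kind!r}")


# An untyped Python scalar (own_type=None) takes its type from the context.
untyped = render_float(1.5, None, "f")
# A typed expression ignores the context.
typed = render_float(1.5, "d", "f")
```

Here `untyped` is rendered as a single-precision literal because only the context supplies a type, while `typed` keeps its own double-precision type.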
References¶
- class loopy.target.c.codegen.expression.Generable¶
See
cgen.Generable
.
Symbolic¶
See also Expressions.
Loopy-specific expression types¶
- class loopy.symbolic.Literal(s: str)[source]¶
A literal to be used during code generation.
Note
Only used in the output of
loopy.target.c.codegen.expression.ExpressionToCExpressionMapper
(and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.ArrayLiteral(children: tuple[Expression, ...])[source]¶
An array literal.
Note
Only used in the output of
loopy.target.c.codegen.expression.ExpressionToCExpressionMapper
(and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.TypedCSE(child: _Expression, prefix: str | None = None, scope: str = 'pymbolic_eval', dtype: LoopyType | None = None)[source]¶
A
pymbolic.primitives.CommonSubexpression
annotated with a type.
- class loopy.TypeCast(type: ToLoopyTypeConvertible, child: Expression)[source]¶
Only defined for numerical types with semantics matching
numpy.ndarray.astype()
.
- child: Expression¶
The expression to be cast.
- type¶
- class loopy.TaggedVariable(name: str, tags: Iterable[Tag] | Tag | None)[source]¶
This is an identifier with tags, such as
matrix$one
, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.
- tags: frozenset[Tag]¶
A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).
Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.
- class loopy.Reduction(operation: ReductionOperation | str, inames: tuple[str | pymbolic.primitives.Variable, ...] | pymbolic.primitives.Variable | str, expr: Expression, allow_simultaneous: bool = False)[source]¶
Represents a reduction operation on expr across inames.
- operation: ReductionOperation¶
- expr: Expression¶
An expression which may have tuple type. If the expression has tuple type, it must be one of the following:
a tuple of pymbolic.typing.Expression, or
a loopy.symbolic.Reduction, or
a function call or substitution rule invocation.
- class loopy.LinearSubscript(aggregate: Expression, index: Expression = <function ExpressionNode.index>)[source]¶
Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
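The difference between a multi-dimensional subscript and a linear one can be sketched without loopy. The helper below and the row-major layout are assumptions chosen for illustration; the point is only that a linear index addresses the underlying buffer directly.

```python
def row_major_offset(index, shape):
    """Flat offset of a multi-dimensional index in a row-major layout."""
    offset = 0
    for idx, extent in zip(index, shape):
        if not 0 <= idx < extent:
            raise IndexError(f"index {idx} out of range for extent {extent}")
        offset = offset * extent + idx
    return offset


# A 2 x 3 array stored in one flat buffer.
shape = (2, 3)
buffer = [10, 11, 12, 13, 14, 15]

# Multi-dimensional access goes through the layout.
via_layout = buffer[row_major_offset((1, 1), shape)]

# A linear subscript skips the layout and indexes the buffer directly.
via_linear = buffer[4]
```

Both accesses reach the same element here only because the layout happens to map (1, 1) to offset 4; the linear form stays valid whatever the layout is.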
- class loopy.symbolic.SubArrayRef(swept_inames: tuple[Variable, ...], subscript: Subscript)[source]¶
An algebraic expression to map an affine memory layout pattern (known as sub-array) as consecutive elements of the sweeping axes which are defined using SubArrayRef.swept_inames.
- swept_inames¶
An instance of tuple denoting the axes to which the sub-array is mapped.
- subscript¶
An instance of
pymbolic.primitives.Subscript
denoting the array in the kernel.
- class loopy.symbolic.RuleArgument(index: int = <function ExpressionNode.index>)[source]¶
Represents a (numbered) argument of a
loopy.SubstitutionRule
. Only used internally in the rule-aware mappers to match subst rules independently of argument names.
- class loopy.symbolic.ResolvedFunction(function: Variable | ReductionOpFunction)[source]¶
A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.
- function: Variable | ReductionOpFunction¶
- name¶
Rule-aware Mappers¶
- class loopy.symbolic.ExpansionState(kernel: LoopKernel, instruction: InstructionBase, stack: tuple[tuple[str, Tag], ...], arg_context: immutables.Map[str, Expression])[source]¶
- kernel¶
- instruction¶
- stack¶
A tuple of (name, tag) pairs representing the current expansion stack.
- arg_context¶
A mapping (immutables.Map) from argument names to their current values.
- class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context: SubstitutionRuleMappingContext)[source]¶
Note: the third argument dragged around by this mapper is the current ExpansionState.
Subclasses must take care not to touch identifiers that are in ExpansionState.arg_context.
Expression Manipulation Helpers¶
- loopy.symbolic.simplify_using_aff(kernel, expr)[source]¶
Simplifies expr on kernel’s domain.
- Parameters:
expr – An instance of
pymbolic.typing.Expression
.
References¶
- class loopy.symbolic.Variable[source]¶
See
pymbolic.Variable
.
- class loopy.symbolic.Expression¶
- class loopy.symbolic._Expression¶
Types¶
DTypes of variables in a loopy.LoopKernel
must be picklable, so in
the codegen pipeline user-provided types are converted to
loopy.types.LoopyType
.
- class loopy.LoopyType[source]¶
Abstract class for dtypes of variables encountered in a
loopy.LoopKernel
.
- class loopy.types.AtomicType[source]¶
Abstract class for dtypes of variables encountered in a
loopy.LoopKernel
on which atomic operations are performed.
Codegen¶
- class loopy.codegen.PreambleInfo(kernel: LoopKernel, seen_dtypes: set[LoopyType], seen_functions: set[SeenFunction], seen_atomic_dtypes: set[LoopyType], codegen_state: CodeGenerationState)[source]¶
- kernel: LoopKernel¶
- seen_functions: set[SeenFunction]¶
- class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: tuple[LoopyType, ...], result_dtypes: tuple[LoopyType, ...])[source]¶
This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.
- name¶
- c_name¶
- arg_dtypes¶
a tuple of arg dtypes
- result_dtypes¶
a tuple of result dtypes
- class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: islpy.Set, implemented_predicates: frozenset[str | Expression], seen_dtypes: set[LoopyType], seen_functions: set[SeenFunction], seen_atomic_dtypes: set[LoopyType], var_subst_map: immutables.Map[str, Expression], allow_complex: bool, callables_table: CallablesTable, is_entrypoint: bool, var_name_generator: pytools.UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cache_manager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]¶
- kernel: LoopKernel¶
- target: TargetBase¶
- implemented_domain: islpy.Set¶
The entire implemented domain (as an
islpy.Set
) i.e. all constraints that have been enforced so far.
- implemented_predicates: frozenset[str | Expression]¶
- seen_functions: set[SeenFunction]¶
- seen_atomic_dtypes¶
- var_subst_map: immutables.Map[str, Expression]¶
- vectorization_info: VectorizationInfo | None = None¶
- callables_table: CallablesTable¶
- codegen_cache_manager: CodegenOperationCacheManager¶
- class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[tuple[int, str]] = (), device_preambles: Sequence[tuple[int, str]] = ())[source]¶
- host_programs¶
A mapping from names of entrypoints to their host
GeneratedProgram
.
- device_programs¶
A list of
GeneratedProgram
instances intended to run on the compute device.
- host_preambles¶
- device_preambles¶
- class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, islpy.Set], host_preambles: Sequence[tuple[str, str]] = (), device_preambles: Sequence[tuple[str, str]] = ())[source]¶
- host_program¶
- device_programs¶
A list of
GeneratedProgram
instances intended to run on the compute device.
- host_preambles¶
- device_preambles¶
- loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) CodeGenerationResult [source]¶
- class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: list[InstructionBase], linearization: list[ScheduleItem], inames: dict[str, loopy.kernel.data.Iname])[source]¶
Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.
- class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]¶
Caches operations arising during the codegen pipeline.
- kernel_proxy¶
An instance of
KernelProxyForCodegenOperationCacheManager
.
- with_kernel(kernel)[source]¶
Returns a new instance of
CodegenOperationCacheManager
corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.
- get_concurrent_inames_in_a_callkernel(callkernel_index: int) frozenset[str] [source]¶
Returns a frozenset of concurrent inames in a callkernel.
- Parameters:
callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.
References¶
- class loopy.codegen.tools.ExpressionNode¶
Reduction Operation¶
Iname Tags¶
- loopy.kernel.data.filter_iname_tags_by_type(tags, tag_type, max_num=None, min_num=None)[source]¶
Return the subset of tags that are of type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.
- Parameters:
tags – An iterable of tags.
tag_type – a subclass of loopy.kernel.data.InameImplementationTag.
max_num – the maximum number of tags expected to be found.
min_num – the minimum number of tags expected to be found.
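The filtering and count checks described above can be sketched as follows. This is a stand-alone approximation, not loopy's implementation: the tag classes are hypothetical, and raising ValueError is an assumption (the exception type is not stated in the text).

```python
class ToyTag:
    """Hypothetical base class standing in for an iname tag."""


class ToyParallelTag(ToyTag):
    """Hypothetical tag marking a parallel iname."""


class ToyUnrollTag(ToyTag):
    """Hypothetical tag marking an unrolled iname."""


def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Return the tags that are instances of tag_type, checking the count."""
    result = frozenset(tag for tag in tags if isinstance(tag, tag_type))
    if max_num is not None and len(result) > max_num:
        raise ValueError(f"found {len(result)} tags, expected at most {max_num}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(f"found {len(result)} tags, expected at least {min_num}")
    return result


par = ToyParallelTag()
unr = ToyUnrollTag()
parallel_only = filter_tags_by_type([par, unr], ToyParallelTag, max_num=1)
```

Passing max_num=1 is a convenient way to assert that an iname carries at most one tag of a given kind while retrieving it.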
- class loopy.kernel.data.Iname(name: str, tags: frozenset[Tag])[source]¶
Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.
This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.
- tags¶
An instance of frozenset of pytools.tag.Tag.
References¶
- class loopy.kernel.data.ToLoopyTypeConvertible¶
Array¶
- class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]¶
- target_axis¶
For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.
Note that “normal” arrays only have one target_axis.
- layout_nesting_level¶
For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.
May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
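As a rough illustration of how nesting levels determine strides, the sketch below computes element strides from a shape and one nesting level per axis. It is a simplified stand-in that ignores padding (pad_to) and target axes; the function name is hypothetical.

```python
def strides_from_nesting_levels(shape, nesting_levels):
    """Compute per-axis strides (in elements) from layout nesting levels.

    The axis with nesting level 0 varies fastest (stride 1); each higher
    level gets the product of the extents of all lower levels.
    """
    n_axes = len(shape)
    if sorted(nesting_levels) != list(range(n_axes)):
        raise ValueError("nesting levels must be the unique integers 0..n-1")

    strides = [0] * n_axes
    stride = 1
    # Visit axes from the fastest-varying (level 0) to the slowest.
    for axis in sorted(range(n_axes), key=lambda ax: nesting_levels[ax]):
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


# Last axis fastest: the usual C (row-major) layout.
c_order = strides_from_nesting_levels((4, 5, 6), (2, 1, 0))
# First axis fastest: Fortran (column-major) layout.
f_order = strides_from_nesting_levels((4, 5, 6), (0, 1, 2))
```

The validity check mirrors the requirement that the levels form a contiguous sequence of unique integers starting at 0.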
- class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]¶
An arg dimension implementation tag for a fixed (potentially symbolic) stride.
- stride¶
May be one of the following:
An
Expression
, including an integer, indicating the stride in units of the underlying array’s ArrayBase.dtype
.
loopy.auto
, indicating that a new kernel argument for this stride should automatically be created.
The stride is given in units of
ArrayBase.dtype
.
- class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]¶
- pad_to¶
ArrayBase.dtype
granularity to which to pad this dimension
This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.
- loopy.kernel.array.parse_array_dim_tags(dim_tags, n_axes=None, use_increasing_target_axes=False, dim_names=None)[source]¶
Cross-references¶
(This section shouldn’t exist: Sphinx should be able to resolve these on its own.)
- class loopy.kernel.array.ShapeType¶
- class loopy.kernel.array.Tag[source]¶
See
pytools.tag.Tag
Checks¶
- loopy.check.check_for_integer_subscript_indices(t_unit)[source]¶
Checks if every array access is of type
int
.
- loopy.check.check_for_duplicate_insn_ids(knl: LoopKernel) None [source]¶
Check if multiple instructions of knl have the same
loopy.InstructionBase.id
.
- loopy.check.check_for_double_use_of_hw_axes(t_unit: TranslationUnit) None [source]¶
Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.
- loopy.check.check_insn_attributes(kernel: LoopKernel) None [source]¶
Check for legality of attributes of every instruction in kernel.
- loopy.check.check_loop_priority_inames_known(kernel: LoopKernel) None [source]¶
Checks if the inames in
loopy.LoopKernel.loop_priority
are part of the kernel’s domain.
- loopy.check.check_multiple_tags_allowed(kernel: LoopKernel) None [source]¶
Checks if multiple tags of an iname are compatible.
- loopy.check.check_for_inactive_iname_access(kernel: LoopKernel) None [source]¶
Check if any instruction accesses an iname but is not within it.
- loopy.check.check_for_unused_inames(kernel: LoopKernel) None [source]¶
Check if there are any unused inames in the kernel.
- loopy.check.check_for_write_races(kernel: LoopKernel) None [source]¶
Check if any memory accesses lead to write races.
- loopy.check.check_for_data_dependent_parallel_bounds(kernel: LoopKernel) None [source]¶
Check that inames tagged as hw axes have bounds that are known at kernel launch.
- loopy.check.check_bounds(t_unit: TranslationUnit) None [source]¶
Performs out-of-bound check for every array access.
- loopy.check.check_variable_access_ordered(kernel: LoopKernel) None [source]¶
Checks that between each write to a variable and all other accesses to the variable there is either:
a direct/indirect dependency edge, or
an explicit statement that no ordering is necessary (expressed through a bi-directional
loopy.InstructionBase.no_sync_with
)
Schedule¶
- class loopy.schedule.Barrier(comment: str, synchronization_kind: str, mem_kind: str, originating_insn_id: str)[source]¶
- comment¶
A plain-text comment explaining why the barrier was inserted.
- synchronization_kind¶
"local" or "global"
- mem_kind¶
"local" or "global"
- originating_insn_id¶
- loopy.schedule.tools.get_block_boundaries(schedule: Sequence[ScheduleItem]) Mapping[int, int] [source]¶
Return a dictionary mapping the indices of loopy.schedule.BeginBlockItem instances to the indices of their matching loopy.schedule.EndBlockItem instances and vice versa.
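The begin/end pairing can be sketched with a stack over a toy schedule. This is not loopy's implementation: the strings "begin", "end" and "op" are hypothetical stand-ins for schedule items, and the function name is made up for the example.

```python
def toy_block_boundaries(schedule):
    """Map each 'begin' index to its matching 'end' index and vice versa."""
    boundaries = {}
    open_blocks = []  # stack of indices of not-yet-closed 'begin' items
    for idx, item in enumerate(schedule):
        if item == "begin":
            open_blocks.append(idx)
        elif item == "end":
            if not open_blocks:
                raise ValueError(f"unmatched 'end' at index {idx}")
            start = open_blocks.pop()
            boundaries[start] = idx
            boundaries[idx] = start
    if open_blocks:
        raise ValueError(f"unmatched 'begin' at index {open_blocks[-1]}")
    return boundaries


schedule = ["begin", "op", "begin", "op", "end", "end"]
pairs = toy_block_boundaries(schedule)
```

Because the mapping is bidirectional, a caller standing at either end of a block can jump to the other end with a single lookup.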
- loopy.schedule.tools.temporaries_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) frozenset[str] [source]¶
- loopy.schedule.tools.args_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) frozenset[str] [source]¶
- loopy.schedule.tools.args_written_in_subkernel(kernel: LoopKernel, subkernel_name: str) frozenset[str] [source]¶
- loopy.schedule.tools.supporting_temporary_names(kernel: LoopKernel, tv_names: frozenset[str]) frozenset[str] [source]¶
- class loopy.schedule.tools.KernelArgInfo(passed_arg_names: Sequence[str], written_names: frozenset[str])[source]¶
- class loopy.schedule.tools.SubKernelArgInfo(passed_arg_names: Sequence[str], written_names: frozenset[str], passed_inames: Sequence[str], passed_temporaries: Sequence[str])[source]¶
Inherits from
KernelArgInfo
.
- loopy.schedule.tools.get_kernel_arg_info(kernel: LoopKernel) KernelArgInfo [source]¶
- loopy.schedule.tools.get_subkernel_arg_info(kernel: LoopKernel, subkernel_name: str) SubKernelArgInfo [source]¶
- loopy.schedule.tools.get_return_from_kernel_mapping(kernel: LoopKernel) Mapping[int, int | None] [source]¶
Returns a mapping from schedule index of every schedule item (S) in kernel to the schedule index of
loopy.schedule.ReturnFromKernel
of the active sub-kernel at ‘S’.
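A toy version of this mapping is sketched below, using the strings "call", "return" and "op" as hypothetical stand-ins for schedule items. How the call and return items themselves are mapped is an assumption of this sketch; here both map to the index of the return item.

```python
def toy_return_from_kernel_mapping(schedule):
    """Map each schedule index to the 'return' index of its active sub-kernel.

    Items outside any sub-kernel map to None.
    """
    mapping = {}
    current_return = None
    # Walk backwards so that the closing 'return' is seen before its body.
    for idx in range(len(schedule) - 1, -1, -1):
        item = schedule[idx]
        if item == "return":
            current_return = idx
        mapping[idx] = current_return
        if item == "call":
            # Leaving the sub-kernel (in reverse order).
            current_return = None
    return mapping


schedule = ["op", "call", "op", "return", "op"]
returns = toy_return_from_kernel_mapping(schedule)
```

Items at indices 0 and 4 lie outside the sub-kernel and therefore map to None.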
- class loopy.schedule.tools.AccessMapDescriptor(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Special access map values.
- Attr DOES_NOT_ACCESS:
Describes an unaccessed variable.
- Attr NON_AFFINE_ACCESS:
Describes a non-quasi-affine access into an array.
- class loopy.schedule.tools.WriteRaceChecker(kernel, callables_table)[source]¶
Used for checking for overlap between access ranges of instructions.
- loopy.schedule.tools.separate_loop_nest(tree: LoopNestTree, loop_nests: Collection[InameStrSet], inames_to_separate: InameStrSet) tuple[LoopNestTree, InameStrSet, InameStrSet | None] [source]¶
Returns a copy of tree that has inames_to_separate occur in nodes that are not shared with other inames. Returns a version of the loop nest tree tree so that every node in the tree is either a subset of outermost_inames or has an empty intersection with outermost_inames.
This routine modifies at most one node of the tree. All its ancestors must satisfy ancestor <= outermost_inames. For the first node not satisfying this relationship, if node & outermost_inames is empty, no modification is made. Otherwise, if node & outermost_inames < node, that node is split so as to separate outermost_inames in their own node.
- Parameters:
loop_nests – A collection of nodes in tree that cover inames_to_separate.
- Returns:
a tuple (new_tree, outer_loop_nest, inner_loop_nest), where outer_loop_nest and inner_loop_nest are the identifiers for the new outer and inner loop nests so that inames_to_separate is a valid nesting.
Note
We could compute loop_nests within this routine’s implementation, but computing would be expensive and hence we ask the caller for this info.
Example:
tree: frozenset()
      └── frozenset({'j', 'i'})
          └── frozenset({'k', 'l'})
inames_to_separate: frozenset({'k', 'i', 'j'})
loop_nests: {frozenset({'j', 'i'}), frozenset({'k', 'l'})}
Returns:
new_tree: frozenset()
          └── frozenset({'j', 'i'})
              └── frozenset({'k'})
                  └── frozenset({'l'})
outer_loop_nest: frozenset({'k'})
inner_loop_nest: frozenset({'l'})
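The node split in the example comes down to two set operations. The sketch below reproduces only that step on plain frozensets and leaves out the tree bookkeeping; the function name and the (node, None) return for the no-split case are assumptions of this sketch.

```python
def split_node(node, inames_to_separate):
    """Split a loop-nest node into (outer, inner) parts.

    outer holds the inames that must be separated, inner holds the rest.
    Returns (node, None) when no split is needed.
    """
    outer = node & inames_to_separate
    if not outer or outer == node:
        # Either nothing to separate, or the node is already fully covered.
        return node, None
    return outer, node - outer


inames_to_separate = frozenset({"k", "i", "j"})

# The node {'j', 'i'} is a subset of inames_to_separate: left untouched.
unchanged = split_node(frozenset({"j", "i"}), inames_to_separate)

# The node {'k', 'l'} overlaps only partially: it is split.
outer, inner = split_node(frozenset({"k", "l"}), inames_to_separate)
```

This matches the example above, where only the node containing both 'k' and 'l' is split.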
- loopy.schedule.tools.get_partial_loop_nest_tree(kernel: LoopKernel) LoopNestTree [source]¶
Returns a tree representing the kernel’s loop nests.
Each node of the returned tree has a frozenset of inames. All the inames in the identifier of a parent node of a loop nest in the tree must be nested outside all the inames in the identifier of the loop nest.
Note
This routine only takes into account the nesting dependency constraints of
loopy.InstructionBase.within_inames
of all the kernel’s instructions and the iname tags. This routine does NOT include the nesting constraints imposed by the dependencies between the instructions and the dependencies imposed by the kernel’s domain tree.
- loopy.schedule.tools.get_loop_tree(kernel: LoopKernel) LoopTree [source]¶
Returns a tree representing the loop nesting for kernel. A parent node in the tree is always nested outside all its children.
Note
Multiple loop nestings might exist for kernel, but this routine returns one valid loop nesting.
References¶
- class loopy.schedule.tools.InameStrSet¶
- class loopy.schedule.tree.Tree(_parent_to_children: Map[NodeT, tuple[NodeT, ...]], _child_to_parent: Map[NodeT, NodeT | None])[source]¶
An immutable tree containing nodes of type NodeT.
- ancestors(node: NodeT) tuple[NodeT, ...] [source]¶
Returns a
tuple
of nodes that are ancestors of node.
- add_node(node: NodeT, parent: NodeT) Tree[NodeT] [source]¶
Returns a
Tree
with added node node having a parent parent.
- replace_node(node: NodeT, new_node: NodeT) Tree[NodeT] [source]¶
Returns a copy of self with node replaced with new_node.
- move_node(node: NodeT, new_parent: NodeT | None) Tree[NodeT] [source]¶
Returns a copy of self with node node as a child of new_parent.
Note
Almost all the operations are implemented recursively. NOT suitable for deep trees. At the very least if the Python implementation is CPython this allocates a new stack frame for each iteration of the operation.
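To make the two-map representation concrete, here is a minimal stand-alone sketch of an immutable tree with ancestors and add_node. It is not loopy's Tree: it uses plain dicts instead of immutable maps, the class name is hypothetical, and the nearest-first ordering of ancestors is an assumption. The ancestor walk is written iteratively here, in contrast to the recursive implementation mentioned in the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTree:
    """Toy immutable tree; every update returns a new ToyTree."""

    parent_to_children: dict
    child_to_parent: dict

    @staticmethod
    def from_root(root):
        return ToyTree({root: ()}, {root: None})

    def ancestors(self, node):
        """Return the ancestors of node, nearest first (assumed order)."""
        result = []
        parent = self.child_to_parent[node]
        while parent is not None:
            result.append(parent)
            parent = self.child_to_parent[parent]
        return tuple(result)

    def add_node(self, node, parent):
        """Return a new tree with node added as a child of parent."""
        if node in self.child_to_parent:
            raise ValueError("node is already in the tree")
        if parent not in self.child_to_parent:
            raise ValueError("parent is not in the tree")
        # Copy both maps so that the original tree is left untouched.
        p2c = dict(self.parent_to_children)
        p2c[parent] = p2c[parent] + (node,)
        p2c[node] = ()
        c2p = dict(self.child_to_parent)
        c2p[node] = parent
        return ToyTree(p2c, c2p)


root = frozenset()
ij = frozenset({"i", "j"})
k = frozenset({"k"})

t0 = ToyTree.from_root(root)
t1 = t0.add_node(ij, parent=root)
t2 = t1.add_node(k, parent=ij)
k_ancestors = t2.ancestors(k)
```

Since add_node copies the maps, t0 and t1 remain valid snapshots after t2 is built.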