Reference: Documentation for Internal API¶
Targets¶
See also Targets.
- class loopy.target.c.POD(ast_builder, dtype, name)[source]¶
A simple declarator: the type is given as a
numpy.dtype and the name is given as a string.
- class loopy.target.c.ScopingBlock(contents: Sequence[Generable] | None = None)[source]¶
A block that is mandatory for scoping and may not be simplified away by
loopy.codegen.result.merge_codegen_results().
- class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state: CodeGenerationState, fortran_abi: bool = False, type_inf_mapper: TypeInferenceMapper | None = None)[source]¶
Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.
Note
All mapper methods take in an extra argument called type_context. The purpose of type_context is to inform the method about the expected type for untyped expressions such as python scalars. The type of the expressions takes precedence over type_context.
References¶
- class loopy.target.c.codegen.expression.Generable¶
See
cgen.Generable.
Symbolic¶
See also Expressions.
Loopy-specific expression types¶
- class loopy.symbolic.Literal(s: str)[source]¶
A literal to be used during code generation.
Note
Only used in the output of
loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.ArrayLiteral(children: tuple[Expression, ...])[source]¶
An array literal.
Note
Only used in the output of
loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.TypedCSE(child: _Expression, prefix: str | None = None, scope: str = 'pymbolic_eval', dtype: LoopyType | None = None)[source]¶
A
pymbolic.primitives.CommonSubexpression annotated with a type.
- class loopy.TypeCast(type: ToLoopyTypeConvertible, child: Expression)[source]¶
Only defined for numerical types with semantics matching
numpy.ndarray.astype().
- child: Expression¶
The expression to be cast.
- type¶
- class loopy.TaggedVariable(name: str, tags: Iterable[Tag] | Tag | None)[source]¶
This is an identifier with tags, such as
matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.
- tags: frozenset[Tag]¶
A
frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).
Inherits from
pymbolic.primitives.Variable and pytools.tag.Taggable.
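The name$tag notation can be illustrated with a small standalone sketch. The helper split_tagged_identifier below is hypothetical and not part of loopy; it only shows how an identifier such as matrix$one separates into a name and a tag.

```python
from __future__ import annotations


def split_tagged_identifier(identifier: str) -> tuple[str, str | None]:
    """Split 'name$tag' into (name, tag); tag is None if no '$' is present.

    Hypothetical helper for illustration only, not loopy's parser.
    """
    name, sep, tag = identifier.partition("$")
    return name, (tag if sep else None)


print(split_tagged_identifier("matrix$one"))
print(split_tagged_identifier("matrix"))
```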
- class loopy.Reduction(operation: ReductionOperation | str, inames: tuple[str | pymbolic.primitives.Variable, ...] | pymbolic.primitives.Variable | str, expr: Expression, allow_simultaneous: bool = False)[source]¶
Represents a reduction operation on
expr across inames.
- operation: ReductionOperation¶
- expr: Expression¶
An expression which may have tuple type. If the expression has tuple type, it must be one of the following:
a tuple of pymbolic.typing.Expression, or
a loopy.symbolic.Reduction, or
a function call or substitution rule invocation.
- class loopy.LinearSubscript(aggregate: Expression, index: Expression = <function ExpressionNode.index>)[source]¶
Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
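The difference between a multi-dimensional access and a linear one can be sketched in plain Python. The row-major layout and the helper linear_index are assumptions made for illustration; they are not taken from loopy.

```python
from __future__ import annotations


def linear_index(indices: tuple[int, ...], strides: tuple[int, ...]) -> int:
    """Flat offset (in elements) of a multi-dimensional index."""
    return sum(i * s for i, s in zip(indices, strides))


# Assumption: a 3x4 array stored row-major, so element strides are (4, 1).
strides = (4, 1)
flat = list(range(12))  # stand-in for the array's linear storage

# The multi-dimensional access a[2, 1] and the linear access flat[9]
# refer to the same element.
assert flat[linear_index((2, 1), strides)] == flat[9]
```

A linear subscript corresponds to indexing flat directly, without going through the per-axis strides.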
- class loopy.symbolic.SubArrayRef(swept_inames: tuple[Variable, ...], subscript: Subscript)[source]¶
An algebraic expression to map an affine memory layout pattern (known as a sub-array) as consecutive elements of the sweeping axes, which are defined using
SubArrayRef.swept_inames.
- swept_inames¶
An instance of
tuple denoting the axes to which the sub-array is supposed to be mapped.
- subscript¶
An instance of
pymbolic.primitives.Subscript denoting the array in the kernel.
- class loopy.symbolic.RuleArgument(index: int = <function ExpressionNode.index>)[source]¶
Represents a (numbered) argument of a
loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match subst rules independently of argument names.
- class loopy.symbolic.ResolvedFunction(function: Variable | ReductionOpFunction)[source]¶
A function identifier whose definition is known in a
loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.
- function: Variable | ReductionOpFunction¶
- name¶
Rule-aware Mappers¶
- class loopy.symbolic.ExpansionState(kernel: LoopKernel, instruction: InstructionBase, stack: tuple[ConcreteMatchable, ...], arg_context: Mapping[str, Expression])[source]¶
- kernel¶
- instruction¶
- stack¶
a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.
- arg_context¶
a dict representing current argument values
- class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context: SubstitutionRuleMappingContext)[source]¶
Note: the third argument dragged around by this mapper is the current
ExpansionState.
Subclasses of this must be careful to not touch identifiers that are in
ExpansionState.arg_context.
Expression Manipulation Helpers¶
- loopy.symbolic.simplify_using_aff(kernel: LoopKernel, expr: ArithmeticExpression) ArithmeticExpression[source]¶
Simplifies expr on kernel’s domain.
- Parameters:
expr – An instance of
pymbolic.typing.Expression.
References¶
- class loopy.symbolic.Variable[source]¶
See
pymbolic.Variable.
- class loopy.symbolic.Expression¶
- class loopy.symbolic.ArithmeticExpression¶
- class loopy.symbolic._Expression¶
Types¶
DTypes of variables in a loopy.LoopKernel must be picklable, so in
the codegen pipeline user-provided types are converted to
loopy.types.LoopyType.
- class loopy.LoopyType[source]¶
Abstract class for dtypes of variables encountered in a
loopy.LoopKernel.
- loopy.ToLoopyTypeConvertible¶
alias of ( type[auto] | DTypeLike | LoopyType | str | None)
- class loopy.types.AtomicType[source]¶
Abstract class for dtypes of variables encountered in a
loopy.LoopKernel on which atomic operations are performed.
- class loopy.types.AtomicNumpyType(dtype: DTypeLike)[source]¶
A dtype wrapper that indicates that the described type should be capable of atomic operations.
Type inference¶
Codegen¶
- class loopy.codegen.PreambleInfo(kernel: LoopKernel, seen_dtypes: set[LoopyType], seen_functions: set[SeenFunction], seen_atomic_dtypes: set[LoopyType], codegen_state: CodeGenerationState)[source]¶
- kernel: LoopKernel¶
- seen_functions: set[SeenFunction]¶
- class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: tuple[LoopyType, ...], result_dtypes: tuple[LoopyType, ...])[source]¶
This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with
InKernelCallable.
- name¶
- c_name¶
- arg_dtypes¶
a tuple of arg dtypes
- result_dtypes¶
a tuple of result dtypes
- class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: islpy.Set, implemented_predicates: frozenset[Expression], seen_dtypes: set[LoopyType], seen_functions: set[SeenFunction], seen_atomic_dtypes: set[LoopyType], var_subst_map: constantdict.constantdict[str, Expression], allow_complex: bool, callables_table: CallablesTable, is_entrypoint: bool, var_name_generator: pytools.UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cache_manager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]¶
- kernel: LoopKernel¶
- target: TargetBase¶
- implemented_domain: islpy.Set¶
The entire implemented domain (as an
islpy.Set) i.e. all constraints that have been enforced so far.
- implemented_predicates: frozenset[Expression]¶
- seen_functions: set[SeenFunction]¶
- seen_atomic_dtypes¶
- var_subst_map: constantdict.constantdict[str, Expression]¶
- vectorization_info: VectorizationInfo | None = None¶
- callables_table: CallablesTable¶
- codegen_cache_manager: CodegenOperationCacheManager¶
- class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[tuple[int, str]] = (), device_preambles: Sequence[tuple[int, str]] = ())[source]¶
- host_programs¶
A mapping from names of entrypoints to their host
GeneratedProgram.
- device_programs¶
A list of
GeneratedProgram instances intended to run on the compute device.
- host_preambles¶
- device_preambles¶
- class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, list[islpy.Set]], host_preambles: Sequence[tuple[str, str]] = (), device_preambles: Sequence[tuple[str, str]] = ())[source]¶
- host_program¶
- device_programs¶
A list of
GeneratedProgram instances intended to run on the compute device.
- host_preambles¶
- device_preambles¶
- loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult[ASTType] | Any], collapse: bool = True) CodeGenerationResult[ASTType][source]¶
- loopy.codegen.result.generate_host_or_device_program(codegen_state: CodeGenerationState, schedule_index: int)[source]¶
- class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: list[InstructionBase], linearization: list[ScheduleItem], inames: dict[str, loopy.kernel.data.Iname])[source]¶
Proxy to
loopy.LoopKernel to be used by CodegenOperationCacheManager.
- class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]¶
Caches operations arising during the codegen pipeline.
- kernel_proxy¶
An instance of
KernelProxyForCodegenOperationCacheManager.
- with_kernel(kernel)[source]¶
Returns a new instance of
CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.
- get_concurrent_inames_in_a_callkernel(callkernel_index: int) frozenset[str][source]¶
Returns a
frozenset of concurrent inames in a callkernel.
- Parameters:
callkernel_index – Index of the
loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.
References¶
- class loopy.codegen.tools.ExpressionNode¶
Reduction Operation¶
Iname Tags¶
- loopy.kernel.data.filter_iname_tags_by_type(tags: Iterable[Tag], tag_type: type[TagT] | tuple[type[TagT], ...], max_num: int | None = None, min_num: int | None = None) set[TagT][source]¶
Return the subset of tags that match type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.
- Parameters:
tags – An iterable of tags.
tag_type – a subclass of
loopy.kernel.data.InameImplementationTag.
max_num – the maximum number of tags expected to be found.
min_num – the minimum number of tags expected to be found.
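The filtering behaviour described above can be sketched without loopy. Everything in this block is a hypothetical stand-in: the tag classes, the function name filter_tags_by_type, and the choice of ValueError (the text above does not say which exception type is raised).

```python
from __future__ import annotations

from collections.abc import Iterable


class Tag:
    """Hypothetical stand-in for pytools.tag.Tag."""


class ImplementationTag(Tag):
    """Hypothetical stand-in for an iname implementation tag."""


class CustomTag(Tag):
    """Hypothetical user-defined tag."""


def filter_tags_by_type(
    tags: Iterable[Tag],
    tag_type: type | tuple[type, ...],
    max_num: int | None = None,
    min_num: int | None = None,
) -> set[Tag]:
    """Return the tags that are instances of tag_type, checking the count."""
    matching = {tag for tag in tags if isinstance(tag, tag_type)}

    # ValueError is an assumption; the real exception type may differ.
    if max_num is not None and len(matching) > max_num:
        raise ValueError(f"more than {max_num} matching tags found")
    if min_num is not None and len(matching) < min_num:
        raise ValueError(f"fewer than {min_num} matching tags found")
    return matching


tags = [ImplementationTag(), CustomTag()]
impl_tags = filter_tags_by_type(tags, ImplementationTag, max_num=1)
print(len(impl_tags))
```

Passing max_num=1 is a convenient way to assert that an iname carries at most one tag of a given kind.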
- class loopy.kernel.data.Iname(name: str, tags: frozenset[Tag])[source]¶
Records an iname in a
LoopKernel. See Loop Domain Forest for semantics of inames in loopy.
This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like
loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.
- tags¶
An instance of
frozenset of pytools.tag.Tag.
References¶
- class loopy.kernel.data.ToLoopyTypeConvertible¶
- class loopy.kernel.data.TagT¶
A type variable with a lower bound of
pytools.tag.Tag.
Array¶
- class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]¶
- target_axis¶
For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis, their contributions are combined additively.
Note that “normal” arrays only have one target_axis.
- layout_nesting_level¶
For determining the stride of
ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.
May be None on
FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
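The rule that the lowest nesting level varies fastest can be made concrete with a short sketch. The function strides_from_nesting is hypothetical and ignores padding (pad_to); it is meant to illustrate the idea, not to reproduce loopy's stride computation.

```python
from __future__ import annotations


def strides_from_nesting(
    shape: tuple[int, ...], nesting_levels: tuple[int, ...]
) -> tuple[int, ...]:
    """Element strides for axes ordered by layout nesting level.

    Level 0 gets stride 1; each higher level's stride is the product of
    the extents of all lower levels. Padding is ignored in this sketch.
    """
    if sorted(nesting_levels) != list(range(len(shape))):
        raise ValueError("nesting levels must be 0..n-1, each used once")

    strides = [0] * len(shape)
    stride = 1
    # Visit axes from the fastest-varying level to the slowest.
    for axis in sorted(range(len(shape)), key=lambda ax: nesting_levels[ax]):
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


# Row-major ("C") layout of a 3x4 array: the last axis is level 0.
print(strides_from_nesting((3, 4), (1, 0)))
# Column-major ("Fortran") layout: the first axis is level 0.
print(strides_from_nesting((3, 4), (0, 1)))
```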
- class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]¶
An arg dimension implementation tag for a fixed (potentially symbolic) stride.
- stride¶
May be one of the following:
An
Expression, including an integer, indicating the stride in units of the underlying array’s ArrayBase.dtype.
loopy.auto, indicating that a new kernel argument for this stride should automatically be created.
The stride is given in units of
ArrayBase.dtype.
- class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]¶
- pad_to¶
ArrayBase.dtype granularity to which to pad this dimension
This type of stride arg dim gets converted to
FixedStrideArrayDimTag on input to ArrayBase subclasses.
- loopy.kernel.array.ToDimTagsParseable: TypeAlias = str | collections.abc.Sequence[str | loopy.kernel.array.ArrayDimImplementationTag] | dict[str, str]
- class loopy.kernel.array.ToDimTagsParseable¶
See above.
- loopy.kernel.array.parse_array_dim_tags(dim_tags: str | Sequence[str | ArrayDimImplementationTag] | dict[str, str], n_axes: int | None = None, use_increasing_target_axes: bool = False, dim_names: Sequence[str] | None = None) Sequence[ArrayDimImplementationTag][source]¶
Cross-references¶
(This section shouldn’t exist: Sphinx should be able to resolve these on its own.)
- class loopy.kernel.array.ShapeType¶
- class loopy.kernel.array.Tag[source]¶
See
pytools.tag.Tag
Checks¶
- loopy.check.check_for_integer_subscript_indices(t_unit)[source]¶
Checks if every array access is of type
int.
- loopy.check.check_for_duplicate_insn_ids(knl: LoopKernel) None[source]¶
Check if multiple instructions of knl have the same
loopy.InstructionBase.id.
- loopy.check.check_for_double_use_of_hw_axes(t_unit: TranslationUnit) None[source]¶
Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.
- loopy.check.check_insn_attributes(kernel: LoopKernel) None[source]¶
Check for legality of attributes of every instruction in kernel.
- loopy.check.check_loop_priority_inames_known(kernel: LoopKernel) None[source]¶
Checks if the inames in
loopy.LoopKernel.loop_priorityare part of the kernel’s domain.
- loopy.check.check_multiple_tags_allowed(kernel: LoopKernel) None[source]¶
Checks if multiple tags of an iname are compatible.
- loopy.check.check_for_inactive_iname_access(kernel: LoopKernel) None[source]¶
Check if any instruction accesses an iname but is not within it.
- loopy.check.check_for_unused_inames(kernel: LoopKernel) None[source]¶
Check if there are any unused inames in the kernel.
- loopy.check.check_for_write_races(kernel: LoopKernel) None[source]¶
Check if any memory accesses lead to write races.
- loopy.check.check_for_data_dependent_parallel_bounds(kernel: LoopKernel) None[source]¶
Check that inames tagged as hw axes have bounds that are known at kernel launch.
- loopy.check.check_bounds(t_unit: TranslationUnit) None[source]¶
Performs out-of-bound check for every array access.
- loopy.check.check_variable_access_ordered(kernel: LoopKernel) None[source]¶
Checks that between each write to a variable and all other accesses to the variable there is either:
a direct/indirect dependency edge, or
an explicit statement that no ordering is necessary (expressed through a bi-directional
loopy.InstructionBase.no_sync_with)
Schedule¶
- class loopy.schedule.Barrier(comment: str, synchronization_kind: str, mem_kind: str, originating_insn_id: str)[source]¶
- comment¶
A plain-text comment explaining why the barrier was inserted.
- synchronization_kind¶
"local" or "global"
- mem_kind¶
"local" or "global"
- originating_insn_id¶
- loopy.schedule.tools.get_block_boundaries(schedule: Sequence[ScheduleItem]) Mapping[int, int][source]¶
Return a dictionary mapping indices of
loopy.schedule.BeginBlockItem instances to the indices of their matching loopy.schedule.EndBlockItem instances and vice versa.
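A bidirectional begin/end mapping of this kind is typically built with a stack. The sketch below assumes two hypothetical marker classes, BeginBlock and EndBlock, in place of loopy's schedule item types, and is not loopy's implementation.

```python
from __future__ import annotations

from collections.abc import Sequence


class BeginBlock:
    """Hypothetical stand-in for a block-opening schedule item."""


class EndBlock:
    """Hypothetical stand-in for a block-closing schedule item."""


def block_boundaries(schedule: Sequence[object]) -> dict[int, int]:
    """Map each block-begin index to its matching end index and vice versa."""
    boundaries: dict[int, int] = {}
    open_blocks: list[int] = []  # indices of blocks not yet closed
    for i, item in enumerate(schedule):
        if isinstance(item, BeginBlock):
            open_blocks.append(i)
        elif isinstance(item, EndBlock):
            start = open_blocks.pop()
            boundaries[start] = i
            boundaries[i] = start
    return boundaries


schedule = [BeginBlock(), "insn_a", BeginBlock(), "insn_b", EndBlock(), EndBlock()]
print(block_boundaries(schedule))
```

Because blocks nest properly, the most recently opened block is always the one closed by the next end marker, which is why a stack suffices.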
- loopy.schedule.tools.temporaries_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) frozenset[str][source]¶
- loopy.schedule.tools.args_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) frozenset[str][source]¶
- loopy.schedule.tools.args_written_in_subkernel(kernel: LoopKernel, subkernel_name: str) frozenset[str][source]¶
- loopy.schedule.tools.supporting_temporary_names(kernel: LoopKernel, tv_names: frozenset[str]) frozenset[str][source]¶
- class loopy.schedule.tools.KernelArgInfo(passed_arg_names: Sequence[str], written_names: frozenset[str])[source]¶
- class loopy.schedule.tools.SubKernelArgInfo(passed_arg_names: Sequence[str], written_names: frozenset[str], passed_inames: Sequence[str], passed_temporaries: Sequence[str])[source]¶
Inherits from
KernelArgInfo.
- loopy.schedule.tools.get_kernel_arg_info(kernel: LoopKernel) KernelArgInfo[source]¶
- loopy.schedule.tools.get_subkernel_arg_info(kernel: LoopKernel, subkernel_name: str) SubKernelArgInfo[source]¶
- loopy.schedule.tools.get_return_from_kernel_mapping(kernel: LoopKernel) Mapping[int, int | None][source]¶
Returns a mapping from schedule index of every schedule item (S) in kernel to the schedule index of
loopy.schedule.ReturnFromKernel of the active sub-kernel at ‘S’.
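One way to build such a mapping is to walk the schedule backwards and remember the most recently seen return marker. The sketch below uses hypothetical CallKernel and ReturnFromKernel stand-ins and assumes that both marker items count as belonging to their own sub-kernel and that sub-kernels do not nest; it is not loopy's implementation.

```python
from __future__ import annotations

from collections.abc import Sequence


class CallKernel:
    """Hypothetical stand-in for the item that opens a sub-kernel."""


class ReturnFromKernel:
    """Hypothetical stand-in for the item that closes a sub-kernel."""


def return_from_kernel_mapping(schedule: Sequence[object]) -> dict[int, int | None]:
    """Map each schedule index to the index of its sub-kernel's return item.

    Items outside any sub-kernel map to None.
    """
    mapping: dict[int, int | None] = {}
    active_return: int | None = None
    # Walking backwards, the last return marker seen closes the current sub-kernel.
    for i in range(len(schedule) - 1, -1, -1):
        item = schedule[i]
        if isinstance(item, ReturnFromKernel):
            active_return = i
        mapping[i] = active_return
        if isinstance(item, CallKernel):
            # Everything before the call lies outside this sub-kernel.
            active_return = None
    return mapping


schedule = ["host_a", CallKernel(), "insn_a", "insn_b", ReturnFromKernel(), "host_b"]
print(return_from_kernel_mapping(schedule))
```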
- class loopy.schedule.tools.AccessMapDescriptor(*values)[source]¶
Special access map values.
- Attr DOES_NOT_ACCESS:
Describes an unaccessed variable.
- Attr NON_AFFINE_ACCESS:
Describes a non-quasi-affine access into an array.
- class loopy.schedule.tools.WriteRaceChecker(kernel: LoopKernel, callables_table: CallablesTable)[source]¶
Used for checking for overlap between access ranges of instructions.
- loopy.schedule.tools.separate_loop_nest(tree: LoopNestTree, loop_nests: Collection[InameStrSet], inames_to_separate: InameStrSet) tuple[LoopNestTree, InameStrSet, InameStrSet | None][source]¶
Returns a copy of tree that has inames_to_separate occur in nodes that are not shared with other inames. Returns a version of the loop nest tree tree so that every node in the tree is either a subset of outermost_inames or has an empty intersection with outermost_inames.
This routine modifies at most one node of the tree. All its ancestors must satisfy ancestor <= outermost_inames. For the first node not satisfying this relationship, if node & outermost_inames is empty, no modification is made. Otherwise, if
node & outermost_inames < node, that node is split so as to separate outermost_inames in their own node.
- Parameters:
loop_nests – A collection of nodes in tree that cover inames_to_separate.
- Returns:
a
tuple (new_tree, outer_loop_nest, inner_loop_nest), where outer_loop_nest and inner_loop_nest are the identifiers for the new outer and inner loop nests so that inames_to_separate is a valid nesting.
Note
We could compute loop_nests within this routine’s implementation, but computing would be expensive and hence we ask the caller for this info.
Example:
    tree: frozenset()
          └── frozenset({'j', 'i'})
              └── frozenset({'k', 'l'})
    inames_to_separate: frozenset({'k', 'i', 'j'})
    loop_nests: {frozenset({'j', 'i'}), frozenset({'k', 'l'})}
Returns:
    new_tree: frozenset()
              └── frozenset({'j', 'i'})
                  └── frozenset({'k'})
                      └── frozenset({'l'})
    outer_loop_nest: frozenset({'k'})
    inner_loop_nest: frozenset({'l'})
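The split of a single node in the example above comes down to two set operations. The helper split_node is hypothetical and covers only that one step, not the tree update that the full routine performs.

```python
from __future__ import annotations


def split_node(
    node: frozenset[str], inames_to_separate: frozenset[str]
) -> tuple[frozenset[str], frozenset[str] | None]:
    """Split node if it only partially overlaps inames_to_separate.

    Returns (outer, inner). inner is None when no split is needed, i.e. when
    the node is disjoint from, or fully contained in, inames_to_separate.
    """
    common = node & inames_to_separate
    if not common or common == node:
        return node, None
    return common, node - inames_to_separate


outer, inner = split_node(frozenset({"k", "l"}), frozenset({"k", "i", "j"}))
print(outer, inner)
```

Applied to the node frozenset({'k', 'l'}) from the example, this yields the outer nest {'k'} and the inner nest {'l'}, while the node frozenset({'j', 'i'}) is left unsplit because it is a subset of the inames to separate.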
- loopy.schedule.tools.get_partial_loop_nest_tree(kernel: LoopKernel) LoopNestTree[source]¶
Returns a tree representing the kernel’s loop nests.
Each node of the returned tree has a
frozenset of inames. All the inames in the identifier of a parent node of a loop nest in the tree must be nested outside all the inames in the identifier of the loop nest.
Note
This routine only takes into account the nesting dependency constraints of
loopy.InstructionBase.within_inames of all the kernel’s instructions and the iname tags. This routine does NOT include the nesting constraints imposed by the dependencies between the instructions and the dependencies imposed by the kernel’s domain tree.
- loopy.schedule.tools.get_loop_tree(kernel: LoopKernel) LoopTree[source]¶
Returns a tree representing the loop nesting for kernel. A parent node in the tree is always nested outside all its children.
Note
Multiple loop nestings might exist for kernel, but this routine returns one valid loop nesting.
References¶
- class loopy.schedule.tools.InameStrSet¶
- class loopy.schedule.tree.Tree(_parent_to_children: constantdict[NodeT, tuple[NodeT, ...]], _child_to_parent: constantdict[NodeT, NodeT | None])[source]¶
An immutable tree containing nodes of type
NodeT.
- ancestors(node: NodeT) tuple[NodeT, ...][source]¶
Returns a
tuple of nodes that are ancestors of node.
- add_node(node: NodeT, parent: NodeT) Tree[NodeT][source]¶
Returns a
Tree with added node node having a parent parent.
- replace_node(node: NodeT, new_node: NodeT) Tree[NodeT][source]¶
Returns a copy of self with node replaced with new_node.
- move_node(node: NodeT, new_parent: NodeT | None) Tree[NodeT][source]¶
Returns a copy of self with node node as a child of new_parent.
Note
Almost all the operations are implemented recursively. NOT suitable for deep trees. At the very least if the Python implementation is CPython this allocates a new stack frame for each iteration of the operation.
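The two-mapping representation and the copy-on-update style can be sketched as follows. SimpleTree is a hypothetical, simplified stand-in: it uses plain dicts instead of constantdict, walks ancestors with a loop rather than recursion, and returns ancestors nearest-first, which is an assumption about ordering.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class SimpleTree:
    """Hypothetical immutable tree; updates return new instances."""

    parent_to_children: dict
    child_to_parent: dict

    @staticmethod
    def from_root(root: Hashable) -> SimpleTree:
        return SimpleTree({root: ()}, {root: None})

    def ancestors(self, node: Hashable) -> tuple:
        """Ancestors of node, nearest first (ordering is an assumption)."""
        result = []
        parent = self.child_to_parent[node]
        while parent is not None:
            result.append(parent)
            parent = self.child_to_parent[parent]
        return tuple(result)

    def add_node(self, node: Hashable, parent: Hashable) -> SimpleTree:
        """Return a new tree with node added as a child of parent."""
        if node in self.child_to_parent:
            raise ValueError(f"{node!r} is already in the tree")
        # Copy both mappings so the original tree is left untouched.
        p2c = dict(self.parent_to_children)
        p2c[parent] = p2c[parent] + (node,)
        p2c[node] = ()
        c2p = dict(self.child_to_parent)
        c2p[node] = parent
        return SimpleTree(p2c, c2p)


root_only = SimpleTree.from_root("root")
tree = root_only.add_node("i", "root").add_node("j", "i")
print(tree.ancestors("j"))
```

Because add_node copies the mappings, root_only still contains just the root after tree is built.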
References¶
Mostly things that Sphinx (our documentation tool) should resolve but won’t.
- class loopy.schedule.tree.EllipsisType¶
See
types.EllipsisType.
- class loopy.schedule.tree.ASTType¶
A type variable, representing an AST node. For now, either
cgen.Generable or genpy.Generable.
- class loopy.schedule.tree.constantdict¶
- class loopy.schedule.tree.DTypeLike¶
- class p.Call¶
- class p.CallWithKwargs¶
- class isl.Space¶
See
islpy.Space.
- class isl.PwAff¶
See
islpy.PwAff.
- class isl.BasicSet¶
See
islpy.BasicSet.