Reference: Documentation for Internal API¶

Targets¶

References¶

class loopy.target.c.codegen.expression.Generable¶: See cgen.Generable.

Symbolic¶

Loopy-specific expression types¶

class loopy.symbolic.Literal(s: str)[source]¶: A literal to be used during code generation.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.ArrayLiteral(children: tuple[Expression, ...])[source]¶: An array literal.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.FunctionIdentifier[source]¶: A base class for symbols representing functions.

class loopy.symbolic.TypedCSE(child: _Expression, prefix: str | None = None, scope: str = 'pymbolic_eval', dtype: LoopyType | None = None)[source]¶

A pymbolic.primitives.CommonSubexpression annotated with a type.

dtype: LoopyType | None = None¶

class loopy.TypeCast(type: ToLoopyTypeConvertible, child: Expression)[source]¶

Only defined for numerical types with semantics matching numpy.ndarray.astype().

child: Expression¶: The expression to be cast.

type¶

class loopy.TaggedVariable(name: str, tags: Iterable[Tag] | Tag | None)[source]¶

This is an identifier with tags, such as matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.

tags: frozenset[Tag]¶: A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).

Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.
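As a rough illustration of the name$tag convention described above, the following pure-Python sketch splits such identifiers and selects only the uses carrying a given tag. It does not use loopy; split_tagged_identifier is a hypothetical helper, not loopy's parser.

```python
def split_tagged_identifier(identifier):
    """Hypothetical helper: split 'name$tag' into (name, tag or None)."""
    name, sep, tag = identifier.partition("$")
    return name, (tag if sep else None)


tagged = split_tagged_identifier("matrix$one")
untagged = split_tagged_identifier("matrix")

# Address only the uses of "matrix" that were tagged "one".
accesses = ["matrix$one", "matrix$two", "matrix"]
selected = [a for a in accesses
            if split_tagged_identifier(a) == ("matrix", "one")]
```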

class loopy.Reduction(operation: ReductionOperation | str, inames: tuple[str | pymbolic.primitives.Variable, ...] | pymbolic.primitives.Variable | str, expr: Expression, allow_simultaneous: bool = False)[source]¶

Represents a reduction operation on expr across inames.

operation: ReductionOperation¶

inames: Sequence[str]¶: The inames across which reduction on expr is being carried out.

expr: Expression¶

An expression which may have tuple type. If the expression has tuple type, it must be one of the following:

a tuple of pymbolic.typing.Expression, or
a loopy.symbolic.Reduction, or
a function call or substitution rule invocation.

allow_simultaneous: bool¶: If not True, an iname is allowed to be used in precisely one reduction, to avoid misnesting errors.

class loopy.LinearSubscript(aggregate: Expression, index: Expression = <function ExpressionNode.index>)[source]¶: Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
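To make the idea of a linear index concrete, here is a minimal pure-Python sketch that does not use loopy. It assumes a 2x3 array stored row-major in a flat list; row_major_offset is a hypothetical helper introduced only for this illustration.

```python
# Illustration only: plain Python, no loopy. A 2x3 array stored row-major
# in one flat buffer.
shape = (2, 3)
flat = [10, 11, 12, 20, 21, 22]


def row_major_offset(index, shape):
    """Hypothetical helper: flat offset of a multi-dimensional index."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


# A multi-dimensional subscript goes through the layout ...
via_layout = flat[row_major_offset((1, 2), shape)]

# ... while a linear subscript addresses the buffer directly.
via_linear = flat[5]
```

Both expressions read the same element here, but only the first depends on how the axes are laid out.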

class loopy.symbolic.SubArrayRef(swept_inames: tuple[Variable, ...], subscript: Subscript)[source]¶

An algebraic expression that maps an affine memory layout pattern (known as a sub-array) to consecutive elements of the sweeping axes, which are defined using SubArrayRef.swept_inames.

swept_inames¶: An instance of tuple denoting the axes to which the sub-array is supposed to be mapped.

subscript¶: An instance of pymbolic.primitives.Subscript denoting the array in the kernel.

is_equal(other) → bool[source]¶

class loopy.symbolic.RuleArgument(index: int = <function ExpressionNode.index>)[source]¶: Represents a (numbered) argument of a loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match subst rules independently of argument names.

class loopy.symbolic.ResolvedFunction(function: Variable | ReductionOpFunction)[source]¶

A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.

function: Variable | ReductionOpFunction¶

name¶

Rule-aware Mappers¶

class loopy.symbolic.SubstitutionRuleMappingContext(old_subst_rules, make_unique_var_name)[source]¶

class loopy.symbolic.ExpansionState(kernel: LoopKernel, instruction: InstructionBase, stack: tuple[tuple[str, Tag], ...], arg_context: Mapping[str, Expression])[source]¶

kernel¶

instruction¶

stack¶: a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.

arg_context¶: a dict representing current argument values

class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context: SubstitutionRuleMappingContext)[source]¶

Note: the third argument dragged around by this mapper is the current ExpansionState.

Subclasses of this must be careful to not touch identifiers that are in ExpansionState.arg_context.

Expression Manipulation Helpers¶

loopy.symbolic.simplify_using_aff(kernel, expr)[source]¶

Simplifies expr on kernel’s domain.

Parameters:: expr – An instance of pymbolic.typing.Expression.

References¶

class loopy.symbolic.Variable[source]¶: See pymbolic.Variable.

class loopy.symbolic.Expression¶: See pymbolic.typing.Expression.

class loopy.symbolic._Expression¶: See pymbolic.primitives.ExpressionNode.

Types¶

DTypes of variables in a loopy.LoopKernel must be picklable, so in the codegen pipeline user-provided types are converted to loopy.types.LoopyType.

class loopy.LoopyType[source]¶: Abstract class for dtypes of variables encountered in a loopy.LoopKernel.

loopy.ToLoopyTypeConvertible¶: alias of Type[auto] | Type[generic] | dtype | LoopyType | str | None

class loopy.NumpyType(dtype: dtype)[source]¶

class loopy.types.AtomicType[source]¶: Abstract class for dtypes of variables encountered in a loopy.LoopKernel on which atomic operations are performed.

class loopy.types.AtomicNumpyType(dtype: dtype)[source]¶: A dtype wrapper that indicates that the described type should be capable of atomic operations.

Type inference¶

class loopy.type_inference.TypeInferenceMapper(kernel, clbl_inf_ctx, new_assignments=None)[source]¶

Codegen¶

class loopy.codegen.PreambleInfo(kernel: LoopKernel, seen_dtypes: set[LoopyType], seen_functions: set[SeenFunction], seen_atomic_dtypes: set[LoopyType], codegen_state: CodeGenerationState)[source]¶

kernel: LoopKernel¶

seen_dtypes: set[LoopyType]¶

seen_functions: set[SeenFunction]¶

seen_atomic_dtypes: set[LoopyType]¶

class loopy.codegen.VectorizationInfo(iname: str, length: int)[source]¶

iname¶

length¶

space¶

class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: tuple[LoopyType, ...], result_dtypes: tuple[LoopyType, ...])[source]¶

This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.

name¶

c_name¶

arg_dtypes¶: a tuple of arg dtypes

result_dtypes¶: a tuple of result dtypes

class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: islpy.Set, implemented_predicates: frozenset[str | Expression], seen_dtypes: set[LoopyType], seen_functions: set[SeenFunction], seen_atomic_dtypes: set[LoopyType], var_subst_map: constantdict.constantdict[str, Expression], allow_complex: bool, callables_table: CallablesTable, is_entrypoint: bool, var_name_generator: pytools.UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cache_manager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]¶

kernel: LoopKernel¶

target: TargetBase¶

implemented_domain: islpy.Set¶: The entire implemented domain (as an islpy.Set) i.e. all constraints that have been enforced so far.

implemented_predicates: frozenset[str | Expression]¶

seen_dtypes: set[LoopyType]¶

seen_functions: set[SeenFunction]¶

seen_atomic_dtypes¶

var_subst_map: constantdict.constantdict[str, Expression]¶

allow_complex: bool¶

vectorization_info: VectorizationInfo | None = None¶

is_generating_device_code: bool¶

gen_program_name: str¶

schedule_index_end: int¶

callables_table: CallablesTable¶

is_entrypoint: bool¶

codegen_cache_manager: CodegenOperationCacheManager¶

class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[tuple[int, str]] = (), device_preambles: Sequence[tuple[int, str]] = ())[source]¶

host_programs¶: A mapping from names of entrypoints to their host GeneratedProgram.

device_programs¶: A list of GeneratedProgram instances intended to run on the compute device.

host_preambles¶

device_preambles¶

host_code()[source]¶

device_code()[source]¶

all_code()[source]¶

class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, islpy.Set], host_preambles: Sequence[tuple[str, str]] = (), device_preambles: Sequence[tuple[str, str]] = ())[source]¶

host_program¶

device_programs¶: A list of GeneratedProgram instances intended to run on the compute device.

implemented_domains¶: A mapping from instruction ID to a list of islpy.Set objects.

host_preambles¶

device_preambles¶

host_code()[source]¶

device_code()[source]¶

all_code()[source]¶

loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) → CodeGenerationResult[source]¶

loopy.codegen.result.generate_host_or_device_program(codegen_state, schedule_index)[source]¶

class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: list[InstructionBase], linearization: list[ScheduleItem], inames: dict[str, loopy.kernel.data.Iname])[source]¶: Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.

class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]¶

Caches operations arising during the codegen pipeline.

kernel_proxy¶: An instance of KernelProxyForCodegenOperationCacheManager.

with_kernel(kernel)[source]¶: Returns a new instance of CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.

get_concurrent_inames_in_a_callkernel(callkernel_index: int) → frozenset[str][source]¶

Returns a frozenset of concurrent inames in a callkernel.

Parameters:: callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.

References¶

class loopy.codegen.tools.ExpressionNode¶: See pymbolic.primitives.ExpressionNode.

Reduction Operation¶

class loopy.library.reduction.ReductionOperation[source]¶: Subclasses of this type have to be hashable, picklable, and equality-comparable.

class loopy.library.reduction.ScalarReductionOperation[source]¶

class loopy.library.reduction.SumReductionOperation[source]¶

class loopy.library.reduction.ProductReductionOperation[source]¶

class loopy.library.reduction.MaxReductionOperation[source]¶

class loopy.library.reduction.MinReductionOperation[source]¶

class loopy.library.reduction.ReductionOpFunction(reduction_op: 'ReductionOperation')[source]¶

Iname Tags¶

loopy.kernel.data.filter_iname_tags_by_type(tags: Iterable[Tag], tag_type: type[TagT] | tuple[type[TagT], ...], max_num: int | None = None, min_num: int | None = None) → set[TagT][source]¶

Return the subset of tags that match type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.

Parameters:

tags – An iterable of tags.
tag_type – a subclass of loopy.kernel.data.InameImplementationTag.
max_num – the maximum number of tags expected to be found.
min_num – the minimum number of tags expected to be found.
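The filtering and counting behaviour described above can be sketched as follows. This is a stand-alone approximation under stated assumptions: Tag, ConcurrentTag and UnrollTag are local placeholder classes rather than the loopy or pytools ones, filter_tags_by_type is a hypothetical name, and the use of ValueError is an assumption, since the text only says that an exception is raised.

```python
# Local stand-ins for illustration; not the loopy/pytools classes.
class Tag:
    pass


class ConcurrentTag(Tag):
    pass


class UnrollTag(Tag):
    pass


def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Hypothetical sketch: keep tags of `tag_type`, then check the count."""
    result = {tag for tag in tags if isinstance(tag, tag_type)}
    if max_num is not None and len(result) > max_num:
        raise ValueError(f"more than {max_num} tags of type {tag_type}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(f"fewer than {min_num} tags of type {tag_type}")
    return result


unroll = UnrollTag()
tags = [ConcurrentTag(), unroll]
found = filter_tags_by_type(tags, UnrollTag, max_num=1, min_num=1)
```

Because isinstance accepts a tuple of types, the sketch also covers the case where tag_type is a tuple.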

class loopy.kernel.data.InameImplementationTag(*args, **kwargs)[source]¶

class loopy.kernel.data.ConcurrentTag(*args, **kwargs)[source]¶

class loopy.kernel.data.UniqueInameTag(*args, **kwargs)[source]¶

class loopy.kernel.data.AxisTag(axis)[source]¶

class loopy.kernel.data.LocalInameTag(axis)[source]¶

class loopy.kernel.data.GroupInameTag(axis)[source]¶

class loopy.kernel.data.VectorizeTag(*args, **kwargs)[source]¶

class loopy.kernel.data.UnrollTag(*args, **kwargs)[source]¶

class loopy.kernel.data.Iname(name: str, tags: frozenset[Tag])[source]¶

Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.

This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.

name¶: An instance of str, denoting the iname’s name.

tags¶: An instance of frozenset of pytools.tag.Tag.

References¶

class loopy.kernel.data.ToLoopyTypeConvertible¶: See loopy.ToLoopyTypeConvertible.

class loopy.kernel.data.TagT¶: A type variable with a lower bound of pytools.tag.Tag.

Array¶

class loopy.kernel.array.ArrayDimImplementationTag(*args, **kwargs)[source]¶

class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]¶

target_axis¶

For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.

Note that “normal” arrays only have one target_axis.

layout_nesting_level¶

For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.

May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
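The rule that the lowest nesting level varies fastest in linear memory can be sketched as a stride computation. This is a simplified illustration, not loopy's implementation: strides_from_nesting_levels is a hypothetical helper, and it assumes a single target_axis and no padding.

```python
def strides_from_nesting_levels(shape, nesting_levels):
    """Hypothetical helper: per-axis strides in units of the element type."""
    # Levels must be a contiguous run of unique integers starting at 0.
    assert sorted(nesting_levels) == list(range(len(shape)))

    strides = [0] * len(shape)
    stride = 1
    # Visit axes from the lowest nesting level (fastest varying) upwards.
    for axis in sorted(range(len(shape)), key=lambda a: nesting_levels[a]):
        strides[axis] = stride
        stride *= shape[axis]
    return strides


# Row-major ("C") layout: the last axis has nesting level 0.
c_strides = strides_from_nesting_levels((4, 5, 6), (2, 1, 0))
# Column-major ("Fortran") layout: the first axis has nesting level 0.
f_strides = strides_from_nesting_levels((4, 5, 6), (0, 1, 2))
```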

class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]¶

An arg dimension implementation tag for a fixed (potentially symbolic) stride.

stride¶

May be one of the following:

An Expression, including an integer, indicating the stride in units of the underlying array’s ArrayBase.dtype.
loopy.auto, indicating that a new kernel argument for this stride should automatically be created.

The stride is given in units of ArrayBase.dtype.

class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]¶

pad_to¶: ArrayBase.dtype granularity to which to pad this dimension

This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.

class loopy.kernel.array.SeparateArrayArrayDimTag(*args, **kwargs)[source]¶

class loopy.kernel.array.VectorArrayDimTag(*args, **kwargs)[source]¶

loopy.kernel.array.parse_array_dim_tags(dim_tags, n_axes=None, use_increasing_target_axes=False, dim_names=None)[source]¶

Cross-references¶

(This section shouldn’t exist: Sphinx should be able to resolve these on its own.)

class loopy.kernel.array.ShapeType¶: See loopy.typing.ShapeType

class loopy.kernel.array.Tag[source]¶: See pytools.tag.Tag

Checks¶

loopy.check.check_for_integer_subscript_indices(t_unit)[source]¶: Checks if every array access is of type int.

loopy.check.check_for_duplicate_insn_ids(knl: LoopKernel) → None[source]¶: Check if multiple instructions of knl have the same loopy.InstructionBase.id.

loopy.check.check_for_double_use_of_hw_axes(t_unit: TranslationUnit) → None[source]¶: Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.

loopy.check.check_insn_attributes(kernel: LoopKernel) → None[source]¶: Check for legality of attributes of every instruction in kernel.

loopy.check.check_loop_priority_inames_known(kernel: LoopKernel) → None[source]¶: Checks if the inames in loopy.LoopKernel.loop_priority are part of the kernel’s domain.

loopy.check.check_multiple_tags_allowed(kernel: LoopKernel) → None[source]¶: Checks if multiple tags of an iname are compatible.

loopy.check.check_for_inactive_iname_access(kernel: LoopKernel) → None[source]¶: Check if any instruction accesses an iname but is not within it.

loopy.check.check_for_unused_inames(kernel: LoopKernel) → None[source]¶: Check if there are any unused inames in the kernel.

loopy.check.check_for_write_races(kernel: LoopKernel) → None[source]¶: Check if any memory accesses lead to write races.

loopy.check.check_for_data_dependent_parallel_bounds(kernel: LoopKernel) → None[source]¶: Check that inames tagged as hw axes have bounds that are known at kernel launch.

loopy.check.check_bounds(t_unit: TranslationUnit) → None[source]¶: Performs out-of-bound check for every array access.

loopy.check.check_variable_access_ordered(kernel: LoopKernel) → None[source]¶

Checks that between each write to a variable and all other accesses to the variable there is either:

a direct/indirect dependency edge, or
an explicit statement that no ordering is necessary (expressed through a bi-directional loopy.InstructionBase.no_sync_with)
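The two conditions above can be sketched on a toy dependency graph. This is a simplified model that does not use loopy: instructions are plain strings, deps maps an instruction id to the ids it depends on, and depends_on and unordered_pairs are hypothetical helpers.

```python
def depends_on(deps, a, b):
    """True if instruction `a` depends on `b`, directly or indirectly."""
    seen, todo = set(), [a]
    while todo:
        insn = todo.pop()
        for dep in deps.get(insn, ()):
            if dep == b:
                return True
            if dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return False


def unordered_pairs(deps, no_sync_with, writers, accessors):
    """Return (writer, other) pairs with neither a dependency path nor a
    bi-directional no-sync declaration."""
    bad = []
    for w in sorted(writers):
        for o in sorted(accessors):
            if o == w:
                continue
            ordered = depends_on(deps, w, o) or depends_on(deps, o, w)
            waived = (o in no_sync_with.get(w, ())
                      and w in no_sync_with.get(o, ()))
            if not (ordered or waived):
                bad.append((w, o))
    return bad


# Hypothetical ids: "read" depends on "write"; "other" is unordered.
deps = {"read": {"write"}}
accessors = {"write", "read", "other"}
problems = unordered_pairs(deps, {}, {"write"}, accessors)

# Declaring no_sync_with in both directions waives the requirement.
waiver = {"write": {"other"}, "other": {"write"}}
problems_with_waiver = unordered_pairs(deps, waiver, {"write"}, accessors)
```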

Schedule¶

class loopy.schedule.ScheduleItem[source]¶

class loopy.schedule.BeginBlockItem[source]¶

class loopy.schedule.EndBlockItem[source]¶

class loopy.schedule.CallKernel(kernel_name: 'str')[source]¶

class loopy.schedule.ReturnFromKernel(kernel_name: 'str')[source]¶

class loopy.schedule.Barrier(comment: str, synchronization_kind: str, mem_kind: str, originating_insn_id: str)[source]¶

comment¶: A plain-text comment explaining why the barrier was inserted.

synchronization_kind¶: "local" or "global"

mem_kind¶: "local" or "global"

originating_insn_id¶

class loopy.schedule.RunInstruction(insn_id: 'str')[source]¶

class loopy.schedule.MinRecursionLimitForScheduling(kernel)[source]¶

loopy.schedule.tools.get_block_boundaries(schedule: Sequence[ScheduleItem]) → Mapping[int, int][source]¶: Return a dictionary mapping indices of loopy.schedule.BeginBlockItems to loopy.schedule.EndBlockItems and vice versa.
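A begin/end matching of this kind is commonly done with a stack. The sketch below is a hypothetical re-implementation on stand-in classes (BeginBlock and EndBlock are placeholders for subclasses of BeginBlockItem and EndBlockItem) and assumes that blocks are properly nested.

```python
class BeginBlock:  # stand-in for a BeginBlockItem subclass
    pass


class EndBlock:  # stand-in for an EndBlockItem subclass
    pass


def block_boundaries(schedule):
    """Hypothetical sketch: map begin index <-> matching end index."""
    boundaries = {}
    open_blocks = []  # stack of indices of not-yet-closed blocks
    for idx, item in enumerate(schedule):
        if isinstance(item, BeginBlock):
            open_blocks.append(idx)
        elif isinstance(item, EndBlock):
            start = open_blocks.pop()
            boundaries[start] = idx
            boundaries[idx] = start
    return boundaries


schedule = [BeginBlock(), BeginBlock(), "insn", EndBlock(), EndBlock()]
boundaries = block_boundaries(schedule)
```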

loopy.schedule.tools.temporaries_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) → frozenset[str][source]¶

loopy.schedule.tools.args_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) → frozenset[str][source]¶

loopy.schedule.tools.args_written_in_subkernel(kernel: LoopKernel, subkernel_name: str) → frozenset[str][source]¶

loopy.schedule.tools.supporting_temporary_names(kernel: LoopKernel, tv_names: frozenset[str]) → frozenset[str][source]¶

class loopy.schedule.tools.KernelArgInfo(passed_arg_names: Sequence[str], written_names: frozenset[str])[source]¶

passed_arg_names: Sequence[str]¶

written_names: frozenset[str]¶

class loopy.schedule.tools.SubKernelArgInfo(passed_arg_names: Sequence[str], written_names: frozenset[str], passed_inames: Sequence[str], passed_temporaries: Sequence[str])[source]¶

Inherits from KernelArgInfo.

passed_inames: Sequence[str]¶

passed_temporaries: Sequence[str]¶

loopy.schedule.tools.get_kernel_arg_info(kernel: LoopKernel) → KernelArgInfo[source]¶

loopy.schedule.tools.get_subkernel_arg_info(kernel: LoopKernel, subkernel_name: str) → SubKernelArgInfo[source]¶

loopy.schedule.tools.get_return_from_kernel_mapping(kernel: LoopKernel) → Mapping[int, int | None][source]¶: Returns a mapping from schedule index of every schedule item (S) in kernel to the schedule index of loopy.schedule.ReturnFromKernel of the active sub-kernel at ‘S’.
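One way to picture this mapping is a backwards walk over the schedule that remembers the most recent ReturnFromKernel. The sketch below uses stand-in classes instead of the loopy ones. It assumes that the CallKernel and ReturnFromKernel items themselves also map to the return index, which the description above does not state.

```python
class CallKernel:  # stand-in, not loopy.schedule.CallKernel
    def __init__(self, kernel_name):
        self.kernel_name = kernel_name


class ReturnFromKernel:  # stand-in, not loopy.schedule.ReturnFromKernel
    def __init__(self, kernel_name):
        self.kernel_name = kernel_name


def return_from_kernel_mapping(schedule):
    """Hypothetical sketch: schedule index -> index of the enclosing
    ReturnFromKernel, or None outside any sub-kernel."""
    mapping = {}
    current_return = None
    # Walk backwards so each item sees the ReturnFromKernel that follows it.
    for idx in reversed(range(len(schedule))):
        item = schedule[idx]
        if isinstance(item, ReturnFromKernel):
            current_return = idx
            mapping[idx] = idx
        elif isinstance(item, CallKernel):
            mapping[idx] = current_return
            current_return = None
        else:
            mapping[idx] = current_return
    return mapping


schedule = ["host_a", CallKernel("k"), "insn_a", "insn_b",
            ReturnFromKernel("k"), "host_b"]
mapping = return_from_kernel_mapping(schedule)
```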

class loopy.schedule.tools.AccessMapDescriptor(*values)[source]¶

Special access map values.

DOES_NOT_ACCESS: Describes an unaccessed variable.
NON_AFFINE_ACCESS: Describes a non-quasi-affine access into an array.

class loopy.schedule.tools.WriteRaceChecker(kernel, callables_table)[source]¶: Used for checking for overlap between access ranges of instructions.

loopy.schedule.tools.LoopNestTree¶: alias of Tree[frozenset[str]]

loopy.schedule.tools.LoopTree¶: alias of Tree[str]

loopy.schedule.tools.separate_loop_nest(tree: LoopNestTree, loop_nests: Collection[InameStrSet], inames_to_separate: InameStrSet) → tuple[LoopNestTree, InameStrSet, InameStrSet | None][source]¶

Returns a copy of tree that has inames_to_separate occur in nodes that are not shared with other inames. Returns a version of the loop nest tree tree so that every node in the tree is either a subset of outermost_inames or has an empty intersection with outermost_inames.

This routine modifies at most one node of the tree. All its ancestors must satisfy ancestor <= outermost_inames. For the first node not satisfying this relationship, if node & outermost_inames is empty, no modification is made. Otherwise, if node & outermost_inames < node, that node is split so as to separate outermost_inames in their own node.

Parameters:: loop_nests – A collection of nodes in tree that cover inames_to_separate.
Returns:: a tuple (new_tree, outer_loop_nest, inner_loop_nest), where outer_loop_nest and inner_loop_nest are the identifiers for the new outer and inner loop nests so that inames_to_separate is a valid nesting.

Note

We could compute loop_nests within this routine’s implementation, but computing it would be expensive, so we ask the caller to provide it.

Example::

tree:
    frozenset()
    └── frozenset({'j', 'i'})
        └── frozenset({'k', 'l'})

inames_to_separate: frozenset({'k', 'i', 'j'})
loop_nests: {frozenset({'j', 'i'}), frozenset({'k', 'l'})}

Returns:

new_tree:
    frozenset()
    └── frozenset({'j', 'i'})
        └── frozenset({'k'})
            └── frozenset({'l'})

outer_loop_nest: frozenset({'k'})
inner_loop_nest: frozenset({'l'})
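The node split in the example above comes down to two set operations. The following sketch shows only that step on plain frozensets; split_node is a hypothetical helper and does not rebuild the tree.

```python
def split_node(node, inames_to_separate):
    """Hypothetical helper: split one loop-nest node into the part inside
    `inames_to_separate` (outer) and the remainder (inner)."""
    outer = node & inames_to_separate
    inner = node - inames_to_separate
    if not outer or not inner:
        # Nothing to split: the node is disjoint from, or contained in, the set.
        return node, None
    return outer, inner


node = frozenset({"k", "l"})
inames_to_separate = frozenset({"k", "i", "j"})
outer_loop_nest, inner_loop_nest = split_node(node, inames_to_separate)

# A node that lies entirely inside inames_to_separate is left alone.
unchanged = split_node(frozenset({"i", "j"}), inames_to_separate)
```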

loopy.schedule.tools.get_partial_loop_nest_tree(kernel: LoopKernel) → LoopNestTree[source]¶

Returns a tree representing the kernel’s loop nests.

Each node of the returned tree has a frozenset of inames. All the inames in the identifier of a parent node of a loop nest in the tree must be nested outside all the inames in the identifier of the loop nest.

Note

This routine only takes into account the nesting dependency constraints of loopy.InstructionBase.within_inames of all the kernel’s instructions and the iname tags. This routine does NOT include the nesting constraints imposed by the dependencies between the instructions and the dependencies imposed by the kernel’s domain tree.

loopy.schedule.tools.get_loop_tree(kernel: LoopKernel) → LoopTree[source]¶: Returns a tree representing the loop nesting for kernel. A parent node in the tree is always nested outside all its children.

Note

Multiple loop nestings might exist for kernel, but this routine returns one valid loop nesting.

References¶

class loopy.schedule.tools.InameStrSet¶: See loopy.typing.InameStrSet

class loopy.schedule.tree.NodeT¶: alias of TypeVar(‘NodeT’, bound=Hashable)

class loopy.schedule.tree.Tree(_parent_to_children: constantdict[NodeT, tuple[NodeT, ...]], _child_to_parent: constantdict[NodeT, NodeT | None])[source]¶

An immutable tree containing nodes of type NodeT.

ancestors(node: NodeT) → tuple[NodeT, ...][source]¶: Returns a tuple of nodes that are ancestors of node.

parent(node: NodeT) → NodeT | None[source]¶: Returns the parent of node.

children(node: NodeT) → tuple[NodeT, ...][source]¶: Returns the children of node.

add_node(node: NodeT, parent: NodeT) → Tree[NodeT][source]¶: Returns a Tree with added node node having a parent parent.

depth(node: NodeT) → int[source]¶: Returns the depth of node, with the root having depth 0.

replace_node(node: NodeT, new_node: NodeT) → Tree[NodeT][source]¶: Returns a copy of self with node replaced with new_node.

move_node(node: NodeT, new_parent: NodeT | None) → Tree[NodeT][source]¶: Returns a copy of self with node node as a child of new_parent.

__contains__(node: NodeT) → bool[source]¶: Return True if node is a node in the tree.

Note

Almost all the operations are implemented recursively, so this class is NOT suitable for deep trees. At the very least, if the Python implementation is CPython, this allocates a new stack frame for each iteration of the operation.
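For orientation, here is a small stand-alone sketch of an immutable tree built on the same two mappings, parent to children and child to parent. TinyTree is a stand-in written for this illustration and is not loopy.schedule.tree.Tree. Unlike the implementation described in the note, its ancestors method walks to the root iteratively.

```python
class TinyTree:
    """Stand-in for illustration; not loopy.schedule.tree.Tree."""

    def __init__(self, parent_to_children, child_to_parent):
        self._parent_to_children = dict(parent_to_children)
        self._child_to_parent = dict(child_to_parent)

    @classmethod
    def from_root(cls, root):
        return cls({root: ()}, {root: None})

    def add_node(self, node, parent):
        # Build new mappings instead of mutating, so `self` stays unchanged.
        p2c = dict(self._parent_to_children)
        p2c[parent] = p2c[parent] + (node,)
        p2c[node] = ()
        c2p = dict(self._child_to_parent)
        c2p[node] = parent
        return TinyTree(p2c, c2p)

    def parent(self, node):
        return self._child_to_parent[node]

    def children(self, node):
        return self._parent_to_children[node]

    def ancestors(self, node):
        # Iterative walk to the root; avoids one stack frame per level.
        result = []
        current = self.parent(node)
        while current is not None:
            result.append(current)
            current = self.parent(current)
        return tuple(result)

    def depth(self, node):
        return len(self.ancestors(node))

    def __contains__(self, node):
        return node in self._child_to_parent


root_only = TinyTree.from_root("i")
tree = root_only.add_node("j", parent="i").add_node("k", parent="j")
```

Each add_node call returns a new object, so root_only still contains only its root after tree has been built.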