Reference: Documentation for Internal API

Targets

See also Targets.

class loopy.target.c.POD(ast_builder, dtype, name)[source]

A simple declarator: The type is given as a numpy.dtype and the name is given as a string.

class loopy.target.c.ScopingBlock(contents=None)[source]

A block that is mandatory for scoping and may not be simplified away by loopy.codegen.result.merge_codegen_results().

class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state, fortran_abi=False, type_inf_mapper=None)[source]

Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.

Note

  • All mapper methods take in an extra argument called type_context. The purpose of type_context is to inform the method about the expected type for untyped expressions such as python scalars. The type of the expressions takes precedence over type_context.

Symbolic

See also Expressions.

Loopy-specific expression types

class loopy.symbolic.Literal(s)[source]

A literal to be used during code generation.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.ArrayLiteral(children)[source]

An array literal.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.FunctionIdentifier[source]

A base class for symbols representing functions.

class loopy.symbolic.TypedCSE(child, prefix=None, dtype=None)[source]

A pymbolic.primitives.CommonSubexpression annotated with a numpy.dtype.

class loopy.TypeCast(type: Type[auto] | None | dtype | LoopyType, child: int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression)[source]

Only defined for numerical types with semantics matching numpy.ndarray.astype().

child

The expression to be cast.

class loopy.TaggedVariable(name, tags)[source]

This is an identifier with tags, such as matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.

tags: frozenset[Tag]

A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).

Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.

class loopy.Reduction(operation: ReductionOperation | str, inames: tuple[str | pymbolic.primitives.Variable, ...] | pymbolic.primitives.Variable | str, expr: ExpressionT, allow_simultaneous: bool = False)[source]

Represents a reduction operation on expr across inames.

operation: ReductionOperation
inames: Sequence[str]

The inames across which reduction on expr is being carried out.

expr: ExpressionT

An expression which may have tuple type. If the expression has tuple type, it must be one of the following:

allow_simultaneous: bool

If not True, an iname is allowed to be used in precisely one reduction, to avoid misnesting errors.

class loopy.LinearSubscript(aggregate, index)[source]

Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.

class loopy.symbolic.RuleArgument(index)[source]

Represents a (numbered) argument of a loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match subst rules independently of argument names.

class loopy.symbolic.ExpansionState(kernel, instruction, stack, arg_context)[source]
kernel
instruction
stack

a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.

arg_context

a dict representing current argument values

class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context)[source]

Note: the third argument dragged around by this mapper is the current ExpansionState.

Subclasses of this must be careful to not touch identifiers that are in ExpansionState.arg_context.

class loopy.symbolic.ResolvedFunction(function)[source]

A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.

function

An instance of pymbolic.primitives.Variable or loopy.library.reduction.ReductionOpFunction.

class loopy.symbolic.SubArrayRef(swept_inames: tuple[Variable, ...] | Variable, subscript: Subscript)[source]

An algebraic expression to map an affine memory layout pattern (known as sub-array) as consecutive elements of the sweeping axes which are defined using SubArrayRef.swept_inames.

swept_inames

An instance of tuple denoting the axes to which the sub-array is mapped.

subscript

An instance of pymbolic.primitives.Subscript denoting the array in the kernel.

is_equal(other)[source]

Returns True iff the sub-array refs have identical expressions.

Expression Manipulation Helpers

loopy.symbolic.simplify_using_aff(kernel, expr)[source]

Simplifies expr on kernel’s domain.

Parameters:

expr – An instance of pymbolic.primitives.Expression.

Types

DTypes of variables in a loopy.LoopKernel must be picklable, so in the codegen pipeline user-provided types are converted to loopy.types.LoopyType.

class loopy.types.LoopyType[source]

Abstract class for dtypes of variables encountered in a loopy.LoopKernel.

class loopy.types.NumpyType(dtype: dtype)[source]
class loopy.types.AtomicType[source]

Abstract class for dtypes of variables encountered in a loopy.LoopKernel on which atomic operations are performed.

class loopy.types.AtomicNumpyType(dtype: dtype)[source]

A dtype wrapper that indicates that the described type should be capable of atomic operations.

Codegen

class loopy.codegen.PreambleInfo(kernel: loopy.kernel.LoopKernel, seen_dtypes: Set[loopy.types.LoopyType], seen_functions: Set[loopy.codegen.SeenFunction], seen_atomic_dtypes: Set[loopy.types.LoopyType], codegen_state: loopy.codegen.CodeGenerationState)[source]
class loopy.codegen.VectorizationInfo(iname: str, length: int, space: Space)[source]
iname
length
space
class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: Tuple[LoopyType, ...], result_dtypes: Tuple[LoopyType, ...])[source]

This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.

name
c_name
arg_dtypes

a tuple of arg dtypes

result_dtypes

a tuple of result dtypes

class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: Set, implemented_predicates: FrozenSet[str | int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], seen_dtypes: Set[LoopyType], seen_functions: Set[SeenFunction], seen_atomic_dtypes: Set[LoopyType], var_subst_map: Map[str, int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], allow_complex: bool, callables_table: Mapping[str | ReductionOpFunction, InKernelCallable], is_entrypoint: bool, var_name_generator: UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cachemanager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]
kernel
target
implemented_domain

The entire implemented domain (as an islpy.Set) i.e. all constraints that have been enforced so far.

implemented_predicates

A frozenset of predicates for which checks have been implemented.

seen_dtypes

set of dtypes that were encountered

seen_functions

set of SeenFunction instances

seen_atomic_dtypes
var_subst_map
allow_complex
vectorization_info

None (to mean vectorization has not yet been applied), or an instance of VectorizationInfo.

is_generating_device_code
gen_program_name

None (indicating that host code is being generated) or the name of the device program currently being generated.

schedule_index_end
callables_table

A mapping from callable names to instances of loopy.kernel.function_interface.InKernelCallable.

is_entrypoint

A bool indicating whether the code is being generated for an entrypoint kernel.

codegen_cache_manager

An instance of loopy.codegen.tools.CodegenOperationCacheManager.

class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[Tuple[int, str]] = (), device_preambles: Sequence[Tuple[int, str]] = ())[source]
host_programs

A mapping from names of entrypoints to their host GeneratedProgram.

device_programs

A list of GeneratedProgram instances intended to run on the compute device.

host_preambles
device_preambles
host_code()[source]
device_code()[source]
all_code()[source]
class loopy.codegen.result.GeneratedProgram(name: str, is_device_program: bool, ast: Any, body_ast: Any | None = None)[source]
name
is_device_program
ast

Once generated, this captures the AST of the overall function definition, including the body.

body_ast

Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.

class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, Set], host_preambles: Sequence[Tuple[str, str]] = (), device_preambles: Sequence[Tuple[str, str]] = ())[source]
host_program
device_programs

A list of GeneratedProgram instances intended to run on the compute device.

implemented_domains

A mapping from instruction ID to a list of islpy.Set objects.

host_preambles
device_preambles
host_code()[source]
device_code()[source]
all_code()[source]
loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) CodeGenerationResult[source]
loopy.codegen.result.generate_host_or_device_program(codegen_state, schedule_index)[source]
class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: List[InstructionBase], linearization: List[ScheduleItem], inames: Dict[str, Iname])[source]

Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.

class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]

Caches operations arising during the codegen pipeline.

kernel_proxy

An instance of KernelProxyForCodegenOperationCacheManager.

with_kernel(kernel)[source]

Returns a new instance of CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.

get_concurrent_inames_in_a_callkernel(callkernel_index: int) FrozenSet[str][source]

Returns a frozenset of the concurrent inames in a callkernel.

Parameters:

callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.

Reduction Operation

class loopy.library.reduction.ReductionOperation[source]

Subclasses of this type have to be hashable, picklable, and equality-comparable.

class loopy.library.reduction.ScalarReductionOperation[source]
class loopy.library.reduction.SumReductionOperation[source]
class loopy.library.reduction.ProductReductionOperation[source]
class loopy.library.reduction.MaxReductionOperation[source]
class loopy.library.reduction.MinReductionOperation[source]
class loopy.library.reduction.ReductionOpFunction(reduction_op)[source]

Iname Tags

loopy.kernel.data.filter_iname_tags_by_type(tags, tag_type, max_num=None, min_num=None)[source]

Return the subset of tags that match type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.

Parameters:
  • tags – An iterable of tags.

  • tag_type – a subclass of loopy.kernel.data.InameImplementationTag.

  • max_num – the maximum number of tags expected to be found.

  • min_num – the minimum number of tags expected to be found.
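The filtering and count checks described above can be sketched in plain Python. This is only an illustration: the tag classes below are hypothetical stand-ins rather than loopy's own classes, the helper name filter_tags_by_type is made up, and the choice of ValueError is an assumption, since the text only says that an exception is raised.

```python
class InameImplementationTag:
    """Hypothetical stand-in for loopy.kernel.data.InameImplementationTag."""


class ConcurrentTag(InameImplementationTag):
    """Hypothetical stand-in for a concurrent iname tag."""


class UnrollTag(InameImplementationTag):
    """Hypothetical stand-in for an unroll iname tag."""


def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Return the tags that are instances of tag_type, checking the count."""
    result = [tag for tag in tags if isinstance(tag, tag_type)]

    # Exception type is an assumption made for this sketch.
    if max_num is not None and len(result) > max_num:
        raise ValueError(f"more than {max_num} tags of type {tag_type.__name__}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(f"fewer than {min_num} tags of type {tag_type.__name__}")

    return result


tags = [ConcurrentTag(), UnrollTag()]
concurrent = filter_tags_by_type(tags, ConcurrentTag, max_num=1)
```

Passing a base class as tag_type matches every subclass instance, so asking for InameImplementationTag with max_num=1 on the list above would raise in this sketch.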

class loopy.kernel.data.InameImplementationTag(*args, **kwargs)[source]
class loopy.kernel.data.ConcurrentTag(*args, **kwargs)[source]
class loopy.kernel.data.UniqueInameTag(*args, **kwargs)[source]
class loopy.kernel.data.AxisTag(axis)[source]
class loopy.kernel.data.LocalInameTag(axis)[source]
class loopy.kernel.data.GroupInameTag(axis)[source]
class loopy.kernel.data.VectorizeTag(*args, **kwargs)[source]
class loopy.kernel.data.UnrollTag(*args, **kwargs)[source]
class loopy.kernel.data.Iname(name: str, tags: FrozenSet[Tag])[source]

Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.

This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.

name

An instance of str, denoting the iname’s name.

tags

An instance of frozenset of pytools.tag.Tag.

Array

class loopy.kernel.array.ArrayDimImplementationTag(*args, **kwargs)[source]
class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]
target_axis

For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.

Note that “normal” arrays only have one target_axis.

layout_nesting_level

For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.

May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
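To make the role of layout_nesting_level concrete, here is a minimal sketch of how strides could be derived from nesting levels. It assumes a single target_axis, a concrete integer shape, and no pad_to padding. The function name computed_strides is hypothetical and not part of loopy.

```python
def computed_strides(shape, nesting_levels):
    """Return per-axis strides (in units of the dtype) for the given levels.

    The axis with nesting level 0 varies fastest and gets stride 1. Each
    higher level gets the product of the extents of all lower levels.
    """
    if sorted(nesting_levels) != list(range(len(shape))):
        raise ValueError(
            "nesting levels must be unique, contiguous, and start at 0")

    # Visit axes from the fastest-varying (level 0) to the slowest.
    axes_by_level = sorted(range(len(shape)), key=lambda ax: nesting_levels[ax])

    strides = [0] * len(shape)
    stride_so_far = 1
    for axis in axes_by_level:
        strides[axis] = stride_so_far
        stride_so_far *= shape[axis]

    return tuple(strides)


# Row-major ("C") layout: the last axis has the lowest nesting level.
c_strides = computed_strides((3, 4), (1, 0))

# Column-major ("Fortran") layout: the first axis has the lowest level.
f_strides = computed_strides((3, 4), (0, 1))
```

In this sketch c_strides is (4, 1) and f_strides is (1, 3), which matches the statement that the lowest nesting level varies fastest in linear memory.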

class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]

An arg dimension implementation tag for a fixed (potentially symbolic) stride.

stride

May be one of the following:

The stride is given in units of ArrayBase.dtype.

class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]
pad_to

ArrayBase.dtype granularity to which to pad this dimension

This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.

class loopy.kernel.array.SeparateArrayArrayDimTag(*args, **kwargs)[source]
class loopy.kernel.array.VectorArrayDimTag(*args, **kwargs)[source]
loopy.kernel.array.parse_array_dim_tags(dim_tags, n_axes=None, use_increasing_target_axes=False, dim_names=None)[source]

Cross-references

(This section shouldn’t exist: Sphinx should be able to resolve these on its own.)

class loopy.kernel.array.ShapeType

See loopy.typing.ShapeType

class loopy.kernel.array.ExpressionT

See loopy.typing.ExpressionT

class loopy.kernel.array.Tag[source]

See pytools.tag.Tag

Checks

loopy.check.check_for_integer_subscript_indices(t_unit)[source]

Checks if every array access is of type int.

loopy.check.check_for_duplicate_insn_ids(knl: LoopKernel) None[source]

Check if multiple instructions of knl have the same loopy.InstructionBase.id.

loopy.check.check_for_double_use_of_hw_axes(t_unit: TranslationUnit) None[source]

Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.

loopy.check.check_insn_attributes(kernel: LoopKernel) None[source]

Check for legality of attributes of every instruction in kernel.

loopy.check.check_loop_priority_inames_known(kernel: LoopKernel) None[source]

Checks if the inames in loopy.LoopKernel.loop_priority are part of the kernel’s domain.

loopy.check.check_multiple_tags_allowed(kernel: LoopKernel) None[source]

Checks if multiple tags of an iname are compatible.

loopy.check.check_for_inactive_iname_access(kernel: LoopKernel) None[source]

Check if any instruction accesses an iname but is not within it.

loopy.check.check_for_unused_inames(kernel: LoopKernel) None[source]

Check if there are any unused inames in the kernel.

loopy.check.check_for_write_races(kernel: LoopKernel) None[source]

Check if any memory accesses lead to write races.

loopy.check.check_for_data_dependent_parallel_bounds(kernel: LoopKernel) None[source]

Check that inames tagged as hw axes have bounds that are known at kernel launch.

loopy.check.check_bounds(t_unit: TranslationUnit) None[source]

Performs out-of-bound check for every array access.

loopy.check.check_variable_access_ordered(kernel: LoopKernel) None[source]

Checks that between each write to a variable and all other accesses to the variable there is either:

Schedule

class loopy.schedule.ScheduleItem[source]
class loopy.schedule.BeginBlockItem[source]
class loopy.schedule.EndBlockItem[source]
class loopy.schedule.CallKernel(kernel_name: 'str')[source]
class loopy.schedule.ReturnFromKernel(kernel_name: 'str')[source]
class loopy.schedule.Barrier(comment: str, synchronization_kind: str, mem_kind: str, originating_insn_id: str)[source]
comment

A plain-text comment explaining why the barrier was inserted.

synchronization_kind

"local" or "global"

mem_kind

"local" or "global"

originating_insn_id
class loopy.schedule.RunInstruction(insn_id: 'str')[source]
class loopy.schedule.MinRecursionLimitForScheduling(kernel)[source]
loopy.schedule.tools.get_block_boundaries(schedule: Sequence[ScheduleItem]) Mapping[int, int][source]

Return a dictionary mapping indices of loopy.schedule.BeginBlockItems to loopy.schedule.EndBlockItems and vice versa.
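The two-way mapping described above can be built with a stack of open blocks. The following is a sketch under assumptions, not loopy's implementation: the schedule item classes are hypothetical stand-ins, and block_boundaries is a made-up name.

```python
class BeginBlockItem:
    """Hypothetical stand-in for loopy.schedule.BeginBlockItem."""


class EndBlockItem:
    """Hypothetical stand-in for loopy.schedule.EndBlockItem."""


class RunInstruction:
    """Hypothetical stand-in for a schedule item that is not a block marker."""


def block_boundaries(schedule):
    """Map each begin index to its matching end index and vice versa."""
    boundaries = {}
    open_blocks = []  # indices of begin items that are not yet closed

    for idx, item in enumerate(schedule):
        if isinstance(item, BeginBlockItem):
            open_blocks.append(idx)
        elif isinstance(item, EndBlockItem):
            start = open_blocks.pop()
            boundaries[start] = idx
            boundaries[idx] = start

    return boundaries


schedule = [
    BeginBlockItem(),   # 0: outer block begins
    RunInstruction(),   # 1
    BeginBlockItem(),   # 2: inner block begins
    RunInstruction(),   # 3
    EndBlockItem(),     # 4: inner block ends
    EndBlockItem(),     # 5: outer block ends
]
boundaries = block_boundaries(schedule)
```

For this schedule the sketch pairs index 0 with 5 and index 2 with 4, in both directions.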

loopy.schedule.tools.temporaries_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) FrozenSet[str][source]
loopy.schedule.tools.args_read_in_subkernel(kernel: LoopKernel, subkernel_name: str) FrozenSet[str][source]
loopy.schedule.tools.args_written_in_subkernel(kernel: LoopKernel, subkernel_name: str) FrozenSet[str][source]
loopy.schedule.tools.supporting_temporary_names(kernel: LoopKernel, tv_names: FrozenSet[str]) FrozenSet[str][source]
class loopy.schedule.tools.KernelArgInfo(passed_arg_names: Sequence[str], written_names: FrozenSet[str])[source]
passed_arg_names: Sequence[str]
written_names: FrozenSet[str]
class loopy.schedule.tools.SubKernelArgInfo(passed_arg_names: Sequence[str], written_names: FrozenSet[str], passed_inames: Sequence[str], passed_temporaries: Sequence[str])[source]

Inherits from KernelArgInfo.

passed_inames: Sequence[str]
passed_temporaries: Sequence[str]
loopy.schedule.tools.get_kernel_arg_info(kernel: LoopKernel) KernelArgInfo[source]
loopy.schedule.tools.get_subkernel_arg_info(kernel: LoopKernel, subkernel_name: str) SubKernelArgInfo[source]
loopy.schedule.tools.get_return_from_kernel_mapping(kernel: LoopKernel) Mapping[int, int | None][source]

Returns a mapping from schedule index of every schedule item (S) in kernel to the schedule index of loopy.schedule.ReturnFromKernel of the active sub-kernel at ‘S’.
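A rough sketch of such a mapping, using hypothetical stand-in classes for the schedule items, scans the schedule backwards and remembers the most recent return item. How the CallKernel and ReturnFromKernel items themselves are mapped is an assumption of this sketch: both are mapped to the return index. Items outside any sub-kernel map to None.

```python
class CallKernel:
    """Hypothetical stand-in for loopy.schedule.CallKernel."""


class ReturnFromKernel:
    """Hypothetical stand-in for loopy.schedule.ReturnFromKernel."""


class OtherItem:
    """Hypothetical stand-in for any other schedule item."""


def return_from_kernel_mapping(schedule):
    """Map every schedule index to the index of its active return item."""
    mapping = {}
    active_return = None

    # Walk backwards so the enclosing return item is seen before its body.
    for idx in range(len(schedule) - 1, -1, -1):
        item = schedule[idx]
        if isinstance(item, ReturnFromKernel):
            active_return = idx
            mapping[idx] = idx
        elif isinstance(item, CallKernel):
            mapping[idx] = active_return
            active_return = None  # items before the call are outside it
        else:
            mapping[idx] = active_return

    return mapping


schedule = [
    OtherItem(),         # 0: outside any sub-kernel
    CallKernel(),        # 1
    OtherItem(),         # 2
    OtherItem(),         # 3
    ReturnFromKernel(),  # 4
    OtherItem(),         # 5: outside any sub-kernel
]
mapping = return_from_kernel_mapping(schedule)
```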

class loopy.schedule.tools.AccessMapDescriptor(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Special access map values.

Attr DOES_NOT_ACCESS:

Describes an unaccessed variable.

Attr NON_AFFINE_ACCESS:

Describes a non-quasi-affine access into an array.

class loopy.schedule.tools.WriteRaceChecker(kernel, callables_table)[source]

Used for checking for overlap between access ranges of instructions.

loopy.schedule.tools.InameStrSet

alias of FrozenSet[str]

loopy.schedule.tools.LoopNestTree

alias of Tree[FrozenSet[str]]

loopy.schedule.tools.LoopTree

alias of Tree[str]

loopy.schedule.tools.separate_loop_nest(tree: Tree[FrozenSet[str]], loop_nests: Collection[FrozenSet[str]], inames_to_separate: FrozenSet[str]) tuple[Tree[FrozenSet[str]], FrozenSet[str], FrozenSet[str] | None][source]

Returns a copy of tree that has inames_to_separate occur in nodes that are not shared with other inames. Returns a version of the loop nest tree tree so that every node in the tree is either a subset of outermost_inames or has an empty intersection with outermost_inames.

This routine modifies at most one node of the tree. All its ancestors must satisfy ancestor <= outermost_inames. For the first node not satisfying this relationship, if node & outermost_inames is empty, no modification is made. Otherwise, if node & outermost_inames < node, that node is split so as to separate outermost_inames in their own node.

Parameters:

loop_nests – A collection of nodes in tree that cover inames_to_separate.

Returns:

a tuple (new_tree, outer_loop_nest, inner_loop_nest), where outer_loop_nest and inner_loop_nest are the identifiers of the new outer and inner loop nests, chosen so that inames_to_separate is a valid nesting.

Note

We could compute loop_nests within this routine’s implementation, but computing would be expensive and hence we ask the caller for this info.

Example:

    tree: frozenset()
          └── frozenset({'j', 'i'})
              └── frozenset({'k', 'l'})

    inames_to_separate: frozenset({'k', 'i', 'j'})
    loop_nests: {frozenset({'j', 'i'}), frozenset({'k', 'l'})}

    Returns:

    new_tree: frozenset()
              └── frozenset({'j', 'i'})
                  └── frozenset({'k'})
                      └── frozenset({'l'})

    outer_loop_nest: frozenset({'k'})
    inner_loop_nest: frozenset({'l'})
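The node split in this example comes down to set arithmetic on a single node. The sketch below shows only that step, not the tree update. It assumes that the outermost_inames mentioned in the description is the same set as inames_to_separate, and the helper name split_node is hypothetical.

```python
def split_node(node, inames_to_separate):
    """Return (outer, inner) if node must be split, otherwise None.

    A split is needed only when some, but not all, of the node's inames
    are among the inames to separate.
    """
    outer = node & inames_to_separate
    if not outer or outer == node:
        # Either nothing to separate here, or the node is already
        # fully contained in the inames to separate.
        return None
    return outer, node - outer


inames_to_separate = frozenset({"i", "j", "k"})

# The first node is a subset of inames_to_separate, so it is left alone.
first = split_node(frozenset({"i", "j"}), inames_to_separate)

# The second node is split into the separated part and the remainder.
second = split_node(frozenset({"k", "l"}), inames_to_separate)
```

Here first is None and second is the pair (frozenset({'k'}), frozenset({'l'})), which corresponds to outer_loop_nest and inner_loop_nest in the example.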

loopy.schedule.tools.get_partial_loop_nest_tree(kernel: LoopKernel) Tree[FrozenSet[str]][source]

Returns a tree representing the kernel’s loop nests.

Each node of the returned tree has a frozenset of inames. All the inames in the identifier of a parent node of a loop nest in the tree must be nested outside all the inames in the identifier of that loop nest.

Note

This routine only takes into account the nesting dependency constraints of loopy.InstructionBase.within_inames of all the kernel’s instructions and the iname tags. This routine does NOT include the nesting constraints imposed by the dependencies between the instructions and the dependencies imposed by the kernel’s domain tree.

loopy.schedule.tools.get_loop_tree(kernel: LoopKernel) Tree[str][source]

Returns a tree representing the loop nesting for kernel. A parent node in the tree is always nested outside all its children.

Note

Multiple loop nestings might exist for kernel, but this routine returns one valid loop nesting.

class loopy.schedule.tree.NodeT

alias of TypeVar(‘NodeT’, bound=Hashable)

class loopy.schedule.tree.Tree(_parent_to_children: Map[NodeT, Tuple[NodeT, ...]], _child_to_parent: Map[NodeT, NodeT | None])[source]

An immutable tree containing nodes of type NodeT.

ancestors(node: NodeT) Tuple[NodeT, ...][source]

Returns a tuple of nodes that are ancestors of node.

parent(node: NodeT) NodeT | None[source]

Returns the parent of node.

children(node: NodeT) Tuple[NodeT, ...][source]

Returns the children of node.

add_node(node: NodeT, parent: NodeT) Tree[NodeT][source]

Returns a Tree with added node node having a parent parent.

depth(node: NodeT) int[source]

Returns the depth of node, with the root having depth 0.

replace_node(node: NodeT, new_node: NodeT) Tree[NodeT][source]

Returns a copy of self with node replaced with new_node.

move_node(node: NodeT, new_parent: NodeT | None) Tree[NodeT][source]

Returns a copy of self with node node as a child of new_parent.

__contains__(node: NodeT) bool[source]

Return True if node is a node in the tree.

Note

Almost all the operations are implemented recursively. NOT suitable for deep trees. At the very least, if the Python implementation is CPython, this allocates a new stack frame for each iteration of the operation.
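For readers who want a feel for the interface, here is a small pure-Python sketch of an immutable tree with a few of the operations listed above. It is not loopy's Tree: the class name SimpleTree and the from_root constructor are hypothetical, plain dicts are copied on every update, ancestors are returned nearest first (an assumption), and ancestors is written iteratively rather than recursively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleTree:
    """Hypothetical immutable tree; updates return a new instance."""

    parent_to_children: dict
    child_to_parent: dict

    @classmethod
    def from_root(cls, root):
        return cls({root: ()}, {root: None})

    def parent(self, node):
        return self.child_to_parent[node]

    def children(self, node):
        return self.parent_to_children[node]

    def ancestors(self, node):
        """Return the ancestors of node, nearest first."""
        result = []
        current = self.child_to_parent[node]
        while current is not None:
            result.append(current)
            current = self.child_to_parent[current]
        return tuple(result)

    def depth(self, node):
        """Return the depth of node, with the root having depth 0."""
        return len(self.ancestors(node))

    def add_node(self, node, parent):
        """Return a new tree with node added as a child of parent."""
        if node in self.child_to_parent:
            raise ValueError(f"{node!r} is already in the tree")
        if parent not in self.child_to_parent:
            raise ValueError(f"{parent!r} is not in the tree")

        # Copy both mappings so that self is left untouched.
        parent_to_children = dict(self.parent_to_children)
        parent_to_children[parent] = parent_to_children[parent] + (node,)
        parent_to_children[node] = ()
        child_to_parent = dict(self.child_to_parent)
        child_to_parent[node] = parent
        return SimpleTree(parent_to_children, child_to_parent)

    def __contains__(self, node):
        return node in self.child_to_parent


base = SimpleTree.from_root("root")
tree = base.add_node("i", "root").add_node("j", "i")
```

Because add_node returns a new object, base still contains only the root after the calls above, while tree contains all three nodes.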