Reference: Documentation for Internal API

Targets

See also Targets.

class loopy.target.c.POD(ast_builder, dtype, name)[source]

A simple declarator: The type is given as a numpy.dtype and the name is given as a string.

class loopy.target.c.ScopingBlock(contents=None)[source]

A block that is mandatory for scoping and may not be simplified away by loopy.codegen.result.merge_codegen_results().

class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state, fortran_abi=False, type_inf_mapper=None)[source]

Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.

Note

  • All mapper methods take an extra argument called type_context. The purpose of type_context is to inform the method about the expected type for untyped expressions such as Python scalars. The type of the expression takes precedence over type_context.

Symbolic

See also Expressions.

Loopy-specific expression types

class loopy.symbolic.Literal(s)[source]

A literal to be used during code generation.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.ArrayLiteral(children)[source]

An array literal.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.FunctionIdentifier[source]

A base class for symbols representing functions.

class loopy.symbolic.TypedCSE(child, prefix=None, dtype=None)[source]

A pymbolic.primitives.CommonSubexpression annotated with a numpy.dtype.

class loopy.symbolic.TypeCast(type, child)[source]

Only defined for numerical types with semantics matching numpy.ndarray.astype().

child

The expression to be cast.

class loopy.symbolic.TaggedVariable(name, tags)[source]

This is an identifier with tags, such as matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.

tags

A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).

Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.
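The name$tag notation can be illustrated with a small plain-Python sketch. This is not how loopy parses expressions; the helper split_tagged_identifier is hypothetical, and the sketch assumes at most one tag per identifier, written after a single "$".

```python
def split_tagged_identifier(text):
    """Split 'name$tag' into (name, tag); tag is None if no '$' is present."""
    name, sep, tag = text.partition("$")
    if not name or (sep and not tag):
        raise ValueError(f"malformed tagged identifier: {text!r}")
    return name, (tag if sep else None)


# Three uses of the same array, two of them tagged.
uses = ["matrix$one", "matrix$two", "matrix"]

# Select only the uses tagged 'one', e.g. as candidates for a prefetch.
selected = [use for use in uses if split_tagged_identifier(use)[1] == "one"]
```

Selecting by tag in this way is what lets a transformation address one particular use of an array while leaving the others untouched.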

class loopy.symbolic.Reduction(operation, inames, expr, allow_simultaneous=False)[source]

Represents a reduction operation on expr across inames.

operation

An instance of loopy.library.reduction.ReductionOperation.

inames

a list of inames across which reduction on expr is being carried out.

expr

An expression which may have tuple type. If the expression has tuple type, it must be one of the following:

  • a tuple of pymbolic.primitives.Expression, or

  • a loopy.symbolic.Reduction, or

  • a function call or substitution rule invocation.

allow_simultaneous

A bool. If not True, an iname is allowed to be used in precisely one reduction, to avoid mis-nesting errors.

class loopy.symbolic.LinearSubscript(aggregate, index)[source]

Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
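To make the distinction between a multi-dimensional and a linear index concrete, here is a minimal plain-Python sketch. It does not use loopy; the helper linearize_index is hypothetical, and a row-major (C-order) layout is assumed purely for illustration.

```python
def linearize_index(index, shape):
    """Map a multi-dimensional index to a flat offset (row-major assumed)."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    flat = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
        flat = flat * n + i
    return flat


# A 3x4 array stored as a flat list of 12 entries.
shape = (3, 4)
storage = list(range(12))

# Under the assumed layout, the multi-dimensional access [2, 1] and the
# linear access at this offset refer to the same storage location.
offset = linearize_index((2, 1), shape)
```

A linear subscript corresponds to using the flat offset directly, without going through the per-axis index computation.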

class loopy.symbolic.RuleArgument(index)[source]

Represents a (numbered) argument of a loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match subst rules independently of argument names.

class loopy.symbolic.ExpansionState(kernel, instruction, stack, arg_context)[source]
kernel
instruction
stack

a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.

arg_context

a dict representing current argument values

class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context)[source]

Note: the third argument dragged around by this mapper is the current ExpansionState.

Subclasses of this must be careful to not touch identifiers that are in ExpansionState.arg_context.

class loopy.symbolic.ResolvedFunction(function)[source]

A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.

function

An instance of pymbolic.primitives.Variable or loopy.library.reduction.ReductionOpFunction.

class loopy.symbolic.SubArrayRef(swept_inames, subscript)[source]

An algebraic expression to map an affine memory layout pattern (known as a sub-array) as consecutive elements of the sweeping axes, which are defined using SubArrayRef.swept_inames.

swept_inames

An instance of tuple denoting the axes to which the sub-array is supposed to be mapped.

subscript

An instance of pymbolic.primitives.Subscript denoting the array in the kernel.

is_equal(other)[source]

Returns True iff the sub-array refs have identical expressions.

Expression Manipulation Helpers

loopy.symbolic.simplify_using_aff(kernel, expr)[source]

Simplifies expr on kernel’s domain.

Parameters:

expr – An instance of pymbolic.primitives.Expression.

Types

DTypes of variables in a loopy.LoopKernel must be picklable, so in the codegen pipeline user-provided types are converted to loopy.types.LoopyType.

class loopy.types.LoopyType[source]

Abstract class for dtypes of variables encountered in a loopy.LoopKernel.

class loopy.types.NumpyType(dtype: dtype)[source]
class loopy.types.AtomicType[source]

Abstract class for dtypes of variables encountered in a loopy.LoopKernel on which atomic operations are performed.

class loopy.types.AtomicNumpyType(dtype: dtype)[source]

A dtype wrapper that indicates that the described type should be capable of atomic operations.

Codegen

class loopy.codegen.PreambleInfo(kernel: loopy.kernel.LoopKernel, seen_dtypes: Set[loopy.types.LoopyType], seen_functions: Set[loopy.codegen.SeenFunction], seen_atomic_dtypes: Set[loopy.types.LoopyType], codegen_state: loopy.codegen.CodeGenerationState)[source]
class loopy.codegen.VectorizationInfo(iname: str, length: int, space: Space)[source]
iname
length
space
class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: Tuple[LoopyType, ...], result_dtypes: Tuple[LoopyType, ...])[source]

This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.

name
c_name
arg_dtypes

a tuple of arg dtypes

result_dtypes

a tuple of result dtypes

class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: Set, implemented_predicates: FrozenSet[str | int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], seen_dtypes: Set[LoopyType], seen_functions: Set[SeenFunction], seen_atomic_dtypes: Set[LoopyType], var_subst_map: Map[str, int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], allow_complex: bool, callables_table: Mapping[str | ReductionOpFunction, InKernelCallable], is_entrypoint: bool, var_name_generator: UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cachemanager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]
kernel
target
implemented_domain

The entire implemented domain (as an islpy.Set) i.e. all constraints that have been enforced so far.

implemented_predicates

A frozenset of predicates for which checks have been implemented.

seen_dtypes

set of dtypes that were encountered

seen_functions

set of SeenFunction instances

seen_atomic_dtypes
var_subst_map
allow_complex
vectorization_info

None (to mean vectorization has not yet been applied), or an instance of VectorizationInfo.

is_generating_device_code
gen_program_name

None (indicating that host code is being generated) or the name of the device program currently being generated.

schedule_index_end
callables_table

A mapping from callable names to instances of loopy.kernel.function_interface.InKernelCallable.

is_entrypoint

A bool to indicate if the code is being generated for an entrypoint kernel

codegen_cachemanager

An instance of loopy.codegen.tools.CodegenOperationCacheManager.

class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[Tuple[int, str]] = (), device_preambles: Sequence[Tuple[int, str]] = ())[source]
host_programs

A mapping from names of entrypoints to their host GeneratedProgram.

device_programs

A list of GeneratedProgram instances intended to run on the compute device.

host_preambles
device_preambles
host_code()[source]
device_code()[source]
all_code()[source]
class loopy.codegen.result.GeneratedProgram(name: str, is_device_program: bool, ast: Any, body_ast: Any | None = None)[source]
name
is_device_program
ast

Once generated, this captures the AST of the overall function definition, including the body.

body_ast

Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.

class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, Set], host_preambles: Sequence[Tuple[str, str]] = (), device_preambles: Sequence[Tuple[str, str]] = ())[source]
host_program
device_programs

A list of GeneratedProgram instances intended to run on the compute device.

implemented_domains

A mapping from instruction ID to a list of islpy.Set objects.

host_preambles
device_preambles
host_code()[source]
device_code()[source]
all_code()[source]
loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) → CodeGenerationResult[source]
loopy.codegen.result.generate_host_or_device_program(codegen_state, schedule_index)[source]
class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: List[InstructionBase], linearization: List[ScheduleItem], inames: Dict[str, Iname])[source]

Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.

class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]

Caches operations arising during the codegen pipeline.

kernel_proxy

An instance of KernelProxyForCodegenOperationCacheManager.

with_kernel(kernel)[source]

Returns a new instance of CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.
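The "return self if still valid, otherwise a fresh instance" behaviour can be sketched as follows. This is a plain-Python illustration, not loopy's implementation: DemoKernelProxy and DemoCacheManager are hypothetical stand-ins, and the sketch assumes that cache validity is decided by comparing the proxied fields for equality.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DemoKernelProxy:
    """Hypothetical stand-in holding only the fields the cache depends on."""
    instructions: Tuple[str, ...]
    linearization: Tuple[str, ...]


class DemoCacheManager:
    def __init__(self, kernel_proxy):
        self.kernel_proxy = kernel_proxy
        self._cache = {}

    def with_kernel(self, kernel_proxy):
        # Assumption: cached values stay valid while the proxied fields
        # compare equal, so the existing manager (and its cache) is reused.
        if kernel_proxy == self.kernel_proxy:
            return self
        return DemoCacheManager(kernel_proxy)

    def instruction_count(self):
        # A trivially cacheable operation, computed at most once per manager.
        if "count" not in self._cache:
            self._cache["count"] = len(self.kernel_proxy.instructions)
        return self._cache["count"]


manager = DemoCacheManager(DemoKernelProxy(("a", "b"), ("a", "b")))
same = manager.with_kernel(DemoKernelProxy(("a", "b"), ("a", "b")))
fresh = manager.with_kernel(DemoKernelProxy(("a",), ("a",)))
```

Returning self in the unchanged case preserves everything already cached, which is the point of routing kernel updates through with_kernel rather than constructing a new manager each time.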

get_concurrent_inames_in_a_callkernel(callkernel_index: int) → FrozenSet[str][source]

Returns a frozenset of concurrent inames in a callkernel

Parameters:

callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.

Reduction Operation

class loopy.library.reduction.ReductionOperation[source]

Subclasses of this type have to be hashable, picklable, and equality-comparable.

class loopy.library.reduction.ScalarReductionOperation[source]
class loopy.library.reduction.SumReductionOperation[source]
class loopy.library.reduction.ProductReductionOperation[source]
class loopy.library.reduction.MaxReductionOperation[source]
class loopy.library.reduction.MinReductionOperation[source]
class loopy.library.reduction.ReductionOpFunction(reduction_op)[source]

Iname Tags

loopy.kernel.data.filter_iname_tags_by_type(tags, tag_type, max_num=None, min_num=None)[source]

Return the subset of tags that match the type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.

Parameters:
  • tags – An iterable of tags.

  • tag_type – a subclass of loopy.kernel.data.InameImplementationTag.

  • max_num – the maximum number of tags expected to be found.

  • min_num – the minimum number of tags expected to be found.
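The filtering and bounds checking described above can be sketched in plain Python. This is an illustration of the documented behaviour, not the library's code: filter_tags_by_type and the Demo* tag classes are hypothetical, and the use of ValueError is an assumption (the exception type actually raised by loopy may differ).

```python
class DemoTag:
    """Hypothetical base class standing in for an iname tag type."""


class DemoConcurrentTag(DemoTag):
    pass


class DemoUnrollTag(DemoTag):
    pass


def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Return the tags that are instances of tag_type, enforcing count bounds."""
    result = {tag for tag in tags if isinstance(tag, tag_type)}
    # Assumption: ValueError is used here only for illustration.
    if max_num is not None and len(result) > max_num:
        raise ValueError(f"found {len(result)} tags, expected at most {max_num}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(f"found {len(result)} tags, expected at least {min_num}")
    return result


tags = [DemoConcurrentTag(), DemoUnrollTag()]
concurrent = filter_tags_by_type(tags, DemoConcurrentTag, max_num=1)
```

Passing max_num=1 is a convenient way to assert that an iname carries at most one tag of a given kind while retrieving it in the same call.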

class loopy.kernel.data.InameImplementationTag(*args, **kwargs)[source]
class loopy.kernel.data.ConcurrentTag(*args, **kwargs)[source]
class loopy.kernel.data.UniqueInameTag(*args, **kwargs)[source]
class loopy.kernel.data.AxisTag(axis)[source]
class loopy.kernel.data.LocalInameTag(axis)[source]
class loopy.kernel.data.GroupInameTag(axis)[source]
class loopy.kernel.data.VectorizeTag(*args, **kwargs)[source]
class loopy.kernel.data.UnrollTag(*args, **kwargs)[source]
class loopy.kernel.data.Iname(name: str, tags: FrozenSet[Tag])[source]

Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.

This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.

name

An instance of str, denoting the iname’s name.

tags

An instance of frozenset of pytools.tag.Tag.

Array

class loopy.kernel.array.ArrayDimImplementationTag(*args, **kwargs)[source]
class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]
target_axis

For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.

Note that “normal” arrays only have one target_axis.

layout_nesting_level

For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.

May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
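How nesting levels determine strides can be sketched in plain Python. This is a simplified illustration, not loopy's stride computation: strides_from_nesting_levels is a hypothetical helper, strides are counted in elements, and padding (pad_to) is ignored.

```python
def strides_from_nesting_levels(shape, nesting_levels):
    """Compute per-axis strides (in elements) from layout nesting levels.

    The axis with nesting level 0 varies fastest, so it gets stride 1.
    """
    if sorted(nesting_levels) != list(range(len(shape))):
        raise ValueError("nesting levels must be the unique integers 0..n-1")
    strides = [None] * len(shape)
    stride = 1
    # Visit axes from the fastest-varying (level 0) to the slowest.
    for axis in sorted(range(len(shape)), key=lambda ax: nesting_levels[ax]):
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


# Row-major (C order) for shape (3, 4): the last axis varies fastest.
c_strides = strides_from_nesting_levels((3, 4), (1, 0))

# Column-major (Fortran order): the first axis varies fastest.
f_strides = strides_from_nesting_levels((3, 4), (0, 1))
```

The validity check mirrors the requirement that the levels form a contiguous sequence of unique integers starting at 0.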

class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]

An arg dimension implementation tag for a fixed (potentially symbolic) stride.

stride

May be one of the following:

The stride is given in units of ArrayBase.dtype.

class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]
pad_to

ArrayBase.dtype granularity to which to pad this dimension

This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.

class loopy.kernel.array.SeparateArrayArrayDimTag(*args, **kwargs)[source]
class loopy.kernel.array.VectorArrayDimTag(*args, **kwargs)[source]
loopy.kernel.array.parse_array_dim_tags(dim_tags, n_axes=None, use_increasing_target_axes=False, dim_names=None)[source]

Checks

loopy.check.check_for_integer_subscript_indices(t_unit)[source]

Checks if every array access is of type int.

loopy.check.check_for_duplicate_insn_ids(knl)[source]

Check if multiple instructions of knl have the same loopy.InstructionBase.id.
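The idea behind this check can be sketched with the standard library alone. The helper find_duplicate_ids is hypothetical and works on a plain list of ID strings rather than on a kernel; how loopy reports duplicates is not shown here.

```python
from collections import Counter


def find_duplicate_ids(insn_ids):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(insn_ids)
    return sorted(insn_id for insn_id, count in counts.items() if count > 1)


duplicates = find_duplicate_ids(["init", "update", "init", "store"])
```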

loopy.check.check_for_double_use_of_hw_axes(t_unit)[source]

Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.

loopy.check.check_insn_attributes(kernel)[source]

Check for legality of attributes of every instruction in kernel.

loopy.check.check_loop_priority_inames_known(kernel)[source]

Checks if the inames in loopy.LoopKernel.loop_priority are part of the kernel’s domain.

loopy.check.check_multiple_tags_allowed(kernel)[source]

Checks if multiple tags of an iname are compatible.

loopy.check.check_for_inactive_iname_access(kernel)[source]

Check if any instruction accesses an iname but is not within it.
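As a rough illustration of what "accesses an iname but is not within it" means, the following plain-Python sketch works on sets of names. The helper inactive_iname_accesses and its parameters are hypothetical; the real check operates on kernel instructions rather than on bare name sets.

```python
def inactive_iname_accesses(all_inames, used_names, within_inames):
    """Return inames that an instruction references without being nested in them."""
    referenced_inames = frozenset(used_names) & frozenset(all_inames)
    return referenced_inames - frozenset(within_inames)


# The instruction reads 'a', 'i' and 'j' but is only nested inside loop 'i'.
missing = inactive_iname_accesses(
    all_inames={"i", "j"},
    used_names={"a", "i", "j"},
    within_inames={"i"},
)
```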

loopy.check.check_for_unused_inames(kernel)[source]

Check if there are any unused inames in the kernel.

loopy.check.check_for_write_races(kernel)[source]

Check if any memory accesses lead to write races.

loopy.check.check_for_data_dependent_parallel_bounds(kernel)[source]

Check that inames tagged as hw axes have bounds that are known at kernel launch.

loopy.check.check_bounds(t_unit)[source]

Performs out-of-bound check for every array access.

loopy.check.check_variable_access_ordered(kernel)[source]

Checks that between each write to a variable and all other accesses to the variable there is either:

Schedule

class loopy.schedule.ScheduleItem[source]
class loopy.schedule.BeginBlockItem[source]
class loopy.schedule.EndBlockItem[source]
class loopy.schedule.CallKernel(kernel_name: 'str')[source]
class loopy.schedule.Barrier(comment: str, synchronization_kind: str, mem_kind: str, originating_insn_id: str)[source]
comment

A plain-text comment explaining why the barrier was inserted.

synchronization_kind

"local" or "global"

mem_kind

"local" or "global"

originating_insn_id
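The shape of this record can be sketched with a small dataclass. DemoBarrier is a hypothetical stand-in, not loopy's class, and the validation in __post_init__ is an assumption based only on the two values documented above for synchronization_kind and mem_kind.

```python
from dataclasses import dataclass

VALID_KINDS = ("local", "global")


@dataclass(frozen=True)
class DemoBarrier:
    """Hypothetical stand-in mirroring the documented Barrier fields."""
    comment: str
    synchronization_kind: str
    mem_kind: str
    originating_insn_id: str

    def __post_init__(self):
        # Assumption: only the two documented values are acceptable.
        for field_name in ("synchronization_kind", "mem_kind"):
            value = getattr(self, field_name)
            if value not in VALID_KINDS:
                raise ValueError(
                    f"{field_name} must be one of {VALID_KINDS}, got {value!r}"
                )


barrier = DemoBarrier(
    comment="for write to tmp",
    synchronization_kind="local",
    mem_kind="local",
    originating_insn_id="insn_0",
)
```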
class loopy.schedule.RunInstruction(insn_id: 'str')[source]
class loopy.schedule.MinRecursionLimitForScheduling(kernel)[source]