Reference: Documentation for Internal API#

Targets#

See also Targets.

class loopy.target.c.POD(ast_builder, dtype, name)[source]#

A simple declarator: The type is given as a numpy.dtype and the name is given as a string.

class loopy.target.c.ScopingBlock(contents=None)[source]#

A block that is mandatory for scoping and may not be simplified away by loopy.codegen.result.merge_codegen_results().

class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state, fortran_abi=False, type_inf_mapper=None)[source]#

Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.

Note

  • All mapper methods take an extra argument called type_context. Its purpose is to inform the method about the expected type for untyped expressions such as Python scalars. The type of a typed expression takes precedence over type_context.
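
The precedence rule in the note can be illustrated with a small stand-alone sketch. It does not use loopy: TypedValue, infer_c_type and the one-letter context codes are hypothetical names chosen for illustration, not part of the mapper's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedValue:
    """Hypothetical stand-in for an expression whose type is already known."""
    value: float
    c_type: str


# Hypothetical context codes: what an untyped scalar should become.
CONTEXT_TO_C_TYPE = {"i": "int", "f": "float", "d": "double"}


def infer_c_type(expr, type_context):
    """Return the C type to use for expr, given the expected type_context."""
    if isinstance(expr, TypedValue):
        # A typed expression wins over whatever the context suggests.
        return expr.c_type
    if isinstance(expr, (int, float)):
        # Untyped Python scalar: fall back to the context.
        return CONTEXT_TO_C_TYPE[type_context]
    raise TypeError(f"unsupported expression: {expr!r}")


# The untyped scalar follows the context; the typed value keeps its own type.
print(infer_c_type(1, "f"))
print(infer_c_type(TypedValue(1.0, "double"), "f"))
```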

Symbolic#

See also Expressions.

Loopy-specific expression types#

class loopy.symbolic.Literal(s)[source]#

A literal to be used during code generation.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.ArrayLiteral(children)[source]#

An array literal.

Note

Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.

class loopy.symbolic.FunctionIdentifier[source]#

A base class for symbols representing functions.

class loopy.symbolic.TypedCSE(child, prefix=None, dtype=None)[source]#

A pymbolic.primitives.CommonSubexpression annotated with a numpy.dtype.

class loopy.symbolic.TypeCast(type, child)[source]#

Only defined for numerical types with semantics matching numpy.ndarray.astype().

child#

The expression to be cast.

class loopy.symbolic.TaggedVariable(name, tags)[source]#

This is an identifier with tags, such as matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.

tags#

A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).

Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.
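
As a rough illustration of the matrix$one notation described above, the following sketch splits a tagged identifier and selects uses by tag. It is independent of loopy: split_tagged_name and select_uses are hypothetical helpers, plain strings stand in for pytools.tag.Tag instances, and only a single tag per identifier is handled.

```python
def split_tagged_name(identifier):
    """Split 'matrix$one' into ('matrix', frozenset({'one'})).

    Hypothetical helper; strings stand in for real tag objects.
    """
    name, sep, tag = identifier.partition("$")
    tags = frozenset({tag}) if sep else frozenset()
    return name, tags


def select_uses(identifiers, wanted_tag):
    """Return the names of all uses that carry wanted_tag."""
    selected = []
    for ident in identifiers:
        name, tags = split_tagged_name(ident)
        if wanted_tag in tags:
            selected.append(name)
    return selected


uses = ["matrix$one", "matrix$two", "vector"]
# Only the use tagged 'one' is picked up.
print(select_uses(uses, "one"))
```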

class loopy.symbolic.Reduction(operation, inames, expr, allow_simultaneous=False)[source]#

Represents a reduction operation on expr across inames.

operation#

An instance of loopy.library.reduction.ReductionOperation.

inames#

a list of inames across which reduction on expr is being carried out.

expr#

An expression which may have tuple type. If the expression has tuple type, it must be one of the following:

  • a tuple of pymbolic.primitives.Expression, or

  • a loopy.symbolic.Reduction, or

  • a function call or substitution rule invocation.

allow_simultaneous#

A bool. If not True, an iname is allowed to be used in precisely one reduction, to avoid mis-nesting errors.

class loopy.symbolic.LinearSubscript(aggregate, index)[source]#

Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
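
The difference between a layout-aware subscript and a linear one can be sketched in plain Python. The helper row_major_offset and the sample data are assumptions made for illustration; a row-major layout is chosen only as an example of "some layout".

```python
def row_major_offset(index, shape):
    """Flatten a multi-dimensional index assuming a row-major (C) layout."""
    offset = 0
    for i, extent in zip(index, shape):
        offset = offset * extent + i
    return offset


shape = (2, 3)
storage = [10, 11, 12, 20, 21, 22]  # flat memory backing a 2x3 array

# Multi-dimensional access: the layout decides which element is meant.
print(storage[row_major_offset((1, 0), shape)])

# Linear access: the index addresses flat storage directly, no layout involved.
print(storage[3])
```

Both accesses reach the same element here only because the assumed layout is row-major; with a different layout the multi-dimensional subscript would resolve elsewhere while the linear one would not change.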

class loopy.symbolic.RuleArgument(index)[source]#

Represents a (numbered) argument of a loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match subst rules independently of argument names.
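
The idea of matching rules independently of argument names can be sketched as follows. This is not loopy's implementation: expressions are modelled as nested tuples, and number_arguments is a hypothetical helper that replaces each argument name by a numbered placeholder.

```python
def number_arguments(expr, arg_names):
    """Replace each argument name in a nested-tuple expression by ('arg', index)."""
    if isinstance(expr, str):
        if expr in arg_names:
            return ("arg", arg_names.index(expr))
        return expr  # operators and non-argument names stay as they are
    if isinstance(expr, tuple):
        return tuple(number_arguments(child, arg_names) for child in expr)
    return expr  # constants are left alone


# f(x) := x + 1   and   g(y) := y + 1, written as (argument names, body)
rule_f = (["x"], ("+", "x", 1))
rule_g = (["y"], ("+", "y", 1))

# After numbering, the two bodies compare equal although the names differ.
print(number_arguments(rule_f[1], rule_f[0]) == number_arguments(rule_g[1], rule_g[0]))
```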

class loopy.symbolic.ExpansionState(kernel, instruction, stack, arg_context)[source]#
kernel#
instruction#
stack#

a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.

arg_context#

a dict representing current argument values

class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context)[source]#

Note: the third argument dragged around by this mapper is the current ExpansionState.

Subclasses of this must be careful to not touch identifiers that are in ExpansionState.arg_context.

class loopy.symbolic.ResolvedFunction(function)[source]#

A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.

function#

An instance of pymbolic.primitives.Variable or loopy.library.reduction.ReductionOpFunction.

class loopy.symbolic.SubArrayRef(swept_inames, subscript)[source]#

An algebraic expression to map an affine memory layout pattern (known as a sub-array) as consecutive elements of the sweeping axes, which are defined using SubArrayRef.swept_inames.

swept_inames#

An instance of tuple denoting the axes to which the sub-array is supposed to be mapped.

subscript#

An instance of pymbolic.primitives.Subscript denoting the array in the kernel.

is_equal(other)[source]#

Returns True iff the sub-array refs have identical expressions.

Expression Manipulation Helpers#

loopy.symbolic.simplify_using_aff(kernel, expr)[source]#

Simplifies expr on kernel’s domain.

Parameters:

expr – An instance of pymbolic.primitives.Expression.

Types#

DTypes of variables in a loopy.LoopKernel must be picklable, so in the codegen pipeline user-provided types are converted to loopy.types.LoopyType.

class loopy.types.LoopyType[source]#

Abstract class for dtypes of variables encountered in a loopy.LoopKernel.

class loopy.types.NumpyType(dtype: dtype)[source]#
class loopy.types.AtomicType[source]#

Abstract class for dtypes of variables encountered in a loopy.LoopKernel on which atomic operations are performed.

class loopy.types.AtomicNumpyType(dtype: dtype)[source]#

A dtype wrapper that indicates that the described type should be capable of atomic operations.

Codegen#

class loopy.codegen.PreambleInfo(kernel: loopy.kernel.LoopKernel, seen_dtypes: Set[loopy.types.LoopyType], seen_functions: Set[loopy.codegen.SeenFunction], seen_atomic_dtypes: Set[loopy.types.LoopyType], codegen_state: loopy.codegen.CodeGenerationState)[source]#
class loopy.codegen.VectorizationInfo(iname: str, length: int, space: Space)[source]#
iname#
length#
space#
class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: Tuple[LoopyType, ...], result_dtypes: Tuple[LoopyType, ...])[source]#

This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.

name#
c_name#
arg_dtypes#

a tuple of arg dtypes

result_dtypes#

a tuple of result dtypes

class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: Set, implemented_predicates: FrozenSet[str | int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], seen_dtypes: Set[LoopyType], seen_functions: Set[SeenFunction], seen_atomic_dtypes: Set[LoopyType], var_subst_map: Map[str, int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], allow_complex: bool, callables_table: Mapping[str, InKernelCallable], is_entrypoint: bool, var_name_generator: UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cachemanager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]#
kernel#
target#
implemented_domain#

The entire implemented domain (as an islpy.Set) i.e. all constraints that have been enforced so far.

implemented_predicates#

A frozenset of predicates for which checks have been implemented.

seen_dtypes#

set of dtypes that were encountered

seen_functions#

set of SeenFunction instances

seen_atomic_dtypes#
var_subst_map#
allow_complex#
vectorization_info#

None (to mean vectorization has not yet been applied), or an instance of VectorizationInfo.

is_generating_device_code#
gen_program_name#

None (indicating that host code is being generated) or the name of the device program currently being generated.

schedule_index_end#
callables_table#

A mapping from callable names to instances of loopy.kernel.function_interface.InKernelCallable.

is_entrypoint#

A bool indicating whether the code is being generated for an entrypoint kernel.

codegen_cachemanager#

An instance of loopy.codegen.tools.CodegenOperationCacheManager.

class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[Tuple[int, str]] = (), device_preambles: Sequence[Tuple[int, str]] = ())[source]#
host_programs#

A mapping from names of entrypoints to their host GeneratedProgram.

device_programs#

A list of GeneratedProgram instances intended to run on the compute device.

host_preambles#
device_preambles#
host_code()[source]#
device_code()[source]#
all_code()[source]#
class loopy.codegen.result.GeneratedProgram(name: str, is_device_program: bool, ast: Any, body_ast: Any | None = None)[source]#
name#
is_device_program#
ast#

Once generated, this captures the AST of the overall function definition, including the body.

body_ast#

Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.

class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, Set], host_preambles: Sequence[Tuple[str, str]] = (), device_preambles: Sequence[Tuple[str, str]] = ())[source]#
host_program#
device_programs#

A list of GeneratedProgram instances intended to run on the compute device.

implemented_domains#

A mapping from instruction ID to a list of islpy.Set objects.

host_preambles#
device_preambles#
host_code()[source]#
device_code()[source]#
all_code()[source]#
loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) CodeGenerationResult[source]#
loopy.codegen.result.generate_host_or_device_program(codegen_state, schedule_index)[source]#
class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: List[InstructionBase], linearization: List[ScheduleItem], inames: Dict[str, Iname])[source]#

Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.

class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]#

Caches operations arising during the codegen pipeline.

kernel_proxy#

An instance of KernelProxyForCodegenOperationCacheManager.

with_kernel(kernel)[source]#

Returns a new instance of CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.
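
The "return self unless the cache would be invalid" pattern described for with_kernel can be sketched generically. CacheManagerSketch and with_instructions are hypothetical; the sketch keys validity on a tuple of instruction names, which is an assumption and not necessarily what loopy compares.

```python
class CacheManagerSketch:
    """Hypothetical stand-in for a cache tied to the data it was built from."""

    def __init__(self, instructions):
        self.instructions = tuple(instructions)
        self._cache = {}

    def with_instructions(self, instructions):
        """Return self if the cached results stay valid, else a fresh manager."""
        if tuple(instructions) == self.instructions:
            return self
        return CacheManagerSketch(instructions)

    def instruction_count(self):
        """An example of a derived quantity that is computed once and cached."""
        if "count" not in self._cache:
            self._cache["count"] = len(self.instructions)
        return self._cache["count"]


mgr = CacheManagerSketch(["insn_a", "insn_b"])
same = mgr.with_instructions(["insn_a", "insn_b"])  # unchanged input: reuse
other = mgr.with_instructions(["insn_a"])           # changed input: new manager
print(same is mgr, other is mgr)
```

Reusing the existing object keeps previously cached results alive across transformations that do not touch the relevant data.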

get_concurrent_inames_in_a_callkernel(callkernel_index: int) FrozenSet[str][source]#

Returns a frozenset of concurrent inames in a callkernel.

Parameters:

callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.

Reduction Operation#

class loopy.library.reduction.ReductionOperation[source]#

Subclasses of this type have to be hashable, picklable, and equality-comparable.
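
One way to satisfy all three requirements at once is a frozen dataclass, sketched below. ScaledSumSketch is hypothetical and deliberately does not derive from ReductionOperation, so the sketch runs without loopy; its methods are illustrative and do not mirror the real interface.

```python
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaledSumSketch:
    """Hypothetical operation; frozen=True provides __eq__ and __hash__."""
    scale: int

    def neutral_element(self):
        return 0

    def combine(self, a, b):
        return a + self.scale * b


op = ScaledSumSketch(scale=2)
restored = pickle.loads(pickle.dumps(op))  # picklable: survives a round trip
print(restored == op)                      # equality-comparable
print(hash(restored) == hash(op))          # hashable, consistent with equality
```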

class loopy.library.reduction.ScalarReductionOperation[source]#
class loopy.library.reduction.SumReductionOperation[source]#
class loopy.library.reduction.ProductReductionOperation[source]#
class loopy.library.reduction.MaxReductionOperation[source]#
class loopy.library.reduction.MinReductionOperation[source]#
class loopy.library.reduction.ReductionOpFunction(reduction_op)[source]#

Iname Tags#

loopy.kernel.data.filter_iname_tags_by_type(tags, tag_type, max_num=None, min_num=None)[source]#

Return the subset of tags that match type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.

Parameters:
  • tags – An iterable of tags.

  • tag_type – a subclass of loopy.kernel.data.InameImplementationTag.

  • max_num – the maximum number of tags expected to be found.

  • min_num – the minimum number of tags expected to be found.
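
The documented contract can be sketched without loopy as follows. filter_tags_by_type and the tag classes are hypothetical stand-ins, and ValueError is an assumption, since the text above does not name the exception type.

```python
def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Hypothetical re-implementation of the contract described above."""
    result = {tag for tag in tags if isinstance(tag, tag_type)}
    if max_num is not None and len(result) > max_num:
        raise ValueError(
            f"found {len(result)} tags of type {tag_type.__name__}, "
            f"expected at most {max_num}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(
            f"found {len(result)} tags of type {tag_type.__name__}, "
            f"expected at least {min_num}")
    return result


class TagSketch:
    """Hypothetical base class standing in for an iname tag type."""


class UnrollSketch(TagSketch):
    pass


class VectorizeSketch(TagSketch):
    pass


unroll, vec = UnrollSketch(), VectorizeSketch()
# Exactly one tag of the requested type is expected and found.
print(len(filter_tags_by_type([unroll, vec], UnrollSketch, max_num=1, min_num=1)))
```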

class loopy.kernel.data.InameImplementationTag(*args, **kwargs)[source]#
class loopy.kernel.data.ConcurrentTag(*args, **kwargs)[source]#
class loopy.kernel.data.UniqueInameTag(*args, **kwargs)[source]#
class loopy.kernel.data.AxisTag(axis)[source]#
class loopy.kernel.data.LocalInameTag(axis)[source]#
class loopy.kernel.data.GroupInameTag(axis)[source]#
class loopy.kernel.data.VectorizeTag(*args, **kwargs)[source]#
class loopy.kernel.data.UnrollTag(*args, **kwargs)[source]#
class loopy.kernel.data.Iname(name: str, tags: FrozenSet[Tag])[source]#

Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.

This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a built-in tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.

name#

An instance of str, denoting the iname’s name.

tags#

An instance of frozenset of pytools.tag.Tag.

Array#

class loopy.kernel.array.ArrayDimImplementationTag(*args, **kwargs)[source]#
class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]#
target_axis#

For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.

Note that “normal” arrays only have one target_axis.

layout_nesting_level#

For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.

May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
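
How nesting levels could translate into strides is sketched below, under the simplifying assumptions that there is no padding (pad_to is ignored) and that all axes share one target_axis. strides_from_nesting_levels is a hypothetical helper, not loopy's stride computation.

```python
def strides_from_nesting_levels(shape, nesting_levels):
    """Compute per-axis strides (in elements) from layout nesting levels.

    The axis with nesting level 0 gets stride 1; every further level gets
    the product of the extents of all lower levels.
    """
    if sorted(nesting_levels) != list(range(len(shape))):
        raise ValueError("nesting levels must be 0, 1, ..., n-1 without repeats")

    strides = [None] * len(shape)
    running = 1
    # Visit axes from the fastest-varying (level 0) to the slowest.
    for axis in sorted(range(len(shape)), key=lambda ax: nesting_levels[ax]):
        strides[axis] = running
        running *= shape[axis]
    return strides


# Axis 1 has the lowest level, so it varies fastest (row-major).
print(strides_from_nesting_levels((4, 5), [1, 0]))
# Axis 0 has the lowest level, so it varies fastest (column-major).
print(strides_from_nesting_levels((4, 5), [0, 1]))
```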

class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]#

An arg dimension implementation tag for a fixed (potentially symbolic) stride.

stride#

May be one of the following:

The stride is given in units of ArrayBase.dtype.

class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]#
pad_to#

ArrayBase.dtype granularity to which to pad this dimension

This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.

class loopy.kernel.array.SeparateArrayArrayDimTag(*args, **kwargs)[source]#
class loopy.kernel.array.VectorArrayDimTag(*args, **kwargs)[source]#
loopy.kernel.array.parse_array_dim_tags(dim_tags, n_axes=None, use_increasing_target_axes=False, dim_names=None)[source]#

Checks#

loopy.check.check_for_integer_subscript_indices(t_unit)[source]#

Checks that every array subscript index is of integer type.

loopy.check.check_for_duplicate_insn_ids(knl)[source]#

Check if multiple instructions of knl have the same loopy.InstructionBase.id.
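
The essence of such a check can be sketched with a counter over plain ID strings. find_duplicate_ids is a hypothetical helper that operates on a list of IDs rather than on a kernel.

```python
from collections import Counter


def find_duplicate_ids(insn_ids):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(insn_ids)
    return sorted(insn_id for insn_id, n in counts.items() if n > 1)


# 'init' is used by two instructions and would be reported.
print(find_duplicate_ids(["init", "update", "init", "store"]))
```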

loopy.check.check_for_double_use_of_hw_axes(t_unit)[source]#

Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.
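
The condition being checked can be sketched for a single instruction as follows. find_double_hw_axis_use is hypothetical, and strings such as "l.0" and "g.0" merely stand in for hardware axis tag objects.

```python
from collections import defaultdict


def find_double_hw_axis_use(within_inames, iname_to_hw_tag):
    """Map each hw axis tag to the inames sharing it, if more than one does."""
    by_tag = defaultdict(list)
    for iname in sorted(within_inames):
        tag = iname_to_hw_tag.get(iname)
        if tag is not None:  # sequential inames carry no hw axis tag
            by_tag[tag].append(iname)
    return {tag: names for tag, names in by_tag.items() if len(names) > 1}


tags = {"i": "l.0", "j": "l.0", "k": "g.0"}
# 'i' and 'j' both claim local axis 0 for the same instruction.
print(find_double_hw_axis_use({"i", "j", "k"}, tags))
```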

loopy.check.check_insn_attributes(kernel)[source]#

Check for legality of attributes of every instruction in kernel.

loopy.check.check_loop_priority_inames_known(kernel)[source]#

Checks if the inames in loopy.LoopKernel.loop_priority are part of the kernel’s domain.

loopy.check.check_multiple_tags_allowed(kernel)[source]#

Checks if multiple tags of an iname are compatible.

loopy.check.check_for_inactive_iname_access(kernel)[source]#

Check if any instruction accesses an iname but is not within it.
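
Expressed on plain sets, the condition amounts to a set difference per instruction. The sketch below assumes each instruction is given as a hypothetical (id, within_inames, accessed_inames) triple and does not inspect real kernels.

```python
def find_inactive_iname_accesses(instructions):
    """Report, per instruction ID, the inames it reads but is not nested in."""
    problems = {}
    for insn_id, within_inames, accessed_inames in instructions:
        missing = set(accessed_inames) - set(within_inames)
        if missing:
            problems[insn_id] = sorted(missing)
    return problems


insns = [
    ("ok", {"i", "j"}, {"i"}),    # uses only an iname it is nested in
    ("bad", {"i"}, {"i", "j"}),   # uses 'j' without being inside the 'j' loop
]
print(find_inactive_iname_accesses(insns))
```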

loopy.check.check_for_unused_inames(kernel)[source]#

Check if there are any unused inames in the kernel.

loopy.check.check_for_write_races(kernel)[source]#

Check if any memory accesses lead to write races.

loopy.check.check_for_data_dependent_parallel_bounds(kernel)[source]#

Check that inames tagged as hw axes have bounds that are known at kernel launch.

loopy.check.check_bounds(t_unit)[source]#

Performs out-of-bound check for every array access.

loopy.check.check_variable_access_ordered(kernel)[source]#

Checks that between each write to a variable and all other accesses to the variable there is either:

Schedule#

class loopy.schedule.ScheduleItem[source]#
class loopy.schedule.BeginBlockItem[source]#
class loopy.schedule.EndBlockItem[source]#
class loopy.schedule.CallKernel(kernel_name: 'str')[source]#
class loopy.schedule.Barrier(comment: str, synchronization_kind: str, mem_kind: str, originating_insn_id: str)[source]#
comment#

A plain-text comment explaining why the barrier was inserted.

synchronization_kind#

"local" or "global"

mem_kind#

"local" or "global"

originating_insn_id#
class loopy.schedule.RunInstruction(insn_id: 'str')[source]#
class loopy.schedule.MinRecursionLimitForScheduling(kernel)[source]#