Reference: Documentation for Internal API¶
Targets¶
See also Targets.
- class loopy.target.c.POD(ast_builder, dtype, name)[source]¶
A simple declarator: the type is given as a numpy.dtype and the name is given as a string.
- class loopy.target.c.ScopingBlock(contents=None)[source]¶
A block that is mandatory for scoping and may not be simplified away by loopy.codegen.result.merge_codegen_results().
- class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state, fortran_abi=False, type_inf_mapper=None)[source]¶
Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.
Note
All mapper methods take an extra argument called type_context. Its purpose is to inform the method about the expected type of untyped expressions such as Python scalars. The type of the expression itself takes precedence over type_context.
Symbolic¶
See also Expressions.
Loopy-specific expression types¶
- class loopy.symbolic.Literal(s)[source]¶
A literal to be used during code generation.
Note
Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.ArrayLiteral(children)[source]¶
An array literal.
Note
Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.TypedCSE(child, prefix=None, dtype=None)[source]¶
A pymbolic.primitives.CommonSubexpression annotated with a numpy.dtype.
- class loopy.symbolic.TypeCast(type, child)[source]¶
Only defined for numerical types with semantics matching numpy.ndarray.astype().
- child¶
The expression to be cast.
- class loopy.symbolic.TaggedVariable(name, tags)[source]¶
This is an identifier with tags, such as matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, for example by prefetching only accesses tagged a certain way.
- tags¶
A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).
Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.
- class loopy.symbolic.Reduction(operation, inames, expr, allow_simultaneous=False)[source]¶
Represents a reduction operation on expr across inames.
- operation¶
An instance of loopy.library.reduction.ReductionOperation.
- expr¶
An expression which may have tuple type. If the expression has tuple type, it must be one of the following:
a tuple of pymbolic.primitives.Expression, or
a loopy.symbolic.Reduction, or
a function call or substitution rule invocation.
- class loopy.symbolic.LinearSubscript(aggregate, index)[source]¶
Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
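The difference between a layout-aware subscript and a linear one can be sketched in plain Python. This is only an illustration and does not use loopy: the helper row_major_offset is hypothetical, and a row-major layout is assumed for the comparison.

```python
from math import prod


def row_major_offset(shape, index):
    """Flatten a multi-dimensional index into a row-major linear offset.

    Hypothetical helper for illustration; not part of loopy.
    """
    if len(shape) != len(index):
        raise ValueError("index must have one entry per axis")
    offset = 0
    for extent, i in zip(shape, index):
        if not 0 <= i < extent:
            raise IndexError(f"index {i} out of range for extent {extent}")
        offset = offset * extent + i
    return offset


shape = (3, 4)
storage = list(range(prod(shape)))  # flat storage backing a 3x4 array

# A multi-dimensional subscript a[1, 2] is translated through the layout ...
layout_aware = storage[row_major_offset(shape, (1, 2))]

# ... whereas a linear subscript addresses the flat storage directly.
linear = storage[6]
```

Under the assumed row-major layout both accesses reach the same element; with a different layout only the first translation would change, while the linear index would keep addressing position 6 of the storage.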
- class loopy.symbolic.RuleArgument(index)[source]¶
Represents a (numbered) argument of a loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match substitution rules independently of argument names.
- class loopy.symbolic.ExpansionState(kernel, instruction, stack, arg_context)[source]¶
- kernel¶
- instruction¶
- stack¶
a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.
- arg_context¶
a dict representing current argument values
- class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context)[source]¶
Note: the third argument dragged around by this mapper is the current ExpansionState.
Subclasses of this must be careful to not touch identifiers that are in ExpansionState.arg_context.
- class loopy.symbolic.ResolvedFunction(function)[source]¶
A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.
- function¶
An instance of pymbolic.primitives.Variable or loopy.library.reduction.ReductionOpFunction.
- class loopy.symbolic.SubArrayRef(swept_inames, subscript)[source]¶
An algebraic expression to map an affine memory layout pattern (known as sub-array) as consecutive elements of the sweeping axes which are defined using SubArrayRef.swept_inames.
- swept_inames¶
An instance of tuple denoting the axes to which the sub-array is supposed to be mapped.
- subscript¶
An instance of pymbolic.primitives.Subscript denoting the array in the kernel.
Expression Manipulation Helpers¶
- loopy.symbolic.simplify_using_aff(kernel, expr)[source]¶
Simplifies expr on kernel’s domain.
- Parameters:
expr – An instance of pymbolic.primitives.Expression.
Types¶
DTypes of variables in a loopy.LoopKernel must be picklable, so in the codegen pipeline user-provided types are converted to loopy.types.LoopyType.
- class loopy.types.LoopyType[source]¶
Abstract class for dtypes of variables encountered in a loopy.LoopKernel.
- class loopy.types.AtomicType[source]¶
Abstract class for dtypes of variables encountered in a loopy.LoopKernel on which atomic operations are performed.
Codegen¶
- class loopy.codegen.PreambleInfo(kernel: loopy.kernel.LoopKernel, seen_dtypes: Set[loopy.types.LoopyType], seen_functions: Set[loopy.codegen.SeenFunction], seen_atomic_dtypes: Set[loopy.types.LoopyType], codegen_state: loopy.codegen.CodeGenerationState)[source]¶
- class loopy.codegen.VectorizationInfo(iname: str, length: int, space: Space)[source]¶
- iname¶
- length¶
- space¶
- class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: Tuple[LoopyType, ...], result_dtypes: Tuple[LoopyType, ...])[source]¶
This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.
- name¶
- c_name¶
- arg_dtypes¶
a tuple of arg dtypes
- result_dtypes¶
a tuple of result dtypes
- class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: Set, implemented_predicates: FrozenSet[str | int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], seen_dtypes: Set[LoopyType], seen_functions: Set[SeenFunction], seen_atomic_dtypes: Set[LoopyType], var_subst_map: Map[str, int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], allow_complex: bool, callables_table: Mapping[str | ReductionOpFunction, InKernelCallable], is_entrypoint: bool, var_name_generator: UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cachemanager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]¶
- kernel¶
- target¶
- implemented_domain¶
The entire implemented domain (as an islpy.Set), i.e. all constraints that have been enforced so far.
- seen_dtypes¶
set of dtypes that were encountered
- seen_functions¶
set of SeenFunction instances
- seen_atomic_dtypes¶
- var_subst_map¶
- allow_complex¶
- vectorization_info¶
None (to mean vectorization has not yet been applied), or an instance of VectorizationInfo.
- is_generating_device_code¶
- gen_program_name¶
None (indicating that host code is being generated) or the name of the device program currently being generated.
- schedule_index_end¶
- callables_table¶
A mapping from callable names to instances of loopy.kernel.function_interface.InKernelCallable.
- codegen_cache_manager¶
An instance of loopy.codegen.tools.CodegenOperationCacheManager.
- class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[Tuple[int, str]] = (), device_preambles: Sequence[Tuple[int, str]] = ())[source]¶
- host_programs¶
A mapping from names of entrypoints to their host GeneratedProgram.
- device_programs¶
A list of GeneratedProgram instances intended to run on the compute device.
- host_preambles¶
- device_preambles¶
- class loopy.codegen.result.GeneratedProgram(name: str, is_device_program: bool, ast: Any, body_ast: Any | None = None)[source]¶
- name¶
- is_device_program¶
- ast¶
Once generated, this captures the AST of the overall function definition, including the body.
- body_ast¶
Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.
- class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, Set], host_preambles: Sequence[Tuple[str, str]] = (), device_preambles: Sequence[Tuple[str, str]] = ())[source]¶
- host_program¶
- device_programs¶
A list of GeneratedProgram instances intended to run on the compute device.
- host_preambles¶
- device_preambles¶
- loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) CodeGenerationResult [source]¶
- class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: List[InstructionBase], linearization: List[ScheduleItem], inames: Dict[str, Iname])[source]¶
Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.
- class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]¶
Caches operations arising during the codegen pipeline.
- kernel_proxy¶
An instance of KernelProxyForCodegenOperationCacheManager.
- with_kernel(kernel)[source]¶
Returns a new instance of CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.
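The "reuse self unless the cache would be invalid" pattern described for with_kernel can be sketched without loopy. Everything below is hypothetical: the class SketchCacheManager, its with_items method and the choice of comparing a tuple of items are assumptions made for illustration, not the criteria loopy actually uses.

```python
class SketchCacheManager:
    """Hypothetical stand-in for a cache manager tied to some input data."""

    def __init__(self, items):
        # Snapshot of the data the cached results depend on.
        self.items = tuple(items)
        self._cache = {}

    def with_items(self, items):
        """Return self if the cache is still valid for items, else a fresh manager."""
        if tuple(items) == self.items:
            return self
        return SketchCacheManager(items)

    def total_length(self):
        """An example of a cached derived quantity."""
        if "total_length" not in self._cache:
            self._cache["total_length"] = sum(len(item) for item in self.items)
        return self._cache["total_length"]


manager = SketchCacheManager(["insn_a", "insn_b"])
same = manager.with_items(["insn_a", "insn_b"])        # cache stays valid
fresh = manager.with_items(["insn_a", "insn_b", "c"])  # cache would be stale
```

Returning self in the unchanged case keeps already computed results available, while the fresh instance starts with an empty cache so that stale results cannot leak across inputs.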
- get_concurrent_inames_in_a_callkernel(callkernel_index: int) FrozenSet[str] [source]¶
Returns a frozenset of concurrent inames in a callkernel.
- Parameters:
callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.
Reduction Operation¶
Iname Tags¶
- loopy.kernel.data.filter_iname_tags_by_type(tags, tag_type, max_num=None, min_num=None)[source]¶
Return the subset of tags that match type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.
- Parameters:
tags – An iterable of tags.
tag_type – a subclass of loopy.kernel.data.InameImplementationTag.
max_num – the maximum number of tags expected to be found.
min_num – the minimum number of tags expected to be found.
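The filtering behaviour described above can be sketched as follows. This is a standalone re-implementation for illustration, not loopy's code: the tag classes are hypothetical stand-ins, ValueError is an assumed exception type, and returning a frozenset is likewise an assumption.

```python
class SketchTag:
    """Hypothetical base class standing in for an iname tag."""


class ParallelSketchTag(SketchTag):
    """Hypothetical tag type to filter for."""


class UnrollSketchTag(SketchTag):
    """Hypothetical unrelated tag type."""


def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Return the tags that are instances of tag_type, checking the count bounds."""
    result = frozenset(tag for tag in tags if isinstance(tag, tag_type))
    if max_num is not None and len(result) > max_num:
        raise ValueError(f"found {len(result)} tags, expected at most {max_num}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(f"found {len(result)} tags, expected at least {min_num}")
    return result


parallel = ParallelSketchTag()
tags = [parallel, UnrollSketchTag()]
matching = filter_tags_by_type(tags, ParallelSketchTag, max_num=1)
```

Passing max_num=1 is a convenient way to assert that at most one tag of a given kind is attached before acting on it.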
- class loopy.kernel.data.Iname(name: str, tags: FrozenSet[Tag])[source]¶
Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.
This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.
- tags¶
An instance of frozenset of pytools.tag.Tag.
Array¶
- class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]¶
- target_axis¶
For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.
Note that “normal” arrays only have one target_axis.
- layout_nesting_level¶
For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. Within a single ArrayBase.dim_tags, the nesting levels must form a contiguous sequence of unique integers starting at 0. The lowest nesting level varies fastest when viewed in linear memory.
May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
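How nesting levels translate into strides can be sketched in a few lines. The function strides_from_nesting_levels is hypothetical and ignores padding (pad_to); it only illustrates the rule that the lowest nesting level varies fastest.

```python
def strides_from_nesting_levels(shape, nesting_levels):
    """Compute per-axis strides (in elements) from layout nesting levels.

    Hypothetical helper for illustration; padding is not modelled.
    """
    if sorted(nesting_levels) != list(range(len(shape))):
        raise ValueError("nesting levels must be the unique integers 0..n-1")
    strides = [0] * len(shape)
    stride = 1
    # Walk from the fastest-varying level (0) to the slowest.
    for level in range(len(shape)):
        axis = nesting_levels.index(level)
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


# Axis 1 has level 0, so it varies fastest: a row-major layout.
row_major = strides_from_nesting_levels((3, 4), (1, 0))

# Axis 0 has level 0, so it varies fastest: a column-major layout.
column_major = strides_from_nesting_levels((3, 4), (0, 1))
```

The uniqueness and contiguity requirement on the levels is what makes this walk well defined: each level selects exactly one axis.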
- class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]¶
An arg dimension implementation tag for a fixed (potentially symbolic) stride.
- stride¶
May be one of the following:
A pymbolic.primitives.Expression, including an integer, indicating the stride in units of the underlying array’s ArrayBase.dtype.
loopy.auto, indicating that a new kernel argument for this stride should automatically be created.
The stride is given in units of ArrayBase.dtype.
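Because strides are expressed in units of the array's dtype rather than in bytes, turning an index into a memory offset takes two steps. The sketch below uses hypothetical helper names and plain integers for the strides; it also shows the additive combination of per-axis contributions.

```python
def element_offset(strides, index):
    """Offset in elements: each axis contributes stride * index, summed."""
    if len(strides) != len(index):
        raise ValueError("need one index per stride")
    return sum(stride * i for stride, i in zip(strides, index))


def byte_offset(strides, index, itemsize):
    """Offset in bytes, given the size of one element of the dtype."""
    return element_offset(strides, index) * itemsize


# A 3x4 row-major array of 8-byte elements has element strides (4, 1).
offset_in_elements = element_offset((4, 1), (2, 3))
offset_in_bytes = byte_offset((4, 1), (2, 3), itemsize=8)
```

Keeping strides in element units means the same stride values stay valid if the element type changes; only the final multiplication by the item size differs.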
- class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]¶
- pad_to¶
ArrayBase.dtype granularity to which to pad this dimension
This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.
Checks¶
- loopy.check.check_for_integer_subscript_indices(t_unit)[source]¶
Checks that the subscript indices of every array access are of type int.
- loopy.check.check_for_duplicate_insn_ids(knl)[source]¶
Check if multiple instructions of knl have the same loopy.InstructionBase.id.
- loopy.check.check_for_double_use_of_hw_axes(t_unit)[source]¶
Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.
- loopy.check.check_insn_attributes(kernel)[source]¶
Check for legality of attributes of every instruction in kernel.
- loopy.check.check_loop_priority_inames_known(kernel)[source]¶
Checks if the inames in loopy.LoopKernel.loop_priority are part of the kernel’s domain.
- loopy.check.check_multiple_tags_allowed(kernel)[source]¶
Checks if multiple tags of an iname are compatible.
- loopy.check.check_for_inactive_iname_access(kernel)[source]¶
Check if any instruction accesses an iname but is not within it.
- loopy.check.check_for_unused_inames(kernel)[source]¶
Check if there are any unused inames in the kernel.
- loopy.check.check_for_write_races(kernel)[source]¶
Check if any memory accesses lead to write races.
- loopy.check.check_for_data_dependent_parallel_bounds(kernel)[source]¶
Check that inames tagged as hw axes have bounds that are known at kernel launch.
- loopy.check.check_variable_access_ordered(kernel)[source]¶
Checks that between each write to a variable and all other accesses to the variable there is either:
a direct/indirect dependency edge, or
an explicit statement that no ordering is necessary (expressed through a bi-directional loopy.InstructionBase.no_sync_with)
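The rule above can be sketched on a toy dependency graph. This is not loopy's implementation: the function names and the data layout are hypothetical, instructions are represented by plain string ids, and no_sync_with is simplified to a mapping from an id to a set of ids.

```python
def reachable(deps, start):
    """Return all ids reachable from start by following dependency edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def find_unordered_accesses(writers, accessors, deps, no_sync_with):
    """Return sorted pairs (write, other access) that are neither ordered nor waived."""
    problems = set()
    for writer in writers:
        for other in accessors:
            if other == writer:
                continue
            # Ordered: a direct or indirect dependency in either direction.
            ordered = (other in reachable(deps, writer)
                       or writer in reachable(deps, other))
            # Waived: both sides declare that no ordering is necessary.
            waived = (other in no_sync_with.get(writer, ())
                      and writer in no_sync_with.get(other, ()))
            if not (ordered or waived):
                problems.add(tuple(sorted((writer, other))))
    return sorted(problems)


deps = {"read": {"write1"}}          # "read" depends on "write1"
writers = {"write1", "write2"}
accessors = {"write1", "write2", "read"}

unordered = find_unordered_accesses(writers, accessors, deps, {})
```

Here "write2" has no dependency path to or from the other two accesses and nothing waives the ordering, so both pairs involving it are reported; adding matching no_sync_with entries on both sides of each pair would clear them.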