Reference: Documentation for Internal API#
Targets#
See also Targets.
- class loopy.target.c.POD(ast_builder, dtype, name)[source]#
A simple declarator: the type is given as a numpy.dtype and the name is given as a string.
- class loopy.target.c.ScopingBlock(contents=None)[source]#
A block that is mandatory for scoping and may not be simplified away by loopy.codegen.result.merge_codegen_results().
- class loopy.target.c.codegen.expression.ExpressionToCExpressionMapper(codegen_state, fortran_abi=False, type_inf_mapper=None)[source]#
Mapper that converts a loopy-semantic expression to a C-semantic expression with typecasts, appropriate arithmetic semantic mapping, etc.
Note
All mapper methods take in an extra argument called type_context. The purpose of type_context is to inform the method about the expected type for untyped expressions such as Python scalars. The type of the expression takes precedence over type_context.
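The precedence rule in the note can be illustrated with a small stand-alone sketch. It does not call loopy: the helper `effective_type` and the string type names are hypothetical and only model the stated rule that an expression's own type wins over type_context.

```python
from typing import Optional


def effective_type(expr_type: Optional[str],
                   type_context: Optional[str]) -> Optional[str]:
    """Pick the type used for an expression (hypothetical helper).

    expr_type is None for untyped expressions such as Python scalars.
    """
    # A typed expression keeps its own type; type_context is only a fallback.
    if expr_type is not None:
        return expr_type
    return type_context


# An untyped scalar such as 1.5 adopts the surrounding context.
untyped = effective_type(None, "float32")
# A float64 variable keeps its type, even in a float32 context.
typed = effective_type("float64", "float32")
```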
Symbolic#
See also Expressions.
Loopy-specific expression types#
- class loopy.symbolic.Literal(s)[source]#
A literal to be used during code generation.
Note
Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.ArrayLiteral(children)[source]#
An array literal.
Note
Only used in the output of loopy.target.c.codegen.expression.ExpressionToCExpressionMapper (and similar mappers). Not for use in Loopy source representation.
- class loopy.symbolic.TypedCSE(child, prefix=None, dtype=None)[source]#
A pymbolic.primitives.CommonSubexpression annotated with a numpy.dtype.
- class loopy.symbolic.TypeCast(type, child)[source]#
Only defined for numerical types with semantics matching numpy.ndarray.astype().
- child#
The expression to be cast.
- class loopy.symbolic.TaggedVariable(name, tags)[source]#
This is an identifier with tags, such as matrix$one, where ‘one’ identifies this specific use of the identifier. This mechanism may then be used to address these uses, such as by prefetching only accesses tagged a certain way.
- tags#
A frozenset of subclasses of pytools.tag.Tag used to provide metadata on this object. Legacy string tags are converted to LegacyStringInstructionTag or, if they used to carry a functional meaning, the tag carrying that same functional meaning (e.g. UseStreamingStoreTag).
Inherits from pymbolic.primitives.Variable and pytools.tag.Taggable.
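The idea of addressing individual uses of an identifier through its tag can be sketched without loopy. In this stand-alone example the helper `split_tagged_identifier` is hypothetical, and plain strings stand in for pytools.tag.Tag instances; it is not how loopy parses or stores tagged variables.

```python
def split_tagged_identifier(identifier: str):
    """Split 'name$tag' into (name, frozenset of tag strings).

    Hypothetical helper for illustration only.
    """
    name, sep, tag = identifier.partition("$")
    tags = frozenset([tag]) if sep else frozenset()
    return name, tags


accesses = ["matrix$one", "matrix$two", "matrix", "vec$one"]

# Select only the uses of 'matrix' tagged 'one', e.g. as prefetch candidates.
selected = [
    ident for ident in accesses
    if split_tagged_identifier(ident) == ("matrix", frozenset({"one"}))
]
```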
- class loopy.symbolic.Reduction(operation, inames, expr, allow_simultaneous=False)[source]#
Represents a reduction operation on expr across inames.
- operation#
An instance of loopy.library.reduction.ReductionOperation.
- expr#
An expression which may have tuple type. If the expression has tuple type, it must be one of the following:
a tuple of pymbolic.primitives.Expression, or
a loopy.symbolic.Reduction, or
a function call or substitution rule invocation.
- class loopy.symbolic.LinearSubscript(aggregate, index)[source]#
Represents a linear index into a multi-dimensional array, completely ignoring any multi-dimensional layout.
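The difference between an ordinary subscript and a linear subscript can be sketched on a flat Python list. This is a stand-alone illustration, not loopy code: the row-major layout and the helper names `multi_dim_access` and `linear_access` are assumptions made for the example.

```python
# A 2 x 3 array stored as one flat buffer (row-major order is an assumption
# made for this example only).
nrows, ncols = 2, 3
flat = [10, 11, 12, 20, 21, 22]


def multi_dim_access(i: int, j: int) -> int:
    """Ordinary subscript a[i, j]: the layout decides the flat offset."""
    return flat[i * ncols + j]


def linear_access(index: int) -> int:
    """Linear subscript: index the underlying storage directly."""
    return flat[index]


# Both expressions reach the same element; only the second ignores the shape.
same = multi_dim_access(1, 2) == linear_access(5)
```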
- class loopy.symbolic.RuleArgument(index)[source]#
Represents a (numbered) argument of a loopy.SubstitutionRule. Only used internally in the rule-aware mappers to match substitution rules independently of argument names.
- class loopy.symbolic.ExpansionState(kernel, instruction, stack, arg_context)[source]#
- kernel#
- instruction#
- stack#
a tuple representing the current expansion stack, as a tuple of (name, tag) pairs.
- arg_context#
a dict representing current argument values
- class loopy.symbolic.RuleAwareIdentityMapper(rule_mapping_context)[source]#
Note: the third argument dragged around by this mapper is the current ExpansionState.
Subclasses of this must be careful to not touch identifiers that are in ExpansionState.arg_context.
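The caution about ExpansionState.arg_context can be sketched as follows. This is a stand-alone model, not a loopy mapper: `ExpansionStateSketch` and `rename_identifiers` are hypothetical names, and identifiers are represented as plain strings.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class ExpansionStateSketch(NamedTuple):
    """Hypothetical stand-in with two of the fields documented above."""
    stack: Tuple[Tuple[str, Optional[str]], ...]  # (name, tag) pairs
    arg_context: Dict[str, object]                # current argument values


def rename_identifiers(names: List[str], renames: Dict[str, str],
                       expn_state: ExpansionStateSketch) -> List[str]:
    """Rename identifiers, leaving rule arguments untouched."""
    result = []
    for name in names:
        if name in expn_state.arg_context:
            # Bound as a substitution-rule argument: leave it alone.
            result.append(name)
        else:
            result.append(renames.get(name, name))
    return result


state = ExpansionStateSketch(stack=(("f", None),), arg_context={"x": 3})
renamed = rename_identifiers(["x", "y"], {"x": "x_new", "y": "y_new"}, state)
```

Here `x` is bound in arg_context, so only `y` is renamed.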
- class loopy.symbolic.ResolvedFunction(function)[source]#
A function identifier whose definition is known in a loopy program. A function is said to be known in a TranslationUnit if its name maps to an InKernelCallable in loopy.TranslationUnit.callables_table. Refer to Function Interface.
- function#
An instance of pymbolic.primitives.Variable or loopy.library.reduction.ReductionOpFunction.
- class loopy.symbolic.SubArrayRef(swept_inames, subscript)[source]#
An algebraic expression to map an affine memory layout pattern (known as a sub-array) as consecutive elements of the sweeping axes, which are defined using SubArrayRef.swept_inames.
- swept_inames#
An instance of tuple denoting the axes to which the sub-array is supposed to be mapped.
- subscript#
An instance of pymbolic.primitives.Subscript denoting the array in the kernel.
Expression Manipulation Helpers#
- loopy.symbolic.simplify_using_aff(kernel, expr)[source]#
Simplifies expr on kernel’s domain.
- Parameters:
expr – An instance of pymbolic.primitives.Expression.
Types#
DTypes of variables in a loopy.LoopKernel must be picklable, so in the codegen pipeline user-provided types are converted to loopy.types.LoopyType.
- class loopy.types.LoopyType[source]#
Abstract class for dtypes of variables encountered in a loopy.LoopKernel.
- class loopy.types.AtomicType[source]#
Abstract class for dtypes of variables encountered in a loopy.LoopKernel on which atomic operations are performed.
Codegen#
- class loopy.codegen.PreambleInfo(kernel: loopy.kernel.LoopKernel, seen_dtypes: Set[loopy.types.LoopyType], seen_functions: Set[loopy.codegen.SeenFunction], seen_atomic_dtypes: Set[loopy.types.LoopyType], codegen_state: loopy.codegen.CodeGenerationState)[source]#
- class loopy.codegen.VectorizationInfo(iname: str, length: int, space: Space)[source]#
- iname#
- length#
- space#
- class loopy.codegen.SeenFunction(name: str, c_name: str, arg_dtypes: Tuple[LoopyType, ...], result_dtypes: Tuple[LoopyType, ...])[source]#
This is used to track functions that emerge late during code generation, e.g. C functions to realize arithmetic. No connection with InKernelCallable.
- name#
- c_name#
- arg_dtypes#
a tuple of arg dtypes
- result_dtypes#
a tuple of result dtypes
- class loopy.codegen.CodeGenerationState(kernel: LoopKernel, target: TargetBase, implemented_domain: Set, implemented_predicates: FrozenSet[str | int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], seen_dtypes: Set[LoopyType], seen_functions: Set[SeenFunction], seen_atomic_dtypes: Set[LoopyType], var_subst_map: Map[str, int | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float | complex | float32 | float64 | complex64 | complex128 | Expression], allow_complex: bool, callables_table: Mapping[str, InKernelCallable], is_entrypoint: bool, var_name_generator: UniqueNameGenerator, is_generating_device_code: bool, gen_program_name: str, schedule_index_end: int, codegen_cachemanager: CodegenOperationCacheManager, vectorization_info: VectorizationInfo | None = None)[source]#
- kernel#
- target#
- implemented_domain#
The entire implemented domain (as an islpy.Set), i.e. all constraints that have been enforced so far.
- seen_dtypes#
set of dtypes that were encountered
- seen_functions#
set of SeenFunction instances
- seen_atomic_dtypes#
- var_subst_map#
- allow_complex#
- vectorization_info#
None (to mean vectorization has not yet been applied), or an instance of VectorizationInfo.
- is_generating_device_code#
- gen_program_name#
None (indicating that host code is being generated) or the name of the device program currently being generated.
- schedule_index_end#
- callables_table#
A mapping from callable names to instances of loopy.kernel.function_interface.InKernelCallable.
- codegen_cache_manager#
An instance of loopy.codegen.tools.CodegenOperationCacheManager.
- class loopy.codegen.TranslationUnitCodeGenerationResult(host_programs: Mapping[str, GeneratedProgram], device_programs: Sequence[GeneratedProgram], host_preambles: Sequence[Tuple[int, str]] = (), device_preambles: Sequence[Tuple[int, str]] = ())[source]#
- host_programs#
A mapping from names of entrypoints to their host GeneratedProgram.
- device_programs#
A list of GeneratedProgram instances intended to run on the compute device.
- host_preambles#
- device_preambles#
- class loopy.codegen.result.GeneratedProgram(name: str, is_device_program: bool, ast: Any, body_ast: Any | None = None)[source]#
- name#
- is_device_program#
- ast#
Once generated, this captures the AST of the overall function definition, including the body.
- body_ast#
Once generated, this captures the AST of the operative function body (including declaration of necessary temporaries), but not the overall function definition.
- class loopy.codegen.result.CodeGenerationResult(host_program: GeneratedProgram | None, device_programs: Sequence[GeneratedProgram], implemented_domains: Mapping[str, Set], host_preambles: Sequence[Tuple[str, str]] = (), device_preambles: Sequence[Tuple[str, str]] = ())[source]#
- host_program#
- device_programs#
A list of GeneratedProgram instances intended to run on the compute device.
- host_preambles#
- device_preambles#
- loopy.codegen.result.merge_codegen_results(codegen_state: CodeGenerationState, elements: Sequence[CodeGenerationResult | Any], collapse=True) CodeGenerationResult [source]#
- class loopy.codegen.tools.KernelProxyForCodegenOperationCacheManager(instructions: List[InstructionBase], linearization: List[ScheduleItem], inames: Dict[str, Iname])[source]#
Proxy to loopy.LoopKernel to be used by CodegenOperationCacheManager.
- class loopy.codegen.tools.CodegenOperationCacheManager(kernel_proxy)[source]#
Caches operations arising during the codegen pipeline.
- kernel_proxy#
An instance of KernelProxyForCodegenOperationCacheManager.
- with_kernel(kernel)[source]#
Returns a new instance of CodegenOperationCacheManager corresponding to kernel if the cached variables in self would be invalid for kernel, else returns self.
- get_concurrent_inames_in_a_callkernel(callkernel_index: int) FrozenSet[str] [source]#
Returns a frozenset of concurrent inames in a callkernel.
- Parameters:
callkernel_index – Index of the loopy.schedule.CallKernel in the CodegenOperationCacheManager.kernel_proxy’s schedule, whose parallel inames are to be found.
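The reuse-or-replace behavior described for with_kernel() can be sketched with a small stand-alone cache. The classes `KernelProxySketch` and `CacheManagerSketch` are hypothetical, and comparing proxies by equality is an assumption of this sketch, not a statement about how loopy decides validity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class KernelProxySketch:
    """Hypothetical stand-in for the kernel data the cache depends on."""
    instructions: Tuple[str, ...]
    linearization: Tuple[str, ...]


class CacheManagerSketch:
    """Hypothetical cache that is replaced only when its inputs change."""

    def __init__(self, kernel_proxy: KernelProxySketch):
        self.kernel_proxy = kernel_proxy
        self._cache = {}

    def with_kernel(self, kernel_proxy: KernelProxySketch) -> "CacheManagerSketch":
        # Equal inputs mean every cached value is still valid: reuse self.
        if kernel_proxy == self.kernel_proxy:
            return self
        # Otherwise start over with an empty cache for the new kernel.
        return CacheManagerSketch(kernel_proxy)


proxy = KernelProxySketch(("insn0",), ("run insn0",))
manager = CacheManagerSketch(proxy)
reused = manager.with_kernel(KernelProxySketch(("insn0",), ("run insn0",)))
fresh = manager.with_kernel(KernelProxySketch(("insn0", "insn1"), ("run insn0",)))
```

Returning self when nothing relevant changed keeps previously computed results available across transformations that do not affect them.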
Reduction Operation#
Iname Tags#
- loopy.kernel.data.filter_iname_tags_by_type(tags, tag_type, max_num=None, min_num=None)[source]#
Return a subset of tags that matches type tag_type. Raises an exception if the number of tags found is greater than max_num or less than min_num.
- Parameters:
tags – An iterable of tags.
tag_type – a subclass of loopy.kernel.data.InameImplementationTag.
max_num – the maximum number of tags expected to be found.
min_num – the minimum number of tags expected to be found.
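The documented behavior can be modeled with a short stand-alone function. This is a sketch, not loopy's implementation: the tag classes are hypothetical, matching by isinstance() is an assumption, and ValueError is an assumed exception type since the text above only says that an exception is raised.

```python
class Tag:
    """Hypothetical base tag class for this sketch."""


class GroupTag(Tag):
    pass


class UnrollTag(Tag):
    pass


def filter_tags_by_type(tags, tag_type, max_num=None, min_num=None):
    """Return the tags that are instances of tag_type, checking the count."""
    result = {tag for tag in tags if isinstance(tag, tag_type)}

    # The bounds are optional; None disables the corresponding check.
    if max_num is not None and len(result) > max_num:
        raise ValueError(
            f"cannot have more than {max_num} tags of type {tag_type.__name__}")
    if min_num is not None and len(result) < min_num:
        raise ValueError(
            f"must have at least {min_num} tags of type {tag_type.__name__}")
    return result


group, unroll = GroupTag(), UnrollTag()
only_group = filter_tags_by_type([group, unroll], GroupTag, max_num=1)
```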
- class loopy.kernel.data.Iname(name: str, tags: FrozenSet[Tag])[source]#
Records an iname in a LoopKernel. See Loop Domain Forest for semantics of inames in loopy.
This class records the metadata attached to an iname as instances of pytools.tag.Tag. A tag may be a builtin tag like loopy.kernel.data.InameImplementationTag or a user-defined custom tag. Custom tags may be attached to inames to be used in targeting later during transformations.
- tags#
An instance of frozenset of pytools.tag.Tag.
Array#
- class loopy.kernel.array._StrideArrayDimTagBase(*args, **kwargs)[source]#
- target_axis#
For objects (such as images) with more than one axis, target_axis sets which of these indices is being targeted by this dimension. Note that there may be multiple dim_tags with the same target_axis; their contributions are combined additively.
Note that “normal” arrays only have one target_axis.
- layout_nesting_level#
For determining the stride of ComputedStrideArrayDimTag, this determines the layout nesting level of this axis. This must be a contiguous sequence of unique integers starting at 0 in a single ArrayBase.dim_tags. The lowest nesting level varies fastest when viewed in linear memory.
May be None on FixedStrideArrayDimTag, in which case no ComputedStrideArrayDimTag instances may occur.
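How nesting levels translate into strides can be sketched as follows. This stand-alone helper, `strides_from_nesting_levels`, is hypothetical and deliberately ignores padding (pad_to) and multiple target axes; it only models the rule that the lowest nesting level varies fastest.

```python
def strides_from_nesting_levels(shape, nesting_levels):
    """Compute per-axis strides in units of array elements.

    nesting_levels[i] is the layout nesting level of axis i.
    Hypothetical helper for illustration only.
    """
    # The levels must be exactly 0, 1, ..., n-1 in some order.
    if sorted(nesting_levels) != list(range(len(shape))):
        raise ValueError(
            "nesting levels must be unique, contiguous, and start at 0")

    strides = [0] * len(shape)
    stride = 1
    # Visit axes from the lowest nesting level (fastest-varying) upward.
    for axis in sorted(range(len(shape)), key=lambda ax: nesting_levels[ax]):
        strides[axis] = stride
        stride *= shape[axis]
    return tuple(strides)


# Last axis has level 0, so it is contiguous (row-major / C order).
c_order = strides_from_nesting_levels((3, 4), (1, 0))
# First axis has level 0, so it is contiguous (column-major / Fortran order).
f_order = strides_from_nesting_levels((3, 4), (0, 1))
```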
- class loopy.kernel.array.FixedStrideArrayDimTag(stride, target_axis=0, layout_nesting_level=None)[source]#
An arg dimension implementation tag for a fixed (potentially symbolic) stride.
- stride#
May be one of the following:
A pymbolic.primitives.Expression, including an integer, indicating the stride in units of the underlying array’s ArrayBase.dtype.
loopy.auto, indicating that a new kernel argument for this stride should automatically be created.
The stride is given in units of ArrayBase.dtype.
- class loopy.kernel.array.ComputedStrideArrayDimTag(layout_nesting_level, pad_to=None, target_axis=0)[source]#
- pad_to#
ArrayBase.dtype granularity to which to pad this dimension
This type of stride arg dim gets converted to FixedStrideArrayDimTag on input to ArrayBase subclasses.
Checks#
- loopy.check.check_for_integer_subscript_indices(t_unit)[source]#
Checks if every array access is of type int.
- loopy.check.check_for_duplicate_insn_ids(knl)[source]#
Check if multiple instructions of knl have the same loopy.InstructionBase.id.
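The condition being checked can be sketched on a plain list of ids. The helper `find_duplicate_ids` is hypothetical and operates on strings rather than on a kernel's instructions.

```python
from collections import Counter


def find_duplicate_ids(insn_ids):
    """Return the sorted ids that occur more than once (hypothetical helper)."""
    counts = Counter(insn_ids)
    return sorted(insn_id for insn_id, n in counts.items() if n > 1)


duplicates = find_duplicate_ids(["init", "update", "init", "store"])
```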
- loopy.check.check_for_double_use_of_hw_axes(t_unit)[source]#
Check if any instruction of kernel is within multiple inames tagged with the same hw axis tag.
- loopy.check.check_insn_attributes(kernel)[source]#
Check for legality of attributes of every instruction in kernel.
- loopy.check.check_loop_priority_inames_known(kernel)[source]#
Checks if the inames in loopy.LoopKernel.loop_priority are part of the kernel’s domain.
- loopy.check.check_multiple_tags_allowed(kernel)[source]#
Checks if multiple tags of an iname are compatible.
- loopy.check.check_for_inactive_iname_access(kernel)[source]#
Check if any instruction accesses an iname but is not within it.
- loopy.check.check_for_unused_inames(kernel)[source]#
Check if there are any unused inames in the kernel.
- loopy.check.check_for_write_races(kernel)[source]#
Check if any memory accesses lead to write races.
- loopy.check.check_for_data_dependent_parallel_bounds(kernel)[source]#
Check that inames tagged as hw axes have bounds that are known at kernel launch.
- loopy.check.check_variable_access_ordered(kernel)[source]#
Checks that between each write to a variable and all other accesses to the variable there is either:
a direct/indirect dependency edge, or
an explicit statement that no ordering is necessary (expressed through a bi-directional loopy.InstructionBase.no_sync_with)
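The two conditions above can be modeled on a toy dependency graph. This is a stand-alone sketch under simplifying assumptions, not loopy's check: each instruction accesses a single variable, no_sync_with holds plain instruction ids, and the names `depends_on` and `unordered_accesses` are hypothetical.

```python
from itertools import combinations


def depends_on(insn, other, deps):
    """True if insn reaches other through direct or indirect dependency edges."""
    seen, todo = set(), [insn]
    while todo:
        current = todo.pop()
        for dep in deps.get(current, ()):
            if dep == other:
                return True
            if dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return False


def unordered_accesses(accesses, deps, no_sync_with):
    """List pairs of instructions whose accesses to a variable are unordered.

    accesses maps an instruction id to (variable, "r" or "w").
    """
    problems = []
    for a, b in combinations(sorted(accesses), 2):
        (var_a, kind_a), (var_b, kind_b) = accesses[a], accesses[b]
        # Only pairs on the same variable with at least one write matter.
        if var_a != var_b or "w" not in (kind_a, kind_b):
            continue
        ordered = depends_on(a, b, deps) or depends_on(b, a, deps)
        # The waiver must be declared in both directions.
        waived = (b in no_sync_with.get(a, ())
                  and a in no_sync_with.get(b, ()))
        if not (ordered or waived):
            problems.append((a, b))
    return problems


accesses = {"w0": ("x", "w"), "r0": ("x", "r"), "r1": ("x", "r")}
deps = {"r0": ["w0"]}  # r0 depends on w0; r1 has no dependency
problems = unordered_accesses(accesses, deps, no_sync_with={})
```

In this example `r1` reads `x` without any ordering relative to the write in `w0`, so that pair is reported; declaring the waiver in both directions would clear it.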