Function Interface
Resolving and specialization
In loopy, a loopy.TranslationUnit is a collection of callables
and entrypoints. Callables are of type
loopy.kernel.function_interface.InKernelCallable. Functions start life
as simple pymbolic.primitives.Call nodes. Call resolution turns the function
identifiers in those calls into ResolvedFunction objects.
Each resolved function has an entry in TranslationUnit.callables_table.
The process of realizing a function as an InKernelCallable is referred to as
resolving.
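The resolution step described above can be pictured with a small toy model. This is only a sketch in plain Python: `Variable`, `Call`, `ResolvedName` and `resolve` are hypothetical stand-ins written for illustration, not loopy or pymbolic classes, and the set of known callables is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Toy identifier node (hypothetical, not a pymbolic class)."""
    name: str


@dataclass(frozen=True)
class ResolvedName:
    """Toy marker for a function identifier that has been resolved."""
    name: str


@dataclass(frozen=True)
class Call:
    """Toy call node: a function identifier applied to parameters."""
    function: object
    parameters: tuple


def resolve(expr, known_callables):
    """Return a copy of expr in which known function identifiers are resolved."""
    if isinstance(expr, Call):
        params = tuple(resolve(p, known_callables) for p in expr.parameters)
        func = expr.function
        if isinstance(func, Variable) and func.name in known_callables:
            func = ResolvedName(func.name)
        return Call(func, params)
    # Anything that is not a call is left untouched.
    return expr


known = {"sin", "gemv"}  # assumed contents of a toy callables table
expr = Call(Variable("sin"), (Call(Variable("mystery"), (Variable("x"),)),))
resolved = resolve(expr, known)
print(resolved)
```

In this toy, `sin` is found in the table and becomes a `ResolvedName`, while the unknown `mystery` stays an ordinary identifier.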
During code generation for a TranslationUnit, a (resolved) callable
is specialized depending on the types and shapes of the arguments passed at a
call site. For example, a call to sin(x) in loopy is type-generic to
begin with, but it is later specialized to sinf, sin or sinl
depending on the type of its argument x. A callable’s behavior during type
or shape specialization is encoded via with_types() and with_descrs().
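The sin(x) example can be mimicked with a toy callable in plain Python. This is a sketch under stated assumptions, not loopy's implementation: `ToySin` and the dtype-name table are hypothetical, dtypes are represented as plain strings, and the method returns only the new callable rather than a tuple.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Mapping, Optional

# Assumed mapping from dtype name to the C math-library function name.
_SIN_BY_DTYPE = {"float32": "sinf", "float64": "sin", "float128": "sinl"}


@dataclass(frozen=True)
class ToySin:
    """Toy stand-in for a type-generic callable (not a loopy class)."""
    name: str = "sin"
    name_in_target: Optional[str] = None
    arg_id_to_dtype: Optional[Mapping[int, str]] = None

    def with_types(self, arg_id_to_dtype):
        dtype = arg_id_to_dtype.get(0)
        if dtype is None:
            # Not enough type information yet: stay generic.
            return self
        if dtype not in _SIN_BY_DTYPE:
            raise TypeError(f"sin is not supported for {dtype}")
        # Argument 0 and the return value (-1) share the same dtype.
        return replace(
            self,
            name_in_target=_SIN_BY_DTYPE[dtype],
            arg_id_to_dtype=MappingProxyType({0: dtype, -1: dtype}),
        )


generic = ToySin()
print(generic.with_types({0: "float32"}).name_in_target)  # sinf
print(generic.with_types({0: "float64"}).name_in_target)  # sin
print(generic.with_types({}).name_in_target)              # None
```

Returning a modified copy instead of mutating the callable mirrors the `self.copy(...)` pattern used in the GEMV example in this document.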
Registering callables
A user can register callables within a TranslationUnit to
allow loopy to resolve calls not pre-defined in loopy. In loopy,
we typically aim to expose all the standard math functions defined for
a TargetBase. Other foreign functions can be invoked by
registering them.
The following example registers a CBLAS GEMV wrapper (CBLASGEMV) as a loopy callable:
import numpy as np
from constantdict import constantdict

import loopy as lp
from loopy.diagnostic import LoopyError
from loopy.target.c import CTarget
from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2  # noqa: F401


# {{{ blas callable

class CBLASGEMV(lp.ScalarCallable):
    def with_types(self, arg_id_to_dtype, callables_table):
        mat_dtype = arg_id_to_dtype.get(0)
        vec_dtype = arg_id_to_dtype.get(1)

        if mat_dtype is None or vec_dtype is None:
            # types aren't specialized enough to be resolved
            return self, callables_table

        if mat_dtype != vec_dtype:
            raise LoopyError("GEMV requires same dtypes for matrix and "
                             "vector")

        if vec_dtype.numpy_dtype == np.float32:
            name_in_target = "cblas_sgemv"
        elif vec_dtype.numpy_dtype == np.float64:
            name_in_target = "cblas_dgemv"
        else:
            raise LoopyError("GEMV is only supported for float32 and float64 "
                             "types")

        return (self.copy(name_in_target=name_in_target,
                          arg_id_to_dtype=constantdict({
                              0: vec_dtype,
                              1: vec_dtype,
                              -1: vec_dtype})),
                callables_table)

    def with_descrs(self, arg_id_to_descr, callables_table):
        mat_descr = arg_id_to_descr.get(0)
        vec_descr = arg_id_to_descr.get(1)
        res_descr = arg_id_to_descr.get(-1)

        if mat_descr is None or vec_descr is None or res_descr is None:
            # shapes aren't specialized enough to be resolved
            return self, callables_table

        assert mat_descr.shape[1] == vec_descr.shape[0]
        assert mat_descr.shape[0] == res_descr.shape[0]
        assert len(vec_descr.shape) == len(res_descr.shape) == 1
        # handling only the easy case when stride == 1
        assert vec_descr.dim_tags[0].stride == 1
        assert mat_descr.dim_tags[1].stride == 1
        assert res_descr.dim_tags[0].stride == 1

        return self.copy(arg_id_to_descr=arg_id_to_descr), callables_table

    def emit_call_insn(self, insn, target, expression_to_code_mapper):
        from pymbolic import var
        mat_descr = self.arg_id_to_descr[0]
        m, n = mat_descr.shape
        ecm = expression_to_code_mapper
        mat, vec = insn.expression.parameters
        result, = insn.assignees

        c_parameters = [var("CblasRowMajor"),
                        var("CblasNoTrans"),
                        m, n,
                        1,
                        ecm(mat).expr,
                        1,
                        ecm(vec).expr,
                        1,
                        ecm(result).expr,
                        1]
        return (var(self.name_in_target)(*c_parameters),
                False  # cblas_gemv does not return anything
                )

    def generate_preambles(self, target):
        assert isinstance(target, CTarget)
        yield ("99_cblas", "#include <cblas.h>")
        return

# }}}


n = 10

knl = lp.make_kernel(
        "{:}",
        """
        y[:] = gemv(A[:, :], x[:])
        """, [
            lp.GlobalArg("A", dtype=np.float64, shape=(n, n)),
            lp.GlobalArg("x", dtype=np.float64, shape=(n, )),
            lp.GlobalArg("y", shape=(n, )), ...],
        target=CTarget())

knl = lp.register_callable(knl, "gemv", CBLASGEMV(name="gemv"))
print(lp.generate_code_v2(knl).device_code())
Call Instruction for a kernel call
At a call site that calls a loopy.LoopKernel, the arguments to
the call must follow the order of the callee kernel’s input arguments.
Similarly, the assignees must follow the order of the callee kernel’s output
arguments. Since a KernelArgument can be both an
input and an output, such an argument appears both among the call instruction’s
assignees and among the call expression node’s parameters.
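The ordering rule above can be sketched in plain Python. `ToyArg` and `call_site_layout` are hypothetical helpers for illustration only, and the input/output flags on the toy arguments are an assumption about how a callee's arguments might be described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArg:
    """Toy description of a callee kernel argument (not a loopy class)."""
    name: str
    is_input: bool
    is_output: bool


def call_site_layout(callee_args):
    """Return (assignees, parameters) in the order the callee declares them."""
    parameters = [a.name for a in callee_args if a.is_input]
    assignees = [a.name for a in callee_args if a.is_output]
    return assignees, parameters


callee_args = [
    ToyArg("a", is_input=True, is_output=False),
    ToyArg("b", is_input=True, is_output=True),   # both read and written
    ToyArg("c", is_input=False, is_output=True),
]
assignees, parameters = call_site_layout(callee_args)
print(f"{', '.join(assignees)} = callee({', '.join(parameters)})")
# b, c = callee(a, b)
```

The argument `b` is both read and written, so it shows up on both sides of the toy call.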
Entry points
Only callables in loopy.TranslationUnit.entrypoints can be called from
the outside. All other callables are only visible from within the translation
unit, similar to C’s static functions.
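The visibility rule can be illustrated with a toy container in plain Python. This is a sketch only: `ToyTranslationUnit`, `call_internal` and `call_from_outside` are hypothetical names and do not correspond to loopy's API.

```python
class ToyTranslationUnit:
    """Toy container: named callables plus the subset visible from outside."""

    def __init__(self, callables, entrypoints):
        missing = set(entrypoints) - set(callables)
        if missing:
            raise ValueError(f"unknown entrypoints: {sorted(missing)}")
        self.callables = dict(callables)
        self.entrypoints = frozenset(entrypoints)

    def call_internal(self, name, *args):
        # Any callable may call any other callable in the same unit.
        return self.callables[name](self, *args)

    def call_from_outside(self, name, *args):
        # External callers are limited to the entrypoints.
        if name not in self.entrypoints:
            raise LookupError(f"{name!r} is not an entrypoint")
        return self.call_internal(name, *args)


def helper(t_unit, x):
    return 2 * x


def main(t_unit, x):
    return t_unit.call_internal("helper", x) + 1


t_unit = ToyTranslationUnit({"main": main, "helper": helper},
                            entrypoints=["main"])
print(t_unit.call_from_outside("main", 3))  # 7
```

Calling `t_unit.call_from_outside("helper", 3)` raises `LookupError` in this toy, which plays the role of a C `static` function being invisible outside its translation unit.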
Reference
- class loopy.kernel.function_interface.GeneratedExpression(*args, **kwargs)
- expr: Expression
- class loopy.kernel.function_interface.AbstractExpressionToCodeMapper(*args, **kwargs)
- infer_type(expr: Expression) -> LoopyType
- __call__(expr: Expression, prec: int | None = None, type_context: str | None = None, needed_dtype: LoopyType | None = None) -> GeneratedExpression
Call self as a function.
- class loopy.kernel.function_interface.ArrayArgDescriptor(shape: ShapeType | None, address_space: AddressSpace, dim_tags: Sequence[ArrayDimImplementationTag] | None)
Records information about an array argument to an in-kernel callable.
To be passed to and returned from with_descrs(), and used for matching shape and address space of caller and callee kernels.
- address_space: AddressSpace
- dim_tags: Sequence[ArrayDimImplementationTag] | None
See Data Axis Tags.
- map_expr(subst_mapper: SubstitutionCallable) -> Self
- Returns:
an instance of ArrayArgDescriptor with its shapes and strides mapped by subst_mapper.
- class loopy.InKernelCallable(arg_id_to_dtype: Mapping[int | str, LoopyType] | None = None, arg_id_to_descr: Mapping[int | str, ArgDescriptor] | None = None)
An abstract interface to define a callable encountered in a kernel.
- name
- arg_id_to_descr: Mapping[int | str, ArgDescriptor] | None
- __init__(arg_id_to_dtype: Mapping[int | str, LoopyType] | None = None, arg_id_to_descr: Mapping[int | str, ArgDescriptor] | None = None) -> None
- abstractmethod with_types(arg_id_to_dtype: Mapping[int | str, LoopyType], clbl_inf_ctx: CallablesInferenceContext) -> tuple[InKernelCallable, CallablesInferenceContext]
- Parameters:
arg_id_to_dtype – a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns:
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
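The argument-identifier convention described above (non-negative integers for positional arguments, negative integers starting at -1 for return values, unknown types omitted) can be sketched as follows. `make_arg_id_to_dtype` is a hypothetical helper and dtypes are represented by plain strings for illustration.

```python
def make_arg_id_to_dtype(param_dtypes, assignee_dtypes):
    """Build a mapping in the style described above.

    Positional parameters get ids 0, 1, 2, ...; return values get ids
    -1, -2, -3, ...; entries whose dtype is unknown (None) are left out.
    """
    mapping = {}
    for i, dtype in enumerate(param_dtypes):
        if dtype is not None:
            mapping[i] = dtype
    for i, dtype in enumerate(assignee_dtypes):
        if dtype is not None:
            mapping[-(i + 1)] = dtype
    return mapping


# Two parameters (the second not yet known) and two return values.
print(make_arg_id_to_dtype(["float64", None], ["float64", "int32"]))
# {0: 'float64', -1: 'float64', -2: 'int32'}
```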
- abstractmethod with_descrs(arg_id_to_descr: Mapping[int, ArgDescriptor], clbl_inf_ctx: CallablesInferenceContext) -> tuple[InKernelCallable, CallablesInferenceContext]
- Parameters:
arg_id_to_descr – a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns:
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if descriptor-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- abstractmethod generate_preambles(target: TargetBase) -> Iterator[tuple[str, str]]
Yields the target-specific preamble.
- abstractmethod emit_call(expression_to_code_mapper: AbstractExpressionToCodeMapper, expression: Call, target: TargetBase) -> ArithmeticExpression
Generate a target-specific call expression by mapping its arguments to the appropriate data types.
- Parameters:
expression_to_code_mapper – an instance of loopy.symbolic.IdentityMapper responsible for mapping the arguments (see ExpressionToCExpressionMapper()).
- abstractmethod emit_call_insn(insn: CallInstruction, target: TargetBase, expression_to_code_mapper: AbstractExpressionToCodeMapper) -> tuple[ArithmeticExpression, bool]
Returns a tuple of (call, assignee_is_returned), which is the target-facing function call that would be seen in the generated code. call is (usually) an instance of pymbolic.primitives.Call and assignee_is_returned is a boolean flag indicating whether the first assignee is returned by value on C-type targets.
Note
Example: If assignee_is_returned=True, then a, b = f(c, d) is interpreted in the target as a = f(c, d, &b). If assignee_is_returned=False, then a, b = f(c, d) is interpreted in the target as the statement f(c, d, &a, &b).
- Parameters:
expression_to_code_mapper – an instance of loopy.symbolic.IdentityMapper responsible for code mapping from loopy syntax to the target syntax (see ExpressionToCExpressionMapper()).
- Returns:
a tuple of the call to be generated and a bool indicating whether the first assignee is a part of the LHS in the assignment instruction
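The effect of assignee_is_returned can be reproduced with a small string-based sketch. `render_call` is a hypothetical helper that only formats C-like text; it does not use loopy's code generation machinery.

```python
def render_call(name, assignees, parameters, assignee_is_returned):
    """Format a C-like statement for `assignees = name(parameters)`."""
    if assignee_is_returned and assignees:
        # The first assignee receives the return value of the call.
        first, by_pointer = assignees[0], assignees[1:]
    else:
        # Every assignee is passed as an output pointer.
        first, by_pointer = None, assignees
    args = list(parameters) + [f"&{a}" for a in by_pointer]
    call = f"{name}({', '.join(args)})"
    return f"{first} = {call};" if first is not None else f"{call};"


print(render_call("f", ["a", "b"], ["c", "d"], assignee_is_returned=True))
# a = f(c, d, &b);
print(render_call("f", ["a", "b"], ["c", "d"], assignee_is_returned=False))
# f(c, d, &a, &b);
```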
- abstractmethod get_hw_axes_sizes(arg_id_to_arg: Mapping[int, Expression], space: Space, callables_table: CallablesTable) -> tuple[Mapping[int, PwAff], Mapping[int, PwAff]]
Returns gsizes, lsizes, where gsizes and lsizes are mappings from axis indices to corresponding group or local hw axis sizes. The hw axes sizes are represented as instances of islpy.PwAff on the given space.
- Parameters:
arg_id_to_arg – A mapping from the passed argument id to the arguments at a call-site.
space – An instance of islpy.Space.
- abstractmethod get_used_hw_axes(callables_table: CallablesTable) -> tuple[Set[int], Set[int]]
Returns a tuple group_axes_used, local_axes_used, where (group|local)_axes_used are frozensets of the hardware axis indices used by the callable.
- abstractmethod get_called_callables(callables_table: CallablesTable, recursive: bool = True) -> frozenset[CallableId]
Returns a frozenset of callable ids called by self that are resolved via callables_table.
- Parameters:
callables_table – Similar to loopy.TranslationUnit.callables_table.
recursive – If True recursively searches for all the called callables, else only returns the callables directly called by self.
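The difference between the recursive and non-recursive modes can be sketched on a toy call graph. The function below is a hypothetical stand-alone helper, not the loopy method: it takes a plain dictionary of direct calls where loopy would consult a callables table.

```python
def get_called_callables(name, direct_calls, recursive=True):
    """Return the ids called by `name`, directly or transitively.

    `direct_calls` maps a callable id to the ids it calls directly.
    """
    direct = frozenset(direct_calls.get(name, ()))
    if not recursive:
        return direct

    seen = set()
    stack = list(direct)
    while stack:
        callee = stack.pop()
        if callee in seen:
            continue  # already visited; also guards against cycles
        seen.add(callee)
        stack.extend(direct_calls.get(callee, ()))
    return frozenset(seen)


# Toy call graph: main -> {gemv_wrapper, sin}, gemv_wrapper -> {gemv}
direct_calls = {"main": ["gemv_wrapper", "sin"], "gemv_wrapper": ["gemv"]}
print(sorted(get_called_callables("main", direct_calls, recursive=False)))
# ['gemv_wrapper', 'sin']
print(sorted(get_called_callables("main", direct_calls)))
# ['gemv', 'gemv_wrapper', 'sin']
```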
- abstractmethod with_name(name: CallableId) -> Self
Returns a copy of self so that it can be referred to by name in a loopy.TranslationUnit.callables_table’s namespace.
- class loopy.CallableKernel(subkernel: LoopKernel, arg_id_to_dtype: Mapping[int | str, LoopyType] | None = None, arg_id_to_descr: Mapping[int | str, ArgDescriptor] | None = None)
Records information about a callee kernel. Also provides an interface, through member methods, to make the callee kernel compatible with being called from a caller kernel.
CallableKernel.with_types() should be called in order to match the dtypes of the arguments that are shared between the caller and the callee kernel. CallableKernel.with_descrs() should be called in order to match the arguments’ shapes/strides across the caller and the callee kernel.
- subkernel: LoopKernel
- with_descrs(arg_id_to_descr: Mapping[int, ArgDescriptor], clbl_inf_ctx: CallablesInferenceContext) -> tuple[CallableKernel, CallablesInferenceContext]
- Parameters:
arg_id_to_descr – a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns:
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if descriptor-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- with_types(arg_id_to_dtype: Mapping[int | str, LoopyType], clbl_inf_ctx: CallablesInferenceContext) -> tuple[CallableKernel, CallablesInferenceContext]
- Parameters:
arg_id_to_dtype – a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns:
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- class loopy.ScalarCallable(name: str, arg_id_to_dtype: Mapping[int | str, LoopyType] | None = None, arg_id_to_descr: Mapping[int | str, ArgDescriptor] | None = None, name_in_target: str | None = None)
An abstract interface to a scalar callable encountered in a kernel.
- name_in_target: str | None
A str to denote the name of the function in a loopy.target.TargetBase for which the callable is specialized. None if the callable is not specialized enough to know its name in target.
- with_types(arg_id_to_dtype: Mapping[int | str, LoopyType], clbl_inf_ctx: CallablesInferenceContext) -> tuple[ScalarCallable, CallablesInferenceContext]
- Parameters:
arg_id_to_dtype – a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns:
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- with_descrs(arg_id_to_descr: Mapping[int, ArgDescriptor], clbl_inf_ctx: CallablesInferenceContext) -> tuple[ScalarCallable, CallablesInferenceContext]
- Parameters:
arg_id_to_descr – a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns:
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if descriptor-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
Note
ScalarCallable.with_types() is intended to assist with type specialization of the function, and subclasses must define it.