Function Interface
Resolving and specialization
In loopy, a loopy.TranslationUnit is a collection of callables and entrypoints. Callables are of type loopy.kernel.function_interface.InKernelCallable. Functions start life as simple pymbolic.primitives.Call nodes. Call resolution turns the function identifiers in those calls into ResolvedFunction objects. Each resolved function has an entry in TranslationUnit.callables_table. The process of realizing a function as an InKernelCallable is referred to as resolving.
During code generation for a TranslationUnit, a (resolved) callable is specialized depending on the types and shapes of the arguments passed at a call site. For example, a call to sin(x) in loopy is type-generic to begin with, but it is later specialized to sinf, sin, or sinl depending on the type of its argument x. A callable’s behavior during type or shape specialization is encoded via with_types() and with_descrs().
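The type-specialization step described above can be illustrated with a small standalone sketch. It does not use loopy at all: the function specialize_name, the table C_SUFFIX_BY_DTYPE, and the plain-string dtype names are hypothetical stand-ins chosen for illustration, and only the mapping from argument identifiers to types mirrors the arg_id_to_dtype convention used by with_types().

```python
# Toy model of type specialization; NOT loopy's actual implementation.
# Dtypes are plain strings here (an assumption made for this sketch).
C_SUFFIX_BY_DTYPE = {"float32": "f", "float64": "", "longdouble": "l"}


def specialize_name(generic_name, arg_id_to_dtype):
    """Return the C function name for *generic_name*, or None if the
    type of positional argument 0 is not known yet."""
    dtype = arg_id_to_dtype.get(0)
    if dtype is None:
        # Not enough type information: the call stays type-generic.
        return None
    if dtype not in C_SUFFIX_BY_DTYPE:
        raise TypeError(f"{generic_name} is not supported for {dtype}")
    return generic_name + C_SUFFIX_BY_DTYPE[dtype]


for dtype in ("float32", "float64", "longdouble"):
    print(dtype, "->", specialize_name("sin", {0: dtype}))
```

Returning None for an unknown argument type mirrors the idea that a callable may be left unspecialized until more type information is available.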
Registering callables
A user can register callables within a TranslationUnit to allow loopy to resolve calls not pre-defined in loopy. In loopy, we typically aim to expose all the standard math functions defined for a TargetBase. Other foreign functions can be invoked by registering them.
An example demonstrating registering a CBlasGemv as a loopy callable:
import loopy as lp
import numpy as np
from loopy.diagnostic import LoopyError
from loopy.target.c import CTarget
from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2  # noqa: F401


# {{{ blas callable

class CBLASGEMV(lp.ScalarCallable):
    def with_types(self, arg_id_to_dtype, callables_table):
        mat_dtype = arg_id_to_dtype.get(0)
        vec_dtype = arg_id_to_dtype.get(1)

        if mat_dtype is None or vec_dtype is None:
            # types aren't specialized enough to be resolved
            return self, callables_table

        if mat_dtype != vec_dtype:
            raise LoopyError("GEMV requires same dtypes for matrix and "
                             "vector")

        if vec_dtype.numpy_dtype == np.float32:
            name_in_target = "cblas_sgemv"
        elif vec_dtype.numpy_dtype == np.float64:
            name_in_target = "cblas_dgemv"
        else:
            raise LoopyError("GEMV is only supported for float32 and float64 "
                             "types")

        return (self.copy(name_in_target=name_in_target,
                          arg_id_to_dtype={0: vec_dtype,
                                           1: vec_dtype,
                                           -1: vec_dtype}),
                callables_table)

    def with_descrs(self, arg_id_to_descr, callables_table):
        mat_descr = arg_id_to_descr.get(0)
        vec_descr = arg_id_to_descr.get(1)
        res_descr = arg_id_to_descr.get(-1)

        if mat_descr is None or vec_descr is None or res_descr is None:
            # shapes aren't specialized enough to be resolved
            return self, callables_table

        assert mat_descr.shape[1] == vec_descr.shape[0]
        assert mat_descr.shape[0] == res_descr.shape[0]
        assert len(vec_descr.shape) == len(res_descr.shape) == 1

        # handling only the easy case when stride == 1
        assert vec_descr.dim_tags[0].stride == 1
        assert mat_descr.dim_tags[1].stride == 1
        assert res_descr.dim_tags[0].stride == 1

        return self.copy(arg_id_to_descr=arg_id_to_descr), callables_table

    def emit_call_insn(self, insn, target, expression_to_code_mapper):
        from pymbolic import var

        mat_descr = self.arg_id_to_descr[0]
        m, n = mat_descr.shape
        ecm = expression_to_code_mapper
        mat, vec = insn.expression.parameters
        result, = insn.assignees

        c_parameters = [var("CblasRowMajor"),
                        var("CblasNoTrans"),
                        m, n,
                        1,
                        ecm(mat).expr,
                        1,
                        ecm(vec).expr,
                        1,
                        ecm(result).expr,
                        1]

        return (var(self.name_in_target)(*c_parameters),
                False  # cblas_gemv does not return anything
                )

    def generate_preambles(self, target):
        assert isinstance(target, CTarget)
        yield ("99_cblas", "#include <cblas.h>")
        return

# }}}


n = 10

knl = lp.make_kernel(
        "{:}",
        """
        y[:] = gemv(A[:, :], x[:])
        """, [
            lp.GlobalArg("A", dtype=np.float64, shape=(n, n)),
            lp.GlobalArg("x", dtype=np.float64, shape=(n, )),
            lp.GlobalArg("y", shape=(n, )), ...],
        target=CTarget())

knl = lp.register_callable(knl, "gemv", CBLASGEMV(name="gemv"))

print(lp.generate_code_v2(knl).device_code())
Call Instruction for a kernel call
At a call site involving a call to a loopy.LoopKernel, the arguments to the call must be ordered by the order of input arguments of the callee kernel. Similarly, the assignees must be ordered by the order of the callee kernel’s output arguments. Since a KernelArgument can be both an input and an output, such arguments are part of the call instruction’s assignees as well as the call expression node’s parameters.
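The ordering rule above can be sketched without loopy. In the following illustration, the Arg dataclass and the call_site_layout helper are hypothetical names, not part of loopy's API; they only model arguments flagged as inputs, outputs, or both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    # Hypothetical stand-in for a kernel argument; not loopy's KernelArgument.
    name: str
    is_input: bool
    is_output: bool


def call_site_layout(callee_args):
    """Split callee arguments into (parameters, assignees), each kept in
    the callee's declaration order."""
    parameters = [a.name for a in callee_args if a.is_input]
    assignees = [a.name for a in callee_args if a.is_output]
    return parameters, assignees


callee_args = [
    Arg("a", is_input=True, is_output=False),
    Arg("b", is_input=True, is_output=True),   # both read and written
    Arg("c", is_input=False, is_output=True),
]
params, assignees = call_site_layout(callee_args)
print("parameters:", params)
print("assignees: ", assignees)
```

The argument b, being both an input and an output, appears in both lists, which matches the rule stated for the call instruction.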
Entry points
Only callables in loopy.TranslationUnit.entrypoints can be called from the outside. All other callables are only visible from within the translation unit, similar to C’s static functions.
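As a rough analogy for this visibility rule, the toy class below exposes only the names listed as entrypoints to outside callers. ToyTranslationUnit and call_from_outside are hypothetical and unrelated to loopy's real classes; the sketch only models the idea that non-entrypoint callables remain usable internally.

```python
class ToyTranslationUnit:
    """Hypothetical stand-in; NOT loopy.TranslationUnit."""

    def __init__(self, callables_table, entrypoints):
        self.callables_table = dict(callables_table)
        self.entrypoints = frozenset(entrypoints)

    def call_from_outside(self, name, *args):
        # Only entrypoints are visible to outside callers.
        if name not in self.entrypoints:
            raise LookupError(f"'{name}' is not an entrypoint")
        return self.callables_table[name](*args)


def helper(x):
    return 2 * x


def main(x):
    # 'helper' is reachable from within the unit even though it is
    # not an entrypoint.
    return helper(x) + 1


tu = ToyTranslationUnit({"main": main, "helper": helper}, entrypoints=["main"])
print(tu.call_from_outside("main", 3))
```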
Reference
- class loopy.kernel.function_interface.ArrayArgDescriptor(shape, address_space, dim_tags)
Records information about an array argument to an in-kernel callable. To be passed to and returned from InKernelCallable.with_descrs(), used for matching shape and address space of caller and callee kernels.
- shape
Shape of the array.
- address_space
An attribute of loopy.AddressSpace.
- dim_tags
A tuple of instances of loopy.kernel.array.ArrayDimImplementationTag.
- map_expr(f)
Returns an instance of ArrayArgDescriptor with its shapes and strides mapped by f.
- depends_on()
Returns a frozenset of all the variable names the ArrayArgDescriptor depends on.
- class loopy.kernel.function_interface.InKernelCallable(name, arg_id_to_dtype=None, arg_id_to_descr=None)
An abstract interface to define a callable encountered in a kernel.
- name
The name of the callable which can be encountered within expressions in a kernel.
- arg_id_to_dtype
A mapping which indicates the argument types and result types of the callable.
- arg_id_to_descr
A mapping which indicates the argument shapes and dim_tags for which it would be responsible for generating code.
- with_types(arg_id_to_dtype, clbl_inf_ctx)
- Parameters
arg_id_to_dtype – a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- with_descrs(arg_id_to_descr, clbl_inf_ctx)
- Parameters
arg_id_to_descr – a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the descriptor-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- with_target(target)
Returns a copy of self with all the dtypes in in_knl_callable.arg_id_to_dtype associated with the target.
- Parameters
target – An instance of loopy.target.TargetBase.
- emit_call_insn(insn, target, expression_to_code_mapper)
Returns a tuple of (call, assignee_is_returned) which is the target-facing function call that would be seen in the generated code. call is an instance of pymbolic.primitives.Call. assignee_is_returned is an instance of bool to indicate if the assignee is returned by value of C-type targets.
- Example: If assignee_is_returned=True, then a, b = f(c, d) is interpreted in the target as a = f(c, d, &b). If assignee_is_returned=False, then a, b = f(c, d) is interpreted in the target as the statement f(c, d, &a, &b).
- get_hw_axes_sizes(arg_id_to_arg, space, callables_table)
Returns gsizes, lsizes, where gsizes and lsizes are mappings from axis indices to corresponding group or local hw axis sizes. The hw axes sizes are represented as instances of islpy.PwAff on the given space.
- Parameters
arg_id_to_arg – A mapping from the passed argument id to the arguments at a call-site.
space – An instance of islpy.Space.
- get_used_hw_axes(callables_table)
Returns a tuple group_axes_used, local_axes_used, where (group|local)_axes_used are frozenset of hardware axes indices used by the callable.
- get_called_callables(callables_table, recursive=True)
Returns a frozenset of callable ids called by self that are resolved via callables_table.
- Parameters
callables_table – Similar to loopy.TranslationUnit.callables_table.
recursive – If True, recursively searches for all the called callables; otherwise only returns the callables directly called by self.
- with_name(name)
Returns a copy of self so that it could be referred to by name in a loopy.TranslationUnit.callables_table’s namespace.
- class loopy.kernel.function_interface.CallableKernel(subkernel, arg_id_to_dtype=None, arg_id_to_descr=None)
Records information about a callee kernel. Also provides an interface through member methods to make the callee kernel compatible to be called from a caller kernel. CallableKernel.with_types() should be called in order to match the dtypes of the arguments that are shared between the caller and the callee kernel. CallableKernel.with_descrs() should be called in order to match the arguments’ shapes/strides across the caller and the callee kernel.
- subkernel
LoopKernel which is being called.
- with_descrs(arg_id_to_descr, clbl_inf_ctx)
- Parameters
arg_id_to_descr – a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the descriptor-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- with_types(arg_id_to_dtype, callables_table)
- Parameters
arg_id_to_dtype – a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- class loopy.kernel.function_interface.ScalarCallable(name, arg_id_to_dtype=None, arg_id_to_descr=None, name_in_target=None)
An abstract interface to a scalar callable encountered in a kernel.
- name_in_target
A str to denote the name of the function in a loopy.target.TargetBase for which the callable is specialized. None if the callable is not specialized enough to know its name in target.
- with_types(arg_id_to_dtype, callables_table)
- Parameters
arg_id_to_dtype – a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
- with_descrs(arg_id_to_descr, clbl_inf_ctx)
- Parameters
arg_id_to_descr – a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr. Return values are denoted by negative integers, with the first returned value identified as -1.
clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.
- Returns
a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the descriptor-specialization of self updated other calls contained within it.
Note
If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
Note
ScalarCallable.with_types() is intended to assist with type specialization of the function, and sub-classes must define it.
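The effect of the assignee_is_returned flag documented for emit_call_insn() can be sketched as plain string rendering. The helper render_call below is hypothetical and does not reflect how loopy builds or prints expressions; it only reproduces the two interpretations of a, b = f(c, d) given in that entry.

```python
def render_call(name, params, assignees, assignee_is_returned):
    """Render a C-like statement for ``assignees = name(params)``.

    Hypothetical helper for illustration only; not part of loopy.
    """
    if assignee_is_returned:
        # The first assignee receives the return value; any remaining
        # assignees are passed by pointer after the parameters.
        first, rest = assignees[0], assignees[1:]
        args = list(params) + [f"&{a}" for a in rest]
        return f"{first} = {name}({', '.join(args)});"
    # Nothing is returned by value: all assignees are passed by pointer.
    args = list(params) + [f"&{a}" for a in assignees]
    return f"{name}({', '.join(args)});"


print(render_call("f", ["c", "d"], ["a", "b"], assignee_is_returned=True))
print(render_call("f", ["c", "d"], ["a", "b"], assignee_is_returned=False))
```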