Function Interface

Resolving and specialization

In loopy, a loopy.TranslationUnit is a collection of callables and entrypoints. Callables are of type loopy.kernel.function_interface.InKernelCallable. Functions start life as simple pymbolic.primitives.Call nodes. Call resolution turns the function identifiers in those calls into ResolvedFunction objects. Each resolved function has an entry in TranslationUnit.callables_table. The process of realizing a function as an InKernelCallable is referred to as resolving.
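To make the terminology concrete, here is a toy model of resolution in plain Python. It does not use loopy. `Call`, `ResolvedFunction` and `resolve_calls` below are hypothetical stand-ins that only mimic the idea: bare function identifiers are replaced by references that are keys into a callables table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class ResolvedFunction:
    """Hypothetical stand-in: a reference into the callables table."""
    name: str


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for a pymbolic Call node."""
    function: Union[str, ResolvedFunction]
    parameters: Tuple[str, ...]


def resolve_calls(calls, known_callables: Dict[str, object]):
    """Replace bare function identifiers by ResolvedFunction references.

    Returns the rewritten calls and a callables table with one entry per
    resolved function. Unknown identifiers raise ValueError, mirroring the
    idea that only built-in or registered functions can be resolved.
    """
    callables_table = {}
    resolved = []
    for call in calls:
        name = call.function
        if name not in known_callables:
            raise ValueError(f"cannot resolve function '{name}'")
        callables_table[name] = known_callables[name]
        resolved.append(Call(ResolvedFunction(name), call.parameters))
    return resolved, callables_table
```

Both calls to `sin` in `resolve_calls([Call("sin", ("x",)), Call("sin", ("y",))], {"sin": ...})` end up pointing at a single table entry, which is the shared, still type-generic callable that later gets specialized.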

During code generation for a TranslationUnit, a (resolved) callable is specialized depending on the types and shapes of the arguments passed at a call site. For example, a call to sin(x) in loopy is type-generic to begin with, but it is later specialized to either sinf, sin or sinl depending on the type of its argument x. A callable’s behavior during type and shape specialization is encoded via with_types() and with_descrs(), respectively.
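The sin example can be sketched as a lookup from argument type to C function name. This is not loopy's implementation. The dtype keys are hypothetical strings, and mapping "longdouble" to `sinl` assumes the target's long double type. `sinf`, `sin` and `sinl` are the C99 `<math.h>` variants for float, double and long double.

```python
# Hypothetical dtype names -> C99 <math.h> function names.
C_SIN_BY_DTYPE = {
    "float32": "sinf",     # float
    "float64": "sin",      # double
    "longdouble": "sinl",  # assumption: long double on the target
}


def specialize_sin(arg_dtype):
    """Return the target name for sin(x), or None if x's type is unknown."""
    if arg_dtype is None:
        # Not specialized enough yet; type inference may retry later.
        return None
    try:
        return C_SIN_BY_DTYPE[arg_dtype]
    except KeyError:
        raise TypeError(f"sin is not supported for {arg_dtype}") from None
```

Returning `None` for an unknown type, rather than failing, reflects that specialization may be attempted several times while types are still being inferred.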

Registering callables

A user can register callables within a TranslationUnit so that loopy can resolve calls that are not pre-defined in loopy. In loopy, we typically aim to expose all the standard math functions defined for a TargetBase. Other foreign functions can be invoked by registering them.

An example demonstrating how to register CBLAS’s GEMV routine as a loopy callable:

import numpy as np

import loopy as lp
from loopy.diagnostic import LoopyError
from loopy.target.c import CTarget
from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2  # noqa: F401


# {{{ blas callable

class CBLASGEMV(lp.ScalarCallable):
    def with_types(self, arg_id_to_dtype, callables_table):
        mat_dtype = arg_id_to_dtype.get(0)
        vec_dtype = arg_id_to_dtype.get(1)

        if mat_dtype is None or vec_dtype is None:
            # types aren't specialized enough to be resolved
            return self, callables_table

        if mat_dtype != vec_dtype:
            raise LoopyError("GEMV requires same dtypes for matrix and "
                             "vector")

        if vec_dtype.numpy_dtype == np.float32:
            name_in_target = "cblas_sgemv"
        elif vec_dtype.numpy_dtype == np.float64:
            name_in_target = "cblas_dgemv"
        else:
            raise LoopyError("GEMV is only supported for float32 and float64 "
                             "types")

        return (self.copy(name_in_target=name_in_target,
                          arg_id_to_dtype={0: vec_dtype,
                                           1: vec_dtype,
                                           -1: vec_dtype}),
                callables_table)

    def with_descrs(self, arg_id_to_descr, callables_table):
        mat_descr = arg_id_to_descr.get(0)
        vec_descr = arg_id_to_descr.get(1)
        res_descr = arg_id_to_descr.get(-1)

        if mat_descr is None or vec_descr is None or res_descr is None:
            # shapes aren't specialized enough to be resolved
            return self, callables_table

        assert mat_descr.shape[1] == vec_descr.shape[0]
        assert mat_descr.shape[0] == res_descr.shape[0]
        assert len(vec_descr.shape) == len(res_descr.shape) == 1
        # handling only the easy case when stride == 1
        assert vec_descr.dim_tags[0].stride == 1
        assert mat_descr.dim_tags[1].stride == 1
        assert res_descr.dim_tags[0].stride == 1

        return self.copy(arg_id_to_descr=arg_id_to_descr), callables_table

    def emit_call_insn(self, insn, target, expression_to_code_mapper):
        from pymbolic import var
        mat_descr = self.arg_id_to_descr[0]
        m, n = mat_descr.shape
        ecm = expression_to_code_mapper
        mat, vec = insn.expression.parameters
        result, = insn.assignees

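        # cblas_?gemv(order, trans, M, N, alpha, A, lda, X, incX, beta, Y, incY)
        # computes Y = alpha*A*X + beta*Y; beta = 0 overwrites Y.
        c_parameters = [var("CblasRowMajor"),
                        var("CblasNoTrans"),
                        m, n,
                        1,                             # alpha
                        ecm(mat).expr,
                        mat_descr.dim_tags[0].stride,  # lda: row stride of A
                        ecm(vec).expr,
                        1,                             # incX
                        0,                             # beta
                        ecm(result).expr,
                        1]                             # incY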
        return (var(self.name_in_target)(*c_parameters),
                False  # cblas_gemv does not return anything
                )

    def generate_preambles(self, target):
        assert isinstance(target, CTarget)
        yield ("99_cblas", "#include <cblas.h>")
        return

# }}}


n = 10

knl = lp.make_kernel(
        "{:}",
        """
        y[:] = gemv(A[:, :], x[:])
        """, [
            lp.GlobalArg("A", dtype=np.float64, shape=(n, n)),
            lp.GlobalArg("x", dtype=np.float64, shape=(n, )),
            lp.GlobalArg("y", shape=(n, )), ...],
        target=CTarget())

knl = lp.register_callable(knl, "gemv", CBLASGEMV(name="gemv"))
print(lp.generate_code_v2(knl).device_code())
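CBLAS declares GEMV as `cblas_dgemv(order, trans, M, N, alpha, A, lda, X, incX, beta, Y, incY)` and computes `Y = alpha*A*X + beta*Y`. The pure-Python reference below shows what `lda` and `beta` mean for the row-major, non-transposed case. It is only an illustration of the semantics, not CBLAS itself. `gemv_row_major` is a hypothetical name, and the code assumes positive increments.

```python
def gemv_row_major(m, n, alpha, a, lda, x, incx, beta, y, incy):
    """Reference for row-major, non-transposed GEMV (illustration only).

    Computes y = alpha * A @ x + beta * y in place, where A is m x n and
    element A[i][j] is stored at a[i * lda + j]. Increments must be > 0.
    """
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += a[i * lda + j] * x[j * incx]
        if beta == 0:
            # Like reference BLAS: with beta == 0, y is not read, so
            # uninitialized (even NaN) contents are simply overwritten.
            y[i * incy] = alpha * acc
        else:
            y[i * incy] = alpha * acc + beta * y[i * incy]
```

For a contiguous row-major `n x n` matrix, `lda` equals `n`. Passing `lda = 1` would make every row start one element after the previous one, so the routine would read the wrong entries.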

Call Instruction for a kernel call

At a call site involving a call to a loopy.LoopKernel, the arguments to the call must follow the order of the callee kernel’s input arguments. Similarly, the assignees must follow the order of the callee kernel’s output arguments. Since a KernelArgument can be both an input and an output, such an argument appears both among the call instruction’s assignees and among the call expression node’s parameters.
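The ordering rule can be sketched in plain Python. `Arg` and `order_call_site` are hypothetical, simplified stand-ins and are not loopy classes. Each callee argument is tagged as input, output, or both, and the caller supplies one expression per callee argument.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Arg:
    """Hypothetical, simplified stand-in for a callee kernel argument."""
    name: str
    is_input: bool
    is_output: bool


def order_call_site(callee_args: List[Arg],
                    bindings: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (parameters, assignees) for a call to a kernel.

    bindings maps each callee argument name to a caller-side expression.
    Parameters follow the callee's input order and assignees follow its
    output order; an in/out argument appears in both lists.
    """
    parameters = [bindings[a.name] for a in callee_args if a.is_input]
    assignees = [bindings[a.name] for a in callee_args if a.is_output]
    return parameters, assignees
```

For a callee with arguments `x` (input), `y` (input and output) and `z` (output), bound to `a`, `b` and `c`, this yields parameters `["a", "b"]` and assignees `["b", "c"]`. The call instruction therefore reads `b, c = f(a, b)`.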

Entry points

Only callables in loopy.TranslationUnit.entrypoints can be called from the outside. All other callables are only visible from within the translation unit, similar to C’s static functions.

Reference

class loopy.kernel.function_interface.ValueArgDescriptor(*args, **kwargs)
class loopy.kernel.function_interface.ArrayArgDescriptor(shape, address_space, dim_tags)

Records information about an array argument to an in-kernel callable. Passed to and returned from with_descrs(); used to match the shape and address space of caller and callee kernels.

shape

Shape of the array.

address_space

An attribute of loopy.AddressSpace.

dim_tags

A tuple of instances of loopy.kernel.array.ArrayDimImplementationTag.

map_expr(f)

Returns an instance of ArrayArgDescriptor with its shape and strides mapped by f.

depends_on()

Returns a frozenset of all the variable names the ArrayArgDescriptor depends on.
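A toy model of map_expr() and depends_on() follows. `ToyArrayArgDescriptor` is hypothetical and is not loopy’s class. Expressions are reduced to integers or variable-name strings, and a plain `strides` tuple stands in for the strides that the real descriptor keeps in its dim_tags.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple, Union

Expr = Union[int, str]  # toy expressions: integer literals or variable names


@dataclass(frozen=True)
class ToyArrayArgDescriptor:
    """Hypothetical simplified model of an array argument descriptor."""
    shape: Tuple[Expr, ...]
    address_space: str
    strides: Tuple[Expr, ...]  # stands in for strides stored in dim_tags

    def map_expr(self, f: Callable[[Expr], Expr]) -> "ToyArrayArgDescriptor":
        # Apply f to every shape entry and every stride; keep the address space.
        return ToyArrayArgDescriptor(
            shape=tuple(f(s) for s in self.shape),
            address_space=self.address_space,
            strides=tuple(f(s) for s in self.strides))

    def depends_on(self) -> FrozenSet[str]:
        # In this toy model, variables are the string entries.
        return frozenset(e for e in self.shape + self.strides
                         if isinstance(e, str))
```

Substituting a known value for a size parameter through map_expr() removes it from depends_on(). This is the kind of bookkeeping needed when a callee’s shapes are expressed in terms of the caller’s variables.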

class loopy.InKernelCallable(name, arg_id_to_dtype=None, arg_id_to_descr=None)

An abstract interface to define a callable encountered in a kernel.

name

The name of the callable which can be encountered within expressions in a kernel.

arg_id_to_dtype

A mapping which indicates the argument types and result types of the callable.

arg_id_to_descr

A mapping which gives the shape and dim_tags of the arguments for which the callable would be responsible for generating code.

__init__(name, arg_id_to_dtype=None, arg_id_to_descr=None)
with_types(arg_id_to_dtype, clbl_inf_ctx)
Parameters:
  • arg_id_to_dtype

    a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype.

    Return values are denoted by negative integers, with the first returned value identified as -1.

  • clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.

Returns:

a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.

Note

If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.
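The with_types() contract can be illustrated with a toy callable for `max(a, b)`. `ToyMaxCallable` is hypothetical, and dtypes are plain strings instead of LoopyType instances. The sketch returns a `(new_self, clbl_inf_ctx)` tuple. When argument types are still missing, it returns itself unchanged, as the GEMV example does. When the types are known, it records the result type under `-1`. Because it calls no other callables, it passes the context through as is.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyMaxCallable:
    """Hypothetical callable for max(a, b) following the with_types contract."""
    name: str = "max"
    arg_id_to_dtype: Optional[Dict[int, str]] = None

    def with_types(self, arg_id_to_dtype, clbl_inf_ctx):
        a, b = arg_id_to_dtype.get(0), arg_id_to_dtype.get(1)
        if a is None or b is None:
            # Not enough type information yet: return self unchanged.
            return self, clbl_inf_ctx
        if a != b:
            raise TypeError("max expects both arguments to have the same type")
        # Positional arguments are 0 and 1; the first return value is -1.
        new_self = replace(self, arg_id_to_dtype={0: a, 1: b, -1: a})
        # No nested callables, so the context is returned as is.
        return new_self, clbl_inf_ctx
```

Returning a new object instead of mutating `self` matches the immutable style of the interface. The unspecialized callable stays valid for other call sites.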

with_descrs(arg_id_to_descr, clbl_inf_ctx)
Parameters:
  • arg_id_to_descr

    a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr.

    Return values are denoted by negative integers, with the first returned value identified as -1.

  • clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.

Returns:

a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the descriptor-specialization of self updated other calls contained within it.

Note

If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.

generate_preambles(target)

Yields the target-specific preambles.

emit_call(expression_to_code_mapper, expression, target)
emit_call_insn(insn, target, expression_to_code_mapper)

Returns a tuple (call, assignee_is_returned) describing the target-facing function call that would be seen in the generated code. call is an instance of pymbolic.primitives.Call; assignee_is_returned is a bool indicating whether the first assignee is returned by value for C-type targets.

Example: If assignee_is_returned=True, then a, b = f(c, d) is interpreted in the target as a = f(c, d, &b). If assignee_is_returned=False, then a, b = f(c, d) is interpreted in the target as the statement f(c, d, &a, &b).
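The two conventions can be reproduced with plain string formatting. `render_c_call` is a hypothetical helper. It does not use pymbolic or loopy’s code generator and only shows how assignee_is_returned determines which assignee receives the return value and which are passed by address.

```python
def render_c_call(func, params, assignees, assignee_is_returned):
    """Render a C statement for `assignees = func(params)` (illustration).

    If assignee_is_returned, the first assignee receives the return value
    and the remaining assignees are passed by address. Otherwise every
    assignee is passed by address and the call is a plain statement.
    """
    if assignee_is_returned:
        first, rest = assignees[0], assignees[1:]
        args = list(params) + [f"&{a}" for a in rest]
        return f"{first} = {func}({', '.join(args)});"
    args = list(params) + [f"&{a}" for a in assignees]
    return f"{func}({', '.join(args)});"
```

For example, `render_c_call("f", ["c", "d"], ["a", "b"], True)` gives `a = f(c, d, &b);`, and the `False` variant gives `f(c, d, &a, &b);`. The GEMV example falls into the second case, since `cblas_?gemv` returns nothing.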

is_ready_for_codegen()
get_hw_axes_sizes(arg_id_to_arg, space, callables_table)

Returns gsizes, lsizes, where gsizes and lsizes are mappings from axis indices to corresponding group or local hw axis sizes. The hw axes sizes are represented as instances of islpy.PwAff on the given space.

Parameters:
  • arg_id_to_arg – A mapping from the passed argument id to the arguments at a call-site.

  • space – An instance of islpy.Space.

get_used_hw_axes(callables_table)

Returns a tuple group_axes_used, local_axes_used, where (group|local)_axes_used are frozensets of the hardware axis indices used by the callable.

get_called_callables(callables_table: CallablesTable, recursive: bool = True) -> FrozenSet[FunctionIdT]

Returns a frozenset of callable ids called by self that are resolved via callables_table.

Parameters:
  • callables_table – Similar to loopy.TranslationUnit.callables_table.

  • recursive – If True, recursively searches for all the called callables; otherwise, returns only the callables directly called by self.
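The effect of the recursive flag can be seen on a toy call graph. `get_called` is a hypothetical helper over a plain dictionary that maps each callable id to the ids it calls directly. It is not loopy’s implementation. The recursive case computes the transitive closure using an explicit stack and a visited set, so it also terminates on cyclic call graphs.

```python
from typing import Dict, FrozenSet, Iterable


def get_called(name: str, calls: Dict[str, Iterable[str]],
               recursive: bool = True) -> FrozenSet[str]:
    """Toy analogue of get_called_callables over a call graph."""
    direct = frozenset(calls.get(name, ()))
    if not recursive:
        return direct
    seen = set()
    stack = list(direct)
    while stack:
        callee = stack.pop()
        if callee in seen:
            continue  # already visited (handles shared callees and cycles)
        seen.add(callee)
        stack.extend(calls.get(callee, ()))
    return frozenset(seen)
```

With `{"main": ["f"], "f": ["g", "sin"], "g": ["sin"]}`, a non-recursive query on `"main"` returns only `{"f"}`, while the recursive one returns `{"f", "g", "sin"}`.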

with_name(name)

Returns a copy of self so that it can be referred to by name in a loopy.TranslationUnit.callables_table’s namespace.

is_type_specialized()

Returns True iff self’s type signature is known, else returns False.

Note

  • arg_id can be either an int corresponding to the position of the argument or a str corresponding to the name of a keyword argument accepted by the function.

  • Negative “arg_id” values -i in the mapping attributes indicate the return value with (1-based) index i, so -1 denotes the first return value.
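These arg_id conventions can be made concrete with a small helper that splits an arg_id-keyed mapping into its three kinds of entries. `split_arg_ids` is hypothetical and is not part of loopy. It treats non-negative ints as positions, strs as keyword names, and negative ints as return values, with `-1` as the first return value.

```python
def split_arg_ids(mapping):
    """Split an arg_id-keyed mapping into positional, keyword and returns.

    Returns (positional values ordered by position, keyword dict,
    return values ordered so that arg_id -1 comes first).
    """
    positional = sorted(
        ((k, v) for k, v in mapping.items() if isinstance(k, int) and k >= 0),
        key=lambda kv: kv[0])
    keyword = {k: v for k, v in mapping.items() if isinstance(k, str)}
    # arg_id -i is the i-th return value, i.e. 0-based index i - 1.
    returns = sorted(
        ((-k - 1, v) for k, v in mapping.items() if isinstance(k, int) and k < 0),
        key=lambda kv: kv[0])
    return [v for _, v in positional], keyword, [v for _, v in returns]
```

For instance, `{1: "b", 0: "a", -2: "r1", -1: "r0", "alpha": "k"}` splits into `["a", "b"]`, `{"alpha": "k"}` and `["r0", "r1"]`.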

class loopy.CallableKernel(subkernel, arg_id_to_dtype=None, arg_id_to_descr=None)

Records information about a callee kernel. Also provides an interface, through its methods, to make the callee kernel compatible with being called from a caller kernel.

CallableKernel.with_types() should be called in order to match the dtypes of the arguments that are shared between the caller and the callee kernel.

CallableKernel.with_descrs() should be called in order to match the arguments’ shapes/strides across the caller and the callee kernel.

subkernel

LoopKernel which is being called.

with_descrs(arg_id_to_descr, clbl_inf_ctx)
Parameters:
  • arg_id_to_descr

    a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr.

    Return values are denoted by negative integers, with the first returned value identified as -1.

  • clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.

Returns:

a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the descriptor-specialization of self updated other calls contained within it.

Note

If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.

with_types(arg_id_to_dtype, callables_table)
Parameters:
  • arg_id_to_dtype

    a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype.

    Return values are denoted by negative integers, with the first returned value identified as -1.

  • clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.

Returns:

a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.

Note

If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.

class loopy.ScalarCallable(name, arg_id_to_dtype=None, arg_id_to_descr=None, name_in_target=None)

An abstract interface to a scalar callable encountered in a kernel.

name_in_target

A str to denote the name of the function in a loopy.target.TargetBase for which the callable is specialized. None if the callable is not specialized enough to know its name in target.

with_types(arg_id_to_dtype, callables_table)
Parameters:
  • arg_id_to_dtype

    a mapping from argument identifiers (integers for positional arguments) to loopy.types.LoopyType instances. Unspecified/unknown types are not represented in arg_id_to_dtype.

    Return values are denoted by negative integers, with the first returned value identified as -1.

  • clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.

Returns:

a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given types. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the type-specialization of self updated other calls contained within it.

Note

If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.

with_descrs(arg_id_to_descr, clbl_inf_ctx)
Parameters:
  • arg_id_to_descr

    a mapping from argument identifiers (integers for positional arguments) to instances of ArrayArgDescriptor or ValueArgDescriptor. Unspecified/unknown descriptors are not represented in arg_id_to_descr.

    Return values are denoted by negative integers, with the first returned value identified as -1.

  • clbl_inf_ctx – An instance of loopy.translation_unit.CallablesInferenceContext. clbl_inf_ctx provides the namespace of other callables contained within self.

Returns:

a tuple (new_self, new_clbl_inf_ctx), where new_self is a new InKernelCallable specialized for the given argument descriptors. new_clbl_inf_ctx is clbl_inf_ctx’s updated state if the descriptor-specialization of self updated other calls contained within it.

Note

If the InKernelCallable does not contain any other callables within it, then clbl_inf_ctx is returned as is.

Note

ScalarCallable.with_types() is intended to assist with type specialization of the function; subclasses must define it.