Design Decisions in Pytato

TODO

  • reduction inames

  • finish trawling the design doc

  • expression nodes in index lambda
    • what pymbolic expression nodes are OK

    • reductions

    • function identifier scoping

    • piecewise def (use ISL?)

Computation and Results

  • Results of computations either implement the Array interface or are a DictOfNamedArrays. The former are referred to as array expressions. The union type of both is referred to as an array result.

  • Array data is computed lazily, i.e., a representation of the desired computation is built, but computation/code generation is not carried out until instructed by the user. Evaluation/computation is never triggered implicitly.

  • Array.dtype is evaluated eagerly.

  • Array.shape is evaluated as eagerly as possible; however, data-dependent name references in shapes are allowed. (This implies that the number of array axes must be statically known.)

    Consider the example of fancy indexing:

    A[A > 0]
    

    Here, the length of the resulting array depends on the data contained in A and cannot be statically determined at code generation time.

    In the case of data-dependent shapes, the shape is expressed in terms of scalar values (i.e., values having an Array.shape of ()) with an integral Array.dtype (i.e., having dtype.kind == "i"), referenced by name from the Array.namespace. Such a name marks the boundary between eager and lazy evaluation.

  • There is (deliberate) overlap in what various expression nodes can express.

    Expression capture (the “frontend”) should use the “highest-level” (most abstract) node type available that captures the user-intended operation. Lowering transformations (e.g. during code generation) may then convert these operations to a less abstract, more uniform representation.

    Operations that introduce nontrivial mappings on indices (e.g. reshape, strided slice, roll) are identified as potential candidates for being captured in their own high-level node rather than as a pytato.array.IndexLambda.
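
The interplay of lazy evaluation, eagerly known shape and dtype, and lowering can be illustrated with a small standalone sketch. The classes below (Placeholder, Roll, IndexLambda) and the functions lower and evaluate are hypothetical stand-ins written for this illustration only; they borrow names from the text but are not pytato's actual classes or API, and the sketch is restricted to one-dimensional arrays.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Toy stand-ins for illustration only; NOT pytato's real classes.


@dataclass(frozen=True)
class Placeholder:
    name: str
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class Roll:
    """High-level node: cyclically shift a 1D array by `shift` entries."""
    array: object
    shift: int

    @property
    def shape(self):
        return self.array.shape  # available without computing any data

    @property
    def dtype(self):
        return self.array.dtype  # available without computing any data


@dataclass(frozen=True)
class IndexLambda:
    """Low-level node: entry i of the result is expr(i, source_data)."""
    expr: Callable
    shape: Tuple[int, ...]
    dtype: str
    array: object


def lower(node):
    """Rewrite a Roll into an equivalent IndexLambda; keep other nodes."""
    if isinstance(node, Roll):
        n, = node.shape
        shift = node.shift
        return IndexLambda(
            expr=lambda i, src: src[(i - shift) % n],
            shape=node.shape, dtype=node.dtype, array=lower(node.array))
    return node


def evaluate(node, data):
    """Evaluation happens only when the caller asks for it."""
    if isinstance(node, Placeholder):
        return list(data[node.name])
    if isinstance(node, IndexLambda):
        src = evaluate(node.array, data)
        return [node.expr(i, src) for i in range(node.shape[0])]
    raise TypeError(f"lower {type(node).__name__} before evaluating")


a = Placeholder("a", shape=(4,), dtype="int64")
rolled = Roll(a, shift=1)      # builds a node; no data is touched
lowered = lower(rolled)        # less abstract, more uniform representation
result = evaluate(lowered, {"a": [10, 20, 30, 40]})
```

In this sketch the frontend records the user's intent as a Roll node, and only the lowering step turns it into the generic index-based form, which mirrors the division of labor described above.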

Naming

  • There is (for now) one Namespace per computation “universe” that defines the computational “environment”, by mapping identifiers to array expressions (note: DictOfNamedArrays instances may not be named, but their constituent parts can, by using pytato.array.AttributeLookup). Operations involving array expressions not using the same namespace are prohibited.

  • Names in the Namespace are under user control and unique. That is, new names in the Namespace that are not Reserved Identifiers are never generated automatically without explicit user input.

  • The (array) value associated with a name is immutable once evaluated. In-place slice assignment may be simulated by returning a new node realizing a “partial replacement”.

  • For arrays with data-dependent shapes, such as fancy indexing:

    A[A > 0]
    

    it may be necessary to automatically generate names, in this case to describe the shape of the index array used to realize the access A[A>0]. These will be drawn from the reserved namespace _pt_shp. Users may control the naming of these counts by assigning the tag pytato.array.CountNamed, like so:

    A[(A > 0).tagged(CountNamed("mycount"))]
    
  • pytato.array.Placeholder expressions, like all array expressions, are considered read-only. When computation begins, the same actual memory may be supplied for multiple placeholder names, i.e. those arrays may alias.

    Note

    This does not preclude the arrays being declared with C’s *restrict qualifier in generated code, as they do not alias any data that is being modified.
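
The naming rules above can be sketched with a toy namespace. Everything in this block is an assumption made for illustration: the Namespace class here is not pytato's, and assign, assign_count, NamedExpr, add, and the "_pt_shp" plus running number format are hypothetical. The sketch only shows the intended behavior: user names are unique and bound once, reserved names are generated only internally, and operands from different namespaces are rejected.

```python
import itertools


class Namespace:
    """Toy namespace: maps unique, user-chosen names to array expressions."""

    def __init__(self):
        self._entries = {}
        self._shape_counter = itertools.count()

    def assign(self, name, value):
        """Bind a user-chosen name exactly once."""
        if not name.isidentifier():
            raise ValueError(f"{name!r} is not an identifier")
        if name.startswith("_pt_"):
            raise ValueError(f"{name!r} is reserved for internal use")
        return self._bind(name, value)

    def assign_count(self, value, user_name=None):
        """Name a data-dependent count; generate a name only if none given."""
        if user_name is not None:
            return self.assign(user_name, value)
        # Assumed format: reserved prefix followed by a running number.
        return self._bind(f"_pt_shp{next(self._shape_counter)}", value)

    def _bind(self, name, value):
        if name in self._entries:
            raise ValueError(f"{name!r} is already bound")
        self._entries[name] = value
        return name

    def __contains__(self, name):
        return name in self._entries


class NamedExpr:
    """Toy array expression that registers itself under a name."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = namespace.assign(name, self)


def add(x, y):
    """Combine two expressions, which must share one namespace."""
    if x.namespace is not y.namespace:
        raise ValueError("operands use different namespaces")
    return ("add", x.name, y.name)


ns = Namespace()
a = NamedExpr(ns, "a")
b = NamedExpr(ns, "b")
total = add(a, b)                                   # same namespace: allowed
auto = ns.assign_count("count of A > 0")            # internally generated name
named = ns.assign_count("count of B > 0", "mycount")  # user-controlled name

errors = []
for attempt in (
    lambda: NamedExpr(ns, "a"),                     # duplicate name
    lambda: add(a, NamedExpr(Namespace(), "c")),    # foreign namespace
):
    try:
        attempt()
    except ValueError as exc:
        errors.append(str(exc))
```

The user-supplied "mycount" plays the role that the CountNamed tag plays in the text: it replaces the automatically generated name with one the user chose.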

Reserved Identifiers

  • Identifiers beginning with _pt_ are reserved for internal use by pytato. Any such internal use must be drawn from one of the following sub-regions, identified by their identifier prefixes:

    • _pt_shp: Used to automatically generate identifiers used in data-dependent shapes.

    • _pt_out: The default name of an unnamed output argument.

    • _pt_data: Used to automatically generate identifiers for names of DataWrapper arguments that are not supplied by the user.

  • Identifiers used in index lambdas are also reserved. These include:

    • Identifiers matching the regular expression _[0-9]+. They are used as index (“iname”) placeholders.

    • Identifiers matching the regular expression _r[0-9]+. They are used as reduction indices.

    • Identifiers matching the regular expression _in[0-9]+. They are used as automatically generated names (if required) in pytato.array.IndexLambda.bindings.
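
As a quick way to check which of the reserved regions a given identifier falls into, the prefixes and regular expressions listed above can be combined into a small classifier. The function name classify and the returned labels are hypothetical and chosen for this sketch; only the prefixes and patterns themselves come from the lists above. Whole-string matching via re.fullmatch is an assumption about how the patterns are meant to apply.

```python
import re

# Sub-regions of the "_pt_" prefix, as listed above.
PT_PREFIXES = {
    "_pt_shp": "data-dependent shape name",
    "_pt_out": "default output name",
    "_pt_data": "DataWrapper argument name",
}

# Identifiers reserved for use inside index lambdas, as listed above.
INDEX_LAMBDA_PATTERNS = {
    r"_[0-9]+": "index placeholder",
    r"_r[0-9]+": "reduction index",
    r"_in[0-9]+": "binding name",
}


def classify(name):
    """Return a label for a reserved identifier, or None if it is free."""
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not an identifier")
    for prefix, label in PT_PREFIXES.items():
        if name.startswith(prefix):
            return label
    if name.startswith("_pt_"):
        return "reserved, no sub-region assigned"
    for pattern, label in INDEX_LAMBDA_PATTERNS.items():
        if re.fullmatch(pattern, name):
            return label
    return None


labels = {name: classify(name)
          for name in ["_0", "_r12", "_in3", "_pt_shp0", "_input", "mycount"]}
```

Note that "_input" is not reserved under these rules: it begins with "_in" but is not followed exclusively by digits, so the full-string match fails.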

Glossary

array expression

An object implementing the Array interface.

array result

An array expression or an instance of DictOfNamedArrays.

identifier

Any string for which str.isidentifier() returns True. See also Reserved Identifiers.

namespace name

The name by which an array expression is known in a Namespace.

placeholder name

See pytato.array.Placeholder.name.