hidet.graph

Classes:

Operator(inputs, attributes, task)

An operator that takes tensors as input and produces tensors as output.

FlowGraph(outputs[, inputs, nodes])

The computation graph representation.

PassContext()

Graph-level pass context.

GraphPassInstrument()

Graph pass instrument.

Functions:

asarray(obj, /, *[, dtype, device])

Convert a list, tuple, or numpy ndarray to a hidet tensor.

randn(shape[, dtype, mean, stddev, device])

Create a tensor with normally distributed values.

empty(shape[, dtype, device, layout])

Create an uninitialized tensor.

zeros(shape[, dtype, device])

Create a tensor initialized with zero.

ones(shape[, dtype, device])

Create a tensor initialized with one.

symbol(shape[, dtype, device, layout])

Create a symbolic tensor.

randn_like(data[, mean, stddev, shape, ...])

Create a randomly initialized tensor with the same shape, dtype, and device as the given tensor.

empty_like(data[, shape, dtype, device, layout])

Create an uninitialized tensor with the same shape, dtype, and device as the given tensor.

zeros_like(data[, shape, dtype, device])

Create a tensor initialized with zero with the same shape, dtype, and device as the given tensor.

ones_like(data[, shape, dtype, device])

Create a tensor initialized with one with the same shape, dtype, and device as the given tensor.

symbol_like(data[, shape, dtype, device, layout])

Create a symbolic tensor like an existing tensor.

full(shape, fill_value[, dtype, device])

Create a tensor initialized with the given constant.

full_like(data, fill_value[, shape, dtype, ...])

Create a tensor initialized with fill_value with the same shape, dtype, and device as the given tensor.

from_numpy(nparray)

Create a tensor from a numpy array, sharing the memory with the numpy array when possible.

from_dlpack(dltensor)

Create a hidet tensor from an object that implements the __dlpack__ protocol.

from_torch(torch_tensor)

Create a hidet tensor from a PyTorch tensor.

trace_from(tensor[, inputs])

Trace the flow graph given the output tensor(s).

optimize(graph)

Optimize a flow graph.

class hidet.graph.Operator(inputs, attributes, task)[source]

An operator that takes tensors as input and produces tensors as output.

Attributes:

device

Get the device of the output tensor of this operator.

build_target

Get the build target of this operator.

Parameters:
  • inputs (List[Tensor]) –

  • attributes (Dict[str, Any]) –

  • task (Task | None) –

property device: Device

Get the device of the output tensor of this operator.

Returns:

ret – The device of the output tensor of this operator.

Return type:

Device

property build_target: str

Get the build target of this operator.

Returns:

ret – The build target of this operator.

Return type:

str

class hidet.graph.FlowGraph(outputs, inputs=None, nodes=None)[source]

The computation graph representation.

Methods:

__call__(*inputs)

Run the computation graph.

forward(inputs)

Run the computation graph.

save(model_file)

Save the flow graph to a file.

load(model_file)

Load a flow graph from a file.

build(*[, space])

Build the flow graph into a compiled graph (hidet.runtime.CompiledGraph).

cuda_graph()

Create a CudaGraph from FlowGraph.

latency([warmup, number, repeat, median, ...])

Measure the latency of the flow graph.

vcuda_()

Cast the flow graph to the vcuda device in place.

cuda_()

Cast the flow graph from the vcuda device to the cuda device in place.

Attributes:

nodes

The list of operators in the computation graph.

usage_count

The usage count of each tensor in the computation graph.

Parameters:
  • outputs (Sequence[Tensor]) –

  • inputs (Optional[Sequence[Tensor]]) –

__call__(*inputs)[source]

Run the computation graph.

Parameters:

inputs (Sequence[Tensor]) – The input tensors.

Returns:

ret – The output tensors. If there is only one output, return it directly.

Return type:

Union[List[Tensor], Tensor]

property nodes: List[Operator]

The list of operators in the computation graph.

property usage_count: Dict[Tensor, int]

The usage count of each tensor in the computation graph.
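As a rough illustration of what a usage count means, the sketch below counts how often each tensor is consumed in a toy graph. The `ToyOp` class and the string tensor names are hypothetical stand-ins rather than hidet types, and treating a graph output as one additional use is an assumption, not something this page states.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyOp:
    """Hypothetical stand-in for an operator: named input and output tensors."""
    name: str
    inputs: List[str]
    outputs: List[str]


def count_usage(nodes: List[ToyOp], graph_outputs: List[str]) -> Dict[str, int]:
    """Count how many times each tensor is read by an operator or returned by the graph."""
    usage = Counter()
    for node in nodes:
        for tensor in node.inputs:
            usage[tensor] += 1
    # Assumption: being a graph output counts as one additional use.
    for tensor in graph_outputs:
        usage[tensor] += 1
    return dict(usage)


# x feeds two operators; their results are added to form the graph output z.
nodes = [
    ToyOp('relu', inputs=['x'], outputs=['a']),
    ToyOp('neg', inputs=['x'], outputs=['b']),
    ToyOp('add', inputs=['a', 'b'], outputs=['z']),
]
usage = count_usage(nodes, graph_outputs=['z'])
print(usage)
```

A count of this kind is what lets a runtime decide when an intermediate tensor is no longer needed: once all of its uses have executed, its memory can be released.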

forward(inputs)[source]

Run the computation graph.

Parameters:

inputs (List[Tensor]) – The input tensors. They should be consistent with the symbolic inputs of the computation graph.

Returns:

output – The output tensors of the computation graph.

Return type:

List[Tensor]

save(model_file)[source]

Save the flow graph to a file.

Parameters:

model_file (str) – The model file to store the flow graph.

static load(model_file)[source]

Load a flow graph from a file.

Parameters:

model_file (str) – The path to the flow graph.

Returns:

ret – The loaded flow graph.

Return type:

FlowGraph

build(*, space=0)[source]

Build the flow graph into a compiled graph (hidet.runtime.CompiledGraph).

Parameters:

space (int) – The schedule search space used to compile each operator. Candidates are 0, 1 and 2. Space 0 compiles each operator with the default schedule. Space 1 tries a small set of schedules for each operator. Space 2 tries a large set of schedules for each operator. The larger the space, the more schedules are tried and the better the resulting performance is likely to be, at the cost of longer compilation and tuning time.

Returns:

ret – The compiled graph.

Return type:

hidet.runtime.CompiledGraph

cuda_graph()[source]

Create a CudaGraph from FlowGraph.

Returns:

ret – The created cuda graph.

Return type:

hidet.cuda.graph.CudaGraph

latency(warmup=1, number=3, repeat=3, median=True, dummy_inputs=None)[source]

Measure the latency of the flow graph.

Parameters:
  • warmup (int) – The number of warmup runs.

  • number (int) – The number of runs to measure the latency.

  • repeat (int) – The number of times to repeat the measurement.

  • median (bool) – Whether to return the median latency.

  • dummy_inputs (Optional[Sequence[Tensor]]) – The dummy inputs used to run the flow graph. If not given, automatically generated dummy inputs are used.

Returns:

ret – The measured latency in milliseconds.

Return type:

Union[float, List[float]]
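To make the roles of warmup, number, repeat and median concrete, here is a minimal pure-Python sketch of a measurement loop with the same parameters. It times an arbitrary callable rather than a flow graph, and the aggregation it uses (each repeat reports the mean over `number` runs, and the median is taken across repeats) is an assumption for illustration, not a description of hidet's implementation.

```python
import statistics
import time
from typing import Callable, List, Union


def measure_latency(
    run: Callable[[], object],
    warmup: int = 1,
    number: int = 3,
    repeat: int = 3,
    median: bool = True,
) -> Union[float, List[float]]:
    """Time `run` and return the latency in milliseconds."""
    for _ in range(warmup):
        run()  # warmup runs are executed but not timed
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            run()
        elapsed = time.perf_counter() - start
        # Assumption: one measurement is the mean over `number` runs.
        results.append(elapsed * 1000.0 / number)
    return statistics.median(results) if median else results


single = measure_latency(lambda: sum(range(1000)))
per_repeat = measure_latency(lambda: sum(range(1000)), repeat=5, median=False)
print(f'median: {single:.4f} ms, all repeats: {per_repeat}')
```

With median=True the result is a single float. With median=False it is a list with one entry per repeat, which matches the Union[float, List[float]] return type above.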

vcuda_()[source]

Cast the flow graph to the vcuda device in place.

Return type:

None

cuda_()[source]

Cast the flow graph from the vcuda device to the cuda device in place.

Return type:

None

class hidet.graph.PassContext[source]

Graph-level pass context.

Use the pass context to control the behavior of optimization passes. Normally, we can optimize a flow graph by directly calling hidet.graph.optimize():

graph_opt = hidet.graph.optimize(graph)

This will optimize the given flow graph in a default context.

To customize the optimizations, run the optimize() function within a custom hidet.graph.PassContext:

with hidet.graph.PassContext() as ctx:
    # configure the context
    ctx.profile_pass_instrument(print_stdout=True)  # print elapsed time for each pass
    ctx.save_graph_instrument(out_dir='./outs')  # save the output of each pass as text
    ctx.set_precision(dtype='float16')  # use float16 as the data type
    ctx.set_reduce_precision(dtype='float32')  # use float32 for reduction accumulation
    ctx.set_mma('mma')  # use TensorCore in NVIDIA GPUs to accelerate matmul and conv2d
    ...   # other configs

    # call optimize function
    graph_opt = hidet.graph.optimize(graph)

Please refer to the member functions of this class for the available configs and their usage.

instruments

The graph pass instruments that will be applied before and after each pass. The instruments are applied in order. See hidet.graph.GraphPassInstrument for how to add a custom instrument.

Type:

List[GraphPassInstrument]

configs

The current configs of the pass context.

Type:

Dict[str, Any]

Methods:

current()

Get the current pass context.

set_precision([dtype])

Set the target precision to use as the output of most operators.

add_quantize_rules(patterns)

Adds selective quantization rules to the pass context.

set_reduce_precision([dtype])

Set the target precision used for accumulation results.

set_use_attention([flag])

Set whether to use the fused attention schedule.

set_verbose()

Allow each graph-level pass to print detailed information about its lowering and optimization.

set_mma(mma)

Specify the matrix-multiply-accumulate (mma) computation primitives used in matrix multiplication and convolution.

set_parallel_k([disabled, default, search, ...])

Set the strategy for parallelizing along the reduction dimension of matrix multiplication and convolution.

save_graph_instrument(out_dir)

Save the computation graph after each pass to the given output directory.

profile_pass_instrument([log_file, print_stdout])

Profile the time of each pass.

reduce_cuda_compile_mem([enable])

Reduce the CUDA memory used during compilation by using vcuda tensors. This may increase compile time.

classmethod current()[source]

Get the current pass context.

Returns:

ret – The current pass context.

Return type:

PassContext
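The relationship between the with statement and current() can be sketched with a small stack of contexts. `ToyPassContext` below is a hypothetical stand-in written for illustration. Both the stack and the fallback to a default context when none is active are assumptions about how such a lookup is commonly built, not a description of hidet's internals.

```python
from typing import Any, Dict, List


class ToyPassContext:
    """Hypothetical stand-in that mimics a stack-based "current context" lookup."""

    _stack: List['ToyPassContext'] = []

    def __init__(self) -> None:
        self.configs: Dict[str, Any] = {'precision': None}

    def __enter__(self) -> 'ToyPassContext':
        ToyPassContext._stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        ToyPassContext._stack.pop()

    @classmethod
    def current(cls) -> 'ToyPassContext':
        # Assumption: with no active context, fall back to a default one.
        if not cls._stack:
            cls._stack.append(cls())
        return cls._stack[-1]


default_ctx = ToyPassContext.current()
with ToyPassContext() as outer:
    outer.configs['precision'] = 'float16'
    inside_outer = ToyPassContext.current()
    with ToyPassContext() as inner:
        inside_inner = ToyPassContext.current()
    after_inner = ToyPassContext.current()
after_all = ToyPassContext.current()

print(inside_outer is outer, inside_inner is inner, after_inner is outer, after_all is default_ctx)
```

In this sketch the innermost active context always wins, and leaving a with block restores the enclosing one, so code that looks up the current context sees the configuration of the closest surrounding block.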

set_precision(dtype=None)[source]

Set the target precision to use as the output of most operators. To retain accuracy, some operators will still use the original data type.

Parameters:

dtype (Optional[str]) –

The target dtype to mix the precision of the model. Candidates:

  • None – Do not mix the precision.

  • 'int8' – Convert the model to the float16 data type, then selectively quantize subgraphs using the default quantize patterns. For greater flexibility and control over quantization, use add_quantize_rules() to selectively quantize subgraphs using custom rules.

  • 'float16' – Convert the model to the float16 data type.

  • 'bfloat16' – Convert the model to the bfloat16 data type.

  • 'float32' – Convert the model to the float32 data type.

Return type:

PassContext

add_quantize_rules(patterns)[source]

Adds selective quantization rules to the pass context.

Parameters:

patterns (List[SubgraphRewriteRule]) – The rules used to selectively quantize subgraphs. The new rules are added on top of the existing ones and are applied after them.

Return type:

PassContext

set_reduce_precision(dtype=None)[source]

Set the target precision used for accumulation results. Operators like reduce_mean, reduce_avg, matrix multiplication and convolution will reduce along some dimensions. We might want to use a data type with more precision to accumulate the results for more accuracy.

Parameters:

dtype (Optional[str]) –

The target dtype to use for accumulation. Candidates:

  • None – Use the same dtype as the inputs of the operators.

  • 'float16' – Use 'float16' to accumulate. Only valid when set_precision('float16') has been used.

  • 'float32' – Use 'float32' to accumulate.

Return type:

PassContext
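The effect of the accumulation dtype can be demonstrated without hidet. The sketch below emulates float16 and float32 accumulators with the standard struct module, rounding the running sum to the target precision after every addition. The helper names `round_to` and `accumulate` are hypothetical and exist only for this illustration.

```python
import struct


def round_to(fmt: str, value: float) -> float:
    """Round a Python float to the precision of a struct format ('<e' = float16, '<f' = float32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(values, fmt: str) -> float:
    """Sum `values`, rounding the running total to the given precision after every addition."""
    total = 0.0
    for v in values:
        total = round_to(fmt, total + v)
    return total


ones = [1.0] * 4096
sum_fp16 = accumulate(ones, '<e')  # float16 accumulator
sum_fp32 = accumulate(ones, '<f')  # float32 accumulator
print(sum_fp16, sum_fp32)
```

The float16 accumulator stalls at 2048.0: above that value neighbouring float16 numbers are 2 apart, so adding 1.0 rounds back to the same number. The float32 accumulator reaches the exact result 4096.0. Long reductions in matrix multiplication and convolution face the same issue, which is why a wider accumulation dtype can be worth its cost.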

set_use_attention(flag=False)[source]

Set whether to use the fused attention schedule.

Return type:

PassContext

set_verbose()[source]

Allow each graph-level pass to print detailed information about its lowering and optimization.

Return type:

PassContext

set_mma(mma)[source]

Specify the matrix-multiply-accumulate (mma) computation primitives used in matrix multiplication and convolution.

Parameters:

mma (str) –

The mma computation primitive to use. Candidates:

  • 'simt' – Use CUDA cores.

  • 'mma' – Use mma instructions.

Return type:

PassContext

set_parallel_k(disabled=False, default=False, search=False, nparts=None)[source]

Set the strategy for parallelizing along the reduction dimension of matrix multiplication and convolution.

Only one of these parameters should be specified.

Parameters:
  • disabled (bool) – Disable parallelization along the reduction dimension.

  • default (bool) – Allow hidet to figure out the parallel factor.

  • search (bool) – Whether to search for the parallel factor.

  • nparts (Optional[int]) – Use a fixed parallel factor.
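Because the four options are mutually exclusive, a caller may want to validate a selection before passing it on. The helper below is a hypothetical sketch of such a check. `pick_parallel_k` is not part of hidet, and rejecting an empty selection is an assumption made for this example.

```python
from typing import Optional


def pick_parallel_k(
    disabled: bool = False,
    default: bool = False,
    search: bool = False,
    nparts: Optional[int] = None,
) -> str:
    """Return the name of the chosen strategy, rejecting zero or multiple selections."""
    chosen = [
        name
        for name, selected in (
            ('disabled', disabled),
            ('default', default),
            ('search', search),
            ('nparts', nparts is not None),
        )
        if selected
    ]
    if len(chosen) != 1:
        raise ValueError(f'expected exactly one strategy, got {chosen or "none"}')
    return chosen[0]


print(pick_parallel_k(nparts=4))
try:
    pick_parallel_k(default=True, search=True)
except ValueError as err:
    print('rejected:', err)
```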

save_graph_instrument(out_dir)[source]

Save the computation graph after each pass to the given output directory.

Parameters:

out_dir (str) – The directory to save graph.

Return type:

PassContext

profile_pass_instrument(log_file=None, print_stdout=False)[source]

Profile the time of each pass.

Parameters:
  • log_file (Optional[str]) – When given, write the elapsed time for each pass to this file.

  • print_stdout (bool) – Whether to print the elapsed time for each pass to standard output.

Return type:

PassContext

reduce_cuda_compile_mem(enable=None)[source]

Reduce the CUDA memory used during compilation by using vcuda tensors. This may increase compile time.

Parameters:

enable (Optional[bool]) – When given, always enable or disable this instrument. If omitted, the compiler decides whether to enable it using heuristics.

class hidet.graph.GraphPassInstrument[source]

Graph pass instrument.

This class defines the interface for graph pass instruments. An instrument defines the functions that are called before and after each pass. This can be used to collect information about graph passes. Currently, an instrument cannot modify the flow graph passed to it (such functionality should be implemented as a graph pass).

To define a custom graph pass instrument and use it:

import hidet
from hidet.graph import FlowGraph  # needed for the type annotations below

# define custom instrument and implement instrument functions
class MyInstrument(hidet.graph.GraphPassInstrument):
    def before_all_passes(self, graph: FlowGraph) -> None:
        print('before all passes')

    def before_pass(self, pass_name: str, graph: FlowGraph) -> None:
        print('before pass', pass_name)

    def after_pass(self, pass_name: str, graph: FlowGraph) -> None:
        print('after pass', pass_name)

    def after_all_passes(self, graph: FlowGraph) -> None:
        print('after all passes')

graph = hidet.graph.FlowGraph(outputs=[])   # empty flow graph
with hidet.graph.PassContext() as ctx:
    # add custom instrument to pass context
    ctx.instruments.append(MyInstrument())
    # optimize flow graph
    hidet.graph.optimize(graph)

This produces output like:

before all passes
before pass FoldConstantPass
after pass FoldConstantPass
before pass PatternTransformPass
after pass PatternTransformPass
...
after all passes

Methods:

before_all_passes(graph)

Called before all passes are applied.

before_pass(pass_name, graph)

Called before each pass.

after_pass(pass_name, graph)

Called after each pass.

after_all_passes(graph)

Called after applying all passes.

before_all_passes(graph)[source]

Called before all passes are applied.

Parameters:

graph (FlowGraph) – The flow graph before applying all passes.

Return type:

None

before_pass(pass_name, graph)[source]

Called before each pass.

Parameters:
  • pass_name (str) – The name of the pass that is going to be applied.

  • graph (FlowGraph) – The flow graph before applying the pass.

Return type:

None

after_pass(pass_name, graph)[source]

Called after each pass.

Parameters:
  • pass_name (str) – The name of the pass that has been applied.

  • graph (FlowGraph) – The flow graph after applying the pass.

Return type:

None

after_all_passes(graph)[source]

Called after applying all passes.

Parameters:

graph (FlowGraph) – The flow graph after applying all passes.

Return type:

None
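As a further illustration of the four hooks, the sketch below records the elapsed time of each pass. To keep it runnable without hidet, `TimingInstrument` is a plain class and `run_passes` is a hypothetical driver that calls the hooks in the documented order. In real use the class would subclass hidet.graph.GraphPassInstrument and be appended to ctx.instruments as shown above.

```python
import time
from typing import Dict, List


class TimingInstrument:
    """Records the elapsed time of each pass (plain class standing in for a real instrument)."""

    def __init__(self) -> None:
        self.elapsed: Dict[str, float] = {}
        self._start: float = 0.0

    def before_all_passes(self, graph) -> None:
        self.elapsed.clear()

    def before_pass(self, pass_name: str, graph) -> None:
        self._start = time.perf_counter()

    def after_pass(self, pass_name: str, graph) -> None:
        self.elapsed[pass_name] = time.perf_counter() - self._start

    def after_all_passes(self, graph) -> None:
        for name, seconds in self.elapsed.items():
            print(f'{name}: {seconds * 1000:.3f} ms')


def run_passes(graph, pass_names: List[str], instrument: TimingInstrument) -> None:
    """Hypothetical driver that calls the hooks in the documented order."""
    instrument.before_all_passes(graph)
    for name in pass_names:
        instrument.before_pass(name, graph)
        # A real pass would transform the graph here.
        instrument.after_pass(name, graph)
    instrument.after_all_passes(graph)


timer = TimingInstrument()
run_passes(graph=None, pass_names=['FoldConstantPass', 'PatternTransformPass'], instrument=timer)
```

For plain timing, PassContext.profile_pass_instrument() already provides this behavior. A custom instrument is mainly useful when the collected information is specific to your workflow.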

hidet.graph.asarray(obj, /, *, dtype=None, device=None)[source]

Convert a list, tuple, or numpy ndarray to a hidet tensor.

Parameters:
  • obj (bool, int, float, List, Tuple, Tensor, np.ndarray) – The object to be converted.

  • dtype (DataType or str, optional) – The data type of the output tensor.

  • device (Device or str, optional) – The device of the output tensor.

Returns:

ret – The hidet tensor converted from given object.

Return type:

Tensor

hidet.graph.randn(shape, dtype='float32', mean=0.0, stddev=1.0, device='cpu')[source]

Create a tensor with normally distributed values.

Parameters:
  • shape (Sequence[int]) – The shape of the new tensor.

  • dtype (DataType or str, default 'float32') – The data type of the tensor elements.

  • mean (float, default 0.0) – The mean of the normal distribution.

  • stddev (float, default 1.0) – The standard deviation of the normal distribution.

  • device (Device or str, default 'cpu') – The device on which the new tensor is created.

Returns:

ret – The created tensor.

Return type:

Tensor

Examples

>>> randn([2, 3], device='cuda')
Tensor(shape=[2, 3], dtype='float32', device='cuda')
[[ 0.10720467 -1.6906018   0.06347568]
 [-0.37061226  0.562728    1.857547  ]]

hidet.graph.empty(shape, dtype='float32', device='cpu', layout=None)[source]

Create an uninitialized tensor.

Parameters:
  • shape (Sequence[int]) – The shape of the new tensor.

  • dtype (DataType or str, default 'float32') – The data type of the tensor elements.

  • device (Device or str, default 'cpu') – The device on which the new tensor is created.

  • layout (DataLayout, optional) – The layout of the new tensor. None indicates the default layout (row-major layout).

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.zeros(shape, dtype='float32', device='cpu')[source]

Create a tensor initialized with zero.

Parameters:
  • shape (Sequence[int]) – The shape of the new tensor.

  • dtype (DataType or str, default 'float32') – The data type of the tensor elements.

  • device (Device or str, default 'cpu') – The device on which the new tensor is created.

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.ones(shape, dtype='float32', device='cpu')[source]

Create a tensor initialized with one.

Parameters:
  • shape (Sequence[int]) – The shape of the new tensor.

  • dtype (DataType or str, default 'float32') – The data type of the tensor elements.

  • device (Device or str, default 'cpu') – The device on which the new tensor is created.

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.symbol(shape, dtype='float32', device='cpu', layout=None)[source]

Create a symbolic tensor.

Parameters:
  • shape (Sequence[Union[int, str, Expr]]) – The shape of the new tensor. The shape can contain symbolic variables. A str entry indicates that the corresponding dimension is a symbolic variable with the given name.

  • dtype (DataType or str, default 'float32') – The data type of the tensor elements.

  • device (Device or str, default 'cpu') – The device on which the new tensor is created.

  • layout (DataLayout, optional) – The layout of the new tensor. None indicates the default layout (row-major layout).

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.randn_like(data, mean=0.0, stddev=1.0, shape=None, dtype=None, device=None)[source]

Create a randomly initialized tensor with the same shape, dtype, and device as the given tensor.

Parameters:
  • data (Tensor) – The tensor to copy shape, dtype, and device from.

  • mean (float, optional) – The mean of the normal distribution.

  • stddev (float, optional) – The standard deviation of the normal distribution.

  • shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.

  • dtype (DataType or str, optional) – The data type of the tensor elements. If None, the dtype of data is used.

  • device (Device or str, optional) – The device on which the new tensor is created. If None, the device of data is used.

Returns:

ret – The created tensor with random values sampled from a normal distribution.

Return type:

Tensor

hidet.graph.empty_like(data, shape=None, dtype=None, device=None, layout=None)[source]

Create an uninitialized tensor with the same shape, dtype, and device as the given tensor.

Parameters:
  • data (Tensor) – The tensor to copy shape, dtype, and device from.

  • shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.

  • dtype (DataType or str, optional) – The data type of the tensor elements. If None, the dtype of data is used.

  • device (Device or str, optional) – The device on which the new tensor is created. If None, the device of data is used.

  • layout (DataLayout, optional) – The layout of the new tensor. If None, the layout of data is used.

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.zeros_like(data, shape=None, dtype=None, device=None)[source]

Create a tensor initialized with zero with the same shape, dtype, and device as the given tensor.

Parameters:
  • data (Tensor) – The tensor to copy shape, dtype, and device from.

  • shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.

  • dtype (DataType or str, optional) – The data type of the tensor elements. If None, the dtype of data is used.

  • device (Device or str, optional) – The device on which the new tensor is created. If None, the device of data is used.

Returns:

ret – The created tensor with all elements as zero.

Return type:

Tensor

hidet.graph.ones_like(data, shape=None, dtype=None, device=None)[source]

Create a tensor initialized with one with the same shape, dtype, and device as the given tensor.

Parameters:
  • data (Tensor) – The tensor to copy shape, dtype, and device from.

  • shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.

  • dtype (DataType or str, optional) – The data type of the tensor elements. If None, the dtype of data is used.

  • device (Device or str, optional) – The device on which the new tensor is created. If None, the device of data is used.

Returns:

ret – The created tensor with all elements as one.

Return type:

Tensor

hidet.graph.symbol_like(data, shape=None, dtype=None, device=None, layout=None)[source]

Create a symbolic tensor like an existing tensor.

Parameters:
  • data (Tensor) – The tensor to copy shape, dtype, and device from.

  • shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.

  • dtype (DataType or str, optional) – The data type of the tensor elements. If None, the dtype of data is used.

  • device (Device or str, optional) – The device on which the new tensor is created. If None, the device of data is used.

  • layout (DataLayout, optional) – The layout of the new tensor. If None, the layout of data is used.

Returns:

ret – The created symbolic tensor.

Return type:

Tensor

hidet.graph.full(shape, fill_value, dtype='float32', device='cpu')[source]

Create a tensor initialized with the given constant.

Parameters:
  • shape (Sequence[int]) – The shape of the new tensor.

  • fill_value (float or int or hidet.ir.Constant) – The constant used to initialize the new tensor.

  • dtype (DataType or str, default 'float32') – The data type of the tensor elements.

  • device (Device or str, default 'cpu') – The device on which the new tensor is created.

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.full_like(data, fill_value, shape=None, dtype=None, device=None)[source]

Create a tensor initialized with fill_value with the same shape, dtype, and device as the given tensor.

Parameters:
  • data (Tensor) – The tensor to copy shape, dtype, and device from.

  • fill_value (int, float, bool, complex) – The value to fill the tensor with.

  • shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.

  • dtype (DataType or str, optional) – The data type of the tensor elements. If None, the dtype of data is used.

  • device (Device or str, optional) – The device on which the new tensor is created. If None, the device of data is used.

Returns:

ret – The created tensor with all elements as fill_value.

Return type:

Tensor

hidet.graph.from_numpy(nparray)[source]

Create a tensor from a numpy array, sharing the memory with the numpy array when possible.

Parameters:

nparray (numpy.ndarray) – The numpy array to create the tensor from.

Returns:

ret – The created tensor.

Return type:

Tensor

hidet.graph.from_dlpack(dltensor)[source]

Create a hidet tensor from an object that implements the __dlpack__ protocol.

Parameters:

dltensor (an object that implements the DLPack protocol.) – The object must have the method __dlpack__ that returns a PyCapsule object with name dltensor.

Returns:

ret – The hidet tensor that shares the same storage with the DLPack tensor.

Return type:

Tensor

hidet.graph.from_torch(torch_tensor)[source]

Create a hidet tensor from a PyTorch tensor.

The created tensor shares memory with the given PyTorch tensor, so any modification to the contents of one tensor is reflected in the other.

Parameters:

torch_tensor (torch.Tensor) – The pytorch tensor.

Returns:

ret – The created hidet tensor.

Return type:

Tensor

hidet.graph.trace_from(tensor, inputs=None)[source]

Trace the flow graph given the output tensor(s).

Each hidet.graph.Tensor has an attribute hidet.graph.Tensor.trace which indicates how the tensor is generated. If the tensor is generated by an operator with symbolic input(s), the tensor itself is also symbolic. And the tensor will have a reference to the operator that generates it. The reference is stored in this attribute.

This function walks through the trace of the given tensor(s) and constructs a flow graph.

When there are multiple symbolic inputs, the inputs argument must be specified explicitly to avoid ambiguity.

Parameters:
  • tensor (Tensor or List[Tensor]) – The output tensor(s) that we trace from.

  • inputs (Tensor or List[Tensor], optional) – The inputs of the flow graph. When there is only a single symbolic tensor in the flow graph, this argument is optional. When there are multiple inputs, it is required in order to specify the input order.

Returns:

ret – The flow graph whose outputs are the given tensor(s).

Return type:

FlowGraph
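The backward walk described above can be sketched with toy classes. `ToyTensor`, `ToyOp`, `apply` and `trace_nodes` are hypothetical names introduced for this illustration. The sketch assumes that each tensor's trace refers directly to the operator that produced it, and it is not hidet's implementation.

```python
from typing import List, Optional, Tuple


class ToyOp:
    """Hypothetical operator: remembers the tensors it consumes."""

    def __init__(self, name: str, inputs: List['ToyTensor']) -> None:
        self.name = name
        self.inputs = inputs


class ToyTensor:
    """Hypothetical tensor: `trace` is the producing operator, or None for a graph input."""

    def __init__(self, name: str, trace: Optional[ToyOp] = None) -> None:
        self.name = name
        self.trace = trace


def apply(name: str, *inputs: ToyTensor) -> ToyTensor:
    """Create an operator over `inputs` and return its output tensor."""
    return ToyTensor(name + '_out', trace=ToyOp(name, list(inputs)))


def trace_nodes(output: ToyTensor) -> Tuple[List[ToyOp], List[ToyTensor]]:
    """Walk the trace backwards; return (operators in execution order, free input tensors)."""
    nodes: List[ToyOp] = []
    free_inputs: List[ToyTensor] = []
    seen = set()

    def visit(tensor: ToyTensor) -> None:
        if tensor.trace is None:
            if id(tensor) not in seen:
                seen.add(id(tensor))
                free_inputs.append(tensor)
            return
        op = tensor.trace
        if id(op) in seen:
            return
        seen.add(id(op))
        for t in op.inputs:
            visit(t)
        nodes.append(op)  # post-order: producers come before consumers

    visit(output)
    return nodes, free_inputs


x = ToyTensor('x')
w = ToyTensor('w')
y = apply('add', apply('matmul', x, w), apply('relu', x))
nodes, inputs = trace_nodes(y)
print([op.name for op in nodes], [t.name for t in inputs])
```

In this sketch the order in which free inputs are discovered depends on the traversal order, which is one way to see why an explicit inputs argument removes the ambiguity when a graph has several symbolic inputs.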

hidet.graph.optimize(graph)[source]

Optimize a flow graph.

This function applies a sequence of predefined graph-level passes to a FlowGraph to conduct optimizations and graph transformations.

Tip

Some graph passes are configurable. Refer to hidet.graph.PassContext for more information on graph pass configuration.

Parameters:

graph (FlowGraph) – The flow graph to be optimized.

Returns:

ret – The optimized flow graph.

Return type:

FlowGraph