hidet.graph¶
Classes:

Operator: An operator that takes tensors as input and produces tensors as output.
FlowGraph: The computation graph representation.
PassContext: Graph-level pass context.
GraphPassInstrument: Graph pass instrument.

Functions:

asarray: Convert a list, tuple, or numpy ndarray to a hidet tensor.
randn: Create a tensor with normally distributed values.
empty: Create an uninitialized tensor.
zeros: Create a tensor initialized with zero.
ones: Create a tensor initialized with one.
symbol: Create a symbolic tensor.
randn_like: Create a randomly initialized tensor with the same shape, dtype, and device as the given tensor.
empty_like: Create an uninitialized tensor with the same shape, dtype, and device as the given tensor.
zeros_like: Create a tensor initialized with zero with the same shape, dtype, and device as the given tensor.
ones_like: Create a tensor initialized with one with the same shape, dtype, and device as the given tensor.
symbol_like: Create a symbol tensor like an existing tensor.
full: Create a tensor initialized with a given constant.
full_like: Create a tensor initialized with fill_value with the same shape, dtype, and device as the given tensor.
from_numpy: Create a tensor from a numpy array, sharing the memory with the numpy array when possible.
from_dlpack: Create a hidet tensor from an object that implements the __dlpack__ protocol.
from_torch: Create a hidet tensor from a PyTorch tensor.
trace_from: Trace the flow graph given the output tensor(s).
optimize: Optimize a flow graph.
 class hidet.graph.Operator(inputs, attributes, task)[source]¶
An operator that takes tensors as input and produces tensors as output.
 Parameters
inputs (List[hidet.Tensor]) –
attributes (Dict[str, Any]) –
task (Optional[hidet.ir.task.Task]) –
 class hidet.graph.FlowGraph(outputs, inputs=None, nodes=None)[source]¶
The computation graph representation.
Methods:

__call__(*inputs): Run the computation graph.
forward(*inputs): Run the computation graph.
save(model_file): Save the flow graph to a file.
load(model_file): Load a flow graph from a file.
cuda_graph(): Create a CudaGraph from FlowGraph.
latency([warmup, number, repeat, median, ...]): Measure the latency of the flow graph.

Attributes:

nodes: The list of operators in the computation graph.
usage_count: The usage count of each tensor in the computation graph.
 __call__(*inputs)[source]¶
Run the computation graph. See also FlowGraph.forward().
 Parameters
inputs (hidet.Tensor) –
 Return type
Union[List[hidet.Tensor], hidet.Tensor]
 property nodes: List[hidet.graph.operator.Operator]¶
The list of operators in the computation graph.
 property usage_count: Dict[hidet.Tensor, int]¶
The usage count of each tensor in the computation graph.
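As a rough illustration, the usage count is the number of operator inputs that refer to each tensor. The sketch below is hypothetical pure Python, not hidet's implementation; `DummyOp` and its `inputs` attribute merely mirror the Operator parameter documented on this page:

```python
from collections import defaultdict

class DummyOp:
    """Stand-in for an operator: only carries its input tensors."""
    def __init__(self, inputs):
        self.inputs = inputs

def compute_usage_count(nodes):
    # A tensor's usage count is how many operator inputs refer to it;
    # this tells an executor when an intermediate tensor can be freed.
    count = defaultdict(int)
    for op in nodes:
        for tensor in op.inputs:
            count[tensor] += 1
    return count

# example: tensor 'a' feeds two operators, 'b' feeds one
usage = compute_usage_count([DummyOp(['a', 'b']), DummyOp(['a'])])
```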
 forward(*inputs)[source]¶
Run the computation graph.
 Parameters
*inputs (Tensor) – The input tensors. They should be consistent with the symbolic inputs of the computation graph.
 Returns
output – If there is only one output, it is returned directly. Otherwise, a list of output tensors is returned.
 Return type
Union[List[hidet.Tensor], hidet.Tensor]
 save(model_file)[source]¶
Save the flow graph to a file.
 Parameters
model_file (str) – The model file to store the flow graph.
 static load(model_file)[source]¶
Load a flow graph from a file.
 Parameters
model_file (str) – The path to the flow graph.
 Returns
ret – The loaded flow graph.
 Return type
FlowGraph
 cuda_graph()[source]¶
Create a CudaGraph from FlowGraph.
 Returns
ret – The created cuda graph.
 Return type
CudaGraph
 latency(warmup=1, number=3, repeat=3, median=True, dummy_inputs=None)[source]¶
Measure the latency of the flow graph.
 Parameters
warmup (int) – The number of warmup runs.
number (int) – The number of runs to measure the latency.
repeat (int) – The number of times to repeat the measurement.
median (bool) – Whether to return the median latency.
dummy_inputs (Optional[Sequence[Tensor]]) – The dummy inputs to run the flow graph. If not given, automatically generated dummy inputs will be used.
 Returns
ret – The measured latency in milliseconds.
 Return type
Union[float, List[float]]
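The warmup/number/repeat/median scheme can be sketched in plain Python. This is a conceptual sketch of the timing methodology, not hidet's implementation; `fn` stands in for one run of the flow graph:

```python
import time
import statistics

def measure_latency(fn, warmup=1, number=3, repeat=3, median=True):
    # warmup runs are discarded to exclude one-time costs
    for _ in range(warmup):
        fn()
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            fn()
        end = time.perf_counter()
        # average latency per run within this repetition, in milliseconds
        results.append((end - start) * 1000.0 / number)
    # median=True collapses the repeat measurements into a single number
    return statistics.median(results) if median else results
```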
 class hidet.graph.PassContext[source]¶
Graph-level pass context.
Use the pass context to control the behavior of optimization passes. Normally, we can optimize a flow graph by directly calling hidet.graph.optimize():

    graph_opt = hidet.graph.optimize(graph)

This will optimize the given flow graph in a default context.
To customize the optimizations, run the optimize() function within a custom hidet.graph.PassContext:

    with hidet.graph.PassContext() as ctx:
        # config the context
        ctx.profile_pass_instrument(print_stdout=True)  # print elapsed time for each pass
        ctx.save_graph_instrument(out_dir='./outs')     # save the output of each pass as text
        ctx.set_precision(dtype='float16')              # use float16 as the data type
        ctx.set_reduce_precision(dtype='float32')       # use float32 for reduction accumulation
        ctx.set_mma('mma')                              # use Tensor Cores on NVIDIA GPUs to accelerate matmul and conv2d
        ...                                             # other configs

        # call the optimize function
        graph_opt = hidet.graph.optimize(graph)

Please refer to the member functions of this class for the available configs and their usage.
 instruments¶
The graph pass instruments that will be applied before and after each pass, in order. See hidet.graph.GraphPassInstrument for how to add a custom instrument.
 Type
List[GraphPassInstrument]
 configs¶
The current configs of the pass context.
 Type
Dict[str, Any]
Methods:

current(): Get the current pass context.
set_precision([dtype]): Set the target precision to use as the output of most operators.
set_reduce_precision([dtype]): Set the target precision used for accumulation results.
set_use_attention([flag]): Set whether to use the fused attention schedule.
set_verbose(): Allow each graph-level pass to print detailed information related to its lowering and optimization.
set_mma(mma): Specify the matrix-multiply-accumulate (mma) computation primitives used in matrix multiplication and convolution.
set_parallel_k([disabled, default, search, ...]): Set the strategy to parallelize on the reduction dimension for matrix multiplication and convolution.
save_graph_instrument(out_dir): Save the computation graph after each pass to the given output directory.
profile_pass_instrument([log_file, print_stdout]): Profile the time of each pass.
 classmethod current()[source]¶
Get the current pass context.
 Returns
ret – The current pass context.
 Return type
 set_precision(dtype=None)[source]¶
Set the target precision to use as the output of most operators. To retain accuracy, some operators will still use the original data type.
 Parameters
dtype (Optional[str]) –
The target dtype to mix the precision of the model. Candidates:
None: Do not mix the precision.
'float16': Convert the model into float16 data type.
'bfloat16': Convert the model into bfloat16 data type.
'float32': Convert the model into float32 data type.
 Return type
 set_reduce_precision(dtype=None)[source]¶
Set the target precision used for accumulation results. Operators like reduce_mean, reduce_avg, matrix multiplication, and convolution reduce along some dimensions. We might want to use a data type with more precision to accumulate the results for better accuracy.
 Parameters
dtype (Optional[str]) –
The target dtype to use for accumulation. Candidates:
None: Use the same data type as the inputs of the operators.
'float16': Use float16 to accumulate. Only valid when set_precision('float16') has been used.
'float32': Use float32 to accumulate.
 Return type
 set_verbose()[source]¶
Allow each graph-level pass to print detailed information related to its lowering and optimization.
 Return type
 set_mma(mma)[source]¶
Specify the matrix-multiply-accumulate (mma) computation primitives used in matrix multiplication and convolution.
 Parameters
mma (str) –
The mma computation primitive to use. Candidates:
'simt': Use CUDA cores.
'wmma': Use wmma instructions.
'mma': Use mma instructions.
 Return type
 set_parallel_k(disabled=False, default=False, search=False, nparts=None)[source]¶
Set the strategy to parallelize on the reduction dimension for matrix multiplication and convolution.
Only one of these parameters should be specified.
 Parameters
disabled (bool) – Disable the parallelization on the reduction dimension.
default (bool) – Allow hidet to figure out the parallel factor.
search (bool) – Search for the best parallel factor k.
nparts (Optional[int]) – Use a fixed parallel factor.
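The idea behind parallel-k can be illustrated on a single dot product: split the reduction dimension k into independent parts and sum the partial results. This is a hypothetical pure-Python sketch of the strategy; hidet applies it on the GPU, where each part can run in a separate thread block:

```python
def split_k_dot(a, b, nparts):
    # split the reduction dimension k into `nparts` independent partial
    # reductions; each part could run in parallel, at the cost of a
    # final summation over the partial results
    k = len(a)
    assert k % nparts == 0, "sketch assumes k divides evenly"
    part = k // nparts
    partials = [
        sum(a[i] * b[i] for i in range(p * part, (p + 1) * part))
        for p in range(nparts)
    ]
    return sum(partials)
```

Splitting helps when k is large but the output is small, since an unsplit reduction would leave most of the GPU idle.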
 save_graph_instrument(out_dir)[source]¶
Save the computation graph after each pass to the given output directory.
 Parameters
out_dir (str) – The directory to save the graphs.
 Return type
 profile_pass_instrument(log_file=None, print_stdout=False)[source]¶
Profile the time of each pass.
 Parameters
log_file (Optional[str]) – When given, write the elapsed time for each pass to this file.
print_stdout (bool) – Whether to print the elapsed time for each pass to standard output.
 Return type
 class hidet.graph.GraphPassInstrument[source]¶
Graph pass instrument.
This class defines the interface for graph pass instruments. An instrument defines functions that will be called before and after each pass, which can be used to collect information about the graph passes. Currently, an instrument may not modify the flow graph passed to it (such functionality should be implemented as a graph pass).
To define a custom graph pass instrument and use it:
    import hidet

    # define custom instrument and implement instrument functions
    class MyInstrument(hidet.graph.GraphPassInstrument):
        def before_all_passes(self, graph: FlowGraph) -> None:
            print('before all passes')

        def before_pass(self, pass_name: str, graph: FlowGraph) -> None:
            print('before pass', pass_name)

        def after_pass(self, pass_name: str, graph: FlowGraph) -> None:
            print('after pass', pass_name)

        def after_all_passes(self, graph: FlowGraph) -> None:
            print('after all passes')

    graph = hidet.graph.FlowGraph(outputs=[])  # empty flow graph

    with hidet.graph.PassContext() as ctx:
        # add custom instrument to pass context
        ctx.instruments.append(MyInstrument())
        # optimize flow graph
        hidet.graph.optimize(graph)
We can get output like:

    before all passes
    before pass FoldConstantPass
    after pass FoldConstantPass
    before pass PatternTransformPass
    after pass PatternTransformPass
    ...
    after all passes
Methods:

before_all_passes(graph): Called before processing all passes.
before_pass(pass_name, graph): Called before each pass.
after_pass(pass_name, graph): Called after each pass.
after_all_passes(graph): Called after applying all passes.
 before_all_passes(graph)[source]¶
Called before processing all passes.
 Parameters
graph (FlowGraph) – The flow graph before applying all passes.
 Return type
None
 before_pass(pass_name, graph)[source]¶
Called before each pass.
 Parameters
pass_name (str) – The name of the pass that is going to be applied.
graph (FlowGraph) – The flow graph before applying the pass.
 Return type
None
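The driver side of this interface can be sketched as follows: every instrument's hooks are invoked around each pass, in the order the instruments appear in the context. This is a hypothetical sketch; `apply_passes`, `RecordingInstrument`, and the pass callables are illustrative, not hidet internals:

```python
def apply_passes(graph, passes, instruments):
    # hook names mirror the four methods defined by GraphPassInstrument
    for inst in instruments:
        inst.before_all_passes(graph)
    for p in passes:
        name = p.__name__
        for inst in instruments:
            inst.before_pass(name, graph)
        graph = p(graph)  # a pass maps a flow graph to a flow graph
        for inst in instruments:
            inst.after_pass(name, graph)
    for inst in instruments:
        inst.after_all_passes(graph)
    return graph

class RecordingInstrument:
    """Collects the hook invocation order instead of printing it."""
    def __init__(self):
        self.events = []
    def before_all_passes(self, graph):
        self.events.append('before all')
    def before_pass(self, name, graph):
        self.events.append('before ' + name)
    def after_pass(self, name, graph):
        self.events.append('after ' + name)
    def after_all_passes(self, graph):
        self.events.append('after all')

def fold_constant(graph):  # dummy identity "pass"
    return graph

inst = RecordingInstrument()
apply_passes(graph=None, passes=[fold_constant], instruments=[inst])
```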
 hidet.graph.asarray(obj, /, *, dtype=None, device=None)[source]¶
Convert a list, tuple, or numpy ndarray to a hidet tensor.
 Parameters
obj – The list, tuple, or numpy ndarray to convert.
dtype (Optional) – The desired data type of the result.
device (Optional) – The desired device of the result.
 Returns
ret – The hidet tensor converted from the given object.
 Return type
Tensor
 hidet.graph.randn(shape, dtype='float32', mean=0.0, stddev=1.0, device='cpu')[source]¶
Create a tensor with normally distributed values.
 Parameters
shape (Sequence[int]) – The shape of the new tensor.
dtype (DataType or str, default 'float32') – The data type of the elements of the tensor.
mean (float, default 0.0) – The mean of the normal distribution.
stddev (float, default 1.0) – The standard deviation of the normal distribution.
device (Device or str, default 'cpu') – The device the new tensor is created on.
 Returns
ret – The created tensor.
 Return type
Tensor
Examples

    >>> randn([2, 3])
    Tensor(shape=[2, 3], dtype='float32', device='cuda')
    [[0.10720467 1.6906018  0.06347568]
     [0.37061226 0.562728   1.857547  ]]
 hidet.graph.empty(shape, dtype='float32', device='cpu', layout=None)[source]¶
Create an uninitialized tensor.
 Parameters
shape (Sequence[int]) – The shape of the new tensor.
dtype (str or DataType) – The data type of the elements of the tensor.
device (Device or str, default 'cpu') – The device the new tensor is created on.
layout (DataLayout, optional) – The layout of the new tensor. None indicates the default (row-major) layout.
 Returns
ret – The created tensor.
 Return type
Tensor
 hidet.graph.zeros(shape, dtype='float32', device='cpu')[source]¶
Create a tensor initialized with zero.
 hidet.graph.ones(shape, dtype='float32', device='cpu')[source]¶
Create a tensor initialized with one.
 hidet.graph.symbol(shape, dtype='float32', device='cpu', layout=None)[source]¶
Create a symbolic tensor.
 Parameters
shape (Sequence[int]) – The shape of the new tensor.
dtype (str) – The data type of the elements of the tensor.
device (Device or str, default 'cpu') – The device the new tensor is created on.
layout (DataLayout, optional) – The layout of the new tensor. None indicates the default (row-major) layout.
 Returns
ret – The created tensor.
 Return type
Tensor
 hidet.graph.randn_like(data, mean=0.0, stddev=1.0, shape=None, dtype=None, device=None)[source]¶
Create a randomly initialized tensor with the same shape, dtype, and device as the given tensor.
 Parameters
data (Tensor) – The tensor to copy shape, dtype, and device from.
mean (float, optional) – The mean of the normal distribution.
stddev (float, optional) – The standard deviation of the normal distribution.
shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.
dtype (DataType or str, optional) – The data type of the elements of the tensor. If None, the dtype of data is used.
device (Device or str, optional) – The device the new tensor is created on. If None, the device of data is used.
 Returns
ret – The created tensor with random values sampled from a normal distribution.
 Return type
Tensor
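All of the *_like creators share the same fallback rule: any of shape, dtype, or device left as None is taken from data. A hypothetical sketch of that rule; the attribute names mirror this page's descriptions, not hidet's exact internals:

```python
def resolve_like_args(data, shape=None, dtype=None, device=None):
    # each argument left as None falls back to the corresponding
    # attribute of the reference tensor `data`
    return (
        shape if shape is not None else data.shape,
        dtype if dtype is not None else data.dtype,
        device if device is not None else data.device,
    )

class DummyTensor:
    """Stand-in carrying only the attributes the rule consults."""
    def __init__(self, shape, dtype, device):
        self.shape, self.dtype, self.device = shape, dtype, device

t = DummyTensor([2, 3], 'float32', 'cuda')
resolved = resolve_like_args(t, dtype='float16')  # override only the dtype
```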
 hidet.graph.empty_like(data, shape=None, dtype=None, device=None, layout=None)[source]¶
Create an uninitialized tensor with the same shape, dtype, and device as the given tensor.
 Parameters
data (Tensor) – The tensor to copy shape, dtype, and device from.
shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.
dtype (DataType or str, optional) – The data type of the elements of the tensor. If None, the dtype of data is used.
device (Device or str, optional) – The device the new tensor is created on. If None, the device of data is used.
layout (DataLayout, optional) – The layout of the new tensor. If None, the layout of data is used.
 Returns
ret – The created tensor.
 Return type
Tensor
 hidet.graph.zeros_like(data, shape=None, dtype=None, device=None)[source]¶
Create a tensor initialized with zero with the same shape, dtype, and device as the given tensor.
 Parameters
data (Tensor) – The tensor to copy shape, dtype, and device from.
shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.
dtype (DataType or str, optional) – The data type of the elements of the tensor. If None, the dtype of data is used.
device (Device or str, optional) – The device the new tensor is created on. If None, the device of data is used.
 Returns
ret – The created tensor with all elements as zero.
 Return type
Tensor
 hidet.graph.ones_like(data, shape=None, dtype=None, device=None)[source]¶
Create a tensor initialized with one with the same shape, dtype, and device as the given tensor.
 Parameters
data (Tensor) – The tensor to copy shape, dtype, and device from.
shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.
dtype (DataType or str, optional) – The data type of the elements of the tensor. If None, the dtype of data is used.
device (Device or str, optional) – The device the new tensor is created on. If None, the device of data is used.
 Returns
ret – The created tensor with all elements as one.
 Return type
Tensor
 hidet.graph.symbol_like(data, shape=None, dtype=None, device=None, layout=None)[source]¶
Create a symbol tensor like an existing tensor.
 Parameters
data (Tensor) – The tensor to copy shape, dtype, and device from.
shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.
dtype (DataType or str, optional) – The data type of the elements of the tensor. If None, the dtype of data is used.
device (Device or str, optional) – The device the new tensor is created on. If None, the device of data is used.
layout (DataLayout, optional) – The layout of the new tensor. If None, the layout of data is used.
 Returns
ret – The created symbol tensor.
 Return type
Tensor
 hidet.graph.full(shape, fill_value, dtype='float32', device='cpu')[source]¶
Create a tensor initialized with a given constant.
 Parameters
shape (Sequence[int]) – The shape of the new tensor.
fill_value (float or int or hidet.ir.Constant) – The constant to initialize the new tensor with.
dtype (DataType or str, default 'float32') – The data type of the elements of the tensor.
device (Device or str, default 'cpu') – The device the new tensor is created on.
 Returns
ret – The created tensor.
 Return type
Tensor
 hidet.graph.full_like(data, fill_value, shape=None, dtype=None, device=None)[source]¶
Create a tensor initialized with fill_value with the same shape, dtype, and device as the given tensor.
 Parameters
data (Tensor) – The tensor to copy shape, dtype, and device from.
fill_value (int, float, or bool) – The value to fill the tensor with.
shape (Sequence[int], optional) – The shape of the new tensor. If None, the shape of data is used.
dtype (DataType or str, optional) – The data type of the elements of the tensor. If None, the dtype of data is used.
device (Device or str, optional) – The device the new tensor is created on. If None, the device of data is used.
 Returns
ret – The created tensor with all elements as fill_value.
 Return type
Tensor
 hidet.graph.from_numpy(nparray)[source]¶
Create a tensor from a numpy array, sharing the memory with the numpy array when possible.
 Parameters
nparray (numpy.ndarray) – The numpy array to create the tensor from.
 Returns
ret – The created tensor.
 Return type
Tensor
 hidet.graph.from_dlpack(dltensor)[source]¶
Create a hidet tensor from an object that implements the __dlpack__ protocol.
 Parameters
dltensor – An object implementing the DLPack protocol. It must have a __dlpack__ method that returns a PyCapsule object named 'dltensor'.
 Returns
ret – The hidet tensor that shares the same storage with the DLPack tensor.
 Return type
Tensor
 hidet.graph.from_torch(torch_tensor)[source]¶
Create a hidet tensor from a PyTorch tensor.
The created tensor shares the same memory as the given PyTorch tensor. Thus, any modification of the contents of one tensor will be reflected in the other.
 Parameters
torch_tensor (torch.Tensor) – The PyTorch tensor.
 Returns
ret – The created hidet tensor.
 Return type
Tensor
 hidet.graph.trace_from(tensor, inputs=None)[source]¶
Trace the flow graph given the output tensor(s).
Each hidet.graph.Tensor has an attribute hidet.graph.Tensor.trace that indicates how the tensor is generated. If the tensor is generated by an operator with symbolic input(s), the tensor itself is also symbolic, and it holds a reference to the operator that generates it. This reference is stored in the trace attribute. This function walks through the traces of the given tensor(s) and constructs a flow graph.
When there are multiple symbolic inputs, the inputs argument must be specified explicitly to avoid ambiguity.
 Parameters
tensor (Tensor or List[Tensor]) – The output tensor(s) that we trace from.
inputs (Optional, Tensor or List[Tensor]) – The inputs of the flow graph. When there is only a single symbolic tensor in the flow graph, this is optional. When there are multiple inputs, it is required to specify the input order.
 Returns
ret – The flow graph that outputs the given tensor(s).
 Return type
FlowGraph
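The backward walk described above can be sketched in plain Python: starting from the outputs, follow each tensor's trace to the operator that produced it, then recurse into that operator's inputs; symbolic inputs have no trace and terminate the walk. `DummyTensor` and `DummyOp` are illustrative stand-ins, not hidet classes:

```python
def collect_operators(outputs):
    # walk backward from the output tensors through each tensor's trace
    ops, seen, stack = [], set(), list(outputs)
    while stack:
        tensor = stack.pop()
        op = tensor.trace
        if op is None or id(op) in seen:
            continue  # symbolic input, or operator already visited
        seen.add(id(op))
        ops.append(op)
        stack.extend(op.inputs)
    return ops

class DummyTensor:
    def __init__(self, trace=None):
        self.trace = trace  # the operator that produced this tensor, if any

class DummyOp:
    def __init__(self, inputs):
        self.inputs = inputs

x = DummyTensor()                    # symbolic input: no trace
op1 = DummyOp([x]); y = DummyTensor(trace=op1)
op2 = DummyOp([y]); z = DummyTensor(trace=op2)
found = collect_operators([z])       # discovers op2, then op1
```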
 hidet.graph.optimize(graph)[source]¶
Optimize a flow graph.
This function applies a sequence of predefined graph-level passes to a FlowGraph to conduct optimizations and graph transformations.
Tip
Some graph passes provide configuration options; please refer to hidet.graph.PassContext for more information on graph pass configuration.