hidet.cuda¶
Contents¶
Device Management
- Returns True if CUDA is available, False otherwise.
- Get the number of available CUDA devices.
- Get the current CUDA device.
- Set the current CUDA device.
- Get the properties of a CUDA device.
- Get the compute capability of a CUDA device.
- Synchronize the host thread with the device.
- Mark the start of a profiling range.
- Mark the end of a profiling range.
Memory Management
- Allocate memory on the current device.
- Allocate memory on the current device asynchronously.
- Allocate pinned host memory.
- Free memory on the current CUDA device.
- Free memory on the current CUDA device asynchronously.
- Free pinned host memory.
- Set the GPU memory to a given value.
- Set the GPU memory to a given value asynchronously.
- Copy GPU memory from one location to another.
- Copy GPU memory from one location to another asynchronously.
- Get the free and total memory on the current device in bytes.
Stream and Event
- A CUDA stream.
- An external CUDA stream created from a handle.
- A CUDA event.
- Get the current stream.
- Get the default stream.
- Set the current stream.
CUDA Graph
- A CUDA graph that executes a FlowGraph on the GPU.
Device Management¶
- hidet.cuda.available()[source]¶
Returns True if CUDA is available, False otherwise.
- Returns
ret – Whether CUDA is available.
- Return type
bool
- hidet.cuda.device_count()[source]¶
Get the number of available CUDA devices.
- Returns
count – The number of available CUDA devices.
- Return type
int
- hidet.cuda.current_device()[source]¶
Get the current CUDA device.
- Returns
device_id – The ID of the current CUDA device.
- Return type
int
- hidet.cuda.set_device(device_id)[source]¶
Set the current CUDA device.
- Parameters
device_id (int) – The ID of the CUDA device.
- hidet.cuda.properties(device_id=0)[source]¶
Get the properties of a CUDA device.
- Parameters
device_id (int) – The ID of the device.
- Returns
prop – The properties of the device.
- Return type
cudaDeviceProp
- hidet.cuda.compute_capability(device_id=0)[source]¶
Get the compute capability of a CUDA device.
- Parameters
device_id (int) – The ID of the device to query.
- Returns
(major, minor) – The compute capability of the device.
- Return type
Tuple[int, int]
Memory Management¶
- hidet.cuda.malloc(num_bytes)[source]¶
Allocate memory on the current device.
- Parameters
num_bytes (int) – The number of bytes to allocate.
- Returns
addr – The address of the allocated memory.
- Return type
int
- hidet.cuda.malloc_async(num_bytes, stream=None)[source]¶
Allocate memory on the current device asynchronously.
- Parameters
num_bytes (int) – The number of bytes to allocate.
stream (Optional[Union[Stream, cudaStream_t, int]]) – The stream to use for the allocation. If None, the current stream is used.
- Returns
addr – The address of the allocated memory.
- Return type
int
- hidet.cuda.malloc_host(num_bytes)[source]¶
Allocate pinned host memory.
- Parameters
num_bytes (int) – The number of bytes to allocate.
- Returns
addr – The address of the allocated memory.
- Return type
int
- hidet.cuda.free(addr)[source]¶
Free memory on the current CUDA device.
- Parameters
addr (int) – The address of the memory to free. This must be the address of memory allocated with malloc() or malloc_async().
- Return type
None
- hidet.cuda.free_async(addr, stream=None)[source]¶
Free memory on the current CUDA device asynchronously.
- Parameters
addr (int) – The address of the memory to free. This must be the address of memory allocated with malloc() or malloc_async().
stream (Union[Stream, cudaStream_t, int], optional) – The stream to use for the free. If None, the current stream is used.
- Return type
None
- hidet.cuda.free_host(addr)[source]¶
Free pinned host memory.
- Parameters
addr (int) – The address of the memory to free. This must be the address of memory allocated with malloc_host().
- Return type
None
- hidet.cuda.memset(addr, value, num_bytes)[source]¶
Set the GPU memory to a given value.
- Parameters
addr (int) – The start address of the memory region to set.
value (int) – The byte value to set the memory region to.
num_bytes (int) – The number of bytes to set.
- Return type
None
- hidet.cuda.memset_async(addr, value, num_bytes, stream=None)[source]¶
Set the GPU memory to a given value asynchronously.
- Parameters
addr (int) – The start address of the memory region to set.
value (int) – The byte value to set the memory region to.
num_bytes (int) – The number of bytes to set.
stream (Union[Stream, cudaStream_t, int], optional) – The stream to use for the memset. If None, the current stream is used.
- Return type
None
- hidet.cuda.memcpy(dst, src, num_bytes)[source]¶
Copy GPU memory from one location to another.
- Parameters
dst (int) – The destination address.
src (int) – The source address.
num_bytes (int) – The number of bytes to copy.
- Return type
None
- hidet.cuda.memcpy_async(dst, src, num_bytes, stream=None)[source]¶
Copy GPU memory from one location to another asynchronously.
- Parameters
dst (int) – The destination address.
src (int) – The source address.
num_bytes (int) – The number of bytes to copy.
stream (Union[Stream, cudaStream_t, int], optional) – The stream to use for the memcpy. If None, the current stream is used.
- Return type
None
CUDA Stream and Event¶
- class hidet.cuda.Stream(device=None, blocking=False, priority=0, **kwargs)[source]¶
A CUDA stream.
- Parameters
device (int or hidet.Device, optional) – The device on which to create the stream. If None, the current device will be used.
blocking (bool) – Whether to enable the implicit synchronization between this stream and the default stream. When enabled, any operation enqueued in the stream will wait for all previous operations in the default stream to complete before beginning execution.
priority (int) – The priority of the stream. The priority is a hint to the CUDA driver that it can use to reorder operations in the stream relative to other streams. The priority can be 0 (default priority) or -1 (high priority). By default, all streams are created with priority 0.
- device_id()[source]¶
Get the device ID of the stream.
- Returns
device_id – The device ID of the stream.
- Return type
int
- handle()[source]¶
Get the handle of the stream.
- Returns
handle – The handle of the stream.
- Return type
cudaStream_t
- class hidet.cuda.ExternalStream(handle, device_id=None)[source]¶
An external CUDA stream created from a handle.
- Parameters
handle (int or cudaStream_t) – The handle of the stream.
device_id (int, optional) – The device ID of the stream. If None, the current device will be used.
- class hidet.cuda.Event(enable_timing=False, blocking=False)[source]¶
A CUDA event.
- Parameters
enable_timing (bool) – When enabled, the event is able to record the time between itself and another event.
blocking (bool) – When enabled, we can use the synchronize() method to block the current host thread until the event completes.
- handle()[source]¶
Get the handle of the event.
- Returns
handle – The handle of the event.
- Return type
cudaEvent_t
- elapsed_time(start_event)[source]¶
Get the elapsed time between the start event and this event in milliseconds.
- Parameters
start_event (Event) – The start event.
- Returns
elapsed_time – The elapsed time in milliseconds.
- Return type
float
- record(stream=None)[source]¶
Record the event in the given stream.
After the event is recorded:
We can synchronize the event to block the current host thread until all the tasks before the event are completed via Event.synchronize().
We can also get the elapsed time between the event and another event via Event.elapsed_time() (when enable_timing is True).
We can also let another stream wait for the event via Stream.wait_event().
- Parameters
stream (Stream, optional) – The stream where the event is recorded.
- hidet.cuda.current_stream(device=None)[source]¶
Get the current stream.
- Parameters
device (int or hidet.Device, optional) – The device on which to get the current stream. If None, the current device will be used.
- Returns
stream – The current stream.
- Return type
Stream
CUDA Graph¶
- class hidet.cuda.graph.CudaGraph(flow_graph)[source]¶
A CUDA graph that executes a FlowGraph on the GPU. You can create the CUDA graph by calling cuda_graph().
- Parameters
flow_graph (FlowGraph) – The flow graph to be executed.
- property inputs: List[hidet.Tensor]¶
The inputs of the CUDA graph.
- property outputs: List[hidet.Tensor]¶
The outputs of the CUDA graph.
- run(inputs=None)[source]¶
Run the CUDA graph synchronously. If inputs are provided, they are copied to the internal inputs of the CUDA graph before running.