8. API Reference

class mlsdk.MNDevice

Class to specify which device to use.

Parameters:

device_name (str) –

Device name separated by ":" (e.g. "mncore2:auto"). The first string indicates which device to use, and the subsequent string modifies it. The available device_name values are as follows:

  • mncore, mncore2: Refer to the first and second generations of MN-Core, respectively. Each requires a modifier, either a device index or auto (e.g. "mncore2:0").

  • pfvm: Used to run MLSDK on backends other than MN-Core. Must be modified with cpu or cuda (e.g. "pfvm:cpu").

  • emu, emu2: Refer to the emulators for the first and second generations of MN-Core, respectively. No modifier is required (e.g. "emu2").
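
The device_name grammar above can be illustrated with a small parser. This is a hypothetical helper for illustration only, not part of mlsdk:

```python
def parse_device_name(device_name: str):
    """Split a device_name such as "mncore2:auto" into (device, modifier).

    Hypothetical helper illustrating the format; not part of mlsdk.
    """
    device, sep, modifier = device_name.partition(":")
    if device in ("mncore", "mncore2"):
        # Requires a device index or "auto", e.g. "mncore2:0".
        assert modifier.isdigit() or modifier == "auto"
    elif device == "pfvm":
        # Must be modified with "cpu" or "cuda", e.g. "pfvm:cpu".
        assert modifier in ("cpu", "cuda")
    elif device in ("emu", "emu2"):
        # Emulators take no modifier, e.g. "emu2".
        assert not sep
    else:
        raise ValueError(f"unknown device: {device}")
    return device, modifier or None
```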

class mlsdk.Context(device: MNDevice)
compile(function: Callable[[Dict[str, Tensor]], Dict[str, Tensor]], inputs: Mapping[str, Tensor | TensorProxy], codegen_dir: Path, *, options: Dict[str, Any] | None = None, cache_options: CacheOptions | None = None, num_compiler_threads: int | None = None, quiet: bool = True, exit_after_generate_codegen_dir: bool = False, optimizers: List[Optimizer] | None = None, export_kwargs: Dict[str, Any] | None = None, training: bool = True, initialize: bool | None = True, optimizer_spec: List[OptimizerSpecParamGroup] | None = None, optional_options: Set[str] | None = None, group: ProcessGroup | None = None, predefined_symbols: Dict[str, MNDeviceBuffer] | None = None) CompiledFunction

Compile a Python callable to a function that can be executed on the device.

Parameters:
  • function – The Python callable to compile.

  • inputs – Sample inputs to the function.

  • codegen_dir – The directory to store intermediate and generated files.

  • options

    Specify compile options to control the compilation process. Predefined option files (O0.json through O4.json in the preset_options directory, /opt/pfn/pfcomp/codegen/preset_options/ in MLSDK) are available; higher numbers enable more advanced optimizations at the cost of longer compilation times.

    A crucial setting is float_dtype, which controls the floating-point type assigned to torch.float32 tensors and helps prevent unintended precision degradation. For example, setting float_dtype to float forces all operations to be performed in float precision. The accepted values are:

    • mixed (default): Uses half precision for GEMM inputs and outputs, and float otherwise.

    • half, float, double: Assigns the specified type to all such tensors.

    You can specify these compile options as follows:

    context.compile(
        ...
        options={
            "option_json": "preset_options/O4.json",
            "float_dtype": "float",
        },
    )
    

  • cache_options – Options for caching. See CacheOptions for details.

  • num_compiler_threads – The number of threads to use for compilation. If None, the number of threads will automatically be determined.

  • quiet – If True, suppress output from the compiler.

  • exit_after_generate_codegen_dir – For internal use only. If True, exit after generating the codegen directory. This is useful for decomposed-layer tests.

  • optimizers – For internal use only. A list of PyTorch optimizers to use for training.

  • export_kwargs – For internal use only. kwargs related to exporting the model to ONNX.

  • training – For internal use only. If True, the function is used for training.

  • optimizer_spec – For internal use only. The optimizer spec to use for training.

  • initialize – For internal use only. TODO (akirakawata): Add description.

  • optional_options – For internal use only. TODO (akirakawata): Add description.

  • group – For internal use only. TODO (akirakawata): Add description.

  • predefined_symbols – For internal use only. TODO (akirakawata): Add description.
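
The function argument is a callable from a dict of named input tensors to a dict of named output tensors. A minimal sketch of that contract, with plain floats standing in for tensors for illustration:

```python
def step(inputs):
    # compile() expects a Dict[str, Tensor] -> Dict[str, Tensor] callable;
    # plain floats stand in for tensors in this sketch.
    y = inputs["x"] * inputs["w"] + inputs["b"]
    return {"y": y}

# Sample inputs with the same keys and shapes as the real workload would be
# passed as compile()'s `inputs` argument so the computation can be traced.
sample_inputs = {"x": 2.0, "w": 3.0, "b": 1.0}
```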

get_registered_value_proxy(value: Tensor) TensorProxy

Get the TensorProxy for the given value if it is registered in the context.

Parameters:

value – The torch.Tensor to get the proxy for.

Returns:

The TensorProxy for the given value.

load_codegen_dir(codegen_dir: Path) CompiledFunction

Load a function that can be executed on the device from codegen_dir without validation.

Parameters:

codegen_dir – The directory from which to load compilation result files.

Note

This method will fail if the required compiled artifact, model.app.zst, is not found within the codegen_dir. Be aware that the returned function is strict; it requires an input dictionary with the exact same keys (variable names) and tensor shapes as the input used during the original compilation.
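
The strictness described above amounts to requiring identical key sets and tensor shapes between compile time and call time. A plain-Python sketch of such a check, for illustration only (not mlsdk's actual validation):

```python
def check_signature(compiled_shapes, input_shapes):
    """Raise if call-time inputs differ from the compile-time signature.

    Both arguments map variable names to shape tuples.
    Hypothetical illustration of the strictness described above.
    """
    if set(compiled_shapes) != set(input_shapes):
        raise KeyError(
            f"key mismatch: expected {sorted(compiled_shapes)}, "
            f"got {sorted(input_shapes)}"
        )
    for name, shape in compiled_shapes.items():
        if input_shapes[name] != shape:
            raise ValueError(f"shape mismatch for {name!r}")
```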

register_buffer(buffer: Tensor) None

Registers a buffer in the context.

Note

Before calling this method, you must set the name of the buffer using set_tensor_name_in_module or set_tensor_name.

register_optimizer_buffers(optimizer: MNCoreOptimizer) None

Registers optimizer buffers in the context.

Note

Before calling this method, you must set the name of the buffer using set_buffer_name_in_optimizer or set_tensor_name.

register_param(param: Parameter) None

Registers a parameter in the context.

Note

Before calling this method, you must set the name of the parameter using set_tensor_name_in_module or set_tensor_name.

static switch_context(new_context: Context) None

Switching contexts moves all tensors of the current context from the device back to host memory and loads the tensors of the new context onto the device.

synchronize() None

Synchronizes the context by moving tensors to the torch framework and marks the context for initialization.

This method performs the following steps:

  1. Calls the synchronize() method of the device associated with the context.

  2. Iterates over all tensor names in the registry and moves each tensor to the torch framework.

Note

Unlike torch.cuda.synchronize(), which only waits for all kernels in all streams on a CUDA device to complete, this function also moves all tensors in the context’s registry from the device to the host.

class mlsdk.CompiledFunction(context: Context, code_block: _CompiledFunction, *, output_signature: ValueSignature | None = None)
allocate_input_proxy() Dict[str, TensorProxy]

Allocate input proxies for the function.

Returns:

A dictionary mapping input names to their corresponding TensorProxy objects.

class mlsdk.TensorProxy(context: Context, codegen_data: TensorProxyCodegenData, *, is_input: bool = False)
cpu() Tensor

Transfer the corresponding data to the CPU (host) for access as a torch.Tensor.

load_from(value: Tensor | TensorProxy, *, clone: bool = True) None

Load data from a torch.Tensor or another TensorProxy to this TensorProxy.

Parameters:
  • value – The source tensor to copy data from.

  • clone – If True and value is a torch.Tensor, it will be cloned before copying, enabling the source tensor to be modified without affecting this TensorProxy.
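
The clone flag has the usual copy-versus-alias semantics. A plain-Python analogue, with lists standing in for tensors (an illustration, not mlsdk's implementation):

```python
class ProxyLike:
    """List-backed stand-in illustrating load_from's clone semantics."""

    def __init__(self):
        self.data = None

    def load_from(self, value, *, clone=True):
        # clone=True copies the source, so later mutation of the source
        # does not affect this proxy; clone=False aliases the source.
        self.data = list(value) if clone else value

src = [1.0, 2.0]
p = ProxyLike()
p.load_from(src, clone=True)
src.append(3.0)  # does not affect p.data
```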

mlsdk.TensorLike = Union[torch.Tensor, TensorProxy]
class mlsdk.CacheOptions(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False)
__init__(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False) None

The options for specifying the cache directory and controlling cache behavior.

Parameters:
  • cache_dir_str – A path string of the root directory to store cache.

  • enable_app_cache – If True, cache GPFNApp files compiled from ONNX files. GPFNApp is the binary format that the MN-Core compiler uses.

  • enable_onnx_cache – If True, cache the ONNX files exported from the given function.

  • enable_codegen_cache – If True, cache the codegen compilation. This option is mainly for developers.

  • enable_gpfn2obj_cache – If True, cache the GPFN object data. This option is mainly for developers.

class mlsdk.MNCoreOptimizer(params: Iterator[Parameter], defaults: Dict[str, Any])
zero_grad(set_to_none: bool = True) None

Clear the gradient of the parameters.

Parameters:

set_to_none (bool) – If True, the gradients will be set to None instead of zero.
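
The set_to_none distinction follows PyTorch's convention for zero_grad. A dummy-object sketch of the two behaviors, for illustration only:

```python
class DummyParam:
    """Stand-in parameter with a .grad slot, for illustration."""

    def __init__(self, grad):
        self.grad = grad

def zero_grad_sketch(params, set_to_none=True):
    for p in params:
        if set_to_none:
            p.grad = None   # drops the gradient entirely
        elif p.grad is not None:
            p.grad = 0.0    # keeps a gradient value, zeroed

params = [DummyParam(grad=0.5)]
zero_grad_sketch(params, set_to_none=True)
```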

class mlsdk.MNCoreSGD(params: Iterator[Parameter], lr: float | Tensor = 0.001, momentum: float = 0, dampening: float = 0, weight_decay: float | Tensor = 0, nesterov: bool = False, *, maximize: bool = False, foreach: bool | None = None, differentiable: bool = False, fused: bool | None = None)
step(closure=None) None

Perform a single optimization step to update the parameters.

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreAdam(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, foreach: bool | None = None, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False, chainer_use_torch: bool = True)
step(closure=None) None

Perform a single optimization step to update the parameters.

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreAdamW(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, foreach: bool | None = None, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)
step(closure=None) None

Perform a single optimization step to update the parameters.

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

class mlsdk.MNCoreLRScheduler(scheduler: LRScheduler, context: Context | None)
step() None

Perform a scheduler step.

mlsdk.set_buffer_name_in_optimizer(optimizer: MNCoreOptimizer, name: str) None

Set the buffer names in the optimizer.

This function sets the names of the tensors in the optimizer (i.e., its buffers) according to the optimizer’s name, so that the ONNX exporter (FX2ONNX) can distinguish them later. You must call this function before registering the optimizer buffers in the context.

Parameters:
  • optimizer – The optimizer to set buffer names.

  • name – The name of the optimizer.

mlsdk.get_tensor_name(tensor: Tensor) str | None

Get the name of the tensor.

This function returns the name of the tensor set by set_tensor_name_in_module or set_buffer_name_in_optimizer.

Parameters:

tensor – The tensor to get the name of.

Returns:

The name of the tensor.

mlsdk.set_tensor_name(t: Tensor, name: str) None

Set a custom name attribute on a PyTorch tensor for ONNX export.

This function attaches a safe string-encoded name to a tensor that can be used by the FX2ONNX exporter to identify the tensor in the resulting ONNX model.

Parameters:
  • t – The tensor to which the name attribute will be attached.

  • name – The name to assign to the tensor. Will be encoded as a safe string.

Returns:

None

mlsdk.set_tensor_name_in_module(module: Module, module_name: str | None) None

Set the tensor names in the module.

This function sets the names of the tensors in the module (i.e., parameters and buffers such as BN statistics), so that the ONNX exporter (FX2ONNX) can distinguish them later. You must call this function before registering the module’s parameters and buffers in the context.

Parameters:
  • module – The module to set tensor names.

  • module_name – The name of the module.
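
The required ordering — assign names first, then register — can be sketched with plain objects. These are hypothetical stand-ins; the real calls are set_tensor_name_in_module (or set_tensor_name) followed by Context.register_param:

```python
class TensorStandIn:
    """Plain object standing in for a torch.Tensor."""

def set_name(t, name):
    # Mimics set_tensor_name: attach a name the exporter can read.
    t._name = name

def register(registry, t):
    # Mimics Context.register_param: the name must already be set.
    name = getattr(t, "_name", None)
    if name is None:
        raise RuntimeError("set the tensor name before registering")
    registry[name] = t

registry = {}
weight = TensorStandIn()
set_name(weight, "encoder.weight")   # 1) name first
register(registry, weight)           # 2) then register
```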

storage.path(target: str) Path
mlsdk.path(target: str) Path
mlsdk.trace_event(name: str, prepend_filename: bool = False) Iterator[None]

Context manager that traces execution of a code block using Perfetto tracing.

This function creates a trace event that marks the beginning and end of a code block’s execution. The trace is only recorded if Perfetto tracing is enabled in the global state.

Parameters:
  • name – The name of the trace event to be recorded.

  • prepend_filename – If True, prepends the filename and line number of the caller to the trace event name. Defaults to False.

Yields:

None. Yields control back to the caller while maintaining the trace context.

Example:
>>> with trace_event("process_data"):
...     # Code to be traced
...     perform_operation()
>>> with trace_event("calculate", prepend_filename=True):
...     # Trace name will include filename and line number
...     result = expensive_calculation()
Note:
  • If Perfetto tracing is disabled (_perfetto_state is False), the context manager acts as a no-op and yields immediately.

  • The trace event is automatically ended in the finally block to ensure proper cleanup even if an exception occurs.

  • When prepend_filename=True, the filename and line number are extracted from the caller’s stack frame.

mlsdk.trace_scope(output_filename: str | Path | None, ignore_if_traced: bool = False) Iterator[None]

Context manager for tracing performance metrics using Perfetto.

Manages the lifecycle of a Perfetto trace session. Handles trace initialization, execution, and finalization. Supports both local and remote file system outputs.

Parameters:
  • output_filename – Path where the trace file will be saved. Can be a string or Path object. If None, tracing context is entered without actual tracing.

  • ignore_if_traced – If True, yields without starting a new trace if one is already active. If False (default), raises an AssertionError if tracing is already active.

Yields:

None. Yields control back to the caller while managing the trace context.

Raises:

AssertionError: If tracing has already started and ignore_if_traced is False.

Note:
  • For local file systems, writes the trace directly to the specified path.

  • For remote file systems, writes to a temporary local file and then transfers to the destination.

  • Ensures cleanup of trace state in all execution paths using finally block.