9. API Reference
- class mlsdk.MNDevice
Class to specify which device to use.
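As an illustration of the device_name format described under Parameters for this class, the following sketch splits and checks a name before it is handed to MNDevice. The helper parse_device_name and its tables are hypothetical and not part of mlsdk; whether mlsdk itself validates names this way is an assumption.

```python
from typing import Optional, Tuple

# Named modifiers per device kind, transcribed from the parameter description.
# None means the kind needs no modifier (the emulators).
_NAMED_MODIFIERS = {
    "mncore": {"auto"},
    "mncore2": {"auto"},
    "pfvm": {"cpu", "cuda"},
    "emu": None,
    "emu2": None,
}
# Kinds that also accept a numeric device index such as "0".
_INDEXED_KINDS = {"mncore", "mncore2"}


def parse_device_name(device_name: str) -> Tuple[str, Optional[str]]:
    """Split "kind:modifier" and check it against the documented combinations."""
    kind, _, modifier = device_name.partition(":")
    if kind not in _NAMED_MODIFIERS:
        raise ValueError(f"unknown device kind: {kind!r}")
    named = _NAMED_MODIFIERS[kind]
    if named is None:
        # Emulators require no modifier; pass through whatever was given.
        return kind, modifier or None
    is_index = kind in _INDEXED_KINDS and modifier.isascii() and modifier.isdigit()
    if modifier in named or is_index:
        return kind, modifier
    raise ValueError(f"{kind!r} needs a valid modifier, got {modifier!r}")


print(parse_device_name("mncore2:auto"))
print(parse_device_name("emu2"))
```

A name that passes this check can then be given to the constructor, for example mlsdk.MNDevice("mncore2:auto").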
- Parameters:
device_name (str) --
Device name separated by ":" (e.g. "mncore2:auto"). The first string indicates which device to use, and the subsequent string modifies it. Available device_name values are as follows:
mncore, mncore2: Refer to the first and second generation of MN-Core, respectively. Must be modified with a device index or auto (e.g. "mncore2:0").
pfvm: Used to run MLSDK with backends other than MN-Core. Must be modified with cpu or cuda (e.g. "pfvm:cpu").
emu, emu2: Refer to emulators for the first and second generation of MN-Core, respectively. No modifier is required (e.g. "emu2").
- class mlsdk.Context(device: MNDevice)
- compile(function: Callable[[Dict[str, Tensor]], Dict[str, Tensor]], inputs: Mapping[str, Tensor | TensorProxy], codegen_dir: Path, *, options: Dict[str, Any] | None = None, cache_options: CacheOptions | None = None, num_compiler_threads: int | None = None, quiet: bool = True, exit_after_generate_codegen_dir: bool = False, optimizers: List[Optimizer] | None = None, export_kwargs: Dict[str, Any] | None = None, training: bool = True, initialize: bool | None = True, optimizer_spec: List[OptimizerSpecParamGroup] | None = None, optional_options: Set[str] | None = None, group: ProcessGroup | None = None, predefined_symbols: Dict[str, MNDeviceBuffer] | None = None) CompiledFunction
Compile a Python callable to a function that can be executed on the device.
- Parameters:
function -- The Python callable to compile.
inputs -- Sample inputs to the function.
codegen_dir -- The directory to store intermediate and generated files.
options --
Specify compile options to control the compilation process. Predefined options (such as O0.json through O4.json in the preset_options directory, /opt/pfn/pfcomp/codegen/preset_options/ in MLSDK) are available, with higher numbers indicating more advanced optimizations but longer compilation times.
A crucial setting is float_dtype, which prevents unintended precision degradation. This option controls the floating-point type assigned to torch.float32 tensors:
mixed (default): Uses half precision for GEMM operations (in/out) and float otherwise.
half, float, double: Assigns the specified type to all such tensors.
By setting float_dtype to float, you can force all operations to be performed in float precision.
You can specify these compile options as follows:
context.compile(
    ...
    options={
        "option_json": "preset_options/O4.json",
        "float_dtype": "float",
    },
)
cache_options -- Options for caching. See CacheOptions for details.
num_compiler_threads -- The number of threads to use for compilation. If None, the number of threads will automatically be determined.
quiet -- If True, suppress output from the compiler.
exit_after_generate_codegen_dir -- For internal use only. If True, exit after generating the codegen directory. This is useful for testing decomposed layers.
optimizers -- For internal use only. A list of PyTorch optimizers to use for training.
export_kwargs -- For internal use only. kwargs related to exporting the model to ONNX.
training -- For internal use only. If True, the function is used for training.
optimizer_spec -- For internal use only. The optimizer spec to use for training.
initialize -- For internal use only. Not yet documented.
optional_options -- For internal use only. Not yet documented.
group -- For internal use only. Not yet documented.
predefined_symbols -- For internal use only. Not yet documented.
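The options argument of compile is a plain dictionary, so it can be assembled and checked before compilation starts. The sketch below builds one from an optimization level and a float_dtype value. The helper make_compile_options is hypothetical; the keys, the preset file names O0.json through O4.json, and the four float_dtype values are taken from the description of options above.

```python
from typing import Any, Dict

# float_dtype values listed in the description of the options parameter.
FLOAT_DTYPES = ("mixed", "half", "float", "double")


def make_compile_options(opt_level: int = 4, float_dtype: str = "mixed") -> Dict[str, Any]:
    """Return an options dict for Context.compile (hypothetical helper)."""
    if opt_level not in range(5):
        raise ValueError(f"opt_level must be 0..4, got {opt_level}")
    if float_dtype not in FLOAT_DTYPES:
        raise ValueError(f"float_dtype must be one of {FLOAT_DTYPES}, got {float_dtype!r}")
    return {
        "option_json": f"preset_options/O{opt_level}.json",
        "float_dtype": float_dtype,
    }


options = make_compile_options(4, "float")
print(options)
```

The resulting dictionary would be passed as options=options in the call to context.compile.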
- compile_automap(function: Callable[[...], Any], sample_args: Tuple, sample_kwargs: Dict[str, Any], codegen_dir: Path, **kwargs: Any) CompiledFunctionAutoMapped
Note: this is a beta feature. Compile a Python callable whose arguments and return values can be (possibly nested) tuples, lists, dictionaries, namedtuples, and dataclasses. All items in tuples/lists, values in dictionaries, and fields in namedtuples/dataclasses must be torch.Tensor objects. Additionally, any dataclasses used must be decorated with @register_pytree_dataclass, and any generic namedtuples used must be decorated with @register_pytree_namedtuple.
- Parameters:
function -- The Python callable to compile.
sample_args -- Positional arguments of function collected into a tuple. This allows TensorLike objects.
sample_kwargs -- Keyword arguments of function collected into a dictionary. This allows TensorLike objects.
codegen_dir -- The directory to store intermediate and generated files.
kwargs -- Keyword-only arguments passed to the context.compile method.
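Because compile_automap requires every leaf of its nested arguments to be a tensor, it can help to check sample arguments up front. The following is a minimal sketch of such a check; all_leaves_are is a hypothetical helper, not an mlsdk function, and it does not verify the @register_pytree_dataclass or @register_pytree_namedtuple decoration. In real use leaf_type would be torch.Tensor (or a tuple of accepted types); float serves as a stand-in here so the sketch runs without PyTorch.

```python
import dataclasses
from collections import namedtuple
from typing import Any


def all_leaves_are(value: Any, leaf_type) -> bool:
    """Return True if every leaf in a nested container is an instance of leaf_type."""
    if isinstance(value, leaf_type):
        return True
    if isinstance(value, dict):
        return all(all_leaves_are(v, leaf_type) for v in value.values())
    if isinstance(value, (tuple, list)):  # namedtuples are tuples as well
        return all(all_leaves_are(v, leaf_type) for v in value)
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return all(
            all_leaves_are(getattr(value, f.name), leaf_type)
            for f in dataclasses.fields(value)
        )
    return False  # any other object is neither a container nor a valid leaf


Pair = namedtuple("Pair", ["a", "b"])
sample_args = (1.0, [2.0, {"k": 3.0}], Pair(4.0, 5.0))
print(all_leaves_are(sample_args, float))
print(all_leaves_are((1.0, "not a tensor"), float))
```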
- get_registered_value_proxy(value: Tensor) TensorProxy
Get the TensorProxy for the given value if it is registered in the context.
- Parameters:
value -- The torch.Tensor to get the proxy for.
- Returns:
The TensorProxy for the given value.
- load_codegen_dir(codegen_dir: Path) CompiledFunction
Load a function that can be executed on the device from codegen_dir without validation.
- Parameters:
codegen_dir -- The directory from which to load the compiled result files.
Note
This method will fail if the required compiled artifact, model.app.zst, is not found within the codegen_dir. Be aware that the returned function is strict; it requires an input dictionary with the exact same keys (variable names) and tensor shapes as the input used during the original compilation.
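The two conditions in the note above can be checked on the host before calling load_codegen_dir. The sketch below is one way to do that, assuming the caller has kept a record of the input names and shapes used at compile time; has_compiled_artifact and input_mismatches are hypothetical helpers, and only the file name model.app.zst comes from the note.

```python
from pathlib import Path
from typing import List, Mapping, Tuple

ARTIFACT_NAME = "model.app.zst"  # file name taken from the note above


def has_compiled_artifact(codegen_dir: Path) -> bool:
    """Return True if the compiled artifact exists directly inside codegen_dir."""
    return (Path(codegen_dir) / ARTIFACT_NAME).is_file()


def input_mismatches(
    expected: Mapping[str, Tuple[int, ...]],
    actual: Mapping[str, Tuple[int, ...]],
) -> List[str]:
    """List differences in keys or shapes between two name-to-shape mappings."""
    problems = []
    for name in sorted(set(expected) | set(actual)):
        if name not in actual:
            problems.append(f"missing input: {name}")
        elif name not in expected:
            problems.append(f"unexpected input: {name}")
        elif tuple(expected[name]) != tuple(actual[name]):
            problems.append(
                f"shape mismatch for {name}: "
                f"expected {tuple(expected[name])}, got {tuple(actual[name])}"
            )
    return problems


print(input_mismatches({"x": (2, 3)}, {"x": (2, 4)}))
```

An empty list from input_mismatches, together with has_compiled_artifact returning True, suggests the loaded function can be called with those inputs.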
- register_buffer(buffer: Tensor) None
Registers a buffer in the context.
Note
Before calling this method, you must set the name of the buffer using set_tensor_name_in_module or set_tensor_name.
- register_optimizer_buffers(optimizer: MNCoreOptimizer) None
Registers optimizer buffers in the context.
Note
Before calling this method, you must set the name of the buffer using set_buffer_name_in_optimizer or set_tensor_name.
- register_param(param: Parameter) None
Registers a parameter in the context.
Note
Before calling this method, you must set the name of the parameter using set_tensor_name_in_module or set_tensor_name.
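The ordering required by the notes above (name first, then register) can be wrapped in one function. The sketch below is an assumption about typical usage rather than a prescribed recipe: register_training_state is a hypothetical helper, the default name "optimizer" is arbitrary, and iterating over module.parameters() and module.buffers() assumes module is a torch.nn.Module. The mlsdk import is done lazily inside the function.

```python
from typing import Any, Optional


def register_training_state(
    context: Any,
    module: Any,
    module_name: str,
    optimizer: Optional[Any] = None,
    optimizer_name: str = "optimizer",
) -> None:
    """Name and register a module's tensors (and optionally optimizer buffers)."""
    import mlsdk  # imported here so the definition can be read without the SDK

    # Names must be set before anything is registered in the context.
    mlsdk.set_tensor_name_in_module(module, module_name)
    for param in module.parameters():
        context.register_param(param)
    for buffer in module.buffers():
        context.register_buffer(buffer)

    if optimizer is not None:
        mlsdk.set_buffer_name_in_optimizer(optimizer, optimizer_name)
        context.register_optimizer_buffers(optimizer)
```

A call such as register_training_state(context, model, "model", optimizer) would then be made once, before compiling the training function.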
- static switch_context(new_context: Context) None
Switching contexts moves all tensors currently on the device back to host memory and loads the tensors of the new context onto MN-Core.
- synchronize() None
Synchronizes the context by moving tensors to the torch framework and marks the context for initialization.
This method performs the following steps:
Calls the synchronize() method of the device associated with the context.
Iterates over all tensor names in the registry and moves each tensor to the torch framework.
Note
Unlike torch.cuda.synchronize(), which only waits for all kernels in all streams on a CUDA device to complete, this function also moves all tensors in the context's registry from the device to the host.
- class mlsdk.CompiledFunction(context: Context, code_block: _CompiledFunction, *, output_signature: ValueSignature | None = None)
- allocate_input_proxy() Dict[str, TensorProxy]
Allocate input proxies for the function.
- Returns:
A dictionary mapping input names to their corresponding TensorProxy objects.
- class mlsdk.TensorProxy(context: Context, codegen_data: TensorProxyCodegenData, *, is_input: bool = False)
- cpu() Tensor
Transfer the corresponding data to CPU (Host) to access as torch.Tensor.
- load_from(value: Tensor | TensorProxy, *, clone: bool = True) None
Load data from a torch.Tensor or another TensorProxy to this TensorProxy.
- Parameters:
value -- The source tensor to copy data from.
clone -- If True and value is a torch.Tensor, it will be cloned before copying, enabling the source tensor to be modified without affecting this TensorProxy.
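allocate_input_proxy and load_from can be combined to fill every input of a compiled function from host tensors. The sketch below shows one way to do this; load_inputs is a hypothetical helper, and it relies only on the two documented methods, so any objects with the same methods work. How the returned proxies are subsequently passed to the compiled function is not covered here.

```python
from typing import Any, Dict, Mapping


def load_inputs(compiled_function: Any, host_inputs: Mapping[str, Any]) -> Dict[str, Any]:
    """Allocate input proxies and copy one host tensor into each of them."""
    proxies = compiled_function.allocate_input_proxy()
    missing = sorted(set(proxies) - set(host_inputs))
    if missing:
        raise KeyError(f"no host tensor provided for inputs: {missing}")
    for name, proxy in proxies.items():
        # clone=True (the default) leaves the caller free to modify the
        # source tensor afterwards without affecting the proxy.
        proxy.load_from(host_inputs[name], clone=True)
    return proxies
```

After the function has run, calling cpu() on an output TensorProxy transfers the data back to the host as a torch.Tensor.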
- mlsdk.TensorLike = Union[torch.Tensor, TensorProxy]
- class mlsdk.CacheOptions(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False)
- __init__(cache_dir_str: str, *, enable_app_cache: bool = True, enable_onnx_cache: bool = False, enable_codegen_cache: bool = False, enable_gpfn2obj_cache: bool = False) None
The options for specifying the cache directory and controlling cache behavior.
- Parameters:
cache_dir_str -- A path string of the root directory to store cache.
enable_app_cache -- If True, cache the GPFNApp files compiled from ONNX files. GPFNApp is the binary format that the MN-Core compiler uses.
enable_onnx_cache -- If True, cache the ONNX files exported from the given function.
enable_codegen_cache -- If True, cache the codegen compilation. This option is mainly for developers.
enable_gpfn2obj_cache -- If True, cache the GPFN object data. This option is mainly for developers.
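As a small usage sketch, the function below constructs a CacheOptions from a path, leaving the two developer-oriented caches at their defaults. The helper make_cache_options is hypothetical; the constructor arguments follow the signature above. Whether the cache directory must already exist is not stated in this reference, so the sketch does not create it.

```python
from pathlib import Path
from typing import Any, Union


def make_cache_options(cache_root: Union[str, Path], *, cache_onnx: bool = False) -> Any:
    """Build mlsdk.CacheOptions for cache_root (hypothetical convenience helper)."""
    import mlsdk  # imported here so the definition can be read without the SDK

    return mlsdk.CacheOptions(
        str(cache_root),  # cache_dir_str expects a path string
        enable_app_cache=True,
        enable_onnx_cache=cache_onnx,
    )
```

The returned object would be passed as cache_options= to Context.compile.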
- class mlsdk.MNCoreOptimizer(params: Iterator[Parameter], defaults: Dict[str, Any])
- zero_grad(set_to_none: bool = True) None
Clear the gradient of the parameters.
- Parameters:
set_to_none (bool) -- If True, the gradients will be set to None instead of zero.
- class mlsdk.MNCoreSGD(params: Iterator[Parameter], lr: float | Tensor = 0.001, momentum: float = 0, dampening: float = 0, weight_decay: float | Tensor = 0, nesterov: bool = False, *, maximize: bool = False, foreach: bool | None = None, differentiable: bool = False, fused: bool | None = None)
- step(closure=None) None
Perform a single optimization step to update the parameters.
- Parameters:
closure (Callable) -- A closure that reevaluates the model and returns the loss. Optional for most optimizers.
- class mlsdk.MNCoreAdam(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, foreach: bool | None = None, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False, chainer_use_torch: bool = True)
- step(closure=None) None
Perform a single optimization step to update the parameters.
- Parameters:
closure (Callable) -- A closure that reevaluates the model and returns the loss. Optional for most optimizers.
- class mlsdk.MNCoreAdamW(params: Iterator[Parameter], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, foreach: bool | None = None, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)
- step(closure=None) None
Perform a single optimization step to update the parameters.
- Parameters:
closure (Callable) -- A closure that reevaluates the model and returns the loss. Optional for most optimizers.
- class mlsdk.MNCoreLRScheduler(scheduler: LRScheduler, context: Context | None)
- step() None
Perform a step.
- mlsdk.set_buffer_name_in_optimizer(optimizer: MNCoreOptimizer, name: str) None
Set the buffer names in the optimizer.
This function sets the names of the tensors in the optimizer (i.e. buffers) according to the optimizer's name, so that the ONNX exporter (FX2ONNX) can distinguish them later. You need to call this function before registering the optimizer buffers with the context.
- Parameters:
optimizer -- The optimizer to set buffer names.
name -- The name of the optimizer.
- mlsdk.get_tensor_name(tensor: Tensor) str | None
Get the name of the tensor.
This function returns the name of the tensor set by set_tensor_name_in_module or set_buffer_name_in_optimizer.
- Parameters:
tensor -- The tensor to get the name of.
- Returns:
The name of the tensor.
- mlsdk.set_tensor_name(t: Tensor, name: str) None
Set a custom name attribute on a PyTorch tensor for ONNX export.
This function attaches a safe string-encoded name to a tensor that can be used by the FX2ONNX exporter to identify the tensor in the resulting ONNX model.
- Parameters:
t -- The tensor to which the name attribute will be attached.
name -- The name to assign to the tensor. Will be encoded as a safe string.
- Returns:
None
- mlsdk.set_tensor_name_in_module(module: Module, module_name: str | None) None
Set the tensor names in the module.
This function sets the names of the tensors in the module (i.e. parameters and buffers such as BN stats), so that the ONNX exporter (FX2ONNX) can distinguish them later. You need to call this function before registering the module parameters and buffers with the context.
- Parameters:
module -- The module to set tensor names.
module_name -- The name of the module.
- storage.path(target: str) Path
- mlsdk.path(target: str) Path
- mlsdk.trace_event(name: str, prepend_filename: bool = False) Iterator[None]
Context manager that traces execution of a code block using Perfetto tracing.
This function creates a trace event that marks the beginning and end of a code block's execution. The trace is only recorded if Perfetto tracing is enabled in the global state.
- Parameters:
name -- The name of the trace event to be recorded.
prepend_filename -- If True, prepends the filename and line number of the caller to the trace event name. Defaults to False.
- Yields:
None. Yields control back to the caller while maintaining the trace context.
- Example:
>>> with trace_event("process_data"):
...     # Code to be traced
...     perform_operation()
>>> with trace_event("calculate", prepend_filename=True):
...     # Trace name will include filename and line number
...     result = expensive_calculation()
- Note:
If Perfetto tracing is disabled (_perfetto_state is False), the context manager acts as a no-op and yields immediately.
The trace event is automatically ended in the finally block to ensure proper cleanup even if an exception occurs.
When prepend_filename=True, the filename and line number are extracted from the caller's stack frame.
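The no-op and cleanup behaviour described in the notes above follows a common context-manager pattern. The sketch below reproduces that pattern with a stand-in that records events in a list instead of emitting Perfetto events. FakeTracer is hypothetical and is not the mlsdk implementation; the prepend_filename option is omitted.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeTracer:
    """Stand-in tracer that records begin/end markers in a list."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self.events: List[str] = []

    @contextmanager
    def trace_event(self, name: str) -> Iterator[None]:
        if not self.enabled:
            # Tracing disabled: act as a no-op and just run the body.
            yield
            return
        self.events.append(f"begin {name}")
        try:
            yield
        finally:
            # Runs even if the body raises, so every begin gets an end.
            self.events.append(f"end {name}")


tracer = FakeTracer()
try:
    with tracer.trace_event("process_data"):
        raise RuntimeError("failure inside the traced block")
except RuntimeError:
    pass
print(tracer.events)
```

Even though the traced block raises, the list ends with the matching end marker, which is the property the finally block is there to guarantee.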
- mlsdk.trace_scope(output_filename: str | Path | None, ignore_if_traced: bool = False) Iterator[None]
Context manager for tracing performance metrics using Perfetto.
Manages the lifecycle of a Perfetto trace session. Handles trace initialization, execution, and finalization. Supports both local and remote file system outputs.
- Parameters:
output_filename -- Path where the trace file will be saved. Can be a string or Path object. If None, tracing context is entered without actual tracing.
ignore_if_traced -- If True, yields without starting a new trace if one is already active. If False (default), raises an assertion error if tracing is already active.
- Yields:
None. Yields control back to the caller while managing the trace context.
- Raises:
AssertionError: If tracing has already started and ignore_if_traced is False.
- Note:
For local file systems, writes the trace directly to the specified path.
For remote file systems, writes to a temporary local file and then transfers to the destination.
Ensures cleanup of trace state in all execution paths using finally block.
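The interaction between nested scopes and ignore_if_traced can be modelled with a small stand-in. The sketch below is not the mlsdk implementation: FakeTraceSession is hypothetical, it records the output file name instead of writing a trace, and the order in which the None check and the already-active check happen is an assumption.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class FakeTraceSession:
    """Stand-in that tracks whether a trace is active and what was 'written'."""

    def __init__(self) -> None:
        self.active = False
        self.written: List[str] = []

    @contextmanager
    def trace_scope(
        self, output_filename: Optional[str], ignore_if_traced: bool = False
    ) -> Iterator[None]:
        if output_filename is None:
            yield  # enter the context without tracing
            return
        if self.active:
            assert ignore_if_traced, "tracing has already started"
            yield  # reuse the trace that is already running
            return
        self.active = True
        try:
            yield
        finally:
            # Reset state on every exit path, then record the output file.
            self.active = False
            self.written.append(str(output_filename))


session = FakeTraceSession()
with session.trace_scope("outer.trace"):
    with session.trace_scope("inner.trace", ignore_if_traced=True):
        pass
print(session.written)
```

In this model only the outermost scope produces an output file; an inner scope with ignore_if_traced=True runs inside the trace that is already active.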
- class fx2onnx.linter.LintLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
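Warning filter level for the linter.
Lower values produce more detailed warnings.
- Variables:
DEBUG -- Outputs every warning the linter recognizes, including low-level warnings that may originate from the internals of PyTorch or torch.nn.Module.
INFO -- Suppresses warnings that are judged unlikely to cause an export failure. This level is recommended for ordinary user-facing diagnostics.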
- fx2onnx.linter.lint(func: Callable[[...], Any], args: Sequence[Any] | None = None, kwargs: Mapping[str, Any] | None = None, output_adapter: OutputAdapter | None = None, ignore_modules: set[ModuleType] | None = {fx2onnx, inspect, torch._dynamo, torch.nn.modules, torch.utils}, lint_level: LintLevel = LintLevel.DEBUG) LintResult
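Traces func with the given sample inputs and returns a lint report.
- Parameters:
func -- The callable to inspect. Pass the same function object that you pass to mlsdk.Context.compile().
args -- Positional sample arguments used for tracing.
kwargs -- Keyword sample arguments used for tracing.
output_adapter -- The adapter used to collect output tensors during tracing. If omitted, DefaultOutputAdapter is used.
ignore_modules -- Python modules whose internal stack frames are ignored when reporting uses of global Python objects.
lint_level -- The warning filter level.
- Returns:
The lint report, as a LintResult.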
- class fx2onnx.linter.LintResult(global_object_refs: list[GuardInfo], tensor_leaks: list[StackSummary], tensor_item_accesses: list[StackSummary], dynamic_shapes: list[str], inout_warnings: list[str], graph_break_by_optimizer: GraphCompileReason | None = None, graph_break_by_lr_scheduler: GraphCompileReason | None = None)
The named tuple returned by lint().
- Variables:
global_object_refs -- Uses of Python objects from the global scope or an enclosing function scope whose values may be fixed during tracing.
tensor_leaks -- Stack traces for assignments of torch.Tensor values to global or nonlocal variables.
tensor_item_accesses -- Stack traces for operations that create a Python scalar from the value of a torch.Tensor, such as Tensor.item(), bool(tensor), int(tensor), and data-dependent branches.
dynamic_shapes -- Stack traces for operations that produce tensors with symbolic or dynamic shapes.
inout_warnings -- Warnings about inputs or outputs that do not match the function signatures MLSDK can compile.
graph_break_by_optimizer -- Graph break information when use of torch.optim.Optimizer is detected.
graph_break_by_lr_scheduler -- Graph break information when use of torch.optim.lr_scheduler.LRScheduler is detected.
- apply_lint_level(lint_level: LintLevel) LintResult
Returns a copy of the result with the specified warning filter applied.
- dump() None
Prints the warnings in a human-readable format.
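Besides dump(), the fields of a LintResult can be inspected programmatically, for example to fail a CI job when the linter reports anything. The sketch below counts entries per list field and checks the two graph-break fields. count_findings and is_clean are hypothetical helpers, and treating an empty report as "clean" is an assumption; FakeLintResult is a stand-in with the same field names so the sketch runs without fx2onnx.

```python
from collections import namedtuple
from typing import Any, Dict

# List-valued fields of LintResult, as named in this reference.
LIST_FIELDS = (
    "global_object_refs",
    "tensor_leaks",
    "tensor_item_accesses",
    "dynamic_shapes",
    "inout_warnings",
)


def count_findings(result: Any) -> Dict[str, int]:
    """Return the number of entries in each list field of a lint result."""
    return {name: len(getattr(result, name)) for name in LIST_FIELDS}


def is_clean(result: Any) -> bool:
    """True if no list field has entries and no graph break was recorded."""
    if any(count_findings(result).values()):
        return False
    return (
        result.graph_break_by_optimizer is None
        and result.graph_break_by_lr_scheduler is None
    )


# Stand-in with the same field names as LintResult, for demonstration only.
FakeLintResult = namedtuple(
    "FakeLintResult",
    LIST_FIELDS + ("graph_break_by_optimizer", "graph_break_by_lr_scheduler"),
)
sample = FakeLintResult([], ["<stack trace>"], [], [], [], None, None)
print(count_findings(sample))
print(is_clean(sample))
```

With fx2onnx installed, the same helpers could be applied to the value returned by lint(func, args=(sample_inputs,), lint_level=LintLevel.INFO), where func is the function that will be passed to Context.compile.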