8.2.3. Example: Object Detection using SSD

A sample program that runs inference on the COCO dataset with a pretrained SSD-ResNet34 model.

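Note

The pretrained model used here, and parts of the source code, are taken from intel / ai-reference-models, either partially modified or used as-is. Both are licensed under the Apache License, Version 2.0.

The COCO dataset is provided under the CC BY 4.0 license. Of its images, this example uses only those provided under CC BY 2.0.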

Execution Method

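On the first run, the script downloads what it needs: the intel/ai-reference-models repository, the pretrained model (resnet34-ssd1200.pth), and the COCO val2017 images and annotations. Later runs skip any download whose target already exists. By default, everything is stored under /tmp/mlsdk_ssd_inference/.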

$ cd /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference
$ ./run_ssd_inference.sh /tmp/mlsdk_ssd_inference/coco/val2017/000000363666.jpg --device mncore2:auto
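The skip-on-rerun behavior comes from the `if [[ ! -d ... ]]` and `if [[ ! -f ... ]]` guards in run_ssd_inference.sh (Listing 8.18). The same download-once pattern can be sketched in Python as follows. The helper name `ensure_present` and the `fetch` callback are hypothetical illustrations and are not part of MLSDK.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_present(target: Path, fetch: Callable[[Path], None]) -> bool:
    """Call fetch(target) only if target does not exist yet.

    Hypothetical helper mirroring the existence checks in the shell script.
    Returns True if fetch ran, False if the download was skipped.
    """
    if target.exists():
        return False  # already fetched on an earlier run
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return True


def fake_fetch(path: Path) -> None:
    # Stand-in for wget/curl: just writes a placeholder file.
    path.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "external" / "resnet34-ssd1200.pth"
    print(ensure_present(model, fake_fetch))  # first run downloads
    print(ensure_present(model, fake_fetch))  # second run is skipped
```

Deleting a cached file or directory (for example /tmp/mlsdk_ssd_inference/coco) is therefore enough to force that item to be fetched again.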

Expected Output

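The detection result is written as a file prefixed with out_mncore_. With the commands above it lands in the current directory: run_ssd_inference.sh passes its own directory as --out_img_dir, and ssd_inference.py itself defaults to the current directory. A file prefixed with out_torch_ is written alongside it; it holds the result of running the same model with PyTorch and is used for comparison. The two images are compared with the structural similarity index measure (SSIM).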
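The comparison boils down to converting both rendered images to grayscale and computing SSIM with scikit-image, as calc_ssim in ssd_inference.py (Listing 8.19) does. The sketch below works on synthetic NumPy arrays instead of the out_mncore_ and out_torch_ files, so it skips the OpenCV file loading and BGR-to-gray conversion; the array sizes and the painted block are arbitrary assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity


def gray_ssim(lhs: np.ndarray, rhs: np.ndarray) -> float:
    """SSIM between two uint8 grayscale images of identical shape."""
    assert lhs.shape == rhs.shape, "images must have the same size"
    # For uint8 input, scikit-image infers data_range=255 from the dtype.
    return float(structural_similarity(lhs, rhs))


rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Identical images reach the maximum SSIM of 1.0.
same = gray_ssim(reference, reference)

# Painting one block black stands in for a differently drawn bounding box.
changed = reference.copy()
changed[8:24, 8:24] = 0
diff = gray_ssim(reference, changed)

# Same pass criterion as the assert at the end of run_infer.
print(f"identical: {same:.4f}, changed: {diff:.4f}, passes: {diff > 0.99}")
```

Comparing grayscale renderings keeps the check focused on image structure (box positions and label placement) rather than exact pixel colors.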

Tip

To choose the directory that the result files are written to, use the --out_img_dir option.

Detection result (./out_mncore_000000363666.jpg)

Figure 8.3 Object detection using SSD-ResNet34 on MN-Core 2

Log output (an SSIM score above 0.99 is taken to mean that inference on MN-Core 2 is equivalent to PyTorch; the script asserts score > 0.99)

Drawing detection result => /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/out_mncore_000000363666.jpg
Drawing detection result => /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/out_torch_000000363666.jpg
SSIM score: 0.9959848160827895

Scripts

Listing 8.18 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/run_ssd_inference.sh

#! /bin/bash

set -eux -o pipefail

EXAMPLE_NAME="mlsdk_ssd_inference"
VENV_DIR=${VENV_DIR:-"/tmp/${EXAMPLE_NAME}/venv"}
EXTERNAL_DIR=${EXTERNAL_DIR:-"/tmp/${EXAMPLE_NAME}/external"}
COCO_DIR=${COCO_DIR:-"/tmp/${EXAMPLE_NAME}/coco"}
OUT_DIR=${OUT_DIR:-"/tmp/${EXAMPLE_NAME}/out"}

CURRENT_DIR=$(realpath $(dirname $0))
CODEGEN_DIR=$(realpath ${CURRENT_DIR}/../../../)
BUILD_DIR="${CODEGEN_DIR}/build"

### Prepare and source venv/

if [[ ! -d ${VENV_DIR} ]]; then
    python3 -m venv --system-site-packages ${VENV_DIR}
    source ${VENV_DIR}/bin/activate
    pip3 install -r ${CURRENT_DIR}/requirements.txt
else
    source ${VENV_DIR}/bin/activate
fi

### Prepare external/ items

mkdir -p ${EXTERNAL_DIR}
pushd ${EXTERNAL_DIR}

if [[ ! -d ai-reference-models ]]; then
    git clone 'https://github.com/intel/ai-reference-models.git' ai-reference-models --depth 1 --branch v3.3
fi
if [[ ! -f resnet34-ssd1200.pth ]]; then
    # For downloading the trained model, please refer to this documentation.
    # Ref: https://github.com/intel/ai-reference-models/blob/main/models_v2/pytorch/ssd-resnet34/inference/cpu/CONTAINER.md
    wget --no-check-certificate \
        'https://docs.google.com/uc?export=download&id=13kWgEItsoxbVKUlkQz4ntjl1IZGk6_5Z' \
        -O resnet34-ssd1200.pth
fi

popd

### Prepare coco/ items

mkdir -p ${COCO_DIR}
pushd ${COCO_DIR}

# download the COCO datasets and annotations
if [[ ! -d val2017 ]]; then
    curl -O 'http://images.cocodataset.org/zips/val2017.zip'
    unzip val2017.zip
    rm -f val2017.zip
fi
if [[ ! -d annotations ]]; then
    curl -O 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
    unzip annotations_trainval2017.zip
    rm -f annotations_trainval2017.zip
fi
# fetch images and annotations for inference
if [[ ! -f annotations/fetched_annotations_eval.json ]]; then
    python3 ${CURRENT_DIR}/coco_preparation.py \
        -a annotations/instances_val2017.json \
        -i val2017 \
        -f annotations/fetched_annotations_eval.json
fi

popd

### Run ssd_inference.py

source "${BUILD_DIR}/codegen_pythonpath.sh"

# Do not split a small h/w image among L1/L2 blocks.
export CODEGEN_SPATIAL_SPLIT_THRESHOLD_L2B_L1B=32

PYTHONPATH=${PYTHONPATH}:"${EXTERNAL_DIR}/ai-reference-models/models_v2/pytorch/ssd-resnet34/inference/cpu"
python3 ${CURRENT_DIR}/ssd_inference.py \
    --model_path "${EXTERNAL_DIR}/resnet34-ssd1200.pth" \
    --coco_data_path "${COCO_DIR}/val2017" \
    --coco_annotation_path "${COCO_DIR}/annotations/fetched_annotations_eval.json" \
    --out_img_dir "${CURRENT_DIR}" \
    --outdir "${OUT_DIR}" \
    --option_json "${CODEGEN_DIR}/preset_options/O1.json" \
    ${@}
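The script passes its own `--out_img_dir "${CURRENT_DIR}"` and then forwards the caller's arguments with `${@}`. A user-supplied `--out_img_dir` (see the Tip above) therefore still takes effect, because argparse keeps the value of the last occurrence of a regular `store` option. A minimal check with a reduced parser that reproduces only this one option; the second path is a placeholder.

```python
import argparse
from pathlib import Path

# Reduced copy of the --out_img_dir option from ssd_inference.py.
parser = argparse.ArgumentParser()
parser.add_argument("--out_img_dir", type=Path, default=Path("."))

# Arguments in the order run_ssd_inference.sh assembles them:
# the script's own value first, then whatever the user appended.
argv = [
    "--out_img_dir", "/opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference",
    "--out_img_dir", "/tmp/my_results",
]
args = parser.parse_args(argv)
print(args.out_img_dir)  # /tmp/my_results
```

The same ordering lets other forwarded options, such as --outdir or --option_json, override the script's values.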

Listing 8.19 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/ssd_inference.py

import argparse
import os
from pathlib import Path
from typing import Mapping, Optional

import cv2
import matplotlib
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.utils.data
from PIL import Image, ImageFile
from skimage.metrics import structural_similarity as ssim

# isort: off

# MLSDK modules
from mlsdk import (
    MNDevice,
    Context,
    storage,
    CacheOptions,
    TensorLike,
)

# following modules are from:
# [ai-reference-models repository](https://github.com/intel/ai-reference-models/tree/main/models_v2/pytorch/ssd-resnet34/inference/cpu)  # NOQA: B950
from infer import dboxes_R34_coco
from ssd_r34 import SSD_R34
from utils import COCODetection, Encoder, SSDTransformer

# isort: on

# No GUI output
matplotlib.use("Agg")


# label_num = 81 and strides = [3, 3, 2, 2, 2, 2] are from original source in:
# https://github.com/intel/ai-reference-models/tree/main/models_v2/pytorch/ssd-resnet34/inference/cpu/infer.py
def load_model(
    path: str | os.PathLike,
    label_num: int = 81,
    strides: tuple[int, ...] = (3, 3, 2, 2, 2, 2),
) -> torch.nn.Module:
    # Create class object
    model = SSD_R34(label_num, strides=strides)
    # Load pretrained model's parameters
    model.load_state_dict(
        torch.load(path, map_location=lambda storage, loc: storage)["model"]
    )
    assert isinstance(model, torch.nn.Module)
    return model


# Load an input image given as a path
def load_image(
    img_path: str | os.PathLike,
    img_size: tuple[int, int] = (1200, 1200),
) -> tuple[ImageFile.ImageFile, torch.Tensor]:
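    # Open the original image (kept for drawing the detection result later)
    orig_img = Image.open(img_path)

    # Resize to the model's input size; convert("RGB") ensures 3 channels even
    # for grayscale or RGBA inputs, which would otherwise break the transpose
    resized_img = orig_img.convert("RGB").resize(img_size)
    # Convert to a tensor and change the layout from (H,W,C) to (1,C,H,W)
    converted_img_np = np.array(resized_img).transpose(
        2, 0, 1
    )  # (C,H,W), dtype=uint8
    converted_img = (
        torch.from_numpy(converted_img_np).contiguous().unsqueeze(dim=0)
    )  # (1,C,H,W)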

    return orig_img, converted_img


# Decode a model's output for an input image
def decode_outputs(
    out_locs: torch.Tensor,
    out_labels: torch.Tensor,
    encoder: Encoder,
    criteria: float = 0.50,
    max_output: int = 200,
    device: int = 0,
) -> list[tuple[torch.Tensor, torch.Tensor, torch.Tensor]]:
    try:
        decoded_outputs = encoder.decode_batch(
            out_locs,
            out_labels,
            criteria=criteria,
            max_output=max_output,
            device=device,
        )
    except Exception as e:
        print(f"Error in decode_outputs: {type(e)} - {e}")
        return []

    results = []
    idx = 0
    for i in range(decoded_outputs[3].size(0)):
        detection_num = decoded_outputs[3][i].item()
        idx_range = idx + detection_num
        results.append(
            (
                decoded_outputs[0][idx:idx_range],
                decoded_outputs[1][idx:idx_range],
                decoded_outputs[2][idx:idx_range],
            )
        )
        idx += detection_num

    return results


# This function is originally from:
# [ai-reference-models](https://github.com/intel/ai-reference-models/blob/main/models_v2/pytorch/ssd-resnet34/inference/cpu/utils.py)  # NOQA: B950
# Drawing bboxes and labels for detected objects
def draw_patches(  # NOQA: CFQ002
    img: ImageFile.ImageFile,
    img_path: Path,
    bboxes: torch.Tensor,
    labels: torch.Tensor,
    order: str = "ltrb",
    label_map: Mapping[int, str] | None = None,
    bbox_alpha: float = 1.0,
    bbox_linewidth: int = 1,
    label_alpha: float = 0.3,
    label_linewidth: int = 2,
    text_size: int = 10,
) -> None:
    img_np = np.array(img)
    labels_np = labels.detach().numpy()
    bboxes_np = bboxes.detach().numpy()

    # From labels_np (ndarray[int]) to labels_list (list[str])
    if label_map is not None:
        labels_list = [
            label_map.get(int(label_i), "(no label)") for label_i in labels_np
        ]
    else:
        # Use an int value as a label str
        labels_list = [str(int(label_i)) for label_i in labels_np]

    if order == "ltrb":
        xmin, ymin, xmax, ymax = (
            bboxes_np[:, 0],
            bboxes_np[:, 1],
            bboxes_np[:, 2],
            bboxes_np[:, 3],
        )
        cx, cy, w, h = (xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin
    else:
        cx, cy, w, h = (
            bboxes_np[:, 0],
            bboxes_np[:, 1],
            bboxes_np[:, 2],
            bboxes_np[:, 3],
        )

    htot, wtot, _ = img_np.shape
    cx *= wtot
    cy *= htot
    w *= wtot
    h *= htot

    plt.imshow(img_np)
    ax = plt.gca()
    for cx_i, cy_i, w_i, h_i, label_i in zip(cx, cy, w, h, labels_list):
        if label_i == "background":
            continue
        ax.add_patch(
            patches.Rectangle(
                (cx_i - 0.5 * w_i, cy_i - 0.5 * h_i),
                w_i,
                h_i,
                fill=False,
                color="r",
                alpha=bbox_alpha,
                linewidth=bbox_linewidth,
            )
        )
        bbox_props = dict(
            boxstyle="square,pad=0",
            fc="y",
            ec="y",
            alpha=label_alpha,
            linewidth=label_linewidth,
        )
        ax.text(
            cx_i - 0.5 * w_i,
            cy_i - 0.5 * h_i,
            label_i,
            ha="left",
            va="bottom",
            size=text_size,
            bbox=bbox_props,
        )

    plt.savefig(img_path)
    plt.clf()
    plt.close()


# Fetch results satisfying threshold and draw bounding boxes on the given input image
def draw_detection_result(  # NOQA: CFQ002
    out_locs: torch.Tensor,
    out_labels: torch.Tensor,
    img: ImageFile.ImageFile,
    img_path: Path,
    encoder: Encoder,
    threshold: float,
    label_info: dict[int, str],
) -> None:

    decoded_output = decode_outputs(out_locs, out_labels, encoder)
    if not decoded_output:
        print("no objects have been detected")
        return

    bboxes, labels, scores = decoded_output[0]
    detection_mask = scores > threshold
    fetched_bboxes = bboxes[detection_mask]
    fetched_labels = labels[detection_mask]
    draw_patches(img, img_path, fetched_bboxes, fetched_labels, label_map=label_info)


def calc_ssim(
    lhs_img_path: Path,
    rhs_img_path: Path,
) -> float:
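    lhs_img = cv2.imread(str(lhs_img_path))
    rhs_img = cv2.imread(str(rhs_img_path))
    # cv2.imread returns None instead of raising when a file cannot be read
    assert lhs_img is not None and rhs_img is not None, "Failed to read images."
    assert lhs_img.shape == rhs_img.shape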

    # Convert to gray scale (only image structure is necessary for SSIM)
    lhs_img_gray = cv2.cvtColor(lhs_img, cv2.COLOR_BGR2GRAY)
    rhs_img_gray = cv2.cvtColor(rhs_img, cv2.COLOR_BGR2GRAY)

    return float(ssim(lhs_img_gray, rhs_img_gray))  # type: ignore[no-untyped-call]


def run_infer(  # NOQA: CFQ002
    *,
    img_path: Path,
    model_path: Path,
    coco_data_path: Path,
    coco_annotation_path: Path,
    out_img_dir: Path,
    threshold: float,
    device_name: str,
    outdir: str,
    option_json_path: Optional[Path] = None,
) -> None:
    # Create Dataset object
    IMG_SIZE = [1200, 1200]
    STRIDES = [3, 3, 2, 2, 2, 2]
    default_boxes = dboxes_R34_coco(IMG_SIZE, STRIDES)
    transformer = SSDTransformer(default_boxes, tuple(IMG_SIZE), val=True)
    coco = COCODetection(coco_data_path, coco_annotation_path, transformer)

    # Create encoder for decode process
    encoder = Encoder(default_boxes)

    # Load and prepare images
    orig_img, converted_img = load_image(img_path)
    infer_input = {"image": converted_img}
    img_name = img_path.name

    # Create pretrained model object
    model = load_model(model_path)
    model.eval()

    # Inference function for Context.compile
    def infer_fn(sample: dict[str, TensorLike]) -> dict[str, TensorLike]:
        with torch.no_grad():
            locs, labels = model(sample["image"].float() / 255.0)
        return {"locs": locs, "labels": labels}

    device = MNDevice(device_name)
    context = Context(device)
    Context.switch_context(context)

    context.registry.register("model", model)

    compile_options = {}
    if option_json_path is not None:
        compile_options = {"option_json": str(option_json_path)}

    compiled_infer_fn = context.compile(
        infer_fn,
        infer_input,
        storage.path(outdir),
        options=compile_options,
        cache_options=CacheOptions(outdir + "/cache"),
    )

    out_mncore = compiled_infer_fn(infer_input)
    out_locs_mncore, out_labels_mncore = (
        out_mncore["locs"].cpu(),
        out_mncore["labels"].cpu(),
    )

    mncore_img_path = out_img_dir / ("out_mncore_" + img_name)
    print(f"Drawing detection result => {mncore_img_path}")
    draw_detection_result(
        out_locs_mncore,
        out_labels_mncore,
        orig_img,
        mncore_img_path,
        encoder,
        threshold,
        label_info=coco.label_info,
    )

    out_torch = infer_fn(infer_input)
    out_locs_torch, out_labels_torch = (
        out_torch["locs"].cpu(),
        out_torch["labels"].cpu(),
    )

    torch_img_path = out_img_dir / ("out_torch_" + img_name)
    print(f"Drawing detection result => {torch_img_path}")
    draw_detection_result(
        out_locs_torch,
        out_labels_torch,
        orig_img,
        torch_img_path,
        encoder,
        threshold,
        label_info=coco.label_info,
    )

    score = calc_ssim(mncore_img_path, torch_img_path)
    print(f"SSIM score: {score}")
    assert score > 0.99, "Generated images differ."


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Run SSD-ResNet34-1200 model for input image"
    )
    parser.add_argument("img_path", type=Path, help="Path to input image")
    parser.add_argument(
        "--model_path",
        type=Path,
        required=True,
        help="Path to trained SSD model (e.g. resnet34-ssd1200.pth)",
    )
    parser.add_argument(
        "--coco_data_path", type=Path, required=True, help="Path to the COCO dataset"
    )
    parser.add_argument(
        "--coco_annotation_path",
        type=Path,
        required=True,
        help="Path to annotation data (JSON) corresponding to the COCO dataset",
    )
    parser.add_argument(
        "--out_img_dir",
        type=Path,
        default=Path("."),
        help="Path to directory to output detection result images",
    )
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.4,
        help="""
        Detection threshold (0.0-1.0); a smaller threshold makes object detection
        more sensitive""",
    )
    parser.add_argument("--device", type=str, default="mncore2:auto")
    parser.add_argument("--outdir", type=str, default="/tmp/mlsdk_ssd_inference/out")
    parser.add_argument(
        "--option_json",
        type=Path,
        default="/opt/pfn/pfcomp/codegen/preset_options/O1.json",
    )
    args = parser.parse_args()

    # Validate arguments
    assert 0.0 <= args.threshold <= 1.0
    assert not args.out_img_dir.is_file(), "Please specify directory path."

    args.out_img_dir.mkdir(parents=True, exist_ok=True)

    run_infer(
        img_path=args.img_path,
        model_path=args.model_path,
        coco_data_path=args.coco_data_path,
        coco_annotation_path=args.coco_annotation_path,
        out_img_dir=args.out_img_dir,
        threshold=args.threshold,
        device_name=args.device,
        outdir=args.outdir,
        option_json_path=args.option_json,
    )


if __name__ == "__main__":
    main()
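draw_detection_result keeps only detections whose score exceeds --threshold (default 0.4), and draw_patches turns the decoded boxes, which are normalized ltrb coordinates, into pixel rectangles on the original image. Both steps are sketched below with NumPy on made-up boxes; the values are illustrative rather than model output, and the non-maximum suppression done by Encoder.decode_batch is not reproduced. The function name `to_pixel_rects` is hypothetical.

```python
import numpy as np


def to_pixel_rects(
    bboxes: np.ndarray,
    scores: np.ndarray,
    img_w: int,
    img_h: int,
    threshold: float = 0.4,
) -> np.ndarray:
    """Filter normalized ltrb boxes by score and return pixel (x, y, w, h).

    Mirrors the score mask in draw_detection_result and the coordinate
    arithmetic in draw_patches with order="ltrb".
    """
    kept = bboxes[scores > threshold]  # strict comparison, as in the script
    xmin, ymin, xmax, ymax = kept[:, 0], kept[:, 1], kept[:, 2], kept[:, 3]
    x = xmin * img_w  # left edge in pixels
    y = ymin * img_h  # top edge in pixels
    w = (xmax - xmin) * img_w  # width in pixels
    h = (ymax - ymin) * img_h  # height in pixels
    return np.stack([x, y, w, h], axis=1)


boxes = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])
scores = np.array([0.9, 0.3])
print(to_pixel_rects(boxes, scores, img_w=640, img_h=480))
```

Only the first box survives the 0.4 threshold here. draw_patches reaches the same top-left corner by computing the box center and subtracting half the width and height.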

Listing 8.20 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/requirements.txt

matplotlib
pycocotools
defusedxml

Listing 8.21 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/coco_preparation.py

import argparse
import json
import os
from typing import Any


def fetch_imgs_and_ids(
    json_obj: dict[str, Any], license_id_list: list[int]
) -> tuple[list[dict[str, Any]], list[int]]:
    img_list = []
    id_list = []
    for i in json_obj["images"]:
        if i["license"] in license_id_list:
            img_list.append(i)
            id_list.append(i["id"])

    return img_list, id_list


def fetch_annotations(json_obj: dict[str, Any], id_list: list[int]) -> list[Any]:
    ann_list = []
    for i in json_obj["annotations"]:
        if i["image_id"] in id_list:
            ann_list.append(i)

    return ann_list


def remove_wasted_imgs(
    json_obj: dict[str, Any], fetched_dict: dict[str, Any], img_dir: str | os.PathLike
) -> None:
    org_file_names = set([i["file_name"] for i in json_obj["images"]])
    fetched_file_name = set([i["file_name"] for i in fetched_dict["images"]])
    diff_set = org_file_names - fetched_file_name
    for i in diff_set:
        wasted_img_path = os.path.join(img_dir, i)
        if os.path.exists(wasted_img_path):
            os.remove(wasted_img_path)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-a",
        "--annotation_file",
        type=str,
        default="./coco/annotations/instances_val2017.json",
    )
    parser.add_argument("-i", "--images_dir", type=str, default="./coco/val2017/")
    parser.add_argument(
        "-f",
        "--fetched_annotation_file",
        type=str,
        default="./coco/annotations/fetched_annotations.json",
    )
    args = parser.parse_args()

    # fetch information on images with no license problems
    json_obj = None
    with open(args.annotation_file, mode="r") as f:
        json_obj = json.load(f)
    license_id_list = [4]  # 4: CC-BY 2.0

    img_list, id_list = fetch_imgs_and_ids(json_obj, license_id_list)
    ann_list = fetch_annotations(json_obj, id_list)
    fetched_dict = {
        "info": json_obj["info"],
        "licenses": json_obj["licenses"],
        "images": img_list,
        "annotations": ann_list,
        "categories": json_obj["categories"],
    }

    # dump fetched information
    with open(args.fetched_annotation_file, mode="w") as f:
        json.dump(fetched_dict, f)

    # remove files unused in inference
    remove_wasted_imgs(json_obj, fetched_dict, args.images_dir)


if __name__ == "__main__":
    main()
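coco_preparation.py keeps the images whose license id is 4 (CC BY 2.0) and then the annotations that refer to those images. The sketch below applies the same filtering to a tiny hand-written annotation dict. It uses a set for the image-id lookup so that each membership test is constant time, which matters when scanning the tens of thousands of annotations in instances_val2017.json. The function name `filter_by_license` and the toy data are illustrative assumptions.

```python
from typing import Any


def filter_by_license(coco: dict[str, Any], license_ids: set[int]) -> dict[str, Any]:
    """Return a COCO-style dict restricted to images with the given licenses."""
    images = [img for img in coco["images"] if img["license"] in license_ids]
    image_ids = {img["id"] for img in images}  # set: O(1) membership tests
    annotations = [a for a in coco["annotations"] if a["image_id"] in image_ids]
    return {
        "info": coco["info"],
        "licenses": coco["licenses"],
        "images": images,
        "annotations": annotations,
        "categories": coco["categories"],
    }


toy = {
    "info": {},
    "licenses": [
        {"id": 1, "name": "Attribution-NonCommercial-ShareAlike License"},
        {"id": 4, "name": "Attribution License"},
    ],
    "images": [
        {"id": 1, "license": 4, "file_name": "a.jpg"},
        {"id": 2, "license": 1, "file_name": "b.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 2, "category_id": 1},
    ],
    "categories": [{"id": 1, "name": "person"}],
}
filtered = filter_by_license(toy, {4})
print([img["file_name"] for img in filtered["images"]])  # ['a.jpg']
print([a["id"] for a in filtered["annotations"]])  # [10]
```

Images dropped here (b.jpg in the toy data) correspond to the files that remove_wasted_imgs deletes from the val2017 directory.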