7.2.4. Example: Object Detection using SSD

A sample program demonstrating inference with an SSD-ResNet34 model trained on the COCO dataset.

Note

The trained model and some of the source code used in this example were partially modified from, or directly sourced from, intel/ai-reference-models. All of these components are licensed under the Apache License, Version 2.0.

The COCO dataset is licensed under CC BY 4.0, and we exclusively use images licensed under CC BY 2.0.

Execution Method

The first execution downloads the required components (the reference-model sources, the trained model, and the COCO data); subsequent runs skip these steps. By default, the download location is /tmp/mlsdk_ssd_inference/.

$ cd /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference
$ ./run_ssd_inference.sh /tmp/mlsdk_ssd_inference/coco/val2017/000000363666.jpg --device mncore2:auto

Expected Output

By default, a detection result is saved in the current working directory as a file prefixed with out_mncore_. A file prefixed with out_torch_ is also generated; it contains the result of running the same task with PyTorch. The two result images are then compared using the structural similarity index measure (SSIM).

Tip

To specify the directory where result files are written, use the --out_img_dir option, as in the example below.
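
For example, to write the result images to a hypothetical /tmp/out_imgs directory:

$ ./run_ssd_inference.sh /tmp/mlsdk_ssd_inference/coco/val2017/000000363666.jpg --device mncore2:auto --out_img_dir /tmp/out_imgs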

  • Detection results (./out_mncore_000000363666.jpg)


Fig. 7.4 Object detection using SSD-ResNet34 on MN-Core 2

  • Log output (the SSIM score is considered acceptable if it exceeds 0.99, indicating an inference result equivalent to PyTorch's)

Drawing detection result => /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/out_mncore_000000363666.jpg
Drawing detection result => /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/out_torch_000000363666.jpg
SSIM score: 0.9959848160827895
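
For reference, the acceptance check reduces to a plain SSIM comparison of the two generated images. The following is a minimal sketch using the same OpenCV/scikit-image calls as calc_ssim in Listing 7.20; the file names are those produced by the run above:

import cv2
from skimage.metrics import structural_similarity as ssim

# Load both result images and drop color (only image structure matters for SSIM)
lhs = cv2.cvtColor(cv2.imread("out_mncore_000000363666.jpg"), cv2.COLOR_BGR2GRAY)
rhs = cv2.cvtColor(cv2.imread("out_torch_000000363666.jpg"), cv2.COLOR_BGR2GRAY)

# Equivalent inference results are expected to score above 0.99
score = ssim(lhs, rhs)
assert score > 0.99, "Generated images differ."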

Scripts

Listing 7.19 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/run_ssd_inference.sh
#! /bin/bash

set -eux -o pipefail

EXAMPLE_NAME="mlsdk_ssd_inference"
VENV_DIR=${VENV_DIR:-"/tmp/${EXAMPLE_NAME}/venv"}
EXTERNAL_DIR=${EXTERNAL_DIR:-"/tmp/${EXAMPLE_NAME}/external"}
COCO_DIR=${COCO_DIR:-"/tmp/${EXAMPLE_NAME}/coco"}
OUT_DIR=${OUT_DIR:-"/tmp/${EXAMPLE_NAME}/out"}

CURRENT_DIR=$(realpath $(dirname $0))
CODEGEN_DIR=$(realpath ${CURRENT_DIR}/../../../)
BUILD_DIR="${CODEGEN_DIR}/build"

### Prepare and source venv/

if [[ ! -d ${VENV_DIR} ]]; then
    python3 -m venv --system-site-packages ${VENV_DIR}
    source ${VENV_DIR}/bin/activate
    pip3 install -r ${CURRENT_DIR}/requirements.txt
else
    source ${VENV_DIR}/bin/activate
fi

### Prepare external/ items

mkdir -p ${EXTERNAL_DIR}
pushd ${EXTERNAL_DIR}

if [[ ! -d ai-reference-models ]]; then
    git clone 'https://github.com/intel/ai-reference-models.git' ai-reference-models --depth 1 --branch v3.3
fi
if [[ ! -f resnet34-ssd1200.pth ]]; then
    # For details on downloading the trained model, refer to this documentation:
    # https://github.com/intel/ai-reference-models/blob/main/models_v2/pytorch/ssd-resnet34/inference/cpu/CONTAINER.md
    wget --no-check-certificate \
        'https://docs.google.com/uc?export=download&id=13kWgEItsoxbVKUlkQz4ntjl1IZGk6_5Z' \
        -O resnet34-ssd1200.pth
fi

popd

### Prepare coco/ items

mkdir -p ${COCO_DIR}
pushd ${COCO_DIR}

# download the COCO dataset and annotations
if [[ ! -d val2017 ]]; then
    curl -O 'http://images.cocodataset.org/zips/val2017.zip'
    unzip val2017.zip
    rm -f val2017.zip
fi
if [[ ! -d annotations ]]; then
    curl -O 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
    unzip annotations_trainval2017.zip
    rm -f annotations_trainval2017.zip
fi
# fetch images and annotations for inference
if [[ ! -f annotations/fetched_annotations_eval.json ]]; then
    python3 ${CURRENT_DIR}/coco_preparation.py \
        -a annotations/instances_val2017.json \
        -i val2017 \
        -f annotations/fetched_annotations_eval.json
fi

popd

### Run ssd_inference.py

source "${BUILD_DIR}/codegen_pythonpath.sh"

# Do not split a small h/w image among L1/L2 blocks.
export CODEGEN_SPATIAL_SPLIT_THRESHOLD_L2B_L1B=32

# Make the ai-reference-models sources importable by ssd_inference.py
export PYTHONPATH=${PYTHONPATH}:"${EXTERNAL_DIR}/ai-reference-models/models_v2/pytorch/ssd-resnet34/inference/cpu"
python3 ${CURRENT_DIR}/ssd_inference.py \
    --model_path "${EXTERNAL_DIR}/resnet34-ssd1200.pth" \
    --coco_data_path "${COCO_DIR}/val2017" \
    --coco_annotation_path "${COCO_DIR}/annotations/fetched_annotations_eval.json" \
    --out_img_dir "${CURRENT_DIR}" \
    --outdir "${OUT_DIR}" \
    --option_json "${CODEGEN_DIR}/preset_options/O1.json" \
    "${@}"
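The directory variables at the top of the script (VENV_DIR, EXTERNAL_DIR, COCO_DIR, OUT_DIR) use ${VAR:-default} expansion, so each can be overridden from the environment. For example, to keep the COCO data on a hypothetical larger volume:

$ COCO_DIR=/data/coco ./run_ssd_inference.sh /data/coco/val2017/000000363666.jpg --device mncore2:auto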
Listing 7.20 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/ssd_inference.py
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional

import cv2
import matplotlib
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.utils.data
from PIL import Image, ImageFile
from skimage.metrics import structural_similarity as ssim

# isort: off

# MLSDK modules
from mlsdk import (
    MNDevice,
    Context,
    storage,
    CacheOptions,
    TensorLike,
)

# following modules are from:
# [ai-reference-models repository](https://github.com/intel/ai-reference-models/tree/main/models_v2/pytorch/ssd-resnet34/inference/cpu)  # NOQA: B950
from infer import dboxes_R34_coco
from ssd_r34 import SSD_R34
from utils import COCODetection, Encoder, SSDTransformer

# isort: on

# No GUI output
matplotlib.use("Agg")


# label_num = 81 and strides = [3, 3, 2, 2, 2, 2] are from the original source in:
# https://github.com/intel/ai-reference-models/tree/main/models_v2/pytorch/ssd-resnet34/inference/cpu/infer.py
def load_model(
    path: str | os.PathLike,
    label_num: int = 81,
    strides: tuple[int, ...] = (3, 3, 2, 2, 2, 2),
) -> torch.nn.Module:
    # Create class object
    model = SSD_R34(label_num, strides=strides)
    # Load the pretrained model's parameters
    model.load_state_dict(
        torch.load(path, map_location=lambda storage, loc: storage)["model"]
    )
    assert isinstance(model, torch.nn.Module)
    return model


# Load an input image given as a path
def load_image(
    img_path: str | os.PathLike,
    img_size: tuple[int, int] = (1200, 1200),
) -> tuple[ImageFile.ImageFile, torch.Tensor]:
    # Resize input img for model
    orig_img = Image.open(img_path)

    # Convert img to tensor and change shape
    resized_img = orig_img.resize(img_size)
    converted_img_np = np.array(resized_img).transpose(
        2, 0, 1
    )  # (C,H,W), assuming dtype=uint8
    converted_img = (
        torch.from_numpy(converted_img_np).contiguous().unsqueeze(dim=0)
    )  # (1,C,H,W)

    return orig_img, converted_img


# Decode a model's output for an input image
def decode_outputs(
    out_locs: torch.Tensor,
    out_labels: torch.Tensor,
    encoder: Encoder,
    criteria: float = 0.50,
    max_output: int = 200,
    device: int = 0,
) -> list[tuple[torch.Tensor, torch.Tensor, torch.Tensor]]:
    try:
        decoded_outputs = encoder.decode_batch(
            out_locs,
            out_labels,
            criteria=criteria,
            max_output=max_output,
            device=device,
        )
    except Exception as e:
        print(f"Error in decode_outputs: {type(e)} - {e}")
        return []

    results = []
    idx = 0
    for i in range(decoded_outputs[3].size(0)):
        detection_num = decoded_outputs[3][i].item()
        idx_range = idx + detection_num
        results.append(
            (
                decoded_outputs[0][idx:idx_range],
                decoded_outputs[1][idx:idx_range],
                decoded_outputs[2][idx:idx_range],
            )
        )
        idx += detection_num

    return results


# This function is originally from:
# [ai-reference-models](https://github.com/intel/ai-reference-models/blob/main/models_v2/pytorch/ssd-resnet34/inference/cpu/utils.py)  # NOQA: B950
# Draw bboxes and labels for detected objects
def draw_patches(  # NOQA: CFQ002
    img: ImageFile.ImageFile,
    img_path: Path,
    bboxes: torch.Tensor,
    labels: torch.Tensor,
    order: str = "ltrb",
    label_map: Mapping[int, str] | None = None,
    bbox_alpha: float = 1.0,
    bbox_linewidth: int = 1,
    label_alpha: float = 0.3,
    label_linewidth: int = 2,
    text_size: int = 10,
) -> None:
    img_np = np.array(img)
    labels_np = labels.detach().numpy()
    bboxes_np = bboxes.detach().numpy()

    # From labels_np (ndarray[int]) to labels_list (list[str])
    if label_map is not None:
        labels_list = [
            label_map.get(int(label_i), "(no label)") for label_i in labels_np
        ]
    else:
        # Use an int value as a label str
        labels_list = [str(int(label_i)) for label_i in labels_np]

    if order == "ltrb":
        xmin, ymin, xmax, ymax = (
            bboxes_np[:, 0],
            bboxes_np[:, 1],
            bboxes_np[:, 2],
            bboxes_np[:, 3],
        )
        cx, cy, w, h = (xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin
    else:
        cx, cy, w, h = (
            bboxes_np[:, 0],
            bboxes_np[:, 1],
            bboxes_np[:, 2],
            bboxes_np[:, 3],
        )

    htot, wtot, _ = img_np.shape
    cx *= wtot
    cy *= htot
    w *= wtot
    h *= htot

    plt.imshow(img_np)
    ax = plt.gca()
    for cx_i, cy_i, w_i, h_i, label_i in zip(cx, cy, w, h, labels_list):
        if label_i == "background":
            continue
        ax.add_patch(
            patches.Rectangle(
                (cx_i - 0.5 * w_i, cy_i - 0.5 * h_i),
                w_i,
                h_i,
                fill=False,
                color="r",
                alpha=bbox_alpha,
                linewidth=bbox_linewidth,
            )
        )
        bbox_props = dict(
            boxstyle="square,pad=0",
            fc="y",
            ec="y",
            alpha=label_alpha,
            linewidth=label_linewidth,
        )
        ax.text(
            cx_i - 0.5 * w_i,
            cy_i - 0.5 * h_i,
            label_i,
            ha="left",
            va="bottom",
            size=text_size,
            bbox=bbox_props,
        )

    plt.savefig(img_path)
    plt.clf()
    plt.close()


# Fetch results satisfying the threshold and draw bounding boxes on the given input image
def draw_detection_result(  # NOQA: CFQ002
    out_locs: torch.Tensor,
    out_labels: torch.Tensor,
    img: ImageFile.ImageFile,
    img_path: Path,
    encoder: Encoder,
    threshold: float,
    label_info: dict[int, str],
) -> None:
    decoded_output = decode_outputs(out_locs, out_labels, encoder)
    if not decoded_output:
        print("no objects have been detected")
        return

    bboxes, labels, scores = decoded_output[0]
    detection_mask = scores > threshold
    fetched_bboxes = bboxes[detection_mask]
    fetched_labels = labels[detection_mask]
    draw_patches(img, img_path, fetched_bboxes, fetched_labels, label_map=label_info)


def calc_ssim(
    lhs_img_path: Path,
    rhs_img_path: Path,
) -> float:
    # cv2.imread expects a string file name, not a Path
    lhs_img = cv2.imread(str(lhs_img_path))
    rhs_img = cv2.imread(str(rhs_img_path))
    assert lhs_img.shape == rhs_img.shape

    # Convert to gray scale (only image structure is necessary for SSIM)
    lhs_img_gray = cv2.cvtColor(lhs_img, cv2.COLOR_BGR2GRAY)
    rhs_img_gray = cv2.cvtColor(rhs_img, cv2.COLOR_BGR2GRAY)

    return float(ssim(lhs_img_gray, rhs_img_gray))  # type: ignore[no-untyped-call]


def run_infer(  # NOQA: CFQ002
    *,
    img_path: Path,
    model_path: Path,
    coco_data_path: Path,
    coco_annotation_path: Path,
    out_img_dir: Path,
    threshold: float,
    device_name: str,
    outdir: str,
    option_json_path: Optional[Path] = None,
) -> None:
    # Create Dataset object
    IMG_SIZE = [1200, 1200]
    STRIDES = [3, 3, 2, 2, 2, 2]
    default_boxes = dboxes_R34_coco(IMG_SIZE, STRIDES)
    transformer = SSDTransformer(default_boxes, tuple(IMG_SIZE), val=True)
    coco = COCODetection(coco_data_path, coco_annotation_path, transformer)

    # Create encoder for the decode process
    encoder = Encoder(default_boxes)

    # Load and prepare images
    orig_img, converted_img = load_image(img_path)
    infer_input = {"image": converted_img}
    img_name = img_path.name

    # Create pretrained model object
    model = load_model(model_path)
    model.eval()

    # Inference function for Context.compile
    def infer_fn(sample: dict[str, TensorLike]) -> dict[str, TensorLike]:
        with torch.no_grad():
            locs, labels = model(sample["image"].float() / 255.0)
        return {"locs": locs, "labels": labels}

    device = MNDevice(device_name)
    context = Context(device)
    Context.switch_context(context)

    context.registry.register("model", model)

    compile_options = {}
    if option_json_path is not None:
        compile_options = {"option_json": str(option_json_path)}

    compiled_infer_fn = context.compile(
        infer_fn,
        infer_input,
        storage.path(outdir),
        options=compile_options,
        cache_options=CacheOptions(outdir + "/cache"),
    )

    out_mncore = compiled_infer_fn(infer_input)
    out_locs_mncore, out_labels_mncore = (
        out_mncore["locs"].cpu(),
        out_mncore["labels"].cpu(),
    )

    mncore_img_path = out_img_dir / ("out_mncore_" + img_name)
    print(f"Drawing detection result => {mncore_img_path}")
    draw_detection_result(
        out_locs_mncore,
        out_labels_mncore,
        orig_img,
        mncore_img_path,
        encoder,
        threshold,
        label_info=coco.label_info,
    )

    out_torch = infer_fn(infer_input)
    out_locs_torch, out_labels_torch = (
        out_torch["locs"].cpu(),
        out_torch["labels"].cpu(),
    )

    torch_img_path = out_img_dir / ("out_torch_" + img_name)
    print(f"Drawing detection result => {torch_img_path}")
    draw_detection_result(
        out_locs_torch,
        out_labels_torch,
        orig_img,
        torch_img_path,
        encoder,
        threshold,
        label_info=coco.label_info,
    )

    score = calc_ssim(mncore_img_path, torch_img_path)
    print(f"SSIM score: {score}")
    assert score > 0.99, "Generated images differ."


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Run SSD-ResNet34-1200 model for input image"
    )
    parser.add_argument("img_path", type=Path, help="Path to input image")
    parser.add_argument(
        "--model_path",
        type=Path,
        required=True,
        help="Path to trained SSD model (e.g. resnet34-ssd1200.pth)",
    )
    parser.add_argument(
        "--coco_data_path", type=Path, required=True, help="Path to the COCO dataset"
    )
    parser.add_argument(
        "--coco_annotation_path",
        type=Path,
        required=True,
        help="Path to annotation data (JSON) corresponding to the COCO dataset",
    )
    parser.add_argument(
        "--out_img_dir",
        type=Path,
        default=Path("."),
        help="Path to directory to output detection result images",
    )
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.4,
        help="""
        Detection threshold (0.0-1.0); a smaller threshold makes object detection
        more sensitive""",
    )
    parser.add_argument("--device", type=str, default="mncore2:auto")
    parser.add_argument("--outdir", type=str, default="/tmp/mlsdk_ssd_inference/out")
    parser.add_argument(
        "--option_json",
        type=Path,
        default="/opt/pfn/pfcomp/codegen/preset_options/O1.json",
    )
    args = parser.parse_args()

    # Validate arguments
    assert 0.0 <= args.threshold <= 1.0
    assert not args.out_img_dir.is_file(), "Please specify a directory path."

    args.out_img_dir.mkdir(parents=True, exist_ok=True)

    run_infer(
        img_path=args.img_path,
        model_path=args.model_path,
        coco_data_path=args.coco_data_path,
        coco_annotation_path=args.coco_annotation_path,
        out_img_dir=args.out_img_dir,
        threshold=args.threshold,
        device_name=args.device,
        outdir=args.outdir,
        option_json_path=args.option_json,
    )


if __name__ == "__main__":
    main()
Listing 7.21 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/requirements.txt
matplotlib
pycocotools
defusedxml
Listing 7.22 /opt/pfn/pfcomp/codegen/MLSDK/examples/ssd_inference/coco_preparation.py
import argparse
import json
import os
from typing import Any


def fetch_imgs_and_ids(
    json_obj: dict[str, Any], license_id_list: list[int]
) -> tuple[list[list[dict[str, int | str]]], list[int]]:
    img_list = []
    id_list = []
    for i in json_obj["images"]:
        if i["license"] in license_id_list:
            img_list.append(i)
            id_list.append(i["id"])

    return img_list, id_list


def fetch_annotations(json_obj: dict[str, Any], id_list: list[int]) -> list[Any]:
    ann_list = []
    for i in json_obj["annotations"]:
        if i["image_id"] in id_list:
            ann_list.append(i)

    return ann_list


def remove_wasted_imgs(
    json_obj: dict[str, Any], fetched_dict: dict[str, Any], img_dir: str | os.PathLike
) -> None:
    org_file_names = set([i["file_name"] for i in json_obj["images"]])
    fetched_file_name = set([i["file_name"] for i in fetched_dict["images"]])
    diff_set = org_file_names - fetched_file_name
    for i in diff_set:
        wasted_img_path = os.path.join(img_dir, i)
        if os.path.exists(wasted_img_path):
            os.remove(wasted_img_path)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-a",
        "--annotation_file",
        type=str,
        default="./coco/annotations/instances_val2017.json",
    )
    parser.add_argument("-i", "--images_dir", type=str, default="./coco/val2017/")
    parser.add_argument(
        "-f",
        "--fetched_annotation_file",
        type=str,
        default="./coco/annotations/fetched_annotations.json",
    )
    args = parser.parse_args()

    # fetch images' information with no license problems
    json_obj = None
    with open(args.annotation_file, mode="r") as f:
        json_obj = json.load(f)
    license_id_list = [4]  # 4: CC-BY 2.0

    img_list, id_list = fetch_imgs_and_ids(json_obj, license_id_list)
    ann_list = fetch_annotations(json_obj, id_list)
    fetched_dict = (
        {"info": json_obj["info"]}
        | {"licenses": json_obj["licenses"]}
        | {"images": img_list}
        | {"annotations": ann_list}
        | {"categories": json_obj["categories"]}
    )

    # dump fetched information
    with open(args.fetched_annotation_file, mode="w") as f:
        json.dump(fetched_dict, f)

    # remove files unused in inference
    remove_wasted_imgs(json_obj, fetched_dict, args.images_dir)


if __name__ == "__main__":
    main()
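
coco_preparation.py narrows the val2017 annotations down to images whose license id is 4 (CC BY 2.0), writes the reduced annotation file, and removes the remaining images from the image directory. It is normally invoked by run_ssd_inference.sh, but it can also be run on its own; for example, against a COCO checkout laid out as in the script's defaults:

$ python3 coco_preparation.py -a ./coco/annotations/instances_val2017.json -i ./coco/val2017 -f ./coco/annotations/fetched_annotations_eval.json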