diff --git a/README.md b/README.md index 4061176..06f55b0 100644 --- a/README.md +++ b/README.md @@ -149,6 +149,7 @@ and execute it from there. |[object in zone counting video file](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/object_in_zone_counting_video_file.ipynb)|Object detection and object counting in polygon zone: video file annotation| |[object in zone counting video stream](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/object_in_zone_counting_video_stream.ipynb)|Object detection and object counting in polygon zone: streaming video processing| |[tiled object detection](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/tiled_object_detection.ipynb)|How to do tiled object detection of a video stream from a video file. Each video frame is divided by tiles with some overlap, each tile of the AI model input size (to avoid resizing). Object detection is performed for each tile, then results from different tiles are combined.| +|[advanced tiling for object detection](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/advanced_tiling_strategies.ipynb)|Demonstrates how to perform image tiling with various strategies that mitigates partial/duplicate detections introduced by tiling.| ### Benchmarks diff --git a/examples/specialized/advanced_tiling_strategies.ipynb b/examples/specialized/advanced_tiling_strategies.ipynb new file mode 100644 index 0000000..e76d50a --- /dev/null +++ b/examples/specialized/advanced_tiling_strategies.ipynb @@ -0,0 +1,265 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Degirum banner](https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/degirum_banner.png)\n", + "## Advanced Tiling Strategies\n", + "This notebook is an example of how to perform advanced tiling using degirum_tools. The advanced tiling \n", + "strategies are used to mitigate partial/duplicate/overlapping objects introduced by tiling an image for \n", + "object detection. Four different detection merging strategies are demonstrated.\n", + "\n", + "This script works with the following inference options:\n", + "\n", + "1. Run inference on DeGirum Cloud Platform;\n", + "2. Run inference on DeGirum AI Server deployed on a localhost or on some computer in your LAN or VPN;\n", + "3. Run inference on DeGirum ORCA accelerator directly installed on your computer.\n", + "\n", + "To try different options, you need to specify the appropriate `hw_location` option. \n", + "\n", + "You also need to specify your cloud API access token in `degirum_cloud_token`.\n", + "\n", + "You can change `image_source` to a URL or path to another image file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# make sure degirum-tools package is installed\n", + "!pip show degirum-tools || pip install degirum-tools" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overview of tiling strategies\n", + "degirum_tools comes with four different tiling strategies. The base TileModel simply recombines all detections from each tile and subsequently performs NMS. The LocalGlobalTileModel performs an inference on all tiles and the whole image and selects detections from the tiles or the whole image based on a large object threshold. The BoxFusionTileModel fuses split detections that are detected on the edges of tiles based on a one dimensional IoU threshold. The BoxFusionLocalGlobalTileModel combines the former two strategies. Below you can find validation mAP statistics on the VisDrone dataset using yolov8s trained on VisDrone with each strategy (3x2 tiles with 10% overlap).\n", + "\n", + "| Strategy | mAP50 | mAP50:95 Small | mAP50:95 Medium | mAP50:95 Large |\n", + "|-------------------------------|--------|----------------|------------------|----------------|\n", + "| No Tiling | 0.3206 | 0.0983 | 0.2918 | 0.3938 |\n", + "| TileModel (base) | 0.3825 | 0.1668 | 0.2906 | 0.2292 |\n", + "| LocalGlobalTileModel | 0.3970 | 0.1668 | 0.2974 | 0.3827 |\n", + "| BoxFusionTileModel | 0.3913 | 0.1719 | 0.2990 | 0.2320 |\n", + "| BoxFusionLocalGlobalTileModel | 0.4065 | 0.1719 | 0.3059 | 0.3867 |\n", + "\n", + "The base tiling strategy improves the mAP of small objects at the expense of large objects. By incorporating the LocalGlobal strategy, it is possible to recapture the mAP lost from tiling. The BoxFusion strategy gives modest gains in mAP across all object sizes due to relatively fewer detections occuring on edges/corners of tiles." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# hw_location: where you want to run inference\n", + "# \"@cloud\" to use DeGirum cloud\n", + "# \"@local\" to run on local machine\n", + "# IP address for AI server inference\n", + "# image_source: video source for inference\n", + "# camera index for local camera\n", + "# URL of RTSP stream\n", + "# URL of YouTube Video\n", + "# path to image file\n", + "# model_name: name of the model for running AI inference\n", + "# model_zoo_url: url/path for model zoo\n", + "# cloud_zoo_url: valid for @cloud, @local, and ai server inference options\n", + "# '': ai server serving models from local folder\n", + "# path to json file: single model zoo in case of @local inference\n", + "# class_set: whitelist for classes to detect\n", + "hw_location = \"@cloud\"\n", + "zoo_name = \"https://hub.degirum.com/degirum/visdrone\"\n", + "model_name = 'yolov8s_relu6_visdrone--640x640_quant_n2x_orca1_1'\n", + "image_source = '../../images/ParkingLot.jpg'\n", + "class_set = {\"car\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### The rest of the cells below should run without any modifications" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# imports and variables used in most cells\n", + "import degirum as dg\n", + "import degirum_tools as dgt\n", + "\n", + "from degirum_tools.tile_compound_models import TileExtractorPseudoModel, TileModel, LocalGlobalTileModel, BoxFusionTileModel, BoxFusionLocalGlobalTileModel\n", + "from degirum_tools import NmsBoxSelectionPolicy, NmsOptions\n", + "\n", + "# Base NMS options.\n", + "nms_options = NmsOptions(\n", + " threshold=0.6,\n", + " use_iou=True,\n", + " box_select=NmsBoxSelectionPolicy.MOST_PROBABLE,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## No tiling example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load model to be used for tiling\n", + "model = dg.load_model(model_name, hw_location, zoo_name, dgt.get_token(), image_backend='pil')\n", + "model.output_class_set = class_set # filter class outputss\n", + "\n", + "results = model(image_source)\n", + "results.image_overlay" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following examples all are tiled with 3 columns, 2 rows, and a 10% overlap minimum between each tile." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Base TileModel example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tile_extractor = TileExtractorPseudoModel(cols=3,\n", + " rows=2, \n", + " overlap_percent=0.1, \n", + " model2=model,\n", + " global_tile=False)\n", + "tile_model = TileModel(model1=tile_extractor,\n", + " model2=model,\n", + " nms_options=nms_options)\n", + "results = tile_model(image_source)\n", + "results.image_overlay" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## LocalGlobalTileModel example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tile_extractor = TileExtractorPseudoModel(cols=3,\n", + " rows=2, \n", + " overlap_percent=0.1, \n", + " model2=model,\n", + " global_tile=True)\n", + "tile_model = LocalGlobalTileModel(model1=tile_extractor,\n", + " model2=model,\n", + " large_object_threshold=0.01,\n", + " nms_options=nms_options)\n", + "results = tile_model(image_source)\n", + "results.image_overlay" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## BoxFusionTileModel example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tile_extractor = TileExtractorPseudoModel(cols=3,\n", + " rows=2, \n", + " overlap_percent=0.1, \n", + " model2=model,\n", + " global_tile=False)\n", + "tile_model = BoxFusionTileModel(model1=tile_extractor,\n", + " model2=model,\n", + " edge_threshold=0.02,\n", + " fusion_threshold=0.8)\n", + "results = tile_model(image_source)\n", + "results.image_overlay" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## BoxFusionLocalGlobalTileModel example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tile_extractor = TileExtractorPseudoModel(cols=3,\n", + " rows=2, \n", + " overlap_percent=0.1, \n", + " model2=model,\n", + " global_tile=True)\n", + "tile_model = BoxFusionLocalGlobalTileModel(model1=tile_extractor,\n", + " model2=model,\n", + " large_object_threshold=0.01, \n", + " edge_threshold=0.02,\n", + " fusion_threshold=0.8,\n", + " nms_options=nms_options)\n", + "results = tile_model(image_source)\n", + "results.image_overlay" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "dgenv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/images/ParkingLot.jpg b/images/ParkingLot.jpg new file mode 100644 index 0000000..175d9b4 Binary files /dev/null and b/images/ParkingLot.jpg differ diff --git a/images/image_credits.txt b/images/image_credits.txt index 5cefa58..33728c5 100644 --- a/images/image_credits.txt +++ b/images/image_credits.txt @@ -4,4 +4,5 @@ LicensePlate.jpg: Cropped from Car.jpg ThreePersons: https://pixabay.com/photos/kettlebell-fitness-crossfit-fit-3293481/ TwoCats.jpg: https://pixabay.com/photos/animal-cat-couple-curious-cute-21584/ LivingRoom.jpg: https://pixabay.com/photos/porch-fireplace-design-house-1967855/ -FirePlace.jpg: https://pixabay.com/photos/fireplace-mantel-living-room-cozy-558985/ \ No newline at end of file +FirePlace.jpg: https://pixabay.com/photos/fireplace-mantel-living-room-cozy-558985/ +ParkingLot.jpg: https://www.kaggle.com/datasets/braunge/aerial-view-car-detection-for-yolov5 (mydata/mydata/images/test/4 (47)_1650423582.jpg) (License: ODBL 1.0) \ No newline at end of file diff --git a/tests/reference/advanced_tiling_strategies_4.1.png b/tests/reference/advanced_tiling_strategies_4.1.png new file mode 100644 index 0000000..47b4602 Binary files /dev/null and b/tests/reference/advanced_tiling_strategies_4.1.png differ diff --git a/tests/reference/advanced_tiling_strategies_5.1.png b/tests/reference/advanced_tiling_strategies_5.1.png new file mode 100644 index 0000000..ec02305 Binary files /dev/null and b/tests/reference/advanced_tiling_strategies_5.1.png differ diff --git a/tests/reference/advanced_tiling_strategies_6.1.png b/tests/reference/advanced_tiling_strategies_6.1.png new file mode 100644 index 0000000..f838a1d Binary files /dev/null and b/tests/reference/advanced_tiling_strategies_6.1.png differ diff --git a/tests/reference/advanced_tiling_strategies_7.1.png b/tests/reference/advanced_tiling_strategies_7.1.png new file mode 100644 index 0000000..d4d151b Binary files /dev/null and b/tests/reference/advanced_tiling_strategies_7.1.png differ diff --git a/tests/reference/advanced_tiling_strategies_8.1.png b/tests/reference/advanced_tiling_strategies_8.1.png new file mode 100644 index 0000000..417bcef Binary files /dev/null and b/tests/reference/advanced_tiling_strategies_8.1.png differ diff --git a/tests/test_notebooks.py b/tests/test_notebooks.py index 395dbd2..d786ba1 100644 --- a/tests/test_notebooks.py +++ b/tests/test_notebooks.py @@ -62,6 +62,7 @@ ("specialized/object_in_zone_counting_video_stream.ipynb", "Masked.mp4", [3], []), ("specialized/object_in_zone_counting_video_file.ipynb", "TrafficHD_short.mp4", [3], []), ("specialized/tiled_object_detection.ipynb", "TrafficHD_short.mp4", [3,4], []), + ("specialized/advanced_tiling_strategies.ipynb", None, [4, 5, 6, 7, 8], []), ("applications/person_count_video.ipynb", "Masked.mp4", [7], []), ("applications/stop_sign_violation_detection.ipynb", "Masked.mp4", [3], []), ("applications/person_age_gender_detection.ipynb", "Masked.mp4", {4:2}, []),