advanced tiling initial commit #61

Merged
merged 3 commits into from
Dec 18, 2024
1 change: 1 addition & 0 deletions README.md
@@ -149,6 +149,7 @@ and execute it from there.
|[object in zone counting video file](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/object_in_zone_counting_video_file.ipynb)|Object detection and object counting in polygon zone: video file annotation|
|[object in zone counting video stream](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/object_in_zone_counting_video_stream.ipynb)|Object detection and object counting in polygon zone: streaming video processing|
|[tiled object detection](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/tiled_object_detection.ipynb)|How to do tiled object detection of a video stream from a video file. Each video frame is divided by tiles with some overlap, each tile of the AI model input size (to avoid resizing). Object detection is performed for each tile, then results from different tiles are combined.|
|[advanced tiling for object detection](https://colab.research.google.com/github/DeGirum/PySDKExamples/blob/main/examples/specialized/advanced_tiling_strategies.ipynb)|Demonstrates how to perform image tiling with various strategies that mitigate the partial/duplicate detections introduced by tiling.|

### Benchmarks

265 changes: 265 additions & 0 deletions examples/specialized/advanced_tiling_strategies.ipynb
@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Degirum banner](https://raw.githubusercontent.com/DeGirum/PySDKExamples/main/images/degirum_banner.png)\n",
"## Advanced Tiling Strategies\n",
"This notebook is an example of how to perform advanced tiling using degirum_tools. The advanced tiling \n",
"strategies are used to mitigate partial/duplicate/overlapping objects introduced by tiling an image for \n",
"object detection. Four different detection merging strategies are demonstrated.\n",
"\n",
"This script works with the following inference options:\n",
"\n",
"1. Run inference on DeGirum Cloud Platform;\n",
"2. Run inference on DeGirum AI Server deployed on a localhost or on some computer in your LAN or VPN;\n",
"3. Run inference on DeGirum ORCA accelerator directly installed on your computer.\n",
"\n",
"To try different options, you need to specify the appropriate `hw_location` option. \n",
"\n",
"You also need to specify your cloud API access token in `degirum_cloud_token`.\n",
"\n",
"You can change `image_source` to a URL or path to another image file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make sure degirum-tools package is installed\n",
"!pip show degirum-tools || pip install degirum-tools"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Overview of tiling strategies\n",
"degirum_tools comes with four tiling strategies. The base TileModel simply recombines all detections from the tiles and then performs NMS. The LocalGlobalTileModel runs inference on all tiles as well as the whole image and selects detections from the tiles or from the whole image based on a large-object threshold. The BoxFusionTileModel fuses split detections that occur on the edges of tiles based on a one-dimensional IoU threshold. The BoxFusionLocalGlobalTileModel combines the former two strategies. Below are validation mAP statistics on the VisDrone dataset using a yolov8s model trained on VisDrone with each strategy (3x2 tiles with 10% overlap).\n",
"\n",
"| Strategy | mAP50 | mAP50:95 Small | mAP50:95 Medium | mAP50:95 Large |\n",
"|-------------------------------|--------|----------------|------------------|----------------|\n",
"| No Tiling | 0.3206 | 0.0983 | 0.2918 | 0.3938 |\n",
"| TileModel (base) | 0.3825 | 0.1668 | 0.2906 | 0.2292 |\n",
"| LocalGlobalTileModel | 0.3970 | 0.1668 | 0.2974 | 0.3827 |\n",
"| BoxFusionTileModel | 0.3913 | 0.1719 | 0.2990 | 0.2320 |\n",
"| BoxFusionLocalGlobalTileModel | 0.4065 | 0.1719 | 0.3059 | 0.3867 |\n",
"\n",
"The base tiling strategy improves the mAP of small objects at the expense of large objects. Incorporating the LocalGlobal strategy recaptures most of the mAP lost on large objects due to tiling. The BoxFusion strategy gives modest gains in mAP across all object sizes, since relatively few detections occur on the edges/corners of tiles."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hw_location: where you want to run inference\n",
"# \"@cloud\" to use DeGirum cloud\n",
"# \"@local\" to run on local machine\n",
"# IP address for AI server inference\n",
"# image_source: image source for inference\n",
"#     URL or path to an image file\n",
"# model_name: name of the model for running AI inference\n",
"# zoo_name: URL or path of the model zoo\n",
"#     cloud zoo URL is valid for @cloud, @local, and AI server inference options\n",
"#     '': AI server serving models from a local folder\n",
"#     path to a JSON file: single model zoo in case of @local inference\n",
"# class_set: whitelist of classes to detect\n",
"hw_location = \"@cloud\"\n",
"zoo_name = \"https://hub.degirum.com/degirum/visdrone\"\n",
"model_name = 'yolov8s_relu6_visdrone--640x640_quant_n2x_orca1_1'\n",
"image_source = '../../images/ParkingLot.jpg'\n",
"class_set = {\"car\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The rest of the cells below should run without any modifications"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# imports and variables used in most cells\n",
"import degirum as dg\n",
"import degirum_tools as dgt\n",
"\n",
"from degirum_tools.tile_compound_models import TileExtractorPseudoModel, TileModel, LocalGlobalTileModel, BoxFusionTileModel, BoxFusionLocalGlobalTileModel\n",
"from degirum_tools import NmsBoxSelectionPolicy, NmsOptions\n",
"\n",
"# Base NMS options.\n",
"nms_options = NmsOptions(\n",
" threshold=0.6,\n",
" use_iou=True,\n",
" box_select=NmsBoxSelectionPolicy.MOST_PROBABLE,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## No tiling example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load model to be used for tiling\n",
"model = dg.load_model(model_name, hw_location, zoo_name, dgt.get_token(), image_backend='pil')\n",
"model.output_class_set = class_set # filter class outputs\n",
"\n",
"results = model(image_source)\n",
"results.image_overlay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following examples are all tiled with 3 columns, 2 rows, and a minimum 10% overlap between neighboring tiles."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Base TileModel example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tile_extractor = TileExtractorPseudoModel(cols=3,\n",
" rows=2, \n",
" overlap_percent=0.1, \n",
" model2=model,\n",
" global_tile=False)\n",
"tile_model = TileModel(model1=tile_extractor,\n",
" model2=model,\n",
" nms_options=nms_options)\n",
"results = tile_model(image_source)\n",
"results.image_overlay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## LocalGlobalTileModel example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tile_extractor = TileExtractorPseudoModel(cols=3,\n",
" rows=2, \n",
" overlap_percent=0.1, \n",
" model2=model,\n",
" global_tile=True)\n",
"tile_model = LocalGlobalTileModel(model1=tile_extractor,\n",
" model2=model,\n",
" large_object_threshold=0.01,\n",
" nms_options=nms_options)\n",
"results = tile_model(image_source)\n",
"results.image_overlay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## BoxFusionTileModel example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tile_extractor = TileExtractorPseudoModel(cols=3,\n",
" rows=2, \n",
" overlap_percent=0.1, \n",
" model2=model,\n",
" global_tile=False)\n",
"tile_model = BoxFusionTileModel(model1=tile_extractor,\n",
" model2=model,\n",
" edge_threshold=0.02,\n",
" fusion_threshold=0.8)\n",
"results = tile_model(image_source)\n",
"results.image_overlay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## BoxFusionLocalGlobalTileModel example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tile_extractor = TileExtractorPseudoModel(cols=3,\n",
" rows=2, \n",
" overlap_percent=0.1, \n",
" model2=model,\n",
" global_tile=True)\n",
"tile_model = BoxFusionLocalGlobalTileModel(model1=tile_extractor,\n",
" model2=model,\n",
" large_object_threshold=0.01, \n",
" edge_threshold=0.02,\n",
" fusion_threshold=0.8,\n",
" nms_options=nms_options)\n",
"results = tile_model(image_source)\n",
"results.image_overlay"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "dgenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
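To make the tile geometry used throughout the notebook concrete, here is a minimal sketch (not the degirum_tools implementation; the helper name and rounding are illustrative) of how a cols x rows grid with a minimum fractional overlap can be computed, mirroring the 3x2 / 10%-overlap layout above:

```python
# Sketch: compute tile bounding boxes for a cols x rows grid where
# neighboring tiles share at least `overlap` of a tile's extent per axis.
def tile_boxes(width, height, cols, rows, overlap=0.1):
    """Return (x0, y0, x1, y1) boxes covering the whole image."""
    def axis(size, n):
        if n == 1:
            return [(0, size)]
        # tile length such that n tiles with the given overlap span `size`
        tile = size / (n - (n - 1) * overlap)
        step = (size - tile) / (n - 1)
        return [(round(i * step), round(i * step + tile)) for i in range(n)]

    return [
        (x0, y0, x1, y1)
        for (y0, y1) in axis(height, rows)
        for (x0, x1) in axis(width, cols)
    ]

boxes = tile_boxes(1920, 1080, cols=3, rows=2, overlap=0.1)
print(len(boxes))  # 6 tiles
```

For a 1920x1080 frame this yields six tiles whose horizontal and vertical neighbors overlap by roughly 10% of a tile, which is what lets split objects appear (at least partially) in two adjacent tiles and be fused later.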
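The local/global selection rule described in the notebook overview can be sketched as follows. This is a hedged illustration of the idea only: the function name, detection dict layout, and area heuristic are assumptions, not the degirum_tools API.

```python
# Sketch of the "local/global" merge idea: keep large objects from the
# whole-image (global) pass and small objects from the tile (local)
# passes, split by a relative-area threshold.
def merge_local_global(local_dets, global_dets, image_area,
                       large_object_threshold=0.01):
    def rel_area(det):
        x0, y0, x1, y1 = det["bbox"]
        return (x1 - x0) * (y1 - y0) / image_area

    kept = [d for d in global_dets if rel_area(d) >= large_object_threshold]
    kept += [d for d in local_dets if rel_area(d) < large_object_threshold]
    return kept

dets_local = [{"bbox": (0, 0, 40, 40), "label": "car"}]       # small: kept
dets_global = [{"bbox": (0, 0, 400, 400), "label": "truck"}]  # large: kept
merged = merge_local_global(dets_local, dets_global, image_area=640 * 640)
print(len(merged))  # 2
```

In the real pipeline an NMS pass (with the `nms_options` shown in the notebook) would still run afterward to remove duplicates that survive the split.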
Binary file added images/ParkingLot.jpg
3 changes: 2 additions & 1 deletion images/image_credits.txt
@@ -4,4 +4,5 @@ LicensePlate.jpg: Cropped from Car.jpg
ThreePersons: https://pixabay.com/photos/kettlebell-fitness-crossfit-fit-3293481/
TwoCats.jpg: https://pixabay.com/photos/animal-cat-couple-curious-cute-21584/
LivingRoom.jpg: https://pixabay.com/photos/porch-fireplace-design-house-1967855/
FirePlace.jpg: https://pixabay.com/photos/fireplace-mantel-living-room-cozy-558985/
FirePlace.jpg: https://pixabay.com/photos/fireplace-mantel-living-room-cozy-558985/
ParkingLot.jpg: https://www.kaggle.com/datasets/braunge/aerial-view-car-detection-for-yolov5 (mydata/mydata/images/test/4 (47)_1650423582.jpg) (License: ODBL 1.0)
1 change: 1 addition & 0 deletions tests/test_notebooks.py
@@ -62,6 +62,7 @@
("specialized/object_in_zone_counting_video_stream.ipynb", "Masked.mp4", [3], []),
("specialized/object_in_zone_counting_video_file.ipynb", "TrafficHD_short.mp4", [3], []),
("specialized/tiled_object_detection.ipynb", "TrafficHD_short.mp4", [3,4], []),
("specialized/advanced_tiling_strategies.ipynb", None, [4, 5, 6, 7, 8], []),
("applications/person_count_video.ipynb", "Masked.mp4", [7], []),
("applications/stop_sign_violation_detection.ipynb", "Masked.mp4", [3], []),
("applications/person_age_gender_detection.ipynb", "Masked.mp4", {4:2}, []),