{ "cells": [ { "cell_type": "markdown", "id": "33adcdb3", "metadata": {}, "source": [ "# Video Detection Demo with PytorchWildlife" ] }, { "cell_type": "markdown", "id": "c534c504", "metadata": {}, "source": [ "This tutorial shows how to use PytorchWildlife for video detection and classification. We will walk through setting up the environment, defining the detection and classification models, performing inference, and saving the results as an annotated video.\n", "\n", "## Prerequisites\n", "Install PytorchWildlife by running the following commands:\n", "```bash\n", "conda create -n pytorch_wildlife python=3.8 -y\n", "conda activate pytorch_wildlife\n", "pip install PytorchWildlife\n", "```\n", "Also, make sure you have a CUDA-capable GPU if you intend to run the models on a GPU; this notebook can also run on CPU.\n", "\n", "## Importing Libraries\n", "First, let's import the necessary libraries and modules." ] }, { "cell_type": "code", "execution_count": null, "id": "a28c392c", "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "import numpy as np\n", "import supervision as sv\n", "import torch\n", "from PytorchWildlife.models import detection as pw_detection\n", "from PytorchWildlife.models import classification as pw_classification\n", "from PytorchWildlife.data import transforms as pw_trans\n", "from PytorchWildlife import utils as pw_utils" ] }, { "cell_type": "markdown", "id": "a1d72019", "metadata": {}, "source": [ "## Setting GPU\n", "If you are using a GPU for this exercise, specify which GPU to use for the computations. By default, GPU number 0 is used; adjust this to match your setup. You can skip this cell if you are running on CPU."
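, "\n", "\n", "As a minimal sketch (this snippet is an addition to the original demo, assuming only that `torch` is installed), you can also select the device defensively so the same code runs on CPU-only machines:\n", "```python\n", "import torch\n", "\n", "# Illustrative snippet (not part of the original demo):\n", "# fall back to CPU when CUDA is unavailable\n", "DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'\n", "if DEVICE == 'cuda':\n", "    torch.cuda.set_device(0)  # GPU index 0; adjust to your setup\n", "```"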
] }, { "cell_type": "code", "execution_count": 2, "id": "24b2cf06", "metadata": {}, "outputs": [], "source": [ "torch.cuda.set_device(0)  # Use only if you are running on GPU" ] }, { "cell_type": "markdown", "id": "802747c2", "metadata": {}, "source": [ "## Model Initialization\n", "We'll define the device on which to run the models, then initialize the models for both video detection and classification." ] }, { "cell_type": "code", "execution_count": 8, "id": "dd069110", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fusing layers... \n", "Fusing layers... \n", "Model summary: 733 layers, 140054656 parameters, 0 gradients, 208.8 GFLOPs\n", "Model summary: 733 layers, 140054656 parameters, 0 gradients, 208.8 GFLOPs\n" ] } ], "source": [ "DEVICE = \"cuda\"  # Use \"cuda\" if you are running on GPU. Use \"cpu\" if you are running on CPU\n", "SOURCE_VIDEO_PATH = \"./demo_data/videos/opossum_example.MP4\"\n", "TARGET_VIDEO_PATH = \"./demo_data/videos/opossum_example_processed.MP4\"\n", "detection_model = pw_detection.MegaDetectorV5(device=DEVICE, pretrained=True)\n", "classification_model = pw_classification.AI4GOpossum(device=DEVICE, pretrained=True)" ] }, { "cell_type": "markdown", "id": "fa4913d8", "metadata": {}, "source": [ "## Transformations\n", "Define transformations for both detection and classification. These transformations preprocess the video frames for the models." ] }, { "cell_type": "code", "execution_count": 4, "id": "cc6377ee", "metadata": {}, "outputs": [], "source": [ "trans_det = pw_trans.MegaDetector_v5_Transform(target_size=detection_model.IMAGE_SIZE,\n", "                                               stride=detection_model.STRIDE)\n", "trans_clf = pw_trans.Classification_Inference_Transform(target_size=224)" ] }, { "cell_type": "markdown", "id": "3cd6262a", "metadata": {}, "source": [ "## Video Processing\n", "For each frame in the video, we'll apply detection and classification, and then annotate the frame with the results. The processed video will be saved with annotated detections and classifications." ] }, { "cell_type": "code", "execution_count": null, "id": "e6147a40", "metadata": {}, "outputs": [], "source": [ "box_annotator = sv.BoxAnnotator(thickness=4, text_thickness=4, text_scale=2)\n", "\n", "def callback(frame: np.ndarray, index: int) -> np.ndarray:\n", "    # Run the detector on the transformed frame\n", "    results_det = detection_model.single_image_detection(trans_det(frame), frame.shape, index)\n", "    labels = []\n", "    # Crop each detected region and classify it\n", "    for xyxy in results_det[\"detections\"].xyxy:\n", "        cropped_image = sv.crop_image(image=frame, xyxy=xyxy)\n", "        results_clf = classification_model.single_image_classification(trans_clf(Image.fromarray(cropped_image)))\n", "        labels.append(\"{} {:.2f}\".format(results_clf[\"prediction\"], results_clf[\"confidence\"]))\n", "    # Draw boxes and classification labels on the frame\n", "    annotated_frame = box_annotator.annotate(scene=frame, detections=results_det[\"detections\"], labels=labels)\n", "    return annotated_frame\n", "\n", "pw_utils.process_video(source_path=SOURCE_VIDEO_PATH, target_path=TARGET_VIDEO_PATH, callback=callback, target_fps=5)" ] }, { "cell_type": "markdown", "id": "8e270f0f", "metadata": {}, "source": [ "### Copyright (c) Microsoft Corporation. All rights reserved.\n", "### Licensed under the MIT License." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.18" } }, "nbformat": 4, "nbformat_minor": 5 }