Here is a quick video overview of a computer vision task I have been working on. It includes object detection, image segmentation, and monocular depth estimation.
The idea started when a lecturer asked us to conceptualise and research an application that combines a language model with a computer vision model. After a little reading, I was shocked to learn that roughly 300 million people live with moderate to severe vision impairment and 36 million are completely blind. I asked myself: what vision models are available to build situational understanding?
Depth Estimation with "Intel/dpt-hybrid-midas"
Object Detection with Ultralytics YOLOv8-Nano
Image Segmentation with "nvidia/segformer-b0-finetuned-ade-512-512"
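
As a rough sketch of how these pieces might fit together, the snippet below loads the three models and then ranks detected objects by how close they appear in the depth map. This is an illustration under stated assumptions, not necessarily how the project itself is wired up: `load_models` and `rank_by_proximity` are hypothetical helper names, `yolov8n.pt` is assumed to be the YOLOv8-Nano checkpoint in use, and the `larger_is_closer` flag assumes the depth model returns MiDaS-style relative inverse depth, where larger values mean nearer surfaces.

```python
from statistics import median


def load_models():
    """Load the three models listed above (weights download on first call)."""
    # Imported lazily so rank_by_proximity works without these packages installed.
    from transformers import pipeline
    from ultralytics import YOLO

    depth = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")
    segmenter = pipeline(
        "image-segmentation",
        model="nvidia/segformer-b0-finetuned-ade-512-512",
    )
    detector = YOLO("yolov8n.pt")  # assumed checkpoint name for YOLOv8-Nano
    return depth, segmenter, detector


def rank_by_proximity(detections, depth_map, larger_is_closer=True):
    """Score each box by its median relative depth and sort nearest first.

    detections: list of (label, (x1, y1, x2, y2)) in pixel coordinates.
    depth_map: 2D list of floats indexed as depth_map[row][col].
    larger_is_closer: assumption that bigger values mean nearer objects.
    """
    height, width = len(depth_map), len(depth_map[0])
    scored = []
    for label, (x1, y1, x2, y2) in detections:
        # Clamp the box to the image and keep it at least one pixel wide and tall.
        x1 = min(max(0, int(x1)), width - 1)
        y1 = min(max(0, int(y1)), height - 1)
        x2 = min(width, max(int(x2), x1 + 1))
        y2 = min(height, max(int(y2), y1 + 1))
        values = [depth_map[r][c] for r in range(y1, y2) for c in range(x1, x2)]
        # The median is less sensitive than the mean to background pixels in the box.
        scored.append((label, median(values)))
    return sorted(scored, key=lambda item: item[1], reverse=larger_is_closer)
```

A ranked list like this could then be turned into a short text summary of the scene for a language model. The scores are relative, so they order objects by proximity but do not give distances in metres.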