computer-vision

Here is a quick video overview of a computer vision task I have been working on. It includes object detection, image segmentation, and monocular depth estimation.

The idea started when a lecturer asked us to conceptualise and research an application that combines a language model with a computer vision model. After a little reading, I was shocked to learn that roughly 300 million people have moderate to severe vision impairment and 36 million are completely blind. That led me to ask: what vision models are available to build situational understanding?

  • Video Output Link comp_vis1

comp_vis

Depth Estimation with "Intel/dpt-hybrid-midas"

comp_vis2
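
For anyone who wants to try the depth step, here is a minimal sketch of how it could be run. It assumes the Hugging Face `transformers` depth-estimation pipeline with `torch` and `Pillow` installed, and it is not necessarily the code used for the video. The names `normalize_depth` and `estimate_depth` are hypothetical helpers made up for this example.

```python
def normalize_depth(depth_rows):
    """Scale a 2-D list of relative depth values to integers in 0..255."""
    flat = [value for row in depth_rows for value in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a completely flat map
    return [[round(255 * (value - lo) / span) for value in row] for row in depth_rows]


def estimate_depth(image_path):
    """Run the depth model on one image (assumes transformers, torch, Pillow)."""
    # Imported lazily so normalize_depth stays usable without these packages.
    from PIL import Image
    from transformers import pipeline

    depth_pipe = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")
    result = depth_pipe(Image.open(image_path))
    # result["predicted_depth"] is a tensor; result["depth"] is a PIL image.
    raw = result["predicted_depth"].squeeze().tolist()
    return normalize_depth(raw)
```

The values are relative rather than metric, so the normalised map is mainly useful for visualisation or for ranking regions by distance within a single frame.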

Object Detection with Ultralytics YOLOv8-Nano

comp_vis3
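
A rough sketch of the detection step, assuming the `ultralytics` package and its pretrained `yolov8n.pt` weights (downloaded on first use). The helpers `detect_objects` and `count_labels` and the 0.5 confidence threshold are assumptions for illustration, not the exact code behind the video.

```python
from collections import Counter


def count_labels(detections, min_conf=0.5):
    """Count detections per label, keeping only those at or above min_conf.

    detections: list of (label, confidence, (x1, y1, x2, y2)) tuples.
    """
    return dict(Counter(label for label, conf, _ in detections if conf >= min_conf))


def detect_objects(image_path):
    """Run YOLOv8-Nano on one image (assumes the ultralytics package)."""
    from ultralytics import YOLO  # lazy import: count_labels needs no extra packages

    model = YOLO("yolov8n.pt")  # nano weights
    result = model(image_path)[0]  # one Results object per input image
    detections = []
    for box in result.boxes:
        label = result.names[int(box.cls[0])]
        detections.append((label, float(box.conf[0]), tuple(box.xyxy[0].tolist())))
    return detections
```

A per-label count like this is one simple way to turn raw boxes into something a language model could describe, for example "two people, one chair".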

Image Segmentation with "nvidia/segformer-b0-finetuned-ade-512-512"

comp_vis4
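
A minimal sketch of the segmentation step, assuming the Hugging Face `transformers` image-segmentation pipeline. The helpers `mask_coverage` and `segment_coverage` are hypothetical names for this example, and the code assumes each returned mask is a single-channel image where non-zero pixels belong to the segment.

```python
def mask_coverage(pixels):
    """Return the fraction of non-zero values in a flat sequence of mask pixels."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if p) / len(pixels)


def segment_coverage(image_path):
    """Map each predicted label to the share of the image it covers.

    Assumes transformers, torch and Pillow are installed.
    """
    from transformers import pipeline  # lazy import: mask_coverage has no dependencies

    seg_pipe = pipeline(
        "image-segmentation",
        model="nvidia/segformer-b0-finetuned-ade-512-512",
    )
    # Each segment is a dict with a "label" and a PIL "mask" image.
    # Iterating over the mask's raw bytes yields one integer per pixel
    # for a single-channel 8-bit image (the assumption stated above).
    return {
        segment["label"]: mask_coverage(segment["mask"].tobytes())
        for segment in seg_pipe(image_path)
    }
```

Coverage per label gives a quick summary of the scene layout, such as how much of the frame is floor, wall, or road.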