Computer Vision


Definition

Computer vision is a field of study concerned with enabling computers to interpret the content of digital images, such as photographs and video frames.

Applications

  • Healthcare - Computer vision helps healthcare professionals classify conditions and illnesses more accurately, which may save patients’ lives by reducing inaccurate diagnoses and incorrect treatment.
  • Agriculture - Some farmers are starting to adopt computer vision technologies to improve their growing methods, increase yields, and eventually increase profit.
  • Banking - Image recognition applications use machine learning to classify and authenticate documents such as passports, ID cards, driver’s licenses, and checks, and to extract data from them.
  • Industrial - In this sector, computer vision is used to monitor the status of critical infrastructure and to identify new applications that might improve productivity.

Challenges

  • Image classification - Assigning a label to a whole image is hard for a machine because all it sees is a grid of numbers (pixel values), not objects.
  • Object detection - This is about recognizing the objects in an image and drawing a bounding box around each one. A widely used method is Faster R-CNN (Faster Region-based Convolutional Neural Network). It uses a Region Proposal Network, which proposes the regions of the image that are likely to contain objects, so that only those regions need to be processed and classified.
  • Image segmentation - This means dividing an image into regions based on the objects present, with accurate boundaries. There are two types of image segmentation. The first is semantic segmentation, in which each pixel is labelled with an object class. Thus, all objects that belong to the same class (e.g. a group of people or a few cars) are coloured the same. The second is instance segmentation, which labels every object separately, meaning that every person or car in a picture gets a distinct colour. A well-known technique for instance segmentation is Mask R-CNN, which adds a small convolutional branch that predicts a mask for each detected object on top of the Faster R-CNN technique described above.
  • Image captioning - This involves generating the most appropriate caption for an image. In other words, it combines object detection (which can be carried out by the same Faster R-CNN method) with caption generation. The latter is done with a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network, which is an advanced variant of the RNN.
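
The difference between semantic segmentation, instance segmentation, and bounding boxes can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the 6x6 grid is hand-made data, the function names `label_instances` and `bounding_boxes` are hypothetical, and a simple flood fill over connected pixels stands in for a learned model. It is not how Faster R-CNN or Mask R-CNN work internally; those networks learn to predict boxes and masks from data.

```python
# Toy 6x6 "image" already labelled per pixel (a semantic mask):
# 0 = background, 1 = pixel belonging to the class "car".
# Hypothetical, hand-made data for illustration only.
SEMANTIC = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]


def label_instances(semantic):
    """Give each 4-connected blob of same-class pixels its own id (1, 2, ...).

    Assumption: separate objects do not touch. A real instance
    segmentation model does not rely on this.
    """
    rows, cols = len(semantic), len(semantic[0])
    instances = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if semantic[r][c] == 0 or instances[r][c] != 0:
                continue  # background, or already assigned to an instance
            next_id += 1
            instances[r][c] = next_id
            stack = [(r, c)]
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and semantic[ny][nx] == semantic[r][c]
                            and instances[ny][nx] == 0):
                        instances[ny][nx] = next_id
                        stack.append((ny, nx))
    return instances


def bounding_boxes(instances):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} in pixel coordinates."""
    boxes = {}
    for y, row in enumerate(instances):
        for x, inst in enumerate(row):
            if inst == 0:
                continue
            if inst not in boxes:
                boxes[inst] = (x, y, x, y)
            else:
                x0, y0, x1, y1 = boxes[inst]
                boxes[inst] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


instances = label_instances(SEMANTIC)  # instance mask: each car gets its own id
boxes = bounding_boxes(instances)      # detection-style box for each car

for row in instances:
    print(row)
print(boxes)
```

In the semantic mask both cars share the value 1, which corresponds to "coloured the same". In the instance mask each car has its own id, and the bounding boxes are the kind of output an object detector would report for the same scene.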