Annotating Images with Object Detection API

Computer vision has made great leaps forward in recent years thanks to deep learning, which has given computers a much better grasp of visual scenes. The potential of deep learning in vision tasks is great: allowing a computer to visually perceive and understand its surroundings opens the door to new artificial intelligence applications. In mobility, for instance, a self-driving car can use its onboard camera to tell whether an obstacle is a pedestrian, an animal, or another vehicle, and then decide the correct course of action. In everyday human-machine interaction, a robot can perceive the objects around it and interact with them successfully.

After presenting ConvNets and how they operate in the first chapter, we now intend to create a quick, easy project that will help you use a computer to understand images taken with cameras and mobile phones, working with images collected from the Internet or captured directly from your computer's webcam. The goal of the project is to find the exact location and the type of each object in an image.

To achieve this classification and localization, we will leverage the new TensorFlow object detection API, a Google project that is part of the larger TensorFlow models project. It makes a series of pre-trained neural networks available off the shelf, ready for you to wrap in your own custom applications.
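To make the goal of "location and type" concrete, here is a minimal sketch of how a single detection could be represented and mapped back onto an image. The `Detection` class and the `to_pixels` helper are hypothetical names introduced only for illustration; they are not part of the TensorFlow object detection API. The sketch assumes that boxes are given as normalized `(ymin, xmin, ymax, xmax)` coordinates between 0 and 1, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """Hypothetical container for one detected object (illustration only)."""
    label: str    # the type of the object, for example "person"
    score: float  # the confidence of the detector, between 0 and 1
    # Assumed layout: (ymin, xmin, ymax, xmax), each normalized to [0, 1]
    box: Tuple[float, float, float, float]


def to_pixels(det: Detection, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box into (left, top, right, bottom) pixel coordinates."""
    ymin, xmin, ymax, xmax = det.box
    return (
        int(round(xmin * width)),
        int(round(ymin * height)),
        int(round(xmax * width)),
        int(round(ymax * height)),
    )


# Made-up example: a person occupying the right half of a 640x480 frame
detection = Detection(label="person", score=0.92, box=(0.25, 0.5, 0.75, 1.0))
pixel_box = to_pixels(detection, width=640, height=480)
print(detection.label, detection.score, pixel_box)
```

Keeping coordinates normalized until the last moment means the same detection can be drawn on a resized copy of the image without rerunning the detector.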

In this chapter, we are going to illustrate the following:

The advantages of using the right data for your project
A brief presentation of the TensorFlow object detection API
How to annotate stored images for further use
How to visually annotate a video using moviepy
How to go real-time by annotating images from a webcam