Seeing the forest for the trees

08 May 2010

1

Object recognition is one of the core topics in computer vision research: After all, a computer that can see isn't much use if it has no idea what it's looking at. Researchers at MIT, working with colleagues at the University of California, Los Angeles, have developed new techniques that should make object recognition systems much easier to build and should enable them use computer memory more efficiently.

 
Graphic: Christine Daniloff

A conventional object recognition system, when trying to discern a particular type of object in a digital image, will generally begin by looking for the object's salient features. A system built to recognise faces, for instance, might look for things resembling eyes, noses and mouths and then determine whether they have the right spatial relationships with each other.

The design of such systems, however, usually requires human intuition: A programmer decides which parts of the objects are the right ones to key in on. That means that for each new object added to the system's repertoire, the programmer has to start from scratch, determining which of the object's parts are the most important.

It also means that a system designed to recognise millions of different types of objects would become unmanageably large. Each object would have its own, unique set of three or four parts, but the parts would look different from different perspectives, and cataloguing all those perspectives would take an enormous amount of computer memory.

In a paper to be presented at the Institute of Electrical and Electronics Engineers' Conference on Computer Vision and Pattern Recognition in June, postdoc Long (Leo) Zhu and professors Bill Freeman and Antonio Torralba, all of MIT's Computer Science and Artificial Intelligence Laboratory, and Yuanhao Chen and Alan Yuille of UCLA describe an approach that solves both of these problems at once.

Like most object-recognition systems, their system learns to recognise new objects by being ''trained'' with digital images of labelled objects. But it doesn't need to know in advance which of the objects' features it should look for. For each labeled object, it first identifies the smallest features it can - often just short line segments.

Then it looks for instances in which these low-level features are connected to each other, forming slightly more sophisticated shapes. Then it looks for instances in which these more sophisticated shapes are connected to each other, and so on, until it's assembled a hierarchical catalogue of increasingly complex parts whose top layer is a model of the whole object.

Business History Videos

History of hovercraft Part 3 | Industry study | Business History

History of hovercraft Part 3...

Today I shall talk a bit more about the military plans for ...

By Kiron Kasbekar | Presenter: Kiron Kasbekar

History of hovercraft Part 2 | Industry study | Business History

History of hovercraft Part 2...

In this episode of our history of hovercraft, we shall exam...

By Kiron Kasbekar | Presenter: Kiron Kasbekar

History of Hovercraft Part 1 | Industry study | Business History

History of Hovercraft Part 1...

If you’ve been a James Bond movie fan, you may recall seein...

By Kiron Kasbekar | Presenter: Kiron Kasbekar

History of Trams in India | Industry study | Business History

History of Trams in India | ...

The video I am presenting to you is based on a script writt...

By Aniket Gupta | Presenter: Sheetal Gaikwad

view more
View details about the software product Informachine News Trackers