Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory and the University of Georgia have developed software that can turn any smartphone into an eye-tracking device. Eye tracking is frequently used in psychological experiments and marketing research to determine what catches people’s attention, but until now it has required pricey hardware, which has kept it from coming into widespread use.
Aside from making existing eye-tracking technology more accessible, the new software could also enable new kinds of eye-controlled computer interfaces, or tests that help detect neurological disease or mental illness.
“The field is kind of stuck in this chicken-and-egg loop,” says Aditya Khosla, an MIT graduate student in electrical engineering and computer science and co-author on the research paper. “Since few people have the external devices, there’s no big incentive to develop applications for them. Since there are no applications, there’s no incentive for people to buy the devices. We thought we should break this circle and try to make an eye tracker that works on a single mobile device, using just your front-facing camera.”
Khosla and his colleagues built their eye tracker using machine learning, a technique in which computers learn to perform tasks by searching for patterns in large sets of training examples.
One advantage Khosla and his colleagues had over previous studies was the amount of data at their disposal. The team’s training set includes examples of gaze patterns from 1,500 mobile-device users. Previously, the largest data sets used to train eye-tracking systems topped out at around 50 users.
To assemble data sets, “most other groups tend to call people into the lab,” Khosla says. “It’s really hard to scale that up. Calling 50 people in itself is already a fairly tedious process. But we realized we could do this through crowdsourcing.”
In their paper, the team describes an initial round of experiments using training data drawn from 800 mobile-device users. On that basis, they were able to get the system’s margin of error down to 1.5 centimeters, a twofold improvement over previous experimental systems. Since the experiments reported in the paper, they have acquired data on another 700 people, and the additional training data has reduced the margin of error to about a centimeter.
To get a sense of how larger training sets might improve performance, the researchers retrained their system several times using different-sized subsets of their data. Those experiments suggest that about 10,000 training examples should be enough to lower the margin of error to half a centimeter, which Khosla estimates would be accurate enough to make the mobile eye-tracking system commercially viable.
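As a back-of-the-envelope illustration (not the researchers’ own analysis), one can fit a simple power-law learning curve to the two error figures reported above: 1.5 centimeters at 800 users and about 1 centimeter at 1,500. Both the power-law form and the extrapolation beyond the measured range are assumptions of this sketch.

```python
import math

# Two (training-set size, error-in-cm) points reported in the article.
# Whether "size" counts users or images here is an assumption of this sketch.
n1, e1 = 800, 1.5
n2, e2 = 1500, 1.0

# Assume error ~ a * n**(-b) and solve for a and b from the two points.
b = math.log(e1 / e2) / math.log(n2 / n1)
a = e1 * n1**b

def predicted_error(n):
    """Extrapolated margin of error (cm) for a training set of size n."""
    return a * n**(-b)
```

Under this assumed scaling, a training set of around 10,000 would indeed push the predicted error below the half-centimeter mark, roughly consistent with the estimate above.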
To gather training examples, the researchers developed a mobile application for iOS devices. The app flashes a small dot somewhere on the device’s screen, attracting the user’s attention, then briefly replaces the dot with either an “R” or an “L,” instructing the user to tap either the right or the left side of the screen.
Correctly executing the tap confirms that the user has actually shifted his or her gaze to the dot’s location. Throughout this process, the device’s camera continuously captures images of the user’s face.
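The tap check described above amounts to a simple rule: a trial counts only if the tap lands on the side the letter named. The function below is a hypothetical reconstruction of that rule for illustration, not the app’s actual code.

```python
def tap_is_valid(letter, tap_x, screen_width):
    """Return True if the tap landed on the side the letter instructed.

    letter:       'L' or 'R', briefly shown in place of the dot.
    tap_x:        horizontal coordinate of the tap, in pixels.
    screen_width: width of the device screen, in pixels.
    """
    if letter == "L":
        return tap_x < screen_width / 2
    if letter == "R":
        return tap_x >= screen_width / 2
    raise ValueError("letter must be 'L' or 'R'")
```

Only face images captured during correctly answered trials would then be trusted as evidence that the user was really looking at the dot.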
The researchers recruited application users through Amazon’s Mechanical Turk crowdsourcing website and paid them a small fee for each successfully executed tap. The data set contains, on average, 1,600 images per user.
The machine-learning system used to analyze the data was a neural network: a large software network of very simple information processors arranged into layers. Training the system adjusts the settings of the individual processors so that a data item (in this case, an image from a mobile device’s camera) fed to the bottom layer is processed in turn by each subsequent layer, and the output of the top layer is the solution to a computational problem (here, an estimate of the direction of the user’s gaze).
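A gaze-estimating network of this layered kind can be sketched in a few lines. This toy model (random weights, tiny layers, and a plain fully connected architecture rather than the researchers’ actual network) only illustrates how an image fed to the bottom layer flows upward through simple processors to a two-number gaze estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer sizes: a flattened 16x16 grayscale face crop in,
# a 2-D gaze estimate (x, y on the screen) out.
sizes = [16 * 16, 64, 32, 2]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def estimate_gaze(image):
    """Pass a flattened image up through the layers of simple processors."""
    x = image
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)   # each hidden processor fires or not
    return x @ weights[-1] + biases[-1]  # top layer: the (x, y) gaze estimate
```

Training would then consist of nudging `weights` and `biases` until the top layer’s output matches where users were actually looking.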
To shrink their network, the MIT and Georgia researchers used a technique called “dark knowledge,” which involves taking the outputs of a fully trained network and using them, in combination with the true solutions, to train a much smaller network.
The technique reduced the size of the researchers’ network by roughly 80 percent, allowing it to run much more efficiently on a smartphone. The smaller network also lets the eye-tracking system operate at about 15 frames per second, fast enough to capture even brief glances.
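The core of the dark-knowledge idea is the training target given to the small network: a blend of the big network’s outputs with the true answers. Below is a minimal sketch of such a blended regression loss; the squared-error form and the mixing weight are assumptions of this illustration, not details from the paper.

```python
import numpy as np

def dark_knowledge_loss(student_pred, teacher_pred, true_gaze, alpha=0.5):
    """Blend of two squared errors: match the big network AND the real answer.

    student_pred: small network's gaze estimates, shape (batch, 2)
    teacher_pred: fully trained big network's estimates, shape (batch, 2)
    true_gaze:    ground-truth gaze locations, shape (batch, 2)
    alpha:        weight given to imitating the teacher (assumed value)
    """
    to_teacher = np.mean((student_pred - teacher_pred) ** 2)
    to_truth = np.mean((student_pred - true_gaze) ** 2)
    return alpha * to_teacher + (1 - alpha) * to_truth
```

Minimizing this loss pushes the small network both toward the correct answers and toward the large network’s behavior, which is what lets it retain most of the accuracy at a fraction of the size.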
“In lots of cases — if you want to do a user study, in computer vision, in marketing, in developing new user interfaces — eye tracking is something people have been very interested in, but it hasn’t really been accessible,” says Noah Snavely, an associate professor of computer science at Cornell University. “You need expensive equipment, or it has to be calibrated very well in order to work. So something that will work on a device everyone has, that seems very compelling. And from what I’ve seen, the accuracy they get seems like it’s in the ballpark that you can do something interesting.”
“Part of the excitement is that they’ve also created this way of collecting data, and also the data set itself,” Snavely adds. “They did all the legwork that will make other people interested in this problem. And the fact that the community will start working on this will lead to fast improvements.”
The researchers have described their new system in a paper that will be presented later this month at the Computer Vision and Pattern Recognition conference.