Why does NeuroVision work in the first place? Are humans really that predictable? What about cultural differences, or different types of visual material? In this primer, we take a look at exactly why NeuroVision's attention AI works.
Similarly scared humans
Consider the last time you watched a horror movie. The muscles in your body were tensing up; you felt more alert and afraid. If we had measured your bodily responses, we would have seen your pupils dilating, your pulse racing, and your digestive system slowing down. When a scary event happened, you would jump, perhaps even look away or scream.
But you would not be alone in responding like this. Indeed, your response would be shared by virtually anyone in the same situation. Your response feels uniquely yours, yet it is commonly human.
Visual artists have long known that certain characteristics of human vision are the same for all people. For example, the Fibonacci Spiral expresses an aesthetic principle long recognized by artists: composing paintings and photographs according to it creates a pleasing, balanced visual experience.
This is the principle behind NeuroVision: as humans, we share certain types of responses across cultures, genders, and age groups. Indeed, our visual systems are wired to look at very much the same aspects of a picture or video. That is why it is possible to build a predictive model of human visual attention.
But still, certain ingredients are needed. Let’s take them in turn.
Three routes to attention
First, we need to understand how attention works in the first place. Attention is not a single process but the result of three different processes: sometimes one dominates, at other times several converge or conflict. Three main types of processes lead your eyes to look at something:
- Visual saliency — visual elements can attract attention automatically through their physical properties alone. Contrast, angles, density, movement, and color composition can draw your eyes like a magnet. For example, if a pixel on your monitor suddenly starts blinking, you will notice right away. This is visual saliency.
- Emotional saliency — just as in the horror movie example, things that are relevant to us, for good or bad, are more likely to be noticed. Emotional saliency covers both positive and negative events: a Snickers bar when you are hungry, or something resembling a snake on a trekking trip, will both pull your eyes towards it.
- Controlled attention — when we concentrate on reading something, or look for something specific in a store, we are engaging in top-down attention. This type of attention is the hardest to model, since people have different reasons for looking for something.
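Of the three, visual saliency is the easiest to sketch in code. Below is a toy illustration of the contrast cue using a simple center-surround computation: a pixel is salient when its intensity deviates from its local neighborhood. The function name and parameters are our own inventions for illustration; this is not NeuroVision's actual model.

```python
import numpy as np

def saliency_map(image, surround=9):
    """Toy bottom-up saliency: local center-surround contrast.

    A pixel's saliency is how far its intensity deviates from the
    mean of its surrounding neighborhood. Illustrative sketch only.
    """
    img = image.astype(float)
    pad = surround // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    sal = np.empty_like(img)
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + surround, x:x + surround]
            sal[y, x] = abs(img[y, x] - patch.mean())
    # Normalize to [0, 1] so maps are comparable across images.
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else sal

# A flat gray image with one bright, out-of-place pixel:
img = np.full((20, 20), 0.5)
img[10, 10] = 1.0
sal = saliency_map(img)
# The odd pixel is the most salient location.
print(np.unravel_index(sal.argmax(), sal.shape))  # (10, 10)
```

Real saliency models add many more feature channels (color opponency, orientation, motion) at multiple scales, but the center-surround idea above is the common core.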
The 2-trillion datapoint benchmark
To ensure that a model is good at predicting visual attention, you need something to validate it against. At the frontiers of machine learning and AI research, one of the most agreed-upon facts is that you need good, valid, labeled data to train and validate your models.
Since 2013, we have collected eye-tracking data on consumer attention. As you know, NeuroVision is powered by Neurons, one of the world's leading consumer neuroscience companies. Even by a conservative estimate, our database contains more than 12,000 participants and over 2 trillion data points. To our knowledge, this is among the largest databases of its kind.
Second, our models are built directly on consumer attention data, not on generic eye-tracking data from academic lab studies. This means our algorithms are trained for the purposes they are intended to be used for.
Finally, as Neurons continues to run studies, new data are continually added to the database, and each round of retraining yields better prediction models.
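To make the validation idea concrete, here is a minimal sketch of hold-out validation: observed attention maps from eye tracking are held out, and a model's predicted maps are scored against them with heat map correlation. All data and names here are synthetic and hypothetical; this shows the general recipe, not Neurons' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-image observed attention maps (eye tracking)
# and a "model's" predictions, here simulated as noisy copies.
n_images, h, w = 50, 8, 8
observed = rng.random((n_images, h, w))
predicted = observed + 0.1 * rng.standard_normal((n_images, h, w))

# Hold out 20% of images for validation; a real pipeline trains on the rest.
idx = rng.permutation(n_images)
val_idx = idx[:n_images // 5]

def heatmap_correlation(a, b):
    """Pearson correlation between two flattened heat maps."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

scores = [heatmap_correlation(observed[i], predicted[i]) for i in val_idx]
print(f"mean validation correlation: {np.mean(scores):.2f}")
```

Correlation is only one of several standard attention-prediction metrics; published benchmarks also use measures such as AUC and normalized scanpath saliency, but the hold-out logic is the same.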
The attentional brain
What happens in the brain when we pay attention to something? First, a fixation needs to last a certain length of time before your mind starts responding; a few tens of milliseconds are enough. A brand seen for 50 milliseconds can still affect people's perception of a product, and a smiling face shown for 100 milliseconds will consciously give you the impression that someone is happy.
More often than not, we look at something for a few hundred milliseconds, which is more than enough for it to be processed consciously and to affect our actions. As seen above, there are at least three types of attention, all of which help make our attention more predictable. They even manifest as different processes in the brain, as shown below:
The upside of predictability
The fact that human behaviors are reliable across people — from attention to emotional responses — is key to the success of NeuroVision. It also makes sense that our most basic behaviors are shared: we have evolved to respond to the same environment, and this shared wiring stretches back far beyond the evolution of humans.
The upside, of course, is that reliable behaviors can be predicted. Where we look is for the most part highly predictable: if you test two groups of about 100 people each, there will be very few differences between their heat maps. In the same way, NeuroVision's predictions are as reliable as testing two groups of 100 people.
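The two-groups claim can be illustrated with a quick simulation: if two groups of 100 simulated viewers each draw their fixations from the same underlying attention distribution, the resulting group heat maps come out nearly identical. The grid size, fixation counts, and distribution below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# A shared attention distribution over a 16x16 grid,
# skewed so that a few hotspots dominate (as in real heat maps).
h, w, n_fix = 16, 16, 30
weights = rng.random(h * w) ** 4
weights /= weights.sum()

def group_heatmap(n_viewers):
    """Aggregate fixations from n_viewers into a normalized heat map."""
    counts = np.zeros(h * w)
    for _ in range(n_viewers):
        cells = rng.choice(h * w, size=n_fix, p=weights)
        np.add.at(counts, cells, 1)
    return counts / counts.sum()

# Two independent groups of 100 viewers each.
a, b = group_heatmap(100), group_heatmap(100)
r = np.corrcoef(a, b)[0, 1]
print(f"group-to-group heat map correlation: {r:.2f}")  # should be high
```

Because both groups sample the same distribution, their disagreement is pure sampling noise, which shrinks as group size grows; that is the sense in which a well-validated prediction can stand in for a second test group.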
For researchers and designers alike, human predictability is a huge upside.