The Science of Okaya
A platform where facial analysis and biometrics are combined with machine learning to transform well-being.
Mark Twain famously said that “there are lies, damned lies, and statistics”. With the rise of AI and machine learning, it’s easy to forget that when an algorithm returns a prediction, that’s just what it is: a prediction, with a certain degree of confidence.
The real danger lies in considering only the main result and entirely disregarding the other possibilities.
At Okaya we take this to heart. Mental health is a very complex problem and cannot be treated as an absolute, black-or-white proposition.
When working on our algorithm, we are most interested in the outliers: the things that do not work, the data that do not fit the pattern. We feel this is the surest way to be proactive about finding bias and understanding the limitations of our algorithm (and therefore where our future efforts should go).
Can Data Be Trusted?
Before reviewing any other element of the algorithm, we always start by looking at the quality of the data used to feed it. Data quality is critical to the overall quality and reliability of the algorithm.
This is why our primary source is our own internal research. It is the most reliable data set because we decide on the mix we collect, who we collect it from, and when it is collected. We can also offer participants full anonymity and privacy guarantees, which is often the tipping point to collecting clean data.
The next layer of research data we consider credible is data submitted during research and IRB studies. We give this data (which we help design, collect, and manage) considerable weight, but we know that some outliers and external scenarios will not always be covered in these samples.
Finally, there is always the option to include data purchased from external vendors or collected from certain sites. We do not use these resources, because there are simply too many questions that the data brokers cannot answer precisely.
How Effective Is Computer Vision?
Computer vision has progressed tremendously in the last few years, and we are still at the infancy of what can be done in the field. To give you an idea of how precise the field is becoming, there are now ways to capture someone’s heart rate using a camera similar to the one you’d find in a Microsoft Kinect device, simply by looking at the subtle changes in skin color as blood flows through it!
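As a rough illustration of that technique (often called remote photoplethysmography), here is a minimal sketch assuming you already have a stack of cropped face frames. The frame rate, the green-channel choice, and the SciPy filtering parameters are illustrative assumptions, not a description of our production pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_heart_rate(face_frames, fps=30.0):
    """Rough heart-rate estimate from a stack of cropped face frames.

    face_frames: array of shape (n_frames, height, width, 3), RGB.
    Returns an estimated heart rate in beats per minute.
    """
    # 1. Average the green channel over the face region for each frame.
    #    Blood absorbs green light the most, so this channel carries the
    #    strongest pulse signal.
    green = face_frames[:, :, :, 1].mean(axis=(1, 2))

    # 2. Band-pass filter to the plausible heart-rate range (0.7-4 Hz,
    #    roughly 42-240 beats per minute) to suppress lighting drift and
    #    high-frequency noise.
    nyquist = fps / 2.0
    b, a = butter(3, [0.7 / nyquist, 4.0 / nyquist], btype="band")
    pulse = filtfilt(b, a, green - green.mean())

    # 3. The dominant frequency of the filtered signal is the pulse.
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(pulse))
    return freqs[np.argmax(spectrum)] * 60.0
```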
Yet just because the field is progressing does not mean it is perfect. For example, when the pandemic started, many algorithms could not deal with the fact that people were wearing masks.
The speed of a blink
So we pay particular attention to making sure we “see” our subjects properly. The best check we have found is making sure we correctly detect people’s blinks.
Blinks are very relevant for a few reasons:
From a pure science perspective, countless studies, especially in transportation, have shown the link between blinking rate and fatigue. As an aside, these studies are almost always done in clinical environments where the subject’s face is properly lit, the person does not move much, and so on; in other words, conditions that rarely occur in real life!
But blinks are by nature fast. Really fast. A fraction of a second. So we know that if the algorithm correctly counts the number of blinks a subject just made, then the rest of the facial landmarks we are tracking are very likely accurate as well.
To validate this, we manually label our research samples with the number of blinks the subject makes over a given period of time, then compare that count to what our algorithm calculates and measure its accuracy.
We pay particular attention to samples where our algorithm is off by more than two blinks (a small sketch of this check appears below). What we’ve found so far is that these “failed scenarios” share one or more of the following conditions:
Sometimes the subject’s head moves a lot, and if the camera is not recent enough it does not capture this movement well. As a result, accuracy drops quite a bit because the computer vision can confuse a true blink with a head movement that resembles one.
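As a rough illustration of the evaluation step mentioned above, here is a minimal sketch that compares manually labeled blink counts with the algorithm’s output and flags the samples that are off by more than two blinks. The sample structure and helper name are hypothetical; only the two-blink tolerance comes from our process.

```python
def evaluate_blink_counts(samples, tolerance=2):
    """Compare labeled blink counts against predicted counts.

    samples: list of dicts like {"id": ..., "labeled": int, "predicted": int}
    Returns the overall accuracy and the samples that need a closer look.
    """
    flagged = []
    within_tolerance = 0
    for sample in samples:
        error = abs(sample["predicted"] - sample["labeled"])
        if error > tolerance:
            # These are the "failed scenarios" worth reviewing by hand
            # (head movement, camera quality, unusual blink shapes, ...).
            flagged.append({**sample, "error": error})
        else:
            within_tolerance += 1
    accuracy = within_tolerance / len(samples) if samples else 0.0
    return accuracy, flagged

# Hypothetical usage with made-up labels and predictions:
samples = [
    {"id": "clip-001", "labeled": 14, "predicted": 13},
    {"id": "clip-002", "labeled": 9, "predicted": 15},
]
accuracy, flagged = evaluate_blink_counts(samples)
```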
The Impact Of Hardware Quality
At the other end of the camera quality spectrum, we also see an interesting case when receiving data from high-quality devices: we have identified several instances where the subject performs a blink within a blink. These are barely visible to the human eye in real time. Think of the eye-closure signal as having somewhat of a “W” shape.
These types of blinks can really fool not just the computer vision system but also the algorithm.
The trick in these instances is to properly differentiate between a “W” blink and a regular double blink, and to count them accordingly.
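One hedged way to draw that line is sketched below: look at the eye-openness signal (for example, the eye aspect ratio derived from facial landmarks). In a “W” blink the eye never fully reopens between the two dips, while in a regular double blink it does. The thresholds and function names here are illustrative assumptions, not our actual parameters.

```python
def classify_blink_event(eye_openness, closed_thresh=0.2, open_thresh=0.3):
    """Distinguish a "W" blink from a regular double blink.

    eye_openness: per-frame eye-aspect-ratio values covering one event.
    Returns "single_blink", "double_blink", or "w_blink".
    """
    # Group consecutive "closed" frames into dips.
    dips, current = [], []
    for i, value in enumerate(eye_openness):
        if value < closed_thresh:
            current.append(i)
        elif current:
            dips.append(current)
            current = []
    if current:
        dips.append(current)

    if len(dips) <= 1:
        return "single_blink"

    # If the eye fully reopened between the first and last dip, these were
    # two separate blinks; if it only partially reopened, it is a "W" blink.
    between = eye_openness[dips[0][-1]:dips[-1][0] + 1]
    fully_reopened = max(between) >= open_thresh
    return "double_blink" if fully_reopened else "w_blink"
```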
Distance and Light
Distance is a factor when it comes to quality and accuracy. We’ve found that accuracy is best when the subject is between 30 and 150 centimeters from the camera. In that range the results are accurate regardless of other elements (movement, etc.). Beyond that range, quality drops too much.
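For illustration only, here is a minimal sketch of how such a range check could be approximated from the video itself, using the pinhole-camera relation and an assumed average inter-pupillary distance. The focal length, the 6.3 cm constant, and the helper names are assumptions; only the 30-150 cm band comes from our findings.

```python
AVG_INTERPUPILLARY_CM = 6.3  # population average; an assumption, not a per-subject value

def estimate_distance_cm(eye_gap_px, focal_length_px):
    """Estimate subject-to-camera distance from the pixel gap between the pupils."""
    # Pinhole camera model: real_size / distance = pixel_size / focal_length.
    return AVG_INTERPUPILLARY_CM * focal_length_px / eye_gap_px

def within_working_range(eye_gap_px, focal_length_px, near_cm=30.0, far_cm=150.0):
    """Flag frames where the subject sits inside the 30-150 cm band."""
    distance = estimate_distance_cm(eye_gap_px, focal_length_px)
    return near_cm <= distance <= far_cm
```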
Lighting plays a more limited role than we expected in our original assumptions. Obviously, if a video is shot entirely in the dark it is not going to work! But from under-exposure to over-exposure, computer vision performs quite well across a wide range, and this without modifying the incoming video at all.
What Are The Limitations of NLP (voice analysis)?
What a person says, and how they say it, is important of course. But before including this information in the algorithm we also check for a few things.
Determining if the subject is speaking
Sometimes people stay silent even when they are supposed to say something. This, in and of itself, is a tell-tale sign. So step one is making sure we identify whether the person is really speaking. We do this by using labeled data and comparing our predictions with the labels, and we are accurate 97% of the time. That accuracy holds even in scenarios where a TV is playing in the background or a second person is speaking.
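To make the first step concrete, here is a minimal energy-based voice-activity baseline, a sketch under assumed sample rate, frame size, and threshold values. A simple energy gate like this cannot tell the subject apart from a TV or a second speaker in the background; that harder part requires a trained model, which is why we rely on labeled data to measure accuracy.

```python
import numpy as np

def frame_energy_vad(audio, sample_rate=16000, frame_ms=30, energy_thresh=0.01):
    """Very rough voice-activity baseline.

    audio: 1-D numpy array of samples in [-1, 1].
    Returns True if enough frames look voiced to call the clip "speaking".
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    voiced = 0
    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        # Root-mean-square energy of the frame; loud frames count as voiced.
        if np.sqrt(np.mean(frame ** 2)) > energy_thresh:
            voiced += 1
    return n_frames > 0 and voiced / n_frames > 0.2
```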
The other 3% can be attributed to what we call the ventriloquist effect: if someone can masterfully “lip-sync”, it is really difficult for an algorithm to detect the fake.
Native vs. Non-native speakers
Now that we know they are speaking, are the subjects understandable? This question is quite tricky. To understand the scope of the challenge, take a language like English. Start with native speakers: someone from the United States sounds distinctly different from someone from Wales or Australia, and certain accents (Scottish, for example!) can throw a listener off. Once you add non-native speakers to the mix, the variations become even more diverse. Voice AI is no different. Will it eventually be close to perfect? Most likely, but for now it is not.
In our analysis we see a drop in comprehension quality of about 10% between native and non-native speakers. There is also some variation between heavy native accents and more neutral ones.
Understanding “understandability” is very important because it is at the root of sentiment analysis, or any computation really. Algorithms have a tendency to default to classifying someone as “neutral” when they do not understand the speaker, thus creating bias and inaccurate results.
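One way to guard against that failure mode, sketched below, is to refuse to emit a sentiment label at all when transcription confidence is low, rather than silently falling back to “neutral”. The confidence threshold, function names, and return values are illustrative assumptions.

```python
def guarded_sentiment(transcript, transcript_confidence, sentiment_model, min_confidence=0.8):
    """Only run sentiment analysis when the speech was understood well enough.

    Returning an explicit "insufficient signal" marker keeps poorly understood
    speakers (for example heavy accents) from being mislabeled as "neutral".
    """
    if transcript_confidence < min_confidence:
        return {"sentiment": None, "reason": "low transcription confidence"}
    return {"sentiment": sentiment_model(transcript), "reason": "ok"}
```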
Are There Privacy and Security Implications?
One known limitation of our algorithm at this time is that we do not perform any facial or voice recognition of the subject. This means that if we are expecting a check-in from Pat but Jill does it instead, we will not pick that up and will analyze Jill’s results.
Can this be a problem? Yes and no. It really depends on the use case you are trying to address. Given the scenarios our technology is used for, the trade-off between privacy and accuracy would not benefit our customers, and it would also open the door to many personal monitoring options we do not want to tackle at the moment.
Do Algorithms Think Like Humans?
We humans are nuanced, ever-changing organisms, and these constant variations are expressed in our health. Yet when an assessment is done, it reflects a snapshot of where we are and often discounts any variation.
More problematic still, algorithms often return only a black-or-white result as to whether someone falls under a given condition.
This is not just a computational issue but a classification issue that has existed in health care for the longest time. For example, in 1917 the U.S. Public Health Service printed a list of over 60 health conditions – from anemia to varicose veins – that doctors could spot during the brief line inspection. How brief? Six seconds! That was the amount of time doctors spent with people as they were being processed by immigration at Ellis Island.
In our algorithm we aim to give a more nuanced view of where the subject is. Specifically, this means always considering the trade-off between loss and accuracy, and properly identifying outliers and markers.
Because we deal with mental health, and because of the extreme consequences of mental health struggles, we deliberately err on the conservative side when assessing someone’s condition.
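As an illustration of what “erring on the conservative side” can mean computationally, the sketch below lowers the decision threshold for flagging a possible concern, accepting more false positives in exchange for fewer misses. The 0.35 threshold and the label names are assumptions for illustration, not our actual values.

```python
def conservative_flag(risk_score, flag_threshold=0.35):
    """Flag a check-in for human follow-up.

    A symmetric classifier would flag only above 0.5; lowering the threshold
    trades extra false positives for fewer missed cases, which is the safer
    error to make in a mental-health setting.
    """
    return "needs_follow_up" if risk_score >= flag_threshold else "no_flag"
```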
We encourage you to bookmark this page as it constantly evolves based on our latest research and findings.