Currently, only 40% of people who could benefit from hearing aids (HAs) have them, and most people who own HAs do not use them often enough. Visible HAs carry a social stigma ('fear of looking old'); they demand considerable conscious effort to concentrate on different sounds and speakers; and they make only limited use of speech enhancement, that is, making spoken words (often the aspect of hearing that matters most to people) easier to distinguish. It is not enough just to make everything louder!
To transform hearing care by 2050, we aim to completely rethink the way HAs are designed. Our transformative approach draws, for the first time, on the cognitive principles of normal hearing. Listeners naturally combine information from both their ears and their eyes: we use our eyes to help us hear. We will create "multi-modal" aids that not only amplify sounds but also contextually use information collected simultaneously from a range of sensors to improve speech intelligibility. For example, much of the information about what a person is saying is conveyed visually: by the movements of the speaker's lips, by hand gestures, and by similar cues. Current commercial HAs ignore this information, which could instead be fed into the speech enhancement process. We can also use wearable sensors embedded within the HA itself to estimate listening effort and its impact on the wearer, and use this estimate to tell whether the speech enhancement process is actually helping.
Creating these multi-modal "audio-visual" HAs raises many formidable technical challenges that need to be tackled holistically. Making use of lip movements traditionally requires a video camera filming the speaker, which raises privacy concerns. We can address some of these concerns by encrypting the data as soon as it is collected, and we will pioneer new approaches for processing and understanding the video data while it remains encrypted. We aim never to access the raw video data, while still using it as a useful source of information. To complement this, we will also investigate methods for remote lip reading without a video feed, instead exploring the use of radio signals for remote monitoring.