Publications

Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing

Published in bioRxiv, 2024

Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.

Recommended citation: Saddler MR, McDermott JH (in press). "Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing." Nature Communications. https://doi.org/10.1101/2024.04.21.590435

Deep neural network models reveal interplay of peripheral coding and stimulus statistics in pitch perception

Published in Nature Communications, 2021

Perception is thought to be shaped by the environments for which organisms are optimized. These influences are difficult to test in biological organisms but may be revealed by machine perceptual systems optimized under different conditions. We investigated environmental and physiological influences on pitch perception, whose properties are commonly linked to peripheral neural coding limits. We first trained artificial neural networks to estimate fundamental frequency from biologically faithful cochlear representations of natural sounds. The best-performing networks replicated many characteristics of human pitch judgments. To probe the origins of these characteristics, we then optimized networks given altered cochleae or sound statistics. Human-like behavior emerged only when cochleae had high temporal fidelity and when models were optimized for naturalistic sounds. The results suggest pitch perception is critically shaped by the constraints of natural environments in addition to those of the cochlea, illustrating the use of artificial neural networks to reveal underpinnings of behavior.

Recommended citation: Saddler MR, Gonzalez R, McDermott JH (2021). "Deep neural network models reveal interplay of peripheral coding and stimulus statistics in pitch perception." Nature Communications. https://www.nature.com/articles/s41467-021-27366-6

Speech denoising with auditory models

Published in Proceedings of Interspeech, 2021

Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform. The development of high-performing neural network sound recognition systems has raised the possibility of using deep feature representations as ‘perceptual’ losses with which to train denoising systems. We explored their utility by first training deep neural networks to classify either spoken words or environmental sounds from audio. We then trained an audio transform to map noisy speech to an audio waveform that minimized the difference in the deep feature representations between the output audio and the corresponding clean audio. The resulting transforms removed noise substantially better than baseline methods trained to reconstruct clean waveforms, and also outperformed previous methods using deep feature losses. However, a similar benefit was obtained simply by using losses derived from the filter bank inputs to the deep networks. The results show that deep features can guide speech enhancement, but suggest that they do not yet outperform simple alternatives that do not involve learned features.

Recommended citation: Saddler MR, Francl A, Feather J, Qian K, Zhang Y, McDermott JH (2021). "Speech denoising with auditory models." Proc. Interspeech 2021, 2681-2685. https://arxiv.org/abs/2011.10706

Characterizing Chilean blue whale vocalizations with DTAGs: a test of using tag accelerometers for caller identification

Published in Journal of Experimental Biology, 2017

Vocal behavior of blue whales (Balaenoptera musculus) in the Gulf of Corcovado, Chile, was analysed using both audio and accelerometer data from digital acoustic recording tags (DTAGs). Over the course of three austral summers (2014, 2015 and 2016), seventeen tags were deployed, yielding 124h of data. We report the occurrence of Southeast Pacific type 2 (SEP2) calls, which exhibit peak frequencies, durations and timing consistent with previous recordings made using towed and moored hydrophones. We also describe tonal downswept (D) calls, which have not been previously described for this population. As being able to accurately assign vocalizations to individual whales is fundamental for studying communication and for estimating population densities from call rates, we further examine the feasibility of using high-resolution DTAG accelerometers to identify low-frequency calls produced by tagged blue whales. We cross-correlated acoustic signals with simultaneous tri-axial accelerometer readings in order to analyse the phase match as well as the amplitude of accelerometer signals associated with low-frequency calls, which provides a quantitative method of determining if a call is associated with a detectable acceleration signal. Our results suggest that vocalizations from nearby individuals are also capable of registering accelerometer signals in the tagged whale’s DTAG record. We cross-correlate acceleration vectors between calls to explore the possibility of using signature acceleration patterns associated with sounds produced within the tagged whale as a new method of identifying which accelerometer-detectable calls originate from the tagged animal.

Recommended citation: Saddler MR, Bocconcelli A, Hickmott L, Chiang G, Landea-Briones R, Bahamonde P, Howes G, Segre P, Sayigh L (2017). "Characterizing Chilean blue whale vocalizations with DTAGs: a test of using tag accelerometers for caller identification." Journal of Experimental Biology 220, 4119-4129. https://journals.biologists.com/jeb/article/220/22/4119/18884

DTAG studies of blue whales (Balaenoptera musculus) in the Gulf of Corcovado

Published in Proceedings of Meetings on Acoustics, 2016

This investigation set out to obtain data on the ecology, foraging and acoustic behavior of Chilean blue whales (Balaenoptera musculus) in the Gulf of Corcovado, which is an important feeding ground. We deployed 17 suction cup attached sound and orientation recording tags (DTAGs) on blue whales in 2014-16, for a total duration of 124h 08 min. Acoustic data on the tags revealed a variety of different calls. These included SEP2 (Southeast Pacific) song, previously described in this area, as well as other call types not previously described for Chilean blue whales. Downsweep calls similar to those described for other blue whale populations were observed on several tags, as were various other less stereotyped calls. We are currently working on characterizing these call types and also on using the accelerometers to identify calls from the tagged animal. Tag data will prove useful for interpretation of data collected in this area from passive acoustic monitors (PAM), both for species identification and possibly also density estimation. Overall, this work has the potential to greatly increase knowledge of the biology, ecology and behavior of blue whales in the Gulf of Corcovado.

Recommended citation: Bocconcelli A, Hickmott L, Chiang G, Bahamonde P, Howes G, Landea-Briones R, Caruso F, Saddler MR, Sayigh L (2016). "DTAG studies of blue whales (Balaenoptera musculus) in the Gulf of Corcovado." Proceedings of Meetings on Acoustics 27, 040002. https://asa.scitation.org/doi/10.1121/2.0000269

Mark Saddler