Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists

Hao-Ting Wang; Jonathan Smallwood; Janaina Mourao-Miranda; Cedric Huchuan Xia; Theodore D Satterthwaite; Danielle S Bassett; Danilo Bzdok

doi:10.1016/j.neuroimage.2020.116745

Neuroimage. 2020 Aug 1:216:116745. doi: 10.1016/j.neuroimage.2020.116745. Epub 2020 Apr 8.

Authors

Hao-Ting Wang¹, Jonathan Smallwood², Janaina Mourao-Miranda³, Cedric Huchuan Xia⁴, Theodore D Satterthwaite⁴, Danielle S Bassett⁵, Danilo Bzdok⁶

Affiliations

¹ Department of Psychology, University of York, Heslington, York, United Kingdom; Sackler Center for Consciousness Science, University of Sussex, Brighton, United Kingdom. Electronic address: H.Wang@bsms.ac.uk.
² Department of Psychology, University of York, Heslington, York, United Kingdom.
³ Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom.
⁴ Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
⁵ Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Physics & Astronomy, School of Arts & Sciences, University of Pennsylvania, Philadelphia, PA, 19104, USA.
⁶ Department of Psychiatry, Psychotherapy and Psychosomatics, RWTH Aachen University, Germany; JARA-BRAIN, Jülich-Aachen Research Alliance, Germany; Parietal Team, INRIA, Neurospin, Bat 145, CEA Saclay, 91191, Gif-sur-Yvette, France; Department of Biomedical Engineering, Montreal Neurological Institute, Faculty of Medicine, McGill University, Montreal, Canada; Mila - Quebec Artificial Intelligence Institute, Canada. Electronic address: danilo.bzdok@mcgill.ca.

PMID: 32278095

Abstract

The 21st century marks the emergence of "big data" with a rapid increase in the availability of datasets with multiple measurements. In neuroscience, brain-imaging datasets are increasingly accompanied by dozens or hundreds of phenotypic subject descriptors on the behavioral, neural, and genomic level. The complexity of such "big data" repositories offers new opportunities and poses new challenges for systems neuroscience. Canonical correlation analysis (CCA) is a prototypical family of methods that is useful in identifying the links between variable sets from different modalities. Importantly, CCA is well suited to describing relationships across multiple sets of data, such as in recently available big biomedical datasets. Our primer discusses the rationale, promises, and pitfalls of CCA.
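
To make concrete what "identifying the links between variable sets" can mean, the following is a minimal sketch of classical two-set CCA in NumPy. It is not the authors' code. The function name `first_canonical_pair`, the synthetic "brain" and "behavior" matrices, and the noise levels are all illustrative assumptions, and the sketch assumes more samples than variables with full-column-rank data.

```python
import numpy as np

def first_canonical_pair(X, Y):
    """Return (rho, a, b): the first canonical correlation and weight vectors.

    X is (n_samples, p), Y is (n_samples, q). Assumes n_samples > p, q and
    that both centered matrices have full column rank (illustrative only).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    # Map the leading singular vectors back to weights on the original variables.
    a = np.linalg.solve(Rx, U[:, 0])
    b = np.linalg.solve(Ry, Vt[0, :])
    return s[0], a, b

# Synthetic example (hypothetical data): one latent factor drives both sets.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal(n)
X = np.outer(z, [1.0, 0.5, 0.0]) + 0.5 * rng.standard_normal((n, 3))  # "brain" variables
Y = np.outer(z, [1.0, -1.0]) + 0.5 * rng.standard_normal((n, 2))      # "behavior" variables

rho, a, b = first_canonical_pair(X, Y)
u = (X - X.mean(axis=0)) @ a  # canonical variate for X
v = (Y - Y.mean(axis=0)) @ b  # canonical variate for Y
```

Here `u` and `v` are the pair of weighted sums, one per variable set, that are maximally correlated with each other, and `rho` is that correlation. In high-dimensional settings such as neuroimaging, where variables can outnumber subjects, this plain formulation breaks down and regularized or dimension-reduced variants are typically needed.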

Keywords: Big data; Data science; Deep phenotyping; Machine learning; Modality fusion.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Big Data*
Humans
Machine Learning*
Models, Statistical*
Neuroimaging / methods*
Neurosciences / methods*
