Abstract
Diverse language models (LMs), including large language models (LLMs) based on deep neural networks, provide an unprecedented opportunity for mapping out the semantic spaces navigated in speech and their distortions in mental disorders. Recent evidence has pointed to higher mean semantic similarities between words in psychosis, conceptualized as a ‘shrunk’ (more compressed) semantic space. We hypothesized that the high-dimensional vector spaces defined by the LM embeddings of speech samples would also be easier to reduce in psychosis. To test this, we used principal component analysis (PCA) to calculate several metrics serving as proxies for reducibility, including the number of components needed to reach 90% of the variance and the cumulative variance explained by the first two components. For further exploration, intrinsic dimensionality (ID) was also estimated. Results confirmed significantly higher reducibility of the semantic space in psychosis across all measures and three languages. This result points to the existence of an underlying intrinsic geometry of semantic associations during speech, which may underlie more surface-level measurements such as semantic similarity, and illustrates a new foundational approach to speech in mental disorders.
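The two PCA-based reducibility proxies named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `pca_reducibility`, the input layout (one row per embedded speech unit, one column per embedding dimension), and the use of a plain SVD on mean-centered data are all assumptions made for illustration.

```python
import numpy as np


def pca_reducibility(embeddings, variance_threshold=0.90):
    """Hypothetical helper: PCA-based reducibility proxies for one speech sample.

    embeddings: array-like of shape (n_units, n_dims), assumed to hold one
    embedding vector per word or sentence of the sample.
    """
    X = np.asarray(embeddings, dtype=float)
    # PCA operates on mean-centered data.
    X = X - X.mean(axis=0)

    # Squared singular values are proportional to the variance
    # captured by each principal component.
    singular_values = np.linalg.svd(X, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0:
        raise ValueError("embeddings have zero variance")

    cumulative = np.cumsum(variances / total)

    # Smallest number of components whose cumulative variance
    # reaches the threshold (capped to guard against rounding).
    index = int(np.searchsorted(cumulative, variance_threshold))
    n_components = min(index + 1, len(cumulative))

    # Cumulative variance explained by the first two components.
    first_two = float(cumulative[min(1, len(cumulative) - 1)])

    return {
        "n_components_at_threshold": n_components,
        "cumulative_variance_first_two": first_two,
    }
```

Under this sketch, a more reducible (more compressed) embedding space would show a smaller `n_components_at_threshold` and a larger `cumulative_variance_first_two`. Intrinsic dimensionality estimation is not covered here, since the abstract does not specify the estimator used.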
Competing Interest Statement
LP reports personal fees for serving as chief editor from the Canadian Medical Association Journals, speaker/consultant fee from Janssen Canada and Otsuka Canada, SPMM Course Limited, UK, Canadian Psychiatric Association; book royalties from Oxford University Press; investigator-initiated educational grants from Janssen Canada, Sunovion and Otsuka Canada outside the submitted work. TK received unrestricted educational grants from Servier, Janssen, Recordati, Aristo, Otsuka, neuraxpharm. All other authors declare no conflict of interest and report no biomedical financial interests.
Funding Statement
This study was supported by the German Research Foundation to Frederike Stein (STE3301/1-1) project number 527712970. The study is part of the German multicenter consortium Neurobiology of Affective Disorders. A translational perspective on brain structure and function, funded by the German Research Foundation project number 240413749 (Research Unit FOR2107) to Tilo Kircher (KI 588/14-1, KI 588/14-2, KI 588/22-1). This work was also in part supported by the CRC/TRR 393 consortium project number 521379614 to Frederike Stein and Tilo Kircher and by the DYNAMIC initiative, which is funded by the LOEWE program of the Hessian Ministry of Science and Arts (grant number: LOEWE1/16/519/03/09.001(0009)/98). We are deeply indebted to all study participants and staff. A list of acknowledgments can be found here: www.for2107.de/acknowledgements. LP acknowledges research support from the Canada First Research Excellence Fund, awarded to the Healthy Brains, Healthy Lives initiative at McGill University (through New Investigator Supplement to LP); Monique H. Bourgeois Chair in Developmental Disorders and Graham Boeckh Foundation (Douglas Research Centre, McGill University) and salary award from the Fonds de recherche du Québec-Santé (FRQS). The data acquisition for part of the study was funded by CIHR Foundation Grant (FDN 154296) to LP and supported by the Canada First Excellence Research Fund to BrainSCAN, Western University (Imaging Core); Innovation fund for Academic Medical Organization of Southwest Ontario; Bucke Family Fund, The Chrysalis Foundation and The Arcangelo Rea Family Foundation (London, Ontario). Compute Canada Resources (Application No. 1530) were used in the storage and transfers of imaging data.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Spanish sample: Comité de Ética de la Investigación con Medicamentos, internal code 2021.119. English sample: The Research Ethics Board at Western University approved all study procedures and subjects were provided with informed consent prior to participating. German sample: Ethical permissions for the data collection and sharing of the data for the present study were obtained from the Ethik-Kommission des Fachbereichs Humanmedizin der Philipps-Universität Marburg (AZ 07:14).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data used for the present analyses are not at present available publicly, but access within the stipulations laid down by the respective ethics committees can be requested from the local PIs: FS (German), RA (Spanish), LP (English).





