Study participants
All aspects of the study were carried out in strict accordance with, and were approved by, the Massachusetts General Brigham Institutional Review Board. Right-handed native English speakers undergoing awake microelectrode recording-guided deep brain stimulator implantation were screened for enrolment. Clinical consideration for surgery was made by a multidisciplinary team of neurosurgeons, neurologists and neuropsychologists. Operative planning was performed independently by the surgical team and without consideration of study participation. Participants were enrolled only if: (1) the surgical plan was for awake microelectrode recording-guided placement, (2) the patient was at least 18 years of age, (3) they had intact language function with English fluency and (4) they were able to provide informed consent for study participation. Participation in the study was voluntary and all participants were informed that they were free to withdraw from the study at any time.
Acute intraoperative single-neuronal recordings
Single-neuronal prefrontal recordings using Neuropixels probes
As part of deep brain stimulator implantation at our institution, participants are often awake and microelectrode recordings are used to optimize anatomical targeting of the deep brain structures46. During these cases, the electrodes often traverse part of the posterior language-dominant prefrontal cortex3,4,5,6, an area previously shown to be involved in word planning3,4,5,6,7,8,9,10,11,12 and sentence construction13,14,15,16 and which imaging studies show connects broadly with premotor areas involved in articulation51,52,53 and lexical processing17,18,19 (Extended Data Fig. 1a,b). All microelectrode entry points and placements were based purely on planned clinical targeting and were made independently of any study consideration.
Sterile Neuropixels probes (v.1.0-S, IMEC, ethylene oxide sterilized by BioSeal54) together with a 3B2 IMEC headstage were attached to a cannula and a manipulator connected to a ROSA ONE Brain (Zimmer Biomet) robotic arm. The probes were inserted into the cortical ribbon under direct robot navigational guidance through the implanted burr hole (Fig. 1a). The probes (width 70 µm; length 10 mm; thickness 100 µm) consisted of a total of 960 contact sites (384 preselected recording channels) laid out in a chequerboard pattern with approximately 25 µm centre-to-centre nearest-neighbour site spacing. The IMEC headstage was connected through a multiplexed cable to a PXIe acquisition module card (IMEC) installed in a PXIe chassis (PXIe-1071, National Instruments). Neuropixels recordings were performed using SpikeGLX (v.20201103 and v.20221012-phase30; http://billkarsh.github.io/SpikeGLX/) or OpenEphys (v.0.5.3.1 and v.0.6.0; https://open-ephys.org/) on a computer connected to the PXIe acquisition module, recording the action potential band (AP; band-pass filtered from 0.3 to 10 kHz, sampled at 30 kHz) and the local-field potential band (LFP; band-pass filtered from 0.5 to 500 Hz, sampled at 2,500 Hz). Once putative units were identified, the Neuropixels probe was briefly held in position to confirm signal stability (we did not screen putative neurons for speech responsiveness). A further description of this recording approach can be found in refs. 54,55. After single-neuronal recordings from the cortex were completed, the Neuropixels probe was removed and subcortical neuronal recordings and deep brain stimulator placement proceeded as planned.
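For readers wishing to reproduce the band separation offline, the following is a minimal sketch assuming the raw wideband voltage is available as a NumPy array; the filter design here is our own illustration, not part of the SpikeGLX/OpenEphys acquisition chain, which applies these bands during acquisition.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_AP = 30_000  # AP band sampling rate (Hz)

def extract_bands(wideband, fs=FS_AP):
    """Split a wideband trace into AP and LFP bands (offline sketch).

    `wideband` is a 1D array of voltages sampled at `fs`. The band edges
    mirror those in the text: AP 0.3-10 kHz, LFP 0.5-500 Hz.
    """
    sos_ap = butter(4, [300, 10_000], btype='bandpass', fs=fs, output='sos')
    sos_lfp = butter(4, [0.5, 500], btype='bandpass', fs=fs, output='sos')
    ap = sosfiltfilt(sos_ap, wideband)
    lfp = sosfiltfilt(sos_lfp, wideband)[::12]  # decimate 30 kHz -> 2,500 Hz
    return ap, lfp
```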
Single-unit isolation
Single-neuronal recordings were performed in two main steps. First, to track the activities of putative neurons at high spatiotemporal resolution and to account for intraoperative cortical motion, we used the Decentralized Registration of Electrophysiology Data (DREDge) software (https://github.com/evarol/DREDge) together with an interpolation approach (https://github.com/williamunoz/InterpolationAfterDREDge). Briefly, and as previously described54,55,56, an automated protocol was used to track LFP voltages using a decentralized correlation technique that re-aligned the recording channels in relation to brain movements (Fig. 1a, right). Following this step, we interpolated the AP band continuous voltage data using the DREDge motion estimate, allowing the activities of the putative neurons to be tracked stably over time. Next, single units were isolated from the motion-corrected, interpolated signal using Kilosort (v.1.0; https://github.com/cortex-lab/KiloSort) followed by Phy for cluster curation (v.2.0a1; https://github.com/cortex-lab/phy; Extended Data Fig. 1c,d). Units were selected on the basis of their waveform morphologies and separability in principal component space, their interspike interval profiles and the similarity of waveforms across contacts. Only well-isolated single units with mean firing rates ≥0.1 Hz were included. These recordings yielded 16–115 units per participant.
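As an illustration of the inclusion criterion only (curation itself was performed in Phy), a minimal sketch of the firing rate filter, assuming spike times per curated cluster are available in seconds:

```python
import numpy as np

def filter_units(spike_times_by_cluster, recording_duration_s, min_rate_hz=0.1):
    """Keep clusters whose mean firing rate meets the 0.1 Hz inclusion threshold.

    `spike_times_by_cluster` maps cluster id -> 1D array of spike times (s).
    """
    kept = {}
    for cluster_id, spikes in spike_times_by_cluster.items():
        rate = len(spikes) / recording_duration_s
        if rate >= min_rate_hz:
            kept[cluster_id] = np.asarray(spikes)
    return kept
```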
Audio recordings and task synchronization
For task synchronization, we used the TTL output and audio output to send the synchronization trigger through the SMA input to the IMEC PXIe acquisition module card. To provide additional synchronization, triggers were also recorded on a separate breakout analogue and digital input/output board (BNC2110, National Instruments) connected through a PXIe board (PXIe-6341 module, National Instruments).
Audio recordings were obtained at a 44 kHz sampling frequency using a TASCAM DR-40X four-channel/four-track portable audio recorder and USB interface with adjustable microphones. These recordings were sent to an analogue input of a NIDAQ board in the same PXIe acquisition module containing the IMEC PXIe board, allowing high-fidelity temporal alignment with the neuronal data. Synchronization of neuronal activity with behavioural events was performed through TTL triggers sent through a parallel port to both the IMEC PXIe board (the sync channel) and the analogue NIDAQ input, as well as through the parallel audio input into the analogue input channels of the NIDAQ board.
Audio recordings were annotated in a semi-automated fashion (Audacity; v.2.3). Recorded audio for each word and sentence produced by the participants was analysed in Praat75 and Audacity (v.2.3). Exact word and phoneme onsets and offsets were identified using the Montreal Forced Aligner (v.2.2; https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner)76 and confirmed by manual review of all annotated recordings. Together, these measures allowed for the millisecond-level alignment of neuronal activity with each produced word and phoneme.
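A minimal sketch of this alignment step (our illustration, not the study's released code), assuming word onsets annotated on the audio clock and a common TTL trigger visible on both the audio and neural recordings:

```python
import numpy as np

def words_to_neural_clock(word_onsets_audio_s, ttl_audio_s, ttl_neural_s):
    """Map word onset times from the audio clock to the neural clock.

    The offset between the shared TTL trigger's timestamps on the two
    recordings aligns the clocks (clock drift is assumed negligible here).
    """
    offset = ttl_neural_s - ttl_audio_s
    return np.asarray(word_onsets_audio_s) + offset

def window_spike_counts(spike_times_s, word_onsets_s, window=(-0.5, 0.0)):
    """Count spikes per word in a window around onset (here -500 to 0 ms)."""
    spikes = np.asarray(spike_times_s)
    return np.array([
        np.sum((spikes >= t + window[0]) & (spikes < t + window[1]))
        for t in word_onsets_s
    ])
```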
Anatomical localization of recordings
Pre-operative high-resolution magnetic resonance imaging and postoperative head computerized tomography scans were coregistered using a combination of ROSA software (Zimmer Biomet; v.3.1.6.276), Mango (v.4.1; https://mangoviewer.com/download.html) and FreeSurfer (v.7.4.1; https://surfer.nmr.mgh.harvard.edu/fswiki/DownloadAndInstall) to reconstruct the cortical surface and identify the cortical location from which the Neuropixels recordings were obtained77,78,79,80,81. This registration allowed localization of the surgical areas that underlay the cortical recording sites (Fig. 1a and Extended Data Fig. 1a)54,55,56. An MNI transformation of these coordinates was then carried out to register the locations in MNI space using the Fieldtrip toolbox (v.20230602; https://www.fieldtriptoolbox.org/; Extended Data Fig. 1b)82.
For depth calculation, we estimated the pial boundary of the recordings according to the sharp change in signal observed between channels implanted in the brain parenchyma and those outside the brain. We then referenced the depth of each single unit (based on its maximum waveform amplitude channel) to this estimated pial boundary. All units were then classified by their relative depth from the pial boundary as superficial, middle or deep (Extended Data Fig. 7).
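A sketch of this referencing step, assuming per-channel positions along the probe are known; the three-way split by depth tertiles is our assumption for illustration, as the exact boundaries are not stated here.

```python
import numpy as np

def relative_unit_depths(unit_peak_channels, channel_depths_um, pial_channel):
    """Express unit depths relative to the estimated pial boundary.

    `channel_depths_um` gives each channel's position along the probe (µm);
    `pial_channel` is the channel at the estimated pial surface.
    """
    depths = channel_depths_um[unit_peak_channels] - channel_depths_um[pial_channel]
    edges = np.quantile(depths, [1 / 3, 2 / 3])   # tertiles: our assumption
    labels = np.digitize(depths, edges)           # 0 superficial, 1 middle, 2 deep
    return depths, labels
```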
Speech production task
The participants performed a priming-based naturalistic speech production task57 in which they were shown a scene on a screen depicting a scenario that had to be described in a specific order and format. For example, a participant might be shown a scene of a boy and a girl playing with a balloon, or a scene of a dog chasing a cat. Together, these scenes required the participants to produce words that varied in phonetic, syllabic and morphosyntactic content. The scenes were also highlighted in a way that required the words to be produced in a structured format. For example, a scene could be highlighted so as to require the sentence “The mouse was being chased by the cat” or, alternatively, the sentence “The cat was chasing the mouse” (Extended Data Fig. 2a). Because the sentences had to be constructed de novo, the participants produced the words without being given explicit phonetic cues (for example, from hearing and then repeating the word ‘cat’). Taken together, this task therefore allowed neuronal activity to be examined as words (for example, ‘cat’), rather than independent phonetic sounds (for example, /k/), were articulated, and as the words were produced during natural speech (for example, constructing the sentence “the dog chased the cat”) rather than simply repeated (for example, hearing and then repeating the word ‘cat’).
Finally, to account for the potential contribution of sensory–perceptual responses, three of the participants also performed a ‘perception’ control in which they listened to words spoken to them. One of these participants further performed an auditory ‘playback’ control in which they listened to their own recorded voice. For this control, all words spoken by the participant were recorded using a high-fidelity microphone (Zoom ZUM-2 USM microphone) and then played back to them on a word-by-word level in randomized separate blocks.
Constructing a word feature space
Phonemes
To allow for single-neuronal analysis and to provide a compositional representation of each word, we grouped the constituent phonemes on the basis of the relative positions of the articulatory organs associated with their production60. For our primary analyses, we selected the places of articulation for consonants (for example, bilabial consonants) on the basis of established IPA categories defining the primary articulators involved in speech production. For consonants, phonemes were grouped by place of articulation into glottal, velar, palatal, postalveolar, alveolar, dental, labiodental and bilabial. For vowels, we grouped phonemes on the basis of the relative height of the tongue, with high vowels being produced with the tongue in a relatively high position and mid-low (that is, mid+low) vowels being produced with it in a lower position. This grouping of phonemes is broadly referred to as ‘places of articulation’, together reflecting the main positions of the articulatory organs and their combinations used to produce the words58,59. Finally, to allow for comparison and to test generalizability, we examined the manners of articulation for consonants (stop, fricative, affricate, nasal, liquid and glide), which describe the nature of airflow restriction by various parts of the mouth and tongue. For vowels, we also evaluated the primary cardinal vowels i, e, ɛ, a, α, ɔ, o and u, which are described, in combination, by the position of the tongue relative to the roof of the mouth, how far forward or back it lies and the relative positions of the lips83,84. A detailed summary of these phonetic groupings can be found in Extended Data Table 1.
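To make the grouping concrete, a minimal sketch of such a mapping; the phoneme inventory shown here is a small illustrative subset in ARPAbet-style symbols, not the full assignment, which is listed in Extended Data Table 1.

```python
# Illustrative subset of the place-of-articulation grouping; the full
# assignments are given in Extended Data Table 1.
PLACE_OF_ARTICULATION = {
    'b': 'bilabial', 'p': 'bilabial', 'm': 'bilabial',
    'f': 'labiodental', 'v': 'labiodental',
    'th': 'dental', 'dh': 'dental',
    't': 'alveolar', 'd': 'alveolar', 's': 'alveolar', 'n': 'alveolar', 'l': 'alveolar',
    'sh': 'postalveolar', 'ch': 'postalveolar',
    'y': 'palatal',
    'k': 'velar', 'g': 'velar',
    'h': 'glottal',
    'iy': 'high', 'ih': 'high', 'uw': 'high',
    'aa': 'mid-low', 'ae': 'mid-low', 'ah': 'mid-low', 'ay': 'mid-low',
}
```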
Phoneme feature space
To further evaluate the relationship between neuronal activity and the presence of specific constituent phonemes per word, the phonemes in each word were parsed according to their precise pronunciation provided by the English Lexicon Project (or the Longman Pronunciation Dictionary for American English where necessary) as described previously85. Thus, for example, the word ‘like’ (l-aɪ-k) would be parsed into a sequence of alveolar-mid-low-velar phonemes, whereas the word ‘bike’ (b-aɪ-k) would be parsed into a sequence of bilabial-mid-low-velar phonemes.
These constituent phonemes were then used to represent each word as a ten-dimensional vector in which the value in each position reflected the presence of each type of phoneme (Fig. 1c). For example, the word ‘like’, containing a sequence of alveolar-mid-low-velar phonemes, was represented by the vector [0 0 0 1 0 0 1 0 0 1], with each entry representing the number of the respective type of phoneme in the word. Together, such vectors representing all words defined a phonetic ‘vector space’. Analyses evaluating the precise arrangement of phonemes per word, together with the goodness-of-fit and selectivity metrics used to evaluate single-neuronal responses to these phonemes and their specific combinations in words, are described further below.
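A sketch of the vector construction, using the mapping sketched above; the fixed ordering of the ten phoneme groups is our own choice for illustration.

```python
import numpy as np

# Fixed ordering of the ten phoneme groups (the specific order is our choice).
GROUPS = ['glottal', 'velar', 'palatal', 'postalveolar', 'alveolar',
          'dental', 'labiodental', 'bilabial', 'high', 'mid-low']

def word_to_phoneme_vector(phonemes):
    """Count the phoneme groups in a word, e.g. ['l', 'ay', 'k'] for 'like'."""
    vec = np.zeros(len(GROUPS), dtype=int)
    for ph in phonemes:
        vec[GROUPS.index(PLACE_OF_ARTICULATION[ph])] += 1
    return vec

# word_to_phoneme_vector(['l', 'ay', 'k']) marks one alveolar, one mid-low
# and one velar phoneme, giving the word's position in the 'vector space'.
```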
Syllabic feature space
Next, to evaluate the relationship between neuronal activity and the specific arrangement of phonemes in syllables, we parsed the constituent syllables of each word using the American pronunciations provided in ref. 85. Thus, for example, ‘back’ would be defined as a bilabial-mid-low-velar sequence. Here, to allow for neuronal analysis and to limit the combination of all possible syllables, we selected the ten most common syllable types. High and mid-low vowels were considered as syllables here only if they constituted syllables in themselves and were unbound from a consonant (for example, /ih/ in ‘hesitate’ or /ah-/ in ‘adore’). Similar to the construction of the phoneme space, the syllables were then transformed into an n-dimensional binary vector in which the value in each dimension reflected the presence of a specific syllable. Thus, for the n-dimensional representation of each word in this syllabic feature space, the value in each dimension could also be interpreted in relation to neuronal activity.
Morphemes
To account for the functional distinction between phonemes and morphemes62,63, we also parsed words into those that contained bound morphemes, which were either prefixed (for example, ‘re–’) or suffixed (for example, ‘–ed’). Unlike phonemes, morphemes such as ‘–ed’ in ‘directed’ or ‘re–’ in ‘retry’ are the smallest linguistic units capable of carrying meaning; accounting for their presence therefore allowed their effect on neuronal responses to be examined. To allow for neuronal analysis and to control for potential differences in neuronal activity due to word length, the models also took into account the total number of phonemes per word.
Spectral features
To evaluate the time-varying spectral features of the articulated phonemes on a phoneme-by-phoneme basis, we identified the occurrence of each phoneme using the Montreal Forced Aligner (v.2.2; https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner). For pitch, we calculated the spectral power in ten log-spaced frequency bins from 200 to 5,000 Hz for each phoneme per word. For amplitude, we took the root-mean-square of the recorded waveform of each phoneme.
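A minimal sketch of these two acoustic features, assuming each phoneme's waveform segment is available as a NumPy array sampled at 44 kHz:

```python
import numpy as np

def spectral_features(segment, fs=44_000, n_bins=10, fmin=200.0, fmax=5000.0):
    """Power in ten log-spaced bins (200-5,000 Hz) plus RMS amplitude."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bins + 1)
    power = np.array([
        spectrum[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    rms = np.sqrt(np.mean(segment ** 2))
    return power, rms
```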
Single-neuronal analysis
Evaluating the selectivity of single-neuronal responses
To investigate the relationship between single-neuronal activity and specific word features, we used a regression analysis to determine the degree to which variation in neural activity could be explained by phonetic, syllabic or morphologic properties of spoken words86,87,88,89. For all analyses, neuronal activity was considered in relation to word utterance onset (t = 0) and taken as the mean spike count in the analysis window of interest (that is, −500 to 0 ms from word onset for word planning and 0 to +500 ms for word production). To limit the potential effects of preceding words on neuronal activity, words with planning periods that overlapped temporally were excluded from regression and selectivity analyses. For each neuron, we constructed a GLM that modelled the spike count rate as the realization of a Poisson process whose rate varied as a function of the linguistic (for example, phonetic, syllabic and morphologic) or acoustic features (for example, spectral power and root-mean-square amplitude) of the planned words.
Models were fit using the Python (v.3.9.17) library statsmodels (v.0.13.5) by iteratively reweighted least-squares minimization of the Poisson negative log-likelihood function86. To assess the goodness-of-fit of the models, we used both the Akaike information criterion (\({\rm{AIC}}=2k-2{\rm{ln}}(L)\), where k is the number of estimated parameters and L is the maximized value of the likelihood function) and a generalization of the R2 score for the exponential family of regression models that we refer to as D2, whereby87:
$${D}^{2}=1-\frac{K({\bf{y}},{{\boldsymbol{\mu }}}_{{\rm{full}}})}{K({\bf{y}},{{\boldsymbol{\mu }}}_{{\rm{restricted}}})}$$
where y is a vector of realized outcomes, μ is a vector of estimated means from the full (including all regressors) or restricted (without the regressors of interest) model and \(K({\bf{y}},{\boldsymbol{\mu }})=2\cdot {\rm{llf}}({\bf{y}};{\bf{y}})-2\cdot {\rm{llf}}({\boldsymbol{\mu }};{\bf{y}})\), where \({\rm{llf}}({\boldsymbol{\mu }};{\bf{y}})\) is the log-likelihood of the model and \({\rm{llf}}({\bf{y}};{\bf{y}})\) is the log-likelihood of the saturated model. The D2 value represents the proportional reduction in uncertainty (measured by the Kullback–Leibler divergence) due to the inclusion of the regressors. The statistical significance of the model fit was evaluated using a likelihood ratio test against a model with all covariates except the regressors of interest (the task variables).
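A sketch of this model comparison with statsmodels; the variable names and the synthetic stand-in data are ours, with `X_full` containing all regressors, `X_restricted` omitting the regressors of interest and `y` the per-word spike counts.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

# Synthetic stand-ins for illustration only.
rng = np.random.default_rng(0)
n_words = 200
X_full = rng.poisson(0.5, size=(n_words, 10)).astype(float)  # e.g. phoneme counts
X_restricted = X_full[:, :2]        # covariates without the regressors of interest
y = rng.poisson(2.0, size=n_words)  # spike counts in the analysis window

def fit_poisson(y, X):
    """Fit a Poisson GLM of spike counts on word features (with intercept)."""
    return sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()

res_full = fit_poisson(y, X_full)            # res_full.aic gives the AIC
res_restricted = fit_poisson(y, X_restricted)

# K(y, mu) = 2*llf(y; y) - 2*llf(mu; y) is the model deviance, so
# D2 = 1 - deviance_full / deviance_restricted.
d2 = 1.0 - res_full.deviance / res_restricted.deviance

# Likelihood ratio test of the full model against the restricted model.
lr_stat = 2.0 * (res_full.llf - res_restricted.llf)
df = res_full.df_model - res_restricted.df_model
p_value = chi2.sf(lr_stat, df)
tuned = p_value < 0.01  # alpha used below for characterizing 'tuned' neurons
```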
We characterized a neuron as selectively ‘tuned’ to a given word feature if the GLM of neuronal firing rates as a function of task variables for that feature exhibited a statistically significant model fit (likelihood ratio test with α set at 0.01). For neurons meeting this criterion, we also examined the point estimates and confidence intervals for each coefficient in the model. A vector of these coefficients (or, in our feature space, a vector of the sign of these coefficients) indicates a word with the combination of constituent elements expected to produce a maximal neuronal response. The multidimensional feature spaces also allowed us to define metrics that quantified the phonemic, syllabic or morphologic similarity between words. Here, we calculated the Hamming distance between the vector describing each word u and the vector of the sign of regression coefficients that defines each neuron’s maximal predicted response v, which is equal to the number of positions at which the corresponding values are different:
$${\rm{Hamming}}\,{\rm{distance}}={\rm{count}}\left\{i:{{\bf{u}}}_{i}\ne {{\bf{v}}}_{i},i=1\ldots n\right\}$$
For each ‘tuned’ neuron, we compared the Z-scored firing rate elicited by each word as a function of the Hamming distance between the word and the ‘preferred word’ of the neuron to examine the ‘tuning’ characteristics of these neurons (Figs. 1f and 2c). A Hamming distance of zero would therefore indicate that the words have phonetically identical compositions. Finally, to examine the relationship between neuronal activity and spectral features of each phoneme, we extracted the acoustic waveform for each phoneme and calculated the power in ten log-spaced spectral bands. We then constructed a ‘spectral vector’ representation for each word based on these ten values and fit a Poisson GLM of neuronal firing rates against these values. For amplitude analysis, we regressed neuronal firing rates against the root-mean-square amplitude of the waveform for each word.
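A sketch of this distance, following the definition above; treating the entries of both vectors as binary presence indicators is our simplification for illustration.

```python
import numpy as np

def hamming_distance(word_vec, preferred_sign_vec):
    """Number of positions at which two feature vectors differ.

    `word_vec` is the phoneme-count vector of a word; `preferred_sign_vec`
    is the sign pattern of the neuron's regression coefficients (its
    'preferred word'). A distance of 0 means the compositions match.
    """
    u = np.asarray(word_vec) > 0
    v = np.asarray(preferred_sign_vec) > 0
    return int(np.sum(u != v))
```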
Controlling for interdependency between phonetic and syllabic features
Three additional word-level comparisons were used to examine the interdependency between phonetic and syllabic features. First, we compared firing rates for words containing specific syllables with those for words containing the individual phonemes of that syllable but not the syllable itself (for example, simply /d/ in ‘god’ or ‘dog’). Second, we examined words containing syllables with the same constituent phonemes but in a different order (for example, /g-ah-d/ for ‘god’ versus /d-ah-g/ for ‘dog’). Thus, if neurons responded preferentially to specific syllables, they should continue to respond to them preferentially even when comparing words with the same phonemes arranged in a different or reversed order. Third, we examined words containing the same sequence of phonemes but spanning a syllable boundary, such that the cluster of phonemes did not constitute a syllable (that is, phonemes within the same syllable versus spanning across syllable boundaries).
Visualization of neuronal responses within the population
To allow for visualization of groupings of neurons with shared representational characteristics, we calculated the AIC and D2 for the phoneme, syllable and morpheme models of each neuron and conducted a tSNE procedure that transformed these data into two dimensions, such that neurons with similar feature representations lie spatially closer together than those with dissimilar representations90. We used the tSNE implementation in the scikit-learn Python module (v.1.3.0). In Fig. 3a left, a tSNE was fit on the AIC values for the phoneme, syllable and morpheme models of each neuron during the planning period with the following parameters: perplexity = 35, early exaggeration = 2 and Euclidean distance as the metric. In Fig. 3a right and Fig. 4a bottom, a different tSNE was fit on the D2 values for all planning and production models using the following parameters: perplexity = 10, early exaggeration = 10 and a cosine distance metric. The resulting embeddings were mapped onto a grid of points according to a linear sum assignment between embeddings and grid points.
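A sketch of the embedding step with scikit-learn, using the parameters given for the planning-period panel; the input matrix of per-neuron AIC values is a placeholder here, and the choice of `init` and `random_state` is ours.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder input: (n_neurons, 3) array of AIC values for the phoneme,
# syllable and morpheme models of each neuron (assumed precomputed).
rng = np.random.default_rng(0)
aic_values = rng.normal(size=(272, 3))

embedding = TSNE(
    n_components=2,
    perplexity=35,
    early_exaggeration=2,
    metric='euclidean',
    init='random',
    random_state=0,
).fit_transform(aic_values)
# The (n_neurons, 2) embedding can then be snapped to a grid with
# scipy.optimize.linear_sum_assignment, as described above.
```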
Population modelling
Modelling population activity
To quantify the degree to which the neural population coded information about the planned phonemes, syllables and morphemes, we modelled the activity of the entire pseudopopulation of recorded neurons. To match trials across the different participants, we first labelled each word according to whether it contained the feature of interest and then matched words across subjects based on the features that were shared. Using this procedure, no trials or neural data were duplicated or upsampled, ensuring strict separation between training and testing sets during classifier training and subsequent evaluation.
For decoding, words were randomly split into training (75%) and testing (25%) trials across 50 iterations. A support vector machine (SVM) as implemented in the scikit-learn Python package (v.1.3.0)91 was used to construct a hyperplane in n-dimensional space that optimally separates samples of different word features by solving the following minimization problem:
$$\min \left(\frac{1}{2}{w}^{T}w+C\mathop{\sum }\limits_{i=1}^{n}{\zeta }_{i}\right)$$
subject to \({y}_{i}({w}^{T}\phi ({x}_{i})+b)\ge 1-{\zeta }_{i}\) and \({\zeta }_{i}\ge 0\) for all \(i\in \left\{1,\ldots ,n\right\}\), where w is the vector normal to the separating hyperplane, C is the regularization strength, ζi is the distance of each point from the margin, yi is the class label of each sample and ϕ(xi) is the image of each datapoint in the transformed feature space. A radial basis function kernel with coefficient γ = 1/272 was applied. The penalty term C was optimized for each classifier using a cross-validation procedure nested in the training set.
A separate classifier was trained for each dimension in a task space (for example, separate classifiers for bilabial, dental and alveolar consonants) and scores for each of these classifiers were averaged to calculate an overall decoding score for that feature type. Each decoder was trained to predict whether the upcoming word contained an instance of a specific phoneme, syllable or morpheme arrangement. For phonemes, we used nine of the ten phoneme groups (there were insufficient instances of palatal consonants to train a classifier; Extended Data Table 1). For syllables, we used ten syllables taken from the most common syllables across the study vocabulary (Extended Data Table 1). For morpheme analysis, a single classifier was trained to predict the presence or absence of any bound morpheme in the upcoming word.
Finally, to assess performance, we scored classifiers using the area under the receiver operating characteristic curve (AUC-ROC). With this scoring metric, a classifier that always guesses the most common class (that is, an uninformative classifier) yields a score of 0.5, whereas perfect classification yields a score of 1. The overall decoding score for a particular feature space was the mean score of the classifiers for each dimension in the space. The entire procedure was repeated 50 times with random train/test splits. Summary statistics for these 50 iterations are presented in the main text.
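A sketch of one such binary decoder with scikit-learn, with the penalty term C optimized by cross-validation nested in the training set; the data arrays and the candidate C grid are our assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def decode_feature(X, y, n_iterations=50, gamma=1 / 272):
    """Mean AUC-ROC over random 75/25 train/test splits for one binary feature.

    X: (n_trials, n_neurons) firing rates; y: 1 if the upcoming word contains
    the feature (for example, a bilabial consonant), else 0.
    """
    scores = []
    for seed in range(n_iterations):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.25, random_state=seed, stratify=y)
        clf = GridSearchCV(
            SVC(kernel='rbf', gamma=gamma),
            param_grid={'C': [0.1, 1, 10, 100]},  # candidate grid: our choice
            cv=5,
        ).fit(X_tr, y_tr)
        scores.append(roc_auc_score(y_te, clf.decision_function(X_te)))
    return float(np.mean(scores))
```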
Model switching
Assessing decoder generalization across different experimental conditions provides a powerful method to evaluate the similarity of neuronal representations of information in different contexts64. To determine how neurons encoded the same word features under different conditions, we trained SVM decoders using neuronal data from one condition (for example, word production) but tested the decoder using data from another (for example, no word production). Before decoder training or testing, trials were split into disjoint training and testing sets, from which the neuronal data were extracted in the epoch of interest. Thus, trials used to train the model were never used to test it, whether evaluating native decoder performance or decoder generalizability.
Modelling temporal dynamics
To further study the temporal dynamics of the neuronal responses, we trained decoders to predict the phonemes, syllables and morpheme arrangement of each word across successive time points before utterance64. For each neuron, we aligned all spikes to utterance onset, binned the spikes into 5 ms windows and convolved them with a Gaussian kernel with a standard deviation of 25 ms to generate an estimated instantaneous firing rate at each point in time during word planning. For each time point, we evaluated the performance of decoders of phonemes, syllables and morphemes trained on these data over 50 random splits of training and testing trials. The distribution of times of peak decoding performance across the planning or perception period revealed the dynamics of information encoding by these neurons during word planning or perception, and we then calculated the median peak decoding times for phonemes, syllables and morphemes.
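A sketch of the rate estimation step, assuming spike times already aligned to utterance onset; the analysis time range is a placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def instantaneous_rate(aligned_spike_times_s, t_start=-1.0, t_stop=0.5,
                       bin_s=0.005, sigma_s=0.025):
    """Binned (5 ms) and Gaussian-smoothed (sigma = 25 ms) firing rate.

    `aligned_spike_times_s` are spike times relative to utterance onset (s).
    Returns bin centres and the estimated rate in Hz.
    """
    edges = np.arange(t_start, t_stop + bin_s, bin_s)
    counts, _ = np.histogram(aligned_spike_times_s, bins=edges)
    rate = gaussian_filter1d(counts / bin_s, sigma=sigma_s / bin_s)
    return edges[:-1] + bin_s / 2, rate
```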
Dynamical system and subspace analysis
To study the dimensionality of neuronal activity and to evaluate the functional subspaces occupied by the neuronal population, we used a dynamical systems approach that quantified the time-dependent changes in neural activity patterns31. For the dynamical system analysis, activity for all words was averaged for each neuron to yield a single peri-event time course (aligned to word onset), which allowed all neurons to be analysed together as a pseudopopulation. First, we calculated the instantaneous firing rates of the neurons that showed selectivity to any word feature (phonemes, syllables or morpheme arrangement), binning spikes into 5 ms bins and convolving them with a Gaussian filter with a standard deviation of 50 ms. We used equal 500 ms windows, set at −500 to 0 ms before utterance onset for the planning phase and 0 to 500 ms following utterance onset for the production phase, to allow for comparison. These data were then standardized to zero mean and unit variance. Finally, the neural data were concatenated into a T × N matrix of sampled instantaneous firing rates for each of the N neurons at every time T.
Together, these matrices represented the evolution of the system in N-dimensional space over time. A principal component analysis revealed a small set of five principal components (PCs) embedded in the full N-dimensional space that captured most of the variance in the data for each epoch (Fig. 4b). Projection of the data into this space yields a T × 5 matrix representing the evolution of the system in five-dimensional space over time. The columns of the N × 5 principal component matrix form an orthonormal basis for the five-dimensional subspace occupied by the system during each epoch.
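A sketch of this projection, assuming the standardized T × N rate matrix for one epoch:

```python
import numpy as np
from sklearn.decomposition import PCA

def epoch_subspace(rates_txn, n_components=5):
    """Orthonormal basis (N x 5) and projected trajectory (T x 5) for one epoch.

    `rates_txn` is the standardized T x N matrix of instantaneous rates.
    """
    pca = PCA(n_components=n_components)
    trajectory = pca.fit_transform(rates_txn)  # T x 5 evolution in PC space
    basis = pca.components_.T                  # N x 5 orthonormal basis
    return basis, trajectory
```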
Next, to quantify the relationship between these subspaces during planning and production, we took two approaches. First, we calculated the alignment index from ref. 66:
$$A=\frac{{\rm{Tr}}({D}_{{\rm{A}}}^{\intercal}{C}_{{\rm{B}}}{D}_{{\rm{A}}})}{{\sum }_{i}{\sigma }_{{\rm{B}}}(i)}$$
where DA is the matrix whose columns are the orthonormal basis of subspace A, CB is the covariance of the neuronal data as it evolves in space B, \({\sigma }_{{\rm{B}}}(i)\) is the ith singular value of the covariance matrix CB and Tr(∙) is the matrix trace. The alignment index A ranges from 0 to 1 and quantifies the fraction of variance in space B that is recovered when the data are projected into space A. Higher values indicate that the variance in the data is captured well by either subspace.
As discussed in ref. 66, subspace misalignment in the form of a low alignment index A can arise by chance with high-dimensional neuronal data, because two randomly selected sets of dimensions in a high-dimensional space are unlikely to align well. Therefore, to determine the degree to which the observed subspace misalignment was attributable to chance, we used a Monte Carlo analysis to generate random subspaces from data with the same covariance structure as the true (observed) data:
$${\bf{V}}={\rm{orth}}\left(\frac{U\sqrt{S}\,v}{{\parallel U\sqrt{S}v\parallel }_{2}}\right)$$
where V is a random subspace, U and S are the eigenvectors and eigenvalues of the covariance matrix of the observed data across all epochs being compared, v is a matrix of white noise and orth(∙) orthogonalizes the matrix. The alignment index A of the subspaces defined by the resulting basis vectors V was recalculated 1,000 times to generate a distribution of alignment index values A attributable to chance alone (compare Fig. 4b).
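A sketch of the alignment index and the random-subspace null, following the formulas above; the names are ours, `basis_a` is the N × 5 basis DA and `data_b` the standardized T × N data from epoch B, and we take the top-k eigenvalues in the denominator following the convention of ref. 66.

```python
import numpy as np

def alignment_index(basis_a, data_b, k=5):
    """Fraction of epoch-B variance captured by projection into subspace A."""
    cov_b = np.cov(data_b, rowvar=False)           # N x N covariance (C_B)
    captured = np.trace(basis_a.T @ cov_b @ basis_a)
    eigvals = np.sort(np.linalg.eigvalsh(cov_b))[::-1]
    return captured / eigvals[:k].sum()            # top-k eigenvalues, per ref. 66

def random_subspace(cov_all, k=5, rng=None):
    """Random k-dim subspace matched to the overall covariance structure."""
    rng = np.random.default_rng(rng)
    eigvals, eigvecs = np.linalg.eigh(cov_all)     # U and S of the formula
    v = rng.standard_normal((cov_all.shape[0], k)) # white-noise matrix
    m = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0, None))) @ v
    q, _ = np.linalg.qr(m / np.linalg.norm(m, axis=0))  # orth(.)
    return q[:, :k]
# Repeating random_subspace 1,000 times and recomputing alignment_index
# yields the chance distribution described above.
```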
Finally, we calculated the projection error between each pair of subspaces on the basis of the relationships between the three orthonormal bases (rather than a projection of the data into each of these subspaces). The set of all (linear) subspaces of dimension k < n embedded in an n-dimensional vector space V forms a manifold known as the Grassmannian, which is endowed with several metrics that can be used to quantify distances between two subspaces on the manifold. Thus, the subspaces (defined by the columns of an N × N′ matrix, where N′ is the number of selected principal components; five in our case) explored by the system during planning and production are points on the Grassmannian manifold of the full N-neuron-dimensional vector space. Here, we used the Grassmannian chordal distance92:
$${\rm{d}}(A,B)=\,\frac{1}{\sqrt{2}}{\parallel A{A}^{\intercal}-B{B}^{\intercal}\parallel }_{F}$$
where A and B are matrices whose columns are the orthonormal basis for their respective subspaces and \({\parallel \cdot \parallel }_{F}\) is the Frobenius norm. By normalizing this distance by the Frobenius norm of subspace A, we scale the distance metric from 0 to 1, where 0 indicates a subspace identical to A (that is, completely overlapping) and increasing values indicate greater misalignment from A. Random sampling of subspaces under the null hypothesis was repeated using the same procedure outlined above.
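A sketch of this normalized distance, with bases as returned by the PCA step sketched above:

```python
import numpy as np

def chordal_distance(basis_a, basis_b):
    """Normalized Grassmannian chordal distance between two subspaces.

    `basis_a` and `basis_b` are N x k matrices with orthonormal columns.
    0 indicates identical subspaces; larger values, greater misalignment.
    """
    proj_a = basis_a @ basis_a.T
    proj_b = basis_b @ basis_b.T
    dist = np.linalg.norm(proj_a - proj_b, 'fro') / np.sqrt(2)
    return dist / np.linalg.norm(proj_a, 'fro')  # normalize by subspace A
```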
Participant demographics
Across the participants, there was no statistically significant difference in word length based on sex (three-way analysis of variance, F(1,4257) = 1.78, P = 0.18) or underlying diagnosis (essential tremor versus Parkinson’s disease; F(1,4257) = 0.45, P = 0.50). Among participants with Parkinson’s disease, there was a significant difference based on disease severity (both ON and OFF scores), with more advanced disease (higher scores) correlating with longer word lengths (F(1,3295) = 145.8, P = 7.1 × 10−33 for ON score and F(1,3295) = 1,006.0, P = 6.7 × 10−193 for OFF score) and longer interword intervals (F(1,3291) = 14.9, P = 1.1 × 10−4 for ON score and F(1,3291) = 31.8, P = 1.9 × 10−8 for OFF score). When modelling neuronal activities in relation to these interword intervals (bottom versus top quartile), decoding performances were slightly higher for longer compared with shorter delays (0.76 ± 0.01 versus 0.68 ± 0.01, P < 0.001, two-sided Mann–Whitney U-test).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.