A transcriptomic axis predicts state modulation of cortical interneurons – Nature

ByAUTHOR

Jul 6, 2022

All experimental procedures were conducted in accordance with the UK Animals (Scientific Procedures Act) 1986. Experiments were performed at University College London under personal and project licences released by the Home Office following appropriate ethics review.

Mice

Experiments were performed on mice aged between 12 and 15 weeks maintained on a 12-h light–dark cycle, at 20–24 °C and 45–65% humidity, in individually ventilated cages. For post-hoc identification of transcriptomic subtypes, four (two males and two females) Gad2-T2a-NLS-mCherry transgenic mice (stock no: 023140, The Jackson Laboratory), expressing the red fluorescent protein mCherry in the nuclei of Gad2-expressing cells, were used. For comparison to transgenic mouse lines (Extended Data Fig. 5), additional experiments were performed as in ref. 30 using one male Pvalbtm1(cre)Arbr and two males and one female Ssttm2.1(cre)Zjh crossed with Gt(ROSA)26Sortm14(CAG-tdTomato)Hze.

Surgical procedures

All mice were injected with an unconditional GCaMP6m virus, AAV1.Syn.GCaMP6m.WPRE.SV40 obtained from the University of Pennsylvania Viral Vector Core. The virus was injected with a bevelled micropipette using a Nanoject II injector (Drummond Scientific Company, Broomall, PA 1) attached to a stereotaxic micromanipulator. Six to seven boli of 100–200 nl virus (2.23 × 1012 GC ml−1) were slowly (around 20 nl min−1) injected unilaterally into monocular V1 (ref. 51) 2.1–3.3 mm laterally and 3.5–4.0 mm posteriorly from bregma and at a depth of L2/3 (200–300 mm).

After virus injection, a small bolus (10 μl) of red fluorescent beads (FluoSpheres Carboxylate-Modified Microspheres, 2.0 µm, red fluorescent (580/605), 2% solids, Thermo Fisher Scientific) was injected at the most rostral part of the craniotomy, to allow orientation of the ex vivo slices but not interfere with V1 imaging in the caudal part. After recovery, mice were habituated for handling and head‐fixation for three days before carrying out recordings.

Recording neuronal activity in V1

Two-photon calcium imaging

Each mouse was recorded for at least three sessions. In vivo recordings were performed 15–45 days after the virus injection. We used a commercial two-photon microscope with a resonant-galvo scanhead (B-scope, ThorLabs) controlled by ScanImage 4.2 (ref. 52), with an acquisition frame rate of about 30 Hz (at 512 by 512 pixels, corresponding to a sampling rate of about 4.3 Hz). The field of view was 550–600 µm. We imaged seven planes at 15–45-µm steps, starting at various positions below the brain surface (from 0 to −150 µm) to sample different cortical depths and therefore subtypes recorded simultaneously during different sessions. Imaging calcium activity was performed at a wavelength of 920 nm or 980 nm. Three computer screens spanning −135 to +135 visual degrees (v°) along the azimuth axis and −35 to +35 v° along the elevation axis were used to display visual stimuli. During the presentation of visual stimuli, we switched off the red gun of the monitors to prevent light from the monitors contaminating the red fluorescent channel.

At the end of each recording session, reference z-stacks were acquired. Starting at the same position as the imaging planes, we acquired two z-stacks of about 400 μm depth, with a 1-μm step between planes. The first one, called the GCaMP z-stack, was acquired at the same wavelength as the calcium imaging (920 or 980 nm). The second one, called the reference z-stack, was acquired at 1,040 nm to image mCherry fluorescence.

Before euthanizing each mouse, we acquired structural z-stacks (ranging from the brain surface to 400 μm deep) at 1,040 nm to get an image of the mCherry cells across the whole craniotomy (including the position where the red fluorescent beads were injected). This structural z-stack was used to select slices on which to perform transcriptomic analysis, and to provide an initialization point for the registration algorithm.

Initial retinotopic mapping

All recordings were targeted to the V1 monocular region (>60° azimuth). To find this region, during the first imaging session, we initially mapped the retinotopy of different candidate fields of view, using single-plane imaging. Sparse noise stimuli were presented to the mouse, consisting of black or white squares of width 4.5° visual angle on a grey background at a frame rate of 5 Hz for 10 min. Squares appeared randomly at fixed positions in a 16 by 60 grid, spanning the retinotopic range of the computer screens. 1.5% of the squares were shown at any one time.

Visual stimulation

Drifting gratings were centred on the mean receptive field of the microscope’s field of view. Gratings had a duration of 0.5 s, temporal frequency of 2 Hz and spatial frequency of 0.15 cycles per degree. The gratings drifted in 12 different directions (from 0 to 330°, separated by 30°) and were of 3 different sizes (5°, 15° and 60° diameter).

Natural scenes from the ImageNet database were contrast-normalized and presented as described previously34. Each image was presented for 0.5 s with an interstimulus interval uniformly distributed from 0.3 to 1.1 s. Five per cent of the total presentations was grey stimuli. During each session we presented a given set of 1,000 different natural images twice (corresponding to a subset of the 2,800 images that were originally used34).

On each recording session we presented the same random sparse noise stimuli that were used to map retinotopy (see above), for 30 min.

Spontaneous activity was recorded in front of a uniform grey screen, set to a steady cyan level equal to the background of all the stimuli presented for visual responses protocols. The duration of these grey screen presentations was typically between 15 and 20 min.

Eye-tracking

We used a collimated infrared LED (SLS-0208-B, lpeak = 850 nm; controller: SLC-AA02-US; Mightex Systems) to illuminate the eye contralateral to the recording site. Videos of eye position were captured at 30 Hz with a monochromatic camera (DMK 21BU04.H, The Imaging Source) equipped with a zoom lens (MVL7000; Navitar), and positioned at approximately 50° azimuth and 50° elevation relative to the centre of the mouse’s field of view. Contamination light from the monitors and the imaging laser was rejected using an optical band-pass filter (700–900 nm) positioned in front of the camera objective (long-pass 092/52 × 0.75, The Imaging Source; short-pass FES0900, Thorlabs).

Processing of calcium imaging

Two-photon calcium data were processed using Suite2P (ref. 53). Neuropil contamination was corrected by subtracting from each region of interest (ROI) signal its surrounding neuropil signal multiplied by a constant factor of 0.7. Calcium traces were deconvolved using non-negative spike deconvolution54 with a calcium indicator decay timescale of 1.5 s. ROIs were manually curated to make sure that only cell bodies were considered for further analysis.

coppaFISH

Many approaches to highly multiplexed mRNA detection have been described55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73. The coppaFISH method is a development of an in situ sequencing method28 (Extended Data Fig. 1). The method uses reverse transcription, padlock probes and rolling-circle amplification to amplify mRNAs to DNA rolling-circle products (RCPs) that contain multiple copies of a 20-nucleotide (nt) barcode sequence, and then detects their location combinatorially in 7 rounds of 7-colour fluorescence imaging.

Gene selection and DNA probe design

A panel of 73 genes was selected to allow the identification of cortical cell types. This panel is a subset of a panel of 99 genes described in a previous study28, which was picked based on scRNA-seq data using an algorithm that predicts which gene combinations are required to identify fine transcriptomic subtypes. Retrospective analysis analysing the contribution that each gene made to classification accuracy revealed that 26 genes in this panel served no purpose in accurately classifying cells (figure S16 of ref. 28), leading to their removal from the panel. One gene (Yjefn3) was detected in our experiments, but could not be used to assign cells to transcriptomic subtypes as it was not present in the reference scRNA-seq dataset3. In the main text we therefore refer to a 72-gene panel.

Multiple padlock probes were designed for each gene, spanning the length of the cDNA (Supplementary Data 2). The number of different padlock probes per gene was chosen on the basis of the expression for each specific gene as determined by scRNA-seq. This means that fewer padlock probes were used for genes with high expression and vice versa (for example, four padlock probes were designed for Sst but 10 were designed for Chodl). All padlock probes consisted of two 15–20-nt recognition sites, a 20-nt gene barcode (unique to each gene) and a 20-nt anchor sequence (identical for all genes and padlock probes).

Padlock probes were designed using previously described software28. In brief, this software finds suitable RNA target sequences by restricting the melting temperature of the binding sequence, and by aligning the candidate sequences to the mouse whole transcriptome (RefSeq database) using BLAST+ to check for specificity. Any candidate targets for which another transcript or non-coding RNA sequence matched the target with more than 50% coverage, 80% homology and coverage spanning the central 10 nt of the target sequence were excluded. For each padlock probe, we also designed a specific primer for reverse transcription: a 15-nt-long DNA oligonucleotide that binds the region upstream of the mRNA sequences targeted by the padlock probes (Supplementary Data 3). The use of specific primers greatly improved the number of RCPs obtained per section compared to random primers (Bugeon, S. et al., unpublished observations).

To determine the gene-specific DNA barcode sequences (and the anchor sequence), 240,000 orthogonal 25-mer oligonucleotide sequences74 were trimmed to 20 nt from the 5′ end and screened for melting temperature (between 55 °C and 56 °C using the SantaLucia method). They were further screened for orthogonality with mouse sequences using BLAST+ with the NCBI mouse genomic plus transcript (Mouse G +T) database. We used the following BLAST parameters: “-reward”, 1, “-penalty”, -2, “-gapopen”, 2, “-gapextend”, 1, “-evalue”, 10. Any matches in this blast search were removed from the pool. Next, we checked for potential cross-reactivity of the remaining sequences to themselves using the same BLAST parameters, and any hits were removed, resulting in 6,397 possible sequences. The barcode sequences were chosen from this pool.

The combinatorial imaging strategy used two types of DNA probes. Seven ‘dye probes’ were designed, each consisting of a 20-nt-long DNA oligo conjugated to one of the seven following fluorophores: DY405, AF488, DY485xL, AF532, AF594, AF647 and AF750; the same dye probes were used on each imaging round (Supplementary Data 4). In addition, a set of 40-nt ‘bridge probes’ were designed for each imaging round, linking each gene’s RCP barcode to one of the seven dye probes (Extended Data Fig. 1 and Supplementary Data 5). The bridge probes thus caused each gene to show up in a specific colour channel on each round. This two-part strategy of linking the seven dye probes to the RCPs with bridge probes provides a substantial cost saving over making Ngenes × Nrounds dye probes, as dye-coupled probes are much more expensive than simple DNA.

Each gene was assigned a sequence of dyes for the seven imaging rounds using a Reed–Solomon coding scheme75 (Supplementary Data 6), which constructs sequences of minimum possible overlap. Specifically, the genes were numbered by integers g, and converted to a base 7 representation g2g1g0 . The dye assigned to gene g on round r was

$${D}_{{gr}}={g}_{2}{r}^{2}+{g}_{1}r+{g}_{0},$$

where addition and multiplication are understood to happen modulo 7. Codes 0 to 6, which correspond to the same colour in each round, were not used as these codes could not be distinguished from fixed background fluorescence.

All custom DNA oligos (padlock probes, primers, bridge probes and dye probes) were obtained from Integrated DNA Technologies. Padlock probes were ordered as 5′ phosphorylated 4 nmol Ultramer DNA oligos; all other oligos were ordered as classical 25 nmol DNA oligos. The DNA sequences for all 556 primers and padlock probes, 511 bridge probes and 7 dye probes are provided in Supplementary Data 2–5.

Tissue preparation

After the in vivo recordings were finished, mice were anaesthetized with isoflurane and then injected with a lethal dose of sodium pentobarbital (0.01 ml g−1). The fresh brains were then dissected out from the skull, taking great care to preserve the integrity of the tissue and avoid warping. The brains were then placed in OCT (Sakura Finetek) and left to freeze on dry ice for 30 min. The samples were then stored at −80 °C until slicing. Sagittal sections (15-µm thick) were then obtained using a Leica Cryostat for each brain and mounted on gelatine-coated borosilicate glass coverslips (22 x 55 mm). Gelatine-coated coverslips allowed tissue section adhesion to the coverslip and RNA preservation throughout the protocol. To make them, coverslips mounted on a rack were dipped for 30 s in a solution of 2% w/v gelatine and 0.2 % w/v chromium potassium sulfate dodecahydrate in distilled water (https://www.rndsystems.com/resources/protocols/protocol-preparation-gelatin-coated-slides-histological-tissue-sections). Two to three brain sections were thaw-mounted on each coverslip and then frozen and stored at −80°C.

It was not necessary to bleach the native fluorescence of mCherry and GCaMP (which might in principle interfere with later fluorescence imaging), as these faded completely during standard tissue processing.

In situ RCP production

The RCPs were prepared as described previously28, with some modifications. First, coverslips were taken out of the freezer and then directly pre-fixed using 4% paraformaldehyde (PFA) for 5 min at room temperature. This pre-fixation was followed by a quick wash with nuclease-free phosphate-buffered saline (PBS), and incubation in 0.1 M HCl for 5 min at room temperature. After one more PBS wash, the sections were incubated in 70% ethanol for 1 min and then in 100% ethanol for 1 min at room temperature. The coverslips were then left to dry in air. To keep the reagents on the tissue sections, a barrier was drawn around each section using a hydrophobic barrier PAP pen (ImmEdge Hydrophobic Barrier PAP Pen H-4000, Vector Laboratories).

The sections were then directly incubated in reverse transcription mix overnight at 37 °C in a humidified chamber (Slide staining system, StainTray M918, VWR). The mix contained 0.5 mM dNTP mix (Thermo Fisher Scientific), gene-specific primers (10 μM each), 0.2 μg μl−1 BSA (NEB), 1 U μl−1 RIBOPROTECT RNase Inhibitor (Blirt) and 20 U μl−1 TranscriptMe reverse transcriptase (Blirt) in 1× reverse transcription buffer (Blirt). The mix was removed and fresh 4% (w/v) PFA in PBS was added to the sections without any wash in between. This post-fixation step aimed to cross-link newly synthesized cDNA to the cellular matrix and was carried out at room temperature for 30 min, followed by two washes in PBS. RNaseH digestion, padlock hybridization and ligation were then performed using a single reaction mix. The mix contained 0.05 M KCl (Sigma), 20% ethylene carbonate (Sigma), 10 nM of each padlock probe (557 probes), 0.2 μg μl−1 BSA, 0.3 U μl−1 Tth DNA Ligase (Blirt) and 0.4 U μl−1 RNase H (Blirt) in 1× Ampligase buffer (Epicentre). The sections were first incubated at 37 °C for 30 min for RNaseH digestion and moved to 45 °C for 60 min for stringent hybridization and optimal DNA ligase activity. The sections were then washed twice in PBS. Finally, for rolling-circle amplification, the sections were incubated in a mix containing 5% glycerol (Sigma), 0.25 nM dNTP mix, 0.2 μg μl−1 BSA, 0.2 U μl−1 EquiPhi29 DNA Polymerase (Thermo Fisher Scientific) and 1× EquiPhi29 buffer (Thermo Fisher Scientific) overnight at 30 °C.

RCP production was quickly verified before full barcode read-out by hybridizing a AF750-conjugated oligonucleotide probe (IDT) to the anchor sequence present in all the RCPs. Sections were incubated for 15 min at room temperature in a hybridization mix containing 10 nM of the dye probe, 2× SSC, 20% ethylene carbonate and H2O. They were then washed twice with 2× SSC. The SSC was then removed from the sections and the coverslips were mounted onto SuperFrost plus (VWR) glass slides using 10 µl SuperFrost gold antifade mountant (Life Technologies). Images of the ROI (visual cortex) were then acquired to visualize the RCPs.

Imaging of the in situ barcodes (read-out)

All seven rounds of imaging occurred in a custom flow cell, using automated fluidics to wash appropriate bridge and dye probes before each round. The flow cell frame was designed using Blender and printed, using an Ultimaker S5 3D printer, in polylactic acid filament (PLA) with polyvinyl alcohol (PVA) support structures. The PVA support was removed after printing by placing the flow cells in water on a rocker overnight. To make the flow cell air-tight, two 22 ×55-mm glass coverslips (one with RCP-containing sections and one bare) and two approximately 40-cm-long EFTE tubes (Tubing Tefzel Nat 0.0625 inch outer diameter x 0.020 inch inner diameter) were securely mounted using UV curing cement (Norland Optical Adhesive 81) and a UV curing LED system with driver unit and a handheld 365-nm light source (ThorLabs, CS20K2). The coverslip with the sections was mounted so that the side with the sections faces the inside of the flow cell.

The Imaging set-up consisted of a Nikon Eclipse Ti2 microscope with a NIR-LDI laser panel and a Zyla sCMOS 4.2 camera (Andor). The fluidics set-up consisted of a Minipuls 3 pump (Gilson) and two linked MVP multivalves (Hamilton), each with 8 ports. Nikon NIS elements software (v.5.20.02, build 1453) was used to acquire the images and communicate with a second computer controlling the fluidic pump and multivalves. The opening of the valves and the speed and the duration of the pump’s activity was managed by an edited version of Kilroy software (https://github.com/ZhuangLab/storm-control; edits available at https://github.com/acycliq/storm-control). The imaging and sequencing chemistry were coordinated by NIS elements software (ND sequence acquisition module), which communicates with the computer running Kilroy by sending TTL pulses through a National Instruments NI-USB 6008 board.

Before sequencing, 15-ml falcon tubes containing bridge probe mixtures for each of the seven imaging rounds, as well as one each for dye probe mixture, anchor probe mixture, imaging buffer, distilled water, 2× SSC and 100% formamide, were attached to the multivalves via EFTE tubing and flangeless fittings (1/16 inch Red Delrin, IDEx Health and Science LLC). The mixtures for bridge, dye and anchor probes contained the appropriate oligonucleotides diluted to 10 nM each in 2× SSC, 20% ethylene carbonate and H2O. The bridge probe mix for the final anchor round contained the Cy7-conjugated anchor probe as well as the Gad1 bridge probe that binds to the AF532 dye probe (Gad1_r6 – 10 nM) and DAPI to stain the cell nuclei. A fresh formamide (S4117 Millipore) aliquot was used for every experiment (stored at 4 °C). The flow cell was then mounted onto the multi-slide stage and connected to the pump and multivalves via EFTE tubing. The speed of the pump was adjusted to approximately 0.4 ml s−1. To fill the flow cell, each solution was flushed through the fluidics system for 4 min (the flow cell volume is approximately 1 ml).

In total, eight rounds of imaging were done for each imaging experiment: seven rounds to decode the barcodes and one final anchor round to detect the position of every RCP that was used for later image alignment. In each round, sections were first incubated in 100% formamide for 15 min to strip the RCPs from any previous labelling. The formamide was then flushed from the flow cell with water for 4 min and then with 2× SSC for 4 min. The sections were next incubated in that round’s bridge probe mix for 15 min and washed with 2× SSC. After this, the sections were incubated in the dye probe mix for 15 min, and again washed with 2× SSC. The flow cell was filled with an imaging buffer consisting of glucose oxidase and catalase containing oxygen scavenging system76 to protect the fluorophores from photobleaching during imaging.

After each round of sequencing chemistry, 16-bit images were acquired using wide-field epifluorescence excitation, and a 40× magnification air-objective (CFI Plan Apochromat Lambda 40XC, NA 0.95). Images consisted of z-stacks (z-step: 0.5 μm) in seven different colour channels corresponding to the seven fluorophores (Fluorophore – excitation wavelength, emission filters: Dy405 – ex405, 460/50 m; AF488 – ex470, 525/36 m; Dy485xl – ex470, 632/60 m; AF532 – ex520, 560/40 m; AF594 – ex555, 632/60 m; AF647 – ex640, 700/75 m; AF750 – ex730, 811/80 m). Each tile was 2,048 × 2,048 pixels (pixel size: 0.1625 μm). The imaging parameters were adjusted to cover only the ROI (V1) and usually consisted of 10–15 tiles with 10% overlap. The Nikon perfect focus system was used to make sure that the focus stayed relatively constant across imaging rounds. Image files were saved in Nikon’s native ND2 format.

In situ data analysis

The in situ data were analysed with a suite of custom software for image processing, gene calling and cell calling. All code was written in MATLAB, and is freely available at https://github.com/jduffield65/iss. This software was developed from that described previously28, but has been greatly modified, so is described in full here.

The in situ data consist of eight rounds of multispectral imaging (seven combinatorial rounds, and one reference round in which all RCPs are labelled via the anchor sequence, together with an additional stain for Gad1 RCPs and a DAPI stain). Because the tissue sample is too large for a single camera image, imaging occurs in overlapping tiles. In each tile, a focus stack of wide-field images was taken for each colour, and flattened into two dimensions (2D) using an extended depth of focus algorithm77. The data therefore consist of a set of images:

$${I}_{R,C,T}\left({\bf{x}}\right).$$

Here, I gives the pixel intensity for sequencing round R, colour channel C, tile T, and pixel coordinates x within this tile. The processing pipeline to identify detected genes comprises several steps: initial registration; RCP spot detection and fine registration; cross-talk compensation; and gene calling. These analyses proceed without ever ‘stitching’ all the tiles into a single large image; this approach allows processing of very large datasets on computers with limited memory, and also easily allows non-rigid alignments. Before the pipeline, all RCP images are linearly filtered by convolving with a difference of Hannings: a Hanning of radius 0.5 μm minus a Hanning of radius of 1 μm, both normalized to have sum 1. The DAPI background images are filtered with a disk-shaped top-hat filter with radius of 8 μm.

Initial registration

The initial registration step finds offsets between all image tiles using the anchor images taken on round 8 (which we refer to as ‘reference images’). We use this to define a global coordinate system for the entire tissue sample.

Because we use a square tiling strategy, each tile may have up to four ‘neighbours’: other tiles with which it has a region of substantial overlap. We denote the set of neighbouring tile pairs as 𝔑.

Spots first are detected in each tile’s reference images, as local maxima of the filtered image exceeding a fixed detection threshold. To align the reference images, we loop over all pairs of neighbouring tiles, and compute an offset to register the overlapping regions of the filtered reference images of these two tiles. The offset between two tiles T1 and T2 is found by exhaustive search over all 2D shifts in a range around to the shift expected from the microscope’s position sensor. For each shift, we find for each spot s on T1 the pixel distance Ds to the nearest spot on T2 after the shift has been applied. A score is computed as $$\sum _{s}{{\rm{e}}}^{-{{D}_{s}}^{2}/8}$$, and the final shift vector $${{\boldsymbol{\Delta }}}_{{T}_{1},{T}_{2}}$$ is taken as the one that maximizes this score; that is, the one with the most near neighbours.

We define a single global coordinate system by finding the coordinate origin XT for each tile T. Note that this problem is overdetermined as there are more neighbour pairs than there are tiles. We therefore compute the offsets by minimizing the loss function

$$L=\sum _{({T}_{1},{T}_{2})\in {\mathfrak{R}}}{|{{\bf{X}}}_{{T}_{1}}-{{\bf{X}}}_{{T}_{2}}-{{\boldsymbol{\Delta }}}_{{T}_{1},{T}_{2}}|}^{2}$$

Differentiating this loss function with respect to XT yields a set of simultaneous linear equations, the solution of which yields the origins of each tile on the reference round. The results of this step suffice to define a global coordinate system, but do not provide pixel-level alignment of images from multiple colour channels on multiple rounds, owing to the occurrence of chromatic aberration and small rotational or non-rigid shifts. The latter will be dealt with in the next step, through point cloud registration.

Spot detection and fine registration

The second processing step detects spots in all images of the seven sequencing rounds, performs fine alignment of colour channels and sequencing rounds, and computes for each spot a position in global coordinates and an intensity vector summarizing that spot’s detected fluorescence in each round and channel.

The most intricate part of this step is fine image registration. Even though the same tile layout is used for all sequencing rounds, the precise positions of the tiles may differ owing to slight shifts in the placement and rotation of the sample. Thus, a single spot might be found on different tiles in different sequencing rounds. Furthermore, owing to chromatic aberration, a spot may be in slightly different positions (although not different tiles) in different colour channels. Because most spots are only a few pixels in size, even a one-pixel registration error can compromise accurate RNA reads.

A global coordinate is defined for each of the spots detected in the reference images using the initial registration described above. In regions where tiles overlap, duplicate spots are rejected by keeping only spots which are closer in global coordinates to the centre of their original tile than to any other.

Next, spot positions are detected in images from all sequencing rounds and colour channels. These are used to align each round and colour channel to the corresponding tile’s reference image, using point cloud registration. Specifically, we fit an affine transformation from each reference image to the images of the corresponding tile for all rounds and colour channels, using the iterative-closest point (ICP) algorithm with matches further than 3 pixels away excluded. These affine transformations can include shifts, scalings, rotations and shears, but we did not find it necessary to introduce nonlinear warping transformations within tiles (nonlinear transformations can still occur globally by variation of the affine transformation across tiles). As the ICP algorithm is highly sensitive to local maxima, it is initialized from a shift transformation computed by the same method used to find the overlap between reference images; that is, the shift that maximizes the number of near neighbours as measured by $$\sum _{s}{{\rm{e}}}^{-{{D}_{s}}^{2}/8}$$. When spots are located on neighbouring tiles on different rounds, the corresponding images are again registered with ICP.

Finally, a seven-dimensional intensity vector Vs,r is computed for each spot s in each round r, by reading the intensity from the aligned coordinate of each filtered image.

Cross-talk compensation

The last step associating spots to genes consists of transforming the intensity vectors to gene identities.

An important consideration in this stage is that cross-talk can occur between colour channels. Some cross-talk may occur owing to optical bleedthrough; additional cross-talk can occur owing to chemical cross-reactivity of probes. With the current hybridization chemistry (unlike previous sequencing-by-ligation chemistry), the degree of cross-talk tends to be constant within a round, so we learn a single 7 × 7 cross-talk matrix and apply it to all rounds.

To estimate the cross-talk present, we first collect a set of seven 7-dimensional vectors Vs,r containing the intensity in each colour channel of all well-isolated spots s in all rounds r. Only well-isolated spots are used to ensure that cross-talk estimation is not affected by spatial overlap of spots corresponding to different genes; a spot is defined as well-isolated if the reference image intensity averaged over an annular region (4–14 pixel radius) around the spot is less than a threshold value. Cross-talk is then estimated by running a scaled k-means algorithm78 on these vectors, which finds a set of seven vectors cd (d refers to one of the seven dyes), such that the error function:

$$\sum _{s,r}\mathop{min}\limits_{{d}_{s,r},{\lambda }_{s,r}}{|{{\rm{V}}}_{s,r}-{\lambda }_{s,r}{{\bf{c}}}_{{d}_{s,r}}|}^{2}$$

is minimized; in other words, it finds the seven intensity vectors cd such that each well-isolated spot on round r is close to a scaled version of one of them.

The cross-talk matrix is used to predict the colour profile expected for an RCP of each gene g, for each colour channel and round. If gene g is assigned the dye dg,r in round r, the predicted 49-dimensional intensity vector is obtained by concatenating the corresponding cross-talk vectors.

Gene calling

Improvements in tissue processing and in situ chemistry mean that our current methods produce substantially more RCPs than the previous in situ sequencing method28. Consequently, the fluorescence of neighbouring RCPs often overlaps, which would render the previous detection method unable to find them. To allow resolution of overlapping spots, we therefore developed a gene-calling algorithm, based on orthogonal matching pursuit (OMP)78. This algorithm also allows for subtraction of background autofluorescence. Essentially, OMP repeatedly tests whether the 49-dimensional fluorescence vector of a pixel overlaps with the predicted fluorescence vector of each gene; if so, a gene is detected at that location, its code is projected out from the fluorescence vector, and the process repeats.

The OMP algorithm fits a 49-dimensional image (one dimension for each combination of round and colour channel) as a sum of 49-dimensional code vectors. There is one code vector ag for each gene, and one ‘background’ code $${{\bf{a}}}_{c}^{B}$$ for each colour channel, which has equal intensity for all rounds in one colour channel only. These background codes account for tissue autofluorescence, which will affect all imaging rounds equally.

The gene codes ag are derived from using knowledge of the Reed–Solomon assigned dyes dg,r for each gene in each round and the cross-talk matrix columns cd. These codes take into account the fact that different genes can have consistently different intensities in different rounds, which may arise from non-uniformity in the synthesized concentrations of the bridge probes. To account for this non-uniformity, we learn a scale factor εg,r, and predict the 49-dimensional gene code for gene g as a concatenation:

$${{\bf{a}}}_{g}=\left[{\varepsilon }_{g,1}{{\bf{c}}}_{{d}_{g,1}};{\varepsilon }_{g,2}{{\bf{c}}}_{{d}_{g,2}};{\varepsilon }_{g,3}{{\bf{c}}}_{{d}_{g,3}};{\varepsilon }_{g,4}{{\bf{c}}}_{{d}_{g,4}};{\varepsilon }_{g,5}{{\bf{c}}}_{{d}_{g,5}};{\varepsilon }_{g,6}{{\bf{c}}}_{{d}_{g,6}};{\varepsilon }_{g,7}{{\bf{c}}}_{{d}_{g,7}}\right]$$

We will describe the general algorithm before specifying how εg,r is chosen.

The OMP algorithm expresses the 49-dimensional fluorescence vectors vp for each pixel p as a weighted sum of code vectors: $${\hat{{\bf{v}}}}_{p}={\sum }_{i=1}^{{n}_{p}}{\alpha }_{{g}_{i,p}}{{\bf{a}}}_{g}+{\sum }_{c=1}^{7}{\beta }_{c,p}{{\bf{a}}}_{c}^{B}$$. Each step of the algorithm can add a code to the set of code vectors $$\left\{{g}_{i,p}:i=1\ldots {n}_{p}\right\}$$ used to approximate pixel vp; the 7 background codes are always included. The gene set is initialized to be empty, and to choose which gene code, if any, should be added on each step, the algorithm computes how much the residual $${\left|{{{\bf{v}}}_{p}-\hat{{\bf{v}}}}_{p}|\right|}^{{\bf{2}}}$$ would decrease for each possible addition to the set, and picks the gene giving maximum decrease, provided that this decrease is above a threshold of 0.0612 multiplied by the second largest absolute value of vp (clamped by a minimum threshold of 0.01 and a maximum threshold of 3.0), up to a total of 6 genes per pixel. After this iterative process has terminated for all pixels, an image is made for each gene, containing the gene’s weight for each pixel or zero if that gene is not in the pixel’s gene set. RNA detections are found as local maxima of this image, subject to a thresholding criterion; the criterion takes into account several factors and is best understood by examining the source code (https://github.com/jduffield65/iss).

To choose the scale factors εg,r, a single iteration of the OMP algorithm is run with all εg,r = 1. Local maxima are detected as just described, but with a more stringent threshold (see source code for details) to ensure only unambiguous gene detections are used. We then compute a 7-dimensional mean intensity vector $${\bar{{\bf{v}}}}_{g,r}$$ of all detected spots for each gene in each round. We then find the scale factors εg,r for each round and gene as the least-squares solutions of

$${\bar{{\bf{v}}}}_{g,r}\approx {\varepsilon }_{g,r}{{\bf{c}}}_{{d}_{g,r}}$$

Cell calling

The DAPI image was used to segment the cells. This was performed by detection of the local maxima in each cell followed by watershed segmentation. The segmentation of matched cells and their close neighbours was manually curated.

To assign cells to transcriptomic subtypes, we used the previously described pciSeq algorithm28, a Bayesian method that assigns each in situ cell a posterior probability of belonging to each of a set of cell classes defined by prior scRNA-seq. As we recorded from V1, we used the transcriptomic clusters defined in a previous study3, using only cells from V1 to compute the mean expression of each cluster. These clusters are similar to those produced from other cortical and hippocampal regions, but may differ, particularly for fine subtypes (see Fig. S1 of ref. 6 and Extended Data Fig. 3 of ref. 1 for the probable relationships between these clusters and other classification schemes). The read counts of the scRNA-seq data were divided by 100 to predict the expected in situ RNA count; a further gene-dependent efficiency factor was estimated by the algorithm. The pciSeq algorithm produces a probability for each cell to belong to each class, which we converted to a ‘hard’ classification by assigning each cell to the subtype of maximum a posteriori probability; cells for which this maximal probability was less than 0.5 were not analysed further (around 2% of matched cells; Extended Data Fig. 4c). We assigned cells using all 109 transcriptomic clusters defined previously3, including inhibitory neurons of all layers and non-GABAergic cells. Nevertheless, the algorithm assigned the imaged inhibitory cells to just 35 of these clusters, corresponding to superficial-layer inhibitory subtypes. To evaluate the accuracy of this algorithm, we subsampled the read counts from the scRNA-seq dataset using a Poisson distribution and then estimated the a posteriori probability of belonging to each subtype similarly to in situ cells (Extended Data Fig. 10). This showed that our 72-gene panel yielded an estimated assignment accuracy of 98.1%, 96.6% and 76.4% at the subclass, type and subtype levels, respectively.

Registration of the in vivo and ex vivo cells

We used inhibitory cells, labelled in vivo by mCherry (Gad2-mCherry mice), as landmarks to perform the registration between the in vivo Gad-mCherry volume and the ex vivo brain sections (Extended Data Fig. 2). This alignment made use of two high-resolution reference z-stacks taken for each subject following each imaging session. The ‘GCaMP z-stack’ was taken using the same wavelength as functional imaging (920 or 980 nm), covering the same volume but at higher resolution. The ‘mCherry z-stack’ was acquired in the same volume with a 1,040 nm excitation wavelength to detect inhibitory neurons in Gad2-mCherry mice, but also provided some GCaMP signal in the green channel (although this signal was much lower than for the GCaMP z-stack taken at 920 nm). The different excitation wavelength of these two z-stacks led to a small chromatic aberration, which was only significant in depth. To correct this aberration, we used the green channel found in both imaged volumes, registering planes of the GCaMP z-stack to the mCherry z-stack using fast Fourier transform (FFT) convolution. This was achieved by finding the best matching plane from the later z-stack for each GCaMP z-stack plane as the z position that gave the highest FFT cross-correlation. In addition, a ‘global z-stack’ was made following the final functional imaging session, which covered the entire region under the craniotomy; this was used for coarse initial registration of the in situ slices.

Aligning calcium ROIs to the mCherry z-stack

To align the imaging planes of one functional two-photon session to the GCaMP z-stack, we first obtained their theoretical position using the measured position of the objective for each line scanned (for both the functional imaging planes and the GCaMP z-stack). We then estimated the z-drift during the recording session: the position of the calcium imaging planes over time in comparison to this GCaMP z-stack. To do so, a mean image of each functional imaging plane was obtained for 1 min every 7 min of the recording. These mean images were then aligned to the z-stack using FFT convolution. We then took the median of this z-drift over time and used it to correct the theoretical imaging plane position. We then performed FFT-based registration to correct for a small shift in X and Y between the actual mean image and the reconstructed image. We thus found the position of the imaging planes (and therefore of each functional ROI) in the GCaMP z-stack. These were then aligned to the mCherry z-stack using the transformation described above (chromatic aberration in depth).

Aligning brain slices to the mCherry z-stack

To register the positions of the in-situ-detected inhibitory neurons to the 3D mCherry z-stack, we used a custom point cloud registration method, using inhibitory neurons as landmark points. MATLAB code and an example pipeline script can be found at https://github.com/ha-ha-ha-han/NeuromicsCellDetection/, and at https://github.com/sbugeon/NeuromicsCellDetection.

During slicing, the latero-medial order of the sagittal brain sections was carefully recorded. To find the sections corresponding to the imaged region, we first screened them by generating RCPs for every 20th section, and staining with the Gad1 bridge probe and its corresponding dye probe to label inhibitory neurons. The position of the fluorescent bead injection was usually visible on one of the sections, allowing us to infer the approximate position of every slice (based on the known order and thickness of slicing).

Fine registration of screened sections to the in vivo reference z-stack started with cell detection in vivo and ex vivo. To detect cells in the in vivo mCherry z-stack, each plane was contrast-normalized to correct for the loss of brightness with depth using the following MATLAB GUI https://github.com/nadavyayon/Intensify3D/blob/master/User_GUI_Intensify3D.m (which performs background and signal estimation based on user defined thresholds), and the z-stack was then filtered using a 3D median filter of radius 2 μm to reduce background noise. The mCherry-positive cells were automatically detected on these images using a 3D difference-of-Gaussians filter followed by watershed segmentation. Manual curation was performed to correct for missed or false positive detections. To detect inhibitory cells in the ex vivo slices, we used the in situ expression of Gad1 in the reference round, as native mCherry fluorescence was not preserved in the fresh-frozen sections. Gad1 detections formed clusters on GABAergic cells (Extended Data Fig. 2), which were detected by Gaussian smoothing of the Gad1 RCP images and applying a difference-of-Gaussian filter and watershed segmentation to detect individual clusters. Finally, we manually curated these detections using the full in situ expression of the 72 genes to determine putative interneurons on the basis of the main inhibitory cell markers such as Vip, Sst, Pvalb and so on.

The slices were first coarsely registered using brain structures (hippocampus, brain surface and so on) visualized using the anchor and nuclear staining. Next, they were finely registered using an algorithm to register a 2D point cloud, corresponding to inhibitory neurons in the ex vivo slice, into a 3D point cloud, corresponding to inhibitory neurons in the in vivo volume. To align these point clouds, we used rigid registration with 6 degrees of freedom (α, β, γ, x, y, z), where α, β, γ are the rotation angles, and x, y, z are translational shifts (non-rigid point cloud registration is possible, but we found it to be unnecessary). The registration algorithm searched for the parameters (αmax, βmax, γmax, xmax, ymax, zmax) that maximize the match of the 2D slice to the corresponding section of the 3D volume.

Because this registration problem has a large number of local maxima, we performed an exhaustive grid-search over these six parameters. Because Fourier convolution of 3D arrays is fast, but rotation of them is not, we used a hybrid point and Fourier method. An outer loop searches over all combinations of rotation angles (α, β, γ), with an initial step size of 1°, refined to 0.5° for finer alignment, and rotates the 3D point cloud accordingly. A 3D volumetric image is then synthesized from these rotated points by adding a Gaussian peak at the location of each point. Each plane z of this image is Fourier convolved with a fixed 2D array synthesized similarly for the 2D cloud, and the resulting 3D correlation map is stored, to accumulate a correlation score function c(α, β, γ, x, y, z). The top local maxima of this 6D array are found and ranked using both the intensity of the cross-correlogram peaks and the percentage of cells matched within a tolerance of 15 μm (to account for small non-rigid deformations). Finally, the match validity for each section was assessed manually by looking at the overlay between the interpolated cut from the reference z-stack and the Gad1 RCP image. The rotation and translation parameters were manually adjusted to provide the best overlay between the two datasets. Typical rotation angles were found between −10° and 10° of the coarse manual registration, enabling us to save computation time by searching only this range.

Aligning individual neurons

Finally, a custom MATLAB GUI was used to curate the match between inhibitory cells in the in vivo recordings and the ex vivo sections. The GUI allowed us to visualize the in vivo mCherry image of each cell (obtained from the reference z-stack), the position of the ROIs on the reference z-stack and the overlap between the reference z-stack cross-sections and the in situ gene expression for the different genes. For each slice, we displayed all mCherry-positive ROIs that were less than 10 µm away from the found position of the slice in the reference z-stack. Each assignment of in vivo and ex vivo Gad-positive cells was curated manually on the basis of these data. At this stage, the boundaries that were initially found using automatic segmentation of the DAPI image were also manually adjusted for the matched cells and their neighbours, to correct for errors in DAPI segmentation that could affect the gene and cell-type assignment. This correction was based both on the DAPI image and on the in situ gene expression, which provided information that could indicate under-splitting in the DAPI segmentation of adjacent cells. We took a conservative approach to this manual curation process, discarding all imaged cells for which the match to ex vivo slices was not absolutely clear (around 50% of cells).

Cell selection

We recorded a total of 3,469 (204 ± 42 per session) inhibitory cells and 6,684 (393 ± 173 per session) excitatory cells. Of these inhibitory cells, 1,515 (89 ± 31 per session) cells could be aligned to the ex vivo slices with good confidence, and were thus assigned a transcriptomic identity (see Supplementary Data 1). Some ex-vivo-identified cells were recorded in multiple imaging sessions. In all figures, a unique session was picked for each matched cell (except for Fig. 2, in which we show all cells in a single session, and for pairwise correlations that used all cells in all sessions: Figs. 3a and 5d and Extended Data Figs. 6b and 8c–e). The session assigned was chosen according to the percentage of time that the mouse spent running during this session, to maximize variability of behaviour while the cell was recorded. After removing these duplicates, we obtained 1,090 unique cells. Of these cells, 17 cells were removed because their maximal posterior probability was less than 0.5. Finally, 8 cells that were assigned to subtypes with fewer than 3 cells in total were discarded. The final population of 1,065 cells belonged to 35 transcriptomic subtypes.

For hierarchical analysis, the 35 subtypes were grouped into 11 types corresponding to putative anatomical or physiological cell types based on the previous literature. For Pvalb neurons, the grouping was unambiguous: the PvalbVipr2 subtype is genetically very different to all other Pvalb subtypes, and several studies have identified molecular markers of this subtype with chandelier cells3,4,7,79. For Sst cells, UMAP analysis (Extended Data Fig. 3) suggests that the two SstTac1 subtypes bridge a continuum between the two SstCalb2 subtypes (identified as superficial-layer Martinotti cells7,80,81) and the PvalbTpbg subtype (identified as superficial-layer Pvalb basket cells7). Patch-seq analysis confirms that SstTac1 cells have less axon in L1 and faster-spiking phenotypes than classical Martinotti cells7. We therefore identify the two SstTac1 subtypes as non-Martinotti Sst cells, acknowledging that these two Sst types are likely to tile a continuum, rather than truly being discrete cell groups. For Lamp5 cells, we grouped subtypes on the basis of the results of previous studies82,83. The three subtypes that comprised the Lamp5Npy group were identified as neurogliaform cells on the basis of their strong expression of Npy. The Lamp5Fam19a1Tmem182 subtype was identified as canopy cells owing to expression of Ndnf but not Npy; the two remaining subtypes were identified as α7 cells owing to their strong expression of Chrna7 and weak expression of Ndnf and Npy. For Vip cells, we divided subtypes by transcriptomic methods: UMAP analysis suggested a clear discrete distinction between two Vip subtypes characterized by expression of Reln as well as weaker expression of Vip itself. We are not aware of any specific study on these VipReln cells; however, on the basis of their weak Vip expression and the fact that Reln is usually a L1 marker, we provisionally identify this type with the L1 VIP cells described previously82. Serpinf1 subtypes were included with the Vip category as we do not see strong evidence for this as a discrete subclass. Finally, Sncg subtypes were divided into two types according to Vip expression, with SncgVip and SncgPdzrn3 identified as small and large Cck cells, respectively84,85.

Data analysis

Modulation index

When comparing activity in two conditions (for example, visual stimulus versus grey screen; large versus small grating; running versus stationary synchronized), we used a modulation index computed as

$${\rm{Mod}}\,{\rm{i}}{\rm{n}}{\rm{d}}{\rm{e}}{\rm{x}}\,=\frac{R-B}{R+B},$$

in which R is the mean activity in the first condition (for example, during the response time window) and B is the mean activity in the second condition (for example, during the baseline time window).

Cell depth comparison to a previous Patch-seq study

For the analysis validating coppaFISH subtype calling using cell depth (Fig. 1j), we used cells of all layers, not just the in-vivo-imaged cells of L1–L3. We used 14 sections for which gene expression was obtained from L1–L6 (all taken from the same mouse). DAPI segmentation was manually curated (see above) in all layers, and cell calling was performed on these sections using the standard method. This provided the cortical depth for about 47,000 cells, among which 2,130 were assigned to a GABAergic subtype. We normalized the measured cortical depth by the maximum cortical depth in these sections (750 μm) and computed the median cortical depth for each subtype with at least 4 cells (46 such subtypes were found). We then did the same thing for the Patch-seq data of a previous study7, which gave 42 subtypes with more than 4 cells. We then compared the cortical depth of the subtypes with at least 4 cells in both datasets (33 subtypes in total; Fig. 1j).

Determining behavioural states

To distinguish the three main behavioural states during spontaneous behaviour, we used the running speed of the mouse as well as the strength of cortical oscillations. Running speed was measured by optical sensors facing the air-suspended ball86, and was smoothed with a 2-s moving average filter. We considered the mouse stationary if this smoothed speed was less than 0.3 cm s−1, and running otherwise. To distinguish between the synchronized and desynchronized stationary states, we first computed the first principal component (PC) of excitatory cells’ activity using PCA, which revealed cells more active in passive or alert states, as previously described34. The activity of the 10% of cells with most negative weight on this PC was averaged, which provided a clear summary of the oscillation that appeared in some stationary periods (Fig. 2). Periods of synchronized activity were segmented manually based on the periods in which this average was clearly oscillating. To measure the oscillatory coupling of each inhibitory neuron, we then computed the Pearson correlation between each cell’s z-scored activity and the average of this excitatory subpopulation during the synchronized periods.

Comparison to transgenic mouse line data

To validate our cell-type assignment, we compared the results obtained with post-hoc transcriptomic data with recordings performed using transgenic mouse lines (Extended Data Fig. 5). We analysed recordings from 18 transgenic mice (5 for Pvalb, 8 for Sst and 5 for Vip; 14 mice were re-analysed from ref. 30 and 4 new mice were added) and 23 sessions (6 for Pvalb, 9 for Sst and 8 for Vip) for a total of 2,589 identified cells (1,023 Pvalb, 572 Sst and 994 Vip cells).

For this analysis (Extended Data Fig. 5), we first deconvolved the calcium traces to inferred firing rates fi(t) for each neuron i at time t (ref. 53). We considered two measures of neural activity for each cell i and trial n: the average neural activity $${r}_{i}(n)={\langle {f}_{i}(t)\rangle }_{t\in [{t}_{n},{t}_{n}+{\rm{\bigtriangleup }}T]}$$ during stimulus presentation from the trial onset time tn to time tn + ∆T, and the average neural response di(n) = ri(n) − bi(n), obtained after subtracting the pre-stimulus baseline activity $${b}_{i}(n)={\langle {f}_{i}(t)\rangle }_{t\in [{t}_{n}-{\rm{\bigtriangleup }}T{,t}_{n}]}$$. The time window parameter ∆T took the value 1 s for the data from ref. 30 and 0.5 s for the new transgenic data and the post-hoc transcriptomic data, corresponding to the whole duration of the stimulus. We then computed the average activity and response for a given stimulus s and locomotion condition v (v = 0: stationary, v = 1: running): $${\bar{r}}_{i}(s,v)={\langle {r}_{i}(n)\rangle }_{n\epsilon \{s,v\}}$$ and $${\bar{d}}_{i}(s,v)={\langle {d}_{i}(n)\rangle }_{n\epsilon \{s,v\}}$$. We estimated the responsiveness of each neuron i to visual stimuli by computing the P value pi of a paired t-test comparing ri(n) with bi(n) for all trials n (pooling all different stimulus types to obtain one P value per cell). For all subsequent analysis, we selected only cells with P values < 0.05. Modulation of visual response by running was computed as follows: first, we computed the average responses $${\bar{d}}_{i}(v)={\langle {d}_{i}(n)\rangle }_{n\in v}$$ and standard deviation $$\,{\sigma }_{i}(v)={\sigma }_{i,t\in v}({d}_{i}(t))\,$$across all stimuli for running and stationary trials (v = 0 and v = 1, respectively). We then computed a modulation index as follows:$$\,{M}_{i}^{\left(3\right)}=\frac{{\bar{d}}_{i}\left(1\right)-{\bar{d}}_{i}\left(0\right)}{\sqrt{{\sigma }_{i}\left(1\right)+{\sigma }_{i}\left(0\right)}}$$. This index was then normalized for each recording session k as follows $${M}_{i}^{\left(3\right)}\to {M}_{i}^{\left(3\right)}/{\sigma }_{i\in k}\left({M}_{i}^{\left(3\right)}\right)$$. We plotted the average modulation of visual responses by running versus the Pearson correlation coefficient of spontaneous activity and running speed ρi (Extended Data Fig. 5a). Before computing the Pearson correlation coefficient, we smoothed the activity fi(t) and running speed v(t) with a time average of 5 s. For this analysis, we selected only cells that had a cortical depth more superficial than −300 μm.

For estimating size tuning curves (Extended Data Fig. 5b), we z-scored the activity of each neuron as follows $${\bar{z}}_{i}\left(s,v\right)=\left[{\bar{r}}_{i}\left(s,v\right)-{\left\langle {\bar{r}}_{i}\left(s,v\right)\right\rangle }_{s,v}\right]/$$ $${\sigma }_{s,v}\left({\bar{r}}_{i}\left(s,v\right)\right)$$ before averaging over cells of a given type.

To evaluate consistency between the physiological features identified with transgenic and transcriptomic cell-type identification, we trained a classifier to predict cell type from physiological features of each cell in the transgenic lines, and asked whether it generalized to the transcriptomic data (Extended Data Fig. 5c). We trained the classifier using 1,230 training cells (410 examples per cell type for the three cell types). The prediction was based on 14 features, which included normalized values of neural activity during different stimulus size and running condition $${\bar{z}}_{i}\left(s,v\right)$$ (features 1–8); skewness of the calcium trace computed across the whole recording session (feature 9); the correlation of spontaneous activity with running speed ρi (feature 10); the ROI diameter (feature 11); the cortical depth (feature 12); and two different measures of the difference in modulation by running: $${M}_{i,s={60}^{^\circ }}^{(1)}-{M}_{i,s={60}^{^\circ }}^{(1)}$$ between large and small stimuli, in which $${M}_{i,s}^{(1)}=$$ $$[{\bar{r}}_{i}(s,v=1)-{\bar{r}}_{i}(s,v=0)]/{\sigma }_{s,v}({\bar{r}}_{i}(s,v)\,)$$; and $${M}_{i,s={60}^{^\circ }}^{(2)}-{M}_{i,s={5}^{^\circ }}^{(2)}$$, in which $${M}_{i,s}^{(2)}=$$ $$[{\bar{d}}_{i}(s,v=1)-{\bar{d}}_{i}(s,v=0)]/\langle {\bar{r}}_{i}(s,v)\rangle {}_{s,v}$$ (features 13 and 14). We normalized features 9–14 by z-scoring them using the mean and standard deviation for each neuron of the transgenic mice, whereas features 1–8 were already normalized as in Extended Data Fig. 5b. We used cell types: y = {Pvalb, Sst, Vip} as training labels. Using the 10 different randomized splits of training and test transgenic data, we applied three different linear classifiers: linear discriminant analysis; logistic regression (regularization parameter C = 10); and linear support vector classification (regularization parameters C = 0.1). The regularization parameters were chosen after a fourfold cross-validation over the different randomized training sets scanning over C = {10−3,102,…102}. Applying the classifier to transcriptomic data gave equivalent performance to test-set transgenic data, indicating that the two methods are consistent.

Response to drifting gratings

Responsive cells (either activated or suppressed) were defined using a repeated measures ANOVA model (fitrm in MATLAB) with the stimulus direction (12 levels) and size (3 levels) as between-subject factors, and the presence of stimulus as a within-subject factor. A cell was defined as responsive if there was a significant effect of stimulus presence after performing a repeated measures analysis of variance (ranova in MATLAB). Significant cells were classified as activated if the mean activity in the response window was above baseline, or suppressed otherwise.

Orientation selectivity Index (OSI) was computed using a cross-validation method. Each cell’s preferred orientation was computed from even trials; selectivity was computed as:

$${\rm{OSI}}\,=\,\frac{({R}_{{\rm{pref}}}\,-\,{R}_{{\rm{ortho}}})}{({R}_{{\rm{pref}}}\,+\,{R}_{{\rm{ortho}}})},$$

in which Rpref is the mean response on the odd trials to the preferred orientation and Rortho is the mean response on the odd trials to the orthogonal orientation (preferred orientation + 90°). This cross-validation was used because non-cross-validated selectivity indices can show large values for sparse neural activity, even if the cells are untuned. The cross-validated measure can take negative values, which indicate inconsistent responses, and will have an expected value of 0 for untuned cells.

Direction selectivity Index (DSI) was obtained similarly. Each cell’s preferred direction was computed from even trials, selectivity was computed as:

$${\rm{DSI}}\,=\frac{\left({R}_{{\rm{pref}}}\,\mbox{–}\,{R}_{{\rm{anti}}}\right)}{\left({R}_{{\rm{pref}}}\,+\,{R}_{{\rm{anti}}}\right)},$$

in which Rpref is the mean response on the odd trials to the preferred direction and Ranti is the mean response on the odd trials to the direction opposite to the preferred (Rpref + 180°).

Size tuning curves and the state modulation of visual response (Fig. 4d,e and Extended Data Fig. 7c) were computed using the methods of a previous study30. Analysis was restricted to cells that had receptive field locations close to the centre of the grating stimuli (<20°). Size tuning curves were obtained for running and stationary states by averaging the z-scored activity of all centred cells of that type (z-scoring was computed relative to the entire drifting grating presentation). Baseline activity (shown as response to size 0 stimuli) was estimated as the average of the z-scored activity during the interstimulus intervals. For both the stimulus response and the baseline, we determined whether the mouse was running or stationary by taking the average running speed during the stimulus presentation. If this speed exceeded 1 cm s−1, we considered the mouse as running, and stationary otherwise.

Cross-validated direction tuning curves (Fig. 4b) were computed for all cells using the average across all sizes. A cell’s preferred direction was estimated as the direction providing the largest response on even trials. Direction tuning curves were computed by averaging the z-scored activity of each cell on odd trials, for each direction relative to this preferred direction. The curve was normalized by dividing by the mean response to the preferred direction (on the even trials). These normalized curves were then averaged over all cells in a subtype (Fig. 4b). The use of cross-validation means that tuning curves do not automatically have a peak at zero; for a cell with no sensory tuning the preferred direction measured on even trials would have no relationship to odd trial responses, and so the tuning curve would appear flat or random.

Pairwise correlations between types

To compute spontaneous correlations between the mean activity of cell groups (Figs. 3a and 5d and Extended Data Figs. 6b and  8c–e), we first normalized each cell’s deconvolved activity by dividing it by its maximum. For each experiment, we then averaged the normalized activity of each cell within a group during grey-screen periods, smoothed with a 1-s boxcar window, and decimated the sampling rate to 1 Hz. When the number of cells in a type was less than 2, the correlation was not computed for that experiment. We computed the Pearson correlation between each group’s mean activity and averaged over experiments. For the intra-type correlations, we randomly split the cells of each group in two halves and applied the same method, to avoid trivially obtaining a correlation of 1. When the number of cells in a type was less than 4, the correlation was not computed for that experiment.

Response to natural images

We summarized a cell’s response to natural image stimuli with two numbers (Fig. 4d and Extended Data Fig. 7f,g). Responsiveness was defined as a modulation index between activity during the stimulus presentation period and the activity just before stimulus onset. Signal correlation was defined by correlating the responses to the first repeat of the 1,000 images with the responses to the second repeat of these same images. This metric characterizes a cell’s selectivity to these image stimuli87.

Transcriptomic PCA

To compute tPC1, we used the in situ gene expression of the 72 genes for each of the 1,065 unique cells that were imaged in vivo and transcriptomically identified. We performed PCA on this 72 by 1,065 matrix, and took the score of the first component to get tPC1 for each cell. To obtain tPC1 values for cells in Patch-seq (Extended Data Fig. 9b), the same weight vector was used and read counts were transformed by log(1 + x).

Multiple linear regression using transcriptomic PCs

To assess the fraction of variance explained by transcriptomic PCs (Extended Data Fig. 8b), we performed multiple linear regression, using leave-one-out cross-validation to quantify how well each cell’s state modulation could be predicted from increasing numbers of PCs. The fraction of variance explained by this multiple linear regression was then compared to the fraction of variance explainable by a cell’s subtype, type or subclass assignment, assessed again by leave-one-out cross-validation, predicting a cell’s state modulation value as the average modulation of its subclass, type or subtype on the training set.

UMAP analysis of previous scRNA-seq data

We performed a UMAP analysis on a previous scRNA-seq dataset3, separately for caudal ganglionic eminence (CGE) (Vip, Sncg and Lamp5)- and medial ganglionic eminence (MGE) (Pvalb and Sst)-derived inhibitory subclasses from V1 only (Extended Data Fig. 3).

To do so, we used methods that have previously been described for CA110. First, a set of 150 genes was found using the ProMMT clustering algorithm. Then 150-dimensional expression vectors were made for each cell, applying a log(2 + x) transform to the scRNA-seq expression levels of these genes. UMAP analysis was performed using a MATLAB toolbox88, initialized by placing the classes around a unit circle in order of similarity.

The genes automatically selected to perform the UMAP analysis were: Vip, Tac2, Sst, Pdyn, Lamp5, Tac1, Crh, Calb1, Penk, Calb2, Th, Cxcl14, Ndnf, Spp1, Htr3a, Cplx3, Pvalb, Crhbp, Npy, Npy2r, Chodl, Crispld2, Prss23, Nov, Cbln2, Cartpt, Akr1c18, Atp6ap1l, Cadps2, Ppapdc1a (also known as Plpp4), Sncg, Tnfaip8l3, Unc13c, Pdlim3, Scgn, Pcp4, Tcap, Lgals1, Serpine2, Moxd1, Pthlh, Cd34, Cck, Sostdc1, Spon1, Gm39351, Mia, Slc5a7, Pde1a, Adarb2, Mybpc1, Car4, Cbln4, Gabrg1, Fmo1, Slc18a3, Grpr, Lypd6, Pde11a, Rxfp1, Tnnt1, Nxph2, Lpl, Cryab, Cp, Npy1r, Id3, Myl1, Id2, Kit, Serpinf1, Bcar3, Aqp5, Scrg1, Gpd1, Rxfp3, Prox1, Col25a1, Chat, Vwc2l, Amigo2, Myh8, Synpr, Grm8, Igfbp5, Gpx3, Rgs12, Lypd1, Cd24a, Reln, Hapln1, Sln, Chrm2, Ostn, Igfbp7, Prox1os, Atf3, Lect1, Gpc3, Ptprk, Teddm3, Il1rapl2, Col6a1, Nek7, Crispld1, Wif1, Wnt5a, Bmp3, Thrsp, Syt2, Pcdh20, Sfrp2, Myh13, Efemp1, Rprm, Cacna2d1, Lypd6b, Meis2, Lhx6, Angpt1, Rspo1, Sema3c, Itih5, Nfix, Sema3a, Stk32a, Ecel1, Jam2, Igfbp6, Sox6, Nfib, Sall1, Sema5b, Shisa8, Tacr3, Chst7, Frmd7, Gm31465, Rspo4, Chrna2, Lmo1, C1qtnf7, Ndst4, Ccdc109b (also known as Mcub), Npas1, Egfr, S100a10, Gpr6, Slit2, Lsp1.

Correlation with electrophysiological and morphological properties

We examined electrophysiological and morphological correlates of our results by relating them to a previously published Patch-seq dataset7, which provided electrophysiological, morphological and gene expression data from a set of V1 inhibitory cells analysed in vitro. These cells had been genetically assigned to the same transcriptomic clusters that we used3, which allowed us to correlate electrophysiological and morphological properties to the state modulation measured in our own dataset. Valid electrophysiological recordings were available for 4,391 cells and included long and short pulses of current injection as well as current ramps. We used the electrophysiological parameters calculated by the original authors using the ipfx software, renaming ‘up/down ratio’ (the absolute ratio of the slopes of the upward and downward components of the action potential) as ‘spike shape index’. Adaptation index was the rate at which spiking changed during a long depolarizing square stimulus. During a hyperpolarizing square current, the membrane time constant tau is the rate of approach of steady state, and sag is the downward deflection before steady state is reached. Capacitance was calculated as the ratio between measured tau and resistance.

We quantified the ratio of axon in each layer using morphological reconstructions obtained following Patch-seq. To enable comparison to our two-photon data, we only examined reconstructed cells with somas in L1–3 that belonged to one of the 35 subtypes we recorded from, for a total of 163 cells. Morphology was represented as an acyclic undirected graph with a position and radius associated with each node. A pair of adjacent nodes (a segment) fell within a layer if both nodes had cortical depths within the layer boundary. Segments that fell on a layer boundary (less than 4% of segments for each cell) were not classified into a layer, and segments that entered the white matter or pia were excluded. The surface area of all within-layer segments was computed using the distance between nodes and their radii. The within-layer surface area ratio is the sum of the surface area of segments within a layer divided by the total surface area of all segments.

tPC1 was computed for each Patch-seq cell using the same 72 genes and weightings found from our coppaFISH data, with gene expression transformed as log(1 + x).

Processing of eye video (pupil detection)

Eye videos were processed using facemap (https://github.com/MouseLand/facemap). An ROI was drawn manually around the pupil of the mouse. The pupil area was defined as the area of a Gaussian fit on thresholded pupil frames, in which pixels outside the pupil were set to zero.

Statistical analyses

Statistical analysis of differences between cell types faces two potential confounds. First, different experiments will by chance record different proportions of each cell type, and may also by chance show other experiment-to-experiment differences such as overall alertness levels. Second, the large number of subtypes presents a potential multiple comparisons problem.

To solve these problems, we devised a nested permutation test. First, an omnibus test asks whether subclass, type and subtype have a significant main effect on our quantity of interest y; there is no multiple comparisons problem for this omnibus test, and all shuffling occurs within an experiment to avoid conflating experiment-to-experiment variability with differences between cell types. The omnibus test is conducted at each of the three levels in a nested manner (Extended Data Fig. 6a): the first asks whether there is a main effect of subclass; the second whether there is a main effect of type beyond that predicted by subclass; and the third whether there is a main effect subtype beyond that predicted by type. After the omnibus test, post-hoc tests are used to ask whether significant differences between types exist within each individual subclass, and whether significant differences between subtypes exist within each individual type (Extended Data Fig. 6a). Additional post-hoc tests are used to ask whether the quantity is significantly different to zero for each subclass, type and subtype. All post-hoc tests are corrected for multiple comparisons using the Benjamini–Hochberg procedure.

To test for a main effect of subclass on a quantity y, the omnibus test computes its mean value of $${\bar{y}}_{f}$$ for each subclass f, and uses as test statistic the variance of $${\bar{y}}_{{f}}$$ across subclasses. To obtain a P value, this test statistic is compared to a null ensemble obtained after 10,000 random shufflings of the subclass label of each cell, separately within each experiment. To test for a main effect of type, we compute the mean $${\bar{y}}_{c}$$ of y for each type c, and use as test statistic the variance of this mean across types. A null distribution is obtained by 10,000 shufflings of type labels separately within each experiment and subclass. To test for a main effect of subtype, we use as test statistic the variance of $${\bar{y}}_{s}$$ over subtypes s A null distribution is obtained by recomputing this statistic after shuffling subtype labels 10,000 times, separately within each type and experiment.

To perform the post-hoc test for significant differences between the types within a specific subclass (indicated by P values on the far right of Fig. 3b and similar), or for significant differences between subtypes within a specific type (indicated by stars second to right in Fig. 3b and similar), we performed the same shuffle test inside individual subclasses and types. For example, to obtain the P value for significant differences of subtypes within the PvalbTac1 type, we used as test statistic the variance of $${\bar{y}}_{s}$$ across the 5 subtypes inside this type, and compared it to 10,000 shufflings of the subtype labels inside this same type. These post-hoc P values were then corrected using the Benjamini–Hochberg procedure. For post-hoc tests of whether a subclass, type or subtype is significantly different to zero, we used Benjamini–Hochberg-corrected t-tests.

For the nested permutation test on pairwise correlations (Fig. 3a and Extended Data Fig. 6b), we used the same shuffling procedure, using as test statistic the difference between the mean of intra-group correlations and the mean of inter-group correlations across all experiments and cell groups.

All P values for the nested permutation test are one-tailed.

For linear correlations (Figs. 1j, 3e and 6a,b and Extended Data Figs. 6e and 9a,b), we show the P value for the Pearson correlation coefficient. To exclude the possibility of conflating experiment-to-experiment variability with differences between cell types, we used ANCOVA controlling for a discrete effect of recording session (Figs. 3c,d and 5c and Extended Data Fig. 8a) quoting the significance of a main effect of the continuous variable. ANCOVA was also used to test whether a continuous transcriptomic variable assigned to each cell correlated significantly with state modulation even after controlling for subtype and recording session (Fig. 3e and Extended Data Fig. 6e) and for subclass and recording session (Fig. 5c), and whether cortical depths of each subtype measured by coppaFISH and Patch-seq were correlated even within a subclass or type (Fig. 1j).

To test for the effect of tPC1 on pairwise correlations (Fig. 5d and Extended Data Fig. 8c–e), we sorted types by tPC1 and computed their pairwise correlation matrix as described above. We used a permutation test to ask whether values close to the diagonal were larger than values far from the diagonal. As test statistic we used the difference between the mean correlation values one or two steps away from the diagonal, and the mean of all other type pairs (Extended Data Fig. 8f). We constructed a null distribution by recomputing this statistic after permuting the order of the types 10,000 times. Again, the P values are one-tailed for this shuffling test.

Box plots

To show the distributions of physiological properties within a cell population, we used box plots (Figs. 3b and 4c,d and Extended Data Figs. 6c,d and 7a–g). In these plots, the central black circles represent the median; the left and right edges of boxes represent the 25th and 75th percentiles; and the whiskers extend to the most extreme data points (excluding outliers more than 1.5 times the interquartile range from the box, which are plotted as small black dots).

Statistics and reproducibility

The cell shown in Extended Data Fig. 1c is a representative example of the 1,090 cells that were recorded and processed with coppaFISH. The registration example shown in Fig. 1b–e is a representative example of the 99 slices (over n = 4 mice) that we have successfully aligned to the in vivo z-stacks. The nine cells shown in Extended Data Fig. 4a,b are representative examples of the 117 molecularly identified cells recorded in the same session as shown in Fig. 2. All experiments were performed on four independent mice, and the results were reliably replicated across all mice. We did not use statistical methods to pre-determine sample sizes. However, our sample sizes are similar to those reported in previous publications using a similar approach25,26. Our study did not contain experimental groups, so randomization and blinding do not apply.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.