### Fabrication of density-modulated membranes

Table of Contents

We use the soft clamping^{6,49,50,51,52,53} technique to realize ultrahigh mechanical quality factors. Our membrane design is inspired by those pioneered in ref. ^{7}, but we use a different material for the nanopillars and a different fabrication process (see Supplementary Information for more details). We fabricated density-modulated PNC membranes by patterning amorphous silicon (aSi) nanopillars on a high aspect ratio Si_{3}N_{4} membrane. In our PNC membranes, we fabricated pillars with diameters *d*_{pil} = 300–800 nm, thickness of about 600 nm and nearest-neighbour distances *a*_{pil} = 1.0–2.0 μm. Amorphous silicon is grown with plasma-enhanced chemical vapour deposition (PECVD) at a temperature of 300 °C. Electron-beam lithography (FOx16 electron-beam resist) and dry etching (using a plasma of SF_{6} and C_{4}F_{8}) are used to pattern pillar arrays in aSi. Dry etching is stopped on a 6-nm layer of HfO_{2} (hafnium oxide) grown with atomic layer deposition (ALD) directly on top of Si_{3}N_{4}. HfO_{2} is used as an etch-stop layer because it is quite resistant to hydrofluoric acid (HF) etching, and the undercut created at the pillar base in the following process steps is limited. Undercut minimization is important to control the added dissipation induced by pillar motion (Supplementary Information). We remove the FOx mask and the residual etch-stop layer by dipping the wafer in HF 1% for about 3.5 min.

After patterning the pillars, we encapsulate them in a PECVD Si_{x}N_{y} layer to protect them during the silicon deep etching step. We first grow a thin (about 20 nm), protective layer of Al_{2}O_{3} with ALD, to shield the membrane layer from plasma bombardment during PECVD. Then, approximately 125 nm of Si_{x}N_{y} is grown at 300 °C, with 40 W of radio-frequency power exciting the plasma during deposition. This Si_{x}N_{y} layer has been characterized to have a tensile stress of around +300 MPa at room temperature. The layer perfectly seals the nanopillars during immersion in hot KOH, without significant consumption.

After patterning the pillars on the wafer frontside film, a thick (about 3 μm) layer of positive tone photoresist is spun on top for protection during the backside lithography process, which we perform with an MLA150 laser writer (Heidelberg Instruments). Optical lithography is followed by Si_{3}N_{4} dry etching with a plasma of CHF_{3} and SF_{6}. After the resist mask and protection layer removal with *N*-methyl-2-pyrrolidone (NMP) and O_{2} plasma, we deep-etch with KOH from the membrane windows while keeping the frontside protected, by installing the wafer in a watertight PEEK holder in which only the backside is exposed^{6}. KOH 40% at 70 °C is used, and the etch is interrupted when about 30–40 μm of silicon remains. The wafer is then rinsed and cleaned with hot HCl of the residues formed during KOH etching. Then, the wafer is separated into individual dies with a dicing saw, and the process continues chipwise. Chips are again cleaned with NMP and O_{2} plasma, and the deep-etch is concluded with a second immersion in KOH 40% at a lower temperature of 55 °C, followed by cleaning in HCl. From the end of the KOH etching step, the composite membranes are suspended, and great care must be taken while displacing and immersing the samples in liquid. We dry the samples by moving them to an ultrapure isopropyl alcohol bath after water rinsing. Isopropyl alcohol has a high vapour pressure, and quickly evaporates from the chip interfaces, with few residues left behind.

Finally, the PECVD nitride and Al_{2}O_{3} layers can be removed selectively with wet etching in buffered HF. The chips are loaded in a Teflon carrier in which they are vertically mounted and immersed for about 3 min 20 s in BHF 7:1. It is crucial not to etch more than necessary to fully remove the encapsulation films: membranes become extremely fragile and the survival yield drops sharply when their thickness is reduced below around 15 nm. The membranes are then carefully rinsed, transferred in an ethanol bath and dried in a critical point dryer, in which the liquids can be evacuated gently and with little contamination.

### Fabrication and simulation of phononic-crystal-patterned mirrors

The top and bottom mirror substrates are, respectively, fused silica and borosilicate glass, with a high-reflection coating sputtered on one face and an anti-reflection layer coating the other face. No layer for the protection of the optical coating is applied before machining. We use a dicing saw for glass machining to pattern a regular array of lines into the mirror substrates. The blade is continuously cooled by a pressurized water jet during the patterning process. The maximum cut depth allowed for our blade is 2.5 mm, and we constrain the designed PNC accordingly. We cut the flat bottom mirror from only one side (its thickness is only 1 mm), and the top mirror is patterned symmetrically with parallel cuts from both sides, as it is 4 mm thick. The relatively deep cuts in the top mirror need to be patterned over several passes, with gradually increasing depths. After patterning one mirror side, the piece is flipped and the other side is patterned after aligning to the first cuts, visible through the glass substrate. The lines are arranged in a square lattice for simplicity, although more complex patterns can be machined with the dicing saw. After the dicing process, the mirrors are subject to ultrasonic cleaning, while immersing first in acetone and then in isopropanol.

We simulate the band diagrams of the unit cells of both the top and the bottom mirrors in COMSOL Multiphysics with the Structural Mechanics module. We optimized the lattice constant and cut depths to maximize the bandgap width, while centring the bandgap around 1 MHz and making sure that the remaining glass thickness is sufficient to maintain a reasonable level of structural stiffness. Details of the PNC dimensions are shown in the Supplementary Information. Owing to the finite size of the mirrors, we expect to observe edge modes within the mechanical bandgap frequency range. The thermal vibrations of these modes penetrate into the PNC structure with exponentially decaying amplitudes. To account for their noise contributions, we simulated the frequency noise spectrum of the MIM assembly (details shown in the Supplementary Information). The eigenfrequency solution confirmed the existence of edge modes with frequencies within the mechanical bandgap, but did not predict any significant contribution to the cavity frequency noise: the PNC is sufficiently large to reduce their amplitude at the cavity mode position.

After patterning the PNC structures on the mirrors, we assembled a cavity with a spacer chip in place of a membrane and observed that the TE_{00} linewidth with the diced mirrors is identical to that of the original cavity. This indicates that our fabrication process does not cause measurable excess roughness or damage to the mirror surfaces. By contrast, when the assembly was clamped too tightly, excess cavity loss occurred because of significant deformation of the PNC mirrors, with a reduced stiffness. We mitigate this detrimental effect in the experiment by gently clamping the MIM cavity, with a spring compression sufficient to guarantee the structural stability of the assembly. We also ensure that the cavity mode is well-centred on the bottom mirror, to reduce the thermal noise contribution of the upper band-edge modes. For the MIM experiment discussed in the main text, we did not observe any mirror modes within the mechanical bandgap of the membrane chip. We can distinguish membrane modes from mirror modes by exploiting the fact that the coupling rates of membrane modes vary between different cavity resonances, whereas this is not the case for mirror modes.

### Nonlinear noise cancellation scheme

At room temperature, the large thermal noise of the cavity, combined with the nonlinear cavity transduction response, results in a nonlinear mixing noise (TIN). This noise could lead to excess intracavity photon fluctuations and also to excess noise in optical detection. In the following, we discuss the strategy to cancel these effects in the fast-cavity limit (*ω* ≪ *κ*). Theoretical derivations and a discussion of the effect of a finite *ω*/*κ* ratio can be found in the Supplementary Information.

In the experiment, we pump the cavity at the magic detuning, \(2\overline{\varDelta }/\kappa =-1/\sqrt{3}\), in which the nonlinear photon number noise is cancelled, to prevent excess oscillator heating due to nonlinear classical radiation pressure noise. To show the quantum correlations leading to optomechanical squeezing and conduct measurement-based state preparation, we need to perform measurements at arbitrary optical quadrature angles. Balanced homodyne detection provides the possibility of tuning the optical quadrature, but it does not offer enough degrees of freedom to cancel the nonlinear noise in detection. However, if the local oscillator is injected from a highly asymmetric beam splitter with a very small reflectivity (*r* ≪ 1) and the combined field is detected on a single photodiode, the photodetection nonlinearity is maintained and offers enough degrees of freedom to cancel the nonlinear noise in detection^{4} (for a derivation, see Supplementary Information). Specifically, simultaneous tuning of local oscillator amplitude and phase enables nonlinear mixing noise cancellation at arbitrary quadrature angles. In the fast-cavity limit, the cancellation condition is

$$\begin{array}{l}\left|\frac{{\overline{a}}_{{\rm{sig}}}}{{\overline{a}}_{\hom }}\right|=2{\rm{Re}}\,\left[\frac{{{\rm{e}}}^{-{\rm{i}}\theta }}{{(-{\rm{i}}\overline{\varDelta }+\kappa /2)}^{2}}\right]\,\left[{\overline{\varDelta }}^{2}+{\left(\frac{\kappa }{2}\right)}^{2}\right]\\ \,=2\cos [\theta -2\arg ({\chi }_{{\rm{cav}}}(0))],\end{array}$$

where \({\overline{a}}_{\hom }\approx {\overline{a}}_{{\rm{sig}}}+r{\overline{a}}_{{\rm{LO}}}\) is the coherent combination of the signal field \({\overline{a}}_{{\rm{sig}}}\) and the local oscillator field \({\overline{a}}_{{\rm{LO}}}\) (defined as the field before the beam splitter), *θ* = *θ*_{hom} − *θ*_{sig} is the quadrature rotation angle and \({\chi }_{{\rm{cav}}}(0)={\left(\kappa /2-i\overline{\varDelta }\right)}^{-1}\) is the cavity d.c. optical susceptibility.

In the experiment, to detect a certain quadrature angle while cancelling nonlinear noise, we lock the homodyne power at the corresponding combined field level \({I}_{\hom }=| {\overline{a}}_{\hom }{| }^{2}\). We then continuously vary the local oscillator power using a tunable neutral density filter until the noise in the mechanical bandgap is perfectly cancelled. The level of mixing noise is very sensitive to the local oscillator power, and therefore the cancellation point can serve as a good indicator of the measured quadrature angle *θ*. Knowing the field amplitudes \(| {\overline{a}}_{\hom }| ,| {\overline{a}}_{{\rm{sig}}}| \) and that \(\overline{\varDelta }=-\kappa /(2\sqrt{3})\), we can reconstruct the measured quadrature angle as the one satisfying the cancellation condition.

A detailed characterization of the nonlinear mixing noise and an analysis of single-detector homodyne efficiency can be found in the Supplementary Information.

### Multimode Kalman filter

The continuous position measurement of an oscillator at frequency *Ω*_{m} can be viewed as a form of heterodyne measurement of two orthogonal mechanical quadratures of motion \(\widehat{X}\) and \(\widehat{Y}\) that rotate with frequency *Ω*_{m}. IQ demodulation can then be carried out at the mechanical frequency *Ω*_{m}. This results in two independent measurement channels of two orthogonal mechanical quadratures with independent measurement noise.

We work in a parameter regime in which the measurement rate is significantly smaller than the frequency of the mechanical mode, such that we can perform IQ demodulation of the mechanical motion at *Ω*_{m} to obtain the slowly varying \(\widehat{X},\widehat{Y}\) quadratures. Their evolution is described by decoupled quantum master equations^{33}. In this parameter regime, only thermal coherent states are prepared through the measurement process. These states are essentially thermal states displaced from the origin of the phase space and belong to the larger group of Gaussian states.

We operate in the fast-cavity limit *Ω*_{m} ≪ *κ*, so the cavity dynamics are simplified in our modelling. After IQ demodulation, the normalized photocurrent signal is described by

$${\bf{i}}(t){\rm{d}}t={\rm{d}}{\bf{W}}(t)+\sum _{i}\sqrt{4{\varGamma }_{{\rm{meas}}}^{i}}\langle {\widehat{{\bf{r}}}}_{i}\rangle (t){\rm{d}}t$$

(1)

where the subscript *i* denotes different mechanical modes, \({\bf{i}}=\left[\begin{array}{c}{i}_{X}\\ {i}_{Y}\end{array}\right]\), \({\widehat{{\bf{r}}}}_{i}=\left[\begin{array}{c}{\widehat{X}}_{i}\\ {\widehat{Y}}_{i}\end{array}\right]\) and \({\rm{d}}{\bf{W}}=\left[\begin{array}{c}{\rm{d}}{W}_{X}\\ {\rm{d}}{W}_{Y}\end{array}\right]\). The Wiener increment d*W*_{X,Y}(*t*) = *ξ*(*t*)d*t* is defined in terms of an ideal unit Gaussian white noise process \(\langle \xi (t)\xi ({t}^{{\prime} })\rangle =\delta (t-{t}^{{\prime} })\).

As the measurement is purely linear, the system remains in a Gaussian state^{54}, and the dynamics are completely captured by the expectation values of the quadratures ⟨*X*_{i}⟩, ⟨*Y*_{i}⟩ and their covariance matrix *C*. We derive the time evolution of the quadrature expectation values as

$${\rm{d}}\langle {\widehat{{\bf{r}}}}_{i}\rangle ={A}_{i}\langle {\widehat{{\bf{r}}}}_{i}\rangle {\rm{d}}t+2{B}_{i}{\rm{d}}{\bf{W}}(t),$$

(2)

where

$${A}_{i}=\left[\begin{array}{cc}-{\varGamma }_{{\rm{m}}}^{i}\,/2 & {\varOmega }_{i}-{\varOmega }_{{\rm{m}}}\\ {\varOmega }_{{\rm{m}}}-{\varOmega }_{i} & -{\varGamma }_{{\rm{m}}}^{i}\,/2\end{array}\right]$$

and

$${B}_{i}=\left[\begin{array}{cc}{\sum }_{j}\sqrt{{\varGamma }_{{\rm{meas}}}^{j}}{C}_{{\widehat{X}}_{i}{\widehat{X}}_{j}} & {\sum }_{j}\sqrt{{\varGamma }_{{\rm{meas}}}^{j}}{C}_{{\widehat{X}}_{i}{\widehat{Y}}_{j}}\\ {\sum }_{j}\sqrt{{\varGamma }_{{\rm{meas}}}^{j}}{C}_{{\widehat{Y}}_{i}{\widehat{X}}_{j}} & {\sum }_{j}\sqrt{{\varGamma }_{{\rm{meas}}}^{j}}{C}_{{\widehat{Y}}_{i}{\widehat{Y}}_{j}}\end{array}\right].$$

The covariance matrix elements \({C}_{\widehat{M}\widehat{N}}=\langle \widehat{M}\widehat{N}+\widehat{N}\widehat{M}\rangle /2-\langle \widehat{M}\rangle \langle \widehat{N}\rangle \) evolve as

$$\begin{array}{l}{\dot{C}}_{{\widehat{M}}_{i}{\widehat{N}}_{j}}=-\frac{{\varGamma }_{{\rm{m}}}^{i}+{\varGamma }_{{\rm{m}}}^{j}}{2}{\dot{C}}_{{\widehat{M}}_{i}{\widehat{N}}_{j}}+{\delta }_{{\widehat{M}}_{i},{\widehat{N}}_{j}}{\varGamma }_{{\rm{th}}}^{i}+{\delta }_{M,N}\sqrt{{\varGamma }_{{\rm{qba}}}^{i}{\varGamma }_{{\rm{qba}}}^{j}}\\ \,\,+{(-1)}^{{\delta }_{M,Y}}({\varOmega }_{i}-{\varOmega }_{{\rm{m}}}){C}_{{\widehat{{\mathcal{M}}}}_{i}{\widehat{N}}_{j}}+{(-1)}^{{\delta }_{N,Y}}({\varOmega }_{j}-{\varOmega }_{{\rm{m}}}){C}_{{\widehat{M}}_{i}{\widehat{{\mathcal{N}}}}_{j}}\\ \,-4\left(\sum _{k}\sqrt{{\varGamma }_{{\rm{meas}}}^{k}}{C}_{{\widehat{M}}_{i}{\widehat{X}}_{k}}\right)\left(\sum _{l}\sqrt{{\varGamma }_{{\rm{meas}}}^{l}}{C}_{{\widehat{N}}_{j}{\widehat{X}}_{l}}\right)\\ \,-4\left(\sum _{k}\sqrt{{\varGamma }_{{\rm{meas}}}^{k}}{C}_{{\widehat{M}}_{i}{\widehat{Y}}_{k}}\right)\left(\sum _{l}\sqrt{{\varGamma }_{{\rm{meas}}}^{l}}{C}_{{\widehat{N}}_{j}{\widehat{Y}}_{l}}\right),\end{array}$$

(3)

where \(\widehat{{\mathcal{M}}}\) and \(\widehat{{\mathcal{N}}}\) are the canonical conjugate observables of \(\widehat{M}\) and \(\widehat{N}\).

Equations (1)–(3) form a closed set of update equations given the measurement record *i*(*t*), and enable quadrature estimations of an arbitrary number of modes and their correlations. The thermal occupancy \({\bar{n}}_{{\rm{c}}{\rm{o}}{\rm{n}}{\rm{d}},i}\) of a specific mechanical mode is determined by the quadrature phase-space variances \({V}_{{\widehat{X}}_{i}}={C}_{{\widehat{X}}_{i}{\widehat{X}}_{i}}\) and \({V}_{{\widehat{Y}}_{i}}={C}_{{\widehat{Y}}_{i}{\widehat{Y}}_{i}}\), which are both equal to \({\bar{n}}_{{\rm{c}}{\rm{o}}{\rm{n}}{\rm{d}},i}+1/2\).

We record the voltage output from the photodetector using an UHFLI lock-in amplifier (Zurich Instruments), digitizing the signal at a 14-MHz sampling rate for a total duration of 2 s, and we store the data digitally for post-processing. The noise power spectrum density of the digitized signal is compared with that simultaneously measured on a real-time spectrum analyser, to rule out signal-to-noise ratio degradation from the digitization noise. Details of an additional filtering step are discussed in the Supplementary Information. After filtering, only the 10 mechanical modes around the defect mode frequency *Ω*_{m} are kept for the multimode state estimation study.

To perform the multimode state estimation, we extract the required system parameters of the nearest 10 mechanical modes around *Ω*_{m} by fitting the measured spectral noise density. We demodulate the signal at *Ω*_{m} and feed the time-series signal **i**(*t*) to the discretized version of the update equation (2),

$$\Delta \langle {\widehat{{\bf{r}}}}_{i}\rangle ={A}_{i}^{{\prime} }\langle {\widehat{{\bf{r}}}}_{i}\rangle \Delta t+2{B}_{i}\Delta {\bf{W}}(t)$$

(4)

to track all the 20 quadrature expectations at different times. Here, \({A}_{i}^{{\prime} }=\left[\begin{array}{cc}-{\varGamma }_{{\rm{m}}}^{{\prime} i}\,/2 & {\varOmega }_{i}^{{\prime} }-{\varOmega }_{{\rm{m}}}\\ {\varOmega }_{{\rm{m}}}-{\varOmega }_{i}^{{\prime} } & -{\varGamma }_{{\rm{m}}}^{{\prime} i}\,/2\end{array}\right]\) contains modified mechanical parameters:

$$\begin{array}{l}{\varGamma }_{{\rm{m}}}^{{\prime} i}={\varGamma }_{{\rm{m}}}^{i}+2{\rm{Re}}\,\left[-\frac{1-\cos (({\varOmega }_{i}-{\varOmega }_{{\rm{m}}})\Delta t)}{\Delta t}\right]\\ {\varOmega }_{i}^{{\prime} }={\varOmega }_{i}-{\rm{Im}}\,\left[i({\varOmega }_{i}-{\varOmega }_{{\rm{m}}})-\frac{{{\rm{e}}}^{{\rm{i}}({\varOmega }_{i}-{\varOmega }_{{\rm{m}}})\Delta t}-1}{\Delta t}\right]\end{array}$$

to compensate for the influence of discretization on the state estimation performance compared with an ideal continuous one.

The evolution of the matrix *B*_{i}, involving 210 independent covariance matrix elements, can be computed independently from the sampled time-domain data. Therefore, we calculate it following equation (3), with an update rate of 140 MHz to mitigate the discretization effect, which is then used for the update equation (4) at the sampling rate of 14 MHz. The verification of the correct implementation of the multimode Kalman filter is shown in the Supplementary Information.

To experimentally reconstruct the covariance matrix from the estimated quadrature data, we use the retrodiction method. The retrodiction method uses the measurement record in the future as a separate state estimation result. We derived the retrodiction update equations^{39} and found that they are identical to the prediction update equations, except with negative mechanical frequencies. As a result, we have the following relations between covariance matrix elements estimated by prediction and retrodiction (respectively identified by the superscripts p and r):

$$\begin{array}{l}{C}_{{\widehat{X}}_{i}{\widehat{X}}_{j}}^{{\rm{p}}}={C}_{{\widehat{X}}_{i}{\widehat{X}}_{j}}^{{\rm{r}}}\\ \,{C}_{{\widehat{Y}}_{i}{\widehat{Y}}_{j}}^{{\rm{p}}}={C}_{{\widehat{Y}}_{i}{\widehat{Y}}_{j}}^{{\rm{r}}}\\ {C}_{{\widehat{X}}_{i}{\widehat{Y}}_{j}}^{{\rm{p}}}=-{C}_{{\widehat{X}}_{i}{\widehat{Y}}_{j}}^{{\rm{r}}}.\end{array}$$

For each time trace slice (1 ms), we calculate the difference between the prediction and retrodiction results \({\langle \widehat{{\bf{r}}}\rangle }_{{\rm{r}}}-{\langle \widehat{{\bf{r}}}\rangle }_{{\rm{p}}}\), and calculate the covariance matrix as

$$C=\frac{1}{2}\langle \langle \left({\langle \widehat{{\bf{r}}}\rangle }_{{\rm{r}}}-{\langle \widehat{{\bf{r}}}\rangle }_{{\rm{p}}}\right)\cdot {\left({\langle \widehat{{\bf{r}}}\rangle }_{{\rm{r}}}-{\langle \widehat{{\bf{r}}}\rangle }_{{\rm{p}}}\right)}^{\top }\rangle \rangle $$

where ⟨⟨⋯⟩⟩ is the statistical average over all the time trace slices, and \(\widehat{{\bf{r}}}=\left[\cdots ,{\widehat{X}}_{i},{\widehat{Y}}_{i},\cdots \right]\). The symbol^{T} indicates the transposed vector.

For a system consisting of several mechanical modes that are not sufficiently separated in frequency (∣*Ω*_{i} − *Ω*_{j}∣ not significantly faster than any other rates in the system), cross-correlations between different mechanical modes emerge because of common measurement imprecision noise and common quantum backaction force. This generally leads to higher quadrature variance because of the effectively reduced measurement efficiency of individual modes. To decouple the mechanical oscillators that are interacting because of the spectral overlap and the measurement process, we define a new set of collective motional modes through a symplectic (canonical) transformation of quadrature basis *U* that diagonalizes the covariance matrix *U*^{†}*CU* = *V* (ref. ^{55}). As the covariance matrix is real and symmetric, the elements of *U* are always real, which is required for real observables. The transformation can be understood as a normal mode decomposition of the collective Gaussian state that preserves the commutation relations, as opposed to conventional diagonalization using unitary matrices. This is represented by the requirement of the symplectic transformation *UΩU*^{†} = *Ω*, where \(\varOmega =\left[\begin{array}{cc}0 & {I}_{N}\\ -{I}_{N} & 0\end{array}\right]\) is the N-mode symplectic form and *I*_{N} is the *N* × *N* identity matrix. We find that in the new quadrature basis based on the diagonalized covariance matrix, the defect mode is only weakly modified. The transformation coefficients for the defect mode are shown in the Supplementary Information.