### Experimental methods

Table of Contents

Micron-sized LiFePO_{4} platelets were synthesized using a solvothermal method^{1,26}. LiFePO_{4} particles were mixed with sucrose at a ratio of 5:1 and heated to 600 °C for 5 h in a tube furnace under flowing Ar to yield the carbon-coated LiFePO_{4}. The STXM experiment was conducted at beamlines 11.0.2.2 and 5.3.2.1 of the Advanced Light Source (ALS). LFP particles were dispersed on the working electrode of the liquid STXM imaging platform. X-ray absorption at Fe(II) and Fe(III) *L*_{3} edge was measured in situ and used to obtain the depth-averaged Li fraction map for each particle.

The experimental uncertainty of Li fraction *c* is estimated by inferring *c* based on the X-ray absorption at several energy levels with bootstrapping, which returns a standard deviation of *c*. The average of the standard deviation over all pixels and particles used in the uncertainty analysis is 0.072. A previous estimate of the standard deviation of Li fraction based on reference images is 0.06 (ref. ^{1}). Therefore, we estimate that the standard deviation of the error of *c* is *σ*_{ϵ} ≈ 0.07.

More details on the synthesis, structure and electrochemical characterization, STXM and AEM imaging, data processing and analysis of experimental uncertainty can be found in Supplementary Information sections 1.1, 1.2 and 6.2.

### Chemomechanics

The presence of elastic energy in the free-energy functional arises from the lattice deformation occurring as Li intercalates into the LFP crystal. The elastic constants *C*_{ijkl} can be found in Supplementary Information section 2.2. The constitutive law for the chemomechanical coupling relates the misfit strain tensor (or the deformation of the crystal lattice when it is stress free) to the local lithium concentration *ε*^{0}(*c*). The elastic stress is \({\sigma }_{ij}={C}_{ijkl}({\varepsilon }_{kl}-{\varepsilon }_{kl}^{0}(c))\), in which *ε*_{kl} is the total strain.

The misfit strain of LFP is anisotropic—insertion of Li expands the lattice in the *a* direction and contracts in the *c* direction. To minimize the elastic energy, the interface between Li-rich and Li-poor phases at equilibrium tends to be aligned in the [101] direction^{20,25} (Supplementary Information section 2.2). Therefore, it is crucial to determine the principal directions (*a* and *c*) for each imaged particle. By comparing images of relaxed particles and the simulation result based on the same particle geometry, we find that the major axis aligns well with the *c* axis, and the largest deviation between the two among all examples is found to be 10° (Supplementary Information section 3). Hence we align the *c*-axis direction to the major axis in our simulations.

### Parameterization

For the PDE-constrained optimization, we need to parameterize the unknown homogeneous free energy *g*_{h}(*c*) and exchange current *j*_{0}(*c*). Because the LFP solubility *δ* is smaller than the experimental error *σ*_{ϵ} = 0.07, *δ* and the unknown constitutive laws cannot be determined from the images. Instead, we use previous knowledge in the literature to inform the binodal composition and the single-phase region (*c* < *δ* and *c* > 1 − *δ*), that is, we represent *μ*_{h}(*c*) = d*g*_{h}/d*c* as the sum of the ideal mixing entropy \({\rm{ln}}\frac{c}{1-c}\) and Legendre polynomials while subject to the constraint of the experimentally measured values of the binodal compositions (*c* = *δ* and 1 − *δ*)^{31,32}. ln*j*_{0}(*c*) is parameterized by Legendre polynomials to ensure that *j*_{0}(*c*) is positive. See Supplementary Information sections 2.3 and 6.8 for details of the parameterizations and prior of *μ*_{h}(*c*) and *j*_{0}(*c*).

Because of the presence of both intraparticle and interparticle heterogeneity, we assume that the spatial heterogeneity \(\psi ({\bf{x}})\equiv {\rm{l}}{\rm{n}}k({\bf{x}})\) follows a prior distribution of a Gaussian random field with spatial covariance *C*(**x**_{1} – **x**_{2}) with an offset *ψ*_{0} that follows a normal distribution independently, where **x **= (x,y) is the spatial coordinate. We set the correlation length *l* of the Gaussian random field to be the width of a pixel, 50 nm, because spatial variation at the subpixel scale cannot be identified. With this prior distribution, the overall variance of the heterogeneity in space is \({\mathbb{E}}\,[\int {(\psi ({\bf{x}})-{\mathbb{E}}[\bar{\psi }])}^{2}{\rm{d}}{\bf{x}}]\,/\,\int {\rm{d}}{\bf{x}}={\sigma }_{\psi }^{2}+{\sigma }_{{\psi }_{0}}^{2}\), in which we define the mean \(\bar{\psi }=\int \psi ({\bf{x}}){\rm{d}}{\bf{x}}/\,\int {\rm{d}}{\bf{x}}\) and the interparticle variation of the mean is \({\mathbb{V}}{\rm{ar}}\left[\bar{\psi }\right]\approx {\sigma }_{{\psi }_{0}}^{2}\) when *l* is much smaller than the size of the particle.

Using Karhunen–Loève expansion, *ψ*(**x**) can be parameterized as

$$\psi ({\bf{x}})={\sigma }_{{\psi }_{0}}{Z}_{0}+\mathop{\sum }\limits_{i=1}^{{N}_{{\rm{K}}{\rm{L}}}}\sqrt{{\lambda }_{i}}{\phi }_{i}({\bf{x}}){Z}_{i}$$

(4)

in which the basis functions *ϕ*_{i}(**x**) are eigenfunctions of the covariance function and *λ*_{i} is the corresponding eigenvalue,

$$\int C({{\bf{x}}}_{1}-{{\bf{x}}}_{2}){\phi }_{i}({{\bf{x}}}_{2}){\rm{d}}{{\bf{x}}}_{2}={\lambda }_{i}{\phi }_{i}({{\bf{x}}}_{1}),$$

(5)

and \({Z}_{0},\ldots ,{Z}_{{N}_{{\rm{KL}}}}\) are components of the parameter **Z**_{j} used in the optimization for particle *j*. Supplementary Information section 8.4 discusses in detail the regularization of these parameters given *σ*_{ψ} and \({\sigma }_{{\psi }_{0}}\).

### PDE-constrained optimization

The optimization is constrained by the PDE model described in the main text and Supplementary Information section 2.1. The coupled PDEs are discretized in space using finite elements (Supplementary Information section 6.3) to obtain a set of differential-algebraic equations (DAEs). The set of DAEs is then solved using numerical differentiation formula with adaptive time stepping and variable order method. See Supplementary Information sections 6.4 and 6.5 for details on the choice of time and spatial accuracy.

The initial condition for *c*(*x*) is set as the initial frame of each half-cycle. For mechanical equilibrium, we enforce a zero-traction boundary condition. Because \(\mu ={\mu }_{{\rm{h}}}(c)-\kappa {\nabla }^{2}c-{c}_{{\rm{s}}}^{-1}{\boldsymbol{\sigma }}::{{\boldsymbol{\varepsilon }}}^{0{\prime} }(c)\) (refs. ^{7,20,24}), a boundary condition for *c* is needed. The Neumann boundary condition on the normal gradient **n** ⋅ ∇*c*, which relies on knowledge of the surface energy between the solid and the electrolyte, cannot be applied in our case. Hence, to avoid determining the gradient from the image, we impose the Dirichlet boundary condition based on boundary values from the images interpolated in time^{31,32} (Supplementary Information section 6.9 shows that this results in a smaller error compared with the no-wetting boundary condition, in which **n** ⋅ ∇*c* = 0).

The reaction rate *R* is directly determined by the local overpotential *η* and concentration *c*. In simulations, we can specify the voltage Δ*ϕ* as a control variable. However, in experimental settings, it is challenging to precisely measure the local overpotential owing to unknown electric potential losses caused by ohmic and contact resistance, as well as the overpotential loss resulting from concentration polarization. Therefore, we adopt an alternative approach by constraining the total reaction rate of each particle, \(I(t)=\int \frac{\partial c}{\partial t}{\rm{d}}V=\frac{{\rm{d}}\bar{c}(t)}{{\rm{d}}t}\), through interpolation of the time-dependent trajectory of the average concentration \(\bar{c}(t)\) observed in the experiment. Using this approach, the local potential Δ*ϕ* in the simulation becomes an algebraic variable that satisfies the constraint of the total reaction rate. See Supplementary Information section 6.7 for the analysis of error introduced by the interpolation of \(\bar{c}(t)\). We note that the dependence of the rate on the overpotential can only be tested if a precise measurement of the local potential at the particle can be obtained.

We use the trust-region method to solve the PDE-constrained optimization, perform forward sensitivity analysis to compute the gradient of the objective function and use Gauss–Newton approximation for its Hessian^{31,32,52}. The optimizer updates **p**_{global} and **Z**_{i} (*i* = 1, …, *N*) concurrently. During each iteration, the forward evaluation of the model and its sensitivities for all half-cycles and particles are performed independently and in parallel. By contrast, the adjoint sensitivity analysis computes the gradient of the objective function much faster than forward sensitivity analysis, however, without the approximation for the Hessian, gradient descent converges to a suboptimal solution (Supplementary Information section 6.6).

### Identifiability

We examine the identifiability of the parameters in the model based on a simulated dataset that contains three particles using two approaches. The first approach is HMC sampling of the posterior distribution. At the current level of pixel error *σ*_{ϵ} = 0.07, the 99% confidence region of *μ*_{h}(*c*) and *j*_{0}(*c*) is tightly distributed around the truth, indicating that they can be correctly identified in the neighbourhood of the minimum objective function (Supplementary Fig. 31). A higher error *σ*_{ϵ} = 0.3 causes the truth to lie outside the 99% confidence region. The second approach is to run the optimizations starting at different initial guesses. Supplementary Fig. 32 shows that, when the optimizer tolerance is sufficiently low, all of the final RMSE from the optimizer can be below the model solver error. However, if the gradient penalty coefficient *κ* in the chemical potential is unknown and included as a parameter in the optimization, over 80% of the final RMSE is above the solver error (Supplementary Information section 7). Therefore, we use the literature value for *κ* = 5.02 × 10^{−10} J m^{−1} to avoid non-identifiability. See Supplementary Information section 7 for further details.

### Regularization

In our study, the number of parameters for **p**_{global}, which represents *g*_{h}(*c*) and *j*_{0}(*c*), is substantially lower than the number of pixels (Supplementary Information section 5.2). Sensitivity analysis of *g*_{h}(*c*) and *j*_{0}(*c*) reveals that the variance of the MAP estimate of *g*_{h}(*c*) and *j*_{0}(*c*) scales inversely with the number of particles, denoted as *N* (Supplementary Information section 8.2 and Supplementary Fig. 34). Also, many particles consist of both charge and discharge cycles, which further decreases the uncertainty of *g*_{h}(*c*) and *j*_{0}(*c*) (ref. ^{32}). We performed regularization on a simulated dataset of three particles with the same level of noise at *σ*_{ϵ} = 0.07, which exhibited a negligible reduction in validation error owing to regularization on *g*_{h}(*c*) and *j*_{0}(*c*) (Supplementary Information section 8.1 and Supplementary Fig. 33). We anticipate an even smaller impact on our dataset with *N* = 39. Therefore, we set *ρ*_{1} = 0 to minimize the bias of **p**_{global} and let **p**_{global} be informed by the available data.

In contrast to **p**_{global}, regularization on *k*(*x*, *y*) is necessary, particularly for particles with nearly uniform Li concentration field (Supplementary Information section 8.3). As the rate (*R*/*k*) increases, leading to a more uniform *c*(*x*) profile^{7,14,19,53}, the uniformity of *c*(*x*) can provide information about the magnitude of *k*(*x*, *y*). However, as *c*(*x*) becomes nearly uniform, the uncertainty in *k*(*x*, *y*) increases. As a result, the squared error associated with these particles shows insensitivity or lacks local minima with respect to the long-wavelength modes of *ψ*(*x*, *y*), which are constant and smoothly varying in space (Supplementary Information section 8.3). Hence, to constrain the long-wavelength modes, we assume a common prior distribution for *ψ*(*x*, *y*) across all particles. Consequently, the regularization parameter *ρ*_{2} is shared by the **Z**_{i} of all particles. Regularization on *k*(*x*, *y*) is also critical to prevent overfitting and leads to a marked reduction in validation error and the error of MAP compared with the truth (Supplementary Information section 8.1 and Supplementary Fig. 33).

We also test using different regularization coefficients for the constant *ψ*_{0} and spatially varying components of *ψ*(*x*, *y*) and find that minimum validation error is reached around when the regularization coefficients are equal (Supplementary Information section 8.4). Because *k*(*x*, *y*) multiplies with *j*_{0}(*c*), we impose a normalization constraint on *k*(*x*, *y*), that is, the average of *ψ*(*x*, *y*) over all particles weighted by area is 0 (Supplementary Information section 4.1).

### Cross-validation

We consider two approaches for the cross-validation at different values of *ρ*_{2}: (1) allowing **p**_{global} to vary as *ρ*_{2} changes or (2) fixing **p**_{global} at its MAP values when \({\rho }_{2}={\sigma }_{{\epsilon }}^{2}/{\sigma }_{\psi }^{2}=0.01\), in which *σ*_{ϵ} ≈ 0.7 was first estimated using finite differences (see Supplementary Information section 4.2). Studies using simulated datasets show that using a small *ρ*_{2} reduces the bias of **p**_{global} (Supplementary Fig. 35). Results on simulated datasets show that the two approaches lead to similar optimal *ρ*_{2} and a comparable reduction in the validation error and MAP error. In the case of experimental datasets, we also perform cross-validation using these two approaches and find that, although the first approach leads to a lower training error, the second approach has a substantially smaller validation error (Supplementary Information section 8.5 and Supplementary Fig. 40). Therefore, we adopt the latter approach, which also allows us to perform *k*-fold cross-validation independently for each particle, as only **Z**_{i} is updated while varying *ρ*_{2} (Supplementary Information section 8.6).

We then performed *k*-fold cross-validation, training the model on *k* − 1 half-cycles and validated on the other one. Using the one-standard-error rule^{54}, we determine the optimal *ρ*_{2} = 0.88. As mentioned in the main text, the training RMSE at *ρ*_{2} = 0.88 is 6.8%. By contrast, the training RMSE at *ρ*_{2} = 0.01 is 6.0%, suggesting overfitting. Conversely, the training RMSE at *ρ*_{2} → *∞* is 10.6% (Supplementary Table 3 and Supplementary Information section 9.4), that is, the kinetic prefactor is spatially uniform (*k*(*x*, *y*) = 1), which indicates underfitting. At larger *ρ*_{2}, the bias in *g*_{h}(*c*) and *j*_{0}(*c*) is also expected to be large because the bias increases with *ρ*_{2} regardless of the number of particles (Supplementary Information section 8.2 and Supplementary Fig. 35). For a visual comparison of the RMSE using different parameters, refer to Supplementary Fig. 50.

Most of the particles with three half-cycles show a strong correlation among *k*(*x*, *y*) trained on the basis of the three training datasets, and all three validation curves have clear minima around the optimal *ρ*_{2}. The presence of a few particles that show a lack of consistency among *k*(*x*, *y*) in cross-validation reflects the possibility of unmodelled effects in certain half-cycles in the heterogeneous dataset (Supplementary Information section 8.6). Therefore, to avoid bias, the reported spatial heterogeneities *k*(*x*, *y*) in Supplementary Fig. 57 are obtained by training on all half-cycles with equal weights using the optimal regularization coefficient.

### Learning from the uniformity coefficient

Using the uniformity coefficient only as the training data and hence discarding spatial information, we use both the full 2D continuum model on a square and a simplified ‘0D’ model that excludes all spatial correlation (spatial heterogeneity is included in both models but only *σ*_{ψ} is an unknown parameter, whereas *k*(*x*, *y*) is a fixed and randomly generated Gaussian random field). HMC sampling shows that the posterior distribution of parameters of *j*_{0}(*c*) and *μ*_{h}(*c*) approximately lies on a manifold that is defined by the contours of the terms in the normalized autocatalytic rate *s*/*R* ((ln*j*_{0})′ and \({j}_{0}{\mu }_{{\rm{h}}}^{{\prime} }\); see Supplementary Fig. 13), resulting in coupling between the magnitude of *j*_{0} and the slope of *μ*_{h}(*c*) and between parameters of *j*_{0}(*c*). Despite the coupling, the persistent features that arise from the HMC sampling are the non-monotonicity of *μ*_{h}(*c*) and the asymmetry of *j*_{0}(*c*).

We also include ohmic film resistance, which alters the asymptotic relationship between rate and overpotential at high overpotential, and found that it has a broad posterior distribution over many orders of magnitude and has little correlation with other parameters. Hence, ohmic film resistance is omitted in the full-image inversion.