Table of Contents
All animal procedures and studies were approved by the University of California San Diego Institutional Animal Care and Use Committee according to the Association for Assessment and Accreditation of Laboratory Animal Care guidelines. Mice were maintained on a 12:12 light–dark cycle with ad libitum standard chow diet and water. Transgenic mouse assays were performed using Mus musculus C57BL/6 NHsd strain (Envigo). Animals of both sexes were included in this study.
Generation of transgenic mice using CRISPR–Cas9
Cas9 protein (IDT, 1081058), trans-activating CRISPR RNA (tracrRNA) (IDT, 1072532), CRISPR RNA (crRNA) and single-stranded DNA (ssDNA) homology-directed repair template oligos were co-injected into one-cell embryos at the Moores UCSD Cancer Center Transgenic Mouse Shared Resource. Custom ssDNA repair oligonucleotides and crRNAs were synthesized by IDT. We designed and selected crRNA if the guide sequence is predicted to have high specificity on CRISPOR (http://crispor.tefor.net/crispor.py) and if the mutation introduced by homology-directed repair will ablate the protospacer adjacent motif (PAM) site. Because a PAM site is not available in the genomic locus for the human and synthetic mutations, we first generated a mouse line that contained a de novo PAM site within the ETS-A site. The French 2, Indian 2, Syn 0.25 and Syn 0.52 mouse lines were generated with CRISPR–Cas9, using one-cell embryos with the new PAM background (Supplementary Table 4). The LOF mutation mouse line was generated using embryos with the WT background. All mouse lines were generated by homology-directed repair using ssDNA as a repair template. Genome-edited founders were identified by genotyping as described below. Wherever possible, multiple founders bearing the same desired allele were used to establish each line. Founders were crossed to WT C57BL/6N mice to generate the F1 generation for each mouse line.
Genomic tail DNA was obtained and used to genotype ETS-A transgenic mice with the following primers: forward 5′-GGACAAGAGATTAGCGTGGCTGGTGATTTCCTTTCACCCAGC-3′ and reverse 5′-GACACCAGACCAACTGGTAATGCATAATGACAGCAACATCC-3′. The underlined sequences anneal to the ZRS, and the remaining sequences are overhangs used to clone ZRS PCR products into a vector containing an ampicillin resistance cassette by Gibson assembly. For all mice, including founders, PCR products were analysed by Sanger sequencing (sequencing primer: 5′-CATCCTAGAGTGTCCAGAACC-3′) to identify ZRS genotypes. For all founder mice, PCR products were cloned, and individual clones were sequenced to confirm the initial genotyping results with single-allele resolution.
Each mouse born into our colony has all four limbs inspected by an investigator blind to genotype at postnatal day (P)10–18 during routine ear clipping (for identification) and tail biopsy collection (for genotyping). Limb and/or digit phenotypes, or the absence thereof, are readily detectable in postnatal mice and recorded in detail. Each limb is inspected for the number of digits and the presence of other features, including triphalangeal first digit(s) and/or shortened limbs. After genotyping, phenotypic data for each genotype in each ETS-A transgenic line are collated to calculate penetrance (on the basis of the presence or absence of phenotype).
Statistical tests for mouse phenotypes
Fisher’s exact test was used to measure any statistical difference in the penetrance and laterality of polydactyly across any pair of the approximately 0.25-affinity mouse lines (French 2, Indian 2 or Syn 0.25 mice). For penetrance, there are two factors: have phenotype or no phenotype. For laterality, there are three factors: bilateral, unilateral or no phenotype. To determine whether the occurrence of a unilateral phenotype has a bias on the left or right, P values from chi square goodness of fit test were calculated. Fisher’s exact test was used to measure any statistical difference in digit phenotypes across any pair of approximately 0.25-affinity mouse lines (French 2, Indian 2 or Syn 0.25 mice). P values measuring any difference in digit phenotypes across males and females in each mouse line are also calculated. These tests have nine factors: five digits no TT, five digits one TT, … , seven digits three TT.
Timed matings for embryo collection
Within each ETS-A transgenic mouse line, timed matings were set up and monitored each morning for vaginal plug formation. The date that plugs were observed was noted as E0.5. Females were removed from males on the plug date and embryos were staged at dissection. Embryos labelled as E11.75 have around 48 somites, and E12.0 embryos have around 53 somites. Pregnant females were humanely euthanized by isoflurane overdose. Embryos were dissected in ice-cold phosphate-buffered saline (PBS) pH 7.4 and then fixed in 4% paraformaldehyde in PBS pH 7.4 overnight with gentle rotation at 4 °C. Embryos were then dehydrated through a graded methanol series at 4 °C (25%, 50%, 75% methanol in PBS pH 7.4 plus 0.1% Tween-20, 100% methanol) and stored in 100% methanol at −20 °C for up to six months until use. The yolk sac of each embryo was collected and used for genotyping as described above. The sex of embryos is unknown.
Probe cloning and synthesis for in situ hybridization
Shh and Ptch1 templates were amplified from mouse E11.5 cDNA using previously described primers52, and were ligated into a pCR BluntII TOPO vector, transformed into TOP10 competent cells and plated for selection on kanamycin plates. Colonies were selected for sequence verification and then plasmid prepped. Plasmid DNA was linearized with SpeI or NotI restriction enzyme and then used as a template for in vitro transcription using a digoxigenin labelling kit (Roche, 11175025910) with T7 (antisense) or Sp6 (sense) polymerase. After DNase treatment to digest template DNA, RNA probes were recovered using a RNeasy mini kit, and RNA concentration and purity were measured to confirm probe synthesis.
Whole-mount in situ hybridization
Embryos were treated with 6% H2O2 in methanol for one hour, and then rehydrated through a methanol series to PBS-T (1% Tween-20 in PBS pH 7.4). Embryos were washed 5 × 5 min in PBS-T, and then treated with proteinase K (10 µg ml−1) (Invitrogen, 1000005393) for 20 min. After permeabilization, embryos were washed in PBS-T containing 2 mg ml−1 glycine, then PBS-T, then post-fixed for 20 min in 4% paraformaldehyde (PFA)/0.2% glutaraldehyde in PBS-T. Embryos were then washed 2 × 5 min in PBS-T, followed by 10 min in a 1:1 mixture of PBS-T and hybridization solution (50% formamide, 5× SSC pH 4.5, 1% SDS, 50 µg ml−1 yeast tRNA, 50 µg ml−1 heparin). Embryos were then allowed to sink (no rocking) in hybridization solution for 10 min. They were then changed to new hybridization solution and incubated for at least one hour at 65 °C. Hybridization solution was replaced with fresh hybridization solution containing 1 µg ml−1 of antisense (all ETS-A embryos and WT control) or sense (WT control only) probe followed by overnight incubation at 65 °C. Embryos were washed 3 × 30 min in solution I (50% formamide, 5× SSC pH 4.5, 1% SDS) at 65 °C followed by 3 × 30-min washes in solution III (50% formamide, 2× SSC pH 4.5) at 65 °C. Embryos were then washed 3 × 5 min in TBS-T (1% Tween-20 in Tris-buffered saline) and blocked for 1 h in block solution (10% heat-inactivated sheep serum and 0.1% Roche blocking reagent in TBS-T). Roche blocking reagent (Roche, 11096176001) was dissolved in maleic acid buffer according to the manufacturer’s recommendations. Embryos were then incubated in block solution containing 1:2,500 anti-digoxigenin-AP antibody (Roche, 11093274910) overnight at 4 °C. Embryos were washed 3 × 5 min in TBS-T and then 5 × 1 h in TBS-T, followed by overnight incubation in TBS-T at 4 °C. Embryos were then washed 3 × 10 min in NTMT (100 mM NaCl, 100 mM Tris pH 9.5, 50 mM MgCl2, 1% Tween-20) before coloration in AP reaction mix (125 µg ml−1 BCIP (Roche, 11383221001) and 250 µg ml−1 NBT (Roche, 11383213001) in NTMT). Coloration was carried to completion in the dark. Embryos were washed 10 min in NTMT followed by 3 × 10 min in TBS-T and then overnight in TBS-T at 4 °C. Embryos were imaged using the Leica M165 FC microscope with the Lumenera Infinity3 camera, then post-fixed in 4% PFA for 30 min and stored in 1% PFA in 4 °C. All steps were performed with gentle rocking and at room temperature unless otherwise specified.
Young postnatal mice at age P10-12 were humanely euthanized by CO2 inhalation before skeletal preparations. Dissected limbs and/or whole cadavers of representative homozygotes for each line were skinned and eviscerated, then fixed in 95% ethanol overnight. Samples were then stained over two nights in cartilage staining solution (75% ethanol, 20% acetic acid and 0.05% Alcian blue 8GX (Sigma-Aldrich, A3157)), rinsed overnight in 95% ethanol, cleared overnight in 0.8% KOH and stained overnight in bone staining solution (0.005% Alizarin red S (Sigma-Aldrich, A5533) in 1% KOH). After staining, samples were further cleared in 20% glycerol in 1% KOH until digits were free of soft tissue and long-bone morphology was visible. Samples were further processed through a graded series of 50% and 80% glycerol in 1% KOH and then into 100% glycerol for imaging and storage. All steps of the skeletal staining procedure were performed with gentle rocking at room temperature.
EMSAs were performed using the LightShift Chemiluminescent EMSA Kit (Thermo Fisher Scientific) with biotinylated and non-biotinylated double-stranded oligonucleotides. Oligonucleotides were annealed according to an advanced protocol (https://tools.thermofisher.com/content/sfs/brochures/TR0045-Anneal-oligos.pdf). DNA-binding domain (DBD) proteins were synthesized using the TNT Quick Coupled Transcription/Translation System (Promega) from the pTNT plasmid for each respective protein. ETS-1 DBD (residues 336–441, which is conserved in sequence from human and mouse) was amplified from the pET28b-Ets1-ETS vector (Addgene, 85735). DBDs for human HOXA13 (residues 322–389) and HOXD13 (residues 276–335) were amplified from human genomic DNA, and sequences were confirmed by Sanger sequencing. The coding sequences for these DBDs were amplified with flanking XhoI and NotI sites and cloned into the pTNT-B18R vector (Addgene, 58978). The binding reaction was performed in a 20-μl volume containing 2 μl of 10× binding buffer (100 mM Tris, 500 mM KCl and 10 mM DTT; pH 7.5), 50 ng Poly(dI:dC), 20 femtomol biotin-labelled probe and protein extract. For competition experiments, a 200-fold molar excess of unlabelled probe was added. Binding reactions were pre-incubated for 10 min before adding the biotin-labelled probe. Binding reactions were then incubated at room temperature for 20 min and loaded onto a DNA retardation gel (6%). Electrophoresis with 0.5× TBE on ice and transfer to a 0.45-μm Biodyne B Nylon membrane (Thermo Fisher Scientific) was done in the cold room. DNA was cross-linked for 15 min using 312-nm light, and the membrane was put between blank sheets of paper overnight. The next day, the biotinylated probes were detected using the Chemiluminescent Nucleic Acid Detection Module (Thermo Fisher Scientific). Images of the resulting membrane were acquired using a Chemidoc MP imaging system (Bio-Rad). For quantification of ETS-1 binding to ETS-A variants, band quantifications were performed by taking the ratio of the volume (intensity) for the shifted band in the reaction with the ETS transcription factor to the volume (intensity) for the unshifted band in the reaction without the ETS transcription factor, using the Analysis Table in Image Lab 6.0.
Calculation of binding affinity
Relative binding affinity is calculated using high-throughput binding data from the UniProbe database (http://thebrain.bwh.harvard.edu/uniprobe/index.php)25,53. Median intensity signals of 8-mers PBM data were measured as a percentage of their optimal 8-mer binding motif.
Analysis of previously published MPRA data
MPRA data were downloaded from previously published papers5,30,36,40. Only single-base substitutions were considered across all datasets. We classified variants as significantly altering expression using P values that were provided in the published supplementary tables. If the study did not report adjusted P values, we adjusted all raw P values using Benjamini–Hochberg adjustment. We defined variants as having a significant change in expression if their adjusted P value was smaller than 0.05. We plot log10(Padj) with direction of change in gene expression, in which positive values depict a significant variant that leads to increased gene expression. We used the one-tailed Mann–Whitney U test to test for enrichments in GOF enhancer activity in different groupings (that is, ‘All other SNVs’, ‘SNVs that do not change affinity’ and ‘SNVs that optimize affinity’). If there were more than 1,000 points in the dataset, we plotted a random 1,000 as dots over the box plots. MPRA data were analysed using standard Python libraries (pandas, numpy, scipy, seaborn, matplotlib). Processing, visualization and statistics were done using custom Python code.
ETS-binding sites were defined as NNGGAWNN54. We defined an ETS-optimizing variant as one that caused at least a 1.59-fold change (alt/ref) in all analyses with the exception of Extended Data Fig. 7. Analysis of the 2% sequence mutagenesis of the ZRS enhancer in Extended Data Fig. 7 defines ETS-optimizing variants as a fold change greater than 1.0, because only three variants within this dataset have a fold change of at least 1.59.
The AP-1-binding site was defined as NTKANNMA. IRF binding affinity was defined as NWNNGANA. Motifs used for ETS, AP-1 and IRF were determined on the basis of crystal structure and mutational analysis data54,55,56,57. We defined AP-1 and IRF-optimizing variants as one that caused at least a 1.5-fold change in binding affinity. We defined SNVs not changing the TFBS affinity as SNVs with a 0.8–1.25-fold change in affinity. For analyses on the IFNβ enhanceosome, we excluded nucleotides that contributed to two overlapping binding sites, and only analysed the effects of affinity-optimization SNVs that affect a single binding site.
Analysis of eQTL data
We analysed eQTL data downloaded from the EBI eQTL catalogue GitHub page (https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/tabix/) for a lymphoblastoid cell line generated by the Geuvadis consortium41. These eQTL data were used to design the MPRA library used previously40,41. For the eQTL data, we adjusted the raw P values using the Bonferroni procedure, in which the total number of tests is the total number of genotype–gene-expression associations tested. For the seven ETS affinity-optimizing SNVs that cause significant differential expression in the MPRA experiments40 and have eQTL values, we compare the MPRA expression and eQTL using adjusted log P values plotted with the direction of differential expression for reporter assay and target gene expression (β value). For AP-1 affinity-optimizing SNVs that gave significant differential expression in the MPRA, only three overlapped with significant eQTL variants and so we did not study these. For larger-scale analysis of the relationship between significant variants and target gene expression, we analysed all eQTL variants in the MPRA library (n = 2,663) in a previous report40, and their effects on target gene expression (β values). Finally, for genome-wide analysis of all eQTL variants in lymphoblastoid cell lines, we analysed all eQTLs from the Geuvadis consortium41 with Padj < 0.01. We categorized the eQTL variants into three categories: affinity-optimizing SNVs, SNVs that do not alter affinity and all other SNVs. For eQTL variants that have multiple genes associated, we plotted only the most significant association. A one-tailed Mann–Whitney U test was used to determine any significant enrichment in eQTL variants that do not change affinity or increase affinity with GOF target gene expression (β value).
PBM–ChIP correlation analysis
BigWig files for ChIP–seq data were downloaded from Gene Expression Omnibus accessions for previously published data58,59,60. BigWig files were chosen because they contain the most quantitative metric at base-pair resolution for the ChIP–seq signal. For each dataset, we predicted ETS sites using NNGGAWNN across the reference genome used to create the bigWig. We then extracted the average bigWig ChIP–seq signal over all predicted ETS TFBS 8-mers. We associated each ETS TFBS with its predicted ETS affinity using PBM24. We placed the ETS TFBS into bins of PBM affinity 0–0.1, 0.1–0.2, 0.2–0.3, … 0.9–1.0. Within each bin, we took the average bigWig ChIP–seq signal across each chromosome and plotted the chromosomal averages. The Spearman correlation uses all points from all bins and all chromosomes.
To assess any statistical differences in the penetrance and laterality percentages between French 2, Indian 2 and Syn 0.25 mice, we performed Fisher’s exact test using the fisher.test function in R (Supplementary Table 3). Statistical differences in digit phenotypes were measured using Fisher’s exact test using a 2 × 9 table (Supplementary Tables 3 and 4). To determine whether unilateral polydactyly deviates from the assumption that phenotype would occur at a 50%–50% rate on the right and left hindlimbs, a chi square goodness of fit test was used (Supplementary Tables 3 and 4). To measure any statistical difference between the band intensities in EMSAs across the French 2, Indian 2 and Syn 0.25 sequences, we performed a one-way ANOVA test and found no significant difference with P = 0.18.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.