Fusobacterium strain isolation from tumour tissue from patients with CRC
Fusobacterium strains were isolated from CRC tumour tissue specimens from patients from North America and Europe as previously described6. Briefly, tissue sections were minced with a scalpel, and spread plated on selective fastidious anaerobe agar (FAA) plates (Oxoid, Thermo Fisher Scientific) supplemented with 7% or 10% defibrinated horse blood (DHB; Lampire Biological Laboratories, Fisher Scientific) with josamycin, vancomycin and norfloxacin at 3, 4 and 1 μg ml−1, respectively (Sigma Aldrich). Plates were incubated at 37 °C in anaerobic conditions (AnaeroGen Gas Generating Systems, Oxoid, Thermo Fisher Scientific) and inspected for growth every 2 days. Colonies were picked and streak purified, and colony PCR was carried out on selected bacterial colonies as previously described6 with 16S rRNA gene universal primers (342F and 1492R). Colony PCR products were sent for Sanger sequencing, and BLASTn analysis of trace sequences was used to confirm bacterial species identity. Cultures were suspended in tryptic soy broth (TSB) and 40% glycerol and stored at −80 °C.
Fusobacterium strain isolation from Korean Collection for Oral Microbiology and ATCC ampoules
Fusobacterium strains from the Korean Collection for Oral Microbiology (KCOM) were isolated from the oral cavity as previously described44. Strains from the ATCC and KCOM repositories were grown from ampoules on Schaedler agar plates supplemented with vitamin K1 and 5% defibrinated sheep blood (Becton Dickinson) and FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 7% DHB (Lampire Biological Laboratories, Fisher Scientific). Plates were incubated at 37 °C in a Bactron600 anaerobic chamber (Sheldon Manufacturing) for 5–7 days. Cultures were suspended in Schaedler broth with vitamin K1 and 30% glycerol and stored at −80 °C.
High molecular weight genomic DNA extraction
Fusobacterium strains were cultured under anaerobic conditions at 37 °C (AnaeroGen Gas Generating Systems, Oxoid, Thermo Fisher Scientific) for 48–72 h on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Lampire Biological Laboratories, Fisher Scientific) and plates for CRC-associated strains were further supplemented with josamycin, vancomycin and norfloxacin at 3, 4 and 1 μg ml−1, respectively (Sigma Aldrich). High molecular weight genomic DNA was extracted using the MasterPure Gram Positive DNA Purification Kit (Epicentre, Lucigen). Cells from two plates were resuspended in 1.5 ml 1× PBS and collected by centrifugation. Pellets were processed according to the manufacturer’s instructions, modified by doubling all reagent volumes and removing vortexing steps to prevent DNA shearing. High molecular weight genomic DNA was quantified using a Qubit fluorometer (Thermo Fisher Scientific).
PacBio single-molecule real-time sequencing and genome assembly
Single-molecule real-time sequencing18 was carried out on a PacBio Sequel instrument (Pacific Biosciences) or a PacBio Sequel II instrument (Pacific Biosciences) at the University of Minnesota Genomics Center. Sequencing reads were processed using the Microbial Assembly pipeline within Pacific Biosciences’ SMRTAnalysis pipeline v.9.0.0.92188. Additional assembly was carried out using Flye assembler v.2.8 as needed (https://github.com/fenderglass/Flye).
Fusobacterium species typing
Fusobacterium genomes were subtyped to the species level and Fn genomes were further subtyped to the subspecies level on the basis of a cumulative score of individual marker genes. Marker genes previously used for Fusobacterium typing were used: the 16S rRNA gene, rpoB and a zinc metalloprotease gene30. For each complete, closed genome, the species or subspecies classification was first analysed with each of the three marker genes individually. Each marker gene was isolated and analysed using BLASTn, with the top hit by percentage identity noted. For each possible species or subspecies, a confidence score was calculated as the number of concordant species or subspecies results divided by the number of marker genes present. The final classification of each genome was the one with the highest confidence score. Results for this analysis are noted in Supplementary Table 1. Phylogenetic classifications were further tested using GTDB-Tk (ref. 64; https://github.com/Ecogenomics/GTDBTk) as listed in Supplementary Table 2.
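The confidence score described above can be illustrated with a minimal Python sketch. The function name, the input layout and the example labels are assumptions made for illustration, and tie-breaking is not specified in the text, so the sketch simply returns the first label that reaches the highest score.

```python
from collections import Counter

def classify_genome(marker_hits):
    """Return (best_label, scores) for one genome.

    marker_hits maps a marker gene name to the label of its top BLASTn hit,
    or to None when that marker gene is absent from the genome.
    """
    present = [label for label in marker_hits.values() if label is not None]
    if not present:
        return None, {}
    counts = Counter(present)
    # Score = concordant marker results / marker genes present.
    scores = {label: n / len(present) for label, n in counts.items()}
    # Ties are not addressed in the text; max() keeps the first top-scoring label.
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical example: two markers agree on Fna, one points to Fnv.
hits = {"16S rRNA": "Fna", "rpoB": "Fna", "zinc metalloprotease": "Fnv"}
best, scores = classify_genome(hits)
print(best, scores)
```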
Pangenomic analyses
Pangenome analysis was carried out using the Anvi’o workflow21, the PPanGGOLiN tool51 and the GiG-map tool (https://github.com/FredHutch/gig-map) to characterize the Fn pangenome across 135 Fn genomes, and to characterize the Fna pangenome across 75 Fna genomes. For Fn genomes, Anvi’o thresholds were set to a minbit of 0.9 and an MCL of 2, and PPanGGOLiN thresholds were set to 90% identity and 90% coverage. For Fna genomes, Anvi’o thresholds were set to a minbit of 0.9 and an MCL of 7, and PPanGGOLiN thresholds were set to 90% identity and 90% coverage. For both genome sets, GiG-map was run with default settings. PPanGGOLiN’s alignment feature was used to map resulting Anvi’o gene clusters to their corresponding PPanGGOLiN nodes. To assess the size of the pangenome as the number of sampled genomes increases, the Fn and Fna Anvi’o-derived pangenomes were independently sampled for combinations up to 10,000 or otherwise randomly subsampled 10,000 times from 1 to 135 genomes and 1 to 75 genomes, respectively. This approach was subset by niche and clade as appropriate.
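One way to read the subsampling step is sketched below in Python: for each number of genomes k, all combinations are enumerated when there are at most 10,000 of them, and otherwise 10,000 random subsets are drawn. The function name, the toy input and the choice to report the mean pangenome size per k are assumptions for illustration and are not taken from the Anvi’o workflow.

```python
import itertools
import math
import random

def pangenome_curve(genome_clusters, max_draws=10_000, seed=0):
    """Mean pangenome size (distinct gene clusters) for each subset size k.

    genome_clusters maps a genome name to the set of gene clusters it carries.
    """
    rng = random.Random(seed)
    names = sorted(genome_clusters)
    n = len(names)
    curve = {}
    for k in range(1, n + 1):
        if math.comb(n, k) <= max_draws:
            # Few enough combinations: enumerate all of them.
            subsets = itertools.combinations(names, k)
        else:
            # Otherwise draw max_draws random subsets of size k.
            subsets = (rng.sample(names, k) for _ in range(max_draws))
        sizes = [
            len(set().union(*(genome_clusters[g] for g in subset)))
            for subset in subsets
        ]
        curve[k] = sum(sizes) / len(sizes)
    return curve

# Toy input with hypothetical gene cluster identifiers.
toy = {"A": {"gc1", "gc2"}, "B": {"gc1", "gc3"}, "C": {"gc1", "gc2", "gc4"}}
curve = pangenome_curve(toy)
print(curve)
```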
Genomic dendrograms
Individual gene and protein sequences were aligned through MEGA X (ref. 65) using the MUSCLE clustering algorithm from which a maximum-likelihood dendrogram was generated. kSNP3 (ref. 45) with a k-mer size of 13, resulting in a fraction of core k-mers of 0.217, was used to generate a maximum-likelihood phylogeny of the 135 Fn genomes in our collection. Final images were generated using the interactive tree of life tool, v.5 (ref. 66).
Identification of Fn canonical virulence factors
To query the presence of canonical Fn virulence genes in our collection of Fn genomes, we used the Operon Contextualization Across Prokaryotes to Uncover Synteny tool (https://github.com/FredHutch/octapus) with a minimum percentage identity threshold of 60%.
Identification of Fn genetic defence systems and prophage
The presence of innate bacterial defence systems was queried using the Prokaryotic Antiviral Defense Locator67, and the presence of intact prophage was analysed using the Phage Search Tool Enhanced Release68,69.
PCA
PCA of Fn Anvi’o-derived gene content was carried out on a gene cluster presence–absence matrix using the R prcomp function in the stats package, v.3.6.2. PCA of Fna methylated nucleotide motifs was carried out on a methylated motif presence–absence matrix (Supplementary Table 7) using the PCA function in the R factoextra package, v.1.0.7.
Fn and HCT116 co-culture assays
The human colon cancer epithelial cell line HCT116 was purchased from ATCC. The cell line was not authenticated. Mycoplasma testing was carried out using the MycoProbe Mycoplasma Detection Kit (R&D Systems). HCT116 cells were cultured in McCoy’s 5A with l-glutamine (Corning) supplemented with 10% (v/v) fetal bovine serum (Sigma) and incubated at 37 °C in 5% CO2. HCT116 cells were seeded at 1.25 × 10^6 cells per well into 6-well plates with a glass coverslip at the bottom of each well (Nunclon Delta Surface, Thermo Scientific) and allowed to adhere for 16 h. Resuspended cultures of Fna C1 (SB048, KCOM 3363 and KCOM 3764) and Fna C2 (SB001, SB010 and KCOM 2763) strains were prepared in McCoy’s 5A. Bacterial membranes were stained with 5 µg ml−1 FM 4-64FX (Molecular Probes). Each bacterial strain was co-incubated with HCT116 cells in wells at a multiplicity of infection of 100:1. These bacterial–eukaryotic co-cultures were incubated for 3 h at 37 °C in 5% CO2. Bacterial viability was assessed at time (T) = 0, T = 1.5 and T = 3 h by preparing serial dilutions for each strain and plating 50 µl of each dilution on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Hemostat, Fisher Scientific). Plates were incubated at 37 °C in a Bactron600 anaerobic chamber (Sheldon Manufacturing) for 2 days until colonies were counted. After incubation, wells were washed four times with PBS with gentle swirling to remove unattached bacterial and HCT116 cells. Cells were fixed in 4% paraformaldehyde in PBS for 30 min at room temperature. Following fixation, cells were washed three times in PBS and then permeabilized with 0.2% (v/v) Triton X-100 in PBS for 4 min at room temperature. Cells were washed three times in PBS and then stained for 20 min at room temperature with two drops per millilitre of NucBlue Fixed Cell Stain ReadyProbes (Invitrogen) and ActinGreen 488 ReadyProbes (Invitrogen) to stain DNA and actin, respectively.
A dissecting microscope was used to visually confirm that cells remained on the coverslips after processing. Samples were viewed with a Leica SP8 confocal laser scanning microscope (Leica) for image acquisition. Three z-stacks of each co-culture were taken using a 63× oil lens and the following parameters: 1,024 × 1,024 resolution, pixel size 100.21 nm, speed 600, zoom factor 1.9 and z-step 0.3 µm.
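For the viability plating described in this section, colony counts from a 50 µl plated volume can be converted to colony-forming units per millilitre as in the following Python sketch. The function name and the example colony count and dilution are hypothetical and serve only to show the arithmetic.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ul=50.0):
    """Colony-forming units per ml of the undiluted suspension.

    dilution_factor is the fold dilution of the plated sample,
    for example 1e4 for a 10^-4 dilution.
    """
    plated_volume_ml = plated_volume_ul / 1000.0
    return colonies * dilution_factor / plated_volume_ml

# Hypothetical count: 42 colonies on the 10^-4 dilution plate.
example = cfu_per_ml(colonies=42, dilution_factor=1e4)
print(f"{example:.2e} CFU per ml")
```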
Computational analysis to determine intracellular Fn
Confocal z-stacks from bacterial–eukaryotic co-cultures of HCT116 cells co-incubated with Fna C1 (SB048, KCOM 3363 and KCOM 3764) or Fna C2 (SB001, SB010 and KCOM 2763) strains were imported into Imaris. All measurements were carried out on three different z-stacks per biological replicate, with three biological replicates. In Imaris, the bacterial surface volumes were created using the fluorescence of the FM 4-64FX membrane stain (surface detail 0.223 µm, background subtraction using diameter of largest sphere of 0.5 µm). The eukaryotic cell detection tool was used to define and identify cells using the nuclear stain and the actin stain. The nuclei were split by seed points. The detected eukaryotic cells were exported to create a cell surface mask. To define intracellular bacterial cells, the bacterial surface was classified by the shortest distance to the eukaryotic cell surface (min to −0.0000001 distance to eukaryotic cell membrane). This new classification was exported as a new ‘intracellular bacterial cell’ surface. To assess the number of eukaryotic cells with intracellular bacteria, the number of objects defined by the eukaryotic cell surface mask with internal objects defined by the ‘intracellular bacterial cell’ surface mask was counted. Statistical comparison of the percentage of HCT116 cells with intracellular Fna bacterial cells by Fna clade was carried out by applying a Welch’s t-test using GraphPad Prism v.7.0 software (GraphPad Software).
Cell length and width measurements
Fna C1 and Fna C2 strain cell dimensions were measured using Fiji with the Bioformats Plugin (required to import Leica.lif files). First, the scale of the image was set by going to Analyze, then Set Scale, and then Set 1 µm to equal 9.979 pixels (pixel size 100.21 nm). Measurements were then captured using the freehand straight-line tool from the brightest point on each cell membrane stain. Statistical comparison of cell length and cell width by Fna clade was carried out by applying a Welch’s t-test using GraphPad Prism v.7.0 software (GraphPad Software).
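The scale entered in Fiji follows directly from the pixel size: a pixel size of 100.21 nm corresponds to about 9.979 pixels per micrometre. The short Python sketch below shows that conversion and the reverse conversion of a measured length; the function names and the example length are illustrative assumptions.

```python
def pixels_per_micrometre(pixel_size_nm):
    """Number of pixels spanning 1 micrometre for a given pixel size in nm."""
    return 1000.0 / pixel_size_nm

def pixels_to_micrometres(length_px, pixel_size_nm=100.21):
    """Convert a length measured in pixels to micrometres."""
    return length_px * pixel_size_nm / 1000.0

scale = pixels_per_micrometre(100.21)
print(round(scale, 3))

# Hypothetical measurement: a cell spanning 25 pixels.
print(round(pixels_to_micrometres(25), 3), "micrometres")
```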
RNA sequencing
Fn strains SB010 and KCOM 3764 were grown on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Fisher Scientific). Plates were incubated at 37 °C in a Bactron600 anaerobic chamber (Sheldon Manufacturing) for 2 days. Subsequent lawns were prepared on FAA + 10% DHB plates and incubated at 37 °C in a Bactron600 anaerobic chamber for 2 days. Cells were resuspended in TSB (Becton Dickinson) and standardized to an optical density at 600 nm (OD600nm) of 0.5. The culture was split into triplicates for each condition and incubated under anaerobic conditions at 37 °C for 4 h. The conditions were as follows: TSB alone, TSB supplemented with 50 mM 1,2-PD (Fisher Scientific) and 20 nM vitamin B12 (Fisher Scientific), or TSB supplemented with 15 mM EA (Fisher Scientific) and 20 nM vitamin B12. SB010 was further incubated in TSB supplemented with 20 nM vitamin B12 under the same conditions. Cells were pelleted at 8,000 r.p.m. for 5 min and washed once in 1× PBS and pelleted again under the same conditions. Cells were then washed once in RNAlater (Thermo Fisher) and pelleted again, and all supernatant was removed before storage at −80 °C. RNA was extracted using the RNeasy Extraction Kit (Qiagen) for Illumina Stranded RNA library preparation with RiboZero Plus rRNA depletion. RNA libraries were sequenced to a minimum read count of 12 million paired-end reads.
Mouse model experiments
Multiple intestinal neoplasia (ApcMin+/−) mice were purchased (Jackson Laboratory, strain No. 002020). Female mice aged 6–8 weeks old were used for two experimental trials with three treatment arms each. Mice were randomly assigned to treatment arms. Mice were treated with streptomycin (2 mg ml−1; Sigma Aldrich) in drinking water for 7 days and then treated with 1.5% dextran sodium sulfate (MP Biomedical) in drinking water for 7 days to induce colitis and facilitate colonic tumours. Mice were then supplied with normal water for 24 h before receiving an oral gavage of Fna strains. Treatment arm 1 mice each received a 200 µl volume of PBS vehicle control, arm 2 mice each received 1 × 10^9 Fna clade 1 (Fna C1) cells in a 200 µl volume, and arm 3 mice each received 1 × 10^9 Fna clade 2 (Fna C2) cells in a 200 µl volume. The Fna C1 slurry was an equal mix of strains KCOM 3363, KCOM 3764 and SB048, and the Fna C2 slurry was an equal mix of strains SB001, SB010 and KCOM 2763. Strain mixes instead of single-strain representatives were chosen to capture a greater proportion of Fna clade-specific genes. Fna strains were grown on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Fisher Scientific). Plates were incubated at 37 °C in a Bactron600 anaerobic chamber (Sheldon Manufacturing) for 2–3 days. Subsequent lawns were prepared on FAA + 10% DHB plates and incubated at 37 °C in a Bactron600 anaerobic chamber for 2 days. For each Fna strain, cells were resuspended in PBS. Strain mixes were prepared by volume on the basis of OD600nm standardized by each strain’s colony-forming units per millilitre at OD600nm = 1 (Fna C1: KCOM 3363 6.71 × 10^7, KCOM 3764 7.27 × 10^7, SB048 1.97 × 10^8; Fna C2: SB001 7.61 × 10^7, SB010 5.00 × 10^8, KCOM 2763 1.82 × 10^8) for an equal mix of cells from each Fna C1 and each Fna C2 strain. Mice were monitored until the end-point (6 weeks post-gavage) when the mice were 15–17 weeks old.
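The volume-based preparation of an equal strain mix can be sketched in Python as follows. The sketch assumes that cell density scales linearly with OD600nm, uses the Fna C1 colony-forming units per millilitre at OD600nm = 1 listed above, and takes hypothetical measured OD values; the function name and the ODs are not from the protocol.

```python
# CFU per ml at OD600nm = 1 for the Fna C1 strains, as listed in the text.
CFU_PER_ML_AT_OD1 = {"KCOM 3363": 6.71e7, "KCOM 3764": 7.27e7, "SB048": 1.97e8}

def volumes_for_equal_mix(cfu_at_od1, measured_od, total_cells):
    """Volume (ml) of each suspension that contributes an equal share of cells.

    Assumes CFU per ml is proportional to OD600nm (an assumption of this sketch).
    """
    cells_per_strain = total_cells / len(cfu_at_od1)
    return {
        strain: cells_per_strain / (cfu_at_od1[strain] * measured_od[strain])
        for strain in cfu_at_od1
    }

# Hypothetical concentrated suspensions, each measured at OD600nm = 10.
ods = {strain: 10.0 for strain in CFU_PER_ML_AT_OD1}
vols = volumes_for_equal_mix(CFU_PER_ML_AT_OD1, ods, total_cells=1e9)
for strain, ml in vols.items():
    print(f"{strain}: {ml:.3f} ml")
```

Strains with fewer colony-forming units per OD unit need proportionally more volume, which is why the mix is prepared by volume rather than by OD alone.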
The Fred Hutchinson Cancer Center Animal Care and Use Committee approved all experimental protocols (IACUC PROTO202100004). All animal work complied with relevant ethical guidelines. Mice were housed on a 12-h light/12-h dark cycle with controlled temperature (65–75 °F (about 18–23 °C)) and humidity (40–60%). Maximal tumour size depended on the number of palpable tumours (1 tumour, maximum 2 cm diameter; 2 tumours, maximum 1.5 cm diameter; ≥3 tumours, maximum under veterinary discretion) and these limits were not exceeded. Intestinal sections from all mice (n = 8 per arm) were blindly assessed by pathology for intestinal adenoma load. To assess differences in intestinal adenoma load by treatment arm, P values were calculated by applying a one-way ANOVA using GraphPad Prism v.7.0 software (GraphPad Software).
Intestinal metabolomics analysis
Metabolomic profiling was conducted using ultrahigh-performance liquid chromatography–tandem mass spectrometry by the metabolomics provider Metabolon on intestinal tissue sections from mice from the second mouse study (n = 4). The global discovery panel used by Metabolon includes 5,400+ metabolites in 70 major pathways, including metabolites of both eukaryotic and bacterial origin. Metabolic pathway enrichment analysis was carried out by Metabolon. Further analysis, including partial least squares discriminant analysis on detected metabolites and heat map clustering were carried out on sample-normalized data using MetaboAnalyst70, v.5.
Mouse faecal DNA extraction and quantitative PCR
DNA was extracted from mouse faecal samples using the Zymo Quick-DNA Microprep Kit (Zymo Research) according to the manufacturer’s instructions. A custom TaqMan primer and probe set was used to amplify Fusobacterium genus DNA (Integrated DNA Technologies) as previously described71. The cycle threshold (Ct) values for the Fusobacterium genus were normalized to the input amount of mouse faecal genomic DNA in each reaction and were assayed in at least duplicate in 20-µl reactions containing 1× final concentration TaqMan Universal PCR Master Mix (Applied Biosystems) and the Fusobacterium TaqMan primer and probe, in a 96-well optical PCR plate. A positive control and non-template control were included in each quantitative PCR run. Fusobacterium copy numbers were estimated following the generation of a standard curve with pure Fna C1 and Fna C2 DNA input. Amplification and detection of DNA was carried out with the QuantStudio 3 Real-Time PCR System (Applied Biosystems) using the following reaction conditions: 10 min at 95 °C and 40 cycles of 15 s at 95 °C and 1 min at 60 °C. Ct was calculated using the automated settings (Applied Biosystems). The primer and probe sequences for the TaqMan assay are as follows: Fusobacterium genus forward primer, 5′-AAGCGCGTCTAGGTGGTTATGT-3′; Fusobacterium genus reverse primer, 5′-TGTAGTTCCGCTTACCTCTCCAG-3′; Fusobacterium genus FAM probe, 5′-CAACGCAATACAGAGTTGAGCCCTGCATT-3′.
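Copy number estimation from a standard curve can be illustrated with the Python sketch below, which fits Ct against log10 of the input copies by ordinary least squares and then inverts the fit. The standard-curve points are invented for illustration, and the function names are assumptions rather than part of the QuantStudio software.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical tenfold dilution series of pure Fna DNA.
std_copies = [1e6, 1e5, 1e4, 1e3, 1e2]
std_ct = [18.0, 21.3, 24.6, 27.9, 31.2]
slope, intercept = fit_standard_curve(std_copies, std_ct)
estimate = copies_from_ct(24.6, slope, intercept)
print(round(slope, 2), round(intercept, 2), f"{estimate:.3g}")
```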
Biolog PM10 phenotype microarray plates
Biolog PM10 plates and corresponding IF-0a and IF-10b solutions were pre-reduced under anaerobic conditions at 4 °C overnight (AnaeroGen Gas Generating Systems, Oxoid, Thermo Fisher Scientific). Fna strains were grown on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Fisher Scientific). Plates were incubated at 37 °C in a Concept1000 anaerobic chamber (BakerRuskinn) for 24 h. Under these same anaerobic conditions, Fna cells were resuspended in 2 ml of pre-reduced IF-0a and normalized across all samples to OD600nm = 0.179 as recommended by Biolog. The final suspension was prepared by combining 0.75 ml of normalized bacterial suspension with 11.25 ml of mix B (100 ml pre-reduced IF-10b with 1.2 ml dye mix D, and 11.18 ml pre-reduced sterile water) to a final volume of 12 ml. For each PM10 plate well, 100 μl of final suspension was added. The PM10 plate was then equilibrated to aerobic conditions at room temperature for 10 min, and then incubated under anaerobic, hydrogen-free conditions for 24 h at 37 °C (AnaeroGen Gas Generating Systems, Oxoid, Thermo Fisher Scientific). Plates were imaged and absorbance at 590 nm was quantified using a plate reader (Biotek).
Glutaminase assay
Fna strains were grown on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Fisher Scientific) in a Concept1000 anaerobic chamber (BakerRuskinn) at 37 °C for 2 days. Sterile cotton swabs were used to resuspend cells in TSB (Becton Dickinson) supplemented with 2.5% yeast extract (Becton Dickinson) and 0.4 mg ml−1 l-cysteine (Alfa Aesar). Fna strains were grown in liquid culture in a Concept1000 anaerobic chamber (BakerRuskinn) at 37 °C for about 20 h. For each strain, 0.75 ml of culture standardized to OD600nm = 1 was spun down at 7,830 r.p.m. The cell pellet was resuspended in 1 ml of Gls solution. Gls solution contains 0.2 g l-glutamine (Sigma Aldrich), 0.01 g bromocresol green (Sigma Aldrich), 18 g sodium chloride (Sigma Aldrich), 0.6 ml Triton X-100 (Sigma Aldrich) and 200 ml deionized water. The Gls solution was filter sterilized after its pH was adjusted to 3.1. For each strain, a 300 μl volume was aliquoted into a conical-bottom 96-well plate in triplicate and incubated anaerobically at 37 °C for 2 h. The plate was spun down for 1 min at 3,000 r.p.m. The supernatant was transferred to a flat-bottom 96-well plate and absorbance at 600 nm was quantified using a plate reader (Biotek).
Acid resistance in simulated gastric fluid
Fna strains were grown on FAA plates (Oxoid, Thermo Fisher Scientific) supplemented with 10% DHB (Fisher Scientific) in a Concept1000 anaerobic chamber (BakerRuskinn) at 37 °C for 1–2 days. The cells were resuspended in 50 ml TSB (Becton Dickinson) supplemented with 2.5% yeast extract (Becton Dickinson) and 0.4 mg ml−1 l-cysteine (Alfa Aesar). The cells were grown in liquid culture in a Concept1000 anaerobic chamber (BakerRuskinn) at 37 °C for 25 h. All strains were standardized to an OD600nm = 1 in 5 ml of supplemented TSB, simulated gastric fluid (Biochemazone) at pH 3 or simulated gastric fluid supplemented with 10 mM glutamate (Sigma Aldrich) at pH 3. Every 10 min, 10 μl of each suspension was spotted on FAA + 10% DHB plates. Plates were incubated anaerobically in a Concept1000 anaerobic chamber (BakerRuskinn) at 37 °C for 3 days.
Patient specimens
All patient tumour tissue included in the analysis was diagnosed as colorectal adenocarcinoma. For patient cohort 1, patients provided signed informed consent for the collection and analysis of their tumour specimens. The use of patient specimens for this work was approved by the Fred Hutchinson Cancer Center Institutional Review Board under protocol numbers RG 1006552, 1005305, 1006664 and 1006974. Patient age, sex and ethnicity were not selection criteria for specimen acquisition. For microbial culturing efforts, primary CRC tumours that were treatment naive were prioritized. For patient cohort 2, samples from BioProject PRJNA362951 were used.
Bacterial 16S rRNA gene sequencing
DNA was extracted from patient tissue as described previously6 and processed with the ZymoBIOMICS Service – Targeted Metagenomic Sequencing (Zymo Research). Bacterial V3–V4 16S ribosomal RNA gene-targeted sequencing was carried out. The V3–V4 targeting primers have been custom-designed by Zymo Research to provide the best coverage of the 16S gene while maintaining high sensitivity. They are based on the general bacterial 16S rRNA gene primers 341F (CCTACGGGNGGCWGCAG) and 805R (GACTACHVGGGTATCTAATCC), which amplify the V3–V4 region of the 16S rRNA gene. The amplification was carried out at a higher annealing temperature to ensure only bacterial sequences were amplified. An extraction control was included and showed no amplification during the library preparation (run to 42 cycles). The sequencing library was prepared using the AccuBIOME Amplicon Sequencing Kit (Zymo Research), in which PCR reactions were carried out in real-time PCR machines to prevent PCR chimera formation. The amplicon libraries were cleaned up with Zymo Research’s Select-a-Size DNA Clean & Concentrator (>200-base-pair fragments were kept), quantified with TapeStation, normalized and pooled together. The final library was quantified with quantitative PCR and sequenced on an Illumina MiSeq with a v3 reagent kit (600 cycles). The sequencing was carried out with >10% PhiX mix and in paired-end mode. Raw sequence reads were trimmed with Trimmomatic-0.33 (ref. 72). Fna clade-specific amplicon sequence variants were designed by the provider CosmosID. We provided 16S rRNA gene sequences for all Fna C1 and Fna C2 strains. As the 16S sequence of Fna C1 branched closely with Fnv (Extended Data Fig. 2a), we additionally provided the 16S rRNA gene sequences for all Fnv strains, to ensure the specificity of an Fna C1 amplicon sequence variant that would not detect Fnv. A custom SILVA database was generated using these 16S rRNA gene sequences and SILVA 138.1 SSU Ref NR99 version, and the DADA2 version of the species training set. First, all sequences in the SILVA database that matched with supplied sequences were removed from SILVA. Next, the custom sequences were added into the SILVA database file, in which the species names were appended on the basis of supplied metadata info (Fna C1, Fna C2 or Fnv). Analysis on this database was then run through the nf-core AmpliSeq pipeline, with the parameters --FW_primer CCTACGGGRSGCAGCA, --RV_primer GACTACHVGGGTATCT, --trunc_qmin 20, --trunc_rmin 0.2, --max_ee 6, --min-frequency 1, --picrust, and --dada_ref_tax_custom.
Meta-analysis of Fna clades in relation to CRC using publicly available shotgun metagenomic samples
To study the association between each Fna clade and CRC, we profiled shotgun stool metagenomic samples from 9 publicly available cohorts (Supplementary Table 22), for a total of 627 patients with CRC and 619 healthy individuals using MetaPhlAn4 (ref. 63; https://github.com/biobakery/biobakery/wiki/metaphlan4) against an Fna clade-specific database generated from our Fna genomes, which are available at the National Center for Biotechnology Information (NCBI) under the BioProject accession number PRJNA549513. A distinct species-level genome bin (SGB)73 could be identified for each Fn subspecies and Fna clade (Fna C1: SGB6013, Fna C2: SGB6007, Fnn: SGB6011, Fnp: SGB6001, Fnv: SGB6014). Each SGB was associated with the sample condition by fitting an ordinary least squares model of the shape: arcsine-square-root-transformed SGB abundance ~ study condition + C(sex) + age + BMI + sequencing depth of sample. For each model, an adjusted standardized mean difference between the two study conditions was extracted as previously described74: standardized mean difference = (t × (n1 + n2))/(sqrt(n1 × n2) × sqrt(n1 + n2 − 2)), in which t defines the t-score of the corresponding variable, n1 is the number of samples in the zero class, n2 is the number of samples in the one class, and n1 + n2 − 2 is the degrees of freedom for the model. Corresponding standard errors were computed as: s.e. = sqrt(((n1 + n2 − 1)/(n1 + n2 − 3)) × (4/(n1 + n2)) × (1 + ((standardized mean difference)^2/8))). Statistical significance was assessed by the two-tailed Wald test. Effect sizes were pooled and analysed using random-effect meta-analysis75 using the Paule–Mandel heterogeneity estimator76. The statistical significance of the meta-analysis was computed as the z-score of the null hypothesis that the average effect is zero75. All P values were corrected using the Benjamini–Yekutieli method.
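The effect size and its standard error can be written as two small Python functions. This is a sketch of the standard conversion from a t-score to a standardized mean difference, with sqrt(n1 × n2) in the denominator, and of the standard error expression given above; the function names and the example values are assumptions for illustration.

```python
import math

def smd_from_t(t, n1, n2):
    """Standardized mean difference from a model t-score and the two class sizes."""
    return t * (n1 + n2) / (math.sqrt(n1 * n2) * math.sqrt(n1 + n2 - 2))

def smd_standard_error(d, n1, n2):
    """Standard error of a standardized mean difference d."""
    n = n1 + n2
    return math.sqrt(((n - 1) / (n - 3)) * (4 / n) * (1 + d ** 2 / 8))

# Hypothetical cohort: 50 controls, 50 cases, t-score of 2 for the condition term.
d = smd_from_t(2.0, 50, 50)
se = smd_standard_error(d, 50, 50)
print(round(d, 4), round(se, 4))
```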
Mapping of putative eut, pdu and gdar operons in publicly available metagenomic samples
To assess the presence of putative eut, pdu and gdar system operons in patients with CRC compared to healthy individuals, we profiled shotgun stool metagenomic samples from 9 publicly available cohorts (Supplementary Table 22), for a total of 627 patients with CRC and 619 healthy individuals. Metagenomic samples were mapped against the Fna SB010 eut, pdu and gdar operons using Bowtie2 (version 2.4.5, --sensitive parameter)77. Breadth and depth of coverage of each gene in the operons was assessed using the breadth_depth.py script of the CMSeq tool (parameters --minqual 30 --mincov 1)78. Detected genes had a breadth of coverage threshold above 50%. For eut and pdu results, putative operons had a threshold of presence of 90% of eut and pdu genes relative to the Fna SB010 operon structures. For gdar results, putative operons had a threshold of presence of 100% of gdar genes relative to the Fna SB010 operon structure.
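The two-level thresholding described above (gene detection by breadth of coverage, then operon presence by fraction of detected genes) can be sketched in Python. The gene identifiers and breadth values are placeholders, and treating the operon threshold as "at least" the stated fraction is an assumption of this sketch.

```python
def detected_genes(breadth_by_gene, min_breadth=0.5):
    """Genes whose breadth of coverage is above the detection threshold."""
    return {gene for gene, breadth in breadth_by_gene.items() if breadth > min_breadth}

def operon_present(breadth_by_gene, min_fraction):
    """True when the detected fraction of operon genes reaches min_fraction."""
    found = detected_genes(breadth_by_gene)
    return len(found) / len(breadth_by_gene) >= min_fraction

# Placeholder operon of 10 genes in one sample; one gene is poorly covered.
sample = {f"gene_{i}": 0.95 for i in range(1, 10)}
sample["gene_10"] = 0.20

eut_like_call = operon_present(sample, min_fraction=0.9)   # 90% rule (eut, pdu)
gdar_like_call = operon_present(sample, min_fraction=1.0)  # 100% rule (gdar)
print(eut_like_call, gdar_like_call)
```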
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.