Clinical samples and information

Peripheral blood samples were collected from ten patients enrolled in valemetostat phase I (NCT02732275) or phase II (NCT04102150) trials. No statistical methods were used to determine sample size because this study was exploratory; the sample size was therefore determined by the availability of recruited patients. All cases of relapsed ATL were categorized into clinical subtypes according to Shimoyama’s criteria38. This translational study was approved by the Institutional Review Boards of the participating institutes (the University of Tokyo, the University of Ryukyus and Daiichi Sankyo Co., Ltd.). Written informed consent was obtained from all patients. PBMCs from patients with ATL were isolated by Ficoll separation (Ficoll-Paque, GE Healthcare). Clinical information, including abnormal lymphocytes and sIL-2R, was provided by the hospitals. The HTLV-1 proviral load measurement was previously described14. In brief, quantitative multiplex real-time PCR was performed with two sets of primers specific for the HTLV-1 provirus and the human gene encoding the RNase P enzyme. The proviral load was expressed as copy numbers per 100 PBMCs, assuming that infected cells had one copy of the integrated HTLV-1 provirus per cell. All clinical samples and data are provided in Supplementary Table 1.
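The proviral load calculation described above can be sketched as follows. This is a minimal illustration, not the published assay code: the function name is hypothetical, and it assumes that the RNase P gene is present at two copies per diploid cell, which the text does not state explicitly.

```python
def proviral_load_per_100_cells(htlv1_copies: float,
                                rnasep_copies: float,
                                rnasep_copies_per_cell: int = 2) -> float:
    """Return HTLV-1 proviral copies per 100 cells.

    Assumption (not stated in the text): RNase P is a single-copy gene,
    so a diploid cell carries two copies of it.
    """
    if rnasep_copies <= 0:
        raise ValueError("RNase P copy number must be positive")
    # Number of cells represented in the reaction, inferred from the host gene.
    cells = rnasep_copies / rnasep_copies_per_cell
    return 100.0 * htlv1_copies / cells


# Example: 500 proviral copies measured alongside 2,000 RNase P copies.
load = proviral_load_per_100_cells(500, 2000)
```

Under the one-provirus-per-cell assumption given in the text, the returned value can be read directly as the percentage of infected cells.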

Cell culture

ATL-derived TL-Om1 cells were provided by an established researcher K. Sugamura. ATN-1 cells were purchased from the RIKEN BRC cell bank (RCB1440). The diffuse large B cell lymphoma cell line WSU-DLCL2 was purchased from DSMZ (ACC 575). HEK293T cells were purchased from the American Type Culture Collection (CRL-3216). HEK293FT cells were purchased from Thermo Fisher Scientific (R70007). These cell lines were verified by each cell bank or established researchers and monitored for cross-contamination. The HTLV-1-infected cell lines had been authenticated based on the provirus integration sites and somatic mutations by panel-based targeted sequencing18. Cell-surface expressions of CD4 and CADM1 were validated by flow cytometry. HTLV-1-infected, patient-derived tumour cell lines were established by long-term culture in complete medium RPMI1640 (Invitrogen) with 20% FBS (GIBCO) and 10 ng ml−1 IL-2 (Peprotech). Genetic mutations and clonality of the propagating cells were confirmed by targeted sequencing. Commonly misidentified cell lines were not used in this study. The cell lines were also tested for mycoplasma contamination using mycoplasma detection PCR (6601, Takara) and were negative for mycoplasma contamination. Normal (HTLV-1-uninfected) CD4+ T cells were obtained from Lonza. All lymphoma cell lines were cultured in RPMI1640 with 10% FBS and antibiotics (Gibco). 293T and 293FT cells were cultured in DMEM (Nissui) with 10% FBS and antibiotics. All cell lines and primary cultures were maintained at 37 °C with 5% CO2.

Flow cytometry

ATL cell populations were obtained using a HAS-flow method as previously described20. Single-cell suspensions of lymphocytes were stained with fluorescent-labelled antibodies. An unlabelled CADM1 antibody (CM004-6, clone 3E1) and an isotype control chicken IgY antibody (2:100; PM084) were purchased from MBL. These were biotinylated (primary amine biotinylation) using biotin N-hydroxysuccinimide ester (Sigma-Aldrich). Anti-CD14–Pacific orange antibody (MHCD1430, clone TuK4) was purchased from Invitrogen. All other antibodies were obtained from BioLegend. Cells were stained using a combination of anti-CADM1–biotin (1:100; CM004-6, MBL), anti-CD7–APC (5:100; clone CD7-6B7), anti-CD3–APC–Cy7 (5:100; clone SK7), anti-CD4–Pacific blue (5:100; clone RPA-T4) and anti-CD14–Pacific orange (5:100) antibodies. After washing, phycoerythrin-conjugated streptavidin (2:100; SA10041, Thermo Fisher Scientific for phase I study; 1:80; 554061, BD Biosciences for phase II study) was applied. Propidium iodide (Sigma-Aldrich) or 7-AAD (51-68981, BD Biosciences) was added to the samples to stain dead cells immediately before flow cytometry.

For intracellular staining of H3K27me3, we improved the HAS-Flow method. First, PBMCs (5 × 10^6) were washed and incubated with Ghost Dyes viability dye (Tonbo Biosciences). Then, the cells were stained using a combination of anti-CD3–APC–Cy7, anti-CD4–Pacific blue, anti-CD7–phycoerythrin–Cy7 (5:100; clone M-T701), anti-CD14–Pacific orange (or BV510 for phase II study), anti-CADM1–biotin and streptavidin–phycoerythrin. The surface-stained cells were then fixed and permeabilized using BD Cytofix fixation buffer (554655, BD Biosciences) and BD Phosflow Perm buffer IV (560746, BD Biosciences) according to the manufacturer’s instructions. After washing, the permeabilized cells were stained with anti-H3K27me3–Alexa Fluor 488 (1:200; 5499, clone C36B11, Cell Signaling Technology), anti-histone H3–Alexa Fluor 647 (1:400; 12230, clone D1H2, Cell Signaling Technology), anti-rabbit IgG isotype control–Alexa Fluor 488 (1:400; 4340, clone DA1E, Cell Signaling Technology) and anti-rabbit IgG isotype control–Alexa Fluor 647 (1:400; 3452, clone DA1E, Cell Signaling Technology). A FACSAria II or FACSLyric instrument (BD Biosciences) was used for multicolour flow cytometry and fluorescence-activated cell sorting. The collected data were analysed by FlowJo software (v10.7.1, Tree Star). CD4+CADM1+CD7− cells and CD4+CADM1−CD7+ cells were analysed as malignant ATL cells and non-malignant cells, respectively. Tumour H3K27me3 levels (mean fluorescence intensity) were calculated by normalization to the data from normal CD4+ T cells.

Targeted deep sequencing

Genomic DNA from enriched cell populations, PBMC, buccal swabs and cell lines were extracted using the QIAamp DNA Blood Mini Kit (Qiagen). Target capture was conducted using the SureSelect Target Enrichment System (Agilent Technologies).

To comprehensively cover genes involved in ATL, 280 human genes were selected, including 50 genes frequently mutated in ATL16 and 190 genes frequently mutated in haematological and solid malignancies. Agilent SureDesign web-based application was used for capture bait design as previously described18. The sequence data were obtained using the HiSeq2500 or NovaSeq 6000 system (Illumina) with 100-bp paired-end reads. The sequenced data were aligned to the human reference genome hg38 by BWA (v0.7.15) software. The PCR duplicates were removed using Picard (v2.92) and SAMtools (v1.2) software39. Matched buccal DNA was used as matched normal controls to call somatic mutations. The somatic mutation candidates were called using MuTect2 from GATK (v4.0.12) software40 and annotated with ANNOVAR (v20191024)41. Candidate mutations, with (1) 5 or more variant reads in tumour samples, (2) a variant allele frequency in tumour samples 0.01 or more, (3) read depth of 200 or more, and (4) tumour variant with a normal variant ratio of 2 or more, were adopted and further filtered by excluding synonymous SNVs.
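The four filtering criteria above can be expressed as a short predicate. The sketch below is illustrative only: the record fields and function names are hypothetical, "read depth" is assumed to mean tumour depth at the variant position, and the tumour-to-normal ratio is assumed to be a ratio of variant allele frequencies.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-variant record; field names are illustrative."""
    tumour_alt_reads: int
    tumour_depth: int
    normal_alt_reads: int
    normal_depth: int
    synonymous: bool


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency; zero depth is treated as zero VAF."""
    return alt_reads / depth if depth else 0.0


def passes_filters(c: Candidate, min_alt_reads: int = 5, min_vaf: float = 0.01,
                   min_depth: int = 200, min_tn_ratio: float = 2.0) -> bool:
    tumour_vaf = vaf(c.tumour_alt_reads, c.tumour_depth)
    normal_vaf = vaf(c.normal_alt_reads, c.normal_depth)
    # Assumption: criterion (4) compares tumour VAF with normal VAF;
    # a variant absent from the normal sample always satisfies it.
    ratio_ok = normal_vaf == 0 or tumour_vaf / normal_vaf >= min_tn_ratio
    return (c.tumour_alt_reads >= min_alt_reads
            and tumour_vaf >= min_vaf
            and c.tumour_depth >= min_depth
            and ratio_ok
            and not c.synonymous)


kept = passes_filters(Candidate(10, 400, 0, 300, False))
```

Exposing the thresholds as parameters lets the same predicate be reused with a different allele frequency cut-off where the analysis calls for one.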

Clonality analysis

The clonality analysis of HTLV-1-infected cells was performed by high-throughput sequencing-based mapping of proviral integration sites18. To designate the virus integration sites, sequence reads were aligned to the human reference genome hg38 and the virus genome (NC_001436.1) by BWA. Paired-end reads spanning the viral and human genomes and soft-clipped reads (15 bp or more soft-clipped region) were extracted using Perl scripts and then validated by Blastn (v2.6.0+). The clonality was calculated as the population size of each clone by counting the extracted reads at host–provirus junction sites. We used PyClone (v0.13.0)42 to analyse subclonal population structure and to reconstruct hierarchical trees. PyClone is based on a Bayesian clustering method, which uses a Markov chain Monte Carlo-based framework to estimate cellular prevalence values using somatic mutations. The somatic mutation candidates for PyClone were called using MuTect2, with (1) 5 or more variant reads in tumour samples, (2) a variant allele frequency in tumour samples of 0.05 or more, (3) a read depth of 200 or more, and (4) tumour variant with a normal variant ratio of 2 or more. The clonal composition was investigated based on the β-binomial emission model, through which a set of clones with a discrete set of mutations (mutational clusters) were imputed together with their estimated clone size. The process of the clonal evolution was estimated by extrapolation of the estimated clone sizes at all tested time points. The hierarchical trees with imputed mutational subclusters were depicted by ClonEvol (v0.99.11) based on the results of clustering and cellular prevalence from the PyClone model.
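As a rough illustration of how clone size can be derived from junction read counts, the sketch below converts a list of host–provirus junction sites (one entry per extracted read) into per-clone fractions. The function name and the site labels are hypothetical, and the sketch ignores any deduplication or shear-site correction that a full pipeline might apply.

```python
from collections import Counter
from typing import Dict, Iterable


def clone_fractions(junction_sites: Iterable[str]) -> Dict[str, float]:
    """Fraction of junction reads supporting each integration site.

    Each element of `junction_sites` is the integration site assigned to one
    extracted read (for example "chr1:100"; labels here are illustrative).
    """
    counts = Counter(junction_sites)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


# Three reads at one site and one read at another.
fractions = clone_fractions(["chr1:100", "chr1:100", "chr1:100", "chr5:42"])
```

In this toy input the dominant clone accounts for three quarters of the junction reads.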

Whole-genome sequencing

For whole-genome sequencing, somatic variant detection was carried out using next-generation sequencing by Azenta Japan Corporation (formerly, Genewiz Japan). In brief, genomic DNA from patient PBMC and matched buccal swabs was quantified and qualified by NanoDrop, Qubit dsDNA HS assay (Thermo Fisher) and agarose gel electrophoresis. Genomic DNA (1 µg) was sheared into fragments of approximately 350 bp by an ultrasonicator (Covaris), followed by DNA purification and confirmation of DNA fragment size. Essentially the entire amount of fragmented genomic DNA was used for library preparation with a PCR-free method (MGIEasy PCR-Free DNA Library Prep Set, MGI tech). The resulting whole-genome sequencing libraries were quantified by Qubit dsDNA HS assay and their fragment size distribution was confirmed by TapeStation D1000 ScreenTape (Agilent). The libraries in the double-stranded DNA form were further processed into single-stranded circular DNA, which is the final form of the MGI sequencing library. The single-stranded circular DNA libraries were quantified by Qubit ssDNA Assay Kit (Thermo Fisher) and used for generating DNA nanoballs by rolling circle replication reaction. DNA nanoballs were then loaded into a flow cell for sequencing on the DNBSEQ-G400 platform (MGI tech) with a 150-bp paired-end configuration, according to the manufacturer’s instructions, yielding approximately 320 Gb of data per library. Sequence data cleaning was performed by the Cutadapt software (v1.9.1)43. The Sentieon pipeline was used to call germline single-nucleotide variants/indels and somatic variations. Copy number variation was detected by Control-FREEC44.

RNA-seq

Total RNA of each sample was extracted using TRIzol reagent (Invitrogen) and quantified and qualified by the Agilent 2100 Bioanalyzer (Agilent Technologies), NanoDrop (Thermo Fisher Scientific) and 1% agarose gel. Of total RNA with an RNA integrity number (RIN) value above 7, 20 ng was used following library preparation. The library preparation and sequencing were processed and analysed by Genewiz. The libraries with different indices were multiplexed and loaded on an Illumina HiSeq instrument according to the manufacturer’s instructions (Illumina). Sequencing was carried out using a 2 × 150-bp paired-end configuration; image analysis and base calling were conducted by the HiSeq control software (HCS v2.2.38 or later) plus OLB plus GAPipeline-1.6 (Illumina) on the HiSeq instrument. For quality control, to remove technical sequences, including adapters, PCR primers or fragments thereof, and quality of bases lower than 20, pass filter data of fastq format were processed by Trimmomatic (v0.30) to be high-quality clean data. For mapping, Hisat2 (v2.0.1) was used to index the reference genome sequence. Finally, clean data were aligned to the reference genome via the software Hisat2.

scATAC-seq

The single-cell ATAC-seq (scATAC-seq) library was constructed using the Chromium Controller and Chromium NextGEM Single Cell ATAC Reagent Kits v1.1 (10x Genomics) following the standard manufacturer’s protocols. To collect live cells for ATAC-seq, PBMC cryovials (1–10 × 10^6 cells per 1 ml of CELLBANKER 1 (Zenoaq resource)) were removed from liquid nitrogen or a −80 °C freezer and warmed in a 37 °C water bath. Cells were then pelleted by centrifugation at 500g for 5 min and resuspended in PBS. After two washes with PBS, nuclei isolation was conducted by the 10x Chromium standard protocol. Chilled lysis buffer (100 µl) was added to the pellet, which was then incubated for 3 min on ice. Chilled wash buffer (1 ml) was added immediately to the lysed cells, followed by two washes. Then, the lysed cells were resuspended in an appropriate volume of chilled diluted nuclei buffer, and 1.6 × 10^4 nuclei were immediately incubated in a transposition mix to recover 10,000 nuclei. After transposition, the sample was loaded onto the 10x Chromium Controller to recover 10,000 nuclei. Gel beads were prepared according to standard manufacturer’s protocols. Oil partitions of single nuclei with oligo-coated gel beads (GEMs) were captured and thermal cycling was performed, resulting in single-stranded DNA tagged with a 10x cell barcode. The library was sequenced using the NovaSeq 6000 system (Illumina) according to the manufacturer’s instructions. For ATAC libraries, sequencing was performed using a 50 × 49-bp paired-end configuration following the manufacturer’s protocol. After sequencing analysis, fastq files were created by the Cell Ranger ATAC (v2.0.1) mkfastq pipeline (10x Genomics). The obtained fastq files were mapped to the reference genome provided by 10x Genomics (GRCh38). The Cell Ranger ATAC count pipeline (v2.0.1) was used to perform demultiplexing, aligning reads, filtering, peak calling, clustering and motif activity analyses, using default parameters. The Cell Ranger data were imported into the Loupe Cell Browser software (v6.0.0) for t-SNE-based clustering, heat map generation and promoter activity plots.

scRNA-seq

The scRNA-seq library was constructed using the Chromium Controller and Chromium Single Cell 5′ Reagent Kits and 3′ Reagent Kits v2 (10x Genomics) following the standard manufacturer’s protocols. To collect live cells for scRNA-seq, PBMC cryovials (1–10 × 10^6 cells per 1 ml of CELLBANKER 1) were removed from liquid nitrogen or a −80 °C freezer and warmed in a 37 °C water bath. Cells were then pelleted by centrifugation at 500g for 5 min and resuspended in PBS. After two washes with PBS, cells were pipetted through a 40-μm filter to remove cell doublets and contamination. Cell viability (more than 60%) was confirmed by trypan blue staining. The collected single-cell suspension from PBMCs (1.6 × 10^4 live cells per sample) was immediately loaded onto the 10x Chromium Controller to recover thousands of cells from each subpopulation for library preparation and sequencing. Gel beads were prepared according to the standard manufacturer’s protocols. Oil partitions of single cells with GEMs were captured and reverse transcription was performed, resulting in cDNA tagged with a cell barcode and unique molecular index (UMI). The library was sequenced using the NovaSeq 6000 system (Illumina) according to the manufacturer’s instructions. Sequencing was carried out using a 1 × 91–98-bp single-end configuration (default setting), which is sufficient to align confidently to the transcriptome. After sequencing analysis, fastq files were created by the Cell Ranger (v3.1.0) mkfastq pipeline (10x Genomics). The obtained fastq files were mapped to the reference genome provided by 10x Genomics (GRCh38). The Cell Ranger count pipeline (v3.1.0) was used to perform demultiplexing, aligning reads, filtering, clustering and gene expression analyses, using default parameters. In brief, after read trimming, Cell Ranger used an aligner called STAR, which performs splicing-aware alignment of reads to the genome. Cell Ranger further aligned exonic and intronic confidently mapped reads to annotated transcripts by examining their compatibility with the transcriptome. Only uniquely mapping exonic reads were carried forward to UMI counting. After the UMI filtering steps with default parameters and expected cell counts, each observed barcode, UMI and gene combination was recorded as a UMI count in the feature–barcode matrix. The workflow also applied an improved cell-barcode calling algorithm, which identified the primary mode of high-RNA-content cells and also captured low-RNA-content cells.

After data processing, we recovered quality-assured data for secondary analysis of gene expression. To correct batch effects between time points, we used a Cell Ranger merge algorithm. To regress out cell–cell variation in gene expression driven by batch and to cluster the corrected data from different time points, we used the standard Seurat v3 integration workflow with the functions FindIntegrationAnchors() and IntegrateData(). The Cell Ranger data or batch-corrected data were imported into Loupe Cell Browser software (v6.0.0) for t-SNE-based clustering, heat map generation and gene expression distribution plots.

Single-cell multiome analysis

The single-cell multiome (scMultiome) libraries were constructed using the Chromium Controller and 10x Genomics Chromium Next GEM Single Cell Multiome ATAC plus Gene Expression following the standard manufacturer’s protocols (CG000365 Rev C, CG000338 Rev F, 10x Genomics). The libraries were sequenced using the NovaSeq 6000 system (Illumina) according to the manufacturer’s instructions. For ATAC libraries, sequencing was performed using a 50 × 49-bp paired-end configuration. RNA library sequencing was performed using a 28 × 91-bp paired-end configuration. The scMultiome dataset was first processed using Cell Ranger ARC v2.0.0 (10x Genomics). BCL files were converted into fastq using the command cellranger-arc mkfastq with default parameters. The fastq files were then processed by cellranger-arc count and merged by cellranger-arc aggr. To remove batch effects, the scMultiome RNA dataset was processed by Seurat (v4.3.0)24 reciprocal principal component analysis (clustering parameters: principal component analysis dimensions 1–30, resolution 0.5). The scMultiome ATAC dataset was recounted by Signac (v1.9.0)45 using the merged peak bed files and processed by Harmony (v0.1.1)46.

Single-cell mutation identification and analysis

RNA variants from scRNA-seq data were validated from curated BAM files based on the results of Cell Ranger. For each cell barcode in the filtered Cell Ranger barcode list, and each somatic variant in the targeted sequencing data, variant bases were identified. Only reads with a Chromium cellular barcode tag and a Chromium molecular barcode tag were included. We then obtained the cell-associated tag for downstream analysis of UMIs. Chromium cellular barcodes for which at least one mutant read was extracted by SAMtools were defined as mutant cells and mapped on each t-SNE projection using Loupe Cell Browser software. Almost all variants were validated by manual review to identify mutant cells accurately. One-sided Fisher’s exact tests were used to identify cell clusters that were enriched for somatic mutations (P < 0.05).
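The enrichment test can be illustrated with a small, dependency-free sketch. It computes the one-sided Fisher's exact P value as the upper tail of the hypergeometric distribution; the function name and the 2 × 2 layout (mutant versus non-mutant barcodes, inside versus outside a cluster) are assumptions made for illustration, not a description of the scripts actually used.

```python
from math import comb


def fisher_enrichment_p(cluster_mutant: int, cluster_total: int,
                        all_mutant: int, all_total: int) -> float:
    """One-sided Fisher's exact P value for enrichment in a cluster.

    Returns P(X >= cluster_mutant), where X is the number of mutant cells
    expected in a random draw of `cluster_total` cells out of `all_total`
    cells, of which `all_mutant` carry at least one mutant read.
    """
    k_max = min(cluster_total, all_mutant)
    denominator = comb(all_total, cluster_total)
    tail = sum(
        comb(all_mutant, k) * comb(all_total - all_mutant, cluster_total - k)
        for k in range(cluster_mutant, k_max + 1)
    )
    return tail / denominator


# Toy example: all 5 mutant cells among 10 cells fall in one 5-cell cluster.
p_value = fisher_enrichment_p(5, 5, 5, 10)
enriched = p_value < 0.05
```

In the toy example only one of the 252 possible draws is as extreme as the observation, so the cluster would be called enriched at the P < 0.05 threshold.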

Virus reads and host–virus chimeric reads from single-cell data

For detection of virus reads from scATAC-seq and scRNA-seq data, we processed Cell Ranger GRCh38-aligned sequence data. Unmapped and soft-clipped reads (more than 20 bp soft-clipped) were extracted using Python scripts. The pass-filter data of fastq format were processed to remove adapter and polyA sequences. The high-quality clean data were then aligned to the human reference genome (hg38) and virus genome (NC_001436.1) via the software STAR. For detection of cells expressing virus genes, Chromium cellular barcodes with at least one virus read detected were defined as virus-positive. Almost all virus-aligned reads were derived from the antisense strand. Both host-aligned and virus-aligned soft-clipped reads were extracted as host–virus chimeric reads. Genomic breakpoints of chimeric reads were analysed from supplementarily mapped data from STAR alignment to link the clone-specific chimeric reads with the viral integration sites identified in the corresponding clones. The extracted Chromium cellular barcode tags with virus antisense reads or clone-specific host–virus chimeric reads were mapped on t-SNE projection using the Loupe Cell Browser. One-sided Fisher’s exact tests were used to identify cell clusters that were enriched for virus reads (P < 0.05).

Cluster assignment and single-cell data analysis

Promoter activity (promoter sum) and expression patterns of CD4, CADM1 and CD7 were used and overlaid on the t-SNE to identify ATL tumour clusters using the Loupe Cell Browser. Chromium cellular barcodes with HTLV-1-derived antisense transcripts (scRNA-seq) and proviral DNA reads (scATAC-seq) were overlaid on the t-SNE. The HTLV-1-derived reads served for inference of infected cells (P < 0.05). Infected clone-specific host–virus chimeric reads were significantly enriched in each cluster (P < 0.05). To detect the mutation-harbouring clones estimated by PyClone, RNA variants from scRNA-seq data were validated from curated BAM files based on the results of Cell Ranger. Chromium cellular barcode tags with variant reads were defined as at least one mutant read detected and mapped on each t-SNE projection (P < 0.05). log2 Fold change and median-normalized average values of assigned clusters were obtained via the Loupe Cell Browser and used in the following analysis of differentially expressed genes within each cluster. Manual clustering based on expression patterns was curated by original Python scripts or polygonal selection tool (Loupe Cell Browser interface).

ChIP–seq

Tumour cells (1 × 10^7) sorted by surface markers (CD4+CADM1+CD7−) or normal CD4+ T cells from HTLV-1-negative healthy donors were fixed by adding 1/10 volume of freshly prepared formaldehyde solution (11% (v/v) formaldehyde, 100 mM NaCl, 1 mM EDTA (pH 8.0) and 50 mM HEPES (pH 7.9)) to the existing media or PBS and incubated for 15 min at room temperature. Fixation was stopped by adding 1/20 volume of a 1.25 M glycine solution and incubating for 5 min at room temperature. Subsequently, cells were collected and washed twice with chilled PBS with 0.5% (v/v) Igepal. The cell pellet was snap-frozen on dry ice. Further processing and ChIP experiments including chromatin extraction, fragmentation, antibody precipitation and library preparation were performed at Active Motif using validated antibodies to H3K27me3 (39155, polyclonal, Active Motif), H3K27ac (39133, polyclonal, Active Motif) and SUZ12 (39357, polyclonal, Active Motif).

Illumina sequencing libraries were prepared from the ChIP and input DNAs by the standard consecutive enzymatic steps of end-polishing, dA-addition and adaptor ligation. After a final PCR amplification step, the resulting DNA libraries were quantified and sequenced on NextSeq 500 from Illumina (75-nt reads, single end). Reads were aligned to the human genome (hg38) using the BWA algorithm (v0.7.12). Duplicate reads were removed, and only uniquely mapped reads (mapping quality ≥ 25) were used for further analysis. Alignments were extended in silico at their 3′ ends to a length of 200 bp, which is the average genomic fragment length in the size-selected library, and assigned to 32-nt bins along the genome. The resulting histograms (genomic ‘signal maps’) were stored in bigWig files. Peak calling for H3K27me3 and H3K9me3 was performed using the SICER algorithm (v1.1) with a cut-off of P = 10^−10. Peak calling for H3K27ac was performed using the MACS algorithm (v2.1.0) with a cut-off of P = 10^−7. Peaks that were on the ENCODE blacklist of known false ChIP–seq peaks were removed. Signal maps and peak locations were used as input data to the Active Motif proprietary analysis program, which creates Excel tables containing detailed information on sample comparison, peak metrics, peak locations and gene annotations. EaSeq software (v1.111)47 was also used to calculate each peak value and create heat maps. For the TSS plot, the ChIP–seq dataset was normalized by input data and visualized by Deeptools (v3.3.1)48.
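The in silico extension and binning step can be sketched as follows. This is a simplified illustration for a single chromosome with 0-based, half-open coordinates; the function names are hypothetical, and the sketch assumes that each extended fragment contributes one count to every 32-nt bin it overlaps, which may differ from the vendor's actual implementation.

```python
from collections import Counter
from typing import Iterable, Tuple

Read = Tuple[int, int, str]  # (start, end, strand), 0-based half-open


def extended_fragment(start: int, end: int, strand: str,
                      fragment_len: int = 200) -> Tuple[int, int]:
    """Extend an alignment at its 3' end to `fragment_len` bases."""
    if strand == "+":
        # The 3' end lies to the right of a forward-strand read.
        return start, max(end, start + fragment_len)
    # The 3' end lies to the left of a reverse-strand read.
    return max(0, min(start, end - fragment_len)), end


def bin_signal(reads: Iterable[Read], bin_size: int = 32,
               fragment_len: int = 200) -> Counter:
    """Count extended fragments per bin (assumed rule: any overlap counts)."""
    signal = Counter()
    for start, end, strand in reads:
        f_start, f_end = extended_fragment(start, end, strand, fragment_len)
        for b in range(f_start // bin_size, (f_end - 1) // bin_size + 1):
            signal[b] += 1
    return signal


signal = bin_signal([(0, 75, "+"), (1000, 1075, "-")])
```

The resulting per-bin counts correspond to the histogram that would be written to a bigWig signal map.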

DNA methylation profiling

For DNA methylation profiling, genomic DNA was extracted from enriched tumour cell populations (CD4+CADM1+CD7−) and cell lines using the QIAamp DNA Blood Mini Kit (Qiagen). DNA methylation levels were analysed using the Infinium MethylationEPIC BeadChip (more than 850,000 probes) (Illumina). Quality testing of the double-stranded DNA was performed by measuring absorbance with NanoDrop2000 and fluorescence with Qubit (Thermo Fisher Scientific), followed by quality testing by agarose gel electrophoresis. Genomic DNA was used for bead array analysis by iScan (Illumina) according to the Infinium HD methylation protocol guide, manual protocol (15019519 v01). Bisulfite conversion, hybridization and further data processing were performed at Takara Bio. In brief, bisulfite conversion of 250 ng of genomic DNA was performed using the EZ DNA Methylation Kit (Zymo Research). The bisulfite-converted DNA was alkaline denatured and subjected to enzymatic whole-genome amplification. The amplified genomic DNA was fragmented by enzyme, purified by isopropanol precipitation and resuspended in buffer. The resuspended DNA was heat denatured and applied to the Infinium MethylationEPIC BeadChip for hybridization at 48 °C in an oven for approximately 23 h. After hybridization, the BeadChip was washed with buffer, and a single nucleotide labelled at the probe end was incorporated by a single-nucleotide elongation reaction. The hybridized genomic DNA was then denatured, removed and stained with a fluorescent dye-labelled antibody against the incorporated labelled nucleotide. The stained BeadChip was washed, coated, dried and then fluorescence images were acquired using iScan. Normalization by background subtraction and internal controls was performed using GenomeStudio (V2011.1) or Methylation Module (v1.9.0) to analyse the acquired fluorescence image data. Each CpG site was annotated by distance from the TSS of the genes (hg38). Only CpG sites within ±5 kb of the TSS were used for further integrative analyses. The β-value was used as the methylation level (%), and probes that fluctuated more than 10% were defined as differentially methylated sites. BigWig files were created using the Enhancer Linking by Methylation/Expression Relationship (ELMER) package with the function createBigWigDNAmetArray().
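The probe selection rule can be sketched in a few lines. The example below is an interpretation rather than the pipeline used: it assumes that "fluctuated more than 10%" means an absolute difference in β-value greater than 0.10 between two samples, and the function name, probe identifiers and input dictionaries are hypothetical.

```python
from typing import Dict


def differential_probes(beta_before: Dict[str, float],
                        beta_after: Dict[str, float],
                        tss_distance: Dict[str, int],
                        max_distance: int = 5000,
                        min_delta: float = 0.10) -> Dict[str, float]:
    """Return {probe: delta beta} for TSS-proximal probes that changed.

    Assumption: a probe is differentially methylated when its beta value
    changes by more than `min_delta` (10 percentage points) in either direction.
    """
    hits = {}
    for probe, before in beta_before.items():
        if probe not in beta_after:
            continue
        # Keep only probes annotated within +/-5 kb of a TSS.
        if abs(tss_distance.get(probe, float("inf"))) > max_distance:
            continue
        delta = beta_after[probe] - before
        if abs(delta) > min_delta:
            hits[probe] = delta
    return hits


hits = differential_probes(
    {"cg1": 0.20, "cg2": 0.50, "cg3": 0.10},
    {"cg1": 0.45, "cg2": 0.55, "cg3": 0.90},
    {"cg1": -1200, "cg2": 300, "cg3": 8000},
)
```

Here only the first probe is retained: the second changes by less than 10% and the third lies more than 5 kb from the TSS.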

For whole-genome DNA methylation analyses of patient specimens and established resistant models, we performed EM-seq49. The EM-seq libraries were prepared from 50 ng of DNA using the NEBNext Enzymatic Methyl-seq Kit (New England BioLabs). Paired-end sequencing of 150 bp was performed using NovaSeq 6000. The EM-seq dataset was adapter-trimmed by Trim Galore (v0.6.7) with the default parameters. The trimmed reads were aligned to hg38 using Bismark (v0.22.3)50. PCR duplicates were removed using deduplicate_bismark with default parameters. The methylation information was extracted with bismark_methylation_extractor and filtered to sites with a depth of more than five. Differentially methylated regions were extracted using metilene (v0.2-8)51 (P < 0.05). All methylated CpG sites were also analysed at single-nucleotide resolution from the EM-seq data. The methylation information bedGraphs of Bismark outputs were converted to BigWig by bedGraphToBigWig and visualized by Integrative Genomics Viewer. Methylation levels of target genes were calculated by Deeptools (v3.3.1) and visualized by Deeptools plotProfile.

Bioinformatic analysis

The Integrative Genomics Viewer tool52 was used for visualizing and interpreting the results of DNA-seq, RNA-seq, ChIP–seq and DNA methylation data. For differentially expressed gene analysis, HTSeq (v0.6.1) was used to estimate gene read counts from the paired-end clean data, and the read counts were converted to transcripts per million. Selected genes were subjected to hierarchical clustering analysis using the iDEP.91 pipeline, which contains the DESeq2 package53. Gene set enrichment analysis54 was performed using GSEA software (v4.1.0) with 1,000 permutations. Gene sets used in this study were selected from the MSigDB hallmark gene sets. Significantly enriched gene sets were evaluated by normalized enrichment score (NES) and nominal P value (P < 0.001). Gene ontology analysis was performed by DAVID Bioinformatics Resources.
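For readers unfamiliar with the transcripts per million (TPM) unit, the conversion from read counts can be sketched as below. This is the standard TPM definition, not code from the study; the function name and the gene-length input are assumptions, because the text does not say which gene lengths were used.

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float],
                  lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert per-gene read counts to transcripts per million (TPM)."""
    # Reads per kilobase: correct each gene's count for its length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values of all genes sum to one million.
    return {gene: value / total_rpk * 1e6 for gene, value in rpk.items()}


tpm = counts_to_tpm({"A": 100, "B": 100}, {"A": 1000, "B": 2000})
```

Two genes with equal read counts receive different TPM values when their lengths differ, because the shorter gene yields more reads per kilobase.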

Data visualization

Box plots, beeswarm plots, violin plots, hierarchical clustering and correlation matrix were analysed and visualized by using R (v3.2.3). Box plots are defined as follows: the middle line corresponds to the median; the lower and upper hinges correspond to first and third quartiles; the upper whisker extends from the hinge to the largest value no further than 1.5 times the IQR from the hinge (where the IQR is the interquartile range or distance between the first and third quartiles); and the lower whisker extends from the hinge to the smallest value at most 1.5 times the IQR of the hinge. All data points are overlaid on the box plot.
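The box plot definition above can be made concrete with a short sketch. The plots in the study were drawn in R; the Python version below is only an illustration of the same rule, and it assumes that the hinges are quartiles computed by linear interpolation (the "inclusive" method of the standard library, which matches R's default quantile type).

```python
from statistics import quantiles
from typing import Dict, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, float]:
    """Median, hinges and whisker ends as described in the text."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers reach the most extreme data points within 1.5 x IQR of the hinges.
    upper = max(v for v in data if v <= q3 + 1.5 * iqr)
    lower = min(v for v in data if v >= q1 - 1.5 * iqr)
    return {"lower_whisker": lower, "q1": q1, "median": median,
            "q3": q3, "upper_whisker": upper}


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this example the value 100 lies beyond the upper whisker, so it appears only as an overlaid data point.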

Molecular dynamics simulation

The valemetostat-bound PRC2 structure was modelled as previously described11. Amino acid residue numbers for EED in the PRC2 model were renumbered based on UniProt O75530 isoform 1 (identifier: O75530-1). Binding free-energy changes (ΔΔGs) of valemetostat to PRC2 single-point mutants (EZH2(Y111S/Y111C/Y111H/Y111N), EZH2(Y661N) and EED(H213R)) relative to wild-type PRC2 were predicted by the free-energy perturbation (FEP) method55,56 using FEP protein mutation for ligand selectivity (Schrödinger release 2021-3: FEP+, Schrödinger, 2021) with default settings and the OPLS4 force field57. The value of ΔΔG for EZH2(Y111H) was defined as the mean of ΔΔGs for EZH2(Y111) mutated to histidine neutral tautomers, Nδ-protonated (Hid) and Nε-protonated (Hie), respectively. Hydrogen bonds, hydrophobic interactions, ionic interactions, and water bridges between valemetostat and wild-type or mutant PRC2s were examined throughout 5-ns molecular dynamics simulations using edge analysis of FEP protein mutation analysis (Schrödinger release 2021-3: FEP+, Schrödinger) to elucidate the effect of these mutations in PRC2 on valemetostat binding. Structural model figures were generated using PyMOL (v2.4.0, Schrödinger). Relative affinities of valemetostat to PRC2 mutants were predicted by FEP simulations and calculated as wild-type dissociation constant (Kd)/(wild-type or mutant Kd) = exp(−ΔΔG/RT), where R is the ideal gas constant (1.987 cal K–1 mol–1) and T is the absolute temperature (298.15 K).
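The conversion from a predicted ΔΔG to a relative affinity can be sketched as follows. The gas constant and temperature are those given in the text; the function name is hypothetical, and the sketch assumes that ΔΔG is supplied in kcal mol^−1, so R is converted from cal to kcal.

```python
from math import exp

# Ideal gas constant from the text (1.987 cal K^-1 mol^-1), in kcal K^-1 mol^-1.
R_KCAL = 1.987e-3


def relative_affinity(ddg_kcal_per_mol: float,
                      temperature_k: float = 298.15) -> float:
    """Return wild-type Kd / mutant Kd = exp(-ddG / RT).

    Assumption: ddG is in kcal/mol. Values below 1 mean that the mutant
    binds the compound more weakly than the wild type.
    """
    return exp(-ddg_kcal_per_mol / (R_KCAL * temperature_k))


# A ddG of about +1.36 kcal/mol corresponds to a roughly tenfold weaker binding.
ratio = relative_affinity(1.364)
```

A ΔΔG of zero returns 1, that is, an affinity unchanged relative to wild-type PRC2.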

Evaluation of PRC2 mutants

EZH2 and EED cDNAs were subcloned into the pME-FLAG vector. Point mutagenesis for generating resistant mutants was accomplished with the PrimeSTAR Mutagenesis Basal Kit (Takara) and specific primer sets (Supplementary Table 6). The generated mutant cDNAs were confirmed by Sanger sequencing. Transient transfection of FLAG-tagged cDNA in 293T cells was performed by Lipofectamine 2000 (Thermo Fisher). At 24 h after transfection, the medium was replaced with fresh medium supplemented with valemetostat and cultured for 5 days. The subsequent H3K27me3 level was evaluated by immunoblotting with primary antibodies (anti-H3K27me3 (1:1,000; 07-449, Merck/Millipore), anti-histone H3 total (1:1,000; ab10799, Abcam) and anti-FLAG M2 (1:1,000; F1804, Sigma)).

Generation and evaluation of resistant cell models

ATL cell lines were cultured in growth media supplemented with 10 nM of valemetostat for 2 months. Inhibitor-resistant outgrowth was observed at 100 nM. For knockdown of TET2, DNMT3A, DNMT3B and PRC2 genes, a replication-defective, self-inactivating lentivirus vector (CS-H1-Venus-IRES-Bsd) was used (Riken, BRC). We designed three shRNA sequences (Supplementary Table 6) and cloned them into CS-RfA-EVBsd via pENTR4-H1. For stable expression of wild-type and mutant EZH2 and EED in lymphoma cells, FLAG-tagged cDNAs were subcloned into lentivirus vector CSII-EF-MCS-IRES2-Venus (Riken). For stable expression of DNMT in lymphoma cells, haemagglutinin-tagged DNMT3A and DNMT3B cDNA were subcloned into the lentivirus vector pHIV-dTomato (Addgene #21374). DNMT3A(E629stop), which lacks the C-terminal enzymatic domain, was also generated for a negative control. A TET2-encoded lentivirus vector was purchased from VectorBuilder (pLV-Puro-EF1A-hTET2). The established viral vectors were co-transfected with the packaging plasmid (pCAG-HIVgp) and the VSV-G-expressing and Rev-expressing plasmid (pCMV-VSV-G-RSV-Rev) into 293FT cells. High-titre viral solutions were prepared by centrifugation-based concentration and used for transduction into cell lines. The infection was attained by the spinoculation method and then cultured in an appropriate condition for 5–7 days. Blasticidin (10 μg ml−1) was used to select the transduced population. Expression of fluorescent proteins (Venus and dTomato) was confirmed by flow-cytometory using FACSCalibur or FACSymphony A1 (BD Biosciences), or by automated cell counter using Countess 3 FL (Thermo Fisher Scientific). Expression levels of DNMT3A and DNMT3B were evaluated by immunoblotting with primary antibodies as follows: anti-DNMT3A (1:1,000; 3598, Cell Signaling Technology) and anti-DNMT3B (1:1,000; 57868, Cell Signaling Technology). 
Alternatively, knockdown and gene induction efficiencies were evaluated by qRT–PCR with specific primer sets (Supplementary Table 6). For evaluation of the anti-growth activity of valemetostat, lymphoma cell models (2 × 10⁵) were plated in 12-well flat bottom plates with optimized media with 10% FBS and simultaneously treated with indicated doses of valemetostat solution in DMSO for 14 days. The cells were maintained by passage into fresh media every 3–4 days. The cell numbers were evaluated by Cell Counting Kit-8 (WST-8 assay, Dojindo) following the manufacturer’s protocol. Valemetostat used in this study was synthesized in-house.
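As an illustration of how WST-8 readouts from such a dose series can be summarised, the sketch below expresses each treated well as a percentage of the DMSO vehicle control after blank subtraction. The function name, the blank-subtraction step and all absorbance values are hypothetical; the source does not describe how the readings were normalised.

```python
def relative_growth(treated_abs, vehicle_abs, blank_abs=0.0):
    """Return treated-well signal as a percentage of the vehicle (DMSO) control.

    All three arguments are absorbance readings; blank_abs is a medium-only
    well (an assumption of this sketch, not stated in the source).
    """
    control = vehicle_abs - blank_abs
    if control <= 0:
        raise ValueError("vehicle control must exceed the blank reading")
    return 100.0 * (treated_abs - blank_abs) / control


# Hypothetical readings for a three-point dose series.
doses_nM = [1, 10, 100]
readings = [1.0, 0.6, 0.2]
for dose, reading in zip(doses_nM, readings):
    pct = relative_growth(reading, vehicle_abs=1.1, blank_abs=0.1)
    print(f"{dose} nM: {pct:.1f}% of vehicle")
```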

Methylation-specific PCR

Genomic DNA (1 μg) was converted with sodium bisulfite using the EpiTect Bisulfite Kit (Qiagen). The converted DNA (200 ng) was amplified by KOD -multi & Epi- DNA polymerase (Toyobo) with methylated or unmethylated specific primer pairs for CpG islands within CDKN1A, CDKN1C and BCL2L11 promoters (Supplementary Table 6). The PCR products were separated on a 2% agarose gel, stained with ethidium bromide and visualized under UV light.

Resistant outgrowth assay

To evaluate the ability of PRC2 mutants, DNMT3A, DNMT3B and TET2-targeting shRNA to confer resistance to valemetostat, a resistance outgrowth assay27,28 in the presence of valemetostat was performed. Lymphoma cells expressing each gene or negative control cells were cultured with valemetostat at IC90 or higher for 1 week. Then, 10 cells per well were seeded on 96-well plates and cultured in the presence of valemetostat with successive passages for more than 1 month. The cumulative cell count in each well was then measured using the WST-8 assay to determine the percentage of wells that showed outgrowth in the presence of valemetostat. Resistance to growth suppression was evaluated in randomly collected outgrowth clones. Gene expression levels were also evaluated by qRT–PCR. To evaluate the effect of DNA methylation, additional cultures were maintained for 1 week in the presence of a low concentration (10 nM) of 5-aza-2′-deoxycytidine (decitabine from Merck) and subsequent gene expression and cell counts were evaluated.
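The percentage of outgrowth wells described above can be sketched as a simple count of wells whose WST-8 signal exceeds a cut-off. The cut-off value and the per-well signals below are hypothetical; the source does not state how a well was scored as positive.

```python
def outgrowth_percentage(well_signals, threshold):
    """Return the percentage of wells whose signal exceeds the threshold.

    threshold is an assumed cut-off for calling a well 'outgrown';
    it is not taken from the source.
    """
    if not well_signals:
        raise ValueError("at least one well is required")
    positive = sum(1 for signal in well_signals if signal > threshold)
    return 100.0 * positive / len(well_signals)


# Hypothetical WST-8 signals for eight wells of one condition.
wells = [0.05, 0.92, 0.07, 1.10, 0.04, 0.06, 0.88, 0.05]
print(f"{outgrowth_percentage(wells, threshold=0.5):.1f}% of wells outgrew")
```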

Evaluation of translation activity

For the 5′ UTR reporter assay, 5′ UTR sequences of EZH1, EZH2, SUZ12 and EED were amplified from the human genomic DNA region with specific primers (Supplementary Table 6) and inserted into the BamHI site upstream of the start codon of pMIR-REPORT (Promega). The orientation of the inserted 5′ UTR was confirmed by Sanger sequencing. The luciferase activities were quantified by the Dual-Luciferase Reporter Assay System (Promega) 2 days after transfection.
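For readers unfamiliar with dual-luciferase readouts, the sketch below shows one common way to normalise them: the firefly reporter signal is divided by a co-transfected Renilla control signal, and the ratio is then expressed relative to an empty-vector control. The Renilla control, the empty-vector comparison and all luminescence values are assumptions of this sketch and are not described in the source.

```python
def normalized_ratio(firefly, renilla):
    """Return the firefly signal normalised to the Renilla control signal."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def relative_reporter_activity(sample, control):
    """Return the normalised sample ratio relative to a control construct.

    sample and control are (firefly, renilla) tuples of luminescence values.
    """
    return normalized_ratio(*sample) / normalized_ratio(*control)


# Hypothetical luminescence readings (arbitrary units).
utr_construct = (2000.0, 100.0)
empty_vector = (1000.0, 100.0)
print(relative_reporter_activity(utr_construct, empty_vector))
```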

The CRISPR–Cas9-based ‘double nicking’33 was applied to delete a part of the endogenous EZH2 5′ UTR. To minimize nonspecific effects of guide RNA (gRNA) and to induce some length of deletion on the UTR, a double-nicking strategy with Cas9 nickase (Cas9 D10A) and double gRNA was used to introduce double-strand breaks at the target site. The gRNAs were designed using the CRISPR gRNA Design tool (DNA2.0) from the target sequence within EZH2 5′ UTR (cggtgggactcagaaggcagtggagccccggcggcggcggcggcggcgcgcgg; PAM sequences at both ends). The gRNA sequences are provided in Supplementary Table 6. An all-in-one vector (All-in-One Nickase Ninja vector, pD1421-AD), which can express two gRNAs and Cas9, was constructed and introduced into 293T and TL-Om1 cells using Lipofectamine 2000. After 48 h, GFP-positive cells were sorted. The 5′ UTR sequences of ten clones obtained by TA cloning were analysed by Sanger sequencing to confirm that deletion had occurred. The secondary structure and free-energy change of the 5′ UTR sequence were predicted using the mfold tool58. The CRISPR-transduced cells were used as bulk culture and characterized.

For the RIP assay, cells (2 × 10⁷) were washed with PBS and lysed with 1 ml of RNA lysis buffer (25 mM Tris-HCl (pH 7.4), 150 mM KCl, 5 mM EDTA, 0.5% NP-40, 1 mM dithiothreitol, protease inhibitor cocktail and 100 U ml−1 RNase inhibitor (Takara)). After incubation on ice for 20 min, the cells were centrifuged at 4 °C at 14,000 rpm for 20 min to obtain cell lysate. Dynabeads protein G (Invitrogen) was added to the lysate and rotated at 4 °C for 15 min to remove proteins nonspecifically bound to the beads. For antibody-bound beads, Dynabeads protein G was washed and incubated with anti-eIF3D (A301-758A, Bethyl Laboratories), anti-eIF3A (2013, Cell Signaling Technology) or control IgG (2729, Cell Signaling Technology) antibodies for 10 min. The prepared antibody-bound beads were added to the cell lysate and slowly rotated at 4 °C for 1 h. After washing five times with RNA lysis buffer, beads were mixed with 1 ml of TRIzol. The collected RNA was subjected to reverse-transcriptase reaction using ReverTra Ace qRT–PCR Master Mix (Toyobo) according to the manufacturer’s protocol. Random primer-based synthesized cDNA was analysed by quantitative PCR using a real-time PCR system (Thermal cycler Dice, Takara). EZH2 and JUN mRNA levels were quantified using gene-specific primers (Supplementary Table 6).
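One common way to summarise RIP–qPCR data of this kind is fold enrichment of the specific immunoprecipitate over the control IgG, computed from threshold-cycle (Ct) values. The sketch below assumes this approach and ideal amplification efficiency (a doubling per cycle); the source does not specify the quantification method, and the Ct values are hypothetical.

```python
def fold_enrichment(ct_ip, ct_igg):
    """Return enrichment of the specific IP over control IgG.

    Assumes 100% PCR efficiency, so each cycle of difference
    corresponds to a two-fold difference in template.
    """
    return 2.0 ** (ct_igg - ct_ip)


# Hypothetical Ct values for one transcript.
print(fold_enrichment(ct_ip=25.0, ct_igg=28.0))
```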

For evaluation of the translation activity of EZH2 mRNA, the amount of mRNA in the ribosomal and polysomal fractions was quantified using sucrose density gradient centrifugation. The 15–40% sucrose density gradient solution (containing 10 mM Tris-HCl (pH 7.5), 140 mM NaCl, 5 mM MgCl2, 1 mM dithiothreitol and 100 μg ml−1 cycloheximide) was prepared in a centrifuge tube (Beckman Coulter). Cells (2 × 10⁷) were washed with PBS containing 100 μg ml−1 cycloheximide for 5 min, then lysed with polysome lysis buffer (10 mM Tris-HCl (pH 7.5), 140 mM NaCl, 1.5 mM MgCl2, 0.5% NP-40, 0.5% deoxycholate, 2 mM dithiothreitol, 100 U ml−1 RNase inhibitor, 100 μg ml−1 cycloheximide and protease inhibitor), and the lysates were placed on top of the density gradient solution. The lysates were then centrifuged at 38,000 rpm for 2 h at 4 °C using an SW41Ti rotor (Beckman Coulter). Twenty-four fractions were collected in 500-µl portions, and the absorbance was measured at 254 nm using a NanoDrop. RNA was extracted from each fraction using ISOGEN-LS (Nippon Gene), and the EZH2 mRNA level in each fraction was quantified by qRT–PCR.
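Per-fraction qRT–PCR values from a gradient like this are often summarised as the percentage of total transcript in each fraction, together with the share found in the heavier (polysomal) fractions. The sketch below assumes that summary; the index at which polysomal fractions begin and all values are hypothetical and not taken from the source.

```python
def fraction_percentages(amounts):
    """Return each fraction's share of the total mRNA signal, in percent."""
    total = sum(amounts)
    if total <= 0:
        raise ValueError("total mRNA signal must be positive")
    return [100.0 * amount / total for amount in amounts]


def polysome_share(amounts, first_polysome_index):
    """Return the percentage of signal in fractions from the given index on.

    first_polysome_index is an assumed boundary that would normally be
    chosen from the 254 nm absorbance profile.
    """
    return sum(fraction_percentages(amounts)[first_polysome_index:])


# Hypothetical relative mRNA amounts for six pooled fractions.
amounts = [1.0, 1.0, 2.0, 4.0, 8.0, 4.0]
print(fraction_percentages(amounts))
print(polysome_share(amounts, first_polysome_index=3))
```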

For protein analysis of the subpopulations with different H3K27me3, total proteins from the fixed cells were extracted for immunoblotting according to a previous study32. The fixed proteins could be liberated from formaldehyde crosslinking in the presence of high heat, 500 mM Tris and 2% SDS. In brief, H3K27me3-depleted and relatively H3K27me3-high tumour cells were sorted from the two post-dose blood samples and then sonicated in 200 µl of modified fixed tissue lysis buffer (500 mM Tris-HCl (pH 7.4), 100 mM NaCl, 25 mM EDTA, 1% (v/v) Triton X-100, 1% (v/v) IGEPAL, 2% (w/v) SDS and protease inhibitor cocktail). Homogenates were incubated at 90 °C for 120 min, followed by centrifugation at 4 °C. Protein levels of the collected supernatants were analysed by immunoblotting.

For eIF3D knockdown, the lentivirus vector CS-H1-Venus-IRES-Bsd was used with two shRNA sequences (Supplementary Table 6). Protein levels of eIF3D, PRC2 factors and H3K27me3 were analysed by immunoblotting with primary antibodies as follows: anti-EZH1 (1:1,000; 42088, Cell Signaling Technology), anti-EZH2 (1:1,000; 3147, Cell Signaling Technology), anti-SUZ12 (1:1,000; 3737, Cell Signaling Technology), anti-EED (1:1,000; 85322, Cell Signaling Technology), anti-eIF3D (1:1,000; A301-758A, Bethyl Laboratories), anti-H3K27me3 (1:1,000; 07-449, Merck/Millipore), anti-COX4 (1:1,000; 4850, Cell Signaling Technology), anti-TFAM (1:1,000; 8076, Cell Signaling Technology), anti-JUN (1:1,000; 9165, Cell Signaling Technology) and anti-β-actin (1:1,000; sc-69879, Santa Cruz). Alkaline phosphatase-conjugated anti-mouse (1:2,000; S3721, Promega) and anti-rabbit (1:2,000; S3731, Promega) secondary antibodies and BCIP/NBT substrate (S3771, Promega) were used for detection.

Statistics and reproducibility

All bar and line graphs that summarize multiple datasets show mean values. The middle lines within box plots indicate median values. Significant differences in gene expression and other biological assays between two groups were analysed by a two-sided Student’s t-test. Adjustments were not made for multiple comparisons. Correlations between two groups were analysed using Pearson’s correlation coefficient with a two-sided test, and probabilities of overlap between gene sets were statistically tested. For electrophoresis of samples from cell lines, representative data from two to three independent repeat experiments are shown. Because reproducibility in experiments on outgrowth clones was verified by examining multiple clones of interest, electrophoresis was performed only once. In addition, electrophoresis experiments with multiple independent patient samples were performed once.
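As a minimal sketch of the two statistics named above, the code below computes the Student’s t statistic (equal-variance, pooled form) with its degrees of freedom, and Pearson’s correlation coefficient, using only the Python standard library. The data are hypothetical, and the two-sided P value is left out because it requires the t-distribution, which a statistics package would normally supply; this is not the authors’ analysis code.

```python
import math
import statistics


def student_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for paired observations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical measurements for two groups and one paired series.
t_stat, dof = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f"t = {t_stat:.3f}, df = {dof}")
print(f"r = {pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):.3f}")
```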

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
