Through an international collaborative effort, we have sequenced more than 10,000 high-risk human papillomavirus (HR-HPV) whole genomes from well-characterized epidemiological population cohorts. These NCI-HPV Genomics Project studies have uncovered several strong associations between HPV genetic variation and cervical carcinogenicity and have provided new insights into HPV diversity in the population.
We have developed a PCR-based next-generation sequencing (NGS) assay, using the Thermo Fisher Life Sciences’ Ion Torrent Proton, custom Ion AmpliSeq panels and an analytic pipeline, to sequence the whole genomes of all HR-HPV types. We have sequenced HPV genomes from case and control cervical specimens in the NCI-KPNC Pap Cohort and the U.S. SUCCEED study, and from invasive cancers collected internationally by IARC, yielding 5,570 HPV16, 1,729 HPV18, 1,002 HPV45, 2,000 HPV31 and 600 HPV35 genomes. For each HR-HPV type, we assessed associations of variant sublineages and individual SNPs with the worst histologic outcome, and evaluated the association between the combined burden of rare nonsynonymous variants in each viral gene region and risk of cervical intraepithelial neoplasia grade 3 or worse (CIN3+).
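A region-level rare-variant analysis of this kind is often run as a collapsing (burden) test. The sketch below is a minimal illustration under stated assumptions, not the study's actual model: each specimen is reduced to a carrier flag (at least one rare nonsynonymous variant inside a gene region), and carrier counts in cases (CIN3+) and controls are compared with a two-sided Fisher's exact test. The specimen data layout, the function names, the 1% default rarity threshold and the caller-supplied region coordinates are all hypothetical.

```python
from collections import Counter
from math import comb

# Hypothetical data model (not from the study): each specimen is a dict
#   {"status": "case" or "control",
#    "variants": [(position, alt_base, is_nonsynonymous), ...]}
# where "case" means CIN3+ and positions refer to the type's reference genome.


def carrier_frequencies(samples):
    """Fraction of specimens carrying each distinct variant."""
    counts = Counter()
    for s in samples:
        counts.update(set(s["variants"]))
    return {v: c / len(samples) for v, c in counts.items()}


def carries_rare_nonsyn(sample, region, freqs, max_freq):
    """True if the specimen has >= 1 rare nonsynonymous variant in region."""
    start, end = region  # inclusive coordinates supplied by the caller
    return any(
        start <= pos <= end and nonsyn and freqs[(pos, alt, nonsyn)] < max_freq
        for pos, alt, nonsyn in sample["variants"]
    )


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x carriers among cases
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def region_burden_test(samples, region, max_freq=0.01):
    """Collapse rare nonsynonymous variants in one region to a carrier flag
    and compare carrier counts between cases and controls."""
    freqs = carrier_frequencies(samples)
    table = {"case": [0, 0], "control": [0, 0]}  # [carriers, non-carriers]
    for s in samples:
        hit = carries_rare_nonsyn(s, region, freqs, max_freq)
        table[s["status"]][0 if hit else 1] += 1
    (a, b), (c, d) = table["case"], table["control"]
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)
```

Run once per gene region (for example E1, E7 or L1), `region_burden_test` returns the case/control carrier table and a P-value. A simple carrier flag is the most basic burden design; per-specimen variant counts or weighted scores are common alternatives, and testing many regions would call for a multiple-comparison correction.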
Specific HPV16 sublineages are strongly associated with histology-specific precancer/cancer risk, with an estimated relative risk of glandular lesions exceeding 100 for certain sublineages. The next most carcinogenic types also show sublineage-specific differences in precancer/cancer risk. At the finer level of single-nucleotide polymorphisms (SNPs), we identified many variable positions significantly associated with precancer/cancer, and found that controls carry significantly more rare nonsynonymous variants in specific regions of the virus. For HPV16, an evaluation of rare SNPs showed that controls had significantly more rare variants consistent with APOBEC-induced mutation. E7 was strikingly more variable in controls than in cases (P = 1.1 × 10⁻⁷) and considerably more variable than E6. Controls also had more rare variants in E1 (P = 0.001) and L1 (P = 7.9 × 10⁻⁵).
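A minimal sketch of how rare variants might be flagged as APOBEC-consistent, assuming the widely used APOBEC3 motif definition rather than the study's own (unstated) criteria: a C>T or C>G change in a TCW context (W = A or T), or the reverse-complement G>A or G>C change in a WGA context on the reference strand. The HPV genome is circular, so flanking bases wrap around the sequence ends. The function names and the (position, ref, alt) input format are hypothetical.

```python
# Assumed criterion (common APOBEC3 definition, not necessarily the study's):
# C>T or C>G at a TCW motif (W = A or T), or the reverse-complement
# equivalent G>A or G>C at a WGA motif on the reference strand.


def is_apobec_consistent(ref_seq, pos, ref, alt):
    """pos is a 0-based index into ref_seq; the genome is treated as circular."""
    seq = ref_seq.upper()
    ref, alt = ref.upper(), alt.upper()
    if seq[pos] != ref:
        raise ValueError(f"reference base at {pos} is {seq[pos]}, not {ref}")
    prev_base = seq[(pos - 1) % len(seq)]  # wraps around the circular genome
    next_base = seq[(pos + 1) % len(seq)]
    if ref == "C" and alt in ("T", "G"):
        return prev_base == "T" and next_base in ("A", "T")  # TCW
    if ref == "G" and alt in ("A", "C"):
        return prev_base in ("A", "T") and next_base == "A"  # WGA
    return False


def apobec_fraction(ref_seq, variants):
    """Fraction of (pos, ref, alt) variants that fit the APOBEC motif."""
    if not variants:
        return 0.0
    hits = sum(is_apobec_consistent(ref_seq, p, r, a) for p, r, a in variants)
    return hits / len(variants)
```

Computing `apobec_fraction` separately for the rare variants found in controls and in cases gives the kind of contrast reported for HPV16. A stricter motif (for example TCA only) or a looser one (any TpC) would change the counts, so the definition should match whatever the analysis actually used.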
These data indicate that HPV carcinogenicity is associated with genetic variation in specific viral regions.