Transmitted drug resistance and transmission clusters among HIV-1 treatment-naïve patients in Guangdong, China: a cross-sectional study

Background Transmitted drug resistance (TDR), which affects the effectiveness of the first-line antiretroviral therapy (ART) regimen, is becoming prevalent worldwide. However, its prevalence and transmission among HIV-1 treatment-naïve patients in Guangdong, China are rarely reported. We aimed to comprehensively analyze the prevalence of TDR and the transmission clusters of HIV-1 infected persons before ART in Guangdong.
Methods HIV-1 treatment-naïve patients were recruited between January 2018 and December 2018. The HIV-1 pol region was amplified by reverse transcription PCR and sequenced by Sanger sequencing. Genotypes, surveillance drug resistance mutations (SDRMs) and TDR were analyzed. Genetic transmission clusters among patients were identified by pairwise Tamura-Nei 93 genetic distance, with a threshold of 0.015.
Results A total of 2368 (97.17%) HIV-1 pol sequences were successfully amplified and sequenced from the enrolled 2437 patients. CRF07_BC (35.90%, 850/2368), CRF01_AE (35.56%, 842/2368) and CRF55_01B (10.30%, 244/2368) were the main HIV-1 genotypes circulating in Guangdong. Twenty-one SDRMs were identified among fifty-two drug-resistant sequences. The overall prevalence of TDR was 2.20% (52/2368). Among the 2368 patients who underwent sequencing, 8 (0.34%) had TDR to protease inhibitors (PIs), 22 (0.93%) to nucleoside reverse transcriptase inhibitors (NRTIs), and 23 (0.97%) to non-nucleoside reverse transcriptase inhibitors (NNRTIs). Two (0.08%) sequences showed dual-class resistance to both NRTIs and NNRTIs, and no sequences showed triple-class resistance. A total of 1066 (45.02%) sequences were segregated into 194 clusters, ranging in size from 2 to 414 sequences. In total, 15 (28.85%) patients with TDR were included in 9 clusters, and one cluster containing two TDR sequences with the K103N mutation was observed.
Conclusions There is high HIV-1 genetic heterogeneity among patients in Guangdong. Although the overall prevalence of TDR is low, it is still necessary to remain vigilant regarding some important SDRMs.

Antiretroviral therapy (ART) has substantially curbed rampant HIV transmission [2] and has significantly reduced HIV infection-associated mortality and morbidity [3,4]. However, emerging drug-resistant HIV variants arising from long-term ART selection pose a threat to HIV prevention and control [5].
Molecular transmission clusters can be identified by molecular phylogeny based on evolutionary theory and sequence analysis [16,17]. The analysis of transmission clusters has been widely used to study HIV-1 transmission kinetics and develop real-time precision interventions [18,19]. International guidelines recommend that newly diagnosed HIV patients be tested for drug resistance, both to detect potential TDR and to guide antiviral drug selection [16,17]. Given that first-line ART drugs have been used in Guangdong for thirty years, it is essential to investigate the prevalence and transmission of TDR among HIV-1-infected adults in Guangdong. Here, we performed a large cross-sectional study of ART-naïve HIV-1-infected individuals in Guangdong.

Study population
Between January 2018 and December 2018, 2368 HIV-1 patients were enrolled in this study based on the following criteria: (1) residents of Guangdong Province aged over 16 years; (2) diagnosed with HIV infection within 3-6 months and never having received ART; and (3) not infected via mother-to-infant transmission. The epidemiological data of the patients (including age, sex, marital status, education level, ethnicity, route of infection, and CD4+ T cell count) were acquired from the China Information System for Disease Control and Prevention.

HIV-1 RNA extraction and pol gene amplification
Blood samples collected with the anticoagulant ethylene diamine tetraacetic acid (EDTA) were centrifuged at 3000 rpm for 5 min to collect plasma. Viral RNA was extracted from the plasma using the QIAamp Viral RNA Mini Kit (Qiagen, Germany) following the manufacturer's instructions. The extracted RNA was reverse transcribed and amplified by nested PCR using the PrimeScript One Step RT-PCR Kit (Takara, China) and PrimeSTAR HS DNA Polymerase (Takara, China). The PCR products were analysed using agarose gel electrophoresis, and the positive products (approximately 1300 bp in the HIV-1 pol gene corresponding to HXB2 2147-3462 nt, encoding the protease and the first 299 residues of reverse transcriptase) were sent to a commercial company (Tianyi Huiyuan, China) for ABI3730 sequencing. The sequences obtained were assembled and cleaned with Sequencher software.

Genotype determination and analysis
Sequences were aligned, adjusted manually and merged with HIV-1 subtyping references downloaded from the Los Alamos HIV Sequence Database using BioEdit software. To determine the HIV-1 genotypes, sequences were assessed with the Context-based Modeling for Expeditious Typing (COMET) genotyping tool, developed by Daniel Struck [20], and the REGA HIV-1 Subtyping Tool Version 3.0, developed by Tulio de Oliveira [21]. A maximum likelihood (ML) phylogenetic tree was used for confirmation. The tree was constructed with the GTR substitution model in PhyML 3.0 [22], and branch support values were estimated using the approximate likelihood ratio test (aLRT) [23].

TDR and drug resistance mutation analysis
TDR was defined as the presence of at least one surveillance drug resistance mutation (SDRM) [10]. The Stanford Calibrated Population Resistance (CPR) tool 8.0 (last updated on 1st July 2019) was used to identify SDRMs according to the WHO 2009 surveillance list [21]. The Stanford HIVdb Program 8.9 (last updated on 7th Oct. 2019) was used to infer resistance to antiretroviral drugs, including protease inhibitors (PIs), nucleoside reverse transcriptase inhibitors (NRTIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs) [24]. Sequences with low-level, intermediate-level, or high-level resistance were defined as drug resistant.
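The two definitions above can be expressed as a simple classification step. The following Python sketch is illustrative only: the SDRM subset, the resistance-level labels, the sample records and the function names are all assumptions made for demonstration and do not reproduce the CPR or HIVdb implementations.

```python
# Illustrative subset of SDRMs (assumption: only a few mutations named in
# this paper are listed; the WHO 2009 surveillance list is much longer).
SDRM_SUBSET = {"M46L", "M184V", "L210W", "K103N"}

# Resistance levels counted as drug resistant, as defined in the text.
# The label strings themselves are hypothetical, not HIVdb output.
RESISTANT_LEVELS = {"low-level", "intermediate-level", "high-level"}


def has_tdr(mutations, sdrm_list=SDRM_SUBSET):
    """Return True if at least one mutation is on the SDRM list."""
    return any(m in sdrm_list for m in mutations)


def is_drug_resistant(level):
    """Return True for low-, intermediate- or high-level resistance."""
    return level in RESISTANT_LEVELS


def prevalence_percent(n_positive, n_total):
    """Prevalence expressed as a percentage."""
    return 100.0 * n_positive / n_total


# Hypothetical mutation calls for three sequences.
samples = {"seq1": ["K103N"], "seq2": ["R211K"], "seq3": []}
tdr_flags = {seq_id: has_tdr(muts) for seq_id, muts in samples.items()}
print(tdr_flags)

# Overall TDR prevalence using the counts reported in this study (52/2368).
print(f"{prevalence_percent(52, 2368):.2f}%")
```

Under this definition a single SDRM is sufficient to classify a sequence as carrying TDR, whereas predicted resistance to a given drug depends on the inferred resistance level.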

Transmission cluster construction
The HyPhy program 2.2.4 was used to calculate the pairwise Tamura-Nei 93 (TN93) genetic distance for the aligned sequences [25]. The network visualisation program Cytoscape 3.2.1 was used to analyse sequences with a threshold genetic distance of 0.015 and to visualize the transmission network as nodes (sequences), edges (links) and clusters (groups of linked sequences) [26]. This genetic distance threshold has been validated to identify partners with epidemiological links [27] and has been widely used [28,29].
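The clustering step described above amounts to linking any two sequences whose pairwise distance falls within the threshold and then taking connected components of the resulting network. The sketch below shows that idea in Python with a union-find structure. It assumes that the pairwise TN93 distances have already been computed elsewhere and are supplied as (id, id, distance) tuples; the function name, the sequence identifiers, the example distances and the use of "less than or equal to" at the threshold are assumptions, and this is not the HyPhy or Cytoscape workflow itself.

```python
from collections import defaultdict


def build_clusters(pairwise, threshold=0.015):
    """Group sequences into clusters of linked sequences.

    pairwise: iterable of (id_a, id_b, distance) tuples.
    Two sequences are linked when distance <= threshold (assumption).
    Returns a list of sets, largest cluster first. Unlinked sequences
    are omitted, so every cluster has at least two members.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, dist in pairwise:
        if dist <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical pairwise distances for seven sequences.
example = [
    ("s1", "s2", 0.010),
    ("s2", "s3", 0.014),
    ("s4", "s5", 0.030),  # above threshold, so no link
    ("s6", "s7", 0.005),
]
clusters = build_clusters(example)
print(clusters)
```

Because clusters are connected components, two sequences can belong to the same cluster without being directly linked, as with s1 and s3 in the example.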

Statistical analysis
All statistical analyses were performed using IBM SPSS version 25.0. Categorical variables are described as frequencies, and continuous variables are described as medians with interquartile ranges (IQRs). Univariate and multivariate logistic regression analyses were performed to identify potential risk factors. A P-value < 0.05 was considered statistically significant. Variables with a P-value < 0.05 in the univariate logistic regression analysis were included in the multivariate logistic regression analysis. Odds ratios (ORs) and adjusted odds ratios (aORs) with their 95% confidence intervals (95% CIs) are reported.
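For a single categorical predictor compared with a reference category, the unadjusted OR from logistic regression coincides with the cross-product ratio of the corresponding 2x2 table, and a Wald-type 95% CI can be obtained on the log scale. The Python sketch below illustrates this calculation. It is a simplified stand-in for the SPSS analysis, not a reproduction of it, and the cell counts in the example are hypothetical rather than study data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_wald_ci(10, 20, 5, 40)
print(f"OR = {or_value:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

An interval that excludes 1 corresponds to a P-value below 0.05 for the Wald test, which is the significance criterion stated above. Adjusted ORs require fitting the multivariate model and cannot be derived from a single 2x2 table.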
The distribution of HIV-1 genotypes varied among different risk groups (Fig. 1B).
Risk factors associated with HIV TDR are listed in Table 1. In the univariate logistic regression analysis, two factors were significantly associated with HIV TDR. The OR for patients whose CD4+ T cell count was above 500 cells/mm³ versus patients whose CD4+ T cell count was below 200 cells/mm³ was 3.437 (95% CI 1.636-7.219), and that for patients infected with the CRF07_BC strain versus patients infected with the CRF01_AE strain was 0.406 (95% CI 0.193-0.854). In the multivariate logistic regression model, a CD4+ T cell count above 500 cells/mm³ and infection with CRF07_BC remained significantly associated with TDR, with aORs of 4.062 (95% CI 1.904-8.668) and 0.360 (95% CI 0.170-0.764), respectively; the former was associated with a higher and the latter with a lower likelihood of TDR.

Genetic transmission cluster analysis
All 2368 sequences were used to construct the genetic transmission network, of which 1066 (45.02%) were segregated into 194 clusters at a genetic distance threshold of 1.5%, with cluster sizes ranging from 2 to 414 sequences (Fig. 3). A total of 93.30% (181/194) of clusters had a size ≤ 5 and 6.70% (13/194) of clusters had a size > 5.
The largest cluster, cluster A, was a CRF07_BC cluster with 414 sequences, followed by cluster B, a CRF55_01B cluster with 124 sequences (Fig. 3). A total of 50.86% (563/1107) of sequences from MSM were included in the networks and dispersed among 53.09% (103/194) of the transmission networks, and 40.64% (408/1004) of sequences from HETs were included in the networks and dispersed among 69.59% (135/194) of the transmission networks. We also observed that 28.85% (15/52) of patients with TDR were included in 9 clusters, and an analysis of shared mutations revealed that cluster C contained two TDR sequences with the K103N mutation (Fig. 3). The proportion of patients with TDR entering the network was lower than that of those without TDR, and the difference was statistically significant (χ² = 5.617, p = 0.023). The 15 clustered patients with TDR included 10 patients with resistance to NRTIs, 4 patients with resistance to NNRTIs, and 1 patient with resistance to PIs. Patients were divided according to whether they entered the transmission network, and the associated risk factors are listed in Table 3.
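The χ² statistic for the comparison between patients with and without TDR can be recomputed from the counts reported in this section: 15 of 52 patients with TDR and 1066 of 2368 patients overall entered the network. The Python sketch below applies the standard Pearson χ² formula for a 2x2 table without continuity correction. The function name is illustrative, and the text does not state which test variant produced the reported p-value, so only the statistic is computed here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction).

    Rows: group 1 (a, b) and group 2 (c, d).
    Columns: in network, not in network.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts derived from the figures reported in the text.
total_sequences = 2368
total_in_network = 1066
tdr_total = 52
tdr_in = 15

tdr_out = tdr_total - tdr_in                  # 37
non_tdr_in = total_in_network - tdr_in        # 1051
non_tdr_out = (total_sequences - tdr_total) - non_tdr_in  # 1265

chi2 = chi_square_2x2(tdr_in, tdr_out, non_tdr_in, non_tdr_out)
print(round(chi2, 3))  # 5.617
```

The result matches the reported statistic, and the underlying proportions are 28.85% (15/52) for patients with TDR versus 45.38% (1051/2316) for those without.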

Discussion
In this study, we investigated the genetic characteristics and prevalence of TDR among ART-naïve HIV-1-infected individuals newly diagnosed in Guangdong, China, in 2018. The major epidemic HIV-1 genotypes detected in Guangdong were CRF07_BC (35.90%), CRF01_AE (35.56%), and CRF55_01B (10.30%). The distribution of HIV-1 genotypes in Guangdong has changed over the last three decades. Before 2000, subtype C (46.2%) and subtype B (30.7%) were the major prevalent strains [30]. CRF01_AE (49.68%), CRF07_BC (22.26%), and CRF08_BC (21.93%) were the major strains circulating in 2006 [31]. CRF01_AE (43.2%), CRF07_BC (26.3%), CRF55_01B (8.5%) and CRF08_BC (8.4%) became the predominant strains circulating in 2013 [32]. By 2018, the proportion of individuals infected with CRF07_BC had increased, while the proportion of individuals infected with CRF01_AE had gradually declined. CRF07_BC was first identified in IDUs in the early 1990s and has since spread to MSM [33]. In this study, CRF07_BC was confirmed as the most dominant HIV-1 genotype across MSM (40.65%,
The overall prevalence of TDR is 2.20% in Guangdong. This prevalence is low according to the WHO categorisation methods [34] and is lower than that in other regions of China [12-16]. TDR prevalence was significantly associated with CD4+ T cell count and genotype, consistent with previous results [13]. When patients were categorised by CD4+ T cell count, those with a count above 500 cells/mm³ were the most likely to have TDR. Of the six main genotypes, CRF07_BC had the lowest prevalence of TDR. In this study, TDR to NNRTIs and NRTIs was more common than TDR to PIs. This may be because NRTIs and NNRTIs are frequently used as first-line treatments. As TDR can compromise antiretroviral therapy and spread drug resistance mutations, it should continue to be monitored.
The SDRMs identified in our study differed from those in other regions. The most frequent PI-associated mutation in our study was M46L, whereas it is Q56E in Southwest China [13], M46I in Iceland [35], and L90M in the south-central United States [36]. The most frequent NRTI-associated mutations in our study were M184V and L210W, while they are M41L and D67G in Southwest China [13] and T215C/D in Iceland and the south-central United States [35,36]. The most frequent NNRTI-associated SDRM in our study was K103N, while it is V179E and V106I in Southwest China [13] and K103N/S and E138A in Iceland and the south-central United States [35,36]. The dominant SDRMs in our study are consistent with the main drug resistance sites among ART-treated patients in Guangdong [37]. The differences in SDRMs among regions may be due to different genotype distributions or ART regimens.
To elucidate the transmission dynamics in the surveilled population, we constructed transmission clusters based on HIV-1 sequences. Of all the transmission networks, 53.09% included sequences from MSM. Moreover, more than half of the sequences in the largest cluster, cluster A, and the second largest cluster, cluster B, were from MSM (68.36% and 54.84%, respectively). These results indicate that MSM may contribute significantly to the spread of the virus, and additional efforts should focus on this population for HIV prevention and control. Additionally, 28.85% (15/52) of patients infected with TDR strains were included in 9 clusters. A cluster (cluster C) containing HIV strains sharing the same SDRM (K103N) was found in the present study. Networks containing TDR strains accounted for 4.64% (9/194) of all networks. These results indicate that HIV TDR may have spread in the transmission network, and the surveillance of TDR should be factored into treatment and prevention policies. Logistic regression analysis revealed that a CD4+ T cell count between 200 and 500 cells/mm³, the CRF07_BC strain and the CRF55_01B strain may be associated with the probability of entering the transmission network. The reasons for these associations should be investigated further.

Conclusions
In summary, this study of 2368 treatment-naïve HIV-1 patients shows that there is high genetic heterogeneity in Guangdong, China. Although the overall prevalence of TDR is low, it is still necessary to remain vigilant regarding some important SDRMs.