An antisense transcript in the human cytomegalovirus UL87 gene region

Background Rapid advances in research on antisense transcripts are gradually changing our understanding of the genomics and gene expression of the Herpesviridae. One such herpesvirus is the human cytomegalovirus (HCMV). Although transcription of the HCMV UL87 gene has not been specifically investigated, cDNA clones of UL87 antisense transcripts were previously found in HCMV cDNA libraries. In this study, transcription of the UL87 antisense strand was investigated in three clinically isolated HCMV strains. Results First, an 800-nucleotide transcript in antisense orientation to the UL87 gene was found in a late HCMV cDNA library. The UL87 antisense transcript was then confirmed by rapid amplification of cDNA ends (RACE) and Northern blot in three HCMV clinical strains. Two ORFs were predicted in the antisense transcript. The putative protein of ORF1 showed a high degree of conservation among HCMV and other CMV strains. Conclusion An 800 nt antisense transcript in the UL87 gene region exists in HCMV clinical strains.


Background
Human cytomegalovirus (HCMV) is the prototypical member of the subfamily Betaherpesvirinae. Seroepidemiologic studies have shown that the virus is widespread in the human population [1][2][3][4][5]. Like other herpesviruses, HCMV cannot be completely eliminated by the immune system and remains either as a low-level persistent infection or in a quiescent, latent state for the lifetime of the infected person. HCMV infection is asymptomatic in most healthy adults, but causes life-threatening disease in immunologically immature or compromised individuals, including neonates [6,7], AIDS patients [8], and allogeneic transplant recipients [9].
Although the entire sequences of some HCMV strains are available (GenBank: X17403.1, FJ616285.1, GQ466044.1, GU179001, GU937742, and others), the precise number and nature of the viral genes and gene products are still in question. To date, most HCMV genes have not been extensively characterized with respect to their expression patterns. A remarkable accumulation of antisense (AS) transcripts during HCMV infection, reported by Zhang et al. [10], suggests that currently available genomic maps based on open reading frame (ORF) and other in silico analyses may drastically underestimate the true complexity of viral gene products.
UL87 is one of the 208 ORFs of the HCMV AD169 strain (GenBank: X17403.1) predicted by Chee in 1990 [11], and was later re-evaluated by Murphy as having coding potential [12]. Although UL87 has been shown to encode an early protein expressed during infection with an HCMV recombinant virus [13], its transcriptional pattern has not been described. However, two AS transcripts overlapping the UL87 gene were obtained by Zhang et al. [10] by screening an HCMV cDNA library made during late infection. Moreover, we also found two cDNA clones containing the sequence of the UL87 AS strand in a late HCMV cDNA library [14].
In the present study, the HCMV UL87 AS transcript was screened further in a late HCMV cDNA library. The structure of the UL87 AS transcript was investigated by RACE experiment and Northern blot in three HCMV clinical strains. An unspliced AS transcript of the UL87 gene was identified.

Results
AS transcripts in the UL87 region identified from the HCMV cDNA library
Nineteen cDNA clones were identified as having sequences congruent with the UL87 gene region by graded PCR from the library. All of the 19 sequences possessed a poly(A) tail which was not coded by the HCMV genome, and were found to be homologous to the complementary strand of the UL87 gene. The 5' end of one of the 19 sequences was located at nt 131055, and the 5' ends of 17 other sequences were located at nt 130263. One other sequence, with a 5' end at nt 130261, was most likely a truncated cDNA created during library preparation. The 3' ends of the 19 sequences were all located at nt 129489-129491 downstream of a poly(A) signal (AATAAA) located at nt 129565-129570 (Figure 1). The sequencing results for the cDNA clones suggested that the transcripts present in the library correspond to the AS orientation of the UL87 gene, of which an 800 nt unspliced transcript was the dominant transcript.
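The relationship between the reported end coordinates and the transcript sizes can be checked with simple arithmetic. The following is a minimal sketch, assuming inclusive AD169 coordinates and a transcript on the complementary strand (so the 5' end has the higher coordinate); the function and variable names are hypothetical and are not taken from the study.

```python
def antisense_span(five_prime: int, three_prime: int) -> int:
    """Genome-templated length (nt) of a transcript on the complementary strand.

    Coordinates are inclusive genome positions; on the complementary strand
    the 5' end has the higher coordinate and the 3' end the lower one.
    """
    if five_prime < three_prime:
        raise ValueError("expected 5' coordinate >= 3' coordinate on the complementary strand")
    return five_prime - three_prime + 1


# Coordinates reported for the cDNA clones (AD169 numbering).
MAJOR_5P = 130263                   # 5' end shared by 17 clones
LONG_5P = 131055                    # 5' end of the single longer clone
THREE_PRIME_SITES = (129489, 129491)  # range of observed 3' ends

for label, five_prime in (("major", MAJOR_5P), ("long", LONG_5P)):
    lengths = [antisense_span(five_prime, t) for t in THREE_PRIME_SITES]
    print(f"{label}: {min(lengths)}-{max(lengths)} nt before the poly(A) tail")
# major: 773-775 nt before the poly(A) tail
# long: 1565-1567 nt before the poly(A) tail
```

Under these assumptions, the dominant clones span roughly 775 nt of genomic sequence, which together with a poly(A) tail is consistent with the approximately 800 nt size, while the single longer clone corresponds to about 1.5 kb.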

3' and 5' ends of UL87 AS transcripts obtained by RACE analysis
To confirm the existence of the UL87 AS transcripts, and to find other potential forms of UL87 AS transcripts, both 5' and 3' RACE analyses were performed with late class RNAs of the three HCMV strains (X, CH, H). The products of 3' RACE for all three strains showed a consistent band of about 500 bp (Figure 2). Sequencing results demonstrated that the 3' ends of the UL87 AS transcripts of all three strains were located at nt 129489-129491, downstream from a consensus poly(A) signal at nt position 129465-129470, identical to those of the transcripts derived from the cDNA library.
First, 5' RACE experiments were performed using F1 and F2 (nested) primers (Table 1, Figure 1). An ~500 bp product was found in all three strains (Figure 3). The sequences from most of the clones of the 5' RACE products initiated at nt position 130267, four nucleotides upstream of the 5' end at nt 130263 of the transcript represented in the cDNA library. Two other clones of the 5' RACE product, both from the CH strain, initiated at nt positions 130264 and 130265, respectively.
Then, in order to confirm the 5' end at nt 131055 obtained in the cDNA library, two other nested primers (F3 and F4) (Table 1, Figure 1) were used. Multiple 5' ends were found, ranging from nt 130645 to nt 131430 in the three strains. However, the results were not consistent among the three strains. Moreover, the 5' end at nt 131055 could not be validated in any of the strains. This result suggests that complex structures may exist at the 5' end of the transcript.

UL87 AS transcripts confirmed by Northern blot
Northern blot analysis was performed using total cellular RNAs harvested from HELF cells infected with HCMV H strain (immediate early, early, and late class), and the total RNA of mock-infected cells was used as a control. RNAs were hybridized to a riboprobe (nt 129553 to 130049) complementary to the UL87 AS region. An 800 nt transcript was detected in late class RNA from HCMV-infected HELF cells, but not in mock-infected HELF cells (Figure 3). This suggests that the 800 nt transcript is a UL87 AS transcript expressed by HCMV.

Sequence analysis of the 800 nt UL87 AS transcript
The location of the 800 nt UL87 AS transcript in the HCMV genome, as well as the primers and riboprobe used, are shown in Figure 1. The sequence of this region in HCMV AD169 is detailed in Figure 4A. A non-conventional potential TATA promoter element (TATTA) is present 28 bp upstream of the RNA initiation site, according to sequence data obtained through 5' RACE. In addition to a consensus poly(A) signal (129465-129470) located upstream of the 3' terminus, a weak consensus G/T cluster (GTGTCTGTGTCGGCAAATGTG, 129484-129464), an element essential for cleavage of the 3' end of mRNAs [15,16], was found downstream of the 3' terminus.
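Because the transcript is read from the complementary strand, locating elements such as the AATAAA poly(A) signal requires reverse-complementing the deposited sequence and mapping hits back to genome coordinates. The following is a minimal sketch of that bookkeeping; the helper names and the 12 nt toy sequence are hypothetical and do not represent HCMV sequence or the authors' analysis software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_sites_antisense(top_strand: str, start_coord: int, motif: str):
    """Find a motif as read on the complementary strand.

    top_strand  : deposited (forward-strand) sequence of the region
    start_coord : genome coordinate of the first base of top_strand
    Returns a list of (first, last) genome coordinates for each hit, written
    in transcript orientation, so first > last.
    """
    antisense = reverse_complement(top_strand)
    end_coord = start_coord + len(top_strand) - 1
    sites = []
    pos = antisense.find(motif)
    while pos != -1:
        first = end_coord - pos          # 5'-most base of the motif on the AS strand
        last = first - len(motif) + 1    # 3'-most base of the motif on the AS strand
        sites.append((first, last))
        pos = antisense.find(motif, pos + 1)
    return sites


# Toy example: a made-up 12 nt region starting at coordinate 101.
toy_region = "GGGTTTATTCCC"
print(motif_sites_antisense(toy_region, 101, "AATAAA"))  # [(109, 104)]
```

The high-to-low coordinate pairs returned here follow the same convention as the G/T cluster coordinates quoted above (129484-129464).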
Two open reading frames (ORF1 and ORF2) were predicted in the transcript, which have the potential to code for a 60-amino-acid and a 78-amino-acid protein, respectively. A PROSITE motif search showed that there is one N-myristoylation site and one casein kinase II phosphorylation site in both of the predicted proteins, and two protein kinase C phosphorylation sites in the predicted protein encoded by ORF1. To study how conserved the putative UL87 AS proteins are among HCMV and other CMV genomes, a phylogenetic study was done using the UL87 AS homologous sequences of chimpanzee cytomegalovirus (CCMV), murine cytomegalovirus (MCMV), and HCMV of the AD169, Merlin, and Towne strains, along with the three clinical strains from this study. As shown in Figure 5, the putative proteins encoded by ORF1 were completely conserved among these HCMV strains. CCMV and MCMV also have an ORF similar to the ORF1 of HCMV in the same region, with the main differences located at the amino termini. The amino acid sequence of CCMV had higher homology to that of HCMV than did that of MCMV. ORF2 was absent in MCMV. In contrast to ORF1, the amino acid alignment of ORF2 did not show a high degree of conservation between HCMV and CCMV. Even among HCMV strains, in addition to amino acid changes, mutations in the termination site were found in the CH and Towne strains (Figure 5).
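The kind of ORF prediction described above can be illustrated with a simple scan for ATG-initiated reading frames. This is a minimal sketch under stated assumptions, not the algorithm of the software used in the study: it considers only the three frames of the transcript strand, uses the standard stop codons, and the function name, the `min_aa` parameter, and the toy sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript: str, min_aa: int = 50):
    """Return (start, end, aa_length) for ATG-initiated ORFs in three frames.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and aa_length counts codons from ATG up to (not including)
    the stop codon. Only the first ATG after the previous stop is reported.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                aa_length = (i - start) // 3   # codons before the stop
                if aa_length >= min_aa:
                    orfs.append((start, i + 3, aa_length))
                start = None                   # close the frame at the stop
    return sorted(orfs)


# Toy transcript encoding a 60-residue ORF (ATG + 59 GCT codons + TAA).
toy = "CC" + "ATG" + "GCT" * 59 + "TAA" + "G"
print(find_orfs(toy, min_aa=50))  # [(2, 185, 60)]
print(find_orfs(toy, min_aa=80))  # []
```

The second call shows how the choice of a minimum length cutoff determines whether short ORFs, such as the 60- and 78-amino-acid ORFs reported here, are retained by an annotation pipeline.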

Discussion
In this study, transcription of the AS strand of the HCMV UL87 gene region was investigated, and an 800 nt UL87 AS transcript, previously found as a cDNA clone in a late HCMV cDNA library [14], was characterized in detail. The transcript was identified in three HCMV clinical strains.
Several lines of evidence demonstrated that an ~800 nt unspliced UL87 AS transcript existed among late-class transcripts during HCMV infection. A poly(A) tail not encoded by the genome was found at the end of the UL87 AS transcript by sequencing the cDNA clones and 3' RACE products, confirming that it was indeed polyadenylated. The potential TATA promoter element, the consensus poly(A) signal, and the weak consensus G/T cluster all provided evidence that the novel transcript was a conventional mRNA, which could potentially encode a protein.
Two small ORFs were predicted in the transcript, which could encode proteins of 60 and 78 amino acids, respectively. Amino acid sequence alignments showed that the putative protein of ORF1 was highly conserved among the HCMV, CCMV, and MCMV strains. It therefore seems likely that ORF1 has a protein-coding function. However, the two ORFs were predicted neither in the preliminary analysis of the HCMV genome by Chee et al. [11] nor in the re-analyses of the HCMV genome [12,17]. This is because, in these analyses, the authors required that any putative coding ORF encode a polypeptide of at least 100 or 80 amino acids in length. It will be important to ascertain whether the two putative proteins are in fact present in infected cells. Such studies are ongoing.
An approximately 1.5 kb unspliced cDNA of a UL87 AS transcript was also found in the HCMV cDNA library. Two other spliced AS transcripts expressed from the UL87 region have also been obtained by screening a late HCMV cDNA library [10]. Compared with the transcripts identified in the present study, they have different initiation sites (nt 134496 and 132114) but the same termination site. These results indicate that more than one transcript is expressed from the UL87 region in the AS orientation. There are several non-mutually exclusive explanations for the failure to further confirm these transcripts in this study. First, the cDNA library may not contain all of the transcripts that accumulate during infection, especially those expressed in small quantities. Second, the transcripts may vary among different strains and under different replication conditions. Third, the possibly lower abundance of these transcripts in the infected HELF cells may have made detection by Northern blot difficult in this study.
The 5' RACE with F3 and F4 as the specific nested primers did not yield a single consistent 5' end. This could be attributed to complicated secondary structure (hairpins or stem-loops) in the 5' untranslated region (5'-UTR) of the mRNA, which may block reverse transcription. Several RNA structures located in the 5'-UTR of eukaryotic mRNA transcripts have been shown to affect translation efficiency [18][19][20]. Further investigation of the 5' ends of other UL87 antisense transcripts and of the secondary structure of the 5'-UTR would help to clarify how the transcript is regulated at the translational level.
A recent study showed that the UL87 ORF was expressed as an early viral protein [13]. In the present study, no UL87 sense transcript was obtained when screening an HCMV cDNA library using primers located in the UL87 gene region. The cDNA library used in the study should contain HCMV transcripts of each infection phase, although mainly of the late class genes. However, DNA sequence analyses of several HCMV strains (AD169, GenBank: X17403.1; Merlin, GenBank: GU179001.1; Towne, GenBank: FJ616285.1) showed that the nearest poly(A) signal (AATAAA) to the 3' terminus of the UL87 ORF was located 512 bp downstream of the UL94 ORF, which is about 10 kb from the 5' terminus of the UL87 gene. Therefore, the UL87 to UL94 genes could be co-expressed as a large polycistronic transcript, and a full-length cDNA of such a large transcript is unlikely to be contained in the cDNA library. Nevertheless, our study, along with those of others [10,13], confirmed that both strands of the UL87 gene region have expression potential.
Abundant sense and antisense (S-AS) transcript pairs have been found by Zhang et al. [10], who obtained direct evidence for the existence of S-AS transcript pairs derived from 38 known or predicted viral genes. Individual AS transcripts have also been described for many herpesviruses, including the betaherpesviruses [21][22][23], the gammaherpesviruses [24][25][26], and especially the alphaherpesviruses [27][28][29][30][31][32][33][34][35][36][37][38][39][40]. In fact, Carter et al. have predicted that genes in AS orientation to known herpesvirus genes could be common [29]. The members of an S-AS pair may regulate each other [21], so the dynamics of the S-AS transcripts of the UL87 gene region, along with their relationship to each other, need to be characterized further.

Conclusion
In this study, an 800 nt unspliced UL87 AS transcript was found to be expressed in the late phase of HCMV infection, and two ORFs were predicted in the antisense transcript.

Virus and specimens
Three HCMV clinical strains, named X, CH, and H, were isolated from urine samples from three infants less than 5 months old who had been hospitalized in Shengjing Hospital of China Medical University. The strains were passaged fewer than ten times in human embryonic lung fibroblast (HELF) cells, which were maintained in 1640 medium supplemented with 2% fetal calf serum and 100 units penicillin/streptomycin at 37°C and 5% CO2 in a humidified incubator. HELF cells were inoculated separately with each of the three strains at a multiplicity of infection (MOI) of 3-5.

RNA preparations
For preparation of immediate-early (IE) RNA of HCMV, the protein synthesis inhibitor cycloheximide (CHX) (Sigma, USA) (100 μl/ml) was added to the culture medium 1 hour before infection and the cells were harvested at 24 hours post-infection (hpi). For early (E) RNA, the DNA synthesis inhibitor phosphonoacetic acid (PAA) (Sigma, USA) (100 μl/ml) was added to the medium immediately after infection, and the cells were harvested at 48 hpi. Late (L) RNA and mock-infected cellular RNA were derived from infected and uninfected cells, respectively, cultured in parallel, and harvested at 96 hpi. Total RNAs were isolated from approximately 10^7 infected or uninfected HELF cells using TRIzol reagent (Invitrogen, Carlsbad, CA). The isolated RNAs were treated with DNA-Free reagent (Ambion, Austin, USA) to remove possible contaminating DNA. The integrity and size of the isolated RNAs were analyzed by formaldehyde agarose gel electrophoresis. The quantity and purity of the RNAs were estimated by measuring optical density.

Screening a HCMV cDNA library
An HCMV cDNA library had been constructed previously with the SMART technique (Clontech, USA) using the L RNA of HCMV H strain isolated from the urine sample of an HCMV-infected infant [14]. To select specific cDNA clones from the cDNA library by polymerase chain reaction (PCR), a graded PCR was set up as previously described [41,42].
Six thousand cDNA clones were screened by graded PCR using several pairs of primers (Table 1, Figure 1). The PCR conditions were an initial denaturation at 94°C for 4 min, followed by 30 cycles of 94°C for 30 sec, 55°C for 30 sec, and 72°C for 1 min, and a final elongation at 72°C for 10 min. Inserts in the selected clones were sequenced using vector primers (M13F and M13R). The screening results allowed us to obtain clones containing transcript sequences for both strands of the UL87 gene region.

RACE
Rapid amplification of cDNA 3' ends (3' RACE) and 5' ends (5' RACE) was performed with the 3'-Full RACE Core Set Ver.2.0 and the 5'-Full RACE Kit (TaKaRa, Dalian, China), respectively. The L class RNA preparations for the three strains and RNA of mock-infected cells were used as templates. First-strand cDNAs were synthesized with MMLV reverse transcriptase using oligo-dT-adaptor primers (3' RACE) and random 9-mer primers (5' RACE). Nested PCR amplifications were carried out using LA Taq (TaKaRa, Dalian, China) after reverse transcription. All of the primers are listed in Table 1 and Figure 1. The reactions were carried out at 94°C for 4 min, followed by 30 cycles of 94°C for 30 sec, 55°C for 30 sec, and 72°C for 3 min, with a final extension at 72°C for 10 min. In 5' RACE experiments, two control reactions were performed in strict accordance with the kit instructions: i) TAP (-), omitting tobacco acid pyrophosphatase; ii) MMLV (-), omitting MMLV reverse transcriptase.

Cloning and Sequencing
Products of RACE were separated by agarose gel electrophoresis. Different-sized products were purified using the DNA Purification Kit (Promega, Madison, WI, USA). Recovered PCR products were ligated into a pCR 2.1 TA vector (Invitrogen, China) with T4 ligase at 14°C overnight. The ligation products were transformed into E. coli DH5α competent cells. Ten clones of each purified PCR product were selected randomly for sequencing using the M13 primers and the ABI PRISM 3730 DNA analyzer (Applied Biosystems, Carlsbad, CA).

Northern blot
For Northern blot analysis, 2 μg per lane of IE, E, and L total RNA of the HCMV H strain and RNA from mock-infected HELF cells were subjected to denaturing agarose gel (1% [wt/vol]) electrophoresis in the presence of formaldehyde, alongside the digoxigenin-labeled RNA molecular weight marker I (Roche, Indianapolis, IN, USA). The UL87 AS-specific riboprobes were labeled using a DIG Northern starter kit (Roche, Indianapolis, IN, USA) according to the manufacturer's instructions. The riboprobes correspond to nucleotides 129553 to 130049 of the complementary strand of the AD169 sequence (Figure 1). The separated RNA fragments were transferred onto positively charged nylon membranes by capillary transfer. The nylon membranes were then baked at 80°C for 2 h, followed by prehybridization for 30 min at 63°C in DIG Easy Hyb buffer (Roche, Indianapolis, IN, USA). After overnight hybridization at 63°C, the membranes were washed according to the manufacturer's instructions. The hybridized probes were incubated with anti-digoxigenin conjugated to alkaline phosphatase and were then visualized with the chemiluminescence substrate CDP-Star (Roche, Indianapolis, IN, USA). The membranes were imaged using a ChemiDoc™ XRS+ (Bio-Rad, USA).

BLAST search and sequence analysis
Standard nucleotide-nucleotide BLAST was performed on the NCBI website. The nucleotide positions referred to in this study are in reference to the sequence of the HCMV AD169 strain (GenBank: X17403.1). The following sequences were used for alignment analysis: HCMV AD169 strain (GenBank: X17403.1), Merlin strain (GenBank: GU179001.1), Towne strain (GenBank: FJ616285.1), the three clinical strains (H, CH, and X) in this study, chimpanzee cytomegalovirus (CCMV, GenBank: AF480884), and murine cytomegalovirus (MCMV, GenBank: AM886412). DNA alignment was performed with MegAlign using the Clustal W algorithm. ORFs of identified transcripts were predicted with EditSeq of the DNASTAR package. Motifs in the predicted proteins were identified using the GeneDoc program.
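Once sequences have been aligned, the degree of conservation between two of them can be summarized as a percent identity. The following is a minimal sketch of one common definition (identical positions divided by positions where neither sequence has a gap); it is not the calculation performed by MegAlign or Clustal W, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an existing alignment.

    Both strings must have the same length (alignment columns). Columns in
    which either sequence has a gap are excluded from the comparison.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue                 # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned peptides (not real CMV sequences): 4 of 5 ungapped columns match.
print(percent_identity("MKT-AL", "MRTQAL"))  # 80.0
```

Other definitions divide by the alignment length or by the length of the shorter sequence, so the denominator should be stated whenever identity values are compared across studies.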