This study has focused on a region of the HCMV genome that encodes products potentially very important for the overall pathogenesis of the virus. To our knowledge this is the first comparative study of all of the genomic components of the ORFs from UL146 through UL147A and the first report of multiple overlapping transcripts expressed from this region.
The results show that among clinical HCMV strains there is a gradient of sequence variability that ranges from very high to low beginning with the UL146 ORF and progressing downstream through UL147A. Paradoxically, the hypervariable UL146-UL147A sequences of individual strains were found to be completely stable throughout long term propagation in vitro and in vivo.
The adjacent UL146 and UL147 ORFs have conserved CXC chemokine motifs and are positionally conserved in all clinical strains of HCMV that have been characterized, although they have been deleted from some laboratory strains. They are therefore not essential for virus replication in vitro, but appear to be maintained for infectivity in vivo. The pattern of multiple CXC chemokine homologues is also found in the genomes of other primate CMVs [24, 33, 34] but not murine CMV (MCMV), perhaps reflecting a divergence in evasion strategies.
The cumulative evidence from this study and others clearly establishes the hypervariability of the UL146 ORF among clinical HCMV strains [2, 9, 10, 22]. These results emphasize that within a defined sequence group, UL146 sequence similarities exist among unrelated clinical strains from widely separated geographic areas, while at the same time there is a high level of UL146 sequence divergence between different groups within individual geographic areas. Despite the hypervariability, all reported UL146 sequences including those in the present study have conserved functional residues associated with CXC chemokines [2, 9, 10, 22]. ELR residues adjacent to the CXC motif are also present in most of the UL146 sequences. ELR-positive CXC chemokines have been reported to induce angiogenesis and vascular remodeling through binding to CXCR2, while ELR-negative chemokines are angiostatic [26, 35, 36]. The UL146 protein expressed from the Toledo strain induces CXC chemokine functions including neutrophil chemotaxis, calcium flux, and degranulation and binds to CXCR2. However, the angiogenic activity of UL146 has not been determined. In UL146 Group 5 sequences (Figure 2) the ELR residues are replaced with NGR. The arginine residue that is considered to be absolutely essential for receptor binding is retained, but the potential effect of the NG substitution on chemokine functions is not known. All of the UL147 sequences have DXR residues in the homologous positions of the ELR residues next to the CXC motif. Similar to the UL146 sequence, the arginine residue required for receptor binding is conserved in all UL147 sequences, but no chemokine-related functions have yet been attributed to the UL147 product.
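The motif comparison described above (ELR, NGR, or DXR immediately preceding the CXC motif) can be illustrated with a minimal Python sketch. It is not the method used in this study. The function name, the category labels, and the toy peptide strings are hypothetical. The sketch also assumes that the first C-X-C match in the supplied sequence is the chemokine motif of interest, which may not hold for a full-length precursor.

```python
import re

# Three residues immediately N-terminal to the first C-X-C match.
# Assumption: the first C-X-C in the input is the chemokine motif.
CXC_WITH_PREFIX = re.compile(r"([A-Z]{3})C[A-Z]C")

def classify_pre_cxc_motif(protein_seq):
    """Return (tripeptide, label) for the residues preceding the CXC motif.

    Labels are illustrative only:
      'ELR-positive'        -> tripeptide is exactly ELR
      'arginine-retained'   -> third residue is R but motif is not ELR
                               (e.g. NGR or DXR as described in the text)
      'arginine-absent'     -> third residue is not R
    Returns (None, 'no CXC motif found') if nothing matches.
    """
    match = CXC_WITH_PREFIX.search(protein_seq.upper())
    if match is None:
        return None, "no CXC motif found"
    tripeptide = match.group(1)
    if tripeptide == "ELR":
        return tripeptide, "ELR-positive"
    if tripeptide[2] == "R":
        return tripeptide, "arginine-retained"
    return tripeptide, "arginine-absent"

# Toy peptides (hypothetical, not real UL146 or UL147 sequences).
toy_peptides = {
    "toy_ELR": "AAELRCQCAA",
    "toy_NGR": "AANGRCQCAA",
    "toy_DXR": "AADSRCKCAA",
}

for name, seq in toy_peptides.items():
    print(name, classify_pre_cxc_motif(seq))
```

Applied to an alignment of real sequences, a tabulation of this kind would only summarize which tripeptide each strain carries; it says nothing about receptor binding or angiogenic activity, which would have to be determined experimentally.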
Recently it was reported that ELR-positive CXC chemokine activity is elevated in association with bronchiolitis obliterans syndrome (BOS) in lung transplant recipients. In association with the elevated CXC chemokines there was vascular remodeling of the trachea and aberrant angiogenesis. HCMV infection and disease are frequent and serious complications for lung transplant recipients. Although HCMV infection was not included in the analysis for the BOS study, these new findings suggest a possible functional link between HCMV chemokine activity and human disease. It will be important for HCMV pathogenesis to determine whether such a link exists and how sequence variability could affect this function.
Further sequence examination of the intergenic region between UL146 and UL147 produced two unexpected findings. First, it was found to be highly variable in both nucleotide length and sequence. This is surprising because in previous analyses of other variant HCMV genes (N.S. Lurain, unpublished data), the coding sequences from all unrelated clinical isolates could be amplified using a single set of primer pairs from the non-coding flanking regions, suggesting that the sequences of these flanking regions are generally conserved. The upstream non-coding sequence of UL146 has conserved primer binding sites, but the intergenic region provides no conserved downstream site. A second unexpected finding is the consistent linkage of the highly variable intergenic sequences with specific UL146 ORF sequence groups. The UL147 ORF has lower overall sequence variability than the UL146 ORF. However, phylogenetic analysis based on UL147 sequences shows that the strains cluster in the same groups determined by the UL146 and intergenic sequences. Thus, there is no evidence for recombination between UL146 and UL147.
The start site of the UL147A ORF is invariably only 2 nucleotides downstream of the UL147 stop codon. Despite the highly conserved sequence of the UL147A ORF, this very short 2-basepair sequence between the UL147 and UL147A ORFs strongly suggests linkage to the rest of the region. However, there is no known functional relationship between the predicted products of UL147 and UL147A.
In contrast, the present study established that the sequence groups of the hypervariable UL144 ORF are not linked to the UL146 sequence groups even though UL144 is less than 1 kb upstream of UL146. We have previously shown UL144 to be unlinked to the variable gB gene, which is more than 90 kb upstream. These data along with those reported by others indicate that most of the known hypervariable ORFs are unlinked, which suggests that so far, with the exception of UL146 and UL147, the pattern of known variant genes present in each strain was most likely generated over very long periods of time by recombination events [8, 11, 15]. The lack of recombination within the region UL146 through UL147A is further supported by the fact that there are HCMV strains from each geographic site that have identical sequences spanning this entire region. This raises the question of how the hypervariability has evolved. We addressed this question first by investigating the possibility of cumulative sequence drift over long-term virus propagation. Serial passage of multiple clinical isolates over several months in cell culture failed to produce even a single nucleotide substitution despite phenotypic changes from cell-associated to cell-free virus. This in vitro approach confirms that long-term cell culture does not add sequence artifacts. However, cell culture lacks components of the immune system that could produce sequence drift through selection of antigenic variants.
The possibility of sequence drift in vivo was addressed by analyzing sequential isolates obtained from transplant recipients over long-term follow-up of several months to several years, a much longer period of follow-up than that of previous studies [10, 22]. All isolates from specimens from the same patient, including those from different body compartments, maintained identical UL146 sequences, demonstrating that no sequence drift occurred in vivo. The most convincing evidence that in vivo passage of HCMV strains does not produce sequence drift comes from the data from four matched pairs of transplant recipients with the same donor. All isolates from related patient pairs have identical nucleotide sequences of the UL146-UL147A region, and in the case of CH1 and CH2 the sequence identity was maintained over a period of almost 5 years. Thus, passage of the same strain in different hosts did not select variant UL146 sequences.
Some patients had evidence of infection with more than one strain, for example subjects R1 and NW23 (Table 2). However, the UL146-UL147A sequences of each individual strain remained stable over long-term passage both in vitro and in vivo with no evidence of recombination. We would predict from the observed sequence stability of HCMV strains during long-term passage that even minor sequence differences among isolates from the same patient indicate the presence of multiple strains rather than sequence drift of a single strain. This prediction can be confirmed by analyzing the sequence groups of other unlinked variable genes such as UL144 and gB detected among the same isolates.
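The comparison of sequential isolates can be sketched in Python as a count of nucleotide differences between aligned sequences, followed by grouping of isolates that share an identical sequence. This is an illustrative sketch only, not the analysis pipeline of this study. The function names, isolate labels, and short toy sequences are hypothetical, and the input sequences are assumed to be already aligned to equal length.

```python
from collections import defaultdict

def count_differences(seq_a, seq_b):
    """Count mismatched positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)

def group_by_sequence(isolates):
    """Group isolate names that share an identical sequence.

    `isolates` maps an isolate name to its sequence. Returns a sorted
    list of sorted name lists, one list per distinct sequence.
    """
    groups = defaultdict(list)
    for name, seq in isolates.items():
        groups[seq.upper()].append(name)
    return sorted(sorted(names) for names in groups.values())

# Toy isolates from one hypothetical patient (not data from this study).
patient_isolates = {
    "isolate_month_01": "ATGCGTACGT",
    "isolate_month_14": "ATGCGTACGT",
    "isolate_month_30": "ATGCGTTCGT",
}

groups = group_by_sequence(patient_isolates)
print(groups)
if len(groups) > 1:
    # Under the prediction stated in the text, more than one distinct
    # sequence would be read as a possible second strain, not as drift.
    print("more than one distinct sequence: possible multiple strains")
```

Under the prediction stated above, more than one group for a single patient would prompt a check of unlinked variable genes such as UL144 and gB rather than being read as drift of a single strain.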
The close linkage and sequence stability of the UL146-UL147A ORFs led to the investigation of potential splicing and temporal expression of transcripts from this region. Analysis of RT-PCR products revealed a single large transcript that contained not only the UL146-UL147A ORFs but also the downstream UL148 and UL132 ORFs. The RT-PCR sequences showed no evidence of spliced transcripts. Northern analysis identified a dominant large 3.7 kb transcript that hybridized with riboprobes representing all 5 ORFs, and also identified 5 other transcripts ranging in size from approximately 1.0 to 3.1 kb that hybridized with riboprobes from one or more of the ORFs. UL146 sequences were only detected on the largest transcript (3.7 kb), and UL132 sequences were detected on all transcripts. The transcripts represent different temporal classes as determined by the time of expression post-infection and by the effect of foscarnet on that expression. Based on size and hybridization patterns, UL146 appears to be expressed only from the large 3.7 kb transcript, which has early-late kinetics. UL147 is likely expressed from the 3.1 kb transcript, which has true late kinetics. These results are slightly different from earlier microarray analysis of HCMV transcriptional expression based on the Towne strain, which indicated that UL146 (UL152 in Towne) and UL147 have similar early-late kinetics. The discrepancies likely result from the inability of microarray analysis to distinguish overlapping transcripts. Penfold et al. reported that the UL146 protein is expressed with true late kinetics as shown by foscarnet inhibition, but no transcriptional analysis was reported. However, early-late transcriptional expression is compatible with the potential timing of chemokine activity that would likely play a role in pathogenesis after viral replication.
The northern analysis of the UL146-UL132 ORFs shows a transcriptional pattern and complexity similar to that found in other genomic regions of HCMV including the UL93-UL99 ORFs [38, 39], which have: 1) overlapping transcripts with different 5' termini; 2) co-terminal 3' ends; and 3) different temporal expression of the transcripts. The RT-PCR and northern data show a series of transcripts that all include UL132 sequences at the 3' end but vary in the number of upstream ORFs and differ in their temporal expression. The single poly(A) signal, which is downstream of the UL132 stop codon, supports the possibility of a common 3' terminus for all of these transcripts [2, 5].