CODEHOP-mediated PCR – A powerful technique for the identification and characterization of viral genomes
Virology Journal volume 2, Article number: 20 (2005)
Consensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP) PCR primers derived from amino acid sequence motifs which are highly conserved between members of a protein family have proven to be highly effective in the identification and characterization of distantly related family members. Here, the use of the CODEHOP strategy to identify novel viruses and obtain sequence information for phylogenetic characterization, gene structure determination and genome analysis is reviewed. While this review describes techniques for the identification of members of the herpesvirus family of DNA viruses, the same methodology and approach is applicable to other virus families.
Only a very small fraction of the vast number of viral species belonging to the different virus families have been identified and characterized to date. The majority of these uncharacterized viral species are found in host organisms which have not been targeted in biomedical, plant or animal research. However, recent reports have noted an increase in the occurrence of viral diseases, not only in humans, but in animals and plants as well. While some of this rise may reflect more effective surveillance techniques, disease outbreaks caused by novel cross-species infections and/or subsequent virus recombination events have occurred . Therefore, the development of tools for the detection of viruses, the characterization of their genomes and the study of their evolution, becomes important, not only for basic scientific study, but also for the protection of public health and the well-being of the plant and animal life that surrounds us.
We have developed a novel technology to identify and characterize distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). CODEHOPs are designed from amino acid sequence motifs that are highly conserved within members of a gene family, and are used in PCR amplification to identify unknown related family members. We have developed and implemented a computer program that is accessible over the World Wide Web to facilitate the design of CODEHOPs from a set of related protein sequences . This site is linked to the Block Maker multiple sequence alignment site  on the BLOCKS WWW server  hosted at the Fred Hutchinson Cancer Research Center, Seattle, WA.
We have utilized the CODEHOP technique to develop novel assays to detect previously unknown viral species by targeting sequence motifs within stable housekeeping genes that are evolutionarily conserved between different members of virus families. Using CODEHOPs derived from conserved motifs within retroviral reverse transcriptases, we have previously identified a diverse family of retroviral elements in the human genome, as well as a novel endogenous pig retrovirus and a new retrovirus in Talapoin monkeys. We have also developed assays to detect unknown herpesviruses by targeting conserved motifs within herpesvirus DNA polymerases. Using this approach, we have identified fourteen previously unknown DNA polymerase sequences from members of the alpha, beta and gamma subfamilies of herpesviruses, and have discovered three homologs of the Kaposi's sarcoma-associated herpesvirus in macaques [9, 10]. We have also used the CODEHOP technique to clone and characterize the entire DNA polymerase gene from these new viruses and to obtain sequences for larger regions of viral genomes containing multiple genes, targeting the divergent locus B of macaque rhadinoviruses. The sequence information obtained from the amplified gene and genomic fragments from these studies has allowed informative phylogenetic characterization of the new viral species, and has provided critical information regarding the gene structure and genetic content of these unknown viral genomes.
In this review, the CODEHOP methodology and its utilization in the identification and characterization of novel viral genomes using the herpesvirus family as an example is described. Published CODEHOP assays that we have previously used to identify new herpesviruses are discussed and the latest refined assays and their utility are provided. The use of the CODEHOP methodology for the analysis of larger regions of viral genomes is presented along with the general application of this technology for the identification of viral species and their genes in other virus families. Finally, the software and Web site that we have developed to derive CODEHOP PCR primers from blocks of multiply aligned protein sequences are described.
General CODEHOP Design and PCR Strategy
CODEHOPs are derived from highly conserved amino acid sequence motifs present in multiple alignments of related proteins from a targeted gene family. Each CODEHOP consists of a pool of primers where each primer contains one of the possible coding sequences across a 3–4 amino acid motif at the 3' end (degenerate core) (Figure 1A) . Each primer also contains a longer sequence derived from a consensus of the possible coding sequences 5' to the core motif (consensus clamp). Thus, each primer has a different 3' sequence coding for the amino acid motif and the same 5' consensus sequence. Hybridization of the 3' degenerate core with the target DNA template is stabilized by the 5' consensus clamp during the initial PCR amplification reaction (Figure 1B). Hybridization of primers to PCR products during subsequent amplification cycles is driven by interactions through the 5' consensus clamp.
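The primer structure described above can be sketched in a few lines of Python. This is only an illustration of the idea (one fixed 5' consensus clamp joined to every possible coding sequence of a short 3' core), not the algorithm used by the CODEHOP design software. The clamp sequence and the function name `codehop_pool` are hypothetical, and the sketch expands whole codons, whereas real CODEHOPs may end in a partial codon or use a restricted codon set.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS = {}
for index, (b1, b2, b3) in enumerate(product(BASES, repeat=3)):
    CODONS.setdefault(AMINO[index], []).append(b1 + b2 + b3)


def codehop_pool(clamp, core_motif):
    """Return every primer in the pool: the same 5' clamp followed by
    one of the possible coding sequences for the 3' core motif."""
    codon_choices = [CODONS[aa] for aa in core_motif]
    return [clamp + "".join(combo) for combo in product(*codon_choices)]


# Hypothetical 5' consensus clamp; a real clamp is derived from the most
# likely codons upstream of the core motif.
example_clamp = "GCGTACCTGAACAAGCTG"
pool = codehop_pool(example_clamp, "YGDT")
print(len(pool), "primers, all sharing the clamp", example_clamp)
```

Every primer in the returned list differs only in its 3' core, which mirrors the statement that each primer has a different 3' sequence and the same 5' consensus sequence.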
Conserved amino acid motifs used for CODEHOP design are identified by alignment of related proteins from a targeted gene family using computer programs such as the Clustal W multiple alignment program. Optimal blocks contain 3–4 highly conserved amino acids with restricted codon multiplicity from which the 3' degenerate core is derived; serines, arginines and leucines are not favored because each is encoded by six possible codons. In addition, optimal blocks contain 5 or more conserved amino acids from which the 5' consensus clamp is derived. These blocks of conserved amino acid sequences should be situated close enough to each other to allow efficient PCR amplification between blocks, yet far enough apart to flank a region of significant sequence information.
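The preference for cores with restricted codon multiplicity can be made concrete with a small scoring sketch. The function below slides a four-residue window along a conserved block and ranks the windows by the number of coding sequences they would require. The block string `KLSRWMDYE` is a made-up example, not a sequence from the herpesvirus alignments, and the ranking is a simplification of what a real design program would weigh.

```python
from math import prod

# Number of codons encoding each amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def core_degeneracy(motif):
    """Number of distinct coding sequences for a fully conserved motif."""
    return prod(CODON_COUNT[aa] for aa in motif)


def rank_windows(block, width=4):
    """Return (degeneracy, start, window) tuples, least degenerate first."""
    windows = [
        (core_degeneracy(block[i:i + width]), i, block[i:i + width])
        for i in range(len(block) - width + 1)
    ]
    return sorted(windows)


# Hypothetical conserved block used only to illustrate the ranking.
for degeneracy, start, window in rank_windows("KLSRWMDYE"):
    print(f"{window} at offset {start}: {degeneracy} coding sequences")
```

Windows containing serine, arginine or leucine rise quickly in degeneracy, whereas windows rich in methionine and tryptophan stay small.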
We have developed web-based software to predict CODEHOP PCR primers from blocks of conserved amino acid sequences [2, 13]. Multiple related protein sequences from the targeted gene family are provided to the Block Maker program on the BLOCKS WWW server, which produces a set of conserved sequence blocks obtained from a multiple sequence alignment. The sequence block output is linked directly to the CODEHOP design software, which predicts and scores possible CODEHOP PCR primers. The different CODEHOP PCR primers discussed in this review were either designed manually or with the CODEHOP software, and are listed in Table 1.
CODEHOP PCR Amplification, Product Cloning and Sequence Analysis
CODEHOP PCR amplification has been performed using classical and touch-down approaches with a hot-start initiation . More recently, thermal gradient PCR amplification has been used to empirically determine optimal annealing and amplification conditions for the pool of primers . Different buffers, salt concentrations, and enzymes have been employed with varying success due to differences in DNA template preparation and the unknown nature of the targeted sequence. PCR products are either sequenced directly or after TA-cloning.
In this review, sequences were compared by BLAST analysis  and multiple alignment using Clustal W . Phylogenetic analysis of the multiply aligned sequences was performed using protein distance and neighbor-joining analysis implemented in the Phylip analysis package . Bootstrap analysis was also performed with 100 replicates and a consensus phylogenetic tree was determined. For the phylogenetic analysis, positions in the multiple alignment containing gaps due to insertions or deletions within the sequence blocks were eliminated.
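The gap-removal step can be illustrated with a short sketch. It assumes the alignment is held as a list of equal-length strings with '-' as the gap character, and the function name `strip_gap_columns` is hypothetical. It is not part of the Phylip package, which was used for the distance and tree calculations.

```python
def strip_gap_columns(alignment):
    """Drop every alignment column in which at least one sequence has a gap."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns that contain no gap character in any sequence.
    keep = [i for i, column in enumerate(zip(*alignment)) if "-" not in column]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Toy alignment: columns 1 and 2 each contain a gap and are removed.
toy_alignment = ["AC-GT", "ACTGT", "A-TGT"]
print(strip_gap_columns(toy_alignment))
```

Only the columns shared by every sequence remain, so each distance calculation compares the same set of positions.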
The "TGV-IYG" CODEHOP assay to detect novel herpesviruses
The Herpesviridae was chosen as a target virus family to develop assays to detect and characterize new viral members. All members of the herpesvirus family contain a DNA polymerase within their genome which is highly conserved across the different family members. Multiple alignment of different herpesvirus polymerase sequences revealed blocks of conserved amino acids corresponding to many of the functionally important motifs , see Figure 2A. We have developed and refined PCR strategies using CODEHOP PCR primers derived from these conserved sequence blocks to detect novel herpesviruses and characterize their genomes.
Initially, we manually designed a set of nested PCR primers from four of the conserved DNA polymerase blocks (indicated as black boxes in Figure 2A) which could be used to identify new viral polymerases and detect the existence of previously unknown or uncharacterized herpesviruses . The primers, "TGV", "IYG", "DFA" and "KG1" (Table 1), and the blocks of multiply aligned sequences from which the primers were derived are shown in Figures 3, 4, 5, 6, respectively (letters in the primer name refer to conserved amino acids in the sequence motif). Although these primers were alternately referred to as either "consensus" primers or "degenerate" primers within the original publication, all except DFA were designed using the general CODEHOP strategy . In the "TGV-IYG" herpesvirus assay, the "DFA" sense primer was used in an initial PCR amplification with the "KG1" anti-sense primer (Figure 2B). An additional sense primer "ILK" located downstream of the "DFA" motif was also added to the initial amplification reaction . The product from this amplification was used as template in a nested amplification reaction using the "TGV" sense primer and the "IYG" anti-sense primer (Figure 2B). This final PCR product was sequenced to obtain the ~165–180 bp region of the DNA polymerase gene located between the two motifs "TGV" and "IYG". The distance between the two motifs was variable between viral species due to small sequence insertions or deletions.
We have shown the utility of this CODEHOP PCR primer strategy by identifying and characterizing 14 previously unknown DNA polymerase sequences from members of the alpha, beta and gamma subfamilies of herpesviruses. Since this original publication, more than 21 additional "TGV-IYG" DNA polymerase sequences from previously uncharacterized herpesviruses have been obtained by other investigators using this CODEHOP primer strategy (see Additional File 1; "TGV-IYG" assay). In some cases, PCR amplification was performed with modified deoxyinosine-substituted primers.
Comparison of the amino acid sequences encoded within the "TGV-IYG" region has allowed phylogenetic comparison of the different herpesvirus species from which these sequences were obtained. Figure 7 shows a phylogenetic tree resulting from the analysis of the sequences obtained from 34 different herpesvirus species identified using the "TGV-IYG" CODEHOP strategy and the corresponding sequences of six representative human herpesviruses. Although the number of amino acid comparisons within this region is limited, i.e., only 53 amino acids, preliminary assignment of many of the herpesvirus species to one of the three herpesvirus subfamilies has been possible (Figure 7 and Additional File 1). Values from the bootstrap analysis using 100 replicates are indicated for each branch point. While some of the branch points were not well defined due to the limited amount of sequence data, as indicated by bootstrap values less than 50, many groupings were well supported. The analysis clearly shows the grouping of different viral species from evolutionarily related hosts. This is consistent with previous studies which have shown extensive cospeciation of viral species and their host lineages.
Parameters for refinement of the "TGV-IYG" assay
Limiting degeneracy to increase sensitivity
While the "TGV-IYG" herpesvirus assay demonstrated the ability to detect disparate herpesvirus species in high titer virus cultures in vitro, the detection of limiting amounts of virus in tissue samples in vivo was problematic. This was especially true in sections obtained from formalin-fixed, paraffin-embedded tissue blocks which contained small amounts of degraded DNA. The degeneracy of the primer pool, i.e., the number of different primers necessary to encode all codon possibilities for the specified block of conserved amino acids, plays a direct role in the sensitivity of the PCR amplification. Whereas highly degenerate primers consisting of pools of hundreds or thousands of primers with different DNA sequences may allow amplification of DNA templates present in high copy number, as found in cultured virus stocks, they are less successful in amplifying low copy numbers of DNA templates found in virus infected tissues in vivo, especially in formalin-fixed tissue. As the degeneracy increases, the concentration of the primer or primers that will participate in the desired amplification reaction decreases and can become suboptimal. In addition, the vast excess of primers not participating in the amplification of the targeted gene can cause non-specific amplification which can, in turn, inhibit or mask the amplification of the desired target.
As indicated in Table 1, the degeneracy of the primers utilized in the "TGV-IYG" assay ranged from 48–1024. This level of degeneracy was driven by the number of nucleotide possibilities encoding the targeted amino acids at each position as well as by the number of amino acid positions allowed to be degenerate. Figure 5A shows the DFA/DFAS/QAHN sequence block produced by Block Maker from multiple alignments of 11 different herpesvirus polymerase sequences. Figure 5C shows the consensus amino acids at each position, as determined by the CODEHOP algorithm, which are boxed and bolded with the alternate amino acids positioned above. The original primer manually derived from this motif, "DFA", is in fact completely degenerate, with multiple codons provided for each amino acid position except the ultimate proline (P) residue, yielding a pool of 512 different primers. Because the performance of this primer was consistently suboptimal in samples with limiting template, the overall structure and degeneracy of the primer was altered by designing a PCR primer "DFASA" from the same sequence motif using the CODEHOP methodology. This primer had an 11 bp 5' consensus region and a 3' degenerate core containing multiple codons at 5 amino acid positions, resulting in a pool of 256 different primers (Figure 5C). The "DFASA" primer was successfully used to amplify extremely low amounts of viral DNA in a background of genomic DNA from paraffin-embedded formalin-fixed tissue in the discovery of the macaque homolog of Kaposi's sarcoma-associated herpesvirus, called retroperitoneal fibromatosis herpesvirus (RFHV). Subsequent estimates of virus copy number using real-time quantitative PCR indicated a level of RFHV DNA in the available samples that was 1/100–1/1000 of a single copy cellular gene (unpublished observations).
The "DFASA" primer has been successfully used to identify a number of novel alpha-, beta- and gammaherpesviruses in a wide variety of host organisms (see Additional File 1: "DFASA-GDTD1B assay").
Due to the presence of a highly conserved leucine (L) at block position 7 within the "DFAS" motif (Figure 5) which significantly increased the degeneracy of the primer pool with its six possible codons, an additional CODEHOP was designed from the "QAHN" motif immediately downstream of "DFAS" to further decrease degeneracy. The "QAHNA" primer had an 11 bp 5'consensus region and a 3' degenerate core containing multiple codons at 4 amino acid positions resulting in a pool of 48 different primers (Figure 5C). This CODEHOP has been successfully used to identify several primate rhadinoviruses related to KSHV in tissue samples with limiting amount of viral DNA [10, 19], see also Additional File 1.
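The effect of degeneracy on the working concentration of any single primer can be illustrated with simple arithmetic. The sketch below assumes, for illustration only, a total primer concentration of 1000 nM and an equimolar pool; neither value is taken from the assays described here. The pool sizes 512, 256 and 48 are those given above for "DFA", "DFASA" and "QAHNA". The short sequence passed to `pool_degeneracy` is a toy string, not one of the published primers.

```python
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pool_degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def per_primer_nM(total_nM, degeneracy):
    """Concentration of each individual sequence in an equimolar pool."""
    return total_nM / degeneracy


ASSUMED_TOTAL_NM = 1000.0  # hypothetical total primer concentration
for name, size in [("DFA", 512), ("DFASA", 256), ("QAHNA", 48)]:
    share = per_primer_nM(ASSUMED_TOTAL_NM, size)
    print(f"{name}: {size} primers, about {share:.2f} nM each")

print(pool_degeneracy("GAYTTYGCN"))  # toy degenerate sequence
```

Under these assumptions, moving from a pool of 512 to a pool of 48 raises the share of each individual sequence roughly tenfold, which is consistent with the improved sensitivity reported for the less degenerate primers.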
Primer bias and specificity
The primers developed for the "TGV-IYG" assay were designed to amplify polymerase fragments from herpesviruses of all three subfamilies based on conserved motifs within the known sequences. However, very few sequence motifs were absolutely conserved between the most divergent herpesviruses. For example, the catfish ictalurid herpesvirus (IHV) lacked the "KGV" motif from which the initial "KG1" primer was derived (Figure 6). Furthermore, numerous sequence differences were present in the IHV DNA polymerase within the DFAS/QAHN motif which was otherwise highly conserved in other herpesvirus species (highlighted residues in Fig. 5B). Because of these differences, the IHV sequence was excluded from the primer design of the "DFA", "DFASA" and "QAHNA" PCR primers. As shown in Figure 5C, the "DFA" and "DFASA" primers have mismatches with the IHV sequence at the alanine (A) and leucine (L) codons (Block positions 5 and 7, respectively; Figure 5B) and the "QAHNA" primer mismatches at three codon positions (Block positions 13–15; Figure 5B), all within the 3' degenerate cores. Figure 8 shows the presence of nucleotide mismatches with the IHV sequence throughout the different primers (black highlighting). Thus, the lack of the "KGV" motif and sequence differences in the "DFA" primer strongly biased the "TGV-IYG" assay against IHV-like herpesvirus sequences. In order to identify IHV-like herpesviruses, new primers would have to incorporate these sequence differences.
The "DFA" and "DFASA" primer pools were originally designed using only the alanine (A) codon at block position 5 in the "DFAS" motif and did not include the glutamine (Q) codon found in that position of the motif in HHV6 and HHV7, "DFQS" (highlighted, Figure 5A, B). The nucleotide mismatches in this region are shown in Figure 8. While the "DFA" and "DFASA" primers are biased by design against HHV6 and HHV7, they have been used successfully to detect betaherpesviruses related to HHV6 and HHV7. This suggests that mismatches 13–14 nucleotides from the 3' end of the primer do not have major effects on the utility of the primers, especially when viral template is not limiting.
More significant bias against HHV6- and HHV7-like herpesviruses was present in the "TGV" primer used in conjunction with the "IYG" primer in the secondary nested PCR reaction in the "TGV-IYG" assay (see Figure 2B). The "TGV" primer contains the partial valine (V) codon "GT" at its 3' end (Block position 11; Figure 3C). Since both HHV6 and HHV7 contain alanine (A) (codon = GCN) at this position (highlighted in Fig. 3A, B), the "TGV" primer would mismatch at the 3' terminal nucleotide with both HHV6- and HHV7-like sequences. This mismatch occurs at the 3' end of the "TGV" primer and is predicted to significantly impair polymerase extension. To remove this bias, the "TGV" primer was redesigned as the "VYGA" primer, removing the 3' terminal "GT" of the valine codon and the terminal degenerate position of the glycine (G) codon. The "TGV" primer contained an additional bias against amplification of HHV6-like sequences due to the use of only the phenylalanine (F) codons (TTY) (Block position 8) at a position encoding valine (V) in both HHV6 and HHV7 (highlighted in Figure 3A and 3B). To remove this bias, "VYGA" was designed to include both the valine (V) and phenylalanine (F) codons at this position. The total degeneracy of the "TGV" and "VYGA" primer pools remained the same, with 256 different primers, due to the loss of the degenerate codon position in the glycine (block position 10 in "TGV") and the gain of the degenerate codon positions in the valine (block position 8 in "VYGA").
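The difference between a tolerated internal mismatch and a damaging 3' terminal mismatch can be checked mechanically. The sketch below compares a sense-strand degenerate primer with an aligned target sequence of the same length and reports each mismatch as a distance from the 3' end, where 0 is the terminal base. The sequences are short toy strings chosen to mimic the GT versus GC situation described above; they are not the published "TGV" primer or any real viral sequence, and the function name is hypothetical.

```python
# IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_distances(primer, target):
    """Return the distances from the 3' end (0 = terminal base) at which
    the degenerate primer cannot match the aligned target sequence."""
    if len(primer) != len(target):
        raise ValueError("primer and target must be aligned to equal length")
    last = len(primer) - 1
    return sorted(
        last - i
        for i, (p, t) in enumerate(zip(primer.upper(), target.upper()))
        if t not in IUPAC[p]
    )


toy_primer = "TTYGGNGT"        # ends in the partial valine codon GT
valine_target = "TTTGGAGT"     # target with GTN (valine) at the 3' position
alanine_target = "TTTGGAGC"    # target with GCN (alanine) at the 3' position

print(mismatch_distances(toy_primer, valine_target))
print(mismatch_distances(toy_primer, alanine_target))
```

A result containing 0 flags a mismatch at the terminal nucleotide, the position where extension by the polymerase is expected to be most impaired, whereas distances of 13 or 14 correspond to the kind of internal mismatch that the "DFA" and "DFASA" primers tolerated.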
The subsequent cloning and sequence analysis of new herpesvirus DNA polymerases from the rhadinoviruses, rhesus rhadinovirus (RRV) and alcelaphine herpesvirus 1 (AlHV1) [20, 21], revealed mismatches with the downstream "IYG" primer of the "TGV-IYG" herpesvirus assay. The "IYG" primer (a reverse orientation primer) includes the codons (ATH) for isoleucine (I) at its 3' end (Block position 1; Figure 4C). Both RRV and AlHV1 contain a valine (V) codon (GTN) at this position (highlighted in Figure 4A). Thus, "IYG" is biased against RRV-like or AlHV1-like rhadinoviruses due to a T-C mismatch at the 3' end of the primer. To eliminate this bias, the "IYG" primer was redesigned as "GDTD1B" to remove the isoleucine position within the 3' degenerate core (Figure 4C) and, in addition, the length of the 5' consensus clamp was increased.
Decrease in size of the amplification products
Because typical tissue samples, especially paraffin-embedded formalin-fixed tissue, contain degraded DNA averaging 300–500 bp in length, we decided to decrease the maximal amplification product size of the herpesvirus assay. The initial amplification product of the "TGV-IYG" assay (DFA-KG1) was ~800 bp (Fig. 2B). To reduce the initial amplification product size, a hemi-nested PCR assay was developed in which the newly designed downstream anti-sense primer "GDTD1B" targeting the highly conserved "YGDT" motif was used in a primary PCR amplification with the new upstream primer "DFASA". This amplification yields a PCR product of approximately 500 bp (Figure 2B). This initial PCR product is then used as template in a secondary PCR amplification using the nested primer "VYGA" with the downstream anti-sense primer "GDTD1B". This amplification yields a PCR product of approximately 200 bp (see Figure 2B). These modifications produce amplification products close to the average size of degraded DNA present in fixed tissue.
The "DFASA/QAHNA-GDTD1B" herpesvirus assay: a refinement of the "TGV-IYG" assay
We have developed a refined herpesvirus assay using the optimized DNA polymerase CODEHOP PCR primers discussed above. This assay was designed to use only three CODEHOPs in a hemi-nested PCR assay in which "DFASA" and "GDTD1B" are used in an initial PCR amplification (Figure 2B). The product from that amplification is used as template in a secondary amplification with "VYGA" and the original anti-sense primer "GDTD1B". A variation of this assay uses "QAHNA" in place of "DFASA". Thus, the amplification of novel polymerase sequences required the conservation of only three motifs, rather than five in the original "TGV-IYG" assay. Using these assays, we have identified three novel homologs of the newly characterized human herpesvirus, KSHV, in two species of macaques (see Table 1, RFHVMn, RFHVMm and MneRV2). Phylogenetic analysis of the molecular sequences obtained from these studies provided strong evidence for the existence of two distinct lineages of γ2 rhadinoviruses related to KSHV, called rhadinovirus-1 (RV1) and rhadinovirus-2 (RV2) (Figure 9). Subsequent studies by others using this assay have identified the presence of additional members of these two lineages in other Old World primates, including African green monkeys, mandrills, chimpanzees [23, 24] and gorillas (see Additional File 1). These data predict the existence of another human herpesvirus closely related to KSHV belonging to the RV2 lineage of rhadinoviruses.
The utility of the "DFASA/QAHNA-GDTD1B" assays has been demonstrated by these and other studies in which more than 19 novel herpesviruses from the alpha, beta and gamma subfamilies of a wide variety of host species have been identified and molecularly characterized using CODEHOPs (Tables 2 and 3). Comparison of the amino acid sequences encoded between the "DFAS" and "IYG/GDTD" motifs has allowed the phylogenetic comparison of the different herpesvirus species from which these sequences were obtained. Figure 9 shows a phylogenetic tree resulting from the analysis of the sequences obtained from the "DFA-IYG" and "DFASA/QAHNA-GDTD1B" assays and the corresponding sequences of six representative human herpesviruses. Multiple sequence alignments of the viral sequences were performed and the positions containing gaps were eliminated, leaving 142 amino acid positions for comparison. These sequences were analyzed using protein distances and neighbor-joining analysis implemented in the Phylip analysis package. As shown in Figure 9, most of the different viral species could be unambiguously included within one of the three herpesvirus subfamilies, as indicated by the high bootstrap scores obtained for most of the branch points. However, the positioning of the branch points for certain viral species could not be reliably determined using the available sequence information. Such uncertainty has been seen in similar analysis of specific herpesvirus species using much larger data sets. The results obtained using the 142 amino acid comparisons confirmed and extended the phylogenetic relationships predicted from the "TGV-IYG" results derived from only 53 amino acid comparisons. Furthermore, the phylogenetic relationships predicted by the different CODEHOP assays have been subsequently confirmed when substantially more sequence information was obtained from the new viral species, see [10, 11].
The phylogenetic relationships shown in Figure 9 are consistent with the findings that extensive cospeciation of viral species and their host lineages has occurred during evolution. The wide variety of different herpesvirus species identified using the CODEHOP assays targeting the DNA polymerase gene, as shown in Figures 7 and 9, indicates the wide applicability of the CODEHOP assays to detect herpesviruses from disparate host lineages.
The "SLYP1A-GDTD1B" herpesvirus assay: a general herpesvirus detection assay
We designed additional primers from the DFAS/QAHN sequence motif using the CODEHOP strategy to develop further assays to detect new herpesviruses. The primer "SLYP1A" was one such primer designed to eliminate bias in the 3' degenerate core of "DFA" and "DFASA" primers against HHV6 and HHV7, described above. The "SLYP1A" primer overlaps the "DFA" and "DFASA" primers and extends further downstream in a region very well conserved across the different herpesvirus species including HHV6 and HHV7 (Block positions 8–12; Figure 5C) . Primer design across this region was based on the similarities in the first two positions for the codons for isoleucine (I) – (ATA, ATC, ATT) and methionine (M) – (ATG). These two amino acids are conserved in two positions within this sequence block in all herpesvirus species, including IHV (Block positions 11,12; Figure 5) and provide the penultimate and ultimate 3' codons for the primer. Also, the SLYP1A primer was designed with only one of the two codon types utilized for serine (S) – (AGY) to minimize degeneracy in the 3' degenerate core (Block position 10; Figure 5C). Serine at this position (Block position 10; Figure 8) is encoded by AGY-type codons in all herpesvirus species, except for CMV-like herpesviruses which use TCN-type codons and EHV2 which contains a codon for threonine. A second related primer, SLYP2A was also designed from this region with an identical sequence except that the other serine codons (TCN) were used in the third position. Although this primer was biased for CMV-like sequences, we have successfully amplified KSHV which contains an AGT codon (unpublished results).
We have previously used "SLYP1A" and "GDTD1B" to identify a new herpesvirus related to RRV, called Macaca nemestrina rhadinovirus-2 (MneRV2), in spleen tissue. We subsequently used this assay to screen for herpesviruses in lymphomas from two rhesus macaques, L758 and 881, from the Tulane Regional Primate Research Center. DNA was kindly provided by LS Levy. Strong PCR products were obtained in primary amplification reactions and were cloned and sequenced. The lymphoma from rhesus 881 yielded clones containing a single sequence which was highly related to human EBV. From the lymphoma from rhesus L758, we obtained two distinct EBV-like sequences, one of which was identical to the first lymphoma sequence and the other of which contained 10 nucleotide differences across the 475 bp fragment (98% identity). Analysis of the encoded amino acids revealed 3 amino acid differences (98% identity) between the two rhesus EBV-like sequences (MmuLCV1 and MmuLCV2) (Figure 10). These sequences clustered closely with human EBV in the γ1 branch of the phylogenetic tree shown in Figure 9. The identification of DNA polymerases from two types of EBV-like lymphocryptoviruses corroborates previous reports of the existence of two closely related lymphocryptoviruses in rhesus macaques identified by sequence comparison of two distinct EBNA-2 genes. This is similar to the situation in humans where two different EBV species, EBV1 and EBV2, have been identified.
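The nucleotide identity quoted above follows directly from the number of differences and the fragment length. The sketch below reproduces that arithmetic and also shows how the same value would be computed from two aligned sequences; the function names are illustrative and the short strings in the second call are toy data.

```python
def identity_from_counts(length, differences):
    """Percent identity given an aligned length and a count of differences."""
    return 100.0 * (length - differences) / length


def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# 10 nucleotide differences across the 475 bp fragment.
print(f"{identity_from_counts(475, 10):.1f}% nucleotide identity")

# Toy aligned sequences differing at one of four positions.
print(percent_identity("ACGT", "ACGA"))
```

For the 475 bp fragment this gives just under 98%, in agreement with the rounded value reported for the two rhesus EBV-like sequences.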
Using the CODEHOP strategy to determine the complete sequence of novel viral genes
The CODEHOP assays described above targeted a restricted region of one gene and only provided limited sequence information. We have also used CODEHOPs to obtain the complete sequence of targeted genes and identify flanking genes within the unknown viral genome. To obtain the complete sequences of the DNA polymerase genes of the newly identified herpesvirus species of macaques, RFHVMn and RFHVMm, we designed CODEHOP PCR primers from additional conserved sequence blocks within the DNA polymerase (Figure 11 and Table 4). The new DNA polymerase-derived CODEHOP PCR primers, "CVNVA" and "YFDKB", were used in conjunction with gene-specific primers derived from within the sequence of the original CODEHOP PCR product "DFASA-GDTD1B" to obtain overlapping PCR products across the majority of the DNA polymerase gene. In all gammaherpesviruses, the DNA polymerase gene (ORF 9) is flanked upstream by ORF 8, the glycoprotein B, the most highly conserved glycoprotein in herpesviruses, and downstream by ORF 10, a gene of unknown function conserved within the gammaherpesviruses (Figure 11). CODEHOPs were designed from conserved sequence blocks present in ORF 8 ("FREYA" and "GGMA") and in ORF 10 ("GDWE2B") (Table 4). Using a combination of gene-specific primers derived from the DNA polymerase sequence obtained above and the new CODEHOPs derived from flanking regions, overlapping PCR products spanning 331 bp of the glycoprotein B genes, 3,039 bp of the DNA polymerase genes, and 27 bp of the ORF 10 gene homolog were obtained for RFHVMn and RFHVMm.
Using the CODEHOP strategy to characterize genomic regions within novel viral genomes
Often the linear order of genes within the genomes of related viruses is maintained. Thus, the spacing and orientation of specific genes can be predicted in the genomes of related novel viruses. CODEHOP PCR primers can be utilized to obtain sequences within conserved genes which flank a targeted genomic region. Gene-specific PCR primers derived from these sequences can then be used in long-range PCR to obtain the sequence of the entire genomic region between the flanking genes. We have utilized this approach to clone and characterize a portion of the divergent locus B of the genome of the macaque rhadinovirus, RFHVMn. Divergent locus B was identified in KSHV and other rhadinoviruses and contains a number of viral homologs of cellular genes that have been captured during virus evolution. Part of the divergent locus B of KSHV extends upstream of the ORF 9 DNA polymerase gene to a viral homolog of the thymidylate synthase (TS) gene situated approximately 4 kb away (Figure 12A). TS is a cellular gene and a non-functional pseudogene is present in humans. Viral TS homologs are well conserved and are found in several herpesvirus species, including KSHV, VZV, EHV2, HVS and AtHV3. To characterize the putative divergent locus B between the DNA polymerase and TS genes of RFHVMn, we targeted the TS gene for PCR amplification using the CODEHOP approach.
Two conserved blocks of amino acids within the TS gene family, containing 10 and 11 identical amino acids, were chosen as candidates for CODEHOP design. The 10 amino acid "RHFG" upstream motif (Fig. 13) is completely conserved between the viral sequences, the human sequence and the human TS pseudogene. The 11 amino acid "DMGL" downstream motif (Fig. 13), while completely conserved between the viral and human sequences, is not present in the cellular TS pseudogene (data not shown). Since the two motifs in the cellular TS gene are separated from each other by a large intron, CODEHOP PCR amplification of a sample containing a mixture of viral and cellular DNA should produce only a virus-specific ~280 bp PCR product (Fig. 12B).
The design of the "DMGLB" CODEHOP from the conserved "DMGL" motif is shown in Figure 14. This primer was designed before the CODEHOP prediction program was available. Because RFHVMn is closely related to the gammaherpesvirus KSHV, the "DMGLB" CODEHOP was biased towards gammaherpesviruses, in particular KSHV-like herpesviruses, in order to target the RFHV genomes. In Figure 14, the nucleotide sequences encoding the "DMGL" motif from the TS genes of KSHV, HVS and EHV2 are aligned with the encoded amino acid sequence. Because "DMGL" was the downstream motif, the "DMGLB" CODEHOP was designed to be antisense; however, the complementary sequence of the primer is shown so that the codons can be identified (Figure 14). The degenerate core of the CODEHOP spans the codons for the aspartic acid (D), methionine (M), glycine (G), and leucine (L) of the motif, and is indicated in lower case letters in Figure 14B. The degenerate core provides all possible codons for these four conserved amino acids and thus has no bias. However, the nucleotides within the consensus region, shown in capital letters, were chosen at each codon position to be similar to the sequence of KSHV (highlighted in Figure 14A), thus biasing the primer towards KSHV-like sequences.
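The idea of an unbiased degenerate core can be illustrated with a short Python sketch. It enumerates every codon combination that encodes the four core residues D, M, G and L under the standard genetic code, then takes the reverse complement, as would be needed for an antisense primer. The function names are illustrative assumptions and are not part of the CODEHOP software; the sketch also enumerates full codons, whereas a real primer may end before the last wobble position, so the count below is not a statement about the degeneracy of "DMGLB" itself.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons that encode a one-letter amino acid, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def core_sequences(motif):
    """Return every nucleotide sequence that encodes the given motif."""
    return ["".join(p) for p in product(*(codons_for(aa) for aa in motif))]


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sense-strand possibilities for the four core residues of the motif.
core = core_sequences("DMGL")
# Antisense versions, as would be synthesized for a downstream primer.
antisense_core = [reverse_complement(s) for s in core]
print(len(core), "possible core sequences")
```

With two codons for D, one for M, four for G and six for L, the enumeration covers 2 × 1 × 4 × 6 = 48 sequences. The consensus clamp, by contrast, is a single fixed sequence, which is where any bias towards a particular virus is introduced.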
The TS-targeted CODEHOPs "DMGLB" and "RHFGA" (see Table 5) were used in PCR amplification reactions with DNA isolated from retroperitoneal fibromatosis (RF) tumor tissue of a pig-tailed macaque, Macaca nemestrina, as described previously. A PCR product of the predicted size (280 bp) was obtained, cloned and sequenced (Fig. 12B). The sequence was 68% identical to the KSHV TS sequence and 64% identical to the TS sequence of RRV, a more distantly related gammaherpesvirus. A TS-specific primer, TSR1LR, derived from this sequence and a DNA polymerase-specific primer, PolF1LR, were chosen to amplify the region between the DNA polymerase and TS genes of RFHV (Table 5 and Figure 12B). Long-range PCR amplification produced a PCR product of ~4.1 kb, which was sequenced. The linear order and sequence of 5 novel genes present in the divergent locus B of the RFHVMn virus were obtained (Figure 12C). Although region B of RFHV lacked a homolog of KSHV ORF 11, homologs of all the other KSHV genes in this region were present and in the same order within the genome.
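Identity values such as those quoted above are derived from pairwise alignments. The following Python sketch shows one common way to compute percent identity from two pre-aligned sequences. It rests on stated assumptions: the text does not specify how the 68% and 64% values were calculated (for example, how gaps were treated), and the sequences used here are hypothetical toy strings, not TS sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned positions where neither sequence has a gap.

    Assumption: gap-containing columns are excluded from the denominator.
    Other conventions exist and give slightly different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 7 of 8 ungapped positions match.
print(percent_identity("ATGCATGC", "ATGAATGC"))
```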
CODEHOP-mediated PCR – a general approach to identify novel viral genes
In the previous sections of this review, the CODEHOP assays and PCR primers that we have used to identify and characterize novel herpesvirus genes and genomes have been discussed in detail. However, CODEHOP-mediated PCR can also be used to target conserved genes from other virus families. A general flowchart detailing the specific steps involved in the CODEHOP procedure to identify novel viral genes is shown in Figure 15. This procedure is based on the CODEHOP prediction software that we have previously developed and made accessible over the internet as part of the BLOCKS database. An example of this procedure is provided below, in which CODEHOP PCR primers targeting the "DMGL" motif of herpesvirus TS genes (introduced above) are designed using the web-based software.
Using the web-based software to design CODEHOP PCR primers to a conserved viral gene
The amino acid sequences of the TS genes from five herpesviruses were obtained using BLAST analysis of the NCBI protein database with the KSHV TS sequence as probe. The TS sequences from KSHV, VZV, EHV2, HVS and AtHV3 (Figure 16) were provided as input to ClustalW and a multiple alignment was obtained. As shown in Figure 13, several regions of highly conserved sequence were present in the TS sequence alignment, and the positions of the "RHFG" and "DMGL" motifs targeted above are indicated. In order to predict CODEHOP PCR primers, the sequences of the TS genes were provided as input to the BlockMaker program of the Blocks Database and a series of conserved sequence blocks was identified (e.g., Gibbs Blocks, Figure 17). Alternatively, the ClustalW alignment itself could be provided as input to the "Multiple alignment processor" of the Blocks Database. In order to compare a computer-predicted CODEHOP with the manually derived CODEHOP (DMGLB), the TS Block_E containing the "DMGL" motif (Figure 17) was input directly to the CODEHOP program using all default values, except that the consensus region was elongated by increasing the temperature setting from the default 60°C to 70°C. The primers predicted from the complement of Block_E were examined in order to obtain a primer from the complementary strand which could be used in conjunction with the upstream TS primer RHFGA, described above. The underlined primer targeting the "DMGL" motif was chosen and named DMGLXB (Figure 18) and was compared with the manually designed DMGLB primer in Figure 14. Whereas "DMGLB" was purposefully biased by using the KSHV sequences in the 5' consensus clamp, "DMGLXB" is "unbiased" in design, with the 5' consensus sequence derived from the most frequently used codons in the human genome. The DMGLXB sequence was examined for potential stem-loop structures that could compromise the function of the primer.
As shown in Figure 14, a putative stem-loop structure was identified, which is indicated by the underlined nucleotides in Figures 14B and 14C. To destabilize this structure, the proline codon within the "DMGLGVP" motif was changed from the computer-predicted "CCC", the most frequently used codon in humans, to "CCA", another common human codon, as shown in Figure 14. This yielded a revised CODEHOP, called "DMGLX1B" (shown as the complementary sequence in Figures 14B and 14C), in which the stem-loop structure was destabilized by substituting an A for the highlighted C in Figure 14C. The DMGLX1B antisense primer could then be used in combination with the RHFGA sense primer to amplify unknown TS genes.
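A simple screen for this kind of structure can be sketched in Python. The function below looks for a short stretch of a primer whose reverse complement occurs further downstream, separated by a minimum loop length, which is the pattern that allows a hairpin to form. This is only a minimal illustration under stated assumptions: the text does not say how the stem-loop in DMGLXB was detected, the `find_stems` function and its thresholds are hypothetical, and the two test sequences are invented toy primers rather than DMGLXB or DMGLX1B. A thermodynamic folding tool would be the more rigorous choice in practice.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_stems(primer, min_stem=4, min_loop=3):
    """Find potential hairpin stems within a primer.

    Returns a list of (left_start, right_start, left_arm) tuples, where the
    right arm is the reverse complement of the left arm and the two arms are
    separated by at least `min_loop` unpaired bases.
    """
    hits = []
    n = len(primer)
    for i in range(n - 2 * min_stem - min_loop + 1):
        left = primer[i:i + min_stem]
        target = reverse_complement(left)
        for j in range(i + min_stem + min_loop, n - min_stem + 1):
            if primer[j:j + min_stem] == target:
                hits.append((i, j, left))
    return hits


# Hypothetical toy primer containing a GGGC ... GCCC inverted repeat.
original = "AAGGGCTTTTGCCCAA"
# The same toy primer after a single C-to-A substitution in the right arm.
revised = "AAGGGCTTTTGCACAA"

print(find_stems(original))
print(find_stems(revised))
```

In the toy example, the single base change removes the only complementary arm, which mirrors the reasoning behind exchanging one synonymous codon for another: the encoded amino acid is unchanged while the self-complementarity is broken.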
Other examples of CODEHOP PCR primers designed from multiple alignments of the herpesvirus DNA polymerase sequences using the Web-based CODEHOP software are shown in Figures 3, 4, 5 and 6. The VYG1A primer designed from the conserved VYG motif shown in Figure 3 is aligned with the original manually designed "TGV" and "VYGA" primers. The computer-predicted "YGDTB" primer designed from the conserved GDTD motif is aligned with the original "IYG" and "GDTD1B" primers (Figure 4). In the prediction of this primer, the conserved sequence block identified by BlockMaker from the sequences shown in Figure 4A extended only from amino acid position 1 to 10. The CODEHOP software indicated that additional nucleotides had to be added to the 5' end of the "YGDTB" primer to reach the minimal length for the 5' consensus region. The amino acid sequences at block positions 11–13 were therefore obtained manually and compared in order to derive the eight terminal nucleotides of "YGDTB" (overlined in Figure 4C).
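The notion of a conserved sequence block with defined limits can be illustrated with a deliberately simplified Python sketch that reports runs of alignment columns in which every sequence carries the same residue. This is not the BlockMaker algorithm, which also accepts blocks containing conservative variation; the `conserved_blocks` function, the `min_length` threshold and the three toy sequences are assumptions made purely for illustration, with only the "DMGLGVP" motif taken from the text.

```python
def conserved_blocks(aligned_seqs, min_length=4):
    """Return (start_column, residues) for each run of fully identical columns.

    Columns containing a gap are treated as non-conserved, and runs shorter
    than `min_length` are discarded.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("expected a non-empty set of equal-length aligned sequences")
    reference = aligned_seqs[0]
    length = len(reference)
    blocks = []
    start = None
    # Iterate one column past the end so that a run reaching the last
    # column is closed and reported.
    for col in range(length + 1):
        conserved = (
            col < length
            and reference[col] != "-"
            and all(s[col] == reference[col] for s in aligned_seqs)
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                blocks.append((start, reference[start:col]))
            start = None
    return blocks


# Hypothetical toy alignment built around the DMGLGVP motif.
toy_alignment = [
    "AKDMGLGVPFNT",
    "SRDMGLGVPYNS",
    "A-DMGLGVPFNA",
]
print(conserved_blocks(toy_alignment))
```

A block that ends where conservation ends, as in this toy case, shows why a primer's consensus region can run out of block positions and need extending by hand from the flanking sequence.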
In this review, the utility of CODEHOP-mediated PCR for the identification of novel viruses and the characterization of new viral genes and genomic regions is presented. While the focus of this review was on the herpesvirus family, other virus families can be targeted using analogous approaches. We have previously developed successful CODEHOP assays targeting the reverse transcriptase genes of retroviruses and lentiviruses [2, 6]. Recently, the CODEHOP strategy has been used to develop assays that detect novel papillomaviruses by targeting the highly conserved L1 protein. With the CODEHOP strategy, molecular sequence data can be readily obtained for comprehensive virus phylogenies and the tracing of evolutionary pathways. Furthermore, comparison of multiple representatives of homologous viral proteins can be important for understanding protein structure and function and can provide insight into virus-host relationships.
CODEHOP: consensus-degenerate hybrid oligonucleotide primer
PCR: polymerase chain reaction
RFHV: retroperitoneal fibromatosis herpesvirus
KSHV: Kaposi's sarcoma-associated herpesvirus
Kaaden OR, Eichhorn W, Essbauer S: Recent developments in the epidemiology of virus diseases. J Vet Med B Infect Dis Vet Public Health 2002,49(1):3-6.
Rose TM, Schultz ER, Henikoff JG, Pietrokovski S, McCallum CM, Henikoff S: Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res 1998,26(7):1628-1635. 10.1093/nar/26.7.1628
CODEHOPs: Consensus-Degenerate Hybrid Oligonucleotide Primers[http://blocks.fhcrc.org/blocks/codehop.html]
Blocks WWW Server[http://blocks.fhcrc.org/]
Wilson CA, Wong S, Muller J, Davidson CE, Rose TM, Burd P: Type C retrovirus released from porcine primary peripheral blood mononuclear cells infects human cells. J Virol 1998,72(4):3082-3087.
Osterhaus AD, Pedersen N, van Amerongen G, Frankenhuis MT, Marthas M, Reay E, Rose TM, Pamungkas J, Bosch ML: Isolation and partial characterization of a lentivirus from talapoin monkeys (Myopithecus talapoin). Virology 1999,260(1):116-124. 10.1006/viro.1999.9794
VanDevanter DR, Warrener P, Bennett L, Schultz ER, Coulter S, Garber RL, Rose TM: Detection and analysis of diverse herpesviral species by consensus primer PCR. J Clin Microbiol 1996,34(7):1666-1671.
Rose TM, Strand KB, Schultz ER, Schaefer G, Rankin GWJ, Thouless ME, Tsai CC, Bosch ML: Identification of two homologs of the Kaposi's sarcoma-associated herpesvirus (human herpesvirus 8) in retroperitoneal fibromatosis of different macaque species. J Virol 1997,71(5):4138-4144.
Schultz ER, Rankin GWJ, Blanc MP, Raden BW, Tsai CC, Rose TM: Characterization of two divergent lineages of macaque rhadinoviruses related to Kaposi's sarcoma-associated herpesvirus. J Virol 2000,74(10):4919-4928. 10.1128/JVI.74.10.4919-4928.2000
Rose TM, Ryan JT, Schultz ER, Raden BW, Tsai CC: Analysis of 4.3 Kb of the divergent locus-B of macaque retroperitoneal fibromatosis-associated herpesvirus (RFHV) reveals close similarity to Kaposi's sarcoma-associated herpesvirus (KSHV) in gene sequence and genome organization. J Virol 2003,77(9):5084-5097. 10.1128/JVI.77.9.5084-5097.2003
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003,31(13):3497-3500. 10.1093/nar/gkg500
Rose TM, Henikoff JG, Henikoff S: CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design. Nucleic Acids Res 2003,31(13):3763-3766. 10.1093/nar/gkg524
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990,215(3):403-410. 10.1006/jmbi.1990.9999
Wang J, Sattar AK, Wang CC, Karam JD, Konigsberg WH, Steitz TA: Crystal structure of a pol alpha family replication DNA polymerase from bacteriophage RB69. Cell 1997,89(7):1087-1099. 10.1016/S0092-8674(00)80296-2
Ehlers B, Borchers K, Grund C, Frolich K, Ludwig H, Buhk HJ: Detection of new DNA polymerase genes of known and potentially novel herpesviruses by PCR with degenerate and deoxyinosine-substituted primers. Virus Genes 1999,18(3):211-220. 10.1023/A:1008064118057
McGeoch DJ, Dolan A, Ralph AC: Toward a comprehensive phylogeny for mammalian and avian herpesviruses. J Virol 2000,74(22):10401-10406. 10.1128/JVI.74.22.10401-10406.2000
Greensill J, Sheldon JA, Renwick NM, Beer BE, Norley S, Goudsmit J, Schulz TF: Two distinct gamma-2 herpesviruses in African green monkeys: a second gamma-2 herpesvirus lineage among old world primates? J Virol 2000,74(3):1572-1577. 10.1128/JVI.74.3.1572-1577.2000
Desrosiers RC, Sasseville VG, Czajak SC, Zhang X, Mansfield KG, Kaur A, Johnson RP, Lackner AA, Jung JU: A herpesvirus of rhesus monkeys related to the human Kaposi's sarcoma-associated herpesvirus. J Virol 1997,71(12):9764-9769.
Ensser A, Pflanz R, Fleckenstein B: Primary structure of the alcelaphine herpesvirus 1 genome. J Virol 1997,71(9):6517-6525.
Lacoste V, Mauclere P, Dubreuil G, Lewis J, Georges-Courbot MC, Rigoulet J, Petit T, Gessain A: Simian Homologues of Human Gamma-2 and Betaherpesviruses in Mandrill and Drill Monkeys. J Virol 2000,74(24):11993-11999. 10.1128/JVI.74.24.11993-11999.2000
Greensill J, Schulz TF: Rhadinoviruses (gamma2-herpesviruses) of Old World primates: models for KSHV/HHV8-associated disease? Aids 2000,14(Suppl 3):S11-9.
Lacoste V, Mauclere P, Dubreuil G, Lewis J, Georges-Courbot MC, Gessain A: KSHV-like herpesviruses in chimps and gorillas. Nature 2000,407(6801):151-152. 10.1038/35025145
Cho YG, Gordadze AV, Ling PD, Wang F: Evolution of two types of rhesus lymphocryptovirus similar to type 1 and type 2 Epstein-Barr virus. J Virol 1999,73(11):9206-9212.
Dambaugh T, Hennessy K, Chamnankit L, Kieff E: U2 region of Epstein-Barr virus DNA may encode Epstein-Barr nuclear antigen 2. Proc Natl Acad Sci U S A 1984,81(23):7632-7636.
Nicholas J, Ruvolo VR, Burns WH, Sandford G, Wan X, Ciufo D, Hendrickson SB, Guo HG, Hayward GS, Reitz MS: Kaposi's sarcoma-associated human herpesvirus-8 encodes homologues of macrophage inflammatory protein-1 and interleukin-6. Nat Med 1997,3(3):287-292. 10.1038/nm0397-287
Multiple Alignment Processor[http://blocks.fhcrc.org/blocks/process_blocks.html]
Baines JE, McGovern RM, Persing D, Gostout BS: Consensus-degenerate hybrid oligonucleotide primers (CODEHOP) for the detection of novel papillomaviruses and their application to esophageal and tonsillar carcinomas. J Virol Methods 2005,123(1):81-87. 10.1016/j.jviromet.2004.08.020
Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990,18(20):6097-6100.
Henikoff S, Henikoff JG, Alford WJ, Pietrokovski S: Automated construction and graphical presentation of protein blocks from unaligned sequences. Gene 1995,163(2):GC17-26. 10.1016/0378-1119(95)00486-P
Schatzl H, Tschikobava M, Rose D, Voevodin A, Nitschko H, Sieger E, Busch U, von der Helm K, Lapin B: The Sukhumi primate monkey model for viral lymphomogenesis: high incidence of lymphomas with presence of STLV-I and EBV-like virus. Leukemia 1993,7(Suppl 2):S86-92.
Jenson HB, Ench Y, Zhang Y, Gao SJ, Arrand JR, Mackett M: Characterization of an Epstein-Barr virus-related gammaherpesvirus from common marmoset (Callithrix jacchus). J Gen Virol 2002,83(Pt 7):1621-1633.
Russo JJ, Bohenzky RA, Chien MC, Chen J, Yan M, Maddalena D, Parry JP, Peruzzi D, Edelman IS, Chang Y, Moore PS: Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8). Proc Natl Acad Sci U S A 1996,93(25):14862-14867. 10.1073/pnas.93.25.14862
Rovnak J, Quackenbush SL, Reyes RA, Baines JD, Parrish CR, Casey JW: Detection of a novel bovine lymphotropic herpesvirus. J Virol 1998,72(5):4237-4242.
Lackovich JK, Brown DR, Homer BL, Garber RL, Mader DR, Moretti RH, Patterson AD, Herbst LH, Oros J, Jacobson ER, Curry SS, Klein PA: Association of herpesvirus with fibropapillomatosis of the green turtle Chelonia mydas and the loggerhead turtle Caretta caretta in Florida. Dis Aquat Organ 1999,37(2):89-97.
Quackenbush SL, Work TM, Balazs GH, Casey RN, Rovnak J, Chaves A, duToit L, Baines JD, Parrish CR, Bowser PR, Casey JW: Three closely related herpesviruses are associated with fibropapillomatosis in marine turtles. Virology 1998,246(2):392-399. 10.1006/viro.1998.9207
Richman LK, Montali RJ, Garber RL, Kennedy MA, Lehnhardt J, Hildebrandt T, Schmitt D, Hardy D, Alcendor DJ, Hayward GS: Novel endotheliotropic herpesviruses fatal for Asian and African elephants. Science 1999,283(5405):1171-1176. 10.1126/science.283.5405.1171
Ehlers B, Ulrich S, Goltz M: Detection of two novel porcine herpesviruses with high similarity to gammaherpesviruses. J Gen Virol 1999,80(Pt 4):971-978.
Li H, Dyer N, Keller J, Crawford TB: Newly recognized herpesvirus causing malignant catarrhal fever in white-tailed deer (Odocoileus virginianus). J Clin Microbiol 2000,38(4):1313-1318.
Strand K, Harper E, Thormahlen S, Thouless ME, Tsai C, Rose T, Bosch ML: Two distinct lineages of macaque gamma herpesviruses related to the Kaposi's sarcoma associated herpesvirus. J Clin Virol 2000,16(3):253-269. 10.1016/S1386-6532(99)00080-3
The author would like to thank Emily Schultz, Greg Bruce, Lin Bennet, Brian Raden, Jon Ryan, and Kurt Strand for their help in developing the CODEHOP PCR strategy, Jorja and Steve Henikoff, of the Fred Hutchinson Cancer Research Center, for the creation and maintenance of the CODEHOP software and website, and Jeannette Stahli for editing advice.
The author(s) declare that they have no competing interests.
Design, conception and preparation of the manuscript (TMR).
About this article
Cite this article
Rose, T.M. CODEHOP-mediated PCR – A powerful technique for the identification and characterization of viral genomes. Virol J 2, 20 (2005). https://doi.org/10.1186/1743-422X-2-20
- Viral Species
- Block Position
- Conserve Sequence Block
- Amino Acid Sequence Motif
- Degenerate Core