Genome-wide analysis of host-chromosome binding sites for Epstein-Barr Virus Nuclear Antigen 1 (EBNA1)
© Lu et al. 2010
Received: 1 July 2010
Accepted: 7 October 2010
Published: 7 October 2010
The Epstein-Barr Virus (EBV) Nuclear Antigen 1 (EBNA1) protein is required for the establishment of EBV latent infection in proliferating B-lymphocytes. EBNA1 is a multifunctional DNA-binding protein that stimulates DNA replication at the viral origin of plasmid replication (OriP), regulates transcription of viral and cellular genes, and tethers the viral episome to the cellular chromosome. EBNA1 also provides a survival function to B-lymphocytes, potentially through its ability to alter cellular gene expression. To better understand these various functions of EBNA1, we performed a genome-wide analysis of the viral and cellular DNA sites associated with EBNA1 protein in a latently infected Burkitt lymphoma B-cell line. Chromatin immunoprecipitation (ChIP) combined with massively parallel deep sequencing (ChIP-Seq) was used to identify cellular sites bound by EBNA1. Sites identified by ChIP-Seq were validated by conventional ChIP and real-time PCR, and ChIP-Seq provided quantitative, high-resolution detection of the known EBNA1 binding sites on the EBV genome at OriP and Qp. We identified at least one cluster of unusually high-affinity EBNA1 binding sites on chromosome 11, between the divergent FAM55D and FAM55B genes. The consensus derived from all cellular EBNA1 binding sites is distinct from those derived from the known viral binding sites, suggesting that some of these sites are bound indirectly by EBNA1. EBNA1 also bound close to the transcriptional start sites of a large number of cellular genes, including HDAC3, CDC7, and MAP3K1, which we show are positively regulated by EBNA1. EBNA1 binding sites were enriched in some repetitive elements, especially LINE 1 retrotransposons, and correlated only weakly with histone modifications and ORC binding. We conclude that EBNA1 can interact with a large number of cellular genes and chromosomal loci in latently infected cells, but that these sites are likely to represent a complex ensemble of direct and indirect EBNA1 binding sites.
Epstein-Barr virus (EBV) is a human lymphotropic gammaherpesvirus associated with a spectrum of lymphoid and epithelial cell malignancies, including Burkitt's lymphoma, Hodgkin's disease, nasopharyngeal carcinoma, and post-transplant lymphoproliferative disease (reviewed in [1, 2]). EBV establishes a long-term latent infection in human B-lymphocytes, where it persists as a multicopy episome that may periodically reactivate and produce progeny virus. During latency the EBV genome expresses a limited number of viral genes that are required for viral genome maintenance and host-cell survival. The viral gene expression pattern during latency can vary depending on the cell type and its proliferative capacity (reviewed in [3, 4]). Among the latency genes, EBNA1 is the most consistently expressed across all forms of latency and in viral-associated tumors. EBNA1 is required for the establishment of episomal latent infection and for the long-term survival of latently infected cells.
EBNA1 is a nuclear phosphoprotein that binds with high affinity to three major DNA sites within the EBV genome (reviewed in ). At OriP, EBNA1 binds to each of the 30 bp elements of the family of repeats (FR) and to four 18 bp sequences within the dyad symmetry (DS) element. EBNA1 binding to OriP is essential for plasmid DNA replication and episome maintenance, and OriP-bound EBNA1 can also function as a transcriptional enhancer of the C promoter (Cp) [7, 8]. At the Q promoter (Qp), EBNA1 binds to two 18 bp sequences immediately downstream of the transcriptional start site and functions as an inhibitor of transcription initiation and mRNA accumulation . EBNA1 binds directly to DNA through its C-terminal DNA binding domain [5, 10]. The structure of the EBNA1 DNA binding domain has been solved by X-ray crystallography and shows structural similarity to the DNA binding domain of the papillomavirus E2 protein [11, 12]. In addition to direct DNA binding through its C-terminal domain, EBNA1 tethers the EBV genome to metaphase chromosomes through its amino-terminal domain [13, 14]. The precise chromosomal sites, proteins, or structures to which EBNA1 attaches during metaphase are not completely understood [14–16].
Recent studies have revealed that EBNA1 can bind to and regulate numerous cellular gene promoters [17, 18]. Others have found that ectopic expression of EBNA1 in EBV-negative Burkitt lymphoma cell lines regulates cellular phenotypes such as genomic instability, as well as genes associated with genomic instability . Overexpression of the EBNA1 DNA binding domain, which functions as a dominant negative in EBV-infected cells, can inhibit cell viability in uninfected cells, suggesting that EBNA1 binds to and regulates cellular genes important for cell survival . In more recent studies, EBNA1 binding was examined at a subset of cellular sites using predicted promoter arrays. However, EBNA1 is likely to bind other regions of the cellular chromosome that may be important for long-distance enhancer-promoter interactions, as well as for the regulation of chromatin structure and DNA replication. To explore these additional possible functions of EBNA1, we applied Solexa-based deep sequencing to analyze the genome-wide interaction sites of EBNA1 in latently infected Raji Burkitt lymphoma cells. Our results corroborate previous studies demonstrating multiple cellular promoter binding sites for EBNA1, and extend these studies to reveal numerous EBNA1 binding sites not closely linked to a promoter start site. We conclude that EBNA1 has the potential to function as a global regulator of cellular gene expression and chromosome organization, similar to its known functions in the EBV genome.
Solexa sequencing and genome mapping summary (rows: Solexa/Illumina pass-filtered sequences; sequences mapped to the human genome; sequences mapped to the EBV genome)
To determine whether EBNA1 binds to several distinct motifs, we rederived the consensus sites for Motif 2 (Figure 4C) and Chr 11 (Figure 4D) using a higher stringency (peak score > 10) and a narrower sequence window. These consensus motifs are significantly different from each other and from the previously established binding site consensus derived from EBV genome sites. The chr11 motif is found 771 times in the complete human genome, but is occupied by EBNA1 at only 23 of these sites (> 8-fold enrichment and peak score > 10). Motif 2 is found 429,331 times in the human genome, but is occupied by EBNA1 at only 74 sites. These findings indicate that EBNA1 can bind directly to multiple sites in the cellular genome, but that actual binding may be restricted by chromatin context. They also indicate that EBNA1 can recognize a more degenerate DNA consensus site than previously appreciated. A similar conclusion was reached by Dresang et al. . We also found that many EBNA1 ChIP-Seq binding sites were enriched for motifs that could not bind EBNA1. One of the most significant consensus motifs that did not bind EBNA1 directly is shown in Figure 4E. Using the TomTom algorithm to compare this motif with transcription factor recognition motifs in the JASPAR database, we found that Motif 4 contains a consensus Sp1 site (p = 0.0011) and a Staf/Znf123 recognition site (p = 0.0023). The identification of such consensus sites may help to identify cellular factors that mediate EBNA1 interaction with chromosomes through indirect mechanisms.
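The occupancy argument above rests on simple ratios. As a minimal sketch that uses only the counts stated in the text (the helper name `occupied_fraction` is illustrative and not part of the original analysis), the fraction of genome-wide motif matches that carry an EBNA1 peak can be computed as:

```python
def occupied_fraction(occupied_sites: int, genomic_matches: int) -> float:
    """Fraction of genome-wide motif matches that are occupied by EBNA1."""
    if genomic_matches <= 0:
        raise ValueError("genomic_matches must be positive")
    if not 0 <= occupied_sites <= genomic_matches:
        raise ValueError("occupied_sites must lie between 0 and genomic_matches")
    return occupied_sites / genomic_matches


# Counts reported in the text: (occupied sites, genome-wide motif matches)
COUNTS = {"chr11 motif": (23, 771), "Motif 2": (74, 429331)}

if __name__ == "__main__":
    for name, (occupied, total) in COUNTS.items():
        print(f"{name}: {occupied}/{total} = {occupied_fraction(occupied, total):.3%}")
```

Roughly 3% of chr11 motif matches and about 0.017% of Motif 2 matches are occupied, which is the quantitative basis for suggesting that sequence alone does not determine where EBNA1 binds.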
Raji cells carry a rearranged copy of the c-myc gene adjacent to the gamma 1 constant region gene of the human immunoglobulin heavy-chain (IgH) locus, t(8;14)(q24;q32) . Examination of EBNA1 binding sites in these translocated regions revealed peaks of > 3-fold enrichment at the 3' end of c-myc on chromosome 8 and > 10-fold enrichment within the IgH locus on chromosome 14. In Raji cells, these two regions are fused by a breakpoint in the c-myc and IgH 5' region, bringing the two EBNA1 binding sites into close proximity on the translocated allele. Although the mechanism of translocation is unknown, EBV has been considered a potential driving force for the Burkitt's lymphoma translocations, and it is possible that EBNA1 bound at these sites links the two loci and thereby facilitates translocation.
In this study, we used ChIP-Seq to identify ~903 high-occupancy (> 10-fold enrichment and peak score > 8) and ~4300 moderate-occupancy (> 3-fold enrichment and peak score > 5) EBNA1 binding sites in the cellular chromosomes of a human Burkitt lymphoma cell line. Several (~25) of the high- and low-occupancy binding sites identified by ChIP-Seq were validated by conventional ChIP and real-time PCR (Figure 5). ChIP-Seq and ChIP-PCR correlated well for these binding sites, providing a high level of confidence in the ChIP-Seq data. Furthermore, only bona fide EBNA1 binding sites in the EBV genome (FR, DS, and Qp) scored positive in the ChIP-Seq analysis (Figure 1A), suggesting that this method generated few false positives. Among the high-occupancy binding sites, ~7% are located within 500 bp of an annotated or predicted transcription start site (Figure 3A). We also noted that ~45% of EBNA1 binding sites overlapped with a repetitive DNA element (Figure 3B). Several DNA motifs could be identified in the high-occupancy EBNA1 binding site data set, but only two of these bound directly to recombinant EBNA1 protein in vitro (Figure 4). Since EBNA1 is known to bind DNA directly through its carboxy-terminal DNA binding domain and indirectly through its amino-terminal tethering domain, it seems likely that many of the binding sites identified by ChIP-Seq represent a composite of these direct and indirect DNA-binding modes of EBNA1.
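The two occupancy classes are defined by paired cutoffs on fold enrichment and peak score. A minimal sketch of that classification follows; the function name is hypothetical, and it assumes the classes are mutually exclusive, which the text does not state explicitly (the ~4300 moderate sites may or may not include the ~903 high-occupancy ones).

```python
from typing import Optional


def occupancy_class(fold_enrichment: float, peak_score: float) -> Optional[str]:
    """Classify a ChIP-Seq peak using the thresholds stated in the text.

    Assumption: the classes are treated as mutually exclusive, so a peak that
    passes the high-occupancy cutoffs is not also counted as moderate.
    """
    # High occupancy: > 10-fold enrichment and peak score > 8
    if fold_enrichment > 10 and peak_score > 8:
        return "high"
    # Moderate occupancy: > 3-fold enrichment and peak score > 5
    if fold_enrichment > 3 and peak_score > 5:
        return "moderate"
    return None  # below both sets of cutoffs


if __name__ == "__main__":
    peaks = [(12.0, 9.5), (5.0, 6.0), (15.0, 6.0), (2.0, 20.0)]
    print([occupancy_class(f, s) for f, s in peaks])
    # ['high', 'moderate', 'moderate', None]
```

Both cutoffs must be passed together, so a strongly enriched peak with a low peak score (the third example) falls into the moderate class.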
A remarkable finding from this study was the identification of a cluster of high-affinity EBNA1 binding sites on chromosome 11. The cluster comprises ~10 kb of repetitive sequence situated between the divergent promoters of the FAM55B and FAM55D genes. The functions of the FAM55B and FAM55D proteins are not known, and shRNA depletion of EBNA1 had no detectable effect on FAM55B or FAM55D transcription. This region was enriched for histone H3K9me3 (Figure 8), suggesting that it is largely heterochromatic and unlikely to be involved in transcriptional activation. We considered whether this site might represent a cellular origin of DNA replication, but we were unable to detect ORC2 or MCM protein binding at this site (data not shown). Purified EBNA1 DBD protein bound with high affinity to the major repeat elements in the chromosome 11 cluster, indicating that the binding is direct and mediated by the EBNA1 DNA binding domain. At present, it is not clear whether EBNA1 binding to this region of chromosome 11 has any functional significance.
Position weight matrix (PWM) analysis and WebLogo presentation revealed that many cellular EBNA1 binding sites are distinct from the consensus observed at the EBV genome binding sites in the FR, DS, and Qp regions. The chromosome 11 binding site consensus TGG[g/a]TAA[T/C][A/C]A[g/c]TGTT[G/A]CCT and the Motif 2 consensus GG[C/T]AGCAtaT[A/G]CT[A/T][T/C]C do not resemble the consensus derived for previously known EBNA1 binding sites in the viral genome. However, our Motif 2 is similar to the new consensus G[A/G][T/C]AGcATaTGCTaCC derived by Dresang et al. from 70 viral and cellular binding sites . In a separate study, Canaan et al. identified GaA[G/A]TAT[T/C] as a consensus site for EBNA1 binding at cellular genes that are subject to EBNA1-dependent regulation and associated with EBNA1 protein by ChIP . However, the Canaan et al. study did not make clear whether these sites are bound directly or indirectly by the EBNA1 DNA binding domain. We also identified several motifs enriched in the EBNA1 ChIP-Seq peaks that did not bind directly to the EBNA1 DNA binding domain in vitro. These sites may reflect indirect DNA binding by EBNA1, potentially through interactions with other sequence-specific factors or chromatin-associated proteins. Cellular factors that have been implicated in mediating EBNA1 tethering to metaphase chromosomes, including EBP2 [27–29], histone H1 , and HMGA1 [14, 30], may be good candidates for interactions with some of these indirect binding motifs. In the future, it will be important to determine whether there are functional differences between these classes of EBNA1 binding sites, and which cellular factors mediate the indirect binding of EBNA1 to cellular chromosomes.
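The bracketed consensus notation above ([X/Y] for either base) can be turned directly into a sequence search. The sketch below is illustrative only: it assumes lower-case letters mark less conserved positions and matches them like upper-case ones, scans both strands with a regular expression, and uses hypothetical helper names. A strict consensus scan like this will likely give different counts from PWM-based scanning, so it is not expected to reproduce the genome-wide occurrence numbers quoted earlier.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Convert a bracketed consensus such as 'TGG[g/a]TAA' into a regex.

    [X/Y] means "X or Y"; lower-case letters (assumed to mark less conserved
    positions) are matched like upper-case ones.
    """
    body = re.sub(r"\[([A-Za-z])/([A-Za-z])\]",
                  lambda m: "[" + m.group(1) + m.group(2) + "]",
                  consensus).upper()
    # Zero-width lookahead so that overlapping matches are also reported.
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(consensus: str, seq: str) -> list:
    """Return sorted (0-based forward-strand start, strand) pairs for each match."""
    regex = consensus_to_regex(consensus)
    seq = seq.upper()
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        # Map a reverse-strand match back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - len(m.group(1)), "-"))
    return sorted(hits)


CHR11 = "TGG[g/a]TAA[T/C][A/C]A[g/c]TGTT[G/A]CCT"
MOTIF2 = "GG[C/T]AGCAtaT[A/G]CT[A/T][T/C]C"

if __name__ == "__main__":
    site = "TGGGTAATAACTGTTGCCT"  # one sequence matching the chr11 consensus
    print(find_motif(CHR11, "AAAA" + site + "CCCC"))                 # [(4, '+')]
    print(find_motif(CHR11, "GG" + reverse_complement(site) + "GG"))  # [(2, '-')]
```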
EBNA1 binding sites close to RefSeq genes (table columns: RefSeq gene hit; distance to hit; motifs present, e.g. Motif 4 and Motif 5, or Motif 2 and Motif 5)
Raji cells (an EBV-positive human Burkitt lymphoma line) and DG75 cells (an EBV-negative human Burkitt lymphoma line) were maintained in RPMI containing 10% FBS and supplemented with GlutaMAX (Invitrogen) and antibiotics (penicillin and streptomycin). EBV-293 cells, human embryonic kidney (HEK) 293 cells carrying a hygromycin-resistant EBV bacmid (a kind gift of H. Delecluse), were maintained in RPMI containing 10% FBS, hygromycin (100 μg/ml), GlutaMAX, and antibiotics. pCMV-Flag-EBNA1 was previously described . shRNA directed against EBNA1 was generated by cloning the targeting hairpin sequence (gatatgtctcccctccctcctaggccactcaagcttcaatggcctaggagagaagggagacacatc) into the pENTR/D-TOPO vector (Invitrogen).
ChIP assays were performed as described previously . Precipitated DNA was quantified by real-time PCR using the standard curve method for absolute quantitation (ABI 7000 Real-Time PCR System). IPs were performed in triplicate for each antibody, PCR reactions were repeated at least three times, and standard deviations are shown as error bars. Primers for ChIP assays are listed in Additional File 1 in the online Data Supplement. The following rabbit polyclonal antibodies were used for ChIP assays: anti-EBNA1 (305/10 wk), anti-IgG (Santa Cruz Biotechnology), anti-Flag (Sigma), anti-acetylated histone H3 and H4 (Millipore), anti-dimethyl histone H3K4 (Abcam), anti-monomethyl histone H4K20 (Abcam), and anti-trimethyl histone H3K9 and H3K4 (Millipore). Mouse monoclonal anti-actin (Sigma) and anti-EBNA1 (Advance Biotechnology) were used for Western blotting.
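Absolute quantitation with a standard curve amounts to fitting threshold cycle (Ct) against log10 input amount for a dilution series and inverting that line for unknown samples. The following is a minimal sketch under stated assumptions: the dilution series and Ct values are hypothetical, the helper names are illustrative, and the sample standard deviation is assumed for the error bars because the text does not say which form was used.

```python
import math
import statistics


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to get the input amount for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; a slope of about -3.32 means ~100%."""
    return 10 ** (-1 / slope) - 1


def mean_and_sd(values):
    """Mean and sample standard deviation of replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series of input DNA (arbitrary units) and Ct values
    standards = [1, 10, 100, 1000]
    cts = [30.0, 26.68, 23.36, 20.04]
    slope, intercept = fit_standard_curve(standards, cts)
    print(round(slope, 2), round(intercept, 2))       # -3.32 30.0
    print(round(amplification_efficiency(slope), 2))  # 1.0
    replicate_cts = [25.1, 25.3, 24.9]                # hypothetical triplicate IP
    amounts = [quantity_from_ct(ct, slope, intercept) for ct in replicate_cts]
    print(mean_and_sd(amounts))
```

Because Ct falls as input rises, the fitted slope is negative; checking that it lies near -3.3 is a common sanity check on amplification efficiency before trusting the derived amounts.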
Briefly, RNA was isolated from 2 × 10⁵ cells using the RNeasy Kit (Qiagen) and then treated with DNase I. Reverse transcriptase PCR (RT-PCR) was performed as previously described. Real-time PCR was performed with SYBR Green dye in an ABI Prism 7000 according to the manufacturer's specified parameters. Primer sequences for RT-PCR are listed in Additional File 2.
Raji cells (1 × 10⁴) were plated in 96-well plates 96 h after transfection with shEBNA1 or control shRNA. Cell viability was then measured with 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide (MTT) (Millipore, Cell Growth Assay Kit), according to the manufacturer's protocol.
The EBNA1 DNA binding domain (DBD; aa 459-607) was expressed in Escherichia coli as a hexahistidine fusion protein and purified.
Protein-DNA binding reactions contained 10% glycerol, 200 mM NaCl, 20 mM Tris-Cl pH 7.4, 1 mM DTT, 10 μg/mL BSA, 10 nM ³²P-labeled oligonucleotide DNA, and 246 nM purified EBNA1 DBD. Samples were incubated for 20 min at 25°C, then loaded onto a 6% polyacrylamide gel and electrophoresed for 90 minutes at 170 V in 1× TBE. Gels were dried and visualized by PhosphorImager. Oligonucleotides used for EMSA are listed in Additional File 3.
Solexa ChIP-Seq experiments were performed with 2 × 10⁶ Raji cells per IP, using either EBNA1 monoclonal antibody or control mouse IgG. ChIP methods were identical to conventional ChIP assays  with the exception that competitor salmon sperm DNA was excluded from all IP and wash buffers, and purified ChIP DNA was resuspended in 25 dH2O. DNA fragments in the ~150-300 bp range were isolated by agarose gel purification, ligated to primers, and then subjected to Solexa sequencing according to the manufacturer's recommendations (Illumina, Inc.).
Image analysis and base calling of ChIP-seq data were performed using Illumina pipeline software version 1.4. Sequences were aligned to the human genome (hg18) using the Illumina casava_1.4 module. Uniquely aligned sequence tags with up to two mismatches were retained for downstream analysis.
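The read-filtering rule can be expressed as a single predicate. In this sketch the `Alignment` record is a hypothetical summary of each aligned read and does not model the actual CASAVA output format; only the two criteria (unique alignment, at most two mismatches) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical per-read alignment summary (not the CASAVA output format)."""
    read_id: str
    chrom: str
    pos: int          # 0-based 5' position on the chromosome
    strand: str       # "+" or "-"
    n_hits: int       # number of genomic locations the read aligned to
    mismatches: int   # mismatches in the reported alignment


def keep_for_analysis(aln: Alignment, max_mismatches: int = 2) -> bool:
    """Keep uniquely aligned tags with at most `max_mismatches` mismatches."""
    return aln.n_hits == 1 and aln.mismatches <= max_mismatches


if __name__ == "__main__":
    reads = [
        Alignment("r1", "chr11", 1000, "+", 1, 0),
        Alignment("r2", "chr11", 2000, "-", 3, 0),  # multi-mapped: dropped
        Alignment("r3", "chr8", 500, "+", 1, 3),    # too many mismatches: dropped
    ]
    print([r.read_id for r in reads if keep_for_analysis(r)])  # ['r1']
```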
A combination of fold ratio and a Poisson model of the tag distribution  was used to define peaks as follows. (i) Genomic regions 1000 bp in length that were enriched for ChIP-seq sequence tags were identified by fold ratio. For each region, the fold ratio was calculated as the number of reads in the ChIP (antibody) sample, normalized to the total number of reads, divided by the corresponding normalized number of reads in the IgG control sample in the same region; a region was considered enriched if this ratio exceeded the given cutoff. A cutoff of 3 was applied at this stage to find the initial enriched regions, and nearby enriched regions were merged into broader enriched genomic regions. (ii) For each region identified in step (i), a read-overlap profile was created for the experimental sample by extending each sequence read from its 5' end toward its 3' end to 300 bp (the average length of the ChIP DNA fragments sequenced on the Solexa GA with the standard Illumina ChIP-seq protocol). (iii) Peaks were identified using a Poisson model: the number of overlapping reads was counted at each nucleotide position, and the position with the highest count was defined as the peak position within the significant region. Finally, only genomic regions with a fold ratio > 10, a peak score > 8, and p < 0.001 (as determined by the Poisson background model) were considered statistically significant. The peak score is the average raw count within a region of significant fold enrichment relative to control IgG levels; the average is taken over the overlapping tags at every base after extending the tags to their average length within the significant region.
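The three steps can be sketched for a single chromosome as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: reads are given as (5' position, strand) pairs, a pseudocount of 1 is added when computing fold ratios, the Poisson background rate `lambda_bg` (expected per-base depth) is supplied by the caller because the text does not say how it was estimated, and the peak score is interpreted as the mean per-base coverage of extended reads over the merged region.

```python
import math
from collections import Counter
from dataclasses import dataclass

WINDOW = 1000    # step (i): length of candidate regions, in bp
FRAGMENT = 300   # step (ii): reads are extended to this length, in bp


@dataclass
class Peak:
    start: int      # merged-region start (0-based, inclusive)
    end: int        # merged-region end (exclusive)
    summit: int     # position with the most overlapping extended reads
    fold: float     # depth-normalized ChIP / IgG read ratio over the region
    score: float    # mean extended-read coverage per base over the region
    p_value: float  # Poisson upper-tail probability of the summit height


def extend(pos, strand):
    """Extend a read from its 5' end by FRAGMENT bp in the read direction."""
    return (pos, pos + FRAGMENT) if strand == "+" else (pos - FRAGMENT + 1, pos + 1)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term, cdf = math.exp(-lam), 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def fold_ratio(chip_n, ctrl_n, chip_total, ctrl_total, pseudo=1.0):
    # The pseudocount (an assumption) avoids dividing by zero in empty IgG regions.
    return ((chip_n + pseudo) / chip_total) / ((ctrl_n + pseudo) / ctrl_total)


def call_peaks(chip, ctrl, lambda_bg, seed_fold=3, min_fold=10, min_score=8, max_p=1e-3):
    """chip, ctrl: lists of (5' position, strand) reads on one chromosome."""
    chip_win = Counter(p // WINDOW for p, _ in chip)
    ctrl_win = Counter(p // WINDOW for p, _ in ctrl)
    n_chip, n_ctrl = len(chip), len(ctrl)
    # (i) seed windows above the initial fold cutoff, then merge adjacent windows
    seeds = sorted(w for w in chip_win
                   if fold_ratio(chip_win[w], ctrl_win[w], n_chip, n_ctrl) > seed_fold)
    regions = []
    for w in seeds:
        if regions and w == regions[-1][1] + 1:
            regions[-1][1] = w
        else:
            regions.append([w, w])
    peaks = []
    for first, last in regions:
        start, end = first * WINDOW, (last + 1) * WINDOW
        wins = range(first, last + 1)
        fold = fold_ratio(sum(chip_win[w] for w in wins), sum(ctrl_win[w] for w in wins),
                          n_chip, n_ctrl)
        # (ii) per-base overlap profile of extended ChIP reads (difference array)
        diff = [0] * (end - start + 1)
        for pos, strand in chip:
            a, b = extend(pos, strand)
            a, b = max(a, start), min(b, end)
            if a < b:
                diff[a - start] += 1
                diff[b - start] -= 1
        coverage, running = [], 0
        for d in diff[:-1]:
            running += d
            coverage.append(running)
        # (iii) summit = highest overlap; its height is tested against Poisson(lambda_bg)
        height = max(coverage)
        score = sum(coverage) / len(coverage)
        p = poisson_sf(height, lambda_bg)
        if fold > min_fold and score > min_score and p < max_p:
            peaks.append(Peak(start, end, start + coverage.index(height), fold, score, p))
    return peaks


if __name__ == "__main__":
    chip = [(5000 + i, "+") for i in range(40)] + [(100000 + 1000 * i, "+") for i in range(10)]
    ctrl = [(5100, "+")] + [(200000 + 1000 * i, "+") for i in range(49)]
    for peak in call_peaks(chip, ctrl, lambda_bg=2.0):
        print(peak.start, peak.end, peak.summit, round(peak.fold, 1), peak.score)
    # 5000 6000 5039 20.5 12.0
```

The loose seed cutoff (3) followed by strict final cutoffs (fold > 10, peak score > 8, p < 0.001) mirrors the two-stage structure described above: broad candidate regions first, then a per-region significance test on the summit.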
To annotate the ChIP-seq peaks, we used gene annotation tracks from several sources available at the UCSC Genome Browser: RefSeq genes, UCSC genes, Ensembl genes, and Vega genes. Every peak was annotated to the closest TSS, regardless of whether the peak lies upstream or downstream of the TSS. Figure 3A shows the distribution of ChIP-seq peaks relative to the TSSs. We then selected the subset of ChIP-seq peaks located within ± 500 bp of a TSS; we call these TSS-associated peaks. Overlap with specific genes is provided in Table 2.
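Nearest-TSS annotation reduces to a binary search in a sorted list of TSS coordinates for each chromosome. In this minimal sketch the TSS lists are hypothetical stand-ins for coordinates taken from the RefSeq, UCSC, Ensembl, or Vega tracks, and the signed distance is in genomic coordinates rather than strand-aware; since peaks are assigned to the closest TSS regardless of side, only the absolute distance matters for the ± 500 bp cut.

```python
import bisect


def nearest_tss(peak: int, tss_sorted: list) -> tuple:
    """Return (closest TSS, peak - TSS) on one chromosome, ignoring up/downstream."""
    if not tss_sorted:
        raise ValueError("no TSS on this chromosome")
    i = bisect.bisect_left(tss_sorted, peak)
    # Only the TSSs immediately left and right of the insertion point can be closest.
    candidates = [tss_sorted[j] for j in (i - 1, i) if 0 <= j < len(tss_sorted)]
    best = min(candidates, key=lambda t: abs(peak - t))
    return best, peak - best


def is_tss_associated(peak: int, tss_sorted: list, window: int = 500) -> bool:
    """True if the peak lies within +/- window bp of its closest TSS."""
    return abs(nearest_tss(peak, tss_sorted)[1]) <= window


if __name__ == "__main__":
    tss = [1000, 5000, 9000]  # hypothetical TSS coordinates, pre-sorted
    print(nearest_tss(4700, tss), is_tss_associated(4700, tss))  # (5000, -300) True
    print(nearest_tss(2000, tss), is_tss_associated(2000, tss))  # (1000, 1000) False
```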
Repeat region files were downloaded from the UCSC Genome Browser. Peaks falling in repetitive regions were annotated according to the type of repetitive region, and the same method was used to determine the overlap of peaks with repeat subcategories.
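Repeat annotation is an interval-overlap count. The following sketch assumes peaks are represented by single positions (for example, summits) and repeats by half-open intervals with a label; passing either a repeat class or a subcategory as the label covers both levels of annotation described above. The brute-force scan is for illustration only; a genome-wide repeat table would need a per-chromosome interval index.

```python
from collections import Counter


def annotate_repeats(peaks, repeats):
    """Count peaks by the repeat label(s) they fall in.

    peaks:   (chrom, position) tuples, e.g. peak summits
    repeats: (chrom, start, end, label) tuples with half-open [start, end)
             coordinates; label can be a repeat class (e.g. LINE) or a
             subcategory (e.g. L1) to reproduce the two levels of annotation.
    A peak inside overlapping repeats is counted once for each distinct label.
    """
    counts = Counter()
    for chrom, pos in peaks:
        labels = {lab for c, s, e, lab in repeats if c == chrom and s <= pos < e}
        counts.update(labels or {"non-repeat"})
    return counts


if __name__ == "__main__":
    repeats = [("chr1", 100, 200, "LINE"), ("chr1", 150, 300, "SINE"), ("chr2", 0, 50, "LINE")]
    peaks = [("chr1", 160), ("chr1", 250), ("chr2", 10), ("chr2", 60)]
    counts = annotate_repeats(peaks, repeats)
    print(counts)
    print("fraction in any repeat:", 1 - counts["non-repeat"] / len(peaks))  # 0.75
```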
We selected only the highly enriched genomic regions (fold ratio > 10 and peak score > 8) for motif identification, using a 60 bp sequence window around each peak for motif searching. We used the online version of MEME [36, 37] (http://meme.nbcr.net/meme4_4_0/intro.html) to find statistically significant sequence motifs. Possible EBNA1 binding motifs were predicted based on the highest number of occurrences with the lowest p-value under the "zero or one per sequence" option. Position weight matrices (PWMs) generated by MEME were then represented in logo format using WebLogo (http://weblogo.berkeley.edu/logo.cgi) to generate consensus sequences for multiple cellular EBNA1 binding sites. PWMs of the motifs identified by MEME were matched against the JASPAR core database (http://jaspar.genereg.net/) using the TOMTOM program (http://meme.nbcr.net/meme4_3_0/cgi-bin/tomtom.cgi) with a q-value (false discovery rate) threshold of < 0.5.
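Preparing MEME input means cutting a fixed-width window around each peak and writing the windows as FASTA. The sketch below assumes the 60 bp window is centred on the summit (30 bp on each side), clips windows at chromosome ends, and uses a hypothetical chromosome string; the helper names are illustrative.

```python
def peak_window(chrom_seq: str, summit: int, width: int = 60) -> str:
    """Sequence of `width` bp centred on a 0-based summit, clipped at chromosome ends."""
    half = width // 2
    return chrom_seq[max(0, summit - half):min(len(chrom_seq), summit + half)]


def to_fasta(named_seqs) -> str:
    """Format (name, sequence) pairs as FASTA text, e.g. as input for MEME."""
    return "".join(f">{name}\n{seq}\n" for name, seq in named_seqs)


if __name__ == "__main__":
    chrom = "ACGT" * 50  # hypothetical 200 bp chromosome
    summits = {"peak_1": 100, "peak_2": 10}
    windows = [(name, peak_window(chrom, s)) for name, s in summits.items()]
    print([len(seq) for _, seq in windows])  # [60, 40]
    print(to_fasta(windows[:1]))
```

Short windows keep the motif search focused on sequence close to the summit, where a directly bound site is most likely to lie.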
H3K4me2, H3K4me3, H3K9me2, and H3K9me3 ChIP-seq datasets, published in [31, 38], were downloaded from the NCBI GEO and SRA databases. The accession numbers for the datasets are as follows: SRX000147, SRX000148, SRX000153, SRX000154, and GSM325898. Each ChIP-seq dataset was processed at p < 0.001 using the Poisson background model. A histone modification mark was considered to overlap an EBNA1 peak if it fell within ± 1000 bp of the EBNA1 peak.
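The ± 1000 bp overlap rule can be checked with a single binary search per peak. This minimal sketch assumes each histone mark is represented by one position (for example, its peak summit) in a sorted per-chromosome list; the mark positions shown are hypothetical.

```python
import bisect


def overlaps_mark(peak: int, mark_positions: list, flank: int = 1000) -> bool:
    """True if any histone-mark position lies within +/- flank bp of the peak.

    mark_positions must be sorted and on the same chromosome as the peak.
    """
    # First mark at or after (peak - flank); it overlaps if it is also <= peak + flank.
    i = bisect.bisect_left(mark_positions, peak - flank)
    return i < len(mark_positions) and mark_positions[i] <= peak + flank


def fraction_overlapping(peaks: list, mark_positions: list, flank: int = 1000) -> float:
    """Fraction of EBNA1 peaks with a histone mark within +/- flank bp."""
    if not peaks:
        return 0.0
    return sum(overlaps_mark(p, mark_positions, flank) for p in peaks) / len(peaks)


if __name__ == "__main__":
    h3k9me3 = [500, 3000, 10000]  # hypothetical sorted mark summits on one chromosome
    print([overlaps_mark(p, h3k9me3) for p in (1400, 1600, 4000, 20000)])
    # [True, False, True, False]
    print(fraction_overlapping([1400, 1600, 4000, 20000], h3k9me3))  # 0.5
```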
We thank Andreas Wiedmer for technical support and the Wistar Institute Cancer Center Core Facilities for Bioinformatics, Genomics, and Flow Cytometry. This work was supported by grants from NIH (RO1CA093606 and R01DE017336) to PML.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.