Special Genomic Libraries

Hi-C Libraries & Sequencing

Summary: Hi-C is a library preparation technique designed to allow investigation of the 3D organization of the genome. Briefly, samples are crosslinked with formaldehyde and then digested to generate ‘aggregates’ that consist of covalently linked segments of chromatin that were physically near each other in the nucleus. These chromatin aggregates are ligated under very dilute conditions to promote intra-aggregate ligation. Libraries are then constructed using conventional NGS library preparation techniques, and library molecules containing the desired proximity ligation events are enriched using streptavidin beads. We prepare Hi-C libraries using a kit from Dovetail Genomics.

For questions about Hi-C library preparation and sample submission, please contact Dr. Walt Lorenz at wlorenz@uga.edu.

Sample Preparation

Sample Type: Tissue
Recommendations:
• Tissues with high cellularity and low fat content are preferred, e.g., brain, muscle, heart, or spleen.
• Samples should be collected from a live or recently deceased specimen and snap-frozen in liquid nitrogen.
• The protocol does not support fat, bone, or similar tissues.
• Do not preserve samples using RNAlater, EtOH, or lyophilization. Samples should be stored at -80 °C and shipped on dry ice.
Input Amount Needed: 20-40 mg per sample

Sample Type: Cells
Recommendations:
• Any cell culture is compatible with Hi-C library preparation.
• Adherent cells should be dissociated with trypsin.
Input Amount Needed: 0.5 x 10^6 cells

Sample Type: Blood
Recommendations:
• Blood samples should be collected from a live or recently deceased specimen.
• An anticoagulant must be added. EDTA is the preferred anticoagulant. Heparin and citrate (ACD-A) are acceptable alternatives.
• Flash-freeze samples in liquid nitrogen and store them at -80 °C. Samples should be shipped overnight on dry ice.
Input Amount Needed: 300 µL to 1 mL of blood. Samples are normalized to 0.5 x 10^6 cells.

Sample Type: Plants
Recommendations:
• Leaves collected from plants at the one- or two-leaf seedling stage are preferred.
• Young leaves from mature plants and plant tissue culture can also be used.
• Snap-freeze samples in liquid nitrogen and store them at -80 °C. Samples should be shipped overnight on dry ice.
Input Amount Needed: 250 mg of flash-frozen tissue per sample

 

Multiplexing: The Dovetail Hi-C Library Preparation Kit supports the multiplexing of up to 8 samples for sequencing on one Illumina flow cell.

Sequencing: Sequencing of Hi-C libraries is done in two stages. In the first stage, the library is sequenced to generate ~2 million paired-end reads. These reads are used to QC the library using Dovetail’s HiRise software. This step ensures that the libraries have the desired proximity ligations between portions of the genome that are physically near each other in the cell, but distant in the genome assembly. If the library passes this QC step, a full-scale sequencing run is performed to reach the desired sequencing depth. PE75 reads are sufficient for Hi-C analysis, but longer read lengths can be used if desired.

The sequencing depth and number of Hi-C libraries required for Hi-C analysis depend on genome complexity. The table below lists the recommendations from the kit manufacturer, Dovetail Genomics:

Genome Size (Gb) | Simple Genomes: No. of Hi-C Libraries | Simple Genomes: No. of Read Pairs to Sequence (Millions) | Complex Genomes: No. of Hi-C Libraries | Complex Genomes: No. of Read Pairs to Sequence (Millions)
1 | 1 | 100 | 1 | 100
2 | 1 | 200 | 2 | 250
3 | 1 | 300 | 3 | 450
4 | 2 | 400 | 4 | 500
5 | 2 | 500 | 5 | 850

 

Simple Genomes: Diploid or haploid genomes with repetitive content of less than 30% and heterozygosity of less than 0.005%. Humans, many mammals, and some fish and birds are examples of simple genomes.

Complex Genomes: Genomes with any of the following: polyploidy, repeat content above 30%, or heterozygosity above 0.005%. Many plants, salmonid fishes, and amphibians are examples of complex genomes. If you are unsure which category your genome of interest falls into, Dovetail recommends following the guidelines for complex genomes.
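The classification rules and the recommendation table above can be combined into a small lookup. The following Python snippet is a minimal sketch, not a Dovetail or GGBC tool: the function names `classify_genome` and `recommend` are hypothetical, and two behaviors are assumptions that the source does not state, namely that values exactly at a threshold count as complex and that genome sizes between table rows are rounded up to the next whole Gb.

```python
import math

# (number of Hi-C libraries, read pairs in millions) by genome size in Gb,
# transcribed from the recommendation table above.
RECOMMENDATIONS = {
    "simple": {1: (1, 100), 2: (1, 200), 3: (1, 300), 4: (2, 400), 5: (2, 500)},
    "complex": {1: (1, 100), 2: (2, 250), 3: (3, 450), 4: (4, 500), 5: (5, 850)},
}


def classify_genome(ploidy=None, repeat_pct=None, heterozygosity_pct=None):
    """Return "simple" or "complex" using the thresholds quoted above."""
    # Unknown properties fall back to "complex", as Dovetail recommends.
    if ploidy is None or repeat_pct is None or heterozygosity_pct is None:
        return "complex"
    # Assumption: values exactly at a threshold are treated as complex.
    if ploidy > 2 or repeat_pct >= 30 or heterozygosity_pct >= 0.005:
        return "complex"
    return "simple"


def recommend(genome_size_gb, category):
    """Look up (libraries, million read pairs) for a genome size in Gb."""
    # Assumption: round up to the next whole-Gb row of the table.
    row = max(1, math.ceil(genome_size_gb))
    table = RECOMMENDATIONS[category]
    if row not in table:
        raise ValueError("genome size is outside the 1-5 Gb range of the table")
    return table[row]
```

For example, a hypothetical 2.1 Gb diploid genome with 85% repeat content classifies as complex, and `recommend(2.1, "complex")` returns the 3 Gb row: 3 libraries and 450 million read pairs.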

For HiRise analysis, users should provide a draft genome assembly with an N50 greater than 1 Mb and an N90 greater than 20 kb.
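To check a draft assembly against these thresholds before submission, N50 and N90 can be computed from the list of scaffold or contig lengths. The snippet below is an illustrative sketch using the standard Nx definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches x% of the assembly). The function names `nx` and `meets_hirise_input` are hypothetical and are not part of HiRise.

```python
def nx(lengths, percent):
    """Return the Nx value (e.g. percent=50 for N50) of a list of lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the cutoff.
        if running * 100 >= total * percent:
            return length
    raise ValueError("lengths must be a non-empty list of positive integers")


def meets_hirise_input(lengths, min_n50=1_000_000, min_n90=20_000):
    """Check the N50 > 1 Mb and N90 > 20 kb guideline quoted above."""
    return nx(lengths, 50) > min_n50 and nx(lengths, 90) > min_n90
```

The lengths can be taken from any source that lists one length per sequence, such as a FASTA index file.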

Example QC data from Hi-C libraries made at the GGBC:

Zea mays Ab10 Strain

Basic Assembly Statistics
Total Length: 2,106,338,117 bp
Scaffold N50: 223,902,240 bp
Scaffold N90: 159,769,782 bp
Largest Scaffold: 307,041,717 bp

Basic Library Statistics
Total Read Pairs Analyzed: 4,237,074
Library Read Length: 75 bp

Profile of Read Insert Distribution
0 bp < Insert <= 1 kbp: 14.96%
1 kbp < Insert <= 100 kbp: 1.48%
100 kbp < Insert <= 1 Mbp: 0.62%
1 Mbp < Insert <= 3 Mbp: 0.3%
3 Mbp < Insert <= 5 Mbp: 0.16%
5 Mbp < Insert: 1.63%

Oryza sativa

Basic Assembly Statistics
Total Length: 373,245,519 bp
Scaffold N50: 29,958,434 bp
Scaffold N90: 23,207,287 bp
Largest Scaffold: 43,270,923 bp

Basic Library Statistics
Total Read Pairs Analyzed: 2,914,137
Library Read Length: 75 bp

Profile of Read Insert Distribution
0 bp < Insert <= 1 kbp: 34.2%
1 kbp < Insert <= 100 kbp: 3.31%
100 kbp < Insert <= 1 Mbp: 2.09%
1 Mbp < Insert <= 3 Mbp: 1.37%
3 Mbp < Insert <= 5 Mbp: 0.84%
5 Mbp < Insert: 2.65%

Homo sapiens

Basic Assembly Statistics
Total Length: 3,257,330,713 bp
Scaffold N50: 145,138,636 bp
Scaffold N90: 58,617,616 bp
Largest Scaffold: 248,956,422 bp

Basic Library Statistics
Total Read Pairs Analyzed: 5,955,150
Library Read Length: 75 bp

Profile of Read Insert Distribution
0 bp < Insert <= 1 kbp: 43.48%
1 kbp < Insert <= 100 kbp: 5.21%
100 kbp < Insert <= 1 Mbp: 0.05%
1 Mbp < Insert <= 3 Mbp: 0.04%
3 Mbp < Insert <= 5 Mbp: 0.03%
5 Mbp < Insert: 1.12%
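
The insert distribution profiles above group read pairs by the distance between mates on the assembly. How HiRise computes these figures is not described here, so the Python snippet below is only a sketch of the binning step under stated assumptions: `insert_sizes` is a hypothetical list of insert distances in bp for read pairs that map to the same scaffold, and percentages are expressed relative to a caller-supplied total number of read pairs analyzed.

```python
from bisect import bisect_left

# Upper bin edges in bp, matching the rows of the profiles above.
EDGES = [1_000, 100_000, 1_000_000, 3_000_000, 5_000_000]
LABELS = [
    "0 bp < Insert <= 1 kbp",
    "1 kbp < Insert <= 100 kbp",
    "100 kbp < Insert <= 1 Mbp",
    "1 Mbp < Insert <= 3 Mbp",
    "3 Mbp < Insert <= 5 Mbp",
    "5 Mbp < Insert",
]


def insert_profile(insert_sizes, total_pairs):
    """Return the percentage of total_pairs that falls in each insert-size bin."""
    counts = [0] * len(LABELS)
    for size in insert_sizes:
        if size <= 0:
            continue  # every bin requires an insert greater than 0 bp
        # bisect_left places a size equal to an edge in the lower bin (<= edge).
        counts[bisect_left(EDGES, size)] += 1
    return {label: round(100 * n / total_pairs, 2) for label, n in zip(LABELS, counts)}
```

Because the denominator is the total number of read pairs analyzed rather than the number of binned pairs, the percentages need not sum to 100%.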

TnSeq Libraries & Sequencing

For questions about TnSeq library preparation and sample submission, please contact Dr. Walt Lorenz at wlorenz@uga.edu.