Special Genomic Libraries

Hi-C Libraries & Sequencing

Summary: Hi-C is a library preparation technique for investigating the 3D organization of the genome. Briefly, samples are crosslinked with formaldehyde and then digested to generate ‘aggregates’ of covalently linked chromatin segments that were physically near each other in the nucleus. These chromatin aggregates are ligated under very dilute conditions to promote intra-aggregate ligation. Libraries are then constructed using conventional NGS library preparation techniques, and library molecules containing the desired proximity ligation events are enriched using streptavidin beads. We prepare Hi-C libraries using a kit from Dovetail Genomics.

For questions about Hi-C library preparation and sample submission, please contact Dr. Walt Lorenz at wlorenz@uga.edu.

Sample Preparation

Sample Type	Recommendations	Input Amount Needed
Tissue	• Tissues with high cellularity and low fat content are preferred, e.g., brain, muscle, heart, or spleen. • Samples should be collected from a live or recently deceased specimen and snap-frozen in liquid nitrogen. • The protocol does not support fat, bone, or similar tissues. • Do not preserve samples using RNAlater, EtOH, or lyophilization. • Samples should be stored at -80 °C and shipped on dry ice.	20-40 mg per sample
Cells	• Any cell culture is compatible with Hi-C library preparation. • Adherent cells should be dissociated with trypsin.	0.5 x 10^6 cells
Blood	• Blood samples should be collected from a live or recently deceased specimen. • An anticoagulant must be added. EDTA is the preferred anticoagulant. Heparin and citrate (ACD-A) are acceptable alternatives. • Snap-freeze samples in liquid nitrogen and store them at -80 °C. Samples should be shipped overnight on dry ice.	300 µL-1 mL of blood. Samples are normalized to 0.5 x 10^6 cells.
Plants	• Leaves collected from plants at the one- or two-leaf seedling stage are preferred. • Young leaves from mature plants and plant tissue culture can also be used. • Snap-freeze samples in liquid nitrogen and store them at -80 °C. Samples should be shipped overnight on dry ice.	250 mg of snap-frozen tissue per sample

Multiplexing: The Dovetail Hi-C Library Preparation Kit supports the multiplexing of up to 8 samples for sequencing on one Illumina flow cell.

Sequencing: Sequencing of Hi-C libraries is done in two stages. In the first stage, the library is sequenced to generate ~2 million paired-end reads. These reads are used to QC the library using Dovetail’s HiRise software. This step ensures that the libraries have the desired proximity ligations between portions of the genome that are physically near each other in the cell, but distant in the genome assembly. If the library passes this QC step, a full-scale sequencing run is performed to reach the desired sequencing depth. PE75 reads are sufficient for Hi-C analysis, but longer read lengths can be used if desired.
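
As a rough illustration of the arithmetic behind sequencing depth, the number of bases from a paired-end run is read pairs x 2 x read length. The Python sketch below is not part of the Dovetail workflow; the function names are hypothetical, and the coverage figure is raw yield only, ignoring duplicates and unmapped reads.

```python
def sequenced_bases(read_pairs: int, read_length: int = 75) -> int:
    """Total bases from a paired-end run (two reads per pair)."""
    return read_pairs * 2 * read_length


def fold_coverage(read_pairs: int, genome_size_bp: int, read_length: int = 75) -> float:
    """Raw fold coverage: sequenced bases divided by genome size."""
    return sequenced_bases(read_pairs, read_length) / genome_size_bp


if __name__ == "__main__":
    # QC stage: about 2 million PE75 read pairs
    print(sequenced_bases(2_000_000))                 # 300000000
    # Example only: 100 million PE75 read pairs on a 1 Gb genome
    print(fold_coverage(100_000_000, 1_000_000_000))  # 15.0
```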

The sequencing depth and number of Hi-C libraries required for Hi-C analysis depend on genome complexity. The table below lists the recommendations from the kit manufacturer, Dovetail Genomics:

Dovetail Genomics

	Simple Genomes		Complex Genomes
Genome Size (Gb)	No. of Hi-C Libraries	No. of Read Pairs to sequence (Millions)	No. of Hi-C Libraries	No. of Read Pairs to sequence (Millions)
1	1	100	1	100
2	1	200	2	250
3	1	300	3	450
4	2	400	4	500
5	2	500	5	850

Simple Genomes: Are diploid or haploid, with repetitive content of less than 30% and heterozygosity of less than 0.005%. Humans, many mammals, and some fish and birds are examples of simple genomes.

Complex Genomes: Have any of the following: polyploidy, repeat content above 30%, or heterozygosity above 0.005%. Many plants, salmonid fishes, and amphibians are examples of complex genomes. If you are unsure which category your genome of interest is in, Dovetail recommends following the guidelines for complex genomes.
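
The recommendations above can be expressed as a small lookup. This is a minimal Python sketch, not a Dovetail tool: the function names are hypothetical, the values are transcribed from the table, and two choices are assumptions made here, namely rounding the genome size up to the next tabulated row and treating values exactly at a threshold as complex. Unknown properties fall back to the complex guidelines, as Dovetail recommends.

```python
import math

# Genome size (Gb) -> (number of Hi-C libraries, read pairs in millions)
SIMPLE = {1: (1, 100), 2: (1, 200), 3: (1, 300), 4: (2, 400), 5: (2, 500)}
COMPLEX = {1: (1, 100), 2: (2, 250), 3: (3, 450), 4: (4, 500), 5: (5, 850)}


def is_complex(ploidy=None, repeat_pct=None, heterozygosity_pct=None):
    """Classify a genome; any unknown (None) property means 'complex'."""
    if None in (ploidy, repeat_pct, heterozygosity_pct):
        return True
    simple = ploidy <= 2 and repeat_pct < 30 and heterozygosity_pct < 0.005
    return not simple


def recommend(genome_size_gb, complex_genome):
    """Return (libraries, read pairs in millions) for a genome size in Gb."""
    # Assumption: round up to the next tabulated size; the table covers 1-5 Gb.
    size = max(1, math.ceil(genome_size_gb))
    if size > 5:
        raise ValueError("genome size is outside the tabulated range (1-5 Gb)")
    table = COMPLEX if complex_genome else SIMPLE
    return table[size]


if __name__ == "__main__":
    print(recommend(3, is_complex(2, 20, 0.001)))   # (1, 300)
    print(recommend(2.4, is_complex(4, 60, 0.5)))   # (3, 450)
```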

For HiRise analysis, users should provide a draft genome assembly with an N50 greater than 1 Mb and an N90 greater than 20 kb.
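
To check a draft assembly against these thresholds, N50 and N90 can be computed from the list of contig or scaffold lengths. The Python sketch below uses the standard Nx definition (the length of the sequence at which the cumulative sum, taken from longest to shortest, first reaches x% of the total assembly length). The function names are hypothetical, and the lengths would come from your own assembly, for example from a FASTA index.

```python
def nx(lengths, x):
    """Return Nx for a list of sequence lengths (x in percent, e.g. 50 or 90)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold
        if running * 100 >= total * x:
            return length


def meets_input_guidelines(lengths, min_n50=1_000_000, min_n90=20_000):
    """True if N50 > 1 Mb and N90 > 20 kb, the guideline stated above."""
    return nx(lengths, 50) > min_n50 and nx(lengths, 90) > min_n90


if __name__ == "__main__":
    # Made-up lengths for illustration only
    example = [5_000_000, 3_000_000, 1_500_000, 400_000, 100_000]
    print(nx(example, 50))                  # 5000000
    print(nx(example, 90))                  # 1500000
    print(meets_input_guidelines(example))  # True
```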

Example QC data from Hi-C libraries made at the GGBC:

Zea mays Ab10 Strain

Basic Assembly Statistics
Total Length	2,106,338,117 bp
Scaffold N50	223,902,240 bp
Scaffold N90	159,769,782 bp
Largest Scaffold	307,041,717 bp
Basic Library Statistics
Total Read Pairs Analyzed	4,237,074
Library Read Length	75 bp
Profile of Read Insert Distribution
0 bp < Insert <= 1 kbp	14.96%
1 kbp < Insert <= 100 kbp	1.48%
100 kbp < Insert <= 1 Mbp	0.62%
1 Mbp < Insert <= 3 Mbp	0.3%
3 Mbp < Insert <= 5 Mbp	0.16%
5 Mbp < Insert	1.63%

Oryza sativa

Basic Assembly Statistics
Total Length	373,245,519 bp
Scaffold N50	29,958,434 bp
Scaffold N90	23,207,287 bp
Largest Scaffold	43,270,923 bp
Basic Library Statistics
Total Read Pairs Analyzed	2,914,137
Library Read Length	75 bp
Profile of Read Insert Distribution
0 bp < Insert <= 1 kbp	34.2%
1 kbp < Insert <= 100 kbp	3.31%
100 kbp < Insert <= 1 Mbp	2.09%
1 Mbp < Insert <= 3 Mbp	1.37%
3 Mbp < Insert <= 5 Mbp	0.84%
5 Mbp < Insert	2.65%

Homo sapiens

Basic Assembly Statistics
Total Length	3,257,330,713 bp
Scaffold N50	145,138,636 bp
Scaffold N90	58,617,616 bp
Largest Scaffold	248,956,422 bp
Basic Library Statistics
Total Read Pairs Analyzed	5,955,150
Library Read Length	75 bp
Profile of Read Insert Distribution
0 bp < Insert <= 1 kbp	43.48%
1 kbp < Insert <= 100 kbp	5.21%
100 kbp < Insert <= 1 Mbp	0.05%
1 Mbp < Insert <= 3 Mbp	0.04%
3 Mbp < Insert <= 5 Mbp	0.03%
5 Mbp < Insert	1.12%
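
One simple way to compare the three example libraries is to sum the insert bins above 1 kbp, which gives the share of read pairs spanning longer distances. The Python sketch below does only that; the function name is hypothetical, the percentages are copied from the tables above, and no pass/fail threshold is applied because none is stated here. The listed bins do not sum to 100%, and the remainder is left uninterpreted.

```python
# Bin order: 0-1 kbp, 1-100 kbp, 100 kbp-1 Mbp, 1-3 Mbp, 3-5 Mbp, >5 Mbp
QC_INSERT_PCT = {
    "Zea mays Ab10": [14.96, 1.48, 0.62, 0.3, 0.16, 1.63],
    "Oryza sativa": [34.2, 3.31, 2.09, 1.37, 0.84, 2.65],
    "Homo sapiens": [43.48, 5.21, 0.05, 0.04, 0.03, 1.12],
}


def long_range_pct(bin_percentages, first_long_bin=1):
    """Sum the bins from index first_long_bin onward (default: inserts > 1 kbp)."""
    return round(sum(bin_percentages[first_long_bin:]), 2)


if __name__ == "__main__":
    for sample, bins in QC_INSERT_PCT.items():
        print(f"{sample}: {long_range_pct(bins)}% of read pairs with insert > 1 kbp")
```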

TnSeq Libraries & Sequencing

For questions about TnSeq library preparation and sample submission, please contact Dr. Walt Lorenz at wlorenz@uga.edu.