Bioinformatics
Next-generation sequencing (NGS) and third-generation sequencing approaches are ubiquitous in biological, genetic and clinical experimentation. Researchers who need to manipulate or evaluate these data often find that specialized guidance or higher-level bioinformatics expertise is needed to expand their capabilities and accomplish their goals.
The GGBC Bioinformatics team at UGA is composed of experienced research faculty and graduate student interns who apply best-practice methodologies and use open-source and custom-built software to process, analyze and visualize a wide range of NGS datasets.
Consultation on experimental design and proposal development is free. Some of the more common bioinformatics workflows, e.g. RNA-Seq, bacterial genome assembly, transcriptome assembly and variant analysis, are priced in an accessible, modular fashion (see Prices below). More complex analyses, e.g. eukaryotic genome assembly, comparative genomics and many customized workflows, are priced at hourly rates, since accurate estimates are often difficult until the work starts.

Offerings
- Team consultation
- Experimental design
- Variety of computational and bioinformatics analyses
- Training on specific analysis upon request
- Customized analysis pipelines
Guidelines for Different Analysis Workflows
Microbiome 16S/18S/ITS Sequencing Data Analysis
- Type of Data: Targeted sequencing of the 16S, 18S or ITS regions.
- Analysis methods:
Quality-based sequence trimming and removal of adapters and specific primer sequences, removal of chimeric sequences, joining of forward and reverse reads, identification of representative sequences, taxonomy classification, and statistical analysis.
- Deliverables:
Species Richness per sample, Species Relative Abundance among samples, Core Microbiomes, Alpha-diversity, Beta-diversity, Differential Analysis
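As a rough illustration of three of these deliverables, the sketch below computes species richness, relative abundance and one common alpha-diversity measure (the Shannon index) from a per-taxon count table. It is a minimal example, not the pipeline the team runs; the function names, taxon names and counts are hypothetical.

```python
import math


def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)


# Hypothetical sample: taxon names and counts are made up for illustration.
sample = {"Bacteroides": 50, "Lactobacillus": 30, "Escherichia": 20, "Prevotella": 0}

print("richness:", richness(sample))
print("relative abundance:", relative_abundance(sample))
print("Shannon index:", shannon_index(sample))
```

Beta-diversity and differential analysis compare samples with each other, so they need a full sample-by-taxon table rather than a single sample as shown here.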
RNA-Seq
- Type of Data: Illumina PE 50, 75, 100, etc… stranded libraries. Required: annotated (GFF3/GTF) genome or transcriptome.
- Analysis methods:
Data Quality Assessment (Raw and Trimmed): FastQC
Data Quality Trimming: Trimmomatic
Mapping: STAR/Bowtie2, depending on input dataset.
Expression Analysis: DESeq2/edgeR/RSEM, depending on reference and dataset.
- Deliverables:
Quality assessment for all samples (raw and trimmed), PCA and/or BCV analysis of samples, MA plots, list of differentially expressed genes in Excel format including fold change, P values, FDR, normalized counts, etc… as well as ancillary files, e.g. read mapping metrics, BAM files.
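The FDR column in the differential expression table is a multiple-testing-adjusted P value. DESeq2 and edgeR compute it themselves; the sketch below only illustrates the Benjamini-Hochberg adjustment that such a column typically represents. The function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value never exceeds the one ranked above it (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Genes are then usually called differentially expressed when the adjusted value falls below a chosen threshold, which depends on the experiment.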
Small RNA (sRNA) Analysis
- Type of Data: Illumina SE75 (CAP-miRSeq requirements: reference genome, miRBase species accession)
- Analysis methods:
Data Quality Assessment (Raw and Trimmed): FastQC
Data Trimming: CutAdapt
Expression Analysis: CAP-miRSeq Pipeline (Bowtie, Randfold, HTSeq, miRDeep2)
- Deliverables:
Quality assessment for all samples (raw and trimmed), trimmed read distributions, profile of all small RNAs present in each sample, prediction of novel miRNAs.
General expression results: Excel files with mature, raw, normalized and novel miRNA counts. List of differentially expressed miRNAs in Excel format including fold change, P values, FDR, normalized counts.
SNP Analysis
- Type of Data: Illumina PE75. Mapping reference required.
- Analysis methods:
Data Quality Assessment (Raw and Trimmed): FastQC
Data Trimming: Trimmomatic
Mapping and mark duplicates: BWA, Picard
Variant Calling and filtering: GATK
- Deliverables:
Quality assessment for all samples (raw and trimmed), trimmed read metrics, mapping and read duplication stats, SNP and Indel variant call files (VCF) for both raw and filtered datasets.
Bacterial Genome Assembly & Annotation (short read)
- Type of Data: Illumina PE150, PE300
- Analysis methods:
Data Quality Assessment (Raw and Trimmed): FastQC
Data Trimming: Trimmomatic
Assembly: SPAdes
Benchmarking: Quast, BUSCO, BlastN, Mauve
Automated Annotation & prophage discovery: RASTtk, PHASTER
- Deliverables:
Assembly fasta file, Quality assessments for all samples (raw and trimmed), Quast summary metrics (plus/minus reference), BUSCO identification of core gene set, Mauve alignment to closest genome reference and ordering of contigs, BlastN (tabular output), RASTtk annotation (xls, gff, gbk, peptide.fa), PHASTER identification of prophage sequence(s).
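One widely used contiguity metric in Quast-style assembly summaries is N50. The sketch below shows how it can be computed from an assembly fasta file. It is a simplified illustration, not a replacement for Quast; the function names and the toy sequences are hypothetical.

```python
def fasta_lengths(text):
    """Return the length of each sequence in FASTA-formatted text."""
    lengths = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy assembly with made-up contigs.
toy_fasta = ">contig1\nACGTACGTAC\n>contig2\nACGTAC\n>contig3\nACGT\n"
lengths = fasta_lengths(toy_fasta)
print("contig lengths:", lengths)
print("N50:", n50(lengths))
```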
Bacterial Genome Assembly & Annotation (long read)
- Type of Data: PacBio
- Analysis methods:
Error correction, assembly and contig polishing: Canu, BLASR, Arrow
Benchmarking: Quast, BUSCO.
- Deliverables:
Assembly fasta file, Contig coverage plots, Quast summary metrics (plus/minus reference), BUSCO identification of core gene set, Mauve alignment to closest genome reference and ordering of contigs, BlastN (tabular output), RASTtk annotation (xls, gff, gbk), PHASTER identification of prophage sequence(s).
Eukaryotic Genome Assembly (de novo, short read)/Custom
- Type of Data: Illumina PE
- Analysis methods:
Data Quality Assessment (Raw and Trimmed): FastQC
Data Trimming: Trimmomatic
Assembly: Velvet/SOAPdenovo2/ABySS (depending on dataset)
Benchmarking: Quast, BUSCO, BlastN
- Deliverables:
Assembly fasta file, Quast summary metrics (plus/minus reference), BUSCO identification of core gene set, BlastN (tabular output).
Transcriptome Assembly (short read)
- Type of Data: PE75, PE150
- Analysis methods:
Data Quality Assessment (Raw and Trimmed): FastQC
Data Trimming: Trimmomatic
Assembly: Trinity
Benchmarking: Quast, BUSCO
- Deliverables:
Assembly fasta file, Quast summary metrics (plus/minus reference), BUSCO identification of core gene set.
Transcriptome Assembly (long read)
- Type of Data: PacBio
- Analysis methods:
Error Correction, assembly and contig polishing: IsoSeq3
Transcript clustering: Minimap2, Cupcake (+ reference genome), Cogent (-/+ reference genome)
Benchmarking: Quast, BUSCO
- Deliverables:
High quality and low quality fasta and fastq transcript files, locus collapsed and 5’ degradation filtered assembly fastas and gff file. Quast summary metrics (plus/minus reference), BUSCO identification of core gene set.
Single-cell RNA-Seq Data Analysis
- Type of Data: 10x Single-cell RNA-sequencing Data
- Analysis methods:
Data Quality Control
Data processing: Normalization, Principal component analysis (PCA) dimension reduction, Clustering/Community Detection.
Data visualization: t-Distributed Stochastic Neighbor Embedding (t-SNE)/Uniform Manifold Approximation and Projection (UMAP) cell visualization.
Cell trajectory analysis, pseudo-time analysis, differential expression analysis, single-cell time series analysis
- Deliverables:
Comprehensive report on all requested analyses, including code, figures and written explanations in a paper-style format.
Analysis on CRISPR Genetic screens
- Type of Data: Genome-wide screen data
- Analysis methods:
Generation of the raw count table using MAGeCK, followed by one of: MAGeCK raw counts with batch-effect removal; MAGeCK raw counts without batch-effect removal; MAGeCK raw counts analyzed with the DESeq2 package.
- Deliverables:
Comprehensive report on all requested analyses, including code, figures and written explanations in a paper-style format.
Circular Consensus Sequencing (CCS) Read Generation
- Type of Data: PacBio Sequel II
- Analysis methods:
Generation of CCS reads resulting from alignment between subreads taken from a single ZMW. CCS reads are advantageous for amplicon, RNA, and genome sequencing projects and are highly accurate (>99% accuracy, Q>20).
- Deliverables:
A CCS.bam file for each SMRT cell run and a text file containing read generation statistics and metrics.
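The accuracy and Q values quoted for CCS reads are two views of the same number: a Phred quality score Q corresponds to an error probability of 10^(-Q/10), so Q20 means 99% accuracy. The short sketch below shows the conversion in both directions; the function names are illustrative only.

```python
import math


def phred_to_accuracy(q):
    """Accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Phred score implied by an accuracy: Q = -10 * log10(1 - accuracy)."""
    error = 1.0 - accuracy
    if error <= 0:
        raise ValueError("accuracy must be below 1 for a finite Phred score")
    return -10 * math.log10(error)


for q in (20, 30, 40):
    print(f"Q{q} -> accuracy {phred_to_accuracy(q):.4%}")
```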
Prices
Service | UGA Fee | Non-UGA Fee | Commercial Fee |
---|---|---|---|
De novo transcriptome assembly from Illumina short reads | $1,500 | $1,770 | $1,875 |
De novo transcriptome assembly from PacBio Iso-Seq data | $1,500 | $1,770 | $1,875 |
Assembly of small (< 50 Mb) eukaryotic genomes | custom | custom | custom |
Microbial genome assembly & annotation (1-2 genomes) from PacBio long reads | $1,000 | $1,180 | $1,250 |
Microbial genome assembly & annotation (3-5 genomes) from PacBio long reads | $1,500 | $1,770 | $1,875 |
Microbial genome assembly & annotation (6-10 genomes) from PacBio long reads | $2,000 | $2,360 | $2,500 |
Bacterial draft genome: sequencing using Illumina short reads/assembly/annotation | $1,000 | $1,180 | $1,250 |
Transcriptome annotation (basic) | $500 | $590 | $625 |
RNA-Seq or sRNA-Seq analysis (up to 24 samples, single mapping reference) | $1,500 | $1,770 | $1,875 |
RNA-Seq or sRNA-Seq analysis (25 to 72 samples, single mapping reference) | $2,000 | $2,360 | $2,500 |
RNA-Seq or sRNA-Seq analysis (73 to 96 samples, single mapping reference) | $2,500 | $2,950 | $3,125 |
RNA-Seq or sRNA-Seq analysis (for each additional mapping reference) | $500 | $590 | $625 |
Single Cell RNA-Seq analysis | $1,000 | $1,180 | $1,250 |
GO tag and InterProScan annotation | $300 | $354 | $375 |
SNP detection/calling/filtering (up to 24 samples) | $1,500 | $1,770 | $1,875 |
GBS analysis using STACKS (up to 96 samples) | $1,500 | $1,770 | $1,875 |
Microbiome analysis (up to 24 samples) | $500 | $590 | $625 |
Microbiome analysis (25-96 samples) | $1,000 | $1,180 | $1,250 |
Bacterial genome submission to NCBI | $250 | $295 | $313 |
PacBio: CCS read generation | $300 | $354 | $375 |
Hourly rate for custom jobs and/or personnel training | $75 | $89 | $94 |
Please contact GGBC to inquire about pricing for any bioinformatics services not listed above.
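For rough budgeting, the listed fees can be reproduced with a little arithmetic. In the table above, every Non-UGA fee equals the UGA fee times 1.18 and every Commercial fee equals the UGA fee times 1.25, rounded to the nearest dollar. The sketch below relies on that observation, which is inferred from the listed prices and is not a stated GGBC policy, and applies it to the RNA-Seq sample tiers. All names are hypothetical, and an actual quote should come from GGBC.

```python
# Multipliers (in percent) inferred from the published table.
# Assumption for illustration, not an official pricing rule.
MULTIPLIER_PERCENT = {"uga": 100, "non_uga": 118, "commercial": 125}


def scaled_fee(uga_fee, affiliation):
    """Scale a whole-dollar UGA fee, rounding half up to whole dollars."""
    percent = MULTIPLIER_PERCENT[affiliation]
    # Integer arithmetic avoids floating-point surprises at the .5 boundary.
    return (uga_fee * percent + 50) // 100


def rnaseq_uga_fee(n_samples, n_references=1):
    """UGA fee for RNA-Seq or sRNA-Seq analysis, following the listed tiers."""
    if not 1 <= n_samples <= 96:
        raise ValueError("sample count outside the listed tiers; contact GGBC")
    if n_references < 1:
        raise ValueError("at least one mapping reference is required")
    if n_samples <= 24:
        base = 1500
    elif n_samples <= 72:
        base = 2000
    else:
        base = 2500
    # Each mapping reference beyond the first adds the listed $500 UGA fee.
    return base + 500 * (n_references - 1)


# Example: 30 samples mapped against two references, non-UGA academic user.
uga_total = rnaseq_uga_fee(30, n_references=2)
print("UGA estimate:", uga_total)
print("Non-UGA estimate:", scaled_fee(uga_total, "non_uga"))
```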