QBCG & GGF Bioinformatics

 

GGF, as part of the Quantitative Biology Consulting Group (QBCG), will provide bioinformatics support for your project. To request a free consultation, please visit QBCG.

The analysis workflows are divided into basic modules, each priced according to the time, resources, and expertise it requires. This approach lets users purchase only the parts they need and gives each step in the analysis a clear deliverable.

 

The types of services offered are:
  • Team consultation: Prior to grants and experiments (2 hours free)
  • You-can-do-it: Recommendations on software, scripts, pipelines, and resources (BBB) (request funds for bioinformatics helpline)
  • Tutorial: Module-based tutoring sessions
  • A la carte: Order specific analysis workflows or modules

Conceptual bioinformatics workflow

The conceptual bioinformatics workflow is as follows:

  • Experimental design
  • Data suitability assessment
  • Quality and variables assessment
  • Reference assessment and preparation
  • Analysis workflows
  • Training in how to perform the above steps (if desired)

Guidelines for modules of data assessment and analysis workflows

Analysis of Data Suitability for Goal Assessment

This is primarily for clients who have generated data without any input from GGF/QBCG.

  • Check the data suitability for the intended experimental objectives
  • Check the experimental design and level of sequence coverage
  • Suggest suitable algorithms and pipelines

Quality control and experimental variables assessment (trimming and cleaning)

  • De-multiplexing
  • Quality-based and adapter trimming
  • Removal of artifacts and homopolymers
  • Assess differences in quality and quantity between the libraries and replicates in the same experiment
  • Assess differences between libraries sequenced on different lanes, flowcells, or platforms
  • Assess differences between the different kinds of libraries
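
As an illustration of the trimming steps above, here is a minimal sketch of 3' quality trimming and exact-match adapter removal for a single read. The function names, the Phred+33 encoding, and the default threshold of 20 are assumptions made for this example; production pipelines typically use dedicated trimming tools that also handle mismatches and partial adapters.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q (hypothetical default)."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def trim_adapter(seq, quality, adapter):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, quality
    return seq[:pos], quality[:pos]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
    print(trim_adapter("ACGTAGATCGG", "I" * 11, "AGATCGG"))
```

Trimming only from the 3' end keeps the sketch short; a sliding-window approach would be a reasonable alternative when quality drops in the middle of a read.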

Data normalization and fitting

The available differential expression (DE) algorithms (TSPM, GLM, edgeR, DESeq, baySeq, Cuffdiff, and rDiff) assume different distribution models (Poisson, quasi-Poisson, generalized linear, and negative binomial). It is important to test how well the data fit each model and to choose the best-fitting one. This analysis includes:

  • Assess data dispersion by comparing the five-number summary (minimum, first quartile, median, third quartile, maximum) across all variables, i.e. individuals, conditions, and replicates.
  • Draw distribution plots and scatter plots to compare variables.
  • Assess the ratio of high- to low-abundance reads across all variables (this module requires a reasonable background in statistics).
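
The dispersion check above can be sketched with the Python standard library. The five-number summary follows the bullet directly. The variance-to-mean ratio is added here only as a rough, illustrative heuristic (a ratio near 1 is consistent with a Poisson model, while a ratio well above 1 suggests overdispersion, which negative binomial models are designed to handle); it is an assumption of this example, not a description of the service's actual procedure.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of counts."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def dispersion_index(values):
    """Sample variance divided by the mean (requires at least two values)."""
    return statistics.variance(values) / statistics.mean(values)


if __name__ == "__main__":
    # Hypothetical read counts for one gene across replicates.
    replicate_counts = [0, 0, 10, 10]
    print(five_number_summary(replicate_counts))
    print(dispersion_index(replicate_counts))
```

Comparing these summaries between replicates, conditions, and individuals gives a quick first look before fitting any formal model.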

Genome/transcriptome reference assessment and preparation

This is for clients who do not have reference sequence data and need us to obtain a reference from public or other data repositories and assess its suitability.

  • Locate and download the reference genome, transcriptome, annotation files, or any required reference data.
  • Check the file formats and convert them if needed.

Read mapping to a reference genome

This is only the mapping component and includes:

  • Testing different mapping algorithms and selection of the best one for the particular data set.
  • Optimization of the mapping parameters.
  • Generation of mapping files in a common format, e.g. SAM/BAM.
  • Generation of files for mapped and un-mapped reads.
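
The last step above, separating mapped from un-mapped reads, relies on the SAM FLAG field (second column), where bit 0x4 marks an un-mapped segment. The sketch below works on plain-text SAM lines only; the function name is hypothetical, and binary BAM files would need a dedicated tool or library instead.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def split_sam_lines(lines):
    """Partition SAM text lines into (header, mapped, unmapped) lists."""
    header, mapped, unmapped = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are kept separately so they can be written to both outputs.
            header.append(line)
            continue
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED_FLAG:
            unmapped.append(line)
        else:
            mapped.append(line)
    return header, mapped, unmapped


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "@SQ\tSN:chr1\tLN:1000",
        "read1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
    ]
    header, mapped, unmapped = split_sam_lines(example)
    print(len(header), len(mapped), len(unmapped))
```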

Mapping to a reference for transcriptome assembly

Transcriptome reference assembly (mapping, clustering, and exporting consensus transcript sequences):

  • Testing of different mapping algorithms and selection of the best one for the particular data set.
  • Optimization of the mapping parameters.
  • Generation of mapping files in a common format.
  • Generation of files for mapped and un-mapped reads.
  • Generation of a sequence file containing the assembled sequences.

Mapping to a reference transcriptome(s) for transcriptional analysis

This is mapping to a reference for gene expression profiling and isoform detection.

  • Testing different mapping algorithms and selection of the best one for the particular data set.
  • Optimization of the mapping parameters.
  • Generation of mapping files in a common format.
  • Generation of files for mapped and un-mapped reads.
  • Generation of a counts file (RPM, RPKM, or FPKM) for each replicate, condition, sample, and experiment.
  • Generation of a new isoforms file (only if a well-annotated reference is available).
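
For readers unfamiliar with the count units listed above, the following sketch shows the standard definitions: RPM scales a raw count by the total number of mapped reads (in millions), and RPKM additionally divides by the feature length in kilobases. FPKM uses the same formula with fragments (read pairs) counted instead of individual reads. The function names and the example numbers are made up for illustration.

```python
def rpm(count, total_mapped):
    """Reads per million mapped reads."""
    return count * 1_000_000 / total_mapped


def rpkm(count, length_bp, total_mapped):
    """Reads per kilobase of feature per million mapped reads.

    Passing fragment counts instead of read counts gives FPKM.
    """
    return count * 1_000_000_000 / (total_mapped * length_bp)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 2,000 bp long, in a library of 10 million mapped reads.
    print(rpm(500, 10_000_000))
    print(rpkm(500, 2000, 10_000_000))
```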

Mapping to a reference for exome-capture analysis

This is primarily a custom analysis for enrichment and capture experiments.

  • Mapping of the captured reads from hundreds of samples (individuals) to a common reference.
  • Assembly of the individual exomes from each sample.
  • Comparison of the exomes to each other and to the distant reference.
  • Performance of either a phylogenetic or an expression analysis.

De novo genome assembly (viral/bacterial)

De novo assembly of viral/bacterial genomes:

  • Generation of contigs.
  • Statistical assessment of the assembly.
  • Comparison to public database or other reference.
  • Assessment of the ortholog gene regions (benchmarks).
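
A statistical assessment of an assembly usually includes basic contiguity metrics. The sketch below computes contig count, total length, longest contig, and N50 (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the assembly). Which metrics a given report contains is an assumption here; the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig")


def assembly_stats(lengths):
    """Summarize an assembly given its contig lengths in base pairs."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths),
        "n50_bp": n50(lengths),
    }


if __name__ == "__main__":
    # Made-up contig lengths for illustration.
    print(assembly_stats([100, 200, 300, 400]))
```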

De novo genome assembly, assembly stats, and ortholog benchmarking (<50 Mb eukaryotic genomes)

De novo assembly of small eukaryotic genomes:

  • Assembly of contigs/scaffolds.
  • Statistical assessment of the assembly.
  • Comparison to public databases.
  • Assessment of the ortholog gene regions (benchmarks).

De novo genome assembly & assessment (large, complex genomes)

This is a very complex project that requires teamwork and extensive discussion with the client to define the deliverables, timeframe, and cost.

Automated genome annotation (Bacterial/viral genome)

BLAST against reference databases and parsing of the results, or use of the RAST pipeline.

Automated eukaryotic genome annotation (large complex genome, custom)

This is a very complex project that requires teamwork and extensive discussion with the client to define the deliverables, timeframe, and cost.

De novo transcriptome assembly

  • Assemble using different k-mers and select the best range of k-mers.
  • Re-assemble using an overlap-based method.
  • Generate contigs/scaffolds.
  • Statistical assessment of the assembly.
  • Comparison to public databases (mRNA and protein).
  • Assessment of the ortholog gene regions (benchmarks).

Transcriptome annotation

  • Reciprocal BLAST against nucleotide and protein databases.
  • Parsing the results.
  • Generation of a spreadsheet for the gene description.
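
One common way to parse reciprocal BLAST results is to keep reciprocal best hits: pairs in which each sequence is the other's top-scoring match. The sketch below assumes this interpretation and assumes BLAST tabular output with the default 12 columns (query ID first, subject ID second, bit score last); both are assumptions of this example rather than a statement of the exact procedure used, and all function names are hypothetical.

```python
def parse_tabular(lines):
    """Yield (query, subject, bitscore) from default 12-column BLAST tabular lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[11])


def best_hits(rows):
    """Map each query to the subject with the highest bit score."""
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


if __name__ == "__main__":
    # Made-up hits: transcripts (t*) searched against proteins (p*), and the reverse search.
    forward = [("t1", "p1", 200.0), ("t1", "p2", 150.0), ("t2", "p2", 90.0)]
    reverse = [("p1", "t1", 210.0), ("p2", "t1", 140.0)]
    print(reciprocal_best_hits(forward, reverse))
```

The resulting pairs can then be joined with the subject descriptions to build the gene description spreadsheet.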

GO tag and InterProScan annotation

  • Assignment of gene ontology tags to transcript or protein sequences.
  • Assignment of InterProScan-derived tags, e.g. Pfam and other HMM tags.
  • Parsing GO and IPS results into spreadsheet format for gene description.

Differential expression analysis

  • Assuming reads are already mapped and available in “SAM” or “BAM” format.

SNPs detection/calling/filtering

  • Assuming reads are already mapped and available in “SAM” or “BAM” format.

*Logical grouping refers to multiple samples generated in the same experiment under the same data generation conditions (e.g. each of 96 wells in the same plate) such that the same analysis would apply to each sample.

Prices

See QBCG for prices