Gene expression prediction

From BioUML platform
Revision as of 19:27, 1 April 2018 by Fedor Kolpakov (Talk | contribs)

Method, code, references | Input data | Algorithm | Accuracy | Comment
INVOKE (R script)[1]

https://github.com/SchulzLab/TEPIC/tree/master/MachineLearningPipelines/INVOKE

Input:

  • TF-gene scores (calculated by TEPIC) from:
    • open chromatin data (DNaseI-seq, NOMe-seq)
    • PWMs (JASPAR, HOCOMOCO, UniPROBE)
  • expression data (RNA-seq)

Output:

  • regression coefficients for each TF
  • model performance: Pearson correlation, Spearman correlation, and MSE
    • boxplot showing model performance
    • heatmap (top 10 positive and negative coefficients)
    • scatter plots of predicted versus measured gene expression
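
The three performance measures listed above can be illustrated with a short sketch. This is not the INVOKE R code; the function names and the toy `measured` / `predicted` values are assumptions made only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mse(x, y):
    """Mean squared error between measured and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

# Hypothetical toy data: measured vs. predicted expression of five genes
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.8, 3.2, 3.9, 6.0]

print(pearson(measured, predicted))
print(spearman(measured, predicted))
print(mse(measured, predicted))
```

Pearson correlation is sensitive to the scale of the predictions, while Spearman correlation only checks whether genes are ranked in the right order, which is why reporting both alongside MSE is informative.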

INVOKE offers linear regression with various regularisation techniques (Lasso, Ridge, Elastic net) to infer potentially important transcriptional regulators by predicting gene expression from TEPIC TF-gene scores.
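
A minimal sketch of regularised linear regression on a gene-by-TF score matrix, assuming NumPy is available. It is not the INVOKE implementation: the coordinate-descent solver, the objective scaling, the parameter names `lam` and `alpha`, and the synthetic data are all assumptions chosen for illustration. Setting `alpha=1` gives the Lasso, `alpha=0` gives Ridge, and values in between give the Elastic net.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t (the Lasso proximal step)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2).
    Assumes the columns of X are standardized and y is centered."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # residual with the contribution of feature j added back
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
    return beta

# Synthetic stand-in for TF-gene scores: 200 genes, 5 TFs (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=200)
y = y - y.mean()

beta_lasso = elastic_net(X, y, lam=0.1, alpha=1.0)
beta_enet = elastic_net(X, y, lam=0.1, alpha=0.5)
print(np.round(beta_lasso, 2))
print(np.round(beta_enet, 2))
```

TFs whose coefficients stay clearly non-zero under regularisation are the candidates for "potentially important" regulators; in this toy example those are the first and third features.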

HepG2: r = 0.68
K562: r = 0.68
GM12878: r = 0.58

2009: an approach based on feature extraction from ChIP-Seq signals, principal component analysis, and regression-based component selection [2]

Input:

  • ChIP-seq data
  • expression data (RNA-seq)

Output:

  • log-linear regression model
  • principal components with weights of corresponding TFs

Algorithm:

  • for each TF and each gene, compute a TF association strength (TFAS): the weighted sum of the corresponding ChIP-Seq signal strengths, where the weights reflect the proximity of the signal to the gene
  • apply principal component analysis (PCA) to extract uncorrelated characteristic patterns from the TFAS vectors
  • the centered and standardized TFAS matrix A is decomposed by singular value decomposition (SVD)
  • select components by regression
  • model gene expression with a log-linear regression on the selected components

mouse ESCs: r = 0.806, R² = 0.65, CV-R² = 0.64
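
The steps of this approach can be sketched end to end on synthetic data, assuming NumPy is available. Several details here are assumptions rather than the published method: the exponential distance weighting with scale `d0`, the |t| > 2 rule used for component selection, and all simulated peaks, gene positions, and expression values.

```python
import numpy as np

def tfas(peak_pos, peak_signal, tss, d0=5000.0):
    """TFAS of one TF for every gene: peak signals summed with weights that
    decay with distance to the TSS (exponential decay is an assumption)."""
    dist = np.abs(tss[:, None] - peak_pos[None, :])  # genes x peaks
    return (peak_signal[None, :] * np.exp(-dist / d0)).sum(axis=1)

def fit_ols(D, y):
    """Ordinary least squares; returns coefficients and their t-statistics."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    sigma2 = resid @ resid / (len(y) - D.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(D.T @ D)))
    return coef, coef / se

rng = np.random.default_rng(1)
n_genes, n_tfs, n_peaks = 300, 4, 50
tss = rng.integers(0, 1_000_000, size=n_genes).astype(float)

# TFAS matrix A (genes x TFs) from simulated ChIP-Seq peaks
A = np.column_stack([
    tfas(rng.integers(0, 1_000_000, size=n_peaks).astype(float),
         rng.gamma(2.0, 1.0, size=n_peaks), tss)
    for _ in range(n_tfs)
])

# Center and standardize A, then decompose it by SVD
Z = (A - A.mean(axis=0)) / A.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = U * s  # principal component scores (equal to Z @ Vt.T)

# Simulated expression that depends log-linearly on the TFAS values
w_true = np.array([1.0, -0.5, 0.0, 0.8])
expr = np.exp(1.0 + Z @ w_true + rng.normal(scale=0.3, size=n_genes))

# Regression-based component selection, then the final log-linear model
y = np.log(expr)
D_full = np.column_stack([np.ones(n_genes), pcs])
_, t_stats = fit_ols(D_full, y)
keep = np.abs(t_stats[1:]) > 2.0  # skip the intercept
D_sel = np.column_stack([np.ones(n_genes), pcs[:, keep]])
coef, _ = fit_ols(D_sel, y)
pred = D_sel @ coef

r = np.corrcoef(pred, y)[0, 1]
tf_weights = Vt[keep].T  # TF loadings of the selected components
print(round(r, 3), keep)
```

Because the principal components are uncorrelated, each component can be tested and dropped independently of the others, which is what makes a simple per-component selection rule workable.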


References

  1. PMID 27899623 [Schmidt217]
  2. PMID 19995984 [Ouyang2009]