Search for enriched TFBSs (tracks) (analysis)

Latest revision as of 18:14, 9 December 2020

Analysis title: Search for enriched TFBSs (tracks)
Provider: geneXplain GmbH
Class: EnrichedTFBSFinderTx
Plugin: com.genexplain.analyses (geneXplain analyses)

Parameters

Yes set - Study track / Track with intervals of interest
No set - Background track / Track of non-bound intervals
Sequence source - Choose a deployed sequence source from the pull-down list. Selecting Custom lets you specify a custom sequence collection.
Sequence collection - Resource / Folder with sequences containing Yes and No intervals
Input motif profile - Profile of weight matrices
Output path - Path in workspace to store output table
Initial cut-off - Score cut-off at which the search for the optimal threshold starts, given as the frequency of predicted sites per base
Analyze multiple No sets - Sample the specified number of No sets of the specified size and analyze each of them
Number of samples - Number of No set samples
Sample size - Number of No set sequences to sample
Site enrichment cutoff - Threshold for enrichment of sites in Yes set
Site FDR cutoff - Threshold for FDR of site enrichment in Yes set
Sequence enrichment cutoff - Threshold for enrichment of site-containing sequences in Yes set
Sequence FDR cutoff - Threshold for FDR of sequence enrichment in Yes set

Output

The output contains the columns described below. Columns highlighted in bold are shown in the default view; the other columns can be added on demand via the Columns tab of the lower right panel (available when the output table is open).

Adj. site FE: Adjusted fold enrichment of sites in Yes set
Site FDR: FDR of site enrichment (Benjamini-Hochberg method)
Adj. seq FE: Adjusted fold enrichment of site-containing Yes sequences
Seq FDR: FDR of sequence enrichment (Benjamini-Hochberg method)
#Yes sites per 1K: Number of sites per 1000 scanned windows in Yes set
#No sites per 1K: Number of sites per 1000 scanned windows in No set
Site P-value: P-value of site enrichment (binomial test)
Site cutoff: Score cut-off with best site enrichment
%Yes seq: Percentage of Yes sequences with at least one site
%No seq: Percentage of No sequences with at least one site
Seq P-value: P-value of sequence enrichment (Fisher test)
Seq cutoff: Score cut-off with best sequence enrichment
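The statistics named in the columns above (binomial site P-value, Fisher sequence P-value, Benjamini-Hochberg FDR) can be illustrated with the Python sketch below. It is not the geneXplain implementation: the function names are hypothetical, and using the No-set site rate as the binomial success probability as well as a one-sided (enrichment) Fisher test are assumptions.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), e.g. the chance of at least k sites
    in n scanned Yes windows if each window is hit with the No-set rate p
    (assumption). Terms are summed in log space so large n cannot overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P-value for enrichment in the 2x2 table
        [[a, b],   # Yes sequences with / without a site
         [c, d]]   # No sequences with / without a site
    i.e. the hypergeometric probability of a or more site-containing Yes sequences."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hits = sum(math.comb(row1, x) * math.comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return hits / math.comb(n, col1)  # exact integer ratio, converted to float once


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDRs), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, each matrix would contribute one site P-value and one sequence P-value, and `benjamini_hochberg` would be applied across all matrices to obtain the Site FDR and Seq FDR columns.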

Description

This method searches for enriched transcription factor binding sites (TFBSs) given a set of Position-specific Frequency Matrices (PFMs), e.g. as collected in TRANSFAC®.
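To make the idea of scanning sequences with a PFM concrete, here is a minimal Python sketch that converts a PFM into log-odds weights, scores every window, and reports sites per 1000 scanned windows (compare the #Yes/#No sites per 1K columns). The log-odds scoring with a uniform background and a pseudocount, forward-strand-only scanning, and the function names are assumptions for illustration; the scores actually computed by EnrichedTFBSFinderTx may differ.

```python
import math

BASES = "ACGT"


def pfm_to_log_odds(pfm, pseudocount=0.5, background=0.25):
    """Convert a PFM (list of per-position {base: count} dicts) to log2-odds
    weights against a uniform background (assumption)."""
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score_windows(seq, pwm):
    """Score every window of length len(pwm) on the forward strand only."""
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        scores.append(sum(pwm[i][b] for i, b in enumerate(window)))
    return scores


def sites_per_1k(seqs, pwm, cutoff):
    """Number of windows scoring >= cutoff per 1000 scanned windows."""
    n_windows = 0
    n_sites = 0
    for seq in seqs:
        scores = score_windows(seq.upper(), pwm)
        n_windows += len(scores)
        n_sites += sum(1 for s in scores if s >= cutoff)
    return 1000.0 * n_sites / n_windows if n_windows else 0.0
```

For example, a toy 4-position matrix built as `[{b: (10 if b == c else 0) for b in BASES} for c in "TGAC"]` finds one site (the exact match TGAC) among the six windows of `["TTGACA", "CCCCCC"]` at a cut-off of 5.0, i.e. about 166.7 sites per 1K.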

The fold enrichment of sites (Site FE) and that of sequences with at least one site (Seq FE) are optimized and reported as statistically corrected odds ratios (99% confidence interval). The reported values correct for small numbers of sites or sequences by taking their possible variability into account, and are therefore more suitable for ranking PFMs by their fold enrichment in Yes sequences.
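One common way to obtain a "statistically corrected" odds ratio is to report the lower bound of its confidence interval, so that ratios backed by few counts are pulled down. The Python sketch below does this with a Wald interval on the log odds ratio. That this is the correction used by the tool, the 0.5 cell correction, and the function name `adjusted_odds_ratio` are all assumptions.

```python
import math
from statistics import NormalDist


def adjusted_odds_ratio(yes_hits, yes_total, no_hits, no_total, confidence=0.99):
    """Lower bound of a Wald confidence interval for the odds ratio.

    yes_hits / yes_total: e.g. Yes windows with a site / all scanned Yes windows
    no_hits / no_total:   the same counts for the No set
    Assumption: 0.5 is added to every cell (Haldane-Anscombe correction)
    so that zero counts do not break the logarithm.
    """
    a = yes_hits + 0.5
    b = yes_total - yes_hits + 0.5
    c = no_hits + 0.5
    d = no_total - no_hits + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    return math.exp(log_or - z * se)
```

Two matrices with the same raw odds ratio (about 4.3) illustrate the effect: with 4 of 40 Yes and 1 of 40 No windows hit, the adjusted value falls below 1, whereas with 400 of 4000 and 100 of 4000 it stays above 3, so the better supported matrix ranks higher.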

Furthermore, the algorithm seeks optimal score thresholds for each type of enrichment separately and reports False Discovery Rates (FDRs) in addition to uncorrected P-values.
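The threshold search described above can be sketched in Python as a scan over candidate cut-offs at or above the initial cut-off, keeping the one with the best enrichment. Here a pseudocount-smoothed ratio of site rates stands in for the tool's adjusted fold enrichment (an assumption), and all function names are hypothetical.

```python
from bisect import bisect_left


def count_at_or_above(sorted_scores, cutoff):
    """Number of scores >= cutoff in an ascending sorted list."""
    return len(sorted_scores) - bisect_left(sorted_scores, cutoff)


def smoothed_fold_enrichment(yes_sites, yes_windows, no_sites, no_windows, pseudo=0.5):
    """Ratio of Yes and No site rates; the pseudocount damps ratios built on few sites."""
    return ((yes_sites + pseudo) / yes_windows) / ((no_sites + pseudo) / no_windows)


def optimize_cutoff(yes_scores, no_scores, initial_cutoff):
    """Try every observed Yes score >= initial_cutoff as cut-off; keep the best."""
    yes_sorted = sorted(yes_scores)
    no_sorted = sorted(no_scores)
    candidates = sorted({s for s in yes_scores if s >= initial_cutoff})
    best_cutoff, best_fe = None, float("-inf")
    for cutoff in candidates:
        fe = smoothed_fold_enrichment(
            count_at_or_above(yes_sorted, cutoff), len(yes_sorted),
            count_at_or_above(no_sorted, cutoff), len(no_sorted))
        if fe > best_fe:
            best_cutoff, best_fe = cutoff, fe
    return best_cutoff, best_fe
```

Only observed Yes scores need to be tried: between two consecutive Yes scores the Yes count is constant, while the No count can only shrink as the cut-off rises, so the upper end of each interval is always at least as good. For the sequence-level optimization, the same scan would count sequences whose best window score reaches the cut-off instead of individual windows.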

An initial (low, permissive) score threshold for the optimization is estimated from the sequences in the No set. This threshold is specified by a single parameter, the frequency of predicted sites per base (see Expert options), which avoids the need to compile a PFM profile.
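Translating a site frequency per base into a per-matrix score threshold can be sketched as a quantile of the No-set window scores: the cut-off is chosen so that roughly frequency × No-set length windows pass. The Python function below is a hypothetical illustration of that idea, not the tool's code; with tied scores, slightly more windows than requested may pass.

```python
def initial_cutoff(no_scores, no_bases, site_frequency):
    """Score cut-off such that about site_frequency * no_bases No windows pass.

    no_scores:      score of every scanned No window for one matrix
    no_bases:       total length of the No sequences in bases
    site_frequency: requested predicted sites per base, e.g. 0.001
    """
    wanted = max(1, round(site_frequency * no_bases))
    if wanted > len(no_scores):
        return min(no_scores)  # every window passes; nothing more permissive exists
    ranked = sorted(no_scores, reverse=True)
    return ranked[wanted - 1]
```

With 100 No windows scored 0 to 99 and 1000 No bases, a frequency of 0.01 asks for 10 sites and yields a cut-off of 90, while 0.001 asks for a single site and yields 99.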

To ensure smooth performance, the routine imposes some limits on the input. The Yes and No sequence sets must comprise at most 10 million bases, and PFMs are expected to have at least 4 positions. Finally, the initial frequency cut-off should have 10-fold support in the No set, i.e. the cut-off multiplied by the number of No set bases should be at least 10. For example, with a cut-off of 0.001, a No set of 1000 bases (1 expected site) would be too small, whereas a No set of 10000 bases (10 expected sites) would be just at the limit.
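The input limits above can be expressed as a simple pre-flight check. The Python sketch below is hypothetical (the tool's own validation and messages are not documented here); it assumes the 10 million base limit applies to the Yes and No sets separately, which the text leaves open.

```python
MAX_BASES = 10_000_000  # documented limit of 10 million bases (assumed per set)
MIN_PFM_LENGTH = 4      # PFMs are expected to have at least 4 positions
MIN_SUPPORT = 10        # initial frequency cut-off needs 10-fold support in the No set


def check_inputs(yes_bases, no_bases, pfm_lengths, site_frequency):
    """Return a list of problems; an empty list means the inputs look acceptable."""
    problems = []
    if yes_bases > MAX_BASES:
        problems.append(f"Yes set has {yes_bases} bases (limit {MAX_BASES})")
    if no_bases > MAX_BASES:
        problems.append(f"No set has {no_bases} bases (limit {MAX_BASES})")
    short = [i for i, length in enumerate(pfm_lengths) if length < MIN_PFM_LENGTH]
    if short:
        problems.append(f"PFMs with fewer than {MIN_PFM_LENGTH} positions: {short}")
    expected_sites = site_frequency * no_bases
    if expected_sites + 1e-9 < MIN_SUPPORT:  # tolerance for floating-point rounding
        problems.append(
            f"cut-off {site_frequency} expects only {expected_sites:g} sites "
            f"in {no_bases} No bases (need at least {MIN_SUPPORT})")
    return problems
```

Running it on the example from the text, `check_inputs(5000, 1000, [8], 0.001)` reports one problem (1 expected site), while `check_inputs(5000, 10000, [8], 0.001)` returns an empty list because 10 expected sites is exactly at the limit.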

To handle incidental enrichment of biologically meaningless PFMs in particular combinations of Yes and No sets, the program can draw a specified number of samples from a sufficiently large No sequence set and carry out the enrichment analysis for each No sample (option "Analyze multiple No sets"). It then prepares a summary output showing, for each matrix, with how many No sets it satisfied the given thresholds.
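The multiple-No-set procedure described above amounts to a resampling loop with a per-matrix tally. In the Python sketch below, `analyze` is a hypothetical callable standing in for one complete enrichment run against the fixed Yes set, and sampling without replacement within each sample is an assumption.

```python
import random
from collections import Counter


def count_passing_samples(no_pool, n_samples, sample_size, analyze, seed=None):
    """Count, per matrix, in how many No samples it passed the thresholds.

    no_pool:  list of No sequences to sample from
    analyze:  hypothetical callable that runs the enrichment analysis for one
              No sample and returns the set of matrix IDs satisfying all
              enrichment and FDR thresholds
    """
    if sample_size > len(no_pool):
        raise ValueError("No set is too small for the requested sample size")
    rng = random.Random(seed)
    passed = Counter()
    for _ in range(n_samples):
        sample = rng.sample(no_pool, sample_size)  # without replacement (assumption)
        passed.update(analyze(sample))
    return passed
```

A matrix that passes in nearly all samples is then less likely to be an artifact of one particular No set than one that passes only occasionally; matrices that never pass simply have a count of 0 in the returned `Counter`.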
