Difference between revisions of "Cluster analysis by K-means (analysis)"

From BioUML platform

Revision as of 12:40, 16 May 2013

Analysis title
Cluster analysis by K-means
Provider
Institute of Systems Biology
Plugin
ru.biosoft.analysis (Common methods of data analysis plug-in)

Goal:

To group genes into clusters so that genes within one cluster are as similar to each other as possible, whereas genes in different clusters are as dissimilar as possible.
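
In the standard K-means formulation, this goal is usually expressed as minimizing the within-cluster sum of squared Euclidean distances. The notation below is introduced here for illustration and does not come from the BioUML documentation: x_i is the vector of values for gene i over the selected columns, C_j is the j-th cluster, \mu_j is its centroid, and k is the cluster number.

```latex
\[
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
\]
```

The algorithm variants listed under Parameters differ in how they search for a partition with a small value of this sum, not in the quantity being minimized.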

Input:

A table of genes or probes with their expression values or calculated fold changes. Depending on the algorithm chosen, certain parameters must also be supplied.

Output:

A table with the same genes grouped into clusters.

Parameters:

  • Experiment data - experimental data for analysis.
    • Table - a table with experimental data stored in repository.
    • Columns - the columns from the table which should be taken for the clustering analysis.
  • Cluster algorithm - the variant of the K-means algorithm to be applied (Forgy, Hartigan-Wong, Lloyd or MacQueen) [1-4].
  • Cluster number - the number of clusters into which the input data will be divided.
  • Output table - name and path in the repository under which the result table will be saved. If a table with the specified name and path already exists, it will be overwritten.
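
How the Columns and Cluster number parameters enter the computation can be sketched with a minimal Lloyd-style iteration [3] in plain Python. This is an illustration under stated assumptions, not the code BioUML or R runs: the function name `kmeans_lloyd`, the row-as-dictionary table layout, the `Cluster` output column and the seeding of centers from the first k rows are all hypothetical choices made to keep the example short and deterministic.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_lloyd(rows, columns, k, max_iter=100):
    """Cluster `rows` (a list of dicts) on the chosen `columns` into k groups.

    Illustrative Lloyd-style iteration; not the BioUML or R implementation.
    """
    # Keep only the selected columns, as the "Columns" parameter would.
    points = [[float(row[c]) for c in columns] for row in rows]
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of rows")

    # Simplifying assumption: seed the centers with the first k rows.
    # Real implementations normally pick random starting centers.
    centers = [p[:] for p in points[:k]]
    labels = None

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # no change, so the iteration has converged
            break
        labels = new_labels

        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(v) / len(members) for v in zip(*members)]

    # Output: the same rows, each with a 1-based cluster label added.
    return [dict(row, Cluster=lab + 1) for row, lab in zip(rows, labels)]


# Hypothetical input table: four genes measured in two columns.
table = [
    {"ID": "g1", "t1": 1.0, "t2": 1.1},
    {"ID": "g2", "t1": 5.0, "t2": 5.2},
    {"ID": "g3", "t1": 0.9, "t2": 1.0},
    {"ID": "g4", "t1": 5.1, "t2": 4.9},
]
result = kmeans_lloyd(table, columns=["t1", "t2"], k=2)
for row in result:
    print(row["ID"], row["Cluster"])
```

With this toy table, g1 and g3 end up in one cluster and g2 and g4 in the other. Because K-means results generally depend on the starting centers, a production run would typically use several random starts rather than the fixed seeding assumed here.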

Further details:

The clustering is done with the K-means algorithm as implemented in the R statistical environment (http://www.r-project.org/).

References:

  1. Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.
  2. Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100–108.
  3. Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.
  4. MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.