Cluster analysis by K-means (analysis)

From BioUML platform
Revision as of 17:02, 23 April 2013 by BioUML wiki Bot (Talk | contribs)

Analysis title
Cluster analysis by K-means
Provider
Institute of Systems Biology

Goal:

Genes are grouped into clusters so that those in one cluster exhibit maximal similarity, whereas those of different clusters are maximally dissimilar.
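
K-means is commonly formalised as minimising the within-cluster sum of squared distances. The following is a sketch of that objective; the symbols (k clusters C_j, centres \mu_j, expression vectors x_i) are introduced here for illustration and do not come from the original page.

```latex
W(C_1,\dots,C_k) \;=\; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j \;=\; \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A small W means the genes within each cluster are close to their cluster centre, which is the "maximal similarity" criterion stated above.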

Input:

A table of genes or probes with their expression values or calculated fold changes. The algorithm variant and the number of clusters must also be specified (see Parameters).

Output:

A table with the same genes grouped into clusters.

Parameters:

  • Experiment data - experimental data for analysis.
    • Table - a table with experimental data stored in repository.
    • Columns - the columns from the table which should be taken for the clustering analysis.
  • Cluster algorithm - the version of the K-means algorithm to be applied [1-4].
  • Cluster number - the number of clusters into which the input data will be divided.
  • Output table - name and path in the repository under which the result table will be saved. If a table with the specified name and path already exists, it will be overwritten.

Further details:

The clustering is performed with the K-means algorithm as implemented in the R statistical environment (http://www.r-project.org/). The selectable algorithm variants correspond to references [1-4].
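
To illustrate how the parameters above interact, here is a minimal Python sketch of one K-means variant (Lloyd's iteration, reference 3). It is not the code used by the platform, which relies on R. The table contents, the gene names, and the names `kmeans_lloyd`, `selected_columns` and `cluster_number` are all hypothetical.

```python
import random


def kmeans_lloyd(rows, k, max_iter=100, seed=0):
    """Cluster rows (lists of floats) into k groups using Lloyd's iteration."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random as the initial centres.
    centres = [list(r) for r in rng.sample(rows, k)]
    labels = [None] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centre
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centres[c])),
            )
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster became empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical input table: gene name -> expression values per column.
table = {
    "geneA": [0.1, 0.2, 0.0],
    "geneB": [0.0, 0.3, 0.1],
    "geneC": [5.0, 5.2, 4.9],
    "geneD": [5.1, 4.8, 5.0],
}
selected_columns = [0, 2]  # plays the role of the "Columns" parameter
cluster_number = 2         # plays the role of the "Cluster number" parameter

genes = list(table)
rows = [[table[g][i] for i in selected_columns] for g in genes]
labels, centres = kmeans_lloyd(rows, cluster_number)

# Output table: the same genes, each with its cluster index.
result = dict(zip(genes, labels))
for gene, cluster in result.items():
    print(gene, cluster)
```

The cluster indices themselves are arbitrary and depend on the random start; only the grouping of genes is meaningful.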

References:

  1. Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.
  2. Hartigan, J. A. and Wong, M. A. (1979) A K-means clustering algorithm. Applied Statistics 28, 100–108.
  3. Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.
  4. MacQueen, J. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.