Package 'nipals' reference manual

Title:	Principal Components Analysis using NIPALS or Weighted EMPCA, with Gram-Schmidt Orthogonalization
Description:	Principal Components Analysis of a matrix using Non-linear Iterative Partial Least Squares or weighted Expectation Maximization PCA with Gram-Schmidt orthogonalization of the scores and loadings. Optimized for speed. See Andrecut (2009) <doi:10.1089/cmb.2008.0221>.
Authors:	Kevin Wright [aut, cre, cph]
Maintainer:	Kevin Wright <[email protected]>
License:	MIT + file LICENSE
Version:	1.0
Built:	2025-03-02 05:34:53 UTC
Source:	https://github.com/kwstat/nipals

Average angular distance between two rotation matrices

Description

For matrices A and B, calculate the angle between the column vectors of A and the corresponding column vectors of B. Then average the angles.

Usage

avg_angular_distance(A, B)
avg_angular_distance(A, B)

Arguments

`A`	Matrix
`B`	Matrix

Details

The results of the singular value decomposition X=USV' are unique, but only up to a change of sign for columns of U, which indicates that the axis is flipped.

Value

A single floating point number, in radians.

Author(s)

Kevin Wright

References

None

Examples

# Example from https://math.stackexchange.com/questions/2113634/
rot1 <- matrix(c(-0.956395958, -0.292073218, 0.000084963,
                 0.292073230, -0.956395931, 0.000227268,
                 0.000014880, 0.000242173, 0.999999971),
               ncol=3, byrow=TRUE)
rot2 <- matrix(c(-0.956227882, -0.292623029, -0.000021887,
                 0.292623030, -0.956227882, -0.000024473,
                 -0.000013768, -0.000029806, 0.999999999),
               ncol=3, byrow=TRUE)
avg_angular_distance(rot1, rot2) # .0004950387

# Example from https://math.stackexchange.com/questions/2113634/
rot1 <- matrix(c(-0.956395958, -0.292073218, 0.000084963,
                 0.292073230, -0.956395931, 0.000227268,
                 0.000014880, 0.000242173, 0.999999971),
               ncol=3, byrow=TRUE)
rot2 <- matrix(c(-0.956227882, -0.292623029, -0.000021887,
                 0.292623030, -0.956227882, -0.000024473,
                 -0.000013768, -0.000029806, 0.999999999),
               ncol=3, byrow=TRUE)
avg_angular_distance(rot1, rot2) # .0004950387

Principal component analysis by weighted EMPCA, expectation maximization principal component-analysis

Description

Used for finding principal components of a numeric matrix. Missing values in the matrix are allowed. Weights for each element of the matrix are allowed. Principal Components are extracted one a time. The algorithm computes x = TP', where T is the 'scores' matrix and P is the 'loadings' matrix.

Usage

empca(
  x,
  w,
  ncomp = min(nrow(x), ncol(x)),
  center = TRUE,
  scale = TRUE,
  maxiter = 100,
  tol = 1e-06,
  seed = NULL,
  fitted = FALSE,
  gramschmidt = TRUE,
  verbose = FALSE
)
empca(
  x,
  w,
  ncomp = min(nrow(x), ncol(x)),
  center = TRUE,
  scale = TRUE,
  maxiter = 100,
  tol = 1e-06,
  seed = NULL,
  fitted = FALSE,
  gramschmidt = TRUE,
  verbose = FALSE
)

Arguments

`x`	Numerical matrix for which to find principal components. Missing values are allowed.
`w`	Numerical matrix of weights.
`ncomp`	Maximum number of principal components to extract from x.
`center`	If TRUE, subtract the mean from each column of x before PCA.
`scale`	if TRUE, divide the standard deviation from each column of x before PCA.
`maxiter`	Maximum number of EM iterations for each principal component.
`tol`	Default 1e-6 tolerance for testing convergence of the EM iterations for each principal component.
`seed`	Random seed to use when initializing the random rotation matrix.
`fitted`	Default FALSE. If TRUE, return the fitted (reconstructed) value of x.
`gramschmidt`	Default TRUE. If TRUE, perform Gram-Schmidt orthogonalization at each iteration.
`verbose`	Default FALSE. Use TRUE or 1 to show some diagnostics.

Value

A list with components eig, scores, loadings, fitted, ncomp, R2, iter, center, scale.

Author(s)

Kevin Wright

References

Stephen Bailey (2012). Principal Component Analysis with Noisy and/or Missing Data. Publications of the Astronomical Society of the Pacific. http://doi.org/10.1086/668105

Examples

B <- matrix(c(50, 67, 90, 98, 120,
              55, 71, 93, 102, 129,
              65, 76, 95, 105, 134,
              50, 80, 102, 130, 138,
              60, 82, 97, 135, 151,
              65, 89, 106, 137, 153,
              75, 95, 117, 133, 155), ncol=5, byrow=TRUE)
rownames(B) <- c("G1","G2","G3","G4","G5","G6","G7")
colnames(B) <- c("E1","E2","E3","E4","E5")
dim(B) # 7 x 5
p1 <- empca(B)
dim(p1$scores) # 7 x 5
dim(p1$loadings) # 5 x 5

B2 = B
B2[1,1] = B2[2,2] = NA
p2 = empca(B2, fitted=TRUE)

B <- matrix(c(50, 67, 90, 98, 120,
              55, 71, 93, 102, 129,
              65, 76, 95, 105, 134,
              50, 80, 102, 130, 138,
              60, 82, 97, 135, 151,
              65, 89, 106, 137, 153,
              75, 95, 117, 133, 155), ncol=5, byrow=TRUE)
rownames(B) <- c("G1","G2","G3","G4","G5","G6","G7")
colnames(B) <- c("E1","E2","E3","E4","E5")
dim(B) # 7 x 5
p1 <- empca(B)
dim(p1$scores) # 7 x 5
dim(p1$loadings) # 5 x 5

B2 = B
B2[1,1] = B2[2,2] = NA
p2 = empca(B2, fitted=TRUE)

Principal component analysis by NIPALS, non-linear iterative partial least squares

Description

Used for finding principal components of a numeric matrix x. Missing values in the matrix are allowed. Principal Components are extracted one a time. This algorithm computes x = TLP', where T is the scores matrix, L (Lambda) is the eigenvalue vector, and P is the loadings matrix.

Usage

nipals(
  x,
  ncomp = min(nrow(x), ncol(x)),
  center = TRUE,
  scale = TRUE,
  maxiter = 500,
  tol = 1e-06,
  startcol = 0,
  fitted = FALSE,
  force.na = FALSE,
  gramschmidt = TRUE,
  verbose = FALSE
)
nipals(
  x,
  ncomp = min(nrow(x), ncol(x)),
  center = TRUE,
  scale = TRUE,
  maxiter = 500,
  tol = 1e-06,
  startcol = 0,
  fitted = FALSE,
  force.na = FALSE,
  gramschmidt = TRUE,
  verbose = FALSE
)

Arguments

`x`	Numerical matrix for which to find principal compontents. Missing values are allowed.
`ncomp`	Maximum number of principal components to extract from x.
`center`	If TRUE, subtract the mean from each column of x before PCA.
`scale`	if TRUE, divide the standard deviation from each column of x before PCA.
`maxiter`	Maximum number of NIPALS iterations for each principal component.
`tol`	Default 1e-6 tolerance for testing convergence of the NIPALS iterations for each principal component.
`startcol`	Determine the starting column of x for the iterations of each principal component. If 0, use the column of x that has maximum absolute sum. If a number, use that column of x. If a function, apply the function to each column of x and choose the column with the maximum value of the function.
`fitted`	Default FALSE. If TRUE, return the fitted (reconstructed) value of x.
`force.na`	Default FALSE. If TRUE, force the function to use the method for missing values, even if there are no missing values in x.
`gramschmidt`	Default TRUE. If TRUE, perform Gram-Schmidt orthogonalization at each iteration.
`verbose`	Default FALSE. Use TRUE or 1 to show some diagnostics.

Details

CAUTION: Different R package functions do different things with the L matrix. For example, some functions include L in T.

The R2 values that are reported are marginal, not cumulative.

Value

A list with components eig, scores, loadings, fitted, ncomp, R2, iter, center, scale.

Author(s)

Kevin Wright

References

Wold, H. (1966) Estimation of principal components and related models by iterative least squares. In Multivariate Analysis (Ed., P.R. Krishnaiah), Academic Press, NY, 391-420.

Andrecut, Mircea (2009). Parallel GPU implementation of iterative PCA algorithms. Journal of Computational Biology, 16, 1593-1599.

Examples

B <- matrix(c(50, 67, 90, 98, 120,
              55, 71, 93, 102, 129,
              65, 76, 95, 105, 134,
              50, 80, 102, 130, 138,
              60, 82, 97, 135, 151,
              65, 89, 106, 137, 153,
              75, 95, 117, 133, 155), ncol=5, byrow=TRUE)
rownames(B) <- c("G1","G2","G3","G4","G5","G6","G7")
colnames(B) <- c("E1","E2","E3","E4","E5")
dim(B) # 7 x 5
p1 <- nipals(B)
dim(p1$scores) # 7 x 5
dim(p1$loadings) # 5 x 5

B2 = B
B2[1,1] = B2[2,2] = NA
p2 = nipals(B2, fitted=TRUE)

# Two ways to make a biplot

# method 1
biplot(p2$scores, p2$loadings)

# method 2
class(p2) <- "princomp"
p2$sdev <- sqrt(p2$eig)
biplot(p2, scale=0)

B <- matrix(c(50, 67, 90, 98, 120,
              55, 71, 93, 102, 129,
              65, 76, 95, 105, 134,
              50, 80, 102, 130, 138,
              60, 82, 97, 135, 151,
              65, 89, 106, 137, 153,
              75, 95, 117, 133, 155), ncol=5, byrow=TRUE)
rownames(B) <- c("G1","G2","G3","G4","G5","G6","G7")
colnames(B) <- c("E1","E2","E3","E4","E5")
dim(B) # 7 x 5
p1 <- nipals(B)
dim(p1$scores) # 7 x 5
dim(p1$loadings) # 5 x 5

B2 = B
B2[1,1] = B2[2,2] = NA
p2 = nipals(B2, fitted=TRUE)

# Two ways to make a biplot

# method 1
biplot(p2$scores, p2$loadings)

# method 2
class(p2) <- "princomp"
p2$sdev <- sqrt(p2$eig)
biplot(p2, scale=0)

U.S. Crime rates per 100,00 people

Description

U.S. Crime rates per 100,00 people for 7 categories in each of the 50 U.S. states in 1977.

Usage

uscrime
uscrime

Format

A data frame with 50 observations on the following 8 variables.

state: U.S. state
murder: murders
rape: rapes
robbery: robbery
assault: assault
burglary: burglary
larceny: larceny
autotheft: automobile thefts

Details

There are two missing values.

Source

Documentation Example 3 for PROC HPPRINCOMP. http://documentation.sas.com/api/docsets/stathpug/14.2/content/stathpug_code_hppriex3.htm?locale=en

References

SAS/STAT User's Guide: High-Performance Procedures. The HPPRINCOMP Procedure. http://support.sas.com/documentation/cdl/en/stathpug/67524/HTML/default/viewer.htm#stathpug_hpprincomp_toc.htm

Examples


library(nipals)
head(uscrime)

# SAS deletes rows with missing values
dat <- uscrime[complete.cases(uscrime), ]
dat <- as.matrix(dat[ , -1])
m1 <- nipals(dat) # complete-data method

# Traditional NIPALS with missing data  
dat <- uscrime
dat <- as.matrix(dat[ , -1])
m2 <- nipals(dat, gramschmidt=FALSE) # missing 
round(crossprod(m2$loadings),3) # Prin Comps not quite orthogonal
  
# Gram-Schmidt corrected NIPALS
m3 <- nipals(dat, gramschmidt=TRUE) # TRUE is default
round(crossprod(m3$loadings),3) # Prin Comps are orthogonal

library(nipals)
head(uscrime)

# SAS deletes rows with missing values
dat <- uscrime[complete.cases(uscrime), ]
dat <- as.matrix(dat[ , -1])
m1 <- nipals(dat) # complete-data method

# Traditional NIPALS with missing data  
dat <- uscrime
dat <- as.matrix(dat[ , -1])
m2 <- nipals(dat, gramschmidt=FALSE) # missing 
round(crossprod(m2$loadings),3) # Prin Comps not quite orthogonal
  
# Gram-Schmidt corrected NIPALS
m3 <- nipals(dat, gramschmidt=TRUE) # TRUE is default
round(crossprod(m3$loadings),3) # Prin Comps are orthogonal

Package 'nipals'

Help Index

Average angular distance between two rotation matrices

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

Principal component analysis by weighted EMPCA, expectation maximization principal component-analysis

Description

Usage

Arguments

Value

Author(s)

References

Examples

Principal component analysis by NIPALS, non-linear iterative partial least squares

Description

Usage

Arguments

Details

Value

Author(s)

References

Examples

U.S. Crime rates per 100,00 people

Description

Usage

Format

Details

Source

References

Examples