SIAM J. MATRIX ANAL. APPL.
Vol. 12, No. 3, pp. 401-425, July 1991
(c) 1991 Society for Industrial and Applied Mathematics

THE RESTRICTED SINGULAR VALUE DECOMPOSITION: PROPERTIES AND APPLICATIONS*

BART L. R. DE MOOR† AND GENE H. GOLUB‡
Abstract. The restricted singular value decomposition (RSVD) is the factorization of a given matrix, relative to two other given matrices. It can be interpreted as the ordinary singular value decomposition with different inner products in the row and column spaces. Its properties and structure, as well as its connection to generalized eigenvalue problems, canonical correlation analysis, and other generalizations of the singular value decomposition, are investigated in detail. Applications that are discussed include the analysis of the extended shorted operator, unitarily invariant norm minimization with rank constraints, rank minimization in matrix balls, the analysis and solution of linear matrix equations, rank minimization of a partitioned matrix and the connection with generalized Schur complements, and constrained linear and total linear least squares problems with mixed exact and noisy data, including a generalized Gauss-Markov estimation scheme.
Key words. generalized SVD, generalized matrix inverses, (total) linear least squares, (generalized) Schur complements, matrix balls, shorted operator

AMS(MOS) subject classifications. 15A09, 15A18, 15A21, 15A24, 65F20
1. Introduction. The ordinary singular value decomposition (OSVD) has a long history, with original contributions by Beltrami (1873) [2], Sylvester (1889) [26], Autonne (1902) [1], Eckart and Young (1936) [12], and many others (see, e.g., the references in [15], [21], [27]). It has become an important tool in the analysis and numerical solution of numerous problems arising in such diverse applications as psychometrics, statistics, signal processing, and system theory. Not only does it allow for an elegant problem formulation, but at the same time it provides geometrical and algebraic insight together with an immediate numerically robust implementation [15]. Recently, several generalizations of the OSVD have been proposed and their properties analysed. The one that is best known is the generalized SVD as introduced by Paige and Saunders in 1981 [22], which we propose to rename as the Quotient SVD (QSVD) [8]. Another example is the Product SVD (PSVD) as proposed by Fernando and Hammarling in 1987 [14] and further analysed in [10]. The third one is the Restricted SVD (RSVD), introduced in its explicit form by Zha in [32] and further developed and discussed in this paper. In [8] we have proposed a standardized nomenclature for the singular value decomposition and its generalizations. This set of names has the advantage of being alphabetic and mnemonic: OPQRSVD. For the structure and properties of the OSVD, PSVD, and QSVD, we also refer to [8]. The RSVD, which is the main subject of this paper, applies to a given triplet of matrices A, B, C of compatible dimensions (Theorem 1). In essence, the RSVD provides a factorization of the matrix A, relative to the matrices B and C. It could be
* Received by the editors June 8, 1989; accepted for publication (in revised form) September 12, 1990. Part of this work was supported by the United States Army under contract DAAL03-87-K-0095.
† Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium (demoor@esat.kuleuven.ac.be). This author was a visiting research associate at the Computer Science Department and the Department of Electrical Engineering (Information Systems Laboratory) of Stanford University, where he was supported by an Advanced Research Fellowship in Science and Technology of the North Atlantic Treaty Organization (NATO) Science Fellowships Program and by a grant from IBM. He is now a research associate of the Belgian National Fund for Scientific Research (NFWO).
‡ Department of Computer Science, Stanford University, Stanford, California 94305 (na.golub@na-net.stanford.edu).
considered as the OSVD of the matrix A, but with different (possibly nonnegative definite) inner products in its column and in its row space. It will be shown that the RSVD not only allows for an elegant treatment of algebraic and geometric problems in a wide variety of applications, but that its structure provides a powerful tool in simplifying proofs and derivations that are algebraically rather complicated. Soon after the present paper was completed, Zha and De Moor discovered that the RSVD is only one of the three possible SVD-like factorizations for three matrices. Similar generalizations of the OSVD are not limited to two or three matrices, but can be derived for 4, 5, ..., i.e., any number of matrices of compatible dimensions. The PSVD and the QSVD serve as basic building blocks in this infinite tree of generalizations of the OSVD. For instance, the RSVD that is analysed in this paper can also be considered as a double QSVD. This is the reason why we have called it the QQSVD in [11], where the complete structure of this tree of generalizations is also developed in detail.

This paper is organised as follows. In §2, the main structure of the RSVD is analysed in terms of the ranks of the concatenation of certain matrices. The factorization is related to a generalized eigenvalue problem in §2.2.1. A variational characterization is provided in §2.2.2. A generalized dyadic decomposition is explored in §2.2.3, together with a geometrical interpretation. It is shown in §2.2.4 how the RSVD contains other generalizations of the OSVD, such as the PSVD and the QSVD, as special cases. In §3, several applications are discussed: rank minimization and the extended shorted operator are the subject of §3.1, as well as unitarily invariant norm minimization with rank constraints and the relation with matrix balls. We also investigate a certain linear matrix equation which is directly related to the Moore-Penrose pseudoinverse of a matrix.
The low rank approximation of a partitioned matrix when only one of its blocks can be modified is explored in §3.2, together with total least squares with mixed exact and noisy data and linear constraints. While the role of the Schur complement and its close connection to least squares estimation is well understood, it will be shown in this section that there exists a similar relation between constrained total linear least squares solutions and a generalized Schur complement. Generalized Gauss-Markov models, possibly with constraints, are discussed in §3.3, and it is shown how the RSVD simplifies the solution of linear least squares problems with constraints. In §4 the main conclusions are presented together with some perspectives. Let us conclude this Introduction by referring to the reports mentioned in [9] for a detailed constructive proof of the main theorem of this paper.
Notation, conventions, and abbreviations. Throughout the paper, capitals denote matrices. The lower case letters i, j, k, l, m, n, p, q, r are nonnegative integers. Other lower case letters denote vectors. The set of real numbers is denoted by R. Scalars (possibly complex) are denoted by Greek letters. The matrices A (m × n), B (m × p), C (q × n) are given matrices. Their ranks will be denoted by ra, rb, rc. D is a p × q matrix and M is the matrix with A, B, C, D* as its blocks:

M = ( A  B  )
    ( C  D* )

We shall also frequently use the following ranks:

rab = rank( A  B ),    rac = rank( A ),    rabc = rank( A  B ).
                                  ( C )              ( C  0 )

A^T is the transpose of a (possibly complex) matrix A and Ā is the complex conjugate of A. A* denotes the complex conjugate transpose of a (complex) matrix A. The matrix A^{-*} represents the inverse of A*. Ik is the k × k
identity matrix. The subscript is omitted when the dimensions are clear from the context. Identity vectors, with the ith component equal to 1 and all others zero, are denoted by ei (m × 1). A matrix X is called an A(i, j, ...)-inverse of the matrix A if it satisfies equations i, j, ... of the following:
1. AXA = A,
2. XAX = X,
3. (AX)* = AX,
4. (XA)* = XA.

An A(1) inverse is also called an inner inverse and denoted by A^-. The A(1,2,3,4) inverse is the Moore-Penrose pseudoinverse, denoted by A^+, and it is unique. We shall also need the following lemmas.

LEMMA 1 (inner inverse of a factored matrix). Let the matrix A be factored as

A = P ( Da  0 ) Q,
      ( 0   0 )

where Da is a square ra × ra nonsingular matrix and P and Q are square nonsingular. Then every inner inverse A^- of A can be written as

(1)  A^- = Q^{-1} ( Da^{-1}  Z12 ) P^{-1},
                  ( Z21      Z22 )

where Z12, Z21, Z22 are arbitrary matrices. Conversely, every matrix A^- of this form is an inner inverse of A.

For a detailed discussion of generalized inverses, we refer to [21].

The matrices Ua (m × m), Va (n × n), Vb (p × p), Uc (q × q) are unitary, i.e.,

Ua*Ua = Im = UaUa*,   Va*Va = In = VaVa*,   Vb*Vb = Ip = VbVb*,   Uc*Uc = Iq = UcUc*.

The matrices P (m × m) and Q (n × n) are square nonsingular. The nonzero elements of the diagonal matrices S1, S2, and S3, which appear in the theorems, are denoted by αi, βi, and γi. The vector ai denotes the ith column of the matrix A. The range (column space) of the matrix A is denoted by R(A) = {y | y = Ax}. The row space of A is denoted by R(A*). The null space of the matrix A is represented as N(A) = {x | Ax = 0}. The symbol ∩ denotes the intersection of two vector spaces. We shall use the following well-known result.

LEMMA 2 (the dimension of the intersection of subspaces).

dim(R(A) ∩ R(B)) = ra + rb - rab,
dim(R(A*) ∩ R(C*)) = ra + rc - rac.
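Lemma 2 is easy to verify numerically. In the following sketch (all matrices and sizes are hypothetical examples, not taken from the paper), two column spaces are built with exactly one common direction and the dimension formula is checked against an independent principal-angle computation:

```python
import numpy as np

# Numerical check of Lemma 2 (hypothetical example matrices):
#   dim(R(A) ∩ R(B)) = ra + rb - rab.
rng = np.random.default_rng(0)
shared = rng.standard_normal((6, 1))                  # one common direction
A = np.hstack([shared, rng.standard_normal((6, 2))])  # rank 3
B = np.hstack([shared, rng.standard_normal((6, 2))])  # rank 3

ra = np.linalg.matrix_rank(A)
rb = np.linalg.matrix_rank(B)
rab = np.linalg.matrix_rank(np.hstack([A, B]))        # rank of (A  B)
dim_cap = ra + rb - rab                               # Lemma 2

# Independent check via principal angles: the intersection dimension is
# the number of principal-angle cosines equal to 1.
Ua = np.linalg.svd(A)[0][:, :ra]   # orthonormal basis of R(A)
Ub = np.linalg.svd(B)[0][:, :rb]   # orthonormal basis of R(B)
cosines = np.linalg.svd(Ua.T @ Ub, compute_uv=False)
assert dim_cap == np.sum(np.isclose(cosines, 1.0)) == 1
```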
||A|| is any unitarily invariant matrix norm, while ||A||_F is the Frobenius norm: ||A||_F^2 = trace(AA*). The norm of the vector a is denoted by ||a||_2, where ||a||_2^2 = a*a. Moreover,
we will adopt the following convention for block matrices: any (possibly rectangular) block of zeros is denoted by 0, the precise dimensions being obvious from the block dimensions. The symbol I represents a matrix block corresponding to the square identity matrix of appropriate dimensions. Whenever a dimension indicated by an integer in a block matrix is zero, the corresponding block row or block column should be omitted, and all expressions and equations in which a block matrix of that block row or block column appears can be disregarded. An equivalent formulation would be that we allow 0 × n or n × 0 (n ≠ 0) blocks to appear in matrices. This permits an elegant treatment of several cases at once. Finally, we would like to introduce the term quasi-diagonal matrix for a matrix whose block rows and block columns are a permutation of a diagonal matrix.
2. The restricted singular value decomposition (RSVD). The idea of a generalization of the OSVD for three matrices is implicit in the S, T-singular value decomposition of Van Loan [30] via its relation to a generalized eigenvalue problem. Zha [32] introduced an explicit formulation of the RSVD, constructing it through the use of several OSVDs and QSVDs (see also [9]). For the sake of brevity, we have omitted our constructive proof based on a sequence of OSVDs and PSVDs; it can be found in [9]. In this section, we first state the main theorem (§2.1), which describes the structure of the RSVD, followed by a discussion of the main properties in §2.2, including the connection to generalized eigenvalue problems, a generalized dyadic decomposition, geometrical insights, and the demonstration that the RSVD contains the OSVD, the PSVD, and the QSVD as special cases.
2.1. The RSVD theorem. With the notation and conventions of §1, we have the following theorem.

THEOREM 1 (the restricted singular value decomposition). Every triplet of matrices A (m × n), B (m × p), and C (q × n) can be factorized as
A = P^{-*} Sa Q^{-1},
B = P^{-*} Sb Vb*,
C = Uc Sc Q^{-1},
where P (m × m) and Q (n × n) are square nonsingular and Vb (p × p) and Uc (q × q) are unitary. Sa (m × n), Sb (m × p), and Sc (q × n) are real quasi-diagonal matrices with nonnegative elements and the following block structure:

     ( S1  0  0  0  0  0 )
     ( 0   I  0  0  0  0 )
Sa = ( 0   0  I  0  0  0 )
     ( 0   0  0  I  0  0 )
     ( 0   0  0  0  0  0 )
     ( 0   0  0  0  0  0 )

     ( I  0  0  0  )
     ( 0  0  0  0  )
Sb = ( 0  I  0  0  )
     ( 0  0  0  0  )
     ( 0  0  0  S2 )
     ( 0  0  0  0  )

     ( I  0  0  0  0   0 )
Sc = ( 0  I  0  0  0   0 )
     ( 0  0  0  0  0   0 )
     ( 0  0  0  0  S3  0 )

The block dimensions of the matrices Sa, Sb, Sc are the following.

Block columns of Sa and Sc:
1. rabc + ra - rab - rac
2. rab + rc - rabc
3. rac + rb - rabc
4. rabc - rb - rc
5. rac - ra
6. n - rac

Block rows of Sa and Sb:
1. rabc + ra - rab - rac
2. rab + rc - rabc
3. rac + rb - rabc
4. rabc - rb - rc
5. rab - ra
6. m - rab

Block columns of Sb:
1. rabc + ra - rab - rac
2. rac + rb - rabc
3. p - rb
4. rab - ra

Block rows of Sc:
1. rabc + ra - rab - rac
2. rab + rc - rabc
3. q - rc
4. rac - ra
The matrices S1, S2, S3 are square nonsingular diagonal matrices with positive diagonal elements. Let αi, βj, γk be the diagonal elements of the matrices S1, S2, S3. We propose to call the following triplets of numbers the restricted singular value triplets:

- rabc + ra - rab - rac triplets of the form (αi, 1, 1) with αi > 0. By convention, they will be ordered as α1 ≥ α2 ≥ ... > 0.
- rab + rc - rabc triplets of the form (1, 0, 1).
- rac + rb - rabc triplets of the form (1, 1, 0).
- rabc - rb - rc triplets of the form (1, 0, 0).
- rab - ra triplets of the form (0, βj, 0), βj > 0 (elements of S2).
- rac - ra triplets of the form (0, 0, γk), γk > 0 (elements of S3).
- min(m - rab, n - rac) trivial triplets (0, 0, 0).
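The seven triplet counts above are determined by the ranks ra, rb, rc, rab, rac, rabc alone. A small numerical sketch (random matrices with hypothetical sizes) checks that all counts are nonnegative and that they add up to the matrix dimensions:

```python
import numpy as np

# The triplet counts of Theorem 1 from the ranks ra, rb, rc, rab, rac,
# rabc (random matrices; the sizes are hypothetical examples).
rng = np.random.default_rng(1)
m, n, p, q = 7, 6, 4, 3
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank 2
B = rng.standard_normal((m, p))
C = rng.standard_normal((q, n))

rank = np.linalg.matrix_rank
ra, rb, rc = rank(A), rank(B), rank(C)
rab = rank(np.hstack([A, B]))
rac = rank(np.vstack([A, C]))
rabc = rank(np.block([[A, B], [C, np.zeros((q, p))]]))

counts = {
    "(alpha,1,1)": rabc + ra - rab - rac,
    "(1,0,1)": rab + rc - rabc,
    "(1,1,0)": rac + rb - rabc,
    "(1,0,0)": rabc - rb - rc,
    "(0,beta,0)": rab - ra,
    "(0,0,gamma)": rac - ra,
    "(0,0,0)": min(m - rab, n - rac),
}
assert all(v >= 0 for v in counts.values())
# Block rows of Sa add up to m; block columns add up to n.
row_sum = (counts["(alpha,1,1)"] + counts["(1,0,1)"] + counts["(1,1,0)"]
           + counts["(1,0,0)"] + counts["(0,beta,0)"] + (m - rab))
col_sum = (counts["(alpha,1,1)"] + counts["(1,0,1)"] + counts["(1,1,0)"]
           + counts["(1,0,0)"] + counts["(0,0,gamma)"] + (n - rac))
assert row_sum == m and col_sum == n
```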
We propose to call the factorization of a matrix triplet, as described in Theorem 1, the restricted singular value decomposition, because the RSVD allows us to analyse matrix problems that can be stated in terms of the matrices A + BDC and

M = ( A  B  )
    ( C  D* )

in which the matrices B and C represent certain restrictions on the type of operations that are allowed. Typically, we are interested in the ranks of these matrices as the matrix D is modified. The rank of the matrix A + BDC can only be reduced by modifications that belong to the column space of B and the row space of C. It will be shown how the rank of M can be analysed via a generalized Schur complement, which is of the form D* - CA^-B, where again C and B represent certain restrictions and A^- is an inner inverse of A. Moreover, the RSVD yields the restriction of the linear operator represented by the matrix A to the column space of B and the row space of C. Finally, the RSVD can be interpreted as an OSVD with certain restrictions on the inner products to be used in the column and row space of the matrix A (see §2.2.1).
Some algorithmic issues related to the RSVD are discussed in [11], [13], [29], [28], and [33], though a full portable and documented algorithm for the RSVD is still to be developed.
2.2. Properties of the RSVD. The OSVD, as well as the PSVD and the QSVD, can all be related to a certain (generalized) eigenvalue problem. It comes as no surprise that this is also the case for the RSVD. First, the generalized eigenvalue problem for the RSVD is analysed in §2.2.1, where we also point out an interesting connection with canonical correlation analysis. A variational characterization of the RSVD is provided in §2.2.2. A generalized dyadic decomposition and some geometrical properties are investigated in §2.2.3. In §2.2.4, it is shown how the OSVD, PSVD, and QSVD are special cases of the RSVD.
2.2.1. Relation to a generalized eigenvalue problem. Consider the generalized eigenvalue problem

(2)  ( 0   A ) ( p )     ( BB*  0   ) ( p )
     ( A*  0 ) ( q ) = λ ( 0    C*C ) ( q ).
Let pi be the ith column of P and qi the ith column of Q. Obviously, the column vector (pi* qi*)* is a generalized eigenvector of the pencil (2). There are four types of generalized eigenvalues (finite nonzero, zero, infinite, and arbitrary), which can be related to the restricted singular value triplets of Theorem 1. Note that if BB* = Im and C*C = In, the eigenvalues are ± the singular values of the matrix A. In the case that the matrices BB* and C*C are nonsingular, it can be shown that the generalized eigenvalue problem (2) is equivalent to a singular value decomposition. It follows from (2) that

A qi = (BB*) pi λi,     A* pi = (C*C) qi λi.
If BB* and C*C are both nonsingular, then there exist square nonsingular matrices Wb and Wc (obtained, for example, from a Cholesky decomposition) such that BB* = Wb*Wb and C*C = Wc*Wc. Then we have that

(Wb^{-*} A Wc^{-1})(Wc qi) = (Wb pi) λi,
(Wc^{-*} A* Wb^{-1})(Wb pi) = (Wc qi) λi.
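The special case noted above, BB* = Im and C*C = In, in which the eigenvalues of the pencil (2) are plus or minus the singular values of A, is easy to check numerically; the sizes and data below are arbitrary examples (real case):

```python
import numpy as np

# With B = I_m, C = I_n the pencil (2) becomes an ordinary symmetric
# eigenvalue problem for [[0, A], [A^T, 0]] (real case); its eigenvalues
# are +/- the singular values of A plus |m - n| zeros.
rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

H = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])
eigs = np.sort(np.linalg.eigvalsh(H))

sv = np.linalg.svd(A, compute_uv=False)
expected = np.sort(np.concatenate([sv, -sv, np.zeros(m - n)]))
assert np.allclose(eigs, expected)
```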
From Theorem 1, it follows that P*(BB*)P = SbSb* and Q*(C*C)Q = Sc*Sc. Hence, if BB* is nonsingular, the column vectors of P are orthogonal with respect to the inner product provided by the positive-definite matrix BB*. A similar observation applies to the column vectors of Q with respect to C*C. The BB*-orthogonality of the vectors pi and the C*C-orthogonality of the vectors qi imply that the vectors Wb pi and Wc qi are (multiples of) the left and right singular vectors of the matrix
Wb^{-*} A Wc^{-1}. Consider the RSVD of the matrix triplet (A*B, A*, B) and its related generalized eigenvalue problem:

( 0    A*B ) ( p )     ( A*A  0   ) ( p )
( B*A  0   ) ( q ) = λ ( 0    B*B ) ( q ).
This is nothing more than the eigenvalue problem that arises in canonical correlation analysis (principal angles and vectors between subspaces; see, e.g., [3], [15]). There exist applications where the matrices BB* and C*C are (almost) singular (see, e.g., [13], [18]). The matrices BB* and C*C can be (sample) covariance matrices that are (almost) singular. This is, for instance, the case in [18], where a generalized type of canonical correlation analysis is required, allowing singular covariance matrices. Another example is generalized Gauss-Markov estimation as described in §3.3. It is in these situations that the RSVD may provide essential insight into the geometry of the singularities and at the same time yield a numerically robust and elegant implementation of the solution by avoiding the explicit solution (with its "implicit squaring") of the generalized eigenvalue problem.

2.2.2. A variational characterization. Let φ(x, y) = x*Ay be a bilinear form of two vectors x and y. We wish to maximize φ(x, y) over all vectors x, y subject to x*BB*x = 1 and y*C*Cy = 1. It follows directly from the RSVD that a solution exists only if one of the following situations occurs:

1. rabc + ra - rab - rac > 0. In this case, the maximum is equal to the largest diagonal element of S1, and the optimizing vectors are x = p1 (the first column vector of P) and y = q1 (the first column vector of Q), so that φ(p1, q1) = α1.

2. rabc + ra - rab - rac = 0. The norm constraints on x and y can only be satisfied if

rac + rb - rabc > 0  or  rab - ra > 0,

and

rab + rc - rabc > 0  or  rac - ra > 0.

In either case, the maximum is 0.

If none of these conditions is satisfied, there is no solution. Assume that the maximum is achieved for the vectors x = p1 and y = q1. Then other extrema of the objective function φ(x, y) = x*Ay, constrained to lie in subspaces that are BB*-orthogonal to p1 and C*C-orthogonal to q1, can be found in an obvious recursive manner. All of these extrema are then generated by the columns of the matrices P and Q.
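When BB* and C*C are nonsingular, the variational problem above reduces, via the substitutions u = Wb x and v = Wc y, to an ordinary largest singular value. The following sketch (random real data, hypothetical sizes) compares that value against a crude sampling of feasible pairs:

```python
import numpy as np

# Sketch of the variational characterization for nonsingular BB^T, C^T C
# (real case, random example data): with BB^T = Wb^T Wb and
# C^T C = Wc^T Wc, the constrained maximum of x^T A y equals the largest
# singular value of Wb^{-T} A Wc^{-1}.
rng = np.random.default_rng(3)
m, n = 4, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))   # BB^T nonsingular
C = rng.standard_normal((n, n))   # C^T C nonsingular

Wb = np.linalg.cholesky(B @ B.T).T    # BB^T = Wb^T Wb
Wc = np.linalg.cholesky(C.T @ C).T    # C^T C = Wc^T Wc
target = np.linalg.svd(np.linalg.solve(Wb.T, A) @ np.linalg.inv(Wc),
                       compute_uv=False)[0]

# Crude check: no random feasible pair may beat the predicted maximum.
best = 0.0
for _ in range(2000):
    x = rng.standard_normal(m)
    x /= np.sqrt(x @ (B @ B.T) @ x)
    y = rng.standard_normal(n)
    y /= np.sqrt(y @ (C.T @ C) @ y)
    best = max(best, abs(x @ A @ y))
assert 0.0 < best <= target + 1e-9
```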
2.2.3. A generalized dyadic decomposition and geometrical properties. Denote P' = P^{-*} and Q' = Q^{-*}. Then, with an appropriate partitioning of the matrices P', Q', Uc, and Vb, corresponding to the diagonal structure of the matrices Sa, Sb, Sc of Theorem 1, it is straightforward to obtain the following sums:

A = P1' S1 Q1'* + P2' Q2'* + P3' Q3'* + P4' Q4'*,
B = P1' Vb1* + P3' Vb2* + P5' S2 Vb4*,
C = Uc1 Q1'* + Uc2 Q2'* + Uc4 S3 Q5'*.

Hence,

R(P1') + R(P3') = R(A) ∩ R(B),
R(Q1') + R(Q2') = R(A*) ∩ R(C*).

The decomposition of A can be interpreted as a decomposition relative to R(B) and R(C*): the four terms of this decomposition can be classified geometrically as follows:

                 in R(C*)       not in R(C*)
in R(B)          P1' S1 Q1'*    P3' Q3'*
not in R(B)      P2' Q2'*       P4' Q4'*

Obviously, the term P1'S1Q1'* represents the restriction of the linear operator represented by the matrix A to the column space of the matrix B and the row space of the matrix C, while the term P4'Q4'* is the restriction of A to the orthogonal complements of R(B) and R(C*). Also, we find that

R(B*) = R(Vb1) + R(Vb2) + R(Vb4),
R(C) = R(Uc1) + R(Uc2) + R(Uc4),

and

B Vb3 = 0,   implying   N(B) = R(Vb3),
Uc3* C = 0,  implying   N(C*) = R(Uc3).
Finally, some of the block dimensions in the RSVD of the matrix triplet (A, B, C) can be related to geometrical interpretations by repeated application of Lemma 2.
dim [ R( A ) ∩ R( B ) ] = rac + rb - rabc,
       ( C )      ( 0 )

dim [ R( A  B )* ∩ R( C  0 )* ] = rab + rc - rabc,

dim [ R(A) ∩ R(B) ] = ra + rb - rab,

dim [ R(A*) ∩ R(C*) ] = ra + rc - rac.

It is easy to show that

R(Q6) = N(A) ∩ N(C),
R(P6) = N(A*) ∩ N(B*),

where Q6 and P6 denote the sixth blocks of columns of Q and P. Hence Q6 provides a basis for the common null space of A and C, which is of dimension n - rac, while P6 provides a basis for the common null space of A* and B*, which is of dimension m - rab.
2.2.4. Relation to (generalized) SVDs. The RSVD reduces to the OSVD, the PSVD, or the QSVD for special choices of the matrices A, B, and/or C. For the precise structure of the PSVD and the QSVD, we refer to [8].

THEOREM 2 (special cases of the RSVD).
1. The RSVD of (A, Im, In) is an OSVD of A.
2. The RSVD of (Im, B, C) is a PSVD of (B*, C).
3. The RSVD of (A, B, In) is a QSVD of (A, B).
4. The RSVD of (A, Im, C) is a QSVD of (A, C).

Proof.
1. B = Im, C = In. Consider the RSVD of (A, Im, In). By definition, Im = P^{-*}SbVb* and In = UcScQ^{-1}. This implies P^{-*} = VbSb^{-1} and Q^{-1} = Sc^{-1}Uc*. Hence, we find that A = Vb(Sb^{-1}SaSc^{-1})Uc*, which is an OSVD of A.
2. A = Im. Consider the RSVD of (Im, B, C). Then Im = P^{-*}SaQ^{-1}, which implies Q^{-1} = Sa^{-1}P*. Hence, B* = VbSb*P^{-1} and C = Uc(ScSa^{-1})P*, which is nothing else than a PSVD of (B*, C).
3. C = In. Consider the RSVD of (A, B, In). Then In = UcScQ^{-1}, which implies Q^{-1} = Sc^{-1}Uc*. Then A = P^{-*}(SaSc^{-1})Uc* and B = P^{-*}SbVb*, which is (up to a diagonal scaling) a QSVD of the matrix pair (A, B).
4. B = Im. The proof is similar to that of part 3. □
3. Applications. In this section, we shall first explore the use of the RSVD in the analysis of problems related to expressions of the form A + BDC, where A, B, C are given matrices. The connection with Mitra's concept of the extended shorted operator [20] and with matrix balls will be discussed, as will the solution of the matrix equation BDC = A, which led Penrose to rediscover the pseudoinverse of a matrix [24], [25]. In §3.2, it is shown how the RSVD can be used to solve constrained total linear least squares problems with exact, noiseless rows and columns, and the close connection to Carlson's generalized Schur complement [4] is emphasized. In §3.3, we discuss the application of the RSVD in the analysis and solution of generalized Gauss-Markov models, with and without constraints. Throughout this section, we shall use a matrix E, defined as
(3)  E = Vb* D Uc,

with a block partitioning derived from the block structure of Sb and Sc as follows:

(4)  E = ( E11  E12  E13  E14 )
         ( E21  E22  E23  E24 )
         ( E31  E32  E33  E34 )
         ( E41  E42  E43  E44 )

where the block rows have dimensions rabc + ra - rab - rac, rac + rb - rabc, p - rb, and rab - ra, and the block columns have dimensions rabc + ra - rab - rac, rab + rc - rabc, q - rc, and rac - ra.

3.1. On the structure of A + BDC. The RSVD provides geometrical insight into the structure of a matrix A relative to the column space of a matrix B and the row space of a matrix C. As will now be shown, it is an appropriate tool to analyse expressions of the form A + BDC, where D is an arbitrary p × q matrix. The RSVD allows us to analyse and solve the following questions:

1. What is the range of ranks of A + BDC over all possible p × q matrices D (§3.1.1)?
2. When is the matrix D that minimizes the rank of A + BDC unique (§3.1.2)?
3. When is the term BDC that minimizes rank(A + BDC) unique? It will be shown in §3.1.3 how this corresponds to Mitra's extension of the shorted operator [20].
4. In the case of nonuniqueness, what is the minimum norm solution (for unitarily invariant norms) D that minimizes rank(A + BDC) (§3.1.4)?
5. The reverse question is the following: assume that ||D|| ≤ δ, where δ is a given positive real scalar. What is the minimum rank of A + BDC? This can be linked to rank minimization problems in so-called matrix balls (§3.1.5).
6. An extreme case occurs if we look for the (minimum norm) solution D to the linear matrix equation BDC = -A. The RSVD provides the necessary and sufficient conditions for consistency and allows us to parameterize all solutions (§3.1.6).
3.1.1. The range of ranks of A + BDC. The range of ranks of A + BDC for all possible matrices D is described in the following theorem.

THEOREM 3 (on the rank of A + BDC).

rab + rac - rabc ≤ rank(A + BDC) ≤ min(rab, rac).

For every number r in between these bounds, there exists a matrix D such that rank(A + BDC) = r.
Proof. The proof uses the RSVD structure of Theorem 1:

A + BDC = P^{-*}SaQ^{-1} + P^{-*}Sb(Vb*DUc)ScQ^{-1} = P^{-*}(Sa + SbESc)Q^{-1},

where E = Vb*DUc. Because of the nonsingularity of P, Q, Uc, Vb, we have that rank(A + BDC) = rank(Sa + SbESc). Using elementary row and column operations and the block partitioning of E as in (4), it is easy to show that

                          ( S1 + E11  0  0  0  E14 S3     0 )
                          ( 0         I  0  0  0          0 )
(5)  rank(A + BDC) = rank ( 0         0  I  0  0          0 )
                          ( 0         0  0  I  0          0 )
                          ( S2 E41    0  0  0  S2 E44 S3  0 )
                          ( 0         0  0  0  0          0 ),
the block dimensions of which are the same as those of Sa in Theorem 1. Obviously, the lower bound is achieved for E11 = -S1, E14 = 0, E41 = 0, E44 = 0. The upper bound is achieved for almost every ("random") choice of E11, E14, E41, E44. Observe that, if ra = rab + rac - rabc, then there is no S1 block in Sa and the minimum rank of A + BDC will be ra. Also observe that the minimum achievable rank, rab + rac - rabc, is precisely the number of restricted singular value triplets of the form (1, 0, 1), (1, 1, 0), and (1, 0, 0). □
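The bounds of Theorem 3 are easy to exercise numerically. In the sketch below (random matrices, hypothetical sizes), a random D generically attains the upper bound min(rab, rac):

```python
import numpy as np

# Numerical check of the Theorem 3 bounds (random example data):
#   rab + rac - rabc <= rank(A + BDC) <= min(rab, rac).
rng = np.random.default_rng(4)
m, n, p, q = 6, 5, 2, 2
A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))  # rank 3
B = rng.standard_normal((m, p))
C = rng.standard_normal((q, n))

rank = np.linalg.matrix_rank
rab = rank(np.hstack([A, B]))
rac = rank(np.vstack([A, C]))
rabc = rank(np.block([[A, B], [C, np.zeros((q, p))]]))

for _ in range(20):
    D = rng.standard_normal((p, q))
    r = rank(A + B @ D @ C)
    # A random D generically attains the upper bound.
    assert rab + rac - rabc <= r <= min(rab, rac)
```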
3.1.2. The unique rank minimizing matrix D. When is the matrix D that minimizes the rank of A + BDC unique? The answer is given in the following theorem.

THEOREM 4. Let D be such that rank(A + BDC) = rab + rac - rabc and assume that ra > rab + rac - rabc. Then the matrix D that minimizes the rank of A + BDC is unique if and only if:
1. rc = q,
2. rb = p,
3. rabc = rab + rc = rac + rb.
In the case where these conditions are satisfied, the matrix D is given as

D = Vb ( -S1  0 ) Uc*.
       (  0   0 )

Observe that the expression for the matrix D is nothing more than an OSVD!

Proof. It can be verified from the matrix in (5) that the rank of A + BDC is independent of the block matrices E12, E13, E21, E22, E23, E24, E31, E32, E33, E34, E42, E43. Hence, the rank minimizing matrix D will not be unique whenever one of the corresponding block dimensions is not zero, in which case it is parameterized by the blocks Eij in

(6)  D = Vb ( -S1  E12  E13  0   ) Uc*.
            ( E21  E22  E23  E24 )
            ( E31  E32  E33  E34 )
            ( 0    E42  E43  0   )

Setting the expressions for these block dimensions equal to zero results in the necessary conditions. The unique optimal matrix D is then given by D = VbEUc*, where

E = ( -S1  0 )
    (  0   0 )

with block rows of dimensions rabc + ra - rab - rac and rab - ra and block columns of dimensions rabc + ra - rab - rac and rac - ra. □
3.1.3. On the uniqueness of BDC: The extended shorted operator. A question related to that of §3.1.2 concerns the uniqueness of the product term BDC that minimizes the rank of A + BDC. As a matter of fact, this problem has received much attention in the literature, where the term BDC is called the extended shorted operator; it was introduced in [20]. It is an extension, to rectangular matrices, of the shorting of an operator considered by Krein, Anderson, and Trapp only for positive operators (see [20] for references).

DEFINITION 1 (the extended shorted operator¹). Let A (m × n), B (m × p), and C (q × n) be given matrices. A shorted matrix S(A|B, C) is any m × n matrix that satisfies the following conditions:

¹ We have slightly changed the notation that is used in [20].
1. R(S(A|B, C)) ⊆ R(B) and R(S(A|B, C)*) ⊆ R(C*).
2. If F is an m × n matrix satisfying R(F) ⊆ R(B) and R(F*) ⊆ R(C*), then

rank(A - F) ≥ rank(A - S(A|B, C)).
Hence, the shorted operator is a matrix whose column space belongs to the column space of B, whose row space belongs to the row space of C, and which minimizes the rank of A - F over all matrices F satisfying these conditions. From this, it follows that the shorted operator can be written as

S(A|B, C) = BDC

for a certain p × q matrix D. This establishes the direct connection of the concept of the extended shorted operator with the RSVD. The shorted operator is not always unique, as can be seen from the following example. Let
A = ( 1  1  0 ),   B = ( 1 ),   C = I3.
    ( 0  1  1 )        ( 0 )
    ( 0  0  1 )        ( 0 )

Then all matrices of the form

S = ( 1  α  β )
    ( 0  0  0 )
    ( 0  0  0 )

minimize the rank of A - S, which equals 2, for arbitrary α and β. Necessary conditions for uniqueness of the shorted operator can be found in a straightforward way from the RSVD.

THEOREM 5 (on the uniqueness of the extended shorted operator). Let the RSVD of the matrix triplet (A, B, C) be given as in Theorem 1. Then
S(A|B, C) = P^{-*} S(Sa|Sb, Sc) Q^{-1}.

The extended shorted operator S(A|B, C) is unique if and only if
1. rabc = rc + rab,
2. rabc = rb + rac,
and is then given by

S(A|B, C) = P^{-*} ( S1  0 ) Q^{-1},
                   ( 0   0 )

with S1 in the (1,1) block position of the partitioning of Sa.
Proof. It follows from Theorem 3 that the minimum rank of A + BDC is rab + rac - rabc, and that in this case E11 = -S1, E14 = 0, E41 = 0, E44 = 0. A short computation shows that

             ( -S1  E12     0  0  0       0 )
             ( 0    0       0  0  0       0 )
BDC = P^{-*} ( E21  E22     0  0  E24 S3  0 ) Q^{-1}.
             ( 0    0       0  0  0       0 )
             ( 0    S2 E42  0  0  0       0 )
             ( 0    0       0  0  0       0 )
Hence, the matrix BDC is unique if and only if the blocks E12, E21, E22, E24, and E42 do not appear in this decomposition. Setting the corresponding block dimensions equal to zero proves the theorem. □

Observe that the conditions for uniqueness of the extended shorted operator BDC are less restrictive than the uniqueness conditions for the matrix D (Theorem 4). As a consequence of Theorem 5, we also obtain a parameterization of all shorted operators in the case where the uniqueness conditions are not satisfied. All possible shorted operators are then parameterized by the matrices E12, E21, E22, E24, E42. Observe that the shorted operator is independent of the matrices E13, E23, E31, E32, E33, E34, E43. The result of Theorem 5, derived via the RSVD, corresponds to Theorem 4.1 and Lemma 5.1 of [20]. Some connections with the generalized Schur complement and statistical applications of the shorted operator can also be found in [20].
3.1.4. The minimum norm solution D that reduces the rank of A + BDC. Consider the problem of finding the matrix D of minimal (unitarily invariant) norm ||D|| such that

rank(A + BDC) ≤ r,

where r is a prescribed nonnegative integer. It follows from Theorem 3 that a necessary condition for a solution to exist is that ra > r ≥ rab + rac - rabc. Observe that if ra = rab + rac - rabc, no solution exists; in this case, there is no diagonal matrix S1 in Sa of Theorem 1. Assume that the required rank r equals the minimal achievable: r = rab + rac - rabc. Then, if the conditions of Theorem 4 are satisfied, the optimal D is unique and follows directly from the RSVD. The interesting case occurs whenever the rank minimizing D is not unique. Before examining matrices D that minimize the rank of A + BDC, note that, whenever min(rab, rac) - ra > 0, there exist many matrices that will increase the rank of A + BDC. In this case,

inf { ε = ||D|| : rank(A + BDC) > ra } = 0,
which implies that there exist arbitrarily "small" matrices D that will increase the rank. THEOREM 6. Consider all matrices D satisfying
tab
+ rac rbc <_ r rank(A + BDC) < r
I[.11
be any unitarily invariant norm. A matrix D
where r is a given integer and let minimal norm IlDll is given by
of
D=Vb( SO O0 )Uc,
where S[ is a singular diagonal matrix
r / rabc
rac
tab
ra
r
,r
contains the
r / rabc ra r
tab
rac (
o
0
S
o)
ra r smallest diagonal elements of St.
Proof. From the RSVD of the matrix triplet (A, B, C) it follows that

A + BDC = P^-*(Sa + Sb(Vb* D Uc)Sc)Q^-1 = P^-*(Sa + Sb E Sc)Q^-1,

with ||E|| = ||Vb* D Uc|| = ||D||. The result follows immediately from the partitioning of E as in (4) and from equation (5).

We could use Theorem 6 to define the restricted singular values σk as

σk = inf{ ε = σmax(D) : rank(A + BDC) <= k - 1 },

where σmax(.) denotes the maximum ordinary singular value. Because the rank of A + BDC cannot be reduced below rab + rac - rabc, there will be rab + rac - rabc infinite restricted singular values. There are ra + rabc - rab - rac finite restricted singular values, corresponding to the diagonal elements of S1. From (5), it can be seen that the diagonal elements of S2 and S3 can be used to increase the rank of A + BDC to min(rab, rac). However, from (7) it is obvious that min(rac - ra, rab - ra) restricted singular values will be zero.
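The cancellation argument above can be checked numerically in the special case B = Im, C = In, where the RSVD reduces to the OSVD and the minimum norm rank-reducing D of Theorem 6 reduces to the classical Eckart-Young truncation. The sketch below assumes that special case only; all variable names are ours, not the paper's.

```python
# Sketch (special case B = I, C = I): the smallest perturbation D with
# rank(A + D) <= r cancels the ra - r smallest singular values of A, and its
# spectral norm equals the largest cancelled singular value.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))          # generically ra = 5
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 3
D = -(U[:, r:] * s[r:]) @ Vt[r:, :]      # cancel the ra - r smallest values
assert np.linalg.matrix_rank(A + D) == r
assert np.isclose(np.linalg.norm(D, 2), s[r])
```

With a general (B, C) the same construction is carried out on the block E11 in the RSVD coordinates, which is what Theorem 6 expresses.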
3.1.5. The reverse problem: given ||D||, what is the minimal rank of A + BDC? The results of 3.1.3 and 3.1.4 allow us to obtain in a simple fashion the answer to the reverse question: assuming we are given a positive real number δ such that ||D|| <= δ, what is the minimum rank rmin of A + BDC? The answer is an immediate consequence of Theorem 6. Note that the optimal matrix D is given as the product of three matrices, which form its OSVD! Hence, ||D|| = ||S1^r|| and the integer rmin can be determined as follows. Let S1^k be the k x k diagonal matrix that contains the k smallest elements of S1. Then,

(8) rmin = ra - max{ k : ||S1^k|| <= δ }.
It is interesting to note that expressions of the form A + BDC with restrictions on the norm of D can be related to the notion of matrix balls, which show up in the analysis of so-called completion problems [6].

DEFINITION 2 (matrix ball). For given matrices A (m x n), B (m x p), and C (q x n), the closed matrix ball R(A|B, C) with center A, left semiradius B, and right semiradius C is defined by

R(A|B, C) = { X : X = A + BDC where ||D||2 <= 1 }.
Using Theorem 6 and (8), we can find all matrices of least rank within a given matrix ball by simply requiring that σmax(D) <= 1. The solution is obtained from the appropriate truncation of S1^r in Theorem 6. Since the solutions of the completion problems investigated in [6] are described in terms of matrix balls, it follows that we can find the minimal rank solution in the matrix ball of all solutions of the completion problems using the RSVD.
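Formula (8) can be illustrated in the same simplified setting as before (B = I, C = I, so the restricted singular values are the ordinary ones); δ = σmax(D) = 1 then corresponds to the least-rank member of the ball R(A|I, I). This is our own sketch, not the paper's algorithm.

```python
# Sketch of (8), special case B = I, C = I, spectral norm: with budget delta,
# the number of cancellable singular values is k = #{ i : sigma_i <= delta },
# so r_min = ra - k.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
s = np.linalg.svd(A, compute_uv=False)   # singular values, decreasing
ra = np.linalg.matrix_rank(A)

def r_min(delta):
    # largest cancelled singular value must fit under the budget
    k = int(np.sum(s <= delta))
    return ra - k

assert r_min(0.0) == ra                  # zero budget: rank unchanged
assert r_min(float(s[-1])) == ra - 1     # budget = smallest singular value
assert r_min(float(s[0])) == 0           # budget >= sigma_max: rank 0
```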
3.1.6. The matrix equation BDC = A. Consider the problem of investigating the consistency of, and, if consistent, finding a (minimum norm) solution to, the linear equation in the unknown matrix D:

BDC = A.
This equation has historical significance because it led Penrose to rediscover what is now called the Moore-Penrose pseudoinverse [21], [24]. Of course, this problem can be viewed as an extreme case of Theorems 3 and 6, with the prescribed integer r = 0.

THEOREM 7. The matrix equation BDC = A in the unknown matrix D is consistent if and only if

rab = rb,  rac = rc,  rabc = rb + rc.

All solutions are then given by

D = Vb ( S1   E13  0   ) Uc*,
       ( E31  E33  E34 )
       ( 0    E43  0   )

and a minimum norm solution corresponds to E13 = 0, E31 = 0, E33 = 0, E34 = 0, E43 = 0.

Proof. Let E = Vb* D Uc and partition E as in (4). The consistency of BDC = A depends on whether the following is satisfied with equality:
( E11     E12     0   E14 S3    )     ( S1  0  0  0 )
( E21     E22     0   E24 S3    )     ( 0   I  0  0 )
( 0       0       0   0         )  =  ( 0   0  I  0 )
( S2 E41  S2 E42  0   S2 E44 S3 )     ( 0   0  0  0 ).

Comparing the diagonal blocks, the conditions for consistency follow immediately as rabc = rb + rc, which implies rab = rb and rac = rc. These conditions express the fact that the column space of A should be contained in the column space of B and that the row space of A should be contained in the row space of C. If these conditions are satisfied, the matrix equation BDC = A is consistent and the matrix E = Vb* D Uc is given by
E = Vb* D Uc = ( E11  E13  E14 )
               ( E31  E33  E34 )
               ( E41  E43  E44 ).

The equation BDC = A is equivalent to

( E11     0  E14 S3    )     ( S1  0  0 )
( 0       0  0         )  =  ( 0   0  0 )
( S2 E41  0  S2 E44 S3 )     ( 0   0  0 ).

This is solved by E11 = S1, E14 = 0, E41 = 0, E44 = 0. Observe that the solution is independent of the blocks E13, E31, E33, E34, E43. Hence, all solutions can be parameterized as
D = ( Vb1  Vb3  Vb4 ) ( S1   E13  0   ) ( Uc1  Uc3  Uc4 )*.
                      ( E31  E33  E34 )
                      ( 0    E43  0   )

The minimum norm solution follows immediately.
Penrose originally proved [21], [24] that a necessary and sufficient condition for BDC = A to have a solution is

(9) B B^- A C^- C = A,

where B^- and C^- are inner inverses of B and C. All solutions D can then be written as

(10) D = B^- A C^- + Z - B^- B Z C C^-,
where Z is an arbitrary p x q matrix. It requires a tedious though straightforward calculation to verify that our solution of Theorem 7 coincides with (10). In order to verify this, consider the RSVD of (A, B, C) and use Lemma 1 to obtain an expression for the inner inverses of B and C, which will contain arbitrary matrices. Using the block dimensions of Sa, Sb, Sc as in Theorem 1, it can be shown that the consistency conditions of Theorem 7 coincide with the consistency condition (9). Before concluding this section, it is worth mentioning that all results of this section can be specialized to the case where either B or C equals the identity matrix. In this case, the RSVD specializes to the QSVD (Theorem 2) and, mutatis mutandis, the same type of questions, now related to two matrices, can be formulated and solved using the QSVD, such as shorted operators, minimum norm rank minimization, the solution of the matrix equation DC = A, etc.
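Penrose's conditions (9) and (10) are easy to check numerically by taking the Moore-Penrose pseudoinverse as a particular inner inverse. The sketch below constructs a consistent equation by hand; the rank-deficient B makes the Z-dependent term in (10) nontrivial.

```python
# Sketch verifying (9) and (10) with the pseudoinverse as inner inverse.
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # rank 2
C = rng.standard_normal((2, 4))
D0 = rng.standard_normal((3, 2))
A = B @ D0 @ C                        # BDC = A is consistent by construction

Bp, Cp = np.linalg.pinv(B), np.linalg.pinv(C)
# (9): consistency condition B B^- A C^- C = A
assert np.allclose(B @ Bp @ A @ Cp @ C, A)

# (10): D = B^- A C^- + Z - B^- B Z C C^- solves B D C = A for any Z
Z = rng.standard_normal((3, 2))
D = Bp @ A @ Cp + Z - Bp @ B @ Z @ C @ Cp
assert np.allclose(B @ D @ C, A)
```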
3.2. On low rank approximations of a partitioned matrix. In this section, the RSVD will be used to analyse and solve problems that can be stated in terms of the matrix(2) M = ( A B ; C D* ), where A, B, C, D are given matrices. The main results include the analysis of the (generalized) Schur complement [4] in terms of the RSVD (3.2.1), the range of ranks of the matrix M as D is modified and the analysis of the (non)unique matrix D that minimizes the rank of M (3.2.2), and finally the solution of the constrained total least squares problem with exact and noisy data by imposing additional norm constraints on D (3.2.3).

3.2.1. (Generalized) Schur complements and the RSVD. The notion of a Schur complement S of the matrix A in M (which is S = D* - C A^-1 B when A is square nonsingular) can be generalized to the case where the matrix A is rectangular and/or rank deficient [4] as follows.
DEFINITION 3 ((generalized) Schur complement). A generalized Schur complement of A in M = ( A B ; C D* ) is any matrix S = D* - C A^- B, where A^- is an inner inverse of A.

In general, there are many generalized Schur complements, because from Lemma 1 we know that there are many inner inverses. However, the RSVD allows us to investigate the dependency of S on the choice of the inner inverse.

THEOREM 8. The Schur complement S = D* - C A^- B is independent of A^- if and only if ra = rab = rac. In this case, S is given by

S = Uc ( E11 - S1^-1  E12  E13 ) Vb*,
       ( E21          E22  E23 )
       ( E31          E32  E33 )

i.e., by E with S1^-1 subtracted from its (1,1) block.
2 In order to keep the notation consistent with that of 3.1, we use the matrix D*, which is the complex conjugate transpose of D in 3.1, as the lower right block of M. This allows us, for instance, to use the same matrix E as defined in (3) and (4).
Proof. Consider the factorization A = P^-* Sa Q^-1 as in the RSVD. From Lemma 1, every inner inverse of A can be written as

A^- = Q ( S1^-1  0  0  0  X15  X16 ) P*,
        ( 0      I  0  0  X25  X26 )
        ( 0      0  I  0  X35  X36 )
        ( 0      0  0  I  X45  X46 )
        ( X51  X52  X53  X54  X55  X56 )
        ( X61  X62  X63  X64  X65  X66 )

for certain block matrices Xij, where the block dimensions correspond to the block dimensions of the matrix Sa of Theorem 1. It is straightforward to show that
C A^- B = Uc ( S1^-1   0       0  X15 S2     ) Vb*.
             ( 0       0       0  X25 S2     )
             ( 0       0       0  0          )
             ( S3 X51  S3 X53  0  S3 X55 S2  )

Hence, this product depends on the blocks X15, X25, X51, X53, X55. The corresponding block dimensions are 0 if and only if ra = rab = rac.

Observe that the theorem is equivalent to the statement that the (generalized) Schur complement S = D* - C A^- B is independent of the precise choice of A^- if and only if R(B) ⊂ R(A) and R(C*) ⊂ R(A*). This corresponds to Carlson's statement of the same result (Proposition 1 of [4]). In the case that these conditions are not satisfied, all possible generalized Schur complements are parameterized by the blocks X15, X25, X51, X53, and X55 as

(11) S = Uc ( E11 - S1^-1   E12           E13  E14 - X15 S2    ) Vb*.
            ( E21           E22           E23  E24 - X25 S2    )
            ( E31           E32           E33  E34             )
            ( E41 - S3 X51  E42 - S3 X53  E43  E44 - S3 X55 S2 )
3.2.2. How does the rank of M change with changing D? Define the matrix M(D̃) = ( A B ; C D* - D̃* ); we shall also write D̂* = D* - D̃*, so that D̃ = D - D̂ is the modification of D. How can we modify the rank of M(D̃) by changing the matrix D̃? Before answering this question, we need to state the following (well-known) lemma.

LEMMA 3 (rank of a partitioned matrix and the Schur complement). If A is square and nonsingular, then

rank ( A  B  ) = rank(A) + rank(D* - C A^-1 B).
     ( C  D* )

Proof. Observe that

( A  B  ) = ( I       0 ) ( A  0               ) ( I  A^-1 B ).
( C  D* )   ( C A^-1  I ) ( 0  D* - C A^-1 B   ) ( 0  I      )

Thus we have Theorem 9. THEOREM 9.
rank ( A  B        ) = rab + rac - ra + rank ( E11 - S1^-1 - Ẽ11  E12 - Ẽ12  E13 - Ẽ13 ),
     ( C  D* - D̃* )                          ( E21 - Ẽ21         E22 - Ẽ22  E23 - Ẽ23 )
                                              ( E31 - Ẽ31         E32 - Ẽ32  E33 - Ẽ33 )

where Ẽ is the representation of D̃* in the bases Uc, Vb, partitioned as in (4).
Proof. From the RSVD, it follows immediately that the required rank is equal to the rank of the matrix
( Sa  Sb     ),
( Sc  E - Ẽ )

written out in the block partitioning of Theorem 1 and (4). From the nonsingularity of S2 and S3, it follows that the rank is independent of the blocks Ẽ41, Ẽ42, Ẽ43, Ẽ14, Ẽ24, Ẽ34, Ẽ44. The result then follows immediately from Lemma
3, taking into account the block dimensions of the matrices.

A consequence of Theorem 9 is the following result.

COROLLARY 1. The range of ranks r of M attainable by an appropriate choice of D̃ in M(D̃) = ( A B ; C D* - D̃* ) is

rab + rac - ra <= r <= min(p + rac, q + rab).

The minimum is attained for

(12) D̃* = Uc ( E11 - S1^-1  E12  E13  Ẽ14 ) Vb*,
             ( E21          E22  E23  Ẽ24 )
             ( E31          E32  E33  Ẽ34 )
             ( Ẽ41          Ẽ42  Ẽ43  Ẽ44 )

where the matrices Ẽ14, Ẽ24, Ẽ34, Ẽ41, Ẽ42, Ẽ43, and Ẽ44 are arbitrary.
Compare the expression for D̃ of Corollary 1 with the expression for the generalized Schur complement of A in M, as given by (11). Obviously, the set of matrices D̃ contains all generalized Schur complements, which are those matrices D̃ for which Ẽ34 = E34 and Ẽ43 = E43. If these blocks are not present in E, there are no matrices D̃, other than generalized Schur complements, that minimize the rank of M. Hence, we have proved the following theorem.

THEOREM 10. The rank of M(D̃) is minimized for D̃ equal to a generalized Schur complement of A in M. The rank of M(D̃) is minimized only for D̃* = D* - C A^- B, where A^- is an inner inverse of A, if and only if rab = ra or rc = q, and rac = ra or rb = p. If ra = rab = rac, then the minimizing D̃ is unique.

Proof. The fact that each generalized Schur complement minimizes the rank of M(D̃) follows directly from the comparison of D̃ in Corollary 1 with the expression for the generalized Schur complement in (11). The rank conditions follow simply from setting the block dimensions of Ẽ34 and Ẽ43 in (4) equal to 0. The condition for uniqueness of D̃ follows from Theorem 8.

This theorem can also be found as Theorem 3 of [4], where it is proved via a different approach. Related results can be found in [7] and [31].
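The rank-minimizing property of the Schur complement is easy to illustrate numerically in the elementary case where A is square and nonsingular (our own sketch, not the paper's RSVD-based construction): choosing the lower-right block as C A^-1 B makes the Schur complement vanish, and the rank of M drops to rank(A B) + rank(A; C) - rank(A).

```python
# Sketch: min over X of rank( A B ; C X ) is attained at X = C A^{-1} B.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))          # square, generically nonsingular
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 4))

X = C @ np.linalg.solve(A, B)            # the Schur-complement choice
M = np.block([[A, B], [C, X]])
rank_min = (np.linalg.matrix_rank(np.hstack([A, B]))
            + np.linalg.matrix_rank(np.vstack([A, C]))
            - np.linalg.matrix_rank(A))
assert np.linalg.matrix_rank(M) == rank_min == 4
```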
3.2.3. Total linear least squares with exact rows and columns. The nomenclature total linear least squares was introduced in [16]. The technique is an extension of least squares fitting in the case where there are errors in both the observation vector b and the data matrix A for overdetermined sets of linear equations
Ax ≈ b. The analysis and solution are given completely in terms of the OSVD of the concatenated matrix (A b). In the case where some of the columns of A are noise-free while the others contain errors, a mixed least squares-total least squares strategy was developed in [17]. The problem where some rows are also error-free was analysed via a Schur complement-based approach in [7]. One of the key canonical decompositions (Lemma 2 of [7]) and related results concerning rank minimization were described earlier in [4]. Another useful reference is [31]. We shall now show how the RSVD allows us to treat the general situation in an elegant way. Again, let the data matrix be given as M = ( A B ; C D* ), where A, B, C are free of error and only D is contaminated by noise. It is assumed that the data matrix is of full row rank. The constrained total linear least squares problem is the following. Find the matrix D̂ and the nonzero vector x such that

( A  B  ) x = 0
( C  D̂* )

and ||D - D̂||F is minimized.
A slightly more general problem is the following. Find the matrix D̂ such that ||D - D̂||F is minimal and

(13) rank ( A  B  ) <= r.
          ( C  D̂* )

Assume that a solution x is found. The error matrix D - D̂ will be denoted by D̃. By partitioning x conformably with the dimensions of A and B, we find that the vector x satisfies

A x1 + B x2 = 0,
C x1 + D̂* x2 = 0.
Hence, the total least squares problem can be interpreted as follows. The rows of A and B correspond to linear constraints on the solution vector x. The columns of the matrix C contain error-free (noiseless) data, while those of the matrix D* are corrupted by noise. In order to find a solution, we must modify the matrix D, with minimum effort as measured by the Frobenius norm of the error matrix D̃, into the matrix D̂. Without the constraints imposed by the matrices A and B, the problem reduces to a mixed linear-total linear least squares problem, as analysed and solved in [17]. From the results in 3.2.2, we already know that a necessary condition for a solution to exist is r >= rab + rac - ra (Corollary 1). The class of rank minimizing matrices D̃ is described by Corollary 1 when r = rab + rac - ra. Theorem 10 shows how the generalized Schur complements of A in M form a subset of this set. From Corollary 1, it is straightforward to find the minimum norm matrix D̃ that reduces the rank of M(D̃) to r = rab + rac - ra. It is given by
D̃* = Uc ( E11 - S1^-1  E12  E13  0 ) Vb*.
         ( E21          E22  E23  0 )
         ( E31          E32  E33  0 )
         ( 0            0    0    0 )
The minimum norm generalized Schur complement that minimizes the rank of M is
given by

S = Uc ( E11 - S1^-1  E12  E13  0   ) Vb*.
       ( E21          E22  E23  0   )
       ( E31          E32  E33  E34 )
       ( 0            0    E43  0   )

This corresponds to a choice of inner inverse in (11) given by X15 = E14 S2^-1, X25 = E24 S2^-1, X51 = S3^-1 E41, X53 = S3^-1 E42, X55 = S3^-1 E44 S2^-1.
We shall now investigate two solution strategies, both of which are based on the RSVD. The first one is an immediate consequence of Theorem 6, but, while elegant and extremely simple, might be considered as suffering from some "overkill." It is a direct application of the insights obtained in analysing the sum A + BDC. The second one is less elegant, but is more in the line of the results reported in [4] and [7]. It exploits the insights obtained from analysing the partitioned matrix M = ( A B ; C D* ).

3.2.3.1. Constrained total linear least squares directly via the RSVD. It is straightforward to show that the constrained total least squares problem can be recast as a minimum norm problem as discussed in Theorem 6 as follows. Find the matrix D̃ of minimum norm ||D̃|| such that
rank( ( A  B ; C  D* ) + ( 0m×q ; Iq ) D̃* ( 0p×n  Ip ) ) <= r.

The solution is an immediate consequence of Theorem 6.

COROLLARY 2. The solution of the constrained total linear least squares problem follows from the application of Theorem 6 to the matrix triplet (A', B', C'), where

A' = ( A  B  ),   B' = ( 0m×q ),   C' = ( 0p×n  Ip ).
     ( C  D* )         ( Iq   )
Hence, all that we need is the RSVD of the matrix triplet (A', B', C') and the truncation of the matrix S1 as described in Theorem 6. It is interesting to also apply Theorem 3 to the matrix triplet (A', B', C'):

ra'b' = rank ( A  B  0  ) = rab + q,
             ( C  D* Iq )

ra'c' = rank ( A  B  ) = rac + p,
             ( C  D* )
             ( 0  Ip )

ra'b'c' = rank ( A  B  0  ) = ra + p + q.
               ( C  D* Iq )
               ( 0  Ip 0  )

Hence, from Theorem 3, the minimum achievable rank is ra'b' + ra'c' - ra'b'c' = rab + rac - ra, which corresponds precisely to the result from Corollary 1.

As a special case, consider the Golub-Hoffman-Stewart result [17] for the total linear least squares solution of (A B)x ≈ 0, where A is noise-free and B is contaminated with errors. Instead of applying the QR-SVD least squares solution as discussed in [17], we could as well obtain the mixed linear-total linear least squares solution from the following. Minimize ||B̃||F such that

rank( (A  B) + B̃ (0p×n  Ip) ) <= r,
where r is a prespecified integer. This can be done directly via the QSVD of the matrix pair ((A B), (0p×n Ip)), and it is not too difficult to provide another proof of the Golub-Hoffman-Stewart result derived in [17], now in terms of the properties of the QSVD. As a matter of fact, the RSVD of the matrix triplet of Corollary 2 allows us to provide a geometrical proof of constrained total linear least squares, in the line of the Golub-Hoffman-Stewart result, taking into account the structure of the matrices B' and C'. We shall not, however, consider this any further in this paper.
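The mixed linear-total linear least squares construction referred to above can be sketched with a QR factorization followed by an ordinary SVD truncation (our paraphrase of the Golub-Hoffman-Stewart idea, not the paper's QSVD derivation): the exact columns A are compressed by QR, and only the trailing block of B is truncated.

```python
# Sketch: smallest ||Btilde||_F with rank((A, B + Btilde)) <= r, A kept exact.
import numpy as np

rng = np.random.default_rng(4)
m, n, p, r = 8, 3, 4, 5                  # A: m x n exact, B: m x p noisy
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

Q, R = np.linalg.qr(np.hstack([A, B]), mode="complete")
R22 = R[n:, n:]                          # block untouched by the exact columns
U, s, Vt = np.linalg.svd(R22, full_matrices=False)
k = r - n                                # rank budget left for this block
R22_trunc = (U[:, :k] * s[:k]) @ Vt[:k, :]
Btilde = Q @ np.vstack([np.zeros((n, p)), R22_trunc - R22])

assert np.linalg.matrix_rank(np.hstack([A, B + Btilde])) == r
# Q is orthogonal, so the correction norm is that of the discarded dyads
assert np.isclose(np.linalg.norm(Btilde, "fro"), np.sqrt(np.sum(s[k:] ** 2)))
```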
3.2.3.2. Solution via RSVD-OSVD. While the solution to the constrained total least squares problem as presented in Corollary 2 is extremely simple, we might object to it because of the apparent "overkill" in computing the RSVD of the matrix triplet (A', B', C'), where B' and C' have an extremely simple structure (zeros and the identity matrix). It will now be shown that the RSVD, combined with the OSVD, may lead to a computationally simpler solution, which more closely follows the lines of the solution as presented in [7]. Using the RSVD, we find that

( A  B  ) = ( P^-*  0  ) ( Sa  Sb ) ( Q^-1  0   ),
( C  D̂* )   ( 0     Uc ) ( Sc  Ê  ) ( 0     Vb* )

where Ê represents D̂* in the bases Uc, Vb. Since Uc and Vb are unitary matrices, the problem can be restated as follows. Find Ê such that ||E - Ê||F is minimal and

rank ( Sa  Sb ) <= r.
     ( Sc  Ê  )

The constrained total least squares problem can now be solved as follows.

THEOREM 11 (RSVD-OSVD solution of constrained total least squares). Consider the OSVD
( E11 - S1^-1  E12  E13 )
( E21          E22  E23 )  =  Σ_{i=1}^{re} σi ui vi*,
( E31          E32  E33 )

where re is the rank of this matrix. The modification of minimal Frobenius norm follows immediately from the OSVD of this matrix by truncating its dyadic decomposition after r - rab - rac + ra terms. Let

Z = Σ_{i=1}^{r - rab - rac + ra} σi ui vi*.

Then the optimal D̃ is given by

D̃* = Uc ( W - Z  0 ) Vb*,   where W = ( E11 - S1^-1  E12  E13 ).
         ( 0      0 )                 ( E21          E22  E23 )
                                      ( E31          E32  E33 )
Proof. From Theorem 9, it follows that the rank of ( A  B ; C  D̂* ) can be reduced by reducing the rank of the matrix

( E11 - S1^-1  E12  E13 ).
( E21          E22  E23 )
( E31          E32  E33 )
The matrix D̃ is then obtained from (12) by setting the blocks Ẽ14, Ẽ24, Ẽ34, Ẽ41, Ẽ42, Ẽ43, Ẽ44 to 0 in order to minimize the Frobenius norm and then truncating the OSVD of the matrix above. We conclude this section by pointing out that more results, as well as algorithms to solve total least squares problems with and without constraints and given covariance matrices, can be found in [7], [28], [29], and [31].
3.3. Generalized Gauss-Markov models with constraints. Consider the problem of minimizing ||y||^2 + ||z||^2 = y*y + z*z over all vectors x, y, z satisfying

b = Ax + By,
z = Cx,
where A, B, C, b are given. This formulation is a generalization of the conventional least squares problem, where B = Im and C = 0. The formulation above admits singular or ill-conditioned matrices B and C. The problem formulation as presented here could be considered as a "square root" version of the following problem. Find x such that

(b - Ax)* Wb (b - Ax) + x* Wc x

is minimized, where ||u||^2_Wb = u* Wb u and Wb and Wc are nonnegative-definite symmetric matrices. In the case that BB* is nonsingular, we can put Wb = (BB*)^-1 and Wc = C*C. The solution can then be obtained as follows. Minimize ||y||^2 + ||z||^2, where

y*y = (b - Ax)* Wb (b - Ax),
z*z = x* C* C x.
Setting the derivative with respect to x equal to 0 results in

(14) x = (A* Wb A + Wc)^-1 A* Wb b.
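Formula (14) is easy to exercise on a small hypothetical example; the helper name below is ours, not the paper's.

```python
# Sketch of (14): x = (A* Wb A + Wc)^{-1} A* Wb b.  With Wb = I, Wc = 0 this
# is ordinary least squares; with Wc = lambda*I it behaves like ridge
# regression (Wc = C*C for C = sqrt(lambda)*I), shrinking the solution.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)

def gauss_markov(A, b, Wb, Wc):
    return np.linalg.solve(A.T @ Wb @ A + Wc, A.T @ Wb @ b)

x = gauss_markov(A, b, np.eye(10), np.zeros((4, 4)))
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ls)              # classical least squares recovered

x_reg = gauss_markov(A, b, np.eye(10), 10.0 * np.eye(4))
assert np.linalg.norm(x_reg) < np.linalg.norm(x)   # regularization shrinks x
```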
In the case where Wb = Im and C = 0, (14) reduces to the classical least squares expression. For the more general case, we can see a connection with so-called regularization problems. Consider the case where C ≠ 0 and B = Im. If the matrix A is ill conditioned (because of so-called collinearities, which are (almost) linear dependencies among the columns of A), the addition of the term C*C may make the sum A*A + C*C better suited for numerical inversion than the original product A*A, hence stabilizing the solution x. The matrix B acts as a "static" noise filter: typically, it is assumed that the vector y is normally distributed with the covariance matrix E(yy*) being a multiple of the identity. The error vector By for the first equation can only be in a direction which is present in the column space of B. If the observation vector b has some component in a certain direction not present in the column space of B, this component should be considered as error-free. The matrix C represents a weighting on the components of x. It reflects possible a priori information concerning the unknown components of x, or may reflect the fact that certain components of x (or linear combinations thereof) are more "likely" or less costly than others. The fact that we try to minimize y*y + z*z reflects the intention to explain as much as possible (i.e., min y*y) in terms of the data (columns of the matrix A), taking into account a priori knowledge of the
geometrical distribution of the noise (the weighting Wb). The matrix C reflects the cost per component, expressing the preference (or prejudice?) of the modeller to use more of one variable in explaining the phenomenon than of another. In applications, however, typically the matrix A contains many more rows than columns, which corresponds to the fact that better results are to be expected if there are more equations (measurements) than unknowns. However, the condition that BB* is nonsingular requires a priori knowledge concerning the statistics of the noise. Because typically this knowledge is rather limited, B will have fewer columns than rows, implying that BB* is singular and (14) does not apply. In this case, however, the RSVD can be applied. It provides important geometrical information on the sensitivity of the solution. Inserting the RSVD of the matrix triplet (A, B, C), the problem can be rewritten as
P*b = Sa (Q^-1 x) + Sb (Vb* y),
Uc* z = Sc (Q^-1 x).

Define b' = P*b, x' = Q^-1 x, y' = Vb* y, z' = Uc* z. Then, with obvious partitionings of b', x', y', z', the constraints decouple into equations in the subvectors b'i, x'i, y'i, z'i (i = 1, ..., 6).
Observe that b'6 = 0 is a consistency condition. It reflects the fact that b is not allowed to have a component in a direction that is not present in the column space of (A B). Certain components of x' can be estimated without error, while the fact that b'5 = S2 y'4 could be exploited to estimate the variance of the noise. Most terms in the objective function y*y + z*z can now be expressed in the subvectors x'i, y'i, z'i (i = 1, ..., 6).
The minimum solution follows from differentiation with respect to these vectors and results in

x'2 = b'2,   y'2 = 0,   z'2 = 0,
x'3 = b'3,   y'3 = 0,   z'3 = 0,
x'4 = b'4,   y'4 = S2^-1 b'5,   z'4 = 0,
x'6 arbitrary.
Statistical properties, such as (un)biasedness and consistency, can be analysed in the same spirit as in [23], where Paige related the Gauss-Markov model without the z-equation to the QSVD. Similarly, the RSVD also allows us to analyse the sensitivity of the solution. If, for instance, S2 is ill conditioned, then the minimum of the objective function will tend to be high whenever b has strong components along the "weak" singular vectors of S2, because of the term y'4 = S2^-1 b'5.
A related problem is the following. Minimize y*y subject to b = Ax + By and Cx = c, where A, B, C, b, c are given. This is also a Gauss-Markov linear estimation problem as in [23], but now with constraints. The solution is again straightforward from the RSVD. With b' = P*b, x' = Q^-1 x, y' = Vb* y, c' = Uc* c, and an appropriate partitioning, we find

y'2 = 0,   y'3 = 0,   x'1 = S1^-1 c'1,   x'6 arbitrary.

Observe that b'6 = 0 and c'4 = 0 are two consistency conditions.
4. Conclusions and perspectives. In this paper, we have derived a generalization of the OSVD, the restricted singular value decomposition (RSVD), which has the OSVD, PSVD, and QSVD as special cases. A constructive proof, based upon a sequence of OSVDs and PSVDs, can be found in [9]. We have also analysed in detail its structural and geometrical properties and its relations to generalized eigenvalue problems and canonical correlation analysis. It was shown how the RSVD is a valuable tool in the analysis and solution of rank minimization problems with restrictions. First, we have shown how to study expressions of the form A + BDC and find matrices D of minimum norm that minimize the rank. It was demonstrated how this problem is connected to the concept of shorted operators and matrix balls. Second, we have analysed in detail low rank approximations of a partitioned matrix, when only one of its blocks can be modified. The close relation with generalized Schur complements was discussed, and it was shown how the RSVD permits us to solve constrained total linear least squares problems with mixed exact and noisy data. Third, it was demonstrated how the RSVD provides an elegant solution to Gauss-Markov models with constraints. The fact that the RSVD is only the tip of an iceberg of generalizations of the OSVD for 2, 3, 4, ... matrices is fully explored in [11].
Acknowledgments. We would like to thank Hongyuan Zha, Sabine Van Huffel and the referees for their detailed comments and suggestions, which allowed us to improve considerably an earlier version of this paper. We would also like to thank Dan Boley for providing us with an English translation of Beltrami’s original paper.
REFERENCES
[1] L. AUTONNE, Sur les groupes linéaires, réels et orthogonaux, Bull. Soc. Math. France, 30 (1902), pp. 121-133.
[2] E. BELTRAMI, Sulle funzioni bilineari, in Giornale di Matematiche, G. Battaglini and E. Fergola, eds., 11 (1873), pp. 98-106.
[3] A. BJÖRCK AND G. H. GOLUB, Numerical methods for computing angles between linear subspaces, Math. Comp., 27 (1973), pp. 579-594.
[4] D. CARLSON, What are Schur complements, anyway?, Linear Algebra Appl., 74 (1986), pp. 257-275.
[5] J. S. CHIPMAN, Estimation and aggregation in econometrics: An application of the theory of generalized inverses, in Generalized Inverses and Applications, Academic Press, New York, 1976, pp. 549-769.
[6] C. DAVIS, W. M. KAHAN, AND H. F. WEINBERGER, Norm-preserving dilations and their application to optimal error bounds, SIAM J. Numer. Anal., 19 (1982), pp. 445-469.
[7] J. W. DEMMEL, The smallest perturbation of a submatrix which lowers the rank and constrained total least squares problems, SIAM J. Numer. Anal., 24 (1987), pp. 199-206.
[8] B. DE MOOR AND G. H. GOLUB, Generalized singular value decompositions: A proposal for a standardized nomenclature, Numerical Analysis Project Manuscript NA-89-03, Department of Computer Science, Stanford University, Stanford, CA, April 1989; also ESAT-SISTA Report 1989-10, Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium, April 1989.
[9] B. DE MOOR AND G. H. GOLUB, The restricted singular value decomposition: Properties and applications, Numerical Analysis Project Manuscript NA-89-04, Department of Computer Science, Stanford University, Stanford, CA, April 1989; also ESAT-SISTA Report 1989-09, Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium, April 1989.
[10] B. DE MOOR, On the structure and geometry of the product singular value decomposition, Numerical Analysis Project Manuscript NA-89-05, Department of Computer Science, Stanford University, Stanford, CA, May 1989; also ESAT-SISTA Report 1989-12, Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium, May 1989.
[11] B. DE MOOR AND H. ZHA, A tree of generalizations of the ordinary singular value decomposition, ESAT-SISTA Report 1989-21 and revised version ESAT-SISTA Report 1990-11, Department of Electrical Engineering, Katholieke Universiteit Leuven, Belgium; also in the special issue on Matrix Canonical Forms, Linear Algebra Appl., to appear.
[12] C. ECKART AND G. YOUNG, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211-218.
[13] L. M. EWERBRING AND F. T. LUK, Canonical correlations and generalized SVD: Applications and new algorithms, Proc. SPIE, Vol. 977, Real Time Signal Processing XI, paper 23, 1988.
[14] K. V. FERNANDO AND S. J. HAMMARLING, A product induced singular value decomposition for two matrices and balanced realisation, in Linear Algebra in Signals, Systems and Control, B. N. Datta, C. R. Johnson, M. A. Kaashoek, R. Plemmons, and E. Sontag, eds., Society for Industrial and Applied Mathematics, Philadelphia, PA, 1988, pp. 128-140.
[15] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 1983.
[16] G. H. GOLUB AND C. F. VAN LOAN, An analysis of the total least squares problem, SIAM J. Numer. Anal., 17 (1980), pp. 883-893.
[17] G. H. GOLUB, A. HOFFMAN, AND G. W. STEWART, A generalization of the Eckart-Young-Mirsky matrix approximation theorem, Linear Algebra Appl., 88/89 (1987), pp. 317-327.
[18] W. E. LARIMORE, Identification of nonlinear systems using canonical variate analysis, in Proc. 26th Conference on Decision and Control, Los Angeles, CA, December 1987.
[19] C. LAWSON AND R. HANSON, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, NJ, 1974.
[20] S. K. MITRA AND M. L. PURI, Shorted matrices—An extended concept and some applications, Linear Algebra Appl., 42 (1982), pp. 57-79.
[21] M. Z. NASHED, ED., Generalized Inverses and Applications, Academic Press, New York, 1976.
[22] C. C. PAIGE AND M. A. SAUNDERS, Towards a generalized singular value decomposition, SIAM J. Numer. Anal., 18 (1981), pp. 398-405.
[23] C. C. PAIGE, The general linear model and the generalized singular value decomposition, Linear Algebra Appl., 70 (1985), pp. 269-284.
[24] R. PENROSE, A generalized inverse for matrices, Proc. Cambridge Philos. Soc., 51 (1955), pp. 406-413.
[25] R. PENROSE, On best approximate solutions of linear matrix equations, Proc. Cambridge Philos. Soc., 52 (1956), pp. 17-19.
[26] J. J. SYLVESTER, Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique, Comptes Rendus, CVIII (1889), pp. 651-653.
[27] J. VANDEWALLE AND B. DE MOOR, A variety of applications of the singular value decomposition, in SVD and Signal Processing: Algorithms, Applications and Architectures, E. Deprettere, ed., North-Holland, Amsterdam, 1988, pp. 43-91.
[28] S. VAN HUFFEL AND J. VANDEWALLE, Analysis and properties of the generalized total least squares problem AX ≈ B when some or all columns in A are subject to error, SIAM J. Matrix Anal. Appl., 10 (1989), pp. 294-315.
[29] S. VAN HUFFEL AND H. ZHA, Restricted total least squares: A unified approach for solving (generalized) (total) least squares problems with(out) equality constraints, ESAT-SISTA Report 1989-05, Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium, March 1989.
[30] C. F. VAN LOAN, Generalizing the singular value decomposition, SIAM J. Numer. Anal., 13 (1976), pp. 76-83.
[31] G. A. WATSON, The smallest perturbation of a submatrix which lowers the rank of the matrix, IMA J. Numer. Anal., 8 (1988), pp. 295-303.
[32] H. ZHA, Restricted SVD for matrix triplets and rank determination of matrices, Scientific Report 89-2, Konrad-Zuse-Zentrum für Informationstechnik, Berlin, Germany, 1989.
[33] H. ZHA, A numerical algorithm for computing the RSVD for matrix triplets, Scientific Report 89-1, Konrad-Zuse-Zentrum für Informationstechnik, Berlin, Germany, 1989.