Matrix manipulation can be found in many applied mathematical fields, and the principal reason for this is the utility of matrices in solving systems of simultaneous linear equations.
Definition 5.1.1. System of Linear Equations.
A system of linear equations (SLE) is a collection of \(m\) equations in the \(n\) variables \(z_0, z_1, \dotsc, z_{n-1}\) of the form
\begin{align*}
a_{0,0}z_0 + a_{0,1}z_1 + a_{0,2}z_2 + \cdots + a_{0,n-1}z_{n-1} \amp = b_0\\
a_{1,0}z_0 + a_{1,1}z_1 + a_{1,2}z_2 + \cdots + a_{1,n-1}z_{n-1} \amp = b_1\\
a_{2,0}z_0 + a_{2,1}z_1 + a_{2,2}z_2 + \cdots + a_{2,n-1}z_{n-1} \amp = b_2\\
\amp\vdots\\
a_{m-1,0}z_0 + a_{m-1,1}z_1 + a_{m-1,2}z_2 + \cdots + a_{m-1,n-1}z_{n-1} \amp = b_{m-1}
\end{align*}
where all the values \(a_{r,c},b_r\in\Comps\) for \(r\in\upto{m}\) and \(c\in\upto{n}\text{.}\) A vector \(\vv\in\CV{n}\) is a solution to the system of linear equations if and only if
\begin{equation*}
a_{r,0}\entry{\vv}{0} + a_{r,1}\entry{\vv}{1} + \cdots + a_{r,n-1}\entry{\vv}{n-1} = \sum_{c=0}^{n-1}a_{r,c}\entry{\vv}{c} = b_r
\end{equation*}
for all \(r\in\upto{m}\text{.}\)
This should look familiar, since each left-hand side is precisely a linear combination of the variables!
Since we already have the language of matrices and vectors, we can take the definition of an SLE and recast it as a vector equation very easily. Consider the following: given
a constant matrix \(A\in\Mats{m,n}\) satisfying \(\entry{A}{r,c}=a_{r,c},\)
a variable vector \(\vv\in\CV{n}\) satisfying \(\entry{\vv}{c} = z_c\text{,}\) and
a constant vector \(\vec{b}\in\CV{m}\) satisfying \(\entry{\vec{b}}{r} = b_r\)
for all \(r\in\upto{m}\) and \(c\in\upto{n}\text{,}\) we can recast the equation using bracket notation for entries. In fact,
\begin{equation*}
\entry{\vec{b}}{r} = \sum_{c=0}^{n-1}\entry{A}{r,c}\entry{\vv}{c} = \entry{A\vv}{r}
\end{equation*}
for all \(r\in\upto{m}\) by definition of matrix-vector product, and so by definition of vector equality this is the vector equation
\begin{equation*}
A\vv = \vec{b}\text{.}
\end{equation*}
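As a quick illustration of this recasting, here is a small Python sketch (the helper names matvec and is_solution are our own, not part of the text) which checks whether a candidate vector solves a system by comparing each entry of \(A\vv\) with the corresponding \(b_r\text{.}\)

def matvec(A, v):
    """Compute the matrix-vector product A v, with A stored as a list of rows."""
    return [sum(a_rc * v_c for a_rc, v_c in zip(row, v)) for row in A]

def is_solution(A, v, b):
    """True exactly when A v = b, entry by entry."""
    return matvec(A, v) == b

# The system  z0 + 2*z1 = 5,  3*z0 - z1 = 1  has the solution (1, 2).
A = [[1, 2], [3, -1]]
b = [5, 1]
print(is_solution(A, [1, 2], b))   # True
print(is_solution(A, [2, 1], b))   # False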
Definition 5.1.2. Augmented matrix representation of an SLE.
The SLE represented by a vector equation \(A\vv = \vec{b}\) also has an augmented matrix representation, specifically the matrix \([A|\vec{b}]\in\Mats{m,n+1}\text{.}\)
In order to build an algorithm for solving systems of equations, we need to carefully define a few more matrix operations.
Subsection 5.1.1 Elementary row operations
There are precisely three new operations which must be added to the algebraic operations on \(\Mats{m,n}\) to allow us to build an amazing algorithm. We will call these the row swap, the row scaling, and the row elimination operations; the last name describes not the operation itself so much as the use to which we will put it.
Definition 5.1.3. Row swap operation.
The operation \(\rswap{k}{\ell}\) applied to the matrix \(A\in\Mats{m,n}\) produces the matrix \(A'\in\Mats{m,n}\) satisfying
\begin{equation*}
\entry{A'}{r,c} = \begin{cases}
\entry{A}{r,c} \amp r\notin\set{k,\ell} \\
\entry{A}{k,c} \amp r=\ell \\
\entry{A}{\ell,c} \amp r=k
\end{cases}\text{.}
\end{equation*}
That is to say, \(A'\) is the same as \(A\) except rows \(k\) and \(\ell\) have been swapped.
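As a minimal Python sketch (not part of the definition; the function name row_swap is ours, and the matrix is stored simply as a list of rows), the row swap operation might be implemented as follows.

def row_swap(A, k, ell):
    """Return a copy of A (a list of rows) with rows k and ell exchanged."""
    B = [row[:] for row in A]      # copy each row so that A itself is untouched
    B[k], B[ell] = B[ell], B[k]    # exchange the two rows
    return B

# Example: swap rows 0 and 2 of a 3x2 matrix.
A = [[1, 2], [3, 4], [5, 6]]
print(row_swap(A, 0, 2))           # [[5, 6], [3, 4], [1, 2]]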
Definition 5.1.4. Row scaling operation.
The operation \(\rscale{\alpha}{k}\) applied to the matrix \(A\in\Mats{m,n}\) produces the matrix \(A'\in\Mats{m,n}\) satisfying
\begin{equation*}
\entry{A'}{r,c} = \begin{cases}
\entry{A}{r,c} \amp r\neq k \\
\alpha\entry{A}{r,c} \amp r=k
\end{cases}\text{.}
\end{equation*}
That is to say, \(A'\) is the same as \(A\) except all entries in row \(k\) have been scaled by a factor of \(\alpha\text{.}\)
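Continuing the sketch above (again, the name row_scale is our own), the row scaling operation might be implemented like so; note that Python writes complex scalars with the literal j, matching the examples below.

def row_scale(A, alpha, k):
    """Return a copy of A with every entry of row k multiplied by alpha."""
    B = [row[:] for row in A]
    B[k] = [alpha * entry for entry in B[k]]
    return B

# Example: scale row 1 by the complex scalar 2-1j.
A = [[1, 0], [1 - 2j, 3j]]
print(row_scale(A, 2 - 1j, 1))     # row 1 becomes [(1-2j)*(2-1j), 3j*(2-1j)]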
Definition 5.1.5. Row elimination operation.
The operation \(\relim{\alpha}{k}{\ell}\) applied to the matrix \(A\in\Mats{m,n}\) produces the matrix \(A'\in\Mats{m,n}\) satisfying
\begin{equation*}
\entry{A'}{r,c} = \begin{cases}
\entry{A}{r,c} \amp r\neq \ell \\
\alpha\entry{A}{k,c}+\entry{A}{\ell,c} \amp r=\ell
\end{cases}\text{.}
\end{equation*}
That is to say, \(A'\) is produced from \(A\) by scaling row \(k\) of \(A\) by a factor of \(\alpha\) and adding the corresponding entries of that scaled row to the entries of row \(\ell\text{;}\) the resulting row is stored in row \(\ell\text{,}\) while row \(k\) of \(A'\) is identical to row \(k\) of \(A\text{.}\)
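A matching sketch for the elimination operation (the name row_elim is ours): \(\alpha\) times row \(k\) is added, entry by entry, to row \(\ell\text{.}\)

def row_elim(A, alpha, k, ell):
    """Return a copy of A in which alpha times row k has been added to row ell."""
    B = [row[:] for row in A]
    B[ell] = [alpha * a_kc + a_lc for a_kc, a_lc in zip(B[k], B[ell])]
    return B

# Example: add -2 times row 0 to row 1, eliminating the leading 4 in row 1.
A = [[2, 1], [4, 5]]
print(row_elim(A, -2, 0, 1))       # [[2, 1], [0, 3]]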
Without examples, these three operations can be quite confusing.
Example 5.1.6. Examples of the elementary row operations.
Let us begin with a particular matrix \(A\in\Mats{3,5}\text{,}\)
\begin{equation*}
A = \begin{bmatrix}
-7 \amp 1 \amp -1+1j \amp 16+8j \amp -44+1j \\
-22j \amp 1-2j \amp 1-11j \amp -1+3j \amp 29+2j \\
-3-3j \amp 90j \amp -2+1j \amp -1-1j \amp 0
\end{bmatrix}\text{.}
\end{equation*}
We will denote the performance of a row operation with an annotated arrow, like so: \(A\xto{\rswap{0}{2}} A'\text{.}\)
(a) Swapping rows.
This operation does not introduce any problems, as it simply rearranges the elements of a matrix. For instance,
\begin{equation*}
A \xto {\rswap{0}{2}} \begin{bmatrix}
-3-3j \amp 90j \amp -2+1j \amp -1-1j \amp 0 \\
-22j \amp 1-2j \amp 1-11j \amp -1+3j \amp 29+2j \\
-7 \amp 1 \amp -1+1j \amp 16+8j \amp -44+1j
\end{bmatrix}
\end{equation*}
(b) Scaling a row.
Unlike swapping rows, scaling a row can introduce difficulties, as complex multiplication (especially by a reciprocal) can be tedious and may require a conversion to floating-point representations. For an easy example, see this:
\begin{equation*}
A \xto{\rscale{(2-1j)}{1}} \begin{bmatrix}
-7 \amp 1 \amp -1+1j \amp 16+8j \amp -44+1j \\
-22-44j \amp -5j \amp -9-23j \amp 1+7j \amp 60-25j \\
-3-3j \amp 90j \amp -2+1j \amp -1-1j \amp 0
\end{bmatrix}
\end{equation*}
(c) Eliminating a row.
While this operation can be used for any \(\alpha\in\Comps\) and valid row indices, its most typical use is to change the leftmost nonzero entry in a row into a zero by adding a multiple of another row. For instance, the leftmost nonzero entry of the row with index 1 in \(A\) is \(-22j\text{;}\) to eliminate this \(-22j\) by adding a scalar multiple of the row with index 0 requires scaling that row by the constant \(-\frac{22}7 j\text{.}\) That is,
\begin{equation*}
A\xto{\relim{(-\frac{22}7 j)}{0}{1}} \begin{bmatrix}
-7 \amp 1 \amp -1+1j \amp 16+8j \amp -44+1j \\
0 \amp \frac17(7-36j) \amp \frac17(29-55j) \amp \frac17(169-331j) \amp \frac17(225+982j) \\
-3-3j \amp 90j \amp -2+1j \amp -1-1j \amp 0
\end{bmatrix}\text{.}
\end{equation*}
The difficulties here are not found in exact arithmetic, but in the approximation to exact arithmetic which is carried out during floating-point computation. With exact numeric types none of these problems occur, but this necessitates the use of a computer algebra system or mathematical resources more sophisticated than Python's math package. In this example, for instance, we see the following:
>>> 1-36j/7
(1-5.142857142857143j)
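To see the contrast between exact and floating-point arithmetic concretely, here is a short sketch; it assumes the third-party SymPy library is available, which is one (but not the only) way to obtain exact complex rational arithmetic in Python.

from sympy import I, Rational

# The elimination multiplier -22/7 j applied to the entry 1, plus 1-2j, kept exact:
exact = Rational(-22, 7) * I * 1 + (1 - 2*I)
print(exact)                       # 1 - 36*I/7, with no rounding at all

# The same computation with Python's built-in complex floats:
approx = (-22 / 7 * 1j) * 1 + (1 - 2j)
print(approx)                      # approximately (1-5.142857...j)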
It is interesting that for each of these elementary row operations there is an associated elementary matrix which, via left-multiplication, produces the same effect.
Theorem 5.1.7. Elementary matrices.
Let \(I_m\in\Mats{m,m}\) be the unique identity matrix of size \(m\text{,}\) satisfying
\begin{equation*}
\entry{I_m}{r,c} = \begin{cases}1,\amp r=c\\ 0,\amp r\neq c\end{cases}\text{.}
\end{equation*}
Moreover, let
\begin{align*}
I_m \amp\xto{\rswap{k}{\ell}} S_{k,\ell}
\amp
I_m \amp\xto{\rscale{\alpha}{k}} M_{\alpha, k}
\amp
I_m \amp\xto{\relim{\alpha}{k}{\ell}} E_{\alpha,k,\ell}\text{.}
\end{align*}
Then for any \(A\in\Mats{m,n}\text{,}\)
\begin{align*}
A\amp\xto{\rswap{k}{\ell}} S_{k,\ell} A
\amp
A\amp\xto{\rscale{\alpha}{k}} M_{\alpha,k} A
\amp
A\amp\xto{\relim{\alpha}{k}{\ell}} E_{\alpha,k,\ell} A\text{.}
\end{align*}
The matrix \(I_m\in\Mats{m,m}\) serves as the left-multiplicative identity on \(\Mats{m,n}\text{,}\) since \(I_m A = A\) for every \(A\in\Mats{m,n}\text{.}\) Moreover in \(\Mats{m,m}\) the matrix \(I_m\) is the multiplicative identity for both sides of multiplication, since \(I_mA = A = AI_m\) whenever \(A\in\Mats{m,m}\text{.}\)
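Here is a small numerical sketch of Theorem 5.1.7, assuming the NumPy library; the helper names swap_matrix, scale_matrix, and elim_matrix are our own. Each elementary matrix is exactly what the corresponding row operation produces from the identity, and left-multiplying by it reproduces the operation.

import numpy as np

def swap_matrix(m, k, ell):
    """S_{k,ell}: the identity matrix with rows k and ell exchanged."""
    S = np.eye(m, dtype=complex)
    S[[k, ell]] = S[[ell, k]]
    return S

def scale_matrix(m, alpha, k):
    """M_{alpha,k}: the identity matrix with row k scaled by alpha."""
    M = np.eye(m, dtype=complex)
    M[k, k] = alpha
    return M

def elim_matrix(m, alpha, k, ell):
    """E_{alpha,k,ell}: the identity after relim(alpha,k,ell), i.e. alpha in position (ell,k)."""
    E = np.eye(m, dtype=complex)
    E[ell, k] = alpha
    return E

# Left-multiplying by S_{0,2} swaps rows 0 and 2 of any matrix with 3 rows.
A = np.arange(12).reshape(3, 4).astype(complex)
print(swap_matrix(3, 0, 2) @ A)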
Subsection 5.1.2 Row equivalence and solutions to SLEs
It is beyond the scope of this class to prove the following results, but they are intrinsically important to the study of SLEs.
Definition 5.1.8. Row-equivalent matrices.
Two matrices \(A,A'\in\Mats{m,n}\) are row-equivalent if there is a finite sequence \(S\) of elementary row operations such that \(A\xto{S} A'\text{.}\)
Theorem 5.1.9. Row-equivalent augmented matrices have the same solutions.
Suppose that \(A,A'\in\Mats{m,n}\) and \(\vec{b},\vec{b'}\in\CV{m}\) such that the augmented matrices \([A|\vec{b}]\) and \([A'|\vec{b}']\) are row-equivalent. Then \(\vv\in\CV{n}\) is a solution to the augmented system \([A|\vec{b}]\) if and only if \(\vv\) is a solution to the augmented system \([A'|\vec{b}']\text{;}\) that is, \(A\vv = \vec{b}\) if and only if \(A'\vv = \vec{b}'\text{.}\)
This result becomes truly useful once we discover that there is a particular form of matrix which allows for easy determination of the solutions of an augmented system.
Definition 5.1.10. Reduced row-echelon form.
A matrix \(A\in\Mats{m,n}\) is in reduced row-echelon form, abbreviated RREF, if and only if it satisfies the following conditions:
- ZR@B: Zero-Rows at the Bottom
Any row of \(A\) consisting only of zeros has a greater row index than any row containing any nonzero elements; visually this is further “down” in the matrix.
- LMNZ=1: Leftmost Nonzeros Equal 1
In any row of \(A\) not consisting only of zeros, the leftmost nonzero (LMNZ) entry is a 1. The LMNZ is the nonzero entry of least column index.
- LMNZKC: Leftmost Nonzeros Kill Columns
In a column containing a LMNZ, the only nonzero entry in the column is the LMNZ.
- LMNZGDR: Leftmost Nonzeros Go Down and Right
If rows \(k\) and \(\ell\) both contain LMNZs and \(k\lt\ell\text{,}\) then the column index of the LMNZ in row \(k\) is less than the column index of the LMNZ in row \(\ell\text{;}\) visually that makes the LMNZs go down and right, starting in the upper left corner of the matrix.
These can be formalized in terms of entry notation using brackets, but it is not fully necessary to do so. Moreover, let \(D=\set{d_0,d_1,\dotsc,d_{s-1}}\) be the set of column indices containing LMNZs; these are called pivot columns, as another term for the matrix position containing a LMNZ is a pivot position. Similarly let \(F=\set{f_0,f_1,\dotsc,f_{n-s-1}}\) be the complementary set of column indices not containing a LMNZ, so that \(D\cup F = \upto{n}\text{.}\)
The sets \(D\) and \(F\) will be used algorithmically to produce a general solution to a system of linear equations.
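As a small algorithmic sketch (the function name pivot_and_free_columns is our own), the sets \(D\) and \(F\) can be read off of a matrix already in RREF by locating the LMNZ of each nonzero row.

def pivot_and_free_columns(R):
    """For R (a list of rows) assumed to be in RREF, return (D, F): the
    pivot-column indices and the remaining free-column indices."""
    n = len(R[0])
    D = []
    for row in R:
        lmnz = next((c for c, entry in enumerate(row) if entry != 0), None)
        if lmnz is not None:           # zero rows contribute no pivot column
            D.append(lmnz)
    F = [c for c in range(n) if c not in D]
    return D, F

# A matrix in RREF with pivot columns 0, 1, 3 and free columns 2, 4.
R = [[1, 0, 2, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0]]
print(pivot_and_free_columns(R))       # ([0, 1, 3], [2, 4])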
Example 5.1.11. Matrices in RREF and not in RREF.
(a) A matrix in RREF.
The matrix
\begin{equation*}
A = \begin{bmatrix}
1 \amp 0 \amp 2 \amp 0 \amp 0 \\
0 \amp 1 \amp 1 \amp 0 \amp 0 \\
0 \amp 0 \amp 0 \amp 1 \amp 0 \\
0 \amp 0 \amp 0 \amp 0 \amp 0
\end{bmatrix}
\end{equation*}
is in RREF. Check:
(i) ZR@B.
The matrix has one row consisting entirely of zeros, and it occurs at the bottom, beneath every row containing a nonzero element. Good.
(ii) LMNZ=1.
The rows indexed 0, 1, and 2 each contain nonzero entries. The leftmost nonzero entries of these rows are \(\entry{A}{0,0}=1\text{,}\) \(\entry{A}{1,1}=1\text{,}\) and \(\entry{A}{2,3}=1\text{.}\) Very good!
(iii) LMNZKC.
Columns indexed 0, 1, and 3 contain LMNZs, and all other elements are 0. Great!
(iv) LMNZGDR.
The pairs of row indices and column indices of LMNZs in \(A\) are, in order, \((0,0)\text{,}\) \((1,1)\text{,}\) and \((2,3)\text{,}\) which are in “lexicographic” order, as required. Done!
(b) A matrix not in RREF.
The matrix
\begin{equation*}
B = \begin{bmatrix}
0 \amp 0 \amp 0 \amp 0 \amp 0 \\
0 \amp 0 \amp 3 \amp 0 \amp 1 \\
1 \amp 1 \amp 0 \amp 1 \amp 1 \\
0 \amp 1 \amp 0 \amp 1 \amp 1
\end{bmatrix}
\end{equation*}
actually fails all four conditions: the zero row sits at the top rather than the bottom; the LMNZ of the row with index 1 is \(3\) rather than \(1\text{;}\) the column with index 1 contains a LMNZ (in the row with index 3) as well as another nonzero entry (in the row with index 2); and the column indices of the LMNZs read \(2, 0, 1\) from top to bottom, which do not increase.
We have defined RREF, but not demonstrated its usefulness, nor have we given any additional properties of RREF matrices. The most important property of the RREF is the following result.
Theorem 5.1.12. Existence and Uniqueness of RREF.
Let \(A\in\Mats{m,n}\text{.}\) Then there is a matrix \(B\in\Mats{m,n}\) which is in reduced row-echelon form and is row-equivalent to \(A\text{;}\) moreover, if there are two such matrices \(B,B'\in\Mats{m,n}\) which are both in reduced row-echelon form and each is row-equivalent to \(A\text{,}\) then \(B=B'\text{.}\)
Due to the existence and uniqueness result, it is common to discuss “the reduced row-echelon form of a matrix \(A\)”, when really we mean “a matrix \(A'\) in RREF which is row-equivalent to the matrix \(A\)”.
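In practice the RREF is rarely computed by hand for large matrices; as one sketch (assuming the third-party SymPy library), the method Matrix.rref returns the row-equivalent RREF matrix together with the tuple of its pivot-column indices.

from sympy import Matrix, I

A = Matrix([[1, 2, I],
            [2, 4, 0],
            [0, 0, 3]])
R, pivots = A.rref()
print(R)          # Matrix([[1, 2, 0], [0, 0, 1], [0, 0, 0]]), the unique RREF of A
print(pivots)     # (0, 2), the pivot-column indices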
Since we mentioned above that the matrix \(I_n\) is the multiplicative identity in \(\Mats{n,n}\text{,}\) it is interesting to note which matrices in \(\Mats{n,n}\) have multiplicative inverses.
Definition 5.1.14. Matrix inverses.
Let \(A\in\Mats{n,n}\text{.}\) Then a matrix \(B\in\Mats{n,n}\) is an inverse of \(A\) if and only if \(AB = I_n = BA\text{.}\) If such a matrix \(B\) exists, the matrix \(A\) is said to be invertible.
Theorem 5.1.15. Matrix inverse uniqueness.
Suppose that \(A, B, C\in\Mats{n,n}\) with \(AB=I_n=BA\) and \(AC=I_n=CA\text{.}\) Then \(B=C\text{,}\) and as such the unique inverse of \(A\) is denoted \(A^{-1}\text{.}\)
Proof.
If \(AB = I_n\) and \(CA=I_n\text{,}\) then
\begin{equation*}
B = I_nB = (CA)B = C(AB) = CI_n = C\text{.}
\end{equation*}
We already have all of the tools to find \(A^{-1}\) when it exists.
Theorem 5.1.16. Elementary matrices are invertible.
Let \(0\le k \lt \ell \lt m\) be integers and \(\alpha\neq 0\) a complex number. Then \(S_{k,\ell}\text{,}\) \(M_{\alpha,k}\text{,}\) and \(E_{\alpha,k,\ell}\) are all invertible matrices, with \(S_{k,\ell}^{-1} = S_{k,\ell}\text{,}\) \(M_{\alpha,k}^{-1} = M_{1/\alpha,k}\text{,}\) and \(E_{\alpha,k,\ell}^{-1} = E_{-\alpha,k,\ell}\text{.}\)
Proof.
Recalling the definitions of \(S_{k,\ell}\text{,}\) \(M_{\alpha,k}\text{,}\) and \(E_{\alpha,k,\ell}\) in terms of elementary row operations, we see that we can extend those definitions as follows:
\begin{align*}
I_m \xto{\rswap{k}{\ell}} \amp S_{k,\ell} \xto{\rswap{k}{\ell}} I_m\\
I_m \xto{\rscale{\alpha}{k}} \amp M_{\alpha,k} \xto{\rscale{1/\alpha}{k}} I_m\\
I_m \xto{\relim{\alpha}{k}{\ell}} \amp E_{\alpha,k,\ell} \xto{\relim{-\alpha}{k}{\ell}} I_m
\end{align*}
as all the row operations are trivially reversible. By Theorem 5.1.7, the second arrow in each line says precisely that left-multiplying the elementary matrix by the claimed inverse returns \(I_m\text{;}\) performing the two operations in the opposite order shows that the product in the other order is \(I_m\) as well.
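A quick numerical spot-check of these inverse pairs (assuming NumPy, and reusing the swap_matrix, scale_matrix, and elim_matrix sketches from earlier in this section):

import numpy as np

m, alpha = 4, 2 - 3j
I4 = np.eye(m, dtype=complex)

# S_{1,3} undoes itself, M_{alpha,2} is undone by M_{1/alpha,2}, and
# E_{alpha,0,3} is undone by E_{-alpha,0,3}.
print(np.allclose(swap_matrix(m, 1, 3) @ swap_matrix(m, 1, 3), I4))                  # True
print(np.allclose(scale_matrix(m, alpha, 2) @ scale_matrix(m, 1 / alpha, 2), I4))    # True
print(np.allclose(elim_matrix(m, alpha, 0, 3) @ elim_matrix(m, -alpha, 0, 3), I4))   # True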
There are many proofs of the following theorem, via many different techniques; for us, the result itself is what matters.
Theorem 5.1.17. GJE produces matrix inverses.
Let \(A\in\Mats{n,n}\) and let \(J,B\in\Mats{n,n}\) be matrices such that
\begin{equation*}
[A\mid I_n]\xto{GJE}[J\mid B]\text{.}
\end{equation*}
Then \(A\) is invertible if and only if \(J=I_n\text{,}\) and in that case \(B=A^{-1}\text{.}\)
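Here is a sketch of the theorem in action, assuming SymPy; the small \(2\times 2\) matrix is just an illustration. Augment \(A\) with \(I_n\text{,}\) row-reduce, and read off the two blocks.

from sympy import Matrix, eye

A = Matrix([[2, 1],
            [5, 3]])
aug = A.row_join(eye(2))        # the augmented matrix [A | I_2]
R, _ = aug.rref()               # Gauss-Jordan elimination to RREF
J, B = R[:, :2], R[:, 2:]       # split the result back into [J | B]

print(J == eye(2))              # True, so A is invertible
print(B)                        # Matrix([[3, -1], [-5, 2]]), which is A**-1
print(A * B == eye(2))          # True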