\(
\newcommand{\BE}{\begin{equation}}
\newcommand{\EE}{\end{equation}}
\newcommand{\BA}{\begin{eqnarray}}
\newcommand{\EA}{\end{eqnarray}}
\newcommand\CC{\mathbb{C}}
\newcommand\FF{\mathbb{F}}
\newcommand\NN{\mathbb{N}}
\newcommand\QQ{\mathbb{Q}}
\newcommand\RR{\mathbb{R}}
\newcommand\ZZ{\mathbb{Z}}
\newcommand{\va}{\hat{\mathbf{a}}}
\newcommand{\vb}{\hat{\mathbf{b}}}
\newcommand{\vn}{\hat{\mathbf{n}}}
\newcommand{\vt}{\hat{\mathbf{t}}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bg}{\mathbf{g}}
\newcommand{\bn}{\mathbf{n}}
\newcommand{\by}{\mathbf{y}}
\)

Linear Algebra Primer

Introduction

In this brief linear algebra primer, we review some of the very basics of linear algebra. In the first part (blue) we introduce matrices and how to compute with them (in this part vectors are simply coordinate matrices, i.e. matrices with only one column). In the second part (red) we give a motivation for abstract algebra based on the algebraic properties of real numbers. In the third part, we develop the basics of linear algebra formally from some of the basic building blocks of abstract algebra (e.g. group theory). In the fourth part we derive the matrix for the basis transformation of basis vectors, \([BA]\) (which will become the directional cosine matrix (DCM) if the transformation is a rotation). We also derive the matrix for the transformation of vector coefficients under basis transformation, \([BAc]\), and argue that these two matrices are the same in the special case of the transformation being a rotation.

Note: The equation numbering and referencing on this page will remain broken, until we figure out how to do automatic equation numbering for \(\LaTeX\) in WordPress.

Notation

  • Numbers (often referred to as scalars) will be denoted by lowercase letters in normal font, either different ones like \(a, b, c, d,\) etc., or by lowercase letters with indices, e.g.: \(a_{11}, a_{12}, a_{21}, a_{22},\) etc. We shall deal only with real numbers in this document, but in general they could be complex (i.e. containing a real and an imaginary part).
  • Matrices will be denoted by uppercase letters, e.g.: \(A, B, C, D,\) etc., and/or will be sometimes enclosed in brackets […], for easier recognition.
    The elements (entries) of matrices are numbers and are denoted with lowercase letters like any other number discussed above. If indices are used to distinguish the matrix elements, the first index counts through the rows (horizontal lines), while the second index counts through the (vertical) columns.
  • Vectors will be denoted with lowercase boldface letters, oftentimes \(\mathbf{v}\) and \(\mathbf{w}\), but sometimes also \(\mathbf{a}\), \(\mathbf{b}\) and others. An optional hat on top of the vector signifies that the vector has unit length (length = 1).
    The elements (entries) of vectors will again be denoted by lowercase letters in normal font, because they are simply numbers.

Matrices and Matrix Operations

Introduction

What is a matrix?

An \(m\times n\) matrix (pronounced “em by en matrix”) is an array of numbers with \(m\) rows (horizontal lines) and \(n\) (vertical) columns. Mathematically it is considered to be one object. Notationally, the array is enclosed either in round parentheses or square brackets.

The individual numbers within the matrix are called its elements or entries. (The plural of the noun “matrix” is “matrices”.)

Below is an illustration of a \(2 \times 3\) matrix:

\begin{equation}
\begin{pmatrix}
5 & 3 & 7\\
12 & 0 & 2
\end{pmatrix}
\end{equation}
Throughout much of the first part of this document, we will use \(2 \times 2\) matrices for illustration purposes, although in three-dimensional applications \(3 \times 3\) matrices are typical.
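
For readers who would like to follow along numerically, here is a minimal sketch of how such an array of numbers can be represented on a computer (assuming Python with the NumPy library; this is purely illustrative and not required for the rest of the text):

```python
import numpy as np

# The 2 x 3 matrix from the illustration above:
# 2 rows (horizontal lines) and 3 columns.
M = np.array([[5, 3, 7],
              [12, 0, 2]])

print(M.shape)  # (2, 3), i.e. an m x n = 2 x 3 matrix
```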

For now, matrices are simply an array of numbers. We need to learn how to calculate with them. For this, we will have to define operations on matrices, e.g. how to add and how to multiply them. We could do so in many different ways. But it turns out that some definitions are more practical than others. In order to appreciate this, we must first appreciate the role that matrices play in linear algebra.

What do Matrices have to do with Linear Algebra?

The basic objects and structures of linear algebra are vectors, vector spaces, and linear maps between them. A vector space is a set (in the sense of set theory) with operations defined on it which satisfy certain axioms, which give it a specific algebraic structure. The elements of a vector space are called vectors. A linear map (also called a vector space homomorphism or a linear transformation) between two vector spaces is a mapping (in a mathematical sense, assigning elements in one vector space to elements in another) which preserves the algebraic structure of the vector space. We shall develop all of this further below formally in the language of abstract algebra.

For now, however, we will take a shortcut. We shall jump immediately to choosing an ordered basis in an \(m\)-dimensional vector space and defining a coordinate map into \(\RR^m\). Essentially, what we do is define a coordinate system on the vector space and use the property that all \(m\)-dimensional vector spaces are isomorphic to each other.

In doing so, magic happens: the vectors, which in general are abstract objects (arrows in space, polynomial functions, etc.) become ordered \(m\)-tuples of numbers, which we typically arrange in a single column. They now look exactly like a matrix with a single column. We shall therefore call them coordinate matrices of the vectors (and by abuse of language sometimes simply vectors). And the linear maps become multiplication with an \(m \times n\) matrix (a two dimensional array of numbers with \(m\) rows and \(n\) columns), which operates on these coordinate matrices. (As we shall see, an \(m \times n\) matrix makes an \(m\)-dimensional vector out of an \(n\)-dimensional vector – but it does more than that, the change of dimension is not the point here.)

So with this shortcut of choosing a coordinate system in the vector space, we do not need to deal with any of the cryptic mathematical terminology thrown at you above, which we have not even properly defined (ignore it if it confuses you), and all we need to do is to learn how to compute with matrices: matrix addition, matrix multiplication, and multiplication of a matrix by a number. We shall also learn how to invert a matrix and how to diagonalize it.

For the advanced reader, we shall later develop linear algebra in a general form properly from the point of view of abstract algebra (group theory, etc.), in a fashion that even the above intro will make sense upon a second read. But we recognize that for some readers this may become too much; our primary goal is that everyone learns to compute with matrices and has a general idea of the significance of these calculations for the applications we will need. One major such application will be the solution of a system of linear equations, which, as we shall see, reduces to the mere problem of matrix inversion in linear algebra. (While matrix inversion is nontrivial in general, it follows well known algorithms and can be readily taken care of on a computer numerically using common mathematical libraries.)

What is a Vector (for Our Purposes)?

For the purpose of this crude, quick introduction, a vector is simply a special case of a matrix. A vector is a matrix which only has one column (it can have two or more rows). The number of rows is called the dimension of the vector. Thus such a vector is simply a vertical \(m\)-tuple of numbers.

We can give such a vector a geometric interpretation if we have chosen a coordinate system in Euclidean space. Say the first entry measures the number of meters north, the second entry the number of meters east, and the third entry the number of meters down. Now this vector has actually obtained a direction and magnitude. It has become an arrow in space.

Note however, that the same coordinate matrix will represent a completely different arrow in space if we were to choose a different coordinate system (e.g. the first component measuring up, the second component measuring south, and the third component measuring east). In some sense, this logic is backwards. The physically immutable thing is really the arrow in space; depending on the choice of coordinate system, it is represented by a different coordinate matrix. The choice of coordinate system is called a coordinate map (from the space of arrows to \(\RR^3\)).

But vectors do not just have to be arrows in space. They can also be polynomial functions, for instance. These, too, can be expressed as a coordinate matrix with a proper choice of a coordinate map (which now does not map arrows onto \(\RR^3\) but rather monomial functions onto \(\RR^n\)). We will dive into this deeper when we develop linear algebra properly from group theory. In the language of abstract algebra, a vector is an element of a vector space. A vector space is a commutative group over a field satisfying certain properties. But once the coordinate map is defined, we can simply forget about everything else and do all computations with matrices (coordinate matrices representing vectors and multiplication with general \(m\times n\) matrices representing linear maps), so matrix computations are what we shall turn to now.

Matrix Operations

On real numbers, we know two basic operations: addition and multiplication. We would like to have addition and multiplication for matrices as well. In order to achieve this, addition and multiplication on matrices first have to be defined. The definitions that follow are arbitrary in principle (i.e. we could have defined them differently), but they turn out to be particularly useful and are the ones universally used.

Because it will turn out to be useful, we will also define a third operation: multiplication of a matrix by a scalar, in which we multiply a matrix by a number. Adding a scalar to a matrix, on the other hand, is not defined. Let us go through these three new operations on matrices in turn.

Matrix Operation 1: Matrix Addition

We define the addition of two matrices \(A\) and \(B\) as follows (the righthand side of the equation defines the lefthand side):

\begin{equation}
A\mathbf{+}B=
\begin{pmatrix}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{pmatrix}
\mathbf{+}
\begin{pmatrix}
b_{11} & b_{12}\\
b_{21} & b_{22}
\end{pmatrix}
:=
\begin{pmatrix}
a_{11}+b_{11} & a_{12}+b_{12}\\
a_{21}+b_{21} & a_{22}+b_{22}
\end{pmatrix}
\end{equation}

Note that there are two different kinds of operations (plusses) in the above equation. The boldface plus is the new operation (on matrices) we are defining (we will usually not boldface it in the future). The regular plusses are the usual plusses between real numbers that we already know. We could have used an entirely new symbol for the boldface plus, but we do not, by abuse of notation. The reader is to understand from the context of the objects the plus operates on whether the boldface plus between matrices is meant or the regular plus between numbers.

Note that matrix addition is commutative (or Abelian): \(A+B = B+A.\) This follows from the above definition and the fact that a+b=b+a for real numbers. 
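
As a quick numerical example of this definition:

\begin{equation}
\begin{pmatrix}
1 & 2\\
3 & 4
\end{pmatrix}
+
\begin{pmatrix}
5 & 0\\
-1 & 2
\end{pmatrix}
=
\begin{pmatrix}
6 & 2\\
2 & 6
\end{pmatrix}
\end{equation}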

Matrix Operation 2: Matrix Multiplication

We define the multiplication of two matrices A and B as follows:
\begin{equation}
A\circ B =
\begin{pmatrix}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{pmatrix}
\circ
\begin{pmatrix}
b_{11} & b_{12}\\
b_{21} & b_{22}
\end{pmatrix}
:=
\begin{pmatrix}
a_{11}\cdot b_{11} + a_{12}\cdot b_{21} & a_{11}\cdot b_{12} + a_{12}\cdot b_{22}\\
a_{21}\cdot b_{11} + a_{22}\cdot b_{21} & a_{21}\cdot b_{12} + a_{22}\cdot b_{22}
\end{pmatrix}
\end{equation}

Like before, there are two different kinds of operations (multiplications) in the above equation. The “hollow” dot (small circle) is the new operation (on matrices) we are defining. The regular dots are the usual multiplication between real numbers that we already know. We shall use regular dots for both from now on, and frequently, the dot is omitted in the notation and one simply writes \(AB\) to denote the multiplication of two matrices \(A\) and \(B\), just as it is customary to write \(ab\) to denote the multiplication between numbers \(a\) and \(b\).

Note that matrix multiplication is NOT commutative: \(A\cdot B \not= B\cdot A\), in general.
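
Here is a minimal numerical sketch of these two matrix operations (again assuming Python with NumPy; the particular matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A + B)        # element-wise matrix addition
print(A @ B)        # matrix multiplication (rows of A times columns of B)
print(B @ A)        # in general different from A @ B ...
print(np.array_equal(A @ B, B @ A))  # ... here: False
```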

Matrix Operation 3: Scalar Multiplication (Multiplication of a Matrix by a Scalar)

We also define the multiplication of a matrix with a scalar (a number), resulting in another matrix:

\begin{equation}
c \circ A:=
\begin{pmatrix}
c\cdot a_{11} & c \cdot a_{12}\\
c \cdot a_{21} & c \cdot a_{22}
\end{pmatrix}
\end{equation}

The righthand side again defines the lefthand side. The dots on the right are regular multiplications of real numbers, which we already know. The dot on the left – represented by the small circle again just for highlighting purposes here – is the new operation we are defining. It takes two arguments: a number and a matrix. This operation is commutative, i.e. \(cA = Ac\), which follows from the definition and the fact that the multiplication of real numbers is commutative.

Note that this is a different operation from the previously defined matrix multiplication, and must therefore be explicitly defined. The previous operation multiplied two matrices; this operation multiplies a matrix with a number (those are two different objects). By abuse of notation, however, we again use the same dot symbol for this (and often even omit it).
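
For example:

\begin{equation}
2 \circ
\begin{pmatrix}
1 & 3\\
0 & -2
\end{pmatrix}
=
\begin{pmatrix}
2 & 6\\
0 & -4
\end{pmatrix}
\end{equation}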

Matrix Inversion

Given an \(n \times n\) matrix \(A\), the (multiplicative) inverse of \(A\) is another \(n \times n\) matrix \(B\) which satisfies \(A \cdot B = I\), where \(I\) is the identity matrix, meaning it is zero everywhere, except for the diagonal, where it is 1. I.e. \(I\) is such that \(i_{11}=i_{22}=\dots=i_{nn}=1\) and \(i_{jk}=0\) if \(j\not= k\). We use the symbol \(A^{-1}\) to denote such a matrix \(B\). Matrix inversion is simply the task of finding the matrix \(A^{-1}\). The reason why the identity matrix was chosen as the desired outcome of the \(A A^{-1}\) multiplication is that the identity matrix is the neutral element of matrix multiplication (in the language of abstract algebra, which we shall introduce in the red and black sections of this article further below).

Not Every Matrix Has an Inverse

We will review an algorithm for matrix inversion below. For now, let us illustrate that not every matrix has an inverse.

For instance, there is no matrix \(B\) which would take a \(2\times2\) matrix consisting of 1s in the first row and zeros in the second row to the identity matrix:

\begin{eqnarray}
\begin{pmatrix}
1 & 1\\
0 & 0
\end{pmatrix}
\cdot B &=&
\begin{pmatrix}
1 & 1\\
0 & 0
\end{pmatrix}
\cdot
\begin{pmatrix}
b_{11} & b_{12}\\
b_{21} & b_{22}
\end{pmatrix}
=
\begin{pmatrix}
1\cdot b_{11} + 1\cdot b_{21} & 1\cdot b_{12} + 1\cdot b_{22}\\
0\cdot b_{11} + 0\cdot b_{21} & 0\cdot b_{12} + 0\cdot b_{22}
\end{pmatrix} \\
&=&
\begin{pmatrix}
b_{11} + b_{21} & b_{12} + b_{22}\\
0 & 0
\end{pmatrix}
\not =
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
\end{eqnarray}

The inverse of a general \(2 \times 2\) matrix

\begin{equation}
A =
\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}
\end{equation}

is 

\begin{equation}
A^{-1} = \frac{1}{\det A}
\begin{pmatrix}
d & -b\\
-c & a
\end{pmatrix}
\end{equation}

where \(\det A = ad-bc\) is the determinant of A. This becomes more complicated for larger matrices.
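
Indeed, one can verify directly that this formula satisfies the defining property \(A\cdot A^{-1}=I\):

\begin{equation}
\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}
\cdot \frac{1}{ad-bc}
\begin{pmatrix}
d & -b\\
-c & a
\end{pmatrix}
= \frac{1}{ad-bc}
\begin{pmatrix}
ad-bc & 0\\
0 & ad-bc
\end{pmatrix}
=
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
\end{equation}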

When the determinant is zero, the denominator in the above formula becomes zero and the inverse matrix becomes undefined. This happens if a row or column of the matrix can be expressed as a linear combination of the others (i.e. as a sum of arbitrary multiples of the other rows or columns).

Matrix Inversion Algorithm: Gauss-Jordan Elimination

Many different matrix inversion algorithms exist (see e.g. the Wikipedia article on matrix inversion). We shall introduce only one here, as we will typically invert a matrix on the computer rather than by hand when we need to.

The Gauss-Jordan elimination algorithm for an \(n \times n\) matrix \(A\) consists of appending an \(n \times n\) identity matrix to the right of it, forming a block matrix that looks like an \(n \times 2n\) matrix. One is then allowed to perform row operations: one can multiply rows by numbers and add arbitrary multiples of other rows to a row (essentially use any linear combination of rows). The goal of the row operations is to get an \(n \times n\) identity matrix in the first half, where the original matrix \(A\) was. The second half of the extended \(n \times 2n\) matrix then turns out to be the inverse of matrix \(A\). This algorithm will only work if the matrix \(A\) is invertible; otherwise it will be impossible to find row operations that produce the identity matrix on the left, and instead one will end up with one or more rows where the lefthand half of the extended block matrix contains only zeros.

To give a worked out example of the Gauss-Jordan algorithm for a \(2 \times 2\) matrix, let

\begin{equation}
[A] =
\begin{pmatrix}
4 & 3\\
2 & 2
\end{pmatrix}
\end{equation}

Then the extended block matrix is
\begin{equation}
[A|I] =
\left(\begin{array}{cc|cc}
4 & 3 & 1 & 0\\
2 & 2 & 0 & 1
\end{array}\right)
\end{equation}
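
One possible sequence of elementary row operations is the following (divide the first row by 4; subtract twice the new first row from the second row; multiply the second row by 2; finally, subtract \(\frac{3}{4}\) of the second row from the first row):

\begin{eqnarray}
\left(\begin{array}{cc|cc}
4 & 3 & 1 & 0\\
2 & 2 & 0 & 1
\end{array}\right)
&\rightarrow&
\left(\begin{array}{cc|cc}
1 & \frac{3}{4} & \frac{1}{4} & 0\\
2 & 2 & 0 & 1
\end{array}\right)
\rightarrow
\left(\begin{array}{cc|cc}
1 & \frac{3}{4} & \frac{1}{4} & 0\\
0 & \frac{1}{2} & -\frac{1}{2} & 1
\end{array}\right)\nonumber\\
&\rightarrow&
\left(\begin{array}{cc|cc}
1 & \frac{3}{4} & \frac{1}{4} & 0\\
0 & 1 & -1 & 2
\end{array}\right)
\rightarrow
\left(\begin{array}{cc|cc}
1 & 0 & 1 & -\frac{3}{2}\\
0 & 1 & -1 & 2
\end{array}\right)
\end{eqnarray}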

After these elementary row operations we obtain
\begin{equation}
[I|A^{-1}] =
\left(\begin{array}{cc|cc}
1 & 0 & 1 & -\frac{3}{2}\\
0 & 1 & -1 & 2
\end{array}\right)
\end{equation}

and thus
\begin{equation}
[A^{-1}] =
\begin{pmatrix}
1 & -\frac{3}{2}\\
-1 & 2
\end{pmatrix}
\end{equation}

An illustration for a \(3 \times 3\) matrix is given in the Wikipedia article on Gaussian elimination. This algorithm is applicable to any invertible \(n \times n\) matrix. For very large matrices, however, it can evidently become computationally expensive.
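
In practice, one would typically let a computer do this. Here is a minimal sketch (assuming Python with NumPy; the matrix is the one from the worked example above):

```python
import numpy as np

A = np.array([[4.0, 3.0],
              [2.0, 2.0]])

A_inv = np.linalg.inv(A)   # raises LinAlgError if A is singular
print(A_inv)               # [[ 1.  -1.5]
                           #  [-1.   2. ]]
print(A @ A_inv)           # numerically close to the identity matrix
```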

Application of Matrix Inversion: Solution of System of Linear Equations

Consider the following system of linear equations:

\begin{eqnarray}
5x_1 + 3x_2 &=& 7\\
4x_1 + 6x_2 &=& 2
\end{eqnarray}

where \(x_1\) and \(x_2\) are the variables we want to solve for. We can, of course, solve the first equation for \(x_1\), then substitute into the second, etc. But there is a more elegant solution using basic matrix operations. Note that the above equation can be written in matrix notation as

\begin{equation}
A\mathbf{x} = \mathbf{b},
\end{equation}
where

\begin{equation}
A :=
\begin{pmatrix}
5 & 3\\
4 & 6
\end{pmatrix}
\end{equation}

\begin{equation}
\mathbf{x}:=
\begin{pmatrix}
x_1 \\
x_2
\end{pmatrix}
\end{equation}

\begin{equation}
\mathbf{b}:=
\begin{pmatrix}
7 \\
2
\end{pmatrix}
\end{equation}

If we multiply both sides of the equation from the left with the inverse of \(A\), i.e. by \(A^{-1}\), we obtain \(A^{-1}A \mathbf{x}= A^{-1}\mathbf{b}\). Remembering that by definition of \(A^{-1}\), \(A^{-1}A= I\), and for the identity matrix \(I\) we have \(I\mathbf{x} = \mathbf{x}\), we obtain

\begin{equation}
\mathbf{x} = A^{-1}\mathbf{b}.
\end{equation}

Thus we have converted the problem of solving the system of linear equations to a simple matrix inversion task followed by a matrix multiplication, which can be automated using a standard matrix inversion algorithm and matrix multiplication algorithm on a computer to obtain a numerical solution. (We have represented the vectors \(\mathbf{x}\) and \(\mathbf{b}\) by boldface symbols above, but for all practical purposes you can view them in this problem just as one-column matrices, for which the standard matrix multiplication rules apply as for any other matrices.)

If one wants to solve this problem by hand, one can use the Gaussian elimination algorithm similar to the Gauss-Jordan elimination mentioned earlier. (Alternatively, of course, one could use Gauss-Jordan on \(A\) and then multiply the inverse with the column vector \(\mathbf{b}\) on the right, but this would require extra computation steps.) It is faster to do instead the following. 

The procedure is to create a block matrix
\begin{equation}
[A|\mathbf{b}]=
\left(\begin{array}{cc|c}
5 & 3 & 7\\
4 & 6 & 2\\
\end{array}\right)
\end{equation}

and then use elementary row operations as in the Gauss-Jordan algorithm, until one obtains the identity matrix on the left, i.e.
\begin{equation}
[I|\mathbf{x}]=
\left(\begin{array}{cc|c}
1 & 0 & x_1\\
0 & 1 & x_2\\
\end{array}\right)
\end{equation}

The \(x_1\) and \(x_2\) on the right, obtained this way, are the solution of the system of linear equations.
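
For completeness, here is a minimal sketch of the computer route mentioned above (assuming Python with NumPy; in practice numpy.linalg.solve, which solves the system directly, is preferred over forming the inverse explicitly, as it is cheaper and numerically more robust):

```python
import numpy as np

A = np.array([[5.0, 3.0],
              [4.0, 6.0]])
b = np.array([7.0, 2.0])

x = np.linalg.solve(A, b)      # solves A x = b directly
print(x)                       # [ 2. -1.]

x_via_inverse = np.linalg.inv(A) @ b   # the x = A^{-1} b route from the text
print(np.allclose(x, x_via_inverse))   # True
```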

Matrix Diagonalization

An \(n\times n\) matrix \(D\) is called a diagonal matrix if it has nonzero entries only on the diagonal (i.e. all entries \(d_{ij}=0\) if \(i\not=j\)).

An \(n \times n\) matrix \(A\) is called diagonalizable, if there exists an \(n \times n\) invertible matrix \(B\) and an \(n \times n\) diagonal matrix \(D\), such that:

\begin{equation}
A = B^{-1} D B
\end{equation}

The motivation for defining it this way comes from \(B\) corresponding to a basis transformation: for a linear map which is expressed with respect to one basis by the matrix \(A\), a different basis can be found with respect to which the matrix of the linear map is diagonal.

The linear map then leaves the vectors of this new basis invariant up to scaling (change in length). These vectors are therefore called eigenvectors. The diagonal elements of \(D\) are called eigenvalues and indicate by how much these eigenvectors get scaled by the linear map.

Euler’s rotation theorem states that any composition of rotations can be expressed as a single rotation around an axis. Clearly, a vector along this axis is left unchanged. It is therefore an eigenvector of the rotation and the eigenvalue is one. Unless the rotation is by 180 degrees, no other vectors are left unchanged up to a multiplicative factor, so rotation matrices are generally not diagonalizable in \(\RR\).

The process of finding \(B\) and \(D\) above is called diagonalization. You will learn it in any standard introductory linear algebra course. We shall not dwell on it here, as we just wanted to convey the main idea and will use a computer to find the eigenvectors and eigenvalues of a matrix if needed. We shall remark though that the eigenvalues are found quite readily by solving the equation

\begin{equation}
\mathrm{det} (A - \lambda I) = 0
\end{equation}

for the new parameter \(\lambda\), which will become the eigenvalue. Above, \(I\) is the identity matrix as always, and \(\mathrm{det}\) denotes the determinant, which is to be taken of matrix \(A-\lambda I\). The lefthand side of the above equation yields a polynomial in \(\lambda\) of degree \(n\) (for an \(n \times n\) matrix), and finding the eigenvalues is therefore equivalent to finding the roots of this so-called characteristic polynomial. There may be several eigenvalues with the same value (multiplicity), and some of the eigenvalues may be complex, even if the entries of \(A\) are real, because not every polynomial of degree \(n\) has \(n\) real roots. With each eigenvalue an eigenvector is associated, which is left invariant (up to scaling) by \(A\).
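
A minimal numerical sketch (assuming Python with NumPy; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)     # the eigenvalues 3 and 1 (order not guaranteed),
                       # i.e. the roots of the characteristic polynomial
print(eigenvectors)    # columns are the corresponding eigenvectors

# Each eigenvector is only scaled by A, by its eigenvalue:
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))   # True
```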

These eigenvalues will play a crucial role as the factors in the exponents of the solutions of systems of first-order linear differential equations, which we discuss in our linear differential equations primer. These will become relevant, when we discuss dynamic stability of aircraft (e.g. phugoid mode and short period mode). The eigenvalues, and in particular whether they are real or complex, will determine the time evolution of the solution and in particular whether the solution is exponentially decaying/growing (first-order response) or oscillatory (second-order response).

For the interested reader, we shall expand on matrix diagonalization a bit more formally later (in the black section of this article), when we develop linear algebra from the foundations of abstract algebra.

Motivation for Abstract Algebra: Algebraic Properties of Real Numbers and Matrices

To develop a deeper understanding of the process of matrix inversion, i.e. the existence and finding of the multiplicative inverse of a matrix, we shall quickly investigate which algebraic properties of real numbers carry over to matrices. Real numbers form a field: both operations commute and have inverse elements. Matrices form a non-commutative ring with 1: matrix addition is commutative, but matrix multiplication is not (we have already seen this above), and furthermore, while all matrices have an additive inverse, not all matrices have a multiplicative inverse (we shall explore this in this section). This distinction means that not all theorems of abstract algebra which are applicable to real numbers apply to matrices – only those that do not rely on the existence of a multiplicative inverse in their proof. While we shall not go into abstract algebra in any detail in this article, we do need to discuss the multiplicative inverse of a matrix.

Real Number Properties

In the language of abstract algebra, real numbers form a field, which means they are a set with two operations (called addition and multiplication), and the following properties:

First Operation (Addition)

1a) Associativity: For numbers \(a, b, c\) the following holds: \((a+b)+c = a+(b+c)\).
1b) Neutral Element (Identity Element): A neutral element for addition exists: Zero. Neutral element means, if we add this number to any other number, we get the other number back:
\(a + 0 = a\), for any real number \(a\).
1c) Inverse Element: For every number, the additive inverse element exists. Inverse element means that if it is added to the number, we get the neutral element: \(a + b = 0\). The element \(b\) is guaranteed to exist; it is \(b = -a\) (the negative of \(a\), assuming \(a\) is positive). Note that the minus sign is not treated as an operation here but indicates the inverse: the common notation \(a - c\) is really shorthand for \(a + (-c)\), where \(-c\) denotes the “additive inverse”, i.e. whatever number gives zero when added to \(c\).

(It is also required that the operation (the plus in this case) applied to any two elements of the set yields another element of the set, and does not go outside the set.)

A set and an operation with the above properties is called a group in abstract algebra. The three properties above are the group axioms. If the operation commutes, i.e. if \(a+b=b+a\) for any two elements, as is the case for real numbers, the group is called a commutative or Abelian group. For further reading on groups, see this wikipedia article.

Second Operation (Multiplication)

For the second operation (multiplication), we have the following properties:
2a) Associativity: For numbers \(a, b, c\) the following holds: \((a\cdot b)\cdot c = a\cdot(b\cdot c)\).
2b) Neutral Element (Identity Element): A neutral element of multiplication exists: 1. Neutral element again means, if we multiply this number to any other number, we get the other number back:
\(a \cdot 1 = a\), for any real number \(a\).
(The neutral element of multiplication is always a different element than the neutral element of addition, in any field.)
2c) Inverse Element: For every number (except zero), the multiplicative inverse element exists. Inverse element means that if the number is multiplied by it, we get the neutral element of multiplication: \(a \cdot b = 1\). The element \(b\) exists; it is \(b = 1/a\). Note that we need to exclude zero (the neutral element of the first operation) in this requirement, otherwise it would not be consistent.
3) Distributivity: Distributivity of multiplication over addition means that if we combine the two operations, the following holds: \(a\cdot (b+c) = (a\cdot b) + (a\cdot c)\).

A set and two operations and all of the above properties is called a field in abstract algebra, provided the operations are commutative. For more information on fields, see this wikipedia article.

(The word “field” here is not to be confused with other uses of this word in mathematics, such as “vector field” (a section of the tangent bundle of a differentiable manifold), which itself is not to be confused with “vector space” (an Abelian group over a field, see below) – those are all different things.)

Matrix Properties

Unlike real numbers, matrices do not form a field, because some of the aforementioned properties are violated. Let us first examine which properties hold:

– 1a), 1b), and 1c) hold, with the “zero matrix”, consisting of all zeros, being the neutral element for matrix addition.
– 2a) and 2b) hold, with the “identity matrix”, consisting of 1s on the diagonal (i.e. the elements \(a_{11}, a_{22}, a_{33},\) etc.) and zeros otherwise, being the neutral element for matrix multiplication.
– 3) The axiom of distributivity holds.

However, 2c) is violated. Some matrices (other than the zero matrix) do not have a multiplicative inverse (as we have already seen in an example above). Also, matrix multiplication has been mentioned previously as not being commutative. Matrices, therefore, do not form a field, but rather a Noncommutative Ring with 1.

(Multiplicative) Inversion

The main purpose of the brief detour above into the most basic foundations of abstract algebra in this section was to give us a somewhat more formal understanding of inversion. Additive or multiplicative inversion means, that we want to find the element of a field or ring which gives us the neutral element, when combined with some given element. For instance, for real numbers, -2 is the additive inverse of 2, giving us 0 (neutral element of addition) when added to 2. 1/2 is the multiplicative inverse of 2, giving us 1 (neutral element of multiplication), when multiplied with 2.

Similarly for matrices. Finding the additive inverse of a matrix is trivial and every matrix has one: simply take the negative values of all its entries. On the other hand, finding the multiplicative inverse of a matrix is a big deal for several reasons (we cannot just take the reciprocals of the entries):

  • It can be computationally expensive for large matrices.
  • Not every matrix has a multiplicative inverse. Such matrices are called singular and have at least one zero eigenvalue.
  • Matrices which have a multiplicative inverse but are numerically very close to a singular matrix can be difficult to invert numerically, because numerical rounding errors accumulate during the calculation of the inverse, yielding a result which does not give the identity matrix when multiplied with the originally given matrix. In such cases, numerical techniques such as singular value decomposition must be used to prevent the result from being far off and unusable. The result will not be exact, but close enough.

Covering these numerical challenges is beyond the scope of this article. In part for those reasons, the terms “matrix inversion” and “inverse matrix” are always used with regard to the multiplicative inverse of a matrix (not for the additive inverse).

What about the 3rd Matrix Operation?

When defining matrix operations, we had the first operation of matrix addition, the second operation of matrix multiplication, and the third operation of multiplication of a matrix by a scalar. The astute reader may have noticed that in the above discussion of real number and matrix properties, only the first two operations appeared. What about the 3rd operation?

This is because we were looking at matrices as a ring, which does not have a third operation. Matrices have an additional algebraic structure. They form a vector space if we only take into account matrix addition and the multiplication of a matrix by a scalar, and forget about matrix multiplication. (Indeed, we can simply unravel the rows of the matrix and form one large column vector with all the entries to visualize this, though in practice this is never done.)

In order to understand vector spaces and vectors properly, it is beneficial to develop linear algebra formally from some of the very foundations of abstract algebra: groups and fields. We shall do so next.

Formal Development of Linear Algebra

Basics of Group Theory

We can cast the above rather chaotic observations of the properties of real numbers and matrices in a more formal setting, which lays the foundation for abstract algebra. This will allow us later to define vector spaces in the same language in a clean fashion. Vector spaces (their elements being vectors) and linear maps (or vector space homomorphisms) between them are at the core of linear algebra. We shall start with some basic definitions. This is an advanced, very formal section; you may choose to look instead at the corresponding wikipedia article on vector spaces for a somewhat less rigorous and more graphically illustrated approach.

Operations

Definition (Operation): An operation \(\circ\) on a set \(S\) assigns to each ordered pair \((s, t)\in S\times S\) in a unique fashion an element \(s\circ t \in S\).

Therefore, an operation is a map:
\begin{eqnarray}
\circ: S\times S &\rightarrow& S\\
(s, t) &\mapsto& s\circ t\nonumber
\end{eqnarray}

\(s\circ t\) is again an element of \(S\). If we like, we can therefore represent it with a single letter and write \(r:=s\circ t\), where \(r\in S\). Which element of \(S\) the element \(r\) is depends on what \(S\) is and how we define the operation \(\circ\).

Groups

We would now like to introduce a few extremely important definitions which form the foundation of algebra: the definitions of groups, rings, and fields.

Definition (Group): A group consists of a set \(G\) and an operation \(\circ\), which fulfills the following axioms:
(G1) Associativity: \( \hspace{1cm} (x\circ y) \circ z = x \circ(y\circ z) \hspace{1cm} \forall x,y,z \in G.\)
(G2) (Left-)Neutral Element: There exists an element \(e\in G\) such that
\begin{equation}
e\circ x=x \hspace{1cm} \forall x\in G.
\end{equation}
(G3) (Left-)Inverse Element: For all \(x\in G\) there exists a \(y\in G\) such that
\begin{equation}
y \circ x = e,
\end{equation}
where \(e\) is the neutral element from above.
(We remind the reader that the symbol “\(\forall\)” means “for all”).

We write \((G,\circ)\) to denote the group and its operation, but we may often colloquially abbreviate the notation by just writing \(G\) to denote the group, suppressing to mention explicitly its associated operation. The neutral element is often also called the identity element.

Furthermore, it is easily shown that the left-neutral element is equal to the right-neutral element, and the left-inverse element is equal to the right-inverse element. Therefore we commonly just speak of the neutral and inverse elements, without having to specify whether we mean “left-” or “right-”.

Definition (Commutative Group or Abelian Group):
An Abelian group is a group \((G,\circ)\) which—in addition to the above properties (G1), (G2), and (G3)—also satisfies:
(G4) Commutativity:
\begin{equation}
x\circ y=y\circ x \hspace{1cm} \forall x, y\in G
\end{equation}

Examples of groups include:
1) the real (or whole) numbers and the operation of addition (note that “-” is not a group operation but rather denotes the inverse element with respect to addition: \(x-y\) is really shorthand for \(x+(-y)\)),
2) the real (or rational) numbers excluding zero and the operation of multiplication,
3) the subset of whole numbers {1, 2, 3, 4} and multiplication modulo 5 (multiply, then take the remainder after division by 5),
4) \(n\times n\) matrices and matrix addition.
There are many other groups.

Examples of sets and operations which are not groups:
1) The finite set of whole numbers {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5} and addition: \(3+5=8\) is not an element of the set, so the set is not closed under the operation.
2) Positive whole numbers with zero and addition. Inverse elements (negative numbers) are missing.
3) Whole numbers without zero and multiplication. The inverse elements are missing (the inverse element of 2 would be 1/2, which is a rational number).
4) \(n\times n\) matrices except zero matrix and matrix multiplication. Some matrices do not have multiplicative inverses, as we have already seen.

However, there are subsets of matrices with matrix multiplication which are groups. For example, rotations in 3 dimensions form the group \(SO(3)\).

Fields, Rings

It turns out that a set \(G\) can carry two group structures. The intuition for this comes from the real numbers, which are a group under addition and under multiplication simultaneously. But as we saw earlier in the example of real numbers, there was a little complication: we had to exclude zero for multiplication. It turns out this is generally so. This brings us to the important definitions of rings and fields.

Definition (Field): A field consists of a set \(F\) together with two operations (denoted here by “\(+\)” and “\(\cdot\)”), which satisfy the following axioms:
(F1) \((F,+)\) is an Abelian group. (We shall use the symbol “\(0\)” to denote its neutral element in this definition, but this is just a symbol and does not mean that it actually is the real number zero.)
(F2) \((F^{*}, \cdot)\) is an Abelian group, where we have defined \(F^*:=F\setminus\{0\}\), i.e. \(F\) without the neutral element of \((F,+)\). (We shall use the symbol “\(1\)” to denote the neutral element of this second operation, but this does not mean that it corresponds to the real number 1 – it will do so only if the field is the real (or rational) numbers.)
(F3) Distributivity:
\begin{eqnarray}
(x+y)\cdot z &=& x\cdot z+y\cdot z\\
x\cdot(y+z) &=& x\cdot y + x\cdot z
\end{eqnarray}

Axiom (F3) governs how the two operations \(+\) and \(\cdot\) are combined. In (F2) we had to exclude the neutral element of the group defined in (F1). Note that for (F1) we do not have to exclude the neutral element of \((F^*,\cdot)\), which is usually denoted by the symbol 1.

Why such a weird, asymmetric setup, excluding one element in (F2) but not excluding anything in (F1)? Clearly the intuition behind this comes from real numbers together with addition and multiplication—there it seems to work this way and such an exclusion is necessary. Indeed, real numbers with those two operations are a field.

But they are just one example of a field. There are many other fields, not just real numbers with addition and multiplication, and it turns out that this concept is extremely powerful. Many theorems can be proven for a field based abstractly on the above definitions, and they are then valid for all fields, not just real numbers. However, some very common objects like matrices do not satisfy all the field axioms, and one therefore also studies weaker structures like rings:

Definition (Ring): A ring consists of an Abelian group \((G,+)\) with an additional operation \(\cdot\) which has the distributive property (F3):
\begin{eqnarray}
(x+y)\cdot z &=& x\cdot z+y\cdot z\\
x\cdot(y+z) &=& x\cdot y + x\cdot z
\end{eqnarray}

Definition (Associative Ring): If in addition to the above definition of a ring also the following holds
\begin{equation}
(x\cdot y)\cdot z=x\cdot(y\cdot z)
\end{equation}
then the ring is said to be associative.

Definition (Ring with 1): An associative ring which possesses a neutral element for its second operation \(\cdot\) is called a “ring with 1”, or “ring with unity.”

Note that a ring with 1 fulfills almost all the field axioms, except that not each element is required to have an inverse with respect to the second operation, and the second operation does not have to be commutative.

Matrices fulfill almost all the field axioms, but not quite: not every matrix (other than the zero matrix) has a multiplicative inverse. For example \(A= \left(\begin{array}{cc}0 & 1\\0 & 0\end{array}\right)\) is not invertible, i.e. there is no matrix \(B\) for which \(A\cdot B=1\), where \(1\) is the identity matrix here (not the number 1). Matrices therefore do not form a field, but rather only a ring with unity.

Exercise: \(\RR^n\) is not a Field

To develop an appreciation for the field axioms, let us do the following exercise. We have seen above that real numbers \(\RR\) with addition and multiplication are a field. Let us try to do the same thing with \(\RR^n\) and see where it fails.

We can represent an element in \(\RR^n\) as an \(n\)-tuple of numbers
\begin{equation}
\mathbf{x} = \begin{pmatrix}
x_1\\
x_2\\
\dots\\
x_n
\end{pmatrix}
\in \RR^n
\end{equation}

with \(x_i\in\RR, i=1,\dots,n\).

We can define addition of two objects like this component-wise and obtain an Abelian group, just like we did with matrices earlier. How about multiplication?

Component-wise multiplication does not work, because as soon as one component (but not all) is zero, the element will not have a multiplicative inverse, and (F2) will be violated.

We cannot use the same definition as for matrix multiplication, either, because it is not applicable to two \(n\)-tuples of this shape (the number of rows of one object has to equal the number of columns of the other).

We could try the dot product (often used for vectors). But the dot product is not an operation: it yields a single number, not another \(n\)-tuple of numbers. So let us try the cross product (assuming \(n=3\)). The cross product yields another vector, so that is good. But the cross product is anticommutative, and thus violates (F2) as well.

While these feeble attempts by no means exhaust all the ways in which one could try to define a multiplication on such \(n\)-tuples, we do see that in general the field axioms are quite demanding and not so easy to satisfy with a random set of objects, even if we take something as simple as multiple copies of a field. (We will see soon that \(\RR^n\) has the algebraic structure of an \(n\)-dimensional vector space.)

Vector Spaces

Definition of a Vector Space

Now that we have formally introduced groups and fields, we are in a position to define a vector space. A vector space is a set with some specific operations giving it a predefined algebraic structure. In the language of abstract algebra, a vector space is a commutative group over a field, satisfying certain properties.

In order to construct a vector space, we need three main ingredients:

  • a commutative group \((V, \mathbf{+})\): its elements are called vectors, and the group operation \(\mathbf{+}\) is called vector addition (note the boldface plus).
  • a field \((\FF, +, \cdot)\) (such as the real or complex numbers): its elements are called scalars, and its two operations \(+\) and \(\cdot\) are called field addition and field multiplication, respectively. (The regular-font \(+\) here for field addition is not to be confused with the earlier boldface \(\mathbf{+}\) for vector addition in \((V, \mathbf{+})\); they are two distinct operations.)
  • a combining operation \(\circ\), called scalar multiplication, which takes a scalar in \(\FF\) and a vector in \(V\) as arguments and returns another vector in \(V\). This new operation dictates how the scalars of the field interact with the vectors of the group, which until then were completely unrelated. This operation needs to satisfy certain axioms (see below) for the whole structure to be called a vector space. It is not the same as (and not to be confused with) the field multiplication \(\cdot\) which \(\FF\) already comes with as a field.

The following definition properly defines a vector space.

Definition (Vector Space): A vector space over a field \(\FF\) consists of an additively written, commutative group \(V\) (the elements of which are called vectors), a field \(\FF\) (the elements of which are called scalars) and a multiplication (called scalar multiplication). This scalar multiplication uniquely assigns to every ordered pair \((a, \mathbf{x})\) (with \(a \in \FF\) and \(\mathbf{x} \in V\)) a vector \(a \circ\mathbf{x} \in V\), i.e. again an element of \(V\), such that the following axioms hold for this scalar multiplication:
1) Compatibility with field multiplication (Associativity):
\((a\cdot b)\circ\mathbf{x}=a \circ (b \circ\mathbf{x})\)
2) Distributivity:
with respect to vector addition:
\(a\circ(\mathbf{x+y})=(a\circ\mathbf{x}) \mathbf{+} (a\circ\mathbf{y})\)
with respect to field addition: \((a+b)\circ \mathbf{x}=(a \circ\mathbf{x}) \mathbf{+} (b \circ\mathbf{x})\)
3) Identity Element:
\(1\circ\mathbf{x}=\mathbf{x}\), for \(1 \in \FF\) (Note that \(1\) here denotes the neutral element of the second operation of field \(\FF\).)
The above is valid for all \(\mathbf{x}, \mathbf{y} \in V\) and \(a, b \in \FF\).

Note that the \(\cdot\) and \(\circ\) symbols for the field multiplication and scalar multiplication are often omitted in writing, and both operations are given priority over the vector addition, such that the corresponding parentheses can be dropped as well. For instance, from the above, compatibility can simply be written as \((ab)\mathbf{x} = a(b\mathbf{x})\), while distributivity with respect to vector addition can be written as \(a(\mathbf{x+y}) = a\mathbf{x} \mathbf{+} a\mathbf{y}\).

 

Basis and Dimension

Definition (Linear Independence): Let \(V\) be a vector space over the field \(\FF\). The vectors \(\mathbf{x}_1, \mathbf{x}_2, …, \mathbf{x}_n\in V\) are said to be linearly independent (over \(\FF\)), if from a relation \(\lambda_1\mathbf{x}_1+\lambda_2\mathbf{x}_2+…+\lambda_n\mathbf{x}_n=0\), with \(\lambda_1, \lambda_2, \dots, \lambda_n\in\FF\) follows that \(\lambda_1=\lambda_2=…=\lambda_n=0\).

The vectors \(\mathbf{x}_1, \mathbf{x}_2, …, \mathbf{x}_n\in V\) are called linearly dependent if they are not linearly independent, i.e. if \(\lambda_1\mathbf{x}_1+\lambda_2\mathbf{x}_2+…+\lambda_n\mathbf{x}_n=0\) can be satisfied with some \(\lambda_1, \lambda_2, \dots, \lambda_n\in\FF\) not all zero. Note that if one of the vectors \(\mathbf{x}_1, \mathbf{x}_2, …, \mathbf{x}_n\in V\) is the null vector, \(\mathbf{0}\), then they are automatically linearly dependent (because the \(\lambda_i\) in front of that vector can be chosen to be any nonzero number and the sum is still zero).
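
For vectors given numerically as coordinate tuples in \(\RR^n\), linear independence can be checked on a computer via the rank of the matrix whose columns are the vectors. A minimal sketch, assuming Python with NumPy:

```python
import numpy as np

# Three vectors in R^3, written as the columns of a matrix.
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

# The columns are linearly independent exactly when the rank
# equals the number of columns.
print(np.linalg.matrix_rank(V) == V.shape[1])   # False: column 3 = column 1 + column 2
```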

Definition (Generating Set): A set \(S\) of vectors in \(V\) is called a generating set if any vector \(\mathbf{v}\in V\) can be written as a linear combination of finitely many vectors in \(S\). In other words, there exist vectors \(\mathbf{x}_1, \mathbf{x}_2, …, \mathbf{x}_n\in S\) and \(\lambda_1, \lambda_2, \dots, \lambda_n\in\FF\) such that \(\mathbf{v}=\lambda_1\mathbf{x}_1+\lambda_2\mathbf{x}_2+…+\lambda_n\mathbf{x}_n\).
One writes \(V=\langle S \rangle\).

Definition (Finitely Generated): If there exists a generating set \(S\) comprised of finitely many vectors, then the vector space \(V\) is said to be finitely generated.

Definition (Basis): A set \(\{\mathbf{x}_1, \mathbf{x}_2, …, \mathbf{x}_n\}\) of a finite number of vectors \(\mathbf{x}_i \in V\) with \(1\le i\le n\) is called a basis of the vector space \(V\) if:
(VS1) \(V\) is generated by \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\),
i.e. any vector \(\mathbf{v} \in V\) can be written as a linear combination of the basis vectors: \(\mathbf{v}=\lambda_1\mathbf{x}_1+\lambda_2\mathbf{x}_2+…+\lambda_n\mathbf{x}_n\) with \(\lambda_i \in \FF\) for all \(i\).
(VS2) The vectors \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\) are linearly independent,
i.e. from \(\lambda_1\mathbf{x}_1+\lambda_2\mathbf{x}_2+…+\lambda_n\mathbf{x}_n=0\) follows that \(\lambda_1=\lambda_2=…=\lambda_n=0\).

Definition (Dimension): The number of basis vectors is called the dimension of the vector space. (It can be shown that every basis of a finitely generated vector space contains the same number of vectors, so the dimension is well defined.)

The above is just a quick incomplete run-through of the very basics. Please consult a linear algebra textbook for more details as needed.

Linear Transformations

Definition of Homomorphisms

Definition (Linear Transformation or Linear Map or Vector Space Homomorphism): Let \(V\) and \(W\) be two vector spaces over a field \(\FF\). A linear transformation \(\varphi\) is a map \(\varphi:V\rightarrow W\) which satisfies the following properties:
(1) \(\varphi(\mathbf{x}+\mathbf{y})=\varphi(\mathbf{x})+\varphi(\mathbf{y})\) for all \(\mathbf{x}, \mathbf{y}\in V\).
(2) \(\varphi(\lambda\cdot\mathbf{x})=\lambda\cdot\varphi(\mathbf{x})\) for all \(\lambda\in\FF\) and \(\mathbf{x}\in V\).
Such a linear map is often also called a (vector space) homomorphism, because it preserves the algebraic structure of the vector space. The set of all linear transformations from \(V\) to \(W\) is denoted by \(\mathcal{L}(V, W)\) (and \(\mathcal{L}(V)\) if \(W=V\)). \(\mathcal{L}(V, W)\) itself is a vector space (given without proof here).

Note that in the equation of property (1) above the plus sign on the left-hand side is the Abelian group operation in \(V\), while the plus sign on the right-hand side is the Abelian group operation in \(W\); these are in general different operations. Linear transformations ensure that we can either add vectors in the domain before making the mapping or make the mapping on the individual vectors and add them up afterwards in the target space.

Definition (Isomorphism): A homomorphism according to the previous definition which is bijective (i.e. injective (one-to-one) and surjective (onto)), and therefore invertible, is called a (vector space) isomorphism. (Its inverse is denoted by \(\varphi^{-1}\). In particular \(\varphi^{-1}\circ\varphi\) is the identity map, which maps a vector onto itself.)

The importance of isomorphisms comes from the fact that two vector spaces with an isomorphism in between them are the same for all practical purposes of linear algebra, because the isomorphism establishes a pairwise correspondence between the elements of the two spaces which preserves the algebraic vector space structure.

Definition (Endomorphism): A homomorphism \(\varphi: V \rightarrow V\) which maps \(V\) onto itself (i.e. for which \(W=V\) in our previous definition of homomorphism) is called an endomorphism. The set of endomorphisms of \(V\) is denoted by End(\(V\)).

Definition (Automorphism): An endomorphism which is also an isomorphism is called an automorphism.  

Theorem: 1) The set \(\mathcal{L}(V, W)\) is a vector space under ordinary addition of functions and scalar multiplication of functions by elements of \(\FF\).
2) If \(\varphi\in\mathcal{L}(U, V)\) and \(\tau\in\mathcal{L}(V, W)\), then the composition \(\tau \circ \varphi\) is in \(\mathcal{L}(U, W)\).
3) If \(\varphi\in\mathcal{L}(V, W)\) is bijective then \(\varphi^{-1}\in\mathcal{L}(W, V)\).
4) The vector space \(\mathcal{L}(V)\) is an algebra, where multiplication is composition of functions. The identity map \(\mathrm{id}\in\mathcal{L}(V)\) is the multiplicative identity and the zero map \(0\in\mathcal{L}(V)\) is the additive identity.

Theorem (Knowing transformation of basis vectors under linear map defines the linear map):
Let \(V\) and \(W\) be vector spaces over \(\FF\), and let \(\mathcal{B}=\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) be a basis for \(V\). Then we can define a linear transformation \(\varphi\in\mathcal{L}(V,W)\) by specifying the values of \(\varphi(\mathbf{v}_i)\in W\) for all the basis vectors \(\mathbf{v}_i\in\mathcal{B}\) and extending the domain of \(\varphi\) to all of \(V\) by using linearity, i.e. for an arbitrary vector \(\mathbf{x}=\lambda_1\mathbf{v}_1+\dots+\lambda_n\mathbf{v}_n\in V\) we have:
\begin{equation} \varphi(\mathbf{x})=\varphi(\lambda_1\mathbf{v}_1+\dots+\lambda_n\mathbf{v}_n)=\lambda_1\varphi(\mathbf{v}_1)+\dots+\lambda_n\varphi(\mathbf{v}_n).
\end{equation}
The above linear extension uniquely defines a linear transformation on \(V\), i.e. if \(\varphi, \tau\in\mathcal{L}(V, W)\) and \(\varphi(\mathbf{v}_i)=\tau(\mathbf{v}_i)\) for all \(\mathbf{v}_i\in\mathcal{B}\), then \(\varphi=\tau\). The theorem says that a linear transformation is fully and uniquely specified on the whole vector space \(V\) if its values are known on the basis vectors of \(V\).

Linear Transformations on \(\FF^n\) and Matrix Multiplication

In this subsection we shall first look at linear transformations \(\varphi:\FF^n\rightarrow\FF^m\), i.e. at the special case where the domain and codomain of the linear transformations are products of the base field. This means that the vectors are \(n\)- and \(m\)-tuples of scalars, where the scalars are elements in \(\FF\).

Definition (Standard Vector, Standard Basis for \(\FF^n\)): The \(i\)-th standard vector in \(\FF^n\) is the vector \(\mathbf{e}_i\), which is a vertical \(n\)-tuple of numbers with zeros in all coordinate positions except the \(i\)-th, where it is \(1\). The set \(\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}\) is called the standard basis for \(\FF^n\). (Sometimes it is also called the canonical basis of \(\FF^n\).)

Proposition (Linear maps for matrices): Let \(A\) be a fixed \(i\times j\) matrix, i.e. \(A\in M(i,j)\). Then we can define a map \(\varphi_A\):
\begin{eqnarray}
\varphi_A: M(j, k)&\rightarrow& M(i, k)\nonumber\\
X&\mapsto&\varphi_A(X):=A\circ X,
\end{eqnarray}
where \(\circ\) denotes matrix multiplication. The proposition is that \(\varphi_A\) is a linear map and satisfies the properties:
(1) \(\varphi_A(X+Y)=\varphi_A(X)+\varphi_A(Y)\) for all \(X, Y\in M(j,k)\).
(2) \(\varphi_A(\lambda X)=\lambda\cdot\varphi_A(X)\) for all \(\lambda\in\RR\) and all \(X\in M(j,k)\).

Theorem (Matrix of linear map): (1) If \(A\) is an \(m\times n\) matrix over the field \(\FF\), then \(\varphi_A\in\mathcal{L}(\FF^n, \FF^m)\), where \(\varphi_A\) is defined as
\begin{equation}
\varphi_A(\mathbf{v}):=A\circ\mathbf{v},
\end{equation}
for any vector \(\mathbf{v}\in\FF^n\) (i.e. a coordinate matrix or column vector of \(n\) numbers), with \(\circ\) denoting matrix multiplication.
(2) If \(\varphi\in\mathcal{L}(\FF^n,\FF^m)\), i.e. if \(\varphi:\FF^n\rightarrow\FF^m\) is a linear transformation, then \(\varphi=\varphi_A\) (with the same definition of \(\varphi_A\) as above), and
\begin{equation}
A:=(\varphi(\mathbf{e}_1)|\dots|\varphi(\mathbf{e}_n)),
\end{equation}
where \(\mathbf{e}_1,\dots,\mathbf{e}_n\in\FF^n\) is the standard basis of \(\FF^n\) (see Definition of Standard Basis), and the \(\varphi(\mathbf{e}_i)\) are \(m\)-dimensional column vectors in \(\FF^m\). The matrix \(A\) is called the matrix of \(\varphi\).
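
A small numerical sketch of part (2) of this theorem (assuming Python with NumPy; the particular map is an arbitrary example):

```python
import numpy as np

# Suppose a linear map phi: R^2 -> R^3 is known through the images
# of the standard basis vectors e_1 and e_2 (arbitrary example values):
phi_e1 = np.array([1.0, 0.0, 2.0])
phi_e2 = np.array([0.0, 3.0, 1.0])

# The matrix of phi has these images as its columns:
A = np.column_stack([phi_e1, phi_e2])    # a 3 x 2 matrix

# Applying phi to any vector v = (v1, v2) is then matrix multiplication:
v = np.array([2.0, -1.0])
print(A @ v)                             # equals v1*phi(e_1) + v2*phi(e_2)
print(2.0 * phi_e1 - 1.0 * phi_e2)       # same result, by linearity
```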

Ordered Bases and Coordinate Matrices

Let us consider an \(n\)-dimensional vector space \(V\), with a set of basis vectors \(\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\). We can then write any arbitrary vector \(\mathbf{x}\) as a sum of the basis vectors \(\mathbf{v}_i\), i.e.
\begin{equation}
\mathbf{x}=\lambda_1\mathbf{v}_1+\lambda_2\mathbf{v}_2+\dots+\lambda_n\mathbf{v}_n
\end{equation}
with scalars \(\lambda_1, \dots, \lambda_n\in\FF\).

If we define an order among the basis vectors, i.e. if we take the above set of basis vectors and arrange them in an ordered \(n\)-tuple \(\mathcal{B}=(\mathbf{v}_1,\dots,\mathbf{v}_n)\) (instead of just a set, which has no order among its elements), we obtain what is called an ordered basis \(\mathcal{B}\). We can then put the scalars also in a unique ordered \(n\)-tuple \((\lambda_1, \dots, \lambda_n)\).

Definition (Coordinate Map, Coordinate Matrix): Given the above ordered basis \(\mathcal{B}\) we can define the coordinate map \(\phi_{\mathcal{B}}:V\rightarrow \FF^n\) by
\begin{equation}
\phi_{\mathcal{B}}(\mathbf{v})=[\mathbf{v}]_{\mathcal{B}}=\left[ \begin{array}{c}\lambda_1\\ \vdots\\ \lambda_n \end{array} \right].
\end{equation}
The \(n\times 1\)-dimensional column matrix \([\mathbf{v}]_{\mathcal{B}}\) is called the coordinate matrix of \(\mathbf{v}\) with respect to the ordered basis \(\mathcal{B}\).

Knowing \([\mathbf{v}]_{\mathcal{B}}\) is equivalent to knowing \(\mathbf{v}\), provided one knows \(\mathcal{B}\). It can be easily seen that the coordinate map \(\phi_{\mathcal{B}}\) is an isomorphism (see definition of isomorphism), i.e. that it is bijective and preserves the vector space structure by
\begin{equation}
\phi_{\mathcal{B}}(\lambda_1\mathbf{v}_1+\dots+\lambda_n\mathbf{v}_n)=\lambda_1\phi_{\mathcal{B}}(\mathbf{v}_1)+\dots+\lambda_n\phi_{\mathcal{B}}(\mathbf{v}_n).
\end{equation}
We capture this explicitly below in the next proposition.

Proposition (Choice of basis is isomorphism into copies of field): Let \(V\) be a vector space of dimension \(n\) over the field \(\FF\). The choice of a basis of \(V\) is equivalent to a choice of an isomorphism \(\beta:V\rightarrow\FF^n\).
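
For a vector space that is already \(\RR^n\), the coordinate matrix with respect to a given ordered basis can be computed by solving a linear system. A minimal sketch, assuming Python with NumPy (the basis and vector are arbitrary examples):

```python
import numpy as np

# An ordered basis B = (v1, v2) of R^2, written as the columns of a matrix:
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# A vector x in R^2 (its coordinates with respect to the standard basis):
x = np.array([3.0, 2.0])

# The coordinate matrix [x]_B solves  lambda_1*v1 + lambda_2*v2 = x,
# i.e. the linear system  B @ lam = x:
lam = np.linalg.solve(B, x)
print(lam)                        # [1. 2.], i.e. x = 1*v1 + 2*v2
print(np.allclose(B @ lam, x))    # True
```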

Coordinate Transformations

We shall now review what happens to the coordinate matrices of (fixed) vectors if we switch from one basis to another, i.e. if we perform a coordinate transformation which changes the values of the coordinates \(\lambda_i\in\FF\).

Let us consider an \(n\)-dimensional vector space \(V\) where we choose an ordered basis \(\mathcal{B}=(\mathbf{v}_1,\dots,\mathbf{v}_n)\). We can then write any arbitrary vector \(\mathbf{x}\in V\) as a sum of the basis vectors \(\mathbf{v}_i\in V\), \(i=1,\dots,n\), i.e.
\begin{equation}
\mathbf{x}=\lambda_1\mathbf{v}_1+\lambda_2\mathbf{v}_2+\dots+\lambda_n\mathbf{v_n}
\end{equation}
Using the coordinate map \(\phi_{\mathcal{B}}\) above, knowing it is an isomorphism, this is equivalent to writing \(\mathbf{x}\) as coordinate matrix (column vector) of its coordinates with respect to this basis \(\mathcal{B}\):
\begin{equation}
\phi_{\mathcal{B}}(\mathbf{x})=[\mathbf{x}]_{\mathcal{B}}=\left[ \begin{array}{c}\lambda_1\\ \vdots\\ \lambda_n \end{array} \right]_{\mathcal{B}}
\end{equation}
where \(\lambda_i\in\FF\) for \(i=1,\dots,n\).

Alternatively, we can express the same vector \(\mathbf{x}\in V\) with respect to a different ordered basis \(\mathcal{B}'=(\mathbf{w}_1,\dots,\mathbf{w}_n)\) (again with \(\mathbf{w}_i\in V,\ i=1,\dots,n)\):
\begin{equation}
\mathbf{x}=\kappa_1\mathbf{w}_1+\kappa_2\mathbf{w}_2+\dots+\kappa_n\mathbf{w}_n,
\end{equation}
or via a coordinate map \(\phi_{\mathcal{B}'}\) as an \(n\)-tuple in \(\FF^n\):
\begin{equation}
\phi_{\mathcal{B}'}(\mathbf{x})=[\mathbf{x}]_{\mathcal{B}'}=\left[ \begin{array}{c}\kappa_1\\ \vdots\\ \kappa_n \end{array} \right]_{\mathcal{B}'}
\end{equation}
where \(\kappa_i\in\FF\) for \(i=1,\dots,n\). Note the subscripts on the \(n\)-tuples above denoting the basis. The \(n\)-tuples are only meaningful for \(V\) if we know with respect to which basis they are meant, and the numbers in them are different with respect to different bases.

The left-hand sides of the above equations all refer to the same object, a fixed vector \(\mathbf{x}\in V\). Coordinate transformations tell us how to obtain the \(\kappa_i\) from the \(\lambda_i\), \(i=1,\dots,n\), as we switch from basis \(\mathcal{B}\) to basis \(\mathcal{B}'\).

We want to know how the coordinate matrices \([\mathbf{x}]_{\mathcal{B}}\) and \([\mathbf{x}]_{\mathcal{B}'}\) are related, i.e. how to compute one from the other if we know the coordinate maps \(\phi_{\mathcal{B}}\) and \(\phi_{\mathcal{B}'}\) for the two bases. The map which converts \([\mathbf{x}]_{\mathcal{B}}\) to \([\mathbf{x}]_{\mathcal{B}'}\) is the change of basis operator \(\phi_{\mathcal{B},\mathcal{B}'}:=\phi_{\mathcal{B}'}\circ\phi_{\mathcal{B}}^{-1}\). Since \(\phi_{\mathcal{B},\mathcal{B}'}\in\mathcal{L}(\FF^n)\) (i.e.\ it is linear and maps \(\FF^n\) to \(\FF^n\)), \(\phi_{\mathcal{B},\mathcal{B}'}\) has the form \(\varphi_A\) from the above Theorem (matrix of linear map) with the \(n\times n\) matrix
\begin{eqnarray}
A&:=&(\phi_{\mathcal{B},\mathcal{B}'}(\mathbf{e}_1)|\dots|\phi_{\mathcal{B},\mathcal{B}'}(\mathbf{e}_n))=(\phi_{\mathcal{B}'}\circ\phi_{\mathcal{B}}^{-1}([\mathbf{v}_1]_{\mathcal{B}})|\dots|\phi_{\mathcal{B}'}\circ\phi_{\mathcal{B}}^{-1}([\mathbf{v}_n]_{\mathcal{B}}))\nonumber\\
&=&([\mathbf{v}_1]_{\mathcal{B}'}|\dots|[\mathbf{v}_n]_{\mathcal{B}'}),\nonumber
\end{eqnarray}
i.e. its columns are the coordinate matrices of the (ordered) basis vectors of \(\mathcal{B}\) expressed with respect to the other basis \(\mathcal{B}'\) (i.e. via the coordinate map \(\phi_{\mathcal{B}'}\)).

We write \(M_{\mathcal{B},\mathcal{B}'}\) for \(A\) and call it the change of basis matrix from \(\mathcal{B}\) to \(\mathcal{B}'\).

Theorem (Change of Basis Matrix): Let \(\mathcal{B}\) and \(\mathcal{B}'\) be ordered bases of a vector space \(V\). Then the change of basis is accomplished by the change of basis operator \(\phi_{\mathcal{B},\mathcal{B}'}:=\phi_{\mathcal{B}'}\circ\phi_{\mathcal{B}}^{-1}\) (where the latter two are the coordinate maps of the bases). The matrix \(M_{\mathcal{B},\mathcal{B}'}\) associated with this operator, i.e. the matrix which takes the coordinate matrix of a vector \(\mathbf{x}\in V\) from basis \(\mathcal{B}\) to basis \(\mathcal{B}'\),
\begin{equation}
[\mathbf{x}]_{\mathcal{B}'}=M_{\mathcal{B},\mathcal{B}'}[\mathbf{x}]_{\mathcal{B}},
\end{equation}
has the form
\begin{equation}
M_{\mathcal{B},\mathcal{B}'}=([\mathbf{v}_1]_{\mathcal{B}'}|\dots|[\mathbf{v}_n]_{\mathcal{B}'}).
\end{equation}
\(\mathbf{v}_1,\dots,\mathbf{v}_n\in V\) here denote the basis vectors of basis \(\mathcal{B}\).

The opposite direction of the basis transformation is accomplished by \(M_{\mathcal{B}',\mathcal{B}}=M_{\mathcal{B},\mathcal{B}'}^{-1}\) (the matrix is guaranteed to be invertible, because \(\phi_{\mathcal{B},\mathcal{B}'}\) is an isomorphism (automorphism)).
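
The following NumPy sketch (the two bases of \(\RR^3\) are arbitrary example choices) builds the change of basis matrix in exactly this way, with the coordinate matrices \([\mathbf{v}_i]_{\mathcal{B}'}\) as its columns, and checks the theorem numerically:

import numpy as np

# columns of V_B and V_Bp hold the basis vectors of B and B' (arbitrary example data)
V_B  = np.array([[1.0, 1.0, 1.0],
                 [0.0, 1.0, 1.0],
                 [0.0, 0.0, 1.0]])              # columns: v_1, v_2, v_3
V_Bp = np.array([[2.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0],
                 [0.0, -1.0, 1.0]])             # columns: w_1, w_2, w_3

M = np.linalg.solve(V_Bp, V_B)                  # columns of M are [v_i]_{B'}

x = np.array([3.0, -1.0, 2.0])
x_B  = np.linalg.solve(V_B, x)                  # [x]_B
x_Bp = np.linalg.solve(V_Bp, x)                 # [x]_{B'}
print(np.allclose(M @ x_B, x_Bp))               # True: [x]_{B'} = M_{B,B'} [x]_B
print(np.allclose(np.linalg.inv(M),
                  np.linalg.solve(V_B, V_Bp)))  # True: M_{B',B} = M_{B,B'}^{-1}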

Matrix Representation of Linear Transformations

In this subsection we want to review how to represent linear transformations as matrices with respect to bases of the source and target vector spaces.

The next theorem refers to the upper half of the commutative diagram in the above Figure (let us ignore the lower half at first), showing a linear transformation \(\varphi:V\rightarrow W\) and its representation, the matrix \(\rho:\FF^n\rightarrow\FF^m\), with respect to some bases of \(V\) and \(W\) denoted by the dashed lines.

Theorem (Linear map as matrix): Let \(V\) and \(W\) be two vector spaces with dim \(V=n\) and dim \(W=m\). Let \(\varphi:V\rightarrow W\) be a linear transformation from \(V\) to \(W\). Let \(\mathcal{B}\) be an ordered basis of \(V\) and \(\beta:V\rightarrow\FF^n\) the corresponding coordinate map (denoted by \(\phi_{\mathcal{B}}\) previously). Let us choose an ordered basis \(\mathcal{C}\) of \(W\) as well, with coordinate map \(\gamma:W\rightarrow\FF^m\). Then the linear transformation \(\varphi:V\rightarrow W\) can be represented with respect to bases \(\mathcal{B}\) and \(\mathcal{C}\) as a matrix multiplication with a matrix \(\rho(\varphi)(=[\varphi]_{\mathcal{B},\mathcal{C}})\) such that for any vector \(\mathbf{x}\in V\):
\begin{equation}
[\varphi(\mathbf{x})]_{\mathcal{C}}=\rho(\varphi)[\mathbf{x}]_{\mathcal{B}},
\end{equation}
where the matrix
\begin{equation}
\rho(\varphi)=([\varphi(\mathbf{v}_1)]_{\mathcal{C}} | \dots | [\varphi(\mathbf{v}_n)]_{\mathcal{C}}),
\end{equation}
and \(\mathbf{v}_1,\dots,\mathbf{v}_n\) denote the basis vectors of basis \(\mathcal{B}\) of \(V\).

Next let us see what happens to the representation matrix \(\rho\) of the linear transformation \(\varphi\) under a change of bases of \(V\) and \(W\). Let \(\mathcal{B}'\) be another ordered basis of \(V\) and \(\beta':V\rightarrow\FF^n\) the corresponding coordinate map. Analogously, let us choose a second basis \(\mathcal{C}'\) for \(W\) with corresponding coordinate map \(\gamma':W\rightarrow\FF^m\). The matrix \(\rho\) of the linear transformation depends on the choice of bases in both \(V\) and \(W\):

Proposition: The matrix \(\rho\) of a linear transformation \(\varphi:V\rightarrow W\) changes with the change of bases in \(V\) and \(W\) as:
\begin{equation}
\rho'(\varphi)=C\circ\rho(\varphi)\circ B^{-1},
\end{equation}
where \(B\) is a coordinate transformation between bases of \(V\) and \(C\) is a coordinate transformation between bases of \(W\) as in the preceding text and depicted in the above Figure.

Proof: It follows from the commutativity of the diagram in the above Figure that \(\rho(\varphi)=\gamma\circ\varphi\circ\beta^{-1}.\) Then we have
\begin{equation}
\rho'(\varphi)=\gamma'\circ\varphi\circ(\beta')^{-1}=(C\circ\gamma)\circ\varphi\circ(B\circ\beta)^{-1}=C\circ\underbrace{\gamma\circ\varphi\circ\beta^{-1}}_{=\rho(\varphi)}\circ B^{-1}.
\end{equation}
Q.E.D.
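
For readers who like to verify such statements numerically, here is a short sketch (all matrices below are random example data, and we identify \(V\) with \(\RR^3\) and \(W\) with \(\RR^2\) for concreteness) that checks the proposition:

import numpy as np

rng = np.random.default_rng(0)
A     = rng.normal(size=(2, 3))    # phi acting in the standard bases of R^3 and R^2
V_B   = rng.normal(size=(3, 3))    # basis B  of V (columns); random, almost surely invertible
V_Bp  = rng.normal(size=(3, 3))    # basis B' of V
W_C   = rng.normal(size=(2, 2))    # basis C  of W
W_Cp  = rng.normal(size=(2, 2))    # basis C' of W

inv = np.linalg.inv
rho   = inv(W_C)  @ A @ V_B        # rho(phi)  = gamma  o phi o beta^{-1}
rho_p = inv(W_Cp) @ A @ V_Bp       # rho'(phi) = gamma' o phi o beta'^{-1}
B = inv(V_Bp) @ V_B                # change of basis in V: beta'  o beta^{-1}
C = inv(W_Cp) @ W_C                # change of basis in W: gamma' o gamma^{-1}

print(np.allclose(rho_p, C @ rho @ inv(B)))   # True: rho' = C o rho o B^{-1}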

Diagonalization

Diagonalization is a larger topic, which we shall not get into here, because we will not really need it much. However, we shall point out a few obvious facts from what we have just worked out to connect it to our definition of a diagonalizable matrix in the first part of this article.

In the last proposition above, we see that if \(\varphi: V\rightarrow V\) is an endomorphism and we require the choice of basis in domain and codomain to be the same (so that \(C=B\)), we end up with

\begin{equation}
\rho'(\varphi) = B \rho(\varphi) B^{-1}
\end{equation}

for any basis transformation \(B\). This is exactly the expression we encountered in the definition of a diagonalizable matrix earlier on, at the end of the first part on matrix calculations: \(\rho(\varphi)\) is diagonalizable if we can find a \(B\) such that \(\rho'(\varphi)\) is diagonal.

The above proposition was phrased in terms of basis transformations. Equivalently, a linear transformation \(\varphi: V\rightarrow V\) is diagonalizable if a basis of \(V\) can be found such that the matrix \(\rho'\) of the linear transformation with respect to this basis is diagonal. Such a basis consists of eigenvectors of \(\varphi\):

Definition (Eigenvalue, Eigenvector): Let \(\varphi: V\rightarrow V\) be an endomorphism of a vector space \(V\) over field \(\FF\). Then a nonzero vector \(\mathbf{v}\in V\) is called an eigenvector and \(\lambda\in\FF\) its associated eigenvalue, if

\begin{equation}
\varphi(\mathbf{v}) = \lambda \mathbf{v}.
\end{equation}

The prefix “eigen-” comes from German and means “its own”, capturing the idea that an eigenvector \(\mathbf{v}\) is a vector which is mapped by the linear transformation \(\varphi\) onto itself, up to a scalar multiple (change in length). Its change in length is determined by the eigenvalue \(\lambda\), which corresponds to the diagonal entry in the diagonal matrix \(\rho'(\varphi)\) for that eigenvector.

Eigenvalues can be zero. In that case, the endomorphism is not invertible (not bijective), because it maps a nonzero subspace of the domain (the eigenspace of the eigenvalue 0, which lies in the kernel) onto the zero vector in the codomain.

Not all matrices are diagonalizable. For instance, rotation matrices in 3D are generally not diagonalizable (over \(\RR\)). One can easily visualize this by remembering Euler’s rotation theorem: every rotation can be expressed as a rotation around a suitable axis. The vector along this axis is an eigenvector of the rotation, because it does not change direction under the rotation. It has eigenvalue 1, because it does not change length, either. However, all vectors in other directions do change direction under the rotation (unless the rotation is by 180 degrees, in which case the other two eigenvalues are -1), so there are no other real eigenvectors. Since we will be dealing with rotations mostly, diagonalization will not play a major role in our studies. (Rotation matrices are, however, diagonalizable over the algebraically closed field of complex numbers, \(\CC\), but that is yet another topic. They have one real eigenvalue equal to 1 and two complex ones.)
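
A quick numerical sketch illustrates this (the rotation about the z-axis below is just an example choice): NumPy's eigenvalue routine returns one real eigenvalue 1, whose eigenvector is the rotation axis, and a complex-conjugate pair.

import numpy as np

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c,  -s,  0.0],
              [s,   c,  0.0],
              [0.0, 0.0, 1.0]])          # rotation by theta about the z-axis

w, V = np.linalg.eig(R)
print(w)                                 # contains 1 and the complex pair cos(theta) +- i*sin(theta)
print(V[:, np.isclose(w, 1.0)].real)     # the rotation axis (up to sign), here the z-axis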

Let us summarize the essence of the last few subsections:

  1. If we choose a basis of an \(n\)-dimensional vector space \(V\), this is equivalent to choosing an isomorphism from \(V\) into \(\FF^n\), where \(\FF\) is the field over which the vector space \(V\) is defined.
  2. A vector space homomorphism between \(\FF^n\) and \(\FF^m\) can be expressed as a multiplication with an \(m\times n\) matrix. The entries of this matrix are elements of \(\FF\).
  3. If we choose two different bases of \(V\), this is equivalent to choosing two different isomorphisms into \(\FF^n\). The basis transformation (going from one basis to the other) can then be accomplished by an \(n\times n\) matrix with entries in \(\FF\).
  4. The columns of the change of basis matrix are the coordinate matrices of the original basis vectors expressed with respect to the new basis.
  5. Linear maps between vector spaces over the same field can be expressed by matrix multiplication, and the matrix of the map depends on the choice of bases in the two vector spaces.

We will use Point 4 a lot in studying flight dynamics. We will work with four different bases (axis systems/reference frames): inertial/world/Earth frame, body (aircraft) frame, wind frame, and occasionally also the velocity frame. We will need the basis transformations (coordinate transformations) frequently, because gravity is most naturally expressed in the inertial frame of the Earth, the aerodynamic forces (lift, drag, sideforce) are naturally expressed in the wind frame, and the equations of motion are best solved in the body frame (because the inertia tensor does not depend on time there). So all forces will need to be taken from their original frames to the body frame.

We shall look in the next section at such basis transformations (coordinate transformations) in a somewhat less rigorous fashion. You will need this section only if you plan on doing actual calculations in our courses and want to do more than just follow along.

Basis Transformations Between Axis Systems/Frames

Basis transformations in linear algebra are very useful. We will need them, for instance, to compute the components of a vector (e.g. force, moment, velocity) given in one frame (e.g. the inertial/world frame or the wind frame) in another frame (e.g. the body frame of the aircraft).

We wish to learn how to transform the basis vectors of one basis into another. For this we will end up using the matrix \([BA]\) to go from basis \(\mathcal{A}\) to basis \(\mathcal{B}\).

And we wish to learn how to solve the related problem of expressing a fixed vector, given by its coefficients with respect to one basis, in another basis. For this we will use the matrix \([BAc]\). We will see that if the basis transformation is a rotation, the two matrices are the same. For general basis transformations, the matrices \([BA]\) and \([BAc]\) turn out to be the inverse transpose of each other.

In three dimensions, we define a basis (reference frame) \(\mathcal{A}\) consisting of three unit vectors \(\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2, \hat{\mathbf{a}}_3\). A basis is any set of vectors which are linearly independent\footnote{I.e. none of the basis vectors can be expressed as a linear combination of the others.} and span the whole vector space.\footnote{I.e. any vector in the vector space can be expressed as a linear combination of basis vectors. In an \(n\)-dimensional vector space this will require the basis to consist of \(n\) vectors.} In this work we typically choose these vectors to be orthogonal to each other with respect to the standard scalar product (dot product) in 3D, though in general this does not have to be the case. For brevity, we denote such a construction as \(\mathcal{A}:\{\hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2, \hat{\mathbf{a}}_3\}\), where the lower case \(\hat{\mathbf{a}}_i\) denote the basis vectors, with \(i=1,2,3\). Likewise, we can construct a second basis \(\mathcal{B}:\{\hat{\mathbf{b}}_1, \hat{\mathbf{b}}_2, \hat{\mathbf{b}}_3\}\).

Basis-Independent Notation of Vectors as Vector Objects

We can write any arbitrary vector \(\mathbf{v}\) as a linear combination of basis vectors
\begin{eqnarray}\label{eq:vector}
\mathbf{v} &=& {}^\mathcal{A}v_1 \va_1 + {}^\mathcal{A}v_2 \va_2 + {}^\mathcal{A}v_3 \va_3 = \sum_{j=1}^3 {}^\mathcal{A}v_j \va_j \nonumber \\
&=& {}^\mathcal{B}v_1 \vb_1 + {}^\mathcal{B}v_2 \vb_2 + {}^\mathcal{B}v_3 \vb_3 = \sum_{j=1}^3 {}^\mathcal{B}v_j \vb_j
\end{eqnarray}
The above is a generally true vector equation and is presented in a basis independent notation. No reference to a particular basis has been made: all boldface quantities are vectors themselves, i.e. abstract elements of the vector space.

A basis {\(\va_i\)}\({}_{i=1,2,\dots,n}\) of an \(n\)-dimensional vector space is a set of vectors which:

  • span the vector space (i.e. every vector in the vector space can be written as a linear combination of basis vectors, like we did above for \(\mathbf{v}\)), and
  • are linearly independent, meaning the linear combination expressing an arbitrary vector \(\mathbf{v}\) of the vector space is unique (which amounts to the coefficients \({}^\mathcal{A}v_i\) above being unique, i.e. only one combination works for a given \(\mathbf{v}\) and a chosen basis {\(\va_i\)}\({}_{i=1,2,\dots,n}\)).

For finite-dimensional vector spaces the number of vectors in a basis is always the same as the number of dimensions. It cannot be less, because of the first condition, and it cannot be more, because of the second one. Note that not any arbitrary combination of \(n\) vectors forms a basis though.

We will mostly work with 3-dimensional vector spaces, corresponding to the three spatial dimensions, and take as a basis vectors of unit length, which are orthogonal to each other with respect to the standard scalar product.

Basis-Dependent Notation of Vectors as Coordinate Matrices (3-Tuple of Coefficients)

We can write the coefficients of vector \(\mathbf{v}\) with respect to the \(\mathcal{A}\) basis as a 3-tuple of numbers (a so-called coordinate matrix) in a basis-dependent way:
\begin{equation}
{}^\mathcal{A}\mathbf{v} := \begin{pmatrix} {}^\mathcal{A}v_1 \\ {}^\mathcal{A}v_2 \\ {}^\mathcal{A}v_3 \end{pmatrix}_{\mathcal{A}}
\end{equation}
and likewise with respect to the \(\mathcal{B}\) basis:
\begin{equation}
{}^\mathcal{B}\mathbf{v} := \begin{pmatrix} {}^\mathcal{B}v_1 \\ {}^\mathcal{B}v_2 \\ {}^\mathcal{B}v_3 \end{pmatrix} _{\mathcal{B}}
\end{equation}
This is convenient in many situations, because now we can express linear transformations between vectors (such as rotations) by means of matrix multiplication and let the computer crunch the numbers. But let us be clear that this notation only makes sense if we simultaneously specify the basis with respect to which these coefficients are given in the 3-tuples, which is expressed by the subscript behind the large parentheses. Constantly keeping track of which underlying basis a 3-tuple refers to is important once we start doing matrix calculations.

Vectors are Abstract Objects, not \(n\)-Tuples of Numbers

To emphasize the above, let us remind ourselves that a three dimensional vector is an abstract object, an element of a vector space \(V\), which in the language of abstract algebra is an Abelian (commutative) group over a field satisfying certain properties. The vector becomes a 3-tuple of numbers only when expressed with respect to a chosen basis as a coordinate matrix. The choice of such a basis, as done in the previous section, is equivalent to choosing an isomorphism from the original vector space \(V\) into \(\RR^3\) (the vector space is in this case defined over the field of real numbers \(\RR\)).

Vectors can be visualized as arrows in space: they have magnitude and direction. Given as a 3-tuple of numbers, they only make sense if one specifies the basis directions these numbers refer to, e.g. the first entry meaning the direction of north, the second entry the direction of east, and the third entry the direction down. We could choose a different basis, e.g. with the third entry pointing up, and then the vector with the same numerical coefficients would be pointing in a different direction.

Visualizing vectors as arrows is not the only way. It is important to note, in fact, that vectors do not have to be arrows at all. The real numbers \(\RR\) are a field, but the real numbers in multiple dimensions, \(\RR^n\), are not a field; they form a vector space, as an additive group over the field of real numbers themselves. Vectors can also be polynomials, i.e. a certain type of function \(f(x)\), with one possible choice of basis being the monomials \(\mathcal{M}:\{1, x, x^2, x^3, \dots, x^n\}\), where the 1 is the constant function \(f(x) = 1 \in V\), not the real number \(1\in\RR\). Let us look at the function \(f(x) = 5\cdot x^3 + (2+4)\cdot x+7\), for example. Here \(5\in\RR\) is a scalar, and \(\cdot\) denotes the multiplication of a scalar with a vector (an operation different from the multiplication of two numbers in \(\RR\)). The first plus is the plus between vectors (that of the Abelian group of the vector space). The second plus (between the 2 and the 4) is a different plus: it is the plus of the real number field \(\RR\) between two numbers. The third plus is again a plus between vectors, because the 7 at the end is not the number \(7\in\RR\), but rather shorthand for the constant function \(7\cdot 1\in V\) (the \(1\) here again being the constant function and not the number 1), also a vector. In fact, with respect to the monomial basis in four dimensions (polynomials up to 3rd order), ordered as \(\mathcal{M}=(x^3, x^2, x, 1)\), we can write the above function \(f(x)\) consistently with our notation as the 4-tuple of numbers
\begin{equation}
{}^{\mathcal{M}}f(x) = \begin{pmatrix} 5 \\ 0 \\ 6 \\ 7 \end{pmatrix}_{\mathcal{M}}
\end{equation}
Here again, the choice of basis is absolutely crucial for this 4-tuple of coefficients to have any meaning. So, very generally, the elements of any set that has the algebraic structure of a vector space are vectors. And they can always be mapped into \(n\)-tuples of numbers, because all \(n\)-dimensional vector spaces over the same field are isomorphic to each other, so for a real \(n\)-dimensional vector space an isomorphism into \(\RR^n\) can always be found.
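
As a small illustrative sketch (the second polynomial \(g\) below is an arbitrary extra example of ours), the coordinate 4-tuple is all a computer needs to work with the polynomial once the ordered basis is agreed upon; NumPy's polyval expects exactly this highest-power-first ordering:

import numpy as np

# [f]_M for f(x) = 5x^3 + 6x + 7 with respect to the ordered basis (x^3, x^2, x, 1)
f_M = np.array([5.0, 0.0, 6.0, 7.0])

xs = np.linspace(-1.0, 1.0, 5)
print(np.polyval(f_M, xs))                          # evaluate f via its coordinates
print(5*xs**3 + 6*xs + 7)                           # same values, computed directly

# vector space operations on polynomials become operations on coordinates:
g_M = np.array([0.0, 1.0, 0.0, -2.0])               # g(x) = x^2 - 2 (another example)
print(np.polyval(f_M + 3*g_M, xs))                  # coordinates of f + 3g ...
print(np.polyval(f_M, xs) + 3*np.polyval(g_M, xs))  # ... give the right function values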

Transformation of Basis Vectors

This section treats how the basis vectors (abstract objects) transform under basis transformation. We are thinking here of the basis vectors as moving in the process (e.g. by asking the question, how can I express a new basis vector in terms of the old ones).

We can express the basis vectors of one basis with respect to another. For instance, if we take the vector \(\mathbf{v}\) in Eq.~\eqref{eq:vector} to be the \(i\)-th basis vector of basis \(\mathcal{B}\), we can write it as a linear combination of basis vectors from basis \(\mathcal{A}\):
\begin{equation}
\vb_i = {}^\mathcal{A}b_{i1} \va_1 + {}^\mathcal{A}b_{i2} \va_2 + {}^\mathcal{A}b_{i3} \va_3 = \sum_{j=1}^3 {}^\mathcal{A}b_{ij} \va_j
\end{equation}
Stacking the above line for each individual basis vector, we can now form the vector of vectors (or vectrix, because when viewed as a \(3\times3\) matrix each row is a vector) on the left-hand side, which satisfies the equation
\begin{equation}\label{eq:vectrix}
\begin{pmatrix} \vb_1 \\ \vb_2 \\ \vb_3 \end{pmatrix} = [BA] \begin{pmatrix} \va_1 \\ \va_2 \\ \va_3 \end{pmatrix}
\end{equation}
where each entry of the above three component vector is not a number, but one of the basis vectors themselves. Here \([BA]\) is the matrix
\begin{equation}
[BA]=\begin{pmatrix}
{}^\mathcal{A}b_{11} & {}^\mathcal{A}b_{12} & {}^\mathcal{A}b_{13} \\
{}^\mathcal{A}b_{21} & {}^\mathcal{A}b_{22} & {}^\mathcal{A}b_{23} \\
{}^\mathcal{A}b_{31} & {}^\mathcal{A}b_{32} & {}^\mathcal{A}b_{33}
\end{pmatrix}
\end{equation}
By construction, the \(i\)-th row of this matrix contains the coefficients of the \(i\)-th basis vector of basis \(\mathcal{B}\) expressed with respect to basis \(\mathcal{A}\). We can write this as
\begin{equation}\label{eq:B rows}
[BA] = \begin{pmatrix} ({}^{\mathcal{A}} \vb_1)^T \\ ({}^{\mathcal{A}} \vb_2)^T \\ ({}^{\mathcal{A}} \vb_3)^T \end{pmatrix}
\end{equation}
We have included the transpose symbol, because we typically think of the 3-tuple of numbers \({}^{\mathcal{A}} \vb_i\) as a column, yet above we want to spread it out horizontally as a row.

 

Note that we can make the same argument as above for the vectors of the \(\mathcal{A}\) basis, by multiplying Eq.~\eqref{eq:vectrix} from the left with the multiplicative inverse of \([BA]\), i.e. by \([AB]:=[BA]^{-1}\). The rows of \([AB]\) contain the coefficients of the basis vectors of basis \(\mathcal{A}\) with respect to basis \(\mathcal{B}\):
\begin{equation}
[AB] = \begin{pmatrix} ({}^{\mathcal{B}} \va_1)^T \\ ({}^{\mathcal{B}} \va_2)^T \\ ({}^{\mathcal{B}} \va_3)^T \end{pmatrix}.
\end{equation}
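
A short NumPy sketch (with arbitrary example numbers, deliberately not a rotation) shows how \([BA]\) is assembled from its rows and how \([AB]\) then follows by matrix inversion:

import numpy as np

# coefficients of the B basis vectors expressed in the A basis (arbitrary example values)
b1_A = np.array([1.0, 0.0, 0.0])      # ^A b_1
b2_A = np.array([1.0, 2.0, 0.0])      # ^A b_2
b3_A = np.array([0.0, 1.0, 3.0])      # ^A b_3  (lengths/angles differ, so not a rotation)

BA = np.vstack([b1_A, b2_A, b3_A])    # rows are (^A b_i)^T, as in the text
AB = np.linalg.inv(BA)                # rows of [AB] are (^B a_j)^T

# consistency check: expand a_1 = sum_i AB[0, i] * b_i and express the result
# back in the A basis; this must give (1, 0, 0):
a1_in_A = sum(AB[0, i] * BA[i] for i in range(3))
print(np.allclose(a1_in_A, [1.0, 0.0, 0.0]))   # True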

Special Case: Rotations

In general, to get to \([AB]\) we have to invert \([BA]\), so there is no obvious, neat way to achieve a similar picture in terms of \([BA]\) directly. However, in the special case that \([BA]\) is a rotation matrix and therefore orthonormal (which happens to be the only case of interest for us in this document, as all we are dealing with are rotations), we can make elegant use of the property that \([AB]=[BA]^T\) for orthonormal matrices. Since transposition simply switches rows for columns, we realize that (again, only for the special case of rotations):
\begin{equation}\label{eq:A columns}
[BA] = ({}^{\mathcal{B}} \va_1, {}^{\mathcal{B}} \va_2, {}^{\mathcal{B}} \va_3)
\end{equation}
i.e.\ the \(j\)-th column of matrix \([BA]\) contains the coefficients of the \(j\)-th basis vector of basis \(\mathcal{A}\) expressed with respect to basis \(\mathcal{B}\).

Being able to construct the columns of \([BA]\) from the vectors of the originating basis, Eq.~\eqref{eq:A columns}, or the rows of \([BA]\) from the target basis vectors, Eq.~\eqref{eq:B rows}, is a very important and useful tool we will employ often when we need to write down the rotation matrix \([BA]\) (commonly referred to as the directional cosine matrix (DCM) in the context of rotations), especially numerically on the computer. We use this for instance when implementing the TRIAD method to find the initial attitude of the aircraft, where we go from constructing the TRIAD basis vectors to forming the associated DCMs with Eq.~\eqref{eq:A columns}, multiplying them, and finally extracting the Euler angles of the aircraft attitude from the combined DCM using Eqs.~\eqref{eq:DCM to Euler321 1}–\eqref{eq:DCM to Euler321 3}.

Transformation of Coefficients of a Vector

This section treats how the coefficients (3-tuple of numbers with respect to a chosen basis) of a fixed vector (abstract object) transform under basis transformation. This can be thought of as the basis vectors rotating underneath the fixed vector. This is different from the previous subsection, where we were looking at how the basis vectors (abstract objects) themselves move under basis transformation.

The above matrix \([BA]\) told us how the basis vectors transform, e.g. the vector \(\vb_2\). But what about the coefficients of a fixed vector? What matrix \([BAc]\) for the transformation of vector coefficients (not basis vectors) do we need in order to make the following equation true:
\begin{equation}
{}^\mathcal{B}\mathbf{v} = [BAc] {}^\mathcal{A}\mathbf{v}
\end{equation}
for any vector \(\mathbf{v}\)? This will be in general a different matrix than the matrix \([BA]\) we encountered in Eq.~\eqref{eq:vectrix}, even though it also goes from basis \(\mathcal{A}\) to basis \(\mathcal{B}\), because the objects we are transforming here are different (\({}^\mathcal{B}\mathbf{v}\) is not the same type of object as the \((\vb_1, \vb_2, \vb_3)^T\) in Eq.~\eqref{eq:vectrix}).

From Eq.~\eqref{eq:vector} (in the first step) and Eq.~\eqref{eq:vectrix} (in the second step) we obtain
\begin{eqnarray}
\mathbf{v} &=& {}^\mathcal{B}v_1 \vb_1 + {}^\mathcal{B}v_2 \vb_2 + {}^\mathcal{B}v_3 \vb_3 \\
&=& {}^\mathcal{B}v_1BA[1,1]\va_1 + {}^\mathcal{B}v_1BA[1,2]\va_2 + {}^\mathcal{B}v_1BA[1,3]\va_3 + \nonumber \\
&+& {}^\mathcal{B}v_2BA[2,1]\va_1 + {}^\mathcal{B}v_2BA[2,2]\va_2 + {}^\mathcal{B}v_2BA[2,3]\va_3 + \nonumber \\
&+& {}^\mathcal{B}v_3BA[3,1]\va_1 + {}^\mathcal{B}v_3BA[3,2]\va_2 + {}^\mathcal{B}v_3BA[3,3]\va_3 \nonumber
\end{eqnarray}
Each line contains one of the \(\vb\) basis vectors expanded/written in terms of \(\va\)’s.

We can regroup the above terms by like \(\va\)’s, i.e. collect the terms containing the same \(\va_j\) from each row, and remember from Eq.~\eqref{eq:vector} that
\begin{equation}
\mathbf{v} = {}^\mathcal{A}v_1 \va_1 + {}^\mathcal{A}v_2 \va_2 + {}^\mathcal{A}v_3 \va_3
\end{equation}
to obtain by comparison the vector coefficients with respect to the \(\mathcal{A}\) basis:
\begin{eqnarray}
{}^\mathcal{A}v_1&=& {}^\mathcal{B}v_1BA[1,1] + {}^\mathcal{B}v_2BA[2,1] + {}^\mathcal{B}v_3BA[3,1] \nonumber \\
{}^\mathcal{A}v_2&=& {}^\mathcal{B}v_1BA[1,2] + {}^\mathcal{B}v_2BA[2,2] + {}^\mathcal{B}v_3BA[3,2] \nonumber \\
{}^\mathcal{A}v_3&=& {}^\mathcal{B}v_1BA[1,3] + {}^\mathcal{B}v_2BA[2,3] + {}^\mathcal{B}v_3BA[3,3] \nonumber
\end{eqnarray}
We define the transformation matrix \([ABc]\) for the coefficients of an arbitrary vector from the \(\mathcal{B}\) basis to the \(\mathcal{A}\) basis as
\begin{equation}\label{eq:ABc}
{}^\mathcal{A}\mathbf{v} = [ABc] {}^\mathcal{B}\mathbf{v}
\end{equation}
and realize that the columns of this matrix are the coordinate matrices of the \(\mathcal{B}\) basis vectors expressed with respect to the \(\mathcal{A}\) basis:

\begin{equation}
[ABc] = [{}^{\mathcal{A}}\vb_1, {}^{\mathcal{A}}\vb_2, {}^{\mathcal{A}}\vb_3].
\end{equation}

Furthermore, when we compare this expression component-wise to the expression for \([BA]\) obtained earlier, we realize that
\begin{equation}
[ABc] = [BA]^T.
\end{equation}
On the other hand, multiplying Eq.~\eqref{eq:ABc} from the left by \([BAc]:=[ABc]^{-1}\), defined as the multiplicative inverse of the matrix \([ABc]\), we finally obtain:
\begin{equation}\label{eq:BAc}
{}^{\mathcal{B}}\mathbf{v} = [BAc] {}^\mathcal{A}\mathbf{v} = {[ABc]}^{-1} {}^{\mathcal{A}}\mathbf{v} = {([BA]^T)}^{-1} {}^{\mathcal{A}}\mathbf{v}
\end{equation}
So our general answer for transformation of vector coefficients expressed in terms of the basis vector transformation matrix is:
\begin{equation}\label{eq:BAc2}
[BAc] = {([BA]^T)}^{-1}
\end{equation}
The vector coefficients transform with the inverse transpose of the transformation matrix for the basis vectors themselves. (This is oftentimes expressed by saying that basis vectors transform covariantly under coordinate transformations, while the coefficients of vectors transform contravariantly.)
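
Here is a brief numerical check of this relation (the matrix \([BA]\) below is an arbitrary, non-rotation example, and for concreteness we take the \(\mathcal{A}\) basis to be the standard basis of \(\RR^3\)):

import numpy as np

BA = np.array([[1.0, 0.0, 0.0],
               [1.0, 2.0, 0.0],
               [0.0, 1.0, 3.0]])             # rows: ^A b_1, ^A b_2, ^A b_3 (example values)

BAc = np.linalg.inv(BA.T)                    # [BAc] = ([BA]^T)^{-1}

v_A = np.array([2.0, -1.0, 4.0])             # coefficients ^A v (A = standard basis here)
v_B = BAc @ v_A                              # coefficients ^B v

# reconstruct the abstract vector from its B coefficients: sum_i ^B v_i * b_i
v_reconstructed = BA.T @ v_B
print(np.allclose(v_reconstructed, v_A))     # True: both coefficient sets describe the same vector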

Special Case: Rotations

It is a standard result from linear algebra that rotation matrices are orthonormal. These matrices have the computationally nice property that the inverse is equal to the transpose, i.e. \(M^{-1}=M^T\). Applying this property to Eq.~\eqref{eq:BAc2} results in
\begin{equation}
[BAc]=[BA].
\end{equation}
We shall stress that this is only true for rotations and not for general basis transformations. However, since in this document we deal with rotations only (we never stretch any of the directions or change the angles between the basis vectors; they always stay orthogonal to each other), we can safely ignore the distinction between the basis vector and the vector coefficient transformation matrices, and you will see us write \([BA]\) instead of \([BAc]\) even for the transformation of vector coefficients.
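
A brief numerical check (with an example rotation by an angle \(\psi\) about the third axis, a choice of ours purely for illustration) confirms this convenient property of the DCM:

import numpy as np

psi = np.deg2rad(25.0)
c, s = np.cos(psi), np.sin(psi)
BA = np.array([[ c,   s,  0.0],
               [-s,   c,  0.0],
               [0.0, 0.0, 1.0]])             # a DCM; rows are ^A b_i for a rotation by psi

print(np.allclose(np.linalg.inv(BA.T), BA))  # True: ([BA]^T)^{-1} = [BA], i.e. [BAc] = [BA]
print(np.allclose(BA @ BA.T, np.eye(3)))     # True: orthonormality of the rotation matrix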