\(
\newcommand{\BE}{\begin{equation}}
\newcommand{\EE}{\end{equation}}
\newcommand{\BA}{\begin{eqnarray}}
\newcommand{\EA}{\end{eqnarray}}
\newcommand\CC{\mathbb{C}}
\newcommand\FF{\mathbb{F}}
\newcommand\NN{\mathbb{N}}
\newcommand\QQ{\mathbb{Q}}
\newcommand\RR{\mathbb{R}}
\newcommand\ZZ{\mathbb{Z}}
\newcommand{\va}{\hat{\mathbf{a}}}
\newcommand{\vb}{\hat{\mathbf{b}}}
\newcommand{\vn}{\hat{\mathbf{n}}}
\newcommand{\vt}{\hat{\mathbf{t}}}
\newcommand{\bx}{\mathbf{x}}
\newcommand{\bv}{\mathbf{v}}
\newcommand{\bg}{\mathbf{g}}
\newcommand{\bn}{\mathbf{n}}
\newcommand{\by}{\mathbf{y}}
\)

Multivariable Calculus Primer

Introduction

In this brief multivariable calculus primer, we review some of the very basics of this topic. First we discuss multivariable differential calculus and introduce vector fields and the nabla operator, which gives rise to the gradient when applied to a scalar function, and to divergence and curl when applied to a vector-valued function, by means of the dot product and cross product, respectively. Then we proceed to multivariable integral calculus, introducing volume integrals, line integrals, and surface integrals. Finally, we discuss the gradient theorem, the divergence theorem, Stokes’ theorem, and its two-dimensional special case, Green’s theorem, which combine all of these aspects of multivariable calculus.

We use multivariable calculus in some of our more advanced courses, e.g. on test pilot/flight test theory, where for instance we use line integrals in the chapter on 2D aerodynamics to compute the air circulation around an airfoil, which is proportional to the lift generated by the airfoil according to the Kutta-Joukowski theorem.

Basic multivariable calculus is not difficult and the concepts are easy to grasp. We shall focus here on just that: presenting the basic ideas and concepts and the practical aspects, like learning how to calculate, without dwelling too much on mathematical minutiae (e.g. we dispense with stating the exact prerequisites for theorems and definitions). (This is unlike our linear algebra primer, where we felt that a thorough, precise foundation was necessary for a deep understanding of the concepts.) Many of the links for specific terms lead to articles on Wikipedia, which may provide additional information, more precise definitions, and calculated examples. (We can also calculate some examples together in class if you like.)

Notation

  • Numbers (often referred to as scalars) will be denoted by lowercase letters in normal font, either different ones like \(a, b, c, d,\) etc., or by lowercase letters with indices, e.g.: \(a_{11}, a_{12}, a_{21}, a_{22},\) etc. We shall deal only with real numbers in this document, but in general they could be complex (i.e. containing a real and an imaginary part).
  • Matrices will be denoted by uppercase letters, e.g.: \(A, B, C, D,\) etc., and/or will sometimes be enclosed in brackets […] for easier recognition.
    The elements (entries) of matrices are numbers and are denoted by lowercase letters like any other number discussed above. If indices are used to distinguish the matrix elements, the first index counts through the (horizontal) rows, while the second index counts through the (vertical) columns.
  • Vectors will be denoted by lowercase boldface letters, oftentimes \(\mathbf{v}\) and \(\mathbf{w}\), but sometimes also \(\mathbf{a}\), \(\mathbf{b}\), and others. An optional hat on top of a vector signifies that the vector has unit length (length = 1).
    The elements (entries) of vectors will again be denoted by lowercase letters in normal font, because they are numbers.

Differential Multivariable Calculus

Scalar and Vector-Valued Functions

Scalar Functions

A scalar function \(f\) assigns to every position \(\mathbf{x}\in\RR^n\) in multidimensional space a single real number \(f(\mathbf{x})\in\RR\). It is a function of the form

\begin{eqnarray}
f:\RR^n&\rightarrow&\RR\\
\mathbf{x}&\mapsto&f(\mathbf{x})
\end{eqnarray}

where \(\mathbf{x}=(x_1, \dots, x_n)\in\RR^n\). We use lowercase letters to denote scalar functions. A physical example of such a function would be one that assigns a value for temperature to every point in 3-dimensional space, but keep in mind that strictly speaking the above is just dealing with real numbers so far (without deeper interpretation, which may imply a vector space structure and choice of coordinates \(x_j, j=1,\dots,n\)). 

Vector-Valued Functions

Vector-valued functions are just like the above, but instead of a single number, they assign a point in \(\RR^m\) to every point \(\mathbf{x}\) in the domain \(U \subset \RR^n\):

\begin{eqnarray}
\mathbf{F}:U\subset \RR^n&\rightarrow&\RR^m\\
\mathbf{x}&\mapsto&\mathbf{F}(\mathbf{x})
\end{eqnarray}

We use uppercase boldface letters to denote them. The vector-valued function \(\mathbf{F}\) can be split into components, \(\mathbf{F}=(F_1, F_2, \dots, F_m)\), where each of the \(F_i\), with \(i=1,\dots,m\), is a scalar function on \(\RR^n\):

\begin{eqnarray}
F_i:U\subset \RR^n&\rightarrow&\RR\\
\mathbf{x}&\mapsto&F_i(\mathbf{x})
\end{eqnarray}

The above is just working with real numbers, but we can extend it easily to vector spaces and interpret \(\mathbf{F}\) as a vector field, i.e. a map that assigns a vector to every point in the domain. Let us pick a vector space, e.g. Euclidean space, and a coordinate map into \(\RR^n\). We can do this for both the domain and the codomain and thereby extend the above formalism to vector spaces.

A physical example of such a vector-valued function \(\mathbf{F}:U\subset \RR^3\rightarrow\RR^3\) would be one that assigns the 3-dimensional magnetic field vector of the Earth to every point in space (after a choice of suitable bases/coordinate maps in both physical spaces). Unlike the temperature in the previous example, the magnetic field not only has magnitude, but also direction.

Derivatives of Scalar and Vector-Valued Functions

Partial Derivative

A partial derivative with respect to \(x_k\) of a scalar function \(f\) of multiple variables \(\mathbf{x}=(x_1, x_2, \dots, x_n)\) defined on domain \(U\subset\RR^n\),

\begin{eqnarray}
f: U\subset \RR^n &\rightarrow& \RR\\
\mathbf{x}=(x_1, x_2, \dots, x_n)&\mapsto& f(\mathbf{x})=f(x_1, x_2, \dots, x_n)
\end{eqnarray}

is the derivative of \(f\) with respect to \(x_k\) alone, with all other variables \(x_j, j\not= k\), held fixed. It is denoted by
\begin{equation}
\frac{\partial f}{\partial x_k}
\end{equation}

(Note the symbol \(\partial\) as opposed to \(d\), which is used for total derivatives, see below.)

A partial derivative of a vector-valued function is applied to each scalar function component \(F_i\) individually:

\begin{equation}
\frac{\partial \mathbf{F}}{\partial x_k} =
\begin{pmatrix}
\frac{\partial F_1}{\partial x_k} \\
\frac{\partial F_2}{\partial x_k}\\
\vdots\\
\frac{\partial F_m}{\partial x_k}
\end{pmatrix}
\end{equation}

and yields a vector. However, we will typically not apply individual partial derivatives to vector-valued functions, but rather use the nabla operator, which is a vector of partial derivatives and operates on individual components directly, as we shall explain next.
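
If you want to check such computations, a computer algebra system comes in handy. Below is a minimal sketch using SymPy, a Python library (our choice here, purely for illustration; the example function is arbitrary):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + sp.sin(z)   # an arbitrary example scalar function f(x, y, z)

# Partial derivatives: differentiate with respect to one variable,
# holding all the others fixed.
print(sp.diff(f, x))   # 2*x*y
print(sp.diff(f, y))   # x**2
print(sp.diff(f, z))   # cos(z)
```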

Nabla Operator

We define the nabla (or del) operator as an \(n\)-dimensional column vector of partial derivative operators:

\begin{equation}
\nabla =
\begin{pmatrix}
\frac{\partial}{\partial x_1}\\
\frac{\partial}{\partial x_2}\\
\vdots\\
\frac{\partial}{\partial x_n}\\
\end{pmatrix}
\end{equation}

It is an operator, meaning that it gets applied to functions. We can apply it to scalar and vector-valued functions as follows (we shall continue in three dimensions).

Gradient of a Scalar Function

We can apply the nabla operator to a scalar function \(f\) to obtain a gradient vector:

\begin{equation}
\mathrm{grad}\ f = \nabla f =
\begin{pmatrix}
\frac{\partial f}{\partial x}\\
\frac{\partial f}{\partial y}\\
\frac{\partial f}{\partial z}
\end{pmatrix}
\end{equation}

\(\mathrm{grad}\ f:\RR^3\rightarrow\RR^3\) is a vector-valued function on \(\RR^3\) (it can be extended analogously to higher dimensions as well).

The gradient vector always points in the direction of the steepest slope (largest increase of \(f\)), and its magnitude is the steepness of this slope.
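
As a small illustration (again a SymPy sketch, with an example function of our own choosing), the gradient is just the column of all partial derivatives:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2   # example: squared distance from the origin

# Gradient: stack the partial derivatives into a column vector.
grad_f = sp.Matrix([sp.diff(f, v) for v in (x, y, z)])
print(grad_f.T)   # Matrix([[2*x, 2*y, 2*z]]) -- points radially outward,
                  # the direction of steepest increase of f
```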

Divergence and Curl of a Vector Field

If applied to a vector-valued function \(\mathbf{F}=(F_1, F_2, F_3)\), the nabla operator \(\nabla\) can be either applied with a dot product (scalar product), resulting in divergence, or with a cross product (vector product), resulting in curl (rotation).

Let us look at divergence first:

\begin{equation}
\mathrm{div}\ \mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.
\end{equation}

Note that the divergence, \(\mathrm{div}\ \mathbf{F}: \RR^3 \rightarrow \RR\), is a scalar function on \(\RR^3\), even though \(\mathbf{F}\) originally was vector-valued. It gives a measure of how much the vectors of the vector field \(\mathbf{F}\) spread apart at a particular point in space, and it will later lead to the divergence theorem (the volume integral of the divergence over an enclosed volume is equal to the flux of the vector field through its bounding surface). The Wikipedia article linked above contains an illustration.

The curl of \(\mathbf{F}\), on the other hand, will remain a vector-valued function:

\begin{equation}
\mathrm{curl}\ \mathbf{F} = \mathrm{rot}\ \mathbf{F} = \nabla \times \mathbf{F} =
\begin{pmatrix}
\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\\
\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\\
\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\\
\end{pmatrix}
\end{equation}

Curl measures the amount of rotation (vorticity) of a vector field, i.e. how much the vectors rotate around a certain point. The Wikipedia article linked above contains an illustration.
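
Both operations are mechanical to carry out. Here is a hedged SymPy sketch for an example field rotating about the \(z\)-axis (our choice, for illustration only):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = sp.Matrix([-y, x, 0])   # example field rotating about the z-axis

# Divergence: nabla dot F, a scalar function.
div_F = F[0].diff(x) + F[1].diff(y) + F[2].diff(z)

# Curl: nabla cross F, a vector-valued function.
curl_F = sp.Matrix([F[2].diff(y) - F[1].diff(z),
                    F[0].diff(z) - F[2].diff(x),
                    F[1].diff(x) - F[0].diff(y)])

print(div_F)      # 0 -- the field does not spread apart anywhere
print(curl_F.T)   # Matrix([[0, 0, 2]]) -- uniform rotation about the z-axis
```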

(Total) Differential and Jacobian Matrix

You may skip the rest of this section on differential calculus if you like and proceed to the one on integral calculus, as this part will not be needed.

Derivatives and Differentials

For all practical purposes of this article, you can treat derivatives and differentials of a function as being the same thing. However, for mathematical purists, there is a distinction. If \(y\) is a function of \(x\), then we call \(dy\) the differential of \(y\), defined as

\begin{equation}
dy = \frac{dy}{dx} dx,
\end{equation}

and \(\frac{dy}{dx}\) is called the derivative. \(dx\) is its own independent variable, oftentimes regarded as an infinitesimal (an infinitely small change in a quantity) or as a 1-form if viewed from the perspective of differential forms.

Essentially, the differential is the derivative with the infinitesimal or 1-form included. The distinction is important for precise mathematical definitions, but for the purpose of our calculations, we shall ignore these details and sometimes use the term “derivative” for a differential. Disregard of the infinitesimals/1-forms will also happen when we “identify” the Jacobian matrix \([J]\) of a function with its differential \(df\) (implicitly assuming that the Jacobian matrix is defined with respect to a basis of 1-forms made up of the \(dx_j\), based on the choice of coordinates \(x_j, j=1,\dots,n\)). We included this preamble so that you do not get confused by the differing terminology in different sources and references.

Total Differential and Jacobian Matrix of a Scalar Function

Unlike the partial derivative, the (total) differential (sometimes also called the total derivative) \(df(\mathbf{p})\) of a function \(f\) at point \(\mathbf{p}\in U\) takes the dependencies on all variables \(x_j, j=1,\dots,n\), into account simultaneously. The total differential \(df(\mathbf{p})\) (for fixed \(\mathbf{p}\)) is a linear map which gives the best linear approximation of the function \(f\) at its evaluation point \(\mathbf{p}\). For a given choice of coordinates \(x_k\), \(k=1,\dots,n\), in the domain \(U\) of \(f\), the total differential \(df\) can be expressed in these coordinates by the so-called Jacobian matrix

\begin{equation}
[J] =
\begin{pmatrix}
\frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \dots & \frac{\partial f}{\partial x_n}
\end{pmatrix}
\end{equation}

The Jacobian matrix is a collection of all partial derivatives with respect to all the coordinates, arranged in a specific order. All the partial derivatives are to be evaluated at point \(\mathbf{p}\in U\). The total differential \(df\) can then be written as:

\begin{equation}
df=[J]d\mathbf{x}=\sum_{j=1}^n \frac{\partial f}{\partial x_j}dx_j
\end{equation}

where \(d\mathbf{x} = (dx_1, dx_2, \dots, dx_n)^T\). In this sense, using the above purist terminology distinction, \(df\) is the (total) differential, while \([J]\) is the total derivative.
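
To make this concrete, here is a small SymPy sketch (the example function is ours) that builds the Jacobian row matrix and applies it as a linear map to a small displacement \(d\mathbf{x}\):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.Matrix([x1*x2 + sp.exp(x3)])   # scalar function, wrapped as 1x1 matrix

# Jacobian of a scalar function: a single row of partial derivatives.
J = f.jacobian([x1, x2, x3])
print(J)   # Matrix([[x2, x1, exp(x3)]])

# df = [J] dx: evaluate [J] at a point p, then apply it to a displacement.
p = {x1: 1, x2: 2, x3: 0}
dx = sp.Matrix([sp.Rational(1, 10), sp.Rational(-1, 5), sp.Rational(1, 20)])
print(J.subs(p) * dx)   # Matrix([[1/20]]) -- linearized change of f at p
```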

The total differential \(df\) of a function and its Jacobian matrix \([J]\) represent the same thing in a certain way, but are conceptually not quite identical: they are related in the same way as a linear map \(\varphi:V\rightarrow W\) between two vector spaces \(V\) and \(W\) was related to its associated matrix \(\rho\) with respect to a choice of bases in \(V\) and \(W\) in our linear algebra primer (we encourage you to review this there). There we had a linear map \(\varphi: V\rightarrow W\), which we were able to represent by matrix multiplication with a matrix \(\rho\), once a particular choice of ordered bases in the two vector spaces \(V\) and \(W\) was made (compared to the linear algebra primer, we assume in the notation here that these vector spaces are defined over the field \(\RR\)). We also proved a proposition on how the matrix \(\rho\) changed (for the same unchanged linear map \(\varphi\)) and became a different matrix, which we denoted by \(\rho'\), when a different choice of bases in \(V\) and \(W\) was made.

Same thing here: the total differential \(df(\mathbf{p})\) (for fixed \(\mathbf{p}\)) is the abstract linear map, and the Jacobian matrix \([J]\) (also evaluated at point \(\mathbf{p}\)) is the associated matrix given a choice of basis; they are related in the same way as the linear map \(\varphi\) and its matrix \(\rho\) in the linear algebra primer. The basis with respect to which \([J]\) is taken is the basis of coordinate infinitesimals \(dx_j\).

It is easy to see that the Jacobian matrix \([J]\) will change if we pick a different set of coordinates \(x_j\), \(j=1,\dots,n\), because then the partial derivatives with respect to these different coordinates will have different values, as the coordinates will run in different directions (but the function \(f\) and its total differential \(df\) will not have changed just because of a change of coordinates).

As a side note, we also notice that \(df(\mathbf{p})\) is a linear functional, i.e. a linear map \(\RR^n\rightarrow\RR\) from the vector space into its field of scalars, since its associated matrix has only one row. It operates on tangent vectors. In the language of differential forms it is a 1-form.

We can relate this understanding of total differentials of a function to the total derivatives you know from high school (a slightly different but related concept), where you generally thought of the total derivative \(\frac{df}{dt}\) of a function \(f(\mathbf{x}(t))\) with respect to a single parameter \(t\) as taking all parameter dependencies into account all the way through, using the chain rule of differential calculus. In the case of nested functions, the total derivative becomes a successive application of the linear maps which are the total differentials of the individual functions involved. For instance, consider a function \(f(\mathbf{x})\) and let all the \(x_j\) above depend on a variable \(t\). This defines a one-parameter curve, along which we can compute the total derivative of \(f\) with respect to the variable \(t\). It equates to:

\begin{equation}
\frac{df}{dt} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\frac{d x_j}{d t} 
\end{equation}

This expression can be written in terms of Jacobian matrices of the nested functions \(f\) and \(\mathbf{x}\) involved: \(\frac{df}{dt}=[J]\dot{\mathbf{x}}\). The above expression is nothing but the Jacobian matrix of \(f:\RR^n\rightarrow\RR, \mathbf{x}\mapsto f(\mathbf{x})\), multiplied with the Jacobian matrix of the vector-valued function \(\mathbf{x}:\RR\rightarrow\RR^n, t\mapsto \mathbf{x}(t)\), which happens to be the tangent vector of the curve

\begin{equation}
\dot{\mathbf{x}}=
\begin{pmatrix}
\frac{d x_1}{dt} \\
\frac{d x_2}{dt}\\
\vdots\\
\frac{d x_n}{dt}
\end{pmatrix}
\end{equation}

We see that in order to understand this properly, we must first define the total derivative and Jacobian matrix for vector-valued functions. We shall do this next. We also note from the above that differentials operate as linear maps on tangent vectors, not on the points in \(\RR^n\) (points on the curve) themselves. But the differential \(df\) is evaluated at a point on the curve (fixed \(\mathbf{p}\)) before it acts as a linear map on a tangent vector.
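
The chain rule above can also be verified symbolically. In the following sketch (curve and function are arbitrary examples of ours), we compute \(\frac{df}{dt}\) once via \([J]\dot{\mathbf{x}}\) and once directly, and the results agree:

```python
import sympy as sp

t = sp.symbols('t')
x, y, z = sp.symbols('x y z')
f = x**2 + y*z                          # example scalar function

# A one-parameter curve x(t) and its tangent vector xdot.
curve = sp.Matrix([sp.cos(t), sp.sin(t), t])
xdot = curve.diff(t)
on_curve = {x: curve[0], y: curve[1], z: curve[2]}

# Route 1: chain rule, df/dt = [J] * xdot, with [J] evaluated on the curve.
J = sp.Matrix([f]).jacobian([x, y, z]).subs(on_curve)
via_chain_rule = (J * xdot)[0]

# Route 2: substitute the curve into f first, then differentiate.
direct = f.subs(on_curve).diff(t)

print(sp.simplify(via_chain_rule - direct))   # 0 -- both routes agree
```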

Notational side note: What do the \(dx_i\) do in \(df\)? Let \(\mathbf{e}_j\) be a unit basis vector of tangent space associated with (in direction of) coordinate \(x_j\). Then \(dx_i(\mathbf{e}_j)=1\) if \(i=j\), and 0 otherwise. Working with matrices in coordinates (i.e. with \([J]\) instead of \(df\)), one does not write the \(dx_i\) and \(\mathbf{e}_j\) explicitly; they are implied by the position of the entry in the matrix/vector.

Total Differential and Jacobian Matrix of a Vector-Valued Function

For a vector-valued function

\begin{eqnarray}
\mathbf{F}:U\subset\RR^n&\rightarrow&\RR^m\\
\mathbf{x}&\mapsto&\mathbf{F}(\mathbf{x})=(F_1(x_1, x_2, \dots,x_n), F_2(x_1, x_2, \dots,x_n), \dots, F_m(x_1, x_2, \dots,x_n))
\end{eqnarray}

we can do the same for each of its scalar function components \(F_i(x_1, x_2, \dots,x_n)\). Building on our definition of the Jacobian matrix of a scalar function \(f\) from before, we shall arrange the Jacobian row matrices of the individual \(F_i\) as the rows of a new \(m \times n\) matrix.

We define the Jacobian matrix of a vector-valued function \(\mathbf{F}\) as the matrix of all its partial derivatives:

\begin{equation}
[J]=
\begin{pmatrix}
\frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \dots & \frac{\partial F_1}{\partial x_n} \\
\frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \dots & \frac{\partial F_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial F_m}{\partial x_1} & \frac{\partial F_m}{\partial x_2} & \dots & \frac{\partial F_m}{\partial x_n}
\end{pmatrix}
\end{equation}

This matrix (and all the partial derivatives in it) is to be evaluated at a particular point \(\mathbf{p}\in U\) in the domain of \(\mathbf{F}\). The Jacobian matrix then becomes a linear map, which best approximates the function \(\mathbf{F}\) at that point \(\mathbf{p}\).
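
A quick SymPy sketch of building such a Jacobian matrix (the function \(\mathbf{F}\) is an arbitrary example):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Example vector-valued function F: R^2 -> R^3 (so [J] is a 3x2 matrix).
F = sp.Matrix([x*y, x + y, sp.sin(x)])
J = F.jacobian([x, y])
print(J)   # Matrix([[y, x], [1, 1], [cos(x), 0]])

# Evaluated at a point p, [J] is the linear map best approximating F there.
print(J.subs({x: 0, y: 2}))   # Matrix([[2, 0], [1, 1], [1, 0]])
```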

For the total differential \(d\mathbf{F}\) of function \(\mathbf{F}\) we have analogously as before

\begin{equation}
d\mathbf{F}=[J]d\mathbf{x}.
\end{equation}

The Jacobian matrix therefore is again the differential \(d\mathbf{F}\) expressed with respect to coordinates \(x_j,j=1,\dots,n\).

We can now write the total derivative of a vector-valued function with respect to a parameter \(t\) as a multiplication of the Jacobian matrix with the tangent vector of the one-parameter curve \(\mathbf{x}(t)\),

\begin{equation}
\frac{d\mathbf{x}}{dt}=
\begin{pmatrix}
\frac{d x_1}{dt}\\
\frac{d x_2}{dt}\\
\vdots\\
\frac{d x_n}{dt}\\
\end{pmatrix}
\end{equation}

The tangent vector takes care of the further dependence of the coordinates \(\mathbf{x}(t)\) on \(t\) in this case. This results in

\begin{equation}
\frac{d \mathbf{F}}{dt} =
\begin{pmatrix}
\frac{d F_1}{dt}\\
\frac{d F_2}{dt}\\
\vdots\\
\frac{d F_m}{dt}
\end{pmatrix}
=
[J]\ \frac{d\mathbf{x}}{dt}
=
\begin{pmatrix}
\sum_{j=1}^{n}\frac{\partial F_1}{\partial x_j}\frac{dx_j}{dt}\\
\sum_{j=1}^{n}\frac{\partial F_2}{\partial x_j}\frac{dx_j}{dt}\\
\vdots\\
\sum_{j=1}^{n}\frac{\partial F_m}{\partial x_j}\frac{dx_j}{dt}
\end{pmatrix}
\end{equation}

Note that the tangent vector \(\dot{\mathbf{x}}\) is itself the Jacobian matrix of the vector-valued function \(\mathbf{x}:[a,b]\subset\RR\rightarrow \RR^n, t\mapsto \mathbf{x}(t)\). So the total derivative of a function \(\mathbf{F}\) along a one-parameter curve \(\mathbf{x}\) with respect to the curve parameter \(t\) is just the matrix multiplication of the Jacobian matrices of \(\mathbf{F}\) and \(\mathbf{x}\). Put in other words, the total derivative of \(\mathbf{F}\) along the curve is the successive application of the total derivatives of the functions \(\mathbf{F}\) and \(\mathbf{x}\) with respect to their individual coordinates. This is commonly known as the chain rule for differentials of functions. The total differentials are linear maps (linear approximations of the functions), so it is not surprising that they can be combined as successive matrix multiplications of their matrices, the Jacobians, given a choice of coordinates.

Integral Multivariable Calculus

Line Integrals, Surface Integrals, and Volume Integrals

In this section we will study integrals of functions and vector fields along one-parameter curves in three-dimensional space, integrals of functions and vector fields over two-dimensional surfaces in three-dimensional space, and three-dimensional integrals over volumes in three-dimensional space.

For pedagogical reasons, we shall tackle the volume integrals first and then proceed to discuss line and surface integrals.

Volume Integrals

Volume integrals integrate a scalar function \(f\) over a three-dimensional volume (a higher-dimensional generalization exists as well, but we shall just stick to three dimensions). We shall present the integral in Cartesian, cylindrical, and spherical coordinates, respectively:

\begin{eqnarray}
\int_V f dV & = & \iiint_V f(x,y,z)\, dxdydz\\
& = & \iiint_V f(\rho, \varphi, z)\, \rho d\rho d\varphi dz\\
& = & \iiint_V f(\rho,\theta,\varphi)\,\rho^2\sin\theta d\rho d\theta d\varphi
\end{eqnarray}

Fubini’s theorem says that the order of the integration variables does not matter (though when the boundaries for one variable depend on another variable, the integral over the first must be performed inside, i.e. before, the integral over the other). So you can choose to integrate first over \(x\), over \(y\), or over \(z\) above, and as you perform each integral, you treat all other variables as fixed.

Other coordinate systems than the three presented above can be chosen as well. What matters is expressing the volume form \(dV\) properly when choosing a different coordinate system (in general, incrementing each of the three coordinates by one unit does not sweep out a unit volume). This has been done above for the cylindrical and spherical coordinates, hence the extra factors after the function. We will encounter the same with the surface element when we study surface integrals shortly.
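
As a worked example (a SymPy sketch; the ball is our choice), the volume of a ball of radius \(R\) follows from setting \(f=1\) and using the spherical volume element:

```python
import sympy as sp

rho, theta, phi, R = sp.symbols('rho theta phi R', positive=True)

# Volume of a ball of radius R: f = 1 with the volume element
# rho^2 sin(theta) drho dtheta dphi; Fubini lets us iterate the integrals.
V = sp.integrate(rho**2 * sp.sin(theta),
                 (rho, 0, R), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))
print(V)   # 4*pi*R**3/3
```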

Line Integrals

Line integrals integrate a scalar function or a vector field along a one-parameter, piecewise smooth curve \(C\) in three-dimensional space. Just like with the volume integral above, we must be mindful to include how far one travels along the curve for each unit increment of the curve parameter. This is expressed by the line element \(ds\).

Let

\begin{eqnarray}
\mathbf{r}:[t_1,t_2]\subset\RR&\rightarrow& C\\
t&\mapsto&\mathbf{r}(t)
\end{eqnarray}

be an arbitrary bijective (injective and surjective, i.e. invertible) parametrization of the curve \(C\) in three-dimensional space from a closed interval \([t_1,t_2]\subset\RR\). Then the infinitesimal line element \(ds\) along the curve can be expressed by the length of the tangent vector \(\dot{\mathbf{r}}\) times the infinitesimal parameter increment \(dt\), i.e. \(ds = |\dot{\mathbf{r}}| dt\), and we can write for the line integral of a scalar function \(f:\RR^3\rightarrow\RR\) along the curve \(C\):

\begin{equation}
\int_C f(\mathbf{r})\, ds = \int_{t_1}^{t_2} f(\mathbf{r}(t))\, |\dot{\mathbf{r}}(t)| \, dt = \int_{t_1}^{t_2} f(x(t),y(t),z(t))\,\sqrt{\dot x^2(t)+\dot y^2(t)+\dot z^2(t)}\,dt 
\end{equation}

where the last equality only holds in Cartesian coordinates for \(\RR^3\), in which the parametrized curve is denoted by \(\mathbf{r}(t)=(x(t), y(t), z(t))\). The dot over the \(\mathbf{r}\) simply denotes the total derivative with respect to the parameter \(t\), \(\dot{\mathbf{r}}=\frac{d\mathbf{r}}{dt}\).

We can use a line integral to compute the length of a curve by setting the function \(f\) in the expression above to \(f(x,y,z)=1\) everywhere. So the length of a curve is simply the length of the tangent vector of the parametrization, integrated over the whole parametrizing interval from the starting point to the end point.
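
For instance, the circumference of a circle of radius 2 (a SymPy sketch; the curve is our example):

```python
import sympy as sp

t = sp.symbols('t')

# Circle of radius 2 in the xy-plane, traversed once for t in [0, 2*pi].
r = sp.Matrix([2*sp.cos(t), 2*sp.sin(t), 0])
rdot = r.diff(t)

# Arc length: f = 1, so we integrate |rdot| over the parameter interval.
ds = sp.sqrt(sp.simplify(rdot.dot(rdot)))   # = 2
print(sp.integrate(ds, (t, 0, 2*sp.pi)))    # 4*pi
```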

The extension of line integrals to vector fields, \(\mathbf{F}:\RR^3\rightarrow\RR^3\), is relatively straightforward. Instead of multiplying the function \(f\) with the length of the tangent vector with respect to a parametrization, \(|\dot{\mathbf{r}}(t)|\), one uses the dot product (scalar product) between the vector \(\mathbf{F}(\mathbf{r}(t))\) at point \(\mathbf{r}(t)\) in 3D space and the tangent vector \(\dot{\mathbf{r}}(t)\). In doing so, one essentially only takes into account the component of the vector field \(\mathbf{F}\) which is parallel to the curve at any given point:

\begin{equation}
\int_C \mathbf{F}(\mathbf{r})\cdot d\mathbf{r} = \int_{t_1}^{t_2} \mathbf{F}(\mathbf{r}(t))\cdot \dot{\mathbf{r}}(t)\,dt
\end{equation}

Taking only the component of the vector field parallel to the curve is useful in particular also in physics. For instance, if the vector field represents a force pushing on a train riding on tracks of the shape of the curve (imagine wind blowing and the train having a sail rather than an engine) then only the component parallel to the curve matters. (This will be exactly the opposite, when we discuss surface integrals next; there we will be interested in computing the flux of the vector field through the surface, i.e. parallel to its normal vector and perpendicular to the tangent vectors of the surface). It is also possible, though less common, to define a line integral across a curve (also called flux integral), i.e. taking into account the component of the vector field perpendicular to the tangent vector of the curve, but we shall not do this here (the Wikipedia article on line integrals has more details).
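
This kind of line integral is exactly the circulation integral mentioned in the introduction. A SymPy sketch for the rotating example field from the curl section, integrated around the unit circle (both our choices):

```python
import sympy as sp

t = sp.symbols('t')
x, y, z = sp.symbols('x y z')

F = sp.Matrix([-y, x, 0])                   # the rotating example field
r = sp.Matrix([sp.cos(t), sp.sin(t), 0])    # unit circle, traversed once
rdot = r.diff(t)

# Dot the field (evaluated on the curve) with the tangent vector.
F_on_curve = F.subs({x: r[0], y: r[1], z: r[2]})
integrand = sp.simplify(F_on_curve.dot(rdot))    # = 1 everywhere on the curve
print(sp.integrate(integrand, (t, 0, 2*sp.pi)))  # 2*pi
```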

Whether for a scalar function \(f\) or for a vector-valued function (vector field) \(\mathbf{F}\), the evaluated definite line integral always yields a scalar (a single number) as its result, never a vector. The slight difference in its definition for the two types of functions takes care of that.

Line integrals have many applications in physics. However, as a side note, let us mention that they are not to be confused with path integrals in quantum mechanics (those are something else, they are integrals over all possible paths).

Surface Integrals

Surface integrals follow a scheme similar to that of line integrals, except that we now have a two-parameter parametrization of the surface

\begin{eqnarray}
\mathbf{r}: [u_1,u_2]\times[v_1,v_2] &\rightarrow& S\\
(u,v)&\mapsto& \mathbf{r}(u,v)
\end{eqnarray}

where \([u_1, u_2]\) and \([v_1,v_2]\) are closed intervals of \(\RR\).

We will not care much about the tangent vectors here, but instead about the normal vector, which is always perpendicular to the surface, and we want to give it a length equal to the area of a surface element in that parametrization (i.e. the area of the piece of surface to which a unit square in the parameters \((u,v)\) is mapped). It turns out that this normal vector can be constructed from the tangent vectors by use of the cross product (vector product):

\begin{equation}
\mathbf{n}(u, v) = \mathbf{r}_{,u} \times \mathbf{r}_{,v}
\end{equation}

where \(\mathbf{r}_{,u}=\frac{\partial \mathbf{r}}{\partial u}\) and \(\mathbf{r}_{,v}=\frac{\partial \mathbf{r}}{\partial v}\) are the tangent vectors of the surface computed from the partial derivatives of the parametrization of the surface with respect to the two parameters \(u\) and \(v\).

We can see that:

  • this vector \(\mathbf{n}\) is normal to the surface, because \(\mathbf{r}_{,u}\) and \(\mathbf{r}_{,v}\) are tangent vectors of the surface along the two parameter lines, and the vector resulting from a cross product is always perpendicular to the two vectors that form the cross product, and
  • the length of the vector resulting from a cross product is equal to the area of the parallelogram that is spanned by the two vectors forming the cross product, so the length of \(\mathbf{n}\) corresponds to the area of a surface element, as desired.
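
For a concrete case (a SymPy sketch; the unit sphere is our example), let us compute \(\mathbf{n}(u,v)\) for the standard sphere parametrization:

```python
import sympy as sp

u, v = sp.symbols('u v')   # u: polar angle in [0, pi], v: azimuth in [0, 2*pi]

# Unit sphere parametrized by two angles.
r = sp.Matrix([sp.sin(u)*sp.cos(v), sp.sin(u)*sp.sin(v), sp.cos(u)])

# Normal vector: cross product of the two tangent vectors.
n = sp.simplify(r.diff(u).cross(r.diff(v)))
print(n.T)   # sin(u) times the (outward) radial unit vector

print(sp.simplify(n.dot(n)))   # sin(u)**2, so |n| = sin(u) on [0, pi]
```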

With these things in mind, we can now write down surface integrals for scalar functions \(f\) and vector fields \(\mathbf{F}\), analogously to how we have done for line integrals previously. We shall use the notation \(d\mathbf{S} = \mathbf{n}(u,v) dudv\) and \(dS = |\mathbf{n}(u,v)| dudv\).

For a scalar function, \(f: \RR^3\rightarrow\RR\), we simply multiply the function by the surface element in the integral, i.e. by the length of the normal vector:

\begin{equation}
\int_S f(\mathbf{r})\, dS = \int_{v_1}^{v_2}\int_{u_1}^{u_2} f(\mathbf{r}(u,v))|\mathbf{n}(u,v)|\,dudv = \int_{v_1}^{v_2}\int_{u_1}^{u_2} f(\mathbf{r}(u,v))|\mathbf{r}_{,u}\times\mathbf{r}_{,v}|\,dudv
\end{equation}

In particular, the area of the surface can be obtained by simply setting the function \(f(\mathbf{r})=1\) everywhere. And for a surface given by the graph of a function \(\xi(u,v)\), we can further use \(\mathbf{r}=(u, v, \xi(u,v))\) to obtain, after a little bit of math (which is left as an exercise for the reader),

\begin{equation}
A = \int_{v_1}^{v_2} \int _{u_1}^{u_2}\sqrt{(\xi_{,u})^2+(\xi_{,v})^2+1}\,dudv.
\end{equation}
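
For instance, the area of the unit sphere (continuing the SymPy sketch from above, where \(|\mathbf{n}| = \sin u\)):

```python
import sympy as sp

u, v = sp.symbols('u v')
r = sp.Matrix([sp.sin(u)*sp.cos(v), sp.sin(u)*sp.sin(v), sp.cos(u)])
n = r.diff(u).cross(r.diff(v))

print(sp.simplify(n.dot(n)))   # sin(u)**2, hence |n| = sin(u) on [0, pi]

# Area: f = 1, integrate |n| over the parameter domain.
area = sp.integrate(sp.sin(u), (u, 0, sp.pi), (v, 0, 2*sp.pi))
print(area)   # 4*pi
```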

For a vector field, \(\mathbf{F}:\RR^3\rightarrow\RR^3\), we take the dot product (scalar product) with the normal vector \(\mathbf{n}\) directly (similarly like we have done with the tangent vector \(\dot{\mathbf{r}}\) in the case of one-parameter curves in the previous section):

\begin{equation}
\int_S \mathbf{F}(\mathbf{r}) \cdot d\mathbf{S} = \int_{v_1}^{v_2}\int_{u_1}^{u_2} \mathbf{F}(\mathbf{r}(u,v))\cdot\mathbf{n}(u,v)\,dudv = \int_{v_1}^{v_2}\int_{u_1}^{u_2} \mathbf{F}(\mathbf{r}(u,v))\cdot (\mathbf{r}_{,u}\times\mathbf{r}_{,v})\,dudv
\end{equation}

This has the geometrical interpretation that the scalar product only takes into account the component of the vector field \(\mathbf{F}\) parallel to the normal vector \(\mathbf{n}(u,v)\), i.e. the component of \(\mathbf{F}\) which is perpendicular to the surface. The surface integral thus computes the flux of the vector field through the surface. The components of the vector field tangential to the surface at any given point do not contribute to the surface integral. (This is different from the line integral discussed earlier, where we specifically took into account only the component of \(\mathbf{F}\) parallel to the curve (i.e. parallel to its tangent vector) and ignored the perpendicular component.)

Just like with line integrals, the definite surface integral yields a single number as a result, not a vector, regardless of whether we have computed it for a scalar function \(f\) or a vector-valued function \(\mathbf{F}\) (the difference in how the surface integral is defined for the two kinds of functions makes sure of that).
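
To close this subsection, here is a flux computation (again a SymPy sketch, with the unit sphere and the radial example field \(\mathbf{F}(x,y,z)=(x,y,z)\), both our choices):

```python
import sympy as sp

u, v = sp.symbols('u v')
r = sp.Matrix([sp.sin(u)*sp.cos(v), sp.sin(u)*sp.sin(v), sp.cos(u)])
n = r.diff(u).cross(r.diff(v))   # outward normal for u in [0, pi]

F = r   # the radial field F(x, y, z) = (x, y, z), evaluated on the sphere

# Flux of F through the unit sphere.
flux = sp.integrate(sp.simplify(F.dot(n)), (u, 0, sp.pi), (v, 0, 2*sp.pi))
print(flux)   # 4*pi
```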

Important Fundamental Theorems of Multivariable Integral Calculus

In this section, we will review some of the fundamental theorems of integral multivariable calculus:

  • Gradient Theorem: Relates the difference of the value of a function between two points to the line integral of the gradient of this function along a path between these two points.
  • Divergence Theorem: Relates the flux of the vector field through the bounding surface of a closed volume to the volume integral of the divergence of this vector field over the same volume.
  • Stokes’ Theorem: Relates the line integral of a vector field over the boundary of a surface to the surface integral of the flux of the curl of this vector field through the surface.
  • Green’s Theorem: a special case of Stokes’ theorem in 2D (with the surface being a flat region of the plane).

The purpose of this section is to convey the idea of the theorems. We have explained how to calculate the individual sides of the theorems above, where we discussed volume, line, and surface integrals. We shall not state the exact mathematical prerequisites for the theorems and assume everything is “well behaved” as necessary, i.e. smooth when functions need to be differentiable, etc. (for more accurate prerequisites under which the theorems are valid, see the linked Wikipedia articles and any decent mathematical textbook on multivariable calculus).

Gradient Theorem

For a continuously differentiable scalar function \(f:\RR^3\rightarrow\RR\), the gradient theorem states that the line integral of its gradient along an arbitrary curve with starting point \(\mathbf{p}_1\in\RR^3\) and end point \(\mathbf{p}_2\in\RR^3\) is equal to the difference of the values of the function \(f\) at the end and starting points:

\begin{equation}
\int_C \mathrm{grad}\ f(\mathbf{r})\cdot d\mathbf{r} = f(\mathbf{p}_2)-f(\mathbf{p}_1)
\end{equation}

As usual, \(\mathrm{grad} f = \nabla f\) denotes the gradient of the function \(f\) and is a vector field (see earlier section on differential multivariable calculus). In other words, the gradient theorem states that line integrals are path independent for vector fields \(\mathbf{F}=\nabla f\) which can be written as the gradient of a scalar function \(f\).

Not every vector field \(\mathbf{F}\) can be written this way. Gradient fields always have zero curl, as can be quickly verified by seeing that \(\nabla \times (\nabla f) = \mathbf{0}\) for any (sufficiently smooth) scalar function \(f\), since the partial derivatives commute (their order can be changed without changing the result).

In physics, such gradient fields are called conservative fields, and the function \(f\) is called a potential. An example for such a potential function would be the gravitational potential of the earth, from which the gravitational force field can be computed. It does not matter which path we take; if at the end we arrive at the starting location, we will have exactly the same potential energy as we had initially, i.e. we cannot gain or lose potential energy by taking different closed paths. This is not the case for fields which have non-zero curl, e.g. the magnetic field around a straight wire, through which an electric current is flowing.
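
Here is a sketch verifying the gradient theorem in SymPy (potential and curve are our arbitrary examples): we integrate \(\nabla f\) along a spiral from \(\mathbf{p}_1=(1,0,0)\) to \(\mathbf{p}_2=(0,1,1)\) and compare with \(f(\mathbf{p}_2)-f(\mathbf{p}_1)\):

```python
import sympy as sp

t = sp.symbols('t')
x, y, z = sp.symbols('x y z')

f = x*y + z**2   # example potential function
grad_f = sp.Matrix([f.diff(x), f.diff(y), f.diff(z)])

# A spiral from p1 = (1, 0, 0) at t=0 to p2 = (0, 1, 1) at t=1.
r = sp.Matrix([sp.cos(sp.pi*t/2), sp.sin(sp.pi*t/2), t])
on_curve = {x: r[0], y: r[1], z: r[2]}

lhs = sp.integrate(sp.simplify(grad_f.subs(on_curve).dot(r.diff(t))), (t, 0, 1))
rhs = f.subs({x: 0, y: 1, z: 1}) - f.subs({x: 1, y: 0, z: 0})
print(sp.simplify(lhs), rhs)   # 1 1 -- path independence in action
```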

Divergence Theorem

Let \(V\) be a closed volume with bounding surface \(S=\partial V\). The divergence theorem states that the volume integral of the divergence of a vector field \(\mathbf{F}\) equals the surface integral of the flux of the vector field through the bounding surface \(S\) of the volume \(V\):

\begin{equation}
\iiint_V \mathrm{div}\ \mathbf{F}\, dV = \oint_{S}  \hspace{-11pt}\int \mathbf{F} \cdot \hat{\mathbf{n}}\, dS
\end{equation}

where \(\hat{\mathbf{n}}\) is the unit-length normal vector and \(dS\) the infinitesimal surface element. \(\hat{\mathbf{n}}\,dS\) corresponds to the \(\mathbf{n}(u,v)\,dudv\) we had earlier, when covering surface integrals. The divergence \(\mathrm{div}\ \mathbf{F} = \nabla \cdot \mathbf{F}\) is defined as usual (see earlier section on differential multivariable calculus).
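
A sketch checking the divergence theorem for the radial field \(\mathbf{F}(x,y,z)=(x,y,z)\) on the unit ball (our example; the right-hand side, the flux \(4\pi\) through the unit sphere, was computed in the surface integral section above):

```python
import sympy as sp

rho, theta, phi = sp.symbols('rho theta phi')
x, y, z = sp.symbols('x y z')

F = sp.Matrix([x, y, z])
div_F = F[0].diff(x) + F[1].diff(y) + F[2].diff(z)   # = 3

# Volume integral of div F over the unit ball, in spherical coordinates.
lhs = sp.integrate(div_F * rho**2 * sp.sin(theta),
                   (rho, 0, 1), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))
print(lhs)   # 4*pi -- equal to the flux through the bounding sphere
```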

Stokes' Theorem

Stokes’ theorem states that the surface integral of the curl of a smooth vector field, \(\mathrm{curl}\ \mathbf{F} = \nabla \times \mathbf{F}\), over a smooth, oriented surface is equal to the line integral of the vector field over the boundary of this surface. Let \(S\) be a surface and \(C=\partial S\) its boundary (a closed loop) parametrized by function \(\mathbf{r}\). Then

\begin{equation}
\iint_S (\nabla \times \mathbf{F}) \cdot \hat{\mathbf{n}} \,dS = \oint_C \mathbf{F} \cdot d\mathbf{r}
\end{equation}

In the same notation we used for surface and line integrals before, using \(\mathbf{F}:\RR^3\rightarrow\RR^3, (x, y, z)\mapsto\mathbf{F}(x,y,z)=(F_1(x, y, z), F_2(x,y,z), F_3(x,y,z))\) for the vector field in 3-dimensional space, \(\boldsymbol{\sigma}: [u_1,u_2]\times[v_1,v_2]\subset\RR^2\rightarrow\RR^3, (u,v)\mapsto \boldsymbol{\sigma}(u,v)\) for the parametrization of the surface \(S\) and \(\mathbf{r}:[t_1,t_2]\subset\RR\rightarrow\RR^3, t\mapsto \mathbf{r}(t)\) for the one-parameter curve \(C=\partial S\), we can write Stokes’ theorem also as

\begin{equation}
\int_{v_1}^{v_2}\int_{u_1}^{u_2} \mathbf{F}(\boldsymbol{\sigma}(u,v)) \cdot (\boldsymbol{\sigma}_{,u} \times \boldsymbol{\sigma}_{,v}) \, dudv = \int_{t_1}^{t_2} \mathbf{F}(\mathbf{r}(t)) \cdot \dot{\mathbf{r}}\,dt.
\end{equation}

This tells you explicitly how to calculate both sides, if you know the vector-valued functions \(\mathbf{F}\), \(\boldsymbol{\sigma}\) and \(\mathbf{r}\) (if you are not given these functions explicitly, you may have to figure them out from the geometrical picture of a problem that you are given).
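
A sketch of such a calculation in SymPy (field, surface, and boundary curve are our example choices): the rotating field over the unit disk, whose boundary is the unit circle.

```python
import sympy as sp

u, v, t = sp.symbols('u v t')
x, y, z = sp.symbols('x y z')

F = sp.Matrix([-y, x, 0])
curl_F = sp.Matrix([F[2].diff(y) - F[1].diff(z),
                    F[0].diff(z) - F[2].diff(x),
                    F[1].diff(x) - F[0].diff(y)])   # = (0, 0, 2)

# Surface: unit disk in the xy-plane; boundary: unit circle.
sigma = sp.Matrix([u*sp.cos(v), u*sp.sin(v), 0])    # u in [0,1], v in [0,2*pi]
n = sigma.diff(u).cross(sigma.diff(v))              # = (0, 0, u)
lhs = sp.integrate(curl_F.dot(n), (u, 0, 1), (v, 0, 2*sp.pi))

r = sp.Matrix([sp.cos(t), sp.sin(t), 0])
F_on_C = F.subs({x: r[0], y: r[1], z: r[2]})
rhs = sp.integrate(sp.simplify(F_on_C.dot(r.diff(t))), (t, 0, 2*sp.pi))

print(lhs, rhs)   # 2*pi 2*pi -- both sides of Stokes' theorem agree
```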

This result is quite astonishing. Imagine we have a one-parameter closed curve \(C\) in 3-dimensional space. Stokes’ theorem implies that it does not matter which surface \(S\) we choose (how much it “bulges”), as long as its boundary is \(C\): the surface integral of the curl of a vector field \(\mathbf{F}\) will always come out the same. Unlike the gradient theorem, which worked only for specific vector fields (those that can be written as the gradient of a scalar function), Stokes’ theorem holds for any vector field \(\mathbf{F}\). (In particular, curl-free vector fields will yield zero, consistent with the gradient theorem applied to a closed curve with the same starting and end points.)

There is also a generalized Stokes’ theorem, which we do not want to get into here, where this idea is extended to integration of differential forms on manifolds in the framework of differential geometry.

Green's Theorem

Green’s theorem is a special case of Stokes’ theorem restricted to the 2-dimensional plane. As such, the notions of curl and vector product do not really exist (because there is no third dimension), and one resorts to formulating the theorem directly in components.

Let \(C\) be a piecewise smooth (i.e. piecewise differentiable), oriented, closed curve and \(S\) the region bounded by this curve. Then for any differentiable functions \(f:\RR^2\rightarrow\RR, (x,y)\mapsto f(x,y)\), and \(g:\RR^2\rightarrow\RR, (x,y)\mapsto g(x,y)\), the following equality holds:

\begin{equation}
\iint_S \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)\, dxdy = \oint_C \left(f \, dx + g \, dy\right)
\end{equation}

The line integration along curve \(C\) needs to be conducted anticlockwise.

On the left-hand side, performing the integration is merely a matter of determining the integration boundaries correctly to cover the region \(S\). On the right-hand side, the way to perform the integral is to use a parametrization of the curve, \(\mathbf{r}(t)=(x(t),y(t))\), and write the integral as

\begin{equation}
\oint_C \left(f\, dx + g\, dy\right) = \int_{t_1}^{t_2} \left[f(x(t), y(t)) \left(\frac{dx}{dt}(t)\right) + g(x(t), y(t)) \left(\frac{dy}{dt}(t)\right)\right]\, dt
\end{equation}
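
As a final sketch (the choices of \(f\), \(g\), and curve are ours), take \(f=-y\) and \(g=x\), so that \(\frac{\partial g}{\partial x}-\frac{\partial f}{\partial y}=2\) and Green’s theorem says the closed line integral equals twice the enclosed area; for the unit circle this gives \(2\pi\):

```python
import sympy as sp

t, x, y = sp.symbols('t x y')
f, g = -y, x   # so dg/dx - df/dy = 2, and the left-hand side is 2*Area(S)

# Right-hand side for the unit circle, traversed anticlockwise.
xt, yt = sp.cos(t), sp.sin(t)
integrand = (f.subs({x: xt, y: yt}) * xt.diff(t)
             + g.subs({x: xt, y: yt}) * yt.diff(t))
print(sp.integrate(sp.simplify(integrand), (t, 0, 2*sp.pi)))
# 2*pi -- twice the area pi of the unit disk, as the theorem predicts
```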