MIT OCW

18.02 Multivariable Calculus

Lagrange Multiplier

We already know how to maximize or minimize a multivariable function, but what happens if there is a constraint? The critical points usually do not satisfy the constraint, so we have to maximize or minimize in another way.

Introduction

e.g. Find the point on $xy=3$ closest to the origin.
Idea We are going to minimize $f(x,y)=x^2+y^2$ subject to $xy=3$. The figure shows the level curves of $f(x,y)$ together with the constraint curve $g(x,y)=xy=3$.
multiplier-1.png
Shrink the level curve $f(x,y)=c$ by decreasing $c$: just before it loses contact with $xy=3$, it touches the constraint curve at the closest point. So at a constrained maximum or minimum $f_0$, the level curve $f(x,y)=f_0$ is tangent to the constraint curve $g(x,y)=3$.
multiplier-2.png
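Tangency means that the gradient of $f$ is parallel to the gradient of $g$, namely, $\nabla f=\lambda\nabla g$, where $\lambda$ is an unknown. So we have the system of equations $$\left\{\begin{aligned}2x=\lambda y\\2y=\lambda x\\xy=3\end{aligned}\right..$$ Multiplying the first equation by $x$ and the second by $y$ gives $2x^2=\lambda xy=2y^2$, so $x=\pm y$; since $xy=3>0$, we need $x=y$, hence $x^2=3$ and $\lambda=2$. The point is $(\sqrt3,\sqrt3)$ or $(-\sqrt3,-\sqrt3)$.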
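The example above can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `lagrange_residual` and the sampling grid are assumptions made for illustration. It verifies that both candidate points satisfy the system with $\lambda=2$, and that no sampled point on the branch $x>0$ of $xy=3$ gives a smaller value of $f$.

```python
import math

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    return (2 * x, 2 * y)

def grad_g(x, y):
    # g(x, y) = x * y
    return (y, x)

def lagrange_residual(x, y, lam):
    """Largest absolute residual of 2x = lam*y, 2y = lam*x, xy = 3."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return max(abs(fx - lam * gx), abs(fy - lam * gy), abs(x * y - 3))

s = math.sqrt(3)
candidates = [(s, s), (-s, -s)]
residuals = [lagrange_residual(x, y, 2.0) for x, y in candidates]

# Sample points (x, 3/x) on the branch x > 0 of the constraint curve.
samples = [f(0.1 * k, 3 / (0.1 * k)) for k in range(1, 200)]
min_sampled = min(samples)

print(residuals, f(s, s), min_sampled)
```

The sampled minimum stays slightly above $f(\sqrt3,\sqrt3)$ because the grid does not hit $x=\sqrt3$ exactly.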

Lagrange Multiplier

To summarize the method: given a function $f(x,y)$ and a constraint $g(x,y)=c$, to maximize or minimize $f$ subject to the constraint we solve, for $x$, $y$ and $\lambda$, the system of equations $\left\{\begin{aligned}\frac{\partial f}{\partial x}&=\lambda\frac{\partial g}{\partial x}\\\frac{\partial f}{\partial y}&=\lambda\frac{\partial g}{\partial y}\\g(x,y)&=c\end{aligned}\right..$

Geometry

Why is it correct? If there is no constraint $g(x,y)=c$, we just solve $f_x=f_y=0$, which means that to first order the function doesn't change when we move away from the point in any direction.
Now we have a constraint. Similarly, we look for a point where the function doesn't change to first order as we move along the constraint surface. So $\nabla_{\hat\mathbf{u}}f=0$, where $\hat\mathbf{u}$ is any direction tangent to the constraint surface; in other words, $\nabla f\cdot\hat\mathbf{u}=0$, so $\nabla f$ is normal to the constraint surface. Since the gradient $\nabla g$ is also normal to the constraint surface, we have $\nabla f\parallel\nabla g$.

Application

e.g. Find the pyramid of least surface area with a given triangular base with sides $a_1,a_2,a_3$ and a given height $h$.
Solution We can draw the base in the $xy$ plane.
pyramid-base.png
And it looks like the following in 3D space.
pyramid.png
To determine the position of the vertex $D$ more easily, we project the vertex on the $xy$ plane.
pyramid-vertex.png
Take the distances from the projection to the three sides $a_1,a_2,a_3$ as $d_1,d_2,d_3$. Then we can express the lateral surface area $S$ and the (fixed) base area $A$: $$\begin{aligned}S&=\frac12a_1\sqrt{d_1^2+h^2}+\frac12a_2\sqrt{d_2^2+h^2}+\frac12a_3\sqrt{d_3^2+h^2}\\A&=\frac12a_1d_1+\frac12a_2d_2+\frac12a_3d_3\end{aligned}.$$
Now apply the Lagrange multiplier method with $A$ held constant: $$\left\{\begin{aligned}\frac{\partial S}{\partial d_1}=\lambda\frac{\partial A}{\partial d_1}\\\frac{\partial S}{\partial d_2}=\lambda\frac{\partial A}{\partial d_2}\\\frac{\partial S}{\partial d_3}=\lambda\frac{\partial A}{\partial d_3}\end{aligned}\right..$$ Since $\frac{\partial S}{\partial d_i}=\frac12a_i\frac{d_i}{\sqrt{d_i^2+h^2}}$ and $\frac{\partial A}{\partial d_i}=\frac12a_i$, each equation reads $\frac{d_i}{\sqrt{d_i^2+h^2}}=\lambda$. The left side is an increasing function of $d_i$, so $d_1=d_2=d_3$, and the vertex is directly above the incenter of the triangular base.

Gradient

The chain rule for a multivariable function $f(x,y,z)$, where $x=x(t),y=y(t),z=z(t)$ is $$\frac{\mathrm{d}f}{\mathrm{d}t}=\frac{\partial f}{\partial x}\frac{\mathrm{d}x}{\mathrm{d}t}+\frac{\partial f}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}t}+\frac{\partial f}{\partial z}\frac{\mathrm{d}z}{\mathrm{d}t}.$$
But now, when the gradient $\nabla f=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right)$ is introduced, the formula has another form $$\frac{\mathrm{d}f}{\mathrm{d}t}=\nabla f\cdot\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t},$$ where $\mathbf{r}(t)=(x(t),y(t),z(t))$.
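As a quick sanity check of the gradient form of the chain rule, here is a small sketch. The function $f(x,y,z)=xy+z^2$ and the helix $\mathbf{r}(t)=(\cos t,\sin t,t)$ are hypothetical choices made for illustration, not taken from the notes. It compares $\nabla f\cdot\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t}$ with a central difference of $f(\mathbf{r}(t))$.

```python
import math

def f(x, y, z):
    return x * y + z**2          # hypothetical sample function

def grad_f(x, y, z):
    return (y, x, 2 * z)

def r(t):
    return (math.cos(t), math.sin(t), t)   # hypothetical sample curve

def r_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)

def df_dt_chain(t):
    """Chain rule: grad f evaluated on the curve, dotted with dr/dt."""
    return sum(g * v for g, v in zip(grad_f(*r(t)), r_prime(t)))

def df_dt_numeric(t, h=1e-6):
    """Central-difference derivative of the composite t -> f(r(t))."""
    return (f(*r(t + h)) - f(*r(t - h))) / (2 * h)

print(df_dt_chain(0.8), df_dt_numeric(0.8))
```

For this choice the composite is $\cos t\sin t+t^2$, whose derivative is $\cos 2t+2t$, so both printed values should agree with that expression at $t=0.8$.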

Relationship with Level Surfaces

The gradient $\nabla f$ is normal to the level surface $f(x,y,z)=c$, and hence to the tangent plane of the level surface.
Proof Given a function $f(x,y,z)$, take any curve $\mathbf{r}(t)=(x(t),y(t),z(t))$ on the level surface $f(x,y,z)=c$. According to the chain rule, $$\frac{\mathrm{d}f}{\mathrm{d}t}=\nabla f\cdot\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t},$$ where $f$ is constant along the curve because the curve stays on the level surface, so $$\frac{\mathrm{d}f}{\mathrm{d}t}=0,$$ that is $$\nabla f\cdot\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t}=0.$$ Since $\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t}$ can be the tangent vector of any curve on the surface, $\nabla f$ is perpendicular to every tangent direction.

Application

Tangent Planes

e.g. Find the tangent plane to $x^2+y^2-z^2=4$ at $(2,1,1)$.
Solution 1 Consider the three-variable function $f(x,y,z)=x^2+y^2-z^2$; the given surface is its level surface $f=4$. Since the gradient is normal to the tangent plane of the level surface, the normal vector of the tangent plane is $$\mathbf{n}=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right)_{(2,1,1)}=(4,2,-2),$$ so we have an equation of the form $$4x+2y-2z=k,$$ and plugging the point into it, we get the tangent plane $$4x+2y-2z=8.$$
Solution 2 Another point of view uses the total differential. Near the point, $$\mathrm{d}f=\frac{\partial f}{\partial x}_{(2,1,1)}\mathrm{d}x+\frac{\partial f}{\partial y}_{(2,1,1)}\mathrm{d}y+\frac{\partial f}{\partial z}_{(2,1,1)}\mathrm{d}z.$$ Since we are moving on the level surface, $\mathrm{d}f$ is actually $0$, which gives us $$\frac{\partial f}{\partial x}_{(2,1,1)}\mathrm{d}x+\frac{\partial f}{\partial y}_{(2,1,1)}\mathrm{d}y+\frac{\partial f}{\partial z}_{(2,1,1)}\mathrm{d}z=0,$$ which means $$4(x-x_0)+2(y-y_0)-2(z-z_0)=0,$$ namely $$4(x-2)+2(y-1)-2(z-1)=0.$$
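The computation in this example is mechanical enough to sketch in code. The helper name `tangent_plane` is an assumption made for illustration; it evaluates the gradient of $f(x,y,z)=x^2+y^2-z^2$ at the point and returns the normal vector together with the constant $k$ of the plane $\mathbf{n}\cdot(x,y,z)=k$.

```python
def grad_f(x, y, z):
    # f(x, y, z) = x^2 + y^2 - z^2
    return (2 * x, 2 * y, -2 * z)

def tangent_plane(point):
    """Return (normal, k) for the plane normal . (x, y, z) = k through `point`."""
    n = grad_f(*point)
    k = sum(ni * pi for ni, pi in zip(n, point))
    return n, k

normal, k = tangent_plane((2, 1, 1))
print(normal, k)   # (4, 2, -2) 8
```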

Directional Derivative

Sometimes we care about the derivative not only along $\hat\mathbf{i}$ and $\hat\mathbf{j}$ but along some other direction $\hat\mathbf{u}$.
The directional derivative is defined as $$\nabla|_{\hat\mathbf{u}}f=\nabla f\cdot\hat\mathbf{u}.$$ This is natural: if we move along the straight line through the point in the direction $\hat\mathbf{u}$, parametrized by the distance $s$ travelled, the chain rule gives $$\frac{\mathrm{d}f}{\mathrm{d}s}=\nabla f\cdot\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}s},$$ and since $\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}s}=\hat\mathbf{u}$, this becomes $$\frac{\mathrm{d}f}{\mathrm{d}s}=\nabla f\cdot\hat\mathbf{u}.$$

Geometry

Using the geometric form of the dot product, we can write the directional derivative as $$\nabla|_{\hat\mathbf{u}}f=|\nabla f||\hat\mathbf{u}|\cos(\theta)=|\nabla f|\cos(\theta),$$ where $\theta$ is the angle between $\nabla f$ and $\hat\mathbf{u}$, and $|\hat\mathbf{u}|=1$.

  • When $\theta=0$, the directional derivative is maximal, so the function increases fastest in the direction of $\nabla f$;
  • When $\theta=\pi$, the directional derivative is minimal, so the function decreases fastest in the opposite direction of $\nabla f$;
  • When $\theta=\frac\pi2$, the directional derivative is $0$, so to first order the function does not change: we are moving along a level surface.

So the gradient points in the direction in which the function increases fastest, and $|\nabla f|$ is that maximal rate of change.
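The three cases above can be illustrated numerically. This is a sketch under stated assumptions: the function $f(x,y)=x^2+3y$ and the base point $(1,2)$ are hypothetical choices, and unit directions are written as $\hat\mathbf{u}=(\cos\theta,\sin\theta)$.

```python
import math

def f(x, y):
    return x**2 + 3 * y          # hypothetical sample function

def grad_f(x, y):
    return (2 * x, 3.0)

def directional_derivative(x, y, theta):
    """grad f . u for the unit vector u = (cos theta, sin theta)."""
    gx, gy = grad_f(x, y)
    return gx * math.cos(theta) + gy * math.sin(theta)

def finite_difference(x, y, theta, h=1e-6):
    """Central difference of f along the direction u, as a cross-check."""
    ux, uy = math.cos(theta), math.sin(theta)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
grad_norm = math.hypot(gx, gy)
grad_angle = math.atan2(gy, gx)   # direction of the gradient

along = directional_derivative(x0, y0, grad_angle)                 # angle 0 to grad f
against = directional_derivative(x0, y0, grad_angle + math.pi)     # angle pi
across = directional_derivative(x0, y0, grad_angle + math.pi / 2)  # angle pi/2

print(along, against, across, grad_norm)
```

Up to rounding, `along` equals $|\nabla f|$, `against` equals $-|\nabla f|$, and `across` is $0$.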

Total Differentials

When we consider a multivariable function, is there a way to account for changes in all of the variables at once?
There is: the total differential of $f(x,y,z)$ is defined as $$\mathrm{d}f=\frac{\partial f}{\partial x}\mathrm{d}x+\frac{\partial f}{\partial y}\mathrm{d}y+\frac{\partial f}{\partial z}\mathrm{d}z.$$

Notice that so far I have been blurring the line between derivatives and differentials; let me make the distinction clear. In the single-variable case, applying the "differential" to a function $f(x)$ actually gives another function, of the two variables $x$ and $\Delta x$, defined by $$\mathrm{d}f(x,\Delta x)\overset{\Delta}{=}f'(x)\Delta x.$$
We often write something like $$\mathrm{d}f=\boxed{}\mathrm{d}x$$ because, by this definition, the differential of the function $x$ itself is $$\mathrm{d}x(x,\Delta x)=\Delta x.$$

Chain Rule

If we have some multivariable function $f(x,y,z)$, where $x=x(t),y=y(t),z=z(t)$, we can get $$\frac{\mathrm{d}f}{\mathrm{d}t}=\frac{\partial f}{\partial x}\frac{\mathrm{d}x}{\mathrm{d}t}+\frac{\partial f}{\partial y}\frac{\mathrm{d}y}{\mathrm{d}t}+\frac{\partial f}{\partial z}\frac{\mathrm{d}z}{\mathrm{d}t}.$$

Validation for Product and Quotient Rule

Treat the product of two functions $u=u(t),v=v(t)$ as a multivariable function $f(u,v)=uv$, and apply the chain rule $$\begin{aligned}\frac{\mathrm{d}f}{\mathrm{d}t}&=\frac{\partial f}{\partial u}\frac{\mathrm{d}u}{\mathrm{d}t}+\frac{\partial f}{\partial v}\frac{\mathrm{d}v}{\mathrm{d}t}\\&=v\frac{\mathrm{d}u}{\mathrm{d}t}+u\frac{\mathrm{d}v}{\mathrm{d}t}.\end{aligned}$$
The quotient rule can be validated similarly by taking $f(u,v)=\frac{u}{v}$; the details are omitted.

Chain Rule for Several Variables

Given a function $f(x,y)$ where $x=x(u,v),y=y(u,v)$, how can we get $\frac{\partial f}{\partial u}$ and $\frac{\partial f}{\partial v}$ without substituting $x=x(u,v)$ and $y=y(u,v)$ into $f$?
Let's calculate the total differential of $f$, that is $$\begin{aligned}\mathrm{d}f&=\frac{\partial f}{\partial x}\mathrm{d}x+\frac{\partial f}{\partial y}\mathrm{d}y\\&=\frac{\partial f}{\partial x}\left(\frac{\partial x}{\partial u}\mathrm{d}u+\frac{\partial x}{\partial v}\mathrm{d}v\right)+\frac{\partial f}{\partial y}\left(\frac{\partial y}{\partial u}\mathrm{d}u+\frac{\partial y}{\partial v}\mathrm{d}v\right)\\&=\left(\frac{\partial f}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial u}\right)\mathrm{d}u+\left(\frac{\partial f}{\partial x}\frac{\partial x}{\partial v}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial v}\right)\mathrm{d}v.\end{aligned}$$
And notice that $$\mathrm{d}f=\frac{\partial f}{\partial u}\mathrm{d}u+\frac{\partial f}{\partial v}\mathrm{d}v,$$ therefore we get $$\left\{\begin{aligned}\frac{\partial f}{\partial u}&=\frac{\partial f}{\partial x}\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial u}\\\frac{\partial f}{\partial v}&=\frac{\partial f}{\partial x}\frac{\partial x}{\partial v}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial v}.\end{aligned}\right.$$
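The two formulas can be checked on a concrete case. This sketch assumes a hypothetical function $f(x,y)=x^2y$ with the polar-style substitution $x=u\cos v$, $y=u\sin v$ (neither is from the notes), and compares the chain-rule result with central differences of the composite function.

```python
import math

def f(x, y):
    return x**2 * y              # hypothetical sample function

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x**2

def x_of(u, v):
    return u * math.cos(v)

def y_of(u, v):
    return u * math.sin(v)

def chain_rule_partials(u, v):
    """(df/du, df/dv) assembled from the chain rule for several variables."""
    x, y = x_of(u, v), y_of(u, v)
    x_u, x_v = math.cos(v), -u * math.sin(v)
    y_u, y_v = math.sin(v), u * math.cos(v)
    f_u = f_x(x, y) * x_u + f_y(x, y) * y_u
    f_v = f_x(x, y) * x_v + f_y(x, y) * y_v
    return f_u, f_v

def numeric_partials(u, v, h=1e-6):
    """Central differences of the composite (u, v) -> f(x(u, v), y(u, v))."""
    def composite(a, b):
        return f(x_of(a, b), y_of(a, b))
    f_u = (composite(u + h, v) - composite(u - h, v)) / (2 * h)
    f_v = (composite(u, v + h) - composite(u, v - h)) / (2 * h)
    return f_u, f_v

print(chain_rule_partials(2.0, 0.5))
print(numeric_partials(2.0, 0.5))
```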

Basics of Multivariable Functions

Map

From a higher point of view, a single-variable function maps a number to a number, and a multivariable function maps an $n$-tuple to a number. The essence of a function doesn't change.

Domain

Like a single-variable function, a multivariable function has its domain. For example, $$f(x,y)=x^2+y^2$$ is defined everywhere, while $$f(x,y)=\sqrt{y}$$ is only defined when $y\geq 0$.

Graph

It's difficult to plot a multivariable function accurately, but the main idea doesn't change.
e.g. Plot the graph of $f(x,y)=-y$.
Consider the $yz$ plane: there the graph $z=-y$ is just a line through the origin. Since the function doesn't depend on $x$, sliding this line along the $x$ direction sweeps out a plane.
z=-y.png

e.g. Plot the graph of $f(x,y)=1-x^2-y^2$.
Consider the $yz$ plane, where $x=0$: the section is a parabola with the equation $z=1-y^2$. Similarly, the section in the $xz$ plane is a parabola with the equation $z=1-x^2$.
But if we consider the graph in the $xy$ plane, where $z=0$, we get the unit circle $x^2+y^2=1$.
1-x2-y2.png

Contour Plot

Drawing a traditional plot is laborious, and the result is not always easy to read. Taking the last example, we can draw a contour plot like the following
contour.png
which shows the curves in the $xy$ plane along which the function takes the same value.
When the levels are equally spaced, the gap between neighboring curves shows how fast the function changes along a given direction: the closer the curves, the faster the change.

Partial Derivatives

We care about the rate of change of a multivariable function, but it has several variables. What we do is convert the multivariable function into a single-variable function by treating the other variables as constants.
Given a function $f(x,y)$, the partial derivative at point $(x_0,y_0)$ of $f$ with respect to $x$ is $$\frac{\partial f}{\partial x}(x_0,y_0)=\lim_{\Delta x\to 0}\frac{f(x_0+\Delta x, y_0)-f(x_0, y_0)}{\Delta x},$$ and generally the partial derivative of $f$ with respect to $x$ is a multivariable function that is $$f_x=\frac{\partial f}{\partial x}=\lim_{\Delta x\to 0}\frac{f(x+\Delta x, y)-f(x, y)}{\Delta x}.$$
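The limit in the definition can be watched converging. This is a minimal sketch; the function $f(x,y)=x^2y+y^3$ and the point $(1,2)$ are hypothetical choices made for illustration. The difference quotient holds $y$ fixed and shrinks $\Delta x$.

```python
def partial_x(f, x0, y0, dx):
    """Difference quotient from the definition, with y held at y0."""
    return (f(x0 + dx, y0) - f(x0, y0)) / dx

def f(x, y):
    return x**2 * y + y**3       # hypothetical sample function

# Exact value for comparison: f_x = 2xy, so f_x(1, 2) = 4.
estimates = [partial_x(f, 1.0, 2.0, 10.0**-k) for k in range(1, 7)]
errors = [abs(e - 4.0) for e in estimates]

for k, e in enumerate(estimates, start=1):
    print(f"dx = 1e-{k}: {e}")
```

For this function the quotient equals $4+2\Delta x$ exactly, so the error shrinks in proportion to $\Delta x$.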

Approximation

Like the linear approximation in single variable function, a multivariable function also has approximation. That is
$$\Delta f(x,y)\approx f_x\Delta x+f_y\Delta y.$$
Why is it correct?
Suppose that $$\left\{\begin{aligned}f_x(x_0,y_0)=a\\f_y(x_0,y_0)=b\end{aligned}\right.,$$
and write $z_0=f(x_0,y_0)$. Then the graph has two tangent lines $$l_1:\left\{\begin{aligned}z&=z_0+a(x-x_0)\\y&=y_0\end{aligned}\right.,l_2:\left\{\begin{aligned}z&=z_0+b(y-y_0)\\x&=x_0\end{aligned}\right..$$
These two lines determine the tangent plane $z=z_0+a(x-x_0)+b(y-y_0)$, which approximates the graph near $(x_0,y_0)$; this is exactly the approximation formula above.
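To see how good the tangent-plane approximation is, here is a small sketch that reuses the paraboloid $f(x,y)=1-x^2-y^2$ from the Graph section. The base point $(0.5,0.5)$ and the two test points are assumptions made for illustration.

```python
def f(x, y):
    return 1 - x**2 - y**2       # the paraboloid from the Graph section

def tangent_plane(x0, y0):
    """Return the function z = z0 + a(x - x0) + b(y - y0) at (x0, y0)."""
    a = -2 * x0                  # f_x(x0, y0)
    b = -2 * y0                  # f_y(x0, y0)
    z0 = f(x0, y0)
    return lambda x, y: z0 + a * (x - x0) + b * (y - y0)

plane = tangent_plane(0.5, 0.5)

# Error of the approximation close to, and farther from, the base point.
err_near = abs(f(0.51, 0.52) - plane(0.51, 0.52))
err_far = abs(f(0.6, 0.7) - plane(0.6, 0.7))

print(err_near, err_far)
```

For this function the error is exactly $\Delta x^2+\Delta y^2$, which is why the approximation improves quickly as the point approaches $(x_0,y_0)$.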

Maxima / Minima

At a local maximum or minimum, we have $f_x(x_0,y_0)=0$ and $f_y(x_0,y_0)=0$, which means the graph has a horizontal tangent plane at that point.

Critical Points

Point $(x_0,y_0)$ is a critical point of $f$ if $f_x(x_0,y_0)=0$ and $f_y(x_0,y_0)=0$.
e.g. Find critical points for $f(x,y)=x^2-2xy+3y^2+2x-2y$.
Solution According to the definition, we get $$\left\{\begin{aligned}f_x&=2x-2y+2=0\\f_y&=-2x+6y-2=0\end{aligned}\right.,$$ and solving the system gives the critical point $(-1,0)$.
But is it a minimum?
Notice that $$f(x,y)=x^2-2xy+3y^2+2x-2y=(x-y+1)^2+2y^2-1\geq -1,$$ and plugging $(-1,0)$ into $f$ gives exactly $-1$, so $f$ has a local and global minimum at $(-1,0)$.
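The example can be reproduced in a few lines. This is a sketch, with the helper `solve_2x2` (Cramer's rule) and the sampling grid being assumptions made for illustration. It solves the linear system $f_x=f_y=0$ and checks on a grid around the critical point that no sampled value falls below $f(-1,0)$.

```python
def f(x, y):
    return x**2 - 2 * x * y + 3 * y**2 + 2 * x - 2 * y

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system is singular")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

# f_x = 2x - 2y + 2 = 0   ->   2x - 2y = -2
# f_y = -2x + 6y - 2 = 0  ->  -2x + 6y =  2
critical = solve_2x2(2, -2, -2, -2, 6, 2)

# Sample f on a grid centred on the critical point.
grid_min = min(
    f(-1 + 0.1 * i, 0.1 * j)
    for i in range(-20, 21)
    for j in range(-20, 21)
)

print(critical, f(*critical), grid_min)   # (-1.0, 0.0) -1.0 -1.0
```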

Saddle Points

The condition $f_x(x_0,y_0)=0$ and $f_y(x_0,y_0)=0$ is necessary but not sufficient for a maximum or minimum: there are examples where $f_x(x_0,y_0)=0$ and $f_y(x_0,y_0)=0$ but the function has neither a local minimum nor a local maximum at $(x_0,y_0)$.
For example, the function $f(x,y)=x^2-y^2$.
x2-y2.png
It's easy to check that $f_x(0,0)=0,f_y(0,0)=0$, yet the function has neither a maximum nor a minimum at the origin. Such points are called saddle points.

Application

Least-square Interpolation

When we are given a lot of discrete data (often seen in scientific experiments), we want to find a line that approximates the data, so that we can find the relation between the variables and predict their values.
Consider the data set $(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)$ and the target line $y=ax+b$. How do we choose $a$ and $b$ so that the line fits the data best? It's a minimization problem.
By convention, the error measure we use is the squared offset, which for each data point $(x_i,y_i)$ is $(ax_i+b-y_i)^2$, so the function we'd like to analyze is $$D(a,b)=\sum_{i=1}^n (ax_i+b-y_i)^2,$$ whose minimum we want to find.
We take the partial derivatives with respect to $a$ and $b$,
$$\frac{\partial D}{\partial a}=\sum_{i=1}^n [2(ax_i+b-y_i)x_i],\frac{\partial D}{\partial b}=\sum_{i=1}^n [2(ax_i+b-y_i)],$$ and to get the critical point, we solve the system $$\left\{\begin{aligned}&\frac{\partial D}{\partial a}=0\\&\frac{\partial D}{\partial b}=0\end{aligned}\right.,$$ namely $$\left\{\begin{aligned}&\left(\sum_{i=1}^nx_i^2\right)a+\left(\sum_{i=1}^nx_i\right)b=\sum_{i=1}^nx_iy_i\\&\left(\sum_{i=1}^nx_i\right)a+nb=\sum_{i=1}^ny_i\end{aligned}\right..$$
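The final $2\times2$ system is easy to solve directly. The following sketch uses a hypothetical helper named `fit_line` and made-up data; it builds the four sums and solves for $a$ and $b$ by Cramer's rule. It is one straightforward way to do it, not necessarily the most numerically robust.

```python
def fit_line(points):
    """Solve the 2x2 least-squares system for the line y = a*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values coincide; the line is not determined")
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Data lying exactly on y = 2x + 1: the fit should recover a = 2, b = 1.
exact = [(0, 1), (1, 3), (2, 5), (3, 7)]
a_exact, b_exact = fit_line(exact)

# Made-up noisy data: the fitted line only approximates the points.
noisy = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]
a_noisy, b_noisy = fit_line(noisy)

print(a_exact, b_exact)
print(a_noisy, b_noisy)
```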

Validation by Second Derivatives

Special Cases

Completing Squares

Consider the behavior at the origin of the function $$f(x,y)=ax^2+bxy+cy^2,$$ with $a\neq0$. We complete the square to judge whether the function has a maximum or minimum at the origin, namely $$\begin{aligned}f(x,y)&=a\left(x^2+\frac{b}{a}xy\right)+cy^2\\&=a\left(x+\frac{b}{2a}y\right)^2-\frac{b^2}{4a}y^2+cy^2\\&=a\left(x+\frac{b}{2a}y\right)^2+\frac{4ac-b^2}{4a}y^2\\&=\frac1{4a}\left[4a^2\left(x+\frac{b}{2a}y\right)^2+(4ac-b^2)y^2\right].\end{aligned}$$
The origin is a critical point because $$\frac{\partial f}{\partial x}=2ax+by,\frac{\partial f}{\partial y}=2cy+bx,$$ and both vanish at the origin. The question is whether $f(0,0)$ is really a local maximum or minimum.
Look at the signs in front of the two squared terms in the bracket: $4a^2$ is always positive, while the sign of $4ac-b^2$ is not determined.
If $4ac-b^2>0$, then both terms in the bracket are non-negative, and the bracket vanishes only at the origin. Since $f(0,0)=0$, the origin is a local minimum or maximum (depending on the sign of $\frac1{4a}$, namely of $a$: a minimum if $a>0$, a maximum if $a<0$).
If $4ac-b^2=0$, the function reduces to $a\left(x+\frac{b}{2a}y\right)^2$, which depends on only one combined variable $x+\frac{b}{2a}y$. This is the degenerate case: the function never changes sign, so the origin is a local maximum or minimum, but not a strict one, since the same value $0$ is attained at every point of the line $x+\frac{b}{2a}y=0$.
If $4ac-b^2<0$, then one term in the bracket is non-negative and the other non-positive, so the function takes both positive and negative values near the origin. So $f(0,0)$ cannot be a maximum or minimum; the origin is a saddle point.

Homogeneous Equation

The quadratic discriminant $b^2-4ac$ occurs in the analysis above. Is that a coincidence?
Take the same example and notice that every term is quadratic, so for $y\neq0$ $$f(x,y)=y^2\left(a\left(\frac{x}{y}\right)^2+b\left(\frac{x}{y}\right)+c\right),$$ and near the origin, $\frac{x}{y}$ can be any number.
If the equation $at^2+bt+c=0$ has two distinct roots, namely, $b^2-4ac>0$, then the quadratic takes both positive and negative values, so the function takes values on both sides of $0$ near the origin, and it stays $0$ along the two directions given by the roots ($\frac{x}{y}$ indicates the direction of approach to the origin). The origin is a saddle point.
quad-root-up.png
quad-root-down.png
If the equation $at^2+bt+c=0$ has only one root, namely, $b^2-4ac=0$, we are in the degenerate case; it indicates that along one direction the function keeps its value at the origin.
If the equation $at^2+bt+c=0$ has no real roots, namely, $b^2-4ac<0$, then $f(0,0)=0$ is a local maximum or minimum (depending on the sign of $a$), because near the origin the function value can't be zero except at the origin itself.
quad-noroot.png

Second Derivative Test

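According to the multivariable quadratic Taylor formula $$\Delta f\approx f_x(x-x_0)+f_y(y-y_0)+\frac12f_{xx}(x-x_0)^2+f_{xy}(x-x_0)(y-y_0)+\frac12f_{yy}(y-y_0)^2,$$
with all derivatives evaluated at $(x_0,y_0)$, we can extend the test to other functions. At a critical point the first-order terms vanish, so $\Delta f$ is approximately a quadratic form with $a=\frac12f_{xx}$, $b=f_{xy}$, $c=\frac12f_{yy}$, for which $4ac-b^2=f_{xx}f_{yy}-f_{xy}^2$.
To test a critical point $(x_0,y_0)$ of $f$, let $$A=f_{xx}(x_0,y_0),B=f_{xy}(x_0,y_0),C=f_{yy}(x_0,y_0),$$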

  • if $AC-B^2>0$
    • if $A>0$, we get local minima at $(x_0,y_0)$
    • if $A<0$, we get local maxima at $(x_0,y_0)$
  • if $AC-B^2<0$, then it's a saddle point
  • if $AC-B^2=0$, no conclusion
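The decision rule above translates directly into a small function. This is a sketch with a hypothetical name, `classify`; it takes the three second derivatives at a critical point and returns the verdict as a string. The examples reuse functions from these notes.

```python
def classify(A, B, C):
    """Second derivative test at a critical point, given A = f_xx, B = f_xy, C = f_yy."""
    d = A * C - B * B
    if d > 0:
        return "local minimum" if A > 0 else "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"

# f = x^2 - 2xy + 3y^2 + 2x - 2y at (-1, 0): f_xx = 2, f_xy = -2, f_yy = 6.
print(classify(2, -2, 6))    # local minimum

# f = x^2 - y^2 at (0, 0): f_xx = 2, f_xy = 0, f_yy = -2.
print(classify(2, 0, -2))    # saddle point

# f = (x + y)^2 at (0, 0): f_xx = f_xy = f_yy = 2, so AC - B^2 = 0.
print(classify(2, 2, 2))     # inconclusive
```

Note that when $AC-B^2>0$, $A$ cannot be $0$, so the two sub-cases cover everything.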

Motion

Recall the cycloid: we use parametric equations to describe a moving point $P(x(t),y(t),z(t))$. The vector $\mathbf{r}(t)=\mathbf{OP}=(x(t),y(t),z(t))$ is called the position vector, and we can learn more about the motion by analysing this vector.
Take as an example the cycloid with $t=\theta$, that is $$\mathbf{r}(t)=(t-\sin(t),1-\cos(t)).$$ It has no third component, but this loses no generality.

Velocity

We care about not only the rate of motion at some point but also its direction. In multivariable calculus, a vector function can be differentiated component by component. To get the velocity, we just differentiate the position vector $$\mathbf{v}=\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t}.$$ In this cycloid case, $$\mathbf{v}=\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t}=\left(\frac{\mathrm{d}x}{\mathrm{d}t},\frac{\mathrm{d}y}{\mathrm{d}t}\right)=(1-\cos(t),\sin(t)).$$

Speed

In some applications, we only care about the rate. The magnitude of the velocity is the speed; in this example that is $$|\mathbf{v}|=\sqrt{(1-\cos(t))^2+\sin^2(t)}=\sqrt{2-2\cos(t)}.$$

Acceleration

Imagine you are driving and you take a tight turn without changing speed. From the point of view of single-variable calculus, you have no acceleration. In multivariable calculus, however, your change of direction is also taken into account. Defined in the same way as velocity, the acceleration is $$\mathbf{a}=\frac{\mathrm{d}\mathbf{v}}{\mathrm{d}t}.$$
In this cycloid case, $$\mathbf{a}=\frac{\mathrm{d}\mathbf{v}}{\mathrm{d}t}=(\sin(t),\cos(t)).$$

Arc Length

If we integrate the velocity over time, we get the vector from the start point to the end point. How can we get the arc length travelled along the way? A good idea is to integrate the speed instead, because speed has no direction. So we have $$\frac{\mathrm{d}s}{\mathrm{d}t}=|\mathbf{v}|.$$
e.g. The length of one arch of the cycloid is $\int_0^{2\pi}\sqrt{2-2\cos(t)}\,\mathrm{d}t=\int_0^{2\pi}2\sin\left(\frac{t}{2}\right)\mathrm{d}t=8$, using $1-\cos(t)=2\sin^2\left(\frac{t}{2}\right)$.
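The arc-length integral can also be approximated numerically from the velocity alone. This sketch uses a plain midpoint rule; the function names and the number of subintervals are assumptions made for illustration. It integrates the speed of the cycloid over one arch.

```python
import math

def velocity(t):
    # Derivative of r(t) = (t - sin t, 1 - cos t).
    return (1 - math.cos(t), math.sin(t))

def speed(t):
    return math.hypot(*velocity(t))

def arc_length(t0, t1, n=10_000):
    """Midpoint-rule approximation of the integral of speed over [t0, t1]."""
    h = (t1 - t0) / n
    return h * sum(speed(t0 + (i + 0.5) * h) for i in range(n))

length = arc_length(0.0, 2 * math.pi)
print(length)
```

The printed value should be very close to the exact length of one arch.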

Trajectory Unit Tangent Vector

The trajectory unit tangent vector is defined as $\hat{\mathbf{T}}=\frac{\mathbf{v}}{|\mathbf{v}|}$.
By the chain rule, $$\mathbf{v}=\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}t}=\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}s}\frac{\mathrm{d}s}{\mathrm{d}t},$$ where $\frac{\mathrm{d}s}{\mathrm{d}t}$ is actually $|\mathbf{v}|$, so we have $$\hat{\mathbf{T}}=\frac{\mathrm{d}\mathbf{r}}{\mathrm{d}s}.$$

Second Law of Kepler

[TO BE CONTINUED]