« funbyjohn.com — 17 august 2017

2D game camera using inverse affine transformation

First off, let's see what we want to be able to make, by looking at two demos. This first demo shows off the
placeCameraTarget()
function, in which you supply a position, a zoom level and a rotation (in radians.)
Rotation Scale
This second demo shows off the
placeCameraFocusRect()
function, in which you supply a rectangle that need to be visible, and a padding value (how much space there should be in the edges.)
Hey black & white, help me demo this! Thanks!
The code for
placeCameraTarget()
and
placeCameraFocusRect()
can be found in the bottom of the article. Now we're going to look at the mathematics behind the camera.

Vectors and matrices

Here is a very quick recap. A vector $\mathbf{v} \in \mathbb{R}^2$ is an ordered list of two numbers $(v_1, v_2)$. However, we usually write them vertically like so: $\mathbf{v} = \begin{pmatrix}v_1 \\ v_2\end{pmatrix}$. If you don't know what $\in$ or $\mathbb{R}$ means, read Set theory basics, it's a short read. $ \\ \\ $ A $2 \times 2$ matrix $A$ is a grid of numbers. It may look like so: $A = \begin{pmatrix}a & b \\ c & d\end{pmatrix}$, where $a,b,c,d \in \mathbb{R}$. $ \\ \\ $ We define matrix and vector multiplication now. Let $\mathbf{v} \in \mathbb{R}^2$ and $A$ be a $2 \times 2$ matrix with entries in $\mathbb{R}$. Then: $$A\mathbf{v} = \begin{pmatrix}a & b \\ c & d\end{pmatrix}\begin{pmatrix}v_1 \\ v_2\end{pmatrix} = \begin{pmatrix}a \cdot v_1 + b \cdot v_2 \\ c \cdot v_1 + d \cdot v_2\end{pmatrix}$$ The product between a matrix and a vector is a vector. $ \\ \\ $ Now, we define the product between two $2 \times 2$ matrices. Let $A$ and $B$ be $2 \times 2$ matrices. We define their product as: $$AB = \begin{pmatrix}a_{11} & a_{12} \\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix}b_{11} & b_{12} \\ b_{21} & b_{22}\end{pmatrix} = \begin{pmatrix}a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}\end{pmatrix}$$ The product between two matrices is another matrix. The algorithm here is: if you want to find for example the top left entry in the product, then you sum up the products between the left matrix's top row and the right matrix's left column. That is, if you want to find the $n$th row and $m$th column in a matrix product, you sum up the products between the $n$th row in the left matrix, and the $m$th column in the right matrix. Matrix and vector multiplication is actually just matrix multiplication in which the vector is treated as a matrix with one column and as many rows as there are numbers in the vector.

Affine transformations

Let $\mathbf{x}, \mathbf{b} \in \mathbb{R}^2$ and $A$ be a $2 \times 2$ matrix with entries $a_{ij} \in \mathbb{R}$. Then a map $$\mathbf{x} \mapsto A\mathbf{x} + \mathbf{b}$$ is called an affine transformation. It consists two parts: a linear transformation (the matrix $A$), and a translation (the vector $\mathbf{b}$.) A linear transformation includes things like rotation, skewing, shearing and scaling. Translation simply means offsetting. Now we're interested in finding an inverse mapping, that is $A\mathbf{x} + \mathbf{b} \mapsto \mathbf{x}$. $\\ \\$ If an explicit formula is given like: $$f(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$$ then if the matrix $A$ is invertible, $f$ has an inverse: $$f^{-1}(\mathbf{x}) = A^{-1}(\mathbf{x} - \mathbf{b})$$ Here is a proof: assume that $A$ is invertible with inverse $A^{-1}$, which means that $A^{-1}A = AA^{-1} = I$, then $f$ has an inverse $f^{-1}(\mathbf{x}) = A^{-1}(\mathbf{x} - \mathbf{b})$ because: $$\begin{align} f^{-1}(f(\mathbf{x})) &= f^{-1}(A\mathbf{x} + \mathbf{b}) \\ &= A^{-1}((A\mathbf{x} + \mathbf{b}) - \mathbf{b}) \\ &= A^{-1}(A\mathbf{x} + \mathbf{b} - \mathbf{b}) \\ &= A^{-1}(A\mathbf{x}) \\ &= A^{-1}A\mathbf{x} \\ &= \mathbf{x} \end{align}$$ and $$\begin{align} f(f^{-1}(\mathbf{x})) &= f(A^{-1}(\mathbf{x} - \mathbf{b})) \\ &= A(A^{-1}(\mathbf{x} - \mathbf{b})) + \mathbf{b} \\ &= A(A^{-1}\mathbf{x} - A^{-1}\mathbf{b}) + \mathbf{b} \\ &= AA^{-1}\mathbf{x} - AA^{-1}\mathbf{b} + \mathbf{b} \\ &= \mathbf{x} - \mathbf{b} + \mathbf{b} \\ &= \mathbf{x} \end{align}$$ and so for every $\mathbf{x} \in \mathbb{R}^2$ we have that $f(f^{-1}(\mathbf{x})) = f^{-1}(f(\mathbf{x})) = \mathbf{x}$.

So what can $A$ look like?

$A$ can contain scaling, shearing, reflection, rotations, etc. Here we'll see how this looks like for some examples. Note that in these examples the y axis goes from bottom to top, which is the standard in mathematics. Many game engines and graphic engines use the opposite: y axis going from top to bottom.
Action Effect Matrix Description
Identity $A = \begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}$ This matrix has the property that for every $\mathbf{v} \in \mathbb{R}^2$ that $A\mathbf{v} = \mathbf{v}$.
Scale $A = \begin{pmatrix}w & 0 \\ 0 & h\end{pmatrix}$ This transformation scales by a factor of $w$ on the x-axis, and a factor of $h$ on the y-axis. If $w = h$ then $A\mathbf{v} = w\mathbf{v}$ for all $\mathbf{v} \in \mathbb{R}^2$.
Shear $A = \begin{pmatrix}1 & k \\ 0 & 1\end{pmatrix}$ One example of a shear. Here the x-axis stays the same, but the y-axis is moved horizontally, depending on the value of $k \in \mathbb{R}$.
Reflection on the y-axis $A = \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}$ Flips the y-axis, creating a reflection along the y-axis. This follows from the fact that this is just scaling the y-axis with a negative value, effectively flipping it.
Rotation $A = \begin{pmatrix}\textrm{cos} \, \theta & -\textrm{sin} \, \theta \\ \textrm{sin} \, \theta & \textrm{cos} \, \theta\end{pmatrix}$ Rotates $\theta$ radians counter-clockwise.
A very important intution we can get from looking at the above matrices is that the left column represents the first axis, and the right column represents the second axis. That is, the arrows on the images. Thus in the identity matrix, you see the normal x-axis and y-axis represented. In any other matrix these axes are changed. This is a very important and powerful concept in linear algebra. $ \\ \\ $ Now, notice how $A$ only contains a single operation here. It is possible to combine two operations simply by multiplying the two operations' matrices. Keep in mind though that if $A$ and $B$ are two $2 \times 2$ matrices, then it does not hold that $AB = BA$ for every combination of matrices $A, B$. Therefore the order in which the compositon of transformations is made matters. Let's look at an example. $\\ \\$ Let $$\mathbf{v} = \begin{pmatrix}3 \\ 3\end{pmatrix} \in \mathbb{R}^2$$ $$T_{\textrm{reflect}} = \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}$$ $$T_{\textrm{rotate}} = \begin{pmatrix}\textrm{cos} \, \frac{\pi}{4} & -\textrm{sin} \, \frac{\pi}{4} \\ \textrm{sin} \, \frac{\pi}{4} & \textrm{cos} \, \frac{\pi}{4}\end{pmatrix} = \begin{pmatrix}\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{pmatrix}$$ Here $T_{\textrm{reflect}}$ is a transformation that reflects a vector on the y-axis, and $T_{\textrm{rotate}}$ is a transformation that rotates a vector counter-clockwise 45 degrees about the origin ($\frac{\pi}{4}$ radians is 45 degrees.) Now $$T_{\textrm{reflect}}T_{\textrm{rotate}} = \begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix}\begin{pmatrix}\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{pmatrix} = \begin{pmatrix}\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{pmatrix}$$ and $$T_{\textrm{rotate}}T_{\textrm{reflect}} = \begin{pmatrix}\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}1 & 0 \\ 0 & -1\end{pmatrix} = \begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{pmatrix}$$ As you can see, $T_{\textrm{rotate}}T_{\textrm{reflect}} \neq T_{\textrm{reflect}}T_{\textrm{rotate}}$. Now let's see them applied to $\mathbf{v}$: $$T_{\textrm{reflect}}T_{\textrm{rotate}}\mathbf{v} = \begin{pmatrix}\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}3 \\ 3\end{pmatrix} = \begin{pmatrix}\frac{3}{\sqrt{2}} - \frac{3}{\sqrt{2}} \\ -\frac{3}{\sqrt{2}} - \frac{3}{\sqrt{2}}\end{pmatrix} = \begin{pmatrix}0 \\ -\frac{6}{\sqrt{2}}\end{pmatrix}$$ and $$T_{\textrm{rotate}}T_{\textrm{reflect}}\mathbf{v} = \begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}3 \\ 3\end{pmatrix} = \begin{pmatrix}\frac{3}{\sqrt{2}} + \frac{3}{\sqrt{2}} \\ \frac{3}{\sqrt{2}} - \frac{3}{\sqrt{2}}\end{pmatrix} = \begin{pmatrix}\frac{6}{\sqrt{2}} \\ 0\end{pmatrix}$$ It turns out when applying a sequence of transformations, they're being done right to left, so you can imagine an arrow above them like this: $$\overleftarrow{T_{\textrm{reflect}}T_{\textrm{rotate}}}\mathbf{v}$$ That is, first you rotate $\mathbf{v}$ 45 degrees about the origin, and then you reflect it on the y-axis. So in general, if you apply transformations $T_{1}T_{2}\cdots{T_{n}}$, they're actually applied in reverse order, $T_n, T_{n-1}, ..., T_1$, of how you would intuitively assume they would. As the example above illustrates, sometimes this order can be very important. $ \\ \\ $ In making our camera, we'll use this reverse order property to perform both scaling and rotation in the proper order in the
placeCameraTarget()
function.

Making a camera

Now the question is, how can we use an affine transformation to make a game camera. The big idea is that our camera is an actual entity in our scene. It has a position, a size, an orientation, and so on. The question is, how can we move what is inside the camera parallelogram, to the square that is what the user is looking at. $ \\ \\ $ Our camera will consist of 3 things: The shape and position of the camera is illustrated by the grey box in the two demos in the start of the article, and the viewport is the size of each of the two screens displayed. Now, the camera itself is represented as the affine transformation $f(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$, where $\mathbf{x}$ is a vector we want transformed. Now, the inverse of this transformation is $f^{-1}(\mathbf{x}) = A^{-1}(\mathbf{x} - \mathbf{b})$ as we proved earlier. The idea is to find out where a point in our game scene should be in our viewport, and so we apply the inverse transformation on that point to find out where. $ \\ \\ $ The points that are in our viewport (and thus is visible on screen!) are the points $\mathbf{x}$ that satisfy $f^{-1}(\mathbf{x}) \in [-0.5, 0.5] \times [-0.5, 0.5]$. Now when we got such a point $\mathbf{y} = f^{-1}(\mathbf{x})$, we want to transform it from $[-0.5, 0.5] \times [-0.5, 0.5]$ to $[0, v_w] \times [0, v_h]$, since we're dealing with actual pixels when placing things. This is very easy: $$\begin{pmatrix} \frac{v_w}{2} + y_1 \cdot v_w \\ \frac{v_h}{2} + y_2 \cdot v_h \end{pmatrix}$$ is the position on screen, where $\mathbf{y} = \begin{pmatrix}y_1 \\ y_2\end{pmatrix}$ is the same one as above. This transformation is seen in the return statement of the
applyCamera()
function. $ \\ \\ $ That is all the ingredients for a simple but powerful camera. It supports scaling, shearing (if for some reason you would use that), rotation, and so on. I won't go into detail about how the two camera types in the demos in the start are implemented, but I'll leave you to look at the code to discover for yourself, you should know have some more knowledge as to what's actually going on.

The code

It's very easy to use the camera supplied in the code below. First, create the camera with the size of your game's window/screen/etc.:
  let camera = createCamera(1920, 1080);
to update the camera, use one of these functions every frame/every time the camera changes:
  placeCameraTarget(camera, target, zoom, rotation);
  placeCameraFocusRect(camera, rect, padding);
where these produce the results seen in the two demos above (first demo is Target, second demo is FocusRect.) To use Target: the target argument is a vector which will be the camera's position, the zoom is a number where 1 is the size of the viewport, and anything else is zoom times viewport. Rotation is in radians. To use FocusRect: Supply a rectangle (in my code it's a matrix), where the two first numbers of the matrix represent the first point, and the third and fourth numbers represent a second point. Padding is how much spacing there should be outside the rectangle. $\\ \\$ To transform a point, use:
  let somePoint = createVector(300, 150);
  let screenLocation = applyCamera(camera, somePoint);
now
screenLocation
describes where on your screen
somePoint
is in your scene. $\\ \\$ Here's the full code for the camera system:
	
         function createVector(v1, v2) {
             return [v1, v2];
         }

         function copyVector(v) {
             let [v1, v2] = v;
             return [v1, v2];
         }

         function addVectors(u, v) {
             return [u[0] + v[0], u[1] + v[1]];
         }

         function subVectors(u, v) {
             return [u[0] - v[0], u[1] - v[1]];
         }

         function createMatrix(a, b, c, d) {
             return [a, b, c, d];
         }

         function multiplyMatVec(A, v) {
             let [a, b, c, d] = A;
             let [v1, v2] = v;
             /*
                 {a b} {v1}
                 {c d} {v2}
             */
             return createVector(
                 a * v1 + b * v2,
                 c * v1 + d * v2
             );
         }

         function multiplyMatMat(A, B) {
             let [a, b, c, d] = A;
             let [e, f, g, h] = B;
             /*
                 {a b} {e f}
                 {c d} {g h}
             */
             return createMatrix(
                 a * e + b * g, a * f + b * h,
                 c * e + d * g, c * f + d * h
             );
         }

         function computeInverse(A) {
             let [a, b, c, d] = A;
             let det = a * d - b * c;
             if (det == 0)
                 return null;
             let invDet = 1 / det;
             return createMatrix(
                 d * invDet, -b * invDet,
                 -c * invDet, a * invDet
             );
         }

         function createCamera(viewport) {
             return {
                 viewport: viewport,
                 shape: createMatrix(viewport[0], 0, 0, viewport[1]),
                 position: createVector(0, 0),
                 inverse: null
             };
         }

         function updateCamera(camera) {
             camera.inverse = computeInverse(camera.shape);
             return camera.inverse != null;
         }

         function applyCamera(camera, point) {
             if (camera.inverse == null) {
                 if(updateCamera(camera)) {
                     return applyCamera(camera, point);
                 } else {
                     return null;
                 }
             }
             let [viewportW, viewportH] = camera.viewport;
             let [homogX, homogY] = multiplyMatVec(camera.inverse, subVectors(point, camera.position));
             return createVector(
                 0.5 * viewportW + homogX * viewportW,
                 0.5 * viewportH + homogY * viewportH
             );
         }

         function placeCameraTarget(camera, target, zoom, rotation) {
             let [viewportW, viewportH] = camera.viewport;
             
             // first apply size, then rotate.
             // <--------------
             // rotation * size
             let cos0 = Math.cos(rotation);
             let sin0 = Math.sin(rotation);
             camera.shape = multiplyMatMat(
                 createMatrix(
                     cos0, -sin0,
                     sin0, cos0
                 ),
                 createMatrix(
                     zoom * viewportW, 0,
                     0, zoom * viewportH
                 )
             );
             camera.position = copyVector(target);

             updateCamera(camera);
         }

         function placeCameraFocusRect(camera, rect, padding) {
             if (padding == undefined) {
                 padding = 0;
             }

             let [x1, y1, x2, y2] = [Math.min(rect[0], rect[2]) - padding,
                                     Math.min(rect[1], rect[3]) - padding,
                                     Math.max(rect[0], rect[2]) + padding,
                                     Math.max(rect[1], rect[3]) + padding];
             let [viewportW, viewportH] = camera.viewport;
             let viewAspectRatio = viewportW / viewportH;

             let w = x2 - x1;
             let h = y2 - y1;
             let focusAspectRatio = w / h;

             if (focusAspectRatio >= viewAspectRatio) {
                 let newSize = w * (1 / viewAspectRatio);
                 let diff = 0.5 * (newSize - h);
                 y1 -= diff;
                 y2 += diff;
             } else {
                 let newSize = h * viewAspectRatio;
                 let diff = 0.5 * (newSize - w);
                 x1 -= diff;
                 x2 += diff;
             }

             let newW = x2 - x1;
             let newH = y2 - y1;

             camera.shape = createMatrix(
                 newW, 0,
                 0, newH
             );

             camera.position = createVector(
                 x1 + 0.5 * newW,
                 y1 + 0.5 * newH
             );

             updateCamera(camera);
         }