While the answer to that question is beyond the scope of what I could put into a single blog post, understanding that answer requires some basic knowledge of one very important are of mathematics: linear algebra. It's important not just for the number-crunching tools it provides, but because it helps us think about things differently and know how to ask the right questions and know whether or not those questions have answers. Before I get too far ahead of myself lets first take a look at the video which motivated this post in the first place, which strings together 60 years of female actors from CBS (click the button in the lower right corner to watch it full-screen):
CBS - 60 Years of Actresses from Philip Scott Johnson on Vimeo.
More videos by Philip Scott Johnson (including CBS -
60 Years of Actors) can be found on vimeo and on youtube.
So how are these animations created?
If you replay part of the video, you'll notice that there are two things going on: 1) facial features in each image are stretched and rotated to line up with the facial features of the next image, and 2) there's a fade from one image to the next. The fade seems simple enough, so lets just focus on the first process of stretching and rotating facial features.
To get a feel for how this works, we first need to recognize what an image really is: just grid of pixels, each assigned a color value. The location of each pixel in the image can be described by what row and column it's in, giving the images a ready-made coordinate system for doing math. You can imagine it looking something like this, but with a smaller grid size...
[Source]
How might we move around pixels for the desired effect? First, the image is carved up into regions that will each have different rules for being twisted and stretched, with the added constraint that all the pieces fit nicely together as the image morphs. For faces, this includes finding the location and shapes of the eyes, mouth, nose, etc. in the image.
Having identified the facial features in the image, we need a define some rules for how these regions move and thereby form our transitional images. Thinking about pixels, we'd like to somehow define the curve tracing the path followed by an individual pixel (say the left eye of image 1) so that it ends up in a location that matches the correct position in the left eye in image 2. A naive way to do this is simply having pixels follow straight lines, or maybe a series of straight lines so the image preserves the integrity of certain facial features during the transitions.
The challenge then is to find a way to do this for each pixel, all at once and to do so as efficiently as possible on the computer. Having boiled down the problem to the world of straight lines, it's no big surprise which field of mathematics is going to be running the show...
Why Linear Algebra??
Once we have in mind this notion of straight lines between pixels, we can bust out the mathematical big guns: linear algebra. Why? Linear algebra is often described as the study of matrices and vectors but it's important to remember that it's all motivated by the goal of solving systems of linear equations. Often times, questions about linear equations can be answered by doing some straightforward mathematical operations, but only after reformulating those equations in the context of matrix and vector equations.
So here's how it works. Many problems can be boiled down to a system of linear equations with known outputs, known equations, but unknown inputs. Take for example, the n equations below, assuming we know the coefficients Cij and a set of output values Y1, Y2, ... Yn. Our question is often How can we solve for the corresponding input values X1, X2, ..., Xm that gave us these outputs?
Y1 = C11 X1 + C12 X2 + ... + C1m Xm
Y2 = C21 X1 + C22 X2 + ... + C2m Xm
...
Yn = Cn1 X1 + Cn2 X2 + ... + Cnm Xm
(Here the Cij subscripts just indicate which equation (row) and input variable (column) they belong to.)
Solving for each Xi by hand could be quite an undertaking (especially if the number of equations n and inputs m are large), but using linear algebra, it's actually pretty easy once you know the basics.
First, we need to bring our equations into the world of matrices and vectors. Defining the output column vector Y = (Y1, Y2, ..., Yn) and the input vector X = (X1, X2, ..., Xn) we can rewrite the above system of equations in terms of multiplication using matrix C, defined as the grid of coefficients
C11 C12 ... C1m
C21 C22 ... C2m
...
Cn1 Cn2 ... Cnm
Mathematicians have of course defined what it means to multiply matrices (and vectors) so that mathematically our n equations from above are equivalent to saying that vector Y is the product of matrix C times vector X, i.e.
Y = C X
Why go through the extra work to use matrix notation? Because it makes thinking about the problem so much easier. Honest!
How easy? Suppose for a moment Y, C, and X were just single numbers, and I asked you to find X where
6 = 2 X
No doubt you'd simply divide both sides by 2 (or multiply both sides by 1/2) and get that X = 3. That's about how easy it is to solve our n equations up above using basic linear algebra.
Defining the matrix inverse appropriately, we can solve for X (assuming C has inverse C-1) by simply finding the inverse of C and multiplying it by Y.
C-1 Y = X
While doing so by hand isn't trivial, finding the inverse of a matrix and multiplying is easy to do on a computer. For example, in R we can find the inverse of a matrix with a single line of code...
> C=matrix(c(0.25, -0.55, -0.12, 0.16), 2, 2); C [,1] [,2] [1,] 0.25 -0.12 [2,] -0.55 0.16 > Cinv = solve(C); Cinv [,1] [,2] [1,] -6.153846 -4.615385 [2,] -21.153846 -9.615385
Multiply by vector Y and you're done! Problem solved!
Practically, if we're doing all this on a computer with lots of equations (i.e. one for each pixel in an image) we want it to be fast and automated. Computers work very well with matrices, and as we saw above, can do everything all at once so we don't have to worry about dealing with one pixel at a time.
Conceptually, knowing some of linear algebra basics makes it's much easier to think about solving these problems using vectors and matrices rather than individual numbers and equations. Thinking about things in more abstract terms can therefore help guide our intuition for better ways to approach these kinds of problems, and guide the development of techniques that can be applied more broadly.
Moreover, we can generalize this way of looking at things to more complicated representations of linear transformations using things like tensors which likewise come in handy for challenging computational problems such as facial recognition and image manipulation.
For more on the math, you can play with interactive demonstrations of some simple rotations, contractions and expansions using matrices here, here, here, and here. Math resources include most linear algebra texts and various websites (e.g. here).
You can read more details on face morphing algorithm here and here. If you'd like to play with some images on your own, check out the Face of the Future site for some interactive demos.
[Hat tip to Hemant and the J-walk blog]
3 comments:
Wonderful learning guys I’m a fan of your website.
all vectors
You're shortchanging linear algebra by saying it is motivated by the goal of solving systems of linear equations!
Post a Comment