On-Line Computer Graphics Notes
THE VIEWING TRANSFORMATION


Overview

One of the most important operations in rendering is the projection of a three-dimensional scene onto a two-dimensional screen from an arbitrary camera position. A fundamental part of this operation is the specification of a viewing transformation, a $ 4 \times 4$ matrix that transforms a region of space into image space.


If you wish to ``cut to the chase'' and view the matrix directly, see the section The Viewing Transformation Matrix below.


Specification of the Parameters

We will assume that the user has defined a camera transform that can be applied to objects in the Cartesian frame, converting their coordinates into the local coordinate system of the camera. Given this, we can assume that we are viewing the scene from the origin of the camera frame, looking along the negative $ w$-axis, and that the scene has been transformed to lie along this axis. We also assume that the user has defined the following parameters: a field-of-view angle $ \alpha$, and the distances $ n$ and $ f$ from the camera to the near and far planes $ w=-n$ and $ w=-f$, with $ 0 < n < f$.

A three-dimensional view of the camera and its viewing space is given in the following figure.

\includegraphics {figures/camera-at-the-origin-1}

A side view of this space, with the $ u$-axis coming out of the paper, is given in the following figure. Note the field-of-view angle $ \alpha$.

\includegraphics {figures/camera-at-the-origin-2}

The specification of $ \alpha$ forms a viewing volume in the shape of a pyramid with the camera (placed at the origin) at the apex of the pyramid and the negative-$ w$ axis forming the axis of the pyramid. This pyramid is commonly referred to as the viewing pyramid.

The specification of the near and far planes forms a truncated viewing pyramid giving a region of space that contains the portion of the scene which is the ``center of attention'' of the camera. The viewing transform, defined below, will transform this truncated viewing pyramid onto the image space volume $ -1 \leq u,v,w \leq 1$.


The Viewing Transformation Matrix

Given the specification of the parameters $ (\alpha,n,f)$, we define a transformation that can be applied to all elements of a scene and takes the truncated viewing volume (bounded by the viewing pyramid and the planes $ w=-n$ and $ w=-f$) to the cube $ -1 \leq u,v,w \leq 1$. This transformation is given by

\begin{displaymath}
A _{\alpha, n, f} \: = \:
\left[
\begin{array}{cccc}
\cot{\frac{\alpha}{2}} & 0 & 0 & 0 \\
0 & \cot{\frac{\alpha}{2}} & 0 & 0 \\
0 & 0 & \frac{f+n}{f-n} & -1 \\
0 & 0 & \frac{2fn}{f-n} & 0
\end{array}\right]
\end{displaymath}

The transformation $ A _{\alpha, n, f}$ is commonly referred to as the viewing transformation and is developed below.
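As a rough illustration of how this matrix might be used, the following Python sketch builds $ A_{\alpha, n, f}$ as a nested list and applies it to points written as row vectors, which is the convention used in these notes. The function names `viewing_matrix` and `transform`, and the sample values of $ \alpha$, $ n$ and $ f$, are assumptions made for the example and do not come from the notes.

```python
import math

def viewing_matrix(alpha, n, f):
    """Build A_{alpha,n,f} for row vectors [u, v, w, 1] (alpha in radians)."""
    c = 1.0 / math.tan(alpha / 2.0)      # cot(alpha / 2)
    a = (f + n) / (f - n)
    b = 2.0 * f * n / (f - n)
    return [
        [c,   0.0, 0.0,  0.0],
        [0.0, c,   0.0,  0.0],
        [0.0, 0.0, a,   -1.0],
        [0.0, 0.0, b,    0.0],
    ]

def transform(point, m):
    """Multiply the row vector [u, v, w, 1] by m, then divide by the fourth coordinate."""
    row = list(point) + [1.0]
    out = [sum(row[i] * m[i][j] for i in range(4)) for j in range(4)]
    return tuple(x / out[3] for x in out[:3])

# Hypothetical sample parameters: 90 degree field of view, n = 1, f = 3.
A = viewing_matrix(math.radians(90.0), 1.0, 3.0)
near_center = transform((0.0, 0.0, -1.0), A)   # center of the near face
far_center = transform((0.0, 0.0, -3.0), A)    # center of the far face
print(near_center)
print(far_center)
```

With these sample values the center of the near face should land at $ w=1$ and the center of the far face at $ w=-1$, matching the image space cube described above.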


Development of the Matrix

The viewing transformation $ A_{\alpha, n, f}$ is not a combination of simple translations, rotations, scales or shears: its development is more complex. First, we motivate the development of the projection portion of the matrix and then apply this knowledge to the construction of the actual matrix.

Motivation

As motivation, consider the case when the camera is at the origin, the viewing direction is along the negative $ w$-axis, and points are to be projected along the line that passes through both the origin and the point, onto a plane defined by $ w=-d$. The following figure illustrates the projection of a point $ (u,v,w)$ onto the plane.

\includegraphics {figures/viewing-model-1a}

By a similar triangle argument

$\displaystyle v' \: = \: \left(\frac{d}{-w}\right) v
$

(we note that the distance is $ -w$ since the $ w$ coordinate of $ (u,v,w)$ is negative) and similarly

$\displaystyle u' \: = \: \left(\frac{d}{-w}\right) u
$

A transformation that projects

$\displaystyle (u,v,w) \longrightarrow \:
\left(
\frac{d u}{-w}, \frac{d v}{-w}, -d
\right)
\: = \:
\left(
\frac{d u}{-w}, \frac{d v}{-w}, \frac{d w}{-w}
\right)
$

can be expressed in 4-dimensional homogeneous coordinates by

$\displaystyle (u,v,w,1) \: \longrightarrow \: ( d u, d v, d w, -w )
$

which can be expressed in a matrix form by

\begin{displaymath}
P_d \: = \:
\left[
\begin{array}{cccc}
d & 0 & 0 & 0 \\
0 & d & 0 & 0 \\
0 & 0 & d & -1 \\
0 & 0 & 0 & 0
\end{array}\right]
\end{displaymath}

since

$\displaystyle \left[
\begin{array}{cccc}
u & v & w & 1
\end{array}\right]
\left[
\begin{array}{cccc}
d & 0 & 0 & 0 \\
0 & d & 0 & 0 \\
0 & 0 & d & -1 \\
0 & 0 & 0 & 0
\end{array}\right]
\: = \:
\left[
\begin{array}{cccc}
d u & d v & d w & -w
\end{array}\right]
$

So the projection shows up as a distinctive fourth column in the matrix. When the matrix is applied to a point $ (u,v,w)$, the fourth coordinate of the result is $ -w$, the distance of the point from the $ uv$ plane (positive because $ w$ is negative). Since we then divide by this fourth coordinate, the projected $ u$ and $ v$ values are inversely proportional to the distance of the point from the $ uv$ plane.
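The projection step can be sketched in a few lines of Python. This is only an illustration under the row-vector convention used above; the helper names `projection_matrix`, `apply_row_vector` and `project`, and the sample point, are assumptions made for the example.

```python
def projection_matrix(d):
    """P_d: maps the row vector [u, v, w, 1] to [d*u, d*v, d*w, -w]."""
    return [
        [d,   0.0, 0.0,  0.0],
        [0.0, d,   0.0,  0.0],
        [0.0, 0.0, d,   -1.0],
        [0.0, 0.0, 0.0,  0.0],
    ]

def apply_row_vector(row, m):
    """Compute the product row * m for a 4-element row and a 4x4 matrix."""
    return [sum(row[i] * m[i][j] for i in range(4)) for j in range(4)]

def project(point, d):
    """Project a 3D point with w < 0 onto the plane w = -d."""
    u, v, w = point
    h = apply_row_vector([u, v, w, 1.0], projection_matrix(d))
    # Perspective divide: the fourth coordinate h[3] equals -w.
    return tuple(x / h[3] for x in h[:3])

# Hypothetical sample: the point (2, 4, -8) projected onto the plane w = -2.
homogeneous = apply_row_vector([2.0, 4.0, -8.0, 1.0], projection_matrix(2.0))
projected = project((2.0, 4.0, -8.0), 2.0)
print(homogeneous)
print(projected)
```

The point lies at distance 8 from the $ uv$ plane and the projection plane at distance 2, so the $ u$ and $ v$ coordinates shrink by a factor of $ 2/8$ and the $ w$ coordinate becomes $ -d$.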


Now, consider the case where $ n$, $ f$ and the field of view $ \alpha$ are present as parameters, and it is necessary to transform the viewing pyramid defined by the angle $ \alpha$ and the planes $ w=-n$ and $ w=-f$ into the cube $ -1 \leq u,v,w \leq 1$.

\includegraphics {figures/viewing-model-1b}

To transform the truncated viewing pyramid to the cube, we will start with a transformation $ P$ of the form

\begin{displaymath}
P \: = \:
\left[
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & a & -1 \\
0 & 0 & b & 0
\end{array}\right] \text{.}
\end{displaymath}

Here we have incorporated the projection in the fourth column, and have recognized that we must scale the resulting $ w$ values (the $ w$ values in the pyramid range between $ -n$ and $ -f$ and the result in image space must lie between $ 1$ and $ -1$) and must translate the center of the truncated pyramid over to the origin. The values $ a$ and $ b$ are chosen so that the face of the truncated pyramid defined by the near plane ($ w=-n$) goes to the face $ w=1$ of the image space cube, and the face defined by the far plane ($ w=-f$) goes to the face $ w=-1$ of the cube. At a minimum, this implies that

$\displaystyle (0,0,-n) P$ $\displaystyle = ( 0,0,1 ), \: {\rm and}$    
$\displaystyle (0,0,-f) P$ $\displaystyle = ( 0,0,-1 )$    

and these two equations will enable us to calculate $ a$ and $ b$.

To calculate these, we apply the matrix to obtain

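\begin{displaymath}
\left[
\begin{array}{cccc}
0 & 0 & -n & 1
\end{array}\right]
\left[
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & a & -1 \\
0 & 0 & b & 0
\end{array}\right]
\: = \:
\left[
\begin{array}{cccc}
0 & 0 & -an + b & n
\end{array}\right]
\end{displaymath}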

and

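\begin{displaymath}
\left[
\begin{array}{cccc}
0 & 0 & -f & 1
\end{array}\right]
\left[
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & a & -1 \\
0 & 0 & b & 0
\end{array}\right]
\: = \:
\left[
\begin{array}{cccc}
0 & 0 & -af + b & f
\end{array}\right]
\end{displaymath}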

Projecting these back to three dimensions, we obtain

$\displaystyle (0,0,-n) P$ $\displaystyle = ( 0, 0, \frac{-an+b}{n} ), \: {\rm and}$    
$\displaystyle (0,0,-f) P$ $\displaystyle = ( 0, 0, \frac{-af+b}{f} )$    

That is, in order that the values on the left map to $ (0,0,1)$ and $ (0,0,-1)$, respectively, we must have

$\displaystyle -an + b$ $\displaystyle = n, \: {\rm and}$    
$\displaystyle -af + b$ $\displaystyle = -f$    

Subtracting the second equation from the first, we obtain

$\displaystyle af - an \: = \: f + n
$

or

$\displaystyle a = \left(\frac{f+n}{f-n}\right)
$

and by substitution,

$\displaystyle b$ $\displaystyle = n + an$    
  $\displaystyle = n ( 1 + a )$    
  $\displaystyle = n ( 1 + \frac{f+n}{f-n} )$    
  $\displaystyle = n ( \frac{f-n+f+n}{f-n} )$    
  $\displaystyle = \frac{2fn}{f-n}$    
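The algebra above can be checked numerically. The sketch below computes $ a$ and $ b$ from the two derived formulas and confirms that the near and far planes map to $ w=1$ and $ w=-1$ after the divide. The function names `solve_depth_coefficients` and `mapped_depth`, and the sample values $ n=0.5$ and $ f=10$, are assumptions chosen for illustration.

```python
def solve_depth_coefficients(n, f):
    """Return (a, b) with a = (f + n)/(f - n) and b = n + a*n."""
    a = (f + n) / (f - n)
    b = n + a * n
    return a, b

def mapped_depth(w, a, b):
    """Image space depth of a point at camera depth w (w < 0).

    Applying P to [0, 0, w, 1] gives third coordinate a*w + b and
    fourth coordinate -w; the perspective divide yields their ratio.
    """
    return (a * w + b) / (-w)

n, f = 0.5, 10.0                         # hypothetical near and far distances
a, b = solve_depth_coefficients(n, f)
closed_form_b = 2.0 * f * n / (f - n)    # the simplified expression for b

print(a, b, closed_form_b)
print(mapped_depth(-n, a, b))            # depth of the near plane
print(mapped_depth(-f, a, b))            # depth of the far plane
```

Up to floating-point rounding, `b` should agree with `closed_form_b`, and the two mapped depths should be $ 1$ and $ -1$.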

Substituting for $ a$ and $ b$, the transformation $ P$ becomes

\begin{displaymath}
P \: = \:
\left[
\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \frac{f+n}{f-n} & -1 \\
0 & 0 & \frac{2fn}{f-n} & 0
\end{array}\right]
\end{displaymath}

Unfortunately, this is not yet complete, as we have only ensured that the near and far faces of the truncated pyramid map to the corresponding faces of the cube. We also need to adjust for the left, right, top and bottom faces so that the whole truncated viewing pyramid is transformed to the image space cube. So consider points on the top plane that bounds the viewing pyramid.

\includegraphics {figures/viewing-model-1c}

If we apply our transformation to these points, we obtain

$\displaystyle \left[ \begin{array}{cccc} 0 & n \tan{\frac{\alpha}{2}} & -n & 1 \end{array} \right] P$ $\displaystyle = \left[ \begin{array}{cccc} 0 & n \tan{\frac{\alpha}{2}} & -n \frac{f+n}{f-n} + \frac{2fn}{f-n} & n \end{array} \right]$    
  $\displaystyle = \left[ \begin{array}{cccc} 0 & n \tan{\frac{\alpha}{2}} & n & n \end{array} \right]$    

Similarly

$\displaystyle \left[ \begin{array}{cccc} 0 & f \tan{\frac{\alpha}{2}} & -f & 1 \end{array} \right] P$ $\displaystyle = \left[ \begin{array}{cccc} 0 & f \tan{\frac{\alpha}{2}} & -f \frac{f+n}{f-n} + \frac{2fn}{f-n} & f \end{array} \right]$    
  $\displaystyle = \left[ \begin{array}{cccc} 0 & f \tan{\frac{\alpha}{2}} & -f & f \end{array} \right]$    

So after dividing by the fourth coordinate we see that the $ v$ coordinates of the points that lie on the line $ \left\{ (0, -w \tan{\frac{\alpha}{2}}, w ) \: {\rm for} \: w < 0 \right\}$ are all transformed to have a constant $ v$ value of

$\displaystyle \tan{\frac{\alpha}{2}}
$

Since we wish these $ v$ coordinates to be mapped to $ 1$ (which is the $ v$ value of the top of the image space cube), we must multiply our projection matrix by a matrix that scales the $ u$ and $ v$ coordinates by

$\displaystyle c \: = \: \frac{1}{\tan{\frac{\alpha}{2}}
}
$

That is, we define

$\displaystyle A _{\alpha, n, f} \: = \: S_{c,c,1} P
$

giving

\begin{displaymath}
A _{\alpha, n, f} \: = \:
\left[
\begin{array}{cccc}
\cot{\frac{\alpha}{2}} & 0 & 0 & 0 \\
0 & \cot{\frac{\alpha}{2}} & 0 & 0 \\
0 & 0 & \frac{f+n}{f-n} & -1 \\
0 & 0 & \frac{2fn}{f-n} & 0
\end{array}\right]
\end{displaymath}

This is the matrix that transforms the truncated viewing pyramid into image space.
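One way to check the complete matrix is to push the eight corners of the truncated viewing pyramid through it and confirm that they land on the corners of the cube. The sketch below does this in plain Python. The helper names, the sample parameters, and the use of the same half-angle $ \alpha/2$ in both the $ u$ and $ v$ directions (a square cross-section, as implied by the equal scale factors in the matrix) are assumptions made for the example.

```python
import math
from itertools import product

def viewing_matrix(alpha, n, f):
    """A_{alpha,n,f} for row vectors [u, v, w, 1] (alpha in radians)."""
    c = 1.0 / math.tan(alpha / 2.0)
    return [
        [c,   0.0, 0.0,                       0.0],
        [0.0, c,   0.0,                       0.0],
        [0.0, 0.0, (f + n) / (f - n),        -1.0],
        [0.0, 0.0, 2.0 * f * n / (f - n),     0.0],
    ]

def transform(point, m):
    """Row vector times matrix, followed by the perspective divide."""
    row = list(point) + [1.0]
    out = [sum(row[i] * m[i][j] for i in range(4)) for j in range(4)]
    return tuple(x / out[3] for x in out[:3])

def frustum_corners(alpha, n, f):
    """Corners of the truncated pyramid: four on w = -n, then four on w = -f."""
    t = math.tan(alpha / 2.0)
    return [
        (su * depth * t, sv * depth * t, -depth)
        for depth in (n, f)
        for su, sv in product((-1.0, 1.0), repeat=2)
    ]

# Hypothetical sample parameters.
alpha, n, f = math.radians(60.0), 1.0, 100.0
A = viewing_matrix(alpha, n, f)
image_corners = [transform(p, A) for p in frustum_corners(alpha, n, f)]
for corner in image_corners:
    print(tuple(round(x, 6) for x in corner))
```

Every printed coordinate should be $ \pm 1$ up to rounding, with the four near-plane corners at $ w=1$ and the four far-plane corners at $ w=-1$.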


The Inverse of the Viewing Transformation

The above transformation takes the truncated viewing pyramid into image space. The inverse of this transformation does the opposite: it takes points from image space back into the truncated viewing pyramid. This inverse is given by

\begin{displaymath}
{A _{\alpha, n, f}} ^{-1} \: = \:
\left[
\begin{array}{cccc}
\tan{\frac{\alpha}{2}} & 0 & 0 & 0 \\
0 & \tan{\frac{\alpha}{2}} & 0 & 0 \\
0 & 0 & 0 & \frac{f-n}{2fn} \\
0 & 0 & -1 & +\frac{f+n}{2fn}
\end{array}\right]
\end{displaymath}

That this is actually the inverse can be checked by multiplying it by $ A _{\alpha, n, f}$ and confirming that the product is the identity matrix.
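A numerical version of this check might look like the following sketch, which multiplies the two matrices and compares the product with the identity to within a small tolerance. The function names, the tolerance, and the sample parameters are assumptions made for the example.

```python
import math

def viewing_matrix(alpha, n, f):
    """A_{alpha,n,f} for row vectors [u, v, w, 1] (alpha in radians)."""
    c = 1.0 / math.tan(alpha / 2.0)
    return [
        [c,   0.0, 0.0,                       0.0],
        [0.0, c,   0.0,                       0.0],
        [0.0, 0.0, (f + n) / (f - n),        -1.0],
        [0.0, 0.0, 2.0 * f * n / (f - n),     0.0],
    ]

def inverse_viewing_matrix(alpha, n, f):
    """The inverse of A_{alpha,n,f} as written above."""
    t = math.tan(alpha / 2.0)
    return [
        [t,   0.0,  0.0, 0.0],
        [0.0, t,    0.0, 0.0],
        [0.0, 0.0,  0.0, (f - n) / (2.0 * f * n)],
        [0.0, 0.0, -1.0, (f + n) / (2.0 * f * n)],
    ]

def matmul(x, y):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def is_identity(m, tol=1e-12):
    """True if m equals the 4x4 identity to within tol in every entry."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(4) for j in range(4))

# Hypothetical sample parameters.
alpha, n, f = math.radians(45.0), 0.1, 50.0
forward = viewing_matrix(alpha, n, f)
backward = inverse_viewing_matrix(alpha, n, f)
print(is_identity(matmul(forward, backward)))
print(is_identity(matmul(backward, forward)))
```

Both products are tested because a matrix and its inverse must commute. A tolerance is used rather than exact equality since the entries are floating-point values.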


Summary

We have developed a matrix that works in the local coordinates of the camera space and transforms the points of an object into image space. The matrix is applied in homogeneous space, so the perspective divide (division by the fourth coordinate) must be done after the viewing matrix is applied.




This document maintained by Ken Joy


All contents copyright (c) 1996, 1997, 1998, 1999
Computer Science Department
University of California, Davis

All rights reserved.


Ken Joy
1999-12-06