A camera, a wonderful device that brings two pieces of magic together:
- Casting a real image of the subject on a surface
- Capturing that image for posterity
Seeing an image of an object on a screen is magical enough: I've got a 'wow!' from my children when I've shown them an image of some commonplace luminous or brightly illuminated object projected on a wall using a
magnifying glass, or on the tracing-paper screen of a home-made
pinhole camera.
Now, in everyday life, we don't see real images too often. The images we are used to seeing are
virtual images, which can't be cast on a screen, and
shadows, which can.
So how do real images happen?
Let's take a very simple case of an object (imagine it is luminous or brightly illuminated if that will help) and a flat screen. We will have a sharp real image of the object on the screen if we can set things up such that:
- For each distinct point $A_i$ of the side of the object facing the screen, there is a corresponding distinct point $B_i$ on the screen such that each and every ray of light falling on the screen that originated at $A_i$ falls on the screen at point $B_i$.
- No ray of light originating from any other point (on the object, or elsewhere) passes through $B_i$.
How can this be done?
One way is to put, between the object and the screen, a thin, opaque plate with a pinhole $P$ such that the plate is perpendicular, at $P$, to the line joining the mid-points of the object and the screen. This way, for each $A_i$, the point $B_i$ is where the extension of the straight line $A_{i}P$ intersects the screen. All rays originating from $A_i$, other than the one along the line $A_{i}B_{i}$, are physically blocked by the plate.
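To make the geometry concrete, here is a tiny sketch in Python (the coordinates and the helper `pinhole_image` are my own, purely for illustration): it just applies the similar triangles around $P$.

```python
# A tiny numeric sketch of the pinhole geometry just described: the image point
# B_i is where the straight line from A_i through the pinhole P meets the screen.
def pinhole_image(a, u, v):
    """a = (x, y) of an object point in a plane a distance u in front of the
    pinhole (taken as the origin); the screen is a distance v behind it.
    Returns the (x, y) of the corresponding image point B on the screen."""
    x, y = a
    scale = v / u                        # similar triangles about the pinhole
    return (-x * scale, -y * scale)      # the image is inverted and scaled by v/u

print(pinhole_image((5.0, 10.0), 100.0, 25.0))   # -> (-1.25, -2.5): upside down, 1/4 the size
```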
The pinhole works by
eliminating light: the images it creates are therefore faint. It is constrained in this way because it can only work with light traveling in straight lines. A better method to create a real image would be to have a contraption that
bends the light rays in such a way that a much larger fraction of light rays that originate from the same source point $A_i$ (some of which otherwise may not even have touched the screen) are bent just right so as to pass through the same destination point $B_i$.
Either of the arrangements illustrated in the following diagram would do this:
Here, we have an object $A_{0}A_{1}$ of height $h_A$ that forms an image $B_{0}B_{1}$ of height $h_B$. In each of the two diagrams, we have something--a device--to bend the light rays, and we show that device as a thick grey line. The center of our contraption is designated $P$, and the line perpendicular to it at $P$ is called the
axis. Our object and image are both perpendicular to this axis.
When we use a pinhole, an image is formed irrespective of the distance
$u$ between the object and the pinhole, and the distance
$v$ between the pinhole and the screen. With the methods suggested by the diagram, however, this is no longer the case.
First, we consider the
red ray: the one that originates at $A_1$ and passes through $P$. For the type of device for which the left-side diagram is applicable, this ray proceeds in a straight line through $P$ to the other side, just as it would have done through a pinhole. For the type of device for which the right-side diagram is appropriate, this ray is reflected at $P$ just as it would have been by a plane mirror.
Next we consider the
blue ray. This one originates at $A_1$ and proceeds parallel to the axis up to $X$, where it is bent by our contraption. From $X$, this ray travels in the new direction until it intersects the red ray. This point of intersection is $B_1$, which is by definition the point where all rays from $A_1$ converge. On the way, the blue ray intersects the axis at point $F$.
A little geometry now. Irrespective of whether we choose to look at the diagram on the left or the one on the right, we can see that triangles $\triangle P B_0 B_1$ and $\triangle P A_0 A_1$ are
similar, so $h_B / h_A = v/u$. We can also see that triangles $\triangle F B_0 B_1$ and $\triangle FPX$ are similar, so $h_B / h_A = (v-f)/f$.
To these two observations, we can apply a little algebra to get the well known formula: $$\frac{1}{u} + \frac{1}{v}=\frac{1}{f}$$ We can also obtain another useful result: $$\frac{h_A}{h_B} = \frac{u}{f} - 1$$ The point $F$ is called a
focus of our device: a beam of light parallel to the axis will, after hitting our device, converge at the focus. The value
$f$, being the length $PF$, is called the
focal length of our device. It is not possible to obtain a real image of any object for which $u \lt f$ (we get a virtual image instead).
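For those who like to double-check the algebra, here is a small symbolic sketch in Python (assuming the `sympy` library is available; the script is mine, just for verification) that combines the two similar-triangle relations and recovers both results:

```python
# Symbolic check that the two similar-triangle relations combine into the
# thin-lens formula and the magnification result quoted above.
import sympy as sp

u, v, f = sp.symbols('u v f', positive=True)

# h_B/h_A = v/u        (triangles P B0 B1 and P A0 A1)
# h_B/h_A = (v - f)/f  (triangles F B0 B1 and F P X)
v_sol = sp.solve(sp.Eq(v / u, (v - f) / f), v)[0]      # v = u*f/(u - f)

assert sp.simplify(1/u + 1/v_sol - 1/f) == 0           # 1/u + 1/v = 1/f
assert sp.simplify(u/v_sol - (u/f - 1)) == 0           # h_A/h_B = u/v = u/f - 1
```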
Turns out that such a device can be built in many ways. The ray diagram to the left can be satisfied by:
- Using refraction at curved boundaries between media that have different refractive index values, with a convex lens (biconvex, plano-convex or positive meniscus) placed between the object and the screen. This will collect all light rays that hit the lens. For a converging lens, $f$ depends on the geometry of the lens and the refractive index of the material it is made of: the refractive index in turn depends on the wavelength (roughly, the colour) of the rays of light.
- Using refraction within a single medium with gradually varying refractive index, with a gradient-index lens. This will collect all light rays that hit the lens. The focal length $f$ depends on the geometry of the lens and the variation of the refractive index of the material it is made of.
- Using diffraction, with a zone plate (ideally, a 'sinusoidal zone plate', a.k.a. 'Gabor zone plate' or GZP) placed between the object and the screen. This will collect roughly half the light rays that hit the zone plate—the other half will be blocked by the plate itself. For a zone plate, $f$ depends on the geometry of the plate and the wavelength of the light: for a given geometry, it is inversely proportional to the wavelength.
The diagram to the right can be satisfied too:
- Using reflection, with a concave mirror placed such that both the object and screen are on the same side (the reflecting side) of the mirror. This will collect all the light rays that hit the mirror, minus any that are blocked by the screen or the object (whichever is closer to the mirror). For a concave mirror, $f$ depends only on the geometry of the mirror.
- Using a combination of reflection and refraction, with a converging lens with one side silvered. This will work pretty much the same as a concave mirror, except that $f$ will now depend both on the geometry and the wavelength as in a lens.
Note that the thickness and shape of the lens or mirror is ignored here: this is because all of this works under the approximation that the lens is very thin, or the concave mirror is very shallow.
Pinholes, zone-plates, curved mirrors and lenses have all been used for photography at some time or another. Today, pinholes and zone plates are alive in a class of artistic photography. Concave mirrors remain the mainstay of photography as used in astronomy, and can also be found in some telephoto lenses.
However, it is the lens that rules the roost when it comes to image forming in photography. One possible reason for this is the ease with which lenses can be combined in a 'compound lens'. This allows us to fine tune optical properties such as the focal length and its variation with wavelength.
Actually, all types of optical elements—lenses, mirrors and zone plates—can be combined in an optical path between the object and the screen. And not every element in such a combination needs to be able to cast a real image by itself, either—it is perfectly possible to throw in a diverging lens or a convex mirror (these have $f < 0$) or even a plane mirror or reflecting prism (these have $f = \infty$) into the mix if that is what is needed. But it is more convenient with a lens because it can be inserted into the straight line path between the object and the screen
without blocking any part of the light path.
So how do these combinations work? Consider two converging lenses, of focal lengths $f_1$ and $f_2$ placed along the same axis, at a distance of $w$ from each other. Like the following diagram:
It can be shown that the focal length $f$ of the combined setup is $$f = \frac {f_1 f_2} {f_1 + f_2 - w}$$ If we consider a beam of light parallel to the axis coming from the
left and hitting this combined setup, it would converge at a point that is at a distance $f$ to the right of the
second principal plane, which is at a distance $s_2$ behind (to the left of) the second lens, where: $$s_2 = \frac {w f_2} {f_1 + f_2 - w}$$ If we consider a similar beam coming from the
right, we'll find that it converges at the same distance $f$ to the left of the
first principal plane, which is $s_1$ to the right of the first lens, where $$s_1 = \frac {w f_1} {f_1 + f_2 - w}$$ The distance $t$ between the two principal planes is, then: $$t = \frac {-w^2} {f_1 + f_2 - w}$$ Even though this compound lens can no longer satisfy the 'thin lens' approximation, it turns out that the equations $1/u + 1/v = 1/f$ and $h_A/h_B = u/f-1$ will still hold, with the proviso that $u$ is measured from the first principal plane, while $v$ is measured from the second principal plane.
These equations hold even if any or all the lenses are diverging lenses, that is even if $f_1$ or $f_2$ or both are negative. Irrespective of whether any of $f_1$ or $f_2$ is negative, the final $f$ has to be positive if the combined system is to cast an image.
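To make these formulas easy to play with, here is a minimal Python sketch (the helper `combine` and the sample numbers are my own, purely illustrative):

```python
# A small sketch of the two-thin-lens combination formulas derived above.
def combine(f1, f2, w):
    """Return (f, s1, s2, t) for two thin lenses of focal lengths f1 and f2
    placed a distance w apart.  s1 is measured to the right of the first lens,
    s2 to the left of the second lens, t between the two principal planes."""
    denom = f1 + f2 - w
    f = f1 * f2 / denom
    s1 = w * f1 / denom
    s2 = w * f2 / denom
    t = -w**2 / denom
    return f, s1, s2, t

# Two 100 mm converging lenses placed 50 mm apart:
print(combine(100.0, 100.0, 50.0))
# -> roughly (66.7, 33.3, 33.3, -16.7): a 66.7 mm combination, with the
#    principal planes 'crossed' (t is negative)
```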
Let us count the blessings:
- The two lenses can be made with materials of different dispersion (variation of refractive index with wavelength) so that the combination can have a focal length independent of colour. We get achromatic lenses.
- Given an object and the screen at fixed positions with respect to one another (i.e. a constant $u+v+t$), by simply adjusting $w$ it is possible to change the focal length of the combination, thus changing the image size $h_B$ cast on the screen. In other words, we get zoom.
- It is possible to design a compound lens such that the second principal plane is even before the first lens in the light path. This lets us have a camera with a focal length that is larger than the size of the camera itself, if so desired. This can be done, for example, with a converging left-side lens such that $f_1 \gt w$ and a diverging right-side lens such that $-f_2 \gt f_1 - w$. This is the principle behind most telephoto lenses (a numeric sketch of this follows the list).
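Continuing the little sketch above, here is the telephoto idea in numbers (all values assumed for illustration):

```python
# An assumed telephoto-style combination with a converging front lens (f1 > w)
# and a diverging rear lens (-f2 > f1 - w), using combine() from the sketch above.
f1, f2, w = 100.0, -40.0, 70.0          # f1 > w = 70 and -f2 = 40 > f1 - w = 30
f, s1, s2, t = combine(f1, f2, w)
print(f, s2)
# -> f = 400.0, s2 = 280.0: the second principal plane lies 280 mm in front of
#    the rear lens (210 mm ahead of the front lens), so the image forms only
#    400 - 280 = 120 mm behind the rear lens despite the 400 mm focal length.
```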
Let us now look at an aspect we've been side-stepping so far. While the screen can accurately be considered a two-dimensional plane, the same cannot in general be said about the object. Whatever we are photographing is usually three-dimensional, so when we've cast its image on the screen for any given $u$, only that part of the object that intersects a plane at distance $u$ from the lens will be exactly 'in focus'. For everything else, the focus would be somewhere in front of or behind the screen: on the screen, they will be 'out of focus'.
In general, an out-of-focus point of light will be a patch of light of the same shape as the aperture through which the light hits the lens. In cameras, this shape is usually a circle, or a regular polygon when an iris-diaphragm is in use. The size, as in diameter, of the patch will be proportional to the distance of the screen from the actual in-focus position for that point. It will also be fainter the farther it is from the in-focus position, the brightness being inversely proportional to the square of this distance. These patches of light merge with each other, and make out-of-focus images blurred.
Images captured by a camera are meant to be seen by the human eye, which cannot resolve anything finer than about 1 minute of arc, that is, about 0.0003 radians. This means that any shape that subtends an angle of less than this cannot be distinguished from a point.
For a given captured picture, this angle depends on the size the image is scaled to when viewed, as well as the distance the picture is viewed from: for example, say a 24mm x 36mm capture is printed at 10cm x 15cm, and is viewed from a distance of 50 cm. An angle of 0.0003 radians will be subtended by 0.15mm (50cm x 0.0003) on the print, which will be 36 microns (0.15mm x 24mm / 10 cm) when captured. If the same picture is printed at 30cm x 45 cm, the same 0.0003 radians will be some 12 microns (0.15mm x 24mm / 30cm) when captured.
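Here is the same arithmetic as a quick check in Python (nothing assumed beyond the figures in the example):

```python
# The eye's ~0.0003 rad limit mapped back onto a 24 mm x 36 mm capture frame
# for the two print sizes in the example above.
eye_limit_rad = 0.0003
viewing_dist_mm = 500.0                    # viewed from 50 cm
frame_short_mm = 24.0                      # short side of the capture frame

for print_short_mm in (100.0, 300.0):      # 10 cm x 15 cm and 30 cm x 45 cm prints
    on_print_mm = eye_limit_rad * viewing_dist_mm               # 0.15 mm on the print
    on_capture_um = on_print_mm * frame_short_mm / print_short_mm * 1000.0
    print(f"{print_short_mm / 10:.0f} cm print -> {on_capture_um:.0f} microns on the capture")
# -> 10 cm print -> 36 microns;  30 cm print -> 12 microns
```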
Ultimately, the resolution of the image when viewed is limited by the image as captured. Film will resolve down to the grain size, which is anything from 0.2 to 2 microns. Digital sensors will resolve down to the pixel size, which is a function of the 'sensor size' and 'megapixels' parameters: today, it is anything between 1 and 10 microns.
This tells us that some amount of 'out-of-focus' is
acceptable, whether we define it from the resolution of the capture technology, or from the point of view of the print size and viewing distance. The maximum acceptable size of the patch of light in the image corresponding to an out-of-focus point is called the
circle of confusion.
In the diagram above, the diameter of the circle of confusion is marked as 'c', while the diameter of the aperture that lets light in is marked 'd'. In the picture, the point at distance
$u_0$ is perfectly in-focus at the screen, while points up to
$u_n$ on one side and
$u_f$ on the other side are
acceptably out-of-focus. With simple geometry, it can be shown that: $$u_n = \cfrac {du_0f}{df+cu_0-cf} \qquad\qquad u_f = \cfrac {du_0f}{df-cu_0+cf} \qquad\qquad \cfrac 1 {u_n} + \cfrac 1 {u_f} = \cfrac 2 {u_0}$$ Any object lying between $u_n$ and $u_f$ will appear to be in focus. The distance $e$ between $u_n$ and $u_f$ is called the
depth of field. $$\begin{align}e &= u_f - u_n \\ &= 2u_0 \cfrac {\left(\cfrac{c}{d}\right)\left(\cfrac{u_0}{f}-1\right)} {1 - \left(\cfrac{c}{d}\right)^2\left(\cfrac{u_0}{f}-1\right)^2},\qquad f\le u_0 \le H \\ &= u_n \cfrac { u_n - f } {\cfrac{H}{2} - u_n},\qquad f \le u_n \le \cfrac{H}{2} \\ \end{align}$$ Here, $H$ is the
hyperfocal distance defined as: $$H = \left(\frac{d}{c} + 1\right)f$$ When $u_0 = H$, we get $u_n = H/2$ and $u_f = e = \infty$.
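To get a feel for these formulas, here is a minimal numeric sketch in Python, with assumed values of $f$ = 50 mm, $d$ = 25 mm and $c$ = 0.03 mm:

```python
# Near limit, far limit, depth of field and hyperfocal distance for assumed
# values: f = 50 mm, aperture d = 25 mm (f/2), circle of confusion c = 0.03 mm.
f, d, c = 50.0, 25.0, 0.03
H = (d / c + 1) * f                                    # hyperfocal distance, ~41.7 m

def dof(u0):
    """Return (u_n, u_f, e) for a focus distance u0 (all lengths in mm)."""
    u_n = d * u0 * f / (d * f + c * u0 - c * f)
    u_f = d * u0 * f / (d * f - c * u0 + c * f) if u0 < H else float('inf')
    return u_n, u_f, u_f - u_n

print(round(H))          # 41717
print(dof(5000.0))       # ~ (4469, 5674, 1205): a subject 5 m away gives ~1.2 m of DOF
print(dof(H))            # (~20858, inf, inf): focus at H, and everything from H/2 onward is sharp
```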
It should be noted that this depth of field shrinks quickly for points farther away from the axis: these points would cast elliptical patches of light on the screen instead of circular ones, and such patches would only be indistinguishable from points if the longer (transverse) diameter of the ellipse is less than $c$. This diameter will be longer the further the point is from the axis.
As $u_n$ goes from $f$ to $H/2$, $u_0$ goes from $f$ to $H$, and $e$ increases monotonically from $0$ to $\infty$. Beyond this, no point farther than $u_0$, even one at infinity, can cast a patch of light larger than $c$, so the depth of field stays $\infty$. Simply put, if we
focus on a more distant object, we get a
larger depth of field.
Let's consider the focal length $f$ next. It can be seen that as long as $f \le u_n \le H/2$, as $f$ increases, the numerator $u_n - f$ decreases and the denominator $H/2 - u_n$ increases. The net result is that as $f$ increases, $e$ decreases. That is, if we use a
larger focal length, we get a
smaller depth of field.
Next we look at the aperture diameter $d$. It is evident that as $d$ increases, $H$ also increases, and consequently the denominator of the expression for $e$ increases. Therefore, as $d$ increases, $e$ decreases. In other words, if we have a
larger aperture, we have a
smaller depth of field.
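A quick sweep (again in Python, with assumed numbers and the near limit held fixed at 3 m) shows both trends:

```python
# Depth of field e = u_n (u_n - f) / (H/2 - u_n) with the near limit held fixed,
# sweeping first the focal length and then the aperture (assumed numbers).
c, u_n = 0.03, 3000.0

for f in (35.0, 50.0, 85.0):                      # longer focal length ...
    H = (25.0 / c + 1) * f                        # ... at a fixed d = 25 mm
    print(f, round(u_n * (u_n - f) / (H / 2 - u_n)))      # -> 767, 496, 269 mm: smaller DOF

for d in (12.5, 25.0, 50.0):                      # larger aperture ...
    H = (d / c + 1) * 50.0                        # ... at a fixed f = 50 mm
    print(d, round(u_n * (u_n - 50.0) / (H / 2 - u_n)))   # -> 1189, 496, 229 mm: smaller DOF
```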
Finally, we come to $c$, the diameter of the circle of confusion. And this is where we have to step very carefully ... for what would changing $c$ actually
mean? If we take the simplistic approach of keeping $u_n$, $f$ and $d$ the same, a smaller $c$ will result in a smaller depth of field ... but is that what we are looking for?
If, however, we go back to the original definition of 'circle of confusion', we see that $c$ is, in fact, a specified fraction of a specified length dimension of the capture frame. This is more easily understood for digital sensors: if we take two capture frames of different sizes but the same megapixel resolution, it is easy to see that the pixel size in the larger frame will be larger. For example, if we consider a 4:3 12 megapixel sensor, its resolution is 4000 x 3000, and thus the pixel diameter will be 1/4000 the width of the capture frame. Thus, changing $c$ will mean changing the size of the captured frame: such as the film format, or the digital image sensor size. Now, what exactly do we wish to keep the same? Let's say we want the 'same picture' in the two capture frame sizes. Since the DOF will be different, it will obviously not be exactly the same picture, so let us interpret 'same picture' to mean 'same composition and same brightness of image'.
Suppose the new capture frame size is $n$ times the old one: the height and width each have been scaled $n$ times. This multiplies $c$ by $n$, or $c_{NEW} = n \cdot c_{OLD}$.
Since the frame size is proportional to $c$, multiplying $c$ by $n$ will multiply the
area of the image by $n^2$. If we want the brightness of the image, defined as 'light per unit area', to be the same, we need $n^2$ times as much light to be admitted. The amount of light admitted is proportional to $d^2$, so in order to keep the image brightness the same, we need to multiply $d$ by $n$ too: $d_{NEW} = n \cdot d_{OLD}$.
This leaves the ratio $c:d$ unchanged.
To keep the 'same composition', we have to multiply the image size by $n$ as well. To do this, we need to change $f$. But by how much? If the new $h_B$ is $n$ times the old one, the new ${h_A}/{h_B}$ is $1/n$ times the old value. As $h_A/h_B = u/f - 1$, this means ${u_0}/{f_{NEW}}-1 = ({u_0}/{f_{OLD}}-1 )/n$.
Now let us take another look at the expression for $e$ expressed in terms of $u_0$. We can say: $$e=2u_0 \cfrac{z}{1-z^2},\qquad z=\left(\frac{c}{d}\right)\left(\frac{u_0}{f}-1\right), \qquad 0 \le z \le 1$$ Since ${c}/{d}$ is unchanged, this means $z_{NEW} = {z_{OLD}}/{n}$. Now, as $z$ increases from $0$ to $1$, the value $\frac{z}{1-z^2}$ increases monotonically from $0$ to $\infty$. Since $n \gt 1$ implies $z_{NEW} \lt z_{OLD}$ and
vice-versa, if we use
a larger format, we get a
smaller depth of field.
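Putting the whole argument into numbers (all of them assumed, purely for illustration):

```python
# Scaling the capture frame by n while keeping composition and brightness:
# c and d scale by n (so c/d is unchanged) and f is chosen so that
# u0/f - 1 shrinks by a factor of n.  The DOF formula then gives a smaller e.
u0 = 3000.0                            # subject distance, mm
c1, d1, f1 = 0.03, 25.0, 50.0          # original format
n = 2.0                                # new frame is twice as large

c2, d2 = n * c1, n * d1                        # same c/d, same image brightness
f2 = u0 / ((u0 / f1 - 1) / n + 1)              # same composition

def depth(u0, c, d, f):
    z = (c / d) * (u0 / f - 1)
    return 2 * u0 * z / (1 - z * z)

print(round(depth(u0, c1, d1, f1)), round(depth(u0, c2, d2, f2)))
# -> 427 vs 213 mm: the larger format ends up with the smaller depth of field
```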
Enough about depth of field. But while talking about it, we've mentioned brightness of the image, so let's dwell on that for a moment.
This brightness is something that can be expressed, roughly, as 'light per unit area' on the screen where the image is projected. Consider that this light originates at the object being photographed, which is a distance $u$ away. Consider also that the sum total of light that falls on the screen is that which comes in through that circle of diameter $d$ at the first lens: so far, we've considered this to be the size of the lens itself, but it could equally well be a hole, an
aperture, with a $d$ we can select, placed just before the lens.
From the inverse-square law, the light per unit area available at the position of the aperture is proportional to $1/u^2$. This means the amount of light actually admitted is proportional to $d^2/u^2$.
Suppose we happen to be photographing the same object from different distances $u_1$ and $u_2$, using the same capture frame size. Let us say we want to have the same size of image, and get the same brightness for both the images. So what do we have to do?
Since the brightness has got to be the same, we must have ${d_1}^2/{u_1}^2 = {d_2}^2/{u_2}^2$. As the image size $h_B$ has got to be the same as well, from the relation $h_A/h_B + 1 = u/f$, we can see that we have to use different focal lengths $f_1$ and $f_2$ such that $u_1/f_1 = u_2/f_2$. Combining the two, we can see that if we want the
same image brightness and
same image size, we need $f/d$ to remain the same. This ratio is called the
f-number, and will play a starring role in what we are going to talk about next.
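As a tiny worked example (numbers assumed): shoot the same subject from twice as far while keeping the image size and brightness the same, and the ratio $f/d$ stays put.

```python
# Same subject, same image size, same brightness, from two different distances:
# the focal length and the aperture diameter must scale together, so f/d is fixed.
u1, f1, d1 = 2000.0, 50.0, 25.0       # first setup: a 50 mm lens at f/2
u2 = 2.0 * u1                         # shoot again from twice the distance

f2 = f1 * u2 / u1                     # same image size needs u1/f1 == u2/f2  -> 100 mm
d2 = d1 * u2 / u1                     # same brightness needs d1/u1 == d2/u2  -> 50 mm

print(f1 / d1, f2 / d2)               # -> 2.0 2.0: the f-number is unchanged
```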
This brings us to the
other bit of magic inside the camera: the
capturing of the image that has been cast on the screen.
This is deeper magic. For close to two centuries, this magic was done using chemistry, with photographic plates and film that depended on the properties of a magic element ...
silver!
Exposure to light causes many silver salts, notably the silver halides
AgI,
AgBr and
AgCl to break down, releasing metallic silver. This is the basic chemistry used in photography.
Take a sheet of good quality white paper—soak it in a solution of
NaCl (salt), dry it, coat it with a solution of
AgNO3 (silver nitrate) and dry it again—and, placing a few coins on it, expose it to bright sunlight for half an hour or so. The paper will have darkened, except under the coins where the sunlight couldn't reach it. The shadow of the coins will have been
captured on the sheet of paper (making not a photograph, but what is known as a
photogram).
What happened? Well,
AgNO3 + NaCl → NaNO3 + AgCl for starters. The
AgCl, being insoluble in water, is precipitated on the paper. Subsequently, light breaks it down to metallic silver, which is what made the paper dark.
When we take a snapshot, however, we certainly aren't exposing our film for half an hour to sunlight! When the silver halide (which is in the film) is exposed for a fraction of a second to an image which is much less bright in comparison to direct sun-light, the breakdown of the silver halide is far from complete. After the exposure, some 'grains' of the halide will have a few atoms of silver in them, and many others will have none. Areas of the image that were brighter will have a larger proportion of grains containing free silver atoms.
There is
another interesting property of silver halides that helps us here. Certain chemicals will break down a silver halide crystal to release all the silver in it
only if the crystal contains some atoms of metallic silver to start with. This is called 'developing' the image.
The image that we get after developing is evidently 'reversed', with the bright portions of the object rendered dark on the image and vice-versa. This won't do for the final product, obviously. To get it 'right', we first ensure that when capturing the image in the camera, the silver halide is coated over a
transparent base: historically glass, and polyester film today. Then:
- If we want our final print on a paper base, first a 'fixer' such as a solution of Na2S2O3 (sodium thiosulphate or 'hypo'), (NH4)2S2O3 (ammonium thiosulphate) or KCN (potassium cyanide) is used to wash away any remaining silver halide—so further exposure to light has no effect on the image. (If we want our photogram of the coins to be permanent, we should do this too!) Next, a photogram of this 'negative' image is taken ... with a halide coating on paper this time. It is developed and fixed, and we have a printed photograph.
- If we want a transparency (slide), we can get it with a 'reversal' process. In this case, we would treat the developed film with chemicals that would remove the silver, leaving the remaining halide intact. Then the film would be treated with another kind of chemical (called a 'fogging developer'), one that would convert all silver halide to silver. This gives us the 'positive' slide that we want.
Photochemical reactions are usually sensitive to the wavelength (colour) of the light, and the response is like a step-function: there is a maximum wavelength above which the reaction won't happen. Silver halides are actually sensitive to the shorter wavelength part of the spectrum: blue, violet, ultraviolet and beyond. It turned out that adding certain dyes to the halide could 'sensitize' them to longer wavelengths too, giving more natural black-and-white photographs.
All of this gives us only black-and-white photographs. But