The Special Theory of Relativity - UCM Faculty (faculty.ucmerced.edu/dkiley/Special Relativity.pdf)


The Special Theory of Relativity

“The views of space and time which I wish to lay before you have sprung from the soil of experimental physics, and therein lies their strength. They are radical. Henceforth space by itself, and time by itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.” Hermann Minkowski, 1908.

Newton’s laws of motion and gravity have been wildly successful in the everyday areas in which they are applicable. However, once we move past these everyday applications and begin to discuss systems moving at speeds approaching that of light, or systems in strong gravitational fields, Newton’s laws have to be modified. In this chapter we discuss the modifications that Einstein made to Newton’s laws of motion, which gave his Special Theory of Relativity, leaving the changes to the gravitational law to be discussed later. We begin with Einstein’s motivation for Special Relativity, which stemmed from his investigations of Maxwell’s equations for electrodynamics. The laws of electrodynamics predict that electromagnetic waves propagate through the vacuum at a precise speed, c, which seems to be independent of the relative motion of the observer. This was a tremendous puzzle, requiring new concepts which ultimately led to a fundamentally new understanding of the nature of the Universe.

Part of that understanding is developed in this chapter. We discuss the observations made by two different observers in relative motion, deriving the Lorentz transformations. These transformations lead to some very unintuitive ideas, mixing together space and time into a single spacetime, whose interesting properties we will study through various examples. To better understand the new concepts introduced by Special Relativity, we need some new mathematical ideas, including four-vectors and tensors. These ideas generalize the simple concepts of scalars and vectors from elementary physics so that they are consistent with Special Relativity. We explore these mathematical ideas, which will be extremely important in our later work, once again using electrodynamics as a guide. We finish the chapter with a discussion of the Minkowski metric and gauge invariance, each of which will be crucial later.

1 Electrodynamics.

Einstein was led to modify Newton’s laws of motion by considering a puzzling aspect of the laws of electrodynamics, namely Maxwell’s equations. These equations for the electric and magnetic fields, $\vec{E}$ and $\vec{B}$, respectively, are

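$$
\begin{aligned}
\nabla\cdot\vec{E} &= \frac{\rho}{\varepsilon_0}, \\
\nabla\times\vec{E} &= -\frac{\partial\vec{B}}{\partial t}, \\
\nabla\cdot\vec{B} &= 0, \\
\nabla\times\vec{B} &= \mu_0\vec{J} + \mu_0\varepsilon_0\frac{\partial\vec{E}}{\partial t}.
\end{aligned}
\tag{1}
$$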

Here $\rho$ is the charge density, and $\vec{J}$ is the current density. The constants $\varepsilon_0$ and $\mu_0$ are the permittivity and permeability of free space, respectively. Together with the Lorentz force law,

$$
\vec{F} = q\left(\vec{E} + \vec{v}\times\vec{B}\right), \tag{2}
$$

which gives the force on a charge $q$ moving with velocity $\vec{v}$ through an electric and magnetic field, Eq. (1) gives a complete description of electricity and magnetism.

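As a specific example of these laws, let’s look at the source-free (vacuum) Maxwell’s equations, obtained from Eq. (1) by setting the sources $\rho$ and $\vec{J}$ to zero. Then Eq. (1) reads

$$
\begin{aligned}
\nabla\cdot\vec{E} &= 0, \\
\nabla\times\vec{E} &= -\frac{\partial\vec{B}}{\partial t}, \\
\nabla\cdot\vec{B} &= 0, \\
\nabla\times\vec{B} &= \mu_0\varepsilon_0\frac{\partial\vec{E}}{\partial t}.
\end{aligned}
\tag{3}
$$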

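Let’s play around with these a little bit. Take the curl of the second equation:

$$
\nabla\times\left(\nabla\times\vec{E}\right) = \nabla\left(\nabla\cdot\vec{E}\right) - \nabla^2\vec{E} = -\nabla\times\left(\frac{\partial\vec{B}}{\partial t}\right) = -\frac{\partial}{\partial t}\left(\nabla\times\vec{B}\right).
$$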

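But from the first of Eq. (3), $\nabla\cdot\vec{E} = 0$, and from the fourth, $\nabla\times\vec{B} = \mu_0\varepsilon_0\,\partial\vec{E}/\partial t$. Plugging these in gives

$$
\nabla^2\vec{E} - \mu_0\varepsilon_0\frac{\partial^2\vec{E}}{\partial t^2} = 0.
$$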

This is precisely the equation for a wave traveling at speed $v$ if we identify $(\mu_0\varepsilon_0)^{-1/2}$ as the velocity of the wave. Since $\mu_0$ and $\varepsilon_0$ are constants, we find a constant velocity of

$$
v \equiv \frac{1}{\sqrt{\mu_0\varepsilon_0}} = 299{,}792{,}458\ \text{m/s},
$$

which is precisely the speed of light, $c$! Thus, the wave equation reads

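$$
\nabla^2\vec{E} - \frac{1}{c^2}\frac{\partial^2\vec{E}}{\partial t^2} \equiv \Box\vec{E} = 0, \tag{4}
$$

where $\Box = \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}$ is the d’Alembertian operator. Furthermore, starting from the fourth of Eq. (3) and performing the same calculations, one finds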

$$
\nabla^2\vec{B} - \frac{1}{c^2}\frac{\partial^2\vec{B}}{\partial t^2} \equiv \Box\vec{B} = 0. \tag{5}
$$

Thus, both the electric and magnetic fields propagate as waves in vacuum. Eqs. (4) and (5) show not only that the electric and magnetic fields are parts of the same electromagnetic field, but also that light itself is an electromagnetic wave! This was Maxwell’s crowning achievement.
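As a quick numerical sanity check of the two claims above, the sketch below (Python, standard library only) evaluates $1/\sqrt{\mu_0\varepsilon_0}$ using CODATA 2018 values for $\mu_0$ and $\varepsilon_0$, and then estimates $\Box f$ by central finite differences for a one-dimensional Gaussian pulse $f(x,t) = e^{-(x - ut)^2}$. The constants, the pulse shape, and the helper names are illustrative assumptions, not part of the original text; the point is only that the residual vanishes (to rounding error) when $u = c$ and not otherwise.

```python
import math

# CODATA 2018 values in SI units (assumed here for illustration).
MU0 = 1.25663706212e-6    # vacuum permeability, N/A^2
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def wave_speed(mu0: float, eps0: float) -> float:
    """Wave speed (mu0 * eps0) ** -0.5 read off from the vacuum wave equation."""
    return 1.0 / math.sqrt(mu0 * eps0)


def box_of_pulse(speed: float, c: float, x: float = 0.3, h: float = 1e-3) -> float:
    """Central-difference estimate of (d2/dx2 - (1/c^2) d2/dt2) f at (x, t = 0)
    for the 1D Gaussian pulse f(x, t) = exp(-(x - speed * t) ** 2)."""
    def f(xx: float, tt: float) -> float:
        return math.exp(-(xx - speed * tt) ** 2)

    k = h / c  # time step chosen so that c * k equals the spatial step h
    f_xx = (f(x + h, 0.0) - 2.0 * f(x, 0.0) + f(x - h, 0.0)) / h**2
    f_tt = (f(x, k) - 2.0 * f(x, 0.0) + f(x, -k)) / k**2
    return f_xx - f_tt / c**2


c = wave_speed(MU0, EPS0)
print(f"1/sqrt(mu0*eps0) = {c:.1f} m/s")
print(box_of_pulse(c, c))        # close to zero: this pulse solves the wave equation
print(box_of_pulse(0.5 * c, c))  # clearly nonzero: wrong propagation speed
```

Choosing the time step as $h/c$ keeps both second differences on the same footing, so the comparison isolates the propagation speed rather than the discretization.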

However, Eqs. (4) and (5) have a curious property. These equations clearly say that the electromagnetic field always travels at the speed of light; but one might ask: “at the speed of light with respect to what?” Conventional intuition says that speeds are only defined with respect to something else. As an example, suppose that you are riding in a train at 50 miles per hour, and you throw a ball forward to a friend at 30 miles per hour. You would see the ball traveling at 30 miles per hour, but someone standing outside the train watching you would see the ball traveling at the speed of the ball plus the speed of the train, i.e., 80 miles per hour.

This idea is completely reasonable and intuitive. We can make it more mathematically precise using the Galilean transformations, which relate the motion of objects as seen by different observers, as shown in Figure 1. Here we have two observers, in inertial reference frames S and S′, respectively, whose origins coincide at time t = t′ = 0. The observer in frame S′ is moving to the right at speed V with respect to S.

[Figure 1 image: frames S (axes x, y) and S′ (axes x′, y′), with S′ moving to the right at speed V.]

Figure 1: The Galilean transformations relate the motion of an object moving non-relativistically as seen by two different observers.

The object in the figure has coordinates $\vec{r} = (x, y, z)$ in frame S, and $\vec{r}\,' = (x', y', z')$ in frame S′. We would like to find the relationship between the two sets of coordinates, which we can read off the figure. Since the motion is along the horizontal axis, clearly $y = y'$, and similarly $z = z'$. For this nonrelativistic motion, the times measured by the two observers agree, so $t = t'$. The difference between the $x$ coordinates in the two frames depends on the relative speed of the frames and on how long they have been in relative motion: $x - x' = Vt' = Vt$, since $t' = t$, and so $x' = x - Vt$. This gives the full Galilean transformations, relating the primed coordinates to the unprimed coordinates,

$$
x' = x - Vt, \qquad y' = y, \qquad z' = z, \qquad t' = t. \tag{6}
$$

We could easily invert these equations to find the coordinates in the unprimed frame in terms of the primed coordinates.

Knowing the relationship between the coordinates, we can find the relationship between the velocities of the object in each frame. Suppose that the velocity of the object in the unprimed frame is $\vec{v} = d\vec{r}/dt$. Taking the derivative of the expressions in Eq. (6) with respect to time (and recalling that $t = t'$, so that $\vec{v}\,' = d\vec{r}\,'/dt' = d\vec{r}\,'/dt$) gives

$$
v'_{x'} = v_x - V, \qquad v'_{y'} = v_y, \qquad v'_{z'} = v_z. \tag{7}
$$

Thus, we see that the velocities add intuitively, just as expected. Furthermore, taking one more derivative shows that the accelerations (and therefore the forces) seen by the two observers are the same.
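The sketch below checks Eq. (7) numerically for the train example: a ball moving at 80 mph in the ground frame S, seen from a train frame S′ moving at V = 50 mph. The small deceleration, the step size, and the function names are assumptions made for illustration; central differences taken in the primed frame recover $v' = v - V$ and an unchanged acceleration.

```python
def galilean(t: float, x: float, V: float) -> tuple[float, float]:
    """Eq. (6) along x: map an event (t, x) in frame S to (t', x') in frame S'."""
    return t, x - V * t


def ball_position(t: float, v0: float = 80.0, a: float = -4.0) -> float:
    """Hypothetical ball trajectory in the ground frame S (miles, hours)."""
    return v0 * t + 0.5 * a * t * t


V_TRAIN = 50.0  # speed of the train frame S' relative to the ground, mph
h = 1e-2        # time step for the central differences, hours

# Three nearby events on the ball's path, re-expressed in the train frame S'.
(tm, xm), (t0, x0), (tp, xp) = (galilean(t, ball_position(t), V_TRAIN) for t in (-h, 0.0, h))

v_prime = (xp - xm) / (tp - tm)                  # velocity seen from the train at t = 0
a_prime = (xp - 2.0 * x0 + xm) / (tp - t0) ** 2  # acceleration seen from the train
print(v_prime, a_prime)  # about 30.0 and -4.0: v' = v - V, same acceleration
```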

The result that light always travels at the same speed, regardless of the observer, seems to contradict the velocity addition rules in Eq. (7). However, the speed of light is very specifically predicted by Maxwell’s equations, and as time went on these equations were found to always agree with the results of experiments. Thus it seems that, if nothing is wrong with the unintuitive result from Maxwell’s equations, then something must be wrong with the intuitive relationship between the observers in Eq. (6)!

Various ideas were put forth to reconcile these two results; foremost among them was the concept of the aether (also called the luminiferous ether). The aether was supposed to be some sort of background medium, like a space-filling Jell-O, providing a reference against which the speed of light could be measured. This resolved the puzzle posed by Maxwell’s equations: the speed of light was measured with respect to this background medium.

The aether was a purely theoretical concept, invented to explain the propagation of light, but up until that point it lacked any experimental basis. So various experiments were proposed to measure properties of this aether. The most notable of these was the Michelson-Morley experiment, which tried to measure the motion of the Earth through the aether, determining our speed through this background. However, when performed, the Michelson-Morley experiment found that our velocity through the aether was consistent with zero! Clearly, the Earth moves through space (we are orbiting the Sun, after all), so taken at face value these experimental results said that the Earth and aether move in exactly the same way, even as we travel around our elliptical orbit. Obviously, the aether concept needs some refinement; since all the experiments suggest that the aether doesn’t produce any measurable effects, we should conclude that the aether isn’t there!

Einstein finally realized that a new concept was required. Maxwell’s equations say that light always travels at the same speed, c, but with respect to whom? Einstein decided that light travels at speed c with respect to whoever cares to measure it! This means that every observer always sees light traveling at the same speed, independent of their own motion. This idea is completely unintuitive, contradicting the Galilean velocity addition rule in Eq. (7) (and therefore also the Galilean transformations in Eq. (6)), but it is what the results of experiment seem to require.

Thus, Einstein arrived at the idea that would become one of the cornerstones of his Special Theory of Relativity: light always moves at the same speed relative to all observers. However, this alone isn’t quite enough to account for all the experimental results; we also need one more idea. Not only, Einstein said, must all observers agree on the speed of light, but they must also agree on all the laws of physics! We will see later that this will be vitally important for the formulation of Einstein’s General Theory of Relativity. For now, though, let’s see what these ideas say about the laws of mechanics.


2 Mechanics in Special Relativity.

As we will see, the simple postulates Einstein proposed lead to some very interesting effects, including (but not limited to) the merging of space and time! Let’s begin with a simple example that demonstrates some of the interesting physics. Consider a train of height $L$ moving to the right with velocity $v$, as shown in Figure 2, with one person riding inside the train and another standing outside watching the train go by. The train has mirrors at the top and bottom, and a beam of light is fired from the bottom mirror to the top.

[Figure 2 image: a train of height L moving at velocity v, shown once for the observer inside the train and once for the observer outside the train.]

Figure 2: The motion of light as seen by two different observers, one moving along with the light, and another moving relative to the light. The different observers will measure different amounts of time for the light to complete its journey.

The two observers see different motions of the light. The person sitting inside the train sees the light go straight up from the bottom to the top, traveling for a time $\Delta t = L/c$. The observer outside sees something different: since the train is moving (relative to the person standing outside), the light travels along a longer, diagonal path. Besides the vertical distance $L$, the light must also cover the horizontal distance the train moves: if the outside observer sees the light hit the top mirror after a time $\Delta t'$, the train has moved a distance $v\Delta t'$ in that time. So, from the Pythagorean theorem, the total distance the light travels is $D = \sqrt{L^2 + v^2\Delta t'^2}$, and the total time is $\Delta t' = D/c = \sqrt{L^2 + v^2\Delta t'^2}/c$. Before Einstein, both observers would have had to agree on the amount of time (disagreeing on the speed of the light), but now, since the speed of light is fixed, we find a difference in times! Solving for the time $\Delta t'$ gives

$$
\Delta t' = \frac{L}{\sqrt{c^2 - v^2}}.
$$

But the observer inside the train measured $L/c = \Delta t$, and so

$$
\Delta t' = \frac{\Delta t}{\sqrt{1 - v^2/c^2}}, \tag{8}
$$

which says that the two observers see the light taking different times! The person standing outside sees the light take longer to reach the top than the person inside does. The time measured by the person inside the train ($\Delta t$) is what is called the proper time: the time that passes for an observer moving along with the clock.

The height of the train could be chosen (at least in principle) so that it takes one second for the light to go up to the top (or we could just bounce the light up and down many times until a second has passed). In this way, we’ve made a light clock that ticks off the seconds. The two observers will then see different seconds ticking off. What this means is that moving clocks run slow! The amount of time that passes differs for different observers; time can be affected simply by moving. This idea is called time dilation, and it is completely unexpected according to Newton, who believed that time is absolute and immutable. It’s easy to see from Eq. (8) that the effect is important only when the clock is moving at speeds comparable to that of light; when $v \ll c$, then $\Delta t' \approx \Delta t$, and the two observers agree on the ticking rates of clocks.
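A short numerical sketch of the light clock: it takes Eq. (8) for the outside observer's interval and checks that the diagonal path of Figure 2, computed with the Pythagorean theorem, is indeed covered at speed $c$. It also evaluates $\gamma - 1$ at an everyday speed to show how small the effect is. The mirror separation, the speeds, and the function names are illustrative assumptions. Computing $\gamma - 1$ directly as `1/sqrt(1 - beta**2) - 1` would lose almost all of its digits at such speeds, which is why the sketch uses `math.log1p` and `math.expm1`.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def dilated_interval(proper_dt: float, v: float) -> float:
    """Eq. (8): interval seen by an observer past whom the clock moves at speed v."""
    return proper_dt / math.sqrt(1.0 - (v / C) ** 2)


def gamma_minus_one(v: float) -> float:
    """gamma - 1, computed with log1p/expm1 so it stays accurate when v << c."""
    beta2 = (v / C) ** 2
    return math.expm1(-0.5 * math.log1p(-beta2))


# Light clock from Figure 2 (mirror separation and speed chosen for illustration).
L = 1.0                 # mirror separation, m
v = 0.6 * C             # train speed
dt = L / C              # proper time for one trip, measured inside the train
dt_out = dilated_interval(dt, v)

# Outside observer: diagonal path from the Pythagorean theorem, covered at speed c.
D = math.hypot(L, v * dt_out)
print(dt_out / dt)              # gamma, about 1.25 for v = 0.6c
print(abs(D - C * dt_out) / D)  # essentially zero: the light still moves at c
print(gamma_minus_one(30.0))    # about 5e-15 at 30 m/s (highway speed)
```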

We’ll see soon that time dilation is only the tip of the iceberg. Let’s look at one more example before moving on. Consider a bar moving with speed $v$, watched by two observers, one moving with the bar and the other remaining stationary, as in Figure 3. The person moving along with the bar measures its length $L$, called its proper length.

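[Figure 3 image: a bar moving at speed v; the moving observer measures length L and times t1, t2, the stationary observer measures length L′ and times t′1, t′2.]

Figure 3: Different observers will measure different lengths for the moving bar, with the stationary observer measuring a shorter bar.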

A flash of light is fired from the back end, reflects off the front end, and returns to the rear. Let’s look at how the different observers see the flash of light, first from the point of view of the observer moving along with the bar (the first half of the figure). The flash of light takes a time $t_1 = L/c$ to reach the front end, then the same amount of time $t_2 = L/c$ to come back, for a total time of $\Delta t = 2L/c$.

Now let’s look at the same process from the point of view of someone watching the bar move past with speed $v$ (the second half of the figure). This observer may not see the same length for the bar, so we’ve labeled the length $L'$ instead. On the way out the light takes a longer time, $t'_1 = L'/(c - v)$, since the front end of the bar is “running away from the light,” and on the way back a shorter time, $t'_2 = L'/(c + v)$, since the rear end of the bar is now moving toward the light. This gives a total time

$$
\Delta t' = L'\left(\frac{1}{c - v} + \frac{1}{c + v}\right) = \frac{2L'c}{c^2 - v^2} = \frac{2L'}{c}\,\frac{1}{1 - v^2/c^2}.
$$

Now, as we’ve seen before, the two times are related by Eq. (8), and so

$$
\frac{2L'}{c}\,\frac{1}{1 - v^2/c^2} = \frac{\Delta t}{\sqrt{1 - v^2/c^2}} = \frac{2L}{c}\,\frac{1}{\sqrt{1 - v^2/c^2}},
$$

and comparing the two sides gives

$$
L' = \sqrt{1 - \frac{v^2}{c^2}}\;L, \tag{9}
$$

which says that the length of the bar is shorter as measured by the stationary observer! This idea is called length contraction, and is yet another interesting effect required by the constancy of the speed of light; not only do moving clocks run slow, but moving rods get shorter! Once again, this effect is important only if the rod is moving close to the speed of light: to change the length of the rod by even one percent, the rod would need to be moving at more than 14% of the speed of light, or about $4.23 \times 10^7$ meters per second! This is far outside our everyday range of experience.
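The figures quoted above can be reproduced with a few lines of Python. The sketch inverts Eq. (9) to find the speed at which a rod appears one percent shorter, then checks the derivation itself: using the contracted length, the outside observer's round-trip time should equal the dilated proper time from Eq. (8). The function names and the sample rod (10 m at 0.8c) are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def contracted_length(proper_length: float, v: float) -> float:
    """Eq. (9): measured length of a rod of the given proper length moving at speed v."""
    return proper_length * math.sqrt(1.0 - (v / C) ** 2)


def speed_for_fraction(fraction: float) -> float:
    """Speed at which the rod's measured length is `fraction` of its proper length."""
    return C * math.sqrt(1.0 - fraction ** 2)


# The one-percent example from the text.
v1 = speed_for_fraction(0.99)
print(f"{v1 / C:.4f} c, {v1:.3e} m/s")  # about 0.1411 c, roughly 4.23e7 m/s

# Consistency check of the derivation: with L' from Eq. (9), the outside observer's
# round trip L'/(c - v) + L'/(c + v) equals the dilated proper time gamma * 2L/c.
L, v = 10.0, 0.8 * C
Lp = contracted_length(L, v)
outside = Lp / (C - v) + Lp / (C + v)
gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
print(outside / (gamma * 2.0 * L / C))  # 1.0 up to rounding
```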

2.1 Lorentz Transformations.

We have seen some of the interesting effects of moving near the speed of light, including time dilation and length contraction. Neither of these effects is predicted by the Galilean transformations, Eq. (6). So we need to figure out how the equations in Eq. (6) must change to accommodate Einstein’s new ideas.

The correct transformation laws should not be too different from those in Eq. (6), since they must reduce to the simple Galilean equations when speeds are low compared to that of light. With this in mind, for relative motion along the $x$ direction at speed $v$, let’s try a solution of the form

$$
x' = \gamma\,(x - vt),
$$

where $\gamma$ is to be determined, and must have the property that $\gamma \to 1$ as $v \to 0$. Since S moves at speed $-v$ relative to S′, this expression has the inverse transformation

$$
x = \gamma\,(x' + vt').
$$

Suppose that the origins of the two observers coincide at $t = t' = 0$, and at that moment we fire a laser from the origin. In the unprimed frame, after a time $t$ the light beam travels a distance $x = ct$, while in the primed frame it travels a distance $x' = ct'$. Thus,

$$
ct' = \gamma\,(ct - vt) = \gamma\,(c - v)\,t, \qquad ct = \gamma\,(ct' + vt') = \gamma\,(c + v)\,t'.
$$

Solving the second for $t'$ and plugging into the first gives $\gamma^2 = c^2/(c^2 - v^2)$, or

$$
\gamma \equiv \frac{1}{\sqrt{1 - v^2/c^2}}, \tag{10}
$$

which is the same factor as in Eqs. (8) and (9)! So we have the correct transformation for the $x$ components, but what about $t$? Start with $x = \gamma(x' + vt')$, and plug in $x' = \gamma(x - vt)$. Solving for $t'$ (using $1 - \gamma^2 = -\gamma^2 v^2/c^2$) gives

$$
t' = \gamma\left(t - \frac{v}{c^2}\,x\right).
$$


Thus, we have the correct transformation laws, called the Lorentz transformations,

$$
x' = \gamma\,(x - vt), \qquad y' = y, \qquad z' = z, \qquad t' = \gamma\left(t - \frac{v}{c^2}\,x\right), \tag{11}
$$

where the directions perpendicular to the motion do not transform (we say they’re invariant under these transformations), and it’s easy to see that we recover the old Galilean transformations, Eq. (6), in the limit $v \ll c$. Thus, we have the correct transformations between the two frames. We can easily check that these transformations reproduce the time dilation and length contraction effects in Eqs. (8) and (9).

The Lorentz transformations are really quite spectacular. They mix together space and time! These equations say that one can change the amount of time that passes, or the space between objects, just by moving! This merging of space and time into a single spacetime is our first taste of the malleability of space and time, a concept that will command our attention in the chapters to follow.
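To make Eq. (11) concrete, the sketch below implements the boost along $x$ and checks three claims from the text: a light ray keeps speed $c$ in both frames, the inverse transformation (obtained by $v \to -v$) undoes the boost, and at everyday speeds the result is indistinguishable from the Galilean Eq. (6). The function names and sample events are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def lorentz_boost(t: float, x: float, v: float) -> tuple[float, float]:
    """Eq. (11) along x: event (t, x) in S -> (t', x') in S' moving at speed v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (t - v * x / C**2), gamma * (x - v * t)


def inverse_boost(tp: float, xp: float, v: float) -> tuple[float, float]:
    """Inverse of Eq. (11): S sees S' moving at +v, so S' sees S moving at -v."""
    return lorentz_boost(tp, xp, -v)


# 1. A light ray x = ct in S is still a light ray x' = ct' in S'.
t_ray = 1e-6
tp, xp = lorentz_boost(t_ray, C * t_ray, 0.5 * C)
print(xp / (C * tp))  # 1.0: both observers measure speed c

# 2. Boosting and then un-boosting returns the original event.
t_back, x_back = inverse_boost(*lorentz_boost(2.0, 1.0e8, 0.9 * C), 0.9 * C)
print(t_back, x_back)  # approximately 2.0 and 1.0e8

# 3. At everyday speeds Eq. (11) reduces to the Galilean Eq. (6).
tp_slow, xp_slow = lorentz_boost(10.0, 500.0, 30.0)
print(tp_slow, xp_slow)  # approximately 10.0 and 200.0 (Galilean: t' = t, x' = x - vt)
```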

2.2 The Doppler Shift.

We can use the ideas introduced above to obtain an important result about light as seen by two different observers. We have already introduced the idea of redshift in Chapter ?? when we discussed the expansion of the Universe, but we did not give any details. It’s time to remedy this.

We’ve all heard the change in pitch of a siren as a fire engine passes by. This familiar effect, known as the Doppler effect, comes from the shift in frequency of the sound heard by an observer from an emitter in relative motion. This is a property of waves, and so is true for light as well. A source moving away from us will have its wavelength stretched out, and so will appear redder, while a source moving toward us will have its wavelength shortened, and so will look bluer than its original color. For light, this effect arises in part because the emitter and receiver disagree on the amount of time between waves, due to the time dilation effects discussed above. Let’s work out the expression for the frequency shift.

Consider two observers: Jack, sitting at rest at the origin, and Jill, initially at a distance $x_0$ at time $t = 0$ and moving away at speed $v$. Jack has a flashlight which he flips on and off at intervals of $T$, so there is a frequency $\nu = 1/T$. What is the frequency seen by Jill as she moves away? Suppose Jack pulses the light at time $t = 0$; he will see Jill receive the pulse at a distance $x_1 = ct_1$, where $t_1$ is the time it took the pulse to reach Jill. Since she’s moving away, in a time $t_1$ Jill moves a distance $vt_1$, so $x_1 = x_0 + vt_1 = ct_1$, or $t_1 = x_0/(c - v)$.

Now, a time $T$ later, Jack pulses the light again, and sees Jill receive the pulse at a distance $x_2 = x_0 + vt_2$, where $vt_2$ is the distance Jill moves from her starting point in a time $t_2$. But this distance isn’t just $ct_2$, because the light hasn’t been traveling for the full time: there was a delay of $T$ between pulses, so the light has only been traveling for a time $t_2 - T$. Thus, $x_2 = c(t_2 - T) = x_0 + vt_2$, so that $t_2 = (x_0 + cT)/(c - v)$. Then

$$
t_2 - t_1 = \frac{cT}{c - v}
$$

gives the difference in time measured by Jack sitting at the origin. Furthermore, just from their definitions, $x_2 - x_1 = v(t_2 - t_1)$, or

$$
x_2 - x_1 = \frac{cvT}{c - v}.
$$

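Now, because Jill is moving relative to Jack, they will disagree on the amount of time passing, and from Eq. (11) we see that

$$
t'_2 - t'_1 = \gamma\left[(t_2 - t_1) - \frac{v}{c^2}(x_2 - x_1)\right] = \gamma\left[\frac{cT}{c - v} - \frac{v}{c^2}\,\frac{cvT}{c - v}\right] = \gamma\,\frac{cT}{c - v}\left(1 - \frac{v^2}{c^2}\right).
$$

But since $\gamma = \left(1 - v^2/c^2\right)^{-1/2}$, we find

$$
t'_2 - t'_1 = \frac{1}{1 - v/c}\sqrt{1 - \frac{v^2}{c^2}}\;T,
$$

which can be simplified to

$$
t'_2 - t'_1 = \sqrt{\frac{1 + v/c}{1 - v/c}}\;T.
$$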

Since $t'_2 - t'_1$ is the period, $T'$, in Jill’s frame, she sees the light pulses stretched out in time. Since the frequency is just the inverse of the period, we find, for an observer moving away,

$$
\nu_{\rm obs} = \sqrt{\frac{1 - v/c}{1 + v/c}}\;\nu_{\rm source}, \tag{12}
$$

which says that the frequency is decreased for a receding source. Since the color of light depends on its frequency, light sources moving away appear redder, in agreement with our earlier assertions (as always, to get the frequency for an approaching source, just flip the sign of the velocity, $v \to -v$). Since the wavelength of a light wave is just $\lambda = c/\nu$, we can easily express Eq. (12) in terms of the wavelength,

$$
\lambda_{\rm obs} = \sqrt{\frac{1 + v/c}{1 - v/c}}\;\lambda_{\rm source}. \tag{13}
$$

We will return to these ideas, yet again, when we discuss the expansion of the Universeaccording to Einstein’s Theory of General Relativity.
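The Jack-and-Jill argument can be replayed numerically. The sketch below computes the two reception events exactly as in the text, applies Eq. (11) to get the period in Jill's frame, and compares the result with the closed form behind Eqs. (12) and (13). The pulse period, speed, starting distance, and function names are illustrative assumptions; as the derivation implies, the answer does not depend on Jill's starting distance $x_0$.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def jill_period(T: float, v: float, x0: float) -> float:
    """Period Jill measures between Jack's pulses, following the derivation step by step.

    Jack (at x = 0) pulses at t = 0 and t = T; Jill starts at x0 and recedes at speed v.
    """
    t1 = x0 / (C - v)               # first pulse:  c*t1 = x0 + v*t1
    t2 = (x0 + C * T) / (C - v)     # second pulse: c*(t2 - T) = x0 + v*t2
    x1, x2 = x0 + v * t1, x0 + v * t2
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    # Time between the two reception events in Jill's frame, from Eq. (11).
    return gamma * ((t2 - t1) - v * (x2 - x1) / C**2)


def doppler_factor(beta: float) -> float:
    """Closed form T'/T = nu_source/nu_obs = lambda_obs/lambda_source, Eqs. (12)-(13)."""
    return math.sqrt((1.0 + beta) / (1.0 - beta))


T, beta = 1.0e-3, 0.3
ratio = jill_period(T, beta * C, x0=1.0e6) / T
print(ratio, doppler_factor(beta))  # the two values agree (about 1.363)
```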

2.3 The Interval.

The transformations in Eq. (11) have a very interesting property, which will be even more useful in discussing General Relativity. Recall what happens if we have a vector $\vec{A}$ with components $\vec{A} = (A_x, A_y)$ in one frame, and then we rotate our coordinate system by an angle $\phi$ about the $z$ axis to a new frame, as in Figure 4. In the new frame the vector has components $\vec{A} = (A_{x'}, A_{y'})$.

[Figure 4 image: vector A with components Ax, Ay along axes x, y, and components Ax′, Ay′ along axes x′, y′ rotated by the angle φ.]

Figure 4: While the components of a rotated vector change, the length is always the same.

We can easily relate the components in the primed system to those in the unprimed system from the geometry in Fig. 4. In this case

$$
A_{x'} = A_x\cos\phi + A_y\sin\phi, \qquad A_{y'} = -A_x\sin\phi + A_y\cos\phi. \tag{14}
$$

So the components of the vector are different. But the actual length of the arrow, $A \equiv |\vec{A}| = \sqrt{A_x^2 + A_y^2} = \sqrt{A_{x'}^2 + A_{y'}^2}$, does not change, as you can easily check. The length of the vector, which is the real physical quantity, independent of our choice of coordinates, is an invariant.
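A minimal check of Eq. (14): rotating the axes changes the components of a vector but leaves its length unchanged. The sample vector, the angles, and the function name are arbitrary choices for illustration.

```python
import math


def rotate_components(ax: float, ay: float, phi: float) -> tuple[float, float]:
    """Eq. (14): components of the same vector in axes rotated by phi about z."""
    return (ax * math.cos(phi) + ay * math.sin(phi),
            -ax * math.sin(phi) + ay * math.cos(phi))


ax, ay = 3.0, 4.0  # an arbitrary vector of length 5
for phi in (0.3, 1.2, 2.5):
    bx, by = rotate_components(ax, ay, phi)
    # The components change with phi, but the length does not.
    print(f"phi = {phi}: ({bx:+.3f}, {by:+.3f}), length = {math.hypot(bx, by):.12f}")
```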

We can do something very similar with the transformation laws in Eq. (11). But just squaring and adding the components doesn’t work, and there’s no reason to expect that it should. Even though the Lorentz transformations mix together time and space, time still sticks out a little bit. We can move in three directions of space, back and forth at will, but we seem to be stuck always moving forward in time (although we can change the rate at which time passes by moving, as we’ve seen). So, instead of squaring and adding together all the terms, let’s square and add the space components (this way we still get rotational invariance at a fixed time), and subtract the square of the time component, multiplied by $c$ so that it has units of length. We define the interval, $s^2$,

$$
s^2 = -c^2t'^2 + x'^2 + y'^2 + z'^2 = -c^2t^2 + x^2 + y^2 + z^2, \tag{15}
$$

which is invariant under the Lorentz transformations! Indeed, $y$ and $z$ are unchanged by Eq. (11), while

$$
-c^2t'^2 + x'^2 = \gamma^2\left[(x - vt)^2 - c^2\left(t - \frac{v}{c^2}\,x\right)^2\right] = \gamma^2\left(1 - \frac{v^2}{c^2}\right)\left(x^2 - c^2t^2\right) = -c^2t^2 + x^2.
$$

The interval can be written in another way: recalling that $\vec{r} = x\hat{x} + y\hat{y} + z\hat{z}$, we can write

$$
s^2 = -c^2t^2 + \vec{r}\cdot\vec{r}.
$$

This form of the interval shows the rotational invariance of the spatial components explicitly (since the dot product is a scalar, independent of the coordinate system). Before looking at what the interval means, note that we can also consider a tiny differential interval, obtained by moving only a small distance in space and time,

$$
ds^2 = -c^2dt^2 + dx^2 + dy^2 + dz^2. \tag{16}
$$

This form will turn out to be the most useful one. The interval in Eq. (16) has a very important application, describing spacetime in the absence of gravity, which we will discuss in detail later. For now, however, we want to focus on a particular property. While the length of the vector in Figure 4 is always positive, the interval need not be. If the spatial separation is smaller than the time separation (measured as $c\,dt$), then $ds^2 < 0$. This means that the particle moves farther through time than it does through space (this is clearly the case if the particle is standing still), which is true for any particle moving slower than the speed of light. Particles moving in this way are called timelike, and are said to move along timelike paths.

On the other hand, suppose that the spatial separation is bigger than the time separation. Then covering that distance would require moving faster than light, and the interval is $ds^2 > 0$. Such particles would move along spacelike paths. Finally, the interval can also be zero, $ds^2 = 0$, which says that (for a particle moving along $x$, say) $|dx/dt| = c$, which is just the speed of light. This means that light travels along paths of zero interval, called lightlike paths. We can include all of this information in a handy diagram, called a spacetime diagram, shown in Figure 5.

[Figure 5 image: spacetime diagram with axes ct and x, showing a lightlike path, a timelike path, the timelike region, and the spacelike region.]

Figure 5: A spacetime diagram including the light cone. The regions inside the shaded triangles are in causal contact with the particle moving along the timelike path, while spacelike regions are out of causal contact.

The diagram in Fig. 5 shows a particle moving at less than the speed of light along a timelike path in the $x$ direction (we’ve suppressed the $y$ and $z$ axes for simplicity). The particle is currently at the intersection of the two wedges. The lines stretching out at 45° (where $x = \pm ct$, or $ds^2 = 0$) are the possible paths of light moving straight out along the $x$ axis, defining what is called the light cone. The region in the lower wedge is called the past light cone, and contains every event that could possibly have affected the particle. Only events inside this lower wedge are close enough to have any kind of effect on the particle; events outside it are so far away that light could not reach the particle in time to affect it. Similarly, the upper wedge is called the future light cone, and contains every event the particle could possibly affect. Since information can travel at most at the speed of light, these wedges are bounded by the lightlike paths moving out at 45°.

Regions inside the light cones (in the timelike region) can be affected by (or can affect) the particle, and are said to be in causal contact with the particle. Regions outside the light cones (the spacelike regions) are too far away to affect (or be affected by) the particle (at least right now), and are said to be out of causal contact with the particle. The timelike path that the particle moves along is also called its worldline, and traces out its motion. Massive particles always have timelike worldlines, and massless particles (like photons) have lightlike worldlines. There are hypothetical particles, called tachyons, which are supposed to move faster than light; however, such particles have never been found experimentally, and in fact they would potentially lead to instabilities of the vacuum. For now, we’ll ignore the possibility of tachyons.
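The classification above can be phrased as a small function. The sketch computes the interval of Eq. (15) for a separation along $x$, labels it timelike, lightlike, or spacelike, and checks that a Lorentz boost from Eq. (11) changes neither the value of the interval nor its label. The relative tolerance used to decide when a floating-point interval counts as zero, the sample separations, and all names are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def boost(dt: float, dx: float, v: float) -> tuple[float, float]:
    """Eq. (11) applied to the separation (dt, dx) between two events."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (dt - v * dx / C**2), gamma * (dx - v * dt)


def interval(dt: float, dx: float) -> float:
    """Eq. (15) restricted to separations along x: s^2 = -c^2 dt^2 + dx^2."""
    return -(C * dt) ** 2 + dx**2


def classify(dt: float, dx: float, rel_tol: float = 1e-12) -> str:
    """Timelike (s^2 < 0), lightlike (s^2 = 0) or spacelike (s^2 > 0)."""
    s2 = interval(dt, dx)
    scale = (C * dt) ** 2 + dx**2  # sets what counts as "zero" in floating point
    if abs(s2) <= rel_tol * scale:
        return "lightlike"
    return "timelike" if s2 < 0 else "spacelike"


separations = {"slow particle": (1.0, 1.0e8), "light ray": (1.0, C), "too far": (1.0, 5.0e8)}
for name, (dt, dx) in separations.items():
    dt_p, dx_p = boost(dt, dx, 0.6 * C)
    print(f"{name}: {classify(dt, dx)} in S, {classify(dt_p, dx_p)} in S'; "
          f"s^2 = {interval(dt, dx):.6e} vs {interval(dt_p, dx_p):.6e} m^2")
```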

The interval, $ds^2$, is also called the (squared) proper length of the path, an example of which is the rest length of the rod in the length contraction experiment. This interval can also be related to the proper time, $d\tau^2$, which is the time measured on a clock moving along with a particle, simply by

$$d\tau^2 \equiv -\frac{1}{c^2}\,ds^2. \tag{17}$$

As an example, consider a particle moving along the $x$ axis with a constant speed $v \equiv dx/dt$. Then, the proper time is

$$d\tau^2 = dt^2 - \frac{1}{c^2}dx^2 = \left(1 - \frac{1}{c^2}\left(\frac{dx}{dt}\right)^2\right)dt^2 = \left(1 - \frac{v^2}{c^2}\right)dt^2,$$

which, after taking the square root, gives $d\tau = \sqrt{1 - v^2/c^2}\,dt$. We can integrate both sides to find the time elapsed in each frame: the proper time, $\tau$, measures the time passing along with the particle, while the time $t$ measures the time passing with respect to a stationary observer. Integrating and solving for $t$ gives

$$t = \frac{\tau}{\sqrt{1 - v^2/c^2}}, \tag{18}$$

which is precisely the equation for time dilation, Eq. (8)! We’ll see later that both the proper length and proper time are very useful concepts.
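The integration described above can be checked numerically. Below is a minimal Python sketch, assuming SI units; the function names `proper_time` and `dilated_time` are illustrative and not from the text. It sums $d\tau = \sqrt{1 - v^2/c^2}\,dt$ with a midpoint rule, so it also accepts a speed that changes with time, and for a constant speed it should reproduce Eq. (18).

```python
import math

C = 299_792_458.0  # speed of light in m/s


def proper_time(t, v, steps=10_000):
    """Integrate d(tau) = sqrt(1 - v(t')^2 / c^2) dt' from t' = 0 to t' = t.

    v is a function of the stationary observer's time t' (in s) returning a speed in m/s.
    A midpoint rule is used, which is exact (up to rounding) for a constant speed.
    """
    dt = t / steps
    tau = 0.0
    for i in range(steps):
        t_mid = (i + 0.5) * dt
        tau += math.sqrt(1.0 - v(t_mid) ** 2 / C**2) * dt
    return tau


def dilated_time(tau, v):
    """Eq. (18): stationary-observer time for proper time tau at constant speed v."""
    return tau / math.sqrt(1.0 - v**2 / C**2)


v0 = 0.6 * C                              # gamma = 1.25 for this speed
tau = proper_time(1.25, lambda t: v0)     # 1.25 s of stationary-observer time
print(tau)                                # approximately 1.0 s on the moving clock
print(dilated_time(tau, v0))              # approximately 1.25 s, recovering t
```

Replacing the constant lambda with one that depends on `t` gives the proper time along a path with changing speed, where only the integral form applies and Eq. (18) no longer does.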

2.4 Velocity Addition in Special Relativity.

The Galilean velocity addition law in Eq. (7) is intuitive and simple, but it can’t be right, since it wouldn’t let every observer measure the same speed of light. We need to figure out the correct


expressions for the addition of velocities, and it happily turns out to be very simple. Suppose that the relative velocity of the two frames is $V$, and that the velocity of a particle in the stationary frame is $v \equiv dx/dt$, while the velocity in the moving frame is $v' \equiv dx'/dt'$. Taking the differentials of the first and last of Eq. (11) gives

$$dx' = \gamma\left(dx - V\,dt\right), \qquad dt' = \gamma\left(dt - \frac{V}{c^2}\,dx\right).$$

Dividing the first of these by the second gives

$$\frac{dx'}{dt'} = \frac{dx - V\,dt}{dt - \frac{V}{c^2}dx} = \frac{\frac{dx}{dt} - V}{1 - \frac{V}{c^2}\frac{dx}{dt}}.$$

Now, replacing $v' = dx'/dt'$ and $v = dx/dt$, we find the law of velocity addition

$$v' = \frac{v - V}{1 - \frac{Vv}{c^2}}, \tag{19}$$

with the inverse transformation

$$v = \frac{v' + V}{1 + \frac{Vv'}{c^2}}. \tag{20}$$

Let’s consider a couple of examples. First, suppose a ball is moving with velocity $v$ in one frame and that the relative velocity between the two frames is $V = 0$; then both observers clearly agree on the velocity. Now, suppose that a particle is moving at speed $c/2$ in the moving frame, which is itself moving at a speed of $c/2$. Classically, the outside observer would measure a speed $c$, just by adding the velocities. Eq. (20) gives, instead,

$$v = \frac{c/2 + c/2}{1 + \frac{(c/2)(c/2)}{c^2}} = \frac{c}{1 + \frac{1}{4}} = \frac{4}{5}c,$$

which is less than the speed of light. Now, suppose that the moving observer is looking at light, and is moving at speed $V$. What is the speed measured by the stationary observer? In this case, Eq. (20) gives

$$v = \frac{c + V}{1 + \frac{V}{c^2}c} = \frac{c + V}{1 + \frac{V}{c}} = c\left(\frac{c + V}{c + V}\right) = c,$$

independent of $V$, which shows that all observers agree on the speed of light! In the same way, we can find the relativistic velocities in the other directions, $v'_y = dy'/dt'$ and $v'_z = dz'/dt'$, which give

$$v'_y = \frac{v_y}{\gamma\left(1 - \frac{v_xV}{c^2}\right)}, \qquad v'_z = \frac{v_z}{\gamma\left(1 - \frac{v_xV}{c^2}\right)}, \tag{21}$$

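where we have used the facts that the transverse coordinates are unchanged by the boost, $dy' = dy$ and $dz' = dz$, and that $v_y = dy/dt$ and $v_z = dz/dt$. Thus, we have the full description of velocities.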


2.5 Energy and Momentum in Special Relativity.

We now have the relativistic counterparts of all the Galilean transformations, including the law of velocity addition. We’ve seen that these expressions lead to some interesting and counter-intuitive results, but we’re not done yet. We now need to expand our analysis beyond just the coordinate transformations and velocity additions to include momentum and energy. We begin with the momentum.

In the classical case the momentum of a particle of mass $m$ moving along the $x$ direction is just $p = mv = m\,dx/dt$. The question is how to generalize this result to include Special Relativity. The momentum is the mass times the ratio of the distance traveled by the particle to the time that it takes to go that distance; but what time should we use? We only have one observer now, watching a particle zipping by and asking what its momentum is. So, we don’t need to try to relate the values seen by two observers; instead, we should be looking at a time intrinsic to the particle itself. This suggests that we use the proper time of the particle, related (for a constant velocity) to the observer’s time by Eq. (18). Thus, we define

$$p = m\frac{dx}{d\tau} = \frac{m}{\sqrt{1 - v^2/c^2}}\frac{dx}{dt} = \gamma mv, \tag{22}$$

where we have recalled that $\gamma = \left(1 - v^2/c^2\right)^{-1/2}$ and $v = dx/dt$. Eq. (22), which reduces to $p = mv$ when the velocity is much less than that of light, is the correct expression for the relativistic momentum.

The expression, Eq. (22), is very interesting. We can still describe the force by finding $F = dp/d\tau$, and say that the momentum of a particle is changed by the force acting on it. However, according to Newton we can just keep pushing harder, accelerating the particle to an arbitrarily high speed. But Eq. (22) says that as we push, the particle picks up more and more momentum for each small increase in speed. The momentum asymptotically approaches infinity as the velocity of the particle approaches that of light, so increasing the speed by even a little bit near $c$ requires a tremendous change in momentum, and the required momentum becomes infinite as $v \to c$. This demonstrates that no massive particle can travel faster than light, since it would take an infinite amount of energy to push it up to the speed of light!

Now that we have the relativistic momentum, how do we get the relativistic energy? Let’s go back to the expression for the proper time, moving only along $x$, $d\tau^2 = dt^2 - \frac{1}{c^2}dx^2$. Now, let’s divide by $d\tau^2$ to find

$$1 = \left(\frac{dt}{d\tau}\right)^2 - \frac{1}{c^2}\left(\frac{dx}{d\tau}\right)^2.$$

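But, as we’ve just seen, $dx/d\tau = p/m$, while $dt/d\tau = \gamma$, and so

$$1 = \gamma^2 - \frac{p^2}{m^2c^2} \quad\Rightarrow\quad \left(mc^2\right)^2 = \left(\gamma mc^2\right)^2 - (pc)^2.$$

Rewriting gives $\left(\gamma mc^2\right)^2 = (pc)^2 + \left(mc^2\right)^2$.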


The question now becomes, how do we interpret these values, specifically the one on the left? To find out, let’s consider the nonrelativistic case, where the velocity is small compared to the speed of light. Then $p \approx mv$, and

$$\begin{aligned}
\gamma mc^2 &= \sqrt{p^2c^2 + m^2c^4} \\
&\approx mc^2\sqrt{1 + \frac{m^2v^2}{m^2c^2}} \\
&\approx mc^2\left(1 + \frac{1}{2}\frac{v^2}{c^2}\right) = mc^2 + \frac{1}{2}mv^2,
\end{aligned}$$

where we have used the binomial expansion for the third line. The last term is the nonrelativistic kinetic energy, while the first term is an energy associated with the mass of the particle. So, we should associate $\gamma mc^2$ with the total energy of the particle, and expect that the total relativistic energy is $E = \gamma mc^2$, such that

$$E^2 = p^2c^2 + m^2c^4. \tag{23}$$

This is the correct expression for the energy of the system. The value $mc^2$ is identified with the rest-mass energy of the particle, giving us Einstein’s famous expression

$$E = mc^2, \tag{24}$$

for a stationary particle. Thus, unlike the classical case, a free particle has an energy just from its sheer existence! This is a very important result, saying that energy and mass are equivalent! This idea will lead to some very interesting and important results, for it says that, given sufficient energy, one can make mass! All of particle physics is predicated upon this result, allowing for the creation of different particles in accelerators, and it will give us the origin of all structure in the Universe after the Big Bang! We will discuss this in more detail as we go along.

We can see a couple more properties of the energy. First of all, subtracting away the rest mass energy from the total energy gives the kinetic energy of a free particle

$$KE = E - mc^2 = (\gamma - 1)mc^2. \tag{25}$$

Finally, what is the energy of a particle with no mass? Since $E = \gamma mc^2$, one might first expect that the energy would be zero. But this makes no sense, since we know that light carries energy (the Sun heats the Earth, for example). We also know that massless particles (like light) travel at the speed $c$. In that case the energy becomes $E = mc^2/\sqrt{1 - v^2/c^2} \to 0/0$, which is completely useless to us. However, we also know that $E^2 = p^2c^2 + m^2c^4$, which says that $E = pc$ for a massless particle. All of the energy is carried by the momentum of the particle (of course, the momentum isn’t given by Eq. (22) anymore), which depends on the frequency of the light, as is familiar from quantum mechanics. Thus, we have the correct expressions for the relativistic energy and momentum of a particle, both of which must be conserved in any reaction. Let’s consider an interesting example.

Figure 6 shows a reaction producing antiprotons in two different reference frames. In the top reaction we are looking at the reaction in the center of mass frame, in which both


protons are moving towards each other at the same speed $v_{\rm cm}$. This produces antiprotons via the reaction

$$P + P \to P + P + P + \bar{P},$$

where $\bar{P}$ is the antiproton. You might wonder why we get so many products from this reaction. The answer is that this reaction must conserve a quantity called baryon number, meaning that the total number of baryons (protons) that we start with (2) has to be the same as the number that we end up with (since antiprotons count as $-1$, we get $3 - 1 = 2$).

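[Figure 6 panels. Center of Mass Frame: initially the two protons approach each other, each with speed $v_{\rm cm}$; finally the products are at rest, $v_f = 0$. Lab Frame: initially a proton with speed $v_{\rm lab}$ approaches a proton at rest, $v = 0$; finally the products move with speed $v_f$.]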

Figure 6: Smashing together two protons with enough energy can produce antiprotons, as shown in the two figures above. The reaction is seen in two different reference frames, one in the center of mass frame, and another in the lab frame where the second proton is stationary.

In the lower reaction, looked at in the lab frame, we have a single proton moving towards a stationary one. What we want to determine is how much energy the incident proton has to have (in the lab frame) to produce these antiprotons. Let’s look at the center of mass frame first. The initial energy of each proton is the same, since they’re moving at the same speed, and is $E_i = \gamma_{\rm cm}mc^2$, where $m$ is the mass of the proton, and $\gamma_{\rm cm}$ is the gamma factor in the center of mass frame, which is different from the gamma factor in the lab frame! The initial momentum is zero in this frame. At the threshold for the reaction, all the energy goes into making the proton and antiproton, so the products are left at rest and the final energy in the center of mass frame is just the mass energy of the four product particles, $E_f = 4mc^2$, since the proton and antiproton have the same mass. Equating the total energies, $2\gamma_{\rm cm}mc^2 = 4mc^2$, gives $\gamma_{\rm cm} = 2$, which says that $v_{\rm cm} = \frac{\sqrt{3}}{2}c$.

Now, we want the energy in the lab frame, so let’s use the velocity addition equation, Eq. (19), to transform to the lab frame. To do so, we need to run along with the proton on the right, at a speed $V = -v_{\rm cm}$. Thus, we find the velocity of the moving proton to be

$$v_{\rm lab} = v' = \frac{v_{\rm cm} + v_{\rm cm}}{1 + \frac{v_{\rm cm}^2}{c^2}} = \frac{4\sqrt{3}}{7}c.$$

Now we can figure out

$$\gamma_{\rm lab} = \frac{1}{\sqrt{1 - \frac{v_{\rm lab}^2}{c^2}}} = 7,$$

meaning that the initial energy of the proton in


the lab frame is $E_i = 7mc^2$. Since one of these $mc^2$’s is the mass energy of the proton itself, this means that 6 are left for the kinetic energy. Thus, a proton needs a kinetic energy of $6mc^2$ just to make this reaction happen. This was a fair amount of work, requiring us to transform between frames; we’ll soon revisit this problem using a different method which is more useful for many problems.
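The frame-hopping calculation above can be reproduced in a few lines. This Python sketch works with speeds as fractions of $c$; the proton rest energy of about 938.272 MeV is a standard value used only to express the answer in everyday units, and the variable names are illustrative.

```python
import math

PROTON_REST_ENERGY_MEV = 938.272  # m c^2 for a proton, in MeV (approximate)


def gamma_from_beta(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)


# Centre of mass frame at threshold: 2 gamma_cm m c^2 = 4 m c^2, so gamma_cm = 2
gamma_cm = 2.0
beta_cm = math.sqrt(1.0 - 1.0 / gamma_cm**2)        # sqrt(3)/2

# Ride along with the target proton (V = -v_cm) and apply Eq. (19)
beta_lab = (beta_cm + beta_cm) / (1.0 + beta_cm**2)  # 4 sqrt(3)/7
gamma_lab = gamma_from_beta(beta_lab)                # 7

kinetic_mev = (gamma_lab - 1.0) * PROTON_REST_ENERGY_MEV
print(beta_cm, beta_lab, gamma_lab)
print(f"threshold kinetic energy: {kinetic_mev / 1000:.2f} GeV")
```

The printed threshold, about 5.63 GeV, is just $6mc^2$ for the proton.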

2.6 Acceleration in Special Relativity.

In everything that we’ve discussed so far we have typically kept the velocities constant, for example in the equation for velocity addition, Eq. (19). It is sometimes claimed that Special Relativity doesn’t include accelerations, and that we need the complete general relativistic theory to describe them, but this is actually false. By the same analysis that led to Eq. (19) one can show that the acceleration transforms as

$$a'_x = \frac{a_x}{\gamma^3\left(1 - \frac{Vv}{c^2}\right)^3}. \tag{26}$$

By considering the acceleration we can obtain a very interesting result. Suppose we have a rocket of height $h$ with a laser at the bottom, pointing up. The laser fires a series of pulses, separated by time intervals $T$. At the top of the rocket we have an observer counting the pulses. The rocket is accelerating upwards at a rate $g$, the usual acceleration due to gravity at the Earth’s surface. We expect that the observer won’t measure the same time between the pulses as the laser is sending out, since the observer is “running away from the light” as she accelerates. Let’s figure out what she sees.

Suppose that the rocket starts from rest. Then when the laser fires, the first pulse reaches the top of the rocket in a time $t_1$. In that time the light travels a distance $y_1 = ct_1$, which is the initial height, $h$, plus the distance that the rocket travelled during that time, $\frac{1}{2}gt_1^2$, so $y_1 = h + \frac{1}{2}gt_1^2 = ct_1$. The laser sends out the next pulse a time $T$ later, and this pulse reaches the top at a time $t_2$; the distance to the top is again the initial height plus the distance the rocket moved during that time, $y_2 = h + \frac{1}{2}gt_2^2$, but the light has travelled only for a time $t_2 - T$, since there was no light before the pulse was sent out. So, $y_2 = h + \frac{1}{2}gt_2^2 = c\left(t_2 - T\right)$.

Solving these expressions gives the time it takes for the light to reach the top as

$$t_1 = \frac{c}{g}\left[1 - \sqrt{1 - \frac{2gh}{c^2}}\right], \qquad t_2 = \frac{c}{g}\left[1 - \sqrt{1 - \frac{2g(h + cT)}{c^2}}\right],$$

where we have taken the minus sign in the quadratic equation solution to get the correct $g \to 0$ limit. Now, the laser sent out the light pulses with period $T$, but our observer sees them with period $T' = t_2 - t_1$, which is

$$T' = \frac{c}{g}\left[\sqrt{1 - \frac{2gh}{c^2}} - \sqrt{1 - \frac{2g(h + cT)}{c^2}}\right].$$

In general we don’t expect the periods to agree. To see what the difference is, let’s expand this result to second order in a Taylor series. In general, both $2gh$ and $2g(h + cT) \ll c^2$,


and so we can use the binomial theorem to write

$$T' \approx \frac{c}{g}\left[1 - \frac{gh}{c^2} - \frac{1}{2}\frac{g^2h^2}{c^4} - 1 + \frac{g(h + cT)}{c^2} + \frac{1}{2}\frac{g^2(h + cT)^2}{c^4}\right],$$

Canceling off the common terms and dropping the terms of order $T^2$ gives

$$T' = \left(1 + \frac{gh}{c^2}\right)T,$$

which says that the period of reception of the pulses is longer than the period of emission, as we should expect. Instead of expressing our result in terms of the period, we can express it in terms of the frequency, $\nu = 1/T$. Calling $\nu_{\rm obs}$ the observed frequency, and $\nu_{\rm source}$ the emitted frequency, we have

$$\nu_{\rm obs} = \frac{\nu_{\rm source}}{1 + \frac{gh}{c^2}}.$$

Now, in general $gh \ll c^2$, and so we can expand the result once again to find

$$\nu_{\rm obs} = \left(1 - \frac{gh}{c^2}\right)\nu_{\rm source}. \tag{27}$$

Thus, we find that the acceleration produces a Doppler shift in the frequency (in fact this result could have been obtained from Eq. (12) by writing $v = gt$, where $t = h/c$ is the time to reach the top of the rocket, and expanding the square root, recalling that $v \ll c$). This is actually not a surprising result. But Eq. (27) does contain a surprise. The frequency depends on the acceleration, which we have chosen to be $a = g$, the acceleration due to gravity on Earth. If we ignore the derivation of Eq. (27), and just look at the result, then we would be led to believe that a beam of light just traveling upwards in a gravitational field would lose frequency!

This is, in fact, completely true. As we will discuss further later, light is affected by gravity, and as light tries to escape from a gravitational field it experiences a redshift, causing its frequency to decrease (hence becoming redder). We can think of this in another (classical) way. When we throw a ball up into the air, it slows down, using its kinetic energy to do work against the force of gravity. Light has to do work against gravity, too, but it can’t change its speed. Therefore it has to lose energy, not by losing speed, but by losing frequency, since the energy of light depends on its frequency. We’ll return to this result later.

As a final comment, notice that the gravitational potential energy of a mass near the surface of the Earth is $U = mgh$, and so $gh = U/m$, but we know that $U/m \equiv \Phi$, the gravitational potential (here measured from the bottom of the rocket). Thus, we can write Eq. (27) as

$$\nu_{\rm obs} = \left(1 - \frac{\Phi}{c^2}\right)\nu_{\rm source}. \tag{28}$$

This will turn out to be a useful form. For now, though, we want to introduce a very convenient and important notation.


3 Four Vectors and Tensors.

As we’ve seen, there is a deep connection between space and time. They are no longer the separate, immutable backgrounds that Newton imagined, but rather a plastic, changeable framework unified into a single spacetime, where time and space can be changed into each other simply by moving! The Lorentz transformations, Eqs. (11), mix together time and space in much the same way as the rotation of the coordinate axes mixes together the components of a vector, as in Eqs. (14). This gives us a hint suggesting that we look at the Lorentz transformations as a sort of generalized rotation of a vector containing both spatial and time components! This idea will turn out to be crucial to every aspect of our future work, so let’s develop it in detail.

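First, let’s examine Eq. (14) a bit more, rewriting the components in matrix notation, now including the $z$ components ($z' = z$ since we’ve rotated the coordinate system about the $z$ axis),

$$\begin{pmatrix} A'_x \\ A'_y \\ A'_z \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} A_x \\ A_y \\ A_z \end{pmatrix}. \tag{29}$$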

The preceding equation can be written fully in matrix notation. If $\vec{A}$ is the column vector, and $R$ is the rotation matrix, then Eq. (29) is simply

$$\vec{A}' = R\vec{A}. \tag{30}$$

In order to form the scalar product giving the length of $\vec{A}$, we need to multiply the column vector $\vec{A}$ by a row vector on the left (this way we have a $1 \times 3$ matrix multiplying a $3 \times 1$ matrix, giving a $1 \times 1$ matrix, or just a number, which is precisely what we need). We can form a row vector by taking the transpose of $\vec{A}$, defined by

$$\vec{A}^T = (A_x, A_y, A_z), \tag{31}$$

such that $\vec{A}^T\vec{A} = A_x^2 + A_y^2 + A_z^2 \equiv A^2$ (note that $\vec{A}^T\vec{A} \neq \vec{A}\vec{A}^T$, in general; in the first case we get a number, and in the second we get a $3 \times 3$ matrix). The transpose operation just flips the rows and columns about the diagonal. In the same way, the transpose of the rotation matrix in Eq. (29) is just

$$R^T = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{32}$$

Now, in order for the length of the vector to be an invariant, we need $\vec{A}'^T\vec{A}' = \vec{A}^T\vec{A}$. Plugging in the expression for $\vec{A}'$ from Eq. (30), and remembering that for two matrices $A$ and $B$, $(AB)^T = B^TA^T$,

$$\vec{A}'^T\vec{A}' = \vec{A}^TR^TR\vec{A} = \vec{A}^T\vec{A}.$$

In order for this expression to be true for any vector $\vec{A}$ at all, we need

$$R^TR = 1, \tag{33}$$


where 1 is the unit matrix, consisting of 1 in every entry along the diagonal and zero everywhere else. Eq. (33), which says that $R^T = R^{-1}$, is the orthonormality condition for the rotation matrix, and you can easily check that the $R$ and $R^T$ in Eqs. (29) and (32) satisfy it. Matrices which satisfy Eq. (33) are called orthogonal.

Instead of working in matrix notation, we can work entirely in terms of the components of the vector and matrix, which is often much more convenient. Denoting the components of the vector $\vec{A}$ by $A_i$ (where $i = 1, 2, 3$ labels $x, y, z$), and the entries of the rotation matrix by $R_{ij}$ ($R_{xx} = \cos\phi$, etc.), we can rewrite Eq. (29) as

$$A'_i = \sum_{j=1}^{3} R_{ij}A_j. \tag{34}$$

Eq. (34), which is just Eq. (30) in component form, is, in fact, the definition of a vector. Any object that transforms upon the change of coordinates in this way is a vector. Because the length of the vector is an invariant, we know that

$$\sum_{i=1}^{3} A'_iA'_i = \sum_{i=1}^{3} A_iA_i.$$

Plugging in Eq. (34) for $A'_i$ on the left-hand side gives

$$\sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{k=1}^{3} R_{ij}A_jR_{ik}A_k = \sum_{j=1}^{3}\sum_{k=1}^{3} A_jA_k\sum_{i=1}^{3} R_{ij}R_{ik} = \sum_{i=1}^{3} A_iA_i.$$

Once again, in order for the equality to hold we need the rotation matrices to satisfy an orthonormality condition,

$$\sum_{i=1}^{3} R_{ij}R_{ik} = \delta_{jk}, \tag{35}$$

where $\delta_{jk}$ is the Kronecker delta symbol, defined such that

$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases} \tag{36}$$

This is the component form of the orthonormality condition in Eq. (33), since $R_{ij}R_{ik} = R^T_{ji}R_{ik}$ for proper matrix multiplication.

Now, the length (squared) of the vector becomes

$$\sum_{j=1}^{3}\sum_{k=1}^{3} A_jA_k\sum_{i=1}^{3} R_{ij}R_{ik} = \sum_{j=1}^{3}\sum_{k=1}^{3} A_jA_k\delta_{jk} = \sum_{j=1}^{3} A_jA_j,$$

since the Kronecker delta kills off every term of the summation over $k$ which isn’t $j$; this shows the invariance of the length of the vector under rotations. So, the lesson here is that we can work with either the abstract matrix notation, as in Eq. (30), or with the component notation, as in Eq. (34). It will turn out for our work that the component notation will be more convenient.


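Let’s return to Eq. (34) for a moment. Suppose that the vector $\vec{A}$ was just the displacement vector, $\vec{r}$. Then, rotating the coordinate system about the $z$ axis by an angle $\phi$ gives the components of the new displacement vector $\vec{r}'$ as

$$\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$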

Now, for a given angle, it’s clear that since $x' = x\cos\phi + y\sin\phi$, then $\partial x'/\partial x = \cos\phi$, while $\partial x'/\partial y = \sin\phi$, and so on for the $y'$ derivatives. Thus, the components of the rotation matrix can be written in terms of partial derivatives. In general we can write (letting $x_1 = x$, $x_2 = y$, and $x_3 = z$)

$$A'_i = \sum_{j=1}^{3}\frac{\partial x'_i}{\partial x_j}A_j, \tag{37}$$

which is a more general definition of a vector. We will return to these ideas, generalizingthem to more dimensions, very soon.
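Eq. (37) can be checked by differentiating the rotated coordinates numerically. In this Python sketch (illustrative names; the sample point and angle are arbitrary) a central finite difference estimates $\partial x'_i/\partial x_j$, which should reproduce the rotation matrix entries $R_{ij}$ of Eq. (29) at any point, since the transformation is linear.

```python
import math


def rotated_coords(r, phi):
    """x' = x cos(phi) + y sin(phi), y' = -x sin(phi) + y cos(phi), z' = z."""
    x, y, z = r
    return [x * math.cos(phi) + y * math.sin(phi),
            -x * math.sin(phi) + y * math.cos(phi),
            z]


def jacobian(f, r, h=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d r_j at the point r."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        r_plus, r_minus = list(r), list(r)
        r_plus[j] += h
        r_minus[j] -= h
        f_plus, f_minus = f(r_plus), f(r_minus)
        for i in range(3):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J


phi = 0.7
J = jacobian(lambda r: rotated_coords(r, phi), [0.3, -1.2, 2.0])
print(J[0][0], math.cos(phi))   # d x'/d x  vs  cos(phi)
print(J[0][1], math.sin(phi))   # d x'/d y  vs  sin(phi)
print(J[1][0], -math.sin(phi))  # d y'/d x  vs  -sin(phi)
```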

3.0.1 Spacetime Four Vector.

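Now that we have a good grasp on ordinary vectors in three dimensions (called “three-vectors”), let’s try to generalize the idea to include the time component. Going back to Eq. (11) and rewriting it in matrix notation after a bit of manipulation we find

$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma\frac{v}{c} & 0 & 0 \\ -\gamma\frac{v}{c} & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}. \tag{38}$$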

Eq. (38) is very similar in form to Eq. (29), and so we’ll pursue this analogy further. We define the spacetime four-vector as being the column vector with components $x^\mu$, where $\mu = ct, x, y, z$, or $\mu = 0, 1, 2, 3$, where by 0 we mean $ct$, etc. Instead of writing the abstract four-vector, we’ll now just refer to it in terms of its components, so

$$x^\mu \equiv \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \equiv \begin{pmatrix} ct \\ \vec{r} \end{pmatrix}. \tag{39}$$

As we’ll see, writing the index $\mu$ as a superscript defines the coordinates as a vector, and is not meant to imply raising it to a power. We can further define the Lorentz transformation matrix in terms of its components, and write

$$\Lambda^\mu{}_\nu \equiv \begin{pmatrix} \gamma & -\gamma\frac{v}{c} & 0 & 0 \\ -\gamma\frac{v}{c} & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{40}$$


where we require two indices on the matrix to label the rows and columns. We’ll explain the placement of the indices in a moment. Thus, we can rewrite Eq. (38) completely in component notation as

$$x^{\mu'} = \sum_\nu \Lambda^\mu{}_\nu x^\nu, \tag{41}$$

where the sum over $\nu$ runs over $ct, x, y, z$. Eq. (41) says, for example, that

$$x^{0'} = ct' = \sum_\nu \Lambda^0{}_\nu x^\nu = \Lambda^0{}_0x^0 + \Lambda^0{}_1x^1 + \Lambda^0{}_2x^2 + \Lambda^0{}_3x^3 = \gamma ct - \gamma\frac{v}{c}x,$$

which is precisely correct.

We’ve rewritten the Lorentz transformations in terms of the components, defining the coordinate four-vector in the process. The next thing to consider is the scalar product of four-vectors, generalizing the usual dot product. Remember that the dot product’s defining characteristic was that it preserved the length of the vector in any coordinate system; we want the scalar product of the four-vectors to keep this property. A first guess might be simply to write $\sum_\mu x^\mu x^\mu$, in analogy with the ordinary three-dimensional dot product. In fact, if the time component were zero, then we would need the four-dimensional dot product to reduce to this case. However, recall that we have already defined the invariant quantity associated with space and time, namely the interval, $s^2$, in Eq. (15), which contains a minus sign on the time component. If we just took our interval to be the sum of the squares of the four-vector components then we wouldn’t get back the proper form. Instead let’s define a new vector, much like the transpose of the four-vector, $x_\mu$, with the index written as a subscript,

$$x_\mu = (x_0, x_1, x_2, x_3) = (-ct, x, y, z) \equiv (-ct, \vec{r}), \tag{42}$$

such that

$$s^2 = \sum_\mu x^\mu x_\mu = \sum_\mu x_\mu x^\mu = -c^2t^2 + x^2 + y^2 + z^2 = -c^2t^2 + r^2, \tag{43}$$

which is invariant. We now see the importance of the index placement. Vectors with superscript indices, as in Eq. (39), are called contravariant, or just vectors. Vectors with subscript indices, as in Eq. (42), are called covariant, or one-forms. The product of a vector and a one-form, summing over the common index, is a number. Eq. (43) can be generalized to differential lengths $ds^2$ as

$$ds^2 = \sum_\mu dx^\mu dx_\mu = \sum_\mu dx_\mu dx^\mu = -c^2dt^2 + dx^2 + dy^2 + dz^2. \tag{44}$$

Just as we can generalize the idea of the three-dimensional displacement vector $\vec{r}$ to other vectors (such as velocity, forces, etc.), we can have different four-vectors. We’ll see other specific examples soon, but for now any four-vector is defined as $a^\mu$ such that the components transform under the Lorentz transformations as

$$a^{\mu'} = \sum_\nu \Lambda^\mu{}_\nu a^\nu, \tag{45}$$


while a general one-form transforms as

$$b'_\mu = \sum_\nu \Lambda_\mu{}^\nu b_\nu. \tag{46}$$

Note the placement of the indices in both cases. The index that is being summed over ($\nu$ in this case) is a dummy index, and any (typically Greek) letter can be used, but the free index (not being summed over) has to balance on both sides. Then, the dot product of two four-vectors is

$$\sum_\mu a^\mu b_\mu = \sum_\mu a_\mu b^\mu = -a^0b^0 + a^1b^1 + a^2b^2 + a^3b^3, \tag{47}$$

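and takes the same value in every reference frame. Once again, let’s find the condition on the transformations required by the invariance of the dot product, $a^{\mu'}b'_\mu = a^\mu b_\mu$. From Eqs. (45) and (46) we have

$$\sum_\mu a^{\mu'}b'_\mu = \sum_\mu\sum_\nu\sum_\lambda \Lambda^\mu{}_\nu a^\nu\,\Lambda_\mu{}^\lambda b_\lambda = \sum_\nu\sum_\lambda a^\nu b_\lambda\sum_\mu \Lambda^\mu{}_\nu\Lambda_\mu{}^\lambda.$$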

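In order for the right-hand side to equal $\sum_\nu a^\nu b_\nu$ (remember that the dummy index can be any letter since we’re summing over it, and so this is still $\sum_\mu a^\mu b_\mu$), the Lorentz transformation matrices have to satisfy

$$\sum_\mu \Lambda^\mu{}_\nu\Lambda_\mu{}^\lambda = \delta^\lambda{}_\nu, \tag{48}$$

where $\delta^\lambda{}_\nu$ is the same Kronecker delta we defined before (note the placement of the indices; since the left-hand side has the $\lambda$ upstairs, the delta symbol has to, as well). In matrix notation this says that the array $\Lambda_\mu{}^\lambda$ is the inverse transpose of $\Lambda^\mu{}_\nu$, the analog of the orthogonality condition $R^TR = 1$ of Eq. (33) (for a rotation, the inverse transpose of $R$ is $R$ itself). Plugging Eq. (48) into the dot product expression above kills off the sum over $\lambda$ and sets $\lambda = \nu$, which preserves the dot product. So, we have our four-vectors.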

Now, before continuing we can simplify the notation a bit. In all the expressions involving a sum over an index, for example Eqs. (41), (44), and (48), we see that any time we are summing over an index in an expression, that index appears twice, once upstairs and once downstairs. Since this is always the case, we hardly need to write the sum! So, any time we see the same index written as a superscript and also as a subscript in the same expression we will take it to mean that we should sum over the repeated index. Thus, for any expression

$$\sum_\mu a^\mu b_\mu \equiv a^\mu b_\mu, \tag{49}$$

with the summation implied. This notation was originally suggested by Einstein himself, who joked that he had made a great discovery in mathematics. It definitely simplifies the expressions, giving, for example

$$x^{\mu'} = \Lambda^\mu{}_\nu x^\nu, \qquad ds^2 = dx^\mu dx_\mu,$$

and so on (note that in the first expression, only $\nu$ is summed over). From now on we will follow Einstein’s summation convention.


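Once again, we note that from Eq. (38)

$$\frac{\partial x^{0'}}{\partial x^0} = \Lambda^0{}_0,$$

and so on for the rest of the terms. Thus, the general transformation law for a four-vector, upon a coordinate transformation $x^\mu \to x^{\mu'} = x^{\mu'}(x^\nu)$, may be written

$$a^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\nu}a^\nu, \tag{50}$$

while for a one-form we have

$$b'_\mu = \frac{\partial x^\nu}{\partial x^{\mu'}}b_\nu. \tag{51}$$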

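Taking the dot product of these two expressions gives

$$a^{\mu'}b'_\mu = \frac{\partial x^{\mu'}}{\partial x^\nu}a^\nu\,\frac{\partial x^\rho}{\partial x^{\mu'}}b_\rho = \frac{\partial x^{\mu'}}{\partial x^\nu}\frac{\partial x^\rho}{\partial x^{\mu'}}a^\nu b_\rho,$$

but, by the chain rule,

$$\frac{\partial x^{\mu'}}{\partial x^\nu}\frac{\partial x^\rho}{\partial x^{\mu'}} = \delta^\rho{}_\nu, \tag{52}$$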

(always remember that we are summing over the repeated index, and a superscript in the denominator counts as a subscript in the numerator) and so

$$a^{\mu'}b'_\mu = \delta^\rho{}_\nu a^\nu b_\rho = a^\nu b_\nu = a^\mu b_\mu,$$

(after changing the dummy index at the end) which correctly preserves the dot product. We now have our first example of a four-vector, the spacetime vector in Eq. (39). However, as we have noted, any object that transforms as Eq. (45) is a four-vector. Let’s now try to find other examples.

3.0.2 Energy-Momentum Four-Vector.

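In our discussion of Special Relativity we considered not only space and time, but also energy and momentum. Recalling Eq. (23) with $E = \gamma mc^2$, and $\vec{p} = \gamma m\vec{v}$, we can define the energy-momentum four-vector, $p^\mu$ (which is a vector),

$$p^\mu = \begin{pmatrix} E/c \\ \vec{p} \end{pmatrix} = \begin{pmatrix} \gamma mc \\ \gamma mv_x \\ \gamma mv_y \\ \gamma mv_z \end{pmatrix}. \tag{53}$$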

Let’s first check that the dot product of the vector is an invariant,

$$p^\mu p_\mu = -\frac{E^2}{c^2} + \vec{p}\cdot\vec{p} = -\frac{1}{c^2}\left(E^2 - p^2c^2\right).$$


But, from Eq. (23) the term inside the parentheses gives $m^2c^4$, which tells us that

$$p^\mu p_\mu = -m^2c^2, \tag{54}$$

which is, indeed, an invariant. Taking Eq. (53) as a four-vector immediately tells us how the energy and momentum transform under a Lorentz transformation,

$$p^{\mu'} = \Lambda^\mu{}_\nu p^\nu, \tag{55}$$

as is true for any four-vector.

We can see the usefulness of this four-vector by reconsidering the problem described in Fig. 6, using four-vectors to solve it instead. The initial four-momentum of the system is just the sum of the four-momenta of each particle, so $p^\mu_i = p^\mu_1 + p^\mu_2$, while the final four-momentum for the collection of particles is just $p^\mu_f$. Energy and momentum conservation tells us that

$$p^\mu_1 + p^\mu_2 = p^\mu_f.$$

Now, let’s square both sides (i.e., dot each side with the one-form version of itself), which gives

$$p^\mu_1p_{1\mu} + p^\mu_2p_{2\mu} + 2p^\mu_1p_{2\mu} = p^\mu_fp_{f\mu}.$$

Now, we know that $p^\mu_1p_{1\mu} = p^\mu_2p_{2\mu} = -m^2c^2$, while $p^\mu_fp_{f\mu} = -E_f^2/c^2 = -16m^2c^2$, since at threshold the products are at rest in the center of mass frame, where the energy of the final system is just $4mc^2$. So, we find

$$p^\mu_1p_{2\mu} = -7m^2c^2.$$

We have to evaluate the dot product on the left-hand side. To do so, we can apply the very useful property of the dot product of four-vectors, which is that we can evaluate it in any reference frame! In particular, in the lab frame where the target proton is at rest, $p^\mu_1 = (E/c, \vec{p})$ and $p^\mu_2 = (mc, \vec{0})$, and so

$$p^\mu_1p_{2\mu} = -mE = -7m^2c^2,$$

which gives a total energy of $E = 7mc^2$, for a kinetic energy of $KE = 6mc^2$, just as before, but with less work! So, we see that the invariance of the dot product is a very handy property.
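The invariant method is just as easy to check. The Python sketch below (with $m = c = 1$ and illustrative names) builds the two proton four-momenta in the lab frame with $E = 7mc^2$, evaluates the dot products used above, and then repeats $p^\mu_1 p_{2\mu}$ in the center of mass frame with $\gamma_{\rm cm} = 2$ to show that the frame does not matter.

```python
import math


def minkowski_dot(p, q):
    """p^mu q_mu with the sign convention of Eq. (47)."""
    return -p[0] * q[0] + p[1] * q[1] + p[2] * q[2] + p[3] * q[3]


m = c = 1.0  # proton mass and speed of light set to 1

# Lab frame: target at rest, projectile with total energy E = 7 m c^2
E = 7.0 * m * c**2
p_mag = math.sqrt(E**2 - (m * c**2) ** 2) / c      # |p| from Eq. (23)
p1_lab = [E / c, p_mag, 0.0, 0.0]
p2_lab = [m * c, 0.0, 0.0, 0.0]
total = [a + b for a, b in zip(p1_lab, p2_lab)]
print(minkowski_dot(p1_lab, p2_lab))   # -7 m^2 c^2
print(minkowski_dot(total, total))     # approximately -16 m^2 c^2 = -(4 m c)^2

# Centre of mass frame: each proton has gamma_cm = 2, moving in opposite directions
g = 2.0
beta = math.sqrt(1.0 - 1.0 / g**2)
p1_cm = [g * m * c, g * m * beta * c, 0.0, 0.0]
p2_cm = [g * m * c, -g * m * beta * c, 0.0, 0.0]
print(minkowski_dot(p1_cm, p2_cm))     # approximately -7 m^2 c^2 again
```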

We can determine two more useful four-vectors which are related to the energy-momentumfour-vector. First, notice that the energy-momentum tensor in Eq. (53) involves the restmass of the particle. Suppose we divide through by the mass, defining a new four-vector,

Uµ ≡ 1

mpµ. (56)

Remembering that the three-momentum is the mass times the velocity of the particle, we seethat Eq. (56) is the relativistic generalization of the ordinary velocity, and is the four-velocityof the particle. In particular, note that, from Eq.(54)

$$U^\mu U_\mu = -c^2, \tag{57}$$

which is clearly an invariant.
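A quick numerical check of Eq. (57) follows. This sketch assumes the standard relativistic momentum $\vec p = \gamma m\vec v$ and energy $E = \gamma mc^2$, so that $U^\mu = \gamma(c, \vec v)$; the helper names are illustrative only. It confirms that $U^\mu U_\mu / c^2 = -1$ for several velocities.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def four_velocity(v):
    """U^mu = p^mu / m = gamma * (c, vx, vy, vz) for a three-velocity v in m/s."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / C**2)
    return (gamma * C,) + tuple(gamma * vi for vi in v)

def minkowski_dot(a, b):
    """a^mu b_mu with signature (-,+,+,+)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

# Eq. (57): U^mu U_mu / c^2 should be -1 for every sub-luminal velocity
for v in [(0.0, 0.0, 0.0), (0.5 * C, 0.0, 0.0), (0.3 * C, 0.4 * C, 0.6 * C)]:
    U = four_velocity(v)
    print(v, minkowski_dot(U, U) / C**2)
```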


To get the other interesting four-vector, recall that the energy of light is related to its frequency, $E = h\nu = \hbar\omega$, while the momentum is related to the wavelength, $p = h/\lambda = \hbar k$, where $k$ is the wave number. Thus, we can define the wave number four-vector, $k^\mu$, such that

$$k^\mu = \begin{pmatrix} \omega/c \\ \vec{k} \end{pmatrix} = \frac{1}{\hbar}p^\mu, \tag{58}$$

which transforms as does any four-vector; in particular, we can use the four-vector nature of $k^\mu$ to work out the expression for the Doppler shift.
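As an illustration of that remark, here is a minimal sketch that boosts $k^\mu$ and reads off the new frequency as $\omega' = c\,k'^0$. Since Eq. (40) is not reproduced here, the sketch assumes the standard boost along $x$, $ct' = \gamma(ct - \beta x)$, $x' = \gamma(x - \beta ct)$, and units with $c = 1$. A positive $\beta$ means the observer moves along the direction of propagation (away from the source), which should give the familiar redshift factor $\sqrt{(1-\beta)/(1+\beta)}$; the function names are hypothetical.

```python
import math

def boost_x(beta):
    """Assumed standard boost along x, acting on (ct, x, y, z)-type components."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(L, v):
    """Four-vector law v'^mu = L^mu_nu v^nu."""
    return [sum(L[m][n] * v[n] for n in range(4)) for m in range(4)]

def doppler_omega(omega, beta, c=1.0):
    """Angular frequency seen by an observer moving at beta*c along +x, for light
    travelling along +x with angular frequency omega in the source frame."""
    k = [omega / c, omega / c, 0.0, 0.0]  # k^mu = (omega/c, k_vec), |k_vec| = omega/c
    return c * transform(boost_x(beta), k)[0]

omega, beta = 1.0, 0.6
print(doppler_omega(omega, beta), omega * math.sqrt((1 - beta) / (1 + beta)))
```

The two printed numbers should agree (both 0.5 up to rounding); a negative $\beta$ gives the blueshift for an approaching observer.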

3.0.3 Electromagnetic Potential Four Vector.

We still aren’t done with four-vectors. In particular, we can define two more very useful ones. Let’s go back to the Maxwell equations in Eqs. (1). Since the divergence of the magnetic field is zero, we know that $\vec{B}$ can be written as the curl of some vector, $\vec{A}$,

$$\vec{B} = \nabla\times\vec{A}, \tag{59}$$

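where $\vec{A}$ is called the vector potential. Now, plugging this result into Faraday’s law (the second of Maxwell’s equations) gives

$$\nabla\times\vec{E} = -\frac{\partial}{\partial t}\left(\nabla\times\vec{A}\right) \quad\Rightarrow\quad \nabla\times\left(\vec{E} + \frac{\partial\vec{A}}{\partial t}\right) = 0.$$

Any vector whose curl vanishes can be written as the gradient of some scalar, so we can write the electric field as

$$\vec{E} = -\nabla V - \frac{\partial\vec{A}}{\partial t}, \tag{60}$$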

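where $V$ is the scalar potential (we have chosen a minus sign on $V$ by convention). So, if we can determine these potentials we immediately know the fields. To determine the equations for the potentials we plug our expressions for $\vec{E}$ and $\vec{B}$ back into the two remaining Maxwell’s equations, which give

$$\nabla^2 V + \frac{\partial}{\partial t}\left(\nabla\cdot\vec{A}\right) = -\frac{1}{\varepsilon_0}\rho,$$

$$\left(\nabla^2\vec{A} - \frac{1}{c^2}\frac{\partial^2\vec{A}}{\partial t^2}\right) - \nabla\left(\nabla\cdot\vec{A} + \frac{1}{c^2}\frac{\partial V}{\partial t}\right) = -\mu_0\vec{J}.$$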

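We can rewrite the first equation to bring it closer to the second by adding and subtracting $\frac{1}{c^2}\frac{\partial^2 V}{\partial t^2}$ to get

$$\left(\nabla^2 V - \frac{1}{c^2}\frac{\partial^2 V}{\partial t^2}\right) + \frac{\partial}{\partial t}\left(\nabla\cdot\vec{A} + \frac{1}{c^2}\frac{\partial V}{\partial t}\right) = -\frac{1}{\varepsilon_0}\rho.$$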

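Finally, recalling that $\varepsilon_0^{-1} = \mu_0c^2$, and dividing everything through by $c$ gives

$$\left(\nabla^2\frac{V}{c} - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\frac{V}{c}\right) + \frac{\partial}{\partial(ct)}\left(\nabla\cdot\vec{A} + \frac{1}{c^2}\frac{\partial V}{\partial t}\right) = -\mu_0\,(c\rho).$$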


This form of the equation looks very similar to the equation for $\vec{A}$, except for the time derivative in place of the gradient term. Let’s ignore this for now, and suppose that we can put together the charge and current densities into a single four-vector,

$$J^\mu = \begin{pmatrix} c\rho \\ \vec{J} \end{pmatrix}, \tag{61}$$

which then suggests that the vector and scalar potentials should be joined into a single electromagnetic potential,

$$A^\mu = \begin{pmatrix} V/c \\ \vec{A} \end{pmatrix}. \tag{62}$$

If we accept these as four-vectors, then how should we look at the Maxwell equations determining them? The two equations are almost the same, but one involves a gradient (derivatives over space), while the other involves a time derivative. However, we know that time and space are related, so perhaps we could blend together the space and time derivatives into a single spacetime derivative. We define the spacetime derivative of a function $f(x^\mu)$ as

$$\frac{\partial}{\partial x^\mu}f(x^\mu) \equiv \partial_\mu f \equiv \left(\frac{1}{c}\frac{\partial f}{\partial t}, \nabla f\right), \tag{63}$$

where we have introduced a very handy notation writing the derivative as a one-form. With this definition we can write

$$\partial_\mu J^\mu = \frac{1}{c}\frac{\partial}{\partial t}(c\rho) + \nabla\cdot\vec{J} = \frac{\partial\rho}{\partial t} + \nabla\cdot\vec{J}.$$

But we know that $\partial_t\rho + \nabla\cdot\vec{J} = 0$ from the equation of continuity, which is just conservation of charge: it states that any change in charge in some region can only happen by charge flowing into or out of that region. Thus, we find that $\partial_\mu J^\mu = 0$, which is clearly an invariant, as we know it has to be.
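To see $\partial_\mu J^\mu = 0$ in a concrete case, the following sketch takes a hypothetical Gaussian cloud of charge drifting rigidly along $x$ at speed $V$ (so $\vec J = \rho V\hat{x}$), works in $1+1$ dimensions with $c = 1$, and evaluates the four-divergence with central finite differences. The step size and all names are assumptions of the sketch. The two individual terms are nonzero, but they cancel.

```python
import math

C = 1.0  # units with c = 1 (assumption of this sketch)
V = 0.3  # hypothetical drift speed of the charge cloud along x

def rho(t, x):
    """Gaussian charge density translating rigidly at speed V (1+1 dimensions)."""
    return math.exp(-(x - V * t) ** 2)

def current(t, x):
    """Four-current components (J^0, J^x) = (c rho, rho V)."""
    return (C * rho(t, x), V * rho(t, x))

def four_divergence(t, x, h=1e-5):
    """d_mu J^mu = (1/c) d(J^0)/dt + d(J^x)/dx, by central differences."""
    d0 = (current(t + h, x)[0] - current(t - h, x)[0]) / (2 * h) / C
    dx = (current(t, x + h)[1] - current(t, x - h)[1]) / (2 * h)
    return d0, dx, d0 + dx

d0, dx, total = four_divergence(0.7, 1.0)
print(d0, dx, total)  # the two terms are equal and opposite; total is ~0
```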

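For the potential, $A^\mu$, we have

$$\partial_\mu A^\mu = \frac{1}{c}\frac{\partial}{\partial t}\left(\frac{V}{c}\right) + \nabla\cdot\vec{A} = \frac{1}{c^2}\frac{\partial V}{\partial t} + \nabla\cdot\vec{A},$$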

which appears in the Maxwell equations. What this term actually equals is up to us and can be chosen at will (though we know it has to be an invariant); we’ll return to this below. What about the other term in Maxwell’s equations, involving the wave equation? In this case we’re taking a second derivative, which suggests that we apply the derivative operator in Eq. (63) twice using the dot product. So, we need to define a four-vector analog of Eq. (63), which will just change the sign of the first term,

$$\partial^\mu f \equiv \frac{\partial}{\partial x_\mu}f \equiv \left(-\frac{1}{c}\frac{\partial f}{\partial t}, \nabla f\right). \tag{64}$$

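Thus, we can write the second derivative of a function as

$$\partial^\mu\partial_\mu f = \partial_\mu\partial^\mu f = \left(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\right)f \equiv \Box f, \tag{65}$$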


where $\Box$ is the d’Alembertian defined before. So, we can rewrite the two Maxwell equations all in one equation as

$$\Box A^\mu - \partial^\mu\left(\partial_\nu A^\nu\right) = -\mu_0 J^\mu, \tag{66}$$

which looks very much simpler (though, of course, it contains no new information). This adds several more four-vectors to our array; four-vectors seem to be everywhere!
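The d’Alembertian of Eq. (65) can be checked numerically. This sketch works in $1+1$ dimensions with $c = 1$ and an assumed finite-difference step, and applies $\Box$ to plane waves $\cos(kx - \omega t)$: the result vanishes when $\omega = ck$, and otherwise equals $(\omega^2/c^2 - k^2)$ times the wave. The wave functions are hypothetical examples.

```python
import math

C = 1.0  # units with c = 1 (assumption of this sketch)

def box(f, t, x, h=1e-3):
    """Finite-difference d'Alembertian in 1+1 D: d2f/dx2 - (1/c^2) d2f/dt2."""
    f_xx = (f(t, x + h) - 2 * f(t, x) + f(t, x - h)) / h**2
    f_tt = (f(t + h, x) - 2 * f(t, x) + f(t - h, x)) / h**2
    return f_xx - f_tt / C**2

k = 2.0

def light_wave(t, x):
    """Plane wave with omega = c k."""
    return math.cos(k * x - C * k * t)

def mismatched_wave(t, x):
    """Plane wave with omega = 1.5 c k, so box f = (omega^2/c^2 - k^2) f = 5 f."""
    return math.cos(k * x - 1.5 * C * k * t)

print(box(light_wave, 0.3, 0.1))                           # close to 0
print(box(mismatched_wave, 0.3, 0.1), 5 * math.cos(-0.7))  # nearly equal
```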

The plethora of four-vectors introduced above might suggest that any ordinary three-dimensional vector could be made into a four-vector simply by finding a time component to add onto the spatial components. However, this is not the case; the electric field, for example, can’t be viewed as part of a larger vector. We already know that the electric and magnetic fields really belong together in a single electromagnetic field, which is not a four-vector. To figure out what the electromagnetic field is, we need to generalize the idea of vectors to introduce the concept of tensors.

3.0.4 Tensors.

In our study we have so far discussed two types of objects, scalars and four-vectors. What distinguishes one object from another is how that object transforms under a change of coordinate system. Suppose that we make a coordinate transformation, letting $x^\mu \to x^{\mu'}(x^\mu)$, where the new coordinates are functions of the old coordinates (this generalizes the coordinate axis rotation seen before). Scalar quantities are invariant under our change of coordinate system. Suppose we have a scalar quantity, $S(x^\mu)$, depending on the coordinates, $x^\mu$. Then, under the coordinate transformation we get a new quantity $S'(x^{\mu'})$, which must satisfy the invariance

$$S'\left(x^{\mu'}\right) = S\left(x^\mu\right). \tag{67}$$

This expression says that the physical quantity, $S$, cannot be changed by a coordinate transformation (for example, an object’s mass doesn’t change simply because we switch to polar coordinates). The form of the scalar may look different, but evaluating it at the corresponding physical point must return the same answer; such equations are called covariant. As a simple example, in Cartesian coordinates the electrostatic potential of a point charge $q$ is

$$U(x, y, z) = \frac{q}{4\pi\varepsilon_0\sqrt{x^2 + y^2 + z^2}}.$$

Making the coordinate transformations

$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta$$

leads to a much simpler form,

$$U(r) = \frac{q}{4\pi\varepsilon_0 r}.$$

Evaluating the potential at the point $(x_0, y_0, z_0)$ must return the same value as evaluating it in the new system at $r_0 = \sqrt{x_0^2 + y_0^2 + z_0^2}$. The answer must be the same since the electrostatic potential is a scalar.
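A short numerical illustration of this point, using the SI value of $\varepsilon_0$ and a hypothetical charge and field point: the Cartesian and spherical forms look different, but they agree when evaluated at the same physical point. The function names are illustrative only.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m (CODATA 2018)
Q = 1.0e-9               # hypothetical point charge, C

def potential_cartesian(x, y, z):
    return Q / (4 * math.pi * EPS0 * math.sqrt(x * x + y * y + z * z))

def potential_spherical(r, theta, phi):
    # theta and phi do not appear: the spherical form depends on r only
    return Q / (4 * math.pi * EPS0 * r)

def cartesian_to_spherical(x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.acos(z / r), math.atan2(y, x)

point = (0.3, -1.2, 0.5)  # metres, arbitrary field point
r, theta, phi = cartesian_to_spherical(*point)
# Map back with x = r sin(theta) cos(phi), etc., to confirm it is the same point
back = (r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta))
print(potential_cartesian(*point), potential_spherical(r, theta, phi))
```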


We have also encountered four-vectors, generalizing the three-dimensional vectors (three-vectors) familiar from elementary physics. Four-vectors have a different transformation law than scalars, as seen in Eq. (50). Under the transformation $x^\mu \to x^{\mu'}$ the four-vector $a^\mu$ changes to

$$a^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\nu}a^\nu,$$

while a one-form transforms as in Eq. (51),

$$b'_{\mu} = \frac{\partial x^\nu}{\partial x^{\mu'}}b_\nu,$$

such that $a^{\mu'}b'_{\mu} = a^\mu b_\mu$ is a scalar quantity, independent of the coordinate system used. These transformations define the four-vector, as we’ve discussed. But what about the product of four-vectors? How does that transform? Suppose we have two four-vectors, $a^\mu$ and $b^\nu$, and multiply them together, forming $a^\mu b^\nu$. Then, under a coordinate transformation, we find

$$a^{\mu'}b^{\nu'} = \left(\frac{\partial x^{\mu'}}{\partial x^\rho}a^\rho\right)\left(\frac{\partial x^{\nu'}}{\partial x^\lambda}b^\lambda\right) = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^{\nu'}}{\partial x^\lambda}a^\rho b^\lambda.$$

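This is a new transformation, and so $a^\mu b^\nu$ must be a new object. In particular, we could define a new quantity $C^{\mu\nu} \equiv a^\mu b^\nu$. Then, the above equation would say that $C^{\mu\nu}$ transforms as

$$C^{\mu'\nu'} = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^{\nu'}}{\partial x^\lambda}C^{\rho\lambda}. \tag{68}$$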

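Suppose, instead, we formed the product $a^\mu b_\nu$ of a vector and a one-form. Then this product would transform as

$$a^{\mu'}b'_{\nu} = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^\lambda}{\partial x^{\nu'}}a^\rho b_\lambda,$$

such that we could again define a new quantity $C^\mu{}_\nu \equiv a^\mu b_\nu$, which would transform as

$$C^{\mu'}{}_{\nu'} = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^\lambda}{\partial x^{\nu'}}C^\rho{}_\lambda. \tag{69}$$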

The process could be generalized to any number of indices, either up or downstairs. The quantities in Eqs. (68) and (69) are called tensors of rank two, since they have two indices. The tensor in Eq. (68) is a twice contravariant tensor (sometimes called a (2, 0) tensor), while that in Eq. (69) is a mixed tensor, once contravariant and once covariant (often called a (1, 1) tensor). Tensors generalize scalars and vectors; a tensor of rank zero (meaning no indices) is a scalar, while a tensor of rank one (one index) is a vector. The rank can go as high as you like.
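The transformation laws (68) and (69) can be checked numerically for a Lorentz boost. This sketch assumes the standard boost along $x$ (with $c = 1$) for $\partial x^{\mu'}/\partial x^\nu$ and the inverse boost for $\partial x^\lambda/\partial x^{\nu'}$, and picks arbitrary components for $a^\mu$, $b^\nu$ and the one-form $w_\nu$. It confirms that transforming the product agrees with the product of the separately transformed factors; the helper names are hypothetical.

```python
import math

def boost_x(beta):
    """Assumed standard boost along x on (ct, x, y, z); plays the role of dx'^mu/dx^nu."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply_vector(L, a):
    """a'^mu = L^mu_nu a^nu."""
    return [sum(L[m][n] * a[n] for n in range(4)) for m in range(4)]

def apply_oneform(Linv, w):
    """w'_nu = (L^-1)^lambda_nu w_lambda, i.e. (dx^lambda/dx'^nu) w_lambda."""
    return [sum(Linv[l][n] * w[l] for l in range(4)) for n in range(4)]

def outer(a, b):
    return [[a[m] * b[n] for n in range(4)] for m in range(4)]

def transform_20(L, C):
    """Eq. (68): C'^{mu nu} = L^mu_rho L^nu_lambda C^{rho lambda}."""
    return [[sum(L[m][r] * L[n][l] * C[r][l] for r in range(4) for l in range(4))
             for n in range(4)] for m in range(4)]

def transform_11(L, Linv, C):
    """Eq. (69): C'^mu_nu = L^mu_rho (L^-1)^lambda_nu C^rho_lambda."""
    return [[sum(L[m][r] * Linv[l][n] * C[r][l] for r in range(4) for l in range(4))
             for n in range(4)] for m in range(4)]

def max_diff(A, B):
    return max(abs(A[m][n] - B[m][n]) for m in range(4) for n in range(4))

L, Linv = boost_x(0.6), boost_x(-0.6)  # a boost and its inverse
a = [1.0, 2.0, -0.5, 3.0]              # arbitrary four-vector a^mu
b = [0.4, -1.0, 2.5, 0.2]              # arbitrary four-vector b^nu
w = [0.7, 1.5, -0.3, 1.0]              # arbitrary one-form w_nu

diff_20 = max_diff(transform_20(L, outer(a, b)), outer(apply_vector(L, a), apply_vector(L, b)))
diff_11 = max_diff(transform_11(L, Linv, outer(a, w)),
                   outer(apply_vector(L, a), apply_oneform(Linv, w)))
print(diff_20, diff_11)  # both at the level of floating-point rounding
```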

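We have, in fact, already encountered tensors of rank two. One example is the four-dimensional Kronecker delta, seen for example in Eq. (52). The Kronecker delta symbol has the very nice property that it is always the same, no matter what the transformation, as we can easily check. Under a transformation, $\delta^\mu{}_\nu \to \delta^{\mu'}{}_{\nu'}$, where

$$\delta^{\mu'}{}_{\nu'} = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^\lambda}{\partial x^{\nu'}}\delta^\rho{}_\lambda = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^\rho}{\partial x^{\nu'}} = \frac{\partial x^{\mu'}}{\partial x^{\nu'}} = \delta^{\mu'}{}_{\nu'},$$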


always. This means that $\delta^\mu{}_\nu$ always has the same form (with ones along the diagonal and zeros everywhere else).
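A numerical version of this check, as a sketch: it assumes a transformation built from a boost along $x$ (with $c = 1$) followed by a rotation about $z$, so that the Jacobian is not symmetric, and applies the mixed-tensor law of Eq. (69) to $\delta^\rho{}_\lambda$. The result should reproduce the unit matrix to rounding error.

```python
import math

def boost_x(beta):
    """Assumed standard boost along x on (ct, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(angle):
    cs, sn = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, cs, -sn, 0.0],
            [0.0, sn, cs, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

# Jacobian dx'^mu/dx^nu (boost, then rotation) and its inverse dx^lambda/dx'^nu
J = matmul(rotation_z(0.4), boost_x(0.6))
J_inv = matmul(boost_x(-0.6), rotation_z(-0.4))

delta = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# delta'^mu_nu = (dx'^mu/dx^rho)(dx^lambda/dx'^nu) delta^rho_lambda
delta_prime = [[sum(J[m][r] * J_inv[l][n] * delta[r][l] for r in range(4) for l in range(4))
                for n in range(4)] for m in range(4)]

for row in delta_prime:
    print([round(x, 12) for x in row])
```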

The next rank-2 tensor that we encountered is the Lorentz transformation matrix itself, $\Lambda^\mu{}_\nu$, as in Eq. (40). Notice that if the change in coordinate system is simply a Lorentz transformation, then

$$\frac{\partial x^{\mu'}}{\partial x^\nu} = \Lambda^\mu{}_\nu,$$

and we just get back the Lorentz transformation tensor. In particular, under a Lorentz transformation, the electromagnetic potential in Eq. (62) becomes

$$A^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\nu}A^\nu = \Lambda^\mu{}_\nu A^\nu,$$

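mixing together the scalar and vector potentials.

We can construct tensors of lower rank from higher ones by a process called contraction, which sets two indices (one upstairs and one downstairs, of course) equal to each other. For example, suppose we have a mixed tensor $T^\mu{}_\nu$. Then, setting $\mu = \nu$ (and summing over the repeated index, as always) we end up with a scalar, as we can easily check by looking at the transformation. In general, $T^\mu{}_\nu \to T^{\mu'}{}_{\nu'}$, where

$$T^{\mu'}{}_{\nu'} = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^\lambda}{\partial x^{\nu'}}T^\rho{}_\lambda.$$

Then, upon setting $\mu = \nu$ we find (using the chain rule)

$$T^{\mu'}{}_{\mu'} = \frac{\partial x^{\mu'}}{\partial x^\rho}\frac{\partial x^\lambda}{\partial x^{\mu'}}T^\rho{}_\lambda = \frac{\partial x^\lambda}{\partial x^\rho}T^\rho{}_\lambda = \delta^\lambda{}_\rho T^\rho{}_\lambda = T^\lambda{}_\lambda,$$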

which doesn’t transform at all. So, by contraction, we have reduced the rank by two (changing a rank-2 tensor into a scalar). This will end up being very useful in our later work.
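The following sketch contrasts the contraction of a mixed tensor with a naive sum over the diagonal of a $(2,0)$ tensor, using an assumed standard boost along $x$ (with $c = 1$) and arbitrary components for $T$. Only the contraction of one upper with one lower index survives the transformation unchanged; the helper names are hypothetical.

```python
import math

def boost_x(beta):
    """Assumed standard boost along x on (ct, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_11(L, Linv, T):
    """T'^mu_nu = L^mu_rho (L^-1)^lambda_nu T^rho_lambda (mixed tensor)."""
    return [[sum(L[m][r] * Linv[l][n] * T[r][l] for r in range(4) for l in range(4))
             for n in range(4)] for m in range(4)]

def transform_20(L, T):
    """T'^{mu nu} = L^mu_rho L^nu_lambda T^{rho lambda} (twice-contravariant tensor)."""
    return [[sum(L[m][r] * L[n][l] * T[r][l] for r in range(4) for l in range(4))
             for n in range(4)] for m in range(4)]

def diagonal_sum(T):
    return sum(T[m][m] for m in range(4))

L, Linv = boost_x(0.6), boost_x(-0.6)
T = [[1.0, 2.0, 0.0, -1.0],
     [0.5, -3.0, 4.0, 0.0],
     [2.0, 1.0, 0.5, 1.5],
     [0.0, -2.0, 1.0, 2.0]]  # arbitrary components

# Contraction T^mu_mu of a mixed tensor: the same in both frames
print(diagonal_sum(T), diagonal_sum(transform_11(L, Linv, T)))
# Summing the diagonal of a (2,0) tensor is not a contraction, and is not invariant
print(diagonal_sum(T), diagonal_sum(transform_20(L, T)))
```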

Returning to the electromagnetic field, we are now in a position to see how to combine the electric and magnetic fields into a single quantity. Recall the definitions of the electric and magnetic fields in terms of the scalar and vector potentials in Eqs. (59) and (60),

$$\vec{B} = \nabla\times\vec{A}, \qquad \vec{E} = -\nabla V - \frac{\partial\vec{A}}{\partial t}.$$

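Now, we have combined the scalar and vector potentials into a single four-vector, $A^\mu$. Consideration of the above fields suggests that we should take derivatives of $A^\mu$. Which derivatives to take, though? Since we are mixing space and time together via the Lorentz transformations, it’s clear that we should take the spacetime derivative ($\partial_\mu$ or $\partial^\mu$). Let’s look at the electric field first, and consider $E_x$. Writing $\partial_t \equiv \partial/\partial(ct)$ for the time component of Eq. (63) and $A^t = V/c$ for the time component of $A^\mu$,

$$E_x = -\frac{\partial V}{\partial x} - \frac{\partial A_x}{\partial t} = -c\frac{\partial}{\partial x}\left(\frac{V}{c}\right) - c\frac{\partial}{\partial(ct)}A^x = -c\frac{\partial A^t}{\partial x} - c\frac{\partial}{\partial(ct)}A^x = -c\,\partial_x A^t - c\,\partial_t A^x.$$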


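We can rewrite this a bit more symmetrically by replacing $\partial_t \to -\partial^t$ and $\partial_x \to \partial^x$ to find

$$E_x = -c\,\partial^x A^t + c\,\partial^t A^x = c\left(\partial^t A^x - \partial^x A^t\right),$$

which has a nice symmetry. So, it seems that we shouldn’t take only a single derivative, but should subtract one derivative from another. Suppose we define a rank-2 tensor, $F^{\mu\nu}$, such that

$$F^{\mu\nu} \equiv \partial^\mu A^\nu - \partial^\nu A^\mu. \tag{70}$$

Then,

$$F^{tx} = \partial^t A^x - \partial^x A^t = \frac{E_x}{c}.$$

Similarly,

$$F^{ty} = \partial^t A^y - \partial^y A^t = \frac{E_y}{c}, \qquad F^{tz} = \partial^t A^z - \partial^z A^t = \frac{E_z}{c}.$$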

This is very nice so far, but what about the magnetic field? If it doesn’t fit into Eq. (70), then it isn’t much use to us. Let’s look at $B_x = \partial_y A_z - \partial_z A_y = \partial^y A^z - \partial^z A^y$, which is just $F^{yz}$. Similarly, $F^{xy} = B_z$ and $F^{zx} = B_y$. Thus, we can write (noting that $F^{\mu\nu}$ is antisymmetric, such that $F^{\mu\nu} = -F^{\nu\mu}$)

$$F^{\mu\nu} = \begin{pmatrix} 0 & E_x/c & E_y/c & E_z/c \\ -E_x/c & 0 & B_z & -B_y \\ -E_y/c & -B_z & 0 & B_x \\ -E_z/c & B_y & -B_x & 0 \end{pmatrix}. \tag{71}$$

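$F^{\mu\nu}$ is the electromagnetic field tensor, encoding the electric and magnetic fields into a single rank-2 tensor. In this form, the two remaining Maxwell’s equations take on a particularly simple form. Suppose we take the derivative of Eq. (70),

$$\partial_\mu F^{\mu\nu} = \partial_\mu\left(\partial^\mu A^\nu - \partial^\nu A^\mu\right) = \partial_\mu\partial^\mu A^\nu - \partial^\nu\left(\partial_\mu A^\mu\right) = \Box A^\nu - \partial^\nu\left(\partial_\mu A^\mu\right) = -\mu_0 J^\nu,$$

from Eq. (66). So

$$\partial_\mu F^{\mu\nu} = -\mu_0 J^\nu. \tag{72}$$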

Once again, Eq. (72) contains no new information, but is written in a very simple form which makes its transformation properties obvious. Notice that if we take one more derivative,

$$\partial_\nu\partial_\mu F^{\mu\nu} \equiv 0, \tag{73}$$

since the symmetric combination $\partial_\nu\partial_\mu$ is contracted with the antisymmetric field tensor. Applying $\partial_\nu$ to Eq. (72) then requires $\partial_\nu J^\nu = 0$, the vanishing of the four-divergence of the current density, as you can easily check. We have seen the usefulness of tensors, and many of their interesting properties. Tensors will be absolutely crucial in every aspect of our work later.
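As a sketch of how Eq. (71) is used, the code below builds $F^{\mu\nu}$ from chosen $\vec E$ and $\vec B$, transforms it with the rank-2 law of Eq. (68) under an assumed standard boost along $x$ (units with $c = 1$), and reads the new fields back out. It also evaluates the contraction $F_{\mu\nu}F^{\mu\nu} = 2(B^2 - E^2/c^2)$, which should be unchanged, and shows that contracting $F^{\mu\nu}$ with a symmetric array such as $k_\mu k_\nu$ (a stand-in for $\partial_\mu\partial_\nu$ acting on a plane wave) gives zero, the same antisymmetry argument behind Eq. (73). Field values and names are illustrative.

```python
import math

C = 1.0  # units with c = 1 (assumption of this sketch)

def field_tensor(E, B):
    """F^{mu nu} laid out as in Eq. (71)."""
    ex, ey, ez = (e / C for e in E)
    bx, by, bz = B
    return [[0.0, ex, ey, ez],
            [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx],
            [-ez, by, -bx, 0.0]]

def fields_from_tensor(F):
    """Read E and B back: F^{0i} = E_i/c, F^{yz} = Bx, F^{zx} = By, F^{xy} = Bz."""
    E = (C * F[0][1], C * F[0][2], C * F[0][3])
    B = (F[2][3], F[3][1], F[1][2])
    return E, B

def boost_x(beta):
    """Assumed standard boost along x acting on (ct, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_20(L, F):
    """Rank-2 law, Eq. (68): F'^{mu nu} = L^mu_rho L^nu_lambda F^{rho lambda}."""
    return [[sum(L[m][r] * L[n][l] * F[r][l] for r in range(4) for l in range(4))
             for n in range(4)] for m in range(4)]

def invariant(F):
    """F_{mu nu} F^{mu nu} with eta = diag(-1, 1, 1, 1); equals 2 (B^2 - E^2/c^2)."""
    eta = [-1.0, 1.0, 1.0, 1.0]
    return sum(eta[m] * eta[n] * F[m][n] ** 2 for m in range(4) for n in range(4))

E, B = (0.3, 0.5, -0.2), (0.1, -0.4, 0.6)  # arbitrary field values
beta = 0.6
F = field_tensor(E, B)
F_prime = transform_20(boost_x(beta), F)
E_p, B_p = fields_from_tensor(F_prime)
print("E' =", E_p)
print("B' =", B_p)
print("invariant before/after:", invariant(F), invariant(F_prime))

# Contracting the antisymmetric F with a symmetric k_mu k_nu gives zero (cf. Eq. 73)
k = [0.7, -1.1, 0.4, 2.0]
antisym_check = sum(k[m] * k[n] * F_prime[m][n] for m in range(4) for n in range(4))
print(antisym_check)
```

With $\beta = 0.6$ ($\gamma = 1.25$) the printed components should match the standard results $E'_x = E_x$, $E'_y = \gamma(E_y - vB_z)$, $E'_z = \gamma(E_z + vB_y)$, $B'_x = B_x$, $B'_y = \gamma(B_y + vE_z/c^2)$, $B'_z = \gamma(B_z - vE_y/c^2)$ to rounding.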


3.0.5 The Minkowski Metric.

There is one more tensor that we need to introduce. We have seen two different types of vectors, the ordinary contravariant vector and the covariant one-form. It would be very nice to be able to change one into the other. This amounts to changing the sign of the first (time) component, which we could do by multiplying the vector by a matrix. We can introduce the matrix $\eta_{\mu\nu}$, defined as

$$\eta_{\mu\nu} \equiv \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{74}$$

Then, multiplying this matrix by the four-vector changes the sign of the time component, giving back a one-form; thus we can write

$$a_\mu \equiv \eta_{\mu\nu}a^\nu. \tag{75}$$

More generally, for any tensor (for example a rank-2 tensor) we can write $C^\rho{}_\nu = \eta_{\mu\nu}C^{\rho\mu}$ (notice, as always, the balancing of indices on either side of the equals sign). We can define an inverse to Eq. (74), which will raise an index, making a four-vector out of a one-form. Calling the inverse $\eta^{\mu\nu}$, it has to satisfy

$$\eta^{\mu\alpha}\eta_{\alpha\nu} = \delta^\mu{}_\nu, \tag{76}$$

which tells us that the inverse is numerically the same as the matrix. We can see that the inverse raises the index by multiplying both sides of Eq. (75) by $\eta^{\mu\alpha}$ and using Eq. (76),

$$\eta^{\mu\alpha}a_\mu = \eta^{\mu\alpha}\eta_{\mu\nu}a^\nu = \delta^\alpha{}_\nu a^\nu = a^\alpha,$$

which gives $a^\alpha = \eta^{\mu\alpha}a_\mu$, which is correct. So, we now have a simple way of changing vectors into one-forms and back.

We can give an important interpretation of Eq. (74). Recall the (differential) spacetime interval in Eq. (16), which we can write in terms of the product of the differential four-vector $dx^\mu = (c\,dt, dx, dy, dz)$ and its one-form; using Eq. (74) this is

$$ds^2 = \eta_{\mu\nu}dx^\mu dx^\nu. \tag{77}$$

Consider an ordinary three-vector of infinitesimal length, $d\vec{x} = (dx, dy, dz)$. We can write its length (squared) as $d\vec{x}\cdot d\vec{x} = dx^2 + dy^2 + dz^2 = dx^i dx^i$ (where we sum over the repeated index $i$). We can rewrite this using the three-dimensional Kronecker delta symbol as $d\vec{x}\cdot d\vec{x} = \delta_{ij}dx^i dx^j$, which is identical in form to the expression in Eq. (77). The length tells us the infinitesimal spatial distance between two points in flat Euclidean space (i.e., we’re not looking at the distance between two nearby points on the surface of a ball, or anything like that). The fact that the matrix multiplying the displacement vectors is just the Kronecker delta (i.e., the unit matrix) encodes the fact that the distance is measured in flat space. If the matrix had been different, then the distances could be measured on a curved surface.

This, then, suggests that Eq. (77) tells us the invariant spacetime distance between two points, encoding the flatness of spacetime. Since Eq. (77) tells us distances


along the spacetime, it is called the metric, and $\eta_{\mu\nu}$ is specifically called the Minkowski metric. We’ll return to these ideas in great detail later, where we’ll learn that there can be other metrics describing spaces of intrinsic curvature.
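A small numerical sketch of the metric’s two roles: lowering and raising indices with Eqs. (75) and (76), and giving an interval, Eq. (77), that is unchanged by a Lorentz boost. The displacement components are arbitrary, the boost along $x$ (with $c = 1$) is assumed to be of the standard form, and the function names are hypothetical.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]  # eta_{mu nu}, Eq. (74); numerically also eta^{mu nu}

def lower(a):
    """a_mu = eta_{mu nu} a^nu, Eq. (75)."""
    return [sum(ETA[m][n] * a[n] for n in range(4)) for m in range(4)]

def raise_index(w):
    """a^alpha = eta^{mu alpha} a_mu, using Eq. (76)."""
    return [sum(ETA[m][al] * w[m] for m in range(4)) for al in range(4)]

def interval(dx):
    """ds^2 = eta_{mu nu} dx^mu dx^nu, Eq. (77)."""
    return sum(ETA[m][n] * dx[m] * dx[n] for m in range(4) for n in range(4))

def boost_x(beta):
    """Assumed standard boost along x on (c dt, dx, dy, dz)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(L, v):
    return [sum(L[m][n] * v[n] for n in range(4)) for m in range(4)]

# eta^{mu alpha} eta_{alpha nu} = delta^mu_nu, Eq. (76)
eta_eta = [[sum(ETA[m][a] * ETA[a][n] for a in range(4)) for n in range(4)] for m in range(4)]

dx = [2.0, 0.5, -1.0, 0.3]  # arbitrary displacement (c dt, dx, dy, dz)
print(lower(dx), raise_index(lower(dx)))
print(interval(dx), interval(apply(boost_x(0.6), dx)))
```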

4 The Stress-Energy Tensor.

The energy-momentum four-vector, $p^\mu$, describes the four-momentum of a single particle. However, in most cases we will need to describe collections of particles or, instead, fields. We could describe such a collection by giving the four-momenta of all the constituent particles, completely specifying the exact state of the collection. But, for $N$ particles this means specifying $4N$ numbers to describe the system, and for a large collection this is an unwieldy number of variables to keep track of. This would be very much like trying to describe the flow of water along a stream by specifying the position and momentum of every water molecule in that stream. Clearly, this is far too much work.

When trying to describe water, we can ignore the individual molecules and instead treat it like a single continuous fluid. In this case the properties of the water are described by macroscopic properties like density, pressure, and viscosity, instead of the microscopic properties of the particles making it up. We can use this idea to tackle collections of particles, treating them like a fluid as well.

If we are going to treat collections of particles as fluids, including effects of pressure, etc., then we will no longer be able to specify the state of the system by a four-vector. Even thinking about ordinary water, it’s clear that the forces created by the water moving around can either push directly onto a surface (like a wall), leading to pressures, or along the surface (leading to shearing forces). We could have a force pointing along the $x$ direction, producing a pressure on a surface whose normal points along the $x$ direction, $T^{xx}$. We could also have a force pointing along the $x$ direction, producing a shear on a surface whose normal points along the $y$ direction, $T^{xy}$. In order to fully describe this system we need two indices, and so we need a $3\times3$ tensor, which is called the stress tensor; the diagonal terms of the tensor represent the pressures, and the off-diagonal terms represent the shearing terms.

The stress tensor can be defined by looking at the force along the $i$th direction, per unit area with normal pointing along the $j$th direction, $\Delta S_j$ (for example, $\Delta S_1 = \Delta y\,\Delta z$, etc.). Then, the elements of the tensor are $T^{ij}$, where

$$T^{ij} = \frac{\Delta F^i}{\Delta S_j},$$

for example

$$T^{11} = \frac{\Delta F^x}{\Delta y\,\Delta z} = P_x,$$

where $P_x$ is the pressure along the $x$ direction. We can re-express this result in terms of the momentum along the $x$ direction by recalling that force is momentum per unit time, $\Delta F^i = \Delta p^i/\Delta t$, and so

$$T^{ij} = \frac{\Delta p^i}{\Delta t\,\Delta S_j}. \tag{78}$$


To make the system relativistic, we need to include another row and column, making it a $4\times4$ tensor in such a way that it transforms correctly under a coordinate transformation. We can get a clue about how to generalize the $3\times3$ stress tensor by thinking about the energy-momentum four-vector. The stress tensor is specified by the ordinary three-momentum, $\vec{p}$, to which we’ve added an energy component to get the four-momentum, and so we should expect to add an energy component to the stress tensor. The expected generalization of Eq. (78) is

$$T^{\mu\nu} \equiv c\,\frac{\Delta p^\mu}{\Delta S_\nu}, \tag{79}$$

where $\Delta S_\nu$ is defined in the same way as in the three-dimensional case, such that $\Delta S_0 = \Delta x^1\Delta x^2\Delta x^3$, and $\Delta S_1 = \Delta x^0\Delta x^2\Delta x^3 = c\,\Delta t\,\Delta x^2\Delta x^3$, etc. Eq. (79) is called the stress-energy tensor, or sometimes the energy-momentum tensor. Notice that the stress-energy tensor is symmetric, such that $T^{\mu\nu} = T^{\nu\mu}$. Let’s look at some of the interesting cases, for example

$$T^{00} = \frac{c\,(E/c)}{\Delta x^1\Delta x^2\Delta x^3} \equiv \rho, \qquad T^{11} = \frac{c\,p^1}{c\,\Delta t\,\Delta x^2\Delta x^3} \equiv p_1,$$

where $\rho$ is the energy density, and $p_1$ is the pressure in the $x$ direction. In general the pressures will differ in different directions (the fluid is anisotropic), leading to potential shearing forces. However, in the case of a perfect fluid, the pressure is the same in every direction, and there are no shearing forces. In this case, in the rest frame of the fluid, Eq. (79) becomes $T^{\mu\nu} = \mathrm{diag}(\rho, p, p, p)$, or

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}. \tag{80}$$

While Eq. (80) is a perfectly good description in matrix form, it would be nice to express it in terms of tensors. Suppose we start with a single particle, which clearly would have no pressure (as there are no other particles around to exert pressure on it), $p \equiv 0$. It would, though, have an energy density, $\rho$. In its own rest frame, the energy density of the particle would be $\rho = E/\mathrm{Vol} = mc^2/\mathrm{Vol}$. The four-velocity of the particle in this frame is $U^\mu = (c, 0, 0, 0)$, and so the stress-energy tensor would be

$$T^{\mu\nu} = \begin{pmatrix} \rho & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = \frac{\rho}{c^2}U^\mu U^\nu.$$

This is only true for the pressureless case, where $p = 0$. This case is actually useful, however, for a collection of non-interacting particles, called dust. Any system in which the pressure can be neglected is dust, and this includes a system of particles moving nonrelativistically.

In the general case (even for a perfect fluid) we cannot neglect the pressure, and so the above expression must be changed to include $p$. A first guess might be $T^{\mu\nu} = (\rho + p)U^\mu U^\nu/c^2$, but in the rest frame this would give $T^{\mu\nu} = \mathrm{diag}(\rho + p, 0, 0, 0)$, which isn’t Eq. (80). We can fix it up by


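adding a term $p\,\eta^{\mu\nu}$, where $\eta^{\mu\nu}$ is the Minkowski metric with both indices raised (numerically the same matrix as Eq. (74)). Thus, in terms of tensors, the perfect fluid stress-energy tensor is

$$T^{\mu\nu} = \left(\frac{\rho}{c^2} + \frac{p}{c^2}\right)U^\mu U^\nu + p\,\eta^{\mu\nu}. \tag{81}$$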

Later on we’ll see how to determine the general expression for the stress-energy tensor for a given system. However, it can often be put into the form in Eq. (81) for a particular choice of energy density and pressure. In fact, this form will turn out to be very useful, as the contents of the expanding Universe can be modeled as a perfect fluid!
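As a check on Eq. (81), this sketch (units with $c = 1$; the values of $\rho$, $p$ and the boost speed are arbitrary, and the boost along $x$ is assumed to be of the standard form) builds the rest-frame tensor, confirms it is Eq. (80), and then compares two routes to the boosted tensor: transforming $T^{\mu\nu}$ with the rank-2 law, and plugging the boosted $U^\mu$ into Eq. (81). It also checks that the contraction $T^\mu{}_\mu = \eta_{\mu\nu}T^{\mu\nu} = -\rho + 3p$ is frame independent.

```python
import math

C = 1.0  # units with c = 1 (assumption of this sketch)
ETA_UP = [[-1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]  # eta^{mu nu}

def perfect_fluid(rho, p, U):
    """Eq. (81): T^{mu nu} = (rho/c^2 + p/c^2) U^mu U^nu + p eta^{mu nu}."""
    return [[(rho + p) / C**2 * U[m] * U[n] + p * ETA_UP[m][n] for n in range(4)]
            for m in range(4)]

def boost_x(beta):
    """Assumed standard boost along x on (ct, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(L, v):
    return [sum(L[m][n] * v[n] for n in range(4)) for m in range(4)]

def transform_20(L, T):
    return [[sum(L[m][r] * L[n][l] * T[r][l] for r in range(4) for l in range(4))
             for n in range(4)] for m in range(4)]

def trace(T):
    """T^mu_mu = eta_{mu nu} T^{mu nu} (eta_{mu nu} equals eta^{mu nu} numerically)."""
    return sum(ETA_UP[m][n] * T[m][n] for m in range(4) for n in range(4))

rho, p = 1.0, 0.25                      # arbitrary energy density and pressure
U_rest = [C, 0.0, 0.0, 0.0]
T_rest = perfect_fluid(rho, p, U_rest)  # expected: diag(rho, p, p, p), Eq. (80)

L = boost_x(0.6)
T_law = transform_20(L, T_rest)                      # transform the tensor
T_formula = perfect_fluid(rho, p, apply(L, U_rest))  # Eq. (81) with boosted U

print([T_rest[m][m] for m in range(4)])
print(max(abs(T_law[m][n] - T_formula[m][n]) for m in range(4) for n in range(4)))
print(trace(T_rest), trace(T_law))
```

The first line should show `[1.0, 0.25, 0.25, 0.25]`; the second should be at the level of rounding error, and both traces should be close to $-0.25$.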

Because the stress-energy tensor represents the energy and momentum of a system, it must be conserved, such that

$$\partial_\mu T^{\mu\nu} = 0, \tag{82}$$

no matter what the system is (Eq. (82) amounts to four constraints, one for each value of $\nu$, on the stress-energy tensor). The stress-energy tensor will play a central role in the General Theory of Relativity, telling spacetime how to bend. We will return to these ideas in our discussion of gravity.

5 A Glimpse at Gauge Invariance.

To complete our discussion of the Special Theory of Relativity, let’s go back to the electromagnetic potentials for a moment, first in terms of three-vectors. Since the magnetic field is the curl of $\vec{A}$, and since the curl of a gradient is identically zero, we can add to $\vec{A}$ the gradient of any scalar function, letting $\vec{A} \to \vec{A} + \nabla\lambda$, without changing the magnetic field. However, since the electric field also depends on $\vec{A}$, making this transformation would change $\vec{E}$. But the electric field also depends on the scalar potential, $V$. So, if we make a simultaneous change $V \to V - \partial\lambda/\partial t$ (for the same $\lambda$), then both the electric and magnetic fields are unchanged. Putting these two transformations together into a single transformation of $A^\mu$ gives

$$A^\mu \to \tilde{A}^\mu = A^\mu + \partial^\mu\lambda. \tag{83}$$

The Maxwell equations are invariant under these transformations, for any choice of $\lambda$, as can easily be checked by plugging the transformation into Eq. (66). This has a very practical and useful application. We said earlier that although we know that $\partial_\mu A^\mu$ is an invariant, what it equals is actually up to us; in particular, we can choose $\partial_\mu A^\mu = 0$. Let’s see how this happens. Suppose that we start with a potential $A^\mu$ such that $\partial_\mu A^\mu \neq 0$, and then we make a transformation to a new $\tilde{A}^\mu = A^\mu + \partial^\mu\lambda$. This new potential is a perfectly good one, satisfying Maxwell’s equations, and so we can use it as our solution. However, we still have the arbitrary function $\lambda$ at our disposal. In particular, taking the derivative of $\tilde{A}^\mu$ gives

$$\partial_\mu\tilde{A}^\mu = \partial_\mu A^\mu + \partial_\mu\left(\partial^\mu\lambda\right) = \partial_\mu A^\mu + \Box\lambda.$$

Now, by assumption $\partial_\mu A^\mu \neq 0$, and so it must be some function; call it $f(x^\mu)$. Now, suppose we choose $\lambda$ such that $\Box\lambda = -f(x^\mu)$, which we can always do since $\lambda$ is completely up to us (this is just a wave equation for $\lambda$ with source $-f$). In this case we find $\partial_\mu\tilde{A}^\mu = 0$, and since any choice of potential is good, we can use $\tilde{A}^\mu$, meaning we can choose $\partial_\mu A^\mu = 0$ right from the beginning. In this case, the Maxwell equations in Eq. (66) reduce to

$$\Box A^\mu - \partial^\mu\left(\partial_\nu A^\nu\right) = \Box A^\mu = -\mu_0 J^\mu,$$

which considerably simplifies the equations. Notice that taking the derivative again gives

$$\Box\left(\partial_\mu A^\mu\right) = -\mu_0\,\partial_\mu J^\mu.$$

The right-hand side is zero because of current conservation, so $\partial_\mu A^\mu$ obeys the source-free wave equation, which is consistent with our choice $\partial_\mu A^\mu = 0$. So, we see that by a proper choice of potential we can considerably simplify Maxwell’s equations. The transformation in Eq. (83) is called a gauge transformation, and picking a specific choice of $\lambda$ is called choosing a gauge. Since Maxwell’s equations don’t change under a gauge transformation, they are said to be gauge invariant. These ideas will be very useful in our later work.
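Gauge invariance can also be checked numerically. In the sketch below the potentials $V$, $\vec A$ and the gauge function $\lambda$ are arbitrary hypothetical choices, and all derivatives are taken with central finite differences of an assumed step size. The fields computed from the original and the gauge-transformed potentials agree to rounding error, even though the potentials themselves differ.

```python
import math

H = 1e-4  # finite-difference step (assumption of this sketch)

def d(f, i, X, h=H):
    """Central-difference partial derivative of f(t, x, y, z) with respect to X[i]."""
    Xp, Xm = list(X), list(X)
    Xp[i] += h
    Xm[i] -= h
    return (f(*Xp) - f(*Xm)) / (2 * h)

# Hypothetical potentials and gauge function, chosen only for illustration
def V(t, x, y, z):
    return x * y + math.sin(t) * z

A = [lambda t, x, y, z: y * z * t,
     lambda t, x, y, z: math.cos(x) + t * t * z,
     lambda t, x, y, z: x * y * y]

def lam(t, x, y, z):
    return math.sin(x * y) * math.exp(-t) + t * z**3

def fields(scalar, vector, X):
    """E = -grad V - dA/dt and B = curl A at the event X = (t, x, y, z)."""
    E = [-d(scalar, i + 1, X) - d(vector[i], 0, X) for i in range(3)]
    B = [d(vector[2], 2, X) - d(vector[1], 3, X),   # dAz/dy - dAy/dz
         d(vector[0], 3, X) - d(vector[2], 1, X),   # dAx/dz - dAz/dx
         d(vector[1], 1, X) - d(vector[0], 2, X)]   # dAy/dx - dAx/dy
    return E, B

# Gauge transformation: V -> V - d(lambda)/dt, A -> A + grad(lambda)
def V_new(t, x, y, z):
    return V(t, x, y, z) - d(lam, 0, (t, x, y, z))

def shifted_component(i):
    def component(t, x, y, z):
        return A[i](t, x, y, z) + d(lam, i + 1, (t, x, y, z))
    return component

A_new = [shifted_component(i) for i in range(3)]

X = (0.3, 0.7, -0.4, 1.1)
E0, B0 = fields(V, A, X)
E1, B1 = fields(V_new, A_new, X)
print("A_x before/after:", A[0](*X), A_new[0](*X))
print("max field change:", max(abs(u - v) for u, v in zip(E0 + B0, E1 + B1)))
```

The gauge function is built into the new potentials through numerical derivatives, so the residual field change comes only from floating-point rounding in the nested differences.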

Solving Maxwell’s equations, even using a clever gauge choice, is a fair amount of work. Once it’s done, though, the solutions fit with our ideas of causality: information about a charge can’t reach a distant point before light could. As we’ve discussed, the electromagnetic force is transmitted via photons, which travel at the speed of light, and so electrodynamics is fully consistent with Einstein’s Special Theory of Relativity; gravity, as we’ve discussed it so far, is not. Newton’s theory of gravity predicts an instantaneous transfer of force, and it is not at all clear how to fix it up. Einstein postulated his Special Theory of Relativity in 1905, and it was another ten years before he was able to generalize it to include gravity. We’ll learn about that generalization soon.
