Lecture Notes - MATH 231A - Real Analysis

Kyle Hambrook

February 19, 2020

Contents

1 Introduction: Riemann to Lebesgue 4

2 Preliminaries: Set Theory, Extended Real Numbers, Limsup and Liminf, Infinite Series 7

3 σ-Algebras 13

4 Measurable Functions 18

5 Measures 24

6 Integral of Non-Negative Simple Functions 27

7 Integral of Non-Negative Extended Real-Valued Functions 30

8 Integral of Extended Real-Valued Functions 35

9 Integrable Functions and Absolute Values 37

10 Fatou’s Lemma 38

11 Dominated Convergence Theorem 39

12 Interchanging Limits and Derivatives with Integrals 40

13 Almost Everywhere 42

14 Complete Measures 45

15 Completion of Measures (Optional) 47

16 The Lebesgue Integral Extends the Riemann Integral 48

17 Lebesgue’s Condition for Riemann Integrability 52

18 Normed Spaces and Banach Spaces 55

19 Lp Spaces: Definitions and Basic Properties 57

20 Lp Spaces: Completeness 59

21 Lp Spaces: Dense Subspaces (Optional) 61

22 Counting Measure and ℓp Spaces 65

23 Complex-Valued Measurable Functions 66

24 Caratheodory Construction: Measures from Outer Measures 67

25 Lebesgue Measure and Lebesgue σ-Algebra 71

26 Product σ-Algebras 74

27 Monotone Classes 75

28 Iterated Integrals of Characteristic Functions 76

29 Product Measures 80

30 Tonelli’s Theorem and Fubini’s Theorem 81

31 Lebesgue Measure on Rn 83

32 Probability Theory: Basic Definitions 84

33 Theory of Cumulative Distribution Functions 88

34 Probability Theory: Expected Value 90

35 Absolutely Continuous and Singular Measures 92

36 Decomposition of Probability Measures 92

1 Introduction: Riemann to Lebesgue

You are probably familiar with the Riemann integral from calculus and undergraduate analysis. If f is a non-negative real-valued function defined on an interval [a, b], then, roughly speaking, the Riemann integral of f is a limit of Riemann sums

\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,(x_i - x_{i-1}), \]

where [a, b] is partitioned into subintervals [x0, x1], [x1, x2], . . . , [xn−1, xn], we choose sample points x_i^* ∈ [xi−1, xi], and (xi − xi−1) is the length of [xi−1, xi]. We say f is Riemann integrable on [a, b] when its Riemann integral is defined.

In this course, we will study Lebesgue’s theory of integration with respect to a measure. Roughly, a measure µ on a set X is a function which takes subsets A ⊆ X as inputs and gives non-negative real numbers µ(A) as outputs. You can think of µ(A) as some general notion of the size of A. The most important measure is the Lebesgue measure on R, denoted by λ. For each interval [a, b], λ([a, b]) is the length b − a. For other sets A ⊆ R, λ(A) is the length of A (we will see how to define this later). There are many other useful measures. If f is a non-negative real-valued function defined on a set X, then, roughly speaking, the integral of f with respect to µ is

\[ \int_X f(x)\,d\mu(x) = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,\mu(A_i). \]

Here the set X is partitioned into subsets A1, . . . , An, we choose sample points x_i^* ∈ Ai, and µ(Ai) is the measure of the set Ai. When the integral is defined, we say f is µ-integrable. When µ is the Lebesgue measure λ on R, the integral is simply called the Lebesgue integral.

Lebesgue’s theory of integration has several advantages over Riemann’s theory.

(1) Lebesgue’s theory allows us to integrate functions defined on arbitrary sets, whereas the Riemann integral is restricted to functions defined on R or Rd. This is especially important for certain applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foundation for modern probability theory, where the functions are random variables and we integrate over the sample space of possible outcomes.

(2) The Lebesgue integral (i.e., the integral with respect to Lebesgue measure) is defined for a larger class of functions on R, though it still agrees with the Riemann integral whenever the latter is defined. For example, the indicator function of the rationals

\[ 1_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]

is not Riemann integrable on any interval [a, b], but it is Lebesgue integrable on any such interval. We will see the details later.

(3) Lebesgue’s theory possesses better convergence theorems, which lead to more general and more elegant results and to more useful spaces of functions.

Here is the type of convergence theorem we are interested in:

Prototype Convergence Theorem. If (fn) is a sequence of integrable functions defined on [a, b] and (fn) converges to f “nicely”, then f is integrable and

\[ \lim_{n} \int_a^b f_n = \int_a^b f. \]

The problem is, of course, to figure out what “nicely” might mean and to define the integral in such a way that theorems like this one will be widely applicable. These were important unresolved issues in the late nineteenth century; they arose, for example, in the study of Fourier series. The Lebesgue theory was developed, in large part, to address them. Let us look at this problem in a bit more detail.

Definition 1.1. A sequence (fn) of functions fn : X → R is said to converge pointwise to a function f : X → R if

\[ \lim_{n\to\infty} f_n(x) = f(x) \quad \text{for each } x \in X. \]

In this case, we write fn → f pointwise.

Example 1.2. Let fn be the tent function whose graph is the triangle with vertices (0, 0), (1/n, n), (2/n, 0) and which equals zero elsewhere. Then fn → f = 0 pointwise, fn is Riemann integrable on [a, b], and f = 0 is Riemann integrable on [a, b], but (taking a ≤ 0 and b ≥ 2, so that every triangle lies over [a, b])

\[ \lim_{n} \int_a^b f_n = 1 \neq 0 = \int_a^b f. \]

So we don’t get the desired conclusion of the Prototype Convergence Theorem.
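The tent functions of Example 1.2 can be checked with exact rational arithmetic. The sketch below is only an illustration; the helper names `tent` and `tent_integral` are made up for this example. It evaluates fn at a fixed point for several n and computes the area under each tent from the triangle formula.

```python
from fractions import Fraction

def tent(n, x):
    """Tent function f_n: triangle with vertices (0,0), (1/n,n), (2/n,0); zero elsewhere."""
    x = Fraction(x)
    peak = Fraction(1, n)
    if 0 <= x <= peak:
        return n * n * x                 # rising edge, slope n^2
    if peak < x <= 2 * peak:
        return n * n * (2 * peak - x)    # falling edge, slope -n^2
    return Fraction(0)

def tent_integral(n):
    """Exact integral of f_n: area of a triangle with base 2/n and height n."""
    return Fraction(1, 2) * Fraction(2, n) * n

x = Fraction(1, 10)
values_at_x = [tent(n, x) for n in (1, 5, 20, 100)]   # eventually 0 at this fixed x
areas = [tent_integral(n) for n in (1, 5, 20, 100)]   # always equal to 1
print(values_at_x, areas)
```

For any fixed x > 0 the value fn(x) is 0 as soon as 2/n ≤ x, while the area stays equal to 1, which is exactly the failure described in the example.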

Example 1.3. Choose an enumeration Q = {r1, r2, r3, . . .}. Define f = 1Q and define fn by fn(x) = 1 if x ∈ {r1, . . . , rn} and fn(x) = 0 otherwise. Then fn → f pointwise, and fn is Riemann integrable on [a, b], but f is not Riemann integrable on [a, b]. Again we don’t get the desired conclusion of the Prototype Convergence Theorem. Note that here the sequence (fn) is quite nice. Indeed, (fn) is bounded (|fn| ≤ 1 for all n) and (fn) is increasing (fn ≤ fn+1 for all n). But it’s not enough.

Example 1.4. Later, we will construct a sequence of functions fn : [0, 1]→ R such that

• 0 ≤ fn ≤ 1 for each n

• fn is continuous for each n

• (fn) is an increasing sequence

• (fn) converges pointwise to a function f on [0, 1].

• f is not Riemann integrable on [0, 1].

This sequence is even nicer than the one in the previous example, but we still don’t get the desired conclusion of the Prototype Convergence Theorem.

The examples above show that pointwise convergence isn’t good enough for Riemann integration. By assuming more, we can get some convergence theorems for Riemann integration. We describe three such theorems below. Unfortunately, they are all somewhat unsatisfactory.

Definition 1.5. A sequence (fn) of functions fn : X → R is said to converge uniformly to a function f : X → R if

\[ \lim_{n\to\infty} \sup\{\, |f_n(x) - f(x)| : x \in X \,\} = 0. \]

In this case, we write fn → f uniformly.

Note that uniform convergence implies pointwise convergence.

Theorem 1.6. If fn is Riemann integrable on [a, b] for each n and fn → f uniformly on [a, b], then f is Riemann integrable on [a, b] and

\[ \lim_{n} \int_a^b f_n = \int_a^b f. \]

Theorem 1.7. If fn is Riemann integrable on [a, b] for each n, fn ≤ fn+1 for each n, fn → f pointwise on [a, b], and f is Riemann integrable on [a, b], then

\[ \lim_{n} \int_a^b f_n = \int_a^b f. \]

Theorem 1.8. If fn is Riemann integrable on [a, b] for each n, |fn| ≤ M for each n, fn → f pointwise on [a, b], and f is Riemann integrable on [a, b], then

\[ \lim_{n} \int_a^b f_n = \int_a^b f. \]

The problem with the first of these three theorems is that uniform convergence is too strong. In many applications, we don’t have it. The problem with the last two theorems is that the Riemann integrability of f is part of the hypothesis, rather than part of the conclusion.

We will see that the Lebesgue theory gives us very powerful convergence theorems, which lead to some very impressive results and some very useful function spaces.

2 Preliminaries: Set Theory, Extended Real Numbers, Limsup and Liminf, Infinite Series

Set Theory

Definition 2.1.

• If A and B are sets, their union and intersection are

A ∪ B = {x : x ∈ A or x ∈ B}

A ∩ B = {x : x ∈ A and x ∈ B}

• If {Ai : i ∈ I} is an indexed family of sets, the union and intersection of the family are

\[ \bigcup_{i\in I} A_i = \{x : x \in A_i \text{ for at least one } i \in I\}, \qquad \bigcap_{i\in I} A_i = \{x : x \in A_i \text{ for all } i \in I\}. \]

• If A and B are sets, the set difference of A and B (or the relative complement of B in A) is

A \ B = {x : x ∈ A and x ∉ B}

It is read as “A minus B” or “A take away B”.

• If all the sets in a given context are subsets of a fixed set X, then the complement (or absolute complement) of a set A ⊆ X is

A^c = X \ A = {x ∈ X : x ∉ A}

Properties of Complement

Suppose all sets under consideration are subsets of a fixed set X. Let A, B ⊆ X and let {Ai : i ∈ I} be an indexed family of subsets of X.

• A ∪ A^c = X

• A ∩ A^c = ∅

• ∅^c = X

• X^c = ∅

• (A^c)^c = A

• A \ B = A ∩ B^c = B^c \ A^c

• If A ⊆ B, then B^c ⊆ A^c

• De Morgan’s Laws:

\[ \Big(\bigcup_{i\in I} A_i\Big)^c = \bigcap_{i\in I} A_i^c \quad\text{and}\quad \Big(\bigcap_{i\in I} A_i\Big)^c = \bigcup_{i\in I} A_i^c \]

In words, De Morgan’s Laws say that the complement of a union is the intersection of the complements, and the complement of an intersection is the union of the complements.

Properties of Union, Intersection, and Set Difference

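• \( B \cup \bigcup_{i\in I} A_i = \bigcup_{i\in I} (B \cup A_i) \)

• \( B \cap \bigcap_{i\in I} A_i = \bigcap_{i\in I} (B \cap A_i) \)

• \( B \cup \bigcap_{i\in I} A_i = \bigcap_{i\in I} (B \cup A_i) \)

• \( B \cap \bigcup_{i\in I} A_i = \bigcup_{i\in I} (B \cap A_i) \)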

• A = (A ∩B) ∪ (A \B)

• A ∩B = A \ (A \B)

• A \ (B ∪ C) = (A \B) \ C

• (A ∪ B) \ C = (A \ C) ∪ (B \ C)
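Several of the identities in this section can be sanity-checked by brute force on a small universe. The following sketch is not a proof: it uses Python's built-in `frozenset` and simply checks every pair (or triple) of subsets of a five-element set. The function names and the choice of universe are illustrative assumptions.

```python
from itertools import combinations

X = frozenset(range(5))  # small universe chosen only for illustration

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(a):
    return X - a

def check_pairs():
    """De Morgan's laws and the set-difference identities for all pairs A, B."""
    for a in subsets(X):
        for b in subsets(X):
            ok = (
                complement(a | b) == (complement(a) & complement(b))
                and complement(a & b) == (complement(a) | complement(b))
                and (a - b) == (a & complement(b))
                and (a - b) == (complement(b) - complement(a))
                and a == ((a & b) | (a - b))
                and (a & b) == (a - (a - b))
            )
            if not ok:
                return False
    return True

def check_triples():
    """A \\ (B ∪ C) = (A \\ B) \\ C for all triples A, B, C."""
    s = subsets(X)
    return all((a - (b | c)) == ((a - b) - c) for a in s for b in s for c in s)

print(check_pairs(), check_triples())
```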

Definition 2.2. For any set X, the power set of X, denoted by P(X), is the collection of all subsets of X:

P(X) = {A : A ⊆ X}

Definition 2.3. Let X, Y be sets. A function (or map) f : X → Y is a rule that assigns to each element x ∈ X a unique element f(x) ∈ Y. The sets X and Y are called the domain and codomain of f. The image (or range) of f is the set

f(X) = {f(x) : x ∈ X}.

If A ⊆ X, the image of A under f is the set

f(A) = {f(x) : x ∈ A}.

If B ⊆ Y, the inverse image (or preimage) of B under f is the set

f−1(B) = {x ∈ X : f(x) ∈ B}.

The inverse image commutes with unions, intersections, set differences, and complements:

• \( f^{-1}\big(\bigcup_{i\in I} A_i\big) = \bigcup_{i\in I} f^{-1}(A_i) \)

• \( f^{-1}\big(\bigcap_{i\in I} A_i\big) = \bigcap_{i\in I} f^{-1}(A_i) \)

• f−1(A \ B) = f−1(A) \ f−1(B)

• f−1(A^c) = (f−1(A))^c

Remark 2.4. The image is not so well-behaved. See Exercise ???.
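A small finite example makes the contrast between inverse images and images concrete. The sketch below uses the squaring map on a five-point domain; the helper names `image` and `preimage` and the particular sets are chosen only for illustration.

```python
def image(f, a):
    """Image f(A) of a finite set A."""
    return {f(x) for x in a}

def preimage(f, domain, b):
    """Inverse image f^{-1}(B), computed over a finite domain."""
    return {x for x in domain if f(x) in b}

domain = {-2, -1, 0, 1, 2}

def square(x):
    return x * x

B1, B2 = {0, 1}, {1, 4}      # subsets of the codomain
A1, A2 = {-2, -1}, {1, 2}    # disjoint subsets of the domain

# The inverse image commutes with intersection and set difference.
pre_cap = preimage(square, domain, B1 & B2)
pre_diff = preimage(square, domain, B1 - B2)

# The image does not commute with intersection: A1 and A2 are disjoint,
# yet their images coincide.
img_of_cap = image(square, A1 & A2)
cap_of_imgs = image(square, A1) & image(square, A2)
print(pre_cap, pre_diff, img_of_cap, cap_of_imgs)
```

Here f(A1 ∩ A2) is empty while f(A1) ∩ f(A2) = {1, 4}, so only the inclusion f(A1 ∩ A2) ⊆ f(A1) ∩ f(A2) survives for images.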

Definition 2.5. If f : X → Y and g : Y → Z, the composition of f and g is the function

g ∘ f : X → Z

defined by (g ∘ f)(x) = g(f(x)) for every x ∈ X.

Extended Real Numbers

The set of extended real numbers is the set obtained by adjoining the two symbols −∞ and +∞ to the set of real numbers. It is denoted by R̄. Thus

R̄ = R ∪ {−∞, +∞}.

We often write ∞ instead of +∞. We extend the usual ordering on R to R̄ by declaring that

−∞ < x < ∞ for every x ∈ R.

The interval notation is extended in the natural way. For example,

(0, ∞] = (0, ∞) ∪ {∞} = {x ∈ R̄ : x > 0} = {x ∈ R̄ : 0 < x ≤ ∞}.

We extend the usual arithmetic operations on R to R̄ by declaring that

• x+∞ =∞ and x−∞ = −∞ for all x ∈ R

• ∞+∞ =∞ and −∞−∞ = −∞

• x · (±∞) = ±∞ for all x ∈ (0,∞]

• x · (±∞) = ∓∞ for all x ∈ [−∞, 0)

• 0 · (±∞) = 0

• x/(±∞) = 0 for all x ∈ R.

Expressions of the form ∞ − ∞ and ∞/∞ are left undefined.

In some other areas of mathematics, the products 0 · (±∞) are left undefined, but not here.
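The convention 0 · (±∞) = 0 differs from IEEE floating-point arithmetic, where the same product is NaN. The sketch below shows one way to encode the conventions of this section using Python floats and `math.inf`. The helper names `ext_mul` and `ext_add` are hypothetical, introduced only for this illustration.

```python
import math

def ext_mul(x, y):
    """Product in the extended reals, with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y

def ext_add(x, y):
    """Sum in the extended reals; inf - inf is left undefined."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("inf - inf is undefined in the extended reals")
    return x + y

print(0 * math.inf)            # plain float arithmetic gives nan
print(ext_mul(0, math.inf))    # the convention used in these notes
print(ext_mul(-2, math.inf))   # x * inf = -inf for x < 0
print(ext_add(5, -math.inf))   # x - inf = -inf for real x
```

The measure-theoretic convention is the one needed later for sums such as c · µ(E) when c = 0 and µ(E) = ∞.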

The absolute values of −∞ and +∞ are

| −∞| = |+∞| = +∞

Supremum and Infimum

Let A ⊆ R̄.

An element a0 ∈ R̄ is called the maximum (or greatest element) of A if a0 ∈ A and a ≤ a0 for every a ∈ A. The maximum of A is denoted by max(A). It is easy to see that the maximum is unique if it exists, which is why we say “the maximum” instead of “a maximum”. The minimum (or least element) of A is defined similarly; it is denoted by min(A) and is unique if it exists.

An element a0 ∈ R̄ is called an upper bound of A if a ≤ a0 for every a ∈ A. If A has an upper bound, it is said to be bounded above. A lower bound of A is defined similarly.

An element a0 ∈ R̄ is called the supremum of A if it is the least upper bound of A, i.e., if a0 is the minimum of the set of all upper bounds of A. Thus a0 is the supremum of A if and only if a0 is an upper bound of A and a0 ≤ a′0 for every upper bound a′0 of A. The supremum is denoted by sup(A). Since minimums are unique if they exist, the supremum is unique if it exists. The infimum of A is the greatest lower bound of A; it is denoted by inf(A) and is unique if it exists.

If A is non-empty and bounded above in R (and A contains at least one real number), the order completeness axiom of the real numbers implies that A has a supremum sup(A) in R.

If A is non-empty but not bounded above in R, then ∞ is the only upper bound for A, and so sup(A) = ∞.

If A is empty, then every extended real number is an upper bound for A, and so sup(A) = −∞.

Thus every subset of R̄ has a supremum in R̄. The same goes for infimum.

Limits, Limsup, Liminf

Let (an) be a sequence in R.

Definition 2.6.

• Let a ∈ R. We write lim an = a (and we say that (an) converges to a and that a is the limit of (an)) if for every real ε > 0 there exists an N ∈ N such that if n ≥ N then

|an − a| < ε.

• We write lim an = +∞ (and we say that (an) converges to +∞ and that +∞ is the limit of (an)) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then

an > M.

• We write lim an = −∞ (and we say that (an) converges to −∞ and that −∞ is the limit of (an)) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then

an < −M.

Definition 2.7. The limsup (or limit superior) and liminf (or limit inferior) are defined by

\[ \limsup a_n = \inf_{k\ge 1}\Big(\sup_{n\ge k} a_n\Big), \qquad \liminf a_n = \sup_{k\ge 1}\Big(\inf_{n\ge k} a_n\Big). \]

Note that

• The sequence \( c_k = \sup_{n\ge k} a_n \) is a decreasing sequence

• The sequence \( b_k = \inf_{n\ge k} a_n \) is an increasing sequence

• \( \limsup a_n = \inf_{k\ge 1}\big(\sup_{n\ge k} a_n\big) = \lim_{k}\big(\sup_{n\ge k} a_n\big) \)

• \( \liminf a_n = \sup_{k\ge 1}\big(\inf_{n\ge k} a_n\big) = \lim_{k}\big(\inf_{n\ge k} a_n\big) \)

• bk ≤ ak ≤ ck for all k

• lim inf an ≤ lim sup an

Theorem 2.8. We have lim sup an = lim inf an iff limn an = L for some L ∈ R̄, in which case

lim sup an = lim inf an = limn an.

Proof. Exercise.
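To see the tail sequences ck and bk of Definition 2.7 in action, consider the sequence an = (−1)^n (1 + 1/n), which is an example chosen for this sketch and not taken from the notes. A computer can only inspect finitely many terms, so the tails below are truncated at an assumed horizon. For this particular sequence the truncation is harmless, because the supremum of each tail is attained at its first even index and the infimum at its first odd index.

```python
from fractions import Fraction

def a(n):
    """Illustrative sequence a_n = (-1)^n (1 + 1/n), n >= 1."""
    return (-1) ** n * (1 + Fraction(1, n))

HORIZON = 200  # assumed truncation point for the tails

def tail_sup(k):
    """c_k = sup of a_n over k <= n <= HORIZON."""
    return max(a(n) for n in range(k, HORIZON + 1))

def tail_inf(k):
    """b_k = inf of a_n over k <= n <= HORIZON."""
    return min(a(n) for n in range(k, HORIZON + 1))

c = [tail_sup(k) for k in range(1, 101)]
b = [tail_inf(k) for k in range(1, 101)]
print(float(c[-1]), float(b[-1]))
```

The values ck decrease toward 1 and the values bk increase toward −1, so lim sup an = 1 and lim inf an = −1. Since these differ, Theorem 2.8 says the sequence has no limit.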

Infinite Series

Definition 2.9. Let \( \sum_{n=1}^{\infty} a_n \) be an infinite series whose terms an belong to R̄. The series has a sum in R̄ (i.e., the series converges in R̄) if both

(a) ∞ and −∞ do not both occur among the terms of \( \sum_{n=1}^{\infty} a_n \), and

(b) the sequence \( s_k = \sum_{n=1}^{k} a_n \) of partial sums of \( \sum_{n=1}^{\infty} a_n \) has a limit in R̄.

The sum of the series \( \sum_{n=1}^{\infty} a_n \) is then defined to be the limit of the partial sums:

\[ \sum_{n=1}^{\infty} a_n = \lim_{k} s_k = \lim_{k} \sum_{n=1}^{k} a_n. \]

Remark 2.10. (i) The sum of the series may be −∞ or +∞.

(ii) Note that condition (a) above is needed to guarantee that the undefined expression ∞ − ∞ does not occur in any of the partial sums \( s_k = \sum_{n=1}^{k} a_n \).

(iii) If ∞ is one of the terms of the series and −∞ is not, then the sum of the series is ∞. Likewise if we swap the roles of ∞ and −∞.

3 σ-Algebras

Definition 3.1. Let X be any set. A σ-algebra on X is a collection 𝒜 ⊆ P(X) with the following properties.

(a) ∅ ∈ 𝒜

(b) If A ∈ 𝒜, then A^c = X \ A ∈ 𝒜.

(c) If A1, A2, . . . ∈ 𝒜, then \( \bigcup_{i=1}^{\infty} A_i \in \mathcal{A} \).

The pair (X, 𝒜) is called a measurable space.

Remark: σ indicates “countable union”

Theorem 3.2. If 𝒜 is a σ-algebra on a set X, then 𝒜 has the following properties:

(i) X ∈ 𝒜

(ii) If A1, A2, . . . ∈ 𝒜, then \( \bigcap_{i=1}^{\infty} A_i \in \mathcal{A} \).

(iii) If A1, . . . , An ∈ 𝒜, then \( \bigcup_{i=1}^{n} A_i \in \mathcal{A} \).

(iv) If A1, . . . , An ∈ 𝒜, then \( \bigcap_{i=1}^{n} A_i \in \mathcal{A} \).

(v) If A, B ∈ 𝒜, then A \ B = A ∩ B^c ∈ 𝒜.

Proof. In this proof, (a),(b),(c) refer to the properties in the definition of a σ-algebra.

(i): By (a) and (b), X = ∅c ∈ A.

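(ii): Let A1, A2, . . . ∈ 𝒜. By De Morgan’s laws,

\[ \bigcap_{i=1}^{\infty} A_i = \Big(\bigcup_{i=1}^{\infty} A_i^c\Big)^c. \]

By (b), A1^c, A2^c, . . . ∈ 𝒜. Then, by (c), \( \bigcup_{i=1}^{\infty} A_i^c \in \mathcal{A} \). Finally, by (b) again, \( \bigcap_{i=1}^{\infty} A_i = \big(\bigcup_{i=1}^{\infty} A_i^c\big)^c \in \mathcal{A} \).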

(iii): Define Ai = ∅ for i > n and apply (c).

(iv): Define Ai = X for i > n and apply (ii). Alternatively, use De Morgan’s laws and apply (b)and (iii).

(v): Use (b) and (iv).

Example 3.3. Let X be any set. The following are σ-algebras on X.

(a) P(X)

(b) {∅, X}

(c) {∅, A0, A0^c, X} for any fixed A0 ⊆ X.

(d) {A ⊆ X : A is countable or A^c is countable}

Theorem 3.4. Let X be any set. The intersection of any collection of σ-algebras on X is a σ-algebra on X.

Proof. Let {𝒜j}j∈J be any collection of σ-algebras on X and consider their intersection \( \bigcap_{j\in J} \mathcal{A}_j \). We need to check properties (a), (b), and (c) of the definition of a σ-algebra.

(a): We have ∅ ∈ 𝒜j for every j ∈ J. So \( \emptyset \in \bigcap_{j\in J} \mathcal{A}_j \).

(b): Suppose \( A \in \bigcap_{j\in J} \mathcal{A}_j \). Then A ∈ 𝒜j for every j ∈ J. Since 𝒜j is a σ-algebra for every j ∈ J, we have A^c ∈ 𝒜j for every j ∈ J. So \( A^c \in \bigcap_{j\in J} \mathcal{A}_j \).

(c): Suppose \( A_1, A_2, \ldots \in \bigcap_{j\in J} \mathcal{A}_j \). Then A1, A2, . . . ∈ 𝒜j for every j ∈ J. Since 𝒜j is a σ-algebra for every j ∈ J, we have \( \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}_j \) for every j ∈ J. So \( \bigcup_{i=1}^{\infty} A_i \in \bigcap_{j\in J} \mathcal{A}_j \).

Definition 3.5. Let X be any set and let G ⊆ P(X). The intersection of all σ-algebras on X that contain G is denoted by σ(G). In other words,

\[ \sigma(G) = \bigcap \{ \mathcal{A} : \mathcal{A} \text{ is a σ-algebra on } X,\ G \subseteq \mathcal{A} \}. \]

We call σ(G) the σ-algebra generated by G.

The following theorem is immediate from the definition of σ(G).

Theorem 3.6. Let X be any set and let G ⊆ P(X). Then σ(G) is the smallest σ-algebra on X that contains G; in other words,

(i) σ(G) is a σ-algebra on X

(ii) G ⊆ σ(G)

(iii) If A is any σ-algebra on X and G ⊆ A, then σ(G) ⊆ A.
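On a finite set X, the generated σ-algebra σ(G) can be computed explicitly, because countable unions reduce to finite unions. The sketch below is an illustration under that finiteness assumption, not a general construction: it starts from G together with ∅ and X and repeatedly adds complements and pairwise unions until nothing new appears. The function name `generated_sigma_algebra` is made up for this example.

```python
def generated_sigma_algebra(universe, generators):
    """sigma(G) on a finite universe, by closing under complement and union."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            comp = universe - a
            if comp not in family:       # property (b): closed under complement
                family.add(comp)
                changed = True
            for b in current:
                union = a | b
                if union not in family:  # property (c), finite case: closed under union
                    family.add(union)
                    changed = True
    return family

X = {1, 2, 3, 4}
sigma = generated_sigma_algebra(X, [{1}, {1, 2}])
print(sorted(sorted(s) for s in sigma))
```

With G = {{1}, {1, 2}} the result consists of all unions of the three blocks {1}, {2}, {3, 4}, so it has 8 elements. With a single generator {1} the result is {∅, {1}, {2, 3, 4}, X}, matching Example 3.3(c).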

Definition 3.7.

• An open ball in Rd is a set of the form B(x0, r) = {x ∈ Rd : |x − x0| < r}, where x0 ∈ Rd and 0 < r < ∞. Note that the open balls in R are the open intervals of the form (a, b) with −∞ < a < b < ∞. To see this, note (a, b) = (x0 − r, x0 + r) = B(x0, r), where x0 = (b + a)/2 and r = (b − a)/2.

• An open set in Rd is a union of open balls.

Definition 3.8. Suppose X is R or Rd (or any topological space). The Borel σ-algebra on X, denoted by B(X), is the σ-algebra generated by the collection of all open subsets of X. In other words, if T denotes the collection of all open subsets of X, then B(X) = σ(T). The elements of B(X) are called Borel sets in X.

Theorem 3.9. The Borel σ-algebra B(Rd) is generated by the collection of all open balls in Rd. In other words, if G is the collection of all open balls in Rd, then B(Rd) = σ(G).

Proof. Let T be the collection of all open sets in Rd. Since B(Rd) = σ(T), we must show σ(G) = σ(T). Since G ⊆ T, we have σ(G) ⊆ σ(T) by Theorem 3.6. It remains to show σ(T) ⊆ σ(G). If A ∈ T, then A is a union of open balls in Rd. So for each point x ∈ A, there is an open ball Bx that contains x and is contained in A. Now for each point x ∈ A choose an open ball (1) that contains x, (2) that is contained in Bx, (3) whose center is a point with rational coordinates, and (4) whose radius is rational. In this way, A is written as a union of balls whose centers have rational coordinates and whose radii are rational. Since there are only countably many such balls, A is written as a union of a countable collection of open balls. Hence A ∈ σ(G). Thus T ⊆ σ(G). Therefore σ(T) ⊆ σ(G) by Theorem 3.6.

Theorem 3.10. The Borel σ-algebra B(R) is generated by each of the following collections of sets:

(i) G1 = {(a, b) : a, b ∈ R, a < b}

(ii) G2 = {(a, b] : a, b ∈ R, a < b}

(iii) G3 = {(−∞, b] : b ∈ R}

(iv) G4 = {(−∞, b) : b ∈ R}

(v) G5 = {[a, b] : a, b ∈ R, a < b}

(vi) G6 = {[a, b) : a, b ∈ R, a < b}

(vii) G7 = {[a, ∞) : a ∈ R}

(viii) G8 = {(a, ∞) : a ∈ R}

Proof. (i): The previous theorem implies G1 generates B(R), i.e., σ(G1) = B(R).

(ii): By writing

\[ (a, b] = \bigcap_{n=1}^{\infty} (a, b + 1/n) \]

we see that G2 ⊆ σ(G1), hence σ(G2) ⊆ σ(G1). By writing

\[ (a, b) = \bigcup_{n=1}^{\infty} (a, b - 1/n] \]

we see that G1 ⊆ σ(G2), hence σ(G1) ⊆ σ(G2). This proves G2 generates B(R).

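(iii): By writing

\[ (-\infty, b] = \bigcup_{n=1}^{\infty} (-n, b] \quad\text{and}\quad (a, b] = (-\infty, b] \cap (a, \infty) = (-\infty, b] \cap (-\infty, a]^c \]

we see that G3 ⊆ σ(G2) and G2 ⊆ σ(G3), hence σ(G3) = σ(G2) = B(R).

(iv): By writing

\[ (-\infty, b) = \bigcup_{n=1}^{\infty} (-\infty, b - 1/n] \quad\text{and}\quad (-\infty, b] = \bigcap_{n=1}^{\infty} (-\infty, b + 1/n) \]

we see that G4 ⊆ σ(G3) and G3 ⊆ σ(G4), hence σ(G4) = σ(G3) = B(R).

(v),(vi),(vii),(viii): Similar.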

Definition 3.11. Let B(R̄) denote the collection of all sets of the form A, A ∪ {−∞}, A ∪ {∞}, A ∪ {−∞} ∪ {∞}, where A is a Borel set in R. It is straightforward to check that B(R̄) is a σ-algebra on R̄. We call B(R̄) the Borel σ-algebra on R̄.

Remark 3.12. It is possible to define a collection of open sets in R̄ and to define the Borel σ-algebra on R̄ as the σ-algebra generated by the open sets in R̄. See Exercise X???

Theorem 3.13. The Borel σ-algebra B(R̄) is generated by each of the following collections of sets:

(i) G′3 = {[−∞, b] : b ∈ R}

(ii) G′4 = {[−∞, b) : b ∈ R}

(iii) G′7 = {[a, ∞] : a ∈ R}

(iv) G′8 = {(a, ∞] : a ∈ R}

Proof. (i): By the previous theorem,

G′3 = { {−∞} ∪ (−∞, b] : b ∈ R } ⊆ B(R̄),

hence σ(G′3) ⊆ B(R̄). Now we show B(R̄) ⊆ σ(G′3). Let B ∈ B(R̄). Then B equals one of A or A ∪ {−∞} or A ∪ {∞} or A ∪ {−∞} ∪ {∞}, for some A ∈ B(R). To show that B ∈ σ(G′3), it suffices to show that {−∞}, {∞} ∈ σ(G′3) and B(R) ⊆ σ(G′3). By writing

\[ \{-\infty\} = \bigcap_{n=1}^{\infty} [-\infty, -n] \]

we see that {−∞} ∈ σ(G′3). By writing

\[ \{\infty\} = \bigcap_{n=1}^{\infty} (n, \infty] = \bigcap_{n=1}^{\infty} [-\infty, n]^c \]

we see that {∞} ∈ σ(G′3). Recall the definition of G2 from the previous theorem:

G2 = {(a, b] : a, b ∈ R, a < b}

By writing

(a, b] = [−∞, b] ∩ (a, ∞] = [−∞, b] ∩ [−∞, a]^c

we see that G2 ⊆ σ(G′3). Therefore σ(G2) ⊆ σ(G′3). It is tempting to cite the previous theorem and assert that B(R) = σ(G2). But this is not true. Here σ(G2) is the smallest σ-algebra on R̄ that contains G2. But (by the previous theorem) B(R) is the smallest σ-algebra on R that contains G2. We need to work a bit harder. We showed G2 ⊆ σ(G′3). Since every element of G2 is a subset of R, we have

G2 ⊆ {R ∩ B : B ∈ σ(G′3)}.

The reader can verify that {R ∩ B : B ∈ σ(G′3)} is a σ-algebra on R. Therefore, since B(R) is the smallest σ-algebra on R that contains G2, we have

B(R) ⊆ {R ∩ B : B ∈ σ(G′3)}.

For each B ∈ σ(G′3),

R ∩ B = R̄ ∩ {−∞}^c ∩ {∞}^c ∩ B ∈ σ(G′3),

and so

{B ∩ R : B ∈ σ(G′3)} ⊆ σ(G′3).

Therefore B(R) ⊆ σ(G′3).

(ii),(iii),(iv): Similar.

4 Measurable Functions

To motivate the definition of a measurable function, we ask the reader to recall the following theorem from undergraduate analysis about continuous functions.

Theorem 4.1. Let f : R→ R. Then f is continuous iff

f−1(V) is an open set in R for every open set V in R.

If T is the collection of open sets in R, we can rewrite this as: f is continuous iff

f−1(V ) ∈ T for every V ∈ T

The reader who has studied topology may recall the following more general theorem.

Theorem 4.2. Let X and Y be topological spaces and f : X → Y . Then f is continuous iff

f−1(V) is an open set in X for every open set V in Y.

If TX is the collection of open sets in X and TY is the collection of open sets in Y , we can rewritethis as: f is continuous iff

f−1(V ) ∈ TX for every V ∈ TY

In summary, a continuous function is one whose inverse image preserves open sets.

Now we state the definition of a measurable function.

Definition 4.3. Let f : X → Y. Let 𝒜 be a σ-algebra on X. Let ℬ be a σ-algebra on Y. We say that f is measurable (or (𝒜, ℬ)-measurable to avoid ambiguity) if

f−1(B) ∈ 𝒜 for every B ∈ ℬ.

Now we turn to some special cases. If f : X → R, the σ-algebra on R is always assumed to be the Borel σ-algebra B(R), i.e., (Y, ℬ) = (R, B(R)). We say f : X → R is measurable (or 𝒜-measurable to avoid ambiguity) if

f−1(B) ∈ 𝒜 for every B ∈ B(R).

If f : Rd → R, we sometimes (but not always) consider the Borel σ-algebra on the domain Rd. We say f : Rd → R is Borel measurable (or B(Rd)-measurable) if

f−1(B) ∈ B(Rd) for every B ∈ B(R).

The next theorem says that to show a function f is measurable we only need to check f−1(B) for B in a generating collection.

Theorem 4.4. Let (X, 𝒜) and (Y, ℬ) be measurable spaces. Let f : X → Y. Let G be any collection of subsets of Y that generates ℬ, i.e., σ(G) = ℬ. Then f is measurable iff

f−1(B) ∈ 𝒜 for every B ∈ G.

Proof. ⇒: Since G ⊆ σ(G) = ℬ, if

f−1(B) ∈ 𝒜 for every B ∈ ℬ,

then

f−1(B) ∈ 𝒜 for every B ∈ G.

⇐: It is readily verified that {B ⊆ Y : f−1(B) ∈ 𝒜} is a σ-algebra on Y (this uses the fact that the inverse image commutes with complements and unions). If

f−1(B) ∈ 𝒜 for every B ∈ G,

then {B ⊆ Y : f−1(B) ∈ 𝒜} contains G. Thus ℬ = σ(G) ⊆ {B ⊆ Y : f−1(B) ∈ 𝒜}. So f−1(B) ∈ 𝒜 for every B ∈ ℬ. Therefore f is measurable.
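Theorem 4.4 can be illustrated on finite measurable spaces, where both sides of the equivalence can be checked exhaustively. In the sketch below, the σ-algebra on X = {1, 2, 3, 4} has blocks {1, 2} and {3, 4}, and the σ-algebra on Y = {a, b, c} is the full power set, which is generated by the two singletons {a} and {b}. All names and the specific spaces are assumptions made for this example.

```python
def preimage(f, domain, b):
    """Inverse image f^{-1}(B), computed over a finite domain."""
    return frozenset(x for x in domain if f(x) in b)

def is_measurable(f, domain, sigma_x, sets_y):
    """Check that f^{-1}(B) lies in sigma_x for every B in sets_y."""
    return all(preimage(f, domain, b) in sigma_x for b in sets_y)

X = frozenset({1, 2, 3, 4})
sigma_X = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}

generators_Y = [frozenset({"a"}), frozenset({"b"})]
power_set_Y = [frozenset(s) for s in
               ([], ["a"], ["b"], ["c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])]

f = {1: "a", 2: "a", 3: "b", 4: "b"}.get   # constant on each block of sigma_X
g = {1: "a", 2: "b", 3: "c", 4: "c"}.get   # splits the block {1, 2}

for name, h in (("f", f), ("g", g)):
    print(name,
          is_measurable(h, X, sigma_X, generators_Y),
          is_measurable(h, X, sigma_X, power_set_Y))
```

For both functions, checking the two generators gives the same answer as checking all eight sets in the power set, as the theorem predicts.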

By combining the above theorem and Theorem 3.13 (which gives generating sets for B(R̄)), we obtain:

Theorem 4.5. Let (X, 𝒜) be a measurable space and let f : X → R̄. Then the following are equivalent:

(i) f is measurable

(ii) {x ∈ X : f(x) < b} = f−1([−∞, b)) ∈ 𝒜 for every b ∈ R

(iii) {x ∈ X : f(x) ≤ b} = f−1([−∞, b]) ∈ 𝒜 for every b ∈ R

(iv) {x ∈ X : f(x) > a} = f−1((a, ∞]) ∈ 𝒜 for every a ∈ R

(v) {x ∈ X : f(x) ≥ a} = f−1([a, ∞]) ∈ 𝒜 for every a ∈ R

Remark 4.6. When verifying a function f : X → R is measurable, we usually use the theorem above, rather than the definition.

Theorem 4.7. Let (X, 𝒜) be a measurable space and let f : X → R. If f is measurable, then {x ∈ X : f(x) = c} = f−1({c}) ∈ 𝒜 for every c ∈ R.

Proof. {x ∈ X : f(x) = c} = {x ∈ X : f(x) ≤ c} ∩ {x ∈ X : f(x) ≥ c} ∈ 𝒜.

Notation. For brevity, we write

{f < b} = {x ∈ X : f(x) < b} = f−1([−∞, b)),

{f = c} = {x ∈ X : f(x) = c} = f−1({c}),

and so on.

Theorem 4.8. Let (X, 𝒜) be a measurable space. Let f, g : X → R be measurable functions. Then (i) {f < g}, (ii) {f ≤ g}, and (iii) {f = g} are measurable sets.

Proof. (i): For each fixed x ∈ X, we have f(x) < g(x) iff there is an r ∈ Q such that f(x) < r < g(x). Thus

\[ \{x \in X : f(x) < g(x)\} = \bigcup_{r\in\mathbb{Q}} \big( \{x \in X : f(x) < r\} \cap \{x \in X : r < g(x)\} \big) \in \mathcal{A}. \]

(ii): {f ≤ g} = {g < f}^c ∈ 𝒜 by (i).

(iii): {f = g} = {f ≤ g} ∩ {f ≥ g} ∈ 𝒜 by (ii).

Now we give some examples of measurable functions.

Theorem 4.9. Let (X, 𝒜) be a measurable space. Let f : X → R. If f is constant, then f is measurable.

Proof. Assume f is constant. This means there exists a c ∈ R such that f(x) = c for all x ∈ X. Let a ∈ R. Then {x ∈ X : f(x) ≥ a} = ∅ if a > c, and {x ∈ X : f(x) ≥ a} = X if a ≤ c. Either way, {x ∈ X : f(x) ≥ a} ∈ 𝒜. Thus f is measurable.

Theorem 4.10. Let (X, 𝒜) be a measurable space. Let A ⊆ X. Then A is measurable (i.e., A ∈ 𝒜) iff the indicator function 1A is measurable.

Proof. ⇒: Assume A ∈ 𝒜. For every a ∈ R,

\[ \{x \in X : 1_A(x) \ge a\} = \begin{cases} \emptyset & \text{if } a > 1 \\ A & \text{if } 0 < a \le 1 \\ X & \text{if } a \le 0 \end{cases} \ \in \mathcal{A}. \]

So 1A is measurable.

⇐: Assume 1A is measurable. Then A = {x ∈ X : 1A(x) > 0} ∈ 𝒜.

Theorem 4.11. Let f : R→ R. If f is continuous, then f is Borel measurable.

Proof. Since f is continuous, f−1(G) is open for every open set G ⊆ R. So {x ∈ R : f(x) < a} = f−1((−∞, a)) is open for every a ∈ R. Therefore {x ∈ R : f(x) < a} = f−1((−∞, a)) ∈ B(R) for every a ∈ R. Hence f is Borel measurable by Theorem 4.5.

Theorem 4.12. Let f : R→ R. If f is increasing or decreasing, then f is Borel measurable.

Proof. Exercise.

Theorem 4.13. Let (X, 𝒜), (Y, ℬ), (Z, 𝒞) be measurable spaces. Let f : X → Y and g : Y → Z. If f is (𝒜, ℬ)-measurable and g is (ℬ, 𝒞)-measurable, then the function g ∘ f : X → Z is (𝒜, 𝒞)-measurable.

Proof. Exercise.

Theorem 4.14. Let (X, 𝒜) be a measurable space. Let f, g : X → R̄ be measurable functions. Then the following functions are measurable:

(i) f + g, if f(x) + g(x) is never of the form ∞ − ∞ or −∞ + ∞ for any x ∈ X.

(ii) fg

(iii) f/g, if g(x) ≠ 0 for every x ∈ X.

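Proof. Let a ∈ R be arbitrary.

(i):

\[ \{f + g < a\} = \{f < -g + a\} = \bigcup_{r\in\mathbb{Q}} \big( \{f < r\} \cap \{r < -g + a\} \big) = \bigcup_{r\in\mathbb{Q}} \big( \{f < r\} \cap \{g < -r + a\} \big) \in \mathcal{A}. \]

(ii): {fg < a} = A0 ∪ A1 ∪ A2 ∪ A3 ∪ A4, where the Ai are as follows:

A0 = ({f = 0} ∪ {g = 0}) ∩ {fg < a}

A1 = {f > 0} ∩ {g > 0} ∩ {fg < a}

A2 = {f > 0} ∩ {g < 0} ∩ {fg < a}

A3 = {f < 0} ∩ {g > 0} ∩ {fg < a}

A4 = {f < 0} ∩ {g < 0} ∩ {fg < a}

We have:

\[ A_0 = (\{f = 0\} \cup \{g = 0\}) \cap \{fg < a\} = \begin{cases} \emptyset & \text{if } a \le 0 \\ \{f = 0\} \cup \{g = 0\} & \text{if } a > 0 \end{cases} \ \in \mathcal{A} \]

\begin{align*}
A_1 &= \{f > 0\} \cap \{g > 0\} \cap \{fg < a\} \\
&= \{f > 0,\ g > 0,\ fg < a\} \\
&= \{f > 0,\ g > 0,\ f < a/g\} \\
&= \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > 0,\ g > 0,\ f < r\} \cap \{f > 0,\ g > 0,\ r < a/g\} \big) \\
&= \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > 0,\ g > 0,\ f < r\} \cap \{f > 0,\ g > 0,\ g < a/r\} \big) \\
&= \{f > 0\} \cap \{g > 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f < r\} \cap \{g < a/r\} \big) \in \mathcal{A}
\end{align*}

\begin{align*}
A_2 &= \{f > 0\} \cap \{g < 0\} \cap \{fg < a\} \\
&= \{f > 0,\ g < 0,\ fg < a\} \\
&= \{f > 0,\ g < 0,\ f(-g) > -a\} \\
&= \{f > 0,\ g < 0,\ f > (-a)/(-g)\} \\
&= \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > 0,\ g < 0,\ f > r\} \cap \{f > 0,\ g < 0,\ r > (-a)/(-g)\} \big) \\
&= \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > 0,\ g < 0,\ f > r\} \cap \{f > 0,\ g < 0,\ -g > (-a)/r\} \big) \\
&= \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > 0,\ g < 0,\ f > r\} \cap \{f > 0,\ g < 0,\ g < a/r\} \big) \\
&= \{f > 0\} \cap \{g < 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > r\} \cap \{g < a/r\} \big) \in \mathcal{A}
\end{align*}

Similarly, A3, A4 ∈ 𝒜. Therefore {fg < a} ∈ 𝒜.

(iii): {f/g < a} = ({g > 0} ∩ {f < ag}) ∪ ({g < 0} ∩ {f > ag}). The function ag is measurable by (ii) and Theorem 4.9, so {f < ag} and {f > ag} are in 𝒜 by Theorem 4.8. Hence {f/g < a} ∈ 𝒜.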

Theorem 4.15. Let (X, 𝒜) be a measurable space. Let f, g : X → R be measurable functions. The following functions are measurable:

(i) max{f, g}

(ii) min{f, g}

(iii) |f|

Proof. (i): {max{f, g} > a} = {f > a} ∪ {g > a} ∈ 𝒜

(ii): {min{f, g} > a} = {f > a} ∩ {g > a} ∈ 𝒜

(iii): {|f| < a} = {−a < f < a} = {f > −a} ∩ {f < a} ∈ 𝒜. Alternatively, write |f| = max{f, −f}.

Theorem 4.16. Let (X, 𝒜) be a measurable space. Let (fn) be a sequence of measurable functions X → R̄. The following functions are measurable:

(i) supn fn

(ii) infn fn

(iii) lim sup fn = infk≥1(supn≥k fn)

(iv) lim inf fn = supk≥1(infn≥k fn)

Proof. (i): For each a ∈ R,

\[ \Big\{ \sup_n f_n > a \Big\} = \Big\{ x \in X : \sup_n f_n(x) > a \Big\} = \bigcup_{n=1}^{\infty} \{x \in X : f_n(x) > a\} \in \mathcal{A}. \]

(ii): Similar to (i).

(iii): By (i), gk = supn≥k fn is measurable for each k ∈ N. By (ii), lim sup fn = infk≥1 gk is measurable.

(iv): Similar to (iii).

Corollary 4.17. Let (X, 𝒜) be a measurable space. Let (fn) be a sequence of measurable functions X → [−∞, ∞]. If the sequence (fn(x)) converges to an extended real number for each x ∈ X, then limn fn = lim supn fn = lim infn fn, and so limn fn is measurable.

5 Measures

Definition 5.1. Let 𝒜 be a σ-algebra on a set X. A measure on 𝒜 is a function µ : 𝒜 → [0, ∞] that has the following properties:

(a) µ(∅) = 0

(b) (Countable Additivity) If E1, E2, . . . is a sequence of disjoint sets in 𝒜, then \( \mu\big(\bigcup_{i=1}^{\infty} E_i\big) = \sum_{i=1}^{\infty} \mu(E_i) \).

Theorem 5.2 (Properties of Measures). Let 𝒜 be a σ-algebra on a set X. If µ is a measure on 𝒜, then µ has the following properties:

(i) (Finite Additivity) If A, B are disjoint sets in 𝒜, then µ(A ∪ B) = µ(A) + µ(B).

(ii) (Subtraction) If A, B ∈ 𝒜, A ⊆ B, and µ(A) < ∞, then µ(B \ A) = µ(B) − µ(A).

(iii) (Monotonicity) If A, B ∈ 𝒜 and A ⊆ B, then µ(A) ≤ µ(B).

(iv) (Countable Subadditivity) If A1, A2, . . . ∈ 𝒜, then \( \mu\big(\bigcup_{i=1}^{\infty} A_i\big) \le \sum_{i=1}^{\infty} \mu(A_i) \).

(v) (Continuity From Below) If A1, A2, . . . ∈ 𝒜 and A1 ⊆ A2 ⊆ . . ., then \( \mu\big(\bigcup_{i=1}^{\infty} A_i\big) = \lim_n \mu(A_n) \).

(vi) (Continuity From Above) If A1, A2, . . . ∈ 𝒜, A1 ⊇ A2 ⊇ . . ., and µ(A1) < ∞, then \( \mu\big(\bigcap_{i=1}^{\infty} A_i\big) = \lim_n \mu(A_n) \).

Proof. (i): Define A1 = A, A2 = B, and Ai = ∅ for i > 2 and use countable additivity.

(ii): Since B = (B ∩A) ∪ (B \A) = A ∪ (B \A), finite additivity gives

µ(B) = µ(A) + µ(B \A).

Since µ(A) <∞, we can subtract it from both sides to get the desired result.

(iii): Since B = (B ∩ A) ∪ (B \ A) = A ∪ (B \ A), finite additivity and the fact that µ(B \ A) ≥ 0 give

µ(B) = µ(A) + µ(B \ A) ≥ µ(A).

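(iv): Define B1 = A1, B2 = A2 \ A1, B3 = A3 \ (A1 ∪ A2), and so on. Then B1, B2, . . . is a sequence of disjoint sets in 𝒜 such that \( \bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i \) and Bi ⊆ Ai for all i ∈ N. Therefore, by countable additivity and monotonicity,

\[ \mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} \mu(B_i) \le \sum_{i=1}^{\infty} \mu(A_i). \]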

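(v): Define B1 = A1, B2 = A2 \ A1, B3 = A3 \ A2, and so on. Then B1, B2, . . . is a sequence of disjoint sets in 𝒜 such that \( \bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i \) and \( \bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} A_i = A_n \) for all n ∈ N. Therefore, by countable additivity and finite additivity,

\begin{align*}
\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) &= \mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} \mu(B_i) \\
&= \lim_{n\to\infty} \sum_{i=1}^{n} \mu(B_i) = \lim_{n\to\infty} \mu\Big(\bigcup_{i=1}^{n} B_i\Big) \\
&= \lim_{n\to\infty} \mu\Big(\bigcup_{i=1}^{n} A_i\Big) = \lim_{n\to\infty} \mu(A_n).
\end{align*}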

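(vi): Define B1 = A1 \ A1 = ∅, B2 = A1 \ A2, B3 = A1 \ A3, and so on. Then B1, B2, . . . ∈ 𝒜 and B1 ⊆ B2 ⊆ . . .. By (v),

\[ \mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \lim_{n\to\infty} \mu(B_n). \]

Since \( \bigcup_{i=1}^{\infty} B_i = A_1 \setminus \bigcap_{i=1}^{\infty} A_i \) and Bn = A1 \ An, we have

\[ \mu\Big(A_1 \setminus \bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n\to\infty} \mu(A_1 \setminus A_n). \]

By monotonicity, \( \mu\big(\bigcap_{i=1}^{\infty} A_i\big) \le \mu(A_n) \le \mu(A_1) < \infty \). Then (ii) gives

\[ \mu(A_1) - \mu\Big(\bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n\to\infty} \big( \mu(A_1) - \mu(A_n) \big). \]

Subtracting µ(A1) and then multiplying by −1 gives the desired result.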

Example 5.3. Let X be any set.

(i) The counting measure on X is the measure µ : P(X) → [0, ∞] defined by µ(A) = ∞ if A is an infinite subset of X and µ(A) equals the number of elements in A if A is a finite subset of X.

(ii) Let x0 ∈ X. The Dirac measure at x0 or point mass at x0 is the measure µ : P(X) → [0, ∞] defined by µ(A) = 0 if x0 ∉ A and µ(A) = 1 if x0 ∈ A.

As an exercise, the reader should verify that these are indeed measures.
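For finite sets, the two measures in Example 5.3 are easy to experiment with. The sketch below is restricted to finite subsets (so the value ∞ never arises) and spot-checks finite additivity, monotonicity, and subadditivity on a few sets. It is a numerical illustration, not a verification of the measure axioms, and the function names are made up for this example.

```python
def counting_measure(a):
    """Counting measure, restricted here to finite sets."""
    return len(a)

def dirac_measure(x0):
    """Dirac measure (point mass) at x0."""
    def mu(a):
        return 1 if x0 in a else 0
    return mu

delta = dirac_measure(3)

A = {1, 2, 3}
B = {4, 5}        # disjoint from A
C = {3, 4}        # overlaps A

for mu in (counting_measure, delta):
    print(
        mu(A | B) == mu(A) + mu(B),   # finite additivity on disjoint sets
        mu(A) <= mu(A | C),           # monotonicity
        mu(A | C) <= mu(A) + mu(C),   # subadditivity (strict here: A and C overlap)
    )
```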

The most important measure is the Lebesgue measure on (R, B(R)). Later, we will see how to define the Lebesgue measure. For now, we state without proof the following theorem that characterizes it.

Theorem 5.4. There is a unique measure λ on (R,B(R)) such that

λ ((a, b]) = b− a

for every a, b ∈ R, a ≤ b. It is called the Lebesgue measure on (R,B(R)).

We will also consider the Lebesgue measure on (Rd,B(Rd)). The following theorem characterizes it.

Theorem 5.5. There is a unique measure λd on (Rd,B(Rd)) such that

λd ((a1, b1]× · · · × (ad, bd]) = (b1 − a1) · · · (bd − ad)

for every ai, bi ∈ R, ai ≤ bi. It is called the Lebesgue measure on (Rd,B(Rd)).

6 Integral of Non-Negative Simple Functions

We define the integral in stages. We start by defining the integral for the class of simple functions and establishing its basic properties.

Definition 6.1. A function s : X → R is called simple if its range s(X) = {s(x) : x ∈ X} is a finite set.

Lemma 6.2. If s : X → R is a simple function whose range consists of the distinct numbers c1, c2, . . . , cn, then

\[ s = \sum_{i=1}^{n} c_i 1_{E_i} \quad \text{where } E_i = s^{-1}(c_i) = \{x \in X : s(x) = c_i\}. \tag{6.1} \]

We call (6.1) the standard representation of s. Note E1, . . . , En are disjoint sets and $\bigcup_{i=1}^{n} E_i = X$. Note also that s ≥ 0 iff ci ≥ 0 for all 1 ≤ i ≤ n.

DRAW A GRAPH HERE

Lemma 6.3. Let (X,A) be a measurable space. Let s : X → R be a simple function whose range consists of the distinct numbers c1, c2, . . . , cn. Then s is measurable iff each set $E_i = s^{-1}(c_i)$ in its standard representation is measurable.

Proof. If each Ei is measurable, then each $1_{E_i}$ is measurable, so $s = \sum_{i=1}^{n} c_i 1_{E_i}$ is measurable.

Conversely, if s is measurable, then

\[ E_i = \{x \in X : s(x) = c_i\} = \bigcap_{m=1}^{\infty} \{x \in X : c_i - 1/m < s(x) < c_i + 1/m\} \]

is measurable for each i = 1, . . . , n.

Definition 6.4. Let (X,A, µ) be a measure space. If s : X → R is a non-negative measurable simple function with standard representation

\[ s = \sum_{i=1}^{n} c_i 1_{E_i}, \]

then the integral of s with respect to µ (or the µ-integral of s) is

\[ \int s\,d\mu = \sum_{i=1}^{n} c_i \mu(E_i). \]

For the sum, remember the convention 0 · ∞ = 0. Note that ∫ s dµ may equal ∞. When there is no ambiguity, we write ∫ s instead of ∫ s dµ. Alternatively, it is sometimes convenient to write ∫ s(x) dµ(x) instead of ∫ s dµ.
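To make Definition 6.4 concrete, here is a small sketch that computes the integral of a simple function on a finite set directly from its standard representation. It is an illustration only: the helper names `standard_representation` and `integrate_simple`, the set X = {0, ..., 5}, and the function s are assumptions made for the example, and every subset of X is treated as measurable.

```python
from collections import defaultdict
from fractions import Fraction


def standard_representation(s, X):
    """Group the points of a finite set X by the value of s: returns {c_i: E_i}."""
    rep = defaultdict(set)
    for x in X:
        rep[s(x)].add(x)
    return dict(rep)


def integrate_simple(s, X, mu):
    """Integral of a non-negative simple function: sum of c_i * mu(E_i)."""
    return sum(c * mu(E) for c, E in standard_representation(s, X).items())


X = set(range(6))


def s(x):
    return Fraction(x % 3)  # takes the values 0, 1, 2


def counting(E):
    return len(E)  # counting measure on finite sets


def dirac_at_4(E):
    return 1 if 4 in E else 0  # Dirac measure at x0 = 4


total_counting = integrate_simple(s, X, counting)  # 0*2 + 1*2 + 2*2
total_dirac = integrate_simple(s, X, dirac_at_4)   # only E = {1, 4} has mass
```

With the Dirac measure at 4 the integral is just s(4), and with the counting measure it is the sum of the values of s over X.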

The next theorem says the integral is linear and monotone for non-negative measurable simple functions.

Theorem 6.5. Let (X,A, µ) be a measure space. Let s and t be non-negative measurable simple functions.

(a) If c ∈ [0,∞), then ∫ cs = c ∫ s.

(b) ∫ (s + t) = ∫ s + ∫ t.

(c) If s ≤ t, then ∫ s ≤ ∫ t.

Proof. Let $s = \sum_{i=1}^{m} a_i 1_{A_i}$ and $t = \sum_{j=1}^{n} b_j 1_{B_j}$ be the standard representations of s and t. So $A_i = \{x : s(x) = a_i\}$ and $B_j = \{x : t(x) = b_j\}$. Note that $A_i = \bigcup_{j=1}^{n} (A_i \cap B_j)$ for each fixed i = 1, . . . ,m. Likewise $B_j = \bigcup_{i=1}^{m} (A_i \cap B_j)$ for each fixed j = 1, . . . , n. Finally, note that $A_i \cap B_j$ is disjoint from $A_{i'} \cap B_{j'}$ whenever (i, j) ≠ (i′, j′).

(a): If c ∈ (0,∞), then $A_i = \{x : cs(x) = ca_i\}$, and so the standard representation of cs is

\[ cs = \sum_{i=1}^{m} c a_i 1_{A_i}. \]

Therefore

\[ \int cs = \sum_{i=1}^{m} c a_i \mu(A_i) = c \sum_{i=1}^{m} a_i \mu(A_i) = c \int s. \]

If c = 0, then cs = 0, and its standard representation is

\[ cs = 0 \cdot 1_X. \]

Thus

\[ \int cs = 0 \cdot \mu(X) = 0 = 0 \cdot \int s = c \int s. \]

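(b): By the finite additivity of µ,

\[
\int s + \int t = \sum_{i=1}^{m} a_i \mu(A_i) + \sum_{j=1}^{n} b_j \mu(B_j)
= \sum_{i=1}^{m} \sum_{j=1}^{n} a_i \mu(A_i \cap B_j) + \sum_{i=1}^{m} \sum_{j=1}^{n} b_j \mu(A_i \cap B_j)
= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(A_i \cap B_j).
\]

Note s + t is a simple function because its range is contained in the finite set

\[ \{a_i + b_j : 1 \le i \le m,\ 1 \le j \le n\}. \]

Let c1, . . . , cℓ be the distinct numbers in this set. For each k ∈ {1, . . . , ℓ}, let

\[ E_k = (s+t)^{-1}(c_k) = \{x \in X : s(x) + t(x) = c_k\}. \]

Then

\[ E_k = \bigcup_{(i,j)\,:\,a_i + b_j = c_k} (A_i \cap B_j). \]

In other words, Ek is the union of those sets Ai ∩ Bj such that ai + bj = ck. So the standard representation of s + t is

\[ s + t = \sum_{k=1}^{\ell} c_k 1_{E_k} \]

(after discarding any ck with Ek = ∅, which contributes nothing to the sums below). Then, by the finite additivity of µ,

\[
\int (s+t) = \sum_{k=1}^{\ell} c_k \mu(E_k)
= \sum_{k=1}^{\ell} \sum_{(i,j)\,:\,a_i + b_j = c_k} c_k \mu(A_i \cap B_j)
= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(A_i \cap B_j).
\]

Comparing the formulas above for ∫ s + ∫ t and ∫ (s + t) shows they are equal.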

(c): If s ≤ t, then for every x ∈ Ai ∩ Bj we have

\[ a_i = s(x) \le t(x) = b_j. \]

Thus ai ≤ bj whenever Ai ∩ Bj is non-empty. Then, by the finite additivity of µ,

\[
\int s = \sum_{i=1}^{m} a_i \mu(A_i) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i \mu(A_i \cap B_j)
\le \sum_{i=1}^{m} \sum_{j=1}^{n} b_j \mu(A_i \cap B_j) = \sum_{j=1}^{n} b_j \mu(B_j) = \int t.
\]

7 Integral of Non-Negative Extended Real-Valued Functions

Now we extend the definition of the integral to the class of non-negative functions.

Definition 7.1. Let (X,A, µ) be a measure space. Let f : X → [0,∞] be a measurable function. The integral of f with respect to µ (or the µ-integral of f) is defined by

\[ \int f\,d\mu = \sup\Big\{ \int s\,d\mu : s \text{ simple measurable},\ 0 \le s \le f \Big\}. \tag{7.1} \]

Note that ∫ f dµ may equal ∞. When there is no ambiguity, we write ∫ f instead of ∫ f dµ. Alternatively, it is sometimes convenient to write ∫ f(x) dµ(x) instead of ∫ f dµ.

Remark 7.2. We must check that this definition of the integral agrees with the earlier one when f is simple. We temporarily use $\oint$ to denote the integral defined in Definition 6.4. Then the formula for the integral in Definition 7.1 becomes

\[ \int f\,d\mu = \sup\Big\{ \oint s\,d\mu : s \text{ simple measurable},\ 0 \le s \le f \Big\}. \]

If f : X → [0,∞] is a measurable simple function, then $\oint f$ is an element of the set on the right. Moreover, for any simple measurable function s with 0 ≤ s ≤ f, Theorem 6.5 implies $\oint s \le \oint f$, and so $\oint f$ is the maximum of the set on the right. Since the supremum of a set equals the maximum whenever the maximum exists, we have

\[ \int f\,d\mu = \sup\Big\{ \oint s\,d\mu : s \text{ simple measurable},\ 0 \le s \le f \Big\} = \oint f\,d\mu. \]

The next theorem establishes that the integral is monotone for non-negative functions.

Theorem 7.3. Let (X,A, µ) be a measure space. Let f, g : X → [0,∞] be measurable functions. If f ≤ g, then ∫ f ≤ ∫ g.

Proof. If s is a simple measurable function such that 0 ≤ s ≤ f, then 0 ≤ s ≤ g. Thus

\[
\Big\{ \int s\,d\mu : s \text{ simple measurable},\ 0 \le s \le f \Big\}
\subseteq
\Big\{ \int s\,d\mu : s \text{ simple measurable},\ 0 \le s \le g \Big\}.
\]

Taking suprema gives the desired inequality.

The next theorem is the first of the fundamental convergence theorems in Lebesgue’s theory of integration.

Theorem 7.4. (Monotone Convergence Theorem) Let (X,A, µ) be a measure space. Let fn : X → [0,∞] (n = 1, 2, . . .) be a sequence of measurable functions. If fn increases to f pointwise (meaning that f1(x) ≤ f2(x) ≤ . . . and f(x) = limn fn(x) = supn fn(x) for every x ∈ X), then f is measurable and

\[ \lim_{n} \int f_n = \int f. \]

Proof. Since f is the pointwise limit of a sequence of measurable functions, Corollary 4.17 implies f is measurable. Since f1 ≤ f2 ≤ . . . ≤ f, we have ∫ f1 ≤ ∫ f2 ≤ . . . ≤ ∫ f, and so limn ∫ fn ≤ ∫ f.

We must prove the reverse inequality. Let 0 < c < 1 be given. Let s be a simple measurable function such that 0 ≤ s ≤ f. For every n ∈ N, define

\[ E_n = \{x \in X : cs(x) \le f_n(x)\}. \]

For every n ∈ N, we have $cs1_{E_n} \le f_n$, and so

\[ c \int s 1_{E_n} \le \int f_n. \tag{7.2} \]

We want to take the limit n → ∞ in (7.2). Suppose the standard representation of s is $s = \sum_{i=1}^{k} a_i 1_{A_i}$. Then $s1_{E_n}$ is a simple function and $s1_{E_n} = \sum_{i=1}^{k} a_i 1_{E_n \cap A_i}$. Therefore, by Theorem 6.5,

\[ \int s = \sum_{i=1}^{k} a_i \mu(A_i) \]

and

\[ \int s 1_{E_n} = \sum_{i=1}^{k} a_i \mu(E_n \cap A_i). \]

Note E1 ⊆ E2 ⊆ . . . and

\[ X = \bigcup_{n=1}^{\infty} E_n. \]

(To see the last equality, consider an arbitrary x ∈ X. If f(x) = 0 or s(x) = 0, then x ∈ E1. If f(x) > 0 and s(x) > 0, then cs(x) < s(x) ≤ f(x), hence there exists n ∈ N such that cs(x) < fn(x) ≤ f(x), and so x ∈ En.) Let i ∈ {1, . . . , k} be arbitrary. We have E1 ∩ Ai ⊆ E2 ∩ Ai ⊆ . . . and

\[ A_i = \bigcup_{n=1}^{\infty} (E_n \cap A_i). \]

By continuity from below,

\[ \lim_{n\to\infty} \mu(E_n \cap A_i) = \mu(A_i). \]

Therefore

\[ \lim_{n} \int s 1_{E_n} = \lim_{n} \sum_{i=1}^{k} a_i \mu(E_n \cap A_i) = \sum_{i=1}^{k} a_i \mu(A_i) = \int s. \]

Thus taking n → ∞ in (7.2) gives

\[ c \int s \le \lim_{n} \int f_n. \]

Since 0 < c < 1 is arbitrary,

\[ \int s \le \lim_{n} \int f_n. \]

Since s is an arbitrary simple measurable function with 0 ≤ s ≤ f, we have

\[ \int f \le \lim_{n} \int f_n. \]
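A simple instance of the monotone convergence theorem uses the counting measure on N from Example 5.3. The sketch below is an illustration under stated assumptions: we take f(k) = 2^(-k) and f_n = f · 1_{1,...,n}, so that the integral of f_n with respect to counting measure is the finite sum f(1) + ... + f(n). The names `f` and `integral_fn` are hypothetical, chosen for the example.

```python
from fractions import Fraction


def f(k):
    """The limit function f(k) = 2^(-k) on N = {1, 2, 3, ...}."""
    return Fraction(1, 2 ** k)


def integral_fn(n):
    """Integral of f_n = f * 1_{1..n} with respect to counting measure (a finite sum)."""
    return sum(f(k) for k in range(1, n + 1))


# The integrals of f_1 <= f_2 <= ... increase towards the sum of the series.
values = [integral_fn(n) for n in range(1, 11)]
```

Here the integral of f_n equals 1 − 2^(-n), which increases to 1, the sum of the series and the value the theorem predicts for the integral of f.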


The next theorem concerns approximation of functions by simple functions. It is frequently used to produce a sequence of functions to which the monotone convergence theorem can be applied. Notice the theorem does not involve a measure µ.

Theorem 7.5. (Simple Approximation Theorem) Let X be a set. Let f : X → [−∞,∞]. Then there exists a sequence (sn) of functions sn : X → R such that

(i) s1, s2, . . . are simple functions.

(ii) |s1| ≤ |s2| ≤ . . . ≤ |f |.

(iii) limk→∞ sk(x) = f(x) for each x ∈ X.

(iv) If f is bounded, then sk → f uniformly on X.

(v) If f ≥ 0, then 0 ≤ s1 ≤ s2 ≤ . . . ≤ f .

(vi) If (X,A) is a measurable space and f is measurable, then s1, s2, . . . are measurable.

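Proof. Let k ∈ N. Decompose [−∞,∞] as the following union of subintervals:

\[ [-\infty,\infty] = [-\infty,-k] \cup (-k, 0] \cup [0, k) \cup [k,\infty]. \]

Decompose [−∞,∞] further by dividing (−k, 0] and [0, k) into intervals of length $1/2^k$. So we get

\[
[-\infty,\infty] = [-\infty,-k] \cup \bigcup_{m=1}^{k2^k} \Big( \frac{-m}{2^k}, \frac{-(m-1)}{2^k} \Big] \cup \bigcup_{m=1}^{k2^k} \Big[ \frac{m-1}{2^k}, \frac{m}{2^k} \Big) \cup [k,\infty].
\]

For each x ∈ X, consider the subinterval to which f(x) belongs and define sk(x) to be the endpoint of that subinterval which is closest to 0. More precisely,

\[
s_k(x) =
\begin{cases}
k & \text{if } f(x) \in [k,\infty], \\
\dfrac{m-1}{2^k} & \text{if } f(x) \in \Big[ \dfrac{m-1}{2^k}, \dfrac{m}{2^k} \Big),\ m \in \{1, \ldots, k2^k\}, \\
\dfrac{-(m-1)}{2^k} & \text{if } f(x) \in \Big( \dfrac{-m}{2^k}, \dfrac{-(m-1)}{2^k} \Big],\ m \in \{1, \ldots, k2^k\}, \\
-k & \text{if } f(x) \in [-\infty,-k].
\end{cases}
\]

Figure 1 illustrates the definition of sk and sk+1.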

(i): Since sk takes only finitely many values, sk is simple.

(ii): No matter which subinterval f(x) belongs to, sk(x) rounds f(x) to a number at least as close to zero as sk+1(x) does. See Figure 1.

(iii): The key observation is that if f(x) ∈ (−k, k), then

\[ |f(x) - s_k(x)| \le \frac{1}{2^k}. \]

Here are the details. Let x ∈ X. If f(x) = ∞, then sk(x) = k for every k ∈ N, and so limk→∞ sk(x) = ∞ = f(x). If f(x) = −∞, then sk(x) = −k for every k ∈ N, and so limk→∞ sk(x) = −∞ = f(x). If f(x) ∈ (−∞,∞), there exists M > 0 such that f(x) ∈ (−M,M). Let ε > 0. Choose K ∈ N such that $1/2^K < \varepsilon$ and K ≥ M. For every k ≥ K, we have f(x) ∈ (−k, k), and so

\[ |f(x) - s_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \varepsilon. \]

Therefore limk→∞ sk(x) = f(x).

Figure 1: Definition of sk and sk+1

(iv): If f is bounded, then there exists M > 0 such that f(x) ∈ (−M,M) for every x ∈ X. Let ε > 0. Choose K ∈ N such that $1/2^K < \varepsilon$ and K ≥ M. For every k ≥ K and every x ∈ X, we have f(x) ∈ (−k, k), and so

\[ |f(x) - s_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \varepsilon. \]

Therefore sk → f uniformly on X.

(v): If f(x) ≥ 0, then sk(x) ≥ 0 by definition. The rest follows from (ii).

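(vi): For k ∈ N and m ∈ {1, . . . , k2^k}, define

\[
\begin{aligned}
F_k &= \{x : f(x) \in [k,\infty]\} = f^{-1}([k,\infty]), \\
F_{-k} &= \{x : f(x) \in [-\infty,-k]\} = f^{-1}([-\infty,-k]), \\
E_m &= \Big\{x : f(x) \in \Big[ \frac{m-1}{2^k}, \frac{m}{2^k} \Big) \Big\} = f^{-1}\Big( \Big[ \frac{m-1}{2^k}, \frac{m}{2^k} \Big) \Big), \\
E_{-m} &= \Big\{x : f(x) \in \Big( \frac{-m}{2^k}, \frac{-(m-1)}{2^k} \Big] \Big\} = f^{-1}\Big( \Big( \frac{-m}{2^k}, \frac{-(m-1)}{2^k} \Big] \Big).
\end{aligned}
\]

Then

\[
s_k = k 1_{F_k} + (-k) 1_{F_{-k}} + \sum_{m=1}^{k2^k} \frac{m-1}{2^k} 1_{E_m} + \sum_{m=1}^{k2^k} \frac{-(m-1)}{2^k} 1_{E_{-m}}.
\]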

If f is measurable, then all the sets Fk, F−k, Em, E−m are measurable, and so sk is measurable.
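The rounding rule that defines sk in the proof of Theorem 7.5 can be computed directly. The following sketch is illustrative only: the function `s_k` takes the value y = f(x) rather than the point x, and the sample value y = 7/3 is an assumption chosen for the example. Exact arithmetic with `Fraction` avoids floating-point rounding.

```python
import math
from fractions import Fraction


def s_k(y, k):
    """Value of s_k at a point where f takes the value y (y may be +-math.inf)."""
    if y >= k:
        return Fraction(k)
    if y <= -k:
        return Fraction(-k)
    scale = 2 ** k
    if y >= 0:
        # y lies in [(m-1)/2^k, m/2^k): round down, i.e. towards 0
        return Fraction(math.floor(y * scale), scale)
    # y lies in (-m/2^k, -(m-1)/2^k]: round up, i.e. towards 0
    return Fraction(math.ceil(y * scale), scale)


y = Fraction(7, 3)
approximations = [s_k(y, k) for k in range(1, 11)]
```

For y = 7/3 the first few values are 1, 2, 9/4, 37/16: the sequence is non-decreasing, never exceeds y, and is within 2^(-k) of y once k ≥ 3, in line with (ii), (iii), and (v).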

Now we combine the monotone convergence theorem and the simple approximation theorem to prove the linearity of the integral for non-negative functions.

Theorem 7.6. Let (X,A, µ) be a measure space. Let f, g : X → [0,∞] be measurable functions.

(a) If c ∈ [0,∞), then ∫ cf = c ∫ f.

(b) ∫ (f + g) = ∫ f + ∫ g.

Proof. (a): By Theorem 7.5, there is a sequence of non-negative measurable simple functions (sn) such that sn increases to f. Then (csn) is a sequence of non-negative measurable simple functions that increases to cf. By the monotone convergence theorem and Theorem 6.5(a),

\[ \int cf = \lim_{n} \int cs_n = c \lim_{n} \int s_n = c \int f. \]

(b): By Theorem 7.5, there are sequences of non-negative measurable simple functions (sn) and (tn) such that sn increases to f and tn increases to g. Then (sn + tn) is a sequence of non-negative measurable simple functions that increases to f + g. By the monotone convergence theorem and Theorem 6.5(b),

\[ \int (f + g) = \lim_{n} \int (s_n + t_n) = \lim_{n} \int s_n + \lim_{n} \int t_n = \int f + \int g. \]

Remark 7.7. In the exercises, you will be asked to prove the previous theorem without using the monotone convergence theorem and the simple approximation theorem.

8 Integral of Extended Real-Valued Functions

Definition 8.1. Let f : X → [−∞,∞]. The positive part of f is the function

f+ = max{0, f}.

The negative part of f is the function

f− = max{0,−f}.

Lemma 8.2. Let f : X → [−∞,∞].

(a) f+, f− ≥ 0

(b) f = f+ − f−

(c) |f | = f+ + f−

(d) f+(x) = (1/2)(|f |(x) + f(x)) whenever f(x) ≠ −∞

(e) f−(x) = (1/2)(|f |(x) − f(x)) whenever f(x) ≠ ∞

(f) f is measurable iff f+ and f− are measurable
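The pointwise identities in Lemma 8.2 are easy to check numerically for finite values. The sketch below is only a sanity check on a handful of sample values (an assumption made for the example), not a proof; the names `positive_part` and `negative_part` are illustrative.

```python
def positive_part(y):
    """f+ evaluated at a point where f takes the value y."""
    return max(0, y)


def negative_part(y):
    """f- evaluated at a point where f takes the value y."""
    return max(0, -y)


samples = [-3.5, -1, 0, 2, 7.25]

# (b) f = f+ - f-   and   (c) |f| = f+ + f-
identities_hold = all(
    positive_part(y) - negative_part(y) == y
    and positive_part(y) + negative_part(y) == abs(y)
    for y in samples
)

# (d) f+ = (|f| + f) / 2   and   (e) f- = (|f| - f) / 2
half_formulas_hold = all(
    positive_part(y) == (abs(y) + y) / 2 and negative_part(y) == (abs(y) - y) / 2
    for y in samples
)
```

The restrictions f(x) ≠ −∞ in (d) and f(x) ≠ ∞ in (e) are there because |f|(x) + f(x) or |f|(x) − f(x) would otherwise be the undefined expression ∞ − ∞.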

Definition 8.3. Let (X,A, µ) be a measure space. Let f : X → [−∞,∞] be a measurable function. If at least one of ∫ f+ dµ and ∫ f− dµ is finite, the Lebesgue integral (or simply integral) of f with respect to µ is defined to be

\[ \int f\,d\mu = \int f^{+}\,d\mu - \int f^{-}\,d\mu. \]

If ∫ f+ dµ = ∫ f− dµ = ∞, the integral ∫ f dµ is not defined. Note ∫ f dµ is finite iff both ∫ f+ dµ and ∫ f− dµ are finite. If ∫ f dµ is finite, we say that f is integrable with respect to µ.

Definition 8.4. Let (X,A, µ) be a measure space. Let f : X → R. If f is measurable, E ∈ A, and $\int f 1_E\,d\mu$ is defined, we define

\[ \int_E f\,d\mu = \int f 1_E\,d\mu. \]

With this notation, $\int f\,d\mu = \int_X f\,d\mu$.

Theorem 8.5. ?? Let (X,A, µ) be a measure space. Let f, g : X → R be measurable functions. Assume ∫ f and ∫ g are defined.

(a) If c ∈ R, then ∫ cf = c ∫ f.

(b) ∫ (f + g) = ∫ f + ∫ g.

(c) If f ≤ g, then ∫ f ≤ ∫ g.

Proof. (a): Exercise.

(b): Case 1: f and g are real-valued. Note (f + g)+ ≤ f+ + g+. Then Theorem ?? gives

\[ \int (f+g)^{+} \le \int (f^{+} + g^{+}) = \int f^{+} + \int g^{+} < \infty. \]

Likewise $\int (f+g)^{-} < \infty$. Therefore f + g is integrable. Note $(f+g)^{+} - (f+g)^{-} = f + g = f^{+} - f^{-} + g^{+} - g^{-}$. Since $f^{-}$, $g^{-}$, and $(f+g)^{-}$ are finite, rearranging gives $(f+g)^{+} + f^{-} + g^{-} = (f+g)^{-} + f^{+} + g^{+}$. Then Theorem ?? gives

\[ \int (f+g)^{+} + \int f^{-} + \int g^{-} = \int (f+g)^{-} + \int f^{+} + \int g^{+}. \]

Since $\int f^{-}$, $\int g^{-}$, and $\int (f+g)^{-}$ are finite, rearranging gives

\[ \int (f+g)^{+} - \int (f+g)^{-} = \int f^{+} - \int f^{-} + \int g^{+} - \int g^{-}. \]

Therefore

\[ \int (f+g) = \int f + \int g. \]

Case 2: To be filled.

(c): Assume f ≤ g. Then f+ ≤ g+ and f− ≥ g−. Therefore

\[ \int f\,d\mu = \int f^{+}\,d\mu - \int f^{-}\,d\mu \le \int g^{+}\,d\mu - \int g^{-}\,d\mu = \int g\,d\mu. \]

Theorem 8.6. Let (X,A, µ) be a measure space. If f, g : X → [−∞,∞] are measurable functions, then

\[ \int (f+g) = \int f + \int g, \]

provided we do not have ∫ f = −∫ g = ∞ or ∫ f = −∫ g = −∞.

Proof. Exercise.

Theorem 8.7. Let (X,A, µ) be a measure space. Let f, g : X → [−∞,∞] be measurable functions.

(a) If f−, g−, ∫ f−, and ∫ g− are finite, then ∫ (f + g) = ∫ f + ∫ g.

(b) If f+, g+, ∫ f+, and ∫ g+ are finite, then ∫ (f + g) = ∫ f + ∫ g.

Proof. Exercise.

9 Integrable Functions and Absolute Values

Theorem 9.1. Let (X,A, µ) be a measure space. Let f : X → [−∞,∞] or f : X → C be a measurable function.

(a) f is integrable iff |f | is integrable.

(b) If f is integrable, then |∫ f | ≤ ∫ |f |.

Proof. Case 1: f : X → [−∞,∞].

(a): f is integrable iff ∫ f = ∫ f+ − ∫ f− is finite iff ∫ f+ and ∫ f− are finite iff ∫ |f | = ∫ f+ + ∫ f− is finite iff |f | is integrable.

(b):

\[ \Big| \int f \Big| = \Big| \int f^{+} - \int f^{-} \Big| \le \Big| \int f^{+} \Big| + \Big| \int f^{-} \Big| = \int f^{+} + \int f^{-} = \int |f|. \]

Case 2: f : X → C.

(a): f is integrable iff ∫ f = ∫ Re f + i ∫ Im f is finite iff ∫ Re f and ∫ Im f are finite iff ∫ |Re f | and ∫ | Im f | are finite. The last equivalence comes from Case 1. But |f | ≤ |Re f | + | Im f | ≤ 2|f |, so that

\[ \int |f| \le \int |\mathrm{Re}\, f| + \int |\mathrm{Im}\, f| \le 2 \int |f|. \]

Thus ∫ |Re f | and ∫ | Im f | are finite iff ∫ |f | is finite iff |f | is integrable.

(b): The inequality is trivial if ∫ f = 0. Assume ∫ f ≠ 0. Set $\alpha = \big| \int f \big| \big( \int f \big)^{-1}$, so that |α| = 1. Then

\[ \Big| \int f \Big| = \alpha \int f = \int \alpha f. \]

In particular, ∫ αf is real. Therefore

\[ \Big| \int f \Big| = \mathrm{Re} \int \alpha f = \int \mathrm{Re}(\alpha f) \le \int |\alpha f| = \int |f|. \]

Corollary 9.2. Let (X,A, µ) be a measure space. Let f, g : X → [−∞,∞] or f, g : X → C. If f is measurable, g is integrable, and |f | ≤ |g|, then f is integrable.

Proof. Note ∫ |f | ≤ ∫ |g| < ∞ and apply the previous theorem.

10 Fatou’s Lemma

Lemma 10.1 (Fatou’s Lemma). Let (X,A, µ) be a measure space. If fn : X → [0,∞] (n = 1, 2, . . .) is a sequence of measurable functions, then

\[ \int \liminf_{n} f_n \le \liminf_{n} \int f_n. \]

Proof. Define $g_k = \inf_{n \ge k} f_n$ for k ∈ N. Then (gk) is an increasing sequence of measurable functions and limk gk = lim inf fn. By the monotone convergence theorem,

\[ \int \liminf_{n} f_n = \lim_{k} \int g_k. \]

Note gk ≤ fn for n ≥ k. So ∫ gk ≤ ∫ fn for n ≥ k. Thus $\int g_k \le \inf_{n \ge k} \int f_n$. Therefore

\[ \int \liminf_{n} f_n = \lim_{k} \int g_k \le \lim_{k} \inf_{n \ge k} \int f_n = \liminf_{n} \int f_n. \]
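The inequality in Fatou's lemma can be strict. As a worked example (not from the original notes), take the Lebesgue measure λ on (R,B(R)) from Theorem 5.4 and let fn be the indicator function of (n, n + 1], a "bump escaping to infinity":

```latex
\[
f_n = 1_{(n,\,n+1]}, \qquad \int f_n \, d\lambda = \lambda\big((n, n+1]\big) = 1 \quad \text{for every } n \in \mathbb{N}.
\]
% For each fixed x, f_n(x) = 0 as soon as n \ge x, so the pointwise liminf is 0.
\[
\liminf_{n} f_n(x) = 0 \ \text{ for every } x \in \mathbb{R},
\qquad\text{hence}\qquad
\int \liminf_{n} f_n \, d\lambda = 0 < 1 = \liminf_{n} \int f_n \, d\lambda .
\]
```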


11 Dominated Convergence Theorem

Theorem 11.1. (Dominated Convergence Theorem) Let (X,A, µ) be a measure space. Let fn : X → C be a sequence of measurable functions. Let f : X → C be a function. If fn → f pointwise and there exists an integrable function g : X → [0,∞] such that |fn| ≤ g for all n, then f is integrable, fn is integrable for each n,

\[ \lim_{n} \int |f - f_n| = 0, \]

and

\[ \lim_{n} \int f_n = \int f. \]

Proof. Since f is the pointwise limit of a sequence of measurable functions, Corollary 4.17 implies f is measurable. Since fn → f pointwise and |fn| ≤ g for all n, we have |f | ≤ g, and so Corollary 9.2 implies f is integrable. Corollary 9.2 also implies fn is integrable for each n. Then f − fn is integrable for each n. Since |f − fn| ≤ 2g, we have 0 ≤ 2g − |f − fn|. By Fatou’s lemma,

\[
\begin{aligned}
\int 2g &= \int \lim_{n} \big( 2g - |f - f_n| \big) \\
&= \int \liminf_{n} \big( 2g - |f - f_n| \big) \\
&\le \liminf_{n} \int \big( 2g - |f - f_n| \big) \\
&= \liminf_{n} \Big( \int 2g - \int |f - f_n| \Big) \\
&= \int 2g + \liminf_{n} \Big( - \int |f - f_n| \Big) \\
&= \int 2g - \limsup_{n} \int |f - f_n|.
\end{aligned}
\]

Since ∫ 2g is finite, we can subtract it to obtain $\limsup_{n} \int |f - f_n| \le 0$. So we have

\[ 0 \le \liminf_{n} \int |f - f_n| \le \limsup_{n} \int |f - f_n| \le 0. \]

Thus

\[ \lim_{n} \int |f - f_n| = \limsup_{n} \int |f - f_n| = \liminf_{n} \int |f - f_n| = 0. \]

Furthermore,

\[ \lim_{n} \Big| \int f_n - \int f \Big| = \lim_{n} \Big| \int (f_n - f) \Big| \le \lim_{n} \int |f - f_n| = 0, \]

whence limn ∫ fn = ∫ f.
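The domination hypothesis cannot simply be dropped. As a worked example (not from the original notes), consider the Lebesgue measure λ on (R,B(R)) and a sequence of tall, thin bumps:

```latex
\[
f_n = n \, 1_{(0,\,1/n]}, \qquad \int f_n \, d\lambda = n \cdot \lambda\big((0, 1/n]\big) = 1 \quad \text{for every } n \in \mathbb{N}.
\]
% For x \le 0 every f_n(x) is 0; for x > 0, f_n(x) = 0 as soon as 1/n < x.
\[
\lim_{n} f_n(x) = 0 \ \text{ for every } x \in \mathbb{R},
\qquad\text{but}\qquad
\lim_{n} \int f_n \, d\lambda = 1 \ne 0 = \int \lim_{n} f_n \, d\lambda .
\]
% Consistent with the theorem, no integrable g dominates all f_n:
% any such g satisfies g \ge n on (1/(n+1), 1/n], so
\[
\int g \, d\lambda \ \ge\ \sum_{n=1}^{\infty} n \Big( \frac{1}{n} - \frac{1}{n+1} \Big) = \sum_{n=1}^{\infty} \frac{1}{n+1} = \infty .
\]
```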


12 Interchanging Limits and Derivatives with Integrals

Theorem 12.1. Let (X,A, µ) be a measure space. Let f : X × [a, b] → C. Suppose that f( · , t) : X → C is integrable for each t ∈ [a, b]. Define $F(t) = \int_X f(x,t)\,d\mu(x)$ for each t ∈ [a, b].

(a) Suppose there exists an integrable function g : X → [0,∞] such that |f(x, t)| ≤ g(x) for all x ∈ X, t ∈ [a, b]. Suppose c ∈ [a, b] and limt→c f(x, t) = f(x, c) for every x ∈ X. Then

\[ \lim_{t \to c} F(t) = \lim_{t \to c} \int_X f(x,t)\,d\mu(x) = \int_X f(x,c)\,d\mu(x) = F(c). \]

(b) Suppose $\frac{\partial f}{\partial t}(x,t)$ exists for all x ∈ X, t ∈ [a, b]. Suppose there exists an integrable function g : X → [0,∞] such that $\big| \frac{\partial f}{\partial t}(x,t) \big| \le g(x)$ for all x ∈ X, t ∈ [a, b]. Then, for any c ∈ [a, b],

\[ F'(c) = \int_X \frac{\partial f}{\partial t}(x,c)\,d\mu(x). \]

Proof. The idea is to combine the dominated convergence theorem with the following fact from undergraduate analysis: limx→c h(x) = L iff limn→∞ h(xn) = L for every sequence (xn) with xn ≠ c and xn → c.

(a): Exercise.

(b): Let (tn) be any sequence of numbers in [a, b] such that tn ≠ c for all n and tn → c. Define

\[ f_n(x) = \frac{f(x,t_n) - f(x,c)}{t_n - c} \]

for each x ∈ X. Then fn is measurable and $\lim_n f_n(x) = \frac{\partial f}{\partial t}(x,c)$. By the mean value theorem,

\[ |f_n(x)| \le \sup_{t \in [a,b]} \Big| \frac{\partial f}{\partial t}(x,t) \Big| \le g(x) \]

for all x ∈ X. By the dominated convergence theorem,

\[
\begin{aligned}
\lim_{n} \frac{F(t_n) - F(c)}{t_n - c}
&= \lim_{n} \frac{1}{t_n - c} \Big( \int_X f(x,t_n)\,d\mu(x) - \int_X f(x,c)\,d\mu(x) \Big) \\
&= \lim_{n} \int_X \frac{f(x,t_n) - f(x,c)}{t_n - c}\,d\mu(x) \\
&= \lim_{n} \int_X f_n(x)\,d\mu(x) \\
&= \int_X \frac{\partial f}{\partial t}(x,c)\,d\mu(x).
\end{aligned}
\]

Since (tn) is arbitrary, the limit

\[ F'(c) = \lim_{t \to c} \frac{F(t) - F(c)}{t - c} \]

exists, and

\[ F'(c) = \int_X \frac{\partial f}{\partial t}(x,c)\,d\mu(x). \]
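As a worked illustration of part (b) (not from the original notes), take X = [0, 1] with Lebesgue measure and f(x, t) = sin(tx) on any interval [a, b]. The choice of f and the dominating function g(x) = x are assumptions made for the example.

```latex
% f(., t) is continuous and bounded on [0,1], hence integrable for each t.
\[
\Big| \frac{\partial f}{\partial t}(x,t) \Big| = |x \cos(tx)| \le x =: g(x),
\qquad \int_{[0,1]} g \, d\lambda = \tfrac{1}{2} < \infty .
\]
% So Theorem 12.1(b) applies and gives, for every c in [a,b],
\[
F(t) = \int_{[0,1]} \sin(tx)\, d\lambda(x),
\qquad
F'(c) = \int_{[0,1]} x \cos(cx)\, d\lambda(x).
\]
% Check at c = 0 (when 0 lies in [a,b]): F(t) = (1 - \cos t)/t for t \ne 0 and F(0) = 0,
% so F(t) \approx t/2 near 0, in agreement with
\[
F'(0) = \int_{[0,1]} x \, d\lambda(x) = \tfrac{1}{2}.
\]
```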


13 Almost Everywhere

Definition 13.1. Let (X,A, µ) be a measure space. If A ∈ A, µ(Ac) = 0, and P is a property that holds for every x ∈ A, then we say that P holds almost everywhere or that P holds for almost every x ∈ X.

We often abbreviate “almost everywhere” and “almost every” to “a.e.” The concept of a.e. depends on the measure µ. When clarity demands it, we write µ-a.e. instead of a.e.

For example, if f, g : X → C and there exists A ∈ A such that µ(Ac) = 0 and f(x) = g(x) for every x ∈ A, then we say that f = g a.e.

As another example, if fn : X → C (n = 1, 2, . . .) is a sequence of functions and there exists A ∈ A such that µ(Ac) = 0 and (fn(x)) converges for each x ∈ A, we say that fn converges a.e.

Remark: “almost everywhere” means “everywhere except on a set of measure zero”.

Remark: Almost everywhere convergence is not a topological convergence; there is no topology that can be placed on the set of measurable functions for which convergence in the topology is equivalent to convergence almost everywhere.
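How much "a.e." allows depends strongly on the measure. As a worked example (not from the original notes), let δ denote the Dirac measure at x0 from Example 5.3, defined on P(X); the symbol δ is introduced here only for the illustration.

```latex
% Every set A with \delta(A^c) = 0 must contain x_0, and A = \{x_0\} is such a set.
\[
\delta\big(\{x_0\}^{c}\big) = 0,
\qquad\text{so for } f, g : X \to \mathbb{C}: \qquad
f = g \ \ \delta\text{-a.e.} \iff f(x_0) = g(x_0).
\]
% By contrast, for the counting measure \mu on X the only set of measure zero is the empty set, so
\[
f = g \ \ \mu\text{-a.e.} \iff f(x) = g(x) \ \text{for every } x \in X .
\]
```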

Theorem 13.2. Let (X,A, µ) be a measure space. Let f : X → [0,∞] be a measurable function. Then ∫ f = 0 iff f = 0 a.e.

Proof. Case 1: f is simple. Let $f = \sum_{i=1}^{n} c_i 1_{E_i}$ be the standard representation of f. So ci ≥ 0 and $E_i = f^{-1}(c_i)$ for each 1 ≤ i ≤ n. Then $\int f = \sum_{i=1}^{n} c_i \mu(E_i) = 0$ iff for every 1 ≤ i ≤ n either ci = 0 or µ(Ei) = 0 iff $f = \sum_{i=1}^{n} c_i 1_{E_i} = 0$ a.e.

Case 2: f not simple. Assume f = 0 a.e. For every measurable simple function s with 0 ≤ s ≤ f, we have s = 0 a.e., hence ∫ s = 0 by Case 1. Therefore

\[ \int f = \sup\Big\{ \int s : s \text{ measurable simple},\ 0 \le s \le f \Big\} = \sup\{0\} = 0. \]

Conversely, assume ∫ f = 0. Note $\{x \in X : f(x) > 0\} = \bigcup_{n=1}^{\infty} E_n$, where $E_n = \{x \in X : f(x) \ge n^{-1}\}$. For every n ∈ N, we have $1_{E_n} \le nf$, and so

\[ \mu(E_n) = \int 1_{E_n} \le n \int f = 0. \]

Therefore

\[ \mu(\{x \in X : f(x) > 0\}) \le \sum_{n=1}^{\infty} \mu(E_n) = 0. \]

Thus f = 0 a.e.

Corollary 13.3. Let (X,A, µ) be a measure space. Let f, g : X → [0,∞] be measurable functions.

(a) If f ≤ g a.e., then ∫ f ≤ ∫ g.

(b) If f = g a.e., then ∫ f = ∫ g.

Proof. (a): Suppose f ≤ g a.e. Then there exists A ∈ A such that µ(Ac) = 0 and f(x) ≤ g(x) for all x ∈ A. So $f1_{A^c} = 0$ a.e. and $f1_A \le g$ everywhere. Therefore

\[ \int f = \int (f1_A + f1_{A^c}) = \int f1_A + \int f1_{A^c} = \int f1_A \le \int g. \]

(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e., then apply (a) twice.

Corollary 13.4. Let (X,A, µ) be a measure space. Let f, g : X → [−∞,∞] be measurable functions.

(a) If f ≤ g a.e. and both ∫ f and ∫ g are defined, then ∫ f ≤ ∫ g.

(b) If f = g a.e. and both ∫ f and ∫ g are defined, then ∫ f = ∫ g.

Proof. (a): Suppose f ≤ g a.e. Then f+ ≤ g+ a.e. and g− ≤ f− a.e. Therefore ∫ f+ ≤ ∫ g+ and ∫ g− ≤ ∫ f−. Thus

\[ \int f = \int f^{+} - \int f^{-} \le \int g^{+} - \int g^{-} = \int g. \]

(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e., then apply (a) twice.

Corollary 13.5. Let (X,A, µ) be a measure space. Let f, g : X → C be measurable functions. If f = g a.e. and both f and g are integrable, then ∫ f = ∫ g.

Proof. Suppose f = g a.e. Then Re f = Re g a.e. and Im f = Im g a.e. Therefore

\[ \int f = \int \mathrm{Re}\, f + i \int \mathrm{Im}\, f = \int \mathrm{Re}\, g + i \int \mathrm{Im}\, g = \int g. \]

Theorem 13.6. Let (X,A, µ) be a measure space. If f : X → [0,∞] is integrable, then µ({x ∈ X : f(x) = ∞}) = 0, hence f is finite a.e.

Proof. For each n ∈ N, define En = {x ∈ X : n ≤ f(x)}. Since {x ∈ X : f(x) = ∞} ⊆ En and $1_{E_n} \le n^{-1} f$, we have

\[ \mu(\{x \in X : f(x) = \infty\}) \le \mu(E_n) = \int 1_{E_n} \le n^{-1} \int f. \]

Since ∫ f is finite, letting n → ∞ gives µ({x ∈ X : f(x) = ∞}) = 0.

Corollary 13.7. Let (X,A, µ) be a measure space. If f : X → [−∞,∞] is an integrable function, then µ({x ∈ X : f(x) = ∞}) = µ({x ∈ X : f(x) = −∞}) = 0, hence f is finite a.e.

Theorem 13.8. (Almost Everywhere Dominated Convergence Theorem) Let (X,A, µ) be a measure space. Let fn : X → C (n = 1, 2, . . .) be a sequence of measurable functions. Let f : X → C be a measurable function. If fn → f a.e. and there exists an integrable function g : X → [0,∞] such that |fn| ≤ g a.e. for all n, then f is integrable, fn is integrable for each n,

\[ \lim_{n} \int |f - f_n| = 0 \quad\text{and}\quad \lim_{n} \int f_n = \int f. \]

Proof. Choose a set A ∈ A such that µ(Ac) = 0 and such that, for each x ∈ A, we have limn fn(x) = f(x) and |fn(x)| ≤ g(x) for all n. (To see that such a set A exists, argue as follows. Since fn → f a.e., there exists A0 ∈ A with $\mu(A_0^c) = 0$ such that limn fn(x) = f(x) for all x ∈ A0. For each n, we have |fn| ≤ g a.e., so there exists An ∈ A with $\mu(A_n^c) = 0$ such that |fn(x)| ≤ g(x) for all x ∈ An. Then $A = \bigcap_{n=0}^{\infty} A_n$ is the desired set.) Then, for each x ∈ A, |f(x)| = | limn fn(x)| ≤ g(x). Since $|f|1_{A^c} = 0$ a.e. and $|f|1_A \le g$ everywhere, we have

\[ \int |f| = \int \big( |f|1_A + |f|1_{A^c} \big) = \int |f|1_A + \int |f|1_{A^c} = \int |f|1_A \le \int g < \infty. \]

Thus f is integrable. Similar arguments show that $f1_A$, fn, and $f_n 1_A$ are integrable for each n. For each n, $|f - f_n| = |f1_A - f_n 1_A|$ a.e., and so $\int |f - f_n| = \int |f1_A - f_n 1_A|$. We also have $f_n 1_A \to f1_A$ pointwise and $|f_n 1_A| \le g$ for each n. So Theorem 13.2 and the dominated convergence theorem imply

\[ \lim_{n} \int |f - f_n| = \lim_{n} \int |f1_A - f_n 1_A| = 0. \]

Furthermore,

\[ \lim_{n} \Big| \int f - \int f_n \Big| = \lim_{n} \Big| \int (f - f_n) \Big| \le \lim_{n} \int |f - f_n| = 0, \]

which shows limn ∫ fn = ∫ f.

14 Complete Measures

If we compare Theorem 11.1 (dominated convergence theorem) to Theorem 13.8 (almost everywhere dominated convergence theorem), we notice that f being measurable is part of the conclusion of Theorem 11.1 but it is part of the hypothesis in Theorem 13.8. This is because a pointwise limit of measurable functions is measurable, but an a.e. limit of measurable functions may not be measurable. We explore this in this section.

Definition 14.1. Let (X,A, µ) be a measure space. We say that µ is complete (or that (X,A, µ) is complete) if the following property holds: If N ∈ A, µ(N) = 0, and A ⊆ N, then A ∈ A. In other words, µ is complete if all subsets of measure zero sets are measurable.

Theorem 14.2. Let (X,A, µ) be a measure space. The following are equivalent.

(a) µ is complete.

(b) For all f, g : X → C, if f is measurable and f = g a.e., then g is measurable.

(c) For all f, f1, f2, . . . : X → C, if fn is measurable for each n and fn → f a.e., then f is measurable.

Proof. (a) ⇒ (b): Assume f is measurable and f = g a.e. Then there exists a set A ∈ A such that µ(Ac) = 0 and f(x) = g(x) for all x ∈ A. Let B be a Borel subset of C. Write

\[ g^{-1}(B) = (g^{-1}(B) \cap A) \cup (g^{-1}(B) \cap A^{c}). \]

Note $g^{-1}(B) \cap A = f^{-1}(B) \cap A$ belongs to A. Note $g^{-1}(B) \cap A^{c} \subseteq A^{c}$, Ac ∈ A, and µ(Ac) = 0. Since µ is complete, $g^{-1}(B) \cap A^{c}$ belongs to A. Therefore $g^{-1}(B)$ belongs to A. This proves g is measurable.

(b) ⇒ (c): Assume fn is measurable for each n and fn → f a.e. Then there exists a set A ∈ A such that µ(Ac) = 0 and fn(x) → f(x) for all x ∈ A. Define $g_n = f_n 1_A$ and $g = f1_A$. Then gn is measurable for each n and gn → g pointwise. So g is measurable. Moreover, g = f a.e. Then (b) implies f is measurable.

(c) ⇒ (b): Assume f is measurable and f = g a.e. Define hn = f for all n. Then hn is measurable for all n and hn → g a.e. Then (c) implies g is measurable.

(b) ⇒ (a): We prove the contrapositive. Assume µ is not complete. So there exist sets A,N ⊆ X such that N ∈ A, µ(N) = 0, A ⊆ N, and A ∉ A. Define $f = 1_{N^c}$ and $g = 1_{A^c}$. Then f = g a.e., f is measurable, but g is not measurable.

Remark: The theorem still holds if we replace C by [0,∞], [−∞,∞], or R. The proof is easy to modify.

Remark: In the almost everywhere dominated convergence theorem, if the measure µ is complete, then f being measurable can be part of the conclusion, rather than part of the hypothesis.

The next theorem says that the Caratheodory restriction theorem produces a complete measure.

Theorem 14.3. Let µ∗ be an outer measure on a set X. Let A∗ be the σ-algebra of µ∗-measurable sets and let µ be the measure which is the restriction of µ∗ to A∗. Then µ is complete.

Proof. Assume N ∈ A∗, µ(N) = 0, and A ⊆ N. Then µ∗(A) ≤ µ∗(N) = µ(N) = 0, hence µ∗(A) = 0. Therefore, for any E ∈ P(X),

\[ \mu^{*}(E \cap A) + \mu^{*}(E \cap A^{c}) \le \mu^{*}(A) + \mu^{*}(E) = \mu^{*}(E). \]

So A ∈ A∗.

Corollary 14.4. Lebesgue measure is complete.

15 Completion of Measures (Optional)

Definition 15.1. Let $(X, \mathcal{A}, \mu)$ be a measure space. The completion of $\mathcal{A}$ with respect to µ is the collection $\overline{\mathcal{A}}$ consisting of all sets of the form A ∪ B, where $A \in \mathcal{A}$, B ∈ P(X), and there exists $N \in \mathcal{A}$ such that B ⊆ N and µ(N) = 0.

Theorem 15.2. Let $(X, \mathcal{A}, \mu)$ be a measure space.

(a) $\overline{\mathcal{A}}$ is the smallest σ-algebra containing $\mathcal{A}$ and all sets E such that E ⊆ N for some $N \in \mathcal{A}$ with µ(N) = 0.

(b) $\overline{\mathcal{A}}$ is the smallest σ-algebra on which some extension of µ is complete.

Theorem 15.3. Let $(X, \mathcal{A}, \mu)$ be a measure space. There is a unique extension of µ to $\overline{\mathcal{A}}$. This extension is called the completion of µ and is denoted by $\overline{\mu}$. The measure space $(X, \overline{\mathcal{A}}, \overline{\mu})$ is called the completion of $(X, \mathcal{A}, \mu)$.

Theorem 15.4. The Lebesgue σ-algebra on R, L(R), is the completion of the Borel σ-algebra on R, B(R).

Theorem 15.5. The Lebesgue measure on (R,L(R)) is the completion of the Lebesgue measure on (R,B(R)).

Theorem 15.6. Let µ be a measure defined on a σ-algebra $\mathcal{A}$. If µ is σ-finite, then the measure obtained by doing a Caratheodory extension of µ is the completion of µ.

Remark: Exploring what happens in the non-σ-finite case is an exercise.

Theorem 15.7. Let $(X, \mathcal{A}, \mu)$ be a measure space and let $(X, \overline{\mathcal{A}}, \overline{\mu})$ be its completion.

(a) If f : X → R is $\mathcal{A}$-measurable, then f is $\overline{\mathcal{A}}$-measurable.

(b) If f : X → R is $\overline{\mathcal{A}}$-measurable, then there is an $\mathcal{A}$-measurable function g : X → R such that f = g $\overline{\mu}$-a.e.

Proof. (a): Use $\mathcal{A} \subseteq \overline{\mathcal{A}}$ and the definition of measurability.

(b): Prove it first for f simple. For general f, consider a sequence of simple functions converging pointwise to f.

Remark: The theorem still holds if we replace C by [0,∞], [−∞,∞], or R. The proof is easy to modify.

16 The Lebesgue Integral Extends the Riemann Integral

In this section, X is the interval [a, b], the σ-algebra is the collection of Lebesgue measurable subsets of [a, b], and the measure is the Lebesgue measure on R restricted to this σ-algebra. It is easy to check that this defines a complete measure space.

Theorem 16.1. Suppose f : [a, b] → R is bounded on [a, b]. If f is Riemann integrable on [a, b], then f is Lebesgue measurable, f is Lebesgue integrable, and the Riemann and Lebesgue integrals of f on [a, b] are equal.

Proof 1. We use the notation $\underline{\int} f$ for the lower Riemann integral of f on [a, b], $\overline{\int} f$ for the upper Riemann integral of f on [a, b], and $\int f$ for the Lebesgue integral on [a, b]. Assume f is Riemann integrable on [a, b]. Using Riemann’s condition, choose partitions P1 ⊆ . . . ⊆ Pn ⊆ Pn+1 ⊆ . . . such that

\[ U(f, P_n, [a,b]) - L(f, P_n, [a,b]) < \frac{1}{n}. \tag{16.1} \]

For each n, write Pn = {x0, . . . , xk}, and define

\[ s_n = \sum_{i=1}^{k} \Big( \inf_{[x_{i-1},x_i]} f \Big) 1_{[x_{i-1},x_i]}, \qquad t_n = \sum_{i=1}^{k} \Big( \sup_{[x_{i-1},x_i]} f \Big) 1_{[x_{i-1},x_i]} \]

so that

\[ \int s_n = L(f, P_n, [a,b]), \qquad \int t_n = U(f, P_n, [a,b]). \]

Note

\[ s_1 \le \ldots \le s_n \le s_{n+1} \le \ldots \le f \le \ldots \le t_{n+1} \le t_n \le \ldots \le t_1. \]

Then

\[ s_1 \le s \le f \le t \le t_1, \]

where we have defined s = limn sn and t = limn tn. It follows that s and t are finite everywhere and Lebesgue integrable. Since 0 ≤ sn − s1 ↑ s − s1, the monotone convergence theorem implies

\[ \lim_{n} \int (s_n - s_1) = \int (s - s_1), \]

and adding ∫ s1 gives

\[ \lim_{n} \int s_n = \int s. \]

A similar argument gives

\[ \lim_{n} \int t_n = \int t. \]

Combining the above with (16.1) gives

\[ \int (t - s) = \lim_{n} \int (t_n - s_n) = \lim_{n} \big( U(f, P_n, [a,b]) - L(f, P_n, [a,b]) \big) = 0. \]

Since t − s ≥ 0, it follows that t − s = 0 a.e., hence s = t a.e. Since sn ≤ f ≤ tn for all n, we have s ≤ f ≤ t, and so s = f = t a.e. Since s and t are measurable and the Lebesgue measure is complete, f is measurable. Moreover,

\[ \int s = \int f = \int t. \]

For each n,

\[ \overline{\int} f \le \int t_n \le \int s_n + \frac{1}{n} \le \underline{\int} f + \frac{1}{n}. \]

Taking n → ∞ gives

\[ \overline{\int} f \le \int t = \int s \le \underline{\int} f, \]

hence

\[ \underline{\int} f = \overline{\int} f = \int t = \int s = \int f. \]

So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.
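The step functions sn and tn in Proof 1 have integrals equal to the lower and upper Riemann sums, and these are easy to compute for a concrete function. The sketch below is an illustration under stated assumptions: f(x) = x on [0, 1], and Pn is the dyadic partition with 2^n subintervals (these partitions are nested, as the proof requires). Because f is increasing, the infimum and supremum on each subinterval are attained at its left and right endpoints. The helper name `lower_upper_sums` is hypothetical.

```python
from fractions import Fraction


def lower_upper_sums(f, n):
    """L(f, P_n) and U(f, P_n) for an increasing f on [0, 1], P_n dyadic with 2**n pieces."""
    N = 2 ** n
    points = [Fraction(i, N) for i in range(N + 1)]
    # inf on [x_{i-1}, x_i] is f(x_{i-1}); sup is f(x_i), because f is increasing
    lower = sum(f(points[i - 1]) * (points[i] - points[i - 1]) for i in range(1, N + 1))
    upper = sum(f(points[i]) * (points[i] - points[i - 1]) for i in range(1, N + 1))
    return lower, upper


def f(x):
    return x


# Gap U - L for n = 1, ..., 7; here it equals 2^(-n), which is below 1/n as in (16.1).
gaps = [lower_upper_sums(f, n)[1] - lower_upper_sums(f, n)[0] for n in range(1, 8)]
```

For this f the lower and upper sums squeeze the common value 1/2 of the Riemann and Lebesgue integrals from below and above.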


Here is a slightly different proof.

Proof 2. We use the notation $\underline{\int} f$ for the lower Riemann integral of f on [a, b], $\overline{\int} f$ for the upper Riemann integral of f on [a, b], and $\int f$ for the Lebesgue integral on [a, b]. Choose a sequence (P′n) of partitions of [a, b] such that

\[ \lim_{n} L(f, P'_n, [a,b]) = \underline{\int} f. \]

Choose a sequence (P′′n) of partitions of [a, b] such that

\[ \lim_{n} U(f, P''_n, [a,b]) = \overline{\int} f. \]

Define $P_n = \bigcup_{i=1}^{n} (P'_i \cup P''_i)$ for each n. Then P1 ⊆ P2 ⊆ . . .,

\[ \lim_{n} L(f, P_n, [a,b]) = \underline{\int} f, \qquad \lim_{n} U(f, P_n, [a,b]) = \overline{\int} f. \]

For each n, write Pn = {x0, . . . , xk}, and define

\[ s_n = \sum_{i=1}^{k} \Big( \inf_{[x_{i-1},x_i]} f \Big) 1_{[x_{i-1},x_i]}, \qquad t_n = \sum_{i=1}^{k} \Big( \sup_{[x_{i-1},x_i]} f \Big) 1_{[x_{i-1},x_i]} \]

so that

\[ \int s_n = L(f, P_n, [a,b]), \qquad \int t_n = U(f, P_n, [a,b]) \]

and

\[ \lim_{n} \int s_n = \underline{\int} f, \qquad \lim_{n} \int t_n = \overline{\int} f. \]

Note

\[ s_1 \le \ldots \le s_n \le s_{n+1} \le \ldots \le f \le \ldots \le t_{n+1} \le t_n \le \ldots \le t_1. \]

Then

\[ s_1 \le s \le f \le t \le t_1, \]

where we have defined s = limn sn and t = limn tn. It follows that s and t are finite everywhere and Lebesgue integrable. Since 0 ≤ sn − s1 ↑ s − s1, the monotone convergence theorem implies

\[ \lim_{n} \int (s_n - s_1) = \int (s - s_1), \]

and adding ∫ s1 gives

\[ \int s = \lim_{n} \int s_n = \underline{\int} f. \]

A similar argument gives

\[ \int t = \lim_{n} \int t_n = \overline{\int} f. \]

Since f is Riemann integrable on [a, b], we have

\[ \int (t - s) = \overline{\int} f - \underline{\int} f = 0. \]

Since t − s ≥ 0, it follows that t − s = 0 a.e., hence s = t a.e. Since sn ≤ f ≤ tn for all n, we have s ≤ f ≤ t. Therefore s = f = t a.e. Since s and t are measurable and the Lebesgue measure is complete, f is measurable. Moreover,

\[ \int f = \int s = \int t = \underline{\int} f = \overline{\int} f. \]

So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.

17 Lebesgue’s Condition for Riemann Integrability

In this section, X is the interval [a, b], the σ-algebra is the collection of Lebesgue measurable subsets of [a, b], and the measure is the Lebesgue measure on R restricted to this σ-algebra. It is easy to check that this defines a complete measure space.

Theorem 17.1. (Lebesgue’s Condition for Riemann Integrability) Let f : [a, b] → R be a bounded function. Then f is Riemann integrable on [a, b] iff f is continuous at almost every point of [a, b].

The idea of the proof is as follows. According to Riemann’s condition, a bounded function f : [a, b] → R will be Riemann integrable if and only if the sum

\[ \sum_{i=1}^{n} \Big( \sup_{[x_{i-1}, x_i]} f - \inf_{[x_{i-1}, x_i]} f \Big) \Delta x_i \]

can be made arbitrarily small by choosing an appropriate partition of [a, b]. Split this sum into two parts, say S1 + S2, where S2 contains terms from subintervals where continuity makes the difference $\sup_{[x_{i-1}, x_i]} f - \inf_{[x_{i-1}, x_i]} f$ small and S1 contains the remaining terms where discontinuities prevent the difference from being small. In S2, each term is small, so a large number of terms can occur and still keep S2 small. In S1, the terms may not be small, but they are bounded in size (because f is bounded), so that S1 will be small if the sum of the lengths of the subintervals is small. Hence we may expect that the set of discontinuities of a Riemann integrable function can be covered by intervals whose total length is small.

To make the argument above precise, we need some preparation.

Definition 17.2. Let f : [a, b]→ R be a bounded function. For any interval I ⊆ [a, b], the number

\[ \Omega_f(I) = \sup_{x \in I} f(x) - \inf_{x \in I} f(x) = \sup \{ |f(x) - f(y)| : x, y \in I \} \]

is called the oscillation of f on I.
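The oscillation sum that appears in Riemann’s condition can be explored numerically. The following is a minimal sketch, not part of the original notes: the helper names `oscillation_sum` and `uniform_partition` and the test function `step` are hypothetical, and the supremum and infimum on each subinterval are only approximated by sampling finitely many points, so the result is an estimate of the true sum of Ωf([xi−1, xi])∆xi.

```python
def uniform_partition(a, b, n):
    """Return the partition {x_0, ..., x_n} of [a, b] into n equal pieces."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def oscillation_sum(f, partition, samples=200):
    """Estimate sum_i Omega_f([x_{i-1}, x_i]) * dx_i by sampling each subinterval.

    Sampling only approximates sup and inf, so this is a lower estimate of
    U(f, P) - L(f, P) in general.
    """
    total = 0.0
    for left, right in zip(partition, partition[1:]):
        xs = [left + (right - left) * j / samples for j in range(samples + 1)]
        values = [f(x) for x in xs]
        total += (max(values) - min(values)) * (right - left)
    return total


def step(x):
    """Bounded function with a single discontinuity at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0


# The single jump contributes oscillation 1 on only a few subintervals,
# so the sum shrinks as the mesh of the partition shrinks.
coarse = oscillation_sum(step, uniform_partition(0.0, 1.0, 10))
fine = oscillation_sum(step, uniform_partition(0.0, 1.0, 1000))
```

Here the only subintervals with non-zero oscillation are those touching the jump, which mirrors the role of the sum S1 in the proof idea above.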

Lemma 17.3. Let f : [a, b]→ R be a bounded function.

(a) If I and J are intervals such that I ⊆ J ⊆ [a, b], then Ωf (I) ≤ Ωf (J).

(b) For every x ∈ [a, b], f is continuous at x iff for every ε > 0 there exists a δ > 0 such that Ωf([a, b] ∩ (x − δ, x + δ)) < ε.

(c) $D = \bigcup_{k=1}^{\infty} D_k$, where D is the set of points x ∈ [a, b] at which f is discontinuous, and

\[ D_k = \{ x \in [a, b] : \Omega_f([a, b] \cap (x - \delta, x + \delta)) \ge 1/k \text{ for all } \delta > 0 \}. \]

(d) For each k ∈ N, Dk is closed.

(e) If E is a compact subset of [a, b] \ Dk, then there is a δ > 0 such that Ωf(J) < 1/k for every interval J ⊆ E with ℓ(J) < δ.


Proof. The proofs of (a)-(c) are easy exercises.

(d): To see that Dk is closed, let (xn) be any sequence of points in Dk that converges to a point x0 ∈ R. Since [a, b] is closed and xn ∈ [a, b] for all n, we have x0 ∈ [a, b]. Let δ > 0 be given. Choose n large enough that xn ∈ (x0 − δ, x0 + δ). Choose δ′ > 0 small enough that (xn − δ′, xn + δ′) ⊆ (x0 − δ, x0 + δ). Since xn ∈ Dk, we have

\[ \Omega_f([a, b] \cap (x_n - \delta', x_n + \delta')) \ge \frac{1}{k}. \]

Since [a, b] ∩ (xn − δ′, xn + δ′) ⊆ [a, b] ∩ (x0 − δ, x0 + δ), part (a) gives

\[ \Omega_f([a, b] \cap (x_0 - \delta, x_0 + \delta)) \ge \Omega_f([a, b] \cap (x_n - \delta', x_n + \delta')) \ge \frac{1}{k}. \]

Since δ > 0 was arbitrary, x0 ∈ Dk. Hence Dk is closed.

(e): For each x ∈ E, we have x ∈ [a, b] \ Dk, so there exists a δ(x) > 0 such that

\[ \Omega_f([a, b] \cap (x - \delta(x), x + \delta(x))) < \frac{1}{k}. \]

Then $E \subseteq \bigcup_{x \in E} (x - \delta(x)/2, x + \delta(x)/2)$. Since E is compact, there exist finitely many points x1, . . . , xm ∈ E such that $E \subseteq \bigcup_{i=1}^{m} (x_i - \delta(x_i)/2, x_i + \delta(x_i)/2)$. Define δ = (1/2) min{δ(x1), . . . , δ(xm)}.

Let J be any nonempty interval contained in E with ℓ(J) < δ. Then J intersects (xi − δ(xi)/2, xi + δ(xi)/2) for some i. Since ℓ(J) < δ ≤ δ(xi)/2, we have J ⊆ (xi − δ(xi), xi + δ(xi)), and so

\[ \Omega_f(J) = \Omega_f([a, b] \cap J) \le \Omega_f([a, b] \cap (x_i - \delta(x_i), x_i + \delta(x_i))) < \frac{1}{k}. \]

Proof of Theorem 17.1. We will use the same notation as the lemma.

We first assume λ(D) = 0 and show that Riemann’s condition is satisfied. Let k ∈ N be arbitrary. Since Dk ⊆ D, we have λ(Dk) = 0. Choose finite open intervals I1, I2, . . . which cover Dk and satisfy $\sum_{i=1}^{\infty} \ell(I_i) < \frac{1}{k}$. Since Dk is closed and bounded, Dk is compact. So Dk is covered by finitely many of these intervals, say (after relabeling) I1, . . . , IN. Define $E = [a, b] \setminus \bigcup_{i=1}^{N} I_i$. So E is compact and E ⊆ [a, b] \ Dk. Thus, by Lemma 17.3(e), there is a δ > 0 such that Ωf(J) < 1/k for every interval J ⊆ E with ℓ(J) < δ. Note E is the union of finitely many closed subintervals of [a, b]. Further divide E into a finite number of closed subintervals of length < δ. The endpoints of these subintervals form a partition P = {x0, . . . , xn} of [a, b]. Then

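\[ U(f, P, [a, b]) - L(f, P, [a, b]) = \sum_{i \in I_1} \Omega_f([x_{i-1}, x_i]) \Delta x_i + \sum_{i \in I_2} \Omega_f([x_{i-1}, x_i]) \Delta x_i, \]

where I1 is the set of those i ∈ {1, . . . , n} such that [xi−1, xi] contains points of Dk and I2 is the set of remaining i ∈ {1, . . . , n}. (Under the summation signs, I1 and I2 denote these index sets, not the intervals chosen above.) For each i ∈ I2, [xi−1, xi] is an interval contained in E with length < δ, and so Ωf([xi−1, xi]) < 1/k. Thus

\[ \sum_{i \in I_2} \Omega_f([x_{i-1}, x_i]) \Delta x_i \le \frac{1}{k} \sum_{i \in I_2} \Delta x_i \le \frac{b - a}{k}. \]

Since $\bigcup_{i \in I_1} [x_{i-1}, x_i] \subseteq \bigcup_{i=1}^{N} I_i$ (up to finitely many endpoints), we have

\[ \sum_{i \in I_1} \Delta x_i \le \sum_{i=1}^{N} \ell(I_i) < \frac{1}{k}, \]

and so, since Ωf([xi−1, xi]) ≤ Ωf([a, b]) for every i,

\[ \sum_{i \in I_1} \Omega_f([x_{i-1}, x_i]) \Delta x_i \le \frac{\Omega_f([a, b])}{k}. \]

Therefore

\[ U(f, P, [a, b]) - L(f, P, [a, b]) \le \frac{\Omega_f([a, b]) + b - a}{k}. \]

Since such a partition exists for every k ∈ N, Riemann’s condition holds, and so f is Riemann integrable on [a, b].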

For the converse, we assume λ(D) > 0 and show that Riemann’s condition is not satisfied. Since $D = \bigcup_{k=1}^{\infty} D_k$, we must have λ(Dk) > 0 for some k ∈ N. Set ε = λ(Dk)/k. For any partition P = {x0, . . . , xn} of [a, b], we have

\[ U(f, P, [a, b]) - L(f, P, [a, b]) = \sum_{i=1}^{n} \Omega_f([x_{i-1}, x_i]) \Delta x_i \ge \sum_{i \in I} \Omega_f([x_{i-1}, x_i]) \Delta x_i, \]

where I is the set of those i ∈ {1, . . . , n} such that (xi−1, xi) contains points of Dk. Then $D_k \setminus P \subseteq \bigcup_{i \in I} (x_{i-1}, x_i)$, and so

\[ \sum_{i \in I} \Delta x_i = \sum_{i \in I} \lambda((x_{i-1}, x_i)) \ge \lambda(D_k \setminus P) = \lambda(D_k) = k\varepsilon. \]

For each i ∈ I, there is a point x ∈ (xi−1, xi) ∩ Dk. Choose δ > 0 such that (x − δ, x + δ) ⊆ [xi−1, xi]. Then, since x ∈ Dk, we have

\[ \Omega_f([x_{i-1}, x_i]) \ge \Omega_f([a, b] \cap (x - \delta, x + \delta)) \ge \frac{1}{k}. \]

Combining the above inequalities, we get

\[ U(f, P, [a, b]) - L(f, P, [a, b]) \ge \frac{1}{k} \sum_{i \in I} \Delta x_i \ge \varepsilon. \]

Thus Riemann’s condition is not satisfied, and so f is not Riemann integrable on [a, b].


18 Normed Spaces and Banach Spaces

In this section, K denotes either R or C, and V is a vector space with scalar field K. The proofs of the theorems are left as exercises; they are simple extensions of the proofs of the analogous theorems in R.

Definition 18.1. A norm on V is a function ‖ · ‖ : V → [0,∞) such that the following conditions are satisfied for all x, y ∈ V and c ∈ K:

(i) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)

(ii) ‖cx‖ = |c|‖x‖

(iii) ‖x‖ = 0 if and only if x = 0

The pair (V, ‖ · ‖) is called a normed space. When the norm is clear from context, we write V instead of (V, ‖ · ‖).

Example 18.2. For 1 ≤ p < ∞, define

\[ \|x\|_p = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p} \]

for each x = (x1, . . . , xn) ∈ K^n. For p = ∞, define

\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i| \]

for each x = (x1, . . . , xn) ∈ K^n. For each 1 ≤ p ≤ ∞, ‖ · ‖p is a norm on K^n called the p-norm. When p = 2, ‖ · ‖p is also called the Euclidean norm.
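The p-norms of Example 18.2 are easy to compute directly. The following is a minimal sketch, not taken from the notes; the function name `p_norm` is hypothetical, and `math.inf` is used to stand for p = ∞.

```python
import math


def p_norm(x, p):
    """Return the p-norm of a finite sequence x of real or complex numbers.

    p is a real number with p >= 1, or math.inf for the max norm.
    """
    if p == math.inf:
        return max(abs(c) for c in x)
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


# Spot-check the triangle inequality ||x + y||_p <= ||x||_p + ||y||_p.
x = [3.0, -4.0, 1.0]
y = [-1.0, 2.0, 5.0]
for p in (1, 2, 3, math.inf):
    total = [a + b for a, b in zip(x, y)]
    assert p_norm(total, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
```

A numerical spot check like this is of course not a proof; the triangle inequality for general p is Minkowski’s inequality.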

Definition 18.3. Let (V, ‖ · ‖) be a normed space. A sequence (xn) in V is called convergent if there exists a point x ∈ V with the following property: For every ε > 0 there exists a positive integer N such that for every integer n ≥ N we have ‖xn − x‖ < ε. In such case, x is called the limit of (xn), we say that (xn) converges to x, and we write limn xn = x or xn → x. A sequence which is not convergent is called divergent.

Theorem 18.4. Let (V, ‖ · ‖) be a normed space. If (xn) is a convergent sequence in V, then (xn) has a unique limit. In other words, if (xn) is a convergent sequence in V such that xn → x ∈ V and xn → y ∈ V, then x = y. (This justifies saying “the limit” rather than “a limit” in the definition.)

Theorem 18.5. Let (V, ‖ · ‖) be a normed space. Let x ∈ V and let (xn) be a sequence in V. Then xn → x iff ‖xn − x‖ → 0.

Theorem 18.6. (Reverse Triangle Inequality) Let (V, ‖ ·‖) be a normed space. For every x, y ∈ V ,we have |‖x‖ − ‖y‖| ≤ ‖x− y‖.

Theorem 18.7. Let (V, ‖ · ‖) be a normed space. If (xn) and (yn) are convergent sequences in V and c ∈ K, then

(a) limn (xn + yn) = limn xn + limn yn

(b) limn cxn = c limn xn

(c) limn ‖xn‖ = ‖limn xn‖

Definition 18.8. Let (V, ‖ · ‖) be a normed space. A sequence (xn) in V is called Cauchy if for every ε > 0, there exists a positive integer N such that for all integers m, n ≥ N we have ‖xm − xn‖ < ε.

Theorem 18.9. Every convergent sequence in a normed space is Cauchy.

Definition 18.10. Let (V, ‖ · ‖) be a normed space. A set E ⊆ V is called complete if every Cauchy sequence in E converges and its limit is in E. If V is complete, V is called a Banach space.

Example 18.11. Cn and Rn are complete, but Qn is not.

The following theorem will be useful.

Theorem 18.12. Let (V, ‖ · ‖) be a normed space. If (xn) is a Cauchy sequence in V and (xn) has a subsequence which converges to x ∈ V, then (xn) converges to x.

19 Lp Spaces: Definitions and Basic Properties

In this section, K denotes either R or C.

Definition 19.1. Let (X,A, µ) be a measure space. Let f : X → K be a measurable function. For each 1 ≤ p < ∞, define the p-norm of f to be

\[ \|f\|_p = \Big( \int |f(x)|^p \, d\mu(x) \Big)^{1/p} \]

with the convention that (∞)^{1/p} = ∞. Define the ∞-norm of f to be

\[ \|f\|_\infty = \inf \{ M \in [0, \infty] : |f(x)| \le M \text{ for almost every } x \in X \}. \]

For 1 ≤ p ≤ ∞, define Lp = Lp(µ) = Lp(X,A, µ) to be the set of all measurable functions f : X → K such that ‖f‖p < ∞, where we adopt the convention that functions which are equal almost everywhere are regarded as equal elements of Lp. Thus, to be accurate, Lp is the space of equivalence classes of measurable functions f : X → K such that ‖f‖p < ∞, where the equivalence relation is “equal almost everywhere.” We will show below that ‖ · ‖p is a norm on Lp. Unless another norm is given, we always assume Lp is equipped with the p-norm.

Remark. Let (X,A, µ) be a measure space. Let f : X → K. The ∞-norm of f equals the essential supremum of |f|:

\[ \|f\|_\infty = \operatorname{ess\,sup} |f| = \inf \{ M \in [0, \infty] : |f(x)| \le M \text{ for almost every } x \in X \}. \]

It should be compared to the uniform norm of f, which equals the supremum of |f|:

\[ \|f\|_u = \sup |f| = \sup \{ |f(x)| : x \in X \} = \inf \{ M \in [0, \infty] : |f(x)| \le M \text{ for every } x \in X \}. \]

It is easy to check (exercise) that

\[ |f| \le \|f\|_u, \qquad \|f\|_\infty \le \|f\|_u, \qquad |f| \le \|f\|_\infty \text{ a.e.} \]

If (X,A, µ) = (R,L, λ), the ∞-norm and the uniform norm are equal for continuous functions f : R → K, but they are not equal in general. For example, the function $f(x) = x \, 1_{\mathbb{Q} \cap [0,1]}(x)$ has ‖f‖u = 1 and ‖f‖∞ = 0.

Theorem 19.2. Let (X,A, µ) be a measure space. Let 1 ≤ p ≤ ∞. For every measurable function f : X → K, we have ‖f‖p = 0 iff f = 0 a.e. iff f = 0 in Lp.

Proof. The second equivalence is simply the convention that functions which are equal a.e. are equal as elements of Lp. Now we prove the first equivalence. For 1 ≤ p < ∞, ‖f‖p = 0 iff ∫ |f|^p = 0 iff |f| = 0 a.e. iff f = 0 a.e. Assume p = ∞. If f = 0 a.e., then |f| ≤ 0 a.e., so ‖f‖∞ ≤ 0, hence ‖f‖∞ = 0. Conversely, if ‖f‖∞ = 0, then |f| ≤ ‖f‖∞ = 0 a.e., then |f| = 0 a.e., then f = 0 a.e.

Theorem 19.3. Let (X,A, µ) be a measure space. Let 1 ≤ p ≤ ∞. Let f : X → K be a measurable function and let c ∈ K. Then ‖cf‖p = |c|‖f‖p. Consequently, if f ∈ Lp, then cf ∈ Lp.

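Proof. If 1 ≤ p < ∞, then

\[ \|cf\|_p^p = \int |cf|^p = |c|^p \int |f|^p = |c|^p \|f\|_p^p. \]

If p = ∞ and c ≠ 0 (the case c = 0 is trivial), then

\begin{align*}
\|cf\|_\infty &= \inf \{ M \in [0, \infty] : |cf(x)| \le M \text{ for almost every } x \in X \} \\
&= \inf \{ M \in [0, \infty] : |f(x)| \le M/|c| \text{ for almost every } x \in X \} \\
&= \inf \{ |c| M' : M' \in [0, \infty], \ |f(x)| \le M' \text{ for almost every } x \in X \} \\
&= |c| \inf \{ M' : M' \in [0, \infty], \ |f(x)| \le M' \text{ for almost every } x \in X \} \\
&= |c| \|f\|_\infty.
\end{align*}

The third equality is a simple exercise in showing that two sets are equal.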

The next theorem will lead to the triangle inequality for Lp spaces. But it is also extremelyimportant in its own right.

Theorem 19.4. (Holder’s Inequality) Let (X,A, µ) be a measure space. Let 1 ≤ p, q ≤ ∞ be such that p^{−1} + q^{−1} = 1, with the convention that ∞^{−1} = 0. (We say p and q are conjugate exponents; we say q is the conjugate exponent of p, and vice versa.) If f, g : X → K are measurable, then

‖fg‖1 ≤ ‖f‖p‖g‖q.

Consequently, if f ∈ Lp and g ∈ Lq, then fg ∈ L1.

The proof of Holder’s inequality will use the following lemma.

Lemma 19.5. If a ≥ 0, b ≥ 0, p > 0, q > 0, and p^{−1} + q^{−1} = 1, then

\[ ab \le p^{-1} a^p + q^{-1} b^q. \]

Proof. For fixed b, p, q, we consider h(a) = p^{−1}a^p + q^{−1}b^q − ab, a ∈ [0,∞). Then h′(a) = a^{p−1} − b. So h′(a) > 0 if a > b^{1/(p−1)} and h′(a) < 0 if a < b^{1/(p−1)}. Thus the minimum value of h(a) is

\[ h(b^{1/(p-1)}) = p^{-1} (b^{1/(p-1)})^p + q^{-1} b^q - (b^{1/(p-1)}) b = (p^{-1} + q^{-1}) b^q - b^q = 0, \]

where for the second equality we used that q = p/(p − 1). Therefore 0 ≤ h(a) = p^{−1}a^p + q^{−1}b^q − ab, hence ab ≤ p^{−1}a^p + q^{−1}b^q. Note that equality happens iff a = b^{1/(p−1)}, which is equivalent to a^p = b^q (because q = p/(p − 1)).
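The inequality of Lemma 19.5 and its equality case can be spot-checked numerically. This is a minimal sketch under the assumption p > 1 (so that q = p/(p − 1) is finite); the function name `young_gap` is hypothetical, and it returns the value of h(a) from the proof.

```python
def young_gap(a, b, p):
    """Return h(a) = a^p / p + b^q / q - a*b, where q = p / (p - 1) and p > 1."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b


# The gap should be non-negative on a grid of sample points (up to rounding).
grid = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
for p in (1.5, 2.0, 3.0):
    for a in grid:
        for b in grid:
            assert young_gap(a, b, p) >= -1e-12
```

For example, with p = 3 (so q = 3/2), a = 2 and b = a^{p−1} = 4 we have a^p = b^q = 8, and the gap vanishes, matching the equality condition in the lemma.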

Proof of Holder’s Inequality. If 1 < p, q < ∞ and 0 < ‖f‖p, ‖g‖q < ∞, set a = |f(x)|/‖f‖p, b = |g(x)|/‖g‖q, apply the lemma, then integrate. The result is

\[ \frac{1}{\|f\|_p \|g\|_q} \int |fg| \le \frac{1}{p \|f\|_p^p} \int |f(x)|^p + \frac{1}{q \|g\|_q^q} \int |g(x)|^q = \frac{1}{p} + \frac{1}{q} = 1. \]

The other cases are easy. If ‖f‖p = 0, then f = 0 a.e., then fg = 0 a.e., then ‖fg‖1 = 0, so the inequality is trivial; likewise if ‖g‖q = 0. If ‖f‖p = ∞ and ‖g‖q > 0, the inequality is clear; likewise if ‖f‖p > 0 and ‖g‖q = ∞. If p = ∞, then |f| ≤ ‖f‖∞ a.e., hence |fg| ≤ ‖f‖∞|g| a.e., and integrating gives ‖fg‖1 ≤ ‖f‖∞‖g‖1; likewise if q = ∞.
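For a concrete feel for Holder’s inequality, one can take X to be a finite set with counting measure, so that integrals become finite sums. The sketch below is an illustration only; the names `lp_norm` and `holder_sides` are hypothetical, and the random test vectors are arbitrary.

```python
import random


def lp_norm(values, p):
    """p-norm of a finite sequence with respect to counting measure."""
    if p == float("inf"):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def holder_sides(f, g, p):
    """Return (||fg||_1, ||f||_p * ||g||_q), where q is conjugate to p > 1."""
    q = p / (p - 1.0)
    left = sum(abs(a * b) for a, b in zip(f, g))
    right = lp_norm(f, p) * lp_norm(g, q)
    return left, right


rng = random.Random(0)  # fixed seed so the check is reproducible
f = [rng.uniform(-1.0, 1.0) for _ in range(50)]
g = [rng.uniform(-1.0, 1.0) for _ in range(50)]
for p in (1.5, 2.0, 3.0, 10.0):
    left, right = holder_sides(f, g, p)
    assert left <= right + 1e-12
```

With p = q = 2 this is the Cauchy–Schwarz inequality, and equality holds when |f| and |g| are proportional, for instance f = g = (1, 1).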


The next theorem is the triangle inequality for Lp spaces.

Theorem 19.6. (Minkowski’s Inequality) Let (X,A, µ) be a measure space. Let 1 ≤ p ≤ ∞. Let f, g : X → K be measurable functions. Then

\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \]

Consequently, if f, g ∈ Lp, then f + g ∈ Lp.

Proof. If either ‖f‖p = ∞ or ‖g‖p = ∞, the inequality is obvious. So we assume f, g ∈ Lp. If p = 1 or p = ∞, the proof is easy and is left as an exercise. Assume 1 < p < ∞. Note

\[ |f + g|^p = |f + g| \, |f + g|^{p-1} \le |f| \, |f + g|^{p-1} + |g| \, |f + g|^{p-1}. \]

Integrating and applying Holder’s inequality to the two terms on the right gives

\[ \int |f + g|^p \le \|f\|_p \, \big\| |f + g|^{p-1} \big\|_q + \|g\|_p \, \big\| |f + g|^{p-1} \big\|_q = (\|f\|_p + \|g\|_p) \Big( \int |f + g|^{(p-1)q} \Big)^{1/q} \]

with p^{−1} + q^{−1} = 1. Noting (p − 1)q = p and 1/q = 1 − 1/p gives

\[ \int |f + g|^p \le (\|f\|_p + \|g\|_p) \Big( \int |f + g|^p \Big)^{1 - 1/p}, \]

hence

\[ \|f + g\|_p^p \le (\|f\|_p + \|g\|_p) \, \|f + g\|_p^{p-1}. \]

If ‖f + g‖p > 0, dividing by $\|f + g\|_p^{p-1}$ gives the result. If ‖f + g‖p = 0, the result is trivial.

Combining the theorems above gives

Theorem 19.7. Let (X,A, µ) be a measure space. For each 1 ≤ p ≤ ∞, Lp is a vector space and ‖ · ‖p is a norm on Lp.

20 Lp Spaces: Completeness

Theorem 20.1. Let (X,A, µ) be a measure space. For every 1 ≤ p ≤ ∞, Lp = Lp(X,A, µ) is complete.

Proof. Case: p = ∞. Exercise.

Case: 1 ≤ p < ∞. Let (fn) be a Cauchy sequence in Lp. Inductively choose positive integers nj (j = 1, 2, . . .) such that nj < nj+1 and

\[ \|f_m - f_n\|_p < 2^{-j} \quad \text{for all } m, n \ge n_j. \]

Then $(f_{n_j})_{j=1}^{\infty}$ is a subsequence of (fn). By Theorem 18.12, to show that (fn) converges in Lp, it will suffice to show that $(f_{n_j})$ converges in Lp. Note

\[ f_{n_k} = f_{n_1} + \sum_{j=1}^{k-1} (f_{n_{j+1}} - f_{n_j}). \tag{20.1} \]

Define

\[ G_k = |f_{n_1}| + \sum_{j=1}^{k} |f_{n_{j+1}} - f_{n_j}|, \qquad G = |f_{n_1}| + \sum_{j=1}^{\infty} |f_{n_{j+1}} - f_{n_j}|. \]

Then, by Minkowski’s inequality,

\[ \Big( \int G_k^p \Big)^{1/p} = \|G_k\|_p \le \|f_{n_1}\|_p + \sum_{j=1}^{k} 2^{-j} < \|f_{n_1}\|_p + 1 \]

for each k. Note $G_k^p \uparrow G^p$. Then the monotone convergence theorem gives

\[ \int G^p = \lim_k \int G_k^p \le (\|f_{n_1}\|_p + 1)^p < \infty. \]

So G^p is integrable. It follows that G^p is finite a.e., which implies G is finite a.e. The latter means that, for a.e. x ∈ X, the series

\[ f_{n_1}(x) + \sum_{j=1}^{\infty} (f_{n_{j+1}}(x) - f_{n_j}(x)) \]

of complex numbers is absolutely convergent and (hence) convergent. For those x in the set of measure zero where the series diverges, define F(x) = 0. For those x at which the series converges, define

\[ F(x) = f_{n_1}(x) + \sum_{j=1}^{\infty} (f_{n_{j+1}}(x) - f_{n_j}(x)). \]

So, by (20.1), $f_{n_k} \to F$ a.e.

We will show that $f_{n_k} \to F$ in Lp. This will complete the proof. We have |F| ≤ G, and so $\int |F|^p \le \int G^p < \infty$. Thus F ∈ Lp. We have

\[ |f_{n_k} - F|^p \to 0 \quad \text{a.e.} \]

We also have

\[ |f_{n_k} - F|^p \le (|G_k| + |F|)^p \le (2G)^p. \]

Since (2G)^p is integrable, the dominated convergence theorem gives

\[ \lim_k \|f_{n_k} - F\|_p^p = \lim_k \int |f_{n_k} - F|^p = 0. \]

Thus $f_{n_k} \to F$ in Lp.


21 Lp Spaces: Dense Subspaces (Optional)

Definition 21.1. Let (V, ‖ · ‖) be a normed space. A set E ⊆ V is called dense if for each x in V there exists a sequence (xn) in E such that xn → x.

Theorem 21.2. Let (X,A, µ) be a measure space. Let 1 ≤ p ≤ ∞. For each f ∈ Lp and each ε > 0, there exists a simple function s ∈ Lp such that ‖f − s‖p < ε. In other words, the set of simple functions in Lp is a dense subset of Lp.

Proof. If the theorem holds for real-valued f, then it holds for complex-valued f by considering real and imaginary parts. So we assume f is real-valued.

Case 1: 1 ≤ p < ∞. Let f ∈ Lp and ε > 0 be given. By Theorem 7.5, we can choose a sequence of simple measurable functions sn such that |sn| ≤ |f| for each n and sn → f pointwise. Then |sn − f|^p → 0 pointwise and |sn − f|^p ≤ (|sn| + |f|)^p ≤ (2|f|)^p ∈ L1. By the dominated convergence theorem,

\[ \lim_n \|s_n - f\|_p^p = \lim_n \int |s_n - f|^p = \int 0 = 0, \]

and so ‖sn − f‖p → 0. Therefore there exists N ∈ N such that ‖sN − f‖p < ε. Take s = sN. Note s ∈ Lp because ‖s‖p ≤ ‖s − f‖p + ‖f‖p < ε + ‖f‖p < ∞.

Case 2: p = ∞. Let f ∈ L∞ and ε > 0 be given. Since |f| ≤ ‖f‖∞ < ∞ a.e., there exists A ∈ A such that µ(A^c) = 0 and |f(x)| ≤ ‖f‖∞ for all x ∈ A. So f1A is bounded by the number ‖f‖∞. By Theorem 7.5, we can choose a sequence of simple measurable functions sn such that sn → f1A uniformly. Thus ‖sn − f1A‖u → 0. Since ‖sn − f1A‖∞ ≤ ‖sn − f1A‖u, we also have ‖sn − f1A‖∞ → 0. Finally, since sn − f1A = sn − f a.e., we have ‖sn − f‖∞ = ‖sn − f1A‖∞ → 0. Therefore there exists N ∈ N such that ‖sN − f‖∞ < ε. Take s = sN. Note s ∈ L∞ because ‖s‖∞ ≤ ‖s − f‖∞ + ‖f‖∞ < ε + ‖f‖∞ < ∞.

The next lemma tells us what the simple functions in Lp spaces look like.

Lemma 21.3. Let (X,A, µ) be a measure space.

(a) Every measurable simple function is in L∞.

(b) If s is a simple function and 1 ≤ p < ∞, then s ∈ Lp iff s is measurable and s = 0 outside a set of finite measure.

Proof. (a): Every measurable simple function is bounded, hence belongs to L∞.

(b): Let $s = \sum_{i=1}^{n} c_i 1_{E_i}$ be the standard representation of s. So c1, . . . , cn ∈ C are the distinct numbers in the range of s and Ei = s^{−1}(ci). Since E1, . . . , En are disjoint, we have $|s|^p = \sum_{i=1}^{n} |c_i|^p 1_{E_i}$.

Suppose s ∈ Lp. Then s is measurable and $\|s\|_p^p = \int |s|^p = \sum_{i=1}^{n} |c_i|^p \mu(E_i) < \infty$. Let E be the union of those sets Ei with ci ≠ 0. Then µ(E) < ∞ and E^c is the union of those sets Ei with ci = 0. So for all x ∈ E^c we have s(x) = 0. (Note that if E^c = ∅, then it is vacuously true that s(x) = 0 for all x ∈ E^c.)

Conversely, suppose s is measurable and s = 0 outside a set of finite measure. So there exists a set A ∈ A such that µ(A) < ∞ and s(x) = 0 for all x ∈ A^c. Then

\[ \|s\|_p^p = \int |s|^p 1_A + \int |s|^p 1_{A^c} = \int |s|^p 1_A = \int \sum_{i=1}^{n} |c_i|^p 1_{E_i \cap A} = \sum_{i=1}^{n} |c_i|^p \mu(E_i \cap A). \]

Since µ(A ∩ Ei) ≤ µ(A) < ∞ for each i, it follows that $\|s\|_p^p < \infty$, hence s ∈ Lp.

Now we give some approximation theorems for the specific case of Lebesgue measure on (R,L).

Definition 21.4. A function h : R → C is called a step function if

\[ h = \sum_{k=1}^{n} c_k 1_{I_k} \]

for some numbers c1, . . . , cn ∈ C and intervals I1, . . . , In.

Theorem 21.5. Let (X,A, µ) = (R,L, λ). Let 1 ≤ p < ∞. For each f ∈ Lp and each ε > 0, there exists a step function h ∈ Lp such that ‖f − h‖p < ε. In other words, the set of step functions in Lp is a dense subset of Lp.

Proof. Let f ∈ Lp and let ε > 0. By the previous theorem, there exists a simple function s ∈ Lp such that ‖f − s‖p < ε/2. So it will suffice to find a step function h ∈ Lp such that ‖s − h‖p < ε/2. Let c1, . . . , cn be the distinct non-zero numbers in the range of s. Let Ei = s^{−1}(ci) for each i. Then

\[ s = \sum_{i=1}^{n} c_i 1_{E_i}. \]

For each i = 1, . . . , n, we will find a step function hi ∈ Lp such that

\[ \|1_{E_i} - h_i\|_p < \frac{\varepsilon}{2 |c_i| n}. \]

Then the function defined by

\[ h = \sum_{i=1}^{n} c_i h_i \]

will be the desired step function in Lp with ‖s − h‖p < ε/2.

Fix i ∈ {1, . . . , n}. Choose δ > 0 such that 2(2δ)^{1/p} < ε/(2|ci|n). Since s ∈ Lp, 1 ≤ p < ∞, and ci ≠ 0, the previous lemma implies λ(Ei) < ∞. By the definition of Lebesgue measure, there exists a sequence of intervals (Ij) such that $E_i \subseteq \bigcup_{j=1}^{\infty} I_j$ and

\[ \lambda(E_i) \le \sum_{j=1}^{\infty} \ell(I_j) \le \lambda(E_i) + \delta. \]

Since λ(Ei) < ∞, each Ij is a finite interval. Choose N ∈ N such that

\[ \sum_{j=1}^{N} \ell(I_j) \le \sum_{j=1}^{\infty} \ell(I_j) \le \sum_{j=1}^{N} \ell(I_j) + \delta. \]

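Define hi to be the indicator function of $\bigcup_{j=1}^{N} I_j$. Then hi is a step function (verify). Since each Ij is a finite interval, hi ∈ Lp. Define gi to be the indicator function of $\bigcup_{j=1}^{\infty} I_j$. Note hi ≤ gi and $1_{E_i} \le g_i$. We have

\begin{align*}
2^{-p} \|h_i - 1_{E_i}\|_p^p &\le \|g_i - h_i\|_p^p + \|g_i - 1_{E_i}\|_p^p \\
&= \lambda\Big( \bigcup_{j=1}^{\infty} I_j \setminus \bigcup_{j=1}^{N} I_j \Big) + \lambda\Big( \bigcup_{j=1}^{\infty} I_j \setminus E_i \Big) \\
&= \Big[ \lambda\Big( \bigcup_{j=1}^{\infty} I_j \Big) - \lambda\Big( \bigcup_{j=1}^{N} I_j \Big) \Big] + \Big[ \lambda\Big( \bigcup_{j=1}^{\infty} I_j \Big) - \lambda(E_i) \Big] \\
&\le 2\delta.
\end{align*}

By the choice of δ, we have 2(2δ)^{1/p} < ε/(2|ci|n), and so

\[ \|1_{E_i} - h_i\|_p \le 2 (2\delta)^{1/p} < \frac{\varepsilon}{2 |c_i| n}. \]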

Lemma 21.6. Let (X,A, µ) = (R,L, λ). Let 1 ≤ p < ∞. A step function h : R → C belongs to Lp iff h = 0 outside a finite interval.

Proof. Exercise. Similar to the previous lemma.

Theorem 21.7. Let (X,A, µ) = (R,L, λ). Let 1 ≤ p < ∞. For each f ∈ Lp and each ε > 0, there exists a continuous function g such that g = 0 outside a finite interval and ‖f − g‖p < ε. Consequently, the set of continuous functions g such that g = 0 outside a finite interval is a dense subset of Lp.

Proof. Let f ∈ Lp and ε > 0 be given. By the previous theorem, there exists a step function h ∈ Lp such that ‖f − h‖p < ε/2. So it will suffice to find a continuous function g such that ‖h − g‖p < ε/2 and g = 0 outside a finite interval. We have

\[ h = \sum_{k=1}^{n} c_k 1_{I_k} \]

for some numbers c1, . . . , cn ∈ C and intervals I1, . . . , In. The sum does not change if we delete terms with ck = 0 or Ik = ∅. So we assume that ck ≠ 0 and Ik ≠ ∅ for all k. The previous lemma implies Ik is finite for all k. For each k ∈ {1, . . . , n}, we will define a continuous function gk such that

\[ \|1_{I_k} - g_k\|_p < \frac{\varepsilon}{2 |c_k| n} \]

and gk = 0 outside Ik. Then the function defined by

\[ g = \sum_{k=1}^{n} c_k g_k \]

will be a continuous function such that ‖h − g‖p < ε/2 and g = 0 outside a finite interval containing $\bigcup_{k=1}^{n} I_k$. This will also give the density statement of the theorem, as the set of continuous functions g such that g = 0 outside a finite interval is easily checked to be a subset of Lp.

Let k ∈ {1, . . . , n} be given, and let a ≤ b be the endpoints of Ik. If a = b, take gk = 0. Otherwise, choose δ > 0 so that (2δ)^{1/p} < ε/(2|ck|n) and 2δ ≤ b − a. Define gk to be the continuous function that equals 0 on (−∞, a] and [b,∞), equals 1 on [a + δ, b − δ], and is linear on [a, a + δ] and [b − δ, b] (draw a picture). Set A = [a, a + δ] ∪ [b − δ, b]. Note λ(A) ≤ 2δ and 0 ≤ 1Ik − gk ≤ 1A. Then

\[ \|1_{I_k} - g_k\|_p \le \|1_A\|_p = \lambda(A)^{1/p} \le (2\delta)^{1/p} < \frac{\varepsilon}{2 |c_k| n}. \]
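The trapezoid construction in the proof can be checked numerically for a single interval. The sketch below is illustrative only: the names `trapezoid` and `lp_distance_to_indicator` are hypothetical, and the Lp distance is estimated with a midpoint rule rather than computed exactly. For comparison, a direct calculation of the two linear ramps gives the exact value (2δ/(p + 1))^{1/p}, which is indeed at most the bound (2δ)^{1/p} used in the proof.

```python
def trapezoid(x, a, b, delta):
    """Continuous g: 0 outside (a, b), 1 on [a+delta, b-delta], linear between."""
    if x <= a or x >= b:
        return 0.0
    if x < a + delta:
        return (x - a) / delta
    if x > b - delta:
        return (b - x) / delta
    return 1.0


def lp_distance_to_indicator(a, b, delta, p, n=20000):
    """Midpoint-rule estimate of ||1_[a,b] - g||_p.

    The integrand vanishes outside [a, b], so integrating over [a, b] suffices.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += (1.0 - trapezoid(x, a, b, delta)) ** p * h
    return total ** (1.0 / p)


# Shrinking delta drives the L^p distance to zero, as the proof requires.
distances = [lp_distance_to_indicator(0.0, 1.0, d, 2.0) for d in (0.2, 0.1, 0.05)]
assert distances[0] > distances[1] > distances[2]
```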


22 Counting Measure and ℓp Spaces

Theorem 22.1. Let X be a nonempty set. Let f : X → [0,∞]. Let µ be counting measure on (X,P(X)). (Recall that counting measure is defined as follows: µ(A) is the number of elements in A if A is a finite subset of X, and µ(A) = ∞ if A is an infinite subset of X.)

(a) f is measurable.

(b) $\int f \, d\mu = \sup \big\{ \sum_{x \in F} f(x) : F \subseteq X, \ F \text{ is finite} \big\}$.

(c) If X = {1, . . . , n}, then $\int f \, d\mu = \sum_{k=1}^{n} f(k)$.

(d) If X = N, then $\int f \, d\mu = \sum_{k=1}^{\infty} f(k)$.

Proof. Exercise.
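Parts (b) and (c) of Theorem 22.1 can be illustrated by brute force when X is a small finite set. The following sketch is not a proof and is not from the notes; the function name `counting_integral` is hypothetical, and it simply evaluates the supremum in (b) over every subset F of X.

```python
from itertools import combinations


def counting_integral(f, X):
    """Supremum of sum_{x in F} f(x) over all subsets F of the finite set X.

    f is assumed to take values in [0, infinity).
    """
    points = list(X)
    best = 0.0
    for r in range(len(points) + 1):
        for F in combinations(points, r):
            best = max(best, sum(f(x) for x in F))
    return best


def inverse_square(k):
    return 1.0 / k ** 2


X = range(1, 6)  # the set {1, ..., 5}
integral = counting_integral(inverse_square, X)
direct_sum = sum(inverse_square(k) for k in X)
```

Because f is non-negative, the supremum is attained at F = X, so `integral` agrees with the plain sum in part (c).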

Definition 22.2. If X is any nonempty set and µ is counting measure on (X,P(X)), we define ℓp(X) = Lp(X,P(X), µ).

Corollary 22.3. If 1 ≤ p ≤ ∞, X = {1, . . . , n}, and µ is the counting measure on (X,P(X)), then ℓp(X) = C^n and (ℓp(X), ‖ · ‖p) = (C^n, ‖ · ‖p).

Corollary 22.4. If 1 ≤ p ≤ ∞, then ℓp(N) is the space of all sequences (an)n∈N in C such that ‖(an)‖p < ∞, where

\[ \|(a_n)\|_p = \begin{cases} \big( \sum_{i=1}^{\infty} |a_i|^p \big)^{1/p} & \text{if } 1 \le p < \infty, \\ \sup_{i \in \mathbb{N}} |a_i| & \text{if } p = \infty. \end{cases} \]


23 Complex-Valued Measurable Functions

If X is a set and f : X → C, then f = Re(f) + i Im(f), where Re(f) : X → R and Im(f) : X → R are defined by Re(f)(x) = Re(f(x)) and Im(f)(x) = Im(f(x)) for every x ∈ X.

Definition 23.1. Let (X,A) be a measurable space. A function f : X → C is called measurableif Re(f) and Im(f) are measurable.

Theorem 23.2. Let (X,A) be a measurable space. Let f, g : X → C be measurable functions.Let c ∈ C. The following functions are measurable.

(i) cf

(ii) f + g

(iii) fg

(iv) |f |

Proof. (i): Write c = a + ib and f = u + iv (where a = Re(c), b = Im(c), u = Re(f), v = Im(f)). Then cf = au − bv + i(av + bu). So Re(cf) = au − bv and Im(cf) = av + bu are measurable. Therefore cf is measurable.

(ii) and (iii): Similar to (i).

(iv): For each a ≥ 0,

\[ \{ x \in X : |f(x)| > a \} = \{ x \in X : |f(x)|^2 > a^2 \} = \{ x \in X : (\mathrm{Re}(f)(x))^2 + (\mathrm{Im}(f)(x))^2 > a^2 \} \in \mathcal{A}. \]

For a < 0, the set {x ∈ X : |f(x)| > a} is all of X, which belongs to A.

Theorem 23.3. Let (X,A) be a measurable space. If fn : X → C is measurable for n = 1, 2, . . .and limn fn(x) exists for each x ∈ X, then limn fn is measurable.

Proof. Exercise.

Theorem 23.4. Let (X,A) be a measurable space. If f : X → C is measurable, then f^{−1}(a) = {x ∈ X : f(x) = a} is measurable (i.e., belongs to A) for each a ∈ C.

Proof. Exercise.


24 Caratheodory Construction: Measures from Outer Measures

Lemma 24.1. Let X be any set. Let A be a σ-algebra on X. Let µ∗ be an outer measure on X. Then µ∗ is a measure on A iff µ∗ is finitely additive on A. More precisely, the restriction of µ∗ to A is a measure on A iff µ∗(A ∪ B) = µ∗(A) + µ∗(B) for every A, B ∈ A such that A ∩ B = ∅.

Proof. ⇒: Theorem ?? (a),(c),(d).

⇐: Let A1, A2, . . . be any sequence of disjoint sets in A. By countable subadditivity of µ∗, $\mu^*\big( \bigcup_{i=1}^{\infty} A_i \big) \le \sum_{i=1}^{\infty} \mu^*(A_i)$. To prove the reverse inequality, we now assume µ∗ is finitely additive on A, that is, µ∗(A ∪ B) = µ∗(A) + µ∗(B) for every A, B ∈ A such that A ∩ B = ∅. By induction, we have $\mu^*\big( \bigcup_{i=1}^{n} A_i \big) = \sum_{i=1}^{n} \mu^*(A_i)$ for every n ∈ N. By monotonicity and taking a limit,

\[ \sum_{i=1}^{\infty} \mu^*(A_i) = \lim_{n \to \infty} \sum_{i=1}^{n} \mu^*(A_i) = \lim_{n \to \infty} \mu^*\Big( \bigcup_{i=1}^{n} A_i \Big) \le \mu^*\Big( \bigcup_{i=1}^{\infty} A_i \Big). \]

Lemma 24.2. Let X be a set. Let A be an algebra on X. Let µ : A → [0,∞] be a function. The following are equivalent.

1. µ(A ∪ B) = µ(A) + µ(B) for every A, B ∈ A such that A ∩ B = ∅.

2. µ(E) = µ(E ∩ F) + µ(E ∩ F^c) for every E, F ∈ A.

Proof. ⇒: Set A = E ∩ F and B = E ∩ F^c. ⇐: Set E = A ∪ B and F = A.

Definition 24.3. Let X be a set. Let µ∗ be an outer measure on X. A µ∗-measurable (or Caratheodory measurable) set is a set A ∈ P(X) such that

\[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c) \]

for every E ∈ P(X).

Note that since the ≤ inequality follows immediately from finite subadditivity of µ∗, a set A ∈ P(X) is µ∗-measurable iff

\[ \mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c) \]

for every E ∈ P(X).

A µ∗-measurable set A can be thought of as a sharp knife; it can cut any set E into two pieces E ∩ A and E ∩ A^c so cleanly that the sum of the outer measure of the pieces equals the outer measure of the whole.
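On a small finite set, the Caratheodory condition can be tested by brute force over all E ∈ P(X). The following sketch is an illustration and not part of the notes: the names `powerset`, `caratheodory_measurable`, and `toy_outer` are hypothetical, and `toy_outer` is one particular outer measure on X = {1, 2, 3} chosen so that very few sets pass the test.

```python
from itertools import combinations


def powerset(universe):
    """Return all subsets of a finite set as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def caratheodory_measurable(universe, outer):
    """Return the sets A with outer(E) == outer(E & A) + outer(E - A) for all E."""
    universe = frozenset(universe)
    subsets = powerset(universe)
    return [A for A in subsets
            if all(outer(E) == outer(E & A) + outer(E - A) for E in subsets)]


X = frozenset({1, 2, 3})


def toy_outer(E):
    """Monotone, subadditive set function: 0 on the empty set, 2 on X, 1 otherwise."""
    if not E:
        return 0
    if E == X:
        return 2
    return 1


measurable = caratheodory_measurable(X, toy_outer)
```

For this toy outer measure only ∅ and X are measurable: for instance A = {1} fails with the test set E = {1, 2}, since µ∗(E) = 1 while µ∗(E ∩ A) + µ∗(E ∩ A^c) = 2. By contrast, for the counting outer measure `len` every subset passes.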

Theorem 24.4 (Caratheodory’s Restriction Theorem). Let X be a set. Let µ∗ be an outer measure on X. Let A∗ be the collection of µ∗-measurable sets.

(a) If A ∈ P(X) and µ∗(A) = 0, then A ∈ A∗.

(b) A∗ is a σ-algebra on X.

(c) The restriction of µ∗ to A∗ is a measure on A∗.

Proof. Step 0. If A ∈ P(X) and µ∗(A) = 0, then A ∈ A∗.

Assume A ∈ P(X) and µ∗(A) = 0. For any E ∈ P(X), monotonicity gives

\[ \mu^*(E \cap A) + \mu^*(E \cap A^c) \le \mu^*(A) + \mu^*(E) = \mu^*(E). \]

So A ∈ A∗.

Step 1. ∅ ∈ A∗.

For any E ∈ P(X),

\[ \mu^*(E \cap \emptyset) + \mu^*(E \cap \emptyset^c) = \mu^*(\emptyset) + \mu^*(E) = \mu^*(E). \]

So ∅ ∈ A∗.

Step 2. If A ∈ A∗, then A^c ∈ A∗.

If A ∈ A∗, then, for any E ∈ P(X),

\[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c) = \mu^*(E \cap A^c) + \mu^*(E \cap (A^c)^c), \]

and so A^c ∈ A∗.

Step 3. If A,B ∈ A∗, then A ∪B ∈ A∗.

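Let E ∈ P(X). Then

\begin{align*}
\mu^*(E) &= \mu^*(E \cap A) + \mu^*(E \cap A^c) \\
&= \mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c).
\end{align*}

But A ∪ B = (A ∩ B) ∪ (A ∩ B^c) ∪ (A^c ∩ B) and A^c ∩ B^c = (A ∪ B)^c. So, by finite subadditivity,

\begin{align*}
\mu^*(E \cap (A \cup B)) + \mu^*(E \cap (A \cup B)^c) &\le \mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c) \\
&= \mu^*(E).
\end{align*}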

Thus A ∪B ∈ A∗.

Step 4. A∗ is an algebra.

Steps 1, 2, and 3 combined show that A∗ is an algebra.

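Step 5. If B1, B2, . . . , Bn are disjoint sets in A∗ and if E ∈ P(X), then

\[ \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \Big) = \sum_{i=1}^{n} \mu^*(E \cap B_i). \]

Since Bn ∈ A∗,

\[ \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) = \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n \Big) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n^c \Big). \]

Since B1, B2, . . . , Bn are disjoint sets,

\[ E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n = E \cap B_n \]

and

\[ E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n^c = E \cap \Big( \bigcup_{i=1}^{n-1} B_i \Big). \]

Therefore

\[ \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \Big) = \mu^*(E \cap B_n) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n-1} B_i \Big) \Big). \]

Repeating the argument (or induction) gives the desired result.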

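Step 6. If B1, B2, . . . are disjoint sets in A∗, then $\bigcup_{i=1}^{\infty} B_i \in \mathcal{A}^*$.

Let E ∈ P(X). Let n ∈ N be arbitrary. Since A∗ is an algebra (Step 4), $\bigcup_{i=1}^{n} B_i \in \mathcal{A}^*$, and so

\[ \mu^*(E) = \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \Big) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big)^c \Big). \]

By Step 5,

\[ \mu^*(E) = \sum_{i=1}^{n} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big)^c \Big). \]

Since $\big( \bigcup_{i=1}^{\infty} B_i \big)^c \subseteq \big( \bigcup_{i=1}^{n} B_i \big)^c$, monotonicity of µ∗ gives

\[ \mu^*(E) \ge \sum_{i=1}^{n} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big). \]

Letting n → ∞ gives

\[ \mu^*(E) \ge \sum_{i=1}^{\infty} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big). \]

By countable subadditivity of µ∗,

\[ \mu^*(E) \ge \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big) \Big) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big). \]

Therefore $\bigcup_{i=1}^{\infty} B_i \in \mathcal{A}^*$.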

Step 7. If A1, A2, . . . ∈ A∗, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}^*$.

Define B1 = A1, B2 = A2 \ A1, B3 = A3 \ (A1 ∪ A2), . . .. Note B1, B2, . . . are disjoint sets. Since A∗ is an algebra (Step 4), B1, B2, . . . ∈ A∗. By Step 6,

\[ \bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i \in \mathcal{A}^*. \]

Step 8. The restriction of µ∗ to A∗ is a measure on A∗.

For every E, F ∈ A∗, we have

\[ \mu^*(E) = \mu^*(E \cap F) + \mu^*(E \cap F^c). \]

So Lemma 24.2 implies µ∗ is finitely additive on A∗. Then Lemma 24.1 implies µ∗ is a measure on A∗.


25 Lebesgue Measure and Lebesgue σ-Algebra

Definition 25.1. Let λ∗ be the Lebesgue outer measure on R. The σ-algebra of all λ∗-measurable sets is called the Lebesgue σ-algebra on R. It is denoted by L. The elements of L are called the Lebesgue measurable sets. The restriction of λ∗ to L is a measure called Lebesgue measure and is denoted by λ.

Theorem 25.2. The Borel σ-algebra on R is contained in the Lebesgue σ-algebra on R, i.e., B ⊆ L.

Proof. Since B is the smallest σ-algebra containing the open sets, we just need to show that L contains every open set. We first show that L contains every interval of the form (a,∞). Let I be any open interval of the form (a,∞). We must show

\[ \lambda^*(E) = \lambda^*(E \cap I) + \lambda^*(E \cap I^c) \]

for every E ∈ P(R). Let E ∈ P(R). By subadditivity,

\[ \lambda^*(E) \le \lambda^*(E \cap I) + \lambda^*(E \cap I^c). \]

It remains to show

\[ \lambda^*(E) \ge \lambda^*(E \cap I) + \lambda^*(E \cap I^c). \]

If λ∗(E) = ∞, we are done. Assume λ∗(E) < ∞. Let ε > 0 be given. There exists a sequence of intervals (In)n∈N that covers E and satisfies

\[ \sum_{n=1}^{\infty} \ell(I_n) \le \lambda^*(E) + \varepsilon. \]

Since λ∗(E) is finite, the last inequality implies each In is a finite interval. For each n ∈ N, choose an open interval Jn such that In ⊆ Jn and ℓ(Jn) ≤ ℓ(In) + ε2^{−n}. Then the sequence of finite open intervals (Jn)n∈N covers E and

\[ \sum_{n=1}^{\infty} \ell(J_n) \le \lambda^*(E) + 2\varepsilon. \]

For each n ∈ N, define J′n = Jn ∩ I and J′′n = Jn ∩ I^c. Then each J′n and J′′n is an interval and

\[ \ell(J_n) = \ell(J'_n) + \ell(J''_n). \]

Since $E \cap I \subseteq \bigcup_{n=1}^{\infty} J'_n$, we have

\[ \lambda^*(E \cap I) \le \sum_{n=1}^{\infty} \ell(J'_n). \]

Since $E \cap I^c \subseteq \bigcup_{n=1}^{\infty} J''_n$, we have

\[ \lambda^*(E \cap I^c) \le \sum_{n=1}^{\infty} \ell(J''_n). \]

Therefore

\[ \lambda^*(E \cap I) + \lambda^*(E \cap I^c) \le \sum_{n=1}^{\infty} \ell(J'_n) + \sum_{n=1}^{\infty} \ell(J''_n) = \sum_{n=1}^{\infty} \ell(J_n) \le \lambda^*(E) + 2\varepsilon. \]

Since ε > 0 is arbitrary, we have

\[ \lambda^*(E \cap I) + \lambda^*(E \cap I^c) \le \lambda^*(E). \]

We have proved L contains every open interval of the form (a,∞). Since L is closed under complements, it contains the complement of each such interval, i.e., it contains every interval of the form (−∞, a]. Since L is closed under finite intersections, it contains every interval of the form (a, b] = (a,∞) ∩ (−∞, b]. Since L is closed under countable unions, it contains every interval of the form

\[ (a, b) = \bigcup_{n=1}^{\infty} \Big( a, b - \frac{1}{n} \Big] \]

and every interval of the form

\[ (-\infty, b) = \bigcup_{n=1}^{\infty} \Big( -n, b - \frac{1}{n} \Big]. \]

Thus L contains every open interval.

Let U be any open set in R. Then U is a union of open balls, by definition. But the open balls in R are exactly the finite open intervals. Indeed, B(x, r) = (x − r, x + r) and (a, b) = B((a + b)/2, (b − a)/2). Thus U is a union of open intervals, $U = \bigcup_{k \in K} I_k$. Note that the index set K may not be countable. For each rational number r ∈ U, let Jr be the union of those intervals Ik such that r ∈ Ik. Then Jr is an open interval (possibly infinite) for each r ∈ Q ∩ U and $U = \bigcup_{k \in K} I_k = \bigcup_{r \in \mathbb{Q} \cap U} J_r$. Since Jr ∈ L for each r ∈ Q ∩ U and Q is countable, we see that U is a countable union of sets in L, and so U ∈ L. This shows that L contains all the open sets in R.

Theorem 25.3. Let A ⊆ R. The following are equivalent:

(a) A ∈ L.

(b) For every ε > 0, there exists an open set G such that A ⊆ G and λ∗(G \A) < ε.

(c) There exists a set B ∈ B such that A ⊆ B and λ∗(B \A) = 0.

(d) For every ε > 0, there exists a closed set F such that F ⊆ A and λ∗(A \ F ) < ε.

(e) There exists a set B ∈ B such that B ⊆ A and λ∗(A \B) = 0.

Proof. (a) ⇒ (b):

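We first prove this under the assumption that $\lambda^*(A) < \infty$. Let $\varepsilon > 0$. Choose a sequence of intervals $I_1, I_2, \ldots$ such that $A \subseteq \bigcup_{i=1}^{\infty} I_i$ and $\sum_{i=1}^{\infty} \ell(I_i) \le \lambda^*(A) + \varepsilon$. Since $\lambda^*(A)$ is finite, the last inequality implies each $I_i$ is a finite interval. For each $i$, choose an open interval $J_i$ such that $I_i \subseteq J_i$ and $\ell(J_i) \le \ell(I_i) + \varepsilon 2^{-i}$. Define $G = \bigcup_{i=1}^{\infty} J_i$. Then $G$ is open, $A \subseteq G$, and
\[ \lambda^*(G) \le \sum_{i=1}^{\infty} \ell(J_i) \le \lambda^*(A) + 2\varepsilon. \]

Since $A \in \mathcal{L}$ and $A \subseteq G$, we have
\[ \lambda^*(G) = \lambda^*(G \cap A) + \lambda^*(G \cap A^c) = \lambda^*(A) + \lambda^*(G \setminus A). \]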

Combining the inequalities above and using that λ∗(A) <∞, we get

λ∗(G \A) = λ∗(G)− λ∗(A) ≤ 2ε < 3ε.

Replacing ε by ε/3 everywhere in the argument gives the desired result.

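Now we drop the assumption that $\lambda^*(A) < \infty$. Let $\varepsilon > 0$. Define $A_n = A \cap [-n, n]$ for each $n \in \mathbb{N}$. Then $A = \bigcup_{n=1}^{\infty} A_n$. Moreover $A_n \in \mathcal{L}$ and $\lambda^*(A_n) < \infty$ for each $n \in \mathbb{N}$. By applying the argument above to $A_n$ with $\varepsilon 2^{-n}$ in place of $\varepsilon$, for each $n \in \mathbb{N}$ we get an open set $G_n$ such that $A_n \subseteq G_n$ and $\lambda^*(G_n \setminus A_n) < \varepsilon 2^{-n}$. Define $G = \bigcup_{n=1}^{\infty} G_n$. Then $G$ is an open set, $A \subseteq G$, and
\[ G \setminus A = \bigcup_{n=1}^{\infty} (G_n \setminus A) \subseteq \bigcup_{n=1}^{\infty} (G_n \setminus A_n). \]

Therefore
\[ \lambda^*(G \setminus A) \le \sum_{n=1}^{\infty} \lambda^*(G_n \setminus A_n) < \sum_{n=1}^{\infty} \varepsilon 2^{-n} = \varepsilon. \]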

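(b) ⇒ (c): By (b), for each $n \in \mathbb{N}$, there exists an open set $G_n$ such that $A \subseteq G_n$ and $\lambda^*(G_n \setminus A) < 1/n$. Define $B = \bigcap_{n=1}^{\infty} G_n$. Then $B \in \mathcal{B}$, $A \subseteq B$, and $B \setminus A \subseteq G_n \setminus A$ for every $n \in \mathbb{N}$. Therefore $\lambda^*(B \setminus A) \le \lambda^*(G_n \setminus A) < 1/n$ for every $n \in \mathbb{N}$. Hence $\lambda^*(B \setminus A) = 0$.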

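(c) ⇒ (a): Let $E \in \mathcal{P}(\mathbb{R})$ be given. By (c), there exists $B \in \mathcal{B}$ such that $A \subseteq B$ and $\lambda^*(B \setminus A) = 0$. Since $\mathcal{B} \subseteq \mathcal{L}$, we have $B \in \mathcal{L}$, and so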

λ∗(E) = λ∗(E ∩B) + λ∗(E ∩Bc).

But λ∗(E ∩A) ≤ λ∗(E ∩B) and

λ∗(E ∩Ac) ≤ λ∗(E ∩Ac ∩B) + λ∗(E ∩Ac ∩Bc) ≤ λ∗(B \A) + λ∗(E ∩Bc) = λ∗(E ∩Bc).

Therefore
\[ \lambda^*(E) \ge \lambda^*(E \cap A) + \lambda^*(E \cap A^c). \]

The ≤ inequality comes from subadditivity. Thus A ∈ L.

At this point, we have proved (a) ⇔ (b) ⇔ (c). We use this equivalence in what follows.

(a) ⇔ (d): A ∈ L iff Ac ∈ L iff (b) holds with A replaced by Ac, i.e., for every ε > 0, there exists an open set G such that Ac ⊆ G and λ∗(G \ Ac) < ε. This last statement is equivalent to (d) because of the following facts: G is open iff Gc is closed; Ac ⊆ G iff Gc ⊆ A; G \ Ac = G ∩ A = A \ Gc.

(a) ⇔ (e): A ∈ L iff Ac ∈ L iff (c) holds with A replaced by Ac, i.e., there exists a B ∈ B such that Ac ⊆ B and λ∗(B \ Ac) = 0. This last statement is equivalent to (e) because of the following facts: B ∈ B iff Bc ∈ B; Ac ⊆ B iff Bc ⊆ A; B \ Ac = B ∩ A = A \ Bc.
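
As a quick illustration of condition (b), here is a worked example that is not part of the original notes. It takes $A = \mathbb{Q}$ and assumes an enumeration $q_1, q_2, \ldots$ of the rationals; the particular interval radii are just one convenient choice.

```latex
Let $q_1, q_2, \ldots$ be an enumeration of $\mathbb{Q}$ and let $\varepsilon > 0$. Define
\[
  G = \bigcup_{i=1}^{\infty} \left( q_i - \varepsilon 2^{-i-2},\; q_i + \varepsilon 2^{-i-2} \right).
\]
Then $G$ is open, $\mathbb{Q} \subseteq G$, and since the intervals above cover $G$,
\[
  \lambda^*(G \setminus \mathbb{Q}) \le \lambda^*(G)
  \le \sum_{i=1}^{\infty} 2 \cdot \varepsilon 2^{-i-2}
  = \sum_{i=1}^{\infty} \varepsilon 2^{-i-1}
  = \frac{\varepsilon}{2} < \varepsilon .
\]
```

So (b) holds for $\mathbb{Q}$, which is consistent with $\mathbb{Q}$ being a Borel set (a countable union of points) and hence a member of $\mathcal{L}$.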

26 Product σ-Algebras

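Definition 26.1. Let $\mathcal{S}$ be a σ-algebra on a set $X$. Let $\mathcal{T}$ be a σ-algebra on a set $Y$. A measurable rectangle is a set of the form $A \times B$ where $A \in \mathcal{S}$ and $B \in \mathcal{T}$. The product σ-algebra on $X \times Y$ is defined to be
\[ \mathcal{S} \otimes \mathcal{T} = \sigma(\{A \times B : A \in \mathcal{S},\ B \in \mathcal{T}\}). \]

In other words, $\mathcal{S} \otimes \mathcal{T}$ is the σ-algebra generated by the collection of measurable rectangles.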

Definition 26.2. Let $X, Y$ be any sets. Let $E \subseteq X \times Y$. If $a \in X$, the $a$-section (or $a$-cross-section or $a$-slice) of $E$ is
\[ [E]_a = \{ y \in Y : (a, y) \in E \}. \]

If $b \in Y$, the $b$-section (or $b$-cross-section or $b$-slice) of $E$ is
\[ [E]^b = \{ x \in X : (x, b) \in E \}. \]

Theorem 26.3. Let $\mathcal{S}$ be a σ-algebra on a set $X$. Let $\mathcal{T}$ be a σ-algebra on a set $Y$. If $E \in \mathcal{S} \otimes \mathcal{T}$, then $[E]_a \in \mathcal{T}$ for all $a \in X$ and $[E]^b \in \mathcal{S}$ for all $b \in Y$. Informally, the sections of a measurable set are measurable.

Proof. We must show
\[ \mathcal{S} \otimes \mathcal{T} \subseteq \{ E \subseteq X \times Y : [E]_a \in \mathcal{T} \text{ for all } a \in X \text{ and } [E]^b \in \mathcal{S} \text{ for all } b \in Y \}. \]

It is easy to check that the right-hand side is a σ-algebra containing the measurable rectangles, which implies the desired containment.

Definition 26.4. Let $X, Y$ be any sets. Let $f : X \times Y \to \mathbb{R}$. If $a \in X$, the $a$-section (or $a$-cross-section or $a$-slice) of $f$ is the function $[f]_a : Y \to \mathbb{R}$ defined by
\[ [f]_a(y) = f(a, y) \]
for all $y \in Y$. If $b \in Y$, the $b$-section (or $b$-cross-section or $b$-slice) of $f$ is the function $[f]^b : X \to \mathbb{R}$ defined by
\[ [f]^b(x) = f(x, b) \]
for all $x \in X$.

Example 26.5. If $f(x, y) = 5x^2 + y^3$, then $[f]_2(y) = f(2, y) = 20 + y^3$.
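
Sections are easy to experiment with on finite sets. The sketch below is an illustration only: the helper names (`section_at_a`, `section_at_b`, `function_section_a`) and the small set `E` are hypothetical, chosen to mirror Definitions 26.2 and 26.4 and Example 26.5.

```python
def section_at_a(E, a):
    """[E]_a = {y : (a, y) in E} for a finite set E of pairs."""
    return {y for (x, y) in E if x == a}


def section_at_b(E, b):
    """[E]^b = {x : (x, b) in E} for a finite set E of pairs."""
    return {x for (x, y) in E if y == b}


def function_section_a(f, a):
    """[f]_a, the function y -> f(a, y)."""
    return lambda y: f(a, y)


def f(x, y):
    # The function from Example 26.5.
    return 5 * x**2 + y**3


# A hypothetical finite subset of X x Y with X = {1, 2}, Y = {"p", "q"}.
E = {(1, "p"), (1, "q"), (2, "q")}

f_2 = function_section_a(f, 2)  # y -> 20 + y^3
```

For instance, `section_at_a(E, 1)` collects the second coordinates of the pairs whose first coordinate is 1, and `f_2(y)` agrees with `20 + y**3`.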

Theorem 26.6. Let $\mathcal{S}$ be a σ-algebra on a set $X$. Let $\mathcal{T}$ be a σ-algebra on a set $Y$. Let $f : X \times Y \to \mathbb{R}$. If $f$ is $\mathcal{S} \otimes \mathcal{T}$-measurable, then $[f]_a$ is $\mathcal{T}$-measurable for each $a \in X$ and $[f]^b$ is $\mathcal{S}$-measurable for each $b \in Y$. Informally, the sections of a measurable function are measurable.

Proof. Let $a \in X$. For any Borel set $B \subseteq \mathbb{R}$, we have $f^{-1}(B) \in \mathcal{S} \otimes \mathcal{T}$, and so, by Theorem 26.3,
\[ ([f]_a)^{-1}(B) = \left[ f^{-1}(B) \right]_a \in \mathcal{T}. \]

This proves that $[f]_a$ is $\mathcal{T}$-measurable for each $a \in X$. The other conclusion is proved similarly.

27 Monotone Classes

Monotone classes are similar (but not identical) to σ-algebras.

Definition 27.1. Let $X$ be a set. A collection $\mathcal{M} \subseteq \mathcal{P}(X)$ is called a monotone class on $X$ if the following properties hold:

(i) If $E_1 \subseteq E_2 \subseteq \ldots$ are in $\mathcal{M}$, then $\bigcup_{i=1}^{\infty} E_i \in \mathcal{M}$.

(ii) If $E_1 \supseteq E_2 \supseteq \ldots$ are in $\mathcal{M}$, then $\bigcap_{i=1}^{\infty} E_i \in \mathcal{M}$.

In other words, a monotone class is a collection of subsets of $X$ which is closed under countable increasing unions and countable decreasing intersections.

Theorem 27.2. Every σ-algebra is a monotone class, but not conversely.

Proof. Exercise.
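
For intuition about the "not conversely" part, it can help to test small collections by brute force. The following is a rough sketch under the assumption that $X$ is finite, in which case every monotone sequence takes only finitely many values, so it is enough to check unions and intersections over finite chains. The function names are hypothetical and the code is an aid to experimentation, not a proof.

```python
from itertools import combinations


def is_chain(sets):
    """True if the given sets are totally ordered by inclusion."""
    return all(a <= b or b <= a for a, b in combinations(sets, 2))


def is_monotone_class(M):
    """Check Definition 27.1 for a collection M of frozensets on a finite set."""
    members = list(M)
    for r in range(1, len(members) + 1):
        for chain in combinations(members, r):
            if not is_chain(chain):
                continue
            union = frozenset().union(*chain)
            intersection = chain[0].intersection(*chain)
            if union not in M or intersection not in M:
                return False
    return True


def is_sigma_algebra(M, X):
    """On a finite set: contains X, closed under complements and pairwise unions."""
    if X not in M:
        return False
    closed_under_complement = all((X - A) in M for A in M)
    closed_under_union = all((A | B) in M for A in M for B in M)
    return closed_under_complement and closed_under_union


X = frozenset({1, 2})
M = {frozenset({1})}  # a single set: a monotone class, but not a sigma-algebra
power_set = {frozenset(), frozenset({1}), frozenset({2}), X}
```

Here `M` passes `is_monotone_class` but fails `is_sigma_algebra`, while `power_set` passes both.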

Theorem 27.3. The intersection of any collection of monotone classes on a set X is a monotoneclass on X.

Proof. Exercise.

Theorem 27.4. Let X be a set and let C be any collection of subsets of X. The intersection of all monotone classes on X that contain C is the smallest monotone class on X that contains C. It is called the monotone class generated by C and it is denoted by M(C).

Proof. Exercise.

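Lemma 27.5. (Monotone Class Lemma) If $\mathcal{A}$ is an algebra on a set $X$, then the monotone class generated by $\mathcal{A}$ equals the σ-algebra generated by $\mathcal{A}$, i.e.,
\[ \mathcal{M}(\mathcal{A}) = \sigma(\mathcal{A}). \]

Proof. Since $\sigma(\mathcal{A})$ is a monotone class that contains $\mathcal{A}$, we have $\mathcal{M}(\mathcal{A}) \subseteq \sigma(\mathcal{A})$. We need to show $\sigma(\mathcal{A}) \subseteq \mathcal{M}(\mathcal{A})$. It suffices to show that $\mathcal{M}(\mathcal{A})$ is a σ-algebra. Note that if a monotone class is an algebra, then it is a σ-algebra (exercise). Therefore it suffices to show that $\mathcal{M}(\mathcal{A})$ is an algebra. Note $\emptyset, X \in \mathcal{A} \subseteq \mathcal{M}(\mathcal{A})$. Thus, to prove that $\mathcal{M}(\mathcal{A})$ is an algebra, it is easy to check (exercise) that it will suffice to prove the following claim: For all $E, F \in \mathcal{M}(\mathcal{A})$ we have $E \cap F, E \setminus F, F \setminus E \in \mathcal{M}(\mathcal{A})$.

Proof of Claim. For each $E \in \mathcal{M}(\mathcal{A})$, define
\[ \mathcal{M}_E = \{ F \in \mathcal{M}(\mathcal{A}) : E \cap F,\ E \setminus F,\ F \setminus E \in \mathcal{M}(\mathcal{A}) \}. \]

It is easy to check (exercise) that $\mathcal{M}_E$ is a monotone class for every $E \in \mathcal{M}(\mathcal{A})$. Since $\mathcal{A}$ is an algebra, it is easy to check (exercise) that $\mathcal{A} \subseteq \mathcal{M}_E$ for every $E \in \mathcal{A}$. Therefore $\mathcal{M}(\mathcal{A}) \subseteq \mathcal{M}_E$ for every $E \in \mathcal{A}$. In other words, for every $E \in \mathcal{A}$ and every $F \in \mathcal{M}(\mathcal{A})$ we have $F \in \mathcal{M}_E$. Note that for every $E, F \in \mathcal{M}(\mathcal{A})$ we have $F \in \mathcal{M}_E$ iff $E \in \mathcal{M}_F$. Thus, for every $E \in \mathcal{A}$ and every $F \in \mathcal{M}(\mathcal{A})$ we have $E \in \mathcal{M}_F$. This means that $\mathcal{A} \subseteq \mathcal{M}_F$ for every $F \in \mathcal{M}(\mathcal{A})$. Therefore $\mathcal{M}(\mathcal{A}) \subseteq \mathcal{M}_F$ for every $F \in \mathcal{M}(\mathcal{A})$. But this means that for all $E, F \in \mathcal{M}(\mathcal{A})$ we have $E \in \mathcal{M}_F$. Thus for all $E, F \in \mathcal{M}(\mathcal{A})$ we have $E \cap F, E \setminus F, F \setminus E \in \mathcal{M}(\mathcal{A})$.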

28 Iterated Integrals of Characteristic Functions

Definition 28.1. Let $(X, \mathcal{S}, \mu)$ be a measure space. We say that $(X, \mathcal{S}, \mu)$ is σ-finite (or that $\mu$ is σ-finite) if there exist $E_1, E_2, \ldots \in \mathcal{S}$ such that $X = \bigcup_{i=1}^{\infty} E_i$ and $\mu(E_i) < \infty$ for every $i \in \mathbb{N}$.

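Definition 28.2. Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be measure spaces. Let $E \in \mathcal{S} \otimes \mathcal{T}$. For each $x \in X$, we define
\[ \int_Y \mathbf{1}_E(x, y)\,d\nu(y) = \int_Y [\mathbf{1}_E]_x(y)\,d\nu(y). \]

For each $y \in Y$, we define
\[ \int_X \mathbf{1}_E(x, y)\,d\mu(x) = \int_X [\mathbf{1}_E]^y(x)\,d\mu(x). \]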

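Lemma 28.3. Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be measure spaces. Let $E \in \mathcal{S} \otimes \mathcal{T}$. For each $x \in X$,
\[ \int_Y \mathbf{1}_E(x, y)\,d\nu(y) = \nu([E]_x). \]

For each $y \in Y$,
\[ \int_X \mathbf{1}_E(x, y)\,d\mu(x) = \mu([E]^y). \]

Proof. For each $x \in X$, we have $[\mathbf{1}_E]_x = \mathbf{1}_{[E]_x}$, and so
\[ \int_Y \mathbf{1}_E(x, y)\,d\nu(y) = \int_Y [\mathbf{1}_E]_x(y)\,d\nu(y) = \int_Y \mathbf{1}_{[E]_x}(y)\,d\nu(y) = \nu([E]_x). \]

The proof of the other statement is similar.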

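Theorem 28.4. Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be σ-finite measure spaces. For every $E \in \mathcal{S} \otimes \mathcal{T}$, we have

(a) $x \mapsto \int_Y \mathbf{1}_E(x, y)\,d\nu(y)$ is an $\mathcal{S}$-measurable function.

(b) $y \mapsto \int_X \mathbf{1}_E(x, y)\,d\mu(x)$ is a $\mathcal{T}$-measurable function.

(c) $\int_X \int_Y \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y)$.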

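Proof. We will first prove the theorem in the special case where $\mu$ and $\nu$ are finite. Then we will prove the theorem in the case where $\mu$ and $\nu$ are σ-finite.

Assume $\mu$ and $\nu$ are finite. Let $\mathcal{M}$ be the collection of all sets $E \in \mathcal{S} \otimes \mathcal{T}$ such that (a),(b),(c) hold. It will suffice to show that $\mathcal{S} \otimes \mathcal{T} \subseteq \mathcal{M}$. Let $\mathcal{R}$ be the collection of all measurable rectangles. Let $\mathcal{A}$ be the collection of all finite unions of disjoint measurable rectangles. Note that $\mathcal{S} \otimes \mathcal{T} = \sigma(\mathcal{A})$ and that $\mathcal{A}$ is an algebra. By the monotone class lemma, $\mathcal{S} \otimes \mathcal{T} = \mathcal{M}(\mathcal{A})$. Therefore, to show that $\mathcal{S} \otimes \mathcal{T} \subseteq \mathcal{M}$, it will suffice to show that $\mathcal{M}$ is a monotone class that contains $\mathcal{A}$. We do so in three steps.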

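Step 1: $\mathcal{R} \subseteq \mathcal{M}$. Let $E \in \mathcal{R}$. We must show that (a),(b),(c) hold. We have $E = A \times B$ for some $A \in \mathcal{S}$ and $B \in \mathcal{T}$. For each $x \in X$, we have
\[ [E]_x = \begin{cases} B & \text{if } x \in A \\ \emptyset & \text{if } x \notin A \end{cases} \]
and so
\[ \int_Y \mathbf{1}_E(x, y)\,d\nu(y) = \nu([E]_x) = \begin{cases} \nu(B) & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} = \nu(B)\mathbf{1}_A(x). \]

Since $A \in \mathcal{S}$, (a) holds. Similarly, $\int_X \mathbf{1}_E(x, y)\,d\mu(x) = \mu(A)\mathbf{1}_B(y)$ for each $y \in Y$. Since $B \in \mathcal{T}$, (b) holds. Now we see that
\[ \int_X \int_Y \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int_X \nu(B)\mathbf{1}_A(x)\,d\mu(x) = \mu(A)\nu(B) = \int_Y \mu(A)\mathbf{1}_B(y)\,d\nu(y) = \int_Y \int_X \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds.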

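Step 2: $\mathcal{A} \subseteq \mathcal{M}$. Let $E \in \mathcal{A}$. Then $E = \bigcup_{i=1}^{n} R_i$ for some disjoint $R_1, \ldots, R_n \in \mathcal{R}$. For each $x \in X$, we have
\[ [E]_x = \left[ \bigcup_{i=1}^{n} R_i \right]_x = \bigcup_{i=1}^{n} [R_i]_x, \]
where the sets $[R_i]_x$ are disjoint, and so
\[ \int_Y \mathbf{1}_E(x, y)\,d\nu(y) = \nu([E]_x) = \sum_{i=1}^{n} \nu([R_i]_x) = \sum_{i=1}^{n} \int_Y \mathbf{1}_{R_i}(x, y)\,d\nu(y). \]

By Step 1, $x \mapsto \int_Y \mathbf{1}_{R_i}(x, y)\,d\nu(y)$ is $\mathcal{S}$-measurable for each $i$. So $x \mapsto \int_Y \mathbf{1}_E(x, y)\,d\nu(y)$ is a sum of $\mathcal{S}$-measurable functions, and hence it is $\mathcal{S}$-measurable. Thus (a) holds. Similarly, $\int_X \mathbf{1}_E(x, y)\,d\mu(x) = \sum_{i=1}^{n} \int_X \mathbf{1}_{R_i}(x, y)\,d\mu(x)$, and so (b) holds. By Step 1,
\[ \int_X \int_Y \mathbf{1}_{R_i}(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X \mathbf{1}_{R_i}(x, y)\,d\mu(x)\,d\nu(y) \]
for each $i$. Therefore
\[ \int_X \int_Y \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \sum_{i=1}^{n} \int_X \int_Y \mathbf{1}_{R_i}(x, y)\,d\nu(y)\,d\mu(x) = \sum_{i=1}^{n} \int_Y \int_X \mathbf{1}_{R_i}(x, y)\,d\mu(x)\,d\nu(y) = \int_Y \int_X \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds.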

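Step 3: $\mathcal{M}$ is a monotone class. We first show that $\mathcal{M}$ is closed under countable increasing unions. Suppose $E_1 \subseteq E_2 \subseteq \ldots$ are in $\mathcal{M}$ and set $E = \bigcup_{n=1}^{\infty} E_n$. We must show $E \in \mathcal{M}$. To do so, we will show (a),(b),(c) hold for $E$. For each $x \in X$, define
\[ f_n(x) = \int_Y \mathbf{1}_{E_n}(x, y)\,d\nu(y) = \nu([E_n]_x), \qquad f(x) = \int_Y \mathbf{1}_E(x, y)\,d\nu(y) = \nu([E]_x). \]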

For $x \in X$, we have $[E_1]_x \subseteq [E_2]_x \subseteq \ldots$ and $[E]_x = \bigcup_{n=1}^{\infty} [E_n]_x$. By continuity from below applied to $\nu([E_n]_x)$ (for each fixed $x$) (or the monotone convergence theorem applied to $\mathbf{1}_{E_n}(x, y)$ (for each fixed $x$)), we have that $f_n \to f$ pointwise. By the definition of $\mathcal{M}$, each $f_n$ is $\mathcal{S}$-measurable. Therefore $f$ is $\mathcal{S}$-measurable. So (a) holds for $E$. For each $y \in Y$, define
\[ g_n(y) = \int_X \mathbf{1}_{E_n}(x, y)\,d\mu(x) = \mu([E_n]^y), \qquad g(y) = \int_X \mathbf{1}_E(x, y)\,d\mu(x) = \mu([E]^y). \]

For $y \in Y$, we have $[E_1]^y \subseteq [E_2]^y \subseteq \ldots$ and $[E]^y = \bigcup_{n=1}^{\infty} [E_n]^y$. By continuity from below applied to $\mu([E_n]^y)$ (for each fixed $y$) (or the monotone convergence theorem applied to $\mathbf{1}_{E_n}(x, y)$ (for each fixed $y$)), we have that $g_n \to g$ pointwise. By the definition of $\mathcal{M}$, each $g_n$ is $\mathcal{T}$-measurable. Therefore $g$ is $\mathcal{T}$-measurable. So (b) holds for $E$. Now, since $(f_n)$ is an increasing sequence, the monotone convergence theorem implies
\[ \int \int \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int f(x)\,d\mu(x) = \lim_{n} \int f_n(x)\,d\mu(x) = \lim_{n} \int \int \mathbf{1}_{E_n}(x, y)\,d\nu(y)\,d\mu(x). \tag{28.1} \]

Similarly, since $(g_n)$ is an increasing sequence, the monotone convergence theorem implies
\[ \int \int \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y) = \int g(y)\,d\nu(y) = \lim_{n} \int g_n(y)\,d\nu(y) = \lim_{n} \int \int \mathbf{1}_{E_n}(x, y)\,d\mu(x)\,d\nu(y). \tag{28.2} \]

Since (c) holds for each $E_n$, the right-hand sides of (28.1) and (28.2) are equal, and so
\[ \int \int \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int \int \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds for $E$. Now we show that $\mathcal{M}$ is closed under countable decreasing intersections. Suppose $E_1 \supseteq E_2 \supseteq \ldots$ are in $\mathcal{M}$ and set $E = \bigcap_{n=1}^{\infty} E_n$. We must show $E \in \mathcal{M}$. To do so, we will show (a),(b),(c) hold for $E$. Define $f$, $f_n$, $g$, $g_n$ as above. We have $f_1(x) = \nu([E_1]_x) \le \nu(Y) < \infty$ for each $x \in X$. By continuity from above applied to $\nu([E_n]_x)$ (for each fixed $x \in X$) (or the dominated convergence theorem applied to $\mathbf{1}_{E_n}(x, y)$ (for each fixed $x$)), we have that $f_n \to f$ pointwise. By the definition of $\mathcal{M}$, each $f_n$ is $\mathcal{S}$-measurable. Therefore $f$ is $\mathcal{S}$-measurable. So (a) holds for $E$. We have $g_1(y) = \mu([E_1]^y) \le \mu(X) < \infty$ for each $y \in Y$. By continuity from above applied to $\mu([E_n]^y)$ (for each fixed $y \in Y$) (or the dominated convergence theorem applied to $\mathbf{1}_{E_n}(x, y)$ (for each fixed $y$)), we have that $g_n \to g$ pointwise. By the definition of $\mathcal{M}$, each $g_n$ is $\mathcal{T}$-measurable. Therefore $g$ is $\mathcal{T}$-measurable. So (b) holds for $E$. We have $0 \le f_n \le f_1$ for all $n$ and
\[ \int f_1(x)\,d\mu(x) \le \int \nu(Y)\,d\mu(x) = \mu(X)\nu(Y) < \infty. \]

Thus the dominated convergence theorem implies
\[ \int \int \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int f(x)\,d\mu(x) = \lim_{n} \int f_n(x)\,d\mu(x) = \lim_{n} \int \int \mathbf{1}_{E_n}(x, y)\,d\nu(y)\,d\mu(x). \tag{28.3} \]

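Similarly, we have $0 \le g_n \le g_1$ for all $n$ and
\[ \int g_1(y)\,d\nu(y) \le \int \mu(X)\,d\nu(y) = \mu(X)\nu(Y) < \infty. \]

Thus the dominated convergence theorem implies
\[ \int \int \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y) = \int g(y)\,d\nu(y) = \lim_{n} \int g_n(y)\,d\nu(y) = \lim_{n} \int \int \mathbf{1}_{E_n}(x, y)\,d\mu(x)\,d\nu(y). \tag{28.4} \]

Since (c) holds for each $E_n$, the right-hand sides of (28.3) and (28.4) are equal, and so
\[ \int \int \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int \int \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds for $E$.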

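The proof of the theorem is now complete in the case where $\mu$ and $\nu$ are finite. Now we assume only that $\mu$ and $\nu$ are σ-finite. Since $\mu$ is σ-finite, there exist $X_1, X_2, \ldots \in \mathcal{S}$ such that $\bigcup_{i=1}^{\infty} X_i = X$ and $\mu(X_i) < \infty$ for all $i$. By replacing each $X_i$ by $X_1 \cup \cdots \cup X_i$, we can assume $X_1 \subseteq X_2 \subseteq \ldots$. Since $\nu$ is σ-finite, there exist $Y_1, Y_2, \ldots \in \mathcal{T}$ such that $\bigcup_{j=1}^{\infty} Y_j = Y$ and $\nu(Y_j) < \infty$ for all $j$. By replacing each $Y_j$ by $Y_1 \cup \cdots \cup Y_j$, we can assume $Y_1 \subseteq Y_2 \subseteq \ldots$. For each $k \in \mathbb{N}$, define $\mu_k$ on $\mathcal{S}$ by $\mu_k(A) = \mu(A \cap X_k)$ and $\nu_k$ on $\mathcal{T}$ by $\nu_k(B) = \nu(B \cap Y_k)$. It is easy to check that $\mu_k$ and $\nu_k$ are finite measures, and that $\int h\,d\mu_k = \int h \mathbf{1}_{X_k}\,d\mu$ for every non-negative $\mathcal{S}$-measurable function $h$ (and similarly for $\nu_k$). Let $E \in \mathcal{S} \otimes \mathcal{T}$. As the theorem has been proved for finite measures, for each $k \in \mathbb{N}$, $x \mapsto \nu_k([E]_x)$ is an $\mathcal{S}$-measurable function, $y \mapsto \mu_k([E]^y)$ is a $\mathcal{T}$-measurable function, and, rewriting the integrals against $\mu_k$ and $\nu_k$ as integrals against $\mu$ and $\nu$,
\[ \int \nu_k([E]_x)\mathbf{1}_{X_k}(x)\,d\mu(x) = \int \mu_k([E]^y)\mathbf{1}_{Y_k}(y)\,d\nu(y). \tag{28.5} \]

For each fixed $x$, $[E]_x \cap Y_k$ increases to $[E]_x$, and so continuity from below implies $\nu_k([E]_x)$ increases to $\nu([E]_x)$; also $\mathbf{1}_{X_k}(x)$ increases to $1$. Then $x \mapsto \nu([E]_x)$ is $\mathcal{S}$-measurable and the monotone convergence theorem implies
\[ \lim_{k} \int \nu_k([E]_x)\mathbf{1}_{X_k}(x)\,d\mu(x) = \int \nu([E]_x)\,d\mu(x). \]

Similarly, $y \mapsto \mu([E]^y)$ is $\mathcal{T}$-measurable and the monotone convergence theorem implies
\[ \lim_{k} \int \mu_k([E]^y)\mathbf{1}_{Y_k}(y)\,d\nu(y) = \int \mu([E]^y)\,d\nu(y). \]

Combining the last two equations with (28.5) gives
\[ \int \nu([E]_x)\,d\mu(x) = \int \mu([E]^y)\,d\nu(y), \]
which is equivalent to
\[ \int \int \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x) = \int \int \mathbf{1}_E(x, y)\,d\mu(x)\,d\nu(y). \]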
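
On finite measure spaces the integrals in Theorem 28.4 are finite weighted sums, so part (c) can be checked by direct computation. The sketch below is only an illustration: it assumes weighted counting measures described by the hypothetical dictionaries `mu` and `nu` (with every subset measurable) and an arbitrarily chosen set `E`.

```python
from fractions import Fraction

# Hypothetical point masses defining finite measures on X and Y.
mu = {"x1": Fraction(1, 2), "x2": Fraction(3), "x3": Fraction(0)}
nu = {"y1": Fraction(2), "y2": Fraction(1, 3)}


def measure(weights, subset):
    """Measure of a subset under the weighted counting measure `weights`."""
    return sum((weights[p] for p in subset), Fraction(0))


def x_section(E, x):
    """[E]_x = {y : (x, y) in E}."""
    return {y for (a, y) in E if a == x}


def y_section(E, y):
    """[E]^y = {x : (x, y) in E}."""
    return {x for (x, b) in E if b == y}


def integrate_dnu_then_dmu(E):
    """Integral over X of nu([E]_x) with respect to mu."""
    return sum((measure(nu, x_section(E, x)) * mu[x] for x in mu), Fraction(0))


def integrate_dmu_then_dnu(E):
    """Integral over Y of mu([E]^y) with respect to nu."""
    return sum((measure(mu, y_section(E, y)) * nu[y] for y in nu), Fraction(0))


E = {("x1", "y1"), ("x2", "y1"), ("x2", "y2"), ("x3", "y2")}
lhs = integrate_dnu_then_dmu(E)
rhs = integrate_dmu_then_dnu(E)
```

Both orders of integration give the same value for this `E`, as the theorem predicts; exact rational arithmetic is used so the comparison is not affected by rounding.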

29 Product Measures

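Definition 29.1. Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be measure spaces. Assume $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite. The product measure of $\mu$ and $\nu$ is the set function $\mu \times \nu : \mathcal{S} \otimes \mathcal{T} \to [0, \infty]$ defined by
\[ (\mu \times \nu)(E) = \int_X \int_Y \mathbf{1}_E(x, y)\,d\nu(y)\,d\mu(x). \]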

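Theorem 29.2. Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be measure spaces. Assume $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ are σ-finite. Then $\mu \times \nu$ is a σ-finite measure on $\mathcal{S} \otimes \mathcal{T}$. Moreover, $\mu \times \nu$ is the unique measure on $\mathcal{S} \otimes \mathcal{T}$ such that
\[ (\mu \times \nu)(A \times B) = \mu(A)\nu(B) \]
for all $A \in \mathcal{S}$ and $B \in \mathcal{T}$.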

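Proof. Step 1: $\mu \times \nu$ is a measure on $\mathcal{S} \otimes \mathcal{T}$. Since $\mathbf{1}_{\emptyset} = 0$, we have
\[ (\mu \times \nu)(\emptyset) = \int \int \mathbf{1}_{\emptyset}\,d\nu\,d\mu = \int \int 0\,d\nu\,d\mu = 0. \]

Now we check that $\mu \times \nu$ is countably additive. Let $E_1, E_2, \ldots \in \mathcal{S} \otimes \mathcal{T}$ be disjoint. Then
\[ (\mu \times \nu)\left( \bigcup_{i=1}^{\infty} E_i \right) = \int \int \mathbf{1}_{\bigcup_{i=1}^{\infty} E_i}\,d\nu\,d\mu = \int \nu\left( \left[ \bigcup_{i=1}^{\infty} E_i \right]_x \right) d\mu = \int \nu\left( \bigcup_{i=1}^{\infty} [E_i]_x \right) d\mu = \int \sum_{i=1}^{\infty} \nu([E_i]_x)\,d\mu. \]

By applying the monotone convergence theorem to the sequence of partial sums of the series, we get
\[ (\mu \times \nu)\left( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} \int \nu([E_i]_x)\,d\mu = \sum_{i=1}^{\infty} \int \int \mathbf{1}_{E_i}(x, y)\,d\nu\,d\mu = \sum_{i=1}^{\infty} (\mu \times \nu)(E_i). \]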

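Step 2: $(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$ for all $A \in \mathcal{S}$ and $B \in \mathcal{T}$. If $A \in \mathcal{S}$ and $B \in \mathcal{T}$, then
\[ (\mu \times \nu)(A \times B) = \int \int \mathbf{1}_{A \times B}\,d\nu\,d\mu = \int \nu([A \times B]_x)\,d\mu(x) = \int \nu(B)\mathbf{1}_A(x)\,d\mu(x) = \mu(A)\nu(B). \]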

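Step 3: $\mu \times \nu$ is σ-finite. Since $\mu$ is σ-finite, there exist $X_1, X_2, \ldots \in \mathcal{S}$ such that $\bigcup_{i=1}^{\infty} X_i = X$ and $\mu(X_i) < \infty$ for all $i$. Since $\nu$ is σ-finite, there exist $Y_1, Y_2, \ldots \in \mathcal{T}$ such that $\bigcup_{j=1}^{\infty} Y_j = Y$ and $\nu(Y_j) < \infty$ for all $j$. Then $\bigcup_{(i,j) \in \mathbb{N} \times \mathbb{N}} (X_i \times Y_j) = X \times Y$ and $X_i \times Y_j \in \mathcal{S} \otimes \mathcal{T}$ and $(\mu \times \nu)(X_i \times Y_j) = \mu(X_i)\nu(Y_j) < \infty$ for all $(i, j) \in \mathbb{N} \times \mathbb{N}$.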

Step 4: Uniqueness. Let π and ρ be measures on S ⊗ T that both satisfy π(A × B) = µ(A)ν(B) and ρ(A × B) = µ(A)ν(B) for all A ∈ S and B ∈ T . We need to show π = ρ. By arguing as in Step 3, we see that π and ρ are σ-finite. We first treat the case where π and ρ are finite. Then we treat the case where π and ρ are σ-finite.

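Case 1: $\pi$ and $\rho$ are finite. Define $\mathcal{M} = \{ E \in \mathcal{S} \otimes \mathcal{T} : \pi(E) = \rho(E) \}$. To show that $\pi = \rho$, it suffices to show $\mathcal{S} \otimes \mathcal{T} \subseteq \mathcal{M}$. Let $\mathcal{A}$ be the collection of finite unions of disjoint measurable rectangles. It is easy to check that $\mathcal{A}$ is an algebra and $\mathcal{S} \otimes \mathcal{T} = \sigma(\mathcal{A})$. By the monotone class lemma, $\mathcal{S} \otimes \mathcal{T} = \mathcal{M}(\mathcal{A})$. If we show that $\mathcal{M}$ is a monotone class that contains $\mathcal{A}$, it will follow that $\mathcal{S} \otimes \mathcal{T} \subseteq \mathcal{M}$, and we will be done. Using the additivity of $\pi$ and $\rho$, it is easy to check that $\mathcal{A} \subseteq \mathcal{M}$. So it remains to check that $\mathcal{M}$ is a monotone class. First we check that $\mathcal{M}$ is closed under countable increasing unions. Suppose $E_1 \subseteq E_2 \subseteq \ldots$ belong to $\mathcal{M}$ and let $E = \bigcup_{n=1}^{\infty} E_n$. By continuity from below,
\[ \pi(E) = \lim_{n} \pi(E_n) = \lim_{n} \rho(E_n) = \rho(E). \]

Thus $E \in \mathcal{M}$. Now we check that $\mathcal{M}$ is closed under countable decreasing intersections. Suppose $E_1 \supseteq E_2 \supseteq \ldots$ belong to $\mathcal{M}$ and let $E = \bigcap_{n=1}^{\infty} E_n$. Since $\pi(E_1) < \infty$ and $\rho(E_1) < \infty$, continuity from above gives
\[ \pi(E) = \lim_{n} \pi(E_n) = \lim_{n} \rho(E_n) = \rho(E). \]

Thus $E \in \mathcal{M}$. Therefore $\mathcal{M}$ is a monotone class.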

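Case 2: $\pi$ and $\rho$ are σ-finite. As in Step 3, choose $X_1, X_2, \ldots \in \mathcal{S}$ and $Y_1, Y_2, \ldots \in \mathcal{T}$ such that $\bigcup_{(i,j) \in \mathbb{N} \times \mathbb{N}} (X_i \times Y_j) = X \times Y$, $\mu(X_i) < \infty$ for all $i$, and $\nu(Y_j) < \infty$ for all $j$. For each $k \in \mathbb{N}$, let $Z_k$ be the union of those sets $X_i \times Y_j$ with $i + j \le k$, i.e., $Z_k = \bigcup_{(i,j) : i+j \le k} (X_i \times Y_j)$. Then $Z_1 \subseteq Z_2 \subseteq \ldots$ is an increasing sequence of sets in $\mathcal{S} \otimes \mathcal{T}$ such that $\bigcup_{k=1}^{\infty} Z_k = X \times Y$,
\[ \pi(Z_k) \le \sum_{(i,j) : i+j \le k} \pi(X_i \times Y_j) = \sum_{(i,j) : i+j \le k} \mu(X_i)\nu(Y_j) < \infty, \]
and
\[ \rho(Z_k) \le \sum_{(i,j) : i+j \le k} \rho(X_i \times Y_j) = \sum_{(i,j) : i+j \le k} \mu(X_i)\nu(Y_j) < \infty. \]

For each $k \in \mathbb{N}$, define $\pi_k$ and $\rho_k$ on $\mathcal{S} \otimes \mathcal{T}$ by $\pi_k(E) = \pi(E \cap Z_k)$ and $\rho_k(E) = \rho(E \cap Z_k)$. It is easy to check that $\pi_k$ and $\rho_k$ are finite measures and that $\pi_k(A \times B) = \rho_k(A \times B)$ for all $A \in \mathcal{S}$ and $B \in \mathcal{T}$ (indeed, $(A \times B) \cap Z_k$ is a finite union of measurable rectangles, so it belongs to the algebra $\mathcal{A}$ from Case 1, and $\pi$ and $\rho$ agree on $\mathcal{A}$ by additivity). Therefore, by applying the argument of Case 1 to $\pi_k$ and $\rho_k$, we get that $\pi_k(E) = \rho_k(E)$ for all $E \in \mathcal{S} \otimes \mathcal{T}$. Then, by continuity from below,
\[ \pi(E) = \lim_{k} \pi_k(E) = \lim_{k} \rho_k(E) = \rho(E). \]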
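
To see Definition 29.1 and the rectangle formula of Theorem 29.2 in a concrete case, one can again use finite spaces. This is a sketch under the assumption of weighted counting measures; the dictionaries `mu` and `nu`, the function name `product_measure`, and the sets `A`, `B`, `E1`, `E2` are hypothetical choices made for the illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical point masses on X = {0, 1} and Y = {"a", "b"}.
mu = {0: Fraction(1, 4), 1: Fraction(3, 4)}
nu = {"a": Fraction(2), "b": Fraction(5)}


def product_measure(E):
    """(mu x nu)(E), computed as the iterated integral of the indicator of E."""
    total = Fraction(0)
    for x, weight_x in mu.items():
        # Inner integral: nu([E]_x), the nu-measure of the x-section of E.
        inner = sum((w for y, w in nu.items() if (x, y) in E), Fraction(0))
        total += inner * weight_x
    return total


A, B = {1}, {"a", "b"}
rectangle = set(product(A, B))
mu_A = sum((mu[x] for x in A), Fraction(0))
nu_B = sum((nu[y] for y in B), Fraction(0))

# Two disjoint sets, to illustrate additivity.
E1 = {(0, "a")}
E2 = {(1, "b"), (0, "b")}
```

With these choices `product_measure(rectangle)` equals `mu_A * nu_B`, and `product_measure(E1 | E2)` equals `product_measure(E1) + product_measure(E2)`.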

30 Tonelli’s Theorem and Fubini’s Theorem

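Theorem 30.1. (Tonelli’s Theorem) Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be σ-finite measure spaces. If $f : X \times Y \to [0, \infty]$ is an $\mathcal{S} \otimes \mathcal{T}$-measurable function, then

(a) $x \mapsto \int_Y f(x, y)\,d\nu(y)$ is an $\mathcal{S}$-measurable function,

(b) $y \mapsto \int_X f(x, y)\,d\mu(x)$ is a $\mathcal{T}$-measurable function,

(c) $\int f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y)$.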

Proof. By Theorem 28.4 and the definition of product measure, the theorem holds when f is the indicator function of an S ⊗ T -measurable set. Since linear combinations of measurable functions are measurable and since the integral is linear, the theorem also holds when f is a non-negative measurable simple function. If f is a non-negative extended-real-valued S ⊗ T -measurable function, we can find a sequence of non-negative measurable simple functions which increases to f pointwise, and so the fact that limits of measurable functions are measurable and the monotone convergence theorem imply that the theorem holds for f in this case as well.

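Theorem 30.2. (Fubini’s Theorem) Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be σ-finite measure spaces. Suppose $f : X \times Y \to [-\infty, \infty]$ (or $f : X \times Y \to \mathbb{C}$) is $\mathcal{S} \otimes \mathcal{T}$-measurable and $\mu \times \nu$-integrable. Then:

(a) $[f]_x$ is $\nu$-integrable (i.e., $\int_Y |f(x, y)|\,d\nu(y) < \infty$) for $\mu$-almost-every $x \in X$.

(b) $[f]^y$ is $\mu$-integrable (i.e., $\int_X |f(x, y)|\,d\mu(x) < \infty$) for $\nu$-almost-every $y \in Y$.

(c) The function
\[ I(x) = \begin{cases} \int_Y f(x, y)\,d\nu(y) & \text{if } [f]_x \text{ is } \nu\text{-integrable} \\ 0 & \text{otherwise} \end{cases} \]
is $\mathcal{S}$-measurable and $\mu$-integrable.

(d) The function
\[ J(y) = \begin{cases} \int_X f(x, y)\,d\mu(x) & \text{if } [f]^y \text{ is } \mu\text{-integrable} \\ 0 & \text{otherwise} \end{cases} \]
is $\mathcal{T}$-measurable and $\nu$-integrable.

(e) $\int f\,d(\mu \times \nu) = \int_X I(x)\,d\mu(x) = \int_Y J(y)\,d\nu(y)$.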

Remark: Note that the function $x \mapsto \int_Y f(x, y)\,d\nu(y)$ may not be defined for every $x \in X$. Thus we need to work with $I(x)$ instead. Likewise with the function $y \mapsto \int_X f(x, y)\,d\mu(x)$ and $J(y)$. It is standard convention to identify these functions, so that (e) can be written as
\[ \int f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y). \]

Proof. The plan is to apply Tonelli’s theorem to the functions $|f|$, $f^+$, and $f^-$. The function $|f|$ is non-negative and $\mathcal{S} \otimes \mathcal{T}$-measurable. Tonelli’s theorem implies that $x \mapsto \int_Y |f(x, y)|\,d\nu(y)$ is an $\mathcal{S}$-measurable function, $y \mapsto \int_X |f(x, y)|\,d\mu(x)$ is a $\mathcal{T}$-measurable function, and
\[ \int |f|\,d(\mu \times \nu) = \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) = \int_Y \int_X |f(x, y)|\,d\mu(x)\,d\nu(y). \]

The assumption that $f$ is $\mu \times \nu$-integrable means that all three expressions above are finite. Now we apply Tonelli’s theorem to the positive and negative parts of $f$. The functions $f^+$ and $f^-$ are non-negative and $\mathcal{S} \otimes \mathcal{T}$-measurable. Tonelli’s theorem implies that the functions $I^+$ and $I^-$ defined by
\[ I^+(x) = \int_Y f^+(x, y)\,d\nu(y), \qquad I^-(x) = \int_Y f^-(x, y)\,d\nu(y) \]
are both $\mathcal{S}$-measurable functions. Note that
\[ I^+(x) = \int [f^+]_x\,d\nu = \int ([f]_x)^+\,d\nu \]
and
\[ I^-(x) = \int [f^-]_x\,d\nu = \int ([f]_x)^-\,d\nu. \]

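Thus $[f]_x$ is $\nu$-integrable iff $I^+(x)$ and $I^-(x)$ are both finite. Since $f^+ \le |f|$ and $f^- \le |f|$, we have
\[ \int_X I^+(x)\,d\mu(x) \le \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty \]
and
\[ \int_X I^-(x)\,d\mu(x) \le \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty. \]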

Thus $I^+$ and $I^-$ are $\mu$-integrable. It follows that $I^+(x)$ and $I^-(x)$ are finite for $\mu$-almost-every $x \in X$. Equivalently, $[f]_x$ is $\nu$-integrable for $\mu$-almost-every $x \in X$. So (a) is proved. Let $A$ be the set of those $x \in X$ such that both $I^+(x)$ and $I^-(x)$ are finite. Equivalently, $A$ is the set of those $x \in X$ such that $[f]_x$ is $\nu$-integrable. For each $x \in A$, we have
\[ I(x) = \int_Y f(x, y)\,d\nu(y) = \int_Y f^+(x, y)\,d\nu(y) - \int_Y f^-(x, y)\,d\nu(y) = I^+(x) - I^-(x). \]

Thus $I = I^+ \mathbf{1}_A - I^- \mathbf{1}_A$. Since $I^+$ and $I^-$ are $\mathcal{S}$-measurable, it follows that $A \in \mathcal{S}$. Therefore $I$ is $\mathcal{S}$-measurable. Moreover, since $I^+$ and $I^-$ are $\mu$-integrable, so are $I^+ \mathbf{1}_A$ and $I^- \mathbf{1}_A$, and hence so is $I$. So (c) is proved. The proofs of (b) and (d) are similar. It remains to prove (e). First observe that, since $\mu(A^c) = 0$,
\[ \int I\,d\mu = \int I^+ \mathbf{1}_A\,d\mu - \int I^- \mathbf{1}_A\,d\mu = \int I^+\,d\mu - \int I^-\,d\mu. \]

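By applying Tonelli’s theorem to the functions $f^+$ and $f^-$, we have that
\[ \int f^+\,d(\mu \times \nu) = \int \int f^+\,d\nu\,d\mu = \int I^+\,d\mu \]
and
\[ \int f^-\,d(\mu \times \nu) = \int \int f^-\,d\nu\,d\mu = \int I^-\,d\mu. \]

Therefore
\[ \int f\,d(\mu \times \nu) = \int f^+\,d(\mu \times \nu) - \int f^-\,d(\mu \times \nu) = \int I^+\,d\mu - \int I^-\,d\mu = \int I\,d\mu. \]

A similar argument gives
\[ \int f\,d(\mu \times \nu) = \int J\,d\nu. \]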
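
The integrability hypothesis in Fubini’s theorem cannot simply be dropped. A standard illustration (not taken from these notes) uses counting measure on $\mathbb{N} \times \mathbb{N}$ and the function with $f(m, n) = 1$ if $n = m$, $f(m, n) = -1$ if $n = m + 1$, and $f(m, n) = 0$ otherwise. The sketch below computes the two iterated sums; the inner sums are exact because each row and column has at most two non-zero entries, while the outer sums are truncated at a hypothetical cutoff `N`.

```python
def f(m, n):
    """f(m, n) = 1 on the diagonal, -1 just to the right of it, 0 elsewhere."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0


def row_sum(m):
    """Sum over all n >= 1 of f(m, n); only n = m and n = m + 1 contribute."""
    return sum(f(m, n) for n in (m, m + 1))


def col_sum(n):
    """Sum over all m >= 1 of f(m, n); only m = n - 1 (if >= 1) and m = n contribute."""
    return sum(f(m, n) for m in (n - 1, n) if m >= 1)


N = 1000  # cutoff for the outer sums (an arbitrary choice)

sum_over_m_of_row_sums = sum(row_sum(m) for m in range(1, N + 1))
sum_over_n_of_col_sums = sum(col_sum(n) for n in range(1, N + 1))

# Partial sum of |f|: it grows linearly in N, so f is not integrable.
abs_partial_sum = sum(abs(f(m, n)) for m in range(1, N + 1) for n in (m, m + 1))
```

Every row sums to 0, whereas the first column sums to 1 and all other columns sum to 0, so the two iterated sums are 0 and 1 regardless of the cutoff. This does not contradict Fubini’s theorem, because the sum of $|f|$ diverges.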

31 Lebesgue Measure on Rn

Let $\mathcal{B}(\mathbb{R}^n)$ denote the Borel σ-algebra on $\mathbb{R}^n$. It is an exercise to show that $\mathcal{B}(\mathbb{R}^m) \otimes \mathcal{B}(\mathbb{R}^n) = \mathcal{B}(\mathbb{R}^{m+n})$. Note that the Lebesgue measure on $\mathbb{R}$ is σ-finite. The Lebesgue measure $\lambda_n$ on $\mathcal{B}(\mathbb{R}^n)$ is defined to be the product measure
\[ \lambda_n = \lambda \times \cdots \times \lambda \quad (n \text{ factors}). \]

Tonelli’s Theorem and Fubini’s Theorem hold for Lebesgue measure on $\mathcal{B}(\mathbb{R}^n)$.

It is possible to define the Lebesgue measure on the larger Lebesgue σ-algebra on $\mathbb{R}^n$. And it is possible to prove Tonelli’s Theorem and Fubini’s Theorem in that case. But it is a bit complicated. This is explored in the homework exercises.

32 Probability Theory: Basic Definitions

Definition 32.1. A probability space is a measure space (Ω,A, P ) such that P (Ω) = 1. In such case, the measure P is called a probability measure on (Ω,A), the elements of Ω are called elementary outcomes or sample points, the set Ω is called the sample space, the sets A ∈ A are called events, and, for each A ∈ A, the number P (A) is called the probability of the event A.

Example 32.2. Flip two fair coins. There are four possible outcomes: we get two heads; we get a head on the first coin and a tail on the second coin; we get a tail on the first coin and a head on the second coin; we get two tails. We can take the sample space to be $\Omega = \{HH, HT, TH, TT\}$ and the collection of events to be $\mathcal{A} = \mathcal{P}(\Omega)$. For example, one event is $\{HT, TH\}$; it represents the event where we get exactly one head. Since the coins are fair, each elementary outcome has probability 1/4, so we define the probability of each event $A$ to be 1/4 times the number of elements of $A$. Thus
\[ P(\text{exactly one head}) = P(\{HT, TH\}) = \frac{1}{4} \cdot 2 = \frac{1}{2}. \]

Definition 32.3. Let (Ω,A, P ) be a probability space. A random variable on (Ω,A, P ) is an A-measurable function X : Ω → R.

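Example 32.4. We continue the previous example. Define $X$ to be the number of heads, i.e.,
\[ X(\omega) = \begin{cases} 0 & \text{if } \omega = TT \\ 1 & \text{if } \omega = HT \text{ or } \omega = TH \\ 2 & \text{if } \omega = HH \end{cases} \]

(Since the σ-algebra $\mathcal{A}$ is the collection of all subsets of $\Omega$, any function from $\Omega$ to $\mathbb{R}$ is $\mathcal{A}$-measurable. Thus $X$ is $\mathcal{A}$-measurable. So $X$ is a random variable.)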
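
The two-coin example is small enough to write out completely. The following sketch mirrors Examples 32.2 and 32.4; the names `omega`, `all_events`, `P`, and `X` are hypothetical labels for the sample space, the collection of events, the probability measure, and the random variable.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = ["HH", "HT", "TH", "TT"]  # sample space


def all_events(sample_space):
    """The power set of the sample space, as a list of frozensets."""
    subsets = chain.from_iterable(
        combinations(sample_space, r) for r in range(len(sample_space) + 1)
    )
    return [frozenset(s) for s in subsets]


def P(event):
    """Probability of an event: 1/4 times the number of outcomes in it."""
    return Fraction(len(set(event)), 4)


def X(outcome):
    """The random variable 'number of heads'."""
    return outcome.count("H")


events = all_events(omega)
exactly_one_head = {w for w in omega if X(w) == 1}  # the preimage of {1} under X
```

The event `exactly_one_head` is the set of outcomes HT and TH, and `P(exactly_one_head)` is 1/2, matching the computation in Example 32.2.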

Notation 32.5. Probabilists have an aversion to displaying the arguments of random variables. For example, it is common to write $X \ge a$ instead of $\{\omega : X(\omega) \ge a\} = X^{-1}([a, \infty))$ and $P(X \ge a)$ instead of $P(\{\omega : X(\omega) \ge a\}) = P(X^{-1}([a, \infty)))$.

Definition 32.6. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$. The distribution of $X$ is the set function $P_X : \mathcal{B}(\mathbb{R}) \to [0, 1]$ defined by
\[ P_X(B) = P(X^{-1}(B)) = P(X \in B) \]
for each $B \in \mathcal{B}(\mathbb{R})$. The next theorem says that $P_X$ is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

Theorem 32.7. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$. The distribution of $X$, $P_X$, is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

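Proof. We have
\[ P_X(\emptyset) = P(X^{-1}(\emptyset)) = P(\emptyset) = 0, \qquad P_X(\mathbb{R}) = P(X^{-1}(\mathbb{R})) = P(\Omega) = 1. \]

To complete the proof, we need to show that $P_X$ is countably additive. Suppose $B_1, B_2, \ldots \in \mathcal{B}(\mathbb{R})$ are disjoint. Then $X^{-1}(B_1), X^{-1}(B_2), \ldots$ are disjoint and $X^{-1}\left( \bigcup_{i=1}^{\infty} B_i \right) = \bigcup_{i=1}^{\infty} X^{-1}(B_i)$. Therefore
\[ P_X\left( \bigcup_{i=1}^{\infty} B_i \right) = P\left( X^{-1}\left( \bigcup_{i=1}^{\infty} B_i \right) \right) = P\left( \bigcup_{i=1}^{\infty} X^{-1}(B_i) \right) = \sum_{i=1}^{\infty} P\left( X^{-1}(B_i) \right) = \sum_{i=1}^{\infty} P_X(B_i). \]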

Definition 32.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$. The cumulative distribution function or cdf of $X$ is the function $F_X : \mathbb{R} \to [0, 1]$ defined by
\[ F_X(t) = P_X((-\infty, t]) = P(X \le t) \]
for all $t \in \mathbb{R}$. When no confusion is possible, we denote the cdf of $X$ by $F$ instead of $F_X$.

Definition 32.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$. We say that $X$ is a continuous random variable if there is a function $f_X : \mathbb{R} \to [0, \infty)$ such that $f_X$ is $\mathcal{B}(\mathbb{R})$-measurable and
\[ P(X \in B) = \int_B f_X\,d\lambda = \int f_X \mathbf{1}_B\,d\lambda \]
for every $B \in \mathcal{B}(\mathbb{R})$. The function $f_X$ is called the probability density function or pdf of $X$. When no confusion is possible, we denote the pdf of $X$ by $f$ instead of $f_X$.

Theorem 32.10. Let $X$ be a continuous random variable on a probability space $(\Omega, \mathcal{A}, P)$. Let $f$ be the pdf of $X$. Let $F$ be the cdf of $X$.

1. $F(t) = \int_{-\infty}^{t} f\,d\lambda$ for every $t \in \mathbb{R}$.

2. $P(X = t) = 0$ for all $t \in \mathbb{R}$.

3. $F$ is continuous.

4. The pdf of $X$ is unique up to $\lambda$-a.e. equality. In other words, if $g : \mathbb{R} \to [0, \infty)$ is $\mathcal{B}(\mathbb{R})$-measurable and $P(X \in B) = \int_B g\,d\lambda$ for all $B \in \mathcal{B}(\mathbb{R})$, then $f = g$ $\lambda$-a.e.

5. $f(t) = F'(t)$ for every $t \in \mathbb{R}$ at which $f$ is continuous.

6. $f(t) = F'(t)$ for $\lambda$-a.e. $t \in \mathbb{R}$.

Proof. Items 1-5 are exercises. The proof of item 6 is difficult and is omitted for now.

Here are some examples of continuous random variables.

Example 32.11. (a) A random variable $X$ is said to have a standard normal distribution if it has pdf $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Then
\[ P(c \le X \le d) = \int_c^d \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx. \]

(b) A random variable $X$ is said to be uniformly distributed on the interval $[a, b]$ if it has pdf $f = \frac{1}{b-a} \mathbf{1}_{[a,b]}$. For each interval $[c, d] \subseteq [a, b]$, we have
\[ P(c \le X \le d) = \int \frac{1}{b-a} \mathbf{1}_{[a,b]} \mathbf{1}_{[c,d]}\,dx = \frac{d-c}{b-a}. \]

In words, the probability that X lies in the interval [c, d] ⊆ [a, b] is the relative length of the interval [c, d].
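
Both formulas in Example 32.11 can be checked numerically. The sketch below is an approximation, not a proof: it uses a plain midpoint rule (`midpoint_integral`, a hypothetical helper) and, for the normal case, compares against the standard identity $\Phi(t) = \frac{1}{2}\left(1 + \operatorname{erf}(t/\sqrt{2})\right)$ for the standard normal cdf. The endpoints are arbitrary choices.

```python
import math


def normal_pdf(x):
    """Standard normal density (1 / sqrt(2 pi)) * exp(-x^2 / 2)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_cdf(t):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0


def midpoint_integral(g, c, d, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [c, d]."""
    h = (d - c) / steps
    return h * sum(g(c + (k + 0.5) * h) for k in range(steps))


# P(-1 <= X <= 2) for a standard normal X.
p_normal = midpoint_integral(normal_pdf, -1.0, 2.0)

# P(1 <= X <= 3) for X uniform on [0, 4]; the formula gives (3 - 1) / (4 - 0).
p_uniform = midpoint_integral(lambda x: uniform_pdf(x, 0.0, 4.0), 1.0, 3.0)
```

The value `p_normal` should agree with `normal_cdf(2.0) - normal_cdf(-1.0)` up to the error of the midpoint rule, and `p_uniform` should be close to 1/2.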

Definition 32.12. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. Let $\mu_c$ be the counting measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We say $X$ is a discrete random variable if there is a function $f_X : \mathbb{R} \to [0, \infty)$ such that
\[ P(X \in B) = \int_B f_X\,d\mu_c = \int f_X \mathbf{1}_B\,d\mu_c \]
for every $B \in \mathcal{B}(\mathbb{R})$. In such case, $f_X$ is called the probability mass function or pmf of $X$. When no confusion is possible, we denote the pmf of $X$ by $f$ instead of $f_X$.

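Theorem 32.13. Let $X$ be a discrete random variable on a probability space $(\Omega, \mathcal{A}, P)$. Let $S = \{ s \in \mathbb{R} : P(X = s) > 0 \}$.

1. $S$ is countable.

2. The probability mass function of $X$ is given by
\[ f(t) = P(X = t) = \sum_{s \in S} P(X = s) \mathbf{1}_{\{s\}}(t) \]
for all $t \in \mathbb{R}$.

3. $P_X(B) = \sum_{s \in S \cap B} f(s)$ for every $B \in \mathcal{B}(\mathbb{R})$.

4. $P_X = \sum_{s \in S} f(s) \delta_s$.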

Proof. Exercise.

Theorem 32.14. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. The following are equivalent:

(a) $X$ is discrete.

(b) The set $S = \{ s \in \mathbb{R} : P(X = s) > 0 \}$ is countable.

(c) The distribution of $X$ is a countable weighted sum of point masses. More precisely, $P_X = \sum_{i=1}^{\infty} c_i \delta_{s_i}$ for some sequences $c_1, c_2, \ldots \in [0, \infty)$ and $s_1, s_2, \ldots \in \mathbb{R}$.

In particular, if the range of $X$ is countable, then $X$ is discrete.

Proof. Exercise.

Here are some examples of discrete random variables.

Example 32.15. (a)

(b)

33 Theory of Cumulative Distribution Functions

We seek to answer two questions.

Question 1: Which functions can be cumulative distribution functions?

Question 2: We know the distribution PX determines the cumulative distribution function FX . Does the cumulative distribution function FX determine the distribution PX?

First we note some properties of cumulative distribution functions.

Theorem 33.1. If X is a random variable on a probability space (Ω,A, P ), then the cumulative distribution function of X, FX , has the following properties.

(a) (Monotone Increasing) If s ≤ t are real numbers, then FX(s) ≤ FX(t).

(b) (Right-Continuous) limt→a+ FX(t) = FX(a) for all a ∈ R.

(c) limt→−∞ FX(t) = 0

(d) limt→∞ FX(t) = 1

Proof. (a): By monotonicity of the measure $P$, if $s \le t$, then
\[ F_X(s) = P(\{\omega : X(\omega) \le s\}) \le P(\{\omega : X(\omega) \le t\}) = F_X(t). \]

(b): If $(a_n)$ is any sequence of real numbers which decreases to $a$, then the sets $\{\omega : X(\omega) \le a_n\}$ decrease to $\{\omega : X(\omega) \le a\}$, and so the fact that $P$ is a finite measure and continuity from above give
\[ \lim_{n} F_X(a_n) = \lim_{n} P(\{\omega : X(\omega) \le a_n\}) = P(\{\omega : X(\omega) \le a\}) = F_X(a). \]

Thus $\lim_{t \to a^+} F_X(t) = F_X(a)$. (c) and (d): Similar to (b). The details are an exercise.

Theorem 33.2. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. Let $a \le b$ be real numbers. Let $F$ be the cdf of $X$.

(a) $P_X((a, b]) = F(b) - F(a)$.

(b) $P_X((-\infty, a)) = \lim_{t \to a^-} F(t)$.

(c) $P_X(\{a\}) = F(a) - \lim_{t \to a^-} F(t)$.

(d) $P_X([a, b]) = F(b) - \lim_{t \to a^-} F(t)$.

(e) $P_X((a, b)) = \lim_{t \to b^-} F(t) - F(a)$.

(f) $P_X([a, b)) = \lim_{t \to b^-} F(t) - \lim_{t \to a^-} F(t)$.

(g) $P_X((a, \infty)) = 1 - F(a)$.

(h) $P_X([a, \infty)) = 1 - \lim_{t \to a^-} F(t)$.

Proof. Exercise.

The previous theorem tells us that the cumulative distribution function FX determines the distribution PX for intervals.
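
Several of the identities in Theorem 33.2 can be checked on the number-of-heads random variable from Example 32.4. The sketch below is only a sanity check for that one example. It assumes the pmf values 1/4, 1/2, 1/4 at 0, 1, 2, and it evaluates left limits by stepping a quarter unit to the left, which is legitimate here only because the jumps of this particular $F$ sit at integers and the test points are integers. The helper names are hypothetical.

```python
from fractions import Fraction

# pmf of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def P_X(predicate):
    """P_X of the set {s : predicate(s)}."""
    return sum((p for s, p in pmf.items() if predicate(s)), Fraction(0))


def F(t):
    """cdf: F(t) = P(X <= t)."""
    return P_X(lambda s: s <= t)


def F_left(t, delta=Fraction(1, 4)):
    """Left limit of F at t, valid when F has no jump in [t - delta, t)."""
    return F(t - delta)


a, b = 1, 2
checks = {
    "(a)": P_X(lambda s: a < s <= b) == F(b) - F(a),
    "(c)": P_X(lambda s: s == a) == F(a) - F_left(a),
    "(d)": P_X(lambda s: a <= s <= b) == F(b) - F_left(a),
    "(e)": P_X(lambda s: a < s < b) == F_left(b) - F(a),
    "(g)": P_X(lambda s: s > a) == 1 - F(a),
}
```

Each entry of `checks` compares the left-hand side, computed directly from the pmf, with the right-hand side, computed from the cdf alone.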

The next theorem is the key to answering the questions above.

Theorem 33.3. If F : R → [0, 1] satisfies properties (a)-(d) above, then there is a unique probability measure µF on (R,B(R)) such that F (t) = µF ((−∞, t]) for all t ∈ R.

Proof. We give only an outline. The details are left as an exercise to the reader. To prove the existence of $\mu_F$, we mimic the construction of Lebesgue measure using Caratheodory’s restriction theorem. For each interval $(a, b]$ in $\mathbb{R}$, define
\[ \ell_F((a, b]) = F(b) - F(a). \]

For each $B \in \mathcal{P}(\mathbb{R})$, define
\[ \mu_F^*(B) = \inf\left\{ \sum_{i=1}^{\infty} \ell_F(I_i) : I_1, I_2, \ldots \text{ are intervals of the form } (a, b] \text{ and } B \subseteq \bigcup_{i=1}^{\infty} I_i \right\}. \]

By mimicking the argument for Lebesgue outer measure, it is easy to check that $\mu_F^*$ is an outer measure on $\mathbb{R}$ and $\mu_F^*(I) = \ell_F(I)$ for every interval $I$ of the form $(a, b]$ (exercise). The latter implies that $F(t) = \mu_F^*((-\infty, t])$ for all $t \in \mathbb{R}$ and $\mu_F^*(\mathbb{R}) = 1$. Now the Caratheodory restriction theorem implies that the collection
\[ \mathcal{M}_F^* = \{ E \in \mathcal{P}(\mathbb{R}) : \mu_F^*(A) = \mu_F^*(A \cap E) + \mu_F^*(A \cap E^c) \text{ for all } A \in \mathcal{P}(\mathbb{R}) \} \]
of $\mu_F^*$-measurable sets is a σ-algebra on $\mathbb{R}$ and that the restriction of $\mu_F^*$ to $\mathcal{M}_F^*$ is a measure. Denote this measure by $\mu_F$. Note $\mu_F$ is a measure on $(\mathbb{R}, \mathcal{M}_F^*)$. We want to have a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. By mimicking the proof that $\mathcal{B}(\mathbb{R}) \subseteq \mathcal{L}(\mathbb{R})$, it is easy to show that $\mathcal{B}(\mathbb{R}) \subseteq \mathcal{M}_F^*$ (exercise). Thus we can restrict $\mu_F$ to $\mathcal{B}(\mathbb{R})$ to obtain a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We also denote this restricted measure by $\mu_F$. Since we have noted that $F(t) = \mu_F^*((-\infty, t])$ for all $t \in \mathbb{R}$ and $\mu_F^*(\mathbb{R}) = 1$, and since $(-\infty, t]$ is a Borel set, we see that $F(t) = \mu_F((-\infty, t])$ and $\mu_F$ is a probability measure. This completes the proof of the existence of $\mu_F$. One way to prove uniqueness is with the monotone class lemma. Another way is given in the proof of Proposition 1.3.10 in Cohn’s book. The details are left as an exercise.

Now we can answer the questions above. The first two corollaries answer Question 1.

Corollary 33.4. Suppose F : R → [0, 1] satisfies properties (a)-(d) above and let µ be the probability measure from the previous theorem. If we define (Ω,A, P ) = (R,B(R), µ) and define X : Ω → R by X(ω) = ω, then F is the cumulative distribution function of the random variable X on (Ω,A, P ).

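Proof. We have
\begin{align*}
F_X(t) = P_X((-\infty, t]) &= P(X^{-1}((-\infty, t])) \\
&= P(\{\omega \in \Omega : X(\omega) \in (-\infty, t]\}) = P(\{\omega \in \Omega : \omega \in (-\infty, t]\}) \\
&= P((-\infty, t]) = \mu((-\infty, t]) = F(t).
\end{align*}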


Corollary 33.5. Let $F : \mathbb{R} \to [0, 1]$. Then $F$ satisfies properties (a)-(d) above iff there exists a random variable whose cumulative distribution function is $F$.

The final corollary answers Question 2.

Corollary 33.6. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. There is a unique probability measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $F_X(t) = \mu((-\infty, t])$ for all $t \in \mathbb{R}$, and $\mu = P_X$.

Proof. Apply the previous theorem to $F = F_X$ and use the uniqueness assertion to conclude $\mu = P_X$.

34 Probability Theory: Expected Value

From elementary probability and statistics, you may be familiar with the following formulas for the expected value of discrete and continuous random variables. If $X$ is a continuous random variable with pdf $f$, then

\[ E(X) = \int_{\mathbb{R}} x f(x)\,dx. \]

If $X$ is a discrete random variable with pmf $f$ and $S = \{s \in \mathbb{R} : P(X = s) > 0\}$, then

\[ E(X) = \sum_{s \in S} s f(s). \]

Our goal in this section is to show that these two formulas are actually special cases of a single formula.

We start with the general definition of expected value.

Definition 34.1. Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$. The expected value (or expectation or mean or average) of $X$ is defined to be

\[ E(X) = \int X\,dP \]

whenever the integral on the right is defined. Note that $E(X)$ may be $\pm\infty$.

Theorem 34.2. (Law of the Unconscious Statistician) Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$. Let $g : \mathbb{R} \to \mathbb{R}$ be a $\mathcal{B}(\mathbb{R})$-measurable function.

(a) $g(X) = g \circ X$ is a random variable on $(\Omega, \mathcal{A}, P)$.

(b) $g$ is $P_X$-integrable iff $g(X) = g \circ X$ is $P$-integrable.

(c) If $g$ is $P_X$-integrable, then

\[ E(g(X)) = \int (g \circ X)\,dP = \int_{\mathbb{R}} g\,dP_X. \]


(d) If either $\int_{\mathbb{R}} g\,dP_X = \pm\infty$ or $\int (g \circ X)\,dP = \pm\infty$, then

\[ E(g(X)) = \int (g \circ X)\,dP = \int_{\mathbb{R}} g\,dP_X. \]

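Proof. (a): We are given that $X : \Omega \to \mathbb{R}$ is $\mathcal{A}$-measurable and that $g : \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}(\mathbb{R})$-measurable. We must show that $g \circ X$ is $\mathcal{A}$-measurable. Let $B \in \mathcal{B}(\mathbb{R})$ be arbitrary. Since $g$ is $\mathcal{B}(\mathbb{R})$-measurable, we have $g^{-1}(B) \in \mathcal{B}(\mathbb{R})$. Then, since $X$ is $\mathcal{A}$-measurable, we have $X^{-1}(g^{-1}(B)) \in \mathcal{A}$. Finally, because $(g \circ X)^{-1}(C) = X^{-1}(g^{-1}(C))$ for all sets $C \subseteq \mathbb{R}$, we have

\[ (g \circ X)^{-1}(B) = X^{-1}(g^{-1}(B)) \in \mathcal{A}. \]

Thus $g \circ X$ is $\mathcal{A}$-measurable, that is, $g \circ X$ is a random variable on $(\Omega, \mathcal{A}, P)$.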

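(b): First suppose that $g$ is the indicator function of a set $B \in \mathcal{B}(\mathbb{R})$. Then $g \circ X$ is the indicator function of the set $X^{-1}(B)$ and

\[ \int g\,dP_X = \int 1_B\,dP_X = P_X(B) = P(X^{-1}(B)) = \int 1_{X^{-1}(B)}\,dP = \int g \circ X\,dP. \]

In summary,

\[ \int g\,dP_X = \int g \circ X\,dP. \]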

Using the linearity of the integral, we can see that this equality holds if $g$ is any non-negative simple $\mathcal{B}(\mathbb{R})$-measurable function. By the monotone convergence theorem, this equality also holds if $g$ is any non-negative $\mathcal{B}(\mathbb{R})$-measurable function. By applying the equality to the positive and negative parts of $g$ and noting that $(g \circ X)^+ = g^+ \circ X$ and $(g \circ X)^- = g^- \circ X$, we have

\[ \int g^+\,dP_X = \int (g \circ X)^+\,dP, \qquad \int g^-\,dP_X = \int (g \circ X)^-\,dP. \]

So we see that $g$ is $P_X$-integrable iff both $\int g^+\,dP_X$ and $\int g^-\,dP_X$ are finite iff both $\int (g \circ X)^+\,dP$ and $\int (g \circ X)^-\,dP$ are finite iff $g \circ X$ is $P$-integrable. This proves (b).

(c): The first equality is just the definition of expectation. Based on what we proved above, if $g$ is $P_X$-integrable, then

\[ \int g\,dP_X = \int g^+\,dP_X - \int g^-\,dP_X = \int (g \circ X)^+\,dP - \int (g \circ X)^-\,dP = \int g \circ X\,dP. \]

This proves the second equality.

(d): If $\int g\,dP_X = \pm\infty$, then one of $\int g^+\,dP_X$ and $\int g^-\,dP_X$ is infinite and one is finite, and we can do the same calculation as in (c). The other case is similar.
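On a finite probability space both integrals in Theorem 34.2 reduce to finite sums, so the identity $\int (g \circ X)\,dP = \int g\,dP_X$ can be checked exactly. The following is a minimal sketch under assumptions that are not in the notes: the space (two fair dice), the random variable `X` (their sum), and the function `g` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed finite probability space: two fair dice, P uniform on 36 outcomes.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable on omega: the sum of the two dice."""
    return w[0] + w[1]


def g(x):
    """Illustrative function applied to the values of X."""
    return (x - 7) ** 2


# Left side: integrate g composed with X over omega with respect to P.
lhs = sum(g(X(w)) * P[w] for w in omega)

# Pushforward measure on the values of X: P_X({x}) = P(X^{-1}({x})).
P_X = defaultdict(Fraction)
for w in omega:
    P_X[X(w)] += P[w]

# Right side: integrate g with respect to P_X.
rhs = sum(g(x) * p for x, p in P_X.items())

print(lhs, rhs)  # prints "35/6 35/6"
```

The left sum runs over the 36 outcomes in `omega`, while the right sum runs over the 11 values of `X`; grouping outcomes by the value of `X` is exactly the step $P_X(B) = P(X^{-1}(B))$ used for indicator functions in the proof of (b).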

Remark. Some authors use the notation $\int g(x)\,dF_X(x)$. This is just another notation for $\int g(x)\,dP_X(x)$.

By taking g(x) = x in the previous theorem, we get the following.


Corollary 34.3. If $X$ is a random variable on a probability space $(\Omega, \mathcal{A}, P)$, then

\[ E(X) = \int x\,dP_X. \]

Theorem 34.4. Let (Ω,A, P ) be a probability space. Let X be a random variable on (Ω,A, P ).

Theorem 34.5. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $X$ be a random variable on $(\Omega, \mathcal{A}, P)$. Let $g : \mathbb{R} \to \mathbb{R}$ be a $\mathcal{B}(\mathbb{R})$-measurable function.

(a) If $X$ is a continuous random variable with pdf $f$, then

\[ E(g(X)) = \int_{\mathbb{R}} g(x) f(x)\,dx \]

whenever the integral is defined.

(b) If $X$ is a discrete random variable with pmf $f$ and $S = \{s \in \mathbb{R} : P(X = s) > 0\}$, then

\[ E(g(X)) = \sum_{s \in S} g(s) f(s) \]

whenever the sum is defined.

Proof. (a): For all $B \in \mathcal{B}(\mathbb{R})$,

\[ P_X(B) = \int_B f\,d\lambda, \]

and this can be rewritten as

\[ \int 1_B\,dP_X = \int 1_B f\,d\lambda. \]

By the linearity of the integral, we have

\[ \int g\,dP_X = \int g f\,d\lambda \]

if $g$ is any non-negative simple $\mathcal{B}(\mathbb{R})$-measurable function. By the monotone convergence theorem, this equality also holds if $g$ is any non-negative $\mathcal{B}(\mathbb{R})$-measurable function. By applying the equality to the positive and negative parts of $g$, we see that the equality holds whenever $g$ is $P_X$-integrable or (more generally) whenever one of the integrals $\int g\,dP_X$ or $\int g f\,d\lambda$ is defined. (As usual, "defined" means equal to a finite number or $\pm\infty$.) Now Theorem 34.2 implies the desired result.

(b): By repeating the argument in (a) with $\lambda$ replaced by the counting measure $\mu_c$, we get

\[ E(g(X)) = \int_{\mathbb{R}} g(x) f(x)\,d\mu_c(x) = \sum_{s \in S} g(s) f(s). \]
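Both formulas in Theorem 34.5 can be tried on concrete distributions. The sketch below is only an illustration under stated assumptions: for (a) it takes $X$ exponential with rate 1, so $f(x) = e^{-x}$ on $[0, \infty)$, and approximates the integral by a midpoint rule on the truncated range $[0, 50]$; for (b) it takes $X$ to be a fair die. The choice $g(x) = x^2$ and the names `pdf`, `pmf`, and `midpoint_integral` are hypothetical and do not come from these notes.

```python
import math
from fractions import Fraction


def g(x):
    """Illustrative measurable function: g(x) = x^2."""
    return x * x


# (a) Continuous case, assuming X ~ Exponential(1) with pdf e^{-x} on [0, inf).
def pdf(x: float) -> float:
    return math.exp(-x) if x >= 0 else 0.0


def midpoint_integral(h, a: float, b: float, n: int) -> float:
    """Midpoint-rule approximation of the integral of h over [a, b] with n cells."""
    width = (b - a) / n
    return width * sum(h(a + (k + 0.5) * width) for k in range(n))


# Truncate the real line to [0, 50]; the neglected tail is negligible here.
continuous_value = midpoint_integral(lambda x: g(x) * pdf(x), 0.0, 50.0, 200_000)

# (b) Discrete case, assuming X is a fair die with pmf 1/6 on S = {1, ..., 6}.
support = range(1, 7)
pmf = {s: Fraction(1, 6) for s in support}
discrete_value = sum(g(s) * pmf[s] for s in support)

print(continuous_value)  # approximately 2, the exact value of E(X^2)
print(discrete_value)    # prints "91/6"
```

The discrete sum is exact because it uses rational arithmetic, whereas the continuous value is a numerical approximation of $\int_{\mathbb{R}} x^2 e^{-x} 1_{[0,\infty)}(x)\,dx = 2$.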

35 Absolutely Continuous and Singular Measures

36 Decomposition of Probability Measures
