
Exponential sums over finite fields

Tim Browning

Autumn, 2010

1 Introduction

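Let $p$ be prime and let $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. Given polynomials $f, g \in \mathbb{F}_p[X]$ with $g$ non-zero, sums of the sort

$$S = \sum_{\substack{x \in \mathbb{F}_p \\ g(x) \neq 0}} e^{2\pi i \frac{f(x)/g(x)}{p}} \in \mathbb{C}$$

arise frequently in number theory.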

Typical questions / objectives:

[G1 ] Is there an explicit expression for S?

[G2 ] Can we obtain upper bounds for $|S|$ beyond the trivial one $|S| \leq \#\mathbb{F}_p = p$? The "square-root philosophy" suggests $|S|$ should have order $\sqrt{p}$ unless there is bias.

[G3 ] Can we obtain lower bounds for |S| ?

[G4 ] If S depends on further parameters l ∈ L how does it vary as a function of l?

[G5 ] The function $x \mapsto e^{2\pi i x/p}$ defines an additive character on $\mathbb{F}_p$. What about multiplicative characters? What about mixed characters?

[G6 ] What happens in higher dimensions?

Typical applications:

— Counting points on varieties over finite fields (§5).

— Existence of rational points on varieties over number fields (circle method).

— Bounds for Fourier coefficients of cusp forms (Poincaré series).

Let’s proceed with some easy examples, illustrating the phenomena [G1]–[G6].

1.1 Linear sums

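Take $f(X) = aX$, $g(X) = 1$, so $S = \sum_{x \in \mathbb{F}_p} e^{2\pi i a x/p}$. If $\zeta_a = e^{2\pi i a/p}$ then

$$S = 1 + \zeta_a + \zeta_a^2 + \cdots + \zeta_a^{p-1} = \begin{cases} p, & p \mid a, \\ 0, & p \nmid a. \end{cases}$$

Hence we have an example of [G1] above.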

1.2 Gauss sums

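Take $f(X) = aX^2$, $g(X) = 1$, so

$$G_a = S = \sum_{x \in \mathbb{F}_p} e^{2\pi i a x^2/p}.$$

Assume $a \neq 0$ and $p > 2$. Let $\zeta = e^{2\pi i/p}$ and let

$$T_a = \sum_{x \in \mathbb{F}_p} \left(\frac{x}{p}\right) \zeta^{ax},$$

where $\left(\frac{\cdot}{p}\right)$ is the Legendre symbol. We may write

$$1 + \left(\frac{x}{p}\right) = \#\{y \in \mathbb{F}_p : y^2 = x\}.$$

Hence

$$G_a = \sum_{x \in \mathbb{F}_p} \zeta^{ax} \, \#\{y \in \mathbb{F}_p : y^2 = x\} = \sum_{x \in \mathbb{F}_p} \zeta^{ax} + T_a = T_a$$

by §1.1.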

Lemma 1.1 We have

(1) $T_a = \left(\frac{a}{p}\right) T_1$.

(2) $G_a^2 = T_a^2 = (-1)^{\frac{p-1}{2}} p$.

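Proof: For (1) we note that

$$\left(\frac{a}{p}\right) T_a = \sum_{x \in \mathbb{F}_p} \left(\frac{ax}{p}\right) \zeta^{ax} = T_1,$$

since for $a \neq 0$ we see that $ax$ runs over a complete set of residues modulo $p$ when $x$ does.

For (2) it suffices to take $a = 1$ since $G_a^2 = T_1^2$ by (1). For any $b \neq 0$ we have $T_b T_{-b} = \left(\frac{-1}{p}\right) T_1^2$ by (1). Hence

$$\sum_{b \in \mathbb{F}_p^*} T_b T_{-b} = \left(\frac{-1}{p}\right)(p-1) T_1^2.$$

But we also have

$$T_b T_{-b} = \sum_{x, y \in \mathbb{F}_p} \left(\frac{x}{p}\right)\left(\frac{y}{p}\right) \zeta^{b(x-y)}.$$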

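Hence §1.1 yields

$$\sum_{b \in \mathbb{F}_p^*} T_b T_{-b} = \sum_{x, y \in \mathbb{F}_p} \left(\frac{x}{p}\right)\left(\frac{y}{p}\right) \left( \sum_{b \in \mathbb{F}_p} \zeta^{b(x-y)} - 1 \right) = p \sum_{x \in \mathbb{F}_p} \left(\frac{x^2}{p}\right) = p(p-1),$$

since $\sum_{x \in \mathbb{F}_p} \left(\frac{x}{p}\right) = 0$. Comparing these expressions and recalling that $\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}$ completes the proof.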
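Lemma 1.1 is easy to sanity-check numerically for small primes. The following is only an illustrative sketch: the function names (`legendre`, `quadratic_gauss_sum`, `twisted_sum`, `check_lemma_1_1`) are made up for this example, and the comparison uses floating-point arithmetic with a small tolerance rather than exact algebra.

```python
import cmath

def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, via Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

def quadratic_gauss_sum(a, p):
    """G_a = sum over x in F_p of exp(2*pi*i*a*x^2/p)."""
    return sum(cmath.exp(2j * cmath.pi * ((a * x * x) % p) / p) for x in range(p))

def twisted_sum(a, p):
    """T_a = sum over x in F_p of (x/p) * zeta^(a*x)."""
    return sum(legendre(x, p) * cmath.exp(2j * cmath.pi * ((a * x) % p) / p)
               for x in range(p))

def check_lemma_1_1(p, tol=1e-9):
    """Check G_a = T_a, T_a = (a/p) T_1 and G_a^2 = (-1)^((p-1)/2) p for all a != 0."""
    expected_square = (-1) ** ((p - 1) // 2) * p
    t1 = twisted_sum(1, p)
    for a in range(1, p):
        g, t = quadratic_gauss_sum(a, p), twisted_sum(a, p)
        if abs(g - t) > tol:
            return False
        if abs(t - legendre(a, p) * t1) > tol:
            return False
        if abs(g * g - expected_square) > tol:
            return False
    return True

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, check_lemma_1_1(p))
```

The check runs in time quadratic in $p$, so it is only meant for small primes.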

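Lemma 1.1 implies that there exists $\varepsilon_p \in \{\pm 1\}$ such that

$$G_1 = T_1 = \begin{cases} \varepsilon_p \sqrt{p}, & p \equiv 1 \pmod 4, \\ \varepsilon_p i \sqrt{p}, & p \equiv 3 \pmod 4. \end{cases}$$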

In fact one always has $\varepsilon_p = +1$, so Gauss sums give examples of the phenomena [G1], [G2], [G5]. They are ubiquitous in number theory and (in particular) can be used to give a short proof of quadratic reciprocity:

Theorem A If $p \neq q$ are odd primes then

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

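Proof: For any $n \in \mathbb{N}$ let

$$G_{a,n} = \sum_{x \in \mathbb{Z}/n\mathbb{Z}} e^{2\pi i a x^2/n}.$$

Since any $x \in \mathbb{Z}/pq\mathbb{Z}$ can be written in a unique way as $x = p x_1 + q x_2$, where $x_1$ is well-defined modulo $q$ and $x_2$ modulo $p$, we see that

$$G_{1,pq} = \sum_{x_1 \in \mathbb{F}_q} \sum_{x_2 \in \mathbb{F}_p} e^{2\pi i (p x_1 + q x_2)^2/(pq)} = G_{p,q} G_{q,p} = \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) G_{1,p} G_{1,q}$$

by Lemma 1.1. But then the theorem follows from the fact that

$$G_{1,n} = \begin{cases} \sqrt{n}, & n \equiv 1 \pmod 4, \\ i\sqrt{n}, & n \equiv 3 \pmod 4, \end{cases}$$

for any odd $n \in \mathbb{N}$.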

In a similar spirit higher degree Gauss sums can be used to study higher reciprocity laws.

1.3 Kloosterman sums

Take $f(X) = aX^2 + b$, $g(X) = X$, so

$$K(a, b; p) = S = \sum_{x \in \mathbb{F}_p^*} e^{2\pi i (ax + bx^{-1})/p}.$$

Getting a non-trivial upper bound for $|K(a, b; p)|$ is the most crucial ingredient in:

Theorem B (Kloosterman) Let $a_1, \ldots, a_n \in \mathbb{N}$, with $n \geq 4$. Then for sufficiently large $N \in \mathbb{N}$ there are infinitely many solutions $x \in \mathbb{Z}^n$ of the Diophantine equation

$$a_1 x_1^2 + \cdots + a_n x_n^2 = N,$$

provided that there is no congruence condition.

As we shall see in §6, Weil's resolution of the Riemann Hypothesis for function fields yields

$$|K(a, b; p)| \leq 2\sqrt{p},$$

if $a, b \in \mathbb{F}_p^*$. To illustrate some useful ideas we present the proof of:

Theorem C (Kloosterman) Let $p$ be prime and $a, b \in \mathbb{F}_p^*$. Then $|K(a, b; p)| \leq 2p^{3/4}$.

Proof: The idea is to try and understand Kloosterman sums globally and not individually, through consideration of moments

$$\sum_{a, b \in \mathbb{F}_p^*} |K(a, b; p)|^{2k} = M_k,$$

say. If we can show $M_k \leq M$ for some $M \geq 0$ then, since $K(a, b; p) = K(ac, bc^{-1}; p)$ for any $c \in \mathbb{F}_p^*$, we deduce that

$$(p-1)|K(a, b; p)|^{2k} = \sum_{c \in \mathbb{F}_p^*} |K(ac, bc^{-1}; p)|^{2k} \leq M,$$

whence

$$|K(a, b; p)| \leq \left(\frac{M}{p-1}\right)^{\frac{1}{2k}}.$$

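Now for any $k \geq 1$ we have

$$M_k = \sum_{a, b \in \mathbb{F}_p} |K(a, b; p)|^{2k} - 2\sum_{a \in \mathbb{F}_p^*} |K(a, 0; p)|^{2k} - |K(0, 0; p)|^{2k}.$$

But $a \neq 0$ implies that $K(a, 0; p) = -1$ by §1.1. Expanding the definition of the Kloosterman sums and their conjugates we deduce that

$$M_k = \sum_{a, b \in \mathbb{F}_p} \sum_{x \in (\mathbb{F}_p^*)^k} \sum_{y \in (\mathbb{F}_p^*)^k} \zeta^{a(x_1 + \cdots + x_k - y_1 - \cdots - y_k)} \, \zeta^{b(x_1^{-1} + \cdots + x_k^{-1} - y_1^{-1} - \cdots - y_k^{-1})} - 2(p-1) - (p-1)^{2k},$$

where $\zeta = e^{2\pi i/p}$. But then §1.1 yields

$$M_k = p^2 N_k - 2(p-1) - (p-1)^{2k},$$

where $N_k$ is the number of $x, y \in (\mathbb{F}_p^*)^k$ such that

$$x_1 + \cdots + x_k = y_1 + \cdots + y_k, \qquad x_1^{-1} + \cdots + x_k^{-1} = y_1^{-1} + \cdots + y_k^{-1}.$$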

If $k = 1$ then $N_1 = p - 1$ and so

$$M_1 = p^3 - p^2 - 2p + 2 - (p^2 - 2p + 1) = p^3 - 2p^2 + 1.$$

If $k = 2$ then there are $2(p-1)^2 - (p-1)$ obvious solutions in which $y = (x_1, x_2)$ or $(x_2, x_1)$. Moreover there are $(p-1)^2$ solutions in which $x_1 + x_2 = 0$ or $y_1 + y_2 = 0$, of which $2(p-1)$ have already been counted via $(x_1, -x_1, x_1, -x_1)$ or $(x_1, -x_1, -x_1, x_1)$. Suppose $x, y$ is any other solution. Then

$$x_1 + x_2 = y_1 + y_2, \qquad y_1 y_2 (x_1 + x_2) = x_1 x_2 (y_1 + y_2).$$

Eliminating $x_1$ we are left with

$$y_1 y_2 (y_1 + y_2) = x_2 (y_1 + y_2 - x_2)(y_1 + y_2),$$

whence $y_1 y_2 = x_2 (y_1 + y_2 - x_2)$. This implies $y_1 (y_2 - x_2) = x_2 (y_2 - x_2)$. Thus either $y_2 = x_2$ or $y_1 = x_2$, both of which are impossible. We conclude that

$$N_2 = 2(p-1)^2 - (p-1) + (p-1)^2 - 2(p-1) = 3(p-1)(p-2),$$

whence

$$|K(a, b; p)| \leq \left(\frac{M_2}{p-1}\right)^{\frac{1}{4}} \leq \left(3(p-2)p^2 - 2 - (p-1)^3\right)^{\frac{1}{4}} \leq 2p^{\frac{3}{4}},$$

as required.
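The count $N_2 = 3(p-1)(p-2)$ and the resulting bound can be checked by brute force for small odd primes. The sketch below is an illustration only: the helper names (`kloosterman`, `count_N2`, `fourth_moment`) are invented here, inverses are computed with Fermat's little theorem, and the sums are evaluated in floating point.

```python
import cmath
from collections import Counter

def kloosterman(a, b, p):
    """K(a, b; p) = sum over x in F_p^* of exp(2*pi*i*(a*x + b*x^{-1})/p)."""
    total = 0
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)  # inverse of x modulo the prime p
        total += cmath.exp(2j * cmath.pi * ((a * x + b * x_inv) % p) / p)
    return total

def count_N2(p):
    """Number of (x1, x2, y1, y2) in (F_p^*)^4 with equal sums and equal sums of inverses."""
    inv = [0] + [pow(x, p - 2, p) for x in range(1, p)]
    # Tally pairs (x1, x2) by the value of (x1 + x2, 1/x1 + 1/x2).
    tally = Counter(((x1 + x2) % p, (inv[x1] + inv[x2]) % p)
                    for x1 in range(1, p) for x2 in range(1, p))
    # Each class of size c contributes c^2 ordered pairs of pairs.
    return sum(c * c for c in tally.values())

def fourth_moment(p):
    """M_2 = sum over a, b in F_p^* of |K(a, b; p)|^4, computed directly."""
    return sum(abs(kloosterman(a, b, p)) ** 4
               for a in range(1, p) for b in range(1, p))

if __name__ == "__main__":
    for p in (3, 5, 7, 11):
        n2 = count_N2(p)
        m2_formula = p * p * n2 - 2 * (p - 1) - (p - 1) ** 4
        worst = max(abs(kloosterman(a, b, p))
                    for a in range(1, p) for b in range(1, p))
        print(p, n2, 3 * (p - 1) * (p - 2), m2_formula, round(fourth_moment(p), 6),
              round(worst, 4), round(2 * p ** 0.75, 4))
```

For each prime the printed row shows $N_2$ next to $3(p-1)(p-2)$, the formula for $M_2$ next to the directly computed fourth moment, and the largest $|K(a,b;p)|$ next to $2p^{3/4}$.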

How about lower bounds for $|K(a, b; p)|$? Noting that

$$M_2 \leq \max_{a, b} |K(a, b; p)|^2 \, M_1,$$

we see that there exist $a, b \in \mathbb{F}_p^*$ such that

$$|K(a, b; p)|^2 \geq \frac{M_2}{M_1} = \frac{2p^3 - 3p^2 - 3p - 1}{p^2 - p - 1} > 2p - 2.$$

This shows that the Weil bound is (essentially) best possible.

Now it is clear that $\overline{K(a, b; p)} = K(a, b; p)$, on replacing $x$ by $-x$ in the sum. Hence $K(a, b; p) \in \mathbb{R}$.

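Lemma 1.2 For $a, b \neq 0$ and $p$ prime we have $K(a, b; p) \neq 0$.

Proof: Let $K = \mathbb{Q}(\zeta)$ be the cyclotomic field generated by $\zeta = e^{2\pi i/p}$. This field is Galois of degree $p - 1$ with ring of integers $\mathcal{O}_K = \mathbb{Z}[\zeta]$. Moreover we have

$$(p) = p\mathcal{O}_K = \mathfrak{p}^{p-1},$$

where $\mathfrak{p}$ is the prime ideal generated by $p$ and $1 - \zeta$. In particular $\zeta \equiv 1$ modulo $\mathfrak{p}$, and it follows that

$$K(a, b; p) = \sum_{x \in \mathbb{F}_p^*} \zeta^{ax + bx^{-1}} \equiv \sum_{x \in \mathbb{F}_p^*} 1 \equiv -1 \pmod{\mathfrak{p}}.$$

In particular $K(a, b; p)$ is non-zero.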

Theorem D (Fouvry) For $a, b \neq 0$ and $p$ prime we have

$$|K(a, b; p)| \geq \left(\frac{1}{2p^{3/4}}\right)^{p-2}.$$

Proof: Let $K = \mathbb{Q}(\zeta)$ be as before. Since $\mathrm{Gal}(K/\mathbb{Q})$ is formed from the $p - 1$ automorphisms of $K$ defined by $\zeta \mapsto \zeta^l$ for $1 \leq l < p$, we see that the conjugates of the sum $K(a, b; p)$ are just $K(al, bl; p)$ for $1 \leq l < p$. Since $K(a, b; p) \in \mathcal{O}_K$ it follows from Lemma 1.2 that

$$1 \leq |N_{K/\mathbb{Q}}(K(a, b; p))| = \prod_{l=1}^{p-1} |K(al, bl; p)| \leq |K(a, b; p)| \, (2p^{3/4})^{p-2},$$

by Theorem C.

It seems hard to do much better than this (after substituting Weil for Kloosterman in the proof). Thus Kloosterman sums illustrate the phenomena [G2] and [G3].

1.4 Equidistribution

Take $f(X) = aX^3 + bX$, $g(X) = 1$, so

$$S(a, b; p) = S = \sum_{x \in \mathbb{F}_p} \zeta^{ax^3 + bx},$$

where $\zeta = e^{2\pi i/p}$ and $a \neq 0$. Then $S(a, b; p) \in \mathbb{R}$ and the Weil bound is $|S(a, b; p)| \leq 2\sqrt{p}$, which we will establish in §6. It is natural to ask how the numbers

$$\theta_p^{a,b} = \frac{S(a, b; p)}{2\sqrt{p}}$$

are distributed on $[-1, 1]$.

— Horizontal distribution: fix $a, b \in \mathbb{Z}$ with $a \neq 0$ and vary $p$.

— Vertical distribution: fix $p$ and vary $a, b \in \mathbb{F}_p$ with $a \neq 0$.

In both cases it is conjectured that the $\theta_p^{a,b}$ become equidistributed for the Sato–Tate measure

$$\mu([\alpha, \beta]) = \frac{2}{\pi} \int_\alpha^\beta \sqrt{1 - x^2} \, dx.$$

For the vertical version this is:

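Theorem E (Livné) For all $\alpha, \beta \in [-1, 1]$ we have

$$\frac{1}{p(p-1)} \#\{(a, b) \in \mathbb{F}_p^* \times \mathbb{F}_p : \alpha \leq \theta_p^{a,b} \leq \beta\} \longrightarrow \frac{2}{\pi} \int_\alpha^\beta \sqrt{1 - x^2} \, dx,$$

as $p \to \infty$.

Thus as $p \to \infty$ the numbers $\theta_p^{a,b}$ get "more and more" equidistributed. If $\chi$ is the characteristic function of the interval $[\alpha, \beta]$ then the left hand side is

$$\frac{1}{p(p-1)} \sum_{(a, b) \in \mathbb{F}_p^* \times \mathbb{F}_p} \chi\left(\theta_p^{a,b}\right),$$

and the right hand side is

$$\frac{2}{\pi} \int_{-1}^{1} \chi(x) \sqrt{1 - x^2} \, dx.$$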

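Characteristic functions are typically difficult to investigate using harmonic analysis, so one tries instead to prove that as $p \to \infty$ we have

$$\frac{1}{p(p-1)} \sum_{(a, b) \in \mathbb{F}_p^* \times \mathbb{F}_p} \phi\left(\theta_p^{a,b}\right) \to \frac{2}{\pi} \int_{-1}^{1} \phi(x) \sqrt{1 - x^2} \, dx,$$

for "nicer" functions $\phi : [-1, 1] \to \mathbb{C}$. Then (roughly) if one can do this for a large enough class of $\phi$ then there is equidistribution. Since polynomials are dense in the set of "nice" functions it suffices to take $\phi(x) = x^k$ and show, for every integer $k \geq 0$,

$$\frac{1}{p(p-1)} \sum_{(a, b) \in \mathbb{F}_p^* \times \mathbb{F}_p} \left(\theta_p^{a,b}\right)^k \to \frac{2}{\pi} \int_{-1}^{1} x^k \sqrt{1 - x^2} \, dx = c_k,$$

say, as $p \to \infty$. One calculates

$$c_k = \begin{cases} 0, & \text{if } k \text{ is odd}, \\ \dfrac{1}{2^k} \cdot \dfrac{1}{k/2 + 1} \cdot \dbinom{k}{k/2}, & \text{if } k \text{ is even}. \end{cases}$$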

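Let

$$VM_k = \frac{1}{p(p-1)} \sum_{(a, b) \in \mathbb{F}_p^* \times \mathbb{F}_p} \left(\theta_p^{a,b}\right)^k = \frac{1}{p(p-1)} \left(\frac{1}{2\sqrt{p}}\right)^k \sum_{a \in \mathbb{F}_p^*} \sum_{b \in \mathbb{F}_p} \left( \sum_{x \in \mathbb{F}_p} \zeta^{ax^3 + bx} \right)^k,$$

where $\zeta = e^{2\pi i/p}$. Arguing as in our treatment of Kloosterman's bound in §1.3 we see that the inner sum over $a, b$ is

$$T = \sum_{a \neq 0} \sum_b \sum_{x_1, \ldots, x_k} \zeta^{ax_1^3 + bx_1} \cdots \zeta^{ax_k^3 + bx_k} = \sum_{a \neq 0} \sum_b \sum_{x_1, \ldots, x_k} \zeta^{a(\sum x_i^3) + b(\sum x_i)}.$$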

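But

$$\sum_{b \in \mathbb{F}_p} \zeta^{b(\sum x_i)} = \begin{cases} p, & \text{if } \sum x_i = 0, \\ 0, & \text{if } \sum x_i \neq 0, \end{cases}$$

and

$$\sum_{a \in \mathbb{F}_p^*} \zeta^{a(\sum x_i^3)} = \begin{cases} p - 1, & \text{if } \sum x_i^3 = 0, \\ -1, & \text{if } \sum x_i^3 \neq 0. \end{cases}$$

Hence

$$T = \sum_{x_1, \ldots, x_k} p \, \delta_{\sum x_i = 0} \left( p \, \delta_{\sum x_i^3 = 0} - 1 \right),$$

and it follows that

$$VM_k = \frac{1}{p(p-1)} \cdot \frac{1}{(2\sqrt{p})^k} \left( p^2 \, \#X_k(\mathbb{F}_p) - p^k \right),$$

where $X_k \subset \mathbb{A}^k$ denotes the affine variety

$$\sum_{i=1}^k x_i^3 = \sum_{i=1}^k x_i = 0.$$

In order to understand equidistribution we are therefore led to seek precise estimates for the counting function $\#X(\mathbb{F}_p)$ associated to a higher-dimensional variety. Thus we are presented with phenomena [G4] and [G6] in this example.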

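When $k = 0$ we have $VM_0 = 1$. When $k = 1$ the variety $X_1$ is the single point $x_1 = 0$, so

$$VM_1 = \frac{1}{p(p-1)} \cdot \frac{1}{2\sqrt{p}} \left( p^2 \cdot 1 - p \right) = \frac{1}{2\sqrt{p}} \longrightarrow 0,$$

as $p \to \infty$.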
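The first two vertical moments can be confirmed directly from the definition for a small prime. This is a rough numerical sketch, not part of the argument: the function names `cubic_sum` and `vertical_moment` are hypothetical, and the real part of $S(a,b;p)$ is taken because the sum is real up to rounding error.

```python
import cmath
import math

def cubic_sum(a, b, p):
    """S(a, b; p) = sum over x in F_p of exp(2*pi*i*(a*x^3 + b*x)/p)."""
    return sum(cmath.exp(2j * cmath.pi * ((a * x ** 3 + b * x) % p) / p)
               for x in range(p))

def vertical_moment(k, p):
    """k-th moment of theta = S(a, b; p) / (2 sqrt(p)) over a in F_p^*, b in F_p."""
    total = 0.0
    for a in range(1, p):
        for b in range(p):
            theta = cubic_sum(a, b, p).real / (2 * math.sqrt(p))
            total += theta ** k
    return total / (p * (p - 1))

if __name__ == "__main__":
    p = 11
    print(vertical_moment(0, p))                       # moment of order 0
    print(vertical_moment(1, p), 1 / (2 * math.sqrt(p)))  # moment of order 1 vs 1/(2 sqrt p)
```

The same function can be used to explore higher $k$ numerically, at a cost of roughly $p^3$ operations per moment.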

Exercise 1 Let $k = 2, 3$. Calculate $\#X_k(\mathbb{F}_p)$ and deduce that $VM_k \to c_k$.

2 Finite Fields

Let $F$ be any field, with multiplicative identity $1_F$. The map $\lambda : \mathbb{Z} \to F$ such that $\lambda(n) = n 1_F$ is a ring homomorphism. It has kernel $(n) = n\mathbb{Z}$ for some integer $n \geq 0$. Hence we have a ring isomorphism

$$\mathbb{Z}/n\mathbb{Z} \to G,$$

where $G$ is the integral domain obtained from the image of $\lambda$ in $F$. If $n = 0$ then $F$ contains a subring which is isomorphic to $\mathbb{Z}$ and we say $F$ has characteristic 0. If $n \neq 0$ then $n = p$ for some prime $p$ and $F$ contains a subring which is isomorphic to $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. We say $F$ has characteristic $p$ in this case.

Let $F$ be a finite field with cardinality $q$. Then $\mathrm{char}(F) = p$ for some prime $p$, else $F$ would contain a subfield isomorphic to $\mathbb{Q}$ (the field of fractions of $G$). Thus $F$ contains a subfield $K$ isomorphic to $\mathbb{F}_p$, which one usually identifies with $\mathbb{F}_p$. Let $f = [F : \mathbb{F}_p]$. Then $q = p^f$.

Theorem A Let $\mathbb{F}_q$ be a finite field with $q$ elements. Then $q = p^f$ for $p$ prime and $f \in \mathbb{N}$. For every such $q$ there exists exactly one field $\mathbb{F}_q$. This field is the splitting field of $X^q - X$ over $\mathbb{F}_p$, and all its elements are roots of $X^q - X$.

Proof: We have already established the first part. Now let $\mathbb{F}_q$ be given and let $\mathbb{F}_q^*$ be the multiplicative group. Then $\#\mathbb{F}_q^* = q - 1$. If $x \in \mathbb{F}_q^*$ then $x^{q-1} = 1$. Hence $x^q - x = 0$ for every $x \in \mathbb{F}_q$. Therefore

$$X^q - X = \prod_{x \in \mathbb{F}_q} (X - x),$$

and it follows that $\mathbb{F}_q$ is the splitting field of $X^q - X$ over $\mathbb{F}_p$. But a splitting field is uniquely determined up to isomorphism.

Now let $F$ be the splitting field of $X^q - X$ over $\mathbb{F}_p$, with $q = p^f$. We claim $F$ is the set of roots of $X^q - X$ in $F$. Let $x_1, \ldots, x_q$ be the roots. These are distinct since

$$\frac{d}{dX}(X^q - X) = -1 \neq 0.$$

Also, $x_i + x_j$ and $x_i x_j$ are roots since $(x_i + x_j)^q - (x_i + x_j) = x_i^q + x_j^q - x_i - x_j = 0$ and $(x_i x_j)^q - x_i x_j = x_i^q x_j^q - x_i x_j = x_i x_j - x_i x_j = 0$. Likewise $x_j^{-1}$ is a root if $x_j \neq 0$ and also $-x_j$ is a root. Thus the roots form a field and so $F = \{x_1, \ldots, x_q\}$. This completes the proof.

Let $\mathbb{F}_q$ be a finite field and let $n \in \mathbb{N}$ be given. Then $q^n = p^{nf}$. By Theorem A the splitting field of $X^{q^n} - X$ is precisely $\mathbb{F}_{p^{nf}}$ and has degree $nf$ over $\mathbb{F}_p$. It follows that $\mathbb{F}_{q^n}$ has degree $n$ over $\mathbb{F}_q$. Conversely any extension of degree $n$ over $\mathbb{F}_q$ has degree $nf$ over $\mathbb{F}_p$ and so must be $\mathbb{F}_{p^{nf}}$. We have therefore established the following consequence:

Corollary Given $\mathbb{F}_q$ and $n \in \mathbb{N}$ there is a unique extension of $\mathbb{F}_q$ of degree $n$ and this is $\mathbb{F}_{q^n}$.

We now investigate the nature of F∗q .

Lemma 2.1 We have $n = \sum_{d \mid n} \varphi(d)$, where $\varphi$ is the Euler totient function.

Proof: Since $\varphi$ is a multiplicative arithmetic function, so is $h(n) = \sum_{d \mid n} \varphi(d)$. Hence it suffices to check that $p^f = h(p^f)$. But

$$h(p^f) = \sum_{1 \leq e \leq f} p^e \left(1 - \frac{1}{p}\right) + 1 = \frac{p^{f+1} - p}{p - 1} \left(1 - \frac{1}{p}\right) + 1 = p^f,$$

as desired.
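As a quick illustration of Lemma 2.1, the identity can be tested by direct enumeration. The sketch below uses a deliberately naive totient (counting integers coprime to $n$); the function names are chosen for this example only.

```python
from math import gcd

def phi(n):
    """Euler totient: number of 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def totient_divisor_sum(n):
    """Sum of phi(d) over all positive divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)

if __name__ == "__main__":
    for n in (1, 6, 12, 30, 97):
        print(n, totient_divisor_sum(n))
```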

Lemma 2.2 Let $H$ be a finite group of order $n$. Suppose that for every $d \mid n$ there are at most $d$ elements $x \in H$ such that $x^d = 1$. Then $H$ is cyclic.

Proof: Let $d \mid n$. Suppose there exists $x \in H$ of order $d$. Then $(x) = \{1, x, x^2, \ldots, x^{d-1}\}$ is cyclic of order $d$. By hypothesis all elements $y \in H$ such that $y^d = 1$ belong to $(x)$. In particular all elements of $H$ of order $d$ are generators of $(x)$. There are $\varphi(d)$ such elements. Hence the number of elements of $H$ of order $d$ is $0$ or $\varphi(d)$. By Lemma 2.1 it can't be $0$, else

$$n = \sum_{d \mid n} \varphi(d) > \#H = n,$$

which is impossible. Hence there is an element $x \in H$ of order $n$ and $H = (x)$.

Taking H = F∗q and n = q − 1 in Lemma 2.2 we obtain the following result:

Theorem B $\mathbb{F}_q^*$ is cyclic of order $q - 1$.

In particular it follows that $\mathbb{F}_q = \mathbb{F}_p(x)$, if $x$ is a generator of $\mathbb{F}_q^*$. We proceed to determine the automorphisms of a finite field. Let $\mathbb{F}_q \subset \mathbb{F}_r$. Then $r = q^n$. The Frobenius automorphism is the map

$$\omega = \omega_q : \mathbb{F}_r \to \mathbb{F}_r,$$

given by $\omega(x) = x^q$. This is injective since if $x^q = y^q$ then $0 = x^q - y^q = (x - y)^q$, whence $x = y$. Thus $\omega$ is also surjective (since it is an injective map from a finite set to itself). Moreover $\omega$ is an automorphism since $\omega(x + y) = (x + y)^q = x^q + y^q = \omega(x) + \omega(y)$ and $\omega(xy) = \omega(x)\omega(y)$. It leaves $\mathbb{F}_q$ fixed since if $x \in \mathbb{F}_q$ then $\omega(x) = x^q = x$. Thus $\omega \in \mathrm{Gal}(\mathbb{F}_r/\mathbb{F}_q)$.

If $r = q^n$ then $1, \omega, \omega^2, \ldots, \omega^{n-1}$ are automorphisms of $\mathbb{F}_r$ over $\mathbb{F}_q$. They are distinct since if there exist $i, j$ such that $\omega^i(x) = \omega^j(x)$ for all $x \in \mathbb{F}_r$, then

$$x^{q^i} - x^{q^j} = 0$$

for all $x \in \mathbb{F}_r$. But the polynomial $X^{q^i} - X^{q^j}$ has degree less than $r = q^n$. Hence the polynomial is identically zero and $i = j$. Since the order of the Galois group is $n$ we have shown:

Theorem C The Galois group of Fr over Fq is cyclic with generator ω = ωq.

Much of these lectures will be concerned with equations over finite fields. As a taster we establish the following basic result:

Theorem D (Chevalley–Warning) Let $f_1, \ldots, f_k \in \mathbb{F}_q[X_1, \ldots, X_n]$ be polynomials such that $\sum_{i=1}^k \deg(f_i) < n$. Then

$$N = \#\{x \in \mathbb{F}_q^n : f_1(x) = \cdots = f_k(x) = 0\} \equiv 0 \pmod p.$$

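Proof: Let $P = \prod_{i=1}^k (1 - f_i^{q-1})$ and let $x \in \mathbb{F}_q^n$. If $f_i(x) \neq 0$ for some $i$ then $f_i(x)^{q-1} = 1$ and so $P(x) = 0$, whereas $P(x) = 1$ if every $f_i(x) = 0$. Hence

$$N \equiv \sum_{x \in \mathbb{F}_q^n} P(x) \pmod p.$$

By hypothesis $\deg P < n(q - 1)$. Thus $P$ is a linear combination of monomials $X^u = X_1^{u_1} \cdots X_n^{u_n}$ with $\sum_{i=1}^n u_i < n(q - 1)$. Suppose without loss of generality that $u_1 < q - 1$.

If $u_1 = 0$ then

$$\sum_{x \in \mathbb{F}_q} x^{u_1} = q = 0.$$

If $u_1 \geq 1$ then by Theorem B there exists $y \in \mathbb{F}_q^*$ such that $y^{u_1} \neq 1$, since $u_1 < q - 1$. Hence

$$\sum_{x \in \mathbb{F}_q} x^{u_1} = \sum_{x \in \mathbb{F}_q^*} y^{u_1} x^{u_1} = y^{u_1} \sum_{x \in \mathbb{F}_q^*} x^{u_1},$$

so that

$$(1 - y^{u_1}) \sum_{x \in \mathbb{F}_q} x^{u_1} = 0.$$

This implies that $N \equiv 0 \pmod p$ as required.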
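For prime fields the Chevalley–Warning congruence can be observed by exhaustive search. The sketch below is only an illustration over $\mathbb{F}_p$ (not general $\mathbb{F}_q$); the function name `count_common_zeros` and the sample polynomials are choices made here, picked so that the degrees sum to less than the number of variables.

```python
from itertools import product

def count_common_zeros(polys, n, p):
    """Count x in F_p^n at which every polynomial in `polys` vanishes modulo p.

    Each polynomial is given as a Python function of n integer arguments.
    """
    return sum(1 for x in product(range(p), repeat=n)
               if all(f(*x) % p == 0 for f in polys))

# One quadratic form in 3 variables: total degree 2 < 3.
quadric = [lambda x, y, z: x * x + y * y + z * z]

# A quadric and a linear form in 5 variables: total degree 3 < 5.
system = [
    lambda a, b, c, d, e: a * b + c * d + e,
    lambda a, b, c, d, e: a + b + c + d + e,
]

if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        n1 = count_common_zeros(quadric, 3, p)
        n2 = count_common_zeros(system, 5, p)
        print(p, n1, n1 % p, n2, n2 % p)
```

In each printed row the residues modulo $p$ should be zero, in line with Theorem D.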

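Suppose that $f_1, \ldots, f_k$ are homogeneous and let $X \subset \mathbb{P}^{n-1}$ denote the corresponding variety $f_1 = \cdots = f_k = 0$. Then clearly $0$ is a trivial solution of the equations. Hence Theorem D implies that $X(\mathbb{F}_q) \neq \emptyset$ if $\sum \deg f_i < n$.

Corollary $\mathbb{F}_q$ is a $C_1$-field.

Let's return to our general discussion of finite fields. Let $\mathbb{F}_{q^n}/\mathbb{F}_q$ be a finite extension. The trace map is the $\mathbb{F}_q$-linear map

$$\mathrm{Tr} = \mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q} : \mathbb{F}_{q^n} \to \mathbb{F}_q, \qquad x \mapsto x + x^q + \cdots + x^{q^{n-1}}.$$

The norm map is the multiplicative map

$$N = N_{\mathbb{F}_{q^n}/\mathbb{F}_q} : \mathbb{F}_{q^n} \to \mathbb{F}_q, \qquad x \mapsto x \cdot x^q \cdots x^{q^{n-1}} = x^{(q^n - 1)/(q - 1)}.$$

The latter restricts to a group homomorphism $N : \mathbb{F}_{q^n}^* \to \mathbb{F}_q^*$. Furthermore, it is clear that $N(\omega(x)) = N(x)$ and $\mathrm{Tr}(\omega(x)) = \mathrm{Tr}(x)$ for any $\omega \in \mathrm{Gal}(\mathbb{F}_{q^n}/\mathbb{F}_q)$ and $x \in \mathbb{F}_{q^n}$.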

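Lemma 2.3

(i) The trace map $\mathrm{Tr} : \mathbb{F}_{q^n} \to \mathbb{F}_q$ is a surjective linear map with

$$\mathrm{Ker}(\mathrm{Tr}) = \{x \in \mathbb{F}_{q^n} : x = y^q - y \text{ for some } y \in \mathbb{F}_{q^n}\}.$$

(ii) The norm map $N : \mathbb{F}_{q^n}^* \to \mathbb{F}_q^*$ is a surjective homomorphism with

$$\mathrm{Ker}(N) = \{x \in \mathbb{F}_{q^n}^* : x = y^{q-1} \text{ for some } y \in \mathbb{F}_{q^n}^*\}.$$

Proof: For (i) we define $\delta : \mathbb{F}_{q^n} \to \mathbb{F}_{q^n}$ given by $\delta(y) = y^q - y$. This is an $\mathbb{F}_q$-linear map with $\mathrm{Ker}(\delta) = \{y \in \mathbb{F}_{q^n} : y^q = y\} = \mathbb{F}_q$ by Theorem C. Hence $\mathrm{Im}(\delta)$ has dimension $n - 1$. We claim that $\mathrm{Ker}(\mathrm{Tr}) = \mathrm{Im}(\delta)$, which will be enough to conclude the proof of (i) since then

$$\dim \mathrm{Im}(\mathrm{Tr}) = n - \dim \mathrm{Ker}(\mathrm{Tr}) = 1,$$

so that $\mathrm{Tr}$ is surjective.

To establish the claim we have $\mathrm{Im}(\delta) \subseteq \mathrm{Ker}(\mathrm{Tr})$, since for any $y \in \mathbb{F}_{q^n}$,

$$\mathrm{Tr}(y^q - y) = \mathrm{Tr}(y^q) - \mathrm{Tr}(y) = 0.$$

On the other hand $\mathrm{Ker}(\mathrm{Tr}) = \{x \in \mathbb{F}_{q^n} : P(x) = 0\}$, where $P(X) = X + X^q + \cdots + X^{q^{n-1}}$ is a polynomial of degree $q^{n-1}$. Hence

$$\#\mathrm{Ker}(\mathrm{Tr}) \leq \deg P \leq q^{n-1},$$

and so the inclusion already known must be an equality.

Exercise 2 Establish the second part of Lemma 2.3 using the group homomorphism $\Delta : \mathbb{F}_{q^n}^* \to \mathbb{F}_{q^n}^*$ given by $\Delta(x) = x^{q-1}$.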

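Lemma 2.3 shows that we have exact sequences of abelian groups

$$0 \to \mathbb{F}_q \to \mathbb{F}_{q^n} \xrightarrow{\ \delta\ } \mathbb{F}_{q^n} \xrightarrow{\ \mathrm{Tr}\ } \mathbb{F}_q \to 0,$$

and

$$1 \to \mathbb{F}_q^* \to \mathbb{F}_{q^n}^* \xrightarrow{\ \Delta\ } \mathbb{F}_{q^n}^* \xrightarrow{\ N\ } \mathbb{F}_q^* \to 1.$$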

3 Characters of Finite Abelian Groups

Let G be any group. A character of G is a group homomorphism

χ : G→ C∗.

The set of characters of $G$ forms a group $\widehat{G}$ called the dual of $G$, with

operation: $(\chi_1 \cdot \chi_2)(x) = \chi_1(x)\chi_2(x)$

inverse: $\chi^{-1}(x) = \chi(x)^{-1}$

unit: trivial homomorphism $\chi_0 : x \mapsto 1$.

If $G$ is a finite group of order $n$ then any $\chi \in \widehat{G}$ takes values in $\mu_n = \{z \in \mathbb{C} : z^n = 1\}$. Indeed for any $x \in G$ we have

$$\chi^n(x) = \chi(x^n) = \chi(1) = 1.$$

Let $e(x) = e^{2\pi i x}$. It is easily checked that the map $x \mapsto e(x/p)$ gives a group homomorphism for $G = \mathbb{F}_p$. More generally we have:

Lemma 3.1 Let $C_n$ be the cyclic group of order $n$, with generator $g$. Given a residue class $a \pmod n$, the map

$$\chi_a : C_n \to \mu_n, \qquad g^t \mapsto e(at/n)$$

is a character of $C_n$. Every character of $C_n$ is of this type and $\widehat{C_n}$ is cyclic of order $n$.

Proof: The first part is clear. Moreover distinct residue classes give distinct characters. Since $\chi_a \chi_b = \chi_{a+b}$, the characters $\chi_a$ form a group which is isomorphic to the integers modulo $n$, and hence it is cyclic of order $n$. Finally if $\chi \in \widehat{C_n}$ then $\chi^n(g) = 1$, so $\chi(g) = e(a/n)$ for some $a$. But then $\chi(g^t) = e(at/n)$, and so $\chi = \chi_a$.

Lemma 3.2 Let $G = G_1 \times G_2$ be a direct product of abelian groups $G_1, G_2$. Then

$$\widehat{G} \cong \widehat{G_1} \times \widehat{G_2}.$$

Proof: To every $\chi_1 \in \widehat{G_1}$ and $\chi_2 \in \widehat{G_2}$ we associate the map $\chi : G \to \mathbb{C}^*$, with

$$\chi(x_1, x_2) = \chi_1(x_1)\chi_2(x_2).$$

It is easy to check that $\chi$ is a character of $G$ and in fact the map

$$(\chi_1, \chi_2) \mapsto \chi$$

is an isomorphism $\widehat{G_1} \times \widehat{G_2} \to \widehat{G}$. To check surjectivity we note that if $\chi \in \widehat{G}$ then

$$\chi(x_1, x_2) = \chi(x_1, 1)\chi(1, x_2) = \chi_1(x_1)\chi_2(x_2),$$

with $\chi_1(x) = \chi(x, 1)$ and $\chi_2(x) = \chi(1, x)$. Clearly $\chi_i \in \widehat{G_i}$ for $i = 1, 2$.

Now any finite abelian group $G$ is isomorphic to

$$C_{n_1} \times C_{n_2} \times \cdots \times C_{n_k},$$

for cyclic groups $C_{n_i}$. Applying Lemma 3.1 and Lemma 3.2 repeatedly, we obtain:

Theorem A For any finite abelian group $G$ we have $\widehat{G} \cong G$.

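Given a group $G$ we let $\chi_0$ be the trivial character. The following facts will be used frequently.

Theorem B Let $G$ be a finite abelian group of order $n$. Then we have the orthogonality relations:

(a) For any $\chi \in \widehat{G}$,

$$\sum_{x \in G} \chi(x) = \begin{cases} n, & \text{if } \chi = \chi_0, \\ 0, & \text{if } \chi \neq \chi_0. \end{cases}$$

(b) For any $x \in G$,

$$\sum_{\chi \in \widehat{G}} \chi(x) = \begin{cases} n, & \text{if } x = 1, \\ 0, & \text{if } x \neq 1. \end{cases}$$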

Proof: For part (a) we note that the formula is obvious if $\chi = \chi_0$. If $\chi \neq \chi_0$ choose $y \in G$ such that $\chi(y) \neq 1$. Then

$$\chi(y) \sum_{x \in G} \chi(x) = \sum_{x \in G} \chi(xy) = \sum_{x \in G} \chi(x),$$

whence the result. For part (b) apply part (a) to $\widehat{G}$, which is a finite abelian group of order $n$ by Theorem A.

We now focus on the two abelian groups $(\mathbb{F}_q, +)$ and $(\mathbb{F}_q^*, \times)$, for a finite field $\mathbb{F}_q$, which we term additive and multiplicative, respectively.

3.1 Additive Characters of $\mathbb{F}_q$

It is easy to give a uniform description of additive characters. If $\mathbb{F}_q$ is a finite field with $q = p^f$ then we have an isomorphism of groups $\mathbb{F}_q \cong (\mathbb{F}_p)^f$. Mimicking the proof of Lemma 3.1 one can show:

Lemma 3.3 Let $\mathbb{F}_q$ be a finite field of characteristic $p$. There is an isomorphism

$$\mathbb{F}_q \to \widehat{\mathbb{F}_q}, \qquad a \mapsto \psi_a,$$

where $\psi_a$ is the character

$$\psi_a(x) = e\left(\frac{\mathrm{Tr}_{\mathbb{F}_q/\mathbb{F}_p}(ax)}{p}\right).$$

More generally, if $\psi$ is an additive character of $\mathbb{F}_q$, then the map

$$x \mapsto \psi(\mathrm{Tr}_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x))$$

is a character of $\mathbb{F}_{q^n}$.

Proof: That these maps are characters follows from linearity of the trace in Lemma 2.3 (i). Moreover, by Theorem A we have $\#\widehat{\mathbb{F}_q} = \#\mathbb{F}_q = q$. Since $a \neq a'$ implies that $\psi_a \neq \psi_{a'}$, it follows that $\psi_a$ runs through all additive characters of $\mathbb{F}_q$ as $a$ runs through $\mathbb{F}_q$.

3.2 Multiplicative Characters of F∗q

By Theorem 2.B and Theorem A it follows that $\widehat{\mathbb{F}_q^*}$ is cyclic of order $q - 1$. However it is still quite a complicated object to describe, since it is hard to give an explicit isomorphism $\mathbb{F}_q^* \cong \mathbb{Z}/(q-1)\mathbb{Z}$. Nonetheless we know every $\chi \in \widehat{\mathbb{F}_q^*}$ will have

$$\chi^{q-1} = \chi_0,$$

where $\chi_0$ is the trivial (or principal) character. We say $\chi$ is of order $d$ if

$$\chi^d = \chi_0,$$

and if $d$ is the least positive integer with this property. Clearly $d \mid q - 1$. By the structure of cyclic groups, the characters of order dividing $d$ form a subgroup of order $d$. As an analogue of Lemma 3.3 we have:

Lemma 3.4 If $\chi$ is a character of $\mathbb{F}_q^*$ then the map

$$x \mapsto \chi(N_{\mathbb{F}_{q^n}/\mathbb{F}_q}(x))$$

is a character of $\mathbb{F}_{q^n}^*$, with order equal to the order of $\chi$.

Proof: The first part is clear, and the second part follows from the surjectivity of the norm map in Lemma 2.3(ii).

It is convenient to extend the definition of a character $\chi$ to all of $\mathbb{F}_q$ by setting

$$\chi(0) = \begin{cases} 1, & \chi = \chi_0, \\ 0, & \chi \neq \chi_0. \end{cases}$$

Exercise 3 Let $\mathbb{F}_q$ be a finite field with $q$ elements and let $d \mid q - 1$. For any $x \in \mathbb{F}_q$ show that

$$\sum_{\substack{\chi \in \widehat{\mathbb{F}_q^*} \\ \chi^d = \chi_0}} \chi(x) = \#\{y \in \mathbb{F}_q : y^d = x\}.$$

This sort of identity is very useful in analytic number theory to detect $d$th powers.

4 Gauss and Jacobi Sums

Let $\mathbb{F}_q$ be a finite field with $q$ elements. Throughout this section we will let $\psi$ be an additive character of $\mathbb{F}_q$ and $\chi$ will always be a multiplicative character of $\mathbb{F}_q$. We will reserve $\psi_0$ (resp. $\chi_0$) for the trivial additive (resp. multiplicative) character of $\mathbb{F}_q$. In Section 1 we met quadratic Gauss sums. We begin by discussing their generalisation.

4.1 Gauss Sums

The Gauss sum associated to ψ and χ is

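$$g(\chi, \psi) = \sum_{x \in \mathbb{F}_q} \chi(x)\psi(x).$$

By Theorem 3.B we have

$$g(\chi_0, \psi) = 0 \text{ if } \psi \neq \psi_0, \qquad g(\chi, \psi_0) = 0 \text{ if } \chi \neq \chi_0, \qquad g(\chi_0, \psi_0) = q.$$

If $q$ is odd and $\chi = \chi_2$ is a non-trivial character of order 2 on $\mathbb{F}_q^*$, then for $\psi \neq \psi_0$ we have

$$g(\chi_2, \psi) = \sum_{x \in \mathbb{F}_q} \left( \sum_{y^2 = x} 1 - 1 \right) \psi(x) = \sum_{x \in \mathbb{F}_q} \psi(x^2),$$

by Exercise 3. If $q = p$ and $\psi(x) = e(x/p)$ then we retrieve the usual Gauss sum from Section 1 since then

$$\chi_2(x) = \left(\frac{x}{p}\right).$$

Indeed $\chi_2$ (being of order 2) is trivial on the subgroup $(\mathbb{F}_p^*)^2$ of squares $x^2$, with $x \in \mathbb{F}_p^*$, and maps non-squares to $-1$ since $\chi_2$ must factor

$$\mathbb{F}_p^* \longrightarrow \mathbb{F}_p^*/(\mathbb{F}_p^*)^2 \xrightarrow{\ \sim\ } \{-1, +1\}.$$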

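If $\psi_1$ is any fixed non-trivial additive character on $\mathbb{F}_q$ and $\psi$ is any other non-trivial character, then $\psi(x) = \psi_1(ax)$ for some $a \in \mathbb{F}_q^*$ by Lemma 3.3. It is easy to see that

$$g(\chi, \psi) = \sum_{x \in \mathbb{F}_q} \chi(x)\psi_1(ax) = \sum_{y \in \mathbb{F}_q} \chi(a^{-1}y)\psi_1(y) = \overline{\chi}(a) g(\chi, \psi_1).$$

We can also compute the modulus of Gauss sums:

Theorem A If $\chi \neq \chi_0$ and $\psi \neq \psi_0$ then $|g(\chi, \psi)| = q^{1/2}$.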

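Proof: Let $g = g(\chi, \psi)$. We have

$$|g|^2 = \sum_{x \in \mathbb{F}_q} \sum_{y \in \mathbb{F}_q^*} \chi(x)\psi(x)\overline{\chi(y)\psi(y)} = \sum_{x \in \mathbb{F}_q} \sum_{y \in \mathbb{F}_q^*} \chi(xy^{-1})\psi(x - y).$$

Putting $u = xy^{-1}$ we obtain

$$|g|^2 = \sum_{y \in \mathbb{F}_q^*} \sum_{u \in \mathbb{F}_q} \chi(u)\psi(y(u - 1)),$$

where we have now isolated a pure additive inner sum. By orthogonality (Theorem 3.B) we have

$$\sum_{y \in \mathbb{F}_q^*} \psi(y(u - 1)) = \sum_{y \in \mathbb{F}_q} \psi(y(u - 1)) - 1 = \begin{cases} -1, & \text{if } u \neq 1, \\ q - 1, & \text{if } u = 1, \end{cases}$$

whence

$$|g|^2 = q - \sum_{u \in \mathbb{F}_q} \chi(u) = q,$$

as required.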
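Theorem A can be observed numerically over a prime field $\mathbb{F}_p$. The sketch below makes several assumptions that are not in the text: it works only with $q = p$ prime, it builds multiplicative characters from a brute-force primitive root $g$ via $\chi_k(g^t) = e(kt/(p-1))$, and it takes $\psi_a(x) = e(ax/p)$. All function names are made up for the illustration.

```python
import cmath

def primitive_root(p):
    """Smallest generator of F_p^*, found by brute force (fine for small p)."""
    for g in range(1, p):
        seen, y = set(), 1
        for _ in range(p - 1):
            y = y * g % p
            seen.add(y)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")

def discrete_logs(p):
    """Dictionary x -> t with g^t = x for the primitive root g above."""
    g, log, y = primitive_root(p), {}, 1
    for t in range(p - 1):
        log[y] = t
        y = y * g % p
    return log

def gauss_sum(k, a, p):
    """g(chi_k, psi_a) over F_p, using chi(0) = 1 for trivial chi and 0 otherwise."""
    log, n = discrete_logs(p), p - 1
    total = 1 if k % n == 0 else 0  # contribution of x = 0
    for x in range(1, p):
        chi = cmath.exp(2j * cmath.pi * ((k * log[x]) % n) / n)
        psi = cmath.exp(2j * cmath.pi * ((a * x) % p) / p)
        total += chi * psi
    return total

if __name__ == "__main__":
    p = 13
    for k in range(1, p - 1):
        print(k, round(abs(gauss_sum(k, 1, p)), 6), round(p ** 0.5, 6))
```

Each row should show the modulus of the Gauss sum agreeing with $\sqrt{p}$ for non-trivial $\chi_k$.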

Gauss sums are quite remarkable, being algebraic integers of modulus precisely $q^{1/2}$ which arise as a sum of roots of unity. Moreover if $\sigma$ is any field automorphism of $\mathbb{C}$ then $g(\chi, \psi)^\sigma = g(\sigma \circ \chi, \sigma \circ \psi)$.

Let $q$ be a power of a prime $p$ and let $m \in \mathbb{Z}$. A $q$-Weil number of weight $m$ is an algebraic integer $\alpha$ such that $|i(\alpha)| = q^{m/2}$ for any embedding $i : \mathbb{Q}(\alpha) \to \mathbb{C}$. A Gauss sum is an example of a $q$-Weil number of weight 1 and $p$-th roots of unity give examples of $q$-Weil numbers of weight 0.

4.2 Jacobi Sums

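Let $\mathbb{F}_q$ be a finite field and let $\chi, \lambda$ be multiplicative characters of $\mathbb{F}_q$. The Jacobi sum associated with $\chi$ and $\lambda$ is

$$J(\chi, \lambda) = \sum_{x + y = 1} \chi(x)\lambda(y).$$

To illustrate their use let $N_p$ denote the number of solutions of the equation $x^2 + y^2 = 1$ over $\mathbb{F}_p$, for $p > 2$. Then

$$N_p = \sum_{a + b = 1} \Big( \sum_{x^2 = a} 1 \Big) \Big( \sum_{y^2 = b} 1 \Big) = \sum_{a + b = 1} \left(1 + \left(\frac{a}{p}\right)\right)\left(1 + \left(\frac{b}{p}\right)\right) = p + J(\chi_2, \chi_2),$$

with $\chi_2(x) = \left(\frac{x}{p}\right)$. We need the following result: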

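Lemma 4.1 For any non-trivial multiplicative character $\chi$ we have $J(\chi, \chi^{-1}) = -\chi(-1)$.

Proof: It is clear that

$$J(\chi, \chi^{-1}) = \sum_{\substack{x + y = 1 \\ y \neq 0}} \chi\left(\frac{x}{y}\right) = \sum_{x \neq 1} \chi\left(\frac{x}{1 - x}\right).$$

Setting $z = x/(1 - x)$ we see $x = z/(1 + z)$ if $z \neq -1$, whence

$$J(\chi, \chi^{-1}) = \sum_{z \neq -1} \chi(z) = -\chi(-1),$$

as claimed.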

It follows from Lemma 4.1 that

$$N_p = p - \left(\frac{-1}{p}\right) = \begin{cases} p - 1, & \text{if } p \equiv 1 \pmod 4, \\ p + 1, & \text{if } p \equiv 3 \pmod 4. \end{cases}$$
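The formula for $N_p$ is simple to confirm by direct enumeration. The following sketch (with illustrative function names) counts solutions of $x^2 + y^2 = 1$ over $\mathbb{F}_p$ and compares the count with the prediction above.

```python
def circle_points(p):
    """Number of (x, y) in F_p^2 with x^2 + y^2 = 1."""
    return sum(1 for x in range(p) for y in range(p)
               if (x * x + y * y - 1) % p == 0)

def predicted_circle_points(p):
    """p - (-1/p) for an odd prime p: p - 1 if p = 1 mod 4, p + 1 if p = 3 mod 4."""
    return p - 1 if p % 4 == 1 else p + 1

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23):
        print(p, circle_points(p), predicted_circle_points(p))
```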

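It turns out that Jacobi sums can be simply expressed in terms of Gauss sums.

Theorem B Let $\chi, \lambda$ be non-trivial multiplicative characters on $\mathbb{F}_q$ with $\chi\lambda \neq \chi_0$. For any non-trivial additive character $\psi$ on $\mathbb{F}_q$ we have

$$J(\chi, \lambda) = \frac{g(\chi, \psi) g(\lambda, \psi)}{g(\chi\lambda, \psi)}.$$

Proof: Expanding the definitions of the two sums involved we get

$$J(\chi, \lambda) g(\chi\lambda, \psi) = \sum_x \sum_y \chi(x)\lambda(1 - x)\chi(y)\lambda(y)\psi(y).$$

The sum can be restricted to $x \notin \{0, 1\}$ and $y \neq 0$, since $\chi$ and $\lambda$ are non-trivial. Defining $u = xy$ and $v = y - xy$ we get a bijective change of variables from

$$(x, y) \in (\mathbb{F}_q \setminus \{0, 1\}) \times \mathbb{F}_q^*$$

to

$$\{(u, v) \in \mathbb{F}_q^* \times \mathbb{F}_q^* : u + v \neq 0\}.$$

Indeed we can recover $(x, y)$ by $y = v + u$ and $x = u/(u + v)$. Hence

$$J(\chi, \lambda) g(\chi\lambda, \psi) = \sum_{\substack{u, v \in \mathbb{F}_q^* \\ u + v \neq 0}} \chi(u)\lambda(v)\psi(u + v) = g(\chi, \psi) g(\lambda, \psi) - \sum_{u \in \mathbb{F}_q^*} \chi(u)\lambda(-u) = g(\chi, \psi) g(\lambda, \psi),$$

since $\chi\lambda \neq \chi_0$.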

It follows from Theorems A and B that $|J(\chi, \lambda)| = q^{1/2}$, and one sees that $J(\chi, \lambda)$ forms a $q$-Weil number of weight 1. As a further application of Jacobi sums we quickly derive:

Theorem C (Fermat) Let $p \equiv 1 \pmod 4$ be a prime. Then there exist $a, b \in \mathbb{Z}$ such that $p = a^2 + b^2$.

Proof: Let $\chi_2$ be the Legendre character of order 2 on $\mathbb{F}_p^*$ and let $\chi$ be a character of order 4. By Theorem B we have $|J|^2 = p$, with $J = J(\chi, \chi_2)$. But $J$ is a sum of terms $\chi(x)\chi_2(y)$, with

$$\chi(x) \in \{\pm 1, 0, \pm i\}, \qquad \chi_2(y) \in \{0, \pm 1\}.$$

Hence $J = a + bi$ for $a, b \in \mathbb{Z}$ and so $p = |J|^2 = a^2 + b^2$.
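The proof is constructive enough to run. The sketch below computes $J(\chi, \chi_2)$ exactly as a Gaussian integer, under assumptions made only for this illustration: the order-4 character is taken to be $\chi(g^t) = i^t$ for a brute-force primitive root $g$, and $\chi_2(g^t) = (-1)^t$. A different choice of $g$ or of $\chi$ may change the signs or the order of $a$ and $b$. The function names are hypothetical.

```python
def primitive_root(p):
    """Smallest generator of F_p^*, found by brute force (fine for small p)."""
    for g in range(1, p):
        seen, y = set(), 1
        for _ in range(p - 1):
            y = y * g % p
            seen.add(y)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")

def jacobi_two_squares(p):
    """Return integers (a, b) with J(chi, chi_2) = a + b*i for a prime p = 1 mod 4."""
    if p % 4 != 1:
        raise ValueError("p must be congruent to 1 modulo 4")
    g, log, y = primitive_root(p), {}, 1
    for t in range(p - 1):
        log[y] = t
        y = y * g % p
    powers_of_i = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # i^0, i^1, i^2, i^3
    a = b = 0
    # x = 0 and x = 1 contribute nothing, since chi(0) = 0 and chi_2(0) = 0.
    for x in range(2, p):
        y = (1 - x) % p
        re, im = powers_of_i[log[x] % 4]          # chi(x) = i^(log x)
        sign = 1 if log[y] % 2 == 0 else -1       # chi_2(y) = (-1)^(log y)
        a += sign * re
        b += sign * im
    return a, b

if __name__ == "__main__":
    for p in (5, 13, 17, 29, 37, 41):
        a, b = jacobi_two_squares(p)
        print(p, a, b, a * a + b * b)
```

The last column should reproduce $p$ in every row.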

Exercise 4 Let $q$ be odd and let $\chi_2$ be the non-trivial multiplicative character of order 2 on $\mathbb{F}_q$. Let $\chi, \psi$ be multiplicative, additive characters on $\mathbb{F}_q$, respectively. Use Theorem 4.B to show that

$$g(\chi^2, \psi) g(\chi_2, \psi) = \chi(4) g(\chi, \psi) g(\chi\chi_2, \psi).$$

4.3 Salié Sums

One can generalise the notion of Kloosterman sums from Section 1 by defining

$$K(\psi, \eta) = \sum_{x \in \mathbb{F}_q^*} \psi(x)\eta(x^{-1}),$$

for any additive characters $\psi, \eta$ of $\mathbb{F}_q$. A related (but easier) sum is the Salié sum

$$T(\psi, \eta) = \sum_{x \in \mathbb{F}_q^*} \chi_2(x)\psi(x)\eta(x^{-1}),$$

where $\psi, \eta$ are additive characters of $\mathbb{F}_q$ and $\chi_2$ is the non-trivial multiplicative character of order 2 of $\mathbb{F}_q^*$.

Theorem D (Salié) Assume $\psi, \eta \neq \psi_0$. Then we have

$$T(\psi, \eta) = g(\chi_2, \psi) \sum_{y^2 = 4a} \psi(y),$$

where $a \in \mathbb{F}_q^*$ is such that $\eta(x) = \psi(ax)$ for all $x \in \mathbb{F}_q$.

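Proof: The idea is to study the variation of the function

$$\varphi(b) = T(\psi_b, \eta) = \sum_{x \in \mathbb{F}_q^*} \chi_2(x)\psi(bx + ax^{-1}),$$

where $\psi_b(x) = \psi(bx)$. We represent this function using the discrete multiplicative Fourier expansion

$$\varphi(b) = \sum_\chi \widehat{\varphi}(\chi)\chi(b),$$

where $\chi$ runs over multiplicative characters of $\mathbb{F}_q^*$ and

$$\widehat{\varphi}(\chi) = \frac{1}{q - 1} \sum_{b \in \mathbb{F}_q^*} \varphi(b)\overline{\chi}(b).$$

This expression follows from orthogonality (Theorem 3.B).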

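We now compute the Fourier coefficients

$$\widehat{\varphi}(\chi) = \frac{1}{q - 1} \sum_{b \in \mathbb{F}_q^*} \overline{\chi}(b) \sum_{x \in \mathbb{F}_q^*} \chi_2(x)\psi(bx + ax^{-1}) = \frac{1}{q - 1} \sum_{x \in \mathbb{F}_q^*} \chi_2(x)\psi(ax^{-1}) \sum_{b \in \mathbb{F}_q^*} \overline{\chi}(b)\psi(bx) = \frac{g(\overline{\chi}, \psi)}{q - 1} \sum_{x \in \mathbb{F}_q^*} \chi_2(x)\chi(x)\psi(ax^{-1}),$$

by the property of Gauss sums recorded before Theorem A. Reapplying this property we deduce that

$$\widehat{\varphi}(\chi) = \frac{g(\overline{\chi}, \psi) g(\chi_2\overline{\chi}, \psi_a)}{q - 1} = \frac{\chi_2(a)\chi(a) g(\overline{\chi}, \psi) g(\chi_2\overline{\chi}, \psi)}{q - 1}.$$

An application of Exercise 4 now yields

$$g(\overline{\chi}, \psi) g(\chi_2\overline{\chi}, \psi) = \chi(4) g(\overline{\chi}^2, \psi) g(\chi_2, \psi),$$

whence

$$\widehat{\varphi}(\chi) = \frac{\chi_2(a)\chi(4)\chi(a) g(\overline{\chi}^2, \psi) g(\chi_2, \psi)}{q - 1}.$$

Finally we see that

$$T(\psi, \eta) = \varphi(1) = \sum_\chi \widehat{\varphi}(\chi) = \frac{\chi_2(a) g(\chi_2, \psi)}{q - 1} \sum_\chi \chi(4a) g(\overline{\chi}^2, \psi).$$

Opening up the inner Gauss sum we obtain

$$T(\psi, \eta) = \frac{\chi_2(a) g(\chi_2, \psi)}{q - 1} \sum_\chi \sum_{x \in \mathbb{F}_q^*} \chi(4ax^{-2})\psi(x) = \chi_2(a) g(\chi_2, \psi) \sum_{\substack{x \in \mathbb{F}_q^* \\ 4a = x^2}} \psi(x),$$

by orthogonality. Finally we can remove the factor $\chi_2(a)$, since if $\chi_2(a) = -1$ then the inner sum is zero anyway.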

Since $a \neq 0$ the equation $x^2 = 4a$ has either 0 or 2 solutions in $\mathbb{F}_q$. Since $g(\chi_2, \psi)$ is a $q$-Weil number of weight 1 we deduce that $T(\psi, \eta)$ is a sum of two $q$-Weil numbers of weight 1, with

$$|T(\psi, \eta)| \leq 2\sqrt{q}.$$
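Salié's formula can be tested numerically over a prime field. The sketch below assumes $q = p$ is an odd prime, takes $\psi(x) = e(cx/p)$ with $c \neq 0$ and $\eta(x) = \psi(ax)$, and uses the Legendre symbol for $\chi_2$; the function names are invented for this example and the comparison is in floating point.

```python
import cmath

def legendre(x, p):
    """Legendre symbol (x/p) for an odd prime p, via Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

def e_p(n, p):
    """exp(2*pi*i*n/p), with n reduced modulo p first."""
    return cmath.exp(2j * cmath.pi * (n % p) / p)

def salie_sum(c, a, p):
    """T(psi, eta) = sum over x != 0 of chi_2(x) psi(x) eta(1/x), psi(x) = e(c*x/p), eta(x) = psi(a*x)."""
    total = 0
    for x in range(1, p):
        x_inv = pow(x, p - 2, p)
        total += legendre(x, p) * e_p(c * x + c * a * x_inv, p)
    return total

def salie_formula(c, a, p):
    """Right-hand side of Theorem D: g(chi_2, psi) times the sum of psi(y) over y^2 = 4a."""
    gauss = sum(legendre(x, p) * e_p(c * x, p) for x in range(p))
    roots = sum(e_p(c * y, p) for y in range(p) if (y * y - 4 * a) % p == 0)
    return gauss * roots

if __name__ == "__main__":
    p = 11
    for a in range(1, p):
        print(a, salie_sum(1, a, p), salie_formula(1, a, p))
```

When $4a$ is a non-residue both columns should be zero up to rounding, matching the last remark in the proof.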


5 Equations over Finite Fields

Given a field $F$ we write $\mathbb{A}^n(F)$ for the set of vectors $x = (x_1, \ldots, x_n)$, with $x_i \in F$ for $1 \leq i \leq n$. The set $\mathbb{A}^n(F)$ is affine $n$-space over $F$ and can be considered as a vector space by defining addition and scalar multiplication in the usual way.

On the set $\mathbb{A}^{n+1}(F) \setminus \{0\}$ we define the equivalence relation

$$(x_0, x_1, \ldots, x_n) \sim (y_0, y_1, \ldots, y_n)$$

if and only if there exists $\gamma \in F^*$ such that

$$x_i = \gamma y_i,$$

for $0 \leq i \leq n$. We define projective $n$-space $\mathbb{P}^n(F)$ to be the set of equivalence classes. If $x \in \mathbb{A}^{n+1}(F)$, with $x \neq 0$, then $[x]$ will denote the corresponding equivalence class in $\mathbb{P}^n(F)$. (Note that the points of $\mathbb{P}^n(F)$ are in bijection with the lines in $\mathbb{A}^{n+1}(F)$ passing through the origin.) Points in $\mathbb{P}^n(F)$ represented by $x = (x_0, x_1, \ldots, x_n)$ with $x_0 \neq 0$ are called the finite points. Points in $\mathbb{P}^n(F)$ represented by $(0, x_1, \ldots, x_n)$ are called the points at infinity.

Lemma 5.1 There is a bijection between finite points of $\mathbb{P}^n(F)$ and $\mathbb{A}^n(F)$.

Proof: This follows on noting that every finite point can be uniquely represented by some $(1, x_1, \ldots, x_n)$.

Thus $\mathbb{P}^n(F)$ has 2 pieces: a copy of $\mathbb{A}^n(F)$ (the finite points) and a copy of $\mathbb{P}^{n-1}(F)$ (the points at infinity).

Henceforth let $F = \mathbb{F}_q$ be a finite field with $q$ elements. We will be concerned in this section with the number of affine/projective zeros of polynomials with coefficients in $\mathbb{F}_q$.

Lemma 5.2 We have $\#\mathbb{A}^n(\mathbb{F}_q) = q^n$ and $\#\mathbb{P}^n(\mathbb{F}_q) = q^n + q^{n-1} + \cdots + 1$.

Proof: The first part is trivial and the second part follows on noting that $\mathbb{A}^{n+1}(\mathbb{F}_q) \setminus \{0\}$ has $q^{n+1} - 1$ elements and each equivalence class has $\#\mathbb{F}_q^* = q - 1$ elements.

Given a polynomial

$$f(X) = f(X_1, \ldots, X_n) = \sum_{i_1 + \cdots + i_n \leq d} a_{i_1, \ldots, i_n} X_1^{i_1} \cdots X_n^{i_n},$$

with coefficients in $\mathbb{F}_q$, we will be interested in

$$N_f = \#\{x \in \mathbb{A}^n(\mathbb{F}_q) : f(x) = 0\}.$$

The equation $f(X) = 0$ defines a hypersurface in affine $n$-space. Likewise, given a homogeneous polynomial (a form)

$$F(X) = F(X_0, \ldots, X_n) = \sum_{i_0 + \cdots + i_n = d} a_{i_0, \ldots, i_n} X_0^{i_0} \cdots X_n^{i_n},$$

with coefficients in $\mathbb{F}_q$, we will be interested in

$$N_F^* = \#\{[x] \in \mathbb{P}^n(\mathbb{F}_q) : F(x) = 0\}.$$

The equation $F(X) = 0$ defines a hypersurface in projective $n$-space.

5.1 Crude Estimates

Let $f \in \mathbb{F}_q[X_1, \dots, X_n]$ be a non-zero polynomial of degree $d$. Then we have:

Theorem A [Lang-Weil] $N_f \le dq^{n-1}$.

Proof: If $d = 0$ then trivially $N_f = 0$. If $d = 1$ then

\[ f(X_1, \dots, X_n) = a_1X_1 + \cdots + a_nX_n + c, \]

and it is clear that $N_f = q^{n-1}$. If $n = 1$ then clearly $N_f \le d$. Thus the result holds if $n = 1$ or if $d \le 1$. We proceed by double induction. Suppose $n > 1$ and $d > 1$. There are two cases:

Case (i): there exists $x \in \mathbb{F}_q$ such that $X_1 - x$ divides $f(X)$. Then $f(X) = (X_1 - x)g(X)$, where $g$ is a non-zero polynomial in at most $n$ variables of degree $d - 1$. Every zero of $f$ is a zero of $X_1 - x$ (of which there are $q^{n-1}$) or a zero of $g$, so the induction hypothesis implies that

\[ N_f \le q^{n-1} + (d-1)q^{n-1} = dq^{n-1}. \]

Case (ii): there is no $x \in \mathbb{F}_q$ such that $X_1 - x$ divides $f(X)$. Then for any $x \in \mathbb{F}_q$,

\[ g_x(X_2, \dots, X_n) = f(x, X_2, \dots, X_n) \]

is a non-zero polynomial in $n - 1$ variables of degree at most $d$. Hence the induction hypothesis gives $N_{g_x} \le dq^{n-2}$ for each of the $q$ values of $x$, and so

\[ N_f \le \sum_{x \in \mathbb{F}_q} N_{g_x} \le dq^{n-1}. \]

This completes the proof of the result.
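The bound of Theorem A is easy to test by brute force over a small prime field. The sketch below is only an illustration: the helper names (`count_zeros`, `random_polynomial`, `total_degree`) and the representation of a polynomial as a dictionary from exponent tuples to coefficients are assumptions, not part of the notes.

```python
from itertools import product
import random


def count_zeros(coeffs, n, p):
    """N_f over F_p for f given as {exponent tuple: coefficient mod p}."""
    count = 0
    for x in product(range(p), repeat=n):
        value = 0
        for exps, a in coeffs.items():
            term = a
            for xi, e in zip(x, exps):
                term = term * pow(xi, e, p) % p
            value = (value + term) % p
        if value == 0:
            count += 1
    return count


def random_polynomial(n, d, p, rng):
    """A random non-zero polynomial in n variables of degree at most d."""
    exps_list = [e for e in product(range(d + 1), repeat=n) if sum(e) <= d]
    while True:
        coeffs = {e: rng.randrange(p) for e in exps_list}
        coeffs = {e: a for e, a in coeffs.items() if a != 0}
        if coeffs:
            return coeffs


def total_degree(coeffs):
    return max(sum(e) for e in coeffs)


if __name__ == "__main__":
    rng = random.Random(1)
    p, n = 5, 2
    for _ in range(5):
        f = random_polynomial(n, 3, p, rng)
        d = total_degree(f)
        print(d, count_zeros(f, n, p), d * p ** (n - 1))
```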

Given $f \in \mathbb{F}_q[X_1, \dots, X_n]$ as above we may introduce the polynomial

\[ f^*(X_0, \dots, X_n) = X_0^d f(X_1/X_0, \dots, X_n/X_0), \]

which will be homogeneous of degree $d$.

Lemma 5.3: We have $N_f \le N^*_{f^*} \le N_f + dq^{n-2}\left(1 - \frac{1}{q}\right)^{-1}$.

Proof: It is clear that $N^*_{f^*} = N_f + N^*_F$, where

\[ F(X_1, \dots, X_n) = f^*(0, X_1, \dots, X_n). \]

In particular $N^*_{f^*} \ge N_f$. Moreover Theorem A implies that $N_F \le dq^{n-1}$. Since any zeros counted by $N^*_F$ are considered the same when they are proportional, it follows that

\[ N^*_{f^*} \le N_f + \frac{dq^{n-1}}{q-1}, \]

as required.

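Let us proceed to discuss how $N_f$ behaves on average. Given $d$ let

\[ \Omega_d = \{ f \in \mathbb{F}_q[X_1, \dots, X_n] : \deg f \le d \}, \qquad \omega_d = \{ (i_1, \dots, i_n) \in \mathbb{Z}_{\ge 0}^n : i_1 + \cdots + i_n \le d \}. \]

Then $f \in \Omega_d$ if and only if

\[ f(X) = \sum_{(i_1,\dots,i_n) \in \omega_d} a_{i_1,\dots,i_n} X_1^{i_1} \cdots X_n^{i_n}, \]

for $a_{i_1,\dots,i_n} \in \mathbb{F}_q$. It is clear that $\#\Omega_d = q^{\#\omega_d}$, with

\[ \#\omega_d = \binom{n+d}{d}. \]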

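Theorem B We have

\[ \frac{1}{\#\Omega_d} \sum_{f \in \Omega_d} N_f = q^{n-1}. \]

Proof: Clearly, on reordering the summation, we have

\[ \sum_{f \in \Omega_d} N_f = \sum_{x \in \mathbb{F}_q^n} \sum_{\substack{f \in \Omega_d \\ f(x) = 0}} 1 = \sum_{x \in \mathbb{F}_q^n} q^{\#\omega_d - 1}, \]

since for fixed $x$, the equation $f(x) = 0$ is linear in the coefficients. But then the result follows on noting that $\#\mathbb{F}_q^n = q^n$.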

Exercise 5: Apply Theorem B to show that

\[ \frac{1}{\#\Omega_d} \sum_{f \in \Omega_d} (N_f - q^{n-1})^2 = q^{n-1} - q^{n-2}. \]

Exercise 5 shows that the average value of $(N_f - q^{n-1})^2$ is $q^{n-1} - q^{n-2} = O(q^{n-1})$. One might therefore expect that for "most" $f$ one has

\[ N_f = q^{n-1} + O(q^{\frac{n-1}{2}}). \]
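The averages in Theorem B and Exercise 5 can be confirmed numerically by enumerating every polynomial of degree at most d over a small prime field. This is a minimal sketch under the assumption that q = p is prime; the function names are hypothetical, and the enumeration is only feasible for very small parameters.

```python
from itertools import product


def monomials(n, d):
    """Exponent tuples (i_1, ..., i_n) with i_1 + ... + i_n <= d."""
    return [e for e in product(range(d + 1), repeat=n) if sum(e) <= d]


def zero_counts(n, d, p):
    """N_f for every f in Omega_d over F_p (including f = 0)."""
    exps = monomials(n, d)
    # Value of each monomial at each point of F_p^n.
    table = []
    for x in product(range(p), repeat=n):
        row = []
        for e in exps:
            v = 1
            for xi, ei in zip(x, e):
                v = v * pow(xi, ei, p) % p
            row.append(v)
        table.append(row)
    counts = []
    for coeffs in product(range(p), repeat=len(exps)):
        n_f = sum(
            1 for row in table
            if sum(a * m for a, m in zip(coeffs, row)) % p == 0
        )
        counts.append(n_f)
    return counts


def mean_and_variance(n, d, p):
    """Average of N_f and of (N_f - p^(n-1))^2 over Omega_d."""
    counts = zero_counts(n, d, p)
    mean = sum(counts) / len(counts)
    var = sum((c - p ** (n - 1)) ** 2 for c in counts) / len(counts)
    return mean, var


if __name__ == "__main__":
    print(mean_and_variance(2, 2, 3))
```

For n = 2, d = 2, p = 3 the two averages should agree with q^{n-1} = 3 and q^{n-1} - q^{n-2} = 2.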

We proceed to discuss some examples where this sort of estimate can actually be proved.

5.2 Quadric Hypersurfaces

Assume $\mathbb{F}_q$ is a finite field with $q$ odd. A quadratic form over $\mathbb{F}_q$ is given by

\[ Q(X) = Q(X_1, \dots, X_n) = \sum_{1 \le i,j \le n} a_{ij}X_iX_j, \]

with $a_{ij} = a_{ji} \in \mathbb{F}_q$. We have $Q(X) = X^TAX$, where $A = (a_{ij})_{1 \le i,j \le n}$ is a symmetric $n \times n$ matrix. We define the determinant of $Q$ to be $\det Q = \det A$.

Two quadratic forms $Q_1, Q_2$ over $\mathbb{F}_q$ are said to be equivalent, written $Q_1(X) \sim Q_2(X)$, if there is a non-singular matrix $T$ such that $Q_1(X) = Q_2(TX)$. If $Q_1(X) \sim Q_2(X)$, then $\det Q_1 = \det Q_2 (\det T)^2$. In particular, if $Q_1$ is non-singular and $Q_1(X) \sim Q_2(X)$, then $Q_2$ is also non-singular and $\det Q_1 / \det Q_2$ is a square in $\mathbb{F}_q^*$.

We say $Q$ represents $a \in \mathbb{F}_q^*$ if there exists $x \in \mathbb{F}_q^n$ such that $Q(x) = a$. We say $Q$ represents zero (or $Q$ is isotropic) if there exists $x \in \mathbb{F}_q^n$, with $x \neq 0$, such that $Q(x) = 0$.

Lemma 5.4 Suppose $Q$ represents $a \in \mathbb{F}_q^*$. Then $Q(X) \sim aX_1^2 + P(X_2, \dots, X_n)$ for some quadratic form $P(X_2, \dots, X_n)$ over $\mathbb{F}_q$.

Proof: Let $A$ be the underlying matrix of $Q$. Then there exists $x \in \mathbb{F}_q^n$ such that $x^TAx = a$. Since $x \neq 0$ we may find a non-singular matrix

\[ B = \begin{pmatrix} x_1 & c_{12} & \cdots & c_{1n} \\ \vdots & \vdots & & \vdots \\ x_n & c_{n2} & \cdots & c_{nn} \end{pmatrix} \]

with $c_{ij} \in \mathbb{F}_q$. Now $Q(BX) = X^TB^TABX$, and the entry in the upper left corner of $B^TAB$ is $x^TAx = a$. Hence there exist $b_2, \dots, b_n \in \mathbb{F}_q$ such that

\[ Q(X) \sim aX_1^2 + 2b_2X_1X_2 + \cdots + 2b_nX_1X_n + h(X_2, \dots, X_n). \]

After completing the square on the right hand side, one is easily led to the statement of the lemma.

Lemma 5.5 If a non-singular quadratic form represents zero then it represents every element of $\mathbb{F}_q$.

Proof: Equivalent forms clearly represent the same elements of $\mathbb{F}_q$. By linear algebra every quadratic form is equivalent to a diagonal form. (This can also be seen by applying Lemma 5.4 and induction.) It follows we may assume that we are given a diagonal quadratic form

\[ Q(X) = \sum_{i=1}^n a_iX_i^2, \]

with $a_i \in \mathbb{F}_q^*$ for $1 \le i \le n$. By assumption there exists $x \in \mathbb{F}_q^n \setminus \{0\}$ such that $Q(x) = 0$. Without loss of generality we may assume $x_1 \neq 0$. Put $y_1 = x_1(1 + t)$ and $y_i = x_i(1 - t)$ for $2 \le i \le n$, for a parameter $t \in \mathbb{F}_q$. Then

\[ Q(y) = 2t(a_1x_1^2 - a_2x_2^2 - \cdots - a_nx_n^2) = 4ta_1x_1^2. \]

Clearly, given $a \in \mathbb{F}_q^*$, we deduce that $Q$ represents $a$ on choosing $t = a/(4a_1x_1^2)$.

Suppose now that $Q \in \mathbb{F}_q[X_1, \dots, X_n]$ is a non-singular quadratic form with $n \ge 3$. By Chevalley–Warning (Theorem 2.D) we know $Q$ represents zero. Hence $Q$ represents $1 \in \mathbb{F}_q$ by Lemma 5.5. Hence $Q(X) \sim X_1^2 + P(X_2, \dots, X_n)$ by Lemma 5.4, for some quadratic form $P$. Hence there exist $x_1, \dots, x_n \in \mathbb{F}_q$, not all zero, with

\[ x_1^2 + P(x_2, \dots, x_n) = 0. \]

If $x_1 \neq 0$ then $P$ represents $-1$. If $x_1 = 0$ then $P$ represents $0$, and so represents $-1$ by Lemma 5.5. Thus Lemma 5.4 yields

\[ Q(X) \sim X_1^2 - X_2^2 + R(X_3, \dots, X_n) \sim X_1X_2 + R(X_3, \dots, X_n), \]

for some non-singular quadratic form $R$ in $n - 2$ variables.

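We may use these facts to study the cardinality $N_Q = \#\{x \in \mathbb{F}_q^n : Q(x) = 0\}$. For $d \in \mathbb{F}_q^*$ we introduce the symbol

\[ \left(\frac{d}{q}\right) = \begin{cases} +1 & \text{if } d \in (\mathbb{F}_q^*)^2, \\ -1 & \text{if } d \notin (\mathbb{F}_q^*)^2. \end{cases} \]

By our remarks above we have

\[ \left(\frac{\det Q_1}{q}\right) = \left(\frac{\det Q_2}{q}\right) \]

for any non-singular equivalent quadratic forms $Q_1, Q_2$. We have the following result:

Theorem C Let $Q \in \mathbb{F}_q[X_1, \dots, X_n]$ be a non-singular quadratic form with $\det Q = \Delta$ and $n \ge 3$. Then

\[ N_Q = \begin{cases} q^{n-1} & n \text{ odd}, \\ q^{n-1} + (q-1)q^{\frac{n-2}{2}} \left(\dfrac{(-1)^{\frac{n}{2}}\Delta}{q}\right) & n \text{ even}. \end{cases} \]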

Proof: By our work above we may assume that

\[ Q(X) = X_1X_2 + R(X_3, \dots, X_n), \]

for a non-singular quadratic form $R$. There are $2q - 1$ choices for $x_1, x_2 \in \mathbb{F}_q$ such that $x_1x_2 = 0$. Hence the number of $x$ counted by $N_Q$ for which $R(x_3, \dots, x_n) = 0$ is

\[ (2q - 1)N_R. \]

Likewise given $x_3, \dots, x_n$ such that $R(x_3, \dots, x_n) \neq 0$, there are $q - 1$ choices of $x_1, x_2$ such that $Q(x) = 0$. Hence the number of $x$ counted by $N_Q$ for which $R(x_3, \dots, x_n) \neq 0$ is

\[ (q - 1)(q^{n-2} - N_R). \]

Adding these together we find that

\[ N_Q = q^{n-1} - q^{n-2} + qN_R. \]

Suppose that $n$ is odd. If $n = 1$ then $N_Q = 1$. If $n \ge 3$ then induction implies that

\[ N_Q = q^{n-1} - q^{n-2} + q \cdot q^{n-3} = q^{n-1}, \]

as required.

Suppose now that $n$ is even. If $n = 2$ then there exist $a_1, a_2 \in \mathbb{F}_q^*$ such that $Q(X_1, X_2) \sim a_1(X_1^2 + (a_2/a_1)X_2^2)$. Moreover

\[ \left(\frac{-\Delta}{q}\right) = \left(\frac{-a_1a_2}{q}\right). \]

If $\left(\frac{-\Delta}{q}\right) = -1$ then $\left(\frac{-(a_2/a_1)}{q}\right) = -1$ and so $Q(X_1, X_2)$ has only the trivial zero, so that $N_Q = 1$. If $\left(\frac{-\Delta}{q}\right) = 1$ then $\left(\frac{-(a_2/a_1)}{q}\right) = 1$ and so there exist $2(q - 1)$ non-trivial solutions to $Q(X_1, X_2) = 0$. Hence $N_Q = 1 + 2(q - 1) = 2q - 1$ in this case. If $n \ge 4$ then induction reveals that

\[ N_Q = q^{n-1} - q^{n-2} + q\left( q^{n-3} + (q-1)q^{\frac{n-4}{2}} \left(\frac{(-1)^{\frac{n-2}{2}}(-\Delta)}{q}\right) \right) = q^{n-1} + (q-1)q^{\frac{n-2}{2}} \left(\frac{(-1)^{\frac{n}{2}}\Delta}{q}\right), \]

as required.
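Theorem C can be compared against a direct count for diagonal forms over a prime field, where the determinant is simply the product of the coefficients. The sketch below assumes q = p is an odd prime and uses Euler's criterion for the quadratic residue symbol; all function names are illustrative.

```python
from itertools import product


def legendre(d, p):
    """Quadratic residue symbol of d modulo an odd prime p (p must not divide d)."""
    return 1 if pow(d % p, (p - 1) // 2, p) == 1 else -1


def count_diagonal_quadric(a, p):
    """N_Q for Q = a_1 X_1^2 + ... + a_n X_n^2 over F_p, by brute force."""
    n = len(a)
    return sum(
        1 for x in product(range(p), repeat=n)
        if sum(ai * xi * xi for ai, xi in zip(a, x)) % p == 0
    )


def theorem_c_prediction(a, p):
    """Value of N_Q predicted by Theorem C for a diagonal form with all a_i non-zero."""
    n = len(a)
    if n % 2 == 1:
        return p ** (n - 1)
    delta = 1
    for ai in a:
        delta = delta * ai % p
    sign = legendre((-1) ** (n // 2) * delta, p)
    return p ** (n - 1) + (p - 1) * p ** ((n - 2) // 2) * sign


if __name__ == "__main__":
    for a, p in [((1, 1, 1), 5), ((1, 2, 3, 4), 5), ((1, 1, 1, 1), 7)]:
        print(a, p, count_diagonal_quadric(a, p), theorem_c_prediction(a, p))
```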

5.3 Diagonal Hypersurfaces

Given any polynomial $f \in \mathbb{F}_q[X_1, \dots, X_n]$ it follows from Theorem 3.B that

\[ N_f = \frac{1}{q} \sum_{\psi} \sum_{x \in \mathbb{F}_q^n} \psi(f(x)), \]

where the sum runs over additive characters $\psi$ of $\mathbb{F}_q$. If $\psi \neq \psi_0$ is a fixed additive character of $\mathbb{F}_q$ then by Lemma 3.3 we see that $\psi^{(a)}(x) = \psi(ax)$ runs through all additive characters as $a$ runs through $\mathbb{F}_q$. Hence

\[ N_f = \frac{1}{q} \sum_{a \in \mathbb{F}_q} \sum_{x \in \mathbb{F}_q^n} \psi(af(x)). \]

We will use this expression to calculate $N_f$ when $f$ is a diagonal form of degree $d$.

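Theorem D Assume $d \mid q - 1$ and $f(X) = a_1X_1^d + \cdots + a_nX_n^d$, with $a_i \in \mathbb{F}_q^*$ for $1 \le i \le n$. Then for any non-trivial additive character $\psi$ of $\mathbb{F}_q$ we have

\[ N_f = q^{n-1} + \left(1 - \frac{1}{q}\right) \sum_{\chi_1,\dots,\chi_n} \overline{\chi}_1(a_1) \cdots \overline{\chi}_n(a_n)\, g(\chi_1, \psi) \cdots g(\chi_n, \psi), \]

where $g(\chi_i, \psi)$ is a Gauss sum for each $1 \le i \le n$ and the sum is over multiplicative characters $\chi_1, \dots, \chi_n$ of $\mathbb{F}_q$, with $\chi_i \neq \chi_0$, such that $\chi_1 \cdots \chi_n = \chi_0$ and $\chi_i^d = \chi_0$.

Proof: By our work above we have

\[ qN_f = \sum_{a \in \mathbb{F}_q} \sum_{x \in \mathbb{F}_q^n} \psi(aa_1x_1^d + \cdots + aa_nx_n^d) = \sum_{a \in \mathbb{F}_q} \prod_{i=1}^n \sum_{x_i \in \mathbb{F}_q} \psi(aa_ix_i^d) = q^n + \sum_{a \in \mathbb{F}_q^*} \prod_{i=1}^n \sum_{x_i \in \mathbb{F}_q} \psi(aa_ix_i^d). \]

By Exercise 3 (since $d \mid q - 1$) we have

\[ \sum_{x_i \in \mathbb{F}_q} \psi(aa_ix_i^d) = \sum_{y_i \in \mathbb{F}_q} \psi(aa_iy_i) \sum_{\chi_i^d = \chi_0} \chi_i(y_i). \]

If $a \neq 0$ we may make the change of variables $y_i \mapsto y_i/(aa_i)$, to obtain

\[ \sum_{x_i \in \mathbb{F}_q} \psi(aa_ix_i^d) = \sum_{\chi_i^d = \chi_0} \overline{\chi}_i(aa_i) \sum_{y_i \in \mathbb{F}_q} \chi_i(y_i)\psi(y_i) = \sum_{\chi_i^d = \chi_0} \overline{\chi}_i(aa_i)\, g(\chi_i, \psi). \]

Thus

\[ qN_f - q^n = \sum_{\substack{\chi_1,\dots,\chi_n \\ \chi_i^d = \chi_0}} \overline{\chi}_1(a_1) \cdots \overline{\chi}_n(a_n)\, g(\chi_1, \psi) \cdots g(\chi_n, \psi) \sum_{a \in \mathbb{F}_q^*} \overline{\chi}_1(a) \cdots \overline{\chi}_n(a). \]

If $\chi_1 \cdots \chi_n \neq \chi_0$ then

\[ \sum_{a \in \mathbb{F}_q^*} \overline{\chi}_1(a) \cdots \overline{\chi}_n(a) = \sum_{a \in \mathbb{F}_q} \overline{\chi_1 \cdots \chi_n}(a) = 0, \]

by Theorem 3.B (orthogonality). But if $\chi_1 \cdots \chi_n = \chi_0$ then

\[ \sum_{a \in \mathbb{F}_q^*} \overline{\chi}_1(a) \cdots \overline{\chi}_n(a) = q - 1. \]

Moreover if $\chi_i = \chi_0$ then $g(\chi_i, \psi) = 0$ by §4.1. We deduce that

\[ qN_f - q^n = (q - 1) \sum_{\chi_1,\dots,\chi_n} \overline{\chi}_1(a_1) \cdots \overline{\chi}_n(a_n)\, g(\chi_1, \psi) \cdots g(\chi_n, \psi), \]

where the sum is over characters $\chi_1, \dots, \chi_n$ of $\mathbb{F}_q$ for which $\chi_i \neq \chi_0$ and $\chi_i^d = \chi_0$ and $\chi_1 \cdots \chi_n = \chi_0$. The theorem follows.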

Let $g$ be a fixed generator of the cyclic group $\mathbb{F}_q^*$ (Theorem 2.B). By Lemma 3.1 any multiplicative character $\chi_i$ of $\mathbb{F}_q$, with $\chi_i^d = \chi_0$, is of the form

\[ \chi_i(g^t) = e(b_it/d), \]

where $b_i \in [0, d)$ is an integer. In fact $b_i \neq 0$ if $\chi_i \neq \chi_0$. We have $\chi_1 \cdots \chi_n = \chi_0$ precisely when

\[ \frac{b_1}{d} + \cdots + \frac{b_n}{d} \in \mathbb{Z}. \]

Let $A(d)$ be the number of $b = (b_1, \dots, b_n) \in \mathbb{Z}^n$ such that $0 < b_i < d$ for $1 \le i \le n$ and

\[ b_1 + \cdots + b_n \equiv 0 \pmod{d}. \]

Then $A(d)$ is the number of summands in Theorem D, whence:

Corollary D.1 Let $f(X)$ be as in the statement of Theorem D, with $d \mid q - 1$. Then

\[ | N_f - q^{n-1} | \le A(d)\left(1 - \frac{1}{q}\right) q^{\frac{n}{2}}. \]

Proof: This follows on noting that each Gauss sum $g(\chi_i, \psi)$ has modulus $q^{\frac{1}{2}}$ by Theorem 4.A.

Note that $A(d) < (d - 1)^n$. In fact it is not hard to show that

\[ A_n(d) = A(d) = \frac{d-1}{d}\left( (d-1)^{n-1} - (-1)^{n-1} \right), \]

by induction on $n$. For $n = 1$ or $n = 2$ it is trivial to see that $A_1(d) = 0$ and $A_2(d) = d - 1$. For $n \ge 2$ we see that $(b_1, \dots, b_n)$ is counted by $A_n(d)$ precisely if $0 < b_i < d$ and

\[ -b_n \equiv b_1 + \cdots + b_{n-1} \not\equiv 0 \pmod{d}. \]

The number of possibilities for $b_1, \dots, b_{n-1}$ is $(d - 1)^{n-1} - A_{n-1}(d)$, whence

\[ A_n(d) = (d - 1)^{n-1} - A_{n-1}(d). \]

The claim then follows from the induction hypothesis.
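The closed formula for the number of summands can be checked directly against the definition. The short sketch below counts the admissible tuples by brute force and compares with the formula; the function names are illustrative only.

```python
from itertools import product


def count_tuples(n, d):
    """Number of (b_1, ..., b_n) with 0 < b_i < d and b_1 + ... + b_n = 0 mod d."""
    return sum(1 for b in product(range(1, d), repeat=n) if sum(b) % d == 0)


def closed_form(n, d):
    """(d-1)/d * ((d-1)^(n-1) - (-1)^(n-1)); the division by d is exact."""
    return (d - 1) * ((d - 1) ** (n - 1) - (-1) ** (n - 1)) // d


if __name__ == "__main__":
    for n in range(1, 5):
        for d in range(2, 7):
            print(n, d, count_tuples(n, d), closed_form(n, d))
```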

We note that Theorem D implies that for diagonal hypersurfaces the error term $N_f - q^{n-1}$ is a sum of $A(d)$ $q$-Weil numbers. We would now like to study the dependence of the number of solutions on the field of coordinates. For any $\nu \in \mathbb{N}$ let us write $N_f(\nu)$ for the number of $x \in \mathbb{F}_{q^\nu}^n$ such that

\[ f(x) = \sum_{i=1}^n a_ix_i^d = 0, \]

with $a_1, \dots, a_n \in \mathbb{F}_q^*$ and $d \mid q - 1$. If $\chi_i$ is a character of $\mathbb{F}_q$, with $\chi_i^d = \chi_0$, Lemma 3.4 implies that

\[ \chi_{\nu,i}(x) = \chi_i(N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(x)) \]

is a character of $\mathbb{F}_{q^\nu}$ with the same order as $\chi_i$. Since the norm map is surjective (by Lemma 2.3) it follows that $\chi_{\nu,i} \neq \chi'_{\nu,i}$ if $\chi_i \neq \chi'_i$. Hence as $\chi_i$ runs through characters of $\mathbb{F}_q$ with $\chi_i^d = \chi_0$, so $\chi_{\nu,i}$ runs through characters of $\mathbb{F}_{q^\nu}$ with $\chi_{\nu,i}^d = \chi_0$. Moreover we may replace $\psi$ by

\[ \psi_\nu(x) = \psi(\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(x)), \]

which is an additive character of $\mathbb{F}_{q^\nu}$. Thus in the formula of Theorem D, to get an expression for $N_f(\nu)$, we must replace $q$ by $q^\nu$, $\overline{\chi}_i(a_i)$ by $\overline{\chi}_{\nu,i}(a_i) = (\overline{\chi}_i(a_i))^\nu$, and $g = g(\chi_i, \psi)$ by $g_\nu = g(\chi_{\nu,i}, \psi_\nu)$. The Hasse-Davenport relation, which we shall prove in the following section (Theorem E), shows that

\[ -g_\nu = (-g)^\nu. \]

But then Theorem D implies that

\[ N_f(\nu) - q^{\nu(n-1)} = (-1)^{(\nu-1)n} \left(1 - \frac{1}{q^\nu}\right) \sum_{\chi_1,\dots,\chi_n} \left( \prod_{i=1}^n \overline{\chi}_i(a_i)\, g(\chi_i, \psi) \right)^\nu, \]

where the sum is over multiplicative characters of $\mathbb{F}_q$ as in the statement of Theorem D. Thus

\[ N_f(\nu) = \sum_i \alpha_i^\nu - \sum_j \beta_j^\nu \]

for suitable $q$-Weil numbers $\alpha_i, \beta_j$ of various weights. Weil (1949) conjectured that such a formula always holds for the counting function $N_f(\nu)$ associated to a polynomial $f \in \mathbb{F}_q[X_1, \dots, X_n]$. We will return to this topic in §5.5.

5.4 Hasse-Davenport Relation

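Let $\chi$ (respectively, $\psi$) be a multiplicative (respectively, additive) character of $\mathbb{F}_q$. For an extension $\mathbb{F}_{q^\nu}/\mathbb{F}_q$ we let $\chi_\nu = \chi \circ N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}$ and $\psi_\nu = \psi \circ \mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}$, as above. We wish to compare the Gauss sums $g = g(\chi, \psi)$ and $g_\nu = g(\chi_\nu, \psi_\nu)$. Given a monic polynomial

\[ f(X) = X^d - c_1X^{d-1} + \cdots + (-1)^dc_d \in \mathbb{F}_q[X], \]

we define $\lambda(f) = \psi(c_1)\chi(c_d)$.

Lemma 5.6 For any monic $f, g \in \mathbb{F}_q[X]$ we have $\lambda(fg) = \lambda(f)\lambda(g)$.

Proof: If $g(X) = X^e - b_1X^{e-1} + \cdots + (-1)^eb_e$ then

\[ f(X)g(X) = X^{d+e} - (b_1 + c_1)X^{d+e-1} + \cdots + (-1)^{d+e}b_ec_d. \]

Thus

\[ \lambda(fg) = \psi(b_1 + c_1)\chi(b_ec_d) = \psi(b_1)\psi(c_1)\chi(b_e)\chi(c_d) = \lambda(f)\lambda(g), \]

as required.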

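Lemma 5.7 Let $\alpha \in \mathbb{F}_{q^\nu}$ and let $f \in \mathbb{F}_q[X]$ be the monic irreducible polynomial for $\alpha$ over $\mathbb{F}_q$. Then $\lambda(f)^{\frac{\nu}{d}} = \chi_\nu(\alpha)\psi_\nu(\alpha)$, where $d = \deg f$.

Proof: Suppose $f(X) = X^d - c_1X^{d-1} + \cdots + (-1)^dc_d$. Then

\[ \mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(\alpha) = \frac{\nu}{d}c_1, \qquad N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(\alpha) = c_d^{\frac{\nu}{d}}, \]

by the transitivity of the trace and norm. But then

\[ \lambda(f)^{\frac{\nu}{d}} = \psi(c_1)^{\frac{\nu}{d}}\chi(c_d)^{\frac{\nu}{d}} = \psi\left(\frac{\nu}{d} \cdot c_1\right)\chi\left(c_d^{\frac{\nu}{d}}\right) = \psi_\nu(\alpha)\chi_\nu(\alpha), \]

as required.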

We now show how the Gauss sum gν can be represented using the λ-function.

Lemma 5.8 We have $g_\nu = \sum (\deg f)\lambda(f)^{\frac{\nu}{\deg f}}$, where the sum is over all monic irreducible polynomials $f \in \mathbb{F}_q[X]$ with $(\deg f) \mid \nu$.

Proof: By the properties of finite fields elucidated in §2 we have

\[ \prod_{d \mid \nu} \prod_{\deg f = d} f(X) = X^{q^\nu} - X, \]

where the product is over all monic irreducible polynomials $f \in \mathbb{F}_q[X]$ with degree dividing $\nu$. Indeed the roots of the right hand side are exactly the elements $x \in \mathbb{F}_{q^\nu}$, each of multiplicity 1 and, conversely, every such $x$ has a minimal polynomial which must occur, exactly once, among the polynomials on the left hand side. Now let $f \in \mathbb{F}_q[X]$ be a monic irreducible polynomial with $d = (\deg f) \mid \nu$ and roots $\alpha_1, \dots, \alpha_d \in \mathbb{F}_{q^\nu}$. Then Lemma 5.7 implies that

\[ \sum_{i=1}^d \chi_\nu(\alpha_i)\psi_\nu(\alpha_i) = d\lambda(f)^{\frac{\nu}{d}}. \]

Summing over all polynomials of the required type yields the result.

We are now ready to prove:

Theorem E (Hasse-Davenport relation) Let $\chi \neq \chi_0$ (resp. $\psi \neq \psi_0$) be a multiplicative (resp. additive) character of $\mathbb{F}_q$. Let $\nu \in \mathbb{N}$. Then $g(\chi_\nu, \psi_\nu) = (-1)^{\nu-1}g(\chi, \psi)^\nu$.

Proof: Expanding $(1 - \lambda(f)T^{\deg f})^{-1}$ as a geometric series and using unique factorisation in $\mathbb{F}_q[X]$, we obtain the identity

\[ \sum_f \lambda(f)T^{\deg f} = \prod_f (1 - \lambda(f)T^{\deg f})^{-1}, \]

where the sum (resp. product) is over monic (resp. monic irreducible) $f \in \mathbb{F}_q[X]$. We define $\lambda(1) = 1$, for this to make sense. Now

\[ \sum_f \lambda(f)T^{\deg f} = \sum_{d=0}^\infty \sum_{\deg f = d} \lambda(f)T^d. \]

For $d = 1$ we have

\[ \sum_{\deg f = 1} \lambda(f) = \sum_{a \in \mathbb{F}_q} \lambda(X - a) = g(\chi, \psi), \]

by definition. Moreover, for $d > 1$ we have

\[ \sum_{\deg f = d} \lambda(f) = \sum_{c_1,\dots,c_d \in \mathbb{F}_q} \lambda(X^d - c_1X^{d-1} + \cdots + (-1)^dc_d) = q^{d-2} \sum_{c_1,c_d \in \mathbb{F}_q} \chi(c_d)\psi(c_1) = 0, \]

by orthogonality and the assumption $\chi \neq \chi_0$, $\psi \neq \psi_0$. Hence

\[ \sum_f \lambda(f)T^{\deg f} = 1 + g(\chi, \psi)T = \prod_f (1 - \lambda(f)T^{\deg f})^{-1}. \]

Taking logarithms of both sides gives

\[ \log(1 + g(\chi, \psi)T) = -\sum_f \log(1 - \lambda(f)T^{\deg f}). \]

Differentiating with respect to $T$ and multiplying through by $T$ we obtain

\[ \frac{g(\chi, \psi)T}{1 + g(\chi, \psi)T} = \sum_f \frac{(\deg f)\lambda(f)T^{\deg f}}{1 - \lambda(f)T^{\deg f}}. \]

Expanding the denominators as power series yields

\[ \sum_{u=1}^\infty (-1)^{u-1}g(\chi, \psi)^uT^u = \sum_f \sum_{v=1}^\infty (\deg f)\lambda(f)^vT^{v\deg f}. \]

Equating the coefficients of $T^u$, we deduce that

\[ (-1)^{u-1}g(\chi, \psi)^u = \sum_{\deg f \mid u} (\deg f)\lambda(f)^{\frac{u}{\deg f}} = g(\chi_u, \psi_u), \]

by Lemma 5.8. This completes the proof of the theorem.
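The Hasse-Davenport relation can be tested numerically in the simplest non-trivial case. The sketch below makes several assumptions for illustration: q = p is an odd prime, ν = 2, χ is the quadratic character, ψ(x) = e(x/p), and the quadratic extension is modelled as pairs (a, b) standing for a + b√n with n a fixed non-residue, so that the norm is a² − nb² and the trace is 2a. In this case the relation predicts that the lifted Gauss sum equals −g².

```python
import cmath
from itertools import product


def legendre(x, p):
    """Quadratic character of F_p, extended by chi(0) = 0."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1


def gauss_sum(p):
    """g(chi, psi) over F_p with chi quadratic and psi(x) = e(x/p)."""
    return sum(
        legendre(x, p) * cmath.exp(2j * cmath.pi * x / p) for x in range(p)
    )


def lifted_gauss_sum(p):
    """g(chi_2, psi_2) over F_{p^2} = F_p(sqrt(n)), n a non-residue mod p."""
    n = next(c for c in range(2, p) if legendre(c, p) == -1)
    total = 0
    for a, b in product(range(p), repeat=2):
        norm = (a * a - n * b * b) % p   # N(a + b sqrt(n))
        trace = (2 * a) % p              # Tr(a + b sqrt(n))
        total += legendre(norm, p) * cmath.exp(2j * cmath.pi * trace / p)
    return total


if __name__ == "__main__":
    for p in (3, 5, 7, 11):
        g = gauss_sum(p)
        print(p, lifted_gauss_sum(p), -g * g)
```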


5.5 The Zeta Function of a Hypersurface

Let $F \in \mathbb{F}_q[X_0, \dots, X_n]$ be a form of degree $d$. For any $\nu \in \mathbb{N}$ let

\[ N^*_F(\nu) = \#\{[x] \in \mathbb{P}^n(\mathbb{F}_{q^\nu}) : F(x) = 0\}. \]

These numbers are naturally studied via the zeta function

\[ Z_F(T) = \exp\left( \sum_{\nu=1}^\infty \frac{N^*_F(\nu)T^\nu}{\nu} \right). \]

It is possible to regard $Z_F(T)$ either as a formal power series in $\mathbb{Q}[[T]]$, or as a function of a complex variable which is defined and analytic on the disc

\[ \{T \in \mathbb{C} : |T| < q^{-n}\}. \]

By Lemma 5.2 it follows that

\[ \#\mathbb{P}^n(\mathbb{F}_{q^\nu}) = \frac{(q^\nu)^{n+1} - 1}{q^\nu - 1} = \sum_{m=0}^{n} q^{m\nu}. \]

Hence the zeta function attached to projective $n$-space is

\[ Z_0(T) = \exp\left( \sum_{m=0}^{n} \sum_{\nu=1}^\infty \frac{(q^mT)^\nu}{\nu} \right) = \exp\left( -\sum_{m=0}^{n} \log(1 - q^mT) \right) = (1 - q^nT)^{-1} \cdots (1 - qT)^{-1}(1 - T)^{-1}. \]

In particular we have $Z_0(T) \in \mathbb{Q}(T)$.
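To see the passage from point counts to a rational zeta function concretely, one can expand the exponential as a power series and compare with the product of geometric series. The sketch below does this for projective n-space, whose point counts over the degree-ν extension are 1 + q^ν + ... + q^{nν} by Lemma 5.2. The recurrence used is just the identity T Z'(T) = Z(T) Σ N_ν T^ν read off coefficient by coefficient; the function names are illustrative.

```python
from fractions import Fraction


def zeta_coefficients(counts):
    """Coefficients c_0, ..., c_K of exp(sum_{nu>=1} counts[nu-1] T^nu / nu)."""
    coeffs = [Fraction(1)]
    for k in range(1, len(counts) + 1):
        # k * c_k = sum_{j=1}^{k} N_j * c_{k-j}
        s = sum(counts[j - 1] * coeffs[k - j] for j in range(1, k + 1))
        coeffs.append(s / k)
    return coeffs


def product_coefficients(q, n, K):
    """Coefficients up to T^K of prod_{m=0}^{n} (1 - q^m T)^{-1}."""
    coeffs = [1] + [0] * K
    for m in range(n + 1):
        r = q ** m
        # Multiply the series in place by 1/(1 - rT).
        for k in range(1, K + 1):
            coeffs[k] += r * coeffs[k - 1]
    return coeffs


if __name__ == "__main__":
    q, n, K = 3, 2, 6
    counts = [sum(q ** (m * nu) for m in range(n + 1)) for nu in range(1, K + 1)]
    print(zeta_coefficients(counts))
    print(product_coefficients(q, n, K))
```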

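As a further example we calculate the zeta function associated to the curve $X_0^3 + X_1^3 + X_2^3 = 0$ in $\mathbb{P}^2(\mathbb{F}_q)$.

Lemma 5.9 Let $F(X) = X_0^3 + X_1^3 + X_2^3$. Then there exists a $q$-Weil number $\pi$ of weight 1 such that

\[ Z_F(T) = \frac{(1 + \pi T)(1 + \overline{\pi}T)}{(1 - T)(1 - qT)}. \]

Proof: It is clear that for each $\nu \in \mathbb{N}$ we have

\[ N^*_F(\nu) = \frac{N_F(\nu) - 1}{q^\nu - 1}, \]

where $N_F(\nu) = \#\{x \in \mathbb{A}^3(\mathbb{F}_{q^\nu}) : F(x) = 0\}$. But then Theorem D (with $n = 3$ and $a_i = 1$), together with the final part of §5.3, implies that

\[ N_F(\nu) = q^{2\nu} + (-1)^{\nu-1}\left(1 - \frac{1}{q^\nu}\right)\left( g(\chi, \psi)^{3\nu} + g(\chi^2, \psi)^{3\nu} \right). \]

Here $\psi$ is any non-trivial additive character of $\mathbb{F}_q$ and $\chi$ is a non-trivial cubic character of $\mathbb{F}_q$. Hence

\[ N^*_F(\nu) = q^\nu + 1 + (-1)^{\nu-1}\,\frac{g(\chi, \psi)^{3\nu} + g(\chi^2, \psi)^{3\nu}}{q^\nu}. \]

By Theorem 4.B we have

\[ g(\chi, \psi)^2 = J(\chi, \chi)g(\chi^2, \psi), \]

where $J$ is the Jacobi sum. But $\chi^2 = \chi^{-1} = \overline{\chi}$, so that

\[ g(\chi, \psi)g(\chi^2, \psi) = g(\chi, \psi)g(\overline{\chi}, \psi) = g(\chi, \psi) \cdot \chi(-1)\overline{g(\chi, \psi)} = \chi(-1)q, \]

by §4.1 and Theorem 4.A. Since $\chi(-1) = \chi((-1)^3) = 1$, as $\chi$ is a cubic character, it follows on multiplying both sides by $g(\chi, \psi)$ that

\[ g(\chi, \psi)^3 = \pi q, \qquad g(\chi^2, \psi)^3 = \overline{\pi}q, \]

where $\pi = J(\chi, \chi)$ is a $q$-Weil number of weight 1. Direct calculation now shows that

\[ Z_F(T) = \exp\left( \sum_{\nu=1}^\infty \frac{T^\nu(q^\nu + 1 + (-1)^{\nu-1}(\pi^\nu + \overline{\pi}^\nu))}{\nu} \right) = \frac{(1 + \pi T)(1 + \overline{\pi}T)}{(1 - T)(1 - qT)}, \]

as required.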
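For a prime p ≡ 1 (mod 3) the case ν = 1 of the formula above says that the number of projective points on the cubic differs from p + 1 by π + π̄, a quantity of absolute value at most 2√p. The sketch below counts the points by brute force so that this can be observed; the function name and the restriction to prime fields are assumptions made for illustration.

```python
from itertools import product
import math


def fermat_cubic_points(p):
    """Number of [x0:x1:x2] in P^2(F_p) with x0^3 + x1^3 + x2^3 = 0."""
    affine = sum(
        1 for x in product(range(p), repeat=3)
        if (x[0] ** 3 + x[1] ** 3 + x[2] ** 3) % p == 0
    )
    # Remove the origin and identify proportional solutions.
    return (affine - 1) // (p - 1)


if __name__ == "__main__":
    for p in (7, 13, 19, 31):
        n_star = fermat_cubic_points(p)
        print(p, n_star, n_star - (p + 1), 2 * math.sqrt(p))
```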

Lemma 5.9 provides a further example in which $Z_F(T) \in \mathbb{Q}(T)$. Given any non-zero form $F \in \mathbb{F}_q[X_0, X_1, X_2]$ which is non-singular over every algebraic extension of $\mathbb{F}_q$, Weil (1948) proved that

\[ Z_F(T) = \frac{P(T)}{(1 - T)(1 - qT)}, \]

for $P \in \mathbb{Z}[T]$ of degree $(d - 1)(d - 2) = 2g$, where $d = \deg F$ and $g$ is the genus of the curve defined by $F$. Furthermore he showed that each root $\alpha$ of $P$ is a $q$-Weil number of weight $-1$ (i.e. $|\alpha| = q^{-\frac{1}{2}}$). This last statement is called the "Riemann Hypothesis for curves". To see the analogy set $T = q^{-s}$. Then

\[ \zeta_F(s) = Z_F(q^{-s}) = (1 - q^{-s})^{-1}(1 - q^{1-s})^{-1}P(q^{-s}), \]

and the roots of $Z_F(T)$ having absolute value $q^{-\frac{1}{2}}$ is equivalent to the roots of $\zeta_F(s)$ having $\Re(s) = \frac{1}{2}$.

In fact for any form $F \in \mathbb{F}_q[X_0, \dots, X_n]$ Dwork (1959) proved the rationality of $Z_F(T)$, whence there is a factorisation

\[ Z_F(T) = \frac{\prod_i (1 - \alpha_iT)}{\prod_j (1 - \beta_jT)}, \]

with $\alpha_i, \beta_j \in \mathbb{C}$. (Note that the constant term is 1, as can be seen by expanding the definition of $Z_F(T)$ as a power series about the origin.)

Lemma 5.10 Assume $Z_F(T) \in \mathbb{C}(T)$ as above. Then $N^*_F(\nu) = \sum_j \beta_j^\nu - \sum_i \alpha_i^\nu$.

Proof: Taking logarithmic derivatives we see that

\[ \frac{Z'_F(T)}{Z_F(T)} = \sum_i \frac{-\alpha_i}{1 - \alpha_iT} - \sum_j \frac{-\beta_j}{1 - \beta_jT}. \]

Multiplying by $T$ and using geometric series (cf. Theorem E), we obtain

\[ \frac{TZ'_F(T)}{Z_F(T)} = \sum_{\nu=1}^\infty \left( \sum_j \beta_j^\nu - \sum_i \alpha_i^\nu \right) T^\nu. \]

But from the definition we see that the left hand side is

\[ \frac{TZ'_F(T)}{Z_F(T)} = \sum_{\nu=1}^\infty N^*_F(\nu)T^\nu. \]

Comparing coefficients gives the lemma.

In fact the converse of the lemma is easily seen to be true. Let us end this section with a brief discussion of the cohomological interpretation of the zeta function. Let $V$ be a non-singular projective hypersurface defined by a form $F \in \mathbb{F}_q[X_0, \dots, X_n]$. Let

\[ \Omega : V \to V, \qquad \Omega([x_0, \dots, x_n]) = [x_0^q, \dots, x_n^q] \]

be the $(n + 1)$-fold Frobenius automorphism from §2. For any $\nu$ let

\[ V_\nu = \{[x] \in \mathbb{P}^n(\mathbb{F}_{q^\nu}) : F(x) = 0\}, \]

so that $N^*_F(\nu) = \#V_\nu$. Then if $\Omega^\nu$ denotes the composition of $\Omega$ with itself $\nu$ times we see that $V_\nu$ is the set of fixed points of $\Omega^\nu$ on $V$. The Lefschetz–Grothendieck fixed point formula states that for any prime

\[ \ell \neq p = \mathrm{char}(\mathbb{F}_q), \]

we have

\[ N^*_F(\nu) = \sum_{i=0}^{2(n-1)} (-1)^i\, \mathrm{Tr}(\Omega^{*\nu}; H^i(V, \mathbb{Q}_\ell)), \]

where $H^i$ denotes the $i$th étale cohomology group and $\Omega^*$ denotes the induced mapping of $\Omega$ on the cohomology group. In fact $H^i = H^i(V, \mathbb{Q}_\ell)$ is a finite dimensional vector space over $\mathbb{Q}_\ell$, with $\dim H^i = B_i$, the $i$th Betti number. Note that the cohomology groups vanish for $i > 2\dim V = 2(n - 1)$. By linear algebra we deduce that

\[ Z_F(T) = \prod_{i=0}^{2(n-1)} \exp\left( \sum_{\nu=1}^\infty \frac{\mathrm{Tr}(\Omega^{*\nu}; H^i)T^\nu}{\nu} \right)^{(-1)^i} = \prod_{i=0}^{2(n-1)} \det(I - T\Omega^*; H^i)^{(-1)^{i+1}} = \frac{P_1(T)P_3(T) \cdots P_{2(n-1)-1}(T)}{P_0(T)P_2(T) \cdots P_{2(n-1)}(T)}, \]

with $P_i(T) = \det(I - T\Omega^*; H^i) \in \mathbb{Q}[T]$ and $\deg P_i = B_i$. Let $P_i(T) = \prod_j (1 - \alpha_{ij}T)$. The values $\alpha_{ij}$ are called the "characteristic values of the zeta function" (they are the eigenvalues of the Frobenius morphisms $\Omega^*$). In fact, when $V$ is non-singular, most of the cohomology groups vanish and we get:

Theorem F (Deligne) Let $e = n - 1 = \dim V$.

(i) $P_i(T) \in \mathbb{Z}[T]$ and $P_0(T) = 1 - T$ and $P_{2e}(T) = 1 - q^eT$.

(ii) [RH] For any $0 \le i \le 2e$ the characteristic value $\alpha_{ij}$ is a $q$-Weil number of weight $i$.

(iii) [Functional equation] Let $\chi(V) = \sum_{i=0}^{2e} (-1)^iB_i$ and

\[ \varepsilon = \begin{cases} 1 & 2 \nmid e, \\ (-1)^N & 2 \mid e, \end{cases} \]

where $N$ is the multiplicity of the eigenvalue $q^{\frac{e}{2}}$ of $\Omega^* \mid H^e$. Then

\[ Z_F\left(\frac{1}{q^eT}\right) = \varepsilon q^{\frac{e\chi(V)}{2}}T^{\chi(V)}Z_F(T). \]

(iv) $Z_F(T) = P^*_e(T)^{(-1)^{e-1}} \prod_{j=0}^e (1 - q^jT)^{-1}$, where

\[ P^*_e(T) = \begin{cases} P_e(T) & 2 \nmid e, \\ P_e(T)(1 - q^{\frac{e}{2}}T)^{-1} & 2 \mid e. \end{cases} \]

We will not prove this here. Grothendieck (1971) has calculated

\[ B_e = \frac{d-1}{d}\left( (d-1)^n + (-1)^{n+1} \right) + \begin{cases} 0 & 2 \nmid e, \\ 1 & 2 \mid e, \end{cases} \]

if $F$ has degree $d$. Hence Lemma 5.10 yields the following result, which retrieves Corollary D.1 for diagonal hypersurfaces.

Corollary F.1 If $V$ is a non-singular hypersurface of dimension $e$ defined over $\mathbb{F}_q$ then

\[ |\#V(\mathbb{F}_q) - (q^e + q^{e-1} + \cdots + 1)| \le B_eq^{\frac{e}{2}}. \]

6 Riemann Hypothesis for sums in one variable

In this section we record the Weil estimate for exponential sums in one variable and provide a self-contained proof for hyperelliptic curves.

Let $\mathbb{F}_q$ be a finite field and let $\chi$ (resp. $\psi$) be a non-trivial multiplicative (resp. additive) character of $\mathbb{F}_q$. Let $f \in \mathbb{F}_q[X]$ be a polynomial of degree $d$. Then we have the following results.

Theorem A (Weil) Assume $\chi$ has order $e > 1$ with $e \mid q - 1$. Let $1 \le m \le d$ be the number of distinct zeros of $f$ in $\overline{\mathbb{F}}_q$ and assume there does not exist $h \in \overline{\mathbb{F}}_q[X]$ such that $f = h^e$. Then

\[ \left| \sum_{x \in \mathbb{F}_q} \chi(f(x)) \right| \le (m - 1)\sqrt{q}. \]

Theorem B (Weil) Assume $\psi$ is non-trivial and $d < q$ with $(d, q) = 1$. Then

\[ \left| \sum_{x \in \mathbb{F}_q} \psi(f(x)) \right| \le (d - 1)\sqrt{q}. \]
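The estimate of Theorem B is easy to observe numerically over a prime field. The following sketch assumes q = p is prime, ψ(x) = e(x/p), and that the polynomial is supplied as a coefficient list with non-zero leading coefficient, degree less than p and not divisible by p; the function names are hypothetical.

```python
import cmath
import math


def additive_sum(coeffs, p):
    """Sum over x in F_p of e(f(x)/p), with f = c_0 + c_1 X + ... + c_d X^d."""
    total = 0
    for x in range(p):
        value = 0
        for c in reversed(coeffs):       # Horner evaluation modulo p
            value = (value * x + c) % p
        total += cmath.exp(2j * cmath.pi * value / p)
    return total


def weil_bound_holds(coeffs, p):
    """Check |sum| <= (d - 1) sqrt(p), allowing for rounding error."""
    d = len(coeffs) - 1
    return abs(additive_sum(coeffs, p)) <= (d - 1) * math.sqrt(p) + 1e-9


if __name__ == "__main__":
    p = 101
    f = [5, 2, 0, 1]                     # X^3 + 2X + 5
    print(abs(additive_sum(f, p)), 2 * math.sqrt(p))
```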

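By introducing the companion sums

\[ S^{(1)}_\nu(f) = \sum_{x \in \mathbb{F}_{q^\nu}} \chi(N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x))), \qquad S^{(2)}_\nu(f) = \sum_{x \in \mathbb{F}_{q^\nu}} \psi(\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x))), \]

one can form associated zeta functions

\[ Z^{(i)}_f(T) = \exp\left( \sum_{\nu=1}^\infty \frac{S^{(i)}_\nu(f)T^\nu}{\nu} \right) \]

for $i = 1, 2$. As in §5.5 these are rational functions and one sees the analogy between the estimates in Theorems A, B and the Riemann Hypothesis for the ordinary zeta function. More generally we have:

Theorem C (Dwork) Let $f, g \in \mathbb{F}_q(X)$ and let $\chi$ (resp. $\psi$) be a multiplicative (resp. additive) character of $\mathbb{F}_q$. Let

\[ S_\nu = \sum_{x \in \mathbb{F}_{q^\nu}} \chi(N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x)))\,\psi(\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(g(x))). \]

Then there exist coprime polynomials $P, Q \in \mathbb{C}[T]$, with $P(0) = Q(0) = 1$, such that

\[ Z(T) = \exp\left( \sum_{\nu=1}^\infty \frac{S_\nu T^\nu}{\nu} \right) = \frac{P(T)}{Q(T)}. \]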

We will not prove this result in this course. However we recall from Theorem 5.E (Hasse–Davenport relation) that

\[ -g_\nu(\chi, \psi) = -g(\chi_\nu, \psi_\nu) = (-g(\chi, \psi))^\nu, \]

for any $\nu \ge 1$. This is equivalent to

\[ Z(T) = \exp\left( \sum_{\nu=1}^\infty \frac{g_\nu(\chi, \psi)T^\nu}{\nu} \right) = 1 + g(\chi, \psi)T, \]

which is a very special case of Theorem C.

Recall the Kloosterman sum

\[ K(\psi, \eta) = \sum_{x \in \mathbb{F}_q^*} \psi(x)\eta(x^{-1}) \]

from §4.3, for any additive characters $\psi, \eta$ of $\mathbb{F}_q$. Define companion sums

\[ K_\nu(\psi, \eta) = \sum_{x \in \mathbb{F}_{q^\nu}^*} \psi(\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(x))\,\eta(\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(x^{-1})). \]

By mimicking the proof of Theorem 5.E one can show that

\[ Z(T) = \exp\left( \sum_{\nu=1}^\infty \frac{K_\nu(\psi, \eta)T^\nu}{\nu} \right) = \frac{1}{1 + K(\psi, \eta)T + qT^2}. \]

Note that the role of $\lambda$ in the proof of Theorem 5.E is now played by the character

\[ \mu : G \to \mathbb{C}^*, \]

with $\mu(f) = \psi(a_1)\eta(a_{d-1}/a_d)$ if $f(X) = X^d + a_1X^{d-1} + \cdots + a_{d-1}X + a_d$ and $G$ is the group of quotients of monic polynomials in $\mathbb{F}_q[X]$ defined and non-vanishing at zero.

With Theorem C taken on faith let us make the transition to point counting. For Theorem A this is:

Lemma 6.1 Let $f \in \mathbb{F}_q[X]$ be non-constant and monic and let $e \mid q - 1$. Then for all $\nu \ge 1$ we have

\[ \sum_{\chi} \sum_{x \in \mathbb{F}_{q^\nu}} \chi(N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x))) = \#\{(x, y) \in \mathbb{F}_{q^\nu}^2 : y^e = f(x)\} - q^\nu, \]

where the sum is over multiplicative characters $\chi$ of $\mathbb{F}_q$ with $\chi \neq \chi_0$ and $\chi^e = \chi_0$.

Proof: This follows from Exercise 3 and the remarks at the close of §5.3.
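The case ν = 1 of Lemma 6.1 can be checked by hand on a computer. The sketch below assumes q = p is an odd prime with e | p − 1, builds the characters of order dividing e from a discrete logarithm table with respect to a primitive root, and uses the convention χ(0) = 0 for non-trivial χ. All names are illustrative.

```python
import cmath


def primitive_root(p):
    """Smallest generator of the cyclic group F_p^*, p an odd prime."""
    for g in range(2, p):
        if len({pow(g, t, p) for t in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")


def character_sum_side(f, e, p):
    """Sum over chi != chi_0 with chi^e = chi_0 of sum_x chi(f(x))."""
    g = primitive_root(p)
    log = {pow(g, t, p): t for t in range(p - 1)}
    total = 0
    for k in range(1, e):                # chi_k(g^t) = e(k t / e)
        for x in range(p):
            v = f(x) % p
            if v != 0:                   # chi_k(0) = 0
                total += cmath.exp(2j * cmath.pi * k * log[v] / e)
    return total


def point_count_side(f, e, p):
    """#{(x, y) in F_p^2 : y^e = f(x)} - p."""
    count = sum(
        1 for x in range(p) for y in range(p)
        if (pow(y, e, p) - f(x)) % p == 0
    )
    return count - p


if __name__ == "__main__":
    f = lambda x: x * x + 1
    print(character_sum_side(f, 3, 13), point_count_side(f, 3, 13))
```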


The estimation of the counting function in Lemma 6.1 is the object of:

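Theorem D (Stepanov, Bombieri) Let $\nu \ge 1$. Let $f \in \mathbb{F}_q[X]$ be a monic non-constant polynomial of degree $d$ and let $e \mid q - 1$, with $(e, d) = 1$. Then there exists a constant $C \ge 0$, depending only on $d$ and $e$, such that

\[ \left| \#\{(x, y) \in \mathbb{F}_{q^\nu}^2 : y^e = f(x)\} - q^\nu \right| \le Cq^{\frac{\nu}{2}}. \]

To see how this is used in the deduction of Theorem A, when $f$ is as in the statement, we need the elementary:

Lemma 6.2 Let $\omega_1, \dots, \omega_r \in \mathbb{C}$, let $A, B > 0$ and assume that

\[ \left| \sum_{i=1}^r \omega_i^\nu \right| \le AB^\nu \]

for all large enough $\nu \in \mathbb{N}$. Then $|\omega_i| \le B$ for all $i$.

Proof: Consider the complex power series

\[ D(z) = \sum_{\nu=1}^\infty \left( \sum_{i=1}^r \omega_i^\nu \right) z^\nu = \sum_{i=1}^r \frac{\omega_iz}{1 - \omega_iz}. \]

The hypothesis implies that $D$ converges absolutely in the disc $|z| < B^{-1}$ and so $D$ is analytic in this region. In particular it has no poles there, so that $|\omega_i|^{-1} \ge B^{-1}$ for all $i$ with $\omega_i \neq 0$.

Let $\chi$ be as in Theorem A. Then in fact (cf. Theorem C) there exist $\omega_1, \dots, \omega_{m-1} \in \mathbb{C}$ such that

\[ Z(T) = \exp\left( \sum_{\nu=1}^\infty \frac{S_\nu T^\nu}{\nu} \right) = \prod_{1 \le j \le m-1} (1 - \omega_jT), \]

where $S_\nu = \sum_{x \in \mathbb{F}_{q^\nu}} \chi(N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x)))$. As in Lemma 5.10, this implies that

\[ S_\nu = -(\omega_1^\nu + \cdots + \omega_{m-1}^\nu). \]

Putting this together with Lemma 6.1 we see that $N_\nu = \#\{(x, y) \in \mathbb{F}_{q^\nu}^2 : y^e = f(x)\}$ satisfies

\[ N_\nu = q^\nu + \sum_{\substack{\chi \neq \chi_0 \\ \chi^e = \chi_0}} \sum_{x \in \mathbb{F}_{q^\nu}} \chi(N_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x))) = q^\nu - \sum_{\substack{\chi \neq \chi_0 \\ \chi^e = \chi_0}} \left( \omega_1(\chi)^\nu + \cdots + \omega_{m-1}(\chi)^\nu \right). \]

But then Theorem D implies that

\[ \left| \sum_{\substack{\chi \neq \chi_0 \\ \chi^e = \chi_0}} \left( \omega_1(\chi)^\nu + \cdots + \omega_{m-1}(\chi)^\nu \right) \right| \ll q^{\frac{\nu}{2}}, \]

where the implied constant depends only on $d$ and $e$. Applying Lemma 6.2 we deduce that for any $1 \le i \le m - 1$ and $\chi \neq \chi_0$ such that $\chi^e = \chi_0$ we have $|\omega_i(\chi)| \le q^{\frac{1}{2}}$. But then $|S_1| \le (m - 1)\sqrt{q}$, as required for Theorem A.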

The resolution of Theorem B follows similar lines, but this time based on:

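Lemma 6.3 Let $f \in \mathbb{F}_q[X]$ be non-constant and monic and let $\psi \neq \psi_0$ be a fixed additive character of $\mathbb{F}_q$. Then for all $\nu \ge 1$ we have

\[ \sum_{a \in \mathbb{F}_q^*} \sum_{x \in \mathbb{F}_{q^\nu}} \psi(\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(af(x))) = \#\{(x, y) \in \mathbb{F}_{q^\nu}^2 : y^q - y = f(x)\} - q^\nu. \]

Proof: This follows from orthogonality of additive characters and the fact that there exists $y \in \mathbb{F}_{q^\nu}$ such that $y^q - y = f(x)$ if and only if $\mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x)) = 0$. It follows that

\[ q\,\#\{x \in \mathbb{F}_{q^\nu} : \mathrm{Tr}_{\mathbb{F}_{q^\nu}/\mathbb{F}_q}(f(x)) = 0\} = \#\{(x, y) \in \mathbb{F}_{q^\nu}^2 : y^q - y = f(x)\}, \]

since there are $q$ solutions $y \in \mathbb{F}_{q^\nu}$ to the equation

\[ y^q - y = f(x), \]

for given $x \in \mathbb{F}_{q^\nu}$ for which a solution exists. Indeed if $y_0$ is any fixed solution then the general solution is $y = y_0 + z$, for $z \in \mathbb{F}_q$. (Note that $(y - y_0)^q = y - y_0$.)

To deal with Theorems A and B one is therefore led to count points on the special curves

\[ Y^e = f(X) \quad \text{and} \quad Y^q - Y = f(X). \]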

6.1 Interlude on Short Character Sums

In §3 we discussed multiplicative characters of $\mathbb{F}_q^*$. By analogy a Dirichlet character of modulus $m \in \mathbb{N}$ is a character

\[ \chi : (\mathbb{Z}/m\mathbb{Z})^* \to \mathbb{C}^*. \]

One extends the definition to $\mathbb{Z}/m\mathbb{Z}$ by setting $\chi(a) = 0$ if $(a, m) > 1$.

Of special interest in analytic number theory are the incomplete character sums

\[ S_\chi(N) = \sum_{M < n \le M+N} \chi(n), \]

over a short interval. Note that

\[ |S_\chi(N)| \le \begin{cases} N & \text{(trivial)}, \\ 6\sqrt{m}\log m & \text{(Pólya–Vinogradov)}, \end{cases} \]

if $\chi$ has modulus $m$. In particular the second estimate is trivial for sums of length $N \ll \sqrt{m}$.

Theorem E (Burgess) Let $r \ge 1$ and let $p$ be prime. For any Dirichlet character of modulus $p$ we have

\[ S_\chi(N) \ll N^{1-\frac{1}{r}}\, p^{\frac{r+1}{4r^2}} (\log p)^{\frac{1}{r}}. \]

Proof: Note that this result follows from the above unless

\[ p^{\frac{1}{4}+\frac{1}{4r}} \ll N \le p^{\frac{1}{2}+\frac{1}{4r}} \log p, \]

which we now assume. We argue by induction on $N$.

Shifts: Applying a shift $n \mapsto n + h$ for $1 \le h \le H < N$ we obtain

\[ \left| S_\chi(N) - \sum_{M < n \le M+N} \chi(n + h) \right| \ll H^{1-\frac{1}{r}}\, p^{\frac{r+1}{4r^2}} (\log p)^{\frac{1}{r}}, \]

by induction for the 2 character sums of length at most $H$ which do not overlap the original interval. We use shifts of the type $h = ab$ for $1 \le a \le A$ and $1 \le b \le B$, with $H = AB < N$. Averaging over $a, b$ and noting that $N < p$ we get

\[ S_\chi(N) = \frac{1}{H} \sum_{\substack{1 \le a \le A \\ 1 \le b \le B \\ (ab, p) = 1}} \sum_{M < n \le M+N} \chi(n + ab) + O(E_p(N)), \]

where $E_p(N)$ is the upper bound recorded in the statement of the theorem. By periodicity and multiplicativity of Dirichlet characters we have $\chi(n + ab) = \chi(a)\chi(\overline{a}n + b)$, where $a\overline{a} \equiv 1 \pmod{p}$. Hence

\[ |S_\chi(N)| \le \frac{V}{H} + O(E_p(N)), \]

where

\[ V = \sum_{x \,(\mathrm{mod}\, p)} \nu(x) \left| \sum_{1 \le b \le B} \chi(x + b) \right| \]

and $\nu(x) = \#\{a, n : 1 \le a \le A,\ M < n \le M + N,\ x \equiv \overline{a}n \pmod{p}\}$. It is difficult to analyse the size of $\nu(x)$, so we relax its role.

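Hölder: Writing $\nu(x) = \nu(x)^{\frac{r-1}{r}}\nu(x)^{\frac{1}{r}}$ in $V$, we obtain $V \le V_1^{1-\frac{1}{r}}\, V_2^{\frac{1}{2r}}\, W^{\frac{1}{2r}}$, where

\[ V_1 = \sum_{x \,(\mathrm{mod}\, p)} \nu(x) = AN, \qquad V_2 = \sum_{x \,(\mathrm{mod}\, p)} \nu(x)^2, \]

and

\[ W = \sum_{x \,(\mathrm{mod}\, p)} \left| \sum_{1 \le b \le B} \chi(x + b) \right|^{2r}. \]

Now $V_2$ is the number of $a_1, a_2, n_1, n_2$ with $1 \le a_1, a_2 \le A$ and $M < n_1, n_2 \le M + N$ such that

\[ a_1n_2 \equiv a_2n_1 \pmod{p}. \]

Fix $a_1, a_2$ and put $a_1n_2 - a_2n_1 = pk$. We have

\[ \left| k - \frac{(a_1 - a_2)M}{p} \right| \le \frac{2AN}{p} \]

and $(a_1, a_2) \mid k$. For fixed $a_1, a_2, k$ as above the number of $n_1, n_2$ is at most

\[ \frac{2N(a_1, a_2)}{\max\{a_1, a_2\}}. \]

Hence

\[ V_2 \le 2N \sum_{1 \le a_1, a_2 \le A} \frac{(a_1, a_2)}{\max\{a_1, a_2\}} \left( \frac{2AN}{p(a_1, a_2)} + 1 \right) \ll \frac{(AN)^2}{p} + AN\log 2A. \]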

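RH: Next we will show that

\[ W \le (2rB)^rp + 2rB^{2r}p^{\frac{1}{2}}, \]

for which we may assume that $B < p$. But

\[ W = \sum_{1 \le b_1,\dots,b_{2r} \le B}\ \sum_{x \,(\mathrm{mod}\, p)} \chi((x + b_1) \cdots (x + b_r))\,\overline{\chi}((x + b_{r+1}) \cdots (x + b_{2r})). \]

If there exists $b_i \pmod{p}$, with $1 \le i \le 2r$, which is different from the remaining ones then

\[ \chi((x + b_1) \cdots (x + b_r))\,\overline{\chi}((x + b_{r+1}) \cdots (x + b_{2r})) = \chi(f(x)), \]

with

\[ f(X) = \prod_{1 \le j \le r} (X + b_j) \prod_{r+1 \le j \le 2r} (X + b_j)^{p-2}, \]

and $-b_i$ is a root of order 1 or $p - 2$, which is coprime with the order of $\chi$. Hence Theorem A yields

\[ \left| \sum_{x \,(\mathrm{mod}\, p)} \chi((x + b_1) \cdots (x + b_r))\,\overline{\chi}((x + b_{r+1}) \cdots (x + b_{2r})) \right| \le 2rp^{\frac{1}{2}}. \]

Thus

\[ W \le B^{2r} \cdot 2rp^{\frac{1}{2}} + r\binom{2r}{r}B^r \cdot p, \]

which gives the result.

We now put everything together, choosing

\[ A = \left[ \frac{N}{9rp^{\frac{1}{2r}}} \right], \qquad B = [rp^{\frac{1}{2r}}], \]

so $A, B \ge 1$ and $AB < N$.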

Using the approximate functional equation one deduces from taking $r = 2$ in Theorem E the "subconvexity bound"

\[ L\left(\tfrac{1}{2} + it, \chi\right) \ll |t|\,p^{\frac{3}{16}+\varepsilon}, \]

for any $\varepsilon > 0$, where $L(s, \chi)$ is the Dirichlet $L$-function associated to a character $\chi$ of modulus $p$. (Recall that the convexity bound for an $L$-function $L(f, s)$ in the Selberg class is

\[ L(f, s) \ll q(f, s)^{\frac{1}{4}+\varepsilon}, \]

when $\mathrm{Re}(s) = \frac{1}{2}$, where $q(f, s)$ is the conductor.)

6.2 Stepanov’s Method

We now begin the proof of Theorem D using Stepanov’s method.

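Let $a(f) = \#\{(x, y) \in \mathbb{F}_q^2 : y^e = f(x)\} - q$. Then we have the following result.

Lemma 6.4 We have

\[ \#\{(x, y) \in \mathbb{F}_q^2 : y^e = f(x)\} \ge q - (e - 1)\max_{\varepsilon \in \mathbb{F}_q^*} |a(\varepsilon f)|. \]

Proof: Fix representatives $1, \varepsilon_2, \dots, \varepsilon_e$ for the finite cyclic group $\mathbb{F}_q^*/(\mathbb{F}_q^*)^e$. If $\varepsilon$ is any of them let

\[ f_\varepsilon = \varepsilon^{-1}f, \]

which has degree $d$. Let

\[ C^*_\varepsilon = \{(x, y) \in \mathbb{F}_q \times \mathbb{F}_q^* : y^e = f_\varepsilon(x)\}. \]

Then $\#C^*_\varepsilon = q + a(f_\varepsilon) - N_f$, where $N_f$ is the number of zeros of $f$ in $\mathbb{F}_q$. In particular $0 \le N_f \le d$. Since $f(x)$ is of the form $\varepsilon y^e$ for a unique coset representative $\varepsilon$ and for $e$ values of $y$, provided $f(x) \neq 0$, we see that

\[ \sum_\varepsilon \#C^*_\varepsilon = \sum_\varepsilon \sum_{y^e = \varepsilon^{-1}f(x)} 1 = \sum_{\substack{x \in \mathbb{F}_q \\ f(x) \neq 0}} e = e(q - N_f). \]

Comparing our expressions we deduce that

\[ e(q - N_f) = e(q - N_f) + \sum_\varepsilon a(f_\varepsilon), \]

whence $a(f) \ge -(e - 1)\max_{\varepsilon \neq 1} |a(f_\varepsilon)|$, as required.

In view of Lemma 6.4, in order to establish Theorem D, it therefore suffices to establish the upper bound

\[ N = \#\{(x, y) \in \mathbb{F}_q^2 : y^e = f(x)\} \le q + O(q^{\frac{1}{2}}), \]

when $q \gg 1$, where the implied constants depend at most on $d = \deg f$ and $e$. We assume that $(e, d) = 1$ and $e \mid q - 1$. In particular this implies that $Y^e - f(X)$ is absolutely irreducible.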

The plan of attack is surprisingly simple. We will construct an auxiliary polynomial $A \in \mathbb{F}_q[X]$, and a parameter $m \ge 1$, such that

(i) $A \neq 0$; and

(ii) if $(x, y)$ is counted by $N$ then the polynomial $(X - x)^m$ divides $A$.

But then it will follow that $N \le e \cdot \frac{\deg(A)}{m}$.

Let $g(X) = f(X)^{\frac{q-1}{e}}$. Then we begin with the following:

Lemma 6.5 Let

\[ h_i(X) = k_{i0}(X) + X^qk_{i1}(X) + \cdots + X^{qK}k_{iK}(X), \]

for $0 \le i \le e - 1$, with $\deg k_{ij} \le \frac{q}{e} - d$. Suppose

\[ h_0(X) + g(X)h_1(X) + \cdots + g(X)^{e-1}h_{e-1}(X) \]

is identically zero. Then each $k_{ij}(X)$ is identically zero.

Proof: A typical summand is of the form

\[ l_{ij}(X) = g(X)^iX^{qj}k_{ij}(X). \]

It will suffice to show that the degrees of the non-zero summands are all distinct. But

\[ \deg l_{ij} = qj + \frac{id(q - 1)}{e} + \deg k_{ij} = \frac{q}{e}(ej + id) + \deg k_{ij} - \frac{id}{e}, \]

whence

\[ \frac{q}{e}(ej + id) - d < \deg l_{ij} \le \frac{q}{e}(ej + id) + \frac{q}{e} - d. \]

Hence we need to show that for pairs $(i, j) \neq (i', j')$, we have $ej + id \neq ej' + i'd$. But

\[ ej + id = ej' + i'd \Rightarrow id \equiv i'd \pmod{e} \Rightarrow i \equiv i' \pmod{e} \Rightarrow i = i',\ j = j', \]

since $(e, d) = 1$ and $0 \le i, i' \le e - 1$.

To produce polynomials vanishing to large order, for which it is natural to use derivatives, we will have to deal with the fact that in characteristic $p$ it is not true that a polynomial $P$ has a zero of order $l$ at $x_0$ if and only if $P^{(i)}(x_0) = 0$ for $0 \le i < l$, when $l > p$. (Take $P(X) = X^p$, for example.)

For the moment let $F$ be any field. For $l \in \mathbb{Z}_{\ge 0}$ let $E^{(l)}$ be the linear operator on $F[X]$, given by

\[ E^{(l)}(X^t) = \binom{t}{l}X^{t-l} \]

for $t \ge 0$. If $D$ is the differentiation operator then

\[ D^{(l)}(X^t) = l!\binom{t}{l}X^{t-l} \]

and so $D^{(l)} = l!E^{(l)}$. We call $E^{(l)}$ the $l$th Hasse derivative.
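The difference between ordinary and Hasse derivatives in characteristic p is easy to see in code. The sketch below represents a polynomial over a prime field as a list of coefficients (index t holding the coefficient of X^t); this representation and the function names are assumptions for illustration.

```python
from math import comb


def hasse_derivative(poly, l, p):
    """E^(l) of a polynomial over F_p; poly[t] is the coefficient of X^t."""
    out = [comb(t, l) * c % p for t, c in enumerate(poly)][l:]
    return out or [0]


def ordinary_derivative(poly, l, p):
    """D^(l): differentiate l times in the usual way, reducing modulo p."""
    out = list(poly)
    for _ in range(l):
        out = [t * c % p for t, c in enumerate(out)][1:] or [0]
    return out


if __name__ == "__main__":
    p = 3
    x_to_p = [0, 0, 0, 1]                       # X^3 over F_3
    print(ordinary_derivative(x_to_p, 3, p))    # every ordinary derivative vanishes
    print(hasse_derivative(x_to_p, 3, p))       # the 3rd Hasse derivative is 1
```

The example X^p shows why the factor l! must be removed: every ordinary derivative of X^p is zero, while the p-th Hasse derivative is the constant 1.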

Lemma 6.6 We have

\[ E^{(l)}(f_1(X) \cdots f_t(X)) = \sum_{i_1 + \cdots + i_t = l} E^{(i_1)}(f_1(X)) \cdots E^{(i_t)}(f_t(X)). \]

Proof: We deal with the case $t = 2$ since the general case then follows by induction on $t$. By linearity we may suppose, furthermore, that $f_i(X) = X^{a_i}$, for $i = 1, 2$. Then we need to show that

\[ \binom{a_1 + a_2}{l}X^{a_1+a_2-l} = \sum_{i=0}^l \binom{a_1}{i}\binom{a_2}{l - i}X^{a_1+a_2-l}. \]

But this identity involving binomial coefficients is obvious from the definition of $\binom{a_1+a_2}{l}$ as the number of subsets with $l$ elements in a set with $a_1 + a_2$ elements.

Using this we may deduce the following expressions.

Lemma 6.7 We have

\[ E^{(l)}((X - c)^t) = \binom{t}{l}(X - c)^{t-l}. \]

Proof: We have

\[ E^{(l)}((X - c)^t) = \sum_{i_1 + \cdots + i_t = l} E^{(i_1)}(X - c) \cdots E^{(i_t)}(X - c) \]

by Lemma 6.6. But $E^{(1)}(X - c) = 1$ and $E^{(i)}(X - c) = 0$ if $i \ge 2$. Hence it is only the summands with each $i_j \in \{0, 1\}$ that occur, of which there are $\binom{t}{l}$, each giving $(X - c)^{t-l}$.

Exercise 6: Let $0 \le l \le t$. Use Lemma 6.6 to show that

\[ E^{(l)}(a(X)f(X)^t) = b(X)f(X)^{t-l}, \]

with $\deg(b) = \deg(a) + l(\deg(f) - 1)$.

We may now establish:

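Theorem F Suppose $E^{(l)}(f(x)) = 0$ for $0 \le l \le M - 1$. Then $(X - x)^M \mid f(X)$.

Proof: Write $f(X) = a_0 + a_1 (X - x) + \cdots + a_d (X - x)^d$. Then Lemma 6.7 implies that
\[
E^{(l)}(f(X)) = a_l + \binom{l + 1}{l} a_{l+1} (X - x) + \cdots + \binom{d}{l} a_d (X - x)^{d - l}.
\]
Setting $X = x$ gives $E^{(l)}(f(x)) = a_l$, so by hypothesis $a_l = 0$ for $0 \le l \le M - 1$, and hence $(X - x)^M \mid f(X)$.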
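The proof shows that $E^{(l)}(f(x))$ is the coefficient $a_l$ in the expansion of $f$ about $x$, so the order of vanishing at $x$ is the least $l$ with $E^{(l)}(f(x)) \neq 0$. A minimal sketch of this computation over $\mathbb{F}_p$ follows; the names `hasse_at` and `zero_order` and the coefficient-list representation are assumptions of the example.

```python
from math import comb

def hasse_at(f, l, x, p):
    """Value at x of the l-th Hasse derivative of f over F_p (f low degree first)."""
    return sum(comb(t, l) * c * pow(x, t - l, p)
               for t, c in enumerate(f) if t >= l) % p

def zero_order(f, x, p):
    """Least l with E^(l)(f)(x) != 0, or None if f is the zero polynomial."""
    for l in range(len(f)):
        if hasse_at(f, l, x, p) != 0:
            return l
    return None

# (X - 1)^4 = X^4 - 4X^3 + 6X^2 - 4X + 1, reduced modulo 3
f = [1, 2, 0, 2, 1]
print(zero_order(f, 1, 3))
```

Here the zero at $x = 1$ has order $4 > p = 3$. Since $D^{(3)} = 3!\,E^{(3)}$ vanishes identically over $\mathbb{F}_3$, ordinary derivatives could not have detected this.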

We may now return to the setting of Theorem D with $F = \mathbb{F}_q$. We can now tackle the fundamental:

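Lemma 6.8 Let $\varepsilon \in \mathbb{Z}$ with $1 \le \varepsilon \le e - 1$. Let $a \in \mathbb{F}_q[X]$ have degree $\varepsilon$. Let
\[
S = \{x \in \mathbb{F}_q : a(g(x)) = 0 \text{ or } f(x) = 0\}.
\]
Let $M \ge d + 1$ be an integer with
\[
(M + 3)^2 \le \frac{2q}{e}.
\]
Then there exists a non-zero $r \in \mathbb{F}_q[X]$ which has a zero of order $\ge M$ at every $x \in S$ and satisfies
\[
\deg(r) \le \frac{\varepsilon}{e} qM + 4dq.
\]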

Proof: We consider $r(X) = h(X, X^q)$, with
\[
h(X, Y) = f(X)^M \sum_{i=0}^{e-1} \sum_{j=0}^{K} k_{ij}(X) g(X)^i Y^j,
\]
where the $k_{ij}(X)$ have degree $\le \frac{q}{e} - d$, with coefficients to be determined, and where
\[
K = \left[ \frac{\varepsilon}{e} (M + d + 1) \right].
\]

By Ex. 6, since $g = f^{\frac{q-1}{e}}$, we have
\[
E^{(l)}(f(X)^M k_{ij}(X) g(X)^i) = f(X)^{M - l} k_{ij}^{(l)}(X) g(X)^i,
\]
where $k_{ij}^{(l)}$ has degree $\deg k_{ij} + l(d - 1)$. Hence it follows that for $0 \le l < M \le q$ we have
\[
E^{(l)}(r(X)) = f(X)^{M - l} \sum_{i=0}^{e-1} \sum_{j=0}^{K} k_{ij}^{(l)}(X) g(X)^i X^{qj}.
\]

Indeed for $r(X) = h(X, X^q) \in \mathbb{F}_q[X]$ and $l < q$ we have
\[
E^{(l)}(r(X)) = E_X^{(l)}(h(X, X^q)),
\]
where $E_X^{(l)} h$ is the Hasse derivative of $h(X, Y)$ performed with respect to $X$. (Hint: by linearity it suffices to consider the case $h(X, Y) = X^a Y^b$. Use Lemma 6.6 and note that $E^{(l)}(X^q) = 0$ if $0 < l < q$.)
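The vanishing used in the hint amounts to the statement that $\binom{qb}{l}$ is divisible by $p$ whenever $0 < l < q$, where $q$ is a power of $p$. This can be confirmed numerically for small cases, as in the sketch below; the function names and the bound `b_max` on the exponent $b$ are assumptions of the example.

```python
from math import comb

def hasse_coeff_of_power(n, l, p):
    """Coefficient of X^(n-l) in E^(l)(X^n), reduced modulo p."""
    return comb(n, l) % p

def vanishes_between(p, k, b_max):
    """Check E^(l)(X^(q*b)) = 0 in characteristic p for q = p^k, 0 < l < q, 1 <= b <= b_max."""
    q = p ** k
    return all(hasse_coeff_of_power(q * b, l, p) == 0
               for b in range(1, b_max + 1)
               for l in range(1, q))

print(vanishes_between(3, 2, 5))        # q = 9
print(hasse_coeff_of_power(9, 9, 3))    # l = q is excluded: E^(q)(X^q) = 1
```

This is why, for $l < q$, the factor $Y^j = X^{qj}$ behaves like a constant under $E^{(l)}$.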

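We want that $E^{(l)}(r(x)) = 0$ for $0 \le l \le M - 1$ and any $x \in S$. Let $y \in \mathbb{F}_q$ with $a(y) = 0$. Then
\[
y^{\varepsilon} = c_0 + c_1 y + \cdots + c_{\varepsilon - 1} y^{\varepsilon - 1},
\]
since $\deg(a) = \varepsilon$. Hence for $i \ge 0$,
\[
y^i = c_0^{(i)} + \cdots + c_{\varepsilon - 1}^{(i)} y^{\varepsilon - 1}.
\]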

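Thus if $x \in \mathbb{F}_q$ satisfies $a(g(x)) = 0$ then $x^q = x$ and
\[
g(x)^i = \sum_{t=0}^{\varepsilon - 1} c_t^{(i)} g(x)^t.
\]
Hence for such $x$
\[
E^{(l)}(r(x)) = f(x)^{M - l} \sum_{t=0}^{\varepsilon - 1} s_t^{(l)}(x) g(x)^t,
\]
where
\[
s_t^{(l)}(X) = \sum_{i=0}^{e-1} \sum_{j=0}^{K} c_t^{(i)} k_{ij}^{(l)}(X) X^j.
\]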

Thus we are done if the polynomials $s_t^{(l)}(X)$ are identically zero for $0 \le t \le \varepsilon - 1$. Now
\[
\deg(s_t^{(l)}(X)) \le \max_{i,j} \deg(k_{ij}^{(l)}) + K \le \max_{i,j} \deg(k_{ij}) + l(d - 1) + K < \frac{q}{e} + l(d - 1) - 1 + K.
\]


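For $0 \le t \le \varepsilon - 1$ and $0 \le l \le M - 1$ let $B$ be the total number of coefficients of the $s_t^{(l)}(X)$. Then
\[
B < \varepsilon M \left( \frac{q}{e} + K \right) + \varepsilon \frac{M^2}{2} (d - 1) < \frac{\varepsilon q}{e} M + \varepsilon M^2 \left( \frac{d + 1}{2} \right) + \varepsilon M (d + 1),
\]
since $K \le \frac{\varepsilon}{e}(M + d + 1)$ and $\varepsilon \le e - 1$.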

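If we denote by $A$ the total number of possible coefficients of all the $k_{ij}$, then
\[
A \ge \left( \frac{q}{e} - d \right) e (K + 1) \ge (q - de) \frac{\varepsilon}{e} (M + d + 1) \ge \frac{q \varepsilon}{e} M + \frac{q \varepsilon}{e} (d + 1) - 2 d \varepsilon M,
\]
since $M \ge d + 1$.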

If $B < A$ then we obtain a system of homogeneous linear equations in the coefficients of the $k_{ij}$ in which the number of conditions is less than the number of variables. Hence, if $B < A$, there exists a non-trivial solution for these coefficients and we have constructed $r(X)$ such that it has a zero of order $\ge M$ for $x \in \mathbb{F}_q$ such that $a(g(x)) = 0$. But
\[
(M + 3)^2 \le \frac{2q}{e} \;\Rightarrow\; M^2 + 6M < \frac{2q}{e} \;\Rightarrow\; M^2 \left( \frac{d + 1}{2} \right) + 3M(d + 1) < \frac{q}{e}(d + 1) \;\Rightarrow\; B < A.
\]
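The comparison of $B$ with $A$ can be checked in exact arithmetic for sample parameters. The sketch below evaluates the first upper bound for $B$ and the first lower bound for $A$ from the estimates above; the function names are invented for the example, and the sample values of $q$, $e$, $d$ are assumptions chosen only so that the hypotheses of Lemma 6.8 can be met.

```python
from fractions import Fraction

def hypotheses_hold(q, e, d, eps, M):
    """The numerical hypotheses of Lemma 6.8: 1 <= eps <= e-1, M >= d+1, (M+3)^2 <= 2q/e."""
    return 1 <= eps <= e - 1 and M >= d + 1 and e * (M + 3) ** 2 <= 2 * q

def counting_bounds(q, e, d, eps, M):
    """Return (upper bound for B, lower bound for A) as exact fractions."""
    K = (eps * (M + d + 1)) // e                  # K = [eps (M + d + 1) / e]
    q_over_e = Fraction(q, e)
    b_upper = eps * M * (q_over_e + K) + Fraction(eps * M * M * (d - 1), 2)
    a_lower = (q_over_e - d) * e * (K + 1)
    return b_upper, a_lower

for q, e, d in [(101, 5, 2), (1009, 7, 3)]:
    for eps in range(1, e):
        for M in range(d + 1, 60):
            if hypotheses_hold(q, e, d, eps, M):
                b_upper, a_lower = counting_bounds(q, e, d, eps, M)
                print(q, e, d, eps, M, b_upper < a_lower)
```

Every line printed should end in `True`, in agreement with the chain of implications above.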

Since $r(X)$ has a factor $f(X)^M$, moreover, it follows that $r(X)$ has a zero of order $\ge M$ for each $x \in S$. Finally Lemma 6.5 implies that $r(X)$ is not identically zero, and we have
\[
\deg r(X) \le dM + \frac{q}{e} - d + \frac{(e - 1) d (q - 1)}{e} + qK \le \frac{\varepsilon}{e} qM + 4dq.
\]
The lemma is proved.

Applying Lemma 6.8 we see that for any $M$ as in the lemma,
\[
\#S \le \frac{\deg(r)}{M} \le \frac{\varepsilon}{e} q + \frac{4dq}{M},
\]
since the number of zeros of $r$, counted with multiplicities, cannot exceed its degree.

Choose $M = \left[ \sqrt{2q/e} - 3 \right]$. This will be $\ge d + 1$ if $q \gg 1$, with the implied constant depending on $d$ and $e$. Then
\[
\#S \le \frac{\varepsilon}{e} q + 4 d e^{\frac{1}{2}} q^{\frac{1}{2}}.
\]


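Choosing $a(Y) = Y - 1$ in Lemma 6.8, so that $\varepsilon = 1$, we see that $S$ is the set of $x \in \mathbb{F}_q$ with $g(x) = 1$ or $f(x) = 0$. Hence
\[
N = \#\{(x, y) \in \mathbb{F}_q^2 : y^e = f(x)\} \le e\, \#\{x \in \mathbb{F}_q : g(x) = 1\} + \#\{x \in \mathbb{F}_q : f(x) = 0\} \le e\, \#S \le q + 4 d e^{\frac{3}{2}} q^{\frac{1}{2}},
\]
as required to complete the proof of Theorem D.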
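The final bound can be compared with a direct count in a small case. The sketch below takes $q = p$ prime and counts solutions of $y^e = f(x)$ by brute force. The curve $y^2 = x^3 + x + 1$ over $\mathbb{F}_{10007}$ is an arbitrary illustrative choice, and it is assumed (not checked here) that it satisfies the hypotheses of Theorem D, which are not restated in this section.

```python
from math import sqrt

def count_points(coeffs, e, p):
    """N = #{(x, y) in F_p^2 : y^e = f(x)}, with f given low degree first."""
    # roots[v] = number of y in F_p with y^e = v
    roots = [0] * p
    for y in range(p):
        roots[pow(y, e, p)] += 1

    def f(x):
        value = 0
        for c in reversed(coeffs):   # Horner evaluation modulo p
            value = (value * x + c) % p
        return value

    return sum(roots[f(x)] for x in range(p))

p, e = 10007, 2
coeffs = [1, 1, 0, 1]                # f(X) = X^3 + X + 1
d = len(coeffs) - 1
N = count_points(coeffs, e, p)
bound = p + 4 * d * e ** 1.5 * sqrt(p)
print(N, bound, N <= bound)
```

For these values $[\sqrt{2q/e} - 3] = 97 \ge d + 1$, so the choice of $M$ made above is admissible.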
