Sales forecasting using longitudinal data models

Edward W. Frees*, Thomas W. Miller

University of Wisconsin—Madison, School of Business, 975 University Avenue, Madison, WI 53706, USA

*Corresponding author. E-mail addresses: [email protected] (E.W. Frees), [email protected] (T.W. Miller).

International Journal of Forecasting 20 (2004) 99–114. www.elsevier.com/locate/ijforecast. doi:10.1016/S0169-2070(03)00005-0. © 2003 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Abstract

This paper shows how to forecast using a class of linear mixed longitudinal, or panel, data models. Forecasts are derived as special cases of best linear unbiased predictors, also known as BLUPs, and hence are optimal predictors of future realizations of the response. We show that the BLUP forecast arises from three components: (1) a predictor based on the conditional mean of the response, (2) a component due to time-varying coefficients, and (3) a serial correlation correction term. The forecasting techniques are applicable in a wide variety of settings. This article discusses forecasting in the context of marketing and sales. In particular, we consider a data set of the Wisconsin State Lottery, in which 40 weeks of sales are available for each of 50 postal codes. Using sales data as well as economic and demographic characteristics of each postal code, we forecast sales for each postal code.

Keywords: Panel data models; Unobserved effects; Random coefficients; Heterogeneity

1. Introduction

Forecasting is an integral part of a marketing manager's role. Sales forecasts are important for understanding market share and the competition, future production needs, and the determinants of sales, including promotions, pricing, advertising and distribution.

Market researchers and business analysts are often faced with the task of predicting sales. One approach to prediction is to use cross-sectional data, working with one point in time or aggregating over several time periods. We search for variables that relate to sales and use those variables as explanatory variables in our models. Another approach is to work with time series data, aggregating across sales territories or accounts. We use past and current sales as a predictor of future sales as we search for explanatory variables that relate to sales.

Cross-sectional and simple time series approaches do not make full use of data available to sales and marketing managers. Typical sales data have a hierarchical structure. They are longitudinal or panel data, having both cross-sectional and time series characteristics. They are cross-sectional because they include observations from many cases, sales across stores, territories or accounts; we say that these data are differentiated across 'space'. They are time series data because they represent many points in time. Longitudinal data methods are appropriate for these types of data.

Longitudinal data methods have been widely developed for understanding relationships among variables in the social and biological sciences including marketing research; see, for example, Ailawadi and Neslin (1998) and Erdem (1996). But there is

relatively little literature available for forecasting

    using longitudinal data methods. Some important

    exceptions include Battese, Harter, and Fuller

    (1988) and Baltagi and Li (1992).

    By using information in both the cross section

    (space) and time, we are able to provide forecasts

    that are superior to traditional forecasts that use only

    one dimension. We can forecast at the subject/micro

    level, providing managers with additional informa-

    tion for making both strategic and tactical decisions,

    including decisions about sizing and capacity plan-

    ning for manufacturing plants, pricing, marketing

    promotions, advertising, sales organization and sales

    processes.

    The longitudinal data mixed model is introduced

    in Section 2. Appendix A shows that the longitudi-

    nal data mixed model can be represented as a special

    case of the mixed linear model. Thus, there is a large

    literature on estimation of the regression parameters

(β) as well as variance components; see, for example, Searle, Casella, and McCulloch (1992) or Verbeke and Molenberghs (2000). In the data analysis

    Section 4, we find it convenient to use the SAS

    procedure for mixed linear models (PROC MIXED)

    when estimating longitudinal data mixed models;

see Littell, Milliken, Stroup, and Wolfinger (1996)

    for an introduction from this perspective. Similar

    procedures are available for S-PLUS (Pinheiro and

    Bates, 2000).

    Section 3 develops longitudinal data mixed model

    forecasts using best linear unbiased predictors

    (BLUPs). These predictors were introduced by Gold-

    berger (1962) and developed by Harville (1976) in

    the context of the mixed linear model. One goal of

    this paper is to show how this type of predictor can

    be used as an optimal forecast for longitudinal data

    mixed models. This section is more technical and

    many readers may wish to go directly to the Section

    4 case study.

    Specifically, Section 4 describes the case study

    motivating the theoretical modeling work, forecasting

    Wisconsin lottery sales. Here, we consider a data set

    that contains forty weeks of lottery sales from a

    random sample of 50 Wisconsin postal codes. Sec-

    tion 4 shows how to specify an appropriate longitu-

    dinal data mixed model and forecast using the

specified model. Section 5 closes with some concluding remarks.

2. Longitudinal data mixed model

    Longitudinal data models are regression models in

    which repeated observations of subjects, such as

    stores, are available. Using longitudinal data models,

    we can provide detailed representations of character-

    istics that are unique to each subject, thus accounting

    for the classical misspecification problem of hetero-

    geneity. Furthermore, the repeated observations over

    time allow us to consider flexible models of the

    evolution of responses, such as sales, known as the

    dynamic structure of a model.

    This article introduces forecasting for a broad

    class of dynamic longitudinal data models that we

    call the longitudinal data mixed model. As an

    example of this class of models, consider the basic

    two-way model

$$y_{it} = \alpha_i + \lambda_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \qquad t = 1, \ldots, T; \; i = 1, \ldots, n. \tag{2.1}$$

    Baltagi (1988) and Koning (1989) developed forecasts

    for this model. Here, yit denotes the response (sales) for

    the ith subject, such as store, during the tth time period.

This is a model of balanced data in that we assume that the same number, $T$, of observations is available for each of the $n$ stores. The quantity $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_K)'$ is a $K \times 1$ vector of parameters that is common to all subjects and $\mathbf{x}_{it} = (x_{it,1}, x_{it,2}, \ldots, x_{it,K})'$ is the corresponding vector of covariates. The term $\alpha_i$ is specific to subject $i$ yet is common to all time periods. This variable may account for unique, yet unobserved, characteristics of each subject. The term $\lambda_t$ is specific to the time period $t$ yet is common to all stores. This variable may account for common, yet unobserved, events that affect sales. Both terms $\alpha_i$ and $\lambda_t$ are random variables and hence the model in Eq.

    (2.1) is also known as the two-way error components

    model.

    The longitudinal data mixed model is considerably

    more complex than the model in Eq. (2.1) because it

    has the ability to capture many additional features of

    the data that may be of interest to an analyst. We

    focus on three aspects:

    1. The longitudinal data mixed model does not

require balanced data. To illustrate, it is possible to allow new subjects to enter the data by

allowing the first observation to be after time $t = 1$.

    Similarly, the last observation may be prior to

    time t=T, allowing for early departure. Using an

    underlying continuous stochastic process for the

    disturbances, the model allows for unequally

    spaced (in time) observations as well as missing

    data.

    2. The longitudinal data mixed model allows for

    covariates associated with vector error compo-

    nents. This allows one to handle broad classes of

    mixed models, such as random coefficient

    models.

    3. The longitudinal data mixed model allows for

    specification of dynamic aspects in two fashions,

through the error terms $\{\varepsilon_{it}\}$ and through the specification of $\{\lambda_t\}$ as a stochastic process.

    To illustrate the third point, in the traditional

    longitudinal data mixed model, such as introduced

    by Laird and Ware (1982), the dynamics are speci-

    fied through the correlation structure of subject-

specific errors. For example, it is common to consider an autoregressive model of order $p$, AR($p$), for the disturbances $\{\varepsilon_{it}\}$ of the form:

$$\varepsilon_{i,t} = \phi_1 \varepsilon_{i,t-1} + \phi_2 \varepsilon_{i,t-2} + \cdots + \phi_p \varepsilon_{i,t-p} + \zeta_{i,t}, \tag{2.2}$$

where $\{\zeta_{i,t}\}$ are initially assumed to be identically and independently distributed, mean zero, random

    variables. Alternative structures are easily accommo-

    dated; see Section 3 for further discussion.
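As a concrete illustration of this disturbance structure, the following minimal sketch simulates AR($p$) disturbances for a single subject as in Eq. (2.2); the coefficients, noise scale, and start-up values are hypothetical choices, not values from the paper.

```python
import numpy as np

def simulate_ar_disturbances(phi, T, sigma=1.0, rng=None):
    """Simulate T values of an AR(p) process as in Eq. (2.2),
    with i.i.d. mean-zero normal innovations zeta_{i,t}."""
    rng = np.random.default_rng(rng)
    p = len(phi)
    eps = np.zeros(T + p)              # pad with p zeros as start-up values
    for t in range(p, T + p):
        # phi_1*eps_{t-1} + ... + phi_p*eps_{t-p} + zeta_t
        eps[t] = np.dot(phi, eps[t - p:t][::-1]) + rng.normal(0.0, sigma)
    return eps[p:]

# Example: 40 weeks of AR(1) disturbances with a hypothetical phi_1 = 0.5
eps_i = simulate_ar_disturbances([0.5], T=40, rng=1)
```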

    Alternatively, we may model the dynamics using a

    stochastic process for {kt}. For comparison, note thatEq. (2.2) is a model of serial relationships at the

    subject level, whereas a dynamic model of kt is onethat is common to all subjects. To illustrate the latter

    specification, Section 3 considers a random walk

    model for the common, time-specific components.

    Beginning with the basic two-way model in Eq.

    (2.1), more generally, we use

$$z_{\alpha,i,t,1}\alpha_{i,1} + \cdots + z_{\alpha,i,t,q}\alpha_{i,q} = \mathbf{z}_{\alpha,i,t}'\boldsymbol{\alpha}_i \tag{2.3}$$

and

$$z_{\lambda,i,t,1}\lambda_{t,1} + \cdots + z_{\lambda,i,t,r}\lambda_{t,r} = \mathbf{z}_{\lambda,i,t}'\boldsymbol{\lambda}_t. \tag{2.4}$$

With these terms, we define the longitudinal data mixed model as

$$y_{it} = \mathbf{z}_{\alpha,i,t}'\boldsymbol{\alpha}_i + \mathbf{z}_{\lambda,i,t}'\boldsymbol{\lambda}_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \qquad t = 1, \ldots, T_i; \; i = 1, \ldots, n. \tag{2.5}$$

Here, $\boldsymbol{\alpha}_i = (\alpha_{i,1}, \ldots, \alpha_{i,q})'$ is a $q \times 1$ vector of subject-specific terms and $\mathbf{z}_{\alpha,i,t} = (z_{\alpha,i,t,1}, \ldots, z_{\alpha,i,t,q})'$ is the corresponding vector of covariates. Similarly, $\boldsymbol{\lambda}_t = (\lambda_{t,1}, \ldots, \lambda_{t,r})'$ is an $r \times 1$ vector of time-specific terms and $\mathbf{z}_{\lambda,i,t} = (z_{\lambda,i,t,1}, \ldots, z_{\lambda,i,t,r})'$ is the corresponding vector of covariates. We use the notation $t = 1, \ldots, T_i$ to indicate the unbalanced nature of the data. Without the time-specific terms, this model was introduced by Laird and Ware (1982) and is widely used in the biological sciences (Diggle, Liang, & Zeger, 1994). We have added the time-specific terms to provide another mechanism for handling temporal, or dynamic, patterns. We allow the time-specific term to be a vector for symmetry with the subject-specific terms and to handle some special cases described in Section 3 where we give more details of the assumptions of these quantities.

3. Longitudinal data mixed model and forecasting

3.1. The longitudinal data mixed model

A more compact form of Eq. (2.5) can be given by stacking over $t$. This yields a matrix form of the longitudinal data mixed model

$$\mathbf{y}_i = \mathbf{Z}_{\alpha,i}\boldsymbol{\alpha}_i + \mathbf{Z}_{\lambda,i}\boldsymbol{\lambda} + \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\varepsilon}_i, \qquad i = 1, \ldots, n. \tag{3.1}$$

This expression uses vectors of responses, $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{iT_i})'$, and of disturbances, $\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT_i})'$. Similarly, the matrices of covariates are $\mathbf{X}_i = (\mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT_i})'$, of dimension $T_i \times K$; $\mathbf{Z}_{\alpha,i} = (\mathbf{z}_{\alpha,i,1}, \mathbf{z}_{\alpha,i,2}, \ldots, \mathbf{z}_{\alpha,i,T_i})'$, of dimension $T_i \times q$; and

$$\mathbf{Z}_{\lambda,i} = \left( \begin{pmatrix} \mathbf{z}_{\lambda,i,1}' & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{z}_{\lambda,i,2}' & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{z}_{\lambda,i,T_i}' \end{pmatrix} \;\; \mathbf{0}_i \right)$$

of dimension $T_i \times rT$, where $\mathbf{0}_i$ is a $T_i \times r(T - T_i)$ zero matrix. Finally, $\boldsymbol{\lambda} = (\boldsymbol{\lambda}_1', \ldots, \boldsymbol{\lambda}_T')'$ is the $rT \times 1$ vector of time-specific coefficients.

We assume that the sources of variability, $\boldsymbol{\varepsilon}_i$, $\boldsymbol{\alpha}_i$ and $\boldsymbol{\lambda}_t$, are mutually independent and mean zero. The non-zero means are accounted for in the $\boldsymbol{\beta}$ parameters. The disturbances are independent between subjects, yet we allow for serial correlation and heteroscedasticity through the notation $\mathrm{Var}\,\boldsymbol{\varepsilon}_i = \mathbf{R}_i$. Further, we assume that the subject-specific effects $\{\boldsymbol{\alpha}_i\}$ are random with variance–covariance matrix $\mathbf{D}$, a $q \times q$ positive definite matrix. Time-specific effects $\boldsymbol{\lambda}$ have variance–covariance matrix $\boldsymbol{\Sigma}_\lambda$, an $rT \times rT$ positive definite matrix. With this notation, we may express the variance of each subject as $\mathrm{Var}\,\mathbf{y}_i = \mathbf{V}_{\alpha,i} + \mathbf{Z}_{\lambda,i}\boldsymbol{\Sigma}_\lambda\mathbf{Z}_{\lambda,i}'$, where

$$\mathbf{V}_{\alpha,i} = \mathbf{Z}_{\alpha,i}\mathbf{D}\mathbf{Z}_{\alpha,i}' + \mathbf{R}_i. \tag{3.2}$$
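To make this covariance structure concrete, here is a minimal numpy sketch that assembles $\mathrm{Var}\,\mathbf{y}_i$ from user-supplied components following Eq. (3.2) and the surrounding text; the dimensions and inputs below are illustrative assumptions only.

```python
import numpy as np

def subject_variance(Z_alpha_i, D, R_i, Z_lambda_i, Sigma_lambda):
    """Var y_i = Z_alpha_i D Z_alpha_i' + R_i + Z_lambda_i Sigma_lambda Z_lambda_i'."""
    V_alpha_i = Z_alpha_i @ D @ Z_alpha_i.T + R_i          # Eq. (3.2)
    return V_alpha_i + Z_lambda_i @ Sigma_lambda @ Z_lambda_i.T

# Illustrative dimensions: T_i = 4 observations, q = 1, r = 1, T = 6
Ti, q, r, T = 4, 1, 1, 6
Z_a = np.ones((Ti, q))                                     # random-intercept design
Z_l = np.hstack([np.eye(Ti), np.zeros((Ti, r * T - Ti))])  # time-specific design with zero block
V_i = subject_variance(Z_a, np.eye(q), np.eye(Ti), Z_l, np.eye(r * T))
```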

    3.2. Forecasting for the longitudinal data mixed

    model

    For forecasting, we wish to predict

$$y_{i,T_i+L} = \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_i + \mathbf{z}_{\lambda,i,T_i+L}'\boldsymbol{\lambda}_{T_i+L} + \mathbf{x}_{i,T_i+L}'\boldsymbol{\beta} + \varepsilon_{i,T_i+L} \tag{3.3}$$

    for L lead time units in the future. We use results for

    best linear unbiased prediction (BLUP) for the mixed

    linear model; see Robinson (1991) or Frees, Young,

    and Luo (1999, 2001) for recent reviews. A BLUP is

    the best linear combination of responses that is

    unbiased and has the smallest mean square error over

    the class of all linear, unbiased predictors. When using

    available data to approximate random variables, such

as $y_{i,T_i+L}$, we use the term prediction in lieu of estimation.

To calculate these predictors, we use the sum of squares $S_{ZZ} = \sum_{i=1}^n \mathbf{Z}_{\lambda,i}'\mathbf{V}_{\alpha,i}^{-1}\mathbf{Z}_{\lambda,i}$. We summarize the results in the following proposition. The details of the derivation are in Appendix A.

Proposition. Consider the longitudinal data mixed model described in Section 3.1. Then, the best linear unbiased predictor of $\boldsymbol{\lambda}$ is

$$\boldsymbol{\lambda}_{BLUP} = \left( S_{ZZ} + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1} \sum_{i=1}^n \mathbf{Z}_{\lambda,i}'\mathbf{V}_{\alpha,i}^{-1}\mathbf{e}_{i,GLS} \tag{3.4}$$

with residuals $\mathbf{e}_{i,GLS} = \mathbf{y}_i - \mathbf{X}_i\mathbf{b}_{GLS}$, where $\mathbf{b}_{GLS}$ is the generalized least squares estimator of $\boldsymbol{\beta}$. The best linear unbiased predictors for $\boldsymbol{\varepsilon}_i$ and $\boldsymbol{\alpha}_i$ are

$$\boldsymbol{\varepsilon}_{i,BLUP} = \mathbf{R}_i\mathbf{V}_{\alpha,i}^{-1}\left( \mathbf{e}_{i,GLS} - \mathbf{Z}_{\lambda,i}\boldsymbol{\lambda}_{BLUP} \right) \tag{3.5}$$

and

$$\boldsymbol{\alpha}_{i,BLUP} = \mathbf{D}\mathbf{Z}_{\alpha,i}'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP}. \tag{3.6}$$

Further, the best linear unbiased predictor of $y_{i,T_i+L}$ is

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} + \mathbf{z}_{\lambda,i,T_i+L}'\,\mathrm{Cov}(\boldsymbol{\lambda}_{T_i+L}, \boldsymbol{\lambda})'\boldsymbol{\Sigma}_\lambda^{-1}\boldsymbol{\lambda}_{BLUP} + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP}. \tag{3.7}$$

Remarks. We may interpret the BLUP forecast as arising from three components. The first component, $\mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP}$, is due to the conditional mean. The second component, $\mathbf{z}_{\lambda,i,T_i+L}'\,\mathrm{Cov}(\boldsymbol{\lambda}_{T_i+L}, \boldsymbol{\lambda})'\boldsymbol{\Sigma}_\lambda^{-1}\boldsymbol{\lambda}_{BLUP}$, is due to time-varying coefficients. The third component, $\mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP}$, is a serial correlation correction term, analogous to a result due to Goldberger (1962); see Example 1.2 below. An expression for the variance of the forecast error, $\mathrm{Var}(\hat{y}_{i,T_i+L} - y_{i,T_i+L})$, is available from the authors.
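For readers who want to see the pieces of Eq. (3.7) in code, the following numpy sketch assembles the three forecast components for one subject, assuming the variance components ($\mathbf{D}$, $\mathbf{R}_i$, $\boldsymbol{\Sigma}_\lambda$), the GLS estimate, and the BLUPs have already been computed elsewhere; all argument names here are illustrative.

```python
import numpy as np

def blup_forecast(x_f, z_alpha_f, z_lambda_f, Cov_lambda_f, cov_eps_f,
                  b_gls, alpha_blup, lambda_blup, eps_blup,
                  Sigma_lambda, R_i):
    """Assemble the BLUP forecast of Eq. (3.7) from its three components.

    x_f, z_alpha_f, z_lambda_f : covariates at the forecast point T_i + L
    Cov_lambda_f : Cov(lambda_{T_i+L}, lambda), shape (r, r*T)
    cov_eps_f    : Cov(eps_{i,T_i+L}, eps_i),   shape (T_i,)
    """
    cond_mean = x_f @ b_gls + z_alpha_f @ alpha_blup                 # component 1
    time_term = z_lambda_f @ Cov_lambda_f @ np.linalg.solve(Sigma_lambda, lambda_blup)  # component 2
    serial_term = cov_eps_f @ np.linalg.solve(R_i, eps_blup)         # component 3
    return cond_mean + time_term + serial_term
```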

    3.3. Forecasting for special cases of the longitudinal

    data mixed model

    The Proposition provides sufficient structure to

    calculate forecasts for a wide variety of models. Still,

    it is instructive to interpret the BLUP forecast in a

    number of special cases. We first consider the case of

    independent and identically distributed time-specific

components $\{\boldsymbol{\lambda}_t\}$.

Example 1. (Random time-specific components) We consider the special case where $\{\boldsymbol{\lambda}_t\}$ are i.i.d. and assume that $T_i + L > T$, so that $\mathrm{Cov}(\boldsymbol{\lambda}_{T_i+L}, \boldsymbol{\lambda}) = \mathbf{0}$ and the time-varying component of Eq. (3.7) vanishes. Thus, from Eq. (3.7), the BLUP forecast of $y_{i,T_i+L}$ is

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP}. \tag{3.8}$$

Suppose further that the disturbance terms are serially uncorrelated so that $\mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i) = \mathbf{0}$. As an immediate consequence of Eq. (3.8), we have

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP}.$$

We note that even when $\{\boldsymbol{\lambda}_t\}$ are i.i.d., the time-specific components appear in $\boldsymbol{\alpha}_{i,BLUP}$. Thus, the presence of $\boldsymbol{\lambda}_t$ changes the forecasts.

    Example 1.1. (No time-specific components) We

    now consider the case of no time-specific component

$\boldsymbol{\lambda}_t$. Here, using Eq. (3.8), the BLUP forecast of $y_{i,T_i+L}$ is

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{V}_{\alpha,i}^{-1}\left( \mathbf{y}_i - \mathbf{X}_i\mathbf{b}_{GLS} \right),$$

where, from Eqs. (3.5) and (3.6), $\boldsymbol{\alpha}_{i,BLUP} = \mathbf{D}\mathbf{Z}_{\alpha,i}'\mathbf{V}_{\alpha,i}^{-1}\left( \mathbf{y}_i - \mathbf{X}_i\mathbf{b}_{GLS} \right)$. To help further interpret this case, consider:

    Example 1.2. (AR(1) serial correlation) An interest-

    ing special case that provides a great deal of intuition

    is the case where we assume autoregressive of order 1

    (AR(1)), serially correlated errors. For example,

    Baltagi and Li (1991, 1992) considered this serially

    correlated structure in the error components model

($q = 1$) in the balanced data case. More generally, from Eq. (3.7), it can be checked that

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} + \rho^L\varepsilon_{iT_i,BLUP}.$$

Thus, the $L$-step forecast equals the conditional mean, with a correction factor of $\rho^L$ times the most recent BLUP residual. This result was originally given by

    Goldberger (1962) in the context of ordinary

    regression without random effects (that is, assuming

$\mathbf{D} = \mathbf{0}$).
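A minimal sketch of this AR(1) correction, assuming the fixed-effect prediction, the subject-level BLUP term, the estimated autocorrelation, and the latest BLUP residual are already available; the numeric inputs below are hypothetical, except the 0.555 autocorrelation reported later in Table 3.

```python
def ar1_forecast(xb_forecast, alpha_blup_term, rho, last_resid, lead):
    """L-step-ahead forecast: conditional mean plus rho**L times the
    most recent BLUP residual (Example 1.2)."""
    return xb_forecast + alpha_blup_term + rho ** lead * last_resid

# e.g. 3 weeks ahead with rho = 0.555 and a hypothetical residual of 0.2
yhat = ar1_forecast(xb_forecast=8.1, alpha_blup_term=0.3,
                    rho=0.555, last_resid=0.2, lead=3)
```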

Example 1.3. (Time-varying coefficients) Suppose that the model is

$$y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta}_t + \varepsilon_{it},$$

where $\{\boldsymbol{\beta}_t\}$ are i.i.d. We can re-write this as

$$y_{it} = \mathbf{z}_{\lambda,i,t}'\boldsymbol{\lambda}_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it},$$

where $\mathrm{E}\,\boldsymbol{\beta}_t = \boldsymbol{\beta}$, $\boldsymbol{\lambda}_t = \boldsymbol{\beta}_t - \boldsymbol{\beta}$ and $\mathbf{z}_{\lambda,i,t} = \mathbf{x}_{i,t}$. With this notation and Eq. (3.8), the forecast of $y_{i,T_i+L}$ is $\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS}$.

Example 1.4. (Two-way error components model) Consider the basic two-way model given in Eq. (2.1). As in Example 1.2, we have that $q = r = 1$, $\mathbf{D} = \sigma_\alpha^2$ and $z_{\alpha,i,T_i+L} = 1$. Thus, from Eq. (3.8), the BLUP forecast of $y_{i,T_i+L}$ is

$$\hat{y}_{i,T_i+L} = \alpha_{i,BLUP} + \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS}.$$

For additional interpretation, we assume balanced data so that $T_i = T$, as in Baltagi (1988) and Koning (1989); see also Baltagi (1995, p. 38). To ease notation, define $\zeta = T\sigma_\alpha^2/(\sigma_\varepsilon^2 + T\sigma_\alpha^2)$. Then, it can be shown that

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \zeta\left[ \left( \bar{y}_i - \bar{\mathbf{x}}_i'\mathbf{b}_{GLS} \right) - \frac{n(1-\zeta)\sigma_\lambda^2}{\sigma_\varepsilon^2 + n(1-\zeta)\sigma_\lambda^2}\left( \bar{y} - \bar{\mathbf{x}}'\mathbf{b}_{GLS} \right) \right].$$

    Example 2. (Random walk model) Through minor

    modifications, other temporal patterns of common, yet

    unobserved, components can be easily included. For

this example, we assume that $r = 1$ and $\{\lambda_t\}$ are i.i.d., so that the partial sum process $\{\lambda_1 + \lambda_2 + \cdots + \lambda_t\}$ is a random walk process. Thus, the model is

$$y_{it} = \mathbf{z}_{\alpha,i,t}'\boldsymbol{\alpha}_i + \lambda_1 + \lambda_2 + \cdots + \lambda_t + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it}, \qquad t = 1, \ldots, T_i; \; i = 1, \ldots, n.$$

Stacking over $t$, this can be expressed in matrix form as Eq. (3.1), where the $T_i \times T$ matrix $\mathbf{Z}_{\lambda,i}$ is a lower triangular matrix of 1s for the first $T_i$ rows, and zero elsewhere. That is,

$$\mathbf{Z}_{\lambda,i} = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 1 & 1 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 1 & 1 & 1 & \cdots & 1 & 0 & \cdots & 0 \end{pmatrix}.$$

Then, it can be shown that

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \sum_{t=1}^{T_i}\lambda_{t,BLUP} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP}.$$

4. Case study: Forecasting Wisconsin lottery sales

    In this section, we forecast the sale of state

    lottery tickets from 50 postal (ZIP) codes in Wis-

    consin. Lottery sales are an important component of

    state revenues. Accurate forecasting helps in the

    budget planning process. Further, a model is useful

    in assessing the important determinants of lottery

    sales. Understanding the determinants of lottery

    sales is useful for improving the design of the

    lottery sales system and making decisions about

    numbers of retail sales licenses to grant within

    postal (ZIP) codes.

4.1. Sources and characteristics of data

State of Wisconsin lottery administrators provided weekly lottery sales data. We consider online lottery tickets that are sold by selected retail establishments in Wisconsin. These tickets are generally priced at $1.00, so the number of tickets sold equals the lottery revenue. We analyze lottery sales (ZOLSALES) over a 40-week period, April, 1998 through January, 1999, from 50 ZIP codes randomly selected from more than 700 ZIP codes within the state of Wisconsin. We also consider the number of retailers within a ZIP code for each time (NRETAIL).

A budding literature, such as Ashley, Liu, and Chang (1999), suggests variables that influence lottery sales. In developing models for lottery sales we can draw upon this literature and anecdotal evidence concerning the determinants of sales volume. Higher lottery jackpots lead to higher sales. Ticket sales should be higher in areas with higher population. Ticket sales should be higher in areas better served by online ticket retailers; i.e., higher numbers of retailers should lead to higher sales. Lower income, less educated people may buy more lottery tickets per capita than higher income, more educated people. Senior citizens may buy more lottery tickets per person than people in other age groups. The thinking here is that seniors have more free time to engage in recreational and gaming activities.

Table 1 lists economic and demographic characteristics that we consider in this analysis. Much of the empirical literature on lotteries is based on annual data that examine the state as the unit of analysis. In contrast, we examine much finer economic units, the ZIP code level, and weekly lottery sales. The economic and demographic characteristics were abstracted from the 1990 and 1995 United States census, as organized and distributed by the Direct Marketing Education Foundation. These variables summarize characteristics of individuals within ZIP codes at a single point in time and thus are not time-varying.

Table 1
Lottery, economic and demographic characteristics of 50 Wisconsin ZIP codes

Lottery characteristics
ZOLSALES     Online lottery sales to individual consumers
NRETAIL      Number of listed retailers

Economic and demographic characteristics
PERPERHH     Persons per household times 10
MEDSCHYR     Median years of schooling times 10
OOMEDHVL     Median home value in $100s for owner-occupied homes
PRCRENT      Percent of housing that is renter occupied
PRC55P       Percent of population that is 55 or older
HHMEDAGE     Household median age
CEMI         Estimated median household income, in $100s
POPULAT      Population

Table 2 summarizes the economic and demographic characteristics of 50 Wisconsin ZIP codes. To illustrate, for the population variable (POPULAT), we see that the smallest ZIP code contained 280 people whereas the largest contained 39,098. The average, over 50 ZIP codes, was 9311.04. Table 2 also summarizes average online sales and average number of retailers. Here, these are averages over 40 weeks. To illustrate, we see that the 40-week average

of online sales was as low as $189 and as high as $33,181.

Table 2
Summary statistics of lottery, economic and demographic characteristics of 50 Wisconsin ZIP codes

Variable            Mean       Median     Standard deviation   Minimum   Maximum
Average ZOLSALES    6494.83    2426.41    8103.01              189       33 181
Average NRETAIL     11.94      6.36       13.29                1         68.625
PERPERHH            27.06      27         2.09                 22        32
MEDSCHYR            126.96     126        5.51                 122       159
OOMEDHVL            570.92     539        183.73               345       1200
PRCRENT             24.68      24         9.34                 6         62
PRC55P              39.70      40         7.51                 25        56
HHMEDAGE            48.76      48         4.14                 41        59
CEMI                451.22     431        97.84                279       707
POPULAT             9311.04    4405.5     11 098               280       39 098

It is possible to examine cross-sectional relationships between sales and economic/demographic characteristics. For example, Fig. 1 shows a positive relationship between average online sales and population. Further, the ZIP code corresponding to the city of Kenosha, Wisconsin has unusually large average sales for its population size. However, cross-sectional relationships alone, such as correlations and plots similar to Fig. 1, do not show dynamic patterns of sales.

Fig. 1. Scatter plot of average lottery sales versus population size. Sales for Kenosha are unusually large for its population size.

Fig. 2 is a multiple time series plot of logarithmic (weekly) sales over time. Here, each line traces the sales patterns for a ZIP code. This figure shows the increase in sales for most ZIP codes, at approximately weeks eight and 18. For both time points, the jackpot prize of one online game, PowerBall, grew to an amount in excess of $100 million. Interest in lotteries, and sales, increases dramatically when jackpot prizes

reach large amounts. Moreover, Fig. 2 suggests a

    dynamic pattern that is common to all ZIP codes.

    Specifically, logarithmic sales for each ZIP code are

    relatively stable with the same approximate level of

variability.

Fig. 2. Multiple time series plot of logarithmic (base 10) lottery sales.

    Another form of the response variable to consider

    is the proportional, or percentage, change. Specifical-

    ly, define the percentage change to be

$$\mathrm{pchange}_{it} = 100\left( \frac{\mathrm{sales}_{it}}{\mathrm{sales}_{i,t-1}} - 1 \right). \tag{4.1}$$

A multiple time series plot of the percentage changes,

    not displayed here, shows autocorrelated serial pat-

    terns. We consider models of this transform of the

    series in the following subsection on model selection.
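As a small illustration, the transform in Eq. (4.1) could be computed from a long-format sales table as follows; the DataFrame, its column names, and the numeric values are assumptions for this sketch only.

```python
import pandas as pd

# Long-format sales data: one row per ZIP code and week (illustrative values).
lottery = pd.DataFrame({
    "zip_code": [53703, 53703, 53703, 53211, 53211, 53211],
    "week":     [1, 2, 3, 1, 2, 3],
    "sales":    [1200.0, 1500.0, 1350.0, 4000.0, 5200.0, 4700.0],
})
lottery = lottery.sort_values(["zip_code", "week"])
# Eq. (4.1): percentage change within each ZIP code
lottery["pchange"] = 100 * lottery.groupby("zip_code")["sales"].pct_change()
```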

    4.2. In-sample model specification

    This subsection considers the specification of a

    model, a necessary component prior to forecasting.

    We decompose model specification criteria into two

    components, in-sample and out-of-sample criteria. To

    this end, we partition the data into two subsamples:

we use the first 35 weeks to develop alternative

    models and estimate parameters, and we use the last

    5 weeks to predict our held-out sample. The choice

    of 5 weeks for the out-of-sample validation is some-

    what arbitrary; it was made with the rationale that

    lottery officials consider it reasonable to try to predict

    5 weeks of sales based on 35 weeks of historical sales

    data.

    Our first forecasting model is an ordinary regres-

    sion model

$$y_{it} = \alpha + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it},$$

    where the intercept is common to all subjects, also

    known as a pooled cross-sectional model. The model

    fits the data well; the coefficient of determination turns

out to be $R^2 = 69.6\%$. The estimated regression coefficients appear in Table 3. From the corresponding t-

    statistics, we see that each variable is statistically

    significant.

    Our second forecasting model is an error compo-

nents model

$$y_{it} = \alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it},$$

where the intercept varies according to subject. Table 3 provides parameter estimates and the corresponding t-statistics, as well as estimates of the variance components, $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$.

Table 3
Lottery model coefficient estimates

Variable         Pooled cross-sectional model   Error components model      Error components model with AR(1) term
                 Estimate     t-Statistic       Estimate     t-Statistic    Estimate     t-Statistic
Intercept        13.821       10.32             18.096       2.47           15.255       2.18
PERPERHH         0.108        6.77              0.129        1.45           0.115        1.36
MEDSCHYR         0.082        11.90             0.108        2.87           0.091        2.53
OOMEDHVL         0.001        5.19              0.001        0.50           0.001        0.81
PRCRENT          0.032        8.51              0.026        1.27           0.030        1.53
PRC55P           0.070        5.19              0.073        0.98           0.071        1.01
HHMEDAGE         0.118        5.64              0.119        1.02           0.120        1.09
CEMI             0.004        8.18              0.005        1.55           0.004        1.58
POP/1000         0.057        9.41              0.121        4.43           0.080        2.73
NRETAIL          0.021        5.22              0.027        1.56           0.004        0.20
Var α (σ²_α)                                    0.607                       0.528
Var ε (σ²_ε)     0.700                          0.263                       0.279
AR(1) corr (ρ)                                                              0.555        25.88
AIC              4353.25                        2862.74                     2269.83

Based on in-sample data of n=50 ZIP codes and T=35 weeks. The response is (natural) logarithmic sales.

As is common in longitudinal data

    analysis, allowing intercepts to vary by subject can

    result in regression coefficients for other variables

    becoming statistically insignificant.
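For readers who want to reproduce this kind of fit outside SAS PROC MIXED, a random-intercept (error components) model for logarithmic sales could be specified as in the sketch below; the DataFrame and column names are assumptions, and this sketch omits the AR(1) error structure discussed next.

```python
import statsmodels.formula.api as smf

# lottery_train: an assumed, already-loaded DataFrame holding the first 35 weeks,
# one row per ZIP code and week, with columns named after the Table 1 variables.
ec_model = smf.mixedlm(
    "log_sales ~ PERPERHH + MEDSCHYR + OOMEDHVL + PRCRENT + PRC55P"
    " + HHMEDAGE + CEMI + POP_1000 + NRETAIL",
    data=lottery_train,
    groups=lottery_train["zip_code"],   # random intercept for each ZIP code
)
ec_fit = ec_model.fit()
print(ec_fit.summary())
```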

    When comparing this model to the pooled cross-

    sectional model, we may use the Lagrange multiplier

    test described in Baltagi (1995, Chapter 3). The test

statistic turns out to be TS = 11,395.5, indicating that

    the pooled cross-sectional model. Another piece of

evidence is Akaike's Information Criterion (AIC). The

    smaller this criterion, the more preferred is the model.

    Table 3 shows that the error components model is

    preferred compared to the pooled cross-sectional

    model based on the smaller value of the AIC statistic.

    To assess further the adequacy of the error compo-

    nents model, we calculated residuals from the fitted

    model and examined diagnostic tests and graphs. One

    such diagnostic graph, not displayed here, is a plot of

    residuals versus lagged residuals. This graph shows a

    strong relationship between residuals and lagged

    residuals which we can represent using an autocorre-

    lation structure for the error terms. The graph alsoshows a strong pattern of clustering corresponding to

    weeks with large PowerBall jackpots. A variable that

    captures information about the size of PowerBall

jackpots would help in developing a model of lottery

    sales. However, for forecasting purposes, we require

    one or more variables that anticipates large PowerBall

    jackpots. That is, because the size of PowerBall

    jackpots is not known in advance, variables that proxy

    the event of large jackpots are not suitable for fore-

    casting models. These variables could be developed

    through a separate forecasting model of PowerBall

    jackpots.

    Other types of random effects models for fore-

    casting lottery sales could also be considered. To

    illustrate, we also fit a more parsimonious version of

    the AR(1) version of the error components model;

    specifically, we fit this model, deleting those varia-

    bles with insignificant t-statistics. It turned out that

    this fitted model did not perform substantially better

    in terms of overall model fit statistics such as AIC.

    We explore alternative transforms of the response

    when examining a held-out sample in the following

    subsection.

    Table 4 reports the estimation results from fitting

    the two-way error components model in Eq. (2.1),

    with and without an AR(1) term. For comparison

purposes, the fitted coefficients for the one-way model with an AR(1) term are also presented in this table. As in Table 3, we see that the model selection criterion, AIC, indicates that the more complex two-way models provide an improved fit compared to the one-way models. As with the one-way models, the autocorrelation coefficient is statistically significant even with the time-varying parameter $\lambda_t$. In each of the three models in Table 4, only the population size (POP) and education levels (MEDSCHYR) have a significant effect on lottery sales.

Table 4
Lottery model coefficient estimates

Variable         One-way error components       Two-way error components    Two-way error components
                 model with AR(1) term          model                       model with AR(1) term
                 Estimate     t-Statistic       Estimate     t-Statistic    Estimate     t-Statistic
Intercept        15.255       2.18              16.477       2.39           15.897       2.31
PERPERHH         0.115        1.36              0.121        1.43           0.118        1.40
MEDSCHYR         0.091        2.53              0.098        2.79           0.095        2.70
OOMEDHVL         0.001        0.81              0.001        0.71           0.001        0.75
PRCRENT          0.030        1.53              0.028        1.44           0.029        1.49
PRC55P           0.071        1.01              0.071        1.00           0.072        1.02
HHMEDAGE         0.120        1.09              0.118        1.06           0.120        1.08
CEMI             0.004        1.58              0.004        1.59           0.004        1.59
POP/1000         0.001        2.73              0.001        5.45           0.001        4.26
NRETAIL          0.004        0.20              0.009        1.07           0.003        0.26
Var α (σ²_α)     0.528                          0.564                       0.554
Var ε (σ²_ε)     0.279                          0.022                       0.024
Var λ (σ²_λ)                                    0.241                       0.241
AR(1) corr (ρ)   0.555        25.88                                         0.518        25.54
AIC              2270.97                        1109.61                     1574.02

Based on in-sample data of n=50 ZIP codes and T=35 weeks. The response is (natural) logarithmic sales.

    4.3. Out-of-sample model specification

    This subsection compares the ability of several

    competing models to forecast values outside of the

    sample used for model parameter estimation. As in

    Section 4.2, we use the first 35 weeks of data to

    estimate model parameters. The remaining 5 weeks

    are used to assess the validity of model forecasts. For

    each model, we compute forecasts of lottery sales for

    weeks 36 through 40, by ZIP code level, based on the

    first 35 weeks. Denote these forecast values as

$\widehat{\mathrm{ZOLSALES}}_{i,35+L}$, for $L = 1$ to 5. We summarize the accuracy of the forecasts through two statistics, the

mean absolute error

$$\mathrm{MAE} = \frac{1}{5n}\sum_{i=1}^{n}\sum_{L=1}^{5}\left| \mathrm{ZOLSALES}_{i,35+L} - \widehat{\mathrm{ZOLSALES}}_{i,35+L} \right| \tag{4.2}$$

and the mean absolute percentage error

$$\mathrm{MAPE} = \frac{100}{5n}\sum_{i=1}^{n}\sum_{L=1}^{5}\left| \frac{\mathrm{ZOLSALES}_{i,35+L} - \widehat{\mathrm{ZOLSALES}}_{i,35+L}}{\mathrm{ZOLSALES}_{i,35+L}} \right|. \tag{4.3}$$
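A minimal numpy sketch of these two criteria, assuming `actual` and `forecast` are arrays of held-out sales and their forecasts with one row per ZIP code and one column per lead week; the numbers below are illustrative only.

```python
import numpy as np

def mae_mape(actual, forecast):
    """Mean absolute error and mean absolute percentage error,
    as in Eqs. (4.2) and (4.3)."""
    abs_err = np.abs(actual - forecast)
    mae = abs_err.mean()
    mape = 100 * (abs_err / np.abs(actual)).mean()
    return mae, mape

# Illustrative check with a 2-ZIP, 5-week hold-out sample:
actual = np.array([[200., 220., 210., 230., 240.],
                   [5000., 5100., 4900., 5200., 5300.]])
forecast = actual * 1.1                   # hypothetical forecasts, 10% too high
print(mae_mape(actual, forecast))         # MAPE is 10.0 in this toy case
```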

    The several competing models include the models

    of logarithmic sales summarized in Tables 3 and 4.

    Because the autocorrelation term appears to be

    highly statistically significant in Table 3, we also

    fit a pooled cross-sectional model with an AR(1)

    term. Further, we fit two modifications of the error

    components model with the AR(1) term. In the first

    case we use lottery sales as the response (not the

    logarithmic version) and in the second case we use

    percentage change of lottery sales, defined in

Eq. (4.1), as the response. Finally, we also consider a basic fixed effects model,

$$y_{it} = \alpha_i + \varepsilon_{it},$$

with an AR(1) error structure. For fixed effects models, the term $\alpha_i$ is treated as a fixed parameter, not a random variable. Because this parameter is

    time-invariant, it is not possible to include our time-

    invariant demographic and economic characteristics

    as part of the fixed effects model.

    Table 5 presents the model forecast criteria in Eqs.

(4.2) and (4.3) for each model. We first note that Table 5

    re-confirms the point that the AR(1) term improves

    each model. Specifically, for the pooled cross-section-

    al, as well as one-way and two-way error components

models, the version with an AR(1) term outperforms

    the analogous model without this term. Table 5 also

    shows that the one-way error components model dom-

    inates the pooled cross-sectional model. This was also

    anticipated by our pooling test, an in-sample test

    procedure. Somewhat surprisingly, the two-way model

    did not perform as well as the one-way model.

    Table 5 confirms that the error components

    model with an AR(1) term with logarithmic sales

    as the response is the preferred model, based on

    either the MAE or MAPE criterion. The next best

    model was the corresponding fixed effects model. It

    is interesting to note that the models with sales as

    the response outperformed the model with percent-

    age change as the response based on the MAE

    criterion, although the reverse is true based on the

    MAPE criterion.

Table 5
Out-of-sample forecast comparison of nine alternative models

Model                                            Model response      MAE        MAPE
Pooled cross-sectional model                     Logarithmic sales   3012.68    83.41
Pooled cross-sectional model with AR(1) term     Logarithmic sales   680.64     21.19
Error components model                           Logarithmic sales   1318.05    33.85
Error components model with AR(1) term           Logarithmic sales   571.14     18.79
Error components model with AR(1) term           Sales               1409.61    140.25
Error components model with AR(1) term           Percentage change   1557.82    48.70
Fixed effects model with AR(1) term              Logarithmic sales   584.55     19.07
Two-way error components model                   Logarithmic sales   1257.21    33.14
Two-way error components model with AR(1) term   Logarithmic sales   1202.97    32.47

4.4. Forecasts

We now forecast using the model that provides the best fit to the data, the error components model with an AR(1) term. The forecasts and forecast intervals for this model are a special case of the results for the more general longitudinal data mixed model, given in Eq. (3.6) and Appendix A, respectively.

Fig. 3 displays the forecasts and forecast intervals. Here, we use $T = 40$ weeks of data to estimate parameters and provide forecasts for $L = 5$ weeks. Calculation of the parameter estimates, point forecasts and forecast intervals was done using logarithmic sales as the response. Then, point forecasts and forecast intervals were converted to dollars to display the ultimate impact of the model forecasting strategy.

Fig. 3 shows the forecasts and forecast intervals for two selected postal codes. The lower forecast represents a postal code from Dane County whereas the upper represents a postal code from Milwaukee. For each postal code, the middle line represents the point forecast and the upper and lower lines represent the bounds on a 95% forecast interval. When calculating this interval, we applied a normal curve approximation, using the point forecast plus or minus 1.96 times the standard error. Compared to the Dane County code, the Milwaukee postal code has higher forecast sales. Thus, although standard errors on a logarithmic scale are about the same as Dane County, this higher point forecast leads to a larger interval when rescaled to dollars.

Fig. 3. Forecast intervals for two selected postal codes. For each postal code, the middle line corresponds to point forecasts for 5 weeks. The upper and lower lines correspond to endpoints of 95% prediction intervals.
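A minimal sketch of this interval construction, assuming a point forecast and standard error on the (natural) log scale are already available; the numbers are illustrative, not values from the paper.

```python
import numpy as np

def forecast_interval_dollars(log_point, log_se, z=1.96):
    """95% forecast interval built on the log scale and rescaled to dollars."""
    lower, upper = log_point - z * log_se, log_point + z * log_se
    return np.exp(log_point), np.exp(lower), np.exp(upper)

# e.g. a hypothetical log-scale forecast of 8.0 with standard error 0.25
point, lo, hi = forecast_interval_dollars(8.0, 0.25)
```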

4.5. Sales and marketing applications

    The lottery sales example has a structure similar

    to many sales and marketing applications. Sales data

are organized in time: days, weeks, months, or years. Sales data are organized in space: country,

    geographical region, sales areas or districts. We

    model sales volume as a function of market con-

    ditions and the marketing mix. Market conditions

    include information about competitors, substitute

    products, and economic climate. Marketing mix

    refers to the actions that firms take with regard to

    product features, pricing, distribution, advertising,

    and promotion. Like sales response, most explanato-

    ry variables vary in time and space.

    Consider the case of supermarket scanner data

    collected weekly from thousands of stores across the

    United States. To forecast sales volume for a partic-

    ular product, say a frozen dinner, in a particular

    store, researchers build upon observed sales volumes

    for all products in the frozen dinner category. Prod-

uct price, discount coupon, promotion, and advertis-

    ing data would be collected, and appropriate

    explanatory variables defined. In building a forecast-

    ing model for a food manufacturer or retail chain, a

    researcher might combine data across stores within

    cities and across products within brands. Models at

    various levels of aggregation can be built from

longitudinal data. Leeflang, Wittink, Wedel, and

    Naert (2000) and Hanssens, Parsons, and Schultz

    (2001) discuss traditional time series and regression

    approaches to scanner data analysis. Longitudinal

    data mixed models, as discussed in this paper,

    represent a set of flexible, dynamic models for sales

    and marketing applications of this type.

    5. Summary and concluding remarks

    This article considers the longitudinal data mixed

    model, a class of models that extends the traditional

    two-way error components longitudinal data models


yet still falls within the framework of a mixed linear

    model. In particular, the theory allows us to consider

    unbalanced data, slopes that may vary by subject or

    time and parametric forms of heteroscedasticity as

    well as serial correlation. Forecasts for these models

    are derived as special cases of best linear unbiased

    predictors, BLUPs, together with the variance of the

    forecast errors.

The theory provides optimal predictions assuming that the variance components are known. If estimators replace the true variance components, then the mean squared error of the predictors increases. The magnitude of this increase, as well as approximate adjustments, have been studied by Kackar and Harville (1984), Harville and Jeske (1992) and Baillie and Baltagi (1999).

The theory is substantially broader than prevailing forecasting practice. To illustrate how this theory may be applied, this article considers forecasting Wisconsin lottery sales. We examined a variety of model specifications to arrive at a simple one-way error components model with an AR(1) term. This simple model provides a good fit to the data. In subsequent work, we intend to investigate more complex models in order to realize more useful forecasts of future sales. One direction that subsequent research may take is to examine longitudinal data models with spatial, as well as time-series, error components.

Appendix A. Inference for the longitudinal data mixed model

To express the model more compactly, we use the mixed linear model specification. Thus, define $\mathbf{y} = (\mathbf{y}_1', \mathbf{y}_2', \ldots, \mathbf{y}_n')'$, $\boldsymbol{\varepsilon} = (\boldsymbol{\varepsilon}_1', \boldsymbol{\varepsilon}_2', \ldots, \boldsymbol{\varepsilon}_n')'$, $\boldsymbol{\alpha} = (\boldsymbol{\alpha}_1', \boldsymbol{\alpha}_2', \ldots, \boldsymbol{\alpha}_n')'$,

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \mathbf{X}_3 \\ \vdots \\ \mathbf{X}_n \end{pmatrix}, \qquad \mathbf{Z}_\lambda = \begin{pmatrix} \mathbf{Z}_{\lambda,1} \\ \mathbf{Z}_{\lambda,2} \\ \mathbf{Z}_{\lambda,3} \\ \vdots \\ \mathbf{Z}_{\lambda,n} \end{pmatrix}$$

and

$$\mathbf{Z}_\alpha = \begin{pmatrix} \mathbf{Z}_{\alpha,1} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{Z}_{\alpha,2} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{Z}_{\alpha,3} & \cdots & \mathbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{Z}_{\alpha,n} \end{pmatrix}.$$

    With these choices, the longitudinal data mixed

    model in Eq. (2.1) can be expressed as a mixed linear

    model, given by

$$\mathbf{y} = \mathbf{Z}_\alpha\boldsymbol{\alpha} + \mathbf{Z}_\lambda\boldsymbol{\lambda} + \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \tag{A.0}$$

Further, we also use the notation $\mathbf{R} = \mathrm{Var}\,\boldsymbol{\varepsilon} = \mathrm{blockdiag}(\mathbf{R}_1, \ldots, \mathbf{R}_n)$ and note that $\mathrm{Var}\,\boldsymbol{\alpha} = \mathbf{I}_n \otimes \mathbf{D}$. With this notation, we may express the variance–covariance matrix of $\mathbf{y}$ as

$$\mathrm{Var}\,\mathbf{y} = \mathbf{V} = \mathbf{Z}_\alpha(\mathbf{I}_n \otimes \mathbf{D})\mathbf{Z}_\alpha' + \mathbf{Z}_\lambda\boldsymbol{\Sigma}_\lambda\mathbf{Z}_\lambda' + \mathbf{R}. \tag{A.1}$$

    Moreover, define

$$\mathbf{V}_\alpha = \mathbf{R} + \mathbf{Z}_\alpha(\mathbf{I}_n \otimes \mathbf{D})\mathbf{Z}_\alpha' = \mathrm{blockdiag}(\mathbf{V}_{\alpha,1}, \ldots, \mathbf{V}_{\alpha,n}),$$

where $\mathbf{V}_{\alpha,i}$ is defined in Eq. (3.2). With this notation, we use standard matrix algebra results to write

$$\mathbf{V}^{-1} = \left( \mathbf{V}_\alpha + \mathbf{Z}_\lambda\boldsymbol{\Sigma}_\lambda\mathbf{Z}_\lambda' \right)^{-1} = \mathbf{V}_\alpha^{-1} - \mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda\left( \mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1}\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}. \tag{A.2}$$

    For best linear unbiased prediction (BLUP) in the

mixed linear model, suppose that we observe an $N \times 1$ random vector $\mathbf{y}$ with mean $\mathrm{E}\,\mathbf{y} = \mathbf{X}\boldsymbol{\beta}$ and variance $\mathrm{Var}\,\mathbf{y} = \mathbf{V}$. The generic goal is to predict a random variable $w$ such that $\mathrm{E}\,w = \mathbf{c}'\boldsymbol{\beta}$ and $\mathrm{Var}\,w = \sigma_w^2$. Denote the covariance between $w$ and $\mathbf{y}$ as $\mathrm{Cov}(w, \mathbf{y})$. The BLUP of $w$ is

$$w_{BLUP} = \mathbf{c}'\mathbf{b}_{GLS} + \mathrm{Cov}(w, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right), \tag{A.3}$$

where $\mathbf{b}_{GLS} = \left( \mathbf{X}'\mathbf{V}^{-1}\mathbf{X} \right)^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}$. The mean square error is

$$\mathrm{Var}(w_{BLUP} - w) = \left( \mathbf{c}' - \mathrm{Cov}(w, \mathbf{y})'\mathbf{V}^{-1}\mathbf{X} \right)\left( \mathbf{X}'\mathbf{V}^{-1}\mathbf{X} \right)^{-1}\left( \mathbf{c}' - \mathrm{Cov}(w, \mathbf{y})'\mathbf{V}^{-1}\mathbf{X} \right)' - \mathrm{Cov}(w, \mathbf{y})'\mathbf{V}^{-1}\mathrm{Cov}(w, \mathbf{y}) + \sigma_w^2. \tag{A.4}$$

Proof of the Proposition

We first derive the BLUP predictor of $\boldsymbol{\lambda}$. Let $\mathbf{c}_\lambda$ be an arbitrary vector of constants and set $w = \mathbf{c}_\lambda'\boldsymbol{\lambda}$. For this choice of $w$, we have $\mathbf{c} = \mathbf{0}$. From Eq. (A.3), we have

$$\mathbf{c}_\lambda'\boldsymbol{\lambda}_{BLUP} = \mathbf{c}_\lambda'\,\mathrm{Cov}(\boldsymbol{\lambda}, \mathbf{Z}_\lambda\boldsymbol{\lambda})\,\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathbf{c}_\lambda'\boldsymbol{\Sigma}_\lambda\mathbf{Z}_\lambda'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right).$$

Using Eq. (A.2) and $S_{ZZ} = \sum_{i=1}^n \mathbf{Z}_{\lambda,i}'\mathbf{V}_{\alpha,i}^{-1}\mathbf{Z}_{\lambda,i} = \mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda$, we have

$$\boldsymbol{\Sigma}_\lambda\mathbf{Z}_\lambda'\mathbf{V}^{-1} = \boldsymbol{\Sigma}_\lambda\mathbf{Z}_\lambda'\left[ \mathbf{V}_\alpha^{-1} - \mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda\left( \mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1}\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1} \right] = \boldsymbol{\Sigma}_\lambda\left[ \mathbf{I} - S_{ZZ}\left( S_{ZZ} + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1} \right]\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1} = \left( S_{ZZ} + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1}\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}.$$

Thus,

$$\boldsymbol{\lambda}_{BLUP} = \boldsymbol{\Sigma}_\lambda\mathbf{Z}_\lambda'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \left( S_{ZZ} + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1}\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right), \tag{A.5}$$

which is sufficient for Eq. (3.4).

To derive the BLUP predictor of $\boldsymbol{\varepsilon}_i$, let $\mathbf{c}_e$ be an arbitrary vector of constants and set $w = \mathbf{c}_e'\boldsymbol{\varepsilon}_i$. With $\mathbf{c} = \mathbf{0}$, we have

$$\mathrm{Cov}(\mathbf{c}_e'\boldsymbol{\varepsilon}_i, \mathbf{y}_j) = \begin{cases} \mathbf{c}_e'\mathbf{R}_i & \text{for } j = i \\ \mathbf{0} & \text{for } j \neq i. \end{cases}$$

Using this, Eqs. (A.2) and (A.3) yield

$$\mathbf{c}_e'\boldsymbol{\varepsilon}_{i,BLUP} = \mathrm{Cov}(\mathbf{c}_e'\boldsymbol{\varepsilon}_i, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathrm{Cov}(\mathbf{c}_e'\boldsymbol{\varepsilon}_i, \mathbf{y})'\mathbf{V}_\alpha^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) - \mathrm{Cov}(\mathbf{c}_e'\boldsymbol{\varepsilon}_i, \mathbf{y})'\mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda\left( S_{ZZ} + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1}\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathrm{Cov}(\mathbf{c}_e'\boldsymbol{\varepsilon}_i, \mathbf{y})'\mathbf{V}_\alpha^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) - \mathrm{Cov}(\mathbf{c}_e'\boldsymbol{\varepsilon}_i, \mathbf{y})'\mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda\boldsymbol{\lambda}_{BLUP} = \mathbf{c}_e'\mathbf{R}_i\mathbf{V}_{\alpha,i}^{-1}\left( \mathbf{y}_i - \mathbf{X}_i\mathbf{b}_{GLS} - \mathbf{Z}_{\lambda,i}\boldsymbol{\lambda}_{BLUP} \right). \tag{A.6}$$

Using Wald's device, we have the BLUP of $\boldsymbol{\varepsilon}_i$ given in Eq. (3.5).

The derivation for the BLUP of $\boldsymbol{\alpha}_i$ is similar. Let $\mathbf{c}_\alpha$ be an arbitrary vector of constants and set $w = \mathbf{c}_\alpha'\boldsymbol{\alpha}_i$. This yields

$$\mathrm{Cov}(\mathbf{c}_\alpha'\boldsymbol{\alpha}_i, \mathbf{y}_j) = \begin{cases} \mathbf{c}_\alpha'\mathbf{D}\mathbf{Z}_{\alpha,i}' & \text{for } j = i \\ \mathbf{0} & \text{for } j \neq i. \end{cases}$$

Using this, Eqs. (A.2), (A.3) and (3.4), we have

$$\mathbf{c}_\alpha'\boldsymbol{\alpha}_{i,BLUP} = \mathrm{Cov}(\mathbf{c}_\alpha'\boldsymbol{\alpha}_i, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathrm{Cov}(\mathbf{c}_\alpha'\boldsymbol{\alpha}_i, \mathbf{y})'\mathbf{V}_\alpha^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) - \mathrm{Cov}(\mathbf{c}_\alpha'\boldsymbol{\alpha}_i, \mathbf{y})'\mathbf{V}_\alpha^{-1}\mathbf{Z}_\lambda\left( S_{ZZ} + \boldsymbol{\Sigma}_\lambda^{-1} \right)^{-1}\mathbf{Z}_\lambda'\mathbf{V}_\alpha^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathbf{c}_\alpha'\mathbf{D}\mathbf{Z}_{\alpha,i}'\mathbf{V}_{\alpha,i}^{-1}\left( \mathbf{e}_{i,GLS} - \mathbf{Z}_{\lambda,i}\boldsymbol{\lambda}_{BLUP} \right), \tag{A.7}$$

which is sufficient for Eq. (3.6).

Now, to calculate the BLUP forecast of $y_{i,T_i+L}$ in Eq. (3.7), we wish to predict $w = y_{i,T_i+L}$. With this choice of $w$, we have $\mathbf{c} = \mathbf{x}_{i,T_i+L}$. Now, we examine

$$\mathrm{Cov}(y_{i,T_i+L}, \mathbf{y}) = \mathrm{Cov}(\mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_i, \mathbf{y}) + \mathrm{Cov}(\mathbf{z}_{\lambda,i,T_i+L}'\boldsymbol{\lambda}_{T_i+L}, \mathbf{y}) + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \mathbf{y}). \tag{A.8}$$

Using Eqs. (A.0) and (A.5), we have

$$\mathrm{Cov}(\mathbf{z}_{\lambda,i,T_i+L}'\boldsymbol{\lambda}_{T_i+L}, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathbf{z}_{\lambda,i,T_i+L}'\,\mathrm{Cov}(\boldsymbol{\lambda}_{T_i+L}, \boldsymbol{\lambda})'\,\mathbf{Z}_\lambda'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathbf{z}_{\lambda,i,T_i+L}'\,\mathrm{Cov}(\boldsymbol{\lambda}_{T_i+L}, \boldsymbol{\lambda})'\,\boldsymbol{\Sigma}_\lambda^{-1}\boldsymbol{\lambda}_{BLUP}. \tag{A.9}$$

Similarly, from Eq. (A.7), we have

$$\mathrm{Cov}(\mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_i, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} \tag{A.10}$$

and, similar to Eq. (A.6), we have

$$\mathrm{Cov}(\varepsilon_{i,T_i+L}, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP}. \tag{A.11}$$

Thus, from Eqs. (A.3), (3.3) and (A.8)–(A.11), the BLUP forecast of $y_{i,T_i+L}$ is

$$\hat{y}_{i,T_i+L} = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathrm{Cov}(y_{i,T_i+L}, \mathbf{y})'\mathbf{V}^{-1}\left( \mathbf{y} - \mathbf{X}\mathbf{b}_{GLS} \right) = \mathbf{x}_{i,T_i+L}'\mathbf{b}_{GLS} + \mathbf{z}_{\lambda,i,T_i+L}'\,\mathrm{Cov}(\boldsymbol{\lambda}_{T_i+L}, \boldsymbol{\lambda})'\,\boldsymbol{\Sigma}_\lambda^{-1}\boldsymbol{\lambda}_{BLUP} + \mathbf{z}_{\alpha,i,T_i+L}'\boldsymbol{\alpha}_{i,BLUP} + \mathrm{Cov}(\varepsilon_{i,T_i+L}, \boldsymbol{\varepsilon}_i)'\mathbf{R}_i^{-1}\boldsymbol{\varepsilon}_{i,BLUP},$$

as in Eq. (3.7). This is sufficient for the proof of the Proposition.

    References

    Ailawadi, K. L., & Neslin, S. A. (1998). The effect of promotion on

    consumption: Buying more and consuming it faster. Journal of

Marketing Research, 35, 390–398.

    Ashley, T., Liu, Y., & Chang, S. (1999). Estimating net lottery

revenues for states. Atlantic Economics Journal, 27, 170–178.

    Baillie, R. T., & Baltagi, B. H. (1999). Prediction from the regres-

    sion model with one-way error components. In Hsiao, C.,

    Lahiri, K., Lee, L., & Pesaran, M. H. (Eds.), Analysis of panels

    and limited dependent variable models. Cambridge, UK: Cam-

    bridge University Press.

    Baltagi, B. H. (1988). Prediction with a two-way error component

    regression model. Problem 88.1.1. Econometric Theory, 4,

    171.

    Baltagi, B. H. (1995). Econometric analysis of panel data.

    NY: Wiley.

    Baltagi, B. H., & Li, Q. (1991). A transformation that will circum-

    vent the problem of autocorrelation in an error-component

model. Journal of Econometrics, 48(3), 385–393.

    Baltagi, B. H., & Li, Q. (1992). Prediction in the one-way error

    component model with serial correlation. Journal of Forecast-

ing, 11, 561–567.

    Battese, G. E., Harter, R. M., & Fuller, W. A. (1988). An error

    components model for prediction of county crop areas using

survey and satellite data. Journal of the American Statistical Association, 83, 28–36.

Diggle, P. J., Liang, K.-Y., & Zeger, S. L. (1994). Analysis of longitudinal data. Oxford University Press.

Erdem, T. (1996). A dynamic analysis of market structure based on panel data. Marketing Science, 15, 359–378.

Frees, E. W., Young, V., & Luo, Y. (1999). A longitudinal data analysis interpretation of credibility models. Insurance: Mathematics and Economics, 24, 229–247.

Frees, E. W., Young, V., & Luo, Y. (2001). Credibility ratemaking using panel data models. North American Actuarial Journal, 5(4), 24–42.

Goldberger, A. S. (1962). Best linear unbiased prediction in the generalized linear regression model. Journal of the American Statistical Association, 57, 369–375.

Hanssens, D. M., Parsons, L. J., & Schultz, R. L. (2001). Market

    response models: econometric and time series analysis (2nd

    edition). Boston: Kluwer.

Harville, D. (1976). Extension of the Gauss–Markov theorem to include the estimation of random effects. Annals of Statistics, 2, 384–395.

    Harville, D., & Jeske, J. R. (1992). Mean square error of estimation

    or prediction under a general linear model. Journal of the Amer-

ican Statistical Association, 87, 724–731.

    Kackar, R. N., & Harville, D. (1984). Approximations for standard

    errors of estimators of fixed and random effects in mixed linear

    models. Journal of the American Statistical Association, 79,

853–862.

    Koning, R. H. (1989). Prediction with a two-way error component

    regression model. Solution 88.1.1. Econometric Theory, 5, 175.

    Laird, N. M., & Ware, J. H. (1982). Random-effects models for

longitudinal data. Biometrics, 38, 963–974.

    Leeflang, P. S. H., Wittink, D. R., Wedel, M., & Naert, P. A. (2000).

    Building models for marketing decisions. Boston: Kluwer.

    Littell, R. C., Milliken, G. A., Stroup, W. W., & Wolfinger,

    R. D. (1996). SAS system for mixed models. Cary, North

    Carolina: SAS Institute.

    Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S

    and S-PLUS. New York: Springer.

    Robinson, G. K. (1991). The estimation of random effects. Statis-

tical Science, 6, 15–51.

    Searle, S. R., Casella, G., & McCulloch, C. E. (1992).

    Variance components. New York: John Wiley and Sons.

Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer-Verlag.

    Biographies: Edward W. (Jed) FREES is a Professor of Business

and Statistics at the University of Wisconsin–Madison and is

    holder of the Fortis Health Professorship of Actuarial Science. He

    is a Fellow of both the Society of Actuaries and the American

    Statistical Association. Professor Frees is a frequent contributor to

    scholarly journals. His papers have won several awards for quality,

including the Actuarial Education and Research Fund's annual

    Halmstad Prize for best paper published in the actuarial literature

    (three times). Professor Frees currently is the Editor of the North

    American Actuarial Journal and an Associate Editor for Insurance:

    Mathematics and Economics. The National Science Foundation

(Grant Number SES-0095343) provided funding to support this research.

Thomas W. MILLER is Director of the A.C. Nielsen Center for

Marketing Research at the University of Wisconsin–Madison. He

    holds graduate degrees in psychology (PhD, psychometrics) and

    statistics (MS) from the University of Minnesota and in business

    (MBA) and economics (MS) from the University of Oregon. An

    expert in applied statistics and modeling, Tom has designed and

    conducted numerous empirical and simulation studies comparing

traditional and data-adaptive methods. Tom's current research

    includes explorations of online research methods and studies of

    consumer life-styles, choices and uses of technology products. He

    won the David K. Hardin Award for the best paper in Marketing

    Research in 2001.
