Page 1: · geavanceerde analytische methodes en big data wordt geïllustreerd aan de hand van een case study


Confidentiality Clause

PERMISSION

I declare that the content of this Master Dissertation can be consulted and/or reproduced if the sources

are mentioned.

Name student: Rani Torrekens




Dutch Summary (Nederlandse Samenvatting)

Business Analytics is an emerging phenomenon that reflects the growing importance of large amounts of data in terms of increasing volume, variety and velocity (Department for Business Innovation and Skills, 2013). Analytics recognizes that we live in an era in which large amounts of data play a central role. Commercial organizations, governments and communities are investigating how they can use their large data volumes to create value (Yui, 2012). Many organizations therefore notice that the data they own, and especially how they use it, can create a competitive advantage. Today, many organizations regard the continuous collection of data and information as one of their most important business activities. These large amounts of data must be managed and analyzed appropriately in order to extract value from them.

A number of researchers argue that the growing attention paid to analytics is an important challenge and opportunity for Operations Research (Liberatore and Luo, 2010; Ranyard et al., 2015; Mortenson et al., 2015). Operations Research professionals will have to apply their optimization and modelling knowledge in combination with the advanced analytical skills needed to examine large amounts of unstructured data. These advanced analytical techniques also have a positive influence across the entire production chain. Data analysis can make a difference mainly in Research and Development, supply chain management, and production and service. Today's dynamic market needs have accelerated the discovery of new technologies in the production chain. Many traditional approaches to the supply chain must be revised because they are outdated in this new data environment (Waller & Fawcett, 2013). This phenomenon is also called Industry 4.0. Industry 4.0 makes it possible to capture value over the entire product life cycle. This master dissertation focuses mainly on one specific component of Industry 4.0, namely demand forecasting driven by big data. Better demand forecasting, aimed at a smaller forecast error, is important not only in production but for the entire supply chain. Less uncertainty, and consequently a smaller margin of error in forecasting demand, means that less inventory has to be held, which naturally leads to better cost efficiency.



The actual objective of this master dissertation is to show how the combination of data science and analytics can be used to improve supply chain management. The potential of advanced analytical methods and big data is illustrated by means of a case study at a Belgian pharmaceutical wholesaler, Multipharma. Since the bottleneck of this company is its limited warehouse space, the Supply Chain department is continuously looking for better solutions to improve its activities, such as continuously optimizing the demand forecast and the inventory. It will be examined whether new ‘demand drivers’ can be modelled by means of additional data sources and can thereby ensure a better match between demand and supply.




Preface

This master dissertation gave me the opportunity to further explore my interest in how big data analytics creates value in the Operations Management field. This interest was sparked during my Master in Operations Management, when I chose two elective courses from the Master Data Analytics. I started to realize that big data analytics is an emerging phenomenon in the Operations Management field and that the combination of both can create value in many different ways. This master dissertation was an occasion to show what I have learned during the past five years, to reflect critically on the frequently repeated statement “more data is always better” and to structure the topic in my own research. It gave me the chance to gain valuable experience in the business environment and to understand the ups and downs of a real-life project. With this research I hope to have made a valuable contribution to the literature concerning the impact of big data on an Operations Management problem.

The case study following the literature review is set in an existing wholesale company, Multipharma. Provideor, a supply chain consultancy company, gave me the opportunity to set up a new project within Multipharma. I would like to thank Thomas Meersseman and Steven Raekelboom from Provideor for their support, advice and time. Furthermore, I am grateful to the Supply Chain manager of Multipharma, David Van Belle, for giving me the opportunity to work in close collaboration with the company and for providing all kinds of internal information I needed to write this dissertation. Special thanks go to my promoter, Broos Maenhout, who assigned this interesting topic to me and who was always prepared to give prompt and meaningful advice. Finally, I would like to thank my parents and partner for supporting me at all times.




Table of Contents

Confidentiality Clause
Dutch Summary (Nederlandse Samenvatting)
Preface
List of Abbreviations
List of Tables
List of Figures

Introduction

Chapter 1: Big Data
1.1 From databases to Big Data
1.2 What is Big Data?
1.3 Value of Big Data

Chapter 2: Operations Research
2.1 Evolution of Operations Research
2.2 Value for Operations Management
2.3 Data-driven demand prediction
    2.3.1 Internal enterprise data
    2.3.2 Causal factors
    2.3.3 Model events
    2.3.4 Technological characteristics
2.4 Inventory control
    2.4.1 Measuring demand uncertainty
    2.4.2 Measuring product availability
    2.4.3 Safety stock formula
2.5 Conclusion

Chapter 3: Multipharma
3.1 Pharmaceutical supply chain
3.2 Introduction to Multipharma
3.3 Network description

Chapter 4: Methodology

Chapter 5: Demand Forecast
5.1 Data description and operating assumptions
5.2 Product Grouping
    5.2.1 General product classification
        New products
        Existing products
    5.2.2 Group levels
        IMS Classification
        INN/DCI Classification
        APB Classification
        GSTAT Classification
5.3 Model selection
5.4 Create forecast

Chapter 6: Demand Drivers
6.1 Setting priorities
6.2 Internal data
    6.2.1 Passage Délégué
    6.2.2 Promo O
    6.2.3 Action iU
    6.2.4 Summary of the results
6.3 External data
    6.3.1 Selecting search terms
    6.3.2 Flu-related products
    6.3.3 Sunscreens
    6.3.4 Mosquito products
    6.3.5 Insecticides
    6.3.5 Summary of the results

Chapter 7: Inventory
7.1 Safety inventory
7.2 Impact on costs

Chapter 8: Conclusion

Chapter 9: Further Research

References

Appendix A
Appendix B
Appendix C
Appendix D
Appendix E



List of Abbreviations

ACF Autocorrelation Function

AI Artificial Intelligence

AIC Akaike's Information Criterion

APB Algemene Pharmaceutische Bond

AR Autoregressive

ARIMA Autoregressive Integrated Moving Average

B2B Business to Business

BI Business Intelligence

CNK Code National(e) Kode

CSL Cycle Service Level

DBMS(s) Database Management System(s)

DC Distribution Center

DCI Les dénominations communes internationales

ERP Enterprise Resource Planning

ESC Expected Shortage per Cycle

ESM Exponential Smoothing Model

GFT Google Flu Trends

GSTAT Groupe Statistique

IDC International Data Corporation

IDM Intermittent Demand Model

INN International Nonproprietary Names

IoT Internet of Things

IT Information Technology

JIT Just-In-Time

KDE Kernel Density Estimation

KMI Koninklijk Meteorologisch Instituut

KPI Key Performance Indicator

MA Moving Average

MAPE Mean Absolute Percentage Error

OTC Over-The-Counter

OUL Order-Up-to-Level



OR Operations Research

PACF Partial Autocorrelation Function

PEC Parapharmacy

POS Point Of Sale

Promo O Promo Obligatoire

R&D Research and Development

RDBMS Relational Database Management System

RIZIV Rijksinstituut voor Ziekte- en Invaliditeitsverzekering

SCM Supply Chain Management

SKU(s) Stock Keeping Unit(s)

TCO Total Cost of Ownership

UCM Unobserved Components Model

USC Uniform System of Classification

WHO World Health Organization

WIP Work In Progress

WMS Warehouse Management System



List of Tables

6.1 Promotional Periods of Vicks Vaporub 100g

6.2 Promotional Period of Tilman Elimin Fresh Thee

6.3 Promotional Period of Vichy Dercos Shampoo

6.4 Summary MAPE’s promotional demand drivers

6.5 IMS class corresponding to seasonal type

6.6 Correlation of search terms

6.7 Flu: Search terms

6.8 Sunscreens: Search terms

6.9 Mosquito: Search terms

6.10 Fleas: Search terms

6.11 Ticks: Search terms

6.12 Worms: Search terms

6.13 Flu: Stepwise Linear Regression

6.14 Flu: SAS Forecast Results

6.15 Flu: SAS Forecast Results excl. peaks

6.16 Sunscreens: SAS Forecast Results

6.17 Sunscreens: SAS Forecast Results excl. peaks

6.18 Mosquito: SAS Forecast Results

6.19 Mosquito: SAS Forecast Results excl. peaks

6.20 Insecticides: SAS Forecast Results

6.21 Insecticides: SAS Forecast Results excl. peaks

6.22 Summary MAPE’s seasonal products

6.23 Percentage of new products

7.1 Service Level based on ABC classification

7.2 Parameters for Normal Distribution

7.3 Goodness-of-Fit tests for Normal Distribution

7.4 Impact on safety inventory for ‘Passage délégué’

7.5 Impact on safety inventory for ‘Promo O’

7.6 Impact on safety inventory for ‘Action iU’

7.7 Impact on safety inventory for flu-related products

7.8 Impact on safety inventory for sunscreens



7.9 Impact on safety inventory for mosquito products

7.10 Impact on safety inventory for insecticides

7.11 Impact on cost for ‘Passage délégué’

7.12 Impact on cost for ‘Promo O’

7.13 Impact on cost for ‘Action iU’

7.14 Impact on cost for flu-related products

7.15 Impact on cost for sunscreens

7.16 Impact on cost for mosquito products

7.17 Impact on cost for insecticides

A.1 IMS top level classification

A.2 APB classification

A.3 GSTAT classification

D.1 Selection of SKUs with higher MAPE including the independent variables

E.1 Sunscreens: Stepwise Linear Regression

E.2 Mosquito: Stepwise Linear Regression

E.3 Insecticides: Stepwise Linear Regression



List of Figures

1.1 Traditional BI solution

1.2 Big data solution

1.3 5Vs Theory

2.1 Four steps that comprise a process view of analytics

2.2 The McKinsey Digital Compass

2.3 Summary leaders’ capabilities

2.4 Top pressures for data-driven demand forecasting

2.5 Periodic review policy

2.6 Impact on safety inventory

2.7 Safety factor of Standard Normal distribution and graphical representation

3.1 A pharmaceutical supply chain

3.2 Frequency count according to size of assortment

3.3 Network design

4.1 Coverage in days 2015-2016: Impact of SAS on inventory

4.2 Forecasting process of SAS Forecast Studio

5.1 Overview product classification

5.2 Characteristics of five types of new products

5.3 Hierarchical breakdown of disaggregated data

5.4 Distribution of products over the different forecasting methods

5.5 Prediction error ACF of randomly selected SKU

5.6 Prediction error PACF of randomly selected SKU

5.7 Forecast with out-of-stock period

5.8 Distribution MAPE with baseline model forecast

5.9 Distribution of model type of all SKUs with baseline model forecast

6.1 Priority schedule

6.2 Substitution between two products

6.3 Example Excel file of supplier

6.4 Followed procedure of analysis

6.5 Time series Vicks Vaporub 100g incl. promotional order quantities

6.6 Time series Vicks Vaporub 100g excl. promotional order quantities

6.7 Passage Délégué: Distribution of the MAPE



6.8 Time series Tilman Elimin Fresh Thee incl. promotional order quantities

6.9 Time series Tilman Elimin Fresh Thee excl. promotional order quantities

6.10 Promo O: Distribution of the MAPE

6.11 Time series Dercos Shampoo incl. promotional order quantities

6.12 Time series Dercos Shampoo excl. promotional order quantities

6.13 Action iU: Distribution of the MAPE

6.14 Google Trends data: ‘Griep’

6.15 Graphical representation of correlation of 2 search terms

6.16 GFT Belgium

6.17 Google Trends: Flu

6.18 Google Trends: Sunscreens

6.19 Google Trends: Mosquito

6.20 Google Trends: Insecticides

7.1 Distribution analysis of ‘Quantity’ of randomly selected SKU

7.2 Revised Periodic review policy

9.1 Key strategic actions to improve demand management

9.2 How companies define safety stocks



Introduction

“Big data is not about the data”

Gary King, Harvard University

Nowadays, big data is being produced by everything around us. Data is generated from multiple sources at a frightening velocity, variety and volume, and is then transmitted by systems, sensors and mobile devices. Many fields, such as healthcare, science, marketing and sports, have become data-driven. Every area is touched by large amounts of data that have to be managed in an appropriate way. However, big data is not entirely new. Over the last decade, research institutions and companies have collected large amounts of information from which new information was generated. They all look for correlations, early indicators and cause-and-effect relationships between phenomena, persons and events, and eventually make decisions based on these findings. In recent years, however, something has changed. The application of big data analytics has broadened and now penetrates our daily lives. New possibilities emerge with the rise of the Internet of Things (IoT), in which all kinds of small and large devices are interconnected with each other and with society in general (Klous & Wielaard, 2014; Lohr, 2015).

Many organizations have noticed that the data they own, and especially how they use it, can create a competitive advantage. Data and information are becoming primary assets for many organizations, which is why most organizations today try to collect as much data as possible. This big data has to be managed and analyzed in an appropriate way. According to King (2016), the real value of data is in the analytics. This new concept of gathering and analyzing extensive amounts of data offers a whole new approach to organizational problem solving and the implementation of analytical solutions. As a consequence, the adoption of big data in Operations Research (OR) is evolving at an increasing speed (Hillier & Lieberman, 2015). OR professionals now have to apply their optimization and modelling knowledge in combination with more advanced analytical skills, which are necessary to examine large amounts of unstructured data. However, taking advantage of big data analytics to promote the OR profession is not as easy as it seems, because it is a relatively new and thus dynamic area. Consequently, the innovation and development of analytical solutions characterized by the integrated use of data, processes and systems is one of the key areas companies are focusing on today. By combining descriptive and predictive analytics, such as data mining and statistics, with prescriptive analytics, such as optimization methods from OR, one is able to develop new applications within an organization.

The objective of this master dissertation is to identify how the combination of data science and analytics (i.e. big data analytics) can be used to improve Supply Chain Management (SCM). More specifically, the central research question is whether demand forecast accuracy can be improved using big data analytics. Advanced analytical techniques are applied to an existing wholesale company, which is the subject of the case study. Prior to the case study, a literature review describes the impact of big data and analytics on SCM in order to provide meaningful insight into the case study. The literature review is divided into two chapters. The first chapter discusses the term big data in a general context. First, the evolution of the analytical movement is described, with emphasis on the difference between traditional databases (storage) and big data (data analytics). Second, a definition of big data and an explanation of its main characteristics are provided based on the ‘5Vs theory’. The terms Volume, Variety and Velocity, the three major pillars of big data, are clarified. The chapter concludes with a discussion of a McKinsey report that states how big data offers value in five core industries. The knowledge from this first chapter is necessary to understand why big data has become so popular in the business environment. The second chapter illustrates the evolution from traditional OR to the more advanced big data analytics as we know it today, and the value it creates for Operations Management (OM). The latter part outlines the implications and advantages that Industry 4.0 (i.e. the digitization of the manufacturing sector) creates for manufacturing companies. The link to the following sections of that chapter is McKinsey’s Digital Compass, which relates the levers of Industry 4.0 to eight value drivers that have an impact on the performance of a typical manufacturing company. The remaining part of the literature review focuses on one lever of Industry 4.0 that might have an impact on SCM: data-driven demand prediction. This lever may result in an improved demand forecast using big data analytics and thus a better match between supply and demand. The direct impact that improved demand prediction has on supply chain optimization with regard to inventory is described theoretically in the final section of the chapter. This chapter is crucial for understanding the insights from the case study.

Part two of this master dissertation is a case study carried out in collaboration with Multipharma, a Belgian pharmaceutical wholesale company. Since the bottleneck of this company is its limited warehouse space, the Supply Chain department is continuously looking for better solutions to improve its operations and optimize the inventory on hand. Following the literature review, it is investigated whether improved data-driven demand forecasting can reduce the forecast error and thereby create a better match between supply and demand. Moreover, the direct impact on inventory of better matching supply and demand is analyzed based on the theory described in section 2.4. After an introduction to the industry and the company in chapter 3, the methodology of the research is set out in chapter 4. Chapter 5 describes the data used to execute the analysis and the advanced analytics used to support the decision-making process. Chapter 6 starts with the priority list of potential demand drivers that may have the most substantial impact on demand variability and thus on forecast error. Two demand drivers are selected to investigate whether modelling them appropriately, using internal and external data sources, may improve demand forecast accuracy. The demand drivers are modelled using SAS code and the forecast is executed using SAS Forecast Studio, which shows the direct impact on the forecast error. If the forecast error is reduced, better inventory decisions become possible, which in turn affects the objectives of the company and its competitive position; this is the subject of chapter 7. Finally, the case study ends with a conclusion (Chapter 8) and a discussion of the emerging concept of the ‘multi-echelon supply chain’, which might be a topic for further research (Chapter 9).


Part I




Chapter 1

Big Data

The big data era is characterized by the growth of social media, an explosion of mobile devices and a physical world being outfitted with millions of networked sensors connected through the Internet. These factors have resulted in unprecedented growth in all types and volumes of data available to businesses (Emani, Cullot, & Nicolle, 2015). According to Chen, Mao and Lin (2014), the rapid growth of cloud computing and the IoT drives the sharp growth of data. With the IoT, sensors in devices all over the world collect and transmit data, which is stored in the cloud. An International Data Corporation (IDC) report foresees that “from 2005 to 2020, the digital universe will grow by a factor of 300, from 130 Exabyte to 40,000 Exabyte” (Yin & Kaynak, 2015). This explosion of information is the reason why we are confronted with the challenge of collecting and integrating massive data from widely distributed data sources. The large amount of unstructured data far surpasses the capacities of the Information Technology (IT) architectures and infrastructure of existing enterprises, which is the topic of section 1.1. Big data is becoming widespread because of the increasing variety of sources that create data and the increasing speed with which data is created (Section 1.2). The ‘datafication’ of society has positive and negative consequences. However, the most widely accepted belief about big data is that it enriches the entire world. Moreover, big data will penetrate our lives even more in the coming years because people can no longer do without the convenience, comfort and added value of technology (Klous & Wielaard, 2014). This chapter concludes with a short discussion of the added value of big data (Section 1.3).
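As a rough check on the IDC figures quoted in this chapter introduction, the overall growth factor and the yearly growth rate they imply can be computed directly. The Python sketch below assumes constant year-on-year growth between 2005 and 2020; this is an illustrative simplification and not a claim made by the IDC report, and the function and variable names are hypothetical.

```python
# Illustrative check of the IDC figures quoted in the text (130 EB in 2005,
# 40,000 EB in 2020). Constant annual growth is an assumption made here
# purely for illustration; it is not part of the cited report.


def growth_factor(start_volume: float, end_volume: float) -> float:
    """Return how many times larger end_volume is than start_volume."""
    return end_volume / start_volume


def implied_annual_growth(start_volume: float, end_volume: float, years: int) -> float:
    """Return the constant yearly multiplier that turns start into end over `years`."""
    return growth_factor(start_volume, end_volume) ** (1.0 / years)


START_EB = 130.0      # digital universe in 2005, in exabytes (from the quote)
END_EB = 40_000.0     # projected digital universe in 2020, in exabytes
YEARS = 2020 - 2005   # 15-year horizon

factor = growth_factor(START_EB, END_EB)
yearly = implied_annual_growth(START_EB, END_EB, YEARS)

print(f"Overall growth factor: {factor:.0f}x")
print(f"Implied yearly growth: {(yearly - 1) * 100:.1f}% (assuming constant growth)")
```

Under this assumption, the quoted volumes correspond to a growth factor slightly above 300, consistent with the rounded figure in the quote, and to a data volume that grows by a little under half each year.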

1.1 From databases to Big Data

At the beginning of this century, before the term big data was even known, the volume of data started to increase dramatically, a development known as the ‘information explosion’. In contrast to previous years, no human intervention was needed to enter new data into databases. Large amounts of data could be stored automatically thanks to new automated storage systems. Because many researchers were intrigued by this ‘information boom’, it came to be seen as the start of the concept of big data as we know it and the end of conventional methods for data management. Compared to the data in conventional storage systems, big data lacks structure and is much larger in volume. These two conditions require a different approach to the data. In essence, big data is a large, unstructured and (sometimes) incomplete set of data, which makes it impossible to handle with conventional database systems (Klous & Wielaard, 2014; van der Zee, 2016).

Madden (2012) states that it is a misunderstanding to regard big data as simply a large

amount of data stored in databases. In fact, Database Management Systems (DBMSs) cannot

solve the problem of big data. It is true they can handle data in the range of Petabytes, but

they are generally not fast enough and are unable to analyze large amounts of complex

structures because they handle information sequentially. In-database statistics and modelling

are not widely adopted and do not scale well to large amounts of data. The same can be said

for platforms such as MapReduce¹ and Hadoop². They can store large amounts of data but they

are very limited in several ways. They provide a low-level infrastructure designed to process

data, not manage it. Likewise, existing tools are no longer appropriate for data analysis at a

large scale. Programming languages such as SAS, R and Matlab support mature analyses

but do not scale well to large datasets. These insights gave rise

to solutions that integrate DBMSs or platforms such as Hadoop with advanced programming

languages such as SAS, R and Matlab. As reported by Madden (2012), one notable

solution is to implement data mining, machine learning and statistical algorithms inside the

DBMS, so that the data can be analyzed where it is managed. New systems are

evolving continuously. The limitations of MapReduce gave rise to a new development: Apache

Mahout. It provides a framework for executing machine learning algorithms on top of

MapReduce. Another example of a new system is GraphLab. It is a scalable platform that can

solve many graph-based iterative machine learning algorithms. However, it is not a data

management platform and it requires that data sets fit into memory. Another rapidly growing

1 The MapReduce programming technique divides a large amount of unstructured data across a large number of

parallel computers and then combines their results. In this way a lot of information can be processed in a

short time period.

2 Hadoop is an open-source framework from Apache that allows big data to be stored and processed in a distributed environment

across clusters of computers using simple programming models.

Source: Mahout – Introduction. (s.d.). Retrieved October 13, 2016 from,

system is IBM Watson. Watson Analytics is at the forefront of a new era of computing. It is

the only platform on the market that is powered by cognitive capabilities. The computer can

recognize a question in human language and responds quickly after an extremely fast

search through a pool of many diverse sources of information. The software goes much further

than traditional Artificial Intelligence (AI)³, which is the reason why it can also recognize

irony and riddles. Besides these new systems, Chen et al. (2014) identify cloud computing as a

solution to meet the infrastructure requirements of big data. Cloud computing

revolutionizes the existing IT architecture. Besides making a large amount of storage

available, cloud computing can provide a solution to process big data. Big data can

be managed effectively by distributed storage technology based on cloud computing, and the

parallel computing capacity of the cloud can raise the efficiency of acquiring and analyzing big data.
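The MapReduce technique described in footnote 1 (split the data, process the parts in parallel, combine the results) can be illustrated with a minimal single-machine sketch. This is not Hadoop code: the function names and the word-count task are assumptions chosen for illustration, and a thread pool merely stands in for a cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=4):
    # Each chunk is mapped independently, so the map step can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input: three small "documents" standing in for data blocks.
    chunks = ["big data is big", "data in motion", "Data at rest"]
    print(word_count(chunks))
```

The sketch only shows the processing pattern; it does not manage or store the data, which mirrors the limitation of such platforms noted above.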

According to Accenture’s digital analyst (Accenture, 2016) new big data technologies allow

enterprises to think differently about how to use data as an enterprise asset to drive value-

based outcomes. He states there is a significant difference between traditional Business

Intelligence (BI) and big data analytics (Figure 1.1 and Figure 1.2). First, traditional

Relational Database Management Systems (RDBMS) require data to be normalized prior to

load and storage. In contrast, big data technologies are able to store and process diverse raw

data, whether structured, semi-structured or unstructured. Bose (2009), who

describes the evolution of BI, states that the evolution to advanced BI is due to advanced

techniques to capture, transfer, transform and store data. These techniques enable

organizations to integrate various databases into data warehouses in which centralized data

management and retrieval occurs. The second differentiating criterion (Accenture, 2016)

is that traditional databases focus on descriptive analytical insights drawn from

historical data, while more recent big data techniques enable predictive analytics and unleash

insights at an increasing speed because of the ability to handle large volumes of real-time

data. Third, data delivery is transformed from static pre-defined report layouts with limited

visualizations to graphical and interactive data visualizations to support analytic exploration.

This visual literacy deeply enhances the communication of complex ideas. Fourth, the Total

Cost of Ownership (TCO) has been reduced due to better scalability via parallel processing,

distributed computing and open source technologies. Finally, the more advanced analytical

techniques change the key roles in a company from report architects, integration architects

3 AI is a branch of computer science dealing with the simulation of intelligent behavior in computers.

and database architects to data visualizers, data scientists, data engineers and big data

architects. A more detailed explanation of the evolution of big data analytics is provided in

chapter 2.

Figure 1.1: Traditional BI solution

Source: Accenture (2016) © All rights reserved

Figure 1.2: Big data solution

Source: Accenture (2016) © All rights reserved

1.2 What is Big Data?

In their recent study, Chen et al. (2014) found that although the growing importance of big data

has been generally accepted, people still have varying opinions on its definition. There are

multiple definitions of big data according to different perspectives. These definitions may

provide a better understanding of the profound social, economic, and technological influence

of big data. McKinsey & Company, a global consulting agency, defines big data as the next

frontier for innovation, competition, and productivity (Manyika et al., 2011). They define big

data as “datasets whose size is beyond the ability of typical database software tools to capture,

store, manage, and analyze”. From this definition, it can be concluded that besides the volume

of a dataset, other important criteria of big data are the ever-expanding data scale and

its management, which make it impossible to handle with traditional database technologies.

Gartner, an international research agency, gives the following definition of big data: “Big

data is high-volume, high-velocity and high-variety information assets that demand cost-

effective, innovative forms of information processing for enhanced insight and decision

making.” Following this definition, we can define big data by the ‘Triple V’ expression.

In fact, the ‘Triple V’ concept of data management was introduced by Gartner analyst D. Laney

in a 2001 META Group research publication (Li, Cheng, & Zhao, 2015). He defined the

three main components of data as Volume, Variety and Velocity. Nowadays, two other Vs

have been added to the model, resulting in the ‘5Vs theory’. These V-words bring challenges

to big data management and are introduced below.

Figure 1.3: 5Vs Theory

Source: Van Den Poel (2016)


[Figure 1.3 labels: Volume (data at rest), Variety (data in many forms), Velocity (data in motion), Veracity (data in doubt), Value (data into money)]

The first keyword of the ‘Triple V’ is Volume, the amount of data to process. Without a

doubt, this keyword is the reason why we call it ‘big’ data. Every digital process produces

data all the time, which accumulates to enormous volumes. One major advantage of the large

amount of data in the range of Petabytes is that we can stop looking for models and

formulating hypotheses in advance. The old way of doing research by hypotheses and

assumptions will be transformed into data-driven research. This implies that, instead of

testing theories or models, we explore by observing, describing and mapping

undiscovered territory. Big computing clusters will find patterns based on statistical

algorithms (Emani et al., 2015; Mazzocchi, 2015).

The large amounts of data come from different sources, which gives rise to the second V-word:

Variety. Big data has improved access to different sources of information and tries to

integrate them in an appropriate way. These sources can be structured or unstructured, e.g.

data originating from social networks, health care data, financial data, biochemistry and

genetic data, astronomical data, etc. (Emani et al., 2015). The difficulty associated with big

data is to structure and eventually analyze relevant data from a large amount of unstructured

data with the help of fast-moving computer tools, as explained in Section 1.1. These tools

become more precise and adapt at an increasing speed because they learn from the large amount of

new data.

The third word characterizing big data is Velocity. According to Knilans (2014), “Velocity refers

to the speed at which new data is generated and the speed at which it moves around.” Velocity

involves streams of data, the creation of structured records, and availability for access and delivery

(Emani et al., 2015). Sophisticated algorithms and statistical tools are needed to process

an immense and rapidly growing amount of data quickly. An illustrative example where

velocity is of major importance is the area of e-commerce. Data about a web user has to be

analyzed very quickly so that banners and other advertisements can be shown to the user

in time to influence their purchasing behaviour.

A V-word often added to the three previously described keywords is Veracity. Veracity

refers to the uncertainty that arises because data is inconsistent and incomplete. This can

result in approximations of the corresponding models. Consequently, keeping big data

organized has become a major challenge.

The previously described features (V-words) of data cause traditional analytics to fall short

in handling data in a convenient, timely and efficient way. This introduces

the fifth V that organizations are coping with: finding the Value within their data (Knilans,

2014). More data does not necessarily promise more knowledge. Data by themselves are

meaningless. For example, sensors in cars and houses provide a lot of information. Sensors

give excellent insight into the use of the products by consumers and their purchasing

behaviour only if appropriate techniques are used to manage and analyze the data. Through

effective data mining and analytics, organizations can extract meaningful value from the

massive amount of collected data and create competitive advantages. This V-word will be

described in more detail in the following section. More specifically, the value of big data for

OM will be outlined in section 2.2.

In short, big data is data that is too big, too hard or too fast for existing tools to process (Klous

& Wielaard, 2014; Madden, 2012).

1.3 Value of Big Data

Most companies hold on to the familiar way of doing business for fear of losing existing

business. However, the past has already revealed the risks companies take when they

do not embrace new disruptive technologies, e.g. Kodak fell out of the market with the rise of

digital photography. Most often companies are afraid of the disadvantages, such as privacy

norms, that go along with the implementation of big data (Klous & Wielaard, 2014).

However, McKinsey & Company (2011) showed how big data creates value for enterprises

in an in-depth study of five core industries that represent the global economy. This

report indicates that big data may raise the productivity level and competitive advantage of

enterprises and public sectors, and create substantial benefits for consumers (Chen et al.,

2014). McKinsey states that big data can reduce U.S. healthcare expenditure by over 8

per cent and that the retail industry may improve its profit by more than 60 per cent by fully

utilizing big data. Furthermore, big data may boost the efficiency of governmental activities,

such that the developed economies in Europe could save over 100 billion Euros (Chen et al.,


In general, McKinsey’s research (2011) states big data can create value by making data more

easily accessible to relevant stakeholders in a timely fashion, thus creating more

transparency. In manufacturing, for example, integrating data from R&D, engineering, and

manufacturing units can significantly cut time to market and improve quality. Another way by

which value is generated is by enabling experimentation to discover needs and improve

performance. Combining more accurate, detailed and real-time (or near real-time)

performance data with IT to set up controlled experiments enables organizations to raise

performance to higher levels. A third approach to create value is by using big data to segment

populations to customize actions. In companies that produce consumer goods or provide

services, segmentation has already been used for many years. However, these companies are starting

to deploy more advanced big data techniques, such as the real-time micro-segmentation of

customers to target promotion and advertising. Moreover, big data provides value by

replacing and/or supporting human decision making with automated algorithms. Decision

making can be improved, risk can be minimized and valuable insights can be revealed by

making use of sophisticated analytics. Automated algorithms are useful to retailers who aim

to optimize decision processes such as the automatic fine-tuning of inventories and pricing in

response to real-time in-store and online sales. Besides automating decisions, decision making

is transformed from analyzing smaller samples that individuals with spreadsheets can handle

and understand to analyzing extensive datasets using big data techniques and technologies.

The last way big data creates value for enterprises according to McKinsey is through its

ability to facilitate innovation in business models, products, and services. Besides

creating entirely new products, services and models, manufacturers enhance the development

of next generation products and after-sales service offerings by using data obtained from the

actual products.

Chapter 2

Operations Research

This chapter starts with a description of the evolution from traditional OR to the more

advanced big data analytics as we know it today (Section 2.1). Section 2.2 outlines the value

of big data analytics for OM, more specifically the implications and advantages Industry 4.0

creates for manufacturing companies. This section ends with McKinsey’s Digital Compass,

which relates the levers of Industry 4.0 to eight value drivers that have an impact on the

performance of a manufacturing company. Section 2.3 focuses on one lever of Industry 4.0

that might have a significant impact on SCM: data-driven demand prediction. This section is

based on a research paper of Purdue University & SAS (2008) that identifies capability gaps

between leaders and laggards with regard to data-driven demand forecasting. The direct

impact improved demand prediction has on supply chain optimization with regard to

inventory will be described theoretically in section 2.4.

2.1 Evolution of Operations Research

An introductory book to OR (Hillier & Lieberman, 2015) describes the origins of OR. Due to

the industrial revolution, there was a tendency of increased specialization by the division of

labour and segmentation of management responsibilities, which created new problems. One

problem was that many components of an organization grew into relatively autonomous

empires with their own goals and value systems, thereby losing sight of how their activities

and objectives match with those of the overall organization. What was best for one

component frequently was detrimental to another. As a consequence, the components ended

up working together sub-optimally. Moreover, due to the increasing complexity and

specialization in an organization, it became extremely difficult to allocate the available

resources to the distinct activities with the aim of obtaining maximum effectiveness for the

organization as a whole. These kinds of problems and the need to find a better way to solve

them provided the environment for the emergence of OR. As its name implies, OR involves

‘research on operations’. Thus, OR is applied to problems that concern how to conduct and

coordinate the operations (activities) within an organization. Moreover, OR is concerned with

the practical management of the organization. Therefore, to be rewarding, OR must provide

positive and understandable conclusions to the decision-makers when they are needed.

There has been a great buzz throughout the business world in recent years about analytics (or

business analytics) and the importance of incorporating analytics into managerial decision

making. It is basically OR by another name. However, there are some differences in their

relative emphasis. Analytics aims to focus on an entire business process where decisions span

functional boundaries. In fact, analytics is a motivation for OR to refocus its attention on

growing and applying a wider range of scientific and technological approaches for

organizational decision making (Liberatore & Luo, 2010). In addition, analytics fully

recognizes that we have entered into the era of big data where massive amounts of data now

are commonly available to many businesses and organizations to help guide managerial

decision making. Davenport and Harris (2007, p. 7) define analytics as “the extensive use of

data, statistical and quantitative analysis, explanatory and predictive models, and fact-based

management to drive decisions and actions.” Thus, a primary focus of analytics is on how to

make the most effective use of all these data. Having an immense amount of high-quality

data, organizations start to think how they might use this data to improve decision making,

which is the foundation of growth in analytics (Hillier & Lieberman, 2015). Liberatore and

Luo (2010) define analytics as a four-step process of transforming data into actions through

analysis and insights when making organizational decisions (Figure 2.1).

Figure 2.1: Four steps that comprise a process view of analytics

Source: Liberatore & Luo (2010) © All rights reserved


• Data: collection, extraction, manipulation

• Analysis: visualization, predictive modeling, optimization

• Insight: what happened? what will happen? what should happen?

• Action: operational decisions, process changes, strategic formulation

Analyzing large amounts of data to obtain clear insight has limited value unless managers

translate these insights into actions. LaValle et al. (2011) found in a study that the biggest

challenge in adopting analytics is managerial and cultural. According to almost four out of ten

respondents, the main obstruction to widespread analytics adoption is lack of understanding

of how to use analytics to improve the business. Organizations need to determine the current

state, what is likely to happen next and what direction should be taken to obtain optimal

results. In essence, senior executives should focus on running businesses based on data-driven

decisions and react quickly when disruptions occur. To trigger these new actions across the

organization, analytics-driven insights must be closely linked to the business strategy, easy to

understand and embedded into the organizational processes so action can be taken when

opportunities arise.

According to Madden (2012), people need to be taught how to use data effectively in practice

in order to solve the complex problems of today. A 2008 Accenture survey of 254 managers in

different functional areas showed that 60 per cent of decisions were already based on analytic

input. Moreover, high-performance businesses are 50 per cent more likely to use analytics

strategically. Marketing (customer analytics), Operations and Research and Development

(R&D) are identified as the heaviest users of analytics (Coghlan, 2010). The research of

Liberatore and Luo (2010) indicates IT firms are quickly establishing positions in analytics

and advanced BI because of the growing demand for analytics. Despite the economic

downturn, the BI market will remain one of the fastest-growing software markets. A growing

number of IT professionals who specialize in BI are required as more organizations begin to

implement BI software. It is clear the growing importance of analytics has a profound impact

on OR professionals and their practice. If OR professionals ignore the advantages of analytics

they will be unable to realize its full potential.

2.2 Value for Operations Management

Lee, Kao and Yang (2014) state many manufacturing systems are not ready to implement big

data due to the shortage of smart analytic tools. However, manufacturing industries are

continuously evolving for the upcoming industrial big data environment. Today’s

organizations have to focus on a variety of business aspects such as outsourcing, reducing

inventory levels, global manufacturing, just-in-time (JIT) delivery, customer requirements,

etc. All these aspects have to be managed by a centralized data support system where data is

aggregated from multiple sources and suppliers. This enables complete integration and

visibility of manufacturing capacity, inventory, transportation on a global basis and policies to

manage unexpected risks. The entire manufacturing value chain will be positively influenced by the

new techniques gained from data analytics. Mainly in key areas such as R&D, SCM,

manufacturing and service, big data analytics can make a difference. Using big data analytics

manufacturers will be able to reduce the development cycle, optimize the assembly process,

increase yields, and better meet customer needs. Likewise, Waller and Fawcett (2013) state

big data has the potential to revolutionize the supply chain dynamics. They believe new tools

such as big data will transform the way supply chains are designed and managed, raising a

new and significant challenge to logistics and SCM. Today, there is more data because the

data are captured in more detail and because of the need of global supply chains to capture

data at multiple points in the supply chain. Furthermore, many companies that did not record

daily sales by location and by Stock Keeping Unit (SKU) to make inventory decisions now

do. Therefore, many traditional approaches will need to be re-imagined and some will even be

discarded as obsolete in the new data-environment.

McAfee and Brynjolfsson (2012) state that companies that characterize themselves as more

data-driven perform better on objective measures of financial and operational results. Companies

in the top third of their industry using data-driven decision making are on average 5 per cent

more productive and 6 per cent more profitable than their competitors. From recent research

of Yin and Kaynak (2015), it is clear efficiently capturing and analyzing big data has the

potential to enhance productivity and competitive advantage in a wide range of industrial

sectors. Hence, from an industry perspective, big data is going to play an important role in the

fourth industrial revolution. In this Industry 4.0 era, traditional production management

and the traditional factory are being transformed by intelligent analytics. The ambition

behind the use of big data in industrial applications is to attain a faultless and cost-efficient

running of the process, while realizing the desired performance levels, especially with respect

to quality.

In a recent report, McKinsey (2015) defines Industry 4.0 as digitization of the manufacturing

sector, which is driven by four clusters of disruptive technologies. All digitally enabled

disruptive technologies that are expected to have a significant impact on manufacturing within

the next 10 years belong to Industry 4.0. These technologies offer ways of leveraging data

to unlock its value potential. The first and the second cluster, namely big data and advanced

analytics, are the topic of this master dissertation. The exponential increase in available data

and advanced statistical techniques empower digitization and automation of knowledge work

and advanced analytics. McKinsey’s research (2015), based on interviews with experts from

different sectors and company sizes, revealed that industries are investing significant

resources in Industry 4.0 because traditional productivity levers are burning out. The pressure

on companies to shorten the time to market and increase customer responsiveness is the reason why

most of them are searching for new opportunities to boost productivity. Becoming

operationally effective is a major concern for manufacturing companies facing an extremely

high level of margin pressure. Digitization and Industry 4.0 stimulate new cost savings that

have so far remained untapped. Data becomes the core driver in smart factories; the study

reveals that a big data/advanced analytics approach can result in a 20 to 25 per cent increase in

production volume and up to a 45 per cent reduction in down time.

Recall from the previous chapter that data in itself does not offer a fundamental value. The

key to sustainable innovation within an Industry 4.0 factory according to Lee et al. (2014) is

the actual conversion of big data into useful information. Hence, all data should be

approached with the intention to optimize value (McKinsey, 2015). Industry 4.0 makes it

possible to capture value across the entire product lifecycle. McKinsey’s Digital Compass

(figure 2.2) is an important tool to link the levers of Industry 4.0 to eight value drivers that

have an impact on the performance of a common manufacturing company. Note that the

following section focuses on data-driven demand prediction, because this is the main path to

follow during the case study. Data-driven demand prediction may result in a better

match between supply and demand. Even though today’s forecast errors are already very

small, it is still recommended to reduce the error rates even further since they cause high

costs. The report states forecasting based on advanced analytics can increase the accuracy of

demand forecasting to 85+ per cent. This lever has a direct impact on the inventory of the

company. Excess inventory can be due to inaccurate stock numbers that increase sludge or

unreliable demand planning necessitating safety stock, or overproduction. Hence, improved

demand forecast accuracy decreases the required level of safety inventory by better matching

supply and demand and consequently better managing the variability. Carrying

redundant inventory leads to high capital costs, which have a direct impact on the company’s

margin (see Section 2.4).
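The link between forecast accuracy and safety inventory can be made concrete with a small sketch. It uses the common textbook approximation in which safety stock equals a service factor times the standard deviation of the forecast error times the square root of the lead time. This formula, the assumption of independent and normally distributed errors, and all numbers below are illustrative assumptions, not figures from the McKinsey report or from the case study.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(sigma_error, lead_time_periods, service_level=0.95):
    """Textbook approximation: z * sigma * sqrt(L).

    sigma_error       -- standard deviation of the forecast error per period
    lead_time_periods -- replenishment lead time, in the same periods
    service_level     -- target probability of not stocking out in a cycle
    Assumes independent, normally distributed forecast errors.
    """
    z = NormalDist().inv_cdf(service_level)  # service factor
    return z * sigma_error * sqrt(lead_time_periods)


# Hypothetical example: better forecasting cuts the error spread from 100 to 80 units.
before = safety_stock(sigma_error=100.0, lead_time_periods=4)
after = safety_stock(sigma_error=80.0, lead_time_periods=4)
print(f"safety stock before: {before:.0f} units, after: {after:.0f} units")
print(f"relative reduction: {1 - after / before:.0%}")
```

Under these assumptions safety stock is proportional to the error spread, so a 20 per cent reduction in the standard deviation of the forecast error lowers the required safety stock by the same 20 per cent.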

Figure 2.2: The McKinsey Digital Compass

Source: McKinsey (2015) © All rights reserved

2.3 Data-driven demand prediction

A joint paper by Purdue University and SAS (2008) found that traditional methods of

predicting demand are not efficient in a fluctuating market and that organizations that only use

these kinds of methods are lagging behind the competition. Today’s market volatility, where

past trends are no longer the only indicator of the future, is the reason for the increasing gap

between leaders and laggards in the industry. Analyzing the capabilities of both leaders and

laggards identified huge gaps of maturity in organizational processes, functional capabilities

and technology enablers. Many similar characteristics among best-performing companies with

regard to data-driven demand forecasting were discovered, which enable them to create a

competitive advantage. What follows are some examples of the most important leading

capabilities, which are summarized in Figure 2.3.

2.3.1 Internal enterprise data

Leaders have the ability to create a single demand forecast with input from multiple roles (e.g.

sales, marketing, finance and others) within the organization. In its 2013 report ‘The

Application of Big Data to the Real World’, IBM indicates that internal enterprise data are the main

source of big data. This kind of data consists of historically static data that are managed by

RDBMS in a structured way (e.g. online trading data) and data coming from different

departments in the organization. Examples of the latter are production data, inventory data,

sales data and financial data. The power of this internal data is often underestimated. Internal

data sources in the form of Excel files or information coming from the centralized enterprise

resource planning (ERP) system can be analyzed to tackle different problems or

identify opportunities within the organization. Chen et al. (2014) state that the

business data volume of all companies in the world doubles every 1.2 years due to IT and digital data. This

increasing volume requires more effective real-time analysis. Without this analysis, the data

becomes useless and it is just a massive amount of stored data that does not contribute to the

potential of a company.
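To give a feeling for what a doubling time of 1.2 years means, the short calculation below converts it into a growth factor over a chosen horizon and into an implied annual growth rate. The function name and the horizons are illustrative choices; only the 1.2-year doubling time comes from the text.

```python
def growth_factor(years, doubling_time_years=1.2):
    """Factor by which a volume grows in `years` if it doubles every
    `doubling_time_years` years (1.2 is the figure cited from Chen et al.)."""
    return 2 ** (years / doubling_time_years)


annual_growth = growth_factor(1) - 1  # implied growth rate per year

for horizon in (1, 5, 10):
    print(f"after {horizon:>2} years: x{growth_factor(horizon):,.1f}")
print(f"implied annual growth: {annual_growth:.0%}")
```

A doubling time of 1.2 years corresponds to growth of roughly 78 per cent per year, which is why analysis capacity has to scale along with storage.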

2.3.2 Causal factors

Leaders have the competence to include causal factors (e.g. weather, natural disasters and

competitor actions) into demand forecasts. Besides historical data, which can be

immense for products that have existed for several years, other sources of big

data are even more extensive and might be useful to include as predictor variables in a

forecast model. According to a definition of big data from IBM (IBM, s.d.), large amounts of

decentralized data are created at a rate of 2.5 quintillion bytes per day. Especially open

source data can be of interest to many companies when analyzed in a smart way. This data

can come from sensors used to gather climate information, social media sites, digital pictures

and videos, purchase transaction records, cell phone GPS signals, search engine data, etc.

Recent work (Goel et al., 2010) has demonstrated that search engine data can ‘predict the

present’. Large Web search volumes are able to track indicators of consumer behavior, e.g.

unemployment levels, auto and home sales, and disease prevalence, accurately in near real-time. This

advanced forecasting method is based on the principle that what people are searching for

today is predictive of how they will act in the near future. The authors also found that search

query data boost the performance of baseline models fit on internal historical data or on other

publicly available data. Especially where small improvements in predictive performance are

material, search queries provide a useful guide to the near future.
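The idea of boosting a baseline model with search data can be sketched as follows. This is a minimal illustration on synthetic data, not the method of Goel et al. (2010): the series, the coefficients and the helper names `fit_line` and `rmse` are all assumptions made for the example, and the comparison is in-sample only.

```python
import random
from statistics import fmean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

# Synthetic, purely illustrative series: weekly sales depend on last week's
# sales and on this week's (hypothetical) search-volume index.
rng = random.Random(0)
n = 104
search = [rng.gauss(100.0, 10.0) for _ in range(n)]
sales = [500.0]
for t in range(1, n):
    sales.append(100.0 + 0.6 * sales[t - 1] + 1.5 * search[t] + rng.gauss(0.0, 5.0))

# Baseline: regress this week's sales on last week's sales (historical data only).
a0, b0 = fit_line(sales[:-1], sales[1:])
base_resid = [y - (a0 + b0 * x) for x, y in zip(sales[:-1], sales[1:])]

# Augmentation: explain what the baseline missed with the search index.
a1, b1 = fit_line(search[1:], base_resid)
aug_resid = [r - (a1 + b1 * s) for s, r in zip(search[1:], base_resid)]

rmse_base, rmse_aug = rmse(base_resid), rmse(aug_resid)
print(f"baseline RMSE: {rmse_base:.2f}, with search data: {rmse_aug:.2f}")
```

Because the second regression is fitted on the residuals of the first, the in-sample error cannot increase. Whether the gain carries over to unseen periods would have to be checked on a hold-out sample.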

2.3.3 Model events

Leaders have the ability to model events, such as sales promotions, marketing events and

economic activities. The aim of modelling these events is to ensure that the forecasting framework

provides an automated demand forecasting process based on alerting and

management by exception (e.g. for unexpected competitor actions). Inspecting or

correcting stable demand signals should not bother demand forecasters. They should be able

to focus on more challenging demand evolutions such as unexpected peaks in demand. That is

why events and alerts are created in dialogue with or within a SCM team. Exceptions in the

forecast of products can have an obvious reason explained by the SCM team, who observes

the products most closely. The qualitative information is translated into IT solutions in the

form of events and alerts. The SAS Forecast Studio user's guide (SAS Institute Inc., 2014)

explains events and alerts as follows:

Events are automatically detected and modelled within the forecasting process.

They assume a known and stable relationship over time with the demand of


Alerts are automatically detected events, but they are not modelled within the

forecasting process. The reason for considering a demand driver as an alert and

not as an event is because there is not enough historical information available

or because the impact of the demand driver on demand orders is unknown, not

stable enough or too complex for modelling in an automated way. Moreover,

alerts can be seen as a trigger for creating an event. Alerts are designed to give

a warning about a specific occurrence with a potential impact on demand of

orders for specific products.

2.3.4 Technological characteristics

With regard to the technological characteristics, best-performing companies are making use of

advanced technologies, such as statistical demand forecasting, demand analytics and

reporting, and sales and operations planning. Although leaders adopt these advanced

technologies, they still use Excel spreadsheets for demand forecasting and planning to some

extent, just as laggards. This could be a signal that companies, despite using advanced

techniques, still lack the flexibility to create real-time, ad hoc reports and spread them across

the enterprise. Finally, leaders possess sophisticated tools, such as an integrated ERP module,

demand forecasting software and SCM software. In contrast, laggards still rely on

Microsoft Excel spreadsheets. An important difference between leaders and laggards is that

laggards were 12 times more prone to rely on executive opinion for demand forecasting and

planning than leaders who were more committed to data and analytics.

Figure 2.3: Summary leaders’ capabilities


Single demand forecast with input from multiple roles

Including causal factors

Forecast SKU

What-if analysis and scenario planning

Advanced technologies

Sophisticated tools

These proficient companies consistently reported significant improvements in key

performance metrics over a period of two years, such as improvements in inventory turns,

order fulfilment rates, forecast accuracy at both the product family and SKU levels, as well as

improvements in gross profit margins. This study from Purdue University and SAS among

more than 180 forecasting managers, planners and Supply Chain executives from 173 unique

companies revealed the top pressures among industries causing more companies to think

differently and to better manage demand. Shrinking profit margins in the industry push

companies to become more cost-efficient. Moreover, there is continuous pressure to

accurately match supply and demand while keeping inventory at a bare

minimum.

Figure 2.4: Top pressures for data-driven demand forecasting

Source: Purdue University & SAS Demand Management Survey 2008 © All rights reserved

Since companies with more accurate demand forecasting and planning capabilities have better

perfect-order ratings, less inventory and shorter cash-to-cash cycle times than others, it seems

clear that demand forecasting requires a lot of attention. Similarly, the supply chain literature

(Chopra & Meindl, 2013) states that demand forecasting is the basis of proper supply chain

planning. Those companies that overlook the importance of forecasting often reflect a reactive

business model, because they are only responding to the marketplace, not anticipating it.

Hence, it is clear that a focus on demand forecasting enables a company to stay ahead of the

competition and that forecasting must be executed as accurately as possible. With regard to the accuracy of a

forecast, it should be noted that a forecast error is inevitable in predicting demand. Forecasting

error is something most companies do not measure. However, it is a very good measure of

accuracy and can be used to improve reliability. The goal of every company should be to

reduce the forecasting error with data analytics methods by combining old and new data sets.

Traditional forecasting techniques based on historical data alone are not sufficient anymore

because of increased demand uncertainty. Demand volatility has increased with the number of

choices available to customers, the velocity of promotions to reduce inventory levels, the

reduced consumer confidence and the increasing competitive activity. The importance of

access to real-time consumption data has never been so widely pronounced (Purdue

University & SAS, 2008).
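Measuring the forecast error, as recommended above, can be sketched with a few common metrics. The function name `forecast_errors` and the sample numbers are illustrative assumptions. Bias, MAE and MAPE are standard choices, but the source does not prescribe a particular metric.

```python
def forecast_errors(actual, forecast):
    """Return bias, MAE and MAPE (in per cent) for paired actuals and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    bias = sum(errors) / n                  # > 0 means over-forecasting on average
    mae = sum(abs(e) for e in errors) / n   # mean absolute error, in demand units
    # MAPE is undefined for zero actuals, so those periods are skipped.
    pct = [abs(e) / a for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    return {"bias": bias, "mae": mae, "mape": mape}

# Hypothetical demand and forecast for four periods.
metrics = forecast_errors(actual=[100, 120, 80, 100], forecast=[110, 110, 90, 100])
print(metrics)
```

A persistently positive bias is the signal to watch for in the context of inventory, since systematic over-forecasting inflates stock levels.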

2.4 Inventory control

Industry studies show that better demand forecast accuracy can result in 15 per cent lower

inventories and 17 per cent stronger perfect-order fulfilment (SAS Institute Inc., 2009). More

accurate forecasts not only improve customer satisfaction and increase revenues, but more

importantly lower inventories (raw material, Work In Progress (WIP), finished goods, safety

stocks, etc.) and working capital requirements, thus freeing up cash. One condition is

that demand forecasters need to avoid a biased demand forecast for the purpose of order

generation. Some of the demand planners overestimate the order size to avoid out-of-stocks.

Consequently, the improved accuracy of the demand forecast will be lost and inventory levels

and the associated inventory cost will increase again (Chopra & Meindl, 2013). What follows

describes the impact of improved demand forecast accuracy on the inventory level and cost

due to a reduced level of safety inventory.

Companies use scientific inventory management to decide when and how much to replenish

their inventory. In many industries, inventory management is a key component to the

effective, profit-making operations of a business. Inventory control regulates the inventory

that is already in a distributor’s warehouse. It implies the coordination and supervision of the

supply, storage, distribution, and recording of materials to maintain product levels adequate

for current customer needs without excessive supply or loss. As reported in a recent article

(Fritch, 2015), generating maximum profit from a minimum amount of inventory investment

without hindering customer satisfaction levels or order fill rates is the goal of inventory

control. Inventory can be found at different levels of the supply chain for various reasons and

in various forms. Inventory can exist as raw materials, e.g. manufacturers need raw material

inventories to make their products. Moreover, inventory can exist as work in progress and

finished goods e.g. both wholesalers and retailers need to maintain finished goods to be

immediately available if customers place orders. The lower levels create demand on the

upstream inventories. Given that this demand is uncertain, as are transit

times and deliveries from suppliers, inventory incorporates a certain amount of safety

stock. Hence, reducing the demand variability will have a direct impact on the safety stock.

What follows describes inventory in the context of periodic review policies (Chopra &

Meindl, 2013). Periodic review policies are most widely adopted because they do not require

monitoring inventory continuously. Figure 2.5 illustrates the inventory profile for a periodic

review policy with lead time L and reorder interval T for one product. The dashed line from

point 1 to point 2 represents the available inventory from the moment an order is placed until

the next order is placed. Inventory levels are reviewed after a fixed period of time T and the

size of the order is specified such that the level of current inventory plus the replenishment lot

size equals the prespecified Order-Up-to-Level (OUL). The average lot size in a periodic

review system equals the average demand during the review period T and is given as

$Q = D \cdot T$ (2.1)

where D is the average demand per period.

When demand is normally distributed and independent from one period to the next, a stock

out will occur if the demand during the time interval between zero (review period 1) and T+L

exceeds the OUL. Hence, the OUL, which is the level of the inventory position, should be

large enough to protect the enterprise against shortages until the next order arrives.

Figure 2.5: Periodic review policy

Source: Chopra & Meindl (2013) © All rights reserved
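The mechanics of the periodic review policy described above can be sketched as a small simulation. The function `simulate_periodic_review` and its parameters are hypothetical. The sketch assumes deterministic lead times and treats unmet demand as lost, which the source does not specify.

```python
def simulate_periodic_review(demand, oul, review_period, lead_time, start_inventory):
    """Simulate an order-up-to policy.

    Returns the on-hand inventory at the end of each period and the list of
    (period, order quantity) pairs placed at each review moment.
    """
    on_hand = start_inventory
    pipeline = {}                      # arrival period -> quantity on order
    history, orders = [], []
    for t, d in enumerate(demand):
        if t % review_period == 0:     # review moment: order up to the OUL
            position = on_hand + sum(pipeline.values())
            qty = max(oul - position, 0)
            if qty > 0:
                pipeline[t + lead_time] = pipeline.get(t + lead_time, 0) + qty
            orders.append((t, qty))
        on_hand += pipeline.pop(t, 0)  # receive any order due in this period
        on_hand = max(on_hand - d, 0)  # serve demand; shortages are lost sales
        history.append(on_hand)
    return history, orders

# Constant demand of 10 per period, T = 4, L = 2, OUL = 10 * (4 + 2) = 60.
history, orders = simulate_periodic_review(
    demand=[10] * 8, oul=60, review_period=4, lead_time=2, start_inventory=60)
print(history)
print(orders)
```

With constant demand and an OUL equal to the mean demand over T+L, inventory runs down to zero just before the replenishment arrives. Any demand above the mean in that window would cause a stockout, which is the gap that safety stock is meant to cover.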

In general, enterprises are operating in an uncertain environment. In other words, enterprises

have to deal with supply and demand variability when monitoring the level of product

availability. To avoid stockouts due to unforeseen circumstances a company should carry

safety stock. Chopra and Meindl (2013) define safety inventory as “the inventory carried to

satisfy demand that exceeds the amount forecast.” Demand is uncertain and it is possible

actual demand exceeds forecasted demand, which results in product shortages. Safety stock is

the average inventory remaining when the replenishment lot arrives. The Supply Chain

Manager should make a trade-off when considering how much safety stock to incorporate.

Raising the level of safety stock increases product availability but at the same time raises

inventory holding cost. In today’s rapidly changing environment, where demand is extremely

volatile and product variety has grown, the previously described issue of determining an

appropriate trade-off has become extremely important. Demand volatility can be absorbed by

keeping excessive inventory. However, the inventory on hand can suddenly become obsolete

when new products come onto the market and demand for the product in inventory fades out.

Product life cycles have shrunk as product variety has grown, and firms need to beware of

carrying too much inventory. Therefore, a successful company has to figure out ways to

decrease the level of safety inventory carried without hurting the level of product availability.

“Responding to the customer can be achieved with cost overruns, excessive inventory and

firefighting, but to respond profitably means understanding the sources of volatility and

planning for them appropriately.” Gartner Research, August 2011.

The acceptable level of safety inventory is determined by the two factors presented in figure

2.6. Growing uncertainty of supply and/or demand and an increasing desired level of product

availability cause the required level of safety inventory to increase. What follows describes

demand uncertainty and product availability in the context of a periodic review policy to

understand the impact on safety stock.

Figure 2.6: Impact on safety inventory

2.4.1 Measuring demand uncertainty

To calculate safety stock in a periodic review system, we need to model the uncertainty

during period T+L. When the distribution of demand during period T+L is assumed to be

normal⁴, independent and identically distributed, the parameters of the demand can be written

as follows

Mean demand during T + L periods, $D_{T+L} = (T + L) \cdot D$ (2.2)

Standard deviation of demand during T + L periods, $\sigma_{T+L} = \sqrt{T + L} \cdot \sigma_D$ (2.3)

The derivations of the equations are out of the scope of this master dissertation. D represents

the systematic component of demand and $\sigma_D$ the random component, which is a measure of

demand uncertainty. The goal of forecasting is to predict the systematic component and

estimate the random component. Whether an enterprise with a periodic review system is able

to satisfy all demand from inventory depends on the inventory it has on hand when a

replenishment order is placed and on the demand experienced during period T+L, $D_{T+L}$. A

company can take the risk of reordering when the inventory is equal to $D_{T+L}$. Nonetheless,

because of the uncertainty of demand during this period, demand can exceed the mean

demand during period T+L and stockouts will occur. This is why companies include safety

stock based on the uncertainty of demand during period T+L, $\sigma_{T+L}$ (Chopra & Meindl, 2013).
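As a small numerical illustration of equations 2.2 and 2.3, the sketch below scales per-period demand parameters to the horizon T+L. The function name and the input values are assumptions chosen for the example. It relies on the same assumption as the text, namely independent and identically distributed demand per period.

```python
from math import sqrt

def demand_over_horizon(mean_per_period, std_per_period, review_period, lead_time):
    """Mean and standard deviation of demand over T + L periods (i.i.d. demand)."""
    horizon = review_period + lead_time
    mean_tl = horizon * mean_per_period        # equation 2.2
    std_tl = sqrt(horizon) * std_per_period    # equation 2.3
    return mean_tl, std_tl

# Hypothetical product: D = 100 units and sigma_D = 20 units per week, T = 4, L = 5.
mean_tl, std_tl = demand_over_horizon(100.0, 20.0, review_period=4, lead_time=5)
print(mean_tl, std_tl)
```

The mean grows linearly with the horizon while the standard deviation grows only with its square root, so relative uncertainty shrinks as the horizon lengthens.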

4 The normal distribution is a good approximation for most of the products within a firm. However, for slow-moving products

the Poisson distribution is more appropriate.

[Figure 2.6: safety inventory is determined by uncertainty (demand, supply) and by the level of product availability]

2.4.2 Measuring product availability

In general, the Cycle Service Level (CSL) is the fraction of replenishment cycles that end

with all the customer demand being met. Hence, the CSL is the probability of not having a

stockout in a replenishment cycle (Chopra & Meindl, 2013). The CSL in a periodic review

system is the probability that demand during period T + L does not exceed the OUL.

$CSL = P(\text{demand during } T + L \le OUL)$ (2.4)

This equation is equivalent to

$CSL = F(OUL)$ (2.5)

where $F$ is the cumulative distribution function of demand during the period T + L.

Most of the time, it is more appropriate to use the fill rate. Fill rate is the percentage of

demand satisfied from products in inventory and is usually much higher than CSL in a multi-

product situation. It allows estimating the fraction of demand that is turned into sales. Fill rate

should be measured over specified amounts of demand rather than over time. Nevertheless,

there is a drawback of using fill rate instead of CSL: it is mathematically much more

complicated. Especially in a periodic review system, it is computationally very expensive to

calculate the safety stock based on the fill rate. Equation 2.6 presents the formulation of the

fill rate in a continuous review system. Deriving the fill rate for a periodic review system is

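out of the scope of this master dissertation and will not be used in the analysis of the case

study.

$fr = 1 - \frac{ESC}{Q} = \frac{Q - ESC}{Q}$ (2.6)

with Q the average lot size and ESC the Expected Shortage per Cycle, which is the average demand in excess of the

reorder point (ROP), the continuous-review counterpart of the OUL, in each replenishment cycle. When $f(x)$ is the density function of the demand

distribution during the lead time, ESC is given by

$ESC = \int_{x = ROP}^{\infty} (x - ROP) \, f(x) \, dx$ (2.7)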

In general, when the required product availability goes up, the required safety inventory needs

to increase because the supply chain must now be able to handle uncommonly high demand or

uncommonly low supply. The marginal increase in safety inventory grows rapidly with an

increase in the desired CSL or fill rate.
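For normally distributed demand, the shortage integral has a well-known closed form, which makes the difference between CSL and fill rate easy to illustrate. The sketch below is an illustration under that normality assumption. The function names and input values are hypothetical, and `sigma` stands for the standard deviation of demand over the exposure period.

```python
from statistics import NormalDist

def expected_shortage_per_cycle(safety_stock, sigma):
    """Closed-form ESC for normal demand: sigma * [pdf(k) - k * (1 - cdf(k))], k = ss / sigma."""
    std_normal = NormalDist()
    k = safety_stock / sigma
    return -safety_stock * (1.0 - std_normal.cdf(k)) + sigma * std_normal.pdf(k)

def fill_rate(safety_stock, sigma, lot_size):
    """Fraction of demand served from stock: 1 - ESC / Q."""
    return 1.0 - expected_shortage_per_cycle(safety_stock, sigma) / lot_size

# Hypothetical product: sigma = 100 units, average lot size Q = 1000 units.
esc_no_ss = expected_shortage_per_cycle(0.0, 100.0)
fr_no_ss = fill_rate(0.0, 100.0, 1000.0)
fr_with_ss = fill_rate(100.0, 100.0, 1000.0)
print(round(esc_no_ss, 2), round(fr_no_ss, 4), round(fr_with_ss, 4))
```

Without any safety stock the CSL is only 50 per cent under normal demand, yet the fill rate in this example is already about 96 per cent, which illustrates why the fill rate is usually much higher than the CSL.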

2.4.3 Safety stock formula

In a periodic review system, safety stock is the quantity in excess of $D_{T+L}$ over the time

interval T+L (Chopra & Meindl, 2013). Hence, OUL and safety stock (ss) are related as follows

$OUL = D_{T+L} + ss$ (2.8)

Based on equation 2.5, OUL can be calculated as the inverse of the cumulative distribution

function of demand, evaluated at the required CSL. This is the general interpretation of safety stock applicable for all

distributions. Note that in this equation $D_{T+L}$ is the mean demand during period T+L and demand is

not necessarily normally distributed.

For a normal distribution the safety inventory can be written as a function of the standard

deviation

$ss = z \cdot \sigma_{T+L}$ (2.9)

where z is a safety factor depending on the required service level. The Z-score is the inverse

of the standard normal cumulative distribution function evaluated at the CSL.

$z = F_S^{-1}(CSL)$

As illustrated in figure 2.7, the relationship between the CSL and the Z-score is nonlinear;

higher cycle service levels require disproportionally higher Z-scores and, thus,

disproportionately higher safety stock levels. According to King (2011) rather than using a

fixed Z-score for all products, the Z-score should be set independently for groups of products

based on criteria such as strategic importance, profit margin, or dollar volume. Therefore,

SKUs with a greater value to the business will have more safety stock, and vice versa.

Figure 2.7: Safety factor of standard normal distribution and graphical representation

Source: Eazystock (2015) © All rights reserved
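The nonlinear relationship between the CSL and the Z-score can be reproduced with the standard library. The sketch below assumes normally distributed demand, as equation 2.9 does. The function names and the input values (a mean of 900 and a standard deviation of 60 over T+L) are illustrative assumptions.

```python
from statistics import NormalDist

def safety_stock(csl, sigma_tl):
    """Safety stock for a target cycle service level, ss = z * sigma_{T+L}."""
    z = NormalDist().inv_cdf(csl)   # inverse standard normal CDF
    return z * sigma_tl

def order_up_to_level(csl, mean_tl, sigma_tl):
    """OUL = mean demand over T + L plus safety stock."""
    return mean_tl + safety_stock(csl, sigma_tl)

# Each extra step in service level costs disproportionately more safety stock.
for csl in (0.90, 0.95, 0.99, 0.999):
    z = NormalDist().inv_cdf(csl)
    print(f"CSL {csl:.3f}: z = {z:.2f}, ss = {safety_stock(csl, 60.0):.1f}")

oul_95 = order_up_to_level(0.95, mean_tl=900.0, sigma_tl=60.0)
```

Setting the Z-score per product group, as King (2011) suggests, would amount to calling `safety_stock` with a different `csl` for each group.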

Equation 2.9 is only valid when there is no lead time variability. The general equation can be

written as follows when average demand and lead time variability are independent and

normally distributed.


Combining this equation with equation 2.3 reveals that the safety inventory increases

linearly with $\sigma_D$ and increases proportionally to the square root of the lead time. It is important

to see the link with the previous section about demand forecasting. If the underlying

uncertainty of demand ($\sigma_D$) can be reduced by a factor of k, the required safety inventory

also decreases by a factor of k (Chopra & Meindl, 2013). Therefore, using sophisticated

forecasting models that reduce the forecast error and create a better match between supply and

demand will have a direct impact on the safety inventory.
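The factor-k argument can be made explicit with a short derivation. It assumes no lead time variability, so that equations 2.3 and 2.9 apply directly. The numerical values are illustrative and not taken from the case study.

```latex
\begin{align*}
ss  &= z \cdot \sigma_{T+L} = z \cdot \sqrt{T+L} \cdot \sigma_D \\
ss' &= z \cdot \sqrt{T+L} \cdot \frac{\sigma_D}{k} = \frac{ss}{k} \\[6pt]
\text{Example: } z &= 1.645,\quad T+L = 9,\quad \sigma_D = 20
  \;\Rightarrow\; ss = 1.645 \cdot 3 \cdot 20 = 98.7 \\
k &= 2 \;\Rightarrow\; ss' = 1.645 \cdot 3 \cdot 10 = 49.35
\end{align*}
```

Halving the standard deviation of demand therefore halves the safety stock at an unchanged service level, whereas halving the horizon T+L would only reduce it by a factor of $\sqrt{2}$.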

In general, the expected level of inventory after receiving an order is equal to


Keeping the inventory level at a bare minimum by reducing safety inventory has several

advantages. A crucial advantage, related to the top pressure of figure 2.4, is the cost saving,

which enables companies to operate more cost-efficiently. Larry Mulky, President, Ryder

Integrated Logistics, Inc. (Harrington, 1996) notes, “Inventory is where the biggest cost is

hidden in most businesses today.” Inventories can cost anywhere between 20 and 40 per cent

of a company’s value per year. In a multi-echelon inventory system, the difficulty lies in

coordinating the different levels to keep the inventory costs low. At the same time, other

costs such as transportation and production costs need to be minimized. An inventory

optimization system weighs the fixed ordering, unit, holding and potential penalty costs⁵ for

each product and location combination. It also takes into account the demand, demand

variability, lead time and supply variability to come up with an inventory control parameter

that determines the order size and timing of the order placement to obtain the lowest costs and

to meet minimum stock levels. Moreover, it determines the minimum and maximum stock

level, considering both demand and supply variability.

5 Penalty costs of not having enough stock can include either the cost of backorders or lost sales.

2.5 Conclusion

To conclude the literature review, there is no doubt that improving demand forecast accuracy and

thus reducing the demand uncertainty will result in a lower required level of safety inventory

and carrying costs. Knowing that companies face continuous pressure to operate cost

efficiently and keep customer service levels high at all times, this seems an interesting topic

for a real-life case study. Hence, in the following case study, the research question whether

big data analytics can improve demand forecast accuracy and consequently reduce the safety

inventory of a company, will be investigated based on data sources of a pharmaceutical

wholesale company. We will try to apply new advanced capabilities of leading companies

with regard to demand forecasting into the case study (cf. Section 2.3). The objective of the

case study is to optimize inventory levels using these advanced forecasting techniques and

therefore to minimize the overall cost of the company without hindering the customer

satisfaction levels. Based on the theoretical insights of section 2.4 the impact on the

company’s safety inventory and carrying cost will be analyzed.

Part II

Case Study


Chapter 3


The case study is set up in a cooperative wholesale company, Multipharma. A wholesaler is

one stage in the entire pharmaceutical supply chain. Hence, before giving a general

introduction about the company (Section 3.2) and the network in which it operates (Section

3.3), a broader view on the entire pharmaceutical supply chain is necessary to understand the

context of the case study (Section 3.1). Section 3.2 and section 3.3 are based on qualitative

interviews with the supply chain department of Multipharma (D.V. Belle, personal

communication, March 9, 2017).

3.1 Pharmaceutical supply chain

A pharmaceutical product has a very long and complex research and discovery phase.

Afterwards, the product needs to be tested for safety and efficacy. The final phase consists of

manufacturing and distribution, which can be broken down into several subphases.

First, during primary manufacturing, the active ingredients are produced. Afterwards, during

secondary manufacturing, the final product in SKU form is produced. According to Shah

(2004), both manufacturing stages operate very slowly because of the many quality assurance

activities. The final products need to be brought to market warehouses or distribution

channels (wholesalers) and thereafter to retailers (pharmacies) or hospitals. A general

overview of the different stages of a pharmaceutical supply chain is represented in figure 3.1.

Figure 3.1: A pharmaceutical supply chain

pharmaceutical company → warehouse/wholesaler → pharmacies/hospital → end-user (patients)

Booth (1999) recognizes a trend for companies to divest excess capacity resulting from many

local manufacturing sites, and move towards a global SCM process. The different stages

involved in the movement of the product through the global chain make a pharmaceutical

supply chain difficult to coordinate. Besides significant intra-organizational information flows

between different planning units there is also a vital inter-organizational exchange of

information between the different members of the SC. The pharmaceutical supply chain has a

large scale and geographical span, so information is delayed and, as a result, communication and

coordination between the different departments are very limited.

Difficult coordination and communication create a bullwhip effect, which is largest at the

primary manufacturing sites where large stocks of active ingredients are held to ensure good

service levels. The stock level in the entire supply chain is between 30 and 90 per cent of the

annual demand. Additionally, the large scale and span make it very difficult to exploit short-

term opportunities such as shortage of a supplier’s products. The supply chain cycle time is

between 1000 and 8000 hours. Hence, when an opportunity arises at the lower levels, it takes

a considerable amount of time until the information reaches the higher levels of the supply chain. These

operational issues have an impact on the efficiency and effectiveness of the supply chain.

Managers recognize these issues. In contrast to a decade ago, when the main focus was on

drug discovery, sales and marketing, they now pay much more attention to supply chain

optimization as a means of delivering value (Shah, 2004). It is clear that improving the

coordination between the different stages and avoiding a bullwhip effect is a key challenge in

the pharmaceutical supply chain. According to Shah (2004), demand and inventory

management together with distribution and production planning are key business processes in

the pharmaceutical industry. In each geographical region a pharmacist develops forward

forecasts based on historical data and market intelligence. The result of the demand

management is a demand forecast at the lowest level, which can be aggregated and imposed

on the appropriate warehouse or distribution centre. Detailed schedules provide information

on how to place orders with upstream suppliers. At each stage, several transport modes are

applied for the delivery of incoming goods. Making a trade-off between the holding cost and

transportation cost results in the optimal lot sizing for these goods.
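The trade-off between holding cost and transportation cost can be illustrated with the classical economic order quantity model. This is an assumed simplification, not the lot-sizing method described by Shah (2004): it supposes a fixed cost per shipment, a constant demand rate and a holding cost proportional to unit cost. All names and numbers are hypothetical.

```python
from math import sqrt

def economic_lot_size(annual_demand, fixed_cost_per_shipment, unit_cost, holding_rate):
    """Lot size that minimizes the sum of annual transport and holding cost (EOQ)."""
    return sqrt(2.0 * annual_demand * fixed_cost_per_shipment / (holding_rate * unit_cost))

def annual_cost(lot_size, annual_demand, fixed_cost_per_shipment, unit_cost, holding_rate):
    """Annual transport cost plus annual holding cost for a given lot size."""
    transport = annual_demand / lot_size * fixed_cost_per_shipment
    holding = lot_size / 2.0 * holding_rate * unit_cost
    return transport + holding

# Hypothetical SKU: 12,000 units per year, 100 per shipment, unit cost 10, 20% holding rate.
params = dict(annual_demand=12000.0, fixed_cost_per_shipment=100.0,
              unit_cost=10.0, holding_rate=0.2)
q_star = economic_lot_size(**params)
print(round(q_star, 1), round(annual_cost(q_star, **params), 2))
```

Smaller lots raise the number of shipments and larger lots raise the average stock. At the optimum the two cost components are equal.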

3.2 Introduction to Multipharma

Within the entire supply chain (Figure 3.1), this master dissertation focuses on the

second stage. Multipharma Group is a Belgian wholesale company whose core business is to

distribute medicines. The company has a network of 270 corporate-owned

pharmacies. Multipharma was founded in 1921, with the aim of making medicines financially

accessible to a broad public. Nowadays, it is the sixth largest player of drug distribution in

Belgium. Moreover, it is the largest chain in the pharmacy sector. Besides the distribution of

medicines it also has 23 iU stores where parapharmaceutical products are sold. iU, originally

named Equiform, was founded in 1995 and since then it is the pioneer in the distribution of

parapharmaceutical products such as care, dietary and baby products in Belgium.

Multipharma’s vision is patient-centered. The pharmacists endeavour to advise their patients

and guide them in terms of medication use, health and quality of life in general. The

pharmacist will build a relationship of trust with the patient, but also with other care

practitioners, (home) nurses, etc. Consultation is an integral part of the daily activities in the

pharmacy. In order to optimize the quality of service, Multipharma increasingly invested in

continuing education of pharmacy teams, scientific support and innovative pilot projects in

the interest of the patient. The goals are quality, efficiency, safety and accessibility.

Accordingly, the strategy of Multipharma is strongly focused on service. Multipharma’s

mission is to be able to deliver the prescribed product immediately and to reduce the number

of clients who need to return a few hours later because the product is not available. With the

same or a lower stock level they want to create a better service level.

The pharmaceutical sector has not been spared from governmental pressure to contribute to

savings in the healthcare sector: since April 1, 2012 the price of reimbursed medicines was

decreased by 1.95 per cent and pharmacies were obliged to propose the cheapest drug to the

patient to benefit from reimbursement. These measures had a direct impact on the margins,

which is why supply chain efficiency and cost savings are more than ever the focal points

of the company. Multipharma faced a considerable challenge: there was no supply chain

department in 2010. Each department worked independently of the others, and the same can be

said of the pharmacies. Although the pharmacies belong to the network of

Multipharma and the pharmacists are employees rather than self-employed, they are used to a

great deal of freedom in running their pharmacy. It is a fairly complex situation, because on the one hand

they are Multipharma pharmacies but on the other hand they are not required to order all

references from Multipharma. That is why Multipharma started to reorganize the whole chain.

The objective was to incorporate supply chain principles in the process and further leverage

on their scale. Multipharma had access to market demand and order data of all 40k SKUs⁶ and

used this big data to accomplish the goal of reorganizing the chain. On the one hand, the

system would allow Multipharma to make more efficient use of the central warehouse

capacity to cope with increasing volume and product diversity; on the other hand it would

encourage Points of Sale (POS) to order more products through Multipharma. Today the

supply chain department is responsible for the statistical forecast, which is enriched by the

Category Manager, the Marketing department, feedback from the network, etc. Besides

introducing supply chain principles in the chain, Multipharma had to change the mindset

around the ERP system. They had been using SAP for years, but used no more than 10

per cent of its functionality. Almost everything was done with individual Excel sheets and

paper printouts. There was some cooperation between the different departments, but each

department worked according to its own methods and with its own tools (Excel, SAP, Word,

etc.). In today’s environment, with a lot of emerging technologies, the old way of thinking

would cause them to fall behind their competitors and something had to change dramatically.

Internally, they developed additional functionality on top of SAP to enrich the forecasting.

Fearing a lot of internal resistance, they did not opt for the implementation of a completely

new forecast package. Hence, they chose a step-by-step approach, which will be explained

next (“Supply chain als wissel”, 2012).

Part of the first project was to set up more advanced internal procedures such as demand

planning, forecasting and inventory management. Improving the quality and efficiency of

demand forecasting was an essential part of the reorganisation. This efficiency also included

optimizing the time spent on each task, such that a planner could bring more added value to

the process. Increased forecast accuracy and precision combined with inventory optimization

resulted in a better balance between stock levels and service levels. By applying and

incorporating new business rules, the outcome of the whole process became easier to predict

and manipulate, without detailed human interference. The improvement of the supply chain

backbone was an essential first step to prove the reliability of Multipharma towards the POS.

The scope of the inventory optimization project was limited to forecasting central warehouse

demand of centralized SKUs (about 12,500 SKUs) and optimizing the order flow to the


6 Note that this number of SKUs is based on the entire assortment of 2010. In 2016 the assortment is even more

differentiated with up to 71k SKUs.



The next step, which is still not fully operational today7, is to provide POS with trustworthy

order propositions of existing or new Multipharma products. Multipharma expected that the

better they are able to assist POS with their own stock management, the lower the threshold

will be to place orders through Multipharma. Tooling to provide insightful order calculations,

options to approve or overwrite orders and in later stages automatically accept order

propositions from Multipharma is required. However, convincing a network of 270 local

pharmacies, each with their own stock system, that they need to give up part of their
independence is not an easy job. In practice, this implies that the pharmacies no longer

determine the inventory parameters. Instead, they are automatically replenished by

Multipharma. Based on the individual sales of each pharmacy, each week, the stock

parameters should be calculated centrally and deliveries should be made to the pharmacies

based on these estimations. In principle, this process can operate entirely automatically. In line
with their vision, it will give the pharmacists the opportunity to focus on advising patients and
reduce the administrative hassle. However, the pharmacists are not that happy

about the new central stock and forecasting policy. The communication to pharmacies about

the new ordering policy is one of the most challenging tasks for Multipharma. It is very tough

to convince the pharmacists that the new system will work significantly better, faster and more
precisely than the traditional way of ordering. Multipharma had underestimated the impact of

change management and communication. However, the positive outcome of this project must

come from the combination of the capacity of the system and the experience and knowledge

of the pharmacists. Multipharma expects the pharmacies will start to appreciate the quick and

reliable delivery of products of the wholesale distribution warehouse.

Another step of the reorganization is related to Multipharma’s warehouse system. In the near

future, Multipharma will be able to control all warehouse processes and areas with a single

software system. The system makes use of continuous lot tracking and optimizes all

warehouse areas from goods-in to goods-out, including returns. Using a standard interface,

the Warehouse Management System (WMS) will be completely coupled to the already

existing SAP system. Meanwhile, they are improving the control of the warehouse by making
better use of KiSoft, the WMS package8 of Knapp, which they do not yet use
to the fullest. Another goal that is not yet fully accomplished is related to the SAP system.

The SAP system should capture and share global information from within the company and

7 To date, all processes are managed independently but not yet fully integrated.

8 A Warehouse Management & Control System



across its supply chain. Users of the system from all kinds of departments (supply chain, sales,
HR, etc.) within Multipharma should use the same tool and share information to achieve
higher productivity and performance. Management will be better able to react quickly when

performance drops in a specific pharmacy thanks to real-time dashboards and they will have

better insight to identify the cause of issues. Hence, this will certainly improve the quality of

operational decisions because real-time information is shared. This real-time information

combined with Multipharma’s SCM software will provide analytical decision support.

As a combined result of these first improvements there was a significant reduction of 10 per

cent of inventory in the central warehouse, which resulted in 2 million Euros of savings.

Those internal optimizations are necessary to convince pharmacists in the future to order in a

different way. Today, the average coverage of the products is 24 days9 where high rotating

products are held no more than one week in the stock of the central warehouse, while slower

rotating products are at 6 weeks of stock on average. The target Multipharma aims to obtain

in the future, based on the previously described not yet fully completed improvements, is 20


3.3 Network description

Multipharma is a wholesale company with one Distribution Center (DC). Although

Multipharma is rather small compared to its top competitor that has 10 DC’s, the logistic cost

of Multipharma compared to its revenue is better than most of its competitors. The warehouse

is one of the most efficient10 in the country. It is located in Anderlecht where about 12,500

distinct SKUs and 2,500,000 units (boxes) are kept in stock. Over 65 per cent of the orders to

pharmacies are automated thanks to robots, which is why the delivery can be completed in

record time and with the highest reliability. All the pharmacies are supplied once a day,

except for Brussels where it happens twice a day. As can be seen in figure 3.2, most of the

pharmacies contain between 4,000 and 6,000 drugs and other care products. Multipharma

develops tools and processes to help the pharmacies ensure the continued availability of these

products. Besides pharmacists there are other distribution channels as well, such as prisons,
e-commerce, B2B, iU’s, etc.

9 Based on a 6-day week.

10 The warehouse is 65% fully automated and 25% partially automated, which leaves 10% manually picked.



Figure 3.2: Frequency count according to size of assortment

Source: D.V. Belle, personal communication, March 9, 2017

The as-is supply chain network in which Multipharma operates consists of two physical
warehouses, Multipharma and Pharma Belgium/Belmedis, supplied by a total of 600 to 900
suppliers. For a specific set of products (non-centralized), Multipharma’s warehouse takes
up the function of a cross-dock. It is assumed that handling non-centralized items does not have

an impact on possible warehouse constraints. The bottleneck of the warehouse is its limited

volume capacity that affects how order decisions are being made and order generation can be

optimized. The biggest constraint is making sure the current warehouse ‘survives’ until 2019,

when they will build a warehouse twice the size of the current one. The biggest problem is the

inbound logistics11, where capacity is limited due to the heavy impact of administrative
processes. Reception has a capacity of about 500 to a maximum of 550 order lines a day. This
could serve as an alert threshold; reception therefore seems to be the bottleneck. The overall

network design of Multipharma is represented in figure 3.3. There are three types of inventory

points (indicated in red), namely at the suppliers, at the warehouses and at the pharmacists.

These are the locations where SKUs are stored, produced or transformed. An inventory point

is considered as an independent facility with its own stock management objectives and KPI’s.

Shortages in the network of inventory points can occur from suppliers to central warehouses

and from central warehouses to pharmacists (POS). Multipharma is represented by the

rectangle in the middle and consists of the DC and customer service department. Multipharma

obtains the products directly from a large number of different suppliers. Most of the time one

order at a time is placed with each supplier resulting in many invoices and by consequence,


11 The activities of receiving, storing, and disseminating incoming goods or material for use.




increasing the administrative workload. The order decisions are based on the forecast

prediction of SAS12, aiming to match supply and demand to the highest extent.

Multipharma’s warehouse contains the 12,500 best-performing SKUs. It should be noted that the

variety of SKUs sold by the pharmacists was 71,000 in 2016, hence, although Multipharma is

the primary deliverer to the pharmacies (88.5%) there has to be another important partner in

the network. Pharma Belgium/Belmedis is a second wholesaler and preferred partner of

Multipharma. This wholesaler has a larger scope of products (40,000 SKUs) and delivers

more frequently when needed. At first glance, one would expect pharmacists to prioritize the
extended scope and deliveries of Belmedis. However, the opposite is true because Multipharma
offers the pharmacists better pricing conditions to retain its customer relationships with
them. Note that the retailers can order directly from the suppliers (7.94%) or transfer
products to other pharmacies (0.06%). The product reaches the end-consumer when the
customer enters a POS and requests the products needed and/or prescribed, which the
pharmacist provides from stock.

Figure 3.3: Network design

Source: D.V. Belle, personal communication, March 9, 2017

12 The case study aims to raise the forecast accuracy even more using SAS as an analytical tool in combination with new sources of information.



Chapter 4


This chapter outlines the scope and methodology of the case study. The aim of the case study

is improving demand forecast accuracy of Multipharma by using big data analytics (cf.

Section 2.3). Since 2015, Multipharma has worked continuously to improve forecast accuracy with
SAS programming and SAS Forecast Studio. From section 2.3 it is clear that
leading companies use more advanced technologies such as demand forecasting software and an
SCM system. The positive impact of these advanced analytical tools on Multipharma’s on-hand
inventory over the past two years is shown in figure 4.1. The average days in stock
calculated at the end of 2016 decreased by about 3.5 days compared to the end of 2015.
Knowing that one day of stock costs about 800,000 Euros, the savings of
Multipharma in 2016 can be calculated at as much as 2.92 million Euros (D.V. Belle, personal

communication, March 23, 2017).13
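As a rough consistency check on these figures (a sketch that assumes the saving simply equals the reduction in days of stock multiplied by the quoted cost of one day of stock):

\[
\frac{2{,}920{,}000}{800{,}000} = 3.65 \text{ days}, \qquad 3.5 \times 800{,}000 = 2{,}800{,}000 \text{ Euros}
\]

Under that assumption, the quoted saving of 2.92 million Euros corresponds to an unrounded reduction of roughly 3.65 days, whereas exactly 3.5 days would give 2.8 million Euros.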

Figure 4.1: Coverage in days 2015-2016: Impact of SAS on inventory

Source: Multipharma, slideshare operational results (2016)

13 A critical reader might have noticed that the coverage from April to July 2016 was worse compared to the coverage of 2015. This was mainly due to errors at SAS’s first launch. Wrong table linkages were created, by consequence, wrong stock was ordered creating problems for inbound logistics. Note that the problems were solved in April and just needed some recovery. Thereafter, it is clear the analytics created significantly better results.



However, there is still a gap between the forecast and actual demand of the products that can

be reduced with new insights and techniques. First, these new insights are obtained from

qualitative research; weekly meetings with the Supply Chain department of Multipharma

reveal additional insights to improve the time series of certain product classes. Together with

the Supply Chain department, new demand drivers are identified and prioritized (cf. Figure

6.1). The business insight of the Supply Chain department is essential to define demand

drivers, which are not yet modelled in the current forecasting model. The quantitative

research will focus on the priority one demand drivers related to promotions. As mentioned in

section 2.3, leaders have the ability to model events like promotions. Hence, we expect to see

a benefit of using internal data to model three kinds of promotional events. Therefore, the first

hypothesis that will be investigated is whether modelling different promotional events using

internal data sources can improve the demand forecast accuracy of the company.

Furthermore, from section 2.3 it is clear leaders are able to include causal factors (e.g.

weather conditions) into demand forecasts. Therefore, the second part of the quantitative

research focuses on seasonality (priority 2) in combination with weather conditions and

Google Trends data (priority 3). Consequently, the second hypothesis that will be investigated

is whether demand forecasts for seasonal products (flu-related medicines, sunscreens,

mosquito products and insecticides) will benefit from including external data sources as

predictor variables/causal factors (cf. Section 2.3) into the baseline model and afterwards if

the Google Trends data can be used to exclude extreme order peaks from the data. Section 6.2

and section 6.3 will re-evaluate the accuracy of the forecast when incorporating the internal

and external data sources, respectively. In other words, the case study investigates if superior

value can be obtained by combining already existing data of the products with additional

internal and external data sources.

The first part of the analysis is built on the research paper of Cachon and Fisher (1997) in

which they forecast normal demand with an Exponential Smoothing Model (ESM) in which the
forecast is not updated on promotion days. Although the case study is not
about sell-through (consumer) promotions but sell-in promotions14, it seems reasonable to
assume that forecasting orders from retailers can be improved based on this technique. The

second part of the analysis is related to other work where forecasts for the opening weekend

14 Sell-in promotions are promotions from the manufacturer (supplier) to the retailer where the retailer does not pass through the promotions to the end-consumer thereby stocking inventory and serving the consumer from stock after the promotional period (Chopra & Meindl, 2013).



box-office revenue for feature films, first-month sales of video games, and the rank of songs

on the Billboard Hot 100 chart are obtained making use of search query volume. In this

research Goel et al. (2010) found that search counts are highly predictive of future outcomes

and generally raise the performance of baseline models fit on other publicly accessible data.

Apparently, no studies have been published to date on improving the demand forecast accuracy of

pharmaceutical products combining several forecast capabilities of leading companies

(Purdue University & SAS, 2008), thereby using internal data sources to model promotional

events and external data sources (weather conditions and Google Trends data) to include

causal factors. Moreover, this case study will verify the impact on safety inventory of

improved demand forecast accuracy.
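The promotion-aware smoothing idea mentioned above (an exponential smoothing forecast that is not updated on promotion days) can be sketched as follows. This is a minimal Python illustration, not the procedure of Cachon and Fisher (1997) or of Multipharma: the function name, the smoothing constant `alpha`, the way the level is initialised and the example numbers are all assumptions.

```python
def smooth_skipping_promotions(demand, is_promo, alpha=0.2, initial_level=None):
    """Simple exponential smoothing whose level is frozen on promotion periods.

    Returns the one-step-ahead forecast for every period and the final level,
    which serves as the forecast for the next, unseen period.
    """
    if len(demand) != len(is_promo):
        raise ValueError("demand and is_promo must have the same length")
    if initial_level is None:
        # Assumption: start from the first non-promotion observation.
        regular = [d for d, promo in zip(demand, is_promo) if not promo]
        if not regular:
            raise ValueError("need at least one non-promotion period")
        initial_level = regular[0]
    level = float(initial_level)
    forecasts = []
    for d, promo in zip(demand, is_promo):
        forecasts.append(level)  # forecast made before observing d
        if not promo:
            # Only regular demand is allowed to move the level.
            level = alpha * d + (1 - alpha) * level
    return forecasts, level


# Hypothetical weekly orders with a promotion peak in the third week.
orders = [100, 100, 300, 100]
promo_flags = [False, False, True, False]
with_flags, next_forecast = smooth_skipping_promotions(orders, promo_flags, alpha=0.5)
without_flags, _ = smooth_skipping_promotions(orders, [False] * 4, alpha=0.5)
print(with_flags)     # level stays at regular demand
print(without_flags)  # the peak inflates the forecast for the following week
```

In this toy series, ignoring the promotion flag lets the peak of 300 pull the forecast for the following week up to 200, while the flagged version keeps forecasting the regular level of 100.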

The coding of the case study will be executed with SAS Enterprise Guide. The obtained data

sources will be used to extend the historical dataset with new explanatory variables, i.e. the

new demand drivers. The output in the form of an extended SAS table will be used in SAS

Forecast Studio, similar to the real forecasting technique of Multipharma. SAS Forecast

Studio is a forecasting application that is designed to speed up the forecasting process through

automation. The forecasting process of SAS Forecast Studio is based on a stepwise approach

as represented in figure 4.2. By default, an automated overnight forecasting process will apply

the correct forecasting model based on the product classification, create a forecast and flag
irregularities. Alerts can be reviewed by the demand forecaster and overwritten if necessary.

Every time, the forecast takes into account all existing and new forecast settings and only the

future is reforecast (SAS Institute Inc., 2014).

Before conducting the analysis with additional demand drivers, chapter 5 describes the

existing forecasting process in more detail. The data provided by Multipharma and the

product grouping will be outlined in section 5.1 and section 5.2, respectively. The

classification of products into different groups is required because different forecasting

strategies can be distinguished based on the different product classes. Different products will

have different forecasting schemes. Forecast settings defined on group level are copied to all

products belonging to the product group in question. The second step of the forecasting

process implies fitting the appropriate model to the various product classes. For each product

group all candidate models of SAS Forecast are fitted against the time series. The best

performing model is automatically selected based on the chosen evaluation criterion (MAPE)

and holdout sample. Afterwards, the model is used to extrapolate the time series into the



future, thus creating a statistical forecast (step 3). However, the automated procedure of
selecting the appropriate model will not be treated as a black box: the underlying
assumptions of the selected model will be examined critically and an alternative model
will be proposed if they do not hold. Therefore, a separate section (Section 5.3) is devoted to

the selection of models in SAS Forecast Studio to understand the underlying principle. The

theory behind the most likely models, ARIMA and Exponential Smoothing, is essential to

understand the time series forecasting and is provided in Appendix B.
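The selection principle described above (fit every candidate model, score it on a holdout sample with MAPE and keep the best one) can be illustrated with a small Python sketch. It is not the procedure of SAS Forecast Studio: the three candidate models, the smoothing constant and the example series are assumptions chosen only to keep the example short.

```python
def mape(actual, forecast):
    """MAPE in per cent; this sketch assumes every actual value is non-zero."""
    return 100.0 * sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)


def naive(train, horizon):
    # Repeat the last observed value.
    return [float(train[-1])] * horizon


def mean(train, horizon):
    # Repeat the average of the training data.
    return [sum(train) / len(train)] * horizon


def ses(train, horizon, alpha=0.3):
    # Simple exponential smoothing with a fixed, assumed smoothing constant.
    level = float(train[0])
    for d in train[1:]:
        level = alpha * d + (1 - alpha) * level
    return [level] * horizon


CANDIDATES = {"naive": naive, "mean": mean, "ses": ses}


def select_model(series, holdout):
    """Fit each candidate on the training part and rank them by holdout MAPE."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must leave at least one training point")
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mape(test, model(train, holdout)) for name, model in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, scores


best, scores = select_model([10, 20, 30, 40, 50, 50, 50], holdout=2)
print(best, scores)
```

Because the score is computed only on the last `holdout` observations, which were not used for fitting, a model that merely fits the history well is not automatically preferred.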

Figure 4.2: Forecasting process of SAS Forecast Studio

From SAS Forecast Studio the accuracy of the demand forecasts, in the form of a forecast

error, can be extracted. This is the component needed to re-evaluate the demand forecast accuracy,
which is the subject of section 6.2 and section 6.3. What follows will describe the method to evaluate the

quality of the forecasting process with the Mean Absolute Percentage Error (MAPE). As

explained by Chopra and Meindl (2013), improving forecast accuracy goes along with

reducing the forecasting error. Every instance of demand has a random component. A
good forecasting method should capture the systematic component of demand but not the
random component (i.e. the forecasting error).




[Figure 4.2 steps: Step 1: Classify Products; Step 2: Apply Forecast Model; Step 3: Create Forecast; Step 4: Review Forecast]



The forecast error for period t is given by $E_t$, where the following holds:

$E_t = F_t - D_t$

The error in period t ($E_t$) is the difference between the forecast for period t ($F_t$) and the actual
demand in period t ($D_t$). Many Key Performance Indicators (KPIs) express the quality of
demand forecasts. MAPE is a well-known forecast quality KPI. It is the average absolute
error as a percentage of demand:

$\mathrm{MAPE}_N = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{E_t}{D_t} \right|$

with $D_t$ being the actual demand in period t and $F_t$ the forecasted demand in period t. The
absolute value in this calculation is summed for every forecasted point in time and divided by
the number of fitted points N. Multiplying by 100 makes it a percentage error.
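As a small worked illustration of this KPI, the Python sketch below computes MAPE for a short series. The function name, the example numbers and the choice to skip periods with zero demand (where the ratio is undefined) are assumptions of the sketch, not part of the method described by Chopra and Meindl (2013).

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in per cent.

    Periods with zero actual demand are skipped because |E_t / D_t| is
    undefined there; this is an assumption of the sketch.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    terms = [abs((f - d) / d) for d, f in zip(actual, forecast) if d != 0]
    if not terms:
        raise ValueError("no period with non-zero demand")
    return 100.0 * sum(terms) / len(terms)


# Hypothetical demand and forecast for three periods.
demand = [100, 200, 400]
forecast = [110, 180, 400]
print(round(mape(demand, forecast), 2))  # errors of 10%, 10% and 0% average to 6.67
```

Since each error is scaled by the demand of its own period, a miss of 10 units on a demand of 100 weighs as much as a miss of 20 units on a demand of 200.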

MAPE is the preferred KPI of forecast error when the underlying forecast has significant

seasonality and demand varies considerably from one period to the next (Chopra & Meindl,

2013). MAPE is the most suitable KPI of forecast error for most of the product classes of

Multipharma. Taking this into account when analyzing the impact of the additional data

sources, it can be evaluated if the demand forecast accuracy of certain product categories is

improved. In chapter 7, the benefits Multipharma obtains from the improved demand forecast

will be investigated. The impact on the safety stock (Section 7.1) and the corresponding

carrying cost (Section 7.2) will be quantified. It should be noted that these calculations are

based on the literature in chapter 2.4, supplemented with continuous qualitative input from

the Supply Chain department to align them with the company’s way of working.
Another remark is that the demand forecast will be executed based on an aggregate

dataset, i.e. a dataset containing the sum of all the order quantities of the individual

pharmacies and parapharmacies of Multipharma. Hence, the demand forecast and inventory

level of each individual (para)pharmacy (i.e. at a lower level of the chain) will not be

considered. This might be a topic for further research as explained in chapter 9.



Chapter 5

Demand Forecast

5.1 Data description and operating assumptions

To conduct this master dissertation Multipharma provides aggregated data of demand orders

of its 270 pharmacies and 23 parapharmacies (iU stores) as well as data of all SKUs and

information from its SAP system related to the operating environment. The latter includes

information on a SKU level regarding the order processing, lead times from suppliers and the

intended service level they want to obtain.

The aggregated data is provided in the form of two SAS datasets. The first dataset is a

historical dataset of the daily aggregated demand for every SKU from January 2, 2013 till

November 13, 2016 or when the product is relatively new, from the moment it started to exist.

This results in a dataset of 14,393,373 observations. The second dataset is a historical dataset

of the weekly aggregated demand for every SKU from December 31, 2012 till March 13,

2017. The dataset contains 2,032,331 observations. Moreover, this dataset contains a binary

variable to indicate if a product is in assortment or not, thus whether it is centralized or
non-centralized (cf. Section 5.2.1). In most cases, a product is in assortment from the moment it
starts to exist. In addition, the data incorporates, for every SKU and every week, four grouping

variables. These variables are used to classify the products into different groups, which will

be further explained in detail in section 5.2.2. Besides the aggregated historical datasets,

Multipharma also possesses a dataset describing all unique SKUs. The data consists of 12,578

distinct SKUs with a unique code and name of the supplier, the product name in Dutch and

French, whether the product has a basic and additional promotion and how much, a couple of

variables regarding the group levels (cf. Section 5.2.2.) and the planner responsible for the

product. Moreover, the dataset contains binary variables indicating whether the product is centralized

and existing, which is related to the product classification described in section 5.2.1. and a



couple of other variables for which the explanation is out of the scope of this master
dissertation.

5.2 Product Grouping

The rationale behind the grouping of the products is twofold. On the one hand, different

products will have a different forecast purpose and require a different forecasting strategy.

Those different product classes must be identified. On the other hand, it is known that forecast

accuracy can be improved by producing one single forecast for a group of similar products,

which is then disaggregated to the individual SKUs making up the group instead of creating a

separate forecast for every single SKU. This is because a lot of SKUs have sparse historical

sales data. The key here is to come up with relevant groups of substitute goods. Forecast

settings defined on product group level are copied to all products belonging to the product

group in question. In dialogue with the Supply Chain Manager, Van Belle D., the general

product classification and group levels are defined and explained in section 5.2.1. and section

5.2.2. below (D.V. Belle, personal communication, March 23, 2017). The qualitative

information of the manager is completed with webpage15 information of the WHO and the
general pharmaceutical association (APB), which explains some classifications in more detail.

5.2.1 General product classification

A first logical split of products can be made by separating non-centralized products,
centralized products and products in decentralization. The nature of the forecast for these groups
is completely different. All non-centralized products are considered as references outside
assortment. Automated forecasts for references outside assortment are purely ‘informative’ and
are not part of the operational forecasting framework. The purpose of the forecast for products in
decentralization is to follow up the current and expected stock positions for those products.

When not a lot of new pharmacists’ orders are expected, it can be decided to remove all

remaining stock from the central warehouse for the product in question. In contrast,

centralized products are kept in stock in the central warehouse and could be directly ordered

by the pharmacists from Multipharma. Pharmacists’ sales are the only source available to

15 Source:



forecast demand for decentralized products. Demand forecasts for centralized products are

based on the order history from pharmacists to the central warehouse. The purpose of the forecast

for centralized products is to reduce uncertainty for the central warehouse order generation

process. Being ‘centralized’ is a product characteristic that can change over time e.g.

sunscreens are centralized during summer, but are not centralized during winter. At any given

point in time, there are about 12,000 centralized items and 70,000 non-centralized items.

Centralized products will be split in new and existing products. Existing products are either

permanent or non-permanent. New products will be divided into different subgroups (Figure 5.1).

Figure 5.1: Overview product classification

• Non-centralized products
• Products in decentralization
• Centralized products
  • Existing products: permanently centralized; non-permanently centralized
  • New products: completely new products; successor products; new in category; limited edition; line extension

New products

A product will receive the status of ‘new product’ at the moment it becomes centralized and
when no order history can be detected. The period during which a product is considered as
‘new’ is controllable by the demand planner (an end-date will flag the ‘end’ of the ‘new
period’). New products are the hardest to forecast, as there is no historical order data
available. Therefore, the forecast relies mainly on the experience of the demand planner, who needs to
assign an appropriate forecast strategy for those new products. We differentiate between the
following five classes of new products, all requiring a different forecast strategy.

Figure 5.2: Characteristics of five types of new products

Completely new product
• Revolutionary or niche products
• No reference products
• Forecast: human judgement and experience

Successor product
• New package, different formula or composition
• Updated version of existing product (predecessor/reference product)
• Forecast: historical order data from reference product, which disappears

New in category
• New generic entering market
• Possibility of cannibalization effect and market share redistribution within category
• Forecast: similar existing product is reference product

Limited editions
• Promo items (e.g. price reduction, giveaway or extra volume)
• Reference product will not disappear
• Only for certain time period (fixed promo period or while stock lasts)
• 100% cannibalization effect on reference product during the specified time period

Line extensions
• Another packaging unit
• Reference product will not disappear
• Change in order behavior and possibility of cannibalization on 'original' product
• Forecast: existing 'original' product is reference product

Existing products

All centralized products that do not have the status of ‘new product’ are considered
‘existing products’. Order demand for existing products can be forecasted by looking at
historical order patterns for those products. The more stable the historical patterns and the higher
the past order volumes, the better the forecast. When there is a large number of series to be
forecasted, as is the case for Multipharma (11,067 series), choosing an appropriate forecasting
method for each series has the potential of major cost savings through improved accuracy
(Fildes, 1989). When doing an aggregate forecast, a single method is applied to all the time
series of a particular class and afterwards this aggregate forecast is broken down to an
individual level. Those proportions can be specified by first estimating a base forecast for the

SKU level. In contrast, individual forecasting identifies a particular method appropriate for

each series. Often disaggregated data are noisier than the aggregates constructed from them, which

makes the series of these data harder to forecast and of a lower quality. This is in line with the

second law of forecasting which states that detailed forecasts are worse than aggregate

forecasts (Hopp & Spearman, 2008). On the other hand, sometimes information may be lost

doing aggregated forecasting; it can distort trend, seasonality and other individual product

characteristics, resulting in a ‘loss of information’. For some products, the individual order

level will be good enough to produce an accurate forecast, but most products will profit from
hierarchical or grouped forecasting. Aggregating similar products almost always improves the
ability to model and predict trend and seasonality. Which method performs better for a given
SKU depends on the type of product, its lifetime, its relation to other products, etc.

Therefore, the forecasting method of Multipharma executes both aggregated and individual

forecasts for each SKU. With regard to the aggregated forecast, there is no single product

classifier that will fit for dividing all products into exhaustive groups with enough similarity.

On the other hand, using a combination of product classifiers often results in too many small

groups not covering all similar products. Therefore, four grouped forecasts will be made that

are then disaggregated at the SKU level. The products are aggregated based on general

classification techniques applicable in the pharmaceutical industry (IMS, DCI and APB) and a

classification technique specifically defined by Multipharma (GSTAT). Each classification

divides the products into different categories. Hence, for each product four forecasts are

obtained from breaking down aggregated forecasts based on the category of the class to which

it belongs. Moreover, an individual forecast will be executed for every SKU. The best

performing forecast framework out of those five will be used to generate the actual forecast

for the SKU. The following section describes the four classifications in more detail.
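The breakdown of a grouped forecast to SKU level, with proportions derived from a base forecast per SKU as described above, can be sketched as follows. This is a hedged Python illustration rather than Multipharma's implementation: the function name, the SKU identifiers and the numbers are hypothetical.

```python
def disaggregate(group_forecast, base_forecasts):
    """Split a group-level forecast over SKUs in proportion to their base forecasts."""
    total = sum(base_forecasts.values())
    if total <= 0:
        raise ValueError("base forecasts must sum to a positive value")
    return {sku: group_forecast * base / total for sku, base in base_forecasts.items()}


# Hypothetical group of three substitute SKUs with their base SKU-level forecasts.
base = {"SKU_A": 60, "SKU_B": 20, "SKU_C": 20}
sku_forecasts = disaggregate(50, base)
print(sku_forecasts)
```

The SKU forecasts always add up to the group forecast, so the grouped model determines the total volume while the base forecasts only determine each SKU's share.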

Figure 5.3: Hierarchical breakdown of disaggregated data



5.2.2 Group levels

IMS Classification

The Uniform System of Classification (USC) is a categorization system, developed by IMS,

to resolve a need for therapeutic classification of pharmaceutical products. This classification

technique is widely accepted in North America as the standard for classifying pharmaceutical

products. Logical grouping of pharmaceutical products based on this

classification system makes it easier to identify products competing in the same or similar

markets. The IMS Classification consists of one top level and three or four sublevels (depending on the

product). The three sublevels are the attributes considered when rendering a decision

regarding the placement of a product or the creation of a new USC category (Alvarez, 2015).

There are approximately 2,400 unique combinations. The explanation of the sublevels is out

of the scope of this master dissertation. A shortcoming of the USC classification is that the

product can only be classified when there is a sales registration. Sales history is not available

for new products, which will be classified in the last category: ‘Other’. Another shortcoming

is that 5 to 10 per cent of centralized SKUs do not have an IMS classification.

INN/DCI Classification

The International Nonproprietary Names (INN) classification names a pharmaceutical

substance or active ingredient with an official universal and unique name. The existence of an

international nomenclature for pharmaceutical substances, in the form of INN, makes the

communication and exchange of information more efficient and convenient among health

professionals and scientists worldwide. This is beneficial for a clear identification, safe

prescription and dispensing of medicines to patients. Substances belonging to the same group

have similar pharmacological activity. The generic names indicate via their stems what drug

class the drug belongs to. There are about 1,600 unique combinations (stems), which will not

be presented in this master dissertation. Nevertheless, it should be kept in mind that this is a

relevant classification system for the products of Multipharma. A shortcoming is that it is

unavailable for the majority of products: 60 to 70 per cent of centralized SKUs do not

have a DCI classification.


APB Classification

The “Algemene Pharmaceutische Bond” (APB) is a federation of mainly independent

pharmacists in Belgium. It has created a legal classification with 18 unique combinations,

based on the national code number (CNK). This code is applied to all medicinal and

pharmaceutical products (medical devices, biocidal products, food supplements, cosmetics,

etc.) that are delivered in the pharmacy, for human, veterinary and phyto-pharmaceutical

use. Contrary to IMS, this classification does not depend on any sales registration; it

should be known one month after product creation.

GSTAT Classification

The “Groupe Statistique” (GSTAT) is a classification method drafted by Multipharma to

assign its own classification to its products. There are approximately 10 unique

combinations. A lower sublevel of this classification exists, but we will stick to this more

general level of classification.

The IMS top level, APB and GSTAT classifications can be found in Appendix A. A recent

forecast obtained from SAS Forecast Studio of the current dataset outputs the forecast method

used for each product, i.e. the forecast technique that obtains the most accurate result for that

specific kind of product. The distribution of the different methods is represented in figure 5.4.

About 62 per cent of the products will make use of the aggregated forecast technique. The

other 38 per cent of the products benefit from an individual forecast because they have

sufficient historical information.

Figure 5.4: Distribution of products over the different forecasting methods

Source: Dataset Multipharma



5.3 Model selection

SAS Forecast Studio supports the needs of forecasters who need to move through the

production forecasting process quickly (Wolfe, Leonard, & Fahey, n.d.). With the automated

forecasting procedure Multipharma is able to generate statistical forecasts for all time series.

In order to generate these forecasts SAS Forecast Studio must first determine an appropriate

model for each time series. SAS Forecast Studio user's guide (SAS Institute Inc., 2014)

explains how the program selects the appropriate models to execute forecasts given the input

data. SAS Forecast Studio runs a series of diagnostics to determine the characteristics of the

data (such as seasonality or intermittency), and avoids models that are inappropriate for the

data. If the diagnostics determine that a series is intermittent, continuous time series

models such as Autoregressive Integrated Moving Average (ARIMA), Exponential

Smoothing (ESM) or Unobserved Components Models (UCM) are not appropriate, and SAS

Forecast Studio excludes them. Conversely, if the diagnostics determine that a series is

continuous, SAS Forecast Studio excludes Intermittent Demand Models (IDM)¹⁶. When SAS

Forecast Studio diagnoses a project, it attempts to fit all the models in the model selection

list. By default the model selection list consists of ESM, ARIMA and IDM. External models

can be added to the model selection list. However, ARIMA and ESM models seem to work

well for the existing

and newly created datasets of Multipharma.

By default, SAS Forecast Studio chooses the best-performing model in the model selection

list as the forecast model. The best-performing model is chosen based on the chosen holdout

sample and selection criterion, which identifies the most accurate model. A preferred and

common selection criterion in business forecasting is the MAPE as explained in chapter 4.

The MAPE is the average of all the individual absolute percentage errors. When MAPE is the

selection criterion, the model with the smallest MAPE value is the best-performing model. It

should be noted that sometimes the ‘best-performing’ model according to SAS Forecast

Studio is not the one the Supply Chain department had in mind. However, Multipharma may

not have time to review all of its 11,067 automated forecasts. In order to efficiently select

those forecasts with an inaccurate model fit, it needs to quickly identify and address

exception forecasts first (Wolfe et al., n.d.). Based on the distribution of the MAPE of all time

series, Multipharma may identify a certain threshold above which the MAPE is too high and

hence the model may be inaccurate¹⁷. When Multipharma sets the MAPE threshold to a

reasonable 160, only 1.7 per cent of the products need to be reviewed.¹⁸ Using domain

expertise might reveal why some models are not appropriate. When this is the case, the

automatically generated statistical forecast can be overridden with a user forecast. When

forecasts are created in this case study, the products with a MAPE above a certain threshold

will be discussed with the Supply Chain department, which will choose a more appropriate

forecast model if necessary. Another way to determine whether a model fits the data well is to

plot the prediction error autocorrelation function (ACF) and the prediction error partial

autocorrelation function (PACF). Figure 5.5 and Figure 5.6 show the prediction error ACF and

PACF for a randomly selected product for which SAS Forecast automatically selected an

ARIMA model.

Figure 5.5: Prediction error ACF of randomly selected SKU

16 IDM are used for time series that have a large number of values that are zero or other constant values. Intermittent time series occur when the demand for an item is intermittent. Because many time series models are based on weighted summations of past values, they bias the forecast toward zero. Therefore, these models will not work for intermittent time series data (SAS Institute Inc., 2014).

17 Note that an inaccurate model fit is not the only cause of a high MAPE. Most of the time, there is not enough information available in the data to explain the variability and produce an accurate forecast, thus changing the model in this case might not be relevant.

18 Note that SAS Forecast Studio also produces ‘alerts’ when the generated forecasts differ significantly from the observed historical sales, on which Multipharma can act.
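
The threshold-based review of exception forecasts can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the example MAPE values are hypothetical, and only the threshold of 160 is taken from the text.

```python
def exception_forecasts(mape_by_sku, threshold=160.0):
    """Return the SKUs whose MAPE exceeds the threshold (worst first)
    together with their share of all SKUs."""
    flagged = sorted(
        (sku for sku, value in mape_by_sku.items() if value > threshold),
        key=lambda sku: mape_by_sku[sku],
        reverse=True,
    )
    share = len(flagged) / len(mape_by_sku) if mape_by_sku else 0.0
    return flagged, share


if __name__ == "__main__":
    # Hypothetical MAPE values per SKU.
    flagged, share = exception_forecasts({"A": 12.0, "B": 210.0, "C": 95.0, "D": 161.0})
    print(flagged, share)
```

Sorting the flagged SKUs by decreasing MAPE lets the review with the Supply Chain department start with the worst forecasts.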



Figure 5.6: Prediction error PACF of randomly selected SKU

The prediction error ACF and PACF of the fitted model show that all lags lie within the

confidence band of two standard deviations. This means all autocorrelations of the

residual series are non-significant, which is a necessary condition for a model that fits the data

well. Hence, the graphs give no indication of a poor model fit. Appendix B discusses

theoretically how the forecast is performed using an Exponential Smoothing and ARIMA

model, because these are the most widely adopted models for the time series forecasts of

Multipharma. What follows describes the basic demand forecasting principles that will be

used for products within the assortment (existing or new products) of Multipharma (cf.

Section 5.2.1). This corresponds with all centralized products and products in decentralization

as described in the product classification.
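
The residual check on the prediction error ACF can be sketched as follows. This is an illustration only: it assumes the common large-sample approximation of the band, plus or minus 2 divided by the square root of the number of observations, which may differ from the bounds that SAS Forecast Studio plots, and the function names are hypothetical.

```python
from math import sqrt


def acf(residuals, max_lag):
    """Sample autocorrelation of the residuals for lags 1..max_lag.

    A constant residual series has zero variance and raises ZeroDivisionError.
    """
    n = len(residuals)
    centre = sum(residuals) / n
    dev = [r - centre for r in residuals]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(residuals, max_lag):
    """Lags whose autocorrelation falls outside the assumed band of 2 / sqrt(n)."""
    bound = 2 / sqrt(len(residuals))
    return [
        lag
        for lag, value in enumerate(acf(residuals, max_lag), start=1)
        if abs(value) > bound
    ]


if __name__ == "__main__":
    # An alternating series is strongly autocorrelated, so its lags are flagged.
    print(significant_lags([1, -1] * 20, 2))
```

An empty list means that no lag exceeds the band, which corresponds to the situation shown in the two figures.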

5.4 Create forecast

The current forecasting process is mainly based on the historical order dataset (cf. Section

5.1). The higher the quality of the historical data, the more reliable the forecast. Hence, past

observations are used as a basis to predict the future and a suitable time series forecast is

of the form

$$F_{t+1} = f(D_t, D_{t-1}, D_{t-2}, \ldots) \qquad (5.1)$$

where F is the weekly forecasted order demand, D is the historical aggregated weekly demand

of products from all pharmacies and parapharmacies, t is the current week, t+1 is the next

week, etc.



However, Multipharma quickly realized that this basic approach had its limitations. First, with

regard to stockouts, Multipharma chooses not to keep backorders as these create

administrative costs and confusion. Sometimes the product is available again the next day,

but sometimes it is out of stock for six months and pharmacies look for alternative

solutions. In the latter case, backorders might no longer be relevant when the product

becomes available again. Because backorders are not kept, some pharmacists consider

unfulfilled orders as ‘lost’ and keep placing ‘the same’ orders until the product is in stock.

These recurring orders might decrease the quality of the data. Hence, the data during a

timeframe in which the product is not available at the central warehouse can be interpreted in

two ways:

• On the one hand, pharmacists keep on placing ‘the same’ orders because they consider

unfulfilled orders as ‘lost’. The orders placed are a multiple of the actual orders that

would have been placed if the product had not been missing. Order data during such a

period overestimate actual demand. Data for these periods should not be

used as a history basis but could be an indirect indicator of demand peaks as soon as

the product becomes available.

• On the other hand, pharmacists place orders as they would have done had the product

been available, and as a consequence the orders do reflect actual demand.

To solve this problem, information about orders leaving the central warehouse can be used to

correct the order data in the first case. If the outbound drops to zero, a product is missing and

the corresponding out-of-stock period is used to adjust the order data where demand might

overestimate the actual demand. Figure 5.7, obtained from the blueprint information of

Multipharma, indicates how the pharmacists’ sales data can be used to predict central

warehouse orders. Note that pharmacists’ historical sales data will be excluded from the

moment the outbound of the central warehouse drops to zero, in order to obtain more

‘realistic’ data by filtering out the atypical out-of-stock data. To solve this problem

analytically an additional ‘event flag’ needs to be created in SAS Enterprise Guide, which

indicates when a product is in an out-of-stock situation, i.e. when the outbound drops to zero.

This ‘event flag’ recognizes when the order data overestimates actual demand and

automatically adjusts the order data in order to minimize the forecasting error for the future.
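
The out-of-stock ‘event flag’ can be sketched as follows. This is a minimal illustration and not the SAS Enterprise Guide code: the text does not specify how the order data are adjusted, so the sketch assumes that flagged weeks are simply treated as missing, and all names and numbers are hypothetical.

```python
def out_of_stock_flags(outbound):
    """Flag a week as out-of-stock (1) when the warehouse outbound is zero."""
    return [1 if quantity == 0 else 0 for quantity in outbound]


def mask_orders(orders, flags):
    """Assumed adjustment: replace order quantities in flagged weeks by None (missing)."""
    return [None if flag else order for order, flag in zip(orders, flags)]


if __name__ == "__main__":
    # Hypothetical weekly data: orders pile up while nothing leaves the warehouse.
    outbound = [50, 0, 0, 40]
    orders = [55, 140, 160, 60]
    flags = out_of_stock_flags(outbound)
    print(flags, mask_orders(orders, flags))
```

Treating the flagged weeks as missing rather than zero avoids teaching the model an artificial drop in demand.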



Figure 5.7: Forecast with out-of-stock period

Source: Provideor @ All rights reserved

The second limitation of the basic approach (Equation 5.1) is about modelling promotional

events. As described in section 2.3, a company clearly benefits from modelling promotional

events. Multipharma recognizes this need and models a promotion using an ‘event flag’. Past

promotions are recognized based on the ‘order type’ and future promotions are based on the

SAP agenda introduced in SAS. The agenda provides information on the start and end date of

the promotion. Based on the agenda, the ‘event flag’ recognizes when a peak in demand due

to a promotion is occurring and automatically returns to normal demand thereafter. In contrast

to out-of-stock events, the data during a promotion that happened in the past is not filtered out

of the historical dataset. Moreover, no distinction is made between different types of

promotions (Passage délégué, Promo O and Action iU), because there was not yet enough

data available to distinguish the different types. This will be the subject of the following

chapter where a new way of handling promotions will be introduced when forecasting with

time series.
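
The promotional ‘event flag’ derived from the start and end date of a promotion can be sketched as follows. This is an illustration under stated assumptions: weeks are assumed to be identified by their first day and to last seven days, and the function name and dates are hypothetical rather than taken from the SAP agenda.

```python
from datetime import date, timedelta


def promotion_flags(week_starts, promo_start, promo_end):
    """Return 1 for every week that overlaps the promotion period, else 0."""
    flags = []
    for week_start in week_starts:
        week_end = week_start + timedelta(days=6)
        overlaps = week_start <= promo_end and week_end >= promo_start
        flags.append(1 if overlaps else 0)
    return flags


if __name__ == "__main__":
    # Hypothetical weekly calendar and promotion period.
    weeks = [date(2016, 1, 4) + timedelta(weeks=i) for i in range(4)]
    print(promotion_flags(weeks, date(2016, 1, 13), date(2016, 1, 20)))
```

The flag returns to zero in the first week that no longer overlaps the promotion, which mirrors the return to normal demand described above.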

The base table from SAS Enterprise Guide, which consists of the historical, out-of-stock

and promotional information of all 11,067 centralized SKUs, will be used to create a forecast

in SAS Forecast Studio based on the classifications explained in section 5.2.2. The

automatically obtained result in SAS Forecast Studio is a weighted¹⁹ MAPE of 53.23 at the

SKU level. The distribution of the MAPE is represented in figure 5.8. This figure illustrates

that about 95 per cent of the products have a MAPE lower than 100. However, there

are some products for which the MAPE is even higher than 200. These products have a poor

forecast accuracy, which negatively affects the overall MAPE. The majority of the

models used to execute the forecast are ARIMA models and exponential smoothing models.

Only a small fraction of the products (1%) use intermittent demand models (IDM). The model

distribution is represented in figure 5.9.

19 The weight for a series is the sum over its entire historical period (N). This is calculated as (N*MEAN) from the series properties. A more detailed explanation on how to obtain the overall weighted MAPE manually is provided in Appendix C.
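
The weighting of the MAPE can be sketched as follows. The weight per series (the sum of its historical demand, i.e. N*MEAN) follows the footnote; combining the series as a weighted average is an assumption, since the manual calculation of Appendix C is not reproduced here, and the numbers are hypothetical.

```python
def weighted_mape(series):
    """Weighted average MAPE over (history, mape) pairs.

    The weight of a series is the sum of its historical demand (N * MEAN).
    """
    weights = [sum(history) for history, _ in series]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, series)) / total


if __name__ == "__main__":
    # Hypothetical example: a high-volume SKU with a low MAPE dominates the result.
    print(weighted_mape([([10, 10, 10], 20.0), ([1, 1, 1], 130.0)]))
```

Under this weighting, a poorly forecast low-volume SKU affects the overall figure less than an equally poor high-volume SKU.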

Figure 5.8: Distribution MAPE with baseline model forecast

Figure 5.9: Distribution of model type of all SKUs with baseline model forecast



Chapter 6

Demand Drivers

Demand drivers play an important role in raising the quality of a forecast. Predictor variables

or demand drivers can extend the historical information in time series forecasting, e.g.

when forecasting with ARIMAX models. Internal and external data of Multipharma can be used

as demand drivers in the forecasting process and to model obvious events. As described in

section 5.4, at present, two events (i.e. out-of-stock and promotion) are included in the baseline

model based on internal data sources and the base forecast is now of the form

$$F_{t+1} = f(D_t, D_{t-1}, D_{t-2}, \ldots, \text{out-of-stock}, \text{promotion}, \text{error})$$

where F is the weekly forecasted order demand, D is the historical aggregated weekly demand

of products from all pharmacies and parapharmacies, out-of-stock and promotion are two

events and t is the current week, t+1 is the next week, etc. This model tries to explain what

causes variation in demand of the pharmaceutical and parapharmaceutical products. However,

there will always be changes in demand that cannot be accounted for by these demand drivers,

which is why an error term is included allowing for random variations and the effects of

relevant variables not (yet) included in the model. This corresponds to the first law of

forecasting according to Hopp and Spearman (2008), which states ‘forecasts are always

wrong’. Nevertheless, the explanatory model is very useful because it incorporates

information about other variables rather than only historical order quantities to be forecasted.

The higher the error term, the more room for improvement. Business insight might reveal

additional demand drivers, which can be modelled using internal and external data sources.

The aim of this chapter is to validate the hypothesis that big data analytics can offer a

significant benefit for Multipharma. Therefore, it will be investigated whether the forecast error

decreases when additional internal and external data sources are used (Supra. chapter 5).



6.1 Setting priorities

Multipharma possesses a lot of semi-structured information in the form of Excel files issued

by the Sales, Marketing and Financial departments, among others. In addition, Multipharma also maintains a

continuous flow of structured information from its SAP ERP system. Together with the

Supply Chain department, a couple of concerns with existing forecasting series were

identified (D.V. Belle, personal communication, March 30, 2017). As already mentioned in

the literature (cf. Section 2.1), business knowledge is indispensable to understand the data and

identify potential problems. The communication with the Supply Chain department of

Multipharma was crucial to understand why some products’ forecasts generated by SAS

Forecast Studio were not as expected. Together with the Supply Chain department priorities

were set to improve the forecast of some products. The prioritization was based on rules and

experience, which taught the Supply Chain department which demand drivers could possibly

have a big impact on two classes of the IMS classification where they faced the largest

difficulties: OTC (Over-The-Counter) and PEC (parapharmacy). The first priority demand

drivers have the biggest impact on these classes because today these ‘problems’ are corrected

manually and this takes a lot of time and effort. The orders that SAS Forecast Studio proposes

for these products have a MAPE that is too high and correcting these order propositions

manually has a big impact on the workload. Therefore it would be interesting to include

additional information in the model on an automated basis. Figure 6.1 provides a complete

overview of all new possible demand drivers from a business point of view.



Figure 6.1: Priority schedule

Priority 1: • Passage délégué • Action iU • Promo O • Substitution

Priority 2: • OOS cannibalization • Seasonality • Holiday calendar

Priority 3: • Weather • Google Trends data • Healthcare policy • Pricing policy • Commercial policy • Patent loss • Changes in packaging • National campaigns • Duo vs Mono • Changes in the number of PoS and SKUs

The first priority is to model events of promotional actions (‘Passage délégué’, ‘Promo O’,

‘Action iU’ and ‘Substitution’), in order to filter out the atypical, volatile data that go along

with these events and thereby reduce the error of future forecasts. Note that this technique is

different from, and expected to be more accurate than, the traditional way of modelling the

promotion (cf. Section 5.4). Section 6.2 will investigate whether this new, more detailed

technique of modelling promotions is superior to the traditional way of forecasting, which

uses a more general technique of modelling promotions. First note that modelling substitution

is a major concern for Multipharma because there is striking evidence of problems with the

forecast of substitution products. When a new promotional item replaces the base SKU,

ideally there is a 100 per cent switch to the promotional item. When the promotional product

is out of stock or the promotion is terminated there is a switch back to the base SKU. Figure

6.2 gives an example of substitution for two products with three promotional periods, for

which the daily demand of the base product (blue line) drops to zero during a promotion

(orange line).



Yet SAS Forecast Studio does not capture this obvious relationship because some crucial

information is lacking. Improving the forecast of this kind of product by supplying new

information on a regular basis is therefore an interesting topic. However, to date there is no

linkage in the SAP system or in the form of Excel files between the base products and their

substitution products. Hence, it would be impossible to relate the historical and future

demand of both products in SAS without automatic input data.

Figure 6.2: Substitution between two products

In contrast, for the remaining three types of promotions (‘Passage délégué’, ‘Promo O’ and

‘Action iU’) the necessary information is available in the form of semi-structured Excel files

and will be the subject of section 6.2. Each supplier delivers some crucial information about

the promotions to Multipharma. These files have some information in common; they all

contain the brand name, the promotional period, the CNK code and label of the product,

whether it is a new product, whether it is stored in a pharmacy and/or parapharmacy and some

information about the size of the promotion. Figure 6.3 is an example of such an Excel file

for ‘passage délégué’ promotions of the brand ‘Vistalife’. Note that the real base and ‘passage

délégué’ discount percentages are replaced by ‘X%’ for confidentiality reasons. The most

interesting information needed to model the promotional events is indicated in red. The start

and end time of the promotion will be translated to SAS coding to create the promotional

event that will be added to the baseline model based on the CNK code which corresponds to a

unique SAP code used in SAS to identify unique products.



Figure 6.3: Example Excel file of supplier

Going back to figure 6.1, the demand driver ‘cannibalization’, which belongs to the second

priority, is an extension of substitution. Cannibalization occurs when a product that is

out-of-stock drives sales of another product. The same issue arises as with substitution

products: there is not yet an explicit linkage between the products. Multipharma

is discussing internally what kind of data it will need to capture in the future to solve this

problem. In addition, the second priority demand drivers are related to seasonal products and

the holiday calendar. Although seasonality can be captured with SAS Forecast Studio, there is

still some error present in the forecast, for example for flu-related medicines, sunscreens,

mosquito products and insecticides. Therefore, Section 6.3 focuses on the improvement of the

forecast accuracy of seasonal products with external data sources, namely weather conditions

and Google Trends data belonging to the third priority demand drivers.

The remaining demand drivers (i.e. annual holiday calendar and the other demand drivers of

priority 3) are expected to have an impact on the demand of pharmaceutical products. As an

example, the healthcare policy (i.e. the reimbursement by RIZIV²⁰) is a driver to prescribe or

deliver certain products to pharmacies, because Multipharma is always obliged to propose the

‘cheapest’ products. Another example relates to products with no fixed price (OTC and

parapharmacy) for which Multipharma applies its own pricing policy, which may influence

the way pharmacists propose certain products to the end consumer. Moreover, the pricing and

commercial policy of Multipharma may depend on the pricing and promotions of competitors.

However, these data are difficult to obtain and as a consequence it will be challenging to model

these demand drivers. On the other hand, some demand drivers like the packaging of products

can be easily modelled because the information is relatively easy to obtain. Boxes of

Dafalgan that contain twice the amount of medication of the original packaging create a

new SKU, and this might have an impact on the originally packaged SKU. In short, for some

demand drivers of priority three it will be worth capturing, monitoring and analyzing data, but

this is out of the scope of this master dissertation and might be a subject for further research.

The following section will explain the meaning²¹ of the first priority promotional events in the

environment of Multipharma and examines whether the new way of modelling different types of

promotions improves the demand forecast accuracy compared to the base forecast as

described in section 5.4. Note that these promotions are trade promotions offered by the

suppliers resulting in forward buying of Multipharma and the pharmacies. Forward buying

results in large orders during the promotion period followed by very small orders thereafter

and thus will not increase the supply chain’s revenue (Chopra & Meindl, 2013). Moreover,

pharmacists’ order variability will be much larger than customers’ order variability because

the promotions are not passed through to the customers. The second part of this chapter

describes the analysis of the second priority demand driver, seasonality. It will be investigated

whether the demand forecast accuracy of seasonal products (flu-related medicines, sunscreens,

mosquito products and insecticides) improves when external data sources are used, namely weather

conditions and Google Trends data.

In order to do a proper analysis, the data are divided into a training and a holdout sample. In

SAS Forecast Studio, the MAPE of a selection of products is weighted; it is based on the

importance of each SKU in the overall selection of products. Moreover, the weighted MAPE

is calculated based on the historical period and a holdout sample of data at the end of each

time series that was not used to construct models. Using a holdout sample to judge accuracy is

often referred to as an honest assessment because it simulates fitting and deploying a model

and then judging the accuracy in a live environment. In this case study the holdout sample is

chosen to be three months since this corresponds to the normal forecast horizon.

20 The National Institute for Health and Disability Insurance in Belgium

21 Source: Supply Chain department, personal communication, March 26, 2017.
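
The split into a training and a holdout sample can be sketched as follows. This is a minimal illustration: with weekly data, the three-month holdout is assumed here to correspond to the last 13 weeks of each series, and the function name is hypothetical.

```python
def split_holdout(series, holdout_weeks=13):
    """Split a weekly series into (training part, holdout part).

    The holdout is taken from the end of the series and is not used to fit models.
    """
    if not 0 < holdout_weeks < len(series):
        raise ValueError("holdout must be shorter than the series and positive")
    return series[:-holdout_weeks], series[-holdout_weeks:]


if __name__ == "__main__":
    # Hypothetical two-year weekly series.
    train, holdout = split_holdout(list(range(104)))
    print(len(train), len(holdout))
```

Models are then fitted on the training part only and judged on how well they predict the holdout part.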

6.2 Internal data

6.2.1 Passage Délégué

‘Passage délégué’ is a term that describes a visit of a commercial representative of a product

supplier or a specialized company²² in a POS of Multipharma. The goal of the commercial

representative is to explain the products, the current commercial conditions and promotions

(discount, free samples, visual display, and gifts) and eventually to propose placing an order

via the commercial representative or via Multipharma’s DC, depending on previous

negotiations with Multipharma. In both cases, the Key Account Manager of a supplier

proposes a list of products and commercial conditions to the Category Manager at

Multipharma. The Category Manager validates the products and the period of ‘passage

délégué’ during which the commercial representatives can visit the POS and present the

selected products. Based on the Excel list of approved products, Category Manager Assistants

update the commercial conditions in the ERP system by including the discount for the

validated period of ‘passage délégué’. They communicate this agreement to all POS of

Multipharma and the internal departments. During the period of ‘passage délégué’ the

supplier will plan visits of its commercial representatives. As this process is managed by

the supplier itself and possibly concerns a large number of POS, it is impossible to forecast

adequately on which day, week or month suppliers are going to visit the pharmacies, or

the precise impact on demand, i.e. the height of the promotional peak. Note that although

it causes a peak in demand, most of the time ‘passage délégué’ does not increase the sales

revenue for Multipharma because the pharmacists do not pass through the promotion to the

customers. Multipharma will experience a cannibalization of its sales when the discount

disappears because pharmacists have increased the stock during a ‘passage délégué’ period

and continue to serve their customers from this stock the period thereafter.

22 Sometimes activities of a supplier (e.g. Omega Pharma) are outsourced to specialized companies.



However, Multipharma knows the period in which a ‘passage délégué’ might occur for

certain products, which products will be discounted and the amount of the discount. This is

an interesting source of internal data in the form of semi-structured Excel files (cf. Figure

6.3). According to the Supply Chain department the aim is not to predict when exactly a peak

might occur and the height of the peak due to the promotional event, because this depends on

the agenda and conditions of the supplier. A business rule for inventory management can be

applied during the period of ‘passage délégué’. The business rule specifies how much extra

stock needs to be ordered and thus how the orders towards suppliers need to be adapted with

regard to additional discounts applicable during ‘passage délégué’. Instead, the purpose of

using the internal Excel data sheets is to model the promotional events based on the ‘passage

délégué’ periods and to exclude the peaks in historical order quantities due to these

promotional events. Hence, the aim is to reduce the variability of future forecasts for periods

in which no ‘passage délégué’ takes place. This is in line with a study of Knuth et al. (2014) stating

that outliers can affect and skew forecast accuracy, and therefore it might be useful to exclude

them from the overall forecasting calculations.

Hence, ‘passage délégué’ promotions create variability in the product forecast due to the

changing order behavior of pharmacists who might order more than normal in order to benefit

from the commercial conditions. This behavior of pharmacists during the promotional periods

is not an accurate predictor for the future and should be excluded from the historical data to

obtain a more accurate base forecast. Figure 6.4 represents the procedure, which is followed

for the analysis of ‘passage délégué’ promotions and this reasoning can be extended to

‘Promo O’ and ‘Action iU’ promotions, which will be explained thereafter.



Figure 6.4: Followed procedure of analysis

Figure 6.5 represents the time series (2015-2016) of the product ‘Vicks: Vaporub 100g’ with

‘passage délégué’ promotions in its history during two periods, indicated in red on the

horizontal axis. It can be seen from the graph the first week following the ‘passage délégué’

period of the commercial representative indicates a clear peak of 2,961 SKUs. In contrast, the

second period does not indicate an increase in demand due to the passage of the commercial

representative. This may be due to the fact that the pharmacists do not want to keep much

extra of this product in stock, maybe because they expect the product will sell less during this

time of the year or they still have a significant amount of inventory left due to the previous

promotion. Figure 6.6 represents the historical order quantities of the SKU excluding the

demand during ‘passage délégué’ periods. Judging by the scale of the y-axis, the demand
is more level than in the previous time series. This visual insight suggests that
excluding the peak from the history might reduce the variability of future forecasts.
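The exclusion illustrated in Figures 6.5 and 6.6 can be sketched in a few lines of Python. This is a minimal illustration and not the SAS code used in the case study: the two periods are those of Table 6.1, whereas the weekly order quantities, the Monday week stamps and the helper names are hypothetical. The coefficient of variation is used here only as a simple stand-in for how level the series is.

```python
from datetime import date
from statistics import mean, pstdev

# 'Passage délégué' periods of Vicks Vaporub 100g (Table 6.1).
PASSAGE_PERIODS = [
    (date(2015, 11, 2), date(2015, 11, 30)),
    (date(2016, 2, 29), date(2016, 3, 30)),
]

# Hypothetical weekly order quantities, keyed by the Monday of each week.
weekly_orders = [
    (date(2015, 10, 19), 100),
    (date(2015, 10, 26), 120),
    (date(2015, 11, 2), 110),
    (date(2015, 11, 9), 900),   # illustrative promotional peak
    (date(2015, 11, 16), 130),
    (date(2015, 11, 23), 90),
    (date(2015, 11, 30), 100),
    (date(2015, 12, 7), 110),
]


def in_promotion(week, periods):
    """True if the week stamp falls inside any promotional period."""
    return any(start <= week <= end for start, end in periods)


def exclude_promotions(rows, periods):
    """Replace the quantity by None (missing) in promotional weeks."""
    return [(week, None if in_promotion(week, periods) else qty)
            for week, qty in rows]


def coefficient_of_variation(values):
    """Standard deviation relative to the mean (lower means more level)."""
    return pstdev(values) / mean(values)


masked = exclude_promotions(weekly_orders, PASSAGE_PERIODS)
kept = [qty for _, qty in masked if qty is not None]
cv_incl = coefficient_of_variation([qty for _, qty in weekly_orders])
cv_excl = coefficient_of_variation(kept)
print(f"CV incl. promotions: {cv_incl:.2f}, excl. promotions: {cv_excl:.2f}")
```

In this toy series the coefficient of variation drops once the promotional weeks are set to missing, which mirrors the visual comparison of the two figures.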

Steps shown in Figure 6.4:
1. Visually analyze the effect of a 'passage délégué'
2. Understand the forecast mathematically
3. Select all 'passage délégué' products from SAP
4. SAS coding in SAS Enterprise Guide: creating promotional events and excluding the volatile order data
5. Compare the accuracy of the base forecast with the new forecast using SAS Forecast Studio
6. Discuss the results with Supply Chain department



Period 'Passage Délégué'

02/11 - 30/11/2015

29/02 - 30/03/2016

Table 6.1: Promotional Periods of Vicks Vaporub 100g

Figure 6.5: Time series Vicks Vaporub 100g incl. promotional order quantities

Figure 6.6: Time series Vicks Vaporub 100g excl. promotional order quantities

After the visual insights, the mathematical logic will be explained based on the research paper

of Cachon and Fisher (1997). Note that this interpretation is similar for ‘Promo O’ and

‘Action iU’ and will not be repeated. To exclude the peak, the promotional event is
modelled by creating an additional indicator variable: $I_s^w = 1$ indicates that a promotion
occurs in week $w$ for SKU $s$; otherwise $I_s^w = 0$. A comment on notation: a $w$ in the
superscript refers to a week between December 31, 2012 and March 13, 2017, and a subscript $s$
refers to an SKU. Let $D_s^w$ denote actual demand and $N_s^w$ normal demand. When there is no
promotion, $N_s^w$ equals actual demand; otherwise it equals the forecast of normal demand:

$$N_s^w = \begin{cases} D_s^w & \text{if } I_s^w = 0 \\ F_s^w & \text{if } I_s^w = 1 \end{cases}$$

Let $F_s^w$ equal the forecast of normal demand (i.e. non-promotion demand) for week $w$, where
each $F_s^w$ is evaluated using a simple ESM:

$$F_s^{w+1} = \alpha N_s^w + (1 - \alpha) F_s^w \qquad (6.2)$$

where $\alpha$ is a constant. This forecast makes several assumptions about the demand forecast.
Promotion demand has little effect on subsequent normal demand, because a forecast is not
updated if the demand occurred during a promotion. Finally, it is assumed that a single constant
$\alpha$ can effectively apply to all SKUs. The parameter $\alpha$ is chosen to minimize a measure of
forecast errors, defined by the difference between the actual demand and the forecast, in the
calibration period.
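The logic of a simple ESM that is not updated in promotional weeks can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure executed in SAS: the initial level (first non-promotion demand), the error measure (mean absolute error over non-promotion weeks), the grid of candidate constants and all function names are hypothetical choices made for illustration.

```python
def esm_forecasts(demand, promo, alpha):
    """One-step-ahead simple exponential smoothing that skips promotion weeks.

    demand: weekly quantities; promo: booleans (True = promotion week).
    Returns (forecasts, next_forecast), where forecasts[i] is the forecast
    made for week i before its demand is observed.
    """
    # Assumption: initialise the level with the first non-promotion demand.
    level = next(d for d, p in zip(demand, promo) if not p)
    forecasts = []
    for d, p in zip(demand, promo):
        forecasts.append(level)
        if not p:  # promotion weeks leave the forecast unchanged
            level = alpha * d + (1 - alpha) * level
    return forecasts, level


def mean_abs_error(demand, promo, alpha):
    """Mean absolute forecast error over the non-promotion weeks."""
    forecasts, _ = esm_forecasts(demand, promo, alpha)
    errors = [abs(d - f) for d, f, p in zip(demand, forecasts, promo) if not p]
    return sum(errors) / len(errors)


def fit_alpha(demand, promo, grid):
    """Pick the smoothing constant from `grid` with the smallest error."""
    return min(grid, key=lambda a: mean_abs_error(demand, promo, a))


# Hypothetical series with one promotional peak in the third week.
demand = [100, 100, 500, 100, 100]
promo = [False, False, True, False, False]
with_flag, _ = esm_forecasts(demand, promo, alpha=0.5)
without_flag, _ = esm_forecasts(demand, [False] * 5, alpha=0.5)
print(with_flag)     # the peak does not disturb the forecast
print(without_flag)  # the peak inflates the following forecasts
```

With the promotion flag the forecast stays at the normal level of 100, whereas without it the peak carries over into the forecasts of the two following weeks.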

The mathematical logic is the reasoning behind the coding of promotional events in SAS

Enterprise Guide and the execution of the forecast in SAS Forecast Studio thereafter. First,

promotional events are modelled based on the period of ‘passage délégué’ and added to the

baseline model. Uncommon data (i.e. the peaks) are filtered out based on these events in the

form of additional variables. Hence, a new dependent variable ‘Quantity excl. passage’ is

created that will be used as the variable to be forecasted. To detect the impact of the model

with the new dependent variable, a selection of products is chosen with ‘passage délégué’

promotions in their history. This selection includes 2,464 distinct SKUs. As explained in

section 5.2, SAS executes forecasts for the entire dataset based on four different

classifications and the SKU level. For each SKU the best forecast is selected. However, the

aim of this section is to compare the accuracy of the baseline model (cf. Section 5.4) and the

more advanced model for a selection of products. Therefore, it seems reasonable to assume
that, to compare relatively small selections of products, the forecasts can be executed on the SKU
level. Hence, in this case study forecasts will always be executed on the SKU level (see footnote 23), although in
reality classifications are relevant when the entire dataset is considered.

23 Note that this does not imply that an individual forecast is the best for each SKU in the selection. However, the individual forecasts are the best for the selection of products under consideration.



The analysis in SAS Forecast Studio is executed as follows. First, the forecast is executed for

the selection of products based on the original model (cf. Section 5.4) with the original ‘order

quantity’ as dependent variable. As explained in chapter 5 (Section 5.3), SAS Forecast Studio

chooses from a selection of models based on the characteristics of the product it is dealing

with. The previously described equation 6.2 only considers simple exponential smoothing
models. Considering only simple exponential smoothing models for the entire dataset would be
too simplistic and would cause serious forecasting errors. Conversely, programming the forecast
manually for a large selection of products, where each product possibly needs a different
forecasting model, would be too computationally expensive and is considered to be out of the
scope of this master dissertation. SAS Forecast Studio is preferred over manual

programming because it is much faster and offers accurate results for most of the products,

because different models (i.e. ARIMA, ESM and IDM) are fitted to the data and the
best-performing model is chosen based on the characteristics of each product. However, in some
exceptional cases business knowledge can offer additional insight, for example when a
moving average becomes more appropriate than exponential smoothing. Because
this master dissertation is written in continuous dialogue with the Supply Chain department
of Multipharma, SAS Forecast Studio will be used, and only in exceptional circumstances will the
model of a product be modified manually (cf. Section 5.3). For the selection of products, the
baseline model forecast in SAS Forecast Studio yields a weighted MAPE of 59.68.


Second, the order demand is reforecast with the ‘Quantity excl. passage’ as dependent

variable. Hence, similar to equation 6.2, the historical order data during the promotional

period are excluded from the overall historical data to execute the forecast in order to obtain a

more accurate demand forecast, by reducing the variability of demand. The weighted MAPE

of the updated forecast is 39.04. This means the forecast error is reduced by 34.6 per cent for

the selection of products with at least one ‘passage délégué’ in its history. The distributions of

the MAPE of both forecasts are presented in figure 6.7. Group 1 represents the distribution of

the MAPE of the baseline model forecast, whereas Group 2 represents the distribution of the

MAPE of the more advanced model forecast with the new dependent variable. The graphics

include density plots on which the Kernel Density Estimation (KDE) was added to determine

the distribution of the data points. Figure 6.7 shows the distribution of the MAPE’s is shifted

to the left for the entire range of products between zero and 200. Hence, it can be concluded



inserting the additional demand driver has a positive effect on the forecast accuracy for the

selected SKUs.

Figure 6.7: Passage Délégué: Distribution of the MAPE
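To make the accuracy measure concrete, the sketch below computes a MAPE per SKU and then aggregates it over a selection of products. The weighting scheme shown, each SKU weighted by its order volume, is an assumption made for illustration; how SAS Forecast Studio weights the MAPE is not restated in this section. The SKU names and quantities are hypothetical.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error (in per cent) for one SKU."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts) if a != 0]
    return 100 * sum(terms) / len(terms)


def weighted_mape(mape_by_sku, volume_by_sku):
    """Volume-weighted average of per-SKU MAPEs (weighting is an assumption)."""
    total = sum(volume_by_sku.values())
    return sum(mape_by_sku[s] * volume_by_sku[s] for s in mape_by_sku) / total


# Hypothetical holdout data for two SKUs.
actuals = {"SKU_A": [300, 300, 300], "SKU_B": [40, 20, 40]}
forecasts = {"SKU_A": [240, 360, 300], "SKU_B": [20, 40, 40]}

mape_by_sku = {s: mape(actuals[s], forecasts[s]) for s in actuals}
volume_by_sku = {s: sum(actuals[s]) for s in actuals}
overall = weighted_mape(mape_by_sku, volume_by_sku)
print(mape_by_sku, overall)
```

In this example the high-volume SKU dominates the aggregate, so a poorly forecast slow mover has limited influence on the overall figure.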

6.2.2 Promo O

‘Promo O’ (or ‘Promo Obligatoire’) is a term used by Multipharma to describe promotions

originating from suppliers which are active and mandatory in all Multipharma pharmacies

during a period of one month. The ‘Promo O’ starts on the first Friday of the month and it

lasts until the first Friday of the following month (i.e. when the following promotion might

start). A supplier has purchased a particular promotional location (counter, shelf, display, etc.)

and proposes a selection of products for the promotion. The Category Manager of

Multipharma validates the product selection. Based on historical sales and size of the

promotional location, a fixed quantity for each product is shipped to all pharmacies regardless

of the current stock in the POS. The defined fixed quantity per pharmacy is shipped out to all

pharmacies one to three weeks before the start of the ‘Promo O’. However, for ‘slow movers’

a variable quantity shipment is done, based on the amount each pharmacy already has in

stock. The purpose of doing a variable shipment is to reduce a potential overstock in the POS.
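The calendar rule described above (from the first Friday of a month until the first Friday of the following month) can be sketched as follows. The function names are hypothetical; the rule itself is taken from the text.

```python
from datetime import date, timedelta


def first_friday(year, month):
    """Date of the first Friday of the given month."""
    first_day = date(year, month, 1)
    # date.weekday(): Monday = 0, ..., Friday = 4
    return first_day + timedelta(days=(4 - first_day.weekday()) % 7)


def promo_period(year, month):
    """Start and end date of a promotion that starts in the given month."""
    start = first_friday(year, month)
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return start, first_friday(next_year, next_month)


print(promo_period(2016, 4))
```

For April 2016 the rule yields 1 April to 6 May 2016, which corresponds to the promotional period listed in Table 6.2.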



Figure 6.8 represents the time series (2015-2016) of the product ‘Tilman: Elimin Fresh Thee’

with a ‘Promo O’ in its history during the month of April, which is indicated in red on the

horizontal axis. These time series indicate that promotions tend to be quick surges in demand,

because they are often shipped over only one day. As expected, Multipharma anticipates the
‘Promo O’ period about two weeks in advance: this is the peak of 368 SKUs on the 24th of
March. Again, the challenge to forecast products with ‘Promo O’ promotions in the past is to

exclude peaks from demand thereby decreasing variability and generating better forecasts

outside promotional periods. It is important to exclude the two weeks in advance of the

promotion because the products are shipped out to all pharmacies two to three weeks before

the start of the ‘Promo O’. Figure 6.9 represents the time series when the entire period from

two weeks before the promotion until the end of the promotion is excluded. The history is

now more level compared to the previous time series when considering the scale of the y-axis.

Period ‘Promo O'

1/04 - 6/05/2016

Table 6.2: Promotional Period of Tilman Elimin Fresh Thee

Figure 6.8: Time series Tilman Elimin Fresh Thee incl. promotional order quantities



Figure 6.9: Time series Tilman Elimin Fresh Thee excl. promotional order quantities

Using SAS coding a similar promotional ‘event’ is created as with ‘passage délégué’ based on

the start and end date as mentioned in the Excel sheets. However, this time two weeks before

the start of the promotion are also considered as part of the promotional event and the

corresponding order quantities need to be excluded as well. A new variable ‘Quantity excl.

promo’ was created based on the ‘event’ and added to the base table. To detect the impact of

the additional demand driver ‘Promo O’, all products for which a ‘Promo O’ promotion

occurred in their history are selected. The selection consists of 363 unique SKUs. The forecast

with the original ‘order quantity’ as dependent variable is compared with the forecast having

the new variable ‘Quantity excl. promo’ as dependent variable. The latter excludes the highly

volatile order quantities during a promotional event from the historical order data in order to

get a more accurate forecast. The output of the base forecast in SAS Forecast Studio has a

weighted MAPE of 128.64. The output of the updated forecast model in SAS Forecast Studio

has a weighted MAPE of 54.28. The forecast error for the updated forecast model is reduced

by 57.8 per cent. Figure 6.10 represents the distributions of the MAPE of both forecasts and

has a similar interpretation as with the ‘passage délégué’ promotions. Likewise, this figure

demonstrates that the distribution is shifted to the left for the entire range of products with at

least one ‘Promo O’ in their history. Hence, it can be concluded inserting the additional

demand driver has a positive effect on the forecast accuracy for the selection of SKUs.
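The extended promotional event, which also covers the two weeks before the start date, can be sketched as below. This is an illustration and not the SAS code of the case study: the period is that of Table 6.2 and the peak of 368 is taken from the text, but its assignment to the week starting on Monday 21 March, the other quantities and the function names are hypothetical.

```python
from datetime import date, timedelta


def event_window(start, end, lead_weeks=2):
    """Promotional event: from `lead_weeks` before the start until the end."""
    return start - timedelta(weeks=lead_weeks), end


def quantity_excl_promo(rows, start, end, lead_weeks=2):
    """Set the quantity to None for weeks whose Monday lies in the event window."""
    first, last = event_window(start, end, lead_weeks)
    return [(week, None if first <= week <= last else qty)
            for week, qty in rows]


# 'Promo O' period of Tilman Elimin Fresh Thee (Table 6.2).
PROMO_START, PROMO_END = date(2016, 4, 1), date(2016, 5, 6)

# Hypothetical weekly order quantities keyed by the Monday of each week.
rows = [
    (date(2016, 3, 7), 40),
    (date(2016, 3, 14), 35),
    (date(2016, 3, 21), 368),  # anticipation peak before the promotion
    (date(2016, 3, 28), 60),
    (date(2016, 4, 4), 30),
    (date(2016, 5, 9), 45),
]

window = event_window(PROMO_START, PROMO_END)
cleaned = quantity_excl_promo(rows, PROMO_START, PROMO_END)
print(window)
print(cleaned)
```

Without the two-week lead the anticipation peak would remain in the history, because it occurs before the official start date of the promotion.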



Figure 6.10: Promo O: distribution of the MAPE

6.2.3 Action iU

‘Action iU’ is a term used by Multipharma to describe promotions from suppliers, which are

active and mandatory in all iU Points of Sales during a period of one month. The ‘Action iU’

starts on the first Friday of the month and it lasts until the first Friday of the following month

(i.e. when the following promotion might start). A supplier has purchased a particular

promotional location (counter, shelf, display etc.) and proposes a selection of products for the

promotion. Again, the Category Manager validates the product selection. In contrast with

‘Promo O’, there are no fixed quantities per POS that are shipped out for ‘Action iU’, instead

each POS orders the stock they consider necessary. Hence, the order delivered to each POS is

a variable quantity depending on the needs of the Shop Manager. Most of the time the Shop

Manager orders the stock for promotion one week before the promotion starts, because some

‘iU’ products are not shipped on a daily basis. In addition, they need some time to secure the

items, e.g. putting anti-theft tags on the products. In most cases, when a new product becomes

part of an ‘Action iU’ promotion Store Managers order it as soon as it is available. The effect

of ‘Action iU’ is similar to ‘Promo O’, however, the peaks in demand are more unpredictable

and can vary more in size or on the moment when they occur. Again, it is important to

exclude this effect from historical data in order to decrease variability and generate better



forecasts outside promotional periods. A safe range is to exclude the historical data two weeks

before the actual ‘Action iU’ period takes place until the end of the ‘Action iU’ period.

Figure 6.11 represents the time series (2016) of a relatively new product ‘Vichy Dercos

Shampoo’ with an ‘Action iU’ in its history during the month of July, which is indicated in

red on the horizontal axis. As expected unstable demand occurs two weeks before the start of

the promotion. Figure 6.12 represents the time series when the entire period from two weeks

before the start of the actual promotion until the end of the promotion is excluded. The history

is now more level compared to the previous time series.

Period ‘Action iU'

01/07 - 31/07/2016

Table 6.3: Promotional Period of Vichy Dercos Shampoo

Figure 6.11: Time series Dercos Shampoo incl. promotional order quantities

Figure 6.12: Time series Dercos Shampoo excl. promotional order quantities



To see the impact of the additional demand driver a selection of 515 ‘Action iU’ products

with at least one ‘Action iU’ promotion in their history will be considered and analyzed. The

baseline model forecast, with the original dependent variable, is executed for this selection of

products. The weighted MAPE of the baseline model forecast is 73.05. A similar ‘event’ as

with the previous promotional types is inserted into the base table by making use of SAS

coding based on the provided information in the form of Excel sheets. A new forecast is

executed with the additional information of ‘Action iU’ promotions (i.e. with a new

independent variable). Hence, the historical order data during the period of an ‘Action iU’

promotion and two weeks before are excluded from the overall historical data to execute the

forecast in order to obtain more accurate estimates. SAS Forecast Studio calculates the

weighted MAPE of the latter forecast to be 64.08. Consequently, the additional demand
driver based on the Excel files of ‘Action iU’ products is responsible for a 12.3 per cent
reduction of the forecast error (from 73.05 to 64.08). The distribution of the MAPEs is represented in figure 6.13.

From this figure, it can be concluded the additional demand driver ‘Action iU’ improves the

forecast accuracy for the selection of products.

Figure 6.13: Action iU: distribution of the MAPE



6.2.4 Summary of the results

A summary of the MAPEs is presented in table 6.4.

                   Baseline model   Updated model   % Change
Passage Délégué    59.68            39.04           -34.6
Promo O            128.64           54.28           -57.8
Action iU          73.05            64.08           -12.3

Table 6.4: Summary MAPE’s promotional demand drivers
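The percentage changes in Table 6.4 follow directly from the two weighted MAPEs of each promotion type. The short check below recomputes them; the variable and function names are illustrative.

```python
# Weighted MAPEs from Table 6.4: (baseline model, updated model).
TABLE_6_4 = {
    "Passage Délégué": (59.68, 39.04),
    "Promo O": (128.64, 54.28),
    "Action iU": (73.05, 64.08),
}


def pct_change(baseline, updated):
    """Relative change of the forecast error in per cent."""
    return (updated - baseline) / baseline * 100


changes = {name: round(pct_change(b, u), 1) for name, (b, u) in TABLE_6_4.items()}
print(changes)
```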

Several interesting observations follow from the analysis of ‘Passage délégué’, ‘Promo O’

and ‘Action iU’ promotions. About 23 per cent of the SKUs in the study have had at least one
‘Passage délégué’ or ‘Promo O’ in their history, but the two promotion types never occur for the same product.

About 10 per cent of the existing parapharmaceutical SKUs have had at least one ‘Action iU’

promotion in their history. Weekly demand during a promotional week is often dramatically

greater than weekly mean demand: weekly demand during a promotion is on average 111 per

cent higher than weekly demand in normal circumstances.

The previous examples show additional internal data in the form of semi-structured Excel

files can improve the demand forecast accuracy. The Excel data are used to model three types

of promotional events to incorporate into the baseline model. These events are used in the first

place to exclude demand variability in order to obtain a more accurate forecast for the future.

This technique is superior to the previous described basic promotional information (Section

5.4), which only recognizes the peak but ignores demand variability during the entire

promotional period. Hence, the hypothesis that big data analytics can offer a significant

benefit for Multipharma is validated because adding additional internal data sources reduces

the forecast error. To summarize, when internal data is used in a smart way to extend the

basic forecasting model, it can improve the forecast accuracy of a selected number of

products. It is reasonable to make a proposal to the Supply Chain department to structure the

Excel data files in a better way to automatically inject it into SAS, as it is shown this data

raises demand forecast accuracy. Knuth et al. (2014) state that companies that do not
properly flag or monitor outliers in their demand patterns will over time distort their inventory
forecasting accuracy, which can create large quantities of excess inventory that
cost money and hurt the profit margin. Therefore, in chapter 7 the impact of the

improved forecast accuracy on the safety inventory and the carrying costs will be explained.

6.3 External data

As mentioned in section 6.1 of this chapter, the second priority according to the Supply Chain

department is to improve the forecast of seasonal products. Seasonal products in the context

of Multipharma are pharmaceutical products for which the demand fluctuates over time with a

repeating pattern. Examples of this kind of product are flu-related medicines, sunscreens,

mosquito products and insecticides (for cats and dogs). As already mentioned in the literature

(Chapter 2, Section 2.3), weather can be an influencer of demand shaping. The impact of

weather is also clearly pronounced when selling pharmaceutical products, e.g. when it is a bad

summer there might be a lot of sun products in stock at the end of the summer. By

consequence, at the end of a season, one will try to get all excess stock onto the market, which

is a suboptimal manner of inventory management. This section will investigate if historical

weather conditions can offer additional value to improve the forecast accuracy of seasonal

products. External data is obtained in the form of Excel files from KMI. KMI is a Belgian

Federal Institute that carries out scientific research in the field of meteorology. The data

obtained contains the daily average temperature and precipitation from January 2012 till

December 2016 in the region of Ukkel.

Besides weather, there is another important source of data to investigate with regard to

seasonal products. The prominent presence of the Internet in the majority of people’s lives
makes it possible to derive trends and patterns from their search data, e.g. via Google Trends.

This is exactly what Google already recognized and therefore created Google Flu Trends

(GFT). The idea behind GFT is that information seeking behaviour on the Web reveals the

influenza status of the population. Ginsberg et al. (2009) evidenced that search queries

outperform simple autoregressive models based on historical data of flu levels. Eysenbach

(2006) revealed a high correlation between clicks on sponsored search results for flu-related

keywords and epidemiological data from the Canadian flu season from 2004 to 2005. Studies
of this kind suggest there is potential to improve demand forecast accuracy with an
additional demand driver, Google Trends data, based on the insights of GFT. Hence, we
expect Web searches can predict more precisely when the influenza season
starts and consequently when the sales of flu-related products will start to pick up. The idea
behind GFT will be extended to the other types of seasonal products (sunscreens, mosquito
products and insecticides).

Although SAS Forecast Studio can easily absorb seasonality, using Holt-Winters’ seasonal

exponential smoothing models, seasonal ARIMA models and derivatives of these models (cf.

Section 5.3), it will be investigated whether additional data sources, namely Google Trends

and weather conditions, can improve the forecast accuracy compared to the baseline model,

e.g. using an ARIMAX model. Hence, the first research question is whether the additional

data sources can reduce the forecast error by extending the baseline model with additional

explanatory variables, thereby using advanced ARIMAX models. Moreover, it should be noted

the demand of the investigated seasonal products never drops to zero, because products such

as painkillers, sunscreens, mosquito products and insecticides are sold during the entire year.

That is why it is very important to make the forecast during off-peaks as accurate as possible.

With regard to the painkillers, extreme peaks during the winter will cause variability during

off-peaks. The forecast will predict higher sales during off-peak months because of the

extreme peaks, which causes variability in the forecast. Hence, instead of improving the

predicting behavior in general, the data sources might be used to make the base forecast more

accurate following the same reasoning as in section 6.2. Therefore, the second research

question is whether filtering extreme peaks out of the historical dataset by making use of

Google Trends data might improve the forecast accuracy. The same reasoning as with flu-

related products can be followed for sunscreens, mosquito products and other insecticides (for

cats and dogs).

To obtain a proper analysis the selections of SKUs corresponding to the seasonal types that

will be investigated need to be identified and filtered out of the entire SKU dataset based on

the IMS classification. Together with the Supply Chain department the IMS top level and the

subgroups for each seasonal type are identified and represented in table 6.5.

Product        IMS level   N SKU
Flu            J01         199
Sunscreen      83F         288
Mosquito       R06         30
Insecticides   N/A         45

Table 6.5: IMS class corresponding to seasonal type



Because insecticides belong to the top level ‘Other’ (N/A), a different approach is used to

identify a more precise selection. Products with a label containing ‘Frontline’, ‘Advantix’ and

‘Drontal’ are identified as being insecticides used for cats and dogs.
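The label-based selection of insecticides can be sketched as a simple keyword filter. The three brand names come from the text; the product labels and the function name are hypothetical, and the case-insensitive match is an assumption.

```python
PET_INSECTICIDE_BRANDS = ("frontline", "advantix", "drontal")


def is_pet_insecticide(label):
    """True if the product label contains one of the brand names."""
    label = label.lower()
    return any(brand in label for brand in PET_INSECTICIDE_BRANDS)


# Hypothetical product labels.
labels = [
    "Frontline Combo Chat 3 pip",
    "ADVANTIX 4-10kg",
    "Drontal Dog 2 tabs",
    "Vicks Vaporub 100g",
]
selection = [label for label in labels if is_pet_insecticide(label)]
print(selection)
```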

6.3.1 Selecting search terms

Weather conditions are considered as incontestable data, measured by KMI. In contrast,

Google Trends data is data that can be obtained by every individual. The time series of the

search term ‘Griep’ in Belgium is represented in figure 6.14 from January 2012 till December

2016. Although some significant peaks can be distinguished during the winter months, it is

doubtful if only one search term is useful to predict diseases in the short-term and the

corresponding sales of drugs related to these diseases. This is the reason why Google
built GFT on a set of highly correlated queries. GFT is the result of Google Correlate, which
looks for highly correlated search terms. Table 6.6 presents the ten search
terms (see footnote 24), in decreasing order of correlation with the search term ‘Influenza-like Illness’, that

make up the search terms for GFT. Figure 6.15 visualizes the positive correlation of the two

most highly correlated search terms. Figure 6.16 represents the sum of all individual queries

that make up the data for GFT in Belgium.

Figure 6.14: Google Trends data: ‘Griep’

24 Source: Google Flu Trends (



Search Term             Correlation
Influenza type a        0.9069
Symptoms of flu         0.9038
Flu duration            0.9033
Flu contagious          0.8919
Flu fever               0.8851
Treat the flu           0.8831
How to treat the flu    0.8830
Signs of the flu        0.8815
How long is the flu     0.8775
Symptoms of the flu     0.8741

Table 6.6: Correlation of search terms

Figure 6.15: Graphical representation of correlation of 2 search terms

Figure 6.16: GFT Belgium

However, using GFT as an additional predictor variable in a company has a drawback.

Google stopped producing data for GFT in 2015. Companies that want to use additional data

sources, such as GFT, to improve demand forecast accuracy need real-time and continuous

data over time. Hence, a limited historical sample of GFT will be insufficient to add to the

historical dataset. However, the underlying principle of GFT can be used to make up our own

trends data based on correlated search queries. In addition, the logic behind GFT can be

extended to other product types in our analysis as well. What follows describes how the

search terms are obtained for the different types of seasonal products.



Google Trends data can be filtered by time and region. The search terms need to be

considered in the period from January 1, 2012 till December 31, 2016. Note that the actual

dataset contains historical data from December 31, 2012 till March 13, 2017. However, the

dataset provided by KMI contains data until December 2016. Therefore, this analysis is based
on a subset of the entire dataset excluding the historical data of 2017 (see footnote 25), and the holdout sample
is considered to be the last 12 weeks of 2016. Google Trends data are only available on a

weekly basis, however, this corresponds with the time aggregated historical order data of

Multipharma, which is also available on a weekly basis. Using the search terms in the context

of Multipharma means it is only relevant when applied to Belgium. Hence, the search terms

need to be filtered based on the country and need to contain both Dutch and French search

terms, because Multipharma is represented in both the Flemish and French part of the country.

In some cases, people are more likely to use English search terms. Hence, English terms will

be considered as well to be relevant search queries. Note that the search terms for each

seasonal product type are selected based on a brainstorming session with Dutch and French

people from the Supply Chain department. The most likely search terms for which Google

Trends returns the most significant search volumes are selected. It should be noted that people
are more likely to use words without accents, spaces or hyphens when using the search
engine, as they want to obtain the result as quickly as possible. This remark is important: when
searching for the right queries, all kinds of spelling variants are considered in order to
determine the most appropriate queries.
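The spelling variants mentioned above (with or without accents, spaces or hyphens) can be generated systematically. The sketch below is one possible way to enumerate candidate queries before checking them in Google Trends; the function names are hypothetical and the resulting list still needs manual screening.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove diacritics, e.g. 'crème' becomes 'creme'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spelling_variants(query):
    """Variants of a query with/without accents and with space, hyphen or no separator."""
    variants = set()
    for base in (query.lower(), strip_accents(query.lower())):
        for separator in (" ", "-", ""):
            variants.add(re.sub(r"[\s\-]+", separator, base))
    return variants


print(sorted(spelling_variants("anti-moustique")))
print(sorted(spelling_variants("crème solaire")))
```

For ‘anti-moustique’ this produces the three spellings that also appear among the mosquito-related search terms: ‘anti-moustique’, ‘anti moustique’ and ‘antimoustique’.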

When considering flu-related medicines, nine queries were selected related to ‘flu’ and are

summarized in Table 6.7. Following the procedure to obtain GFT, the correlation between the

best Belgian search term, that is the term for which the sum of the searches over the entire

period is largest, and all the other terms is calculated using SAS Enterprise Guide. The best

search term is represented as the first term in the table with a correlation of one. Based on the

correlations a selection of the individual search terms, which are significantly correlated with

the best search term, can be chosen to make up the trends data for the selection of products.

All search terms having a positive correlation with a significance level lower than 0.0001 with

‘flu’ will be used to obtain the additional explanatory variable to include in the baseline

model. Hence, ‘griepvaccin’ and ‘vaccin grippe’ will be left out of the selection to obtain the
desired result. This is in line with the research of Polgreen et al. (2008) stating that most
influenza vaccination occurs before the influenza season and therefore all vaccination related
searches should be excluded.

25 We expect the obtained insights from the analysis will be more or less the same since we only exclude a small fraction of
the historical dataset. Note that KMI requires annual payments to provide up-to-date statistics to enterprises. This analysis,
with a limited amount of data over time, will be used to investigate if it pays off to include this additional information and
thus if it is worth the additional annual expense.

Search Term        Correlation   Significance
Flu                1             <0.0001
Griep              0.76701       <0.0001
Grippe             0.75218       <0.0001
Symptome grippe    0.65322       <0.0001
Griepepidemie      0.5527        <0.0001
Epidemie grippe    0.40696       <0.0001
Griepsymptomen     0.38753       <0.0001
Vaccin grippe      0.17092       0.0056
Griepvaccin        0.16803       0.0065

Table 6.7: Flu: Search terms
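The selection procedure (take the term with the largest total search volume as the best term, correlate every other term with it, keep the positively and significantly correlated terms, and sum them into one explanatory variable) can be sketched as follows. This is an illustration under stated assumptions and not the SAS Enterprise Guide code of the case study: the three weekly series are synthetic, the function names are hypothetical, and the p-value uses a normal approximation of the t-statistic, which is only reasonable for long series.

```python
import math
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation (normal approximation)."""
    r = max(-0.999999, min(0.999999, r))  # avoid division by zero at |r| = 1
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def select_terms(trends, threshold=0.0001):
    """Return the best term, the selected terms and their summed series."""
    best = max(trends, key=lambda term: sum(trends[term]))
    selected = [best]
    for term, series in trends.items():
        if term == best:
            continue
        r = pearson(trends[best], series)
        if r > 0 and approx_p_value(r, len(series)) < threshold:
            selected.append(term)
    combined = [sum(trends[t][i] for t in selected)
                for i in range(len(trends[best]))]
    return best, selected, combined


# Synthetic weekly series (two years): 'griep' follows 'flu', whereas
# 'griepvaccin' peaks a quarter of a year earlier.
WEEKS = 104
flu = [50 + 40 * math.cos(2 * math.pi * i / 52) for i in range(WEEKS)]
griep = [0.5 * v + 3 * ((7 * i) % 5 - 2) for i, v in enumerate(flu)]
griepvaccin = [20 + 15 * math.cos(2 * math.pi * (i + 13) / 52) for i in range(WEEKS)]
trends = {"flu": flu, "griep": griep, "griepvaccin": griepvaccin}

best, selected, combined = select_terms(trends)
print(best, selected)
```

In this synthetic example the vaccination-related term is dropped because its seasonal peak is shifted relative to the best term, which is the same reasoning that leads to the exclusion of the vaccination queries in the text.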

Following the same procedure, table 6.8 provides an overview of the search terms related to

sunscreens. All search terms have a positive correlation with a significance level lower than

0.0001 with ‘zonnecreme’ and the sum of the individual queries will be used to obtain the

additional explanatory variable to insert into the baseline model.

Search Term          Correlation   Significance
Zonnecreme           1             <0.0001
Creme solaire        0.81553       <0.0001
Crème solaire        0.74959       <0.0001
Coup de soleil       0.74110       <0.0001
Zonnebrand           0.73351       <0.0001
Beste zonnecreme     0.69304       <0.0001
Protection solaire   0.61909       <0.0001
Sunscreen            0.60403       <0.0001
Zonnecrème           0.50642       <0.0001
Aftersun             0.45287       <0.0001

Table 6.8: Sunscreens: Search terms

Table 6.9 provides an overview of the mosquito-related search terms. A study of the individual queries revealed that all search terms have a positive correlation with ‘Deet’ at a significance level lower than 0.0001. Hence, they will all be used as queries to obtain the additional explanatory variable to include in the baseline model.

Search Term              Correlation   Significance
Deet                     1             <0.0001
Anti moustique           0.81729       <0.0001
Anti muggen              0.79214       <0.0001
Muggenbeten              0.74323       <0.0001
Anti muggen armband      0.63687       <0.0001
Anti-moustique           0.60549       <0.0001
Bracelet anti moustique  0.55271       <0.0001
Muggenspray              0.54387       <0.0001
Antimoustique            0.50703       <0.0001
Muggenbeten behandelen   0.48252       <0.0001

Table 6.9: Mosquito: Search terms

Finally, insecticides consist of products against fleas, ticks and worms for cats and dogs.

These products fall into the same category ‘insecticides’, because they have the common

characteristic that people will search for these products when the weather is ‘good’ and they

walk their dogs outside or they let their cats outside more frequently. However, the correlation

of the search terms will be analyzed individually, because people are more likely to search for

a specific product against fleas, ticks or worms. The search terms are presented in table 6.10, table 6.11 and table 6.12, respectively. Within each table, all terms are positively and significantly correlated with the best search term, which is listed first.

Search Term       Correlation   Significance
Puces chat        1             <0.0001
Vlooien hond      0.26808       <0.0001
Vlooien kat       0.33738       <0.0001
Vlooienband       0.33394       <0.0001
Vlooienbeten      0.41434       <0.0001
Puces chien       0.33271       <0.0001
Anti puce         0.53034       <0.0001
Anti puce chien   0.39699       <0.0001
Anti puce chat    0.34205       <0.0001

Table 6.10: Fleas: Search terms



Search Term Correlation Significance

Tique chien 1 <0.0001

Tique chat 0.99864 <0.0001

Tekenbeet hond 0.99835 <0.0001

Table 6.11: Ticks: Search terms

Search Term       Correlation   Significance
Vermifuge chien   1             <0.0001
Ontwormen kat     0.99938       <0.0001
Vermifuge chat    0.99925       <0.0001
Ontwormen hond    0.99924       <0.0001
Ontworming hond   0.99919       <0.0001

Table 6.12: Worms: Search terms

6.3.2 Flu-related products

Figure 6.17 represents the sum of the individual search terms of all seven positively

correlated terms from the Belgian Internet users during the specified period. As expected,

each year the peaks are situated during the beginning of the year. External information

confirms the difference in magnitude of the peak in 2014 compared with the peak in 2015.

In mid-February 2015, when the flu was at its peak, M. Van Ranst claimed that five to six times more people were infected with the flu than at the peak of the previous year. According to Van Ranst, an ‘unfortunate composition’ of the flu vaccine was one of the reasons for the high number of infections.

Figure 6.17: Google Trends: Flu



First, big data in the form of Google Trends data and weather conditions will be used to examine whether these data offer additional predictive power for the demand of flu-related products when they are included as independent variables in the baseline model. To define the extended model, a stepwise linear regression is executed over the different combinations of the independent variables, i.e. Google Trends, precipitation and average temperature. Akaike's Information Criterion (AIC) is used to select the predictors; the selection stops at step two because adding or removing a further effect does not reduce the AIC (Table 6.13). Therefore, we add the main effect ‘Average Temperature’ and the interaction effect ‘Trends*Average Temperature’ as independent variables to the baseline model, because they improve the predictive accuracy of the regression. Afterwards, the extended model is used to execute the forecast in SAS Forecast Studio. The results are presented in table 6.14. The additional predictor variables do not reduce the weighted MAPE and therefore do not improve the forecast accuracy. At first sight this seems rather strange, since for 70 per cent of the SKUs the MAPE falls or remains the same. The explanation is that for some high-volume products the MAPE increases with the additional variables, which keeps the weighted MAPE at about the same level as without the predictor variables. For this selection of SKUs, SAS Forecast Studio fits ARIMAX models to the data because they give a better prediction on the holdout sample. The weighted MAPE (cf. Appendix C), however, is calculated over the entire historical period. ARIMA models that only take the series' own history into account and ESM models (cf. Appendix B) remain more valuable over the entire period, and for these types of models additional predictor variables do not add value. The SKUs (30%) for which ARIMAX models are wrongly fitted to the data are listed in Appendix D.
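The apparent paradox (most SKUs improve, yet the weighted MAPE does not) can be illustrated with a small sketch. The exact weighting scheme is defined in Appendix C and is not reproduced here; the code below assumes, purely for illustration, that each SKU's MAPE is weighted by its order volume. All SKU figures are hypothetical.

```python
def weighted_mape(mapes, volumes):
    """Volume-weighted average of per-SKU MAPE values.

    Assumption: weights are proportional to order volume. The thesis
    defines its own weighting in Appendix C.
    """
    total_volume = sum(volumes)
    return sum(m * v for m, v in zip(mapes, volumes)) / total_volume


# Hypothetical selection of ten SKUs: three high-volume, seven low-volume.
volumes = [1000, 800, 600, 50, 50, 50, 50, 50, 50, 50]
base_mape = [30, 30, 30, 40, 40, 40, 40, 40, 40, 40]
extended_mape = [33, 33, 33, 30, 30, 30, 30, 30, 30, 30]

# Share of SKUs whose MAPE falls or stays the same in the extended model.
share_not_worse = sum(e <= b for e, b in zip(extended_mape, base_mape)) / len(volumes)

base_weighted = weighted_mape(base_mape, volumes)
extended_weighted = weighted_mape(extended_mape, volumes)

print(share_not_worse, round(base_weighted, 2), round(extended_weighted, 2))
```

In this toy selection 70 per cent of the SKUs improve, but the three high-volume SKUs that deteriorate dominate the weights, so the weighted MAPE rises slightly.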



Step   Effect Entered   Number Effects
26                      3                405084.506    366371.200
1      Temp             4                404879.117    366174.375
2      Trends*Temp      5                404833.570*   366137.393*

* Optimal Value of Criterion

Table 6.13: Flu: Stepwise Linear Regression


Model                                     Weighted MAPE
Baseline model                            36.04
Baseline model incl. predictor variables  36.68

Table 6.14: Flu: SAS Forecast Results

Next, the second question (i.e. whether the trends data can be used to exclude the extreme order quantities from the historical data) will be investigated in an attempt to improve the base forecast accuracy. Although SAS Forecast Studio is able to deal with seasonality, it incorporates extreme historical peaks into the models used to predict the future. Sales of flu-related products depend highly on how severely society is hit by the flu. Hence, extreme sales during a particular year will influence the variability in the future: SAS Forecast Studio will predict higher sales for the future although the peak in the following year might be significantly lower. It therefore seems appropriate to exclude uncommon extreme order quantities based on the search terms. In contrast to the promotions described in section 6.2, the trends data cannot be described with a binary variable (i.e. in promotion or not). A cut-off point needs to be determined above which the order quantity data are excluded from the historical dataset. The cut-off point is taken to be the third quartile of the entire flu-related trends data, which is 86. Hence, in 25 per cent of the weeks the search volume exceeds 86, and the corresponding order quantities are considered to cause high variability in the base forecast model. The result of this approach is presented in table 6.15.
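The cut-off rule can be sketched as follows. The weekly figures are hypothetical, and two choices are assumptions of this illustration rather than the thesis's procedure: the quartile is computed with the ‘inclusive’ method of Python's `statistics.quantiles` (statistical packages differ in their quartile definitions), and excluded order quantities are replaced by a missing value so that the weekly spacing of the series is preserved.

```python
from statistics import quantiles


def exclude_peak_weeks(trends, orders):
    """Blank out order quantities in weeks where search volume exceeds Q3.

    Returns the cut-off (third quartile of the trends series) and the
    order series with excluded weeks set to None.
    """
    cutoff = quantiles(trends, n=4, method="inclusive")[2]
    cleaned = [q if t <= cutoff else None for t, q in zip(trends, orders)]
    return cutoff, cleaned


# Hypothetical weekly search volumes and order quantities.
trends = [10, 20, 30, 40, 50, 60, 70, 80, 90]
orders = [100, 110, 120, 130, 140, 150, 160, 400, 500]

cutoff, cleaned = exclude_peak_weeks(trends, orders)
print(cutoff, cleaned)
```

Only the two weeks with the highest search volume are removed here, which is exactly where the extreme order quantities sit in this toy series.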

26 Note that ‘Flag_MF’ and ‘SWITCH_TO_PROMO’ relate to the discussion in Section 5.4, where two events were added to the baseline model with historical order data to deal with out-of-stocks and promotions. This part of the analysis does not yet make a distinction between different types of promotions, as described in section 6.2, since we want to analyze the results separately.




Model                                     Weighted MAPE
Baseline model                            36.04
Baseline model excl. Q where trends > 86  26.07

Table 6.15: Flu: SAS Forecast Results excl. peaks

Excluding these peaks reduces the weighted MAPE by 27.66 per cent compared to the original forecast (from 36.04 to 26.07). Hence, the second approach of using the external data seems more appropriate. A similar approach will be used for the other seasonal product types (sunscreens, mosquito products and insecticides). The external data sources, Google Trends data and weather conditions, will be analyzed with regard to these types of products. As before, the external data sources will first be added as independent variables to the baseline model. Afterwards, the trends data will be used to exclude the extreme order quantities of exceptional peaks to see whether this increases the accuracy of the base forecast.

6.3.3 Sunscreens

Sunscreens are seasonal products for which it is sometimes extremely difficult to obtain an accurate forecast, because the summer in Belgium is hard to predict. Peaks can occur anywhere from May to September, depending on the weather. Although SAS Forecast Studio incorporates the seasonality in the forecast based on historical order data, the software might not always accurately capture the timing and magnitude of the peak. Intuitively, weather conditions can be expected to have an impact on the demand for sunscreens. Hence, it will be investigated whether Google Trends data and weather conditions add value to the base forecast accuracy.

Figure 6.18 represents the sum of the searches of all positively and significantly correlated search terms. As expected from the supply chain perspective, peaks mostly occur during the summer months, namely July and August.



Figure 6.18: Google Trends: Sunscreens

As for the flu-related products, a stepwise linear regression is executed to select the predictor variables; it can be consulted in Appendix E. The obtained main and interaction effects are added to the baseline model, after which the forecast is executed using the extended model. The results are presented in table 6.16. The additional predictor variables do not offer a benefit over the baseline model, for the same reason as with the flu-related products. Clearly, more information is not always better. Sometimes the existing historical data already offer a fairly good prediction and additional information blurs the existing seasonal pattern, so that ARIMAX models are wrongly fitted to the data and offer an inferior forecast compared to seasonal ARIMA or ESM models. According to the Supply Chain department, weather conditions might not offer the expected superior predictions because experience shows that pharmacies already stock sunscreens during February and March, when average temperatures are still low. Therefore, seasonal models might capture this pattern better.


Baseline model 63.77

Baseline model incl. predictor variables 67.11

Table 6.16: Sunscreens: SAS Forecast Results

In addition, the trends data will be used to exclude the extreme order quantities from the historical data in an attempt to improve the base forecast accuracy. Again, the trends data are a continuous variable and a cut-off point needs to be determined above which the order data are excluded from the historical dataset. The cut-off point is taken to be the third quartile of the entire sunscreen-related trends data, which is 79. Hence, in 25 per cent of the weeks the search volume exceeds 79, and the corresponding order quantities are considered to cause high variability in the base forecast. The result of this approach is presented in table 6.17. The table shows that the additional information decreases the MAPE by 33.80 per cent. Again, the second approach of using the external data source seems more appropriate. Thus, the Supply Chain department needs to be very careful about how it applies additional sources of external information.


Baseline model 63.77

Baseline model excl. Q where trends > 79 42.22

Table 6.17: Sunscreens: SAS Forecast Results excl. peaks

6.3.4 Mosquito products

The Supply Chain department raised the problem of not being able to predict the demand for mosquito products. More precisely, it faced an extreme shortage during the summer months of 2016. Online articles confirm the extreme peak in the summer of 2016 ("Massaal veel muggen deze zomer", 2016). They report an extraordinary mosquito infestation due to heavy rain in June followed by warm weather, which are ideal conditions for mosquitoes. SAS Forecast Studio was unable to predict the extreme peak based on historical order data. It will be investigated whether Google Trends data or weather conditions can improve the forecast accuracy.

Figure 6.19 represents the sum of the searches of all ten positively correlated terms. In July 2016 there was an extreme peak in search volume, in line with the news reports described above. According to the stepwise linear regression (Appendix E), only the trends data offer additional predictive power; they are added to the baseline model to execute a new forecast. The results are presented in table 6.18. Similar to the previous examples, including the Google Trends data does not improve the accuracy, so it is investigated whether excluding extreme data raises the base forecast accuracy (Table 6.19). The forecast accuracy improves by 58.5 per cent. The larger improvement compared to the previous two seasonal product types may be due to the fact that most products in this selection are relatively new. New products are more difficult for SAS Forecast Studio to predict, because the seasonal models have less historical data to rely on and the peaks carry a higher weight when forecasting. Excluding extreme peaks makes the data more reliable for future forecasts.

Figure 6.19: Google Trends: Mosquito


Baseline model 121.32

Baseline model incl. predictor variables


Table 6.18: Mosquito: SAS Forecast Results


Baseline model 121.32

Baseline model excl. Q where trends > 46 50.36

Table 6.19: Mosquito: SAS Forecast Results excl. peaks

6.3.5 Insecticides

Although it may not be the first thing that comes to mind when thinking of the product assortment of Multipharma, insecticides for cats and dogs are sold in the pharmacies. This general term covers products that protect cats and dogs against fleas, ticks and worms. The Supply Chain department has difficulties predicting this type of product and states that the demand depends highly on the weather conditions, because the weather affects when people are likely to walk their dog and let their cat outside. Hence, it seems reasonable to investigate whether adding statistical weather data offers value when forecasting seasonal products against fleas, ticks and worms for cats and dogs. Figure 6.20 visualizes the search terms for fleas, ticks and worms in Belgium during the specified period. The figure shows that the total sum of the three strengthens the seasonal pattern of the individual search terms, which is why the total sum is used as the explanatory variable. As for the previous seasonal products, the results are presented in table 6.20 and table 6.21. Again, the conclusion is that including the historical weather information does not improve the forecast accuracy. In contrast, excluding extreme data points based on the search terms decreases the MAPE by 15.49 per cent.

Figure 6.20: Google Trends: Insecticides


Baseline model 45.84

Baseline model incl. predictor variables 50.00

Table 6.20: Insecticides: SAS Forecast Results


Baseline model 45.84

Baseline model excl. Q where trends > 435 38.74

Table 6.21: Insecticides: SAS Forecast Results excl. peaks



6.3.6 Summary of the results

In section 6.3.2 we explained that ARIMAX models performed better than ARIMA or ESM models based exclusively on the historical order quantities for 70 per cent of the time series' forecasts. Nevertheless, the weighted MAPE did not fall, because for 30 per cent of the products in the selection ARIMAX models were wrongly fitted on the basis of the holdout sample, whereas other models (e.g. seasonal ARIMA or ESM models) perform better over the entire period on which the weighted MAPE is calculated. Hence, it is not always true that an ARIMAX model performs better than an ARIMA or ESM model. Sometimes a specific lag combination of ARIMA can produce smaller forecast errors than the best ARIMAX combination for a particular analysis (Gaiya, 2016; Durka & Pastorekova, s.d.). To summarize, Google Trends data and weather conditions are not very useful as additional independent variables in the baseline model, because the historical data already incorporate the seasonality and the additional information seems to blur this seasonal pattern. This means they do not offer additional value over the historical data. However, Google Trends data can offer significant improvements when applied in the right way, i.e. when the data are used to exclude extreme order quantities from the historical dataset to make the base forecast more accurate in the future (similar to the promotional analysis in section 6.2).

A summary of the weighted MAPEs of this second research technique is presented in table 6.22.


                    Baseline model   Updated model   % Change
Flu Products        36.04            26.07           -27.66
Sunscreens          63.77            42.22           -33.80
Mosquito products   121.32           50.36           -58.49
Insecticides        45.84            38.74           -15.49

Table 6.22: Summary MAPE’s seasonal products
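The ‘% Change’ column follows from the two MAPE columns as a relative change. The short sketch below recomputes it from the rounded values in Table 6.22; small deviations in the second decimal can arise because the table was presumably computed from unrounded MAPEs. The function name `pct_change` is chosen for this illustration only.

```python
# (baseline weighted MAPE, updated weighted MAPE) as printed in Table 6.22.
MAPE = {
    "Flu Products": (36.04, 26.07),
    "Sunscreens": (63.77, 42.22),
    "Mosquito products": (121.32, 50.36),
    "Insecticides": (45.84, 38.74),
}


def pct_change(base, updated):
    """Relative change of the updated value with respect to the baseline, in per cent."""
    return (updated - base) / base * 100


changes = {product: pct_change(base, updated) for product, (base, updated) in MAPE.items()}

for product, change in changes.items():
    print(f"{product}: {change:.2f}%")
```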

Google Trends data seem to be most useful for mosquito products and sunscreens. This may be because these selections contain more new products, and consequently less historical data, than the other seasonal product types (Table 6.23). Higher benefits are expected from using external trends data to exclude extreme peaks when new products make up more than 50 per cent of the total selection. In the following section the impact on safety stocks and the corresponding costs will be quantified for all the seasonal product types; the benefits can be expected to be higher for mosquito products and sunscreens. Afterwards, it will be clearer whether Multipharma should make the effort of including the additional trends data in the baseline model for seasonal products. The time and cost of extracting the trends data from Google have to be weighed against the cost savings from the reduced safety inventory.

Product        New SKU   N SKU   % New Products
Flu            9         199     4.52
Sunscreen      161       288     55.90
Mosquito       16        30      53.33
Insecticides   15        45      33.33

Table 6.23: Percentage of new products



Chapter 7


According to the Supply Chain department, the ordering cost is negligible compared with the large holding costs (D.V. Belle, personal communication, April 7, 2017). As defined in chapter 3, section 3.3, the bottleneck is the available warehouse space. Multipharma has a periodic review system, in which the inventory levels are reviewed after a fixed period of time T and an order is placed such that the current inventory plus the replenishment lot size equals a prespecified order-up-to level (OUL), following Chopra and Meindl (2013) as explained in section 2.4. The review interval is the time T between successive orders and, in the case of Multipharma, differs between kinds of products. The downside of a periodic review system is that Multipharma is not able to order JIT. However, it is the most suitable approach because Multipharma deals with a large number of products. By applying a periodic review system it is able to plan deliveries and warehouse capacity. In contrast, a continuous review model can produce large peaks over time when many products suddenly run out of stock and require replenishment at the same time.

In a periodic review system the lot size is based on the review interval, the average demand over that period and the current stock27 (cf. Equation 2.1). Multipharma orders 1, 2, 4 or 8 weeks of stock for the product under consideration. This power-of-two policy is extremely useful because multiple products and multiple suppliers are involved. The slow-moving items (8 weeks) can be grouped together with the fast- and medium-moving items (1, 2 and 4 weeks) when ordered from the same supplier. This facilitates truck sharing, consolidation of efforts, simplification of shipping schedules, etc. (Chopra & Meindl, 2013). This allows Multipharma to stay within a 6 per cent range of the optimal cost.
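Why power-of-two intervals make grouping easy can be seen in a small sketch. It assumes, for illustration only, that all review cycles start in the same week (week 0); under that assumption every order moment of a longer interval coincides with an order moment of each shorter interval.

```python
REVIEW_INTERVALS_WEEKS = (1, 2, 4, 8)


def intervals_due(week):
    """Review intervals whose order moment falls in the given week.

    Assumption: all cycles are aligned to start in week 0.
    """
    return [t for t in REVIEW_INTERVALS_WEEKS if week % t == 0]


for week in range(1, 9):
    print(week, intervals_due(week))
```

In week 8, for example, items on all four intervals are ordered together, so slow-moving items never require a delivery of their own.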

The prominent holding cost due to the limited available warehouse space causes Multipharma to continuously optimize its average inventory. Keeping section 2.4 in mind, safety inventory can be reduced by lowering demand variability while keeping the CSL constant. This was exactly the aim of the previous chapter. When evaluating stock reduction one should take into account the fact that Multipharma is growing, which of course has a nominal effect on stock level. Hence, it can now be explained what impact the improved demand forecasting has on the safety inventory. The impact on safety inventory and the corresponding carrying costs are discussed in section 7.1 and section 7.2, respectively.

27 If Multipharma did not sell as expected, i.e. if the current stock level is higher than expected, this is taken into account when calculating the size of the order. In addition, return flows or alternative sources can be taken into account.

7.1 Safety inventory

As described in chapter 2 (section 2.4), safety inventory is the inventory carried to satisfy demand that exceeds the forecasted demand. Hence, demand variability is one of the drivers of safety inventory. In reality, demand uncertainty can never be eliminated entirely. However, demand uncertainty in the form of variability should be managed and kept to a minimum. In chapter 6 (section 6.2 and section 6.3), it was shown that the weighted MAPE, and therefore demand variability, was reduced significantly by using internal and external data sources. The outcome of the forecasting process is a demand distribution with appropriate parameters for each product. In this section the safety stock is calculated twice, based on the parameters obtained from both forecasts, by applying the general equation 2.8. Both the standard deviation and the mean of the demand change when new data sources are inserted. The standard deviation of the lead time and the mean lead time are not influenced in this case study and can be obtained from SAS for each product separately. They are calculated in SAS from historical data and updated every time there is a new delivery. The review interval (T) is retrieved from the SAP system and inserted into SAS. Likewise, the service level can be obtained from the SAP system for each product and is originally based on the ABC classification. In the ABC classification, purchased parts and materials are rank-ordered according to the annual dollar value spent on each (Hopp & Spearman, 2008). Once an item passes a certain threshold with regard to its weekly monetary value, it is assigned to a specific class and a corresponding service level that it should be able to fulfil. The corresponding safety factors (z) can be calculated for products that are normally distributed and are presented in table 7.1.



ABC Classification   Service Level   Safety factor
A                    0.97            1.88
B                    0.93            1.48
C                    0.92            1.41

Table 7.1: Service level based on ABC classification
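The safety factors in Table 7.1 are the standard normal quantiles of the service levels, which can be checked with a short sketch. The second function is an assumption of this illustration: it uses the textbook periodic review form (safety stock equals z times the standard deviation of demand over T + L, with independent periods and no lead-time variability), which may differ from equations 2.3 and 2.11 as stated in section 2.4. The example inputs (a review period of 4 weeks, a lead time of 1 week) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

SERVICE_LEVEL = {"A": 0.97, "B": 0.93, "C": 0.92}


def safety_factor(csl):
    """Standard normal quantile z for a given cycle service level."""
    return NormalDist().inv_cdf(csl)


def safety_stock_normal(csl, sigma_d, review_t, lead_l):
    """Safety stock under normally distributed demand.

    Assumptions: demand in successive periods is independent with
    standard deviation sigma_d per period, and the lead time is fixed.
    """
    return safety_factor(csl) * sigma_d * sqrt(review_t + lead_l)


factors = {cls: round(safety_factor(csl), 2) for cls, csl in SERVICE_LEVEL.items()}
print(factors)

# Hypothetical example: class A item, sigma_d = 44.38 per week, T = 4, L = 1.
print(round(safety_stock_normal(0.97, 44.38, 4, 1), 1))
```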

For normally distributed products, the simplified equation 2.11 will be used instead of equation 2.8 to calculate the safety stock. The standard deviation of demand during the period T + L is calculated using equation 2.3. For slow-moving items, however, the demand distribution cannot be approximated by a normal distribution (Chopra & Meindl, 2013). Instead, a Poisson distribution with demand arriving at rate D is a better approximation. Using equation 2.2, applicable when demand is independent and identically distributed, the average demand during T + L periods is given by $D_{T+L} = (T + L)D$. Hence, the safety stock of products with Poisson-distributed demand can be calculated using equation 2.8, where the OUL is the inverse of the cumulative demand distribution function given $D_{T+L}$ and the required service level.
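For Poisson-distributed demand, the order-up-to level can be found by accumulating the Poisson probabilities until the required service level is reached. The sketch below assumes that the safety stock is the OUL minus the mean demand during T + L, which is the usual reading of the periodic review policy; the function names and the example inputs are hypothetical.

```python
from math import exp


def poisson_oul(mean_demand, csl):
    """Smallest integer k such that P(demand <= k) >= csl for Poisson demand."""
    if not 0 < csl < 1:
        raise ValueError("csl must lie strictly between 0 and 1")
    k = 0
    pmf = exp(-mean_demand)  # P(demand = 0)
    cdf = pmf
    while cdf < csl:
        k += 1
        pmf *= mean_demand / k  # P(demand = k) from P(demand = k - 1)
        cdf += pmf
    return k


def poisson_safety_stock(rate_d, review_t, lead_l, csl):
    """Return (OUL, safety stock) for demand arriving at rate_d per period."""
    mean_demand = (review_t + lead_l) * rate_d
    oul = poisson_oul(mean_demand, csl)
    return oul, oul - mean_demand


# Hypothetical slow mover: 0.5 units per week, T = 8 weeks, L = 2 weeks, class C.
oul, safety_stock = poisson_safety_stock(0.5, 8, 2, 0.92)
print(oul, safety_stock)
```

Because the OUL is an integer, the achieved service level is generally somewhat higher than the required one.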


We will consider some products with a review period of 8 weeks as slow-moving, Poisson-distributed items; all other products (i.e. with review periods of 4 weeks or less) are considered fast- or medium-moving, normally distributed items. More specifically, the demand distribution of each product can be determined with a Kolmogorov-Smirnov goodness-of-fit test for the normal distribution. This is particularly valuable for products with a review period of 8 weeks, for which it is uncertain whether they should be considered slow moving. The null hypothesis of the Kolmogorov-Smirnov test is that the demand is normally distributed. If the null hypothesis is rejected (i.e. p-value < 0.05), we assume the product is Poisson distributed. This resulted in about 30 per cent of the SKUs being classified as Poisson-distributed items. Figure 7.1 shows an example of the Kolmogorov-Smirnov test for a randomly selected product whose demand is normally distributed and whose review period is 4 weeks.
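The test statistic D reported by such a test is the largest distance between the empirical distribution function of the sample and the fitted normal distribution. The sketch below computes only D, with the mean and standard deviation estimated from the sample. The p-value is deliberately left out, since it depends on how the software corrects for estimated parameters. The sample is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Kolmogorov-Smirnov distance between a sample and its fitted normal distribution."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical weekly order quantities.
d_stat = ks_statistic_normal([1, 2, 3, 4, 5])
print(round(d_stat, 4))
```

A small D, as in Table 7.3, indicates that the empirical distribution stays close to the fitted normal curve.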



Figure 7.1: Distribution analysis of ‘Quantity’ of randomly selected SKU

Parameter Symbol Estimate

Mean Mu 156.5409

Std Dev Sigma 44.37782

Table 7.2: Parameters for Normal Distribution

Test Statistic p Value

Kolmogorov-Smirnov D 0.05210544 Pr > D 0.150

Table 7.3: Goodness-of-Fit tests for Normal Distribution

Tables 7.4 to 7.6 summarize the results obtained from coding the equations explained in section 2.4 in SAS Enterprise Guide, based on the demand parameters of the base forecast and the extended forecast for the promotional products. Tables 7.7 to 7.10 present the corresponding results for the seasonal products. Note that the results are presented at the IMS top level, because this allows a compact representation of a generally applicable classification. Seasonal products belong to only one IMS class, depending on the type of seasonal product. Moreover, from the prioritization discussion in section 6.1 it was clear that the Supply Chain department expected the most substantial benefits in the OTC and PEC classes. The tables show that the safety inventory for the OTC and PEC classes has been reduced significantly. In addition, improving the demand forecast accuracy also had a smaller positive impact on the safety inventory of some other classes, such as ATC and PAC.


IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
1    ATC    448    63.18       64.06       33.46       38.17       60,426.27   61,140.21   1.18%
2    OTC    636    64.96       60.71       48.97       36.1        57,006.39   52,320.51   -8.22%
3    PEC    1220   26.16       19.36       31.19       13.44       27,682.43   17,641.81   -36.27%
4    PAC    75     24.33       22.9        22.59       14.39       17,018.38   16,266.51   -4.42%
5    NUT    57     14.96       15.92       12.94       9.99        16,538.5    16,151.7    -2.34%
6    OTH    28     13.12       12.17       13.22       8.3         10,995.12   10,351.69   -5.85%

Table 7.4: Impact on safety inventory for ‘Passage délégué’


IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
1    ATC    67     564.27      120.83      278.01      83.66       146,194.78  92,823.42   -14.97%
2    OTC    152    142.79      106.81      129.45      70.86       508,108.21  250,502.04  -72.24%
3    PEC    97     79.09       71.83       119.89      56.58       95,633.56   82,249.61   -3.75%
4    PAC    10     74.22       68.12       98.25       41.55       233,278.6   207,316.32  -7.28%
5    NUT    4      254.16      252.21      321.39      320.63      506,012.12  502,181.18  -1.07%
6    OTH    33     40.6        11.75       28.38       12.11       12,612.71   10,166.26   -0.69%

Table 7.5: Impact on safety inventory for ‘Promo O’


IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
2    OTC    97     11.98       10.68       10.38       8.35        23,085.37   18,992.9    -17.73%
3    PEC    404    19.63       14.8        20.11       13.3        21,672.16   17,070.83   -21.23%
4    PAC    12     12.53       12.72       9.67        8.77        21,759.62   21,253.8    -2.32%
6    OTH    2      33.06       10.21       88.1        11.61       16,837.4    3,439.35    -79.57%

Table 7.6: Impact on safety inventory for ‘Action iU’


IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
1    ATC    198    57.78       56.19       40.49       31.94       48,790.20   47,554.96   -2.53%

Table 7.7: Impact on safety inventory for flu-related products




IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
3    PEC    259    20.43       15.17       20.75       15.45       44,431.74   33,832.37   -23.86%

Table 7.8: Impact on safety inventory for sunscreens


IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
3    PEC    29     29.16       17.65       59.63       17.19       24,996.95   12,533.28   -49.86%

Table 7.9: Impact on safety inventory for mosquito products


IMS  Label  N Obs  MEAN        MEAN        STDDEV      STDDEV      SS          SS          % Change
                   Base model  incl. data  Base model  incl. data  Base model  incl. data
6    OTH    32     30.43       30.42       32.93       33.17       17,911.13   17,654.16   -1.43%

Table 7.10: Impact on safety inventory for insecticides

In line with the previously obtained results, it is important to reconsider figure 2.5 (section 2.4). From that section we know that reduced demand variability has an impact on the safety inventory. However, excluding outliers from the data causes the mean demand (D) to fall as well, since the outliers inflated the normal demand. Hence, from equation 2.1 it can be concluded that the analysis has an impact on the lot size (Q), which causes the cycle inventory to fall as well. From equation 2.8 it is clear that our analysis has a significant impact on the OUL, assuming that the review period (T) and mean lead time (L) remain the same. What follows focuses on the reduction of the carrying cost due to the reduction of the previously calculated safety inventory, because this is the main focus of this master dissertation (supra, section 2.4).



Figure 7.2: Revised periodic review policy

7.2 Impact on costs

Inevitably, holding safety stock costs money, and as a consequence a reduced safety inventory lowers the money tied up in inventory. From the beginning of this chapter, we know that Multipharma’s holding cost is the most important cost component. The holding or carrying cost can be broken down into four categories: capital costs (or financing charges), storage space costs, inventory services costs and inventory risk costs. The capital cost, i.e. the cost of capital (also known as WACC) applied to the money invested in inventory, is the leading factor in determining the carrying cost. The standard rule of thumb puts the carrying costs at 25 per cent of the inventory value on hand (Vermorel, n.d.). The annual cost savings resulting from the reduction in safety inventory can be calculated using the previously obtained results: the safety inventory value on hand is obtained by multiplying the previous safety stock level of each distinct product by its price, obtained from the SAP system of Multipharma. Comparing the annual carrying costs of the baseline model and the extended model yields the annual savings of improved demand forecasting accuracy, which are presented in the following tables. As expected from the previous section and the conclusion of Section 6.3, including additional data sources for flu-related products does not yield cost savings. In addition, including the data sources for insecticides yields only limited annual cost savings of 1,244.43 €. Hence, for these two types of products we can conclude that Multipharma should not make the effort of capturing, analyzing and adding the additional data to the baseline model. Nevertheless, using the data in a smart way, as with the other selections of products, resulted in overall yearly cost savings of 959,960.69 €.[28]
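The cost comparison described above can be sketched in a few lines of Python. The 25 per cent carrying rate is the rule of thumb quoted in the text; the function names and the unit price in the first example are hypothetical, and the second example simply reuses the two insecticide cost figures reported in Table 7.17.

```python
CARRYING_RATE = 0.25  # rule of thumb quoted in the text


def annual_carrying_cost(safety_stock_units, unit_price, rate=CARRYING_RATE):
    """Annual carrying cost of the safety inventory of one product."""
    return rate * safety_stock_units * unit_price


def savings(base_cost, extended_cost):
    """Return (absolute saving, percentage change) between two annual costs."""
    saving = base_cost - extended_cost
    pct_change = (extended_cost - base_cost) / base_cost * 100
    return round(saving, 2), round(pct_change, 2)


# Hypothetical single product: 400 units of safety stock at 12.50 EUR each.
example_cost = annual_carrying_cost(400, 12.50)

# Insecticides, using the two cost figures reported in Table 7.17.
insecticide_saving, insecticide_change = savings(56320.57, 55076.14)
print(example_cost, insecticide_saving, insecticide_change)
```

For the insecticide figures this reproduces the saving of 1,244.43 € and the change of -2.21% reported in the text and in Table 7.17.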


| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 1 ATC | 448 | 63.18 | 64.06 | 33.46 | 38.17 | 80,257.45 | 16,486.59 | -79.46% |
| 2 OTC | 636 | 64.96 | 60.71 | 48.97 | 36.10 | 83,153.48 | 17,697.96 | -78.72% |
| 3 PEC | 1220 | 26.16 | 19.36 | 31.19 | 13.44 | 50,203.32 | 6,380.45 | -87.29% |
| 4 PAC | 75 | 24.33 | 22.90 | 22.59 | 14.39 | 25,811.25 | 6,355.57 | -75.38% |
| 5 NUT | 57 | 14.96 | 15.92 | 12.94 | 9.99 | 26,081.36 | 4,462.63 | -82.89% |
| 6 OTH | 28 | 13.12 | 12.17 | 13.22 | 8.30 | 18,037.43 | 3,318.49 | -81.60% |

Table 7.11: Impact on cost for ‘Passage délégué’


| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 1 ATC | 67 | 564.27 | 120.83 | 278.01 | 83.66 | 576,533.07 | 490,923.58 | -14.85% |
| 2 OTC | 152 | 142.79 | 106.81 | 129.45 | 70.86 | 661,404.38 | 211,265.18 | -68.06% |
| 3 PEC | 97 | 79.09 | 71.83 | 119.89 | 56.58 | 156,934.54 | 138,746.41 | -11.59% |
| 4 PAC | 10 | 74.22 | 68.12 | 98.25 | 41.55 | 207,317.11 | 183,085.39 | -11.69% |
| 5 NUT | 4 | 254.16 | 252.21 | 321.39 | 320.63 | 1,236,594.13 | 1,227,182.95 | -0.76% |
| 6 OTH | 33 | 40.60 | 11.75 | 28.38 | 12.11 | 237,731.32 | 164,142.18 | -30.95% |

Table 7.12: Impact on cost for ‘Promo O’


| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 2 OTC | 97 | 11.98 | 10.68 | 10.38 | 8.35 | 23,085.37 | 18,992.9 | -17.73% |
| 3 PEC | 404 | 19.63 | 14.8 | 20.11 | 13.3 | 21,672.16 | 17,070.83 | -21.23% |
| 4 PAC | 12 | 12.53 | 12.72 | 9.67 | 8.77 | 21,759.62 | 21,253.8 | -2.32% |
| 6 OTH | 2 | 33.06 | 10.21 | 88.1 | 11.61 | 16,837.4 | 3,439.35 | -79.57% |

Table 7.13: Impact on cost for ‘Action iU’

[28] A critical reader will notice that the overall cost savings are in fact higher still, since the money tied up in cycle inventory also decreases as the lot size quantity is reduced.




| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 1 ATC | 198 | 57.78 | 56.19 | 40.49 | 31.94 | 68,433.04 | 68,584.96 | 0.22% |

Table 7.14: Impact on cost for flu-related products


| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 3 PEC | 259 | 20.43 | 15.17 | 20.75 | 15.45 | 84,785.30 | 63,932.18 | -24.60% |

Table 7.15: Impact on cost for sunscreens


| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 3 PEC | 29 | 29.16 | 17.65 | 59.63 | 17.19 | 26,342.71 | 14,467.96 | -45.08% |

Table 7.16: Impact on cost for mosquito products


| IMS Label | N Obs | MEAN Base model | MEAN incl. data | STDDEV Base model | STDDEV incl. data | COST (€) Base model | COST (€) incl. data | % Change |
|---|---|---|---|---|---|---|---|---|
| 6 OTH | 32 | 30.43 | 30.42 | 32.93 | 33.17 | 56,320.57 | 55,076.14 | -2.21% |

Table 7.17: Impact on cost for insecticides



Chapter 8


From the literature it was clear that big data analytics is emerging in the OM area due to increasing pressure on companies to boost efficiency and fulfil higher customer service levels. Researchers confirm that companies that see themselves as more data-driven perform better on objective measures of financial and operational results (McAfee & Brynjolfsson, 2012). Like many organizations, Multipharma noticed that the data it has in its possession, and especially how it makes use of that data, can create a competitive advantage. It is important that this big data, coming from its SAP system, Excel sheets, etc., is managed and analyzed in an appropriate way. This is why this case study started with qualitative interviews with the Supply Chain department to draw up and prioritize a list of potential demand drivers. Afterwards, the necessary information was captured to model the promotional and seasonal demand drivers. The promotional events were modelled and added to the baseline model using SAS code. This data served as input to SAS Forecast Studio, which executed forecasts for the selections of products. Moreover, once the forecast was executed, continuous feedback from the Supply Chain department was necessary to understand the dynamics of certain time series and to override some forecasts based on alerting (cf. Section 5.3). As already recognized by Hopp and Spearman (2008), forecasting is more than selecting the right model and choosing the appropriate parameters. Equally important is obtaining qualitative information from the forecaster to potentially override the quantitative model. The result of the qualitative and quantitative techniques was a reduced weighted MAPE for the three types of promotions, which confirmed the hypothesis that the additional internal data in the form of semi-structured Excel files improved the demand forecast accuracy. The events were used to exclude demand variability, in the form of volatile order quantities, in order to obtain a more accurate forecast for the future.



Afterwards, more extensive research was carried out with regard to seasonality, evaluating two hypotheses. Based on the qualitative interviews, it was decided to use Google Trends data and weather conditions in an attempt to improve the demand forecast accuracy of some seasonal product types. The first hypothesis investigated whether the additional data sources could be added to the baseline model as explanatory variables[29] to improve the predictive power, thereby using advanced ARIMAX models. However, this technique did not deliver the expected results because SAS Forecast Studio already absorbed the seasonality, using Holt-Winters’ seasonal exponential smoothing and seasonal ARIMA models. Although the additional independent variables improved the predictive power of the ARIMAX models, the seasonal models were still superior and hence the additional variables were irrelevant. Moreover, when using additional data sources the researcher should be aware of the issue of overfitting (“More data is not always better”, 2015). Mathematically, more variables can lead to a model with a better fit to the data that was used to train it. However, using too many variables tends to lead to overfitting: so many variables are involved as predictors that the model becomes too specific to the precise historical data it was trained on, and therefore misleads as to the real drivers behind the predicted target variables. The second hypothesis was based on the same approach as the promotional events. Google Trends would be used to filter extreme peaks out of the historical dataset by setting a threshold value above which the order quantities would be excluded, in an attempt to improve the forecast accuracy. This technique appeared to be very useful when new products made up more than 50 per cent of the entire selection of products and hence less historical data was available.
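The threshold filter of the second hypothesis can be sketched as follows. This is an illustration only: it assumes that the threshold is applied to the Google Trends interest index of each period (a 0 to 100 scale) and that the order quantity of any period above the threshold is treated as missing. The function name, the threshold of 80 and the sample figures are hypothetical.

```python
def exclude_peak_periods(order_quantities, trends_index, threshold):
    """Blank out order quantities in periods where search interest exceeds the threshold.

    Returns the cleaned series (None marks an excluded period) and the
    positions of the excluded periods.
    """
    if len(order_quantities) != len(trends_index):
        raise ValueError("both series must cover the same periods")
    cleaned, excluded = [], []
    for position, (quantity, interest) in enumerate(
        zip(order_quantities, trends_index)
    ):
        if interest > threshold:
            excluded.append(position)
            cleaned.append(None)  # treated as missing when the model is fitted
        else:
            cleaned.append(quantity)
    return cleaned, excluded


# Hypothetical weekly order quantities and search-interest values.
orders = [120, 135, 410, 128, 390, 140]
trends = [22, 25, 91, 24, 88, 27]
cleaned, dropped = exclude_peak_periods(orders, trends, threshold=80)
print(cleaned, dropped)
```

Marking excluded periods as missing, rather than deleting them, keeps the time index intact so that seasonal patterns stay aligned.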

Knowing that the previously described techniques were able to reduce demand uncertainty, and relating this to the theory described in Section 2.4, the impact of the improved demand forecast accuracy on the safety inventory and carrying costs was investigated following the corresponding formulas in that section. As expected, the cost savings were significant for all types of promotional demand drivers and for two out of four seasonal product types (those where the share of new products was high). Hence, managing and analyzing the data in a smart way resulted in overall cost savings of 959,960.69 € per year. It is clear that manual interventions for these types of products were too computationally expensive and should be limited by using advanced big data analytics. Therefore, the advice to Multipharma is to capture the required data sources in a more structured way to improve automation, and to use this data in a smart way to improve the demand forecast accuracy. It should be noted, though, that extracting Google Trends data has to occur manually because there is no automated procedure to obtain the required search results. Hence, the time and cost to extract this data should be weighed against the cost savings of inserting the additional information. From the previously described analysis it is clear that extracting this data is worth the effort when dealing with a high percentage of new products.

[29] Note that all main and interaction effects of Google Trends, precipitation and average temperature were evaluated using a stepwise linear regression to obtain the final model that could be used in SAS Forecast Studio.

To summarize, one can conclude that big data analytics offers a significant benefit for Multipharma: adding internal and external data sources to the existing automated demand forecasting, which is primarily based on historical order quantities, reduces the forecast error and, more importantly, the safety inventory and corresponding carrying costs. In general, we can state that using advanced forecast capabilities (i.e. modelling events, using advanced technologies, etc.) as described in Section 2.3 helps transform a company into a leader within its industry, creating a competitive advantage over the others. Using a company’s internal and external data sources in a smart way to improve demand forecast accuracy may have a direct impact on the company’s inventory management, thereby reducing its safety stock and carrying costs. Hence, the insights of this case study can be extended to companies in other industries that feel the same pressure to be cost efficient and fulfil high customer service levels. More specifically, this research is particularly valuable for wholesale companies with a high level of product variety that are searching for more advanced techniques to better match supply and demand (most of the time orders from retailers). With this case study a contribution was made to the literature, showing that data-driven demand prediction reduces the gap between supply and demand and has a positive impact on safety inventory, thereby raising a company’s overall profitability.



Chapter 9

Further Research

The supply chain to which Multipharma belongs is a ‘multi-echelon’ or ‘multi-level’

production and distribution network. This implies that the products move through more than

one step before reaching the final customer. Multipharma serves as an intermediate storage

point between the suppliers of the pharmaceutical products and the retailers. Its inventory

allows ‘risk pooling’ among the retailers and facilitates redistribution of the retailers’

inventories that might grow out of balance (Nahmias, 2009). In such a ‘multi-echelon’ supply

chain, all stages of the supply chain have to work toward the objective of maximizing total

supply chain profitability (Chopra & Meindl, 2013). Although this seems evident, in practice

multi-echelon supply chain optimization is a major challenge and mainly untouched area for


According to a presentation by Desmet (2017), only 27 per cent of all companies make use of advanced planning systems. Hence, it can be said that Multipharma is part of a minority of companies that use advanced analytics to optimize their demand forecast, planning and inventory system. An even smaller percentage (13%) of companies has evolved to multi-echelon supply chain optimization (Figure 9.1). With multi-echelon supply chain optimization it is possible to minimize inventory levels across the different echelons of the supply chain. A central decision-maker determines all replenishment decisions in the network based on continuously or periodically updated information about all inventories of all products at all relevant facilities and production stages (Federgruen, 1993). As a consequence, investing in a transportation and information infrastructure is required to facilitate the effective flow of goods and information. The product should be available when the customers need it. Hence, a very responsive replenishment system along with an outstanding information system is required. Sharing information across the supply chain improves the utilization of supply chain assets and the coordination of supply chain flows. However, more information is not always better, because the cost and complexity of the infrastructure and the analysis increase exponentially. Making a trade-off between complexity and value is important when considering the information flow. It is important that shared information goes along with good supply chain coordination (Chopra & Meindl, 2013). The joint report of SAS and Purdue University (2008) shows the top five actions, which are clearly aligned, to improve demand management (Figure 9.1). It is remarkable that improved collaboration to create forecasts is listed as the top strategy. According to Desmet (2017), moving from a single-echelon optimization system to an inter-organizational multi-echelon optimization system might reduce the inventory by 40 to 60 per cent (cf. Figure 9.2). Moreover, the report of SAS and Purdue University states that a forecast at the customer level is another key strategy to improve demand management. Hence, improving the forecast at the lower levels of the supply chain (e.g. by using additional data sources) might be a strategy to improve the overall demand management of the supply chain.

Figure 9.1: Key strategic actions to improve demand management

Source: Purdue University & SAS Demand Management Survey 2008 © All rights reserved

Figure 9.2: How companies define safety stocks

Source: Slideshare of Desmet B. (Solventure) © All rights reserved



Based on this information it can be concluded that Multipharma might benefit from moving to a multi-echelon system and creating more visibility and coordination across its supply chain. As known from Chapter 2 (Section 2.4), inventory is an important cost component, and reducing the inventory across the entire supply chain might create benefits for all partners involved in providing the product to the end consumer. The network design of Multipharma, as represented in Figure 3.3, shows that Multipharma might improve demand and inventory management, and consequently lower the carrying costs, by sharing information directly with its suppliers and especially with its pharmacies (POS). As explained in Chapter 3, the SAP system should capture and share global information from within the company and across its supply chain. However, the latter is not yet fully in place, and Multipharma is currently optimizing its warehouse system according to the single-echelon system. The global information should be used in a smart way to increase the overall supply chain revenue. Hence, the quality of operational decisions should improve based on real-time information and optimization. Combining this with creating a better (data-driven) demand forecast at the level of the pharmacies (e.g. by incorporating external data sources such as pharmacy location, customer demographics, etc.) might be an interesting topic for future research.




Accenture. (2008, December 11). Most U.S. Companies Say Business Analytics Still Future

Goal, Not Present Reality.

Retrieved from

Accenture. (2016, December 5). [Guest Lecture Big Data] [College-slides]. Retrieved from


Alvarez, P. (2015). Product Classification in Healthcare. Retrieved from


Billah, B., King, L. M., Snyder, D. R., & Koehler, B. A. (2006). Exponential smoothing

model selection for forecasting (International Journal of Forecasting 22, p. 239– 247).

Retrieved from

Bose, (2009) "Advanced analytics: opportunities and challenges", Industrial Management &

Data Systems, Vol. 109 Iss: 2, p.155 – 172.

Retrieved from

Brynjolfsson, E., & McAfee, A. (2012, October). Big Data: The Management Revolution.

Retrieved from

Cachon, G., & Fisher, M. (1997). Campbell's soup's continuous replenishment program:

evaluation and enhanced inventory decision rules (Vol. 6, No. 3).

Retrieved from

Chen, M., Mao, S., & Lin, Y. (2014). Big data: A Survey.

Retrieved from

Chopra, S., & Meindl, P. (2001). Supply Chain Management - Strategy, Planning and

Operation (6e ed.). Harlow, England: Pearson Education Limited.

Coghlan, T., Diehl, G., Karson, E., Liberatore, M., Luo, W., Nydick, R., Pollack-Johnson, B.,

Wagner, W. (2010). The current state of analytics in the corporation: The view from industry

leaders. Internat. J. Bus. Intelligence Res. Forthcoming.

Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of

Winning (Rev. ed.). Brighton, US: Harvard Business Press.

Desmet, B. (2017, February 13). Safety Stock Modelling in Multi-Echelon Supply Chains

[College-slides]. Retrieved from




Durka, P., & Pastoreková, S. (n.d.). ARIMA vs. ARIMAX – which approach is better to

analyze and forecast macroeconomic time series? (Proceedings of 30th International

Conference Mathematical Methods in Economics). Retrieved from

Emani, C. K., Cullot, N., & Nicolle, C. (2015). Understandable Big Data: A survey (p. 70-

81). Retrieved from

Eysenbach, G. (2006). Infodemiology: Tracking Flu-Related Searches on the Web for

Syndromic Surveillance.

Retrieved from

Eazystock. (2015). How to Calculate Safety Stock for Inventory Management [Whitepaper].

Retrieved from


Federgruen, A. (1993). Handbooks in Operations Research and Management Science.

Retrieved from


Fildes, R. (1989). Evaluation of Aggregate and Individual Forecast Method Selection Rules

(Page Range: 1056 - 1065). Retrieved from

Fritsch, D. (2015, August 3). 6 Inventory Control Techniques for Stock Optimization

[Blogpost]. Retrieved from


Gaiya, A. (2016, March 11). Basic knowledge about time series econometrics [Blog

comment]. Retrieved from


Ginsberg, J., Mohebbi, H. M., Patel, S. R., Brammer, L., Smolinski, S. M., & Brilliant, L.

(2009). Detecting influenza epidemics using search engine query data.

Retrieved from

Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M., Watts, D. (2010). Predicting consumer

behavior with Web search.

Retrieved from

Harrington, L. (1996), “Untapped savings abound”, Industry Week, 15 July, pp. 53-8.

Hillier, F. S., & Lieberman, G. J. (2015). Introduction to Operations Research (10e ed.). New

York, United States: Mc Graw Hill.



Hopp, W. J., & Spearman, M. L. (2008). Factory Physics (3rd ed.). Chicago, Waveland:


IBM. Bringing big data to the enterprise. Retrieved November 15, 2016 from

Jaume, B. (2015). Analytics and the art of modeling (Pages 429–471). Retrieved from

King, G. (2016, March 17). Big Data is Not About the Data! [Slideshare].

Retrieved from

King, P. L. (2011). Crack The Code: Understanding safety stock and mastering its equations.

Retrieved from

Klous, S., & Wielaard, N. (2014). Wij zijn Big Data - De toekomst van de

informatiesamenleving. Amsterdam, Nederland: Business Contact.

Knilans, E. (2014, July 29). The 5 Vs of Big Data [Blog post]. Retrieved from

Knuth, C., Fritsch, D., Seidel, D., Hallin, J., & Bendis, M. (2014, October 27). Forecasting

Accuracy: How to Manage Demand Outliers [Blog post].

Retrieved from


LaValle, S., Lesser, E., Shockley, R., Hopkins, M., & Kruschwitz, N. (2011). Big Data,

Analytics and the Path From Insights (VOL.52 NO.2).

Retrieved from

Lee, J., Kao, H., & Yang, S. (2014). Service innovation and smart analytics for Industry 4.0

and big data environment. Retrieved from



Li, J., Cheng, Y., & Zhao, L. (2015). Big Data in product lifecycle management (p. 667-684).

Retrieved from

Liberatore, M. J., & Luo, W. (2010). The Analytics Movement: Implications for Operations

Research (p. 313 - 324).

Retrieved from

Lohr, S. (2015). Dit is big data wat het is, hoe het werkt en wat het oplevert. Amsterdam,

Nederland: Maven Publishing.

Madden, S. (2012). From Databases to Big Data.

Retrieved from



Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, H. A.

(2011). Big data: The next frontier for innovation, competition, and productivity.

Retrieved from file:///Users/ranitorrekens/Downloads/MGI_big_data_exec_summary.pdf

Massaal veel muggen deze zomer. (2016, June 10). Retrieved from


Mazzocchi, F. (2015). Could Big Data be the end of theory in science? (EMBO reports

(2015)). Retrieved from

McKinsey&Company. (2015). Industry 4.0. How to navigate digitization of the

manufacturing sector. Retrieved from

Nahmias, S. (2009). Production & Operations Analysis (6e ed.). New York, US: McGraw-


Provideor (2015). Supply chain platform: blueprint - central warehouse phase.

More data is not always better data. (2015, November 17). Retrieved from

Polgreen, M. P., Chen, Y., Pennock, M. D., & Nelson, D. F. (2008). Using internet searches

for influenza surveillance. Retrieved from


Purdue University & SAS. (2008). Demand Planning Maturity Model Strategies for Demand-

Driven Forecasting and Planning (Whitepaper). Retrieved from


Ranst, M. V. (2015, February 16). "Vijf tot zes keer meer griepgevallen dan vorig jaar".

Retrieved from


SAS Institute Inc. (2009). How can finance and operations work together to maximize

inventory provisions while minimizing working capital costs?. Retrieved from


SAS Institute Inc. (2014). SAS® Forecast Studio 13.2: User’s Guide. NC, USA: Cary.

Shah, N. (2004). Pharmaceutical supply chains: key issues and strategies for optimisation

(Volume 28, Issues 6–7).

Retrieved from



Supply chain als wissel: Multipharma creëert supply chain organisatie (2012). Retrieved from

Van der Zee, B., & van der Zee, W. (2016). Succes met big data. Culemborg, Nederland: Van

Duuren Informatica.

Vermorel, E. (n.d.). Inventory costs (Ordering costs, Carrying costs). Definition and formula.

Retrieved from

Waller, M. A., & Fawcett, S. A. (2013). Data Science, Predictive Analytics, and Big Data:

A Revolution That Will Transform Supply Chain Design and Management (p. 77-84).

Retrieved from

Wolfe, B., Leonard, M., & Fahey, P. (n.d.). Introducing SAS® Forecast Studio.

Retrieved from

Yin, S., & Kaynak, O. (2015). Big Data for Modern Industry: Challenges and Trends (Vol.

103, No. 2). Retrieved from

Zhang, P. G. (2003). Time series forecasting using a hybrid ARIMA and neural network

model (Neurocomputing 50, p. 159 – 175).

Retrieved from

[30] The original link has been removed, probably because the article contained unreliable (over-optimistic) information about the way of working of Multipharma. Note that this article was discussed in full with the current Supply Chain Manager; the information in this master dissertation is therefore more reliable.


Appendix A

Table A.1: IMS top level classification

| IMS top level code | Label |
|---|---|
| 1 | ATC (Available To Counter) |
| 2 | OTC (Over The Counter) |
| 3 | PEC (Parapharmacy) |
| 4 | PAC (Parapharmacy Accessories) |
| 5 | NUT (Nutrition) |

Table A.2: APB Classification

| Code | Description |
|---|---|
| A | Accessoires |
| B | Bandage et Pansement |
| C | Cosmétique |
| D | Diététique - Nutrition - Alimentation |
| E | Hygiene |
| F | Pesticide à usage agricole |
| G | Dispositif médical |
| H | Homeopathie |
| I | Stomie et Incontinence |
| K | Biocide |
| M | Matière première |
| O | Autres |
| P | Divers timbre APB |
| R | Reactif |
| S | Spécialités |
| T | Moyen diagnostique |
| V | Veterinaire |
| Z | Autres (sans contrôle CNK) |


Table A.3: GSTAT Classification

| Code | Description |
|---|---|
| 100 | Specialités |
| 101 | Matières de base |
| 102 | Varia (Medical) |
| 103 | Peripharmaceutical Usage Externe |
| 104 | Peripharmaceutical Accessoires |
| 105 | Peripharmaceutical Usage Interne |
| 106 | Homeotherapie & Autres Therapies |
| 107 | Bandagisterie |
| 108 | Divers |


Appendix B

Exponential Smoothing[31]

SAS Forecast Studio takes into account all kinds of exponential smoothing models (ESM) that capture common features of time series such as trend and seasonal effects. In other words, based on the time series data the best model is selected, which is a derivation of simple exponential smoothing, trend-corrected exponential smoothing or Holt-Winters’ seasonal exponential smoothing (SAS Institute Inc., 2014). Using the simple ESM, the current forecast (i.e. the one-step-ahead forecast for period t made in period t-1) can be estimated as the weighted average of the last forecast and the current value of demand (Nahmias, 2009). That is, the new forecast equals $\alpha$ times the current observation of demand plus $(1 - \alpha)$ times the last forecast. In symbols,

$$F_t = \alpha D_{t-1} + (1 - \alpha) F_{t-1} \qquad \text{(B.1)}$$

where $F_t$ is the forecast for period t, $D_{t-1}$ is the most recently observed demand and $\alpha$ is a smoothing constant between 0 and 1 chosen by the user. The forecaster can attach larger weights to more recent observations (large $\alpha$) than to observations from the distant past, or vice versa (small $\alpha$), which is exactly why these kinds of models are chosen. The best value of $\alpha$ will depend on the particular data (Hopp & Spearman, 2008). SAS Forecast Studio automatically selects the most appropriate parameter for any exponential smoothing method by minimizing a chosen error criterion (e.g. MAPE).

Most time series can be modelled using a trend- and seasonality-corrected ESM (Winters’ model). The equations build on equation B.1 of the simple ESM; computing the forecast manually is outside the scope of this master dissertation.
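To make the simple ESM concrete, the following Python sketch applies the recursion of equation B.1 and picks the smoothing constant by a grid search on the in-sample MAPE. This is only an illustration of the idea of minimizing an error criterion; it is not the procedure SAS Forecast Studio uses internally. Initialising the first forecast with the first observation and the candidate grid of 0.05 to 0.95 are assumptions.

```python
def ses_forecasts(demand, alpha):
    """One-step-ahead forecasts F_1..F_n from F_t = alpha*D_{t-1} + (1-alpha)*F_{t-1}."""
    forecast = demand[0]  # assumption: the initial forecast equals the first observation
    forecasts = []
    for observed in demand:
        forecast = alpha * observed + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def best_alpha(demand, candidates=None):
    """Smoothing constant with the lowest in-sample MAPE among the candidates."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

    def in_sample_mape(alpha):
        forecasts = ses_forecasts(demand, alpha)
        # forecasts[i] is the forecast for period i + 1, so compare with demand[1:]
        return mape(demand[1:], forecasts[:-1])

    return min(candidates, key=in_sample_mape)


history = [10, 20, 30, 40, 50]  # hypothetical, steadily growing demand
print(ses_forecasts(history[:3], 0.5), best_alpha(history))
```

For a series with a steady trend the search favours a large smoothing constant, because recent observations carry most of the information; a noisy but level series would favour a smaller one.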

31 Source:


ARIMA Models

According to Zhang (2003), ARIMA models are among the most popular linear models for time series forecasting. ARIMA models are a special type of regression model in which the future value of a variable is a linear function of several past observations and random errors. Usually, a time series can be decomposed into a trend, seasonal, cyclical and irregular component. Seasonality is a particular type of autocorrelation pattern that recurs every ‘season’. Seasonality must be corrected before a time series can be fitted to the model. The underlying process that generates the time series has the form

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \qquad \text{(B.2)}$$

where $\theta_0$ is a constant, $y_t$ and $\varepsilon_t$ are the actual value and the random error at time period t, respectively, and $\phi_i$ (i = 1, 2, ..., p) and $\theta_j$ (j = 1, 2, ..., q) are model parameters, where p and q are integer values often referred to as the orders of the model. Random errors $\varepsilon_t$ are assumed to be independently and identically distributed with a mean of zero and a constant variance of $\sigma^2$. Hence, values of $y_t$ are affected by the values of $y$ in the past (lags). Doing a regression without lags fails to account for the relationships over time and overestimates the relationship between the dependent and independent variables.

ARIMA models are quite flexible because they can represent different types of time series, namely pure autoregressive (AR), pure moving average (MA) and combined AR and MA (ARMA) series. Regressors can be added to the right-hand side of the forecasting equation of an ARIMA model, which is then extended to an ARIMAX model.

If q = 0, then equation B.2 becomes a pure AR model of order p. AR models are models in which the value of a variable in one period is related to its values in the previous periods. AR(p) is an AR model with p lags:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

where $\theta_0$ is the constant and $\phi_p$ the coefficient for the lagged variable in time t-p. Although this model seems very similar to a standard regression equation, the difference is that in AR models it is likely that the variables are correlated (Nahmias, 2009).
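As a small illustration of the AR model above, the following Python sketch computes a one-step-ahead forecast from given coefficients. The function name and all numbers are hypothetical; in the case study the coefficients are estimated by SAS Forecast Studio, not set by hand. The random error has an expected value of zero, so it drops out of the forecast.

```python
def ar_forecast(history, constant, coefficients):
    """One-step-ahead AR(p) forecast: constant + phi_1*y_{t-1} + ... + phi_p*y_{t-p}."""
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    recent = history[::-1][:p]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return constant + sum(phi * y for phi, y in zip(coefficients, recent))


# Hypothetical AR(2) model with constant 1.0, phi_1 = 0.5 and phi_2 = 0.25.
observations = [5, 7, 6, 8]
next_value = ar_forecast(observations, constant=1.0, coefficients=[0.5, 0.25])
print(next_value)  # 1.0 + 0.5 * 8 + 0.25 * 6 = 6.5
```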


When p = 0, the model reduces to a pure MA model of order q. MA models account for the

possibility of a relationship between a variable and the residuals from the previous periods.

is a MA with q lags


where is the coefficient for the lagged error term in time t-q. SAS models calculate with

a reversed sign.

The two previous models can be combined into an autoregressive ARMA model. It combines

both p AR and q MA terms, that is why it is called an model.


Modelling an model requires stationary variables. A stationary process has a

mean and variance that do not change over time and the process does not have trends. If the

variable is not stationary it can be detrended by regressing the variable on a time trend and

obtaining the residuals. Another way to adapt the original process is differencing. When a

variable is not stationary a differenced variable can be used. The first order differentiation

of the variable is represented by the following equation


If the variable is stationary after first differencing, the variable is called integrated of order one. The ARMA model is then called an ARIMA model, where the ‘I’ stands for integrated. An ARIMA(p, d, q) denotes an ARMA model with p AR lags, q MA lags, and differencing of order d (SAS Institute Inc., 2014).
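Differencing of order d amounts to applying the first difference d times. The following Python sketch shows this on a made-up series; the function name `difference` and the data are illustrative assumptions.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: each pass maps y_t to y_t - y_{t-1}."""
    result = list(series)
    for _ in range(d):
        # Pair each value with its predecessor; the result is one element shorter.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Illustrative (made-up) series with an accelerating trend.
series = [3, 5, 8, 12, 17]
print(difference(series, d=1))  # [2, 3, 4, 5]  (still trending)
print(difference(series, d=2))  # [1, 1, 1]     (constant after two differences)
```

Each pass shortens the series by one observation, so a series of length n leaves n - d points for estimating the ARMA part.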

The Box-Jenkins methodology proposes using the ACF and the PACF of the sample data as the basic tools to identify the order of the ARIMA model (Zhang, 2003). In the case study, however, SAS Forecast Studio defines the parameters automatically and makes the time series stationary where needed. The parameters are estimated such that an overall measure of errors is minimized, so that an appropriate ARIMA model is fitted to the time series.
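For readers who want to inspect the identification step manually, the sample ACF can be computed in a few lines of Python. This is a minimal sketch of the standard sample autocorrelation estimator, not a description of what SAS Forecast Studio does internally; the function name `sample_acf` and the data are illustrative, and the PACF is not covered.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series.

    r_k = sum_{t=k}^{n-1} (y_t - mean)(y_{t-k} - mean) / sum_t (y_t - mean)^2
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(x * x for x in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Illustrative (made-up) series.
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

In the Box-Jenkins approach, a slowly decaying ACF points to non-stationarity (and thus to differencing), while a sharp cut-off after lag q suggests an MA(q) component.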

Appendix C

The overall weighted MAPE is automatically calculated by SAS Forecast Studio. The

weighted MAPE can be obtained manually using SAS coding following the equations

described in this section (SAS Institute Inc., personal communication, March 15, 2017). It is

important to understand the underlying principle to extract meaningful conclusions in chapter


For each SKU (k) the weighted MAPE can be calculated as


where $\mathrm{MAPE}_k$ is calculated based on equation 4.2, the mean $\bar{y}_k$ is obtained from the series properties and $n_k$ is the number of forecasted points in time.

The overall weighted MAPE is given as


An obvious conclusion is that products with high order volumes (high mean) contribute more

to the overall weighted MAPE.
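The volume effect can be illustrated with a short Python sketch. It rests on an assumption that is not confirmed by the text above: each SKU's MAPE is weighted by its series mean multiplied by its number of forecasted points, and the overall figure is the weighted average over all SKUs. The function name and the numbers are hypothetical.

```python
def overall_weighted_mape(mapes, means, n_points):
    """Volume-weighted average of per-SKU MAPEs.

    ASSUMPTION: the weight of SKU k is mean_k * n_k. This is one plausible
    reading of the weighting, not a formula confirmed by the source.
    """
    if not (len(mapes) == len(means) == len(n_points)):
        raise ValueError("all inputs must have one entry per SKU")
    weights = [m * n for m, n in zip(means, n_points)]
    total = sum(weights)
    if total == 0:
        raise ValueError("total weight is zero")
    return sum(w * mape for w, mape in zip(weights, mapes)) / total


# Illustrative (made-up) values: a high-volume SKU with a low MAPE
# and a low-volume SKU with a high MAPE.
result = overall_weighted_mape(mapes=[10.0, 50.0], means=[100.0, 10.0], n_points=[12, 12])
```

Under this assumption the result lies much closer to the MAPE of the high-volume SKU (10) than to the unweighted average of 30, which is the behaviour described above.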

Appendix D


SKU                  MAPE flu (baseline forecast)   Model (baseline forecast)   MAPE flu (with independent variables)   Model (with independent variables)
000000000001646700   16.85157735                    ESM                         17.16675922                             ARIMA³²
000000000001674400   23.12974508                    ESM                         29.92862217                             ARIMA
000000000002014800   32.18443185                    ESM                         43.23051178                             ARIMA
000000000002015200   264.508236                     ARIMA                       264.6355322                             ARIMA
000000000002671600   57.96121079                    ESM                         59.52220063                             ESM
000000000002778000   31.35117121                    ARIMA                       32.80230246                             ARIMA
000000000006532000   26.96774332                    ARIMA                       30.6527438                              ESM
000000000008482332   49.73711485                    ARIMA                       55.15880607                             ARIMA
000000000008498308   24.32528277                    ESM                         30.63013023                             ARIMA
000000000008509647   77.55271236                    ARIMA                       77.81139202                             ARIMA
000000000008524814   16.96414779                    ESM                         19.7677253                              ARIMA
000000000008528880   20.24505678                    ESM                         34.15204477                             ARIMA
000000000008534244   36.13061118                    ESM                         42.18745951                             ARIMA
000000000008538617   22.7111718                     ESM                         23.74616356                             ESM
000000000008563827   34.15609401                    ESM                         44.04444268                             ARIMA
000000000008564583   68.3913707                     ESM                         71.80095335                             ARIMA
000000000008564709   38.52869719                    ESM                         40.33744397                             ESM
000000000008566370   16.06122936                    ARIMA                       16.19240805                             ARIMA
000000000008567328   53.75199765                    ARIMA                       53.76211492                             ARIMA
000000000008567341   71.45790345                    ESM                         72.7242423                              ESM
000000000008568431   24.79119254                    ARIMA                       26.29840171                             ARIMA
000000000008568432   29.59308524                    ARIMA                       31.40670346                             ARIMA
000000000008571230   19.95005532                    ESM                         28.35511216                             ARIMA
000000000008572526   78.85943112                    ARIMA                       78.85943112                             ARIMA
000000000008572591   49.09590779                    ESM                         72.49552017                             ARIMA
000000000008572592   33.75106563                    ESM                         44.12574413                             ARIMA
000000000008573326   65.93389323                    ESM                         70.18730001                             ARIMA
000000000008573675   49.05096795                    ARIMA                       50.44575016                             ARIMA
000000000008576888   19.76011182                    ESM                         19.81337922                             ARIMA
000000000008576894   75.91731221                    ESM                         90.67450724                             ARIMA
000000000008577388   17.94634802                    ESM                         24.02464152                             ARIMA
000000000008580280   27.95232024                    ESM                         38.98416035                             ARIMA
000000000008580287   50.53142171                    ARIMA                       54.37348375                             ARIMA
000000000008580289   57.32347918                    ARIMA                       73.20493648                             ARIMA
000000000008585181   49.48556489                    ARIMA                       52.7898008                              ARIMA
000000000008588386   29.92816802                    ESM                         31.54511718                             ARIMA
000000000008588387   24.37825548                    ARIMA                       41.038808                               ARIMA
000000000008588400   29.18554857                    ARIMA                       29.73286455                             ARIMA
000000000008597893   34.76248739                    ARIMA                       39.2599199                              ARIMA
000000000008598739   22.99827849                    ESM                         24.00223766                             ARIMA
000000000008609020   69.67138069                    ESM                         70.34677546                             ESM
000000000008609021   54.79938351                    ESM                         69.98649723                             ARIMA
000000000008617323   44.61712284                    ESM                         48.03350173                             ARIMA
000000000008618031   41.49044749                    ARIMA                       46.02573258                             ARIMA
000000000008620854   15.09638952                    ESM                         20.84070625                             ARIMA
000000000008628669   56.05047071                    ESM                         58.5457714                              ESM
000000000008628671   73.71036699                    ARIMA                       75.27709823                             ESM
000000000008637521   49.03898365                    ESM                         62.90410912                             ARIMA
000000000008638142   24.35509915                    ESM                         26.40111577                             ARIMA
000000000008641331   63.48802283                    ESM                         72.49928241                             ARIMA
000000000008642202   50.73014213                    ARIMA                       50.73014213                             ARIMA
000000000008642203   34.09164496                    ARIMA                       37.08755452                             ARIMA
000000000008646136   30.05750644                    ESM                         47.3129377                              ARIMA
000000000008647202   70.06258325                    ESM                         86.87830014                             ARIMA
000000000008649683   68.5656189                     ARIMA                       82.93603379                             ARIMA
000000000008651290   44.24827483                    ESM                         44.24856052                             ESM
000000000008655007   50.34367119                    ESM                         50.96364378                             ARIMA
000000000008656040   37.2721033                     ESM                         76.45641654                             ARIMA
000000000008656043   41.28673049                    ESM                         49.532739                               ARIMA
000000000008672768   34.30563267                    ESM                         40.55378111                             ARIMA

³² More specifically ARIMAX, but SAS Forecast Studio does not distinguish between ARIMA and ARIMAX in its notation.

Table D.1: Selection of SKUs with a higher MAPE when the independent variables are included (N = 61, out of a selection of 198 SKUs)

Appendix E

Similar to the flu-related products, a stepwise linear regression including all combinations of the independent variables (i.e. Google Trends, precipitation and average temperature) is executed to define the extended model for sunscreens, mosquito products and insecticides. The AIC is used to select the predictors; the selection stops when adding or removing an additional effect does not reduce the AIC.
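The stopping rule can be sketched in Python as a generic stepwise search over sets of effects. This is a simplified illustration, not the SAS procedure used in the case study: the function name `stepwise_select`, the `forced` argument (effects that are always kept in the model) and the AIC values in the toy lookup table are all hypothetical.

```python
def stepwise_select(candidates, criterion, forced=()):
    """Stepwise selection: add or remove one effect at a time while the
    criterion (e.g. AIC, lower is better) keeps decreasing.

    candidates: effects that may enter the model.
    criterion:  function mapping a frozenset of effects to a score.
    forced:     effects that are always kept in the model.
    """
    selected = list(forced)
    best = criterion(frozenset(selected))
    while True:
        # All models reachable by adding one candidate or removing one non-forced effect.
        moves = [selected + [e] for e in candidates if e not in selected]
        moves += [[x for x in selected if x != e] for e in selected if e not in forced]
        if not moves:
            return selected, best
        score, move = min(((criterion(frozenset(m)), m) for m in moves),
                          key=lambda pair: pair[0])
        if score >= best:  # no single step reduces the criterion: stop
            return selected, best
        selected, best = move, score


# Made-up AIC values for every subset of three candidate effects (illustration only).
I = "Intercept"
toy_aic = {
    frozenset({I}): 100.0,
    frozenset({I, "Trends"}): 90.0,
    frozenset({I, "Temp"}): 95.0,
    frozenset({I, "Rain"}): 101.0,
    frozenset({I, "Trends", "Temp"}): 88.0,
    frozenset({I, "Trends", "Rain"}): 91.0,
    frozenset({I, "Temp", "Rain"}): 96.0,
    frozenset({I, "Trends", "Temp", "Rain"}): 89.0,
}
model, aic = stepwise_select(["Trends", "Temp", "Rain"], toy_aic.__getitem__, forced=[I])
```

With these toy values the search enters Trends, then Temp, and stops because neither adding Rain nor removing an effect lowers the criterion further. In practice the criterion function would refit the regression for each candidate set of effects.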

Step   Effect Entered      Number Effects In   AIC            SBC
0      Intercept           1
       FLAG_MF             2
       SWITCH_TO_PROMO     3                   258596.826     228005.814
1      Trends              4                   255326.909     224744.225
2      Trends*Temp         5                   255192.796     224618.442
3      Temp                6                   255143.984     224577.959
4      Trends*Temp*Rain    7                   255115.344     224557.649
5      Temp*Rain           8                   255054.620*    224505.254*

* Optimal Value of Criterion

Table E.1: Sunscreens: Stepwise Linear Regression

Step   Effect Entered      Number Effects In   AIC            SBC
0      Intercept           1
       FLAG_MF             2
       SWITCH_TO_PROMO     3                   38467.2862     34702.0002
1      Mosquito_Total      4                   37894.4810*    34135.4331*

* Optimal Value of Criterion

Table E.2: Mosquito: Stepwise Linear Regression

Step   Effect Entered      Number Effects In   AIC            SBC
0      Intercept           1
       FLAG_MF             2
       SWITCH_TO_PROMO     3                   50814.9718     45111.9275
1      Trends*Temp         4                   50693.8673*    44997.4749*

* Optimal Value of Criterion

Table E.3: Insecticides: Stepwise Linear Regression
