Confidentiality Clause
PERMISSION
I declare that the content of this Master Dissertation can be consulted and/or reproduced if the sources
are mentioned.
Name student: Rani Torrekens
Signature:
Nederlandse Samenvatting (Dutch Summary)
Business Analytics is an emerging phenomenon that reflects the growing importance of large
amounts of data in terms of increasing volume, variety and velocity (Department for Business
Innovation and Skills, 2013). Analytics acknowledges that we live in an era in which large
amounts of data play a central role. Commercial organizations, governments and communities are
investigating how they can use their large data volumes to create value (Yui, 2012).
Consequently, many organizations notice that the data they own, and especially how they use it,
can create a competitive advantage. Today, many organizations regard continuous data and
information gathering as one of their most important business activities. These large amounts
of data must be managed and analyzed appropriately in order to extract value from them.
Several researchers argue that the growing attention paid to analytics is an important challenge
and opportunity for Operations Research (Liberatore & Luo, 2010; Ranyard et al., 2015;
Mortenson et al., 2015). Operations research professionals will have to apply their optimization
and modelling knowledge in combination with the advanced analytical skills required to examine
large amounts of unstructured data. These advanced analytical techniques also have a positive
influence on the entire production chain. Data analysis can make a difference mainly in Research
and Development, supply chain management, and production and service. Today's dynamic market
needs have accelerated the discovery of new technologies in the production chain. Many
traditional approaches to the supply chain must be revised because they are outdated in this new
data environment (Waller & Fawcett, 2013). This phenomenon is also called Industry 4.0. Industry
4.0 makes it possible to capture value across the entire product life cycle. This master
dissertation focuses mainly on one specific component of Industry 4.0, namely demand forecasting
driven by big data. A better demand forecast, aimed at a smaller forecast error, matters not only
in production but for the entire supply chain in general. Less uncertainty, and consequently a
smaller error margin when forecasting demand, means that less inventory has to be held, which
naturally leads to better cost efficiency.
The actual objective of this master dissertation is to show how the combination of data science
and analytics can be used to improve supply chain management. The potential of advanced
analytical methods and big data is illustrated by means of a case study at a Belgian
pharmaceutical wholesaler, Multipharma. Since this company's bottleneck is its limited warehouse
space, the Supply Chain department is continuously looking for better solutions to improve its
operations, such as continuously optimizing demand forecasting and inventory. It will be
investigated whether new ‘demand drivers’ can be modelled using additional data sources and
thereby provide a better match between supply and demand.
Preface
This master dissertation gave me the opportunity to further explore my interest in how big
data analytics creates value in the Operations Management field. This interest was sparked
during my Master in Operations Management, when I chose two elective courses from the Master
in Data Analytics. I came to realize that big data analytics is an emerging phenomenon in the
Operations Management field and that the combination of both can create value in many
different ways. This master dissertation was an occasion to show what I have learned during
the past five years, to critically reflect upon the frequently repeated statement “more data is
always better” and to structure the topic in my own research. It gave me the chance to gain
valuable experience in the business environment and to understand the ups and downs of a
real-life project. With this research, I hope to make a valuable contribution to the literature
concerning the impact of big data on an Operations Management problem.
The case study following the literature review is conducted at an existing wholesale company,
Multipharma. Provideor, a supply chain consultancy company, gave me the opportunity to set
up a new project within Multipharma. I would like to thank Thomas Meersseman and Steven
Raekelboom from Provideor for their support, advice and time. Furthermore, I am grateful that
the Supply Chain manager of Multipharma, David Van Belle, gave me the opportunity to work in
close collaboration with the company and provided all kinds of internal information I needed
to write this dissertation. Special thanks to my promoter Broos Maenhout, who assigned this
interesting topic to me and who was always prepared to give instant and meaningful advice.
Finally, I would like to thank my parents and partner for supporting me at all times.
Contents
Confidentiality Clause
Nederlandse Samenvatting (Dutch Summary)
Preface
List of Abbreviations
List of Tables
List of Figures
Introduction
Chapter 1: Big Data
1.1 From databases to Big Data
1.2 What is Big Data?
1.3 Value of Big Data
Chapter 2: Operations Research
2.1 Evolution of Operations Research
2.2 Value for Operations Management
2.3 Data-driven demand prediction
2.3.1 Internal enterprise data
2.3.2 Causal factors
2.3.3 Model events
2.3.4 Technological characteristics
2.4 Inventory control
2.4.1 Measuring demand uncertainty
2.4.2 Measuring product availability
2.4.3 Safety stock formula
2.5 Conclusion
Chapter 3: Multipharma
3.1 Pharmaceutical supply chain
3.2 Introduction to Multipharma
3.3 Network description
Chapter 4: Methodology
Chapter 5: Demand Forecast
5.1 Data description and operating assumptions
5.2 Product Grouping
5.2.1 General product classification
5.2.1.1 New products
5.2.1.2 Existing products
5.2.2 Group levels
5.2.2.1 IMS Classification
5.2.2.2 INN/DCI Classification
5.2.2.3 APB Classification
5.2.2.4 GSTAT Classification
5.3 Model selection
5.4 Create forecast
Chapter 6: Demand Drivers
6.1 Setting priorities
6.2 Internal data
6.2.1 Passage Délégué
6.2.2 Promo O
6.2.3 Action iU
6.2.4 Summary of the results
6.3 External data
6.3.1 Selecting search terms
6.3.2 Flu-related products
6.3.3 Sunscreens
6.3.4 Mosquito products
6.3.5 Insecticides
6.3.6 Summary of the results
Chapter 7: Inventory
7.1 Safety inventory
7.2 Impact on costs
Chapter 8: Conclusion
Chapter 9: Further Research
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
List of Abbreviations
ACF Autocorrelation Function
AI Artificial Intelligence
AIC Akaike's Information Criterion
APB Algemene Pharmaceutische Bond
AR Autoregressive
ARIMA Autoregressive Integrated Moving Average
B2B Business to Business
BI Business Intelligence
CNK Code National(e) Kode
CSL Cycle Service Level
DBMS(s) Database Management System(s)
DC Distribution Center
DCI Les dénominations communes internationales
ERP Enterprise Resource Planning
ESC Expected Shortage per Cycle
ESM Exponential Smoothing Model
GFT Google Flu Trends
GSTAT Groupe Statistique
IDC International Data Corporation
IDM Intermittent Demand Model
INN International Nonproprietary Names
IoT Internet of Things
IT Information Technology
JIT Just-In-Time
KDE Kernel Density Estimation
KMI Koninklijk Meteorologisch Instituut
KPI Key Performance Indicator
MA Moving Average
MAPE Mean Absolute Percentage Error
OR Operations Research
OTC Over-The-Counter
OUL Order-Up-to-Level
PACF Partial Autocorrelation Function
PEC Parapharmacy
POS Point Of Sale
Promo O Promo Obligatoire
R&D Research and Development
RDBMS Relational Database Management System
RIZIV Rijksinstituut voor Ziekte- en Invaliditeitsverzekering
SCM Supply Chain Management
SKU(s) Stock Keeping Unit(s)
TCO Total Cost of Ownership
UCM Unobserved Components Model
USC Uniform System of Classification
WHO World Health Organisation
WIP Work In Progress
WMS Warehouse Management System
List of Tables
6.1 Promotional Periods of Vicks Vaporub 100g
6.2 Promotional Period of Tilman Elimin Fresh Thee
6.3 Promotional Period of Vichy Dercos Shampoo
6.4 Summary of MAPEs for promotional demand drivers
6.5 IMS class corresponding to seasonal type
6.6 Correlation of search terms
6.7 Flu: Search terms
6.8 Sunscreens: Search terms
6.9 Mosquito: Search terms
6.10 Fleas: Search terms
6.11 Ticks: Search terms
6.12 Worms: Search terms
6.13 Flu: Stepwise Linear Regression
6.14 Flu: SAS Forecast Results
6.15 Flu: SAS Forecast Results excl. peaks
6.16 Sunscreens: SAS Forecast Results
6.17 Sunscreens: SAS Forecast Results excl. peaks
6.18 Mosquito: SAS Forecast Results
6.19 Mosquito: SAS Forecast Results excl. peaks
6.20 Insecticides: SAS Forecast Results
6.21 Insecticides: SAS Forecast Results excl. peaks
6.22 Summary of MAPEs for seasonal products
6.23 Percentage of new products
7.1 Service Level based on ABC classification
7.2 Parameters for Normal Distribution
7.3 Goodness-of-Fit tests for Normal Distribution
7.4 Impact on safety inventory for ‘Passage délégué’
7.5 Impact on safety inventory for ‘Promo O’
7.6 Impact on safety inventory for ‘Action iU’
7.7 Impact on safety inventory for flu-related products
7.8 Impact on safety inventory for sunscreens
7.9 Impact on safety inventory for mosquito products
7.10 Impact on safety inventory for insecticides
7.11 Impact on cost for ‘Passage délégué’
7.12 Impact on cost for ‘Promo O’
7.13 Impact on cost for ‘Action iU’
7.14 Impact on cost for flu-related products
7.15 Impact on cost for sunscreens
7.16 Impact on cost for mosquito products
7.17 Impact on cost for insecticides
A.1 IMS top level classification
A.2 APB classification
A.3 GSTAT classification
D.1 Selection of SKUs with higher MAPE including the independent variables
E.1 Sunscreens: Stepwise Linear Regression
E.2 Mosquito: Stepwise Linear Regression
E.3 Insecticides: Stepwise Linear Regression
List of Figures
1.1 Traditional BI solution
1.2 Big data solution
1.3 5Vs Theory
2.1 Four steps that comprise a process view of analytics
2.2 The McKinsey Digital Compass
2.3 Summary leaders’ capabilities
2.4 Top pressures for data-driven demand forecasting
2.5 Periodic review policy
2.6 Impact on safety inventory
2.7 Safety factor of Standard Normal distribution and graphical representation
3.1 A pharmaceutical supply chain
3.2 Frequency count according to size of assortment
3.3 Network design
4.1 Coverage in days 2015-2016: Impact of SAS on inventory
4.2 Forecasting process of SAS Forecast Studio
5.1 Overview product classification
5.2 Characteristics of five types of new products
5.3 Hierarchical breakdown of disaggregated data
5.4 Distribution of products over the different forecasting methods
5.5 Prediction error ACF of randomly selected SKU
5.6 Prediction error PACF of randomly selected SKU
5.7 Forecast with out-of-stock period
5.8 Distribution MAPE with baseline model forecast
5.9 Distribution of model type of all SKUs with baseline model forecast
6.1 Priority schedule
6.2 Substitution between two products
6.3 Example Excel file of supplier
6.4 Followed procedure of analysis
6.5 Time series Vicks Vaporub 100g incl. promotional order quantities
6.6 Time series Vicks Vaporub 100g excl. promotional order quantities
6.7 Passage Délégué: Distribution of the MAPE
6.8 Time series Tilman Elimin Fresh Thee incl. promotional order quantities
6.9 Time series Tilman Elimin Fresh Thee excl. promotional order quantities
6.10 Promo O: Distribution of the MAPE
6.11 Time series Dercos Shampoo incl. promotional order quantities
6.12 Time series Dercos Shampoo excl. promotional order quantities
6.13 Action iU: Distribution of the MAPE
6.14 Google Trends data: ‘Griep’
6.15 Graphical representation of correlation of 2 search terms
6.16 GFT Belgium
6.17 Google Trends: Flu
6.18 Google Trends: Sunscreens
6.19 Google Trends: Mosquito
6.20 Google Trends: Insecticides
7.1 Distribution analysis of ‘Quantity’ of randomly selected SKU
7.2 Revised Periodic review policy
9.1 Key strategic actions to improve demand management
9.2 How companies define safety stocks
Introduction
“Big data is not about the data”
Gary King, Harvard University
Nowadays, big data is being produced by everything around us. Data is generated by multiple
sources at a staggering velocity, variety and volume, and is subsequently transmitted by
systems, sensors and mobile devices. Many fields, such as healthcare, science, marketing and
sports, have become data-driven. Every area is touched by a large amount of data that has to
be managed appropriately. However, big data is not totally new. Over the last decade, research
institutions and companies have collected large amounts of information from which new
information was generated. They all look for correlations, early indicators and cause-and-effect
relationships between phenomena, persons and events in order to make decisions based on these
findings. Over the last years, however, something has changed. The application of big data
analytics has broadened and now penetrates our daily lives. New possibilities emerge with the
rise of the Internet of Things (IoT), where all kinds of small and big devices are interconnected
with each other and with society in general (Klous & Wielaard, 2014; Lohr, 2015).
Many organizations have noticed that the data they own, and especially how they use it, can
create a competitive advantage. Data and information are becoming primary assets for many
organizations, which is why most organizations today try to collect as much data as possible.
This big data has to be managed and analyzed appropriately. According to King (2016), the real
value of data lies in the analytics. This new concept of gathering and analyzing extensive
amounts of data offers a whole new approach to organizational problem solving and to the
implementation of analytical solutions. As a consequence, the adoption of big data in Operations
Research (OR) is a phenomenon that evolves at an increasing speed (Hillier & Lieberman, 2015).
OR professionals now have to apply their optimization and modelling knowledge in combination
with more advanced analytical skills, which are necessary to examine large amounts of
unstructured data. However, taking advantage of big data analytics to promote the OR profession
is not as easy as it seems, because it is a relatively new and thus dynamic area. Consequently,
the innovation and development of analytical solutions characterized by the integrated use of
data, processes and systems is one of the key areas companies are focusing on today. By
combining descriptive and predictive analytics such as
data mining and statistics with prescriptive analytics such as optimization methods from OR,
one is able to develop new applications within an organization.
The objective of this master dissertation is to identify how the combination of data science
and analytics (i.e. big data analytics) can be used to improve Supply Chain Management (SCM).
More specifically, the central research question is whether demand forecast accuracy can be
improved using big data analytics. Advanced analytical techniques are applied to an existing
wholesale company, which is the subject of the case study. Prior to the case study, the literature
review will describe the impact of big data and analytics on SCM to provide meaningful insight
into the case study. The literature review is divided into two chapters. The first chapter
discusses the term big data in a general context. First, the evolution of the analytical movement
will be described, with emphasis on the difference between traditional databases (storage) and
big data (data analytics). Second, a definition of big data and an explanation of its main
characteristics are provided based on the ‘5Vs theory’. The terms Volume, Variety and Velocity,
which are the three major pillars of big data, will be clarified. The chapter concludes with a
discussion of a McKinsey report that states how big data offers value in five core industries.
The knowledge of this first chapter is necessary to understand why big data has become so
popular in the business environment. The second chapter illustrates the evolution from
traditional OR to the more advanced big data analytics as we know it today, and the value it
creates for Operations Management (OM). The section on OM outlines the implications and
advantages that Industry 4.0 (i.e. the digitization of the manufacturing sector) creates for
manufacturing companies. The link to the following sections of this chapter is McKinsey’s
Digital Compass, which relates the levers of Industry 4.0 to eight value drivers that have an
impact on the performance of a typical manufacturing company. The remaining part of the
literature review focuses on one lever of Industry 4.0 that might have an impact on SCM:
data-driven demand prediction. This lever may result in an improved demand forecast using big
data analytics and thus a better match between supply and demand. The direct impact of improved
demand prediction on supply chain optimization with regard to inventory will be described
theoretically in the final section of this chapter. This chapter is crucial to understanding the
insights of the case study.
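The link between forecast error and inventory announced in this paragraph can be sketched with the textbook safety stock formula for a periodic review policy. This is a minimal sketch under stated assumptions: forecast errors per period are independent and normally distributed, and the review period T and lead time L use the same time unit. The exact formula used in section 2.4.3 may differ. The function name `safety_stock` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(csl: float, sigma_error: float, review_period: float, lead_time: float) -> float:
    """Safety stock for a periodic review (order-up-to-level) policy.

    Hypothetical helper. Assumes per-period forecast errors are independent and
    normally distributed with standard deviation `sigma_error`, so the
    uncertainty over the protection interval (review_period + lead_time)
    grows with its square root.
    """
    if not 0 < csl < 1:
        raise ValueError("cycle service level must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(csl)  # safety factor for the target cycle service level (CSL)
    return z * sigma_error * sqrt(review_period + lead_time)


# Illustrative (made-up) numbers: weekly review, 3-week lead time, 95% CSL.
baseline = safety_stock(0.95, sigma_error=10.0, review_period=1, lead_time=3)
improved = safety_stock(0.95, sigma_error=5.0, review_period=1, lead_time=3)
print(f"baseline: {baseline:.1f} units, improved forecast: {improved:.1f} units")
```

In this sketch, safety stock scales linearly with the standard deviation of the forecast error, so halving that error halves the safety stock. This is the mechanism behind the inventory and cost effects that the case study examines.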
Part two of this master dissertation is a case study in collaboration with Multipharma, a
Belgian pharmaceutical wholesale company. Since the bottleneck of this company is its
limited warehouse space, the Supply Chain department is continuously looking for superior
solutions to improve its operations and optimize the inventory on hand. Following the
literature review, it will be investigated whether improved data-driven demand forecasting is
able to reduce the forecast error and thereby create a better match between supply and demand.
Moreover, the direct impact of better matching supply and demand on inventory will be
analyzed based on the theory described in section 2.4. After an introduction to the industry
and the company in chapter 3, the research methodology is set out in chapter 4. Chapter 5
describes the data used to execute the analysis and the advanced analytics that support the
decision-making process. Chapter 6 starts with a priority list of potential demand drivers that
may have the most substantial impact on demand variability and thus on the forecast error. Two
demand drivers are selected to investigate whether modelling them appropriately, using internal
and external data sources, improves demand forecast accuracy. The demand drivers will be
modelled using SAS code and the forecast will be executed using SAS Forecast Studio, which
shows the direct impact on the forecast error. If the forecast error is reduced, better inventory
decisions become possible, which in turn affects the objectives and the competitive position of
the company; this is the subject of chapter 7. Finally, the case study ends with a conclusion
(Chapter 8) and a discussion of the rising concept of the ‘multi-echelon supply chain’, which
might be a topic for further research (Chapter 9).
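The List of Tables indicates that forecast accuracy is summarized with the Mean Absolute Percentage Error (MAPE). The sketch below is a minimal version of that metric. Skipping periods with zero actual demand is an assumption, because MAPE is undefined there and other conventions exist. The demand figures are invented purely for illustration.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, expressed in percent.

    Periods with zero actual demand are skipped because the percentage error
    is undefined there (an assumption; other conventions exist).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(errors) / len(errors)


# Hypothetical weekly demand of one SKU, with a baseline forecast and a
# forecast that additionally models a demand driver.
actual = [100, 120, 80, 0]
baseline_forecast = [90, 130, 100, 5]
driver_forecast = [98, 118, 85, 2]
print(f"baseline MAPE: {mape(actual, baseline_forecast):.1f}%")
print(f"with demand driver: {mape(actual, driver_forecast):.1f}%")
```

Because MAPE is scale-free, it allows SKUs with very different sales volumes to be compared. Its drawback is that it becomes unstable for products with very low or intermittent demand.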
Part I
Literature
Chapter 1
Big Data
The big data era is characterized by the growth of social media, an explosion of mobile
devices and a physical world being outfitted with millions of networked sensors connected
through the Internet. These factors have resulted in an unprecedented growth in the types and
volumes of data available to businesses (Emani, Cullot, & Nicolle, 2015). According to Chen,
Mao and Lin (2014), the rapid growth of cloud computing and the IoT drives the sharp growth
of data. With IoT, sensors in devices all over the world collect and transmit data, which is
stored in the cloud. An International Data Corporation (IDC) report foresees that “from 2005
to 2020, the digital universe will grow by a factor of 300, from 130 Exabyte to 40,000
Exabyte” (Yin & Kaynak, 2015). This explosion of information confronts us with the challenge
of collecting and integrating massive data from widely distributed data sources. The large
amount of unstructured data far surpasses the capacities of the Information Technology (IT)
architectures and infrastructure of existing enterprises, which is the topic of section 1.1. Big
data is becoming widespread because of the increasing variety of sources that create data and
the increasing speed with which data is created (Section 1.2). The ‘datafication’ of society has
both positive and negative consequences. Nevertheless, the most widely accepted belief is that
big data enriches the entire world. Moreover, big data will penetrate our lives even more in the
upcoming years, because people can no longer do without the convenience, comfort and added
value of technology (Klous & Wielaard, 2014). This chapter concludes with a short discussion
of the added value of big data (Section 1.3).
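As a quick arithmetic check on the IDC figures quoted above, the sketch below computes the growth factor and the compound annual growth rate (CAGR) it implies over 2005 to 2020. The CAGR is a derived illustration and does not appear in the IDC report. The helper name is hypothetical.

```python
from typing import Tuple


def growth_summary(start_eb: float, end_eb: float, years: int) -> Tuple[float, float]:
    """Return the overall growth factor and the implied compound annual growth rate."""
    factor = end_eb / start_eb              # how many times larger the end value is
    cagr = factor ** (1 / years) - 1        # constant yearly rate giving the same factor
    return factor, cagr


# Figures from the IDC quote: 130 EB in 2005, 40,000 EB in 2020.
factor, cagr = growth_summary(130, 40_000, 2020 - 2005)
print(f"growth factor: {factor:.0f}x, implied annual growth: {cagr:.1%}")
```

The ratio 40,000 / 130 is about 308, which is consistent with the quoted factor of 300. Spread over 15 years, it corresponds to the digital universe growing by roughly 46% per year.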
1.1 From databases to Big Data
At the beginning of this century, before we even had a notion of the term big data, the volume
of data started to increase dramatically, a phenomenon known as the ‘information explosion’.
Unlike in previous years, human intervention was no longer needed to enter new data into
databases. Large amounts of data could be stored automatically thanks to new automated storage
systems. Because many researchers were intrigued by this ‘information boom’, it is regarded as
the start of the concept of big data as we know it and the end of conventional methods for data
management. Compared with conventional storage systems, big data is remarkable for its lack of
structure and its larger volume, two conditions that require a different approach to the data.
In essence, big data is a large, unstructured and (sometimes) incomplete set of data, which
makes it impossible to handle with conventional database systems (Klous & Wielaard, 2014;
van der Zee, 2016).
Madden (2012) points out a common misunderstanding, namely that big data is simply a large
amount of data stored in databases. In fact, Database Management Systems (DBMSs) cannot solve
the problem of big data. They can handle data in the range of Petabytes, but they are generally
not fast enough and are unable to analyze large amounts of complex structures because they
handle information sequentially. In-database statistics and modelling are not widely adopted
and do not scale well to large amounts of data. The same can be said for platforms such as
MapReduce¹ and Hadoop². They can store large amounts of data, but they are limited in several
ways: they provide a low-level infrastructure designed to process data, not to manage it.
Moreover, existing tools are no longer appropriate for data analysis at a large scale.
Programming languages such as SAS, R and Matlab can perform mature analyses, but they do not
cope well with large datasets. These insights gave rise to solutions that integrate DBMSs or
platforms such as Hadoop with advanced programming languages such as SAS, R and Matlab. As
reported by Madden (2012), one notable solution is to implement data mining, machine learning
and statistical algorithms inside the DBMS, which makes it possible to manage the data inside
the DBMS as well. New systems are emerging continuously. The limitations of MapReduce gave
rise to Apache Mahout, which provides a framework for executing machine learning algorithms on
top of MapReduce. Another example of a new system is GraphLab, a scalable platform that can
solve many graph-based iterative machine learning algorithms. However, it is not a data
management platform, and it requires data sets to fit into memory.

¹ The programming technique MapReduce divides a large amount of unstructured data over a large
number of parallel computing devices and then combines the partial results again. In this way,
a lot of information can be processed in a short time period.
² Hadoop is an open-source framework from Apache that allows big data to be stored and processed
in a distributed environment across clusters of computers using simple programming models.
Source: Mahout – Introduction. (s.d.). Retrieved October 13, 2016 from
https://www.tutorialspoint.com/mahout/mahout_introduction.htm

Another rapidly growing system is IBM Watson. Watson Analytics is at the forefront of a new
area of computing. It is
the only platform on the market that is powered by cognitive capabilities. The computer can
recognize a question in human language and it responds quickly after an extremely fast
research in a pool of many diverse sources of information. The software goes much further
than the traditional Artificial Intelligence (AI)3, which is the reason why it can also recognize
irony and riddles. Besides these new systems Chen et al. (2014) define cloud computing as a
solution to meet the requirements on infrastructure for big data. Cloud computing
revolutionizes the existing IT architecture. Besides having a large amount of storage
available, cloud computing can provide a solution to process big data. This states big data can
effectively be managed by the distributed storage technology based on cloud computing. This
parallel computing capacity can raise the efficiency of acquiring and analyzing big data.
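The divide-and-combine principle behind MapReduce (footnote 1) is usually illustrated with a word count. The Python sketch below is a single-machine simulation, not Hadoop's API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and on a real cluster the framework would distribute the chunks over many nodes and perform the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # On a cluster each chunk would be mapped on a different machine in parallel;
    # here the chunks are simply processed one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data needs big clusters", "data in motion", "Data at rest"]))
```

Because the map step only looks at one chunk and the reduce step only at one key, both can run in parallel on different machines; only the shuffle requires moving data between them.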
According to Accenture’s digital analyst (Accenture, 2016), new big data technologies allow enterprises to think differently about how to use data as an enterprise asset to drive value-based outcomes. He states that there is a significant difference between traditional Business Intelligence (BI) and big data analytics (Figure 1.1 and Figure 1.2). First, traditional Relational Database Management Systems (RDBMS) require data to be normalized prior to loading and storage. In contrast, big data technologies are able to store and process diverse raw data, whether structured, semi-structured or unstructured. Bose (2009), in a research paper describing the evolution of BI, attributes the evolution to advanced BI to advanced techniques to capture, transfer, transform and store data. These techniques enable organizations to integrate various databases into data warehouses in which centralized data management and retrieval occur. The second criterion of differentiation (Accenture, 2016) is that traditional databases focus on descriptive analytical insights drawn from historical data, while more recent big data techniques enable predictive analytics and unleash insights at increasing speed because of their ability to handle large volumes of real-time data. Third, data delivery is transformed from static, pre-defined report layouts with limited visualizations to graphical and interactive data visualizations that support analytic exploration.
This visual literacy greatly enhances the communication of complex ideas. Fourth, the Total Cost of Ownership (TCO) has been reduced thanks to better scalability via parallel processing, distributed computing and open source technologies. Finally, the more advanced analytical techniques change the key roles in a company from report architects, integration architects and database architects to data visualizers, data scientists, data engineers and big data architects. A more detailed explanation of the evolution of big data analytics is provided in Chapter 2.
³ AI is a branch of computer science dealing with the simulation of intelligent behavior in computers.
Figure 1.1: Traditional BI solution
Source: Accenture (2016) © All rights reserved
Figure 1.2: Big data solution
Source: Accenture (2016) © All rights reserved
1.2 What is Big Data?
In their recent study, Chen et al. (2014) found that, although the growing importance of big data has been generally accepted, people still have varying opinions on its definition. There are multiple definitions of big data from different perspectives. These definitions may provide a better understanding of the profound social, economic and technological influence of big data. McKinsey & Company, a global consulting agency, describes big data as the next frontier for innovation, competition and productivity (Manyika et al., 2011). They define big data as “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze”. From this definition, it can be concluded that, besides the volume of a dataset, important criteria of big data are the continuously expanding data scale and its management, which make it impossible to handle such data with traditional database technologies.
Gartner, an international research agency, gives the following definition of big data: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Following this definition, big data can be characterized by the ‘Triple V’ expression. In fact, the ‘Triple V’ concept of data management was introduced by Gartner analyst D. Laney in a 2001 META Group research publication (Li, Cheng, & Zhao, 2015). He defined the three main components of data as Volume, Variety and Velocity. Nowadays, two other Vs have been added to the model, resulting in the ‘5Vs theory’. These V-words bring challenges to big data management and are introduced below.
Figure 1.3: 5Vs Theory
Volume: data at rest
Variety: data in many forms
Velocity: data in motion
Veracity: data in doubt
Value: data into money
Source: Van Den Poel (2016)
The first keyword of the ‘Triple V’ is Volume, the amount of data to process. Without a doubt, this keyword is the reason why we call it ‘big’ data. Every digital process produces data all the time, and these data accumulate to enormous volumes. One major advantage of data volumes in the petabyte range is that we can stop looking for models and formulating hypotheses in advance. The old way of doing research based on hypotheses and assumptions will be transformed into data-driven research. Rather than testing theories or models, research then becomes an exploration: observing, describing and mapping undiscovered territory. Big computing clusters will find patterns based on statistical algorithms (Emani et al., 2015; Mazzocchi, 2015).
The large amounts of data come from different sources, which gives rise to the second V-word: Variety. Big data has improved access to different sources of information and tries to integrate them in an appropriate way. These sources can be structured or unstructured, for example data originating from social networks, health care data, financial data, biochemistry and genetic data, and astronomical data (Emani et al., 2015). The difficulty associated with big data is to structure and eventually analyze the relevant data within a large amount of unstructured data, with the help of the fast-moving computer tools explained in section 1.1. These tools become more precise and adapt at increasing speed because they learn from the large amount of new data.
The third word featuring big data is Velocity. According to Knilans (2014), “Velocity refers to the speed at which new data is generated and the speed at which it moves around.” Velocity involves streams of data, the creation of structured records, and availability for access and delivery (Emani et al., 2015). Sophisticated algorithms and statistical tools are needed to quickly process an immense and rapidly growing amount of data. E-commerce is an illustrative example of an area where velocity is of major importance: the data of a web user have to be analyzed very quickly so that banners and other advertisements can be shown to the user in time to influence their purchasing behaviour.
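To make ‘data in motion’ more concrete, the sketch below keeps a count of a visitor's page views over the last minute, the kind of continuously updated signal an e-commerce site could use to decide whether to show a banner. It is a minimal, single-process illustration; the class name `SlidingWindowCounter`, the 60-second window and the three-views threshold are hypothetical choices, not taken from the cited sources.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall within the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps of events still inside the window

    def add(self, timestamp: float) -> None:
        # Assumption: events arrive in non-decreasing timestamp order (one stream).
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Each event is removed at most once, so the amortized cost per event is
        # constant: the stream is processed as it arrives, not stored and re-scanned.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()


# Hypothetical rule: show a banner once a visitor has viewed 3+ product pages in the last minute.
views = SlidingWindowCounter(window_seconds=60)
for t in [0, 20, 45, 70]:  # seconds at which the visitor opened a product page
    views.add(t)
show_banner = views.count(now=70) >= 3
```

The view at second 0 has left the window by second 70, so three views remain and the banner rule fires; the decision is available immediately, without re-reading the visitor's full history.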
A V-word often added to the three previously described keywords is Veracity. Veracity refers to the uncertainty that arises because data are inconsistent and incomplete. As a result, the corresponding models may only be approximations. Consequently, keeping big data organized has become a major challenge.
The previously described features (V-words) of data cause traditional analytics to fall short in handling data in a convenient, well-timed and efficient way. This leads to the fifth V that organizations are coping with: finding the Value within their data (Knilans, 2014). More data does not necessarily promise more knowledge; data by themselves are meaningless. For example, sensors in cars and houses provide a lot of information, but they only give an excellent insight into how consumers use the products and into their purchasing behaviour if appropriate techniques are used to manage and analyze the data. Through effective data mining and analytics, organizations can extract meaningful value from the massive amount of collected data and create competitive advantages. This V-word is described in more detail in the following section. More specifically, the value of big data for OM will be outlined in section 2.2.
In short, big data is data that is too big, too hard or too fast for existing tools to process (Klous
& Wielaard, 2014; Madden, 2012).
1.3 Value of Big Data
Most companies hold on to the familiar way of doing business out of fear of losing existing business. However, the past has already revealed the risks companies take when they do not embrace new disruptive technologies; for example, Kodak fell out of the market with the rise of digital photography. Most often, companies are afraid of the disadvantages that go along with the implementation of big data, such as privacy norms (Klous & Wielaard, 2014). However, McKinsey & Company (2011) showed how big data creates value for enterprises in an in-depth study of five core industries that represent the global economy. This report indicates that big data may raise the productivity level and competitive advantage of enterprises and public sectors, and create substantial benefits for consumers (Chen et al., 2014). McKinsey states that big data can reduce U.S. healthcare expenditure by over 8 per cent and that the retail industry may improve its profit by more than 60 per cent by fully utilizing big data. Furthermore, big data may boost the efficiency of governmental activities, such that the developed economies in Europe could save over 100 billion euros (Chen et al., 2014).
In general, McKinsey’s research (2011) states big data can create value by making data more
easily accessible to relevant stakeholders in a timely fashion, thus creating more
transparency. In manufacturing, for example, integrating data from R&D, engineering, and
manufacturing units can significantly cut time to market and improve quality. Another way by
which value is generated is by enabling experimentation to discover needs and improve
performance. Combining more accurate, detailed and real-time (or near real-time)
performance data with IT to set up controlled experiments enables organizations to raise
performance to higher levels. A third approach to create value is by using big data to segment
populations to customize actions. In companies that produce consumer goods or provide
services, segmentation has been used for many years. However, these companies are starting
to deploy more advanced big data techniques, such as the real-time micro-segmentation of
customers to target promotion and advertising. Moreover, big data provides value by
replacing and/or supporting human decision making with automated algorithms. Decision
making can be improved, risk can be minimized and valuable insights can be revealed by
making use of sophisticated analytics. Automated algorithms are useful to retailers who aim
to optimize decision processes such as the automatic fine-tuning of inventories and pricing in
response to real-time in-store and online sales. Besides automating decisions, decision making
is transformed from analyzing smaller samples that individuals with spreadsheets can handle
and understand to analyzing extensive datasets using big data techniques and technologies.
The last way big data creates value for enterprises, according to McKinsey, is its
ability to facilitate innovation of new business models, products, and services. Besides
creating entirely new products, services and models, manufacturers enhance the development
of next generation products and after-sales service offerings by using data obtained from the
actual products.
Chapter 2
Operations Research
This chapter starts with a description of the evolution from traditional OR to the more advanced big data analytics as we know it today (Section 2.1). Section 2.2 outlines the value of big data analytics for OM, more specifically the implications and advantages Industry 4.0 creates for manufacturing companies. This section ends with McKinsey’s Digital Compass, which relates the levers of Industry 4.0 to eight value drivers that have an impact on the performance of a manufacturing company. Section 2.3 focuses on one lever of Industry 4.0 that might have a significant impact on SCM: data-driven demand prediction. This section is based on a research paper by Purdue University & SAS (2008) that identifies capability gaps between leaders and laggards with regard to data-driven demand forecasting. Section 2.4 describes, from a theoretical perspective, the direct impact of improved demand prediction on supply chain optimization with regard to inventory.
2.1 Evolution of Operations Research
An introductory book on OR (Hillier & Lieberman, 2015) describes the origins of OR. The industrial revolution brought a tendency towards increased specialization through the division of labour and the segmentation of management responsibilities, which created new problems. One problem was that many components of an organization grew into relatively autonomous empires with their own goals and value systems, thereby losing sight of how their activities and objectives meshed with those of the overall organization. What was best for one component was frequently detrimental to another. As a consequence, the components ended up working together suboptimally. Moreover, due to the increasing complexity and specialization in an organization, it became extremely difficult to allocate the available resources to the distinct activities in the way that is most effective for the organization as a whole. These kinds of problems, and the need to find a better way to solve them, provided the environment for the emergence of OR. As its name implies, OR involves ‘research on operations’. Thus, OR is applied to problems that concern how to conduct and coordinate the operations (activities) within an organization. Moreover, OR is concerned with the practical management of the organization. Therefore, to be rewarding, OR must provide positive and understandable conclusions to the decision-makers when they are needed.
There has been a great buzz throughout the business world in recent years about analytics (or
business analytics) and the importance of incorporating analytics into managerial decision
making. It is basically OR by another name. However, there are some differences in their relative emphasis. Analytics aims to focus on an entire business process where decisions span functional boundaries. In fact, analytics is a motivation for OR to refocus its attention on developing and applying a wider range of scientific and technological approaches for organizational decision making (Liberatore & Luo, 2010). In addition, analytics fully recognizes that we have entered the era of big data, in which massive amounts of data are commonly available to many businesses and organizations to help guide managerial decision making. Davenport and Harris (2007, p. 7) define analytics as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.” Thus, a primary focus of analytics is on how to make the most effective use of all these data. With an immense amount of high-quality data at their disposal, organizations start to think about how they might use these data to improve decision making, which is the foundation of growth in analytics (Hillier & Lieberman, 2015). Liberatore and
Luo (2010) define analytics as a four-step process of transforming data into actions through
analysis and insights when making organizational decisions (Figure 2.1).
Figure 2.1: Four steps that comprise a process view of analytics
Data: collection, extraction, manipulation
Analysis: visualization, predictive modeling, optimization
Insight: What happened? What will happen? What should happen?
Action: operational decisions, process changes, strategic formulation
Source: Liberatore & Luo (2010) © All rights reserved
Analyzing large amounts of data to obtain clear insight has limited value unless managers translate these insights into actions. LaValle et al. (2011) found in a study that the biggest challenge in adopting analytics is managerial and cultural. According to almost four out of ten respondents, the main obstacle to widespread analytics adoption is a lack of understanding of how to use analytics to improve the business. Organizations need to determine the current state, what is likely to happen next and what direction should be taken to obtain optimal results. In essence, senior executives should base the running of their businesses on data-driven decisions and react fast when disruptions occur. To trigger these new actions across the organization, analytics-driven insights must be closely linked to the business strategy, easy to understand and embedded into the organizational processes, so that action can be taken when opportunities arise.
According to Madden (2012), people need to be educated in using data effectively in practice in order to solve the complex problems of today. A 2008 Accenture survey of 254 managers in different functional areas showed that 60 per cent of decisions were already based on analytic input. Moreover, high-performance businesses are 50 per cent more likely to use analytics strategically. Marketing (customer analytics), Operations and Research and Development (R&D) are identified as the heaviest users of analytics (Coghlan, 2010). The research of Liberatore and Luo (2010) indicates that IT firms are quickly establishing positions in analytics and advanced BI because of the growing demand for analytics. Despite the economic downturn, the BI market will remain one of the fastest-growing software markets. A growing number of IT professionals who specialize in BI are required as more organizations begin to implement BI software. Clearly, the growing importance of analytics has a profound impact on OR professionals and their practice. If OR professionals ignore the advantages of analytics, they will be unable to realize its full potential.
2.2 Value for Operations Management
Lee, Kao and Yang (2014) state that many manufacturing systems are not ready to implement big data due to the shortage of smart analytic tools. However, manufacturing industries are continuously evolving towards the upcoming industrial big data environment. Today’s organizations have to focus on a variety of business aspects such as outsourcing, reducing inventory levels, global manufacturing, just-in-time (JIT) delivery and customer requirements. All these aspects have to be managed by a centralized data support system in which data are aggregated from multiple sources and suppliers. This enables complete integration and global visibility of manufacturing capacity, inventory and transportation, as well as policies to manage unexpected risks. The entire manufacturing value chain will benefit from the new techniques offered by data analytics. Especially in key areas such as R&D, SCM, manufacturing and service, big data analytics can make a difference. Using big data analytics, manufacturers will be able to shorten the development cycle, optimize the assembly process, increase yields and better meet customer needs. Likewise, Waller and Fawcett (2013) state that big data has the potential to revolutionize supply chain dynamics. They believe that new tools such as big data will transform the way supply chains are designed and managed, raising a new and significant challenge for logistics and SCM. Today, there is more data because data are captured in more detail and because global supply chains need to capture data at multiple points in the supply chain. Furthermore, many companies that previously did not record daily sales by location and by Stock Keeping Unit (SKU) to make inventory decisions now do so. Therefore, many traditional approaches will need to be re-imagined, and some will even be discarded as obsolete in the new data environment.
McAfee and Brynjolfsson (2012) state that companies that characterize themselves as more data-driven perform better on objective measures of financial and operational results. Companies in the top third of their industry in the use of data-driven decision making are on average 5 per cent more productive and 6 per cent more profitable than their competitors. Recent research by Yin and Kaynak (2015) makes clear that efficiently capturing and analyzing big data has the potential to enhance productivity and competitive advantage in a wide range of industrial sectors. Hence, from an industry perspective, big data is going to play an important role in the fourth industrial revolution. In this Industry 4.0 era, traditional production management and the traditional factory are being transformed by intelligent analytics. The ambition behind the use of big data in industrial applications is to achieve faultless and cost-efficient operation of the process while realizing the desired performance levels, especially with respect to quality.
In a recent report, McKinsey (2015) defines Industry 4.0 as the digitization of the manufacturing sector, driven by four clusters of disruptive technologies. All digitally enabled disruptive technologies that are expected to have a significant impact on manufacturing within the next 10 years belong to Industry 4.0. These technologies offer ways of leveraging data to unlock its value potential. The first and second clusters, namely big data and advanced analytics, are the topic of this master dissertation. The exponential increase in available data and advanced statistical techniques empower the digitization and automation of knowledge work and advanced analytics. McKinsey’s research (2015), based on interviews with experts from different sectors and company sizes, revealed that industries are investing significant resources in Industry 4.0 because traditional productivity levers are running out. The pressure on companies to shorten time to market and increase customer responsiveness is the reason why most of them are searching for new opportunities to boost productivity. Becoming operationally effective is a major concern for manufacturing companies facing an extremely high level of margin pressure. Digitization and Industry 4.0 unlock new cost savings that have so far remained untapped. Data become the core driver in smart factories: the study reveals that a big data/advanced analytics approach can result in a 20 to 25 per cent increase in production volume and up to a 45 per cent reduction in downtime.
As noted in the previous chapter, data in itself does not offer fundamental value. According to Lee et al. (2014), the key to sustainable innovation within an Industry 4.0 factory is the actual conversion of big data into useful information. Hence, all data should be approached with the intention to optimize value (McKinsey, 2015). Industry 4.0 makes it possible to capture value across the entire product lifecycle. McKinsey’s Digital Compass (Figure 2.2) is an important tool to link the levers of Industry 4.0 to eight value drivers that have an impact on the performance of a typical manufacturing company. Note that the following section focuses on data-driven demand prediction, because this is the main path to be followed in the case study. Data-driven demand prediction may result in a better match between supply and demand. Although today’s forecast errors are already small, it is still recommended to reduce error rates even further, since they cause high costs. The report states that forecasting based on advanced analytics can increase demand forecast accuracy to 85 per cent or more. This lever has a direct impact on the company’s inventory. Excess inventory can be due to inaccurate stock numbers that increase sludge, to unreliable demand planning that necessitates safety stock, or to overproduction. Hence, improved demand forecast accuracy decreases the required level of safety inventory by better matching supply and demand and consequently managing variability better. Carrying too much redundant inventory leads to high capital costs, which have a direct impact on the company’s margin (infra, Section 2.4).
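The link between forecast accuracy and safety inventory can be made concrete with a common textbook formulation, assuming that forecast errors per period are normally distributed and independent and that the replenishment lead time is fixed. Safety stock is then ss = z · σ_D · √L, where z is the standard normal quantile for the target cycle service level, σ_D the standard deviation of the forecast error per period and L the lead time in periods. The function name `safety_stock` and the numbers below (95 per cent service level, 4-week lead time, σ_D of 100 versus 80 units) are hypothetical and serve only to show the proportionality.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(service_level: float, sigma_error: float, lead_time: float) -> float:
    """Safety stock for a target cycle service level.

    Assumes normally distributed, independent forecast errors per period
    (standard deviation `sigma_error`) and a fixed lead time of `lead_time` periods.
    """
    z = NormalDist().inv_cdf(service_level)          # about 1.645 for 95 %
    sigma_over_lead_time = sigma_error * sqrt(lead_time)  # error spread over the lead time
    return z * sigma_over_lead_time


# Hypothetical example: 95 % cycle service level, 4-week lead time.
before = safety_stock(0.95, sigma_error=100, lead_time=4)  # about 329 units
after = safety_stock(0.95, sigma_error=80, lead_time=4)    # about 263 units
reduction = 1 - after / before                              # 20 % less safety stock
```

Because safety stock is linear in σ_D under these assumptions, a given percentage reduction in the forecast error's standard deviation yields the same percentage reduction in safety stock.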
Figure 2.2: The McKinsey Digital Compass
Source: McKinsey (2015) © All rights reserved
2.3 Data-driven demand prediction
A joint paper by Purdue University and SAS (2008) found that traditional methods of predicting demand are not efficient in a fluctuating market and that organizations that only use these kinds of methods are lagging behind the competition. Today’s market volatility, in which past trends are no longer the only indicator of the future, is the reason for the increasing gap between leaders and laggards in the industry. An analysis of the capabilities of both leaders and laggards revealed large maturity gaps in organizational processes, functional capabilities and technology enablers. The study also discovered many shared characteristics among best-performing companies with regard to data-driven demand forecasting, which enable them to create a competitive advantage. Some examples of the most important leading capabilities follow; they are summarized in Figure 2.3.
2.3.1 Internal enterprise data
Leaders have the ability to create a single demand forecast with input from multiple roles (e.g. sales, marketing, finance and others) within the organization. In its 2013 report ‘The Application of Big Data to the Real World’, IBM indicates that internal enterprise data are the main sources of big data. These data consist of historical, static data that are managed by RDBMS in a structured way (e.g. online trading data) and of data coming from different departments in the organization, such as production data, inventory data, sales data and financial data. The power of this internal data is often underestimated. Internal data sources in the form of Excel files or information coming from the centralized enterprise resource planning (ERP) system can be analyzed to tackle different problems or identify opportunities within the organization. According to Chen et al. (2014), the business data volume of all companies in the world doubles every 1.2 years due to IT and digital data. This increasing volume requires more effective real-time analysis. Without such analysis, the data become useless: they are just a massive amount of stored data that does not contribute to the potential of a company.
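The doubling time reported by Chen et al. (2014) implies exponential growth. Writing V_0 for today's business data volume and t for time in years, a doubling every 1.2 years works out as follows (a worked illustration of the stated rate, not an additional empirical claim):

```latex
V(t) = V_0 \cdot 2^{t/1.2}, \qquad
\frac{V(1)}{V_0} = 2^{1/1.2} \approx 1.78, \qquad
\frac{V(6)}{V_0} = 2^{6/1.2} = 2^{5} = 32
```

In other words, the volume grows by roughly 78 per cent per year and by a factor of 32 over six years.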
2.3.2 Causal factors
Leaders have the competence to include causal factors (e.g. weather, natural disasters and competitor actions) in demand forecasts. Besides historical data, which can be immense for products that have existed for a couple of years, other sources of big data are even more extensive and might be useful to include as predictor variables in a forecast model. According to a definition of big data from IBM (IBM, s.d.), decentralized data are created at a rate of 2.5 quintillion bytes per day. Open source data in particular can be of interest to many companies when analyzed in a smart way. These data can come from sensors used to gather climate information, social media sites, digital pictures and videos, purchase transaction records, cell phone GPS signals, search engine data, etc. Recent work (Goel et al., 2010) has demonstrated that search engine data can ‘predict the present’. Web search volumes are able to track consumer behavior accurately in near real-time, e.g. unemployment levels, auto and home sales, and disease prevalence. This advanced forecasting method is based on the principle that what people are searching for today is predictive of how they will act in the near future. Goel et al. also found that search query data improve the performance of baseline models fit on internal historical data or on other publicly available data. Especially where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
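The comparison described by Goel et al. (2010), a baseline model fit on historical data versus the same model extended with search-query data, can be sketched as follows. This is not their exact specification: it assumes a simple linear model in which this week's demand depends on last week's demand (baseline) plus a search-volume index (augmented), fits both by ordinary least squares and compares their in-sample sum of squared errors. The series are synthetic and generated inside the code; all names are hypothetical.

```python
def fit_ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for j in range(col, k):
                a[r][j] -= factor * a[col][j]
            c[r] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


def sse(rows: list[list[float]], y: list[float], b: list[float]) -> float:
    """Sum of squared errors of the linear model with coefficients b."""
    return sum((t - sum(x * w for x, w in zip(r, b))) ** 2 for r, t in zip(rows, y))


# Synthetic weekly data: demand reacts to last week's demand and this week's search index.
search_index = [10, 14, 9, 20, 25, 12, 8, 18, 22, 15, 11, 19]
shocks = [0.0, -2.0, 0.5, -1.0, 2.5, -0.5, 1.0, -1.5, 0.0, 2.0, -2.5, 1.0]
demand = [100.0]
for s, e in zip(search_index[1:], shocks[1:]):
    demand.append(40 + 0.5 * demand[-1] + 3 * s + e)

weeks = range(1, len(demand))
target = [demand[t] for t in weeks]
baseline_rows = [[1.0, demand[t - 1]] for t in weeks]                     # intercept + lag
augmented_rows = [[1.0, demand[t - 1], search_index[t]] for t in weeks]  # + search data

sse_baseline = sse(baseline_rows, target, fit_ols(baseline_rows, target))
sse_augmented = sse(augmented_rows, target, fit_ols(augmented_rows, target))
```

In-sample, adding a regressor can never increase the squared error of a least-squares fit, so a fair evaluation would compare out-of-sample forecasts, as Goel et al. do; the sketch only shows the mechanics of adding a causal variable to a baseline model.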
2.3.3 Model events
Leaders have the ability to model events, such as sales promotions, marketing events and economic activities. The aim of modelling events is to make sure that the forecasting framework provides an automated demand forecasting process based on alerting and management by exception (e.g. unexpected competitor actions). Demand forecasters should not have to inspect or correct stable demand signals; instead, they should be able to focus on more challenging demand evolutions, such as unexpected peaks in demand. That is why events and alerts are created in dialogue with, or within, an SCM team. Exceptions in the forecast of products can have an obvious reason that the SCM team, who observe the products most closely, can explain. This qualitative information is translated into IT solutions in the form of events and alerts. The SAS Forecast Studio user's guide (SAS Institute Inc., 2014) explains events and alerts as follows:
Events are automatically detected and modelled within the forecasting process.
They assume a known and stable relationship over time with the demand of
orders.
Alerts are automatically detected events, but they are not modelled within the
forecasting process. The reason for considering a demand driver as an alert and
not as an event is because there is not enough historical information available
or because the impact of the demand driver on demand orders is unknown, not
stable enough or too complex for modelling in an automated way. Moreover,
alerts can be seen as a trigger for creating an event. Alerts are designed to give
a warning about a specific occurrence with a potential impact on demand of
orders for specific products.
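The distinction quoted above can be expressed as a simple decision rule. The sketch below is hypothetical and does not reproduce SAS Forecast Studio's internal logic: a demand driver is treated as an event (modelled) only if it has occurred often enough and its observed effect on demand was reasonably stable; otherwise it is raised as an alert for the SCM team. The thresholds `MIN_OCCURRENCES` and `MAX_VARIATION`, the `DemandDriver` class and the example drivers are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean, stdev

MIN_OCCURRENCES = 4  # hypothetical: minimum history needed to model a driver
MAX_VARIATION = 0.5  # hypothetical: maximum coefficient of variation of the uplift


@dataclass
class DemandDriver:
    name: str
    # Relative demand uplift observed at each past occurrence (0.25 means +25 %).
    observed_uplifts: list[float]


def classify(driver: DemandDriver) -> str:
    """Return 'event' if the driver can be modelled, otherwise 'alert'."""
    uplifts = driver.observed_uplifts
    if len(uplifts) < MIN_OCCURRENCES:
        return "alert"  # not enough historical information available
    average = mean(uplifts)
    if average == 0:
        return "alert"  # no clear direction of impact to model
    variation = stdev(uplifts) / abs(average)
    # A stable relationship (low relative spread) can be modelled; an unstable one cannot.
    return "event" if variation <= MAX_VARIATION else "alert"


drivers = [
    DemandDriver("summer promotion", [0.30, 0.25, 0.35, 0.28, 0.32]),
    DemandDriver("competitor launch", [0.10, -0.20]),
    DemandDriver("trade fair", [0.50, -0.10, 0.80, 0.00]),
]
for driver in drivers:
    print(driver.name, "->", classify(driver))
```

An alert can later become an event once enough occurrences with a consistent effect have been observed, in line with the guide's remark that alerts can trigger the creation of an event.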
2.3.4 Technological characteristics
With regard to technological characteristics, best-performing companies make use of advanced technologies, such as statistical demand forecasting, demand analytics and reporting, and sales and operations planning. Although leaders adopt these advanced technologies, they still use Excel spreadsheets for demand forecasting and planning to some extent, just like laggards. This could be a signal that companies, despite using advanced techniques, still lack the flexibility to create real-time, ad hoc reports and spread them across the enterprise. Finally, leaders possess sophisticated tools, such as an integrated ERP module, demand forecasting software and SCM software, whereas laggards still rely on Microsoft Excel spreadsheets. An important difference between leaders and laggards is that laggards were 12 times more likely to rely on executive opinion for demand forecasting and planning than leaders, who were more committed to data and analytics.
Figure 2.3: Summary of leaders’ capabilities
Leaders: single demand forecast with input from multiple roles; including causal factors; forecast SKU; what-if analysis and scenario planning; advanced technologies; sophisticated tools
These proficient companies consistently reported significant improvements in key performance metrics over a period of two years, such as improvements in inventory turns, order fulfilment rates, forecast accuracy at both the product family and SKU levels, as well as in gross profit margins. This study by Purdue University and SAS among more than 180 forecasting managers, planners and supply chain executives from 173 unique companies revealed the top pressures that cause more companies to think differently and to manage demand better. Shrinking profit margins in the industry push companies to become more cost efficient. Moreover, there is continuous pressure to match supply and demand accurately while keeping inventory at a bare minimum.
Figure 2.4: Top pressures for data-driven demand forecasting
Source: Purdue University & SAS Demand Management Survey 2008 © All rights reserved
Since companies with more accurate demand forecasting and planning capabilities have better
perfect-order ratings, less inventory and shorter cash-to-cash cycle times than others, it seems
clear that demand forecasting deserves a lot of attention. Similarly, the supply chain literature
(Chopra & Meindl, 2013) states that demand forecasting is the basis of proper supply chain
planning. Companies that overlook the importance of forecasting often reflect a reactive
business model, because they only respond to the marketplace instead of anticipating it.
Hence, a focus on demand forecasting enables a company to stay ahead of the
competition, and forecasts must be made as accurately as possible. With regard to the accuracy of a
forecast, it should be noted that some forecast error is inevitable when predicting demand. Forecast
error is something most companies do not measure. However, it is a very good measure of
accuracy and can be used to improve reliability. The goal of every company should be to
reduce the forecast error with data analytics methods by combining old and new data sets.
Traditional forecasting techniques based on historical data alone are not sufficient anymore
because of increased demand uncertainty. Demand volatility has increased with the number of
choices available to customers, the velocity of promotions to reduce inventory levels, the
reduced consumer confidence and increasing competitive activity. The importance of
access to real-time consumption data has never been so widely recognized (Purdue
University & SAS, 2008).
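Measuring forecast error is straightforward once actual demand and forecasts are stored side by side. Common measures, discussed for example by Chopra and Meindl (2013), are the bias, the mean absolute deviation (MAD), the mean absolute percentage error (MAPE) and the tracking signal (bias divided by MAD). The Python sketch below computes them; the function name and the demand figures are hypothetical, and the error convention (forecast minus actual, so that a positive bias means over-forecasting) is an assumption of this sketch.

```python
from statistics import mean


def forecast_error_metrics(actual, forecast):
    """Return bias, MAD, MAPE (%) and tracking signal for two equally long series."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    # Assumed convention: error = forecast - actual, so positive bias means over-forecasting
    errors = [f - a for a, f in zip(actual, forecast)]
    bias = sum(errors)
    mad = mean(abs(e) for e in errors)
    # MAPE skips periods with zero actual demand to avoid division by zero
    pct = [abs(e) / a for a, e in zip(actual, errors) if a != 0]
    mape = 100 * mean(pct) if pct else float("nan")
    # Tracking signal: cumulative bias expressed in units of MAD
    tracking_signal = bias / mad if mad else 0.0
    return {"bias": bias, "MAD": mad, "MAPE": mape, "TS": tracking_signal}


# Hypothetical weekly demand and forecasts for one SKU
metrics = forecast_error_metrics([100, 120, 80, 110], [110, 115, 90, 100])
print(metrics)
```

A persistently positive bias corresponds to the inflated forecasts that, as discussed in the next section, push inventory levels up again.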
2.4 Inventory control
Industry studies show that better demand forecast accuracy can result in 15 per cent lower
inventories and 17 per cent stronger perfect-order fulfilment (SAS Institute Inc., 2009). More
accurate forecasts not only improve customer satisfaction and increase revenues but, more
importantly, lower inventories (raw material, Work In Progress (WIP), finished goods, safety
stocks, etc.) and working capital requirements, thus freeing up cash. One condition is
that demand forecasters must avoid biasing the demand forecast for the purpose of order
generation. Some demand planners overestimate the order size to avoid out-of-stocks;
consequently, the gain in forecast accuracy is lost, and inventory levels
and the associated inventory cost increase again (Chopra & Meindl, 2013). What follows
describes the impact of improved demand forecast accuracy on inventory level and cost
through a reduced level of safety inventory.
Companies use scientific inventory management to decide when and how much to replenish
their inventory. In many industries, inventory management is a key component of the
effective, profitable operation of a business. Inventory control regulates the inventory
that is already in a distributor’s warehouse. It implies the coordination and supervision of the
supply, storage, distribution, and recording of materials to maintain product levels adequate
for current customer needs without excessive supply or loss. As reported in a recent article
(Fritch, 2015), the goal of inventory control is to generate maximum profit from a minimum
inventory investment without hindering customer satisfaction levels or order fill rates.
Inventory can be found at different levels of the supply chain, for various reasons and
in various forms. Inventory can exist as raw materials, e.g. manufacturers need raw material
inventories to make their products. Moreover, inventory can exist as work in progress and as
finished goods, e.g. both wholesalers and retailers need to hold finished goods that are
immediately available when customers place orders. The lower levels create demand on the
upstream inventories. Because this demand is uncertain, and transit times and supplier
deliveries are uncertain as well, inventory includes a certain amount of safety
stock. Hence, reducing demand variability has a direct impact on safety stock.
What follows describes inventory in the context of periodic review policies (Chopra &
Meindl, 2013). Periodic review policies are the most widely adopted because they do not require
continuous monitoring of inventory. Figure 2.5 illustrates the inventory profile for a periodic
review policy with lead time L and reorder interval T for one product. The dashed line from
point 1 to point 2 represents the available inventory from the moment an order is placed until
the next order is placed. Inventory levels are reviewed after a fixed period of time T, and the
order size is chosen such that the current inventory plus the replenishment lot
size equals the prespecified Order-Up-to-Level (OUL). With D the average demand per period, the average lot size Q in a periodic
review system equals the average demand during the review period T:
$$ Q = D_T = T \cdot D \tag{2.1} $$
When demand is normally distributed and independent from one period to the next, a stockout
will occur if demand during the interval between time zero (review period 1) and T+L
exceeds the OUL. Hence, the OUL, which is the level of the inventory position, should be
large enough to protect the enterprise against shortages until the next order arrives.
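The ordering rule of a periodic review policy can be sketched in a few lines of Python. The function names are hypothetical, the inventory position is assumed to be on-hand stock plus outstanding orders minus backorders, and the numbers are purely illustrative.

```python
def order_quantity(oul, on_hand, on_order, backorders=0):
    """Quantity to order at a review so that the inventory position reaches the OUL."""
    inventory_position = on_hand + on_order - backorders
    # Never place a negative order if the position already exceeds the OUL
    return max(0, oul - inventory_position)


def average_lot_size(review_period, mean_demand_per_period):
    """Equation 2.1: average lot size equals average demand during the review period T."""
    return review_period * mean_demand_per_period


# Hypothetical SKU: OUL of 900 units, 300 on hand, 200 already on order
print(order_quantity(oul=900, on_hand=300, on_order=200))  # 400
print(average_lot_size(review_period=4, mean_demand_per_period=100))  # 400
```

Because the order always tops the inventory position up to the same OUL, the order size varies from review to review while its average equals the demand over one review period.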
Figure 2.5: Periodic review policy
Source: Chopra & Meindl (2013) © All rights reserved
In general, enterprises operate in an uncertain environment. In other words, enterprises
have to deal with supply and demand variability when managing the level of product
availability. To avoid stockouts due to unforeseen circumstances, a company should carry
safety stock. Chopra and Meindl (2013) define safety inventory as “the inventory carried to
satisfy demand that exceeds the amount forecast.” Demand is uncertain, and actual demand may
exceed forecasted demand, which results in product shortages. Safety stock is
the average inventory remaining when the replenishment lot arrives. The Supply Chain
Manager has to make a trade-off when deciding how much safety stock to carry:
raising the level of safety stock increases product availability but at the same time raises
inventory holding cost. In today’s rapidly changing environment, where demand is extremely
volatile and product variety has grown, determining an
appropriate trade-off has become extremely important. Demand volatility can be absorbed by
keeping excessive inventory. However, the inventory on hand can suddenly become obsolete
when new products come onto the market and demand for the product in inventory fades out.
Product life cycles have shrunk as product variety has grown, and firms need to be wary of
carrying too much inventory. Therefore, a successful company has to find ways to
decrease the level of safety inventory carried without hurting the level of product availability.
“Responding to the customer can be achieved with cost overruns, excessive inventory and
firefighting, but to respond profitably means understanding the sources of volatility and
planning for them appropriately.” (Gartner Research, August 2011)
The acceptable level of safety inventory is determined by the two factors presented in figure
2.6. Growing uncertainty of supply and/or demand and a higher desired level of product
availability both increase the required level of safety inventory. What follows describes
demand uncertainty and product availability in the context of a periodic review policy to
understand their impact on safety stock.
Figure 2.6: Impact on safety inventory
2.4.1 Measuring demand uncertainty
To calculate safety stock in a periodic review system, we need to model the uncertainty
during period T+L. When demand in each period is assumed to be
normal4, independent and identically distributed, with mean $D$ and standard deviation $\sigma_D$, the parameters of demand during T + L periods can be written
as follows.
Mean demand during T + L periods:
$$ D_{T+L} = (T + L)\, D \tag{2.2} $$
Standard deviation of demand during T + L periods:
$$ \sigma_{T+L} = \sqrt{T + L}\; \sigma_D \tag{2.3} $$
The derivations of these equations are outside the scope of this master dissertation. $D$ represents
the systematic component of demand and $\sigma_D$ the random component, which is a measure of
demand uncertainty. The goal of forecasting is to predict the systematic component and
estimate the random component. Whether an enterprise with a periodic review system is able
to satisfy all demand from inventory depends on the inventory it has on hand when a
replenishment order is placed and on the demand experienced during period T+L, $D_{T+L}$. A
company can take the risk of ordering up to exactly $D_{T+L}$. Nonetheless,
because demand during this period is uncertain, it can exceed the mean
demand $D_{T+L}$, and stockouts will occur. This is why companies include safety
stock based on the uncertainty of demand during period T+L, $\sigma_{T+L}$ (Chopra & Meindl, 2013).
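Equations 2.2 and 2.3 translate directly into code. The sketch below assumes i.i.d. normal demand per period, as in the text; the function name and the parameter values are hypothetical.

```python
import math


def demand_over_protection_interval(mean_demand, std_demand, review_period, lead_time):
    """Equations 2.2 and 2.3: mean and standard deviation of demand during T + L.

    Assumes per-period demand is normal, independent and identically distributed.
    """
    horizon = review_period + lead_time
    mean_tl = horizon * mean_demand
    std_tl = math.sqrt(horizon) * std_demand  # variances add, so the std grows with sqrt(T + L)
    return mean_tl, std_tl


# Hypothetical SKU: D = 100 units/day, sigma_D = 30, T = 4 days, L = 5 days
print(demand_over_protection_interval(100, 30, 4, 5))  # (900, 90.0)
```

Note that the mean grows linearly with T + L while the standard deviation grows only with its square root, because independent per-period variances add up.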
4 The normal distribution is a good approximation for most of the products within a firm. However, for slow-moving products
the Poisson distribution is more appropriate.
2.4.2 Measuring product availability
In general, the Cycle Service Level (CSL) is the fraction of replenishment cycles that end
with all customer demand being met. Hence, the CSL is the probability of not having a
stockout in a replenishment cycle (Chopra & Meindl, 2013). The CSL in a periodic review
system is the probability that demand during period T + L does not exceed the OUL:
$$ CSL = \Pr(\text{demand during } T + L \le OUL) \tag{2.4} $$
This equation is equivalent to
$$ CSL = F(OUL;\, D_{T+L}, \sigma_{T+L}) \tag{2.5} $$
where $F(\cdot\,;\, D_{T+L}, \sigma_{T+L})$ is the cumulative distribution function of demand during the period T + L.
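When demand during T + L is normal, equation 2.5 can be evaluated directly with the standard library's `statistics.NormalDist`. This is a minimal sketch; the function name and the figures are hypothetical.

```python
from statistics import NormalDist


def cycle_service_level(oul, mean_tl, std_tl):
    """Equation 2.5 for normal demand: CSL = F(OUL; D_{T+L}, sigma_{T+L})."""
    return NormalDist(mu=mean_tl, sigma=std_tl).cdf(oul)


# Hypothetical SKU with D_{T+L} = 900 and sigma_{T+L} = 90
for oul in (900, 1000, 1100):
    print(oul, round(cycle_service_level(oul, 900, 90), 3))
```

Setting the OUL equal to the mean demand $D_{T+L}$ yields a CSL of only 50 per cent, which is why safety stock is added on top of it.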
Most of the time, it is more appropriate to use the fill rate. Fill rate is the percentage of
demand satisfied from products in inventory and is usually much higher than the CSL in a multi-product
situation. It estimates the fraction of demand that is turned into sales. Fill rate
should be measured over specified amounts of demand rather than over time. Nevertheless,
using fill rate instead of CSL has a drawback: it is mathematically much more
complicated, and in a periodic review system in particular it is computationally expensive to
calculate the safety stock based on the fill rate. Equation 2.6 presents the formulation of the
fill rate fr in a continuous review system with lot size Q. Deriving the fill rate for a periodic review system is
outside the scope of this master dissertation, and it will not be used in the analysis of the case
study.
$$ fr = 1 - \frac{ESC}{Q} \tag{2.6} $$
with ESC the Expected Shortage per Cycle, which is the average demand in excess of the
reorder point ROP (which plays the role of the OUL in a continuous review system) in each replenishment cycle. When $f(x)$ is the density function of the demand
distribution during the lead time, ESC is given by
$$ ESC = \int_{x=ROP}^{\infty} (x - ROP)\, f(x)\, dx \tag{2.7} $$
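For normally distributed lead-time demand, the integral in equation 2.7 has a well-known closed form in terms of the safety stock ss (the reorder point minus mean lead-time demand): ESC = −ss·[1 − F_S(ss/σ_L)] + σ_L·f_S(ss/σ_L) (Chopra & Meindl, 2013), where F_S and f_S are the standard normal distribution and density functions and σ_L is the standard deviation of lead-time demand. The Python sketch below evaluates it together with equation 2.6 for a continuous review system; function names and numbers are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def expected_shortage_per_cycle(ss, sigma_l):
    """Closed form of equation 2.7 for normal lead-time demand."""
    k = ss / sigma_l
    return -ss * (1 - STD_NORMAL.cdf(k)) + sigma_l * STD_NORMAL.pdf(k)


def fill_rate(ss, sigma_l, lot_size):
    """Equation 2.6: fraction of demand served from stock in a continuous review system."""
    return 1 - expected_shortage_per_cycle(ss, sigma_l) / lot_size


# Hypothetical SKU: safety stock 100, sigma_L = 90, lot size Q = 400
print(round(expected_shortage_per_cycle(100, 90), 2))
print(round(fill_rate(100, 90, 400), 4))
```

The lot size Q enters the fill rate directly: with the same safety stock, larger lots mean fewer replenishment cycles and therefore a higher fill rate, which is one reason the fill rate usually exceeds the CSL.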
In general, when the required product availability goes up, the required safety inventory
increases because the supply chain must now be able to handle uncommonly high demand or
uncommonly low supply. The marginal increase in safety inventory grows rapidly with an
increase in the desired CSL or fill rate.
2.4.3 Safety stock formula
In a periodic review system, safety stock is the quantity in excess of $D_{T+L}$ over the time
interval T+L (Chopra & Meindl, 2013). Hence, OUL and safety stock are related as follows:
$$ OUL = D_{T+L} + ss \tag{2.8} $$
Based on equation 2.5, OUL can be calculated with the inverse of the cumulative distribution
function of demand; combined with equation 2.8, this gives
$$ ss = F^{-1}(CSL;\, D_{T+L}, \sigma_{T+L}) - D_{T+L} $$
This is the general interpretation of safety stock and applies to all
distributions. Note that in this equation $D_{T+L}$ is the mean demand during period T+L and that demand is
not necessarily normally distributed.
For a normal distribution, the safety inventory can be written as a function of the standard
deviation:
$$ ss = z \cdot \sigma_{T+L} \tag{2.9} $$
where z is a safety factor depending on the required service level. The Z-score is the inverse
of the standard normal cumulative distribution function $F_S$, evaluated at the CSL:
$$ z = F_S^{-1}(CSL) \tag{2.10} $$
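Equations 2.8 to 2.10 can be checked numerically. The sketch below computes the safety stock and OUL for a target CSL and confirms that, for normal demand, the general inverse-CDF form gives the same safety stock as z·σ_{T+L}. The function names and inputs are hypothetical.

```python
from statistics import NormalDist


def safety_stock(csl, sigma_tl):
    """Equations 2.9 and 2.10: ss = z * sigma_{T+L} with z = F_S^{-1}(CSL)."""
    z = NormalDist().inv_cdf(csl)
    return z * sigma_tl


def order_up_to_level(csl, mean_tl, sigma_tl):
    """Equation 2.8: OUL = D_{T+L} + ss."""
    return mean_tl + safety_stock(csl, sigma_tl)


# Hypothetical SKU with D_{T+L} = 900 and sigma_{T+L} = 90, target CSL of 95 per cent
ss = safety_stock(0.95, 90)
oul = order_up_to_level(0.95, 900, 90)
# General form: inverse CDF of demand during T + L minus its mean gives the same ss
ss_general = NormalDist(900, 90).inv_cdf(0.95) - 900
print(round(ss, 1), round(oul, 1), round(ss_general, 1))
```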
As illustrated in figure 2.7, the relationship between the CSL and the Z-score is nonlinear;
higher cycle service levels require disproportionately higher Z-scores and, thus,
disproportionately higher safety stock levels. According to King (2011), rather than using a
fixed Z-score for all products, the Z-score should be set independently for groups of products
based on criteria such as strategic importance, profit margin, or dollar volume. As a result,
SKUs with a greater value to the business will have more safety stock, and vice versa.
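The nonlinearity and the idea of group-specific Z-scores can be illustrated with a small Python sketch. The three product groups and their target CSLs are hypothetical and not taken from the case study.

```python
from statistics import NormalDist

# Hypothetical target CSLs per product group: A = high value, B = medium, C = low
TARGET_CSL = {"A": 0.99, "B": 0.95, "C": 0.90}


def z_scores(targets):
    """Safety factor z = F_S^{-1}(CSL) for every product group (equation 2.10)."""
    return {group: NormalDist().inv_cdf(csl) for group, csl in targets.items()}


z = z_scores(TARGET_CSL)
for group, value in z.items():
    print(f"group {group}: CSL {TARGET_CSL[group]:.0%}, z = {value:.3f}")

# Extra z needed per additional percentage point of CSL in each band
print("95 -> 99 per cent:", round((z["A"] - z["B"]) / 4, 3))
print("90 -> 95 per cent:", round((z["B"] - z["C"]) / 5, 3))
```

Each additional percentage point of CSL costs more than twice as much safety factor in the 95 to 99 per cent band as in the 90 to 95 per cent band, which is why reserving the highest service levels for the most valuable SKUs pays off.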
Figure 2.7: Safety factor of standard normal distribution and graphical representation
Source: Eazystock (2015) © All rights reserved
Equation 2.9 is only valid when there is no lead time variability. When demand per period and lead time are independent and normally distributed, with average lead time L and standard deviation of lead time $s_L$, the general equation can be
written as follows:
$$ ss = F_S^{-1}(CSL) \cdot \sqrt{(T + L)\,\sigma_D^2 + D^2 s_L^2} \tag{2.11} $$
Combining equation 2.9 with equation 2.3 reveals that, without lead time variability, the safety inventory increases
linearly with $\sigma_D$ and proportionally to the square root of T + L. It is important
to see the link with the previous section about demand forecasting. If the underlying
uncertainty of demand ($\sigma_D$) can be reduced by a factor of k, the required safety inventory
also decreases by a factor of k (Chopra & Meindl, 2013). Therefore, using sophisticated
forecasting models that reduce the forecast error and create a better match between supply and
demand will have a direct impact on the safety inventory.
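The effect of reducing $\sigma_D$ can be verified with equation 2.11. The sketch below shows that halving $\sigma_D$ halves the safety stock when lead time is certain, but reduces it by less when lead time varies, because the $D^2 s_L^2$ term is unaffected by forecast accuracy. The function name and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def safety_stock_lt_variability(csl, mean_demand, std_demand, review_period,
                                mean_lead_time, std_lead_time):
    """Equation 2.11: safety stock with independent, normal demand and lead time."""
    z = NormalDist().inv_cdf(csl)
    variance = ((review_period + mean_lead_time) * std_demand ** 2
                + mean_demand ** 2 * std_lead_time ** 2)
    return z * math.sqrt(variance)


# Hypothetical SKU: D = 100/day, sigma_D = 30, T = 4, L = 5, CSL = 95 per cent
base = safety_stock_lt_variability(0.95, 100, 30, 4, 5, 0)
halved = safety_stock_lt_variability(0.95, 100, 15, 4, 5, 0)
print(round(base, 1), round(halved, 1))  # halving sigma_D halves ss when s_L = 0
with_lt = safety_stock_lt_variability(0.95, 100, 30, 4, 5, 1)
with_lt_halved = safety_stock_lt_variability(0.95, 100, 15, 4, 5, 1)
print(round(with_lt, 1), round(with_lt_halved, 1))  # less than halved when s_L > 0
```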
In general, the expected level of inventory after receiving an order is equal to
$$ ss + Q = ss + T \cdot D \tag{2.12} $$
Keeping the inventory level at a bare minimum by reducing safety inventory has several
advantages. A crucial advantage, related to the top pressure of figure 2.4, is the cost saving,
which enables companies to operate more cost-efficiently. Larry Mulky, President, Ryder
Integrated Logistics, Inc. (Harrington, 1996) notes, “Inventory is where the biggest cost is
hidden in most businesses today.” Holding inventory can cost anywhere between 20 and 40 per cent
of its value per year. In a multi-echelon inventory system, the difficulty is to coordinate
the different levels so as to keep inventory costs low. At the same time, other
costs such as transportation and production costs need to be minimized. An inventory
optimization system weighs the fixed ordering, unit, holding and potential penalty costs5 for
each product and location combination. It also takes into account demand, demand
variability, lead time and supply variability to determine inventory control parameters
that set the order size and the timing of order placement so as to obtain the lowest costs and
meet minimum stock levels. Moreover, it determines the minimum and maximum stock
level, considering both demand and supply variability.
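To make the cost argument concrete, the sketch below estimates the annual holding cost as average inventory multiplied by unit cost and a holding cost rate h, using both ends of the 20 to 40 per cent range mentioned above. Treating average inventory as ss + Q/2 is the usual sawtooth approximation and an assumption here; the unit cost, lot size and safety stock figures are hypothetical.

```python
def average_inventory(ss, lot_size):
    """Approximate average inventory: safety stock plus half a replenishment lot."""
    return ss + lot_size / 2


def annual_holding_cost(avg_units, unit_cost, holding_rate):
    """Holding cost per year = average units x unit cost x annual holding cost rate."""
    return avg_units * unit_cost * holding_rate


# Hypothetical SKU: unit cost of 20 euro, Q = 400, safety stock reduced from 148 to 74 units
for rate in (0.20, 0.40):
    before = annual_holding_cost(average_inventory(148, 400), 20, rate)
    after = annual_holding_cost(average_inventory(74, 400), 20, rate)
    print(f"h = {rate:.0%}: saving of {before - after:.2f} euro per year")
```

Since the cycle inventory term Q/2 is unchanged, the saving depends only on the reduction in safety stock, which is exactly the lever that better forecast accuracy acts upon.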
5 Penalty costs of not having enough stock can include either the cost of backorders or lost sales.
2.5 Conclusion
To conclude the literature review, there is no doubt that improving demand forecast accuracy and
thus reducing demand uncertainty will result in a lower required level of safety inventory
and lower carrying costs. Knowing that companies face continuous pressure to operate cost
efficiently while keeping customer service levels high at all times, this seems an interesting topic
for a real-life case study. Hence, in the following case study, the research question of whether
big data analytics can improve demand forecast accuracy and consequently reduce the safety
inventory of a company will be investigated based on data sources of a pharmaceutical
wholesale company. We will try to apply the advanced demand forecasting capabilities of leading
companies (cf. Section 2.3) in the case study. The objective of the
case study is to optimize inventory levels using these advanced forecasting techniques and
thereby to minimize the overall cost of the company without hindering customer
satisfaction levels. Based on the theoretical insights of section 2.4, the impact on the
company’s safety inventory and carrying cost will be analyzed.
Part II
Case Study
Chapter 3
Multipharma
The case study is set up in a cooperative wholesale company, Multipharma. A wholesaler is
one stage in the entire pharmaceutical supply chain. Hence, before giving a general
introduction to the company (Section 3.2) and the network in which it operates (Section
3.3), a broader view of the entire pharmaceutical supply chain is necessary to understand the
context of the case study (Section 3.1). Sections 3.2 and 3.3 are based on qualitative
interviews with the supply chain department of Multipharma (D.V. Belle, personal
communication, March 9, 2017).
3.1 Pharmaceutical supply chain
A pharmaceutical product has a very long and complex research and discovery phase.
Afterwards, the product needs to be tested for safety and efficacy. The final phase consists of
manufacturing and distribution, but this phase can be broken down into several subphases.
First, during primary manufacturing, the active ingredients are produced. Afterwards, during
secondary manufacturing, the final product in SKU form is produced. According to Shah
(2004), both manufacturing stages operate very slowly because of the many quality assurance
activities. The final products need to be brought to market warehouses or distribution
channels (wholesalers) and thereafter to retailers (pharmacies) or hospitals. A general
overview of the different stages of a pharmaceutical supply chain is represented in figure 3.1.
Figure 3.1: A pharmaceutical supply chain (pharmaceutical company → warehouse/wholesaler → pharmacies/hospital → end-user (patients))
Booth (1999) recognizes a trend for companies to divest excess capacity resulting from many
local manufacturing sites and to move towards a global SCM process. The different stages
involved in moving the product through the global chain make a pharmaceutical
supply chain difficult to coordinate. Besides significant intra-organizational information flows
between different planning units, there is also a vital inter-organizational exchange of
information between the different members of the SC. Because of the large scale and geographical
span of the pharmaceutical supply chain, information is delayed, and communication, and as a
result coordination, between the different departments is very limited.
Difficult coordination and communication create a bullwhip effect, which is largest at the
primary manufacturing sites, where large stocks of active ingredients are held to ensure good
service levels. The stock level in the entire supply chain is between 30 and 90 per cent of the
annual demand. Additionally, the large scale and span make it very difficult to exploit short-term
opportunities such as a shortage of a supplier’s products. The supply chain cycle time is
between 1000 and 8000 hours. Hence, when an opportunity arises at the lower levels, it takes
a considerable amount of time until it reaches the higher levels of the supply chain. These
operational issues have an impact on the efficiency and effectiveness of the supply chain.
Managers recognize these issues and, in contrast to a decade ago, when the main focus was on
drug discovery, sales and marketing, they now pay much more attention to supply chain
optimization as a means of delivering value (Shah, 2004). Clearly, improving the
coordination between the different stages and avoiding a bullwhip effect is a key challenge in
the pharmaceutical supply chain. According to Shah (2004), demand and inventory
management together with distribution and production planning are key business processes in
the pharmaceutical industry. In each geographical region a pharmacist develops forward
forecasts based on historical data and market intelligence. The result of demand
management is a demand forecast at the lowest level, which can be aggregated and imposed
on the appropriate warehouse or distribution centre. Detailed schedules provide information
on how to place orders with upstream suppliers. At each stage, several transport modes are
used for the delivery of incoming goods. Making a trade-off between holding cost and
transportation cost results in the optimal lot sizes for these goods.
3.2 Introduction to Multipharma
When considering the entire supply chain (Figure 3.1), this master dissertation focuses on the
second stage. Multipharma Group is a Belgian wholesale company whose core business is to
distribute medicines. The company has a network of 270 corporate-owned
pharmacies. Multipharma was founded in 1921 with the aim of making medicines financially
accessible to a broad public. Nowadays, it is the sixth largest player in drug distribution in
Belgium. Moreover, it is the largest chain in the pharmacy sector. Besides distributing
medicines, it also has 23 iU stores where parapharmaceutical products are sold. iU, originally
named Equiform, was founded in 1995 and has since been the pioneer in the distribution of
parapharmaceutical products such as care, dietary and baby products in Belgium.
Multipharma’s vision is patient-centered. The pharmacists endeavour to advise their patients
and guide them in terms of medication use, health and quality of life in general. The
pharmacist builds a relationship of trust with the patient, but also with other care
practitioners, (home) nurses, etc. Consultation is an integral part of the daily activities in the
pharmacy. In order to optimize the quality of service, Multipharma has increasingly invested in
continuing education of pharmacy teams, scientific support and innovative pilot projects in
the interest of the patient. The goals are quality, efficiency, safety and accessibility.
Accordingly, the strategy of Multipharma is strongly focused on service. Multipharma’s
mission is to be able to deliver the prescribed product immediately and to reduce the number
of clients who need to return a few hours later because the product is not available. With the
same or a lower stock level, they want to achieve a better service level.
The pharmaceutical sector has not been spared from governmental pressure to contribute to
savings in the healthcare sector: since April 1, 2012, the price of reimbursed medicines has been
reduced by 1.95 per cent, and pharmacies have been obliged to propose the cheapest drug to the
patient for the patient to benefit from reimbursement. These measures had a direct impact on margins,
which is why supply chain efficiency and cost savings are, more than ever, the company’s top
priorities. Multipharma faced a considerable challenge: there was no supply chain
department in 2010. Each department worked independently of the others, and the same can be
said of the pharmacies. Although the pharmacies belong to the network of
Multipharma and the pharmacists are employees rather than self-employed, they are used to a
great deal of freedom in running their pharmacy. It is a fairly complex situation, because on the one hand
they are Multipharma pharmacies, but on the other hand they are not required to order all
references from Multipharma. That is why Multipharma started to reorganize the whole chain.
The objective was to incorporate supply chain principles in the process and further leverage
its scale. Multipharma had access to market demand and order data of all 40k SKUs6 and
used this big data to accomplish the goal of reorganizing the chain. On the one hand, the
system would allow Multipharma to make more efficient use of the central warehouse
capacity to cope with increasing volume and product diversity; on the other hand, it would
encourage Points of Sale (POS) to order more products through Multipharma. Today the
supply chain department is responsible for the statistical forecast, which is enriched by the
Category Manager, the Marketing department, feedback from the network, etc. Besides
introducing supply chain principles in the chain, Multipharma had to change the mindset
around the ERP system. They had been using SAP for years, but used no more than 10
per cent of its functionality. Almost everything was done with individual Excel sheets and
paper printouts. There was some cooperation between the different departments, but each
department worked according to its own methods and with its own tools (Excel, SAP, Word,
etc.). In today’s environment, with many emerging technologies, the old way of thinking
would cause them to fall behind their competitors, and something had to change dramatically.
Internally, they developed additional functionality on top of SAP to enrich the forecasting.
Fearing a lot of internal resistance, they did not opt for the implementation of a completely
new forecast package. Instead, they chose a step-by-step approach, which is explained
next (“Supply chain als wissel”, 2012).
Part of the first project was to set up more advanced internal procedures such as demand
planning, forecasting and inventory management. Improving the quality and efficiency of
demand forecasting was an essential part of the reorganisation. This efficiency also included
optimizing the time spent on each task, so that a planner could bring more added value to
the process. Increased forecast accuracy and precision, combined with inventory optimization,
resulted in a better balance between stock levels and service levels. By applying and
incorporating new business rules, the outcome of the whole process became easier to predict
and control without detailed human intervention. The improvement of the supply chain
backbone was an essential first step to prove the reliability of Multipharma to the POS.
The scope of the inventory optimization project was limited to forecasting central warehouse
demand of centralized SKUs (about 12,500 SKUs) and optimizing the order flow to the
warehouse.
6 Note that this number of SKUs is based on the entire assortment of 2010. In 2016 the assortment is even more
differentiated, with up to 71k SKUs.
The next step, which is still not fully operational today7, is to provide POS with trustworthy
order proposals for existing or new Multipharma products. Multipharma expected that the
better they are able to assist POS with their own stock management, the lower the threshold
will be to place orders through Multipharma. This requires tooling that provides insightful order
calculations, options to approve or overwrite orders and, in later stages, automatic acceptance of
order proposals from Multipharma. However, convincing a network of 270 local
pharmacies, each with their own stock system, to give up part of their
independence is not an easy job. In practice, this implies that the pharmacies no longer
determine the inventory parameters. Instead, they are automatically replenished by
Multipharma. Based on the individual sales of each pharmacy, the stock
parameters should be calculated centrally each week, and deliveries should be made to the pharmacies
based on these estimations. In principle, this process can operate entirely automatically. In line
with Multipharma’s vision, it will give the pharmacists the opportunity to focus on advising patients
and reduce administrative hassle. However, the pharmacists are not that happy
about the new central stock and forecasting policy. Communicating the new ordering policy to the
pharmacies is one of the most challenging tasks for Multipharma. It is very tough
to convince the pharmacists that the new system will work significantly better, faster and more
precisely than the traditional way of ordering. Multipharma had underestimated the impact of
change management and communication. Nevertheless, the positive outcome of this project must
come from the combination of the capacity of the system and the experience and knowledge
of the pharmacists. Multipharma expects that the pharmacies will start to appreciate the quick and
reliable delivery of products from the wholesale distribution warehouse.
Another step of the reorganization is related to Multipharma’s warehouse system. In the near
future, Multipharma will be able to control all warehouse processes and areas with a single
software system. The system uses continuous lot tracking and optimizes all
warehouse areas from goods-in to goods-out, including returns. Using a standard interface,
the Warehouse Management System (WMS) will be completely coupled to the already
existing SAP system. Meanwhile, they are improving the control of the warehouse by
optimizing the utilization of KiSoft, the WMS-Package8 of Knapp, which they do not yet use
to the fullest. Another goal that is not yet fully accomplished is related to the SAP system.
The SAP system should capture and share global information from within the company and
across its supply chain. Users of the system from all kinds of departments (supply chain, sales,
HR, etc.) within Multipharma should use the same tool and share information to achieve
higher productivity and performance. Thanks to real-time dashboards, management will be better
able to react quickly when performance drops in a specific pharmacy, and they will have
better insight to identify the cause of issues. Hence, this will certainly improve the quality of
operational decisions because real-time information is shared. This real-time information
combined with Multipharma’s SCM software will provide analytical decision support.
7 To date, all processes are managed independently but not yet fully integrated.
8 A Warehouse Management & Control System
As a combined result of these first improvements, inventory in the central warehouse was reduced
significantly, by 10 per cent, which resulted in savings of 2 million euros.
These internal optimizations are necessary to convince pharmacists in the future to order in a
different way. Today, the average coverage of the products is 24 days9: fast-moving
products are held no more than one week in the stock of the central warehouse, while slower-moving
products are at 6 weeks of stock on average. The target Multipharma aims to reach
in the future, based on the previously described, not yet fully completed improvements, is 20
days.
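The coverage figures relate to stock levels through simple arithmetic: coverage in days is stock divided by average daily demand, here on a 6-day week as in footnote 9. At constant demand, moving from 24 to 20 days of coverage implies roughly a one-sixth reduction in average stock. The sketch below is a hypothetical illustration with invented figures, not Multipharma's own calculation.

```python
def coverage_days(stock_units, weekly_demand_units, days_per_week=6):
    """Days of demand the current stock can cover (6-day week assumed, as in footnote 9)."""
    daily_demand = weekly_demand_units / days_per_week
    return stock_units / daily_demand


def stock_change_for_target(current_days, target_days):
    """Relative stock change needed to move coverage to the target at constant demand."""
    return target_days / current_days - 1


# Hypothetical SKU: 2,400 units in stock, 600 units sold per week
print(coverage_days(2400, 600))  # 24.0
print(f"{stock_change_for_target(24, 20):.1%}")  # -16.7%
```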
3.3 Network description
Multipharma is a wholesale company with one Distribution Center (DC). Although
Multipharma is rather small compared to its top competitor, which has 10 DCs, Multipharma’s logistics
cost relative to its revenue is better than that of most of its competitors. The warehouse
is one of the most efficient10 in the country. It is located in Anderlecht, where about 12,500
distinct SKUs and 2,500,000 units (boxes) are kept in stock. Over 65 per cent of the orders to
pharmacies are picked automatically by robots, which is why delivery can be completed in
record time and with the highest reliability. All the pharmacies are supplied once a day,
except for Brussels, where this happens twice a day. As can be seen in figure 3.2, most
pharmacies stock between 4,000 and 6,000 drugs and other care products. Multipharma
develops tools and processes to help the pharmacies ensure the continued availability of these
products. Besides pharmacies, there are other distribution channels as well, such as prisons,
e-commerce, B2B, iU stores, etc.
9 Based on a 6-day week.
10 The warehouse is 65% fully automated and 25% partially automated, which leaves 10% manually picked.
Figure 3.2: Frequency count according to size of assortment
Source: D.V. Belle, personal communication, March 9, 2017
The as-is supply chain network in which Multipharma operates consists of two physical warehouses, Multipharma and Pharma Belgium/Belmedis, supplied by a total of 600 to 900 suppliers. For a specific set of products (non-centralized), Multipharma's warehouse acts as a cross-dock. It is assumed that handling non-centralized items does not affect the warehouse constraints. The bottleneck of the warehouse is its limited volume capacity, which affects how order decisions are made and how order generation can be optimized. The main constraint is ensuring that the current warehouse 'survives' until 2019, when a new warehouse twice the size of the current one will be built. The biggest problem lies in inbound logistics¹¹, where capacity is limited by the heavy impact of administrative processes. Reception has a capacity of about 500 and at most 550 order lines a day. This could serve as an alert threshold; reception therefore appears to be the actual bottleneck. The overall network design of Multipharma is represented in figure 3.3. There are three types of inventory points (indicated in red), namely at the suppliers, at the warehouses and at the pharmacies.
These are the locations where SKUs are stored, produced or transformed. An inventory point is considered an independent facility with its own stock management objectives and KPIs. Shortages can occur between suppliers and central warehouses and between central warehouses and pharmacies (POS). Multipharma is represented by the rectangle in the middle and consists of the DC and the customer service department. Multipharma obtains its products directly from a large number of suppliers. Most of the time, orders are placed with each supplier one at a time, resulting in many invoices and, consequently, a higher administrative workload. Order decisions are based on the SAS forecast¹², aiming to match supply and demand as closely as possible.
11 The activities of receiving, storing, and disseminating incoming goods or material for use.
Source: http://www.businessdictionary.com/definition/inbound-logistics.html
Multipharma's warehouse contains the 12,500 best-performing SKUs. It should be noted that the pharmacies sold 71,000 distinct SKUs in 2016; hence, although Multipharma is the primary supplier of the pharmacies (88.5%), there has to be another important partner in the network. Pharma Belgium/Belmedis is a second wholesaler and a preferred partner of Multipharma. This wholesaler has a wider range of products (40,000 SKUs) and delivers more frequently when needed. At first glance, pharmacists would be expected to prioritize the wider range and more frequent deliveries of Belmedis; however, the opposite is true, because Multipharma offers pharmacists better pricing conditions in order to retain the customer relationship. Note that pharmacies can also order directly from the suppliers (7.94%) or transfer products to other pharmacies (0.06%). The product reaches the end consumer when the customer enters a POS and requests the needed and/or prescribed products, which the pharmacist provides from stock.
Figure 3.3: Network design
Source: D.V. Belle, personal communication, March 9, 2017
12 The case study aims to raise the forecast accuracy even more using SAS as an analytical tool in combination with new sources of information.
Chapter 4
Methodology
This chapter outlines the scope and methodology of the case study. The aim of the case study is to improve the demand forecast accuracy of Multipharma by using big data analytics (cf. Section 2.3). Since 2015, Multipharma has continuously worked to improve its forecast accuracy with SAS programming and SAS Forecast Studio. Section 2.3 showed that leading companies use more advanced technologies, such as demand forecasting software and an SCM system. Figure 4.1 shows the positive impact of these advanced analytical tools on Multipharma's on-hand inventory over the past two years. The average number of days in stock at the end of 2016 was about 3.5 days lower than at the end of 2015. Since one day of stock costs about 800,000 Euros, Multipharma's savings in 2016 can be estimated at as much as 2.92 million Euros (D.V. Belle, personal communication, March 23, 2017).13
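The savings estimate is the product of the reduction in days of stock and the cost of one day of stock. The Python sketch below only reuses the two approximate figures quoted above; the function names are illustrative. It also shows that a reduction of exactly 3.5 days would give 2.8 million Euros, so the reported 2.92 million Euros corresponds to a reduction of roughly 3.65 days.

```python
# Saving = reduction in days of stock x cost of one day of stock.
# Both figures are the approximate values quoted in the text.
COST_PER_DAY_EUR = 800_000


def inventory_savings(days_reduced: float, cost_per_day: float = COST_PER_DAY_EUR) -> float:
    """Saving in Euros for a given reduction in the average days in stock."""
    return days_reduced * cost_per_day


def implied_days_reduced(savings: float, cost_per_day: float = COST_PER_DAY_EUR) -> float:
    """Reduction in days of stock that corresponds to a given saving."""
    return savings / cost_per_day


print(inventory_savings(3.5))             # 2800000.0
print(implied_days_reduced(2_920_000))    # 3.65
```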
Figure 4.1: Coverage in days 2015-2016: Impact of SAS on inventory
Source: Multipharma, slideshare operational results (2016)
13 A critical reader might have noticed that the coverage from April to July 2016 was worse than in 2015. This was mainly due to errors at the first launch of SAS: wrong table linkages were created and, as a result, the wrong stock was ordered, which caused problems for inbound logistics. The problems were solved in April and only required some recovery time. Thereafter, the analytics clearly produced significantly better results.
However, there is still a gap between the forecast and the actual demand of the products, which can be reduced with new insights and techniques. First, these new insights are obtained from qualitative research: weekly meetings with the Supply Chain department of Multipharma reveal additional insights to improve the time series forecasts of certain product classes. Together with the Supply Chain department, new demand drivers are identified and prioritized (cf. Figure 6.1). The business insight of the Supply Chain department is essential for defining demand drivers that are not yet modelled in the current forecasting model. The quantitative research will focus on the priority-one demand drivers, which are related to promotions. As mentioned in section 2.3, leaders are able to model events such as promotions. Hence, we expect a benefit from using internal data to model three kinds of promotional events. The first hypothesis to be investigated is therefore whether modelling different promotional events using internal data sources can improve the demand forecast accuracy of the company.
Furthermore, section 2.3 showed that leaders are able to include causal factors (e.g. weather conditions) in demand forecasts. Therefore, the second part of the quantitative research focuses on seasonality (priority 2) in combination with weather conditions and Google Trends data (priority 3). Consequently, the second hypothesis to be investigated is whether demand forecasts for seasonal products (flu-related medicines, sunscreens, mosquito products and insecticides) benefit from including external data sources as predictor variables/causal factors (cf. Section 2.3) in the baseline model, and subsequently whether the Google Trends data can be used to exclude extreme order peaks from the data. Sections 6.2 and 6.3 will re-evaluate the accuracy of the forecast when incorporating the internal and the external data sources, respectively. In other words, the case study investigates whether superior value can be obtained by combining already existing product data with additional internal and external data sources.
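To make the second hypothesis concrete: a causal factor enters the model as an extra predictor next to the demand history, and the two models are compared on MAPE. The sketch below assumes a linear model fitted by ordinary least squares on invented toy data, with a hypothetical weekly search-interest index standing in for Google Trends data. It is not the SAS Forecast Studio procedure used in the case study, and it scores the fit in-sample, whereas a real comparison would use a holdout sample.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows: List[List[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return 100 * sum(abs((f - d) / d) for d, f in zip(actual, forecast)) / len(actual)


def fit_and_score(rows: List[List[float]], y: Sequence[float]) -> float:
    beta = ols(rows, y)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return mape(y, fitted)


# Invented toy data: weekly orders driven by a hypothetical search-interest index.
search_index = [10, 30, 60, 30] * 5
demand = [50 + 3 * g for g in search_index]

y = demand[1:]
# Baseline: intercept + last week's demand. Augmented: + this week's search index.
baseline_rows = [[1.0, demand[t - 1]] for t in range(1, len(demand))]
augmented_rows = [[1.0, demand[t - 1], search_index[t]] for t in range(1, len(demand))]

baseline_mape = fit_and_score(baseline_rows, y)
augmented_mape = fit_and_score(augmented_rows, y)
print(f"baseline MAPE:  {baseline_mape:.1f}%")
print(f"augmented MAPE: {augmented_mape:.1f}%")
```

Because the toy demand is generated from the index itself, the augmented model fits it almost perfectly; with real data the gain, if any, would have to be established on a holdout sample.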
The first part of the analysis builds on the research paper of Cachon and Fisher (1997), in which normal demand is forecast with an Exponential Smoothing Model (ESM) whose forecast is not updated when an observation falls on a promotion day. Although the case study concerns sell-in promotions¹⁴ rather than sell-through (consumer) promotions, it seems reasonable to assume that forecasting orders from retailers can be improved with this technique. The second part of the analysis is related to other work in which search query volume is used to forecast the opening-weekend box-office revenue of feature films, the first-month sales of video games and the rank of songs on the Billboard Hot 100 chart. In this research, Goel et al. (2010) found that search counts are highly predictive of future outcomes and generally improve the performance of baseline models fitted on other publicly accessible data. To date, no studies appear to have been published that improve the demand forecast accuracy of pharmaceutical products by combining several forecasting capabilities of leading companies (Purdue University & SAS, 2008), i.e. by using internal data sources to model promotional events and external data sources (weather conditions and Google Trends data) to include causal factors. Moreover, this case study will verify the impact of improved demand forecast accuracy on safety inventory.
14 Sell-in promotions are promotions from the manufacturer (supplier) to the retailer in which the retailer does not pass the promotion on to the end consumer, but instead builds up inventory and serves consumers from that stock after the promotional period (Chopra & Meindl, 2013).
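The core idea of the Cachon and Fisher (1997) approach, as summarized above, is an exponential smoothing forecast that is simply not updated on promotion days. A minimal sketch, assuming simple (level-only) exponential smoothing, a known promotion flag per period and a hypothetical smoothing constant `alpha`; it illustrates the idea and is not Multipharma's SAS implementation:

```python
from typing import List, Sequence


def ses_forecasts(demand: Sequence[float], promo: Sequence[bool],
                  alpha: float = 0.2, skip_promotions: bool = True) -> List[float]:
    """One-step-ahead simple exponential smoothing forecasts.

    forecasts[t] is the forecast for period t, made before demand[t] is observed.
    With skip_promotions=True the smoothed level is not updated in promotional
    periods, so promotion-driven orders do not inflate the normal-demand forecast.
    """
    level = float(demand[0])  # initialise the level with the first observation
    forecasts = []
    for d, is_promo in zip(demand, promo):
        forecasts.append(level)
        if skip_promotions and is_promo:
            continue  # freeze the level: promotional demand is not used to update it
        level = alpha * d + (1 - alpha) * level
    return forecasts


demand = [100, 100, 400, 100]
promo = [False, False, True, False]
print(ses_forecasts(demand, promo, alpha=0.5))                         # [100.0, 100.0, 100.0, 100.0]
print(ses_forecasts(demand, promo, alpha=0.5, skip_promotions=False))  # [100.0, 100.0, 100.0, 250.0]
```

Without the skip, the promotional peak of 400 pulls the next normal-period forecast up to 250; with the skip, the baseline stays at 100.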
The coding of the case study will be done in SAS Enterprise Guide. The obtained data sources will be used to extend the historical dataset with new explanatory variables, i.e. the new demand drivers. The resulting extended SAS table will then be used in SAS Forecast Studio, in line with Multipharma's operational forecasting process. SAS Forecast Studio is a forecasting application designed to speed up the forecasting process through automation. Its forecasting process follows the stepwise approach represented in figure 4.2. By default, an automated overnight forecasting process applies the appropriate forecasting model based on the product classification, creates a forecast and flags irregularities. Alerts can be reviewed by the demand forecaster and overridden if necessary. In every run, the forecast takes into account all existing and new forecast settings, and only future periods are reforecast (SAS Institute Inc., 2014).
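In the case study, the historical dataset is extended in SAS Enterprise Guide. As an illustration only, the sketch below performs the equivalent left join in Python: each (SKU, week) record keeps its demand and receives one column per demand driver, with a default of 0 where no driver value exists. The column names (`sku`, `week`, `demand`, `promo`, `search_index`) and the data are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def extend_with_drivers(history: List[Record],
                        drivers: Dict[Tuple[str, str], Dict[str, float]],
                        driver_names: List[str],
                        default: float = 0) -> List[Record]:
    """Left join: add one column per demand driver to every (sku, week) record."""
    extended = []
    for row in history:
        values = drivers.get((row["sku"], row["week"]), {})
        new_row = dict(row)  # keep the original columns unchanged
        for name in driver_names:
            new_row[name] = values.get(name, default)  # missing driver -> default
        extended.append(new_row)
    return extended


history = [
    {"sku": "A", "week": "2016-W01", "demand": 120},
    {"sku": "A", "week": "2016-W02", "demand": 310},
]
drivers = {("A", "2016-W02"): {"promo": 1, "search_index": 75}}
extended = extend_with_drivers(history, drivers, ["promo", "search_index"])
for row in extended:
    print(row)
```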
Before the analysis with additional demand drivers is conducted, chapter 5 describes the existing forecasting process in more detail. The data provided by Multipharma and the product grouping are outlined in sections 5.1 and 5.2, respectively. Classifying the products into groups is required because different product classes call for different forecasting strategies. Forecast settings defined at group level are copied to all products belonging to the product group in question. The second step of the forecasting process consists of fitting the appropriate model to the various product classes. For each product group, all candidate models of SAS Forecast Studio are fitted to the time series. The best-performing model is automatically selected based on the chosen evaluation criterion (MAPE) and holdout sample. Afterwards, the model is used to extrapolate the time series into the future, thus creating a statistical forecast (step 3). However, the automated model selection will not be treated as a black box: the underlying assumptions of the selected model will be examined critically, and an alternative model will be proposed if it is not appropriate. Therefore, a separate section (Section 5.3) is devoted to model selection in SAS Forecast Studio, to explain the underlying principle. The theory behind the most likely models, ARIMA and Exponential Smoothing, is essential for understanding time series forecasting and is provided in Appendix B.
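The selection step can be illustrated with a small sketch: each candidate model is fitted on the training part of a series, forecasts the holdout sample, and the model with the lowest holdout MAPE is selected. The three candidates below (naive, 3-period moving average, simple exponential smoothing with a hypothetical `alpha`) are simple stand-ins chosen for brevity, not the ESM, ARIMA and IDM models that SAS Forecast Studio actually fits, and the example series is invented.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float], int], List[float]]


def naive(train: Sequence[float], h: int) -> List[float]:
    """Repeat the last observation."""
    return [float(train[-1])] * h


def moving_average(train: Sequence[float], h: int, window: int = 3) -> List[float]:
    """Repeat the mean of the last `window` observations."""
    return [sum(train[-window:]) / window] * h


def ses(train: Sequence[float], h: int, alpha: float = 0.3) -> List[float]:
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = float(train[0])
    for d in train[1:]:
        level = alpha * d + (1 - alpha) * level
    return [level] * h


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return 100 * sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)


def select_model(series: Sequence[float], holdout: int,
                 candidates: Dict[str, Model]) -> Tuple[str, Dict[str, float]]:
    """Fit each candidate on the training part and score it on the holdout sample."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mape(test, model(train, holdout)) for name, model in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


candidates: Dict[str, Model] = {"naive": naive, "moving_average": moving_average, "ses": ses}
series = [100, 104, 98, 101, 99, 103, 100, 97, 102, 100, 101, 99]
best, scores = select_model(series, holdout=4, candidates=candidates)
print(best, {name: round(score, 2) for name, score in scores.items()})
```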
Figure 4.2: Forecasting process of SAS Forecast Studio
The accuracy of the demand forecasts, in the form of a forecast error, can be extracted from SAS Forecast Studio. This is the component needed to re-evaluate the forecast accuracy in sections 6.2 and 6.3. The remainder of this chapter describes how the quality of the forecasting process is evaluated with the Mean Absolute Percentage Error (MAPE). As explained by Chopra and Meindl (2013), improving forecast accuracy goes hand in hand with reducing the forecast error. Every instance of demand has a random component. A good forecasting method should capture the systematic component of demand but not the random component (i.e. the forecast error).
[Figure 4.2 content: Forecasting automation. Step 1: Classify Products; Step 2: Apply Forecast Model; Step 3: Create Forecast; Step 4: Review Forecast]
The forecast error for period t is given by $E_t$, where the following holds:
$$E_t = F_t - D_t \tag{4.1}$$
The error in period t ($E_t$) is the difference between the forecast for period t ($F_t$) and the actual demand in period t ($D_t$). Many Key Performance Indicators (KPIs) express the quality of demand forecasts. MAPE is a well-known forecast quality KPI: it is the average absolute error as a percentage of demand,
$$\mathrm{MAPE} = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{E_t}{D_t}\right| = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{F_t - D_t}{D_t}\right| \tag{4.2}$$
with $D_t$ the actual demand in period t and $F_t$ the forecasted demand in period t. The absolute percentage error is summed over every forecasted point in time and divided by the number of fitted points N. Multiplying by 100 expresses the result as a percentage.
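Equations (4.1) and (4.2) translate directly into code. A minimal Python sketch with invented demand and forecast values; note that MAPE is undefined when the actual demand in a period is zero, which matters for intermittent series, so the function rejects such input instead of dividing by zero.

```python
from typing import List, Sequence


def forecast_errors(forecast: Sequence[float], demand: Sequence[float]) -> List[float]:
    """E_t = F_t - D_t for every period t (equation 4.1)."""
    return [f - d for f, d in zip(forecast, demand)]


def mape(forecast: Sequence[float], demand: Sequence[float]) -> float:
    """MAPE = 100/N * sum(|E_t / D_t|) (equation 4.2); requires D_t != 0."""
    if any(d == 0 for d in demand):
        raise ValueError("MAPE is undefined when actual demand is zero")
    errors = forecast_errors(forecast, demand)
    return 100 * sum(abs(e / d) for e, d in zip(errors, demand)) / len(demand)


demand = [100, 80, 120, 100]
forecast = [110, 80, 100, 105]
print(forecast_errors(forecast, demand))  # [10, 0, -20, 5]
print(round(mape(forecast, demand), 2))
```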
MAPE is the preferred KPI of forecast error when the underlying forecast has significant seasonality and demand varies considerably from one period to the next (Chopra & Meindl, 2013). MAPE is the most suitable KPI of forecast error for most of Multipharma's product classes. By comparing the MAPE before and after the additional data sources are included, it can be evaluated whether the demand forecast accuracy of certain product categories improves. In chapter 7, the benefits Multipharma obtains from the improved demand forecast will be investigated. The impact on the safety stock (Section 7.1) and the corresponding carrying cost (Section 7.2) will be quantified. It should be noted that these calculations are based on the literature in section 2.4, supplemented with continuous qualitative input from the Supply Chain department to align them with the company's way of working. Note also that the demand forecast will be executed on an aggregate dataset, i.e. a dataset containing the sum of the order quantities of all individual pharmacies and parapharmacies of Multipharma. Hence, the demand forecast and inventory level of each individual (para)pharmacy (i.e. at a lower level of the chain) will not be considered. This might be a topic for further research, as explained in chapter 9.
Chapter 5
Demand Forecast
5.1 Data description and operating assumptions
For this master dissertation, Multipharma provided aggregated data on the demand orders of its 270 pharmacies and 23 parapharmacies (iU stores), as well as data on all SKUs and information from its SAP system related to the operating environment. The latter includes SKU-level information on order processing, supplier lead times and the target service level.
The aggregated data is provided in the form of two SAS datasets. The first dataset contains the historical daily aggregated demand for every SKU from January 2, 2013 until November 13, 2016 or, for relatively new products, from the moment the product was introduced. This results in a dataset of 14,393,373 observations. The second dataset contains the historical weekly aggregated demand for every SKU from December 31, 2012 until March 13, 2017, with 2,032,331 observations. Moreover, this dataset contains a binary variable indicating whether a product is in the assortment, i.e. whether it is centralized or non-centralized (cf. Section 5.2.1). In most cases, a product is in the assortment from the moment it is introduced. In addition, the data contain four grouping variables for every SKU and every week. These variables are used to classify the products into different groups, as explained in detail in section 5.2.2. Besides the aggregated historical datasets, Multipharma also has a dataset describing all unique SKUs. It covers 12,578 distinct SKUs with a unique code, the name of the supplier, the product name in Dutch and French, whether the product has a basic and additional promotion and how large it is, a number of variables regarding the group levels (cf. Section 5.2.2) and the planner responsible for the product. Moreover, the dataset contains binary variables indicating whether the product is centralized and existing, which relate to the product classification described in section 5.2.1, as well as a number of other variables whose explanation is beyond the scope of this master dissertation.
5.2 Product Grouping
The rationale behind the grouping of the products is twofold. On the one hand, different products have different forecast purposes and require different forecasting strategies, so these product classes must be identified. On the other hand, it is known that forecast accuracy can be improved by producing a single forecast for a group of similar products and disaggregating it to the individual SKUs making up the group, instead of creating a separate forecast for every SKU. This is because many SKUs have sparse historical sales data. The key here is to define relevant groups of substitute goods. Forecast settings defined at product group level are copied to all products belonging to the product group in question. In dialogue with the Supply Chain Manager, D. Van Belle, the general product classification and the group levels were defined; they are explained in sections 5.2.1 and 5.2.2 below (D.V. Belle, personal communication, March 23, 2017). The manager's qualitative information is complemented with information from the websites¹⁵ of the WHO and the general pharmaceutical association (APB), which explain some classifications in more detail.
5.2.1 General product classification
A first logical split of products can be made by separating non-centralized products, centralized products and products in decentralization. The nature of the forecast is completely different for these groups. All non-centralized products are considered references outside the assortment. Automated forecasts for references outside the assortment are purely 'informative' and are not part of the operational forecasting framework. For products in decentralization, the purpose is to follow up their current and expected stock positions. When few new orders from pharmacists are expected, it can be decided to remove all remaining stock of the product in question from the central warehouse. In contrast, centralized products are kept in stock in the central warehouse and can be ordered directly from Multipharma by the pharmacists. Pharmacists' sales are the only source available to forecast demand for decentralized products. Demand forecasts for centralized products are based on the order history from pharmacists to the central warehouse. The purpose of the forecast for centralized products is to reduce uncertainty in the order generation process of the central warehouse. Being 'centralized' is a product characteristic that can change over time; e.g. sunscreens are centralized during summer but not during winter. At any given point in time, there are about 12,000 centralized items and 70,000 non-centralized items. Centralized products are split into new and existing products. Existing products are either permanently or non-permanently centralized. New products are divided into different subgroups (Figure 5.1).
15 Sources: http://www.who.int/en/ and http://www.apb.be/nl/corp/Pages/default.aspx
Figure 5.1: Overview product classification
[Figure 5.1 content: Non-centralized products; Products in decentralization; Centralized products, split into existing products (permanently centralized, non-permanently centralized) and new products (completely new products, successor products, new in category, limited edition, line extension)]
5.2.1.1 New products
A product receives the status of 'new product' as soon as it becomes centralized and no order history can be detected. The period during which a product is considered 'new' is controlled by the demand planner (an end date flags the end of the 'new' period). New products are the hardest to forecast, as no historical order data is available. Their forecast therefore relies mainly on the experience of the demand planner, who needs to assign an appropriate forecast strategy to these new products. We differentiate between the following five classes of new products, each requiring a different forecast strategy.
Figure 5.2: Characteristics of five types of new products
• Completely new product: revolutionary or niche products; no reference products; forecast based on human judgement and experience.
• Successor product: new package, different formula or composition; updated version of an existing product (predecessor/reference product); forecast based on the historical order data of the reference product, which disappears.
• New in category: new generic entering the market; possible cannibalization effect and market share redistribution within the category; forecast uses a similar existing product as reference product.
• Limited edition: promo items (e.g. price reduction, giveaway or extra volume); reference product does not disappear; only for a certain time period (fixed promo period or while stocks last); 100% cannibalization effect on the reference product during the specified period.
• Line extension: another packaging unit; reference product does not disappear; change in order behaviour and possible cannibalization of the 'original' product; forecast uses the existing 'original' product as reference product.
5.2.1.2 Existing products
All centralized products that do not have the status of 'new product' are considered 'existing products'. Order demand for existing products can be forecast by looking at their historical order patterns. The more stable the historical patterns and the higher the past order volumes, the better the forecast. When there is a large number of series to be forecast, as is the case for Multipharma (11,067 series), choosing an appropriate forecasting method for each series has the potential for major cost savings through improved accuracy (Fildes, 1989). In an aggregate forecast, a single method is applied to all the time series of a particular class, and the aggregate forecast is afterwards broken down to the individual level. The proportions for this breakdown can be specified by first estimating a base forecast at SKU level. In contrast, individual forecasting identifies a particular method appropriate for each series. Disaggregated data are often noisier than the aggregates constructed from them, which makes these series harder to forecast. This is in line with the second law of forecasting, which states that detailed forecasts are worse than aggregate forecasts (Hopp & Spearman, 2008). On the other hand, aggregated forecasting may lose information: it can distort trend, seasonality and other individual product characteristics. For some products, the individual order level will be good enough to produce an accurate forecast, but most products will profit from hierarchical or grouped forecasting. Aggregating similar products almost always improves the ability to model and predict trend and seasonality. Which method performs better for a given SKU depends on the type of product, its lifetime, its relation to other products, etc. Therefore, Multipharma's forecasting method executes both aggregated and individual forecasts for each SKU. For the aggregated forecast, no single product classifier divides all products into exhaustive groups of sufficiently similar products. Conversely, a combination of product classifiers often results in too many small groups that do not cover all similar products. Therefore, four grouped forecasts are made, which are then disaggregated to the SKU level. The products are aggregated according to three general classification systems used in the pharmaceutical industry (IMS, DCI and APB) and one classification defined specifically by Multipharma (GSTAT). Each classification divides the products into different categories. Hence, four forecasts are obtained for each product by breaking down the aggregated forecast of the category to which it belongs in each classification. Moreover, an individual forecast is executed for every SKU. The best-performing of these five forecasts is used to generate the actual forecast for the SKU. The following section describes the four classifications in more detail.
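The aggregate-then-disaggregate logic and the choice among the five candidate forecasts can be sketched as follows. For simplicity, the proportions are taken as each SKU's share of the group's historical demand; this is an assumption, since the text above specifies the proportions via a base forecast at SKU level. The SKU codes and numbers are invented, and in practice the candidates would be scored on a holdout sample.

```python
from typing import Dict, List, Sequence


def historical_shares(history: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Share of each SKU in the group's total historical demand."""
    total = sum(sum(series) for series in history.values())
    return {sku: sum(series) / total for sku, series in history.items()}


def disaggregate(group_forecast: Sequence[float],
                 shares: Dict[str, float]) -> Dict[str, List[float]]:
    """Break a group-level forecast down to SKU level with fixed proportions."""
    return {sku: [share * f for f in group_forecast] for sku, share in shares.items()}


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return 100 * sum(abs((f - a) / a) for a, f in zip(actual, forecast)) / len(actual)


def choose_best(candidates: Dict[str, Sequence[float]], actual: Sequence[float]) -> str:
    """Return the name of the candidate forecast with the lowest MAPE."""
    return min(candidates, key=lambda name: mape(actual, candidates[name]))


# Group-level forecast for one (invented) category, broken down to its two SKUs.
shares = historical_shares({"SKU1": [30, 30], "SKU2": [10, 10]})
sku_forecasts = disaggregate([40, 44], shares)
print(sku_forecasts)  # {'SKU1': [30.0, 33.0], 'SKU2': [10.0, 11.0]}

# Five candidate forecasts for SKU1: four grouped (one per classification) and one individual.
candidates = {
    "IMS": sku_forecasts["SKU1"],
    "DCI": [25.0, 27.0],
    "APB": [35.0, 36.0],
    "GSTAT": [28.0, 30.0],
    "individual": [31.0, 31.0],
}
print(choose_best(candidates, actual=[30, 33]))  # IMS
```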
Figure 5.3: Hierarchical breakdown of disaggregated data
5.2.2 Group levels
5.2.2.1 IMS Classification
The Uniform System of Classification (USC) is a categorization system developed by IMS to meet the need for a therapeutic classification of pharmaceutical products. It is widely accepted in North America as the standard for the classification of pharmaceutical products. Grouping pharmaceutical products logically according to this system makes it easier to identify products competing in the same or similar markets. The IMS Classification consists of 1 top level and 3 or 4 sublevels (depending on the product). The sublevels are the attributes considered when deciding on the placement of a product or the creation of a new USC category (Alvarez, 2015). There are approximately 2,400 unique combinations. The explanation of the sublevels is beyond the scope of this master dissertation. A shortcoming of the USC classification is that a product can only be classified once it has a sales registration. Since no sales history is available for new products, they are classified in the last category, 'Other'. Another shortcoming is that 5 to 10 per cent of centralized SKUs do not have an IMS classification.
5.2.2.2 INN/DCI Classification
The International Nonproprietary Names (INN) classification gives each pharmaceutical substance or active ingredient an official, universal and unique name. The existence of an international nomenclature for pharmaceutical substances, in the form of INN, makes communication and the exchange of information more efficient and convenient among health professionals and scientists worldwide. This benefits the clear identification, safe prescription and dispensing of medicines to patients. Substances belonging to the same group have similar pharmacological activity: through their stems, the generic names indicate the drug class to which a drug belongs. There are about 1,600 unique combinations (stems), which are not presented in this master dissertation. Nevertheless, it should be kept in mind that this is a relevant classification system for the products of Multipharma. A shortcoming is that it is unavailable for the majority of products: 60 to 70 per cent of centralized SKUs do not have a DCI classification.
5.2.2.3 APB Classification
The “Algemene Pharmaceutische Bond” (APB) is a federation of mainly independent pharmacists in Belgium. It has created a legal classification with 18 unique combinations, based on the national codification (CNK). This codification is applied to all medicinal and pharmaceutical products (medical devices, biocidal products, food supplements, cosmetics, etc.) delivered in the pharmacy, for human, veterinary and phyto-pharmaceutical use. Unlike the IMS classification, it does not depend on any sales registration; it should be available one month after product creation.
5.2.2.4 GSTAT Classification
The “Groupe Statistique” (GSTAT) is a classification method drafted by Multipharma to assign its own classification to its products. There are approximately 10 unique combinations. A lower sublevel of this classification exists, but this dissertation sticks to the more general level.
The IMS top level, APB and GSTAT classifications can be found in Appendix A. A recent SAS Forecast Studio run on the current dataset reports the forecast method used for each product, i.e. the forecast technique that achieves the most accurate result for that specific product. The distribution of the different methods is represented in figure 5.4. About 62 per cent of the products use an aggregated forecast. The other 38 per cent benefit from an individual forecast because they have sufficient historical information.
Figure 5.4: Distribution of products over the different forecasting methods
Source: Dataset Multipharma
5.3 Model selection
SAS Forecast Studio supports forecasters who need to move through the production forecasting process quickly (Wolfe, Leonard, & Fahey, n.d.). With the automated forecasting procedure, Multipharma is able to generate statistical forecasts for all time series. To generate these forecasts, SAS Forecast Studio must first determine an appropriate model for each time series. The SAS Forecast Studio user's guide (SAS Institute Inc., 2014) explains how the program selects the appropriate models given the input data. SAS Forecast Studio runs a series of diagnostics to determine the characteristics of the data (such as seasonality or intermittency) and avoids models that are inappropriate for the data. If the diagnostics determine that a series is intermittent, continuous time series models such as Autoregressive Integrated Moving Average (ARIMA), Exponential Smoothing (ESM) or Unobserved Components Models (UCM) cannot be used, and SAS Forecast Studio excludes them. Conversely, if the diagnostics determine that a series is continuous, Intermittent Demand Models (IDM)¹⁶ cannot be used, and SAS Forecast Studio excludes those instead. When SAS Forecast Studio diagnoses a project, it attempts to fit all the models in the model selection list. By default, the model selection list consists of ESM, ARIMA and IDM. External models can be added to the list. In practice, ARIMA and ESM models seem to work well for the existing and newly created datasets of Multipharma.
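The diagnostic split between intermittent and continuous series can be illustrated with a deliberately simple rule: a series counts as intermittent when the share of zero-demand periods exceeds a threshold. Both the rule and the threshold of 0.3 are hypothetical; SAS Forecast Studio's own diagnostics are more elaborate. For intermittent series, the sketch uses Croston's method, a standard intermittent-demand technique that smooths the non-zero demand sizes and the intervals between them separately (the smoothing constant `alpha` is a hypothetical choice).

```python
from typing import Sequence


def is_intermittent(series: Sequence[float], max_zero_share: float = 0.3) -> bool:
    """Hypothetical diagnostic: too many zero-demand periods -> intermittent."""
    zero_share = sum(1 for d in series if d == 0) / len(series)
    return zero_share > max_zero_share


def croston(series: Sequence[float], alpha: float = 0.1) -> float:
    """Croston's method: per-period demand rate = smoothed size / smoothed interval."""
    size = interval = None
    periods_since_demand = 1
    for d in series:
        if d > 0:
            if size is None:  # first non-zero demand initialises both estimates
                size, interval = float(d), float(periods_since_demand)
            else:
                size = alpha * d + (1 - alpha) * size
                interval = alpha * periods_since_demand + (1 - alpha) * interval
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    return 0.0 if size is None else size / interval


orders = [0, 0, 6, 0, 0, 6, 0, 0, 6]
if is_intermittent(orders):
    print(croston(orders))  # about 2.0: a demand of 6 roughly every 3 periods
```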
16 IDM are used for time series that have a large number of values that are zero or another constant value. Intermittent time series occur when the demand for an item is intermittent. Because many time series models are based on weighted summations of past values, they bias the forecast toward zero. Therefore, these models will not work for intermittent time series data (SAS Institute Inc., 2014).
By default, SAS Forecast Studio chooses the best-performing model in the model selection list as the forecast model. The best-performing model is determined by the chosen holdout sample and selection criterion. A preferred and common selection criterion in business forecasting is the MAPE, as explained in chapter 4. The MAPE is the average of all individual absolute percentage errors. When MAPE is the selection criterion, the model with the smallest MAPE value is the best-performing model. It should be noted that the 'best-performing' model according to SAS Forecast Studio is sometimes not the one the Supply Chain department had in mind. However, Multipharma may not have time to review all of its 11,067 automated forecasts. To efficiently select the forecasts with an inaccurate model fit, it needs to identify and address exception forecasts first (Wolfe et al., n.d.). Based on the distribution of the MAPE over all time series, Multipharma may identify a threshold above which the MAPE is too high and hence the model may be inaccurate¹⁷. When Multipharma sets the MAPE threshold at a reasonable 160, only 1.7 per cent of the products exceed it and need to be reviewed.¹⁸ Using domain expertise might reveal why some models are not appropriate. In that case, the automatically generated statistical forecast can be overridden with a user forecast. When forecasts are created in this case study, the products with a MAPE above a certain threshold will be discussed with the Supply Chain department, which will choose a more appropriate forecast model if necessary. Another way to determine whether a model fits the data well is to plot the prediction error autocorrelation function (ACF) and the prediction error partial autocorrelation function (PACF). Figures 5.5 and 5.6 show the prediction error ACF and PACF for a randomly selected product for which SAS Forecast Studio automatically selected an ARIMA model.
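The exception-based review described above amounts to filtering the series on a MAPE threshold and reviewing the worst ones first. A minimal sketch with invented SKU codes and MAPE values, using the threshold of 160 mentioned in the text:

```python
from typing import Dict, List, Tuple


def exception_forecasts(mape_by_sku: Dict[str, float],
                        threshold: float = 160.0) -> List[Tuple[str, float]]:
    """SKUs whose MAPE exceeds the threshold, worst first, for manual review."""
    flagged = [(sku, m) for sku, m in mape_by_sku.items() if m > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def share_flagged(mape_by_sku: Dict[str, float], threshold: float = 160.0) -> float:
    """Percentage of series that would be sent for review."""
    return 100 * len(exception_forecasts(mape_by_sku, threshold)) / len(mape_by_sku)


mape_by_sku = {"SKU1": 12.5, "SKU2": 240.0, "SKU3": 170.0, "SKU4": 35.0}
print(exception_forecasts(mape_by_sku))  # [('SKU2', 240.0), ('SKU3', 170.0)]
print(share_flagged(mape_by_sku))        # 50.0
```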
Figure 5.5: Prediction error ACF of randomly selected SKU
17 Note that an inaccurate model fit is not the only cause of a high MAPE. Most of the time, the data do not contain enough information to explain the variability and produce an accurate forecast, in which case changing the model might not help.
18 Note that SAS Forecast Studio also produces 'alerts' when the generated forecasts differ significantly from the observed historical sales, which Multipharma can act upon.
Figure 5.6: Prediction error PACF of randomly selected SKU
The prediction error ACF and PACF of the fitted model show that the autocorrelations at all lags lie within the confidence bounds of two standard errors. This means that all autocorrelations of the residual series are non-significant, which is a necessary condition for a model that fits the data well. Hence, the graphs suggest that the fitted model fits the data well.
Appendix B discusses theoretically how the forecast is performed with an Exponential Smoothing model and an ARIMA model, because these are the most widely adopted models for the time series forecasts of Multipharma. What follows describes the basic demand forecasting principles that will be used for products within the assortment (existing or new products) of Multipharma (cf. Section 5.2.1). This corresponds to all centralized products and products in decentralization as described in the product classification.
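The residual check can be approximated in a few lines. The sketch below computes the sample autocorrelations of a residual series and reports the lags that fall outside the common large-sample bounds of plus or minus 2/sqrt(n). It is only an approximation of what SAS Forecast Studio plots: its confidence bands may be computed differently, the PACF is not reproduced, and the function names are hypothetical.

```python
import math


def acf(residuals, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)  # assumes the series is not constant
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def significant_lags(residuals, max_lag):
    """Lags whose autocorrelation lies outside approx. +/- 2 / sqrt(n)."""
    bound = 2.0 / math.sqrt(len(residuals))
    return [k for k, r in enumerate(acf(residuals, max_lag), start=1)
            if abs(r) > bound]


# A residual series with a leftover 4-week pattern is clearly not white noise.
print(significant_lags([1.0, 1.0, -1.0, -1.0] * 10, max_lag=4))
```

An empty list would be consistent with a well-fitting model; in the example the remaining periodic pattern shows up at lags 2 and 4.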
5.4 Create forecast
The current forecasting process is mainly based on the historical order dataset (cf. Section 5.1). The higher the quality of the historical data, the more reliable the forecast. Hence, past observations are used as a basis to predict the future, and a suitable time series forecast is of the form
$$F_{t+1} = f\left(D_t, D_{t-1}, D_{t-2}, \ldots\right) \tag{5.1}$$
where F is the weekly forecasted order demand, D is the historical aggregated weekly demand of products from all pharmacies and parapharmacies, t is the current week, t+1 is the next week, etc.
However, Multipharma quickly realized that this basic approach had its limitations. First, with regard to stockouts, Multipharma chooses not to keep backorders, as these create administrative costs and confusion. Sometimes the product is available the day after; however, sometimes the product is out of stock for six months and pharmacies will look for alternative solutions. In the latter case, backorders might no longer be relevant when the product becomes available again. Because backorders are not kept, some pharmacists consider unfulfilled orders as ‘lost’ and keep placing ‘the same’ orders until the product is in stock. These recurring orders might decrease the quality of the data. Hence, the data during a timeframe in which the product is no longer available at the central warehouse can be interpreted in two ways.
On the one hand, pharmacists keep placing ‘the same’ orders because they consider unfulfilled orders as ‘lost’. The orders placed are then a multiple of the orders that would have been placed if the product had not been missing, so order data during such a period overestimates actual demand. Data for these periods should not be used as a history basis, but could be an indirect indicator of demand peaks as soon as the product becomes available again.
On the other hand, pharmacists may place orders as they would if the product were available, in which case the orders do reflect actual demand.
To solve this problem, information about orders leaving the central warehouse can be used to correct the order data in the first case. If the outbound drops to zero, the product is missing, and the corresponding out-of-stock period is used to adjust the order data where demand might be overestimated. Figure 5.7, obtained from the blueprint information of Multipharma, indicates how the pharmacists’ sales data can be used to predict central warehouse orders. Note that pharmacists’ historical sales data will be excluded from the moment the outbound of the central warehouse drops to zero, in order to obtain more ‘realistic’ data by filtering out the uncommon out-of-stock data. To implement this analytically, an additional ‘event flag’ needs to be created in SAS Enterprise Guide, which indicates when a product is in an out-of-stock situation, i.e. when the outbound drops to zero. This ‘event flag’ recognizes when the order data overestimates actual demand and automatically adjusts the order data in order to minimize the forecasting error for the future.
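A minimal Python sketch of the out-of-stock ‘event flag’, assuming weekly lists of order quantities and central warehouse outbound quantities aligned by week. Weeks with zero outbound are flagged, and their order quantities are marked as missing (`None`) so that a forecasting routine can skip them. The actual implementation is written in SAS Enterprise Guide; the function names and data layout here are hypothetical.

```python
def out_of_stock_flags(outbound):
    """Event flag: 1 for weeks in which warehouse outbound drops to zero, else 0."""
    return [1 if qty == 0 else 0 for qty in outbound]


def adjust_orders(orders, outbound):
    """Mark orders in out-of-stock weeks as missing (None) so they are excluded."""
    flags = out_of_stock_flags(outbound)
    return [None if flag else qty for qty, flag in zip(orders, flags)]


orders = [100, 120, 300, 280, 110]      # repeated orders inflate weeks 3 and 4
outbound = [100, 115, 0, 0, 105]        # nothing leaves the warehouse in weeks 3 and 4
print(adjust_orders(orders, outbound))  # [100, 120, None, None, 110]
```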
Figure 5.7: Forecast with out-of-stock period
Source: Provideor © All rights reserved
The second limitation of the basic approach (Equation 5.1) concerns modelling promotional events. As described in Section 2.3, a company clearly benefits from modelling promotional events. Multipharma recognizes this need and models a promotion using an ‘event flag’. Past promotions are recognized based on the ‘order type’, and future promotions are based on the SAP agenda imported into SAS. The agenda provides the start and end date of each promotion. Based on the agenda, the ‘event flag’ recognizes when a peak in demand due to a promotion occurs and automatically returns to normal demand thereafter. In contrast to out-of-stock events, the data during a past promotion is not filtered out of the historical dataset. Moreover, no distinction is made between the different types of promotions (Passage délégué, Promo O and Action iU), because there was not yet enough data available to distinguish them. This will be the subject of the following chapter, where a new way of handling promotions when forecasting with time series will be introduced.
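In contrast to the out-of-stock flag, the promotion flag only marks weeks and leaves the order data in place. The sketch below derives a weekly 0/1 flag from agenda entries given as (start date, end date) pairs, assuming that a week counts as promotional when it overlaps an agenda period by at least one day. The function name, the weekly grid and the overlap rule are assumptions for illustration; the example period is taken from one of the promotional periods discussed later in this dissertation.

```python
from datetime import date, timedelta


def promotion_flags(week_starts, agenda):
    """1 if the 7-day week beginning at each date overlaps an agenda period."""
    flags = []
    for week_start in week_starts:
        week_end = week_start + timedelta(days=6)
        active = any(start <= week_end and end >= week_start
                     for start, end in agenda)
        flags.append(1 if active else 0)
    return flags


agenda = [(date(2016, 2, 29), date(2016, 3, 30))]
weeks = [date(2016, 2, 22), date(2016, 2, 29), date(2016, 3, 28), date(2016, 4, 4)]
print(promotion_flags(weeks, agenda))  # [0, 1, 1, 0]
```

The resulting flag can be passed to the forecasting step as an event variable, while the order quantities of the flagged weeks stay in the history.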
The base table from SAS Enterprise Guide, which consists of the historical, out-of-stock and promotional information of all 11,067 centralized SKUs, will be used to create a forecast in SAS Forecast Studio based on the classifications explained in Section 5.2.2. The result obtained automatically in SAS Forecast Studio is a weighted19 MAPE of 53.23 on the SKU level. The distribution of the MAPE is represented in Figure 5.8. This figure illustrates that about 95 per cent of the products have a MAPE lower than 100. However, for some products the MAPE is even higher than 200. These products have a poor forecast accuracy, which has a negative impact on the overall MAPE. The majority of the models used to execute the forecast are ARIMA models and exponential smoothing models. Only a small fraction of the products (1%) use intermittent demand models (IDM). The model distribution is represented in Figure 5.9.
19 The weight for a series is the sum of its values over its entire historical period of N observations. This is calculated as N*MEAN from the series properties. A more detailed explanation of how to obtain the overall weighted MAPE manually is provided in Appendix C.
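Footnote 19 suggests that each SKU's MAPE is weighted by the total of its history (N times the series mean). Under that reading, and without reproducing the exact procedure of Appendix C, the overall weighted MAPE could be computed as in the sketch below; the function name and input layout are hypothetical.

```python
def weighted_mape(per_sku):
    """Weighted MAPE over SKUs.

    per_sku: list of (history, mape) pairs, where history holds the SKU's
    historical quantities. Each weight is the sum of the history (= N * mean).
    """
    weights = [sum(history) for history, _ in per_sku]
    total = sum(weights)
    return sum(w * m for w, (_, m) in zip(weights, per_sku)) / total


skus = [
    ([10, 10, 10], 20.0),     # small SKU, total 30
    ([100, 100, 100], 50.0),  # large SKU, total 300, dominates the average
]
print(round(weighted_mape(skus), 2))  # 47.27
```

The design choice matters: a high MAPE on a slow-moving SKU barely moves the weighted figure, whereas the unweighted mean of the two MAPEs would be 35.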
Figure 5.8: Distribution MAPE with baseline model forecast
Figure 5.9: Distribution of model type of all SKUs with baseline model forecast
Chapter 6
Demand Drivers
Demand drivers play an important role in raising the quality of a forecast. Predictor variables, or demand drivers, can extend the historical information in time series forecasting, e.g. when forecasting with ARIMAX models. Internal and external data of Multipharma can be used as demand drivers in the forecasting process and to model obvious events. As described in Section 5.4, two events based on internal data sources (i.e. out-of-stock and promotion) are at present included in the baseline model, and the base forecast is now of the form
$$F_{t+1} = f\left(D_t, D_{t-1}, D_{t-2}, \ldots, \text{out-of-stock}, \text{promotion}\right) + \varepsilon \tag{6.1}$$
where F is the weekly forecasted order demand, D is the historical aggregated weekly demand of products from all pharmacies and parapharmacies, out-of-stock and promotion are two events, ε is the error term, t is the current week, t+1 is the next week, etc. This model tries to explain what causes variation in the demand for pharmaceutical and parapharmaceutical products. However, there will always be changes in demand that cannot be accounted for by these demand drivers, which is why an error term is included, allowing for random variations and the effects of relevant variables not (yet) included in the model. This corresponds to the first law of forecasting according to Hopp and Spearman (2008), which states that ‘forecasts are always wrong’. Nevertheless, the explanatory model is very useful because it incorporates information about other variables rather than only the historical order quantities to be forecasted. The larger the error term, the more room for improvement. Business insight might reveal additional demand drivers, which can be modelled using internal and external data sources.
The aim of this chapter is to validate the hypothesis that big data analytics can offer a significant benefit for Multipharma. Therefore, it will be investigated whether the forecast error decreases when additional internal and external data sources are used (cf. Chapter 5).
6.1 Setting priorities
Multipharma possesses a lot of semi-structured information in the form of Excel files issued by the Sales, Marketing and Financial departments, etc. In addition, Multipharma maintains a continuous flow of structured information from its SAP ERP system. Together with the Supply Chain department, a couple of concerns with existing forecast series were identified (D.V. Belle, personal communication, March 30, 2017). As already mentioned in the literature (cf. Section 2.1), business knowledge is indispensable to understand the data and identify potential problems. The communication with the Supply Chain department of Multipharma was crucial to understand why some products’ forecasts generated by SAS Forecast Studio were not as expected. Together with the Supply Chain department, priorities were set to improve the forecast of some products. The prioritization was based on rules and experience, which taught the Supply Chain department which demand drivers could have a big impact on the two classes of the IMS classification where they faced the largest difficulties: OTC (Over-The-Counter) and PEC (parapharmacy). The first priority demand drivers have the biggest impact on these classes because these ‘problems’ are currently corrected manually, which takes a lot of time and effort. The orders that SAS Forecast Studio proposes for these products have a MAPE that is too high, and correcting these order proposals manually has a big impact on the workload. Therefore, it would be interesting to include additional information in the model on an automated basis. Figure 6.1 provides a complete overview of all new possible demand drivers from a business point of view.
Figure 6.1: Priority schedule
Priority 1: Passage Délégué, Action iU, Promo O, Substitution
Priority 2: OOS cannibalization, Seasonality, Holiday calendar
Priority 3: Weather, Google Trends data, Healthcare policy, Pricing policy, Commercial policy, Patent loss, Changes in packaging, National campaigns, Duo vs Mono, Changes in the number of PoS and SKUs
The first priority is to model the events of promotional actions (‘Passage délégué’, ‘Promo O’, ‘Action iU’ and ‘Substitution’), in order to filter out the uncommon, volatile data that accompany these events and thereby reduce the error of future forecasts. Note that this technique differs from, and is expected to be more accurate than, the traditional way of modelling promotions (cf. Section 5.4). Section 6.2 will investigate whether this new, more detailed technique of modelling promotions is superior to the traditional way of forecasting, which uses a more general technique of modelling promotions. First, note that modelling substitution is a major concern for Multipharma because there is striking evidence of problems with the forecast of substitution products. When a new promotional item replaces the base SKU, ideally there is a 100 per cent switch to the promotional item. When the promotional product is out of stock or the promotion is terminated, there is a switch back to the base SKU. Figure 6.2 gives an example of substitution for two products with three promotional periods, in which the daily demand of the base product (blue line) drops to zero during a promotion (orange line).
Yet SAS Forecast Studio does not capture this obvious relationship, due to the lack of some crucial information. Improving the forecast of this kind of product by inserting new information on a regular basis is therefore an interesting topic. However, to date there is no existing link, neither in the SAP system nor in the Excel files, between the base products and their substitution products. Hence, it is impossible to relate the historical and future demand of both products in SAS without automatic input data.
Figure 6.2: Substitution between two products
In contrast, for the remaining three types of promotions (‘Passage délégué’, ‘Promo O’ and ‘Action iU’), the necessary information is available in the form of semi-structured Excel files; these promotions are the subject of Section 6.2. Each supplier delivers some crucial information about its promotions to Multipharma. These files have some information in common: they all contain the brand name, the promotional period, the CNK code and label of the product, whether it is a new product, whether it is stored in a pharmacy and/or parapharmacy, and some information about the size of the promotion. Figure 6.3 is an example of such an Excel file for ‘passage délégué’ promotions of the brand ‘Vistalife’. Note that the real base and ‘passage délégué’ discount percentages are replaced by ‘X%’ for confidentiality reasons. The information most needed to model the promotional events is indicated in red. The start and end dates of the promotion will be translated into SAS code to create the promotional event, which is added to the baseline model based on the CNK code; this code corresponds to a unique SAP code used in SAS to identify unique products.
Figure 6.3: Example Excel file of supplier
Going back to Figure 6.1, the demand driver ‘cannibalization’, which belongs to the second priority, is an extension of substitution. Cannibalization occurs when a product that is out of stock drives the sales of another product. The same issue arises as with substitution products: there is not yet an explicit link between the products. Multipharma is discussing internally what kind of data it will need to capture in the future to solve this problem. In addition, the second priority demand drivers relate to seasonal products and the holiday calendar. Although seasonality can be captured with SAS Forecast Studio, some error is still present in the forecast, for example for flu-related medicines, sunscreens, mosquito products and insecticides. Therefore, Section 6.3 focuses on improving the forecast accuracy of seasonal products with external data sources, namely weather conditions and Google Trends data, which belong to the third priority demand drivers.
The remaining demand drivers (i.e. the annual holiday calendar and the other demand drivers of priority 3) are expected to have an impact on the demand for pharmaceutical products. As an example, the healthcare policy (i.e. the reimbursement by RIZIV20) is a driver to prescribe or deliver certain products to pharmacies, because Multipharma is always obliged to propose the ‘cheapest’ products. Another example relates to products with no fixed price (OTC and parapharmacy), for which Multipharma applies its own pricing policy, which may influence the way pharmacists propose certain products to the end consumer. Moreover, the pricing and commercial policy of Multipharma may depend on the pricing and promotions of competitors. However, this data is difficult to obtain, and as a consequence it will be challenging to model these demand drivers. On the other hand, some demand drivers, such as the packaging of products, can be modelled easily because the information is relatively easy to obtain. For example, boxes of Dafalgan that contain twice the amount of medication of the original packaging form a new SKU, which might have an impact on the SKU with the original packaging. In short, some priority 3 demand drivers will be worth capturing, monitoring and analyzing, but this is out of the scope of this master dissertation and might be a subject for further research.
The following section explains the meaning21 of the first priority promotional events in the environment of Multipharma and examines whether the new way of modelling different types of promotions improves the demand forecast accuracy compared to the base forecast described in Section 5.4. Note that these promotions are trade promotions offered by the suppliers, which result in forward buying by Multipharma and the pharmacies. Forward buying results in large orders during the promotion period followed by very small orders thereafter, and thus does not increase the supply chain’s revenue (Chopra & Meindl, 2013). Moreover, pharmacists’ order variability will be much larger than customers’ order variability, because the promotions are not passed through to the customers. The second part of this chapter describes the analysis of the second priority demand driver, seasonality. It will be investigated whether the demand forecast accuracy of seasonal products (flu-related medicines, sunscreens, mosquito products and insecticides) improves when external data sources are used, namely weather conditions and Google Trends data.
In order to do a proper analysis, the data is divided into a training sample and a holdout sample. In SAS Forecast Studio, the MAPE of a selection of products is weighted: it is based on the importance of each SKU in the overall selection of products. Moreover, the weighted MAPE is calculated based on the historical period and a holdout sample of data at the end of each time series that was not used to construct the models. Using a holdout sample to judge accuracy is often referred to as an honest assessment, because it simulates fitting and deploying a model and then judging its accuracy in a live environment. In this case study, the holdout sample is chosen to be three months, since this corresponds to the normal forecast horizon.
20 The National Institute for Health and Disability Insurance in Belgium.
21 Source: Supply Chain department, personal communication, March 26, 2017.
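The honest assessment can be illustrated with a short sketch: the last 13 weeks (roughly three months) are held out, a model is fitted on the remaining history only, and the MAPE is computed on the held-out weeks. The naive last-value forecast used here is merely a stand-in for whatever model SAS Forecast Studio selects, and the function names are hypothetical.

```python
def holdout_split(series, holdout=13):
    """Split a weekly series into a fitting part and the last `holdout` weeks."""
    return series[:-holdout], series[-holdout:]


def naive_forecast(train, horizon):
    """Placeholder model: repeat the last observed value."""
    return [train[-1]] * horizon


def holdout_mape(series, holdout=13, model=naive_forecast):
    """MAPE (in per cent) on the holdout only; the model never sees those weeks."""
    train, test = holdout_split(series, holdout)
    forecast = model(train, len(test))
    terms = [abs(a - f) / abs(a) for a, f in zip(test, forecast) if a != 0]
    return 100.0 * sum(terms) / len(terms)


print(holdout_mape([10, 10, 10, 20, 5], holdout=2))  # 75.0
```

Any other forecasting function with the same `(train, horizon)` signature can be passed as `model`, so competing models are judged on exactly the same unseen weeks.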
6.2 Internal data
6.2.1 Passage Délégué
‘Passage délégué’ is a term that describes a visit by a commercial representative of a product supplier or a specialized company22 to a POS of Multipharma. The goal of the commercial representative is to explain the products, the current commercial conditions and promotions (discounts, free samples, visual displays and gifts) and eventually to propose placing an order via the commercial representative or via Multipharma’s DC, depending on previous negotiations with Multipharma. In both cases, the Key Account Manager of a supplier proposes a list of products and commercial conditions to the Category Manager at Multipharma. The Category Manager validates the products and the period of ‘passage délégué’ during which the commercial representatives can visit the POS and present the selected products. Based on the Excel list of approved products, Category Manager Assistants update the commercial conditions in the ERP system by including the discount for the validated period of ‘passage délégué’. They communicate this agreement to all POS of Multipharma and to the internal departments. During the period of ‘passage délégué’, the supplier plans the visits of its commercial representatives. As this process is managed by the supplier itself and possibly concerns a large number of POS, it is impossible to create an adequate forecast of the day, week or month in which suppliers are going to visit the pharmacies, or of the precise impact on demand, i.e. the height of the promotional peak. Note that although it creates a peak in demand, a ‘passage délégué’ most of the time does not increase the sales revenue for Multipharma, because the pharmacists do not pass the promotion through to the customers. Multipharma experiences a cannibalization of its sales when the discount disappears, because pharmacists have increased their stock during the ‘passage délégué’ period and continue to serve their customers from this stock in the period thereafter.
22 Sometimes activities of a supplier (e.g. Omega Pharma) are outsourced to specialized companies.
However, Multipharma knows the period in which a ‘passage délégué’ might occur for certain products, which products will be discounted and the amount of the discount. This is an interesting source of internal data in the form of semi-structured Excel files (cf. Figure 6.3). According to the Supply Chain department, the aim is not to predict exactly when a peak might occur or how high the peak due to the promotional event will be, because this depends on the agenda and conditions of the supplier. Instead, a business rule for inventory management can be applied during the period of ‘passage délégué’. The business rule specifies how much extra stock needs to be ordered, and thus how the orders towards suppliers need to be adapted with regard to the additional discounts applicable during ‘passage délégué’. The purpose of using the internal Excel data sheets is rather to model the promotional events based on the ‘passage délégué’ periods and to exclude the peaks in historical order quantities caused by these promotional events. Hence, the aim is to reduce the variability of future forecasts when no ‘passage délégué’ is going on. This is in line with a study by Knuth et al. (2014) stating that outliers can affect and skew forecast accuracy, and that it might therefore be useful to exclude them from the overall forecasting calculations.
Hence, ‘passage délégué’ promotions create variability in the product forecast due to the changing order behavior of pharmacists, who might order more than normal in order to benefit from the commercial conditions. This behavior of pharmacists during the promotional periods is not an accurate predictor of the future and should be excluded from the historical data to obtain a more accurate base forecast. Figure 6.4 represents the procedure followed for the analysis of ‘passage délégué’ promotions; this reasoning can be extended to ‘Promo O’ and ‘Action iU’ promotions, which are explained thereafter.
Figure 6.4: Followed procedure of analysis
1. Visually analyze the effect of a ‘passage délégué’
2. Understand the forecast mathematically
3. Select all ‘passage délégué’ products from SAP
4. SAS coding in SAS Enterprise Guide: creating promotional events and excluding the volatile order data
5. Compare the accuracy of the base forecast with the new forecast using SAS Forecast Studio
6. Discuss the results with the Supply Chain department
Figure 6.5 represents the time series (2015-2016) of the product ‘Vicks: Vaporub 100g’ with ‘passage délégué’ promotions in its history during two periods, indicated in red on the horizontal axis. The graph shows that the first week following the ‘passage délégué’ period of the commercial representative has a clear peak of 2,961 SKUs. In contrast, the second period does not show an increase in demand due to the visit of the commercial representative. This may be because the pharmacists do not want to keep much extra stock of this product, perhaps because they expect the product to sell less during this time of the year, or because they still have a significant amount of inventory left from the previous promotion. Figure 6.6 represents the historical order quantities of the SKU excluding the demand during ‘passage délégué’ periods. Focusing on the y-axis, it can be seen that demand is more level than in the previous time series. This visual insight suggests that excluding the peak from the history might reduce the variability of future forecasts.
Period 'Passage Délégué'
02/11 - 30/11/2015
29/02 - 30/03/2016
Table 6.1: Promotional Periods of Vicks Vaporub 100g
Figure 6.5: Time series Vicks Vaporub 100g incl. promotional order quantities
Figure 6.6: Time series Vicks Vaporub 100g excl. promotional order quantities
After the visual insights, the mathematical logic is explained based on the research paper of Cachon and Fisher (1997). Note that this interpretation is similar for ‘Promo O’ and ‘Action iU’ and will not be repeated. To exclude the peak, the promotional event is modelled by creating an additional variable $p_s^w$, where $p_s^w = 1$ indicates that a promotion occurs in week $w$ for SKU $s$, and $p_s^w = 0$ otherwise. A comment on notation: a $w$ in the superscript refers to a week between December 31, 2012 and March 13, 2017, and a subscript $s$ refers to an SKU. Let $d_s^w$ denote actual demand and $x_s^w$ normal demand. When there is no promotion ($p_s^w = 0$), normal demand equals actual demand, $x_s^w = d_s^w$; otherwise, normal demand is not observed. Let $f_s^w$ equal the forecast of normal demand (i.e. non-promotion demand) for week $w$, where each $f_s^w$ is evaluated using a simple ESM
$$f_s^{w+1} = \begin{cases} \alpha\, x_s^{w} + (1-\alpha)\, f_s^{w} & \text{if } p_s^{w} = 0 \\ f_s^{w} & \text{if } p_s^{w} = 1 \end{cases} \tag{6.2}$$
where $\alpha$ is a constant. This forecast makes several assumptions. First, promotion demand has little effect on subsequent normal demand, because the forecast is not updated in a promotion week. Second, it is assumed that a single constant $\alpha$ can effectively apply to all SKUs. The parameter $\alpha$ is chosen to minimize a measure of forecast error, defined by the difference between the actual demand and the forecast, in the calibration period.
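The logic of Equation 6.2 can be sketched in Python. The forecast of normal demand is updated with simple exponential smoothing in non-promotion weeks and carried forward unchanged in promotion weeks, and one smoothing constant is chosen for all SKUs by a grid search. The choice of mean absolute error as the error measure, the grid of candidate constants and the initialization with the first non-promotion week are assumptions of this sketch, not details taken from Cachon and Fisher (1997); the function names are hypothetical.

```python
def promo_aware_ses(demand, promo, alpha):
    """One-step-ahead forecasts of normal demand in the spirit of Equation 6.2.

    forecasts[w] is the forecast for week w made after observing week w - 1.
    The smoothed level is updated only in non-promotion weeks (promo[w] == 0).
    """
    # Initialise with the first non-promotion observation (an assumption).
    level = next((d for d, p in zip(demand, promo) if p == 0), demand[0])
    forecasts = []
    for d, p in zip(demand, promo):
        forecasts.append(level)
        if p == 0:
            level = alpha * d + (1 - alpha) * level
        # In a promotion week the level is carried forward unchanged.
    return forecasts


def calibrate_alpha(skus, grid=None):
    """Pick one alpha for all SKUs, minimising the pooled mean absolute error
    over non-promotion weeks. skus: list of (demand, promo) pairs."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]

    def pooled_mae(alpha):
        errors = []
        for demand, promo in skus:
            forecasts = promo_aware_ses(demand, promo, alpha)
            errors += [abs(d - f) for d, f, p in zip(demand, forecasts, promo)
                       if p == 0]
        return sum(errors) / len(errors)

    return min(grid, key=pooled_mae)


demand = [100, 100, 400, 100]
print(promo_aware_ses(demand, [0, 0, 1, 0], alpha=0.5))  # peak ignored
print(promo_aware_ses(demand, [0, 0, 0, 0], alpha=0.5))  # peak absorbed
```

In the example, the 400-unit promotional peak leaves the forecast of normal demand at 100, whereas an ordinary exponential smoothing update lifts the next forecast to 250.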
The mathematical logic is the reasoning behind the coding of promotional events in SAS Enterprise Guide and the subsequent execution of the forecast in SAS Forecast Studio. First, promotional events are modelled based on the period of ‘passage délégué’ and added to the baseline model. Uncommon data (i.e. the peaks) are filtered out based on these events in the form of additional variables. Hence, a new dependent variable ‘Quantity excl. passage’ is created, which is used as the variable to be forecasted. To detect the impact of the model with the new dependent variable, a selection of products with ‘passage délégué’ promotions in their history is chosen. This selection includes 2,464 distinct SKUs. As explained in Section 5.2, SAS executes forecasts for the entire dataset based on four different classifications and the SKU level, and for each SKU the best forecast is selected. However, the aim of this section is to compare the accuracy of the baseline model (cf. Section 5.4) and the more advanced model for a selection of products. Therefore, it seems reasonable to assume that, to compare relatively small selections of products, the forecasts can be executed on the SKU level. Hence, in this case study forecasts will always be executed on the SKU23 level, although in reality the classifications are relevant when the entire dataset is considered.
23 Note that this does not imply that an individual forecast is the best for each SKU in the selection. However, the individual forecasts are the best for the selection of products under consideration.
The analysis in SAS Forecast Studio is executed as follows. First, the forecast is executed for the selection of products based on the original model (cf. Section 5.4), with the original ‘order quantity’ as dependent variable. As explained in Chapter 5 (Section 5.3), SAS Forecast Studio chooses from a selection of models based on the characteristics of the product it is dealing with. The previously described Equation 6.2 only considers simple exponential smoothing. Considering only simple exponential smoothing models for the entire dataset would be too simplistic and would cause serious forecasting errors. In contrast, programming the forecast manually for a large selection of products, where each product possibly needs a different forecasting model, would be too computationally expensive and is considered out of the scope of this master dissertation. SAS Forecast Studio outperforms manual programming because it is much faster and offers accurate results for most products: different models (i.e. ARIMA, ESM and IDM) are fitted to the data, and the best-performing model is chosen based on the characteristics of each product. However, in some exceptional cases business knowledge can offer additional insight, for example when a moving average is more appropriate than exponential smoothing. Because this master dissertation is carried out in continuous dialogue with the Supply Chain department of Multipharma, SAS Forecast Studio is used, and the model of a product is modified manually only in exceptional circumstances (cf. Section 5.3). The result of SAS Forecast Studio for the baseline model forecast for the selection of products is a weighted MAPE of 59.68.
Second, the order demand is reforecast with ‘Quantity excl. passage’ as dependent variable. Hence, similar to Equation 6.2, the historical order data during the promotional periods are excluded from the overall historical data when executing the forecast, in order to obtain a more accurate demand forecast by reducing the variability of demand. The weighted MAPE of the updated forecast is 39.04. This means that the forecast error is reduced by 34.6 per cent for the selection of products with at least one ‘passage délégué’ in their history. The distributions of the MAPE of both forecasts are presented in Figure 6.7. Group 1 represents the distribution of the MAPE of the baseline model forecast, whereas Group 2 represents the distribution of the MAPE of the more advanced model forecast with the new dependent variable. The graphics include density plots to which a Kernel Density Estimate (KDE) was added to show the distribution of the data points. Figure 6.7 shows that the distribution of the MAPE is shifted to the left over the entire range between zero and 200. Hence, it can be concluded that inserting the additional demand driver has a positive effect on the forecast accuracy for the selected SKUs.
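The reported 34.6 per cent is the relative decrease of the weighted MAPE with respect to the baseline model. As a worked check with the two values reported above:
$$\frac{\text{MAPE}_{\text{base}} - \text{MAPE}_{\text{new}}}{\text{MAPE}_{\text{base}}} = \frac{59.68 - 39.04}{59.68} = \frac{20.64}{59.68} \approx 0.346$$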
Figure 6.7: Passage Délégué: Distribution of the MAPE
6.2.2 Promo O
‘Promo O’ (or ‘Promo Obligatoire’) is a term used by Multipharma to describe promotions originating from suppliers that are active and mandatory in all Multipharma pharmacies during a period of one month. The ‘Promo O’ starts on the first Friday of the month and lasts until the first Friday of the following month (i.e. when the following promotion might start). A supplier has purchased a particular promotional location (counter, shelf, display, etc.) and proposes a selection of products for the promotion. The Category Manager of Multipharma validates the product selection. Based on historical sales and the size of the promotional location, a fixed quantity of each product is shipped to all pharmacies, regardless of the current stock in the POS. The defined fixed quantity per pharmacy is shipped out to all pharmacies one to three weeks before the start of the ‘Promo O’. However, for ‘slow movers’ a variable quantity is shipped, based on the amount each pharmacy already has in stock. The purpose of a variable shipment is to reduce potential overstock in the POS.
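The ‘Promo O’ calendar rule lends itself to a small date computation. The sketch below returns the first Friday of a month as the start of the promotion and the first Friday of the following month as its end; the function names are hypothetical, and treating the end date as that following Friday (rather than the day before) is an assumption.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday(): Monday is 0, Friday is 4


def first_friday(year, month):
    """First Friday of the given month."""
    first = date(year, month, 1)
    return first + timedelta(days=(FRIDAY - first.weekday()) % 7)


def promo_o_window(year, month):
    """(start, end) of a 'Promo O': first Friday of the month until the first
    Friday of the following month."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    return first_friday(year, month), first_friday(next_year, next_month)


print(promo_o_window(2016, 4))
```

For April 2016 this gives 1 April to 6 May 2016, which matches the period listed in Table 6.2.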
Figure 6.8 represents the time series (2015-2016) of the product ‘Tilman: Elimin Fresh Thee’ with a ‘Promo O’ in its history during the month of April, which is indicated in red on the horizontal axis. This time series indicates that promotions tend to be quick surges in demand, because the promotional products are often shipped in a single day. As expected, Multipharma anticipates the ‘Promo O’ period about two weeks in advance, which explains the peak of 368 SKUs on the 24th of March. Again, the challenge in forecasting products with ‘Promo O’ promotions in their past is to exclude the peaks from demand, thereby decreasing variability and generating better forecasts outside promotional periods. It is important to also exclude the two weeks before the promotion, because the products are shipped out to all pharmacies two to three weeks before the start of the ‘Promo O’. Figure 6.9 represents the time series when the entire period from two weeks before the promotion until the end of the promotion is excluded. Considering the scale of the y-axis, the history is now more level than in the previous time series.
Period ‘Promo O’
1/04 - 6/05/2016
Table 6.2: Promotional Period of Tilman Elimin Fresh Thee
Figure 6.8: Time series Tilman Elimin Fresh Thee incl. promotional order quantities
Figure 6.9: Time series Tilman Elimin Fresh Thee excl. promotional order quantities
Using SAS code, a promotional ‘event’ is created in the same way as for ‘passage délégué’, based on the start and end dates listed in the Excel sheets. This time, however, the two weeks before the start of the promotion are also considered part of the promotional event, and the corresponding order quantities are excluded as well. A new variable, ‘Quantity excl. promo’, is created based on this ‘event’ and added to the base table. To detect the impact of the additional demand driver ‘Promo O’, all products with a ‘Promo O’ promotion in their history are selected; the selection consists of 363 unique SKUs. The forecast with the original ‘order quantity’ as dependent variable is compared with the forecast that has the new variable ‘Quantity excl. promo’ as dependent variable. The latter excludes the highly volatile order quantities during a promotional event from the historical order data in order to obtain a more accurate forecast. The base forecast in SAS Forecast Studio has a weighted MAPE of 128.64, whereas the updated forecast model has a weighted MAPE of 54.28: the forecast error is reduced by 57.8 per cent. Figure 6.10 shows the distributions of the MAPE of both forecasts and is interpreted in the same way as for the ‘passage délégué’ promotions. Likewise, this figure demonstrates that the distribution is shifted to the left for the entire range of products with at least one ‘Promo O’ in their history. Hence, it can be concluded that inserting the additional
demand driver has a positive effect on the forecast accuracy for the selection of SKUs.
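The exclusion rule described above drops the order history from two weeks before the promotion start until its end. It can be sketched in a few lines of Python. This is not the SAS code used in the study. It is a minimal illustration that assumes daily order records stored as (date, quantity) pairs and promotional periods stored as (start, end) pairs taken from the Excel sheets. The function name `quantity_excl_promo` is hypothetical. Excluded days are marked as missing (`None`) rather than zero, so that they do not look like zero demand.

```python
from datetime import date, timedelta

# Assumption: promotional shipments start up to two weeks before the promotion.
LEAD_TIME = timedelta(weeks=2)


def quantity_excl_promo(history, promos, lead=LEAD_TIME):
    """Return the order history with promotional days set to None (missing).

    history: list of (day, quantity) tuples with datetime.date days.
    promos:  list of (start, end) tuples, both dates inclusive.
    A day is excluded if it lies between (start - lead) and end for any promotion.
    """
    cleaned = []
    for day, qty in history:
        in_event = any(start - lead <= day <= end for start, end in promos)
        cleaned.append((day, None if in_event else qty))
    return cleaned


# 'Promo O' period of Table 6.2; quantities are made up except the 368 peak.
promo_o = [(date(2016, 4, 1), date(2016, 5, 6))]
history = [
    (date(2016, 3, 17), 4),
    (date(2016, 3, 18), 5),
    (date(2016, 3, 24), 368),
    (date(2016, 5, 6), 7),
    (date(2016, 5, 7), 3),
]
for day, qty in quantity_excl_promo(history, promo_o):
    print(day, qty)
```

With a two-week lead, the exclusion window for this promotion starts on 18 March 2016. The 24 March peak therefore falls inside it.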
Figure 6.10: Promo O: distribution of the MAPE
6.2.3 Action iU
‘Action iU’ is the term Multipharma uses for supplier promotions that are active and mandatory in all iU Points of Sale for one month. An ‘Action iU’ starts on the first Friday of the month and lasts until the first Friday of the following month (i.e. when the next promotion may start). The supplier purchases a particular promotional location (counter, shelf, display, etc.) and proposes a selection of products for the promotion; again, the Category Manager validates the product selection. In contrast with ‘Promo O’, no fixed quantities per POS are shipped for an ‘Action iU’; instead, each POS orders the stock it considers necessary. Hence, the quantity delivered to each POS varies with the needs of the Shop Manager. Most of the time, the Shop Manager orders the promotional stock one week before the promotion starts, because some ‘iU’ products are not shipped daily and because staff need time to secure the items, e.g. by attaching anti-theft tags. When a new product becomes part of an ‘Action iU’ promotion, Shop Managers usually order it as soon as it is available. The effect of ‘Action iU’ is similar to that of ‘Promo O’; however, the demand peaks are less predictable and vary more in size and timing. Again, it is important to exclude this effect from the historical data in order to decrease variability and generate better
forecasts outside promotional periods. A safe range is to exclude the historical data two weeks
before the actual ‘Action iU’ period takes place until the end of the ‘Action iU’ period.
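The ‘Action iU’ calendar rule can be turned into concrete dates as follows. The rule is: start on the first Friday of the month, run until the first Friday of the next month, and exclude the two weeks beforehand. This is a hypothetical helper, not part of the SAS workflow. Table 6.3 records the July 2016 promotion as 01/07 - 31/07/2016, so the Excel sheets may store calendar-month dates. The sketch instead follows the rule as described in the text and ends the period on the day before the next first Friday.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday(): Monday == 0 ... Sunday == 6


def first_friday(year, month):
    """Date of the first Friday of the given month."""
    first = date(year, month, 1)
    return first + timedelta(days=(FRIDAY - first.weekday()) % 7)


def action_iu_window(year, month, lead_weeks=2):
    """Return (exclusion start, promotion start, promotion end) for one 'Action iU'.

    The promotion runs from the first Friday of the month up to the day before
    the first Friday of the next month; the exclusion starts lead_weeks earlier.
    """
    start = first_friday(year, month)
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    end = first_friday(next_year, next_month) - timedelta(days=1)
    return start - timedelta(weeks=lead_weeks), start, end


print(action_iu_window(2016, 7))
```

For July 2016 this gives an exclusion window starting on 17 June, a promotion start on 1 July (a Friday) and an end on 4 August.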
Figure 6.11 shows the time series (2016) of a relatively new product, ‘Vichy Dercos Shampoo’, which had an ‘Action iU’ in July, indicated in red on the horizontal axis. As expected, unstable demand occurs in the two weeks before the start of the promotion. Figure 6.12 shows the time series when the entire period from two weeks before the start of the promotion until its end is excluded. The history is now more level than in the previous time series.
Period ‘Action iU’: 01/07/2016 - 31/07/2016
Table 6.3: Promotional Period of Vichy Dercos Shampoo
Figure 6.11: Time series Dercos Shampoo incl. promotional order quantities
Figure 6.12: Time series Dercos Shampoo excl. promotional order quantities
To assess the impact of the additional demand driver, a selection of 515 products with at least one ‘Action iU’ promotion in their history is analyzed. The baseline model forecast, with the original dependent variable, is executed for this selection of products; its weighted MAPE is 73.05. As with the previous promotional types, an ‘event’ is added to the base table using SAS code, based on the information provided in the Excel sheets. A new forecast is then executed using this ‘Action iU’ information: the historical order data during an ‘Action iU’ promotion and the two weeks before it are excluded from the historical data in order to obtain more accurate estimates. SAS Forecast Studio calculates the weighted MAPE of this forecast to be 64.08. Consequently, the additional demand driver based on the ‘Action iU’ Excel files is responsible for a 12.3 per cent reduction of the forecast error. The distribution of the MAPEs is shown in Figure 6.13, from which it can be concluded that the additional demand driver ‘Action iU’ improves the forecast accuracy for the selection of products.
Figure 6.13: Action iU: distribution of the MAPE
6.2.4 Summary of the results
A summary of the weighted MAPEs is presented in Table 6.4.
                   Baseline model   Updated model   % Change
Passage Délégué    59.68            39.04           -34.6
Promo O            128.64           54.28           -57.8
Action iU          73.05            64.08           -12.3
Table 6.4: Summary MAPE’s promotional demand drivers
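The % Change column is the relative change of the weighted MAPE with respect to the baseline model. The short check below is a hypothetical snippet, not part of the study's tooling. It reproduces the column from the two MAPE values in each row.

```python
# Weighted MAPEs from Table 6.4: (baseline model, updated model)
TABLE_6_4 = {
    "Passage Délégué": (59.68, 39.04),
    "Promo O": (128.64, 54.28),
    "Action iU": (73.05, 64.08),
}


def pct_change(baseline, updated):
    """Relative change of the forecast error, in per cent (negative = improvement)."""
    return 100 * (updated - baseline) / baseline


for driver, (baseline, updated) in TABLE_6_4.items():
    print(f"{driver}: {pct_change(baseline, updated):.1f} %")
```

Rounded to one decimal, this yields -34.6, -57.8 and -12.3 per cent, which matches the table.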
Several interesting observations follow from the analysis of ‘Passage délégué’, ‘Promo O’ and ‘Action iU’ promotions. About 23 per cent of the SKUs in the study have had at least one ‘Passage délégué’ or ‘Promo O’ in their history, but no product has had both. About 10 per cent of the existing parapharmaceutical SKUs have had at least one ‘Action iU’ promotion in their history. Demand during a promotional week is often dramatically greater than mean weekly demand: on average, weekly demand during a promotion is 111 per cent higher than under normal circumstances.
The previous examples show that additional internal data in the form of semi-structured Excel files can improve demand forecast accuracy. The Excel data are used to model three types of promotional events that are incorporated into the baseline model. These events are used primarily to exclude demand variability in order to obtain a more accurate forecast. This technique is superior to the basic promotional information described previously (Section 5.4), which only recognizes the peak but ignores demand variability during the rest of the promotional period. Hence, the hypothesis that big data analytics can offer a significant benefit for Multipharma is validated, because adding internal data sources reduces the forecast error. In short, when internal data is used in a smart way to extend the basic forecasting model, it can improve the forecast accuracy of a selected number of products. It is therefore reasonable to propose to the Supply Chain department that the Excel data files be structured so that they can be loaded into SAS automatically, since this data has been shown to raise demand forecast accuracy. Knuth et al. (2014) state that companies that do not properly flag or monitor outliers in their demand patterns will, over time, distort their inventory forecasting accuracy, which can create large quantities of excess inventory that cost money and erode profit margins. Therefore, Chapter 7 explains the impact of the improved forecast accuracy on the safety inventory and the carrying costs.
6.3 External data
As mentioned in Section 6.1, the second priority of the Supply Chain department is to improve the forecast of seasonal products. In the context of Multipharma, seasonal products are pharmaceutical products whose demand fluctuates over time with a repeating pattern. Examples of this kind of product are flu-related medicines, sunscreens, mosquito products and insecticides (for cats and dogs). As already mentioned in the literature (Chapter 2, Section 2.3), weather can shape demand. The impact of weather is clearly pronounced for pharmaceutical products: after a bad summer, for instance, a lot of sun products may still be in stock at the end of the season. Consequently, at the end of a season one tries to push all excess stock onto the market, which is a suboptimal way of managing inventory. This section investigates whether historical weather conditions offer additional value to improve the forecast accuracy of seasonal products. External data is obtained in the form of Excel files from KMI. KMI is a Belgian
Federal Institute that carries out scientific research in the field of meteorology. The data
obtained contains the daily average temperature and precipitation from January 2012 till
December 2016 in the region of Ukkel.
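The KMI series are daily, while the Google Trends data and the aggregated order history are weekly. The weather data therefore has to be brought to the same weekly grain before it can serve as a predictor. The text does not say how this was done. The sketch below assumes mean temperature and total precipitation per ISO week (Monday to Sunday), and the function name is hypothetical. Whatever week convention is chosen has to match the one used for the order data and the Trends export.

```python
from collections import defaultdict
from datetime import date


def weekly_weather(daily):
    """Aggregate daily (day, avg_temp, precipitation) records to ISO weeks.

    Returns {(iso_year, iso_week): (mean temperature, total precipitation)}.
    """
    temps = defaultdict(list)
    rain = defaultdict(float)
    for day, temp, precip in daily:
        iso_year, iso_week, _ = day.isocalendar()
        temps[(iso_year, iso_week)].append(temp)
        rain[(iso_year, iso_week)] += precip
    return {week: (sum(t) / len(t), rain[week]) for week, t in sorted(temps.items())}


# Hypothetical KMI-style records (degrees Celsius, mm).
daily = [
    (date(2016, 1, 3), 5.0, 0.0),  # Sunday: belongs to ISO week 53 of 2015
    (date(2016, 1, 4), 2.0, 1.5),  # Monday: first day of ISO week 1 of 2016
    (date(2016, 1, 5), 4.0, 0.5),
]
print(weekly_weather(daily))
```

Using (ISO year, ISO week) as the key avoids assigning early-January days to the wrong year, as the 3 January 2016 record shows.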
Besides weather, there is another important source of data to investigate with regard to seasonal products. The prominent presence of the Internet in most people’s lives makes it possible to find trends and patterns in their search data, i.e. Google Trends. Google recognized exactly this and therefore created Google Flu Trends (GFT). The idea behind GFT is that information-seeking behaviour on the Web reveals the influenza status of the population. Ginsberg et al. (2009) showed that search queries outperform simple autoregressive models based on historical data of flu levels. Eysenbach (2006) revealed a high correlation between clicks on sponsored search results for flu-related keywords and epidemiological data from the Canadian flu season of 2004-2005. Such studies indicate that there is potential to improve demand forecast accuracy with an additional demand driver, Google Trends data, based on the insights of GFT. Hence, we expect Web searches to predict more precisely when the influenza season starts and, consequently, when the sales of flu-related products will kick off. The idea behind GFT will be extended to the other types of seasonal products (sunscreens, mosquito products and insecticides).
Although SAS Forecast Studio can easily capture seasonality, using Holt-Winters’ seasonal exponential smoothing models, seasonal ARIMA models and derivatives of these models (cf. Section 5.3), it will be investigated whether additional data sources, namely Google Trends and weather conditions, can improve the forecast accuracy compared to the baseline model, e.g. by using an ARIMAX model. Hence, the first research question is whether the additional data sources can reduce the forecast error when the baseline model is extended with additional explanatory variables in an ARIMAX model. Moreover, it should be noted that the demand for the investigated seasonal products never drops to zero, because products such as painkillers, sunscreens, mosquito products and insecticides are sold throughout the year. That is why it is very important to make the forecast during off-peak periods as accurate as possible. For painkillers, extreme peaks during the winter cause variability during off-peak periods: because of these peaks, the forecast predicts higher sales during off-peak months. Hence, instead of improving the predictive behaviour in general, the data sources might be used to make the base forecast more accurate, following the same reasoning as in Section 6.2. Therefore, the second research question is whether filtering extreme peaks out of the historical dataset by means of Google Trends data improves the forecast accuracy. The same reasoning as for flu-related products applies to sunscreens, mosquito products and other insecticides (for cats and dogs).
To obtain a proper analysis, the SKUs corresponding to each investigated seasonal type need to be identified and filtered out of the entire SKU dataset based on the IMS classification. Together with the Supply Chain department, the IMS top level and the subgroups for each seasonal type were identified; they are shown in Table 6.5.
Product        IMS level   N SKU
Flu            J01         199
Sunscreen      83F         288
Mosquito       R06         30
Insecticides   N/A         45
Table 6.5: IMS class corresponding to seasonal type
Because insecticides belong to the top level ‘Other’ (N/A), a different approach is used to
identify a more precise selection. Products with a label containing ‘Frontline’, ‘Advantix’ or
‘Drontal’ are identified as being insecticides used for cats and dogs.
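The label-based selection can be sketched as a case-insensitive substring match on the product label. This is an illustration only. The brand list comes from the text, but the function name and the example labels are hypothetical.

```python
# Brand names from the text; matching is a case-insensitive substring search.
INSECTICIDE_BRANDS = ("frontline", "advantix", "drontal")


def is_pet_insecticide(label):
    """True if the product label mentions one of the insecticide brands."""
    text = label.lower()
    return any(brand in text for brand in INSECTICIDE_BRANDS)


# Hypothetical product labels, for illustration only.
labels = ["FRONTLINE Spot On Hond", "Drontal Cat XL", "Vichy Dercos Shampoo"]
selection = [label for label in labels if is_pet_insecticide(label)]
print(selection)
```

Lower-casing the label first means that spelling variants such as ‘FRONTLINE’ and ‘Frontline’ are both caught.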
6.3.1 Selecting search terms
Weather conditions are considered incontestable data, as they are measured by KMI. In contrast, Google Trends data can be obtained by any individual. Figure 6.14 shows the time series of the search term ‘Griep’ in Belgium from January 2012 till December 2016. Although some significant peaks can be distinguished during the winter months, it is doubtful whether a single search term is useful to predict diseases in the short term, and hence the corresponding sales of drugs related to these diseases. This is why Google built GFT on a set of highly correlated queries, found with Google Correlate, a tool that searches for highly correlated search terms. Table 6.6 presents the ten search terms²⁴ that make up GFT, in decreasing order of correlation with ‘Influenza-like Illness’. Figure 6.15 visualizes the positive correlation of the two most highly correlated search terms. Figure 6.16 shows the sum of all individual queries that make up the GFT data for Belgium.
Figure 6.14: Google Trends data: ‘Griep’
24 Source: Google Flu Trends (http://www.google.org/flutrends)
Search term            Correlation
Influenza type a       0.9069
Symptoms of flu        0.9038
Flu duration           0.9033
Flu contagious         0.8919
Flu fever              0.8851
Treat the flu          0.8831
How to treat the flu   0.8830
Signs of the flu       0.8815
How long is the flu    0.8775
Symptoms of the flu    0.8741
Table 6.6: Correlation of search terms
Figure 6.15: Graphical representation of the correlation of 2 search terms
Figure 6.16: GFT Belgium
However, using GFT as an additional predictor variable in a company has a drawback: Google stopped publishing GFT data in 2015. Companies that want to use additional data sources such as GFT to improve demand forecast accuracy need real-time data that remains continuously available. Hence, a limited historical sample of GFT is insufficient to add to the historical dataset. However, the underlying principle of GFT can be used to construct our own trends data based on correlated search queries. In addition, the logic behind GFT can be extended to the other product types in our analysis. What follows describes how the search terms are obtained for the different types of seasonal products.
Google Trends data can be filtered by time and region. The search terms are considered for the period from January 1, 2012 till December 31, 2016. Note that the actual dataset contains historical data from December 31, 2012 till March 13, 2017. However, the dataset provided by KMI only contains data until December 2016. Therefore, this analysis is based on a subset of the entire dataset that excludes²⁵ the historical data of 2017, and the holdout sample is taken to be the last 12 weeks of 2016. Google Trends data are only available on a weekly basis; this corresponds to the time-aggregated historical order data of Multipharma, which is also weekly. In the context of Multipharma, the search terms are only relevant when applied to Belgium. Hence, the search terms need to be filtered by country and need to include both Dutch and French terms, because Multipharma is present in both the Flemish and the French-speaking part of the country. In some cases, people are more likely to use English search terms; hence, English terms are also considered relevant queries. The search terms for each seasonal product type were selected in a brainstorming session with Dutch- and French-speaking people from the Supply Chain department, and the terms for which Google Trends returns the most significant search results were retained. Note that people are more likely to omit accents, spaces or hyphens when using the search engine, since they want results as quickly as possible. This remark is important: when searching for the right queries, all kinds of spelling variants are considered in order to determine the most appropriate queries.
For flu-related medicines, nine queries related to ‘flu’ were selected; they are summarized in Table 6.7. Following the procedure used to obtain GFT, the correlation between the best Belgian search term (the term whose sum of searches over the entire period is largest) and each of the other terms is calculated using SAS Enterprise Guide. The best search term is listed first in the table, with a correlation of one. Based on these correlations, the individual search terms that are significantly correlated with the best search term are selected to make up the trends data for the selection of products. All search terms with a positive correlation with ‘flu’ and a p-value below 0.0001 are used to obtain the additional explanatory variable to include in the baseline model. Hence, ‘griepvaccin’ and ‘vaccin grippe’ are left out of the selection. This is in line with the research of Polgreen et al. (2008), which states that most influenza vaccination occurs before the influenza season and that all vaccination-related searches should therefore be excluded.
25 We expect the insights obtained from the analysis to be more or less the same, since we only exclude a small fraction of the historical dataset. Note that KMI requires annual payments to provide up-to-date statistics to enterprises. This analysis, with a limited amount of data over time, is used to investigate whether it pays off to include this additional information and thus whether it is worth the additional annual expense.
Search term        Correlation   Significance
Flu                1             <0.0001
Griep              0.76701       <0.0001
Grippe             0.75218       <0.0001
Symptome grippe    0.65322       <0.0001
Griepepidemie      0.5527        <0.0001
Epidemie grippe    0.40696       <0.0001
Griepsymptomen     0.38753       <0.0001
Vaccin grippe      0.17092       0.0056
Griepvaccin        0.16803       0.0065
Table 6.7: Flu: Search terms
Following the same procedure, Table 6.8 provides an overview of the search terms related to sunscreens. All search terms have a positive correlation with ‘zonnecreme’ with a p-value below 0.0001, and the sum of the individual queries is used to obtain the additional explanatory variable to insert into the baseline model.
Search term          Correlation   Significance
Zonnecreme           1             <0.0001
Creme solaire        0.81553       <0.0001
Crème solaire        0.74959       <0.0001
Coup de soleil       0.74110       <0.0001
Zonnebrand           0.73351       <0.0001
Beste zonnecreme     0.69304       <0.0001
Protection solaire   0.61909       <0.0001
Sunscreen            0.60403       <0.0001
Zonnecrème           0.50642       <0.0001
Aftersun             0.45287       <0.0001
Table 6.8: Sunscreens: Search terms
Table 6.9 provides an overview of the mosquito-related search terms. An analysis of the individual queries revealed that all search terms have a positive correlation with ‘Deet’ with a p-value below 0.0001. Hence, they are all used as queries to obtain the additional explanatory variable to include in the baseline model.
Search term               Correlation   Significance
Deet                      1             <0.0001
Anti moustique            0.81729       <0.0001
Anti muggen               0.79214       <0.0001
Muggenbeten               0.74323       <0.0001
Anti muggen armband       0.63687       <0.0001
Anti-moustique            0.60549       <0.0001
Bracelet anti moustique   0.55271       <0.0001
Muggenspray               0.54387       <0.0001
Antimoustique             0.50703       <0.0001
Muggenbeten behandelen    0.48252       <0.0001
Table 6.9: Mosquito: Search terms
Finally, insecticides consist of products against fleas, ticks and worms for cats and dogs. These products fall into the same category, ‘insecticides’, because they share a common characteristic: people search for them when the weather is ‘good’ and they walk their dogs or let their cats outside more frequently. However, the correlation of the search terms is analyzed separately for fleas, ticks and worms, because people are more likely to search for a specific product against one of these. The search terms are presented in Tables 6.10, 6.11 and 6.12, respectively. They are all positively and significantly correlated with the best search term.
Search term       Correlation   Significance
Puces chat        1             <0.0001
Vlooien hond      0.26808       <0.0001
Vlooien kat       0.33738       <0.0001
Vlooienband       0.33394       <0.0001
Vlooienbeten      0.41434       <0.0001
Puces chien       0.33271       <0.0001
Anti puce         0.53034       <0.0001
Anti puce chien   0.39699       <0.0001
Anti puce chat    0.34205       <0.0001
Table 6.10: Fleas: Search terms
Search Term Correlation Significance
Tique chien 1 <0.0001
Tique chat 0.99864 <0.0001
Tekenbeet hond 0.99835 <0.0001
Table 6.11: Ticks: Search terms
Search term       Correlation   Significance
Vermifuge chien   1             <0.0001
Ontwormen kat     0.99938       <0.0001
Vermifuge chat    0.99925       <0.0001
Ontwormen hond    0.99924       <0.0001
Ontworming hond   0.99919       <0.0001
Table 6.12: Worms: Search terms
6.3.2 Flu-related products
Figure 6.17 shows the sum of the searches for all seven positively correlated terms by Belgian Internet users during the specified period. As expected, the peaks are situated at the beginning of each year. External information confirms the difference in magnitude between the peak in 2014 and the peak in 2015. In mid-February 2015, when the flu was at its peak, M. Van Ranst claimed that five to six times more people were infected with the flu than at the peak of the previous year. According to Van Ranst, an ‘unfortunate composition’ of the flu vaccine was one of the reasons for the high number of infections.
Figure 6.17: Google Trends: Flu
First, big data in the form of Google Trends data and weather conditions is used to examine whether this data offers additional predictive power to increase the forecast accuracy of the demand for flu-related products. Hence, the first question is whether the additional data sources offer increased predictive power when included as independent variables in the baseline model. To define the extended model, a stepwise linear regression is executed over the candidate independent variables, i.e. Google Trends, precipitation and average temperature, together with their interactions. Akaike’s Information Criterion (AIC) is used to select the predictors; the procedure stops at step two because adding or removing an effect no longer reduces the AIC (Table 6.13). Therefore, the main effect ‘Average Temperature’ and the interaction effect ‘Trends*Average Temperature’ are added as independent variables to the baseline model, because they improve the predictive accuracy. Afterwards, the extended model is used to execute the forecast in SAS Forecast Studio. The results are presented in Table 6.14. It can be concluded that the additional predictor variables do not reduce the weighted MAPE and therefore do not improve the forecast accuracy. At first sight, this seems rather strange, since the MAPE decreases or remains the same for 70 per cent of the SKUs. Why, then, does the weighted MAPE not decrease? For some high-volume products, the MAPE increases when the additional variables are used, which keeps the weighted MAPE at about the same level as without the predictor variables. For these SKUs, SAS Forecast Studio fits ARIMAX models to the data because they give a better prediction on the holdout sample. However, the weighted MAPE (cf. Appendix C) is calculated over the entire historical period. ARIMA models that only take the series’ own history into account, and ESM models (cf. Appendix B), remain more valuable when the entire period is considered, and for these types of models the additional predictor variables do not add value. The SKUs (30 per cent) for which ARIMAX models are wrongly fitted to the data
are represented in Appendix D.
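A toy example shows why the weighted MAPE can stay flat or worsen even though most SKUs improve. Appendix C is not reproduced here. The sketch therefore assumes that the weighted MAPE is the average of the per-SKU MAPEs, weighted by each SKU’s total actual volume. The function names and numbers are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error (%) over periods with non-zero actual demand."""
    errors = [abs(a - f) / a for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(errors) / len(errors)


def weighted_mape(skus):
    """Volume-weighted average of per-SKU MAPEs (assumed weighting scheme).

    skus: dict mapping SKU -> (actual values, forecast values).
    """
    total_volume = sum(sum(actual) for actual, _ in skus.values())
    return sum(
        sum(actual) / total_volume * mape(actual, forecast)
        for actual, forecast in skus.values()
    )


actual_small, actual_large = [10, 10], [100, 100]
baseline = {"small": (actual_small, [15, 5]), "large": (actual_large, [110, 90])}
extended = {"small": (actual_small, [11, 9]), "large": (actual_large, [115, 85])}

for name, model in (("baseline", baseline), ("extended", extended)):
    per_sku = {sku: round(mape(a, f), 1) for sku, (a, f) in model.items()}
    print(name, per_sku, round(weighted_mape(model), 2))
```

The low-volume SKU improves from 50 to 10 per cent, but the high-volume SKU worsens from 10 to 15 per cent. Because the high-volume SKU dominates the weights, the weighted MAPE rises from about 13.64 to about 14.55.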
Step   Effect entered                          Number of effects in   AIC           SBC
0      Intercept, FLAG_MF, SWITCH_TO_PROMO²⁶    3                      405084.506    366371.200
1      Temp                                    4                      404879.117    366174.375
2      Trends*Temp                             5                      404833.570*   366137.393*
* Optimal value of criterion
Table 6.13: Flu: Stepwise Linear Regression
                                           MAPE
Baseline model                             36.04
Baseline model incl. predictor variables   36.68
Table 6.14: Flu: SAS Forecast Results
Next, the second question (i.e. whether the trends data can be used to exclude extreme order quantities from the historical data) is investigated in an attempt to improve the base forecast accuracy. Although SAS Forecast Studio is able to deal with seasonality, it incorporates extreme historical peaks into the models used to predict the future. Sales of flu-related products depend highly on how severely society is affected by the flu. Hence, extreme sales in a particular year influence the variability of future forecasts: SAS Forecast Studio predicts higher sales for the future, although the peak in the following year might be significantly lower. Therefore, it seems appropriate to exclude uncommon extreme order quantities based on the search terms. In contrast to the promotions described in Section 6.2, the trends data cannot be described with a binary variable (i.e. in promotion or not). A cut-off point needs to be determined above which the order quantity data is excluded from the historical dataset. The cut-off point is set at the third quartile of the entire flu-related trends data, which is 86. Hence, about 25 per cent of the weeks have a search volume above 86, and the corresponding order quantities are considered to cause high variability in the base forecast model. The result of this approach is presented in
table 6.15.
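The third-quartile cut-off rule can be sketched with Python’s `statistics.quantiles`. The rule masks the order quantities of every week whose trends value exceeds Q3. The function name and data are hypothetical. Quartile definitions also differ between tools: `statistics.quantiles` defaults to its ‘exclusive’ method, and SAS offers several percentile definitions. A cut-off computed this way may therefore differ slightly from the value of 86 reported for the flu data.

```python
from statistics import quantiles


def exclude_peak_weeks(orders, trends):
    """Mask order quantities in weeks whose trends value exceeds the third quartile.

    orders, trends: equal-length weekly lists. Returns (cut-off, masked orders);
    masked weeks are None, i.e. treated as missing rather than as zero demand.
    """
    q3 = quantiles(trends, n=4)[2]  # third quartile, default 'exclusive' method
    masked = [None if t > q3 else q for q, t in zip(orders, trends)]
    return q3, masked


trends = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical weekly searches
orders = [12, 15, 14, 18, 20, 25, 30, 40, 95, 120]  # hypothetical weekly orders
cutoff, cleaned = exclude_peak_weeks(orders, trends)
print(cutoff, cleaned)
```

Here the cut-off is 82.5, so only the two weeks with the highest search volumes, and their peak orders, are masked.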
26 Note that ‘Flag_MF’ and ‘SWITCH_TO_PROMO’ relate to the discussion in Section 5.4, where two events were added to the baseline model with historical order data to deal with out-of-stocks and promotions. This part of the analysis does not yet distinguish between the different types of promotions described in Section 6.2, since we want to analyze the results separately.
                                           MAPE
Baseline model                             36.04
Baseline model excl. Q where trends > 86   26.07
Table 6.15: Flu: SAS Forecast Results excl. peaks
The additional information clearly reduces the MAPE by 27.66 per cent compared to the original forecast. Hence, the second approach to using the external data seems more appropriate. A similar approach is used for the other seasonal product types (sunscreens, mosquito products and insecticides). First, the external data sources, Google Trends data and weather conditions, are added as independent variables to the baseline model. Afterwards, the trends data is used to exclude the order quantities of exceptional peaks, to see whether this increases the accuracy of the base forecast.
6.3.3 Sunscreens
Sunscreens are seasonal products for which it is sometimes extremely difficult to obtain an accurate forecast, because the Belgian summer is relatively difficult to predict. Depending on the weather, peaks can occur anywhere from May to September. Although SAS Forecast Studio incorporates the seasonality of the historical order data in the forecast, it is not always accurate in determining the timing and magnitude of the peak. Intuitively, weather conditions can be expected to affect the demand for sunscreens. Hence, it is investigated whether Google Trends data and weather conditions offer additional value for the base forecast accuracy.
Figure 6.18 shows the sum of the searches for all positively and significantly correlated search terms. As expected from the supply chain perspective, the peaks are mostly situated in the summer months, namely July and August.
Figure 6.18: Google Trends: Sunscreens
As for the flu-related products, a stepwise linear regression is executed to select the predictor variables; it can be consulted in Appendix E. The selected main and interaction effects are added to the baseline model, after which the forecast is executed using the extended model. The results are presented in Table 6.16. It can be concluded that the additional predictor variables do not offer a benefit over the baseline model, for the same reason as with the flu-related products. More information is clearly not always better. Sometimes the existing historical data already offers a good prediction, and additional information blurs the existing seasonal pattern. As a result, ARIMAX models are wrongly fitted to the data and give an inferior forecast compared to seasonal ARIMA or ESM models. According to the Supply Chain department, weather conditions might not offer the expected superior predictive results because, from experience, pharmacies already stock sunscreens in February and March, when average temperatures are still low. Therefore, seasonal models might be better at capturing this pattern.
MAPE
Baseline model 63.77
Baseline model incl. predictor variables 67.11
Table 6.16: Sunscreens: SAS Forecast Results
In addition, the trends data is used to exclude extreme order quantities from the historical data in an attempt to improve the base forecast accuracy. Again, the trends data is a continuous variable, and a cut-off point needs to be determined above which the order data is excluded from the historical dataset. The cut-off point is set at the third quartile of the entire sunscreen-related trends data, which is 79. Hence, about 25 per cent of the weeks have a search volume above 79, and the corresponding order quantities are considered to cause high variability in the base forecast. The result of this approach is presented in Table 6.17, which shows that the additional information decreases the MAPE by 33.79 per cent. Again, the second approach to using the external data source seems more appropriate. Thus, the Supply Chain department needs to be very careful in how it applies additional sources of external information.
MAPE
Baseline model 63.77
Baseline model excl. Q where trends > 79 42.22
Table 6.17: Sunscreens: SAS Forecast Results excl. peaks
6.3.4 Mosquito products
The Supply Chain department raised the problem of not being able to predict the demand for mosquito products. More precisely, they faced an extreme shortage during the summer months of 2016. Web articles confirm the extreme peak in the summer of 2016 ("Massaal veel muggen deze zomer", 2016). They describe an extraordinary mosquito infestation caused by a lot of rain in June followed by warm weather, the ideal conditions for mosquitoes. SAS Forecast Studio was unable to predict this extreme peak based on the historical order data. It is investigated whether Google Trends data or weather conditions can improve the forecast accuracy.
Figure 6.19 shows the sum of the searches for all ten positively correlated terms. In July 2016, there was an extreme peak in search trends, in line with the news described above. According to the stepwise linear regression (Appendix E), only the trends data offers additional predictive power; it is added to the baseline model to execute a new forecast. The results are presented in Table 6.18. As in the previous examples, including the Google Trends data offers only a marginal improvement in accuracy. It is therefore investigated whether excluding extreme data raises the base forecast accuracy (Table 6.19). The forecast accuracy improves substantially, by 58.5 per cent. This larger improvement compared to the previous two seasonal product types may be due to the fact that most products in this selection are relatively new. New products are more difficult for SAS Forecast Studio to predict, because the seasonal models have less historical data to rely on and the peaks carry a higher weight when forecasting the future. Excluding the extreme peaks makes the data more reliable for forecasting.
Figure 6.19: Google Trends: Mosquito
MAPE
Baseline model 121.32
Baseline model incl. predictor variables 117.91
Table 6.18: Mosquito: SAS Forecast Results
MAPE
Baseline model 121.32
Baseline model excl. Q where trends > 46 50.36
Table 6.19: Mosquito: SAS Forecast Results excl. peaks
6.3.5 Insecticides
Although it may not be the first thing that comes to mind when thinking of Multipharma's product assortment, insecticides for cats and dogs are sold in pharmacies. This general term covers products that protect cats and dogs against fleas, ticks and worms. The Supply Chain department has difficulties predicting demand for this product type and states that demand depends highly on the weather conditions, because the weather affects when people are likely to walk their dog and let their cat outside. Hence, it seems reasonable to investigate whether adding historical weather data offers value when forecasting seasonal products against fleas, ticks and worms for cats and dogs. Figure 6.20 visualizes the search terms for fleas, ticks and worms in Belgium during the specified period. The figure shows that the total of the three strengthens the seasonal pattern of the individual search terms, which is why this total will be considered as explanatory variable. As for the previous seasonal products, the results are presented in table 6.20 and table 6.21. Again, the conclusion is that including the historical weather information does not improve the forecast accuracy (the MAPE even increases from 45.84 to 50.00). In contrast, excluding extreme data points based on the search terms decreases the MAPE by 15.49 per cent.
Figure 6.20: Google Trends: Insecticides
MAPE
Baseline model 45.84
Baseline model incl. predictor variables 50.00
Table 6.20: Insecticides: SAS Forecast Results
MAPE
Baseline model 45.84
Baseline model excl. Q where trends > 435 38.74
Table 6.21: Insecticides: SAS Forecast Results excl. peaks
6.3.6 Summary of the results
In section 6.3.2 we explained that, for 70 per cent of the time series, ARIMAX models produced better forecasts than ARIMA or ESM models based exclusively on the historical order quantities. However, the weighted MAPE did not decrease, because for 30 per cent of the products within the selection the ARIMAX models were wrongly fitted based on the holdout sample, and other models (e.g. seasonal ARIMA or ESM models) perform better over the entire period on which the weighted MAPE is calculated. Hence, an ARIMAX model does not always outperform an ARIMA or ESM model. Sometimes a specific lag combination of ARIMA can produce lower forecast errors than the best ARIMAX combination for a particular analysis (Gaiya, 2016; Durka & Pastorekova, s.d.). To summarize, Google Trends data and weather conditions are not very useful as additional independent variables in the baseline model, because the historical data already incorporate the seasonality and the additional information seems to blur this seasonal pattern. In other words, they offer no additional value over the historical data. However, Google Trends data can offer significant improvements when applied in the right way, i.e. when they are used to exclude extreme order quantities from the historical dataset and thereby make the base forecast more accurate in the future (similar to the promotional products).
A summary of the weighted MAPEs of this second research technique is presented in table 6.22.
Product type        Baseline model  Updated model  % Change
Flu products        36.04           26.07          -27.66
Sunscreens          63.77           42.22          -33.80
Mosquito products   121.32          50.36          -58.49
Insecticides        45.84           38.74          -15.49
Table 6.22: Summary of MAPEs for seasonal products
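The percentage changes in table 6.22 are relative reductions of the weighted MAPE. The following sketch recomputes them from the reported (rounded) MAPEs; a difference in the second decimal, such as -33.79 instead of -33.80 for sunscreens, presumably stems from rounding of the underlying values.

```python
def pct_change(baseline, updated):
    """Relative change of `updated` versus `baseline`, in per cent."""
    return 100 * (updated - baseline) / baseline

# Weighted MAPEs as reported in table 6.22: (baseline model, updated model).
TABLE_6_22 = {
    "Flu products": (36.04, 26.07),
    "Sunscreens": (63.77, 42.22),
    "Mosquito products": (121.32, 50.36),
    "Insecticides": (45.84, 38.74),
}

for product, (base, updated) in TABLE_6_22.items():
    print(f"{product:<18} {pct_change(base, updated):7.2f} %")
```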
Google Trends data seem to be more useful for mosquito products and sunscreens. This may be because these selections contain more new products and, consequently, less historical data than the other seasonal product types (Table 6.23). Higher benefits are expected from using external trends data to exclude extreme peaks when new products make up more than 50 per cent of the product selection. In the following chapter, the impact on safety stocks and the corresponding costs is quantified for all seasonal product types; the benefits are expected to be higher for mosquito products and sunscreens. This will clarify whether Multipharma should make the effort of including the additional trends data in the baseline model for seasonal products: the time and cost to extract the trends data from Google have to be weighed against the cost savings of the reduced safety inventory.
Product  New SKUs  Total SKUs (N)  % New products
Flu 9 199 4.52
Sunscreen 161 288 55.90
Mosquito 16 30 53.33
Insecticides 15 45 33.33
Table 6.23: Percentage of new products
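The 50 per cent rule of thumb can be applied mechanically to table 6.23. The sketch below, with hypothetical function names, computes the share of new SKUs per seasonal product type and lists the types above the threshold.

```python
# (new SKUs, total SKUs) per seasonal product type, as reported in table 6.23.
SKU_COUNTS = {
    "Flu": (9, 199),
    "Sunscreen": (161, 288),
    "Mosquito": (16, 30),
    "Insecticides": (15, 45),
}
NEW_SHARE_THRESHOLD = 50.0  # per cent; the rule of thumb suggested in the text

def new_product_share(new_skus, total_skus):
    """Share of new SKUs in a product selection, in per cent."""
    return 100 * new_skus / total_skus

def candidates_for_peak_exclusion(counts, threshold=NEW_SHARE_THRESHOLD):
    """Product types whose share of new SKUs exceeds the threshold."""
    return sorted(p for p, (new, total) in counts.items()
                  if new_product_share(new, total) > threshold)

for product, (new, total) in SKU_COUNTS.items():
    print(f"{product:<13} {new_product_share(new, total):6.2f} %")
print(candidates_for_peak_exclusion(SKU_COUNTS))
```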
Chapter 7
Inventory
According to the Supply Chain department, the ordering cost is negligible compared with the large holding costs (D.V. Belle, personal communication, April 7, 2017). As defined in chapter 3, section 3.3, the bottleneck is the available warehouse space. Multipharma uses a periodic review system, in which the inventory level is reviewed after a fixed period of time T and an order is placed such that the current inventory plus the replenishment lot size equals a prespecified order-up-to level (OUL), in line with the theory of Chopra and Meindl (2013) explained in section 2.4. The review interval T is the time between successive orders; at Multipharma it differs between product types. The downside of a periodic review system is that Multipharma cannot order just in time (JIT). However, it is the most suitable approach because Multipharma handles a very large number of products. A periodic review system allows Multipharma to plan deliveries and warehouse capacity. In contrast, a continuous review system can cause large peaks over time when many products suddenly run out of stock and require replenishment at the same time.
In a periodic review system the lot size is based on the review interval, the average demand over that period and the current stock27 (cf. Equation 2.1). Multipharma orders 1, 2, 4 or 8 weeks of stock for the product under consideration. This power-of-two policy is extremely useful because Multipharma deals with multiple products and multiple suppliers. Slow-moving items (8 weeks) can be grouped together with fast- and medium-moving items (1, 2 and 4 weeks) ordered from the same supplier. This facilitates truck sharing, consolidation of efforts, simplification of shipping schedules, etc. (Chopra & Meindl, 2013). The power-of-two policy allows Multipharma to stay within about 6 per cent of the optimal cost.
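The 6 per cent figure refers to the classical power-of-two bound. Under the textbook assumption of an EOQ-type cost per unit time C(T) = K/T + hT (K a fixed cost per order and h a holding cost rate; both are assumptions here, since Multipharma's ordering cost is stated to be negligible), the cost of a review period T relative to the optimum T* equals (T/T* + T*/T)/2. Rounding T* to the nearest allowed power of two on a logarithmic scale keeps T/T* between 1/√2 and √2, which bounds the penalty at about 6.07 per cent. A minimal Python sketch with hypothetical function names:

```python
import math

ALLOWED_PERIODS = (1, 2, 4, 8)  # review periods in weeks, as used by Multipharma

def round_to_power_of_two(t_star, allowed=ALLOWED_PERIODS):
    """Allowed review period closest to t_star on a logarithmic scale."""
    return min(allowed, key=lambda t: abs(math.log2(t / t_star)))

def relative_cost(t, t_star):
    """C(t) / C(t_star) for an assumed EOQ-type cost C(T) = K/T + h*T,
    whose optimum is t_star = sqrt(K/h)."""
    x = t / t_star
    return (x + 1 / x) / 2

# Worst case: the chosen period is off by a factor sqrt(2).
worst = relative_cost(math.sqrt(2), 1.0)
print(f"maximum cost penalty: {100 * (worst - 1):.2f} %")

# The bound holds for t_star between 1/sqrt(2) and 8*sqrt(2) weeks,
# i.e. within the range covered by the allowed periods.
for t_star in (1.3, 2.9, 5.5, 7.0):
    t = round_to_power_of_two(t_star)
    print(f"T* = {t_star:>4} -> T = {t} weeks, cost ratio {relative_cost(t, t_star):.4f}")
```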
The high holding cost due to the limited available warehouse space forces Multipharma to continuously optimize its average inventory. Keeping section 2.4 in mind, safety inventory can be reduced by lowering demand variability while keeping the CSL constant. This was exactly the aim of the previous chapter. When evaluating stock reduction, one should take into account that Multipharma is growing, which naturally has a nominal effect on the stock level. We can now explain the impact of the improved demand forecasting on the safety inventory. The impact on safety inventory and the corresponding carrying costs is discussed in section 7.1 and section 7.2, respectively.
27 If Multipharma did not sell as expected, i.e. if the current stock level is higher than expected, this is taken into account when calculating the size of the order. In addition, return flows or alternative sources can be taken into account.
7.1 Safety inventory
As described in chapter 2 (section 2.4), safety inventory is the inventory carried to satisfy demand that exceeds the forecasted demand. Hence, demand variability is one of the drivers of safety inventory. In reality, demand uncertainty cannot be eliminated; it is always present. However, demand uncertainty in the form of variability should be managed and kept to a minimum. Chapter 6 (Section 6.2 and Section 6.3) showed that the weighted MAPE, and therefore demand variability, was reduced significantly by using internal and external data sources. The outcome of the forecasting process is a demand distribution with appropriate parameters for each product. In this section the safety stock is calculated twice, once with the parameters of each of the two executed forecasts, by applying the general equation 2.8. Both the standard deviation and the mean of demand change when new data sources are added. The mean and standard deviation of the lead time are not influenced in this case study; they are calculated in SAS for each product separately based on historical data, which are updated every time there is a new delivery. The review interval (T) is retrieved from the SAP system and loaded into SAS. Likewise, the service level of each product can be obtained from the SAP system and is originally based on the ABC classification. In an ABC classification, purchased parts and materials are rank-ordered according to the annual dollar value spent on each (Hopp & Spearman, 2008). Once an item passes a certain threshold with regard to its weekly monetary value, it is assigned to a specific class with a corresponding service level that it should be able to fulfil. For products with normally distributed demand, the corresponding safety factors (z) can be calculated; they are presented in table 7.1.
ABC Classification  Service Level  Safety factor (z)
A  0.97  1.88
B  0.93  1.48
C  0.92  1.41
Table 7.1: Service level based on ABC classification
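The mapping from ABC class to service level and safety factor in table 7.1 can be sketched as follows. The class thresholds are hypothetical placeholders (the actual thresholds are maintained in SAP); the safety factor is the inverse of the standard normal distribution function at the service level, which reproduces the z values of table 7.1 when rounded to two decimals.

```python
from statistics import NormalDist

# Service levels per ABC class, as in table 7.1.
SERVICE_LEVEL = {"A": 0.97, "B": 0.93, "C": 0.92}

def abc_class(weekly_value, a_threshold, b_threshold):
    """Assign an ABC class from an item's weekly monetary value.
    The thresholds are hypothetical placeholders, not Multipharma's values."""
    if weekly_value >= a_threshold:
        return "A"
    if weekly_value >= b_threshold:
        return "B"
    return "C"

def safety_factor(csl):
    """z such that P(Z <= z) = CSL for a standard normal variable Z."""
    return NormalDist().inv_cdf(csl)

for cls, csl in SERVICE_LEVEL.items():
    print(cls, csl, round(safety_factor(csl), 2))
```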
For normally distributed products, the simpler equation 2.11 is used instead of equation 2.8 to calculate the safety stock. The standard deviation of demand during the period T + L is calculated using equation 2.3. However, for slow-moving items the demand distribution cannot be approximated by a normal distribution (Chopra & Meindl, 2013). Instead, a Poisson distribution with demand arriving at rate D is a better approach. Using equation 2.2, applicable when demand is independent and identically distributed, the average demand during T + L periods is given by $D_{T+L} = (T+L)\,D$. Hence, the safety stock of products with Poisson-distributed demand can be calculated using equation 2.8, where the OUL is the inverse of the cumulative Poisson distribution function with mean $D_{T+L}$, evaluated at the required CSL: $OUL = F^{-1}(CSL;\, D_{T+L})$.
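Both safety stock calculations can be sketched in Python. The sketch assumes a common textbook form of the periodic review formulas: for normal demand, $ss = z\,\sigma_{T+L}$ with $\sigma_{T+L} = \sqrt{(T+L)\sigma_D^2 + D^2 s_L^2}$, where $s_L$ is the standard deviation of the lead time; for Poisson demand, $OUL = F^{-1}(CSL;\, D_{T+L})$ and $ss = OUL - D_{T+L}$. The thesis' own equations 2.3, 2.8 and 2.11 may be stated differently, and all function names and inputs below are hypothetical.

```python
import math
from statistics import NormalDist

def sigma_protection(sigma_d, mean_d, t, l, s_l=0.0):
    """Std. dev. of demand over the protection interval T + L, assuming i.i.d.
    per-period demand and an independent lead time with std. dev. s_l."""
    return math.sqrt((t + l) * sigma_d ** 2 + (mean_d * s_l) ** 2)

def normal_policy(csl, mean_d, sigma_d, t, l, s_l=0.0):
    """Safety stock and order-up-to level for normally distributed demand."""
    ss = NormalDist().inv_cdf(csl) * sigma_protection(sigma_d, mean_d, t, l, s_l)
    oul = (t + l) * mean_d + ss
    return ss, oul

def poisson_inv_cdf(p, mu):
    """Smallest integer k with P(X <= k) >= p for X ~ Poisson(mu).
    math.exp(-mu) underflows for very large mu; fine for slow movers."""
    k, pmf = 0, math.exp(-mu)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= mu / k
        cdf += pmf
    return k

def poisson_policy(csl, mean_d, t, l):
    """Safety stock and OUL for Poisson demand at rate mean_d per period."""
    d_tl = (t + l) * mean_d           # D_{T+L} = (T + L) D
    oul = poisson_inv_cdf(csl, d_tl)  # OUL = F^{-1}(CSL; D_{T+L})
    return oul - d_tl, oul

def order_quantity(oul, inventory_position):
    """Lot size that raises the inventory position to the OUL."""
    return max(0.0, oul - inventory_position)

# Hypothetical inputs: weekly mean and std. dev. as in table 7.2, T = 4, L = 1 week.
ss, oul = normal_policy(0.97, 156.5409, 44.37782, t=4, l=1)
print(round(ss, 1), round(oul, 1), round(order_quantity(oul, 500), 1))
# Hypothetical slow mover: 1 unit per week, 8-week review, 1-week lead time.
print(poisson_policy(0.92, 1.0, t=8, l=1))
```

Because the Poisson OUL must be an integer, the resulting safety stock is generally not a multiple of a standard deviation; this is exactly why slow movers are treated separately.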
We consider products with a review period of 8 weeks as potentially slow-moving, Poisson-distributed items; all other products (i.e. with review periods of 4 weeks or less) are considered fast- or medium-moving, normally distributed items. More specifically, the demand distribution of each product can be determined with a Kolmogorov-Smirnov goodness-of-fit test for the normal distribution. This is particularly valuable for products with a review period of 8 weeks, for which it is uncertain whether they should be considered slow-moving. The null hypothesis of the Kolmogorov-Smirnov test is that demand is normally distributed. If the null hypothesis is rejected (i.e. p-value < 0.05), we assume the demand of the product is Poisson distributed. This resulted in about 30 per cent of the SKUs being classified as Poisson-distributed items. Figure 7.1 gives an example of the Kolmogorov-Smirnov test for a randomly selected product with normally distributed demand and a periodic review period of 4 weeks. Since the p-value in table 7.3 (0.150) exceeds 0.05, normality is not rejected for this product.
Figure 7.1: Distribution analysis of ‘Quantity’ of randomly selected SKU
Parameter Symbol Estimate
Mean Mu 156.5409
Std Dev Sigma 44.37782
Table 7.2: Parameters for Normal Distribution
Test Statistic p Value
Kolmogorov-Smirnov D 0.05210544 Pr > D 0.150
Table 7.3: Goodness-of-Fit tests for Normal Distribution
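The normality check can be approximated in plain Python. The sketch below computes the Kolmogorov-Smirnov statistic D against a normal distribution fitted to the data and, as a simplification, compares it with the large-sample Lilliefors 5 per cent critical value 0.886/√n, which accounts for estimating the mean and standard deviation from the same data. It does not reproduce the p-value reported by SAS in table 7.3, and the samples and function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(data):
    """Kolmogorov-Smirnov D against a normal distribution whose mean and
    (sample) standard deviation are estimated from the data."""
    xs = sorted(data)
    n = len(xs)
    sigma = stdev(xs)
    if sigma == 0:
        raise ValueError("constant series: normal fit undefined")
    dist = NormalDist(fmean(xs), sigma)
    d_plus = max((i + 1) / n - dist.cdf(x) for i, x in enumerate(xs))
    d_minus = max(dist.cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

def reject_normality(data, coef=0.886):
    """Approximate 5 % test: reject normality if D > coef / sqrt(n).
    0.886 / sqrt(n) is the large-sample Lilliefors critical value."""
    d = ks_statistic_normal(data)
    return d > coef / math.sqrt(len(data)), d

# Hypothetical fast mover: a sample that follows a normal shape closely.
sample = [NormalDist(150, 45).inv_cdf((i + 0.5) / 60) for i in range(60)]
# Hypothetical slow mover: mostly zero weeks and one large order.
slow = [0] * 20 + [100]
print(reject_normality(sample))
print(reject_normality(slow))
```

Products for which normality is rejected would then be handled with the Poisson approach described above.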
Table 7.4 to table 7.6 summarize the results obtained by coding the equations explained in section 2.4 in SAS Enterprise Guide, based on the demand parameters of the base forecast and the extended forecast for the promotional products. Table 7.7 to table 7.10 present the corresponding results for the seasonal products. Note that the results are presented at the IMS top level, because this allows a compact representation of a generally applicable classification. Seasonal products belong to only one IMS class, depending on the type of seasonal product. Moreover, the prioritization discussion in section 6.1 made clear that the Supply Chain department expected the most substantial benefits for the OTC and PEC classes. The tables show that the safety inventory for the OTC and PEC classes has been reduced significantly. In addition, improving the demand forecast accuracy also had a smaller positive impact on the safety inventory of some other classes, such as ATC and PAC.
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
1  ATC  448  63.18  64.06  33.46  38.17  60,426.27  61,140.21  1.18%
2  OTC  636  64.96  60.71  48.97  36.1  57,006.39  52,320.51  -8.22%
3  PEC  1220  26.16  19.36  31.19  13.44  27,682.43  17,641.81  -36.27%
4  PAC  75  24.33  22.9  22.59  14.39  17,018.38  16,266.51  -4.42%
5  NUT  57  14.96  15.92  12.94  9.99  16,538.5  16,151.7  -2.34%
6  OTH  28  13.12  12.17  13.22  8.3  10,995.12  10,351.69  -5.85%
Table 7.4: Impact on safety inventory for ‘Passage délégué’
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
1  ATC  67  564.27  120.83  278.01  83.66  146,194.78  92,823.42  -14.97%
2  OTC  152  142.79  106.81  129.45  70.86  508,108.21  250,502.04  -72.24%
3  PEC  97  79.09  71.83  119.89  56.58  95,633.56  82,249.61  -3.75%
4  PAC  10  74.22  68.12  98.25  41.55  233,278.6  207,316.32  -7.28%
5  NUT  4  254.16  252.21  321.39  320.63  506,012.12  502,181.18  -1.07%
6  OTH  33  40.6  11.75  28.38  12.11  12,612.71  10,166.26  -0.69%
Table 7.5: Impact on safety inventory for ‘Promo O’
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
2  OTC  97  11.98  10.68  10.38  8.35  23,085.37  18,992.9  -17.73%
3  PEC  404  19.63  14.8  20.11  13.3  21,672.16  17,070.83  -21.23%
4  PAC  12  12.53  12.72  9.67  8.77  21,759.62  21,253.8  -2.32%
6  OTH  2  33.06  10.21  88.1  11.61  16,837.4  3,439.35  -79.57%
Table 7.6: Impact on safety inventory for ‘Action iU’
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
1  ATC  198  57.78  56.19  40.49  31.94  48,790.20  47,554.96  -2.53%
Table 7.7: Impact on safety inventory for flu-related products
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
3  PEC  259  20.43  15.17  20.75  15.45  44,431.74  33,832.37  -23.86%
Table 7.8: Impact on safety inventory for sunscreens
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
3  PEC  29  29.16  17.65  59.63  17.19  24,996.95  12,533.28  -49.86%
Table 7.9: Impact on safety inventory for mosquito products
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  SS Base model  SS incl. data  % Change
6  OTH  32  30.43  30.42  32.93  33.17  17,911.13  17,654.16  -1.43%
Table 7.10: Impact on safety inventory for insecticides
In line with the results obtained above, it is important to reconsider figure 2.5 (Section 2.4). From that section we know that reduced demand variability lowers the safety inventory. However, excluding outliers from the data also reduces the mean demand (D), since the outliers inflated the normal demand. Hence, equation 2.1 shows that the analysis also affects the lot size (Q), which reduces the cycle inventory as well. Equation 2.8 makes clear that our analysis has a significant impact on the OUL, assuming that the periodic review period (T) and mean lead time (L) remain the same. What follows focuses on the reduction of the carrying cost due to the reduction of the previously calculated safety inventory, because this is the main focus of this master dissertation (supra, section 2.4).
Figure 7.2: Revised periodic review policy
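The effect of a lower mean demand on the cycle inventory can be illustrated with a back-of-the-envelope calculation. In a periodic review system the lot size is roughly the mean demand per period times T, and the average cycle inventory is about half of that. The sketch below reads the PEC means of table 7.4 as weekly demand and assumes a 4-week review period; both are illustrative assumptions, not results of this study.

```python
def cycle_inventory(mean_demand_per_period, review_period):
    """Average cycle inventory of a periodic review policy: half the lot size,
    with the lot size approximated by mean demand per period times T."""
    return mean_demand_per_period * review_period / 2

# Hypothetical illustration: PEC means of table 7.4 read as weekly demand,
# with an assumed review period of 4 weeks.
before = cycle_inventory(26.16, 4)
after = cycle_inventory(19.36, 4)
print(round(before, 2), round(after, 2), f"{100 * (after - before) / before:.1f} %")
```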
7.2 Impact on costs
Inevitably, holding safety stock costs money; as a consequence, reduced safety inventory lowers the money tied up in inventory. From the beginning of this chapter, we know that the holding cost is Multipharma's most important cost component. The holding or carrying cost can be broken down into four categories: capital costs (or financing charges), storage space costs, inventory service costs and inventory risk costs. The cost of capital, usually measured as the weighted average cost of capital (WACC) and charged on the money invested in inventory, is the leading factor in determining the carrying cost. The standard rule of thumb puts the annual carrying cost at 25 per cent of the inventory value on hand (Vermorel, s.d.). The annual cost savings resulting from the reduction in safety inventory can be calculated from the results obtained above: the safety inventory value on hand is obtained by multiplying the safety stock level of each product by its price, taken from Multipharma's SAP system. Comparing the annual carrying costs of the baseline model and the extended model yields the annual savings of improved demand forecast accuracy, which are presented in the following tables. As expected from the previous section and the conclusion of section 6.3, including additional data sources for flu-related products does not produce cost savings; the carrying cost even increases slightly (Table 7.14). In addition, including the data sources for insecticides yields only limited annual cost savings of 1,244.43 €. Hence, for these two product types we conclude that Multipharma should not make the effort of capturing, analyzing and adding the additional data to the baseline model. Nevertheless, using the data in a smart way for the other product selections resulted in overall yearly cost savings of 959,960.69 €.28
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
1  ATC  448  63.18  64.06  33.46  38.17  80,257.45  16,486.59  -79.46%
2  OTC  636  64.96  60.71  48.97  36.10  83,153.48  17,697.96  -78.72%
3  PEC  1220  26.16  19.36  31.19  13.44  50,203.32  6,380.45  -87.29%
4  PAC  75  24.33  22.90  22.59  14.39  25,811.25  6,355.57  -75.38%
5  NUT  57  14.96  15.92  12.94  9.99  26,081.36  4,462.63  -82.89%
6  OTH  28  13.12  12.17  13.22  8.30  18,037.43  3,318.49  -81.60%
Table 7.11: Impact on cost for ‘Passage délégué’
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
1  ATC  67  564.27  120.83  278.01  83.66  576,533.07  490,923.58  -14.85%
2  OTC  152  142.79  106.81  129.45  70.86  661,404.38  211,265.18  -68.06%
3  PEC  97  79.09  71.83  119.89  56.58  156,934.54  138,746.41  -11.59%
4  PAC  10  74.22  68.12  98.25  41.55  207,317.11  183,085.39  -11.69%
5  NUT  4  254.16  252.21  321.39  320.63  1,236,594.13  1,227,182.95  -0.76%
6  OTH  33  40.60  11.75  28.38  12.11  237,731.32  164,142.18  -30.95%
Table 7.12: Impact on cost for ‘Promo O’
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
2  OTC  97  11.98  10.68  10.38  8.35  23,085.37  18,992.9  -17.73%
3  PEC  404  19.63  14.8  20.11  13.3  21,672.16  17,070.83  -21.23%
4  PAC  12  12.53  12.72  9.67  8.77  21,759.62  21,253.8  -2.32%
6  OTH  2  33.06  10.21  88.1  11.61  16,837.4  3,439.35  -79.57%
Table 7.13: Impact on cost for ‘Action iU’
28 Note that the overall cost savings are even higher, since the money tied up in cycle inventory also decreases due to the smaller lot size.
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
1  ATC  198  57.78  56.19  40.49  31.94  68,433.04  68,584.96  0.22%
Table 7.14: Impact on cost for flu-related products
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
3  PEC  259  20.43  15.17  20.75  15.45  84,785.30  63,932.18  -24.60%
Table 7.15: Impact on cost for sunscreens
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
3  PEC  29  29.16  17.65  59.63  17.19  26,342.71  14,467.96  -45.08%
Table 7.16: Impact on cost for mosquito products
IMS Top  IMS Label  N Obs  MEAN Base model  MEAN incl. data  STDDEV Base model  STDDEV incl. data  COST (€) Base model  COST (€) incl. data  % Change
6  OTH  32  30.43  30.42  32.93  33.17  56,320.57  55,076.14  -2.21%
Table 7.17: Impact on cost for insecticides
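The carrying cost comparison described at the start of this section reduces to a few lines: value each product's safety stock at its price, apply the 25 per cent rule of thumb, and subtract the extended model's cost from the baseline model's cost. The products, safety stocks and prices below are hypothetical, and the SAS Enterprise Guide code used in this study may differ in detail.

```python
CARRYING_RATE = 0.25  # rule of thumb: 25 % of inventory value on hand per year

def annual_carrying_cost(safety_stocks, prices, rate=CARRYING_RATE):
    """Annual carrying cost of the safety inventory: rate * sum(ss_i * price_i)."""
    if len(safety_stocks) != len(prices):
        raise ValueError("one price per product is required")
    return rate * sum(ss * p for ss, p in zip(safety_stocks, prices))

def annual_savings(ss_base, ss_extended, prices, rate=CARRYING_RATE):
    """Carrying cost of the baseline model minus that of the extended model."""
    return (annual_carrying_cost(ss_base, prices, rate)
            - annual_carrying_cost(ss_extended, prices, rate))

# Hypothetical products: safety stocks (units) and unit prices (EUR).
ss_base = [120, 40, 300]
ss_extended = [90, 38, 210]
prices = [4.5, 12.0, 2.5]
print(annual_carrying_cost(ss_base, prices),
      annual_carrying_cost(ss_extended, prices),
      annual_savings(ss_base, ss_extended, prices))
```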
Chapter 8
Conclusion
From the literature it was clear that big data analytics is emerging in the OM area due to increasing pressure on companies to boost efficiency and fulfil higher customer service levels. Researchers confirm that companies that describe themselves as more data-driven perform better on objective measures of financial and operational results (McAfee & Brynjolfsson, 2012). Like many organizations, Multipharma noticed that the data in its possession, and especially how it makes use of them, can create a competitive advantage. It is important that these big data, coming from its SAP system, Excel sheets, etc., are managed and analyzed in an appropriate way. This is why this case study started with qualitative interviews with the Supply Chain department to draw up and prioritize a list of potential demand drivers. Afterwards, the necessary information was captured to model the promotional and seasonal demand drivers. The promotional events were modelled and added to the baseline model using SAS code. These data served as input to SAS Forecast Studio, which executed forecasts for the selections of products. Moreover, while the forecasts were executed, continuous feedback from the Supply Chain department was necessary to understand the dynamics of certain time series and to override some forecasts based on alerting (cf. Section 5.3). As already recognized by Hopp and Spearman (2008), forecasting is more than selecting the right model and choosing the appropriate parameters. Equally important is obtaining qualitative information from the forecaster to potentially override the quantitative model. The combination of qualitative and quantitative techniques resulted in a reduced weighted MAPE for the three types of promotions, which confirmed the hypothesis that additional internal data in the form of semi-structured Excel files improves the demand forecast accuracy. The events were used to exclude demand variability, in the form of volatile order quantities, in order to obtain a more accurate forecast for the future.
Afterwards, a more extensive study of seasonality was executed, evaluating two hypotheses. Based on the qualitative interviews, it was decided to use Google Trends data and weather conditions in an attempt to improve the demand forecast accuracy of some seasonal product types. The first hypothesis investigated whether the additional data sources could be added to the baseline model as explanatory variables29 to improve the predictive power, using advanced ARIMAX models. However, this technique did not deliver the expected results because SAS Forecast Studio already captured the seasonality using Holt-Winters' seasonal exponential smoothing and seasonal ARIMA models. Although the additional independent variables improved the predictive power of the ARIMAX models, the seasonal models were still superior and hence, the additional variables were irrelevant. Moreover, when using additional data sources, the researcher should be aware of the issue of overfitting ("More data is not always better", 2015). Mathematically, adding variables can lead to a model with a better fit to the data that was used to train it. However, using too many variables tends to lead to overfitting: the model involves so many predictors that it becomes too specific to the precise historical data it was trained on, and therefore misleads you as to the real drivers behind the target variable. The second hypothesis was based on the same approach as for the promotional events: Google Trends data were used to filter extreme peaks out of the historical dataset by setting a threshold value above which the order quantities were excluded, in an attempt to improve the forecast accuracy. This technique proved very useful when new products made up more than 50 per cent of the entire product selection, so that less historical data were available.
Knowing that the techniques described above were able to reduce demand uncertainty, and relating this to the theory described in section 2.4, the impact of the improved demand forecast accuracy on the safety inventory and carrying costs was investigated using the corresponding formulas from that section. As expected, the cost savings were significant for all types of promotional demand drivers and for two out of four seasonal product types (those with a high share of new products). Hence, managing and analyzing the data in a smart way resulted in overall cost savings of 959,960.69 € per year. It is clear that manual interventions for these types of products were too time-consuming and costly, and should be limited by using advanced big data analytics. Therefore, the advice to Multipharma is to capture the required data sources in a more structured way to improve automation, and to use these data in a smart way to improve the demand forecast accuracy. It should be noted, though, that extracting Google Trends data has to occur manually because there is no automated procedure to obtain the required search results. Hence, the time and cost to extract these data should be weighed against the cost savings of adding the information. The analysis described above makes clear that extracting these data is worth the effort when dealing with a high percentage of new products.
29 Note that all main and interaction effects of Google Trends, precipitation and average temperature were evaluated using a stepwise linear regression to obtain the final model that could be used in SAS Forecast Studio.
To summarize, big data analytics offers a significant benefit for Multipharma, because adding internal and external data sources to the existing automated demand forecasting, which is primarily based on historical order quantities, reduces the forecast error and, more importantly, the safety inventory and corresponding carrying costs. In general, using advanced forecasting capabilities (e.g. modelling events, using advanced technologies, etc.) as described in section 2.3 can transform a company into a leader within its industry, creating a competitive advantage over its competitors. Using a company's internal and external data sources in a smart way to improve demand forecast accuracy may have a direct impact on its inventory management, reducing its safety stock and carrying costs. Hence, the insights of this case study can be extended to companies in other industries that feel the same pressure to be cost-efficient and fulfil high customer service levels. More specifically, this research is particularly valuable for wholesale companies with a high level of product variety that are searching for more advanced techniques to better match supply and demand (in most cases, orders from retailers). This case study contributes to the literature by showing that data-driven demand prediction reduces the gap between supply and demand and has a positive impact on safety inventory, thereby raising a company's overall profitability.
Chapter 9
Further Research
The supply chain to which Multipharma belongs is a 'multi-echelon' or 'multi-level' production and distribution network. This implies that products pass through more than one stage before reaching the final customer. Multipharma serves as an intermediate storage point between the suppliers of pharmaceutical products and the retailers. Its inventory allows 'risk pooling' among the retailers and facilitates redistribution of retailers' inventories that might grow out of balance (Nahmias, 2009). In such a multi-echelon supply chain, all stages have to work toward the objective of maximizing total supply chain profitability (Chopra & Meindl, 2013). Although this seems evident, in practice multi-echelon supply chain optimization is a major challenge and a largely unexplored area for companies.
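The benefit of the risk pooling mentioned above can be quantified with a textbook calculation: for normally distributed demand at n locations, holding stock at one central point requires $z\sqrt{\sum_i \sigma_i^2 + \text{covariance terms}}$ instead of $\sum_i z\,\sigma_i$. The sketch below assumes an equal pairwise correlation ρ between locations; the function names and figures are hypothetical.

```python
import math
from statistics import NormalDist

def decentralized_safety_stock(csl, sigmas):
    """Each location holds its own safety stock z * sigma_i."""
    z = NormalDist().inv_cdf(csl)
    return sum(z * s for s in sigmas)

def pooled_safety_stock(csl, sigmas, rho=0.0):
    """Safety stock when demand is served from one stocking point, assuming
    normal demand with equal pairwise correlation rho between locations."""
    z = NormalDist().inv_cdf(csl)
    variance = sum(s * s for s in sigmas)
    variance += rho * sum(si * sj for i, si in enumerate(sigmas)
                          for j, sj in enumerate(sigmas) if i != j)
    return z * math.sqrt(variance)

# Hypothetical: four pharmacies with a weekly demand std. dev. of 10 units each.
sigmas = [10, 10, 10, 10]
for rho in (0.0, 0.5, 1.0):
    print(rho,
          round(decentralized_safety_stock(0.97, sigmas), 1),
          round(pooled_safety_stock(0.97, sigmas, rho), 1))
```

With independent demand (ρ = 0), pooling four identical locations halves the safety stock; the advantage disappears as ρ approaches 1.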
According to a presentation by Desmet (2017), only 27 per cent of all companies make use of advanced planning systems. Hence, Multipharma belongs to a minority of companies that use advanced analytics to optimize their demand forecasting, planning and inventory systems. An even smaller percentage (13%) of companies has evolved to multi-echelon supply chain optimization (Figure 9.2). With multi-echelon supply chain
optimization it is possible to minimize inventory levels across the different echelons of the supply chain. A central decision-maker determines all replenishment decisions in the network based on continuously or periodically updated information about all inventories of all products at all relevant facilities and production stages (Federgruen, 1993). As a consequence, investment in transportation and information infrastructure is required to facilitate the effective flow of goods and information. The product should be available when customers need it. Hence, a very responsive replenishment system along with an excellent information system is required. Sharing information across the supply chain improves the utilization of supply chain assets and the coordination of supply chain flows. However, more information is not always better, because the cost and complexity of the infrastructure and the analysis increase sharply. A trade-off between complexity and value is therefore important when designing the information flow, and shared information must go along with good supply chain coordination (Chopra & Meindl, 2013). The joint report of SAS and Purdue University (2008) shows the top five actions to improve demand management, which are clearly aligned (Figure 9.1). It is remarkable that improved collaboration to create forecasts is listed as the top strategy. According to Desmet (2017), moving from a single-echelon optimization system to an inter-organizational multi-echelon optimization system might reduce inventory by 40 to 60 per cent (cf. Figure 9.2). Moreover, the report of SAS and Purdue University states that forecasting at the customer level is another key strategy to improve demand management. Hence, improving the forecast at the lower levels of the supply chain (e.g. by using additional data sources) might be a strategy to improve overall demand management in the supply chain.
Figure 9.1: Key strategic actions to improve demand management
Source: Purdue University & SAS Demand Management Survey 2008 © All rights reserved
Figure 9.2: How companies define safety stocks
Source: Slideshare of B. Desmet (Solventure) © All rights reserved
Based on this information, it can be concluded that Multipharma might benefit from moving to a multi-echelon system and creating more visibility and coordination across its supply chain. As discussed in chapter 2 (Section 2.4), inventory is an important cost component, and reducing inventory across the entire supply chain might benefit all partners involved in delivering the product to the end consumer. The network design of Multipharma, as represented in Figure 3.3, shows that Multipharma might improve demand and inventory management, and consequently lower its carrying costs, by sharing information directly with its suppliers and especially with its pharmacies (POS). As explained in chapter 3, the SAP system should capture and share global information within the company and across its supply chain. However, the latter is not yet fully in place, and Multipharma currently optimizes its warehouse system according to a single-echelon approach. The global information should be used in a smart way to increase overall supply chain revenue. Hence, the quality of operational decisions should improve based on real-time information and optimization. Combining this with a better (data-driven) demand forecast at the level of the pharmacies (e.g. by incorporating external data sources such as pharmacy location, customer demographics, etc.) might be an interesting topic for future research.
References
Accenture. (2008, December 11). Most U.S. Companies Say Business Analytics Still Future Goal, Not Present Reality.
Retrieved from https://newsroom.accenture.com/article_display.cfm?article_id=4777#rel
Accenture. (2016, December 5). Guest Lecture Big Data [Lecture slides].
Retrieved from http://minerva.ugent.be/courses2016/F00070102016/document/Invited_lectures/Guest_Lecture_-_Big_Data_-_UGent_-_20161205.pdf?cidReq=F00070102016
Alvarez, P. (2015). Product Classification in Healthcare.
Retrieved from http://www.gs1.org/sites/default/files/docs/healthcare/product_classification_in_healthcare.pdf
Billah, B., King, L. M., Snyder, D. R., & Koehler, B. A. (2006). Exponential smoothing model selection for forecasting. International Journal of Forecasting, 22, 239–247.
Retrieved from http://www.sciencedirect.com/science/article/pii/S016920700500107X
Bose. (2009). Advanced analytics: opportunities and challenges. Industrial Management & Data Systems, 109(2), 155–172.
Retrieved from http://www.emeraldinsight.com/doi/abs/10.1108/02635570910930073
Brynjolfsson, E., & McAfee, A. (2012, October). Big Data: The Management Revolution.
Retrieved from https://hbr.org/2012/10/big-data-the-management-revolution
Cachon, G., & Fisher, M. (1997). Campbell's soup's continuous replenishment program:
evaluation and enhanced inventory decision rules (Vol. 6, No. 3).
Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/j.1937-5956.1997.tb00430.x/pdf
Chen, M., Mao, S., & Lin, Y. (2014). Big data: A Survey.
Retrieved from https://link.springer.com/article/10.1007/s11036-013-0489-0
Chopra, S., & Meindl, P. (2001). Supply Chain Management - Strategy, Planning and Operation (6th ed.). Harlow, England: Pearson Education Limited.
Coghlan, T., Diehl, G., Karson, E., Liberatore, M., Luo, W., Nydick, R., Pollack-Johnson, B., & Wagner, W. (2010). The current state of analytics in the corporation: The view from industry leaders. Internat. J. Bus. Intelligence Res. Forthcoming.
Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of
Winning (Rev. ed.). Brighton, US: Harvard Business Press.
Desmet, B. (2017, February 13). Safety Stock Modelling in Multi-Echelon Supply Chains [Lecture slides].
Retrieved from http://minerva.ugent.be/courses2016/F00071002016/document/Slides/CH5_SCM.pdf?cidReq=F00071002016
Durka, P., & Pastoreková, S. (n.d.). ARIMA vs. ARIMAX – which approach is better to
analyze and forecast macroeconomic time series? (Proceedings of 30th International
Conference Mathematical Methods in Economics). Retrieved from
http://mme2012.opf.slu.cz/proceedings/pdf/024_Durka.pdf
Eazystock. (2015). How to Calculate Safety Stock for Inventory Management [Whitepaper].
Retrieved from http://www.eazystock.com/wp-content/uploads/2015/05/EazyStock_How-to-Calculate-Safety-Stock-White-Paper.pdf
Emani, C. K., Cullot, N., & Nicolle, C. (2015). Understandable Big Data: A survey (pp. 70–81).
Retrieved from http://www.sciencedirect.com/science/article/pii/S1574013715000064
Eysenbach, G. (2006). Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance.
Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839505/
Federgruen, A. (1993). Handbooks in Operations Research and Management Science.
Retrieved from https://www.researchgate.net/publication/238655665_Chapter_3_Centralized_planning_models_for_multi-echelon_inventory_systems_under_uncertainty
Fildes, R. (1989). Evaluation of Aggregate and Individual Forecast Method Selection Rules (pp. 1056–1065).
Retrieved from http://pubsonline.informs.org/doi/pdf/10.1287/mnsc.35.9.1056
Fritsch, D. (2015, August 3). 6 Inventory Control Techniques for Stock Optimization [Blog post].
Retrieved from http://www.eazystock.com/blog/2015/08/03/6-inventory-control-techniques-for-stock-optimization/
Gaiya, A. (2016, March 11). Basic knowledge about time series econometrics [Blog comment].
Retrieved from https://www.quora.com/How-accurate-is-time-series-forecasting-with-ARIMA
Ginsberg, J., Mohebbi, H. M., Patel, S. R., Brammer, L., Smolinski, S. M., & Brilliant, L.
(2009). Detecting influenza epidemics using search engine query data.
Retrieved from https://www.nature.com/nature/journal/v457/n7232/full/nature07634.html
Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M., & Watts, D. (2010). Predicting consumer behavior with Web search.
Retrieved from http://www.pnas.org/content/107/41/17486.abstract
Harrington, L. (1996), “Untapped savings abound”, Industry Week, 15 July, pp. 53-8.
Hillier, F. S., & Lieberman, G. J. (2015). Introduction to Operations Research (10th ed.). New York, United States: McGraw-Hill.
Hopp, W. J., & Spearman, M. L. (2008). Factory Physics (3rd ed.). Chicago, Waveland:
McGraw-Hill.
IBM. (n.d.). Bringing big data to the enterprise. Retrieved November 15, 2016, from
https://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
Jaume, B. (2015). Analytics and the art of modeling (pp. 429–471).
Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/itor.12165/full
King, G. (2016, March 17). Big Data is Not About the Data! [Slideshare].
Retrieved from http://gking.harvard.edu/files/gking/files/evbase-ufl.pdf?m=1457749684
King, P. L. (2011). Crack The Code: Understanding safety stock and mastering its equations.
Retrieved from http://web.mit.edu/2.810/www/files/readings/King_SafetyStock.pdf
Klous, S., & Wielaard, N. (2014). Wij zijn Big Data - De toekomst van de
informatiesamenleving. Amsterdam, Nederland: Business Contact.
Knilans, E. (2014, July 29). The 5 Vs of Big Data [Blog post].
Retrieved from http://blogging.avnet.com/ts/advantage/2014/07/the-5-vs-of-big-data/
Knuth, C., Fritsch, D., Seidel, D., Hallin, J., & Bendis, M. (2014, October 27). Forecasting Accuracy: How to Manage Demand Outliers [Blog post].
Retrieved from http://www.eazystock.com/blog/2014/10/27/manage-outliers-forecasting-accuracy/
LaValle, S., Lesser, E., Shockley, R., Hopkins, M., & Kruschwitz, N. (2011). Big Data, Analytics and the Path From Insights (Vol. 52, No. 2).
Retrieved from http://www.ttivanguard.com/realtime/bigdata.pdf
Lee, J., Kao, H., & Yang, S. (2014). Service innovation and smart analytics for Industry 4.0 and big data environment.
Retrieved from http://ac.els-cdn.com/S2212827114000857/1-s2.0-S2212827114000857-main.pdf?_tid=f5bd7d72-178e-11e7-9b6b-00000aacb35f&acdnat=1491129023_732ecffad84ccab02d0e9cbbd2a24e3b
Li, J., Cheng, Y., & Zhao, L. (2015). Big Data in product lifecycle management (pp. 667–684).
Retrieved from http://link.springer.com/article/10.1007/s00170-015-7151-x
Liberatore, M. J., & Luo, W. (2010). The Analytics Movement: Implications for Operations Research (pp. 313–324).
Retrieved from http://pubsonline.informs.org/doi/abs/10.1287/inte.1100.0502
Lohr, S. (2015). Dit is big data wat het is, hoe het werkt en wat het oplevert. Amsterdam,
Nederland: Maven Publishing.
Madden, S. (2012). From Databases to Big Data.
Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6188576
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, H. A. (2011). Big data: The next frontier for innovation, competition, and productivity.
Massaal veel muggen deze zomer. (2016, June 10).
Retrieved from http://www.hln.be/hln/nl/2764/milieu/article/detail/2729957/2016/06/10/Massaal-veel-muggen-deze-zomer.dhtml
Mazzocchi, F. (2015). Could Big Data be the end of theory in science? EMBO Reports.
Retrieved from http://embor.embopress.org/content/early/2015/09/10/embr.201541001.abstract
McKinsey & Company. (2015). Industry 4.0. How to navigate digitization of the manufacturing sector.
Retrieved from https://www.mckinsey.de/files/mck_industry_40_report.pdf
More data is not always better data. (2015, November 17).
Retrieved from https://dabblingwithdata.wordpress.com/2015/11/17/more-data-is-not-always-better-data/
Nahmias, S. (2009). Production & Operations Analysis (6th ed.). New York, US: McGraw-Hill/Irwin.
Polgreen, M. P., Chen, Y., Pennock, M. D., & Nelson, D. F. (2008). Using internet searches for influenza surveillance.
Retrieved from https://academic.oup.com/cid/article/47/11/1443/282247/Using-Internet-Searches-for-Influenza-Surveillance
Provideor. (2015). Supply chain platform: blueprint - central warehouse phase.
Purdue University & SAS. (2008). Demand Planning Maturity Model Strategies for Demand-Driven Forecasting and Planning [Whitepaper].
Retrieved from https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/demand-planning-maturity-model-103898.pdf
Ranst, M. V. (2015, February 16). "Vijf tot zes keer meer griepgevallen dan vorig jaar".
Retrieved from http://www.goedgevoel.be/gg/nl/102/Griep/article/detail/2220182/2015/02/16/Marc-Van-Ranst-Vijf-tot-zes-keer-meer-griepgevallen-dan-vorig-jaar.dhtml
SAS Institute Inc. (2009). How can finance and operations work together to maximize inventory provisions while minimizing working capital costs?
Retrieved from https://www.sas.com/content/dam/SAS/en_us/doc/solutionbrief/finance-operations-work-together-maximize-inventory-provisions-104268.pdf
SAS Institute Inc. (2014). SAS® Forecast Studio 13.2: User's Guide. Cary, NC, USA: SAS Institute Inc.
Shah, N. (2004). Pharmaceutical supply chains: key issues and strategies for optimisation
(Volume 28, Issues 6–7).
Retrieved from http://www.sciencedirect.com/science/article/pii/S0098135403002333
Supply chain als wissel: Multipharma creëert supply chain organisatie. (2012).³⁰
Retrieved from http://doczz.nl/doc/373808/supply-chain-als-wissel
Van der Zee, B., & van der Zee, W. (2016). Succes met big data. Culemborg, Nederland: Van
Duuren Informatica.
Vermorel, E. (n.d.). Inventory costs (Ordering costs, Carrying costs). Definition and formula.
Retrieved from https://www.lokad.com/definition-inventory-costs
Waller, M. A., & Fawcett, S. A. (2013). Data Science, Predictive Analytics, and Big Data:
A Revolution That Will Transform Supply Chain Design and Management (p. 77-84).
Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/jbl.12010/full
Wolfe, B., Leonard, M., & Fahey, P. (n.d.). Introducing SAS® Forecast Studio.
Retrieved from http://www2.sas.com/proceedings/sugi30/193-30.pdf
Yin, S., & Kaynak, O. (2015). Big Data for Modern Industry: Challenges and Trends (Vol.
103, No. 2). Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7067026
Zhang, P. G. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175.
Retrieved from http://www.sciencedirect.com/science/article/pii/S0925231201007020
30 The original link has been removed, probably because the article contained (over-optimistic) information about Multipharma's way of working that was not meant to be public. Note that this article was discussed in full with the current Supply Chain Manager; the information in this master dissertation is therefore more valuable.
Appendix A
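Table A.1: IMS top level classification
Code (IMS top level) | Description
1 | ATC (Available To Counter)
2 | OTC (Over The Counter)
3 | PEC (Parapharmacy)
4 | PAC (Parapharmacy Accessories)
5 | NUT (Nutrition)
6 | OTHER

Table A.2: APB Classification
Code | Description
A | Accessories (Accessoires)
B | Bandages and dressings (Bandage et Pansement)
C | Cosmetics (Cosmétique)
D | Dietetics - Nutrition - Food (Diététique - Nutrition - Alimentation)
E | Hygiene
F | Agricultural pesticides (Pesticide à usage agricole)
G | Medical devices (dispositif médical)
H | Homeopathy (Homeopathie)
I | Ostomy and incontinence (Stomie et Incontinence)
K | Biocides (Biocide)
M | Raw materials (Matière première)
O | Other (Autres)
P | Miscellaneous, APB stamp (Divers timbre APB)
R | Reagents (Reactif)
S | Specialities (Spécialités)
T | Diagnostic aids (Moyen diagnostique)
V | Veterinary (Veterinaire)
Z | Other, without CNK check (Autres (sans contrôle CNK))

Table A.3: GSTAT Classification
Code | Description
100 | Specialities (Specialités)
101 | Base materials (Matières de base)
102 | Miscellaneous, medical (Varia (Medical))
103 | Peripharmaceuticals for external use (Peripharmaceutical Usage Externe)
104 | Peripharmaceutical accessories (Peripharmaceutical Accessoires)
105 | Peripharmaceuticals for internal use (Peripharmaceutical Usage Interne)
106 | Homeotherapy and other therapies (Homeotherapie & Autres Therapies)
107 | Bandages (Bandagisterie)
108 | Miscellaneous (Divers)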
Appendix B
Exponential Smoothing³¹
SAS Forecast Studio considers all kinds of exponential smoothing models (ESMs) that capture common features of time series, such as trend and seasonal effects. In other words, the best model is selected based on the time series data; it is derived from simple exponential smoothing, trend-corrected exponential smoothing or Holt-Winters' seasonal exponential smoothing (SAS Institute Inc., 2014). Using the simple ESM, the current forecast (i.e. the one-step-ahead forecast for period t made in period t-1) is estimated as a weighted average of the last forecast and the current value of demand (Nahmias, 2009). That is,
$$\text{New forecast} = \alpha \cdot (\text{current observation of demand}) + (1-\alpha) \cdot (\text{last forecast}).$$
In symbols,
$$F_t = \alpha D_{t-1} + (1-\alpha) F_{t-1} \qquad \text{(B.1)}$$
where $F_t$ is the forecast for period $t$, $D_{t-1}$ is the observed demand in period $t-1$ and $\alpha$ is a smoothing constant between 0 and 1 chosen by the user. The forecaster can attach larger weights to more recent observations than to observations from the distant past (large $\alpha$) or vice versa (small $\alpha$), which is exactly why these kinds of models are chosen. The best value depends on the particular data (Hopp & Spearman, 2008). SAS Forecast Studio automatically selects the most appropriate parameters for any exponential smoothing method by minimizing a chosen error criterion (e.g. MAPE).
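The recursion in equation B.1 and the automatic choice of the smoothing constant can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure SAS Forecast Studio actually runs: the recursion is initialised with $F_1 = D_1$, $\alpha$ is chosen from a hypothetical grid (0.05 to 0.95), and the error criterion is the standard MAPE, skipping zero-demand periods. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def ses_forecasts(demand: Sequence[float], alpha: float) -> List[float]:
    """One-step-ahead forecasts following equation B.1.

    forecasts[t] is the forecast for period t made at the end of period t-1:
        F_t = alpha * D_{t-1} + (1 - alpha) * F_{t-1}
    Assumption (not from the source): the recursion starts with F_1 = D_1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    forecasts = [float(demand[0])]
    for t in range(1, len(demand)):
        forecasts.append(alpha * demand[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in %, ignoring periods with zero demand."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def best_alpha(demand: Sequence[float],
               grid: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return (alpha, MAPE) for the grid value with the lowest in-sample MAPE."""
    if grid is None:
        grid = [i / 20 for i in range(1, 20)]  # hypothetical grid 0.05, 0.10, ..., 0.95
    best: Optional[Tuple[float, float]] = None
    for alpha in grid:
        forecasts = ses_forecasts(demand, alpha)
        # Skip period 1: its "forecast" is only the initial value D_1.
        error = mape(demand[1:], forecasts[1:])
        if best is None or error < best[1]:
            best = (alpha, error)
    return best


weekly_demand = [10, 10, 10, 20, 20, 20, 20]  # toy series with a level shift
print(ses_forecasts([10, 20, 30], 0.5))  # [10.0, 10.0, 15.0]
print(best_alpha(weekly_demand))
```

On the toy series with a level shift, the in-sample MAPE decreases as $\alpha$ grows, so the largest value on the grid (0.95) is selected: a series that jumps to a new level rewards reacting quickly to the latest observation.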
Most time series can be modelled using a trend- and seasonality-corrected ESM (Winters' model). Its equations build on equation B.1 of the simple ESM; computing the forecast manually is beyond the scope of this master dissertation.
31 Source: https://www.otexts.org/fpp/7/1
ARIMA Models
According to Zhang (2003), ARIMA models are among the most popular linear models for time series forecasting. ARIMA models are a special type of regression model in which the future value of a variable is a linear function of several past observations and random errors. A time series can usually be decomposed into a trend, a seasonal, a cyclical and an irregular component. Seasonality is a particular type of autocorrelation pattern in which a pattern recurs every 'season'. Seasonality must be corrected for before a time series can be fitted to the model. The underlying process that generates the time series has the form
$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} \qquad \text{(B.2)}$$
where $\theta_0$ is a constant, $y_t$ and $\varepsilon_t$ are the actual value and the random error at time period $t$, respectively, and $\phi_i$ ($i = 1, 2, \dots, p$) and $\theta_j$ ($j = 0, 1, 2, \dots, q$) are model parameters; $p$ and $q$ are integers, often referred to as the orders of the model. The random errors $\varepsilon_t$ are assumed to be independently and identically distributed with a mean of zero and a constant variance $\sigma^2$. Hence, the values of $y_t$ are affected by past values of $y$ (lags). A regression without lags fails to account for these relationships over time and can overestimate the relationship between the dependent and independent variables.
ARIMA models are quite flexible because they can represent different types of time series, namely pure autoregressive (AR), pure moving average (MA) and combined AR and MA (ARMA) series. Regressors can be added to the right-hand side of the forecasting equation of an ARIMA model, which then becomes an ARIMAX model.
If q = 0, equation B.2 becomes a pure AR model of order p. AR models are models in which the value of a variable in one period is related to its values in previous periods. An AR model with p lags is
$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \qquad \text{(B.3)}$$
where $\theta_0$ is the constant and $\phi_p$ is the coefficient of the lagged variable at time $t-p$. Although this model looks very similar to a standard regression equation, the difference is that in AR models the explanatory variables (the lagged values of $y$) are likely to be correlated with each other (Nahmias, 2009).
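When p = 0, the model reduces to a pure MA model of order q. MA models account for a possible relationship between a variable and the residuals from previous periods. An MA model with q lags is
$$y_t = \theta_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} \qquad \text{(B.4)}$$
where $\theta_q$ is the coefficient of the lagged error term at time $t-q$. As in equation B.2, the MA terms enter with a negative sign; SAS uses this same convention, so its MA coefficients have the reversed sign compared with the plus-sign notation found in many textbooks.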
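The two previous models can be combined into an autoregressive moving average (ARMA) model. Because it combines p AR terms and q MA terms, it is called an ARMA(p, q) model:
$$y_t = \theta_0 + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \qquad \text{(B.5)}$$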
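To make the sign convention of the ARMA equations concrete, the sketch below computes one-step-ahead predictions of an ARMA(p, q) model whose coefficients are already known. It is a conditional approach under stated assumptions: the first p observations are not predicted, and errors before the sample (and for those first p periods) are set to zero. Parameter estimation, which SAS Forecast Studio performs automatically, is not shown; the function name and the coefficient values in the usage lines are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def arma_one_step(y: Sequence[float], phi: Sequence[float], theta: Sequence[float],
                  theta0: float = 0.0) -> Tuple[List[Optional[float]], List[float], float]:
    """Conditional one-step-ahead predictions for an ARMA(p, q) model.

    Sign convention of equation B.5:
        y_t = theta0 + sum_i phi_i * y_{t-i} + eps_t - sum_j theta_j * eps_{t-j}
    Returns (fitted values, residuals, forecast for the period after the sample).
    Assumption: unobserved errors (before the sample and for t < p) are zero.
    """
    p, q, n = len(phi), len(theta), len(y)
    fitted: List[Optional[float]] = [None] * n
    eps = [0.0] * n

    def predict(t: int) -> float:
        # Expected value of y_t given data up to t-1 (eps_t has mean zero).
        ar = sum(phi[i] * y[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        return theta0 + ar - ma

    for t in range(p, n):
        prediction = predict(t)
        fitted[t] = prediction
        eps[t] = y[t] - prediction  # residual feeds the MA part of later predictions
    return fitted, eps, predict(n)


print(arma_one_step([2.0, 2.0, 2.0], phi=[0.5], theta=[], theta0=1.0)[2])  # 2.0
print(arma_one_step([1.0, 1.0], phi=[], theta=[0.5])[2])                   # -0.75
```

The two usage lines show the AR(1) and MA(1) special cases: with $\theta_0 = 1$ and $\phi_1 = 0.5$ a constant series of 2 is reproduced exactly, while in the MA(1) case the residuals are 1.0 and 1.5, so the next forecast is $-\theta_1 \varepsilon_t = -0.5 \times 1.5 = -0.75$.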
Fitting an ARMA(p, q) model requires stationary variables. A stationary process has a mean and variance that do not change over time, and the process has no trend. If the variable is not stationary, it can be detrended by regressing it on a time trend and keeping the residuals. Another way to transform the original process is differencing: when a variable is not stationary, its differenced version can be used instead. The first-order difference of the variable $y_t$ is
$$\Delta y_t = y_t - y_{t-1} \qquad \text{(B.6)}$$
If the variable is stationary after first differencing, it is called integrated of order one. An ARMA model applied to a differenced series is called an ARIMA model, where the 'I' stands for integrated. An ARIMA(p, d, q) model denotes an ARMA model with p AR lags and q MA lags, applied to the series differenced d times (SAS Institute Inc., 2014).
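The differencing step of equation B.6 and the role of d can be illustrated with a short sketch. It only shows the transformation itself and how forecasts made on the differenced scale are turned back into levels for d = 1; in the case study SAS Forecast Studio handles this automatically, and the function names here are hypothetical.

```python
from typing import List, Sequence


def difference(y: Sequence[float], d: int = 1) -> List[float]:
    """Apply the first-order difference of equation B.6 d times."""
    out = list(y)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def undifference_once(last_level: float, diff_forecasts: Sequence[float]) -> List[float]:
    """Turn forecasts of the differenced series back into levels (d = 1).

    Cumulates the differenced forecasts starting from the last observed level:
    y_{T+h} = y_{T+h-1} + diff_{T+h}.
    """
    levels, level = [], last_level
    for step in diff_forecasts:
        level += step
        levels.append(level)
    return levels


print(difference([2, 4, 6, 8, 10]))     # [2, 2, 2, 2]: the linear trend is removed
print(difference([1, 4, 9, 16], d=2))   # [2, 2]: a quadratic trend needs d = 2
print(undifference_once(10, [2, 2]))    # [12, 14]
```

A series with a linear trend becomes constant (stationary in mean) after one difference, which corresponds to d = 1 in the ARIMA(p, d, q) notation.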
The Box-Jenkins methodology proposes using the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the sample data as the basic tools to identify the order of the ARIMA model (Zhang, 2003). In the case study, however, the parameters are determined automatically, and the time series is made stationary where needed, by SAS Forecast Studio. The parameters are estimated such that an overall error measure is minimized, and as a consequence the appropriate ARIMA model is fitted to the time series.
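For readers who want to see the Box-Jenkins identification step itself, the sketch below computes the sample ACF and flags lags that fall outside the common approximate 95% band of ±1.96/√n. It is an illustration only: the PACF is omitted, the band is a rule of thumb rather than what SAS Forecast Studio uses internally, and the function names are hypothetical. As a rough guide, an ACF that cuts off after lag q hints at an MA(q) term, while a slowly decaying ACF suggests non-stationarity and the need for differencing.

```python
import math
from typing import List, Sequence


def sample_acf(y: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_0, ..., r_max_lag of a series.

    r_k = sum_{t=k}^{n-1} (y_t - mean)(y_{t-k} - mean) / sum_t (y_t - mean)^2
    """
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    return [sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def significant_lags(y: Sequence[float], max_lag: int) -> List[int]:
    """Lags k >= 1 whose |r_k| exceeds the rough 95% band 1.96 / sqrt(n)."""
    band = 1.96 / math.sqrt(len(y))
    acf = sample_acf(y, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]


alternating = [1, -1, 1, -1, 1, -1]
print(sample_acf(alternating, 3))        # roughly [1.0, -0.83, 0.67, -0.5]
print(significant_lags(alternating, 3))  # [1]
```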
Appendix C
The overall weighted MAPE is calculated automatically by SAS Forecast Studio. It can also be obtained manually with SAS code, following the equations described in this section (SAS Institute Inc., personal communication, March 15, 2017). Understanding the underlying principle is important for drawing meaningful conclusions in chapter 7.
For each SKU $k$, the weighted MAPE can be calculated as
$$\text{WMAPE}_k = \text{MAPE}_k \cdot \bar{y}_k \cdot n_k \qquad \text{(C.1)}$$
where $\text{MAPE}_k$ is calculated with equation 4.2, the mean $\bar{y}_k$ is obtained from the series properties and $n_k$ is the number of forecasted points in time.
The overall weighted MAPE is given as
$$\text{WMAPE} = \frac{\sum_k \text{WMAPE}_k}{\sum_k \bar{y}_k \, n_k} \qquad \text{(C.2)}$$
An obvious consequence is that products with high order volumes (a high mean) contribute more to the overall weighted MAPE.
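The effect of volume weighting can be illustrated with a small sketch. The SAS code used in the case study is not reproduced in this dissertation, so this is an assumption-based illustration: each SKU's MAPE is weighted by its mean demand times its number of forecast points (i.e. its total volume over the horizon), and the standard MAPE definition is assumed for equation 4.2. The function names and the two toy SKUs are hypothetical.

```python
from typing import List, Sequence, Tuple


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Standard MAPE in % (assumed here to match equation 4.2; no zero demand)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def overall_weighted_mape(skus: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Volume-weighted MAPE over SKUs.

    Assumption: weight of SKU k = mean demand * number of forecast points,
    i.e. the SKU's total volume over the forecast horizon.
    """
    numerator = denominator = 0.0
    for actual, forecast in skus:
        n_k = len(actual)
        mean_k = sum(actual) / n_k
        weight = mean_k * n_k
        numerator += weight * mape(actual, forecast)
        denominator += weight
    return numerator / denominator


fast_mover = ([100, 100], [90, 110])  # MAPE 10 %, volume 200
slow_mover = ([10, 10], [5, 15])      # MAPE 50 %, volume 20
print(overall_weighted_mape([fast_mover, slow_mover]))  # about 13.64
```

The unweighted average of the two MAPEs would be 30%, whereas the volume-weighted figure stays close to the 10% of the fast mover, which is the behaviour described above for high-volume products.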
Appendix D
SKU_ID | MAPE flu (baseline forecast) | Model (baseline forecast) | MAPE flu (forecast with independent variables) | Model (forecast with independent variables)
000000000001646700 | 16.85157735 | ESM | 17.16675922 | ARIMA³²
000000000001674400 | 23.12974508 | ESM | 29.92862217 | ARIMA
000000000002014800 | 32.18443185 | ESM | 43.23051178 | ARIMA
000000000002015200 | 264.508236 | ARIMA | 264.6355322 | ARIMA
000000000002671600 | 57.96121079 | ESM | 59.52220063 | ESM
000000000002778000 | 31.35117121 | ARIMA | 32.80230246 | ARIMA
000000000006532000 | 26.96774332 | ARIMA | 30.6527438 | ESM
000000000008482332 | 49.73711485 | ARIMA | 55.15880607 | ARIMA
000000000008498308 | 24.32528277 | ESM | 30.63013023 | ARIMA
000000000008509647 | 77.55271236 | ARIMA | 77.81139202 | ARIMA
000000000008524814 | 16.96414779 | ESM | 19.7677253 | ARIMA
000000000008528880 | 20.24505678 | ESM | 34.15204477 | ARIMA
000000000008534244 | 36.13061118 | ESM | 42.18745951 | ARIMA
000000000008538617 | 22.7111718 | ESM | 23.74616356 | ESM
000000000008563827 | 34.15609401 | ESM | 44.04444268 | ARIMA
000000000008564583 | 68.3913707 | ESM | 71.80095335 | ARIMA
000000000008564709 | 38.52869719 | ESM | 40.33744397 | ESM
000000000008566370 | 16.06122936 | ARIMA | 16.19240805 | ARIMA
000000000008567328 | 53.75199765 | ARIMA | 53.76211492 | ARIMA
000000000008567341 | 71.45790345 | ESM | 72.7242423 | ESM
000000000008568431 | 24.79119254 | ARIMA | 26.29840171 | ARIMA
000000000008568432 | 29.59308524 | ARIMA | 31.40670346 | ARIMA
000000000008571230 | 19.95005532 | ESM | 28.35511216 | ARIMA
000000000008572526 | 78.85943112 | ARIMA | 78.85943112 | ARIMA
000000000008572591 | 49.09590779 | ESM | 72.49552017 | ARIMA
000000000008572592 | 33.75106563 | ESM | 44.12574413 | ARIMA
000000000008573326 | 65.93389323 | ESM | 70.18730001 | ARIMA
000000000008573675 | 49.05096795 | ARIMA | 50.44575016 | ARIMA
000000000008576888 | 19.76011182 | ESM | 19.81337922 | ARIMA
000000000008576894 | 75.91731221 | ESM | 90.67450724 | ARIMA
000000000008577388 | 17.94634802 | ESM | 24.02464152 | ARIMA
000000000008580280 | 27.95232024 | ESM | 38.98416035 | ARIMA
000000000008580287 | 50.53142171 | ARIMA | 54.37348375 | ARIMA
000000000008580289 | 57.32347918 | ARIMA | 73.20493648 | ARIMA
000000000008585181 | 49.48556489 | ARIMA | 52.7898008 | ARIMA
000000000008588386 | 29.92816802 | ESM | 31.54511718 | ARIMA
000000000008588387 | 24.37825548 | ARIMA | 41.038808 | ARIMA
000000000008588400 | 29.18554857 | ARIMA | 29.73286455 | ARIMA
000000000008597893 | 34.76248739 | ARIMA | 39.2599199 | ARIMA
000000000008598739 | 22.99827849 | ESM | 24.00223766 | ARIMA
000000000008609020 | 69.67138069 | ESM | 70.34677546 | ESM
000000000008609021 | 54.79938351 | ESM | 69.98649723 | ARIMA
000000000008617323 | 44.61712284 | ESM | 48.03350173 | ARIMA
000000000008618031 | 41.49044749 | ARIMA | 46.02573258 | ARIMA
000000000008620854 | 15.09638952 | ESM | 20.84070625 | ARIMA
000000000008628669 | 56.05047071 | ESM | 58.5457714 | ESM
000000000008628671 | 73.71036699 | ARIMA | 75.27709823 | ESM
000000000008637521 | 49.03898365 | ESM | 62.90410912 | ARIMA
000000000008638142 | 24.35509915 | ESM | 26.40111577 | ARIMA
000000000008641331 | 63.48802283 | ESM | 72.49928241 | ARIMA
000000000008642202 | 50.73014213 | ARIMA | 50.73014213 | ARIMA
000000000008642203 | 34.09164496 | ARIMA | 37.08755452 | ARIMA
000000000008646136 | 30.05750644 | ESM | 47.3129377 | ARIMA
000000000008647202 | 70.06258325 | ESM | 86.87830014 | ARIMA
000000000008649683 | 68.5656189 | ARIMA | 82.93603379 | ARIMA
000000000008651290 | 44.24827483 | ESM | 44.24856052 | ESM
000000000008655007 | 50.34367119 | ESM | 50.96364378 | ARIMA
000000000008656040 | 37.2721033 | ESM | 76.45641654 | ARIMA
000000000008656043 | 41.28673049 | ESM | 49.532739 | ARIMA
000000000008672768 | 34.30563267 | ESM | 40.55378111 | ARIMA
Table D.1: Selection of SKUs with a higher MAPE when the independent variables are included
N = 61, out of a selection of 198 SKUs
32 More specifically ARIMAX, but SAS Forecast Studio does not distinguish between ARIMA and ARIMAX in its notation.
Appendix E
Similar to the flu-related products, a stepwise linear regression including all different combinations of the independent variables (i.e. Google Trends, precipitation and average temperature) is executed to define the extended model for sunscreens, mosquito products and insecticides. The AIC (Akaike information criterion) is used to select the predictors, and the selection stops when adding or removing an effect no longer reduces the AIC. The tables also report the SBC (Schwarz Bayesian criterion).
Step | Effect entered | Number of effects in | AIC | SBC
0 | Intercept | 1 |  | 
  | FLAG_MF | 2 |  | 
  | SWITCH_TO_PROMO | 3 | 258596.826 | 228005.814
1 | Trends | 4 | 255326.909 | 224744.225
2 | Trends*Temp | 5 | 255192.796 | 224618.442
3 | Temp | 6 | 255143.984 | 224577.959
4 | Trends*Temp*Rain | 7 | 255115.344 | 224557.649
5 | Temp*Rain | 8 | 255054.620* | 224505.254*
* Optimal value of criterion
Table E.1: Sunscreens: Stepwise Linear Regression

Step | Effect entered | Number of effects in | AIC | SBC
0 | Intercept | 1 |  | 
  | FLAG_MF | 2 |  | 
  | SWITCH_TO_PROMO | 3 | 38467.2862 | 34702.0002
1 | Mosquito_Total | 4 | 37894.4810* | 34135.4331*
* Optimal value of criterion
Table E.2: Mosquito: Stepwise Linear Regression

Step | Effect entered | Number of effects in | AIC | SBC
0 | Intercept | 1 |  | 
  | FLAG_MF | 2 |  | 
  | SWITCH_TO_PROMO | 3 | 50814.9718 | 45111.9275
1 | Trends*Temp | 4 | 50693.8673* | 44997.4749*
* Optimal value of criterion
Table E.3: Insecticides: Stepwise Linear Regression
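The AIC-driven selection logic described at the start of this appendix can be sketched in plain Python. This is a simplified, forward-only version under stated assumptions, not the SAS procedure used in the case study: the forced effects (here only a hypothetical intercept; in the tables also FLAG_MF and SWITCH_TO_PROMO) are always kept, candidate effects are added one at a time while the AIC decreases, and removals, which a full stepwise method also considers, are not implemented. The AIC is computed as n·ln(SSE/n) + 2p, which for a linear model with Gaussian errors ranks models the same way as the usual AIC up to an additive constant. All names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ols_sse(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Least-squares fit of y on the given columns; returns the sum of squared errors.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting, which is adequate for a handful of effects.
    """
    k, n = len(columns), len(y)
    # Augmented matrix [X'X | X'y].
    a = [[sum(columns[i][t] * columns[j][t] for t in range(n)) for j in range(k)]
         + [sum(columns[i][t] * y[t] for t in range(n))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y[t] - sum(beta[i] * columns[i][t] for i in range(k))) ** 2
               for t in range(n))


def aic(sse: float, n: int, p: int) -> float:
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(max(sse, 1e-12) / n) + 2 * p


def forward_stepwise_aic(y: Sequence[float],
                         candidates: Dict[str, Sequence[float]],
                         forced: Dict[str, Sequence[float]]
                         ) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Add candidate effects one by one while the AIC keeps decreasing."""
    n = len(y)
    chosen = dict(forced)  # effects that are always in the model
    current = aic(ols_sse(list(chosen.values()), y), n, len(chosen))
    history = [("forced effects", current)]
    remaining = dict(candidates)
    while remaining:
        scores = {name: aic(ols_sse(list(chosen.values()) + [col], y), n, len(chosen) + 1)
                  for name, col in remaining.items()}
        best = min(scores, key=scores.get)
        if scores[best] >= current:
            break  # no remaining effect lowers the AIC: stop
        chosen[best] = remaining.pop(best)
        current = scores[best]
        history.append((best, current))
    entered = [name for name in chosen if name not in forced]
    return entered, history


# Toy data (hypothetical): y = 1 + 2*x1 + noise; x2 carries no extra information.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [1, -1, -1, 1, 1, -1, -1, 1]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
y = [1 + 2 * a + e for a, e in zip(x1, noise)]
entered, history = forward_stepwise_aic(y, {"x1": x1, "x2": x2}, {"Intercept": [1] * 8})
print(entered)  # ['x1']
```

In the toy data the noise is orthogonal to both predictors, so once x1 is in the model adding x2 leaves the SSE unchanged and only adds the 2p penalty; the search therefore stops after one step, mirroring how Tables E.2 and E.3 stop after a single entered effect.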