
Received April 15, 2019, accepted May 6, 2019, date of publication May 17, 2019, date of current version June 6, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2917403

Continuously Assessing and Improving Software Quality With Software Analytics Tools: A Case Study

SILVERIO MARTÍNEZ-FERNÁNDEZ 1, ANNA MARIA VOLLMER 1, ANDREAS JEDLITSCHKA 1, XAVIER FRANCH 2, LIDIA LÓPEZ 2, PRABHAT RAM 3, PILAR RODRÍGUEZ 3, SANJA AARAMAA 4, ALESSANDRA BAGNATO 5, MICHAŁ CHORAŚ 6,7, AND JARI PARTANEN 8

1 Fraunhofer Institute for Experimental Software Engineering (IESE), 67663 Kaiserslautern, Germany
2 Department of Service and Information System Engineering, UPC-BarcelonaTech, 08034 Barcelona, Spain
3 Faculty of Information Technology and Electrical Engineering, University of Oulu, 90014 Oulu, Finland
4 Nokia, 90620 Oulu, Finland
5 Softeam, 75016 Paris, France
6 ITTI, 61-612 Poznań, Poland
7 Faculty of Telecommunications, Computer Science and Electrical Engineering, UTP University of Science and Technology, 85-796 Bydgoszcz, Poland
8 Bittium, 90590 Oulu, Finland

Corresponding author: Silverio Martínez-Fernández ([email protected])

This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement 732253 (Q-Rapids: Quality-Aware Rapid Software Development).

ABSTRACT In the last decade, modern data analytics technologies have enabled the creation of software analytics tools offering real-time visualization of various aspects related to software development and usage. These tools seem to be particularly attractive for companies doing agile software development. However, the information provided by the available tools is neither aggregated nor connected to higher quality goals. At the same time, assessing and improving software quality have also been key targets for the software engineering community, yielding several proposals for standards and software quality models. Integrating such quality models into software analytics tools could close the gap by providing the connection to higher quality goals. This paper aims at understanding whether the integration of quality models into software analytics tools provides understandable, reliable, useful, and relevant information at the right level of detail about the quality of a process or product, and whether practitioners intend to use it. Over the course of more than a year, four companies involved in this case study deployed such a tool to assess and improve software quality in several projects. We used standardized measurement instruments to elicit the perception of 22 practitioners regarding their use of the tool. We complemented the findings with debriefing sessions held at the companies. In addition, we discussed challenges and lessons learned with four practitioners leading the use of the tool. The quantitative and qualitative analyses provided positive results, i.e., the practitioners’ perception with regard to the tool’s understandability, reliability, usefulness, and relevance was positive. Individual statements support the statistical findings, and constructive feedback can be used for future improvements. We conclude that the potential for future adoption of quality models within software analytics tools definitely exists and encourage other practitioners to use the presented seven challenges and seven lessons learned and adopt them in their companies.

INDEX TERMS Agile software development, case study, quality model, software analytics, software analytics tool, software quality.

I. INTRODUCTION
Nowadays, a company’s ability to innovate is increasingly driven by software. Digital technologies play a key role in the transformation of many industrial companies [1], especially in sectors like the automotive industry, where software (together with electronics) is responsible for over 90% of all innovations [2], [3]. In this context, software quality makes the difference and is an essential competitive factor for company success.

The associate editor coordinating the review of this manuscript and approving it for publication was Waldemar W. Koczkodaj.



Companies use modern scalable data ingestion technologies (e.g., Apache Kafka,1 ActiveMQ2) together with data visualization and analytics technologies (e.g., Tableau,3 Microsoft Power BI4) to learn more about their businesses [4]. These technologies have enabled the concept of the real-time enterprise, which uses up-to-date information and acts on events as they happen [5]. This is also the case in software engineering, where software analytics aims at data-driven software development based on software, process, and usage data [6]–[9]. Software analytics has particular potential in the context of modern software development processes such as Agile Software Development (ASD) due to the incremental nature of these processes, which produce continuous sources of data (e.g., continuous integration system and customer feedback). More importantly, we can observe an increased interest in software analytics by different players, from large companies like Microsoft and Google [10], [11] to SMEs and startups offering software analytics services (e.g., Tasktop,5 Kovair,6 Kiuwan7) and even research projects (e.g., GrimoireLab,8 Q-Rapids,9 CodeFeedr10).

As reported by Forrester, companies are interested in connecting ‘‘an organization’s business to its software delivery capability’’ by getting ‘‘a view into planning, health indicators, and analytics, helping them collaborate more effectively to reduce waste and focus on work that delivers value to the customer and the business’’ [12]. This is where a research gap still exists: While modern software analytics tools outperform traditional tools when it comes to gathering and visualizing data, they still fall short of using this data to create quality-related strategic indicators [13].

According to this state of the practice, we focus on the aforementioned need to connect an organization’s business to its software delivery capability in the context of ASD by continuously assessing and improving quality-related strategic indicators (e.g., product quality, product readiness, and process performance) in software analytics tools. We propose using the well-known concept of Quality Model (QM) [13]–[16] in software analytics tools as the instrument to bridge the gap between low-level quality concepts related to development and usage and high-level quality-related strategic indicators. Therefore, the goal of this research is to understand practitioners’ perception regarding the integration of QMs in software analytics tools in ASD companies in order to effectively assess and improve quality-related strategic indicators.

1 https://kafka.apache.org/
2 https://activemq.apache.org/
3 https://www.tableau.com/
4 https://powerbi.microsoft.com/
5 https://www.tasktop.com/
6 https://www.kovair.com/
7 https://www.kiuwan.com/
8 https://chaoss.github.io/grimoirelab/
9 https://www.q-rapids.eu/
10 http://codefeedr.org/

Since we address the perception of practitioners, our aim was to conduct an empirical study in industry. To this end, we needed a tool implementing the concept we wanted to evaluate. Given that no such tool is freely available on the market, we opted for the Q-Rapids tool. The Q-Rapids tool, developed as part of the Q-Rapids project [17], offers software analytics capabilities that integrate QMs to assess and improve software quality in the context of ASD. Its main functionalities are: (a) real-time gathering of various types of data related to the development and usage of a software system, which are the input for a QM; (b) real-time modeling of this data in terms of a QM in order to reason about aggregated quality-related strategic indicators; (c) presenting this QM and data to decision makers in a multi-dimensional and navigational dashboard for use during ASD events, such as sprint planning or daily stand-up meetings.

In order to understand the practitioners’ view regarding our goal, we conducted a case study across four companies, involving 26 practitioners, to investigate the following key aspects related to the Q-Rapids tool:

• Its understandability, reliability, and usefulness to assess software quality;

• Its level of detail, relevance, perceived usefulness, and behavioral intention to improve software quality;

• Its challenges and lessons learned, as seen by adopters in realistic settings.

This paper is structured as follows. Section II presents the background and related work. Section III presents the object of study in this case study across four companies: the integration of a QM within a software analytics tool (i.e., the Q-Rapids tool). Moreover, it provides details on how the tool can be used to assess and improve software quality. Section IV describes the research methodology of our case study. Section V presents the results with respect to the participants’ perceptions on exploring the tool to assess and improve software quality. Section VI discusses challenges and lessons learned as seen by adopters in ASD settings using the Q-Rapids tool. Section VII reports the limitations of this work. Finally, Section VIII concludes the article and presents future work.

II. BACKGROUND AND RELATED WORK
The objective of this section is twofold: (a) to provide a background on QMs, software analytics, and ASD; and (b) to discuss related work on software analytics tools in ASD companies.

A. BACKGROUND
Quality is defined by ISO 8402 [18] as ‘‘the totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs’’. This definition is too abstract to be operationalized directly, which is one of the reasons why a multitude of software QMs have been proposed in the last four decades (e.g., [15], [19]–[22]) that refine high-level concepts of ‘‘quality’’ (like reliability or efficiency) down to the level of metrics (such as the number of bugs or response time).


One popular example widely adopted in industry is the ISO/IEC 25010 standard [14], which determines the quality aspects to be taken into account when evaluating the properties of a software product. Two more recent examples well known in industry are Quamoco and SQALE [23]. Quamoco [16] is a QM integrating abstract quality aspects and concrete quality measurements. SQALE [24] computes technical debt indexes based on the violations of quality rules in the source code. Other works propose refactoring models to improve a particular quality aspect of the existing model or code, and different extensions to the traditional concept of quality (such as the incorporation of non-technical criteria into the ISO/IEC 9126-1 QM [25]).

Software analytics is defined as ‘‘analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions’’ [6]. In this context, analytics results should include some actionable knowledge that can be used to improve software quality. Software analytics is used in various scenarios to assess concrete problems, e.g., use of process data to predict overall project effort [26], use of security data to identify indicators for software vulnerabilities [27], or classification of app reviews [28]. In this context, software analytics tools provide features for specifically visualizing software development aspects. In our view, some notable examples are SonarQube,11 Kiuwan, Bitergia,12 and Tasktop, all of which have been discussed in related work.

ASD relies on short feedback cycles as a way to provide flexibility and rapid adaptation to market fluctuations. In this context, decisions are also made more frequently. For instance, product releases and the related decisions take place in short intervals, instead of months or years as was the case with traditional software development approaches such as the waterfall model. Popular agile practices such as continuous integration [29] provide a tight connection to development to ensure errors are detected and fixed as soon as possible. The current tendency to shorten feedback cycles even further (e.g., continuously getting feedback from customers and being able to react on that) enhances the potential for software analytics. Continuous software engineering [30], which has its roots in ASD, represents a holistic approach to providing tight connections among all software development activities, including not only integration but also aspects such as business and development (BizDev) and development and operations (DevOps).

Our work aims to further understand the challenges limiting industrial uptake of QM-based software analytics tools in ASD companies, as seen by practitioners. In our research, we build on previous research on software quality modeling, integrating a QM into a software analytics tool to assess and improve software quality (see Section III).

11 https://www.sonarqube.org/
12 https://bitergia.com/

B. RELATED WORK
Software analytics plays a major role in ASD and continuous software engineering since, properly used, the enormous amounts of data from different systems (e.g., continuous integration systems, management tools, etc.) have proved increasingly useful for guiding (almost) real-time decision-making [31]–[33]. Indeed, companies like Microsoft are hiring data scientists for their software teams [34]. ASD and continuous software engineering have created numerous new opportunities for observing user behavior and monitoring how systems and services are being used; for identifying unexpected patterns and runtime issues; for monitoring system quality attributes; and for collecting real-time data to feed both business and technical planning [35], [36]. The main objective is to constantly monitor and measure both business indicators and infrastructure-related metrics in order to facilitate and improve business and technical decision-making [37]. In the specific case of software quality, continuous quality monitoring enables transparency to foresee, trace, and understand important aspects of product quality in real time [37], [38]. Support for product quality management is particularly relevant in ASD, since ASD tends to overlook quality aspects in favor of product functionality [17]. Although the literature reports on promising advances regarding the use of analytics in continuous software engineering [31]–[33], many challenges still exist, with the lack of software analytics tools being among the most relevant ones [32].

When we look for software analytics tools typically used in ASD, we find a large number of commercial and academic software analytics tools that are available on the market. Examples of commercial tools focusing on concrete quality aspects are SonarQube, Kiuwan, Bitergia, New Relic,13 Datadog,14 and Tasktop. SonarQube focuses on continuous code quality based on static code analysis, including the assessment of code smells, bugs, and vulnerabilities. Kiuwan products focus on the detection of code security vulnerabilities, also offering a tool specifically for Open Source Software (OSS) risk analysis. Bitergia provides actionable and customizable dashboards for analyzing OSS. New Relic allows developers to install monitoring agents and gather real-time insights from users, such as software failures and performance improvements. Datadog monitors cloud-scale applications; provides monitoring of servers, databases, tools, and services; and supports the full DevOps stack. Tasktop aims to integrate and coordinate value streams across the DevOps stack. In addition to these commercial software analytics tools, it is worth mentioning value stream management tools for capturing, visualizing, and analyzing critical indicators related to software product development [12]. Some of these tools even strongly support the creation of new dashboards from data and advanced data analytics (e.g., machine learning). With respect to tools in academia, we find CodeFeedr [39], Perceval [40], and SQuAVisiT [41]. CodeFeedr is a real-time query engine.

13 https://newrelic.com/
14 https://www.datadoghq.com/


Perceval performs automatic and incremental data gathering from almost any tool related to open source development, which can be used in Bitergia dashboards. SQuAVisiT is a generic framework supporting maintainability assessment.

Based on the aforementioned software analytics tools, we can see that the data analytics trend has had a great impact on software engineering as well (i.e., software analytics), particularly in the short feedback cycles of ASD and continuous software engineering. However, the application of QMs in the software analytics tools used in industry is not widespread, despite the need for ‘‘a view into planning, health indicators, and analytics, helping them collaborate more effectively to reduce waste and focus on work that delivers value to the customer and the business’’ [12]. A notable exception is the SQALE QM within SonarQube. However, it is limited to a single data source, namely static analysis of the source code. We can conclude that the aforementioned software analytics tools do not offer full capabilities, including an integrated QM, to provide quality-related strategic indicators by using software analytics results. Therefore, at the beginning of the Q-Rapids project, we decided to build the Q-Rapids tool, which integrates highly customizable (instead of pre-defined) QMs.

Some researchers have investigated the success factors as well as the needs and challenges related to the application of software analytics in industry [6], [42]. Huijgens et al. conducted an empirical study to identify success and obstruction factors regarding the implementation of software analytics in the context of continuous delivery as a service (e.g., defining and communicating the aims upfront, standardizing data at an early stage, and building efficient visualizations). Buse and Zimmermann proposed several guidelines for analytics tools in software development (e.g., ease-of-use and measurement of many artifacts using many indicators). In our work, we focus on the key aspects needed for practitioners to adopt QMs integrated into software analytics tools in the context of ASD.

The novelty of our work is twofold: (a) Based on input from practitioners, it explores the integration of QMs into software analytics tools in ASD companies, leveraging software, process, and usage data; and (b) it provides an understanding of what is needed in the aforementioned tools to enable them to be widely accepted in industry, as well as challenges and lessons learned from their adoption in four ASD companies.

III. INTEGRATING A QUALITY MODEL INTO A SOFTWARE ANALYTICS TOOL
This section includes: (a) the description of the QM we propose in our work; (b) the Q-Rapids software analytics tool integrating the aforementioned QM; and (c) how the Q-Rapids tool can be used to assess and improve software quality in ASD.

A. THE Q-RAPIDS QUALITY MODEL
Both academic and industry partners of the Q-Rapids project have iteratively created a QM for software analytics, whose main characteristic is that it offers tool-supported customization (integration into software analytics tools) to define strategic indicators related to the quality of the software product and development processes based on company needs. This QM aims at enabling decision makers to improve identified quality deficiencies.

In the following, we will present the elements of the QM and how these elements are computed from automatically ingested raw data. For details on the initial creation and previous iterations, the reader is referred to [43], [44].

Following common approaches, we defined the QM for the Q-Rapids tool as a hierarchical structure composed of five types of entities, each of them serving a well-defined purpose (see Figure 1):

• Strategic Indicator - Quality-related aspect that a company considers relevant for its decision-making processes. It represents an organization-wide goal, e.g., product quality or process performance. A strategic indicator can be influenced by numerous product/process factors.

• Product/Process Factor - The attributes of parts of a product/process that are concrete enough to be measured [16], e.g., code quality, issue velocity, delivery performance. A product/process factor may be composed of numerous assessed metrics.

• Assessed Metric - Concrete description of how a specific product/process factor is quantified for a specific context, e.g., code duplication, test success.

• Raw Data - Data that comes from different data sources without undergoing any modification, e.g., files, unit tests, pending issues. Typically, this data cannot be decomposed further into smaller units.

• Data Source - Each of the different tools the companies use for extracting raw data, e.g., SonarQube, Jira, Jenkins, Gerrit.

Figure 1 shows an excerpt of the elements of the Q-Rapids QM. A detailed QM (i.e., with definitions of all elements and formulas of the assessed metrics) is available in the Appendix.

To enable the assessment of continuously updated strategic indicators, there is a bottom-up calculation process, starting from the data sources, which is detailed in [43].
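To illustrate the idea behind this bottom-up calculation, the following Python sketch aggregates assessed metrics into factors and factors into strategic indicators as weighted averages on a normalized [0, 1] scale. It is only a sketch under assumptions: the element names mirror the ‘‘Blocking’’ example of Section III.C, while the weights, values, and the equal-weight default are invented for illustration and are not the formulas of the actual Q-Rapids QM (those are defined in the Appendix and in [43]).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Metric:
    name: str
    value: float  # assessed value, normalized to [0, 1]

@dataclass
class Factor:
    name: str
    metrics: List[Metric]
    weights: Optional[List[float]] = None

    def assess(self) -> float:
        # Weighted average of the assessed metrics (equal weights by default).
        w = self.weights or [1.0] * len(self.metrics)
        return sum(m.value * wi for m, wi in zip(self.metrics, w)) / sum(w)

@dataclass
class StrategicIndicator:
    name: str
    factors: List[Factor]

    def assess(self) -> float:
        # Bottom-up aggregation: metrics -> factors -> strategic indicator.
        return sum(f.assess() for f in self.factors) / len(self.factors)

# Hypothetical instantiation of the 'Blocking' strategic indicator.
blocking = StrategicIndicator("Blocking", [
    Factor("Blocking code", [Metric("Non-blocking files", 0.45),
                             Metric("Code duplication", 0.80)]),
    Factor("Testing status", [Metric("Test success", 0.95)]),
])
print(f"{blocking.name}: {blocking.assess():.2f}")  # -> Blocking: 0.79
```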

B. THE Q-RAPIDS SOFTWARE ANALYTICS TOOL
The Q-Rapids tool15 provides continuous, potentially real-time, assessment of the strategic indicators to decision makers based on the Q-Rapids QM. Figure 2 shows an excerpt of the conceptual architecture of the Q-Rapids tool, depicting its modules related to data analytics capabilities and its data flow, which adopts the idea of the lambda architecture approach used for Big Data solutions [45].

15 The tool source code and documentation are available at: https://github.com/q-rapids


FIGURE 1. Excerpt of the generic Q-Rapids QM (details in Appendix).

FIGURE 2. Q-Rapids software analytics tool architecture.

The main modules of the tool are Data Gathering, Data Modeling and Analysis, and Strategic Decision Making.

The Data Gathering module is composed of different Apache Kafka connectors to gather data from heterogeneous external data sources, such as static code analysis (e.g., SonarQube), continuous integration tools (e.g., Jenkins), code repositories (e.g., SVN, Git, GitLab), issue tracking tools (e.g., Redmine, GitLab, JIRA, Mantis), and software usage logs.
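As a rough illustration of how such gathered data can be consumed downstream, the following Python sketch reads measurement records from a Kafka topic using the kafka-python client. The topic name, broker address, and message layout are assumptions made for this example and do not describe the actual Q-Rapids connectors.

```python
import json
from kafka import KafkaConsumer  # kafka-python client

# Subscribe to a hypothetical topic fed by a static code analysis connector.
consumer = KafkaConsumer(
    "sonarqube.measures",                     # assumed topic name
    bootstrap_servers=["localhost:9092"],     # assumed broker address
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for record in consumer:
    measure = record.value
    # e.g., {"project": "neverlate", "metric": "duplicated_lines_density", "value": 4.2}
    print(f"{measure['project']}: {measure['metric']} = {measure['value']}")
    # In a pipeline like Q-Rapids, such raw data would be persisted and later
    # normalized into the assessed metrics of the QM.
```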

The Data Modeling and Analysis module uses the gathered data and the QM definition to assess software quality. The QM is highly customizable to support any company’s needs. For instance, users of the Q-Rapids tool can define new strategic indicators, product/process factors, assessed metrics, and their relationships, as well as the frequency of execution of the QM assessment (e.g., daily or hourly). Moreover, since the QM works as a plug-in, it can also be defined from data mining and analysis.
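The paper does not prescribe a configuration format for this customization; the snippet below is merely a hypothetical sketch of what such a QM definition could contain, with invented field names, weights, and metric identifiers.

```python
# Hypothetical QM customization: a new strategic indicator, its factors and
# metrics, and the assessment schedule. The schema is illustrative only and
# does not reflect the tool's actual configuration files.
custom_qm_definition = {
    "strategic_indicator": "Process performance",
    "factors": {
        "Issue velocity":       {"weight": 0.6, "metrics": ["resolved_issues_ratio"]},
        "Delivery performance": {"weight": 0.4, "metrics": ["build_success", "test_success"]},
    },
    "assessment_frequency": "daily",  # e.g., "hourly" for faster feedback
}
```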

The Strategic Decision Making module is responsible for the end-user interface and provides two different types of dashboards: (a) the Strategic Dashboard, providing homogeneous visualization for assessing the higher-level elements of the QM; and (b) the Raw Data Dashboards, providing specific visualizations for the data gathered from the source tools.

On the one hand, the Strategic Dashboard supports decision makers in their decisions related to the assessed quality of their products (e.g., does our product have the quality to be released?). It uses the same kind of charts for visualizing the assessment of the three most abstract levels of the QM (strategic indicators, factors, and assessed metrics). These levels work like ‘‘traffic lights’’, i.e., red stands for risky quality, orange for neutral, and green for good quality. These generic visualizations unify the quality visualization and hide the heterogeneity of the data and source tools. They also allow navigating through the different elements, which provides traceability and helps to understand the assessment.

On the other hand, the Raw Data Dashboards allow decision makers to take concrete actions to address a particular issue and improve quality (e.g., which concrete files are too complex?). They are customized for concrete QM elements, e.g., the metric Blocking files used in the Blocking strategic indicator. Therefore, they allow the user to link the quality issue (e.g., Non-blocking files) to the concrete elements from the data source tools (concrete files violating the quality rules).
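As a minimal sketch of the traffic-light categorization mentioned above, the function below maps a normalized assessment value to red, orange, or green. The two threshold values are placeholders only; in the Q-Rapids tool, thresholds are configurable, and their proper values were in fact a point of discussion for the participants (see Section V).

```python
def traffic_light(assessed_value: float,
                  risky_below: float = 0.33,
                  good_above: float = 0.66) -> str:
    """Map an assessment in [0, 1] to a traffic-light category.

    The thresholds are illustrative placeholders, not the tool's defaults.
    """
    if assessed_value < risky_below:
        return "red"     # risky quality
    if assessed_value <= good_above:
        return "orange"  # neutral
    return "green"       # good quality

# Example: the 'Blocking' indicator dipping to around 0.5 is shown as orange.
print(traffic_light(0.5))  # -> orange
```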

C. THE Q-RAPIDS TOOL FOR ASSESSING AND IMPROVING SOFTWARE QUALITY
This section describes two scenarios where the Q-Rapids tool can be used by decision makers to assess or improve the quality of their products. We follow the definitions of ‘‘assessing’’ and ‘‘improving’’ software quality specified by Kläs et al. [15]. Assessment refers to the process in which ‘‘a concept is quantified, measured and compared to defined evaluation criteria to check the fulfillment of the criteria’’ [15]. Improvement refers to the process in which ‘‘relevant factors (i.e., variation factors) influencing the concept of interest are known. Therefore, the concept of interest can be improved by improving its influencing factors’’ [15].

1) ASSESSMENT SCENARIO WITH Q-RAPIDS
The (infamous) ACME company needs to deliver the next release of one of their higher-quality products, NeverLate, on time. Therefore, Bob, the product manager in charge of NeverLate, decides to use the Q-Rapids tool in order to assess and monitor Blocking situations.16

At the beginning of the sprint, Bob receives an alert because the Blocking strategic indicator has dipped below some predefined threshold. The Q-Rapids tool visualizes this event by changing the traffic light related to this indicator from green (good quality) to orange (neutral) in the Strategic Indicators View (Figure 3, needle in the radar chart on the left). He delves deeper into the QM to analyze the situation. According to the QM, the Blocking strategic indicator is impacted by two product factors: Blocking code and Testing status. The Detailed Strategic Indicators View reveals that the Testing status assessment is good, while the Blocking code assessment is not so good (value around 0.5 on the radar chart’s corresponding axis).

16 ‘‘Blocking’’, as defined and customized by the companies of our case study, refers to situations that negatively influence the regular progress of the software development.


FIGURE 3. Strategic Dashboard navigation schema.

Bob explores the assessed metrics impacting this factor by using the Factors View, where the Non-blocking files metric has a low assessed value (below 0.5 on the corresponding axis of the radar chart). Finally, using the Metrics Historical View (the line chart on the right), Bob verifies that the Non-blocking files metric has been deteriorating since the last sprint. Based on this assessment, Bob decides to improve the software quality by tackling the Blocking files problem.

2) IMPROVEMENT SCENARIO WITH Q-RAPIDS
Following the Blocking problem above, Bob calls a meeting with Jane, a senior developer working on NeverLate, to discuss concrete actions to improve the Blocking files problem. Using the concrete Raw Data Dashboard corresponding to Non-blocking files, Bob and Jane learn that in the last sprint, the classes of a specific directory were changed many times by the same developer, Brandon (Figure 4, list of blocker and critical issues at the bottom). Moreover, the classes contain five blocker quality rule violations regarding code smells. The raw data visualization offers actionable analytics to refactor the classes of the problematic directory, clearly indicating which classes have been heavily modified and explaining the violated quality rules. Consequently, Bob could take the concrete action of adding a new issue to the backlog so that Brandon can solve these problems. Table 1 gives a summary of the actions that can be taken to improve software quality based on Blocking. It also indicates at which point in time during the ASD process the actions can be taken.

FIGURE 4. Raw Data Dashboard for blocking and critical files.

TABLE 1. Examples of actions to improve software quality for the ‘‘Blocking’’ strategic indicator.

IV. EVALUATION METHODOLOGY
This section reports the methodology of our case study.

A. RESEARCH SETTING
Our research context consisted of four pilot projects in the four companies participating in the Q-Rapids project. As reported in Table 2, the companies ranged from small and medium-sized enterprises with up to ten products on the market to very large corporations with more than 1,000 products on the market. They develop products using ASD for various application domains such as telecommunications, security, military, transport, health, and public administration.

The companies’ pilot projects had three essential things in common that made it possible to run a case study [46] across them: They had an ASD (Scrum-like) way of working; the stakeholders (i.e., product owners, managers, and developers) were interested in having tool support to assess and improve software quality in their setting; and the Q-Rapids tool presented in Section III was deployed and used by these stakeholders in such settings.

TABLE 2. Setting of the four companies’ pilot projects.



In the following, we will briefly describe the context of the four use cases. One common characteristic is that all four companies had already used Q-Rapids in pilot projects earlier than the ones evaluated in this case study (i.e., UC1, UC2, UC3, and UC4 from Table 2).

UC1. Company 1 used Q-Rapids in its main product Modelio 3.8 and in Modelio NG (UC1), which is a customization for a customer. Modelio is the latest generation of a 25-year-old product line of a model-driven tool suite dedicated to expressing and managing requirements, modeling software architectures, building accurate UML models, generating a full range of documentation, and automating application code production for several languages. In UC1, Company 1 used Q-Rapids with the aim of improving the quality of Modelio NG by leveraging quality requirements (QRs) during the development phase of new versions of the product line. This involved the early detection of anomalies, which helped to improve their ability to release the product on time by reducing the number of anomalies discovered during the pre-release validation phase. Company 1 used its experts’ knowledge to identify new strategic indicators not provided directly by the Q-Rapids tool.

UC2. Company 2 deployed Q-Rapids in two different contexts, applying two different strategies. For case one, Company 2 used a kind of research-oriented approach, where the research partners facilitated or collaborated on multiple sessions to define the QM and the strategic indicators based on high-level strategic goals, to derive factors and metrics, and to identify relevant data sources. This was done to realize a proof of concept. Company 2 assumed that it had learned the necessary methods and deployment process in such detail that they could replicate it on their own in another use case (i.e., UC2). In UC2, Company 2 decided to use their expert knowledge to identify interesting strategic indicators not provided by other tools to be presented from different viewpoints by the Q-Rapids tool. The reported challenges and lessons learned are based mainly on the deployment experience with UC2.

UC3. Company 3 implemented the solution in two contexts. For the first context, the focus was on the use of a proof of concept of the Q-Rapids solution by a production test software team in the public safety domain. The research partners facilitated or collaborated on multiple sessions to determine the strategic indicators as well as process factors and metrics. This included the company’s internal development of the software lifecycle development process as well as the identification of necessary data sources and alignment of the development tool chain with the Q-Rapids solution. For the second context, identified as UC3, Company 3 expanded the approach to a multi-context information tool development project related to product development, manufacturing, and maintenance. In comparison to the first context, UC3 involved several teams. The improvement issues identified in the first context regarding development tool chains, metrics, and the Q-Rapids solution were very useful for UC3.

UC4. Company 4 first (during the proof of concept phase) used the solution in almost finished projects in the maintenance phase, and in the software part of a research (non-commercial) project, where more experiments and a research-oriented approach were possible. After those initial tests, Company 4 decided to use the solution in practice to measure quality- and process-related metrics in the largest, most active, and most important commercial software product deployment project, here referred to as UC4 (where the solution is still being used by the Product Owner).

B. RESEARCH QUESTIONS
As stated in Section I, our research goal was to understand practitioners’ perception of the integration of QMs into software analytics tools in ASD companies in order to effectively assess and improve quality-related strategic indicators. We split this research goal into three Research Questions (RQs):

RQ1. What is the practitioners’ perception regarding the integration of QMs into software analytics tools when assessing software quality in ASD?

RQ2. What is the practitioners’ perception regarding the integration of QMs into software analytics tools when improving software quality in ASD?

RQ3. What are the challenges and lessons learned that practitioners face when integrating QMs into software analytics tools for assessing and improving software quality in ASD?

With RQ1, we investigated the assessment scenario presented in Section III, while RQ2 focused on the improvement scenario presented. Transversal to the assessment and improvement scenarios, RQ3 was used to investigate the challenges and lessons learned as seen by adopters in realistic settings. RQ3 was addressed after RQ1 and RQ2 (see details in Section IV.C).

For the assessment and improvement scenarios (i.e., RQ1 and RQ2), we specified relevant sub-RQs for practitioners (see Table 3). First, we considered that a QM within a software analytics tool for assessing software quality needs to be understandable, reliable, and useful in order to be used by practitioners, based on our experiences with the proof of concept of the Q-Rapids project (see [47]). Second, we considered that a QM within a software analytics tool for improving software quality has to contain the right level of detail; it has to provide relevant and perceived useful actions; and practitioners should intend to use it.

Table 3 shows the mapping between the sub-RQs and the constructs used to address them. Examples of constructs are ‘Perceived usefulness’ and ‘Behavioral intention’ [48]. The table has four columns: (a) an identifier for the sub-RQ; (b) the sub-RQ; (c) the construct that the sub-RQ is dealing with; and (d) the literature used to define the construct.


TABLE 3. Sub-research questions.

1) HYPOTHESIS REGARDING THE INTEGRATION OF QMS WITHIN SOFTWARE ANALYTICS TOOLS
The main hypothesis of this work is that QMs within software analytics tools can be used by practitioners to assess and improve software quality. This main hypothesis was refined into seven sub-hypotheses, one for each sub-RQ:

• Practitioners are able to understand a QM within a software analytics tool to assess software quality (H1: understandability).

• Practitioners find that using a QM within a software analytics tool is reliable for assessing software quality (H2: reliability).

• Practitioners find that using a QM within a software analytics tool is useful for assessing software quality (H3: usefulness).

• Practitioners find the traceability between abstract quality aspects and raw data of a QM within a software analytics tool to be at the right level of detail to improve software quality (H4: right level of detail).

• Practitioners are able to take relevant actions to improve software quality based on the information provided about a QM presented in a software analytics tool (H5: relevance).

• Practitioners are able to take perceived useful actions to improve software quality based on the information provided about a QM presented within a software analytics tool (H6: perceived usefulness).

• Practitioners intend to use a QM within a software analytics tool to improve software quality rather than using a human approach (H7: behavioral intention).

C. RESEARCH DESIGN AND SAMPLING
This study was initiated by the Q-Rapids project members. The authors were organized into two teams. The first seven authors constituted the researcher team. The researcher team was composed of the leading team (first three authors from Fraunhofer IESE) and another four authors from two universities (Technical University of Catalonia and University of Oulu). The last four authors, from the four companies, constituted the practitioner team. Their responsibility was to use Q-Rapids in the setting described in Section IV.A. In addition, there were 22 other participants in the study from the four companies. In the following, we will briefly discuss the research design, sampling, and execution.

Design: The first two authors created an evaluation guideline with two objectives: (a) to provide experimenters and observers with a detailed description of the evaluation procedures and guidance for managing potential confounding factors; and (b) to ensure equal treatment between different evaluation steps independent of the experimenter. These guidelines included the design of our evaluation, with two key researcher roles: an experimenter and at least one observer. On the day of the evaluation, the researcher team performed the following steps at the premises of the four companies:

(1) The experimenter explained the evaluation goals and the procedure to the participants and asked them to sign the informed consent.

(2) The experimenter performed a live demo in the company setting to introduce the software quality assessment and improvement scenarios of the Q-Rapids tool. Showing the same live demo at all four companies served to ensure equal treatment and to reduce experimenter bias. At the end of the training, the experimenter asked the participants to clarify any doubts about the use of the Q-Rapids tool before advancing to the next step in the evaluation.

(3) The participants individually explored the Q-Rapids tool by working on assigned tasks. The experimenter encouraged them to use a form to write down positive and negative aspects of the Q-Rapids tool. This served to get a better understanding of the participants’ perceptions regarding the Q-Rapids tool.

(4) The experimenter collected responses to our sub-RQs by using a structured questionnaire based on the constructs of Table 3, answered individually. These constructs had already been validated in practice in terms of item reliability score of the construct or Cronbach’s alpha, which are objective measures of reliability [52]. Reliability is concerned with the ability of an instrument to measure consistently [52].


All selected constructs were reported to have an item reliability score greater than 0.8. Each construct included up to four items to be rated using a five-point rating scale from 1 (‘‘strongly disagree’’) to 5 (‘‘strongly agree’’), where 3 was ‘‘neutral’’, with an additional ‘‘I don’t know’’ option. We instantiated the selected questions according to the purpose and content of the assessment and improvement usage scenarios of the Q-Rapids tool. This served to collect the practitioners’ perception on these two scenarios.

(5) The experimenter asked the participants about the strengths of the Q-Rapids tool and any suggestions for improvements during debriefing sessions with all participants. In these sessions, the participants individually used cards to record the results, posted them on the wall, and presented them to the rest of the group. Then the experimenter, together with the participants, created clusters of the cards and asked the participants to prioritize strengths and weaknesses by individual voting. The goal was to complement the data from the questionnaire and understand why the participants reported those perceptions.

(6) In parallel to the five sequential steps above, at least one observer documented the progress of each activity using a predefined observation protocol. The observer kept records of the participants’ comments and questions on the Q-Rapids tool and of any deviations from the prescribed evaluation procedures. This activity was intended to facilitate later analysis of possible threats to validity, such as experimenter bias. There were different experimenters and observers during each company’s evaluation.

The instruments used to support these steps will be described in Section IV.D.

Finally, after the evaluation and once the research team had finalized the analysis, the research team presented the results to the practitioner team, who were the ‘champions’ of applying Q-Rapids in the four companies. The goal was twofold: to validate the results and to discuss RQ3. With respect to RQ3, we performed the following two additional actions:

• We asked the four UC champions to summarize their challenges and the lessons they learned regarding the use of the Q-Rapids tool in their companies. We instructed them to provide this description in free text in order to avoid any unnecessary constraint on their communication.

• The research team consolidated the responses by categorizing them while respecting the provided text as much as possible. The result was presented to the UC champions, who provided their feedback (in the form of comments to the proposed text and new insights with additional information), yielding the final input for answering RQ3, reported in Section VI.

Population and Sampling: There were three types of target users of the Q-Rapids tool: product owners, managers, and developers. The researchers informed the companies about the target sample, and the companies’ champions proposed a list of participants based on their suitability and contacted them. Then we drew a convenience sample including product owners, managers, and developers of the companies involved [53]. At the time of the evaluation, the participants were team members of the pilot project selected in each company for the evaluation of the Q-Rapids tool.

In total, 22 persons from the four companies participated in the evaluation conducted for RQ1 and RQ2 (see Table 4). Among these participants were two product owners, seven project managers, seven managers (including quality and technical managers), three developers, and three participants who did not indicate their role in the demographics part of the questionnaire, but who belonged to one of these categories. All participants had at least eight months of work experience in their companies (Mdn = 10 years, Min = 0.75 years, Max = 32 years) and at least nine months of work experience in their current role (Mdn = 5 years, Min = 0.8 years, Max = 30 years) at the time of this evaluation.

TABLE 4. Demographic information of participants.

Execution: Between July 2018 and October 2018, we deployed the Q-Rapids tool in each company. This was the second major release of the Q-Rapids tool, whose first prototype had already been deployed in January 2018. Therefore, data collected by the Q-Rapids tool reached back to January 2018. In parallel, the first two authors trained the experimenters and observers responsible for performing the evaluation at each company. Then the experimenters and observers executed this study following the aforementioned procedures between mid-October and November 2018. We scheduled each evaluation session for up to 7 hours including breaks, taking into consideration the availability of the participants.

D. DATA COLLECTION INSTRUMENTS
To support the aforementioned evaluation design, the researchers used the following six instruments (available in the Appendix) during the evaluation:

1. Slides with evaluation goals, agenda, and the informed consent and demographics forms.

2. Scripts for a live demo of the Q-Rapids tool assessment and improvement scenarios to give researchers a common artifact to show to the users of the Q-Rapids tool (similar to the examples in Section III). In addition, a document comprising the specification of the QM (e.g., strategic indicators).



3. The task description, consisting of specific scenarios for assessing and improving the software quality of the products in which the participants were involved on a daily basis.

4. A questionnaire to collect the practitioners’ perception regarding the use of the Q-Rapids tool, using the constructs of Table 3.

5. Presentation and moderation guidelines for the researchers to conduct a debriefing session about the strengths of the current tool and any suggestions for improvement.

6. The observation protocol, where the observers recorded start time, attendees, end time, activities/events/deviations/disruptions, timestamp, and memos.

E. DATA ANALYSIS
The experimenter and the observer transcribed the participants’ answers (regarding the tasks, the questionnaires, and the cards from the debriefing sessions) and the observation protocol into three standardized Excel templates. This served to keep the data analysis consistent among the four companies.

We first carried out within-case analyses of the quantitative and qualitative data collected for each company. Then we compared, correlated, and integrated the results among the companies (cross-case analyses) [54].

Quantitative Analysis: We report descriptive statistics including the sample size (N), minimum (Min), maximum (Max), median (Mdn), and modal value (Mode) for the quantitative analyses. If a participant answered more than half of the questions of one construct with ‘‘I don’t know’’ (see the questionnaire in the fourth item of Section IV.D), we did not include his/her rating of this construct in our quantitative analysis. This happened on four occasions.

We performed a One-Sample Wilcoxon Signed Ranks Test17 [55], as it is suitable for testing hypotheses with small samples. This served to test whether the participants rated the QM significantly more positively or more negatively, i.e., to check whether or not the answers are significantly lower or greater than a selected middle point of the five-point rating scale of the questionnaire, i.e., H0: Mdn(X) = θ, where θ was set to 3 (the neutral point). If we were able to reject the null hypothesis (i.e., p < α with α = 0.05), we checked the standardized test statistic (T*), from which we derived whether the result was positive (i.e., Mdn(X) > θ if T* > 0) or negative (Mdn(X) < θ if T* < 0). Therefore, we also report the significance levels p and the standardized test statistics of the One-Sample Wilcoxon Signed Ranks Test (see Table 5). We used IBM SPSS Statistics 19 (including IBM SPSS Exact Tests for analyzing small samples) and set the confidence level of the test at 95% (i.e., α = 0.05).

17 https://www.ibm.com/support/knowledgecenter/en/SSLVMB_22.0.0/com.ibm.spss.statistics.algorithms/alg_nonparametric_onesample_wilcoxon.htm
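For readers who want to reproduce this kind of analysis outside SPSS, the sketch below runs an equivalent one-sample Wilcoxon signed-rank test in Python with SciPy against the neutral point, and also applies the ‘‘I don’t know’’ exclusion rule described above. The ratings are invented for illustration only; they are not the study data, and the exact item-to-construct aggregation used in the study may differ.

```python
import numpy as np
from scipy.stats import wilcoxon

NEUTRAL = 3  # hypothesized median: the neutral point of the five-point scale

# Hypothetical ratings of one construct (four items); None encodes "I don't know".
responses = [
    [4, 4, 5, 4],           # participant 1
    [3, None, 4, 4],        # participant 2
    [None, None, None, 5],  # participant 3: more than half "I don't know" -> excluded
    [5, 4, 4, 3],           # participant 4
]

def construct_score(items):
    """Average the answered items; drop the rating if more than half are 'I don't know'."""
    answered = [i for i in items if i is not None]
    if len(items) - len(answered) > len(items) / 2:
        return None
    return float(np.mean(answered))

scores = np.array([s for s in map(construct_score, responses) if s is not None])

# One-sample Wilcoxon signed-rank test against the neutral point,
# one-sided for a positive perception (H0: Mdn(X) = 3 vs. H1: Mdn(X) > 3).
stat, p = wilcoxon(scores - NEUTRAL, alternative="greater")
print(f"n={len(scores)}, Mdn={np.median(scores):.2f}, W={stat}, p={p:.3f}")
```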

Qualitative Analysis: We used thematic analysis [56] to analyze the participants’ feedback on the Q-Rapids tool. At least two researchers derived themes inductively by coding and interpreting all observation protocols, independently of each other. Then these researchers compared their results and resolved any deviations. Moreover, we performed several peer review meetings, which included all experimenters, observers, and analysts, to review the interpretations of the elicited qualitative data. This served to keep the qualitative analyses grounded in the collected evidence and ensured rigor.

V. RESULTS
In this section, we will present our results according to RQ1 and RQ2. The results combine the quantitative analysis (see the descriptive statistics and hypothesis testing in Table 5) and the qualitative analysis (thematic analysis).

A. RQ1: WHAT IS THE PRACTITIONERS’ PERCEPTION REGARDING THE INTEGRATION OF QMS INTO SOFTWARE ANALYTICS TOOLS WHEN ASSESSING SOFTWARE QUALITY IN ASD?

FINDING #1. THE PARTICIPANTS UNDERSTOOD THE QM WHILE ASSESSING SOFTWARE QUALITY

All participants claimed that the strategic indicators, product factors, and process factors provided in the QM were understandable (see Table 5: N = 20, Min = 3, Max = 5, Mdn = 4, Mode = 4, p = 0.000, T* = 4.167). For our hypothesis H1: understandability, the One-Sample Wilcoxon Signed Ranks Test revealed that the participants rated the understandability of the QM significantly higher than the neutral point (Mdn(X) = 3). Therefore, the null hypothesis H01: understandability, i.e., that the participants’ perception is neutral towards the understandability of a QM within a software analytics tool for assessing software quality, can be rejected.

The participants gave us the following main reasons why a QM within a software analytics tool is understandable (and how understandability can be improved):

• Clear structure for users. This was mentioned explicitly by six participants from two use cases; e.g., ‘‘I understand and like it [the QM view] simplified’’ [UC 3]; ‘‘[It has a] clear structure’’ [UC 2]; ‘‘It looks simple!’’ [UC 3]; ‘‘[It] provides [a] ‘common language’ from analysis (Product Owners) to action (Developers)’’ [UC 3].

• Appropriate visualizations. Fifteen participants from all use cases emphasized appropriate visualization and provided examples, such as getting information where users expect it to be; e.g., for the Q-Rapids tool, having the factor calculation rationale in the detailed strategic indicators view ("At the moment, the 'Detailed strategic indicators' [view] leads directly to the factors. It would be nice to be able to see directly what metrics the strategic indicator is associated with." [UC 2]).


TABLE 5. Quantitative results of the evaluation of the QM within a software analytics tool for assessing and improving software quality.

• Support for customized settings, e.g., having explanations and descriptions or support for setting up appropriate thresholds. Another participant associated the support for setting up appropriate thresholds with the understandability of the QM and asked "What are the proper/default values for thresholds?" [UC 4].

• Including descriptions. Three participants from two use cases emphasized explanations of the elements of QMs. For the Q-Rapids tool, for example, they stated that "the descriptions of metrics need to be better. The factors that are made of the metrics should also have better descriptions" [UC 3], e.g., by "including descriptions of formulas, e.g., average of factors" [UC 3]. These are additional features that would foster understandability; a minimal illustration of such an aggregation formula is sketched after this list.
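The sketch below illustrates, under purely hypothetical names and weights, the kind of aggregation formula the participants asked to see documented: normalized metrics combined into factors, and factors into strategic indicators, via weighted averages. It is not the Q-Rapids quality model itself, only an example of the documented formula style.

```python
# Hypothetical illustration of a documented aggregation formula:
# metrics (normalized to [0, 1], 1.0 = best) -> factors -> strategic indicator.

def weighted_average(pairs):
    """Combine (value, weight) pairs into a single score in [0, 1]."""
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)

# Hypothetical normalized metrics coming from the connected data sources.
metrics = {"test_success_rate": 0.92, "non_duplicated_code": 0.70, "resolved_bug_ratio": 0.55}

# Factor level: e.g., "code quality" documented as the average of two metrics.
code_quality = weighted_average([(metrics["non_duplicated_code"], 0.5),
                                 (metrics["test_success_rate"], 0.5)])
issue_handling = weighted_average([(metrics["resolved_bug_ratio"], 1.0)])

# Strategic-indicator level: weights that a company could customize.
product_quality = weighted_average([(code_quality, 0.7), (issue_handling, 0.3)])

print(f"code_quality = {code_quality:.2f}, product_quality = {product_quality:.2f}")
```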

FINDING #2. THE PARTICIPANTS FOUND THE QM ELEMENTS RELIABLE FOR ASSESSING SOFTWARE QUALITY
The computed QM elements were perceived as reliable by the participants (see Table 5). The One-Sample Wilcoxon Signed Ranks Test showed that they considered the reliability of the QM elements as positive, as the null hypothesis H02: reliability, i.e., that the participants' perception is neutral towards the reliability of a QM within a software analytics tool for assessing software quality, was rejected by this test.

The reasons given by the participants for why they found the QM reliable, and how reliability can be further improved, can be categorized as follows:

• Stable computations. This was explicitly mentioned by one participant: "Most of the calculation of [the] current model is quite stable now and seems to follow the model" [UC 3]. Furthermore, two participants from one use case explained that the existing QM calculates values that can be used to understand the quality of the software system ("When [the QM] is producing 'weird' values, I can know something is wrong with the system" [UC 3]).

• Computational transparency, as mentioned by six participants from two use cases: "In the strategic indicator view, in order to trust data it is important to identify who made the threshold changes if there have been some changes." [UC 3]. Still, one participant mentioned that transparency should be improved: "It is aggregated data from other tools; I don't know how exactly it is computed." [UC 4].

• Updated computations, e.g., additional information about the last time the data was updated. One participant stated, "If Jenkins was updated 5 min ago [and this is shown], I know I can trust it." [UC 3].

• Possibility to customize the QM, e.g., by adjusting the weights of the relations between QM elements. This was pointed out by three participants from three use cases and had an influence on their perceptions regarding reliability. One participant suggested improving the tool support for these customizations: "Specifying weights for strategic indicators seem a bit weak." [UC 2].

• Support for exceptions/false positives. Another participant explained this with the following example: "Some files have to be complex in order to provide their functionality. It should be possible to mark them as such, so they don't affect the metrics and show our perception." [UC 4]. A minimal sketch of such an exclusion is given after this list.
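The following sketch shows, with entirely hypothetical file names, thresholds, and field names, how flagging files as accepted exceptions could keep them from lowering a complexity metric, as requested in the last point above.

```python
# Hypothetical "accepted exception" handling: flagged files are excluded
# before the complexity metric is aggregated.

files = [
    {"path": "src/parser.c", "cyclomatic_complexity": 38, "accepted_exception": True},
    {"path": "src/ui.c",     "cyclomatic_complexity": 7,  "accepted_exception": False},
    {"path": "src/net.c",    "cyclomatic_complexity": 21, "accepted_exception": False},
]

THRESHOLD = 15  # hypothetical complexity threshold

considered = [f for f in files if not f["accepted_exception"]]
violations = [f for f in considered if f["cyclomatic_complexity"] > THRESHOLD]

# Metric value: share of considered files at or below the threshold (1.0 = best).
complexity_metric = 1 - len(violations) / len(considered)
print(f"complexity metric = {complexity_metric:.2f} "
      f"({len(files) - len(considered)} file(s) excluded as accepted exceptions)")
```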

FINDING #3. THE PARTICIPANTS FOUND THE QM ELEMENTS USEFUL FOR ASSESSING SOFTWARE QUALITY
The participants rated the QM elements as useful (see Table 5). The One-Sample Wilcoxon Signed Ranks Test showed that the corresponding null hypothesis can be rejected; hence, we reject our null hypothesis H03: usefulness, i.e., that the participants' perception is neutral towards the usefulness of the QM elements within a software analytics tool for assessing software quality.

The following categories of reasons were provided by the participants:
• Gathering and combining data from heterogeneous sources. This was stated by seven participants from three use cases; e.g., "Merging information from different sources is useful." [UC 4]; "[The QM provides a] general view and one entry for many related tools (SonarQube, Project Manager, etc.). [There is a] combination of many entries" [UC 1]; "Good combination to direct tools used" [UC 3]; "Good synthesis from a wide set of data" [UC 1].

• Inclusion of product and process quality. This is also one of the strengths that impact the usefulness of the QM for assessing software quality (mentioned explicitly by one participant): "It is good to see that you included both aspects. Because from the product owner perspective, the process perspective is maybe even more interesting. And there are less tools to assess the process. Indeed, there are a lot of tools managing the product quality but there are less tools for the process." [UC 4]. Furthermore, the possibility to show information about the product or the process separately was mentioned by two participants from two use cases: "It would be good to have on the highest level a distinction between product quality and process quality." [UC 4]. For the Q-Rapids tool, "it is not clear [at the moment] which ones of the factors belong to the category product and which to the process category" [UC 2].

• Different abstraction levels of information for a variety of stakeholders. This was emphasized by three participants from two use cases, e.g., "Knowing which files are increasing the complexity is crucial for the developers [...] usually the product owner is not that much interested in these details." [UC 4].

• Real-time raw data dashboards. Three participants from two use cases also perceived the real-time raw data dashboards as useful because a "better level of information improves reaction time and agility" [UC 1]. One of these participants further suggested adding raw data dashboards that consider multiple data sources, i.e., enabling raw data aggregation: "[There is] no raw data aggregation: e.g. separate ElasticSearch documents for GitLab data and SonarQube data" [UC 4]. These suggestions could help to further increase the usefulness of the QM elements (a minimal illustration of such an aggregation is given after this list).

• Appropriate terminology in visualizations. Furthermore, two participants from two use cases claimed that appropriate visualizations further improve the usefulness of the QM elements. This includes process-specific (i.e., agile) terminology like time tags (e.g., sprints, milestones) in order to ensure compliant use of terminology.
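The sketch below illustrates, with entirely made-up records and field names, the kind of raw data aggregation requested above: per-project summaries stored as separate documents per source tool are merged into one record per project so that a single dashboard query can span both sources. It does not use any real Elasticsearch API.

```python
# Hypothetical per-project documents, one list per source tool.
gitlab_docs = [{"project": "alpha", "open_issues": 12, "commits_last_week": 54}]
sonarqube_docs = [{"project": "alpha", "code_smells": 230, "coverage": 0.68}]

def merge_by_project(*doc_lists):
    """Merge documents from several sources into one record per project."""
    merged = {}
    for docs in doc_lists:
        for doc in docs:
            merged.setdefault(doc["project"], {}).update(doc)
    return list(merged.values())

print(merge_by_project(gitlab_docs, sonarqube_docs))
# [{'project': 'alpha', 'open_issues': 12, 'commits_last_week': 54,
#   'code_smells': 230, 'coverage': 0.68}]
```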


B. RQ2: WHAT IS THE PRACTITIONERS' PERCEPTION REGARDING INTEGRATING QMS INTO SOFTWARE ANALYTICS TOOLS WHEN IMPROVING SOFTWARE QUALITY IN ASD?
FINDING #4. THE PARTICIPANTS FOUND THE TRACEABILITY BETWEEN ABSTRACT QUALITY ASPECTS AND THE RAW DATA OF THE QM TO BE AT THE RIGHT LEVEL OF DETAIL TO IMPROVE SOFTWARE QUALITY
The participants assessed whether the QM provides the right level of detail needed to drill down from abstract levels of the quality model (e.g., strategic indicators) to the raw data (see Table 5). Although three participants from two use cases considered traceability as strongly positive, one participant rated the traceability of the QM as strongly negative. The One-Sample Wilcoxon Signed Ranks Test showed that the participants rated traceability significantly positively, although the significance of this finding is the lowest in comparison to the rest of the findings (p = 0.023). Therefore, our null hypothesis H04: right level of detail, i.e., that the practitioners' perception is neutral regarding the right level of detail to trace between abstract quality aspects and raw data of a QM within a software analytics tool to improve software quality, can be rejected.

The participants provided the following reasons regarding positive traceability:
• Drill-down for in-depth knowledge. Nine participants from two use cases considered it positive that the QM aggregates raw data into abstract quality indicators that are traceable, as "we need to drill down and be able to trust" [UC 3], meaning that "the traceability [from strategic indicators to raw data] is very useful." [UC 1]. One participant further explained: "We have different levels and you drill down if you miss some information" [UC 3]. Increased traceability can be achieved if it is possible to "drill down directly to the metrics from the visualization of the strategic indicators view. [This] would be nice." [UC 2]. Improving the connection from assessed metrics to raw data dashboards was also mentioned in another use case; e.g., "from strategic indicators to raw data in the raw data dashboard seems now to work [for a subset] very nice" [UC 3].

• Holistic QM view. Five participants from two use cases stated the need for a holistic view of the complete QM. They further elaborated on it, e.g., "I need to see the whole hierarchy [as a whole]" [UC 3]; "I would like to have a direct link from the assessed metrics in the QM to the correlated raw data in the raw data dashboard" [UC 4]; "easier access to the data visualized with the raw data dashboard from the implemented QM [is required] for a final product" [UC 4].

• Traceability to source tools. One participant stated that traceability to the source tools (i.e., data producers such as GitLab or SonarQube) is another aspect of the QM's traceability; e.g., "including the URL in the metrics textual views" [UC 1]. This traceability would also provide data to allow users to understand the rationale behind the computation.

FINDING #5. THE PARTICIPANTS TOOK RELEVANT ACTIONS BASED ON THE QM FOR IMPROVING SOFTWARE QUALITY
The participants assessed the actions that can be taken based on the information provided by the QM as relevant (see Table 5). The One-Sample Wilcoxon Signed Ranks Test supports this finding, as the null hypothesis H05: relevance, i.e., that the participants' perception is neutral towards taking relevant actions to improve software quality based on the information provided by a QM presented in a software analytics tool, can be rejected.

The participants provided the following explanations:

• The QM supports explicit insights to solve quality problems. Eighteen participants from the four companies stated this. Among the reasons, we can find "it [the QM] allow us to navigate into concrete real metric in order to make right decisions." [UC 1]. For example, "quality would improve by fixing the 5 errors" [UC 2], and the QM supports to "concentrate to speed up clarifying the real bugs" [UC 3]. One of these participants further elaborated: "Knowing which files are increasing complexity is crucial for the developers to know; this can be done by the raw data dashboard because it is great at pinpointing the issues in the code. Usually [the] product owner is not that much interested in these details but for taking actions, the developers need to know where the problem is." [UC 4].

• Providing precise information to be able to act on it. Participants indicated the importance of actionable information, and also indicated the case in which it was not precise enough to be able to act on it; e.g., "It is possible that [the] raw data dashboard provides enough information in some cases and for some metrics. But for example, there is no data for finding the root cause for test success related information." [UC 2]. In addition, "this raw data dashboard will not provide me possible solutions. For example, you can reduce the bug density by two methods. Either you don't include these identified bugs in the current or next planned Sprint [i.e. postpone it] or the second way how to handle bug density is to hire more [...] developers to increase the resources." [UC 4].

FINDING #6. THE PARTICIPANTS TOOK ACTIONS PERCEIVED AS USEFUL BASED ON THE QM FOR IMPROVING SOFTWARE QUALITY
The participants perceived the actions that can be taken based on the information provided by the QM as useful for improving software quality (see Table 5). This finding is supported by the rejected null hypothesis of the One-Sample Wilcoxon Signed Ranks Test, which showed that the participants gave positive ratings. Therefore, our null hypothesis H06: perceived usefulness, i.e., that the participants' perception is neutral towards taking perceived useful actions for improving software quality based on the information provided by a QM presented in a software analytics tool, can be rejected.

The reasons given by the participants can be categorized as follows:
• Useful for improving quality deficiencies. As the participants perceived that the QM provides explicit insights to solve quality problems, they explained why the actions taken based upon the QM could support them in practice; e.g., "I can use [the QM] to identify problems" [UC 4]; "better information related to the status of the project allows us to take more accurate decisions during meeting plan" [UC 1]. The information provided by the QM enables "better detection of quality issues which arrives to the customer" [UC 1]. Moreover, one participant was concerned that "the quality of the commits need to be reviewed because of high trend of error identification." [UC 3]. Another person claimed: "I don't have needed information how to use this in [practice] to improve quality but I guess this could be useful and can be used to improve quality" [UC 3], for example by "[putting] more people to work on issues" [UC 3].

• Early detection of quality issues. Two participants from two use cases claimed that the real-time information improves reaction time and agility, as "the more information about the environment [is there,] the more possible adjustments for the project [are feasible]" [UC 4]. Furthermore, one participant explained that "early detection of issues [implies issues that are consequently] more simple to fix" [UC 1].

• Combination of different data providers in a single tool. Furthermore, the combination of many data providers in a single tool is useful for their work. One participant summarized this with: "Giving developers additional tools increases their work time. It takes more effort and time, more overload that they don't have. Having everything in one single tool is easier to convince them." [UC 4]. However, convincing practitioners to use a tool is still challenging; e.g., "But we had also challenges to convince them to use even SonarQube." [UC 4].

FINDING #7. THE PARTICIPANTS INTEND TO USE THE QM TO IMPROVE SOFTWARE QUALITY
With our final question, we asked the participants whether they would use the QM within a software analytics tool to improve software quality, or whether they would prefer manual/ad-hoc assessment and improvement without tool support. The participants had the intention to further use the implemented QM (see Table 5). For our hypothesis H7: behavioral intention, the One-Sample Wilcoxon Signed Ranks Test again revealed that the null hypothesis H07: behavioral intention, i.e., that the participants' perception is neutral towards intending to use a QM within a software analytics tool to improve software quality rather than using a manual approach, can be rejected.

The participants gave us the following main reasons:
• Raw data dashboard. One of the participants explicitly highlighted this twice: "I would include it [the raw data visualization] in my daily work." [UC 4]; "In the current state I would use Kibana [the raw data dashboard]." [UC 4].

• QM assesses quality at every lifecycle point. This was mentioned by nine participants from four use cases because the Q-Rapids tool "would allow to evaluate the impact of new features and the quality of the project at every lifecycle point" [UC 1]. One of these participants continued: "It is based on concrete metrics which allow to measure and evaluate [...] the quality of our product during the development life cycle; [and] has a significant progress in the software development industry." [UC 1].

• Customization of the QM. Three participants from one use case emphasized the possibility to customize the QM, e.g., to add new strategic indicators, as a strength of using the QM: "[There is the] possibility to add new strategic indicators" [UC 3] and "the feature of removing metrics from the graph" [UC 3]. "We love it!" [UC 3]. This allows practitioners to customize the QM according to their needs, which is an explicit prerequisite of using the QM for one participant, who stated that "if the QM is improved to provide me more detail, I'd probably use it." [UC 4]. In general, additional customizations would increase the behavioral intention, as stated by three participants from three use cases. For instance, they suggested for the Q-Rapids tool to specify a "different mechanism to assess weights for the quality factors while assessing the strategic indicators" [UC 2], because one of them additionally asked: "Is there not a way to define weights for the strategic indicators' factors?" [UC 1].

VI. DISCUSSION: CHALLENGES AND LESSONS LEARNED
This section presents the discussion about RQ3: What are the challenges and lessons learned that practitioners face when integrating QMs into software analytics tools for assessing and improving software quality in ASD?

As a final step, based on their own experience, the practitioner team of this study reflected on the challenges and lessons learned when using the Q-Rapids tool to assess and improve software quality. As mentioned in Section IV, they provided written feedback, which was processed by means of a thematic analysis [56]. The resulting topics are presented below, with a distinction made between challenges and lessons learned.


A. CHALLENGES REGARDING THE USE OF THE Q-RAPIDS TOOL
CHALLENGE #1. NEED FOR TAILORING TO THE COMPANY
A software analytics tool like the Q-Rapids tool requires a comprehensive understanding of the company's software development or lifecycle process to be able to extract a solid QM as required by the tool. Although the tool can be used from the beginning with common strategic indicators, it requires extra effort to completely adapt its QM to the needs of each company's business.

CHALLENGE #2. NEED FOR A SHARED LANGUAGE
One aspect experienced by the companies to date is the need to use a language already known and understood by the potential users. Here, the QM (names for metrics, factors, strategic indicators) has to be clear and based on the practical language known and used by product owners and developers. Of course, this will improve with time, although it also requires a customization process for each company (the language and terms might not be standardized across all software houses in Europe).

CHALLENGE #3. NEED TO BE INFORMATIVE FOR THE USER
A challenge with tools such as the Q-Rapids tool is how to interpret the information that is rendered. From an end-user's perspective, they need to understand whether a certain value of an indicator/factor is good or bad, and how it relates to the previous stage in the project or to other projects (a benchmark needs to be established and known to the users, and the 'delta' needs to be shown clearly).

CHALLENGE #4. NEED FOR INTEGRATION WITH OTHER TOOLS
Currently, in many software development companies and software houses, teams are already using some common tools (such as Jira, GitLab, and SonarQube) to manage the development process and the quality of their code and products. Such tools are now embedded into developers' way of working, and into the processes used in companies. Therefore, when introducing the Q-Rapids tool, it has to be clear when it is going to be used to assess and improve software quality, e.g., before checking the status of the project in GitLab or Jira or afterwards. It needs to be clarified when exactly in the agile (or any other) process this new tool should be used.

CHALLENGE #5. NEED FOR TRANSPARENCY AND MORE CLARITY ON ACTIONS
While automatic data gathering from data sources can be seen as an advantage, some end-users (i.e., product owners) may have a need to understand from which raw data the factors and indicators are computed and how. For instance, for one of the companies, this was the case for the product quality indicator: What data is aggregated to arrive at a single value? One of the crucial aspects for future adoption is clarity regarding possible decisions and courses of action. The Q-Rapids tool should be able to propose a catalog of suggested decisions (as in a typical decision support system), but the user should also be aware (either by training, through practice, or supported by a clear manual) of which decisions can be made and why. One of the possible obstacles to successful adoption would be offering solutions without any clear indication and information of how the tool can be used, for what decisions, etc.; the potential user would be lost and discouraged from further usage. Results should be connected to the process (e.g., allowing users to visualize the assessment for the last sprint, including the next milestone, as an informative field to provide some context for better understanding the results).

CHALLENGE #6. SIMPLIFY TOOL INSTALLATION
A general comment was that the deployment of the Q-Rapids tool in the companies was complex and cumbersome. However, a closer look at this statement shows two sides of the coin. On the one hand, although the tool included a wide variety of connectors for the most popular data sources in the software industry, companies needed additional ones, which was not trivial in terms of the knowledge required. One company reported that choosing the appropriate factors, defining the metrics, and finding the best data sources for this purpose turned out to be extremely challenging due to several factors (e.g., access to relevant databases). This raised the issue of which competencies are needed in the process, such as knowledge of data protection regulations (e.g., the EU GDPR). On the other hand, adding new strategic indicators appears to be much easier. To sum up, software analytics tools should offer easy deployment options such as Docker containers, and facilitate as much as possible the setting up of the tool (e.g., a user interface for the configuration of the different elements to decrease the technological knowledge required).

CHALLENGE #7. NEED FOR AN EFFICIENT TOOL CONFIGURATION PROCESS
Some situations make the adoption of the Q-Rapids tool (and other software analytics tools) challenging from a technological perspective. The first of these is the configuration of the connectors. One constraint is that connector configuration parameters (such as username, password, and endpoint) need to be set up per project, even though the data sources are the same across projects (e.g., data sources may have multiple deployments). In other words, for each data source in each case, the connectors for mining the data as well as for the feedback loop back to the backlog must be customized. To increase the feasibility of any software analytics tool, it is necessary to provide configurable connectors via a file or user interface (a minimal sketch of such a file-based configuration is given below), as this decreases the need for specific coding skills and decreases the overall adoption time. Second, some of the techniques may require additional tooling to address the subsequent challenges regarding technological integration and licenses (e.g., tools for Bayesian networks provide limited facilities for such scaling, so any business-relevant purpose would require purchasing a license for the tool).
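The sketch below illustrates one way such file-based connector configuration could look; the file format, field names, and class are assumptions for illustration and do not describe the actual Q-Rapids connectors.

```python
# Hypothetical per-project connector configuration read from one JSON file,
# instead of hard-coding username, password, and endpoint per deployment.
import json
from dataclasses import dataclass
from typing import List

@dataclass
class ConnectorConfig:
    source: str       # e.g., "gitlab", "sonarqube", "jira"
    endpoint: str     # base URL of the data source instance
    user: str
    token: str        # in practice read from a secret store, not a plain file
    project_key: str

def load_connector_configs(path: str) -> List[ConnectorConfig]:
    """Load all per-project connector settings from a JSON file (assumed format)."""
    with open(path, encoding="utf-8") as fh:
        return [ConnectorConfig(**entry) for entry in json.load(fh)]

# Assumed file content:
# [{"source": "gitlab", "endpoint": "https://git.example.com",
#   "user": "qrapids", "token": "***", "project_key": "alpha"}]
```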

B. LESSONS LEARNED FROM THE USE OF THE Q-RAPIDS TOOL
LESSON LEARNED #1. EMBRACE AN INCREMENTAL ADOPTION APPROACH
It usually takes some time for any new solution to be fully embedded into a company's software development style, and any new process is usually based on some play-and-try sessions in practice. We recommend adopting the Q-Rapids tool in a company for a smaller-scale product or solution being developed in order to enable extension of the solution to a wider number of products later on. This will allow adjusting the QM as well as potentially tailoring the data connectors used.

LESSON LEARNED #2. BOOST TRANSPARENCY AS A BUSINESS VALUE
Incremental adoption also helps to get an overview and enable fruitful discussions among developers and related stakeholders of the product regarding valuable indicators. For instance, one of the companies reported that, after the experiences with two solutions using the Q-Rapids tool, the user teams clearly recognized the added value provided by the good visibility of the company-defined strategic indicators and quality factors through real-time aggregated and transparent data.

LESSON LEARNED #3. USE STRATEGIC INDICATORS AS AN ASSET FOR MONITORING PROGRESS
Strategic indicators are useful not only for knowing current values but also for understanding the progress of a project. One of the companies stated explicitly that Q-Rapids end-users need to know how a particular indicator's value relates to the previous stage in the project or to other projects (a benchmark needs to be established and known to users, and the 'delta' needs to be shown clearly).

LESSON LEARNED #4. HAVE A SINGLE ACCESS POINT TO SOFTWARE-QUALITY-RELATED DATA
With the Q-Rapids tool, the project manager no longer needs to know which tools provide which metrics and to connect to them in order to understand the various measurement processes and to link and synchronize results from different tools. The various indicators give the project manager a very easy and fast way to analyze and understand a potentially complex situation. This aspect is completely built into the Q-Rapids QM and its associated dashboards, where the various aspects collected by the different software development lifecycle metrics are analyzed.

LESSON LEARNED #5. USE QMS INTEGRATED INTO SOFTWARE ANALYTICS TOOLS AS A COMMUNICATION SPACE
The Q-Rapids tool provides effective means for different product stakeholders to discuss and interact on the basis of a common dashboard. The various metrics and results are given on a factual basis and facilitate the interaction between the various team members and the project manager in the decision-making process. Having different levels of abstraction and views (e.g., Strategic Dashboard, Raw Data Dashboards) opens usability up to a wider range of users (different roles are interested in different levels of abstraction).

LESSON LEARNED #6. ENABLE TAILORING TO A PRODUCT OR PROJECT
Even within the scope of a single company, the QM might not be exactly the same. In a normal case, many of the metrics will persist; however, deviations over the products being developed or ongoing projects may occur. These variations need to be identified and incorporated into the QM (and then into the tool) as some kind of parameter. In this direction, one of the companies reported that some Q-Rapids end-users would like the opportunity to play with the model itself (plug or unplug some input data) and to assign the weights used in the aggregation process. After their first experience in a pilot project, another company decided to use their expert knowledge to identify interesting strategic indicators that are not provided by other tools, or which would at least be presented from a different viewpoint by the Q-Rapids tool. In another company, the person in charge of R&D decided that it was worth measuring the general management quality of two projects and independently added a new indicator to measure that. In a matter of seconds, the customization was achieved on the company's common Q-Rapids website, and this new indicator was then created for all monitored projects. Therefore, we learned that support for customizing the QMs in software analytics tools is key to assessing and improving software quality, which is one of the characteristics of the Q-Rapids tool.

LESSON LEARNED #7. INVOLVE EXPERTS
Putting a product like the Q-Rapids tool into action requires knowledge from software engineers, especially technology experts and data scientists. The companies pointed out some particular situations. First, to set up the connectors, a lot of expertise is needed: end-users of the tool who can identify the relevant data content from the UI; software engineers who can assess the quality of the raw data with respect to the functionality of the connector; and data scientists for verifying the relation between UI fields and the actual structure of the database. Second, some of the techniques used by the analytics tool, such as the option of Bayesian networks for creating strategic indicators [57], require specific competence. Last, implementation also requires knowledge of particular technologies that a company might not have, e.g., Kafka.


To sum up, the role of a specialist setting up the Q-Rapids tool environment will be crucial for industry adoption. If no specialist is available, it may be necessary to hire someone or outsource the development in order to implement the solution.

VII. THREATS TO VALIDITY
As with any empirical study, there might be limitations to our research method and findings. This section discusses possible threats to validity in terms of construct, conclusion, internal, and external validity [46] and emphasizes the mitigation actions applied.

Construct validity refers to whether the employed measures appropriately reflect the constructs they represent. We used seven validated, reliable instruments from the literature to correctly operationalize our evaluation goals and the identified quality aspects. Clearly defining the constructs reduced the risk of mono-operation bias. We also included open questions and comment fields to further collect the participants' opinions. The open feedback sessions enabled us to further elaborate the practical relevance of the strengths and drawbacks of the Q-Rapids tool with all the participants. The use of quantitative and qualitative measures and observations reduced mono-method bias. Furthermore, we aimed at creating a safe environment, encouraging the participants to highlight any negative aspects and make suggestions for the improvement of the Q-Rapids tool. Finally, some of our results (particularly the challenges) could be caused by a suboptimal Q-Rapids tool implementation rather than by the integration of QMs into software analytics tools. Still, these results are useful for others to learn how to build such an infrastructure in realistic settings.

Conclusion validity refers to whether correct conclusions are drawn from (correct) statistical analysis. To ensure the reliability of this evaluation, the measurement plan and procedure, including the creation of instruments for the implementation and execution, were documented in detail. For instance, the instruments were reviewed by the complete researcher team and by one member of the practitioner team, and during the analysis we involved researchers who had not been involved in the creation of the Q-Rapids tool. In this way, we mitigated risks such as using poorly designed instruments or fishing for results during the analysis, which would have led to a subjective analysis. Another point deserving attention was the training we provided to the experimenters and observers to enable all of them to apply the treatment in a similar way. For instance, the participants in the different companies received the same training and Q-Rapids tool version. During the task sessions, the participants performed concrete tasks aimed at making them use the main features of the Q-Rapids tool. In order to compare the results among the companies, and due to time constraints, these tasks had to be the same for all cases. Thus, some might not have been optimal for all the different projects involved. Furthermore, this version of the Q-Rapids tool included real project data from the four use cases. However, not all available data could be used on the day of the evaluation, and at least one of the selected projects had finished beforehand. Therefore, the data had not changed during the previous weeks. This might have had an influence on the answers of the participants regarding the practical experience they gained with the tool for their work. Thus, the results can only be interpreted as an indication of appropriateness. Furthermore, we were aware that we would only get a small sample size (i.e., 22 participants) and looked for appropriate statistical tests.

Internal validity refers to whether observed relationships are due to cause-effect relationships, and whether it is possible to explain unequivocally the changes in the dependent variable due to the influence of the independent variables.

We evaluated the Q-Rapids tool by drawing a convenience sample of product owners, managers, and developers working in several application domains and software development environments. One limitation of our work is that we were not able to get a random sample of participants in the pilot projects of the companies.

In addition, we defined an evaluation protocol in advance, which included a specific description of our planned procedure and the order of using the materials, i.e., a script of the demonstrations to the participants, the tasks, the questionnaire, and an explanation of all the steps that had to be performed by the experimenter and the observer. We distributed all the materials to the researchers as well as to the use case representatives (who did not participate) to collect feedback in order to further improve our design. After all the partners had agreed on the final version of the evaluation guidelines, we executed the evaluation accordingly. This should mitigate the fact that we needed to split the work of conducting the evaluation among different researchers and partners. Some of the eight researchers who conducted the evaluation were involved in developing the Q-Rapids tool modules. To minimize that bias, we made sure that in each case there were at least two researchers present, one acting as the moderator/experimenter and one as the observer, to emphasize that the participants could speak freely. After inspecting and analyzing all the collected data, including the observation protocol, we could not identify any influence by these researchers on the participants' perceptions.

External validity refers to whether the findings of the study can be generalized. Our results are tied to the context of the companies involved in the Q-Rapids project. Our goal was to better understand practitioners' perception. We characterized the environment as realistically as possible and studied the suitability of our sampling (see Section IV). This case study could be the input for further meta-analysis aimed at generalizing the results (e.g., [58]).

VIII. CONCLUSIONS
Agile software development companies need tools to continuously assess and improve software quality. For this study, we conducted a case study across four European ASD companies to explore practitioners' perceptions on the integration of QMs into software analytics tools. We aimed at exploring key aspects needed for the widespread adoption of these tools in industry, namely understandability, reliability, usefulness, right level of detail, relevance, perceived usefulness, and behavioral intention.

Twenty-two practitioners involved in the study agreed that they understood the integration of QMs into software analytics tools and found the QM elements reliable and useful for assessing software quality. In addition, the practitioners found the QMs to be at the right level of detail and the actions taken from the QMs to be relevant and perceived as useful, and they intend to use QMs within software analytics tools. These findings are complemented with reasons for how these qualities were achieved in the four companies and how they can be improved further.

Taking into account that the introduction of a new tool and its measurable impact usually takes several months to years, the results are very promising. For future adoption, we reported seven challenges and seven lessons learned from four practitioners leading the application of QMs within software analytics tools in their ASD companies. One hope we associate with our work is that it will not only contribute directly to the existing body of knowledge on the use of QMs within software analytics tools in industry, which is currently still weak, but that it will also encourage other practitioners to join us in exploring this promising solution to the problem of software quality assessment and improvement in industry.

Future work will consist of further supporting practitioners in meeting software quality challenges:

• We observed that the definition of QMs varies among companies. In addition to offering tool support for QMs as plug-ins in software analytics tools, ongoing work is aimed at generating QMs not only from expert knowledge but also using more cost-effective data mining and analysis algorithms (e.g., identifying interrelations between QM elements), and at creating a QM repository.

• Building a complete software analytics infrastructure in a company (e.g., all data available from a single point) requires time and should be done incrementally. Based on the current infrastructure in the four companies of this study, other software quality scenarios are being explored, such as prediction and simulation of quality, and semi-automatic generation of quality requirements.

• We believe that software analytics can provide greater value in ASD companies by mining usage data to understand the behavior of users, which is another direction for future work.

• Companies need clarity about the process for deploying and applying software analytics tools in their environments. A next step is a process describing software analytics tool deployment and customization for new companies.

APPENDIX
The appendix contains the instruments used during the evaluation: https://figshare.com/s/217851eed7ec1da7ab86

ACKNOWLEDGMENT
The authors thank all members of Bittium, ITTI, Nokia, and Softeam who participated in the evaluation of the Q-Rapids software analytics tool. They also thank Axel Wickenkamp for implementing the Q-Rapids quality model, Dr. Liliana Guzmán for her support for previous evaluation activities, and Sonnhild Namingha for proofreading this article.

REFERENCES
[1] M. Ardolino, M. Rapaccini, N. Saccani, P. Gaiardelli, G. Crespi, and C. Ruggeri, "The role of digital technologies for the service transformation of industrial companies," Int. J. Prod. Res., vol. 56, no. 6, pp. 2116–2132, Mar. 2018.

[2] S. F. Wiesbaden, "AUTOSAR—The worldwide automotive standard for E/E systems," ATZextra Worldwide, vol. 18, no. 9, pp. 5–12, Oct. 2013.
[3] J. Bosch, "Speed, data, and ecosystems: The future of software engineering," IEEE Softw., vol. 33, no. 1, pp. 82–88, Jan. 2016.
[4] Magic Quadrant for Business Intelligence and Analytics Platforms. Accessed: Apr. 14, 2019. [Online]. Available: https://www.gartner.com/doc/3611117/magic-quadrant-business-intelligence-analytics
[5] C. Bussler et al., Business Intelligence for the Real-Time Enterprises: First International Workshop, BIRTE 2006, Seoul, Korea, Sep. 11, 2006: Revised Selected Papers. Berlin, Germany: Springer-Verlag, 2007. doi: 10.1007/978-3-540-73950-0.
[6] R. P. L. Buse and T. Zimmermann, "Information needs for software development analytics," in Proc. 34th Int. Conf. Softw. Eng. (ICSE), Jun. 2012, pp. 987–996.
[7] D. Zhang, S. Han, Y. Dang, J.-G. Lou, H. Zhang, and T. Xie, "Software analytics in practice," IEEE Softw., vol. 30, no. 5, pp. 30–37, Sep. 2013.
[8] H. Gall, T. Menzies, L. Williams, and T. Zimmermann, "Software development analytics (Dagstuhl Seminar 14261)," Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl, Germany, Tech. Rep. DROPS-IDN 4763, 2014, vol. 4, no. 6. doi: 10.4230/DagRep.4.6.64.
[9] T. Menzies, "The unreasonable effectiveness of software analytics," IEEE Softw., vol. 35, no. 2, pp. 96–98, Mar. 2018.
[10] J. Czerwonka, N. Nagappan, W. Schulte, and B. Murphy, "CODEMINE: Building a software development data analytics platform at Microsoft," IEEE Softw., vol. 30, no. 4, pp. 64–71, Jul. 2013.
[11] C. Sadowski, J. van Gogh, C. Jaspan, E. Soderberg, and C. Winter, "Tricorder: Building a program analysis ecosystem," in Proc. IEEE/ACM 37th IEEE Int. Conf. Softw. Eng., May 2015, pp. 598–608.

[12] C. Condo and B. Seguin. (2018). The Forrester New Wave™: Value Stream Management Tools, Q3 2018. Accessed: Mar. 11, 2019. [Online]. Available: https://www.forrester.com/report/The+Forrester+New+Wave+Value+Stream+Management+Tools+Q3+2018/-/E-RES141538
[13] V. Basili, Aligning Organizations Through Measurement: The GQM+Strategies Approach. Cham, Switzerland: Springer, 2014.
[14] Software Process: Improvement and Practice, Standard ISO/IEC 25010:2011, 2011. [Online]. Available: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=35733
[15] M. Kläs, J. Heidrich, J. Münch, and A. Trendowicz, "CQML scheme: A classification scheme for comprehensive quality model landscapes," in Proc. Conf. EUROMICRO, 2009, pp. 243–250.
[16] S. Wagner et al., "Operationalised product quality models and assessment: The Quamoco approach," Inf. Softw. Technol., vol. 62, pp. 101–123, Jun. 2015.
[17] L. Guzmán, M. Oriol, P. Rodríguez, X. Franch, A. Jedlitschka, and M. Oivo, "How can quality awareness support rapid software development?—A research preview," in Requirements Engineering: Foundation for Software Quality. Cham, Switzerland: Springer, 2017, pp. 167–173.
[18] Quality Management and Quality Assurance—Vocabulary, Standard ISO 8402, 1994. Accessed: Apr. 14, 2019. [Online]. Available: https://www.iso.org/standard/20115.html
[19] B. W. Boehm, J. R. Brown, H. Kaspar, M. Lipow, G. Macleod, and M. Merrit, "Characteristics of software quality," North Holland Publishing, New York, NY, USA, Tech. Rep., 1978.


[20] V. Basili and H. D. Rombach, "TAME: Integrating measurement into software environments," Dept. Comput. Sci., Univ. Maryland, College Park, MD, USA, Tech. Rep. CS-TR-1764, TAME-TR-1-1987, Jun. 1987. [Online]. Available: http://hdl.handle.net/1903/7517
[21] C. Ebert and R. Dumke, Software Measurement: Establish-Extract-Evaluate-Execute. Berlin, Germany: Springer-Verlag, 2007. doi: 10.1007/978-3-540-71649-5.
[22] N. Fenton and J. Bieman, Software Metrics. Boca Raton, FL, USA: CRC Press, 2014.
[23] C. Izurieta, I. Griffith, and C. Huvaere, "An industry perspective to comparing the SQALE and Quamoco software quality models," in Proc. ACM/IEEE Int. Symp. Empirical Softw. Eng. Meas. (ESEM), Nov. 2017, pp. 287–296.
[24] J.-L. Letouzey and T. Coq, "The SQALE analysis model: An analysis model compliant with the representation condition for assessing the quality of software source code," in Proc. 2nd Int. Conf. Adv. Syst. Test. Validation Lifecycle, Aug. 2010, pp. 43–48.
[25] J. P. Carvallo, X. Franch, and C. Quer, "Managing non-technical requirements in COTS components selection," in Proc. 14th IEEE Int. Requirements Eng. Conf. (RE), Sep. 2006, pp. 323–326.
[26] E. Kocaguneli, T. Menzies, and J. W. Keung, "On the value of ensemble effort estimation," IEEE Trans. Softw. Eng., vol. 38, no. 6, pp. 1403–1416, Nov. 2012.
[27] R. Wettel, M. Lanza, and R. Robbes, "Software systems as cities," in Proc. 33rd Int. Conf. Softw. Eng. (ICSE), 2011, p. 551.
[28] W. Maalej, Z. Kurtanović, H. Nabil, and C. Stanik, "On the automatic classification of app reviews," Requirements Eng., vol. 21, pp. 311–331, Sep. 2016.
[29] D. Ståhl and J. Bosch, "Modeling continuous integration practice differences in industry software development," J. Syst. Softw., vol. 87, pp. 48–59, Jan. 2014.
[30] B. Fitzgerald and K.-J. Stol, "Continuous software engineering: A roadmap and agenda," J. Syst. Softw., vol. 123, pp. 176–189, Jan. 2017.
[31] A. Vetro, R. Dürre, M. Conoscenti, D. M. Fernández, and M. Jørgensen, "Combining data analytics with team feedback to improve the estimation process in agile software development," Found. Comput. Decis. Sci., vol. 43, no. 4, pp. 305–334, 2018.

[32] M. Kuruba, P. Shenava, and J. James, "Real-time DevOps analytics in practice," in Proc. QuASoQ, 2018, p. 40.
[33] B. Snyder and B. Curtis, "Using analytics to guide improvement during an Agile–DevOps transformation," IEEE Softw., vol. 35, no. 1, pp. 78–83, Jan. 2018.
[34] M. Kim, T. Zimmermann, R. DeLine, and A. Begel, "Data scientists in software teams: State of the art and challenges," IEEE Trans. Softw. Eng., vol. 44, no. 11, pp. 1024–1038, Nov. 2018.
[35] D. Cukier, "DevOps patterns to scale Web applications using cloud services," in Proc. Companion Publication Conf. Syst., Program., Appl., Softw. Humanity (SPLASH), Oct. 2013, pp. 143–152.
[36] D. G. Feitelson, E. Frachtenberg, and K. L. Beck, "Development and deployment at Facebook," IEEE Internet Comput., vol. 17, no. 4, pp. 8–17, Jul. 2013.
[37] P. Rodríguez et al., "Continuous deployment of software intensive products and services: A systematic mapping study," J. Syst. Softw., vol. 123, pp. 263–291, Jan. 2017.
[38] S. Neely and S. Stolt, "Continuous delivery? Easy! Just change everything (well, maybe it is not that easy)," in Proc. Agile Conf., Aug. 2013, pp. 121–128.
[39] E. L. Vargas, J. Hejderup, M. Kechagia, M. Bruntink, and G. Gousios, "Enabling real-time feedback in software engineering," in Proc. ICSE, 2018, pp. 21–24.
[40] S. Dueñas, V. Cosentino, G. Robles, and J. M. Gonzalez-Barahona, "Perceval," in Proc. 40th Int. Conf. Softw. Eng. Companion (ICSE), 2018, pp. 1–4.
[41] M. van den Brand, S. Roubtsov, and A. Serebrenik, "SQuAVisiT: A flexible tool for visual software analytics," in Proc. 13th Eur. Conf. Softw. Maintenance Reeng., 2009, pp. 331–332.
[42] H. Huijgens, D. Spadini, D. Stevens, N. Visser, and A. van Deursen, "Software analytics in continuous delivery: A case study on success factors," in Proc. 12th ACM/IEEE Int. Symp. Empirical Softw. Eng. Meas. (ESEM), 2018, pp. 1–10.
[43] S. Martínez-Fernández, A. Jedlitschka, L. Guzmán, and A. M. Vollmer, "A quality model for actionable analytics in rapid software development," in Proc. 44th Euromicro Conf. Softw. Eng. Adv. Appl. (SEAA), Aug. 2018, pp. 370–377.

[44] P. Ram, P. Rodriguez, and M. Oivo, "Software process measurement and related challenges in agile software development: A multiple case study," in Product-Focused Software Process Improvement. Cham, Switzerland: Springer, 2018, pp. 272–287.
[45] N. Marz and J. Warren, Big Data: Principles and Best Practices of Scalable Real-Time Data Systems. Shelter Island, NY, USA: Manning, 2013.
[46] P. Runeson and M. Höst, "Guidelines for conducting and reporting case study research in software engineering," Empirical Softw. Eng., vol. 14, no. 2, pp. 131–164, Apr. 2009.
[47] L. López et al., "Q-Rapids tool prototype: Supporting decision-makers in managing quality in rapid software development," in Information Systems in the Big Data Era. Cham, Switzerland: Springer, 2018, pp. 200–208.
[48] V. Venkatesh and H. Bala, "Technology acceptance model 3 and a research agenda on interventions," Decis. Sci., vol. 39, no. 2, pp. 273–315, May 2008.
[49] V. McKinney, K. Yoon, and F. M. Zahedi, "The measurement of Web-customer satisfaction: An expectation and disconfirmation approach," Inf. Syst. Res., vol. 13, no. 3, pp. 296–315, Sep. 2002.
[50] D. L. Goodhue and R. L. Thompson, "Task-technology fit and individual performance," MIS Quart., vol. 19, no. 2, pp. 213–236, Jun. 1995.
[51] Y. W. Lee and D. M. Strong, "Knowing-why about data processes and data quality," J. Manage. Inf. Syst., vol. 20, no. 3, pp. 13–39, Dec. 2003.
[52] M. Tavakol and R. Dennick, "Making sense of Cronbach's alpha," Int. J. Med. Educ., vol. 2, pp. 53–55, Jun. 2011.
[53] J. Daniel, Sampling Essentials: Practical Guidelines for Making Sampling Choices. Thousand Oaks, CA, USA: SAGE, 2012.
[54] M. B. Miles, A. M. Huberman, and J. Saldaña, Qualitative Data Analysis: A Methods Sourcebook. Newbury Park, CA, USA: Sage, 2020.
[55] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bull., vol. 1, no. 6, pp. 80–83, Dec. 1945.
[56] V. Braun and V. Clarke, "Using thematic analysis in psychology," Qualitative Res. Psychol., vol. 3, no. 2, pp. 77–101, Jan. 2006.
[57] M. Manzano, E. Mendes, C. Gómez, C. Ayala, and X. Franch, "Using Bayesian networks to estimate strategic indicators in the context of rapid software development," in Proc. 14th Int. Conf. Predictive Models Data Anal. Softw. Eng. (PROMISE), 2018, pp. 52–55.
[58] S. Martínez-Fernández, P. S. M. Dos Santos, C. P. Ayala, X. Franch, and G. H. Travassos, "Aggregating empirical evidence about the benefits and drawbacks of software reference architectures," in Proc. Int. Symp. Empirical Softw. Eng. Meas., Oct. 2015, pp. 154–163.

SILVERIO MARTÍNEZ-FERNÁNDEZ received his PhD degree in Computing from the Universitat Politècnica de Catalunya (UPC-BarcelonaTech), Spain. He was then a Post-Doctoral Fellow of the European Research Consortium for Informatics and Mathematics (2 years).
Since 2018, he has been a researcher at the Fraunhofer Institute for Experimental Software Engineering IESE in Germany. His research interests include software engineering, software reference architectures, and data-driven development.
Dr. Martínez-Fernández has been PC co-chair at conferences like PROFES, CESI@ICSE, and QuASD@PROFES, and a PC member of conference tracks like ESEM, ICSME, ECSA, and CIbSE. Recently, he has been the leader of the "Evaluation" Work Package in the H2020 Q-Rapids project. More information at http://www.essi.upc.edu/~smartinez.


ANNA MARIA VOLLMER received her M.Sc. in Computer Science from the University of Kaiserslautern in 2016.
She is currently working as a researcher at the Fraunhofer Institute for Experimental Software Engineering IESE in Germany. Her research interests include empirical software engineering, especially quality evaluations and data-driven improvements.
She is actively involved in the "Evaluation" Work Package in the H2020 Q-Rapids project.

ANDREAS JEDLITSCHKA is head of the department "Data Engineering" at the Fraunhofer Institute for Experimental Software Engineering IESE in Germany, where he focuses on systematic approaches to topics related to data. His research interest is empirically-based and data-driven decision support. Currently, the focus of his work is on business value and the use of large, heterogeneous data sources for detecting patterns, in particular deviations from those. His most cited work is "Guidelines for reporting controlled experiments in software engineering".
Dr.-Ing. Jedlitschka is a member of several steering committees (e.g., ESEM, PROFES) and he served as a Program Co-Chair of ESEM 2016 and as the General Chair of PROFES 2013.

XAVIER FRANCH is a professor in Software Engineering at the Universitat Politècnica de Catalunya (UPC-BarcelonaTech). He received his PhD degree in Informatics from UPC in 1996.
His research interest embraces many fields in software engineering, including requirements engineering, empirical software engineering, open source software, and agile software development.
Prof. Franch is a member of the IST, REJ, IJCIS, and Computing editorial boards, Journal First chair of JSS, and Deputy Editor of IET Software. He served as a PC chair at RE'16, ICSOC'14, CAiSE'12, and REFSQ'11, among others, and as GC for RE'08 and PROFES'19. More information at https://www.essi.upc.edu/~franch.

LIDIA LÓPEZ received her PhD degree in Computing from the Technical University of Catalonia (UPC-BarcelonaTech), Spain, in 2013.
Currently, she is a research fellow at UPC-BarcelonaTech, in the Software and Services Engineering research group (GESSI). From 2007 to 2012, she worked as an assistant teacher at the same university. Her research interests are related to software engineering and empirical software engineering.
Dr. López has been PC co-chair at international conferences and workshops (CIbSE, iStar) and a PC member of several international conferences like RCIS, ICSOFT, SAC, and CIbSE. She has also reviewed papers for journals including IST, JSS, and IEEE Software.

PRABHAT RAM received his M.Sc. in Information Processing Science from the University of Oulu, Finland, in 2016.
He is currently working as a Doctoral Researcher at the University of Oulu, in the Empirical Software Engineering in Software, Systems and Services (M3S) research unit. His research interests include empirical software engineering, software measurement, Internet of Things, and system security and privacy.

PILAR RODRÍGUEZ received her PhD in Computer Science from the University of Oulu, Finland, in 2013, her M.Sc. in Computer Science from the Technical University of Madrid, Spain, in 2008, and her B.Sc. in Computer Science from the Technical University of Madrid, Spain, in 2006. She is a post-doctoral researcher at the Empirical Software Engineering in Software, Systems and Services (M3S) research unit of the University of Oulu (Finland). Her research centers on agile and lean software development, software quality, value-based decision-making, and human factors in software engineering.
Dr. Rodríguez's research has been published in premier international software engineering journals and conferences. She is a member of the Review Board of EMSE and has served as a reviewer for leading SE journals, like TSE, EMSE, and IST. She has been a PC member of conferences such as ESEM and XP. Recently, she has been the leader of Work Package 2 in the H2020 Q-Rapids project.

SANJA AARAMAA is RF Platforms DU Operations Owner at Nokia Mobile Networks (MN). In this position, she is responsible for planning and executing projects related to R&D tools and processes, IT, and cloud environments within RF Platforms. The objective of these projects is to ensure that MN-level decisions on IT & tool roadmaps, harmonization targets, as well as tool strategies are deployed in RF Platforms. She provides DU feedback and tool-related requirements towards relevant MN-level bodies and forums. She got her PhD in 2017 in Information Processing Science and her M.Sc. in 2008 in Mathematics from the University of Oulu. She has conducted empirical research since 2008 in several publicly funded (ITEA, Business Finland, H2020) research projects in close collaboration with industry. Currently she is the H2020 Q-Rapids Project Manager for Nokia.

ALESSANDRA BAGNATO is head of the research unit within the Softeam R&D Department. She holds a PhD in Computer Science from TELECOM SudParis and Université Evry Val d'Essonne, France, and an M.Sc. in Computer Science from the University of Genoa, Italy. She coordinates the Softeam R&D team's research activities covering fields such as model-based system engineering (MBSE), design of Cyber-Physical Systems of Systems (CPSoS), and Cloud and Big Data in application areas such as eGovernment, eHealth, and Agriculture. She worked at the TXT Corporate Research Division headquartered in Milan from 1999 until September 2012, where she was involved as Project Coordinator, Project Manager, and/or Technical Leader in several European research projects. She is currently coordinating the ITEA 3 MEASURE (Measuring Software Engineering) project and leading the Case Studies and Integration work package in the CROSSMINER project, which aims at enabling monitoring, in-depth analysis, and evidence-based selection of open-source components and facilitating knowledge extraction from large open-source software repositories.


MICHAŁ CHORAŚ holds a professor position at the University of Science and Technology (UTP) in Bydgoszcz, where he is the head of the Teleinformatics Systems Division and the PATRAS Research Group. He has been involved in many EU projects (e.g., SocialTruth, CIPRNet, Q-Rapids, INSPIRE). His research interests include pattern recognition in several domains, e.g., cybersecurity, image processing, biometrics, and critical infrastructures protection. He is the author of over 200 reviewed scientific publications.

JARI PARTANEN obtained his M.Sc. degree in Industrial Management from the University of Oulu, Finland, in 1990. Jari Partanen is an active researcher with some 15 peer-reviewed publications.
Currently he is the Head of Quality and Environment at Bittium. Bittium (www.bittium.com) as a company has undergone a transition towards a leaner and more agile way of working in the last few years and has taken up the use of approaches like Continuous Integration towards Continuous Deployment, Continuous and Transparent Planning, as well as approaches like embedded DevOps practices. Over time, the results have yielded major improvements in customer-perceived quality and lead time as well as among the staff, reflecting the cultural changes of the company.
Mr. Partanen's research interests include topics such as methods for continuous planning, real-time value delivery methods, innovation exploitation, as well as mass customization techniques. Recently he has been the Exploitation and Innovation Manager in the H2020 Q-Rapids project.
