data-centric concurrency control on the java programming...

Daniel Luis Landeiroto Parreira

Licenciado em Engenharia Informática

Data-Centric Concurrency Control on the

Java Programming Language

Dissertação para obtenção do Grau de Mestre emEngenharia Informática

Orientador: Prof. Doutor Hervé Miguel Cordeiro

Paulino

Júri:

Presidente: Profª. Doutora Margarida Paula Neves Mamede

Arguentes: Prof. Doutor Francisco Cipriano da Cunha Martins

Vogais: Prof. Doutor Hervé Miguel Cordeiro Paulino

Novembro, 2013

iii

Data-Centric Concurrency Control on the Java Programming Language

Copyright © Daniel Luis Landeiroto Parreira, Faculdade de Ciências e Tecnologia, Uni-versidade Nova de Lisboa

A Faculdade de Ciências e Tecnologia e a Universidade Nova de Lisboa têm o direito,perpétuo e sem limites geográficos, de arquivar e publicar esta dissertação através de ex-emplares impressos reproduzidos em papel ou de forma digital, ou por qualquer outromeio conhecido ou que venha a ser inventado, e de a divulgar através de repositórioscientíficos e de admitir a sua cópia e distribuição com objectivos educacionais ou de in-vestigação, não comerciais, desde que seja dado crédito ao autor e editor.

To all that supported me.

Whoever fights monsters should see to it that in the process hedoes not become a monster. And if you gaze long enough into an

abyss, the abyss will gaze back into you.- Friedrich Nietzsche

Acknowledgements

In first place, I would like to thank my coordinator, Prof. Hervé Paulino, for all theunconditional support given during the elaboration of this thesis. Without his guidance,this work would not be possible. I would also like to thank the Department of Informaticsof the Faculty of Sciences and Technology of the New University of Lisbon for all thesupport, not only by the installations provided, but also from the staff.

A special thanks towards all my family for supporting me throughout all these years.Additionally, a special thanks to Prof. Nuno Preguiça and the RepComp project

(PTDC/EIA-EIA/108963/2008) for allowing me to access the machine used to run thebenchmarks that were used to evaluate the performance of the work developed in thisthesis.

vii

Abstract

The multi-core paradigm has propelled shared-memory concurrent programming toan important role in software development. Its use is however limited by the constructsthat provide a layer of abstraction for synchronizing access to shared resources. Reason-ing with these constructs is not trivial due to their concurrent nature. Data-races anddeadlocks occur in concurrent programs, encumbering the programmer and further re-ducing his productivity.

Even though the constructs should be as unobtrusive and intuitive as possible, per-formance must also be kept high compared to legacy lock-based mechanism. Failure toguarantee similar performance will hinder a system from adoption.

Recent research attempts to address these issues. However, the current state of theart in concurrency control mechanisms is mostly code-centric and not intuitive. Its code-centric nature requires the specification of the zones in the code that require synchroniza-tion, contributing to the decentralization of concurrency bugs and error-proneness of theprogrammer. On the other hand, the only data-centric approach, AJ [VTD06], exposesexcessive detail to the programmer and fails to provide complete deadlock-freedom.

Given this state of the art, our proposal intends to provide the programmer a setof unobtrusive data-centric constructs. These will guarantee desirable security proper-ties: composability, atomicity, and deadlock-freedom in all scenarios. For that purpose,a lower level mechanism (ResourceGroups) will be used. The model proposed resides onthe known concept of atomic variables, the basis for our concurrency control mechanism.

To infer the efficiency of our work, it is compared to Java synchronized blocks, trans-actional memory and AJ, where our system demonstrates a competitive performance andan equivalent level of expressivity.

Keywords: Data-centric, Concurrency control, Deadlock-freedom, Atomicity

ix

Resumo

O paradigma multi-core impulsionou a programação concorrente em memória parti-lhada para um lugar de destaque no desenvolvimento de software. O seu uso é limitadopela natureza e poder expressivo das construções de alto nível para sincronizar acesso arecursos partilhados. Raciocinar sobre estas construções não é trivial devido à sua natu-reza concorrente. Podem ocorrer data-races e deadlocks, reduzindo a produtividade.

Embora as construções devam ser intuitivas, o seu desempenho também deve sersemelhante a mecanismos baseado em locks. Neste contexto, a preservação de um de-sempenho equivalente é fundamental para a adopção generalizada de um dado sistema.

Estas questões têm sido alvo de bastante investigação nos anos recentes. No entanto,o actual estado da arte em controlo de concorrência é centrado no código e nem sempreintuitivo. A sua natureza centrada no código requer que se especifique onde se requersincronização, contribuindo para a descentralização de erros de concorrência. Por outrolado, a única abordagem centrada nos dados , AJ [VTD06], expõe demasiados detalhesao programador e não consegue assegurar a ausência de deadlocks em todos os cenários.

Dado este estado da arte, a nossa proposta pretende fornecer ao programador umconjunto de construções centradas nos dados para controlo de concorrência, que deve-rão garantir atomicidade, composicionalidade e ausência de deadlocks em todos os cená-rios. Para esse propósito, será usado um sistema de mais baixo nível (ResourceGroups).O modelo proposto reside no conhecido conceito de variável atómica, a base do nossomecanismo de gestão de concorrência.

De modo a aferir a eficiência da nossa solução, esta é comparada com os blocos sin-cronizados do Java, memória transacional e o AJ, onde o nosso sistema demonstra umaperformance competitiva e uma expressividade equivalente.

Palavras-chave: Data-centric, Controlo de concorrência, Livre de deadlocks, Atomicidade

xi

Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2.1 Deadlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.2 Types of scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.3 Two approaches to synchronization: code-centric and data-centric 6

1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.5 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 State of the Art 11

2.1 On the handling of deadlock situations . . . . . . . . . . . . . . . . . . . . 11

2.2 Data-centric Concurrency Control . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.1 Atomic Sets (AJ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.2 Resource Groups (RG) . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3 Code-Centric Concurrency Control . . . . . . . . . . . . . . . . . . . . . . . 18

2.3.1 AutoLocker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3.2 Lock Inference for Atomic Sections . . . . . . . . . . . . . . . . . . . 20

2.3.3 Lock Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.3.4 Inferring locks for atomic sections . . . . . . . . . . . . . . . . . . . 22

2.3.5 Deadlock Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.3.5.1 Shared Memory Deadlock-Free Semantics . . . . . . . . . 23

2.3.5.2 The theory of deadlock avoidance via discrete control . . 23

2.3.6 Transactional Memory (TM) . . . . . . . . . . . . . . . . . . . . . . . 24

2.3.6.1 Conflict detection and resolution . . . . . . . . . . . . . . 25

2.3.6.2 Supporting I/O and System Calls within Transactions . . 26

2.3.7 Adaptive Locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.4 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

xiii

xiv CONTENTS

3 From Atomic Variables to Data-Centric Concurrency Control 293.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2.1 Deadlock-freedom . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.3 Typing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.4 Final remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Implementation 394.1 A Brief Introduction to javac . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.1.1 Nodes and symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.1.2 Compilation stages and considerations . . . . . . . . . . . . . . . . 41

4.1.3 Plugin binding points . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.1.4 Decomposing the implementation phases . . . . . . . . . . . . . . . 43

4.2 Atomic Type Identification and Integration . . . . . . . . . . . . . . . . . . 44

4.2.1 Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.2.2 Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.2.3 Creation of a class for global partitions . . . . . . . . . . . . . . . . 47

4.3 Converting Annotated Variables into Atomic Variables . . . . . . . . . . . 48

4.3.1 Converting to atomic types . . . . . . . . . . . . . . . . . . . . . . . 48

4.3.2 Fixing atomic type constructors . . . . . . . . . . . . . . . . . . . . . 48

4.3.3 Fixing method return types . . . . . . . . . . . . . . . . . . . . . . . 49

4.3.4 Inferring the single use of atomic type constructors as arguments . 50

4.3.5 Homogenization of the abstract syntax tree . . . . . . . . . . . . . . 52

4.4 Deadlock-free Lock Inference . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.4.1 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.4.2 Lock coalescing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.4.3 Lock removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.5 Mapping onto the ResourceGroups API . . . . . . . . . . . . . . . . . . . . . 59

4.5.1 Isolating return expressions . . . . . . . . . . . . . . . . . . . . . . . 60

4.5.2 Injecting partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.5.3 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.5.3.1 Filtering resources to acquire from the atomic variables . 62

4.5.3.2 Multi-resource Mapper . . . . . . . . . . . . . . . . . . . . 62

4.5.3.3 Single Resource Mapper . . . . . . . . . . . . . . . . . . . 65

4.5.4 Injecting the missing imports . . . . . . . . . . . . . . . . . . . . . . 65

5 Evaluation 675.1 Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.1.1 versus AJ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.1.2 versus atomic sections . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

CONTENTS xv

5.2.1 Comparison with AJ . . . . . . . . . . . . . . . . . . . . . . . . . . . 745.2.2 Comparison against synchronized blocks and DeuceSTM . . . . . . 755.2.3 Closing Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6 Conclusion 83

A Appendix of the annotated code 89

B Appendix of generated code 95

xvi CONTENTS

List of Figures

1.1 A circular-wait involving two threads holding one resource each. . . . . . 3

2.1 Finite state machine to represent scenario Ru(l),Wu′(l),Wu(l). . . . . . . . 16

4.1 Compilation stages in the javac compiler. . . . . . . . . . . . . . . . . . . . 414.2 Type transformations performed. . . . . . . . . . . . . . . . . . . . . . . . . 484.3 Order of resource acquisition after the attribution of a partition to the linked

list example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554.4 Graph representation of dependency chains in a simplified hash set con-

taining only the add, findIndex and rehash methods. . . . . . . . . . . . . 56

5.1 TSP benchmark performance comparison. . . . . . . . . . . . . . . . . . . . 755.2 Bank benchmark performance comparison. . . . . . . . . . . . . . . . . . . 765.3 HashSet benchmark performance comparison. . . . . . . . . . . . . . . . . 775.4 HashSet benchmark performance comparison, after tweaking the add method

to reduce the usage of the write-lock. . . . . . . . . . . . . . . . . . . . . . . 785.5 RBTree benchmark performance comparison. . . . . . . . . . . . . . . . . . 795.6 LinkedList benchmark performance comparison. . . . . . . . . . . . . . . . 805.7 LinkedList benchmark performance comparison against a variant without

the superfluous partition and using fine-grained locks. . . . . . . . . . . . 81

xvii

xviii LIST OF FIGURES

List of Tables

2.1 Constructs provided to manipulate resource groups. . . . . . . . . . . . . . 162.2 Comparison of legacy and state of the art systems. . . . . . . . . . . . . . . 28

4.1 API provided by the partitioner. . . . . . . . . . . . . . . . . . . . . . . . . 53

5.1 Comparison between the number of annotations required in AJ and oursystem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.2 Overview of properties offered by our system and by AJ. . . . . . . . . . . 695.3 Comparison between the number of annotations required in DeuceSTM

and our system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

xix

xx LIST OF TABLES

Listings

1.1 Scenario exemplifying a deadlock during a transfer using mutex locks. . . 4

1.2 A dynamic scenario during the removal of a node from a doubly linked list 6

1.3 Two-phase locking involving atomic sections. . . . . . . . . . . . . . . . . . 7

1.4 Example of a data-centric approach to concurrency as seen in [DHM+12]. 7

2.1 AJ atomic(s) construct. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 AJ owned(s) construct. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.3 AJ unitfor construct. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Augmenting AJ with ordering annotations. . . . . . . . . . . . . . . . . . . 14

2.5 Example of a problematic data access pattern (Ru(l),Wu′(l),Wu(l)), as iden-tified in [HDVT08]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.6 Usage of resource groups in static scenarios. . . . . . . . . . . . . . . . . . 17

2.7 Usage of resource groups in dynamic scenarios. . . . . . . . . . . . . . . . 17

2.8 The annotations required to implement an hash table using AutoLocker. . 20

2.9 Augmenting the hash table from Listing 2.8 using fine-grained locks. . . . 20

2.10 Atomic section in a transactional memory environment. . . . . . . . . . . . 24

3.1 Syntactic application of our system to a bank transfer between two accounts. 32

3.2 Syntactic application of our system to the implementation of a simply-linked list. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3 Compilation failures of a program where the atomic types were not prop-agated correctly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.4 Type mutation applied to the bank transfer example. . . . . . . . . . . . . 36

3.5 Type mutation applied to the implementation of a simply-linked list. . . . 37

4.1 Example of an atomic type class generated for the Node class of the linkedlist benchmark. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.2 Empty GlobalPartitionsHolderClass created to hold global partitions. . . 47

4.3 Conversion of atomic types applied to the linked list example. . . . . . . . 49

4.4 Conversion of the return type of the getNext method of the Node class fromthe linked list example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

xxi

xxii LISTINGS

4.5 Example of the superfluous variable declaration avoided by the Convert-MethodInvocationsTypePhase phase. . . . . . . . . . . . . . . . . . . . . . 50

4.6 Conversion of the constructor type used in the setNext method of thelinked list example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.7 Lock inference algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574.8 Example of the return expression isolation in the contain method of the

RBTree benchmark. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604.9 GlobalPartitionsHolderClass with two global partition injected. . . . . . 614.10 LinkedList class with a local partition injected. . . . . . . . . . . . . . . . . 614.11 Final result of the multi-resource mapper for the index method of the hash

set benchmark. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634.12 Example of contain method of the RBTree benchmark after the use of the

single-resource mapper. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665.1 An example of a bank transfer between two accounts that incurs a high-

level data-race in our system. . . . . . . . . . . . . . . . . . . . . . . . . . . 705.2 An example of a bank transfer between two accounts that incurs a high-

level data-race in AJ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715.3 Fixing the high-level data-race presented in Listing 5.1 using our system. . 715.4 Fixing the high-level data-race presented in Listing 5.2 using AJ. . . . . . . 72A.1 Original IntSetHash class of the hash set benchmark. . . . . . . . . . . . . . 89A.2 Original IntSetLinkedList class of the linked list benchmark. . . . . . . . . 91A.3 Original Node class of the linked list benchmark. . . . . . . . . . . . . . . . . 93B.3 Generated IntSetHash class of the hash set benchmark. . . . . . . . . . . . . 95B.1 Implementation of the new add method in the tweaked hash set benchmark. 95B.2 Mapping of the add method in the linkedlist benchmark without the parti-

tion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96B.4 Generated IntSetHash class of the hash set benchmark, tweaked for perfor-

mance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99B.5 Generated IntSetLinkedList class of the linked list benchmark. . . . . . . . 102B.6 Generated IntSetLinkedList class of the linked list benchmark if the parti-

tion was ignored. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103B.7 Generated Node class of the linked list benchmark. . . . . . . . . . . . . . . . 105

1Introduction

1.1 Motivation

Ever since the stalling of single core frequency scaling, largely due to physical limits ofthe materials [EBA+11], processor development has shifted to multi-core oriented archi-tectures. In order to tap into the full potential of the new architectures, a change in men-talities was required. Algorithm efficiency is no longer the single most relevant aspectwhen measuring program performance: the ability to distribute the computation acrossmultiple cores is gaining importance.

In the near future, consumer computers running 16 or 32 cores will be commonplace.In the meantime, distributed algorithms and concurrent programming will become in-creasingly more important. However, re-engineering an algorithm to run concurrentlymight not be viable, and certainly not a trivial task. Although using libraries that ab-stract hardware threading, such as POSIX threads, programmers are still encumbered bythe burdens brought by concurrent programming.

Only recently have mainstream applications began actively supporting concurrency.Consider the libjpeg and libjpeg-turbo libraries. Even though they are almost omnipresentin modern computer systems, multi-threaded decoding is still not officially supported[Jin12].

Concurrent programming in a shared-memory environment brings the need for syn-chronization across multiple threads [Dij65], as access to shared data in a multi-threadedenvironment has to be sanitized. The non-deterministic nature of those accesses greatlycontributes to the difficulty in debugging and proving the correctness of concurrent pro-grams.

The arbitrary ordering of operations can easily result in bugs. They have origin in

1

1. INTRODUCTION 1.2. Problem

the unexpected interleaving of accesses to shared memory that was not foreseen by theprogrammer. Those events are commonly known as data-races.

Data-races are in the genesis of many errors in shared-memory programming. Theyoccur when multiple threads are allowed to concurrently access shared data and one ofthem performs a write operation. These scenarios are not explicit in the code, and requirespecial attention from the programmer when shared memory communication is in place.Since shared data, like global variables and shared references, can be accessed in variousparts of the code, the programmer has to be aware of such access patterns or else data-races could occur.

A commonly used approach to prevent data-races is to force a sequence of statementsto be evaluated in mutual exclusion regarding the other threads in the system, ensuringatomicity. However, mutual exclusion brings another problem: deadlocks. In a multipro-gramming environment, several processes may compete for a finite number of resources.If the order in which they attempt to acquire them falls under a scenario where furtherprogression is impossible, a deadlock has occurred.

Accordingly, the need for simple and effective constructs that avoid those pitfalls andallow the programmer to control accesses to shared memory is currently a pressing re-search topic.

1.2 Problem

Providing useful concurrency control mechanisms is challenging. The high level con-structs should be intuitive to use by the programmer and guarantee a certain degree ofscalability and security properties, such as atomicity and deadlock-freedom. However,the challenges themselves conflict with each other. Maximizing performance and con-currency might require additional information that is naturally conveyed through moreannotations, resulting in less intuitive constructs.

Thus, these challenges cannot be addressed individually. New concurrency controlmechanisms have to adopt a phase of experimentation and optimization for finding thebest balance of trade-offs that the system can offer to the programmer.

At the present, mutex locks are still the de facto concurrency control mechanism. Theyallow the programmer to enforce mutual exclusion in the evaluation of a sequence ofprogram statements. This mechanism further adds to the complexity of the code, with-out being an optimal solution. One of the problems associated with locks is deadlock-proneness and the decentralization of concurrency errors, particularly as the number oflocks increases. Since locks have to be applied in all sections of code where the program-mer wishes to protect a given computation from concurrent accesses, the application ofconcurrency control primitives is decentralized, leading to decentralization of the loca-tion of concurrency errors. Additionally, deadlocks that can arise in while using locks aretough to pinpoint and their usage entails a higher degree of error-proneness comparedto unsynchronized programs. As such, research in the area for constructs that ensure

2


deadlock-freedom is ongoing.

1.2.1 Deadlocks

Formally, when multiple threads T compete over the acquisition of a finite set of re-sources R, eventually an attempt to acquire a resource Rp will fail because it is alreadyheld by another thread Ti′ . As a result, thread Ti blocks while waiting for resource Rp

to become available. In the meanwhile, thread Ti′ might attempt to acquire resource Rq

that is already held by thread Ti. This prompts thread Ti′ to block while waiting for theresource to become available, completing the circular-wait. This situation, illustrated inFigure 1.1, embodies a deadlock. A popular example of a deadlock situation was E. W.Dijkstra’s Dinning Philosophers problem [Dij65].

T1

R2

R1

T2

T1 waiting for R2 R2 held by T2

T2 waiting for R1R1 held by T1

Figure 1.1: A circular-wait involving two threads holding one resource each.

The requirements for a deadlock to occur have long been identified in the literature[SGG05]. They are characterized by four conditions which have to be simultaneouslyverified:

mutual exclusion only one process can hold a resource at any given time

hold and wait a process holding at least one resource is waiting to acquire additionalresources

no preemption only the process holding a resource can release it

circular wait a set of waiting processes P0, ..., Pn must exist such that each is waiting fora resource held by another process, e.g. P0 waits for the resource held by P1, ..., Pn

waits for the resource held by P0. Formally we have :

∀i@j : i, j ∈ {0, ..., n} ∧ i 6= j ∧ Pi waits on Pj

To illustrate real scenarios in which deadlocks might occur, a banking application willbe used as an example. Besides being an intuitive example to which the reader relateswith, it provides an interesting case for discussing concurrency related concerns, being

3


void transfer(Account A, Account B, int amount){

A.lock();B.lock();A.take(amount);B.give(amount);B.unlock ();A.unlock ();

}

...

transfer(accA , accB , 10); // executed by thread #1...transfer(accB , accA , 10); // executed by thread #2

Listing 1.1: Scenario exemplifying a deadlock during a transfer using mutex locks.

widely used as an example in the literature [Jon07, ST95]. This interest steams from theconsistency and atomicity requirements when dealing with transactions between bankaccounts. The system can never be in a state, externally observed, in which the money isalready retrieved from one account but not yet deposited in another account.

To avoid an inconsistent external state, transfers have to be atomic. For this, the pro-grammer could lock both accounts, ensuring mutual exclusion for the transfer, and re-leasing the locks at the end so that other transfers can take place, as shown in Listing 1.1.

However, even though atomicity is guaranteed, a deadlock could occur. The deadlockarises when the first thread only attains the accA lock and the second thread only attainsthe accB lock. Neither will be able to attain the next lock needed to process the transferbecause the required lock is held by the other thread. This completes the circular-wait,entering a deadlock.

However, if the locks were attained in the same order, no deadlock could occur. Whentwo or more resources are involved, locking them according to a global order is an effec-tive way to prevent deadlocks. This approach eliminates deadlocks in cases where cyclicdependencies are present.

While applying locks to a small segment of code can appear trivial at first sight, thedata that is guarded by those same locks may, naturally, be accessed by other threadsacross the program. This burdens the programmer with concerns that demand an evalu-ation of the impact of those accesses in the correctness of the code segment enclosed bythe lock synchronization primitives. Improper evaluation of such accesses (and subse-quent synchronization, if required) can lead to data-races. Such code-centric approachrequires a non-local reasoning and a vast knowledge of the code by the programmer[VTD06].

On top of that, callers need to be aware of the concurrency management requirementsof their callees, namely which locks they are taking. This implies knowing their imple-mentation to avoid deadlock scenarios. As such, systems based on lock mechanisms are

4


highly uncomposable and hard to maintain.

As expound, the extensive use of locks burdens the programmer with many decisions.The number of locks to use, where to place them, identifying potential deadlock-pronesituations and how to deal with error-recovery inside regions delimited by locks. Allthese can affect the correctness and efficiency of the program.

1.2.2 Types of scenarios

The effectiveness of a concurrency control mechanism might depend on the type of sce-nario it is presented with: static or dynamic. Most notably, this theme will be referencedin the state of the art (Chapter 2) due to deadlock-freedom provided by the various sys-tems varying according to the type of scenario considered. Due to the way they areimplemented, some can only provide deadlock-freedom in static scenarios. In dynamicscenarios, providing for this property becomes inherently problematic.

Static scenarios represent the majority of scenarios in applications due to its fixednature, as previously shown in Listing 1.1. They represent the scenarios where all theresources to be acquired before evaluating a critical section are possible to infer at compiletime or at runtime, before entering the critical section.

Trivially, if the locks in Listing 1.1 were replaced by a higher level construct, the com-piler could recognize the objects involved in the critical section, which in this scenario areaccount accA and accB, and acquire locks on both objects to ensure atomicity. In order toassure deadlock-freedom in this scenario, the locks would have to be acquired accordingto a previously established global order.

On the other end of the spectrum, in dynamic scenarios, no information can be at-tained at compile time about all the objects that take part in the critical section. A dy-namic scenario in the removal of a node from a doubly-linked list is shown in Listing 1.2.Since the locking of more than one memory position is not atomic, a guard is required toprevent other threads from traversing the list at the same time. As a result, a deadlock-prone situation arises if in another part of the program a node is locked before the guardis invoked. The global nature of dynamic scenarios hinders its reasoning.

Some approaches, such as the one presented in [MZGB06], adopt a conservative ap-proach and refuse to compile if deadlock-freedom cannot be guaranteed. In some cases,tools actually manage to support dynamic scenarios like resource groups (the system thatwill be the backbone of this thesis, as detailed in Section 2.2.2) and [CCG08, HFP06,EFJM07, WLK+09] mentioned in the state of the art.

By contrast, transactional memory attempts to solve this at runtime by rolling back theexecution of a thread whenever the memory operations it performs conflict with changesmade by other threads or when a deadlock is detected (in pessimist approaches).

To take a conservative approach, the compiler can attempt to figure out a coarser al-ternative that guarantees the mutual exclusion for the possible objects in the critical sec-tion. One such approach is to force critical sections in dynamic scenarios to be serialized

5


1 void remove(List L, Node node) {2 atomic{3 for(Node cur in L) {4 if(cur==node) {5 Node prev = cur.prev ;6 Node next = cur.next ;7 prev.next = next ;8 next.prev = prev ;9 }

10 }11 }12 }

Listing 1.2: A dynamic scenario during the removal of a node from a doubly linkedlist

according to a global lock.

Another possible approach is acquiring a lock according to the type of the data thatcan be part of the dynamic set of unknown variables. For example, in a list constitutedof nodes, a list/node global lock could be taken. That way, even though the actual dataused in a dynamic scenario is unknown before entering the critical section, the type ofthe data passive of acquisition is known, hence the lesser known common denominatorused to retain the atomicity.

However, these approaches intending to cope with dynamic scenarios result in a lossof concurrency. Compared to them, the lower level system resource groups achieves aminimal loss of concurrency.

1.2.3 Two approaches to synchronization: code-centric and data-centric

So far, this document has focused on code-centric synchronization. They are the mostwidely used constructs, which explicitly delimitate the sequence of instructions that shouldbe evaluated while holding certain properties, which normally equates to atomicity.

Locks fall into this category by using the lock() and unlock() primitives on lockvariables. Over the years, several constructs to express atomicity have been proposed.Of these, the most widely used is the atomic section proposed by Lomet [Lom77], whichrepresents the guarantee of atomicity and isolation of the enclosed block. Atomic sectionshave been adopted by both pessimistic and optimistic models. For the sake of simplicity,this will be the construct used to symbolize atomicity (and transactions) throughout thisdocument.

Even while being under the same category, locks have arguably lesser visibility forthe programmer. If a section of the code must protect several distinct variables, it isnot unusual to see several lock()/unlock() calls to each variable’s respective lock. Thisflood of function calls that surround the actual code obstructs it, increasing the likelihoodof programmer errors, such as forgetting to unlock one of the many variables initiallylocked.

6


1 atomic { // acquire lock over ’b’23 atomic { // acquire lock over ’a’4 ... // accesses ’a’5 }6 atomic {7 ...8 }9

10 //only when no more locks will be acquired , release phase can start11 //lock over ’a’ and ’b’ can be released12 }

Listing 1.3: Two-phase locking involving atomic sections.

1 class Account {2 atomicse t acc ;3 atomic(acc) int balance ;4 }56 void transfer( u n i t f o r Account A, u n i t f o r Account B, int amount) {7 A.take(amount);8 B.give(amount);9 }

Listing 1.4: Example of a data-centric approach to concurrency as seen in [DHM+12].

Similarly, despite being a cleaner alternative to locks, it is still possible to wronglydelimit the sequence of statements to be enclosed by an atomic section, either by excessor deficiency. The former is more common, extending the atomic sections over statementsthat do not belong in the atomic section, reducing performance needlessly.

The regular approach is to apply two-phase-locking that features two distinct phases:acquisition and release phase. During acquisition phase, locks can only be acquired, butnot released. Once the first lock is released, no locks can be further acquired until alllocks are released.

With this in mind, the resources held by an inner section are potentially only releasedwhen the outer section terminates. Another posterior interior section would still acquirelocks, negative the first inner section the right to release them due to the need to remain inthe acquire phase. As shown in Listing 1.3, the resources held by the two nested atomicsections can only be released at the end of the outer atomic section in order to ensuredeadlock-freedom.

Data-centric approaches contrast with these. The information is conveyed through"what data" the programmer wants to evaluate atomically, not where or how. Errors as-sociated with misplacing constructs are lessened. Protecting a group of data is permanentfor the whole program.

However, data-centric approaches place a heavier burden on the compiler, since it

7


must infer more information. In order to optimize performance, even data-centric ap-proaches might require additional annotations. In reality, a data-centric approach willhave more than just annotations on the data that wants to be protected. In [DHM+12],unitfor is used on function arguments to indicate that the method body should evaluatea set of data atomically, an atomic set.

An example from this work is exemplified in Listing 1.4. Comparing to previous code-centric examples, synchronization information is all centralized on the data, making itsuse trivial in the rest of the code, as shown in the implementation of transfer().

Avoiding duplication of responsibilities in code is tutted as the first step towards re-ducing the occurrence of bugs and increasing legibility of the code. In a data-centricapproach, this is assured. If in the course of program development a block of code needsto be treated atomically, and it relates to the same group of data that was previously pro-tected, no changes are required. The annotations related to this group of data are alreadyin the code. This approach centralizes the protection and makes it very non-intrusive andavoids repetition.

As during development, the approach used also influences the methodology in de-bugging applications. Consider a situation where the team developing an applicationfinds a bug related to concurrency. If that application relies heavily on code-centric con-currency control, debugging a concurrency issue involves rechecking every code-centricannotation, where such set of data may be referenced. In contrast, in data-centric mech-anisms it would suffice to review how the data is protected, where it is declared, insteadof rechecking the whole code.

One problem associated with data-centric constructs is its novelty. Research in thisarea is still very recent, especially when comparing to code-centric alternatives, such astransactional memory. As a result, no data-centric system has seen widespread usageoutside of the academia.

Unfortunately, the only data-centric approach [VTD06, DHM+12] requires the pro-grammer to explicitly assemble the resources into sets and does not guarantee deadlock-freedom in all dynamic scenarios. The ideal would be if the grouping-sets process wouldbe automatically generated by the compiler. The programmer would then be strictly re-stricted to simple annotations that would convey its intent on which data to protect.

The absence of deadlock-freedom also affects more than meets the eye. Unlike lockswhere programmers are arguably more experienced in identifying deadlocks, in a newparadigm such as a data-centric system, that knowledge would have to be re-acquired.

As a consequence, the absence of deadlock-freedom in a new paradigm will greatlyaffect its adoption, emphasizing the need for a new system that provides it.

Our proposal intends to address these problems by providing simple constructs thatwill be mapped into an underlying low level API that provides deadlock-freedom andatomicity in a data-centric setting.

8

1. INTRODUCTION 1.3. Objectives

1.3 Objectives

As elaborated in the previous section, the currently used concurrency management sys-tems have their inherent problems. No system can be said to be objectively superior toall other systems. While the data-centric paradigm has potentially the most benefits, itmust be less obtrusive and provide more security properties in order to compete with theengrained code-centric, lock-based mechanisms.

As such, our objective is to provide the programmer with simple data-centric con-structs to express atomicity, while relieving it from the burdens of deadlocks and data-races. Additionally, the performance penalization should be on par with comparablesystems. Failing to achieve so would be detrimental to the adaptation of this work.

To achieve such, a low level data-centric concurrency control mechanism, named Re-sourceGroups [PD12], will be used to implement the higher level constructs that will beprovided to the programmer. These data-centric constructs, provided by previous re-search work done by Paulino and Delgado, allows multiple resources to be combinedinto groups that can be acquired and released, granting the thread that holds the groupthe right to operate on the elements of the group in mutual exclusion regarding the otherthreads. They guarantee deadlock-freedom and composability, properties that should bepreserved by the high level constructs.

The proposed high level constructs will consist in a set of data-centric Java annota-tions with no explicit reference to sets (or groups) of data, contrasting with AJ, the currentstate of the art in data-centric concurrency control. Comparatively to this work, the an-notations will be reduced to the variables and parameters that require synchronization.

For that purpose, the work will consists in two parts: the model which will be pro-vided to the programmer, and how to implementing it efficiently by mapping into theResourceGroups API.

1.4 Contributions

The contributions of this work are threefold:

• Definition of a data-centric concurrency control system supported by a single data-centric annotation.

• Delineation of a lock inference algorithm that guarantees both deadlock-freedomand atomicity in our system, while minimizing the overhead.

• Implementation of a functional prototype in the Java programming language.

• Evaluation of the productivity of the system developed relative to current state ofthe art code-centric and data-centric concurrency control systems.

9

1. INTRODUCTION 1.5. Structure

1.5 Structure

This document is split into six chapters. The current Chapter attempts to provide a mo-tivation and setting for the need to develop further research in the field of deadlock-freeconcurrency control mechanisms and summarizes the contributions of this thesis. Chap-ter 2 presents a more in-depth view on the current state of the art in concurrency controlmechanisms that provide some degree of deadlock-freedom and a final remark aboutthem. Chapter 3 presents our proposal of a data-centric concurrency control model andits properties. Chapter 4 explains the implementations of our prototype. Chapter 5 fea-tures an evaluation of our model and prototype against other state of the art concurrencycontrol mechanisms. Finally, Chapter 6 features a final overview of the work developedin this thesis.

10

2State of the Art

The work on this thesis focuses on providing a data-centric approach to concurrencymanagement that guarantees deadlock-freedom. Firstly, it presents a brief introductionof the three philosophies for dealing with deadlocks that are adopted by the systems pre-sented throughout this chapter. Then, current state of the art in data-centric concurrencycontrol will be presented along with the remaining systems studied that provide somedegree of deadlock-freedom. Lastly, an overview comparing all the systems and theirrespective benefits and pitfalls.

2.1 On the handling of deadlock situations

Deadlocks represent a real danger to software and a constant concern for programmers.Research in the area is driven by the need to provide solutions to this problem, eitherthrough systems that can either guarantee deadlock-freedom or mitigate deadlock-proneness.

There are three distinct philosophies, well documented in the literature, to deal withdeadlock situations [SGG05] :

deadlock avoidance a dynamic analysis, performed at run-time, that mediates the syn-chronization requests. The requests are only granted when they cannot generate adeadlock. Otherwise, they are rejected or delayed until safe. This method requiresmore information about the use given to the resources by each process.

deadlock prevention a static analysis, performed at compile time, ensures that the fournecessary conditions for a deadlock to occur are never simultaneously verified inthe system. At least one of the necessary conditions must not hold at any giventime.

11

2. STATE OF THE ART 2.2. Data-centric Concurrency Control

c lass Account {atomicse t (account)atomic(account) i n t money;

void transfer(Account A, Account B, i n t amount) {A.take(amount);B.give(amount);

}}

Listing 2.1: AJ atomic(s) construct.

deadlock detection using both static and dynamic analysis, the system monitors exe-cution progress and when a deadlock is detected, measures are taken to recoverfrom the deadlock, such as restarting from a previously checkpointed non-deadlockstate. Optimistic transactional memory uses this approach.

Next we present the most relevant approaches for deadlock-free concurrency con-trol, grouping them according to the philosophy used to provide a varying degrees ofdeadlock-freedom.

2.2 Data-centric Concurrency Control

2.2.1 Atomic Sets (AJ)

The following works are based on the notion of an atomic group that represents the abilityto group a set of resources so that they can be evaluated atomically as a single unit. Theyrepresent the basis for data-centric concurrency control management mechanisms.

The proposal in [VTD06, DHM+12] introduces the notion of atomic sets, a data-centricalternative to code-centric synchronization mechanisms. AJ avoids high and low-leveldata-races by delegating on the programmer the specification of consistency propertiesbetween certain data that should be updated atomically.

The paper goes on to demonstrate that data-races (both low and high level) can bestatically categorized into 11 groups. They can be eliminated using the constructs pro-posed, resulting in fewer annotations required to provide equivalent synchronization inprograms. Further static analysis is used in the paper to automatically infer the points inthe code where synchronization primitives are required to avoid data-races.

In total, four constructs are proposed:

atomicset(s) creates a new atomic set. All operations over data in this atomic set shouldbe done atomically, in a unit of work, as illustrated in Listing 2.1.

atomic(s) binds a variable to an atomic set s.

owned(s) in addition to binding a variable to an atomic set s, binds the reference it refer-ences to the same atomic set s, as illustrated in Listing 2.2. Recursively expressed

12


1 c lass LinkedList {2 atomicse t (list);3 atomic(list) i n t size;4 owned(entry) atomic(list) Entry header;56 p u b l i c Object set( i n t index , Object value) {7 Entry e = entry(index);8 Object oldVal = e.value;9 e.value = value;

10 r e t u r n oldVal;11 }12 }1314 c lass Entry{15 atomic(entry) Object value;16 owned(entry) atomic(entry) Entry next;17 }

Listing 2.2: AJ owned(s) construct.

1 c lass Bank {2 p u b l i c void transfer( u n i t f o r Account a, u n i t f o r Account b, i n t n) {3 transfer(a, b, n);4 }5 }

Listing 2.3: AJ unitfor construct.

data structures, such as linked lists, use this annotation to unify a chain of refer-ences in the same atomic set. In the case of a linked list, since each node of the listbinds itself and the next node it points to, recursively, all nodes will be included inthe same atomic set. This guarantees atomicity when dealing with the list.

unitfor specifies function parameters that should be manipulated atomically, as illus-trated in Listing 2.3. In detail, each of the variables marked as unitfor in a functionwill be treated as if they were part of the same atomic set, in the function body.

atomic class-constructor makes a class thread-safe by putting all of its fields and thefields of its superclasses in a single atomic set. This allows us to skip synchroniza-tion wrappers classes.

s1=this.s2 indicates that the atomic set s1 from the annotated type is aliased with atomicset s2 from the current object.

The implementation relies on static analysis, using a data flow analysis over a pro-grams call graph, that infers which locks need to be held for each unit of work, thatrepresents atomic accesses to a group of atomic sets.

A lock is associated with each atomic set. Subsequently, for each method and allmethods transitively called by that method, locks are acquired for all atomic sets they

13


1 c lass Node implements INode {2 atomicse t (n);3 p r i v a t e atomic(n) Node left| t h i s .n<n|;4 p r i v a t e atomic(n) Node right| t h i s .n<n|;5 ...6 void insert( i n t v){7 ... left = new Node| t h i s .n<n|(v); ...8 ... right = new Node| t h i s .n<n|(v); ...9 }

10 }

Listing 2.4: Augmenting AJ with ordering annotations.

could possibly access. Due to performance concerns, the locks used in this proposalallow multiple readers but only one writer, commonly known as readers-writer lock.

The locks identified in each method are acquired according to a previously estab-lished global order, effectively eliminating deadlocks in static scenarios. However, dead-locks can still occur in dynamic scenarios if (transitive) cyclical dependences betweentwo units of work exist. Currently these scenarios are not detected, but could be detectedthrough static analysis. As such, the author points it as a possibility for future work.

Follow-up work [MHD+] attempts to mitigate this possibility by the addition of an-other set of annotations that allow the programmer to clarify situations that potentiallylead to deadlock situations from the perspective of the compiler. Due to the nature ofthe system, some situations are perceived as deadlock-prone because the system cannotguarantee that resources are acquired according to a global order. This type of situationis especially prevalent in dynamic scenarios and recursive data-structures.

s1<s2 specifies the order of acquisition regarding two instances of the same atomic set (s1precedes s2).

However, those specific scenarios might represent false-positives. Even if it is notpossible for the system to infer that locks are acquired in the correct order, the under-lying structure being evaluated might already inadvertently enforce an order. In thesesituations, the primary culprit is the lack of information conveyed to the system by theannotations or the lack of a sufficiently complex static-analysis algorithm that would al-low such inferences to be made. The proposed solution is to augment the system withadditional annotations that can convey the actual order of the locks to the system.

An example of this situation is present in a binary tree, where a de facto node orderexists: the root of a subtree always precedes the nodes of the left and right branches.However, its inference is not trivial, requiring the lock ordering annotations by the pro-grammer, as illustrated on Listing 2.4.1

Notwithstanding, it should be noted that the authors only mention the applicationof these annotations to situations regarding false-positives. Applying this approach to

1The example is taken from Figure 7 in the original document [MHD+]

14


1 atomicse t (a)2 atomic(a) i n t l = 0 ; //part of an atomic set34 function fun1() {5 i n t local1 = l + 1 ;6 l = local ;7 }89 function fun2() {

10 l = -1 ;11 }

Listing 2.5: Example of a problematic data access pattern (Ru(l),Wu′(l),Wu(l)), asidentified in [HDVT08].

dynamic scenarios were no such order exists would not result in deadlock-freedom. Assuch, the current state of this work still does not guarantee deadlock-freedom in all sce-narios.

Related work by Christian Hammer et al. in [HDVT08] changed the approach byfocusing on the identification of scenarios that should be annotated, due to the presenceof data-races. The data-race detection relies on dynamic analysis of a program. Similarlyto previous work by Vaziri et al. in [VTD06], data-races where categorized into groupsthat represent problematic data access patterns. In addition to the 11 problematic groupsidentified in [VTD06], 3 more were added, bringing the total to 14 problematic scenarios.

The correctness criterion used is atomic-set serializability, where a unit of work shouldbe serializable regarding the atomic-set used. When units of work do not respect thiscriterion, data-races can occur within each unit of work. Each data-race scenario foundin a program has to fall under one of the 14 identified by the author. Thus, one of thecontributions is on how to make such detection during runtime as efficient as possible.The notation used to describe the order of events in these scenarios is Opu(var), whereOp can be a read (R) or a write (W ) from unit of work u to variable var.

Consider the following scenario illustrated in Listing 2.5: a shared variable l is partof an atomic set and two threads are executing two functions, fun1() and fun2(), eachrepresenting a unit of work. Unit of work u reads data from variable l into a local variable,possibly modifying it, and then writes it back to variable l. Meanwhile, between the readand the write from unit of work u, another unit of work, u′, writes a value to variable l.

This scenario represents the first problematic data access pattern identified in thiswork,Ru(l),Wu′(l),Wu(l), corresponds to a stale read. As a result of the first write by unitof work u′, the value read in by u at the beginning is stale, invalidating its serializability.

Disregarding efficiency, each scenario can be represented by a state machine, as illus-trated in Figure 2.1. The first state represents the beginning, when no access was detected,while the end state represents the detection of a data-race of the specific scenario.

In order to represent this state machine efficiently, the authors used a bitset. A bitsetis an abstract data-structure that is regularly implemented using an integer of the target

15


0start 1 2 3Ru(l) Wu′(l) Wu(l)

Figure 2.1: Finite state machine to represent scenario Ru(l),Wu′(l),Wu(l).

language. In this work, each bit represents a state of the finite state machine. Since ascenario has at most 5 states, 3 bits are used to describe each state. As such, for eachatomic-set a long value (42-bits) is used to represent all states for all scenarios.

During instrumentation, each atomic-set keeps monitoring accesses to data and up-dating the bitset state machines for the various scenarios. When a data-race is detected,i.e. when one of the state machines reaches an acceptance state, information about theconditions where the data-race appeared are logged into a file for posterior analysis.

2.2.2 Resource Groups (RG)

In [DP12], Delgado and Paulino propose a data-centric pessimistic concurrency controlmechanism, Resource Groups or RGs for short.

In this work, the resource group, a set of data that should be manipulated atomically,is the basic concurrency management unit. These are dynamic: can change composition,be acquired, or released. Table 2.1 lists the corresponding API. The acquire and releasedefine the scope of atomic evaluation of its resources.

As such, concerns about lock acquisition order in each resource group is abstracted bythe system’s API, making this work an interesting platform to be the compilation targetof higher level constructs.

newGroup g Creates a new groupg.add(a) Add a sequence of resources to groupg.addRead(a) Add a sequence of resources to group using a read lockg.remove(a) Remove a sequence of resources from groupg.acquire()/lock() Acquire resource groupg.release()/unlock() Release resource groupg.setPartition(a) Set the group’s partitiong.close() Close resource group

Table 2.1: Constructs provided to manipulate resource groups.

Static scenarios represent situations when the resources to be acquired in the criticalsection are known before evaluating it. Accordingly, a group has to be created, addedresources to and be acquired by the thread, as exemplified in Listing 2.6. Since this groupis fixed, i.e. does not change size, successful acquisition of a group guarantees that thethread can safely execute the critical section without further considerations. Further-more, since all resources are known, the system attempts to lock each resource respecting

16


1 void transfer(Account A, Account B, int amount)2 {3 newGroup group ;4 group.add(A) ;5 group.add(B) ;6 group.acquire () ;78 A.take(amount);9 B.give(amount);

1011 group.release () ;12 }

Listing 2.6: Usage of resource groups in static scenarios.

1 void remove(List L, Node node)2 {3 newGroup group ;4 group.add(L.head) ; //add the head of the list5 group.acquire () ; // acquire partition67 Node current = L.head.getNext () ;89 while(current != node) {

10 current = current.getNext () ;11 }1213 Node previous = current.getPrevious () ;14 Node next = current.getNext () ;15 group.add(previous) ;16 group.add(next) ;1718 previous.setNext(next) ;19 next.setPrevious(previous) ;2021 group.release () ;22 }

Listing 2.7: Usage of resource groups in dynamic scenarios.

a previously defined global resource order, effectively eliminating deadlocks in these sce-narios.

However, in dynamic scenarios, assuring deadlock-freedom is not so simple. Groupsmay increase and decrease in size during critical sections and be nested inside other re-source groups, effectively invalidating the previous approach taken in static scenarios.As such, the resources that potentially might be used during a critical section are not ableto be identified before actually evaluating the critical section. Listing 2.7 illustrates suchscenarios.

When removing an element from a doubly-linked list whose position in the list isunknown, the nodes that surround it are, by transitivity, also unknown. Although theremoval should be atomic, the system does not know which nodes should be lockedbefore the beginning of the operation. This models a fully dynamic scenario.

17

2. STATE OF THE ART 2.3. Code-Centric Concurrency Control

To avoid this, static analysis splits the application state into partitions. Each partitioncontains the elements whose interaction can lead to a deadlock. Each group is then asso-ciated with the partitions that its resources belong to, forcing the group acquisition to alsoacquire the partitions associated with it. Logically, since an additional layer of bondingbetween resources is added to groups, concurrency will be more restrained in dynamicscenarios.

However, since acquiring a resource group involves acquiring beforehand its asso-ciated partition, at most one thread will hold a partition. As such, no two threads willhold a resource group whose elements belong to the same partition, effectively assuringdeadlock-freedom in dynamic scenarios.

The partitioning algorithm puts in the same partition the resources that are accessibleto multiple threads and provoke circular-wait conditions. Effectively, resources are putinto a partition if they generate circular chains. In addition, their acquisition either canbe local (when restricted to a given object) or global. For the sake of brevity, this scoperepresents a way to impose different granularities on partitions in order to maximizeconcurrency.

Conflicts between static and dynamic acquisitions are solved by guaranteeing thata thread obliged to block during a group acquisition operation releases the remainderresources, of that same group, that it already holds. The thread will then block on the soleresource that it failed to acquire last. When that resource is again available, the threadreleases it and attempts to re-acquire all locks from the start, respecting the previouslyestablished global lock order. The ordered re-acquisition of the resources guarantees thatat least one thread is making progress. As a result, livelock-freedom is also guaranteed.

Composability is also provided by these constructs: nested groups can re-acquire re-sources or partitions acquired in outermost groups due to the use of reentrant locks.

In conclusion, this work offers a data-centric API for concurrency management thatguarantees composability, atomicity, livelock-freedom and deadlock-freedom in both staticand dynamic scenarios.

Nevertheless, due to its low level nature, two major problems make it infeasible for adirect replacement of more popular concurrency control mechanisms, such as locks andSTMs. First, the amount of annotations required make it error-prone. If the programmerforgets to free a group after using it, the system will not work correctly. Second, theproperties assured by the API depend on its correct use. That responsibility should bedelegated into a compiler instead of being another concern for the programmer.

As such, creating new high-level constructs that avoid these pitfalls while retainingthe positive properties provided by ResourceGroups will be the focus of this thesis.

2.3 Code-Centric Concurrency Control

Atomic sections are one of the most widely used constructs to express atomicity in pro-gramming languages. Systems that rely on pessimistic approaches are especially useful

18


due to supporting irreversible operations, such as I/O, to be used atomically. The mostcommon method to guarantee deadlock-freedom is prevention.

The following systems are motivated by the lack of these properties in legacy lock-based systems and by the fact that no runtime system is required, as opposed to ap-proaches that use deadlock avoidance or detection. Internally, the atomic sections resortto lock-based lower-level concurrency control primitives to manage concurrency. How-ever, these locks will need to be inferred since in most systems the only informationsupplied by the programmer is the position of the atomic section in the code.

The atomicity provided by these systems is only guaranteed as long as the code insidethe atomic sections do not terminate unexpectedly due to an exception. If such were tohappen, changes made in shared memory would not be reverted, in opposition to STMs,where the transaction would be aborted and retried.

Each system has its own method for inferring which locks should be acquired by agiven atomic section. The main goal is to guarantee atomicity and composability while atthe same time attempting to guarantee deadlock-freedom. While the level of deadlock-freedom guaranteed varies according to each system, most systems do not guarantee itin dynamic scenarios.

2.3.1 AutoLocker

AutoLocker [MZGB06] focus its contribution on the ability to provide superior perfor-mance compared to STMs. Nonetheless, benchmarks show that performance is slightlyworse than manual implementations using coarse and fine-grained policies. Achievingequal performance would require a much more complex lock-inference system and moreannotations to eliminate ambiguities that forbid optimizations used by programmers.

Concurrency control is expressed at two levels. The programmer must delimit atomicsections and, additionally, explicitly associate a lock variable to annotate the data with alock variable, through the protected_by annotation.

Atomic sections in AutoLocker only guarantee exclusive access to the memory posi-tions that have been associated to lock variables. Listing 2.8 illustrates this approach inthe implementation of a function, put, that stores an element v (with given key k) in ahash table.

Due to the lack of annotations inside the atomic section, the programmer can fine-tune the locking mechanism, enforcing it to be more fine or coarse-grained. Tuning thegranularity does not interfere with the underlying program because both annotations(mutex and protected_by) are only used by Autolocker. Listing 2.9 demonstrates the useof an individual lock per bucket instead of a global lock for the whole table, improvingperformance in the hash table by enabling concurrent access to distinct buckets.

The addition of a method for hash table resize would require another protected_by

annotation to protect the whole hash table. The global lock would be at a higher levelthan the individual bucket locks, only allowing the use of individual locks when the

19


1 s t r u c t entry { i n t k; i n t v; s t r u c t entry* next; };23 mutex table_lock;4 s t r u c t entry *table[SIZE] protec ted_by (table_lock);56 void put( i n t k, i n t v) {7 i n t hashcode = ...;8 s t r u c t entry *e = malloc (...);9 e->k = k; e->v = v;

10 atomic {11 e->next = table[hashcode ];12 table[hashcode] = e;13 }14 }

Listing 2.8: The annotations required to implement an hash table using AutoLocker.

1 s t r u c t bucket {2 mutex lock;3 s t r u c t entry* head protec ted_by (lock);4 };5 s t r u c t bucket table[SIZE];

Listing 2.9: Augmenting the hash table from Listing 2.8 using fine-grained locks.

global lock is in read-only mode. This lock organization scheme, called multi-granularitylocking, comes from database systems literature [GLPT76]. Overall, having the possibil-ity to use such organization provides the programmer with almost full control over thecode generated with minimal use of annotations.

Deadlock-freedom is also guaranteed. During the translation from the annotated codeinto pure C code, a correction algorithm is run to ensure that all the code produced isdeadlock free. If that property cannot be guaranteed, compilation will fail. Thus, the toolmight refuse to compile the code if it is unable to order locks in a way that deadlocks arenot present, leading to a false-positive. This is due in part to the algorithm’s conservativeapproach. However, the authors argue that most of the cases that produce a compilationerror can be manually fixed by replacing local locks with global locks.

Since AutoLocker makes use of two-phase-locking, limitations are imposed on al-gorithms implemented with it. Consider a tree data structure. Deadlocks that appearobvious to the compiler might not exist due to the way the structure is accessed. A sim-ple tree traversal algorithm in which each node traversed is locked will not compile. Ateach level, a lock to the previous node would be released and another to the next nodewould have to be acquired, triggering the false positive by the compiler.

2.3.2 Lock Inference for Atomic Sections

The work proposed by Michael Hicks et al. in [HFP06] provides a similar mechanism toAutoLocker but without requiring lock annotations.

20


A type-based analysis infers the abstract locations of each memory location sharedbetween threads through pointer backtracking. For each atomic section, the compilerbacktracks all pointer values to their point of origin in the atomic section. The closed setof memory locations computed represents the minimal set that has to be protected witha lock before executing the atomic section. This information is then used to generate thelocks that have to be acquired to ensure atomicity in each atomic section. These locksare then released before exiting the atomic section, i.e. two-phase locking previouslydiscussed for AutoLocker (Section 2.3.1). Locks should be reentrant since outer atomicsections also protect the data inside the inner atomic sections.

Since the number of locks potentially acquired can introduce unwanted overhead,when two or more locations are always accessed together, their locks are coalesced intoa single lock. Additionally, if a location a is always accessed after acquiring the lock toanother location b, the location a lock is dominated by location b lock. Accordingly, thelock for location a is dropped.

Deadlock-freedom in both dynamic and static scenarios is attained by enforcing aglobal lock order and acquiring them accordingly. As far as we can conclude, due to theneed to assess statically all references, deadlock-freedom in dynamic scenarios is assuredby sacrificing performance. When a relation of dominance is not able to be establisheddue to circular dependencies, the algorithm reverts to a global lock that protects the af-fected atomic sections. This detail is not explicitly mentioned in the work, but a similartechnique is assumed to be used.

Consider a linked list. Even thought the amount of nodes it can contain is unbounded,the algorithm will only protect the list backbone. If a method returned a random nodefrom the list, its lock would not be the same as the list backbone due to a different mem-ory location and violate atomicity. But if the previously mentioned global lock is used,atomicity would be preserved.

2.3.3 Lock Allocation

The work proposed by Emmi et al. in [EFJM07] computes the locks required throughbacktracking of pointers locations. Each atomic section is then associated with locks thatrepresent resources used in that section. Resources that are unknown or that might havemultiple locations are said to be may-alias. The locks are then modelled as a constraintoptimization problem in order to reduce the number of locks used while maintainingatomicity and deadlock-freedom.

The nesting of atomic sections is, once again, supported by enforcing a two-phase-locking in similar fashion to AutoLocker (Section 2.3.1). The resources acquired by theinner atomic sections are only released when the outer atomic section finishes.

In static scenarios, imposing a global lock order guarantees the absence of deadlocksdue to the orderly acquisition of the locks at the entrance of an atomic section. However,this approach is not enough to guarantee deadlock-freedom in dynamic scenarios since

21


the set of locks represented by may-alias variables has to be dynamically determined. Inthose cases, may-alias variables are assigned a single global lock. Given that they canpoint to multiple locations during the execution of the program, assigning them to a sin-gle global lock guarantees both atomicity and deadlock-freedom in dynamic scenarios.Hence, concurrency is sacrificed in order to guarantee deadlock-freedom in dynamic sce-narios.

In conclusion, deadlock-freedom is assured in both static and dynamic scenarios.

2.3.4 Inferring locks for atomic sections

The work proposed by Cherem in [CCG08] makes use of multi-granular locking seen indatabase systems to provide weak atomicity and deadlock-freedom.

Consider a fine lock that represents a resource that is acquired inside an atomic sectionwhere a coarse lock is already held. Multi-granular locks permit the locking of individualfiner locks without holding the coarser lock, instead marking it with an intention. The in-tention conveys that at least one fine lock is currently held by a thread, so the coarse lockcannot be acquired. When all fine locks are released, the coarse lock loses its intention,regaining the ability to be acquired by a thread.

The association between resources and the locks that should be acquired is donethrough pointer backtracking analysis, as described in Section 2.3.2.

If the number of locations in an atomic section is fixed and under a certain threshold,each memory location has a fine-grain lock associated. Otherwise, such as in dynamicscenarios where the number of elements accessed in a atomic section is unbounded, asingle coarse-grain lock is used. This technique reduces the contention when the numberof locks is high and ensures deadlock-freedom in dynamic scenarios by using a coarserlock that encases all potentially accessed elements.

Deadlock-freedom is guaranteed in both scenarios as in [EFJM07, HFP06].

2.3.5 Deadlock Avoidance

The deadlock avoidance strategy relies on having a runtime that avoids transitions inthe system’s state that might result in a deadlock.The system delays the acquisition oflocks deemed deadlock-prone until they are considered safe. As such, the two systemshere overviewed resort to static analysis and a runtime for deciding lock acquisitions.However, they resort to different methods to evaluate the safeness of a lock acquisition.

The purpose of the static analysis is to identify cases where deadlocks can be present,delegating a significant part of the computation offline. This allows the runtime code tofocus solely on avoiding them as the execution unfolds, minimizing runtime overhead.

22


2.3.5.1 Shared Memory Deadlock-Free Semantics

In the work by Boudol presented in [Bou09], the runtime ensures that states that mightlead into a deadlock are not allowed by refusing to lock a memory location when it an-ticipates to take a memory location that is held by another thread.

The information on which memory locations that are held by a certain thread is an-ticipated by using a type and effect system, that uses prudent semantics to prove thedeadlock-freedom of the lock. The runtime then disallows the locking of a pointer when-ever it anticipates to take some other pointer that is currently held by another thread,effectively preventing deadlocks.

This is achieved by translating an extension of CoreML with a type and effect systeminto a runtime language. For each expression, the system computes the finite set of point-ers that it might have to lock during its execution, and adds a lock over the set of point-ers before evaluating the expression. Since all locks possible of acquisition are alreadyacquired when the expression is evaluated, deadlocks are avoided. As a consequence,for every closed expression of the source language that is typable, its translation for thetarget language is effectively free from deadlocks. However, this can only be applied tostatic scenarios.

2.3.5.2 The theory of deadlock avoidance via discrete control

Contrasting with Boudol’s work, Wang et. al [WLK+09] introduces the use of DiscreteControl Theory (DCT) as a tool for dynamic deadlock avoidance in concurrent programsthat use lock-based synchronization primitives, without requiring additional annota-tions. Not requiring additional annotation presents an interesting possibility, in whichthe system would be used on legacy lock-based code to provide deadlock-freedom with-out modifications.

DCT is a branch from control theory that describes system states and transitions, nor-mally modelled as Petri Nets. These models can later be subjected to conditions to iden-tify states that are prone to deadlock. For this purpose, the Petri Nets are augmentedwith synchronization primitives and conditions that govern state transition. In a Petrinet, a siphon represents a group of states where the set of possible output transitions isempty. Since no transitions to outside of the siphon are possible, the system state is effec-tively trapped inside the siphon. For this reason, the authors argue that siphons representdeadlock-prone states.

Instrumentation code is injected to take place immediately before synchronizationprimitives, such as locks, are invoked. Before allowing a synchronization primitive toexecute, the runtime determines whether the resulting target state is deadlock-prone.The transition is then stalled until the target state is deemed deadlock-free.

In dynamic scenarios, when the locks to be acquired are unknown, the system takesa conservative approach and holds a lock using its own type. In the worst case sce-nario, when all locks share the same type, the performance is downgraded to that of a

23


1 void transfer(Object Object , Object Object , i n t amount)2 {3 atomic{4 A.take(amount);5 B.give(amount);6 }7 }

Listing 2.10: Atomic section in a transactional memory environment.

type-wide lock. As a result, deadlock-freedom is still assured in dynamic scenarios bycompromising performance.

2.3.6 Transactional Memory (TM)

Software transactional memory (STM) as a concurrency control mechanism grows fromthe transaction concept found in database systems [HLR10]. Namely, from the optimisticmodel used in databases that enables concurrent transactions and consequently takingadvantage of the new multi-core architectural paradigm. Furthermore, synchronizationintensive applications exhibiting a low conflict rate tend to benefit the most from thescaling provided by concurrent transactions [PW10].

The STM paradigm favours compact semantics and ease of use in a concurrent envi-ronment, as exemplified in Listing 2.10. The support for nested atomic sections enablescode compositionality. For instance, in an object-oriented programming language, theimplementation of a given method is not influenced by the use of atomic sections inlower levels of the call stack.

Of the properties originally guaranteed by database transactions (ACID) [HR83], trans-actional memory provides two of them:

Atomicity Changes performed by the transaction are made visible to the outside at thesame time, when the transaction commits successfully. STMs normally offer weakatomicity, where atomicity is only guaranteed among transactions [HLR10].

Isolation Concurrent transactions do not observe the internal state of other transactions,and therefore do not interfere with each other’s result.

The remainder two, consistency and durability, are not guaranteed in transactionalmemory since supporting them would imply a considerable overhead similar to that ofdatabase systems [MBM+06].

Statements in the atomic section have to maintain these properties or abort, and retrylater. The behaviour in the abort scenario can differ across distinct STM implementa-tions. The unclear semantics associated with transactions across multiple implementa-tions poses one of the reasons why STM only achieved negligible use in real applications[Boe09].

24


Transaction properties are either managed by a runtime system during the executionor statically handled by the compiler, depending on the system in question. Normally,the static approach is not as flexible but has less of an impact on the performance.

The transaction normally relies on two important mechanisms to assure atomicity:commits and rollbacks. Commits are performed at the end of a transaction, and definesthe point when the changes performed by such transaction are made visible to the outside(the commit itself also has to be atomic). A transaction cannot commit when the changesit wants to apply to shared data conflicts with changes made in the data since the timethe transaction started.

Since STMS roll back when a deadlock or conflict is detected, transactional memoryis deadlock-free in both static and dynamic scenarios. At most, it will keep rolling backdue to conflicts, that might lead to a livelock, but never enter a state of deadlock.

Two main approaches can be taken regarding the concurrency control between trans-actions: optimistic and pessimistic.

2.3.6.1 Conflict detection and resolution

With pessimistic concurrency control, if a conflict occurs, it is detected and resolved atthe same moment. Once a transaction is initiated, all further possible conflicts are de-tected before they interact with the data, avoiding future conflicts. In other words, once atransaction starts, it holds exclusive access to that data. In this case, deadlocks that occurhave to be resolved and STM does it through deadlock-detection.

On the other side of the spectrum, optimistic concurrency control permits for de-layed detection and resolution of conflicts. This allows concurrent transactions to takeplace, leaving to the runtime system the detection of possible conflicts before committingchanges.

When contention is high, the pessimistic approach will provide for better perfor-mance due to the decreased number of rollbacks made compared to the optimistic al-ternative. In that case, most transactions would execute only to the aborted because theywere contending with other transactions for the same data and the conflict was onlyfound in the end of execution.

On the other hand, the optimistic implementation is best suited for scenarios wheremultiple transactions are not expected to compete over the same data.

However, some instructions cannot be rolled back, such as I/O routines or systemcalls, and as a consequence are not allowed inside transactions. Even if I/O buffers whereto be copied and rolled back, sent packets would already be sent, regardless of whetherthe transaction rolls back or not. Works in this direction have tried to mend this flaw andprovide for unrestricted transactions that don’t rely on complex implementations.

25


2.3.6.2 Supporting I/O and System Calls within Transactions

The work in [BLM06] proposes a method for supporting I/O operation and system callsin a unbounded conventional transactional memory system.

For that purpose it specifies two modes of execution:

restricted transactions limits transaction size, duration and content but is highly concur-rent.

unrestricted transactions unbounded, but allows I/O and system calls. Only one is al-lowed at any time, limiting concurrency.

Such support for I/O and system calls is based on the assumption that the oldesttransaction in the system will never roll-back or abort, because in the case of conflict,younger transactions are aborted instead. This conflict resolution algorithm is used onseveral transactional proposals [AAK+05, MBM+06, RG02].

Even though the authors provide two distinct implementations, a naïve and an op-timized one, in this document we will restrict ourselves to the latter. The optimizedimplementation allows for concurrent execution of multiple restricted transactions and asingle unrestricted transaction. This provides for highly concurrent execution of transac-tions, assuming that the transactions using unrestricted mode due to I/O or system callsin a program are a minority.

This proposal, however, requires dedicated hardware support: an interface that in-cludes pairs of instructions for initiating and terminating each of the two modes of trans-actions, and two word-sized registers (STSW and PTSW) used for controlling transac-tions without introducing significant overhead.

2.3.7 Adaptive Locks

Adaptive Locks [UBES09] takes a completely different approach from previous worksand plays on the fact that performance of pessimistic and optimistic concurrency controlvaries according to the execution scenario. While it relies on regular atomic sections thatensure atomicity, their mode of execution is dynamically alternated between optimisticand pessimistic in order to achieve maximum performance. Runtime holds the respon-sibility of choosing and dynamically applying the mechanism that should yield the bestperformance in each occasion.

For that purpose, the two modes represent mutex locks or transactional memory. De-cision on whether to run an atomic section using a certain mode is based on the observedbehaviour of the program. During runtime, three factors are statistically evaluated foreach atomic section to make such decision:

nominal contention (c) number of threads blocked on the lock in mutex mode. It repre-sents the benefit of executing in transaction mode rather than in mutex mode.

26

2. STATE OF THE ART 2.4. Final remarks

actual contention (a) number of times each transaction retries in transactional mode. Itrepresents the competition by other threads over a shared data on the critical sec-tion.

transactional overhead (o) a measure of the overhead associated with executing a trans-action in this critical section, calculated based on the proportion of shared memoryoperations in a transaction.

These three variables are taken into account by the balancing algorithm, which de-cides whether or not to change the execution mode based on potential performance gain.The cost-benefit analysis abides to the following formula:

a.o ≥ c

In case of tie, the system opts for the mutex mode.The statistics’ acquisition overhead is almost negligible due to several optimizations,

like using CAS (compare-and-swap) instructions in the decision algorithm and updatingthe statistic counters in a non-atomic way, allowing data-races in the statistics’ updates.Updating the statistic counters would represent an important bottleneck if it had to beserialized in all critical sections. For this reason, the authors decided that the skew arisingfrom concurrent updates is sufficiently negligible, and thus have opt to ignore it.

However, since any of the two mechanisms can be enforced at runtime, only commonfeatures can be taken into account. As such, while isolation is guaranteed, atomicity anddeadlock-freedom are not. Additionally, due to the interchange between STM and mutexmode, composability is also not guaranteed.

2.4 Final remarks

As possible to infer from Table 2.2, the most uniform advantage of the presented systemscompared to the de facto concurrency control mechanism, mutex locks, is composability.This property alone greatly helps programmers build modular software.

The weak atomicity provided by systems other than transactional memory is onlyapplicable if the code being executed during an atomic section does not terminate unex-pectedly due to an exception, as mentioned in Section 2.3.

In regards to their implementation, only works based on deadlock-prevention arestatic. The remaining works, based on deadlock-detection and avoidance, rely on a run-time system to manage the presence of deadlocks and atomicity where applicable.

While some code-centric approaches [HFP06, EFJM07, CCG08] manage to provide afull spectrum of properties that benefit the programmer, only one data-centric approach[PD12] manages to guarantee deadlock-freedom in all dynamic scenarios. As such, thissystem will represent the basis on which the work in this thesis will be built upon.

27

2. STATE OF THE ART 2.4. Final remarks

Deadlock-freescenarios

Approach Static Dynamic Livelock-free Atomicity Isolation Composable

Mutex locks No No Yes No Yes No

Deadlock-detectionTrans. Memory Yes Yes No Weak Yes YesAdaptive Locks No No No No Yes No

Deadlock-preventionAutoLocker Yes Fails to compile Yes Weak- Yes YesInferring Locks Yes Yes Yes Weak- Yes YesLock Inference Yes Yes Yes Weak- Yes YesLock Allocation Yes Yes Yes Weak- Yes YesAJ Yes Mostly yes Yes Weak- Yes YesResource Groups Yes Yes Yes Weak- Yes Yes

Deadlock-avoidanceBoudol Yes No Yes No Yes YesDCT Yes Yes Yes No Yes Yes

Table 2.2: Comparison of legacy and state of the art systems.

28

3From Atomic Variables to

Data-Centric Concurrency Control

In this chapter, we introduce a data-centric concurrency management model. We pro-vide an informal presentation of both the model’s syntax and semantics, followed by adiscussion about the impact of our construct in the underlying type system, along withits reasoning and the properties we can derive from it.

3.1 Syntax

Our proposal relies on the identification of objects in memory whose value should beaccessed atomically, in both read and write accesses. In essence, the declaration of theseobjects should convey to the system that intent, so the latter can enforce the desired prop-erties when the program is subsequently executed.

An issue to consider is the simplicity of the proposed constructs, which has to becarefully balanced. On the one hand, several distinct annotations can provide the com-piler with additional information to further optimize the generated code, but burdensthe programmer with additional responsibilities. On the other hand, a simpler, less com-plex approach, releases the programmer from these concerns and delegates it on the un-derlying automated system. Yet, this approach requires additional static analysis and astronger lock inference engine. The simple use of the model must not, however, limit theprogrammer’s ability to naturally express concurrency in both simple and complex sce-narios. Conversely, the programming construct must supply enough information for thecompiler to generate efficient concurrent code while guaranteeing two important prop-erties: atomicity and deadlock-freedom in all scenarios.

29

3. FROM ATOMIC VARIABLES TO DATA-CENTRIC CONCURRENCY CONTROL 3.2. Semantics

In the face of this trade-off, our approach is to subsume as much concerns in thecompiler and runtime system as possible and, hence, relief the programmer from thoseconcerns. As such, we propose the use of a single data-centric concurrency control con-struction for the target programming language. The main asset provided is the abilityto express concurrency using a simpler mental process compared to legacy, code-centricsystems due to the delegation of concerns to the underlying system.

The specification of concurrency control in our model is thus entirely built arounda single annotation: @Atomic. This annotation is applicable to all variable declaration,namely, class fields, function parameters, and local variables declarations.

The act of using a single annotation, with a single meaning, eases considerably thedevelopment of software compared to systems that provide to the programmer severalconstructs with distinct semantic meaning [VTD06]. As a consequence, we claim that ourmodel is easier to use and provides the programmer with higher productivity than theaforementioned models, while maintaining a high degree of expressivity.

3.2 Semantics

The reasoning of the proposed concurrency control model revolves around the annota-tion of variables that must be evaluated atomically in a unit of work. In the scope of thisdocument, an annotated variable will be referenced to as an atomic variable and a memoryobject referenced from an atomic variable as an atomic resource, or simply resource.

Definition 1 (Atomic Variable). A variable annotated with @Atomic.

Definition 2 (Atomic Resource or Resource). A memory object reachable from an atomic vari-able.

The semantics of the model builds from another construction traversal to most pro-gramming paradigms, the function. This construction will be our basic scope of atomicity,its body will delimit the scope of atomic evaluation for all memory objects reachable fromatomically annotated variables within that same body. It is the equivalent to the unit ofwork in AJ. The scope of atomic execution is limited by either the return of a function orthe raising of an exception.

Definition 3 (Unit of work). The function defines the scope of atomic evaluation of a set ofatomic resources, and thus it is the model’s unit of work.

Definition 4 (Semantics). Let R be the set of atomic resources referenced by the atomic vari-ables in the scope of a function f . The model guarantees all atomic resources belonging to R areatomically evaluated, as a whole, in the entire scope of f .

As such, to perform an atomic operation over a set of resources, the programmer onlyhas to reference them in the same function.

30


Note that in order not to limit the use of nested units of work, resources are reentrant,in the sense that they may be acquired multiple times in nested functions. As expected, aresource acquired in a function is considered acquired in the context of inner functions.

Axiom 1 (Resource thread isolation). Message-oriented communication may not exchange ref-erences to resources.

A precondition for the application of the model is the absence of communication in-side a unit of work that involves the exchange of references to resources between threads.This limitation prevents the occurrence of circular dependency scenarios regarding re-source acquisition that cannot be inferred from static analysis due to threads exchangingreferences to resources.

Note that the atomic resources referenced in a unit of work can change during its exe-cution. However, as described in Definition 4, the model guarantees that all the resourcesreferenced in a unit of work at any point in time are atomically evaluated.

To illustrate the use of our system, two main examples will be presented and revis-ited throughout the document: a bank simulation and the implementation of a (simply-)linked list. The bank allows the withdraw and deposit of money in accounts, and thetransfer of money between two accounts. The linked list on the other hand representsa fairly straightforward implementation, using a single forward reference in each node.This implementation supports adding, removing or checking the presence of an elementin the list.

In the bank program, illustrated in Listing 3.1, the intent is to ensure that all accountsare accessed under atomicity. Respectively, the structure that stores the accounts has itsAccount polymorphic type annotated with @Atomic (line 2). Since the map is not itselfannotated, it is not atomically evaluated, but rather the accounts attained from it. Atthis point, the accounts attained from this map are effectively atomic resources. As such,when assigned to variables, they should be also annotated with @Atomic to convey thatthe (atomic) variable holds an atomic resource (lines 6 and 7).

In the transfer function of the Bank class, the scope of atomicity enforced only in-cludes the third statement (line 8). It represents the first access to an atomic variable inthe function, since the previous two declaration of local atomic variables (lines 6 and 7)do not contain accesses to atomic variables (the map is not considered an atomic vari-able). However, in the transfer function in the Account class, the programmer also hasto annotate the parameter of type Account with @Atomic (line 17) because the accessedaccounts, dst and this (line 18) should also be atomically evaluated in the scope of thistransfer function. Additionally, since the function is invoked with an atomic variableof type Account, the parameter also has to be an atomic variable, hence requiring the@Atomic annotation.

The linked list implementation shares a similar intent to the previous example: guar-anteeing that the linked list is traversed and modified atomically. To achieve this, theprogrammer starts by annotating with @Atomic the head of the list, head (line 16), and

31


1 class Bank{2 Map <Integer , @Atomic Account > accounts;34 void transfer(int srcAccNumber , int dstAccNumber , int amount)5 {6 @Atomic Account src = accounts.get(srcAccNumber) ;7 @Atomic Account dst = accounts.get(dstAccNumber) ;8 src.tranfer(dst , amount);9 }

1011 (...)12 }1314 class Account{15 float m_balance ;1617 void transfer(@Atomic Account dst , int amount) {18 dst.deposit(this.withdraw(amount));19 }2021 (...)22 }

Listing 3.1: Syntactic application of our system to a bank transfer between twoaccounts.

the forward reference in each node, next (line 2). However, these two annotations do notsuffice. The functions of the LinkedList class are still dealing with non-atomic local vari-ables that reference nodes of the list. To ensure the linked list is traversed atomically inthose functions (add, remove, and contains), their local variables should also be annotatedwith @Atomic.

Focusing on the add function, the scope of atomicity in this unit of work ranges fromthe first access to the atomic variable head until the last access to the atomic variable prev

(line 31 and 43, respectively). This scope is especially interesting since it illustrates ascenario where the composition of the resource set in the function evolves in time. Boththe next and prev atomic variables are assigned different atomic resources to the next

and prev atomic variables as the algorithm iterates over the list (lines 23 and 37).

3.2.1 Deadlock-freedom

While the model itself makes no guarantee regarding deadlock or livelock-freedom, thelower level concurrency mechanism that will be used to implement this model alreadyguarantees livelock-freedom. Assuring deadlock-freedom, on the other hand, requiresthe identification of deadlock-prone resource acquisitions (represented by circular de-pendencies between them) and associating them a coarse-grained lock, assuring theirserialization and respective deadlock-freedom. As such, and given the semantics pre-sented, we claim that it is possible to build a system based on pessimistic concurrencycontrol mechanism that fulfils these requirements while assuring that at least one threadis making progress at any given point in time.

32


1 class Node {2 @Atomic Node next ;34 void setNext(@Atomic Node node) {5 next = node ;6 }78 Node getNext () {9 return next ;

10 }1112 (...)13 }1415 class LinkedList {16 @Atomic Node head;1718 public boolean contains(int value) {19 @Atomic Node next = head.getNext ();2021 int v;22 while ((v=next.getValue ()) < value) {23 next = prev.getNext ();24 }2526 return (v == value);27 }2829 public boolean add(int value) {30 boolean result ;31 @Atomic Node prev = head;32 @Atomic Node next = head.getNext ();3334 int v;35 while ((v=next.getValue ()) < value) {36 prev = next ;37 next = prev.getNext ();38 }3940 result = (v != value) ;41 if (result) {42 @Atomic Node node = new Node(value ,next);43 prev.setNext(node);44 }4546 return result;47 }4849 (...)50 }

Listing 3.2: Syntactic application of our system to the implementation of a simply-linked list.

33

3. FROM ATOMIC VARIABLES TO DATA-CENTRIC CONCURRENCY CONTROL 3.3. Typing

3.3 Typing

We want our annotation to have an impact on the type of the resources reachable fromatomically annotated variables. We denote by τ@, the type of an atomically annotatedvariable of type τ .

Definition 5 (Typing). Let<: denote the subtyping relation in the hosting language type system.τ and τ@ share the same interface but τ ≮: τ@ and τ@ ≮: τ .

The omission of any of the annotations will produce a typing error, warning the pro-grammer to the missing annotation and refusing to compile. Additionally, whatever isthe super or subtype of the annotated type, there is a corresponding atomic type.

Proposition 1 (Subtyping). For a given type τ ,

∀τ ′ : τ ′ <: τ ⇒ ∃τ ′@ : τ ′@ <: τ@

and ∀τ ′ : τ <: τ ′ ⇒ ∃τ ′@ : τ@ <: τ ′@

Thus, τ@ cannot be cast to a non-atomic type.

Both types τ and τ@ share the same interface and implementation, but are not sub-types of each other. Therefore, the type is not interchangeable between an atomic variableand its non-atomic correspondent. The type system does not allow invocation of functionby passing an atomic type argument to a non-atomic type parameter, and vice-versa. Itis also not possible to type-cast an atomic type to a non-atomic type without producingan (compile or runtime) error. As such, an atomic type cannot be accessed outside anatomic scope. Additional, all aliases of an annotated variable will also share the sametype, therefore sharing its atomic type. Likewise, any partial access is also atomic.

Incrementally, atomic types also implicitly impact the return type of functions andthe type of the constructors. A function that returns a type value of τ@ must have its typemodified accordingly. Likewise, an assignment from a constructor of type τ to a variableof type τ@ must also be converted. Since the original function returned a non-atomictype (τ ), returning an atomic type (τ@) would result in a failed compilation. As such, thereturn type of the function has to be also converted (Equation (4.5)). Constructors alsohave to be converted if a constructor of a non-atomic type is being assigned to an atomicvariable (Equation (4.3) and (4.4)).

In overview, an atomic resource is always accessed via an atomic variable. As such,an atomic resource is always evaluated in the scope of an atomic function. This propertyis the backbone for the centrality of our concurrency control mechanism.

Theorem 1 (Correction). A well-typed program does not access atomic resources from outsidean atomic function.

Proof. Definition 5 and Proposition 1 imply that an atomic type is not interchangeablewith a non-atomic type. Type-casts from an atomic type to a non-atomic type will also

34

3. FROM ATOMIC VARIABLES TO DATA-CENTRIC CONCURRENCY CONTROL 3.3. Typing

1 class X {2 @Atomic T var = ... ;3 T normal -var = ... ;45 ...67 void go() {8 start(var) ; //TYPE ERROR9 end(normal -var) ; //TYPE ERROR

10 }1112 void start(T param1) {13 @Atomic T local1 = param1 ; //TYPE ERROR14 }1516 void end(@Atomic T param2) {17 T local2 = param2 ; //TYPE ERROR18 }19 }

Listing 3.3: Compilation failures of a program where the atomic types were notpropagated correctly.

result in an error. Additionally, Definition 4 implies that an atomic variable is alwaysevaluated atomically in the scope of the function that accesses it. Thus, an atomic resourcecan only be accessed in the scope of an atomic function, where it will always be atomicallyevaluated.

Corollary 2 (Strong Atomicity). An access to an atomic resource is guaranteed to be atomicagainst all remainder accesses to that same resource.

To illustrate this concept, Listing 3.3 represents a sample program that fails to compiledue to incorrect propagation of our atomic annotations. In line 8, an atomic variable T ispassed as argument to a function that takes a non-atomic argument. Even though theiroriginal type is the same (T), the type of the annotated variable (T@) is not a subtype of T,which results in a type error in this invocation. The inverse takes place in line 9, wherea non-atomic variable fails to typecheck against an atomic variable parameter. Lastly,both lines 13 and 17 feature another type error. This time, both local variables featureopposed atomic-statuses regarding the parameter they are aliasing, effectively failing thetypecheck.

This propagation of atomic types could easily be inferred by the compiler using acall-graph analysis. However, we argue that its explicit annotation by the programmerimproves the legibility of the code and notion that the programmer has about where theconcurrency control is being effectively enforced.

To exemplify the inference of atomic types possible in our system, let us return tothe bank application and linked list examples, now after performing the respective typetransformations discussed. Here we illustrate the concept of primary and derived annota-tions.

35

3. FROM ATOMIC VARIABLES TO DATA-CENTRIC CONCURRENCY CONTROL 3.4. Final remarks

1 class Bank{2 Map <Integer , Account@> accounts;34 void transfer(int srcAccNumber , int dstAccNumber , int amount)5 {6 Account@ src = accounts.get(srcAccNumber) ;7 Account@ dst = accounts.get(dstAccNumber) ;8 src.tranfer(dst , amount);9 }

1011 (...)12 }1314 class Account{15 float m_balance ;1617 void transfer(Account@ dst , int amount) {18 dst.deposit(this.withdraw(amount));19 }2021 (...)22 }

Listing 3.4: Type mutation applied to the bank transfer example.

Both Listing 3.4 and Listing 3.5 represent, respectively, the state of both the bankprogram and the linked list examples after going through type mutation. An erroneousapplication of our annotation would generate type errors.

A simple exercise using the linked list illustrates the inherent centrality of our system.The starting point in this exercise is the linked list implementation without no annota-tions whatsoever. Then, we start by annotating the head of the list (head) at line 16. Afterattempting to compile, the compiler would then quickly return a type error in line 19 and31 due to the assignment of an atomic variable (AtomicNode) to the non-atomic variable(Node). The programmer would then correct these by annotating these two variables.This procedure would go on until the programmer eventually reached the correct stateof the code previously presented in Listing 3.2. As such, the head annotation representsour primary annotation, the one from which all other derived annotations can be inferred.

The example also showcases the application of the type conversion rules to the returnof method results required to maintain type coherency. The return type at line 8 had tobe converted from Node to AtomicNode because the returned variable is atomic. At line 41,the constructor had to be converted from Node to AtomicNode because it is now beingassigned to an atomic variable.

3.4 Final remarks

At the end of this chapter, we have defined a model for data-centric concurrency mecha-nism based on a single annotation. The following chapter will address the implementa-tion of this model on the Java programming language.

36


1 class Node {2 Node@ next ;34 void setNext(Node@ node) {5 next = node ;6 }78 Node@ getNext () {9 return next ;

10 }1112 (...)13 }1415 class LinkedList {16 Node@ head;1718 public boolean contains(int value) {19 Node@ next = head.getNext ();2021 int v;22 while ((v=next.getValue ()) < value) {23 next = next.getNext ();24 }2526 return (v == value);27 }2829 public boolean add(int value) {30 boolean result ;31 Node@ prev = head;32 Node@ next = head.getNext ();3334 int v;35 while ((v=next.getValue ()) < value) {36 prev = next ;37 next = prev.getNext ();38 }3940 result = (v != value) if (result) {41 Node@ node = new Node@(value ,next);42 prev.setNext(node);43 }4445 return result;46 }4748 (...)49 }

Listing 3.5: Type mutation applied to the implementation of a simply-linked list.

37


38

4Implementation

The proposed data-centric concurrency control model will be implemented through themapping of the higher-level @Atomic annotation to the lower-level concurrency controlsystem ResourceGroups [DP12]. This system permits the deadlock-free acquisition of re-sources, but requires their explicit aggregation in groups. Part of our work will attemptto delegate this reasoning into the compiler.

By itself, the objective of the compilation process is to generate Java bytecode withinvocations to the ResourceGroups API. Two approaches could be taken to achieve this:compile the program normally and then use a framework for bytecode manipulationand modify the required code, or carry out this modifications in the source languagedirectly. The latter was selected, due to the fact that the end result would be closelyrelated to an implementation on an actual compiler, meaning it could be integrated di-rectly into the workflow of programmers without reliance on external applications orframeworks. Another advantage of working at the source level is the ability to manip-ulate the type system, that is partially lost once the program is compiled. Furthermore,it would be possible to compile bad-typed programs. However, this approach was com-pletely undocumented in the literature, amassing inertia to the initial development. Theonly source of knowledge was the actual source code of the implementation of the chosencompiler. As a result, the documentation of this procedure along with the particularitiesof the compiler used will also represent a minor contributions of this thesis.

The compiler chosen for this purpose was OpenJDK81, since it represents the mostpopular open-source Java compiler that is actively developed. The eighth version of theJava language was chosen due to the added support for type annotations in polymorphictypes.

1Currently in testing phases, official release scheduled for 2014.

39

4. IMPLEMENTATION 4.1. A Brief Introduction to javac

The introduction of our concept is done by using the new plugin system in OpenJDK8,similar to annotation processing support in past versions of the Java language. The newplugin system allows the binding of user classes to a view of the abstract syntax tree(AST) in the compiler (JSR 1992 and 2693). However, this view is obfuscated by design.No modifications can be done and several properties are hidden, such as the symbolsattached to each AST node. In order to achieve this obfuscation, the classes providedto the programmer to explore the AST are mere facades of the actual internal compilerclasses used to hide their full functionalities.

Those limitations, albeit intended by the developers of the compiler, can be overcome,as partially documented in [EK08]. By casting those facade classes into their inner types,the compilers internal (modifiable) AST nodes and related helper classes are uncovered.The framework developed during this work to access the hidden functionalities of thejavac compiler is available as an open-source project at https://github.com/metabrain/openjdk8-javac-plugin-framework.

As such, this chapter will begin by introducing the selected javac compiler, its internalfunctioning and stages of compilation.

4.1 A Brief Introduction to javac

Before dwelling further inside the implementation, one must understand how the compi-lation in javac works, more specifically, the distinct stages of compilation and their prop-erties. But first, some core aspects and terms have to be defined. Namely the distinctionbetween the abstract syntax tree and the symbols.

4.1.1 Nodes and symbols

The abstract syntax tree represents, more or less faithfully, the syntactic constructs used.Each language construction has a node, with several distinct capabilities depending onthe type of node. But more importantly, most nodes have associated a symbol, and atype object, if applicable. However, during most of the compilation, this type object iseffectively uninitialized: its initialization only occurs during the late Analysis stage.

Regarding the symbols, they are analogous to the AST nodes, only poorer. Less in-formation is delegated onto these symbols, in part because the symbols represent thecompiled state of a node. More prominent AST nodes, such as class or method nodes,have rich symbols. This is justified by the fact that classes and methods are the mostrelevant structures in the actual bytecode. In specific, they contain the classes’ names,methods’ names, parameters and their types, field variables with their types and offseton the stack frame, relevant flags, and so on.

Another thought, and a reasonable one at that, would be that those symbols are ba-sically derived from their AST nodes, and as such, only AST nodes require modification.

2JSR 199: JavaTM Compiler API http://www.jcp.org/en/jsr/detail?id=1993JSR 269: Pluggable Annotation Processing API http://www.jcp.org/en/jsr/detail?id=269

40

https://github.com/metabrain/openjdk8-javac-plugin-framework

https://github.com/metabrain/openjdk8-javac-plugin-framework

http://www.jcp.org/en/jsr/detail?id=199



Figure 4.1: Compilation stages in the javac compiler.

But that is not strictly true. Some symbols already have information saved during theParsing. For example, the stack offsets for the fields in each class is calculated during thisphase. Therefore, if for example, we wanted to inject a field into a class, we would haveto modify those stack offsets manually. As such, a complex process is further complicateddue to the decentralization of responsibilities across both symbols and AST nodes.

The manipulation of symbols that represent already compiled classes is also limited.Those limitations are imposed due to the fact that they are the representatives of compiledbytecode. As such, it is not possible to modify the types of functions and their parametersin previously compiled code.

4.1.2 Compilation stages and considerations

The compilation process of the javac compiler, illustrated in Figure 4.1, is divided intofive main stages:

• Parse - source *.java files are parsed into the AST, and symbols from Java standardlibraries, external libraries and other *.class files from the classpath are loaded intothe compiler.

• Enter - each AST node is augmented with information regarding variable scopingand shadowing. Some symbols are created for some more prominent AST nodes.

• Annotation Processing - not a real compilation stage since the compiler does not doanything in this stage unless the user specifies an annotation processor. It is anartificial stage proposed by JSR 1754 and formalized by JSR 2695 integrated intojavac version 1.6. It allows the use of an annotation processor that can be used togenerate new source files and explore a reduced, obfuscated, view of the AST.

4JSR 175: A Metadata Facility for the JavaTM Programming Language http://www.jcp.org/en/jsr/detail?id=175

5JSR 269: Pluggable Annotation Processing API http://www.jcp.org/en/jsr/detail?id=269

41





• Analysis - one of the more important phases where more verifications take place.To be more specific, most type checking, polymorphic expansions, cyclic depen-dency verification and desugaring take place here. At the start of this phase, theAST nodes have it’s type calculated and saved. This information is used on theverifications performed in this stage.

• Generate - the last phase of compilation. Each AST corresponding to a class is com-piled into its respective class file. During this process, the AST nodes are consumed,leaving behind a skeleton of its former self, representing the information attainablefrom the class files. In other words, they become a symbol.

These stages do not take place in a strict order. The Annotation Processing takes placeafter Enter, after which the Parse phase is repeated for the newly generated files, if present.Afterwards, the Enter phase is again repeated on all files, now including the newly gen-erated ones, so the changes from the new files are propagated to older classes. This cyclegoes on until the annotation processor signals the compiler that it has finished.

The rest of the stages are fairly straightforward. The new plugin system allows usto attach our plugin at the start or end points of each stage of compilation. As such, adecision has to be made about where to perform our modifications. But alas, it is notan easy decision. The obvious answer would be the Analysis stage: our model requiresthe use of types, to have their information available would be useful. Since all expres-sions in this stage have a type calculated and associated by the compiler, even a smallmodification would require the re-calculation and re-assignment of the types of all theexpressions in the program, since they might have been affected by the change. How-ever, the modification of those types is not trivial and would add further complexity toour implementation. In the Enter stage this type information is only present in variabledeclarations instead of all expressions. This means that changing the type of a variablewill not require the recalculation of the expression’s types, simplifying the type trans-formation process. Additionally, we have to imperatively perform some tasks before theAnnotationProcessing stage, such as the identification of atomic types, forcing us to use theEnter stage for that particular part of our plugin. As such, choosing the Analysis stage forthe remaining parts would effectively force us to split the knowledge base between twodistinct compiler phases.

All things considered, we have decided to focus our changes in a single stage, theEnter stage. The more bare bones state of the AST allowed us to make, as well, bare bonesmodifications. This leaves the most complex stages to the later stages of compilation,such as type calculations. However, this means that at this point, no type informationgiven by the compiler available for all the expressions: only variable declarations andclasses have their type associated at this point. If the need to attain the type of, for exam-ple, an expression or method invocation, we have to calculate it ourselves, a non-trivialprocess. This represents the main drawback of making modifications during the Enterphase.

42


4.1.3 Plugin binding points

Our system will be composed of three main phases that have a strict order of dependencybetween them:

Atomic Type Identification collect information about all the variables and parametersannotated with @Atomic. In concrete, the extraction of its type. Can be performedon all compilation stages after the Parse stage;

Generation of Atomic Types produce the new class files of the respective atomic typespreviously identified. This phase can only be performed on the Annotation Process-ing stage of the compiler;

Remaining modifications composed of several phases detailed further ahead, but canbe part of the same compilation stage.

Both the atomic type identification and the remaining modifications can be made atalmost any stage of the compilation process, given that the parsing is already performed.However, the generation of atomic types can only be made at the Annotation Processing.In order to harbour this middle phase, the plugin has to be bound at three points:

• At the end of the first iteration of the Enter stage, atomic types are identified;

• In the following Annotation Processing stage, the respective atomic types’ classes aregenerated;

• At the end of the the second iteration of the Enter stage, the remaining modificationsare performed.

Using these binding points will allow us to achieve all our objectives, while remainingfocused on a single compilation stage, Enter.

4.1.4 Decomposing the implementation phases

Given a general overview of the javac compiler, we can present the fundamental compo-nents in which our implementation is divided. They will be referred to as phases, not tobe confused with javac compilation stages. In all, we can identify twelve distinct phases:

• Atomic Type Identification

1. IdentifyPhase

• Generation of Atomic Types

1. GeneratePhase (with RW and exclusive lock variants)

2. GenerateGlobalPartitionsFilePhase

43

4. IMPLEMENTATION 4.2. Atomic Type Identification and Integration

• Conversion to Atomic Types

1. ConvertTypePhase

2. ConvertAssignmentsPhase

3. ConvertMethodReturnTypePhase

4. ConvertMethodInvocationsTypePhase

• Partitioning

1. BlockifyPhase

2. PartitioningPhase

3. InjectPartitionsPhase

• Mapping into ResourceGroups

1. MappingPhase (with RW and exclusive lock variants)

2. ImportPhase

Each of these phases represents a self-contained set of processes, whose name is de-rived from the source file name that implements it. Additionally, all phases (excludingthe generation of atomic types) are implemented according to a visitor pattern [PJ98].The classes used for this purpose are both TreeScanner and TreeTranslator visitor helperclasses. The first is supplied as part of the compiler API, while the later is part of the de-veloped framework. The TreeTranslator extends the TreeScanner and takes care of theboilerplate code required to access the inner components of the compiler, allowing theprogrammer to access them freely. These phases will be detailed further ahead in thischapter along with their implementation strategy and possible limitations.

4.2 Atomic Type Identification and Integration

4.2.1 Identification

The very first phase of our plugin, IdentifyPhase, has a very simple objective: to iden-tify the atomic types necessary for the latter phases. In other words, it has to inspectall variable declarations and functions parameters in the AST and verify if the @Atomic

annotation is being used or not. If so, that variable is an atomic variable. The type of thatvariable, conveniently stored in a field of the AST node, is then added to a list that storesall the types of the atomic variables found.

Oddly, this procedure is fairly straightforward to perform, unlike the ones showcasedfurther ahead in this document. Both function parameters and variable declarations(whether local, fields, or static fields) are conveniently represented by a single AST nodetype, JCVariableDecl. The type associated to each of these nodes has been previously set

44


by the compiler, being available in the vartype field. The only detail here concerns arrays:the type returned will contain the brackets [] that have to be removed beforehand.

Unlike presented in the model, the annotation of polymorphic types is absent fromthis implementation. This decision was made on the ground that adding support for thepolymorphic atomic types falls beyond the contributions of this thesis. Our knowledgeof the compiler system was centred on the Enter stage, while polymorphic types are onlyexpanded during the Analysis compilation stage. Implementing this feature would dis-perse our knowledge between these two phases, slowing down our progress and beingforced to cut out other features to stay on schedule.

4.2.2 Creation

After having identified all atomic types, the annotation processing stage takes place. Asmall quirk, important to document here, is that the plugin system and the annotationprocessor are both, somehow, sandboxed by the compiler. Probably by launching anotherJVM and running the annotation processor there, but such could not be confirmed. Assuch, knowledge of the atomic types to be generated has to be transferred between theplugin system and the annotation processor through the use of a temporary file. Beforeterminating the Enter stage and proceeding to the Annotation Processing stage, the listholding all the atomic types found has to be serialized and written into the temporaryfile. After starting the annotation processor, the file is deserialized and the atomic typesrecovered.

The annotation processor, GeneratePhaseRW, is used to create the new atomic types.This creation, since it requires a new Java class source file, can only be done at the An-notation Processing stage. For each type, a new class will be created that symbolizes itsatomic type. Listing 4.1 illustrates the atomic class equivalent of the Node class from thelinked list benchmark. This example will be mentioned throughout this section to exhibitthe transformations described.

Two distinct alternatives existed for generating the classes for the atomic types. Ex-tending the base type class or cloning the behaviour of the base type class into the newatomic type class. Copying of the old class’ behaviour into the new class, without ex-tending it, would ensure that the implementation remain strictly faithful to our model.

However, the cloning of AST nodes containing the base class behaviour raises severalimplementation issues that would eventually remove us from our main objective. On theother hand, resorting to the extends clause implies the inability to create an atomic typeif the base class has been declared as final (such as String) or the primitive type classes(such as Integer). Since the system only works with objects, primitives (int, long, etc.)are also unqualifiable for atomic types.

Despite these limitations, the choice of extending the base class was made. Neverthe-less, we believe that with the deeper understanding of the compiler developed duringthe latter phases of this work, the excluded alternative can be regarded as a possibility

45


1 p u b l i c c lass AtomicNode extends Node implements RWResource {2 p r o t e c t e d f i n a l ReentrantReadWriteLock __lock = new

ReentrantReadWriteLock ();3 p r i v a t e f i n a l Lock __writeLock = __lock.writeLock ();4 p r i v a t e f i n a l Lock __readLock = __lock.readLock ();56 p u b l i c AtomicNode( i n t arg0) {7 super(arg0);8 }9

10 p u b l i c AtomicNode( i n t arg0 , AtomicNode arg1) {11 super(arg0 , arg1);12 }1314 // Implementation of the Resource Interface15 p u b l i c i n t compareTo(Resource o) {16 r e t u r n t h i s .hashCode () - o.hashCode ();17 }1819 p u b l i c void lock() {20 __writeLock.lock();21 }2223 p u b l i c void unlock () {24 t r y { __writeLock.unlock (); }25 catch (Exception e) {}26 }2728 p u b l i c boolean tryLock () {29 r e t u r n __writeLock.tryLock ();30 }3132 p u b l i c void lockRead () {33 __readLock.lock();34 }3536 p u b l i c void unlockRead () {37 t r y { __readLock.unlock (); }38 catch (Exception e) {}39 }4041 p u b l i c boolean tryLockRead () {42 r e t u r n __readLock.tryLock ();43 }44 }

Listing 4.1: Example of an atomic type class generated for the Node class of the linkedlist benchmark.

46


1 p u b l i c c lass GlobalPartitionsHolderClass {23 p u b l i c GlobalPartitionsHolderClass () {4 super ();5 }67 //Where partitions will be inserted in the future8 }

Listing 4.2: Empty GlobalPartitionsHolderClass created to hold global partitions.

for future work.The main side-effect of our choice is that the extension of the base class allows the as-

signment of an atomic type to its non-atomic equivalent due to inheritance, i.e. the atomictype is a subtype of the non-atomic type, a behaviour that goes against our model. In anattempt to eliminate this behaviour, our system detects the occurrence of these assign-ments. A type rule was artificially injected in order to strongly enforce the propagationof atomic types. Scenarios where the assignment of an atomic variable to a non-atomicvariable are verified, a custom type error issued by the compiler. This allows us to par-tially avoid breaking the premise presented in Definition 5. However, this rule can stillbe broken if the programmer uses a supertype of the atomic type to convert it to a non-atomic type. For this reason, figuring how to implement the atomic types flawlesslyshould be the priority in future work.

Aside from extending the base class, the new atomic type class also has to supporta lock-based interface (Resource), that allows the ResourceGroup API to interact with anatomic resource. Thus, we add two objects to the new class representing a read lock andwrite lock (lines 3 and 4). Then, the respective methods required from the ResourceGroupAPI are injected and implemented using delegation to the aforementioned two lock ob-jects (lines 15, 19, 23, 28, 32, 36, and 41).

Next, we need to inject into the new class the constructors from the base class andimplement them through delegation (lines 6 and 10). For this, we find the symbol con-taining the base class and extract all the constructors in it. This is done by using theJavacElements helper class in the compiler.

The package of the new atomic type class will be the same as the base class. Thename of the new atomic class is composed of the old name prefixed with a commonprefix attributed to atomic types, Atomic.

4.2.3 Creation of a class for global partitions

After all atomic type classes have been generated, the process proceeds to a rather simplephase entitled GenerateGlobalPartitionsFilePhase. A single, empty, placeholder class,illustrated in Listing 4.2, is generated to store the global partitions required in the future.This allows us to keep all the global partitions in a single class, ready for importing to theclasses that require it.

47

4. IMPLEMENTATION 4.3. Converting Annotated Variables into Atomic Variables

4.3 Converting Annotated Variables into Atomic Variables

This section will encompass several phases whose final objective is to convert the type ofatomically annotated variables into the respective atomic type, and to perform the nec-essary program transformations in order to re-establish the type coherency eventuallydisrupted by the preceding conversion. Additionally, the equations mentioned through-out this section are illustrated in Figure 4.2. In it, T represents a type, x a variable, φour transformation function, f a function, e an expression, and i a statement. e and i

represent a sequence of expressions and statements, respectively.

φ(@Atomic T x)⇒ AtomicT x (4.1)φ(@Atomic T[ ] x)⇒ AtomicT[ ] x (4.2)

φ(new T[e])⇒ new AtomicT[φ(e)] if assigned to a variable of type AtomicT (4.3)φ(new T(e))⇒ new AtomicT(φ(e)) if assigned to a variable of type AtomicT (4.4)φ(T f(x) {ι})⇒ AtomicT f(φ(x)) {φ(ι)} if returnType(ι) is AtomicT (4.5)

φ(α)⇒ α (4.6)

Figure 4.2: Type transformations performed.

4.3.1 Converting to atomic types

The initial type conversion, ConvertTypePhase, is fairly straightforward: convert typeof the variables and function parameters annotated with @Atomic into its correspondentatomic type (Equation (4.1)). As such, for each annotated variable declaration found, wereplace the vartype field of each JCVariableDecl with the new correspondent atomic type.Respectively, array types also have their types converted if annotated (Equation (4.2)).

In the linked list example illustrated in Listing 4.3, it is possible to observe the conver-sion of types that occurred in the root (line 2) and both local atomic variables prev andnext (line 6 and 7, respectively).

4.3.2 Fixing atomic type constructors

At this point, while program has the atomic variables with the correct type, its assign-ments are incorrect. In assignments, before we had a non-atomic type variable on theleft-hand side and a non-atomic type variable or constructor on the right-hand side.However, now it is possible to have an atomic type variable on the left-hand side andthe old non-atomic type on the right-side. Fixing this right-side of the assignments willbe the objective of the following phase, ConvertAssignmentsPhase.

It starts by traversing the AST, similarly to other phases, and checking all assign-ments. Three scenarios can be identified in the assignments, since each scenario requiresa slightly different approach, but they have the same foundations. First, we calculate

48


1 c lass LinkedList {2 AtomicNode root;34 boolean contain( i n t value) {5 boolean result;6 AtomicNode prev = root;7 AtomicNode next = prev.getNext ();8 i n t v;9 whi le ((v=next.getValue ()) < value) {

10 prev = next;11 next = prev.getNext ();12 }13 result = (v == value);14 r e t u r n result;15 }16 }

Listing 4.3: Conversion of atomic types applied to the linked list example.

the types of the left and right-hand side expressions. Second, if the left-hand side cor-responds to an atomic type and the right-hand side to a non-atomic type, we are in thepresence of a type incoherency introduced by the previous stages. In the other cases, wedo not do anything because it is not our responsibility. Third, we check if the right-handside’s type is the base type of the atomic type represented in the left-hand side. If so, wehave to convert the right-hand side to match its atomic correspondent. Therefore, and foreach scenario:

A variable is being assigned a new object (new class constructor) the constructor mustbe converted into the correspondent atomic type (Equation (4.4)).

An array position is being assigned a new object again, the constructor must be con-verted into the correspondent atomic type. (Equation (4.4))

An array is being assigned a new array the constructor of the array must be transformedinto the array of the correspondent atomic type. (Equation (4.3))

4.3.3 Fixing method return types

The next phase, ConvertMethodReturnTypePhase fixes another kind of incoherency.Namely, methods that have their return type as a base type when in fact, the returnedvariable is an atomic type. As such, we need to convert the return type of that methodinto the correspondent atomic type.

During the traversal of the AST, for each method found, the type of the variable re-turned is calculated. If it is an atomic type and the return type of the method representsits base type, conversion is required. The conversion is performed in-place, by replacingthe restype class identifier (that represents the type returned) of the JCMethodDecl ASTnode (that represents the method in question) with the new identifier that represents theatomic type.

Again, in the example illustrated in Listing 4.4, the getNext method has its return type

49


1 p u b l i c c lass Node {2 p r i v a t e AtomicNode m_next;34 (...)56 p u b l i c AtomicNode getNext () {7 r e t u r n m_next;8 }9 }

Listing 4.4: Conversion of the return type of the getNext method of the Node class fromthe linked list example.

1 p u b l i c void meth() {2 @Atomic T var = new T(...);3 go(var);4 }56 p u b l i c void go(@Atomic T param) {7 ...8 }

Listing 4.5: Example of the superfluous variable declaration avoided by theConvertMethodInvocationsTypePhase phase.

converted from Node to AtomicNode (line 7). After the type conversion, the expressionbeing returned no longer represents a Node, but rather its atomic counterpart.

4.3.4 Inferring the single use of atomic type constructors as arguments

The following and final phase shares a distinct motivation. Rather than ruled by the strictenforcement of the model and the type coherency in the program, it is guided by morepractical reasons. Its purpose is to avoid a distinct variable declaration for all objectscreated even if they are only used once, in a method invocation. Take the example illus-trated in Listing 4.5. The normal use of the method invocation would be go(new T(...)),because it is only referenced once and it is only used for the method invocation. But suchwould not be allowed, since the constructor is not on the right-hand side of an explicitassignment, so it would not be converted into its atomic type equivalent. The scenariopreviously described in Section 4.3.2 represents the same transformations, albeit in anexplicit setting. Here, the transformations are implicit since the newly created objects arenot part of direct assignments, but used as arguments to the target function.

This would effectively force the programmer to declare the variable. This phase doesprecisely that. Fix all method invocations in the AST where a constructor of a base typeis used as an argument and the respective parameter is the atomic type.

Note that, from this phase onward, the type of expressions might have to be calcu-lated. However, calculating the type of an expression is not as simple as the variablecase, since it must be derived from the expression’s components, which in turn may

50


1 p u b l i c c lass Node {2 p r i v a t e AtomicNode m_next;34 (...)56 p u b l i c boolean add( i n t value) {7 (...)89 i f (result) {

10 prev.setNext(new Node(value ,next));11 }1213 r e t u r n result;14 }1516 p u b l i c AtomicGENNode setNext(AtomicNode next) {17 m_next = next;18 }

Listing 4.6: Conversion of the constructor type used in the setNext method of thelinked list example.

themselves be expressions. This process is currently very complex, and not yet perfect.There are many corner cases and distinct scenarios that can occur during the calculationof the type of an expression that are not documented. Discussion about the strategy andlimitations of the type inference system developed would deserve a section on its own.However, due to not being directly correlated with the implementation of the plugin, itwill not be further discussed. While the type inference system developed worked as faras it was tested, it is surely the weakest link in this implementation and a priority forfuture work.

Having said that, the first step of this phase, ConvertMethodInvocationsTypePhase,is to calculate the type of constructor used as an argument and the type of the expres-sion over which this function is invoked. The type of the expression that precedes thefunction invocation is needed so we can pinpoint to which class the function declarationbelongs to. Accordingly, the class that owns the declaration of that function is attainedso the type of the correspondent parameter of that function can be retrieved. Now, in thepresence of a constructor of a non-atomic type being used as an argument to a functioninvocation whose respective parameter is the atomic type correspondent, the constructoris transformed into its respective atomic type.

This phase effectively avoids the isolated declaration of an atomic local variable (withconstructor) just to pass it posteriorly into a method invocation. This is illustrated inListing 4.6, where the previous.setNext(node) method can be invoked without declaringexplicitly the new atomic node beforehand (line 10).

51

4. IMPLEMENTATION 4.4. Deadlock-free Lock Inference

4.3.5 Homogenization of the abstract syntax tree

One of the big problems brought by the variation allowed in the AST is the various com-binations of nodes possible to reach the same end result. The biggest proponent of thisaspect are blocks, whose AST node type is JCBlock. They delimit a sequence of instruc-tions and are used mostly in methods, classes, and control flow statements. However, inmany cases, such as in the use of control flow statements, they are not mandatory. Assuch, it is possible to use a for loop with only one statement following it without the us-ing a block. This statement is not an JCBlock, but a JCExpressionStatement. This bringstwo types of problems. First, additional complexity is required when iterating over thesyntax tree. To find the body of control flow statements we have to check both scenarios:either a solo JCExpressionStatement without a block or a JCBlock that contains insidemultiple statements. Second, any modification of the AST tree that requires adding addi-tional statements would have to firstly create the block if missing, adding unneeded andunrelated complexity to the algorithm.

This phase, BlockifyPhase, effectively erases both type of problems by replacing allpotentially missing blocks with blocks. It checks for missing bodies in most control flowstatements. Both if-else statements, foreach and regular for loops are blockified. Itsimplementation is rather simple. If the expected control flow statement’s body is not aJCBlock but a sole JCExpressionStatement, it creates a new JCBlock, copies the previousJCExpressionStatement to the new block and unwraps the JCExpression contained in theJCExpressionStament. Finally, the new block is assigned to the control flow statementsAST node.

4.4 Deadlock-free Lock Inference

In this section, we detail the lock inference performed that guarantees deadlock-freedomin our system. This lock inference diverges from previous approaches in most code-centric state of the art because those works have to infer which locks to associate witheach memory position. On the other hand, in our approach, we already have the locksassociated with each resource accessible through an atomic variable. In order to guaran-tee deadlock-freedom, we use the partitioner, a tool introduced further ahead that detectdeadlock-prone situations in a program, allowing us to prevent them from occurring.At this stage, the program is already deadlock-free. However, a considerable amount oflocks in the current program might not be required, introducing unnecessary overhead.As such, two additional steps are performed to reduce the number of locks that have tobe acquired at runtime, improving its performance. This optimization phase entails twosteps: the elimination of dominated and redundant locks.

52


4.4.1 Partitioning

The partitioner, developed as part of the previous ResourceGroups project [PD12] devel-oped by Professor Hervé Paulino and student Nuno Delgado, is an external tool thatrequires knowledge of the AST of the program to detect which resources acquisitionscan lead to cyclic dependencies, and hence, a deadlock. In specific, the classes presentin the program and their respective methods, the declaration of atomic variables, wherean atomic variable is referenced, and finally, the dependencies between methods are re-quired by this tool. The later is needed so the partitioner can build a call-graph of the pro-gram. As such, the goal of this step is to traverse the AST and transmit that informationabout the AST to the partitioner, through the use of the partitioner’s API, illustrated atTable 4.1. This API features methods oriented to the visitor pattern. The visitor only hasto inform the partitioner of its current position (using either the newClass or newMethod

methods) for the partitioner to progressively build an internal representation of the ASTas the visitor traverses the tree, while informing the partitioner about other relevant treenodes.

void newClass(String clazz)void addPublicField(String resourceIdentifier)void addPrivateField(String resourceIdentifier)void newPrivateMethod(String methodod, int arity)void newAcquisition()void newMethod(String methodod)void addPublicResourceType(String type)void addPrivateResourceType(String type)void addLocalResource(String resourceIdentifier)void addMethodDependency(String clazz, String method, int arity)PartitionList computeLocalPartitions(String clazz)PartitionList computeGlobalPartition()

Table 4.1: API provided by the partitioner.

When the declaration of an atomic variables is found by the visitor, the correspond-ing method must be invoked. For class fields, the addPublicField or addPrivateField

methods is used, depending if the field in question is, respectively, public or private tothe current class. For local atomic variables, addLocalResource is used if declared on thespot (using new constructor), otherwise6 using addPrivateResourceType with its atomictype. Given that, at compilation time, it is not always possible to infer to which resourcean atomic variable will point to, the minimum known denominator must be used, whichis its type. This last case occurs throughout the linked list example, such as in @Atomic

var = previous.getNext(), where addPrivateResourceType is used with the AtomicNode

type. For atomic parameters, addPublicResourceType method is also used with the typeof the parameter. Finally, for each expression encountered with a method invocation,

6Namely, if it derives from the invocation of a method that returns an atomic type or is an assignment toa previously declared local variable.

53


addMethodDependency) indicating the method invoked and its respective class. In the end,the partitioner is able to use the information collected to build a simpler image of the ASTcontaining all the information required to fulfil its purpose.

The internal representation of the program built by the partitioner is composed byfour distinct components: DependencyChainSets, DependencyChains, Groups and finallyLocks. The program evaluated is composed by as many DependencyChainSets as the num-ber of classes present in it. As such, a DependencyChainSet correspond to a class andencompasses all the DependencyChains in it. In turn, each DependencyChain representsa possible path of execution of a method in the current class, composed by a list of Groupswith a specific order representing the order of group acquisitions in chained method in-vocations. A single Group represents a set of Locks which must be acquired atomically.Finally, a Lock represents an atomic variable in the source code.

A set of DependencyChain may be interpreted as a graph, whose vertex set representsall group acquisitions and the edge set is given by the union of the Cartesian productsof every pair of successive group acquisitions. In the example of the linked list imple-mentation illustrated in Listing A.2, the graph representing the DependencyChains of themethods of class Node would be:

root Node

In this graph, strongly connected components (SCC) correspond to cyclic dependen-cies between resource acquisitions, symbolizing deadlock-prone situations.

The partitioner then applies Tarjan’s algorithm [APT79] to identify the SCCs. Each ofthe identified SCCs is associated with a partition, an artificial resource created by the par-titioner that will force the serialization of the methods that incur the SCCs. This preventsthe acquisition of resources in unordered fashion by multiple threads and consequently,prevents deadlocks. Again, in the case of the linked list example illustrated in Listing A.2,a partition would be associated with the three main methods: add, remove and contains,since they access a type included in the partition, AtomicNode. Effectively, this partitionforces the serialization of the execution of those methods, as illustrated in Figure 4.3.

The partitions identified can be part of two distinct scopes: local and global partitions.Local partitions are local to each object of a class since the SCC associated only includesresources local to that class. Global partitions, on the other hand, are shared across all ob-jects of the program because the SCC associated includes resources from multiple classes.

The end result given by the partitioner to our plugin at this point is the set of locks thatshould be acquired by each method. However, this preliminary result will be trimmeddown by the lock coalescing performed. These were developed in the context of ourwork.

54


4.4.2 Lock coalescing

Firstly, in order to eliminate dominated locks, a relation of dominance between locks hasto be established. While this concept is similar to the one presented in [HFP06], the latterdoes not have the notion of groups, only of standalone locks.

Definition 6. (Dominance) A lock l1 is said to dominate lock l2 when all occurrences of l2 areeither preceded by l1 or acquired at the same time (i.e. occur in the same Group) as l1. By analogy,l2 is said to be dominated by l1.

As such, when a lock l dominates lock l′, the acquisition of l′ is not required due toit being contained by the previous acquisition of lock l. Thus, a dominated lock can besafely removed, eliminating the overhead associated with its (superfluous) acquisition.However, since resources can be aliased to other resources, problems regarding aliasingof locks could arise. To avoid these, we can only eliminate a lock if its dominant lockis deemed concrete. A concrete lock is a lock from a variable whose pointer is neveraltered by another thread during execution. Furthermore, since initially the acquisitionof dominated locks is guaranteed to be preceded by the acquisition of its dominant lock,the elimination of dominated locks will not result in a loss of parallelism.

In order to achieve maximum performance, our work will eliminate all dominatedlocks. The implementation of the algorithm that performs this task, illustrated in List-ing 4.7, is fairly straightforward. First, we attain the ordered list of all locks in theprogram supplied (allChains) using the method flatMapLocks(allChains) (line 4). Thelocks should be ordered according to a deterministic criterion. Our implementation usesthe lexicographic order of the locks. Then, for all pairs (l1, l2) of distinct locks l1 and l2,we check if l1 dominates l2 using method dominates(l1,l2,allChains) (line 12). Thismethod returns true when lock l1 dominates lock l2, i.e. when l1 precedes or occurs inthe same group as l2 in all chains where l2 occurs. If so, and if l2 represents a concrete

Figure 4.3: Order of resource acquisition after the attribution of a partition to the linkedlist example.

55


lock, we eliminate all occurrences of l2, the dominated lock, from the current program(line 15). Accordingly, l2 is also removed from the list of all locks. When an eliminationis performed, the algorithm restarts (line 19). This process is repeated until the algo-rithm converges, when an iteration of the algorithm does not eliminate any lock, i.e. nodominated locks remain in the program.

To exemplify the lock inference process, let us focus on the hash set benchmark, il-lustrated in Listing A.1, with just the add, findIndex and rehash methods for simplicity.In the partitioner, three chains exist for the three respective methods, as illustrated inFigure 4.4.

Figure 4.4: Graph representation of dependency chains in a simplified hash set containingonly the add, findIndex and rehash methods.

The add method invokes two methods, first findIndex then rehash. In the add methodchain, the first group belongs to the add method itself, since it accesses both the set andthe states atomic variables. The second group belongs to the findIndex invocation andthe third to the rehash invocation. By itself, the rehash method accesses five atomicvariables: set, states, free, size, and max. The findIndex method chain only accessestwo atomic variables, set, and states.

Before the start of the algorithm, we have the following chains:

Chain add: -> (set,states)->(set,states)->(free,max,set,size,states)

Chain findIndex: -> (set,states)

Chain rehash: -> (free,max,set,size,states)

The algorithm then starts testing all possible pairs of distinct locks in the program,and checking if one dominates the other. In our implementation, the first pair detected inwhich a lock dominates the other is (free,max). In this case, free occurs in two chains:chain add and chain rehash. In both chains, free occurs in the same group as max. Sincefree either precedes or occurs in the same group in the chains that contain max, freedominates max. We proceed with the elimination of the dominated lock, max.

56


12 p u b l i c s t a t i c void lockCoalescing(Set <DependencyChainSet > allChains) {3 //get all locks of the program4 List <Lock > allLocksInProgram = flatMapLocks(allChains) ;5 // remove one lock at a time until it converges6 repeat:whi le ( t r u e ) {7 //for each pair of distinct locks8 f o r (Lock l1 : allLocksInProgram) {9 f o r (Lock l2 : allLocksInProgram) {

10 i f (!l1.equals(l2) && l1.isConcrete ()) {11 //if l1 dominates all ocurrences of l212 i f (dominates(l1,l2, allChains)) {13 // remove dominated lock l2 from the program14 Lock dominated = l2 ;15 removeAllFromAllChains(dominated ,allChains);16 //...and from our list of locks17 allLocksInProgram.remove(dominated) ;18 //finally , repeat algorithm since we eliminated a lock19 cont inue repeat;20 }21 }22 }23 }24 r e t u r n ; //no removal done , converged25 }26 }2728 p r i v a t e s t a t i c boolean dominates(Lock l1, Lock l2, Set <DependencyChainSet >

allChains) {29 f o r (DependencyChainSet css : allChains) {30 f o r (DependencyChain c : css) {31 i f (c.containsLock(l2)) {32 f o r (Group g : c) {33 i f (g.contains(l1))34 break ; // precedes OR in same group in this chain35 i f (g.contains(l2))36 r e t u r n f a l s e ; //l2 found before l1, so l1 doesnt precede37 }38 }39 }40 }41 //lock l1 precedes (or occurs in same group) as lock l2 in our program42 r e t u r n t r u e ;43 }4445 c lass DependencyChainSet extends Set <DependencyChain > {...}46 c lass DependencyChain extends Set <Group > {...}47 c lass Group extends Set <Lock > {...}48 c lass Lock {...}

Listing 4.7: Lock inference algorithm.

57


Chain add: -> (set,states)->(set,states)->(free,set,size,states)


Chain rehash: -> (free,set,size,states)

In the next iteration, the first pair where a lock dominates the other is the (free,size).Since free and size always occur in the same group together, free dominates size. Thesame could be said for the pair (size,free), where size also dominates free. Thus,regarding the correction of the algorithm, free could indeed be eliminated instead ofsize. However, following the ordering of pairs tested in our algorithm, the (free,size)

is tested before the (size,free). As such, size is eliminated.

Chain add: -> (set,states)->(set,states)->(free,set,states)


Chain rehash: -> (free,set,states)

In the next iteration, the pair found is (set,free). Here, set dominates free. Accord-ingly, free is removed.

Chain add: -> (set,states)->(set,states)->(set,states)


Chain rehash: -> (set,states)

In the next iteration, only two locks are left: set and states. The first pair to bechecked is (set,states). Since they always occur in the same group, set dominatesstates. Accordingly, the lock states is eliminated from all the chains where it occurs.

Chain add: -> (set)->(set)->(set)

Chain findIndex: -> (set)

Chain rehash: -> (set)

The algorithm then converges because in the next iteration no lock is dominated byanother lock. No lock can further be removed and the algorithm terminates.

When a partition is introduced, such as the linked list example illustrated in Fig-ure 4.3, the algorithm behaves the same way. In this example, all three methods (add,remove, and contains) share similar chains:

Chain add: -> (partition)->(root)->(AtomicNode)->(AtomicNode)

Chain remove: -> (partition)->(root)->(AtomicNode)->(AtomicNode)

Chain contains: -> (partition)->(root)->(AtomicNode)->(AtomicNode)

Here, the resources represented by AtomicNode cannot be acquired at the start of themethod, when the partition and root are acquired. As such, AtomicNode is part of adistinct group.

In the first iteration, the pair (partition,AtomicNode) is tested. Here, since partition

always precedes AtomicNode in the chains where AtomicNode occurs, partition domi-nates AtomicNode. As such, the locks AtomicNode are eliminated.

58

4. IMPLEMENTATION 4.5. Mapping onto the ResourceGroups API

Chain add: -> (partition)->(root)->()->()

Chain remove: -> (partition)->(root)->()->()

Chain contains: -> (partition)->(root)->()->()

In the second iteration, the pair (partition,root) is tested. Again, since partition

always precedes root, partition dominates root. Accordingly, root is eliminated fromthe program.

Chain add: -> (partition)->()->()->()

Chain remove: -> (partition)->()->()->()

Chain contains: -> (partition)->()->()->()

In the final iteration, no dominated locks are found. Thereby, the algorithm termi-nates.

4.4.3 Lock removal

Afterwards, the elimination of redundant locks takes place. The main objective of thisstep is improving performance by eliminating the acquisition of locks that, through acall-graph analysis, are guaranteed to be already previously acquired. For each method,we check its callers.

Given a method m and a lock l, if all the callers of m acquire l then the acquisition of lin m is omitted. A literal implementation of this algorithm would only eliminate locksacquired in consecutive methods. In order to bubble down the locks previously removed,when checking the presence of lock in the callers of a method, we also check if that calleralready had that lock removed. If it did, it was already acquired by its callees. Thus,we can also remove it from the present method. This procedure is then repeated over allmethods until all removals are bubbled down.

In the previous hash set example, after the elimination of dominated locks, some re-dundant locks are still present. Both findIndex and rehash methods are only invokedfrom within the add method. As such, the set lock acquisition in both findIndex andrehash are redundant since it is guaranteed to be previously acquired in the add method.As such, the locks that are acquired by the findIndex and rehash methods can be elimi-nated from the chains. In the end, only the set lock has to be acquired in the add method.

Chain add: -> (set)->()->()

Chain findIndex: -> ()

Chain rehash: -> ()

4.5 Mapping onto the ResourceGroups API

This section will detail the phase that will acquire the resources identified by the parti-tioner using the ResourceGroups API in each method. Before that, another simplificationof the AST to ease this phase will be presented.

59


1 p u b l i c boolean contains( i n t key) {2 AtomicRBNode node = getRoot ();3 whi le (node != sentinelNode) {4 i f (key == node.getValue ()) {5 boolean returnExpression = t r u e ;6 r e t u r n returnExpression;7 } e l s e {8 i f (key < node.getValue ()) {9 node = node.getLeft ();

10 } e l s e {11 node = node.getRight ();12 }13 }14 }15 boolean returnExpression = f a l s e ;16 r e t u r n returnExpression;17 }

Listing 4.8: Example of the return expression isolation in the contain method of theRBTree benchmark.

4.5.1 Isolating return expressions

One of the fundamental conditions present in our model is that a function releases all thelocks it has acquired. However, in situations where the return expression still accessesatomic variables, release is not possible since the return marks the last statement exe-cuted in the method. As such, one has to isolate those expressions that are returned intoa new variable, that is in turn returned. This job is done by the IsolateReturnExpression-sPhase. At this point, and due to the previous blockification of the AST, all control flowstatements are guaranteed to use a JCBlock, representing the brackets at syntax level.Since all return statements are guaranteed to be inside a JCBlock, searching all blockswill uncover all return statements in the AST. For each return statement found, if it doesnot return a single variable, the later is relocated into a new assignment that is injectedbefore the return statement. Next, the expression in the return statement is replaced bythe reference to the newly created variable.

This transformation preserves the behaviour of the method, while enabling the sub-sequent phases to freely insert statements before any return, without worrying about thepresence of atomic variable references in the return expression, allowing the release ofthe resources acquired in that method before the return statement. The final result is ex-emplified in Listing 4.8, where both return statements in the if block and in the methodbody got isolated to allow the inject of additional statements between them.

4.5.2 Injecting partitions

While the partitions are already defined by the partitioner, they only represent artificialresources that only exist inside the partitioner. Thus, they have to be inserted in the actualprogram. This task is performed by the InjectPartitionsPhase. The local partitions have

60


1 p u b l i c c lass GlobalPartitionsHolderClass {23 p u b l i c GlobalPartitionsHolderClass () {4 super ();5 }67 p u b l i c s t a t i c Partition GLOBAL_partition_00 = new Partition ();8 p u b l i c s t a t i c Partition GLOBAL_partition_01 = new Partition ();9 }

Listing 4.9: GlobalPartitionsHolderClass with two global partition injected.

1 p u b l i c c lass IntSetLinkedList implements IntSet {23 p r i v a t e AtomicNode m_first = new AtomicNode (0);4 p r i v a t e Partition LOCAL_partition_00 = new Partition ();56 (...)7 }

Listing 4.10: LinkedList class with a local partition injected.

to be inserted in the scope of the selected classes and global partitions will have to be in-serted in the special class created to store the global partitions, GlobalPartitionsHolderClass(Section 4.2.3).

For it, we need to inject new fields inside those classes that represent the partition.However, adding a new field to a pre-existing class in the AST is not trivial. One wouldthink that adding the respective variable declaration inside the AST of the class wouldbe sufficient. The code will compile, and the field will be apparently generated. But atruntime, the JVM would crash. The reason for this is that the offsets from the fields in aclass are calculated after the Parse phase. Injecting a variable declaration in the class ASTwould not automatically update the offsets, leading to corrupted bytecode. Those offsetshave to be updated manually after the injection of a new field (a variable declaration) inclasses.7

The variable declarations are created to hold an object of type Partition. This objectinherits from Resource, and as such, supports lock based operations similar to the atomictype objects created. Illustrated in Listing 4.9 is the partition holder class with two globalpartitions injected and in Listing 4.10 is an example of a class that had a local partitionallocated by the partitioner and thus, injected into it. These partitions will be used by thenext phases, as explained in the this section.

7This search for the correct injection of new fields in pre-existing classes in a javac plugin lead to thecreation of a small proof-of-concept. A simple javac plugin that offers support for persistent local variables.https://github.com/metabrain/java8-plugin-persitent-local-vars.

61

https://github.com/metabrain/java8-plugin-persitent-local-vars


4.5.3 Mapping

This phase, the MappingPhase, is where all concurrency control will be effectively en-forced after the preparation performed in previous phases. Firstly, it is important to statetwo variants of this phase exist: one that makes use of exclusive locks and another thatuses read-write locks. Only the read-write implementation will be detailed due to it be-ing more interesting, complex and, as explained ahead in Section 5.2, allowing a strictlybetter performance.

All the reasoning in this phase is done at method-level, since it represents our basicunit of work. Due to the complexity of this phase, it will be further split into three distinctparts: attaining the dominant locks from the partitioner, locating the AST nodes repre-senting locks and finally the actual mapping. The mapper itself, in turn, is also composedof two distinct modes: single lock mapper or multi-lock mapper.

4.5.3.1 Filtering resources to acquire from the atomic variables

At this point, a method might have several types of atomic variables. They can be classfields (partition or a normal variable), method parameters or local variables. As such, thefirst step is to query the partitioner about what locks should be acquired in this method.This is done by invoking the m.getDominators() method in the partitioner, where m rep-resents the current method, from which we obtain a list of the dominant locks. Aftercollecting in a list all AST nodes of the atomic variables referenced in the current method,the local partitions in the current class and all the global partitions, we remove those thatare not present in the list of dominant locks for this method. Now that we have the ASTnodes of the dominant locks, we have to choose which mapper to execute.

4.5.3.2 Multi-resource Mapper

The functioning of the mapper (read-write lock version) will be explained in parallel withthe example illustrated in Listing 4.11. The example represents the hash set benchmarkused in Section 5.2. For this example, the only resource that must be acquired accordingto the compiler is set.

The first step is inject the creation of the ResourceGroup object at the beginning of themethod (line 2). This object will be used according to the ResourceGroups API, illustratedin Table 2.1, in order to add or remove resources, lock or unlock the group, throughout themethod. Following, if a partition is part of the set of dominant locks to acquire, we set asthe current partition by injecting, immediately after the declaration of the ResourceGroup,a statement with rg.setPartition(p), where p represents the partition variable.

Next, the acquisition of the group should be made before the first access to an atomicvariable. This means that scope of atomicity in a function has an upper boundary de-limited by the first statement that accesses an atomic variable. In our example, the firstaccess occurs at line 5. Accordingly, the lock of the resource group is injected at line 4,before the first access.

62


1 p r i v a t e i n t index( i n t value) {2 RWResourceGroup rg = new RWResourceGroup ();3 rg.addRead(set);4 rg.lock();5 i n t length = states.value.length;6 i n t hash = (value * 31) & 2147483647;7 i n t index = hash % length;8 i n t probe;9 i f (states.value[index] != FREE && (states.value[index] == REMOVED ||

set.value[index] != value)) {10 probe = 1 + (hash % (length - 2));11 do {12 index -= probe;13 i f (index < 0) {14 index += length;15 }16 }17 whi le (states.value[index] != FREE && (states.value[index] ==

REMOVED || set.value[index] != value));18 }19 i n t returnExpression = states.value[index] == FREE ? -1 : index;20 rg.unlock ();21 r e t u r n returnExpression;22 }

Listing 4.11: Final result of the multi-resource mapper for the index method of the hashset benchmark.

The following steps entail the initial addition of the atomic variables used to theRWResourceGroup object. In this situation we can identify two groups of atomic vari-ables: those that are known at the beginning of the method and those that are not. Thefirst group contains both class fields and parameters while the second represents localvariables in the method.

Accordingly, the additions of the atomic variables that are known at the start are alsoacquired at the start, immediately after the creation of the ResourceGroups object. In ourexample, set has to be acquired as previously stated, and as such, is acquired at thebeginning of the method (line 3).

On the other hand, the variables that are not known at the start can only be added tothe group when they are declared. As such, their addition to the group is only injectedafter their initial declaration.

The aliasing of atomic variables is also extremely relevant for the mapper. Each timean atomic variable is aliased, the resource referenced is potentially changed, requiringan acquisition. As such, the mapper injects the acquisition of the new resource in thestatement following its aliasing. Basically, the mapper treats an alias of an atomic variableas the declaration of a local variable.

The read-write lock version of the mapper also allows the use of read resources, asobserved in line 3 of the hash set example. However, its application must respect somerules.

63


Since the Java read-write locks used, ReentrantReadWriteLock, do not support pro-motion, we cannot add a resource as read if a nested function will perform a write accessupon that same resource.

To discover if an atomic variable can be acquired in read mode, two conditions mustbe verified. First, the atomic variable is used solely in read accesses in the context of thecurrent method.

Here, our definition of read access is that, basically, the variable is only used as a ref-erence. This entails no aliasing and no invocations of methods on the referenced object.In the case of atomic variable arrays, accessing a position of an array is considered a readaccess, but creating or assigning another array is classified as a write access. Second, themethods invocable from the current method and their respective descendent also onlyuse the atomic variable, if present, in read accesses. If any of these conditions is broken,the system defaults to acquire the variable in write mode.

The inference of read and write accesses might be further relaxed, but to do so itwould be necessary to perform a deeper analysis over the whole call-graph. Due to timeconstrains, we think that the current read/write set detection was sufficient to demon-strate the application of readers-writer locks to our system.

At last, all resources acquired throughout the method have to be released before itterminates, or else resources would remain eternally acquired in the system. Therefore,the rg.unlock() statement has to be injected at the end of all possible exit points of thefunction. For this, the mapper traverses the AST of the method, exploring all possiblepaths of execution and injecting this statement before its end. The end of the paths ofexecution are represented by either return statements, or in its absence, the end of themethod body.

Since the release of the resources acquired throughout the method is always done atits exit points, two conclusions can be drawn. First, the end of a method represents thelower boundary of atomicity in a method. This boundary could, arguably, be furthernarrowed to the last statement that accesses an atomic variable, representing future workthat could be done to further optimize the performance of the system.

However, this implementation does not currently support the release of the locks incase of an exception. The solution would be to, in every method, surround the contentsof the body after the declaration of the ResourceGroup object with a try-catch. The try

section would delimit the entire code of the method, while the catch block would includetwo statements: one to release the resources acquired and another to throw the exceptioncaught upwards on the call-stack. That way, an exception would not leave pending locks.

Note that currently 2PL is not enforced, given that a top-level method does not ac-quire resources that are reachable by a nested method invocation but not in its actualbody. These acquisitions (and paired releases) are delegated on the invoked method. Ac-cordingly, during a method’s execution locks may be released before others are acquired.

64


4.5.3.3 Single Resource Mapper

The single resource mapper, SingleResourceMapMethod, is an optimization for the scenariowhere only one atomic variable needs acquisition in a method. Instead of creating a newResourceGroup object just to add a single atomic variable along with the other methods,we apply the lock and unlock primitives directly on that atomic variable, possible due toResource interface inherited by atomic types, as explained in Section 4.2.2. This optimiza-tion effectively avoids the overhead associated with the creation of a new ResourceGroup

object. However, there is a particular scenario in which this optimization cannot be em-ployed. Due to the nature of this optimization, if the atomic variable is aliased through-out the method, the resource initially locked at the start might not be the same resourcebeing unlocked at the end. As such, the single resource mapper can only be employedwhen only one atomic variable is to be acquired in the method and that atomic variable isnever aliased throughout the method, i.e. the resource referenced by that atomic variableis always the same during the execution of the method. In all other cases, the normalmapper is employed.

Let l be a lock associated with an atomic variable or a partition. The single resourcemapper diverges from the mapper in the sense that only two statements have to be in-jected: l.lock() and l.unlock(). The lock statement is injected at the start of the method.If the atomic variable is not declared at the start of the method, the injection of the lock

statement has to be accordingly delayed until it is declared. Consequently, the unlock

statement shares the same mapping as rg.unlock() in the multi-resource mapper imple-mentation: inject at all possible exit points of the method. An example of the utilizationof this mapper is illustrated in Listing 4.12, where a global partition is the single resourcein the method.

The single-resource mapper shares the same limitations (and respective solutions) ofits multi-resource counterpart, namely lack of support for releasing locks in case of anexception.

4.5.4 Injecting the missing imports

After all the injection of statements in the AST, some of the classes modified by the map-per require additional import statements in the class declaration. These additional im-ports required derive from three sources: the use of the ResourceGroups API, the use ofthe global partition holder class and the use of the new atomic types. This phase, Import-Phase, takes requests from other phases to add the dependencies required for certainclasses. Every time the mapper injects a ResourceGroup object, a request is sent to add theResourceGroup class to the list of imports, if missing. Similarly, when the mapper injectsa global partition, a request sent to add the GlobalPartitionHolderClass to the list ofimports, if missing. The other source of requests is the ConvertPhase. When it convertsa type into its atomic equivalent, a request is sent to add the atomic class to the list ofimports, if missing.

65


1 p u b l i c boolean contains( i n t key) {2 GlobalPartitionsHolderClass.GLOBAL_partition_00.lock();3 AtomicRBNode node = getRoot ();4 whi le (node != sentinelNode) {5 i f (key == node.getValue ()) {6 boolean returnExpression = t r u e ;7 GlobalPartitionsHolderClass.GLOBAL_partition_00.unlock ();8 r e t u r n returnExpression;9 } e l s e {

10 i f (key < node.getValue ()) {11 node = node.getLeft ();12 } e l s e {13 node = node.getRight ();14 }15 }16 }17 boolean returnExpression = f a l s e ;18 GlobalPartitionsHolderClass.GLOBAL_partition_00.unlock ();19 r e t u r n returnExpression;20 }

Listing 4.12: Example of contain method of the RBTree benchmark after the use of thesingle-resource mapper.

66

5Evaluation

In this chapter we perform an evaluation of our model and implementation. The modelwill be compared against competitor models according to their productivity and theproperties guaranteed by each system. The implementation’s performance will be com-pared in a set of benchmarks against the state of the art in concurrency control mecha-nisms.

5.1 Productivity

In this section, we will present an analysis of the productivity enabled by our system. Itwill be compared against the current state of the art in data-centric concurrency control,AJ (atomic sets), and against the state of the art code-centric approaches based on atomicsections.

5.1.1 versus AJ

A clear advantage our system provides against AJ in the simplicity department is the re-duced number of constructs used by our system, one, against the five constructs used byAJ. These five constructs also hold distinct semantic meanings as exposed in Section 2.2.1,while our system’s solo annotation has a single semantic meaning. Furthermore, the pro-grammer has to manually specify the atomic set that each variable belongs to, which isnot required in our model.

We claim that most of these annotations should be delegated to the underlying sys-tem. The programmer should not be encumbered by having such a considerable numberof annotations with distinct use-cases.

67

5. EVALUATION 5.1. Productivity

AJ Our system

atomicset atomic owned unitfor alias (/*this.L=L*/) Total @Atomic

TSP 2 9 0 0 0 11 9Elevator 1 4 0 8 8 21 4Weblech 2 4 0 0 0 6 4Jcursez1 5 15 0 16 29 65 25Cewolf 4 5 0 0 1 10 5

Collections 5 53 0 40 330 428 146

Table 5.1: Comparison between the number of annotations required in AJ and our system.

A direct comparison of the number of constructs required with AJ against our systemis illustrated in Table 5.1. The programs used are part of the programs provided by theauthors of AJ1, with the required modifications already made. It is possible to observethat our system requires a lower number of annotations on all the programs. The use ofatomic(s) on AJ is a subset of our use of the @Atomic annotation. Most other annotationsin AJ simply do not require an equivalent in our system. The atomicset(s) constructis not used because we do not require the explicit association of an atomic variable toan atomic set. The unitfor construct, used to express that an atomic scope must obtainexclusive access to more than one resource set, shares some affinity with our use of the@Atomic annotation on function parameters.

In retrospective, the five constructs are severally different and require careful thoughtabout their application. It is not trivial for a programmer to identify the use-case foreach construct immediately. As such, we argue that the complexity of the mental processof the programmer is much higher when using AJ compared to our system. Moreover,the non-trivial reasoning from AJ is more prone to mistakes in their use. These mistakesmight lead to the violation of desirable atomicity constraints. In more extreme scenarios,these mistakes might even lead to deadlock situations.

In our system, the annotation of which variables must be evaluated atomically isenough to express concurrency restrictions in a safe manner. This safety in enforced bythe type system, assuring that all atomic resources are always referenced by atomic vari-ables who, in turn, are guaranteed to be evaluated in a unit of work that guarantees itsatomicity. Furthermore, due to the nature of the type system, a misuse of our annotationswill quickly trigger a type check error, effectively preventing the compilation and subse-quent execution of a program that does not uphold the desired atomicity constraints.

Additionally, AJ does not guarantee the absence of deadlocks in all dynamic scenarioswhere transitive circular dependencies between atomic sets might occur [VTD06]. Thislimitation was addressed in [MHD+13], delegating into the programmer the responsi-bility of ordering the resources by using an extra < annotation. On the other hand, oursystem requires only the @Atomic annotation to guarantee the absence of deadlocks in allscenarios, guaranteeing progress of the execution.

1 Available on the authors website. http://sss.cs.purdue.edu/projects/aj/

68

http://sss.cs.purdue.edu/projects/aj/


However, AJ is more generic than our system by allowing the acquisition of two dis-tinct atomic sets in the same unit of work and by having annotations that can be used todeal with high-level data-races, given that the programmer identifies them beforehand.Nonetheless, none of our case-studies nor the ones supplied2 by the authors of AJ bene-fited from this feature.

Static Dynamic Livelock-free Atomicity Isolation Composable

Atomic Sets Yes Mostly yes Yes Weak- Yes YesOur model Yes Yes Yes Strong Yes Yes

Table 5.2: Overview of properties offered by our system and by AJ.

Illustrated in Table 5.2 is a comparison between the properties offered by our systemand AJ. According to what was possible to determine for AJ, nothing guarantees thatthe resources referenced by an atomic set are always associated to an atomic set. Thus,we believe AJ only guarantees weak atomicity, unlike our system that provides strongatomicity. The other noteworthy property is deadlock-freedom in dynamic scenarios,that is not fully guaranteed in AJ while our system completely guarantees it.

The issue of high-level data-races might arise in both systems, but due to distinct fac-tors. In AJ, high-level data-races are accounted for and avoided by the system given thatthe programmer applies the correct primitives. But the reasoning behind the applicationof those primitives and the detection of high-level data-races prone situations still resideson the programmer. Both aliasing or unitfor annotations are required and should be ap-plied in situations where a high-level data-race could occur. One of the works relatedto AJ, presented in Section 2.2.1, proposes the dynamic detection of high-level data-racesthat should be annotated. However, since this approach relies on executing the programand detecting the data-races when they occur, their absence is not guaranteed by thesystem at compile time.

In that perspective, our system also does not guarantee the absence of data-races.Nevertheless, if detected by the programmer, they can be fixed. Firstly, a data-race canoccur if at least two units of work nested in a method access the same resources. In eachunit of work, the resources are evaluated atomically. However, during the transitionbetween units of work in the caller method, the resource is not acquired by the currentthread. In this scenario, between both units of work, a data-race can occur in which anoutside thread might access the resource.

Consider the example in Listings 5.1 and 5.2, respectively featuring our system and AJin the same scenario. In this example, a data-race might occur during the transfer

method, between invocations of the withdraw and the deposit methods. Both imple-mentations incur the same data-race possibility. More specifically in our system, whileboth withdraw and deposit methods are units of work and evaluate the atomic account

2See footnote 1.

69


1 p u b l i c c lass Account {2 i n t balance = 0 ;3 (...)4 }56 p u b l i c c lass Bank {7 @Atomic Account [] accounts ;89 (...)

1011 p u b l i c void transfer( i n t accNum_A , i n t accNum_B , i n t amount) {12 withdraw(accNum_A , amount) ;13 // during this transition , both accounts can be accessed14 deposit(accNum_B , amount) ;15 }1617 p u b l i c void withdraw( i n t accNum , i n t amount) {18 @Atomic Account acc = accounts[accNum] ;19 acc.balance -= amount ;20 }2122 p u b l i c void deposit( i n t accNum , i n t amount) {23 @Atomic Account acc = accounts[accNum] ;24 acc.balance += amount ;25 }26 }

Listing 5.1: An example of a bank transfer between two accounts that incurs a high-level data-race in our system.

referenced, the transfer method is not a unit of work because it does not reference anyatomic variable.

After the programmer detects this scenario, it can effectively avoid the data-race byforcing the acquisition of this resource in the method where both units of work are nested,as shown in Figures 5.3 and 5.4. Since now two accounts are acquired by the callermethod, the transition between the inner units of work is not seen by an outside thread,effectively avoiding the data-race between the two inner units of work. We argue that thisprocess of fixing perceived data-races is simpler when employing our system comparedto AJ.

5.1.2 versus atomic sections

Firstly, an obvious advantage of our system is the use of a data-centric concurrency con-trol system, with all the benefits that it entails. In this regard, the most positive aspectis the centralization of concurrency errors. In a code-centric approach, such as in atomicsections, a concurrency error might arise in the incorrect use of a single construct, or fromthe lack of it in a single critical section.

However, by employing a data-centric approach, the use of lower level concurrencymechanisms is delegated on the underlying system, in which the programmer only hasto annotate the variables that should be atomically evaluated throughout the program.

70


1 p u b l i c c lass Account {2 atomicse t (a)3 atomic(a) i n t balance = 0 ;4 (...)5 }67 p u b l i c c lass Bank {8 atomicse t (s)9 atomic(s) Account [] accounts ;

1011 (...)1213 p u b l i c void transfer( i n t accNum_A , i n t accNum_B , i n t amount) {14 withdraw(accNum_A , amount) ;15 // during this transition , both accounts can be accessed16 deposit(accNum_B , amount) ;17 }1819 p u b l i c void withdraw( i n t accNum , i n t amount) {20 Account acc = accounts[accNum] ;21 acc.balance -= amount ;22 }2324 p u b l i c void deposit( i n t accNum , i n t amount) {25 Account acc = accounts[accNum] ;26 acc.balance += amount ;27 }28 }

Listing 5.2: An example of a bank transfer between two accounts that incurs a high-level data-race in AJ.

1 p u b l i c c lass Bank {2 @Atomic Account [] accounts ;34 (...)56 p u b l i c void transfer( i n t accNum_A , i n t accNum_B , i n t amount) {7 @Atomic Account acc_A = accounts[accNum_A] ;8 @Atomic Account acc_B = accounts[accNum_B] ;9 withdraw(acc_A , amount) ;

10 // during this transition ,11 // both accounts are still acquired by this thread12 deposit(acc_B , amount) ;13 }1415 p u b l i c void withdraw(@Atomic Account acc , i n t amount) {16 acc.balance -= amount ;17 }1819 p u b l i c void deposit(@Atomic Account acc , i n t amount) {20 acc.balance += amount ;21 }22 }

Listing 5.3: Fixing the high-level data-race presented in Listing 5.1 using our system.

71


1 p u b l i c c lass Account {2 atomicse t (a)3 atomic(a) i n t balance = 0 ;4 (...)5 }67 p u b l i c c lass Bank {8 atomicse t (s)9 atomic(s) Account [] accounts ;

1011 (...)1213 p u b l i c void transfer( i n t accNum_A , i n t accNum_B , i n t amount) {14 withdraw(acc_A , amount) ;15 deposit(acc_B , amount) ;16 }1718 p u b l i c void withdraw( u n i t f o r (a) Account acc , i n t amount) {19 Account|a= t h i s .s| acc_A = accounts[accNum_A] ;20 acc.balance -= amount ;21 }2223 p u b l i c void deposit( u n i t f o r (a) Account acc , i n t amount) {24 Account|a= t h i s .s| acc_B = accounts[accNum_A] ;25 acc.balance += amount ;26 }27 }

Listing 5.4: Fixing the high-level data-race presented in Listing 5.2 using AJ.

If any concurrency error is found in this approach, it is surely at the declaration of anatomic variables.

Additionally, in our system, it is enough to annotate a variable with @Atomic to dis-seminate its concurrency requirements throughout the program. The compiler will refuseto compile until all these are fulfilled. In other words, until all its aliases (both local vari-ables and function parameters) are also annotated with @Atomic.

Atomic sections Our system

Total Primary Derived Total

TSP 15 9 0 9Elevator 9 4 0 4

Bank 3 1 5 6IntHashSet 3 5 0 5LinkedList 3 1 13 14

RBTree 3 3 15 18

Table 5.3: Comparison between the number of annotations required in DeuceSTM andour system.

A comparison on the number of annotations required was done against DeuceSTM, aoptimistic software transactional memory system that uses atomic sections. The results

72

5. EVALUATION 5.2. Performance

attained are illustrated in Table 5.3. The primary column symbolizes obligatory annota-tions, from whom all other can be derived, represented in the derived column. Four of thesix programs, belonging to the DeuceSTM benchmark suite, resulted in a higher numberof annotations when using our system. The numbers of annotations in the benchmarksthat use recursive data-structures (RBTree and LinkedList) is inflated due to the great useof atomic local variables that have to be annotated. However, in these two benchmarks,a significant portion of the annotations are not primary, but derived from the primaryannotations. This inference of derived annotations could be completely delegated intoan IDE, further emphasizing the centrality of the system and reducing the number ofannotations required.

Moreover, it is important to understand that the results are in part regulated by therelation between the number of methods and the number of data that requires atomic ex-ecution. Even though each benchmark shares some degree of complexity, with multiplevariables (more than three in all programs) that store the benchmark’s state, only threeoperations need to be effectively synchronized. This means that the atomic sections onlyneed to be applied to three functions. As such, these benchmarks are, in a way, not agood representative of the data-centric paradigm.

The justification is inherently correlated with the scaling of data-centric annotations.A program with a single structure that holds its state and uses hundreds of methods willonly require, in minimum, one primary single data-centric annotation in the said struc-ture. On the other hand, a code-centric system would have to identify the methods thatmake use of the said structure and then apply the atomic section construct, certainly, agreater number of times compared to the data-centric approach. Furthermore, forgettinga critical atomic section construct would issue no compile time error whatsoever, but theatomicity constraints would be broken.

The reverse is also true, as verified in the aforementioned four benchmarks. A pro-gram with a great number of distinct structures that only has three operations that haveto be performed atomically will probably require a lower number of atomic section con-structs compared to a data-centric approach. Nevertheless, we argue that the latter sce-nario is less common and more artificial.

Additionally, most code-centric systems either do not guarantee deadlock-freedom inall scenarios or do not guarantee livelock-freedom. We have implemented the proposedmodel in such a way that progress is ensured in all scenarios. A formal proof of suchclaim is in the works.

5.2 Performance

The main focus of this thesis is not the performance. Nonetheless, we still want to assessthe performance of our prototype implementation. Ergo, in this section, we compare itsperformance against AJ and leading techs in code-centric concurrency for Java.

Regarding the implementations available for AJ, we only had access to a limited set

73


of programs, both in the original and transformed versions. The translation mechanismused by AJ was not supplied by the authors, thus we were limited to the source code andsubsequent Java translation of a set of six examples. Only one of those programs qualifiedas a performance benchmark, the travelling salesman problem (TSP), thus representingthe only benchmark used in this comparison.

Regarding the code-centric comparison, we selected both Java synchronized blocksand a reference STM, DeuceSTM. The benchmarks used for this comparison were takenfrom the DeuceSTM benchmark suite. Those were then adapted into a naïve implemen-tation using Java synchronized blocks, where all methods have the synchronized keywordadded.

For each benchmark configuration (thread number, contention), a minimum of 100executions were made. Exceptionally, due to its short execution time, the TSP benchmarkwas executed a minimum of 1000 times for each configuration. The execution of eachbenchmark lasted thirty seconds, except the TSP benchmark where the execution time isthe actual benchmark result. The bottom 10% and the upper 10% of results from eachconfiguration were excluded to eliminate potential outliers. The remaining values wereused to calculate the average, that in turn was used to plot the graphs presented in thissection.

Test infrastructure All tests were ran on a machine with 4 AMD Opteron™ processorswith 16 cores each, totalling 64 cores, and with 64 GigaBytes of RAM. The Java versionused to compile and run the benchmarks was OpenJDK 8 Build b94 x86_64.

Initial Remarks The operations which rely only on reading values from arrays, suchas calculating the sum of money in all accounts in the bank benchmark, achieved a greatincrease in scalability when using read-write locks compared to the naïve implementa-tion using exclusive locks. In the other operations where that pattern was not present,no decrease in performance was verified. This is explained due to the fact that acquiringa write lock in a read-writer lock has equal overhead to the acquisition of a writer-onlylock. As such, the results presented are relevant only to the read-write lock implemen-tation. Additionally, both the original and final codes generated3 by our plugin used inthese benchmarks are available, respectively, in Appendix A and B.

5.2.1 Comparison with AJ

The travelling salesmen benchmark consists of a group of threads that will test all pos-sible combinations of paths possible, update the best path found so far if applicable tothe current path and continue testing other paths. While not embarrassingly parallel, itis possible to achieve a great degree of concurrency, since only accesses to the structures

3While our plugin does not generate code per se, it is possible to print sources right before the bytecode isgenerated using the hidden javac compilation flag ’-printsource -d <generated_dir>’.

74


1 2 4 8 16 32 64 128 256 512

2

4

8

16

32

64

128

256

512

Number of threads

Seco

nds

AJ

synchronized

RGs

Figure 5.1: TSP benchmark performance comparison.

that hold the best path found so far have to be protected. The iterative procedure oftesting the path is completely private to each thread.

As illustrated in Figure 5.1, our system manages to provide the exact same gain inperformance as the fine-grained synchronized blocks implementation, arguably, the bestpossible performance for this benchmark. On the other hand, AJ does not scale at allin this benchmark. In it, the application of concurrency control is very coarse-grained(one lock to protect all structures), that leads to the serialization of the benchmark at best.Furthermore, as the number of threads increases, the contention between threads in thecoarse-grain synchronized blocks used by AJ actually degenerates the performance.

5.2.2 Comparison against synchronized blocks and DeuceSTM

In this section, both static and dynamic scenarios will be evaluated. In specific, the bankand hash set benchmarks make use of static scenarios, while the other two benchmarks,red-black tree and linked list, rely on dynamic scenarios. Note that the later three bench-marks, taken from DeuceSTM, are various implementations of an IntSet, a set of integers.This structure supports the addition, removal, and lookup of an element. Each imple-mentation uses a distinct underlying data structure, namely a single linked list, a hashtable and a red-black tree.

Bank The bank benchmark from DeuceSTM simulates, as the name implies, a bank. Thebank supports the addition and removal of accounts, transferring money between twoaccounts, adding a fixed interest rate to all accounts and calculating the sum of moneyin all accounts. First and foremost, the most distinguishing aspect of this benchmark is

75


2 4 8 16 32 64 128 256 5120

100

200

300

400

500

Number of threads

Thou

sand

sof

oper

atio

nspe

rse

cond

RGs LOW

RGs MED

RGs HIGH

DeuceSTM LOW

DeuceSTM MED

DeuceSTM HIGH

Sync LOW

Sync MED

Sync HIGH

Figure 5.2: Bank benchmark performance comparison.

the fact that it consists almost exclusively in static scenarios. As such, this benchmarkprovides the best representation of the potential for concurrency in a pessimistic system,such as ours.

Observation of Figure 5.2 reveals that our system outperforms software transactionalmemory at all contention levels. Due to the static nature of the transactions between twoaccounts where an ordered locking of both accounts is sufficient to assure atomicity, ourpessimistic system yields great performance compared to software transactional memorythat has retry this operation due to the emergence of conflicts. This fact is especially re-vealing at higher thread counts, where DeuceSTM converges into a very low throughputcompared to our system.

In this benchmark, the naïve Java synchronized block implementation achieved worseperformance compared to our system. Our data-centric approach permits the concur-rent execution of transfers upon non-overlapping pairs of accounts, while synchronizedblocks implementation does not. The superior results at lower thread counts can be jus-tified by the locality to a single CPU (16 cores) of the concurent operations and lock op-erations used. As such, this gain in concurrency allows our system to offset the overheadfrom the use of fine-grained synchronization primitives, ultimately allowing our systemto outperform the synchronized blocks implementation.

Hash set The hash set benchmark builds from a hash table implementation. As illustratedon Figure 5.3, our system manages to outperform transactional memory at medium andhigh contention. The hash table relies on a single array, and most operations require ac-cess to a single position of the array. Due to disjoint data accesses patterns, transactional

76


2 4 8 16 32 64 128 256 5120

200

400

600

800

1,000

1,200

1,400

1,600

Number of threads

Thou

sand

sof

oper

atio

nspe

rse

cond

RGs LOW

RGs MED

RGs HIGH

DeuceSTM LOW

DeuceSTM MED

DeuceSTM HIGH

Sync LOW

Sync MED

Sync HIGH

Figure 5.3: HashSet benchmark performance comparison.

memory thrives under low contention, as expected. The increase in the possibility of con-flicts are exacerbated as contention increases. This is specially relevant because the add

method might require a rehash of the table, a costly operation that requires access to allpositions of the array, disrupting accesses to all of them.

The synchronized blocks implementation outperformed, surprisingly, both systems.The only combination that managed to outperform this implementation was DeuceSTMunder low contention. The hash table operations used (add, remove and contains) sharea constant temporal complexity (O(1)) on average. As such, the operations are too short-lived to compensate the very significant overhead inherent to the fine-grained use of syn-chronization primitives in our system. As such, the synchronized blocks implementationoutperforms our system in this scenario.

During the high contention test, as the number of threads increased beyond the num-ber of physical cores, DeuceSTM entered regularly in livelock on the rehash operation.The same scenario did not occur on our system due to the acquisition of a write-lockon the add method. However, such lock is not required for a regular add operation.Only a read-lock to the array and a write-lock to the specific position accessed would berequired. This limitation is imposed due to the inability to promote read-locks to write-locks. As such, the compiler settles for the highest denominator access found, which inthis case represents the possibility to invoke the rehash operation.

Due to the programming model of our system, it is possible to avoid this write-lockand convert it into a read-lock. This would require a slight re-architecture of the add

method. Instead of finding the correct position and then rehashing if required after in-sertion, all in the same method, the programmer would develop an outer method, as

77


2 4 8 16 32 64 128 256 5120

500

1,000

1,500

Number of threads

Thou

sand

sof

oper

atio

nspe

rse

cond

RGs LOW

RGs MED

RGs HIGH

RGs (Tweaked) LOW

RGs (Tweaked) MED

RGs (Tweaked) HIGH

Figure 5.4: HashSet benchmark performance comparison, after tweaking the add methodto reduce the usage of the write-lock.

illustrated in Listing B.1. This new add outer method would invoke a new new_add (theold add) method that adds the integer to the table if it does not require a rehash, failingotherwise. If the new_add method succeeded, rehash would not be required. Since thenew new_add method does not require a rehash, the table array only requires the acquisi-tion of a read-lock. As such, the write-lock is then exclusively delegated into the rehash

method. Since the rehash is only required if the new_add fails, the write-lock acquisi-tion in the rehash method is only acquired when strictly necessary. Additionally, due tothe extremely limited amount of rehashes required during the execution of the program,concurrency is improved, as shown in Figure 5.4.

However, this approach has its pitfalls: since the new outer method does not accessthe @Atomic array directly, atomicity will not be enforced on this method, but rather onthe delegating methods. As such, the programmer has to keep this in mind when ap-plying this tweak to the program, and attempt to repeat the rehash/add operation untilit succeeds. This tweak is artificially generating a data-race between the rehash and theadd method that was not present before, hence the need to retry. This subject is closelyrelated to the high-level data-races issue discussed in Section 5.1.1. While not a simplemodification, it highlights the power and flexibility of the model presented in this thesis.

Red-black tree The red-black tree benchmark had mixed performance, as illustrated onFigure 5.5. Due to the disjoint nature of the tree and most operations occurring on aspecific node instead of modifying the whole tree, transactional memory provides better

78


2 4 8 16 32 64 128 256 5120

1,000

2,000

3,000

4,000

Number of threads

Thou

sand

sof

oper

atio

nspe

rse

cond

RGs LOW

RGs MED

RGs HIGH

DeuceSTM LOW

DeuceSTM MED

DeuceSTM HIGH

Sync LOW

Sync MED

Sync HIGH

Figure 5.5: RBTree benchmark performance comparison.

performance at lower contentions. As contention increases, conflicts increase, diminish-ing the transactional memory throughput. Our pessimistic approach becomes more de-sirable at higher contentions, outperforming transactional memory at medium and highcontention.

The synchronized blocks implementation achieved inferior performance to our sys-tem. However, since our system also forces serialization due to partitions injected, theserialization cannot be blamed for the inferior performance in the synchronized blocksimplementation. In fact, this worse performance is attributed to the forced re-acquisitionof the locks in the synchronized blocks. The three main methods in the red-black, add,remove, and contains, invoke several sub-methods, that are also using synchronizedblocks. As such, a single invocation of the main methods might require several lockre-acquisitions in the sub-methods. But in our system, those locks in the sub-methodsare removed due to the redundant lock identification and removal algorithm presentedin Section 4.4.2. In conclusion, even though our system also serializes the main methodsin the red-black tree, these methods have a lower overhead due to the absence of the re-dundant locks that are present in the synchronized blocks implementation, allowing oursystem to perform better.

Linked list The linked list implementation had the worst performance by a great marginas illustrated on Figure 5.6. Traversing a (singly) linked list represents a dynamic scenariodue to the inability to order the resource acquisitions beforehand. To avoid the possibil-ity of a deadlock in these situations, the partitioning algorithm generates a partition toprotect unordered acquisition of list nodes. This partition, when applied to all methods

79


2 4 8 16 32 64 128 256 5120

500

1,000

1,500

2,000

Number of threads

Thou

sand

sof

oper

atio

nspe

rse

cond

RGs LOW

RGs MED

RGs HIGH

DeuceSTM LOW

DeuceSTM MED

DeuceSTM HIGH

Sync LOW

Sync MED

Sync HIGH

Figure 5.6: LinkedList benchmark performance comparison.

that traverse the linked list will effectively serialize the execution of those methods.

However, such protection is unnecessary due to the way that the list is traversed. Thelist is traversed always from the head to the tail. As such, the deadlock-prone situationhere detected does not happen due to the one-way recursive nature of the structure.

Without the partition, as illustrated in Listing B.2, only the previous and next node areacquired at any time while the correct position is found, leaving the remaining nodes freefor the other threads. As such, the program remains correct and deadlock-free. Remov-ing the partition and making a fine-grained approach where each node is acquired andreleased as the list is traversed, our system achieves increased scalability, as illustrated inFigure 5.7. Unfortunately, due to the severe overhead incurred by having one lock oper-ation per node, the performance is not as good as the increased parallelism would havelead us to think. These kind of situations, where partitions are not effectively required,could be inferred by the compiler and is a possibility for future work.

The linked list benchmark is an example of a data structure that is very fond of anoptimistic approach to concurrency, greatly due to the low rate of conflicts. The nodestraversed are not modified, therefore, a conflict will only arise on the lesser possibilitymultiple threads modified the exact same previous and next nodes. As such, the highperformance of DeuceSTM in this benchmark does not come as a surprise, outperformingboth our system and the synchronized blocks implementation.

Similarly to the red-black tree benchmark, the partition effectively serializes the ac-cesses to the linked-list. Again, this culminates in our system having equivalent per-formance to the synchronized blocks implementation due to the similarity between the

80


2 4 8 16 32 64 128 256 5120

50

100

150

200

Number of threads

Thou

sand

sof

oper

atio

nspe

rse

cond

RGs LOW

RGs MED

RGs HIGH

RG (No Part) LOW

RG (No Part) MED

RG (No Part) HIGH

Figure 5.7: LinkedList benchmark performance comparison against a variant without thesuperfluous partition and using fine-grained locks.

partition’s mapping to the synchronized keywords applied to the functions of the synchro-nized blocks implementation.

5.2.3 Closing Remarks

As a whole, our system manages to achieve respectable performance compared to otheralternatives. The high performance inherent to the fact it adopts a pessimistic approachis evidenced in programs that make use of static scenarios, where optimistic approachesfall short. Conversely, programs that rely on dynamic scenarios might achieve better per-formance with an optimistic approach if they also report a low conflict rate. When highconflict rates are present, our system will most likely outperform software transactionalmemory.

Regarding the synchronized blocks implementation, we argue our model, besidesall the properties, centrality and non-intrusivity it brings to the table, also allows theimplementation of a system that can allow greater performance that a coarse synchro-nized blocks approach to concurrent programming, as was exposed in this section. Eventhough several optimizations are identified for future work, and that the performanceof our system was not the main contribution of this thesis, the performance achieved isalready very positive in a system that is still very recent. We believe that further work inthis implementation will allow us to further close the gap between our system and themanual application of low level synchronization primitives.

81


82

6Conclusion

In this work we proposed a data-centric concurrency management model, with a singleannotation, that guarantees strong atomicity and deadlock-freedom. Our model makesuses of the type-system to enforce these properties. To validate this model, a prototypewas produced based on pessimistic concurrency. This prototype was then comparedagainst state of the art concurrency control mechanisms, in both performance and pro-ductivity.

In the field of productivity, our system managed to require less modifications of thecode, on average, to achieve the desired atomicity constraints. In scenarios where our sys-tem required more annotations, a large percentage of those annotations could be inferredfrom a smaller subset of annotations and delegated into a tool such as an IDE. Addition-ally, we believe the use of our system is more intuitive compared to the other state ofthe art in data-centric concurrency control, AJ, due to only requiring one single annota-tion with a single meaning, unlike the latter. This simplicity, however, does not result ina loss in expressiveness. While AJ provides annotations to avoid high-level data-races,they require that the programmer identifies the high-level data-races correctly and applythe correct annotations. In turn, in our model it is also possible to avoid high-level data-races that the programmer identifies by using the atomic variables to the method thatmanifests said data-races. Regarding the properties, our system manages to guaranteestrong atomicity and deadlock-freedom in all scenarios, while AJ only guarantees weakatomicity and deadlock-freedom in some dynamic scenarios.

On the subject of performance, our system managed to achieve very competitive re-sults, despite the fact that the performance of the prototype was not the focus of thisthesis. In specific, applications that rely heavily on static scenarios achieve a very pos-itive performance using our system. Nevertheless, the results obtained show that the

83

6. CONCLUSION

prototype is mature and represents a good starting point for future work.Regarding future work, and as referenced throughout this thesis, there are a couple

of aspects that could be further developed in the future. First, it would be interestingto explore more sophisticated techniques that might allow us to reduce the number offalse-positives regarding the detection of deadlock-prone scenarios. Second, and from atechnical standpoint, the calculation of types in arbitrary expressions should be redone asto use the compiler’s type system if possible. Third, the creation of atomic types shouldbe reimplemented without the use of the extends clause, allowing it to be completelyfaithful to our model. Fourth, the work units should be surrounded by a try/catch blockto release the group acquired even in the face of an unexpected exception. Finally, perfor-mance improvements regarding the mapping performed could be attained by attemptingto narrow down the computations performed while having a group acquired.

The outcome of the work developed in this thesis was published at INForum 2013[PP13]. We believe that the work developed met our expectations, and will hopefullyfoster additional future developments.

84

Bibliography

[AAK+05] C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiser-son, and Sean Lie. Unbounded transactional memory. In 11th InternationalConference on High-Performance Computer Architecture (HPCA-11 2005), 12-16February 2005, San Francisco, CA, USA, pages 316–327, 2005.

[APT79] Bengt Aspvall, Michael F. Plass, and Robert Endre Tarjan. A linear-time algo-rithm for testing the truth of certain quantified boolean formulas. InformationProcessing Letters, 8(3), 1979.

[BLM06] Colin Blundell, E Christopher Lewis, and Milo M. K. Martin. UnrestrictedTransactional Memory: Supporting I/O and System Calls within Transac-tions. Technical Report CIS-06-09, Department of Computer and InformationScience, University of Pennsylvania, Apr 2006.

[Boe09] Hans-J. Boehm. Transactional memory should be an implementation tech-nique, not a programming interface. In Proceedings of the First USENIX con-ference on Hot topics in parallelism, HotPar’09, pages 15–15, Berkeley, CA, USA,2009. USENIX Association.

[Bou09] Gérard Boudol. A Deadlock-Free Semantics for Shared Memory Concur-rency. In Theoretical Aspects of Computing - ICTAC 2009, 6th International Col-loquium, Kuala Lumpur, Malaysia, August 16-20, 2009. Proceedings, pages 140–154, 2009.

[CCG08] Sigmund Cherem, Trishul M. Chilimbi, and Sumit Gulwani. Inferring locksfor atomic sections. In Proceedings of the ACM SIGPLAN 2008 Conference onProgramming Language Design and Implementation, Tucson, AZ, USA, June 7-13,2008, pages 304–315, 2008.

[DHM+12] Julian Dolby, Christian Hammer, Daniel Marino, Frank Tip, Mandana Vaziri,and Jan Vitek. A data-centric approach to synchronization. ACM Transactionson Programming Languages and Systems, 34(1):4, 2012.

85

BIBLIOGRAPHY

[Dij65] Edsger W. Dijkstra. Cooperating Sequential Processes, Technical Report.Technical Report EWD 123, 1965.

[DP12] Nuno Delgado and Hervé Paulino. Sobre um mecanismo de controlode concorrência baseado em grupos de recursos. In Antónia Lopesand José Orlando Pereira, editors, INForum 2012 - Atas do 4º Simpósiode Informática, pages 302–305. Universidade Nova de Lisboa, 09 2012.URL=http://asc.di.fct.unl.pt/ herve/papers/RG-INForum2012.pdf.

[EBA+11] Hadi Esmaeilzadeh, Emily R. Blem, Renée St. Amant, Karthikeyan Sankar-alingam, and Doug Burger. Dark silicon and the end of multicore scaling. In38th International Symposium on Computer Architecture (ISCA 2011), June 4-8,2011, San Jose, CA, USA, pages 365–376, 2011.

[EFJM07] Michael Emmi, Jeffrey S. Fischer, Ranjit Jhala, and Rupak Majumdar. Lockallocation. In Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium onPrinciples of Programming Languages, POPL 2007, Nice, France, January 17-19,2007, pages 291–296, 2007.

[EK08] David Erni and Adrian Kuhn. The hacker’s guide to Javac. Bachelor’s thesis,supplementary documentation, University of Bern, August 2008.

[GLPT76] Jim Gray, Raymond A. Lorie, Gianfranco R. Putzolu, and Irving L. Traiger.Granularity of Locks and Degrees of Consistency in a Shared Data Base.In IFIP Working Conference on Modelling in Data Base Management Sys-tems,Freudenstadt, Germany, pages 365–394, 1976.

[HDVT08] Christian Hammer, Julian Dolby, Mandana Vaziri, and Frank Tip. Dynamicdetection of atomic-set-serializability violations. In 30th International Confer-ence on Software Engineering (ICSE 2008), Leipzig, Germany, May 10-18, 2008,pages 231–240, 2008.

[HFP06] Michael Hicks, Jeffrey S. Foster, and Polyvios Pratikakis. Lock Inference forAtomic Sections. TRANSACT’06: Proceedings of the 1st ACM SIGPLAN Work-shop on Languages, Compilers, and Hardware Support for Transactional Comput-ing, Ottawa, Canada., 2006.

[HLR10] Tim Harris, James R. Larus, and Ravi Rajwar. Transactional Memory, 2nd edi-tion. Synthesis Lectures on Computer Architecture. Morgan & Claypool Pub-lishers, 2010.

[HR83] Theo Härder and Andreas Reuter. Principles of transaction-orienteddatabase recovery. ACM Computing Surveys, 15(4):287–317, 1983.

86

BIBLIOGRAPHY

[Jin12] Wasuwee Sodsong Jingun Hong. Design, Implementation and Evaluationof a Task-parallel JPEG Decoder for the Libjpeg-turbo Library. InternationalJournal of Multimedia and Ubiquitous Engineering, 7(2), 2012.

[Jon07] Simon Peyton Jones. Beautiful Concurrency. In Beautiful Code: Leading Pro-grammers Explain How They Think (Theory in Practice (O’Reilly)). O’Reilly Me-dia, Inc., 2007.

[Lom77] David B. Lomet. Process structuring, synchronization, and recovery usingatomic actions. In Language Design for Reliable Software 1977, Raleigh, NorthCarolina, pages 128–137, 1977.

[MBM+06] Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, andDavid A. Wood. Logtm: log-based transactional memory. In 12th Interna-tional Symposium on High-Performance Computer Architecture, HPCA-12 2006,Austin, Texas, February 11-15, 2006, pages 254–265, 2006.

[MHD+] Daniel Marino, Christian Hammer, Julian Dolby, Mandana Vaziri, Frank Tip,and Jan Vitek. Detecting deadlock in programs with data-centric synchro-nization. In International Conference on Software Engineering (ICSE), San Fran-cisco, CA, 2013.

[MHD+13] Daniel Marino, Christian Hammer, Julian Dolby, Mandana Vaziri, Frank Tip,and Jan Vitek. Detecting deadlock in programs with data-centric synchro-nization. In Proceedings of the 2013 International Conference on Software Engi-neering, ICSE ’13, pages 322–331, Piscataway, NJ, USA, 2013. IEEE Press.

[MZGB06] Bill McCloskey, Feng Zhou, David Gay, and Eric A. Brewer. Autolocker: syn-chronization inference for atomic sections. In Proceedings of the 33rd ACMSIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL2006, Charleston, South Carolina, USA, January 11-13, 2006, pages 346–358,2006.

[PD12] Hervé Paulino and Nuno Delgado. A deadlock-free data-centric concurrencycontrol mechanism. Technical report, Departamento de Informática, Facul-dade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2012.

[PJ98] Jens Palsberg and C. Barry Jay. The essence of the visitor pattern. In Pro-ceedings of the 22nd International Computer Software and Applications Conference,COMPSAC ’98, pages 9–15, Washington, DC, USA, 1998. IEEE Computer So-ciety.

[PP13] Daniel Luis Landeiroto Parreira and Hervé Paulino. Uma abordagem altonível ao controlo de concorrência componível centrado nos dado. In João Ca-chopo and Beatriz Sousa Santos, editors, INForum 2013 - Atas do 5º Simpósio de

87

BIBLIOGRAPHY

Informática, pages 298–309. Escola de Ciências e Tecnologia da Universidadede Évora, 09 2013.

[PW10] Donald E. Porter and Emmett Witchel. Understanding transactional mem-ory performance. In IEEE International Symposium on Performance Analysis ofSystems and Software, ISPASS 2010, 28-30 March 2010, White Plains, NY, USA,pages 97–108, 2010.

[RG02] Ravi Rajwar and James R. Goodman. Transactional lock-free execution oflock-based programs. In Proceedings of the 10th International Conference on Ar-chitectural Support for Programming Languages and Operating Systems (ASPLOS-X), San Jose, California, USA, October 5-9, 2002, pages 5–17, 2002.

[SGG05] Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne. Operating systemconcepts (7. ed.). Wiley, 2005.

[ST95] Nir Shavit and Dan Touitou. Software Transactional Memory. In Proceedingsof the Fourteenth Annual ACM Symposium on Principles of Distributed Comput-ing, Ottawa, Ontario, Canada, August 20-23, 1995, pages 204–213, 1995.

[UBES09] Takayuki Usui, Reimer Behrends, Jacob Evans, and Yannis Smaragdakis.Adaptive Locks: Combining Transactions and Locks for Efficient Concur-rency. In PACT 2009, Proceedings of the 18th International Conference on ParallelArchitectures and Compilation Techniques, 12-16 September 2009, Raleigh, NorthCarolina, USA, pages 3–14, 2009.

[VTD06] Mandana Vaziri, Frank Tip, and Julian Dolby. Associating synchronizationconstraints with data in an object-oriented language. In Proceedings of the 33rdACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,POPL 2006, Charleston, South Carolina, USA, January 11-13, 2006, pages 334–345, 2006.

[WLK+09] Yin Wang, Stéphane Lafortune, Terence Kelly, Manjunath Kudlur, andScott A. Mahlke. The theory of deadlock avoidance via discrete control. InProceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Pro-gramming Languages, POPL 2009, Savannah, GA, USA, January 21-23, 2009,pages 252–263, 2009.

88

AAppendix of the annotated code

1 package tests.microbenchmarksRW.intset;

23 impor t annotations.Atomic;

4 impor t tests.microbenchmarksRW.finalClasses .*;

56 p u b l i c c lass IntSetHash implements IntSet {

78 @Atomic

9 p r i v a t e XIntVector set = new XIntVector (13);

10 @Atomic

11 p r i v a t e XByteVector states = new XByteVector (13);

12 @Atomic

13 p r i v a t e XInteger free = new XInteger(states.value.length);

14 @Atomic

15 p r i v a t e XInteger size = new XInteger (0);

16 @Atomic

17 p r i v a t e XInteger maxSize = new XInteger(states.value.length /2);

1819 p r i v a t e s t a t i c f i n a l byte FREE = 0;

20 p r i v a t e s t a t i c f i n a l byte FULL = 1;

21 p r i v a t e s t a t i c f i n a l byte REMOVED = 2;

2223 p u b l i c boolean add( i n t value) {

24 i n t index = findIndex(value);

25 i f (index < 0) {

26 r e t u r n f a l s e ;

27 }

2829 byte previousState = states.value[index];

30 set.value[index] = value;

31 states.value[index] = FULL;

32 postInsertHook(previousState == FREE);

33

89

A. APPENDIX OF THE ANNOTATED CODE

34 r e t u r n t r u e ;

35 }

363738 p u b l i c boolean contains( i n t value) {

39 r e t u r n index(value) >= 0;

40 }

414243 p u b l i c boolean remove( i n t value) {

44 i n t index = index(value);

45 i f (index >= 0) {

46 set.value[index] = ( i n t )0;

47 states.value[index] = REMOVED;

48 size.value --;

49 r e t u r n t r u e ;

50 }

51 r e t u r n f a l s e ;

52 }

5354 p r i v a t e i n t index( i n t value) {

5556 i n t length = states.value.length;

57 i n t hash = (value * 31) & 0x7fffffff;

58 i n t index = hash % length;

59 i n t probe;

6061 i f (states.value[index] != FREE &&

62 (states.value[index] == REMOVED || set.value[index] != value)) {

63 probe = 1 + (hash % (length - 2));

6465 do {

66 index -= probe;

67 i f (index < 0) {

68 index += length;

69 }

70 } whi le (states.value[index] != FREE &&

71 (states.value[index] == REMOVED || set.value[index] != value));

72 }

7374 r e t u r n states.value[index] == FREE ? -1 : index;

75 }

7677 i n t findIndex( i n t value){


79 i n t hash = ((value * 31) & 0x7fffffff);


81 i n t probe;

82 i f (states.value[index] == FREE) {

83 r e t u r n index;

84 }

85 e l s e i f (states.value[index] == FULL && set.value[index] == value) {

86 r e t u r n -index -1;

87 } e l s e {


8990 i f (states.value[index] != REMOVED) {

91 do {

90


92 index -= probe;

93 i f (index < 0) {

94 index += length;

95 }

96 }

97 whi le (states.value[index] == FULL && set.value[index] != value);

98 }

99 i f (states.value[index] == REMOVED) {

100 i n t firstRemoved = index;

101 whi le (states.value[index] != FREE &&

102 (states.value[index] == REMOVED || set.value[index] != value)) {

103 index -= probe;

104 i f (index < 0) {

105 index += length;

106 }

107 }

108 r e t u r n states.value[index] == FULL ? -index -1 : firstRemoved;

109 }

110 r e t u r n states.value[index] == FULL ? -index -1 : index;

111 }

112 }

113114 p r o t e c t e d f i n a l void postInsertHook(boolean usedFreeSlot) {

115 i f (usedFreeSlot) {

116 free.value --;

117 }

118119 i f (++ size.value > maxSize.value || free.value == 0) {

120 i n t newCapacity =

121 size.value > maxSize.value

122 ? PrimeFinder.nextPrime(states.value.length << 1)

123 : states.value.length;

124 rehash(newCapacity);

125 maxSize.value = states.value.length /2;

126 }

127 }

128129 p r o t e c t e d void rehash( i n t newCapacity) {

130 i n t oldCapacity = set.value.length;

131 i n t oldSet [] = set.value;

132 byte oldStates [] = states.value;

133134 set = new XIntVector(newCapacity);

135 states = new XByteVector(newCapacity);

136137 f o r ( i n t i = oldCapacity; i-- > 0;) {

138 i f (oldStates[i] == FULL) {

139 i n t o = oldSet[i];

140 i n t index = findIndex(o);

141 set.value[index] = o;


143 }

144 }

145 }

146 }

Listing A.1: Original IntSetHash class of the hash set benchmark.

91




4 impor t java.util.ArrayList;

5 impor t java.util.List;

678 /**

9 * @author Pascal Felber

10 * @since 0.1

11 */

12 p u b l i c c lass IntSetLinkedList implements IntSet {

1314 p r i v a t e s t a t i c f i n a l long serialVersionUID = 1L;

1516 @Atomic

17 p r i v a t e Node m_first = new Node (0);

1819 p u b l i c IntSetLinkedList () {

20 @Atomic Node min = new Node(Integer.MIN_VALUE);

21 @Atomic Node max = new Node(Integer.MAX_VALUE);

22 min.setNext(max);

23 m_first = min;

24 }


27 boolean result;

2829 @Atomic Node previous = m_first;

30 @Atomic Node next = previous.getNext ();

31 i n t v;

32 whi le ((v = next.getValue ()) < value) {

33 previous = next;

34 next = previous.getNext ();

35 }

3637 result = v != value;

38 i f (result) {

39 previous.setNext(new Node(value ,next));

40 }

4142 r e t u r n result;

43 }


46 boolean result;



50 i n t v;


52 previous = next;


54 }

5556 result = v == value;

57 i f (result) {

92


58 previous.setNext(next.getNext ());

59 }


62 }


65 boolean result;



69 i n t v;


71 previous = next;


73 }

7475 result = (v == value);


78 }

79 }

Listing A.2: Original IntSetLinkedList class of the linked list benchmark.



45 /**

6 * Created with IntelliJ IDEA.

7 * User: MetaBrain

8 * Date: 18 -06 -2013

9 * Time: 10:15

10 *

11 * @autor metabrain (https :// github.com/metabrain) - Daniel Parreira 2013

12 */

13 p u b l i c c lass Node {


16 f i n a l p r i v a t e i n t m_value;

17 @Atomic

18 p r i v a t e Node m_next ;

1920 p u b l i c Node( i n t value , @Atomic Node next) {

21 m_value = value;

22 m_next = next;

23 }

2425 p u b l i c Node( i n t value) {

26 t h i s (value , n u l l );

27 }

2829 p u b l i c i n t getValue () {

30 r e t u r n m_value;

31 }

3233 p u b l i c void setNext(@Atomic Node next) {

93


34 m_next = next;

35 }

3637 @Atomic

38 p u b l i c Node getNext () {

39 r e t u r n m_next;

40 }

41 }

Listing A.3: Original Node class of the linked list benchmark.

94

BAppendix of generated code


23 impor t rgs.RWResourceGroup;

4 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXByteVector;

5 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXIntVector;

6 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXInteger;

78 p u b l i c c lass IntSetHash implements IntSet {

910 p u b l i c IntSetHash () {

11 super ();

12 }

13 p r i v a t e AtomicGENXIntVector set = new AtomicGENXIntVector (10000);

14 p r i v a t e AtomicGENXByteVector states = new AtomicGENXByteVector (10000);

15 p r i v a t e AtomicGENXInteger free = new AtomicGENXInteger(states.value.length);

16 p r i v a t e AtomicGENXInteger size = new AtomicGENXInteger (0);

1 p u b l i c c lass IntSetHash implements IntSet {23 (...)45 p u b l i c boolean add( i n t value) {6 i n t result ;78 whi le (( result = new_add(value))== REHASH_REQUIRED)9 postInsertHook( f a l s e );

1011 r e t u r n result ==ADDED ;12 }13 }

Listing B.1: Implementation of the new add method in the tweaked hash set benchmark.

95

B. APPENDIX OF GENERATED CODE

1 p u b l i c boolean add( i n t value) {2 RWResourceGroup rg = new RWResourceGroup ();3 boolean result;4 AtomicGENNode previous = m_first;5 rg.add(previous);6 rg.lock();7 AtomicGENNode next = previous.getNext ();8 rg.add(next);9 i n t v;

10 whi le ((v = next.getValue ()) < value) {11 rg.remove(previous);12 previous = next;13 next = previous.getNext ();14 rg.add(next) ;15 }1617 result = v != value;18 i f (result) {19 previous.setNext(new AtomicGENNode(value , next));20 }2122 boolean returnExpression = result;23 rg.unlock ();24 r e t u r n returnExpression;25 }

Listing B.2: Mapping of the add method in the linkedlist benchmark without thepartition.

17 p r i v a t e AtomicGENXInteger maxSize = new AtomicGENXInteger(states.value.length /

2);





24 RWResourceGroup rg = new RWResourceGroup ();

25 rg.add(set);

26 rg.lock();


28 i f (index < 0) {

29 boolean returnExpression = f a l s e ;

30 rg.unlock ();

31 r e t u r n returnExpression;

32 }

33 byte previousState = states.value[index];

34 set.value[index] = value;


36 postInsertHook(previousState == FREE);

37 boolean returnExpression = t r u e ;

38 rg.unlock ();


40 }



96


44 }



48 rg.add(set);

49 rg.lock();


51 i f (index >= 0) {

52 set.value[index] = ( i n t )0;

53 states.value[index] = REMOVED;

54 size.value --;


56 rg.unlock ();


58 }


60 rg.unlock ();


62 }



66 rg.addRead(set);

67 rg.lock();


69 i n t hash = (value * 31) & 2147483647;


71 i n t probe;

72 i f (states.value[index] != FREE && (states.value[index] == REMOVED || set.

value[index] != value)) {


74 do {

75 index -= probe;

76 i f (index < 0) {

77 index += length;

78 }

79 }

80 whi le (states.value[index] != FREE && (states.value[index] == REMOVED || set.

value[index] != value));

81 }

82 i n t returnExpression = states.value[index] == FREE ? -1 : index;

83 rg.unlock ();


85 }

8687 i n t findIndex( i n t value) {


89 i n t hash = ((value * 31) & 2147483647);


91 i n t probe;

92 i f (states.value[index] == FREE) {


94 } e l s e {

95 i f (states.value[index] == FULL && set.value[index] == value) {

96 r e t u r n -index - 1;

97 } e l s e {


99 i f (states.value[index] != REMOVED) {

97


100 do {

101 index -= probe;

102 i f (index < 0) {


104 }

105 } whi le (states.value[index] == FULL && set.

value[index] != value);

106 }

107 i f (states.value[index] == REMOVED) {


109 whi le (states.value[index] != FREE && (states.value[index] ==

REMOVED || set.value[index] != value)) {

110 index -= probe;

111 i f (index < 0) {


113 }

114 }

115 r e t u r n states.value[index] == FULL ? -index - 1 : firstRemoved;

116 }

117 r e t u r n states.value[index] == FULL ? -index - 1 : index;

118 }

119 }

120 }

121122 /**

123 * After an insert , this hook is called to adjust the size.value/free

124 * values of the set.value and to perform rehashing if necessary.

125 */



128 free.value --;

129 }


131 i n t newCapacity = size.value > maxSize.value ? PrimeFinder.nextPrime(

states.value.length << 1) : states.value.length;


133 maxSize.value = states.value.length / 2;

134 }

135 }



139 i n t [] oldSet = set.value;

140 byte [] oldStates = states.value;

141 RWResourceGroup rg = new RWResourceGroup () ;

142 set = new AtomicGENXIntVector(newCapacity);

143 states = new AtomicGENXByteVector(newCapacity);

144 rg.add(set);

145 rg.lock();

146 f o r ( i n t i = oldCapacity; i-- > 0; ) {

147 i f (oldStates[i] == FULL) {

148 i n t o = oldSet[i];


150 set.value[index] = o;


152 }

153 }

154 rg.unlock ();

98


155 }

156 }

Listing B.3: Generated IntSetHash class of the hash set benchmark.



4 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXByte;

5 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXByteVector;

6 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXIntVector;

7 impor t tests.microbenchmarksRW.finalClasses.AtomicGENXInteger;

89 p u b l i c c lass IntSetHash_Tweaked implements IntSet {

1011 p u b l i c IntSetHash_Tweaked () {}

1213 p r i v a t e AtomicGENXIntVector set = new AtomicGENXIntVector (100000);

14 p r i v a t e AtomicGENXByteVector states = new AtomicGENXByteVector (100000);

15 p r i v a t e AtomicGENXInteger free = new AtomicGENXInteger(states.value.length);

16 p r i v a t e AtomicGENXInteger size = new AtomicGENXInteger (0);

17 p r i v a t e AtomicGENXInteger maxSize = new AtomicGENXInteger(states.value.length /

2);




2223 p r i v a t e s t a t i c f i n a l i n t USED_FREE = 0 ;

24 p r i v a t e s t a t i c f i n a l i n t USED_FILED = 1 ;

25 p r i v a t e s t a t i c f i n a l i n t FAILED = 2 ;


28 i n t added = new_add(value) ;

29 i f (added!= FAILED)

30 postInsertHook(added==FREE);

3132 r e t u r n added!= FAILED ;

33 }

3435 p u b l i c i n t new_add( i n t value) {


37 rg.addRead(set);


39 i f (index < 0) {

40 i n t returnExpression = FAILED;

41 rg.unlock ();


43 }

44 rg.add(set.value[index]);

45 rg.lock();

46 byte previousState = states.value[index].value;

47 set.value[index].value = value;

48 states.value[index].value = FULL;

49 i n t returnExpression = previousState == FREE ? USED_FREE : USED_FILED;

50 rg.unlock ();


52 }

99




56 }



60 rg.addRead(set);


62 rg.lock();

63 i f (index >= 0) {

64 rg.add(set.value[index]);

65 rg.add(states.value[index]);

66 set.value[index].value = ( i n t )0;

67 states.value[index].value = REMOVED;

68 size.value --;


70 rg.unlock ();


72 }


74 rg.unlock ();


76 }



80 rg.addRead(set);

81 rg.lock();


83 i n t hash = (value * 31) & 2147483647;


85 i n t probe;

86 i f (states.value[index].value != FREE && (states.value[index].value ==

REMOVED || set.value[index].value != value)) {


88 do {

89 index -= probe;

90 i f (index < 0) {

91 index += length;

92 }

93 }

94 whi le (states.value[index].value != FREE && (states.value[index].value ==

REMOVED || set.value[index].value != value));

95 }

96 i n t returnExpression = states.value[index].value == FREE ? -1 : index;

97 rg.unlock ();


99 }

100101 i n t findIndex( i n t value) {


103 i n t hash = ((value * 31) & 2147483647);


105 i n t probe;

106 i f (states.value[index].value == FREE) {


108 } e l s e {

100


109 i f (states.value[index].value == FULL && set.value[index].value == value)

{

110 r e t u r n -index - 1;

111 } e l s e {


113 i f (states.value[index].value != REMOVED) {

114 do {

115 index -= probe;

116 i f (index < 0) {


118 }

119 } whi le (states.value[index].value == FULL &&

set.value[index].value != value);

120 }

121 i f (states.value[index].value == REMOVED) {


123 whi le (states.value[index].value != FREE && (states.value[index].

value == REMOVED || set.value[index].value != value)) {

124 index -= probe;

125 i f (index < 0) {


127 }

128 }

129 r e t u r n states.value[index].value == FULL ? -index - 1 :

firstRemoved;

130 }

131 r e t u r n states.value[index].value == FULL ? -index - 1 : index;

132 }

133 }

134 }

135136 /**

137 * After an insert , this hook is called to adjust the size.value/free

138 * values of the set.value and to perform rehashing if necessary.

139 */



142 free.value --;

143 }


145 i n t newCapacity = size.value > maxSize.value ? PrimeFinder.nextPrime(

states.value.length << 1) : states.value.length;


147 maxSize.value = states.value.length / 2;

148 }

149 }


152 RWResourceGroup rg = new RWResourceGroup () ;


154 AtomicGENXInteger [] oldSet = set.value;

155 AtomicGENXByte [] oldStates = states.value;

156 set = new AtomicGENXIntVector(newCapacity);

157 states = new AtomicGENXByteVector(newCapacity);

158 rg.add(set);

159 rg.lock();

160 f o r ( i n t i = oldCapacity; i-- > 0; ) {

161 i f (oldStates[i].value == FULL) {

101


162 i n t o = oldSet[i].value;


164 set.value[index].value = o;

165 states.value[index].value = FULL;

166 }

167 }

168 rg.unlock ();

169 }

170 }

Listing B.4: Generated IntSetHash class of the hash set benchmark, tweaked forperformance.



4 impor t java.util.ArrayList;

5 impor t java.util.List;

67 /**


9 * @since 0.1

10 */

11 p u b l i c c lass IntSetLinkedList implements IntSet {


14 p r i v a t e AtomicGENNode m_first = new AtomicGENNode (0);

1516 p u b l i c IntSetLinkedList () {


18 rg.add(m_first);

19 AtomicGENNode min = new AtomicGENNode(Integer.MIN_VALUE);

20 rg.add(min);

21 AtomicGENNode max = new AtomicGENNode(Integer.MAX_VALUE);

22 rg.add(max);

23 rg.lock();


25 m_first = min;

26 rg.unlock ();

27 }



31 rg.add(m_first);

32 boolean result;

33 rg.lock();

34 AtomicGENNode previous = m_first;

35 AtomicGENNode next = previous.getNext ();

36 i n t v;


38 previous = next;


40 }


42 i f (result) {

43 previous.setNext(new AtomicGENNode(value , next));

44 }

45 boolean returnExpression = result;

102


46 rg.unlock ();


48 }



52 rg.add(m_first);

53 boolean result;

54 rg.lock();



57 i n t v;


59 previous = next;


61 }


63 i f (result) {


65 }


67 rg.unlock ();


69 }



73 rg.add(m_first);

74 boolean result;

75 rg.lock();



78 i n t v;


80 previous = next;


82 }



85 rg.unlock ();


87 }

88 }

Listing B.5: Generated IntSetLinkedList class of the linked list benchmark.



45 /**


7 * @since 0.1

8 */

9 p u b l i c c lass IntSetLinkedList_No_Partition implements IntSet {


12 p r i v a t e AtomicGENNode m_first = new AtomicGENNode (0);

103


1314 p u b l i c IntSetLinkedList_No_Partition () {

15 AtomicGENNode min = new AtomicGENNode(Integer.MIN_VALUE);

16 AtomicGENNode max = new AtomicGENNode(Integer.MAX_VALUE);


18 m_first = min;

19 }



23 boolean result;


25 rg.add(previous);

26 rg.lock();


28 rg.add(next);

29 i n t v;


31 rg.remove(previous);

32 previous = next;


34 rg.add(next) ;

35 }


37 i f (result) {

38 previous.setNext(new AtomicGENNode(value , next));

39 }


41 rg.unlock ();


43 }



47 boolean result;



50 rg.lock();


52 rg.add(next);

53 i n t v;


55 AtomicGENNode old_previous = previous ;

56 previous = next;

57 rg.remove(old_previous);


59 rg.add(next) ;

60 }


62 i f (result) {


64 }


66 rg.unlock ();


68 }


104



72 boolean result;



75 rg.lock();


77 rg.add(next);

78 i n t v;


80 AtomicGENNode old_previous = previous ;

81 previous = next;

82 rg.remove(old_previous);


84 rg.add(next) ;

85 }



88 rg.unlock ();


90 }

91 }

Listing B.6: Generated IntSetLinkedList class of the linked list benchmark if thepartition was ignored.



45 /**

6 * Created with IntelliJ IDEA.

7 * User: MetaBrain

8 * Date: 18 -06 -2013

9 * Time: 10:15

10 *

11 * @autor metabrain (https :// github.com/metabrain) - Daniel Parreira 2013

12 */

13 p u b l i c c lass Node {

1415 /**

16 *

17 */


19 p r i v a t e f i n a l i n t m_value;

20 p r i v a t e AtomicGENNode m_next;

2122 p u b l i c Node( i n t value , AtomicGENNode next) {

23 // RWResourceGroup rg = new RWResourceGroup ();

24 // rg.add(m_next);

25 m_value = value;

26 // rg.lock();

27 m_next = next;

28 // rg.unlock ();

29 }

3031 p u b l i c Node( i n t value) {

32 t h i s (value , n u l l );

33 }

105


3435 p u b l i c i n t getValue () {

36 r e t u r n m_value;

37 }

3839 p u b l i c void setNext(AtomicGENNode next) {


41 rg.add(m_next);

42 rg.lock();

43 m_next = next;

44 rg.unlock ();

45 }

4647 p u b l i c AtomicGENNode getNext () {

48 m_next.lockRead ();

49 tests.microbenchmarksRW.intset.AtomicGENNode returnExpression = m_next;

50 m_next.unlockRead ();


52 }

53 }

Listing B.7: Generated Node class of the linked list benchmark.

106

data-centric concurrency control on the java programming...

Documents