a hybrid cloud - public

Faculty of Information Technology

Department of Databases

Specialization: Databases

Tomasz Kłosiński

9125

A Hybrid Cloud: Amazon Web Services and OpenStack

Master’s thesis written under the supervision of

prof. Krzysztof Stencel

Warsaw, March, 2015


Abstract: This thesis explains the notion of a cloud (Chapter I) and how its use has influenced modern software delivery processes (Chapter II). It introduces an original software project, developed with Chef and Vagrant, that shows the possibilities of a hybrid cloud based on Amazon Web Services and OpenStack (Chapters III-IV). The principal problem of the thesis is the question: how can a hybrid cloud be put to use? My project presents one of many possible answers to this question.


“In today’s computer industry, we still typically

install and maintain computers the way the

automotive industry built cars in the early 1900s. An

individual craftsman manually manipulates a machine

into being, and manually maintains it afterwards.

The automotive industry discovered first mass

production, then mass customization using standard

tooling. The systems administration industry has a

long way to go, but is getting there.”

— Steve Traugott and Joel Huddleston (www.infrastructures.org, circa 2003)


Table of Contents

Introduction
  Domain, topic and aim
  History of the research
Chapter I: The problem’s domain
  Cloud Computing
    Related concepts
    Definition
    History
    Layers
    Deployment models
  OpenStack
    Core Services
    Shared Services
    RDO OpenStack
  Amazon Web Services
Chapter II: Solutions of the problem
  DevOps
    Agile
    Infrastructure as a Code
Chapter III: Description of the project
  Assumptions and requirements
    Development environment
  System’s design
    Deployment flow
  System’s implementation
    Git
    Vagrant
    Chef
    Koji cookbook
  Tests
Chapter IV: Conclusion
  Potential applications
  Suggestions on further studies and investigations
    Testing
    Continuous Integration, Deployment and Delivery
Appendix A: Koji build system
  Architecture
    Koji-hub
    Koji-web
    Kojira
    Koji builder (kojid)
    Koji client
    Additional tools
Appendix B: Project’s Vagrant files
  Vagrantfile.vbox
  Vagrantfile.openstack
  Vagrantfile.aws
  Vagrantfile.production
Appendix C: Directories and files tree of the project
Figures
Bibliography


Introduction

Domain, topic and aim

The domain of this thesis is the concept of cloud computing and its practical applications. In particular, it focuses on automation and configuration management in a hybrid cloud environment.

The thesis consists of two parts: theoretical and practical. The first part explains in detail the concept of cloud computing and its various types. It also describes the two cloud technologies used in the project: OpenStack and Amazon Web Services. The second part is a practical example of how to use those two technologies. It describes how to create automation scripts that deploy a Koji cluster.

The goal of this paper is, on the one hand, to present the possibilities that cloud computing opens up, using OpenStack and Amazon Web Services as examples, and, on the other, to demonstrate its application using configuration management and deployment automation (so-called “DevOps”) software: Vagrant and Chef.

I approached the problem from the practical side: let us imagine that our new IT business requires a fast and easily scalable solution for building RPM packages. How could we solve this problem using a hybrid cloud? My answer is: find software that builds RPMs (see Appendix A: Koji build system) and employ Chef and Vagrant to treat Infrastructure as a Code and make use of the best DevOps practices.

In fact, the chosen technologies are only a small part of the rich and diverse market of cloud technologies and related software (including configuration management and deployment automation tools). There are a number of both public cloud providers and private cloud solutions. However, it would be unreasonable to try to describe them all; the paper would grow out of proportion. Instead, I decided to pick one public and one private cloud and use them as examples of the general concept. To show that cloud computing capabilities are not only “buzzwords” on advertising flyers but real, practical, enterprise-ready solutions, I decided to show how to use Vagrant, Chef and Ruby scripting with them. There are, of course, plenty of other applications that utilize the power of cloud computing, but, again, my aim was to show that this is possible and how it works, not to write a comprehensive review of all existing software stacks.

This thesis is inspired by my own professional experience. During my short career I have witnessed a major shift in the Polish IT industry that is still ongoing: the adoption of cloud computing and the adjustment of development, quality assurance and system administration procedures and technology stacks to the new paradigm. New technology not only brings new software features but, more importantly, changes the operations of organizations and businesses. It implies new ways of providing IT services, new methods of managing IT infrastructure and new possibilities for developing new products.

The tools applied in my project are also inspired by my past professional experience. I have worked with cloud computing technologies such as OpenStack, Eucalyptus, OpenNebula, AWS and Rackspace, and with cloud-related tools such as Chef, Puppet, Ansible and Vagrant. Many others are used in current IT development and operations. In my opinion, AWS and OpenStack are the two technologies striving to dominate the cloud computing market.

History of the research

Before the project started, I had to choose the cloud technologies. I decided to use Amazon Web Services and OpenStack because, in my view, they are the most mature clouds and I personally have the most experience with these two.

The research was conducted on my personal laptop, running CentOS 7 Linux with RDO OpenStack installed, and on my personal Amazon Web Services account. After reviewing the existing literature and Internet resources related to my project, I started working on the code.

The project consists of one Chef cookbook and four Vagrant virtual environment configuration files (the main one, Vagrantfile.production, deploys a Koji cluster on AWS and OpenStack). The cookbook is a group of scripts that use Ruby as their reference language, with an extended DSL for specific tasks. Vagrant also uses Ruby scripts for configuring virtual environments.

Since both the Chef cookbook and the Vagrant configuration files are executed like typical procedural programs, the use of Ruby in the project is very limited and omits more advanced programming topics (such as the object-oriented paradigm). In fact, most of the code is written in the Domain-Specific Languages of Chef and Vagrant.

For the purpose of code development I set up a version control repository. I decided to use Git as the version control software and Bitbucket.com for my private web repository. I created four branches in the Git repository for the development of four versions of the code:

VirtualBox version (for testing),

Amazon Web Services version,

OpenStack version,

Hybrid: AWS and OpenStack version.

As the first step I had to document in detail the entire process of installing and configuring a Koji cluster. For this purpose I used three virtual machines on VirtualBox. After documenting the installation and configuration process of Koji, I started working on a Chef cookbook that would automate this process. Developing a cookbook requires a few tools that are included in the Chef Development Kit.

Additionally, for deployment I needed Vagrant, a tool that can automatically deploy a set of Chef cookbooks on many virtualization and cloud providers. I tested the cookbook on VirtualBox and, once it was finished, deployed it on AWS and OpenStack. To achieve this I prepared four configuration files:

Vagrantfile.vbox for VirtualBox deployment

Vagrantfile.aws for Amazon EC2 deployment

Vagrantfile.openstack for OpenStack deployment

Vagrantfile.production for hybrid (Amazon EC2 and OpenStack) deployment

During the course of development I worked with three different Chef installations (for the types of Chef versions see Chapter III: Description of the project, section Chef). At first, for simplicity, I used Chef Solo for the deployment. I had to abandon it because it did not support a very important cookbook feature: search of the nodes. Then I switched to Chef Zero, a very useful tool for testing cookbooks (especially when the search option is needed), because it is a kind of minimal Chef Server that runs in RAM and requires no configuration (this version is used in the Vagrant configuration file for VirtualBox). The third installation I used is Hosted Enterprise Chef, which is simply a Chef Server operated by the Chef company (there is also an on-premises version of Chef Server, but I did not use it in the project).

One rarely writes a Chef cookbook without using other cookbooks. Every new cookbook lists the cookbooks it depends on in its metadata.rb file. Using code from other projects in a new project is called “software reusability”. Code reuse is a basic principle of modern object-oriented programming languages such as Java, C++, Python or Ruby. The ability to reuse code relies in an essential way on the ability to build larger things from smaller pieces and to identify commonalities among those parts.

My cookbook installs and configures Koji, software used by Red Hat, Fedora, CentOS and other organizations to build RPM packages on a mass scale. In my project I reused the cookbooks that configure the services required by Koji (e.g. Apache HTTPD, PostgreSQL, NFS server) and a few others that were helpful in Koji installation and configuration (e.g. EPEL yum repository, SELinux, iptables).

Currently, to my knowledge, there is no means of deploying Koji automatically using any of the popular automation and configuration management tools (such as Chef, Puppet, Ansible, Salt or other dedicated software). The only documentation available on the Internet is an outdated page on the Fedora Project’s wiki. This documentation and my personal experience were the basis for the development of the cookbook. During my professional career I worked on bash scripts for automatic Koji installation and configuration. This experience helped me develop similar (although more sophisticated) code in Ruby and Chef’s DSL.

The last step of my project was the development of the Vagrant virtual environment configuration files. The first file was created for testing the Koji cookbook. Before I started working on configuration files using other Vagrant providers (Amazon EC2 and OpenStack), I decided that testing the cookbook would be more comfortable, faster and cheaper on Vagrant's default provider (VirtualBox). This configuration file defines four virtual machines: one with chef-zero installed, one with koji-hub installed, and two with koji-builder installed. The second Vagrant configuration file defines three virtual machines deployed on Amazon EC2: one koji-hub and two koji-builders (chef-zero was replaced by Hosted Enterprise Chef). The third configuration file also uses three VMs, but deployed on OpenStack. The fourth and final configuration file (production) deploys koji-hub on Amazon EC2 and two koji-builders on OpenStack.
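The four configurations described above can be summarized as plain data. The following is only a sketch in plain Ruby, not code from the project: the hash layout, the machine labels (such as koji-builder-1) and the provider symbols are hypothetical, and the composition of the OpenStack variant (one hub and two builders) is an assumption, since the text only states that it uses three VMs.

```ruby
# Machines defined by each Vagrant configuration file, as described in the text.
# The hash layout, machine labels and provider symbols are illustrative only.
DEPLOYMENTS = {
  'Vagrantfile.vbox' => {
    'chef-zero' => :virtualbox,
    'koji-hub' => :virtualbox,
    'koji-builder-1' => :virtualbox,
    'koji-builder-2' => :virtualbox
  },
  'Vagrantfile.aws' => {
    'koji-hub' => :aws,
    'koji-builder-1' => :aws,
    'koji-builder-2' => :aws
  },
  # Assumed composition: the text only says "three VMs" on OpenStack.
  'Vagrantfile.openstack' => {
    'koji-hub' => :openstack,
    'koji-builder-1' => :openstack,
    'koji-builder-2' => :openstack
  },
  'Vagrantfile.production' => {
    'koji-hub' => :aws,
    'koji-builder-1' => :openstack,
    'koji-builder-2' => :openstack
  }
}.freeze

# Group the machine names of one configuration file by provider.
def machines_by_provider(file)
  DEPLOYMENTS.fetch(file)
             .group_by { |_name, provider| provider }
             .transform_values { |pairs| pairs.map(&:first) }
end

DEPLOYMENTS.each_key do |file|
  puts "#{file}: #{machines_by_provider(file)}"
end
```

Only the production entry spans two providers, which is what makes that deployment hybrid.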

As a result of this research I developed a Chef cookbook for Koji cluster installation and Vagrant configuration files that make it possible to run it on a hybrid cloud: one part of the cluster on Amazon Web Services and another part on OpenStack.1

1 It would, however, be very easy to rewrite the Vagrant configuration files to run the cookbook on other cloud providers. The cookbook itself is provider agnostic: it can run on any RPM-based Linux machine.

At the end of this brief introduction, I will briefly summarize each chapter. The first chapter describes in detail what cloud computing is and what its taxonomies are; that is the domain of my project. It also describes Amazon Web Services and OpenStack, the two cloud technologies used in the project.

The second chapter introduces DevOps, an umbrella term under which solutions to the problem stated in the thesis have emerged in recent years. In those methodologies and tools, IT Operations found new ways of handling a dynamic, software-defined infrastructure based on hybrid clouds.

The third chapter constitutes the documentation of the software project that I authored to solve the problem stated in my thesis. It also describes the technologies that were used in the project. Those descriptions are kept as short as possible to make the document more comfortable to read and more usable in practice. Documentation is an absolutely necessary element of every IT project. “Ink is better than the best memory,” teaches an old Chinese proverb. As time passes, it becomes harder to understand how a system works. However, the author of any IT system has to know that voluminous documentation is as bad as a lack of it. Too much documentation makes a system unusable.

The fourth chapter contains the conclusions. It gives my opinion on whether the project turned out to be a success or a failure and outlines what could have been done better. This chapter also contains suggestions for further improvement of the project.

Chapter I: The problem’s domain

What is the “cloud”? Is it just the good old Internet? Cloud computing can be defined simply as storing and accessing computer data and software on the Internet, rather than on a personal computer or an office server. In fact, programs such as Gmail or Office365 are commonly described as cloud computing technologies. On the plus side, data and business programs run online rather than exclusively on office computers, which means that a company’s staff has access to them anytime, anywhere there is an Internet connection.2

However, not everyone shares the enthusiasm for the new IT paradigm. In 2008 Oracle CEO Larry Ellison said of cloud computing that “the computer industry is the only industry that is more fashion-driven than women’s fashion.”3

McKinsey Quarterly lists cloud computing as the seventh of ten most important technology-enabled business trends. McKinsey values cloud computing for enabling new business models. Technology now enables companies to monitor, measure, customize, and bill for asset use at a much more fine-grained level than ever before. Asset owners can therefore create services around what have traditionally been sold as products. Business-to-business (B2B) customers like these service offerings because they allow companies to purchase units of a service and to account for them as a variable cost rather than undertake large capital investments. Consumers also like this “paying only for what you use” model, which helps them avoid large expenditures, as well as the hassles of buying and maintaining a product. This development has created a wave of computing capabilities delivered as a service, including infrastructure, platform, applications, and content. Vendors are competing, with innovation and new business models, to match the needs of different customers.4

According to IDC analysts, the transformation towards cloud computing is the “third wave”, by analogy with the “second wave” (the spread of PCs and computer networks) and the “first wave” (the era of mainframe and terminal dominance). Frank Gens, vice director and chief analyst of IDC, says that “those companies that haven’t adapted to the new model are already forgotten.” Thus, analysis of past events leads to the conclusion that today, just as in 1986, when the PC appeared, many IT giants will have to decide in which direction to steer their future actions: whether to remain with the second wave or begin to develop solutions used in the third.5

A 2008 report by the Gartner Group stated that cloud computing is the most important trend in the IT world. According to Gartner analysts, these techniques are sufficiently mature to become profitable in a short period of time. At the same time, knowledge about the potential benefits, costs and limitations associated with these trends is becoming more and more widespread. Therefore, firms should decide whether to implement these technologies.6

2 http://www.businessweek.com/smallbiz/content/oct2009/sb20091026_937390.htm (03/08/2014)

3 Anthony Velte, Toby J. Velte, Robert C. Elsenpeter, Cloud Computing, A Practical Approach, McGraw-Hill Prof Med/Tech, 2009, p. 3

4 https://www.mckinseyquarterly.com/Strategy/Growth/Clouds_big_data_and_smart_assets_Ten_tech-enabled_business_trends_to_watch_2647#Trend7 (03/08/2014)

5 http://www.computerworld.pl/artykuly/376694/Cloud.Computing.to.dopiero.poczatek.zmian.ktore.nastapia.w.branzy.IT.html (03/08/2014)

However, despite the overall enthusiasm, some raise security concerns. Ernst & Young claims in its report that only 50% of respondents have a documented information security strategy. More than half of the respondents have not introduced any procedures to minimize the risks resulting from implementing cloud computing technologies.7

Richard M. Stallman, founder of the GNU project and the Free Software Foundation, sees cloud computing as a danger to the privacy of software users and to their freedom to use and modify software according to their needs. He claims that cloud computing takes control of the software away from the users. In his opinion it can be more dangerous than proprietary software. With closed source software, users usually obtain an executable (binary) file without the source code. Without the source code, users (or rather user-programmers) cannot study the program, so they cannot determine what it really does (for instance, it may spy on you or send your personal information to its producer without your consent), and they cannot change it. In the cloud computing model, by contrast, users do not even have an executable file in their hands: they have access only to the interface of the program, while the executable stays on the server. Thus users cannot know exactly what the software really does, and changing it is even harder than with proprietary software. Moreover, cloud computing usually includes elements that can be classified as malicious software. Among proprietary programs, only some are “spyware”, that is, programs that send out data about users' computing activities. With cloud computing, users have to send their data to the server in order to use it. This fulfils the first criterion of spyware: the user no longer controls the data; the server’s owner does. Thus cloud computing providers have dominant power over their users.8

Eric Schmidt does not share Stallman’s Orwellian predictions. On the contrary, he claims that these technologies will serve people. In his view, when people have infinitely powerful personal devices connected to infinitely fast networks and servers with lots of content, a new kind of application will become possible, and it will be personal. It will use all of the computing power that is in the cloud, as we call it. This vision of nearly infinite computing power, network power and powerful devices is the basis of the next generation of computing.9

6 http://www.computerworld.pl/news/162476/Gartner.Cloud.computing.i.Green.IT.najwazniejszymi.trendami.najblizszych.lat.html (03/08/2014)

7 http://www.computerworld.pl/news/376998/Raport.ErnstYoung.media.spolecznosciowe.i.chmura.grozne.dla.firmowych.danych.html (03/08/2014)

8 http://www.gnu.org/philosophy/who-does-that-server-really-serve.en.html (03/08/2014)

9 https://www.mckinseyquarterly.com/Googles_view_on_the_future_of_business_An_interview_with_CEO_Eric_Schmidt_2229 (03/08/2014)

In conclusion, there are both hopes and fears related to the development of cloud computing technologies. Whether we are skeptical or enthusiastic about it, we can agree that it is a field that needs more research and investigation.

Cloud Computing

On the one hand, cloud computing is often described as the on-demand delivery of IT resources via the Internet with pay-as-you-go pricing.10 In this sense, cloud computing is about leasing servers and storage from a provider (such as Amazon Web Services). On the other hand, it is about much more: the cloud offers IT businesses major cost savings and agility.

In addition, cloud computing offers significant scalability. With a single line of code it is possible to provision thousands of servers, and the customer pays only for what is actually used. Furthermore, because pricing is pay-as-you-go per hour, running one server for a thousand hours costs the same as running a thousand servers for one hour.
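The arithmetic behind this claim can be checked with a short sketch. It assumes a purely linear per-server, per-hour price with no discounts or minimum charges; the rate of 10 cents per hour is a hypothetical value chosen for illustration and is not taken from any provider's price list.

```ruby
# Hypothetical price of one server for one hour, in cents.
# Real prices depend on the provider and the instance type.
HOURLY_RATE_CENTS = 10

# Total cost in cents under purely linear per-server, per-hour billing.
def usage_cost_cents(servers, hours, rate_cents = HOURLY_RATE_CENTS)
  servers * hours * rate_cents
end

one_server_long    = usage_cost_cents(1, 1000)
many_servers_short = usage_cost_cents(1000, 1)

puts "1 server x 1000 hours: #{one_server_long} cents"
puts "1000 servers x 1 hour: #{many_servers_short} cents"
puts "Same cost: #{one_server_long == many_servers_short}"
```

Because the cost depends only on the product of servers and hours, a workload that can be parallelized finishes sooner at no extra charge under this simplified model.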

Finally, cloud computing enables the automation of server provisioning. It supports the automation of software development, testing and production delivery. Combining scalability with automation makes it possible to build an application that responds to load.

Related concepts

Cloud computing has its origin in a few earlier concepts that dominated the “pre-cloud” IT industry.

Utility computing is the provision of computational, networking and storage resources as a metered service.

“Utility” indicates that the model works analogously to public utilities.11

Grid computing is a combination of distributed resources from various institutions (resource providers) that meets the demands of the clients consuming them.12

Distributed computing is computing over distributed autonomous computers that communicate only over a network. Such systems are often distinguished from parallel computing systems or shared-memory systems, in which multiple processors share a common memory that is used for communication between them.13

Virtualization enables running several virtual operating systems, independent of each other, on a single physical host. Because the physical computer is utilized to the maximum, the return on investment is significantly higher.14 Resource virtualization is at the heart of most cloud architectures. The concept of virtualization allows an abstract, logical view of the physical resources, including servers, data stores, networks and software. The basic idea is to pool physical resources and manage them as a whole. Individual requests can then be served as required from these resource pools.15

10 http://aws.amazon.com/what-is-cloud-computing/ (03/02/2015)

11 John W. Rittinghouse, James F. Ransome, Cloud Computing: Implementation, Management and Security, CRC Press, 2009, p. 26

12 Borko Furht, Armando Escalante, Handbook of Cloud Computing, Springer, 2010, p. 185

13 Dinkar Sitaram, Geetha Manjunath, Moving To The Cloud: Developing Apps in the New World of Cloud Computing, Syngress, 2011, p. 381

14 John W. Rittinghouse, James F. Ransome, op. cit., p. 24

Definition

The term “cloud” originates from the symbol used in network diagrams to represent the Internet.16 Cloud computing is also a new business model that replaces the old model based on the traditional data center. However, the traditional data center does not necessarily disappear in favor of a cloud; sometimes it remains the best fit. Nevertheless, for reasons of business agility and economics, the cloud is becoming an increasingly important option for companies. Cloud computing can be perceived as the foundation for the industrialization of computing.17

Key characteristics

The “five essential characteristics” of cloud computing were proposed by the National Institute of Standards and Technology18:

On-demand self-service – a user with the appropriate permissions can provision computing resources individually, whenever they are needed, without human interaction with the service’s operator.

Broad network access – resources can be accessed over the network through standardized mechanisms that enable use from heterogeneous devices (e.g., mobile phones, laptops, PDAs).

Resource pooling – the provider’s resources are pooled to serve numerous consumers in a multi-tenant architecture, with virtual capabilities allocated dynamically in response to user demand.

Rapid elasticity – resources are provisioned quickly and elastically, often automatically, so that the system can scale out rapidly. From the user’s perspective, the capabilities of the cloud seem almost unlimited.

Measured service – the cloud system controls and optimizes resource use automatically by means of a metering capability. Users’ consumption of resources is monitored constantly, and reports based on this metering are used by both the user and the cloud provider.19

15 Baun, C., Kunze, M., Nimis, J., Tai, S., Cloud Computing: Web-Based Dynamic IT Services, Springer, 2011, p. 5
16 John W. Rittinghouse, James F. Ransome, op. cit., p. 26
17 Judith Hurwitz, Robin Bloor, Marcia Kaufman, Fern Halper, Cloud Computing for Dummies, For Dummies, 2009, p. 19
18 Peter Mell, Timothy Grance, The NIST Definition of Cloud Computing, National Institute of Standards and Technology, U.S. Department of Commerce, Special Publication 800-145, September 2011, available at http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf (20/10/2014)
19 David E. Y. Sarna, Implementing and Developing Cloud Computing Applications, Auerbach Publications, 2010, p. 16

Benefits and Drawbacks

The benefits of cloud computing are20:

1. Reduced costs of implementing and maintaining infrastructure.

2. Increased mobility of the global IT workforce.

3. More scalable and more flexible datacenters.

4. Faster time to market than with a traditional datacenter.

5. Transformation of facilities that enables an innovation-friendly environment.

6. The possibility of using “green” technologies and methods of operation.

7. Better affordability, which enables SMEs to develop and use high-performance software.

However, the cloud computing paradigm also carries a significant number of potential risks21:

1. Potential problems with availability of the cloud.

2. Data lock-in.

3. Data privacy and traceability.

4. Compliance with national legislation regarding the geographical location of stored data.

5. Data transfer bottlenecks.

6. Poor performance predictability.

7. Scalability of persistent storage space.

8. Errors in large, distributed systems.

9. Reputation and liability.

10. Software licenses.

History

The notion of cloud computing can be dated back at least to 1961, when John McCarthy22 wrote that time-sharing computer technology could in the future be transformed into a “utility computing” model, in which computing resources or even applications would be provided remotely. In the 1960s, however, information technology was not ready for this futuristic concept. When the idea was revived at the turn of the millennium, the term “cloud computing” replaced and extended the previous one.23

Google became a pioneer during this revival thanks to several key factors:

● First, the collection of data and the processing of that data had to be as automated as possible.

● It had to be cost-effective, so the infrastructure was constructed out of commodity components (“cheap stuff that breaks”).

● Data had to be stored in a simple and fairly reliable manner to facilitate scaling (instead of using a traditional database, Google created its own data store, called GFS).

● New types of application architectures and processing algorithms were needed (including the MapReduce family, among others).

● Operations had to be automatic and dependable.

● Outages in parts of the application were tolerable.

In order to scale its search facility cheaply, Google created much of what can probably be recognized as the first cloud.

20 John W. Rittinghouse, James F. Ransome, op. cit., p. 14
21 Baun, C., Kunze, M., Nimis, J., Tai, S., op. cit., p. 70
22 John McCarthy (September 4, 1927 – October 24, 2011) was an American computer scientist and cognitive scientist. McCarthy was one of the founders of the discipline of artificial intelligence. He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI. [Source: http://en.wikipedia.org/wiki/John_McCarthy_(computer_scientist) (20/01/2015)]
23 John W. Rittinghouse, James F. Ransome, op. cit., p. 26

Another interesting pioneer in cloud computing is Amazon. In its first years, the company built its IT infrastructure the traditional way, using big, heavy servers with relational databases. This model worked well in the early days. As commerce on the Internet expanded, it became clear to Amazon that its computing architecture had to change. At the same time, in order to build customer and vendor relationships, Amazon had begun exposing individual services as callable services. This accelerated the decomposition of many of Amazon’s applications into individually callable services. In 2006 Amazon began to offer basic computing resources (computing, storage, and network bandwidth) as highly flexible, easily provisioned services, all of which could be paid for in a pay-as-you-go model.

Salesforce.com was the first public cloud service targeted at enterprise customers that kept customers’ sensitive data outside of their own facilities. It introduced an easy, pay-as-you-go CRM (customer relationship management) implementation that rose to a meaningful market share and eventually to dominance, largely at the expense of the traditional, install-in-your-own-shop application with its overwrought, often painful, and unintentionally costly implementation.

In the era of these three major cloud computing pioneers, the vision of virtualized utility computing finally began to come true:

Computing—computation, storage, communication—is relatively cheap, scales up or down as needed,

operates itself automatically, and always works.24

Layers

Each cloud computing layer can be characterized by properties of its own. Additionally, some layers are subdivided into sub-layers and into their services. There are three basic layers of cloud computing: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

Clouds are divided into multiple services and interfaces, which make them usable. Although each cloud computing technology usually consists of multiple layers, it is assigned to the highest layer it provides, which is the layer through which potential users are primarily addressed. Cloud computing is currently evolving dynamically, so the following classification is not intended to be complete; rather, it outlines the archetypal cloud services.25

24 Eric A. Marks, Bob Lozano, Executive’s Guide to Cloud Computing, Wiley, 2010, p. 20

Application Software

Cloud software applications (Software as a Service, SaaS) directly serve the end user. In this model, customers do not need to install the software locally. Instead, they usually access it through a web browser, because the interface of cloud software is mostly web-based. From the cloud architecture perspective, a SaaS offering can be developed and operated by the provider on top of a PaaS or IaaS offering.26

Platform

The cloud services provided at the PaaS layer are targeted at developers. These are mostly programming environments (PE) and execution environments (EE) in which software written in a specific programming language can be executed. PaaS usually extends existing programming environments, e.g., by adding class libraries with a specific application focus.27

Infrastructure

The IaaS layer provides users (typically system administrators) with an abstracted view of the hardware, i.e., computers, mass storage systems, networks, etc. Through its interface, a user manages a number of resources and can allocate a subset of them for their own use. Typically, the user can create or remove operating system images, scale the required capacities, define network topologies, attach volume storage to instances, and perform basic operations on instances: starting, stopping or destroying them.28

Deployment models

Three of the cloud deployment models recognized by the National Institute of Standards and Technology (NIST), Information Technology Laboratory, are described below.29

Public cloud

In this model the cloud infrastructure is owned by an organization selling cloud services and is made available to the general public.30

Computing resources in a public cloud are provisioned dynamically over the Internet. Public clouds (sometimes also called external clouds) are run by third-party companies. Typically, cloud services from different companies are mixed together to form an organization’s IT infrastructure: cloud servers, storage systems, and networks.31

25 Baun, C., Kunze, M., Nimis, J., Tai, S., op. cit., p. 17
26 Ibid., p. 20
27 Ibid., p. 20
28 Ibid., p. 18
29 David E. Y. Sarna, op. cit., p. 17
30 Ibid., p. 17

Private cloud

This type of cloud infrastructure is operated solely by and for one organization. Its management may be outsourced to a third party, and it may exist on premises or off premises.32 A private cloud (also called an internal cloud) is a model in which cloud services run on private networks. They are used exclusively by one organization, which keeps full control over data, security, and quality of service. Private clouds are created and administered by the company’s own IT department or by a third party.33

Hybrid cloud

This model is a combination of two or more private and public clouds that remain independent units but are bound together by standardized technology that enables portability of data and applications (e.g., load balancing between clouds).34 A hybrid cloud environment thus combines multiple public and private cloud models. Hybrid clouds introduce the complexity of determining how to distribute applications across both a public and a private cloud.35

OpenStack

OpenStack is called a “cloud operating system” by its creators. It controls large pools of compute, storage, and networking resources throughout a datacenter. The cloud is managed through a dashboard that gives administrators control and empowers their users to provision resources themselves through a web interface.36

The project aims to create an open source cloud computing platform for public and private clouds that provides scalability without complexity. Initially it focused on the Infrastructure as a Service (IaaS) model, but the scope of its projects and models is constantly growing. One of the core values on which the project is based is openness, in terms of both open standards and open source code. OpenStack is released under the Apache 2.0 license.37 In addition, OpenStack promotes open standards through the OpenStack API.

The OpenStack project was created by Rackspace Hosting (a large US hosting firm) and NASA (the US space agency). They decided to work together and released their internal cloud object storage and cloud compute code bases (respectively) as one common open source project.38

31 Borko Furht, Armando Escalante, op. cit., p. 7
32 David E. Y. Sarna, op. cit., p. 17
33 Borko Furht, Armando Escalante, op. cit., p. 7
34 David E. Y. Sarna, op. cit., p. 17
35 Borko Furht, Armando Escalante, op. cit., p. 7
36 http://www.openstack.org/software/ (19/12/2014)
37 Apache 2.0 license is available online: http://www.apache.org/licenses/LICENSE-2.0.txt (22/01/2015)
38 Ken Pepple, Deploying OpenStack, O'Reilly Media, 2011, p. 1

Core Services

The project currently encompasses four main components: Compute (Nova), Object Store (Swift), Networking (Neutron) and Dashboard (Horizon).

Compute (Nova)

In this service, users run virtual machine instances on numerous hosts, which offers scalability and redundancy. A major goal of the project is to be hardware- and hypervisor-agnostic. OpenStack Compute is the basis for some public cloud providers; for example, it runs the Rackspace Open Cloud.39

Object Store (Swift)

This service provides storage that is massively scalable and, at the same time, can be built from commodity hardware. The inspiration for OpenStack Object Store was Amazon's S3 storage service. Users can keep data of almost unlimited size (limited only by hardware resources) and extend their storage on demand. OpenStack Object Storage is highly redundant and is therefore well suited to data archiving (e.g., logs) or to providing a storage system that OpenStack Compute can use for instance templates (VM images).40

Networking (Neutron)

OpenStack supports many modes of networking. The three main ones are Flat networking, VLAN Manager and Software Defined Networking (SDN). Software Defined Networking is an approach in which network administrators and cloud operators can define virtual network services programmatically. The Software Defined Networking component of OpenStack is called Neutron.

Using Neutron, users can create complex networks in a secure multi-tenant environment. It overcomes the issues often associated with the earlier networking modes, Flat and VLAN. In Flat networks all tenants work within the same IP subnet. VLAN networking separates the tenant IP ranges with a VLAN ID, but VLANs are limited to 4096 IDs, which is a problem for larger installations, and users are still limited to a single IP range within their tenant in which to run their applications.
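The limit of 4096 IDs follows from the 12-bit VLAN identifier field of the IEEE 802.1Q tag. The short sketch below works through the arithmetic; the tenant counts and the helper function are hypothetical and chosen only for illustration.

```python
VLAN_ID_BITS = 12  # width of the VLAN identifier field in an IEEE 802.1Q tag

# Number of distinct VLAN IDs; IDs 0 and 4095 are reserved, so slightly
# fewer are usable in practice.
vlan_ids = 2 ** VLAN_ID_BITS


def fits_in_vlans(tenants: int, networks_per_tenant: int = 1) -> bool:
    """True if every tenant network can be given its own VLAN ID."""
    return tenants * networks_per_tenant <= vlan_ids


print(vlan_ids)
print(fits_in_vlans(tenants=5000))  # hypothetical large installation
```

An installation with more tenant networks than VLAN IDs cannot isolate them all with VLANs alone, which is one of the motivations for the software-defined approach.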

SDN in OpenStack also has a pluggable architecture: it makes it possible to plug in and control various switches, firewalls and load balancers, and to provide functions such as Firewall as a Service. Everything is software-defined, which gives full control over the complete virtual network infrastructure.41

Dashboard (Horizon)

Administering an OpenStack installation through the CLI gives control over the cloud environment, but a web interface gives users, operators and administrators easier access to the cloud. OpenStack Dashboard provides a web interface that runs on an Apache server, using WSGI and Django. With OpenStack Dashboard installed, it is possible to manage all the core components of the OpenStack environment.42

39 Kevin Jackson, Cody Bunch, OpenStack Cloud Computing Cookbook, 2nd Edition, Packt Publishing, 2013, p. 52
40 Ibid., p. 86
41 Ibid., p. 168

Shared Services

The project also includes “OpenStack Shared Services”, which are used in common by the core services.43

Identity service (Keystone)

This service authenticates and manages user accounts and roles for the OpenStack cloud. It also authenticates and verifies connections between all the other OpenStack cloud services, and it is therefore the first service that needs to be installed in an OpenStack environment. When a user or a service is authenticated, Keystone sends back an authorization token, which is then passed between the services and validated by them. The token subsequently serves as the user’s proof of authentication, allowing the user to proceed to any OpenStack service (such as Compute or Object Store). Configuring the OpenStack Identity service includes creating the appropriate roles for users and services, the tenants, the user accounts, and the service API endpoints that make up the cloud infrastructure.44
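The token flow can be sketched with a toy in-memory model. This is not the Keystone API: the class, the function, and the user name and password below are hypothetical and serve only to show a token being issued once and then validated by another service.

```python
import secrets
from typing import Dict, List, Optional


class ToyIdentityService:
    """Toy stand-in for an identity service (hypothetical, not Keystone's API)."""

    def __init__(self, users: Dict[str, str]):
        self._users = users    # username -> password
        self._tokens = {}      # token -> username

    def authenticate(self, username: str, password: str) -> str:
        """Check the credentials and hand back a fresh token."""
        if self._users.get(username) != password:
            raise PermissionError("invalid credentials")
        token = secrets.token_hex(16)
        self._tokens[token] = username
        return token

    def validate(self, token: str) -> Optional[str]:
        """Return the user a token belongs to, or None if it is unknown."""
        return self._tokens.get(token)


def compute_list_instances(identity: ToyIdentityService, token: str) -> List[str]:
    """A toy 'Compute' call that accepts only tokens the identity service validates."""
    if identity.validate(token) is None:
        raise PermissionError("token rejected")
    return []  # there are no instances in this toy model


identity = ToyIdentityService({"alice": "s3cret"})
token = identity.authenticate("alice", "s3cret")
print(compute_list_instances(identity, token))
```

The point of the design is that the other services never see the password: they receive only the token and ask the identity service whether it is valid.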

Image Service (Glance)

This service allows users to register, discover, and retrieve virtual machine images. The images can be stored in a variety of backend locations: the local filesystem, distributed storage systems such as OpenStack Object Storage (Swift), and others.45

Block Storage (Cinder)

Data written to the disks of a running instance is not persistent: after the instance is terminated, any disk writes are lost. Volumes are persistent storage that can be attached to running VM instances. A volume works like an external USB drive that can be attached to an instance. Like USB drives, block volumes can be attached to only one instance at a time.
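The single-attachment rule can be illustrated with a toy model. The class and identifiers below are hypothetical and do not correspond to the Cinder API; the sketch only shows that a volume must be detached before another instance can use it, and that its data outlives the instance.

```python
from typing import Optional


class ToyVolume:
    """Toy persistent volume (hypothetical model, not the Cinder API)."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}  # contents survive attach/detach cycles
        self.attached_to: Optional[str] = None

    def attach(self, instance_id: str) -> None:
        """Attach the volume, refusing if it is already in use."""
        if self.attached_to is not None:
            raise RuntimeError(
                f"{self.name} is already attached to {self.attached_to}"
            )
        self.attached_to = instance_id

    def detach(self) -> None:
        self.attached_to = None


volume = ToyVolume("vol-1")
volume.attach("instance-a")
volume.data["log.txt"] = "written by instance-a"

try:
    volume.attach("instance-b")  # a second attachment is refused
except RuntimeError as error:
    print(error)

volume.detach()                  # e.g. instance-a has been terminated
volume.attach("instance-b")
print(volume.data["log.txt"])    # the data is still there
```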

OpenStack Block Storage is very similar to Amazon EC2's Elastic Block Store; the difference lies in how volumes are presented to the running instances. Under OpenStack Compute, volumes can easily be managed using an iSCSI-exposed LVM volume group named cinder-volumes, which must be present on any host running the cinder-volume service.46 Cinder can also use hardware storage arrays or storage servers (such as NFS, GlusterFS, Nexenta, Ceph RBD, VMware, Windows Server 2012, Solaris ZFS, etc.) and other data transfer protocols (such as AoE, NFS, RBD, Fibre Channel, etc.).47

42 Ibid., p. 217
43 http://www.openstack.org/software/openstack-shared-services/ (13/01/2015)
44 Kevin Jackson, Cody Bunch, op. cit., p. 5
45 Ibid., p. 35
46 Ibid., pp. 151-152
47 Full list of supported storage systems and protocols is available at OpenStack wiki: https://wiki.openstack.org/wiki/CinderSupportMatrix (22/10/2014)

Telemetry Service

This service aggregates usage and performance data from the other OpenStack services. It provides visibility into the usage of the cloud across the data points and enables users and operators to view metrics globally or by individual resources.

Orchestration Service

This is a template-driven service that allows application developers to automate the deployment of infrastructure. It provides a template language that can specify compute, storage and networking configurations as well as detailed post-deployment configuration, in order to automate the full provisioning of services and applications. It also integrates with the Telemetry service to scale infrastructure resources automatically according to load requirements.

Database Service

The goal of this service is to enable users to utilize the features of a relational database in the cloud. Users and database administrators can provision and manage multiple database instances. The service focuses on providing resource isolation at high performance and on automating complex administrative tasks (such as deployment, configuration, patching, backups, restores, and monitoring).

RDO OpenStack

OpenStack is a complicated array of software services. To make installation and configuration easier and faster, it is usually delivered in the form of “software distributions”. A typical OpenStack distribution handles the installation and provides tools to manage and monitor the services. One of the major OpenStack distributions, known as RDO, is developed by Red Hat’s engineers and the Fedora community.48

RDO is a freely available, community-supported distribution of OpenStack that runs on Red Hat Enterprise Linux, CentOS, Fedora, and their derivatives. In addition to providing a set of software packages, it is also a community of people who run the cloud computing platform on Red Hat and Fedora Linux operating systems, where they can get help and compare notes on running OpenStack.49

Amazon Web Services

Amazon Web Services (AWS) is a collection of remote computing services (web services) that constitute the cloud computing platform of Amazon.com. Its two central services are Amazon EC2 and Amazon S3, which provide large computing and storage capacity.50 AWS is a set of on-demand computing resources and services in the cloud, with pay-as-you-go pricing. Using its resources instead of building a traditional datacenter is like purchasing electricity from a power company instead of running one’s own power generator, and it provides many of the same benefits: capacity exactly matches need, payment is made only for what is used, economies of scale result in lower costs, and the service is provided by a vendor experienced in running large-scale networks.51

48 https://openstack.redhat.com/ (18/12/2014)
49 https://openstack.redhat.com/Frequently_Asked_Questions (18/12/2014)
50 http://en.wikipedia.org/wiki/Amazon_Web_Services (18/12/2014)
51 http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-intro.html (18/12/2014)

Regions

AWS is divided into multiple, mutually independent regions located around the world. The isolation between them makes it possible to design highly available applications that span the globe and respond to users with low latency.52 The map of current regions is presented in Figure 1.

Figure 1 AWS Global Infrastructure (Regions) (Source: http://aws.amazon.com/ (22/01/2015))

By selecting the region closest to the users, it is possible to deliver the best experience by minimizing latency. The division into regions also makes it easy to start and stop new applications in different geographical regions as needed. It makes it possible to “fail fast”, that is, to try out new projects that would have been too expensive in a traditional datacenter.

Another advantage of using multiple regions is data privacy. Many companies are required to store data in a specific region. The European Union, for example, restricts the transfer of personal data about its residents outside Europe. In this case, eu-west-1 (Dublin) or eu-central-1 (Frankfurt) would be the best choice. The specific regions and locations are listed in Figure 2.

Region           Location
ap-northeast-1   Asia Pacific (Tokyo)
ap-southeast-1   Asia Pacific (Singapore)
ap-southeast-2   Asia Pacific (Sydney)
eu-central-1     EU (Frankfurt)
eu-west-1        EU (Ireland)
sa-east-1        South America (Sao Paulo)
us-east-1        US East (N. Virginia)
us-west-1        US West (N. California)
us-west-2        US West (Oregon)

Figure 2 List of AWS Regions and Locations (Source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html (28/11/2014))

There is also one additional region called GovCloud, which is specifically designed to store data for the U.S.

government. It is located in the Northwestern United States.
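Selecting the regions that satisfy a data-residency rule can be sketched as a simple filter over the table in Figure 2. The region codes and locations are taken from that table; the helper function and the prefix-based rule are illustrative assumptions, not part of any AWS API.

```python
# Region codes and locations as listed in Figure 2.
REGIONS = {
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "eu-central-1": "EU (Frankfurt)",
    "eu-west-1": "EU (Ireland)",
    "sa-east-1": "South America (Sao Paulo)",
    "us-east-1": "US East (N. Virginia)",
    "us-west-1": "US West (N. California)",
    "us-west-2": "US West (Oregon)",
}


def regions_with_prefix(prefix: str) -> list:
    """Return the region codes that start with the given prefix, sorted."""
    return sorted(code for code in REGIONS if code.startswith(prefix))


# Candidate regions for data that has to stay in Europe.
print(regions_with_prefix("eu-"))  # ['eu-central-1', 'eu-west-1']
```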

52 Brian Beach, Pro PowerShell for Amazon Web Services, Apress, 2014, p. 1


Regions make it possible to deliver an application from the location closest to its users and to build redundant applications served from multiple regions. Amazon Web Services also offers another layer of redundancy, called availability zones.

Availability zones

Each region is divided into two or more availability zones (see Figure 3). Each availability zone (AZ) within a region is a separate data center. The zones are isolated from each other’s failures but connected to each other by high-speed, low-latency links. Each AZ has separate power, cooling, and Internet access. Additionally, their locations are chosen so that they are never in the same flood plain, etc. This allows designing highly available applications that span multiple data centers.53

Figure 3 AWS - Availability Zones (Source: Brian Beach, Pro PowerShell for Amazon Web Services, Apress, 2014, p. 3)

Regions and availability zones are two layers of separation that make it possible to build highly available, low-latency applications that would not have been possible in a pre-cloud data center. Only a handful of companies around the globe have the resources to match this functionality in their own data centers.

Services

Amazon Web Services can be grouped into five major categories: Management, Storage, Network, Compute and Monitoring (see Figure 4). Amazon currently provides more services than are presented in this figure, but these are the most important ones and they are also essential for the thesis’s project.54

53 Ibid., p. 3
54 For complete list of AWS services and products see: https://aws.amazon.com/products/ (22/01/2015).

Figure 4 AWS – Services (Source: Brian Beach, Pro PowerShell for Amazon Web Services, Apress, 2014, p. 3)

The services are accessed over HTTP, using the REST architectural style and SOAP protocol. All services are

billed based on usage, but how usage is measured for billing varies from service to service.

Management

The services in the management category are used to access and configure AWS.55

AWS Management Console – a web GUI for configuring AWS.

Identity and Access Management (IAM) – controls access to an account. An administrator can create users and groups and write policies to control access to resources.

Storage

Multiple storage options are listed at the bottom of Figure 4.56

Elastic Block Store (EBS) – a storage area network used to create disks for instances. It is a network-based solution similar to iSCSI. It is possible to create volumes from 1 GB to 1 TB in size and to manage their I/O operations per second (IOPS).

Simple Storage Service (S3) – highly durable object storage in the cloud. It is used to store an unlimited number of objects of up to 5 TB each (a single upload request is limited to 5 GB). S3 uses HTTP/S to read and write objects. It has 99.999999999% durability.

55 Ibid., p. 4
56 Ibid., p. 5

Amazon Glacier – a low-cost, cold storage solution. Glacier offers the same high durability as S3 for about 1/10 of the cost, but it stores data offline and requires advance notice to access the data. This is a great alternative to tape backup.
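To give a feeling for what the 99.999999999% durability figure quoted for S3 means, it can be turned into an expected loss rate. The sketch below assumes that the percentage is read as the probability that a single object survives one year; the number of stored objects is a hypothetical example.

```python
from fractions import Fraction

# 99.999999999% durability, read here as a per-object annual survival probability.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)

annual_loss_probability = 1 - DURABILITY  # per object, per year

stored_objects = 10_000_000               # hypothetical example
expected_losses_per_year = stored_objects * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(annual_loss_probability)  # 1/100000000000
print(years_per_lost_object)    # 10000
```

Under these assumptions, a store of ten million objects would be expected to lose a single object about once every ten thousand years.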

Network

In the middle of Figure 4 there are multiple network services that work together.57

Virtual Private Cloud (VPC) – makes it possible to create a private network that isolates instances from those of other AWS tenants, with a custom network topology and control over network security.

Elastic Load Balancers (ELB) – balance traffic between multiple servers across availability zones. It is possible to create public ELBs on the Internet or to use a private ELB to balance traffic between the layers of a multitier application.

Route 53 – Amazon’s managed DNS solution. It can balance traffic between multiple regions: AWS determines which region is closest to the user and routes the user there automatically.

Compute

At the top of the stack there are two compute services.58

Elastic Compute Cloud (EC2) – Amazon’s virtual server service. It is used to launch servers, called instances, in the cloud. EC2 offers thousands of images and hardware configurations.

Relational Database Service (RDS) – Amazon’s managed database service. RDS supports MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. Users can install any of these on an EC2 instance themselves, but with RDS Amazon handles the administration for them.

Monitoring

Finally, there is a collection of monitoring services.59

CloudWatch – used to monitor the environment. CloudWatch makes it possible to create custom alarms and to define what actions to take when an issue arises. For example, it can raise an alarm when CPU utilization stays above 80% for an extended period of time.

Auto Scaling – combined with CloudWatch, makes it possible to respond automatically to changing conditions. For example, an application can be set up so that new instances are launched automatically when it is under high load.

Simple Notification Service (SNS) – Amazon’s notification system. CloudWatch can publish messages to SNS whenever an alarm occurs. SNS can deliver events by e-mail, SMS text messages, and many other channels.
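The interplay of an alarm and automatic scaling can be sketched in a few lines. The code below is a toy model, not the CloudWatch or Auto Scaling API: the 80% threshold comes from the CloudWatch example in the text, while the three-sample window, the function names and the sample values are assumptions made for illustration.

```python
from typing import List

CPU_THRESHOLD = 80.0   # percent, as in the CloudWatch example in the text
SUSTAINED_SAMPLES = 3  # assumed length of the "extended period"


def alarm_triggered(cpu_samples: List[float]) -> bool:
    """True if the most recent SUSTAINED_SAMPLES readings all exceed the threshold."""
    recent = cpu_samples[-SUSTAINED_SAMPLES:]
    return len(recent) == SUSTAINED_SAMPLES and all(
        sample > CPU_THRESHOLD for sample in recent
    )


def desired_instances(current: int, cpu_samples: List[float], maximum: int = 10) -> int:
    """Add one instance while the alarm is active, up to a fixed maximum."""
    if alarm_triggered(cpu_samples):
        return min(current + 1, maximum)
    return current


print(desired_instances(2, [40.0, 85.0, 90.0, 95.0]))  # sustained high load
print(desired_instances(2, [40.0, 85.0, 60.0, 95.0]))  # short spikes only
```

Requiring several consecutive readings above the threshold is one simple way to keep short spikes from launching instances that would be idle a moment later.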

57 Ibid., p. 6
58 Ibid., p. 6
59 Ibid., p. 7

Chapter II: Solutions of the problem

Although the problem of utilizing a hybrid cloud is a new topic in the IT world, over the last few years it has stimulated the emergence of new ideas, new businesses and new software. One of the novelties that cloud computing made possible and facilitated is the wide range of configuration management and deployment automation software. Outsourcing infrastructure in the cloud computing model also required changes in the mindset, methods and tooling of system administrators, software architects and developers.

Among the existing solutions to the problem are tools integrated into public and private clouds (e.g., OpenStack Orchestration) and tools that can easily be employed as glue between a public and a private cloud. Although there are many software stacks and tools for this purpose, their creation was inspired by a new movement that has sprung up among IT professionals working with cloud computing: DevOps.

DevOps

Many individual aspects and traits of DevOps have been well known for years, whereas others are new. It started as a movement that addressed the conflict of motivation between the software development department and the operations (systems administration) department in many companies. The conflict results from the different goals and incentives of the two departments. DevOps emerged as a set of practices and tools for improving collaboration between development and operations. It integrates the complete delivery process in a holistic way by providing processes and tools that apply Agile approaches to all parts of the software delivery process.60

Figure 5 Basic elements of DevOps software development method

DevOps in fact is not proper name for this software development and delivery method because it is lacking in

its name third crucial element: Quality Assurance. Perhaps “DevOpsQA” would be better one. As Figure 5

presents Development, Operations and Quality Assurance and three deeply interrelated elements. The

common sphere between those three circles can be understood as an essence of DevOps.

60 Michael Huttermann, DevOps for Developers, Apress, 2012, p. 12

[Figure 5 diagram: three overlapping circles labeled Development, Operations and Quality Assurance]


Agile

Agile is a set of methods and methodologies for IT teams. It enables them to organize work more efficiently and to make better business and engineering decisions. Agile covers the entire software development process, including project management, software design and process improvement. Agile practices are usually designed to be easy to use and adopt.

Agile requires from its users the right mindset more than the right skills or tools. The mindset determines how effectively a team uses the practices. An Agile mindset facilitates sharing information within a team, so that its members can make important project decisions together, instead of having a manager who makes all of those decisions alone. It is about opening up planning, design, and process improvement to the entire team.61

DevOps has a strong affinity with the Agile approach. The traditional view of operations treated the “Dev” side as the “makers of a system” and the “Ops” side as the “people that take care of the system in production”. Over the last decade, especially after the introduction of cloud computing, the IT industry has realized the harm done by treating these two as separate silos.

DevOps can be understood as an extension of Agile that prescribes close collaboration of customers, product management, developers, operations and QA to iterate quickly towards a better product. Service delivery and its configuration are a fundamental part of the value for the customer, and thus the product team needs to include those concerns as a top-level item in an Agile-driven project. From this perspective, DevOps simply extends Agile principles beyond the boundaries of “the code” to the entire delivered service.62

Infrastructure as a Code

Infrastructure was automated long before the emergence of Agile methodologies and the DevOps movement. In the old days, however, servers were mainly handcrafted by an individual engineer, whose scripts (if there were any) were unreadable to others.

As Mike Loukides has put it:

“Perl was designed as a programming language for automating system administration. It didn’t take

long for leading-edge sysadmins to realize that handcrafted configurations and nonreproducible

incantations were a bad way to run their shops.”63

In recent years, new ideas and new software in the field of configuration management have emerged and developed to replace both manual configuration and old-style shell and Perl automation scripts. The central idea of the new tooling was to enable close collaboration between developers and operations engineers. Its aim was also to provide higher transparency in complex infrastructure installations. This problem was addressed because the number of such installations has been growing rapidly in recent years. With increasing

61 Andrew Stellman, Jennifer Greene, Learning Agile, O'Reilly Media, 2014, p. 2

62 http://theagileadmin.com/what-is-devops/ (06/01/2015)

63 http://radar.oreilly.com/2012/06/what-is-devops.html (23/01/2015)


complexity and more sophisticated integration of the layers of IT systems, developers were required to understand operations, and operations engineers to know the development process. The infrastructure as code paradigm can help to achieve these goals.64

Thanks to Agile, new methods of developing software emerged: continuous integration, test-driven development, build and deployment automation, and others. All of them were created mostly to automate as many parts of the lifecycle of a software product as possible. At the beginning, however, the main focus was on the software itself, and the infrastructure on which the software runs was often perceived as a separate problem.

From a traditional perspective, infrastructure comprises items such as operating systems, servers, switches, and routers. It covers all of the environments of an organization together with supporting services, such as firewalls and monitoring services. In the Infrastructure as a Code context, infrastructure often includes every part of the solution that is not the developed software application itself. In that sense, infrastructure includes the middleware: web and application servers, databases, load balancers, configuration files, software packages that are part of the operating system, crontabs, users, groups, etc.

However, infrastructure is set up and changed over time, even before the software goes into production. In cloud computing environments it has become more common to rebuild the infrastructure from scratch. This created a need to document the infrastructure well and to find a way of setting it up automatically.65

Infrastructure as code is a powerful concept and approach that promises to help repair the split-brain

phenomenon witnessed so frequently in organizations where developers and system administrators view

each other as enemies, to the detriment of the common good. Through co-design of the infrastructure code

that runs an application, we give operational responsibilities to developers. By focusing on design and the

software lifecycle, we liberate system administrators to think at higher levels of abstraction. These new

aspects of our professions help us succeed in building robust, scaled architectures. We open up a new way of

working—a new way of cooperating—that is fundamental to the emerging DevOps movement.66

Infrastructure as code emphasizes the need to handle the setup of infrastructure in the same way as the development of code: by picking the right language or tool for the job and developing a solution that suits the needs, making it an executable specification that can be applied to target systems efficiently and repeatedly.
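The notion of an executable specification that can be applied repeatedly can be illustrated with a minimal sketch in plain Ruby. It does not use Chef's API: the FileSpec class and its converge method are hypothetical names introduced only for this illustration. The sketch describes the desired state of a single file and changes the system only when the actual state differs, so applying it a second time has no further effect.

```ruby
require 'tmpdir'

# Hypothetical, simplified "resource": the desired content of one file.
# Real tools such as Chef offer far richer resources; this only shows the idea.
class FileSpec
  def initialize(path, content)
    @path = path
    @content = content
  end

  # Bring the file to the desired state.
  # Returns :updated when a change was made and :unchanged otherwise.
  def converge
    return :unchanged if File.exist?(@path) && File.read(@path) == @content

    File.write(@path, @content)
    :updated
  end
end

# Apply the same specification twice inside a temporary directory,
# so that nothing on the real system is touched.
def demo_converge
  Dir.mktmpdir do |dir|
    spec = FileSpec.new(File.join(dir, 'motd'), "Managed by code\n")
    [spec.converge, spec.converge]
  end
end

p demo_converge # [:updated, :unchanged]
```

The first run writes the file and the second run detects that nothing needs to change, which is the property that makes repeated application safe.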

64 Michael Huttermann, op. cit., p. 136

65 Michael Huttermann, op. cit., p. 135

66 Stephen Nelson-Smith, Test-Driven Infrastructure with Chef, 2nd Edition, O'Reilly Media, 2013, p. 5


Chapter III: Description of the project

My solution to the problem stated in the thesis (how to utilize a hybrid cloud based on OpenStack and Amazon Web Services?) is a project based on a popular Ruby-based DevOps software stack: Chef and Vagrant. The project consists of one cookbook and four Vagrant configuration files. All of the cookbook's code and the Vagrant files are stored in a Git repository at bitbucket.com. The cookbook installs and configures a Koji cluster (see Appendix A: Koji build system), a software stack dedicated to building RPM packages.

Assumptions and requirements

To work properly, the system has specific requirements that have to be fulfilled before it is run. Firstly, the project was tested on the CentOS Linux operating system.67 The user of the system has to have access to OpenStack and AWS clouds, install the necessary software on his computer and set up some environment variables. Additionally, if the user would like to develop the project further, another set of tools is required.

The project assumes that its user has access to an OpenStack cloud and to Amazon Web Services. The former can be installed on any Linux box.68 The latter can be obtained by registering on Amazon’s web site.69 Additionally, the user needs access to a Chef Server, either a local one or Hosted Chef Server.70 The last thing that the user needs to

67 The project will most probably work on other Linux distributions without many changes. In the case of Microsoft Windows or Apple Mac OS X, significant changes would be required to run it.

68 RDO OpenStack documentation: https://openstack.redhat.com/Quickstart (23/01/2015)

69 AWS registration form: https://portal.aws.amazon.com/gp/aws/developer/registration/index.html (23/01/2015)

70 More information about Chef Server and Hosted Chef: https://www.chef.io/chef/choose-your-version/ (23/01/2015)


install on his computer to use the project is Vagrant.71 Because some data has to remain private and secure for each individual user of the project’s Vagrant configuration files, this kind of information is kept in the shell’s environment variables. Other variables are also kept there for convenience of configuration.

For the project to run properly, the environment variables listed in Figure 6 have to be exported with valid values. The script with environment variables is divided into four sections: Logging, AWS, OpenStack and Chef. Each section contains a number of variables; the bolded ones are those whose values have to be provided by the user of the script.

71 Vagrant can be downloaded from its website: https://www.vagrantup.com/ (23/01/2015)

# Logging
export VAGRANT_LOG=debug
export CHEF_LOG=debug
export VAGRANT_OPENSTACK_LOG=debug

# AWS
export EC2_ACCESS_KEY=USER_ACCESS_KEY
export EC2_SECRET_KEY=USER_SECRET_KEY
export EC2_URL=http://ec2.amazonaws.com
export S3_URL=https://s3.amazonaws.com:443
export AWS_ACCESS_KEY="${EC2_ACCESS_KEY}"
export AWS_SECRET_KEY="${EC2_SECRET_KEY}"
export AWS_KEYPAIR=USER_KEYPAIR
export AWS_PRIVATE_KEY_PATH=$HOME/.ssh/user_key
export AWS_SSH_USERNAME=ec2-user
export AWS_AMI_IMAGE=ami-799cf410
export AWS_REGION=us-east-1
export AWS_INSTANCE_TYPE=t1.micro
export AWS_SECURITY_GROUP=SEC_GROUP

# OpenStack
export OS_USERNAME=OPENSTACK_USERNAME
export OS_TENANT_NAME=OPENSTACK_TENANT_NAME
export OS_PASSWORD=OPENSTACK_PASSWORD
export OS_IP=OPENSTACK_IP_ADDRESS
export OS_AUTH_URL=http://$OS_IP:5000/v2.0/
export OS_PUBLIC_KEY_PATH=$HOME/.ssh/user_key.pub
export OS_PRIVATE_KEY_PATH=$HOME/.ssh/user_key
export OS_SSH_USERNAME=centos
export OS_FLAVOR=m1.small
export OS_IMAGE='CentOS'
export OS_FLOATING_IP_POOL=public

# Chef
export CHEF_ORG=USER_CHEF_ORG_NAME
export CHEF_SERVER=https://api.opscode.com/organizations/$CHEF_ORG
export CHEF_VALIDATION_CLIENT_NAME=koji-validator
export CHEF_VALIDATION_KEY_PATH=$HOME/.chef/koji-validator.pem

Figure 6 Shell environment variables


The first section contains variables controlling the verbosity of the output: “debug” means that Vagrant, Chef and vagrant-openstack-provider will print a lot of information during a run.

The second section contains the configuration of Amazon Web Services. Five variables in this section require the user to provide values: his AWS access key and secret key, which allow authorization in AWS, and the key pair name and path to the private key, so that Vagrant can log into the instance to configure it. The last variable is the security group, which has to have ports 22 (SSH) and 80 (Koji) open.

The third section is related to the OpenStack configuration. Here the user has to provide his username and password, the tenant name and the IP address of the OpenStack endpoint. As with AWS, the user has to provide a public and a private key to enable Vagrant to log into the instance over SSH.

The last section is dedicated to Chef. Here there are two possibilities, depending on the type of Chef Server used. If the user has access to Hosted Chef Server, then it is enough to provide the Chef organization’s name. Otherwise, that is when Chef Server is installed locally, the entire CHEF_SERVER value has to be changed. Additionally, the user has to make sure that the name and path of his koji-validator key are correct.

Assuming that the file with the environment variables is named projectrc, the following command has to be used to export them: source projectrc.
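Because the Vagrant configuration files read these values with ENV['...'], a variable that was never exported silently becomes an empty string. The following minimal Ruby sketch shows one way such a mistake could be detected early. The helper missing_vars and the list REQUIRED_VARS are hypothetical and not part of the project; the variable names themselves are taken from Figure 6.

```ruby
# Variables from Figure 6 that the Vagrant configuration reads and that
# depend on values supplied by the user (an illustrative selection).
REQUIRED_VARS = %w[
  AWS_ACCESS_KEY AWS_SECRET_KEY AWS_KEYPAIR AWS_PRIVATE_KEY_PATH
  AWS_SECURITY_GROUP OS_USERNAME OS_PASSWORD OS_TENANT_NAME
  OS_AUTH_URL CHEF_SERVER
].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank in the given environment (ENV by default).
def missing_vars(env = ENV, required = REQUIRED_VARS)
  required.select { |name| env[name].nil? || env[name].strip.empty? }
end

missing = missing_vars
if missing.empty?
  puts 'All required variables are set.'
else
  warn "Missing variables: #{missing.join(', ')} (was projectrc sourced?)"
end
```

A check of this kind could be run before vagrant up; it only reports the problem and does not change the environment.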

Development environment

Ruby

The project can be developed further. However, this requires installing specific development tools. The software tools used in the project depend on the Ruby execution environment (interpreter)72 and a few Ruby libraries (called “gems”)73.

Figure 7 lists the Ruby gems that are required for the project. The easiest way to provide them is to install and run Bundler. “Gemfile” is the name of the file that is used by Bundler to

72 Ruby is also included in ChefDK, so it doesn’t have to be installed separately.

73 More on Ruby gems: https://rubygems.org/pages/about (25/01/2014)

source 'https://rubygems.org'

gem 'berkshelf'
gem 'test-kitchen'
gem 'kitchen-vagrant'
gem 'knife-ec2'
gem 'knife-openstack'
gem 'aws-sdk'
gem 'fog'
gem 'chef-api'
gem 'chef'

Figure 7 Gemfile


download and install gems. The user then has to run the command bundle install. Some gems have specific software requirements (for instance, some of them contain source code written in C that has to be compiled, so they need GCC and possibly Make or another compilation toolchain).

Bundler ensures that the appropriate dependencies are installed for a given project without unpleasant ordering issues or cyclical dependencies. Because a Gemfile is easy to share, a software project can be shared with other developers, machines or environments with confidence that the application and its dependencies will behave in the same way.74

ChefDK

Chef Development Kit (ChefDK) defines a common workflow for cookbook development, including unit and integration testing, identifying lint-like behavior, and dedicated tooling.75

It includes:

Cookbook dependency manager Berkshelf.

Test Kitchen integration testing framework.

ChefSpec, for cookbook's unit testing.

Foodcritic, a linting tool for doing static code analysis on cookbooks.

All of the basic Chef tools: Chef Client, Knife, Ohai and Chef Zero.

System’s design

The assumptions and requirements already give an impression of how the system is designed. The design of this project resembles a deployment flow: first, instances are created on the OpenStack and AWS clouds, then the Koji software packages are installed, then the specific configuration of the Koji cluster is applied, and at the end Koji is started and tests are conducted.

Deployment flow

The execution of the system is best presented as an ordered flow of actions leading to a successful deployment of a Koji cluster on a hybrid cloud. Figure 8 visualizes the basic flow of this deployment.

Firstly, the user of the system has to download the sources from the Git repository using the command git clone [email protected]:tomasz_klosinski/koji-cookbook.git (provided that he or she has access to it). After downloading the “sources” of the cookbook and the infrastructure configuration, the user has to enter the directory that contains them. The next step is running the command vagrant up, which starts the process of deploying the infrastructure. In this step Vagrant starts executing the Vagrantfile, which contains all the necessary information about the machines.

74 Stephen Nelson-Smith, op. cit., p. 37

75 https://docs.chef.io/#chef-dk-title (25/01/2015)


Figure 8 Flow of deployment

Instances on the clouds are started in their order of appearance in the Vagrantfile. Vagrant first starts an instance with the Koji hub on Amazon EC2 and then an instance with the Koji builder on OpenStack. After the instances are created, Vagrant uses the SSH protocol to log into them. First, Vagrant copies the cookbooks (using rsync); then it runs Chef Client to execute them. Chef Client registers a new “node” in Chef Server and provides basic information about it (collected by Ohai). After the cookbooks have been executed, the Koji cluster is ready for use.

System’s implementation

The implementation of the system is based on three DevOps technologies running on the Linux operating system: Git, Vagrant and Chef. Git is used with a remote repository at Bitbucket.com. Four plugins are installed to extend Vagrant’s configuration possibilities. Chef is used by Vagrant to provision software on the cloud instances. Chef cookbooks and information about nodes are stored on Hosted Chef Server.

Git

A version control system (source code manager) is a central component of the project. It is not a mere addition to the project; it is its heart. It helps to stay sane when dealing with important files and collaborating on them. Using version control is a fundamental part of any infrastructure automation.


The entire history of changes to the project files is kept in a Git repository. The local repository is synchronized with a remote one at Bitbucket.com. Figure 9 presents the commit history of the Git repository in Bitbucket’s web interface.

Figure 9 Git: Bitbucket.com repository

Although it is possible to develop code without version control, in practice this is highly inefficient, especially when a project is developed by a group of developers and when it has different versions (which can be kept in separate branches). The thesis project had only one author, but it had a few versions. Additionally, version control makes it much easier to roll back changes that were a mistake.

Vagrant

Vagrant is a tool that handles running the instances on the OpenStack and AWS clouds and starts Chef Client on them, which in turn handles the provisioning of Koji.

Vagrant enables configurable virtual infrastructure environments that are easy to reproduce. Such an environment can be built using many hypervisors or cloud providers and many provisioning technologies in a single consistent workflow. Machines (virtual machines or cloud instances76) are provisioned on top of VirtualBox, VMware, AWS, or any other provider. Provisioning tools such as shell scripts, Chef, or Puppet can then be used to automatically install and configure software on them.77

Vagrantfile

Each Vagrant project has a file that contains the definitions of virtual machine instances; it is called Vagrantfile. In this file we also configure the connection to the VM provider (hypervisor or cloud) and the software provisioning (shell scripts and Chef).

Usually there is one Vagrantfile per project, although there may be more, in which case the active one is selected by the VAGRANT_VAGRANTFILE shell variable. The Vagrantfile should be kept under version control, which makes it easier to share the environment definition and collaborate on it. A Vagrantfile works the same way on each platform that

76 In the thesis these two names are used interchangeably.

77 http://docs.vagrantup.com/v2/why-vagrant/index.html (12/11/2014)


Vagrant supports. The syntax of Vagrantfiles is Ruby, but knowledge of this programming language is not necessary to make modifications, since they consist mostly of simple variable assignments.78

Vagrant plugins

Vagrant is extensible through plugins.79 The thesis project uses four Vagrant plugins. They are described in Figure 10.

Plugin name | Description

vagrant-omnibus | Ensures the desired version of Chef is installed via the platform-specific Omnibus packages.80

vagrant-berkshelf | Adds Berkshelf integration to the Chef provisioners. Vagrant Berkshelf will automatically download and install cookbooks onto instances.81

vagrant-aws | Adds an AWS provider to Vagrant, allowing Vagrant to control and provision machines in EC2 and VPC.82

vagrant-openstack-provider | Adds an OpenStack Cloud provider to Vagrant, allowing Vagrant to control and provision machines within OpenStack cloud.83

Figure 10 Vagrant: List of plugins used

Vagrant plugins are usually installed manually using the command vagrant plugin install NAME. However, this process can be automated in the Vagrantfile, and Figure 11 shows how to achieve that. In this block of code a list of required plugins is defined first. Then, using Ruby’s each loop, the list is iterated over, and each plugin element is installed unless it is already installed.
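The check-then-install logic of Figure 11 can also be expressed as a small function that is independent of Vagrant, which makes it easy to reason about. This is only a sketch: the helper plugins_to_install is a hypothetical name, and the list of already installed plugins is an assumed example state rather than the result of querying Vagrant.

```ruby
# Hypothetical helper: which of the required plugins still need installing?
def plugins_to_install(required, installed)
  required.reject { |plugin| installed.include?(plugin) }
end

required = %w[vagrant-omnibus vagrant-berkshelf vagrant-aws vagrant-openstack-provider]
installed = %w[vagrant-aws] # assumed example state, not queried from Vagrant

plugins_to_install(required, installed).each do |plugin|
  # In a Vagrantfile this is the place where the install command would run.
  puts "would run: vagrant plugin install #{plugin}"
end
```

One caveat of installing plugins from inside the Vagrantfile is that a freshly installed plugin may only be loaded by the next Vagrant invocation, so the first vagrant up after installation may have to be repeated.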

Common elements

Once all plugins are installed, we can start analyzing the main part of the Vagrantfile. Firstly, we have to check whether the proper version of Vagrant is installed.84 This is necessary since there was a break in the API between the second

78 http://docs.vagrantup.com/v2/vagrantfile/index.html (12/11/2014)

79 Full list of Vagrant plugins is available on Vagrant’s wiki: https://github.com/mitchellh/vagrant/wiki/Available-Vagrant-Plugins (27/11/2014)

80 https://github.com/chef/vagrant-omnibus (27/11/2014)

81 https://github.com/berkshelf/vagrant-berkshelf (27/11/2014)

82 https://github.com/mitchellh/vagrant-aws (27/11/2014)

83 https://github.com/ggiamarchi/vagrant-openstack-provider (27/11/2014)

84 http://docs.vagrantup.com/v2/vagrantfile/vagrant_version.html (27/11/2014)

# Install every required plugin that is not installed yet
required_plugins = %w( vagrant-omnibus vagrant-berkshelf vagrant-aws vagrant-openstack-provider )
required_plugins.each do |plugin|
  system "vagrant plugin install #{plugin}" unless Vagrant.has_plugin? plugin
end

Figure 11 Vagrantfile: plugins installation


(1.1+) and the first version (1.0.x) of Vagrant.85 Since then, Vagrant.configure takes the version of the API as an argument (in our case it is “2”).

Vagrant.configure is the main element of the Vagrantfile. All configuration is placed in this Ruby block. The file is too long to place in this section of the thesis; for convenience of reading it is explained excerpt by excerpt. The full file is provided in Appendix B: Project’s Vagrant files.

85 http://docs.vagrantup.com/v2/vagrantfile/version.html (27/11/2014)

VAGRANTFILE_API_VERSION = "2"
Vagrant.require_version ">= 1.5.0"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  (...)
end

Figure 12 Vagrantfile: API’s and syntax’s version
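The meaning of a version constraint such as ">= 1.5.0" can be tried out without Vagrant by using the version classes that ship with RubyGems. The sketch below is only an illustration of how such a constraint is evaluated: the helper version_ok? is a hypothetical name, and the installed version numbers are example values.

```ruby
require 'rubygems' # Gem::Version and Gem::Requirement ship with Ruby

# Hypothetical helper: does the installed version satisfy the constraint?
def version_ok?(installed, constraint)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(installed))
end

puts version_ok?('1.7.2', '>= 1.5.0') # true
puts version_ok?('1.0.7', '>= 1.5.0') # false
```

Comparing Gem::Version objects instead of plain strings matters because a string comparison would, for example, rank "1.10.0" below "1.5.0".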

We start the configuration by setting three variables on the block variable “config”: nfs.functional, ssh.pty, and berkshelf.enabled. The first one disables NFS synchronization of the “vagrant” directory, which means that rsync will be used instead. The second forces Vagrant to use a pseudoterminal (pty) in the SSH session. The third enables the vagrant-berkshelf plugin to handle cookbook management.

Further in the Vagrantfile come the definitions of instances (virtual machines), each of which starts with vm.define with the name of the instance as an argument. Inside the block of a VM definition there are the details of its configuration (including networking, software provisioning and others).

Figure 14 shows a fragment of the definition of two virtual machines: “kojihub” and “kojibuilder”. Both have three variables set. The first is their hostname. The second is their synced folder (the directory to copy to the instance). In our case we synchronize the current directory (indicated by “.”) to the /vagrant directory on the VMs using rsync. The third variable is related to the Chef provisioner: Vagrant installs the latest Chef on the virtual machines using the Omnibus installer.

VMs are created in their order of appearance in the Vagrantfile. Therefore “kojihub” is created first on Amazon EC2 and “kojibuilder” second on OpenStack.

# Koji hub on Amazon EC2
config.vm.define "kojihub" do |kojihub|
  kojihub.vm.hostname = "kojihub"
  kojihub.vm.synced_folder ".", "/vagrant", type: "rsync"
  kojihub.omnibus.chef_version = :latest
  (...)
end

# Koji builder on OpenStack
config.vm.define "kojibuilder" do |kojibuilder|
  kojibuilder.vm.hostname = "kojibuilder"
  kojibuilder.vm.synced_folder ".", "/vagrant", type: "rsync"
  kojibuilder.omnibus.chef_version = :latest
  (...)
end

Figure 14 Vagrantfile: Definitions of the VMs

config.nfs.functional = false
config.ssh.pty = true
config.berkshelf.enabled = true

Figure 13 Vagrantfile: Config specific to installation


Kojihub instance

Up to this point in the Vagrantfile, the configuration of the first instance has been the same as that of the second. From here on, the configurations are explained one machine after another. The details of each machine are presented in the following excerpts, with comments on the given part of the code.

Figure 16 shows two variables regarding the VM image (“box” in Vagrant’s nomenclature) that Vagrant typically uses to run a virtual machine on a given hypervisor. Since in cloud environments the images are already provided, we set up a dummy box supplied by the developer of the provider’s plugin.

The next part of the VM configuration is the provider. Since we would like to run the Koji hub on Amazon’s cloud, we have to configure the details of the connection. Most of the values are already set in environment variables.

Now we just have to provide additional configuration in the user_data variable. When you launch an instance in

Amazon EC2, you have the option of passing user data to the instance that can be used to perform common

automated configuration tasks and even run scripts after the instance starts. You can pass two types of user



(...)
kojihub.vm.box = "dummy.box"
kojihub.vm.box_url = "https://github.com/mitchellh/vagrant-aws/raw/master/dummy.box"
(...)

Figure 16 Vagrantfile: Amazon dummy box

# AWS provider
kojihub.vm.provider :aws do |aws, override|
  aws.access_key_id = "#{ENV['AWS_ACCESS_KEY']}"
  aws.secret_access_key = "#{ENV['AWS_SECRET_KEY']}"
  aws.keypair_name = "#{ENV['AWS_KEYPAIR']}"
  override.ssh.private_key_path = "#{ENV['AWS_PRIVATE_KEY_PATH']}"
  override.ssh.username = "#{ENV['AWS_SSH_USERNAME']}"
  aws.ami = "#{ENV['AWS_AMI_IMAGE']}"
  aws.region = "#{ENV['AWS_REGION']}"
  aws.instance_type = "#{ENV['AWS_INSTANCE_TYPE']}"
  aws.security_groups = ["#{ENV['AWS_SECURITY_GROUP']}"]
  # Shell script executed on the instance at first boot
  aws.user_data = "#!/bin/bash
echo 'Defaults:#{ENV['AWS_SSH_USERNAME']} !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty
chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty
mkdir -p /etc/chef/ohai/hints
touch /etc/chef/ohai/hints/ec2.json"
end

Figure 15 Vagrantfile: AWS provider


data to Amazon EC2: shell scripts and cloud-init directives. You can also pass this data into the launch wizard

as plain text, as a file (this is useful for launching instances via the command line tools), or as base64-encoded

text (for API calls).86

In our case we provide a simple shell script that handles two things: firstly, it configures sudo so that it doesn’t require a terminal (tty) from our user; secondly, it creates an empty /etc/chef/ohai/hints/ec2.json file, which is required by Chef’s Ohai to collect instance metadata from Amazon EC2.
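Since the same kind of script is needed for both clouds, it could be generated by a small helper instead of being written out twice. The following Ruby sketch shows one possible way to do that. The function build_user_data is hypothetical and does not appear in the project; the paths are the ones used in the project's user data script.

```ruby
# Hypothetical helper that assembles the first-boot shell script.
# ssh_user: login name used by Vagrant; hint: 'ec2' or 'openstack'.
def build_user_data(ssh_user, hint)
  sudoers_file = '/etc/sudoers.d/999-vagrant-cloud-init-requiretty'
  <<~SCRIPT
    #!/bin/bash
    echo 'Defaults:#{ssh_user} !requiretty' > #{sudoers_file}
    chmod 440 #{sudoers_file}
    mkdir -p /etc/chef/ohai/hints
    touch /etc/chef/ohai/hints/#{hint}.json
  SCRIPT
end

puts build_user_data('ec2-user', 'ec2')
```

The squiggly heredoc (available since Ruby 2.3) strips the common indentation, so the generated script still starts with the interpreter line in the first column.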

After this basic configuration of the instance, we move to the next step in the Vagrantfile: provisioning. In this part we configure Chef and use cookbooks to install and configure additional software (that is, the Koji hub in the case of this instance).

86 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html (12/11/2014)

kojihub.vm.provision "chef_client" do |chef|
  chef.node_name = "koji"
  chef.chef_server_url = "#{ENV['CHEF_SERVER']}"
  chef.validation_key_path = "#{ENV['CHEF_VALIDATION_KEY_PATH']}"
  chef.validation_client_name = "#{ENV['CHEF_VALIDATION_CLIENT_NAME']}"
  # Attributes that override the cookbooks' defaults
  chef.json = {
    "build-essential" => {
      "compiletime" => true
    },
    postgresql: {
      password: {
        postgres: '123123',
        port: 5432
      }
    },
    apache: {
      listen_ports: ['80', '443'],
      listen_address: '0.0.0.0'
    },
    selinux: {
      state: 'disabled'
    }
  }
  # Recipes executed on the node, in this order
  chef.run_list = [
    "recipe[nfs::server]",
    "recipe[iptables::disabled]",
    "recipe[koji::default]",
    "recipe[koji::test]"
  ]
  chef.delete_node = true
  chef.delete_client = true
end

Figure 17 Vagrantfile: Chef provisioning of Koji hub


Figure 17 presents the part of the Vagrantfile that is dedicated to provisioning the Koji hub and other software using Chef. The first variable defines the name of the node registered in Chef Server (Hosted Chef in our case). The next three variables are provided using environment variables: the Chef Server URL, the validation key path and the validation client name.

The next two variables are converted by Vagrant into a JSON file that is provided to Chef for configuration. The first is a Ruby hash that overrides the default attributes of cookbooks. In this example we set the build-essential cookbook to install the packages that are needed to compile Ruby gems written in C. Then we provide the password and port for PostgreSQL and the ports and listen address for Apache, and we set SELinux to disabled. The second variable is Chef’s run list, that is, the list of cookbook recipes that are to be executed on the node. In our example we install an NFS server, disable iptables, and run the default recipe of Koji and then its test recipe.
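The effect of overriding default attributes can be pictured as a recursive merge of two hashes in which the supplied values win. The Ruby sketch below is a deliberate simplification: Chef's real attribute handling has several precedence levels, the DEFAULTS hash is invented for illustration and is not taken from any cookbook, and deep_merge is a hypothetical helper.

```ruby
# Hypothetical cookbook defaults, invented for illustration only.
DEFAULTS = {
  'apache'  => { 'listen_ports' => ['80'], 'listen_address' => '127.0.0.1' },
  'selinux' => { 'state' => 'enforcing' }
}.freeze

# Recursively merge overrides into defaults; values from overrides win.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

overrides = {
  'apache'  => { 'listen_ports' => %w[80 443] },
  'selinux' => { 'state' => 'disabled' }
}

merged = deep_merge(DEFAULTS, overrides)
puts merged['apache']['listen_ports'].inspect # ["80", "443"]
puts merged['apache']['listen_address']       # 127.0.0.1
puts merged['selinux']['state']               # disabled
```

Keys that are not overridden keep their default values, which is why only the attributes that differ from the defaults have to be listed.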

At this point the configuration of the first VM ends, and we proceed to the next one.

Kojibuilder instance

The next instance has many similarities and analogous parts to the first one. Firstly, just as in the case of the first VM, we don’t need an image for the OpenStack provider and we use a dummy image (box) instead.

The next part of the VM’s configuration is the OpenStack provider details.



(...)
kojibuilder.vm.box = "dummy.box"
kojibuilder.vm.box_url = "https://github.com/cloudbau/vagrant-openstack-plugin/raw/master/dummy.box"
(...)

Figure 18 Vagrantfile: OpenStack dummy box


As in the case of the AWS instance, OpenStack requires credentials and SSH keys. Additionally, we need to provide the Keystone (OpenStack’s identity service) endpoint URL, the tenant name, the flavor, the image and the name of the floating IP pool. In our example all those values are provided in environment variables.
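One detail is worth noting here: OS_AUTH_URL in Figure 6 ends with a slash, while the provider block in Figure 19 appends "/tokens" to it, so the resulting URL contains a double slash. Whether that matters depends on the server, but it can be avoided with a small helper such as the one sketched below. The name join_url is hypothetical, and the IP address is a documentation-only example value.

```ruby
# Hypothetical helper: append a path segment without doubling the slash.
def join_url(base, segment)
  "#{base.chomp('/')}/#{segment}"
end

puts join_url('http://192.0.2.10:5000/v2.0/', 'tokens')
# http://192.0.2.10:5000/v2.0/tokens
```

The helper gives the same result whether or not the base URL carries a trailing slash.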

The next variable is related to additional (persistent) storage attached to the instance: a volume. This volume was created earlier in OpenStack. At this point we only provide its id and the device name under which it will be accessible in the instance.

The last part, analogously to the AWS case, is user data that modifies the sudoers file to enable SSH login without a terminal (tty). Next, we create the /etc/chef/ohai/hints/openstack.json file to make Chef’s Ohai collect metadata about the instance from OpenStack. At the end there is a sequence of commands that creates a partition on the attached volume, formats it with the ext4 filesystem and mounts it at /var/koji.

# OpenStack provider
kojibuilder.vm.provider :openstack do |os, override|
  os.username = "#{ENV['OS_USERNAME']}"
  os.password = "#{ENV['OS_PASSWORD']}"
  os.public_key_path = "#{ENV['OS_PUBLIC_KEY_PATH']}"
  override.ssh.private_key_path = "#{ENV['OS_PRIVATE_KEY_PATH']}"
  override.ssh.username = "#{ENV['OS_SSH_USERNAME']}"
  os.openstack_auth_url = "#{ENV['OS_AUTH_URL']}/tokens"
  os.tenant_name = "#{ENV['OS_TENANT_NAME']}"
  os.flavor = "#{ENV['OS_FLAVOR']}" # 'm1.small'
  os.image = "#{ENV['OS_IMAGE']}" # 'Fedora 20 x86_64'
  os.floating_ip_pool = "#{ENV['OS_FLOATING_IP_POOL']}" # 'public'
  # Persistent volume created earlier in OpenStack
  os.volumes = [
    {
      id: 'f9976f16-3d9d-499a-86c1-42247588b3da',
      device: '/dev/vdb'
    }
  ]
  # Shell script executed on the instance at first boot
  os.user_data = "#!/bin/bash
echo 'Defaults:#{ENV['OS_SSH_USERNAME']} !requiretty' > /etc/sudoers.d/999-vagrant-cloud-init-requiretty
chmod 440 /etc/sudoers.d/999-vagrant-cloud-init-requiretty
mkdir -p /etc/chef/ohai/hints
touch /etc/chef/ohai/hints/openstack.json
(echo o; echo n; echo p; echo 1; echo ; echo; echo w) | fdisk /dev/vdb
mkfs.ext4 /dev/vdb1
mkdir -p /var/koji
mkdir -p /var/koji/mock
mkdir -p /var/koji/tmp
mount /dev/vdb1 /var/koji"
end

Figure 19 Vagrantfile: OpenStack provider


Figure 20 shows the provisioning of the Koji builder using Chef. As with the AWS instance, the first variable defines the name of the node registered in Chef Server (Hosted Chef). The next three variables are provided using environment variables: the Chef Server URL, the validation key path and the validation client name.

Again, the next two variables are converted by Vagrant into a JSON file that is provided to Chef for configuration. The first one overrides the default attributes of cookbooks: here we set SELinux to disabled. The second variable is the list of cookbook recipes: here we install an NFS client (using the default recipe), disable iptables and run the kojid recipe of the Koji cookbook.

That is everything required to run a Koji cluster on a hybrid cloud using Amazon Web Services and OpenStack. In the next section of this chapter we will discuss Chef and the Koji cookbook.

Chef

As the discipline of software development has matured, frameworks have emerged with the aim of reducing development time by minimizing the overhead of implementing or managing the low-level details that support the development effort. This allows developers to concentrate on rapid delivery of software that meets customer requirements.

Chef is a framework for infrastructure development: a supporting structure and package of associated benefits of direct relevance to framing one’s infrastructure as code. Chef provides an extensive library of primitives for managing just about every conceivable resource that is used in the process of building up an

# Enable provisioning with chef client/

kojibuilder.vm.provision "chef_client" do |chef|

chef.node_name = "kojibuilder"




chef.json =

{

selinux: {

state: 'disabled'

}

}

chef.run_list = [

"recipe[nfs]",


"recipe[koji::kojid]"

]



end

end

Figure 20 Vagrantfile: Chef provisioning of Koji builder


infrastructure, a language for modeling infrastructure, and a consistent abstraction layer that allows developers and system administrators to design and build scalable environments without getting dragged into operating system and low-level implementation details. It also provides design patterns and approaches for producing consistent, shareable, and reusable components.87 Chef was initially written in Ruby, but the version current at the time of writing is a mixture of Erlang and Ruby.

Chef is a set of DevOps tools for managing both physical and cloud servers. Combined with a version control system, it makes it possible to create exact clones of infrastructure environments with a full change history (so that one can roll back to any version or create new branches of the infrastructure’s configuration). Thanks to Chef’s “Search” it is easy to configure applications that require knowledge about the infrastructure (for instance about cookbooks applied to other servers or their network configuration). The advantage of Chef is that once servers are automated with it, replicating the whole infrastructure becomes very easy.

Chef consists of three logical components: Server, Workstation and Node (in practice a Workstation is a special form of a node). Chef Server holds the configuration data for every node registered with it. A Workstation holds the local Chef repository (it is the Chef user’s personal computer). A node is a client that is registered with the Chef Server and has an agent, known as Chef Client, installed on it.

Cookbooks are used to automate the configuration of a node. A Chef cookbook is the basic building block of the automation. It defines a complete scenario for a node, for instance the installation of packages and their configuration; in other words, cookbooks hold the configuration that needs to be applied to a node.88

Chef Client

A Chef node needs to have an agent, known as Chef Client, installed on it. It is used to interact with the Chef Server and to pull the configuration that needs to be applied to the node.

The process conducted by Chef Client is as follows: first, it registers the node with the Chef Server; then it downloads the required cookbooks into the local cache and compiles the required recipes. Finally, it configures the node and brings it to the expected state.89

Chef Client is thus the agent that runs on systems managed by Chef and the primary mechanism by which such systems communicate with the Chef server. chef-client uses the framework’s library of primitives to configure resources on a system by talking to a central server API to retrieve data.90
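The compile and configure steps described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Chef's implementation: a resource is reduced to a name and a desired state, recipes are modelled as lambdas, and the names Resource, compile, converge and SAMPLE_RECIPES are hypothetical.

```ruby
# A resource is reduced here to a name plus the state it should end up in.
Resource = Struct.new(:name, :desired_state)

# Compile phase: evaluate every recipe and collect the resources in order.
def compile(recipes)
  recipes.flat_map(&:call)
end

# Converge phase: touch only resources whose current state differs from the
# desired one. Returns the names of the resources that were updated.
def converge(resources, system_state)
  resources.each_with_object([]) do |resource, updated|
    next if system_state[resource.name] == resource.desired_state

    system_state[resource.name] = resource.desired_state
    updated << resource.name
  end
end

# Hypothetical recipes, each modelled as a lambda returning its resources.
SAMPLE_RECIPES = [
  -> { [Resource.new('package[nfs-utils]', :installed)] },
  -> { [Resource.new('service[kojid]', :running)] }
].freeze

system_state = {}
resources = compile(SAMPLE_RECIPES)
puts converge(resources, system_state).inspect # first run updates both resources
puts converge(resources, system_state).inspect # second run finds nothing to do
```

The second call changes nothing because the modelled system is already in the expected state, which is the property that makes repeated Chef Client runs safe.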

Ohai

Ohai is a built-in tool that comes with Chef and is used to provide node attributes to the Chef Client so that a node can be configured. Chef Client requires some information about the node whenever it runs. Ohai is used to detect certain attributes of that particular node and then provide them to the Chef Client whenever

87 Stephen Nelson-Smith, op. cit., p. 50

88 Navin Sabharwal, Manak Wadhwa, Automation through Chef Opscode, Apress, 2014, p. 4

89 Navin Sabharwal, Manak Wadhwa, op. cit., p. 5

90 Stephen Nelson-Smith, op. cit., p. 51


required. Ohai can also be used as a stand-alone component for discovery purposes. Ohai can provide a

variety of details from networking to platform information.91

It is a system profiling tool that gathers large quantities of data about the system, from network and user

data to software and kernel versions. Ohai is extendable – plugins can be written (usually in Ruby) that will

furnish data in addition to the defaults. The collected data is emitted in a machine-parseable and readable

format (JSON), and is used to build up a database of facts about each system that is managed by Chef.92
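Because Ohai emits JSON, its output can be consumed by any program. The sketch below parses a heavily shortened, hypothetical sample; the attribute names (hostname, ipaddress, platform, platform_version, kernel) follow Ohai's usual keys, while the values and the helper summarize are assumptions made for illustration.

```ruby
require 'json'

# Hypothetical, heavily shortened sample of Ohai-style JSON output.
OHAI_SAMPLE = <<~JSON
  {
    "hostname": "kojibuilder",
    "ipaddress": "10.0.0.12",
    "platform": "fedora",
    "platform_version": "20",
    "kernel": { "release": "3.11.10-301.fc20.x86_64" }
  }
JSON

# Turn the collected facts into a one-line description of the system.
def summarize(ohai_json)
  facts = JSON.parse(ohai_json)
  "#{facts['hostname']} (#{facts['platform']} #{facts['platform_version']}) at #{facts['ipaddress']}"
end

puts summarize(OHAI_SAMPLE)
```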

Chef Server

The Chef Server component is written in Erlang and uses a JSON-oriented document datastore. The whole Chef framework is driven via a RESTful API, of which the Knife command-line tool is a client.

The server is open sourced, under the Apache 2.0 license, and is considered a reference implementation of

the Chef Server API. The API is also implemented as a hosted software-as-a-service offering. The hosted

version, called Hosted Chef, offers a fully resilient, highly available, multitenant environment. The platform is

free to use for fewer than five nodes, so it’s the ideal way to experiment with and gain experience with the

framework, tool, and API. A single standalone version of chef server can handle up to 10,000 nodes.

The Chef server also provides an indexing service. All information gathered about the resources managed by

Chef is indexed and searchable, meaning that Chef becomes a coordination point for dynamic, data-driven

infrastructures. It is possible to issue queries for any combination of attributes—for example, VMware

servers on VLAN 102 or MySQL slaves running CentOS 5. This opens up tremendously powerful capabilities –

a simple example would be a dynamic load balancer configuration that automatically includes the web

servers that match a given query to its pool of backend nodes.

The most important thing to understand is that the Chef server is fundamentally nothing more than a

publishing platform with an API, an index, and a dependency solver. All interactions, without exception, are

via the REST API.93

Chef Server is a centrally located server which holds all the data related to the registered nodes (i.e.,

cookbooks, the node object, and metadata). The agent (chef client) runs on each and every node, and it gets

the configuration data from the server and then applies the configuration to a particular node. This approach

is quite helpful in distributing the effort throughout the organization rather than on a single server.

There are two types of Chef server: Hosted Enterprise Chef and On Premises Chef Server.94 Additionally, Chef can be used without the client/server architecture by means of Chef Solo.

91 Navin Sabharwal, Manak Wadhwa, op. cit., p. 5

92 Stephen Nelson-Smith, op. cit., p. 51

93 Stephen Nelson-Smith, op. cit., pp. 52-53

94 https://www.chef.io/chef/choose-your-version/ (10/11/2014)


Hosted Enterprise Chef

Enterprise Chef is the paid version of the Chef server. It comes in two types of installation: an on-premise installation (i.e., in your datacenter behind your own firewall) and a hosted version in which Chef is offered as a service hosted and managed by the Chef company.

The major difference between the enterprise version and the open source version is that the enterprise version comes with high-availability deployment support and has additional reporting and security features.

On Premises Chef

The open source Chef server has most of the capabilities of the enterprise version, but it also has certain limitations. It can be installed only in stand-alone mode (i.e., it is not available in the hosted model). The open source Chef components need to be installed on a single server, and it does not offer the levels of security or the reporting capabilities available in the enterprise version.95

Search feature

Search is an essential part of Chef. It can be used from Knife or inside a cookbook.

Chef server maintains an index of your data (environments, nodes, roles). The search index makes it easy to query the indexed data and then use the result within a recipe. The query syntax supports range, wildcard, exact, and fuzzy matching. Search can be run from various places in Chef: from within a recipe, from Knife, or from the management console. The search engine in a Chef installation is based on Apache Solr.96

We can use the result of a search query in a recipe. The following code shows an example of a simple search query in a recipe:

search(:node, "attribute:value")

The result of a search query can be stored in a variable and then used anywhere within a recipe.

Figure 21 shows a fragment of the builder.rb recipe. The search query returns the servers with the recipe koji::hub applied; the recipe then iterates over the result set and writes to Chef’s log a string with the name of each found host and its IP address.

95 Navin Sabharwal, Manak Wadhwa, op. cit., p. 6

96 Navin Sabharwal, Manak Wadhwa, op. cit., pp. 90-91

kojihubs = search(:node, 'recipes:koji\:\:hub')

kojihubs.each do |node|

Chef::Log.info("#{node['hostname']} has IP address #{node['ipaddress']}")

end

Figure 21 Chef: Search in a recipe
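The query string in Figure 21 escapes the double colon of the recipe name, because a colon separates the key from the pattern in the search syntax. A minimal sketch of building such queries and formatting the results, runnable outside Chef, could look as follows; the helpers recipe_query and describe_hosts and the sample result are hypothetical, and plain hashes stand in for node objects.

```ruby
# Build a search query for nodes that have a given recipe in their run list.
# The '::' in the recipe name is escaped as '\:\:'.
def recipe_query(cookbook, recipe = 'default')
  "recipes:#{cookbook}\\:\\:#{recipe}"
end

# Format each found node the same way the recipe in Figure 21 logs it.
def describe_hosts(nodes)
  nodes.map { |n| "#{n['hostname']} has IP address #{n['ipaddress']}" }
end

# Hypothetical search result: plain hashes instead of Chef node objects.
sample_result = [{ 'hostname' => 'kojihub', 'ipaddress' => '10.0.0.5' }]

puts recipe_query('koji', 'hub')
puts describe_hosts(sample_result)
```

Inside a recipe the generated string would be passed to search(:node, ...) as in Figure 21.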


Knife

A workstation is a system that is used to manage Chef. There can be multiple workstations for a single Chef server. It is simply a machine where Knife is used to manage the Chef Server.

Knife is a command-line tool used to interact with the Chef server. The complete management of the Chef server is done using Knife.

Some of the functions of knife include:

Managing nodes.

Uploading cookbooks and recipes.

Managing roles and environments.

Knife is a multipurpose command-line tool that facilitates system automation, deployment, and integration.

It provides command and control capabilities for managing physical, virtual, and cloud environments across a

range of Linux, Unix, and Windows platforms. It is also the primary means by which the underlying model

that makes up the Chef framework is managed. Knife is extensible and has a pluggable architecture.97

Figure 22 shows the content of the .chef/knife.rb98 configuration file used in the project of this thesis. Most of the values in the file are kept in environment variables (some of them are shared with the Vagrantfile configuration). The only project-specific values provided directly in the file are the default author’s name, email and copyright of a new cookbook created using Knife.

97 Stephen Nelson-Smith, op. cit., p. 52

98 See https://docs.chef.io/config_rb_knife.html for more information on knife configuration options.


Knife is a very useful tool for investigating nodes and their attributes. It also enables us to use Chef’s search, and it can be used to test whether a new server configuration was applied.

Some of the useful knife commands include:

knife node list

knife search node 'recipes:cookbook\:\:recipe'

knife search node 'recipes:cookbook\:\:recipe' -a attribute_name

knife node show -l node_name

The first one lists the nodes registered in Chef Server. The second searches for the nodes that have “cookbook::recipe” applied. The third additionally shows a given attribute of those nodes. The fourth shows all information about a node, formatted in a human-readable way.

Other Chef tools

Chef also includes a few other tools that were not used in the project but are worth mentioning99:

Chef Shell – an interactive debugging console that provides command-line access to the framework’s

libraries, the API, and the local system’s data.

99 Stephen Nelson-Smith, op. cit., p. 51

current_dir = File.dirname(__FILE__)

log_level :info

log_location STDOUT

node_name "workstation"

client_key "#{current_dir}/workstation.pem"

validation_client_name "#{ENV['CHEF_VALIDATION_CLIENT_NAME']}"

validation_key "#{ENV['CHEF_VALIDATION_KEY_PATH']}"

chef_server_url "#{ENV['CHEF_SERVER']}"

cache_type 'BasicFile'

cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )

cookbook_copyright "Tomasz Kłosiński"

cookbook_license "All rights reserved"

cookbook_email "[email protected]"

# AWS

knife[:aws_access_key_id] = ENV['AWS_ACCESS_KEY_ID']

knife[:aws_secret_access_key] = ENV['AWS_SECRET_ACCESS_KEY']

# OpenStack

knife[:openstack_auth_url] = "#{ENV['OS_AUTH_URL']}/tokens"

knife[:openstack_username] = "#{ENV['OS_USERNAME']}"

knife[:openstack_password] = "#{ENV['OS_PASSWORD']}"

knife[:openstack_tenant] = "#{ENV['OS_TENANT_NAME']}"

Figure 22 Chef: knife.rb configuration file


Chef Solo – a fully featured standalone configuration management tool that allows access to a

subset of Chef’s features without using a Chef server; suitable for simple deployments.

Chef Apply – a lightweight tool for configuring a machine to perform a function with a single

command, needing no configuration or Chef server.

Berkshelf

Berkshelf is not part of the Chef framework, but rather a tool that complements it.

In the early days of Chef, users had to ensure manually that all dependent cookbooks were installed. They had to download each of them by hand, only to find out that every downloaded cookbook brought in another set of dependent cookbooks. In the IT world this situation is known as “dependency hell”.100

To fix this, Knife gained the “site install” subcommand, which installed all the dependencies locally for the user. However, this was still not an optimal solution, since the cookbook directory in the user’s repository became cluttered with all the dependent cookbooks. Usually users did not care about those cookbooks and did not want to see or manage them. Additionally, Knife’s “site install” always downloaded the current version of dependent cookbooks, whereas in some situations a particular version of a cookbook was needed. Sharing the list of cookbooks was also problematic.

Berkshelf was created to fix these problems. It works like Bundler for Ruby gems, managing cookbook dependencies for the user. It downloads all the defined dependencies recursively. Instead of polluting the user’s Chef repository, it stores all the cookbooks in a central location (usually ~/.berkshelf.d/). The user just commits the Berkshelf dependency file (called Berksfile) to the repository, and every other person sharing this repository, or every build server, can download and install all the dependent cookbooks based on it.101
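The need for a particular cookbook version is addressed in a Berksfile by adding a version constraint after the cookbook name, for example cookbook "apache2", "~> 1.0" (the version number here is purely illustrative). How such a constraint selects versions can be sketched with the Gem::Requirement class from RubyGems, which is used below only as a stand-in with the same operator semantics and is not Berkshelf's own solver; the helper matching_versions and the version list are hypothetical.

```ruby
require 'rubygems'

# Return the versions from `available` that satisfy the given constraint.
def matching_versions(constraint, available)
  requirement = Gem::Requirement.new(constraint)
  available.select { |version| requirement.satisfied_by?(Gem::Version.new(version)) }
end

# Hypothetical list of published versions of some cookbook.
AVAILABLE_VERSIONS = %w[0.9.0 1.0.0 1.2.3 2.0.0].freeze

# '~> 1.0' accepts any 1.x release but not 2.0.0; '= 1.0.0' pins one version.
puts matching_versions('~> 1.0', AVAILABLE_VERSIONS).inspect
puts matching_versions('= 1.0.0', AVAILABLE_VERSIONS).inspect
```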

Berkshelf shares the twin goals of Bundler:

Ensure that the appropriate dependencies are installed for a given problem without encountering

unpleasant ordering issues or cyclical dependencies.

Ensure code can be shared between other developers, or other machines or environments, and be

confident the code and its dependencies will behave in the same way.

Berkshelf solves these problems for cookbooks; in place of a Gemfile, Berkshelf has a Berksfile. As soon as a cookbook relies on recipes from other cookbooks and makes use of the include_recipe method, its metadata.rb file has to be updated to specify an explicit dependency on the cookbook that provides the required recipe or LWRP. That is perfectly reasonable and to be expected. However, solving cookbook dependencies manually and recursively quickly becomes tiresome, as does uploading cookbooks in the right order, one at a time. Berkshelf takes these pains away by providing local dependency resolution and by functioning as a Chef API client for uploading cookbooks.
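The recursive resolution and the "right order" of uploads mentioned above amount to a depth-first walk of the dependency graph. The following sketch shows the idea under stated assumptions: it ignores version constraints, the function upload_order is hypothetical, and the edges in SAMPLE_INDEX are illustrative rather than taken from the real cookbooks' metadata.

```ruby
# Hypothetical dependency index: cookbook name => names it depends on.
SAMPLE_INDEX = {
  'koji'            => %w[database postgresql yum-epel],
  'database'        => %w[postgresql],
  'postgresql'      => %w[build-essential],
  'yum-epel'        => %w[yum],
  'yum'             => [],
  'build-essential' => []
}.freeze

# Return cookbooks in an order in which every dependency precedes the
# cookbooks that need it. Raises ArgumentError on a cyclic dependency.
def upload_order(root, index, done = [], in_progress = [])
  raise ArgumentError, "cyclic dependency at #{root}" if in_progress.include?(root)
  return done if done.include?(root)

  index.fetch(root).each do |dependency|
    upload_order(dependency, index, done, in_progress + [root])
  end
  done << root
end

puts upload_order('koji', SAMPLE_INDEX).join(' -> ')
```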

100 http://en.wikipedia.org/wiki/Dependency_hell (20/11/2014)

101 Matthias Marschall, Chef Infrastructure Automation Cookbook, Packt Publishing, 2013, p. 25


Berkshelf provides considerably more functionality than this. It’s pivotal to an entire Chef development

workflow, dubbed “The Berkshelf Way” by the group of developers from Riot Games, the company behind

Berkshelf, who open sourced it and its component tools.102

As presented in Figure 23, a Berksfile consists of a source directive, a metadata directive and a list of dependent cookbooks. The source is the URL of the cookbook repository from which the dependent cookbooks will be downloaded. In our case this is Chef’s central community cookbook repository, called Chef Supermarket (supermarket.getchef.com). The metadata directive indicates that Berkshelf will also download and manage the dependent cookbooks listed in the metadata.rb file.

Koji cookbook

A cookbook is the basic unit of configuration and policy definition in Chef. The Koji cookbook defines a complete scenario for the deployment and configuration of a Koji cluster. Chef uses Ruby as its reference language for writing cookbooks, extended with a DSL (Domain Specific Language) for declaring Chef-specific resources.

A cookbook plays the following role in the Chef ecosystem103:

A cookbook defines the files that need to be distributed for that component onto the client.

It defines the attribute values that should be present on the nodes.

It provides definitions for reusability of code.

It provides libraries which can be used to extend the functionality of chef.

It provides recipes that specify the resources and the order of execution of code on the client.

It provides templates for file configurations.

It provides metadata which can be used to specify any kind of dependency, version constraints, and so on.

102 Stephen Nelson-Smith, op. cit., p. 173

103 Navin Sabharwal, Manak Wadhwa, op. cit., p. 87

source "https://supermarket.getchef.com"

metadata

cookbook "resource-control"

cookbook "apache2"

cookbook "database"

cookbook "hostsfile"

cookbook "postgresql"

cookbook "selinux"

cookbook "yum"

cookbook "yum-epel"

cookbook "chef-zero"

cookbook "chef"

cookbook "iptables"

cookbook "ohai"

Figure 23 Berkshelf: Berksfile


Metadata

Cookbook metadata stores certain information about the cookbook. It is provided by the metadata.rb file, which is located in the cookbook directory.

Metadata can be used to specify the following important things104:

Dependencies: If the cookbook is dependent on any other cookbook.

Description: What the cookbook is actually doing.

Supported OS list.

Name of the cookbook.

Version of the cookbook.

The project’s cookbook metadata is presented in Figure 24. It is divided into two sections: the first one provides basic information about the cookbook (name, maintainer, description, etc.); the second one contains the list of dependent cookbooks (each line starts with “depends” followed by the name of the cookbook).

These dependent cookbooks are required for the Koji cookbook to run properly. Traditionally they were managed manually by the cookbook user. Nowadays, however, dependencies are installed and uploaded to Chef Server by Berkshelf on the basis of the Berksfile (which also pulls in the dependencies from metadata.rb).
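A metadata.rb file is ordinary Ruby evaluated against an object that records each call. To show how a tool can extract the dependency list from it, the sketch below implements a tiny stand-in for that DSL. It is an assumption-laden illustration, not Chef's or Berkshelf's implementation: the class MetadataSketch is hypothetical and supports only a handful of fields.

```ruby
# Minimal stand-in for the metadata.rb DSL.
class MetadataSketch
  attr_reader :fields, :dependencies

  def initialize
    @fields = {}
    @dependencies = []
  end

  # Simple one-value fields such as `name 'koji'`.
  %i[name maintainer version description].each do |field|
    define_method(field) { |value| @fields[field] = value }
  end

  # Each `depends` line adds one cookbook to the dependency list.
  def depends(cookbook)
    @dependencies << cookbook
  end

  # Evaluate metadata source text in the context of a fresh instance.
  def self.load(source)
    new.tap { |metadata| metadata.instance_eval(source) }
  end
end

# Shortened excerpt in the style of Figure 24.
SAMPLE_METADATA = <<~RUBY
  name 'koji'
  version '0.1.2'
  depends "database"
  depends "postgresql"
RUBY

metadata = MetadataSketch.load(SAMPLE_METADATA)
puts "#{metadata.fields[:name]} #{metadata.fields[:version]} depends on #{metadata.dependencies.join(', ')}"
```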

Attributes

An attribute is a specific detail about a node. Attributes usually describe the current state of the node, the state of the node at the end of the previous chef-client run, and what the state of the node should be at the end of the current chef-client run.

Attributes are defined by:

The state of the node itself

104 Ibid., p. 118

name 'koji'

maintainer 'Tomasz Kłosiński'

maintainer_email '[email protected]'

license 'All rights reserved'

description 'Installs/Configures Koji'

long_description IO.read(File.join(File.dirname(__FILE__), 'README.md'))

version '0.1.2'

depends "database"

depends "apache2"

depends "postgresql"

depends "hostsfile"

depends "nfs"

depends "selinux"

depends "yum"

depends "yum-epel"

Figure 24 Cookbook: metadata


Cookbooks (in attribute files and/or recipes)

Roles

Environments

During every chef-client run, the chef-client builds the attribute list using data about the node collected by Ohai, the node object that was saved to the Chef server at the end of the previous chef-client run, and the rebuilt node object from the current chef-client run, after it is updated for changes to cookbooks (attribute files and/or recipes), roles, and/or environments, and for any changes to the state of the node itself.105

After the node object is rebuilt, all of attributes are compared, and then the node is updated based on

attribute precedence. At the end of every chef-client run, the node object that defines the current state of

the node is uploaded to the Chef server so that it can be indexed for search.

Attributes enable us to override values of the cookbook configuration. Default values of variables are usually hardcoded in the cookbook, but they can easily be changed through Chef’s JSON mechanism. By overriding the default values set in cookbooks, users can inject their own values.106

An attribute file is located in the attributes/ sub-directory of a cookbook. When a cookbook is run against a node, the attributes contained in all attribute files are evaluated in the context of the node object. Node methods (when present) are used to set attribute values on a node.

105 https://docs.chef.io/attributes.html (12/11/2014)

106 Matthias Marschall, op. cit., p. 98

node.default['koji']['domain'] = "example.com"

node.default['koji']['database']['name'] = "apache"

node.default['koji']['database']['user'] = "apache"

node.default['koji']['database']['ipaddress'] = "127.0.0.1"

node.default['koji']['database']['password'] = "apache"

node.default['koji']['hub']['topdir'] = "/mnt/koji"

node.default['koji']['client']['server'] = "http://koji.#{node['koji']['domain']}/kojihub"

node.default['koji']['client']['weburl'] = "http://koji.#{node['koji']['domain']}/koji"

node.default['koji']['client']['topurl'] = "http://kojipkgs.#{node['koji']['domain']}/koji"

node.default['koji']['kojira']['server'] = "http://koji.#{node['koji']['domain']}/kojihub"

node.default['koji']['kojira']['weburl'] = "http://koji.#{node['koji']['domain']}/koji"

node.default['koji']['kojira']['topurl'] = "http://kojipkgs.#{node['koji']['domain']}/koji"

node.default['koji']['kojid']['server'] = "http://koji.#{node['koji']['domain']}/kojihub"

node.default['koji']['kojid']['weburl'] = "http://koji.#{node['koji']['domain']}/koji"

node.default['koji']['kojid']['topurl'] = "http://kojipkgs.#{node['koji']['domain']}/koji"

Figure 25 Cookbook: Attributes


Figure 25 lists the default attributes of the Koji cookbook (the attributes/default.rb file). Among them are the domain name, database configuration details, the main directory of the Koji hub and connection details for the Koji client, Kojira and kojid.
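How a value supplied through JSON wins over a cookbook default can be sketched as a recursive merge of two nested hashes. This is a deliberate simplification: Chef distinguishes many more precedence levels than the two modelled here, and the function deep_merge, the excerpt of defaults and the override address are assumptions made for illustration.

```ruby
# Recursively merge override values into cookbook defaults; on conflict the
# override wins, but nested hashes are merged instead of replaced wholesale.
def deep_merge(defaults, overrides)
  defaults.merge(overrides) do |_key, old_value, new_value|
    old_value.is_a?(Hash) && new_value.is_a?(Hash) ? deep_merge(old_value, new_value) : new_value
  end
end

# Excerpt of the defaults from Figure 25, written as a plain Hash.
COOKBOOK_DEFAULTS = {
  'koji' => {
    'domain'   => 'example.com',
    'database' => { 'name' => 'apache', 'ipaddress' => '127.0.0.1' }
  }
}.freeze

# Hypothetical override, in the shape it could be passed via chef.json.
json_override = { 'koji' => { 'database' => { 'ipaddress' => '10.0.0.7' } } }

merged = deep_merge(COOKBOOK_DEFAULTS, json_override)
puts merged['koji']['database'].inspect # name kept, ipaddress replaced
puts merged['koji']['domain']           # untouched default
```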

Templates

A template is a Chef resource that is used to manage the contents of a configuration file. The file contents are stored as an ERB (Embedded Ruby) template. Templates are stored in the templates/default subdirectory of the cookbook.107

Embedded Ruby allows Ruby code to be embedded within a pair of <% and %> delimiters. These embedded code blocks are then evaluated in place (they are replaced by the result of their evaluation).108 Chef uses Erubis109 as its ERB implementation.

There are two types of delimiters in ERB:

<%= %> is used to print the value of a variable or Ruby expression into the generated file.

<%- %> is used to embed Ruby logic into the template file (for instance, it allows looping over a list).110
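Both delimiter types can be tried outside Chef with the ERB class from Ruby's standard library, used here as a stand-in for Erubis. In this sketch a plain Hash plays the role of the node object, the mirrors attribute is hypothetical (it is not part of the Koji cookbook), and the trim mode '-' is what enables the <%- and -%> forms in the standard library.

```ruby
require 'erb'

# Shortened template in the style of Figure 26, plus a loop over a
# hypothetical `mirrors` attribute to show embedded Ruby logic.
KOJI_CONF_TEMPLATE = <<~'TEMPLATE'
  [koji]
  server = <%= node[:koji][:client][:server] %>
  <%- node[:koji][:client][:mirrors].each do |url| -%>
  mirror = <%= url %>
  <%- end -%>
TEMPLATE

# Render the template against `node`, a Hash standing in for Chef's node object.
def render_koji_conf(node)
  ERB.new(KOJI_CONF_TEMPLATE, trim_mode: '-').result(binding)
end

sample_node = {
  koji: {
    client: {
      server: 'http://koji.example.com/kojihub',
      mirrors: ['http://a.example.com', 'http://b.example.com']
    }
  }
}

puts render_koji_conf(sample_node)
```

The lines containing only logic produce no output of their own; only the values printed with <%= %> end up in the generated file.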

/etc/koji.conf

The file /etc/koji.conf is generated from the client-koji.conf.erb template. It provides the basic configuration for the Koji client, that is, details of the connection to the Koji hub such as the server URL, web URL, top directory URL and top directory path on the server.

107 Navin Sabharwal, Manak Wadhwa, op. cit., p. 113

108 Stephen Nelson-Smith, op. cit., p. 234

109 Erubis website: http://www.kuwata-lab.com/erubis/ (03/01/2015)

110 Matthias Marschall, op. cit., p. 103

[koji]

;configuration for koji cli tool

;url of XMLRPC server

server = <%= node[:koji][:client][:server] %>

;url of web interface

weburl = <%= node[:koji][:client][:weburl] %>

;url of package download site

topurl = <%= node[:koji][:client][:topurl] %>

;path to the koji top directory

topdir = <%= node[:koji][:client][:topdir] %>

Figure 26 Cookbook: Template /etc/koji.conf


/etc/httpd/conf.d/kojihub.conf

This file is generated from the httpd-kojihub.conf.erb template. It configures the XML-RPC server running under mod_wsgi in Apache. As Figure 27 shows, the template uses no attributes; the default values from the Koji hub installation are used instead.

/etc/koji-hub/hub.conf

This file is generated from the hub.conf.erb template. It handles the configuration of the Koji hub: its connection to the database, its main directory (where Koji stores packages and repositories), the URL of Koji web and other variables. With the settings shown in Figure 28, a user’s first login creates that user in the database, and Koji notifies the package maintainer when a build succeeds.

Alias /kojihub /usr/share/koji-hub/kojixmlrpc.py

<Directory "/usr/share/koji-hub">

Options ExecCGI

SetHandler wsgi-script

Order allow,deny

Allow from all

</Directory>

Alias /kojifiles "/mnt/koji/"

<Directory "/mnt/koji">

Options Indexes

AllowOverride None

Order allow,deny

Allow from all

</Directory>

Figure 27 Cookbook: Template /etc/httpd/conf.d/kojihub.conf

[hub]

DBName = <%= node[:koji][:database][:name] %>

DBUser = <%= node[:koji][:database][:user] %>

DBHost = <%= node[:koji][:database][:ipaddress] %>

DBPass = <%= node[:koji][:database][:password] %>

KojiDir = <%= node[:koji][:hub][:topdir] %>

LoginCreatesUser = On

KojiWebURL = <%= node[:koji][:hub][:weburl] %>

NotifyOnSuccess = True

Figure 28 Cookbook: Template /etc/koji-hub/hub.conf


/etc/kojira/kojira.conf

This file is generated from the kojira.conf.erb template. It contains the configuration of the Kojira service, which is responsible for keeping order in the main RPM repository of Koji (that is, it regenerates build repositories and removes outdated ones). In the configuration file presented in Figure 29, attributes provide the server URL and the top directory path; the remaining variables keep their default values.

/etc/kojid/kojid.conf

This file is generated from the kojid.conf.erb template. It controls the Koji builder (kojid) service. Chef attributes provide the credentials, the Koji hub URL and the top URL; the other variables keep their default values.

Recipes

Recipes are the configuration units in Chef that are actually deployed on the client and are used to configure the system. They are written in Ruby using Chef’s DSL. A recipe is normally a collection of resources with a bit of Ruby code. It is stored in a cookbook, helps to configure nodes and can be included in any other recipe. Every recipe is executed top-down.111

111 Navin Sabharwal, Manak Wadhwa, op. cit., pp. 88-89

[kojira]

user=kojira

password=kojira

server=<%= node[:koji][:kojira][:huburl] %>

topdir=<%= node[:koji][:kojira][:topdir] %>

logfile=/var/log/kojira.log

with_src=no

Figure 29 Cookbook: Template /etc/kojira/kojira.conf

[kojid]

user = <%= node['hostname'] %>

password = <%= node['hostname'] %>

topdir=<%= node[:koji][:kojid][:topdir] %>

workdir=/var/koji/tmp

mockdir=/var/koji/mock

mockuser=kojibuilder

vendor=Koji

packager=Koji

distribution=Koji

mockhost=koji-linux-gnu

server=<%= node[:koji][:kojid][:huburl] %>

topurl=<%= node[:koji][:kojid][:topurl] %>

Figure 30 Cookbook: Template /etc/kojid/kojid.conf


Ruby is a programming language designed to read and behave in a predictable manner. A recipe is mostly a collection of resources, defined using patterns (resource names, attribute-value pairs, and actions). A recipe must define everything that is required to configure part of a system, and it has to be stored in a cookbook. One recipe may be included in another, or it may depend on one or more other recipes. A recipe may use the results of a search query and read the contents of a data bag (including an encrypted data bag). It may tag a node to facilitate the creation of arbitrary groupings. It must be added to a run-list before it can be used by the chef-client, and it is always executed in the same order as listed in the run-list.112

default.rb

The default recipe is executed when a run list names a cookbook to be deployed but does not specify which of its recipes to run. Figure 31 presents the default recipe of the Koji cookbook. First we ensure that the EPEL yum repository is installed by including the “yum-epel” recipe (from the yum-epel cookbook). Then the default recipe includes the other Koji recipes: client, hub, database, kojira and builder.

In other words, the entire Koji stack is installed except for the Koji builder daemon (kojid). The test recipe is also not included by default.
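The way a run list grows through include_recipe can be pictured as a depth-first expansion in which a recipe that has already been loaded is skipped. The sketch below is a simplified model rather than Chef's implementation: the function expand_run_list is hypothetical, the entry for koji::default mirrors Figure 31, and the entry for koji::kojid is an assumption made only to show the skipping of an already loaded recipe.

```ruby
# Map: recipe name => recipes it pulls in via include_recipe.
INCLUDES = {
  'koji::default' => %w[yum-epel::default koji::client koji::hub
                        koji::database koji::kojira koji::builder],
  'koji::kojid'   => %w[yum-epel::default koji::client] # hypothetical
}.freeze

# Return the recipes in the order in which they are first loaded.
def expand_run_list(run_list, includes, loaded = [])
  run_list.each do |recipe|
    next if loaded.include?(recipe) # a recipe is loaded only once

    loaded << recipe
    expand_run_list(includes.fetch(recipe, []), includes, loaded)
  end
  loaded
end

puts expand_run_list(%w[koji::default koji::kojid], INCLUDES)
```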

112 http://docs.chef.io/recipes.html (11/12/2014)

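# Set at default precedence: Chef 11 and later reject attribute writes without a precedence level.

node.default['yum']['epel']['enabled'] = true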

include_recipe "yum-epel"

include_recipe "koji::client"

include_recipe "koji::hub"

include_recipe "koji::database"

include_recipe "koji::kojira"

include_recipe "koji::builder"

Figure 31 Cookbook: default.rb recipe


client.rb

The client recipe installs and configures the Koji client. First, a user for the Koji client (kojiadmin) is created, together with a “.koji” directory and a symlink to the main configuration file. Thanks to this, anyone who wants to use Koji only needs to switch to the kojiadmin user. The recipe then installs the koji package and uses a template resource to provide the configuration file /etc/koji.conf, setting its owner to root and its mode to 0644.

user "kojiadmin" do

supports :manage_home => true

comment "kojiadmin"

home "/home/kojiadmin"

shell "/bin/bash"

password "123123"

end

directory "/home/kojiadmin/.koji" do

owner "kojiadmin"

group "kojiadmin"

mode 00755

action :create

end

link "/home/kojiadmin/.koji/config" do

owner "kojiadmin"

to "/etc/koji.conf"

end

package "koji" do

action :install

end

template "/etc/koji.conf" do

source "client-koji.conf.erb"

mode 0644

owner "root"

group "root"

end

Figure 32 Cookbook: client.rb recipe


hub.rb

The hub recipe is the main part of the cookbook. As Figure 33 indicates, this recipe installs and configures the Koji hub (server). First, the recipe ensures that the proper FQDN is set in the /etc/hosts file. Then the user “apache” is created (it is the user that will be used by the Koji hub service). The next step is installing the Koji hub packages: koji-hub, httpd, mod_ssl and mod_wsgi. The two node.default lines that follow adjust the Apache

hostsfile_entry "127.0.0.1" do

hostname 'koji.example.com'

aliases ['kojihub.example.com', 'kojiweb.example.com',

'kojipkgs.example.com']

unique true

comment 'Append by Recipe koji::hub'

action :append

end

user "apache" do

supports :manage_home => true

comment "apache"

home "/home/koji"

shell "/bin/bash"

password "123123"

end

%w{koji-hub httpd mod_ssl mod_wsgi}.each do |pkg|

package pkg do

action :install

end

end

node.default['apache']['prefork']['maxrequestworkers'] = 100

node.default['apache']['worker']['maxrequestworkers'] = 100

directory "/etc/httpd/conf.d/" do

owner "root"

group "root"

mode 00755

action :create

end

template "/etc/httpd/conf.d/kojihub.conf" do

source "httpd-kojihub.conf.erb"

mode 0440

owner "root"

group "root"

end

ruby_block "Add 'Include conf.d/*.conf' to /etc/httpd" do

block do

File.open("/etc/httpd/conf/httpd.conf", 'a') { |f| f.puts "Include conf.d/*.conf" }

end

end

Figure 33 Cookbook: hub.rb recipe - part 1


service’s configuration, which Koji requires in order to work optimally. In the next steps we create the /etc/httpd/conf.d/ directory and place the Koji hub web server’s configuration file in it. Lastly, using a ruby_block, we append the line “Include conf.d/*.conf” to /etc/httpd/conf/httpd.conf.
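Appending a line on every Chef run makes the configuration file grow with each run. A minimal sketch in plain Ruby of an idempotent variant follows. The helper name append_line_once is hypothetical and not part of the cookbook, and a temporary file stands in for httpd.conf.

```ruby
require 'tempfile'

# Append `line` to the file at `path` unless an identical line is already
# present. Returns true when the file was modified, false otherwise.
# (Hypothetical helper, not part of the Koji cookbook.)
def append_line_once(path, line)
  existing = File.exist?(path) ? File.readlines(path).map(&:chomp) : []
  return false if existing.include?(line)

  File.open(path, 'a') { |f| f.puts(line) }
  true
end

# Demonstration on a temporary file standing in for httpd.conf.
Tempfile.create('httpd-conf-demo') do |conf|
  conf.puts 'Listen 80'
  conf.flush
  2.times do |run|
    changed = append_line_once(conf.path, 'Include conf.d/*.conf')
    puts "run #{run + 1}: file modified? #{changed}"
  end
end
```

Inside a recipe the same check could be placed in the body of the ruby_block, so that repeated runs leave the file unchanged.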

In the next part of the file (Figure 34), we provide the main configuration file of the Koji hub: /etc/koji-hub/hub.conf. Then we establish a directory hierarchy for Koji in /mnt/koji and export it as an NFS share.

template "/etc/koji-hub/hub.conf" do

source "hub.conf.erb"

mode 0440

owner "root"

group "root"

end

%w{koji koji/packages koji/repos koji/work koji/scratch}.each do |dir|

directory "/mnt/" + dir do

owner "apache"

group "apache"

mode 00755

action :create

end

end

nfs_export "/mnt/koji" do

network '*'

writeable true

sync true

options ['no_root_squash', 'insecure']

end

directory "/var/www/html/koji" do

owner "apache"

group "apache"

mode 00755

action :create

end

Figure 34 Cookbook: hub.rb recipe - part 2


database.rb

include_recipe 'build-essential::default'

include_recipe "postgresql::server"

include_recipe "database::postgresql"

node.default['postgresql']['pg_hba'] = [

{:comment => '# TYPE DATABASE USER IP-ADDRESS METHOD',

:type => 'local', :db => 'all', :user => 'all', :addr => nil, :method => 'trust'},

{:comment => '# TYPE DATABASE USER IP-ADDRESS METHOD',

:type => 'local', :db => 'apache', :user => 'apache', :addr => nil, :method => 'trust'},

{:comment => '# TYPE DATABASE USER IP-ADDRESS METHOD',

:type => 'local', :db => 'apache', :user => 'apache', :addr => nil, :method => 'trust'},

{:comment => '# TYPE DATABASE USER IP-ADDRESS METHOD',

:type => 'host', :db => 'apache', :user => 'all', :addr => '127.0.0.1/32', :method => 'trust'},

{:comment => '# TYPE DATABASE USER IP-ADDRESS METHOD',

:type => 'host', :db => 'template1', :user => 'all', :addr => '127.0.0.1/32', :method => 'trust'},

{:comment => '# TYPE DATABASE USER IP-ADDRESS METHOD',

:type => 'host', :db => 'apache', :user => 'postgres', :addr => '0.0.0.0/0', :method => 'md5'}

]

connection_user_postgres = {

:host => '127.0.0.1',

:port => node['postgresql']['config']['port'],

:username => 'postgres',

:password => node['postgresql']['password']['postgres']

}

execute "Create a postgresql user for koji but grant no privileges" do

user "postgres"

exists = <<-EOH

psql -U postgres -d template1 -c \'\\du\' | grep -c apache

EOH

cwd "/var/lib/pgsql/"

command "psql -U postgres -d template1 -c \"CREATE ROLE apache PASSWORD 'apache' NOSUPERUSER NOCREATEDB NOCREATEROLE LOGIN;\""

not_if exists, :user => "postgres"

end

Figure 35 Cookbook: database.rb - part 1


The database recipe installs and configures a PostgreSQL server for the Koji hub. It includes the build-essential recipe in order to build the Ruby gem that lets recipe code connect to the PostgreSQL service and manipulate the schema. It also includes “postgresql::server”, which installs the PostgreSQL server, and “database::postgresql”, which provides resources for managing the database, its users and its schema. It also allows creating new databases and tables.

In the next step we set the pg_hba attribute, which produces the pg_hba.conf file responsible for restricting access to the PostgreSQL database. Further, we define a connection to the database. Then we create a database user for Koji called “apache”. The next excerpt of the code (Figure 36) starts by creating a database for Koji, also called “apache”, and then defines a connection to this database as the previously created “apache” user. The next step executes the SQL script that creates the schema of Koji’s database.

Finally, we set PostgreSQL to listen on all addresses, start the service and enable it at boot time.
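To make the role of the pg_hba attribute more concrete, the following sketch shows how such a list of hashes could be turned into pg_hba.conf records. It is only an illustration: the helper pg_hba_line is hypothetical, and the template shipped with the postgresql cookbook may format the file differently.

```ruby
# Render one pg_hba.conf record from an attribute hash.
# Records of type "local" have no address column, so nil fields are dropped.
# (Hypothetical helper; the real cookbook template may differ.)
def pg_hba_line(entry)
  fields = [entry[:type], entry[:db], entry[:user], entry[:addr], entry[:method]]
  fields.compact.join(' ')
end

entries = [
  { type: 'local', db: 'all',    user: 'all',      addr: nil,            method: 'trust' },
  { type: 'host',  db: 'apache', user: 'all',      addr: '127.0.0.1/32', method: 'trust' },
  { type: 'host',  db: 'apache', user: 'postgres', addr: '0.0.0.0/0',    method: 'md5' }
]

entries.each { |entry| puts pg_hba_line(entry) }
```

PostgreSQL uses the first record that matches the connection type, database, user and client address, so the order of the entries in the attribute matters.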

postgresql_database "apache" do

connection connection_user_postgres

provider Chef::Provider::Database::Postgresql

template 'DEFAULT'

encoding 'DEFAULT'

tablespace 'DEFAULT'

connection_limit '-1'

owner 'apache'

action :create

end

connection_user_apache = {

:host => '127.0.0.1',

:port => node['postgresql']['config']['port'],

:username => 'apache',

:password => node['postgresql']['password']['postgres'],

:database_name => 'apache'

}

execute "run schema script: /usr/share/doc/koji-1.9.0/docs/schema.sql" do

user "postgres"

exists = <<-EOH

cat /var/lib/pgsql/lock.txt | grep -c lock

EOH

command "psql -U apache -d apache < /usr/share/doc/koji-1.9.0/docs/schema.sql && echo lock > /var/lib/pgsql/lock.txt"

not_if exists, :user => "postgres"

end

node.default['postgresql']['config']['listen_addresses'] = '*'

service "postgresql" do

supports :status => true, :restart => true, :reload => true

action [ :enable, :start ]

end

Figure 36 Cookbook: database.rb recipe - part 2


kojira.rb

The next recipe installs Kojira. We start by installing the koji-utils package (which includes the kojira service). Then we restart the Apache server. We add a user and password for the Kojira service to the Koji hub database. In the next step we provide the configuration file /etc/kojira/kojira.conf. Finally, we start the service and enable it at boot.

package "koji-utils" do

action :install

end

service "httpd" do

action :restart

end

bash "Add kojira user and grant repo permissions in koji" do

user "postgres"

cwd "/var/lib/pgsql"

exists = <<-EOH

psql -U apache -d apache -c "select * from users where name='kojira'" | grep -c kojira

EOH

code <<-EOH

koji --user=admin --password=admin add-user kojira

psql apache -h127.0.0.1 --command "UPDATE users SET password='kojira' WHERE name='kojira';"

koji --user=admin --password=admin grant-permission repo kojira

EOH

not_if exists, :user => 'postgres'

end

template "/etc/kojira/kojira.conf" do

source "kojira.conf.erb"

mode 0440

owner "root"

group "root"

end

service "kojira" do

action [ :enable, :start ]

end

Figure 37 Cookbook: kojira.rb recipe


builder.rb

The builder recipe registers existing Koji builders with the Koji hub. First we search the Chef Server for any node that has the “koji::kojid” recipe applied. Then we iterate over the result to configure each node. In the configuration block we start by adding the host to the Koji hub, and then we increase its capacity to “4.0”.

In the next step we provide a user and password for this node so that it can authenticate against the Koji hub with these credentials. Then we grant permissions to this user so that it has access to Koji’s repository.
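The idea of registering only those builders that the hub does not know yet can be sketched in plain Ruby as follows. The function registration_commands is hypothetical. It only builds the command strings (admin credentials omitted for brevity) and does not execute anything, whereas the recipe expresses the same check with not_if guards.

```ruby
# Return the koji CLI commands needed to register builders that are not yet
# known to the hub. Hosts already registered produce no commands, which keeps
# repeated runs idempotent. (Hypothetical helper, for illustration only.)
def registration_commands(builder_hostnames, registered_hostnames,
                          arch: 'x86_64', capacity: 4.0)
  (builder_hostnames - registered_hostnames).flat_map do |host|
    [
      "koji add-host #{host} #{arch}",
      "koji edit-host --capacity=#{capacity} #{host}",
      "koji grant-permission repo #{host}"
    ]
  end
end

# Hostnames below are made up for the demonstration.
found_by_search = %w[builder1 builder2]
already_on_hub  = %w[builder1]
puts registration_commands(found_by_search, already_on_hub)
```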

kojibuilders = search(:node, 'recipes:koji\:\:kojid')

kojibuilders.each do |kojid|

execute "koji add-host #{kojid['hostname']} x86_64" do

user "root"

cwd "/root"

exists = <<-EOH

koji --user=admin --password=admin list-hosts | awk '{ print $1 }' | grep -Fx #{kojid['hostname']} | grep -c #{kojid['hostname']}

EOH

command "koji --user=admin --password=admin add-host #{kojid['hostname']} x86_64"

not_if exists

end

execute "koji edit-host --capacity=4.0 #{kojid['hostname']}" do

user "root"

cwd "/root"

exists = <<-EOH

koji --user=admin --password=admin list-hosts --channel=createrepo | awk '{ print $1 }' | grep -Fx #{kojid['hostname']} | grep -c #{kojid['hostname']}

EOH

command "koji --user=admin --password=admin edit-host --capacity=4.0 #{kojid['hostname']}"

not_if exists

end

bash "Add kojid user" do

user "postgres"


exists = <<-EOH

psql -U apache -d apache -c "select name from users where name='#{kojid['hostname']}'" | grep -c #{kojid['hostname']}

EOH

code <<-EOH

koji --user=admin --password=admin add-user #{kojid['hostname']};

EOH

not_if exists

end

Figure 38 Cookbook: builder.rb recipe - part 1


bash "Set kojid user's password" do

user "postgres"


exists = <<-EOH

psql -U apache -d apache -c "select password from users where name='#{kojid['hostname']}'" | grep -c #{kojid['hostname']}

EOH

code <<-EOH

psql -U apache -d apache -h127.0.0.1 --command "UPDATE users SET password='#{kojid['hostname']}' WHERE name='#{kojid['hostname']}';";

EOH

not_if exists

end

bash "Grant repo permissions to user kojid" do

user "postgres"


exists = <<-EOH

psql -U apache -d apache -c "select * from user_perms up join users u on up.user_id=u.id where u.name='#{kojid['hostname']}' and up.perm_id='3'" | grep -c #{kojid['hostname']}

EOH

code <<-EOH

koji --user=admin --password=admin grant-permission repo #{kojid['hostname']};

EOH

not_if exists

end

end

Figure 39 Cookbook: builder.rb recipe - part 2


kojid.rb

The kojid recipe stands apart from the others because it can be applied to servers other than the Koji hub. Additionally, we can have more than one Koji builder. The recipe starts by including a PostgreSQL client recipe,

include_recipe "postgresql::client"

%w{koji koji/mock koji/tmp}.each do |dir|

directory "/var/" + dir do

owner "root"

group "root"

mode 00755

action :create

end

end

kojihubs = search(:node, 'recipes:koji\:\:hub')

kojihubs.each do |node|

Chef::Log.info("#{node['hostname']} has IP address #{node['ipaddress']}")

end

# Fall back to localhost when no hub is found or none of its addresses answers.

kojihubip = "127.0.0.1"

unless kojihubs.empty?

hub = kojihubs.first

# system returns true only when ping exits successfully; the output of

# backticks would be a non-empty string, which is always truthy in Ruby.

if system("ping -q -c3 #{hub['ipaddress']} >/dev/null 2>&1")

kojihubip = hub['ipaddress']

end

if !hub['cloud'].nil? && system("ping -q -c3 #{hub['cloud']['public_ipv4']} >/dev/null 2>&1")

#kojihubip = hub['eip_address']

kojihubip = hub['cloud']['public_ipv4']

end

end

hostsfile_entry "#{kojihubip}" do

hostname 'koji.example.com'

aliases ['kojihub.example.com', 'kojiweb.example.com',

'kojipkgs.example.com']

unique true

comment 'Append by Recipe koji::kojid'

action :append

end

hostsfile_entry "127.0.0.1" do

hostname "#{node['hostname']}.example.com"

unique true

comment 'Append by Recipe koji::kojid'

action :append

end

Figure 40 Cookbook: kojid.rb recipe - part 1


so that we can connect to the PostgreSQL database of the Koji hub. Then we create a directory hierarchy for kojid in /var/koji. In the next step we search the Chef Server for a node that has the hub recipe applied and assign an IP address of this node (its public address, when the node runs in a cloud and that address is reachable) to the variable kojihubip. Using this variable we add the Koji hub’s IP address to the /etc/hosts file.
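The address selection described above can be isolated into a small function that is easy to test without a network. This is a sketch under assumptions: the name pick_hub_ip and the injectable reachable callable are hypothetical, and in the recipe reachability is checked with ping.

```ruby
# Choose the address under which a builder should reach the Koji hub.
# `hubs` is a list of hash-like node objects; `reachable` is any callable
# that answers true/false for an IP address (ping in the recipe).
# Preference order: public cloud address, private address, fallback.
def pick_hub_ip(hubs, reachable, fallback = '127.0.0.1')
  hub = hubs.first
  return fallback if hub.nil?

  candidates = [hub.dig('cloud', 'public_ipv4'), hub['ipaddress']].compact
  candidates.find { |ip| reachable.call(ip) } || fallback
end

# Example node data (addresses are made up for the demonstration).
hubs = [{ 'ipaddress' => '10.0.0.5', 'cloud' => { 'public_ipv4' => '203.0.113.10' } }]
only_private_reachable = ->(ip) { ip.start_with?('10.') }
puts pick_hub_ip(hubs, only_private_reachable)
```

Passing the reachability check as a parameter keeps the decision logic separate from the network call, which makes it possible to unit-test the selection.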

In the next part we create the /mnt/koji directory and mount the NFS share from the Koji hub server under it. Then we create in it the hierarchy of directories required by the kojid service. After this step we install the EPEL yum repository, followed by koji-builder and related packages. Then we provide the /etc/kojid/kojid.conf configuration file using a template.

directory "/mnt/koji" do

owner "root"

group "root"

mode 00755

action :create

end

mount "/mnt/koji" do

device "kojihub.example.com:/mnt/koji"

fstype "nfs"

options "rw"

action [:mount, :enable]

end

%w{koji/mock koji/tmp }.each do |dir|

directory "/mnt/" + dir do

owner "root"

group "root"

mode 00755

action :create

end

end

include_recipe "yum-epel"

%w{mock setarch rpm-build createrepo koji-builder}.each do |pkg|

package pkg do

action :install

end

end

template "/etc/kojid/kojid.conf" do

source "kojid.conf.erb"

mode 0440

owner "root"

group "root"

end

Figure 41 Cookbook: kojid.rb recipe - part 2


In the next part we check whether the Koji hub is available to this kojid node (in other words, we check that no network issue or firewall is blocking connections between the Koji builder and the Koji hub).

If this test passes and the Koji hub is available, we start the kojid service. At this point the Koji cluster should be ready to use.
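A connectivity check of this kind can be written as a small predicate in plain Ruby. The sketch below is an illustration rather than the recipe’s code: the name port_open? is hypothetical, and it relies on the connect_timeout option of Socket.tcp instead of a surrounding Timeout block.

```ruby
require 'socket'

# Return true when a TCP connection to host:port can be opened within
# `timeout` seconds, false when it is refused, unreachable or times out.
# (Hypothetical helper, for illustration only.)
def port_open?(host, port, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Demonstration against a throwaway listener on the loopback interface.
listener = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = listener.addr[1]
puts "while listening: #{port_open?('127.0.0.1', port, 1)}"
listener.close
puts "after closing:   #{port_open?('127.0.0.1', port, 1)}"
```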

test.rb

The test recipe is described in the next section of this chapter, which covers tests of the deployment.

Other files

A cookbook repository usually also contains files not directly related to Chef. These include, among others, README.md, CHANGELOG.md and LICENSE. The first contains a description of the cookbook and its usage documentation, the second is a log of changes made to the cookbook, and the last is the license applied to the cookbook. If the cookbook is kept in a Git repository, then it will also contain a .gitignore file, which lists the files and directories (or glob patterns matching files and directories) that the Git version control system should ignore. The chefignore file works analogously: it determines which files to ignore when uploading cookbooks to the Chef Server (using Berkshelf or Knife).

The Koji cookbook provides only basic information in these files: the author’s name and email, and the license (“All rights reserved”). The CHANGELOG will be updated once the cookbook is made public for the Chef community. README.md contains a short description, installation requirements and a how-to regarding cookbook usage.

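require 'socket'
require 'timeout'

# Set by the ruby_block below while the resources are being converged.
kojihub_available = false

ruby_block "check if koji-hub accepts connections on port 80" do
  block do
    server = kojihubip
    port = 80
    begin
      Timeout.timeout(5) do
        Socket.tcp(server, port){}
      end
      Chef::Log.info('connections open')
      kojihub_available = true
    rescue
      Chef::Log.fatal('connections refused')
    end
  end
end

unless kojihubs.empty? # and userExistsInDB
  service "kojid" do
    action [ :enable, :start ]
    # A guard is evaluated at converge time, after the ruby_block above has
    # run; a plain "if" would be evaluated earlier, when the recipe is compiled.
    only_if { kojihub_available }
  end
end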

Figure 42 Cookbook: kojid.rb recipe - part 3


.gitignore

*~
*#
.#*
\#*#
.*.sw[a-z]
*.un~
pkg/
# Berkshelf
.vagrant
/cookbooks
Berksfile.lock
# Bundler
Gemfile.lock
bin/*
.bundle/*
.kitchen/
.kitchen.local.yml
*sublime*

Chefignore

.DS_Store
Icon?
nohup.out
ehthumbs.db
Thumbs.db
# EDITORS #
\#*
.#*
*~
*.sw[a-z]
*.bak
REVISION
TAGS*
tmtags
*_flymake.*
*_flymake
*.tmproj
.project
.settings
mkmf.log
*sublime*
## COMPILED ##
a.out
*.o
*.pyc
*.so
*.com
*.class
*.dll
*.exe
*/rdoc/
# SCM #
.git
*/.git
.gitignore
.gitmodules
.gitconfig
.gitattributes
.svn
*/.bzr/*
*/.hg/*
*/.svn/*
# Berkshelf #
cookbooks/*
tmp
# Cookbooks #
CONTRIBUTING
CHANGELOG*
# Vagrant #
.vagrant
Vagrantfile

Figure 43 Cookbook: .gitignore and chefignore

The Koji cookbook’s .gitignore and chefignore files are presented in Figure 43. Generally they are complementary: files that should not be visible in the Git repository usually should not be included in the uploaded cookbook either. However, there may be exceptions to this rule.

Tests

Tests were conducted in two ways: manually and automatically. Manual testing consisted of logging in to the machine via SSH, checking the availability and status of the services, and checking the logs of the services installed and configured by Chef. Some information can also be obtained via the Koji client: for instance, it is possible to test authentication, list Koji builder hosts or build a package. However, I decided to test these capabilities using a shell script.

The automatic tests consist of two scripts: a Ruby recipe and a shell script. The first one is the test.rb recipe of the Koji cookbook. As Figure 44 shows, the recipe first copies a cookbook file (files/default/centos6.sh) to /tmp/centos6.sh and then checks whether any Koji builders are available. If there are kojid hosts, the test script is executed; otherwise the string "No kojid hosts found!" is sent to the Chef log.
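The decision made by the recipe depends only on the number of lines printed by the host listing, so it can be sketched without a running Koji hub. In the following illustration the function count_hosts and the sample text are hypothetical; real output of the koji client contains more columns per host.

```ruby
# Count the hosts in the output of a host listing: one non-empty line per host.
# (Hypothetical helper, for illustration only.)
def count_hosts(list_hosts_output)
  list_hosts_output.lines.map(&:strip).reject(&:empty?).size
end

# Made-up sample standing in for the output of the koji client.
sample = "builder1.example.com\nbuilder2.example.com\n"

if count_hosts(sample) > 0
  puts 'builders found, the test script would be executed'
else
  puts 'No kojid hosts found!'
end
```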

cookbook_file "/tmp/centos6.sh" do

source "centos6.sh"

mode '0744'

path "/tmp/centos6.sh"

action :create_if_missing

end

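# Backticks run the command; the same text inside double quotes would be a

# plain string that always converts to 0.

kojidHostsNumber = `koji --user=admin --password=admin list-hosts --quiet | wc -l`.to_i

if kojidHostsNumber > 0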

execute "run /tmp/centos6.sh" do

command "sh /tmp/centos6.sh"

end

else

Chef::Log.fatal("No kojid hosts found!")

end

Figure 44 Cookbook: test.rb recipe


Figure 45 shows the contents of the centos6.sh script. The script consists of four parts. First, we add a tag to Koji that defines a new distribution (CentOS 6 in this case) and an additional sub-tag, the build tag, to which the “x86_64” architecture is attached.

In the second part we add two yum repositories to the build tag: the main CentOS 6 repository and the EPEL repository. We then add a build target that uses this build tag.

In the third part of the script we create virtual yum groups for building RPM packages. There are two groups: one for building SRPMs and another for building RPMs. Each group is assigned the list of packages required for building. For building SRPMs we need such tools as bash, cvs, gnupg, make, redhat-rpm-config, rpm-build, shadow-utils, wget, rpmdevtools. For building RPMs we need: bash, bzip2, coreutils, cpio, diffutils, findutils, gawk, gcc, grep, sed, gcc-c++, gzip, info, patch, redhat-rpm-config, rpm-build, shadow-utils, tar, unzip, util-linux-ng, which, make. After that, the repository related to the build tag is regenerated.

The fourth part of the script builds an actual RPM package (the nginx web server). First, it downloads a src.rpm (SRPM) package. Then we use it to build a scratch (test) version. Finally, we add the package to the Koji database and run the final build. As a result of this script we obtain an RPM package in the /mnt/koji/packages directory.

# tags
koji --user=admin --password=admin add-tag dist-centos6
koji --user=admin --password=admin add-tag --parent dist-centos6 --arches "x86_64" dist-centos6-build

# external repos
koji --user=admin --password=admin add-external-repo -t dist-centos6-build dist-centos6-repo http://centos.bio.lmu.de/6/os/\$arch/
koji --user=admin --password=admin add-external-repo -t dist-centos6-build dist-epel6-repo http://ftp-stud.hs-esslingen.de/pub/epel/6/\$arch/
koji --user=admin --password=admin add-target dist-centos6 dist-centos6-build

# virtual build yum groups
koji --user=admin --password=admin add-group dist-centos6-build build
koji --user=admin --password=admin add-group dist-centos6-build srpm-build
koji --user=admin --password=admin add-group-pkg dist-centos6-build build bash bzip2 coreutils cpio diffutils findutils gawk gcc grep sed gcc-c++ gzip info patch redhat-rpm-config rpm-build shadow-utils tar unzip util-linux-ng which make
koji --user=admin --password=admin add-group-pkg dist-centos6-build srpm-build bash cvs gnupg make redhat-rpm-config rpm-build shadow-utils wget rpmdevtools
koji --user=admin --password=admin regen-repo dist-centos6-build

# build rpm
wget http://nginx.org/packages/rhel/6/SRPMS/nginx-1.6.2-1.el6.ngx.src.rpm -O /tmp/nginx-1.6.2-1.el6.ngx.src.rpm
koji --user=admin --password=admin build --scratch dist-centos6 /tmp/nginx-1.6.2-1.el6.ngx.src.rpm
koji --user=admin --password=admin add-pkg --owner koji dist-centos6 nginx
koji --user=admin --password=admin build dist-centos6 /tmp/nginx-1.6.2-1.el6.ngx.src.rpm

Figure 45 Tests: Building nginx for CentOS 6


Chapter IV: Conclusion

The aim of this research project, as stated in the Introduction, was to show whether, and if so how, a hybrid cloud can be utilized. The results demonstrate that, using DevOps techniques and software, it is possible to productively employ a mixed cloud environment even within a single project.

I personally believe that the project is a success for two reasons. Firstly, the project works as intended: it provides an automated way to establish a Koji cluster in the clouds. Secondly, I learnt in depth the practical side of writing code and adjusting sophisticated configurations, and this is my personal achievement.

The project was a success on OpenStack and Amazon Web Services. However, it seems that few changes would be required to adapt it to other Vagrant providers. The cookbook and Vagrant files are more universal and cloud-independent than I assumed when I started writing this thesis.

Potential applications

The project has an array of potential applications in the IT industry. A potential user of the project is anyone who needs an easy and fast method of deploying a Koji cluster in the clouds to build RPM packages. These are mostly organizations that provide their software for RPM-based Linux distributions.

Koji is used predominantly by Fedora and CentOS developers. Those organizations use Koji in a traditional datacenter provided by Red Hat. For instance, Fedora uses 91 Koji builders for building packages for three different processor architectures.113

Therefore, the most probable application would be by those projects or by a new organization aiming to create a new RPM-based Linux distribution.

Suggestions on further studies and investigations

The project still leaves a lot of room for improvement. First of all, there are a few drawbacks that could have been avoided and a few things that could have been done better, especially the testing of the code. As a suggestion for further development I would recommend extending the project to include Continuous Integration, from source code commit to automated integration tests in the clouds.

Testing

The project lacks proper use of the testing frameworks available for Chef and Ruby. In fact, while writing this code I lacked a proper testing mindset. As a non-developer I wrote the code in an old-fashioned and ineffective way, that is, by writing tests at the very end of the project.

Of the modern, Agile development practices, the one most crucial for creating good code and guarding against unwanted side effects is test-driven development (TDD). For infrastructure developers, the practice is difficult to introduce and implement. However, it promises the biggest return on investment. TDD

113 List of Fedora’s koji builders: http://koji.fedoraproject.org/koji/hosts?start=0&state=enabled&order=name (02/12/2014)



is a widely adopted way of software development that facilitates the creation of highly reliable and

maintainable code.

The philosophy of TDD is encapsulated in the phrase Red, Green, Refactor. This is an iterative approach that

follows six steps114:

1. Write a test based on requirements.

2. Run the test and watch it fail.

3. Write the simplest code you can to make the test pass.

4. Run the test and watch it pass.

5. Improve the code as required to make it perform well, be readable, and reusable, but without

changing its behavior.

6. Repeat the cycle.

Had I followed this procedure, it would have helped me keep the scope from growing and would have revealed design problems early.
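The cycle can be illustrated with a very small example in plain Ruby. It is a sketch only: the helper hosts_line and the check method are hypothetical, and a real cookbook would use a testing framework rather than hand-written checks. The two checks are conceived first (steps 1 and 2); hosts_line is then the simplest code that makes them pass (steps 3 and 4).

```ruby
# A tiny hand-rolled assertion, used here instead of a testing framework.
def check(description)
  raise "FAILED: #{description}" unless yield

  puts "ok: #{description}"
end

# Step 3 (green): the simplest implementation that satisfies the checks below.
# It formats one /etc/hosts record from an IP, a hostname and optional aliases.
def hosts_line(ip, hostname, aliases = [])
  ([ip, hostname] + aliases).join(' ')
end

# Steps 1-2 (red): these checks describe the requirement and fail until
# hosts_line exists and behaves as specified.
check('renders IP and hostname') do
  hosts_line('127.0.0.1', 'koji.example.com') == '127.0.0.1 koji.example.com'
end

check('appends aliases after the hostname') do
  hosts_line('127.0.0.1', 'koji.example.com', ['kojihub.example.com']) ==
    '127.0.0.1 koji.example.com kojihub.example.com'
end
```

Step 5 (refactor) would then change the implementation, for instance to validate the IP address, while the same checks guard against a change in behavior.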

Another testing-related mistake I made was not using the tools available for testing Ruby code and Chef cookbooks. Tools that I should have used in the project include: Cucumber and Leibniz for acceptance testing, Test Kitchen with Serverspec and Bats for integration testing, ChefSpec and RSpec for unit testing, and lastly Foodcritic for linting and static analysis.

Continuous Integration, Deployment and Delivery

An interesting continuation of the project would be to implement a Continuous Integration process that extends the automation from a new code commit, through the testing phase, to deployment. This approach would integrate the automation of software development with the automation of its deployment on infrastructure.

One possible solution would be to use Jenkins CI to implement a build pipeline. Jenkins (using an external plugin) is able to control Vagrant machines. A brief idea of how this could be done is presented by Michael Huttermann in his book "DevOps for Developers" (p. 144). However, in our case we would use Chef instead of Puppet.

114 Stephen Nelson-Smith, op. cit., p. 126


Appendix A: Koji build system

Koji is an RPM build system open sourced by Red Hat and currently used by Red Hat, Fedora, CERN, CentOS, Amazon, TomTom and many other organizations.115

The term "build system" may mean different things to different people. From the developer's (or RPM package maintainer’s) perspective, Koji is a service that accepts build requests and farms them out to a cluster of Koji builders. Koji tracks the resulting packages in its database and supports a tagging system for organizing them. Koji has a web interface, a command line interface, and a rich XML-RPC interface. Originally Koji was limited to building RPMs, but now it also supports building Java packages via Maven. 116 In this project, however, Koji is configured to build RPM packages.

Architecture

The Koji system is divided into four components: koji-hub, koji-web, kojira and the Koji builder (kojid). All of them are written in the Python programming language. Koji-hub and koji-web (Koji’s optional web interface) run on top of the Apache web server. Koji-hub stores data about packages, builds, users, tags and other metadata in a PostgreSQL database. Kojira is a service that maintains order in Koji’s internal repository: it collects “garbage” (old, unused RPM packages and their builds). The Koji builder is a service that handles the building of packages. There can be more than one builder, and builders can be installed on machines with different processor architectures (e.g. x86_64, PPC, ARM) to compile packages for those architectures. As a good practice, Koji files can be kept on an NFS share. Koji components and dependent services, like PostgreSQL or NFS, can run on one server or on separate servers.

To build an RPM package, the Koji builder creates a chroot environment called a buildroot; the package is built inside this isolated environment. The Koji builder first collects other packages in the buildroot: those needed to compile the code and build the RPM, and those that are dependencies of the program being built. To achieve this, the Koji builder uses a tool called Mock. It enables users to reproduce the build environment and debug the process of RPM building.

Once completed, a build is imported into Koji's database and tagged. Koji's tagging system is very flexible and

can support build configurations for many different projects in the same instance.

115 http://fedoraproject.org/wiki/Koji/RunsHere (10/11/2014)

116 http://opensource.com/life/11/7/free-sake-story-koji (10/11/2014)


Figure 46 Koji services diagram

Koji-hub

Koji-hub is the central element of the Koji system and works as a mediator between all other Koji components, the database and the filesystem. It is an XML-RPC server running under mod_wsgi in Apache. Koji-hub is passive in that it only receives XML-RPC calls and relies upon the build daemons and other components to initiate communication. Koji-hub is the only component that has direct access to the database (PostgreSQL) and is one of the two components that have write access to the file system. Koji-hub also serves as an authentication system for Koji’s other services and users.


Koji-web

Koji-web is a set of scripts that run in mod_wsgi and use the Cheetah templating engine to provide a web interface to Koji. It acts as a client to koji-hub, providing a visual interface to perform a limited amount of administration. Koji-web exposes a lot of information and also provides a means for certain operations, such as cancelling builds. It is an optional element that provides a web interface for browsing packages, users, Koji builders, running tasks (e.g. package building tasks) and build logs. It is not the primary tool for managing the Koji system (the Koji client is used for that). Its main goal is to visualize packages and their building tasks.117

Kojira

Kojira is a daemon that keeps the build root repodata updated. It is responsible for removing redundant build roots and cleaning up after a build request is completed. This service maintains order in Koji’s internal repository: it updates the repository and deletes unused elements. It is visible neither to Koji’s users nor to its administrators, although it plays a major role in the system.

Koji builder (kojid)

The Koji builder is a service that builds packages. Koji-hub can manage one or more Koji builders. kojid is the build daemon that runs on each of the build machines. Its primary responsibility is polling for incoming build requests and handling them accordingly. Essentially, kojid asks koji-hub for work. Koji also has support for tasks other than building. Creating install images is one example. kojid is responsible for handling these tasks as well. kojid uses mock for building. It also creates a fresh buildroot for every build. kojid is written in Python and communicates with koji-hub via XML-RPC.

Mock

Mock is a tool for building packages. It can build packages for architectures and Fedora or RHEL versions different from those of the build host. Mock creates chroots and builds packages in them. Its only task is to reliably populate a chroot and attempt to build a package in that chroot.118

The Koji builder runs mock to build RPM packages. For each build, kojid creates a directory in /var/lib/mock/ whose name begins with “dist-”. In this directory mock downloads dependencies and build tools, creates its chroot directory (buildroot) to build the package, and stores the logs of the build.

Koji client

Koji-client is a CLI tool written in Python that provides many hooks into Koji. It allows the user to query much of the data as well as perform actions such as adding users and initiating build requests.

117 Example of koji-web interface: http://koji.fedoraproject.org/koji/ (22/09/2014)

118 https://fedoraproject.org/wiki/Projects/Mock (02/10/2014)


Additional tools

Authentication options

There are three methods of authentication in Koji: login/password, SSL certificates and Kerberos. During the installation the Koji administrator has to choose one of them.

YUM repository generation

A YUM repository can be created either manually, by copying the packages from /mnt/koji and running createrepo, or it can be generated from Koji’s tag/target using mash.
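The manual variant can be sketched in Ruby as follows. This is an illustration under assumptions: the function collect_rpms is hypothetical, leaving out source packages (*.src.rpm) is my own choice, and the demonstration works on a temporary directory rather than on /mnt/koji. On a real host, createrepo would then be run on the target directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Copy all binary RPMs found below `source_dir` into `repo_dir` and return
# the sorted list of copied file names. Source packages are skipped.
# (Hypothetical helper, for illustration only.)
def collect_rpms(source_dir, repo_dir)
  FileUtils.mkdir_p(repo_dir)
  rpms = Dir.glob(File.join(source_dir, '**', '*.rpm'))
            .reject { |path| path.end_with?('.src.rpm') }
  rpms.each { |rpm| FileUtils.cp(rpm, repo_dir) }
  rpms.map { |rpm| File.basename(rpm) }.sort
end

# Demonstration with empty placeholder files in a temporary directory.
Dir.mktmpdir do |root|
  packages = File.join(root, 'packages', 'nginx')
  FileUtils.mkdir_p(packages)
  FileUtils.touch(File.join(packages, 'nginx-1.6.2-1.el6.ngx.x86_64.rpm'))
  FileUtils.touch(File.join(packages, 'nginx-1.6.2-1.el6.ngx.src.rpm'))

  copied = collect_rpms(File.join(root, 'packages'), File.join(root, 'repo'))
  puts "copied: #{copied.inspect}"
  # Next step on a real host: run createrepo on the repo directory.
end
```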

Integration with Source Control Manager

Koji can build sources either from an SRPM file or by pulling them from an SCM (a Git repository, for instance). Supported SCMs are Git, SVN, Mercurial, and CVS.

ISO generation

Revisor and its fork Pungi are tools that build an ISO image (live or installation) of a Linux system from a YUM repository. These tools do not integrate with Koji, but they are often used once the YUM repository is ready. This is a simple way to produce one’s own Linux distribution.

GPG signing of packages

Sigul is a tool for signing RPM packages with a GPG key; it is used, among others, in the Fedora project. It integrates easily with Koji. However, a user can also sign packages in the traditional way, i.e. using the rpm --sign command.


Appendix B: Project’s Vagrant files

This appendix provides the Vagrant configuration files for each infrastructure: the VirtualBox version, the OpenStack version, the Amazon Web Services version and the “production” version, which uses a mixed infrastructure of OpenStack and Amazon Web Services.

Vagrantfile.vbox

# -*- mode: ruby -*-

# vi: set ft=ruby :

# Vagrant required plugins installation.

required_plugins = %w( vagrant-omnibus vagrant-berkshelf vagrant-aws )

required_plugins.each do |plugin|

system "vagrant plugin install #{plugin}" unless Vagrant.has_plugin? plugin

end

# Vagrant conflicting plugins

# vagrant-omnibus

# Vagrantfile API/syntax version.

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

config.berkshelf.berksfile_path = "Berksfile"

# CentOS has "Defaults requiretty" in /etc/sudoers

# config.ssh.pty = true

# Chef zero

config.vm.define "zerodev" do |zerodev|

zerodev.vm.hostname = "zerodev"

zerodev.vm.synced_folder '.', '/vagrant'

# Set the version of chef to install using the vagrant-omnibus plugin

zerodev.omnibus.chef_version = :latest

# Every Vagrant virtual environment requires a box to build off of.

# If this value is a shorthand to a box in Vagrant Cloud then

zerodev.vm.box = "chef/centos-6.5"

# Assign this VM to a host-only network IP, allowing you to access it

# via the IP. Host-only networks can talk to the host machine as well as

# any other machines on the same network, but cannot be accessed (through this

# network interface) by any external networks.

zerodev.vm.network :private_network, ip: "33.33.33.3"

zerodev.vm.network :forwarded_port, host: 4000, guest: 4000


# Provision chef-zero using chef_solo

zerodev.vm.provision "chef_solo" do |chef_solo|

chef_solo.log_level = :debug

chef_solo.data_bags_path = "./data_bags/"

chef_solo.json =

{

'build-essential' => {

compile_time: true

},

'chef-zero' => {

install: true,

start: true

},

'chef' => {

server_url: "http://127.0.0.1:4000"

}

}

chef_solo.run_list = [


"recipe[build-essential::default]",

"recipe[chef-zero::default]",

"recipe[chef::client]"

]

end

# Run chef-zero

zerodev.vm.provision :shell,

:inline => "/opt/chef/embedded/bin/chef-zero -H 0.0.0.0 -p 4000 -d"

# Run chef-client

zerodev.vm.provision :shell, :path => "scripts/bash_scripts/chef-configuration.sh",

:args => "http://127.0.0.1:4000"

end

# Koji hub

config.vm.define "kojidev" do |kojidev|

kojidev.vm.hostname = "kojidev"

kojidev.vm.synced_folder '.', '/vagrant'


kojidev.omnibus.chef_version = :latest

# Every Vagrant virtual environment requires a box to build off of.

# If this value is a shorthand to a box in Vagrant Cloud then

kojidev.vm.box = "chef/centos-6.5"

# Assign this VM to a host-only network IP, allowing you to access it

# via the IP. Host-only networks can talk to the host machine as well as

# any other machines on the same network, but cannot be accessed (through

# this network interface) by any external networks.

kojidev.vm.network :private_network, ip: "33.33.33.30"


kojidev.vm.provision "chef_client" do |chef|

chef.log_level = :debug

chef.chef_server_url = "http://33.33.33.3:4000"

chef.validation_key_path = "/home/tklosinski/m/koji/.chef-backup/dummy_key.pem"

chef.json =

{

postgresql: {

password: {

postgres: '123123'

}

},

apache: {

listen_ports: ['80', '443'],


},

selinux: {

state: 'disabled'

}

}

chef.run_list = [


"recipe[koji::kojid]",

"recipe[koji::default]"

]

end

end

end
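The plugin check at the top of Vagrantfile.vbox can also be sketched in isolation. The version below only computes which plugins are missing and leaves the installation to the caller; the helper name `missing_plugins` is hypothetical, and the predicate is passed in as a block so that the logic can be exercised without Vagrant being loaded.

```ruby
# Hypothetical helper (not part of the thesis project): returns the
# required plugins for which the given predicate block returns false.
def missing_plugins(required, &installed)
  required.reject { |name| installed.call(name) }
end

# Inside a Vagrantfile the predicate would be Vagrant.has_plugin?, e.g.:
#
#   missing = missing_plugins(required_plugins) { |p| Vagrant.has_plugin?(p) }
#   missing.each { |p| system "vagrant plugin install #{p}" }
```

Separating the check from the installation is a design choice of this sketch: it makes it possible to report all missing plugins at once before anything is installed.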

Vagrantfile.openstack


# vi: set ft=ruby :

require 'rubygems'

require 'fog'

require 'chef'


required_plugins = %w( vagrant-omnibus vagrant-berkshelf vagrant-aws

vagrant-openstack-provider chef ) # vagrant-chef-kojibuilder )



end





# CentOS has "Defaults requiretty" in /etc/sudoers


config.berkshelf.berksfile_path = "./Berksfile"




#kojibuilder.vm.synced_folder '.', '/vagrant', :disabled => true

kojibuilder.vm.synced_folder ".", "/vagrant", type:

"rsync", :rsync_excludes => ['bar/', 'foo/']


kojibuilder.omnibus.chef_version = :latest

# This is not used in fact, Vagrant just requires some box.


kojibuilder.vm.box_url = "https://github.com/cloudbau/vagrant-openstack-plugin/raw/master/dummy.box"













end




chef.log_level = :debug




chef.json =

{

selinux: {

state: 'disabled'

}

}

chef.run_list = [


]



end

end

end

Vagrantfile.aws


# vi: set ft=ruby :

require 'rubygems'

require 'chef'

Chef::Config.from_file(File.join(File.dirname(__FILE__), '.chef',

'knife.rb'))


required_plugins = %w( chef vagrant-omnibus vagrant-berkshelf vagrant-aws




end








# Koji hub

config.vm.define "koji" do |koji|

koji.vm.hostname = "koji"

koji.vm.synced_folder '.', '/vagrant'




koji.vm.box = "dummy.box"

koji.vm.box_url = "https://github.com/mitchellh/vagrant-


# AWS provider

koji.vm.provider :aws do |aws, override|









aws.security_groups = [ "koji" ]







aws.tags = {

'Name' => 'koji'

}

end


koji.vm.provision "chef_client" do |chef|

# chef.arguments = "--splay 75"


chef.chef_server_url = Chef::Config[:chef_server_url]

chef.log_level = Chef::Config[:log_level]

chef.validation_key_path = Chef::Config[:validation_key]

chef.validation_client_name = Chef::Config[:validation_client_name]

chef.json =

{



},

postgresql: {

password: {

postgres: '123123',

port: 5432

}

},

apache: {

listen_ports: ['80', '443'],


},

selinux: {

state: 'disabled'

}

}

chef.run_list = [


"recipe[koji::kojid]",



]



end

end

end
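The `chef.json` hashes in the listings above embed the PostgreSQL password as a literal. As a hedged alternative, the attributes could be built by a small helper that reads the password from the environment. The helper name `koji_attributes` and the variable name `KOJI_POSTGRES_PASSWORD` are assumptions made for this sketch; the attribute keys follow the hash used in Vagrantfile.vbox.

```ruby
# Hypothetical helper (not part of the thesis project): builds node
# attributes like those passed to chef.json in Vagrantfile.vbox.
# The password is read from KOJI_POSTGRES_PASSWORD (an assumed variable
# name); fetch raises KeyError when the variable is not set, so a
# missing secret fails early instead of silently using a default.
def koji_attributes(env = ENV)
  {
    postgresql: {
      password: { postgres: env.fetch('KOJI_POSTGRES_PASSWORD') }
    },
    apache: { listen_ports: %w[80 443] },
    selinux: { state: 'disabled' }
  }
end
```

In a provisioning block this would be used as `chef.json = koji_attributes`, with the variable exported in the shell that runs Vagrant.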

Vagrantfile.production


# vi: set ft=ruby :

require 'rubygems'

require 'chef'

Chef::Config.from_file(File.join(File.dirname(__FILE__), '.chef',

'knife.rb'))


required_plugins = %w( chef vagrant-omnibus vagrant-berkshelf vagrant-aws




end










kojihub.vm.hostname = "koji"

kojihub.vm.synced_folder '.', '/vagrant'


kojihub.omnibus.chef_version = :latest


kojihub.vm.box = "dummy.box"

kojihub.vm.box_url = "https://github.com/mitchellh/vagrant-


# AWS provider

kojihub.vm.provider :aws do |aws, override|









aws.security_groups = [ "koji" ]







aws.tags = {

'Name' => 'koji'

}

end


kojihub.vm.provision "chef_client" do |chef|





chef.json =

{



},

postgresql: {

password: {

postgres: '123123',

port: 5432

}

},

apache: {

listen_ports: ['80', '443'],


},

selinux: {

state: 'disabled'

}

}

chef.run_list = [

"recipe[nfs::server]",




]



end

end




#kojibuilder.vm.synced_folder '.', '/vagrant', :disabled => true

kojibuilder.vm.synced_folder ".", "/vagrant", type:

"rsync", :rsync_excludes => ['bar/', 'foo/']


kojibuilder.omnibus.chef_version = :latest



kojibuilder.vm.box_url = "https://github.com/cloudbau/vagrant-openstack-plugin/raw/master/dummy.box"













os.volumes = [

{

id: 'f9976f16-3d9d-499a-86c1-42247588b3da',

device: '/dev/vdb'

}

]

os.user_data = "#!/bin/bash

echo 'Defaults:#{ENV['OS_SSH_USERNAME']} !requiretty' >




touch /etc/chef/ohai/hints/openstack.json

(echo o; echo n; echo p; echo 1; echo ; echo; echo w) | fdisk /dev/vdb

mkfs.ext4 /dev/vdb1

mkdir -p /var/koji

mkdir -p /var/koji/mock

mkdir -p /var/koji/tmp

mount /dev/vdb1 /var/koji"

end







chef.json =

{

selinux: {

state: 'disabled'

}

}

chef.run_list = [

"recipe[nfs]",



]



end

end

end
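The disk preparation part of the `os.user_data` script in Vagrantfile.production can be sketched as a helper that renders the shell commands from parameters. The helper name `disk_setup_script` and its keyword arguments are hypothetical; the commands themselves (`fdisk`, `mkfs.ext4`, `mkdir -p`, `mount`) are the ones used in the listing. Unlike the listing, the sketch creates the subdirectories after mounting, so that they end up on the mounted volume rather than being hidden underneath it; this ordering is a choice made for the sketch.

```ruby
# Hypothetical helper (not part of the thesis project): renders a bash
# script that creates one primary partition on the given device, formats
# it as ext4, mounts it and then creates subdirectories on the volume.
def disk_setup_script(device:, mount_point:, subdirs: [])
  partition = "#{device}1"
  lines = [
    '#!/bin/bash',
    # Feed fdisk its interactive answers: new DOS label, new primary
    # partition 1 with default start and end, then write the table.
    "(echo o; echo n; echo p; echo 1; echo; echo; echo w) | fdisk #{device}",
    "mkfs.ext4 #{partition}",
    "mkdir -p #{mount_point}",
    "mount #{partition} #{mount_point}"
  ]
  # Subdirectories are created after the mount so they live on the volume.
  lines += subdirs.map { |dir| "mkdir -p #{File.join(mount_point, dir)}" }
  lines.join("\n") + "\n"
end
```

With the values from the listing, the call would be `disk_setup_script(device: '/dev/vdb', mount_point: '/var/koji', subdirs: %w[mock tmp])`, and the result could be assigned to `os.user_data`.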

Figures

Figure 1 AWS Global Infrastructure (Regions)
Figure 2 List of AWS Regions and Locations
Figure 3 AWS - Availability Zones
Figure 4 AWS – Services
Figure 5 Basic elements of DevOps software development method
Figure 6 Shell environment variables
Figure 7 Gemfile
Figure 8 Flow of deployment
Figure 9 Git: Bitbucket.com repository
Figure 10 Vagrant: List of plugins used
Figure 11 Vagrantfile: plugins installation
Figure 12 Vagrantfile: API’s and syntax’s version
Figure 13 Vagrantfile: Config specific to installation
Figure 14 Vagrantfile: Definitions of the VMs
Figure 15 Vagrantfile: AWS provider
Figure 16 Vagrantfile: Amazon dummy box
Figure 17 Vagrantfile: Chef provisioning of koji hub
Figure 18 Vagrantfile: OpenStack dummy box
Figure 19 Vagrantfile: OpenStack provider
Figure 20 Vagrantfile: Chef provisioning of Koji builder
Figure 21 Chef: Search in a recipe
Figure 22 Chef: knife.rb configuration file
Figure 23 Berkshelf: Berksfile
Figure 24 Cookbook: metadata
Figure 25 Cookbook: Attributes
Figure 26 Cookbook: Template /etc/koji.conf
Figure 27 Cookbook: Template /etc/httpd/conf.d/kojihub.conf
Figure 28 Cookbook: Template /etc/koji-hub/hub.conf
Figure 29 Cookbook: Template /etc/kojira/kojira.conf
Figure 30 Cookbook: Template /etc/kojid/kojid.conf
Figure 31 Cookbook: default.rb recipe
Figure 32 Cookbook: client.rb recipe
Figure 33 Cookbook: hub.rb recipe - part 1
Figure 34 Cookbook: hub.rb recipe - part 2
Figure 35 Cookbook: database.rb - part 1
Figure 36 Cookbook: database.rb recipe - part 2
Figure 37 Cookbook: kojira.rb recipe
Figure 38 Cookbook: builder.rb recipe - part 1
Figure 39 Cookbook: builder.rb recipe - part 2
Figure 40 Cookbook: kojid.rb recipe - part 1
Figure 41 Cookbook: kojid.rb recipe - part 2
Figure 42 Cookbook: kojid.rb recipe - part 3
Figure 43 Cookbook: .gitignore and chefignore
Figure 44 Cookbook: test.rb recipe
Figure 45 Tests: Building nginx for CentOS 6
Figure 46 Koji services diagram

Bibliography Baun, C., Kunze, M., Nimis, J., & Tai, S. (2011). Cloud Computing: Web-Based Dynamic IT Services. Springer.

Beach, B. (2014). Pro PowerShell for Amazon Web Services. Apress.

Furht, B., & Escalante, A. (2010). Handbook of Cloud Computing. Springer.

Hurwitz, J., Bloor, R., Kaufman, M., & Halper, F. (2009). Cloud Computing for Dummies. For Dummies.

Huttermann, M. (2012). DevOps for Developers. Apress.

Marschall, M. (2013). Chef Infrastructure Automation Cookbook. Packt Publishing.

Mell, P., & Grance, T. (2011). The NIST Definition of Cloud Computing. National Institutes of Technology, U.S.

Department of Commerce. Retrieved from http://csrc.nist.gov/publications/nistpubs/800-

145/SP800-145.pdf

Nelson-Smith, S. (2013). Test-Driven Infrastructure with Chef, 2nd Edition. O'Reilly Media.

Pepple, K. (2011). Deploying OpenStack. Sebastopol : O'Reilly Media.

Rittinghouse, J. W., & Ransome, J. F. (2009). Cloud Computing: Implementation, Management and Security.

USA: CRC Press.

Sabharwal, N., & Wadhwa, M. (2014). Automation through Chef Opscode. Apress.

Sarna, D. E. (2010). Implementing and Developing Cloud Computing Applications. Auerbach Publications.

Sitaram, D., & Manjunath, G. (2011). Moving To The Cloud: Developing Apps in the New World of Cloud

Computing. Syngress.

Stellman, A., & Greene, J. (2014). Learning Agile. O'Reilly Media.

Velte, A., Velte, T. J., & Elsenpeter, R. C. (2009). Cloud Computing, A Practical Approach. USA: McGraw-Hill

Prof Med/Tech.
