
Page 1

The Grid: Blueprint for a New Computing Infrastructure
Editors: Ian Foster and Carl Kesselman, 1999, Morgan Kaufmann Publishers, Inc.
Two introductory chapters and 20 technical chapters: Applications, Programming tools, Services, Infrastructures

Page 2

Three papers in today's lecture:

- What is the Grid? A Three Point Checklist, by Ian Foster, Argonne National Laboratory & University of Chicago, on his home page (http://bobcat.mcs.anl.gov/~foster/), July 2002

- The Grid: A New Infrastructure for 21st Century Science, by Ian Foster, in Physics Today (Feb 2002)

- The Anatomy of the Grid: Enabling Scalable Virtual Organizations, by Ian Foster, Carl Kesselman, and Steven Tuecke, International Journal of Supercomputer Applications, 2001

Page 3

What is the Grid? A Three Point Checklist

The recent explosion of commercial and scientific interest in the Grid makes it timely to revisit the question: What is the Grid, anyway?

Grids have moved from the obscurely academic to the highly popular. We read about Compute Grids, Data Grids, Science Grids, Access Grids, Knowledge Grids, Bio Grids, Sensor Grids, Cluster Grids, Campus Grids, Tera Grids, and Commodity Grids.

If by deploying a scheduler on my local area network I create a “Cluster Grid,” then doesn’t my Network File System deployment over that same network provide me with a “Storage Grid?” Indeed, isn’t my workstation, coupling as it does processor, memory, disk, and network card, a “PC Grid?” Is there any computer system that isn’t a Grid?

Page 4

Ultimately the Grid must be evaluated in terms of the applications, business value, and scientific results that it delivers, not its architecture.

Nevertheless, the questions above must be answered if Grid computing is to obtain the credibility and focus that it needs to grow and prosper.

Early definitions: back in 1998, Carl Kesselman and Ian Foster attempted a definition in the book The Grid: Blueprint for a New Computing Infrastructure: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”

Not the first to talk about on-demand access to computing, data, and services. For example, in 1969 Len Kleinrock suggested presciently, if prematurely: “We will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”

Page 5

In “The Anatomy of the Grid,” co-authored with Steve Tuecke in 2000, we refined the definition to address social and policy issues, stating that Grid computing is concerned with “coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”

The key concept is the ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose.

We noted: “The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization.”

Page 6

We also spoke to the importance of standard protocols as a means of enabling interoperability and common infrastructure.

A Grid Checklist

I suggest that the essence of the definitions above can be captured in a simple checklist, according to which a Grid is a system that:

1) coordinates resources that are not subject to centralized control … (A Grid integrates and coordinates resources and users that live within different control domains—for example, the user’s desktop vs. central computing; different administrative units of the same company; or different companies; and addresses the issues of security, policy, payment, membership, and so forth that arise in these settings. Otherwise, we are dealing with a local management system.)

Page 7

2) … using standard, open, general-purpose protocols and interfaces … (A Grid is built from multi-purpose protocols and interfaces that address such fundamental issues as authentication, authorization, resource discovery, and resource access. As I discuss further below, it is important that these protocols and interfaces be standard and open. Otherwise, we are dealing with an application-specific system.)

3) … to deliver nontrivial qualities of service. (A Grid allows its constituent resources to be used in a coordinated fashion to deliver various qualities of service, relating for example to response time, throughput, availability, and security, and/or co-allocation of multiple resource types to meet complex user demands, so that the utility of the combined system is significantly greater than that of the sum of its parts.)
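To make the checklist concrete, the sketch below encodes the three criteria as a simple predicate. It is purely illustrative; the field names are invented for this example and do not come from any Grid toolkit.

```python
# Hypothetical sketch: the three-point checklist as a predicate over a
# system description. Field names are illustrative, not from a real API.
from dataclasses import dataclass

@dataclass
class SystemDescription:
    control_domains: int            # independent administrative domains involved
    protocols_standard: bool        # are the protocols standard?
    protocols_open: bool            # ... and open, general-purpose?
    delivers_nontrivial_qos: bool   # does coordinated use yield nontrivial QoS?

def is_grid(s: SystemDescription) -> bool:
    """All three criteria must hold."""
    no_central_control = s.control_domains > 1                       # criterion 1
    open_standard = s.protocols_standard and s.protocols_open        # criterion 2
    return no_central_control and open_standard and s.delivers_nontrivial_qos  # criterion 3

# A cluster scheduler on one LAN: single control domain, so not a Grid.
print(is_grid(SystemDescription(1, False, False, True)))   # False
# A multi-institution deployment with open protocols and coordinated QoS.
print(is_grid(SystemDescription(5, True, True, True)))     # True
```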

Page 8

The checklist still leaves room for reasonable debate, concerning for example what is meant by “centralized control,” “standard, open, general-purpose protocols,” and “qualities of service.”

Examples of systems that are not Grids: A cluster management system such as Sun’s Sun Grid Engine, Platform’s Load Sharing Facility, or Veridian’s Portable Batch System can, when installed on a parallel computer or local area network, deliver quality-of-service guarantees and thus constitute a powerful Grid resource. However, such a system is not a Grid itself, due to its centralized control of the hosts that it manages: it has complete knowledge of system state and user requests, and complete control over individual components.

At a different scale, the Web is not (yet) a Grid: its open, general-purpose protocols support access to distributed resources but not the coordinated use of those resources to deliver interesting qualities of service.

Page 9

On the other hand, deployments of multi-site schedulers such as Platform’s MultiCluster can reasonably be called (first-generation) Grids—as can distributed computing systems provided by Condor, Entropia, and United Devices, which harness idle desktops; peer-to-peer systems such as Gnutella, which support file sharing among participating peers; and a federated deployment of the Storage Resource Broker, which supports distributed access to data resources.

While arguably the protocols used in these systems are too specialized to meet criterion 2 (and are not, for the most part, open or standard), each does integrate distributed resources in the absence of centralized control, and delivers interesting qualities of service, albeit in narrow domains.

Page 10

The three criteria apply most clearly to the various large-scale Grid deployments being undertaken within the scientific community, such as the distributed data processing system being deployed internationally by “Data Grid” projects (GriPhyN, PPDG, EU DataGrid, iVDGL, DataTAG), NASA’s Information Power Grid, the Distributed ASCI Supercomputer (DAS-2) system that links clusters at five Dutch universities, the DOE Science Grid and DISCOM Grid that link systems at DOE laboratories, and the TeraGrid being constructed to link major U.S. academic sites.

Each of these systems integrates resources from multiple institutions, each with their own policies and mechanisms; uses open, general-purpose (Globus Toolkit) protocols to negotiate and manage sharing; and addresses multiple quality of service dimensions, including security, reliability, and performance.

Page 11

The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.

The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource brokering strategies emerging in industry, science, and engineering.

This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs.

A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).

Page 12

The following are examples of VOs: the application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to perform scenario evaluation during planning for a new factory; members of an industrial consortium bidding on a new aircraft; a crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation; and members of a large, international, multiyear high energy physics collaboration.

Each of these examples represents an approach to computing and problem solving based on collaboration in computation- and data-rich environments.

Page 13

As these examples show, VOs vary tremendously in their purpose, scope, size, duration, structure, community, and sociology. Nevertheless, careful study of underlying technology requirements leads us to identify a broad set of common concerns and requirements.

In particular, we see a need for:

- highly flexible sharing relationships, ranging from client-server to peer-to-peer;
- sophisticated and precise levels of control over how shared resources are used, including fine-grained and multi-stakeholder access control, delegation, and application of local and global policies;
- sharing of varied resources, ranging from programs, files, and data to computers, sensors, and networks; and
- diverse usage modes, ranging from single user to multi-user and from performance-sensitive to cost-sensitive, and hence embracing issues of quality of service, scheduling, co-allocation, and accounting.

Page 14

Current distributed computing technologies do not address the concerns and requirements just listed. For example, current Internet technologies address communication and information exchange among computers but do not provide integrated approaches to the coordinated use of resources at multiple sites for computation.

Business-to-business exchanges focus on information sharing (often via centralized servers). So do virtual enterprise technologies, although here sharing may eventually extend to applications and physical devices.

Enterprise distributed computing technologies such as CORBA and Enterprise Java enable resource sharing within a single organization.

The Open Group’s Distributed Computing Environment (DCE) supports secure resource sharing across sites, but most VOs would find it too burdensome and inflexible.

Page 15

Storage service providers (SSPs) and application service providers (ASPs) allow organizations to outsource storage and computing requirements to other parties, but only in constrained ways: for example, SSP resources are typically linked to a customer via a virtual private network (VPN). Emerging “Distributed computing” companies seek to harness idle computers on an international scale but, to date, support only highly centralized access to those resources.

In summary, current technology either does not accommodate the range of resource types or does not provide the flexibility and control on sharing relationships needed to establish VOs.

Page 16

It is here that Grid technologies enter the picture. Over the past five years, research and development efforts within the Grid community have produced protocols, services, and tools that address precisely the challenges that arise when we seek to build scalable VOs.

These technologies include security solutions that support management of credentials and policies when computations span multiple institutions; resource management protocols and services that support secure remote access to computing and data resources and the co-allocation of multiple resources; information query protocols and services that provide configuration and status information about resources, organizations, and services; and data management services that locate and transport datasets between storage systems and applications.

Page 17

Because of their focus on dynamic, cross-organizational sharing, Grid technologies complement rather than compete with existing distributed computing technologies. For example, enterprise distributed computing systems can use Grid technologies to achieve resource sharing across institutional boundaries; in the ASP/SSP space, Grid technologies can be used to establish dynamic markets for computing and storage resources, hence overcoming the limitations of current static configurations.

Page 18

The Grid: The Need for InterGrid Protocols

My checklist speaks to what it means to be “a Grid,” yet the title of this article asks what is “the Grid.”

This is an important distinction. The Grid vision requires protocols (and interfaces and policies) that are not only open and general-purpose but also standard.

It is standards that allow us to establish resource-sharing arrangements dynamically with any interested party and thus to create something more than a plethora of balkanized, incompatible, non-interoperable distributed systems. Standards are also important as a means of enabling general-purpose services and tools.

Page 19

The definition of standard “InterGrid” protocols is the single most critical problem facing the Grid community today.

On the standards side, we have the increasingly effective Global Grid Forum. On the practical side, six years of experience and refinement have produced a widely used de facto standard, the open source Globus Toolkit.

The Globus Toolkit provides the software tools and services necessary to build a computational grid infrastructure and to develop applications that can exploit the advanced capabilities of the grid. Using the basic services provided by the toolkit, researchers may build a range of higher-level capabilities. For example, Globus provides a complete implementation of the Message Passing Interface (MPI) that can run across heterogeneous collections of computers.
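As a minimal illustration of the kind of MPI program such a Grid-enabled implementation would run, here is a hedged sketch using the mpi4py bindings; it is not Globus- or MPICH-G-specific, and the hostname gathering is only there to show where the processes landed.

```python
# Minimal MPI sketch (mpi4py bindings). A Grid-enabled MPI such as MPICH-G
# would let the same program span machines at different sites.
import socket
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id within the communicator
size = comm.Get_size()      # total number of processes

# Gather every rank's hostname at process 0 to see where the job ran.
hosts = comm.gather(socket.gethostname(), root=0)
if rank == 0:
    print(f"{size} processes on hosts: {hosts}")
```

Run, for example, with `mpiexec -n 4 python hello.py`.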

Page 20

Figure: The hourglass principle, as applied in the Internet Protocol suite, Globus resource management services, and Globus communication services.

Page 21

Globus Toolkit Principles

The toolkit comprises a set of components that implement basic services for security, resource location, resource management, communication, etc. Rather than providing a uniform programming model, such as the object-oriented model defined by the Legion system, the Globus toolkit provides a “bag of services” from which developers of specific tools or applications can select to meet their needs.

The toolkit distinguishes between local services, which are kept simple to facilitate deployment, and global services, which are constructed on top of local services and may be more complex. Computational grids require that a wide range of services be supported on a highly heterogeneous mix of systems and that it be possible to define new services without changing the underlying infrastructure (like the layered structure of the Internet!).

A simple, well-defined interface---the neck of the hourglass---provides uniform access to diverse implementations of local services; higher-level global services are then defined in terms of this interface.
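The hourglass idea can be sketched in a few lines: a deliberately narrow local-service interface (the neck), diverse implementations below it, and a global service written only against the neck. All names here are hypothetical.

```python
# Illustrative sketch of the hourglass principle (names are hypothetical):
# a small "neck" interface gives uniform access to diverse local services,
# and higher-level global services are written only against that interface.
from abc import ABC, abstractmethod

class LocalJobService(ABC):                  # the neck of the hourglass
    @abstractmethod
    def submit(self, command: str) -> str: ...

class ForkJobService(LocalJobService):       # trivial local implementation
    def submit(self, command: str) -> str:
        return f"fork:{command}"

class BatchQueueJobService(LocalJobService): # e.g., fronts a batch scheduler
    def submit(self, command: str) -> str:
        return f"queue:{command}"

def run_everywhere(services: list[LocalJobService], command: str) -> list[str]:
    """A 'global' service: run the same command across sites without
    knowing anything about each site's local mechanism."""
    return [s.submit(command) for s in services]

print(run_everywhere([ForkJobService(), BatchQueueJobService()], "simulate.exe"))
```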

Page 22

Interfaces are defined so as to manage heterogeneity, rather than hiding it. These so-called translucent interfaces provide structured mechanisms by which tools and applications can discover and control aspects of the underlying system.

Such translucency can have significant performance advantages because, if an implementation of a higher-level service can understand characteristics of the lower-level services on which the interface is layered, then the higher-level service can either control specific behaviors of the underlying service or adapt its own behavior to that of the underlying service.

Translucent interfaces do not imply complex interfaces. Indeed, we will show that translucency can be provided via simple techniques, such as adding an attribute argument to the interface.
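A hedged sketch of the attribute-argument technique: the call keeps one simple signature, while an optional attribute dictionary lets a higher-level service discover or control aspects of the underlying implementation. The parameter names are illustrative, not from the Globus APIs.

```python
# Sketch of a "translucent" interface (hypothetical names): one simple
# signature, plus an attribute dictionary for callers that want to discover
# or control properties of the underlying implementation.
def transfer(src: str, dst: str, attributes: dict | None = None) -> dict:
    attributes = attributes or {}
    # The implementation reports what it actually did, so a higher-level
    # service can adapt (e.g., retry with more streams on a fast WAN path).
    used = {
        "streams": attributes.get("parallel_streams", 1),
        "tcp_buffer": attributes.get("tcp_buffer", 64 * 1024),
    }
    print(f"copy {src} -> {dst} using {used}")
    return used

# A naive caller ignores the attributes; a tuned caller controls behavior.
transfer("siteA:/data/run1", "siteB:/scratch/run1")
transfer("siteA:/data/run1", "siteB:/scratch/run1",
         {"parallel_streams": 8, "tcp_buffer": 1 << 20})
```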

Page 23

An information service is an integral component of the toolkit. Computational grids are in a constant state of flux as utilization and availability of resources change, computers and networks fail, old components are retired, new systems are added, and software and hardware on existing systems are updated and modified.

It is rarely feasible for programmers to rely on standard or default configurations when building applications. Rather, applications must discover characteristics of their execution environment dynamically and then either configure aspects of system and application behavior for efficient, robust execution or adapt behavior during program execution.

A fundamental requirement for discovery, configuration, and adaptation is an information-rich environment that provides pervasive and uniform access to information about the current state of the grid and its underlying components.

Page 24

The toolkit uses standards whenever possible for both interfaces and implementations. We envision computational grids as supporting an important niche of applications that must co-exist with more general-purpose distributed and networked computing applications such as CORBA, DCE, DCOM, and Web-based technologies.

The Internet community and other groups are moving rapidly to develop official and de facto standards for interfaces, protocols, and services in many areas relevant to computational grids.

There is considerable value in adopting these standards whenever they do not interfere with other goals.

Consequently, the Globus components we will describe are not, in general, meant to replace existing interfaces, but rather seek to augment them.

Page 25

And now, within the Global Grid Forum we have major efforts underway to define the Open Grid Services Architecture (OGSA), which modernizes and extends Globus Toolkit protocols to address emerging new requirements, while also embracing Web services.

Companies such as IBM, Microsoft, Platform, Sun, Avaki, Entropia, and United Devices have all expressed strong support for OGSA. I hope that in the near future, we will be able to state that for an entity to be part of the Grid it must implement OGSA InterGrid protocols, just as to be part of the Internet an entity must speak IP (among other things).

Both open source and commercial products will interoperate effectively in this heterogeneous, multi-vendor Grid world, thus providing the pervasive infrastructure that will enable successful Grid applications.

Page 26

The Grid: A New Infrastructure for 21st Century Science
As computer networks become cheaper and more powerful, a new computing paradigm is poised to transform the practice of science and engineering.
By Ian Foster, in Physics Today (Feb 2002)

Driven by increasingly complex problems and propelled by increasingly powerful technology, today's science is as much based on computation, data analysis, and collaboration as on the efforts of individual experimentalists and theorists. But even as computer power, data storage, and communication continue to improve exponentially, computational resources are failing to keep up with what scientists demand of them.

A personal computer in 2001 is as fast as a supercomputer of 1990. But 10 years ago, biologists were happy to compute a single molecular structure. Now, they want to calculate the structures of complex assemblies of macromolecules and screen thousands of drug candidates.

Page 27

Personal computers now ship with up to 100 gigabytes (GB) of storage--as much as an entire 1990 supercomputer center. But by 2006, several physics projects, CERN (European Laboratory for Particle Physics)'s Large Hadron Collider (LHC) among them, will produce multiple petabytes (10^15 bytes) of data per year.

Some wide area networks now operate at 155 megabits per second (Mb/s), three orders of magnitude faster than the state-of-the-art 56 kilobits per second (Kb/s) that connected US supercomputer centers in 1985. But to work with colleagues across the world on petabyte data sets, scientists now demand tens of gigabits per second (Gb/s).

Page 28

What many term the "Grid" offers a potential means of surmounting these obstacles to progress [1]. Built on the Internet and the World Wide Web, the Grid is a new class of infrastructure. By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the Grid promises to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible [2-4].

The concept of sharing distributed resources is not new. In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company" [5]. And in their 1968 article "The Computer as a Communications Device," J. C. R. Licklider and Robert W. Taylor anticipated Grid-like scenarios [6]. Since the late 1960s, much work has been devoted to developing distributed systems, but with mixed success.

Page 29

Now, however, a combination of technology trends and research advances makes it feasible to realize the Grid vision--to put in place a new international scientific infrastructure with tools that, together, can meet the challenging demands of 21st-century science. Indeed, major science communities now accept that Grid technology is important for their future. Numerous government-funded R&D projects are variously developing core technologies, deploying production Grids, and applying Grid technologies to challenging applications.

Technology trends

A useful metric for the rate of technological change is the average period during which speed or capacity doubles or, more or less equivalently, halves in price. For storage, networks, and computing power, these periods are around 12, 9, and 18 months, respectively. The different time constants associated with these three exponentials have significant implications.
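A quick worked calculation makes the implications concrete: with doubling periods of 12, 9, and 18 months, the five-year growth factors differ by an order of magnitude.

```python
# Worked arithmetic for the doubling periods quoted above (12, 9, and 18
# months for storage, networks, and computing): growth over t months is
# a factor of 2 ** (t / doubling_period).
def growth(months: float, doubling_period_months: float) -> float:
    return 2 ** (months / doubling_period_months)

for name, period in [("storage", 12), ("networks", 9), ("computing", 18)]:
    print(f"{name:9s}: x{growth(60, period):6.1f} over 5 years")
# storage  : x  32.0 over 5 years
# networks : x 101.6 over 5 years
# computing: x  10.1 over 5 years  (a single order of magnitude, as noted below)
```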

Page 30

The annual doubling of data storage capacity, as measured in bits per unit area, has already reduced the cost of a terabyte (10^12 bytes) disk farm to less than $10,000. Anticipating that the trend will continue, the designers of major physics experiments are planning petabyte data archives. Scientists who create sequences of high-resolution simulations are also planning petabyte archives. Such large data volumes demand more from our analysis capabilities.

Dramatic improvements in microprocessor performance mean that the lowly desktop or laptop is now a powerful computational engine. Nevertheless, computer power is falling behind storage. By doubling "only" every 18 months or so, computer power takes five years to increase by a single order of magnitude. Assembling the computational resources needed for large-scale analysis at a single location is becoming infeasible.

Page 31

The solution to these problems lies in dramatic changes taking place in networking. Spurred by such innovations as doping, which boosts the performance of optoelectronic devices, and by the demands of the Internet economy [7], the performance of wide area networks doubles every nine months or so; every five years it increases by two orders of magnitude.

The NSFnet network, which connects the National Science Foundation supercomputer centers in the US, exemplifies this trend. In 1985, NSFnet's backbone operated at a then-unprecedented 56 Kb/s.

This year, the centers will be connected by the 40 Gb/s TeraGrid network (http://www.teragrid.org/)--an improvement of six orders of magnitude in 17 years.

Page 32

The doubling of network performance relative to computer speed every 18 months has already changed how we think about and undertake collaboration. If, as expected, networks outpace computers at this rate, communication becomes essentially free.

To exploit this bandwidth bounty, we must imagine new ways of working that are communication intensive, such as pooling computational resources, streaming large amounts of data from databases or instruments to remote computers, linking sensors with each other and with computers and archives, and connecting people, computing, and storage in collaborative environments that avoid the need for costly travel.

Page 33

If communication is unlimited and free, then we are not restricted to using local resources to solve problems. When running a colleague's simulation code, I do not need to install the code locally. Instead, I can run it remotely on my colleague's computer. When applying the code to datasets maintained at other locations, I do not need to get copies of those datasets myself (not so long ago, I would have requested tapes).

Instead, I can have the remote code access those datasets directly. If I wish to repeat the analysis many hundreds of times on different datasets, I can call on the collective computing power of my research collaboration or buy the power from a provider.

And when I obtain interesting results, my geographically dispersed colleagues and I can look at and discuss large output datasets by using sophisticated collaboration and visualization tools.

Page 34

Although these scenarios vary considerably in their complexity, they share a common thread. In each case, I use remote resources to do things that I cannot do easily at home.

High-speed networks are often necessary for such remote resource use, but they are far from sufficient. Remote resources are typically owned by others, exist within different administrative domains, run different software, and are subject to different security and access control policies.

Actually using remote resources involves several steps. First, I must discover that they exist. Next, I must negotiate access to them (to be practical, this step cannot involve using the telephone!).

Then, I have to configure my hardware and software to use the resources effectively. And I must do all these things without compromising my own security or the security of the remote resources that I make use of, some of which I may have to pay for.
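The sequence can be sketched as follows; none of these functions correspond to a real toolkit API, they simply make the discover/negotiate/configure steps concrete.

```python
# Hypothetical sketch of the steps described above (no real toolkit API).
def discover(directory, requirements):
    """1. Discover: find resources matching my requirements."""
    return [r for r in directory if r["cpus"] >= requirements["cpus"]]

def negotiate(resource, credentials):
    """2. Negotiate access: in a real Grid this is an authenticated
    protocol exchange, not a phone call."""
    return {"resource": resource["name"], "allocation": resource["cpus"],
            "token": credentials}

def configure_and_run(agreement, job):
    """3. Configure hardware/software and run the job."""
    return f"ran '{job}' on {agreement['resource']} with {agreement['allocation']} CPUs"

directory = [{"name": "siteA-cluster", "cpus": 64}, {"name": "siteB-pc", "cpus": 2}]
candidates = discover(directory, {"cpus": 32})
agreement = negotiate(candidates[0], "proxy-cert")
print(configure_and_run(agreement, "analysis.py"))
```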

Page 35

Implementing these steps requires uniform mechanisms for such critical tasks as creating and managing services on remote computers, supporting single sign-on to distributed resources, transferring large datasets at high speeds, forming large distributed virtual communities, and maintaining information about the existence, state, and usage policies of community resources.

Today's Internet and Web technologies address basic communication requirements, but not the tasks just outlined. Providing the infrastructure and tools that make large-scale, secure resource sharing possible and straightforward is the Grid's raison d'être.

Page 36

Infrastructure and tools

An infrastructure is a technology that we can take for granted when performing our activities. The road system enables us to travel by car; the international banking system allows us to transfer funds across borders; and the Internet allows us to communicate with virtually any electronic device.

To be useful, an infrastructure technology must be broadly deployed, which means, in turn, that it must be simple, extraordinarily valuable, or both. A good example is the set of protocols that must be implemented within a device to allow Internet access. The set is so small that people have constructed matchbox-sized Web servers.

A Grid infrastructure needs to provide more functionality than the Internet on which it rests, but it must also remain simple. And of course, the need remains for supporting the resources that power the Grid, such as high-speed data movement, caching of large datasets, and on-demand access to computing.

Page 37

Tools make use of infrastructure services. Internet and Web tools include browsers for accessing remote Web sites, e-mail programs for handling electronic messages, and search engines for locating Web pages.

Grid tools are concerned with resource discovery, data management, scheduling of computation, security, and so forth.

But the Grid goes beyond sharing and distributing data and computing resources. For the scientist, the Grid offers new and more powerful ways of working, as the following examples illustrate:

Page 38

Science portals. We are accustomed to climbing a steep learning curve when installing and using a new software package. Science portals make advanced problem-solving methods easier to use by invoking sophisticated packages remotely from Web browsers or other simple, easily downloaded "thin clients." The packages themselves can also run remotely on suitable computers within a Grid. Such portals are currently being developed in biology, fusion, computational chemistry, and other disciplines.

Distributed computing. High-speed workstations and networks can yoke together an organization's PCs to form a substantial computational resource. Entropia Inc.'s FightAIDSAtHome system harnesses more than 30,000 computers to analyze AIDS drug candidates. And in 2001, mathematicians across the US and Italy pooled their computational resources to solve a particular instance, dubbed "Nug30," of an optimization problem. For a week, the collaboration brought an average of 630--and a maximum of 1006--computers to bear on Nug30, delivering a total of 42,000 CPU-days. Future improvements in network performance and Grid technologies will increase the range of problems that aggregated computing resources can tackle.

Page 39

Large-scale data analysis. Many interesting scientific problems require the analysis of large amounts of data. For such problems, harnessing distributed computing and storage resources is clearly of great value. Furthermore, the natural parallelism inherent in many data analysis procedures makes it feasible to use distributed resources efficiently. For example, the analysis of the many petabytes of data to be produced by the LHC and other future high-energy physics experiments will require the marshalling of tens of thousands of processors and hundreds of terabytes of disk space for holding intermediate results. For various technical and political reasons, assembling these resources at a single location appears impractical. Yet the collective institutional and national resources of the hundreds of institutions participating in those experiments can provide these resources. These communities can, furthermore, share more than just computers and storage. They can also share analysis procedures and computational results.

Page 40

Computer-in-the-loop instrumentation. Scientific instruments such as telescopes, synchrotrons, and electron microscopes generate raw data streams that are archived for subsequent batch processing. But quasi-real-time analysis can greatly enhance an instrument's capabilities. For example, consider an astronomer studying solar flares with a radio telescope array. The deconvolution and analysis algorithms used to process the data and detect flares are computationally demanding. Running the algorithms continuously would be inefficient for studying flares that are brief and sporadic. But if the astronomer could call on substantial computing resources (and sophisticated software) in an on-demand fashion, he or she could use automated detection techniques to zoom in on solar flares as they occurred.

Collaborative work. Researchers often want to aggregate not only data and computing power, but also human expertise. Collaborative problem formulation, data analysis, and the like are important Grid applications. For example, an astrophysicist who has performed a large, multiterabyte simulation might want colleagues around the world to visualize the results in the same way and at the same time so that the group can discuss the results in real time.

Page 41

Grid architecture

Close to a decade of focused R&D and experimentation has produced considerable consensus on the requirements and architecture of Grid technology.

Standard protocols, which define the content and sequence of message exchanges used to request remote operations, have emerged as an important and essential means of achieving the interoperability that Grid systems depend on.

Also essential are standard application programming interfaces (APIs), which define standard interfaces to code libraries and facilitate the construction of Grid components by allowing code components to be reused.

Page 42

Protocols and APIs can be categorized according to the role they play in a Grid system.

At the lowest level, the fabric, we have the physical devices or resources that Grid users want to share and access, including computers, clusters, storage systems, catalogs, networks, and various forms of sensors.

Page 43

Figure: The layered Grid architecture and its relationship to the Internet protocol architecture. Because the Internet protocol architecture extends from network to application, there is a mapping from Grid layers into Internet layers.

Page 44

Fabric: Interfaces to Local Control

The Grid Fabric layer provides the resources to which shared access is mediated by Grid protocols: for example, computational resources, storage systems, catalogs, network resources, and sensors.

A “resource” may be a logical entity, such as a distributed file system, computer cluster, or distributed computer pool;

in such cases, a resource implementation may involve internal protocols (e.g., the NFS storage access protocol or a cluster resource management system’s process management protocol), but these are not the concern of Grid architecture.

Page 45

Fabric components implement the local, resource-specific operations that occur on specific resources (whether physical or logical) as a result of sharing operations at higher levels. There is thus a tight and subtle interdependence between the functions implemented at the Fabric level, on the one hand, and the sharing operations supported, on the other.

Richer Fabric functionality enables more sophisticated sharing operations; at the same time, if we place few demands on Fabric elements, then deployment of Grid infrastructure is simplified.

For example, resource level support for advance reservations makes it possible for higher-level services to aggregate (coschedule) resources in interesting ways that would otherwise be impossible to achieve.

Page 46

However, as in practice few resources support advance reservation “out of the box,” a requirement for advance reservation increases the cost of incorporating new resources into a Grid. The issue is significant because building large, integrated systems just-in-time by aggregation (coscheduling and co-management) is a significant new capability provided by these Grid services.

Experience suggests that at a minimum, resources should implement enquiry mechanisms that permit discovery of their structure, state, and capabilities (e.g., whether they support advance reservation) on the one hand, and resource management mechanisms that provide some control of delivered quality of service, on the other.
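A minimal sketch of that contract, with hypothetical names: one enquiry call exposing structure, state, and capabilities, and one management call offering coarse control over delivered service.

```python
# Sketch of the minimal Fabric-level contract suggested above (hypothetical
# names): an enquiry call revealing structure, state, and capabilities, plus
# a management call that gives some control over delivered service.
class FabricResource:
    def __init__(self, name: str, cpus: int, supports_reservation: bool):
        self.name = name
        self.cpus = cpus
        self.free_cpus = cpus
        self.supports_reservation = supports_reservation

    def enquire(self) -> dict:
        """Discovery: structure, current state, and capabilities."""
        return {"name": self.name, "cpus": self.cpus,
                "free_cpus": self.free_cpus,
                "advance_reservation": self.supports_reservation}

    def allocate(self, cpus: int) -> bool:
        """Management: coarse control over delivered quality of service."""
        if cpus <= self.free_cpus:
            self.free_cpus -= cpus
            return True
        return False

cluster = FabricResource("siteA-cluster", cpus=128, supports_reservation=False)
print(cluster.enquire())
print(cluster.allocate(32), cluster.enquire()["free_cpus"])
```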

Page 47

Resource-specific characterization of capabilities

Computational resources: Mechanisms are required for starting programs and for monitoring and controlling the execution of the resulting processes. Management mechanisms that allow control over the resources allocated to processes are useful, as are advance reservation mechanisms. Enquiry functions are needed for determining hardware and software characteristics as well as relevant state information such as current load and queue state in the case of scheduler-managed resources.

Storage resources: Mechanisms are required for putting and getting files. Third-party and high-performance (e.g., striped) transfers are useful. So are mechanisms for reading and writing subsets of a file and/or executing remote data selection or reduction functions. Management mechanisms that allow control over the resources allocated to data transfers (space, disk bandwidth, network bandwidth, CPU) are useful, as are advance reservation mechanisms. Enquiry functions are needed for determining hardware and software characteristics as well as relevant load information such as available space and bandwidth utilization.

Page 48

Network resources: Management mechanisms that provide control over the resources allocated to network transfers (e.g., prioritization, reservation) can be useful. Enquiry functions should be provided to determine network characteristics and load.

Code repositories: This specialized form of storage resource requires mechanisms for managing versioned source and object code: for example, a control system such as CVS.

Catalogs: This specialized form of storage resource requires mechanisms for implementing catalog query and update operations: for example, a relational database.

Page 49

The Globus Toolkit has been designed to use (primarily) existing fabric components, including vendor-supplied protocols and interfaces. However, if a vendor does not provide the necessary Fabric-level behavior, the Globus Toolkit includes the missing functionality. For example, enquiry software is provided for discovering structure and state information for various common resource types, such as computers (e.g., OS version, hardware configuration, load, scheduler queue status), storage systems (e.g., available space), and networks (e.g., current and predicted future load [52, 63]), and for packaging this information in a form that facilitates the implementation of higher-level protocols, specifically at the Resource layer.

Resource management, on the other hand, is generally assumed to be the domain of local resource managers.

One exception is the General-purpose Architecture for Reservation and Allocation (GARA), which provides a “slot manager” that can be used to implement advance reservation for resources that do not support this capability.

Others have developed enhancements to the Portable Batch System (PBS) and Condor that support advance reservation capabilities.

Page 50

Page 51

GARA (General-purpose Architecture for Reservation and Allocation)

The GARA architecture provides programmers with convenient access to end-to-end QoS for programs. To do so, it provides mechanisms for making QoS reservations for different types of resources, including computers, networks, and disks.

A reservation is a promise from GARA that an application will receive a certain level of service from a resource.

For example, a reservation may promise a certain bandwidth on a network or a certain percentage of a CPU.

The GARA architecture is defined as a layered architecture with three levels of APIs and one level of low-level mechanisms:

Page 52

Page 53

The GARA API has two interesting advantages. First, it allows you to make reservations either in advance of when you need them, or right at the time that you need them (an immediate reservation).

Second, you use the same API to make and manipulate a reservation regardless of the type of the underlying resource, thereby simplifying your programming when you need to work with multiple kinds of resources.

The GARA API can be considered a remote procedure call mechanism to communicate with a resource manager. A resource manager controls reservations for a resource: it performs admission control and controls the resource to enforce the reservations.

Some resources already have the ability to work with advance reservations, so the resource manager is a simple program. Most resources cannot deal with advance reservations, so the resource manager tracks the reservations and does admission control for new reservation requests. Much of the research in GARA has focused on building useful resource managers.
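The sketch below mirrors those two properties (one call shape for every resource type, and either advance or immediate reservations subject to admission control). It is a toy illustration, not the real GARA API.

```python
# Illustrative sketch only: NOT the real GARA API. It mirrors the ideas
# described above: one reservation call shape for every resource type, and
# advance or immediate reservations subject to admission control.
import time

class ResourceManager:
    """Tracks reservations and performs admission control for one resource."""
    def __init__(self, resource_type: str, capacity: float):
        self.resource_type = resource_type
        self.capacity = capacity
        self.reservations = []   # list of (start, end, amount)

    def reserve(self, amount: float, start: float | None = None,
                duration: float = 3600) -> dict:
        start = start if start is not None else time.time()  # immediate if no start
        end = start + duration
        # Conservative admission control: sum every overlapping reservation.
        booked = sum(a for (s, e, a) in self.reservations if s < end and start < e)
        if booked + amount > self.capacity:
            raise RuntimeError("admission control: reservation rejected")
        self.reservations.append((start, end, amount))
        return {"type": self.resource_type, "amount": amount,
                "start": start, "end": end}

# The same call shape works for bandwidth, CPU share, or disk.
network = ResourceManager("bandwidth_mbps", capacity=50)
cpu = ResourceManager("cpu_percent", capacity=100)
print(network.reserve(20))                              # immediate reservation
print(cpu.reserve(40, start=time.time() + 7200))        # advance reservation
```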

Page 54

When a program uses the GARA API to communicate with a resource manager, the communication does not happen directly, but happens through the assistance of the Globus gatekeeper.

The gatekeeper performs three important services: authentication, to verify the identity of the person making the reservation; authorization, to verify that the person is allowed to make a reservation; and finally, it launches the gatekeeper service that handles the communication with the resource manager.

LRAM = Local Resource Manager API

Page 55

This figure demonstrates an important aspect of GARA: in order to make and use a reservation, you will need to have a Globus gatekeeper installed with a gatekeeper service properly configured, you will need to have a resource manager running, and you will need to have any underlying resource management tools installed.

For example, if you wish to make CPU reservations, you'll need the gatekeeper, the DSRT resource manager, and the DSRT scheduler process.

Page 56

Currently, GARA provides three different resource managers:

- A differentiated services network resource manager to provide quality of service over a network.
- A CPU resource manager that uses the Dynamic Soft Real-Time (DSRT) scheduler to control scheduling for processes.
- A DPSS (Distributed-Parallel Storage System) resource manager that allows exclusive access to a DPSS server.

In the near future, new resource managers will be created to work with other resource types.

A resource manager has four important jobs:

Admission Control: A resource manager decides whether or not each reservation can be accepted.

Resource Configuration: A resource manager configures the underlying resource (such as routers on a network) to ensure that each reservation actually receives the quality of service that it requests.

Monitoring: A resource manager observes the underlying resource while a reservation is active. This monitoring can provide feedback to the user of the application, such as warnings that the reservation is insufficient or too large.

Reporting: Resource managers (optionally) report the current state of the resource manager into the Metacomputing Directory Service, an LDAP directory. The information that is reported includes the total amount of the resource that can be reserved (e.g., 50 Mbps) and a list of the reservations that have already been made.

Page 57

Differentiated Service Resource Manager

Provides reservations for network bandwidth using differentiated services. Differentiated Services, or diffserv, is a relatively simple method for providing different types of service to packets in a network.

In networks without diffserv, all packets are treated equally poorly: although the network makes its best-effort to deliver the packets, there is no guarantee that packets will arrive at their destination in a timely fashion, or even a guarantee that they will arrive at all.

While such best-effort networks have served us well for a long time, some programs prefer better guarantees than that.

Page 58

Diffserv provides guarantees by marking each packet with a per-hop behavior (PHB), which indicates how the packet is to be treated.

Only a few PHBs have been defined: each reservation does not get a unique marking; rather, reservations are collected into aggregates, and each aggregate is treated as a whole.

For example, one PHB used by GARA is the expedited forwarding (EF) PHB. In a router, all packets that are marked as EF are forwarded on the network before any other kinds of packets, up to some limit.

If there is careful end-to-end control, EF can provide programs with assurances that their packets will get through the network quickly and with low delay.

GARA uses diffserv with EF to provide network reservations. The total amount of EF traffic is limited, so that GARA can be sure that the applications with reservations actually receive the bandwidth that they have reserved.
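A small sketch of that admission rule, assuming an operator-chosen EF ceiling of 50 Mbps (the DSCP codepoint 46 for EF is standard; everything else here is illustrative):

```python
# Sketch of limiting the EF aggregate so reserved flows actually get their
# bandwidth. DSCP 46 is the standard codepoint for Expedited Forwarding;
# the ceiling and flows below are illustrative.
EF_DSCP = 46                    # 101110 in the IP header's DS field
EF_LIMIT_MBPS = 50.0            # operator-chosen ceiling for the EF aggregate

def admit(existing_mbps: list[float], request_mbps: float) -> bool:
    """Accept a flow into the EF aggregate only if the ceiling still holds."""
    return sum(existing_mbps) + request_mbps <= EF_LIMIT_MBPS

flows = [10.0, 15.0]
print(admit(flows, 20.0))   # True: 45 Mbps total stays under the 50 Mbps limit
print(admit(flows, 30.0))   # False: would exceed the EF ceiling, so rejected
```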

Page 59

Currently, GARA works with Cisco routers, which are configured with scripts that use telnet.

Although GARA currently relies on Cisco routers, it is very easy to replace just the small configuration scripts in order to use GARA with other underlying router types.

Page 60

Connectivity: Communicating Easily and Securely

The Connectivity layer defines core communication and authentication protocols required for Grid-specific network transactions. Communication protocols enable the exchange of data between Fabric layer resources. Authentication protocols build on communication services to provide cryptographically secure mechanisms for verifying the identity of users and resources.

Communication requirements include transport, routing, and naming. While alternatives certainly exist, we assume here that these protocols are drawn from the TCP/IP protocol stack: specifically, the Internet (IP and ICMP, the Internet Control Message Protocol), transport (TCP, UDP), and application (DNS, OSPF [Open Shortest Path First], RSVP [Resource Reservation Protocol], etc.) layers of the Internet layered protocol architecture.

This is not to say that in the future, Grid communications will not demand new protocols that take into account particular types of network dynamics.

Page 61

With respect to security aspects of the Connectivity layer, we observe that the complexity of the security problem makes it important that any solutions be based on existing standards whenever possible. As with communication, many of the security standards

developed within the context of the Internet protocol suite are applicable.

Authentication solutions for VO environments should have the following characteristics: Single sign on. Users must be able to “log on” (authenticate)

just once and then have access to multiple Grid resources defined in the Fabric layer, without further user intervention.

Delegation. A user must be able to endow a program with the ability to run on that user’s behalf, so that the program is able to access the resources on which the user is authorized. The program should (optionally) also be able to conditionally delegate a subset of its rights to another program (sometimes referred to as restricted delegation).

Integration with various local security solutions: Each site or resource provider may employ any of a variety of local security solutions, including Kerberos and Unix security. Grid security solutions must be able to interoperate with these various local solutions. They cannot, realistically, require wholesale replacement of local security solutions but rather must allow mapping into the local environment.

Cf. Kerberos = the authentication system of MIT's Project Athena. It is based on symmetric-key cryptography. Adopted by the OSF (Open Software Foundation) as the basis of security for DME (Distributed Management Environment).

User-based trust relationships: In order for a user to use resources from multiple providers together, the security system must not require each of the resource providers to cooperate or interact with each other in configuring the security environment. For example, if a user has the right to use sites A and B, the user should be able to use sites A and B together without requiring that A’s and B’s security administrators interact.

Grid security solutions should also provide flexible support for communication protection (e.g., control over the degree of protection, independent

data unit protection for unreliable protocols, support for reliable transport protocols other than TCP) and enable stakeholder control over authorization decisions, including the ability to restrict the delegation of rights in various ways.

Globus Toolkit: The Internet protocols listed above are used for communication. The public-key based Grid Security Infrastructure (GSI) protocols are used for authentication, communication protection, and authorization. GSI builds on and extends the Transport Layer Security (TLS)

protocols to address most of the issues listed above: in particular, single sign-on, delegation, integration with various local security solutions (including Kerberos), and user-based trust relationships. X.509-format identity certificates are used.
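To make the single sign-on and delegation requirements concrete, the following is a minimal structural sketch of a delegation chain of the kind GSI builds with short-lived proxy credentials. It deliberately omits the actual X.509 certificates and signatures and checks only the chain logic; all names and fields are illustrative assumptions, not the GSI API.

    # Sketch: the structure of a delegation chain like the one GSI builds with
    # proxy credentials. Signatures are omitted; this only checks the chain
    # logic (issuer links, expiry, and that delegated rights can only narrow).
    import time
    from dataclasses import dataclass

    @dataclass
    class Credential:
        issuer: str          # who signed this credential
        subject: str         # who may act under it
        rights: frozenset    # operations it permits
        expires: float       # absolute expiry time

    def chain_valid(chain, trusted_identity, now=None):
        now = time.time() if now is None else now
        prev_subject, prev_rights = trusted_identity, None
        for cred in chain:
            if cred.expires <= now or cred.issuer != prev_subject:
                return False
            if prev_rights is not None and not cred.rights <= prev_rights:
                return False          # restricted delegation: rights may only shrink
            prev_subject, prev_rights = cred.subject, cred.rights
        return True

    # Single sign-on: the user signs one short-lived proxy, which in turn signs a
    # further proxy for a job running at a remote site on the user's behalf.
    user_proxy = Credential("alice", "alice-proxy", frozenset({"run", "read"}), time.time() + 12 * 3600)
    job_proxy  = Credential("alice-proxy", "job-17", frozenset({"read"}), time.time() + 3600)
    assert chain_valid([user_proxy, job_proxy], trusted_identity="alice")

Real GSI verification additionally checks the cryptographic signature on each certificate in the chain back to a trusted certificate authority.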

Stakeholder control of authorization is supported via an authorization toolkit that allows resource owners to integrate local policies via a Generic Authorization and Access (GAA) control interface.

Rich support for restricted delegation is not provided in the current toolkit release (v1.1.4) but has been demonstrated in prototypes.

Resource: Sharing Single Resources

The Resource layer builds on Connectivity layer communication and authentication protocols to define protocols (and APIs and SDKs) for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources. Resource layer implementations of these protocols call

Fabric layer functions to access and control local resources.

Resource layer protocols are concerned entirely with individual resources and hence ignore issues of global state and atomic actions across distributed collections; such issues are the concern of the Collective layer discussed next.

Two primary classes of Resource layer protocols can be distinguished: Information protocols are used to obtain information about

the structure and state of a resource, for example, its configuration, current load, and usage policy (e.g., cost).

Management protocols are used to negotiate access to a shared resource, specifying, for example, resource requirements (including advanced reservation and quality of service) and the operation(s) to be performed, such as process creation, or data access.

Since management protocols are responsible for instantiating sharing relationships, they must serve as a “policy application point,” ensuring that the requested protocol operations are consistent with the policy under which the resource is to be shared.

Issues that must be considered include accounting and payment. A protocol may also support monitoring the status of an operation and controlling (for example, terminating) the operation.
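The idea of a management protocol acting as a policy application point can be sketched as a simple admission check; the request fields, policy format, and group table below are illustrative assumptions rather than any Globus interface.

    # Sketch of a management protocol as a "policy application point": before a
    # sharing operation is instantiated, the request is checked against the
    # policy under which the resource is shared. All names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Request:
        user: str
        operation: str      # e.g. "process-creation" or "data-access"
        cpus: int
        reservation: bool

    POLICY = {
        "allowed_groups": {"physics-vo"},
        "max_cpus": 64,
        "reservations_allowed": True,
        "cost_per_cpu_hour": 0.10,     # the kind of detail exposed via the information protocol
    }

    GROUPS = {"alice": {"physics-vo"}, "mallory": set()}

    def admit(req: Request) -> bool:
        if not (GROUPS.get(req.user, set()) & POLICY["allowed_groups"]):
            return False
        if req.cpus > POLICY["max_cpus"]:
            return False
        if req.reservation and not POLICY["reservations_allowed"]:
            return False
        return True

    assert admit(Request("alice", "process-creation", cpus=32, reservation=True))
    assert not admit(Request("mallory", "data-access", cpus=1, reservation=False))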

While many such protocols can be imagined, the Resource (and Connectivity) protocol layers form the neck of our hourglass model, and as such should be limited to a small and focused set. These protocols must be chosen so as to capture the

fundamental mechanisms of sharing across many different resource types (for example, different local resource management systems), while not overly constraining the types or performance of higher-level protocols that may be developed.

Globus Toolkit: A small and mostly standards-based set of protocols is adopted. In particular: A Grid Resource Information Protocol (GRIP, currently

based on the Lightweight Directory Access Protocol: LDAP) is used to define a standard resource information protocol and associated information model.

An associated soft-state resource registration protocol, the Grid Resource Registration Protocol (GRRP), is used to register resources with Grid Index Information Servers.
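Soft state here means that a registration is forgotten unless it is periodically refreshed, so the index stays reasonably accurate without explicit deregistration messages. A minimal sketch of that idea (hypothetical, not the GRRP wire protocol):

    # Sketch of soft-state registration, the idea behind GRRP: a resource
    # re-registers periodically, and the index server silently drops entries
    # whose registrations have not been refreshed.
    import time

    class IndexServer:
        def __init__(self, ttl_seconds=60):
            self.ttl = ttl_seconds
            self.entries = {}                    # resource name -> last refresh time

        def register(self, resource_name):
            self.entries[resource_name] = time.time()

        def active_resources(self):
            now = time.time()
            # Expire stale entries: no explicit "unregister" message is needed.
            self.entries = {r: t for r, t in self.entries.items() if now - t < self.ttl}
            return sorted(self.entries)

    index = IndexServer(ttl_seconds=60)
    index.register("compute.site-a.example.org")
    index.register("storage.site-b.example.org")
    print(index.active_resources())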

The HTTP-based Grid Resource Access and Management (GRAM) protocol is used for allocation of computational resources and for monitoring and control of computation on those resources.

An extended version of the File Transfer Protocol, GridFTP, is a management protocol for data access; extensions include use of Connectivity layer security protocols, partial file access, and management of parallelism for high-speed transfers.

FTP is adopted as a base data transfer protocol because of its support for third-party transfers and because its separate control and data channels facilitate the implementation of sophisticated servers.

LDAP (Lightweight Directory Access Protocol) is also used as a catalog access protocol.

GridFTP features

Grid Security Infrastructure (GSI) and Kerberos support: Robust and flexible authentication, integrity, and confidentiality features are critical when transferring or accessing files. GridFTP supports both GSI and Kerberos authentication, with user-controlled setting of various levels of data integrity and/or confidentiality.

Third-party control of data transfer: In order to manage large data sets for large distributed communities, it is necessary to provide third-party control of transfers between storage servers. GridFTP provides this capability by adding GSSAPI security to the existing third-party transfer capability defined in the FTP standard.

Parallel data transfer: On wide-area links, using multiple TCP streams can improve aggregate bandwidth over using a single TCP stream. This is required both between a single client and a single server, and between two servers. GridFTP supports parallel data transfer through FTP command extensions and data channel extensions.

Striped data transfer: Partitioning data across multiple servers can further improve aggregate bandwidth. GridFTP supports striped data transfers through extensions defined in the Grid Forum draft.

Partial file transfer: Many applications require the transfer of partial files. However, standard FTP requires the application to transfer the entire file, or the remainder of a file starting at a particular offset. GridFTP introduces new FTP commands to support transfers of regions of a file.

Support for reliable data transfer: Reliable transfer is important for many applications that manage data. Fault recovery methods for handling transient network failures, server outages, etc., are needed. The FTP standard includes basic features for restarting failed transfers, but these are not widely implemented. The GridFTP protocol exploits these features and substantially extends them.

Manual control of TCP buffer size: This is a critical parameter for achieving maximum bandwidth with TCP/IP. The protocol also has support for automatic buffer size tuning, but we have not yet implemented anything in our code. We are talking with both NCSA and LANL to see if it makes sense to integrate work they are doing in this area into our code.
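Partial file access and manual TCP buffer sizing both build on mechanisms already present in standard FTP and TCP. The hedged sketch below uses Python's standard ftplib against an ordinary FTP server, not GridFTP and not the GSI-secured protocol; the host, path, and sizes are placeholders.

    # Sketch: two mechanisms GridFTP builds on, shown with plain FTP and TCP.
    # This talks to an ordinary FTP server, not GridFTP.
    import socket
    from ftplib import FTP

    # Partial file transfer: 'rest' issues the FTP REST command, so the
    # retrieval starts at byte offset 1,000,000 instead of the start of the file.
    ftp = FTP("ftp.example.org")
    ftp.login()
    chunks = []
    ftp.retrbinary("RETR /pub/dataset.bin", chunks.append,
                   blocksize=64 * 1024, rest=1_000_000)
    ftp.quit()
    print(sum(len(c) for c in chunks), "bytes received from the offset onward")

    # Manual TCP buffer sizing, the knob discussed above: set SO_RCVBUF before
    # connecting so the kernel can advertise a large window on long fat links.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)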

The Globus Toolkit defines client-side C and Java APIs and SDKs for each of these protocols. Server-side SDKs and servers are also provided for each

protocol, to facilitate the integration of various resources (computational, storage, network) into the Grid.

For example, the Grid Resource Information Service (GRIS) implements server-side LDAP functionality, with callouts allowing for publication of arbitrary resource information.

An important server-side element of the overall Toolkit is the “gatekeeper,” which provides what is in essence a GSI-authenticated “inetd” that speaks the GRAM protocol and can be used to dispatch various local operations.

The Generic Security Services (GSS) API is used to acquire, forward, and verify authentication credentials and to provide transport layer integrity and privacy within these SDKs and servers, enabling substitution of alternative security services at the Connectivity layer.

Collective: Coordinating Multiple Resources

While the Resource layer is focused on interactions with a single resource, the next layer in the architecture contains protocols and services (and APIs and SDKs) that are not associated with any one specific resource but rather are global in nature and capture interactions across collections of resources. For this reason, we refer to the next layer of the

architecture as the Collective layer. Because Collective components build on the narrow

Resource and Connectivity layer “neck” in the protocol hourglass, they can implement a wide variety of sharing behaviors without placing new requirements on the resources being shared. For example:

Directory services allow VO participants to discover the existence and/or properties of VO resources. A directory service may allow its users to query for

resources by name and/or by attributes such as type, availability, or load.

Resource-level GRRP and GRIP protocols are used to construct directories.

Co-allocation, scheduling, and brokering services allow VO participants to request the allocation of one or more resources for a specific purpose and the scheduling of tasks on the appropriate resources. Examples include AppLeS, Condor-G, Nimrod-G, and the

DRM broker. Monitoring and diagnostics services support the

monitoring of VO resources for failure, adversarial attack (“intrusion detection”), overload, and so forth.

Data replication services support the management of VO storage (and perhaps also

network and computing) resources to maximize data access performance with respect to metrics such as response time, reliability, and cost.

Grid-enabled programming systems enable familiar programming models to be used in Grid

environments, using various Grid services to address resource discovery, security, resource allocation, and other concerns.

Examples include Grid-enabled implementations of the Message Passing Interface and manager-worker frameworks.

Workload management systems and collaboration frameworks, also known as problem solving environments ("PSEs"), provide for the description, use, and management of multi-step, asynchronous, multi-component workflows.

Software discovery services discover and select the best software implementation

and execution platform based on the parameters of the problem being solved [20].

Examples include NetSolve and Ninf.

Community authorization servers enforce community policies governing resource access,

generating capabilities that community members can use to access community resources.

These servers provide a global policy enforcement service by building on resource information and resource management protocols (in the Resource layer) and security protocols (in the Connectivity layer). Akenti [60] addresses some of these issues.

Community accounting and payment services gather resource usage information for the purpose of

accounting, payment, and/or limiting of resource usage by community members.

Collaboratory services support the coordinated exchange of information within

potentially large user communities, whether synchronously or asynchronously.

Examples are CAVERNsoft, Access Grid, and commodity groupware systems.

These examples illustrate the wide variety of Collective layer protocols and services that are encountered in practice. Notice that while Resource layer protocols must be

general in nature and are widely deployed, Collective layer protocols span the spectrum from general purpose to highly application or domain specific, with the latter existing perhaps only within specific VOs.

Collective functions can be implemented as persistent services, with associated protocols, or as SDKs (with associated APIs) designed to be linked with applications. In both cases, their implementation can build on Resource

layer (or other Collective layer) protocols and APIs. For example, Figure 3 shows a Collective co-allocation API

and SDK (the middle tier) that uses a Resource layer management protocol to manipulate underlying resources.

Above this, we define a co-reservation service protocol and implement a co-reservation service that speaks this protocol, calling the co-allocation API to implement co-allocation operations and perhaps providing additional functionality, such as authorization, fault tolerance, and logging.

An application might then use the co-reservation service protocol to request end-to-end network reservations.
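A hedged sketch of this layering: a Collective-layer co-reservation routine that calls a per-resource reserve operation (standing in for the Resource-layer management protocol) and releases everything if any single reservation fails. All names are illustrative, not a Globus API.

    # Sketch of a Collective-layer co-reservation built on per-resource
    # Resource-layer operations. reserve()/release() stand in for the
    # management protocol.
    def reserve(resource, amount):
        # In reality: speak the Resource-layer management protocol to 'resource'.
        print(f"reserve {amount} on {resource}")
        return f"{resource}:ticket"

    def release(ticket):
        print(f"release {ticket}")

    def co_reserve(requests):
        """Reserve all resources in 'requests' or none of them."""
        tickets = []
        try:
            for resource, amount in requests:
                tickets.append(reserve(resource, amount))
            return tickets                      # success: caller holds all tickets
        except Exception:
            for t in reversed(tickets):         # failure: roll back partial work
                release(t)
            raise

    # An end-to-end request spanning two networks and a compute resource either
    # succeeds as a whole or leaves nothing reserved.
    co_reserve([("net.site-a", "10 Mb/s"), ("net.site-b", "10 Mb/s"), ("cluster.site-b", "32 cpus")])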

Figure 3: Collective and Resource layer protocols, services, APIs, and SDKs can be combined in a variety of ways to deliver functionality to applications.

Collective components may be tailored to the requirements of a specific user community, VO, or application domain, for example, an SDK that implements an application-specific coherency protocol, or a co-reservation service for a specific set of network resources.

Other Collective components can be more general-purpose, for example, a replication service that manages an international collection of storage systems for multiple communities, or a directory service designed to enable the discovery of VOs.

In general, the larger the target user community, the more important it is that a Collective component’s protocol(s) and API(s) be standards based.

Globus Toolkit: In addition to the example services listed earlier in this section, many of which build on Globus Connectivity and Resource protocols, we mention the Meta Directory Service, which introduces

Grid Index Information Servers (GIISs) to support arbitrary views on resource subsets, with the LDAP information protocol used to access resource-specific GRISs to obtain resource state, and GRRP used for resource registration.

Also provided are replica catalog and replica management services, used to support the management of dataset replicas in a Grid environment.

An online credential repository service (“MyProxy”) provides secure storage for proxy credentials. The DUROC co-allocation library provides an SDK and API for resource coallocation.

DUROC (Dynamically-Updated Request Online Coallocator)

Coallocator requirements and motivation

The Globus environment includes resource managers to provide access to a range of system-dependent schedulers. Each resource manager (RM) provides an interface to submit jobs on a particular set of physical resources.

In order to execute jobs which need to be distributed over resources accessed through independent RMs, a coallocator is used to coordinate transactions with each of the RMs and bring up the distributed pieces of the job. The coallocator must provide a convenient interface to obtain resources and execute jobs across multiple management pools.

Reflective management architecture

The task an intelligent coallocation agent performs has two abstractly distinct parts.

First, the agent must process resource specifications to determine how a job might be distributed across the resources of which it is aware: the agent lowers an abstract specification such that portions of the specification are allocated to the individual RMs that control access to those required resources.

Second, the agent must process the lowered resource specification as part of a job request to actually attempt resource allocation: the agent issues job requests to each of the pertinent RMs to schedule the job.

The process of lowering a resource specification in a job request in essence refines the request based on information available to the lowering agent.

By separating the tasks of refinement and allocation in the architecture, we can allow user intervention to adjust the refinement based on information or constraints beyond the heuristics used internally by a particular automated agent. A GUI specification-editor has been suggested as a meaningful mode of user (job requester) intervention.

spec1, spec2 : resource specification
    lower(spec1)  -->  spec2

spec : resource specification
job  : job contact information (or error status)
    request(spec)  -->  job

Lowering example:
    lower( (count=5) )
      -->  +( &(count=3)(resourceManagerContact=RM1)
              &(count=2)(resourceManagerContact=RM2) )

DUROC implements the allocation operation across multiple RMs in the Globus test-bed and leaves lowering decisions to higher-level tools.
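The lowering step itself is left to higher-level tools; the sketch below shows one trivially simple way such a tool might split the count=5 request above across two resource managers with known free capacity. The data structures and the greedy split are assumptions for illustration, not DUROC or GRAM code.

    # Sketch of a "lowering" step: turn an abstract request for N nodes into a
    # multi-request with one subrequest per resource manager.
    def lower(count, resource_managers, free_nodes):
        """Split a request for 'count' nodes across RMs with known free capacity."""
        subrequests = []
        remaining = count
        for rm in resource_managers:
            take = min(free_nodes[rm], remaining)
            if take > 0:
                subrequests.append({"resourceManagerContact": rm, "count": take})
                remaining -= take
            if remaining == 0:
                return subrequests              # a lowered multi-request
        raise RuntimeError("not enough free nodes to satisfy the request")

    lowered = lower(5, ["RM1", "RM2"], {"RM1": 3, "RM2": 4})
    # -> [{'resourceManagerContact': 'RM1', 'count': 3},
    #     {'resourceManagerContact': 'RM2', 'count': 2}]
    print(lowered)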

Atomic requests

Once a resource specification has been refined, the agent must attempt to allocate resources. In general the resources might be managed by different RMs, and the coallocator must either atomically schedule the user's single abstract job or fail to schedule the job.

Because the GRAM interface does not provide support for inter-manager atomicity, the user code must be augmented to implement a job-start barrier; as distributed components of the job become active, they must rendezvous with the allocating agent to be sure all components were successfully started prior to performing any non-restartable user operations.

main:
    job_start_barrier()
    . . .
    user_operations()

Three important points regarding the job-start barrier in the user's code. First, atomicity of job creation can only be guaranteed after the barrier, so the user should not perform operations which cannot be reversed, e.g. certain persistent effects or input/output operations, until after the barrier.

Second, the barrier call is used to implement guaranteed job cancelation within each RM; if the agent's job scheduling fails but some of the components have been scheduled through a manager that cannot cancel jobs it schedules, the agent will have to rendezvous with those components when they become active and signal them to abort.

Third, the barrier call initializes the job-aggregation communication functions needed to make use of the coallocated resources.
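The control flow of the job-start barrier can be illustrated with a toy example in which threads stand in for the distributed subjob components and Python's threading.Barrier stands in for the rendezvous with the allocating agent; in DUROC the components are separate processes contacting the agent over the network, so this shows only the shape of the synchronization.

    # Toy sketch of a job-start barrier: no component performs non-restartable
    # work until every component has checked in.
    import threading

    N_COMPONENTS = 3
    start_barrier = threading.Barrier(N_COMPONENTS)

    def subjob_component(rank):
        print(f"component {rank}: started, waiting at job-start barrier")
        start_barrier.wait()          # rendezvous: all components were created
        print(f"component {rank}: barrier passed, doing non-restartable work")

    threads = [threading.Thread(target=subjob_component, args=(r,)) for r in range(N_COMPONENTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()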

Coallocated resource specification language

DUROC shares its Resource Specification Language (RSL) with GRAM. DUROC can perform allocations described by a 'lowered'

resource specification. The task of the lowering agent is to take a resource

request of some form, be it a generalized GRAM request or user inputs to a GUI interface, and produce a lowered request so that DUROC can directly acquire the resources for the user.

The allocation semantics for DUROC requests are that each component of the top-level multi-request represents one GRAM request that DUROC should make as part of the distributed job DUROC is allocating. In order to make the request, DUROC must be able to determine what RM to contact.

Typically there will be additional terms in the conjunctions of the lowered request, and those terms will be passed on verbatim in each GRAM request. DUROC will extract each component of the lowered multi-request, remove the DUROC-specific components of the subrequest, and then forward that subrequest to the specified GRAM. Therefore any other attributes supported by GRAM are implicitly supported by DUROC. For example:

    +( &(resourceManagerContact=RM1)(count=3)(executable=myprog.sparc)
       &(resourceManagerContact=RM2)(count=2)(executable=myprog.rs6000) )

In this request the executables and node counts are specified for each resource pool. While GRAM may in fact require fields such as these, DUROC treats them as it would any other fields not needed to do its job: it forwards them in the subrequests, and it is up to the RMs to either successfully handle the request or return a failure code back to DUROC (which then returns an appropriate code to the user).
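The decomposition-and-forwarding step can be sketched as follows; parsing is skipped (subrequests arrive as dictionaries) and submit_gram() is a stand-in for the real GRAM submission call, so all names here are assumptions.

    # Sketch of DUROC-style request processing: pull each subrequest out of the
    # lowered multi-request, strip the attribute the coallocator itself consumes,
    # and forward the rest verbatim to the named resource manager.
    def submit_gram(rm_contact, attributes):
        print(f"submit to {rm_contact}: {attributes}")
        return {"rm": rm_contact, "status": "submitted"}

    def coallocate(multi_request):
        jobs = []
        for sub in multi_request:
            sub = dict(sub)                              # don't mutate the caller's copy
            rm = sub.pop("resourceManagerContact")       # consumed by the coallocator
            jobs.append(submit_gram(rm, sub))            # everything else passes through
        return jobs

    coallocate([
        {"resourceManagerContact": "RM1", "count": 3, "executable": "myprog.sparc"},
        {"resourceManagerContact": "RM2", "count": 2, "executable": "myprog.rs6000"},
    ])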

DUROC request processing (coallocation)

Requests submitted to the DUROC API are decomposed into the individual GRAM requests, and each request is submitted through the GRAM API.

A DUROC request succeeds only if each of the constituent GRAM requests succeeds. Runtime features available to the job processes include a start barrier and inter-process communications to help coordinate the job processes.

The start barrier allows the processes to synchronize before performing any non-restartable operations. In the absence of a start barrier, there is no way to guarantee that all job components are successfully created prior to executing user code.

The communications library provides two simple mechanisms to send start-up and bootstrapping information between processes: an inter-subjob mechanism to communicate between “node 0” of each subjob, and an intra-subjob mechanism to communicate between all the nodes of a single subjob. A library of common bootstrapping operations is provided, using the public inter-subjob and intra-subjob communication interfaces.

For each GRAM subjob in the DUROC job, there are two optional RSL fields which affect the subjob behavior.

The ‘subjobStartType’ field allows the user to configure each subjob to either participate in the start barrier with strict subjob-state monitoring (value ‘strict-barrier’), participate in the start barrier without strict subjob-state monitoring (value ‘loose-barrier’), or not participate in the barrier at all (value ‘no-barrier’). Subjobs that don't perform the barrier run forward independently of the other subjobs. Strict state monitoring means that the job will be automatically killed if the subjob terminates prior to completing the barrier.

The ‘subjobCommsType’ field allows the user to configure each subjob to either join the inter-subjob communications group as a blocking operation (value ‘blocking-join’) or not join the inter-subjob communications group at all (value ‘independent’). When joining the group as a blocking operation, all participating subjobs join together, i.e., the communications startup function acts as a group barrier.
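Putting the per-subjob fields together, and using the same dictionary form as the decomposition sketch earlier, a lowered two-subjob request might carry values like the following (all values are illustrative):

    # Illustrative lowered multi-request carrying the per-subjob control fields.
    multi_request = [
        {"resourceManagerContact": "RM1", "count": 3, "executable": "myprog.sparc",
         "subjobStartType": "strict-barrier", "subjobCommsType": "blocking-join"},
        {"resourceManagerContact": "RM2", "count": 2, "executable": "myprog.rs6000",
         "subjobStartType": "loose-barrier", "subjobCommsType": "blocking-join"},
    ]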

Applications

At the top of any Grid system are the user applications, which are

constructed in terms of, and call on, the components in any other layer. For example, a high-energy physics analysis application that

needs to execute several thousands of independent tasks, each taking as input some set of files containing events, might proceed by

obtaining necessary authentication credentials (connectivity layer protocols)

querying an information system and replica catalog to determine availability of computers, storage systems, and networks, and the location of required input files (collective services)

submitting requests to appropriate computers, storage systems, and networks to initiate computations, move data, and so forth (resource protocols) and

monitoring the progress of the various computations and data transfers, notifying the user when all are completed, and detecting and responding to failure conditions (resource protocols).
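A compressed sketch of that sequence of steps, with every Grid interaction replaced by an illustrative stub so the layering stays visible; none of these names are real Globus functions.

    # Sketch of the application-level flow described above. Each stub stands in
    # for a protocol interaction at the named layer.
    def acquire_credentials():                      # Connectivity layer
        return "proxy-credential"

    def find_resources(input_files):                # Collective layer (index + replica catalog)
        return [("cluster.site-a", f) for f in input_files]

    def submit_task(credential, site, input_file):  # Resource layer (management protocol)
        return {"site": site, "input": input_file, "state": "running"}

    def wait_and_handle_failures(tasks):            # Resource layer (monitoring/control)
        for t in tasks:
            t["state"] = "done"                     # placeholder for polling and resubmission
        return tasks

    credential = acquire_credentials()
    placements = find_resources(["events-001.dat", "events-002.dat"])
    tasks = [submit_task(credential, site, f) for site, f in placements]
    print(wait_and_handle_failures(tasks))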

Grid Architecture in Practice

In the case of the ray tracing application, we assume that this is based on a high-throughput

computing system [37, 50]. In order to manage the execution of large numbers of

largely independent tasks in a VO environment, this system must keep track of the set of active and pending tasks, locate appropriate resources for each task, stage executables to those resources, detect and respond to various types of failure, and so forth.

An implementation in the context of our Grid architecture uses both domain-specific Collective services (dynamic checkpoint, task pool management, failover) and more generic Collective services (brokering, data replication for executables and common input files), as well as standard Resource and Connectivity protocols.

Condor-G represents a first step towards this goal.

In the case of the multidisciplinary simulation application, the problems are quite different at the highest level. Some application framework (e.g., CORBA, CCA) may be

used to construct the application from its various components.

We also require mechanisms for discovering appropriate computational resources, for reserving time on those resources, for staging executables (perhaps), for providing access to remote storage, and so forth.

Again, a number of domain-specific Collective services will be used (e.g., solver coupler, distributed data archiver), but the basic underpinnings are the same as in the ray tracing example.

Cf. CCA (Common Component Architecture) Forum: its goal is to define a minimal set of standard interfaces that a high-performance component framework has to provide to components, and can expect from them, in order to allow disparate components to be composed together to build a running application. Such a standard will promote interoperability between components developed by different teams across different institutions.

Authentication, authorization, and policy

Authentication, authorization, and policy are among the most challenging issues in Grids. Traditional security technologies are concerned

primarily with securing the interactions between clients and servers.

In such interactions, a client (that is, a user) and a server need to mutually authenticate (that is, verify) each other's identity, while the server needs to determine whether to authorize requests issued by the client.

Sophisticated technologies have been developed for performing these basic operations and for guarding against and detecting various forms of attack.

We use these technologies whenever we visit e-commerce Web sites such as Amazon to buy products online.

In Grid environments, the situation is more complex. The distinction between client and server tends to

disappear, because an individual resource can act as a server one moment (as it receives a request) and as a client at another (as it issues requests to other resources).

For example, when I request that a simulation code be run on a colleague's computer, I am the client and the computer is a server.

But a few moments later, that same code and computer act as a client, as they issue requests--on my behalf--to other computers to access input datasets and to run subsidiary computations.

Managing that kind of transaction turns out to have a number of interesting requirements, such as

Single sign-on. A single computation may entail access to many

resources, but requiring a user to reauthenticate on each occasion (by, for example, typing in a password) is impractical and generally unacceptable.

Instead, a user should be able to authenticate once and then assign to the computation the right to operate on his or her behalf, typically for a specified period.

This capability is achieved through the creation of a proxy credential. In figure 3, the program run by the user (the user proxy) uses a proxy credential to authenticate at two different sites; the services at those sites handle requests to create new processes on the user's behalf.

Mapping to local security mechanisms. Different sites may use different local security

solutions, such as Kerberos and Unix as depicted in figure 3.

A Grid security infrastructure needs to map to these local solutions at each site, so that local operations can proceed with appropriate privileges.

In figure 3, processes execute under a local ID and, at site A, are assigned a Kerberos "ticket," a credential used by the Kerberos authentication system to keep track of requests.

Delegation. The creation of a proxy credential is a form of delegation, an

operation of fundamental importance in Grid environments [9]. A computation that spans many resources creates

subcomputations (subsidiary computations) that may themselves generate requests to other resources and services, perhaps creating additional subcomputations, and so on.

In figure 3, the two subcomputations created at sites A and B both communicate with each other and access files at site C. Authentication operations--and hence further delegated credentials--are involved at each stage, as resources determine whether to grant requests and computations determine whether resources are trustworthy.

The further these delegated credentials are disseminated, the greater the risk that they will be acquired and misused by an adversary. These delegation operations and the credentials that enable them must be carefully managed.

Community authorization and policy. In a large community, the policies that govern who can

use which resources for what purpose cannot be based directly on individual identity.

It is infeasible for each resource to keep track of community membership and privileges.

Instead, resources (and users) need to be able to express policies in terms of other criteria, such as group membership, which can be identified with a cryptographic credential issued by a trusted third party.

In the scenario depicted in figure 3, the file server at site C must know explicitly whether the user is allowed to access a particular file.

A community authorization system allows this policy decision to be delegated to a community representative.
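A toy sketch of policy expressed in terms of group membership rather than individual identity: the resource trusts a community server's signed assertion instead of keeping its own member list. An HMAC over a shared key stands in for the real cryptographic credential, and all names, files, and groups are illustrative assumptions.

    # Toy sketch of community-based authorization: the resource does not track
    # individual members; it checks a membership assertion signed by a trusted
    # community server.
    import hashlib
    import hmac

    COMMUNITY_KEY = b"shared-secret-with-community-server"   # trust anchor (illustrative)

    def issue_assertion(user, group):
        msg = f"{user}:{group}".encode()
        return msg, hmac.new(COMMUNITY_KEY, msg, hashlib.sha256).digest()

    def authorize(requested_file, assertion):
        msg, tag = assertion
        if not hmac.compare_digest(tag, hmac.new(COMMUNITY_KEY, msg, hashlib.sha256).digest()):
            return False                                     # forged or altered assertion
        user, group = msg.decode().split(":")
        policy = {"/data/cms/run42.dat": {"cms-analysis"}}   # file -> groups allowed
        return group in policy.get(requested_file, set())

    cred = issue_assertion("alice", "cms-analysis")          # issued by the community server
    print(authorize("/data/cms/run42.dat", cred))            # True: policy is per group, not per user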

Relationships with Other Technologies

The concept of controlled, dynamic sharing within VOs is so fundamental that we might assume that Grid-like technologies must surely already be widely deployed. In practice, however, while the need for these

technologies is indeed widespread, in a wide variety of different areas we find only primitive and inadequate solutions to VO problems.

In brief, current distributed computing approaches do not provide a general resource-sharing framework that addresses VO requirements.

Grid technologies distinguish themselves by providing this generic approach to resource sharing.

This situation points to numerous opportunities for the application of Grid technologies.

World Wide Web

The ubiquity of Web technologies (i.e., IETF Internet Engineering Task Force and W3C World Wide Web Consortium standard protocols such as TCP/IP, HTTP, and SOAP Simple Object Access Protocol, and languages such as HTML and XML) makes them attractive as a platform for constructing VO systems and applications. However, while these technologies do an excellent job of

supporting the browser-client-to-web-server interactions that are the foundation of today’s Web, they lack features required for the richer interaction models that occur in VOs.

For example, today’s Web browsers typically use TLS (Transport Layer Security, the successor to SSL, the Secure Sockets Layer) for authentication, but do not support single sign-on or delegation.

Clear steps can be taken to integrate Grid and Web technologies. For example, the single sign-on capabilities provided in the GSI extensions to TLS would, if integrated into Web browsers, allow for single sign-on to multiple Web servers.

GSI delegation capabilities would permit a browser client to delegate capabilities to a Web server so that the server could act on the client’s behalf.

These capabilities, in turn, make it much easier to use Web technologies to build “VO Portals” that provide thin client interfaces to sophisticated VO applications. WebOS addresses some of these issues.

Application and Storage Service Providers

Application service providers, storage service providers, and similar hosting companies typically offer to outsource specific business and engineering applications (in the case of ASPs) and storage capabilities (in the case of SSPs). A customer negotiates a service level agreement that

defines access to a specific combination of hardware and software.

Security tends to be handled by using VPN technology to extend the customer’s intranet to encompass resources operated by the ASP or SSP on the customer’s behalf.

Other SSPs offer file-sharing services, in which case access is provided via HTTP, FTP, or WebDAV (Web-based Distributed Authoring and Versioning), with user IDs, passwords, and access control lists controlling access.

From a VO perspective, these are low-level building-block technologies. VPNs and static configurations make many VO sharing

modalities hard to achieve. For example, the use of VPNs means that it is typically impossible for an ASP application to access data located on storage managed by a separate SSP.

Similarly, dynamic reconfiguration of resources within a single ASP or SSP is challenging and, in fact, is rarely attempted.

The load sharing across providers that occurs on a routine basis in the electric power industry is unheard of in the hosting industry.

A basic problem is that a VPN is not a VO: it cannot extend dynamically to encompass other resources and does not provide the remote resource provider with any control of when and whether to share its resources.

The integration of Grid technologies into ASPs and SSPs can enable a much richer range of possibilities. For example, standard Grid services and protocols can

be used to achieve a decoupling of the hardware and software.

A customer could negotiate an SLA (Service Level Agreement) for particular hardware resources and then use Grid resource protocols to dynamically provision that hardware to run customer-specific applications.

Flexible delegation and access control mechanisms would allow a customer to grant an application running on an ASP computer direct, efficient, and secure access to data on SSP storage, and/or to couple resources from multiple ASPs and SSPs with their own resources, when required for more complex problems.

A single sign-on security infrastructure able to span multiple security domains dynamically is, realistically, required to support such scenarios.

Grid resource management and accounting/payment protocols that allow for dynamic provisioning and reservation of capabilities (e.g., amount of storage, transfer bandwidth, etc.) are also critical.

Enterprise Computing Systems

Enterprise development technologies such as

CORBA, Enterprise Java Beans, Java 2 Enterprise Edition, and DCOM are all systems designed to enable the construction of distributed applications. They provide standard resource interfaces, remote

invocation mechanisms, and trading services for discovery and hence make it easy to share resources within a single organization.

However, these mechanisms address none of the specific VO requirements listed above.

Sharing arrangements are typically relatively static and restricted to occur within a single organization.

The primary form of interaction is client-server, rather than the coordinated use of multiple resources.

These observations suggest that there should be a role for Grid technologies within enterprise computing. For example, in the case of CORBA, we could construct

an object request broker (ORB) that uses GSI mechanisms to address cross-organizational security issues.

We could implement a Portable Object Adaptor that speaks the Grid resource management protocol to access resources spread across a VO.

We could construct Grid-enabled Naming and Trading services that use Grid information service protocols to query information sources distributed across large VOs.

In each case, the use of Grid protocols provides enhanced capability (e.g., interdomain security) and enables interoperability with other (non-CORBA) clients.

Similar observations can be made about Java and Jini (Java Intelligent Network Infrastructure). For example, Jini’s protocols and implementation are geared toward a small collection of devices.

A “Grid Jini” that employed Grid protocols and services would allow the use of Jini abstractions in a large-scale, multi-enterprise environment.

Internet and Peer-to-Peer Computing

Peer-to-peer computing (as implemented, for example, in the Napster, Gnutella, and Freenet file sharing systems) and Internet computing (as implemented, for example, by the SETI@home, Parabon, and Entropia systems) are examples of the more general (“beyond client-server”) sharing modalities and computational structures that we referred to in our characterization of VOs. As such, they have much in common with Grid technologies.

In practice, we find that the technical focus of work in these domains has not overlapped significantly to date.

One reason is that peer-to-peer and Internet computing developers have so far focused entirely on vertically integrated (“stovepipe”) solutions, rather than seeking to define common protocols that would allow for shared infrastructure and interoperability. (This is, of course, a common characteristic of new market niches, in which participants still hope for a monopoly.)

Another is that the forms of sharing targeted by various applications are quite limited, for example, file sharing with no access control, and computational sharing with a centralized server.

As these applications become more sophisticated and the need for interoperability becomes clearer, we will see a strong convergence of interests between peer-to-peer, Internet, and Grid computing [31]. For example, single sign-on, delegation, and authorization

technologies become important when computational and data sharing services must interoperate, and the policies that govern access to individual resources become more complex.

Other Perspectives on Grids

The Grid is a next-generation Internet.

“The Grid” is not an alternative to “the Internet”: it is rather a set of additional protocols and services that build on Internet protocols and services to support the creation and use of computation- and data-enriched environments.

Any resource that is “on the Grid” is also, by definition, “on the Net.”

The Grid is a source of free cycles. Grid computing does not imply unrestricted access to

resources. Grid computing is about controlled sharing. Resource owners

will typically want to enforce policies that constrain access according to group membership, ability to pay, and so forth.

Hence, accounting is important, and a Grid architecture must incorporate resource and collective protocols for exchanging usage and cost information, as well as for exploiting this information when deciding whether to enable sharing.

The Grid requires a distributed operating system. In this view, Grid software should define the operating

system services to be installed on every participating system, with these services providing for the Grid what an operating system provides for a single computer: namely, transparency with respect to location, naming, security, and so forth.

Put another way, this perspective views the role of Grid software as defining a virtual machine.

However, we feel that this perspective is inconsistent with our primary goals of broad deployment and interoperability.

We argue that the appropriate model is rather the Internet Protocol suite, which provides largely orthogonal services that address the unique concerns that arise in networked environments.

The tremendous physical and administrative heterogeneities encountered in Grid environments mean that the traditional transparencies are unobtainable; on the other hand, it does appear feasible to obtain agreement on standard protocols.

The architecture proposed here is deliberately open rather than prescriptive: it defines a compact and minimal set of protocols that a resource must speak to be “on the Grid”; beyond this, it seeks only to provide a framework within which many behaviors can be specified.

The Grid requires new programming models. Programming in Grid environments introduces

challenges that are not encountered in sequential (or parallel) computers, such as multiple administrative domains, new failure modes, and large variations in performance.

However, we argue that these are incidental, not central, issues and that the basic programming problem is not fundamentally different.

As in other contexts, abstraction and encapsulation can reduce complexity and improve reliability. But, as in other contexts, it is desirable to allow a wide variety of higher-level abstractions to be constructed, rather than enforcing a particular approach.

So, for example, a developer who believes that a universal distributed shared memory model can simplify Grid application development should implement this model in terms of Grid protocols, extending or replacing those protocols only if they prove inadequate for this purpose.

Similarly, a developer who believes that all Grid resources should be presented to users as objects needs simply to implement an object-oriented “API” in terms of Grid protocols.

The Grid makes high-performance computers superfluous. The hundreds, thousands, or even millions of processors

that may be accessible within a VO represent a significant source of computational power, if they can be harnessed in a useful fashion.

This does not imply, however, that traditional high-performance computers are obsolete. Many problems require tightly coupled computers, with low latencies and high communication bandwidths; Grid computing may well increase, rather than reduce, demand for such systems by making access easier.

Current status and future directions As the Grid matures, standard technologies

are emerging for basic Grid operations. In particular, the community-based, open-source Globus Toolkit is being applied by most major Grid projects. The business world has also begun to investigate Grid applications. By late 2001, 12 companies had announced support for the Globus Toolkit.

Progress has also been made on organizational fronts. With more than 1000 people on its mailing

lists, the Global Grid Forum (http://www.gridforum.org) is a significant force for setting standards and community development.

Its thrice-yearly meetings attract hundreds of attendees from some 200 organizations. The International Virtual Data Grid Laboratory is being established as an international Grid system (figure 4).

It is commonly observed that people overestimate the short-term impact of change but underestimate long-term effects [10]. It will surely take longer than some expect before Grid

concepts and technologies transform the practice of science, engineering, and business, but the combination of exponential technology trends and R&D advances noted in this article are real and will ultimately have dramatic impacts.

In a future in which computing, storage, and software are no longer objects that we possess, but utilities to which we subscribe, the most successful scientific communities are likely to be those that succeed in assembling and making effective use of appropriate Grid infrastructures, and thus accelerate the development and adoption of new problem-solving methods within their discipline.