distcom-short-20140112-1600

Geunsik Lim, http://leemgs.fedorapeople.org
Distributed Compilation System for High-Speed Software Build Processes

Upload: samsung-electronics

Post on 12-Apr-2017


TRANSCRIPT

Page 1: distcom-short-20140112-1600

Geunsik Lim
http://leemgs.fedorapeople.org

Distributed Compilation System for High-Speed Software Build Processes

Page 2: distcom-short-20140112-1600

Who am I?

• Full name: Geunsik Lim
• E-mail: [email protected], [email protected]
• Affiliation: Sungkyunkwan University, Samsung Electronics
• Homepage: http://leemgs.fedorapeople.org

Page 3: distcom-short-20140112-1600

Outline

• Introduction
• Background
• Design and Implementation
– Object File Based Server & Client Model
– CPU Scheduling of Distributed PC Resource
– Cross Compiling for Heterogeneous CPU Architecture
• Evaluation
• Conclusion

Page 4: distcom-short-20140112-1600

Statistics of Distributed Client PCs

• High-performance computing research has so far paid little attention to public computer facilities, which are idle much of the time.

• Example: used and idle times of 300 dual-core PCs in a university library

Page 5: distcom-short-20140112-1600

What is covered in this talk?

• Current state of the growth in software source size from January 2009 until July 2013.

[Figure annotation: "Sevenfold"]

Page 6: distcom-short-20140112-1600

Cost Statistics for Building Platform Source

• Time cost of building a large mobile platform such as Android 4.2.2: compilation accounts for 67 percent (34 minutes) of the total execution time (51 minutes).

Page 7: distcom-short-20140112-1600

What is our final goal?

Idle Computers

In unity there is strength.

Page 8: distcom-short-20140112-1600

What is our challenge?

Distributed compiler network: to distribute compiling tasks across a network

Support a portable Linux system for Windows PCs: to run the distributed compiler network on the existing HTCondor pool of Windows PCs

Establish a remote command-line environment: whatever HTCondor does on a library PC must work without any user interaction, so no GUI.

Requirement: We can NOT use a GUI installer, because we will not be sitting in front of the distributed PCs while users are doing their work.

Windows XP

Windows 7

Linux/

Page 9: distcom-short-20140112-1600

System Architecture of DistCom

[Figure: DistCom system architecture. Recoverable labels:
Legend: DistCom Components
(1.1) DistCom Service Daemon (= DistCom Server)
(1.2) DistCom Client
(2) DistCom Manager
(3) Cross-Compiler Infrastructure
HTCondor Pool Manager, HTCondor Collector, HTCondor Client
Workload Monitor, Resource Manager
Distributed Resources: X86 Windows, X86 Linux, ARM Android
Operating System of Distributed PCs (Windows)
Users, Platform Builders, Cloud Developers, Scientific Researchers
Source Codes, Server & Client Model]

• Distributed server and client model: to control distributed PC resources connected via a network
• DistCom Manager: for scheduling distributed PC resources
• Cross-compiler infrastructure: to support heterogeneous architectures

Page 10: distcom-short-20140112-1600

Object File Based Server & Client Model

[Figure: DistCom job flow and HTCondor job flow. Recoverable labels: User, (1.2) DistCom Client, (2) DistCom Manager, HTCondor Pool Manager, HTCondor PC (Windows), (1.1) DistCom Service Daemon (Server) on Distributed Computer A, Collecting Information, Monitoring Workload]

Steps shown in the figure:
① Check of PC’s status
② Command
③ Source
④ Source Compilation
⑤ Binary
⑥ Result

O: Object, the atomic unit of checkpoint/restart

• The (2) DistCom Manager uses a checkpoint/restart mechanism to minimize speed degradation; object files are the atomic unit of checkpointing.

[Figure: Existing Technique vs. Proposed Technique. Recoverable labels: One PC, 1 Objects; 4-core CPU, 4 Commands; 2-core CPU, 2 Commands]
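The object-file-based checkpoint/restart idea on this slide can be sketched in a few lines. This is a minimal illustration, not the DistCom implementation: `BuildJob`, `run_round`, and `fake_compile` are hypothetical names, and the reading that a PC receives as many compile commands as it has cores is an assumption drawn from the "4-core CPU, 4 Commands" and "2-core CPU, 2 Commands" labels.

```python
from dataclasses import dataclass, field


@dataclass
class BuildJob:
    """A build tracked per object file (hypothetical structure)."""
    sources: list
    objects: dict = field(default_factory=dict)  # source -> finished object file

    def pending(self):
        # Sources whose object file has not been produced yet.
        return [s for s in self.sources if s not in self.objects]


def run_round(job, cores, compile_one, fail_after=None):
    """Issue up to `cores` compile commands on one PC.

    `fail_after` simulates the PC being lost after that many objects;
    objects finished before the failure stay checkpointed in `job.objects`.
    """
    batch = job.pending()[:cores]
    if fail_after is not None:
        batch = batch[:fail_after]
    for src in batch:
        job.objects[src] = compile_one(src)
    return len(batch)


compiled_log = []


def fake_compile(src):
    # Stand-in for a real compiler invocation.
    compiled_log.append(src)
    return src.rsplit(".", 1)[0] + ".o"


job = BuildJob(sources=[f"f{i}.c" for i in range(6)])

# PC A (4 cores) is interrupted after finishing 2 objects.
run_round(job, cores=4, compile_one=fake_compile, fail_after=2)

# PC B (2 cores) restarts from the checkpoint: only the missing objects are built.
while job.pending():
    run_round(job, cores=2, compile_one=fake_compile)

print(compiled_log)
```

Because progress is recorded per object file, the restart on the second PC repeats no finished work, which is the point of making the object file the atomic unit.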

Page 11: distcom-short-20140112-1600

CPU Scheduling: Controlling Remote PC Resources

• To avoid slowing a PC down while its user is working, the (1.1) DistCom Service Daemon runs compilation either as a real-time priority task (CPU monopolization method) or as a lowest-priority task (time-sharing method).

• No modification of distributed computer systems

[Figure: case study of user-aware scheduling. The scheduler assigns tasks to multi-core processor(s) across the priority levels Lowest, BelowNormal, Normal, AboveNormal, Highest, and Real-time. Case 1 and Case 2 are both set by the DistCom Service Daemon. Recoverable labels: Time-sharing, Real-time, Lowest]

Thread creation and priority APIs by platform:

Windows
– Headers: #include <windows.h>
– Create: CreateThread()
– Set priority: SetThreadPriority()
– Levels:
1. THREAD_PRIORITY_TIME_CRITICAL = 15
2. THREAD_PRIORITY_HIGHEST = +2
3. THREAD_PRIORITY_ABOVE_NORMAL = +1
4. THREAD_PRIORITY_NORMAL = 0
5. THREAD_PRIORITY_BELOW_NORMAL = -1
6. THREAD_PRIORITY_LOWEST = -2
7. THREAD_PRIORITY_IDLE = -15

POSIX
– Headers: #include <pthread.h>, #include <semaphore.h>, #include <sys/time.h>, #include <time.h>
– Create: pthread_create()
– Set priority:
1. setpriority(): -20 (Highest) ~ 19 (Lowest)
2. pthread_setschedparam(): 1 (Lowest) ~ 99 (Highest), real-time priority

VxWorks
– Headers: #include <stdio.h>, #include <vxWorks.h>, #include <sysLib.h>, #include <taskLib.h>, #include <semLib.h>
– Create: taskSpawn()
– Set priority: taskPrioritySet()
1. 0 (Highest) ~ 255 (Lowest)

Solaris threads
– Headers: #include <thread.h>
– Create: thr_create()
– Set priority: thr_setprio()
1. 0 (Lowest) ~ 127 (Highest)
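As a rough illustration of the two methods named on this slide, the sketch below maps each method to priority values quoted on the slide and shows one way to start a compile command at the lowest POSIX nice level. The `PRIORITY` table layout, the key names, and the helpers `compile_priority` and `run_time_shared` are assumptions made for this example; the slides do not show how the Service Daemon applies priorities.

```python
import os
import subprocess

# Priority values copied from the slide; the table layout and key names are
# assumptions made for this sketch.
PRIORITY = {
    "time-sharing": {              # lowest priority: yield to the PC's user
        "windows_thread": -2,      # THREAD_PRIORITY_LOWEST
        "posix_nice": 19,          # setpriority(): 19 is the lowest
    },
    "real-time": {                 # CPU monopolization
        "windows_thread": 15,      # THREAD_PRIORITY_TIME_CRITICAL
        "posix_rt": 99,            # pthread_setschedparam(): 99 is the highest
    },
}


def compile_priority(method, api):
    """Return the priority value for a scheduling method on a given API."""
    return PRIORITY[method][api]


def run_time_shared(cmd):
    """Run `cmd` at the lowest nice level (POSIX hosts only)."""
    nice = compile_priority("time-sharing", "posix_nice")
    return subprocess.run(
        cmd,
        # Executed in the child just before exec, so only the child is demoted.
        preexec_fn=lambda: os.setpriority(os.PRIO_PROCESS, 0, nice),
        capture_output=True,
        text=True,
    )
```

On a POSIX worker, a call such as `run_time_shared(["gcc", "-c", "foo.c"])` (a hypothetical command) would compile without competing with the interactive user. Raising a task to real-time priority normally requires elevated privileges, so that path is left out of the sketch.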

Page 12: distcom-short-20140112-1600

CPU Scheduling: Task Allocation & Reallocation

[Figure: task state transition. Recoverable labels: (2) DistCom Manager, Task queue, Dedicated Resources, Shared Resources, Reject, Stop, Finish, Task flow, Job flow]

※ Minimal job unit: object file

• The (2) DistCom Manager manages all jobs with two task queues, one for dedicated resources and one for shared resources.

1. First, Reject denies the allocation of a task.
2. Second, Stop breaks off a task's allocation to a PC resource because the user has accessed the PC.
3. Finally, Finish completes a running task normally.
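The two queues and the three transitions can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the `QUEUED` and `RUNNING` states, the `Manager` class, and the choice to put rejected and stopped tasks back on their queue are not stated in the slides, which name only Reject, Stop, and Finish.

```python
from collections import deque
from enum import Enum


class State(Enum):
    QUEUED = "queued"        # assumed initial state
    RUNNING = "running"      # assumed state after a successful allocation
    REJECTED = "rejected"
    STOPPED = "stopped"
    FINISHED = "finished"


class Manager:
    """Hypothetical manager with one queue per resource type."""

    def __init__(self):
        self.queues = {"dedicated": deque(), "shared": deque()}
        self.state = {}

    def submit(self, obj, pool):
        self.queues[pool].append(obj)
        self.state[obj] = State.QUEUED

    def allocate(self, pool, pc_accepts):
        obj = self.queues[pool].popleft()
        if pc_accepts:
            self.state[obj] = State.RUNNING
        else:
            # Reject: allocation denied; re-queueing is an assumption.
            self.state[obj] = State.REJECTED
            self.queues[pool].append(obj)
        return obj

    def stop(self, obj, pool):
        # Stop: the PC's user came back, so the allocation is broken off.
        if self.state[obj] is not State.RUNNING:
            raise ValueError(f"{obj} is not running")
        self.state[obj] = State.STOPPED
        self.queues[pool].appendleft(obj)  # retry this object first

    def finish(self, obj):
        # Finish: the running task completed normally.
        if self.state[obj] is not State.RUNNING:
            raise ValueError(f"{obj} is not running")
        self.state[obj] = State.FINISHED


m = Manager()
m.submit("a.o", "dedicated")
m.allocate("dedicated", pc_accepts=False)   # Reject
rejected_state = m.state["a.o"]
m.allocate("dedicated", pc_accepts=True)    # now running
m.stop("a.o", "dedicated")                  # Stop on user access
stopped_state = m.state["a.o"]
m.allocate("dedicated", pc_accepts=True)
m.finish("a.o")                             # Finish
print(rejected_state, stopped_state, m.state["a.o"])
```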

Page 13: distcom-short-20140112-1600

CPU Scheduling: Task Allocation & Reallocation

[Retry mechanism]

Q: Queue, C: Calculation, U: User

1. Overload detection
   if (Qsum > CPUfree) then
      find another idle computer

2. Task complexity estimation
   if (CPUfree is unknown or (CPUaccess > CPUidle)) then
      Recalculate task complexity of distributed computers (Ccomplexity)
      Run retry mechanism
      Call task state transition (stop)
      Run object-file based compilation on another idle computer

3. Handling of user access
   if (Uaccess && DedicatedResourceScheduling)
      Call retry mechanism
   if (Uaccess && SharedResourceScheduling)
      Change scheduling priority from highest to lowest

The proposed system supports a retry mechanism that recompiles in units of object files whenever a distributed PC fails to compile during distributed compilation.
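The three rules above can be expressed as one decision function. This is a sketch under assumptions: the slides do not define the units of Qsum, CPUfree, CPUaccess, and CPUidle, so they are treated here as plain comparable numbers, `None` stands for "unknown", and the order in which the rules are checked (unknown CPUfree first, since rule 1 cannot be evaluated without it) is a choice made for the example. The function name and the returned action strings are hypothetical.

```python
def next_action(q_sum, cpu_free, cpu_access, cpu_idle, user_access, scheduling):
    """Pick the manager's next step for one task on one distributed PC."""
    # Rule 2 (unknown branch): without CPUfree, rule 1 cannot be evaluated.
    if cpu_free is None:
        return "stop_and_retry_on_another_pc"
    # Rule 1: overload detection.
    if q_sum > cpu_free:
        return "find_another_idle_pc"
    # Rule 2: the PC is accessed more than it is idle.
    if cpu_access > cpu_idle:
        return "stop_and_retry_on_another_pc"
    # Rule 3: handling of user access depends on the scheduling method.
    if user_access and scheduling == "dedicated":
        return "retry"
    if user_access and scheduling == "shared":
        return "lower_priority"  # from highest to lowest
    return "keep_running"


print(next_action(5, 4, 0, 1, False, "shared"))
print(next_action(1, 4, 0, 1, True, "shared"))
```

Under shared resource scheduling the task stays on the PC and only its priority drops, while under dedicated resource scheduling the object file is retried, which matches the split between the two queues.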

Page 14: distcom-short-20140112-1600

Cross Compiling for Heterogeneous Architecture

Cross-compilation infrastructure for heterogeneous devices:
• X86 Windows, 32bit/64bit
• X86 Linux, 32bit/64bit
• ARM Android, 32bit (V7)

• Cross-compiler infrastructure: generates executable binary files for a system other than the one on which the compiler is running.

• Heterogeneous CPU Mapper: connects source code to the target machine code after probing the OS.

[Figure: tool chain. Recoverable labels: Source Code (C, C++, Objective-C, JAVA), Heterogeneous CPU Mapper, Tool Chain, Compiler (GCC), Assembler, Linker (LD), Debugger (GDB), Instruction Set, Machine code, Hardware]

Build tools:
• Build cross-binutils
• Build cross-gcc
• Build Linux API headers
• Build c-library (glibc, bionic)
• Build cross-gcc-hosted
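One way to picture the Heterogeneous CPU Mapper is as a lookup from the requested target to a cross-compiler command, adjusted for the OS probed on the worker PC. The sketch below is illustrative only: the `TARGET_PREFIX` table, the function `select_compiler`, and the toolchain prefixes (common GCC naming patterns) are assumptions, not names taken from DistCom.

```python
import platform

# Illustrative GCC toolchain prefixes per (target OS, target CPU).
# These follow common naming patterns and are assumptions for this sketch.
TARGET_PREFIX = {
    ("windows", "x86"): "x86_64-w64-mingw32-",
    ("linux", "x86"): "x86_64-linux-gnu-",
    ("android", "armv7"): "arm-linux-androideabi-",
}


def select_compiler(target_os, target_cpu, host_os=None):
    """Return the compiler command for a target, after probing the host OS."""
    # Probe the OS of the PC that will run the compiler unless it is given.
    host = (host_os or platform.system()).lower()
    key = (target_os.lower(), target_cpu.lower())
    if key not in TARGET_PREFIX:
        raise ValueError(f"unsupported target: {key}")
    # Windows hosts conventionally use an .exe suffix for executables.
    suffix = ".exe" if host == "windows" else ""
    return f"{TARGET_PREFIX[key]}gcc{suffix}"


print(select_compiler("android", "armv7", host_os="Linux"))
print(select_compiler("android", "armv7", host_os="Windows"))
```

Keeping the mapping in a table means a new target only needs a new entry once its cross toolchain has been built with the steps listed above.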

Page 15: distcom-short-20140112-1600

Evaluation

User (CentOS6): 115.145.170.xxx

Distributed PC Resources
Remote PC (Windows 7): 115.145.170.xxx
Remote PC (Ubuntu 12.04): 115.145.170.xxx

Page 16: distcom-short-20140112-1600

Evaluation – Build Time of Platform Source

[Figure: build timeline from start time to end. Before: 51 minutes. After (Proposed System): 18 minutes. Reduced time: 33 minutes]

• The time cost of building the mobile platform source is reduced by 65 percent (33 minutes).

• 25% is consumed by the network speed, 30% by the computing power of the PCs, and 45% by the CPU scheduling method.

9 machines (CPU: Intel Core2Duo, MEM: DDR2 1G, Intel 100 Mbps Ethernet Controller)
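The headline numbers can be checked with a few lines of arithmetic. The sketch uses only figures stated in the slides (51 and 18 minutes of build time, and 34 minutes of compilation within the original 51); the variable names are illustrative, and the speedup factor is derived here rather than quoted from the talk.

```python
before_min = 51    # total build time before
after_min = 18     # total build time with the proposed system
compile_min = 34   # compilation part of the original build

reduced_min = before_min - after_min              # minutes saved
reduction_pct = 100 * reduced_min / before_min    # share of the original build
speedup = before_min / after_min                  # derived, not from the slides
compile_share_pct = 100 * compile_min / before_min

print(f"reduced: {reduced_min} min ({reduction_pct:.1f}%)")
print(f"speedup: {speedup:.2f}x")
print(f"compilation share before: {compile_share_pct:.1f}%")
```

The 33 minutes saved is about 64.7% of the original build, which the slide rounds to 65 percent, and the 34 minutes of compilation is about 66.7%, rounded to 67 on the earlier cost slide.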

Page 17: distcom-short-20140112-1600

Evaluation – Compilation Speed with Distributed PCs

• The performance of 10 machines was similar to that of the 8-core PC. A performance loss equivalent to 2 PCs is due to network speed and the low computing power of the distributed PCs.

• Compared with the dedicated resource scheduling method, the compilation performance of the shared resource scheduling method depends largely on the CPU usage of the PC resource.

[Figure: compilation speed with distributed PCs against the "8-cores" baseline. Footnote: * network speed, low computing power]

High-Performance Computer: 8-Core Intel Xeon E5 Processor, 12GB memory

Page 18: distcom-short-20140112-1600

Evaluation – Experimental Result on Cloud Computing

• The proposed system is as effective as one high-performance computer (40-core).

• The 3-minute difference in performance is caused by the emulation overhead of KVM.

[Figure: results against the "40 cores" baseline, with the 3-minute gap marked]

High-performance cloud server (40-Core Intel Xeon E7 Processor, 32GB memory)

Page 19: distcom-short-20140112-1600

Evaluation – With Ccache vs. Without Ccache

• With Ccache, the compilation time of dedicated resource scheduling is reduced by about 10%.

• The Ccache effect (dedicated) is correlated with memory shortage on the distributed PC resources and with the physical memory capacity available for caching.

Page 20: distcom-short-20140112-1600

Comparison Between Existing System and DistCom

| | Ccache | Distcc | HTCondor | BOINC | DistCom (*) |
|---|---|---|---|---|---|
| Domain | Caching Output of Compilation | Distributed Computing | Distributed Parallelization | Distributed Computing | Distributed Computing |
| Task | Compile Source | Compile Source | Run Binary File | Run Binary File | Compile Source & Run Binary File |
| Goal | High Performance | High Performance | High Throughput | High Throughput | Hybrid Computing |
| Pros. | Performance Acceleration (e.g. DB, web-service) | Reduce Build-Time (e.g. Android, Linux) | Utilize Extra Resource Management | Support CPU & GPU | Multicore-Aware Object-Based Unit; Retry Mechanism; Shared Scheduling |
| Cons. | Need Sufficient Physical Memory | Need additional H/W | No Distributed Compiling | Only Use Idle Time | Depend on Network Infrastructure |
| Cost | High | High | Low | Low | Low |
| User | Platform Builders | Platform Builders | Scientific Research | Scientific Research | Platform Builders, Scientific Research |

Page 21: distcom-short-20140112-1600

Conclusion

• Idle computer resources connected by a network are more ubiquitous than ever before (e.g. cloud environments, BYOD environments, and the generalization of computer usage).

• DistCom (DIStributed COMpilation system) supports high-speed software compilation.
– 1) Distributed server/client model, 2) object-file-based CPU scheduling of remote PC resources, and 3) cross compiling for heterogeneous architectures.
– Hybrid approach for mobile platform builders, cloud developers, grid researchers, computational physics, and statistics.

• Using existing idle PC resources drastically improves compilation speed.

Page 22: distcom-short-20140112-1600

Thank you for your attention.
Any questions?

Page 23: distcom-short-20140112-1600

FAQ

1. Who cares about a Distcc/HTCondor-based system? Can you do it for mobile devices?

2. Sounds too good. Are there any limitations?

3. Are you going to release it? Or is it a one-off talk?

4. I totally don’t get why you are doing this.

Page 24: distcom-short-20140112-1600

Limitation

1. This approach is a software solution based on distributed PCs, but some small companies do not have sufficient distributed computer resources.

2. Users need to run on a local area network to get ideal network speed.

3. Can you always use idle PCs in a real environment? We focus on research into public computer facilities, which have a high percentage of idle time.