AWS Cloud 2017 - Operating High-Performance Databases with Amazon Aurora (박선용...

Operating High-Performance Databases with Amazon Aurora

Agenda

§ What is Aurora?

§ Existing Aurora performance features

§ New performance improvements

§ Aurora for PostgreSQL

§ Performance best practices

Open source compatible relational database

Performance and availability of commercial databases

Simplicity and cost-effectiveness of open source databases

What is Amazon Aurora?

Existing Aurora performance features

Performance scaling with instance size

Aurora scales both read and write performance as the instance size grows.

[Charts: WRITE PERFORMANCE and READ PERFORMANCE by instance size; series: Aurora, MySQL 5.6, MySQL 5.7]

Real-world data: gaming workload

Aurora vs. RDS MySQL, r3.4XL, Multi-AZ

Aurora 3X faster on r3.4xlarge

“Our first tests of Aurora were difficult to believe because the performance increase was substantial… Aurora made our migration from traditional colocation to AWS easier because the storage was fully managed and replication was extremely fast.” - Mark Smallcombe, CTO

“…if you're using Aurora, you should think about using read replicas because the replica lag is really a game changer compared to regular MySQL.” - Advait Shinde, CTO and co-founder

“Amazon Aurora was able to satisfy all of our scale requirements with no degradation in performance. With Alfresco on Amazon Aurora we scaled to 1 billion documents with a throughput of 3 million per hour, which is 10 times faster than our MySQL environment!” - John Newton, Founder and CTO of Alfresco

“After 8 months of production, Aurora has been nothing short of impressive… We love that so far Aurora has delivered the necessary performance without any of the operational overhead of running MySQL.” - Chris Broglie, Architect

Amazon Aurora: higher performance at a lower price

• Fewer instances
• Smaller instances are sufficient
• No need to provision storage
• No additional storage needed for read replicas

Safe.com lowered their AWS database bill by 40% by switching from sharded MySQL to a single Amazon Aurora instance.

Double Down Interactive (gaming) lowered their bill by 67% while also achieving better latencies (most queries ran faster) and lower CPU utilization.

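How Aurora achieves its performance

DO LESS WORK
• Do fewer I/Os
• Minimize network packets
• Cache prior results
• Offload the database engine

BE MORE EFFICIENT
• Process asynchronously
• Reduce the latency path
• Use lock-free data structures
• Batch operations together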

Databases are mostly about I/O

Network-attached storage is mostly about PACKETS/SECOND

High-throughput processing is mostly about CONTEXT SWITCHES

MySQL I/O traffic

MYSQL WITH REPLICA

[Diagram: primary instance in AZ 1 and replica instance in AZ 2, each writing to Amazon Elastic Block Store (EBS) with an EBS mirror, with backup to Amazon S3. Steps 1 to 5 mark the write path.]

TYPE OF WRITE
BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES

I/O FLOW
• Issue write to EBS; EBS issues to mirror, ack when both done
• Stage write to standby instance through DRBD
• Issue write to EBS on standby instance

OBSERVATIONS
• Steps 1, 3, 4 are sequential and synchronous
• This amplifies both latency and jitter
• Many types of writes for each user operation
• Have to write data blocks twice to avoid torn writes

PERFORMANCE
• 780K transactions
• 7,388K I/Os per million txns (excludes mirroring, standby)
• Average 7.4 I/Os per transaction

30 minute SysBench write-only workload, 100GB dataset, RDS Multi-AZ, 30K PIOPS

Aurora I/O traffic

AMAZON AURORA

[Diagram: primary instance and replica instances across AZ 1, AZ 2, and AZ 3, with ASYNC 4/6 QUORUM DISTRIBUTED WRITES to shared storage and backup to Amazon S3.]

TYPE OF WRITE
BINLOG, DATA, DOUBLE-WRITE, LOG, FRM FILES

I/O FLOW
• Boxcar redo log records: fully ordered by LSN
• Shuffle to appropriate segments: partially ordered
• Boxcar to storage nodes and issue writes

OBSERVATIONS
• Only write redo log records; all steps asynchronous
• No data block writes (checkpoint, cache replacement)
• 6X more log writes, but 9X less network traffic
• Tolerant of network and storage outlier latency

PERFORMANCE
• 27,378K transactions (35X MORE)
• 950K I/Os per 1M txns (6X amplification) (7.7X LESS)
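The ratios quoted on the MySQL and Aurora I/O traffic slides can be re-derived from the raw figures. The short check below is only a sanity-check sketch; the constant and function names are illustrative and do not come from the deck.

```python
# Raw figures from the MySQL and Aurora I/O traffic slides
# (30-minute SysBench write-only workload). All names are illustrative.
MYSQL_TXNS_K = 780                # thousands of transactions
AURORA_TXNS_K = 27_378
MYSQL_IOS_K_PER_M_TXNS = 7_388    # thousands of I/Os per million transactions
AURORA_IOS_K_PER_M_TXNS = 950


def ios_per_txn(ios_k_per_million_txns: float) -> float:
    """Convert 'thousands of I/Os per million txns' into I/Os per transaction."""
    return ios_k_per_million_txns * 1_000 / 1_000_000


mysql_ios = ios_per_txn(MYSQL_IOS_K_PER_M_TXNS)
aurora_ios = ios_per_txn(AURORA_IOS_K_PER_M_TXNS)
txn_ratio = AURORA_TXNS_K / MYSQL_TXNS_K   # how many more transactions Aurora completed
io_ratio = mysql_ios / aurora_ios          # how many fewer I/Os per transaction

print(f"MySQL:  {mysql_ios:.2f} I/Os per transaction")
print(f"Aurora: {aurora_ios:.2f} I/Os per transaction")
print(f"Transactions: {txn_ratio:.1f}x more, I/Os per txn: {io_ratio:.2f}x fewer")
```

The I/O ratio works out to roughly 7.78, which the slide reports as 7.7X.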

Aurora I/O traffic (storage node)

[Diagram: LOG RECORDS flow from the Primary Instance into the STORAGE NODE's INCOMING QUEUE, followed by ACK, UPDATE QUEUE, HOT LOG, SORT/GROUP, COALESCE, DATA BLOCKS, POINT IN TIME SNAPSHOT, GC, SCRUB, PEER TO PEER GOSSIP with Peer Storage Nodes, and S3 BACKUP. Steps 1 to 8 are numbered in the diagram.]

I/O FLOW
① Receive record and add it to an in-memory queue
② Persist record and acknowledge
③ Organize records and identify gaps in the log
④ Gossip with peers to fill in holes
⑤ Coalesce log records into new data block versions
⑥ Periodically stage log and new block versions to S3
⑦ Periodically garbage collect old versions
⑧ Periodically validate CRC codes on blocks

OBSERVATIONS
• All steps are asynchronous
• Only steps 1 and 2 are in the foreground latency path
• Input queue is 46X less than MySQL (unamplified, per node)
• Favor latency-sensitive operations
• Use disk space to buffer against spikes in activity
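Steps 1 to 5 of the storage node flow can be illustrated with a toy model. This is a minimal sketch under strong simplifying assumptions (dense integer LSNs, a single peer, everything in memory); the class and method names are hypothetical and it is not Aurora's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorageNode:
    """In-memory toy model of one storage node (hypothetical, for illustration)."""
    hot_log: dict = field(default_factory=dict)      # lsn -> (block_id, change)
    data_blocks: dict = field(default_factory=dict)  # block_id -> applied changes
    applied_lsn: int = 0                             # highest LSN coalesced so far

    def receive(self, lsn, block_id, change):
        """Steps 1-2: queue and persist the record, then acknowledge."""
        self.hot_log[lsn] = (block_id, change)
        return "ACK"

    def gaps(self, up_to_lsn):
        """Step 3: LSNs up to up_to_lsn that this node has not received."""
        return [lsn for lsn in range(self.applied_lsn + 1, up_to_lsn + 1)
                if lsn not in self.hot_log]

    def gossip(self, peer, up_to_lsn):
        """Step 4: fill holes by copying missing records from a peer."""
        for lsn in self.gaps(up_to_lsn):
            if lsn in peer.hot_log:
                self.hot_log[lsn] = peer.hot_log[lsn]

    def coalesce(self):
        """Step 5: apply contiguous log records to data blocks in LSN order."""
        while self.applied_lsn + 1 in self.hot_log:
            self.applied_lsn += 1
            block_id, change = self.hot_log.pop(self.applied_lsn)
            self.data_blocks.setdefault(block_id, []).append(change)
```

Only `receive` sits on the write path in this model; gap detection, gossip, and coalescing can all run later, which mirrors the observation that only steps 1 and 2 affect foreground latency.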

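I/O traffic in Aurora replicas

MYSQL READ SCALING

[Diagram: MySQL master (30% read, 70% write) and MySQL replica (30% new reads, 70% write), each with its own data volume; the replica applies the binlog single-threaded.]

• Logical: ships SQL statements to the replica
• The write workload is similar on both instances
• Independent storage
• Data drift between master and replica is possible

AMAZON AURORA READ SCALING

[Diagram: Aurora master (30% read, 70% write) and Aurora replica (100% new reads) on shared Multi-AZ storage; the replica updates its page cache.]

• Physical: ships redo from the master to the replica
• The replica shares storage and performs no writes of its own
• Redo is applied to cached pages
• The read view advances once all commits have been seen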

“In MySQL, we saw replica lag spike to almost 12 minutes which is almost absurd from an application’s perspective. With Aurora, the maximum read replica lag across 4 replicas never exceeded 20 ms.”

Real-world data: read replica lag

Asynchronous group commit

[Diagram: TRANSACTIONS T1 to Tn (each Read, Write, Commit, Read, Read) over TIME. COMMIT QUEUE: pending commits in LSN order, Commit (T1) to Commit (T8). LSN GROWTH: durable LSN at head-node, with LSN 10, 12, 20, 22, 30, 34, 41, 47, 49, 50. GROUP COMMIT.]

TRADITIONAL APPROACH
• Maintain a buffer of log records to write out to disk
• Issue write when buffer full or time out waiting for writes
• First writer has latency penalty when write rate is low

AMAZON AURORA
• Request I/O with first write, fill buffer till write picked up
• Individual write durable when 4 of 6 storage nodes ACK
• Advance DB Durable point up to earliest pending ACK
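The Aurora side of this comparison can be sketched as a small bookkeeping structure: a write becomes durable once 4 of 6 storage nodes ACK it, the durable point only advances up to the earliest write still waiting for a quorum, and commits are released in LSN order. The class below is a hypothetical illustration of that rule, not Aurora's code; the names `GroupCommitTracker`, `write`, `ack`, and `commit` are assumptions.

```python
import heapq

WRITE_QUORUM = 4  # "4 of 6 storage nodes ACK", as stated on the slide


class GroupCommitTracker:
    """Toy tracker for the durable LSN and the pending commit queue."""

    def __init__(self):
        self.acks = {}             # lsn -> set of node ids that have ACKed
        self.durable_lsn = 0       # every write at or below this LSN is durable
        self.pending_commits = []  # min-heap of (commit_lsn, txn_name)

    def write(self, lsn):
        """Register an outstanding write; the I/O is issued immediately."""
        self.acks[lsn] = set()

    def commit(self, txn_name, lsn):
        """Queue a commit; it is released once durable_lsn reaches lsn."""
        heapq.heappush(self.pending_commits, (lsn, txn_name))

    def ack(self, lsn, node_id):
        """Record one storage node ACK and return the commits it releases."""
        self.acks[lsn].add(node_id)
        # Advance the durable point, stopping at the earliest write
        # that still lacks a quorum.
        for pending_lsn in sorted(self.acks):
            if len(self.acks[pending_lsn]) < WRITE_QUORUM:
                break
            self.durable_lsn = pending_lsn
            del self.acks[pending_lsn]
        released = []
        while self.pending_commits and self.pending_commits[0][0] <= self.durable_lsn:
            released.append(heapq.heappop(self.pending_commits)[1])
        return released
```

In this sketch a later write can reach its quorum first, but its commit is held back until every earlier write is also durable, so commits are acknowledged in groups as the durable point moves.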

Adaptive thread pool

MYSQL THREAD MODEL
• Standard MySQL: one thread per connection
  Doesn’t scale with connection count
• MySQL EE: connections assigned to thread group
  Requires careful stall threshold tuning

AURORA THREAD MODEL
• Re-entrant connections multiplexed to active threads
• Kernel-space epoll() inserts into latch-free event queue
• Dynamically sized thread pool
• Gracefully handles 5000+ concurrent client sessions on r3.8xl

[Diagram: CLIENT CONNECTIONs feed epoll(), which feeds the LATCH FREE TASK QUEUE.]
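The idea of multiplexing many connections onto a few worker threads can be sketched with Python's standard library. This is an illustration only: `selectors.DefaultSelector` uses epoll on Linux, `queue.Queue` stands in for the task queue (it takes a lock, so it is not latch-free), the pool size is fixed rather than adaptive, and the `serve` function and its "uppercase the request" work are hypothetical.

```python
import queue
import selectors
import socket
import threading


def serve(connections, n_workers=2):
    """Handle one request per connection using a small pool of worker threads.

    connections maps a connection id to a connected socket.
    """
    tasks = queue.Queue()   # stand-in for the task queue (not latch-free)
    results = {}

    def worker():
        while True:
            item = tasks.get()
            if item is None:                  # shutdown signal
                break
            conn_id, request = item
            results[conn_id] = request.upper()  # pretend to execute the request

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()

    sel = selectors.DefaultSelector()         # epoll-based on Linux
    for conn_id, sock in connections.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=conn_id)

    remaining = len(connections)
    while remaining:
        events = sel.select(timeout=1.0)
        if not events:                        # nothing ready: stop waiting
            break
        for key, _ in events:
            request = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
            tasks.put((key.data, request))    # hand off to whichever worker is free
            remaining -= 1

    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
    sel.close()
    return results
```

The number of threads here is independent of the number of connections, which is the property the thread-per-connection model lacks.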

Aurora lock management

[Diagram: MySQL lock manager vs. Aurora lock manager, each showing Scan, Delete, and Insert operations on lock chains.]

§ Same locking semantics as MySQL

§ Concurrent access to lock chains

§ Multiple scanners allowed in an individual lock chain

§ Lock-free deadlock detection

Needed to support many concurrent sessions and high update throughput

New performance improvements

Cached read performance improvements

• Catalog concurrency: improved data dictionary synchronization and cache eviction

• NUMA-aware scheduler: the Aurora scheduler is now NUMA-aware, which helps scaling on multi-socket instances

• Read views: Aurora uses a latch-free concurrent read-view algorithm when creating read views

[Chart: thousands of read requests/sec (0 to 700) for MySQL 5.6, MySQL 5.7, Aurora 2015, Aurora 2016]

* R3.8xlarge instance, <1GB dataset using Sysbench

25% throughput gain

Non-cached read performance improvements

• Smart scheduler: the Aurora scheduler dynamically assigns threads depending on whether the work is I/O heavy or CPU heavy

• Smart selector: Aurora reduces read latency by automatically selecting the best-performing copy among the storage nodes

• Logical read ahead (LRA): reduces read I/O by prefetching pages into memory in B-tree order

[Chart: thousands of requests/sec (0 to 120) for MySQL 5.6, MySQL 5.7, Aurora 2015, Aurora 2016]

* R3.8xlarge instance, 1TB dataset using Sysbench

10% throughput gain

Hot row contention

[Diagram: MySQL lock manager vs. Aurora lock manager, each showing Scan, Delete, and Insert operations on lock chains.]

• Highly contended workloads used a lot of memory and CPU

§ 1.9 (Nov): lock compression (bitmap for hot locks)

§ 1.9: replaced spinlocks with blocking futex; up to 12x reduction in CPU usage and 3x improvement in throughput

§ Dec: use dynamic programming to release locks: from O(totalLocks * waitLocks) to O(totalLocks)

Throughput on Percona TPC-C 100 improved 29x (from 1,452 txns/min to 42,181 txns/min)

Hot row contention

Percona TPC-C – 10GB

                    MySQL 5.6   MySQL 5.7   Aurora   Improvement
500 connections     6,093       25,289      73,955   2.92x
5000 connections    1,671       2,592       42,181   16.3x

Percona TPC-C – 100GB

                    MySQL 5.6   MySQL 5.7   Aurora   Improvement
500 connections     3,231       11,868      70,663   5.95x
5000 connections    5,575       13,005      30,221   2.32x

* Numbers are in tpmC, measured using release 1.10 on an R3.8xlarge, MySQL numbers using RDS and EBS with 30K PIOPS
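The Improvement column in the two Percona TPC-C tables is consistent with dividing the Aurora figure by the MySQL 5.7 figure in the same row. The sketch below recomputes it under that assumption; the dictionary layout and the `improvement` helper are illustrative only.

```python
# tpmC values copied from the two Percona TPC-C tables.
TPMC = {
    ("10GB", 500): {"MySQL 5.6": 6_093, "MySQL 5.7": 25_289, "Aurora": 73_955},
    ("10GB", 5000): {"MySQL 5.6": 1_671, "MySQL 5.7": 2_592, "Aurora": 42_181},
    ("100GB", 500): {"MySQL 5.6": 3_231, "MySQL 5.7": 11_868, "Aurora": 70_663},
    ("100GB", 5000): {"MySQL 5.6": 5_575, "MySQL 5.7": 13_005, "Aurora": 30_221},
}


def improvement(row: dict) -> float:
    """Aurora throughput relative to MySQL 5.7 in the same row (assumed baseline)."""
    return row["Aurora"] / row["MySQL 5.7"]


for (size, connections), row in TPMC.items():
    print(f"{size:>5} {connections:>4} connections: {improvement(row):.2f}x")
```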

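Batch insert performance improvement

§ Accelerates batch inserts sorted by primary key; works by caching the cursor position during index traversal

§ Dynamically turns itself on or off based on the data pattern

§ Avoids contention for latches while descending the tree

§ Bidirectional, and works with all insert statements: LOAD INFILE, INSERT INTO SELECT, INSERT INTO REPLACE, and multi-value inserts

[Diagram: two B-trees, each with a Root, Index pages, and rows R0 to R8.]

MySQL: traverses the B-tree from the root for every insert

Aurora: avoids the index traversal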
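The effect of caching the cursor position can be shown with a toy index in which a sorted list of leaf lower bounds stands in for the B-tree. This is a hypothetical sketch, not Aurora's implementation: `LeafLevel`, `descents`, and the bisect-based "root search" are all assumptions made for illustration.

```python
import bisect


class LeafLevel:
    """Toy index: a sorted list of leaf lower bounds stands in for the B-tree."""

    def __init__(self, lower_bounds):
        self.lower_bounds = sorted(lower_bounds)
        self.leaves = [[] for _ in self.lower_bounds]
        self.cached = None   # leaf index used by the previous insert
        self.descents = 0    # number of searches that started from the "root"

    def _covers(self, leaf, key):
        low = self.lower_bounds[leaf]
        is_last = leaf + 1 == len(self.lower_bounds)
        high = float("inf") if is_last else self.lower_bounds[leaf + 1]
        return low <= key < high

    def insert(self, key):
        # Reuse the cached position when the key still belongs to that leaf;
        # otherwise fall back to a full search from the top.
        if self.cached is None or not self._covers(self.cached, key):
            self.descents += 1
            self.cached = max(bisect.bisect_right(self.lower_bounds, key) - 1, 0)
        bisect.insort(self.leaves[self.cached], key)


index = LeafLevel([0, 100, 200])
for key in range(300):          # keys arrive in primary-key order
    index.insert(key)
print(index.descents)           # one search per leaf rather than one per row
```

With sorted input the toy index searches from the top once per leaf; with interleaved keys the cache misses on almost every insert, which is one reason a feature like this would switch itself off for unsuitable data patterns.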

Faster index build

§ MySQL 5.6 relies on Linux read ahead, which requires consecutive block addresses in the B-tree. It inserts entries into the new tree top down, which causes splits and excessive logging.

§ Aurora's scan prefetches blocks based on their position in the tree, not on block address.

§ Aurora builds the leaf blocks and then the branches of the tree.

• No splits during the build.

• Each page touched only once.

• One log record per page.

2-4X better than MySQL 5.6 or MySQL 5.7

[Chart: index build time in hours (0 to 12) on r3.large with a 10GB dataset, r3.8xlarge with a 10GB dataset, and r3.8xlarge with a 100GB dataset; series: RDS MySQL 5.6, RDS MySQL 5.7, Aurora 2016]
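Building the leaf blocks first and the branches afterwards can be sketched in a few lines. The function below is a simplified illustration that assumes the keys are already sorted and uses plain lists as pages; `build_bottom_up` and `fanout` are hypothetical names, and the real on-disk page format is not described in the deck.

```python
def build_bottom_up(sorted_keys, fanout=4):
    """Build index levels from sorted keys, leaves first.

    Returns a list of levels: levels[0] holds the leaf pages and levels[-1]
    holds the single root page. A branch page stores the smallest key found
    under each of its children.
    """
    if not sorted_keys:
        return []
    # Pack the leaves left to right; each page is filled once and never split.
    level = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [level]
    while len(level) > 1:
        separators = [page[0] for page in level]
        level = [separators[i:i + fanout] for i in range(0, len(separators), fanout)]
        levels.append(level)
    return levels


levels = build_bottom_up(list(range(10)), fanout=3)
for depth, pages in enumerate(levels):
    print(depth, pages)
```

Because every page is written exactly once in this scheme, there are no splits during the build and one log record per page is enough, matching the three bullets above.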

Why spatial indexes are needed

• Need to store and reason about spatial data
• E.g., “Find all people within 1 mile of a hospital”
• Spatial data is multi-dimensional
• B-Tree indexes are one-dimensional

• Aurora supports spatial data types (point/polygon)

• GEOMETRY data types inherited from MySQL 5.6
• This spatial data cannot be indexed

• Two possible approaches:
• Specialized access method for spatial data (e.g., R-Tree)
• Map spatial objects to one-dimensional space & store in B-Tree: space-filling curve using a grid approximation

[Diagram: spatial relationships between geometries A and B]

A COVERS B / B COVEREDBY A

A CONTAINS B / B INSIDE A

A TOUCH B / B TOUCH A

A OVERLAPBDYINTERSECT B / B OVERLAPBDYINTERSECT A

A OVERLAPBDYDISJOINT B / B OVERLAPBDYDISJOINT A

A EQUAL B / B EQUAL A

A DISJOINT B / B DISJOINT A

A COVERS B / B ON A

Spatial indexes in Aurora

R-Tree used in MySQL 5.7

Challenges with R-Trees
• Efficient only when well balanced
• Rectangles should not overlap or cover empty space
• Degrade over time
• Re-indexing is expensive

Z-index used in Aurora

Z-index (dimensionally ordered space filling curve)
• Uses a regular B-Tree for storage and indexing
• Removes sensitivity to resolution parameter
• Adapts to granularity of actual data without user declaration
• E.g., GeoWave (National Geospatial-Intelligence Agency)
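A Z-order (Morton) code is the textbook way to map two-dimensional grid coordinates onto a one-dimensional key that a B-Tree can index. The sketch below shows plain bit interleaving; the deck does not describe Aurora's actual encoding, so the `z_order` function, its 16-bit default, and the bit layout are assumptions for illustration.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer.

    Bits of x land in the even positions and bits of y in the odd positions,
    so nearby grid cells tend to receive nearby keys.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Sorting grid cells by their Z-order key gives the order in which
# a one-dimensional index would store them.
cells = [(1, 1), (0, 1), (1, 0), (0, 0)]
print(sorted(cells, key=lambda cell: z_order(*cell)))
```

Range queries over a rectangle then become a small number of key-range scans on an ordinary B-Tree, which is why no specialized access method is required.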

Spatial index benchmark: Sysbench, points and polygons

[Chart: Select-only (reads/sec) and Write-only (writes/sec), scale 0 to 140,000; series: Aurora, MySQL 5.7]

* r3.8xlarge using Sysbench on <1GB dataset
* Write Only: 4000 clients, Select Only: 2000 clients, ST_EQUALS

Aurora for PostgreSQL

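PostgreSQL overview

§ Open source database
§ In active development for 20 years
§ Owned by a foundation, not a company
§ Innovation-friendly open source license
§ Outstanding performance
§ Object-oriented and ANSI-SQL:2008 compatible
§ The strongest geospatial features of any open source database
§ Supports stored procedures in 12 languages (Java, Perl, Python, Ruby, Tcl, C/C++, the Oracle-like PL/pgSQL, etc.)
§ The most Oracle-compatible open-source database
§ Highest automatic conversion rate when using the AWS Schema Conversion Tool to convert from Oracle to PostgreSQL

[Logo: Open Source Initiative]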

Customer migration scenarios to Amazon Aurora

From Amazon EC2 or on-premises

From Amazon RDS for PostgreSQL

From Oracle and SQL Server

New deployments

PostgreSQL benchmark system configuration

[Diagram, PostgreSQL: c4.8xlarge client driver and m4.16xlarge database instance in AZ 1, with an ext4 filesystem on three EBS volumes, 45,000 total IOPS.]

[Diagram, Amazon Aurora: c4.8xlarge client driver and m4.16xlarge database instance in AZ 1, with six Storage Nodes spread across AZ 1, AZ 2, and AZ 3, and Amazon S3.]

m4.16xlarge (64 VCPU, 256GiB), c4.8xlarge (36 VCPU, 60GiB)

Amazon Aurora is >=2x faster (PgBench)

pgbench “tpcb-like” workload, scale 2000 (30GiB). All configurations run for 60 minutes

Amazon Aurora is 2x-3x faster (SysBench)

• Amazon Aurora delivers 2x the absolute peak of PostgreSQL and 3x PostgreSQL performance at high client counts

SysBench oltp(write-only) workload with 30 GB database with 250 tables and 400,000 initial rows per table

Amazon Aurora: Over 120,000 Writes/Sec

OLTP test statistics:
    queries performed:
        read:                      0
        write:                     432772903
        other (begin + commit):    216366749
        total:                     649139652
    transactions:                  108163671 (30044.73 per sec.)
    read/write requests:           432772903 (120211.75 per sec.)
    other operations:              216366749 (60100.40 per sec.)
    ignored errors:                39407 (10.95 per sec.)
    reconnects:                    0 (0.00 per sec.)

sysbench write-only 10GB workload with 250 tables and 25,000 initial rows per table. 10-minute warmup, 3,076 clients.

Ignored errors are key constraint errors, designed into sysbench.

Sustained sysbench throughput over 120K writes/sec

Amazon Aurora loads data 3x faster

• Database initialization is 3x faster than PostgreSQL in the standard PgBench benchmark

Command: pgbench -i -s 2000 -F 90

Amazon Aurora gives >2x faster response times

• Response times are >2x faster under very heavy write load
• (and >10x more consistent)

SysBench oltp(write-only) 23GiB workload with 250 tables and 300,000 initial rows per table. 10-minute warmup.

Amazon Aurora has more consistent throughput

• Under load, performance is more than 3x more consistent than PostgreSQL

PgBench “tpcb-like” workload at scale 2000. Amazon Aurora was run with 1280 clients. PostgreSQL was run with 512 clients (the concurrency at which it delivered the best overall throughput)

Amazon Aurora is 3x Faster at Large Scale

• The advantage grows from 1.5x to 3x as the database grows from 10 GiB to 100 GiB

SysBench oltp(write-only) – 10GiB with 250 tables & 150,000 rows and 100GiB with 250 tables & 1,500,000 rows

SysBench write-only (writes/sec)

SysBench Test Size   PostgreSQL   Amazon Aurora
10GB                 75,666       112,390
100GB                27,491       82,714

Amazon Aurora recovers up to 85x faster

SysBench oltp(write-only) 10GiB workload with 250 tables & 150,000 rows

[Chart: Crash Recovery Time - SysBench 10GB Write Workload. Series: PostgreSQL with 12.5GB, 8.3GB, and 2.1GB checkpoints, and Amazon Aurora with no checkpoints. Axes: Recovery Time in Seconds (0 to 140) and Writes Per Second (0 to 80,000). Labeled values: Writes per Second 69,620; Recovery Time (seconds) 102.0.]

Transaction-aware storage system recovers almost instantly

Amazon Aurora vs. PostgreSQL

Performance comparison results

Measurement           Result
PgBench               >= 2x faster
SysBench              2x-3x faster
Data Loading          3x faster
Response Time         >2x faster
Throughput Jitter     >3x more consistent
Throughput at Scale   3x faster
Recovery Speed        Up to 85x faster

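Performance best practices

§ The usual MySQL/RDBMS performance tuning practices still apply

§ Use as much concurrency as possible
✓ Aurora throughput increases with the number of connections

§ Make active use of read scaling
✓ Read replica lag is extremely low; spreading reads across replicas improves overall performance

§ Parameter tuning
✓ No need to carry existing MySQL parameters over to Aurora → the default Aurora parameters are already well tuned

§ Performance comparison
✓ Don't put too much weight on individual metrics (CPU, IOPS, I/O throughput)
✓ Focus on application-level performance

§ Other
✓ Turn the query cache ON
✓ Refer to CloudWatch metrics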
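One common way to act on the read scaling advice is to send read-only statements to the cluster's reader endpoint and everything else to the writer endpoint. The helper below is a minimal sketch of that routing decision; the hostnames are made-up placeholders, and `choose_endpoint` with its simple prefix check is a hypothetical illustration rather than a complete SQL classifier.

```python
# Hypothetical endpoint names; substitute the endpoints of your own cluster.
WRITER_ENDPOINT = "mycluster.cluster-example.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-example.us-east-1.rds.amazonaws.com"

READ_ONLY_PREFIXES = ("select", "show")


def choose_endpoint(sql: str, in_transaction: bool = False) -> str:
    """Pick the endpoint for one statement.

    Reads go to the reader endpoint unless they are part of an open
    transaction or take row locks; everything else goes to the writer.
    """
    statement = sql.lstrip().lower()
    is_read = statement.startswith(READ_ONLY_PREFIXES)
    takes_locks = "for update" in statement
    if is_read and not takes_locks and not in_transaction:
        return READER_ENDPOINT
    return WRITER_ENDPOINT


print(choose_endpoint("SELECT name FROM users WHERE id = 1"))
print(choose_endpoint("UPDATE users SET name = 'a' WHERE id = 1"))
```

An application would keep one connection pool per endpoint and use a function like this to pick the pool for each statement.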

Advanced monitoring

50+ system/OS metrics | sorted process list view | 1–60 sec. granularity | alarms on specific metrics | egress to CloudWatch Logs | integration with third-party tools

Thank you
