db tech showcase 2014 A14: The True BI Performance That Actian Vector Delivers

Posted on 11-Aug-2014

TRANSCRIPT

  • Copyright 2014 Insight Technology, Inc. All Rights Reserved. BI - BI -
  • BI, Actian Vector, Hadoop
  • Who is Actian? Timeline: 1970s, Ingres; 2010, Vectorwise (Actian Vector); 2011, Actian; 2012, Versant; 2013, Pervasive & ParAccel; 2014, Actian Analytics Platform
  • What is Actian Vector? [Diagram: Actian Analytics Platform (Analyze, Act, Connect). Data sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, SaaS, WWW, Machine Data, Mobile, NoSQL, Traditional. Actian Analytics Accelerators: Accelerate Hadoop, Accelerate Analytics, Accelerate BI. Engines: Vector, Matrix, DataFlow. Outcomes: World-Class Risk Management, Competitive Advantage, Customer Delight, Disruptive New Business Models.]
  • Actian Vector 3.5; Hadoop: Vector in Hadoop (Vortex), Hadoop Summit. [Timeline: 2002, 2008, 2010, 2011, 2014. Items: X100; RDBMS (MonetDB/SQL); TPC-H; MonetDB; X100 features: Vector Processing, Smarter Compression, CPU cache optimization; Vectorwise: X100 engine, Ingres Database.] http://homepages.cwi.nl/~boncz/
  • 3rd DataFlow BI Replication ETL
  • CPU and I/O [slide text not recoverable; surviving terms: CPU, CPU 100%, I/O, SSD, HDD, DB I/O]
  • [slide text not recoverable; surviving term: CPU]
  • (Oracle) Jonathan Lewis, Oracle Scratchpad, "12c In-memory": "I wrote a note about the 12c In-Memory option some time ago on the OTN Database forum and thought I'd posted a link to it from the blog. If I have I can't find it now so, to avoid losing it, here's a copy of the comments I made: Juan Loaiza's presentation is probably available on the Oracle site by now, but in outline: the in-memory component duplicates data (specified tables, perhaps with a restriction to a subset of columns) in columnar format in a dedicated area of the SGA. The data is kept up to date in real time, but Oracle doesn't use undo or redo to maintain this copy of the data because it's never persisted to disc in this form, it's recreated in-memory (by a background process) if the instance restarts. The optimizer can then decide whether it would be faster to use a columnar or row-based approach to address a query. The intent is to help systems which are mixed OLTP and DSS, which sometimes have many extra indexes to optimise DSS queries that affect the performance of the OLTP updates. With the in-memory columnar copy you should be able to drop many DSS indexes, thus improving OLTP response times; in effect the in-memory stuff behaves a bit like non-persistent bitmap indexing. Updated 18th Oct: I've been reminded that I think the presentation also included some comments about the way that the code also takes advantage of vector (SIMD) instructions at the CPU level to allow the code to evaluate predicates on multiple rows (extracted from the column store, not the row store) simultaneously, and this contributes to the very high rates of data scanning that Oracle Corp. claims. The presentation from Juan Loaiza was still unavailable at the time of publishing this blog note (3rd Nov 2013). If it does become available as part of the Open World set of presentations it should be at this URL." http://jonathanlewis.wordpress.com/2013/11/06/12c-in-memory/
  • (SQL Server) SQL Server Column Store Index: http://msdn.microsoft.com/ja-jp/library/gg492088.aspx SQL Server, CPU
  • (DB2) IBM DB2 10.5 with BLU Acceleration: http://public.dhe.ibm.com/common/ssi/ecm/en/imd14435usen/IMD14435USEN.PDF
  • (HANA) "The SAP HANA Database - An Architecture Overview", section 4, Analytical Query Processing: "As generally agreed, column-stores are well suited for analytical queries on massive amounts of data [1]. For high read performance the SAP HANA DB's column-store uses efficient compression schemes in combination with cache-aware and parallel algorithms. Every column is compressed with the help of a sorted dictionary, i.e., each value is mapped to an integer value (the valueID). These valueIDs are further bit-packed and compressed. By resorting the rows in a table, the most beneficial compression (e.g., run-length encoding (RLE), sparse coding, or cluster coding) for the columns of this table can be used [11, 12]. Compressing data does not only allow to keep more data on a single node, but it also allows for faster query processing, e.g., by exploiting the RLE to compute aggregates. Scans are accelerated by excessively using SIMD algorithms working directly on the compressed data [16]." http://sites.computer.org/debull/A12mar/hana.pdf
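The compression steps named in the HANA excerpt (sorted dictionary, valueIDs, run-length encoding, aggregates computed from the runs) can be sketched in a few lines of Python. This is only an illustration of the general technique, not SAP HANA's implementation: the function names and the sample column are hypothetical, and bit-packing and SIMD scanning are left out.

```python
from itertools import groupby


def dictionary_encode(column):
    """Map each value to an integer valueID via a sorted dictionary."""
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_id[value] for value in column]


def run_length_encode(value_ids):
    """Collapse consecutive equal valueIDs into (valueID, run length) pairs."""
    return [(vid, sum(1 for _ in run)) for vid, run in groupby(value_ids)]


def sum_from_runs(dictionary, runs):
    """Compute SUM(column) from the runs without decompressing row by row."""
    return sum(dictionary[vid] * length for vid, length in runs)


# Hypothetical column; sorting it stands in for "resorting the rows",
# which makes the runs long and the RLE effective.
column = sorted([5, 3, 5, 3, 3, 7, 5, 3])
dictionary, value_ids = dictionary_encode(column)
runs = run_length_encode(value_ids)
total = sum_from_runs(dictionary, runs)
```

The aggregate touches one entry per run rather than one per row, which is the sense in which compression can speed up query processing as well as save space.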
  • SIMD (Single Instruction, Multiple Data): SSE (Streaming SIMD Extensions) on Pentium; Intel AVX (Advanced Vector eXtensions) on Sandy Bridge. [Diagram: Instruction, Data, Output]
  • Row Database A, Row Database B, Column Database A, Column Database B, In-Memory Database A
  • OS: RHEL 6.5 x86_64
    CPU: 16 cores (Xeon E5-2680 2.7GHz, 8 cores * 2)
    Memory: 512 GB
    Disk: 1.3 TB (120GB * 22 (RAID10))
    Benchmark Data: TPC-H@100GB
    Benchmark Query:
      select sum(l_extendedprice * l_discount) as revenue
      from lineitem
      where l_shipdate >= date '1996-01-01'
        and l_shipdate < date '1996-01-01' + interval '1' year
        and l_discount between 0.02 - 0.01 and 0.02 + 0.01
        and l_quantity < 24
    80GB6
  • Rt = Instructions / (IPC * Hz * Parallelism)
    * Rt: response time
    * IPC (Instructions Per Cycle): number of instructions the CPU executes per clock cycle
    * Hz: CPU clock frequency
    * Parallelism: degree of parallel execution
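As a rough worked example of the formula above, the following Python sketch evaluates Rt for one set of inputs. The 2.7 GHz clock and 16 cores match the test machine described in the deck; the instruction count and IPC are illustrative assumptions, and the function name is hypothetical. It treats the formula as an idealized model (perfect scaling across cores, no I/O wait), so it gives a lower bound rather than a prediction.

```python
def response_time(instructions, ipc, hz, parallelism):
    """Estimate response time in seconds.

    Rt = Instructions / (IPC * Hz * Parallelism)
    """
    if instructions < 0 or ipc <= 0 or hz <= 0 or parallelism <= 0:
        raise ValueError("instructions must be >= 0; ipc, hz, parallelism must be > 0")
    return instructions / (ipc * hz * parallelism)


# Illustrative inputs (assumptions, not measurements from the slides):
# 2.7e10 instructions at IPC 2.0 on 16 cores clocked at 2.7 GHz.
baseline = response_time(2.7e10, ipc=2.0, hz=2.7e9, parallelism=16)

# Executing 100x more instructions with everything else equal
# lengthens the response time by the same factor.
heavier = response_time(2.7e12, ipc=2.0, hz=2.7e9, parallelism=16)

print(f"baseline: {baseline:.4f} s, heavier: {heavier:.2f} s")
```

Because clock frequency and core count are fixed by the hardware and IPC varies within a narrow band, the instruction count is the term with the most room to move, which is why the model puts it in the numerator.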
  • [Chart: CPU Instructions. Rt = Instructions / (IPC * Hz * Parallelism). Systems shown: Columnar DB A, Columnar DB B, In Memory DB A, Row Store DB A, Row Store DB B, Vector. Data labels, first series: 2.7E+10, 2.4E+11, 2.0E+11, 7.8E+11, 1.9E+12, 1.9E+12; second series: 2.8E+10, 3.8E+11, 4.8E+11, 8.3E+11, 2.8E+12, 1.9E+12; relative values: 1, 9, 7, 29, 102, 68.]
  • [Chart: CPU Branch-Misses. Rt = Instructions / (IPC * Hz * Parallelism). Systems shown: Columnar DB A, Columnar DB B, In Memory DB A, Row Store DB A, Row Store DB B, Vector. Data labels, first series: 1.8E+07, 1.1E+09, 3.0E+08, 1.1E+09, 1.6E+09, 7.7E+08; second series: 2.1E+07, 1.4E+09, 1.2E+09, 1.1E+09, 1.7E+09, 7.7E+08; relative values: 1, 64, 17, 62, 88, 43.]
  • [Chart: IPC (Instructions Per Cycle). Rt = Instructions / (IPC * Hz * Parallelism). Systems shown: Columnar DB A, Columnar DB B, In Memory DB A, Row Store DB A, Row Store DB B, Vector. Data labels, first series: 2.19, 1.70, 2.05, 1.94, 1.83, 2.08; second series: 1.74, 1.58, 1.40, 1.85, 1.58, 2.08.]
  • [Chart: Query Elapsed Time. Rt = Instructions / (IPC * Hz * Parallelism). Systems shown: Columnar DB A, Columnar DB B, In Memory DB A, Row Store DB A, Row Store DB B, Vector. Data labels: 0.48, 3.44, 35.58, 209.45, 467.36, 332.56; relative values: 1, 7, 74, 434, 968, 689. Note fragment: Parallelism, Edition.]
  • BIDB BI
  • ETL [Diagram: RDBMS, Legacy ETL, Vector; S3 / (S)FTP(S)]
  • ETL [Diagram: RDBMS, DataFlow Engine, Vector] DataFlow Engine: No Map Reduce; Vector, Hadoop; S3 / (S)FTP(S)
  • ETL() DataFlow En