partition switch based data loads
Copyright © 2015 Tata Consultancy Services Limited
Microsoft APS based EDW
Sustaining Strategic Growth
Implementing partitioning
Presented by: Leo Khaskin, Solution Architect
Agenda
Use Case
Best Practices
Future State Architecture
Live Demo
Partitioning based process template
Partition Switch Mechanics
Compare Existing vs Test Environment
Prototype Design
Performance Statistics
Considerations
Benefits
Scalability
Process Control
Maintainability
Flexibility
Next Step - Implementation
Use Case
When an EDW on the APS platform matures, with hundreds of data flows
pumping data into thousands of tables, production teams often observe
slower query performance and queuing of SQL queries, which leads to
significant delays in data delivery.
If updates to a fact table are not limited to any point in time, the recommended
method is CTAS, which creates a new table implementing the relevant business rules,
drops the existing table and renames the temp table to the original name.
With a significant number of records (1B+) and complex rules, the query becomes
heavy and might take significant time, consuming much of the appliance's
resources and thus blocking other queries from execution.
Also, an SSAS model sourced from the fact table will require a Full Process, which
consumes significant time.
When CTAS execution time comes close to the SLA, it is the right time to evaluate
the Partition Switch option.
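The CTAS rebuild described above can be sketched as a small Python helper that only builds the T-SQL statements in the order the slide gives them (create, drop, rename). This is an illustrative sketch, not the presenter's actual process: the function name, the `_ctas_tmp` suffix, the table names and the hash-distribution column are all assumptions.

```python
def ctas_rebuild_statements(table, select_sql, distribution_column):
    """Build the T-SQL batch for a CTAS-based full rebuild of `table`.

    `table` may be schema-qualified (e.g. "dbo.FactSales").
    All names here are hypothetical and chosen for illustration only.
    """
    temp_table = f"{table}_ctas_tmp"          # assumed naming convention
    bare_name = table.split(".")[-1]          # RENAME OBJECT takes an unqualified new name
    return [
        # 1. Create a new table that applies the business rules in the SELECT.
        f"CREATE TABLE {temp_table} "
        f"WITH (DISTRIBUTION = HASH({distribution_column})) "
        f"AS {select_sql};",
        # 2. Drop the existing table.
        f"DROP TABLE {table};",
        # 3. Rename the temp table to the original name.
        f"RENAME OBJECT {temp_table} TO {bare_name};",
    ]


if __name__ == "__main__":
    for statement in ctas_rebuild_statements(
        "dbo.FactSales", "SELECT * FROM dbo.StageSales", "CustomerKey"
    ):
        print(statement)
```

Every run rewrites the whole table, which is why run time grows with table size and why the full SSAS process follows.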
PDW Best Practices – Sustaining Strategic Growth
• Data preparation – NOT in PDW
• Optimize Query
• Utilize CSI
• Monitor PDW Resources
• Partition Switch
• Separated Processes:
  • Load
  • Refresh
  • Process SSAS
[Diagram labels: Process; Policy; Tool; PDW Optimal Performance]
Future State Architecture – Sustaining Strategic Growth
[Diagram labels: Source File in NAS; NON AU Stage; DQA; Data Type Validation; Constraints Check; Surrogate Key Generator; Distribution Key Generator; De-Duplication; System of Records; Prepared Data; PDW; Computations; Mart; Stage; Fact; DWL; PS; SSAS; SSRS; Ad Hoc; TAB; Data Consumers]
Data Flow
1 Source System
2 Batch extract
3 SQL Server SMP – Data Preparation
4 Prepared data increment
5 SSIS package
  a DWLoader
  b Partition Switch
  c SSAS Processor
6 PDW
7 Data Consumers
Partition Switch Mechanics
Load data into PDW: FFLoader
Parallel Partitions Processing
Process SSAS model: SSAS Processor
Compare Existing vs Test Environment
*Only 2 partitions were executed in parallel due to memory constraints.
SSIS is running on a 4-core machine; at most 6 partitions can be processed simultaneously.
The degree of parallelism is defined by the SSIS server's number of cores, configuration
settings and available memory.
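The note above can be illustrated with a bounded worker pool. This is a Python stand-in for what the SSIS package does, not SSIS itself: the function names are hypothetical, and `process_one` is a placeholder for whatever processes a single partition. The effective parallelism is taken as the smaller of the configured ceiling (6 in the note) and what memory allows (2 in the test run).

```python
from concurrent.futures import ThreadPoolExecutor


def effective_parallelism(max_by_configuration, max_by_memory):
    """Return how many partitions may run at once (never fewer than 1)."""
    return max(1, min(max_by_configuration, max_by_memory))


def process_partitions(partitions, process_one, workers):
    """Process partitions with at most `workers` running concurrently.

    Results come back in the same order as `partitions`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_one, partitions))


if __name__ == "__main__":
    workers = effective_parallelism(max_by_configuration=6, max_by_memory=2)

    def process_one(partition_number):
        # Placeholder for the real per-partition work.
        return f"partition {partition_number} processed"

    for line in process_partitions([1, 2, 3, 4], process_one, workers):
        print(line)
```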
Prototype Design
Metadata operation
Dataset operation
Performance Statistics – No pressure on PDW resources
Execution Notes:
The table shows the average run time per partition under parallel execution.
The degree of parallelism is defined by SSIS server settings.
Highlighted executions were performed on the same table with a Column Store Index (CSI) applied.
Averaged memory consumption
CPU utilization
Considerations / Decisions
Partition grain: larger partitions – lower partition count
System of records: maintain a copy – create a new copy every run
Table availability: table copy – single partition (on the fly: switch out / in)
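The "switch out / in" option can be sketched as two `ALTER TABLE ... SWITCH PARTITION` statements, built here by a hypothetical Python helper. The table names and partition number are assumptions. The sketch also assumes that all three tables share the same definition, distribution and partition boundaries, and that the receiving partition is empty before each switch.

```python
def switch_out_in_statements(fact_table, refreshed_table, holding_table, partition_number):
    """Build the T-SQL to replace one partition of `fact_table` on the fly.

    All table names are hypothetical. Both statements are metadata
    operations: no rows are copied between tables.
    """
    return [
        # Switch out: move the current partition into the empty holding table.
        f"ALTER TABLE {fact_table} SWITCH PARTITION {partition_number} "
        f"TO {holding_table} PARTITION {partition_number};",
        # Switch in: move the refreshed partition into the now-empty slot.
        f"ALTER TABLE {refreshed_table} SWITCH PARTITION {partition_number} "
        f"TO {fact_table} PARTITION {partition_number};",
    ]


if __name__ == "__main__":
    for statement in switch_out_in_statements(
        "dbo.FactSales", "dbo.FactSales_Refresh", "dbo.FactSales_Out", 5
    ):
        print(statement)
```

Only the partition being replaced is touched, so the rest of the fact table stays available throughout.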
Benefits
• Significantly shorter load time
• Possibility to process SSAS model incrementally
• Ability to use CSI
  • Data Compression – smaller footprint on disk
  • Batch execution mode enabled
  • Improved execution plans
  • Faster query performance
• Scalability to TB sizes
• Better process control
• Increased Maintainability
• Modular design – Reusable Components
• Data Recovery, Archiving, System of Record
Next Step - Implementation
Environment
Data
Contact us for evaluation:
Leo Khaskin, [email protected]
Huzeifa Nasir, [email protected]