clr metrics aging

Upload: jasonzzzz

Post on 31-May-2018


  • 8/15/2019 CLR Metrics Aging

    1/19

    Consolidated Logging Repository (CLR) Metrics Aging

    DESIGN

    Repository Team

    10 April 2007


    © 2008 Sabre Inc. All rights reserved.

    This documentation is the confidential and proprietary intellectual property of Sabre Inc. Any unauthorized use, reproduction, preparation of derivative works, performance, or display of this document, or software represented by this document, without the express written permission of Sabre Inc. is strictly prohibited.

    SabrePat Furrow and the Sabre logo design are trademarks and/or service marks of an affiliate of Sabre Inc. All other trademarks, service marks, and trade names are owned by their respective companies.


    Table of Contents

    Metrics Aging

    Design

    History

    Overview

    Problem Statement

    Objectives

    Assumptions

    CLR Environments and Retention Requirements

    Database Design

    MySQL Accounts and Connection Parameters

    Level 0 Detail Logging (Staging Area)

    Level 1 Weekly Summary

    Level 2 Monthly Summary

    Level 3 Annual Summary

    Logging Subscriber

    Requirements

    Monitoring

    Failure Recovery

    Aging Process

    Backlog for Future Releases

    Test Plan

    Security

    Appendix

    Issues from Reviews

    Consolidated Logging Repository (CLR) Metrics Aging February, 2008 Confidential and Proprietary Sabre Inc. iii


    History

    The Consolidated Logging Repository (CLR) was created to provide a central store of operational metrics used to manage the many computing systems at Sabre.

    Operational metrics are captured from the Sabre services instrumented with the Integrated Computing Environment (ICE) logging API. There are five types of logging that a service can generate: metrics, application, security, billing and audit.

    Metrics logging is designed to measure the behavior of a service and was the first logging type to be organized into a relational database in the CLR.

    Problem Statement

    The initial MySQL schema used to store metrics logging closely matched the actual format of the logged data. Growth in the volume of metrics logging has now exceeded the capacity of the schema (report and ad-hoc query performance is significantly degraded) and exhausted the available storage capacity.

    Objectives

    The goal of the metrics aging solution is to provide summarized levels of metrics to:

    meet operational management requirements

    optimize the performance of reporting and ad-hoc analysis, and

    minimize the repository costs.

    Assumptions

    The metrics aging solution will be implemented in 2Q07.

    The metrics aging solution will use a current release of MySQL 5.0.

    CLR Environments and Retention Requirements


    Database Design


    Level 0 Detail Logging (Staging Area)


    Level 1 Weekly Summary


    Level 2 Monthly Summary


    Level 3 Annual Summary


    Logging Subscriber

    The ICE logging database subscriber will need to be modified to insert the metrics logs it receives into the new Level 0 tables. Major changes in this schema include:

    CLIENT_IND is part of the unique key to prevent duplicate inserts.

    All calculated statistics (AVG, STD_DEV) have been removed. These will be calculated in the BusinessObjects universe.

    The bucket table has been separated into two distribution tables.

    The only foreign key relationships are on CLR_IND and METRIC_ID, which are enforced by MySQL.

    The size of the SUM_OF_SQ columns has been increased to BIGINT UNSIGNED.

    All metrics (counts, times and sizes) are unsigned and will not accept negative values.
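    Dropping AVG and STD_DEV works because every remaining statistic is additive: a count, a sum and a sum of squares can simply be added when daily rows are rolled up into weekly, monthly and annual summaries, and the mean and standard deviation can be derived from them at report time. The Python sketch below illustrates that arithmetic. The `Summary` class and its methods are hypothetical and are not the subscriber's or the BusinessObjects universe's actual code; the class also mirrors the unsigned rule and the BIGINT UNSIGNED limit listed above.

```python
import math
from dataclasses import dataclass

# Largest value a MySQL BIGINT UNSIGNED column can hold (2**64 - 1).
BIGINT_UNSIGNED_MAX = 2**64 - 1


@dataclass(frozen=True)
class Summary:
    """Hypothetical additive summary of one metric over one interval."""
    count: int
    total: int       # sum of the observed values (e.g. EXIST_TIME)
    sum_of_sq: int   # sum of the squares of the observed values

    def __post_init__(self):
        # Mirrors the schema rule that all metrics are unsigned.
        for name in ("count", "total", "sum_of_sq"):
            value = getattr(self, name)
            if value < 0:
                raise ValueError(f"{name} must be unsigned, got {value}")
        # Mirrors the BIGINT UNSIGNED size of the SUM_OF_SQ columns.
        if self.sum_of_sq > BIGINT_UNSIGNED_MAX:
            raise OverflowError("sum_of_sq exceeds BIGINT UNSIGNED")

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(len(values), sum(values), sum(v * v for v in values))

    def merge(self, other):
        # Counts, sums and sums of squares simply add, so daily rows can be
        # rolled up into weekly, monthly and annual rows without the detail.
        return Summary(self.count + other.count,
                       self.total + other.total,
                       self.sum_of_sq + other.sum_of_sq)

    def mean(self):
        return self.total / self.count

    def std_dev(self):
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        mean = self.mean()
        variance = self.sum_of_sq / self.count - mean * mean
        return math.sqrt(max(variance, 0.0))
```

    For example, merging `Summary.from_values([2, 4, 4, 4])` with `Summary.from_values([5, 5, 7, 9])` gives a count of 8, a sum of 40 and a sum of squares of 232, from which the mean is 5.0 and the population standard deviation is 2.0. This design does not say whether the universe reports the population or the sample standard deviation; the sample form would divide by `count - 1` instead.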

    The following changes in the subscriber logic will be required to accommodate the new aging schema and processes:

    1. The staging level of the metrics schema will have the following five tables. Each day will have its own unique set of tables, with the date specified in each table's name as follows:

    a. METRIC_L0_yyyymmdd

    b. METRIC_L0_COUNT_yyyymmdd

    c. METRIC_L0_ITEM_yyyymmdd

    d. METRIC_L0_ITEM_DISTRIB_yyyymmdd

    e. METRIC_L0_DISTRIB_yyyymmdd: This table's structure is identical to METRIC_L0_ITEM_DISTRIB_yyyymmdd, but it will only contain the optional distribution detail for the metrics stored in the main METRIC_L0 table. The metric to which the distribution relates will be indicated by specifying one of the following values for ITEM_NAME:

    1) EXIST_TIME

    2) IN_MSG_SIZE

    3) OUT_MSG_SIZE

    Metrics logs should be inserted into the day table matching the GENERATED_TIME of the log.

    2. If any of the proper L0 tables is not available for inserts (the table for that date does not exist, the table is locked, MySQL is down, etc.), the whole metrics log should be written to the secondary (recovery) log file and no part of the log should be inserted into any L0 table. An application error should be generated each time an insert fails, and this in turn should trigger an alert to operations staff every 5 minutes for as long as the errors continue, notifying them that metrics inserts are failing.

    Note: We would expect to get errors/alerts for any erroneous future-date logging that isn't corrected prior to this implementation.
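    The two subscriber rules above (route each log to the day tables matching its GENERATED_TIME, and write the whole log to the recovery file if any part of it cannot be inserted) can be sketched as a single transaction. The Python below uses the standard-library `sqlite3` module as a stand-in for the MySQL connection; the log layout, the `day_table` and `insert_metrics_log` helpers and the `clr.subscriber` logger name are all hypothetical. It assumes GENERATED_TIME arrives as a timezone-aware datetime and normalizes it to UTC, in line with the review note that the ICE API passes times in GMT.

```python
import json
import logging
import sqlite3
from datetime import datetime, timezone

# Hypothetical logger; an ERROR record here stands for the "application
# error" that monitoring turns into operations alerts.
logger = logging.getLogger("clr.subscriber")


def day_table(base: str, generated_time: datetime) -> str:
    """Day table for a log, e.g. METRIC_L0 -> METRIC_L0_20070410.

    `generated_time` must be timezone-aware; it is normalized to UTC.
    """
    return f"{base}_{generated_time.astimezone(timezone.utc):%Y%m%d}"


def insert_metrics_log(conn, log: dict, recovery_file) -> bool:
    """Insert every part of one metrics log into its L0 day tables, or none.

    Hypothetical log layout:
        {"generated_time": datetime, "rows": {base_table: [tuple, ...]}}
    Returns True if committed, False if the log went to the recovery file.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on error
            for base, rows in log["rows"].items():
                table = day_table(base, log["generated_time"])
                for row in rows:
                    marks = ", ".join("?" * len(row))
                    conn.execute(f'INSERT INTO "{table}" VALUES ({marks})', row)
        return True
    except sqlite3.Error as exc:
        # Missing day table, locked table, ...: the transaction was rolled
        # back, so save the whole log for the logging recovery process.
        logger.error("Metrics insert failed, log saved for recovery: %s", exc)
        record = {"generated_time": log["generated_time"].isoformat(),
                  "rows": log["rows"]}
        recovery_file.write(json.dumps(record) + "\n")
        return False
```

    Keeping the insert and the fallback in one function makes the all-or-nothing rule explicit: each row is either committed together with the rest of its log or saved to the recovery file with it, never split between the two.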

    Monitoring

    1. Operations staff should be alerted whenever the subscriber processes are not running.

    2. Errors generated by the subscriber should be monitored and should in turn generate an application error alert every 5 minutes to notify Fabric operations coverage staff (or MRC for test environments) when inserts of logging into the MySQL L0 tables are failing.

    Note: Erroneous future-date logging will also generate these alerts.

    3. Logging patterns and aging summarization should be validated daily, and email alerts should be sent to the repository team if anomalies are detected. This validation can likely be implemented using BusinessObjects published reports. Types of anomalies to be detected include:

    a. New logs/missing logs for a node or method

    b. Significant changes in volume of logging

    c. Missing logging intervals

    d. Mismatch between the two database images
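    Monitoring item 2 asks for an alert every 5 minutes while insert errors continue, rather than one alert per failed insert. A minimal Python sketch of that throttling, assuming a hypothetical `AlertThrottle` helper that the subscriber or the monitoring layer calls on each failed and each successful insert:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertThrottle:
    """Hypothetical helper: at most one alert per `interval` while insert
    errors keep occurring."""
    interval: timedelta = timedelta(minutes=5)
    last_alert: Optional[datetime] = None
    suppressed: int = 0  # errors seen since the last alert was raised

    def on_error(self, now: datetime) -> Optional[str]:
        """Record one failed insert; return alert text, or None if throttled."""
        if self.last_alert is None or now - self.last_alert >= self.interval:
            message = ("Metrics inserts into the MySQL L0 tables are failing "
                       f"({self.suppressed + 1} error(s) since the last alert)")
            self.last_alert = now
            self.suppressed = 0
            return message
        self.suppressed += 1
        return None

    def on_success(self) -> None:
        """Inserts are working again; the next failure alerts immediately."""
        self.last_alert = None
        self.suppressed = 0
```

    Resetting on success is a choice made in this sketch, not something the design specifies; an alternative is to keep the 5-minute window across brief recoveries so that intermittent failures do not produce a burst of alerts.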

    Failure Recovery

    The CLR maintains two parallel database images at all times in order to prevent loss of logging data when planned or unplanned failures occur. Failures can occur at several levels:

    Subscriber process

    Aging process

    MySQL database

    CLR server


    Within 48 hours of recovering from such a failure, each database image should be restored to a full replica of the logging data by using one of the following processes:

    1. Logging recovery process: The subscriber will read the recovery log it generated during the failure and insert the saved logs into the L0 tables on the same server where the failure occurred.

    2. Table copy process: Whole tables can be restored by dropping the incomplete table on the database that suffered the failure and copying the complete table from the opposite server's database.

    3. Reconciliation process: If neither database image has a complete table of logging for a given day, week, month or year, a process that reads a specified time range of data from one server's database and inserts it into the other will be used to generate a complete logging table on one server. The table copy process can then be used to restore the other database's table.

    4. Backup/Restore process: The database on each server (including InnoDB and MyISAM tables) must be backed up daily. If multiple tables in a database are corrupted, a previous day's backup from either server or a current backup from the other server can be restored.
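    The reconciliation process copies a time range of rows from one server's database into the other. A minimal Python sketch, using `sqlite3` in place of MySQL and assuming that each L0 table has a unique key, so rows already present on the target are skipped (SQLite's `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`). The `reconcile` function and the `generated_time` column used for the range filter are illustrative assumptions:

```python
import sqlite3


def reconcile(source: sqlite3.Connection, target: sqlite3.Connection,
              table: str, start: str, end: str) -> int:
    """Copy rows with start <= generated_time < end from source to target.

    Assumes both servers have identical table definitions and a unique key;
    rows whose key already exists on the target are skipped.
    Returns the number of rows actually added to the target.
    """
    rows = source.execute(
        f'SELECT * FROM "{table}" '
        "WHERE generated_time >= ? AND generated_time < ?",
        (start, end),
    ).fetchall()
    if not rows:
        return 0
    marks = ", ".join("?" * len(rows[0]))
    with target:  # commit all copied rows together
        before = target.total_changes
        target.executemany(
            f'INSERT OR IGNORE INTO "{table}" VALUES ({marks})', rows)
        return target.total_changes - before
```

    The half-open range (`start <= t < end`) lets consecutive runs, for example one per day, cover a week or month without reading boundary rows twice.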

    Aging Process

    1. Summarize any Level 0 tables where the latest RECEIVED_TIME is greater than the last summarization process run time.

    2. Purge any Level 0 data that is older than the specified retention period (initially 14 days).

    3. Compress any Level 1-3 table once the last Level 0 table that could contain detail for any date in that summary level has been dropped. Apply best practices to guarantee the integrity of tables during compression.

    4. Schedule aging/table compression to minimize the impact to reporting and ad-hoc query activity.
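    The first three aging steps reduce to a few date comparisons. A minimal Python sketch, assuming a hypothetical `plan_aging` helper that receives the latest RECEIVED_TIME of each Level 0 day table, and using the initial 14-day retention period; the actual summarization and compression SQL is out of scope here:

```python
from datetime import date, datetime, timedelta

# Initial Level 0 retention period stated in the design (14 days).
RETENTION_DAYS = 14


def plan_aging(l0_tables: dict, last_run: datetime, today: date,
               retention_days: int = RETENTION_DAYS):
    """Return (tables_to_summarize, tables_to_purge) as sorted lists of dates.

    `l0_tables` is a hypothetical input mapping each Level 0 day table's date
    to the latest RECEIVED_TIME found in it (e.g. from MAX(RECEIVED_TIME)).
    """
    # Step 1: any table that received logging since the last run, including
    # late past-date logging, must be summarized again.
    summarize = sorted(d for d, latest in l0_tables.items() if latest > last_run)
    # Step 2: purge day tables older than the retention window. Because step 1
    # runs first, late data in an expiring table is summarized before purge.
    cutoff = today - timedelta(days=retention_days)
    purge = sorted(d for d in l0_tables if d < cutoff)
    return summarize, purge


def can_compress(period_end: date, today: date,
                 retention_days: int = RETENTION_DAYS) -> bool:
    """Step 3: a Level 1-3 summary covering dates up to `period_end` is safe
    to compress once no Level 0 table for any of those dates remains."""
    return period_end < today - timedelta(days=retention_days)
```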

    Backlog for Future Releases

    5. Create a summary table of intervals dropped from Level 0 due to zero counts.

    6. Create a table of peak intervals from Level 0 detail.

    Test Plan

    1. Test each failure recovery process.

    2. Test existing Business Objects reports with new schema.

    3. Test subscriber when:

    a. L0 table does not exist (both future date and past date logging)


    b. Database backup is running

    c. Past date data is received and L0 table does exist

    d. Duplicate logging is received
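    Test 3d (duplicate logging) and the rule that all metrics are unsigned can both be exercised against the table constraints. A minimal Python test sketch with `sqlite3` standing in for MySQL; the column list and the unique key `(metric_id, client_ind, generated_time)` are hypothetical, and because SQLite has no UNSIGNED type, a CHECK constraint stands in for it:

```python
import sqlite3


def make_day_table(conn: sqlite3.Connection, name: str) -> None:
    """Create a hypothetical, simplified L0 day table for testing."""
    conn.execute(f'''
        CREATE TABLE "{name}" (
            metric_id      INTEGER NOT NULL,
            client_ind     INTEGER NOT NULL,
            generated_time TEXT    NOT NULL,
            exist_time     INTEGER NOT NULL CHECK (exist_time >= 0),
            UNIQUE (metric_id, client_ind, generated_time)
        )''')


def insert_is_rejected(conn: sqlite3.Connection, table: str, row: tuple) -> bool:
    """Return True if the database refuses `row` (duplicate key or a
    negative metric); a refused insert is rolled back."""
    try:
        with conn:
            conn.execute(f'INSERT INTO "{table}" VALUES (?, ?, ?, ?)', row)
        return False
    except sqlite3.IntegrityError:
        return True
```

    In MySQL itself, whether a negative value for an UNSIGNED column is rejected or clipped to zero depends on the server's SQL mode, so the unsigned rule is worth testing against the real server configuration as well.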

    Security


    Issues from Reviews

    The following issues were raised during various reviews of the database and schema design. They are documented here for future reference.

    1. Summarization processes need to be designed and documented as well as the schema. See the Aging Process Requirements.

    2. What happens if the subscriber can't find the right Level 0 table to insert the metrics log into? See the Logging Subscriber Requirements.

    3. Consider naming Level 0 tables with fixed names instead of dynamic names to avoid issue 3. See the Logging Subscriber Requirements.

    4. What happens to logging received after the Level 0 table for that day has already been summarized? See the Logging Subscriber and Aging Process Requirements.

    5. What is the best time of day to run the summarize process? Schedule it after logging for the day is usually complete but before automated report runs begin.

    6. How will the switch from daylight saving time to standard time be handled? Would storing datetimes in UTC solve this? Should the TIMESTAMP data type be used instead of DATETIME? Per Mark Scriffiny, the ICE API passes all times in GMT and they are stored that way in the MySQL database. Per MySQL, we have converted from DATETIME data types to TIMESTAMP data types.

    7. MySQL security needs to be designed and implemented at the same time that we deploy the new schema and processes. Added security requirements to the document.

    8. Resync processes need to be defined and tested to make sure no schema changes are needed. Added resync process requirements to the document. The CLR_IND was added to the table structure to enable easier resync of data following a failure.

    9. Apply best practices to guarantee the integrity of tables during compression. Added to Aging Process Requirements.

    10. Schedule table compression to minimize the impact to reporting and ad-hoc query activity. Added to Aging Process Requirements.

    11. Consider creating a summary table of intervals dropped from Level 0 due to zero counts. Added to Aging Process Backlog Requirements.

    12. Consider creating a table of peak intervals from Level 0 detail. Added to Aging Process Backlog Requirements.

    13. Consider keeping the INSTANCE_NAME in all levels of summary. Will this cause indexes to be much larger even if the column is left blank? A null varchar value will require 8 bytes of storage. We will keep INSTANCE_NAME in all summary levels.

    14. Should we eliminate the RANGE_5_END columns, since they will always be equal to the max for that metric? No; if five or fewer levels are specified, the upper limit will be set based on the XML input.

    15. What will the aging model be for the development, integration and certification test environments? Development: Level 0. Integration: Levels 0-1. Certification: Levels 0-3, just like production.

    16. Test all existing Business Objects reports with the new schema. Added to Test Plan.

    17. Will the BIGINT data type be large enough for a whole day's sum of squares? Yes, the largest unsigned value for BIGINT is 18,446,744,073,709,551,615.

    18. How do we guarantee uniqueness of the primary key through summarization and resynchronization processes? Include an integer prefix in the auto-generated integer primary key containing the date portion of the table name (yyyymmdd), followed by a single digit indicating which database the row was originally inserted into.

    19. It was agreed during the initial review of this design with the ICE development team that the Level 0 tables will not be partitioned by date, but will be a single set of 5 tables with a rolling 14 days of detail maintained in each.
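    The key scheme in issue 18 can be made concrete. A minimal Python sketch, assuming (hypothetically) that ten digits are reserved for the auto-generated sequence: with an 8-digit date and one database digit the key has at most 19 digits, so it always fits under the BIGINT UNSIGNED maximum quoted in issue 17 (`2**64 - 1`, 20 digits). The `make_row_id` and `split_row_id` helpers are illustrative only:

```python
from datetime import date

BIGINT_UNSIGNED_MAX = 2**64 - 1  # 18,446,744,073,709,551,615
SEQ_DIGITS = 10  # hypothetical width reserved for the sequence number


def make_row_id(table_date: date, db_digit: int, seq: int) -> int:
    """Compose yyyymmdd + database digit + zero-padded sequence number."""
    if not 0 <= db_digit <= 9:
        raise ValueError("database indicator must be a single digit")
    if not 0 <= seq < 10**SEQ_DIGITS:
        raise ValueError(f"sequence must fit in {SEQ_DIGITS} digits")
    row_id = int(f"{table_date:%Y%m%d}{db_digit}{seq:0{SEQ_DIGITS}d}")
    # At most 8 + 1 + 10 = 19 digits, always below the 20-digit maximum.
    assert row_id <= BIGINT_UNSIGNED_MAX
    return row_id


def split_row_id(row_id: int) -> tuple:
    """Recover (yyyymmdd, database digit, sequence) from a row id."""
    text = str(row_id)
    return text[:8], int(text[8]), int(text[9:])
```

    In this sketch, rows inserted independently on the two servers for the same day differ in the database digit and therefore cannot collide, which is what would let the reconciliation and table copy processes move rows between images without renumbering them.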