ssis framework.pptx
Post on 06-Jul-2018
-
8/17/2019 SSIS Framework.pptx
SSIS – A BEGINNING FRAMEWORK
SQL Saturday #15
2 May 2009
Eric Wisdahl
http://ewisdahl.spaces.live.com
-
OVERVIEW
Most developers would agree that every SSIS solution has the same fundamental outline.
A basic framework expedites development by handling the tasks common to all systems while allowing the developer to concentrate on the task at hand. This framework consists of many items, including but not limited to package configurations, logging, audit trails, error handling, and naming standards.
This document presents an example framework which can be used as the basis for future SSIS package development.
-
META SUB SYSTEMS
The Meta Subsystem contains information relating to auditing, data quality, data dictionaries, processing control tables, and configurations.
Please see the documentation for the Meta-Data subsystem at http://ewisdahl.spaces.live.com for more information on the audit, data quality, and dictionary tables.
-
AUDIT TABLES
The Packages table stores information relating to the package name and versions which are executing.
The PackageExecutions table stores information relating to the package that is being run, the start and end dates, and whether or not the execution was successful.
The TableProcessing table stores the statistics of the package execution: how many records were initially in the table, how many were inserted, how many were updated, how many errors there were, and how many records were in the table after execution.
The DimAudit table ties the PackageExecutions and TableProcessing tables together for those packages which might have more than one entry in the TableProcessing table.
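To make the relationship between the four audit tables concrete, here is a minimal sketch that builds them in an in-memory SQLite database from Python. Only the table names come from the slides; every column name, the key structure, and the use of SQLite instead of SQL Server are illustrative assumptions.

```python
import sqlite3

# Hypothetical schema: only the four table names come from the slides.
# All column names and keys below are assumptions made for illustration.
SCHEMA = """
CREATE TABLE Packages (
    PackageKey     INTEGER PRIMARY KEY,
    PackageName    TEXT NOT NULL,
    PackageVersion TEXT NOT NULL
);
CREATE TABLE PackageExecutions (
    PackageExecutionKey INTEGER PRIMARY KEY,
    PackageKey          INTEGER NOT NULL REFERENCES Packages (PackageKey),
    StartDate           TEXT NOT NULL,
    EndDate             TEXT,
    Successful          INTEGER
);
CREATE TABLE TableProcessing (
    TableProcessingKey INTEGER PRIMARY KEY,
    TableName          TEXT NOT NULL,
    InitialRowCount    INTEGER,
    InsertCount        INTEGER,
    UpdateCount        INTEGER,
    ErrorCount         INTEGER,
    FinalRowCount      INTEGER
);
CREATE TABLE DimAudit (
    AuditKey            INTEGER PRIMARY KEY,
    PackageExecutionKey INTEGER NOT NULL
        REFERENCES PackageExecutions (PackageExecutionKey),
    TableProcessingKey  INTEGER NOT NULL
        REFERENCES TableProcessing (TableProcessingKey)
);
"""


def create_audit_tables(conn):
    """Create the four audit tables on the given connection."""
    conn.executescript(SCHEMA)


def audit_table_names(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

In this sketch one PackageExecutions row can be linked to several TableProcessing rows through DimAudit, which is the case the slide describes for packages that load more than one table.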
-
DATA DICTIONARY TABLES
The TableDictionary table stores information relating to the database, schema, and table from the database system tables, as well as user input information such as the description, grain, display name, business name, etc.
The ColumnDictionary table stores information relating to the column name, data type, size, precision, scale, nullability, and default value from the database system tables, as well as user input information such as description, business name, display name, type of SCD dimension, example values, unknown member values, etc.
-
DATA DICTIONARY TABLES (CONTINUED)
The LogicalDataMap table stores information relating to how the data was input into the system. This includes the source system database, schema, table, field, and data type, as well as the ETL rules and any relevant comments.
-
CONTROL TABLES
The FrequencyTypes table holds information relating to types of date ranges. It is used by the ProcessingDates table below as an enumeration.
The ProcessingDates table is a control table which holds a pointer to a filter as well as a start and end date range for the particular job to process. It also holds a pointer to the frequency type. The date range values in the ProcessingDates table are updated via a stored procedure based on the frequency type.
The DictionaryDatabaseList table stores a list of attributes relating to the databases which will be looped through when processing the data dictionary tables.
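The idea of advancing a processing date range according to its frequency type can be sketched as follows. This is not the stored procedure from the framework: the frequency names ("Daily", "Weekly", "Monthly"), the half-open range convention, and the function name are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def next_date_range(frequency_type, previous_end):
    """Return the (start, end) range that follows previous_end.

    The range is half-open: start is included, end is excluded.
    The frequency type names are hypothetical examples.
    """
    start = previous_end
    if frequency_type == "Daily":
        end = start + timedelta(days=1)
    elif frequency_type == "Weekly":
        end = start + timedelta(weeks=1)
    elif frequency_type == "Monthly":
        # Advance to the first day of the following month.
        year = start.year + (1 if start.month == 12 else 0)
        month = 1 if start.month == 12 else start.month + 1
        end = date(year, month, 1)
    else:
        raise ValueError(f"unknown frequency type: {frequency_type}")
    return start, end
```

A package would then filter its source query to rows whose date falls inside the returned range, and the control record would be moved forward after a successful run.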
-
CONFIGURATIONS TABLE
The META environment also houses the SSIS Configuration table. It holds all of the SQL Server configurations that are used in the various SSIS packages. Please see SSIS Configurations, Expressions and Constraints on http://ewisdahl.spaces.live.com or BOL for an overview of SQL Server configurations.
-
CONFIGURATIONS
In this version of an SSIS framework, we use an environment variable to hold the connection string for the META database. In this fashion we form an indirect configuration to the rest of the configurations to be performed.
Once we have the connection to META we use the SQL Server Configuration table to populate the rest of the framework configurations as well as the remainder of the connection strings.
When using configurations, always put the description for the variable or property with the configuration if possible, as this allows the next user to identify how the record(s) in the configuration table are being used.
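The two-step lookup described above (environment variable first, configuration table second) can be sketched outside of SSIS like this. The environment variable name `SSIS_META_CONNECTION`, the table name `SSISConfigurations`, and the use of a SQLite file in place of a SQL Server connection string are assumptions; the column names follow the layout SSIS typically generates for a SQL Server configuration table.

```python
import os
import sqlite3


def load_configurations(filters, env_var="SSIS_META_CONNECTION"):
    """Resolve META from an environment variable, then read the
    configured values for the requested configuration filters.

    Returns {configuration_filter: {package_path: configured_value}}.
    """
    # Step 1: the only setting stored outside the database.
    meta_location = os.environ[env_var]

    # Step 2: every other setting comes from the configuration table.
    conn = sqlite3.connect(meta_location)
    try:
        placeholders = ", ".join("?" for _ in filters)
        rows = conn.execute(
            "SELECT ConfigurationFilter, PackagePath, ConfiguredValue "
            "FROM SSISConfigurations "
            f"WHERE ConfigurationFilter IN ({placeholders})",
            list(filters),
        ).fetchall()
    finally:
        conn.close()

    values = {}
    for config_filter, package_path, configured_value in rows:
        values.setdefault(config_filter, {})[package_path] = configured_value
    return values
```

The design point this illustrates is that moving a package between environments only requires changing one environment variable per machine; everything else is read from the META database that variable points to.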
-
CONFIGURATIONS – FRAMEWORK-AUDITPARAMETERS-SERVERNAME
The ServerName configuration allows easy identification of which server the configurations are coming from as well as (presumably) which server the SSIS job was running on. It is further used in communicating back with the operator during error or completion emails.
-
CONFIGURATIONS – FRAMEWORK-AUDITQUERYEXPRESSIONS
The AuditQueryExpressions configurations are used to set the variable values which contain the SQL command strings (via expressions) for the Execute SQL tasks within the pre- and post-processing sequence containers.
-
CONFIGURATIONS – FRAMEWORK-EMAILSETTINGS
The EmailSettings configurations hold the values for the from and to email addresses. They also hold the expressions for the subject and body of the email sent when a package generates an error as well as when a package executes successfully.
Note – There is an alternative configuration, Controller-EmailSettings, which houses the same information but with different values, that will be used in the control (master) packages.
-
CONFIGURATIONS – FRAMEWORK-INDEXSCRIPTGENERATION
The IndexScriptGeneration configuration is used to house the expressions for the Create and Delete Index Script queries. Note that this is for the query that generates the individual scripts and not for the scripts themselves. See the section on handling indexes in SSIS which follows.
-
CONFIGURATIONS – FRAMEWORK-ROOTFOLDER
The RootFolder configuration is used to house the UNC path to the folder which will contain sub folders for your log files, raw files, packages, access databases, etc.
NOTE – In the examples I am presenting I use the “C:\” named drive. This is bad practice. All paths within SSIS should be full UNC paths (\\servername.domainname\folder\subfolder\). However, I do not have shares set up on my personal laptop… This is an example of “Do as I say, not as I do!”
-
CONFIGURATIONS – SMTPCONNECTIONMANAGER-CONNECTIONSTRING
The SMTPConnectionManager-ConnectionString is used to house the connection string to the local Exchange server (or other mail service).
Note – As I do not have access to an Exchange server outside of work, my examples either have non-working email components, or script tasks pointing to Gmail’s outward facing SMTP server. This script task, or something similar, will need to be used in any situation where you need to pass along security credentials to an email task, as the Send Mail task does not allow any security outside of Windows security.
-
CONFIGURATIONS – OTHER
If you have connection strings to a set of databases outside
of the meta database, it is often useful to include all of
these connections within the framework as well, so that
you do not have to continually recreate the connection
managers or reset the configurations to these connection
managers.
Once the framework configurations are set up, it is
important to realize that other configurations can and
should be set for the individual packages as applicable. In
the screen shot showing the package configuration
organizer you can see an extra configuration,
Dictionary-DynamicDatabaseConnectionString, that is relevant only
to a particular package or set of packages, but not to the
framework as a whole. This is normal behavior.
-
LOGGING
SSIS contains an internal logging mechanism to expose run time events. This information can be sent to text files, a SQL Server Profiler file, the sysssislog table on an instance of SQL Server, the Windows Event Log, or an XML file. For our purposes, we use the text file logging mechanism. This creates a CSV file for each package, which is dynamically named with the package name and date.
This file can be used to track down warnings and errors from the execution of the package, as well as to determine the last activity from the package if the package has hung. We have chosen the text file as it is a basic method of tracking any errors which does not rely on any other system being up in order to function.
In this framework I have included all logging events except for the OnPipeline events and the diagnostic events, as these add a lot of records to the log without providing details that I feel are really needed.
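As a rough illustration of naming a log file from the package name and run date, the sketch below builds the path in Python. In the framework itself this would be an SSIS expression on the log connection manager; the "Logs" sub folder, the underscore separator, and the yyyymmdd date format are assumptions.

```python
from datetime import date
from pathlib import PureWindowsPath


def log_file_path(root_folder, package_name, run_date):
    """Build the per-package, per-day CSV log path under the root folder.

    The "Logs" sub folder and the file name pattern are hypothetical.
    """
    file_name = f"{package_name}_{run_date:%Y%m%d}.csv"
    return str(PureWindowsPath(root_folder) / "Logs" / file_name)
```

For example, `log_file_path(r"\\servername.domainname\folder\subfolder", "LoadCustomer", date(2009, 5, 2))` yields a path ending in `\Logs\LoadCustomer_20090502.csv`, so each package run on a given day appends to its own file.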
-
LOGGING MENU ITEM
-
LOGGING WIZARD
-
LOGGING WIZARD 2
-
FRAMEWORK VARIABLES
Variables are used for a host of activities
throughout the framework. There are variables
which are affected by both package configurations
and expressions.
There has been some effort to keep the variables in
a semblance of organization by using the
namespace property. To see the namespace
property, open the Variables window and select the “Choose Variable Columns” button.
-
FRAMEWORK VARIABLES
This will open up the choose variable columns window. Here you have the option to
select from the scope, data type, value, namespace and raise event when variable
value changes columns. Check the namespace column.
-
FRAMEWORK VARIABLES
In the framework, we have created a collection of namespaces to hold related variables.
The AuditParameter namespace currently houses information about the destination and source tables. It is necessary to fill out the variables in this namespace for every package in order to leave the proper audit trail.
The AuditQuery namespace currently houses variables which use expressions to generate the SQL query or command used in the pre-processing and post-processing sequence containers (as well as the stop process task).
-
FRAMEWORK VARIABLES
The AuditVariable namespace is used to house the return values from the SQL queries, the insert / update / error / etc. counts from the data flow, and so on. Essentially any item used to track an audit item for the package will be stored in this namespace.
The DateParameter namespace is used to house information relating to the processing dates record. The namespace contains the frequency type variable, which will need to be filled in for any package which wishes to make use of the processing dates table. This frequency type is used to generate the processing dates filter via an expression with the package name. The DateParameter namespace further contains the processing date key and the start and end date ranges for this package (if a record is present in the processing dates table for the package).
-
FRAMEWORK VARIABLES
The Files namespace contains variables used to house network paths and file names. It includes variables that are set via either package configurations or expressions.
The Index namespace is used to house the queries that will generate the create and delete index scripts, the record sets that will house these scripts, and the individual variable that will hold one of these scripts at a time.
The Key namespace will be used to house any returned surrogate key values. As of this writing this is only used for the audit trail, although it is certainly possible to house any returned key within the namespace.
-
FRAMEWORK VARIABLES
The Query namespace is used to house any queries that are process related as opposed to relating to the audit or control procedures. An example is a query used to update the type 2 slowly changing dimension columns in a batch update (as opposed to a row by row approach within the data flow).
The SSISEmail namespace is used to hold variables related to emailing the operators and constructing the subject and body of emails to be sent out.
The User namespace is the default namespace for SSIS. It will contain any variables which are added to a package built on the framework (unless you specify another namespace).
-
SSIS AND INDEXES
Indexes are known to have a great impact on performance when performing a large number of inserts or updates. As such, it is advisable to drop and recreate the indexes associated with any table that an SSIS package is processing.
We handle the creation and deletion of the indexes through a pair of expressions, stored as package configurations, which return a recordset of the scripts used in this process. We then loop through the recordset and execute each statement individually.
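The generate-then-loop pattern described above can be sketched against SQLite, which keeps each index's CREATE statement in its catalog. This is a stand-in for illustration only: the framework targets SQL Server, where the generating queries would read the system catalog views instead, and the function names here are hypothetical.

```python
import sqlite3


def index_scripts(conn, table_name):
    """Return (drop_statements, create_statements) for a table's
    explicitly created indexes, read from the SQLite catalog."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? AND sql IS NOT NULL "
        "ORDER BY name",
        (table_name,),
    ).fetchall()
    drops = [f'DROP INDEX "{name}"' for name, _ in rows]
    creates = [sql for _, sql in rows]
    return drops, creates


def load_without_indexes(conn, table_name, insert_sql, rows):
    """Drop the indexes, bulk insert, then recreate the indexes.

    Returns the number of indexes that were dropped and recreated.
    """
    # Capture both script sets before anything is dropped.
    drops, creates = index_scripts(conn, table_name)
    for statement in drops:      # one script at a time, like the loop container
        conn.execute(statement)
    conn.executemany(insert_sql, rows)
    for statement in creates:
        conn.execute(statement)
    conn.commit()
    return len(drops)
```

Note that the create scripts must be captured before the drop scripts run, since the catalog no longer describes an index once it has been dropped.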
-
STOP PROCESS
The Stop Process task in the framework is used to determine whether or not this process has been run for the parent package before. This task uses the AuditQuery::StopProcessQuery variable as the source of the query and the AuditVariable::StopProcess variable to store the Boolean value returned by the query.
Finally, the precedence constraint going in to the pre-processing container is as follows:
@[AuditVariable::StopProcess] == false || @[AuditParameter::ParentPackageExecutionKey] == -1
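The logic of that precedence constraint can be restated as a small Python predicate, which makes the three interesting cases easy to check. The function and parameter names are illustrative; only the two variables and the comparison values come from the expression above.

```python
def should_run_pre_processing(stop_process, parent_package_execution_key):
    """Mirror of the precedence constraint expression:
    StopProcess == false || ParentPackageExecutionKey == -1
    """
    return (not stop_process) or parent_package_execution_key == -1
```

With this predicate, a package proceeds when the stop flag is false, and also proceeds regardless of the flag when the parent execution key is -1; it is blocked only when the flag is true and a parent key other than -1 is present.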
-
PRE-PROCESSING CONTAINER
The pre-processing sequence container houses the tasks used in determining the initial row counts and surrogate key for the destination table, creating the audit trail, generating the necessary control information for the package, and handling the indexes on the destination table.
-
POST PROCESSING CONTAINER
The post-processing sequence container houses the tasks used in determining the final row counts and surrogate key for the destination table, updating the audit trail, recreating the indexes on the destination table, sending out completion emails (where appropriate), and deleting any files which are no longer necessary.
-
PROCESSING CONTAINER
The processing container is used to house the tasks specific
to the package being developed. It can be further broken
down into sub containers if desired.
-
DATA FLOW TASKS
Most of the activity in the processing sequence
container will take place in a data flow task.
Inside of the data flow task, we like to keep
certain items standardized across packages.
-
COUNTS
Extract – The number of rows pulled from the source system.
Error Type 1 Update – The number of data errors encountered during the type 1 update branch.
Error Type 2 Update – The number of data errors encountered during the type 2 update branch.
Error Insert – The number of data errors encountered during the insertion of the records into the destination table.
Failed Lookup – The number of rows that failed to find a match in a lookup transformation. Often used when building dimensions.
Insert Standard – The number of rows inserted during standard processing.
Insert Non-Standard – The number of rows inserted during non-standard processing (e.g. late arriving).
No Change – The number of rows which did not change between what was input from the source system and what is currently stored in the destination.
Update Type 1 – The number of rows updated during the processing of the SCD Type 1 branch.
Update Type 2 – The number of rows updated during the processing of the SCD Type 2 branch.
-
ERROR FILES
Data errors are put out to a raw file destination.
All errors within the data flow should be brought
together via a union all operation with enough
information to describe where the error occurred
as well as what the error was.
-
ONERROR EVENT HANDLER
The OnError Event Handler is a set of code that is executed any time that an error has occurred while executing a package. These are errors that occur with the process, and are different from a data error, if the data error is handled within the data flow task. Within the OnError Event Handler we determine whether or not we have already sent an error email for this package. If we have not previously sent an error email, we do so now to a list of recipients determined via package configuration. Afterwards we increment the counter so that we do not send a second error email.
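The send-once behaviour of the handler can be sketched as a small class. This is only an illustration of the counter logic: the class name, the injected `send` callable, and the subject wording are assumptions, and in the package itself the counter is an SSIS variable rather than an object attribute.

```python
class ErrorNotifier:
    """Sends at most one error email per package execution."""

    def __init__(self, recipients, send):
        self.recipients = list(recipients)  # would come from package configuration
        self.send = send                    # callable(recipients, subject, body)
        self.error_emails_sent = 0          # the counter described above

    def on_error(self, package_name, error_description):
        """Handle one OnError event; return True if an email was sent."""
        if self.error_emails_sent > 0:
            return False  # an email already went out for this execution
        subject = f"Error during execution of the {package_name} package"
        self.send(self.recipients, subject, error_description)
        self.error_emails_sent += 1
        return True
```

Because a single failure often raises the OnError event several times as it bubbles up through containers, checking the counter first means the operators receive only the first error reported.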
-
Sample Error Email:
From: sqlservice@citizensfla.com [mailto:sqlservice@citizensfla.com]
Sent: Tuesday, March 1, [year and time illegible in the transcript]
To: Report Team
Subject: Error during execution of the [package name illegible] package.
Importance: High
There was an error in the execution of the [package name illegible] package which started at [timestamp illegible]. The following is the first error reported:
[the remainder of the email is missing from the transcript]
-
CONNECTION MANAGERS
Connection Managers should be created for every database which is used. The name should be the name of the database or file with no reference to the machine or account to be used (as these will change between environments). The connection managers that are common to the development efforts should be placed in the common template for a project and should have the connection string and descriptions set via package configuration. It is worth noting that having extra connection managers within a package that are not used carries a minimal cost when validating the package.
If there would be two separate connection managers to the same database, but with different connection manager types, assume that the OLE DB connection manager is the default and name any other connection managers with their type (for example, META and META.NET).
-
HASH VALUES (CHECK SUMS)
Hash values are used to generate quick comparisons to determine whether or not a record, or a subset of a record’s columns, has changed. In order to facilitate the quick computation of hash values within a data flow we have employed the Checksum Transformation available from Konesans. With this transformation you simply select which columns you would like to be included in the hash and specify an output column name.
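The change-detection idea behind hash values can be sketched in a few lines of Python. SHA-256 from the standard library is used here purely as a stand-in; it is not the algorithm used by the Konesans component, and the function names, separator, and NULL marker are assumptions.

```python
import hashlib


def row_hash(row, columns):
    """Hash the selected columns of a row (a dict) into one hex digest.

    A unit separator keeps ("ab", "c") distinct from ("a", "bc"), and
    None is mapped to its own marker so it differs from an empty string.
    """
    parts = ["\x00" if row[col] is None else str(row[col]) for col in columns]
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def has_changed(source_row, stored_hash, columns):
    """Compare the incoming row's hash with the hash stored at load time."""
    return row_hash(source_row, columns) != stored_hash
```

In use, the hash computed when a row is first loaded is stored alongside it; on later runs only one value per row has to be compared to decide between the No Change branch and an update branch, and columns left out of `columns` never trigger an update.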
-
BIDS HELPER
BIDS Helper is a Visual Studio add-in that expands the functionality of Business Intelligence Development Studio (BIDS). BIDS Helper includes a vast array of extensions: it gives a graphical representation of expressions and configurations, allows for pipeline component performance breakdowns, extends the variables window, sorts the project files, fixes relative paths, gives a list of all expressions and non-standard property values used within the packages, etc. It is HIGHLY recommended that anyone using BIDS to develop SSIS packages install this product.
BIDS Helper is available at http://www.codeplex.com/bidshelper. For more information on this product please see the BIDS Helper web site listed above.
-
BIDS OBJECT GUIDS
However, if you have installed BIDS Helper you can generate new GUIDs for all objects within the package by right clicking on the package name within the Solution Explorer and choosing Reset GUIDs (this method is preferred as it will reset all of the IDs within the package).
-
PACKAGE VERSIONS
There is also a property for Version Comments that should be filled in to explain the changes that have been implemented.
-
CONCLUSION
I hope that this has been helpful. I will try to provide the packages that load the META data dictionary shortly on my SkyDrive (which you can find a link to at http://ewisdahl.spaces.live.com) as a working example. I will also try to provide a package or two showing a normal load into an ODS and a sample package used to conform data.
NOTE: The framework I have presented is a draft item. I am continually updating it, and, if you should happen to use it as your base framework going forward, I would expect you to do the same.