Uri Lifshitz, Optare Consulting Israel
“Saya No Uchi” (鞘の内で勝つ)
Literally: “The victory is in the scabbard”
Meaning: The victory is decided before the battle has begun.
July 10 2
Average cost of computer downtime (chart; source: Giga group)
Understanding
Preparation
Training
Important note: we will not talk about simple scenarios. If you want to read chapter 20 of the Administration Guide, feel free.
We will talk about some nasty things that could happen to your:
DB2 Catalog
DB2 Directory
DB2 Bootstrap datasets
Am I going to talk about data sharing?
No, because data sharing works:
If you lose one DB2 member, the group still functions
We all know about Restart Light
CF structure duplexing
The main DRP advantage in data sharing is that a downed DB2 member is not a crisis.
BUT the problem remains: getting the fallen DB2 back up.
Advice: if you want to hear about data sharing, go to a performance presentation.
DB2 Bootstrap datasets
Simple question:
Are you using dual BSDS?
More complicated question:
How often do you back up your BSDS?
Trick question: the BSDS is backed up automatically every time an active log data set is archived.
Do you remember where your BSDS backups reside?
Do you know how to recover your BSDS?
Do you recover both copies or just the corrupted one?
Another trick question: you recycle DB2 and only one of your BSDS copies is corrupted. What will happen?
14.30.51 STC10977 DSNY001I -DBX1 SUBSYSTEM STARTING
14.30.52 STC10977 DSNJ107I -DBX1 READ ERROR ON BSDS 350
350 DSNAME=DSNDBX0.DBX1.BSDS01, ERROR STATUS=0874
14.30.52 STC10977 DSNJ117I -DBX1 INITIALIZATION ERROR READING BSDS 351
351 DSNAME=DSNDBX0.DBX1.BSDS01, ERROR STATUS=0874
14.30.53 STC10977 DSNJ119I -DBX1 BOOTSTRAP ACCESS INITIALIZATION PROCESSING FA
14.31.03 STC10977 *DSNV086E -DBX1 DB2 ABNORMAL TERMINATION REASON=00E80084
14.31.03 STC10977 IEA794I SVC DUMP HAS CAPTURED: 357
357 DUMPID=001 REQUESTED BY JOB (DBX1MSTR)
357 DUMP TITLE=DBX1,ABND=04E-00E80084,U=SYSOPR ,M=(?),C=810.IPC
357 SNYSIRM,M=DSNYECTE,PSW=077C2000A021B300,A=0079
14.31.05 STC10977 IEF450I DBX1MSTR DBX1MSTR - ABEND=S04E U0000 REASON=00E80084
373 TIME=14.31.05
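The first thing to establish in this situation is which BSDS copy DB2 is complaining about. A minimal sketch, assuming the SYSLOG layout shown above, where a message that spills onto a second line ends with a sequence number (350, 351) and its continuation line starts with that same number. It collects the DSNAME= values attached to the DSNJ107I and DSNJ117I messages; the function name and the layout assumption are ours, not part of any DB2 tool.

```python
import re

# Message IDs DB2 issued for the damaged BSDS in the example above.
BSDS_ERROR_MSGS = ("DSNJ107I", "DSNJ117I")


def damaged_bsds(console_lines):
    """Return the BSDS data set names reported in BSDS error messages.

    Assumption: a split message ends with a sequence number (e.g. '350')
    and its continuation line begins with the same number.
    """
    pending = {}   # sequence number -> message ID waiting for its DSNAME line
    damaged = []
    for line in console_lines:
        header = re.search(r"(DSNJ\d{3}I).*\s(\d+)\s*$", line)
        if header and header.group(1) in BSDS_ERROR_MSGS:
            pending[header.group(2)] = header.group(1)
            continue
        cont = re.match(r"\s*(\d+)\s+DSNAME=([\w.]+)", line)
        if cont and cont.group(1) in pending:
            name = cont.group(2)
            if name not in damaged:   # the same copy is usually reported twice
                damaged.append(name)
            del pending[cont.group(1)]
    return damaged
```

Run against the messages above, it reports only DSNDBX0.DBX1.BSDS01, which points to the other copy as the candidate to copy from.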
You guessed it: DB2 will not come up.
How do you fix that?
Easy, just copy the correct BSDS over the
corrupted BSDS. What’s the problem?
Now suppose you copy a backup over the corrupted BSDS instead.
WWDD (What Would DB2 Do)?
You guessed it again: DB2 will not start.
15.38.06 STC00002 DSNY001I -DBX1 SUBSYSTEM STARTING
15.38.07 STC00002 DSNJ127I -DBX1 SYSTEM TIMESTAMP FOR BSDS= 09.160 05:46:48.3
15.38.09 STC00002 DSNJ012I -DBX1 DSNJR005 ERROR 00D10348 READING RBA 856
856 0003F4FAA000 IN DATA SET DSNDBX0.DBX1.LOGCOPY1.DS04.
856 CONNECTION-ID=DBX1, CORRELATION-ID=004.JW006 00
15.38.15 STC00002 IEA794I SVC DUMP HAS CAPTURED: 858
858 DUMPID=002 REQUESTED BY JOB (DBX1MSTR)
858 DUMP TITLE=DBX1,ABND=04E-00D10348,U=SYSOPR ,M=(?),C=810.RLM
858 SNJLGR ,M=DSNJRE01,LOC=DSNJL002.DSNJR005+0408
15.38.15 STC00002 DSNJ232I -DBX1 OUTPUT DATA SET CONTROL 859
859 INITIALIZATION PROCESSING FAILED
15.38.15 STC00002 *DSNV086E -DBX1 DB2 ABNORMAL TERMINATION REASON=00E80084
15.38.17 STC00002 IEF450I DBX1MSTR DBX1MSTR - ABEND=S04E U0000 REASON=00E80084
895 TIME=15.38.17
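The DSNJ127I message above is a useful clue: it shows the timestamp of the BSDS DB2 is now reading (09.160 05:46:48.3, a two-digit year plus a Julian day), and a restored backup likely reflects an older point than the active logs DB2 then tries to read. A minimal sketch that converts that timestamp into a calendar date and measures how far it lags behind a given moment, assuming the "yy.ddd hh:mm:ss.t" layout shown here and a 20yy century; the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta


def parse_bsds_timestamp(text):
    """Convert a 'yy.ddd hh:mm:ss.t' timestamp (as in DSNJ127I above)
    into a datetime. Assumes the year is 20yy."""
    m = re.search(r"(\d{2})\.(\d{3})\s+(\d{2}):(\d{2}):(\d{2})\.(\d)", text)
    if m is None:
        raise ValueError("no yy.ddd hh:mm:ss.t timestamp found")
    yy, ddd, hh, mi, ss, tenth = (int(g) for g in m.groups())
    start_of_year = datetime(2000 + yy, 1, 1)
    # Julian day 1 is January 1st, hence ddd - 1.
    return start_of_year + timedelta(days=ddd - 1, hours=hh, minutes=mi,
                                     seconds=ss, milliseconds=100 * tenth)


def bsds_lag(bsds_line, now):
    """How far the BSDS timestamp lags behind `now` (a datetime)."""
    return now - parse_bsds_timestamp(bsds_line)
```

For the message above, day 160 of 2009 works out to June 9, 2009, so anything logged after that moment is unknown to the restored BSDS.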
We all know that the DB2 catalog and directory are always consistent.
Until they are not…
And then we have to fix it. But sometimes it's not so simple to even know that you have a problem in your catalog or directory.
When will you know that your DB2 has a catalog problem?
Myth number 1: if I have a serious catalog problem, DB2 will not start.
Let's test that.
We decided that deleting the SYSDBASE data set is a serious enough catalog problem.
Guess what? 'START DB2' completes normally:
11.54.20 STC06004 DSN9022I -DBX1 DSNYASCP 'START DB2' NORMAL COMPLETION
11.54.26 STC06004 DSNP012I -DBX1 DSNPCNP0 - ERROR IN VSAM CATALOG 531
531 LOCATE FUNCTION FOR DSNDBX0.DSNDBC.DSNDB06.SYSDBASE.I0001.A00
531 CTLGRC=AAAAAA08
531 CTLGRSN=AAAAAA08
531 CONNECTION-ID=DB2CALL, CORRELATION-ID=TMONDB2,
531 LUW-ID=*
11.54.29 STC06004 DSNP012I -DBX1 DSNPCNP0 - ERROR IN VSAM CATALOG 619
619 LOCATE FUNCTION FOR DSNDBX0.DSNDBC.DSNDB06.SYSDBASE.I0001.A00
619 CTLGRC=AAAAAA08
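Since a normal start proves little, a start-up check could also look for DSNP012I (VSAM catalog error) messages against catalog (DSNDB06) or directory (DSNDB01) data sets. A minimal sketch, assuming the message layout shown above, where the DSNP012I line ends with a sequence number and a continuation line with that number carries "LOCATE FUNCTION FOR <data set>"; the function names are hypothetical, and a real check would read SYSLOG rather than a Python list.

```python
import re

# Catalog and directory database names as they appear in the data set names.
CATALOG_DBS = ("DSNDB06", "DSNDB01")


def catalog_dataset_errors(console_lines):
    """Return catalog/directory data sets named in DSNP012I messages."""
    pending = set()   # sequence numbers of DSNP012I messages seen so far
    found = []
    for line in console_lines:
        head = re.search(r"DSNP012I.*\s(\d+)\s*$", line)
        if head:
            pending.add(head.group(1))
            continue
        cont = re.match(r"\s*(\d+)\s+LOCATE FUNCTION FOR\s+(\S+)", line)
        if cont and cont.group(1) in pending:
            dsn = cont.group(2)
            is_catalog = any(f".{db}." in dsn for db in CATALOG_DBS)
            if is_catalog and dsn not in found:
                found.append(dsn)
    return found


def started_cleanly(console_lines):
    """'START DB2' normal completion is not enough: also require that no
    catalog or directory data set errors were reported."""
    started = any("DSN9022I" in line and "'START DB2'" in line
                  for line in console_lines)
    return started and not catalog_dataset_errors(console_lines)
```

With the SYSDBASE messages above, `started_cleanly` returns False even though DSN9022I reported normal completion.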
Maybe deleting SYSDBASE was not serious enough for you?
How about deleting the SYSCOPY VSAM data set?
15.38.55 STC06005 DSN9022I -DBX1 DSNYASCP 'START DB2' NORMAL COMPLETION
That's right: DB2 starts with no problem.
You will discover the problem only when you try to access SYSCOPY.
14.35.10 STC06053 DSNP012I -DBX2 DSNPCNP0 - ERROR IN VSAM CATALOG 031
031 LOCATE FUNCTION FOR DSNDBX0.DSNDBC.DSNDB06.SYSCOPY.I0001.A001
031 CTLGRC=AAAAAA08
031 CTLGRSN=AAAAAA08
031 CONNECTION-ID=DB2CALL, CORRELATION-ID=URI,
031 LUW-ID=*
Until then, your DB2 works as if it were business as usual.
Now you need to:
Recover your SYSCOPY
while other utilities get -904 (resource unavailable) on SYSCOPY
P.S.
Do you know where your SYSCOPY backups
are? (Because you damn sure can’t look them
up in SYSIBM.SYSCOPY)
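One way to answer the P.S. is to keep your own inventory of catalog and directory image copies somewhere that does not depend on DB2 being healthy. A minimal sketch, assuming a hypothetical CSV inventory that your copy jobs append to (space name, time taken, copy data set name); the layout, the sample rows and the data set names are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical inventory written by your own copy jobs, outside DB2.
SAMPLE = """space,taken_at,copy_dsn
DSNDB06.SYSCOPY,2010-07-01T02:00:00,HLQ.IC.SYSCOPY.D100701
DSNDB06.SYSCOPY,2010-07-02T02:00:00,HLQ.IC.SYSCOPY.D100702
DSNDB06.SYSDBASE,2010-07-02T02:10:00,HLQ.IC.SYSDBASE.D100702
"""


def latest_copy(inventory_text, space, not_after=None):
    """Return (taken_at, copy_dsn) for the newest copy of `space`,
    optionally no later than `not_after`; None if there is none."""
    best = None
    for row in csv.DictReader(io.StringIO(inventory_text)):
        if row["space"] != space:
            continue
        taken = datetime.fromisoformat(row["taken_at"])
        if not_after is not None and taken > not_after:
            continue
        if best is None or taken > best[0]:
            best = (taken, row["copy_dsn"])
    return best
```

The point of the design is independence: the lookup works even when SYSIBM.SYSCOPY itself is the object you are trying to recover.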
OK, let's pick on the directory for a change.
I believe one of the worst risks is having a garbled DBD in your directory,
so we decided to garble some DBDs *
Do it yourself: how to garble your DBDs in three easy steps
1. Use DSN1PRNT to print DSNDB01
2. Use the REPAIR utility to locate the target database
3. Use the REPAIR utility with the REPLACE option at your database's DBD offset
Voilà! You have caused your very own 00C90101
How do you fix this?
Locate the corrupted object in the DB2 catalog and directory using the REPAIR utility.
Fix the corruption using REPAIR or a targeted recovery, rather than recovering your entire catalog and directory.
Simple questions:
When was the last time you did any of the operations I just mentioned?
And how long would it take you to do them now?
Understanding
Preparation
Training
Some things you can do before a crisis:
1. Work Procedures
2. Prepared JCL and Commands
3. Products
4. Team Preparation
Have a clear crisis management procedure:
Who is managing the crisis (Only ONE!)
Who is the technical manager (Only one)
Who should be notified
Timelines for escalation
Periodic checkpoints
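Escalation timelines and checkpoints are easiest to follow under pressure when they are written down as data rather than remembered. A minimal sketch of such a procedure; the roles, the minute thresholds and the 15-minute checkpoint interval are hypothetical placeholders that every site would set for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    after_minutes: int   # elapsed outage time that triggers this step
    notify: str          # who must be told (hypothetical roles)


# Hypothetical thresholds: replace with your own procedure's values.
TIMELINE = (
    Escalation(0, "crisis manager and technical manager"),
    Escalation(30, "IT operations management"),
    Escalation(60, "business owners"),
    Escalation(120, "DRP steering committee"),
)
CHECKPOINT_EVERY = 15  # minutes between status checkpoints (hypothetical)


def due_notifications(elapsed_minutes):
    """Everyone who should have been notified by now."""
    return [e.notify for e in TIMELINE if elapsed_minutes >= e.after_minutes]


def next_checkpoint(elapsed_minutes):
    """Minute mark of the next periodic checkpoint."""
    return (elapsed_minutes // CHECKPOINT_EVERY + 1) * CHECKPOINT_EVERY
```

Forty-five minutes into an outage, for example, the first two levels should already have been notified and the next checkpoint is at minute 45 plus one interval boundary, that is, minute 60.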
As silly as it may sound:
Having the right JCL with the right command will save time and prevent mistakes.
The last thing you want is to open a book and start pondering which command does what you need.
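One way to keep the right JCL ready is to generate it from a reviewed template instead of typing it during the crisis. A minimal sketch that produces one print-log-map (DSNJU004) job per BSDS copy, which helps when comparing the two copies; the JOB card values, the SDSNLOAD prefix and the second BSDS name are hypothetical placeholders, and the template should be checked against your own installation's JCL.

```python
from string import Template

# Minimal print-log-map job. JOB card values and the library prefix are
# hypothetical placeholders; validate against your installation.
DSNJU004_JCL = Template("""\
//${jobname} JOB (ACCT),'PRINT LOG MAP',CLASS=A,MSGCLASS=X
//PRTLOG   EXEC PGM=DSNJU004
//STEPLIB  DD DISP=SHR,DSN=${prefix}.SDSNLOAD
//SYSUT1   DD DISP=SHR,DSN=${bsds}
//SYSPRINT DD SYSOUT=*
""")


def print_log_map_jobs(prefix, bsds_names, job_prefix="PRTLOG"):
    """Return one DSNJU004 job per BSDS copy."""
    jobs = []
    for i, bsds in enumerate(bsds_names, start=1):
        jobname = f"{job_prefix}{i}"
        if len(jobname) > 8:   # z/OS job names are at most 8 characters
            raise ValueError(f"job name {jobname!r} exceeds 8 characters")
        jobs.append(DSNJU004_JCL.substitute(jobname=jobname,
                                            prefix=prefix, bsds=bsds))
    return jobs
```

For example, `print_log_map_jobs("DSNA10", ["DSNDBX0.DBX1.BSDS01", "DSNDBX0.DBX1.BSDS02"])` would produce jobs PRTLOG1 and PRTLOG2, ready to submit.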
Some products that you really want to have:
Log Scanners (backing out several bad changes or updating the DRP site). Can I suggest BMC LOG MASTER?
System Recovery Tools (returning your whole system to a safe point). Can I suggest BMC RECOVERY MANAGER?
Utility Enhancers (quicker copy/recover, massive utility generation). Can I suggest BMC COPY PLUS/RECOVER PLUS?
Why Utility Enhancers?
Because sometimes you will need to rebuild hundreds of indexes following a recovery.
Because you want all your recoveries to run in parallel.
Bottom line: you want to finish your recovery as soon as possible.
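The arithmetic behind "run everything in parallel" is simple: elapsed time is bounded by the busiest stream, not by the sum of all jobs. A minimal sketch that estimates wall-clock time for independent recover or rebuild jobs on a given number of parallel streams, using the longest-processing-time-first heuristic; the heuristic and the job durations are our illustration, not how any particular product schedules work.

```python
import heapq


def elapsed_minutes(job_minutes, streams):
    """Estimate wall-clock time for independent jobs on `streams`
    parallel streams: longest job first, each to the stream that
    becomes free earliest (LPT heuristic)."""
    if streams < 1:
        raise ValueError("need at least one stream")
    finish = [0] * streams             # min-heap of stream finish times
    for minutes in sorted(job_minutes, reverse=True):
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + minutes)
    return max(finish)
```

With hypothetical jobs of 30, 20, 20, 10, 10 and 10 minutes, one stream needs 100 minutes, two streams need 50 and three need 40.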
Why use System Recovery Tools?
Because you want to be able to recover EVERYTHING to ANY point in time, and you want it done now.
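At its core, recovering to an arbitrary point means picking the newest usable backup taken at or before the target time and then applying log records up to the target. A minimal sketch of that selection step, as a simplification: it compares wall-clock times, whereas a real system tracks the RBA or LRSN recorded with each backup and may need log from before the backup completed.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence, Tuple


def recovery_plan(backup_times: Sequence[datetime],
                  target: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Return (backup_to_restore, apply_log_up_to) for a point-in-time
    recovery, or None if no backup precedes the target.

    backup_times must be sorted in ascending order.
    """
    i = bisect_right(backup_times, target)   # backups at or before target
    if i == 0:
        return None
    return backup_times[i - 1], target
```

If the only backup is newer than the target, there is nothing to restore from, which is exactly the situation a regular backup schedule is meant to prevent.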
Why use Log Scanners?
Because you might want to know what happened during the period your point-in-time recovery skipped.
Because you want to know who did what before the crisis.
To keep your recovery site updated.
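"Who did what before the crisis" boils down to filtering log activity by time window and grouping it by authorization ID. A minimal sketch, assuming the log records have already been decoded into a hypothetical `LogRecord` structure (a log scanner product does the decoding; the fields here are our invention for illustration).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:        # hypothetical, already-decoded log record
    when: datetime
    auth_id: str
    action: str         # e.g. "UPDATE", "DELETE", "DROP"
    obj: str            # table or other object name


def who_did_what(records, start, end):
    """Group records written in [start, end) by authorization ID,
    in time order."""
    summary = defaultdict(list)
    for r in sorted(records, key=lambda r: r.when):
        if start <= r.when < end:
            summary[r.auth_id].append(f"{r.when:%H:%M:%S} {r.action} {r.obj}")
    return dict(summary)
```

Choosing the window just before the incident usually narrows hundreds of thousands of log records to a handful of suspects.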
Hey Uri, why do you want us to buy so many products? Are these guys paying you?
Money Uri got for this presentation, by company:
BMC: 12 NIS (Moshe bought me a cup of coffee)
IBM: 0 NIS
CA: 0 NIS
Others: 0 NIS
There are a lot of things you can do so that your team will be ready to deal with a crisis:
Knowledge management mechanism (may I suggest MediaWiki? It's free)
Share information among team members
Make sure everyone is aware of the procedures
But I believe that the most
important thing you could do to
make your team ready to handle an
emergency is:
Understanding
Preparation
Training
What should you do?
Revise your recovery work procedures
Meet with colleagues and compare procedures
Keep your team up to date (reading, lectures, DB2 courses and certification tests)
Select one DBA to coordinate and test recovery scenarios
Remember: nothing beats hands-on experience!
Perform DRP Drills!
The methodology we developed at Discount Bank:
Let the DRP coordinator simulate a DRP scenario in the system environment.
Let another DBA, chosen at random, handle the problem.
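The two rules above (the coordinator sets up the scenario, someone else picked at random handles it) are easy to encode so that nobody can quietly volunteer for the easy drill. A minimal sketch; the scenario list is taken from the failures shown earlier in this presentation, and the DBA names in any call are hypothetical.

```python
import random

# Scenarios drawn from the failures shown in this presentation.
SCENARIOS = [
    "one BSDS copy corrupted",
    "BSDS restored from an old backup",
    "SYSDBASE data set deleted",
    "SYSCOPY data set deleted",
    "garbled DBD in the directory",
]


def plan_drill(dbas, coordinator, scenarios=SCENARIOS, rng=None):
    """Pick a scenario and a handler for a DRP drill.

    The coordinator prepares the scenario, so the handler is drawn at
    random from the other DBAs.
    """
    if rng is None:
        rng = random.Random()
    candidates = [d for d in dbas if d != coordinator]
    if not candidates:
        raise ValueError("need at least one DBA besides the coordinator")
    return rng.choice(scenarios), rng.choice(candidates)
```

Passing a seeded `random.Random` makes a drill plan reproducible for the post-drill review, while the default draws a fresh one each time.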
This is the only way to be really ready.
I hope you got some good ideas from this presentation.