Thursday, September 26, 2013

Analogy: Restore & Recovery and a bone fracture

 

Restore and recovery are analogous to treating a broken bone and letting it heal.

Restoring is similar to the process of setting the broken bone back to its original position.
This is like restoring the datafiles from a backup and placing them in their original locations.

Recovering a datafile is similar to the healing process that returns the bone to its state before it was broken.
When you recover your datafiles, you apply the changes (transactions) recorded in the redo log files to bring the datafiles back to the state they were in before the media failure took place.
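To make the two steps concrete, here is a toy Python model. It is not how Oracle actually stores blocks or redo: a "datafile" is just a dictionary of blocks, the backup is a snapshot taken at a known change number (playing the role of Oracle's SCN), and each redo record is a hypothetical (change_number, block, new_value) tuple.

```python
def restore(backup):
    """Restore (set the bone): put a copy of the backed-up datafile back in place."""
    return dict(backup)


def recover(datafile, backup_change_no, redo_log):
    """Recover (let the bone heal): re-apply, in order, every change made after the backup."""
    for change_no, block, value in sorted(redo_log):
        if change_no > backup_change_no:
            datafile[block] = value
    return datafile


if __name__ == "__main__":
    backup = {"block1": "A", "block2": "B"}  # snapshot taken at change number 100
    redo = [(101, "block1", "A2"), (105, "block2", "B2"), (110, "block1", "A3")]

    restored = restore(backup)
    print(restored)                      # {'block1': 'A', 'block2': 'B'}
    print(recover(restored, 100, redo))  # {'block1': 'A3', 'block2': 'B2'}
```

After the restore alone, the file is back in place but still frozen at backup time; only applying the redo brings it forward to the moment of failure.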

Traffic analogy: Unix CPU metric – run-queue length

 

The traffic analogy

A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.

So, Bridge Operator, what numbering system are you going to use? How about:

  • 0.00 means there's no traffic on the bridge at all. In fact, anything between 0.00 and 1.00 means there's no backup, and an arriving car will just go right on.
  • 1.00 means the bridge is exactly at capacity. All is still good, but if traffic gets a little heavier, things are going to slow down.
  • over 1.00 means there's backup. How much? Well, 2.00 means that there are two lanes' worth of cars in total: one lane's worth on the bridge, and one lane's worth waiting. 3.00 means there are three lanes' worth in total: one lane's worth on the bridge, and two lanes' worth waiting. Etc.
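The numbering scheme boils down to three cases. A minimal Python sketch of that reading for a single-lane (single-core) bridge; describe_load is a hypothetical helper, not a system API.

```python
def describe_load(load: float) -> str:
    """Read a single-lane (single-core) load figure the way the bridge operator would."""
    if load < 1.0:
        return "no backup: an arriving car goes right on"
    if load == 1.0:
        return "exactly at capacity"
    # Above 1.00: one lane's worth is on the bridge, the remainder is queued.
    return f"backed up: 1.00 lane on the bridge, {load - 1.0:.2f} lanes' worth waiting"


if __name__ == "__main__":
    for load in (1.00, 0.50, 1.70, 3.00):
        print(f"{load:.2f} -> {describe_load(load)}")
```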

[Image: a load of 1.00]

[Image: a load of 0.50]

[Image: a load of 1.70]

This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU.

Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.
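In other words, the run-queue length counts two kinds of processes and ignores everything else. A toy Python sketch with made-up state labels: it follows the definition above, so an idle or sleeping process (a parked car) is neither on the bridge nor in line and is not counted. Real kernels have their own rules for exactly which process states count.

```python
from collections import Counter


def run_queue_length(states):
    """Running processes plus processes waiting (queued) for the CPU."""
    counts = Counter(states)  # labels that never appear count as 0
    return counts["running"] + counts["waiting"]


if __name__ == "__main__":
    # Hypothetical snapshot: one car on the bridge, two in line, one parked.
    snapshot = ["running", "waiting", "waiting", "sleeping"]
    print(run_queue_length(snapshot))  # 3
```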

Like the bridge operator, you'd like your cars/processes to never be waiting. So, your CPU load should ideally stay below 1.00. Also like the bridge operator, you are still ok if you get some temporary spikes above 1.00 ... but when you're consistently above 1.00, you need to worry.
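Telling a temporary spike from a load that is consistently above 1.00 takes several readings over time rather than one. A minimal Python sketch, assuming readings from a single-core machine; the 80% cut-off for "consistently" is an arbitrary, hypothetical choice.

```python
def assess(samples, threshold=1.0, sustained_fraction=0.8):
    """Classify a series of single-core load readings taken over time.

    sustained_fraction is a hypothetical cut-off: if at least this share of
    readings is above the threshold, the load counts as consistently high.
    """
    if not samples:
        raise ValueError("need at least one reading")
    over = sum(1 for s in samples if s > threshold)
    if over == 0:
        return "ok"
    if over / len(samples) >= sustained_fraction:
        return "worry"
    return "temporary spike"


if __name__ == "__main__":
    print(assess([0.30, 0.80, 0.90]))              # ok
    print(assess([0.50, 1.70, 0.60, 0.40]))        # temporary spike
    print(assess([1.20, 1.50, 1.10, 1.30, 0.90]))  # worry
```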

Courtesy: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

Wednesday, September 11, 2013

universallogcollector.pl – an extended diagcollection.pl

The 11gR2 Universal Log Collector extends diagcollection.pl to collect diagnostics (log files, trace files, etc.) for:

  • GI (Grid Infrastructure)
  • ASM
  • the database (RAC)

The goal is to reduce the back-and-forth requests for information between Oracle Support and customers.


The tool collects information from the local node only, so it needs to be run on every cluster node.
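Because each node has to be covered separately, a small wrapper can push the same invocation to every node. A minimal Python sketch, assuming passwordless ssh as root to each node and the collector installed at the same path everywhere (the /exports/universallogcollector path is the one used in the example run below); the node names are hypothetical.

```python
import shlex
import subprocess

# Path taken from the example run; assumed identical on every node.
COLLECTOR = "/exports/universallogcollector/universallogcollector.pl"


def build_commands(nodes, extra_args=()):
    """Return one ssh argv per node that runs the collector there.

    The remote command is quoted with shlex.join so that arguments containing
    spaces (e.g. an --excl list) survive the remote shell intact.
    """
    remote = shlex.join([COLLECTOR, "--collect", *extra_args])
    return [["ssh", node, remote] for node in nodes]


def run_on_all(nodes, extra_args=(), dry_run=True):
    """Print (dry run) or execute the collector on each node in turn."""
    for cmd in build_commands(nodes, extra_args):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first node where the collector fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical node names; replace with your cluster's nodes.
    run_on_all(["dbsrvr1", "dbsrvr2"], ["--excl", "vendor, acfs, invt, crs"])
```

ssh starts in the remote user's home directory, so each node's archives will most likely land there rather than in your local working directory.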
 
When it finishes, it writes its files to the current directory. Only one file per node needs to be uploaded; its name is highlighted on screen and defaults to allData_<nodename>_<timestamp>.tar.gz.
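If you ran the tool more than once, or gathered the results from several nodes into one directory, a helper can pick the single file to upload per node. A Python sketch, assuming the <timestamp> part always looks like the 20130910_0245 (YYYYMMDD_HHMM) stamp in the example run; that format is inferred from the sample output, not documented.

```python
import glob
import os
import re

# Naming pattern inferred from the sample run: allData_<nodename>_<YYYYMMDD>_<HHMM>.tar.gz
ALLDATA_RE = re.compile(r"allData_(?P<node>.+)_(?P<stamp>\d{8}_\d{4})\.tar\.gz")


def latest_per_node(paths):
    """Return {node: path}, keeping only the newest allData archive for each node."""
    best = {}
    for path in paths:
        m = ALLDATA_RE.fullmatch(os.path.basename(path))
        if not m:
            continue  # not a summary tarball (e.g. etcData_..., osData_...)
        node, stamp = m.group("node"), m.group("stamp")
        # Fixed-width digit stamps sort correctly as plain strings.
        if node not in best or stamp > best[node][0]:
            best[node] = (stamp, path)
    return {node: path for node, (_, path) in best.items()}


def find_upload_archives(directory="."):
    """Scan the collector's output directory for the files to upload."""
    return latest_per_node(glob.glob(os.path.join(directory, "allData_*.tar.gz")))


if __name__ == "__main__":
    for node, path in sorted(find_upload_archives().items()):
        print(f"{node}: upload {path}")
```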

Q: What command should I run if runInstaller fails?
A: Run the command: ./universallogcollector.pl --collect --install

Q: What command should I run to collect ASM/RAC trace files?
A: Run the command: /exports/universallogcollector/universallogcollector.pl --collect --excl "vendor, acfs, invt, crs"
   This skips the unrelated log collections and saves time.

Q: What command should I run to collect CRS trace files?
A: Run the command: /exports/universallogcollector/universallogcollector.pl --collect --excl "base, invt,home, vendor, acfs"

Q: What command should I run to check system information?
A: Run the command: ./universallogcollector.pl --collect --excl "crs,ocr,core,base,invt,home,vendor,acfs"
   This collects only system logs and system configuration data.

Q: I have several Oracle homes. Will the tool collect them for me?
A: Yes, the tool supports multiple Oracle homes. It checks the oraInventory files and, if
   multiple Oracle homes are detected, collects them one by one.
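The answers above can be kept as named recipes so they are typed the same way every time. A minimal Python sketch: the argument lists are copied verbatim from the answers, the task names are hypothetical labels, and the script is assumed to be in the current directory.

```python
import shlex

COLLECTOR = "./universallogcollector.pl"  # assumes you run from the tool's directory

# Argument lists copied from the Q&A above; the keys are hypothetical labels.
RECIPES = {
    "install": ["--collect", "--install"],
    "asm_rac": ["--collect", "--excl", "vendor, acfs, invt, crs"],
    "crs": ["--collect", "--excl", "base, invt,home, vendor, acfs"],
    "system": ["--collect", "--excl", "crs,ocr,core,base,invt,home,vendor,acfs"],
}


def command_for(task):
    """Return a shell-ready command line for one of the documented recipes."""
    try:
        args = RECIPES[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(RECIPES)}") from None
    # shlex.join adds quotes only where needed (e.g. lists containing spaces)
    return shlex.join([COLLECTOR, *args])


if __name__ == "__main__":
    for task in RECIPES:
        print(f"{task:8} {command_for(task)}")
```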

 

Example:

root@dbsrvr1/exports/sivtest>/exports/universallogcollector/universallogcollector.pl --collect
Production Copyright 2004, 2010, Oracle. All rights reserved.
Universal Log Collector tool Version 1.4

#########
The following diagnostic archives will be created in the local directory if it's not excluded.

etcData_dbsrvr1_20130910_0245.tar.gz -> oraInst.loc, oratab and /etc/oracle or /var/opt/oracle(platform dependent).
crshomeData_dbsrvr1_20130910_0245.tar.gz -> logs, traces and cores from CRS Home.
                                             Note: core files will be packaged only with the --core option.
ocrData_dbsrvr1_20130910_0245.tar.gz -> ocrdump, ocrcheck etc.
chmosData_dbsrvr1_20130910_0245.tar.gz -> Cluster Health Monitor (OS) data.
coreData_dbsrvr1_20130910_0245.tar.gz -> contents of CRS core files in text format.
osData_dbsrvr1_20130910_0245.tar.gz -> logs from Operating System.
baseData_dbsrvr1_20130910_0245.tar.gz -> logs from CRS Base & Oracle Base(s).
invtData_dbsrvr1_20130910_0245.tar.gz -> logs from Oracle installation log.
orahomeData_dbsrvr1_20130910_0245.tar.gz -> logs from Oracle Home(s) log.
sysconfig_dbsrvr1_20130910_0245.txt -> system config info for cpu, memory, swap, network and disks.
crsresStatus_dbsrvr1_20130910_0245.txt -> outputs from "crsctl stat res -t -f [-init]"
vendorData_dbsrvr1_20130910_0245.tar.gz -> vendor clusterware logs if present.
acfsData_dbsrvr1_20130910_0245.tar.gz -> logs from acfs log.
allData_dbsrvr1_20130910_0245.tar.gz -> a summary tarball for all above logs.
#########

Collecting CRS home data
Collecting information from core dump files
No corefiles found
Collecting OCR data
Collecting Etc Oralce data
cp: /var/opt/oracle/oprocd/check/port: Operation not supported on socket
cp: /var/opt/oracle/oprocd/stop/port: Operation not supported on socket
cp: /var/opt/oracle/oprocd/fatal/port: Operation not supported on socket
Collecting CRS base & Oracle base(s) data
CRS base not specified or invalid, will try to get correct CRS base
Get valid CRS base "/app/oracle" and will collect it.
Collecting Oracle home data from "/app/oracle/product/11.1.0/crs_1"
Collecting Oracle home data from "/app/oracle/product/11.1.0/db_1"
Collecting Oracle home data from "/app/oracle/product/11.1.0/asm_1"
Collecting Oracle home data from "/app/oracle/product/em/agent11g"
Collecting Oracle home data from "/app/oracle/product/11.2.0.3/client"
Collecting Oracle home data from "/app/oracle/product/11.2.0.3/db"
Collecting OS logs
Collecting Oracle installation logs
Collecting vendor cluster logs
tar: cannot open var/opt/cmom/cmomd.log
Collecting sysconfig data
Collecting CRS resource status
Done
#########Universal Log Collection Finished.#######

    Please upload ONLY allData_dbsrvr1_20130910_0245.tar.gz to Oracle Support!

root@usfsuad1/exports/universallogcollector/test_run/run2>/exports/universallogcollector/universallogcollector.pl -help
Production Copyright 2004, 2010, Oracle. All rights reserved.
Universal Log Collector tool Version 1.4

universallogcollector

    --collect
        [--crs] For collecting crs diag information.
        [--install] For collecting install logs when failed with installation special before root.sh.
        [--excl] Exclude specified logs, support crs, ocr, etc, base, home, sys, inv, acfs and vend.
        [--adr] For collecting diag information for ADR; specify ADR location.
        [--chmos] For collecting Cluster Health Monitor (OS) data.
        [--all] Default. For collecting all diag information.
        [--core] UNIX only. Package core files with CRS data.

        [--crshome] Argument that specifies the CRS Home location.
        [--orahome] Argument that specifies the RDBMS Home to collect DB traces and alert logs.
            delimited by ",", if empty, default to all RDBMS Homes
        [--chmoshome] Argument that specifies the location for collecting Cluster Health Monitor (OS) information.

        [--afterdate] UNIX only. Collects archives from the specified date. Specify in MM/DD/YYYY format.
        [--aftertime] Supported with -adr option. Collects archives after the specified time. Specify in YYYYMMDDHHMISS24 format.
        [--beforetime] Supported with -adr option. Collects archives before the specified date. Specify in YYYYMMDDHHMISS24 format.
        [--incidenttime] Collects Cluster Health Monitor (OS) data from the specified time. Specify in MM/DD/YYYY24HH:MM:SS format.
            If not specified, Cluster Health Monitor (OS) data generated in past 24 hours will be collected
        [--incidentduration] Collects Cluster Health Monitor (OS) data for the duration after the specified time. Specify in HH:MM format.
            If not specified, all Cluster Health Monitor (OS) data after incidenttime are collected

    NOTE:
        You can also do the following:
            universallogcollector.pl --collect --crs --crshome <CRS Home>
            universallogcollector.pl --collect --excl "crs, ocr, base, acfs"

    --clean        cleans up the diagnosability information gathered by this script

    --coreanalyze  UNIX only. Extracts information from core files and stores it in a text file
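The date and time options above expect several different formats. A hedged Python sketch that renders a datetime into each one. It reads "YYYYMMDDHHMISS24" as year, month, day, 24-hour hour, minutes (MI) and seconds with no separators, and "MM/DD/YYYY24HH:MM:SS" as the date immediately followed by a 24-hour time. Both readings are assumptions based on the help text, so check them against your version of the tool.

```python
from datetime import datetime, timedelta


def fmt_afterdate(dt: datetime) -> str:
    """--afterdate: MM/DD/YYYY"""
    return dt.strftime("%m/%d/%Y")


def fmt_adr_time(dt: datetime) -> str:
    """--aftertime / --beforetime: YYYYMMDDHHMISS24 (assumed: 24-hour clock, no separators)"""
    return dt.strftime("%Y%m%d%H%M%S")


def fmt_incident_time(dt: datetime) -> str:
    """--incidenttime: MM/DD/YYYY24HH:MM:SS (assumed: date directly followed by 24-hour time)"""
    return dt.strftime("%m/%d/%Y%H:%M:%S")


def fmt_duration(td: timedelta) -> str:
    """--incidentduration: HH:MM, rounded down to whole minutes"""
    minutes = int(td.total_seconds()) // 60
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


if __name__ == "__main__":
    when = datetime(2013, 9, 10, 2, 45)
    print(fmt_afterdate(when))                           # 09/10/2013
    print(fmt_adr_time(when))                            # 20130910024500
    print(fmt_incident_time(when))                       # 09/10/201302:45:00
    print(fmt_duration(timedelta(hours=1, minutes=30)))  # 01:30
```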

Alternatively, consider TFA (Trace File Analyzer).