Fault Tolerant Oracle Installation (ArsDigita Systems Journal)

Fault Tolerant Oracle Installation

Submitted on: 2000-06-03
Last updated: 2000-06-19

ArsDigita : ArsDigita Systems Journal : One article

An Oracle installation is fault tolerant if it can survive the failure of a major piece of hardware -- an entire computer, for example -- and continue to operate without significant data loss or downtime. There are several ways to achieve fault tolerance, each with a different price tag and each offering different degrees of reliability. The most reliable (and, not surprisingly, the most expensive) of these methods is Oracle Parallel Server (OPS), in which multiple computers each running a separate Oracle instance access the same database.
OPS offers transparent failover; the only thing a user would notice in the event of a failure would be rollback of any transactions in progress when the system failed. In addition to high availability, OPS has benefits such as scalability and high performance. For less critical services with smaller budgets, a standby database -- a second computer that perpetually loads the production database's archived redo logs -- is a good choice. Although human intervention is usually needed to activate a standby database, guaranteeing a downtime of at least a few minutes in a failure, it is a much less expensive option and much easier to set up and maintain than OPS. In addition, in Oracle8i, the standby database itself can be opened as a read-only database for reporting.
Below, I will describe how to set up a standby database system and an OPS installation for high availability. I will also take a close look at the hardware necessary to run OPS and look at the costs and benefits of some common hardware configurations. Finally, note that are at least three ways other than OPS and standby database to make an Oracle installation more reliable than a generic one-computer setup, including Replication and OS-based methods. Thomas Kyte runs down the pros and cons of all three, plus the two described above, in his article "Oracle Availability Options" in Oracle Magazine (http://www.oracle.com/oramag/oracle/00-May/o30tom.html).

Standby database

Overview
A standby database is one that serves as a dynamic backup of the production database ready to take over and become the production database when required. For that purpose, the production database is running in ARCHIVELOG mode, and the archived redo logs are periodically applied to the stand-by database. In order to apply the archived logs from the production database, the stand-by database is kept in recovery mode; it is readonly to the database users. Using a standby database is the best fault tolerant solution in a single instance environment, and its operation has a lot to do with Oracle's backup and recovery mechanisms. See Oracle's online documentation, "Oracle8i Standby Database Concepts and Administration" (http://oradoc.photo.net/ora816/server.816/a76995/toc.htm) for more information.
To understand how standby database works, we need to talk a little about Oracle's log archiving mechanism. Any Oracle database operates in one of two modes: NOARCHIVELOG or ARCHIVELOG. Redo logs are re-used and re-written in a cyclic manner. In NOARCHIVELOG mode, the redo log files are not archived before being re-used. If there are 3 redo log groups, and the log files are written to group 1, then group 2, then group 3, when the last group (group 3) is fully written the database writes redo information to group 1 again, overwriting its previous content. As a result, for a database operating in NOARCHIVELOG mode, it can only "rescue" the amount of transactions that are kept in the redo logs in case of instance failure. For a more detailed description, see Oracle's online documentation "How Oracle Keeps Records of Database Transactions" (http://oradoc.photo.net/ora816/server.816/a76993/intro.htm#422312). In ARCHIVELOG mode, however, the redo logs are archived before they are re-used, so a full history of the redo information is kept. This allows open database backup (hot backup), time-based recovery (http://oradoc.photo.net/ora816/server.816/a76993/performi.htm#14678) and more.
Standby database technology involves two databases on two separate machines running in ARCHIVELOG mode. The production database archives its redo logs into at least two destinations: its own storage, and the standby machine's storage. These two machines should have a good network connection because the archived redo logs are transferred from the production database to the standby database through Net8, Oracle's networking solution for distributed databases (http://oradoc.photo.net/ora816/network.816/a76933/toc.htm). The standby database is running in ARCHIVELOG mode, too. In addition, it is almost always in recovery status, with the database mounted but not open, allowing the archived redo logs received from the production database to be applied in a timely fashion. It is just like any ARCHIVELOG mode database undergoing whole database recovery. The Oracle8i Backup and Recovery Guide (http://oradoc.photo.net/ora816/server.816/a76993/toc.htm) has more details on this. In short, if you have a good understanding of how Oracle backup and recovery works in ARCHIVELOG mode, it is easy to imagine the role of the standby database. It is like a "clone" of the production database, but in constant recovery mode where redo logs are applied to the database.
The standby database feature first appeared in Oracle7, mainly as a way to quickly clone an existing database. In pre-8i releases, the archived redo log from the production site needed to be transferred to the standby machine manually, or through some automated methods implemented by the database administrator. Its use in fail-over in a single instance environment is greatly improved in Oracle8i, since the archived redo logs are transferred automatically by the database to the standby site through Net8. This is enabled by simply specifying a connect descriptor as one of the archive destinations on the production database's parameter file. A very good, step-by-step instruction on how to implement such a standby database is in an Oracle Magazine article, "Implementing an Automated Standby Database," by Roby Sherman (http://www.oracle.com/oramag/oracle/99-May/39or8i.html).

Implementation
Briefly, the following steps are necessary to build a standby database:
Set up the standby machine and make it closely resemble the production machine. This is very important because if the production machine dies and the database service fails over to the standby system, you want the same performance and behavior from the system.
Prepare the production database for the transfer of datafiles and redo log files:
Place the database is in a restricted mode before starting the whole procedure by issuing
ALTER SYSTEM ENABLE RESTRICTED SESSION;
After this only users with RESTRICTED SESSION privilege (usually DBAs) can connect to the database.
Force the current redo log to be archived, and then create a standby control file:
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER DATABASE CREATE STANDBY CONTROLFILE AS
'/full_path_to/standby_controlfile_name';
Shutdown the production database normally, and copy its datafiles, redo log files and archived redo log files to the corresponding directories of the standby machine.
Modify the initialization parameter file (init<SID>.ora) of the production database so that it can send the archived redo logs to the standby database through Net8. Essentially, this is done through specifying a Net8 connect string as the log archive destination (in addition to any existing local log archiving destinations):
log_archive_dest_2 = "service=stdby"
Here, "stdby" is the Net8 connect string pointing to the standby database. To make sure the connect string is valid, try "sqlplus <username>/<password>@stdby" to connect to the standby database.
Copy the standby control file (created on the production database before shutdown) to the standby machine, also copy the production database's init.ora file to the standby machine for editing:
Make sure the value of control_files contains the full pathnames to the standby control files copied from the production machine.
Comment out log_archive_dest_ entries that pertain to the production database, and include an entry for standby_archive_dest:
standby_archive_dest = /ora8/m01/app/oracle/admin/ora8/stdby_arch
This specifies a location for the standby database to receive the redo logs transferred from the production database.
Configure a log archive destination on the standby database's local filesystem, to be used when the standby database is activated:
log_archive_dest_1 = "location=/ora8/m01/app/oracle/admin/ora8/arch"
Start the standby database:
SQL> CONNECT SYS AS SYSDBA
SQL> STARTUP NOMOUNT PFILE=/full_path_to/initora8.ora
SQL> ALTER DATABASE MOUNT STANDBY DATABASE;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;
Start the production database.
Verify that the implementation is successful and working properly by examining the transferred log files at the directory specified by standby_archive_dest.
Activating the standby database
In case the production database goes down and cannot be fixed and brought up quickly, its standby database can be activated to serve as the new production database (and the original failed production database can be changed to be the new standby database, if the problems are fixed later). The details of activating a standby database are well documented in Oracle's online documentation. Please check the section "Activating a Standby Database" (http://oradoc.photo.net/ora816/server.816/a76995/standbyr.htm#9625) in "Oracle8i Standby Database Concepts and Administration."
An important point to remember is that before actually activating the standby database, always try to get the failed production database's redo log files, including: online redo logs that are not yet archived, and those archived but not yet transferred to the standby database. Try archiving the current online redo log by "ALTER DATABASE ARCHIVE LOG CURRENT," which is not always possible since the whole production machine may be down.

Oracle Parallel Server (OPS)
OPS is an "Oracle technology" that takes advantage of a cluster of computers and delivers superior database performance. Used for high availability and scalability, it has many benefits over single instance environments (http://oradoc.photo.net/ora816/paraserv.816/a76968/psintro.htm#10949). The most noticeable benefit is the ability to add additional nodes (instances) to an existing OPS environment for immediately enhanced performance. The improvements in database availability and scalability are transparent to the application layer -- applications run in a single instance environment will get the same results on OPS, without re-coding.
Normally a database is used by only one instance. OPS uses multiple instances to mount the same database, one on each of the nodes in a cluster. Many machines can be involved, but for the sole purpose of fault tolerant fail-over, usually two machines (two instances) are used. In a two-instance OPS configured for fail-over, only one of them (primary instance) accepts user connections. Once the primary instance fails, the secondary instance takes over. OPS makes zero down time a reality, because the database service will never fail and the data is consistently available. Only the transactions being performed at the failed node need to be resubmitted.
The most comprehensive documentation on all aspects of OPS is Oracle's online books, and there are three in Oracle8i dedicated to OPS:

Oracle8i Parallel Server Concepts (http://oradoc.photo.net/ora816/paraserv.816/a76968/toc.htm)
Oracle8i Parallel Server Setup and Configuration Guide (http://oradoc.photo.net/ora816/paraserv.816/a76934/toc.htm)
Oracle8i Parallel Server Administration, Deployment, and Performance (http://oradoc.photo.net/ora816/paraserv.816/a76970/toc.htm)

Basic OPS concepts
From the database's point of view, Oracle Parallel Server has these components:

Node, a server where an instance resides;
Cluster, a set of interconnected nodes and a disk system shared by all nodes;
Datafiles, data files of the tablespaces, physically residing on the shared disks and shared by all instances;
Controlfiles, the same controlfiles (on the shared disk system) are used to start and run the database.
Although all instances share the same set of datafiles and controlfiles, each instance has its own redo log files. Those files also reside on the shared disk system and readable to all instances. This is necessary so that each node can perform recovery on behalf of another node.
From the hardware side, OPS must have two or more computers linked together by an interconnect, and in order for those computers to share files they must have access to a shared disk subsystem (http://oradoc.photo.net/ora816/paraserv.816/a76968/pshwarch.htm#4270). This is normally implemented through uniform disk access for SMP systems and non-uniform disk access for MPP systems.
The server software of the OPS architecture has these components:

Cluster manager (http://oradoc.photo.net/ora816/paraserv.816/a76968/pssvarch.htm#7629), which provides an overview of the cluster and its nodes, and detects instance failure.
Distributed lock manager (http://oradoc.photo.net/ora816/paraserv.816/a76968/pssvarch.htm#7550), which coordinates the simultaneous access to the shared database.
The cluster interconnect and inter-process communication, the component that OPS relies on in order to run on multiple interconnected machines.
The disk subsystem, necessary for multiple instances to access the same database concurrently.
Finally, the components in an OPS system would look like the following diagram (more than two machines can be involved, though only two machines/instances are represented here):

A diagrammatic overview of OPS
Since OPS operates on a single set of datafiles, storage device failure could be disastrous even if there is not instance failure at the individual nodes. Hardware based mirroring is often used to maintain redundant media.

Two instance mode OPS for high availability
Oracle8i introduced the special two-instance OPS feature for fault tolerant fail-over, designed to achieve high availability of the database service. This is called a basic high availability configuration (http://oradoc.photo.net/ora816/paraserv.816/a76968/pshavdtl.htm#10874). This primary/secondary instance OPS setting can also serve as a transition path to an N-node configuration. The primary/secondary instance feature is enabled by setting the init.ora parameter active_instance_count to 1. Among the two instances, whichever starts up first assumes the role of primary instance, and the other becomes the secondary instance. If the primary instance fails, the cluster manager first detects it, and the secondary instance becomes the primary one. After the failed primary instance is fixed and started up again, it becomes the new secondary instance.
On the client side, the switch of connection to the secondary instance in case of primary instance failure is achieved through Transparent Application Failover (TAF) (http://oradoc.photo.net/ora816/paraserv.816/a76934/chap5.htm). The configuration is written into the tnsnames.ora file, which is used by the client to resolve a database connection descriptor (service name). For example:
ops.arsdigita.com = 
	(description = 
		(load_balance = off)
		(failover = on)
		(address = 
			(protocol = tcp)
			(host = adops1.arsdigita.com)
			(port = 1521)
		)
		(address = 
			(protocol = tcp)
			(host = adops2.arsdigita.com)
			(port = 1521)
		)
		(connect_data =
			(service_name = ops.arsdigita.com)
			(failover_mode =
				(type = select)
				(method = basic)
			)
		)
	)
The connect string "ops.arsdigita.com" is meaningful to Oracle database clients in specifying which database service to connect to. It is usually not a fully qualified domain name (FQDN), although it looks like one. Usually the "ops" part is provided as a valid connect string, such as in "sqlplus <username>/<password>@ops". The "load_balance" flag has to be off because only the primary instance is accepting client connections. By default, "failover" is on and does not need to be explicitly specified. The values for the "host" must be real host names or IP addresses.
On the server side, the listener.ora file needs to be changed so that the Oracle listener obtains database instance information through dynamic service registration (http://oradoc.photo.net/ora816/paraserv.816/a76934/glos.htm#1000215). This is done by simply removing the "SID_LIST_<listener_name>" section of the file. More details of configuring OPS high-availability features can be found from Oracle8i's online documentation (http://oradoc.photo.net/ora816/paraserv.816/a76934/chap5.htm#596946). For more information on Net8 configurations, read "Oracle8i Net8 Administrator's Guide" (http://oradoc.photo.net/ora816/network.816/a76933/toc.htm).

OPS configuration options using SUN Cluster
Oracle Parallel Server requires serious investment in the cluster hardware and software. The servers used in a two-node OPS can range from Sun Enterprise 2, 220R, 450 to 10000, and also the Netra t and t1 series. The available storage subsystems are Sun StorEdge MultiPack, A1000/D1000, A3500, A5x00 and Netra st A1000/D1000, etc. Available storage interconnects include 20MB/sec SCSI, Fast/Wide SCSI, 40MB/sec Ultra SCSI, and 100MB/sec Fibre Channel. Options for clustered nodes interconnect are 100MB/sec ethernet, gigabit ethernet, FDDI, and 1 GB/sec SCI (Scalable Coherent Interface).
On the software side, Oracle Enterprise Edition with Parallel Server Option is certainly needed (Parallel Server Option is licensed separately). Solaris 2.6 or 8 is needed on each of the two nodes, as well as Sun Cluster software licensed for each node. According to Sun customer service, Oracle 8.1.5 OPS is not supported in Solaris 8. The Cluster Volume Manager required for OPS is included in Sun Cluster software.
Calculating the cost of the software licensing in OPS can be tricky. For example, let's say you want to set up two-node OPS for high availability, and each computer has two 450 MHz processors. The "Universal Power Unit" (UPU, Oracle's way to determine the power of your system) is 1,800 (2 computers x 2 processors of each computer x 450 Mhz). Divide that by 30 (one named user license covers 30 UPU), and the number of licenses you need to buy is 60. From Oracle's online store, the "perpetual" price for Oracle8i Enterprise Edition is $750 per licensing unit, for the two-node system $45,000 is needed just for Oracle8i database server. The Parallel Server Option will add an extra $18,000 ($300 for each licensing unit), and the total is $63,000. Depending on how powerful the computers are in the parallel system, plan on around $100,000 for the Oracle software. The Sun Cluster software and operating systems will likely take another $100,000.
The hardware prices are also highly variable depending on actual configurations. OPS runs on a wide range of hardware. In the Solaris systems, one low-end 220R costs under $10,000, while the very high-end Enterprise 10000 server can go above $1,000,000. The "mid-range" Enterprise 4500 is $223,000 (list price as of May 2000). Also keep in mind that the more powerful the hardware, the higher the licensing charge for Oracle server software (as described above).
The costs of disk arrays vary on size and speed. A Netra st A1000/D1000 disk array (provides 36-216 GB) costs around $15,000, while an A3500 disk array (1,092 GB and up) is around $200,000. In short, plan on a total investment of $500,000 - $1,000,000 for a two-node OPS system, using mid-range Sun hardware. It is also possible to implement OPS on Windows NT systems. People have gotten it done at around $250,000, but it is not as stable and reliable.
OPS and standby database combined
Oracle Parallel Server combined with a standby database makes the system even more fault tolerant. Since each instance in OPS has its own set of redo log groups, it transfers its own archived redo logs to the standby database. The standby database can be at a remote location far away from the OPS cluster, but the nodes within an OPS cluster cannot be located too far away from each other.
There is an interesting post recently to the USENET newsgroup comp.database.oracle.server, and I think it provides an excellent case of OPS (plus standby database) in the real world. The conclusions:

OPS is a "proven" technology and has been around for a long time;
Disk device redundancy is very important to OPS;
The range of operation of the clustered nodes is limited (around one mile according to the post), so
OPS alone cannot protect your database service against intercontinental ballistic missiles, but
OPS coupled with standby database can.
---------------------------------------------------
Reply-To: "Bob Fazio" 
From: "Bob Fazio" 
Newsgroups: comp.databases.oracle.server
Subject: Re: Oracle Parallel Server vs Partioning for Standby server

Ditto,

I am using OPS and it has been around since Oracle 6.  We are
currently using it without any problems.  I do suggest though that OPS
not be your only 24x7 / high availability option.  OPS uses a shared
disk technology, and if the disks fail/array fail, then all instances
fail.  You can't use RAID 5, OPS doesn't support it.  You must mirror,
which I would suggest anyways.  Raid5 is just TOOOOOOOO!!!!! slow and
painful.  Suggestion:

4 Systems 4 arrays.  (2 each for the databases.  Mirrored across two
arrays).

2 more running as standby databases at another location.  The building
could always blow up, and OPS only works up to 1 or 2 kilometers.

This may sound like a lot, but in reality, the availability and
performance are much more reliable than the option that your friend
suggested.  The two standby systems, can be used in a pinch for
recovery of tables (no need for exports in a large database).  With 8i
they can be used for reports/read only.

No need for data replication. because everything is available from any
instance in the cluster.  Database integrity can be accomplished with
RI, it can't be done across multiple databases.

--
Robert Fazio, Oracle DBA
rfazio@home.com

asj-editors@arsdigita.com

Reader's Comments

I would be very careful before considering an OPS deployment. The applications where this level of availability is required are usually truly mission-critical business applications, such as a telecommunications company's billing system. Unfortunately, these applications usually also require very high levels of performance, and when the rubber meets the road, two servers running OPS are sometimes slower than a single server because of the lock synchronization overhead between the two machines.
Admittedly, my experience on the matter comes from a project to implement a dual Sun E10000 project with Solaris 2.6 and Oracle 7.3. We ended up using only one node, with the other one as a standby, something that could also be done cheaper with an active/standby cluster configuration (if we were prepared to have downtime equivalent to a database recovery operation). On the plus side, the system did meet the high level of availability and near-instant failover.
You should also be aware that OPS imposes significant constraints on what can or cannot be done in terms of administration or database features, and requires DBA of the very highest caliber to run correctly.
OPS has improved with Oracle 8, and it is now capable of running with the same performance as a single server setup as long as all write transactions are performed against a single node (but reads can be spread among the others).
Moral: OPS is a good solution for databases needing superlative levels of availability. If performance or operational costs are an issue, you should think hard about a "simple" active/standby clustering solution. If you are still considering OPS, you should at the very least ask Oracle to let you visit some of their reference clients, and secure consultants who have had operational experience in deploying OPS.

-- Fazal Majid, October 26, 2000

Fault Tolerant Oracle Installation

Standby database

Overview

Implementation

Activating the standby database

Oracle Parallel Server (OPS)

Basic OPS concepts

Two instance mode OPS for high availability

OPS configuration options using SUN Cluster

OPS and standby database combined

Reader's Comments

Related Links