Fault Tolerant Oracle Installation
by Xuequn Xu (firstname.lastname@example.org)
Submitted on: 2000-06-03
Last updated: 2000-06-19
ArsDigita : ArsDigita Systems Journal : One article
An Oracle installation is fault tolerant if it can survive the failure
of a major piece of hardware -- an entire computer, for example -- and
continue to operate without significant data loss or downtime. There
are several ways to achieve fault tolerance, each with a different
price tag and each offering different degrees of reliability. The most
reliable (and, not surprisingly, the most expensive) of these methods
is Oracle Parallel Server (OPS), in which multiple computers each
running a separate Oracle instance access the same database.
OPS offers transparent failover; the only thing a user would
notice in the event of a failure would be rollback of any transactions
in progress when the system failed. In addition to high availability,
OPS has benefits such as scalability and high performance. For less
critical services with smaller budgets, a standby database -- a second
computer that perpetually loads the production database's archived
redo logs -- is a good choice. Although human intervention is usually
needed to activate a standby database, guaranteeing a downtime of at
least a few minutes in a failure, it is a much less expensive option
and much easier to set up and maintain than OPS. In addition, in
Oracle8i, the standby database itself can be opened as a read-only
database for reporting.
Below, I will describe how to set up a standby database system and an
OPS installation for high availability. I will also take a close look
at the hardware necessary to run OPS and look at the costs and
benefits of some common hardware configurations. Finally, note that there
are at least three ways other than OPS and a standby database to make an
Oracle installation more reliable than a generic one-computer setup,
including replication and OS-based methods. Thomas Kyte runs down the
pros and cons of all three, plus the two described above, in his
article "Oracle Availability Options" in Oracle Magazine (http://www.oracle.com/oramag/oracle/00-May/o30tom.html).
Standby database
A standby database is one that serves as a dynamic backup of the
production database, ready to take over and become the production
database when required. For that purpose, the production database runs
in ARCHIVELOG mode, and the archived redo logs are periodically
applied to the standby database. In order to apply the archived logs
from the production database, the standby database is kept in recovery
mode; it is not open to database users.
Using a standby
database is the best fault tolerant solution in a single instance
environment, and its operation has a lot to do with Oracle's backup
and recovery mechanisms. See Oracle's online documentation, "Oracle8i
Standby Database Concepts and Administration"
for more information.
To understand how a standby database works, we need to talk a little
about Oracle's log archiving mechanism. Any Oracle database operates
in one of two modes: NOARCHIVELOG or ARCHIVELOG. Redo logs are
re-used and re-written in a cyclic manner. In NOARCHIVELOG mode, the
redo log files are not archived before being re-used. If there are 3
redo log groups, and the log files are written to group 1, then group
2, then group 3, when the last group (group 3) is fully written the
database writes redo information to group 1 again, overwriting its
previous content. As a result, a database operating in NOARCHIVELOG
mode can recover only the transactions still recorded in its online
redo logs in the event of an instance failure. For a more
detailed description, see Oracle's online documentation "How Oracle
Keeps Records of Database Transactions" (http://oradoc.photo.net/ora816/server.816/a76993/intro.htm#422312).
In ARCHIVELOG mode, however, the redo logs are archived before they
are re-used, so a full history of the redo information is kept. This
allows open database backup (hot backup), time-based recovery (http://oradoc.photo.net/ora816/server.816/a76993/performi.htm#14678),
and, as described below, a standby database.
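Switching a database into ARCHIVELOG mode can be sketched as the following SQL*Plus session, run as a DBA; this is an illustrative outline, not a substitute for the documentation cited above:

```sql
-- Check the current mode, then switch to ARCHIVELOG (a sketch).
-- The mode change requires a clean shutdown and a mounted database.
SELECT log_mode FROM v$database;

SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
```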
Standby database technology involves two databases on two separate
machines running in ARCHIVELOG mode. The production database archives
its redo logs into at least two destinations: its own storage,
and the standby machine's storage. These two machines should have a
good network connection because the archived redo logs are
transferred from the production database to the standby database through
Net8, Oracle's networking solution for distributed databases (http://oradoc.photo.net/ora816/network.816/a76933/toc.htm).
The standby database is running in ARCHIVELOG mode, too. In addition,
it is almost always in recovery status, with the database mounted but not
open, allowing the archived redo logs received from the production
database to be applied in a timely fashion. It is just like any
ARCHIVELOG mode database undergoing whole database recovery. The
Oracle8i Backup and Recovery Guide (http://oradoc.photo.net/ora816/server.816/a76993/toc.htm)
has more details on this. In short, if you have a good understanding
of how Oracle backup and recovery works in ARCHIVELOG mode, it is easy
to imagine the role of the standby database. It is like a "clone" of
the production database, but in constant recovery mode where redo
logs are applied to the database.
The standby database feature first appeared in Oracle7, mainly as a way to
quickly clone an existing database. In pre-8i releases, the archived
redo log from the production site needed to be transferred to the standby
machine manually, or through some automated methods implemented by the
database administrator. Its use in fail-over in a single
instance environment is greatly improved in Oracle8i, since the
archived redo logs are transferred automatically by the database to
the standby site through Net8. This is enabled by simply specifying a
connect descriptor as one of the archive destinations on the production
database's parameter file. A very good step-by-step guide to
implementing such a standby database is Roby Sherman's Oracle
Magazine article, "Implementing an Automated Standby Database."
Briefly, the following steps are necessary to build a standby database:
- Set up the standby machine and make it closely resemble the
production machine. This is very important because if the production machine
dies and the database service fails over to the standby system, you
want the same performance and behavior from the system.
- Prepare the production database for the transfer of datafiles and
redo log files:
- Place the database in restricted mode before starting
the whole procedure by issuing
ALTER SYSTEM ENABLE RESTRICTED SESSION;
After this only users with RESTRICTED SESSION privilege (usually DBAs)
can connect to the database.
- Force the current redo log to be archived, and then create
a standby control file:
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/full_path_to/stdby.ctl';
- Shut down the production database normally, and copy its
datafiles, redo log files and archived redo log files to the
corresponding directories of the standby machine.
- Modify the initialization parameter file (init<SID>.ora) of the
production database so that it can send the archived redo logs to the
standby database through Net8. Essentially, this is done through
specifying a Net8 connect string as the log archive destination (in
addition to any existing local log archiving destinations):
log_archive_dest_2 = "service=stdby"
Here, "stdby" is the Net8 connect string pointing to the standby
database. To make sure the connect string is valid, try "sqlplus
<username>/<password>@stdby" to connect to the standby database.
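Putting it together, the relevant production-side init.ora entries might look like the following sketch; the local archive path and log file-name format are illustrative, and "stdby" is the Net8 connect string from above:

```
# init.ora fragment on the production database (a sketch)
log_archive_start   = true
log_archive_dest_1  = "location=/full_path_to/arch"
log_archive_dest_2  = "service=stdby"
log_archive_format  = "arch_%t_%s.arc"
```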
- Copy the standby control file (created on the production database
before shutdown) to the standby machine; also copy the production
database's init.ora file to the standby machine for editing.
- Start the standby database:
SQL> CONNECT SYS AS SYSDBA
SQL> STARTUP NOMOUNT PFILE=/full_path_to/initora8.ora
SQL> ALTER DATABASE MOUNT STANDBY DATABASE;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;
- Start the production database.
- Verify that the implementation is successful and
working properly by examining the transferred log files at the
directory specified by standby_archive_dest.
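One simple way to exercise the transfer is to force a log switch on the production database and then confirm that a new archived log file appears under the standby's standby_archive_dest directory (a sketch):

```sql
-- On the production database: force a log switch so the current
-- redo log is archived and shipped to the standby via Net8
ALTER SYSTEM SWITCH LOGFILE;
```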
Activating the standby database
In case the production
database goes down and cannot be fixed and brought up quickly, its
standby database can be activated to serve as the new production
database (and the original failed production database can be changed
to be the new standby database, if the problems are fixed later). The
details of activating a standby database are well documented in
Oracle's online documentation. Please check the section "Activating a
Standby Database" (http://oradoc.photo.net/ora816/server.816/a76995/standbyr.htm#9625)
in "Oracle8i Standby Database Concepts and Administration."
An important point to remember is that before actually activating the
standby database, always try to get the failed production database's
redo log files, including: online redo logs that are not yet
archived, and those archived but not yet transferred to the standby
database. Try archiving the current online redo log with "ALTER
SYSTEM ARCHIVE LOG CURRENT;", which is not always possible since the
whole production machine may be down.
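Assuming all recoverable logs have been applied, the activation itself can be sketched as the following SQL*Plus session on the standby machine, per the Oracle8i documentation cited above:

```sql
-- On the standby database, connected as SYSDBA (a sketch):
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;  -- stop managed recovery
ALTER DATABASE ACTIVATE STANDBY DATABASE;                -- convert to a primary

SHUTDOWN IMMEDIATE
STARTUP                                                  -- open as the new production database
```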
Oracle Parallel Server (OPS)
OPS is an "Oracle technology" that takes advantage of a cluster of
computers and delivers superior database performance. Used
for high availability and scalability, it has many benefits over
single instance environments (http://oradoc.photo.net/ora816/paraserv.816/a76968/psintro.htm#10949).
The most noticeable benefit is the ability to add additional nodes
(instances) to an existing OPS environment
for immediately enhanced performance. The improvements in database
availability and scalability are transparent to the application layer --
applications that run in a single-instance environment will get the same
results on OPS, without re-coding.
Normally a database is used by only one
instance. OPS uses multiple instances to mount the same database,
one on each of the nodes in a cluster.
Many machines can be involved,
but for the sole purpose of fault tolerant fail-over, usually two
machines (two instances) are used. In a two-instance OPS configured
for fail-over, only one of them (primary instance) accepts user
connections. Once the primary instance fails, the
secondary instance takes over. OPS
makes zero down time a reality, because the database service
will never fail and the data is consistently available.
Only the transactions being performed at the failed node need to be
rolled back.
The most comprehensive documentation on all aspects of OPS is Oracle's
online books; three books in the Oracle8i documentation set are
dedicated to OPS.
Basic OPS concepts
From the database's point of view, Oracle Parallel Server has these
components:
- Node, a server where an instance resides;
- Cluster, a set of interconnected nodes and a disk system shared by
all the nodes;
- Datafiles, the data files of the tablespaces, physically residing on the
shared disks and shared by all instances;
- Controlfiles, the same controlfiles (on the shared disk system)
are used to start and run the database.
Although all instances share the same set of datafiles and
controlfiles, each instance has its own redo log files. Those files
also reside on the shared disk system and are readable to all
instances. This is necessary so that each node can perform recovery on
behalf of another node.
From the hardware side, OPS must have two or more computers linked
together by an interconnect, and in order for those computers to
share files they must have access to a shared disk subsystem (http://oradoc.photo.net/ora816/paraserv.816/a76968/pshwarch.htm#4270),
which is normally implemented through uniform disk access for SMP
systems and non-uniform disk access for MPP systems.
The server software of the OPS architecture has several components,
including the cluster manager and the integrated distributed lock
manager. Finally, the components in an OPS system would look like the
following diagram (more than two machines can be involved, though
only two machines/instances are represented here):
[Diagram: A diagrammatic overview of OPS]
Since OPS operates on a single set of datafiles, storage device
failure could be disastrous even if there is no instance
failure at the individual nodes. Hardware-based mirroring is often used
to maintain redundant media.
Two instance mode OPS for high availability
Oracle8i introduced the special two-instance OPS feature for fault tolerant
fail-over, designed to achieve high availability of the database service.
This is called a basic high availability configuration (http://oradoc.photo.net/ora816/paraserv.816/a76968/pshavdtl.htm#10874).
This primary/secondary instance OPS setting can also serve as a
transition path to an N-node configuration. The primary/secondary
instance feature is enabled by setting the init.ora parameter
active_instance_count to 1. Of the two instances,
whichever starts up first assumes the role of primary instance, and the
other becomes the secondary instance. If the primary instance fails,
the cluster manager first detects it, and the secondary instance
becomes the primary one. After the failed primary instance is fixed
and started up again, it becomes the new secondary instance.
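In init.ora terms, the setting described above is a one-line change on each node. The following is a minimal sketch; parallel_server must already be enabled for any OPS instance:

```
# init.ora fragment on each OPS node (a sketch)
parallel_server       = true   # run the instance in parallel server mode
active_instance_count = 1      # enable the primary/secondary configuration
```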
On the client side, the switch of connection to the secondary instance
in case of primary instance failure is achieved through Transparent
Application Failover (TAF) (http://oradoc.photo.net/ora816/paraserv.816/a76934/chap5.htm).
The configuration is written into the tnsnames.ora file, which is used
by the client to resolve a database connection descriptor (service
name). For example:
ops.arsdigita.com =
  (description =
    (address_list =
      (load_balance = off)
      (failover = on)
      (address =
        (protocol = tcp)
        (host = adops1.arsdigita.com)
        (port = 1521))
      (address =
        (protocol = tcp)
        (host = adops2.arsdigita.com)
        (port = 1521)))
    (connect_data =
      (service_name = ops.arsdigita.com)
      (failover_mode =
        (type = select)
        (method = basic))))
The connect string "ops.arsdigita.com" is meaningful to Oracle
database clients in specifying which database service to connect to. It
is usually not a fully qualified domain name (FQDN), although it
looks like one. Usually the "ops" part alone is provided as a valid
connect string, as in "sqlplus <username>/<password>@ops".
The "load_balance" flag has to be
off because only the primary instance is accepting client
connections. By default, "failover" is on and does not need to be
explicitly specified. The values for the "host" must be real host
names or IP addresses.
On the server side, the listener.ora file needs to be changed so that the
Oracle listener obtains database instance information through dynamic
service registration (http://oradoc.photo.net/ora816/paraserv.816/a76934/glos.htm#1000215).
This is done by simply removing the "SID_LIST_<listener_name>"
section of the file. More details of configuring OPS
high-availability features can be found in Oracle8i's online documentation.
For more information on Net8 configurations, read "Oracle8i Net8
Administrator's Guide" (http://oradoc.photo.net/ora816/network.816/a76933/toc.htm).
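For illustration, a minimal listener.ora on one node might then look like the following sketch (the listener name and host are illustrative; note the absence of a SID_LIST_LISTENER section, so instances register with the listener dynamically):

```
listener =
  (description =
    (address =
      (protocol = tcp)
      (host = adops1.arsdigita.com)
      (port = 1521)))
```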
OPS configuration options using SUN Cluster
Oracle Parallel Server requires serious investment in the cluster
hardware and software. The servers used in a two-node OPS can range
from Sun Enterprise 2, 220R, 450 to 10000, and also the Netra t and t1 series. The
available storage subsystems are Sun StorEdge MultiPack,
A1000/D1000, A3500, A5x00 and Netra st A1000/D1000, etc. Available storage interconnects
include 20MB/sec SCSI, Fast/Wide SCSI, 40MB/sec Ultra SCSI, and 100MB/sec
Fibre Channel. Options for the clustered-node interconnect are 100Mb/sec
Ethernet, Gigabit Ethernet, FDDI, and 1GB/sec SCI (Scalable Coherent
Interface).
On the software side, Oracle Enterprise Edition with Parallel Server
Option is certainly needed (Parallel Server Option is licensed
separately). Solaris 2.6 or 8 is needed on each of the two nodes, as
well as Sun Cluster software licensed for each node. According to Sun
customer service, Oracle 8.1.5 OPS is not supported on Solaris 8. The
Cluster Volume Manager required for OPS is included in the Sun Cluster
software.
Calculating the cost of the software licensing in OPS can be
tricky. For example, let's say you want to set up two-node OPS for
high availability, and each computer has two 450 MHz processors. The
"Universal Power Unit" (UPU, Oracle's way to determine the power of
your system) is 1,800 (2 computers x 2 processors of each computer x
450 MHz). Divide that by 30 (one named-user license covers 30 UPUs),
and the number of licenses you need to buy is 60. From Oracle's online
store, the "perpetual" price for Oracle8i Enterprise Edition is $750
per licensing unit, so for the two-node system $45,000 is needed just for
Oracle8i database server. The Parallel Server Option will add an extra
$18,000 ($300 for each licensing unit), and the total is
$63,000. Depending on how powerful the computers are in the parallel
system, plan on around $100,000 for the Oracle software. The Sun
Cluster software and operating systems will likely add to that as well.
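The licensing arithmetic above can be double-checked with a throwaway SQL*Plus query against the dummy table dual:

```sql
-- UPUs, named-user licenses, and total cost for two 2-CPU 450 MHz nodes
SELECT 2 * 2 * 450                      AS upu,            -- 1800
       2 * 2 * 450 / 30                 AS licenses,       -- 60
       (2 * 2 * 450 / 30) * (750 + 300) AS total_dollars   -- 63000
FROM dual;
```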
The hardware prices are also highly variable depending
on actual configurations. OPS runs on a wide range of hardware. Among
Solaris systems, one low-end 220R costs under $10,000, while the very
high-end Enterprise 10000 server can go above $1,000,000. The
"mid-range" Enterprise 4500 is $223,000 (list price as of May 2000).
Also keep in mind that the more powerful the hardware, the higher the
licensing charge for Oracle server software (as described above).
The costs of disk arrays vary with size and speed. A Netra st A1000/D1000
disk array (provides 36-216 GB) costs around $15,000, while an A3500
disk array (1,092 GB and up) is around $200,000. In short, plan on a
total investment of $500,000 - $1,000,000 for a two-node OPS system, using
mid-range Sun hardware. It is also possible to implement OPS on
Windows NT systems. People have gotten it done at around $250,000,
but it is not as stable and reliable.
OPS and standby database combined
Oracle Parallel Server combined with a standby database makes the
system even more fault tolerant. Since each instance in OPS has its
own set of redo log groups, it transfers its own archived redo logs to
the standby database. The standby database can be at a remote location
far away from the OPS cluster, but the nodes within an OPS cluster
cannot be located too far away from each other.
There was an interesting post recently to the USENET newsgroup
comp.databases.oracle.server, and I think it provides an excellent case
study of OPS (plus standby database) in the real world. The conclusions:
- OPS is a "proven" technology and has been around for a long time;
- Disk device redundancy is very important to OPS;
- The range of operation of the clustered nodes is limited (around
one mile according to the post), so
- OPS alone cannot protect your database service against
intercontinental ballistic missiles, but
- OPS coupled with standby database can.
Reply-To: "Bob Fazio"
From: "Bob Fazio"
Subject: Re: Oracle Parallel Server vs Partioning for Standby server
I am using OPS and it has been around since Oracle 6. We are
currently using it without any problems. I do suggest though that OPS
not be your only 24x7 / high availability option. OPS uses a shared
disk technology, and if the disks fail/array fail, then all instances
fail. You can't use RAID 5, OPS doesn't support it. You must mirror,
which I would suggest anyways. Raid5 is just TOOOOOOOO!!!!! slow and
4 Systems 4 arrays. (2 each for the databases. Mirrored across two
2 more running as standby databases at another location. The building
could always blow up, and OPS only works up to 1 or 2 kilometers.
This may sound like a lot, but in reality, the availability and
performance are much more reliable than the option that your friend
suggested. The two standby systems, can be used in a pinch for
recovery of tables (no need for exports in a large database). With 8i
they can be used for reports/read only.
No need for data replication. because everything is available from any
instance in the cluster. Database integrity can be accomplished with
RI, it can't be done across multiple databases.
Robert Fazio, Oracle DBA
I would be very careful before considering an OPS deployment. The applications where this level of availability is required are usually truly mission-critical business applications, such as a telecommunications company's billing system. Unfortunately, these applications usually also require very high levels of performance, and when the rubber meets the road, two servers running OPS are sometimes slower than a single server because of the lock synchronization overhead between the two machines.
Admittedly, my experience on the matter comes from a project to implement a dual Sun E10000 cluster with Solaris 2.6 and Oracle 7.3. We ended up using only one node, with the other one as a standby, something that could also be done more cheaply with an active/standby cluster configuration (if we were prepared to have downtime equivalent to a database recovery operation). On the plus side, the system did meet the high level of availability and near-instant failover.
You should also be aware that OPS imposes significant constraints on what can or cannot be done in terms of administration or database features, and requires DBAs of the very highest caliber to run correctly.
OPS has improved with Oracle 8, and it is now capable of running with the same performance as a single server setup as long as all write transactions are performed against a single node (but reads can be spread among the others).
Moral: OPS is a good solution for databases needing superlative levels of availability. If performance or operational costs are an issue, you should think hard about a "simple" active/standby clustering solution. If you are still considering OPS, you should at the very least ask Oracle to let you visit some of their reference clients, and secure consultants who have had operational experience in deploying OPS.
-- Fazal Majid, October 26, 2000
- Oracle9i Real Application Clusters Concepts - OPS becomes "RAC" (Real Application Clusters) in Oracle9i. It brings many changes and improvements to the design and implementation of fault tolerant database server installations. (contributed by Xuequn (Robert) Xu)
- bikle.com Managed Recovery Demo - This page demonstrates a few ideas
and techniques related to setting up a standby database, as well as
some methods for moving the standby database through a variety of
states: Manual Recovery Mode, Managed Recovery Mode, Open Read-Only
Mode, and, finally, Fail Over Mode.
(contributed by Dan Bikle)