Patch-ID# 104452-05

Keywords: solstice ha high availability HA 1.2 SUNWhagen patch
Synopsis: Solstice HA 1.2: SUNWhagen Patch
Date: Jun/09/98

Solaris Release: 2.5.1

SunOS Release: 5.5.1

Unbundled Product: Solstice High Availability

Unbundled Release: 1.2

Relevant Architectures: sparc

BugId's fixed with this patch: 4080543 4061900 4013636 4026400 4058087

Changes incorporated in this version: 4080543

Patches accumulated and obsoleted by this patch: 104719-02

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/etc/opt/SUNWhadf/hadf/hafmconfig
/opt/SUNWhadf/clust_progs/callmethod
/opt/SUNWhadf/fault_progs/net_pingnet
/opt/SUNWhadf/fault_progs/ha_dbms_call
/opt/SUNWhadf/fault_progs/ha_dbms_serv

Problem Description:

4080543: Fault monitors for database services are launched before the
         start_net method for the monitored services has even started,
         which causes undesirable restart/takeover conditions.

(From rev 04)

4061900: HA 1.3 does not fail over when public network is pulled.

         During a public net failure, a race condition exists between
         the faultd abort thread and the cmm abort transition thread
         when the cluster is running yp. Both threads race to grab the
         diskset, and the following failure results:

         "ERROR: tkown_disks: metaset -s -f -t failed: metaset: :
         /etc/opt/SUNWmd/lock.1: Resource temporarily unavailable."

         Setting the faultd abort timeout longer than that of clustd
         favors clustd and ensures that it wins the race.

(From rev 03)

4026400: This problem turned out to be a change in behavior in response
         to broadcast IP pings. Basically, the number of responses that
         this host itself generates in response to a broadcast ping
         from itself is, for each physical controller, 1 + the number
         of logical network interfaces currently ifconfig'ed up on that
         physical controller.
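
The counting rule in the 4026400 description can be illustrated with a
small shell sketch. The function name expected_self_replies is
hypothetical and is not part of this patch or of net_pingnet. Summing
the per-controller counts into one host-wide total is an assumption
made here for illustration; the description only states the count for
each physical controller.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with this patch.
# Each argument is the number of logical network interfaces currently
# ifconfig'ed up on one physical controller.
expected_self_replies() {
    total=0
    for logical_up in "$@"; do
        # Per controller: 1 reply + one reply per logical interface.
        total=$((total + 1 + logical_up))
    done
    echo "$total"
}

# Example: one controller with 2 logical interfaces up, one with none.
expected_self_replies 2 0    # prints 4
```

With a single controller and no logical interfaces, the function
returns 1, which matches the "1 + 0" case of the rule.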

4058087: CMM calls processor_bind() only on 'sun4d' arch; subsequently
         all its children inherit the same CPU affinity. This behavior
         negatively impacts any program started from a cluster
         transition.

(From rev 02)

         Includes copyright file which was omitted from revision 01.

(From rev 01)

4013636: layered Data Services run in Real Time priority breaking
         Solaris paging

         Solstice HA runs its upcalled methods, which fire up layered
         Data Services, in the RealTime (RT) scheduling class. This
         effectively means, for example, that all of Oracle runs in
         the Real Time class. As a result, certain Solaris housekeeping
         processes never get to run, notably fsflush, which flushes
         pages out of main memory to disk.

         With this patch, all data service methods (and their child
         processes) run in the Timeshare (TS) scheduling class; some
         Solstice HA control processes still run in the Real Time
         class.

Patch Installation Instructions:
--------------------------------
Refer to the Install.info file for instructions on using the generic
'installpatch' and 'backoutpatch' scripts provided with each patch.
Any other special or non-generic installation instructions should be
described below as special instructions.

Special Install Instructions:
-----------------------------
To install this patch, follow the steps below:

1. Stop HA on the local host (the server to be patched first).
        # hastop
2. Run installpatch to install the patch on the local host.
3. Start HA on the local host.
        # hastart
4. Repeat from step 1 on the sibling host.
5. Switch data service back to sibling host.
        # haswitch phys-hahost2 hahost2

There are no known problems with *initial* installation of this patch.
However, if you install this patch using installpatch, then remove it
using backoutpatch, and then install it once again using installpatch,
you may see error messages like the following from installpatch:

    ...
    Generating list of files to be patched...
    ./installpatch[77]: syntax error at line 6 : `'' unmatched
    mv: cannot access /tmp/resolvedfiles.5285
    Verifying sufficient filesystem capacity (exhaustive method)...
    ...

It appears that the patch still installs correctly after these error
messages are output. For current status on this problem, reference
Sun bugid 4022870.
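
The per-host part of the special install instructions (steps 1 to 3)
can be sketched as a dry run that only prints the commands. The
function name patch_host_plan, the PATCH_DIR variable and its default
value are hypothetical, and the installpatch invocation shown is an
assumption; Install.info remains the authority for the exact usage.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for one host, runs nothing.
# PATCH_DIR is a hypothetical location of the unpacked patch.
PATCH_DIR=${PATCH_DIR:-/var/tmp/104452-05}

patch_host_plan() {
    echo "hastop"                              # step 1: stop HA
    echo "cd $PATCH_DIR && ./installpatch ."   # step 2: assumed usage
    echo "hastart"                             # step 3: start HA
}

patch_host_plan
```

The same plan applies to the sibling host (step 4), followed by the
haswitch command from step 5 to move the data service back.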