Patch-ID# 116468-09
Keywords: disk queue rdc remote mirror logical host sndr logging
Synopsis: Availability Suite 3.2 SNDR Patch
Date: Jun/08/2005


Install Requirements: Reboot after installation
                      Install in Single User Mode

Solaris Release: 8 9

SunOS Release: 5.8 5.9

Unbundled Product: Sun StorEdge Availability Suite

Unbundled Release: 3.2

Xref: 

Topic: 

Relevant Architectures: sparc
NOTE:
After applying patch 116468-03 on both primary and secondary servers and rebooting, you must perform a full synchronization on all Availability Suite Remote Mirror asynchronous sets to ensure that the data on the secondary volumes is consistent with the primary data volumes.
For instructions on performing a full synchronization (sndradm -m), refer to the Sun StorEdge Availability Suite 3.2 Remote Mirror Software Administration and Operations Guide (817-2784-10).
For configurations where network latency and dataset size make a full synchronization prohibitive, the secondary may be synchronized with the primary using tape-based backup and restore, coupled with sndradm -E.
NOTE:
Problem Statement:
In a Sun Cluster OE, when Remote Mirror is used in combination with Point-in-Time Copy to establish an ndr_ii pair for use during auto-synchronization, the Point-in-Time Copy set should be pre-enabled by the system administrator rather than dynamically enabled by the SNDR auto-synchronization daemon. Failure to do so may cause the SNDR-configured Sun Cluster resource group to hang during failover processing.
Please see BugId:5094206 or SRDB:77917 for a detailed description.
Resolution:
To prevent the Sun Cluster resource group hang, pre-enable the Point-in-Time Copy set that is to be used by the SNDR synchronization daemon before turning on SNDR's auto-synchronization (sndradm -a on) and before enabling an SNDR ndr_ii pair (sndradm -I a ....).
Repair:
If an existing Sun Cluster configuration containing an SNDR lightweight resource group with an ndr_ii pair appears to be hung, identify and terminate the Solaris process that is running the following script:
    /usr/opt/SUNWesm/cluster/sbin/reconfig
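
One possible way to locate the process running the script named in the Repair note is to filter `ps -ef` output for the script path. The following is a minimal sketch, not a Sun-supplied procedure; the helper name `reconfig_procs` is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Script path taken from the Repair note.
RECONFIG=/usr/opt/SUNWesm/cluster/sbin/reconfig

# Hypothetical helper: read `ps -ef` style output on stdin and print the
# lines that mention the reconfig script, ignoring the grep process itself.
reconfig_procs() {
    grep "$RECONFIG" | grep -v grep
}

# List candidate processes; the PID is the second column of `ps -ef` output.
if type ps >/dev/null 2>&1; then
    ps -ef | reconfig_procs || echo "no process is running $RECONFIG"
fi
```

After confirming that a listed PID really belongs to the hung script, it can be terminated with `kill <pid>`.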

BugId's fixed with this patch: 4892753 4914957 4930424 4938202 4940318 4942385 4942997 4943413 4950370 4950802 4952176 4952178 4952920 4957445 4962068 4967629 4970042 4974911 4976889 4977645 4981223 4993281 4995602 4997398 5000951 5004765 5007944 5009144 5010349 5013414 5013757 5014238 5014239 5015987 5018806 5022892 5027558 5034369 5037654 5038271 5038552 5040685 5041365 5049952 5050438 5075457 5077630 5086741 6173700 6173736 6204207 6218008 6222650 6223102 6245800 6267284 6276243

Changes incorporated in this version: 6276243

Patches accumulated and obsoleted by this patch: 

Patches which conflict with this patch: 

Patches required with this patch: 116466-06 (or greater)
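
Before installing, it may help to confirm that the prerequisite patch is present at revision 06 or later. The sketch below assumes the installed patch ID has already been obtained (for example from `showrev -p` output); the function name `patch_rev_ok` is hypothetical and not part of any Sun tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if patch ID $1 has the same base number as
# the required ID $2 and a revision greater than or equal to it.
patch_rev_ok() {
    have_base=`echo "$1" | cut -d- -f1`
    have_rev=`echo "$1" | cut -d- -f2`
    want_base=`echo "$2" | cut -d- -f1`
    want_rev=`echo "$2" | cut -d- -f2`
    [ "$have_base" = "$want_base" ] && [ "$have_rev" -ge "$want_rev" ]
}

# Example: an installed 116466-07 satisfies the 116466-06 requirement.
if patch_rev_ok "116466-07" "116466-06"; then
    echo "prerequisite satisfied"
else
    echo "prerequisite missing or too old"
fi
```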

Obsoleted by: 

Files included with this patch: 

/usr/kernel/drv/rdc-5.8
/usr/kernel/drv/rdc-5.9
/usr/kernel/drv/sparcv9/rdc-5.8
/usr/kernel/drv/sparcv9/rdc-5.9
/usr/kernel/misc/rdcsrv-5.8
/usr/kernel/misc/rdcsrv-5.9
/usr/kernel/misc/sparcv9/rdcsrv-5.8
/usr/kernel/misc/sparcv9/rdcsrv-5.9
/usr/lib/mdb/kvm/rdc.so
/usr/lib/mdb/kvm/sparcv9/rdc.so
/usr/opt/SUNWesm/SUNWrdc/man/man1rdc/sndradm.1m
/usr/opt/SUNWesm/SUNWrdc/sbin/sndradm
/usr/opt/SUNWesm/SUNWrdc/sbin/sndrboot
/usr/opt/SUNWrdc/lib/sndrd-5.8
/usr/opt/SUNWrdc/lib/sndrd-5.9
/usr/opt/SUNWrdc/lib/sndrsyncd
/usr/opt/SUNWscm/lib/librdc.so.1-5.8
/usr/opt/SUNWscm/lib/librdc.so.1-5.9

Problem Description:

6276243 AS3.2 + latest patches: SNDR with disk queue, scswitch -z -g hangs.
 
(from 116468-08)
 
6267284 AVS3.2 + 116466-06, 116467-07, 116468-07 - SNDR suspend code for disk queue hangs.
 
(from 116468-07)
 
6245800 nskernd ( looping in _rdc_sync ) consumes excessive cpu cycles during sndr update sync
6223102 AVS3.2 latest patches on SC31u4: sndradm -P hang, ii_boot resume failed
6222650 writes in disk queue does not get applied to the secondary when in REP state
6218008 AS3.2 + latest patches: SNDR with disk queue + ndrii, scswitch -z -g hangs.
 
(from 116468-06)
 
6173736 SNDR 3.2 - Notice of pending IOs printed at system shutdown
6204207 failed diskq disk hangs mount on boot
 
(from 116468-05)
 
5075457 synchronous writes should happen until all members of group are done syncing
5086741 corrupted dscfg configuration database panic-ed solaris
6173700 sndradm -B dumps core
 
(from 116468-04)
 
4976889 unable to delete SNDR set when logical host can't be found
5022892 enhance sndradm ds.log entries for TUNABLES and HEALTH
5027558 sndradm man page missing -R r (role reverse) usage and description
5034369 sndradm (-u) (-m) entries missing from ds.log
5037654 sndr dropped into logging with almost empty queue
5040685 deleting an ndr_ii config entry via sndradm -I d is not recorded in ds.log file
5049952 sndradm -h set: usage statement missing diskq parameter
5050438 sndradm -C not checking validity of cluster tag when adding disk queue to set
5077630 Deadlocks when {sndr/ii/sv}adm and {sndr/ii/sv}boot are invoked in Sun Cluster
 
(from 116468-03)
 
4940318 Add logic to support the use of aliases for host or logical host
5010349 sndr bitmaps in one to many not getting updated
5013414 failed enable of a sync set with a disk q not atomic
5013757 diskq block/noblock operations not reported in ds.log
5014238 sndr should dump diskq if queue is full + link down
5014239 sndradm man page needs info on queuing state
5015987 update sync of async sets can drop network writes leaving secondary out of sync
5018806 cmn_err() needed when ref count is maxed out
5038271 diskq failure causes application to hang
5038552 disk queue not getting written when queueing
5041365 SNDR 3.2 Unit tests fail (GroupOrderedWrites)
 
(from 116468-02)
 
4914957 lock contention for disk queues limit performance
4930424 enabling sndr with a diskqueue of 1TB or greater should fail
4938202 sndradm can be very slow when enabling more than 1500 RM sets
4942385 Long volume names cause warning messages to be cut off
4942997 sndr: sndradm unknown host:vol printed in ds.log
4943413 cluster failover during reverse sync makes mounted volume unusable
4950370 sndradm -A #threads[sndr-set] fails to report # to /var/opt/SUNWesm/ds.log
4950802 sndr bitmap count does not show that bits are set until sync or reboot
4952178 misleading disable message on timeout
4952176 iokstats broken
4952920 NHAS bitmap api can panic with 8k bitmaps
4957445 r_net_writeN should negative ack if secondary is logging
4967629 rdc_error_str is local, should be global
4970042 BAD TRAP: panic AVS 3.2 patch testing
4974911 sndradm help output missing a space for diskq removal
4977645 sndradm -e fails on 2'nd logical host
4981223 sndr async mode with many sets sharing a disk queue eats up cpu
4892753 flusher get stuck with diskq set to blocking mode and heavy I/O
4993281 Availability Suite 3.2 using sndr causes system hangs
4995602 double dec in _rdc_remote_flush() can access freed mem
4997398 failure removing diskq from multiple resource groups in SunCluster
5004765 writes to RM vols with full diskq causes incoming threads to be block
5000951 _rdc_async_throttle needs to print disk queue full message
5007944 Data replication on middle hop of multihop config fails due to overlapping i/o
5009144 one to many with diskq and memory queue may not queue
 
(from 116468-01)
 
4962068 disk queue upgrade results in 'WARNING: disk queue <name> alloc failed(28)'

Patch Installation Instructions:
----------------------------- 
Because this patch updates kernel modules, boot the system in single-user
mode, apply the patch, and then reboot the system.

Special Install Instructions:
--------------------------

None.

README -- Last modified date:  Wednesday, June 8, 2005

