Patch-ID# 108475-03
Keywords: VxFS 3.3.2
Synopsis: Veritas File Systems 3.3.2: VxFS 3.3.2patch02
Date: Feb/07/2001

Solaris Release: 7

SunOS Release: 5.7

Unbundled Product: Veritas VxFS

Unbundled Release: 3.3.2

Xref: 

Topic: VxFS 3.3.2 Multiple fixes patch

Relevant Architectures: sparc

BugId's fixed with this patch: 4370959
NOTE: VERITAS incident IDs fixed with this patch: 26934 33670 34381 34901 35546 37373 38070 38578 39067 39068 39069 39587 41052 41685 42612 27294 30340 31636 31948 31957 32226 32652 33143 33246 33299 33408 33629

Changes incorporated in this version: 

Patches accumulated and obsoleted by this patch: 

Patches which conflict with this patch: 

Patches required with this patch: 107171-04 or greater

Obsoleted by: 

Files included with this patch: 

/etc/fs/vxfs/mount
/kernel/drv/fdd
/kernel/drv/sparcv9/fdd
/kernel/drv/sparcv9/vxportal
/kernel/drv/vxportal
/kernel/fs/sparcv9/vxfs
/kernel/fs/vxfs
/usr/lib/fs/vxfs/bin/cp
/usr/lib/fs/vxfs/bin/ln
/usr/lib/fs/vxfs/bin/mv
/usr/lib/fs/vxfs/df
/usr/lib/fs/vxfs/fsadm
/usr/lib/fs/vxfs/fsck
/usr/lib/fs/vxfs/fstyp
/usr/lib/fs/vxfs/mkfs
/usr/lib/fs/vxfs/ncheck
/usr/lib/fs/vxfs/vxdump
/usr/lib/fs/vxfs/vxquotaoff
/usr/lib/fs/vxfs/vxquotaon
/usr/lib/fs/vxfs/vxrestore
/usr/lib/fs/vxfs/vxupgrade
/usr/sbin/qioadmin
/usr/sbin/qiomkfile
/usr/share/man/man1/cpio_vxfs.1
/usr/share/man/man1/mv_vxfs.1
/usr/share/man/man1/qioadmin.1
/usr/share/man/man1/qiostat.1
/usr/share/man/man1m/df_vxfs.1m
/usr/share/man/man1m/ff_vxfs.1m
/usr/share/man/man1m/fsadm_vxfs.1m
/usr/share/man/man1m/fsck_vxfs.1m
/usr/share/man/man1m/fsdb_vxfs.1m
/usr/share/man/man1m/mount_vxfs.1m
/usr/share/man/man1m/vxdump.1m
/usr/share/man/man1m/vxrestore.1m
/usr/share/man/man1m/vxtunefs.1m
/usr/share/man/man1m/vxupgrade.1m
/usr/share/man/man4/fs_vxfs.4
/usr/share/man/man4/inode_vxfs.4
/usr/share/man/man7/vxfsio.7

Problem Description:

(26934) Multi-threaded sequential reads were very slow.
(33670) The fsck command could not perform a log replay.
(34381) The fsck command went into an infinite loop.
(34901) System panicked trying to access attributes on bad inodes.
(35546) The qioadmin command could not correctly parse the configuration file.
(37373) Data corruption occurred when VxFS accessed memory-mapped files.
(38070) The "cp -e" command could not copy files larger than two gigabytes.
(38578) The mount and mkfs commands were taking a long time to execute.
(39067) fsck replay did not completely clean corrupted file systems.
(39068) A full file system check (fsck -n) was not detecting DIRTY file systems.
(39069) File system was corrupted after shrinking, and fsck could not repair it.
(39587) The fsck command marked some inodes as being sparse when they were not.
(41052) The qioadmin(1) man page required updated information.
(41685) When trying to shrink a file system, the underlying volume was successfully shrunk, but the file system was not.
(42612) System panics occurred due to incorrect calculation of page list array.
 
(from 108475-02)
 
4370959 Vxfs 3.3.2 patches change the owner of vxdump/vxrestore binaries
 
 
(from 108475-01)
 
(27294) NFS performance problems due to slow synchronous putpage.
(30340) The fsck utility executed very slowly.
(31636) fsck--au_state_set corrupted memory and failed to set the new state for an allocation unit (AU).
(31948) The fsck and fsdb commands assumed that the IAU header and summary were contiguous.
(31957) The vxdump utility used the time stored in the /etc/dumpdates file instead of the snapshot creation time.
(32226) A race condition occurred between the iremove and vx_hsm_iptohandle functions.
(32652) A panic occurred while trying to free a non-existing extent.
(33143) VxFS caused the segvn_create function to ignore MAP_NORESERVE.
(33246) Inodes became invalid after doing an unclean shutdown subsequent to growing the file system.
(33299) The fsck utility dumped core on bad d_reclen in a corrupt directory block.
(33408) VxFS 3.3.2 hang--the umountall thread blocked on clone removal.
(33629) The getacl command did not work correctly.

Patch Installation Instructions:
--------------------------------
For Solaris 2.0-2.6 releases, refer to the Install.info file and/or
the README within the patch for instructions on using the generic
'installpatch' and 'backoutpatch' scripts provided with each patch.
 
For Solaris 7 release, refer to the man pages for instructions on
using 'patchadd' and 'patchrm' scripts provided with Solaris.
Any other special or non-generic installation instructions should be
described below as special instructions.  The following example
installs a patch to a standalone machine:
 
       example# patchadd /var/spool/patch/104945-02
 
The following example removes a patch from a standalone system:
 
       example# patchrm 104945-02
 
For additional examples please see the appropriate man pages.

Special Install Instructions:
-----------------------------
This patch requires that the SUN patchid 107171-04 be installed on the
target system before installing this patch.
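
The prerequisite above can be checked before running patchadd. The
following is a minimal sketch, not part of the patch: has_patch_rev is
a hypothetical helper that reads a patch listing in the "Patch: ID-REV"
form printed by 'showrev -p' and succeeds if the given patch is present
at the given revision or higher. It assumes a POSIX-style shell such as
ksh, not the original Bourne shell.

```shell
# Sketch: succeed (exit status 0) if the patch listing on stdin contains
# the given base ID at the given revision or higher.
# has_patch_rev is a hypothetical helper name, not a Solaris command.
has_patch_rev() {
    base=$1       # patch base ID, e.g. 107171
    min_rev=$2    # minimum revision, e.g. 04
    while read -r tag id rest; do
        [ "$tag" = "Patch:" ] || continue      # skip unrelated lines
        case $id in
            "$base"-*)
                rev=${id#*-}                   # "107171-04" -> "04"
                [ "$rev" -ge "$min_rev" ] && return 0
                ;;
        esac
    done
    return 1
}
```

It might be used as follows:

       example# showrev -p | has_patch_rev 107171 04 && echo "prerequisite found"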
 
 
Additional Notes:
-----------------
The VxFS 3.3.2patch02 consists of fixes made since the VxFS 3.3.2 GA
release. This patch release also contains all the fixes from the VxFS
3.3.2patch01 release (see the 3.3.2patch01 contents, included below, for
a description of the incidents), so you can upgrade your VxFS 3.3.2
software without installing VxFS 3.3.2patch01 first.
 
This patch can be applied only to VxFS Release 3.3.2. If you have an
earlier release of VxFS installed, upgrade to 3.3.2 before applying
this patch.
 
If you plan to use the VERITAS File System with the Quick I/O for
Databases feature, both the VRTSvxfs and VRTSqio packages must be
installed before installing this patch.
 
If this patch is installed when only VRTSvxfs is installed, and
you later want to use the Quick I/O feature, first remove this patch,
install the VRTSqio package, then reinstall this patch.
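
The order of operations for adding Quick I/O after the patch is already
installed can be sketched as a dry run. This is an illustration only:
plan_qio_install is a hypothetical helper that prints the commands and
executes nothing, and PATCH_DIR and PKG_DIR are assumed locations that
will differ from site to site.

```shell
# Sketch: print, without running, the command sequence for adding the
# VRTSqio package to a system where this patch is already installed.
# PATCH_DIR and PKG_DIR are assumed locations, not taken from the patch.
PATCH_ID=108475-03
PATCH_DIR=${PATCH_DIR:-/var/spool/patch}
PKG_DIR=${PKG_DIR:-/var/spool/pkg}

plan_qio_install() {
    echo "patchrm $PATCH_ID"                # 1. remove this patch
    echo "pkgadd -d $PKG_DIR VRTSqio"       # 2. install the Quick I/O package
    echo "patchadd $PATCH_DIR/$PATCH_ID"    # 3. reinstall this patch
}
```

Review the printed commands before running them by hand.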
 
 
Patch 108475-02 contents:
There were 15 major VxFS escalated incidents. They are described below.
 
1) VERITAS Incident 26934
 
	VxFS showed poor performance when doing multithreaded reads on
	the same file.  Code changes significantly improved throughput
	over single-threaded reads.
 
2) VERITAS Incident 33670
 
	While doing a file truncation operation, a panic occurred because
	VxFS failed to detect invalid indirect address extents and mark 
	the inode BAD.
 
3) VERITAS Incident 34381
 
	The fsck command went into an infinite loop after being run on
	a file system because it was not correctly validating the FS
	structural inodes. The file system could not be repaired or
	remounted afterward.
 
4) VERITAS Incident 34901
 
	After a disk error, the VxFS validation check on inodes read
	from the disk failed, marking the inodes BAD. When an inode
	became inactive (all holds on the inode released), the VxFS
	inactive processing code should no longer have relied on the
	inactive inode's data.  However, it still referenced the
	attribute area of the BAD inode, causing invalid accesses 
	to unrelated inodes and subsequent panics.
 
5) VERITAS Incident 35546
 
	The qioadmin command did not function as documented.
	mount_point was made an optional argument which, when specified
	with the -s option, selects a device and associated files from
	the configuration file.
 
	Specifying the -s option without specifying a mount_point
	selects all the devices and associated files in the
	configuration file.
 
	Without the -s option, mount_point is simply a prefix to a
	relative pathname.
 
6) VERITAS Incident 37373
 
	Data corruption occurred on memory mapped files because VxFS
	inadvertently zeroed the end of the file when the file size was
	not a multiple of the 1024-byte fragment size.
 
7) VERITAS Incident 38070 
 
	The "cp -e force" command could not copy files larger than two 
	gigabytes. The command failed with an "invalid argument" message.
 
8) VERITAS Incident 38578 
 
	The mount and mkfs commands were slow because VxFS was doing a
	hardware check with the prtconf command. The sysinfo() system
	call was substituted, improving execution times.
 
9) VERITAS Incident 39067
 
	After a crash and an apparently successful fsck, a file system
	was remounted, but inodes were marked BAD and error messages
	began appearing in the system log (/var/adm/messages).  This
	problem occurred on systems that crashed after a resize or
	reorg because VxFS was not adequately processing extended
	operations.
 
10) VERITAS Incident 39068
 
	Full file system checks were not detecting DIRTY file systems.
	This processing failure led to corruption on file systems that
	required extended operations, for example, those that were
	resized.
 
11) VERITAS Incident 39069
 
	fsck could not repair a resized file system because the free
	extent map was corrupted, and fsck cannot rebuild free extent
	bitmap inodes. fsck will now indicate that the file system is
	unfixable if it encounters this kind of corruption.
 
12) VERITAS Incident 39587
 
	In some situations, for example when growing a very full file
	system, a partially full indirect address extent was created.
	If this single indirect address extent split into
	two, one extent could be empty. A subsequent fsck replay
	interpreted the inode as being sparse and marked it bad,
	requiring a full fsck.
 
13) VERITAS Incident 41052
 
        The qioadmin(1) man page was updated with information on the
        initial state of Cached Quick I/O on file systems, and
        information on how Cached Quick I/O is enabled with the
        vxtunefs command.  Also, a description of the interaction of
        the mount_point option with the configuration file was added.
	See Incident 35546.
 
14) VERITAS Incident 41685
 
        When trying to shrink a file system, the underlying volume
        was successfully shrunk, but the file system was not. This was
        because fsadm was comparing the file mode bits with an
        incorrect constant on directories with the setuid bit set.
 
15) VERITAS Incident 42612
 
	The vx_write_alloc() function was incorrectly calculating
	the size of the page list array, causing a system panic.
 
 
Patch 108475-01 contents:
 
1) VERITAS Incident 27294
 
   This fix restores about 75% of the performance loss in configurations
   that were used by VERITAS to measure performance.
 
   The slowdown was most noticeable in the following configuration: a
   Solaris NFS client writing sequentially, a Solaris server with a lot of
   memory (several gigabytes), and a 100BaseT connection between them. There
   was also a 6-stripe (5 data + parity) RAID-5 disk on the server, having a
   stripe unit size of 64K.
 
   Symptoms included greatly reduced throughput from the client to server,
   apparent momentary hangs on the server, and occasionally NFS timeout
   messages from the client.
 
   Performance was good so long as the sequential writes from the client
   application were put onto the network in approximately sequential order
   by the client OS. The application thread put the write request onto a
   queue inside the client kernel, which was serviced by an NFS thread. When
   the queue became full, the application thread sent out the I/O itself,
   scrambling the write order.
 
   The server received the scrambled writes, and performed unnecessary
   synchronous writes under two circumstances (corresponding to the two
   fixes described above). The synchronous writes were responsible for most
   of the slowdown.
 
   If this problem occurred on your system, install this patch.  Other
   things you can try to address the problem:
 
   a) Check with Sun Microsystems to see if there is a patch or tune that
      causes the client to maintain write ordering.
 
   b) Preallocate the file that the client is writing. If you can allocate
      the file on the server, or on an NFS client that has not started
      reordering writes, then subsequent I/O to the file is usually faster,
      even if sent by an order-scrambling client.
 
   c) Experiment with setting write_pref_io on the server file system using
      the vxtunefs command. For a RAID-5 volume, the write_pref_io value is
      the complete stripe size by default. In some circumstances, throughput
      may be improved by setting this value as low as 32K (the default NFS
      write size). However, setting the value lower than the default will
      probably slow down local access to the same file system (if local
      access is taking place).
 
   d) Lower the total stripe size of the RAID-5 volume, or do not use a
      (software) RAID-5 volume, or raise the NFS transfer size. For one
      tested configuration, a 5-stripe volume (4 data) did better than a
      6-stripe (5 data). In fact, this configuration performed better when
      the ratio of total stripe size (number of data disks * stripe unit
      size) to the NFS transfer size (set at mount time, default 32K) was
      less than or equal to 8, due to some regularity in the pattern of
      write scrambling done on the client.
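
   The ratio mentioned in item d) can be worked out with shell arithmetic.
   This is a sketch only: stripe_ratio is a hypothetical helper, and it
   assumes that the 5-stripe test volume used the same 64K stripe unit as
   the 6-stripe volume, which the text does not state.

```shell
# Sketch: ratio of total stripe size to NFS transfer size.
# stripe_ratio is a hypothetical helper; all sizes are in kilobytes and
# the result is truncated to a whole number.
stripe_ratio() {
    data_disks=$1        # number of data disks (parity excluded)
    stripe_unit_kb=$2    # stripe unit size, e.g. 64
    nfs_xfer_kb=$3       # NFS transfer size, e.g. 32 (the default)
    echo $(( data_disks * stripe_unit_kb / nfs_xfer_kb ))
}

stripe_ratio 5 64 32    # 6-stripe volume (5 data): prints 10
stripe_ratio 4 64 32    # 5-stripe volume (4 data): prints 8
```

   Under these assumptions only the 5-stripe volume falls within the
   "less than or equal to 8" range.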
 
2) VERITAS Incident 30340
 
   The fsck utility replay time was reduced by more than 70%. This affected
   large file systems (over 100 GB or millions of files).
 
3) VERITAS Incident 31636
 
   A problem in the fsck utility (in the function au_state_set()) was
   corrupting memory. This resulted in fsck dumping core, in corrupted
   directory blocks, or in extent map errors reported on the file system
   after it was mounted. If fsck dumps core before it marks the file system
   clean, the file system cannot be mounted.  This problem can occur only
   after a system crash (when fsck is run to recover the file system).
 
   File systems with the following characteristics are at risk:
 
       * A Version 3 or Version 4 disk layout
       * A file system larger than:
 
		128 GB (1K block size)
		256 GB (2K block size)
		512 GB (4K block size)
 
       * File systems with an 8K block size are not affected.
 
   Possible symptoms:
 
       * fsck dumps core
       * fsck finds corrupted directory blocks or inodes and discards them
       * the file system can be mounted, but directory blocks, inode maps,
         extent maps, or inodes are subsequently found to be "bad," requiring
         a full file system check.
 
   The following table lets you determine if your file system is at risk:
 
         Fragment Size     And "df -g" Reports More Than This Many Blocks
         -------------     ----------------------------------------------
 
            1024             268,435,456    	(at risk)
            2048             536,870,912    	(at risk)
            4096             1,073,741,824  	(at risk)
            8192                                (not at risk)
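
   The table reduces to one comparison, because each limit is 262,144
   times the fragment size. The limits also match the 128 GB, 256 GB,
   and 512 GB sizes listed earlier if the counts are 512-byte blocks;
   that unit is an inference from the numbers, not something the table
   states. The sketch below uses a hypothetical helper, at_risk, and
   does not check the disk layout version.

```shell
# Sketch: succeed (exit status 0) if the block count exceeds the table's
# limit for the given fragment size. at_risk is a hypothetical helper;
# the block count must be in the same units as the table.
at_risk() {
    frag_size=$1       # fragment size in bytes, as reported by "df -g"
    total_blocks=$2    # total blocks, as reported by "df -g"
    case $frag_size in
        1024|2048|4096) ;;
        *) return 1 ;;                 # 8192 (8K) is not at risk
    esac
    # 262144 * 1024 = 268,435,456; * 2048 = 536,870,912;
    # * 4096 = 1,073,741,824 (the three limits in the table).
    limit=$(( 262144 * frag_size ))
    [ "$total_blocks" -gt "$limit" ]
}
```

   For example, "at_risk 1024 300000000" succeeds, while
   "at_risk 8192 300000000" fails.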
 
   What to do if your file system is at risk:
 
       Any VxFS version is at risk. The fix for older versions is to upgrade
       to version 3.3.2, then apply the patch.
 
4) VERITAS Incident 31948
 
   The Inode Allocation Unit (IAU) Summaries were not fixed correctly by
   fsck, resulting in the following console message and the subsequent need
   to run a full fsck on the file system:
 
   vxfs: mesg 004: vx_mapbad - %s file system free inode bitmap in au %d marked bad
 
   This problem occurred only when the IAU structural file was fragmented in
   a particular way, which was likely only on fragmented file systems
   containing millions of files.
 
5) VERITAS Incident 31957
 
   The vxdump command did not record the creation time of a snapshot as the
   dump time in the /etc/dumpdates file.  This resulted in files being
   overlooked when performing incremental backups using snapshots.
 
6) VERITAS Incident 32226
 
   A race condition occurred between iremove and vx_hsm_iptohandle.
 
7) VERITAS Incident 32652
 
   A panic occurred while trying to free a non-existing extent.
 
8) VERITAS Incident 33143
 
   There were problems with creating a file, unlinking it, and mmaping a
   very large region.
 
9) VERITAS Incident 33246
 
   Inodes were marked bad when a system came back up from a crash or an
   unclean shutdown. After the system was up and the file systems were all
   mounted, the following error message was displayed on the console:
 
   vxfs: mesg 017: vx_iread_1 - %s file system inode 44 marked bad
 
   This problem was due to the inode having extended operations pending
   during a resize operation. Usually extended operations are completed as
   part of the mount process, but if a resize operation was performed on the
   file system when it was last mounted, the extended operation was ignored.
   This condition occurred only when the system crashed or was shut down
   incorrectly.
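
   To see whether a system has logged this message, or the mesg 004
   message described under Incident 31948, a saved log can be searched.
   The following is a sketch only: find_vxfs_bad is a hypothetical
   helper, the default log path is an assumption, and it needs a grep
   that accepts the POSIX -E option.

```shell
# Sketch: print VxFS "marked bad" messages (mesg 004 and mesg 017) found
# in a log file. find_vxfs_bad is a hypothetical helper; the default
# path is an assumption about where console messages are logged.
find_vxfs_bad() {
    log=${1:-/var/adm/messages}
    grep -E 'vxfs: mesg (004|017): .* marked bad' "$log"
}
```

   The exit status is nonzero when no matching message is found.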
 
10) VERITAS Incident 33408
 
   VxFS would hang with the umountall thread blocked on clone removal.
 
11) VERITAS Incident 33299
 
   A full fsck dumped core on file systems with a corrupted directory
   block--an extremely rare problem.
 
   The only identifiable symptom of this problem was a full fsck dumping
   core while running on a corrupted file system.
 
12) VERITAS Incident 33629
 
   The getacl command was not working correctly.

README -- Last modified date:  Wednesday, February 7, 2001