Patch-ID# 110978-04
Keywords: Storage Migrator 3.4.1 UFS files SM_341_5
Synopsis: Storage Migrator 3.4.1 UFS files fix
Date: Apr/17/2003

Install Requirements: None

Solaris Release: 2.6 7 8

SunOS Release: 5.6 5.7 5.8

Unbundled Product: Veritas Storage Migrator

Unbundled Release: 3.4.1

Xref:

Topic: Storage Migrator 3.4.1 jumbo patch

Relevant Architectures: sparc

BugId's fixed with this patch: 4454235 4533532 4685639 4835347

Changes incorporated in this version: 4835347

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

/openv/hsm/bin/HSMKiller
/openv/hsm/bin/admincmd/migcopy
/openv/hsm/bin/admincmd/migd
/openv/hsm/bin/admincmd/migdbclean
/openv/hsm/bin/admincmd/migmdclean
/openv/hsm/bin/admincmd/migmerge.sh
/openv/hsm/bin/admincmd/migmkspace
/openv/hsm/bin/admincmd/migpfcpatch
/openv/hsm/bin/admincmd/migrd
/openv/hsm/bin/admincmd/migrel.sh
/openv/hsm/bin/admincmd/migreldev
/openv/hsm/bin/admincmd/migshutdown.sh
/openv/hsm/bin/admincmd/migstartup.sh
/openv/hsm/bin/admincmd/migtlabel
/openv/hsm/bin/admincmd/migtpreq.sh
/openv/hsm/bin/admincmd/migtrans
/openv/hsm/bin/admincmd/migvold
/openv/hsm/bin/cmd/migin
/openv/hsm/bin/cmd/migmdclean
/openv/hsm/bin/cmd/migopscan
/openv/hsm/bin/cmd/migrc
/openv/hsm/bin/cmd/migstage
/openv/hsm/bin/cmd/migtscan
/openv/hsm/bin/goodies/migadd_trailer.sh
/openv/hsm/bin/migVSMshutdown
/openv/hsm/bin/migVSMstartup
/openv/hsm/bin/migdbcheck
/openv/hsm/bin/migdbrpt
/openv/hsm/bin/migreg
/openv/hsm/bin/migscriptcons
/openv/hsm/bin/migsweep
/openv/hsm/bin/migunmigrate
/openv/hsm/bin/setuphsm
/openv/java/allHSM.jar
/openv/lib/libmigsmall.so

Problem Description:

4835347 Need SM 341_5 released in Sun format

(from 110978-03)

4685639 Need SM 341_3 for issues related to Sun Alert ID 44364

(from 110978-02)

4533532 Need SM 3.4.2 jumbo patches ported to Sun patchadd format

(from 110978-01)

4454235 Need SM 3.4.1 jumbo patches ported to
Sun patchadd format

Patch Installation Instructions:
-------------------------------------------------------------------------
Refer to the Install.info file within the patch for instructions on
using the generic 'installpatch' and 'backoutpatch' scripts provided
with each patch. Any other special or non-generic installation
instructions should be described below.
-------------------------------------------------------------------------

Special Install Instructions:

As root on your NetBackup Master Server:

1) Stop the following daemons:

   migd
   migvold

   /usr/openv/hsm/bin/stopmigd

2) Install patched binaries via patchadd/installpatch.

3) Restart daemons using the platform-specific startup script provided
   in the /usr/openv/hsm/bin/goodies directory:

   Script Name      Startup Path                Supported Platforms
   S78hsmveritas    /etc/rc2.d/S78hsmveritas    Solaris VxFS/DMAPI
   S73HSM.mount     /etc/rc2.d/S73HSM.mount     Solaris ufs/Kernel-based

============================================

== Current ===

Description: VSM did not correctly release tape drive reservations.
This caused resource conflicts in NetBackup SSO (Shared Storage Option)
environments.
(All VSM Servers)

Description: Consolidation where the "granule size" of the source and
destination volumes do not match causes an inability to cache files
post consolidation. This can only happen when attempting to consolidate
from one "method" or media type to a new "method" or media type. The
solution is to check for different granule sizes and not allow the
consolidation.
(All VSM Servers)

Description: migdbcheck created some temporary files with no
permissions; it will now create them with 0644 or 600 permissions.

Workaround: Use the Unix chmod command to set the permissions of any
/tmp/migdb* files to be 0644.
(All VSM Servers)

Description: It is possible for migmkspace to free (purge) the space
for a file before enough VSM copies of the file have been made.
This can occur when the FHDB (File Handle Database) contains entries
for granules at the start of the file, but not enough granules to
contain the entire file. Migmkspace was changed to make sure that
enough granules exist (for each file copy) to contain the whole file.
(All VSM Servers)

Description: migunmigrate fails with the following error for files
that are 2GB or larger:

    ERROR: /hsm1/bigfile Cannot open perror 72: Value too large to be stored in data type

(All VSM Servers)

Description: The migcopy and migbatch processes may abort, causing copy
database (copydb) problems. The following is an example of the sequence
of events that causes the problem:

1. Migrate two copies of some files (for example, f.1 - f.100).
2. Allow one copy of a file to be made (for example, to tape). The
   second copy of the same file is not yet made (possibly because of a
   tape error).
3. Read one of the original files (for example, f.50). This causes the
   file to be unmigrated (the migin easy case) and the dk entry to be
   marked dead.
4. Make copy 2 of the files in the copy database (copydb). When you
   reach f.50, the file still exists as a migrated file, but there is
   no "live" dk entry. The migcopy requests the tape for the first copy
   of f.50 to make copy 2 of f.50.

Note: This problem does not occur in 4.5, since the "migin easy case"
for a read has been eliminated.

Additional Notes: The fix prevents migcopy from using a level 1 or
level 2 copy to make a level 1 or 2 copy, when the source is supposed
to be on disk. When an attempt is made to make a new copy at level 1 or
2 from a level 1 or 2 copy (and the source should have been on disk),
the existing (level 1 or 2) volume will not be requested. (This is file
f.50 in the example stated earlier.) In this case the file will be
skipped, with log messages like this:

    10/30 08:58:54 [14777]migcopy[16377]: ERROR: No valid granule found.
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: Reverting to single buffered I/O.
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: No valid granule found.
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: Failed to copy 0x1D72; tret = 1
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: copy_for_method() ret=1

The file will eventually get remigrated (after it gets old enough
again). The new migrate of the file will remove the old HSM handle from
the file (not enough copies) and assign a new HSM handle. Two new
copydb entries will be created since two new copies of the file are now
needed. A migmove from level 1 to level 2 (or from 2 to 1) still works.
migcopy can distinguish these cases because the source for migmove is
NOT level 0 (the on disk copy).
(All VSM Servers)

Description: If there are more than 64 HSM eligible file systems
mounted, NetBackup only recognizes the first 64 as possible HSM file
systems.

HSM eligible means a type vxfs file system on Solaris (DMAPI) and HP-UX.
HSM eligible means a type HSM file system on Solaris non-DMAPI.
HSM eligible means a type xfs file system mounted with -o dmi on IRIX.

If there are HSM-managed file systems mounted after the first 64
eligible, then bpbkar and tar are unaware that the file systems beyond
number 64 are managed. Bpbkar will cause purged files to cache when it
backs them up.

Additional Notes: On Solaris, the fix is contained in
/usr/openv/lib/libmigsmall.so which is dynamically linked by bpbkar and
tar. On HP-UX and IRIX, the fix is supplied by the NB_34_4 server patch
and is contained in /usr/openv/netbackup/bin/bpbkar and
/usr/openv/netbackup/bin/tar

Description: Migcopy can store the incorrect end-of-tape position in
the VOLDB under the following circumstances:

(1) The last file in the list being copied to tape cannot be copied.
For example, the last file being copied might have been removed by the
user.

-- AND --

(2) The last file being copied is the last file in a "flush group". By
default, a flush group occurs for every 4 GByte of files being copied.
NOTE: This flush group can be configured by the administrator. The
flushes can occur after every "n" files or after a given amount of
data. Changes to the default value are stored in the hsmname.FLUSH file
in the database directory.

If this occurs, the VOLDB entry contains the wrong value for the number
of file marks on the tape volume. The next attempt to write on that
volume succeeds, but it will overwrite the files in the previous flush
group. The data for these files becomes lost and unrecoverable.

Additional Notes: The identifier script, mig_check_overlaps.sh, can be
created by cutting and pasting the identified portions below, or you
may contact VSM Customer Support for a copy of the script.

*** begin cut & paste on next line ***
#!/bin/sh
#
Usage()
{
    echo "Usage: $PROG hsmname" >&2
    exit 1
}

# Pull in the VSM script constants ($MIGDBDIR, $AWK, $RM).
. /usr/openv/hsm/bin/migscriptcons

PROG=`basename $0`
TMP=/tmp/errors.$PROG.$$

if [ -z "$1" ] ; then
    Usage
fi

HSMNAME=$1
D=`$MIGDBDIR $HSMNAME 1`
STATUS=$?
if [ $STATUS -ne 0 ] ; then
    Usage
fi

FHDB=$D/database/FHDB
if [ ! -s "$FHDB" ] ; then
    echo "$PROG - cannot find $FHDB " >&2
    exit 2
fi

# Select ct, dt and mt entries whose second position value is 0, then
# keep only the lines that occur more than once.
$AWK -F'|' '{if ($15 == "ct" || $15 == "dt" || $15 == "mt" ) {
    if (split($21, position, " ") >= 2) {
        if (position["2"]+0 == 0) {
            printf "%s %s %s\n",""$4, ""$15, ""position[1]
        }
    }
}
}' < $FHDB | sort | uniq -d | uniq > $TMP

if [ -s "$TMP" ] ; then
    NUM=`wc -l $TMP | $AWK '{print 0+$1}' `
    echo "There are $NUM probable overlaps for HSM $HSMNAME"
    echo "VOL ID Method File Mark Number"
    cat $TMP
else
    echo "No overlaps detected for HSM $HSMNAME"
fi

${RM} -f $TMP
*** end cut & paste on previous line ***

Description: migdbcheck erroneously claimed that extra copies were
found.

Description: "migrc -L" terminated with a zero status. In actuality, it
failed because the file system was not in the MAINTENANCE state.

Description: During consolidation, the un-used field in the VOLDB could
become negative, causing space calculation to be incorrect.
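
Example (illustrative sketch, not part of the patch): the duplicate
detection at the heart of mig_check_overlaps.sh above can be tried on
its own. The records below are invented sample data in the same
"volume-id method file-mark-number" layout that the script prints, and
find_overlaps is a hypothetical helper name. The sketch assumes that a
repeated line means two entries claim the same file mark on the same
volume.

```sh
#!/bin/sh
# Sketch only: sample records are made up, not real FHDB output.

find_overlaps()
{
    # Sort the records, then keep one copy of every line that occurs
    # more than once (same volume, same method, same file mark).
    sort | uniq -d
}

printf '%s\n' \
    "A00001 ct 12" \
    "A00001 ct 13" \
    "A00001 ct 12" \
    "B00002 dt 4" | find_overlaps
```

Only the repeated record is reported; records that occur once are
dropped by uniq -d.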
Description: Migcopy allocated 2048 bytes of memory per granule and did
not free that memory. When hundreds of thousands of granules were
copied, migcopy could generate a bus error when additional memory
allocations failed.

Workaround: An administrator can split the copydb file that contains
many entries into several copydb files, each containing a smaller
number of entries. The copydb files would then have to be renamed one
at a time to the correct copydb name, and then processed by migbatch
one at a time.

== 110978-03 ===

Description: The OBSOLETE flag for the FHDB dk entry caused dead dk
entries to be removed if the flag was not set. This problem caused
cached files to be automatically purged the next time they were
migrated.

** Description **: If a migrated file spans tapes and a continuation
tape had an I/O error when the file was written, the file cannot be
cached. This is caused by a partial FHDB entry for the file. This
problem causes data loss if only one copy has been made.

Description: The global configuration file for VSM gets corrupted in
some instances.

Description: migsweep did not select enough files during no space
processing when give_up_files was configured to be zero.
(give_up_files == 0 should indicate NO limit on the number of files
selected.)

Workaround: Configure the hsmname so that give_up_files is a large
value.

Description: migdbcheck erroneously shows files as "copies-needed" even
though manual checking shows that the files have been properly migrated
and copied. This happens for large files with level 1 copies that span
tapes.

** Description **: VSM must optionally use SCSI reserve/release for
media manager volumes. This is being done to prevent data loss when VSM
is used in a SAN environment.
SCSI reserve/release can be turned off by creating the following file:

    /usr/var/openv/hsm/database/no_scsi_reserve

Description: During consolidation, migtrans may dump core if one of the
volumes being consolidated contains the second part of a file that
spans volumes.

Description: When the Java GUI is used to change file system
properties, it also inadvertently turns on partial file caching. This
fix prevents partial caching from being turned on when it was not
specifically selected.

Description: The VSM Java GUI performance is very poor when displaying
information for a large number of managed file systems.

Description: An intermittent problem exists that can cause a new
hierarchy to be created after a user makes changes to the file system
properties.

Description: Running the command "migrc -L" on an ACTIVE file system
could release "live" locks, and cause problems with the file system. To
resolve this issue, the "migrc" command has been changed so that the
"-L" (clear locks) option can only be used if the file system is in the
MAINTENANCE state.

Description: The Java GUI may have inadvertently turned on partial file
caching for any or all managed file systems. This fix provides a
command that can be run to detect and correct any such occurrences.

Additional Notes: After the SM_341_3 patch is installed, run the
following command to detect if partial file caching may have been
inadvertently turned on for any managed file systems and, if desired,
turn it off:

    /usr/openv/hsm/bin/admincmd/migpfcpatch

** Description **: Using the undocumented "-N" option of migdbcheck in
conjunction with the "-r" (repair) option could result in data loss.
This fix makes those two options mutually exclusive.

Workaround: Never use the migdbcheck "-N" and "-r" options together.

Description: The Java GUI action "Fix DB for Filesystem" may
erroneously remove files from the managed file system. Because this is
a dangerous thing to do, this capability is being removed from the Java
GUI.
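
Example (illustrative sketch, not part of the patch): on a system where
the fix is not yet installed, the migdbcheck workaround above (never
combine "-N" and "-r") could be enforced by a small guard in local
scripts. The function name safe_migdbcheck_args is hypothetical, and
the sketch only recognizes the two options when they are given as
separate arguments; it makes no assumption about any other migdbcheck
option.

```sh
#!/bin/sh
# Sketch only: safe_migdbcheck_args is a hypothetical helper, not a VSM
# command. It returns 1 when both -N and -r appear in the argument list.

safe_migdbcheck_args()
{
    have_N=0
    have_r=0
    for arg in "$@" ; do
        case "$arg" in
            -N) have_N=1 ;;
            -r) have_r=1 ;;
        esac
    done
    if [ $have_N -eq 1 ] && [ $have_r -eq 1 ] ; then
        echo "refusing to combine -N and -r" >&2
        return 1
    fi
    return 0
}

# Possible use in a wrapper script (assumes migdbcheck is on the PATH):
#   safe_migdbcheck_args "$@" && migdbcheck "$@"
```

The guard only inspects the arguments; it never runs migdbcheck itself,
so it can be tested without a VSM installation.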
== 110979-02 ===

Description: Migreg -F will continue to register a volume even if
migassign fails. This can cause volumes that should not be registered
to get registered to an hsmname managed file system. Migreg -F should
only proceed if the failure was because the volume is already assigned.

Description: The "-a" (age) parameter specified to migmdclean was
effectively being ignored; the default of 7 days was always being used
regardless of whether "-a" was specified or not.

Description: When consolidating volumes, migcons does not remove the
consolidated volume from the VOLDB and the FHDB entries are not removed
from the FHDB.

WORKAROUND: After consolidation run migmdclean -a 0 -R . for each
volume that was consolidated.

Description: If multiple migcopy processes are out of tape or optical
volumes and are requesting new volumes from the HSM or scratch pool,
the same volume may get assigned to two of them. When this happens one
migcopy will overwrite the data the other migcopy has written on the
volume. This same problem can happen during consolidation.

WORKAROUND: Do not use scratch pools and use different pool names for
each managed filesystem. Use different pool names for each copy and
stripe if they are using a similar method.

Description: migVSMshutdown will fail if there are multiple instances
of the Java GUI (migsa, migfb, migam). The error will be:

    /bin/nawk: syntax error at source line 4

Description: A problem occurs on a solaris_dm platform when a non-root
user executes migstage and some of the files being staged will be
cached back using the NB method. In this case, the NB method files are
cached one at a time via a DMAPI read event on the file. This takes too
long and defeats the intent of migstage. Migstage is supposed to cache
all the files from one NB method volume in one operation.
Description: Migcopy will erroneously indicate that a copy of a
migrated file has been successfully made when:

(1) the method is nb (NetBackup Method); and
(2) at least one file in the worklist of files to be copied contains
white space in the file name.

This file and all files in the worklist after this file will be
erroneously flagged as having been successfully copied.

WARNING: THESE FILES CAN NOW BE PURGED, RESULTING IN LOSS OF DATA FOR
THESE FILES.

This problem exists on HP, SGI, and Solaris DMAPI installations of VSM.
This problem does not exist on the kernel-based (non-DMAPI, UFS)
version of VSM on Solaris.

== 110979-01 ===

Description: Pool names longer than 14 characters do not work
correctly. The tpreq command, as invoked by migcopy, fails with the log
message: "Request terminated because of volume pool mismatch". Although
NetBackup's use of Media Manager allows a 20 character pool name, VSM
limits pool names to 16 characters. This fix will allow up to 16
character pool names when VSM is used. The discrepancy between
NetBackup's and VSM's Media Manager pool name length will be addressed
in a future fix.

Workaround: Use pool names of 14 characters or less.

Description: If the volume pool name is more than 14 characters long,
migdbrpt will show garbage at the end of the pool name.

Workaround: The only workaround is to use pool names shorter than 15
characters.

Description: VSM will only try once to read a granule from copy 1
before trying copy 2. If there are occasional read errors, this can
cause VSM to wait for a vault copy of the file when a 2nd try at copy 1
would have worked. File /.GRAN_RETRY must exist for this feature to
work.

Description: migVSMstartup and migVSMshutdown process managed
filesystems one at a time. This slows down the startup and shutdown of
VSM. migVSMstartup and migVSMshutdown need to process all managed
filesystems in parallel.
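
Example (illustrative sketch, not part of the patch): the difference
between processing filesystems one at a time and in parallel can be
shown with plain sh job control. This is not the actual migVSMstartup
code; start_one and start_all are hypothetical names, and the echo
stands in for whatever per-filesystem work the real scripts perform.

```sh
#!/bin/sh
# Sketch only: start_one is a placeholder for real per-filesystem work.

start_one()
{
    echo "started $1"
}

start_all()
{
    # Launch one background job per hsmname, then wait for all of them
    # so the caller returns only when every filesystem is done.
    for hsm in "$@" ; do
        start_one "$hsm" &
    done
    wait
}

start_all hsm1 hsm2 hsm3
```

Because the jobs run concurrently, the order of the "started" lines is
not guaranteed. A serial version would simply drop the "&" and the
wait.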
Description: When VSM is managing more than one filesystem, volumes may
become registered to an hsmname but NOT be assigned in the media
manager database. This can cause VSM to keep trying to register the
same volumes in the scratch pool over and over.

Workaround: There are two possible workarounds:

1) Register volumes to each hsmname so that migcopy does not have to
   select volumes from the scratch pool.
2) Assign different pool names to volumes so each hsmname is using a
   different pool name.

Description: On Solaris VSM platforms there is code in VSM for special
handling of requests from an NFS daemon and code in the NFS daemon to
specially handle migrated and purged files. This introduces a delay in
completing NFS client requests involving files purged by VSM. This
delay is introduced even in the VSM "easy case" where the file is
migrated but the data is still on disk (not yet purged). This fix
eliminates the delay in the VSM "easy case".

Description: The migcopy process will not process the COPYDB work list
if the first file listed has no active FHDB entry. Log entries will
look like:

    ERROR: No source for 708540M3e5c
    ERROR: Failed to copy 0x3E5C; tret = 2
    ERROR: copy_for_method() ret=7000
    ERROR: write on destination volume failed, will not try next file
    Finished 7000

Description: The VSM Java GUI could display an incorrect percentage of
space used on filesystems that are very large.

Description: If a tape write error occurs, all files written since the
last file mark was written will be in the following state: The copydb
work list will indicate a copy needs to be made. The file's DM
attributes will indicate not enough copies have been made. The file
will have FHDB entries for the tape volume. The next time migcopy runs
to make copies, the migcopy copydb work list will be marked complete
because there is an existing FHDB entry. The file's DM attributes will
still indicate there are not enough copies and the file will still have
FHDB entries for the file.
These files will stay in this state and will not be purged as they do
not have enough good copies.

Workaround: Set / to contain a 1. This will cause a tape mark to be
written after each file.

Description: When consolidating a tape that has many files on it,
migtrans will appear to be hung in a strlen function. migtrans is not
hung; it is sorting the tape volume list by file mark number. This can
take a long time when there are a lot of files on the tape volume.

Additional Notes: If the following file:

    /database/.GRAN_RETRY

is defined, then migcopy will only try twice to read a granule from
file copy one.

Copyright (C) 2002 VERITAS Software Corporation. All Rights Reserved.
VERITAS, VERITAS SOFTWARE, the VERITAS logo, Business Without
Interruption, VERITAS The Data Availability Company, NetBackup,
NetBackup DataCenter, NetBackup BusinesServer and VERITAS Storage
Migrator for Unix are trademarks or registered trademarks of VERITAS
Software Corporation in the US and/or other countries. Other product
names mentioned herein may be trademarks or registered trademarks of
their respective companies.

README -- Last modified date: Thursday, April 17, 2003