Patch-ID# 110978-04
Keywords: Storage Migrator 3.4.1 UFS files SM_341_5
Synopsis: Storage Migrator 3.4.1 UFS files fix
Date: Apr/17/2003

Install Requirements: None                      
                      
Solaris Release: 2.6 7 8

SunOS Release: 5.6 5.7 5.8

Unbundled Product: Veritas Storage Migrator

Unbundled Release: 3.4.1

Xref: 

Topic: Storage Migrator 3.4.1 jumbo patch

Relevant Architectures: sparc

BugId's fixed with this patch: 4454235 4533532 4685639 4835347

Changes incorporated in this version: 4835347

Patches accumulated and obsoleted by this patch: 

Patches which conflict with this patch: 

Patches required with this patch: 

Obsoleted by: 

Files included with this patch: 

<install_dir>/openv/hsm/bin/HSMKiller
<install_dir>/openv/hsm/bin/admincmd/migcopy
<install_dir>/openv/hsm/bin/admincmd/migd
<install_dir>/openv/hsm/bin/admincmd/migdbclean
<install_dir>/openv/hsm/bin/admincmd/migmdclean
<install_dir>/openv/hsm/bin/admincmd/migmerge.sh
<install_dir>/openv/hsm/bin/admincmd/migmkspace
<install_dir>/openv/hsm/bin/admincmd/migpfcpatch
<install_dir>/openv/hsm/bin/admincmd/migrd
<install_dir>/openv/hsm/bin/admincmd/migrel.sh
<install_dir>/openv/hsm/bin/admincmd/migreldev
<install_dir>/openv/hsm/bin/admincmd/migshutdown.sh
<install_dir>/openv/hsm/bin/admincmd/migstartup.sh
<install_dir>/openv/hsm/bin/admincmd/migtlabel
<install_dir>/openv/hsm/bin/admincmd/migtpreq.sh
<install_dir>/openv/hsm/bin/admincmd/migtrans
<install_dir>/openv/hsm/bin/admincmd/migvold
<install_dir>/openv/hsm/bin/cmd/migin
<install_dir>/openv/hsm/bin/cmd/migmdclean
<install_dir>/openv/hsm/bin/cmd/migopscan
<install_dir>/openv/hsm/bin/cmd/migrc
<install_dir>/openv/hsm/bin/cmd/migstage
<install_dir>/openv/hsm/bin/cmd/migtscan
<install_dir>/openv/hsm/bin/goodies/migadd_trailer.sh
<install_dir>/openv/hsm/bin/migVSMshutdown
<install_dir>/openv/hsm/bin/migVSMstartup
<install_dir>/openv/hsm/bin/migdbcheck
<install_dir>/openv/hsm/bin/migdbrpt
<install_dir>/openv/hsm/bin/migreg
<install_dir>/openv/hsm/bin/migscriptcons
<install_dir>/openv/hsm/bin/migsweep
<install_dir>/openv/hsm/bin/migunmigrate
<install_dir>/openv/hsm/bin/setuphsm
<install_dir>/openv/java/allHSM.jar
<install_dir>/openv/lib/libmigsmall.so

Problem Description:

4835347 Need SM 341_5 released in Sun format
 
(from 110978-03)
 
4685639 Need SM 341_3 for issues related to Sun Alert ID 44364
 
(from 110978-02)
 
4533532 Need SM 3.4.2 jumbo patches ported to Sun patchadd format
 
(from 110978-01)
 
4454235 Need SM 3.4.1 jumbo patches ported to Sun patchadd format

Patch Installation Instructions:
-------------------------------------------------------------------------
Refer to the Install.info file within the patch for instructions on
using the generic 'installpatch' and 'backoutpatch' scripts provided
with each patch.  Any other special or non-generic installation
instructions should be described below.
-------------------------------------------------------------------------

Special Install Instructions:
As root on your NetBackup Master Server:
 
 
1) Stop the following daemons: migd migvold 
    /usr/openv/hsm/bin/stopmigd
 
2) Install patched binaries via patchadd/installpatch.
 
 
3)  Restart daemons using the platform-specific startup script provided in
   the /usr/openv/hsm/bin/goodies directory:
 
Script Name         Startup Path                Supported Platforms
S78hsmveritas       /etc/rc2.d/S78hsmveritas    Solaris VxFS/DMAPI
 
S73HSM.mount        /etc/rc2.d/S73HSM.mount     Solaris ufs/Kernel-based
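
    The three steps above can be sketched as one wrapper.  This is a minimal
    sketch, not part of the patch.  The function name, the DRYRUN variable,
    the patch directory argument, and the "start" argument passed to the rc
    script are assumptions; pick the rc script for your platform from the
    table above.

```shell
# Sketch only: stop the VSM daemons, apply the patch, restart the daemons.
# Assumptions (not stated in this README): the rc script accepts "start",
# and DRYRUN=1 (the default here) only prints the commands it would run.
apply_vsm_patch() {
    patchdir=$1                               # directory holding the unpacked patch
    rcscript=${2:-/etc/rc2.d/S73HSM.mount}    # Solaris ufs/kernel-based default
    if [ -z "$patchdir" ]; then
        echo "usage: apply_vsm_patch patchdir [rcscript]" >&2
        return 1
    fi
    if [ "${DRYRUN:-1}" = 1 ]; then run=echo; else run=; fi

    # Step 1: stop migd and migvold.
    $run /usr/openv/hsm/bin/stopmigd || return 1
    # Step 2: install the patched binaries.
    $run patchadd "$patchdir" || return 1
    # Step 3: restart the daemons with the platform startup script.
    $run "$rcscript" start
}
```

    Run it as root.  Leave DRYRUN at its default to review the commands
    first, then set DRYRUN=0 to execute them.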
 
============================================
== Current ===
 
Description: 
    VSM did not correctly release tape drive reservations.  This caused
    resource conflicts in NetBackup SSO (Shared Storage Option) environments. 
 
    (All VSM Servers)
 
Description: 
    Consolidating volumes whose source and destination "granule size" do not
    match makes it impossible to cache files after the consolidation.
    This can only happen when attempting to consolidate from one "method" or 
    media type to a new "method" or media type.
 
    The solution is to check for different granule sizes and not allow the 
    consolidation. 
 
    (All VSM Servers)
 
Description: 
    migdbcheck created some temporary files with no permissions; it will now
    create them with 0644 or 600 permissions. 
 
Workaround: 
    Use the Unix chmod command to set the permissions of any /tmp/migdb* files
    to be 0644. 
 
    (All VSM Servers)
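
    The chmod workaround for the migdbcheck temporary files can be sketched
    as below.  The function name and the optional directory argument are
    assumptions added for illustration; the workaround itself names only
    /tmp/migdb* and mode 0644.

```shell
# Sketch of the workaround: set every migdb* temporary file to mode 0644.
# The directory defaults to /tmp, the location named in the workaround.
fix_migdb_perms() {
    dir=${1:-/tmp}
    for f in "$dir"/migdb*; do
        # Skip the unexpanded pattern (no matches) and anything not a file.
        [ -f "$f" ] || continue
        chmod 0644 "$f"
    done
}
```

    Calling fix_migdb_perms with no argument applies the change in /tmp.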
 
Description: 
    It is possible for migmkspace to free (purge) the space for a file before
    enough VSM copies of the file have been made.  This can occur when the FHDB
    (File Handle Database) contains entries for granules at the start of the 
    file, but not enough granules to contain the entire file.  
 
    Migmkspace was changed to make sure that enough granules exist (for each
    file copy) to contain the whole file. 
 
    (All VSM Servers)
 
Description: 
    migunmigrate fails with the following error for files that are 2GB or 
    larger:
 
    ERROR: /hsm1/bigfile Cannot open
    perror 72: Value too large to be stored in data type 
 
    (All VSM Servers)
 
Description: 
    The migcopy and migbatch processes may abort causing copy database (copydb)
    problems. The following is an example of the sequence of events that cause
    the problem:
    1. Migrate two copies of some files (for example, f.1 - f.100). 
    2. Allow one copy of a file to be made (for example, to tape).  The second
       copy of the same file is not yet made (possibly because of a tape error).
    3. Read one of the original files (for example, f.50). This causes the file
       to be unmigrated (the migin easy case) and the dk entry to be marked
       dead.
    4. Make copy 2 of the files in the copy database (copydb).  When you reach
       f.50, the file still exists as a migrated file, but there is no "live"
       dk entry.  The migcopy requests the tape for the first copy of f.50 to
       make copy 2 of f.50.
 
    Note: This problem does not occur in 4.5, since the "migin easy case" for a
    read has been eliminated. 
 
Additional Notes: 
    The fix prevents migcopy from using a level 1 or level 2 copy
    to make a level 1 or 2 copy, when the source is supposed to
    be on disk.
 
    When an attempt is made to make a new copy at level 1 or 2 from a level 1 
    or 2 copy (and the source should have been on disk) the existing (level 1 
    or 2 ) volume will not be requested.  (This is file f.50 in the example 
    stated earlier.)  In this case the file will be skipped, with log messages
    like this:
 
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: No valid granule found.
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: Reverting to single buffered 
    I/O.
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: No valid granule found.
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: Failed to copy 0x1D72; tret = 1
    10/30 08:58:54 [14777]migcopy[16377]: ERROR: copy_for_method() ret=1
 
    The file will eventually get remigrated (after it gets old enough again).  
    The new migrate of the file will remove the old HSM handle from the file 
    (not enough copies) and assign a new HSM handle.  Two new copydb entries 
    will be created since two new copies of the file are now needed.
 
    A migmove from level 1 to level 2 (or from 2 to 1) still works.  migcopy 
    can distinguish these cases because the source for migmove is NOT level 0 
    (the on disk copy). 
 
    (All VSM Servers)
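
    To see which file handles migcopy skipped, the "Failed to copy" lines
    shown in the log excerpt can be filtered as sketched below.  The function
    name is hypothetical and the log file path is left to the caller, since
    this README does not name the log file.  Such lines may also be logged
    for other failures, so treat the result as a list of candidates.

```shell
# Sketch: print the unique hex handles from "ERROR: Failed to copy 0x...;"
# lines in the log file passed as the first argument.
list_failed_handles() {
    sed -n 's/.*ERROR: Failed to copy \(0x[0-9A-Fa-f]*\);.*/\1/p' "$1" | sort -u
}
```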
 
Description: 
    If there are more than 64 HSM eligible file systems mounted, NetBackup only
    recognizes the first 64 as possible HSM file systems.
 
    HSM eligible means a type vxfs file system on Solaris (DMAPI) and HP-UX.
    HSM eligible means a type HSM file system on Solaris non-DMAPI.
    HSM eligible means a type xfs file system mounted with -o dmi on IRIX.
 
    If there are HSM-managed file systems mounted after the first 64 eligible,
    then bpbkar and tar are unaware that the file systems beyond number 64 are
    managed. Bpbkar will cause purged files to cache when it backs them up. 
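
    A quick way to tell whether a Solaris server is near the 64 file system
    limit is to count the mounted file systems of the HSM-eligible type.  The
    sketch below assumes the standard /etc/mnttab layout, with the file
    system type in the third field.  The function name is hypothetical, and
    the exact type string shown for kernel-based file systems should be
    checked on your own system.

```shell
# Sketch: count mounted file systems of a given type.
# Usage: count_fs_of_type fstype [mnttab]
# On Solaris, run this with nawk or /usr/xpg4/bin/awk in place of awk,
# because the old /usr/bin/awk does not accept -v.
count_fs_of_type() {
    fstype=$1
    mnttab=${2:-/etc/mnttab}
    awk -v t="$fstype" '$3 == t { n++ } END { print n + 0 }' "$mnttab"
}
```

    For example, "count_fs_of_type vxfs" reports the number of mounted vxfs
    file systems.  A result above 64 means that this problem can apply.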
 
Additional Notes: 
    On Solaris, the fix is contained in 
    /usr/openv/lib/libmigsmall.so 
    which is dynamically linked by bpbkar and tar. 
 
    On HP-UX and IRIX, the fix is supplied by the NB_34_4 server patch and 
    is contained in 
    /usr/openv/netbackup/bin/bpbkar 
    and 
    /usr/openv/netbackup/bin/tar  
 
Description: 
    Migcopy can store the incorrect end-of-tape position in the VOLDB under the 
    following circumstances: 
 
    (1) The last file in the list being copied to tape cannot be copied. 
        For example, the last file being copied might have been removed by 
        the user. 
 
    -- AND --
 
    (2) The last file being copied is the last file in a "flush group". 
        By default, a flush group occurs for every 4 GByte of files being 
        copied.  
 
        NOTE:  This flush group can be configured by the administrator.  
               The flushes can occur after every "n" files or 
               after a given amount of data.  Changes to the default 
               value are stored in the hsmname.FLUSH file in the database 
               directory. 
 
    If this occurs, the VOLDB entry contains the wrong value for 
    the number of file marks on the tape volume.   The next attempt to write 
    on that volume succeeds, but it will overwrite the files in the previous 
    flush group.  The data for these files becomes lost and unrecoverable.  
 
Additional Notes: 
    The identifier script, mig_check_overlaps.sh, can be created by cutting and 
    pasting the identified portions below, or you may contact VSM Customer 
    Support for a copy of the script.
 
    *** begin cut & paste on next line ***
    #!/bin/sh
    #
 
    Usage() {
            echo "Usage: $PROG hsmname" >&2
        exit 1
    }
 
    . /usr/openv/hsm/bin/migscriptcons
 
    PROG=`basename $0`
    TMP=/tmp/errors.$PROG.$$
 
    if [ -z "$1" ] ; then
      Usage
    fi
 
    HSMNAME=$1
    D=`$MIGDBDIR $HSMNAME 1`
    STATUS=$?
 
    if [ $STATUS -ne 0 ] ; then
      Usage
    fi
 
    FHDB=$D/database/FHDB
    if [ ! -s "$FHDB" ] ; then
      echo "$PROG - cannot find $FHDB " >&2
      exit 2
    fi
 
    $AWK -F'|' '{if ($15 == "ct" || $15 == "dt" || $15 == "mt" ) {
                    if (split($21, position, " ") >= 2) {
                            if (position["2"]+0 == 0) {
                                    printf "%s      %s           %s\n",
                                        ""$4, ""$15, ""position[1]
                            }
                    }
            }
    }'  < $FHDB | sort | uniq -d | uniq > $TMP
 
    if [ -s "$TMP" ] ; then
      NUM=`wc -l $TMP | $AWK '{print 0+$1}' `
      echo "There are $NUM probable overlaps for HSM $HSMNAME"
      echo "VOL ID      Method     File Mark Number"
      cat $TMP
    else
      echo "No overlaps detected for HSM $HSMNAME"
    fi
 
    ${RM} -f $TMP
    *** end cut & paste on previous line *** 
 
Description: 
    migdbcheck erroneously claimed that extra copies were found. 
 
Description: 
    "migrc -L" terminated with a zero status.  In actuality, it failed because 
    the file system was not in the MAINTENANCE state. 
 
Description: 
    During consolidation, the un-used field in the VOLDB could become negative, 
    causing space calculation to be incorrect. 
 
Description: 
    Migcopy allocated 2048 bytes of memory per granule and did not free that 
    memory.  When hundreds of thousands of granules were copied, migcopy could
    generate a bus error when additional memory allocations failed. 
 
Workaround: 
    An administrator can split the copydb file that contains many entries into
    several copydb files, each containing a smaller number of entries.  The
    copydb files would then have to be renamed one at a time to the correct 
    copydb name, and then processed by migbatch one at a time. 
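
    The splitting step of this workaround could look like the sketch below.
    It assumes that the copydb file holds one entry per line, which this
    README does not confirm, so check the file layout first and work on a
    copy.  The function name and the default ".part." suffix are
    hypothetical.  Renaming each piece and running migbatch remain manual
    steps.

```shell
# Sketch: split a copydb file into pieces of at most N lines each.
# Assumption: one copydb entry per line (not confirmed by this README).
# Usage: split_copydb file lines_per_piece [output_prefix]
split_copydb() {
    src=$1
    per=$2
    prefix=${3:-$1.part.}
    if [ ! -s "$src" ]; then
        echo "split_copydb: $src is missing or empty" >&2
        return 1
    fi
    case $per in
        ''|*[!0-9]*|0)
            echo "split_copydb: lines_per_piece must be a positive integer" >&2
            return 1 ;;
    esac
    # split names the pieces <prefix>aa, <prefix>ab, ... in order.
    split -l "$per" "$src" "$prefix"
}
```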
 
== 110978-03 ===
Description: 
    The OBSOLETE flag for the FHDB dk entry caused dead dk entries to be
    removed if the flag was not set.  This problem caused cached files to be 
    automatically purged the next time they were migrated. 
 
** Description **: 
    If a migrated file spans tapes and a continuation tape had an I/O error
    when the file was written, the file cannot be cached. This is caused by a
    partial FHDB entry for the file.
 
    This problem causes data loss if only one copy has been made. 
 
Description: 
    The global configuration file for VSM gets corrupted in some instances. 
 
Description: 
    migsweep did not select enough files during no space processing when
    give_up_files was configured to be zero.  (give_up_files == 0 should
    indicate NO limit on the number of files selected.) 
 
Workaround: 
    Configure the hsmname so that give_up_files is a large value. 
 
Description: 
    migdbcheck erroneously shows files as "copies-needed" even though manual
    checking shows that the files have been properly migrated and copied.  This
    happens for large files with level 1 copies that span tapes. 
 
** Description **: 
    VSM must optionally use SCSI reserve/release for media manager volumes.
    This is being done to prevent data loss when VSM is used in a SAN
    environment.
 
    SCSI reserve/release can be turned off by creating the following file:
 
    /usr/var/openv/hsm/database/no_scsi_reserve 
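
    Creating the flag file can be sketched as follows.  The function name and
    the optional directory argument are assumptions for illustration.  An
    empty file is assumed to be sufficient, because this README says only
    that the file must be created.

```shell
# Sketch: turn off SCSI reserve/release by creating the flag file.
# The directory defaults to the one named in this README.
disable_scsi_reserve() {
    dbdir=${1:-/usr/var/openv/hsm/database}
    if [ ! -d "$dbdir" ]; then
        echo "disable_scsi_reserve: $dbdir is not a directory" >&2
        return 1
    fi
    touch "$dbdir/no_scsi_reserve"
}
```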
 
Description: 
    During consolidation, migtrans may dump core if one of the volumes being
    consolidated contains the second part of a file that spans volumes. 
 
Description: 
    When the Java GUI is used to change file system properties, it also
    inadvertently turns on partial file caching.  This fix prevents partial
    caching from being turned on when it was not specifically selected. 
 
Description: 
    The VSM Java GUI performance is very poor when displaying information for
    a large number of managed file systems. 
 
Description: 
    An intermittent problem exists that can cause a new hierarchy to be created
    after a user makes changes to the file system properties. 
 
Description: 
    Running the command "migrc -L" on an ACTIVE file system could release "live"
    locks, and cause problems with the file system.  To resolve this issue, the
    "migrc" command has been changed so that the "-L" (clear locks) option can
    only be used if the file system is in the MAINTENANCE state. 
 
Description: 
    The Java GUI may have inadvertently turned on partial file caching for any
    or all managed file systems.  This fix provides a command that can be run to
    detect and correct any such occurrences. 
 
Additional Notes: 
    After the SM_341_3 patch is installed, run the following command to detect
    if partial file caching may have been inadvertently turned on for any
    managed file systems and, if desired, turn it off:
 
         /usr/openv/hsm/bin/admincmd/migpfcpatch 
 
** Description **: 
    Using the undocumented "-N" option of migdbcheck in conjunction with the
    "-r" (repair) option could result in data loss.  This fix makes those two
    options mutually exclusive. 
 
Workaround: 
    Never use the migdbcheck "-N" and "-r" options together. 
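
    On systems that do not yet have this fix, a small wrapper can enforce the
    workaround.  This is a sketch under stated assumptions: the wrapper name
    and the MIGDBCHECK override variable are hypothetical, and only "-N" and
    "-r" given as separate arguments are detected.

```shell
# Sketch: refuse to run migdbcheck when both -N and -r are given.
# MIGDBCHECK may point at another binary; the default is the installed path.
safe_migdbcheck() {
    seen_N=0
    seen_r=0
    for arg in "$@"; do
        case $arg in
            -N) seen_N=1 ;;
            -r) seen_r=1 ;;
        esac
    done
    if [ "$seen_N" = 1 ] && [ "$seen_r" = 1 ]; then
        echo "safe_migdbcheck: refusing to combine -N and -r" >&2
        return 2
    fi
    # All arguments are passed through unchanged.
    ${MIGDBCHECK:-/usr/openv/hsm/bin/migdbcheck} "$@"
}
```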
 
Description: 
    The Java GUI action "Fix DB for Filesystem" may erroneously remove files
    from the managed file system.  Because this is a dangerous thing to do, this
    capability is being removed from the Java GUI. 
 
 
== 110979-02 ===
Description: 
    Migreg -F will continue to register a volume even if migassign
    fails. This can cause volumes that should not be registered to get 
    registered to the hsmname managed file system. Migreg -F should only 
    proceed if the failure was because the volume is already assigned.
 
Description: 
     The "-a" (age) parameter specified to migmdclean was effectively being
    ignored; the default of 7 days was always being used regardless of whether
    "-a" was specified or not.
 
Description: 
    When consolidating volumes, migcons does not remove the consolidated
    volume from the VOLDB, and the volume's entries are not removed from
    the FHDB.
 
WORKAROUND: 
    After consolidation, run 
        migmdclean -a 0 -R <hsmname> <volume label>.<method>
    for each volume that was consolidated.
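
    When several volumes were consolidated, the migmdclean command from the
    workaround can be run in a loop, as sketched below.  The function name
    and the MIGMDCLEAN override variable are hypothetical, and the sketch
    assumes that all listed volumes use the same method.

```shell
# Sketch: run the workaround command once per consolidated volume.
# Usage: clean_consolidated hsmname method label [label ...]
# MIGMDCLEAN may point at the binary; by default it is found via PATH.
clean_consolidated() {
    if [ $# -lt 3 ]; then
        echo "usage: clean_consolidated hsmname method label [label ...]" >&2
        return 1
    fi
    hsmname=$1
    method=$2
    shift 2
    for label in "$@"; do
        # Stop at the first failure so it can be investigated.
        ${MIGMDCLEAN:-migmdclean} -a 0 -R "$hsmname" "$label.$method" || return 1
    done
}
```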
 
Description: 
    If multiple migcopy processes are out of tape or optical volumes and
    are requesting new volumes from the HSM or scratch pool, the same volume
    may get assigned to two of them.  When this happens, one migcopy will
    overwrite the data the other migcopy has written on the volume.
    This same problem can happen during consolidation.
 
WORKAROUND: 
     Do not use scratch pools and use different pool names for each
    managed filesystem.
    Use different pool names for each copy and stripe if they are using
    a similar method.
 
Description: 
    migVSMshutdown will fail if there are multiple instances of the Java GUI
    (migsa, migfb, migam).
 
    The error will be
    /bin/nawk: syntax error at source line 4
 
Description: 
    A problem occurs on a solaris_dm platform when a non-root user executes
    migstage and some of the files being staged will be cached back using
    the NB method.
 
    In this case, the NB method files are cached one
    at a time via a DMAPI read event on the file. This
    takes too long and defeats the intent of migstage.  
    Migstage is supposed to cache all the files from 
    one NB method volume in one operation.
 
Description: 
    Migcopy will erroneously indicate that a copy of a migrated file has been
    successfully made when:
    (1) the method is nb (NetBackup Method); and
    (2) at least one file in the worklist of files to be copied contains
        white space in the file name.
    This file and all files in the worklist after this file will be
    erroneously flagged as having been successfully copied.
 
    WARNING:  THESE FILES CAN NOW BE PURGED, RESULTING IN LOSS OF DATA FOR 
    THESE FILES.
 
     This problem exists on HP, SGI, and Solaris DMAPI installations of VSM.
    This problem does not exist on the kernel-based (non-DMAPI, UFS) version
    of VSM on Solaris.
 
== 110979-01 ===
Description:
    Pool names longer than 14 characters do not work correctly.  The tpreq
    command, as invoked by migcopy, fails with the log message:
 
    "Request terminated because of volume pool mismatch".
 
    Although NetBackup's use of Media Manager allows a 20 character pool name,
    VSM limits pool names to 16 characters.  This fix will allow up to 16
    character pool names when VSM is used.  The discrepancy between NetBackup's
    and VSM's Media Manager pool name length will be addressed in a future fix.
 
Workaround:
    Use pool names of 14 characters or less.
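
    A pool name can be checked against the limit before it is configured.
    The sketch below is illustrative only.  The function name and the
    optional second argument are assumptions; the default of 14 comes from
    the workaround, and 16 would match the limit described for the fix.

```shell
# Sketch: fail if a pool name is longer than the allowed length.
# Usage: check_pool_name name [max_length]   (max_length defaults to 14)
check_pool_name() {
    max=${2:-14}
    # wc -c counts bytes, which equals characters for ASCII pool names.
    len=`printf '%s' "$1" | wc -c | awk '{print 0+$1}'`
    if [ "$len" -gt "$max" ]; then
        echo "pool name '$1' is $len characters; the limit is $max" >&2
        return 1
    fi
}
```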
 
Description:
    If the volume pool name is more than 14 characters long, migdbrpt will
    show garbage at the end of the pool name. 
 
Workaround: 
    The only workaround is to use pool names of fewer than 15 characters.
 
Description:
    VSM will only try once to read a granule from copy 1 before trying
    copy 2. If there are occasional read errors, this can cause VSM to wait
    for a vault copy of the file when a second try at copy 1 would have worked.
    The file <database_dir>/<hsm_name>.GRAN_RETRY must exist for the retry
    feature to work.
 
Description:
    migVSMstartup and migVSMshutdown process managed filesystems one at a
    time.  This slows down the startup and shutdown of VSM. migVSMstartup and
    migVSMshutdown need to process all managed filesystems in parallel.
 
 
Description:
    When VSM is managing more than one filesystem, volumes may become
    registered to an hsmname but NOT be assigned in the media manager database.
    This can cause VSM to keep trying to register the same volumes in the
    scratch pool over and over.
 
Workaround:
    There are two possible workarounds:
 
    1)  Register volumes to each hsmname so that migcopy does not have to
    select volumes from the scratch pool.
 
    2)  Assign different pool names to volumes so each hsmname is using
    a different pool name.
 
Description:
    On Solaris VSM platforms there is code in VSM for special handling 
    of requests from an NFS daemon and code in the NFS daemon to specially
    handle migrated and purged files.  This introduces a delay in completing 
    NFS client requests involving files purged by VSM.  This delay is introduced 
    even in the VSM "easy case" where the file is migrated but the data is still 
    on disk (not yet purged).  This fix eliminates the delay in the VSM "easy case".
 
Description:
    The migcopy process will not process the COPYDB work list if the first
    file listed has no active FHDB entry.
 
    Log entries will look like:
 
      ERROR: No source for 708540M3e5c
      ERROR: Failed to copy 0x3E5C; tret = 2
      ERROR: copy_for_method() ret=7000
      ERROR: write on destination volume failed, will not try next file
      Finished 7000
 
Description:
    The VSM Java GUI could display an incorrect percentage of space
    used on filesystems that are very large.  
 
Description:
    If a tape write error occurs, all files written since the last file mark
    was written will be in the following state:
 
    The copydb work list will indicate a copy needs to be made.
    The file's DM attributes will indicate not enough copies have been made.
    The file will have FHDB entries for the tape volume.
 
 
    The next time migcopy runs to make copies, the migcopy copydb work list
    will be marked complete because there is an existing FHDB entry.
    The file's DM attributes will still indicate there are not enough copies
    and the file will still have FHDB entries for the file.
    These files will stay in this state and will not be purged as they
    do not have enough good copies.
 
Workaround:
    Set <database>/<hsmname>.FLUSH to contain a 1.
    This will cause a tape mark to be written after each file.
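
    Writing the FLUSH file can be sketched as below.  The function name is
    hypothetical, and the file is taken to be <database>/<hsmname>.FLUSH,
    following the naming used elsewhere in this README.  The caller supplies
    the database directory.

```shell
# Sketch: make migcopy write a tape mark after each file by storing 1 in
# the hsmname.FLUSH file in the database directory.
# Usage: set_flush_every_file database_dir hsmname
set_flush_every_file() {
    dbdir=$1
    hsmname=$2
    if [ ! -d "$dbdir" ] || [ -z "$hsmname" ]; then
        echo "usage: set_flush_every_file database_dir hsmname" >&2
        return 1
    fi
    echo 1 > "$dbdir/$hsmname.FLUSH"
}
```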
 
Description:
    When consolidating a tape that has many files on it, migtrans will
    appear to be hung in a strlen function. migtrans is not hung; it is
    sorting the tape volume list by file mark number. This can take a long
    time when there are a lot of files on the tape volume.
 
Additional Notes:
    If the following file: <Database Pathname>/database/<hsmname>.GRAN_RETRY 
    is defined, then migcopy will only try twice to read a granule from file 
    copy one.
 
Copyright (C) 2002 VERITAS Software Corporation. All Rights Reserved.
VERITAS, VERITAS SOFTWARE, the VERITAS logo, Business Without Interruption,
VERITAS The Data Availability Company, NetBackup, NetBackup DataCenter,
NetBackup BusinesServer and VERITAS Storage Migrator for Unix
are trademarks or registered trademarks of VERITAS Software Corporation
in the US and/or other countries. Other product names mentioned herein may
be trademarks or registered trademarks of their respective companies.

README -- Last modified date:  Thursday, April 17, 2003