Patch-ID# 106454-01
Keywords: solstice backup networker
Synopsis: Solstice Backup 5.1_x86: Product Patch
Date: Jul/09/99

Solaris Release: 2.5_x86 2.5.1_x86 2.6_x86

SunOS Release: 5.5_x86 5.5.1_x86 5.6_x86

Unbundled Product: Solstice Backup

Unbundled Release: 5.1_x86

Relevant Architectures: i386

BugId's fixed with this patch: 4086246 4117627 4133424 4137841

Changes incorporated in this version: 

Patches accumulated and obsoleted by this patch: 

Patches which conflict with this patch: 

Patches required with this patch: 

Obsoleted by: 

Files included with this patch: 

/usr/sbin/nsr/ansrd
/usr/sbin/nsr/jb_config
/usr/sbin/nsr/jbexercise
/usr/sbin/nsr/mminfo
/usr/sbin/nsr/nsralist
/usr/sbin/nsr/nsrarchive
/usr/sbin/nsr/nsrcap
/usr/sbin/nsr/nsrclone
/usr/sbin/nsr/nsrd
/usr/sbin/nsr/nsrexec
/usr/sbin/nsr/nsrexecd
/usr/sbin/nsr/nsrindexd
/usr/sbin/nsr/nsrjb
/usr/sbin/nsr/nsrlmc
/usr/sbin/nsr/nsrlic
/usr/sbin/nsr/nsrmmd
/usr/sbin/nsr/nsrmon
/usr/sbin/nsr/nsrstage
/usr/sbin/nsr/nsrtrap
/usr/bin/nsr/nwbackup
/usr/bin/nsr/nwrecover
/usr/bin/nsr/pstclntsave
/usr/bin/nsr/recover
/usr/bin/nsr/save
/usr/sbin/nsr/savefs
/usr/sbin/nsr/savegrp
/usr/sbin/nsr/scanner
/usr/sbin/nsr/tapeexercise
/usr/bin/nsr/nwretrieve

Problem Description: 

o	nsrmmd repeats message "Diagnostic:
             recvfd(4) fails, errno=6" and consumes all available CPU cycles.
             Initial diagnosis was that the Networker server was also a 
             storage node and there was contention between server managed 
             nsrmmd processes and nsrmmd processes managed by the storage
             node. However, the errors persisted even when the system was
             only a server and no longer a storage node.  The errors occurred
             because the set_mm_control function is called only for nsrmmd
             processes that are not local to the server.  Set_mm_control is
             now called for nsrmmd processes controlled by the server and the
             storage node.

o	Issuing commands to a jukebox attached to a storage node
             fails. If multiple nsrmon requests are pending for a single
             nsrmmd, and the first nsrmon request did not terminate properly
             due to a time out, all subsequent requests to nsrmmd will fail.

             The following error message appears in the daemon.log:
             "nsrmon <pid>: auth failed: invalid storage node proposed:
             <hostname>".

             NOTE: This error message can still occur and is typically
                   caused by network related problems.

o	Recover does not use the "Server network interface" field
             defined in the client's resource record.  This may cause recover
             data not to traverse the desired physical network between the 
             client and the server.

             Recover now uses the "Server network interface" field.

o	The correct browse time does not display in the nwrecover
             window when a valid browse time is entered in the 'Change Browse
             Time' window.

o	CPU utilization goes up 7% for each running nwadmin.
             When nwadmin refreshes, it polls nsrd which queries for storage
             nodes and high speed devices.  This query can be CPU intensive
             when there are a large number of storage nodes and high speed
             devices. This information is now cached.
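
             The caching idea can be pictured with a minimal Python sketch.
             The patch does not document how nsrd caches this information,
             so the class name, the query function, and the 30-second
             lifetime below are all hypothetical.

```python
import time


class DeviceInfoCache:
    """Hold the result of an expensive query for a fixed lifetime (hypothetical)."""

    def __init__(self, query, max_age_seconds=30.0, clock=time.monotonic):
        self._query = query
        self._max_age = max_age_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        # Run the expensive query only when nothing is cached or the entry is stale.
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._value = self._query()
            self._fetched_at = now
        return self._value


calls = []


def query_devices():
    # Stand-in for the CPU-intensive storage node / device query.
    calls.append(1)
    return ["storage-node-a", "storage-node-b"]


cache = DeviceInfoCache(query_devices)
for _ in range(5):          # five refreshes in quick succession
    devices = cache.get()
```

             In this sketch the five refreshes trigger the underlying query
             only once; later refreshes reuse the stored result until it
             ages out.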

4086246	Using nwbackup, a directory is marked for backup. If the
             directory is then collapsed and expanded, it is no longer marked
             for backup.  The directory is not displayed or recoverable from 
             nwrecover.

o	Self-id licensing checks are calculated every time they are
             needed. Self-id licensing checks are costly to the server and 
             can degrade performance on servers with a large number of clients.
             Client license information is now stored.

o	During large savegrps, the client must re-authenticate with
             the server.  Extremely active networks can cause re-authentication
             to fail and the savegrp to abort due to the network traffic. 
             The re-authentication code has been changed to allow
             retries when a failure is received.
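
             A retry loop of this general kind might look like the Python
             sketch below. It is an illustration only: the function names,
             the attempt count, and the use of ConnectionError are
             assumptions, not the actual NetWorker re-authentication code.

```python
import time


def with_retries(operation, attempts=3, delay_seconds=0.0):
    """Call operation(); on ConnectionError, try again, up to `attempts` calls in total.

    `attempts` must be at least 1.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)   # brief pause before the next try
    raise last_error


# Hypothetical stand-in for re-authentication: fails twice, then succeeds.
outcomes = iter([ConnectionError("busy"), ConnectionError("busy"), "authenticated"])


def reauthenticate():
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


status = with_retries(reauthenticate)
```

             Without the loop, the first transient failure would end the
             operation; with it, the failure is reported only after every
             attempt has failed.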

o	Storage node nsrmmd's may be
             restarted unnecessarily. When polling events for multiple
             storage nodes occur at the same time, the nsrmon processes update
             the 'nsrmon info' attribute. This update can happen at the same
             time causing nsrd to receive an old and incorrect value for the
             attribute. This results in sequence number mismatch errors to
             the wrong caller and restarting of nsrmmd's. 

o	Nsrindexd would use 100% of the CPU when a savegroup was started.
             The savegroup had over 150 clients and the nsr.res file was
             large.  When the savegroup was started, the nsr.res file was
             opened for every client in the group.  Nsrindexd was
             spending most of its time parsing and allocating the in-core
             version of the database. Nsrindexd was changed to open the
             index and keep it open.

o	Nsrd core dumps during startup if the server has more than
             100 clients configured. The memory re-allocation routine was
             changed so that the check to expand the client name list
             happens at the 100th client, instead of the 101st.
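
             The off-by-one can be sketched in Python under one loudly
             stated assumption: that the client name list needs one extra
             slot beyond the names themselves (for example, a terminator).
             The patch does not describe the real data structure, so the
             class and its layout are hypothetical. Python raises
             IndexError where C code would write past the allocation.

```python
CAPACITY_STEP = 100   # slots added per expansion (assumed to match the text)


class ClientNameList:
    """Fixed-size slot array holding names plus one trailing None terminator."""

    def __init__(self, expand_at_step=True):
        self.slots = [None] * CAPACITY_STEP
        self.count = 0
        self.expand_at_step = expand_at_step

    def add(self, name):
        needed = self.count + 1                      # names held after this add
        if self.expand_at_step:
            must_grow = needed >= len(self.slots)    # grows at the 100th name
        else:
            must_grow = needed > len(self.slots)     # grows only at the 101st name
        if must_grow:
            self.slots.extend([None] * CAPACITY_STEP)
        self.slots[self.count] = name
        self.slots[self.count + 1] = None            # terminator; fails if no room
        self.count += 1


def load_clients(n, expand_at_step):
    names = ClientNameList(expand_at_step)
    for i in range(n):
        names.add(f"client{i}")
    return names


ok = load_clients(150, expand_at_step=True)          # expands in time
try:
    load_clients(100, expand_at_step=False)          # late check overruns the array
    late_check_failed = False
except IndexError:
    late_check_failed = True
```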

o       Save has been executed with the -x option to specifically cross
             mount points.  Recover and nwrecover do not list the directory
             entry for the mounted filesystem in the browse list.

             When listing files in recover, with ls, the directory entry
             for the mounted filesystem doesn't appear in the browse list.
             The directory can be added to the recover list and restored,
             even though the directory entry doesn't appear in the browse 
             list. If the directory is added to the recover list, the
             directory entry will then appear in the browse list during the
             next listing of files.

             Nwrecover does not display the directory entry for the mounted
             filesystem in the browse list.  The directory entry will appear
             if the absolute path is specified in the selection field. If
             a command line recover is performed, the directory can be added
             to the recover list and successfully restored.

o       Creating index entries with 'scanner -i' fails.
             Rebuilding index entries using 'scanner -i', with
             a tape that has valid data, fails with the following
             set of errors:

               write failed, Broken pipe
               fn # rn # read error Bad file number
               ssid #: NOT complete
               ssid #: # MB, # file(s)
               done with dlt tape volume.name
               error, rewind Bad file number

             An internal file descriptor becomes corrupted when scanner
             spans tapes. Once the file descriptor entry is corrupted,
             the child scanner is unable to read data from the pipe. The
             child then hangs and the parent tries to write data to the
             pipe. The write fails because the file descriptor no longer
             points to the child. The parent then tries to shut down the
             saveset processing and closes the pipe, which causes the
             "broken pipe" error.  Scanner then tries to rewind the
             tape, but is unable to because of the bad file descriptor.
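
             The two error conditions named above can be reproduced in
             isolation with a short Python sketch on a POSIX system. It
             does not model scanner itself; it only shows that writing to
             a pipe with no reader yields EPIPE ("Broken pipe") and that
             using a descriptor that is no longer valid yields EBADF
             (reported on Solaris as "Bad file number").

```python
import errno
import os

# Case 1: the read end is gone, so the writer gets "Broken pipe" (EPIPE).
read_fd, write_fd = os.pipe()
os.close(read_fd)
try:
    os.write(write_fd, b"saveset record")
    outcome = "written"
except BrokenPipeError:
    outcome = "broken pipe"
finally:
    os.close(write_fd)

# Case 2: a descriptor that has been closed is no longer valid (EBADF).
stale_fd, other_fd = os.pipe()
os.close(stale_fd)
bad_fd_errno = None
try:
    os.read(stale_fd, 1)
except OSError as err:
    bad_fd_errno = err.errno
os.close(other_fd)
```

             Python ignores SIGPIPE by default, which is why the failed
             write surfaces as an exception here instead of terminating
             the process.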
             
             'Scanner -i' fails when rebuilding the indexes on
             large savesets. File descriptors were not closed, 
             causing files to remain open. In the case of large savesets,
             many files remained open, preventing child processes from
             exiting. This has been one of the causes for the error
             "Pipe: too many open files" on HP-UX systems.

o       Trap received from NetWorker has incorrect values for IPADDRESS
             and TIMETICKS.

o       If an I/O error occurs when NetWorker is trying to read a tape,
             NetWorker may go into a loop continuously trying to read the
             same tape or NetWorker will try to read the tape and eventually
             disable the drive. Even though NetWorker detected an error
             with the tape, the volume was never marked 'full'.

             If a drive is marked read only and a backup is started, NetWorker
             will loop trying to load and unload the volume in that drive.

o       Additional checking has been added to nsrd.  NetWorker now
             verifies proper case syntax for pool names. If a pool name
             is typed with the wrong case, an error message is returned. 

             e.g.: # save -s alain -b "DEFAULT" /tmp
                   save: RAP error, There is no pool named `DEFAULT'.
                   save: Cannot open save session with alain

             The pool name should have been "default", all lowercase letters.

             Previously, NetWorker would wait for a tape in pool DEFAULT
             to become available.

                  08/06/98 10:28:16 nsrd: media critical event: backup to 
                  pool 'DEFAULT' waiting for 1 writable backup tape(s)
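
             The behavior change amounts to rejecting a pool name that
             does not match exactly, instead of accepting it and waiting.
             A minimal Python sketch, with a hypothetical function and a
             made-up set of configured pools:

```python
def find_pool(requested, configured_pools):
    """Return the pool name on an exact, case-sensitive match; otherwise raise."""
    if requested in configured_pools:
        return requested
    raise LookupError(f"There is no pool named `{requested}'.")


configured = {"default", "archive"}   # hypothetical pool names

matched = find_pool("default", configured)
try:
    find_pool("DEFAULT", configured)
    message = None
except LookupError as err:
    message = str(err)
```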

o       Nsrstage would abort the staging operation if a ssid/cloneid
             of an incomplete save set was specified. Nsrstage has been
             changed to skip the specified incomplete save set, instead
             of aborting.

o       Ssi module fails on Solaris 2.6 with an STK silo. Starting
             the ssi module (/usr/sbin/nsr/ssi) fails. Ssi_event.log contains
             the errors:

    03-20-98 15:58:14 SSI[O]

    ONC RPC:csi_init(): Initiation Started

    03-20-98 15:58:14 SSI[0]
    Found transient program number in rpctinit : 40000000
    03-20-98 15:58:14 SSI[0]
    Found transient program number in rpctinit : 40000000
    ONC RPC: csi_rpctinit(): status: STATUS_NI_FAILURE; failed: getsockname()
    03-20-98 15:58:14 SSI[O]:
    RPC: csi_main(): status:STATUS_PROCESS_FAILURE; failed: main()
    Initiation of CSI Failed;

o       Nsrjb is unable to import a tape via the import/export door 
             on ATL jukeboxes. Nsrjb -d will error with " source component 
             empty" or "invalid request, invalid element number".  The green
             light to load a tape never comes on.

o       When a device becomes disabled due to the error count being
             reached, nwadmin doesn't display the drive as disabled.

o       Restore of data fails on HP10x systems that performed backups
             with immediate save enabled. 

             The data backed up with immediate save enabled on an HP10x
             system is compromised and full backups should be performed
             to ensure valid backups.

o       The post processing commands executed by savepnpc do not always
             execute when a time is specified with 'timeout'. The failure
             has been seen on systems with a heavy load when post processing
             should begin.  Pstclntsave has been modified to handle 
             time comparisons on loaded systems.

o       Some systems that back up extremely compressed data have 
             experienced gaps in data being transferred. Data is not being
             lost, but a performance degradation has been noticed. The gaps
             in data transfer are caused by writing file marks synchronously to
             the tape. NetWorker will block when the buffers are flushed
             and the file mark is written, causing data transfer gaps on some
             systems. 

             File marks between data will now be written asynchronously. The
             end of data file mark will still be written synchronously, to 
             ensure the jukebox won't try to eject the tape before all the 
             buffers have been flushed. 

o       Scheduled cloning selects savesets that have already been 
             cloned, if the save sets have a date in the future.

o       When save processes a named pipe it incorrectly identifies it
             as a mount point because the device number of the named pipe
             is different from the filesystem containing the named pipe. 

             This will often cause recover browse time problems and recover
             failures. Sometimes the file can be recovered using command line
             recover specifying the time stamp, or using interactive recover
             and explicitly adding the named pipe to the recover list.  

             A new save binary has been produced to address this issue. 
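
             The patch does not say how save was changed. One way to avoid
             this class of mistake, shown here purely as an illustration,
             is to check the file type before comparing device numbers, so
             that only directories can be treated as mount points. The
             function name is hypothetical and the sketch assumes a POSIX
             system.

```python
import os
import stat
import tempfile


def looks_like_mount_point(path):
    """Heuristic: a directory whose device number differs from its parent's."""
    info = os.lstat(path)
    if not stat.S_ISDIR(info.st_mode):
        return False          # named pipes, regular files, etc. never qualify
    parent = os.lstat(os.path.dirname(os.path.abspath(path)))
    return info.st_dev != parent.st_dev


with tempfile.TemporaryDirectory() as workdir:
    fifo_path = os.path.join(workdir, "example.fifo")
    os.mkfifo(fifo_path)                                   # create a named pipe
    fifo_is_fifo = stat.S_ISFIFO(os.lstat(fifo_path).st_mode)
    fifo_is_mount = looks_like_mount_point(fifo_path)
    dir_is_mount = looks_like_mount_point(workdir)
```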

o       Changing the mode of a volume to recyclable does not mark the
             volume full or read-only.  If the volume that had its mode
             changed to recyclable was mounted and a backup started, the data
             from the backup would have been appended to the tape.
             
4133424 Running 'mminfo -r volid' on a backup that had been performed
             within 24 hours will cause mminfo to core dump.

o       Pstclntsave can't determine when a savegrp has been aborted. 

o           Savegrp is unable to back up >27 save sessions in parallel and
            receives the error "savegrp: Couldn't create log file
            /nsr/tmp/sg.grp_name.srvr_name.(xxxxx for client backup".

4117627     Jb_config will only accept the default device name. Entering
            a device name other than the default will result in an error
            and cause jb_config to loop prompting for another device name.

o           jb_config displays the error:

            Pathname 'entered_dev_name' is not valid. Please try again.
            Enter pathname of media drive

4137841     Configuring a SILO with jb_config fails with:

            jb_config: error, Unknown error
            There appears to be something incorrect about your jukebox
            installation. Please consult your device's documentation to help
            you troubleshoot your installation.

o           The relabeling of a jukebox tape fails with the error:

            nsrd: media alert event: Jukebox 'name' failed update handler
            got seq. number 'xxx', should be 'yyy'

o           Some old-style SBU Network Edition enablers are
            recognized as an SBU Single Server base enabler. If the
            system has been configured with >1 client and >1 device, nwadmin
            will error with "Server is disabled Too many devices".

            When a system has been changed to a Single Server, the
            server is reset to default resources: new resources cannot
            be created, custom resources are removed, and the server
            may be disabled.

o           Nsrstage or savegrp fails on a file type device when the seek
            reaches 2.3GB - 3GB. Errors reported by nsrstage or savegrp:

            nsrd: media info: cannot seek to record 'xxx' file 'yyy' on
            /file_type_device: Invalid argument

                                    or

            nsrd: media: info: cannot seek on /file_type: Invalid argument

o           Optimization of client licensing code.


Patch Installation Instructions: 
-------------------------------- 
Refer to the Install.info file for instructions on using the
generic 'installpatch' and 'backoutpatch' scripts provided with
each patch.  Any other special or non-generic installation
instructions should be described below as special instructions.


Special Install Instructions: 
----------------------------- 
None.