Saturday, August 11, 2007

IBM: TSM (ADSM) back-up - TCP/IP connection failure

J1gh2 (MIS)
11 Aug 04 7:15
Hi folks

For a few months now, a level 0 backup has failed about once a week with "ANS1017E (RC-50) Session rejected: TCP/IP connection failure". The error appears in the database log file, but there is nothing in dsmerror.log or dsmsched.log.

The TSM server is healthy in all other respects, and all other scheduled backups complete after this. We are running TSM 5.1 on AIX 5.2. We also run TSM 4.2 on AIX 4.3, but that setup does not show the problem.

Since it happens intermittently, it's difficult to set up a trace. The workload on the server is not particularly challenging, and topas shows ample idle time.

I would appreciate any help and suggestions. Thanks a lot.

LED888 (TechnicalUser)
12 Aug 04 22:17

Session rejected: TCP/IP connection failure [Same as ANS1017E]
This is what the client sees and reports, but the client has no idea why.
The cause is best sought in the ADSM server Activity Log for that time.
It could be a real datacomm problem, or...
Most glaring cause: the TSM server is down.

If you get this condition after supposedly changing the client and server to use a different port number (e.g., 1502), and the Activity Log has no significant information about the attempted session, use 'netstat', 'lsof', or a similar utility on the server operating system to verify that the *SM server is actually listening on the port number you believe it should be. (You *did* code the port numbers into both the client and server options files, right?)
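A small Python sketch of the "did you code the port into both files" check, assuming the client and server options are available as text (for example dsm.sys and dsmserv.opt), that there is a single server stanza, that the option is written out in full as TCPPORT, and that lines starting with '*' are comments. It does not replace netstat or lsof on the server; it only confirms that both sides were configured with the same number, falling back to 1500, the usual TSM default, when TCPPORT is absent. The function names are hypothetical.

```python
def tcpport(options_text, default=1500):
    """Return the TCPPORT value found in TSM options-file text.

    Assumptions (not a full options parser): one server stanza, the
    option spelled out as TCPPORT in any case, '*' marks comment lines.
    Returns `default` when no TCPPORT line is present.
    """
    port = default
    for line in options_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue  # skip blank and comment lines
        parts = stripped.split()
        if parts[0].upper() == "TCPPORT" and len(parts) > 1:
            port = int(parts[1])  # last occurrence wins
    return port


def ports_match(client_opts, server_opts):
    """True when client and server options resolve to the same TCPPORT."""
    return tcpport(client_opts) == tcpport(server_opts)
```

For example, a client stanza containing `TCPPort 1502` and a server file containing `TCPPORT 1502` match, while a client that was changed to 1502 against a server still on the default does not.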
An administrator may have done a 'CANcel SEssion'.
If this happens during a backup, the server has most likely cancelled the session because a higher-priority task, such as a DB backup, started and needed a tape drive, particularly when there is a drive shortage. Look in the server Activity Log around that time and you will likely see "ANR0492I All drives in use. Session 22668 for node ________ (AIX) being preempted by higher priority operation."
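To check a long activity log for preemptions, a sketch like the following pulls the session number, node and platform out of each ANR0492I line. It assumes the log has been saved as plain text (for instance from a QUERY ACTLOG run in the administrative client) and that the message wording matches the example quoted above; the regular expression and function name are illustrative, not an official message format.

```python
import re

# Modelled on the ANR0492I text quoted above. Any date/time prefix on the
# line is skipped because search() can match anywhere in the line.
PREEMPT_RE = re.compile(
    r"ANR0492I .*?Session (\d+) for node (\S+) \(([^)]*)\) being preempted"
)


def preempted_sessions(actlog_text):
    """Return (session, node, platform) tuples for each ANR0492I line."""
    hits = []
    for line in actlog_text.splitlines():
        m = PREEMPT_RE.search(line)
        if m:
            hits.append((int(m.group(1)), m.group(2), m.group(3)))
    return hits
```

Comparing the returned session numbers and times against the ANS1017E failure on the client shows whether preemption explains it.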

Or look in the Activity Log for an "ANR0481W Session NNN for node () terminated - client did not respond within NN seconds." message, which indicates that the server COMMTimeout value is too low.

The message "ANR0482W Session for name () terminated - idle for more than N minutes." tells you that the server IDLETimeout value is too low. Remember that long-standing clients may take considerable time to rummage through their file systems looking for new files to back up.
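Both timeout messages can be picked out of a saved activity log in the same spirit. This hypothetical helper maps ANR0481W to COMMTimeout and ANR0482W to IDLETimeout, following the explanation above, and returns the number quoted in each message, which should reflect the current setting. It assumes the activity log is available as plain text and that the wording matches the examples quoted above.

```python
import re

# Server option each message points at, as described in the thread.
TIMEOUT_PATTERNS = {
    "COMMTimeout": re.compile(r"ANR0481W .*?did not respond within (\d+) seconds"),
    "IDLETimeout": re.compile(r"ANR0482W .*?idle for more than (\d+) minutes"),
}


def timeout_hits(actlog_text):
    """Return (option, value_in_message) pairs for each timeout message."""
    hits = []
    for line in actlog_text.splitlines():
        for option, pattern in TIMEOUT_PATTERNS.items():
            m = pattern.search(line)
            if m:
                hits.append((option, int(m.group(1))))
    return hits
```

Seconds are reported for COMMTimeout and minutes for IDLETimeout, matching the units in the two messages.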

Another problem is starting your client scheduler process from /etc/inittab without specifying redirection; you need:
dsmc::once:/usr/bin/dsmc sched > /dev/null 2>&1 # TSM scheduler
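When there are many machines to audit, a rough Python check of /etc/inittab can flag scheduler entries that lack redirection. It assumes the Identifier:RunLevel:Action:Command layout of AIX inittab entries and only tests that a stdout redirect and a stderr redirect are both present; it does not check their order, so `2>&1 > /dev/null` would slip through even though stderr then still goes to the original stdout. The function name is hypothetical.

```python
import re

# stdout redirect such as "> /dev/null" or "1>/tmp/sched.out", but not "2>&1".
STDOUT_RE = re.compile(r"(?:^|\s)1?>\s*[^&\s]")
# stderr redirect such as "2>&1" or "2>/dev/null".
STDERR_RE = re.compile(r"(?:^|\s)2>\s*(?:&1|[^&\s])")


def unredirected_scheduler_entries(inittab_text):
    """Return identifiers of inittab entries that run 'dsmc sched' without
    redirecting both stdout and stderr (presence check only, not order)."""
    bad = []
    for line in inittab_text.splitlines():
        fields = line.split(":", 3)  # Identifier:RunLevel:Action:Command
        if len(fields) != 4 or not fields[0].strip():
            continue  # blank, malformed, or commented-out line
        ident, _runlevel, _action, command = fields
        if not re.search(r"\bdsmc\s+sched", command):
            continue  # not a TSM scheduler entry
        if not (STDOUT_RE.search(command) and STDERR_RE.search(command)):
            bad.append(ident)
    return bad
```

Fed the correct line from the thread, it reports nothing; an entry such as `tsm::once:/usr/bin/dsmc sched` would be reported as `tsm`.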

An unusual cause is having the client and server defined to use the same port number!

It might also be a firewall rejecting the TSM client as it tries to reach the server through that firewall.
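One way to tell a firewall from a missing listener, sketched in Python as an illustration rather than a TSM tool: attempt a plain TCP connection from the client machine to the server address and port. A quick "refused" usually means the host answered but nothing is serving that port, while a "timeout" is typical of a firewall silently dropping packets, although routing problems look the same. The host and port are whatever your client options specify; the function name is hypothetical.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Try a TCP connection and classify the outcome.

    'open'        - something accepted the connection
    'refused'     - the host answered, but nothing listens on that port
    'timeout'     - no answer at all (often a firewall dropping packets)
    'unreachable' - any other socket error, e.g. no route to host
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "unreachable"
```

Running it both from a machine on the server's own network and from the client side of the firewall narrows down where the connection is being lost.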


J1gh2 (MIS)
13 Aug 04 5:03
Thanks a lot, LED888

I will follow up all the leads that you have given me and let you know.

Cheers
J1gh2 (MIS)
17 Aug 04 8:29
Hi LED888

I did indeed find "ANR0492I All drives in use..." in the activity log at the time of the TCP/IP connection failure. We are now installing more drives...

Thanks a lot for your help
LED888 (TechnicalUser)
19 Aug 04 20:37
That's great news!
