Update to RHEL 7.4 breaks DB2 Cluster (TSAMP / RSCT)
After updating a RHEL 7.3 system to RHEL 7.4, the DB2 HADR / TSAMP cluster stopped working. I was able to establish the HADR connection, but the service IPs were not assigned to the network interface.
Running “lssam” to display the cluster state showed that nearly all resources were in the “Pending online” state.
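A quick way to see how many resources are stuck is to filter the lssam output. This is only a minimal sketch: it assumes lssam prints one resource per line with the state text (for example "Pending online") on that line, and the helper name count_pending is made up for illustration.

```shell
#!/bin/sh
# count_pending: hypothetical helper. Reads lssam-style output on stdin and
# prints the number of lines that contain the "Pending online" state.
count_pending() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the helper usable in scripts running with "set -e".
    grep -c 'Pending online' || true
}

# Usage on a cluster node:
#   lssam | count_pending
```

A result of 0 means no resource is reported as pending; anything higher matches the situation described above.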
At first I thought it was an issue with the cluster config, so I deleted the HA domain and recreated it.
What I did to recreate the config:
Note: It is always a good idea to use XML files for the config instead of the interactive “db2haicu” command-line tool! Recreating a cluster then takes only a few minutes.
On the standby side:
db2haicu -f db2config_standby.xml
-> This worked fine.
On the primary side:
db2haicu -f db2config_primary.xml
-> This did not work and failed with an error after a very long time!
db2diag.log showed the following error / warnings:
2017-11-20-10.39.08.311255+000 I58687E428 LEVEL: Warning
PID : 37088 TID : 140686421243776 PROC : db2havend (db2ha)
INSTANCE: stinst1 NODE : 000
HOSTNAME: server1.server.com
FUNCTION: DB2 UDB, high avail services, db2haAddResource, probe:12318
DATA #1 : Error adding resource db2_stinst1_stinst1_STSC-rs to group db2_stinst1_stinst1_STSC-rg, resource handle is NOT valid

2017-11-20-10.27.12.993773+000 E56921E880 LEVEL: Error
PID : 34868 TID : 140601694013312 PROC : db2havend (db2ha)
INSTANCE: stinst1 NODE : 000
HOSTNAME: server1.server.com
FUNCTION: DB2 UDB, high avail services, db2haMapResourceNameToHandle, probe:22152
MESSAGE : RM-specific error detected during query. If the affected RM is not the owner of the resource being queried then this is not a fatal error, but otherwise the operation leading to this error will fail.
DATA #1 : String, 8 bytes
IBM.Test
DATA #2 : String, 27 bytes
db2_stinst1_server1_0-rs
DATA #3 : String, 0 bytes
Object not dumped: Address: 0x0000000000000000 Size: 0 Reason: Address is NULL
DATA #4 : String, 87 bytes
2610-415 Cannot execute the command. The resource manager IBM.TestRM is not available.
DATA #5 : unsigned integer, 4 bytes
262154
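The key line in these entries is the 2610-415 message naming a resource manager that is not available. To pull those names out of a large db2diag.log, something like the following could be used. It is a rough sketch: the helper name unavailable_rms is made up, and it assumes the 2610-415 message sits on a single line as in the entry above.

```shell
#!/bin/sh
# unavailable_rms: hypothetical helper. Reads db2diag.log text on stdin and
# prints each resource manager named in a 2610-415 message, once per name.
unavailable_rms() {
    grep '2610-415' \
        | sed -n 's/.*The resource manager \([A-Za-z0-9_.]*\) is not available.*/\1/p' \
        | sort -u
}

# Usage (replace the placeholder path with the instance's real diagnostic log):
#   unavailable_rms < /path/to/db2diag.log
```

For the log above this would print IBM.TestRM, which is the part worth mentioning when talking to IBM support.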
It was pretty obvious that this was related to the RHEL 7.4 update. While digging deeper into the problem together with IBM, we found the solution in the following developerWorks article:
I decided to install the eFix, and this solved the problem. After running the db2haicu command again, all resources showed up as online!
Next time it is NOT necessary to recreate the DB2 cluster config. Just install the eFix and the problem is gone!