High Availability for SAP Central Services using Pacemaker on SLES
Scope of the document
- Plan a SUSE Linux Enterprise High Availability platform for SAP NetWeaver central services (Message Server, Enqueue Server and Enqueue Replication Server)
- Set up a two-node cluster on VMware hosts using the fencing agent fence_vmware_rest
- Integrate the high availability cluster with the SAP control framework via sap_suse_cluster_connector
- Set up the SAP HA resources
Note: This document does not deal with the installation of SAP services. It is assumed the SAP system is already installed.
SAP Landscape Layout
For simplicity we will assume the following architecture:
- 1 primary app server (PAS)
- Two node cluster for central services including ERS
- 1 DB server
Install the SUSE cluster connector
The SAP SUSE cluster connector is of special interest. It integrates the cluster with the sapstartsrv service so that an administrator can control the cluster through standard SAP maintenance tools such as sapcontrol, SAP MMC and SAP LaMa.
The new version of sap_suse_cluster_connector can start, stop and migrate an SAP instance. Install it on both central services nodes.
Note: You need root privileges to run these commands.
zypper in sap-suse-cluster-connector
Install HA packages for SUSE
This installs all packages related to the Pacemaker setup.
zypper in -t pattern ha_sles
Setup HA Cluster
On Node-1
ha-cluster-init
=> When asked "Do you wish to use SBD (y/n)?", select NO (we are not using a shared block storage device; we use NFS-based storage instead).
=> Give the IP address of the private NIC. This configures the ring address.
Change the hacluster password if needed (default: 'linux')
passwd hacluster => give the new password
Join Cluster from Node-2
Note: This step requires ssh keys to be created and distributed between node1 and node2. You can also let the ha-cluster-join command create the keys for you.
ha-cluster-join => give the private IP of Node 1 when asked
Start pacemaker and corosync services on both nodes
systemctl restart corosync
systemctl restart pacemaker
STONITH configuration using the VMware fencing agent
This crm resource fences a failed node: it powers the affected VM off or on through vCenter in case of errors or failover.
Create the vSphere STONITH resource
crm configure primitive stonith-vsphere stonith:fence_vmware_rest params ipaddr=<vcenter ip> ssl_insecure=1 login="administrator@vsphere.local" passwd="******" pcmk_host_check=static-list pcmk_host_list="node-1,node-2" op monitor interval=300 timeout=360
Create Clone for stonith resource
crm configure clone cln_stonith-vsphere stonith-vsphere meta is-managed=true clone-node-max=1 target-role=Started
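Before relying on fencing, it can help to confirm that the STONITH device really covers both nodes. The sketch below asks Pacemaker's stonith_admin which hosts a device can fence and compares the answer with the expected node names. The function name check_fence_targets is hypothetical, and the device id and node names are simply the ones used in the commands above; run `stonith_admin --list-registered` to see the ids your cluster actually reports.

```shell
# Hypothetical helper: verify that a registered fence device lists every
# expected node as a target. Assumes stonith_admin prints the host names
# as separate words, one target per line.
check_fence_targets() {
    device=$1
    shift
    targets=$(stonith_admin --list-targets "$device") || return 1
    for node in "$@"; do
        # -w: whole-word match, -F: treat the node name as a fixed string
        if ! printf '%s\n' "$targets" | grep -qwF -- "$node"; then
            echo "fence device $device does not list $node" >&2
            return 1
        fi
    done
    return 0
}

# Example call (device id and node names are assumptions):
# check_fence_targets stonith-vsphere node-1 node-2
```

If a node is missing, the usual suspect is a mismatch between the cluster node names in pcmk_host_list and the VM names known to vCenter.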
Setup CRM resources for SAP services and IPs
In this step we define the standard services that the cluster should manage. Refer to figure 2.
For SAP central services HA we manage the following resources:
- SAP_ERS
- SAP_ASCS
- IP_ERS
- IP_ASCS
Prerequisite: Make the following changes to the SAP profiles before configuring the crm resources.
Update SAP profiles
Add the following lines to DEFAULT.PFL
service/halib = $(DIR_CT_RUN)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_suse_cluster_connector
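If you script this step, it is worth making it idempotent so that a second run does not duplicate the entries. A minimal sketch, assuming the profile lives under /sapmnt/SID/profile; the helper name ensure_profile_line is made up for illustration:

```shell
# Hypothetical helper: append a line to a profile only if an identical
# line is not already present.
# Usage: ensure_profile_line <profile-file> <line>
ensure_profile_line() {
    profile=$1
    line=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example calls (the path is an assumption; replace SID with your system ID):
# ensure_profile_line /sapmnt/SID/profile/DEFAULT.PFL 'service/halib = $(DIR_CT_RUN)/saphascriptco.so'
# ensure_profile_line /sapmnt/SID/profile/DEFAULT.PFL 'service/halib_cluster_connector = /usr/bin/sap_suse_cluster_connector'
```

The single quotes matter: they stop the shell from trying to expand $(DIR_CT_RUN) as a command substitution.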
Edit the ASCS/SCS profile as follows
vi /sapmnt/SID/profile/SAP_ASCS00_sid-ascs
# Change the restart command to a start command
#Restart_Program_01 = local $(_EN) pf=$(_PF)
Start_Program_01 = local $(_EN) pf=$(_PF)
# Add the keep alive parameter, if using ENSA1
enque/encni/set_so_keepalive = true
Edit the ERS profile as follows
sudo vi /sapmnt/SID/profile/SAP_ERS01_sid-ers
# Change the restart command to a start command
#Restart_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)
Start_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)
# Remove Autostart from the ERS profile
# Autostart = 1
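The two profile edits can also be scripted instead of done by hand in vi. The sketch below is one possible way with sed, assuming the enqueue and replication entries reference $(_EN) or $(_ER) as in the listings above; convert_profile is a hypothetical name. It only touches those entries, leaves other Restart_Program lines alone, comments out Autostart, and writes to stdout so you can review the result before replacing the profile.

```shell
# Hypothetical helper: print a converted copy of an instance profile.
#   Restart_Program_NN = local $(_EN) ...  ->  Start_Program_NN = local $(_EN) ...
#   Restart_Program_NN = local $(_ER) ...  ->  Start_Program_NN = local $(_ER) ...
#   Autostart = 1                          ->  # Autostart = 1
# The original file ($1) is not modified.
convert_profile() {
    sed -E \
        -e '/\$\(_E[NR]\)/s/^Restart_Program_([0-9]+)/Start_Program_\1/' \
        -e 's/^(Autostart[[:space:]]*=.*)$/# \1/' \
        "$1"
}

# Example (path is an assumption):
# convert_profile /sapmnt/SID/profile/SAP_ERS01_sid-ers > /tmp/ers_profile.new
# diff /sapmnt/SID/profile/SAP_ERS01_sid-ers /tmp/ers_profile.new
```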
Put cluster in maintenance mode
crm configure property maintenance-mode="true"
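If the following steps are run from a script, a small guard can confirm that the property really is set before any resources are added. This is only a sketch: in_maintenance_mode is a hypothetical name, and it relies on crm_attribute printing just the value when called with --quiet.

```shell
# Hypothetical helper: succeed only if the cluster property
# maintenance-mode is currently "true".
in_maintenance_mode() {
    value=$(crm_attribute --type crm_config --name maintenance-mode --query --quiet 2>/dev/null)
    [ "$value" = "true" ]
}

# Example use before adding resources:
# in_maintenance_mode || { echo "cluster is not in maintenance mode" >&2; exit 1; }
```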
Configure IP resources for both ERS and ASCS instances
crm configure primitive rsc_ip_SID_ASCS IPaddr2 params ip=<vip for ASCS> op monitor interval=10s timeout=20s
crm configure primitive rsc_ip_SID_ERS IPaddr2 params ip=<vip for ERS> op monitor interval=10s timeout=20s
Configure SAP resources for both ERS and ASCS instances
crm configure primitive rsc_sap_SID_ERS SAPInstance \
  operations \$id=rsc_sap_SID_ERS-operations \
  op monitor interval=11 timeout=60 on-fail=restart \
  params InstanceName=SID_ERS01_sapersSID START_PROFILE="/sapmnt/SID/profile/SID_ERS01_sapersSID" AUTOMATIC_RECOVER=false IS_ERS=true \
  meta priority=1000
crm configure primitive rsc_sap_SID_ASCS SAPInstance \
  operations \$id=rsc_sap_SID_ASCS-operations \
  op monitor interval=11 timeout=60 on-fail=restart \
  params InstanceName=SID_ASCS00_sapascsSID START_PROFILE="/usr/sap/SID/SYS/profile/SID_ASCS00_sapascsSID" AUTOMATIC_RECOVER=false \
  meta resource-stickiness=5000 failure-timeout=60 migration-threshold=1 priority=10
Configure Groups to club the resources
This keeps the IP and SAP resources for ASCS together in one group, and likewise those for ERS.
crm configure group grp_SID_ASCS rsc_ip_SID_ASCS rsc_sap_SID_ASCS meta resource-stickiness=3000
crm configure group grp_SID_ERS rsc_ip_SID_ERS rsc_sap_SID_ERS
Configure the colocation constraints between ASCS and ERS
This step is of utmost importance, since here we define the constraints that govern failover of the services. The negative colocation score keeps the ASCS and ERS groups (including their IP resources) on different nodes during normal operation. The location rule makes ASCS fail over to the node where ERS is running, so that it can take over the replicated enqueue lock table. The order constraint ensures that on failover ASCS is started first, and only then is ERS stopped.
crm configure colocation col_sap_SID_no_both -5000: grp_SID_ERS grp_SID_ASCS
crm configure location loc_sap_SID_failover_to_ers rsc_sap_SID_ASCS rule 2000: runs_ers_SID eq 1
crm configure order ord_sap_SID_first_start_ascs Optional: rsc_sap_SID_ASCS:start rsc_sap_SID_ERS:stop symmetrical=false
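Because every resource and constraint name embeds the SID, a typo in one of them is easy to make and hard to spot. One way to reduce that risk is to generate the constraint definitions from a single variable and load them from a file. The sketch below assumes the naming scheme used in this document; render_constraints is a hypothetical helper and HA1 is just an example SID.

```shell
# Hypothetical helper: print the three ASCS/ERS constraints for a given SID
# in crm configure syntax (without the leading "crm configure").
render_constraints() {
    sid=$1
    cat <<EOF
colocation col_sap_${sid}_no_both -5000: grp_${sid}_ERS grp_${sid}_ASCS
location loc_sap_${sid}_failover_to_ers rsc_sap_${sid}_ASCS rule 2000: runs_ers_${sid} eq 1
order ord_sap_${sid}_first_start_ascs Optional: rsc_sap_${sid}_ASCS:start rsc_sap_${sid}_ERS:stop symmetrical=false
EOF
}

# Example (SID and file name are assumptions):
# render_constraints HA1 > /tmp/sap_constraints.crm
# crm configure load update /tmp/sap_constraints.crm
```

Reviewing the generated file before loading it gives you one more chance to catch a wrong group or resource name.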
At this point your cluster setup for SAP central services is complete. You may now bring the cluster out of maintenance mode
crm configure property maintenance-mode="false"
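Once the cluster is active again, you can check from the SAP side whether sapstartsrv has picked up the cluster connector. The sketch below wraps two sapcontrol HA functions; check_sap_ha is a hypothetical name, and the instance numbers 00 (ASCS) and 01 (ERS) are taken from the profile names used in this document. Run it as the <sid>adm user.

```shell
# Hypothetical helper: query the HA integration of one SAP instance.
# $1 is the two-digit instance number.
check_sap_ha() {
    nr=$1
    # Shows whether HA is active and which cluster connector is in use
    sapcontrol -nr "$nr" -function HAGetFailoverConfig
    # Runs the HA configuration checks provided by the connector
    sapcontrol -nr "$nr" -function HACheckConfig
}

# Example calls (instance numbers are assumptions):
# check_sap_ha 00
# check_sap_ha 01
```

The HAGetFailoverConfig output typically contains an HAActive field; if it does not report TRUE, recheck the two service/halib entries in DEFAULT.PFL.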
Now you can test your cluster using crm commands, by bringing down the SAP services manually, or by bringing down one of the hosts. Detailed cluster testing is not part of this document. You can use "crm status" and "crm_mon" to check the cluster resource setup and status.
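As a starting point, a simple and fairly gentle test is to put one node into standby, watch where the resources go, and bring the node back. The following is a sketch only, not a test plan: failover_test is a hypothetical name, the node name is an assumption, and the default wait time is an arbitrary guess that you should adapt to your start and stop timeouts.

```shell
# Hypothetical helper: move resources off a node via standby, show the
# resulting cluster state, then bring the node back online.
# $1 = node name, $2 = seconds to wait before checking (default 60)
failover_test() {
    node=$1
    wait_s=${2:-60}
    crm node standby "$node" || return 1
    sleep "$wait_s"
    # -1: print status once and exit, -r: also show inactive resources
    crm_mon -1 -r
    crm node online "$node"
}

# Example call (node name is an assumption):
# failover_test node-1
```

After the standby, the ASCS group should be running on the other node and the two groups should again end up on different nodes once the first node is back online.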