My client is now live, running Suite on HANA 2.0 on an IBM POWER & Storwize solution after a successful weekend migration.
The focus of this blog is what led my client to successfully migrate their SAP ECC environment to SAP HANA using capabilities inherent to IBM Power servers. These capabilities are available to every SAP HANA client who chooses IBM Power over the only other platform supported for SAP HANA workloads: Intel processors, running either bare-metal or virtualized. If virtualized, it will likely be VMware, which I refer to as a "compromise" solution full of gotchas, limitations, restrictions and constraints, whereas IBM Power using PowerVM is a "No Compromise" option. I'll give some examples to back up this bold statement below.
Back to my client. In the Fall of 2018, the client chose an IBM Power solution supporting four environments using a fully virtualized, two-site design. The client chose to deploy HANA in parallel to their existing SAP ECC environment, which runs on IBM i. Regarding storage, each site uses new IBM Storwize all-flash arrays. For high availability and resiliency, the solution uses SuSE HA clustering between a pair of Production servers, with SAP HANA System Replication running locally from Primary Prod to Failover Prod and then from Primary Prod to the DR server.
DR consists of a (very) large Scale-up IBM POWER8 server hosting all HANA DB and App (NetWeaver) VMs for each environment (Sandbox, Dev, QAS, etc.). Production uses a pair of smaller versions of the DR server just for the HANA DB VMs, plus a pair of POWER9 Scale-out servers, both hosting redundant App servers.
Each IBM Power system in this SAP HANA environment uses Dual VIOS, or Virtual I/O Servers. For the uninitiated, Dual VIOS means there are 2 special VMs which virtualize and manage the I/O for every other VM. A VM can technically use any combination of dedicated or virtual I/O, but typically when a VIOS is used it manages both network and storage I/O, with one exception: I often see a client use a dedicated Fibre Channel adapter for physical tape connections. The benefits of implementing Dual VIOS are many. They require fewer adapters, leading to smaller servers and/or less I/O expansion, and they provide redundant I/O paths to network and storage while also increasing serviceability, since the client can do just about any kind of maintenance on the I/O path transparently to the workloads. This means very little downtime is ever required to service and maintain the I/O subsystem, including adding, removing, upgrading and configuring adapters and ports, updating drivers, etc. If a port or adapter fails, or if something were to happen to a VIOS (very rare), the redundant ports, adapters and VIOS are configured to automatically service the I/O from the remaining resources. There are many options for deploying redundant VIOS, from active/passive to active/active, for both network and storage I/O. Another benefit of virtualizing the I/O is that it enables features such as Live Partition Mobility and Simplified Remote Restart ….. no compromises, remember?!
I should disclose that my company has an SAP migration, consulting and managed services practice. We were selected to provide both the infrastructure implementation and the SAP ECC to HANA migration services. Starting with the lower environments, my SAP services team began late last year (2018) and concluded with Prod in May 2019. This client wanted each environment to be a full copy of the HANA DB, whereas it is common for clients to make the lower environments smaller. Our migration and infrastructure teams worked together at every step: creating additional VMs, adding storage and mount points, and dialing in cores and memory for every HANA DB and App VM in each environment.
With IBM POWER8 and IBM POWER9 servers, SAP states Production VMs are required to use Dedicated (or Dedicated Donating) cores, while non-Prod environments may use dedicated cores or Shared Processor Pools (SPP). This means clients can use every square inch of their IBM Power servers, dialing in the cores and memory. For Non-Prod, clients get finer granularity by sharing cores, leading to even greater resource efficiency. This leads to smaller and fewer servers (say it with me: "lower cost!"), which makes for very happy clients!
Contrast this with the alternative Intel solution and its two choices: bare-metal, meaning no virtualization benefits, or virtualization. Bare-metal means 1 OS image per physical server; hopefully your infrastructure provider or SAP consultant does not under-size the cores and memory, as remediation can be very costly (possibly new servers if the current system is already maxed out). If the market-leading virtualization product (i.e. VMware) is chosen, its VMs do not offer the granularity available from IBM Power systems with their ultra-secure and rock-solid Power Hypervisor (PHYP).
The alternative virtualization product requires (i.e. limits or restricts) each VM to allocate cores in increments of full or ½ sockets. Let's say the HANA DB system is a 4-socket Intel server using 22-core processors, totaling 88 cores, with 1,536 GB RAM per socket, or 6 TB in total. If the HANA DB sizing called for 46 cores, you would be required to assign 3 sockets, or 66 cores, to a VM which only needs 46, wasting 20 cores plus all of the excess memory attached to that 3rd socket. Here is another example of waste with this option: suppose the HANA DB VM requires 3,200 GB of memory. Because this is 128 GB more than is physically attached to 2 sockets, you must allocate all 4,608 GB of memory attached to 3 sockets, as well as all 66 cores on those 3 sockets as previously described. That strands 1,408 GB of memory, unusable by any other VM on that server. Fortunately, the larger DIMMs used to achieve the needed capacities are cheap, so this waste is a drop in the bucket (in reality, these large DIMMs are NOT cheap at all!). SAP also states there is overhead incurred by this market-leading virtualization product. And if security is important, don't overlook its many vulnerabilities; the Intel Management Engine, VMware vSphere and Linux exposures, plus the recent Meltdown, Spectre, Foreshadow and ZombieLoad side-channel threats come to mind.
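To make the socket-granularity math concrete, here is a small sketch (plain POSIX shell arithmetic) using the hypothetical 4-socket, 88-core, 6 TB server from the example above, showing how a request gets rounded up to whole sockets:

```shell
#!/bin/sh
# Illustrative arithmetic only: the 4-socket example server described above.
CORES_PER_SOCKET=22
GB_PER_SOCKET=1536

# A 46-core request rounds up to whole sockets: ceil(46/22) = 3 sockets.
NEEDED_CORES=46
SOCKETS=$(( (NEEDED_CORES + CORES_PER_SOCKET - 1) / CORES_PER_SOCKET ))
ALLOC_CORES=$(( SOCKETS * CORES_PER_SOCKET ))
echo "sockets=$SOCKETS allocated_cores=$ALLOC_CORES wasted_cores=$(( ALLOC_CORES - NEEDED_CORES ))"

# A 3,200 GB memory request exceeds 2 sockets' worth (3,072 GB), so it
# drags in a 3rd socket's memory (4,608 GB total) and that socket's cores.
NEEDED_GB=3200
MSOCKETS=$(( (NEEDED_GB + GB_PER_SOCKET - 1) / GB_PER_SOCKET ))
ALLOC_GB=$(( MSOCKETS * GB_PER_SOCKET ))
echo "memory_sockets=$MSOCKETS allocated_gb=$ALLOC_GB stranded_gb=$(( ALLOC_GB - NEEDED_GB ))"
```

Running this prints 3 sockets / 66 cores / 20 wasted cores, and 4,608 GB allocated with 1,408 GB stranded, matching the figures above.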
Some of these security vulnerabilities come with a performance penalty. SAP fully supports IBM POWER8 & POWER9 using SMT8, while many recommend disabling hyper-threading on Intel servers. The SMT options are SMT8, SMT4, SMT2 and single-threaded (SMT off). To view the SMT level on SuSE or RedHat, use `ppc64_cpu --smt`; to change the SMT level to SMT4, for example, use `ppc64_cpu --smt=4`. Note the switch uses a double dash ("--smt"), as many editors will change that to a single long dash. The default for Linux on POWER8 should be SMT8, though there are some situations where the default is SMT4. For POWER9, all supported Linux distributions should default to SMT8. Clients are also able to change the SMT level dynamically, per VM (yes, I said per "VM"). This is a huge feature, unavailable on Intel.
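For reference, here is a sketch of those commands wrapped in a guard. `ppc64_cpu` ships with the powerpc-utils package on SLES and RHEL; the set operations are shown as comments since they require root on an actual Power VM:

```shell
#!/bin/sh
# Sketch: query (and optionally set) the SMT level on a Linux-on-Power VM.
# Guarded so it degrades gracefully on non-Power systems.
if command -v ppc64_cpu >/dev/null 2>&1; then
    ppc64_cpu --smt          # show the current SMT level, e.g. "SMT=8"
    # ppc64_cpu --smt=4      # (as root) drop this VM to SMT4 dynamically
    # ppc64_cpu --smt=8      # (as root) and back to SMT8
else
    echo "ppc64_cpu not found: not a Linux-on-Power system"
fi
```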
UPDATE (6/18/2019): SAP Note 2393917 states the following. "Due to the security vulnerability identified in CVE-2018-3646 VMware strongly advices customers to review and enable the recommendations indicated in VMware KB 55806. In particular, VMWare recommends that customers must ensure that after enablement, the maximum number of vCPUs per VM must be less than or equal to the number of total cores available on the system or the VM will fail to power on. The number of vCPUs on the VM may need to be reduced if it is to run on existing hardware. The number of vCPUs should be a factor of two. VMware is providing a tool to assist customers with the analysis of their VM configuration." This SAP Note is very explicit. Though they are not declaring that clients MUST disable hyper-threading, when a vendor states they "strongly advise" you to do something, they are essentially telling you to do it.
Here are a couple of articles on the performance impact here and here, but do your own internet research as well. SAP has been remediating their own cloud Intel environment, with details in SAP Note 2709955, but is copping out regarding what clients should do with their on-premise Intel servers. Instead, they defer to the relevant vendors (Intel, VMware, RedHat, SuSE, etc.) to determine what to do. It's not like hyper-threading is known for raw performance or throughput, for that matter, but if it delivers something that increases efficiency, that is a good thing. Lose hyper-threading and all you have left for threads are physical cores. That means less efficiency for the application, as HANA loves threads, which is why it scales so well on IBM Power. For Intel sizing, this likely requires more cores with associated memory, which leads to more sockets with more memory, which leads to larger, more expensive servers to obtain the desired scale. Using TDI Phase 5 sizing based on SAPS values, any sizing would need to be adjusted to compensate for not having hyper-threading. With IBM Power: size it, dial in the cores and memory, tune the OS, and re-use spare capacity for other VMs running Linux, AIX and IBM i (if supported by the chosen model) as needed. No compromises!
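To illustrate the sizing adjustment, here is a back-of-the-envelope sketch. The SAPS figures are entirely hypothetical placeholders (not benchmark results for any real processor), as is the assumed share of throughput contributed by hyper-threading; substitute numbers from your own sizing exercise:

```shell
#!/bin/sh
# HYPOTHETICAL TDI Phase 5 style sizing sketch; all SAPS numbers are
# made-up placeholders, not measurements.
TARGET_SAPS=120000        # sizing output for the HANA workload (assumed)
SAPS_PER_CORE_HT=2000     # per-core rating with hyper-threading on (assumed)
HT_UPLIFT_PCT=15          # assumed share of that rating contributed by HT

# With HT disabled, the effective per-core rating drops by the HT uplift.
SAPS_PER_CORE_NOHT=$(( SAPS_PER_CORE_HT * (100 - HT_UPLIFT_PCT) / 100 ))

# Round cores up in both cases (ceiling division).
CORES_HT=$(( (TARGET_SAPS + SAPS_PER_CORE_HT - 1) / SAPS_PER_CORE_HT ))
CORES_NOHT=$(( (TARGET_SAPS + SAPS_PER_CORE_NOHT - 1) / SAPS_PER_CORE_NOHT ))
echo "cores with HT: $CORES_HT, cores without HT: $CORES_NOHT"
```

With these placeholder values the core count climbs from 60 to 71, and on a socket-granular platform that difference can force an extra socket (with its memory) into the configuration.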
I thought I would create the table below comparing IBM's PowerVM with VMware's vSphere, using a couple of SAP Notes for each. It gets really complicated explaining Intel and VMware capabilities (and IBM Power with PowerVM, to a lesser degree), as what is supported varies by CPU architecture (Ivy Bridge, Haswell, Broadwell, Skylake and Cascade Lake for Intel; POWER8 vs POWER9 for IBM) as well as by VMware generation (vSphere pre-6.5, 6.5 and 6.7). I'll try to annotate the differences, but please reference the SAP Notes for specifics: 2652670, 2718982 and 2393917 for VMware vSphere on Intel; 2055470, 2230704, 2188482 and 2535891 for IBM Power using PowerVM.
Disclaimer: I am verifying some of the values shown in the table below and will update the table as needed. With new features being supported regularly, it can be a challenge to remain current.
REMINDER: This chart ONLY applies to Business Suite and not BW. I’d have to build another table for those differences as BW was not germane to this client or this blog.
| Capability | VMware vSphere (OLTP) | IBM PowerVM (OLTP) |
|---|---|---|
| Max VMs per system | 16** | POWER9 E950 & E980: 1–16 Production VMs*, 1–1008 VMs total (15 Prod + 993 Non-Prod)<br>POWER8 E870(C) & E880(C): 1–8 Production VMs*, 1–1000 VMs total (7 Prod + 993 Non-Prod)<br>1–6 Production VMs*, 1–920 VMs total (5 Prod + 915 Non-Prod)<br>POWER8 & POWER9 2-socket Scale-Out: 1–4 Production VMs*, 1–426 VMs total (3 Prod + 423 Non-Prod) |
| Max VM size | Up to 4 sockets (BW, SL, CL) | |
| VM size increments*** | 1, 2, 3 and 4 full sockets<br>½ socket (no multiples like 1½), though 2, 3, 4, 5, 6, 8 and 8½ sockets are supported | Dedicated & Dedicated Donating: 1-core increments<br>Shared Processor Pool (Non-Prod workloads): rule of thumb is 20 VMs per core |
| Threading | 2 threads per core (Hyper-Threading) | SMT8 per core or virtual processor |
| Max vCPU / threads per VM | 128 (6.5, 6.7)(BW)<br>128 (6.5, 6.7)(SL, CL)<br>224 (6.5, 6.7)(SL, CL) | Max cores per VM × SMT level = threads.<br>POWER8: if a VM uses more than 96 cores, set SMT=4; otherwise set SMT=8. Ex 1: 176 × 4 = 704 threads. Ex 2: VM1 = 96 × 8 = 768 threads and VM2 = 80 × 8 = 640 threads, 1,408 threads total.<br>POWER9: if a VM uses more than 48 cores, set SMT=4; otherwise set SMT=8. Ex 1: 128 × 4 = 512 threads plus 64 × 4 = 256 threads, 768 threads total. Ex 2: VM1 through VM4 at 48 × 8 = 384 threads each, 1,536 threads total.<br>Using SPP: cores × 20 × SMT level. POWER8 ex: 176 × 20 × 8 = 28,160 threads. POWER9 ex: 192 × 20 × 8 = 30,720 threads. |
| Max cores per VM | N/A | 176 cores (POWER8)<br>192 cores (POWER9) |
| Max memory per VM | 4 TB (6.5, 6.7)(BW)<br>6 TB (6.5, 6.7)(SL, CL) | 16 TB (POWER8)<br>24 TB (POWER9) |
| Memory allocation | Only memory attached to ½ or full sockets; if more memory is required, the underlying ½ or full sockets go with it | As long as the VM has the minimum memory allocated, memory increments can be as small as 1 MB |
| SAP could require reproducing issues on bare-metal (outside virtualization) | Yes | No |
| Min performance degradation from virtualization per SAP | 14% for ½-socket VMs<br>Avg of 10% over bare-metal | |

\* VIOS VMs do not count toward these totals.

\*\* Requires an 8-socket server to reach 16 VMs, each using ½ socket. Using full-socket VMs, the most possible on that same 8-socket server would be 8.

\*\*\* You can mix ½-socket and full-socket VMs on the same server. An example would be 4 × ½-socket VMs consuming 2 sockets plus 6 × 1-socket VMs consuming 6 sockets, totaling 8 sockets.

BW = Broadwell, SL = Skylake, CL = Cascade Lake
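The SMT thread arithmetic in the table can be sketched as a small shell helper; the core-count thresholds (96 for POWER8, 48 for POWER9) come straight from the guidance above:

```shell
#!/bin/sh
# Max threads per VM = cores x SMT level, where SMT drops from 8 to 4 once a
# VM exceeds 96 cores on POWER8 or 48 cores on POWER9 (per the table above).
threads_for_vm() {  # usage: threads_for_vm <power8|power9> <cores>
    gen=$1
    cores=$2
    if [ "$gen" = "power8" ]; then limit=96; else limit=48; fi
    if [ "$cores" -gt "$limit" ]; then smt=4; else smt=8; fi
    echo $(( cores * smt ))
}

threads_for_vm power8 176   # prints 704 (176 x 4)
threads_for_vm power8 96    # prints 768 (96 x 8)
threads_for_vm power9 128   # prints 512 (128 x 4)
threads_for_vm power9 48    # prints 384 (48 x 8)
```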
Back to the migration story. A chain is only as strong as its weakest link, and this client's environment is no different: their weak link happens to be their network. During the prep work for the Prod migration, they decided to move the primary Prod App server to the same frame hosting the Primary Prod HANA DB VM. Using Live Partition Mobility, they dynamically moved the App VM to that frame. This provided added network stability (because of their weak link) while reducing the chance of external network latency. It is difficult to coordinate downtime among the various stakeholders of a multi-billion-dollar company, not to mention the cost of downtime itself. Since they were migrating the database from the current ECC system over the network, the client liked having the option to granularly allocate resources and move VMs where they needed them. With IBM Power, clients have flexibility leading to fewer scheduled outages, as most maintenance and administration can be performed concurrently. Is anyone keeping score of all the advantages obtained by IBM Power? I've put many hash marks in its column while placing many X's in the column for the alternative platform.
Regarding the network traffic, the network adapters are 10 GbE optical, configured in the VIOS using Shared Ethernet Adapters (SEA), which provide a virtual switch. Traffic enters and leaves the server through the SEA, while network packets within the server are sent and received over the system's memory bus using a Power Hypervisor technology called Virtual Ethernet (VE). This makes data transfers from VM to VM within the frame very fast, with ultra-low latency and high efficiency. That is why the client wanted the App server sitting (logically, and I suppose physically as well) millimeters away from the HANA DB server.
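For readers curious what the SEA setup looks like, here is a sketch of the command involved, run on each Virtual I/O Server as the padmin user. The adapter names (ent0 = physical 10 GbE port, ent1 = virtual trunk adapter, ent2 = control channel) and the PVID are illustrative assumptions, not this client's actual configuration:

```shell
# Sketch only -- substitute your own adapter names and PVID.
# ha_mode=auto plus a control channel pairs the two SEAs across the dual
# VIOS for automatic failover.
mkvdev -sea ent0 -vadapter ent1 -default ent1 -defaultid 1 \
       -attr ha_mode=auto ctl_chan=ent2
```

Once the SEA exists on both VIOS, client VMs simply see a virtual Ethernet adapter; intra-frame traffic never leaves the hypervisor, which is the low-latency path described above.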
The export of the 24 TB database from the source system began just after midnight Friday night and took approximately 6 hours. They next moved to importing the data into the new environment, which took 24 hours. At the successful conclusion of the migration, they used LPM to move the App VM back to its original home on one of the POWER9 Scale-out servers. During the migration, the client chose to stuff more cores and memory into the App VM while it ran on the Scale-up server. The App VM was originally sized to use 8 cores and 256 GB RAM, but they called an audible and bumped it to 12 cores and 384 GB RAM. For those familiar (or not) with Power systems and common workloads, this is a lot of cores and memory for an App server, but since they had spare resources on the Scale-up server, they chose to "use 'em since they got 'em!" The LPM of the App VM from its temporary residence on the Scale-up server back to the Scale-out server took approximately 15 minutes from start to finish. During the LPM process, memory pages are copied from the source system to the target, including dirty memory pages, which on an active system is an ongoing activity. The more memory a VM has, and the more actively that memory is being used, the longer an LPM event can take to complete (Mr. Obvious is not needed, we got this!) When all memory pages have been copied to the target system, the final cutover occurs in less than 2 seconds. The VM is no longer "on" the source frame and is running "on" the target frame as if nothing changed. I am curious to see whether the client has since reduced the cores and memory on the App VM back to its post-migration design sizing of 4 cores and 64 GB RAM. Normally, both actions could be performed dynamically, and though LPM is supported by SuSE and RedHat on IBM Power, SAP doesn't yet support it for HANA, and I can verify the HANA DB doesn't like it.
The client has been working through their post-migration punch list; the system went live on schedule at 4:30 pm that Sunday afternoon. Starting with a kick-off call Friday night at 8:30 pm and going live Sunday at 4:30 pm, they successfully moved their entire business's Production environment from SAP ECC to Suite on HANA in 44 hours (31 hours from the start of export to the finish of import).
Beginning last Fall, my team began implementing the infrastructure, starting with the DR environment. Over the months, we received many requests from the SAP Basis team and our SAP migration team to create new environments for testing, add resources or mount points, or make some other change to a VM. The only feature which would have been beneficial, and is still unsupported by SAP on either platform, is the ability to dynamically add/remove cores and memory. I do expect this to be supported on IBM Power with PowerVM shortly; these capabilities, especially dynamic memory add/remove, have been around for a decade and a half on IBM Power. The technology is very reliable, very consistent and very convenient. I'm sure purists for Intel solutions using VMware might argue their product works just as well. I believe SAP's own guidance says otherwise, and of course if someone would like to have some fun, we could set up a 2-server solution and run through a battery of tests to compare virtualization features on both platforms. We'd have to run each under a heavy load, as it would be unfair to our audience to do these tests in a vacuum; that isn't real world. While at it, maybe we could run some informal Oracle database testing (sorry, can't help myself; read my previous blog to know of my Oracle obsession) along with a TCA/TCO analysis comparing how both platforms perform. We'll refer to it as "using a leading enterprise RDBMS product" so we don't upset lawyers.
In summary, I'm obviously very proud of how this solution performed; it took a strong, capable team to design, deploy and support this client through 4 separate migrations. This no-compromise solution was >35% less costly than a competing solution, making life much better for this client from beginning to go-live.
Kudos to IBM: they have the best platform for SAP HANA, plus tremendous SAP talent available to partners and clients for pre-sales support, IBM Lab Services for HANA installation assistance, and IBM support for Linux for SAP HANA.
IBM Systems Magazine http://ibmsystemsmag.com/power/systems-management/data-management/sap-hana-landscapes/
SAP on Power blog by Alfred Freudenberger https://saponpower.wordpress.com
Linux on Power – system tuning Linux https://developer.ibm.com/linuxonpower/docs/linux-on-power-system-tuning/
Interesting article discussing the use of SMT8 on IBM POWER9 servers running DB2 https://developer.ibm.com/linuxonpower/2018/04/19/ibm-power9-smt-performance-db2/