Sunday, January 13, 2013

4-Node RAC now officially supported; will the silver bullet deliver?

Oracle introduced Real Application Clusters (RAC) back in 2001 for Oracle 9i. Those days, Friends was running on TV, gas guzzling cars were running on Boston roads and Yantra 2x was running on retailers' back end systems with Oracle sans any RAC support. Fast forward to 2006 - Friends has ended its run and Sopranos was running on TV; gas guzzling sedans were being replaced by hybrid Priuses on Boston roads and large retailers were running Yantra 7x with 2x customers either wiped out in the dot-com bust or having upgraded to newer version. More importantly with 2 node RAC being supported by Sterling Distributed Order Management (Yantra was acquired by Sterling in 2005) early adopters like Staples, Gap and Best Buy had taken the 2-node RAC plunge and are asking for more nodes support to cater to their growing transaction volumes.  Further fast forward to 2012. Modern Family is now running on TV and with the announcement of 4 Node RAC support with IBM Sterling Selling and Fulfillment (SSFS) suite 9.1 (Sterling was acquired by IBM in 2010) the largest implementations of SSFS now have more choices in front of them on what to run on their Oracle database systems. To be precise what is officially supported per the documentation is only the 4-node RAC running on DELL PowerEdge M600 two quad core processors for upto 32 processors which means no official support for IBM Power series or HP Itanium or other processor based systems.

Is 4-node RAC the silver bullet that it is meant to be for your large volume cross-channel Sterling selling solution? In this article I shall try to dive into why it took this long for 4 node RAC support to materialize, what it means for your Sterling OMS implementation and what strategies you can use to make the most of this announcement.

First, the basics - RAC is a shared disk clustered databases:  every instance in the cluster has equal access to the database’s data on disk. Each instance has its own SGA and background processes and each
session that connects to the cluster database connects to a specific instance in the cluster.The main challenge in the shared disk architecture is to establish a global memory cache across all the instances in the cluster:  otherwise the clustered database becomes IO bound.  Oracle establishes this shared cache via a high-speed private network referred to as the cluster interconnect. Sterling OMS is primarily an OLTP application with key tables corresponding to the bulk of the transactions related to Order or Inventory processing being heavily used. Granted certain functionalities of the Sterling solution behaves or is rather used in a DSS like querying manner, specifically with the reporting solution aka Sterling BI, categorized as a DW solution,  but that is usually driven of a replicated database and is not the subject of the discussion here. Tables with the heaviest lock contention on the Production transaction schema in any implementation are typically YFS_INVENTORY_ITEM and YFS_ORDER_HEADER.

Competitive studies and independent experiments on RAC (source - various online) have both come to the conclusion that certain scenarios such as bulk load, long-running transactions, and handling high-frequency update applications as areas where RAC fails to scale out well. Overall performance suffers in these situations because Oracle RAC needs to transfer large amounts of buffer data among the nodes through the interconnect. Applications that make substantial use of serialization (such as Oracle’s sequence request and index update) also suffer because nodes must wait until operations complete on other nodes before they can continue, and such operations cannot therefore be truly scalable. Incidentally Oracle has reported excellent near linear scalability with RAC including the much touted TPC-C benchmark where they achieved 1.18 million tpm. Most real life customized implementation of Sterling OMS shows the former set of symptoms (long running transactions and high frequency transaction on a particular item or order) and even an out-of-box Sterling implementation uses both indexes and sequences heavily. Sequences in particular being used for all primary key generation on all schema tables. Index updates requiers index leaf blocks to be maintained and passed along multiple nodes

To optimize Sterling application behavior considering these known challenges with the Sterling OMS system and its database design the Performance Management Guide suggested that 2 Node RAC implementations use Jumbo Frames and 10G Ethernet protocols for optimal inter connect traffic times. However, even with those best practices in places at most implementations and in in-house testing what was discovered was that the application would not scale linearly beyond 2 nodes due to the high global cache related latency. Trials and load tests at some of the large customers led to the conclusion that bulk of the order and inventory update transactions are best handled on one instance. This runs counter to the general published recommendations around cluster load balancing to achieve higher scalability. Once customers and us Sterling Performance Consultants discovered the benefits of work load segregation this approach was further used to take advantage of not just 2 nodes but to further scale out to 3+ nodes although this was not supported with prior versions of Sterling OMS.

The disconnect between Oracle's own benchmarks and  real-world experiences of Sterling OMS in the field or internal benchmarks up until SSFS 9.1 can be explained by a combination of the following factors  :-
1. Benchmarks in spite of their best intentions are skewed favorably and not a realistic representation of the system in Production scenarios where products such as Sterling OMS are customized and integrated with other systems running in older legacy systems or a different data center.
2. New orders which form a majority of the workload during peak load cause a high number of inserts in key tables such as YFS_ORDER_HEADER and YFS_ORDER_RELEASE_STATUS. Since multiple transactions such as Create Order, Schedule Order, Release Order, Order Validations etc can run on multiple JVMs they all result in updates to the right most part of the index. High insertion rates are limited by the fact that index leaf blocks have to be released by one node before it can be acquired by the other.
3. Certain external systems interfaces or even highly custom Sterling transactions such as Schedule Order are   longer running than out of the box or benchmark like conditions. This increases lock holding times.
4. In spite of numerous tweaks and advances to the Hot Sku feature Inventory Item locking continues to be the Achilles heal for all high order volume implementations. The problem is magnified during events such as Black Friday when bulk of the orders are for a limited number of SKUs.
5. Index contentions are recommended to be addressed using hash partitions or reverse indexes. Neither of these are supported by Sterling due to the negative impact to performance (slow query response times) in other conditions.

The last point was addressed partially in 9.1 and in a more full-fledged manner in 9.2 with the introduction of the randomizing elements within the primary key. I have not had a chance to test the behavior in the field but results from internal tests have shown the results to be promising.

What does this mean for your implementation? Do you take the 4-node RAC plunge or wade cautiously sticking to a single or 2-node RAC?  Here is what I suggest -
a. Ascertain the size of your Oracle Database through a combination of hardware sizing and capacity planning exercises. Then determine if that need is best met by 2 node RAC or if you truly need more Oracle instances. Even if your load can be handled by a single instance a 2-node RAC may still help you accomplish a more highly available system especially when it comes to patching and database maintenance.
b. Unless you are planning to use the Dell PowerEdge Power M600 system for your database needs you have to run your own set of load and functional tests to ensure that the 3 or higher node RAC configuration meets your needs.
c. Even if you are running the supported Processor stack for 4-node RAC you may want to determine the best allocation of Sterling transactions to Oracle instances via the service configuration if your Sterling version is pre 9.2. Trying to use a single service spread across all instances does not perform best due to reasons mentioned above. Optimal RAC configuration is best determined by load testing under various work load segregation models. A starting point would be to keep the Order flow related transactions to one instance, inventory updates to another, purges to a third instance etc.
d. If you are running Sterling 9.2 ensure that the primary key randomizing feature introduced is working as expected on the 16 tables where it is enabled by default. Key among it are the tables  YFS_ORDER_RELEASE_STATUS, YFS_ORDER_LINE. Insert times irrespective of the RAC scenario should be under 10 ms  (preferably under 5 ms) and seek (read) times should not exceed 5 ms.

Your "Sterling" Performance Architect can help guide you through these choices. I would love to know what your experience with RAC has been for Sterling so do feel free to write in with your questions or comments.