tag:blogger.com,1999:blog-85623524446586521112024-03-19T15:09:17.686-07:00Ranjith on Sterling OMSMy thoughts and lessons on Sterling/Yantra OMS suite implementations, troubleshooting, performance tuning and IT consulting interspersed with random musings and personal storiesRanjithhttp://www.blogger.com/profile/00998643080201014947noreply@blogger.comBlogger5125tag:blogger.com,1999:blog-8562352444658652111.post-58013348368426739932013-01-13T21:24:00.000-08:002013-01-13T21:24:36.270-08:004-Node RAC now officially supported; will the silver bullet deliver?<div dir="ltr" style="text-align: left;" trbidi="on">
Oracle introduced Real Application Clusters (RAC) back in 2001 with Oracle 9i. In those days, Friends was running on TV, gas-guzzling cars were running on Boston roads and Yantra 2x was running on retailers' back-end systems on Oracle without any RAC support. Fast forward to 2006: Friends had ended its run and The Sopranos was on TV; gas-guzzling sedans were being replaced by hybrid Priuses on Boston roads; and large retailers were running Yantra 7x, with the 2x customers either wiped out in the dot-com bust or upgraded to a newer version. More importantly, with 2-node RAC supported by Sterling Distributed Order Management (Yantra was acquired by Sterling in 2005), early adopters like Staples, Gap and Best Buy had taken the 2-node RAC plunge and were asking for support for more nodes to cater to their growing transaction volumes. Fast forward again to 2012. Modern Family is now running on TV, and with the announcement of 4-node RAC support in the IBM Sterling Selling and Fulfillment Suite (SSFS) 9.1 (Sterling was acquired by IBM in 2010), the largest SSFS implementations now have more choices for what to run on their Oracle database systems. To be precise, what is officially supported per the documentation is only 4-node RAC running on DELL PowerEdge M600 servers with two quad-core processors, for up to 32 processors, which means no official support for IBM Power series, HP Itanium or other processor-based systems.<br />
<br />
Is 4-node RAC the silver bullet that it is meant to be for your large volume cross-channel Sterling selling solution? In this article I shall try to dive into why it took this long for 4 node RAC support to materialize, what it means for your Sterling OMS implementation and what strategies you can use to make the most of this announcement.<br />
<br />
First, the basics - RAC is a shared-disk clustered database: every instance in the cluster has equal access to the database's data on disk. Each instance has its own SGA and background processes, and each
session that connects to the cluster database connects to a specific instance in the cluster. The main challenge in the shared-disk architecture is to establish a global memory cache across all the instances in the cluster; otherwise the clustered database becomes IO bound. Oracle establishes this shared cache via a high-speed private network referred to as the cluster interconnect. Sterling OMS is primarily an OLTP application, and the key tables behind the bulk of the Order and Inventory processing transactions are heavily used. Granted, certain parts of the Sterling solution are used in a DSS-like querying manner, specifically the reporting solution aka Sterling BI, which is categorized as a DW solution, but that is usually driven off a replicated database and is not the subject of the discussion here. The tables with the heaviest lock contention on the Production transaction schema in any implementation are typically YFS_INVENTORY_ITEM and YFS_ORDER_HEADER.<br />
<br />
Competitive studies and independent experiments on RAC (source - various online) have both concluded that RAC fails to scale out well in certain scenarios, such as bulk loads, long-running transactions and high-frequency update applications. Overall performance suffers in these situations because Oracle RAC needs to transfer large amounts of buffer data among the nodes through the interconnect. Applications that make substantial use of serialization (such as Oracle's sequence requests and index updates) also suffer because nodes must wait until operations complete on other nodes before they can continue, so such operations cannot be truly scalable. Oracle, for its part, has reported excellent near-linear scalability with RAC, including the much-touted TPC-C benchmark where it achieved 1.18 million tpm. Most real-life customized implementations of Sterling OMS show the former set of symptoms (long-running transactions and high-frequency transactions on a particular item or order), and even an out-of-box Sterling implementation uses both indexes and sequences heavily, with sequences in particular being used for primary key generation on all schema tables. Index updates require index leaf blocks to be maintained and passed among multiple nodes.
<br />
To optimize Sterling application behavior given these known challenges with the Sterling OMS system and its database design, the Performance Management Guide suggested that 2-node RAC implementations use Jumbo Frames and 10G Ethernet for optimal interconnect traffic times. However, even with those best practices in place, most implementations and in-house tests showed that the application would not scale linearly beyond 2 nodes due to high global cache related latency. Trials and load tests at some of the large customers led to the conclusion that the bulk of the order and inventory update transactions are best handled on one instance. This runs counter to the general published recommendations around cluster load balancing to achieve higher scalability. Once customers and we Sterling Performance Consultants discovered the benefits of workload segregation, this approach was used to take advantage of not just 2 nodes but to scale out further to 3+ nodes, although this was not supported with prior versions of Sterling OMS.<br />
<br />
The disconnect between Oracle's own benchmarks and real-world experiences of Sterling OMS in the field or internal benchmarks up until SSFS 9.1 can be explained by a combination of the following factors :-<br />
1. Benchmarks in spite of their best intentions are skewed favorably and not a realistic representation of the system in Production scenarios where products such as Sterling OMS are customized and integrated with other systems running in older legacy systems or a different data center.<br />
2. New orders, which form a majority of the workload during peak load, cause a high number of inserts in key tables such as YFS_ORDER_HEADER and YFS_ORDER_RELEASE_STATUS. Since multiple transactions such as Create Order, Schedule Order, Release Order, Order Validations etc. can run on multiple JVMs, they all result in updates to the rightmost part of the index. High insertion rates are limited by the fact that an index leaf block has to be released by one node before it can be acquired by another.<br />
3. Certain external system interfaces or even highly customized Sterling transactions such as Schedule Order run longer than they do under out-of-the-box or benchmark-like conditions. This increases lock holding times.<br />
4. In spite of numerous tweaks and advances to the Hot SKU feature, Inventory Item locking continues to be the Achilles heel for all high order volume implementations. The problem is magnified during events such as Black Friday, when the bulk of the orders are for a limited number of SKUs.<br />
5. The recommended way to address index contention is to use hash partitions or reverse key indexes. Neither of these is supported by Sterling due to the negative impact on performance (slow query response times) in other conditions.<br />
<br />
The last point was addressed partially in 9.1 and in a more full-fledged manner in 9.2 with the introduction of the randomizing elements within the primary key. I have not had a chance to test the behavior in the field but results from internal tests have shown the results to be promising.<br />
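To make the index contention argument concrete, here is a minimal sketch of why ever-increasing sequence keys pile onto one index leaf block while keys with a random leading element spread out. It is a toy model, not how Sterling or Oracle actually lay out keys: the block size, the key layout and the names (KEYS_PER_LEAF, randomized_keys and so on) are all assumptions made for illustration.<br />

```python
import random
from collections import Counter

KEYS_PER_LEAF = 100  # hypothetical number of index entries held by one leaf block


def leaf_block_of(key, keys_per_leaf=KEYS_PER_LEAF):
    """Toy model of a B-tree index: neighbouring key values share a leaf block."""
    return key // keys_per_leaf


def sequential_keys(start, count):
    """Keys drawn from an ever-increasing sequence."""
    return [start + i for i in range(count)]


def randomized_keys(start, count, spread, rng):
    """Same sequence values, but with a random leading element (assumed layout)
    so that consecutive inserts fall into different key ranges."""
    width = 10 ** 12  # digits reserved for the sequence part of the key
    return [rng.randrange(spread) * width + start + i for i in range(count)]


def hottest_block_share(keys):
    """Fraction of the inserts that hit the single busiest leaf block."""
    counts = Counter(leaf_block_of(k) for k in keys)
    return max(counts.values()) / len(keys)


rng = random.Random(42)
seq = sequential_keys(start=5_000_000, count=100)
rnd = randomized_keys(start=5_000_000, count=100, spread=64, rng=rng)

# Every sequential insert lands in the same (rightmost) block;
# the randomized keys are scattered over many blocks.
print("sequential:", hottest_block_share(seq))
print("randomized:", hottest_block_share(rnd))
```

In a RAC setup the "busiest block" is the one that has to be shipped between instances over the interconnect, so a lower share in this model corresponds to less block ping-pong between nodes.<br />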
<br />
What does this mean for your implementation? Do you take the 4-node RAC plunge or wade cautiously sticking to a single or 2-node RAC? Here is what I suggest -<br />
a. Ascertain the size of your Oracle Database through a combination of hardware sizing and capacity planning exercises. Then determine if that need is best met by 2 node RAC or if you truly need more Oracle instances. Even if your load can be handled by a single instance a 2-node RAC may still help you accomplish a more highly available system especially when it comes to patching and database maintenance.<br />
b. Unless you are planning to use the Dell PowerEdge M600 system for your database needs, you have to run your own set of load and functional tests to ensure that a RAC configuration of 3 or more nodes meets your needs.<br />
c. Even if you are running the supported Processor stack for 4-node RAC you may want to determine the best allocation of Sterling transactions to Oracle instances via the service configuration if your Sterling version is pre 9.2. Trying to use a single service spread across all instances does not perform best due to reasons mentioned above. Optimal RAC configuration is best determined by load testing under various work load segregation models. A starting point would be to keep the Order flow related transactions to one instance, inventory updates to another, purges to a third instance etc.<br />
d. If you are running Sterling 9.2, ensure that the primary key randomizing feature is working as expected on the 16 tables where it is enabled by default. Key among them are the tables YFS_ORDER_RELEASE_STATUS and YFS_ORDER_LINE. Insert times, irrespective of the RAC scenario, should be under 10 ms (preferably under 5 ms) and seek (read) times should not exceed 5 ms.<br />
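The workload segregation idea in point c can be sketched as a simple lookup from Sterling server to Oracle service. Everything named here is hypothetical: the service names, the server names, the host and the helper jdbc_url_for are placeholders, and the assumption is that each service has been defined on the database side with a different preferred instance. Only the thin-driver URL shape (host, port, service name) is standard Oracle JDBC syntax.<br />

```python
# Hypothetical Oracle service names, one per workload class. Each service is
# assumed to be configured with a different preferred RAC instance.
SERVICE_BY_WORKLOAD = {
    "order_flow": "OMS_ORDER",
    "inventory": "OMS_INVENTORY",
    "purge": "OMS_PURGE",
}

# Hypothetical mapping of Sterling agent/integration servers to workload classes.
WORKLOAD_BY_SERVER = {
    "ScheduleOrderAgentServer": "order_flow",
    "ReleaseOrderAgentServer": "order_flow",
    "InventoryAdjustIntegServer": "inventory",
    "OrderPurgeAgentServer": "purge",
}


def jdbc_url_for(server_name, host="rac-scan.example.com", port=1521):
    """Return the JDBC URL a given server JVM would be started with.

    Servers that are not listed fall back to the order flow service.
    """
    workload = WORKLOAD_BY_SERVER.get(server_name, "order_flow")
    service = SERVICE_BY_WORKLOAD[workload]
    return f"jdbc:oracle:thin:@//{host}:{port}/{service}"


for name in sorted(WORKLOAD_BY_SERVER):
    print(name, "->", jdbc_url_for(name))
```

Different segregation models then become different versions of the two dictionaries, which makes it easy to load test several of them and compare.<br />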
<br />
Your "Sterling" Performance Architect can help guide you through these choices. I would love to know what your experience with RAC has been for Sterling so do feel free to write in with your questions or comments.<br />
<br />
<br /></div>
When seeing is not believing - Agent flow misconfiguration unraveled (posted by Ranjith, 2012-10-07)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
The old saying goes "Seeing is believing", but the other day while examining an issue at a customer environment I saw something that made me do a double take, for I could not believe what I saw in the Sterling application configuration. Thus the title of the blog (not to mention my weakness for catchy titles). Read on to find out how the issue was investigated and learn more about agent/flow configuration internals.<br />
<br />
Like most issues it started out mundane - an Invalid Server error from one of the agent logs. The relevant lines from the logs of the agent server <span style="font-family: Helv, sans-serif; font-size: 10pt;"><b>AsyncReqAgentServer </b>are pasted below <b>-</b></span><br />
<b><span style="font-family: Helv, sans-serif; font-size: 10pt;"><br /></span></b>
<br />
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Helv, sans-serif; font-size: 10pt;">&lt;Errors&gt;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Helv, sans-serif; font-size: 10pt;">
&lt;Error ErrorCode="YCP0223" ErrorDescription="Invalid Server." ErrorRelatedMoreInfo="No Services Configured for this Server: AsyncReqAgentServer"&gt;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Helv, sans-serif; font-size: 10pt;">
&lt;Attribute Name="ErrorCode" Value="YCP0223"/&gt;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Helv, sans-serif; font-size: 10pt;">
&lt;Attribute Name="ErrorDescription" Value="Invalid Server."/&gt;</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Helv, sans-serif; font-size: 10pt;">
&lt;Attribute Name="ErrorRelatedMoreInfo" Value="No Services Configured for this Server: AsyncReqAgentServer"/&gt;</span></div>
<div class="MsoNormal">
<span style="font-family: Helv, sans-serif; font-size: 10pt; line-height: 115%;">
&lt;Stack&gt;com.yantra.interop.services.InvalidConfigurationException</span></div>
<div class="MsoNormal">
<span style="font-family: Helv, sans-serif; font-size: 10pt; line-height: 115%;"><br /></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The AsyncReqAgentServer is typically used to run the ASYNC_REQ_PROCESSOR transaction. So I did what most of us would do: check out the configuration of the ASYNC_REQ_PROCESSOR transaction. Here is what I saw - </div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjotg2O2W84amrDE50xwlOCSqFKFu1EcHG_NZNqZDd8m95veqYvf7XzVPVfjdaehVYzXERCbX8epJ7iXWEQkMqd6M667OsYVhe0QCRvXuBKNNsjmQmEJTLGJnijcWuBY03PzYruY2lPJN0/s1600/agentconf3.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="56" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjotg2O2W84amrDE50xwlOCSqFKFu1EcHG_NZNqZDd8m95veqYvf7XzVPVfjdaehVYzXERCbX8epJ7iXWEQkMqd6M667OsYVhe0QCRvXuBKNNsjmQmEJTLGJnijcWuBY03PzYruY2lPJN0/s320/agentconf3.JPG" width="320" /></a></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Now you can see what stumped me. On one hand the application configuration was showing one thing, while the same application's logs were vehemently indicating another. Putting on my PE hat, I figured that there was more to it than meets the eye and decided to dig a little deeper. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
First, I checked if the transaction is indeed running. A quick grep of the agent logs showed that it was running as part of the DefaultAgentServer as it was the DefaultAgentServer logs that had the "Starting service..." message.</div>
<div class="MsoNormal">
Then I decided to check the other environments to see where it was supposed to be running or configured. In Production I learnt that it was running under the AsyncReqAgentServer. The lower environments were mixed, but in most of them it was running on DefaultAgentServer.</div>
<div class="MsoNormal">
At this stage a combination of instinct and experience led me to venture a guess that it is probably right in Production and just messed up here and elsewhere and I just have to prove that. </div>
<div class="MsoNormal">
So I checked the server configuration instead of the transaction configuration. This is a neat little configuration screen that is not very well known, mostly because it is seldom used. Buried in the Platform Application view > System Administration grouping is the Configured Servers view. It can be used to view both all the servers defined and the details of the sub services or agent criteria configured for each of the servers. Here is a screenshot - </div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUTHOTJ1yhjFiW3sDs-yudRkadhiyY9SS95V_RCDlVhLiKN8JyxMAz6_AodVLPIMagBeAgeIQ4diek_awxFZHeypfWBzg-8jYYAx6d6d2KMXlGp-fT1vfDCr_5DQWPiBC3ozrWBdKtWo/s1600/agentConfig2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="168" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXUTHOTJ1yhjFiW3sDs-yudRkadhiyY9SS95V_RCDlVhLiKN8JyxMAz6_AodVLPIMagBeAgeIQ4diek_awxFZHeypfWBzg-8jYYAx6d6d2KMXlGp-fT1vfDCr_5DQWPiBC3ozrWBdKtWo/s320/agentConfig2.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The sub service list tab shown is accessed by double-clicking an individual server and viewing its details. Here is what it showed for the AsyncReqAgentServer -</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4rF4ZC6gAO3WzUnpSQPPqzisCGIMOl78ulkh8rFzqYdTuk_Y3uIRhvbVwQJi93aa-kADRPnjRGDxe0-JzYMQ9_vejCyuyoM7CIzrvvgxY0J18hvNOtw_2cICXb7Y627dfZb-NgmcYcjk/s1600/agentconf4.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="194" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4rF4ZC6gAO3WzUnpSQPPqzisCGIMOl78ulkh8rFzqYdTuk_Y3uIRhvbVwQJi93aa-kADRPnjRGDxe0-JzYMQ9_vejCyuyoM7CIzrvvgxY0J18hvNOtw_2cICXb7Y627dfZb-NgmcYcjk/s320/agentconf4.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
and for the DefaultAgentServer - </div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN1vlu-w6R5VX_dkzizVEeEC0VANcgT2Xy-jB__UPAEzAWeMGn-Lxk9t61pPBV25nHn7L2pbPJUh-pYJl6PSRBz5sLzyFduYwzyjW6CBsKG8R5mzwrknV57S0Wq_r89Kr7Ec4ixXLyonU/s1600/agentconf2.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="192" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN1vlu-w6R5VX_dkzizVEeEC0VANcgT2Xy-jB__UPAEzAWeMGn-Lxk9t61pPBV25nHn7L2pbPJUh-pYJl6PSRBz5sLzyFduYwzyjW6CBsKG8R5mzwrknV57S0Wq_r89Kr7Ec4ixXLyonU/s320/agentconf2.JPG" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
So now it was clear that the logs were correct (at least in this scenario): ASYNC_REQ_PROCESSOR was indeed running as part of the DefaultAgentServer, and the AsyncReqAgentServer had no services configured. Thus it was the configuration that was out of whack between the Server and Transaction configuration views. That mystery unravels further if one digs into how these views are displayed and how configuration data is propagated. </div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The Transaction configuration view is based on the YFS_FLOW and YFS_SUB_FLOW tables, whereas the server configuration view and its associated sub-services are built on the YFS_SERVER and YFS_AGENT_CRITERIA tables. Normally these config tables stay in sync if all configuration changes are made manually. However, in most implementations the Master Config (MC) environment is maintained as the source of config changes and the Configuration Deployment Tool (CDT) is used to promote configuration changes to the various environments. A problem in the MC environment, normally a crash or an incorrect data fix, could result in a mis-configuration. This mis-configuration is then promoted to other environments via CDT. Production was spared because it was running an older version of the release and the config changes were yet to be promoted there. </div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Here is a query that I could have used to confirm my observations - </div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
</div>
<div class="separator" style="clear: both;">
select agent_criteria_id, transaction_key, flow_key, server_key from yfs_agent_criteria </div>
<div class="separator" style="clear: both;">
where server_key in (select server_key from yfs_server where server_name = 'AsyncReqAgentServer')</div>
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
It can be adapted for your situation, for example to determine which services are configured under a particular server. So when it comes to Sterling OMS (and perhaps most things in life), if you don't believe what you see, just look further. </div>
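As one way of adapting the query, the sketch below counts the agent criteria configured under every server, so that a server with nothing configured shows up with a count of zero. It runs against an in-memory SQLite database with a cut-down, assumed version of the two tables: only the columns used in the query above are created, and the key values (S1, S2, T1, F1) are made up. The real YFS_SERVER and YFS_AGENT_CRITERIA tables have many more columns.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema: only the columns referenced in the query above (an assumption,
# not the full Sterling table definitions).
conn.executescript("""
CREATE TABLE yfs_server (server_key TEXT PRIMARY KEY, server_name TEXT);
CREATE TABLE yfs_agent_criteria (
    agent_criteria_id TEXT,
    transaction_key   TEXT,
    flow_key          TEXT,
    server_key        TEXT
);
INSERT INTO yfs_server VALUES ('S1', 'AsyncReqAgentServer');
INSERT INTO yfs_server VALUES ('S2', 'DefaultAgentServer');
-- The misconfiguration from this post: the criteria points at DefaultAgentServer.
INSERT INTO yfs_agent_criteria VALUES ('ASYNC_REQ_PROCESSOR', 'T1', 'F1', 'S2');
""")


def criteria_count_by_server(connection):
    """Return {server_name: number of agent criteria configured under it}."""
    rows = connection.execute("""
        SELECT s.server_name, COUNT(c.agent_criteria_id)
        FROM yfs_server s
        LEFT JOIN yfs_agent_criteria c ON c.server_key = s.server_key
        GROUP BY s.server_name
        ORDER BY s.server_name
    """).fetchall()
    return dict(rows)


counts = criteria_count_by_server(conn)
for server_name, n in counts.items():
    print(server_name, n)
```

A server that reports zero here is a candidate for the "No Services Configured for this Server" error when it is started.<br />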
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
</div>
Agent framework scalability and Tuning considerations for high volumes (posted by Ranjith, 2012-09-09, updated 2012-09-11)<div dir="ltr" style="text-align: left;" trbidi="on">
The core of the Sterling solution for many implementations lies in the Sterling agent framework and the APIs provided for monitoring and order fulfillment. OMS implementations typically use the Schedule Order, Release Order, ConsolidateToShipment and Real Time Availability Monitor agents, to name a few. Although every agent works differently and there is no <i>run_faster</i> parameter available to scale up Sterling agents, there are a few underlying elements that largely control the extent of their scalability. Very little is documented in the public domain on how exactly the agent framework works and to what extent it can scale, so here goes my attempt to demystify agent operations and scalability. This post assumes that you are familiar with the OMS nomenclature; if not, you may want to read <a href="http://sterlingoms.blogspot.in/2012/07/oms-transaction-framework-nomenclature.html" target="_blank">my earlier post</a> first.<br />
<br />
How a Generic Agent works -<br />
A generic Sterling agent is a background batch processing job that does the following -<br />
<ol style="text-align: left;">
<li>Checks whether there are messages to process in the configured JMS queue. If the queue is empty, posts a getJobs message and goes to Step 2; otherwise goes to Step 4. </li>
<li>Reads the getJobs message and gets the first set of jobs (first batch) from the database using a getJobs method, up to the defined buffer size (the Number of records to buffer configuration, which defaults to 5000).</li>
<li>Writes these records back into the configured JMS queue in the form of executeJobs messages, followed by the next getJobs message containing the last fetched record key, such as a TaskQKey.</li>
<li>Retrieves executeJobs messages from the queue and does the necessary processing using the executeJobs method.</li>
<li>After finishing the first batch, gets the next set of jobs (second batch) up to the buffer size, using the last fetched record key in the getJobs message. </li>
<li>Works on the second set of jobs.</li>
<li>Continues the above process until all the present jobs are worked upon.</li>
<li>Once all the present jobs are worked upon, waits for a signal, i.e. the agent trigger, to start working again.</li>
<li>Upon getting the signal to start, starts working again, i.e. follows Step 2 to Step 7. </li>
</ol>
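The steps above can be sketched as a small single-threaded simulation. This is only an illustration of the message flow under simplifying assumptions: a Python deque stands in for the JMS queue, a sorted list of keys stands in for the database, and run_agent is a made-up name rather than anything in the Sterling code base.<br />

```python
from collections import deque


def run_agent(job_keys, buffer_size=5000):
    """Simulate one trigger of a generic agent.

    job_keys is a sorted list of pending record keys (a stand-in for the rows
    the getJobs query would return). Returns the processed keys and the number
    of getJobs executions it took.
    """
    queue = deque()          # stand-in for the agent's JMS queue
    processed = []
    get_jobs_calls = 0

    # Steps 1 and 9: the queue is empty, so the trigger posts a getJobs message.
    queue.append(("getJobs", None))

    while queue:
        kind, payload = queue.popleft()
        if kind == "getJobs":
            get_jobs_calls += 1
            last_key = payload
            # Steps 2 and 5: fetch the next batch after the last fetched key.
            batch = [k for k in job_keys if last_key is None or k > last_key]
            batch = batch[:buffer_size]
            if not batch:
                break  # Step 8: nothing left, wait for the next trigger
            # Step 3: post executeJobs messages, then the next getJobs message
            # carrying the last fetched record key.
            for key in batch:
                queue.append(("executeJobs", key))
            queue.append(("getJobs", batch[-1]))
        else:
            # Steps 4 and 6: process one executeJobs message.
            processed.append(payload)

    return processed, get_jobs_calls


done, calls = run_agent(list(range(1, 13)), buffer_size=5)
print(len(done), "jobs processed in", calls, "getJobs executions")
```

Because the next getJobs message is queued behind the executeJobs messages of its batch, the next fetch only happens once that batch has been drained, which is the behavior described in Steps 5 to 7.<br />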
<div>
<div>
More details on default agent behavior - </div>
<div>
Triggering an Agent is the act of posting a getJobs message to the JMS queue. Triggering may be manual or automatic i.e. self triggered. During an agent startup if there are no messages in the queue an agent automatically triggers itself.</div>
<div>
Within the getJobs method, the agent tries to acquire a lock on the YFS_OBJECT_LOCK table for the agent criteria ID.</div>
<div>
If the lock is not available, the getJobs method exits and does nothing. This ensures that duplicate sets of records are not retrieved for processing. </div>
<div>
If the lock is available, the getJobs method fetches the records that need to be processed. </div>
<div>
These records are posted as execute messages to the JMS queue. For each message, depending on the JMS session pooling setting, a new MQ session is created or borrowed to post the message, and the session is then closed or returned to the pool. This default behavior could change in an upcoming version as a result of the testing we undertook for one of our customers.</div>
<div>
After the execute messages, one getJobs message is also posted with the last record key so as to facilitate retrieval of the next batch of messages.</div>
<div>
Each agent thread picks up execute messages one by one and processes them. Multiple threads of the execute method can run concurrently. </div>
<div>
Once all the execute messages are consumed and only the getJobs message is left in the queue, the same agent thread uses the getJobs() method to process the getJobs message and continue the processing cycle.<br />
<br />
Scalability concerns and Scaling the Availability Monitor agent -<br />
Are Sterling agents multi-threaded? <br />
Not entirely. The getJobs component of the agent is deliberately made single-threaded via database locking on YFS_OBJECT_LOCK, to ensure that the same set of records is not retrieved and processed multiple times. However, the bulk of the workload is in the executeJobs component, which is multi-threaded and can run in multiple JVMs. <br />
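The "try the lock, otherwise do nothing" behavior of getJobs can be sketched as follows. This is an in-process analogy only: a threading.Lock per agent criteria ID stands in for the row in YFS_OBJECT_LOCK, and the class and function names (ObjectLocks, get_jobs) are invented for the example. The real lock lives in the database so that it also works across JVMs.<br />

```python
import threading


class ObjectLocks:
    """In-memory stand-in for YFS_OBJECT_LOCK, keyed by agent criteria ID."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def try_acquire(self, criteria_id):
        """Return True if the lock was obtained, False if someone else holds it."""
        with self._guard:
            lock = self._locks.setdefault(criteria_id, threading.Lock())
        return lock.acquire(blocking=False)

    def release(self, criteria_id):
        self._locks[criteria_id].release()


def get_jobs(locks, criteria_id, fetch):
    """Run fetch() only if no other getJobs holds the lock for this criteria."""
    if not locks.try_acquire(criteria_id):
        return None  # another getJobs is already fetching: exit and do nothing
    try:
        return fetch()
    finally:
        locks.release(criteria_id)


locks = ObjectLocks()
print(get_jobs(locks, "SCHEDULE.0001", lambda: ["order-1", "order-2"]))
```

Returning None rather than waiting mirrors the description above: a getJobs call that cannot get the lock simply gives up, so the same batch is never fetched twice.<br />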
<br />
Will my agent scale to meet the peak throughput?<br />
That depends on your volumes. Scaling an agent involves tuning both the getJobs and the executeJobs components. Scaling and tuning the executeJobs component is a different exercise that varies with the use case, so it will not be covered in this post. At low to medium volumes, under 100K jobs/hour, scalability issues lie largely with the executeJobs component, and the default settings that govern agent behavior should work well. If you are using the agent framework to process over 150K "jobs" per hour, there may be challenges with the default implementation. I use the term jobs to denote the message entity; for example, jobs in the case of ScheduleOrder are distinct Orders and for Availability Monitor they are distinct Inventory Items.<br />
<br />
What are the elements that affect scaling beyond 100K jobs/hour?<br />
<ol style="text-align: left;">
<li>Performance of the getJobs query - The slower the query, the more time is spent retrieving messages.</li>
<li>Time taken to write all of the retrieved executeJobs messages to the queue - The default behavior of creating and closing an MQ session to write each individual message was a significant overhead. Using the product HF to enable bulk loading of messages significantly improves the write time per message. Other aspects, such as the persistence setting used for the queue and the network latency between the agent servers and the MQ server, can also affect message write times to the queue. </li>
<li>Buffer size of messages to get - The default of 5K may not suffice at very high loads, as it would mean 40 or more executions of the getJobs component to achieve just 200K throughput. Since getJobs is single-threaded, the number of executions needs to be kept at an optimal level.</li>
</ol>
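The buffer size arithmetic in the last point is easy to check. The helper names below are made up for the illustration; the only inputs are the target throughput and the buffer size discussed above.<br />

```python
import math


def get_jobs_executions(target_jobs_per_hour, buffer_size):
    """Number of getJobs executions needed per hour to feed the target volume."""
    return math.ceil(target_jobs_per_hour / buffer_size)


def seconds_per_cycle(target_jobs_per_hour, buffer_size):
    """Time budget for one full getJobs + executeJobs cycle, in seconds."""
    return 3600 / get_jobs_executions(target_jobs_per_hour, buffer_size)


for buffer_size in (5000, 10000, 25000):
    print(
        buffer_size,
        get_jobs_executions(200_000, buffer_size),
        seconds_per_cycle(200_000, buffer_size),
    )
```

At the default buffer of 5000, a 200K/hour target leaves 90 seconds for each cycle of fetching, posting and processing a batch; doubling the buffer halves the number of single-threaded getJobs executions.<br />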
<br />
Scaling the Real Time Availability Monitor (RTAM) Agent - A case study<br />
At a customer site, one of the challenges was to scale the Real Time Availability Monitor agent to do the Partial Sync of inventory at over 250K records/hour. The customer was running Sterling 8.5 HF 25, WAS 7 and MQ 7. The following actions were taken to scale the agent from about 150K/hr to around 300K/hr -<br />
1. Tuning the getJobs query - Front-loading the YFS_INVENTORY_ACTIVITY table would heavily skew the test results due to the excessive time spent querying it as part of the getJobs query. Hence, trimming the Inventory Activity table, or otherwise keeping it in check, significantly reduces the time taken for getJobs and also represents the production workload more realistically. We also ensured that the correct index was used and that statistics were up to date.<br />
2. Setting the agent queue to non-persistent - We defined the internal JMS queue as non-persistent on MQ and then set the PER(QDEF) option in the scp file for this queue's entry while generating the bindings. Writing each message to the persistent queue takes between 11 and 20 ms, whereas on the non-persistent queue it takes under 5 ms.<br />
3. Enabled JMS session pooling for this agent via the following property in customer_overrides.properties -<br />
<i>yfs.yfs.jms.session.disable.pooling=N</i><br />
This allows sessions to be borrowed and returned to the pool instead of new ones getting created and closed for each message.<br />
<div>
4. Enabling the bulk loader property for the agent framework to avoid creating and closing sessions for each message being posted to the queue. We worked with IBM Sterling support to accomplish this via 8.5 HF48. The below 2 properties were set in the customer_overrides.properties file </div>
</div>
</div>
<i>yfs.agent.bulk.sender.enabled=Y</i><br />
<i>yfs.agent.bulk.sender.batch.size=50000</i><br />
The batch size setting of 50000 should be equal to or greater than the maximum buffer size you plan to use across all agents.<br />
<div>
5. Running the agents in the same data center as the MQ server - This further reduces the latency between the two tiers and therefore improves overall performance. This may not always be possible if you are running agents in multiple data centers. </div>
<div>
6. Increasing the buffer size of records retrieved to 10000 from the default of 5000 - We tested with various settings between 5K and 25K and found that the overall performance was best at 10000 for our setup. The optimal setting for the buffer size may vary on your environment and workload so run performance tests to determine what works best for your needs.</div>
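To see why the per-message write time matters so much, the back-of-the-envelope sketch below multiplies it out. It assumes messages are written one after another by the single getJobs thread and uses the per-message timings quoted in point 2; the function names are invented, and the 5 ms figure is used as an upper bound for the non-persistent case.<br />

```python
def batch_write_seconds(buffer_size, ms_per_message):
    """Time to post one batch of executeJobs messages, written serially."""
    return buffer_size * ms_per_message / 1000.0


def hourly_write_seconds(jobs_per_hour, ms_per_message):
    """Total time per hour spent just writing messages at a given volume."""
    return jobs_per_hour * ms_per_message / 1000.0


# Timings quoted above: 11-20 ms on a persistent queue, under 5 ms otherwise.
for label, ms in (("persistent, best", 11), ("persistent, worst", 20), ("non-persistent", 5)):
    print(
        label,
        batch_write_seconds(5000, ms),
        hourly_write_seconds(200_000, ms),
    )
```

Under these assumptions, at 20 ms per message the writes for 200K jobs alone would take 4000 seconds, which is more than the hour available, while at 5 ms they take 1000 seconds. That is the gap the non-persistent queue, session pooling and bulk sender changes are aimed at.<br />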
<div>
<br /></div>
<div>
Now that you have a better understanding of how the Sterling agent works you should be in a better position to troubleshoot and scale the agents. Happy testing and tuning! </div>
</div>
OMS Transaction Framework - Nomenclature and more (posted by Ranjith, 2012-07-28)<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
</div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
</div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;"><br />
After a long break following my first post - longer thanks to the distractions
of the Euro Cup and Wimbledon - I am back with a little tidbit on <span style="background-color: white;">nomenclature related to the Sterling transaction
framework. If you have been stumped by whether a Sterling process is an agent
server or integration server then this info would help you make the right
call. </span></span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">For many years as I have
worked with various customers, colleagues and partners I would hear people
using various Sterling terms - agent server, <span style="background-color: white;">integration
server, transaction - interchangeably. Although the Sterling OMS world is not what
it was in 2000 and as the lines are getting blurred as traditional
"agent" processes are being implemented as services I figured I should
tackle this topic in my blog. Earlier this week when one of my colleagues mentioned that this was a topic he too had explained for the n-th time to a new
customer and pinged me looking for such a write-up I figured that it was
time to put pen to paper or rather finger to keyboard. (For illustrations
do refer the Sterling product documentation guides - <a href="ftp://public.dhe.ibm.com/software/commerce/doc/ssfs/85/Application_Platform_Configuration_Guide.pdf"><span style="color: blue;">ftp://public.dhe.ibm.com/software/commerce/doc/ssfs/85/Application_Platform_Configuration_Guide.pdf</span></a>)</span></span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;"><span style="background-color: white;"><br /></span></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: x-small;">Grab a cup of your favorite beverage as this post does get a little long..</span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Transactions</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">- In software parlance, a transaction usually
means a sequence of information exchange and related work (such as database
updating) that is treated as a unit for the purposes of satisfying a request
while ensuring data integrity. Transactions may be synchronous such as those
running in the UI or asynchronous such as the batch jobs.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> In the Sterling world, the product's
extensibility and flexibility make the boundaries of a seemingly similar
transaction vary from implementation to implementation, even if the project
teams and customers call it by the same name, such as Create Order Transaction, or
drop the word "transaction" and refer to it simply as Create Order. In Sterling
these transactions are defined either as agent criteria or as a Service via the
Application Configurator. These transactions are executed in background JVMs
known as agent servers or integration servers (also referred to as batch jobs)
or directly from the Sterling UI (traditional console or thick client) or a
Webservice call from external systems on the Application server JVM.
Transactions consist of the underlying API and its associated events, user
exits and conditions. A successful transaction results in the changes being
committed that usually involves a combination of Database updates and messages
being written and read from a queue or file. Either the entire transaction is
successful or an error is thrown which causes the entire transaction to be
rolled back or an error to be raised for subsequent reprocessing.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
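<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">To make the all-or-nothing behaviour concrete, here is a minimal Java sketch. It is not Sterling code: the class UnitOfWorkSketch, the Step interface and the in-memory lists are hypothetical stand-ins for the database updates and queue writes that a real transaction performs.</span></div>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of an all-or-nothing unit of work; not a Sterling API. */
public class UnitOfWorkSketch {

    /** One step of the transaction, e.g. a database update or a queue write. */
    interface Step {
        void apply(List<String> pending) throws Exception;
    }

    /**
     * Runs every step against a scratch list and copies the changes into
     * "committed" only if all steps succeed.
     * Returns true on commit and false on rollback.
     */
    static boolean run(List<String> committed, List<Step> steps) {
        List<String> pending = new ArrayList<>();
        try {
            for (Step step : steps) {
                step.apply(pending);
            }
        } catch (Exception e) {
            // Rollback: the pending changes are simply discarded. A real
            // system would also record the error for later reprocessing.
            return false;
        }
        // Commit: all changes become visible together.
        committed.addAll(pending);
        return true;
    }

    public static void main(String[] args) {
        List<String> committed = new ArrayList<>();

        List<Step> goodSteps = new ArrayList<>();
        goodSteps.add(p -> p.add("DB: insert order header"));
        goodSteps.add(p -> p.add("JMS: publish order-created message"));
        System.out.println(run(committed, goodSteps) + " " + committed.size());

        List<Step> badSteps = new ArrayList<>();
        badSteps.add(p -> p.add("DB: insert order header"));
        badSteps.add(p -> { throw new IllegalStateException("user exit failed"); });
        System.out.println(run(committed, badSteps) + " " + committed.size());
    }
}
```

<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">The second run fails in its last step, so nothing from that run reaches the committed list, even though its first step had already "succeeded".</span></div>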
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">In Sterling MCF we can
classify processes into the following types of transactions :-</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">1.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Time-Triggered transaction</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;">s or Agents- which are triggered on a scheduled
basis to perform repetitive actions.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> Actions typically include invoking APIs to perform database
updates. For example, consolidation of orders to shipments may need to happen
about every 30 minutes, so the Consolidate To Shipment time-triggered
transaction can be configured to trigger every 30 minutes. Most of the time
triggered transactions are driven by records in YFS_TASK_Q table or based on
the pipeline. Time triggered transactions are defined by the Transaction Name
and the Agent Criteria.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> They can be run in
single or multi-thread mode. They are also called agents, and the servers in which
they run are called agent servers. </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">The four types of time-triggered transactions are :-</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">i.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Business Process transactions</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">- Responsible for majority of processing
entities such as orders (sales/purchase/transfer) and shipments. The entities
in every implementation will require one or more business process transactions
such as CONSOLIDATE_TO_SHIPMENT, CLOSE_ORDER to complete their lifecycle.
Understanding limitations of the Sterling transaction framework and designing
for your business needs can help you get the most out of the solution.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">ii.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Purge transactions</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">- Archive data from live (transaction) tables to
history tables or delete data that does not require archiving. Helps to
mitigate unrestricted growth of the OMS</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> transactional database. Frequently underestimated in value
and in development+testing efforts and overlooked in most implementations
leading to application performance issues.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">iii.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Task Q Syncher Time-Triggered Transactions</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">– A relatively new addition to the fold and is
used to update the task queue repository table with the latest list of open
tasks to be performed</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> by the
corresponding transaction, based on the latest pipeline configuration. Four
of these transactions are available - Load Execution, Order Fulfillment, Order
Delivery and Order Negotiation</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="background-color: white; font-family: Arial, sans-serif; font-size: 10pt;">iv.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="background-color: white; font-family: Arial, sans-serif; font-size: 10pt;">Monitors</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="background-color: white; font-family: Arial, sans-serif; font-size: 10pt;">– These are circumstance
driven transactions that watch for processes or circumstances that are out
of bounds and then raise alerts. Common monitors are those for Order, Shipment,
Inventory Availability and Exceptions. Monitoring jobs can be a huge system hog
if the data is not being purged often and if excessive stale entities exist
such as abandoned or erroneous orders.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">2.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Services or Flows</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">– Transactions that are NOT executed at
pre-defined times are called services or flows. In the Database and Configuration
screen titles this name is also used for every transaction in the SDF. Services
can be invoked via use of broadly available transports - Web service/SOAP,
HTTP, JMS, MSMQ, DB, flat file etc. A service can invoke other services to make
a longer chain of services.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> A service could
include invoking APIs (product or custom), evaluating conditions, making DB
updates etc. The services are processed continuously subject to thread and
resource availability and are not triggered at any particular time. They can be
run in single or multi thread mode and the servers in which they are executed
are called integration servers. The most common scenario is the use of services
to read messages from an inbound queue to Sterling for example to Create orders
flowing in from a web channel.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">3. Externally-Triggered
Transactions -</span></b><b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;">An externally-triggered
transaction is used to map a service invoked to a Sterling transaction and to
leverage the transaction framework. Seldom used in the real world as
implementations prefer to just use a service/flow minus the transaction
instead.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">4. User-Triggered
Transactions -</span></b><b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;">A user-triggered
transaction is invoked manually through the Application Consoles, a configured
alert queue, or an e-mail service. </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">Never seen it used in the field.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> <i>So if you are implementing this or the
externally-triggered do let me know how it goes.</i></span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="background-color: white; font-family: Arial, sans-serif; font-size: 10pt;">Composite
services</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="background-color: white; font-family: Arial, sans-serif; font-size: 10pt;">– A
construct to enable invocation of multiple services in parallel. A very useful
concept ever since its addition to the SDF but needs careful testing as
implementations could run into issues stemming from funky exception handling or
inadequate logging.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Agent Criteria</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">– An element that describes attributes that are
specific to<span style="background-color: white;"> a time-triggered transaction.
These attributes include the selector criteria such as Organization code,
Manual or Auto triggered, trigger interval and server name. A particular transaction
may have one or more agent criteria for processing data for different
organizations or other logical grouping. E.g. Schedule Order agent criteria
could be used to run scheduling for different organizations at different
intervals.</span></span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
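<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">As a rough mental model, agent criteria can be pictured as plain records attached to a transaction. The Java sketch below is only an illustration: the class, the field names, the criteria IDs and the server names are all made up for this post and do not reflect the actual product schema or API.</span></div>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of agent criteria; names are illustrative, not the product schema. */
public class AgentCriteriaSketch {

    static final class AgentCriteria {
        final String criteriaId;
        final String transactionName;
        final String organizationCode;
        final String serverName;
        final boolean autoTrigger;
        final int triggerIntervalMinutes;

        AgentCriteria(String criteriaId, String transactionName, String organizationCode,
                      String serverName, boolean autoTrigger, int triggerIntervalMinutes) {
            this.criteriaId = criteriaId;
            this.transactionName = transactionName;
            this.organizationCode = organizationCode;
            this.serverName = serverName;
            this.autoTrigger = autoTrigger;
            this.triggerIntervalMinutes = triggerIntervalMinutes;
        }
    }

    /** Returns the criteria that a server with the given name would be responsible for. */
    static List<AgentCriteria> forServer(List<AgentCriteria> all, String serverName) {
        List<AgentCriteria> result = new ArrayList<>();
        for (AgentCriteria c : all) {
            if (c.serverName.equals(serverName)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One transaction (Schedule Order), two criteria: one per organization,
        // each with its own trigger interval and server.
        List<AgentCriteria> all = new ArrayList<>();
        all.add(new AgentCriteria("SCHEDULE_ORG_A", "Schedule Order", "ORG_A", "ScheduleServerA", true, 5));
        all.add(new AgentCriteria("SCHEDULE_ORG_B", "Schedule Order", "ORG_B", "ScheduleServerB", true, 30));

        for (AgentCriteria c : forServer(all, "ScheduleServerA")) {
            System.out.println(c.criteriaId + " runs every " + c.triggerIntervalMinutes + " min");
        }
    }
}
```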
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Agent Server</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">- Server JVM on which one or more agent criteria
(commonly referred to as agents) can run. Invokes the
com.yantra.integration.adapter.IntegrationAdapter class and is started
typically by a startIntegrationServer.sh script provided as part of the product
installation.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Integration Server</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">- Server JVM on which one or more integration
services or flows (commonly referred to as services or mistakenly called
agents) are run. Invokes the com.yantra.integration.adapter.IntegrationAdapter
class and is started by a startIntegrationServer.sh.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> <o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<i><span style="font-family: Arial, sans-serif; font-size: 10pt;">Yes, you read it right!
Both agents and integration services are started by the same class and script
but the server name, service name or agent criteria name and definition
controls the behavior.</span></i><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Trigger agent</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">- This is the process that is typically invoked
via Cron or Ctrl-M jobs to trigger a certain time triggered transaction at
certain points in time using the triggeragent.sh or triggeragent.cmd script.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> For example, to create waves at certain hours
of the day in a WMS implementation, the trigger agent job could be invoked to
trigger the Create Wave agent, or to run a nightly purge of sales orders we could
trigger the Order Purge agent.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Events</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">– Help accomplish certain specific actions
executed upon a certain business event occurring. For example, ON_SUCCESS of Create
Shipment we could have an event to send an e-mail to the customer with the
shipment details, or ON_BACKORDER of Schedule Order could be used to raise an alert to
the Inventory Control Business team.</span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span></b><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Event Handlers</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">are configured to
associate the required</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><b><span style="font-family: Arial, sans-serif; font-size: 10pt;">Actions</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">to a particular event. Conditions are often used
to further customize the action taken. Event handlers can invoke any service to
e-mail, or raise exception alert; </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">Publish XML to external queues/database or Invoke custom services.</span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;"> Actions associated are triggered any time
the transaction raises the event, where applicable, so use them with caution. An excessive
number of actions, or overly complicated ones, can prolong a transaction, so use them wisely
and tune them well.</span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
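<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">The relationship between events, conditions and actions can be sketched in a few lines of Java. This is a simplified illustration and not the product's event framework: the Handler class, the raise method, the action names and the PriorityOrder attribute are all hypothetical.</span></div>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of event handlers with conditions; not the Sterling event framework. */
public class EventHandlerSketch {

    /** Ties an event name to an action, guarded by a condition on the event data. */
    static final class Handler {
        final String eventName;
        final Predicate<Map<String, String>> condition;
        final String actionName;

        Handler(String eventName, Predicate<Map<String, String>> condition, String actionName) {
            this.eventName = eventName;
            this.condition = condition;
            this.actionName = actionName;
        }
    }

    /** Returns the names of the actions that would fire for the given event and data. */
    static List<String> raise(String eventName, Map<String, String> eventData, List<Handler> handlers) {
        List<String> actions = new ArrayList<>();
        for (Handler h : handlers) {
            if (h.eventName.equals(eventName) && h.condition.test(eventData)) {
                actions.add(h.actionName);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Handler> handlers = new ArrayList<>();
        // Unconditional action on successful shipment creation.
        handlers.add(new Handler("ON_SUCCESS", data -> true, "EmailShipmentDetails"));
        // Conditional action: alert only for orders flagged as priority.
        handlers.add(new Handler("ON_BACKORDER",
                data -> "Y".equals(data.get("PriorityOrder")), "RaiseInventoryAlert"));

        System.out.println(raise("ON_BACKORDER", Map.of("PriorityOrder", "Y"), handlers));
        System.out.println(raise("ON_BACKORDER", Map.of("PriorityOrder", "N"), handlers));
    }
}
```

<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<span style="font-family: Arial, sans-serif; font-size: 10pt;">Every handler that matches runs inside the raising transaction in this model, which is why a long list of heavy actions stretches the transaction itself.</span></div>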
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<br /></div>
<div class="MsoNormal" style="margin-bottom: 0.0001pt;">
<b><span style="font-family: Arial, sans-serif; font-size: 10pt;">User Exits</span></b><span style="font-family: Arial, sans-serif; font-size: 10pt;"> </span><span style="font-family: Arial, sans-serif; font-size: 10pt;">– These enable transactions to invoke custom
logic to interact with external systems synchronously to complete processing. A
classic example is in the Payment Agent for credit card authorization.
Frequently a source of issues when not implemented well; only careful
design and testing can avoid a myriad of issues post production. </span><span style="font-family: 'Times New Roman', serif; font-size: 13.5pt;"><o:p></o:p></span></div>
<div class="MsoNormal">
<br /></div>
<br /></div>Ranjithhttp://www.blogger.com/profile/00998643080201014947noreply@blogger.com11tag:blogger.com,1999:blog-8562352444658652111.post-83170615448888160132012-04-17T07:11:00.003-07:002012-04-17T20:00:40.314-07:00Whose problem is it anyway?<div dir="ltr" style="text-align: left;" trbidi="on"><br />
<div class="MsoNormal" style="margin-left: 0.5in;">Growing up in India cable TV did not make it to my home until my high school days. One of the early shows that caught my attention was the very funny syndicated improvisational game show – "<a href="http://en.wikipedia.org/wiki/Whose_Line_Is_It_Anyway%3F" target="_blank">Whose line is it anyway</a>?" In the eponymous round contestants on their turn use their creative instincts and quick-wittedness to “explain or demo” random and quirky looking props. The toughest OMS problems call for that same kind of creativity, (although the results or the scenario itself is far from being funny) and for someone to step up and make sense of the problem (random or otherwise) with a complex software solution that has a seemingly quirky side to it. </div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;">When a MQ Queue Full is not an MQ issue –</div><div class="MsoNormal" style="margin-left: 0.5in;">Here’s a typical problem encountered at a Sterling OMS implementation in the testing phase. A certain transaction say CREATE_ORDER fails with the following exception and stack trace – </div><div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0.0001pt 0.5in;"><span style="font-family: "Arial","sans-serif"; font-size: 8pt;">com.yantra.interop.services.jms.JMSProducer$RetryException: com.ibm.msg.client.jms.DetailedInvalidDestinationException: JMSWMQ2007: Failed to send a message to destination 'CREATE_ORDER_QUEUE'. JMS attempted to perform an MQPUT or MQPUT1; however WebSphere MQ reported an error. Use the linked exception to determine the cause of this error.</span></div><div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0.0001pt 0.5in;"><span style="font-family: "Arial","sans-serif"; font-size: 8pt;"> at com.yantra.interop.services.jms.JMSProducer.sendJMSMessage(JMSProducer.java:852)</span></div><div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0.0001pt 0.5in;"><span style="font-family: "Arial","sans-serif"; font-size: 8pt;"> at com.yantra.interop.services.jms.JMSProducer.access$700(JMSProducer.java:63)</span></div><div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0.0001pt 0.5in;"><span style="font-family: "Arial","sans-serif"; font-size: 8pt;">......</span></div><div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0.0001pt 0.5in;"><span style="font-family: "Arial","sans-serif"; font-size: 8pt;">JMSCMQ0001: WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2053' ('MQRC_Q_FULL'). 
[system]: JMSProducer</span></div><div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0.0001pt 0.5in;"><span style="font-family: "Arial","sans-serif"; font-size: 8pt;">com.ibm.mq.MQException: JMSCMQ0001: WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2053' ('MQRC_Q_FULL').</span></div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;">At first glance, this seems to be an MQ issue calling for the testing team to make a beeline to the WebSphere MQ administrator’s desk. However, a more thorough investigation calls for many additional checks to be done and questions to be answered before pinging the MQ Admin. </div><div class="MsoListParagraphCxSpFirst" style="margin-left: 1in; text-indent: -0.25in;">a.<span style="font: 7pt "Times New Roman";"> </span>Has the queue been sized appropriately for the environment?</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">b.<span style="font: 7pt "Times New Roman";"> </span>Are there processes – Sterling or otherwise - attached to and consuming messages from the queue?</div><div class="MsoListParagraphCxSpLast" style="margin-left: 1in; text-indent: -0.25in;">c.<span style="font: 7pt "Times New Roman";"> </span>Are the messages from the queue being consumed at a much slower rate than incoming messages? </div><div class="MsoListParagraphCxSpLast" style="margin-left: 1in; text-indent: -0.25in;"><br />
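<div class="MsoNormal" style="margin-left: 0.5in;">Question (c) is simple arithmetic, and a back-of-the-envelope calculation often settles it before anyone walks over to the MQ Admin. The Java sketch below is only an illustration: the class and method names are hypothetical, and the depth and rate figures are assumed numbers for the example, not measurements from any real environment.</div>

```java
/** Hypothetical back-of-the-envelope estimate of when a queue fills up. */
public class QueueFillSketch {

    /**
     * Estimates the seconds until the queue reaches its maximum depth.
     * Returns 0 if the queue is already full and -1 if consumers keep up
     * with producers (the queue never fills at these rates).
     */
    static double secondsUntilFull(long currentDepth, long maxDepth,
                                   double putsPerSecond, double getsPerSecond) {
        if (currentDepth >= maxDepth) {
            return 0.0;
        }
        double netGrowthPerSecond = putsPerSecond - getsPerSecond;
        if (netGrowthPerSecond <= 0) {
            return -1.0;
        }
        return (maxDepth - currentDepth) / netGrowthPerSecond;
    }

    public static void main(String[] args) {
        // Assumed figures: an empty queue with a maximum depth of 5000 messages.
        long maxDepth = 5000;

        // Slow consumer: 50 messages/s arriving, 30 messages/s consumed.
        System.out.println(secondsUntilFull(0, maxDepth, 50, 30));

        // No consumer attached at all (question b).
        System.out.println(secondsUntilFull(0, maxDepth, 50, 0));

        // Consumers keeping up with the load.
        System.out.println(secondsUntilFull(0, maxDepth, 50, 60));
    }
}
```

<div class="MsoNormal" style="margin-left: 0.5in;">With these assumed numbers the slow-consumer case fills the queue in a little over four minutes of sustained load, which would point at the consuming process rather than at MQ itself.</div>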
</div><div class="MsoNormal" style="margin-left: 0.5in;">Other Sterling OMS system and performance problems would entail weeding through many more questions such as</div><div class="MsoListParagraphCxSpFirst" style="margin-left: 1in; text-indent: -0.25in;">a.<span style="font: 7pt "Times New Roman";"> </span>Is it a browser issue?</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">b.<span style="font: 7pt "Times New Roman";"> </span>Is it a database tuning issue?</div><div class="MsoListParagraphCxSpMiddle" style="margin-left: 1in; text-indent: -0.25in;">c.<span style="font: 7pt "Times New Roman";"> </span>Is it an Appserver configuration problem?</div><div class="MsoListParagraphCxSpLast" style="margin-left: 1in; text-indent: -0.25in;">d.<span style="font: 7pt "Times New Roman";"> </span>Does the solution/product scale to meet our needs? </div><div class="MsoListParagraphCxSpLast" style="margin-left: 1in; text-indent: -0.25in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;">Failure to consider all these questions to identify a root cause often leads to the conclusion that most Sterling OMS system problems are simply “a Sterling issue” (the industry has yet to term this an IBM issue, perhaps reserving that for its other woes on “traditional” products on the IBM tech stack). Whose problem is that anyway? Or to be more precise, between the implementation team (developers and testers), the system admin team (DBAs, Appserver, JMS and AIX admins) and IBM Support, who is going to own it and drive it to resolution? Thus was born the role of a services-focused Yantra/Sterling Performance Engineer in 2004 (the product was known as Yantra up until the 2005 Sterling Commerce acquisition). The name Performance Engineer or PE has stuck, not because all issues require performance tuning but because nothing else fitted either. </div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;">How Performance Engineers are like Economists –</div><div class="MsoNormal" style="margin-left: 0.5in;">Steven Levitt in his best-seller SuperFreakonomics describes economists as being trained to be cold-blooded to calmly discuss trade-offs involved in a global catastrophe while the rest of us non-economists are a bit more excitable. A good Performance Engineer (Sterling or otherwise) is a lot like that economist and although he is not called on to explain implications of a global catastrophe like an earthquake or global-warming (a production outage being the biggest catastrophe that a PE is called on to solve) he needs to analyze issues calmly and keep emotions – blame, paralysis, confusion, panic, ego – in check while collaborating with the various teams - business users, System Administrators, developers and Support to find a resolution. </div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;">Had an economist been regarded as highly as a doctor or an engineer in the Indian middle class psyche perhaps I may have gone on to become one. Now 8 years since I first started as an in-house PE in the QA organization and 12 years since I started there as a Support Engineer I am still solving Sterling issues and still loving it. This blog attempts to share what I have learnt over the years (and still learning) on implementing, fixing and tuning Sterling applications. Although it may be difficult to explore all Sterling issues in a simplistic Q & A format like that of <a href="http://asktom.oracle.com/" target="_blank">asktom</a> site hosted by the legendary <a href="http://tkyte.blogspot.in/" target="_blank">Tom Kyte</a> (the first “technology” guru I was and still am in awe of) I shall experiment to see what best can be shared in this format. I am hoping that I can review your questions, try and answer some (or at least the most interesting and relevant ones) and other topics in these pages and most importantly nurture the inner "PE" in each of you. </div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;">Do let me know your comments on this post & format and what Sterling topics you want to see covered (It will keep me from boring you with personal stories and not-particularly-useful insights). </div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div><div class="MsoNormal" style="margin-left: 0.5in;"><br />
</div></div>Ranjithhttp://www.blogger.com/profile/00998643080201014947noreply@blogger.com11