Friday, November 23, 2007

SAP Load Balancing and Work Process Troubleshooting

The benefit of segregating user groups by line-of-business (using logon groups) is related to the point that groups of users (like SD users or HR users, for example) tend to use the same sets of data. They (generally) work with the same groups of tables and hit the same indexes using the same programs (transactions).

So, if you can group all of the users hitting the same tables onto one App server (or one set of App servers), then you can tune the App server buffers to a much greater extent. If the FI users (generally) never hit the HR tables, then the App servers in the FI group don't (generally) have to buffer any HR data. That leaves you free to make more drastic memory and buffer adjustments, because you don't have to worry (as much) about hurting the HR users (as an example) when you're adjusting the FI server group.

So, (in my opinion only) you should start with a buffer hit ratio analysis / DB table & index access analysis (by user group) to see where you would get the best benefit from this kind of setup. If you don't have this kind of info, then creating logon groups by line-of-business may have no benefit (or, in the worst case, may degrade performance for the group with the highest load %). You need some historical information on which to base your decision about how best to split the users up.

You may find that 50% of the load is from the SD users and so you may need one group for them (with 3 App servers in it) and one other group for everyone else (with the other 3).
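The split described above is simple proportional arithmetic, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_servers` and the group names are made up for the example, and the load percentages would have to come from your own historical workload analysis. The sketch uses largest-remainder rounding so that the counts always add up to the number of App servers you actually have; it does not guarantee that a very small group gets a server.

```python
import math

def split_servers(load_share, total_servers):
    """Allocate App servers to logon groups in proportion to their load.

    load_share maps a (hypothetical) group name to its share of total load,
    in any consistent unit (percent, dialog steps, ...).
    """
    total_load = sum(load_share.values())
    # Exact (fractional) number of servers each group would deserve.
    exact = {g: total_servers * s / total_load for g, s in load_share.items()}
    # Start from the rounded-down counts...
    alloc = {g: math.floor(x) for g, x in exact.items()}
    leftover = total_servers - sum(alloc.values())
    # ...and hand the remaining servers to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc

# The example from the text: SD carries half the load on a 6-server system.
print(split_servers({"SD": 50, "EVERYONE_ELSE": 50}, 6))
```

Treat the result as a starting point only; buffer behavior, time-of-day load and op-modes still need to be checked before you commit to a layout.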

The logon group(s) will have to be referenced by SAP GUI, so SAP GUI (or just saplogon.ini, plus maybe the services file) will have to change to accommodate any new groups you create in SMLG. Also consider that there are variables for time-of-day (load varies by time-of-day) and op-mode switches (resources vary by op-mode).

All work processes are running? What should our action be?

Are all the work processes (dia,btc,enq,upd,up2,spo) running or just all the dialog work processes?

If all the work processes are running, then you may want to look at SM13 (not SM12, which shows lock entries) and see if updates are disabled. If they are, look at the alert log (if it's an Oracle database) and see if you have any space-related errors (e.g. ORA-01653 or ORA-01654). If you do, add a datafile or raw device file to the applicable tablespace and then re-enable updates in SM13.
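If the alert log is large, a small script can pick out the space-related errors for you. The following is a minimal sketch, not an official tool: the function name `find_space_errors` is made up for the example, and the location of the alert log is not covered here, so you have to supply your own path. It also assumes that some Oracle releases write the error number without the leading zero (e.g. ORA-1653), so the pattern accepts both spellings.

```python
import re

# ORA-01653 = unable to extend table, ORA-01654 = unable to extend index.
# The leading zero is treated as optional (assumption, see note above).
SPACE_ERRORS = re.compile(r"ORA-0*165[34]\b")

def find_space_errors(lines):
    """Return (line_number, text) for every line mentioning ORA-01653/01654."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if SPACE_ERRORS.search(line)
    ]

# Usage sketch (the path is a placeholder, not a real location):
# with open("/path/to/your/alert_SID.log") as alert_log:
#     for number, text in find_space_errors(alert_log):
#         print(number, text)
```

The tablespace named in each hit is the one that needs the extra datafile.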

If only all the dialog work processes are running, there are several possible causes. First, look to see if there's a number in the Semaphore column in SM50 or dpmon. If there is, click once on one of the numbers in the Semaphore column to select it and then, press F1 (help) to get a list of Semaphores. Then, search OSS notes and, hopefully, you'll find a note that will tell you how to fix the problem.

If it's not a semaphore (or sometimes if it is), use vmstat on UNIX or task manager on Windows to see if the operating system is running short on memory, which would cause it to swap. In vmstat, the free column (which is in 4k pages on most UNIX derivatives) will be consistently 5MB or so, and the pi and/or po columns will have a non-zero value. The %idle column in the cpu or proc section will be 0 or a very low single digit, while the sys column will be a very high double-digit number, because the operating system is having to swap programs out to disk and in from disk before it can execute them.
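The vmstat check above can be automated if you capture the output to a file. The sketch below is only an illustration: the function name `swapping_samples` and the idle threshold of 5% are assumptions made for the example, and the column names differ between UNIX flavors (Linux, for instance, reports swap activity as si/so rather than pi/po), so they are passed in as parameters. It locates the column-name header line, then flags every sample where paging is non-zero and idle CPU is at or below the threshold.

```python
def swapping_samples(vmstat_text, page_in="pi", page_out="po",
                     idle="id", idle_floor=5):
    """Return the vmstat samples that look like the system is swapping."""
    rows = [line.split() for line in vmstat_text.strip().splitlines()]
    # The header is the row that contains all three column names.
    header = next(cols for cols in rows
                  if page_in in cols and page_out in cols and idle in cols)
    pi_col, po_col, id_col = (header.index(name)
                              for name in (page_in, page_out, idle))
    flagged = []
    for cols in rows:
        # Skip header/group rows and anything that is not a full numeric sample.
        if len(cols) != len(header) or not all(c.isdigit() for c in cols):
            continue
        pi, po, idle_pct = int(cols[pi_col]), int(cols[po_col]), int(cols[id_col])
        if (pi > 0 or po > 0) and idle_pct <= idle_floor:
            flagged.append({"pi": pi, "po": po, "idle": idle_pct})
    return flagged

# Made-up sample in a simplified layout, for illustration only.
sample = """
 r b w free pi po us sy id
 1 0 0 90000 0 0 10 5 85
 9 3 0 1200 250 400 8 90 2
"""
print(swapping_samples(sample))
```

One flagged sample proves little; it's the consistent pattern over many intervals that points to a memory shortage.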

In task manager, look at free memory in the physical memory section under the performance tab. If it's 10MB or 15MB (I think), then the operating system will be swapping.

Usually, when all the dialog work processes are running, you won't be able to log in via SAP GUI and will need to execute the dpmon utility at the command line. The procedure is basically the same on UNIX and Windows.

On UNIX:

Telnet to the server and log in as the sidadm user.
cd to the /sapmnt/SID/profile directory
Execute "dpmon pf=SID_INSTANCE_hostname" (e.g. PRD_DVEBMGS00_hercules), select option "m" and then option "l"

On Windows:

Click on START, then RUN
Type "cmd" and press enter
Change to the drive where the profile directory resides (e.g. f:)
cd to \sapmnt\SID\profile
Execute "dpmon pf=SID_INSTANCE_hostname" (e.g. PRD_DVEBMGS00_zeus), select option "m" and then option "l"

On both operating systems, you'll see a screen that looks like what you see in SM50. What you do next depends on what you see here, but checking the developer trace files (e.g. dev_disp) in the work directory (e.g. /usr/sap/SID/DVEBMGS00/work) is never a bad idea.
