Posts Tagged ‘performance tuning’

Validate WCI Search Server performance

Wednesday, June 27th, 2012

WCI Search is a fickle beast. In my opinion, it started as one of the most stable pieces of the Plumtree stack and has been losing steam since Clustered Search arrived in the AquaLogic days. Nowadays, it seems even a slight breeze will bring the damn thing down, and it either needs to be restarted or rebuilt.

But, if you’re getting reports that “search is slow”, how do you quantify that? Sure, something like Splunk would be useful for event correlation here, but we can build a pretty clear picture with what we’ve already got. And, I’m not talking about WebCenter Analytics – although that is possible too by querying the ASFACT_SEARCH tables.

Instead, let’s try to get the information from the log files. If you look at an entry for an actual search query in the search log files at \ptsearchserver\10.3.0\logs\logfile-trace-*, you’ll see a request line that looks like this:

<request client="207.228.237.50" duration="0.015" type="query">

So although there is no timestamp on that line, if we can isolate those lines over a period of time and parse out the “duration” values, we can get a sense of whether the search server is slowing down or whether we need to look elsewhere.

I did this using my old friend UnixUtils, and ran a simple command to spin through all the search log files and pull out just the lines with the phrase type="query" in them:

cat * | grep "type=\"query\"" > ..\queryresults.txt

The “duration” value can then be parsed out of those lines, and we can produce a nice purdy chart for the client that says “indeed, searches that used to take 100ms are now running around 1s, so we should dig into this.”
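If you’d rather skip the spreadsheet step, something like this quick Python sketch does the same job – it spins through the trace logs, isolates the type="query" lines, and pulls out the duration attribute. The file pattern and the summary stats are just my assumptions, not part of the UnixUtils approach above:

import glob
import re
from statistics import median

# Matches the duration attribute on lines like:
# <request client="207.228.237.50" duration="0.015" type="query">
DURATION_RE = re.compile(r'duration="([\d.]+)"')

durations = []
for path in glob.glob("logfile-trace-*"):  # assumes we're in the logs directory
    with open(path, errors="ignore") as f:
        for line in f:
            if 'type="query"' not in line:
                continue
            match = DURATION_RE.search(line)
            if match:
                durations.append(float(match.group(1)))

if durations:
    print(f"queries: {len(durations)}")
    print(f"median:  {median(durations) * 1000:.0f} ms")
    print(f"slowest: {max(durations) * 1000:.0f} ms")

Dump the raw values into your charting tool of choice and the trend – 100ms creeping up toward 1s – jumps right out.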

There’s an App For That 10: LogAnalyzer

Monday, June 11th, 2012

A while back I suggested that you turn on response time monitoring in your IIS or Apache instances. But what good are log files if you can’t make sense of them?

If you’ve got the budget, I totally recommend Splunk for your data processing needs. But today’s App For That is a home-brew log parsing system that processes server logs and breaks down response times by time interval (such as 1 hour) and request type (such as portal gateway requests or imageserver requests), calculating the number of hits and the average response time during each interval.

It’s a pretty simple .NET application that can process a large number of log files.

It also allows you to export the data for any particular filter to a .tsv, so you can produce charts that really tease out some significant information.
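Until it’s released, here’s a rough Python sketch of the core aggregation idea – bucket each request by hour and type, then compute hits and average response time per bucket. It assumes Apache-style logs with the time-taken value as the last field (like the LogFormat in the “Turn on Response Time Monitoring” post below); the file name and the type rules are placeholders:

import re
from collections import defaultdict

# Placeholder request-type rules; the real tool lets you define filters.
def classify(url):
    if "/portal/" in url:
        return "portal gateway"
    if "imageserver" in url.lower():
        return "imageserver"
    return "other"

# Grabs the hour, the requested URL, and the trailing time-taken field
# from each log line (microseconds if you're using Apache's %D).
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}).*?"\w+ (\S+).*? (\d+)$')

buckets = defaultdict(lambda: [0, 0.0])  # (hour, type) -> [hits, total ms]
with open("access.log") as f:            # placeholder file name
    for line in f:
        m = LINE_RE.search(line)
        if not m:
            continue
        hour, url, micros = m.group(1), m.group(2), int(m.group(3))
        key = (hour, classify(url))
        buckets[key][0] += 1
        buckets[key][1] += micros / 1000.0

for (hour, kind), (hits, total_ms) in sorted(buckets.items()):
    print(f"{hour}\t{kind}\t{hits} hits\t{total_ms / hits:.1f} ms avg")

The output is already tab-separated, so it pastes straight into a spreadsheet for charting.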

We’ll be open-sourcing it soon, so stay tuned – or drop us a line and we’ll share it with you now.

Cool Tools 23: LoadImpact

Wednesday, May 30th, 2012

This is a great one – thanks to Brendan Patterson for this find!

LoadImpact is a load-testing service with a free offering that’s immediately available on their home page with no sign-up necessary. It’s an excellent load testing tool that utilizes the Amazon cloud to stress your site from multiple locations, and it’s got a really slick UI. I haven’t tried the paid version yet, but I welcome comments from any of you who have explored some of the more advanced features.

Pro Tip for you portal users utilizing the free version: the test doesn’t execute JavaScript or follow certain redirects on the URL you provide, so instead of using http://mysite.com/, you should use http://mysite.com/portal/server.pt.

Amazing how just a couple of years ago load testing tools would have cost a fortune (and many that will remain nameless still do!) – kind of like .NET Decompilers.

Turn on Response Time Monitoring in IIS, Apache, or WebLogic

Thursday, May 17th, 2012

You’ve probably got monitors set up for your various portal servers and components, and are using those to track up-time of your servers. And you may even use an awesome tool like New Relic to track response times in addition to up-time.

But, if you don’t have or need anything fancy (or even if you do for that matter), one of the most common tweaks I recommend to customers prior to a Health Check is to turn on Response Time monitoring. By default, application servers like IIS or WebLogic don’t track how long they take to serve up a page, but it’s easy to turn that on for later analysis.

In IIS, you just turn on the “time-taken” field in the site’s W3C logging settings (under Logging > Select Fields in IIS Manager).

In Apache (or Java app servers with a similar access-log configuration), use a LogFormat line like this:

LogFormat "%{X-Forwarded-For}i %l %u %t %p \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\" %h %I %O %D" combined-withcookie

It’s the %D that logs the time taken in microseconds – see the Apache mod_log_config documentation for details.

Either way, you should get a log line that looks like this:

- - - [14/Feb/2012:01:11:51 -0400] 80 "GET /portal/server.pt HTTP/1.1" 200 843 "http://portal/server.pt?space=Login" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.1)" "" 10.10.12.26 710 1133 17099

… where one of the fields (the trailing 17099 in the example above) is the time taken to process and serve the request – milliseconds for IIS’s time-taken, microseconds for Apache’s %D. This is hugely valuable information for identifying which pages, communities, or content types are slowing you down.
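Once that number is in the logs, surfacing the worst offenders takes about ten lines. Here’s a minimal sketch that assumes the combined-withcookie format above, with %D as the last field on each line (the file name is a placeholder):

import re

# Pulls the request URL (from %r) and the trailing %D value (microseconds)
# out of each line of the combined-withcookie format shown above.
ENTRY_RE = re.compile(r'"\w+ (\S+) HTTP[^"]*".*? (\d+)$')

slowest = []
with open("access.log") as f:  # placeholder file name
    for line in f:
        m = ENTRY_RE.search(line)
        if m:
            slowest.append((int(m.group(2)), m.group(1)))

# The ten slowest requests, converted from microseconds to milliseconds
for micros, url in sorted(slowest, reverse=True)[:10]:
    print(f"{micros / 1000:.1f} ms  {url}")

Run that against a day’s log before and after a change and you’ve got a quick before-and-after picture without firing up anything fancier.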