100% CPU in Collaboration Server on Solaris?

Today’s post comes from a Rock Star Consultant in the WebCenter Consulting space.  It has to do with WCI Collaboration Server consuming 100% CPU on Solaris, but might be relevant to those Windows users out there.  While I personally haven’t experienced this particular issue at client sites (virtually all of them Windows), it sounds like if you’re running Collab in Unix, it might be worth upgrading your JVM.

Problem:

Collab periodically starts chewing up CPU until it maxes out the box and ultimately dies.

Details:

This looks to ultimately be a problem at the JDK level.  Out of the box, the Tomcat that Collab is deployed to uses JDK 1.5.  There’s a bug in JDK 1.5 that causes the NIO connector in Tomcat to sometimes freak out, resulting in Collab spinning out of control and eating all the server CPU.  For details, see this thread:

http://www.mail-archive.com/users@tomcat.apache.org/msg36900.html

Diagnosis:
Here’s the rundown on the diagnosis we did (Collab on Solaris):

Symptom:
Collab is eating up a huge amount of CPU under minimal user load (80%+ on a server where it usually uses ~10%).

Troubleshooting performed:
1) Generate a stack trace (thread dump) for Collab.
2) Run prstat -Lp <Collab Pid>.  This shows how much CPU each thread in Collab is taking. Note that the top three threads in the attachment are taking up 22% of the CPU each.  Also note that those threads have used a huge amount of CPU time (3-4 hours each).

3) Note the LWPID of each of the busy threads:  6248, 3413, 8198.
4) Now convert those numbers to Hex:
6248 -> 1868
3413 -> d55
8198 -> 2006
5) Now look for those hex thread ids in the thread dump.  You’ll see they all have the same stack (for example: nid=0xd55).  Specifically:

—————————

"http-7127-exec-23" daemon prio=10 tid=0x00000001008ee360 nid=0xd55 runnable [0xfffffffeff0fa000..0xfffffffeff0ff728]
at org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:610)
- waiting to lock <0xffffffff40927c48> (a org.apache.coyote.http11.InternalNioOutputBuffer)
at org.apache.coyote.http11.InternalNioOutputBuffer.access$000(InternalNioOutputBuffer.java:44)
at org.apache.coyote.http11.InternalNioOutputBuffer$SocketOutputBuffer.doWrite(InternalNioOutputBuffer.java:794)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
SNIP

Again, note that all the stack traces are the same and they appear to be trying to flush/close an output stream, but it looks like they’re blocked.
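
If you have more than a handful of busy threads, steps 3 through 5 get tedious by hand. Here’s a rough Python sketch of the same lookup: it converts the decimal LWPIDs from prstat to the hex nid format and pulls the matching thread names out of a thread dump. The function names and the sample dump are made up for illustration, and the parsing assumes the dump uses the nid=0x... form shown in the stack above.

```python
import re

# Assumption: thread dump headers carry the native thread id as nid=0x<hex>,
# and each thread header starts with the thread name in double quotes.
NID_RE = re.compile(r"\bnid=(0x[0-9a-f]+)", re.IGNORECASE)
NAME_RE = re.compile(r'^"([^"]+)"')


def lwpid_to_nid(lwpid):
    """Turn a decimal LWPID from prstat into the hex form used in the dump."""
    return f"0x{lwpid:x}"


def busy_threads(dump_text, lwpids):
    """Map each LWPID that appears in the dump to its thread name."""
    wanted = {lwpid_to_nid(p): p for p in lwpids}
    found = {}
    name = None
    for line in dump_text.splitlines():
        line = line.strip()
        header = NAME_RE.match(line)
        if header:
            # Remember the most recent thread name; the nid may sit on the
            # same line or on a wrapped line right after it.
            name = header.group(1)
        nid = NID_RE.search(line)
        if nid:
            lwpid = wanted.get(nid.group(1).lower())
            if lwpid is not None:
                found[lwpid] = name
    return found


# Hypothetical two-thread dump, just to exercise the functions.
sample_dump = '''
"http-7127-exec-23" daemon prio=10 tid=0x00000001008ee360 nid=0xd55 runnable
at org.apache.coyote.http11.InternalNioOutputBuffer.addToBB(InternalNioOutputBuffer.java:610)
"some-idle-thread" daemon prio=10 tid=0x0000000100000000 nid=0x10 waiting on condition
'''

for lwpid in (6248, 3413, 8198):
    print(lwpid, "->", lwpid_to_nid(lwpid))

print(busy_threads(sample_dump, [6248, 3413, 8198]))
```

On a real box you would read the dump from wherever Collab’s thread dump ended up and pass in the LWPIDs that prstat reported as the top CPU consumers.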

Fix:
Update the Collab JVM to a recent release of the 1.6 JDK (1.6u23, for example), then restart Collab.

Results:
Prior to the change, Collab was crashing multiple times a day and using at least 20% of the CPU on a beefy Solaris box.  After the change, there have been no Collab crashes, and Collab is pretty consistently using about 4% of CPU on the server.
