Posts Tagged ‘Search Server’

Rebuilding WebCenter Collaboration Index By Project

Thursday, April 4th, 2013

Another tip of the hat to Brian Hak for this pretty awesome Hak (see what I did there?).

Last year, Brian was faced with a problem: some documents in WebCenter Interaction Collaboration Server weren’t properly indexed, and his only option was to rebuild the ENTIRE index (a pain we’re all pretty familiar with). With many thousands of documents, the job would never finish, and end users would be frustrated with incomplete results while the job toiled away after wiping everything out.

So he took it upon himself to write CollabSearchRebuilder, which allows you to rebuild specific subsets of documents without having to wipe out the whole index and start all over again.

Feel free to download the source and build files and check it out!

Validate WCI Search Server performance

Wednesday, June 27th, 2012

WCI Search is a fickle beast. In my opinion, it started as one of the most stable pieces of the Plumtree stack and has been losing steam since the AquaLogic days, when Clustered Search was introduced. Nowadays, it seems even a slight breeze will bring the damn thing down, and it either needs to be restarted or rebuilt.

But, if you’re getting reports that “search is slow”, how do you quantify that? Sure, something like Splunk would be useful for event correlation here, but we can build a pretty clear picture with what we’ve already got. And, I’m not talking about WebCenter Analytics – although that is possible too by querying the ASFACT_SEARCH tables.

Instead, let’s try to get the information from the log files. If you look at an entry for an actual search query in the search log files at \ptsearchserver\10.3.0\\logs\logfile-trace-*, you’ll see a line that looks like this:

<request client="207.228.237.50" duration="0.015" type="query">

So although there is no timestamp on that line, if we can isolate those lines over a period of time and parse out the “duration” values, we can get a sense of whether the search server is slowing down or whether we need to look elsewhere.

I did this using my old friend UnixUtils, and ran a simple command to spin through all the search log files and find just those lines with the phrase type="query" in them:

cat * | grep "type=\"query\"" > ..\queryresults.txt

The “duration” value can then be parsed out of those lines, and we can produce a nice purdy chart for the client that says “indeed, it seems searches that used to take 100ms are now running around 1s, so we should dig into this”.
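If you don’t want to eyeball thousands of lines, a short script can pull the durations out of that queryresults.txt file and summarize them. Here’s a rough Python sketch – the file name and regex are just my assumptions based on the grep output above:

import re
import statistics

# Pull every duration="..." value out of the grep'd query lines
durations = []
with open("queryresults.txt") as f:
    for line in f:
        match = re.search(r'duration="([\d.]+)"', line)
        if match:
            durations.append(float(match.group(1)))

# Summarize so we can say whether "search is slow" with a straight face
if durations:
    print("queries: %d" % len(durations))
    print("average: %.3f sec" % statistics.mean(durations))
    print("slowest: %.3f sec" % max(durations))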

WebCenter Search Thesaurus

Tuesday, May 22nd, 2012

In 2001, Plumtree acquired RipFire, Inc., which provided the foundation for its search technology over the following decade.

Somehow, I’ve always known this search server (that’s grown up through the ALUI and WCI phases) allowed a thesaurus to be created, but I’d never created one at a client site until recently. Turns out, it’s not so hard.

The Search Thesaurus lets you say that if an end user searches for term X, the search server should internally retrieve documents for term Y as well. Creating one is pretty straightforward: you just create a text file containing a comma-separated list of values, where the first value is the user’s search term and the second value is the term the search server should expand it to internally, with one entry per line.

For example, let’s say you’re not happy with the results when users enter “business cards” in the search box (because “business” and “cards” are pretty generic terms). You could create a line in your thesaurus text file like this:

business cards,business card order login

… and import the text file into the WCI Search Server with these steps:

  1. Stop Search Service
  2. Run this command: C:\plumtree\ptsearchserver\6.5\bin\native>customize.exe -r C:\thesaurus.txt C:\plumtree\ptsearchserver\6.5\prod_search_node_01
  3. Start Search Service
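For reference, the thesaurus file itself is nothing fancy – one comma-separated mapping per line, something like this (these particular mappings are just made-up examples):

business cards,business card order login
holiday schedule,company holiday calendar
org chart,organization chart directory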

For more details on setting up a Search Thesaurus, see the BEA ALUI Admin Guide.

WCI Search Repair Scheduling Nuances

Monday, October 10th, 2011

WebCenter Interaction Search Server can be pretty finicky. It used to be rock-solid in the Plumtree days, but stability has declined a bit since Clustering was added in the BEA AquaLogic days. I don’t even recommend clustering any more if an index is small and manageable enough – I’ve just spent way too much time wiping out and rebuilding indexes due to problems with the cluster sync.

Over time, the database and the search index get out of sync, which is completely normal. In some cases, a custom utility like Integryst’s SearchFixer is needed to repair the index without purging it, but in most cases a simple Search Repair will do. To schedule a Search Repair, go to Select Utility: Search Server Manager, change the repair date to some time in the past, and click Finish.

The Search Repair process runs when you run the Search Update job. Just make sure you see the line “Search Update Agent is repairing the directories…” in the log.

Why would you NOT see this? For years I’ve been doing this “schedule search repair to run in the past, kick off Search Update job” procedure, and it always seemed like I had to run the Search Update job twice to get the search repair to actually work. It turns out, though, that you don’t have to run the job twice – you just need to wait a minute after you schedule the search repair to run in the past.

Why? Because when you set the repair to run in the past and click Finish, the date doesn’t STAY in the past – instead, the date gets set to “Now plus 1 minute”. So if you kick off the Search Update job within a minute of changing the search repair date, the repair is still scheduled to run a couple of seconds in the future.

Note that this little date-related nuisance uses the same logic that Search Checkpoint scheduling uses – dates in the past are actually set to “now plus one”. This is significant because it changes the way you should plan schedules. For example, if you want an Automation Server job to run now, and then continue to run every Saturday at midnight, you’d just go into the job and schedule it for LAST Saturday at midnight, repeating every 1 week. Automation Server will see that the job is past-due, run it now, and schedule the next run time for next Saturday at midnight. With Search Repair or Checkpoint manager, if you do this, the date will actually be set to “now plus 1”, and the job will run now. But the next job will be scheduled for 7 days from now, not 7 days from the original “Saturday at midnight” schedule.

Bottom line: for heavy jobs like search repair or checkpoint management, you should schedule them to run in the future at a specific (weekend or off-hours) date rather than some time in the past.

WCI Collaboration Search Server re-indexing

Thursday, March 10th, 2011

Oracle’s Upgrade Guide for WebCenter Interaction Collaboration Server includes the step “Rebuild the Oracle WebCenter Collaboration search collection”.

A while back, I ran into an issue where the rebuild process was spiking the CPU on the Collab Server at 100% forever (which, I suppose, is more of a plateau than a spike).  In viewing the PTSpy logs, I saw hundreds of thousands of messages that included this SQL statement:

select * from csIndexBulkDeletes where status = 1

Checking that table, I found over 110 MILLION rows. Which is particularly odd, given that this client only had 42,000 Collab Docs. Now, I have no idea how the table got that enormous, but it’s clear that Collab’s Search Rebuild process uses that table to determine which documents to update, much like the Search Update job uses the PTCARDSTATUS table – which, incidentally, can also get messed up.

It was clear that if the search rebuild process goes haywire, Collab starts queuing up search server updates in this table, and if the table gets too big, cascading failures start to occur where the queue grows faster than it can get purged.
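If you suspect the same problem, it’s worth counting the rows in that queue before kicking off a rebuild – a count wildly larger than your actual document count is the red flag:

select count(*) from csIndexBulkDeletes where status = 1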

The solution is: before starting the Collab Search Re-index process, clear this entire table, which is rebuilt during the re-index process anyway. To do so, just run:

truncate table csIndexBulkDeletes

I should note that this isn’t all that common, as I’ve only seen it once, but at least now you know one possible solution if your rebuild process can’t seem to gain escape velocity.

Update: WCI Search Configuration through Configuration Manager

Sunday, November 7th, 2010

A couple months ago, I wrote a post about changing your WebCenter Interaction Search Service settings through the deprecated node.ini and cluster.ini files. Since then, I’ve found you can achieve the same effect through the portal’s Configuration Manager.

Now you know.

Communicating Directly with WebCenter Interaction Search Server

Wednesday, October 6th, 2010

Years ago I wrote about checking the Plumtree (ALUI?) Search Server Status The Hard Way. And I just let it go. A couple days ago, I told you about a great webinar on Oracle’s support site, and it took that great presentation for me to put two and two together: communicating directly with the search server is possible for more than just “checking its status the hard way”. Once you know how to connect to the port and issue commands via Telnet, you can do ALL KINDS of stuff. Anything, in fact, that the portal can do – and more.

It turns out that WebCenter Interaction – the portal, IDK, custom code, you name it – is just building complex text queries under the covers based on the actions a user performs (such as typing a search term in the search box). And you can see these queries when you run PTSpy (turn on INFO for the SEARCH component).

Taking this search string and adding the (secret?) key, you can compose an identical query:

KEY redacted (((NAMESPACE english BESTBET "matt chiste") TAG bestbet OR ((ptsearch:"matt chiste") TAG phraseQ OR ((ptsearch:matt or ptsearch:Matt or ptsearch:Matt_) order near 25 ptsearch:chiste) TAG nearQ OR ((SPELLCORRECT (ptsearch:matt) or ptsearch:Matt or ptsearch:Matt_) and SPELLCORRECT (ptsearch:chiste)) TAG andQ)) AND (((ancestors:"dd1")[0]) OR (((subtype:"PTUSER")[0]) OR ((subtype:"PTCOMMUNITY")[0]) OR ((subtype:"PTPAGE")[0])) OR (((subtype:"PTGADGET")[0]) AND ((gadgetsearchtype:"bannersearch")[0])) OR ((@type:"PTCOLLAB")[0]) OR ((@type:"PTCONTENT")[0]))) AND (((((@type:"PTPORTAL")[0]) OR ((@type:"PTCONTENTTEMPLATE")[0]) OR ((@type:"PTCONTENT")[0])) AND (((ptacl:"u200") OR (ptacl:"212") OR (ptacl:"211") OR (ptacl:"207") OR (ptacl:"202") OR (ptacl:"201") OR (ptacl:"51") OR (ptacl:"1"))[0]) AND (((ptfacl:"u200") OR (ptfacl:"212") OR (ptfacl:"211") OR (ptfacl:"207") OR (ptfacl:"202") OR (ptfacl:"201") OR (ptfacl:"51") OR (ptfacl:"1"))[0])) OR (((@type:"PTCOLLAB")[0]) AND ((istemplate:"0")[0]) AND ((collab_acl:"- \~ 1")[0]))) METRIC logtf [1]

Then, you telnet to the search server port and paste in the text. Search will respond with an XML-formatted reply (in this case, no results).
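If you’d rather script that exchange than paste into a telnet session, a raw socket works the same way. This is just a sketch of the idea – the host name and query.txt file are hypothetical, I’m assuming the default port of 15250, and I’m assuming a newline terminates the request; adjust for your environment:

import socket

SEARCH_HOST = "searchserver.example.com"  # hypothetical - your search server host
SEARCH_PORT = 15250                       # default WCI Search Server port

# The full query text (including the key) captured from PTSpy, saved to a file
with open("query.txt") as f:
    query = f.read().strip()

sock = socket.create_connection((SEARCH_HOST, SEARCH_PORT), timeout=30)
try:
    sock.sendall((query + "\n").encode("utf-8"))
    chunks = []
    try:
        # Keep reading until the server closes the connection or goes quiet
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    except socket.timeout:
        pass
finally:
    sock.close()

# The reply comes back as XML
print(b"".join(chunks).decode("utf-8", errors="replace"))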

Of course, once you have this revelation, you can see how the search text can be tuned based on the folder you’re looking in, the ACLs you want to check, or any number of other parameters.

The other startling revelation is that security is applied at the portal tier, and not the search server tier. That is, if I’m a bad guy and I know that key, and I know the query format, I can construct a query that goes against the search server to circumvent any security that has been applied to cards. Notice there are no credentials or login token passed in the query for the search service to check. Now, before you get all up in arms about this being a major security vulnerability, I offer some counter-points:

  1. Anyone with any knowledge of network sniffers or tunneling tools could easily find this key, as the traffic is not encrypted – “security through obscurity” is not valid security. However, I don’t consider this a fundamental design flaw or major security hole; obscurity is no doubt not the security the Plumtree engineers had in mind when they implemented this. Instead, the search server should reside in a DMZ, and the port shouldn’t be open within the general network anyway. The port is NEVER to be opened to the Internet (try it on my site – “telnet www.integryst.com 15250” doesn’t work).
  2. Even if someone did have this secret key, and they had network access to the search server port, and they knew the search format, and they knew how to craft the request to omit ACLs, the most they could get would be search results they didn’t have privileges for – not the documents themselves.

How can this be applied in a real-world scenario?  Stay tuned!

Configure node.ini or cluster.ini for Search Services

Sunday, April 25th, 2010

Years ago, Ross Brodbeck wrote some excellent articles on AquaLogic search clustering.  The information there is still applicable in WebCenter Interaction Search, and I won’t re-hash it here.  I definitely encourage you to give those articles a read for everything you ever needed to know about search – well, except this one more thing that I personally always seem to overlook.

In the old days, there was a node.ini file that allowed you to configure Plumtree (and AquaLogic) search memory parameters.  When clustering arrived, the node.ini file disappeared from the installer but the sample files stayed in PT_HOME\ptsearchserver\10.3.0\<servername>\config\.  These settings were supported in PT_HOME\ptsearchserver\10.3.0\cluster\cluster.ini so they could be applied to all nodes, but there are no sample files in that folder.

It turns out that you can still use a node.ini file by copying one of those sample files (such as win32-4GB.ini) to node.ini and restarting your search node – but whether you use node.ini or cluster.ini, you absolutely should do this. Out of the box, Search Server only uses 75MB of RAM, which is wildly inadequate in a production environment. You can see how overloaded Search is by going to Administration, then “Select Utility: Search Cluster Manager”. In this case, the document load was almost at 500%, and the Cluster Manager provided helpful recommendations.
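In practice, creating that file is just a copy in the config folder mentioned above (I’m assuming node.ini belongs alongside the sample files in that same folder – verify against your own install):

cd PT_HOME\ptsearchserver\10.3.0\<servername>\config
copy win32-4GB.ini node.ini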

A little tweak to node.ini can go a long way, with just these settings:

# Recommended Search Server configuration for a
# Windows machine with 4Gb or more RAM and taking
# advantage of the /3GB switch in the boot.ini file
# of Windows 2000 Advanced Server.  The settings
# below allow considerable headroom (~1Gb) to support
# multiple concurrent crawls taxonomizing documents
# into the Knowledge Directory.  Installations with
# simpler crawler configurations may be able to
# increase the amount of memory allocated to the
# index and docset caches below, in which case a
# 3:1 ratio of index:docset should be maintained.

[Environment]
RFINDEX=index
RFPORT=15250
RF_NODE_NAME=servername
RF_CLUSTER_HOME=C:\bea\alui\ptsearchserver\10.3.0\cluster
#JAVA_HOME=@JAVA_HOME@

RF_DOCUMENT_TOKEN_CACHE_SIZE=1000000
RF_SPELL_TOKEN_CACHE_SIZE=50000
RF_MAPPING_TOKEN_CACHE_SIZE=5000

# Index cache 750Mb
RF_INDEX_CACHE_BYTES=786432000
# Docset cache 250Mb
RF_DOCSET_CACHE_BYTES=262144000
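For what it’s worth, those byte values are just megabytes multiplied out, which makes it easy to scale the caches up or down while keeping the 3:1 index:docset ratio the comments recommend:

RF_INDEX_CACHE_BYTES:  750 * 1024 * 1024 = 786432000
RF_DOCSET_CACHE_BYTES: 250 * 1024 * 1024 = 262144000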