Archive for September, 2010

There’s a WCI App For That 4: CardMigrator

Tuesday, September 28th, 2010

In my last post, I talked about the need to update both a card’s location AND its CRC in the WebCenter Interaction database to migrate cards from one UNC path to another.  Today’s post is about an “App For That”: a utility I wrote last year but essentially abandoned until Fabien Sanglier’s great tip about the CRC value needing to be changed.

The app is one of those “thrown together” .NET jobs where I was more focused on updating tens of thousands of cards for a client than on building a pretty and usable UI.  As such, there isn’t a whole lot of error checking, and I’m not comfortable sharing the whole code base here – mostly because I’m embarrassed about how it was hacked together.  But if you need something like this, drop me a line.  Hopefully I can help you out, or just send you the code as long as you promise not to make fun of me :).

The code is pretty straightforward:

  1. After entering the connection strings for the API and the DB (since, as mentioned, we haven’t yet found an ALUI / WebCenter API to make the change to the CRC), you click “Load Crawlers”. 
  2. The crawler list shows up in the tree on the left, grouped by Content Source since you’re likely only updating cards based on the NTCWS, and not WWW or Collaboration Server crawlers. 
  3. Clicking on a crawler shows you all the cards associated with that crawler, as well as a bunch of useful metadata. 
  4. From there, you can do a search and replace on the UNC paths for all the cards.  The update process uses the API and Database methods to update the cards and the crawler, so the next time the crawler or Search Update jobs run, no cards are updated since everything matches up – assuming, of course, you’ve already moved the physical files to the new location! 
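Most of the tool's work happens in the search-and-replace in step 4.  The Python below is a rough illustration of a prefix swap on UNC paths.  It is not the app's actual .NET code, and `replace_unc_prefix` is a hypothetical helper name.  It matches case-insensitively on the assumption that Windows file-share paths are case-insensitive.

```python
def replace_unc_prefix(path: str, old_root: str, new_root: str) -> str:
    """Swap old_root for new_root when path lives under old_root.

    Matching ignores case (Windows share paths are case-insensitive),
    while the rest of the path keeps its original casing.
    """
    old = old_root.rstrip("\\")
    new = new_root.rstrip("\\")
    if path.lower() == old.lower():
        return new
    # Require a full path-segment match so \\server\share2 is not caught by \\server\share
    if path.lower().startswith(old.lower() + "\\"):
        return new + path[len(old):]
    return path  # not under the old root: leave it untouched


if __name__ == "__main__":
    print(replace_unc_prefix(r"\\oldfileshare\folder1\mydoc.doc",
                             r"\\oldfileshare", r"\\newfileshare"))
    # prints \\newfileshare\folder1\mydoc.doc
```

Because the match has to cover a whole path segment, a sibling share such as \\oldfileshare2 is not rewritten along with \\oldfileshare.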

Some relevant code is after the break; again, drop me a line if you’re looking for more.


Updating the Location of a Crawled Card in WebCenter Interaction

Friday, September 24th, 2010

Much has been written on Content Crawlers and Cards in Plumtree’s Knowledge Directory.  Chris Bucchere has done an excellent write-up on creating custom crawlers, and Ross Brodbeck has done the same for cards in the Knowledge Directory.  In fact, re-reading those two articles, I realize this post addresses open issues in both: how to change the location of a card, and what the Location CRC values within a card are.

In the spirit of giving credit where credit is due, today’s post is based on an excellent tip I learned recently from Fabien Sanglier, who figured out this little gem long before I did and, I believe, even posted code on his ALUI Toolbox project.

First, a word on crawlers:  basically, WCI’s Automation Server just calls several methods in a crawler to perform the following (I’m heavily paraphrasing here):

  1. Open the root path specified in the crawler’s SCI Page and query for “containers” (a.k.a., folders).
  2. Query that container for all “documents” (a.k.a., cards, which don’t necessarily have to be files).
  3. Recursively iterate through each container and query for the documents within each.
  4. For each document found, query for document signature and document fetch URL.
  5. If the document signature or path has changed, flag the card as changed and refresh it (which could mean its metadata, file content, or security).
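Steps 4 and 5 come down to comparing each crawled document with the card the portal already has for it.  The Python below is a heavily simplified model of that comparison.  It assumes existing cards are looked up by their location.  `Card` and `classify` are hypothetical names, not part of the WCI API, and the real Automation Server logic is certainly more involved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Card:
    """Hypothetical stand-in for a card already in the Knowledge Directory."""
    location: str
    signature: str  # for an NT file crawler, effectively the last-modified date


def classify(existing: dict[str, Card], crawled: dict[str, str]) -> dict[str, list[str]]:
    """Compare one crawl pass (location -> signature) against existing cards.

    A document at a location with no existing card is "new"; a card whose
    location was not seen in this crawl is "delete"; otherwise the
    signature decides between "refresh" and "unchanged".
    """
    result: dict[str, list[str]] = {"unchanged": [], "refresh": [], "new": [], "delete": []}
    for location, signature in crawled.items():
        card = existing.get(location)
        if card is None:
            result["new"].append(location)
        elif card.signature != signature:
            result["refresh"].append(location)
        else:
            result["unchanged"].append(location)
    for location in existing:
        if location not in crawled:
            result["delete"].append(location)
    return result
```

Suppose you feed it a crawl in which only the share name has changed.  The old card then lands under "delete" and the identical file under "new", which is exactly the delete-and-recreate behavior that breaks links to moved documents.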

Later, the Document Refresh and Search Update jobs also use that crawler code to keep track of whether documents have changed in the source repository (by checking the document signature) and whether they have moved.  If the signature hasn’t changed, the card remains untouched.

Now, let’s say you need to change the path of an NT Crawler because you’re moving those documents elsewhere.  Normally, you’d just move the files and change your crawler’s root path.  The problem with this approach is that the crawler won’t be able to recognize these files as the same ones that are in the Knowledge Directory, because the path has changed.  Consequently, all cards will be deleted and recreated.  This may not be a problem, but if your Content Managers are like any other Content Manager since the Plumtree days, there will be a lot of portlets that link to these documents in their content.  These links will all be broken, because new cards mean new Object IDs, which are part of those URLs (even the “friendly” ones).

The (partial) solution?  Update the paths for the crawlers AND cards through the API, so that the next time the crawler runs, the portal isn’t aware of any changes and doesn’t mess with any of the already-imported cards because the signatures match up.

Here’s the rub, though: the Automation Server not only checks whether a document’s SIGNATURE has changed (in an NT File Crawler, for example, the signature is just the “last-modified” date), it also checks whether the document’s PATH has changed.  In other words, if a card has an internal path of \\oldfileshare\folder1\mydoc.doc, and you programmatically change the crawler AND the cards to use \\newfileshare\folder1\mydoc.doc, the cards will STILL get wiped out and crawled in as new.  This is because the portal stores a CRC of the document path, so that if the path changes, it knows it’s looking at a different document.
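The portal's XPCRC class isn't documented, so the Python below only illustrates the idea.  It computes a 64-bit CRC over the location string and splits the result into two signed 32-bit halves to fit the locationcrc_a and locationcrc_b columns.  Three details are assumptions: the CRC-64/ECMA-182 polynomial, the UTF-8 encoding of the path, and which half counts as "A".  The values it produces will therefore not match what the portal stores.

```python
from __future__ import annotations

POLY = 0x42F0E1EBA9EA3693  # CRC-64/ECMA-182 polynomial: an assumption, not XPCRC's documented one
MASK64 = (1 << 64) - 1


def crc64(data: bytes) -> int:
    """Bitwise, non-reflected CRC-64 with an initial value of 0 and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def to_int32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit int (like a C# int)."""
    return value - (1 << 32) if value & 0x80000000 else value


def location_crc(location: str) -> tuple[int, int]:
    """Hypothetical (A, B) split: high 32 bits first, then low 32 bits."""
    crc = crc64(location.encode("utf-8"))
    return to_int32(crc >> 32), to_int32(crc & 0xFFFFFFFF)
```

The CRC covers the whole path, so changing \\oldfileshare to \\newfileshare changes the stored values even though the file itself is byte-for-byte identical.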

Unfortunately, there doesn’t seem to be a way to update this CRC value through the API, so you need to use a direct DB update to make the change.  Below is the code used to generate the CRC and the table where it needs to be updated.  In my next post, I’ll include a more comprehensive listing.

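// Requires: using System.Data; using System.Data.Common; (oConn is an open DbConnection)
// Generate the CRC64 of the new card location once and reuse both halves.
var crc = XPCRC.GenerateCRC64(strCardLocation);
int crca = crc.m_crcA;
int crcb = crc.m_crcB;

DbCommand updateComm = oConn.CreateCommand();
updateComm.CommandType = CommandType.Text;
updateComm.CommandText = "update ptinternalcardinfo set locationcrc_a = " + crca +
    ", locationcrc_b = " + crcb + " where cardid = " + card.GetObjectID();
updateComm.ExecuteNonQuery(); // actually write the new CRC values to the database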
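The C# snippet builds its SQL by string concatenation.  That works because every value is an integer, but a parameterized query is the safer habit.  The Python sketch below shows one using the standard-library sqlite3 module.  It runs against an in-memory stand-in table that has only the three columns involved, not the real portal schema, and `update_location_crc` is a hypothetical helper.  The `?` placeholder is sqlite3's style; the driver for your portal database may expect a different one.

```python
import sqlite3


def update_location_crc(conn: sqlite3.Connection, card_id: int, crc_a: int, crc_b: int) -> int:
    """Write new location CRC halves for one card; returns the number of rows updated."""
    # Values are bound by the driver instead of being concatenated into the SQL text.
    cur = conn.execute(
        "UPDATE ptinternalcardinfo SET locationcrc_a = ?, locationcrc_b = ? WHERE cardid = ?",
        (crc_a, crc_b, card_id),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # In-memory stand-in with only the columns this update touches (not the real schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ptinternalcardinfo (cardid INTEGER, locationcrc_a INTEGER, locationcrc_b INTEGER)")
    conn.execute("INSERT INTO ptinternalcardinfo VALUES (101, 0, 0)")
    print(update_location_crc(conn, 101, -123456, 789012))  # prints 1
```

If the return value is 0, no card matched that ID, which is worth logging when you update tens of thousands of cards in one run.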

Bug Blog 8: WebCenter Friendly URLs break Plumtree Excel Portlets

Monday, September 20th, 2010

I’ve been a pretty big fan of Friendly URLs in WebCenter Interaction ever since the product was AquaLogic User Interaction 6.5.  But the Law of Unintended Consequences has a way of striking in unique ways.  Today’s bug: Friendly URLs cause problems with the old Plumtree Excel Portlets.  Basically, if you have an Excel portlet on any page other than a community’s home page, pagination breaks, because clicking any link bounces the browser back to the community’s home page.

This seems to have to do with the way the return URI is sent to the remote tier, and how the GDK interprets the URL, interacting with the fact that there are no longer query strings in the source URL (that’s right – this ancient Plumtree code uses ASP / VB and the long-dead Gadget Development Kit).  I won’t bore you with all the analysis that went into this, but the code fix is pretty straightforward.

If you’re still using Excel portlets in WCI 10gR3, you need to make the following fix in C:\Program Files\plumtree\Excel Portlet Framework\gadget\setGadgetDisplay.asp.  Change:

Call Response.Redirect(strReturnURI & "#" & strID)

…to this:

'11/3/2009 mchiste: because the 10gR3/gdk/friendly urls screw up the strReturnURI, we use this URL instead to go back to the page we were just on
Dim strReturnURI2
strReturnURI2 = Request.ServerVariables("HTTP_REFERER")
If Len(strReturnURI2 & "") = 0 Then
    ' No referrer available: fall back to the GDK-supplied return URI
    Call Response.Redirect(strReturnURI & "#" & strID)
Else
    ' Strip any existing "#anchor" from the referrer before appending ours
    If InStr(1, strReturnURI2, "#") <> 0 Then
        strReturnURI2 = Mid(strReturnURI2, 1, InStr(1, strReturnURI2, "#") - 1)
    End If
    Call Response.Redirect(strReturnURI2 & "#" & strID)
End If

The code change basically tells the browser to return to the page it came from, rather than “trusting” the portal/gdk.
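The redirect decision is easier to follow outside of VBScript, so here it is sketched in Python.  The sketch assumes the page URL arrives in a Referer header that may be missing or empty.  `redirect_target` is a hypothetical name, and `urllib.parse.urldefrag` stands in for the InStr/Mid fragment stripping.

```python
from __future__ import annotations

from urllib.parse import urldefrag


def redirect_target(referer: str | None, return_uri: str, anchor_id: str) -> str:
    """Prefer the referring page; fall back to the GDK-supplied return URI."""
    if not referer:
        # No Referer header (e.g. stripped by a proxy): keep the original behavior
        return f"{return_uri}#{anchor_id}"
    # Drop any existing fragment so we don't end up with page#old#new
    page, _fragment = urldefrag(referer)
    return f"{page}#{anchor_id}"
```

The fallback branch matters because browsers and proxies don't always send a Referer.  Without it, the portlet would redirect to an empty URL.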

Cool Tools 10: Base64 Decoder

Thursday, September 16th, 2010

Generally, my Cool Tools articles feature tools that are novel, unique, or otherwise helpful when managing WebCenter Interaction portals, or other applications that can help augment (or – dare I say, replace?) it.  Today’s Cool Tool is in a category where apps are a dime a dozen.  So, let’s call, uh,’s Base64 Decoder a Cool Tool:

Why highlight a dime-a-dozen free online tool?  Because today I’m going to explain a bit about how Basic Authentication works between the portal and the remote tier, and show you a trick to answer a question you may have run into while administering your portal: what password has been configured for the “authenticationid” in the portal for ALUI Publisher Remote Server (or Collaboration Server, for that matter)?  In the process (after the break), hopefully you’ll get a little insight into why it’s not all that secure in and of itself.
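The core idea fits in a few lines.  An HTTP Basic Authorization header is just the word Basic followed by the Base64 encoding of username:password.  The Python below decodes such a header.  `decode_basic_auth` and the sample credentials are made up for illustration, and the sketch assumes the credentials were encoded as UTF-8.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple:
    """Split an HTTP Basic Authorization header value into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not a Basic auth header: {scheme!r}")
    decoded = base64.b64decode(token.strip()).decode("utf-8")
    # Only the first colon separates user from password; passwords may contain ':'
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    # Build a sample header for made-up credentials, then decode it again.
    token = base64.b64encode(b"authuser:s3cret").decode("ascii")
    print(decode_basic_auth("Basic " + token))  # prints ('authuser', 's3cret')
```

Base64 is an encoding, not encryption, so anyone who can see the header can recover the password this way.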

Bug Blog 7: ALUI Publisher Port already in use?

Sunday, September 12th, 2010

This (configuration) bug has been around in Publisher for a while, and I’ve always fixed it the “wrong” way. Occasionally, you may have seen the following error crop up in %PT_HOME%\ptcs\6.x\logs\service.log when starting ALI Publisher, and Publisher fails to start:

INFO | jvm 1 | 2010/07/20 16:59:07 | 16:59:07,450 ERROR Starting failed jboss:service=Naming
INFO | jvm 1 | 2010/07/20 16:59:07 | java.rmi.server.ExportException: Port already in use: 1098; nested exception is:
INFO | jvm 1 | 2010/07/20 16:59:07 | Address already in use: JVM_Bind

Why? Well, Publisher uses some internal JNDI services to communicate between components (I think; honestly I have no idea what this port is actually for), and if it can’t grab the port at startup, it can’t start up. Wonderful. This port has always been specified in %PT_HOME%\ptcs\6.x\settings\config\container.conf:


… and I’ve always fixed this problem by changing that port number and restarting (as I write this, the WebCenter Interaction portal that runs this site has a value of 1097 in there, indicating that at some point long ago I had this problem and fixed it this way).

Recently, though, I got a great explanation from Naman Shah at PPC: this has to do with Ephemeral Ports in Windows. The description in that article says it all:

What is not immediately evident is that when a connection is established that the client side of the connection uses a port number. Unless a client program explicitly requests a specific port number, the port number used is an ephemeral port number. Ephemeral ports are temporary ports assigned by a machine’s IP stack, and are assigned from a designated range of ports for this purpose.

In other words, Publisher can’t start because Windows is already using that port for one reason or another. So, now I know the “right” way to fix this issue: rather than playing whack-a-mole changing Publisher’s port every time the problem occurs, you should simply tell Windows not to use that port.
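Before you touch the registry, it can help to confirm that the port really is taken.  The Python sketch below tries to bind a TCP socket to the port, which roughly approximates what the JVM does at startup.  Binding to all interfaces (0.0.0.0) is an assumption about how JBoss binds its naming service, and `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # "Address already in use": the same condition as JVM_Bind in the log
            return False
    return True
```

Run `port_is_free(1098)` while Publisher is stopped.  It tells you whether something else, such as a connection that Windows gave that port as its ephemeral port, is currently holding it.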

How? In the registry, navigate to HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ and add a line to the Multi-String value called ReservedPorts (creating the value if it doesn’t exist). Add 1098-1098 on its own line, and Windows will stop using that port in the future, allowing Publisher to keep doin’ what it’s doin’.
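Each line of ReservedPorts holds one low-high range, such as 1098-1098.  To sanity-check a set of entries, the small Python sketch below parses them and tests whether a port falls inside any range.  It works on plain strings, for example values copied out of regedit, and raises ValueError for any non-blank line that isn't in low-high form.  `parse_reserved_ports` and `is_reserved` are hypothetical names.

```python
from __future__ import annotations


def parse_reserved_ports(entries: list[str]) -> list[tuple[int, int]]:
    """Parse ReservedPorts lines like "1098-1098" into (low, high) pairs."""
    ranges = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank lines left behind when editing the value
        low, _, high = entry.partition("-")
        ranges.append((int(low), int(high)))  # int("") raises ValueError for malformed lines
    return ranges


def is_reserved(port: int, entries: list[str]) -> bool:
    """True if port falls inside any reserved low-high range."""
    return any(low <= port <= high for low, high in parse_reserved_ports(entries))
```

For example, `is_reserved(1098, ["1098-1098"])` is True while `is_reserved(1099, ["1098-1098"])` is False, which is a quick way to double-check a typo before rebooting.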

Wall of Shame Rant: Comment Spammers

Monday, September 6th, 2010

I know I haven’t posted in a while, but – wow – those comments keep coming in!  Oh wait, no, they’re all from spammers who clearly have nothing to do but waste my time deleting them all.  These leeches should all … well, let’s keep it clean for the kiddies.  Spam is a fact of life, and it’s only going to get worse.

Fortunately, I was able to get a little bit of satisfaction recently by NOT approving the following post:

Dear Russian Mafia, I didn’t approve this asshole’s comment.  You know what to do. 

For the rest of you all, I’ve turned on Captchas for commenting so at least the automated spambots will be kept out.  Sorry for the additional 10 seconds when posting comments here!