Archive for the ‘Coding Tricks’ Category

Updating the Location of a Crawled Card in WebCenter Interaction

Friday, September 24th, 2010

Much has been written on Content Crawlers and Cards in Plumtree’s Knowledge Directory. Chris Bucchere has done an excellent writeup on creating custom crawlers, and Ross Brodbeck has done the same for cards in the Knowledge Directory.  In fact, as I re-read those two articles, I realize this post addresses open issues in both articles – how to change the location of a card, and what the Location CRC values are within a card.

In the spirit of giving credit where credit is due, today’s post is based on an excellent tip I learned recently from Fabien Sanglier, who figured out this little gem long before I did and, I believe, had even posted code on his ALUI Toolbox project.

First, a word on crawlers:  basically, WCI’s Automation Server just calls several methods in a crawler to perform the following (I’m heavily paraphrasing here):

  1. Open the root path specified in the crawler’s SCI Page and query for “containers” (a.k.a., folders).
  2. Query that container for all “documents” (a.k.a., cards, which don’t necessarily have to be files).
  3. Recursively iterate through each container and query for the documents within each.
  4. For each document found, query for document signature and document fetch URL.
  5. If the document signature or path has changed, flag the card as changed and refresh it (which could be metadata, file content, or security).

Later, the Document Refresh and Search Update jobs will also use that crawler code to keep track of whether documents have changed in the source repository (by checking the document signature), and whether the document has moved.  If the signature hasn’t changed, the card remains untouched.

Now, let’s say you need to change the path of an NT Crawler because you’re moving those documents elsewhere.  Normally, you’d just move the files and change your crawler’s root path.  The problem with this approach is that the crawler won’t be able to recognize these files as the same ones that are in the Knowledge Directory, because the path has changed.  Consequently, all cards will be deleted and recreated.  This may not be a problem, but if your Content Managers are like any other Content Manager since the Plumtree days, there will be a lot of portlets that link to these documents in their content.  These links will all be broken, because new cards mean new Object IDs, which are part of those URLs (even the “friendly” ones).

The (partial) solution?  Update the paths for the crawlers AND cards through the API, so that the next time the crawler runs, the portal isn’t aware of any changes and doesn’t mess with any of the already-imported cards because the signatures match up.

Here’s the rub, though: not only does the Automation Server check to see if a document’s SIGNATURE has changed (in an NT File Crawler, for example, the signature is just the “last-modified” date), but it also checks to make sure the document’s PATH hasn’t changed.  In other words, if a card has an internal path of \\oldfileshare\folder1\mydoc.doc, and you programmatically change the crawler AND the cards to use \\newfileshare\folder1\mydoc.doc, the cards will STILL get wiped out and crawled in as new.  This is because the portal maintains a CRC of the old document path, so that if the path changes, it knows it’s looking at a different document.
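To make those two checks concrete, here’s a rough sketch in Java of the decision as I understand it from the behavior described above.  Everything in it is hypothetical: the class and method names are mine, the signatures are just strings, and java.util.zip.CRC32 is only a stand-in for the portal’s own 64-bit location CRC (the real values will be different).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical model of how a crawled document is matched to an existing card.
// CRC32 stands in for the portal's 64-bit location CRC; real values will differ.
class CardMatchSketch {

    enum Outcome { UNTOUCHED, REFRESHED, RECREATED }

    // Checksum of the document path, analogous to the stored location CRC.
    static long locationCrc(String path) {
        CRC32 crc = new CRC32();
        crc.update(path.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Compare what the card stored at import time with what the crawler sees now.
    static Outcome compare(long storedCrc, String storedSignature,
                           String crawledPath, String crawledSignature) {
        if (storedCrc != locationCrc(crawledPath)) {
            return Outcome.RECREATED;   // path differs: treated as a different document
        }
        if (!storedSignature.equals(crawledSignature)) {
            return Outcome.REFRESHED;   // same document, but something about it changed
        }
        return Outcome.UNTOUCHED;
    }

    public static void main(String[] args) {
        String oldPath = "\\\\oldfileshare\\folder1\\mydoc.doc";
        String newPath = "\\\\newfileshare\\folder1\\mydoc.doc";
        long stored = locationCrc(oldPath);
        System.out.println(compare(stored, "2010-09-01", oldPath, "2010-09-01"));
        System.out.println(compare(stored, "2010-09-01", newPath, "2010-09-01"));
    }
}
```

The second call in main is the moved-file case: the signature is identical, but the stored CRC no longer matches the path the crawler reports, so the card is treated as new.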

Unfortunately, there doesn’t seem to be a way to update this CRC value through the API, so you need to use a direct DB update to make the change.  Below is the code used to generate the CRC and the table where it needs to be updated.  In my next post, I’ll include a more comprehensive listing.


// Generate the CRC once; its two halves are stored in separate columns.
var crc = XPCRC.GenerateCRC64(strCardLocation);
int crca = crc.m_crcA;
int crcb = crc.m_crcB;

// Both CRC halves and the card ID are integers, so plain concatenation is safe here.
DbCommand updateComm = oConn.CreateCommand();
updateComm.CommandType = CommandType.Text;
updateComm.CommandText = "update ptinternalcardinfo set locationcrc_a = " + crca + ", locationcrc_b = " + crcb + " where cardid = " + card.GetObjectID();
updateComm.ExecuteNonQuery();

Treat Collaboration Server as a REST-based API

Thursday, August 5th, 2010

The IDK methods for Collaboration Server are terribly sparse – you can’t get calendar events, file sizes, or a whole bunch of other critical data that you may want if you were to actually embark on a mission to write a better UI for Collab (trust me, I have).  Sure, you could try to use the woefully undocumented Collab API – I’ve shown you how to deploy the portal API in the past – but that’s a challenge in and of itself.

Instead, let’s look at an alternate approach:  use the Collab Server as a sort of REST API.  It’s not really, but the basic idea is that you use URLs in your code to directly call functionality in Collaboration Server to do certain tasks.  For example, say you want to add a Collaboration project to a page programmatically; there is no mechanism to do this through the IDK, and we have no idea how to use the API, but using a header tool, we find that through Project Explorer, it works with a simple URL: /collab/do/project/selector/add?commPage=true&projID=COLLABID.

So, it turns out we can do the same thing programmatically, by using Java’s network libraries to call that URL directly (setting the proper authenticationid).  The code after the jump shows an example of how to do this; we use this approach in Integryst’s Automater, which allows you to script a bunch of actions at a time (what good is automatically creating a collab project if you can’t add it to a community page you just created!?). 
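As a minimal sketch of the idea (not the Automater code), here’s what calling that URL from Java might look like with nothing but java.net.  The class name, the base URL, and the project ID are placeholders, and I’m assuming the session can be passed along as a Cookie header the caller has already obtained; that’s a guess at the mechanism, so check what your own environment actually needs.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch only: calls a Collaboration Server URL directly, as a browser would.
class CollabUrlSketch {

    // Builds the "add project to community page" URL captured with the header tool.
    // collabBase is a placeholder for wherever your Collab server lives.
    static String addProjectUrl(String collabBase, int projectId) {
        return collabBase + "/collab/do/project/selector/add?commPage=true&projID=" + projectId;
    }

    // Sends the request and returns the HTTP status code.
    // ASSUMPTION: the session is carried in a Cookie header supplied by the caller.
    static int call(String url, String cookieHeader) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Cookie", cookieHeader);
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(30_000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = addProjectUrl("http://collab.example.com", 42);
        System.out.println("Would call: " + url);
        // call(url, "...your session cookie...");  // only makes sense against a real server
    }
}
```

Keeping the URL building separate from the HTTP call makes it easy to script a batch of actions: build the list of URLs first, eyeball them, then fire them off.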

Tweak away!

(more…)

Deploying WCI API applications without the portal installed

Monday, July 5th, 2010

Many of you have developed WebCenter Interaction portlets (and ALUI before that, and Plumtree before that), and you likely know the difference between the WebCenter Interaction Development Kit (IDK) and the Portal Server API. The difference is pretty straightforward: the IDK provides a relatively simple way to get started and provides Portal Remote Calls (PRC) to manipulate various objects. But it’s not all that powerful, and there are a LOT of things you simply can’t do with the PRC. That’s where the Server API comes in: it can pretty much do anything the portal itself can, but it is significantly more involved to set up. This has to do with how they both work:

  • The IDK / PRC makes remote SOAP calls to the WS API server in the portal stack. The WS API server (and, in some cases, Collaboration Server or Publisher Server) are the services that actually do the heavy lifting, like manipulating the database.
  • The API, on the other hand, is essentially a fully working portal loaded using either .jars (Java) or .DLLs (.NET), so it has full access to do anything the portal can, but also needs all the portal configuration files and secondary libraries (such as for the search server) installed on the machine.

The easiest way to use the API on a remote server is to install a portal component such as the portal itself, or Automation Server. But in some cases, this isn’t desirable; you just want your application running on the remote tier without actually installing the portal. In this case, you will need to do a pseudo-install by copying all the correct files to your portlet server, updating the configuration properly, and setting various environment variables so the portal code can find the files it needs. I should also point out that there is no harm in using Java files on the remote tier even if you’re using a .NET portal; the code is the same, and fundamentally serves the same purpose – handling updates to the database.

So, here’s a quick guide on how to set up a portlet on a remote server that doesn’t have the portal installed:

  1. Make sure your application has the same .jar files (or .dlls, if it’s a .NET app) as the portal is using in your environment: you want to make sure that your remote libraries are exactly the same as the portal libraries so they do the same thing when run.
  2. Copy the ptportal, settings, and common directories from %PT_HOME% on one of your portal servers to a folder on your remote server – in my case, I’m using C:\integryst\, so you should have the following subfolders:
    1. c:\integryst\ptportal
    2. c:\integryst\settings
    3. c:\integryst\common
  3. Update c:\integryst\settings\configuration.xml: replace all instances of c:\bea\alui\ptportal\10.3.0\webapp with c:\integryst (or whatever source/destination folders you’re using). Also, replace the original machine name with the remote server’s machine name.
  4. Set the following environment variables:
    1. ICU_PATH=C:\integryst\common\icu\2.6\bin\native
    2. INXIGHT_PATH=C:\integryst\common\inxight\3.7.7\bin\native
    3. OUTSIDEIN_PATH=C:\integryst\common\outsidein\8.1.9\bin\native
    4. PORTALLIB_PATH=C:\integryst\ptportal\10.3.0\bin\native
    5. PORTAL_HOME=C:\integryst\ptportal\10.3.0
    6. PTHREADS_PATH=C:\integryst\common\pthreads\2002.11.04\bin\native
    7. PT_HOME=C:\integryst
    8. PATH=C:\integryst\ptportal\10.3.0\bin\native;C:\integryst\common\pthreads\2002.11.04\bin\native;C:\integryst\common\outsidein\8.1.9\bin\native;C:\integryst\common\inxight\3.7.7\bin\native;C:\integryst\common\icu\2.6\bin\native;
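
Since one missing variable is enough to make the portal libraries fail to load, a quick preflight check can save some head-scratching.  This is just a sketch of my own (the class name is made up, and it only checks that the variables from the list above are set, not that the folders behind them exist):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Preflight check: reports which of the variables from the list above are unset.
class PortalEnvCheck {

    static final List<String> REQUIRED = Arrays.asList(
            "ICU_PATH", "INXIGHT_PATH", "OUTSIDEIN_PATH", "PORTALLIB_PATH",
            "PORTAL_HOME", "PTHREADS_PATH", "PT_HOME");

    // Returns the required variable names that are missing or blank in env.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All portal environment variables are set.");
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```

Call PortalEnvCheck.missing(System.getenv()) at the top of your application, before any portal classes are touched, and bail out with a readable message if the list isn’t empty.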

… and that should do it! Now, when you run your application, if you have PTSpy installed (or even if you don’t), you should see messages in there when the app starts up that look just like the portal startup messages, because that’s essentially what you’re doing in your code – loading the entire portal and leveraging any APIs that are available to the portal itself!

Bug Blog 6: Fix Broken File Downloads in 10gR3 (Part Trois)

Sunday, June 20th, 2010

*sigh*.  You’ve upgraded from ALUI to WCI 10gR3, and your Knowledge Directory links got all screwed up, didn’t they?  HTML files now throw an open/save dialog, some documents don’t open, you can’t copy links by just right-clicking them and choosing “Copy Shortcut”, and IE throws a popup blocker when you click a link in the Knowledge Directory, doesn’t it? 

You got the “To help protect your security, Internet Explorer blocked this site from downloading files to your computer” blues, huh?

I’ve tried creating a Plumtree Filter, and that worked pretty well, but not quite enough.

I’ve tried tweaking the Portal’s Javascript files, and THAT worked pretty well, but again, not quite well enough.

So, today, my friends, third time’s a charm:  rather than trying to fix this on the server side, we’re going to knock out this issue once and for all on the client side.  Check out those previous blog entries for a more detailed description of the problem, but basically, it’s because the portal uses a crazy convoluted way of opening documents via JavaScript.

What we’re going to do today is stick some javascript in the footer of the page.  This JavaScript is going to simply find all those back-asswards links and replace them with NORMAL <a href=xxx target=_new> links.  If you add it to the footer of your site (specifically, the footer used for the KD, but the code is smart enough not to do anything if it’s on all pages), it should be able to take care of the rest.  Just make sure you add it to the Presentation Template and not the Content Item, because you-know-who only knows what a mess the out-of-the-box rich text editor is with javascript and adaptive tags.

The code after the break should be self-explanatory, but as always, if you’ve got a question or comment, feel free to post it here.  Also note that this code only fixes the download links; it doesn’t kill that “Open/Save” dialog box for things like HTML files.  For that you’ll still need the Plumtree Filter. (more…)

Use Host Files for better WCI security, portability and disaster recovery

Tuesday, June 8th, 2010

When configuring a WebCenter Interaction portal, it’s highly recommended to use host files on your machines to provide aliases for the various services.

For example, instead of referencing Publisher’s Remote Server as http://PORTALPROD6.site.org:7087/ptcs/, edit the hosts file at C:\Windows\System32\drivers\etc\hosts, and add a line like this (IP address first, then the alias):
10.5.38.12 wci-publisher #IP Address for Publisher in this environment
… then set your Remote Server to http://wci-publisher:7087/ptcs/.

I’m always surprised how many times the knee-jerk reaction to this suggestion is that this is a poor “hack”, or something worse like this:
“Host files??? Host files on local servers need to be avoid and you should use DNS in AD for the Portal servers. Host files, again, are an antiquated and unmanageable configuration in this day and age and, in my opinion, should only be used when testing configurations—not for Production systems. I haven’t seen host files used locally on servers in a decade…is that how you are configuring this portal system? If so, I would highly recommend you try to use the AD DNS instead.”

Yes, that’s an actual response from an IT guy who prefers telling others what idiots they are rather than actually listening to WHY this approach is being used.  In all fairness, most knee-jerk reactions are based in the reality that host files on many servers are more difficult to maintain than DNS entries on a single server.  But hopefully, if you’re reading this blog, you’ve got an open mind, and will agree with this approach once you see the list of benefits below.

Benefits of using host files in your portal environments:

  1. Security.  When you access a service through the portal’s gateway, the name of the remote server shows up in the URL: http://www.integryst.com/site/integryst.i/gateway/PTARGS_0_200_316_204_208_43/http%3B/wci-pubedit%3B8463/publishereditor/action?action=getHeader.  For most people, this isn’t a huge problem, but allowing the name of the servers to be published in this way can be perceived as a security risk.  By using host files, you’re essentially creating an alias that hides the actual name of the server.
  2. Service Mobility.  Take the NT Crawler Web Service, for example.  When you crawl documents into the portal, the name of the server is included in the document open URL.  Now suppose the NTCWS is giving you all sorts of grief and you decide to move it to another server.  If you use host files, you can just install the NTCWS somewhere else and change the IP address that the wci-ntcws alias points to.  This way, the portal has no idea the service is being provided by another physical system.  If you used a machine name, all documents would get crawled in as new the next time you ran the crawler, because the card locations will have changed.
  3. Maintainability.  This one’s a pretty weak argument, but is based on the fact that most of the time, the Portal Admin team doesn’t have access to create DNS entries and has to submit service requests to get that done.  By bringing “DNS-type services” into host files, the portal team can more easily maintain the environment by shifting around services without having to submit “all that paperwork” for a DNS entry (your mileage may vary with this argument).
  4. Environment Migration.  Here’s the clincher!  Most of us have a production and a development environment, and occasionally a test environment as well.  Normally, code is developed in dev and pushed to test, then to prod, but content is created in prod, and periodically migrated back to test and dev, so those environments are reasonably in synch for testing.  This content migration is typically done by back-filling the entire production database (and migrating files in the document repository, etc.).  The problem is, all kinds of URLs (Remote Servers, Search, Automation server names, etc.) are stored in this database, so if you’re using server names in these URLs, your dev/test environments will now have Remote Servers that point to the production machines, and you need to go through and update all of these URLs to get your dev environment working again!  If, however, you use host files, then you can skip this painful step:  your Publisher server URL (http://wci-publisher:7087/ptcs/) can be the same in both environments, but the host files in dev point to different machines than the ones in production.  Cool, huh?
  5. Disaster Recovery.  This is essentially the same as the “Environment Migration” benefit:  When you have a replicated off-site Disaster Recovery environment, by definition your two databases are kept in synch in real-time (or possibly on a daily schedule of some sort).  If a disaster occurs and you lose your primary environment, you’re going to want that DR site up as soon as possible, and not have to go through changing all those URLs to get the new environment running with new machine names.  Of course, this argument is slightly weaker than “Environment Migration” (where your dev, test, and prod environments typically share the same DNS server):  since the DR site will likely have its own DNS server, you could conceivably just use different DNS entries at the two different sites and all will work fine.
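
To illustrate the “same URL, different machine” point from the Environment Migration and Disaster Recovery items, here’s a toy Java sketch that parses two hosts files and looks up the same alias in each.  It’s purely an illustration: the class name is mine, the dev IP address is made up, and in real life the operating system does this lookup for you.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: the same alias resolves differently per environment.
class HostsAliasSketch {

    // Parses hosts-file text ("IP name [name...] # comment") into name -> IP.
    static Map<String, String> parse(String hostsText) {
        Map<String, String> byName = new HashMap<>();
        for (String line : hostsText.split("\\r?\\n")) {
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (entry.isEmpty()) {
                continue;   // blank line or comment-only line
            }
            String[] parts = entry.split("\\s+");
            for (int i = 1; i < parts.length; i++) {
                byName.put(parts[i], parts[0]);   // every alias maps to the leading IP
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        String prodHosts = "10.5.38.12 wci-publisher # Publisher in production";
        String devHosts = "10.9.1.7 wci-publisher # Publisher in dev (made-up address)";
        String remoteServerUrl = "http://wci-publisher:7087/ptcs/";
        System.out.println(remoteServerUrl + " in prod -> " + parse(prodHosts).get("wci-publisher"));
        System.out.println(remoteServerUrl + " in dev  -> " + parse(devHosts).get("wci-publisher"));
    }
}
```

The Remote Server URL stored in the database never changes; only the hosts file on each machine decides where it lands.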

So that’s it – hopefully you’re convinced that host files are the way to go for configuring ALUI / WCI portals; if so, stay tuned for helpful tips on how to set this up for various servers.  While Remote Servers are a no-brainer, configuring things like Automation Server and Search can be a little trickier.

Bug Blog 4: Fix Broken File Downloads in 10gR3 (Part Deux)

Monday, May 3rd, 2010

Last week I provided portal filter code to fix some elements of opening documents in WebCenter Interaction’s Knowledge Directory (and Search and Snapshot Queries).  This week is another follow-up on that theme, and I’ll provide another piece of code to continue cleaning up the mess that is the Knowledge Directory Downloading Clusterf*ck (in IE at least).

Ever gotten this little present after doing an upgrade to 10gR3, when trying to open a document in the Knowledge Directory using IE7 or IE8?

 

Ah, the old “To help protect your security, Internet Explorer blocked this site from downloading files to your computer.”  What’s even more ridiculous, if you “accept” the download, it STILL doesn’t work because IE tries to reload the page, and WCI is very stupid about how it opens this popup window (see how that address bar shows “about:blank”?).  It seems this only happens in IE, and only if you have adaptive layouts disabled with “open documents in new window” enabled, so it doesn’t affect everyone.  But I die a little on the inside every time I see that stupid thing: yes, my friends, I am an empty, hollow shell of a consultant.

Fortunately, I decided to seek redemption for WCI and myself by proxy, and checked out the HTML source for a typical file open link.  It’s kinda hilarious:


<a href="#" onclick="var currentWin = PTCommonOpener.openInNewWindow('', 'Opener_18_1322446', 800, 600, true); currentWin.location= 'http://server/portal/server.pt/document/1322446/slide_1'; return false;">Doc Name</a>

Whhaaaaaaa…!?  Sure, I know there are dozens of ways you can open a document in a popup window, and many have their advantages (for example, by using JavaScript you can control the size and layout of the window), but I’d be hard-pressed to come up with a worse way to open a document.  I’ve always hated the fact that you can’t just hover over the link to see the URL, and can’t right-click and copy the shortcut to mail to someone later. (Tip: use adaptive layouts and these problems go away!)  Or they could have at least just done a window.open call in that onclick event.  OK, so they wanted to use a wrapper method – that’s fine – but PTCommonOpener.openInNewWindow actually takes a URL parameter, so they could have just passed the URL into that function, rather than creating a new, EMPTY document and then trying to use JavaScript to load the document into that window.  At the very least they could have put the document URL in the “href” attribute so it’d work without JavaScript (and be 508-compliant) and users could copy the links with a simple right-click.
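
Just to show what I mean by a sane link, here’s a small Java sketch that takes markup like the example above and rewrites it into a plain anchor with the document URL in the href.  To be clear, this is not the fix from this post; it’s an illustration of the target markup, and the class name and regular expression are my own, written against that one example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: turn the portal's onclick-driven link into a plain anchor.
class OpenerLinkSketch {

    // Matches the opening <a> tag and captures the URL assigned to currentWin.location.
    private static final Pattern OPENER_LINK = Pattern.compile(
            "<a href=\"#\" onclick=\"[^\"]*?currentWin\\.location\\s*=\\s*'([^']+)'[^\"]*\">");

    static String rewrite(String html) {
        Matcher m = OPENER_LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = "<a href=\"" + m.group(1) + "\" target=\"_new\">";
            // quoteReplacement keeps any '$' or '\' in the URL from being misread
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String before = "<a href=\"#\" onclick=\"var currentWin = PTCommonOpener.openInNewWindow("
                + "'', 'Opener_18_1322446', 800, 600, true); currentWin.location= "
                + "'http://server/portal/server.pt/document/1322446/slide_1'; return false;\">Doc Name</a>";
        System.out.println(rewrite(before));
    }
}
```

With a link like that, hovering shows the URL, right-click copy works, and nothing depends on a popup being opened from script.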

OK enough rambling.  On to a solution: (more…)

Bug Blog 3: Fix Broken File Downloads in 10gR3

Thursday, April 29th, 2010

Wow, was WebCenter Interaction 10gR3 a step backwards in opening documents, or what?  I’ve seen wide-spread problems opening docs in the Knowledge Directory, Search Results, and Snapshot Queries, and even written two posts on the topic already.

There are two main problems: poorly implemented document open handlers (which I’ll discuss in my next post), and that pesky Content-Disposition header that tries to force the browser to throw up an “Open/Save” dialog instead of just opening inline (which is definitely useful for HTML and Office files in particular, where users just want to view the documents in their browser without having to save ‘em first).  A third but ancillary issue is that occasionally the portal gets the MIME type of the document wrong in the Content-Type header, which is typically using application/octet-stream and doesn’t give the browser any clue about what type of file it is.

In the past posts I’ve described using BigIP iRules to remove the Content-Disposition header, but I ran into another client issue recently where just killing this header isn’t enough.  Take the following response headers for a document transfer that is NOT using the iRule to kill the Content-Disposition header:

This PowerPoint file, when clicked on in the portal, will throw the open/save dialog, but what if we wanted it to give the browser the option to open in-line?  We’d get rid of that Content-Disposition header.  Here’s the problem:  if we do that, the portal is returning a Content-Type of application/octet-stream.  Which means, without the file extension in Content-Disposition, the browser has no idea what type of document this is, so it throws up a “FIND PROGRAM/Save” dialog box instead of an “OPEN/Save” dialog box.

The solution to this problem would be to drop the Content-Disposition header, but use the file extension to fix the Content-Type while we’re at it (to application/vnd.ms-powerpoint).  But now our iRules are getting complicated, and not everyone has BigIP anyway.  Fortunately, my love/hate relationship with support.oracle.com shifted much more to the former when I found this: How to Eliminate the ‘Save/Open with’ Prompt when Opening HTML Files [ID 971003.1].  Then, of course, it shifted back to the latter when I realized that I found this article by complete dumb luck navigating the terrible interface, and the article is nowhere to be found searching through Google. But that’s another story.

At any rate, a solution was clear: you can write a .NET or Java FILTER to manipulate headers before they’re returned to the browser!  As such, I took the code from that article and expanded on it with a .NET filter of my own, which will:

  1. Remove the Content-Disposition header selectively on different file types
  2. Change the Content-Disposition header to use “inline” instead of “attachment”, rather than just deleting the header (this seems to work with some browsers; a brief discussion on the differences is here)
  3. Fix the Content-Type header if the file name extension doesn’t match the MIME type the portal thinks it is
  4. Do all this through a filter and a dynamically loadable varpack that allows different configurations based on the type of file being served.
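
For those thinking about the Java port, the core of the header clean-up can be sketched in a few lines.  This is not the downloadable filter: it’s a simplified stand-in of my own that works on a plain map of headers rather than the portal’s filter interfaces, and the class name and the tiny hard-coded MIME table are placeholders for what the varpack would configure.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of the header clean-up; the MIME table and names are illustrative.
class DownloadHeaderSketch {

    private static final Map<String, String> MIME_BY_EXTENSION = new HashMap<>();
    static {
        MIME_BY_EXTENSION.put("ppt", "application/vnd.ms-powerpoint");
        MIME_BY_EXTENSION.put("doc", "application/msword");
        MIME_BY_EXTENSION.put("pdf", "application/pdf");
        MIME_BY_EXTENSION.put("html", "text/html");
    }

    // Returns a copy of the headers with Content-Disposition switched to inline
    // and Content-Type corrected from the file extension when we recognize it.
    static Map<String, String> fix(String fileName, Map<String, String> headers) {
        Map<String, String> fixed = new HashMap<>(headers);
        int dot = fileName.lastIndexOf('.');
        String ext = dot >= 0 ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        String mime = MIME_BY_EXTENSION.get(ext);
        if (mime == null) {
            return fixed;   // unknown type: leave the portal's headers alone
        }
        fixed.put("Content-Type", mime);
        fixed.put("Content-Disposition", "inline; filename=\"" + fileName + "\"");
        return fixed;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("Content-Type", "application/octet-stream");
        original.put("Content-Disposition", "attachment; filename=\"slides.ppt\"");
        System.out.println(fix("slides.ppt", original));
    }
}
```

Only file types in the table are touched, which is the “selectively” part: anything the table doesn’t know about keeps whatever headers the portal sent.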

So while it’s a bit optimized in the sense that it only looks at responses being gatewayed through the portal, it’s still kind of a hack because personally, I think this should be an option in the portal in the first place rather than adding the additional overhead of a filter.

But, nonetheless, it seems to work well, and you can download the code here and the varPack file here.

If you’d like assistance with actually building and deploying this, or porting to Java – drop me a line.

The perils and pitfalls of the WCI gateway transformer

Saturday, April 3rd, 2010

All right, so you know my ol’ phrase “browser to portal, portal to back-end server”.  That is, the browser connects to the portal, the portal connects to the back-end server (i.e., your custom code), the back-end server sends HTML to the portal, the portal parses and transforms that HTML and returns it to the browser.

This post is about that step where the portal transforms the HTML you return from your custom code.  I could be incredibly long-winded here about all sorts of tricks and nuances of some adaptive tags that seem to only half-work, but figured I’d just share some quick tips to give you some ideas on how you can trick the gateway engine to do your bidding.
(more…)

Setting config.xml for WebLogic in Oracle’s JDeveloper

Wednesday, March 24th, 2010

This post is a two-fer.  For those of you that are drinking the Oracle KoolAid, you’ve no doubt been using Oracle’s JDeveloper lately, especially since WebCenter Spaces and Services require it to do anything substantive with them.  In the 10.x line, Oracle incorporated WebLogic (bloatware, IMO, but that’s another post entirely) as the default Application Server for debugging your apps.

The first tip here is that by default WebLogic Server looks at the Authentication Header, and even if your code and app are set to allow anonymous access, if there’s any HTTP Authentication header, WebLogic fails to handle the request and throws up a browser login dialog.

This has caused me headaches with PublisherEditor, which uses the ALUI Publisher Web Service to run.  The Publisher web service by default uses authentication headers, so the Publisher authentication headers get sent to my portlet code.  Fortunately, the fix for this is pretty straightforward and documented: add the following inside the <security-configuration> element of your domain’s config.xml file:

<enforce-valid-basic-auth-credentials>false</enforce-valid-basic-auth-credentials>

(more…)