Archive for the ‘Crawler’ Category

Crawl RSS Feeds with WebCenter Interaction

Wednesday, October 19th, 2011

I don’t know whether to file this one under “obvious” or not. On one hand, I guess most people have always known this. But on the other, it’s such an under-used feature that it bears repeating: Web Crawlers in WebCenter Interaction (and even back in the ALUI days) aren’t just for web sites. They can crawl RSS feeds too.

Configuration is identical to creating a Web Crawler. In administration, select “Create Object: Content Crawler – WWW” and choose the “World Wide Web” Content Source:

Here, instead of entering a web site, just provide the URL of the RSS feed:

Once the job runs, a card is created for each article in the feed:

Note that the created date shows when the feed was crawled, not when the original articles were written. And in this example, only 11 cards have been created because that’s all the Integryst RSS Feed provides. Both limitations can be mitigated by running your crawler job regularly: the created dates will be closer to when the posts were actually written, and the cards stick around after the articles have “left the feed”.
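To make the “one card per article” idea concrete, here is a minimal Python sketch of what any RSS consumer sees in a feed. It is not how the WebCenter Interaction crawler is implemented; the function name and the sample feed are made up for illustration, and it assumes a plain RSS 2.0 document. It does show that the feed only exposes the items it currently lists, and that the original publish date (pubDate) is in the feed even though the card’s created date reflects crawl time.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real feed would be downloaded from the feed URL.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.com/1</link>
<pubDate>Wed, 19 Oct 2011 08:00:00 +0000</pubDate></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def list_feed_items(rss_xml):
    """Return a (title, link, published) tuple for each <item> in the feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        pub = item.findtext("pubDate")
        # pubDate is optional in RSS 2.0, so fall back to None when absent.
        published = parsedate_to_datetime(pub.strip()) if pub else None
        items.append((title, link, published))
    return items


if __name__ == "__main__":
    for title, link, published in list_feed_items(SAMPLE_FEED):
        print(title, link, published)
```

Anything that has already dropped out of the feed never shows up in this list, which is why crawling on a regular schedule matters.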

Bug Blog 10: NTCWS doesn’t work with .NET 2.0

Tuesday, March 22nd, 2011

Today’s post is a quickie: if you are installing WebCenter Interaction NTCWS (NT Crawler Web Service), you can test the service by navigating to http://NTCWSSERVER/ntcws/ContainerProviderSoapBinding.asmx, and you should see something like this:

If you get something like this, though:

Server Error in '/ntcws' Application.
Configuration Error
Description: An error occurred during the processing of a configuration file required to service this request. Please review the specific error details below and modify your configuration file appropriately.
Parser Error Message: The configuration section cannot contain a CDATA or text element.
Source Error:
Line 37: <appSettings>
Line 38: <!-- 4.5WS Portal or not? Effects clickthrough and admin preferences. Acceptable values are 1 and 0. 0 implies a 5.0+ portal. -->
Line 39: 0

… you’re likely running the service with .NET 2.0 instead of 1.0. As the parser error says, .NET 2.0 won’t accept stray text inside a configuration element. The simple fix is to update ntcws\10.3.0\webapp\ntcws\web.config and change:

<add key="Is45Portal" value="@IS_45WS_PORTAL@">0</add>

to:

<add key="Is45Portal" value="0"/>
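If you have several servers to patch, the same change can be scripted. The following is only a rough Python sketch, not a tool that ships with NTCWS: the function name fix_is45portal is made up, and it assumes the setting is an add element with key="Is45Portal" as shown above. It sets the value to 0 and removes the text inside the element, leaving comments in place.

```python
import xml.etree.ElementTree as ET


def fix_is45portal(config_xml):
    """Return web.config text with the Is45Portal setting rewritten.

    Sets value="0" and drops the text content that .NET 2.0 rejects.
    """
    # insert_comments=True (Python 3.8+) keeps XML comments in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    for add in root.iter("add"):
        if add.get("key") == "Is45Portal":
            add.set("value", "0")
            add.text = None  # remove the stray "0" between the tags
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    broken = (
        "<configuration><appSettings>"
        "<!-- 4.5WS Portal or not? -->"
        '<add key="Is45Portal" value="@IS_45WS_PORTAL@">0</add>'
        "</appSettings></configuration>"
    )
    print(fix_is45portal(broken))
```

To use it, read web.config into a string, pass it through the function, and write the result back. Keep a backup of the original file, since re-serializing may change whitespace and drops any XML declaration at the top.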

Happy crawling!