Customisation, Products, Technical
September 18, 2008

Extending SharePoint Search




Written by: Jess Bratcher

There are more than 20,000 SharePoint licensees in North America, and the number of SharePoint users is in the one million range. SharePoint is one of Microsoft’s success stories, cementing the company’s grip on the crucial enterprise content management market. Exact data are not available, but one thing is clear: SharePoint is a widely used content management and collaboration system.

Microsoft is doing a good job of providing SharePoint developers and partners with timely information about the system. A good example is the Microsoft SharePoint Team Blog. In the last few months, a number of useful new information sources have been added. One of these is the “Advanced SharePoint Videos and Whitepapers” collection, located here or via MSDN search.

SharePoint ships with a workable search system, which I’ll refer to as ESS, for Enterprise Search System. It makes excellent use of the metadata for each document processed by the system. In addition, Microsoft has reduced the time and complexity of customizing result pages, handling content on Microsoft Exchange servers, and accessing most standard file types.

In our tests of SharePoint, ESS does a good job and offers some bells and whistles not available in other enterprise search systems. For example, you can identify individuals whose content suggests they are knowledgeable about a specific topic. Managing crawls or standard index cycles boils down to pointing and clicking. SharePoint is security aware, and administrators don’t have to fiddle with configuration files for most customization jobs.

The one barrier that SharePoint licensees encounter is the system’s “glass ceiling” when indexing upwards of 30 million documents. Microsoft announced its acquisition of Fast Search & Transfer in January 2008 in order to provide a solution for SharePoint licensees who want to index and update 30 million or more documents without experiencing system slowdowns. Microsoft has released a FAST Search Web part to make integration of the FAST Enterprise Search Platform (ESP) easier. The SharePoint FAST ESP Web part is here.

But licensing FAST ESP can easily soar above $250,000, excluding customization and integration service fees. Add the cost of hardware, and you have a major investment to make in order to deliver acceptable search-and-retrieval functionality for large, disparate document collections.

What can a SharePoint licensee do to handle larger document collections, reduce index update latency, and deliver point-and-click search functions quickly and for less money?

If you have a document collection that is nosing toward 50 million documents, maybe even higher, you need a search system now. The good news is that there are numerous solutions available. These range from open source options such as Lucene and FLAX to the industrial-strength Autonomy IDOL (intelligent data operating layer), which can cost $300,000 or more before support and maintenance fees are tacked on.

Reasons to Switch from Microsoft’s ESS

Third-party systems can reduce the time required to index new and changed documents. As your collection grows past the 30 million document ceiling toward 50 million documents, you may find that SharePoint indexing times increase. If you are struggling with slow indexing, keep in mind that larger documents take more computing resources. When the volume of documents to be indexed rises and the average file size goes up, you are likely to hit a performance bottleneck.

You may want certain features available for your users. Among the most-requested functions are deduplication of result sets, parametric searching/browsing, entity extraction and on-the-fly classification, and options for merging different types of content in the SharePoint environment. Web pages located in the cloud and third-party content are two types of content that can absorb scarce resources.
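To make the first of those requests concrete, here is a minimal Python sketch of result-set deduplication. It is an illustration only, not how ESS or any vendor implements the feature. It assumes each hit is a dictionary with hypothetical `url` and `text` fields, and it treats two hits as duplicates only when their text is identical after lowercasing and collapsing whitespace. Commercial systems usually add near-duplicate detection on top of this.

```python
import hashlib


def fingerprint(text):
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(hits):
    """Keep the first hit for each distinct fingerprint, preserving rank order."""
    seen = set()
    unique = []
    for hit in hits:
        key = fingerprint(hit["text"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


# Hypothetical result set: the second hit is a copy of the first stored elsewhere.
hits = [
    {"url": "http://intranet/sites/hr/policy.doc", "text": "Travel  policy 2008"},
    {"url": "http://intranet/sites/archive/policy.doc", "text": "travel policy 2008"},
    {"url": "http://intranet/sites/it/backup.doc", "text": "Backup schedule"},
]
unique_hits = deduplicate(hits)
```

Because the first occurrence wins, the highest-ranked copy of a document is the one the user sees.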

One of the major reasons for shifting from the ESS to a third-party system is a need to provide certain features to different types of users. For example, a SharePoint user needing information for customer support tasks wants a “view” of certain information. Third-party systems can provide administrative tools that make this type of collection-centric search a point-and-click checklist in the search administration system.

Change for its own sake is not a sure-fire recipe for success. A solid user experience or business reason is. The good news is that there are more than 300 vendors with enterprise search systems that to a greater or lesser degree support SharePoint. The bad news is that you have to select a system.

Switching Methodology

Any IT professional with Microsoft certification knows how to set up, configure, and maintain SharePoint and other “core” Microsoft server systems. Let’s look at a methodology for replacing SharePoint’s built-in search with ISYS Search Software’s ISYS:web. ISYS is one of a half-dozen vendors offering so-called “SharePoint Search” capabilities.
Here’s a rundown of a procedure that minimizes pitfalls.

1. Set up a development server with SharePoint running. You don’t need to activate the search services. This can be on a computer running Windows Server 2003 or 2008. Microsoft recommends at a minimum a server with dual CPUs, each running at least 3 GHz, and 2 GB of memory. Also necessary for installation are Internet Information Services (IIS, along with its WWW, SMTP, and Common Files components), version 3.0 or greater of the .NET Framework, and ASP.NET 2.0. A more detailed look at these requirements can be found here.

2. Set up a single machine with several folders containing documents and content representative of what you will be indexing.

3. Install ISYS:web 8 on the machine running SharePoint.

4. Work through the configuration screens, noting the information required to add further content repositories to index. An intuitive ISYS Utilities program will let you configure SharePoint indexes.

5. Launch the ISYS indexing component. Note the time indexing begins and ends. You will need these data in order to determine the index build time when you bring the system up for production.

6. Run test queries on the indexed content. If the results are not what you expect, make a return visit to the ISYS setup screens, verify your choices, delete the index, and reindex the content collection. Be sure to check that entities are appearing in the ISYS display.

7. Open the ISYS results template so you can familiarize yourself with the style sheet and the behind-display controls.

8. Once you are satisfied that the basics are working, verify that ISYS is using security flags from Active Directory.
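The security check in step 8 amounts to confirming that each user sees only the hits they are entitled to open. The Python sketch below shows the idea of security trimming in general terms. It does not call Active Directory or ISYS; the `allowed_groups` field and the group names are hypothetical stand-ins for whatever access information the search system stores with each document.

```python
def trim_results(hits, user_groups):
    """Return only the hits whose access list shares a group with the user."""
    allowed = set(user_groups)
    return [hit for hit in hits if allowed & set(hit["allowed_groups"])]


# Hypothetical hits, each carrying the groups permitted to read the document.
hits = [
    {"title": "Salary bands", "allowed_groups": ["HR"]},
    {"title": "Holiday calendar", "allowed_groups": ["HR", "All Staff"]},
    {"title": "Server passwords", "allowed_groups": ["IT Admins"]},
]

# A user who belongs to "All Staff" and "Sales" should see one hit only.
visible = [hit["title"] for hit in trim_results(hits, ["All Staff", "Sales"])]
```

A practical test is to run the same query under two accounts with different group memberships and compare the result lists.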

Once you are satisfied, you can install ISYS on the production server and begin the process of generating the master index. Additional image files for the ISYS installation are available here. These include screen shots illustrating how to set up the ISYS index.
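The start and end times recorded in step 5 can be turned into a rough estimate of the master index build time. The Python sketch below assumes a simple linear extrapolation from the test run. The document counts and timestamps are made-up example values, not measurements of ISYS or SharePoint.

```python
from datetime import datetime


def index_throughput(started, finished, documents_indexed):
    """Return documents indexed per second for a timed test run."""
    elapsed = (finished - started).total_seconds()
    if elapsed <= 0:
        raise ValueError("finished must be later than started")
    return documents_indexed / elapsed


def estimate_build_hours(throughput, production_documents):
    """Linearly extrapolate the full index build time, in hours."""
    return production_documents / throughput / 3600


# Hypothetical test run: 90,000 documents indexed in 30 minutes.
started = datetime(2008, 9, 18, 9, 0, 0)
finished = datetime(2008, 9, 18, 9, 30, 0)
rate = index_throughput(started, finished, 90_000)

# Hypothetical production collection of 30 million documents.
hours = estimate_build_hours(rate, 30_000_000)
```

Treat the result as a lower bound. Indexing tends to slow as volume and average file size rise, so the production build is likely to take longer than a straight-line estimate suggests.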

Some Gotchas to Avoid

Most third-party search systems from Microsoft Gold Certified Partners mesh smoothly with SharePoint. However, there are some challenges you will want to think about.

First, when documents change, the search system must recognize that change, copy or crawl the document, and make the changed document available to the indexing subsystem. The new index entries must be added to the main index. Some search systems use a separate index for new and changed entries, so a single query is passed against two or more indexes. When a slowdown in indexing occurs, you will need to check the resources available. Network latency can bog down the system. Inadequate random access memory and insufficient disk storage can choke the system with disk accesses.
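The two-index arrangement mentioned above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's design: each index is a plain dictionary mapping a hypothetical document ID to its text, and the score is simply the number of times the term occurs. What it shows is why one query costs two lookups plus a merge, and why a changed document's stale main-index entry has to be suppressed.

```python
def search(index, term):
    """Return (doc_id, score) pairs for documents containing the term."""
    return [(doc_id, text.lower().split().count(term))
            for doc_id, text in index.items()
            if term in text.lower().split()]


def search_both(main_index, delta_index, term):
    """Query both indexes; a delta entry supersedes the main entry for a document."""
    results = dict(search(main_index, term))
    # Documents present in the delta index changed after the main build,
    # so their stale main-index hits are dropped before fresh ones are added.
    for doc_id in delta_index:
        results.pop(doc_id, None)
    results.update(search(delta_index, term))
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(results.items(), key=lambda item: (-item[1], item[0]))


main_index = {
    "doc1": "search server setup guide",
    "doc2": "budget search results search",
}
delta_index = {
    "doc2": "budget figures revised",      # changed: no longer mentions "search"
    "doc3": "search search search tips",   # new document
}
merged = search_both(main_index, delta_index, "search")
```

In this example `doc2` drops out of the results because its current version no longer contains the term, even though the main index still holds the old text.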

Second, keep in mind that new documents must be indexed and changed documents have to be reindexed. In both processes, the main index must be updated. Setting the index update at too aggressive a level can slow down query processing. If you have adjusted the time and depth of your crawls, you may want to expand the hardware available to the system. Clustering can speed up search systems, but you will need to allocate additional time to configure and optimize the systems.

Third, additional text processing features such as deduplication, entity extraction, clustering, and generating suggestions or See Also hints for users consume computing resources. Under load, these indexing functions can slow down even well-resourced search systems. Implement only the text processing functions you need. Fancy extras can contribute to sluggish performance. Users will lose patience with a slow response to a query.

Finally, trim the graphical bells and whistles. The most recent version of SharePoint supports Microsoft’s rich media tools such as Silverlight. In an organization, users want answers. Some of the consumer-oriented features make a search system visually appealing, but when used to excess, eye candy can keep users from quickly getting the information they need.

To sum up, SharePoint ships with a usable search-and-retrieval system. When you want to break through the current document barrier or add features quickly, you will want to consider a third-party solution. Regardless of the system you select, set up a development server and run shakedowns to make sure the system will deliver the results the users need.

Jess Bratcher, for Beyond Search

This entry was posted on Thursday, September 18th, 2008 at 5:56 am and is filed under Customisation, Products, Technical.
About the Author

Jess Bratcher

Jess Bratcher is a technical writer, editor, and researcher who works for Arnold Information Technology (http://arnoldit.com). She contributes regularly to Stephen E. Arnold's Beyond Search blog (http://arnoldit.com/wordpress), analyzing and commenting upon software technology and business in search.


Viewing 2 Comments

    Jess...Excellent article, however there are two points that I thought were a bit off base. The first was the title of the article. The word Extends implies an enhancement to the SharePoint search engine. ISYS and other companies that "extend" SharePoint search actually replace the technology. A true extension would leverage the core components of SharePoint Search such as the index, index engine, relevance ranking algorithm, etc.

    The second point, which was probably a typo was in the number of SharePoint users being about a million. My company, BA-Insight, offers a true extension to SharePoint Search and is approaching the one million user mark. According to Microsoft, the number is about 100 million.

    Cheers!
    Martin
    www.ba-insight.net

    Currently there are lots of discussions about search enhancements for SharePoint. But by far the most common and effective method of finding information is browsing a structured classification system (taxonomy).

    To categorize any SharePoint item or document directly with centrally managed taxonomy-based cross-site tree-style categories manually or automatically you can use taxonomy extensions, that are offered by several vendors, e.g.

    http://www.sharepartxxl.com/products/taxonomy/d...

    This gives a great search enhancement too.
 



SharePoint Magazine
