Syntactica Solutions for Structured Search

Structured search uses the structure within a document to create a better search experience for your users. For example, if a search keyword matches text in the title of a document, that document should receive a higher ranking in the search results.

If your documents have titles, keywords, abstracts, summaries, chapter titles, terms, or indexes, you can usually get better search results by using a structured search system.

Our weighted full-text keyword matches can also be combined with any XML element in a document. This allows complex searches that also filter on date ranges, authors, document types, or any other document metadata.

The following table compares the four main search technologies in use today.

|  | Relational Database | Objects | Document | Structured Search |
|---|---|---|---|---|
| Main Structure | Table | Object | Document | Tree |
| Model | Relational model (ER Diagram) | Object model (UML) | Document (e.g. Microsoft(TM) Word(TM) Document) | XML Schema |
| Detail Structure | Table with columns. Each column can contain only one data type. | Object with data inside surrounded by accessor methods. | Free-form text | XML document |
| Query Language | Structured Query Language (SQL) | Object Query Language (OQL) | Keyword | XQuery |
| Standards Body for Search Standards | None | None | None | W3C |
| Fulltext Search Standards | None | None | None | W3C Fulltext |
| Search Result | Boolean | Boolean | Weighted | Weighted |
| Customization of Search Weight | No ranked results | No ranked results | Procedural Programming (Java etc.) | Form |
| Search Rank Rule Language | No ranked results | No ranked results | None | XPath |

Note: This table was inspired by Chapter 10 of the book Information Retrieval.

How Syntactica Structured Search Works

Structured search uses the nesting structure inside an XML document to create a set of rules. Each rule, often called a "boost" rule, determines how much to boost the value of a keyword match within a given structure. When the Syntactica tools analyze your documents, they build a list of the key elements within them. For example, they will identify the "title" XML element. Once these elements have been found, the Syntactica tools present the search administrator with a list of all the tags. The search administrator selects a boost value for each critical section of the document. Most sections typically keep a boost value of 1. The main title of a document might have a boost value of 3 and the chapter titles a boost value of 2. Other text can be ignored or given a very low boost value. For example, footnote text might have a boost value of 0.25.
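The boost rules described above can be sketched in a few lines of Python. This is only an illustration and not the Syntactica implementation: the element names, the sample document, and the score function are hypothetical, and the boost values are the ones used as examples in the text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical boost table: element name -> boost value (example values from the text).
BOOSTS = {"title": 3.0, "chapter-title": 2.0, "footnote": 0.25}
DEFAULT_BOOST = 1.0  # applied to every element not listed above


def score(xml_text, keyword):
    """Sum the boost of every keyword occurrence, element by element."""
    root = ET.fromstring(xml_text)
    word = keyword.lower()
    total = 0.0
    for element in root.iter():
        # Only the element's own leading text is counted, so a match is
        # attributed to the innermost element that contains it. Tail text
        # in mixed content is ignored in this sketch.
        tokens = re.findall(r"\w+", (element.text or "").lower())
        total += tokens.count(word) * BOOSTS.get(element.tag, DEFAULT_BOOST)
    return total


DOC = """
<document>
  <title>Structured search</title>
  <chapter>
    <chapter-title>Search basics</chapter-title>
    <para>Keyword search returns a list of documents.</para>
    <footnote>See also enterprise search.</footnote>
  </chapter>
</document>
"""

print(score(DOC, "search"))  # 3 + 2 + 1 + 0.25 = 6.25
```

The same word contributes 3 when it appears in the title but only 0.25 in a footnote, which is the effect the boost values are meant to produce.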

Once the search administrator has set the boost values, the document collection can be reindexed. Reindexing is typically needed only when the indexing strategy changes.
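One way to see why a change in boost values leads to reindexing is to assume that the boosts are folded into the index when it is built. The sketch below makes that assumption explicit; build_index, search, and the sample documents are hypothetical names chosen for illustration and do not describe the Syntactica index format.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs, boosts, default_boost=1.0):
    """Map each term to {doc_id: accumulated boosted weight}."""
    index = defaultdict(dict)
    for doc_id, xml_text in docs.items():
        for element in ET.fromstring(xml_text).iter():
            boost = boosts.get(element.tag, default_boost)
            for term in re.findall(r"\w+", (element.text or "").lower()):
                index[term][doc_id] = index[term].get(doc_id, 0.0) + boost
    return index


def search(index, keyword):
    """Return (doc_id, weight) pairs, highest weight first."""
    postings = index.get(keyword.lower(), {})
    return sorted(postings.items(), key=lambda item: item[1], reverse=True)


DOCS = {
    "a": "<doc><title>XML search</title><body>Notes on indexing.</body></doc>",
    "b": "<doc><title>Indexing notes</title><body>Search is covered briefly.</body></doc>",
}

index = build_index(DOCS, {"title": 3.0})
print(search(index, "search"))      # [('a', 3.0), ('b', 1.0)]

# Changing the boost values changes the stored weights, so the index is rebuilt.
reindexed = build_index(DOCS, {"title": 1.0, "body": 2.0})
print(search(reindexed, "search"))  # [('b', 2.0), ('a', 1.0)]
```

Because the weights are stored with the postings in this sketch, queries stay cheap, and the cost of a boost change is paid once at reindexing time.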

What differentiates the Syntactica approach from others is our ability to quickly build XRX applications that manage a search profile for each group of your users. You can give a custom search experience to each role or project within your organization. As a result, users find the documents they need faster and spend less time looking at search results for documents that do not interest them.
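A per-group search profile can be pictured as a separate boost table for each role. The following sketch is an assumption about how such profiles might behave, not a description of the XRX applications themselves; the role names, element names, and documents are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical profiles: role name -> {element name: boost value}.
PROFILES = {
    "engineering": {"title": 3.0, "code-sample": 2.0},
    "legal": {"title": 3.0, "disclaimer": 4.0, "code-sample": 0.0},
}


def score(xml_text, keyword, boosts, default_boost=1.0):
    """Boost-weighted count of keyword occurrences in one document."""
    word = keyword.lower()
    total = 0.0
    for element in ET.fromstring(xml_text).iter():
        tokens = re.findall(r"\w+", (element.text or "").lower())
        total += tokens.count(word) * boosts.get(element.tag, default_boost)
    return total


def rank(docs, keyword, role):
    """Return matching document ids for a role, best match first."""
    boosts = PROFILES[role]
    scored = [(doc_id, score(xml, keyword, boosts)) for doc_id, xml in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, value in scored if value > 0]


DOCS = {
    "api-guide": "<doc><title>API guide</title>"
                 "<code-sample>license = load()</code-sample></doc>",
    "terms": "<doc><title>Terms of use</title>"
             "<disclaimer>The license is not transferable.</disclaimer></doc>",
}

print(rank(DOCS, "license", "engineering"))  # ['api-guide', 'terms']
print(rank(DOCS, "license", "legal"))        # ['terms']
```

The same query over the same documents returns a different list for each role, because each profile weights the document structure differently.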

Syntactica search forms can be quickly customized to include advanced search fields such as date ranges, authors, or document types.
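As a rough illustration of how advanced search fields narrow a keyword search, the sketch below filters documents by author and date range before checking for the keyword. The meta/author, meta/date, and body element names, the ISO date format, and the matches function are assumptions made for this example only.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def matches(xml_text, keyword, author=None, start=None, end=None):
    """True if the document passes the metadata filters and contains the keyword."""
    root = ET.fromstring(xml_text)
    if author is not None and root.findtext("meta/author") != author:
        return False
    if start is not None or end is not None:
        raw_date = root.findtext("meta/date")  # assumed to be YYYY-MM-DD
        if raw_date is None:
            return False  # undated documents cannot satisfy a date range
        published = date.fromisoformat(raw_date)
        if start is not None and published < start:
            return False
        if end is not None and published > end:
            return False
    body = root.find("body")
    text = "".join(body.itertext()) if body is not None else ""
    return keyword.lower() in re.findall(r"\w+", text.lower())


DOCS = {
    "tuning": "<doc><meta><author>Lee</author><date>2008-03-14</date></meta>"
              "<body>Search tuning notes</body></doc>",
    "roadmap": "<doc><meta><author>Kim</author><date>2009-07-01</date></meta>"
               "<body>Search roadmap</body></doc>",
}

by_author = [d for d, xml in DOCS.items() if matches(xml, "search", author="Lee")]
in_2009 = [d for d, xml in DOCS.items()
           if matches(xml, "search", start=date(2009, 1, 1), end=date(2009, 12, 31))]
print(by_author)  # ['tuning']
print(in_2009)    # ['roadmap']
```

This sketch shows only the filtering step. In a weighted system, the documents that pass the filters would then be ranked by their boosted keyword scores.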

Standards Protect You from Vendor Lock-in

Many search vendors provide tools for creating custom searches based on a set of rules you supply to the search system, but very few use the W3C standards for XPath and XQuery. With the Syntactica approach, your search solutions will be portable to other XML search engines that support XPath and XQuery. As the new W3C XQuery full-text standard matures, the Syntactica solutions will also be modified to support those open standards.
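To illustrate why path-based rules are portable, the sketch below expresses boost rules as XPath expressions and evaluates them with the limited XPath subset built into Python's standard xml.etree.ElementTree module. The rule list, the sample document, and the score function are hypothetical; a full XPath or XQuery engine should be able to evaluate the same path expressions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: (XPath expression, boost value).
RULES = [
    ("./title", 3.0),          # main document title
    ("./chapter/title", 2.0),  # chapter titles
    (".//footnote", 0.25),     # footnotes at any depth
]


def score(xml_text, keyword, rules=RULES):
    """Boost-weighted count of keyword matches in elements selected by the rules."""
    root = ET.fromstring(xml_text)
    word = keyword.lower()
    total = 0.0
    # Only elements selected by a rule are scored in this sketch.
    for path, boost in rules:
        for element in root.findall(path):
            tokens = re.findall(r"\w+", (element.text or "").lower())
            total += tokens.count(word) * boost
    return total


DOC = (
    "<book><title>Search engines</title>"
    "<chapter><title>Search ranking</title>"
    "<footnote>More on search.</footnote></chapter></book>"
)

print(score(DOC, "search"))  # 3 + 2 + 0.25 = 5.25
```

Using paths rather than bare element names lets a rule tell the main title apart from a chapter title even when both use the same tag.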