Syntactica Solutions for Structured Search
Structured search uses the structure within a document to create a better search experience for your users. For example, if a search keyword matches text in a document's title, that document should receive a higher ranking in the search results.
If your documents have titles, keywords, abstracts, summaries, chapter titles, terms, or indexes, you can usually get superior search results by using a structured search system.
Our weighted full-text keyword matching can also be combined with any XML element in a document, allowing complex searches that include date ranges, authors, document types, or any other document metadata.
The following table compares the four main search technologies in use today.
| | Relational Database | Objects | Document | Structured Search |
|---|---|---|---|---|
| Main Structure | Table | Object | Document | Tree |
| Model | Relational model (ER Diagram) | Object model (UML) | Document (e.g. Microsoft(TM) Word(TM) Document) | XML Schema |
| Detail Structure | Table with columns. Each column can contain only one data type. | Object with data inside surrounded by accessor methods. | free-form text | XML document |
| Query Language | Structured Query Language (SQL) | Object Query Language (OQL) | Keyword | XQuery |
| Standards Body for Search Standards | None | None | None | W3C |
| Fulltext Search Standards | None | None | None | W3C Fulltext |
| Search Result | Boolean | Boolean | Weighted | Weighted |
| Customization of Search Weight | No ranked results | No ranked results | Procedural Programming (Java etc.) | Form |
| Search Rank Rule Language | No ranked results | No ranked results | None | XPath |
Note: This table was inspired by Chapter 10 of the book Information Retrieval.
How Syntactica Structured Search Works
Structured search uses the nesting structure inside an XML document to create a set of rules. Each rule, often called a "boost" rule, determines how much a keyword match within a given structure raises a document's score. When the Syntactica tools analyze your documents, they build a list of the key elements within them; for example, they will identify the "title" XML element. Once these elements have been found, the Syntactica tools present the search administrator with a list of all the tags. The search administrator then selects a boost value for each critical section of the document. Typically, a section has a boost value of 1. The main title of a document might have a boost value of 3 and the chapter titles a boost value of 2. Other text can be ignored or given a very low boost value; footnote text, for example, might have a boost value of 0.25.
Once the search administrator has set the boost values, the document collection can be reindexed. Reindexing is typically needed only when the indexing strategy changes.
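The boost rules described above can be sketched in a few lines of Python. This is an illustration only, not the Syntactica implementation: the element names (`book`, `chapter`, `para`, `footnote`), the `BOOST_RULES` table, and the scoring formula (keyword hits multiplied by the boost of the enclosing element) are all assumptions made for the example. The rule paths use the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical boost rules: element path -> boost value.
# The values mirror the examples in the text (title 3, chapter title 2,
# ordinary text 1, footnote 0.25); the paths are assumptions.
BOOST_RULES = {
    "./title": 3.0,
    "./chapter/title": 2.0,
    "./chapter/para": 1.0,
    "./chapter/footnote": 0.25,
}


def count_keyword(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    if not text:
        return 0
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, re.IGNORECASE))


def score_document(xml_text, keyword, rules=BOOST_RULES):
    """Sum the keyword hits in each matched element, weighted by its boost."""
    root = ET.fromstring(xml_text)
    score = 0.0
    for path, boost in rules.items():
        for element in root.findall(path):
            score += boost * count_keyword(element.text, keyword)
    return score


SAMPLE = """<book>
  <title>Structured Search</title>
  <chapter>
    <title>Search basics</title>
    <para>Keyword search treats every word alike.</para>
    <footnote>See also faceted search.</footnote>
  </chapter>
</book>"""

# One hit in each of the four elements: 3 + 2 + 1 + 0.25
print(score_document(SAMPLE, "search"))  # 6.25
```

Because the rules live in a plain table, changing a boost value does not require touching the scoring code, which matches the idea of an administrator adjusting weights without programming.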
What differentiates the Syntactica approach from others is our ability to quickly build XRX applications that manage the search profile for each group of your users. You can give a custom search experience to each role or project within your organization, so users find the documents they need faster and spend less time looking at search results for documents that do not interest them.
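One simple way to picture a per-group search profile is as a set of overrides layered on top of default boost values. The sketch below is hypothetical: the role names, section names, and numbers are invented for illustration, and it does not represent how the XRX applications store profiles.

```python
# Hypothetical default boost values shared by every group.
DEFAULT_BOOSTS = {
    "title": 3.0,
    "chapter-title": 2.0,
    "body": 1.0,
    "footnote": 0.25,
}

# Hypothetical per-role overrides; only the differences are listed.
PROFILE_OVERRIDES = {
    "legal": {"footnote": 1.5},
    "engineering": {"chapter-title": 2.5, "footnote": 0.0},
}


def boosts_for(role):
    """Return the default boosts with the given role's overrides applied."""
    merged = dict(DEFAULT_BOOSTS)  # copy so the defaults are never mutated
    merged.update(PROFILE_OVERRIDES.get(role, {}))
    return merged


print(boosts_for("legal"))
print(boosts_for("unknown-role"))  # falls back to the defaults
```

Keeping only the differences per role means a new group starts from sensible defaults and needs just a few adjusted values.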
Syntactica search forms can be quickly customized to include advanced search fields such as date ranges, authors, or document types.
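As a rough illustration of how such advanced fields might narrow a result set, the following Python sketch filters documents by author, document type, and date range. The XML layout (`meta/author`, `meta/type`, `meta/published`), the `Filters` class, and the helper names are assumptions made for this example and are not part of any Syntactica product.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Filters:
    """Values a search form might submit; a field left as None is ignored."""
    author: Optional[str] = None
    doc_type: Optional[str] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None


def matches(doc, filters):
    """Return True when the document satisfies every filter that is set."""
    if filters.author and doc.findtext("meta/author") != filters.author:
        return False
    if filters.doc_type and doc.findtext("meta/type") != filters.doc_type:
        return False
    if filters.date_from or filters.date_to:
        published_text = doc.findtext("meta/published")
        if not published_text:
            return False  # no date recorded, so a date filter cannot match
        published = date.fromisoformat(published_text)
        if filters.date_from and published < filters.date_from:
            return False
        if filters.date_to and published > filters.date_to:
            return False
    return True


def filter_docs(collection, filters):
    """Return the ids of the documents that pass the filters, in order."""
    return [d.get("id") for d in collection.findall("doc") if matches(d, filters)]


# Hypothetical sample collection.
COLLECTION = ET.fromstring("""<collection>
  <doc id="a"><meta><author>Lee</author><type>report</type>
    <published>2009-03-01</published></meta></doc>
  <doc id="b"><meta><author>Lee</author><type>memo</type>
    <published>2010-06-15</published></meta></doc>
  <doc id="c"><meta><author>Kim</author><type>report</type>
    <published>2010-01-20</published></meta></doc>
</collection>""")

print(filter_docs(COLLECTION, Filters(author="Lee")))
print(filter_docs(COLLECTION, Filters(doc_type="report", date_from=date(2010, 1, 1))))
```

In a full system, the documents that pass these filters would then be ranked by their weighted keyword score.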
Standards Protect You from Vendor Lock-in
Many search vendors provide tools for creating custom search based on a set of rules you supply to the search system, but very few use the W3C standards for XPath and XQuery. With the Syntactica approach, your search solutions will be portable to other XML search engines that support XPath and XQuery. As the new W3C XQuery full-text standard matures, the Syntactica solutions will also be modified to support it.