Syntactica Solutions for Dynamic Publishing
Dynamic publishing is the process of generating customized documents on demand, in real time. In the time it takes a typical web server to generate a web page, large, complex documents can be transformed into multiple forms based on complex business rules. The process can also be set up cost-effectively using free, open source tools, but finding the right combination of these tools can take years. The Syntactica Dynamic Publishing Framework and the Syntactica Dynamic Publisher Training system allow dynamic publishers to get up and running in just a few weeks.
In the age of printed books, document creation was a slow and careful process. Documents were checked for quality and then sent to printers that would produce 10,000 or more books at a time. Creating the plates needed for volume printing was expensive, so only books with large print runs were cost-effective. It was also difficult to create different versions of the same book with slightly different content for different users.
Today publishers must not only create and capture content. They must also make this content quickly available on the web and in electronic books, in multiple formats such as HTML, ePub, XML and PDF. They need to include links in documents that lead directly to quizzes and assessments. And they need to do all of this with a modern architecture that keeps the solution cost-effective.
Users' expectations for precise search are also growing. Users want more than simple keyword search. They expect documents, chapters, articles, quizzes and review questions to appear quickly, with the most relevant results first. New search and retrieval technologies called "structured search" give a level of precision that simple keyword search tools such as Apache Lucene cannot provide on their own.
As a result, publishers today need a full library of tools that let their staff quickly transform documents into multiple formats. They also need to be able to create new transformations on demand.
The ROI of Dynamic Publishing
The chart at the right shows how cost models for dynamic publishing frameworks work and how they differ from cost models for static publishing. The chart plots total cost of ownership on the vertical axis against a complexity measure on the horizontal axis. Complexity has two primary components: the number of documents and the number of views (customized transformations) of those documents. Note that setting up a dynamic publishing environment has a higher initial cost. If you have only a single document type and a single output format such as PDF, a dynamic publishing framework gives no good return on investment. But as the number of documents and the number of output views grow, the cost of a dynamic publishing system eventually falls below the cost of static publishing. In general, static costs grow as the square of complexity, while dynamic costs grow linearly with complexity. Please contact us for help determining your static/dynamic crossover point.
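The crossover point can be made concrete with a small Python sketch of the two cost curves described above. It assumes, hypothetically, that static cost is `static_coeff * c**2` and that dynamic cost is a fixed `setup_cost` plus `dynamic_coeff * c`. It also collapses complexity into a single number `c`. The coefficients are placeholders, not measured values. Under these assumptions the crossover is the positive root of a quadratic.

```python
import math


def static_cost(c: float, static_coeff: float) -> float:
    """Hypothetical static model: cost grows with the square of complexity."""
    return static_coeff * c ** 2


def dynamic_cost(c: float, setup_cost: float, dynamic_coeff: float) -> float:
    """Hypothetical dynamic model: fixed setup cost plus linear growth."""
    return setup_cost + dynamic_coeff * c


def crossover_complexity(static_coeff: float, setup_cost: float, dynamic_coeff: float) -> float:
    """Complexity at which both hypothetical models cost the same.

    Solves static_coeff*c**2 - dynamic_coeff*c - setup_cost = 0 and returns
    the positive root. Beyond this point the dynamic model is cheaper.
    """
    a, b, f = static_coeff, dynamic_coeff, setup_cost
    return (b + math.sqrt(b * b + 4 * a * f)) / (2 * a)


if __name__ == "__main__":
    # Placeholder coefficients, chosen only to exercise the formula.
    c_star = crossover_complexity(static_coeff=2.0, setup_cost=50.0, dynamic_coeff=10.0)
    print(f"crossover at c = {c_star:.2f}")
```

A larger setup cost pushes the crossover to the right. A steeper static curve pulls it to the left.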
Many dynamic publishing users are working with limited budgets and do not have the resources to purchase complex relational databases, convert documents into tabular structures (a process called shredding) and hire full time software developers to create and maintain reports that extract documents from these tabular structures.
Dynamic publishing users need easy-to-use transformation tools that can be customized to their needs without custom software. They need solutions that do not require complex programming languages and platforms such as Java, .NET, C#, Perl, PHP or Python. They want to drag and drop XML files into collections and get fast, precise retrieval over very large document collections using easy-to-learn query languages such as XQuery.
The Syntactica XRX application architecture is perfectly suited to these requirements. With the XRX web application architecture, you never need to build middle-tier objects, manage relational databases, or transform XML to and from these other formats.
Syntactica offers a full range of training, tools and processes to get the most out of limited budgets. We focus on open source software systems that can be quickly customized to the needs of the dynamic publisher. Our solutions make it easy for non-programmers to handle all phases of importing, analyzing and exporting dynamic publishing documents from a wide variety of source formats, including DocBook, TEI or any other XML document. At Syntactica, our philosophy is to empower digital publishers with tools that make them more productive within their budgets.
Key Benefits of the Syntactica Dynamic Publishing Framework
The Syntactica Dynamic Publishing Architecture offers several benefits for managing your documents. This architecture includes:
- An open source (free) native XML database (currently based on eXist-db release 1.4, from eXist-db.org).
- RESTful web service interfaces to make it easy to use, test and integrate with other systems.
- A framework of dynamic publishing tools customized for fast transformation of source documents into multiple formats.
- The Apache Lucene full-text search system, with tools that make it easier for non-technical users to set up and manage.
- A library of tools to manage databases of publishing information including document metadata.
- A template-driven XRX application system for adding new applications.
Together, the Syntactica Dynamic Publishing Application Framework and the XRX Web Application Architecture give organizations rich functionality at very low cost.
An Agile, "No Shredding" Architecture With Fast Updates
Traditional relational database systems used in many older architectures require users to chop XML documents up into rows and columns so that they can be inserted into tables for search and retrieval. With the Syntactica architecture:
- All XML documents are added to a native XML database using a simple drag-and-drop process.
- Documents do not have to be “shredded” into SQL tables for indexing and search.
- Documents stay in their native XML format and can be quickly indexed for fast analysis.
- New data elements can be added to documents at any time without disrupting existing publishing workflows.
- New data elements can be quickly added to document reports and transformations.
- Updates to documents do not require reindexing the entire document. Only the elements in a document that change are reindexed.
- Any well-formed XML files (documents or data sets) can be imported into these databases with very little effort.
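Two points from the list above can be illustrated with a toy element-level index. First, documents stay as XML with no table schema. Second, updates touch only the elements that changed. This Python sketch is not how eXist-db or Lucene actually store their indexes. It assumes each element is keyed by a positional path, a naming scheme invented here. The sketch compares a document's new element texts with the stored ones, so only paths whose text changed, or that disappeared, are written.

```python
import xml.etree.ElementTree as ET


def element_texts(xml_text):
    """Map a positional path (e.g. 'book/chapter[1]/title[1]') to each element's own text."""
    root = ET.fromstring(xml_text)
    result = {}

    def walk(elem, path):
        result[path] = (elem.text or "").strip()
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, root.tag)
    return result


class ElementIndex:
    """Toy index keyed by (doc_id, element path); a stand-in for a real XML index."""

    def __init__(self):
        self.entries = {}  # (doc_id, path) -> element text

    def update(self, doc_id, xml_text):
        """Store a new document version; return the sorted paths that were reindexed."""
        new = element_texts(xml_text)
        # Linear scan is fine for a sketch; a real index would group entries by document.
        old = {p: t for (d, p), t in self.entries.items() if d == doc_id}
        changed = [p for p, t in new.items() if old.get(p) != t]
        removed = [p for p in old if p not in new]
        for p in changed:
            self.entries[(doc_id, p)] = new[p]
        for p in removed:
            del self.entries[(doc_id, p)]
        return sorted(changed + removed)
```

The first `update` for a document indexes every element. Later updates return only the paths that actually changed, which mirrors the "only changed elements are reindexed" behavior described above.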
Library of Dynamic Publishing XQuery Transformations
Syntactica has access to many existing XQuery transformations that convert XML documents into other formats such as HTML, ePub, PDF, Atom, timelines and other structures. Syntactica also has many processes and tools that let our customers embed key document entities such as terms, people, locations and dates directly in any XML document. These embedded structures can be used to improve search ranking or to support faceted search.
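The following Python sketch illustrates a structure-to-structure transformation with embedded entities. It converts a small DocBook-like fragment into HTML. Syntactica's own library is written in XQuery. The element names (`article`, `title`, `para`, `person`, `place`) and the tag mapping here are assumptions chosen for the example. Entity elements keep their type as a `class` attribute so that styling or search code can find them later.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to HTML element names.
TAG_MAP = {"article": "div", "title": "h1", "para": "p", "person": "span", "place": "span"}
ENTITY_TAGS = {"person", "place"}


def to_html(elem):
    """Recursively copy a source element into an HTML element, preserving mixed text."""
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    if elem.tag in ENTITY_TAGS:
        out.set("class", elem.tag)  # keep the entity type visible in the output
    out.text = elem.text
    for child in elem:
        out.append(to_html(child))
    out.tail = elem.tail  # text that follows the element inside its parent
    return out


def transform(xml_text):
    """Transform a source document string into an HTML fragment string."""
    return ET.tostring(to_html(ET.fromstring(xml_text)), encoding="unicode")
```

For example, `transform("<article><title>Trip</title><para>Ada visited <place>Paris</place>.</para></article>")` produces a `div` containing an `h1` and a `p` in which "Paris" is wrapped in `<span class="place">`.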
Ease of Reporting with XQuery
It is easy to extract reports on entities from Dynamic Publishing documents. For example, a query to find all dates or places in a Dynamic Publishing document is only a few lines of code. Syntactica provides a library of reporting templates to list, search and view items of many different types.
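A report that lists every date and place in a document needs little more than a descendant search, the XPath `//date` or `//place`. The Python sketch below does the equivalent with `xml.etree.ElementTree`. The entity element names are assumptions. A production report in this framework would more likely be written in XQuery.

```python
import xml.etree.ElementTree as ET


def entity_report(xml_text, entity_tags=("date", "place")):
    """Return {tag: [text, ...]} for every element with one of the given tags.

    Equivalent in spirit to the XPath //date and //place; values appear in
    document order. The tag names are assumptions for this sketch.
    """
    root = ET.fromstring(xml_text)
    report = {}
    for tag in entity_tags:
        # iter() walks the root and all of its descendants in document order.
        report[tag] = [(elem.text or "").strip() for elem in root.iter(tag)]
    return report
```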
Consistent URL (REST) Interfaces
All Dynamic Publishing documents, and the entities extracted from them, can have consistent interfaces to external sites through simplified URLs. For example, you can create an interface that lets a person, location or term be stored in a bookmark, so that all future documents that reference this entity can be quickly identified.
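One way to give every entity a stable, bookmarkable address is a fixed URL pattern such as `/entities/{type}/{id}`. The Python sketch below builds and parses such URLs. It then finds the documents that reference a bookmarked entity. The base URL, the path layout and the in-memory mapping of documents to entity keys are all hypothetical. They are not a documented Syntactica interface.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "https://example.org/publishing"  # hypothetical base URL


def entity_url(entity_type, entity_id):
    """Build a bookmarkable URL; safe='' also escapes '/' inside IDs."""
    return f"{BASE}/entities/{quote(entity_type, safe='')}/{quote(entity_id, safe='')}"


def parse_entity_url(url):
    """Recover (entity_type, entity_id) from a URL built by entity_url."""
    parts = urlsplit(url).path.split("/")
    i = parts.index("entities")
    return unquote(parts[i + 1]), unquote(parts[i + 2])


def documents_referencing(url, doc_entities):
    """doc_entities maps doc_id -> set of (type, id) keys found in that document."""
    key = parse_entity_url(url)
    return sorted(doc_id for doc_id, keys in doc_entities.items() if key in keys)
```

Because the URL is derived only from the entity's type and ID, a bookmark saved today will also match documents added later that reference the same entity.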
High Quality Fulltext Search and Retrieval with Integrated Structured Search Rules
The latest release of the Syntactica framework includes a full library of search and retrieval tools built around Apache Lucene's extensive document index management tools. These tools provide very fast full-text index management with highly customizable document scoring. Syntactica has also worked to make these indexing tools easier for non-technical users to access.
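The sketch below shows what a customizable scoring rule means in practice. It ranks documents with a simplified TF-IDF score in which matches in some fields (here, `title`) can be boosted. This is not Lucene's actual scoring formula, which adds normalization factors; newer Lucene versions also default to BM25. The field names and the boost value are assumptions for illustration.

```python
import math
from collections import Counter


def rank(query, docs, field_boosts=None):
    """Rank documents with a simplified, field-boosted TF-IDF score.

    docs maps doc_id -> {field_name: text}. Returns matching doc_ids, best first.
    """
    field_boosts = field_boosts or {}
    terms = query.lower().split()
    tokenized = {
        doc_id: {field: text.lower().split() for field, text in fields.items()}
        for doc_id, fields in docs.items()
    }
    # Document frequency: number of documents containing each term in any field.
    df = Counter()
    for fields in tokenized.values():
        df.update({word for words in fields.values() for word in words})
    n = len(docs)
    scores = {}
    for doc_id, fields in tokenized.items():
        score = 0.0
        for term in terms:
            if df[term] == 0:
                continue
            idf = 1.0 + math.log(n / df[term])  # rarer terms weigh more
            for field, words in fields.items():
                score += field_boosts.get(field, 1.0) * words.count(term) * idf
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc_id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Changing only the `field_boosts` argument can reorder results. This is the kind of scoring knob a non-technical user might adjust without touching the index itself.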
Structured Document Indexing for Fast Data Access
The Syntactica system uses information within the structure of XML files to create highly efficient storage and indexing for large collections of documents. Because this metadata already exists within the XML files, it is much easier for non-programmers to set up and configure document search and retrieval systems. Our XML-aware systems never discard the rich metadata within XML documents. This information is carefully preserved to create very efficient indexes, so even collections of hundreds of thousands of documents can be managed quickly and cost-effectively. Even comments within XML documents can be faithfully preserved throughout the document lifecycle.
Easy Faceted (Drilldown) Searching
With the Syntactica Dynamic Publishing Framework, documents can easily be displayed in a web page using simple queries of the document structure. These queries can show the key items referenced anywhere in a document. For example, the right side of a page can list the people, places, dates or events mentioned anywhere in the document. Each item can be a link that runs a search, letting users quickly find other documents that mention the same entity and navigate to similar documents.
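The sidebar lists and drill-down links described above come down to two operations. The first counts entity values across a collection. The second finds the documents that contain a chosen value. A minimal Python sketch follows. It assumes entities are tagged as `person` and `place` elements (hypothetical names). It also assumes the collection is a small in-memory mapping of document IDs to XML strings.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def entity_values(xml_text, tag):
    """Distinct, stripped text values of all <tag> elements in one document."""
    return {(e.text or "").strip() for e in ET.fromstring(xml_text).iter(tag)}


def facet_counts(collection, tags=("person", "place")):
    """For each entity tag, count how many documents mention each value."""
    counts = {tag: Counter() for tag in tags}
    for xml_text in collection.values():
        for tag in tags:
            # A set per document, so repeated mentions count once per document.
            counts[tag].update(entity_values(xml_text, tag))
    return counts


def drill_down(collection, tag, value):
    """Sorted IDs of documents that mention the given entity value."""
    return sorted(doc_id for doc_id, xml_text in collection.items()
                  if value in entity_values(xml_text, tag))
```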
Data Quality Reporting
The Syntactica Dynamic Publishing Framework makes it easy to create reports that look for data quality problems. For example, a very simple XQuery program can check for valid date formats across an entire collection of documents. We find that a flexible library of data quality reporting tools makes it easier for organizations with limited funding to quickly identify and correct inconsistencies in complex data sets collected from many different sources.
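The date-format check mentioned above might look like the following Python sketch; the framework itself would use XQuery. The sketch assumes dates are stored in `date` elements and that the house standard is ISO 8601 `YYYY-MM-DD`. Both are assumptions. The pattern check rejects other layouts, and `date.fromisoformat` rejects impossible calendar dates such as February 30.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed house standard: YYYY-MM-DD


def is_valid_iso_date(value):
    """True if value has the YYYY-MM-DD layout and is a real calendar date."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def invalid_dates(collection):
    """List (doc_id, value) for every <date> element that fails the check."""
    problems = []
    for doc_id, xml_text in sorted(collection.items()):
        for elem in ET.fromstring(xml_text).iter("date"):
            value = (elem.text or "").strip()
            if not is_valid_iso_date(value):
                problems.append((doc_id, value))
    return problems
```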
Customization by Non-Programmers
Because XQuery is based on simple XPath expressions, we feel that it is much easier for non-programmers to create and maintain customized reports. Syntactica provides a library of template applications that allow non-programmers with some training to build and maintain their own web applications. With minimal training, many non-programmers and subject-matter experts can become key contributors to development projects.
High Developer Productivity
One of the central benefits of this architecture is very high developer productivity. Developers who are familiar with XML structures and XQuery can create new web applications much faster than with other technologies. There are several reasons for this productivity, including the ability to avoid data translation and a large base of sample software that can be quickly modified. Productivity continues to grow each month as the XQuery/Dynamic Publishing community grows and more open source applications are shared.
Taxonomy Management Tools
Syntactica has developed frameworks of tools to manage databases of key entities that can be referenced consistently within Dynamic Publishing documents. Examples of these entities include people, locations, terms, products and events. These entity managers can be quickly customized to meet your needs.
Data Quality Management Functions
Syntactica has developed a library of tools for many data cleanup functions. These functions are typically used to convert documents into standard formats with consistent references to defined entities.
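A typical cleanup function rewrites dates from mixed source formats into one standard form. The Python sketch below tries a hypothetical list of known input formats in order. It leaves unrecognized values untouched for manual review. The format list is an assumption. In particular, `%m/%d/%Y` assumes U.S. month-first dates and would misread day-first sources. Month names are parsed using Python's default (English) C locale.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical list of source formats seen in incoming documents, tried in order.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]


def normalize_date(value):
    """Return the ISO form (YYYY-MM-DD) of value, or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unrecognized values for a human to review


def normalize_document(xml_text):
    """Rewrite every recognizable <date> element in place; keep the rest as-is."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("date"):
        fixed = normalize_date(elem.text or "")
        if fixed is not None:
            elem.text = fixed
    return ET.tostring(root, encoding="unicode")
```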
XML Service Enablement
All reports and queries are inherently web services, so they can be reused to create new web applications with very little effort. These rapidly built applications, also known as mashups, allow researchers to quickly create new reports or visualizations of large and complex data sets.
Automated Document Versioning
eXist-db 1.4 also includes an XML document versioning system that records changes when documents are updated. The version history provides a view of prior versions of XML documents, with reports that show line-by-line differences between versions as well as who changed each document and when.
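The line-by-line difference report can be approximated with Python's standard `difflib` module. The `Version` record, with its author and timestamp fields, is a hypothetical stand-in for whatever metadata a versioning system stores. This sketch does not use eXist-db's own versioning API.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Version:
    """Hypothetical version record: who saved which text, and when."""
    number: int
    author: str
    timestamp: str
    text: str


def version_diff(old, new):
    """Unified, line-by-line diff of two versions, labeled with who changed them and when."""
    return list(difflib.unified_diff(
        old.text.splitlines(),
        new.text.splitlines(),
        fromfile=f"v{old.number} ({old.author}, {old.timestamp})",
        tofile=f"v{new.number} ({new.author}, {new.timestamp})",
        lineterm="",  # lines from splitlines() carry no newline characters
    ))
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added. The two header lines carry the author and time of each version.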
Workflow and Publishing Functions
Syntactica has developed (and is continuing to develop) tools that make it easy to create and manage the overall software lifecycle, including requirements management, use cases, business terminology, role-based access control, task management, workflow management and publishing to external web sites.
Roadmap of Enhancements
Syntactica is actively working with several organizations to make it easier to deploy full turnkey solutions for managing Dynamic Publishing documents. Planned enhancements include:
- Site role management application – defining site-wide roles and associating individuals with content collection roles (authoring, editing, approving, etc.)
- Role based access control – assigning permissions based on a person’s roles
- Remote publishing – allowing publishing to a public web site with a single click
- Database synchronization – allowing developers to automatically sync with central systems
- Source-code versioning integration – better integration with version control systems such as Subversion
- Streamlined web content management tools (news updates, FAQs, Glossary of Terms etc.)
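The roadmap does not specify how role-based access control will be designed. The following is therefore only a generic Python sketch of the idea: roles map to permission sets, and each person holds roles per content collection. The role names, permission names and collection paths are hypothetical.

```python
# Hypothetical roles and the actions each one permits.
ROLE_PERMISSIONS = {
    "author": {"read", "write"},
    "editor": {"read", "write", "edit"},
    "approver": {"read", "approve"},
}


def is_allowed(user_roles, collection, action):
    """Check whether any role the user holds in this collection permits the action.

    user_roles maps a collection path to the set of role names the user holds there.
    Unknown collections and unknown roles grant nothing.
    """
    roles = user_roles.get(collection, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Because permissions attach to roles rather than to people, changing what an "editor" may do updates every editor at once.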