New Tamino XQuery Functions

With version 4.2, Tamino offers many new XQuery functions. This does not merely increase the number of available functions; it also opens up new possibilities for Tamino XQuery. Among the additions are the new distinct-values() function, which enables grouping in XQuery, and enhancements to datatype handling, especially in the area of text processing.

Table of Contents

1. The Tamino XQuery Function Set
2. New XQuery Functions
   2.1 The distinct-values() Function
   2.2 More Constructor/Casting Functions
   2.3 Additional Text Functions
       2.3.1 Enhanced Collation Support
       2.3.2 Text Functions Using the Default Collation
3. Recommended Reading

1. The Tamino XQuery Function Set

With version 4.1, Tamino largely adhered to the W3C's XQuery drafts dating from November 2002. For XQuery functions, this specifically means the Functions and Operators draft [1]. The upcoming Tamino version enhances the set of functions in two ways. First, it adds functions that had already been defined in 2002 but were not implemented in the previous version of Tamino. Second, it takes changes in the recent Functions and Operators draft [2] into account without sacrificing upward compatibility of existing Tamino XQueries.

Apart from the functions defined in the W3C's XQuery drafts, Tamino implements functions of its own. Some of them are required for Tamino's XQuery update facility, which is not yet part of XQuery; others extend Tamino's text retrieval abilities beyond what XQuery currently offers. Among the latter, functions for enhanced thesaurus support have been added. These are not discussed here but will be dealt with separately in a future ticker.

2. New XQuery Functions

Probably one of the most powerful new functions in version 4.2 is the collection() function, which allows access to documents in any Tamino collection regardless of how the current collection request parameter is set.
This has already been discussed in the fifth Tamino Release Ticker, see Tamino XQuery Cross-Collection Joins. The following sections discuss other new functions. The sections are ordered by importance, i.e. the functions that benefit the XQuery user most appear first.

2.1 The distinct-values() Function

The distinct-values() function removes duplicates from a list of atomic values. Under the rules that XQuery defines for evaluating function calls, the sequence passed to the function need not consist entirely of atomic values. If the sequence contains nodes, these are atomized, i.e. flattened to atomic values depending on each node's type. For details on the evaluation of function calls and the atomization mechanism, see the XQuery draft [3]. The following excerpt is taken from [2]:

    fn:distinct-values($arg as xdt:anyAtomicType*) as xdt:anyAtomicType*
    fn:distinct-values($arg as xdt:anyAtomicType*, $collation as xs:string) as xdt:anyAtomicType*

A very simple example that applies distinct-values() to a sequence of integers is

    distinct-values((3,4,3,1,2,2,3))

yielding

    <xq:result xmlns:xq="http://namespaces.softwareag.com/tamino/XQuery/result">
      <xq:value>1</xq:value>
      <xq:value>2</xq:value>
      <xq:value>3</xq:value>
      <xq:value>4</xq:value>
    </xq:result>

The next query uses a sequence of nodes. It is based on a use case described in the W3C's XQuery Use Cases document [4], which in the Tamino world is often referred to as the bookshop example.

    (
      <seq>
        { input()//book/price }
      </seq>,
      <unique_seq>
        {
          for $val in distinct-values(input()//book/price)
          return <price> { $val } </price>
        }
      </unique_seq>
    )

The result is:

    <xq:result xmlns:xq="http://namespaces.softwareag.com/tamino/XQuery/result">
      <seq>
        <price>65.95</price>
        <price>65.95</price>
        <price>39.95</price>
        <price>129.95</price>
      </seq>
      <unique_seq>
        <price>39.95</price>
        <price>65.95</price>
        <price>129.95</price>
      </unique_seq>
    </xq:result>

The main asset of the distinct-values() function, however, is that it enables grouping, a concept known from SQL that arranges data in groups according to the values in certain fields. Our last example groups the books by their prices.

    for $val in distinct-values(input()//book/price)
    return
      <segment price="{ $val }">
        {
          for $book in input()//book[price = $val]
          return $book/title
        }
      </segment>
    sort by (.)

The result is:

    <xq:result xmlns:xq="http://namespaces.softwareag.com/tamino/XQuery/result">
      <segment price="39.95">
        <title>Data on the Web</title>
      </segment>
      <segment price="65.95">
        <title>TCP/IP Illustrated</title>
        <title>Advanced Programming in the Unix environment</title>
      </segment>
      <segment price="129.95">
        <title>The Economics of Technology and Content for Digital TV</title>
      </segment>
    </xq:result>

More information on grouping with XQuery, including further examples, can be found in an appendix to the XQuery draft [3]. To reproduce the example, use the Tamino schema bib.tsd and the data provided in [4] (see section 1.1.2 Sample Data).

2.2 More Constructor/Casting Functions

An XQuery constructor function for a given datatype accepts a string and yields the instance of that type which the string represents. The set of valid string representations of a datatype is described in the W3C's Datatype Recommendation [5]. These constructor functions can also perform casts when applied to types other than string.
Which datatype may be cast to which is described in a matrix contained in [2] (see chapter 17 Casting). Tamino now supports all constructor functions, including all casts mentioned in [2].

To illustrate how to apply these functions and why more specific datatyping is useful, we will look at an example with the datatype NCName, which represents XML names without colons or, more formally,

    NCName ::= (Letter | '_') (NCNameChar)*
    NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender

as defined in the W3C's XML Namespace Recommendation [6]. The following excerpt of a Tamino schema describes a customer data structure that uses an NCName attribute custNo:

    <xs:element name="customer">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="name" minOccurs="0" maxOccurs="1" />
          <xs:element ref="address" minOccurs="0" maxOccurs="1" />
        </xs:sequence>
        <xs:attribute name="custNo" type="xs:NCName" use="required" />
        <xs:attribute name="deliveryDuration" type="xs:integer" use="required" />
      </xs:complexType>
    </xs:element>

The following query retrieves customers by their custNo:

    declare namespace xs = "http://www.w3.org/2001/XMLSchema"
    for $custNo in ("c1","c2","c3")
    let $NCName := xs:NCName($custNo)
    for $cust in input()//customer[@custNo = $NCName]
    return
      <customer>
        <no> { string($cust/@custNo) } </no>
        <name> { string-join(($cust/name/first,$cust/name/last)," ") } </name>
      </customer>

Using a constructor for the search criterion ensures that the custNo values are formally correct, however they were obtained; otherwise the XQuery typing mechanism raises an error. Proper typing in queries thus ensures that the query engine applies the same validation principles as the Tamino loader does when appropriate datatypes are defined in the Tamino schema.

The schema and data used for the customer example were already used for the Tamino Release Ticker No. 5, Tamino XQuery Cross-Collection Joins, and can be found at Ticker5.

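To get a feeling for which custNo values the NCName constructor would accept, the production quoted above can be modelled outside Tamino. The following Python sketch is only an approximation under stated assumptions: it covers ASCII letters and digits and ignores the much larger Unicode classes behind Letter, Digit, CombiningChar and Extender, and the names is_ascii_ncname and to_ncname are hypothetical helpers, not part of Tamino or XQuery.

```python
import re

# ASCII-only approximation of the NCName production quoted above.
# Assumption: the real Letter, Digit, CombiningChar and Extender classes
# cover many more Unicode characters; this sketch deliberately ignores them.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if value matches the simplified NCName pattern."""
    return _NCNAME_ASCII.fullmatch(value) is not None


def to_ncname(value: str) -> str:
    """Mimic a constructor call: return the value or raise on invalid input."""
    if not is_ascii_ncname(value):
        raise ValueError(f"not a valid NCName: {value!r}")
    return value


if __name__ == "__main__":
    # "c:1" contains a colon and "3c" starts with a digit, so both are rejected.
    for candidate in ["c1", "c_2", "3c", "c:1"]:
        print(candidate, is_ascii_ncname(candidate))
```

Like the constructor in the query above, to_ncname fails loudly on a malformed value instead of silently searching for a customer number that can never exist in the schema.
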
2.3 Additional Text Functions

With version 4.2, Tamino offers seven new text functions. In alphabetical order, these are compare(), contains(), ends-with(), substring(), substring-after(), substring-before(), and tf:getCollation(). The tf: prefix in tf:getCollation() denotes that this is a Tamino-specific function in the namespace http://namespaces.softwareag.com/tamino/TaminoFunction. All other new text functions belong to the XQuery function namespace (http://www.w3.org/2003/11/xpath-functions) and so can be referred to using unqualified names.

2.3.1 Enhanced Collation Support

The compare() function takes two strings to be compared and, optionally, a third string that denotes the collation to be used for the comparison. If no collation is passed, the default collation is used. Unless the query prolog declares a different one, the default collation is based on Unicode codepoints.

The following query tries to find the customer named Müller, but this German name can also be written as Mueller. Searching for the latter spelling, the query yields an empty sequence

    for $last in input()//last
    where compare($last,"Mueller") = 0
    return $last

whereas the same query using an appropriate collation

    let $col := "collation?language=de__PHONEBOOK;strength=secondary"
    for $last in input()//last
    where compare($last,"Mueller",$col) = 0
    return $last

returns

    <xq:result xmlns:xq="http://namespaces.softwareag.com/tamino/XQuery/result">
      <last>Müller</last>
    </xq:result>

Another way to specify a collation is in the Tamino schema, by adding a tsd:collation descendant to a tsd:element (or tsd:attribute) definition. This collation can be accessed from within a query using the new tf:getCollation() function.
The following query yields the same result as the previous one, provided that the respective schema defines the appropriate collation for the last element:

    declare namespace tf = "http://namespaces.softwareag.com/tamino/TaminoFunction"
    for $last in input()//last
    where compare($last,"Mueller",tf:getCollation($last)) = 0
    return $last

The last way to search for "Müller" independently of how the name is actually written is to use a default collation declaration in the XQuery prolog:

    declare default collation "collation?language=de__PHONEBOOK;strength=secondary"
    for $last in input()//last
    where compare($last,"Mueller") = 0
    return $last

2.3.2 Text Functions Using the Default Collation

The remaining five new text functions do not accept a collation as an optional parameter. Instead, the default collation is used.

The contains() function takes two strings and determines whether the first contains the second. Unlike Tamino's own tf:containsText() function, contains() only performs a strict character scan, whereas tf:containsText() searches for words, understands wildcards, etc.

The ends-with() function is defined analogously to starts-with(), which was already provided with Tamino 4.1.

The three substring functions allow pieces to be cut from strings. With substring(), the piece is defined by the start position and the length of the result string, as in fn:substring("metadata", 4, 3), which returns "ada". With substring-before() and substring-after(), the piece to be returned is the part that precedes (or follows) a specified string, as in fn:substring-before("xpath-functions.xml","."), which returns "xpath-functions".

3. Recommended Reading