Tuesday, September 23, 2014

New and noteworthy in ODI 12.1.3 XML driver

Rich metadata

One of the most important features added to the XML (and, by extension, Complex File) driver in Oracle Data Integrator 12.1.3 is user control over generated table names, column names, column datatypes, column lengths and column precision. Users get this control by adding custom attributes to the XSD elements/attributes whose corresponding relational structure (and, by extension, ODI Datastore) they want to shape.

Why add this to XSD?
Some users might ask this question. For example, the ODI LDAP driver allows you to specify an 'alias_bundle' where LDAP DN names can be aliased to more meaningful table names. The downside is that it becomes another file that you need to keep and move around depending on the location of the Agent that actually performs the execution. Putting the metadata in the XSD itself avoids that extra file.

Details about this feature can be found in the XML driver documentation here. However, here are a couple of tips that will be of use.
  • The XML driver creates tables for complex elements as well as for elements of simple type with maxOccurs > 1. In the latter case a table is created for the element, and the data from each instance of the element in the XML is stored in a column named '_DATA'. To control the datatype, length or precision of this column, add column metadata to the element.
  • Tables cannot store data from multiple elements. Suppose your XSD has a global element that is of complex type or has maxOccurs > 1, and this element is used via 'ref' in more than one place. In this case you cannot control the table name for this element.
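
The table-creation rule in the first tip can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the driver's actual code: the function name, the sample XSD and the simplified handling of types (inline complexType or a reference to a named global complexType only) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical sample schema, used only for this illustration.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Phone" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def elements_mapped_to_tables(xsd_text):
    """Return {element name: reason} for elements the stated rule maps to a table."""
    root = ET.fromstring(xsd_text)
    # Names of global complex types, so type="..." references can be resolved.
    named_complex = {ct.get("name") for ct in root.findall(XS + "complexType")}
    result = {}
    for el in root.iter(XS + "element"):
        name = el.get("name")
        if name is None:
            # A ref="..." usage reuses a global declaration; nothing new to classify.
            continue
        type_name = (el.get("type") or "").split(":")[-1]
        is_complex = el.find(XS + "complexType") is not None or type_name in named_complex
        max_occurs = el.get("maxOccurs", "1")
        repeats = max_occurs == "unbounded" or int(max_occurs) > 1
        if is_complex:
            result[name] = "complex element"
        elif repeats:
            result[name] = "repeating simple element (values go to _DATA)"
    return result

if __name__ == "__main__":
    for name, reason in elements_mapped_to_tables(SAMPLE_XSD).items():
        print(name, "->", reason)
```

In this sample, Customer and Phone qualify for a table while Name, a simple element that occurs once, would become a column of its parent's table.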

Recursion support

Until 12.1.3, the driver's v3 mode (compat_mode=v3) did not support recursive XML structures. If a user supplied an XSD with a recursive structure, the driver raised an exception that said, in effect, 'recursion not supported'. From 12.1.3 onwards the driver supports recursion. More information may be found in this post.

ODI XML driver and recursion

The serpent Set swallowing its tail


Most well-designed XSDs avoid recursion. But certain XSDs that are generated by frameworks, or by naive object-hierarchy-to-XML-hierarchy conversion tools, make heavy use of recursion. In the v2 mode of the XML driver this was handled unreliably: relational structures were created, but data might or might not be populated properly, and some element data could disappear altogether. The v3 mode, which was introduced in ODI 11.1.1.7.0 and made the default in ODI 12.1.2, uses the element xpath to uniquely identify an element. This causes a problem with recursion, since each recursion occurrence has a unique xpath and would lead to the creation of a new table, ad infinitum, ending in a StackOverflowError. To avoid this, the 11.1.1.7.0 and 12.1.2 XML drivers raise an error when recursion is detected.

Breaking the chains


Clearly, refusing to handle recursion is not the answer. Hence in ODI 12.1.3 recursion support was added to v3 mode. Broadly speaking there are two types of recursion: self-recursion and multi-element recursion. An example of self-recursion is /CustomerOrder/Order/Order. Multi-element recursion, which is more common, can be exemplified by /CustomerOrders/Order/Return/Order.

The XML driver breaks the recursion loop by first detecting the recursion and then identifying the recursion head and the recursion tail. For self-recursion, recursion head == recursion tail. In the multi-element recursion above, the recursion head is /CustomerOrders/Order and the recursion tail is /CustomerOrders/Order/Return. Once these have been identified, no further exploration of this hierarchy is performed. The /CustomerOrders/Order table will hold the data from all the recursion descendants of 'Order' type, and the /CustomerOrders/Order/Return table will hold references to the data for all the recursion descendants of 'Order' type.
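
To make the head/tail idea concrete, here is a minimal Python sketch of one way to detect recursion while walking an element hierarchy. It is not the driver's implementation: the function find_recursions, the dictionary-based schema model and the choice to identify elements by name alone are assumptions made for illustration.

```python
def find_recursions(children, root):
    """Return a list of (recursion head, recursion tail) xpath pairs.

    children: dict mapping an element name to the names of its child
              elements (a hypothetical, simplified model of an XSD).
    root:     name of the root element.
    """
    found = []

    def walk(path):
        for child in children.get(path[-1], []):
            if child in path:
                # The child already occurs on the current xpath: recursion.
                # Head = xpath of its first occurrence, tail = current element.
                head = "/" + "/".join(path[: path.index(child) + 1])
                tail = "/" + "/".join(path)
                found.append((head, tail))
                continue  # do not explore further, or the walk never ends
            walk(path + [child])

    walk([root])
    return found

if __name__ == "__main__":
    self_recursive = {"CustomerOrder": ["Order"], "Order": ["Order"]}
    multi_element = {"CustomerOrders": ["Order"], "Order": ["Return"], "Return": ["Order"]}
    print(find_recursions(self_recursive, "CustomerOrder"))
    print(find_recursions(multi_element, "CustomerOrders"))
```

For the two example hierarchies this yields a head equal to the tail in the self-recursive case, and /CustomerOrders/Order paired with /CustomerOrders/Order/Return in the multi-element case, matching the description above. Without the early exit on detection, the walk would keep generating new xpaths forever, which is the problem the earlier v3 drivers guarded against by raising an error.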

So one thing to expect with recursion is that even if your XML data has, say, 100 levels of recursion, the XML datastores will have only one level representing the whole recursion hierarchy. The driver takes care of folding in and unfolding out the XML data as it is read in or written out. However, if you want to perform a mapping from or to such a datastore, you need to pay attention to the PK-FKs on the tables so that the mappings to or from different recursion levels do not overlap. For self-recursion, the ORDER table will have, as usual, an ORDERPK column. In addition it will also have an ORDERFK column. An Order element having an Order child will have the child element's PK reference in the ORDERFK column. In a similar manner, for the multi-element recursion example, the RETURN table will have an ORDERFK column that will contain references to the Order children of Return.