Tuesday, September 23, 2014

ODI XML driver and recursion

The serpent Set swallowing its tail


Most well-designed XSDs avoid recursion. But certain XSDs that are generated by frameworks or by naive Object hierarchy to XML hierarchy conversion tools make heavy use of recursion. In the v2 mode of XML driver this was handled in a spotty manner. Relational structures get created, but data may or may not be populated properly. Some element data may disappear altogether. The v3 mode driver that was introduced in ODI 11.1.1.7.0 and made the default in ODI 12.1.2 uses element xpath to uniquely identify an element. This causes a problem with recursion since each recursion occurrence is a unique xpath and will lead to creation of a new table ad infinitum. This will result in a StackOverflowError. To avoid this, 11.1.7.0 and 12.1.2 XML drivers raise error when recursion is detected.

Breaking the chains


It is obvious that avoiding recursion is not the answer. Hence in ODI 12.1.3 recursion support was added to v3 mode. Broadly speaking there are two types of recursion - self-recursion and multi-element recursion. An example for self-recursion is /CustomerOrder/Order/Order. Multi-element recursion - which is more common - can be exemplified by /CustomerOrders/Order/Return/Order.

XML driver breaks the recursion loop by first detecting the recursion and then identifying the recursion head and recursion tail. For self-recursion, recursion head == recursion tail. In the multi-element recursion above, the recursion head is /CustomerOrders/Order and the recursion tail is /CustomerOrders/Order/Return. Once this information has been identified no further exploration of this hierarchy is performed. /CustomerOrders/Order table will hold the data from all the recursion descendants of 'Order' type and /CustomerOrders/Order/Return table will hold reference to the data for all the recursion descendants of 'Order' type.

So one thing a user may expect with recursion is that even if your XML data has, say 100 levels of recursion, the XML datastores will only have one level representing this whole recursion hierarchy. The driver takes care of folding in and unfolding out the XML data as it is read in or written out. However, if user wants to perform a mapping from or to such a datastore, then they need to make sure that they pay attention to the PK-FKs on the tables so that the mappings to or from different recursion levels do not overlap.  For self-recursion, the ORDER table will have, as usual, an ORDERPK column. In addition it will also have an ORDERFK column. An Order element having an Order child will have the child element's PK reference in the ORDERFK column. In a similar manner, for the multi-element recursion example, RETURN table will have an ORDERFK column that will contain reference to Order children of Return.


No comments:

Post a Comment