
Monday, February 9, 2015

Creating a JSON file

Oracle Data Integrator and JSON

    Oracle Data Integrator 12.1.3 adds the capability to read and write JSON files. Both work through the Complex File dataserver definition of ODI. There is one limitation to the JSON capability: the JSON data must have a single root. This is analogous to the XML dataserver, where a recurring element cannot be used as the root element.
    In order to specify that a Complex File dataserver is meant to read and write JSON files, a new property has been added to the Complex File technology, viz. "translator_type" or "tt". For JSON reading and writing the value of this property must be set to "json". For other types of Complex Files, do not specify this property at all.
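For example, the property can be carried on the Complex File JDBC URL. This is a sketch; the file locations and schema name are illustrative:

jdbc:snps:complexfile?f=/data/tweets.json&d=/data/tweets.xsd&s=TWT&tt=json

The property can equally be set in the Properties tab of the dataserver instead of in the URL.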
    If the Complex File dataserver is associated with an XSD generated by the Native File Format wizard, then there is no need to explicitly provide this property: if you look into a generated XSD file, it will have the annotation nxsd:version=JSON. For other source formats the wizard generates the XSD file with the annotation nxsd:version=NXSD.
   One more item of interest: the Complex File dataserver accepts all the commands and properties supported by the XML driver. So you can - for all intents and purposes - treat the Complex File technology as XML technology.

Getting our feet wet


    Let us jump right in and look at an example of reading (and writing) a JSON file. Shown below is the JSON that we are going to use as source.
JSON
{
  "completed_in": 0.012,
  "max_id": 130,
  "max_id_str": "136536013832069120",
  "next_page": "?page=2&max_id=136536013832069120&q=twitterapi&rpp=1",
  "page": 1,
  "query": "twitterapi",
  "refresh_url": "?since_id=136536013832069120&q=twitterapi",
  "results": [
    {
      "created_at": "Tue, 15 Nov 2011 20:08:17 +0000",
      "from_user": "fakekurrik",
      "from_user_id": 370,
      "from_user_id_str": "370773112",
      "from_user_name": "fakekurrik",
      "geo": null,
      "id": 136,
      "id_str": "136536013832069120",
      "iso_language_code": "en",
      "metadata": {
        "result_type": "recent"
      },
      "profile_image_url": "http://a1.twimg.com/profile_images/1540298033/phatkicks_normal.jpg",
      "source": "<a href=\"http://twitter.com/\">web</a>",
      "text": "@twitterapi, keep on keeping it real",
      "to_user": "twitterapi",
      "to_user_id": 62,
      "to_user_id_str": "6253282",
      "to_user_name": "Twitter API"
    }
  ],
  "results_per_page": 1,
  "since_id": 0,
  "since_id_str": "0"
}

The corresponding XSD file is shown below.
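What follows is a trimmed sketch of the wizard-generated XSD; the exact datatypes, and the nxsd namespace shown, are indicative:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
            nxsd:version="JSON" nxsd:encoding="UTF-8"
            elementFormDefault="qualified">
  <!-- "tweet" is the pseudo-root; it will not appear in the JSON output -->
  <xsd:element name="tweet">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="completed_in" type="xsd:double"/>
        <xsd:element name="max_id" type="xsd:integer"/>
        <!-- ... other scalar fields ... -->
        <xsd:element name="results" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="created_at" type="xsd:string"/>
              <xsd:element name="from_user" type="xsd:string"/>
              <!-- ... remaining tweet fields, including the nested "metadata" element ... -->
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="results_per_page" type="xsd:integer"/>
        <!-- ... -->
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>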


As you can see, it has the attribute nxsd:version="JSON". Also note that the XSD has a root element called "tweet". Do not be concerned: this is just an empty placeholder, since the JSON data itself has no named root.

    We shall be using the same XSD for defining the source and target Complex File dataservers, but with a different value for the "s=" parameter of each dataserver.

    Let us define the source Complex File dataserver. See below a sketch of its definition.
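A minimal JDBC URL for the source dataserver could look like the following; the file paths and the schema prefix TWT are illustrative:

jdbc:snps:complexfile?f=/data/in/tweets.json&d=/data/in/tweets.xsd&s=TWT&tt=json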


You can see that the "tt" property is set; this identifies the dataserver as a JSON one. Now create a Physical Schema, a Logical Schema and a Model for this dataserver. See below the Model and the datastores that get created.


Now create a target Complex File dataserver.
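A sketch of the target dataserver's JDBC URL; it reuses the same XSD, and the schema prefix TWT_TARG is again illustrative:

jdbc:snps:complexfile?d=/data/in/tweets.xsd&s=TWT_TARG&tt=json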


As can be seen, the "f=" parameter is not specified, so this dataserver will not have any content; it only has a shape. Also note the different value given to the "s=" parameter. Again, create a Physical Schema, Logical Schema and Model for this dataserver.


Now that we have the Models created, let us create a Mapping. Create a Mapping with an empty dataset. Then multi-select (Ctrl-click) all datastores from the "twitter_simple" model. Drag-drop them into the default dataset. Re-arrange them to see all the datastores.

Now multi-select all datastores from "twitter_simple_targ". Drag-drop them outside of the default dataset. Re-arrange them. Create a mapping from the dataset to each of the three target datastores. Choose to do auto-mapping in each case. You will end up with the following.


Execute this mapping once to verify that there are no errors. No file will be created, so do not look for it. We are just populating data into the target via this Mapping.

The next step is to actually write out the data to a JSON file. For this we will use a Procedure and the 'CREATE FILE' XML driver command.
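A sketch of the Procedure step, executed against the target dataserver's Logical Schema; the output path is illustrative and the command follows the XML driver's CREATE FILE syntax:

CREATE FILE "/data/out/tweets_out.json" FROM SCHEMA TWT_TARG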


The JSON file is created from the schema of the target Complex File dataserver. Since we want this Procedure to be executed *after* the Mapping, but in the same Session so that it sees the data we just loaded, we shall put the Mapping and the Procedure into a Package.


Execute this Package. The contents of our initial JSON file will get written out to the new JSON file. You will notice that the pseudo-root node "tweet" is absent from the result.

This is a basic example of reading (and writing) a JSON file using Oracle Data Integrator. More complex flows can be created, and the source/target of the data can be changed. The main takeaways are the pseudo-root and the fact that the Complex File technology works the same way as XML technology and accepts the same properties and commands.

Note:

In the mapping shown above, all the target tables are loaded in the same mapping. This relies on the Parallel Target Table load feature of ODI, with all of its limitations. Please verify that your target tables are indeed capable of being loaded in parallel. Otherwise you will need to create one mapping per XML hierarchy level.

Tuesday, September 23, 2014

New and noteworthy in ODI 12.1.3 XML driver

Rich metadata

One of the most important features added to the XML (and by extension Complex File) driver in Oracle Data Integrator 12.1.3 is support for user control of generated table names, column names, column datatypes, column lengths and column precision. This is made possible by letting the user add custom attributes to those elements/attributes whose corresponding relational structure (and by extension the ODI Datastore) they want to control.

Why add this to the XSD?
Some users might ask this question. For example, the ODI LDAP driver allows you to specify an 'alias_bundle' where LDAP DN names can be aliased to more meaningful table names. The downside is that it becomes another file that you need to keep and move around depending on the location of the Agent that actually performs the execution. Embedding the metadata in the XSD keeps everything in one place.

Details about this feature can be found in the XML driver documentation. However, here are a couple of tips that will be of use.
  • The XML driver creates tables for complex elements as well as for elements of simple type with maxOccurs > 1. In the latter case a table is created for the element, and the data of each element instance in the XML is stored in a column named '_DATA'. If you want to control the datatype, length or precision of this column, go ahead and add column metadata to this element (see the sketch after this list).
  • Tables cannot store data from multiple elements. Suppose you have a global element of complex type or with maxOccurs > 1 in your XSD, and this element is used via 'ref' in more than one place in the XSD. In this case you cannot control the table name for this element.
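To make the first tip concrete, here is a rough sketch of what such annotated elements look like. The attribute names and the 'md' namespace prefix below are placeholders, not the real spelling; take the exact names and namespace from the XML driver documentation:

<!-- hypothetical metadata attribute names, for illustration only -->
<xsd:element name="thisIsanElementWithALongName" md:tableName="SHORT_NAME">
  ...
</xsd:element>
<xsd:element name="price" maxOccurs="unbounded"
             md:dataType="NUMERIC" md:length="10" md:precision="2"/>

The second element is of simple type with maxOccurs > 1, so its datatype/length/precision metadata applies to the generated '_DATA' column.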

Recursion support

Until 12.1.3, the compat_mode=v3 mode did not support recursive XML structures. If the user supplied an XSD with a recursive structure, it would result in an exception that said, in effect, 'recursion not supported'. From 12.1.3 onwards the driver supports recursion. More information may be found in this post.

ODI XML driver and recursion

The serpent Set swallowing its tail


Most well-designed XSDs avoid recursion. But certain XSDs that are generated by frameworks or by naive Object-hierarchy-to-XML-hierarchy conversion tools make heavy use of recursion. The v2 mode of the XML driver handled this in a spotty manner: relational structures got created, but data may or may not have been populated properly, and some element data could disappear altogether. The v3 mode driver that was introduced in ODI 11.1.1.7.0 and made the default in ODI 12.1.2 uses the element XPath to uniquely identify an element. This causes a problem with recursion, since each recursion occurrence is a unique XPath and would lead to the creation of new tables ad infinitum, ending in a StackOverflowError. To avoid this, the 11.1.1.7.0 and 12.1.2 XML drivers raise an error when recursion is detected.

Breaking the chains


It is obvious that avoiding recursion is not the answer. Hence in ODI 12.1.3 recursion support was added to v3 mode. Broadly speaking there are two types of recursion - self-recursion and multi-element recursion. An example of self-recursion is /CustomerOrders/Order/Order. Multi-element recursion - which is more common - can be exemplified by /CustomerOrders/Order/Return/Order.

The XML driver breaks the recursion loop by first detecting the recursion and then identifying the recursion head and recursion tail. For self-recursion, recursion head == recursion tail. In the multi-element recursion above, the recursion head is /CustomerOrders/Order and the recursion tail is /CustomerOrders/Order/Return. Once this information has been identified, no further exploration of this hierarchy is performed. The /CustomerOrders/Order table will hold the data for all the recursion descendants of 'Order' type, and the /CustomerOrders/Order/Return table will hold references to the data for all the recursion descendants of 'Order' type.

So one thing you can expect with recursion is that even if your XML data has, say, 100 levels of recursion, the XML datastores will have only one level representing the whole recursion hierarchy. The driver takes care of folding in and unfolding the XML data as it is read in or written out. However, if you want to map from or to such a datastore, you need to pay attention to the PK-FK pairs on the tables so that the mappings to or from different recursion levels do not overlap. For self-recursion, the ORDER table will have, as usual, an ORDERPK column. In addition it will also have an ORDERFK column. An Order element having an Order child will have the child element's PK referenced in the ORDERFK column. In a similar manner, for the multi-element recursion example, the RETURN table will have an ORDERFK column that will contain references to the Order children of Return.
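As an illustration, here is how the PK-FK pair can be used to pull out exactly one recursion level for the self-recursion example. This is a sketch; it assumes the schema prefix ORD, under which the driver rewrites ORD.ORDER to the physical table ORD_ORDER:

-- child rows sitting exactly one recursion level below top-level Orders
select child.*
from   ORD_ORDER child
join   ORD_ORDER parent on child.ORDERFK = parent.ORDERPK
where  parent.ORDERFK is null

Top-level Order elements have no Order parent, so their ORDERFK is null; that is what the where clause keys on.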


Tuesday, April 2, 2013

What is new for ODI XML JDBC driver in Oracle Fusion Middleware 11g R1 (11.1.1.7.0)

ODI XML driver issues

The ODI XML driver had several issues, such as:
  1. it would not support elements with the same name but different types. For example, a local element with the same namespace URI and local name as a global element would not be handled.
  2. it would not support list/union simpleTypes, any, anyType, mixed content and so on.
In ODI 11.1.1.7.0 a new compatibility mode parameter has been added. This is an XML driver property called 'compat_mode'. Its default value is 'v2', but if you specify the value 'v3' for this parameter, you can avoid issue (1). As you might imagine, using 'v3' brings some changes in the behaviour of the driver.
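Setting the mode is just another driver property; for example, on the JDBC URL (file names illustrative):

jdbc:snps:xml?f=/data/orders.xml&d=/data/orders.xsd&s=ORD&compat_mode=v3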

Differences between 'v2' and 'v3' modes

  1. Handling of elements with the same local name and resolved type. In 'v2' mode, data from such elements ends up stored in the same table, with the rows distinguished by different FK references. In 'v3' mode, different tables are used to store the elements. This has implications for elements that are 'ref's of a global element: each of the references (and its descendants) gets its own table.
  2. Determining the root element. The XML JDBC driver allows users to specify the local name of the root element, not a qualified name. An implicit assumption here is that the root element has a unique local name. In 'v2' mode, if a root element name is not specified, then the first element encountered by the parser is set as the root element. This is the first element from the XSD hierarchy where the XSD does not have an 'include' or 'import'. In 'v3' mode this is indeterminate: the XSD should either have only one global element or the user must specify the name of the element to be used as root. Failing this, one of the global elements from one of the target namespaces is used as root.
  3. As a corollary to (2), only a global element is allowed to be used as the root element. The old parser would allow even local elements; this is a carry-over from DTD times, where all elements are global. If the user specifies a local element name (which is not also the name of a global element), then an exception is raised. Note that even for the old parser, if the user had provided an XSD, the restriction that the root element must be a global element applied.
  4. When using a DTD, or when specifying the XML driver URL parameter to generate a DTD, the user must provide the root element name. The new XML driver converts the DTD to an XSD in-memory and uses this XSD to construct the data model. This conversion requires that a root element be specified. However, DTDs for XML structures that need to use namespaces must continue to use 'v2' mode.

Note that since the JMS XML Queue/Topic drivers and the Complex File driver use the XML driver internally, these drivers will also accept the 'compat_mode' parameter. The 11g JMS XML driver, though, will not accept this parameter; it only supports a subset of the XML driver parameters. Starting from 12c this is no longer the case.

Thursday, February 7, 2013

One XSD, multiple XML files

    On occasion users want to define just one XSD (or set of XSDs) that defines a data structure, but pull data from multiple different XML files concurrently. By applying a twist to a feature of the ODI XML JDBC driver you can accomplish this.

i)   Define an XML data server. Provide the XSD, the schema identifier (this is actually *not* used as a DB schema, but as a prefix to the tables generated for this XSD) and optionally an XML file in the JDBC URL.
ii)  Create Physical and Logical schemas for this Data Server.
iii) Create a Model based on the Logical schema and reverse engineer it.
iv)  Now you have the corresponding Datastores.

    Now comes the magic.

v)  Define a variable that takes its value from the ODI Session ID.
vi) Instead of the schema identifier set in step (i), use this variable, e.g.: jdbc:snps:xml?d=something.xsd&s=#..
vii) For loading an XML file, create an ODI Procedure that uses the ODI XML driver's 'LOAD FILE' command. Again, you can use an ODI variable as all or part of the XML file name.
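A sketch of such a Procedure step - the Variable name and file path are illustrative - run against the Logical Schema of the XML dataserver:

LOAD FILE /data/in/orders_#PROJ.FILE_ID.xml

Each Session resolves the Variables to its own values, so each run loads its own file into its own session-scoped set of tables.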

So now, for each invocation of the ODI Procedure in a Session, you will get an isolated set of DB tables that store the data from the XML file. Instead of step (vii) you can also do the load directly in the XML JDBC URL.

With a little thought this technique can be extended to allow loading multiple XML documents in the same Session using just one XSD definition.

Caveats

Be aware that the XML JDBC driver limits the table name length according to the maximum length allowed by the DB. For example, assume your DB allows 30 characters for a table name and your XML schema identifier is "SOME_LONG_ID". Then for an element named "thisIsanElementWithALongName", the table name as created in the DB will be "SOME_LONG_ID_THISISANELEMENTWI" (although in your ODI model the Datastore name will be "THISISANELEMENTWI").

Therefore with a dynamic schema identifier you must always limit the dynamic identifier length to the size of your initial identifier in step (i). Otherwise the XML driver will complain about not being able to find the table.

Code generation confusion

Another problem is that when ODI code generation sees a special character (#) in the schema name, it will enclose this schema name prefix in double quotes in all generated SQL statements. The XML driver will not recognize such names.

In a case where there are no Variables involved, if the value for the s= parameter is TEST, the XML driver does not actually create such a DB schema. Instead, it rewrites the table names so that they all have the prefix TEST_. It also rewrites all SQL queries that it gets, e.g. select * from TEST.TABLE1 changes to select * from TEST_TABLE1.

Now take the case of an ODI Variable - call it VAR1 - added as a suffix to the s= value, which resolves to '203' at runtime. The value as the XML driver sees it will be "TEST203", including the quotes. When this is used in a SQL statement it becomes select * from "TEST203".TABLE1. The XML driver is unable to process the quoted name, so the query remains as it is and fails. Therefore you have to be careful when using dynamic XML Dataservers with ODI Interfaces/Mappings, where ODI code generation comes into play.

So is that it?

Another way would be to keep the same s= parameter value but use the 'standalone=true' parameter. This has the effect of creating an isolated instance of the Dataserver where name collisions will not happen. There are two drawbacks to this approach:
  1. Only the internal HSQL DB can be used. That means your XML data is going to reside in-memory. This is going to impact the JVM in which the Agent is running.
  2. For each connection instance a new instance of the HSQL DB itself is created. This adds memory overhead. A related problem is that instances of the HSQL DB may stay around. So your XML Dataserver must define an On Disconnect command sending the SHUTDOWN command to HSQL.
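The On Disconnect part is straightforward, since SHUTDOWN is a standard HSQL statement:

SHUTDOWN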
So what can we do to have dynamic XML/Complex File Dataservers sharing the same XSD, but needing to pull from different XML files?

Here is what you need to do: change the place where you provide the prefix for the s= parameter. A sample flow is as below:
  • create a global ODI Variable called SCH_NAME with text/alphanumeric value
  • set the refreshing query to select 'TEST_<% odiRef.getSession("SESS_NO") %>' from dual; this way the Variable refresh itself does the prefixing.
  • Now set the value of the s= parameter to #GLOBAL.SCH_NAME (see the URL sketch after this list).
  • Set the same value (#GLOBAL.SCH_NAME) as the schema and work schema in the associated Physical Schema.
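The XML dataserver URL then carries the Variable reference as-is (the XSD path is illustrative):

jdbc:snps:xml?d=/data/orders.xsd&s=#GLOBAL.SCH_NAME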
Now when the SQL gets to the XML driver, it will look like select * from TEST_4001.TABLE1; notice the absence of quotes. This is because the code interpretation engine noticed that the entire name before variable substitution actually refers to a Variable, and so did not put any quotes around it.

Friday, July 20, 2012

Oracle Data Integrator On Connect and XML/LDAP driver

Oracle Data Integrator 11.1.1.5.0 onwards introduces a very useful feature called 'On Connect' and 'On Disconnect'. These are SQL commands that are run on the JDBC connection obtained from the Data Server when the connection is created or closed, respectively. See http://docs.oracle.com/cd/E21764_01/integrate.1111/e12643/setup_topology.htm#BGBBCAEH for instructions on how to set this up and for the limitations on its usage.

ODI comes with a set of special JDBC drivers that allow users to connect to and use XML/LDAP sources as JDBC sources. The interesting thing to note here is that 'On Connect'/'On Disconnect' SQL commands can be executed on these JDBC connections as well. The magical part is that the SQL commands that you enter here - if they are not XML driver commands - get pushed down to the underlying DB and are actually executed by the underlying DB. This is, in fact, true for any SQL command that you execute on an XML/LDAP Data Server. In the case of the XML driver, if your command is one of the XML driver commands (listed here: http://docs.oracle.com/cd/E23943_01/integrate.1111/e12644/appendix_xml_driver.htm#CHDFDFJF) then it is intercepted and executed by the driver itself. Otherwise all commands are actually executed on the underlying DB.

Now there is one wrinkle that you need to be aware of when executing 'On Connect' commands with the XML/LDAP drivers. As you may already be aware, these drivers create/drop tables and indexes and also populate the tables. Parts of this are controlled by properties that you set on the JDBC URL or as properties of the Data Server, e.g. create_indexes for the XML driver and load_mode for the LDAP driver.


Here is how 'On Connect' works for a plain DB JDBC connection:
  1. You provide a SQL for 'On Connect'.
  2. ODI asks the DB's JDBC driver for a JDBC connection.
  3. The DB's JDBC driver returns a JDBC connection.
  4. ODI executes the SQL that you provided for On Connect on this connection.
  5. Now ODI uses this connection to handle your Procedure commands.

Now here is how it works for XML/LDAP:
  1. You provide a SQL for On Connect.
  2. ODI asks the XML/LDAP JDBC driver for a connection. The connection URL has XML/LDAP connection parameters.
  3. The XML/LDAP driver internally obtains a JDBC connection to the internal HSQL/external DB.
  4. The XML/LDAP driver executes SQL commands that do whatever the user has asked it to do via connection properties: truncate tables, drop schema, load data, and so on.
  5. The XML/LDAP driver returns an XML/LDAP JDBC connection.
  6. ODI executes the SQL that you provided for On Connect on this connection.
  7. Now ODI uses this connection to handle your Procedure commands.
The points to note are (3) and (4) in the XML/LDAP flow. So if you are providing a SQL for 'On Connect' that you would expect to affect all table/index operations on the underlying DB, do not be surprised if it does not apply to the initial operations performed by the XML/LDAP JDBC drivers themselves.