Thursday, February 7, 2013

One XSD, multiple XML files

    On occasions users want to define just one XSD (set of XSDs) that define a data structure, but pull data from multiple different XML files concurrently. By applying a twist to a feature of the ODI XML JDBC driver you can accomplish this.

i)   Define an XML data server. Provide the XSD, the schema identifier (this is actually *not* used as a DB schema, but as a prefix to the tables generated for this XSD) and optionally an XML file in the JDBC URL.
ii)  Create Physical and Logical schemas for this Data Server.
iii) Create a Model based on the Logical schema and reverse engineer it.
iv)  Now you have the corresponding Datastores.

    Now comes the magic.

v)  Define a variable that takes its value from the ODI Session ID.
vi) Instead of the schema identifier set in step(i), use this variable eg: jdbc:snps:xml?d=something.xsd&s=#..
vii) Now for loading an XML file create an ODI Procedure that uses the ODI XML Driver's 'LOAD FILE' command. Again you can use an ODI variable as all or part of the XML file name.

So now for each invocation of the ODI Procedure in a Session, you will get an isolated set of DB tables that will store the data from the XML file. Instead of step(vii) you can do this directly in the XML JDBC URL also.

With a little thought this technique can be extended to allow loading multiple XML documents in the same Session using just one XSD definition.

Caveats

Be aware that XML JDBC driver limits the table name length according to the maximum length allowed for the DB. For example assume your DB allows 30 characters as the table name and your XML schema identifier is "SOME_LONG_ID". Then for an element named "thisIsanElementWithALongName", the table name as created in the DB will be "SOME_LONG_ID_THISISANELEMENTWI" (although in your ODI model the Datastore name will be "THISISANELEMENTWI").

Therefore with a dynamic schema identifier you must always limit the dynamic identifier length to the size of your initial identifier in step (i). Otherwise the XML driver will complain about not being able to find the table.

Code generation confusion

Another problem is that when ODI code generation sees a special character (#) in the schema name, it will enclose this schema name prefix in double quotes for all generated SQL statements. XML driver will not recognize such names. In a case where there are no Variables involved, if the value for the s= parameter is TEST, XML driver does not actually create such a DB schema. Instead, it will rewrite the table names so that they all have the prefix TEST_It will rewrite all SQL queries that it gets, too. Eg: select * from TEST.TABLE1; changes to select * from TEST_TABLE1. Now take the case of an ODI Variable added as a suffix to the s= value - called VAR1 that then gets resolved to '203' during runtime. The value as XML driver sees it will be "TEST203", including the quotes. When this is used for a SQL statement it becomes select * from "TEST203".TABLE1. XML driver will be unable to process the content within the quotes and the query will remain as it is. It will fail. Therefore you have to be careful using when dynamic XML Dataservers with ODI Interfaces/Mappings where ODI code generation comes into play.

So is that it?

Another way would be to keep same s= parameter value but set use the 'standalone=true' parameter. This has the effect of creating an isolated instance of the Dataserver where name collisions will not happen. There are two drawbacks to this approach
  1. Only the internal HSQL DB can be used. That means your XML data is going to reside in-memory.  This is going to impact te JVM in which the Agent is running.
  2. For each connection instance a new instance of HSQL DB itself is created. This gives additional memory overhead. Another related problem is that instances of HSQL DB may stay around. So your XML Dataserver must define an On Disconnect command sending the SHUTDOWN command to HSQL.
So what can we do to have dynamic XML/Complex File Dataservers sharing same XSD, but needing to pull from different XML files? 

Here is what you need to do : change the place where you provide the prefix for the s= parameter. A sample flow is as below:
  • create a global ODI Variable called SCH_NAME with text/alphanumeric value
  • set the refreshing query as select 'TEST_<% odiRef.getSession("SESS_NO") %>' from dual; So we are getting the Variable refresh itself to do the prefixing.
  • Now set the value of the s= parameter as #GLOBAL.SCH_NAME
  • Set the same value (#GLOBAL.SCH_NAME) as the schema and work schema in the associated Physical Schema.
Now when the SQL gets to the XML driver, it will look like  select * from TEST_4001.TABLE1; Notice the absence of quotes. This is because the code interpretation engine noticed that the entire name before variable substitution is actually referring to the name of a Variable and did not put any quotes around it.

Complex File technology and pre Java 7 bug

Problem statement


How many times have you tried to use a user-defined file structure using Native Format and registered it with ODI Topology as a Complex File Data server - only to have it fail when testing the connection with an error saying :

ORABPEL-11100

Translation Failure.
[Offset=7] Translation from native failed. For input string: "...............".
The incoming data does not conform to the NXSD schema. Please correct the problem.

Well, it could be problem with the input data, but if you are absolutely sure that your data is conformant with the XSD you have created (btw you can use commandline to check this : see here) then there is one thing you can look for : in the error message does the reported string start with "+" and is the datatype for this element as defined in the XSD xsd:long?

If you can say yes to all of this then congratulations! You have been bitten by a Java bug tht has been fixed in Java 7 onwards. The bug is in Java's Long.valueOf where the code only checks for "-" and not "+". So when it finds "+" in there it throws a NumberFormatException. This exception however gets swallowed up and the impression that is created is that it is the fault of the data in the file.

Solution

If you are using ODI 11.1.1.5.0, tough luck! With ODI 11.1.1.6.x, you will need to set up a standalone or JEE agent. These agents must run using Java 7. For standalone agent edit /oracledi/agent/bin/odiparams.sh/bat where ODI_JAVA_HOME will point to your Java location. For JEE agent when you are configuring the domain choose a Java 7 installation. Then for 'test connection' use this agent. For reverse engineering choose 'Customized' rather than 'Standard' on the reverse engineering finger tab and also choose this agent. All ODI Sessions that use this data server must execute on this agent.