Monday, February 9, 2015

Creating a JSON file

Oracle Data Integrator and JSON

    Oracle Data Integrator 12.1.3 adds the capability to read and write JSON files. Both capabilities work through ODI's Complex File dataserver definition. There is one limitation: the JSON data must have a single root. This is analogous to the XML dataserver, where a recurring element cannot be used as the root element.
    In order to specify that a Complex File dataserver is meant to read and write JSON files, a new property has been added to the Complex File technology, viz. "translator_type" (or "tt"). For JSON reading and writing, the value of this property must be set to "json". For other types of Complex Files, do not specify this property at all.
    If the Complex File dataserver is associated with an XSD generated by the Native File Format wizard, then there is no need to explicitly provide this property: a generated XSD file carries the annotation nxsd:version="JSON". For other source formats, the wizard generates the XSD file with the annotation nxsd:version="NXSD".
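For instance, a wizard-generated XSD for JSON opens along these lines (an illustrative fragment; attribute order and additional attributes vary):

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
            nxsd:version="JSON">

For other native formats the same attribute reads nxsd:version="NXSD".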
    One more item of interest is that the Complex File dataserver accepts all the commands and properties supported by the XML driver. So you can, for all intents and purposes, treat the Complex File technology as XML technology.
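For example, XML driver commands like the following should work unchanged against a Complex File schema (a sketch; the TWEET_TRG schema name is illustrative, and the exact command syntax is documented in the ODI Driver for XML reference):

SYNCHRONIZE SCHEMA TWEET_TRG
TRUNCATE SCHEMA TWEET_TRG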

Getting our feet wet


    Let us jump right in and see an example of reading (and writing) a JSON file. Shown below is the JSON that we are going to use as the source.
JSON
{
    "completed_in": 0.012,
    "max_id": 130,
    "max_id_str": "136536013832069120",
    "next_page": "?page=2&max_id=136536013832069120&q=twitterapi&rpp=1",
    "page": 1,
    "query": "twitterapi",
    "refresh_url": "?since_id=136536013832069120&q=twitterapi",
    "results": [
        {
            "created_at": "Tue, 15 Nov 2011 20:08:17 +0000",
            "from_user": "fakekurrik",
            "from_user_id": 370,
            "from_user_id_str": "370773112",
            "from_user_name": "fakekurrik",
            "geo": null,
            "id": 136,
            "id_str": "136536013832069120",
            "iso_language_code": "en",
            "metadata": {
                "result_type": "recent"
            },
            "profile_image_url": "http://a1.twimg.com/profile_images/1540298033/phatkicks_normal.jpg",
            "source": "<a href=\"http://twitter.com/\">web</a>",
            "text": "@twitterapi, keep on keeping it real",
            "to_user": "twitterapi",
            "to_user_id": 62,
            "to_user_id_str": "6253282",
            "to_user_name": "Twitter API"
        }
    ],
    "results_per_page": 1,
    "since_id": 0,
    "since_id_str": "0"
}

The corresponding XSD file shall be along these lines (a reconstructed sketch; the exact output of the Native File Format wizard may differ in its details):
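<?xml version="1.0" encoding="UTF-8"?>
<!-- Reconstructed sketch: element names follow the JSON above; the wizard's actual output may differ. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
            nxsd:version="JSON" nxsd:encoding="UTF-8"
            elementFormDefault="qualified">
  <xsd:element name="tweet">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="completed_in" type="xsd:double"/>
        <xsd:element name="max_id" type="xsd:long"/>
        <xsd:element name="max_id_str" type="xsd:string"/>
        <xsd:element name="next_page" type="xsd:string"/>
        <xsd:element name="page" type="xsd:int"/>
        <xsd:element name="query" type="xsd:string"/>
        <xsd:element name="refresh_url" type="xsd:string"/>
        <xsd:element name="results" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="created_at" type="xsd:string"/>
              <xsd:element name="from_user" type="xsd:string"/>
              <!-- ...remaining scalar fields of a result element... -->
              <xsd:element name="metadata">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="result_type" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="results_per_page" type="xsd:int"/>
        <xsd:element name="since_id" type="xsd:long"/>
        <xsd:element name="since_id_str" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>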


As you can see, it has the attribute nxsd:version="JSON". Also, the XSD has a root element called "tweet". Do not be concerned; this is just an empty placeholder (a pseudo-root).

    We shall be using the same XSD for defining the source and target Complex File dataservers, but with a different value for the "s=" parameter of each dataserver.

    Let us define the source Complex File dataserver. See the definition below.
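In textual form, the definition boils down to something like this (a sketch: the driver class and URL parameters follow the Complex File driver documentation, while the file paths and the TWEET_SRC schema name are illustrative):

JDBC Driver: oracle.odi.jdbc.driver.file.complex.ComplexFileDriver
JDBC URL:    jdbc:snps:complexfile?f=/data/json/tweet.json&d=/data/json/tweet.xsd&re=tweet&s=TWEET_SRC
Property:    tt=json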


You can see that the "tt" property is set. This tells the driver that the underlying file is JSON. Now create a Physical Schema, a Logical Schema, and a Model for this dataserver. The Model and the datastores that get created are shown below.


Now create a target Complex File dataserver.
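A sketch of its definition, in the same notation as before (again, the file path and the TWEET_TRG schema name are illustrative):

JDBC Driver: oracle.odi.jdbc.driver.file.complex.ComplexFileDriver
JDBC URL:    jdbc:snps:complexfile?d=/data/json/tweet.xsd&re=tweet&s=TWEET_TRG
Property:    tt=json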


As you can see, the "f=" parameter is not specified, so this dataserver will not have any content; it will only have a shape. Also note the different value given to the "s=" parameter. Again, create a Physical Schema, a Logical Schema, and a Model for this dataserver.


Now that we have the Models created, let us create a Mapping. Create a Mapping with an empty dataset. Then multi-select (Ctrl-click) all the datastores from the "twitter_simple" model. Drag-drop them into the default dataset. Re-arrange them to see all the datastores.

Now multi-select all the datastores from "twitter_simple_targ". Drag-drop them outside of the default dataset. Re-arrange them. Create a mapping from the dataset to each of the three target datastores. Choose to do auto-mapping in each case. You will end up with the following.


Execute this Mapping once to verify that there are no errors. No file will be created at this point, so do not look for one. We are just populating data into the target via this Mapping.

The next step is to actually write the data out to a JSON file. For this we will use a Procedure and the 'CREATE FILE' XML driver command.
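The Procedure needs just one task, running against the Logical Schema of the target dataserver, with a command along these lines (a sketch; the output path is illustrative, and the schema name must match the "s=" value of the target dataserver):

CREATE FILE "/data/json/tweet_out.json" FROM SCHEMA TWEET_TRG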


The JSON file is created from the schema of the target Complex File dataserver. Since we want this Procedure to be executed *after* the Mapping, but in the same Session (so that the data is still available), we shall put the Mapping and the Procedure into a Package.


Execute this Package. The contents of our initial JSON file will get written out to the new JSON file. You will notice that the pseudo-root node "tweet" is absent from the result.

This is a basic example of reading (and writing) a JSON file using Oracle Data Integrator. More complex flows can be created, and the source/target of the data can be changed. The main takeaways are the pseudo-root and the fact that the Complex File technology works the same way as the XML technology, accepting the same properties and commands.

Note:

In the mapping shown above, all the target tables are loaded in the same mapping. This relies on the Parallel Target Table Load feature of ODI and is subject to all of its limitations. Please verify that your target tables are indeed capable of being loaded in parallel; otherwise you will need to create one mapping per XML hierarchy level.