Driving Watson Studio Jobs From Cognos
Over the last few years I have been working on architectures which leverage Watson Studio to process source data and populate Db2 with enriched data. This enriched data is then accessed and analysed by business users through Cognos. This pattern works extremely well, but there are situations where the business users generate report data which they want to feed back into Watson Studio for further processing. Given this challenge I set about figuring out a way to achieve this, and my findings are the subject of this post.
The first thing I wanted to figure out was how I could share data between Cognos and Watson Studio. This turned out to be quite simple as Cognos supports writing “Job” output to Cloud Object Storage. To set this up I first needed to add “Storage” to Cognos. This is done via the “Manage” option from the top left-hand corner menu.
This provides access to the “Storage” option.
From here a Cloud Object Storage endpoint can be configured.
Once the Cloud Object Storage has been configured it can be used in a Cognos “Job” as part of the “Delivery” options.
Expanding the “Save to cloud” option allows you to set the file name under which the report output will be saved and the Cloud Object Storage in which to save it.
With that set up I have a “Job” which executes a report and writes the output to Cloud Object Storage. This means that in a Watson Studio notebook I can easily access the report data via the Cloud Object Storage service.
Given the “sharing data” from Cognos aspect was covered, I then moved on to looking at how to initiate “jobs” within Watson Studio. Again, after a quick search I landed on the documentation for the APIs I needed. Using this information I set about creating a set of NodeRED flows to do what I needed. The first bit to tackle was a flow to generate a “Bearer Token” to allow the lower-level APIs to be called.
This flow uses an API key generated via the IBM Cloud console (which I won't cover in detail here) which is passed into the flow using an operating system environment variable (accessed in the set node). To generate the token I use an HTTP Request node to call https://iam.cloud.ibm.com/identity/token, passing the following request payload: grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<my cloud API Key>. Once generated I store the “Bearer Token” in a global variable within NodeRED. As the token expires, I set my inject node to trigger when the flow is loaded and then every 15 minutes to ensure it is refreshed.
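As a rough sketch (not the exact flow), a single function node preparing the request for the HTTP Request node might look like this. The IBM_CLOUD_API_KEY environment variable name is my own placeholder, and in the real flow the key is copied in by the set node:

// Prepare the IAM token request for the following HTTP Request node.
// IBM_CLOUD_API_KEY is a placeholder for whatever environment variable holds the API key.
const apiKey = env.get("IBM_CLOUD_API_KEY");
msg.url = "https://iam.cloud.ibm.com/identity/token";
msg.headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/json"
};
msg.payload = "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=" + apiKey;
return msg;

A second function node after the HTTP Request node can then stash the token with something like global.set("access_token", msg.payload.token_type + " " + msg.payload.access_token) (again, the variable name is just a placeholder).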
With the “Bearer Token” management handled I moved my focus to creating flows to manage “Jobs” inside Watson Studio. Reading the API specification it was clear that all calls are scoped by “Projects”, so I needed to specify a “Project Id” for each call. As with the Cloud API key, I took the approach of passing this in via an environment variable.
Again I set this flow to trigger once when the flow loads; it copies the project ID from the environment variable and stores it in flow.projectID so I can access it in other flows. I also copy the project ID to msg.payload, purely so the debug node can display the value to support debugging if needed.
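The function node in this flow is tiny; a sketch of it might look like this (PROJECT_ID is my placeholder for whatever environment variable is actually used):

// Read the Watson Studio project ID from the environment and keep it in flow context.
// PROJECT_ID is a placeholder for the environment variable name actually used.
const projectID = env.get("PROJECT_ID");
flow.set("projectID", projectID);
// Copy it to the payload purely so the attached debug node can display it.
msg.payload = projectID;
return msg;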
I then created a flow to start a “Job Run” for a given “Job”. Jobs have a unique “asset id” so this was what I decided to use as my “selector”.
In this flow you can see that I am using a number of “link call” nodes to support re-use within my flows. The flow is triggered by an HTTP GET request and takes in query parameters named jobName, name, and description. It then calls the “Get Jobs” flow.
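The function node at the start of this flow only needs to lift the query parameters off the incoming request; a minimal sketch (the msg property names are my own placeholders) might be:

// Pull the query parameters off the incoming HTTP GET request
// (msg.req is populated by the HTTP In node).
msg.jobName = msg.req.query.jobName;
msg.jobRunName = msg.req.query.name;
msg.jobRunDescription = msg.req.query.description;
return msg;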
The “Get Jobs” flow calls the “Setup” flow (which I will cover soon) and then calls the “Get Jobs” API https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs?project_id=<project id>. It should be noted that I am working against a UK instance of Watson Studio, so the API host would need to be adjusted for other regions (something I will look to parameterise at some point). Back in the “Start Job” flow, the list of jobs is then searched for the one which matches the passed jobName. Once found, the flow pulls out the job's “asset id” and calls the “Setup” flow (again, I'll cover that soon). It then sets up a “Job Run” payload as follows:
{
    "job_run": {
        "name": <name query parameter>,
        "description": <description query parameter>
    }
}
and calls the “Start Job Run” API https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs/<asset id>/runs?project_id=<project id>. The <asset id> is the one for the job matched earlier in the flow. Finally the flow generates a response string and returns it to the requester. Going back to the “Setup” flow, it is defined as follows.
This flow pulls out the “Bearer Token” from the global context and the “Project ID” from the flow context and stores them in the msg object. Finally it sets up msg.headers to contain the “Bearer Token”:
msg.headers = {
Authorization: msg.access_token
}
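Pulling those pieces together, the job-matching and run-start logic of the “Start Job” flow could be sketched as a single function node along these lines. I'm assuming the “Get Jobs” response carries its job list under results with the id at metadata.asset_id, the msg property names are the same placeholders as above, and in the real flow the “Setup” link call supplies the headers:

// Find the job whose name matches the requested jobName and prepare the "Start Job Run" call.
// The msg property names and the assumed response shape are placeholders for sketch purposes.
const jobs = msg.payload.results || [];
const job = jobs.find(j => j.metadata.name === msg.jobName);
if (!job) {
    node.warn("No job found with name " + msg.jobName);
    return null; // stop the flow if the job cannot be found
}
const assetId = job.metadata.asset_id;
// Build the URL and payload for the "Start Job Run" API
msg.url = "https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs/" + assetId +
          "/runs?project_id=" + flow.get("projectID");
msg.payload = {
    job_run: {
        name: msg.jobRunName,
        description: msg.jobRunDescription
    }
};
return msg;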
To support better visibility of “Job Runs” within Cognos I decided to add functionality to store the “Job Run” information in Db2, as this can easily be displayed in Cognos. This is where the final part of my “Start Job” flow comes in. Following the completion of the API call to start the job, I capture the details and link to another flow to add them to Db2. This led me to create another set of flows to handle interaction with Db2, which I'll cover now.
The first thing to point out is that I am not using any of the Db2 NodeRED nodes from the catalog but rather the ibm_db npm module (this requires the Db2 client to be installed on the underlying operating system where NodeRED is running). To use this I updated the NodeRED settings.js file to define a functionGlobalContext entry to import the library.
functionGlobalContext: {
// os:require('os'),
DB2:require('ibm_db'),
},
The first thing was to set up a connection to my Db2 instance. As this could change I again took the approach of reading the necessary configuration from environment variables.
Expanding the “Connect to Db2” function node.
// Get Db2 library from global context - Added via settings.js
const Db2 = global.get("DB2");
// Create a new database object
const conn = new Db2.Database();
// Get the connection parameters copied into the msg object
const host = msg.db_host;
const port = msg.db_port;
const user = msg.db_user;
const password = msg.db_password;
const database = msg.db_database;
// Create the connection string
const cn = "DATABASE="+database+";HOSTNAME="+host+";PORT="+port+";PROTOCOL=TCPIP;UID="+user+";PWD="+password+";Security=SSL;"; // ;sslConnection=true
// Open a connection to the database synchronously, to match the querySync calls used later
conn.openSync(cn);
// Store the open connection in flow context for the other Db2 flows to use
flow.set("db_connection", conn);
return msg;
This node uses the environment information which is copied into the msg object, and the connection is then stored in a flow variable named db_connection. The flow is triggered by an inject node on flow load so the connection is always active.
With the connection set up I focused on writing data to the database table JOB_CONTROL, which I had created. The flow to do this was pretty simple.
All the logic is contained in the “Function” node and is as follows:
var conn = flow.get("db_connection");
console.log("Add row to table");
var insert = "insert into JOB_CONTROL values ('" +
msg.payload.job_date + "','" +
msg.payload.job_ref + "','" +
msg.payload.job_id + "','" +
msg.payload.job_name + "','" +
msg.payload.job_type + "','" +
msg.payload.state +
"')";
console.log(insert);
var result = conn.querySync(insert);
msg.payload = result;
return msg;
This uses data passed in msg.payload to create an SQL INSERT statement which is then executed to insert the row.
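The ibm_db querySync call also accepts binding parameters, so if escaping of the inserted values ever becomes a concern, a variant of the same function node using parameter markers might look like this (a sketch against the same JOB_CONTROL layout):

// Insert the job details using parameter markers instead of string concatenation
var conn = flow.get("db_connection");
var insert = "insert into JOB_CONTROL values (?, ?, ?, ?, ?, ?)";
var result = conn.querySync(insert, [
    msg.payload.job_date,
    msg.payload.job_ref,
    msg.payload.job_id,
    msg.payload.job_name,
    msg.payload.job_type,
    msg.payload.state
]);
msg.payload = result;
return msg;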
That covered capturing the job information, but to make it really useful I needed to update the stored data with the latest state of the “Job Run”. To handle this I created two additional flows, one to retrieve data from the Db2 table and another to use that data to check the latest status of each job. The first flow is very simple.
Again the functionality here is contained in the “Function” node.
var conn = flow.get("db_connection");
console.log("Get Rows");
var rows = conn.querySync("select job_date, job_ref, job_id, job_name, job_type, state from JOB_CONTROL");
msg.payload = rows;
return msg;
I could optimise the SQL call here to exclude any jobs that have completed, but for now I filter those out in the “Update” flow. The “Update” flow is set to trigger every minute and checks each job stored in Db2 to see if its status needs updating.
This flow uses a “Split” node to process each job held in the database. If the job is not in the “Completed” state, a call is made to the “Get Job” flow to retrieve the latest details for the job. To get the status, the “Get Job” flow is passed IDs for both the job and the job run, which are stored in the Db2 table as JOB_ID and JOB_REF. The results of the call to this flow are then passed into a “Function” node to update the state information held in Db2.
var conn = flow.get("db_connection");
console.log("Update row to table");
var update = "update JOB_CONTROL set STATE='" +
    msg.payload.entity.job_run.state + "' WHERE JOB_REF='" +
    msg.job.JOB_REF +
    "'";
console.log(update);
var result = conn.querySync(update);
msg.payload = result;
return msg;
The “Get Job” flow is very simple.
It wraps a call to the API https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs/<job asset id>/runs/<run asset id>?project_id=<project id>.
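Inside that flow, the function node building the request could be as simple as the following sketch (assuming the Db2 row travels on msg.job, as it does in the update function above, and that the “Setup” flow supplies the headers):

// Build the "Get Job Run" URL from the job and run ids held in the Db2 row (carried on msg.job)
msg.url = "https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs/" + msg.job.JOB_ID +
          "/runs/" + msg.job.JOB_REF +
          "?project_id=" + flow.get("projectID");
return msg;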
After all that I had a set of flows that would…
- Allow me to start a job in Watson Studio and store the details in a Db2 table
- Automatically update the status of the job in the Db2 table
Now I needed to look at linking this back into Cognos.
Based on previous work I’d done with Cognos I was aware that webpages can be added to Dashboards using a “Webpage Widget”. Following this approach I decided to create a simple Webpage in NodeRED to allow a job to be started which could then be included in a dashboard. This approach also allowed me to display the information in the Db2 table to show the status of any submitted jobs. The flow to present the Webpage was very simple and I used a “Template” node to hold my Webpage code.
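As a rough illustration, the button's click handler in that template could simply call the “Start Job” HTTP endpoint; the /start-job path and the parameter values below are placeholders for whatever the HTTP In node and job are actually configured with:

// Placeholder click handler for the "Start Job" button held in the Template node.
// The /start-job path and the parameter values are illustrative only.
async function startJob() {
    const params = new URLSearchParams({
        jobName: "my-watson-studio-job",               // hypothetical job name
        name: "Run from Cognos dashboard",             // name for this job run
        description: "Triggered from the webpage widget"
    });
    const response = await fetch("/start-job?" + params.toString());
    alert(await response.text());                      // show the response returned by the flow
}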
I set the Webpage to be hard wired into a single job in my project and it just displayed a button to allow me to start the job. The following shows how this looks in a Cognos Dashboard.
Going forward I would like to pull back a list of the jobs and provide options not only to start any job but also to specify the name, a description, and potentially any parameters defined for the job. In the dashboard, the “Job Run Status” table is set to auto-refresh every 60 seconds. In order to access the Db2 table I needed to do the normal configuration in Cognos, but that's very straightforward.
So that's it, I now have a way to drive Watson Studio Jobs from Cognos and to share data between the two.