Driving Watson Studio Jobs From Cognos

Tony Hickman
9 min readFeb 1, 2023

--

Over the last few years I have been working on architectures which leverage Watson Studio to process source data and populate Db2 with enriched data. This enriched data is then access and analysed by business users using Cognos. This pattern works extremely well but there are situations where the business users generate report data which they want to feedback into Watson Studio for further processing. Given this challenge I set about figuring out a way to achieve this and my findings are the subject of this post.

The first thing I wanted to figure out was how could I share data between Cognos and Watson Studio. This turned out to be quite simple as Cognos supports the ability to write “Job” output to a Cloud Object Storage. To set this up first I needed to add “Storage” to Cognos. This is done via the “Manage” option from the top left hand corner menu.

Manage menu

This provides access to the “Storage” option.

Storage configuration option

From here a Cloud Object Storage endpoint can be configured.

Initial configuration

Once the Cloud Object Storage has been configured it can be used in a Cognos “Job” as part of the “Delivery” options.

Delivery save to cloud

Expanding the “Save to cloud” option allows the name of the file for the report to be saved under and the Cloud Object Storage in which to save it to be set.

Report storage to Cloud Object Storage

With that set up I have a “Job” which runs to execute a report and writes the output to the Cloud Object Storage. This means that in a Watson Studio notebook I can easily access the report data via the Cloud Object Storage service.

Given the “sharing data” from Cognos aspect was covered I then moved on to looking at how to initiate “jobs” within Watson Studio. Again after a quick search I landed on the documentation for the the API’s I needed. Using this information I set about creating a set of NodeRED flows to do what I needed. The first bit to tackle was to create a flow to generate a “Bearer Token” to allow the lower level API’s to be called.

Generate Bearer Token

This flow uses an access API key generated via the IBM Cloud Console (I won’t cover in detail here) which is passed into the flow using an operating system environment variable (accessed in the set node). To generate the token I use an HTTP Request node to call https://iam.cloud.ibm.com/identity/token passing the following request payload grant_type-urn:params:oauth:grant-type:apikey&apikey=<my cloud API Key>. Once generated I store the “Bearer Token” in a global variable within NodeRED. As the token expires I set my inject node to tigger when the flow is loaded and then every 15mins to ensure its refreshed.

With the “Bearer Token” management handled I move my focus to creating flows to manage “Jobs” inside Watson Studio. Reading the API specification it was clear that any calls are scoped by “Projects” and so I needed to specify a “Project Id” to be used for each call. As with the Cloud API Key I took the approach to pass this in via an environment variable.

Set up flow variables

Again I set this flow to trigger once when the flow loads and it copies the project ID from the environment variable and stores it in flow.projectID so I can access it in other flows. I also copy the project ID to msg.payload purely to allow the debug node to display the value to support debugging if needed.

I then created a flow to start a “Job Run” for a given “Job”. Jobs have a unique “asset id” so this was what I decided to use as my “selector”.

Start job flow

In this flow you can see that I am using a number of “link call” nodes to support re-use within my flows. The flow is triggered by an HTTP Get request and takes in a query parameters named jobName , name, and description. It then calls the “Get Jobs” flow.

Get Jobs flow

This flow calls the “Setup” flow (which I will cover soon) and then calls the “Get Jobs” API https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs?project_id=<project id>. It should be noted that I am working against a UK instance of Watson Studio so the API would need to be adjusted for other regions (something I will look to parameterise at some point). Back in the “Start Job” flow, the list of jobs is then searched for the one which matches the passed jobName . Once found the flow pulls out the jobs “asset id” and calls the “Setup” flow (again I’ll cover that soon). It then sets up a “Job Run” payload as follows:

{ 
"job_run" : {
"name" : <name query parameter>,
"description": <description query parameter>,
}
}

and calls the “Start Job Run” API https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs/<asset id>/runs??project_id=<project id> . The <asset id> is the one that matches to the Job we found earlier in the flow. Finally the flow generates a response string and returns it to the requester. Going back to the “Setup” flow it is defined as follows.

Setup Flow

This flow pulls out the “Bearer Token” from the global context and the “Project ID” from the flow context and stores them in the msg object. Finally it sets up msg.headers to contain the “Bearer Token”

msg.headers = {
Authorization: msg.access_token
}

To support better visibility of “Job Runs” within Cognos I decided to add functionality to store the “Job Run” information in Db2 as this can be easily displayed in Cognos. This is where the final part of my “Start Job” flow comes in. Following the completion of the API call to start the job I capture the details and link to another flow to add the details to Db2. This lead me to create another set of flows to handle interaction with Db2 which I’ll cover now.

The first thing to point out is that I am not using any of the Db2 NodeRED nodes in the catalog but rather using the ibm_db2npm module (this requires the Db2 client to be installed in the underlying operating system where NodeRED is running). To use this I updated the NodeRED settings.js file to define a functionGlobalContextto import the library.

functionGlobalContext: {
// os:require('os'),
DB2:require('ibm_db'),
},

The first thing was to set up a connection to my Db2 instance. As this could change I again took the approach of reading the necessary configuration from environment variables.

Connect to Db2

Expanding the “Connect to Db2” function node.

// Get Db2 library from global context - Added via settings.js
const Db2 = global.get("DB2");

// Defined new database
const conn = new Db2.Database()

// get connection parameters
const host = msg.db_host;
const port = msg.db_port;
const user = msg.db_user;
const password = msg.db_password;
const database = msg.db_database;

// Create connection String
const cn = "DATABASE="+database+";HOSTNAME="+host+";PORT="+port+";PROTOCOL=TCPIP;UID="+user+";PWD="+password+";Security=SSL;"; // ;sslConnection=true

// open a connection to the database
conn.open(cn);

flow.set("db_connection", conn)

return msg;

This node uses the environment information which is copied in to the msg object and the connection is then stored in a flow variable named db_connection. The flow is set with an inject node which is triggered on flow load so the connection is always active.

With the connection set up I focused on writing data to the database table JOB_CONTROLwhich I had created. The flow to do this was pretty simple.

Store job in Db2

All the logic is contained in the “Function” node and is set to.

var conn = flow.get("db_connection");

console.log("Add row to table");

var insert = "insert into JOB_CONTROL values ('" +
msg.payload.job_date + "','" +
msg.payload.job_ref + "','" +
msg.payload.job_id + "','" +
msg.payload.job_name + "','" +
msg.payload.job_type + "','" +
msg.payload.state +
"')";

console.log(insert);

var result = conn.querySync(insert);

msg.payload = result;

return msg;

This uses data passed in msg.payload to create and SQL Insert statement which is then executed to inset the row.

That covered capturing the Job information but to make it really useful I needed to update the stored data with the latest state of the “Job Run” . To handle this I created two additional flows, one to retrieve data from Db2 table and another to use the to check the latest status of the job. This first flow is very simple.

Get jobs from Db2

Again the functionality here is contain in the “Function” node.

var conn = flow.get("db_connection");

console.log("Get Rows");

var rows = conn.querySync("select job_date, job_ref, job_id, job_name, job_type, state from JOB_CONTROL");

msg.payload = rows;

return msg;

I could optimise the SQL call here to exclude any jobs that have completed but for now I filter those out in the “Update” flow. The “Update” flow is set to trigger every 1 minute and checks each job stored in Db2 to see if the status needs updating.

Update Job Flow

This flow uses a “Split” node to process each job held in the database. If the job is not in “Completed” state a call is made to the “Get Job” flow to retrive the latest details for the job. To get the status the “Get Job” flow is passed id’s for both the job and the job run which are stored in the Db2 table as JOB_ID and JOB_REF. The results of the call to this flow are then passed into a “Function” node to update the state information held in Db2.

var conn = flow.get("db_connection");

console.log("Update row to table");

var update = "update JOB_CONTROL set STATE='" +
msg.payload.entity.job_run.state + "' WHERE JOB_REF='" +
msg.job.JOB_REF +
"';";

console.log(update);

var result = conn.querySync(update);

msg.payload = result;

return msg;

The “Get Job” flow is very simple.

Get Job flow

This wrappers a call to the API https://api.eu-gb.dataplatform.cloud.ibm.com/v2/jobs/<job asset id>/runs/<run asset id>?project_id=<project id>

After all that I had a set of flows that would…

  1. Allow me to start a job in Watson Studio and store the details in a Db2 table
  2. Automatically update the status of the job in the Db2 table

now I needed to look at linking this back into Cognos.

Based on previous work I’d done with Cognos I was aware that webpages can be added to Dashboards using a “Webpage Widget”. Following this approach I decided to create a simple Webpage in NodeRED to allow a job to be started which could then be included in a dashboard. This approach also allowed me to display the information in the Db2 table to show the status of any submitted jobs. The flow to present the Webpage was very simple and I used a “Template” node to hold my Webpage code.

Webpage flow

I set the Webpage to be hard wired into a single job in my project and it just displayed a button to allow me to start the job. The following shows how this looks in a Cognos Dashboard.

Cognos Dashboard

Going forward I would like to pull back a list of the jobs and provide options to not only start any job but also specify the name and a description and potentially any parameters defined for the job. In the Dashboard the “Job Run Status” table was set to auto refresh every 60 seconds. In order to access the Db2 table I needed to do the normal configuration in Cognos but thats very straight forward.

So that's it, I now have a way drive and share data with Watson Studio Jobs from Cognos.

--

--

Tony Hickman

I‘ve worked for IBM all of my career and am an avid technologist who is keen to get his hands dirty. My role affords me this opportunity and I share what I can