Configuration Driven Interactions with watsonx Assistant

Tony Hickman
13 min read · Sep 2, 2024


Following on from my last article discussing Conversing Naturally with Assistants, I am going to focus on how the Cora Future Initiatives team, my IBM Client Engineering colleagues and I implemented the patterns described by Bob Moore.

We started by looking at an implementation of the patterns Bob had created. Bob implemented his framework in both Dialog and Actions, but in both cases the approach to implementing a new “interaction” was based around duplicating code.

When we looked deeper we could see that the changes that were needed (and which caused the code to be duplicated) were in a common set of configuration items within each pattern block. This got me thinking…

What if we abstracted this configuration away and blended it in at execution time?

I had already explored this a little in some previous work around managing assistant responses from Node-RED and slot filling. Given that both were focused on Dialog as the watsonx Assistant execution layer, and that the preference from the Cora team at the time was to focus the initial work on Dialog, this was the base we started to build on.

Common Elements

We looked across all the patterns to discover common configuration elements. Based on this work we found the following elements:

  • Clarification — What to ask the user if the assistant is unsure it has located the correct “intent”
  • Response — What to say to the user based on the question they have asked
  • Repeat — How to repeat the response should the user request it, e.g. if they say something like “Can you say that again?”
  • Example(s) — Where a response may cover a range of areas / items etc., what to say if the user asks for some examples. For example, the assistant may respond to the question “What movies do you like?” with “I like movies with a strong AI lead”. A follow-up question from the user may be “Interesting, can you give me some examples?”, to which the assistant could respond “Oh, I enjoyed Ironman, Wall-E and Her”
  • Paraphrase — Another way to word the response for the cases where the user doesn’t understand. So, in the previous example the user may respond to the assistant’s first response with “I’m sorry I don’t get what you mean”, in which case the assistant could say “So I enjoy films that have computer characters in them”

By dynamically setting these elements we found we could implement a common dialog flow for the User Inquiry pattern and the supporting sequence-level management patterns. Building on the JSON-driven approach I covered in the articles referenced above, we defined the following base JSON structure.

{
    <Intent Name>: {
        <Persona>: {
            "clarify": <clarification message>,
            "response": <response message>,
            "repeat": <repeat message>,
            "example": <examples message>,
            "paraphrase": <paraphrase message>
        }
    }
}

Let’s break this down a bit…

Firstly, the Intent Name. This is the linkage between watsonx Assistant and the configuration items. When watsonx Assistant identifies an “Intent” this can be passed to the configuration layer to locate the appropriate configuration JSON. Next, we have the concept of a Persona. The idea here is that the response payload could be altered for different personas. In the end we didn't exploit this, but it does open up an interesting opportunity around using an LLM to generate variations of the JSON based on persona types. This content could then be reviewed by a human before being released to the assistant. Finally, we have the actual configuration information.
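To make the lookup concrete, here is a minimal sketch of how a configuration layer might resolve an intent name and persona to a configuration block, falling back to a default persona when no persona-specific entry exists. The function and intent names here (loadInteractionConfig, A1-someInquiry) are my own illustrative stand-ins rather than part of the framework.

// Hypothetical in-memory configuration store, keyed by intent name and then persona.
const interactionConfig = {
  "A1-someInquiry": {
    "default": {
      clarify: "Is this what you are asking about?",
      response: "Here is the answer to your question.",
      repeat: "To repeat: here is the answer to your question.",
      example: null,
      paraphrase: "Put another way, this is the answer."
    }
  }
};

// Resolve the configuration block for a detected intent and (optional) persona,
// falling back to the "default" persona when no specific block exists.
function loadInteractionConfig(intentName, persona) {
  const entry = interactionConfig[intentName];
  if (!entry) return null; // no configuration defined for this intent
  return entry[persona] || entry["default"] || null;
}

console.log(loadInteractionConfig("A1-someInquiry", "smallBusiness")); // falls back to "default"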

Integrating with the “backend”

With the JSON structure defined, the next task was to build out a skill in watsonx Assistant to load this configuration at runtime. As we were building using Dialog we used a Webhook which targeted a Node-RED endpoint (anyone who has read my other posts will know that this is my go-to rapid backend prototyping tool). As we wanted to keep things as generalised as possible we used my previous approach of treating the Node-RED endpoint as a “router” to which we pass the whole watsonx Assistant skill context. This allows execution decisions to be handled in the Node-RED defined configuration engine.
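As a rough illustration of the “router” idea, the sketch below shows the kind of logic that could sit behind the Node-RED endpoint (for example in a function node): the whole skill context arrives in the request payload and a routing field is used to decide which handler runs. The handler names are hypothetical stand-ins, and the module and key fields are described in more detail later in the post.

// Hypothetical handlers, one per pattern family (stubs for illustration only).
const handleUserInquiry = (key, context) => ({ pattern: "A1", key });
const handleOpenRequest = (key, context) => ({ pattern: "A2", key });
const handleExtendedTelling = (key, context) => ({ pattern: "A3", key });

// Minimal sketch of the "router": execution decisions live here, not in the Dialog skill.
function route(payload) {
  const { module, key, context } = payload;
  const handlers = { A1: handleUserInquiry, A2: handleOpenRequest, A3: handleExtendedTelling };
  const handler = handlers[module];
  return handler ? handler(key, context) : { error: "No handler for module " + module };
}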

Creating the skill

We next focused on the actual skill flow. Building on Bob’s initial implementation we were able to create a set of Dialog nodes to cover the conversation-level (C), sequence-level (B) and user inquiry (A1) patterns. With this in place we looked at Intent processing. As we would have a range of patterns at the conversational activities layer (the A patterns, which include A1), we opted for a naming convention for the Intents where the intent name is prefixed with the pattern identifier, e.g. A1 in this initial test case.

So a simple “how do I order a card reader” user inquiry JSON would look like:

    "A1-orderCardReader": {
"default": {
"clarify": "Do you need to order a new card reader?",
"response": "No problem. Your new card reader has been ordered and will be with you within 7 working days.",
"repeat": "Your new card reader has been ordered and will be with you within 7 working days.",
"example": null,
"paraphrase": "I have arranged for a new card reader to be sent to your home address. It should arrive within the next 7 working days."
}
}

With a matching A1-orderCardReader intent configured in watsonx Assistant.

Order Card Reader Intent

In order to pick up intents for particular “patterns” we implemented the concept of a “catcher” within our dialog skill.

A1 Intent catcher

There are some extra control variables in the “If assistant recognizes” section which I’ll get to later, but the key point here is that we used a startsWith function to see if the top detected intent was marked as being built on the A1 pattern. If the intent does match, the flow continues to the Webhook call to load the configuration.

Load Configuration

Here you can see that we pass several parameters to the Webhook (the sketch after this list shows the rough shape of the resulting request body).

  • key : set to the variable $current_interaction, which in turn is set to the highest-confidence intent name in the catcher
  • type : set to intent, but as development progressed this became redundant (originally we were considering whether we could detect using entities, but this approach proved to be unnecessary)
  • module : defines the “pattern” identifier to help locate configuration information. This could potentially be factored out and the start of the key used instead, but for now we’ve kept it in place
  • context : the current context from watsonx Assistant. This allows the backend code to access the full range of variables / data in the context. One use case we explored was using this to pass in the persona to use. Generally, I find it easiest to pass in the entire context and use backend code to locate / map required data, as this removes the need to expose more backend implementation details to watsonx Assistant and makes the calls highly re-usable. [Moving forward we will probably revisit this to optimise what is sent.]
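Pulling those parameters together, the webhook request body ends up looking roughly like the sketch below (shown as a JavaScript object literal; the context contents are illustrative only).

// Illustrative shape of the payload sent to the Node-RED webhook.
const webhookBody = {
  key: "A1-orderCardReader",   // $current_interaction: highest-confidence intent name
  type: "intent",              // retained from early experiments, now redundant
  module: "A1",                // pattern identifier used to locate configuration
  context: {                   // full skill context passed through unchanged
    current_interaction: "A1-orderCardReader",
    persona: "default"         // e.g. one use case explored: selecting the persona
    // ...any other skill variables
  }
};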

Finally, we set up the Webhook for the skill to target a Node-RED flow. With all this in place we could open the “Try it” panel and test the interaction. Doing this we could see that the configuration was picked up and executed correctly. Given this successfully proved the concept, we moved on to look at the other A patterns. Based on the style of engagements we could envisage, we chose to focus on the Extended Telling (A3) and Open Request (A2) patterns.

A3 Pattern — Storytelling

In the case of the Extended Telling (A3) pattern we could see that, at its core, it is an array of A1-style responses, each providing part of the information that needs to be conveyed to the customer. This allows a large block of information or instructions to be decomposed into a set of manageable “chunks”. Based on this we created the following JSON structure.

{
    "A3-creditCards-explore": {
        "default": {
            "clarify": "Do you want to learn more about credit cards?",
            "parts": 3,
            "storyParts": [
                {
                    "confirm": null,
                    "response": "Credit cards allow you to make payments with money you borrow from the bank.",
                    "repeat": "Credit cards allow you to make payments with money you borrow from the bank.",
                    "example": "The average annual interest rate on credit cards has risen to 24.59%",
                    "paraphrase": "You make payments out of an account different from your current account."
                },
                {
                    "confirm": null,
                    "response": "You get a statement each month to pay off.",
                    "repeat": "You get a statement each month to pay off.",
                    "example": null,
                    "paraphrase": "You need to make payments regularly."
                },
                {
                    "confirm": null,
                    "response": "Finally, you pay interest on what's left on your balance.",
                    "repeat": "Finally, you pay interest on what's left on your balance.",
                    "example": null,
                    "paraphrase": "You incur a cost called interest for borrowing the money, which needs paying back."
                }
            ]
        }
    }
}

As you can see, we declare how many “parts” there are in the response, and within watsonx Assistant we use this as a control variable to allow us to iterate over the collection of responses. You may also spot that confirm is always set to null. This is because there is no need to confirm that we have the correct understanding of the customer’s intent; we have already understood it and are now relaying information. A future refinement would be to look at removing this field.
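To illustrate how the parts count drives the flow, here is a minimal sketch of the iteration. In the real skill this loop is expressed with Dialog nodes and context variables, so treat the JavaScript below purely as a description of the behaviour.

// Conceptual sketch of walking through an A3 interaction one "chunk" at a time.
function tellStory(config, say) {
  const { parts, storyParts } = config;
  for (let i = 0; i < parts; i++) {
    const part = storyParts[i];
    say(part.response);
    // At each step the user could instead ask for a repeat, an example or a paraphrase,
    // which would be served from part.repeat / part.example / part.paraphrase.
  }
}

// Example usage with a cut-down configuration.
tellStory(
  { parts: 2, storyParts: [
      { response: "Credit cards allow you to borrow from the bank." },
      { response: "You get a statement each month to pay off." }
  ]},
  (text) => console.log(text)
);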

Again, once we had the trigger intent defined in watsonx Assistant and the JSON content in Node-RED, the conversation flowed as expected.

A2 Pattern — Open Request

Both the A1 and A3 patterns are focused on imparting information to the customer, whereas the A2 pattern (Open Request) is focused on collecting data from the customer. This was our next area of focus.

At its heart A2 is a list of data points which need to be collected from a customer, but when we looked at it in more detail we identified two distinct execution models:

  • A predefined set of details which all need to be collected (e.g. to get a balance you may always need to ask for an account number and sort code)
  • A set of details which can only be determined at the point of capture, where only a subset of the details may need to be collected (e.g. to cancel a payment, the information needed for a cheque payment is different from that needed for a funds transfer, and in the case of a funds transfer the details may vary depending on whether the transfer is between the person’s own accounts or to a different person)

We came to name these A2 Static and A2 Dynamic respectively. In both cases the JSON structure we defined was the same, but the execution flow was different. The following is an example of an A2 Static configuration.

"A2-balanceInquiry": {
"action": {
"type": "dialog",
"name": "A1-inquiry-balance"
},
"default": {
"confirm": "I can help you with that.",
"clarify": "Do you want me to get your balance for you?",
"response": "I can get your balance for you",
"repeat": "I can get your balance for you",
"example": null,
"paraphrase": "Sure I can do that",
"verify": true,
"complete": "Thank you thats all I need.",
"dataCapture": {
"numberOfItems": 2,
"dataToCapture": [
{
"variableName": "account_number",
"description": "account number",
"justification": "I need to know which account to look at.",
"confirm": null,
"response": "What is your account number?",
"repeat": "What is your account number?",
"example": "It's a 10-digit number, for example 1748295736.",
"paraphrase": "What is the 10-digit number associated with your account?",
"type": "A2-account-number",
"collected": false
},
{
"variableName": "sort_code",
"description": "account sort code",
"justification": "I need the sort code to locate the account.",
"confirm": null,
"response": "What is your account sort code?",
"repeat": "What is your account sort code?",
"example": "It's three two digit numbers with hyphens between, for example 50-29-44.",
"paraphrase": "What is your 6-digit sort code?",
"type": "A2-account-sortcode",
"collected": false
}
]
}
}
}

Again, let's break this down a bit. At the start of the JSON, we have an action defined. This allows the framework to know what to do once the data collection has completed. In the example above, the direction is to progress to the A1-inquiry-balance interaction. As this new interaction would then need to do the actual look-up, we extended the capability of the A1 pattern to support executing a REST call and blending data into the response. This allowed us to assign all the “knowledge” about how to look up and respond with an account balance to the one interaction, A1-inquiry-balance. In this way, if another interaction needs to display an account balance it can re-use the A1-inquiry-balance interaction.
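The exact blending mechanism is beyond the scope of this post, but conceptually it looks something like the sketch below: the backend makes the REST call and then substitutes the returned values into placeholders in the configured response text. The endpoint URL, the {{...}} placeholder syntax and the field names are all illustrative assumptions, not the framework's actual definitions.

// Hypothetical sketch of an A1 interaction that calls a REST service and blends
// the result into its configured response.
async function runBalanceInquiry(context) {
  const config = {
    response: "The balance on account {{account_number}} is {{balance}}."
  };

  // Call the banking service using data gathered by the preceding A2 interaction.
  const res = await fetch("https://api.example.com/accounts/" + context.account_number + "/balance");
  const data = await res.json(); // e.g. { balance: "£1,234.56" }

  // Blend REST data and context values into the configured response text.
  const values = { ...context, ...data };
  return config.response.replace(/{{(\w+)}}/g, (_, name) => values[name] ?? "");
}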

After the action definition you can see the normal configuration for interacting with the customer to introduce the data collection phase. We added a couple of extra features:

  • The ability to verify the data that has been collected
  • An acknowledgement when all data has been collected (and, if necessary, verified)

Next, we have the definition of what we need to collect from the customer. This is represented as an array of JSON objects, one per data collection requirement, plus numberOfItems, which watsonx Assistant uses to iterate over the collection array. Focusing on the JSON object defining the collection of a single piece of data, you can see the normal interaction configuration but in addition we have information about the data to collect (a sketch of the selection logic follows this list):

  • The name of the variable which will hold the information.
  • The type of data to collect which allows us to decouple the variable from the collection and utilise Entities in watsonx Assistant in a more generic way to capture the data.
  • A flag to indicate if the data has been collected. This is used at execution time within watsonx Assistant and so is always initially set to false
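As referenced above, here is a rough sketch of the selection logic: find the next item that still needs capturing and ask its configured question. In the skill itself this is done with Dialog nodes and context variables; the helper below simply describes the behaviour.

// Conceptual sketch: pick the next data item that still needs to be captured.
function nextItemToCapture(dataCapture) {
  return dataCapture.dataToCapture.find(item => item.collected === false) || null;
}

// Example: with account_number already collected, the sort code is asked for next.
const exampleCapture = {
  numberOfItems: 2,
  dataToCapture: [
    { variableName: "account_number", response: "What is your account number?", collected: true },
    { variableName: "sort_code", response: "What is your account sort code?", collected: false }
  ]
};
console.log(nextItemToCapture(exampleCapture).response); // "What is your account sort code?"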

Memory

So, we have a definition in the JSON of what we need and, via type, of the way in which to collect it. Implementing the capture process was pretty straightforward, and we re-used the previous work I had done (see references above), but it did open up an interesting question around how to avoid asking for information that we already knew. This led us to create two memory areas within watsonx Assistant:

  1. A short-term memory area used at the point of collecting information
  2. Longer term memory where data is held for the duration of the interaction

In this way we could pre-load information about the customer into the “longer term” memory and use a “sweeper” process to snap in any data before starting the collection process. Where a variable is found in the “longer term” memory (by variable name), the value is added to the “short term” memory and the collected flag set to true. In this situation the collection process in watsonx Assistant would “skip” asking for the data as it is already known.

Putting this all in place worked a treat but did raise another question around the “volatility” of the data in the “longer term” memory. For example, could we rely on an account number and sort code remaining the same for all interactions? With this in mind we implemented a feature to allow a variable to be flagged as “always ask”, in which case any pre-populated data would be ignored and the customer would be asked to provide the information. This could feel a little clunky, but this is where the Dynamic collection can really help.
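Before moving on, here is a minimal sketch of the “sweeper” idea: any item whose variable already exists in the longer-term memory is copied into the short-term memory and marked as collected, unless it is flagged as always-ask. The alwaysAsk field name is my own shorthand for the flag described above.

// Sketch of the "sweeper": pre-fill data items from longer-term memory so the
// skill can skip asking for them. alwaysAsk is a stand-in name for the "always ask" flag.
function sweep(dataToCapture, longTermMemory, shortTermMemory) {
  for (const item of dataToCapture) {
    const known = longTermMemory[item.variableName];
    if (known !== undefined && !item.alwaysAsk) {
      shortTermMemory[item.variableName] = known; // snap the value into short-term memory
      item.collected = true;                      // the skill will now skip this question
    }
  }
}

// Example: the account number is already known, the sort code is not.
const items = [
  { variableName: "account_number", collected: false },
  { variableName: "sort_code", collected: false }
];
const longTerm = { account_number: "1748295736" };
const shortTerm = {};
sweep(items, longTerm, shortTerm);
console.log(items.map(i => i.variableName + ": " + i.collected)); // account_number: true, sort_code: false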

So, moving on to A2 Dynamic… An A2 Dynamic interaction is indicated by an additional JSON parameter in the configuration:

"A2-addNewPayee": {
"action": {
"type": "dialog",
"name": "A1-inquiry-balance"
},
"captureType": "dynamic",
"default": {
"confirm": "I can help you set up a new payee."...

On top of this we needed to indicate what data to collect initially, so we could then determine what to ask for next or conclude that collection was complete. This was handled by extending the data collection element JSON as follows:

        "dataCapture": {
"numberOfItems": 4,
"dataToCapture": [
{
"variableName": "payment_size",
"description": "the size of the payment",
"justification": "The process of setting up a new payee is different for large payments.",
"confirm": false,
"response": "How large is the payment?",
"repeat": "Can you tell me how large your payment is?",
"example": "Is the payment small (up to £1000), medium (up to £5000) or large (more than £5000)?",
"paraphrase": "Is the payment small (up to £1000), medium (up to £5000) or large (more than £5000)?",
"type": "A2-payment-size",
"collected": false,
"required": "always"
}...

Here you can see an additional field, required, which takes one of three values:

  • always : the item must always be asked for; this indicates the initial data to request
  • no : the data is not required to be collected. In its initially defined state, all items that are not always required are set to no
  • yes : the data is required to be collected. This value is set during the evaluation phase of A2 Dynamic, which uses the currently collected data to assess what to collect next

How to decide what to ask

So I guess the next question is “how do we determine what to collect next?” This is where we extended watsonx Assistant to leverage IBM Advanced Decision Services. We created a way to allow a Dynamic interaction to be “drawn” and from this generate a set of rules that allow watsonx Assistant to determine what to ask for next. The details of this are more involved and deserve a complete post of their own, but suffice to say using this allowed us to very quickly define and implement complex data collection interactions. At its core the approach relies on dynamically updating the configuration JSON to mark which data needs to be collected next, or to change the captureType to dynamic-completed when the rules have determined that the appropriate level of information has been collected.

The final bit of “secret sauce” for A2 Dynamic is in its ability to determine “what next” after collection has completed. Again, as we worked through a number of example interactions it became clear the “target” interaction that should follow could vary based on the data that has been collected. To support this the “target” interaction to use when collection had completed was moved to the rules execution layer. Here when the rules determine all necessary data has been collected, the target interaction to move to is returned as part of the response to watsonx Assistant. Again, the definition of this “target” is generated from the “drawn” interaction flow.
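As a rough illustration of those last two points, the sketch below shows the kind of update the rules’ output drives: mark the items that now need collecting, switch captureType to dynamic-completed when enough has been gathered, and pass the target interaction back to watsonx Assistant. The shape of the rule result (ask, complete, nextInteraction) is a simplification I have invented for illustration, not the actual Advanced Decision Services response.

// Sketch of applying a rules-engine decision to an A2 Dynamic configuration.
// The ruleResult shape (ask / complete / nextInteraction) is illustrative only.
function applyRuleResult(config, ruleResult) {
  for (const item of config.default.dataCapture.dataToCapture) {
    if (ruleResult.ask.includes(item.variableName)) {
      item.required = "yes"; // the rules have decided this item now needs collecting
    }
  }
  if (ruleResult.complete) {
    config.captureType = "dynamic-completed"; // nothing more to collect
  }
  // The interaction to move to next is returned to watsonx Assistant alongside the
  // updated configuration rather than being hard-coded in the JSON.
  return { config, next: ruleResult.nextInteraction || null };
}

// Example decision: the payment is "large", so extra payee details are now required.
const decision = { ask: ["payee_sort_code", "payee_account_number"], complete: false };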

Summary

With all this in place we were able to demonstrate the ability to:

  1. Allow business / non-technical people to represent new interactions and define the conversational content to be used when the interaction is run
  2. Create a highly modular execution framework in watsonx Assistant which leverages the power of watsonx Assistant but allows the level of required development work for new interactions to be significantly reduced
  3. Leverage a rules engine to support the determination of data collection paths at execution time

Since completing this initial work, we have run a number of follow-on MVPs, including moving the framework from Dialog to Actions, and I plan to cover this in a future post.


Tony Hickman

I've worked for IBM all of my career and am an avid technologist who is keen to get his hands dirty. My role affords me this opportunity and I share what I can.