Optional Missing Data without Certainty impact


When building Optional Missing Data patterns, if data is missing or a condition isn’t met, Rainbird will infer a result with a lower certainty level. If Rainbird needs to make this inference without lowering the certainty level, extra steps are required, which are described in this document.

Intent:

In summary, this pattern allows Rainbird designers to handle optional conditions or data without impacting the certainty of the inferences made.

When a Rainbird designer wants to build this pattern, they need to take into consideration the number of potentially missing conditions or data items. When only one condition or data item may be missing, the expected result can easily be achieved by creating one extra rule. However, the number of rules needed grows exponentially with the number of potentially missing conditions or data items. This pattern provides a specific solution for reducing the number of rules needed.

Applicability

This pattern is applicable when:

  • Answers have been skipped by a user in a Rainbird-driven interaction (allowUnknown enabled).
  • Data is not available from a datasource, from a stored fact in the Knowledge Map, or from injection into the model via the API.
  • Facts have not been inferred at run time (no fact inferred, or inferred with too low a certainty).
  • The missing facts are not essential to a decision being made.
  • The model is using uncertainty in its inference.
  • The missing data is not essential to infer a required fact.

And when:

  • The missing data/facts/conditions should not impact the certainty of the result

We will use the same example as the one described in the Optional Missing Data Pattern description: Insurance Liability Claim. Please read the Optional Missing Data pattern description for more details on this example.

Let us consider a company wanting to make an assessment of liability on a motor claim. The involved parties have completed a questionnaire with information on the incident’s circumstances. The model will produce a summary of the driver’s behaviour as being either ‘Appropriate’ or ‘Inappropriate’ in these circumstances, along with a confidence in these results.

The model includes the driver’s description of the road conditions (wet, dry, icy, muddy, …). We deem this data to be non-essential for Rainbird to make an inference on the driver’s behaviour (i.e. if the driver does not remember or has not included this information, we can still be reasonably confident about their behaviour being appropriate or inappropriate in the incident, based on other key factors).

By running the model enclosed in the .rbird export at the end of this article, we obtain the following result:

Optional Missing Data without CF impact Fig 1

Figure 1: Missing data (0% Certainty) in the Salience View

In Figure 1, we obtain a certainty of 67% that the driving was inappropriate. The missing 33% comes from the fact that the driver didn’t describe the road as being wet. This missing data impacts the result and decreases its certainty. If we don’t want the model to decrease the certainty level when this specific data is missing, we need to duplicate the rule shown in Figure 2 (with impact on certainty) and trigger one or the other of the rules shown in Figure 3 (without impact on certainty), depending on the availability of the data.

Optional Missing Data without CF impact Fig 2

Figure 2: Rule with impact on certainty

Building 2 Rules to remove missing data from certainty calculation:

Figure 3: In this case, the order of the rules in the relationship menu is important. This rule must come before the rule in Figure 4; otherwise Rainbird will always trigger the rule in Figure 4 and will never take into account the fact that the vehicle is travelling on a wet road.

Optional Missing Data without CF impact Fig 3

Figure 4: This rule will trigger if the condition “%VEHICLE travelling on Wet” is not met, because it is now a mandatory condition in the rule shown in Figure 3.

Optional Missing Data without CF impact Fig 4

With a model configured with the rules in Figure 3, we obtain the following result, shown in the Salience View (Figure 5), when we don’t know whether the car was travelling on a wet road.

Optional Missing Data without CF impact Fig 5

Note: The two conditions framed in purple are important: they trigger the second rule only if the Road Conditions are not Wet, that is, when this information is missing. If we don’t include these conditions, the second rule could be triggered when not needed. For example, it could be triggered when the first rule has a lower certainty level than the second; Rainbird would then prioritise the rule with the higher certainty level, giving the wrong output in this specific case.
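The two-rule pattern above can be sketched in Python. This is only an illustration of the idea, not Rainbird’s actual certainty calculation: the weighted-share formula, the weights (2 for speed, 1 for wet road) and all function names are assumptions, chosen so that the single-rule case reproduces the 67% shown in Figure 1.

```python
from typing import List, Optional, Tuple

Condition = Tuple[bool, int]  # (is the condition met?, hypothetical weight)


def certainty(conditions: List[Condition]) -> float:
    """Toy certainty (assumption): share of total weight carried by met conditions."""
    total = sum(weight for _, weight in conditions)
    met = sum(weight for ok, weight in conditions if ok)
    return met / total


def single_rule(speed_unsuitable: bool, road_wet: Optional[bool]) -> float:
    """One rule: a missing 'wet road' answer counts as an unmet condition."""
    return certainty([(speed_unsuitable, 2), (bool(road_wet), 1)])


def two_rules(speed_unsuitable: bool, road_wet: Optional[bool]) -> float:
    """Two rules: the wet-road condition only counts when it is known to be met."""
    if road_wet:
        # Rule 1: 'wet road' is mandatory and contributes to the certainty.
        return certainty([(speed_unsuitable, 2), (True, 1)])
    # Rule 2: guarded so it only fires when the road is not known to be wet;
    # the missing data is left out of the calculation entirely.
    return certainty([(speed_unsuitable, 2)])


# road_wet=None models a driver who skipped the question.
print(round(single_rule(True, None), 2))  # 0.67
print(two_rules(True, None))              # 1.0
```

In this toy model the guard in the second rule plays the role of the conditions framed in purple: it stops the second rule from firing when the wet-road data is actually available.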

If we would like to have 2 or more optional missing data items that don’t impact the certainty of the result, we can simply reproduce the pattern described above and create additional rules to cover all the different possible outcomes. For 2 optional missing data items we would need to create 4 rules:

  • 1st rule when all the data is available
  • 2nd rule when the first optional data is missing, but when the second optional data is available
  • 3rd rule when the first optional data is available, but when the second optional data is missing
  • 4th rule when both optional data are missing.

If we need 3 optional missing data items, we would need to create 8 rules. The number of rules is equal to 2^n, with n being the number of optional missing data items. This can quickly become overwhelming, which is why this section proposes another pattern to handle a large number of optional missing data items.
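To make the 2^n growth concrete, the following short Python sketch enumerates one rule per combination of available and missing data. The function name and the item labels are hypothetical and only serve the illustration.

```python
from itertools import product
from typing import Dict, List


def rule_combinations(optional_items: List[str]) -> List[Dict[str, str]]:
    """Return one entry per rule: each optional item is either available or missing."""
    states = ("available", "missing")
    return [
        dict(zip(optional_items, combination))
        for combination in product(states, repeat=len(optional_items))
    ]


# Two optional items give the 4 rules listed above.
for number, rule in enumerate(rule_combinations(["first item", "second item"]), start=1):
    print(number, rule)

# The count doubles with every extra optional item.
print(len(rule_combinations(["a", "b", "c"])))  # 8
```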

To illustrate this new pattern, we will take the same example as above and consider the driver’s visibility as another optional missing data item, on top of the road condition.

Instead of building 4 different rules on the same relationship, we will build 2 relationships with 2 rules each, as shown in Figure 6.

Optional Missing Data without CF impact Fig 6

Figure 6: Overview of the Rules to create (the Rule “Deemed to have” is only for the first pattern)

Before that, we group together all the conditions that will impact the certainty even if the data is missing. We call this rule “n0”. In our example, it only has one condition: the suitability of speed.

Optional Missing Data without CF impact Fig 7

Figure 7: Relationship and rule n0

Now we build another relationship in which we apply the same pattern described above: 2 rules, triggered depending on whether the data is available or missing. We call this relationship n1. We use the relationship n0 as a condition inside the rules of this new relationship, as described just below:

Optional Missing Data without CF impact Fig 8

Figure 8: Relationship and rule n1

This allows the model to treat the first piece of data as optional, without impacting the certainty when that data is missing. Now we just need to reproduce this pattern a second time with a new relationship we’ll call n2, which will handle our second optional missing data item, the driver’s visibility. This time we use the relationship n1 as a condition, which creates a chain of rules.

Figure 9: Relationship and rule n2

To obtain the output, the user needs to start a query on the relationship “n2”, which will in turn call the relationships “n1” and “n0”.
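The chain of relationships can be sketched as follows. This is a structural illustration in Python rather than Rainbird rules: the helper names are hypothetical, and assigning the road condition to n1 is an assumption (the text only states that n2 handles the driver’s visibility). Each relationship returns the list of conditions that would contribute to the certainty.

```python
from typing import Callable, Dict, List, Optional

Facts = Dict[str, Optional[str]]
Relationship = Callable[[Facts], List[str]]


def n0(facts: Facts) -> List[str]:
    """Conditions that always impact the certainty, even when the data is missing."""
    return ["speed suitability"]


def optional_link(previous: Relationship, key: str) -> Relationship:
    """Build the next relationship in the chain, with its two rules."""
    def relationship(facts: Facts) -> List[str]:
        contributing = previous(facts)  # calls the previous relationship in the chain
        if facts.get(key) is not None:
            # Rule 1: the data is available, so it contributes to the certainty.
            return contributing + [key]
        # Rule 2: the data is missing, so the previous result is passed on unchanged.
        return contributing
    return relationship


n1 = optional_link(n0, "road condition")  # assumption: n1 handles the road condition
n2 = optional_link(n1, "visibility")

# Querying n2 calls n1, which calls n0.
print(n2({"visibility": "poor"}))
# ['speed suitability', 'visibility']
```

Each additional optional missing data item adds one relationship with two rules to the chain, instead of doubling the number of rules on a single relationship.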

Note: With only 2 optional missing data items, this second pattern is not more time-efficient than the first, as we are building the same number of rules. However, it is clearer. It also becomes quicker to build when dealing with 3 or more optional missing data items.

From a theoretical point of view, we can illustrate the pattern as shown below in Figure 10.

First pattern, with 1 optional missing data:

Optional Missing Data without CF impact Fig 10

Figure 10: First pattern schema description

In rule 1, the certainty will be impacted by the condition that triggers the rule (e.g. “the road is Wet”) and by the other set of conditions (e.g. “speed is not suitable”), while in rule 2, the certainty will only be impacted by the set of conditions.

Second pattern, with 2 optional missing data:

Figure 11: Second pattern schema description

Click on the ‘Export.rbird’ button to download the ‘Optional Missing Data without CF impact’ map used in this example. The knowledge map can then be imported into your Rainbird Studio.

Query and Results

The Export File below has the main rules on the (4) Relationship between the Concepts ‘Driver’ and ‘Driver behaviour summary’. Please take into consideration that only specific answers will show results – more information in the article above.


Version 1.01 – Last Update: 26/02/2021