Data Comparison

Comparing data is a common task when building Rainbird maps. In this article, we show four optimised methods of comparing instances within Rainbird.

Intent

A common practice is to compare two sets of source tables by matching their data fields. Another is to match data already held in a system against data entered at run time.

Applicability

When can this pattern be applied?

  • When working with string, number, date or True/False (boolean) concepts
  • Typical use cases include KYC (Know Your Customer) checks, account confirmation, inventory checks and many others.

Limitations

  • Unknown input (if not specified as missing data)
  • Incorrect spelling (NLP)
  • Variations in upper or lower case that have not been defined
  • No machine learning adaptability

Four Methods of Data comparison

Method 1 (M1) – looks for an exact match between an item entered in the query (string, integer, boolean or date type) and a data source or stored fact.

Method 2 (M2) – An extension of M1 that also accepts aliases (synonyms) of the item stored in the map, which makes the comparison more flexible.

Method 3 (M3) – Also known as fuzzy matching. M3 factors in additional concepts and can therefore give an outcome whether the entered items are known or unknown (missing).

Method 4 (M4) – Checks the ‘status’ of the entered item (either active or inactive) and, if it is active, verifies that it matches as in M1.

Note that this list is not exhaustive; these are the most common methods.

Elements used in this build

  • ‘Elbow’ – Object-specific Rules
  • countRelationshipInstance
  • ADVANCED: Injecting knowledge from Google Sheets

Method 1 (M1)

This method determines whether “Input Name” and “Registered Name” match. The result is a boolean, either True or False.

Example: An exact match of both data fields is required: John Dowe = John Dowe. Any alteration, such as whitespace, upper/lower case, spelling or additional information, is not accounted for and causes a mismatch.

Method 1 Build:

Concept types:

Input Name, Method 1, and Registered Name = String concepts 

Check result = Boolean

Figure 1: Generic Structure of the ‘Elbow Technique’

Rules:

All the rules for this method are located within the relationship “outcome check 1”. Both rules are object specific. 

Figure 2: Object specific Rules 

“True” result 

In order for this rule to trigger, “Input Name” must match “Registered Name”.

Figure 3: Matching variables result in ‘True’

“False” result 

In order for this rule to trigger, “Input Name” and “Registered Name” must not match.

Figure 4: Not matching variables result in ‘False’
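
Outside Rainbird, the decision logic of these two object-specific rules can be sketched in plain Python. This is only an illustration of the comparison, not Rainbird rule syntax; the function name outcome_check_1 is hypothetical, and the strings are compared exactly, with no trimming or case folding, as described above.

```python
def outcome_check_1(input_name: str, registered_name: str) -> bool:
    """Mirror the two Method 1 rules in "outcome check 1".

    Hypothetical helper for illustration only; it is not part of Rainbird.
    No trimming or case folding is applied, so any variation fails.
    """
    # "True" rule: Input Name equals Registered Name character for character.
    if input_name == registered_name:
        return True
    # "False" rule: every other combination is a mismatch.
    return False


print(outcome_check_1("John Dowe", "John Dowe"))   # True
print(outcome_check_1("john dowe", "John Dowe"))   # False: case differs
print(outcome_check_1("John Dowe ", "John Dowe"))  # False: trailing space
```

The last two calls show the limitation listed earlier: undefined case or whitespace variations are treated as different values.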

Method 2 (M2)

This method determines whether “Input Name” matches “Registered Name”, matches an “Alias”, or matches neither. The outcome is a boolean, either True or False.

Example: Alterations of the data are accepted if they are defined in Rainbird. John Dowe is an English placeholder for a generic name field, so we allow equivalents in other languages (e.g. German: Max Mustermann) or an abbreviation (J. Dowe) to pass the comparison.

Method 2 Build:

Concept types:

Method 2, Data to check 2, Register 2 and Alias 2 = String concepts

Check result 2 = Boolean

Figure 5: Generic Structure of the ‘Elbow Technique’

Facts:

The “Alias 2” concept has instances that are variations of a name and are connected to instances of “Register 2”. Each alias is treated as an ‘OR’ case for a match.

In our example, Jane Doe matches Erika Mustermann (the German version of Jane Doe) and Jeanne Michelle (the French version).

Figure 6: Concept Register 2: Both Register names have an alias

Rules:

All four rules for this method are located in the relationship “outcome check 2”. All rules are object specific. 

Figure 7: Object specific Rules 

For clarity, we only show the two new rules that consider the alias, as the other two rules are copies of the Method 1 rules.

“True” result

In order for this rule to pass, “Input Name” must match either “Registered Name” or the assigned Alias fact.

Figure 8: Object specific ‘True’ rule including Alias

“False” result

In order for this rule to pass, “Input Name” must match neither “Registered Name” nor any of its aliases.

Figure 9: Object specific ‘False’ rule including Alias
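
The alias rules add an ‘OR’ case to the exact match of Method 1. The plain-Python sketch below illustrates that logic; it is not Rainbird rule syntax. The dictionary REGISTER_2 is a hypothetical stand-in for the “Register 2” and “Alias 2” facts, filled with the example names from this article, and outcome_check_2 is a hypothetical helper.

```python
# Hypothetical stand-in for the "Register 2" and "Alias 2" facts:
# each registered name maps to the set of aliases linked to it.
REGISTER_2: dict[str, set[str]] = {
    "Jane Doe": {"Erika Mustermann", "Jeanne Michelle"},
}


def outcome_check_2(input_name: str, registered_name: str) -> bool:
    """Return True if input_name equals registered_name or one of its aliases.

    Hypothetical helper for illustration only; it is not Rainbird rule syntax.
    """
    aliases = REGISTER_2.get(registered_name, set())
    # Rule copied from Method 1: exact match on the registered name.
    if input_name == registered_name:
        return True
    # New "True" rule: exact match on any alias (the 'OR' case).
    if input_name in aliases:
        return True
    # New "False" rule: neither the registered name nor any alias matches.
    return False


print(outcome_check_2("Erika Mustermann", "Jane Doe"))  # True: German alias
print(outcome_check_2("Max Mustermann", "Jane Doe"))    # False: not an alias of Jane Doe
```

Aliases are still compared exactly, so each accepted variation has to be defined as a fact in advance.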

Method 3 (M3)

This method deals with unknown comparison outcomes by asking a question, and gives an outcome even when no fact in the knowledge map matches. This is achieved with the countRelationshipInstance function.

Figure 10: Generic Structure of the ‘Elbow Technique’

Example: This example looks at missing or unknown data, in this case a middle name. The method checks whether a middle name exists and matches, and also handles missing data. Jean Michel might have entered his middle name, George, in the original data but forgotten to enter it in a new data set. Note that a different middle name is still a mismatch.
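
The idea behind countRelationshipInstance can be illustrated by counting how many facts of a given relationship exist for a subject: a count of zero signals that the data (here, a middle name) is missing. The sketch below is plain Python over a hypothetical list of subject-relationship-object facts; it does not reproduce the signature of Rainbird's function, and the relationship name “has middle name” is hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One subject-relationship-object fact (hypothetical representation)."""
    subject: str
    relationship: str
    obj: str


# Example fact from this article; "has middle name" is a hypothetical
# relationship name linking Registered Name 3 to Registered Middle Name.
FACTS = [Fact("Jean Michel", "has middle name", "George")]


def count_relationship_instances(facts: list[Fact], subject: str,
                                 relationship: str) -> int:
    """Count the facts that link `subject` to any object via `relationship`."""
    return sum(1 for f in facts
               if f.subject == subject and f.relationship == relationship)


# A count of 0 means no middle name is recorded, i.e. the data is missing.
print(count_relationship_instances(FACTS, "Jean Michel", "has middle name"))  # 1
print(count_relationship_instances(FACTS, "Jane Doe", "has middle name"))     # 0
```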

Method 3 Build:

Concept types:

Input Name 3, Input middle name, Method 3, Registered Name 3, Registered Middle Name and Check result 3 = String concepts 

The result of each comparison is one of the following instances of Check result 3:

  • No
  • No, middle name
  • Yes, middle name
  • Yes without middle

All used Concepts are String types

Example Instances:

The “Register 3” concept holds instances that are linked by facts to instances of the “Registered Middle Name” concept. For example, the registered name “Jean Michel” has the registered middle name George.

Figure 11: Jean Michel’s Middle name fact

Rules:

The object specific rules for this method are located in the relationship “outcome check 3”.

Figure 12: Object specific rules

“No” result

In order for this rule to pass, “Input Name 3” and “Registered Name 3” must not match.

Figure 13: No match

“No, middle name” result

In order for this rule to pass, “Input Name 3” must match “Registered Name 3”, but “Input middle name” and “Registered Middle Name” must not match.

Figure 14: Middle name does not match

“Yes, middle name” result

In order for this rule to pass, “Input Name 3” must match “Registered Name 3” and “Input middle name” must match “Registered Middle Name”.

Figure 15: Name and Middle Name match

“Yes without middle” result

To cover the last possible case, the final rule passes if “Input Name 3” matches “Registered Name 3” and the individual has no middle name registered or entered.

Figure 16: no middle name existing nor entered
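
The four Method 3 rules can be summarised as one decision function. The plain-Python sketch below is an illustration, not Rainbird rule syntax; outcome_check_3 is a hypothetical helper, and None stands for a middle name whose relationship count is zero. The sketch reads “no middle name registered or entered” as either side being absent, which fits the Jean Michel example; if your map requires both to be absent, adjust that check.

```python
from typing import Optional


def outcome_check_3(input_name: str, input_middle: Optional[str],
                    registered_name: str,
                    registered_middle: Optional[str]) -> str:
    """Return one of the four "Check result 3" instances.

    Hypothetical helper; None stands for a middle name that was not
    entered or is not registered (a relationship count of zero).
    """
    # "No": the names themselves do not match.
    if input_name != registered_name:
        return "No"
    # Assumption: if either middle name is missing there is nothing to
    # compare, so the name match alone is accepted.
    if input_middle is None or registered_middle is None:
        return "Yes without middle"
    # Both middle names are known, so they must match exactly.
    if input_middle == registered_middle:
        return "Yes, middle name"
    return "No, middle name"


# Jean Michel forgot to enter his middle name: accepted on the name alone.
print(outcome_check_3("Jean Michel", None, "Jean Michel", "George"))
# A different middle name is still a mismatch.
print(outcome_check_3("Jean Michel", "Paul", "Jean Michel", "George"))
```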

Method 4 (M4)

This method works in two stages. The first stage uses a gating question that determines whether “Input Name 4” is Active or Inactive, based on the facts assigned to the instances of “Input Name 4”.

The second stage determines whether “Input Name 4” and “Registered Name 4” match. The outcome is a boolean (True or False).

Figure 17: Map structure

Example: This method requires a criterion, or gating question, to be passed before a set of data is compared. For instance, an account might need to be active to be considered for comparison; if it is inactive, no comparison is needed. John Doe’s account is inactive and therefore does not pass.

Method 4 Build:

Concept types:

Input Name 4, Method 4, Status and Registered Name 4 = String concepts 

Check result 4 = Boolean

The concept “Input Name 4” holds the facts for the gating question: each full-name instance is assigned a “Status” of Active or Inactive.

Rules:

The object specific rules for this method are located in the relationship “outcome check 4”.

Figure 18: Object specific rules

“True” result 

In order for this rule to pass, “Input Name 4” must be Active and must match “Registered Name 4”.

Figure 19: Object specific rules

“False” result 1

In order for this rule to pass, “Input Name 4” must be Active but must not match “Registered Name 4”.

Figure 20: Active Status but not in Register

“False” result 2

The last combination also returns False: “Input Name 4” is Inactive, so the name is not matched against the register.

Figure 21: Inactive Status
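
The two stages of Method 4 can be sketched in plain Python as a gate followed by the exact match of Method 1. This is an illustration, not Rainbird rule syntax. The dictionary STATUS_4 is a hypothetical stand-in for the “Status” facts: John Doe’s Inactive status comes from the example above, while the Jane Doe entry is made up. Treating an unknown status as a failed gate is also an assumption.

```python
# Hypothetical status facts for "Input Name 4". John Doe's Inactive status
# comes from the example above; the Jane Doe entry is made up.
STATUS_4: dict[str, str] = {
    "John Doe": "Inactive",
    "Jane Doe": "Active",
}


def outcome_check_4(input_name: str, registered_name: str) -> bool:
    """Gate on Active status, then compare names exactly as in Method 1.

    Hypothetical helper for illustration only; it is not Rainbird rule syntax.
    """
    # Stage 1, the gating question: an Inactive name fails without being
    # compared ("False" result 2). Treating an unknown status the same way
    # is an assumption; the article only covers Active and Inactive.
    if STATUS_4.get(input_name) != "Active":
        return False
    # Stage 2: an Active name must match the register exactly
    # ("True" result, otherwise "False" result 1).
    return input_name == registered_name


print(outcome_check_4("Jane Doe", "Jane Doe"))  # True
print(outcome_check_4("John Doe", "John Doe"))  # False: account is Inactive
```

Checking the gate first means the register is never consulted for inactive accounts, matching the “False” result 2 rule.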

Advanced building using a database

Data comparison is most often used to match an input against an existing database. The methods above can be combined with the approach in the article “Injecting knowledge from Google Sheets”: consider turning each of the “Registered Name” concepts into a data source backed by a Google Sheet. This injects data into the knowledge map and makes it more dynamic.

Click the ‘Export.rbird’ button to download the ‘Data Comparison’ map used in this example. The knowledge map can then be imported into your Rainbird Studio.

Query and Results

Each method has its own main query on its ‘outcome check’ relationship (outcome check 1 to 4). The outcome of the query varies; see the description of each method above for details.


Version 1.01 – Last Update: 26/02/2021