Data Comparison
Comparing data is a common task when building Rainbird maps. This article shows four optimised methods of comparing instances within Rainbird.
Intent
A common practice is to compare two sets of source tables by matching their data fields. Another is to match data that already exists in a system against data entered at run-time.
Applicability
When can this pattern be applied?
- When working with string, number, date or True/False (boolean) concepts
- Typical use cases include KYC, account confirmation, inventory checks and many others.
Limitations
- Unknown input (unless it is specified as missing data)
- Incorrect spelling (this would require NLP)
- Undefined variations in upper or lower case
- No machine learning adaptability
Four Methods of Data Comparison
Method 1 (M1) – Takes an item (string, integer, boolean or date type) entered in the query and compares it against a data source or a stored fact to find a match.
Method 2 (M2) – An extension of M1 that also factors in possible aliases (synonyms) of the item entered or stored within the map, which makes it more dynamic.
Method 3 (M3) – Also known as fuzzy matching. M3 factors in more concepts and can therefore match both known and unknown items entered.
Method 4 (M4) – Checks the ‘status’ of the item entered (either active or inactive), then verifies that the item matches, as in M1.
Note that this list is not exhaustive; the methods mentioned are the most common ones.
Elements used in this build
‘Elbow‘ – Object-specific Rules
ADVANCED
Method 1 determines whether “Input Name” and “Registered Name” match. The result is a boolean, either True or False.
Example: Both data fields must match exactly: John Dowe = John Dowe. Differences in whitespace, upper/lower case, spelling or additional information are not tolerated and result in False.
Method 1 Build:
Concept types:
Input Name, Method 1, and Registered Name = String concepts
Check result = Boolean
Figure 1: Generic Structure of the ‘Elbow Technique’
Rules:
All the rules for this method are located within the relationship “outcome check 1”. Both rules are object specific.
Figure 2: Object specific Rules
“True” result
In order for this rule to trigger, “Input Name” must match “Registered Name”.
Figure 3: Matching variables result in ‘True’
“False” result
In order for this rule to trigger, “Input Name” and “Registered Name” must not match.
Figure 4: Not matching variables result in ‘False’
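The two Method 1 rules can be pictured outside Rainbird as a strict equality test. The following is a minimal Python sketch, not Rainbird syntax; the function name outcome_check_1 is a hypothetical stand-in for the “outcome check 1” relationship.

```python
def outcome_check_1(input_name: str, registered_name: str) -> bool:
    """Return True only when both strings are identical, character for character.

    Hypothetical illustration of the "True" and "False" rules of Method 1.
    No trimming, case folding or spelling correction is applied.
    """
    return input_name == registered_name


print(outcome_check_1("John Dowe", "John Dowe"))   # True
print(outcome_check_1("john dowe", "John Dowe"))   # False: case differs
print(outcome_check_1("John  Dowe", "John Dowe"))  # False: extra whitespace
```

The last two calls illustrate the limitations listed earlier: any variation in case or whitespace counts as a mismatch.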
Method 2 determines whether “Input Name” matches “Registered Name”, an “Alias”, or neither. The outcome is a boolean and can be True or False.
Example: Variations of the data are accepted if they are defined in Rainbird. John Dowe is an English placeholder for a generic name field, so we allow versions in other languages (e.g. German: Max Mustermann) or an abbreviation (J. Dowe) to pass the comparison.
Method 2 Build:
Concept types:
Method 2, Data to check 2, Register 2, Alias 2 = String concepts
Check result 2 = Boolean
Figure 5: Generic Structure of the ‘Elbow Technique’
Facts:
The “Alias 2” concept has instances which are variations of the same name, connected to instances in “Register 2”. These are treated as an ‘OR’ case for a match.
In our example, Jane Doe matches against Erika Mustermann (the German version of Jane Doe) and Jeanne Michelle (the French version).
Figure 6: Concept Register 2: Both Register names have an alias
Rules:
All four rules for this method are located in the relationship “outcome check 2”. All rules are object specific.
Figure 7: Object specific Rules
For better visibility, we only list the two new rules that consider the Alias, as the other two rules are copies of the Method 1 rules.
“True” result
In order for this rule to pass, “Input Name” must match either “Registered Name” or the assigned Alias fact.
Figure 8: Object specific ‘True’ rule including Alias
“False” result
In order for this rule to pass, “Input Name” must match neither “Registered Name” nor any of its aliases.
Figure 9: Object specific ‘False’ rule including Alias
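As an illustration of the alias logic, the Python sketch below treats the alias facts as a lookup table and accepts an input that equals either the registered name or one of its aliases. It is not Rainbird syntax: the ALIASES dictionary and the function name outcome_check_2 are hypothetical, and the names are taken from the examples in this article.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for the facts linking "Register 2" instances to "Alias 2" instances.
ALIASES: Dict[str, Set[str]] = {
    "Jane Doe": {"Erika Mustermann", "Jeanne Michelle"},
    "John Dowe": {"Max Mustermann", "J. Dowe"},
}


def outcome_check_2(
    input_name: str,
    registered_name: str,
    aliases: Optional[Dict[str, Set[str]]] = None,
) -> bool:
    """Return True if the input equals the registered name OR one of its aliases."""
    if aliases is None:
        aliases = ALIASES
    if input_name == registered_name:
        return True  # same result as the Method 1 "True" rule
    # Alias "True" rule: the input matches an alias assigned to the registered name.
    return input_name in aliases.get(registered_name, set())


print(outcome_check_2("Erika Mustermann", "Jane Doe"))  # True: alias of Jane Doe
print(outcome_check_2("Max Mustermann", "Jane Doe"))    # False: alias of a different name
```

Aliases only count for the registered name they are attached to, which is why Max Mustermann does not match Jane Doe in the second call.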
Method 3 allows users to deal with unknown comparison outcomes by asking a question, and gives an outcome even if there is no match to the fact(s) in the knowledge map. This is achieved with a countRelationshipInstance function.
Figure 10: Generic Structure of the ‘Elbow Technique’
Example: This comparison looks at missing or unknown data, such as a middle name. The method checks whether a middle name exists and matches, and also deals with missing data. Jean Michel might have entered his middle name George in the original data but forgotten to enter it in a new data set. Note that a different middle name would still be a mismatch!
Method 3 Build:
Concept types:
Input Name 3, Input middle name, Method 3, Registered Name 3, Registered Middle Name and Check result 3 = String concepts
The result(s) of the comparisons are one of the instances of Check result 3:
- No
- No, middle name
- Yes, middle name
- Yes without middle
All used Concepts are String types
Example Instances:
The “Register 3” concept holds instances which are linked by alias facts to the instances of “Registered Middle Name”. For example, the register entry “Jean Michel” has the registered middle name George.
Figure 11: Jean Michel’s Middle name fact
Rules:
The object specific rules for this method are located in the relationship “outcome check 3”.
Figure 12: Object specific rules
“NO” result
In order for this rule to pass, “Input Name” and “Registered Name 3” must not match.
Figure 13: No match
“NO, middle name” result
In order for this rule to pass, “Input First Name” must match “Registered first Name”, but “Input Middle Name” and “Registered Middle Name” must not match.
Figure 14: Middle name does not match
“Yes, middle name” result
In order for this rule to pass, “Input First Name” must match “Registered first Name” and “Input Middle Name” must match “Registered Middle Name”.
Figure 15: Name and Middle Name match
“Yes, without middle name” result
To cover the last possible case, the final rule passes if “Input First Name” matches “Registered first Name” and the individual has no middle name registered or entered.
Figure 16: no middle name existing nor entered
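The four outcomes can be sketched as a small decision function. This is a hedged Python illustration, not Rainbird syntax. It makes two assumptions that the article does not confirm: a missing middle name is represented as None (standing in for a count of zero relationship instances), and a middle name missing on either side is tolerated, following the Jean Michel example. The function name outcome_check_3 is hypothetical.

```python
from typing import Optional


def outcome_check_3(
    input_name: str,
    input_middle: Optional[str],
    registered_name: str,
    registered_middle: Optional[str],
) -> str:
    """Return one of the four "Check result 3" instances.

    Assumption: None means "no middle name entered/registered", and a middle
    name missing on either side is accepted rather than treated as a mismatch.
    """
    if input_name != registered_name:
        return "No"
    if input_middle is None or registered_middle is None:
        return "Yes without middle"
    if input_middle == registered_middle:
        return "Yes, middle name"
    return "No, middle name"


print(outcome_check_3("Jean Michel", "George", "Jean Michel", "George"))  # Yes, middle name
print(outcome_check_3("Jean Michel", None, "Jean Michel", "George"))      # Yes without middle
print(outcome_check_3("Jean Michel", "Paul", "Jean Michel", "George"))    # No, middle name
print(outcome_check_3("John Dowe", None, "Jean Michel", "George"))        # No
```

The order of the checks matters: the name comparison comes first, so a middle name is only considered once the names already match.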
Method 4 works in two stages. The first stage uses a gating question that determines whether “Input Name 4” is Active or Inactive, based on the facts assigned to the instances of “Input Name 4”.
The second stage determines whether “Input Name 4” and “Registered Name 4” match. The outcome is a boolean (True or False).
Figure 17: Map structure
Example: This method uses a required criterion, or gating question, that must be passed before a set of data is compared. An account might need to be active to be considered for a comparison; if an account is inactive, a comparison is not required. John Doe’s account is inactive and therefore does not pass.
Method 4 Build:
Concept types:
Input Name 4, Method 4, Status and Registered Name 4 = String concepts
Check result 4 = Boolean
The Concept “Input Name 4” has the facts for the gating question: full name instances assigned to a “Status”: Active or Inactive.
Rules:
The object specific rules for this method are located in the relationship “outcome check 4”.
Figure 18: Object specific rules
“True” result
In order for this rule to pass, “Input Name 4” must be Active and must match “Registered Name 4”.
Figure 19: Object specific rules
“False” result 1
In order for this rule to pass, “Input Name 4” must be Active but must not match “Registered Name 4”.
Figure 20: Active Status but not in Register
“False” result 2
The last combination also returns False: “Input Name 4” is Inactive. In this case, no attempt is made to match the name against the register.
Figure 21: Inactive Status
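The two stages can be sketched as a gate followed by a lookup. The Python below is an illustration only, not Rainbird syntax. The STATUS table, the function name outcome_check_4 and the Jane Doe entry are hypothetical; treating a name with no status fact as not active is also an assumption (in Rainbird, a question could be asked instead).

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for the "Status" facts assigned to "Input Name 4" instances.
STATUS: Dict[str, str] = {
    "John Doe": "Inactive",
    "Jane Doe": "Active",
}


def outcome_check_4(
    input_name: str,
    registered_names: Set[str],
    status: Optional[Dict[str, str]] = None,
) -> bool:
    """Stage 1: gate on Active status. Stage 2: match against the register."""
    if status is None:
        status = STATUS
    # Stage 1 (gating question): inactive or unknown names never reach the comparison.
    if status.get(input_name) != "Active":
        return False
    # Stage 2: exact comparison against the register, as in Method 1.
    return input_name in registered_names


register = {"Jane Doe", "John Doe"}
print(outcome_check_4("Jane Doe", register))    # True: active and registered
print(outcome_check_4("John Doe", register))    # False: inactive, comparison skipped
print(outcome_check_4("Jane Doe", {"J. Doe"}))  # False: active but not in the register
```

Placing the status check first means an inactive name returns False even when it is present in the register, which mirrors the “False” result 2 rule.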
Advanced building using a database
Data comparison is typically used to match an input against an existing database. The methods above can be combined with the approach from the article on injecting knowledge from a Google spreadsheet: consider changing all the “Registered Name” concepts into data sources backed by a Google Sheet. This enables data to be injected into the knowledge map, making it more dynamic.
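To picture what moving the register into a data source means, the Python sketch below loads registered names from tabular data instead of hard-coding them. It does not use any Rainbird or Google Sheets API: the inline CSV text, the column name registered_name and the function load_registered_names are all hypothetical and stand in for a spreadsheet export.

```python
import csv
import io

# Hypothetical spreadsheet content, inlined as CSV text so the sketch is self-contained.
SHEET = """registered_name
John Dowe
Jane Doe
"""


def load_registered_names(csv_text: str) -> set:
    """Read the 'registered_name' column and return the names as a set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["registered_name"] for row in reader}


registered = load_registered_names(SHEET)
print("John Dowe" in registered)  # True
print("J. Dowe" in registered)    # False: not listed in the sheet
```

With the register held outside the comparison logic, adding or removing a name only requires changing the sheet.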
Click on the ‘Export.rbird’ button to download the ‘Data Comparison’ map used in this example. The knowledge map can then be imported into your Rainbird Studio.
Query and Results
Each method has its own main query on the rules named ‘outcome check’. The outcome of the query varies by method; see the method descriptions above for details.