Key Quality Indicators
Overview
The Blindata Data Quality module relies on Key Quality Indicators (KQIs) to monitor data quality. Measuring a KQI produces a metric that represents the value of a given quality dimension for a given set of data. Some examples are:
- The number of events collected in one hour (KQI) represents the freshness of the data.
- The number of items with a given blank field (KQI) identifies how complete the data is.
- The number of calculated fields whose value does not reflect the values of their source fields (KQI) measures how consistent the data is.
Starting from the identified and measured KQIs, the Blindata framework calculates two synthetic indicators. The first is the score: a value from zero to one hundred that expresses how the KQI value compares with an expected value. The second is a traffic light (green, yellow, red), assigned to each KQI based on its score and on alert thresholds defined by the user. The traffic light lets you check the performance of the KQI at a glance.
Score
The score is a value from zero to one hundred that expresses how the KQI value compares with an expected value. In other words, it measures how good a KQI is.
The Blindata Data Quality module provides different strategies for calculating the score:
Strategy | Description |
---|---|
Error Percentage | With this strategy, the metric is interpreted as the number of errors found out of the total number of analyzed elements. The score is the normalized ratio between the total number of elements minus the value of the metric and the total number of elements [i.e. (totalElements - metric)/totalElements]. The lower the metric is relative to the total number of elements, the higher the score. |
Percentage Deviation | Calculates the score as the percentage deviation from a target value. The exact formula is [1 - abs(totalElements - metric)/totalElements] x 100, where totalElements is the target value and metric is the value to check. If the metric is identical to the target value, the score is 100. |
Minimum | With this strategy, a score is calculated which expresses how the metric relates to an expected value and a minimum value. If the metric is greater than or equal to the expected value, the score will be 100. If the metric is less than the minimum acceptable value, the score will be 0. If the metric is between the minimum value and the expected value, the score will represent the normalized distance from the expected value; the shorter the distance, the higher the score. |
Maximum | This strategy mirrors the Minimum strategy: the score expresses how the metric relates to an expected value and a maximum value. If the metric is less than or equal to the expected value, the score will be 100. If the metric is greater than the maximum acceptable value, the score will be 0. If the metric is between the expected value and the maximum value, the score will represent the normalized distance from the expected value; the shorter the distance, the higher the score. |
Distance | This strategy calculates a score that expresses how close the metric is to an expected value and within a fixed range. If the metric is equal to the expected value, the score will be 100. If the metric is less than the minimum acceptable value or is greater than the maximum acceptable value, the score will be 0. If the metric is between the minimum and maximum value, the score will represent the normalized distance from the expected value; the shorter the distance, the higher the score. |
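
The five strategies in the table can be sketched in Python as follows. This is only an illustration of the rules as described, not Blindata's actual implementation: the function names are hypothetical, the interpolation between the boundary values is assumed to be linear, the Error Percentage ratio is assumed to be scaled to the 0-100 range, and out-of-range results are assumed to be clamped to that range.

```python
def _clamp(score: float) -> float:
    """Keep a score inside the 0-100 range (clamping is an assumption)."""
    return max(0.0, min(100.0, score))


def error_percentage_score(metric: float, total_elements: float) -> float:
    """(totalElements - metric) / totalElements, scaled to 0-100."""
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    return _clamp(100.0 * (total_elements - metric) / total_elements)


def percentage_deviation_score(metric: float, target: float) -> float:
    """[1 - abs(target - metric) / target] x 100, assuming target > 0."""
    if target <= 0:
        raise ValueError("target must be positive")
    return _clamp(100.0 - 100.0 * abs(target - metric) / target)


def minimum_score(metric: float, expected: float, minimum: float) -> float:
    """100 at or above expected, 0 below minimum, linear in between."""
    if metric >= expected:
        return 100.0
    if metric < minimum:
        return 0.0
    # Here minimum <= metric < expected, so the denominator is positive.
    return 100.0 * (metric - minimum) / (expected - minimum)


def maximum_score(metric: float, expected: float, maximum: float) -> float:
    """100 at or below expected, 0 above maximum, linear in between."""
    if metric <= expected:
        return 100.0
    if metric > maximum:
        return 0.0
    # Here expected < metric <= maximum, so the denominator is positive.
    return 100.0 * (maximum - metric) / (maximum - expected)


def distance_score(
    metric: float, expected: float, minimum: float, maximum: float
) -> float:
    """100 at expected, 0 outside [minimum, maximum], linear in between."""
    if metric == expected:
        return 100.0
    if metric < minimum or metric > maximum:
        return 0.0
    if metric < expected:
        return 100.0 * (metric - minimum) / (expected - minimum)
    return 100.0 * (maximum - metric) / (maximum - expected)
```

For instance, under these assumptions 5 errors out of 100 analyzed elements give an Error Percentage score of 95, and a metric of 75 with an expected value of 100 and a minimum of 50 gives a Minimum score of 50.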
Semaphores
Based on the score and the alert thresholds defined by the user, each KQI is assigned a second synthetic indicator in the form of a traffic light, or semaphore (green, yellow, red). The traffic light lets you check the performance of the KQI at a glance. In addition, semaphore evaluation makes it possible to notify users of the results of data quality processes.
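
One possible way to map a score to a traffic light is sketched below. The function name, the two threshold parameters, and the choice of strict "below" comparisons are all assumptions made for illustration; the text only states that the color depends on the score and on user-defined alert thresholds.

```python
def semaphore(score: float, yellow_below: float, red_below: float) -> str:
    """Return 'green', 'yellow' or 'red' for a 0-100 score.

    yellow_below and red_below are hypothetical user-defined thresholds:
    scores below red_below are red, scores below yellow_below are yellow,
    and everything else is green.
    """
    if red_below > yellow_below:
        raise ValueError("red_below must not exceed yellow_below")
    if score < red_below:
        return "red"
    if score < yellow_below:
        return "yellow"
    return "green"
```

With thresholds of 90 and 70, for example, a score of 95 would be green, 80 would be yellow, and 50 would be red.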
Scoring strategies and related thresholds can be used to implement different patterns for assigning a semaphore. The following diagram shows the colors of the traffic light assigned in relation to the score expressed as a percentage.