Key Quality Indicators
Overview
The Blindata Data Quality module leverages Key Quality Indicators (KQIs) to continuously monitor and evaluate the quality of data. Each KQI measures a specific dimension of data quality by producing a metric: a numerical value that reflects the current state of the data. Some examples include:
- The number of events collected in one hour, measuring freshness.
- The number of records with a blank field, measuring completeness.
- The number of records in which a calculated field deviates from the value expected from its source fields, measuring consistency.
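As an illustration, the three example metrics above could be computed over an in-memory batch of records roughly as follows. This is a minimal Python sketch, not Blindata's implementation: the record layout, the field names (`email`, `net`, `tax`, `total`) and the helper functions are all hypothetical. Each function returns a plain count, which is the kind of numerical metric a KQI produces.

```python
from datetime import datetime, timedelta


def count_blank(records, field):
    """Completeness: records where `field` is missing, None or only whitespace."""
    return sum(
        1 for r in records
        if r.get(field) is None or str(r.get(field)).strip() == ""
    )


def count_recent_events(timestamps, now, window=timedelta(hours=1)):
    """Freshness: events collected within `window` before `now`."""
    return sum(1 for t in timestamps if now - window <= t <= now)


def count_inconsistent(records, derived_field, compute):
    """Consistency: records whose calculated field differs from the value
    recomputed from its sources (exact comparison; floats would need a tolerance)."""
    return sum(1 for r in records if r[derived_field] != compute(r))


# Hypothetical sample data.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "  "},
    {"id": 3, "email": None},
]
now = datetime(2024, 1, 1, 12, 0)
event_times = [
    now - timedelta(minutes=10),
    now - timedelta(minutes=59),
    now - timedelta(hours=2),
]
orders = [
    {"net": 100, "tax": 22, "total": 122},
    {"net": 50, "tax": 11, "total": 60},
]

print(count_blank(customers, "email"))                                   # 2
print(count_recent_events(event_times, now))                             # 2
print(count_inconsistent(orders, "total", lambda r: r["net"] + r["tax"]))  # 1
```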
Once KQIs have been defined and measured, the Blindata framework derives two synthetic indicators:
- Score – a value between 0 and 100 that expresses how the observed KQI metric compares to the expected value.
- Semaphore – a color-coded traffic light (green, yellow, red) assigned based on the score and user-defined thresholds, offering a quick visual summary of data quality status.
These indicators provide both quantitative and intuitive insights into the health of your data.
Score
The score is a value between 0 and 100 that represents how well a Key Quality Indicator (KQI) aligns with an expected target. In other words, it measures how good a KQI result is relative to a reference value.
The Blindata Data Quality module provides different strategies to calculate this score, depending on the type of quality check being performed:
Strategy | Description |
---|---|
Error Percentage | In this strategy, the metric is the number of errors found among the analyzed elements. The score is the percentage of correct elements over the total: (totalElements - metric) / totalElements * 100. The fewer the errors, the higher the score. |
Percentage Deviation | Calculates the score from the deviation between the observed and the expected value: (1 - abs(totalElements - metric) / totalElements) * 100, where totalElements is the expected value and metric is the observed value. A perfect match results in a score of 100. |
Minimum | This strategy evaluates whether the metric meets or exceeds a minimum threshold. If the metric is greater than or equal to the expected value, the score is 100; if it falls below the minimum acceptable value, the score is 0. Values in between are scored by their relative distance from the expected value: the closer the metric is to the expected value, the higher the score. |
Maximum | The inverse of the “Minimum” strategy. A score of 100 is assigned if the metric is less than or equal to the expected value. A score of 0 is assigned if it exceeds the maximum acceptable value. Intermediate values are normalized based on their distance from the expected value. |
Distance | This strategy evaluates how close the metric is to an expected value within a defined range. If the metric equals the expected value, the score is 100; if it is outside the minimum or maximum bounds, the score is 0. Values within the range score lower the farther they are from the expected value. |
External | This strategy allows the score to be entered manually by the user. It is particularly useful in cases where the assessment of the result is based on qualitative judgment, external systems, or context-specific evaluations that cannot be automatically calculated. The user provides a score between 0 and 100, with support for decimal values to express more precise assessments. |
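The strategies in the table can be sketched as plain Python functions. This is a minimal illustration under stated assumptions, not Blindata's implementation: the function and parameter names (`min_acceptable`, `max_acceptable`, `lower`, `upper`) are hypothetical; intermediate values for Minimum, Maximum and Distance are assumed to be interpolated linearly, since the table only says they depend on the relative distance from the expected value; and clamping Percentage Deviation at 0 when the deviation exceeds the expected value is also an assumption.

```python
def error_percentage(metric, total_elements):
    """Error Percentage: `metric` is the error count; score = share of correct elements * 100."""
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    return (total_elements - metric) / total_elements * 100


def percentage_deviation(metric, expected):
    """Percentage Deviation: (1 - |expected - metric| / expected) * 100, clamped at 0 (assumption)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return max(0.0, (1 - abs(expected - metric) / expected) * 100)


def minimum(metric, expected, min_acceptable):
    """Minimum: 100 at or above `expected`, 0 at or below `min_acceptable`."""
    if metric >= expected:
        return 100.0
    if metric <= min_acceptable:
        return 0.0
    # Assumed linear interpolation between the two bounds.
    return (metric - min_acceptable) / (expected - min_acceptable) * 100


def maximum(metric, expected, max_acceptable):
    """Maximum: 100 at or below `expected`, 0 at or above `max_acceptable`."""
    if metric <= expected:
        return 100.0
    if metric >= max_acceptable:
        return 0.0
    return (max_acceptable - metric) / (max_acceptable - expected) * 100


def distance(metric, expected, lower, upper):
    """Distance: 100 at `expected`, 0 outside [lower, upper], linear in between (assumption)."""
    if metric == expected:
        return 100.0
    if metric < lower or metric > upper:
        return 0.0
    if metric < expected:
        return (metric - lower) / (expected - lower) * 100
    return (upper - metric) / (upper - expected) * 100


def external(score):
    """External: a score entered by the user; decimal values are allowed."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    return float(score)


print(round(error_percentage(5, total_elements=200), 1))            # 97.5
print(round(percentage_deviation(950, expected=1000), 1))          # 95.0
print(round(minimum(80, expected=100, min_acceptable=60), 1))      # 50.0
print(round(maximum(3, expected=2, max_acceptable=6), 1))          # 75.0
print(round(distance(110, expected=100, lower=80, upper=140), 1))  # 75.0
print(external(87.5))                                              # 87.5
```

Each function validates only the inputs that would otherwise cause a division by zero or an out-of-range result; a real configuration would presumably also check that the bounds are ordered consistently.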
Semaphores
Based on the score and the alert thresholds defined by the user, each KQI is assigned a visual indicator in the form of a traffic light (green, yellow, or red). This semaphore allows users to instantly assess the status of a KQI at a glance.
In addition to providing a quick summary of KQI performance, the semaphore also enables notifications to be triggered based on data quality evaluation results.
Each scoring strategy, along with its associated thresholds, can be configured to follow different patterns for assigning semaphore values. The diagram below illustrates how traffic light colors are mapped to score ranges (expressed as percentages):
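As an illustration, one possible pattern, in which a higher score is better and two thresholds split the 0 to 100 range into red, yellow and green bands, can be sketched as follows. This is a hypothetical example rather than Blindata's configuration model: the parameter names `yellow_threshold` and `green_threshold`, their default values (70 and 90), and the choice to place a score that falls exactly on a threshold in the upper band are all assumptions. Since each strategy can follow its own pattern, other mappings are possible.

```python
from enum import Enum


class Semaphore(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def semaphore_for(score, yellow_threshold=70.0, green_threshold=90.0):
    """Map a 0-100 score to a traffic-light color using two hypothetical thresholds."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if yellow_threshold > green_threshold:
        raise ValueError("yellow_threshold must not exceed green_threshold")
    if score >= green_threshold:
        return Semaphore.GREEN
    if score >= yellow_threshold:
        return Semaphore.YELLOW
    return Semaphore.RED


for s in (97.5, 75.0, 50.0):
    print(s, semaphore_for(s).value)
# 97.5 green
# 75.0 yellow
# 50.0 red
```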