Classification Assignments

What is a data classification assignment?

An assignment is a candidate match between a physical element of a system (a table or a column) and a field defined in the business glossary. Assignments are produced by evaluating rules on columns and tables, and several rules may apply to the same field.

Each assignment falls into one of three distinct states:

  • CONFIRMED: Indicates that the assignment was confirmed, either automatically or by a user.
  • PENDING: Indicates that the assignment awaits confirmation from an external user.
  • REFUSED: Indicates that the evaluation is deemed incorrect, leading to the rejection of the association.

Automation thresholds can be configured so that assignments are confirmed, rejected, or left pending automatically, based on their score. These thresholds are described in the section "Automation of assignment evaluation".

Assignments computation

Assignments arise from one or more evaluations produced by the defined rules, so a single physical object can receive several evaluations. When several evaluations map the same physical element to the same target in the business glossary, the final assignment score is the highest score among those evaluations. This typically happens when multiple rules are defined to classify the same business glossary item.
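The consolidation rule described above can be sketched in a few lines of Python. This is only an illustration, not Blindata's implementation: the tuple layout, the `consolidate` function, and the sample names and scores are all hypothetical.

```python
def consolidate(evaluations):
    """Keep the highest score per (physical element, glossary target) pair.

    `evaluations` is an iterable of (physical_element, glossary_target, score)
    tuples; this layout is an assumption made for the example.
    """
    best = {}
    for physical, target, score in evaluations:
        key = (physical, target)
        if key not in best or score > best[key]:
            best[key] = score
    return best


# Two rules evaluated the same column against the same glossary item.
evaluations = [
    ("customers.email", "Customer.Email", 0.60),
    ("customers.email", "Customer.Email", 0.85),
    ("customers.phone", "Customer.Phone", 0.40),
]
result = consolidate(evaluations)
```

Here the two evaluations for `customers.email` collapse into a single entry carrying the higher of the two scores.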

Before the individual evaluations are consolidated into a single assignment, they may look like those in the image below:

Classification assignment composition table

In some cases the information is incomplete: for example, an evaluation may reference the term but lack a complete reference to the business entity. The goal is to fill this gap automatically. Two cases are considered:

  1. Inference from Column Metadata: The business glossary entity can be inferred from the metadata of the column itself. In this case two rules must be defined: the first, of the FIELD_METADATA type, is defined on the column metadata and finds the business glossary entity; the second, of the REGEX_DATA_RULE type, uses a regular expression to associate the logical field.

  2. Inference from Table Metadata: The business glossary entity can be inferred from the table metadata. In this case at least two rules must be defined: one of the METADATA_TABLE type, which matches on the table name to find the business glossary entity, and one of the REGEX_DATA or METADATA_FIELD type to find the related logical fields.

Incomplete assignments are handled as follows:

  • For each incomplete assignment, the system checks whether another assignment on the same column carries the missing business glossary element. If one or more exist, the one with the highest score is taken, and the resulting assignment score is the average of the two assignments' scores.

  • If the assignment is still incomplete, the system checks whether an assignment on the table provides the missing business glossary entity; if one or more exist, the one with the highest score is taken. In this case the final score is the product of the incomplete assignment's score and the score of the evaluation found.

  • In addition, there may be redundant assignments, i.e. assignments on the same physical structure, with identical business glossary items. In the case of redundant evaluations, only those with the highest score are kept.
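The three steps above can be illustrated with a minimal Python sketch. It is written under assumptions and does not reflect Blindata's internals: the `Assignment` class, its attribute names, and the `complete` and `deduplicate` helpers are hypothetical, and a table-level assignment is modelled here simply as one whose `column` is `None`.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Assignment:
    table: str
    column: Optional[str]         # None for a table-level assignment (assumption)
    entity: Optional[str]         # None when the glossary entity is missing
    logical_field: Optional[str]  # None for entity-only matches
    score: float


def complete(inc: Assignment, others: List[Assignment]) -> Assignment:
    """Fill the missing entity of `inc` from column-level, then table-level, matches."""
    with_entity = [a for a in others if a.entity is not None and a.table == inc.table]

    # Step 1: another assignment on the same column -> average of the two scores.
    on_column = [a for a in with_entity if a.column is not None and a.column == inc.column]
    if on_column:
        best = max(on_column, key=lambda a: a.score)
        return replace(inc, entity=best.entity, score=(inc.score + best.score) / 2)

    # Step 2: an assignment on the containing table -> product of the two scores.
    on_table = [a for a in with_entity if a.column is None]
    if on_table:
        best = max(on_table, key=lambda a: a.score)
        return replace(inc, entity=best.entity, score=inc.score * best.score)

    return inc  # nothing found: the assignment stays incomplete


def deduplicate(assignments: List[Assignment]) -> List[Assignment]:
    """Step 3: among assignments on the same structure and glossary item, keep the best."""
    best = {}
    for a in assignments:
        key = (a.table, a.column, a.entity, a.logical_field)
        if key not in best or a.score > best[key].score:
            best[key] = a
    return list(best.values())


# Hypothetical data: a column-level match on the term only, plus two entity matches.
incomplete = Assignment("customers", "email", None, "Email", 0.75)
column_hit = Assignment("customers", "email", "Customer", None, 0.5)
table_hit = Assignment("customers", None, "Customer", None, 0.5)

via_column = complete(incomplete, [column_hit, table_hit])  # average: 0.625
via_table = complete(incomplete, [table_hit])               # product: 0.375
kept = deduplicate([via_column, replace(via_column, score=0.25)])
```

Averaging leaves the score between the two inputs, while multiplying can only lower it, so in this sketch an entity inferred from the table is trusted less than one found on the column itself.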

How to manage assignments

To access the assignments in Blindata, click on the button at the top right as shown in the figure.

Assignments location

By default the list shows the assignments in “pending” status, that is, those that require action by a user. The view can be narrowed through the filtering options, accessible through the corresponding button at the top right.

Assignments filters

Once the form is open, you can filter by:

  • State
  • System
  • Data Categories
  • Physical Entities
  • Physical Fields
  • Logical Fields

Assignments filters modal

To return to an unfiltered view, click the “clear” button inside the form. The listed assignments can also be downloaded in CSV format through the dedicated button.

Assignments export

The table allows direct editing of the status of the assignments through the “Actions” section defined for each element.

Assignments threshold configuration

The target logical fields and the data categories to which an assignment refers can also be modified, again from its entry in the list.

Assignments threshold configuration

Clicking the light bulb icon opens a search form in which one of the available logical fields can be selected as the new target.

Automation of assignment evaluation

The page allows you to define thresholds to automate the process of confirming, rejecting or suspending the assignment. To apply the thresholds, click the gear-shaped button as shown in the figure.

Assignments threshold configuration

The resulting form allows the definition of score ranges for the automation of the assignment process.

Assignments threshold configuration

In the example, assignments with scores from 0 to 0.35 are rejected, those from 0.36 to 0.75 are left pending, and those from 0.76 to 1 are confirmed automatically.
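The example ranges can be expressed as a small Python function. This is a sketch under assumptions: the `auto_state` name and its parameters are hypothetical, and treating each upper bound as inclusive is a guess, since the text does not say how a score between 0.35 and 0.36 is handled.

```python
def auto_state(score, reject_max=0.35, pending_max=0.75):
    """Map a score in [0, 1] to an assignment state using the example thresholds.

    Upper bounds are treated as inclusive (an assumption, not documented behaviour).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score <= reject_max:
        return "REFUSED"
    if score <= pending_max:
        return "PENDING"
    return "CONFIRMED"


states = [auto_state(s) for s in (0.20, 0.50, 0.90)]
```

With these thresholds, a score of 0.20 is refused, 0.50 stays pending for a user to review, and 0.90 is confirmed without intervention.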