Automated SQL Lineage

How Automated SQL Lineage works

SQL Lineage module has as a core feature the automatic extraction of data lineage from SQL statement, speeding up documentation and mapping processes. As data lineage is the process of comprehension, storage and viewing of the entire data life cycle. Data lineage has among its purposes to:

  • View the interactions and correlations that data has with each other.
  • Detect potential errors in data processing.
  • Implement changes to data transformation processes with less risk.

Many of the useful information for the reconstruction of the data lineage are inside the SQL statements used for:

  • Creating views
  • Statement contained in stored procedure (es. PL/SQL).
  • Statement of insert, update, merge or create table
  • Statement of select used to extract data and populate tables on other systems.

The manual analysis of these statements requires a a watchful and expert eye and it is an extremely expensive and, in particular, not scalable process.

To overcome this problem SQL Lineage module offers a statement analysis functionality, which allows users to rapidly view the changes that occur inside themselves and to automatically generate data flows between the tables involved.

In addition to simplifying the statement analysis process, SQL lineage module facilitates the creation of flows between connected tables but not directly involved by statements.

Examples of SQL Lineage Analysis

In this paragraph are presented some sql statements examples which participate in the lineage definition.

Create View Statement

Create View SQL Lineage Examples

Insert Statement

Insert SQL Lineage Examples

Update Statement

Update SQL Lineage Examples

Merge Statement

Merge SQL Lineage Examples

Select Statement

Select SQL Lineage Examples

Select Into Statement

Select Into SQL Lineage Examples

Create Table Statement

Create Table SQL Lineage Examples

Definitions

Terms Definitions
Statement A Statement represents a SQL command which is analysed by Parser in oder to gain a Lineage. A Statement can be one of the following status: PENDING, ANALYZED, PARTIALLY_ANALYZED, ERROR, IGNORED, CONFIRMED.A Statement has descriptive attributes such as name and description.The attribute “scheme” indicates the SQL scheme, which is the namespace of a table set. A Statement is contained inside a Bucket.
Bucket A Bucket is a container of Statements. In a Bucket is defined a list of systems from which a Catalog is extracted. A Bucket has descriptive attributes such as name and description.
Catalog Catalog represents a set of metadata about tables and relatives columns inside a database. In Blindata are called respectively Physical Entity e Physical Field.
Parser Parser is the component which allows to analyze a SQL statement, creating the syntax tree e from this generates a Lineage represented in the form of a graph.
SQL Lineage Lineage SQL (or simply Lineage) is a data lineage based on SQL statements analysis. Data lineage is a visual representation of the data path, which highlights how those are manipulated.
Data Flow A DataFlow is a link that can occur among the following entities: Physical Entities, Physical Fields and Systems. A DataFlow represents a data flow from a source to a destination. Thanks to the DataFlow set it is possible to reconstruct the data lineage graph, a representation of the data lifecycle.