Automated SQL Lineage
How Automated SQL Lineage works
SQL Lineage module has as a core feature the automatic extraction of data lineage from SQL statement, speeding up documentation and mapping processes. As data lineage is the process of comprehension, storage and viewing of the entire data life cycle. Data lineage has among its purposes to:
- View the interactions and correlations that data has with each other.
- Detect potential errors in data processing.
- Implement changes to data transformation processes with less risk.
Many of the useful information for the reconstruction of the data lineage are inside the SQL statements used for:
- Creating views
- Statement contained in stored procedure (es. PL/SQL).
- Statement of insert, update, merge or create table
- Statement of select used to extract data and populate tables on other systems.
The manual analysis of these statements requires a a watchful and expert eye and it is an extremely expensive and, in particular, not scalable process.
To overcome this problem SQL Lineage module offers a statement analysis functionality, which allows users to rapidly view the changes that occur inside themselves and to automatically generate data flows between the tables involved.
In addition to simplifying the statement analysis process, SQL lineage module facilitates the creation of flows between connected tables but not directly involved by statements.
Examples of SQL Lineage Analysis
In this paragraph are presented some sql statements examples which participate in the lineage definition.
Create View Statement
Insert Statement
Update Statement
Merge Statement
Select Statement
Select Into Statement
Create Table Statement
Definitions
Terms | Definitions |
---|---|
Statement | A Statement represents a SQL command which is analysed by Parser in oder to gain a Lineage. A Statement can be one of the following status: PENDING, ANALYZED, PARTIALLY_ANALYZED, ERROR, IGNORED, CONFIRMED.A Statement has descriptive attributes such as name and description.The attribute “scheme” indicates the SQL scheme, which is the namespace of a table set. A Statement is contained inside a Bucket. |
Bucket | A Bucket is a container of Statements. In a Bucket is defined a list of systems from which a Catalog is extracted. A Bucket has descriptive attributes such as name and description. |
Catalog | Catalog represents a set of metadata about tables and relatives columns inside a database. In Blindata are called respectively Physical Entity e Physical Field. |
Parser | Parser is the component which allows to analyze a SQL statement, creating the syntax tree e from this generates a Lineage represented in the form of a graph. |
SQL Lineage | Lineage SQL (or simply Lineage) is a data lineage based on SQL statements analysis. Data lineage is a visual representation of the data path, which highlights how those are manipulated. |
Data Flow | A DataFlow is a link that can occur among the following entities: Physical Entities, Physical Fields and Systems. A DataFlow represents a data flow from a source to a destination. Thanks to the DataFlow set it is possible to reconstruct the data lineage graph, a representation of the data lifecycle. |