Import Metadata from PowerBI

PowerBI

This connector imports metadata from PowerBI into Blindata. It uses the PowerBI REST APIs, specifically:

  • PowerBI Metadata Scanning API to download Workspaces metadata.
  • Reports API to download Reports .pbix files, which are used to enrich Reports’ metadata within Blindata.

Requirements and Setup

PowerBI Account Setup

Before proceeding with the setup, ensure you have a PowerBI Pro license. This is required to access PowerBI REST APIs.

To set up a PowerBI Account:

  1. Create an Azure AD app and configure the PowerBI Admin console. Follow these instructions to set up the Azure AD application service principal and configure the PowerBI admin settings.

  2. Log in to PowerBI as an Admin and, from the Tenant settings, enable:

    • Allow service principals to use PowerBI APIs.
    • Allow service principals to use read-only PowerBI admin APIs.
    • Enhance admin APIs responses with detailed metadata.
  3. Go to the Azure AD app registration page, select your application, and add the following permissions (delegated permissions):

    • Dataset.Read.All
    • Report.Read.All

    To enable the download of .pbix files, add the Application as a Contributor to each of the Workspaces containing the Reports you want to download, and also to each of the Workspaces containing the Datasets used by those Reports. You can find detailed information about the limitations when downloading a report .pbix file here.

(Note that Azure Active Directory may appear under its new name, Microsoft Entra ID.)

Connection Definition

To run a PowerBI job, a connection to a PowerBI instance must be created. Follow these steps:

  1. Go to Settings → Agents → Connections → Click on the FAB button.
  2. A modal window will be displayed, prompting the user to select a connection name and choose the “POWER_BI” type.

PowerBI selector

Then the user must fill in all the remaining fields and press “save”.

PowerBI modal

Job Definition

To create a PowerBI job, navigate to Settings → Agents → Schedules and then click on the FAB button to launch the job definition modal.

In the first step, the user must specify the system to which PowerBI’s metadata will be uploaded, and the job type must be set to POWER_BI.

PowerBI job def

In the second step, the user may either accept the auto-generated job name or specify a custom one. They may also choose to set a cron expression to schedule periodic execution of the job.

PowerBI job second step

Finally, in the last step, the user must provide the JSON configuration of the job.

PowerBI last step

Below is a template of the job configuration JSON code:

{
  "connectionName": "",
  "workspaceIds": [],
  "workspacesModifiedSinceDaysNumber": 2,
  "connectionTrialNumber": 5,
  "powerBIReportsSystemName": "",
  "powerBIReportsSystemUuid": "",
  "powerBIDatasetsSystemName": "",
  "powerBIDatasetsSystemUuid": "",
  "blindataSystemPBIDataSourceMappings": [
    {
      "blindataSystemName": "",
      "blindataSystemUuid": "",
      "powerBIDataSourceId": "",
      "powerBIDataSourceType": "",
      "powerBIDataSourceRegex": ""
    }
  ],
  "dataFlowScope": "",
  "getOnlyConsumersOfDataSourcesSpecified": true,
  "analyzeReportsPbix": true,
  "outputOptions": {
    "dryRun": false,
    "targetConnectionName": ""
  },
  "cleanupOptions": {
    "daysFromLastUpdate": 7,
    "allResourcesDeletionEnabled": false,
    "physicalEntitiesDeletionEnabled": false,
    "physicalFieldsDeletionEnabled": false,
    "systemRoutinesDeletionEnabled": false,
    "dataFlowsDeletionEnabled": false
  }
}
  • connectionName: (mandatory) The name of the connection to PowerBI.

  • workspaceIds: (optional) The workspaces to analyze. If not specified, all workspaces accessible by the connection credentials will be analyzed. For organizations with a large number of workspaces, it is suggested to specify them, both for performance reasons and to avoid importing personal or disused workspaces.

  • workspacesModifiedSinceDaysNumber: (optional) If specified, only the workspaces that have been modified in the last X days will be analyzed. Use this option if the job is scheduled, to improve performance and run time. If you need a full scan of all PowerBI metadata, omit this option.

  • connectionTrialNumber: (optional) The number of attempts the connector makes to connect to PowerBI before the job fails (defaults to 5).

  • powerBIReportsSystem(Name or Uuid): (optional) The system in Blindata where Reports metadata will be uploaded.

  • powerBIDatasetsSystem(Name or Uuid): (optional) The system in Blindata where Datasets metadata will be uploaded.

  • blindataSystemPBIDataSourceMappings: (optional) This mapping is used to link PowerBI datasets’ tables to their respective data sources (e.g., database tables), or to link PowerBI reports to the AAS tabular models they use, through dataflows.

    • blindataSystem(Name or Uuid): The Blindata system where data source metadata have been loaded.
    • powerBIDataSource… (one of the following):
      • powerBIDataSourceId: the id of the data source.
      • powerBIDataSourceType: the type of the data source (e.g., AnalysisServices).
      • powerBIDataSourceRegex: a regex to apply on the DataSource json, which has the following structure:
      {
        "connectionDetails": {
          "account": "",
          "classInfo": "",
          "database": "",
          "domain": "",
          "emailAddress": "",
          "kind": "",
          "loginServer": "",
          "path": "",
          "server": "",
          "url": ""
        },
        "connectionString": "",
        "datasourceId": "",
        "datasourceType": "",
        "gatewayId": "",
        "name": ""
      }
      
  • dataFlowScope: (mandatory) The scope name of the dataflows that will be generated by the job.

  • getOnlyConsumersOfDataSourcesSpecified: (optional) if true, only workspaces and reports that utilize the data sources specified in the blindataSystemPBIDataSourceMappings option will be analyzed.

  • analyzeReportsPbix: (optional) if true, the connector will attempt to download and analyze the .pbix file of each report. Enabling this option enriches Report metadata with the utilized dimensions and measures, but it significantly slows down the analysis process.

  • outputOptions:

    • dryRun: (mandatory) leave this field set to false.
    • targetConnectionName: (mandatory) the connection name to upload results on Blindata.
  • cleanupOptions: if enabled, after uploading all metadata to Blindata, the job will execute a cleanup that removes all resources (of both the report and dataset systems) that have not been updated: physical entities, physical fields, routines, and data flows. The automatic cleanup options are:

    • daysFromLastUpdate: (mandatory) the number of days since the last update beyond which elements will be deleted.
    • allResourcesDeletionEnabled: (optional) if set to true, resources of all types will be deleted.
    • physicalEntitiesDeletionEnabled: (optional) if set to true, the physical entities will be deleted.
    • physicalFieldsDeletionEnabled: (optional) if set to true, the physical fields will be deleted.
    • systemRoutinesDeletionEnabled: (optional) if set to true, the system routines will be deleted.
    • dataFlowsDeletionEnabled: (optional) if set to true, the dataflows will be deleted.
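
For example, a minimal configuration that scans a single workspace and maps every AnalysisServices data source to one Blindata system could look like this (all names and ids below are illustrative placeholders, not real values):

```json
{
  "connectionName": "powerbi-connection",
  "workspaceIds": ["11111111-aaaa-bbbb-cccc-000000000001"],
  "powerBIReportsSystemName": "PowerBI Reports",
  "powerBIDatasetsSystemName": "PowerBI Datasets",
  "blindataSystemPBIDataSourceMappings": [
    {
      "blindataSystemName": "AAS Models",
      "powerBIDataSourceType": "AnalysisServices"
    }
  ],
  "dataFlowScope": "powerbi",
  "analyzeReportsPbix": false,
  "outputOptions": {
    "dryRun": false,
    "targetConnectionName": "blindata-connection"
  }
}
```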

Logging PowerBI data sources

Before initiating a full scan PowerBI job or scheduling an incremental one, it’s beneficial to compile a comprehensive list of all data sources utilized across available PowerBI workspaces. To achieve this:

  1. Create a PowerBI job: Initially, generate a PowerBI job. It’s recommended to deactivate the .pbix analysis to expedite the initial run.
  2. Press “TEST”: Execute the job by pressing the “TEST” button.
  3. Review Run Log: After the job concludes, access the run log. This log is accessible under the “RUNS” tab of the Agent detail page.

Following these steps ensures you have a complete inventory of data sources before proceeding to complete the PowerBI Job configuration.

PowerBI to Blindata Model

This section explains how the metadata model in Power BI is mapped to the metadata model in Blindata.

Metadata is available from PowerBI’s scanner APIs, and each entity in PowerBI is mapped to an entity in Blindata as follows:

  • Workspace: A Workspace is not directly mapped to a physical entity in Blindata. Instead, the workspace name is included in the schema names of other entities that represent the resources belonging to the workspace.

  • Dashboard → Physical Entity: A Dashboard is mapped to a physical entity. The name is the dashboard display name, the schema is the workspace name, the table type is DASHBOARD, and additional properties include DASHBOARD_ID (the Id of the dashboard) and DASHBOARD_APP_ID (the Id of the application).

  • Tile → Data Flow: A Tile is not directly mapped to a physical entity in Blindata. Instead, it is represented as a dataFlow from a Dataset to a Dashboard.

  • Report → Physical Entity: A Report is mapped to a physical entity. The name is the report name, the schema is the workspace name, the table type is REPORT, and additional properties include REPORT_ID (the Id of the report, used as a natural key for capturing report changes), APP_ID (the Id of the application), CREATED_BY, CREATED_BY_ID, MODIFIED_BY, MODIFIED_BY_ID, MODIFIED_DATE_TIME, DATASET_ID, DATASET_WORKSPACE_ID, and LINK_REPORT (the URL to the report).

  • Dataset → Physical Entity: A Dataset is mapped to a physical entity. The name is the dataset name, the schema is the workspace name, the table type is DATASET, and additional properties include DATASET_ID, DATASET_CONFIGURED_BY, and creationDate.

  • Table → Physical Entity: A Table is mapped to a physical entity. The name is the table name, the schema is “«workspace name».«dataset name»”, the table type is DATASET_TABLE, and the description contains both the table description and the table expressions. Additional properties include WORKSPACE_ID and DATASET_ID.

  • Column → Physical Field: A Column is mapped to a physical field. The name is the column name, the type is the column dataType, and additional properties include FORMAT_STRING, DATA_CATEGORY, SUMMARYZE_BY, and SORT_BY_COLUMN.

  • Measure → Physical Field: A Measure is mapped to a physical field. The name is the measure name, the type is MEASURE, the description is the measure expression, and additional properties include FORMAT_STRING.

  • DataSource → Physical Entity: The data source and its entities (tables and columns) must already be defined in Blindata’s metadata catalog. The connector parses the source expressions in the dataset tables (written in Power Query M language) and connects the data source entities with the mapped tables and columns using Blindata’s dataFlows.

  • DataFlow: A PowerBI dataflow is not mapped to Blindata.

Metadata available from PowerBI REST APIs includes information extracted from .pbix files to enhance Report metadata and capture information on Datasets utilized within the Report:

  • Measures → Physical Field: A Measure (from the Datasets or from Report’s Model Extension) is mapped to a physical field. Its physical entity is the one to which the Report has been mapped. The name is “«Table name» : «Measure name»”, the type is MEASURE, the description is the measure’s Expression, and additional properties include section_name (indicating the Report’s section where the Measure is utilized) and model_extension_name (the name of the model extension, applied only to the Measures of the Model Extension).

  • Dataset.Table.Columns → Physical Field: A Column from a Dataset’s Table is mapped to a physical field. Its physical entity is the one to which the Report has been mapped. The name is “«Table name» : «Column name»”, the type is the Column’s type, the description is the Column’s description, and additional properties include section_name (indicating the Report’s section where the Column is utilized).
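
The mapping rules above can be made concrete with a short sketch. This is purely illustrative (the function name and the input shapes are assumptions, not the connector's actual code); it shows how the Table → Physical Entity rule composes the schema name and additional properties:

```python
# Illustrative sketch only (not the connector's source code): build a
# Blindata physical-entity record for a PowerBI dataset table, following
# the Table -> Physical Entity mapping rule described above. The input
# dictionaries are hypothetical simplifications of scanner API payloads.
def dataset_table_to_physical_entity(workspace, dataset, table):
    return {
        "name": table["name"],
        # Schema is "<workspace name>.<dataset name>", per the Table rule
        "schema": f"{workspace['name']}.{dataset['name']}",
        "tableType": "DATASET_TABLE",
        "description": table.get("description", ""),
        "additionalProperties": {
            "WORKSPACE_ID": workspace["id"],
            "DATASET_ID": dataset["id"],
        },
    }

entity = dataset_table_to_physical_entity(
    {"id": "ws-1", "name": "Finance"},
    {"id": "ds-1", "name": "Sales"},
    {"name": "Orders", "description": "Order fact table"},
)
print(entity["schema"])  # Finance.Sales
```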

Data Lineage

The PowerBI connector not only imports and maps all PowerBI elements in Blindata (see the previous paragraph for details) but also reconstructs and generates data lineage, represented as Blindata dataflows.

Lineage Within PowerBI

Starting entity Ending entity Description
Report Application
Table Dataset
Dataset Report
Dataset Dataset
Dataset Dashboard The Tile name is included in the description of the dataflow.
(Source System –> Table) Column (Report –> Model Extension) Measure Available only with .pbix analysis. The source system table can originate from a PowerBI dataset or an external data source (e.g., an AAS system). This lineage can be reconstructed in two ways: (a) by analyzing the metadata of the .pbix file; (b) by iterating through the source system associated with the report dataset and checking for matches in the Measure expression. The matching patterns are either 'name' or [name].
(Report –> Model Extension) Measure (Report –> Model Extension) Measure Available only with .pbix analysis. This is the lineage within the Report itself.
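
The two matching patterns mentioned in the table ('name' for quoted table names and [name] for bracketed column names) can be sketched with a pair of regular expressions. This is a simplified illustration, not the connector's actual implementation:

```python
import re

# Hedged sketch: find table and column references in a DAX-style measure
# expression using the two patterns described above. The expression below
# is a made-up example.
expression = "CALCULATE(SUM('Sales'[Amount]), 'Date'[Year] = 2024)"

tables = re.findall(r"'([^']+)'", expression)    # 'name' pattern
columns = re.findall(r"\[([^\]]+)\]", expression)  # [name] pattern

print(tables)   # ['Sales', 'Date']
print(columns)  # ['Amount', 'Year']
```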

Lineage From Source Systems to PowerBI

(Datasource) Starting entity (PowerBI) Ending entity Description
Table Table (Dataset)
Column Column (Dataset)

PowerBI .pbix and Azure Analysis Services (AAS)

This section explains the steps required to correctly retrieve all of a Report’s .pbix metadata when the Report’s Dataset has an AAS tabular model among its Datasources.

  • Each AAS tabular model must be loaded in Blindata within a single system.
  • The schema of the physical entities of an AAS tabular model must be set to its name (the database field on Datasource metadata).
  • The setup for the .pbix analysis must be completed correctly.

In the job configuration JSON code, add an entry to the attribute “blindataSystemPBIDataSourceMappings” with the following values:

  • “blindataSystemName”: the name of the system in which the Tabular Models metadata has been loaded.

  • “blindataSystemUuid” (optional): the uuid of the system in which the Tabular Models metadata has been loaded.

  • “powerBIDataSource…”:

    • powerBIDataSourceId: use this to map each AAS data source to a Blindata system. (Suggested for manual mapping of each AAS database to a Blindata system).
    • powerBIDataSourceType: set this to “AnalysisServices” if you want to map every AAS data source to a single Blindata system.
    • powerBIDataSourceRegex: use this to map a specific set of AAS data sources to a single Blindata system. For example, to map the AAS data sources of one region to a Blindata system, if the region name is specified in the server URL, an example regex is "(?i).*\"server\"\\s*:\\s*\".*regionName.*\".*".

    The DataSource is a Json with the following structure:

    {
      "connectionDetails": {
        "account": "",
        "classInfo": "",
        "database": "",
        "domain": "",
        "emailAddress": "",
        "kind": "",
        "loginServer": "",
        "path": "",
        "server": "",
        "url": ""
      },
      "connectionString": "",
      "datasourceId": "",
      "datasourceType": "",
      "gatewayId": "",
      "name": ""
    }
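
As an illustration of how such a regex selects data sources, the sketch below tests the example pattern (with "westeurope" standing in for regionName) against a hypothetical AAS DataSource. It assumes the regex is applied to the serialized DataSource JSON, which matches the structure shown above; the exact serialization used by the connector may differ:

```python
import json
import re

# Hypothetical DataSource metadata, following the structure shown above
datasource = {
    "connectionDetails": {
        "server": "asazure://westeurope.asazure.windows.net/emea-models",
        "database": "SalesModel",
    },
    "datasourceType": "AnalysisServices",
    "datasourceId": "11111111-2222-3333-4444-555555555555",
}

# The example regex from the job configuration, unescaped from its JSON form
pattern = r'(?i).*"server"\s*:\s*".*westeurope.*".*'

def matches_mapping(ds: dict, regex: str) -> bool:
    """Serialize the DataSource to JSON and test the mapping regex on it."""
    return re.search(regex, json.dumps(ds)) is not None

print(matches_mapping(datasource, pattern))  # True
```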
    

Troubleshooting

You can view the run log of a job in the “RUNS” tab on the Agent detail page. Some logs are linked to specific codes. Refer to the table below for insights into the meaning of each code, along with possible causes and resolutions.

Code Description Possible Solutions
<1> The job failed to delete the temporary CSV file used to upload information to Blindata. Contact support.
<2> The job failed to download the AAS data catalog from Blindata. Verify the job’s target connection and test it to ensure everything is functioning correctly.
<3> The job failed to download PowerBI Workspaces Metadata using PowerBI’s scanners API. Verify the job’s connection and test it to ensure everything is functioning correctly. Check if the PowerBI user specified in the connection has the right permissions.
<4> , <5> The job failed to download the Report .pbix file. Ensure that the PowerBI user specified in the connection has the necessary permissions to download the .pbix file. Also, verify that the report was not created using the web interface, as the option to download the .pbix file should be visible in the Report web interface.
<6> The job cannot find a dataset datasource in the Workspace metadata. The datasource could be a temporary file, or the PowerBI user specified in the connection does not have the necessary permissions.
<7> The job failed to download the data catalog for an AAS system. The AAS datasource (or the job target connection) is not properly configured.
<8> The workspace metadata does not contain any dataset tables. The dataset could be generated from a temporary file, or the PowerBI user specified in the connection does not have the necessary permissions.
<9> The job could not retrieve tables for a report dataset. Verify that the dataset exists and is accessible. Check if the dataset is using DirectQuery or a live connection that the job might not support. Ensure the PowerBI user has the necessary permissions.
<10> The report does not contain the Dataset Id information. It could be an empty report or one connected to a Live Dataset (not supported).
<11> The job could not find data source tables references in an expression of a report measure. Check if the datasource contains the expected tables. Check if the datasources job configurations are correct.
<12> The job could not find the data source table column used by the report measure. Check if the datasource contains the expected tables. Check if the datasources job configurations are correct.
<13> The job configuration to download the data source catalog from Blindata is missing. Ensure that the job configuration includes the correct settings to retrieve the data source catalog. If required, update the job parameters in the configuration.
<14> The parsing of a PowerQueryM expression has failed. Verify that the PowerQueryM expression is correctly formatted. Check for syntax errors or unsupported expressions. See here for more details.

Power Query M parsing

Both the PowerBI and the Azure Analysis Services connectors use a custom parser to analyze Power Query M code and extract lineage information from data source tables to the model’s tables. This grammar defines a parser for a subset of the Power Query M language, supporting variable declarations, expressions, data access, and table manipulations. It follows a structured approach with a few key characteristics:

Supported Language Constructs

  1. Variable Declarations & Assignments

    • Variables can be declared using the let keyword.
    • Assignments are performed using the = symbol.
    • Variable names can consist of letters, numbers, underscores, hyphens, and slashes.
    • Quoted variable names are supported using #" or ", followed by the name, and ending with ".
  2. Expressions

    • Expressions are separated by commas.
    • Expressions support:
      • Assigning variables to other variables.
      • Assigning variables to database access operations.
      • Assigning variables to table manipulation operations.
  3. Database Access

    • Built-in keywords for database connections (MySQL.Database, Oracle.Database, Snowflake.Databases).
    • Database access operations accept arguments in parentheses, including:
      • The database name.
      • The table or dataset name.
      • An optional list of record fields.
  4. Table Manipulations

    • Specific operations on tables are supported using a predefined syntax.
    • Table manipulations follow a structured format and reference an existing variable.
  5. Record Structures

    • Defined using square brackets [...].
    • Contain key-value pairs separated by =.
    • Can include variable references or database queries.
  6. Quoted Strings

    • Special handling for quoted variables enclosed within #" or ".
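
To give a feel for the database-access construct, here is a heavily simplified sketch of this kind of extraction. It is illustrative only (the real connector uses a full grammar, not a single regex) and only recognizes the built-in connector keywords listed above:

```python
import re

# Simplified sketch: extract database-access calls from a Power Query M
# "let" expression. Returns (connector, first argument, second argument)
# tuples, e.g. the server and table names passed to the call.
DB_ACCESS = re.compile(
    r'(MySQL\.Database|Oracle\.Database|Snowflake\.Databases)'
    r'\(\s*"([^"]*)"\s*,\s*"([^"]*)"'
)

def extract_sources(m_code: str):
    """Return the database-access calls found in the M code."""
    return DB_ACCESS.findall(m_code)

code = '''
let
    source = Oracle.Database("db_server", "sales_data"),
    sales = source{[data]}
in
    sales
'''
print(extract_sources(code))  # [('Oracle.Database', 'db_server', 'sales_data')]
```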

Examples

  1. Variable Assignment with Database Access

    let
      customers = MySQL.Database("server_name", "customers_table")
    in
      customers
    

    Explanation:

    • Declares a variable customers.
    • Assigns it to a database connection (MySQL.Database).
    • Specifies the server and table as arguments.
    • Returns customers at the end.
  2. Variable Assignment with Another Variable and Data Access

    let
        source = Oracle.Database("db_server", "sales_data"),
        sales = source{[data]}
    in
        sales
    

    Explanation:

    • source is assigned to an Oracle database connection.
    • sales extracts the data field from source.
    • Returns sales as the final output.
  3. Table Manipulation with a Quoted Variable

    let
        #"Filtered Sales" = Table.SelectRows(sales, each [Amount] > 1000)
    in
        #"Filtered Sales"
    

    Explanation:

    • A quoted variable (#"Filtered Sales") is used as a variable name.
    • Assigns it to a table filtering operation on sales, selecting rows where Amount > 1000.
    • Returns the filtered table.