Import Metadata from PowerBI

PowerBI

This connector is designed to import metadata from PowerBI into Blindata. It uses PowerBI REST API, specifically:

  • PowerBI Metadata Scanning API to download Workspaces metadata.
  • Reports API to download Reports .pbix files, which are used to enrich Reports’ metadata within Blindata.

Requirements and Setup

PowerBI Account Setup

Before proceeding with the setup, ensure you have a PowerBI Pro license. This is required to access PowerBI REST APIs.

To set up a PowerBI Account:

  1. Create an Azure AD app and configure the PowerBI Admin console. Follow these instructions to set up the Azure AD application service principle and configure PowerBI admin settings.

  2. Log in to PowerBI as an Admin and from Tenant setting allow:

    • Allow service principles to use PowerBI APIs.
    • Allow service principles to use read-only PowerBI admin APIs.
    • Enhance admin APIs responses with detailed metadata.
  3. Go to the Azure AD app registration page, select your application, and add the following permissions (delegate permissions):

    • Dataset.Read.All
    • Report.Read.All

    To enable the download of .pbix files, add the Application as a Contributor to each of the Workspaces where the Reports you want to download are, and also to each of the Workspaces where the Dataset used by the Reports are. You can find detailed information about the limitations when downloading a report .pbix filehere .

(In some cases you may find Azure Active Directory renamed as Microsoft Entra ID.)

Connection Definition

To run a PowerBI job, a connection to a PowerBI instance must be created. Follow these steps:

  1. Go to Settings → Agents → Connections → Click on the FAB button.
  2. A modal window will be displayed, prompting the user to select a connection name and choose the “POWER_BI” type.

PowerBI selector

Then the user must fill in all the remaining fields and press “save”. PowerBI modal

Job Definition

To create a PowerBI job, navigate to Settings → Agents → Schedules and then click on the FAB button to launch the job definition modal.

In the first step, the user must specify the system on which PowerBI’s metadata will be uploaded, and the job type must be set to POWER_BI.

PowerBI job def

In the second step, the user may either accept the auto-generated job name or specify a custom one. They may also choose to set a cron expression to schedule periodic execution of the job.

PowerBI job second step

Finally, in the last step, the user must provide the JSON configuration of the job.

PowerBI last step

Below is a template of the job configuration JSON code:

{
  "connectionName": "",
  "workspaceIds": [],
  "workspacesModifiedSinceDaysNumber": 2,
  "connectionTrialNumber": 5,
  "powerBIReportsSystemName": "",
  "powerBIReportsSystemUuid": "",
  "powerBIDatasetsSystemName": "",
  "powerBIDatasetsSystemUuid": "",
  "blindataSystemPBIDataSourceMappings": [
    {
      "blindataSystemName": "",
      "blindataSystemUuid": "",
      "powerBIDataSourceId": "",
      "powerBIDataSourceType": "",
      "powerBIDataSourceRegex": ""
    }
  ],
  "dataFlowScope": "",
  "getOnlyConsumersOfDataSourcesSpecified": true,
  "analyzeReportsPbix": true,
  "outputOptions": {
    "dryRun": false,
    "targetConnectionName": ""
  },
  "cleanupOptions": {
    "daysFromLastUpdate": 7,
    "allResourcesDeletionEnabled": false,
    "physicalEntitiesDeletionEnabled": false,
    "physicalFieldsDeletionEnabled": false,
    "systemRoutinesDeletionEnabled": false,
    "dataFlowsDeletionEnabled": false
  }
}
  • connectionName: (mandatory) The name of the connection to PowerBI.

  • workspaceIds: (optional) The workspaces to analyze.If not specified, all workspaces accessible by the connection credentials will be analyzed. For organizations with a large number of workspaces, it is suggested to specify them for performance reasons and to avoid importing personal or disused workspaces.

  • workspacesModifiedSinceDaysNumber: (optional) If specified, only the workspaces that have been modified in the last X days will be analyzed. Use this option if the job is scheduled, to improve performance and run time. If you need a full scan of all PowerBI metadata, omit this option.

  • connectionTrialNumber: (optional) The number of attempts the connector makes to connect to PowerBI before the job fails (Default to 5).

  • powerBIReportsSystem(Name or Uuid): (optional) The system in Blindata where Reports metadata will be uploaded.

  • powerBIDatasetsSystem(Name or Uuid): (optional) The system in Blindata where Datasets metadata will be uploaded.

  • blindataSystemPBIDataSourceMappings: (optional) This mapping is used to link PowerBI datasets’ tables to their respective data sources (e.g., database tables) or to link PowerBI reports to their used AAS tabular models, with dataflows.

    • blindataSystem(Name or Uuid): The Blindata system where data source metadata have been loaded.
    • powerBIDataSource… (one of the following):
      • powerBIDataSourceId: the id of the data source.
      • powerBIDataSourceType: the type of the data source (E.g. AnalysisServices).
      • powerBIDataSourceRegex: a regex to apply on the DataSource json, which has the following structure:
      {
        "connectionDetails": {
          "account": "",
          "classInfo": "",
          "database": "",
          "domain": "",
          "emailAddress": "",
          "kind": "",
          "loginServer": "",
          "path": "",
          "server": "",
          "url": ""
        },
        "connectionString": "",
        "datasourceId": "",
        "datasourceType": "",
        "gatewayId": "",
        "name": ""
      }
      
  • dataFlowScope: (mandatory) the scope name of dataflows that will be generated by the Job.

  • getOnlyConsumersOfDataSourcesSpecified: (optional) if true, only workspaces and reports that utilize the data sources specified in the blindataSystemPBIDataSourceMappings option will be analyzed.

  • analyzeReportsPbix: (optional) if true, the connector will attempt to download and analyze the .pbix file of each report. Enabling this option enriches Report metadata with the utilized dimensions and measures, but it significantly slows down the analysis process.

  • outputOptions:

    • dryRun: (mandatory) leave this field set to false.
    • targetConnectionName: (mandatory) the connection name to upload results on Blindata.
  • cleanupOptions: if enabled, the job, after uploading all metadata to Blindata, will execute a cleanup. This will eliminate all the resources (of both report and dataset systems) that have not been updated (physical entities, physical fields, routines and data flows). The automatic cleanup options are:

    • daysFromLastUpdate: (mandatory) this option specifies the number of days since the last update, beyond which elements will be deleted if they have not been updated.
    • allResourcesDeletionEnabled: if set to true, resources of all types will be deleted.
    • physicalEntitiesDeletionEnabled: (optional) if set to true, the physical entities will be deleted.
    • physicalFieldsDeletionEnabled: (optional) if set to true, the physical fields will be deleted.
    • systemRoutinesDeletionEnabled: (optional) if set to true, the system routines will be deleted.
    • dataFlowsDeletionEnabled: (optional) if set to true, the dataflows will be deleted.

Logging PowerBI data sources

Before initiating a full scan PowerBI job or scheduling an incremental one, it’s beneficial to compile a comprehensive list of all data sources utilized across available PowerBI workspaces. To achieve this:

  1. Create a PowerBI job: Initially, generate a PowerBI job. It’s recommended to deactivate the .pbix analysis to expedite the initial run.
  2. Press “TEST”: Execute the job and press the “TEST” button.
  3. Review Run Log: After the job concludes, access the run log. This log is accessible under the “RUNS” tab of the Agent detail page.

Following these steps ensures you have a complete inventory of data sources before proceeding to complete the PowerBI Job configuration.

PowerBI to Blindata Model

This section explains how the metadata model in Power BI is mapped to the metadata model in Blindata.

Metadata is available from PowerBI’s scanners APIs , and each entity in PowerBI is mapped to an entity in Blindata as follows:

  • Workspace: A Workspace is not directly mapped to a physical entity in Blindata. Instead, the workspace name is included in the schema names of other entities that represent the resources belonging to the workspace.

  • Dashboard → Physical Entity: A Dashboard is mapped to a physical entity. The name is the dashboard display name, the schema is the workspace name, the table type is DASHBOARD, and additional properties include DASHBOARD_ID (the Id of the dashboard) and DASHBOARD_APP_ID (the Id of the application).

  • Tile → Data Flow: A Tile is not directly mapped to a physical entity in Blindata. Instead, it is represented as a dataFlow from a Dataset to a Dashboard.

  • Report → Physical Entity: A Report is mapped to a physical entity. The name is the report name, the schema is the workspace name, the table type is REPORT, and additional properties include REPORT_ID (the Id of the report, used as a natural key for capturing report changes), APP_ID (the Id of the application), CREATED_BY, CREATED_BY_ID, MODIFIED_BY, MODIFIED_BY_ID, MODIFIED_DATE_TIME, DATASET_ID, DATASET_WORKSPACE_ID, and LINK_REPORT (the URL to the report).

  • Dataset → Physical Entity: A Dataset is mapped to a physical entity. The name is the dataset name, the schema is the workspace name, the table type is DATASET, and additional properties include DATASET_ID, DATASET_CONFIGURED_BY, and creationDate.

  • Table → Physical Entity: A Table is mapped to a physical entity. The name is the table name, the schema is “«workspace name».«dataset name»”, the table type is DATASET_TABLE, and the description contains both the table description and the table expressions. Additional properties include WORKSPACE_ID and DATASET_ID.

  • Column → Physical Field: A Column is mapped to a physical field. The name is the column name, the type is the column dataType, and additional properties include FORMAT_STRING, DATA_CATEGORY, SUMMARYZE_BY, and SORT_BY_COLUMN.

  • Measure → Physical Field: A Measure is mapped to a physical field. The name is the measure name, the type is MEASURE, the description is the measure expression, and additional properties include FORMAT_STRING.

  • DataSource → Physical Entity: The data source and its entities (tables and columns) must already be defined in Blindata’s metadata catalog. The connector parses the source expressions in the dataset tables (written in Power Query M language) and connects the data source entities with the mapped tables and columns using Blindata’s dataFlows.

  • DataFlow: A PowerBI’s dataflow is not mapped to Blindata. The entities mapped on Blindata are linked with DataFlows, such as DataSource.Table.Column → Dataset.Table.Column, DataSource.Table.Column → Dataset.Table.Measure, Dataset.Table → Dataset, and Dataset → Report.

Metadata available from PowerBI REST APIs includes information extracted from .pbix files to enhance Report metadata and capture information on Datasets utilized within the Report:

  • Measures → Physical Field: A Measure (from the Datasets or from Report’s Model Extension) is mapped to a physical field. Its physical entity is the one to which the Report has been mapped. The name is “«Table name» : «Measure name»”, the type is MEASURE, the description is the measure’s Expression, and additional properties include section_name (indicating the Report’s section where the Measure is utilized) and model_extension_name (the name of the model extension, applied only to the Measures of the Model Extension).

  • Dataset.Table.Columns → Physical Field: A Column from a Dataset’s Table is mapped to a physical field. Its physical entity is the one to which the Report has been mapped. The name is “«Table name» : «Column name»”, the type is the Column’s type, the description is the Column’s description, and additional properties include section_name (indicating the Report’s section where the Column is utilized).

Entities mapped on Blindata are linked with DataFlows as follows: DataSource.Table.Column → Report.Column, and DataSource.Table.Measure → Report.Measure.

PowerBI and Azure Analysis Services (AAS)

This section explains the steps that must be performed to correctly retrieve all Report’s .pbix metadata when the Report’s Dataset has an AAS tabular model in its Datasources.

  • Each AAS tabular model must be loaded in Blindata within a single system.
  • The schema of the physical entities of an AAS tabular model must be set to its name (the database field on Datasource metadata).
  • The setup for the .pbix analysis must be completed correctly .

In the job configuration JSON code, add an entry to the attribute “blindataSystemPBIDataSourceMappings” with the following values:

  • “blindataSystemName”: the name of the system in which the Tabular Models metadata has been loaded.

  • “blindataSystemUuid” (optional): the uuid of the system in which the Tabular Models metadata has been loaded.

  • “powerBIDataSource…”:

    • powerBIDataSourceId: use this to map each AAS data source to a Blindata system. (Suggested for manual mapping of each AAS database to a Blindata system).
    • powerBIDataSourceType: set this to “AnalysisServices” if you want to to map every AAS data source to a single Blindata system.
    • powerBIDataSourceRegex: use this to map a specific set of AAS data sources to a single Blindata system. For example, to map AAS data sources of one region to a Blindata system, if the region name is specified in the server URL, an example of regex can be "(?i).*\"server\"\\s*:\\s*\".*regionName.*\".*".

    The DataSource is a Json with the following structure:

    {
      "connectionDetails": {
        "account": "",
        "classInfo": "",
        "database": "",
        "domain": "",
        "emailAddress": "",
        "kind": "",
        "loginServer": "",
        "path": "",
        "server": "",
        "url": ""
      },
      "connectionString": "",
      "datasourceId": "",
      "datasourceType": "",
      "gatewayId": "",
      "name": ""
    }