Data Catalog

Introduction to the Data Catalog

Welcome to Blindata’s Data Catalog – your comprehensive solution for cataloging and managing data assets. The Data Catalog module is designed to facilitate the collection of metadata from various data repositories, providing a centralized view of structured, semi-structured, and unstructured data, including databases, documents, dashboards, reports, and machine learning models.

At Blindata, we believe in keeping things simple. The Data Catalog has a three-layer structure: Systems contain Physical Entities, which in turn contain Fields. A Physical Entity can represent any data container, such as a report, a table, or the schema of a CSV file. Fields represent attributes and can be nested for finer granularity. Relationships between data assets are captured through Data Flows and through constraints such as foreign keys. This straightforward structure keeps the management of your data assets intuitive and efficient.
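
As an illustration only, the three-layer structure can be sketched with a few Python dataclasses. The class names, attributes, and path format below are assumptions made for this example; they are not Blindata's internal model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Field:
    """An attribute of a Physical Entity; may hold nested sub-fields."""
    name: str
    data_type: str = "unknown"
    children: list[Field] = field(default_factory=list)

    def paths(self, prefix: str = "") -> Iterator[str]:
        """Yield a dotted path for this field and for every nested field."""
        path = f"{prefix}.{self.name}" if prefix else self.name
        yield path
        for child in self.children:
            yield from child.paths(path)


@dataclass
class PhysicalEntity:
    """A data container such as a table, a report, or a CSV schema."""
    name: str
    fields: list[Field] = field(default_factory=list)


@dataclass
class System:
    """A repository that stores or processes data."""
    name: str
    entities: list[PhysicalEntity] = field(default_factory=list)

    def field_paths(self) -> list[str]:
        """List every field as system/entity/field.path (illustrative format)."""
        return [
            f"{self.name}/{entity.name}/{path}"
            for entity in self.entities
            for f in entity.fields
            for path in f.paths()
        ]


# Hypothetical example data.
crm = System(
    name="crm_db",
    entities=[
        PhysicalEntity(
            name="customers",
            fields=[
                Field("id", "integer"),
                Field(
                    "address",
                    "struct",
                    children=[Field("city", "string"), Field("zip", "string")],
                ),
            ],
        )
    ],
)

for path in crm.field_paths():
    print(path)
```

The nested `address` field shows how a Field can contain further Fields while still belonging to a single Physical Entity.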

A 360-Degree View of Your Data Assets

Blindata’s Data Catalog serves as a collection point for the different aspects of your data assets, offering a holistic perspective. It helps you get more value from your data by providing a unified view across different types of data.

  • Semantic Linking for Interoperability and Composability: Ensure the interoperability and composability of your data assets by semantically linking them with terms from the business glossary. This feature aligns your data with organizational terminology and business rules, preserving the meaningful connections between different data elements.

  • Lineage Module: Explore data interactions and correlations through the Lineage module. Automatically extract data lineage from SQL statements, visualizing how data flows and relates within your organization.

  • Classification Module: Connect your Data Catalog to the business glossary effortlessly using user-defined rules. Blindata provides tools for classifying and assigning metadata to enhance understanding and governance.

  • Quality Monitoring Framework: Actively monitor data quality using Blindata’s dedicated framework. Define Key Quality Indicators (KQIs), extract metrics from connected systems, and track data quality trends over time.

  • Issue Management: Effectively manage and report issues directly within the platform. Encourage communication and collaboration among users, facilitating maintenance, reporting, and remediation activities.

  • Personalized Dashboards: Tailor your workspace with a personalized dashboard. Quickly access favorite resources and stay informed about assigned issues and quality checks.

  • ER Data Model Visualization: Explore the relational data model with Blindata. Utilize information from connected systems or user-defined inputs to display keys, indices, and other constraints in tables.
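
To make the idea of classifying with user-defined rules more concrete, here is a minimal sketch that links field names to business glossary terms through regular expressions. The rule format, the `ClassificationRule` class, and the `classify` function are hypothetical and written for this example; Blindata's actual rule definitions are described in the Classification section of the guide and may work differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical rule: fields whose name matches `pattern` get `glossary_term`."""
    glossary_term: str
    pattern: str

    def matches(self, field_name: str) -> bool:
        # Case-insensitive search anywhere in the field name.
        return re.search(self.pattern, field_name, flags=re.IGNORECASE) is not None


def classify(field_names: list[str], rules: list[ClassificationRule]) -> dict[str, list[str]]:
    """Return, for each field name, the glossary terms of all matching rules."""
    return {
        name: [rule.glossary_term for rule in rules if rule.matches(name)]
        for name in field_names
    }


# Example rules and field names (all invented for illustration).
rules = [
    ClassificationRule("Email Address", r"e?mail"),
    ClassificationRule("Customer Identifier", r"^(customer|cust)_?id$"),
]

links = classify(["CUSTOMER_ID", "contact_email", "created_at"], rules)
for field_name, terms in links.items():
    print(field_name, "->", terms)
```

Fields that match no rule, such as `created_at` here, are returned with an empty list so they can be reviewed manually.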

Using the Data Catalog

How you populate the Data Catalog depends on the maturity and complexity of your organization’s data management processes. Choose the approach that matches your needs:

  1. Manual Entry via UI or Spreadsheet Upload:
    • Suitable for organizations in the early stages of data governance.
    • Catalog data manually through the user interface or by uploading spreadsheets.

  2. Integrated Metadata Crawlers (Blindata Agent Connectors):
    • Ideal for organizations at an intermediate level of maturity.
    • Use Blindata’s integrated metadata crawlers to automate cataloging from various sources.

  3. Data Mesh Approach (API Integration from Data Ops Pipelines):
    • Recommended for advanced organizations adopting a data mesh philosophy.
    • Shift metadata responsibilities left: upload metadata via APIs directly from data operations pipelines.

Select the approach that best fits your organization’s needs and progress towards effective data governance and compliance.
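
Whichever approach you choose, the metadata ends up in the same System, Physical Entity, and Field hierarchy. The sketch below groups flat, spreadsheet-style rows into a nested structure that a pipeline could serialize as JSON before sending it to an API. The column names and the payload keys are assumptions made for this example; the actual import template and API schema are documented in the dedicated sections of this guide, and no upload call is shown here.

```python
from __future__ import annotations

import csv
import io
import json

# Hypothetical spreadsheet export: one row per field.
SHEET = """system,entity,field,type
crm_db,customers,id,integer
crm_db,customers,email,string
crm_db,orders,id,integer
"""


def rows_to_payload(csv_text: str) -> list[dict]:
    """Group flat rows into System -> Physical Entity -> Field records."""
    systems: dict[str, dict[str, list[dict]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entities = systems.setdefault(row["system"], {})
        entities.setdefault(row["entity"], []).append(
            {"name": row["field"], "type": row["type"]}
        )
    # The key names below are illustrative, not an official schema.
    return [
        {
            "system": system,
            "physicalEntities": [
                {"name": entity, "fields": fields}
                for entity, fields in entities.items()
            ],
        }
        for system, entities in systems.items()
    ]


payload = rows_to_payload(SHEET)
print(json.dumps(payload, indent=2))
```

Keeping the grouping step separate from the upload step means the same function can feed a manual spreadsheet import or an automated pipeline job.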

For more detailed instructions, please refer to the specific sections of this user guide.

Glossary of Common Terms

As you embark on your journey with Blindata’s Data Catalog, it’s essential to familiarize yourself with key terms that will enhance your understanding of the user guide. Here are some common terms you may encounter:

| Term | Definition |
|---|---|
| System | A physical repository responsible for the storage or processing of data. |
| Physical Entity | Represents an entity within a system, capturing a distinct unit of information. |
| Physical Field | Represents an attribute associated with a Physical Entity, providing details about a specific characteristic. |
| System Routine | Represents a stored procedure or stored function within a system, encapsulating a set of instructions for processing data. |
| Physical Constraint | Represents a rule defined within a system, typically applied to the values represented by a Physical Entity or a Physical Field. |
| Data Flow | Represents the flow of data between elements of the business glossary and the data catalog, illustrating how information moves within your organization. |

Understanding these terms is crucial for navigating the functionalities of Blindata’s Data Catalog effectively. If you encounter any unfamiliar terminology while using the platform, refer back to this glossary for quick clarification.

Happy data cataloging!