Semantic Link as a Bridge Between Power BI and Python
Integrating business logic from Power BI models with the flexibility of Python within Microsoft Fabric previously required duplicating data structures. With the introduction of the Semantic Link feature, this barrier has been eliminated, creating a direct bridge between the semantic layer and the data science environment.
This solution allows DAX measures, relationships, and metadata to be read directly in notebooks. This enables the use of predefined business rules in data cleaning processes and machine learning model development, ensuring consistent results and reducing the time required to implement advanced analytics.
Business Scenario Plan – Automated Data Consistency Audit
-
- Business Objective: Ensure full data consistency between the analytical layer (Lakehouse) and the presentation layer (Power BI). The goal is to eliminate errors resulting from so-called data drift or incorrect DAX logic, which undermine confidence in reports.
- Result: Semantic Link was used as a verification bridge. It enables the simultaneous retrieval of data directly from the SQL Endpoint and final metrics from the semantic model within a single process (Notebook). This allows for the automatic comparison of results with report outputs, enabling proactive management of quality control, data auditing, and data reliability.
The sempy Library: Access to DAX Measures and Automatic Relationship Detection
The Sempy library serves as the technical foundation of Semantic Link, enabling direct interaction with Power BI semantic models from within Python. It treats the model not merely as a data source, but as a structure with its own logic and relationships.[1]
-
- Executing DAX queries: The evaluate_dax() function allows you to directly retrieve the results of measures and calculations defined in a Power BI model. This enables you to reuse business logic in a Python environment without having to duplicate it.
- Resource inventory: Using the list_datasets() and list_tables() functions, you can retrieve a list of available semantic models and tables within Microsoft Fabric workspaces.
- Fast table import: The read_table() function allows you to load entire tables from a model directly into a dataframe, eliminating the need to write SQL queries for standard data retrieval operations.
- Integration with the analytics ecosystem: Operation results are natively loaded into DataFrame structures (Pandas/Spark). This enables the immediate use of libraries such as Scikit-learn, Matplotlib, or Seaborn to build ML models and advanced visualizations.
- Connection management: The module serves as the primary communication interface between the notebook and the semantic layer, allowing for dynamic resolution of report IDs (e.g., via the `resolve_dataset_id` function).
- Using Semantic Link preserves a “single source of truth,” as any change in the business logic of the Power BI model is automatically reflected in the analyses performed in the notebooks.
[1] https://learn.microsoft.com/en-us/fabric/data-science/read-write-power-bi-python
Sempy Modules: Structure Management, Quality Control, and Model Visualization
The Sempy library is divided into several key modules that support work with the semantic layer. Below is an overview of the main areas, along with examples of their applications.
- sempy.fabric – Interaction with Power BI. This module is used to communicate with workspaces and models.[2]
-
- list_datasets() / list_tables(): Retrieves a list of available resources. Use case: Automatic inventory of models in large Fabric environments.
- evaluate_dax(dataset, query): Executing DAX queries directly from a notebook. Example: Retrieving the result of the [Total Sales] measure broken down by region into a Pandas dataframe.
- read_table(dataset, table): Loading entire tables from the model. Use case: Quick data import without having to write SQL queries.
- sempy.relationships – Structure management. The module allows you to work with relationships defined in the semantic model.[3]
-
- list_relationships(): Display relationships between tables (foreign keys, cardinality). Use case: Verifying schema consistency before starting the analysis.
- find_relationships(df_list): Automatically suggests potential relationships between loaded data frames. Example: Detecting missing relationships between new data from the Lakehouse and the existing model.
- sempy.dependencies – Logic and quality analysis. Tools for examining functional dependencies within data.[4]
-
- find_dependencies(df): Detects “one-to-one” or “many-to-one” dependencies between columns. Example: Checking whether each ProductID is assigned only one ProductName.
- plot_dependency_violations(df, col1, col2): Visualization of records that violate business rules. Application: Quickly identifying errors in source data (e.g., a single ZIP code assigned to two different cities).
[2] https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.fabric?view=semantic-link-python
[3] https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.relationships?view=semantic-link-python
[4] https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.dependencies?view=semantic-link-python
Runtime
Understanding the technical foundations of the operations performed by Semantic Link is essential for properly configuring and optimizing analytical processes within the platform.
The runtime environment is a dedicated set of computing resources on which the instructions contained in the notebook are physically executed. In the Microsoft Fabric ecosystem, Spark code and sempy library functions are executed within computing capacity units assigned to a specific workspace. This means that the entire execution of queries and transformations takes place within the organization’s secure cloud infrastructure, eliminating the need to transfer data outside the dedicated security layer.[5]
The sempy library acts as a bridge that allows interaction with the semantic model without having to leave the notebook runtime. All calls, such as metadata retrieval or DAX calculations, are processed by the Power BI engine integrated with Fabric, which ensures consistency of results with final reports.
It is worth noting that although the `evaluate_dax()` function typically returns results as a Pandas DataFrame, this does not limit the analyst. Thanks to native integration with the Spark engine, it is possible to instantly convert data into a Spark DataFrame (e.g., using spark.createDataFrame(df)), which opens the door to distributed processing on large datasets. Once the results are loaded into the runtime memory, the data can also be loaded into other high-performance libraries, such as Polars, providing complete freedom in choosing tools based on the project’s scale and performance requirements.
[5] https://learn.microsoft.com/en-us/fabric/data-engineering/runtime
Debugging semantic links (sempy)
- Error: AttributeError (Incorrect function name)
-
- Symptom: The message “module ‘sempy.fabric’ has no attribute ‘…’”.
- Cause: A typo in the name of the called function or an attempt to use a method from the wrong module (e.g., searching for list_datasets in sempy.viz).
- Solution: Verify the spelling (e.g., replace ‘eval_dax’ with the correct form ‘evaluate_dax’) and check that the imported module is correct. [6]
- Error: FabricHTTPError (Resource not accessible)
-
- Symptom: “PowerBI_Notebook_DatasetNotFound” error message.
- Cause: The resolve_dataset_id() function failed to find a dataset with the specified name in the current workspace, or the user lacks the necessary permissions.
- Solution: Ensure that the dataset name matches the one visible in Microsoft Fabric and verify access to the given Workspace. [7]
- Error: TypeError (Invalid input parameters)
-
- Symptom: “evaluate_dax() missing 1 required positional argument: ‘query’” message.
- Cause: Passing the wrong number of arguments or data of the wrong type to the function.
- Solution: You must check the function signature; for evaluate_dax, both the model ID and the DAX query content must be provided as a string. [8]
- Error: RuntimeError / MdxScript Error (DAX syntax error)
-
- Symptom: The message “Calculation error in measure…” appears. [
- Cause: A DAX query has been submitted that is logically incorrect for the Power BI engine (e.g., a reference to a non-existent measure or table).
- Solution: It is recommended to test and validate DAX queries in DAX Studio before implementing them in the Python notebook.
- Error: Data Type Mismatch
-
- Symptom: ‘null’ values or incorrect results appear in the Spark dataframe after conversion.
- Cause: Lack of automatic mapping of types returned by DAX to the expected structures in the Spark environment.
- Solution: Explicitly cast types using the .cast() function (e.g., to “double” or “date”) immediately after creating a DataFrame from the result of a DAX operation.
[6] https://docs.python.org/3/library/exceptions.html
[7] https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-power-bi?tabs=sql
[8] https://docs.python.org/3/tutorial/errors.html
Example of using Semantic Link in Microsoft Fabric:
Automatic detection of discrepancies between the semantic model and the report
Have you ever had a situation where report development was complete, all tests passed successfully, and only after some time did the client discover discrepancies between the data warehouse and the Power BI report?
That’s exactly what Semantic Link is for.
To compare the differences between the data warehouse and the Power BI report, you need to retrieve monthly sales data from the semantic model. You can pass almost any DAX code to the “dax_query” operator; however, for the purposes of this example, we’re retrieving monthly sales data.
Next, a connection is established with the SQL Endpoint on the Lakehouse side. Data is retrieved from the daily sales view, aggregated by month, and compared with sales from the semantic model. The two datasets are compared with an additional caveat regarding null values. The search looks for sales that appear in the model but not in the Lakehouse view, as well as the reverse. Null values are replaced with 0 so that the data comparison does not return a null value.
The result is shown below:
The table should then be placed in the email body. However, this approach is not suitable for large tables. Therefore, the top 10 differences are selected, converted into an HTML table, and a dynamic JSON object is then generated. The JSON will be used during the data pipeline to dynamically render the table in the email body.
Selecting a pipeline to handle notifications allows you to separate data processing logic from the communication layer. Using the platform’s native connectors ensures a higher level of security and facilitates centralized authorization management and monitoring of operation status. This approach aligns with the principle of separation of responsibilities, where the notebook handles computations and the pipeline handles orchestration and alerting.
When configuring the pipeline, if the notebook encounters an error, it will attempt to run two more times:
The notebook is then attached:
Next, add a new action called “Office 365 Email” and set it to run after the notebook has completed successfully. Use the email account for configuration and enter dynamic expressions retrieved from the notebook.
The email subject line needs to be copied from a notepad, which is why the phrase “An error was found in the report” is combined with the report’s name. The report is sent to recipients in Poland, so it contains Polish-specific terms, but these can be modified as needed.
Next, a table is generated from the email body.
The final result looks like this:
Summary and conclusions
The approach to data testing automation presented here demonstrates the real value that Semantic Link brings to the Microsoft Fabric ecosystem. By directly comparing the state of the semantic model with the data in the SQL Endpoint, it becomes possible to instantly detect inconsistencies that could impact critical business decisions.
Implementing the Sempy library in CI/CD processes and data quality monitoring enables:
-
- Ensuring consistency: Eliminating the risk of discrepancies between the data transformation layer (Warehouse) and the presentation layer (Power BI).
- Centralization of business logic: Definitions of key performance indicators (KPIs) and their logic remain in one place. Thanks to Semantic Link, they are fully reusable in the notebook environment, eliminating the need to duplicate business definitions across different tools.
- Time savings: Automation of repetitive comparative tests that previously required manual verification or the creation of complex, separate scripts.
- Proactive error management: Integration of test results with pipelines and notification systems enables a response to incidents before they are noticed by end users.
Semantic Link effectively blurs the line between the roles of Data Engineer and Power BI Developer. It transforms the semantic model from a closed “black box” into a fully-fledged, programmable component of a modern data platform. In an era of growing complexity in analytical systems, standardizing tools and maintaining a “single source of truth” is essential for building full confidence in the generated reports and analyses.
At Datariselab, we tackle these challenges daily. If your company is just starting out with Microsoft Fabric or already has solutions in place but wants to implement more advanced quality control mechanisms, our certified specialists will guide you through the process. We’ll help you implement automated testing, advanced data integrity analysis within your Microsoft Fabric environment, and monitoring across your organization, so you can fully rely on the results we deliver.
Contact us and see how we can improve your analytics ecosystem!
Schedule a meeting:
