Imagine soaring high above the city, the landscape stretching out below: each building, each road, each individual vehicle going about its day. That is the view from a helicopter: a perspective that allows you to see everything at once, yet with the ability to focus on specific details. This is an apt analogy for Microsoft Fabric, the all-encompassing analytics solution that offers a comprehensive view of your enterprise’s data landscape.
In this article we will discuss Microsoft Fabric, look at the main architectural concepts, and discuss the concerns that matter at every layer. By the end, we aim to have answered the common questions people have about the number of domains, workspaces, and lakehouses, and how they all relate to each other. Throughout the article, we’ll use the analogy of a helicopter ride to discuss the different levels of abstraction. As we fly over Fabric and descend closer, we’ll highlight the significant elements that influence its architecture, deployment patterns, and design considerations, as well as other essential factors to consider.
Microsoft Fabric — an overview
Microsoft Fabric is an all-in-one analytics solution designed to meet the needs of enterprises. It covers everything from data movement to data science, real-time analytics, and business intelligence. Microsoft Fabric was introduced in preview in the spring of 2023 and was made generally available for purchase in November 2023.
Microsoft Fabric is a SaaS platform
As we take off, we get the broadest view of Microsoft Fabric. On this level, we see the foundation. The approach Microsoft has chosen is unique and significant. Microsoft Fabric is a Software-as-a-Service (SaaS) platform. This means Microsoft handles the integration and most of the infrastructure management and configuration for many different services. So, no more infrastructure deployments. No more writing infrastructure templates or deliberating over the number of storage accounts. The shift to SaaS also eliminates the need to design your platform around multiple Azure subscriptions.
Enabling Microsoft Fabric occurs at the tenant level. The Global Admin can activate Microsoft Fabric in the Power BI Admin Portal with a simple switch. Depending on your requirements, Fabric becomes accessible to everyone in the tenant or to a chosen group of users. Please refer to the screenshot below for more details.
The tenant level also serves as the first security boundary. Each tenant has its own Microsoft Entra ID directory (formerly known as Azure Active Directory). This means that the tenant’s identity and access management serves as the primary line of defense. In fact, authentication is handled by Microsoft Entra ID when signing into the web-based SaaS platform. Therefore, to limit user access in Microsoft Fabric, you can enforce multi-factor authentication or conditional access policies. This step ensures a higher level of security for your organization’s data.
Microsoft Fabric enables domain-orientation
Hovering over the Admin settings for Microsoft Fabric, you have the ability to configure domains. A domain serves as a logical group for all data and artifacts, such as notebooks, that are relevant to a specific area or field within an organization. This feature is particularly beneficial for larger organizations as it allows data grouping by business capabilities, functional areas or departments. This enables users to manage their data in line with their unique regulations, restrictions, and needs.
Consider the example of a large e-commerce company. This company would typically have numerous teams under several different business units such as Finance, Marketing, Operations, and so on. In this context, a domain could represent the department itself, like “Finance”. The responsibilities and rights of managing users and workspaces can then be delegated to specific domain administrators or contributors.
The number of domains required for Fabric varies from one company to another. Some companies have a multitude of diverse business domains, while others may be smaller or less advanced in terms of managing large-scale data. These companies may opt for a smaller number of domains, with larger teams managing more data sets. If you would like to learn more about this, I encourage you to read my other article on data domains.
It’s important to note that domains primarily serve to isolate management or administration concerns. A domain is not for dictating who has access to data or other artifacts. Security and access to data and artifacts are addressed on different layers, which we will delve into next.
Workspaces
As the helicopter descends, we can see the various features of Microsoft Fabric in more detail. Workspaces are a critical component of Microsoft Fabric, serving as collaborative environments for users to work on data ingestion, machine learning, notebooks, real-time analytics, lakehouses, warehouses and reports. Workspaces connect users with regions, capacities, and version control systems for code and artifacts. Let’s delve deeper into these aspects.
The relationship with regions means that all associated storage is managed within the workspace’s assigned region. This is particularly useful for global companies with strict geographical data segregation requirements. For instance, if a business unit or domain needs to separate data between Europe and the United States, utilizing multiple workspaces can help meet these demands. Consequently, a single domain with several localized platform instances may have numerous workspaces.
The linkage with version control systems aids in managing the deployment process and separating concerns related to development, testing, acceptance, and production. This implies that each stage has a dedicated workspace: for instance, one workspace for development, one for testing, and one for production.
The association with capacity refers to the computing power required to facilitate various experiences and activities. Capacity alignment can be implemented across workspaces and domains. For instance, a single capacity unit could be shared among all development workspaces across all domains. Alternatively, a high-demand production environment might have its dedicated capacity. Thus, to optimize and separate compute-intensive workloads within a domain, using additional workspaces can be an effective strategy.
Workspaces also function as a security boundary for managing permissions. At the workspace level, there are different roles such as admin, member, contributor, and viewer. Each role has different rights and restrictions within the workspace. Workspaces can also be used for separating activities or distributing data. Microsoft Fabric introduces the concept of shortcuts, which are references to data managed elsewhere, adding another method of decoupling to a workspace. Consider a production environment where you want to give certain users read-only access for exploratory and experimental activities. This can be achieved by setting up a new workspace and creating read-only shortcuts in it that point to data in other workspaces. This method of segregation can also be used for sharing data with other domains, e.g. by providing a workspace where all data product data resides.
In the future, workspaces could also facilitate connectivity through private endpoints. This again aligns with the principle of separation of concerns. For instance, if you want to restrict certain parts of the architecture to connect only with specific other parts, establishing an extra workspace would enable this.
In conclusion, workspaces are a fundamental part of Microsoft Fabric, with domains likely having at least a few. Establishing separate workspaces for development, testing, and production is a basic method of segregation. However, depending on your specific needs, you may require additional workspaces for your domains.
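To make the relationship between domains, regions, stages, and capacities more tangible, here is a minimal sketch in Python. It is not a Fabric API call; it only enumerates a possible workspace plan for one domain. The naming convention, the capacity names, and the rule that production gets a dedicated capacity while other stages share one per region are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Workspace:
    domain: str
    region: str
    stage: str
    capacity: str

    @property
    def name(self) -> str:
        # Hypothetical naming convention: <domain>-<region>-<stage>
        return f"{self.domain}-{self.region}-{self.stage}"


def plan_workspaces(domain, regions, stages):
    """Enumerate one workspace per (region, stage) pair for a domain.

    Assumption for illustration: production gets a dedicated capacity per
    domain and region, while every other stage uses a capacity that is
    shared across domains within the same region.
    """
    plan = []
    for region, stage in product(regions, stages):
        if stage == "prod":
            capacity = f"cap-{domain}-{region}-prod"
        else:
            capacity = f"cap-shared-{region}-nonprod"
        plan.append(Workspace(domain, region, stage, capacity))
    return plan


finance = plan_workspaces("finance", ["eu", "us"], ["dev", "test", "prod"])
for ws in finance:
    print(f"{ws.name:20} -> {ws.capacity}")
```

With two regions and three stages, this single domain already ends up with six workspaces, which shows how quickly the count grows once geography and deployment stages are taken into account.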
Item level management
In addition to workspace-level management, you can also control and manage specific artifacts with item-level permission management. Within Microsoft Fabric, various elements such as Reports, Notebooks, Pipelines, Lakehouses, and Warehouses coexist within a unified workspace. However, the permission settings may differ for each of these items.
To illustrate how this works, let’s use the Medallion architecture within a workspace from a domain as an example. The Medallion architecture is a popular design pattern that organizations use to logically organize data in a Lakehouse. It’s the go-to design approach for Fabric.
A typical Medallion architecture is divided into three distinct layers or zones, each representing the quality of data stored in the Lakehouse. The higher the level, the higher the data quality. In Microsoft Fabric, the best practice is to let each layer correspond to a separate Lakehouse entity. Every Lakehouse manages its data in OneLake and comes with a built-in SQL endpoint. This feature unlocks data warehousing capabilities without the need to move data between layers. Therefore, to implement a typical Medallion architecture in Microsoft Fabric, you create three items: a Lakehouse for Bronze, a Lakehouse for Silver, and a Lakehouse for Gold.
The systematic workflow for deployment between development, testing, acceptance, and production (DTAP) typically involves cloning workspaces. If so, every environment has a minimum of three lakehouses or warehouses. Therefore, for DTAP, the three lakehouses are multiplied by the number of environments, resulting in a total of 12 lakehouses for four environments.
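The arithmetic is simply the number of Medallion layers times the number of DTAP environments. The short Python sketch below spells it out; the domain name, the environment abbreviations, and the naming pattern are hypothetical and only serve the example.

```python
LAYERS = ["bronze", "silver", "gold"]
ENVIRONMENTS = ["dev", "test", "acc", "prod"]


def lakehouse_names(domain, layers=LAYERS, environments=ENVIRONMENTS):
    """Return one lakehouse name per (environment, layer) combination."""
    # Hypothetical naming pattern: <domain>_<environment>_<layer>
    return [f"{domain}_{env}_{layer}" for env in environments for layer in layers]


names = lakehouse_names("finance")
print(len(names))  # 3 layers x 4 environments = 12
```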
Access rights can be set for each individual item. For instance, users without a workspace role can gain additional read or edit rights through individual item rights settings. For example, let’s consider access management for the Gold layer in a Medallion architecture. You may grant extra users read-only access through a SQL endpoint. This can be handy when you want to provide access to only the high-quality data in the Gold layer, not the entire architecture. Or, it may be useful for developers who need to debug in a Bronze production environment but are not permitted to view other data.
Lakehouses, as entities, can also provide an extra level of separation and protection. Imagine that you have a large domain team managing numerous applications, and not all users are allowed to see all raw data. Alternatively, multiple smaller teams ingest data and you would like to prevent their pipelines from writing to the same location. In such cases, you can opt for additional Lakehouses, allowing each small team to operate on its own data.
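The interplay between workspace roles and item-level rights can be sketched as follows. This is a deliberately simplified model, not how Fabric evaluates permissions internally: the assumption here is that any workspace role implies read access, that admin, member, and contributor imply edit access, and that item-level grants are added on top. The user and item names are made up.

```python
from enum import Enum


class WorkspaceRole(Enum):
    ADMIN = "admin"
    MEMBER = "member"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


# Simplifying assumption: these roles may edit items, a viewer may only read.
EDIT_ROLES = {WorkspaceRole.ADMIN, WorkspaceRole.MEMBER, WorkspaceRole.CONTRIBUTOR}


def effective_rights(user, item, workspace_roles, item_grants):
    """Combine a user's workspace role with rights granted on a single item."""
    rights = set()
    role = workspace_roles.get(user)
    if role is not None:
        rights.add("read")
        if role in EDIT_ROLES:
            rights.add("edit")
    # Item-level grants apply even when the user has no workspace role.
    rights |= item_grants.get((user, item), set())
    return rights


workspace_roles = {"alice": WorkspaceRole.CONTRIBUTOR}
item_grants = {("bob", "gold_lakehouse"): {"read"}}

print(effective_rights("alice", "bronze_lakehouse", workspace_roles, item_grants))
print(effective_rights("bob", "gold_lakehouse", workspace_roles, item_grants))
print(effective_rights("bob", "bronze_lakehouse", workspace_roles, item_grants))
```

In this sketch, the hypothetical user bob has no workspace role and can therefore only read the Gold lakehouse, while Bronze stays invisible to him.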
In conclusion, depending on your design or needs, a workspace may contain more Lakehouses than the typical three for Bronze, Silver, and Gold.
Granular security management
As we land in a Lakehouse or Warehouse, we can see the finer details of Microsoft Fabric. To tighten read access control for a Warehouse or a Lakehouse’s SQL endpoint, it is also possible to introduce further limitations using a SQL policy. Once the SQL policy is set, for instance via GRANT/DENY SQL instructions, anyone who accesses the data will be subject to Object Level Security as outlined in the SQL policy.
For example, consider a multinational corporation’s employee data warehouse. The warehouse includes a table with employee records from all over the world, and each record (or row) includes an attribute for the employee’s country of operation. With a SQL Policy, such as row-level security, we can ensure that managers can only view records of employees who work in the same country as them.
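In a Warehouse such a rule would be defined in SQL, not in application code. The Python below only mimics the filtering logic of a row-level security predicate so the idea is easy to follow; the employee records, the manager identifiers, and the manager-to-country mapping are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    country: str


# Made-up sample data standing in for the employee table.
EMPLOYEES = [
    Employee("Anna", "NL"),
    Employee("Bram", "NL"),
    Employee("Carol", "US"),
]

# Hypothetical lookup of the country each manager operates in.
MANAGER_COUNTRY = {"manager_nl": "NL", "manager_us": "US"}


def visible_rows(manager, rows, manager_country=MANAGER_COUNTRY):
    """Return only the rows whose country matches the manager's country.

    An unknown manager has no country and therefore sees no rows at all.
    """
    country = manager_country.get(manager)
    return [row for row in rows if country is not None and row.country == country]


for manager in ("manager_nl", "manager_us", "someone_else"):
    print(manager, [row.name for row in visible_rows(manager, EMPLOYEES)])
```

The important property is that the filter is applied for every query, so a manager never has to remember to add a country condition and cannot forget it either.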
OneLake
As you pilot the helicopter through the various layers, there is a crucial layer that cannot be overlooked. This logical layer serves as the base for storing and managing all data and is known as OneLake. OneLake is an open storage layer that is built on top of Azure Data Lake Storage (ADLS) Gen2 and utilizes the open-source Delta Parquet format.
When enabling Fabric at the tenant-level, OneLake is automatically provisioned as a single, logical data lake for the entire organization. Although workspaces, lakehouses, and warehouses align and operate on different parts, OneLake acts as a unifying force.
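Because OneLake is one logical lake, every item can be addressed with a single, hierarchical path: tenant-wide endpoint, then workspace, then item, then folder. The sketch below builds such a path in the ADLS Gen2-style URI form that, to my knowledge, OneLake exposes. Treat the exact format as something to verify against the current documentation; the workspace, lakehouse, and table names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace, lakehouse, path):
    """Build an ABFS-style URI for a path inside a lakehouse.

    Assumed layout: abfss://<workspace>@<host>/<lakehouse>.Lakehouse/<path>
    where <path> typically starts with "Tables/" or "Files/".
    """
    relative = path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative}"


print(onelake_uri("finance-eu-prod", "gold", "Tables/sales"))
```

The point of the example is that the workspace and the lakehouse are just segments in one namespace, which is what makes OneLake feel like a single lake even though ownership is spread over many workspaces.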
OneLake includes a lightweight data virtualization layer that allows for shortcutting and referencing other storage locations, even those outside of OneLake, such as storage in Azure or Amazon Web Services (AWS). This was already mentioned when discussing the number of workspaces. In addition, Microsoft Fabric introduces mirroring, which replicates a snapshot of a database into OneLake and then keeps that replica in sync in near real-time.
The benefit of these patterns is that lakehouses and warehouses can behave like conglomerates of different pools of data. For example, the Bronze layer can be designed to ingest one part of the data, shortcut to another part, and replicate yet another part via near real-time sync. The architecture above visualizes this approach.
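A Bronze layer composed in this way can be pictured as a simple catalog in which each table records how it arrived. The sketch below is only a mental model in Python, not a Fabric API; the table names and sources are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BronzeTable:
    name: str
    origin: str  # one of "ingested", "shortcut", "mirrored"
    source: str


# Hypothetical Bronze layer that mixes all three patterns.
BRONZE = [
    BronzeTable("orders", "ingested", "pipeline loading an ERP export"),
    BronzeTable("clickstream", "shortcut", "external object storage"),
    BronzeTable("customers", "mirrored", "operational database"),
]


def copies_data(table):
    """Shortcuts reference data in place; ingestion and mirroring copy it."""
    return table.origin != "shortcut"


print(Counter(table.origin for table in BRONZE))
print([table.name for table in BRONZE if not copies_data(table)])
```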
Shortcutting may also occur on the consuming side within workspaces, leading to various consumer architecture models. For instance, a team focused solely on reporting might directly access the data. Another team might bypass the Bronze layer if the data is already processed within a different domain’s workspace. Meanwhile, a different team might integrate and combine data, then directly share this with other teams.
The conclusion we can draw from the above discussion is that OneLake facilitates various architectural scenarios without necessitating data movement or duplication.
In conclusion
In the previous sections, we’ve quickly examined various layers and design concepts on an architecture level. It’s crucial to note that the features discussed represent only a fraction of the capabilities of Microsoft Fabric. The platform extends beyond domains, workspaces, and lakehouses. It includes a Copilot and additional generative AI functionalities, which simplify the process of constructing notebooks, creating reports, and developing SQL code.
Microsoft Fabric is equipped with time-series databases, capable of processing billions of records in a matter of seconds. It also offers real-time analytics and compatibility with event streams, windowing, and similar features. The platform comes with machine learning capabilities, and integrates seamlessly with Purview Data Governance.
On the aspect of compute and storage decoupling, Microsoft Fabric can handle various compute engines such as Spark, SQL, KQL, and Power BI, among others. These engines can operate concurrently on terabytes of data at scale. All of this can be achieved without the need for infrastructure management, providing a streamlined Software-as-a-Service (SaaS) experience.
In summary, when viewed from a helicopter perspective, Microsoft Fabric appears as a comprehensive, integrated solution for enterprise analytics. It provides a view of the entire data landscape, and it offers robust methods for both safeguarding and distributing data across various levels, ranging from the workspace down to individual items and objects. As a Software-as-a-Service (SaaS) platform, it simplifies infrastructure management and configuration. By offering such a broad and detailed view of data, Microsoft Fabric provides a powerful data platform for enterprises.
This blog features as part of Microsoft Fabric Week.
About the Author:
Hands-on data management professional. Working @Microsoft.
Strengholt, P. (2024). Microsoft Fabric — a better understanding of the underlying architecture and concepts. Available at: https://piethein.medium.com/microsoft-fabric-a-better-understanding-of-the-underlying-architecture-and-concepts-847407b2524f [Accessed: 14th February 2024].