SQL Databases in Fabric?

Introduction

Much like a thermos that can keep both hot liquids hot and cold liquids cold (but not at the same time), we – as data architects and engineers – used to be very adamant about not mixing transactional processing (OLTP) and analytical processing (OLAP) on the same infrastructure since the latency-orientation of the one and throughput-orientation of the other traditionally were not compatible. This is maybe one of the biggest reasons that Microsoft got some criticism from the Fabric community when they announced the public preview of SQL database in Microsoft Fabric in November.

It should maybe not have come as a total surprise, since there for some time has been a buzz around the concept of “Translytical” Data Platforms (transactional and analytical at the same time), but personally I had gotten used to the concept of “OneLake – the OneDrive for your Data” and expected the Warehouse artifact and mirroring in Fabric to be the tools to bridge the gap between the OLTP and OLAP.

However, having tried out the Fabric SQL Database in the private and public previews has demonstrated, that having a SQL Database inside Fabric has numerous advantages – especially combined with other translytical apps like automatic mirroring and writeback from PowerBI.

Firstly, the current implementation of Delta Parquet in Fabric does not support standard SQL functionality like enforceable primary and foreign keys, identity columns and triggers.

Secondly, operational workloads are often characterized by many small write operations, which in OneLake introduces a “small files problem” since the delta log becomes large and fragmented with severe consequences on performance and capacity load for both reading and writing. A SQL Database does not have this problem.

Lastly, since the Fabric is a SaaS solution, it is easy to provision a new Fabric SQL Database, and since it is running on the Microsoft Backbone Network and mirroring is automatically enabled, the integration between the analytical and operational data is secure and near real-time with minimal set-up and maintenance.

Use cases in AP Pension

In AP Pension, we have built a metadata driven framework for ingesting data, orchestration of notebooks and managing data delivery via workspaces and shortcuts as described in the article Fabric Workspace design for automation and Data Delivery in AP Pension – Part 2.

Design of a data delivery workspace in AP Pension

To this end we do some custom logging as well as data entry via the Custom PowerBI Visual, accoMasterdata and its budgeting counterpart accoPLANNING which we – in both cases – need to be instantly available for further processing.

The writeback-solutions provide a simple interface for updating data on a SQL Database/Endpoint, which out of the box is available for other visuals in PowerBI in a semantic model. However, to get the most out of the solution, we need triggers and primary keys on the tables as well as good performance on commits.

Writeback from PowerBI with accoMasterdata

Writing to a SQL Database outside of Fabric and setting up private endpoints or mirroring for getting data in and out of Fabric for the logging or writeback solutions is not the optimal solution, so we experimented with writing to writing these data to OneLake. This, however, was not optimal either – due to the limitations to DeltaParquet mentioned above … and especially since these kinds of workloads consist of many small commits to secure data integrity and availability. This resulted an average file size in OneLake of < 10 kb.

The small files problem

In his article, Analyze Delta Tables In Fabric, Sandeep Pawar explains the performance implications of working with small files (which he defines as anything less than 8MB), but the footnote to following table from the article OneLake consumption | Microsoft Learn indicates that a transaction via proxy costs 306 Capacity Units (CU) for every 4 MB read … unless the file is smaller than this, in which case every file read would result in a cost of 306 CU.

It is possible to use the VACUUM and OPTIMIZE commands to compact the files and rearrange the delta log to address this problem, and we experimented with this as well as disabling V-ordering on very “write intensive” lakehouses.

A SQL Database does not have the same problem with small commits and is therefore better suited to handle exactly these kinds of latency-oriented workloads … and once in GA will be extremely useful to have in the arsenal.

Add to this, that Microsoft in November announced the new Microsoft SQL Server 2025: Enterprise AI-ready database from ground to cloud with built in functionality to support vector search for working with RAG models in a secure and performant environment, the potential of adding a SQL Server becomes even clearer.

Conclusion

As Microsoft has announced that Fabric is their future SaaS platform for all things data, I would expect to see other database engines like graph and document databases be added in in a not-so-distant future to supplement the current OneLake, KQL and SQL databases.

This shift leaves us with a lot of governance, which starts with defining – and then upholding – a strategy and best practices for when to use which technology on a platform that hosts Data Engineers, Data Scientists, Citizen Developers and AI-empowered Analysts alike.

The world of data is not slowing down, but embracing change and cultivating resilience by implementing a learning and growth mindset – also in the data platform and engineering teams – is still the key to success.

About the Author

Jacob Rønnow Jensen

Jacob Rønnow Jensen is head of Data Platform at AP Pension, one of Denmark’s largest Pension Funds. As a manager, consultant and architect, Jacob has more than 20 years of experience with building Data Platform and Bl-solutions and has been involved with Microsoft Fabric since the early days of the Private Preview.