Using Fabric OrgApps + Notebooks For Geospatial Data Exploration

Using Fabric Python Notebooks, Daft & OrgApps

Simon Willison is one of my favorite bloggers. In fact, what I blog about, and how I blog and test, is inspired by him. A couple of weeks ago he wrote a post about the Foursquare Places dataset, which has been open-sourced. I was exploring this dataset and ended up creating a few maps. I love OrgApps in Fabric, and I truly believe that as it matures, it will be THE way for analysts and data scientists to provide rich insights alongside traditional reports to business users. Notebooks can augment Power BI reports with insights that are otherwise not possible. I have submitted a session on this topic to FabCon ’25; let’s see. If it is selected, I hope to show how transformational it is and how businesses can use it.

I won’t go into much detail about the code below, but a few things to note:

  • I used Daft to scan 104M rows from an S3 bucket in a Fabric Python notebook without downloading the entire dataset. Why Daft? Because it’s optimized for reading S3 data. If you run the notebook, you will see minimal memory and CPU consumption. Simon used DuckDB in his post linked above. I cleaned and transformed the data lazily using Daft.

  • I also used Polars because it has a nice Altair integration.

  • I used Folium to create interactive maps and Plotly for time series.

  • The notebook is embedded in an OrgApp so users can explore the data. You can also embed a Power BI report created with QuickVisualize for users to explore the data (as long as it is a small dataset).

 

Just download this notebook, import it into your Fabric workspace, and run it.

To get a list of files at this S3 location:


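```python
# List the files in the release
import pyarrow.fs as fs

# S3 filesystem for the bucket's region
s3 = fs.S3FileSystem(region='us-east-1')
# Glob for the places Parquet files in the 2024-11-19 release (used later to read the data)
path = "s3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/*.parquet"

# List everything under the release prefix, recursing into subfolders
file_info = s3.get_file_info(fs.FileSelector(
    "fsq-os-places-us-east-1/release/dt=2024-11-19/",
    recursive=True
))
for info in file_info:
    print(info.path)
```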
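Before scanning anything, it can help to check how many Parquet files the release contains and how large they are. The sketch below is a hypothetical helper (`summarize_parquet_files` is not part of pyarrow) that relies only on the `.path` and `.size` attributes of the `pyarrow.fs.FileInfo` objects returned by `get_file_info`; directories report a size of `None` and are skipped by the suffix check.

```python
from typing import Iterable, List, Tuple


def summarize_parquet_files(file_infos: Iterable) -> Tuple[List[str], int]:
    """Return the sorted .parquet paths and their total size in bytes.

    Hypothetical helper: expects objects with .path and .size attributes,
    such as pyarrow.fs.FileInfo.
    """
    paths: List[str] = []
    total_bytes = 0
    for info in file_infos:
        # Only count Parquet data files; skip folders and marker files.
        if info.path.endswith(".parquet"):
            paths.append(info.path)
            total_bytes += info.size or 0
    return sorted(paths), total_bytes
```

With the listing above, `paths, total_bytes = summarize_parquet_files(file_info)` gives the number of files via `len(paths)` and the total size of the release in bytes.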

About the Author:

Sandeep Pawar

Data Science professional with experience in using Data Analytics, Statistics & Machine Learning to create business solutions. I primarily use Microsoft data stack (Power BI, Synapse Analytics, Azure ML) to create scalable data informed business solutions.

Reference:

Pawar, S. (2024). Using Fabric OrgApps + Notebooks For Geospatial Data Exploration. Available at: Using Fabric OrgApps + Notebooks For Geospatial Data Exploration [Accessed: 11th December 2024].
