I’ve been using my solution to extract tables from Powerpoint a lot recently. However, I’ve come across some situations where what looks like a table is actually an embedded Excel worksheet.
This can be quite useful in Powerpoint as it provides the ability to create subtotals and apply other formulae.
You can add these sort of tables by Insert->Table->Excel Spreadsheet or Insert->Object->Microsoft Excel Worksheet or if you Copy and Paste->Embed.
Of course, it also has the side effect of completely breaking my approach for extracting the data using Power Query.
Initially I thought I was going to have to find the reference in the XML, track it through the relationship file and then to the actual embedded workbook. However, I quickly realised that there wasn’t much point.
The embedded worksheets always end up in the same folder within the zip file – ppt\embeddings – so we can just grab them from there.
The code then becomes pretty simple.
- Unzip the .pptx file using the UnzipContents function from previous postings
- Filter for just files within the embedded folder
- Tell PowerQuery to treat the content as an Excel Workbook.
Unfortunately, it didn’t work.
It turns out that, for reasons unclear to me, Powerpoint doesn’t compress the content of embedded Excel worksheets. This means that when the UnzipContents function tries to decompress them it fails and produces null instead. I’ve noticed it seems to apply the same approach to embedded images such as JPGs.
There was a quick fix to this though. I made a slight modification to the UnzipContents function and told it to return the raw binary data instead of null when it fails to decompress.
Content = BinaryFormat.Transform( // Apply header record format to our Header payload, extract the Compressed File size and grab the compressed data BinaryFormat.Binary(Header(_)[BinarySize]), // Then use Binary.Decompress to decompress it (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise <span style="text-decoration: underline;"><em><strong>null</strong></em></span> ) ]),
with:
Content = BinaryFormat.Transform( // Apply header record format to our Header payload, extract the Compressed File size and grab the compressed data BinaryFormat.Binary(Header(_)[BinarySize]), // Then use Binary.Decompress to decompress it (x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise <span style="text-decoration: underline;"><em><strong>x</strong></em></span> ) ]),
The code then becomes:
(FullFilename) => let Source = File.Contents(FullFilename), // Extract files from the Powerpoint pptx file using Mark White's UnzipContents function modified to return raw binary content if decompress fails Files = UnzipContents(Source), // Filter to just Excel workbooks in the embedded files folder #"Filtered Rows1" = Table.SelectRows(Files, each Text.StartsWith([FileName], "ppt/embeddings")), #"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each Text.EndsWith([FileName], ".xlsx")), // Add a custom column which parses each file as an Excel workbook #"Added Custom" = Table.AddColumn(#"Filtered Rows2", "Data", each Excel.Workbook([Content])), JustData = Table.SelectColumns(#"Added Custom",{"FileName", "Data"}) in JustData
And you can call it from a Query like this:
let Source = ExtractEmbeddedWorksheetsFromPowerpoint("C:\Sample.pptx") in Source
This should yield a result like this:
From here you can select the table you are interested in and drilldown the sheet:
Here’s a sample and a workbook with a working example:
Extract Powerpoint embedded v2
Reference:
Excel and Power BI. (2016). Excel and Power BI. [online] Available at: http://www.excelandpowerbi.com/ [Accessed 29 Mar. 2017].