Fast and convenient connection to SharePoint file in Power BI
A Power Query function that returns the file when providing its SharePoint link. Paste the function in you report, call it with any SharePoint file url.
Et voilà! No more endless waiting at every new editing step, a faster refresh, and only one source settings to manage.
Pros
- Faster than default SharePoint.Files connector
- Get your file with one simple step: SharePoint_File(YourFileURL)
- The same function works for all your company SharePoint sites
- Only one authentification and data source setting to manage for all your files (not by SharePoint site)
Cons
- Requires hardcoding your tenant/company name in the function to avoid hand-authored queries issue
- Requires to "Skip test connection" in Data source credentials to enable refresh in Power BI service
How To
- Paste the function code in a new query and call it "SharePoint_File"
- Update TenantName (currently "WayneCorp") in the first row of code with your tenant name (your SharePoint url starts with https://TenantName.sharepoint.com)
- Find your file url : Click File –> Copy path from Excel application or use "People With Existing Access" on SharePoint online sharing menu
- Get your file in any query by calling SharePoint_File(YourFileURL) in any query
- Get credentials: choose organisational account. Power BI should only ask it on your first call of the function
The Code
Behind the scenes, the function will generate the following code:
Web.Contents(
"https://WayneCorp.sharepoint.com/",
[RelativePath=
"sites/Batcave/_api/web/getfilebyserverrelativeurl('/sites/Batcave/Shared Documents/General/Folder/Folder/file.xlsx')/$value"
])
The Performance Story
Connecting Power BI to a SharePoint file the default way is slow, time-consuming, cumbersome, inefficient and generates ugly M code.
There are three methods in Power Query to connect to a SharePoint file:
SharePoint.Files
Default connector used by the GUI, it generates a call to get all the files in the SharePoint and returns a table (technically, one API call by each 1000 files if you check Query Diagnostic). To get the targeted file, you must filter the table on Folder Path column, which is inconveniently the folder url, i.e. too long for convenient filtering in the GUI.
Slow performance seems to confirm there is no query folding, and you can expect one minute execution time for a SharePoint with thousands of files. Just don't be afraid of the Advanced Editor and never use SharePoint.Files again.
Icing on the cake: there is a limit on the number of files the query can return (50,000 items?). Your file may never appear, don't waste a day as I did to investigate why (usually the fact there is 50,000 files on a SharePoint is not your choice).
SharePoint.Contents
According to the documentation: Returns a table containing a row for each folder and document found at the specified SharePoint site. Each row contains properties of the folder or file and a link to its content. Meaning you first get a view of the root folder, and navigate to correct location like:
let
Source=SharePoint.Contents(SharepointRootUrl)
{[Name="Shared Documents"]}[Content]
{[Name="General"]}[Content]
{[Name="Folder"]}[Content]
{[Name="Subfolder"]}[Content]
{[Name="Subfolder"]}[Content]
{[Name="FileName.xls"]}[Content]
in
Source
Behind the scenes: more API calls, but faster as they only have to return folder items every time. Few seconds should be maximum execution time.
SharePoint.Contents is also you best move is you want to combine files in a folder. Paste your folder link in the tool to get the code to connect to a folder instead of a specific file.
Important caveat: you must have access to the root folder, i.e. you can't use SharePoint.Contents if you only have access to a folder via sharing.
Web.Contents
This method doesn't use SharePoint connectors, you use the web connector to generate the direct call to SharePoint API. And it's faster, a lot faster.
If you look at query diagnostic, this method is a shortcut: it is in fact the last step of the SharePoint.Files method. You spare Power BI to build a map of your SharePoint by giving him the location of the file.
What's the rush?
One minute refresh isn't an issue when refreshing a report, especially on Power BI service. One minute refresh time in Query Editor is painful but manageable. But just try the following with a crowded SharePoint:
- Load the same table from the same Excel or csv file in two separate queries, one with SharePoint.Files and one with Web.Contents
- Filter any value on any column on both queries
- Pay attention to the time required for the same operation on the same file
You will notice every subsequent step have a noticeable lag when you are using SharePoint.Contents, the same simple operation can takes seconds when it's nearly instant if you connected via Web.Contents.
Why this additional lag? Doesn't Power query cache the same file as query result? Why is this lag only few seconds if the full query is one minute? From my understanding, Power BI uses the cache to recreate all steps of your query, not just the last output. So, to add a new step to your SharePoint.Files query, even a simple filtering, Power BI will re-run all the internal steps but using the cache: building the map of your SharePoint by merging cached chunk lists of files and their details, filtering the map to the desired path and file name, reading the file.
SHAREPOINT.FILES TO CONNECT TO A SHAREPOINT FILE IS INEFFICIENT, SLOW TO REFRESH AND SLOWS DOWN POWER QUERY EDITING EXPERIENCE FOR ALL SUBSEQUENT STEPS.
STOP USING THE GUI TO CONNECT TO A SHAREPOINT FILE TO GET A HEALTHIER EDITING EXPERIENCE.