To Dataflow or not to Dataflow, that is the Question…

Overview

If you have worked with Power BI, you will be familiar with datasets. Datasets are the base data layer of your Power BI report: they store your data from a variety of data connections, which you can then use to build out your report visualizations.

Dataflow Self-Service Data Prep (Source: Microsoft)

When to use them?

Using dataflows sounds like a great idea in theory, but why should you or your organization consider going down that route, and when should you use them? To make things easier, I have created a list below that will help you decide if dataflows meet your data requirements.

When and why?

  • Reusable transformation logic – If the datasets or tables in your reports are being replicated in other reports, dataflows let you share that logic instead of duplicating it. Remember, it may be easy to duplicate logic, but if you end up changing that logic down the line, you need to make the same change in every other report that shares it.
  • Breaking up data and report access – Depending on your organization, you might have certain individuals who build your report visualizations and others who build the datasets. Dataflows allow you to break these two pieces into separate components and hide that logic from your report creators.
  • Single source – Dataflows let you provide a single source of truth in your organization for tables or multiple forms of data. You can also endorse a dataflow in your organization, which adds an endorsement tag beside it. Endorsements are broken up into three categories:
    • No endorsement – The default state, no tag will appear beside the dataflow.
    • Promotion – Highlights content that might be valuable or worthwhile to others.
    • Certification – Certifies that the data meets your organization's quality standards, which means the data is reliable for use.
  • Partition data source load – Depending on the data source you query when refreshing your data, you may only want to refresh against it once a day, for example. This could be because dataset refreshes impact performance on a business-critical database, or because you want to limit the number of calls to an API. With a dataflow, the source is queried once per dataflow refresh, and your datasets read from the dataflow instead.
  • Re-use M query generated tables – Perhaps there are tables built using M query (so not queried from a data source) that are used across your organization. An example might be a date dimension: using dataflows allows you to easily share that table across your organization without each report creator needing to regenerate the logic themselves.
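
To make the date dimension example a little more concrete, here is a minimal M sketch of the kind of generated table you might put in a dataflow. It is only an illustration under my own assumptions: the date range, the step names, and the column names are placeholders, so adjust them to whatever your organization actually needs.

```
let
    // Assumed date range; change these to suit your own calendar
    StartDate = #date(2020, 1, 1),
    EndDate = #date(2020, 12, 31),

    // Number of days in the range, including both the start and end dates
    DayCount = Duration.Days(EndDate - StartDate) + 1,

    // Generate one date per day, stepping one day at a time
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),

    // Turn the list into a single-column table and type the column as date
    AsTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),

    // Add a few common attributes (column names are illustrative)
    WithYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    WithMonth = Table.AddColumn(WithYear, "Month", each Date.Month([Date]), Int64.Type),
    WithMonthName = Table.AddColumn(WithMonth, "Month Name", each Date.MonthName([Date]), type text),
    WithDayName = Table.AddColumn(WithMonthName, "Day Name", each Date.DayOfWeekName([Date]), type text)
in
    WithDayName
```

Because nothing here touches an external data source, a table like this is a natural candidate to define once in a dataflow and reuse across reports.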

Wrap up

As you can see, there are a lot of added benefits to making the switch from the common dataset model to Power BI dataflows. If you haven’t tried out dataflows yet, take a look at Microsoft’s article on Creating a dataflow to dip your toes into Power BI dataflows.

Catch you in the next one! ✌️
