What tool can you use in Power BI Desktop to reduce data size?

Adam shows you two things to reduce your Power BI dataset size. These are both things he commonly sees with Power BI reports, and they could potentially save you a lot of space and improve refresh times along with report performance.

Data reduction techniques for import modeling

The Power BI VertiPaq engine uses a high-compression algorithm based on a columnstore structure, enabling it to load data into the data model at a significantly reduced size compared to the original data. But this doesn’t mean that developers and data modellers can be careless about the size of the model and ignore best practices for minimising memory consumption. We should never forget the dataset size limitation in the Power BI Service and the low query performance of large datasets.

In this article, I show some tips and considerations to reduce the size of the Power BI dataset.

1- Delete any columns that are not needed in the reports

There are almost always some columns in the tables that have no analytical value, or that no one needs in the reports. Find them in the tables and delete them in Power Query, or exclude them from the source data if applicable, for example when the data source is a database view.
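As a rough illustration, the M sketch below drops columns that carry no analytical value. Sales, RowGuid, and ModifiedDate are hypothetical names; substitute your own query and columns.

```m
let
    Source = Sales,
    // Drop columns with no analytical value (hypothetical column names)
    RemovedColumns = Table.RemoveColumns(Source, {"RowGuid", "ModifiedDate"})
    // Alternatively, keep only the columns you need:
    // SelectedColumns = Table.SelectColumns(Source, {"OrderDate", "ProductKey", "SalesAmount"})
in
    RemovedColumns
```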

2- Remove the unneeded rows

Filter rows in Power Query, or in the underlying data source, if you only require a subset of the rows in the reports.
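A minimal M sketch, assuming a hypothetical Sales query with a Region column; adjust the predicate to whatever subset the reports actually need.

```m
let
    Source = Sales,
    // Keep only the rows the reports actually use, e.g. one region
    FilteredRows = Table.SelectRows(Source, each [Region] = "Europe")
in
    FilteredRows
```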

3- Turn Auto date/time off

In Power BI Desktop, Auto date/time is enabled by default, creating a date hierarchy for each Date and DateTime column in the model. In the background, the Power BI engine builds a hidden table for each of these columns, which can take considerable memory if there are many date columns in the model. Disabling this option and using a dedicated Date table instead saves that memory.
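If the source has no Date table, a simple one can be generated in Power Query. The sketch below is one possible approach; the start and end dates and the added Year and Month columns are assumptions to adapt to your model (marking the result as a date table is then done in Power BI Desktop).

```m
let
    // Assumed date range; change to cover your data
    StartDate = #date(2015, 1, 1),
    EndDate   = #date(2025, 12, 31),
    DayCount  = Duration.Days(EndDate - StartDate) + 1,
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    AddYear   = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth  = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type)
in
    AddMonth
```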

4- Manage DateTime columns

DateTime columns can consume a lot of memory, especially in fact tables. A DateTime value can keep time down to a fraction of a millisecond, which brings high cardinality to the VertiPaq engine and consumes large amounts of memory for Data and Dictionary storage. Therefore, the best practice is to convert DateTime columns to Date, or to split them into two columns if you need the time values in the reports.
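As a sketch, assuming a hypothetical OrderDateTime column in a Sales query, the M below derives separate Date and Time columns and then drops the original high-cardinality column.

```m
let
    Source = Sales,
    // Derive separate Date and Time columns from a DateTime column
    AddDate = Table.AddColumn(Source, "OrderDate", each DateTime.Date([OrderDateTime]), type date),
    AddTime = Table.AddColumn(AddDate, "OrderTime", each DateTime.Time([OrderDateTime]), type time),
    // Drop the original high-cardinality column
    Removed = Table.RemoveColumns(AddTime, {"OrderDateTime"})
in
    Removed
```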

5- Convert Decimal Number to Fixed Decimal Number

Decimal Number can represent values with up to 15 digits of precision, which is inefficient when the data carries more accuracy than the reports need.

By changing the data type to Fixed Decimal Number, the fractional part of the number is rounded to 4 digits, which brings the cardinality down and takes less memory.
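In Power Query, Fixed Decimal Number corresponds to Currency.Type. A minimal sketch, assuming a hypothetical SalesAmount column:

```m
let
    Source = Sales,
    // Currency.Type is the M equivalent of Fixed Decimal Number (4 decimal places)
    FixedDecimal = Table.TransformColumnTypes(Source, {{"SalesAmount", Currency.Type}})
in
    FixedDecimal
```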

6- Convert Text columns to Numeric

One step of storing data in the VertiPaq engine is to encode the non-numeric columns by assigning a numeric identifier to each unique value in the column. In some specific instances, such as sales order numbers or invoice numbers, we can convert Text columns to numeric, which can significantly reduce the memory used by the column. It is also best to change the Default Summarization property to “Do Not Summarize” for these columns to avoid inappropriate summarization.
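A minimal sketch, assuming a hypothetical SalesOrderNumber column that contains digits only (values with letter prefixes such as “SO123” would fail this conversion):

```m
let
    Source = Sales,
    // Convert a digits-only text column to a whole number
    ToNumber = Table.TransformColumnTypes(Source, {{"SalesOrderNumber", Int64.Type}})
in
    ToNumber
```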

7- Prefer creating new columns in Power Query with M instead of using DAX calculated columns

The VertiPaq engine stores calculated columns (defined in DAX) just like regular columns. However, their data structures are stored slightly differently and typically achieve less efficient compression. The recommendation is to create new columns in Power Query, or even in the data source if possible.
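For example, a margin column computed during refresh in M rather than as a DAX calculated column; Sales, SalesAmount, and TotalCost are hypothetical names:

```m
let
    Source = Sales,
    // Compute the column at refresh time in M instead of a DAX calculated column
    AddedMargin = Table.AddColumn(Source, "Margin", each [SalesAmount] - [TotalCost], type number)
in
    AddedMargin
```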

8- Disable load in Power Query

Tables and queries that are intended only to support other queries and are not required in the reports should not be loaded into the model. You can disable load for them in Power Query.

9- Use the right granularity for the fact tables

Summarising the fact tables can massively reduce the number of rows and minimise memory usage. For example, a Sales table with order numbers and line items and millions of records takes massive storage. However, if the reports have no requirement to analyse sales at that level of detail, we can use a summarised table instead, grouping the records by day (or even week or month), product, customer, and the other required dimension keys to get a much smaller table.
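A sketch of such a summarisation in M, assuming hypothetical OrderDate, ProductKey, CustomerKey, SalesAmount, and Quantity columns:

```m
let
    Source = Sales,
    // Collapse order-line detail to one row per day / product / customer
    Summarised = Table.Group(
        Source,
        {"OrderDate", "ProductKey", "CustomerKey"},
        {
            {"SalesAmount", each List.Sum([SalesAmount]), type number},
            {"Quantity",    each List.Sum([Quantity]),    Int64.Type}
        }
    )
in
    Summarised
```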

10- Use Composite mode and Aggregation

Power BI Composite mode and Aggregations are incredible methods to balance memory usage and query performance. In this way, we can use both Import and DirectQuery modes in one model, directing the reports to read aggregated values from memory and fetch the detail by sending queries to the underlying data sources. You can read more about Power BI Composite mode in my other blog here.

In addition to the above points, always spend time reviewing the storage used in your Power BI file before deploying it to the service. You can do this easily with DAX Studio, which is an external tool for Power BI.

Summary

Power BI highly compresses the data when loading it into the model, allowing us to load tables with millions of records quickly. But as there is a size limitation in the Power BI service, we should be mindful about the data we load and follow best practices in data modelling. In this post, we went through some principles and techniques to efficiently minimise the size of Power BI datasets.

When working with huge datasets in Power BI, we often get stuck battling performance issues. This can take a lot of our time, which could otherwise be used for report development.

We usually face these issues because of loading big data models in Power BI:

  • Slow data refreshes
  • Higher latency reporting (reports get slow and sluggish)
  • Needing a higher license capacity for storing the model: a Power BI Pro license only allows datasets up to 1 GB, whereas Premium allows up to 10 GB
  • More pressure on source system and capacity resources
  • Slower calculation evaluations
  • User dissatisfaction

Reducing the dataset size can improve performance and bring faster upload speeds to the Power BI service. 

Power BI Vertipaq Analyzer 

Previously, we discussed how the Vertipaq Engine stores data and performs compression to reduce the memory usage of your data model. Refer to that post to find out more about the Vertipaq Engine.

In this blog we are going to explore:

  • What Vertipaq Analyzer is 
  • Vertipaq Analyzer features and functions
  • How it helps reduce data size in Power BI

What is Vertipaq Analyzer? 

Vertipaq Analyzer is a helpful tool used to collect size information for all the objects in a database. It allows you to investigate detailed information about the structure and compression of your model.

You can connect to your Power BI Desktop model from DAX Studio to see detailed information about your data model. This view shows the table and column information hierarchically.

To demonstrate, we are going to connect to our sample Power BI Desktop file ‘TEST’ from DAX Studio. We want to find out which columns consume the most space in this model and how we can reduce its data model size.

The current size of the ‘Test’ file is 28,783 KB.

After connecting to PBI Desktop, we can analyze the complete metadata of the tables as shown below. 

You can also see detailed column information, regardless of which table the columns belong to, as shown below.

It can help identify the most expensive columns with high cardinality. So, you can decide if you want to discard a few or keep them all.  

By looking at this detailed information from Vertipaq Analyzer, you can easily decide which of the following techniques you should apply when trying to reduce the data model size.  

Basic Techniques for Data Compression

Below are five techniques that can help reduce the size of your dataset and make sure that your model stays compact while delivering all the insights.

  1. Manage Date Time Columns 
  2. Delete Unneeded Columns 
  3. Remove Unnecessary Rows 
  4. Amend Column Types 
  5. Create New Columns in Power Query instead of using Calculated Column in DAX 

Technique # 1: Manage Date Time Columns 

One of the most important date/time techniques is to disable Auto date/time. When it is enabled, Power BI creates a date/time table for each date column. These tables are hidden and are, in fact, calculated tables, which is why they increase the model size. So, it’s very important to disable this option to reduce the report size.

In the following image, you can see how big the size of these columns is and how they are contributing to the model size. 

Uncheck the setting shown below in Power BI Desktop to turn Auto date/time off.

After saving this change, the model size reduces by up to 65%! 

Another technique to reduce dataset size through managing date/time columns is to split them. Power BI creates a date hierarchy for each date and date-time column, and dates show up with a time component by default, which can consume a lot of memory. In a Date Time column, the time value can hold fractions of a second, and this high cardinality consumes large amounts of memory for data storage. The best way to cut this down is to convert the Date Time column to Date. If time values are required in the report, we can instead split the Date Time column into two columns and use the time values from the Time column.

Technique # 2: Delete Unneeded Columns 

Whenever we get data into Power BI, many columns end up unused in the dashboards. So, the fastest way to reduce your dataset is to remove the unwanted columns and keep only the ones you need. You can either exclude them from the source directly or remove them using Power Query. Go through all the columns in Power Query and delete those not used in any of your reports or calculations.

If the requirements change in the future, you can simply add the columns back; adding columns is much easier than removing them.

Technique # 3: Remove Unnecessary Rows

Similarly, keep only those rows you need. Add only relevant and required data to the tables. Limiting the number of rows in a dataset is also known as horizontal filtering. You can filter out rows by entity or by time.

Filtering by entity involves two things: 

  1. Omitting categories outside the scope of your analysis. 
  2. Only loading a subset of source data into the model. 

Filtering by time involves:  

Limiting the data to the last few years, as the business context might have changed since then and including older data may produce inaccurate results. See the sketch below.
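As a sketch, assuming a hypothetical OrderDate column, the M below keeps only the last three years of data (the three-year threshold is an assumption; pick whatever window your business context justifies):

```m
let
    Source = Sales,
    // Keep only rows from the last three calendar years (assumed threshold)
    CutoffYear = Date.Year(DateTime.Date(DateTime.LocalNow())) - 3,
    Recent = Table.SelectRows(Source, each Date.Year([OrderDate]) >= CutoffYear)
in
    Recent
```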

Technique # 4: Amend Column Types 

Converting text columns to numeric can reduce memory usage significantly, as integers are faster to operate on and consume less memory. You should convert any text column that exclusively holds numbers, e.g., sales order number or invoice number, to numeric whenever possible.

If there are decimal values with high precision (by default, up to 15 digits), convert their data type to Fixed Decimal Number, as it takes less memory and is more efficient. Lastly, if there are columns with the values TRUE and FALSE, they should be converted to binary digits, i.e., 1 and 0, as in the sketch below.
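A minimal sketch of the TRUE/FALSE conversion, assuming a hypothetical IsActive column in a Customers query:

```m
let
    Source = Customers,
    // Replace logical TRUE/FALSE with 1/0 and store as whole numbers
    ToBinary = Table.TransformColumns(Source, {{"IsActive", Number.From, Int64.Type}})
in
    ToBinary
```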

Technique # 5: Create Custom Columns in Power Query 

In Power BI, the best practice is to create most new columns as Power Query computed columns in the M language. The VertiPaq engine stores both calculated columns (defined in DAX) and Power Query computed columns (defined in M) as regular columns, but the difference lies in how their data structures are stored. Because of this, issues like poor compression efficiency and extended data refresh times arise when you create columns through DAX. It is, therefore, less efficient to add columns as calculated columns than as Power Query computed columns.

However, there are some known exceptions: certain calculated columns can only be created through DAX. But wherever possible, it’s preferable to create columns in Power Query, as doing so achieves greater load efficiency.

So, following the aforementioned techniques, we reduced our data model size by almost 75%.

Conclusion 

There are several ways to optimize a large data model in Power BI. In this blog, we have discussed some of the common techniques that will help reduce the size of your data model. You can try each of these and see which works best for you, as which technique is most useful varies from dataset to dataset. And it’s all worth it: it helps you reduce your data model size, and you also get to learn a lot along the way.
