Adam shows you two things to reduce your Power BI dataset size. These are both issues he commonly sees in Power BI reports, and fixing them could save you a lot of space and improve refresh times along with report performance.
Data reduction techniques for import modeling
The Power BI VertiPaq engine uses a high-compression algorithm based on a columnstore structure, enabling it to load data into the model at a significantly reduced size compared to the original data. But this doesn't mean developers and data modellers can be careless about the size of the model and ignore the best practices that minimise memory consumption. We should never forget the dataset size limits in the Power BI service and the poor query performance of large datasets. In this article, I show some tips and considerations to reduce the size of a Power BI dataset.

1- Delete any column which is not needed in the reports
There are almost always columns in the tables that have no analytic value, or that no one needs in the reports. Find them and delete them in Power Query, or exclude them from the source data if applicable, for example, when the data source is a database view.

2- Remove the unneeded rows
Filter rows in Power Query, or in the underlying data source, if you only require a subset of rows in the reports.

3- Turn Auto date/time off
In Power BI Desktop, Auto date/time is enabled by default, creating a date hierarchy for each Date and DateTime column in the model. In the background, the Power BI engine builds a hidden table for each of these columns, which takes considerable memory if there are many date columns in the model. Disabling this option and using a Date table instead saves this memory.

4- Manage DateTime columns
DateTime columns can consume a lot of memory, especially in fact tables. A DateTime value keeps time down to fractions of a second, which brings high cardinality to the VertiPaq engine and consumes large amounts of memory for data and dictionary storage. The best practice is therefore to convert DateTime to Date, or to split it into two columns if you need time values in the reports.
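Points 1, 2, and 4 are typically applied as steps in a single Power Query (M) query. A minimal sketch of what those steps might look like, assuming a hypothetical SQL Server source and a Sales table (all server, table, and column names here are illustrative, not from the article):

```
let
    // Illustrative source; any connector works the same way
    Source = Sql.Database("myserver", "SalesDB"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],

    // 1- Delete columns with no analytic value
    RemovedColumns = Table.RemoveColumns(Sales, {"InternalNotes", "RowGuid"}),

    // 2- Keep only the rows the reports actually need
    FilteredRows = Table.SelectRows(RemovedColumns, each [OrderDate] >= #date(2020, 1, 1)),

    // 4- Replace a high-cardinality DateTime with a plain Date
    DateOnly = Table.TransformColumns(FilteredRows, {{"OrderDateTime", DateTime.Date, type date}})
in
    DateOnly
```

Filtering and removing columns early in the query also lets Power Query fold these steps back to the source database where possible, so the unwanted data is never transferred at all.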
5- Convert Decimal Number to Fixed Decimal Number
The Decimal Number type can represent numbers with up to 15 digits of precision, which is inefficient when the data has excessive accuracy. Changing the data type to Fixed Decimal Number rounds the fractional part to 4 digits, which brings the cardinality down and takes less memory.

6- Convert Text columns to numeric
One step of storing data in the VertiPaq engine is to encode non-numeric columns by assigning a numeric identifier to each unique value in the column. In some specific cases, such as sales order numbers or invoice numbers, we can convert Text columns to numeric, which can significantly reduce the memory used by the column. It is best to change the Default Summarization property of these columns to "Do Not Summarize" to avoid inappropriate summarisation.

7- Prefer creating new columns in Power Query with M instead of calculated columns in DAX
The VertiPaq engine stores calculated columns (defined in DAX) just like regular columns. However, their data structures are stored slightly differently and typically achieve less efficient compression. The recommendation is to create new columns in Power Query, or even in the data source if possible.

8- Disable load in Power Query
Tables and queries that exist only to support other queries, and are not required in the reports, should not be loaded into the model. You can disable load for them in Power Query.

9- Use the right granularity for the fact tables
Summarising fact tables can massively reduce the number of rows and minimise memory usage. For example, a Sales table with order numbers and line items and millions of records takes massive storage. If there is no requirement to analyse sales at that level of detail, we can instead group records by day (or even week or month), product, customer, and the other required dimension keys to produce a much smaller table.
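The type conversions in points 5 and 6 and the regrouping in point 9 can also be expressed in M. A hedged sketch, assuming a hypothetical SalesDetail query with Amount, OrderNumber, OrderDate, ProductKey, CustomerKey, and Quantity columns (all names are illustrative):

```
let
    Source = SalesDetail, // an earlier query; the name is illustrative

    // 5- Fixed Decimal Number corresponds to Currency.Type in M (4 decimal places)
    // 6- An order number stored as text becomes a whole number
    ChangedTypes = Table.TransformColumnTypes(
        Source,
        {{"Amount", Currency.Type}, {"OrderNumber", Int64.Type}}
    ),

    // 9- Coarser grain: one row per day/product/customer instead of per line item
    Summarised = Table.Group(
        ChangedTypes,
        {"OrderDate", "ProductKey", "CustomerKey"},
        {{"SalesAmount", each List.Sum([Amount]), Currency.Type},
         {"Quantity", each List.Sum([Quantity]), Int64.Type}}
    )
in
    Summarised
```

If the detailed SalesDetail query is kept only to feed this summary, disable its load (point 8) so only the summarised table lands in the model.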
10- Use Composite mode and Aggregations
Power BI Composite mode and aggregations are excellent ways to balance memory usage against query performance. With them, we can use both Import and DirectQuery modes in one model, directing reports to read aggregated values from memory and to fetch detail by sending queries to the underlying data sources. You can read more about Power BI Composite mode in my other blog here.

In addition to the points above, always spend time reviewing the storage used in your Power BI file before deploying it to the service. You can do this easily with DAX Studio, an external tool for Power BI.

Summary
Power BI highly compresses data when loading it into the model, allowing us to load tables with millions of records quickly. But as there is a size limit in the Power BI service, we should be mindful about what we load and follow data modelling best practices. In this post, we went through some principles and techniques to efficiently minimise the size of Power BI datasets.
When working with huge datasets in Power BI, we often get stuck battling performance issues. This can take up a lot of time that could otherwise be spent on report development. We usually face these issues because of loading big data models into Power BI.
Reducing the dataset size can improve performance and bring faster upload speeds to the Power BI service.

Power BI Vertipaq Analyzer
Previously, we discussed how the Vertipaq Engine stores data and performs compression to reduce the memory usage of your data model. In this blog, we are going to explore the Vertipaq Analyzer and some basic techniques for reducing your data model size.
What is Vertipaq Analyzer?
Vertipaq Analyzer is a helpful tool for collecting size information about all the objects in a database. It lets you investigate detailed information about the structure and compression of your model. You can connect to Power BI Desktop from DAX Studio to see this information presented as a hierarchical view of tables and columns.

To demonstrate, we are going to connect to our sample Power BI Desktop file 'TEST' from DAX Studio. We want to find out which columns consume the most space in this model and how we can reduce its data model size. The current size of the 'TEST' file is 28,783 KB. After connecting to Power BI Desktop, we can analyse the complete metadata of the tables. We can also see detailed column information, regardless of the table each column belongs to, which helps identify the most expensive, high-cardinality columns, so you can decide whether to discard a few or keep them all. With this detailed information from Vertipaq Analyzer, you can easily decide which of the following techniques to apply when reducing the data model size.

Basic Techniques for Data Compression
Below are five techniques that can help reduce the size of your dataset and keep your model compact while still delivering all the insights.
Technique # 1: Manage Date Time Columns
One of the most important date/time techniques is to disable Auto date/time. Whenever it is enabled, Power BI creates a date/time table for each date column. These tables are hidden and are, in fact, calculated tables, which is why they increase the model size. Disabling this setting is therefore an easy way to shrink the report: uncheck Auto date/time in the Power BI Desktop options. After saving this change, the model size of our sample file reduces by up to 65%!

Another way to reduce dataset size through date/time columns is to split them. Power BI creates a date hierarchy for each date and date-time column, and dates show up with a time component by default. This can consume a lot of memory: in a DateTime column, the time portion has very high cardinality and consumes large amounts of memory for data storage. The best fix is to convert the DateTime column to Date. If time values are required in the report, split the DateTime column into two columns and use the time values from the time column.
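The split described above can be done in Power Query. A minimal M sketch, assuming a hypothetical Sales query with an OrderDateTime column (names are illustrative, not from the article):

```
let
    Source = Sales, // illustrative query name

    // Derive a low-cardinality Date column and a separate Time column
    AddDate = Table.AddColumn(Source, "OrderDate", each DateTime.Date([OrderDateTime]), type date),
    AddTime = Table.AddColumn(AddDate, "OrderTime", each DateTime.Time([OrderDateTime]), type time),

    // Drop the original high-cardinality DateTime column
    Removed = Table.RemoveColumns(AddTime, {"OrderDateTime"})
in
    Removed
```

Two columns of, say, 1,000 distinct dates and 86,400 distinct times compress far better than one column where nearly every row holds a unique date-time value.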
Technique # 2: Delete Unneeded Columns
Whenever we bring data into Power BI, many columns end up unused in the dashboards. The fastest way to reduce your dataset is to remove the unwanted columns and keep only the ones you need. You can either exclude them from the source directly or remove them in Power Query. Go through all the columns in Power Query and delete those not used in any of your reports or calculations. If the requirements change in the future, you can simply add the columns back; adding columns is much easier than deleting them.

Technique # 3: Remove Unnecessary Rows
Similarly, keep only the rows you need, and add only relevant, required data to the tables. Limiting the number of rows in a dataset is also known as horizontal filtering. You can filter out rows by entity or by time.
Filtering by time involves limiting the data to the last few years, since the business context may have changed and including older data may produce inaccurate results.

Technique # 4: Amend Column Types
Converting text columns to numeric can reduce memory usage significantly, as integers are faster to operate on and consume less memory. You should convert any text column that exclusively contains numbers, e.g., sales order number or invoice number, to a numeric type whenever possible. If there are decimal values with high precision (by default, up to 15 digits), convert their data type to Fixed Decimal Number, which takes less memory and is more efficient. Lastly, columns with the values TRUE and FALSE should be converted to the binary digits 1 and 0.

Technique # 5: Create Custom Columns in Power Query
In Power BI, the best practice is to create most custom columns as Power Query computed columns in the M language. The Vertipaq Engine stores both calculated columns (defined in DAX) and Power Query computed columns (defined in M) alongside regular columns, but their data structures are stored differently, and DAX calculated columns typically compress less efficiently. Creating columns through DAX can therefore lead to poor compression and extended data refresh times. There are some known exceptions, i.e., certain columns can only be created through DAX, but wherever possible it is preferable to create them in Power Query for better load efficiency.

Following the aforementioned techniques, we reduced our data model size by almost 75%.

Conclusion
There are several ways to optimise a large data model in Power BI. In this blog, we have discussed some common techniques that will help reduce the size of your data model.
You can try each of these and see which works best for you; which technique is most useful varies from dataset to dataset. And it's all worth it: you reduce your data model size, and you learn a lot along the way.