Title: 10.1-Advanced Power Query & Data Modeling Guide
Date: 2025-11-05 15:47
Status: Child
Tag: Power BI Power Query data modeling Data Analysis
🎯 Overview
This guide covers advanced Power BI techniques, focusing on data connection, transformation via Power Query, appending datasets, dynamic folder imports, conditional logic, reporting with matrices, and introductory data modeling. Core concepts emphasize handling large datasets efficiently, ensuring data integrity, and creating combined reports from multiple sources.
Key Learning Objectives:
- Connect and transform CSV/Excel data.
- Append static datasets for combined analysis.
- Import dynamic data from folders for scalability.
- Apply conditional columns for bucketing.
- Build matrix reports and refresh data.
- Model relationships between tables.
🔗 Data Connection & Initial Setup
Connecting to Data Sources
- Power BI Workflow: Start with a blank report. Use Home > Get Data to connect sources.
- Preferred Formats for Large Data:
- CSV: Lightweight (e.g., 800 KB vs. 10 MB Excel), ideal for flat data storage. Faster loading for large volumes (e.g., 7-10 lakh rows).
- Excel: Supports formatting, formulas, macros, but heavier; avoid for raw data import.
- Steps to Connect CSV:
- Get Data > File > CSV > Select file > OK.
- Preview data; click Transform Data to enter Power Query Editor (for transformations) or Load (direct to Power BI).
- Transform vs. Load:
- Transform Data: Opens Power Query Editor for cleaning/transformation.
- Load: Imports without changes.
Handling Headers & Data Integrity
- Promoted Headers: Automatically detects first row as headers. Essential if data lacks explicit headers.
- Merged Cells Issue (Excel-Specific):
- Merged cells disrupt data flow: Values anchor to the top-left cell (e.g., A1 merge spans A1:D1, value appears only in A1).
- Results: Blank rows, misaligned headers, duplicated/inconsistent data (e.g., “Product” repeats across rows).
- Best Practice: Unmerge all cells before import. Ensures clean, flat structure for Power Query.
- Applied Steps Pane: Tracks transformations (e.g., Source, Promoted Headers). Review/edit steps to preserve data.
🔄 Appending Queries: Combining Static Datasets
Scenario: Multi-Year Sales Data
- Problem: Combine sales CSV files (e.g., 2019: 8 lakh rows, 2020: 9 lakh, 2021: 10 lakh) into one report.
- Benefits: Single query for analysis; avoids separate reports per file.
Steps to Append
- Connect Each File:
- In Power Query Editor: Home > New Source > File > CSV > Select file (e.g., sales_2019.csv) > Transform Data.
- Repeat for 2020/2021. Ensures consistent column sequence (e.g., Order Date, Stock Date, Order Number, Product Key).
- Append Queries:
- Select one query (e.g., 2021) > Home > Append Queries > Append Queries as New.
- In dialog: Choose Three or more tables > Add tables (e.g., 2019, 2020, 2021).
- Reorder: Use Move Up/Down for sequence (e.g., 2019 first).
- Click OK > Rename new query (e.g., “Sales_2019-2021”).
- Verify: Scroll preview; data stacks chronologically (e.g., Jan 2019 to Dec 2021).
- Requirements: Identical column structure/types across files. Mismatched columns cause misalignment.
- Close & Apply: Loads combined query to Power BI (ETL: Extract, Transform, Load).
Loading & Performance
- Storage: Loads to local RAM (not server unless published).
- Impact: Larger datasets (e.g., 27 lakh rows) increase load time; use CSV for efficiency.
📁 Dynamic Folder Imports: Scalable Data Handling
Scenario: Monthly/Branch Data
- Problem: Files added dynamically (e.g., monthly sales: Jan-Dec, or branch files: Bogura.csv, Dhaka.csv).
- Benefits: Automatic inclusion of new files; refresh pulls updates without re-import.
Steps to Import from Folder
- Connect Folder:
- Get Data > More > File > Folder > Browse to path (e.g., G:\PowerBI\Batch8\NewFolder) > OK.
- Combine Files:
- In preview: Click Combine & Transform Data (dropdown on table header).
- Select sample file/sheet (e.g., Sheet1) > OK.
- Power Query scans all files, stacks data (e.g., columns: Source/Branch, Date, Item, Quantity).
- Customization:
- Filter/remove unwanted files if needed.
- Close & Apply: Creates query (e.g., “NewFolder”).
Refresh Mechanism
- Dynamic Updates:
- Add new file (e.g., April.csv) to folder > Home > Refresh in Power BI.
- System re-scans folder, appends new data automatically.
- Example: Branch Report:
- Files: Bogura.csv (407,700 qty), Chitagong.csv, etc.
- Add “NewBranch.csv” > Refresh > Appears in report.
- Delete file > Refresh > Removed from report.
- Use Case: Developer sets up; end-user (e.g., finance) adds files and refreshes—no Power Query access needed.
- Limitations: Files must match structure; fixed number of files suits append, dynamic suits folders.
Summary Steps for Folder Import
- Get Data > Folder > Select path > OK.
- Combine & Transform Data > Select sample > OK.
- Close & Apply > Refresh for updates.
🤝 Conditional Columns: Data Bucketing
Scenario: Customer Aging Analysis
- Problem: Bucket ages from DOB (e.g., 80+, 70-79, 60-69, 50-59, <50); analyze by gender.
Steps in Power Query
- Connect & Prep Data:
- Get Data > CSV (e.g., customer_data.csv).
- Add Age column: Add Column > Date > Age (from DOB).
- Round Age:
- Select Age > Transform > Data Type > Whole Number > Rounding > Round (e.g., to nearest integer).
- Create Conditional Column:
- Add Column > Conditional Column > Name: “Age Group”.
- Logic (descending order for ranges):
- If Age >= 80 → “80 and above”.
- Else If Age >= 70 → “70-79”.
- Else If Age >= 60 → “60-69”.
- Else If Age >= 50 → “50-59”.
- Else → “Below 50”.
- Logic Note: Descending thresholds prevent overlap (e.g., 75 matches 70-79, not 80+).
- Close & Apply.
If-Then-Else Syntax Insight
- Visual editor mirrors M formula: Checks conditions sequentially; “Else” catches unmatched cases.
- Best Practice: Order from highest/lowest range; test for edge cases.
📈 Reporting with Matrices
Building Matrix Visuals
- Views in Power BI:
- Report View: Drag visuals (e.g., Matrix).
- Data View: Preview tables/columns.
- Model View: Relationships (covered later).
- Matrix Setup:
- Insert Matrix visual.
- Rows: Categorical (e.g., Territory Key, Gender, Age Group).
- Columns: Categorical (e.g., Year/Source/Branch).
- Values: Numeric (e.g., Order Quantity, Customer Key—sums automatically).
- Examples:
- Appended Sales: Rows=Territory, Values=Order Quantity → Aggregates 2019-2021 (e.g., Total: 40 across years).
- Folder Branches: Rows=Item, Columns=Source (Branch) → Pivot shows qty per branch (e.g., Bogura: 407,700).
- Aging Report: Rows=Gender, Columns=Age Group, Values=Customer Key (Count) → Reveals distribution (e.g., 18,148 total; females dominant in 50-59).
Slicers & Parameters (Teaser)
- Add slicers for Year/Branch filtering; dynamic via bookmarks/parameters (next session).
🛠️ Data Modeling Introduction
Scenario: Multi-Table Analysis
- Tables:
- Product: Master (Product Key, Name, Subcategory, Color, Size).
- Sales: Transactions (Product Key, Order Quantity).
- Returns: (Product Key, Return Quantity).
- Problem: Report net quantity (Sales - Returns) by Product Name (not Key).
Steps to Model
- Load Tables:
- Get Data > CSV for each > Load (direct to Power BI).
- Relationships (Model View):
- Drag Product Key from Product to Sales/Returns (one-to-many).
- Enables cross-table queries (e.g., Product Name pulls from master).
- Report Build:
- Matrix: Rows=Product Name (from Product), Values=Order Quantity (Sales) + Return Quantity (Returns).
- Benefits: Normalized data; avoids duplication; supports DAX calculations (next session).
Key Principles
- Keys: Use unique keys for joins.
- Cardinality: One-to-many (master to transactions).
- Direction: Single (filter flows from master to facts).
🚀 Best Practices & Tips
- Data Prep: Unmerge cells; match structures before append.
- Performance: CSV for large data; preview before load.
- Dynamic vs. Static: Folders for growing data; append for fixed sets.
- Validation: Check Applied Steps; refresh tests.
- Next Steps: DAX measures, relationships depth, parameters/slicers.
📚 Resources
- Practice Files: Sales_2019-2021 CSVs, Branch folder samples, Customer DOB CSV.
- Explore: Power Query M functions (e.g., Text.Upper, Date.From).
Follow up
References