Power BI: Dataset Intersect(s)


Power BI can pull in and analyze data from a variety of sources. And once loaded, Power BI will form data relationships wherever it can. Even so, new datasets are sometimes needed instead of data relationships.

One way to create these new datasets is to use the INTERSECT() expression with dataset pairs. One business scenario, listing the similar entries of customer viewing history to measure content traffic:

Figure 1 – Power BI sample dataset.

Important to note, the first dataset dictates what the second dataset has to match against. With that in mind, be aware that INTERSECT() will treat the entirety of each row of dataset #2 as a comparable record against the entirety of each row of dataset #1:

tblOfIntersects00 = INTERSECT(CJ, SYM)
Figure 2 – Power BI table result.

But to intersect data and use only specific columns, purge the datasets of unnecessary fields. Or create new data expressions using the DISTINCT() expression on a particular column. Because the scenario is looking for matching titles, the latter works just fine:

INTERSECT(
     DataExp01,
     DataExp02
)
INTERSECT(
     DISTINCT(CJ[Title]),
     DISTINCT(SYM[Title])
)
Figure 3 – Power BI table result.

Again, the intersect expression only accepts two parameters. So, for scenarios needing to intersect more than two datasets, or data expressions, do so in pairings:

  • E.g., intersecting three datasets option #1.
INTERSECT(
   INTERSECT(
      DataExp01,
      DataExp02
   ),
   DataExp03
)
INTERSECT(
   INTERSECT(
      DISTINCT(CJ[Title]),
      DISTINCT(SYM[Title])
   ),
   DISTINCT(Rod[Title])
)
  • E.g., intersecting three datasets option #2.
INTERSECT(
     DataExp01,
     INTERSECT(
          DataExp02,
          DataExp03
     )
)
INTERSECT(
     DISTINCT(CJ[Title]),
     INTERSECT(
          DISTINCT(SYM[Title]),
          DISTINCT(Rod[Title])
     )
)

  • E.g., intersecting four datasets option #1.
INTERSECT(
     DataExp01,
     INTERSECT(
          DataExp02,
          INTERSECT(
               DataExp03,
               DataExp04
          )
     )
)
INTERSECT(
     DISTINCT(CJ[Title]),
     INTERSECT(
          DISTINCT(SYM[Title]),
          INTERSECT(
               DISTINCT(Rod[Title]),
               DISTINCT(Mike[Title])
          )
     )
)
  • E.g., intersecting four datasets option #2.
INTERSECT(
     INTERSECT(
          DataExp01,
          DataExp02
     ),
     INTERSECT(
          DataExp03,
          DataExp04
     )
)
INTERSECT(
     INTERSECT(
          DISTINCT(CJ[Title]),
          DISTINCT(SYM[Title])
     ),
     INTERSECT(
          DISTINCT(Rod[Title]),
          DISTINCT(Mike[Title])
     )
)
  • E.g., intersecting four datasets option #3.
INTERSECT(
     INTERSECT(
          INTERSECT(
               DataExp01,
               DataExp02
          ),
          DataExp03
     ),
     DataExp04
)
INTERSECT(
     INTERSECT(
          INTERSECT(
               DISTINCT(CJ[Title]),
               DISTINCT(SYM[Title])
          ),
          DISTINCT(Rod[Title])
     ),
     DISTINCT(Mike[Title])
)

Conclusion:
INTERSECT() is useful for a number of reasons. As reports pull from more and more sources, the INTERSECT() and DISTINCT() expressions are helpful when looking for redundant/ duplicate records. Or framing it another way, depending on the data sources, looking for comparable records…

“In order to see where we are going, we not only must remember where we have been, but we must understand where we have been.”

Ella Baker

#BlackLivesMatter

Leave a comment