@ngsg ngsg commented Aug 5, 2025

What changes were proposed in this pull request?

This patch modifies HiveSubQueryRemoveRule to use InnerJoin instead of SemiJoin for uncorrelated IN/EXISTS subqueries with logic == RelOptUtil.Logic.TRUE. This restores the logic of HiveSubQRemoveRelBuilder$join, which has since been replaced by Calcite's RelBuilder.

Why are the changes needed?

To restore the previous logic and increase the chances of applying optimizations to the query plan.

We started using SemiJoin in HiveSubQueryRemoveRule with HIVE-17767, which introduced SemiJoin for correlated subqueries in conjunction with removing the input Aggregate. Since HIVE-24685, we have been using SemiJoin for both uncorrelated and correlated subqueries, but we currently do not remove the input Aggregate for uncorrelated subqueries.

In addition, some optimization rules are not applicable to HiveSemiJoin, which reduces the chances of finding an optimal plan. Therefore, I suggest restoring the previous logic by using InnerJoin for uncorrelated cases.
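
To illustrate why an InnerJoin can stand in for a SemiJoin here, the following is a minimal, self-contained sketch in plain Java. It does not use Calcite or Hive classes; the class name `SemiVsInnerJoinSketch`, its methods, and the sample rows are all hypothetical. It assumes the equivalence relies on the subquery side being deduplicated, which is the role the retained input Aggregate is assumed to play for uncorrelated subqueries.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: models rows as integers (the join key).
public class SemiVsInnerJoinSketch {

    // Semi join: keep each left row once if at least one right row matches.
    public static List<Integer> semiJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (right.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    // Inner join: emit one output row per matching (left, right) pair.
    public static List<Integer> innerJoin(List<Integer> left, Iterable<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                if (l.equals(r)) {
                    out.add(l);
                }
            }
        }
        return out;
    }

    // Stand-in for the Aggregate kept on the subquery side (assumption):
    // it removes duplicate keys so each left row matches at most once.
    public static Set<Integer> distinct(List<Integer> rows) {
        return new LinkedHashSet<>(rows);
    }

    public static void main(String[] args) {
        List<Integer> outer = List.of(1, 2, 2, 3);
        List<Integer> subquery = List.of(2, 2, 3, 4);

        System.out.println(semiJoin(outer, subquery));            // [2, 2, 3]
        System.out.println(innerJoin(outer, distinct(subquery))); // [2, 2, 3]
        System.out.println(innerJoin(outer, subquery));           // [2, 2, 2, 2, 3]
    }
}
```

In this sketch the inner join matches the semi join only when the right side is deduplicated first. Without that step, the inner join multiplies outer rows by the number of duplicate matches.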

Does this PR introduce any user-facing change?

No

How was this patch tested?

I added a new EXPLAIN CBO query with an uncorrelated IN subquery to subquery_in.q to check whether join reordering is applied. One can verify that CalcitePlanner produces a query plan with a different join order when this patch is applied.

@zabetak zabetak left a comment

I am unsure if the solution should be in HiveSubQueryRemoveRule. Let's continue the discussion under HIVE-29121 and once we converge/clarify what the problem is I will do a full review of the PR.
