HIVE-29121: Modify HiveSubQueryRemoveRule to use InnerJoin instead of SemiJoin for uncorrelated IN/EXISTS subqueries #6007
What changes were proposed in this pull request?
This patch modifies HiveSubQueryRemoveRule to use InnerJoin instead of SemiJoin for uncorrelated IN/EXISTS subqueries with logic == RelOptUtil.Logic.TRUE. This restores the logic in HiveSubQRemoveRelBuilder$join, which has since been replaced by Calcite's RelBuilder.
Why are the changes needed?
To restore the previous logic and increase the chances of applying optimizations to the query plan.
We started using SemiJoin in HiveSubQueryRemoveRule with HIVE-17767, which introduced SemiJoin for correlated subqueries in conjunction with removing the input Aggregate. Since HIVE-24685, we have been using SemiJoin for both uncorrelated and correlated subqueries, but we currently do not remove the input Aggregate for uncorrelated subqueries.
In addition, some optimization rules are not applicable to HiveSemiJoin, which reduces the chances of finding an optimal plan. Therefore, I suggest restoring the previous logic by using InnerJoin for uncorrelated cases.
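As a rough illustration of why InnerJoin can stand in for SemiJoin here, the sketch below models both joins over plain lists of integer keys. It is not Hive or Calcite code: the class SemiVsInnerJoinSketch and its methods are hypothetical names made up for this example, it assumes the subquery side is deduplicated on the join key (the role an input Aggregate would play), and it ignores NULL handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; not part of Hive or Calcite.
public class SemiVsInnerJoinSketch {

    // Semi join: each left row is emitted at most once if its key exists on the right.
    static List<Integer> semiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> keys = new LinkedHashSet<>(right);
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (keys.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    // Inner join projected onto the left column: one row per matching (left, right) pair.
    static List<Integer> innerJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                if (l.equals(r)) {
                    out.add(l);
                }
            }
        }
        return out;
    }

    // Stand-in for an Aggregate that groups the subquery output by the join key.
    static List<Integer> distinct(List<Integer> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 2, 3);
        List<Integer> subquery = List.of(2, 2, 3, 4);

        System.out.println(semiJoin(left, subquery));            // [2, 2, 3]
        System.out.println(innerJoin(left, distinct(subquery))); // [2, 2, 3]
        System.out.println(innerJoin(left, subquery));           // [2, 2, 2, 2, 3]
    }
}
```

In this toy model the inner join matches the semi join only when the right side is distinct on the key; without deduplication, duplicate subquery rows multiply the left rows.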
Does this PR introduce any user-facing change?
No
How was this patch tested?
I added a new EXPLAIN CBO query with an uncorrelated IN subquery in subquery_in.q to check whether join reordering is applied. One can verify that CalcitePlanner produces a query plan with a different join order when this patch is applied.