This article covers Collaborative Query Execution (CQE) for aggregations. For information about CQE with filters or joins please see the following:
It is assumed that the reader understands the rough idea of what CQE is and how it works. Please see AETree optimization for information.
The Codebase sample driver is used to illustrate CQE for aggregations. As an example take the following query:
select COUNT(*) from EMPLOYEE
Just before the pass-down optimization step (described in the AETree optimization article), the AETree looks like this:
AEProxyColumn: AEAggregate - column #0
and after the pass-down step, which Codebase handles, the AETree looks like this:
What has happened here is that Codebase has handled the aggregation (COUNT(*)) in the DSII and returned a CBAggregationResult representing the aggregation to the SQLEngine. The AETable node for the EMPLOYEE node actually represents this, and you can see this yourself if you modify the CBAggregationResult::GetTableName() function to alter the returned table name. The SQLEngine uses this result returned from the DSII, and discards the AEAggregate node that was previously used.
The SQLEngine attempts to pass-down the entire aggregation, which can contain one or more aggregate functions. If at least one of the aggregate functions can’t be handled, then the pass-down is aborted and the SQLEngine will handle the aggregation. The SQLEngine does not allow partial aggregations.
Note: A CQE aggregation can’t be applied to the result of a sub-section of the AETree if that section had an operation which could not be handled via CQE, instead the SQLEngine will handle it. This is because if the DSII could not handle the operation that would result in the result set to be filtered, then it would not be possible for it to then apply an aggregation on a result it could not compute.
If there are no columns in the GROUP BY clause, then the result should only have a single row. Otherwise, there should be one row for each group specified by the set of columns in the GROUP BY clause. For example, the following query:
select C1, COUNT(*) from T1 group by C1
would have two columns, one for C1 and one for COUNT(*), and should return a row for every different value of C1. If the table was as follows:
then the returned result should look like the following:
Note: The columns and expressions should still appear in the select list in the same order the Set*() functions are called.
If a query containing an aggregation is in a complicated form, SimbaEngine will transform it into a “standard” form. For example, for the following query:
select C1 + AVG(C2) * COUNT(C3) from T1 group by C1
the SQLEngine will ensure that an IAggregationHandler object only need to deal with a query that resembles this:
select C1, AVG(C2), COUNT(C3) from T2 group by C1
The DSIExt classes that are actually involved in the above CQE process are:
- A DSII level IResult representing the aggregation result.
For Codebase, the actual classes are the following:
The DSIExtOperationHandlerFactory, and subclasses, are responsible for creating the handlers for passing down filters, joins, and aggregations to the DSII.
The individual handlers will be created and used as needed by the SQLEngine during the pass-down optimization phase.
Finally, the aggregation result will be returned via Passdown() on the IAggregationHandler subclass.
There are base classes supplied for the Boolean expression handler that can be used to simplify implementation. The hierarchy is:
IAggregationHandler -> DSIExtAbstractAggregationHandler -> DSIExtSimpleAggregationHandler
Typically you will subclass the DSIExtSimpleAggregationHandler, which provides support for some common aggregations. If you need more functionality then you should override other functions in the parent classes as needed, or subclass DSIExtAbstractAggregationHandler. SetAggregateFn() will take aggregate function IDs, which should be inspected to see if your DSII can handle the condition.