Algebraic Expression Tree (AETree) represents a prepared query along with metadata about the query. The AETree is used with the Simba SQLEngine, and will not be used if your data store is SQL capable. The AETree is used during Collaborative Query Processing, allowing for extensive optimization of queries depending on the capabilities of your data store.

Optimization of the Algebraic Expression Tree is a 3-step process. First, tables are re-ordered within the query, then operations are pushed-down in the AETree, and finally operations are passed-down to the Data Source Interface Implementation (DSII) via Collaborative Query Execution (CQE). This article goes over the three steps briefly to give developers using the Simba SQLEngine some insight into the optimization process. The Codebase sample driver is used as an example when working through the optimization steps.

The AETree diagrams used here were generated by the SQLEngine logs that are controlled by the DSIEXT_DATAENGINE_LOG_AETREES data engine property. This property includes values for logging in four different locations, each of which corresponds to a step in the optimization process as that have been outlined above. These AETree logs can be used to help understand what is going on in each case.

It is assumed that readers of this article have some knowledge of the SQLEngine and the AETree generated by the SQLEngine.

Step 1 – Table Reordering

Tables are re-ordered within the AETree, only when there are multiple tables, to allow for SELECT nodes with a child CROSS-JOIN to be converted into an INNER JOIN which can be passed-down via CQE. Additional optimizations based on table statistics may be added in the future.

To give an example of this, if we execute the following query using Codebase:

then the AETree looks like this before optimization:

The AESelect can’t be pushed down in a later optimization as the condition involves a table in another cross-join, and thus we can’t turn the AESelect->AECrossJoin relation into an inner join as the two tables involved in the filter are at different levels. After re-ordering the tables, we have the following AETree:

You can see the ADDRESS and DEPT tables have swapped positions, and now the AESelect can be pushed down to involve only the two needed tables, which allows it to be turned into an inner join which can be passed-down. Conceptually, this woud be the same as the following query:

Step 2 – Push-Down Optimization

Note that push-down is used to indicate optimizations where filters are pushed down in the AETree by the SQLEngine, and pass-down to indicate passing parts of the AETree to the DSII (CQE). In this step, filters are pushed down the AETree so they are at the lowest possible location on the AETree. This allows for data to be filtered out at the earliest possible time, reducing work for the SQLEngine. In addition, AESelect->AECrossJoin relationships are transformed into AEJoin nodes if possible. There can be other optimizations in play here, but the optimizations discussed give a general idea of what is occurring.

To demonstrate pushing down of a simple filter, take the query

You can see that the AESelect is above the AECrossJoin, which means the condition would be applied to the result of the cross-join, and thus be applied to many more rows than was needed. After push-down the AETree looks as follows:

Here you can see that the AESelect has been pushed down so the condition is only applied to the EMPLOYEE table, and the cross-join is then applied to the filtered EMPLOYEE result and the ADDRESS table, resulting in many fewer rows being processed. Conceptually, this is the same as the following query:

To demonstrate pushing down of filters which cause an AESelect->AECrossJoin relationship to become an AEJoin, we can continue the example from the Table Re-order phase which had an AETree that looked like this:

After the push-down phase, the tree looks like this:

What has happened here is that the AESelect was pushed down one level as the filter condition only applied to the EMPLOYEE and the DEPT table, so the ADDRESS table could be left out of the filtering. Once this happened, there was an AESelect->AECrossJoin relationship where the condition for the AESelect only involved the two child tables, and could be turned into an AEJoin.

Note that the transformation of an AESelect->AECrossJoin into an AEJoin only occurs after the initial push-down of filters occurs. This is to allow filters that might apply only to one operand of the AECrossJoin to be pushed down directly to that operand and not be applied at the join level.

Step 3 – Pass-Down Optimization (CQE)

Pass-down and CQE are used interchangeably, so if you see mention of pass-down then just mentally substitute CQE. CQE allows all or part of query execution to be handled by the data source, reducing the work that the SQLEngine must do. Since the SQLEngine is by necessity generalized and separated from the data source, it is usually slower than if the data source is able to handle the execution directly.

There are three types of CQE that can occur:

  • filtering
  • aggregations
  • joins

This section gives a brief overview of what each of the types does, for more detail on CQE for each of the types please see the following articles:

When passing filter execution to the DSII, CQE allows for the DSII to handle all or part of the filters. For example, take the following query:

Your data source may only be able to handle equality comparisons. If so, it could handle the “EMPLOYEE.FIRST_NAME = ‘Susan'” filter and return a result set representing the EMPLOYEE result with that filter already applied. The SQLEngine would then apply the “EMPLOYEE.NUM_SALARY > 10000” filter to the reduced result set. Your data source may also be able to handle the entire WHERE clause, but you may not be able to fully handle some combinations of the clause. In this case, you can filter out some of the results but not all of them, so you can return the partially filtered result and have the SQLEngine then apply the full filter again to ensure the results are correct.

Handling CQE with joins is very similar to handling CQE with filters, in fact you can consider CQE with filters to be a specialized case of CQE with joins involving only one table. Take the following query as an example:

Your data source can handle all or part of the ON condition of the join, as described above for the CQE with filters. The only difference here is that your DSII must return a single result set that represents the joined result set, in this example it would represent the join of EMPLOYEE and DEPT.

Finally, CQE with aggregations is a little bit different. Take the following query as an example:

Your DSII should inspect the aggregations that are passed down, if one of the aggregations can’t be handled than the aggregation pass-down will not be processed and the SQLEngine will handle the aggregation. Otherwise, a result set should be returned with one column for each of the aggregations. For this example that would be two columns, one for MAX(NUM_SALARY) and one for AVG(NUM_SALARY), and one row. If there is a GROUP BY, then there should be one row for each of the groups that is represented.

One thing to keep in mind about numeric literals is that AELiteral nodes always hold the absolute value of a numeric literal. When a positive numeric literal is used, an AELiteral node will be created to represent the value. When a negitve numeric literal is used an AENegate will be created and have a child AELiteral node that contains the absolute value of the literal.

For example, if the literal -5 was used in a query, the following subtree would appear in the AETree:

AENegate
AELiteral: 5

Now Is Your Turn

Download a free evaluation of  SimbaEngine X SDK and try SQLEngine in action!