Notes on Query Evaluation Techniques for Large Databases COS 480/580: Database Management Systems

Notes on Query Evaluation Techniques for Large Databases
COS 480/580: Database Management Systems

Sudarshan S. Chawathe
University of Maine

Fall 2005

The following review questions are based on Goetz Graefe’s paper titled Query Evaluation Techniques for Large Databases (ACM Computing Surveys, 25(2):73–170, June 1993).

What is the Halloween problem? Illustrate it by using an example that is not discussed in the paper.
Outline the principal steps leading from a text query to an executable query plan. For each step, clearly indicate the input and output, and brieﬂy describe its action.
Describe the main ideas of an iterator-based query-engine architecture.
Describe two alternatives to an iterator-based architecture.
What is the multiplicity of the mapping between logical and physical operators? Justify your answer using examples.
What is the difference between a left-deep plan and a bushy plan?
What is the difference between a left-deep plan and a right-deep plan (other than graphical representation)?
Describe two methods for creating level-0 runs. Comment on their performance characteristics. Justify your comments using examples.
Comment on the validity of the following statement: A hash join is preferable to a sort-merge join because the sorting-based must write the entire input to run ﬁles.
Provide an algorithm for generating optimized merge trees in the spirit of Figure 6 (page 88) and its accompanying discussion.
Provide a detailed example that illustrates bucket tuning and dynamic destaging during hashing. Your example must clearly indicate details such as hash buckets, memory buffers, and disk-resident data.
Describe hybrid hashing using a detailed illustrative example.
Compare hybrid hashing with other forms of dynamic hashing, such as extendible hashing and linear hashing. Highlight the most important similarities and differences.
What is a good block-size for an ext2-style ﬁlesystem built on a 300 GB disk that will be used mainly for storing typical mp3 ﬁles? Justify your answer.
Explain the following adjectives as they apply to indexes: clustering, nonclustering, dense, and sparse.
Provide a concrete example illustrating how a standard operating-system policy for buffer management, such as LRU replacement, may be particularly ill-suited for database workloads.
Provide pseudocode for a nested-loop aggregation operator (with grouping).
Write a standard SQL92 (SQL2) query that is equivalent to the query described in the ﬁrst paragraph of Section 4.5 (page 103).
Given a table R(a₁,a₂,…,a_k), how many non-equivalent queries will the following query template generate, where L may be replaced by any expression that results in a valid query?
select sum(a₁) from R group by L;
Provide pseudocode for a pointer-based join based on merging.
Highlight the key similarities and differences between pointer-based hybrid-hash joins and value-based hybrid-hash joins.
How does the difference in access-times for L2 caches and main memory affect main-memory management in physical operators, such as a hybrid-hash join?