In assignment 2 you implemented Relational Algebra. Then in assignment 4, you implemented queries using Relational Algebra. In this assignment, you will be doing something similar: implementing a few query processing operators, and then using them to implement queries.
A query processing operator is much like a Relational Algebra operator except:
As in the earlier assignments, you will start with code that I provide (in this archive), which contains an implementation of Database, Row, Table and Index classes; and some unit tests. You will need to 1) implement query processing operators, and 2) use these operators to implement some of the same queries you implemented in Assignment 4.
Implementing the query processing operators is likely to be the most demanding part of this assignment. Unlike Relational Algebra operators, which return tables containing all output rows at once, you must implement the query processing operators as Iterators, which return one row each time next() is called. Structuring your code to return one row at a time is going to be trickier than simply generating all of the rows at once, (especially for join).
Writing the queries should be familiar to you from Assignment 4. The style of coding — deeply nested expressions combining operators — is similar.
The archive contains the following:
It is probably easiest to understand the pieces of this assignment
by examining a query, and seeing how it is implemented in terms of
operators. Here is test_q1 from test_query_plans.cpp. (The name
test_q1 is unchanged from Assignment 4, and in fact we are
implementing the same query. But this time, using query processing
operators instead of relational algebra.)
static bool q1_predicate(const Row *row)
{
return row->at(1) == "Tweetii";
}
static void test_q1()
{
Table *control1 = Database::new_table("control1", ColumnNames{"birth_date"});
add(control1, {"1984/02/28"});
Iterator* q1 =
In the code above, the functions project, select, and table_scan are part of the query processing interface, declared in QueryProcessor.h. table_scan creates an Iterator that scans the user table. The select invocation selects those rows satisfying q1_predicate. Note that q1_predicate refers to columns by number. In this case, the username is stored in column 1 (the second column) of the input row. project takes Rows from the Iterator returned by select, and keeps only the last column (number 2).
Here is the complete interface (minus the documentation present in the
header file):
Iterator*
These functions are implemented in QueryProcessor.cpp, and all they
do is to create query processing operator objects, e.g.
Iterator* project(Iterator* input, initializer_list
So the project function is implemented by the Project class, the
table_scan function is implemented by the TableScan class, etc.
Project is a query processing operator, declared in Operators.h,
and implemented in Operators.cpp. Here is the declaration:
class Project : public Iterator {
public:
unsigned n_columns() override;
void open() override;
Row* next() override;
void close() override;
public:
Project(Iterator* input, const initializer_list
A few things to note about this class:
Here is the implementation of Project, from Operators.cpp:
unsigned Project::n_columns()
{
return _column_selector.n_selected();
}
void Project::open()
{
_input->open();
}
Row* Project::next()
{
Row* projected = NULL;
Row* row = _input->next();
if (row) {
projected = new Row();
for (unsigned i = 0; i < _column_selector.n_selected(); i++) {
projected->append(row->at(_column_selector.selected(i)));
}
Row::reclaim(row);
}
return projected;
}
void Project::close()
{
_input->close();
}
Project::Project(Iterator* input, const initializer_list
Important things to notice:
All of the code in this walkthrough can be found in the provided archive. I've implemented test_q1 and the member functions of Project. However, you will need to implement the member functions of the other operators in Operators.cpp, and the other queries in test_query_plans.cpp, (look for IMPLEMENT_ME).
The Database class is pretty much the same as in previous assignments.
In previous assignments, Table was a set of Rows, (using the RowSet type). To reflect database system internals more accurately, this assignment defines a Table as a list of Rows. This means that duplicates may occur, (just like a sequential file).
Row is a bit different. As in the past, a Table
contains Rows, and you can access Row field values by using a column
name, e.g.
Row* person = ...
printf("name: %s\n", person->value("name");
However, column names are only useful in conjunction with tables. Projects and joins can easily generate rows that have little resemblance to any tables. So for this assignment, operators will index fields of Rows by position, starting from 0. If a Row r has n fields, then you can access the ith field as r->at(i), i = 0 .. n-1. (All fields are strings, as before.)
Several of the operators require special columns to be identified. For example, Project needs to know which columns to keep. NestedLoopsJoin needs to know, for each input iterator, what columns to compare for the join. The ColumnSelector class is used for this purpose.
For example, suppose we have an Iterator whose rows have 5 columns (numbered 0 through 4), and we want to project on columns 1 and 2. The Project operator would have a ColumnSelector object set up as follows:
RowCompare is different than in previous assignments. Previously, RowCompare::only was the only RowCompare object needed, to compare whether two rows are equal, and to determine row ordering. But now, the Sort operator needs to be able to sort by a given set of columns. RowCompare has been modified to help you.
Your Sort implementation should be able to use std::sort to do the actual sorting. The last argument to std::sort should be a RowCompare object that compares rows according to the Sort operator's sort columns. I.e., your Sort::open() implementation should do the sorting by creating an approprate RowCompare object and then passing it to std::sort.
One new feature of the database classes is that tables can now have indexes. Query execution plans can start by scanning tables, or by doing index lookups, as in a real database system.
An IndexScan operator is specified by giving the lo and hi key values of interest. The scan is implemented by searching for lo, and scanning forward until hi is exceeded. IndexScan::next() returns a Row pointed to by the index.
For example, suppose we have a Table, t, with columns x, y, z:
x y z
---------------
"5" "10" "123"
"5" "71" "456"
"6" "66" "789"
"7" "42" "543"
"8" "83" "765"
"5" "99" "678"
"3" "16" "432"
"8" "25" "987"
We can create an index on y as follows:
Index* ty = t->add_index(ColumnNames{"y"});
This index provides access to Rows of t in order of y values: "10", "16", "25", "42", "66", "71", "83", "99".
To search for all the Rows with y between "20" and "50", we can do
this:
Row* lo = new Row(1);
lo->append("20");
Row* hi = new Row(1);
hi->append("50");
Iterator* i = index_scan(ty, lo, hi);
Using this iterator yields the rows ("8", "25", "987"), ("7", "42", "543").
Each operator class is a subclass of Iterator, defined in
Iterator.h:
class Iterator
{
public:
virtual unsigned n_columns() = 0;
virtual void open() = 0;
virtual Row* next() = 0;
virtual void close() = 0;
virtual ~Iterator() {}
};
open(), next(), and close() are as discussed in class.
n_columns() returns the number of columns in the Iterator's rows, (i.e., the Rows returned by next()). So for example, TableIterator::n_columns() returns the number of columns in the underlying table. Project::n_columns() returns the number of columns being projected.
The schema for this assignment is similar to Assignment 4. We have the same tables:
user(
user_id primary key,
username,
birth_date)
message(
message_id primary key,
send_date,
text)
routing(
from_user_id references user,
to_user_id references user,
message_id,
primary key(from_user_id, to_user_id, message_id)
In addition, the user table has an index on username, (this is created in the setup() function of test_query_plans.cpp).
The db directory contains the same data as in Assignment 4.
I have tried to eliminate as many memory management headaches for you as possible. (E.g., I implemented the destructor for each operator class. You're welcome.) There is just one thing you need to watch out for: deleting rows created by your operator implementations, that will never be needed again.
For example, note the call to Row::reclaim() in Project::next(). The input row is never returned. Instead, a new row is constructed (in the variable projected), and that is returned from next(). The input row will never be used again, so it is reclaimed.
Don't delete rows by using the C++ delete statement. Use Row::reclaim() instead. This function deletes a row unless it belongs to a table. We want to keep those of, course. We only want to delete rows generated by operators which will never be needed again.
Here is a typical operator test:
void sort_non_empty()
{
Table* t = Database::new_table("t", ColumnNames{"a", "b", "c"});
add(t, {"a", "z", "30"});
add(t, {"c", "y", "20"});
add(t, {"e", "x", "10"});
add(t, {"g", "w", "40"});
add(t, {"a", "z", "31"});
add(t, {"c", "y", "21"});
add(t, {"e", "x", "11"});
add(t, {"g", "w", "41"});
Iterator* i = sort(table_scan(t), {1, 2, 0});
Table* control = Database::new_table("control", ColumnNames{"a", "b", "c"});
add(control, {"g", "w", "40"});
add(control, {"g", "w", "41"});
add(control, {"e", "x", "10"});
add(control, {"e", "x", "11"});
add(control, {"c", "y", "20"});
add(control, {"c", "y", "21"});
add(control, {"a", "z", "30"});
add(control, {"a", "z", "31"});
Iterator* control_iterator = table_scan(control);
CHECK(i->n_columns() == 3);
TWICE {
CHECK(match(control_iterator, i));
};
delete i;
delete control_iterator;
}
select(
project(
nested_loops_join(
table_scan(r), {3}
table_scan(s), {1}),
{0, 1, 4}),
some_predicate)
print_iterator("r scan", table_scan(r));
print_iterator("s scan", table_scan(s));
print_iterator("join", nested_loops_join(table_scan(r), {3}, table_scan(s), {1})));
Run make:
Before you fill in the IMPLEMENT_ME sections, you will probably see compiler warnings about statements with no effect. Those will disappear as you fill in code.
Run the code as follows:
You will see two batches of tests run, the operator tests and the query tests.
You need to do the following: