Storage Details
A number of additional tasks take your plugin from basic skeleton to finished product.
The Drill web console allows you to view a query profile for each query. Information is broken down by operator, with each operator given a name. You should assign a name to your plugin-specific scan as well. (That is, we want to know not just that the operator is a SCAN, but that it is an EXAMPLE_SCAN.)
The names are actually enum values, specified in a Protobuf file: UserBitShared.proto. (Yes, having a single global table that includes all third-party plugins is perhaps not the best design, but it is how things work at the moment.) Each operator is given a unique number, which the UI maps to the enum name.
We will give our scan operator the name EXAMPLE_SUB_SCAN, with a value that we pick as described below.
Modify UserBitShared.proto to add the operator type:
enum CoreOperatorType {
  SINGLE_SENDER = 0;
  ...
  SHP_SUB_SCAN = 65;
  EXAMPLE_SUB_SCAN = 66;
}
Choose the next available number for your ID.
Then, rebuild the Protobuf files as described in protocol/readme.txt.
Finally, we tell the Base framework which ID to use:
public class ExampleStoragePlugin extends BaseStoragePlugin<BaseStoragePluginConfig> {
  private final ExampleStoragePluginConfig config;

  public ExampleStoragePlugin(ExampleStoragePluginConfig config,
      DrillbitContext context, String name) throws IOException {
    super(context, config, name, buildOptions());
    this.config = config;
    schemaFactory = new ExampleSchemaFactory(this);
  }

  private static StoragePluginOptions buildOptions() {
    StoragePluginOptions options = new StoragePluginOptions();
    ...
    // Use the generated integer constant for the new operator type.
    options.readerId = CoreOperatorType.EXAMPLE_SUB_SCAN_VALUE;
    return options;
  }
}
If you were to run your example query from the Drill web console, then look at the query profile, you should see the name appear in the profile.
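To make the number-to-name mapping concrete, here is a minimal, self-contained sketch. `OperatorTypeSketch` is a hypothetical stand-in for the generated `CoreOperatorType` enum, not Drill's actual class. It only mirrors the two things the text relies on: each entry carries a number, and protoc generates a matching `..._VALUE` integer constant for each entry. The `UNKNOWN_` fallback is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generated CoreOperatorType enum (not Drill code).
enum OperatorTypeSketch {
  SINGLE_SENDER(0),
  SHP_SUB_SCAN(65),
  EXAMPLE_SUB_SCAN(66);

  // Mirrors the <NAME>_VALUE int constant that protoc generates per entry.
  static final int EXAMPLE_SUB_SCAN_VALUE = 66;

  // Reverse index from operator number to enum entry.
  private static final Map<Integer, OperatorTypeSketch> BY_NUMBER = new HashMap<>();
  static {
    for (OperatorTypeSketch type : values()) {
      BY_NUMBER.put(type.number, type);
    }
  }

  private final int number;

  OperatorTypeSketch(int number) {
    this.number = number;
  }

  int getNumber() {
    return number;
  }

  // Maps a number (as stored in a profile) back to a display name.
  // The fallback for unregistered numbers is an assumption of this sketch.
  static String nameFor(int number) {
    OperatorTypeSketch type = BY_NUMBER.get(number);
    return type == null ? "UNKNOWN_" + number : type.name();
  }
}
```

With this stand-in, `OperatorTypeSketch.nameFor(OperatorTypeSketch.EXAMPLE_SUB_SCAN_VALUE)` yields `EXAMPLE_SUB_SCAN`. This is why the plugin passes the integer constant rather than the enum entry itself.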
In our tests so far, we used setup code to create a storage plugin config unique to that test. Users can create a "production" config using the Drill web console. It is often handy, however, to provide a "starter" config that, in an ideal case, can be used as-is. Or, more realistically, the starter config shows the user which fields they must fill in.
The starter config resides in a file in your plugin directory. Once created, these defaults are also available to unit tests.
- Create the file bootstrap-storage-plugins.json in your resources folder.
- Put into the file the JSON-encoded version of your default configuration.
It is best to do this step near the end, when the exact fields of your config are finalized. If you create this file, let Drill load it, and then change the structure later, you may find that a (production) Drill will no longer start: Drill has saved the old config in ZooKeeper and will fail if that old JSON cannot be deserialized using your current storage plugin config class. (This also means you must take care, when changing the config, to ensure it stays backward-compatible once you release the plugin in the wild.)
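As a rough sketch, a starter file for the example plugin might look like the following. The layout (a top-level "storage" object keyed by plugin name, with "type" and "enabled" fields) follows the bootstrap files shipped with Drill's bundled plugins. The plugin name and type value "example" are assumptions: the type must match whatever type name your config class declares. Any plugin-specific fields would be added alongside them.

```json
{
  "storage": {
    "example": {
      "type": "example",
      "enabled": false
    }
  }
}
```

Shipping the starter config disabled leaves it to the user to fill in any required fields before enabling it.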
Drill serializes your scan spec, GroupScan and SubScan to JSON using the Jackson library. Jackson will serialize everything within your classes. If your classes include just simple types, then everything will work smoothly. However, if you include more complex types (such as your own classes, or references to Drill classes for schema and so on), then you may run into issues.
Some general rules:
- Provide a constructor, labeled as @JsonCreator, that takes all serialized attributes.
- Provide getFoo() methods for each attribute. If you use a name that is not in Java bean style (that is, 'foo()' rather than 'getFoo()'), then include a @JsonProperty annotation on the getter.
- If your class has isFoo() or getFoo() methods that are not JSON fields, then mark them with @JsonIgnore.
- If your class has structured types, then ensure those types also follow the Jackson serialization rules.
One detail can trip you up. Jackson uses the ObjectMapper class for serialization. The group and sub scan are serialized and deserialized using Drill's internal object mapper that has a number of useful options set (such as the ability to serialize some of Drill's internal classes). On the other hand, your scan spec is not serialized with this mapper; it is instead serialized using an object mapper defined in your plugin. You can customize this mapper if you like. (See the BaseStoragePlugin class for details.)
The result is that, if your scan spec is complex, sometimes deserialization seems to work, other times it does not. As you get into custom storage plugins, you may find yourself also hitting some of these same issues.
So, if you use any of the custom types (such as MajorType), you need to add the corresponding deserializers to your object mapper. See PhysicalPlanReader for where we set up the custom deserializers for the logical plan.
Drill provides the EXPLAIN PLAN FOR syntax, which shows the "logical plan" in a textual tree-like form. This form also appears in the Drill web console. Drill creates the plan by calling the toString() method on each object in the plan. The logical plan shows your GroupScan class, so you must implement your toString() correctly.
The following is an example of a scan (the mock scan) as it appears in the physical plan:
03-03 Scan(table=[[mock, employee_2400K]],
groupscan=[MockGroupScan [readEntries=[MockScanEntry [records=2400000,
columns=[MockColumn [name=empid_s17, type=MajorType [type=VARCHAR, mode=REQUIRED, precision=17]],
MockColumn [name=dept_i, type=MajorType [type=INT, mode=REQUIRED, precision=10]]]]]]]) ...
The above is shown divided into multiple lines for presentation; your output should be on a single line.
At present, each group and sub scan seems to do its own thing and sometimes produces a format not consistent with the one shown above. To simplify the task, the Base framework provides a class, PlanStringBuilder, to create a string in the correct format. The base group and sub scan classes take care of creating the builder and adding the base class fields. So, instead of overriding toString(), you should override buildPlanString(). Example:
@Override
public void buildPlanString(PlanStringBuilder builder) {
  super.buildPlanString(builder);
  builder.field("scanSpec", scanSpec);
  builder.field("filters", filters);
}
Each field() call invokes toString() on its argument. So, any object included in your group or sub scan should also use PlanStringBuilder. In the worst case, you can simply write your own code to serialize a nested object, then pass the resulting string to the field() method.
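To illustrate why nested objects should follow the same convention, here is a small self-contained sketch. `PlanStringSketch` is a hypothetical stand-in written for this example, not the Base framework's PlanStringBuilder. Only the `field(name, value)` method name comes from the text above. The constructor, the output layout (modeled on the mock scan output shown earlier) and the two `...Sketch` classes are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for PlanStringBuilder (not the Base framework class).
// Produces "ClassName [name=value, name=value]" as in the mock scan output.
class PlanStringSketch {
  private final StringBuilder buf = new StringBuilder();
  private int fieldCount;

  PlanStringSketch(Object target) {
    buf.append(target.getClass().getSimpleName()).append(" [");
  }

  // Appends name=value; the value is rendered via its own toString().
  PlanStringSketch field(String name, Object value) {
    if (fieldCount++ > 0) {
      buf.append(", ");
    }
    buf.append(name).append("=").append(value);
    return this;
  }

  @Override
  public String toString() {
    return buf + "]";
  }
}

// Hypothetical nested object that follows the same convention.
class TableSpecSketch {
  private final String tableName;

  TableSpecSketch(String tableName) {
    this.tableName = tableName;
  }

  @Override
  public String toString() {
    return new PlanStringSketch(this).field("tableName", tableName).toString();
  }
}

// Hypothetical group scan holding a nested spec and a list of filters.
class GroupScanSketch {
  private final TableSpecSketch scanSpec;
  private final List<String> filters;

  GroupScanSketch(TableSpecSketch scanSpec, List<String> filters) {
    this.scanSpec = scanSpec;
    this.filters = filters;
  }

  @Override
  public String toString() {
    return new PlanStringSketch(this)
        .field("scanSpec", scanSpec)
        .field("filters", filters)
        .toString();
  }
}
```

Because the nested spec formats itself the same way, the outer string stays uniform. If `TableSpecSketch` kept the default `Object.toString()`, the plan would instead show a class name and hash code, which is of little use to someone reading the plan.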