Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to instantiate a Morel context from an existing Calcite rootSchema? #145

Closed
GavinRay97 opened this issue Apr 30, 2022 · 10 comments
Closed

Comments

@GavinRay97
Copy link
Contributor

GavinRay97 commented Apr 30, 2022

Heya Julian,

I'm working on a web playground for Calcite, that I plan on hosting for free so that people can experiment with it. It'll essentially be a web REPL with some pre-loaded data, and the ability to render visualizations of query plans + schema diagrams.

I figured that while I'm at it, it could be nice to add a second input form, which takes Morel expressions, and is bound to the same Calcite context/rootSchema.

Having a bit of difficulty figuring out how to do this though.

I have a singleton object that holds the master schema like below -- I think I need to pass this as a foreignValue to Morel or similar?

// Holds the master schema that contains all datasource schemas and their sub-schemas
object CalciteSchemaService {

    val connection: CalciteConnection = initCalciteConnection()
    val rootSchema: SchemaPlus = connection.rootSchema

    val frameworkConfig: FrameworkConfig = Frameworks.newConfigBuilder()
        .defaultSchema(connection.rootSchema)
        .parserConfig(SqlParser.config().withCaseSensitive(false))
        .build()

    val relBuilder: RelBuilder = RelBuilder.create(frameworkConfig)

    // <SNIPPED>
}

I see this but it's protected:

protected Calcite() {
rootSchema = CalciteSchema.createRootSchema(false).plus();
relBuilder = RelBuilder.create(Frameworks.newConfigBuilder()
.defaultSchema(rootSchema)
.build());
typeFactory = (JavaTypeFactory) relBuilder.getTypeFactory();
dataContext = new EmptyDataContext(typeFactory, rootSchema);
}

@julianhyde
Copy link
Collaborator

Yes, foreignValue is the way to do it. It associates a Calcite schema with a Morel variable. You could have a single variable named ‘calcite’ for the root schema (and all first-level schemas would appear as fields) or create a separate foreignValue for each first-level schema.

You will need some kind of shell. (Parses and executes commands, and prints the results, in the environment created by previous commands in the same session.) Maybe one of the existing shells in Morel will work for you, or could be adapted.

The existing shells are able to read commands from a file, including recursively reading files, then switch back to their original input stream, which might be useful to you.

I briefly looked at creating a Java wrapper so that Morel could run in Jupyter, and it looked like a pretty similar problem.

@julianhyde
Copy link
Collaborator

I’m not sure that Calcite’s constructor being protected is a problem. Especially if it is constructed by a shell.

One improvement might be for the list of foreignValue to be dynamic, so you can add and remove after the shell has been created.

@GavinRay97
Copy link
Contributor Author

GavinRay97 commented May 2, 2022

Thanks for the pointers Julian. I think I've gotten closer with this, but getting an error about a table not being found.

My guess would be that I maybe need to pass in a custom Calcite context instead of just the ForeignValue map, but I'm not 100% sure to be honest:

The schema/table structure of my rootSchema is:

Schema: chinook
Tables:
	Schema: information_schema
	Tables: ...
	Schema: pg_catalog
	Tables: ...
	Schema: public
	Tables:
		Album
		Artist
		...
Schema: metadata
Tables:
	COLUMNS
	TABLES
object CalciteSchemaService {

    init {
		// ...
        addDatabase("chinook", h2DataSource)
    }

    fun addDatabase(databaseName: String, ds: DataSource): SchemaPlus {
        // Inner helper func
        fun getSubSchemas(connection: Connection): List<String> {
            connection.metaData.schemas.use { rs ->
                val schemas = mutableListOf()
                while (rs.next()) {
                    schemas.add(rs.getString("TABLE_SCHEM"))
                }
                return schemas
            }
        }

        if (getSubSchemas(ds.connection).isNotEmpty()) {
            val catalogSchema = JdbcCatalogSchema.create(rootSchema, databaseName, ds, null)
            rootSchema.add(databaseName, catalogSchema)
        } else {
            val jdbcSchema = JdbcSchema.create(rootSchema, databaseName, ds, null, null)
            rootSchema.add(databaseName, jdbcSchema)
        }

        return rootSchema
    }
class CalciteMorelInterpreter {

    static String smlProgram = "chinook;";
    static InputStream input = new ByteArrayInputStream(smlProgram.getBytes());
    static PrintStream output = System.out;

    static SchemaPlus rootSchema = CalciteSchemaService.INSTANCE.getRootSchema();
    static SchemaPlus chinookSchema = rootSchema.getSubSchema("chinook").getSubSchema("public");

    static Map<String, ForeignValue> foreignValueMap = Calcite
            .withDataSets(
                    Map.of("chinook", (Calcite calcite) -> new CalciteForeignValue(calcite, chinookSchema, true))
            )
            .foreignValues();

    public static void main(String[] args) {
        var morelMain = new net.hydromatic.morel.Main(List.of(), input, output, foreignValueMap, new File(""));
        morelMain.run();
    }
}
Exception in thread "main" org.apache.calcite.runtime.CalciteException: Table 'chinook.public.Album' not found
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
	at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
	at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
	at org.apache.calcite.tools.RelBuilder.scan(RelBuilder.java:1575)
	at org.apache.calcite.tools.RelBuilder.scan(RelBuilder.java:1575)
	at net.hydromatic.morel.foreign.CalciteForeignValue.lambda$value$4(CalciteForeignValue.java:91)
	at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
	at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:89)
	at net.hydromatic.morel.foreign.CalciteForeignValue.value(CalciteForeignValue.java:89)
	at net.hydromatic.morel.compile.Environments.lambda$foreignBindings$2(Environments.java:104)
	at com.google.common.collect.SingletonImmutableBiMap.forEach(SingletonImmutableBiMap.java:68)
	at net.hydromatic.morel.foreign.CalciteForeignValue.lambda$value$4(CalciteForeignValue.java:91)
	at net.hydromatic.morel.compile.Environments.foreignBindings(Environments.java:101)
	at net.hydromatic.morel.compile.Environments.env(Environments.java:95)
	at net.hydromatic.morel.compile.Environments.env(Environments.java:63)
	at net.hydromatic.morel.Main.run(Main.java:122)
	at com.github.gavinraydev.CalciteMorelInterpreter.main(Morel.java:33)
	at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
	at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:89)
	at net.hydromatic.morel.foreign.CalciteForeignValue.value(CalciteForeignValue.java:89)
	at net.hydromatic.morel.compile.Environments.lambda$foreignBindings$2(Environments.java:104)
	at com.google.common.collect.SingletonImmutableBiMap.forEach(SingletonImmutableBiMap.java:68)
	at net.hydromatic.morel.compile.Environments.foreignBindings(Environments.java:101)
	at net.hydromatic.morel.compile.Environments.env(Environments.java:95)
	at net.hydromatic.morel.compile.Environments.env(Environments.java:63)
	at net.hydromatic.morel.Main.run(Main.java:122)
	at com.github.gavinraydev.CalciteMorelInterpreter.main(Morel.java:33)

@GavinRay97
Copy link
Contributor Author

Ah okay, I figured it out. The logic for CalciteForeignValue only handles 1-level of Schema nesting, so you can't do chinook.public, you need to recursively visit and namespace nested schemas like chinook__public:

// Recursively copy sub-schemas from a rootSchema into a Morel Calcite ForeignValue context
private static Calcite createMorelCalciteForeignValueCtx(SchemaPlus rootSchema) {
    boolean toLowerCase = true;
    final ImmutableMap.Builder<String, DataSet> builder = ImmutableMap.builder();

    SchemaPlus currentSchema = rootSchema;
    while (true) {
        final Set<String> subSchemaNames = currentSchema.getSubSchemaNames();
        if (subSchemaNames.isEmpty()) {
            break;
        }
        for (String subSchemaName : subSchemaNames) {
            final SchemaPlus subSchema = currentSchema.getSubSchema(subSchemaName);
            String name = Schemas.path(subSchema).names().stream().reduce((a, b) -> a + "__" + b).get();
            builder.put(name, calcite -> {
                SchemaPlus newSchema = calcite.rootSchema.add(name, subSchema);
                return new CalciteForeignValue(calcite, newSchema, toLowerCase);
            });
        }
        currentSchema = currentSchema.getSubSchema(subSchemaNames.iterator().next());
    }

    return Calcite.withDataSets(builder.build());
}
static String smlProgram = "from a in chinook__public.artist yield {a.artistid, a.name};";

static Reader input = new InputStreamReader(new ByteArrayInputStream(smlProgram.getBytes()));
static Writer output = new StringBuilderWriter();

public static void main(String[] args) {
    Calcite calciteMorelCtx = createMorelCalciteForeignValueCtx(ROOT_SCHEMA);
    var morelMain = new Main(List.of(), input, output, calciteMorelCtx.foreignValues(), new File(""));
    morelMain.run();
}
Result:
val it =
  [{artistid=1,name="AC/DC"},{artistid=2,name="Accept"},
   {artistid=3,name="Aerosmith"},{artistid=4,name="Alanis Morissette"},
   {artistid=5,name="Alice In Chains"},{artistid=6,name="Ant?nio Carlos Jobim"},
   {artistid=7,name="Apocalyptica"},{artistid=8,name="Audioslave"},
   {artistid=9,name="BackBeat"},{artistid=10,name="Billy Cobham"},
   {artistid=11,name="Black Label Society"},{artistid=12,name="Black Sabbath"},
   ...] : {artistid:int, name:string} list

@julianhyde
Copy link
Collaborator

Is there a way to get this enhanced functionality into trunk? By which I mean, reframe it as a new feature, then submit a PR that implements the feature, including tests?

I certainly think that chinook would be a good data set to have built into Morel, alongside scott and foodmart.

Regarding the name mapping, to chinook__public. It would be preferable to address it via either chinook or chinook.public. The former would require the mapping to have something like a 'default schema' property; the latter would require CalciteForeignValue to surface sub-schemas as fields of the Morel record value, in addition to tables.

@GavinRay97
Copy link
Contributor Author

Is there a way to get this enhanced functionality into trunk? By which I mean, reframe it as a new feature, then submit a PR that implements the feature, including tests?

Sure, certainly not opposed to doing that.

the latter would require CalciteForeignValue to surface sub-schemas as fields of the Morel record value, in addition to tables.

I think I might try this route, extending the code that already exists

May see if there's a way to write a Shell that is more friendly for non-REPL uses, that allows you to instantiate a persistent foreign value context, where you can repeatedly call .run(scriptString, outBuffer)

@julianhyde
Copy link
Collaborator

May see if there's a way to write a Shell that is more friendly for non-REPL uses

Designing for reuse is hard. So I'd probably copy-paste the existing shell, customize it until I had something that works for non-REPL uses, then see whether there's any way to get rid of the copied code (by sub-classing, adding handlers, etc.) Or just start from scratch. If the code seems to be trending towards a shell, let it go in that direction. :)

Maybe take inspiration from https://github.com/SpencerPark/IJava (a Java kernel for Jupyter).

@GavinRay97
Copy link
Contributor Author

@julianhyde I have created an initial PR here, but it's throwing an error about RelMetadataQueryBase.getMetadataHandlerProvider being null:

#146

java.lang.NullPointerException: metadataHandlerProvider

	at java.base/java.util.Objects.requireNonNull(Objects.java:233)
	at org.apache.calcite.rel.metadata.RelMetadataQueryBase.getMetadataHandlerProvider(RelMetadataQueryBase.java:122)
	at org.apache.calcite.rel.metadata.RelMetadataQueryBase.revise(RelMetadataQueryBase.java:118)
	at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:604)
	at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:291)
	at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:125)
	at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:244)
	at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:124)
	at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:114)
	at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:178)
	at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:2025)
	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
	at net.hydromatic.morel.foreign.CalciteForeignValue.lambda$value$4(CalciteForeignValue.java:100)
	at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
	at com.google.common.collect.RegularImmutableSortedSet.forEach(RegularImmutableSortedSet.java:89)
	at net.hydromatic.morel.foreign.CalciteForeignValue.value(CalciteForeignValue.java:89)
	at net.hydromatic.morel.compile.Environments.lambda$foreignBindings$2(Environments.java:104)
	at com.google.common.collect.RegularImmutableMap.forEach(RegularImmutableMap.java:292)
	at net.hydromatic.morel.compile.Environments.foreignBindings(Environments.java:101)
	at net.hydromatic.morel.compile.Environments.env(Environments.java:95)
	at net.hydromatic.morel.compile.Environments.env(Environments.java:63)
	at net.hydromatic.morel.ProgrammaticShell.makeEnv(ProgrammaticShell.java:80)
	at net.hydromatic.morel.ProgrammaticShell.<init>(ProgrammaticShell.java:54)
	at net.hydromatic.morel.ProgrammaticShellTest.run(ProgrammaticShellTest.java:19)

Would appreciate any feedback if you see what I'm doing wrong

@GavinRay97
Copy link
Contributor Author

Ah wow wtf -- removing static from foreignValueMap makes it work:

// BROKEN
static Map<String, ForeignValue> foreignValueMap = Calcite.withDataSets(BuiltInDataSet.DICTIONARY).foreignValues();

// WORKS
Map<String, ForeignValue> foreignValueMap = Calcite.withDataSets(BuiltInDataSet.DICTIONARY).foreignValues();

I guess I don't understand Java? This is very confusing 🤔

@julianhyde
Copy link
Collaborator

julianhyde commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants