Declaration semantics in Structorizer has to be reassessed #980

Open
codemanyak opened this issue Jun 14, 2021 · 5 comments

@codemanyak
Collaborator

codemanyak commented Jun 14, 2021

Issue #977 revealed that the semantics and processing of declarations in Structorizer (particularly of array declarations) are poorly founded and have to be revisited. The user example in that issue contained an element with the text:
var v1[10], i: int

By the way: Your declaration does not exactly do what you intend. It is rather surprising for me that Structorizer tolerated it. The correct syntax to declare an array of ten elements would follow the Pascal style: var v1: array [0..9] of int. But it has no effect, actually. var v1[10]: int creates indeed an array of 11 elements, all initialized to 0, because Structorizer produces an array of sufficient size as soon as it bumps into some writing attempt to an array element.

Originally posted by @codemanyak in #977 (comment)
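
The effect described in the quoted comment can be imitated with a small sketch. It is not Structorizer code: `assign_element` and the `variables` dictionary are hypothetical names, and the sketch assumes that the text `var v1[10]: int` ends up being handled like a write access to element 10.

```python
def assign_element(variables, name, index, value):
    """Imitate an auto-growing array write (illustrative model only)."""
    if index < 0:
        raise IndexError("negative index")
    array = variables.setdefault(name, [])
    # Grow the array to a sufficient size, padding new places with 0.
    while len(array) <= index:
        array.append(0)
    array[index] = value


variables = {}
# Assumption: the declaration is treated like a write to v1[10].
assign_element(variables, "v1", 10, 0)
print(len(variables["v1"]))  # 11 elements, indices 0..10
```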

The following diagram shows several possible declaration forms suggested by different programming languages. Much as Structorizer's syntactic tolerance is appreciated, we need a specification of which of them are to be accepted and what effect they are meant to have w.r.t. Executor, Analyser, and code generation:
[image]

This result seems hardly acceptable:
[image]

Should we accept Pascal's array [0..9] of int at all, as it suggests a flexible start index (i.e., an index base transformation) that we do not actually support? Should the "execution" of an array declaration rather result in an array of the specified size, filled with null values? But what to do then with array declarations of unspecified size (e.g. v5)?

@codemanyak
Collaborator Author

codemanyak commented Jun 14, 2021

The discussion will have to keep in mind at least these issues: #61, #113, #335, #368, #408, #423, #739, #800.
As Structorizer does not require declarations except in the case of record component access, and as assignments override declarations or former type associations, it remains difficult to enforce the type of declared variables, in particular if the declaration is incomplete (as in array of string, where no size is given).
Nevertheless, a reliable way of interpreting declarations like var v[10]: float is needed, or they should be rejected.
Remark: Java accepts an array declaration in both of these forms: double[] a; and double a[]; and even double[] b[]; (where the latter is a two-dimensional array b).

@codemanyak
Collaborator Author

codemanyak commented Jun 21, 2021

In analogy to some Pascal distributions and Oberon, we might indeed think of modified declaration semantics in future Structorizer versions, particularly for arrays:

  • An undeclared variable will continue to tolerate assignments of values of any type at any time (as it has always done in Structorizer and as in, e.g., Python). Mere initialisation would not induce further type adherence checks, though it might be used for weak type inference on code export.
  • An undeclared variable will be made an array either by being assigned an array initializer (e.g. a ← {0, 8, 15}) or by an assignment to an element with an arbitrary non-negative index (e.g. a[i + 3] ← "some fun" with i being an integer variable or constant), as is done at present. The initial size of the array would be set by the initializer or by the used index + 1, respectively; the array will remain extensible. At present, a single element assignment has the effect of filling all places from index 0 to the used index - 1 with 0, which may be convenient and gracious in many cases, but the user should not rely on it. Perhaps it is better to fill these array places with null, thus provoking an error on read access? The variable can be overridden with any value of any type at any time.
  • If a variable is declared with a certain type, however, we might try to respect (and enforce, see #408, "executor ignores variable types") this type association as far as Structorizer can interpret the type name or description, which is the big question mark here (we would have to specify a set of acknowledged type names and specifications).
  • If a variable is declared with an array type, then there might be three cases:
    • a "static" array type if the index range is given at declaration time (e.g. array [15] of elementtype or elementtype[15]). In this case, later expansion of the array or redefinition attempts for the variable should not be accepted at execution time, and the index range would have to be checked. (But this tends to make assignments of array initializer expressions complicated.) We might even try to enforce the element type at runtime. The question is whether such a declaration should initialise the variable with an already dimensioned array filled with null elements (which would cause execution errors on read access); at present, the variable remains uninitialised in Structorizer.
    • an "open", "dynamic", or extensible array type if the index range is not specified (e.g. array of elementtype or elementtype[]). The array could always be extended, but not overridden by data of a different structure; the element type could be supervised on assignments. Again the question arises whether and how to initialize the variable. As an empty array (no elements)?
    • an array unspecified with respect to both size and element type (only the structure principle specified, e.g. array). Do we accept this or not? It should be extensible without element type checking, but we might want to reject redefinition/reassignment attempts with something else?
  • In analogy, a record variable would be type-enforced if explicitly declared. (If not explicitly initialized then to be established as a record with null components or to be left null as a whole?)
  • In further analogy we might try to enforce the type of explicitly declared enumeration variables.
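
To make the distinction between "static" and "open" declared arrays more tangible, here is a minimal sketch of how the proposed behaviour might look. The class `DeclaredArray` and its methods are hypothetical and only model the ideas listed above (null filling, range checks, errors on reading unassigned elements); element type enforcement is left out.

```python
class DeclaredArray:
    """Hypothetical model of a declared array; not Structorizer code."""

    def __init__(self, size=None):
        # A given size makes the array "static"; otherwise it is "open".
        self.static = size is not None
        # Static arrays are pre-dimensioned with None ("null") elements,
        # open arrays start empty.
        self.items = [None] * size if self.static else []

    def write(self, index, value):
        if index < 0:
            raise IndexError("negative index")
        if index >= len(self.items):
            if self.static:
                raise IndexError("index outside the declared range")
            # Open arrays grow; the gap is filled with None instead of 0.
            self.items.extend([None] * (index + 1 - len(self.items)))
        self.items[index] = value

    def read(self, index):
        if not 0 <= index < len(self.items):
            raise IndexError("index out of range")
        value = self.items[index]
        if value is None:
            raise ValueError("element read before being assigned")
        return value


fixed = DeclaredArray(15)      # like: array [15] of elementtype
open_array = DeclaredArray()   # like: array of elementtype
open_array.write(3, "x")       # grows to 4 elements, places 0..2 stay None
```

Whether a declaration should pre-fill the array at all remains the open question from the list; the sketch simply picks the null-filled variant.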

As a consequence, Structorizer would have to maintain a declaration table apart from an inferred type association table (or a declaration flag in the type association map).
This approach might seem more sound or sophisticated, but it is of course more complicated, too (both to understand and to implement, and also for inferring subroutine parameter lists etc.). It would not be fully compatible with earlier versions, and it will always remain somewhat incomplete.
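
As a rough illustration of the two-table idea, the following sketch keeps explicit declarations apart from inferred type associations. `TypeRegistry` and its method names are made up for this example, and types are compared as plain strings, which is a strong simplification.

```python
class TypeRegistry:
    """Hypothetical split between declared and inferred types."""

    def __init__(self):
        self.declared = {}  # name -> type from an explicit declaration
        self.inferred = {}  # name -> type of the most recent assignment

    def declare(self, name, type_name):
        self.declared[name] = type_name

    def assign(self, name, value_type):
        declared = self.declared.get(name)
        # Only explicitly declared variables are type-enforced.
        if declared is not None and declared != value_type:
            raise TypeError(
                f"{name} is declared as {declared}, got {value_type}")
        # Undeclared variables simply adopt the type of the new value.
        self.inferred[name] = value_type

    def type_of(self, name):
        # A declaration takes precedence over an inferred association.
        return self.declared.get(name, self.inferred.get(name))


registry = TypeRegistry()
registry.assign("x", "int")
registry.assign("x", "string")   # fine: x was never declared
registry.declare("n", "int")
registry.assign("n", "int")      # fine: matches the declaration
```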

So is it worth the effort? Should it even be optional?

In any case, something has to be done about the misinterpretation of texts like var a[10]: integer. Either we reject them or we interpret them like var a: array [10] of integer.
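
The second option (reinterpretation) could be prototyped as a simple text rewrite. The sketch below is only an assumption about how such a normalisation might look; the regular expression is not Structorizer's grammar, it covers just a single variable with a literal size, and `normalize_declaration` is a hypothetical helper.

```python
import re

# Assumed pattern for "var <name>[<size>]: <type>" (illustrative only).
SEMI_C_DECL = re.compile(
    r"^\s*var\s+(\w+)\s*\[\s*(\d+)\s*\]\s*:\s*(.+?)\s*$")


def normalize_declaration(text):
    """Rewrite 'var a[10]: T' as 'var a: array [10] of T'."""
    match = SEMI_C_DECL.match(text)
    if match is None:
        # Anything else (including multi-variable lists) is left untouched.
        return text
    name, size, element_type = match.groups()
    return f"var {name}: array [{size}] of {element_type}"


print(normalize_declaration("var a[10]: integer"))
```

A text like var v1[10], i: int does not match the pattern and is returned unchanged, so it would still need a separate decision (rejection or a more capable parser).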

@codemanyak
Collaborator Author

codemanyak commented Oct 5, 2021

In addition, we will have to decide whether it is desirable and feasible to accept declarations in C and/or Java style as well (then with or without the var prefix?), particularly as this declaration style is already accepted in many assignment (initialisation) contexts, where it causes trouble with types.
Examples:

  • int v6[10] (or even var int v7[10]?) and int v6[] <- {123, 3735, 21, 832, -152, 98, 36};
  • int[] v8 or var int[] v9?
  • int[] v9[10] (a perfectly legal, though hardly recommendable way to declare a two-dimensional array in Java or C#...)

The multitude of possible syntactic approaches adds to the already intrinsic ambiguity of an underlying grammar, in particular if type names may consist of more than one word (like unsigned int, long long int, long double or something like that): in contrast to C or C++, "unsigned" or "long" aren't reserved words in Structorizer. Index ranges might even be specified by expressions, thus potentially implying recursion.
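
The ambiguity can be demonstrated with a naive splitter for C/Java-style declarations. This is a sketch under the assumption that the last identifier is the variable name and everything before it belongs to the type; `C_STYLE` and `split_c_style` are hypothetical, and index expressions are captured as raw text rather than parsed.

```python
import re

# Assumed shape: <type words>[ "[]"... ] <name> [ "[...]"... ]
C_STYLE = re.compile(
    r"^\s*(?P<type>[A-Za-z_]\w*(?:\s+[A-Za-z_]\w*)*(?:\s*\[\s*\])*)"
    r"\s+(?P<name>[A-Za-z_]\w*)"
    r"(?P<dims>(?:\s*\[[^\]]*\])*)\s*$")


def split_c_style(text):
    """Split a C-like declaration into (type, name, dimensions) or None."""
    match = C_STYLE.match(text)
    if match is None:
        return None
    # Collapse runs of whitespace inside multi-word type names.
    type_name = " ".join(match["type"].split())
    return type_name, match["name"], match["dims"].strip()


print(split_c_style("unsigned int v6[10]"))
print(split_c_style("int[] v9[10]"))
# Without reserved words, "var" is indistinguishable from a type word:
print(split_c_style("var int v7[10]"))
```

The last call yields the type "var int", which shows why a rule for the var prefix and a fixed set of acknowledged type words would be needed before such declarations can be interpreted reliably.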

@codemanyak
Collaborator Author

Not to be forgotten: C++, C#, and Java export are also compromised by C-like array declarations.
[image]
[image]

@codemanyak
Collaborator Author

The combination of a multiple declaration with an initialization like the following should be detected as illegal:
[image]
At present, Structorizer interprets it in an inconsistent way (usually assigning the value to the last of the variables). It actually causes harm in the case of a semi-C array declaration as in:
[image]
Code export interprets this in very different (wrong) ways, e.g.:
C:
[image]
Pascal:
[image]
BASIC:
[image]
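
A check for this situation might look like the sketch below. Since the original examples are only available as screenshots, the sample texts are invented, and the helper `is_illegal_multi_init` as well as the set of assignment operators are assumptions rather than Structorizer's actual Analyser logic.

```python
import re

# Assumed assignment operators; a real check would use the configured ones.
ASSIGN_OPS = ("<-", ":=", "←")


def is_illegal_multi_init(text):
    """True if a 'var' declaration lists several names and assigns a value."""
    stripped = text.strip()
    if not stripped.startswith("var "):
        return False
    for op in ASSIGN_OPS:
        if op in stripped:
            declaration = stripped.split(op, 1)[0]
            break
    else:
        return False  # no initialization present
    # Take the name list before the type separator ':' ...
    names = declaration[len("var "):].split(":", 1)[0]
    # ... and ignore commas inside index brackets such as a[2,3].
    names = re.sub(r"\[[^\]]*\]", "", names)
    return "," in names


print(is_illegal_multi_init("var a, b: int <- 5"))
print(is_illegal_multi_init("var a: int <- 5"))
```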
