devvythelopper/shifted-text

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 

Repository files navigation

Shifted-Text

A proposal for simplifying text formats with hierarchical content

Anybody with some coding experience has stumbled over the debate between tab characters (\t) and space characters ( ). People have fought relentlessly in both trenches and no conclusion has ever been reached, since both have their advantages and disadvantages. What I am going to propose combines many of the advantages of both sides.

Indentation has long been used to let the human eye directly recognize the hierarchical structure of markup languages (XML, HTML) or source code files (.c, .js). The designers of some programming and markup languages (e.g. Python, Haskell, YAML) have even made indentation part of the language's syntax. It is a useful tool. However, it requires a lot of extra bytes and thus is not very bandwidth friendly. But what if there was a way to have both: visual hierarchies and low bandwidth?

ASCII to the rescue: In the ASCII control character set there are some characters that mostly retain a historical meaning and are not used widely anymore. Two specific characters stuck out to me: shift-in (ASCII character number 15) and shift-out (ASCII character number 14). It was as if they called out to me, as if they were meant to fulfill this purpose.

The idea now is really simple:

A single shift-in character means that a new line begins with an indentation one level deeper than the line before. Multiple consecutive shift-in characters mean that a single new line begins, indented as many levels deeper than the line before as there are shift-in characters.

A single shift-out character analogously means that a new line begins with an indentation one level lower than the line before. Multiple consecutive shift-out characters mean that a single new line begins, indented as many levels lower than the line before as there are shift-out characters.

An example should make this clear:

a line of text followed by a `shift-in` character
	this line on the other hand is followed by a `shift-out` character
this line is followed by 2 `shift-in` characters
		this line is followed by a `linefeed` character
		this line is followed by 2 `shift-out` characters
the end.
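The example above can be turned back into conventionally indented text with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `decode_content` and the choice of one tab per level in the output are assumptions made for illustration.

```python
import re

SHIFT_OUT = "\x0e"  # ASCII 14
SHIFT_IN = "\x0f"   # ASCII 15

# A line ends at a run of shift-in characters, a run of shift-out
# characters, or a single linefeed. The capturing group makes
# re.split keep the separators in the result.
_LINE_BREAK = re.compile("(\x0f+|\x0e+|\n)")


def decode_content(content: str, indent: str = "\t") -> str:
    """Expand Shifted-Text content into ordinary indented lines.

    `indent` is the string emitted once per level (an assumption of
    this sketch; an editor would use the width from the file header).
    """
    parts = _LINE_BREAK.split(content)  # text, sep, text, sep, ..., text
    depth = 0
    lines = []
    for k in range(0, len(parts), 2):
        lines.append(indent * depth + parts[k])
        if k + 1 < len(parts):
            sep = parts[k + 1]
            if sep[0] == SHIFT_IN:
                depth += len(sep)
            elif sep[0] == SHIFT_OUT:
                depth -= len(sep)
                if depth < 0:
                    raise ValueError("more shift-out than shift-in characters")
            # a linefeed leaves the depth unchanged
    return "\n".join(lines)


encoded = "a\x0fb\x0ec\x0f\x0fd\ne\x0e\x0ef"
print(decode_content(encoded))
```

The short input at the bottom has the same shape as the example: one level in, one level out, two levels in, a plain linefeed, two levels out.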

Any source file that conforms to this format (Shifted-Text) begins with an optional Byte Order Mark (BOM) from one of the Unicode encodings. If there is no BOM, UTF-8 is assumed, but other encodings are allowed.

After the BOM follows a shift-out character (decimal 14, 0x0E).

After the shift-out character follows a single optional hexadecimal digit (from 1 to F) denoting the width of one indentation level, expressed in space characters. If the digit is omitted, the indentation width defaults to 4 characters.

After this (or instead of it) follows an optional two-digit hexadecimal number denoting the line width. Allowed values range from 16 (0x10) to 255 (0xFF). At this width a soft-wrap line break must occur, and every continuation line in the soft-wrapped block must be indented one level deeper than the original line. There is no default line width, i.e. if nothing is specified, lines may or may not be soft-wrapped by the editor according to user preference.
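The soft-wrap rule could look roughly like this in an editor. This is only a sketch under assumptions: the proposal does not say where a line may be broken, so breaking at word boundaries with Python's standard `textwrap` module is a choice made here, and the function name `soft_wrap` is hypothetical.

```python
import textwrap
from typing import List, Optional


def soft_wrap(body: str, depth: int, indent_width: int = 4,
              line_width: Optional[int] = None) -> List[str]:
    """Render one logical line as one or more display lines.

    Continuation lines are indented one level deeper than the
    original line. With no line width given, nothing is wrapped.
    """
    first = " " * (depth * indent_width)
    if line_width is None:
        return [first + body]
    continuation = " " * ((depth + 1) * indent_width)
    wrapped = textwrap.wrap(
        body,
        width=line_width,
        initial_indent=first,
        subsequent_indent=continuation,
    )
    # textwrap.wrap returns [] for an empty body; keep the empty line.
    return wrapped or [first]


for display_line in soft_wrap("word " * 20, depth=1, line_width=32):
    print(repr(display_line))
```

Note that `textwrap` normalizes whitespace while wrapping, which a real editor would probably not want; it is used here only to keep the sketch short.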

Then follows a shift-in character (decimal 15, 0x0F). After this begins the content.
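A parser for this header might look like the sketch below. It assumes the file has already been decoded to a string (so a BOM shows up as U+FEFF), and it reads the text as follows: one hex digit is the indentation width, two hex digits are the line width, three hex digits are both. That reading, the acceptance of lowercase hex digits, and the names `Header` and `parse_header` are assumptions, not part of the proposal.

```python
import re
from typing import NamedTuple, Optional


class Header(NamedTuple):
    indent_width: int           # width of one level in spaces, default 4
    line_width: Optional[int]   # soft-wrap width, None if unspecified
    content_start: int          # index of the first content character


# optional BOM, shift-out, optional 1-digit indent width (1..F),
# optional 2-digit line width (10..FF), shift-in
_HEADER = re.compile(
    "\ufeff?\x0e([1-9A-Fa-f])?([1-9A-Fa-f][0-9A-Fa-f])?\x0f"
)


def parse_header(text: str) -> Header:
    """Parse the Shifted-Text header at the start of `text`."""
    match = _HEADER.match(text)
    if match is None:
        raise ValueError("not a Shifted-Text file: header missing or malformed")
    indent_digit, width_digits = match.groups()
    indent_width = int(indent_digit, 16) if indent_digit else 4
    line_width = int(width_digits, 16) if width_digits else None
    return Header(indent_width, line_width, match.end())


header = parse_header("\x0e850\x0ffirst line")
print(header.indent_width, header.line_width)
```

Everything from `content_start` onward is the content described next.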

In the content, every shift-in character increases the indentation by one level and every shift-out character decreases it by one level. A run of consecutive shift-in characters corresponds to a single new line whose indentation is increased by as many levels as there are shift-in characters. Analogously, a run of consecutive shift-out characters corresponds to a single new line whose indentation is decreased by as many levels as there are shift-out characters.
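Going the other way, conventionally indented text can be converted into Shifted-Text content by comparing the depth of each line with the depth of the line before it. The sketch below assumes the input uses exactly one tab per indentation level and that the first line is not indented; the function name `encode_content` is made up for this illustration, and the header is not written.

```python
SHIFT_OUT = "\x0e"  # ASCII 14
SHIFT_IN = "\x0f"   # ASCII 15


def encode_content(text: str) -> str:
    """Convert tab-indented text into Shifted-Text content."""
    pieces = []
    previous_depth = 0
    for index, line in enumerate(text.split("\n")):
        body = line.lstrip("\t")
        depth = len(line) - len(body)  # number of leading tabs
        if index == 0:
            if depth != 0:
                raise ValueError("the first line must not be indented")
        elif depth > previous_depth:
            pieces.append(SHIFT_IN * (depth - previous_depth))
        elif depth < previous_depth:
            pieces.append(SHIFT_OUT * (previous_depth - depth))
        else:
            pieces.append("\n")  # same depth: plain linefeed
        pieces.append(body)
        previous_depth = depth
    return "".join(pieces)


indented = "a\n\tb\nc\n\t\td\n\t\te\nf"
encoded = encode_content(indented)
print(len(indented), len(encoded))
```

The two printed lengths give a feel for the saving: every line break costs one character plus one per level of change, instead of one linefeed plus the full indentation of the new line.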

New lines without any change in indentation are expressed using linefeed characters – across all systems.

Tab characters (\t) work as you would expect: each one is displayed with a minimum width of 1 column and a maximum width equal to the indentation width defined at the beginning of the Shifted-Text file. Thus consistency is fully achieved.

That's it, folks. For this to gain a foothold, it must be adopted by the development teams of your favorite source editors and compilers. So please do propose it to them. I would be even happier if language designers made it the default for their compilers. I have already been programming using this file format and it makes quite a few tasks a lot easier.
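One way to read the tab rule is that tab stops sit at multiples of the indentation width, which is what Python's built-in `str.expandtabs` implements. The sketch below renders a single line under that assumption; the function name `render_line` is hypothetical.

```python
def render_line(body: str, depth: int, indent_width: int = 4) -> str:
    """Render one content line with spaces only.

    The indentation is `depth` levels of `indent_width` spaces, and
    each tab inside the line advances to the next multiple of
    `indent_width`, so it occupies between 1 and `indent_width` columns.
    """
    # The indentation is itself a multiple of indent_width, so tab stops
    # measured from the start of the body match the absolute ones.
    return " " * (depth * indent_width) + body.expandtabs(indent_width)


print(repr(render_line("abc\td", depth=0)))   # tab is 1 column wide
print(repr(render_line("abcd\te", depth=0)))  # tab is 4 columns wide
print(repr(render_line("ab\tc", depth=1)))
```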
