Please Define bit Endianness, byte Endianness and size of base Types #23143

Closed
2 tasks
martinvahi opened this issue Dec 12, 2024 · 5 comments

Comments

@martinvahi

martinvahi commented Dec 12, 2024

Describe the feature

Software written in C-like programming languages often operates not on numbers or characters but on bit-streams. Examples: SHA-256, UTF-8. UTF-8 even has an optional Byte-Order-Mark (BOM). Such code is specific to bit endianness, byte endianness, and base type size. In the C world there tends to be an assumption that, if software has been written for "big computers" like laptops, desktops, and servers, then the sizes of C base types such as char, int, and double, and the bit and byte endianness of those types, match what they happen to be on x86/AMD64 CPUs. From a software stability point of view it would be better if those properties of base types were EXPLICITLY DEFINED in the programming language specification and guaranteed by the programming language implementation.

Thank You for reading my comment.

Use Case

Use cases:

  • Formal verification of all bit-stream related code, including code that decodes UTF8.
  • Write-once-compile-and-run-everywhere functionality for real, not just "works on all x86/AMD64 CPUs, and on ARM CPUs if they have been switched to the same endianness mode that x86/AMD64 CPUs use".
  • Manual code review becomes easier if bit endianness and byte endianness have been decided and defined: there is less confusion about what the program should do.
  • Translation from V to other programming languages becomes easier if there is only one interpretation of what a specific V program is supposed to do.
  • Niklaus Wirth style clarity of algorithm description becomes possible in V if there are no ambiguities about the properties of V base types.
  • Code that has been written for desktop computers, laptops, and servers can be copy-pasted into microcontroller (hereafter: MCU) software projects regardless of the MCU type. Code reuse would become better than with code written in the C programming language. A year 2024 Raspberry_Pi_Pico MCU is computationally far more powerful than many desktop computers were in the past, so code written for 2024-era MCUs like the Raspberry_Pi_Pico becomes usable in retro-computing setups. That includes old industrial equipment that still uses retro-computers, because its control electronics (electric motor control, sensors, etc.) are specific to the computers of the era when the equipment was built.

Proposed Solution

For every base type, define byte endianness, bit endianness, and size in bytes. It's OK for them to match what the classical x86/AMD64 CPU has, but they have to be defined without ambiguity and INDEPENDENT OF CPU type. That is to say, if someone creates a V compiler for some new experimental CPU, then bitstream algorithm implementations in V that work on x86/AMD64 should work WITHOUT ANY MODIFICATION on that new experimental CPU, even if the bit endianness and byte endianness of that CPU differ from those of x86/AMD64.

Other Information

Not having defined bit endianness, byte endianness and size of base types introduces an "undefined behaviour" of sorts, where people just assume that the behaviour is like it is on x86/AMD64, but it's not guaranteed to be like that.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

Version used

git version 2.39.2

Environment details (OS name and version, etc.)

Linux, AMD64 and the various CPUs that the various Raspberry Pi-s come with.

Note

You can use the 👍 reaction to increase the issue's priority for developers.

Please note that only the 👍 reaction to the issue itself counts as a vote.
Other reactions and those to comments will not be taken into account.


@martinvahi
Author

Just a random example of some mess that MIGHT be prevented if the bit endianness, byte endianness, and sizes of base types were defined in the V specification: #23136

Someone could write one version of the code in V, and that would just work without needing to reimplement the same math code for every CPU type and without having to take various CPU peculiarities into account. If that universal version is considered "too slow", then someone else, for example a supercomputer maintenance team at a computing center, can swap it out for their own CPU-specific version, but for the rest of the V users the math code would just work.

@jorgeluismireles

Suppose someone nostalgic wants to convert V code to Java Virtual Machine code. To start with, a valid V number can include underscores, like 2_024, and at this level there is no endianness to worry about:

fn main() {
	year := 2_024
	println('Hello, World ${year}')
}

A first approach would be to convert the V code into Java code, something like this:

public class year {
    public static void main(String[] args) {
        int year = 0x07E8;
        System.out.println("Hello, World " + year); 
    }
}

Then use the javac program to create a class that can be run with the java program:

$ javac year.java
$ java year
Hello, World 2024

A second approach would be to convert the V code into something like what the javap -c disassembler shows, so that the Java compiler is no longer needed:

$ javap -c year.class
Compiled from "year.java"
public class year {
  public year();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: sipush        2024
       3: istore_1
       4: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       7: new           #3                  // class java/lang/StringBuilder
      10: dup
      11: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      14: ldc           #5                  // String Hello, World
      16: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: iload_1
      20: invokevirtual #7                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      23: invokevirtual #8                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      26: invokevirtual #9                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      29: return
}

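The section 0: sipush 2024 / 3: istore_1 means: push the short constant 2024 (sign-extended to an int) onto the stack and store it in local variable 1. So the compiler decided that a 16-bit immediate is enough to encode our year; the variable itself is still an int.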

A third approach would be to go "native" and produce the complete class file content byte by byte:

$ xxd year.class
00000000: cafe babe 0000 0034 002b 0a00 0b00 1409  .......4.+......
00000010: 0015 0016 0700 170a 0003 0014 0800 180a  ................
00000020: 0003 0019 0a00 0300 1a0a 0003 001b 0a00  ................
00000030: 1c00 1d07 001e 0700 1f01 0006 3c69 6e69  ............<ini
00000040: 743e 0100 0328 2956 0100 0443 6f64 6501  t>...()V...Code.
00000050: 000f 4c69 6e65 4e75 6d62 6572 5461 626c  ..LineNumberTabl
00000060: 6501 0004 6d61 696e 0100 1628 5b4c 6a61  e...main...([Lja
00000070: 7661 2f6c 616e 672f 5374 7269 6e67 3b29  va/lang/String;)
00000080: 5601 000a 536f 7572 6365 4669 6c65 0100  V...SourceFile..
00000090: 0979 6561 722e 6a61 7661 0c00 0c00 0d07  .year.java......
000000a0: 0020 0c00 2100 2201 0017 6a61 7661 2f6c  . ..!."...java/l
000000b0: 616e 672f 5374 7269 6e67 4275 696c 6465  ang/StringBuilde
000000c0: 7201 000d 4865 6c6c 6f2c 2057 6f72 6c64  r...Hello, World
000000d0: 200c 0023 0024 0c00 2300 250c 0026 0027   ..#.$..#.%..&.'
000000e0: 0700 280c 0029 002a 0100 0479 6561 7201  ..(..).*...year.
000000f0: 0010 6a61 7661 2f6c 616e 672f 4f62 6a65  ..java/lang/Obje
00000100: 6374 0100 106a 6176 612f 6c61 6e67 2f53  ct...java/lang/S
00000110: 7973 7465 6d01 0003 6f75 7401 0015 4c6a  ystem...out...Lj
00000120: 6176 612f 696f 2f50 7269 6e74 5374 7265  ava/io/PrintStre
00000130: 616d 3b01 0006 6170 7065 6e64 0100 2d28  am;...append..-(
00000140: 4c6a 6176 612f 6c61 6e67 2f53 7472 696e  Ljava/lang/Strin
00000150: 673b 294c 6a61 7661 2f6c 616e 672f 5374  g;)Ljava/lang/St
00000160: 7269 6e67 4275 696c 6465 723b 0100 1c28  ringBuilder;...(
00000170: 4929 4c6a 6176 612f 6c61 6e67 2f53 7472  I)Ljava/lang/Str
00000180: 696e 6742 7569 6c64 6572 3b01 0008 746f  ingBuilder;...to
00000190: 5374 7269 6e67 0100 1428 294c 6a61 7661  String...()Ljava
000001a0: 2f6c 616e 672f 5374 7269 6e67 3b01 0013  /lang/String;...
000001b0: 6a61 7661 2f69 6f2f 5072 696e 7453 7472  java/io/PrintStr
000001c0: 6561 6d01 0007 7072 696e 746c 6e01 0015  eam...println...
000001d0: 284c 6a61 7661 2f6c 616e 672f 5374 7269  (Ljava/lang/Stri
000001e0: 6e67 3b29 5600 2100 0a00 0b00 0000 0000  ng;)V.!.........
000001f0: 0200 0100 0c00 0d00 0100 0e00 0000 1d00  ................
00000200: 0100 0100 0000 052a b700 01b1 0000 0001  .......*........
00000210: 000f 0000 0006 0001 0000 0001 0009 0010  ................
00000220: 0011 0001 000e 0000 003e 0003 0002 0000  .........>......
00000230: 001e 1107 e83c b200 02bb 0003 59b7 0004  .....<......Y...
00000240: 1205 b600 061b b600 07b6 0008 b600 09b1  ................
00000250: 0000 0001 000f 0000 000e 0003 0000 0003  ................
00000260: 0004 0004 001d 0005 0001 0012 0000 0002  ................
00000270: 0013                                     ..

Only at this point does endianness become relevant. The line at offset 00000230 contains the three bytes 11 07 e8: the opcode 0x11 (sipush) followed by 2024 = 0x07E8 in big-endian order, corresponding to sipush 2024 and to year := 2_024.

As far as I understand, it is the backend (here the JVM) that needs the endianness, not V itself.
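As a minimal sketch of that last step, the two operand bytes 07 e8 can be produced from the value 2024 with shifts alone, without looking at how the host CPU lays the number out in memory. The helper name encode_u16_be is hypothetical and only for illustration; binary.big_endian_put_u16 is assumed to be available in V's encoding.binary module.

```v
import encoding.binary

// encode_u16_be is a hypothetical helper: it returns the two bytes of `v`
// with the most significant byte first (big-endian order).
fn encode_u16_be(v u16) []u8 {
	return [u8((v >> 8) & 0xFF), u8(v & 0xFF)]
}

fn main() {
	year := u16(2_024)

	// Manual encoding: the same two bytes that follow the sipush opcode.
	manual := encode_u16_be(year)
	assert manual == [u8(0x07), 0xE8]

	// The same result via the standard library (assumed API).
	mut buf := []u8{len: 2}
	binary.big_endian_put_u16(mut buf, year)
	assert buf == manual

	println(buf)
}
```

Both paths give the same bytes because the byte order is chosen by the encoding code, not by the machine that runs it.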

@martinvahi
Author

martinvahi commented Dec 19, 2024

@jorgeluismireles Thank You for the answer.

...
and is until here we see endianess is required
...

Suppose someone wants to implement some fast file hashing algorithm and starts to use speed hacks like "shift one bit left" to multiply by 2, then how can that be done without knowing the byte endianness and bit endianness of a multi-byte number? Thank You.
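One way to look at this question: shift operators work on the numeric value, not on the bytes as they are stored in memory, so x << 1 doubles an unsigned value (as long as no bits are shifted out) on little-endian and big-endian machines alike. Byte order only matters when the value is converted to or from a byte sequence, and that conversion can also be written with shifts. The sketch below is an illustration under those assumptions; to_bytes_be is a hypothetical helper name.

```v
// to_bytes_be is a hypothetical helper: it splits a u32 into four bytes,
// most significant byte first, using only shifts and masks on the value.
fn to_bytes_be(x u32) []u8 {
	return [u8((x >> 24) & 0xFF), u8((x >> 16) & 0xFF), u8((x >> 8) & 0xFF), u8(x & 0xFF)]
}

fn main() {
	x := u32(0x01020304)

	// A left shift by one bit doubles the value, independent of byte order.
	assert (x << 1) == (x * 2)
	assert (x << 1) == u32(0x02040608)

	// The resulting byte sequence is decided by the code, not by the CPU.
	assert to_bytes_be(x) == [u8(0x01), 0x02, 0x03, 0x04]
	println('ok')
}
```

A hashing routine written in this style reads its input bytes, combines them with shifts, and writes its output bytes in an order it defines itself, so it should not need to know the host endianness.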

@jorgeluismireles

Just to clarify, V has compile-time $if and $else to detect the platform endianness:

// hton16 converts the 16 bit value `host` to the net format (htons)
pub fn hton16(host u16) u16 {
	$if little_endian {
		return reverse_bytes_u16(host)
	} $else {
		return host
	}
}
...
// reverse_bytes_u16 reverse a u16's byte order
@[inline]
pub fn reverse_bytes_u16(a u16) u16 {
	// vfmt off
	return ((a >> 8) & 0x00FF) |
		   ((a << 8) & 0xFF00)
	// vfmt on
}

And helper functions:

import encoding.binary

fn main() {
	million := u64(1_000_000)
	println('le million: ${binary.little_endian_get_u64(million)}')
	println('bg million: ${binary.big_endian_get_u64(million)}')
}

Output:

le million: [64, 66, 15, 0, 0, 0, 0, 0]
bg million: [0, 0, 0, 0, 0, 15, 66, 64]
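To connect the two outputs above: the little-endian and big-endian encodings are byte-wise mirror images of each other, and decoding each with the matching function should recover the original number. This is a small sketch; binary.little_endian_u64 and binary.big_endian_u64 are assumed to be the decoding counterparts in encoding.binary.

```v
import encoding.binary

fn main() {
	million := u64(1_000_000)

	le := binary.little_endian_get_u64(million)
	be := binary.big_endian_get_u64(million)

	// The two encodings contain the same bytes in opposite order.
	assert le == be.reverse()

	// Decoding with the matching function recovers the original value
	// (assumed API: little_endian_u64 / big_endian_u64).
	assert binary.little_endian_u64(le) == million
	assert binary.big_endian_u64(be) == million
}
```

Code that always goes through such helpers when reading or writing bytes behaves the same regardless of the host byte order.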

Please post code or pseudocode of your idea so that we can better understand your problem.

@martinvahi
Author

I think that the compile-time endianness detection will do.
Thank You for the answers and code examples.
