Hprose Specification
Hprose was created as a lightweight, self-describing, semi-text, compact, dynamically typed, language-neutral, platform-neutral protocol.
The hprose protocol has the following design goals:
- It must be self-describing and must not require IDL or external schema definitions.
- It must be human-readable, apart from raw binary data, even if reading it takes a little effort.
- It must be as compact as possible.
- It must be as fast as possible.
- It must be language-independent.
- It must be platform-neutral.
- It must support recursive types.
- It must support unicode strings.
- It must support 8-bit binary data without escaping or using attachments.
Hprose serialization has 7 value types:
- Integer (32-bit signed int)
- Long (unlimited long integer)
- Double (float, double or decimal)
- Boolean
- UTF8 char (16-bit unicode char, UTF8 format, 1-3 bytes)
- Null
- Empty (empty string, empty binary data)
Tags are case-sensitive, and serialized data is whitespace-sensitive.
If the integer n satisfies 0 ≤ n ≤ 9, it is serialized as the single octet 0x30 + n.
Any other integer is represented like this:
i<n>;
The tag i represents the beginning of the integer, and the tag ; represents the end of the integer. <n> is the string representation of n.
For example:
0 # int 0
8 # int 8
i1234567; # int 1234567
i-128; # int -128
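The integer rule above can be sketched in Python as follows. The helper name `encode_int` is hypothetical and does not come from any hprose library; the 32-bit range check is an assumption based on the value type list.

```python
def encode_int(n: int) -> bytes:
    """Serialize a 32-bit signed integer using the integer rule above."""
    if not -2**31 <= n <= 2**31 - 1:
        raise ValueError("outside the 32-bit range; use the long form instead")
    if 0 <= n <= 9:
        # A single digit is written as one octet: 0x30 + n.
        return bytes([0x30 + n])
    # Any other value: tag 'i', decimal text, end tag ';'.
    return b"i" + str(n).encode("ascii") + b";"
```

For instance, `encode_int(8)` yields the single octet `8`, while `encode_int(-128)` yields `i-128;`.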
The long integer n is represented like this:
l<n>;
The tag l represents the beginning of the long integer, and the tag ; represents the end of the long integer. <n> is the string representation of n.
For example:
l1234567890987654321; # long 1234567890987654321
l-987654321234567890; # long -987654321234567890
The double type has three special values: NaN, positive infinity and negative infinity.
NaN is represented by the octet 'N'.
Positive infinity is represented by the two octets 'I+'.
Negative infinity is represented by the two octets 'I-'.
Any other double n is represented like this:
d<n>;
The tag d represents the beginning of the double, and the tag ; represents the end of the double. <n> is the string representation of n.
For example:
N # NaN
I+ # Infinity
I- # -Infinity
d3.1415926535898; # double 3.1415926535898
d-0.1; # double -0.1
d-1.45E23; # double -1.45E23
d3.76e-54; # double 3.76e-54
The exponent symbol e is case-insensitive.
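A minimal Python sketch of the double rule, including the three special values, might look like this. The name `encode_double` is hypothetical, and using `repr()` for the text form is an assumption; any decimal text that parses back to the same value would serve.

```python
import math

def encode_double(x: float) -> bytes:
    """Serialize a float using the double rule above."""
    if math.isnan(x):
        return b"N"
    if math.isinf(x):
        return b"I+" if x > 0 else b"I-"
    # repr() produces the shortest text that round-trips. It may use a
    # lowercase 'e' exponent, which is allowed because 'e' is case-insensitive.
    return b"d" + repr(x).encode("ascii") + b";"
```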
The octet 't' represents true, and the octet 'f' represents false.
t # true
f # false
Many languages, such as Java and C#, have a 16-bit Unicode char type, and UTF8 char is used to store values of that type. It therefore cannot store every Unicode code point. If a code point needs two chars to store, it is serialized as a String.
The char <c> is represented like this:
u<c>
The tag u represents the beginning of the UTF8 char. There is no end tag, because a UTF8 char is self-describing. <c> is the UTF-8 encoded char. For example:
uA # 'A' is stored as 1 octet 0x41
u½ # '½' is stored as 2 octets 0xC2 0xBD
u∞ # '∞' is stored as 3 octets 0xE2 0x88 0x9E
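The following Python sketch illustrates the char rule, including the fallback to a String when a code point needs two 16-bit chars. The function name `encode_char` is hypothetical. The fallback writes the string with length 2, following the rule that string length is counted in 16-bit characters.

```python
def encode_char(ch: str) -> bytes:
    """Serialize one code point as a UTF8 char, or as a string if it is too large."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    utf8 = ch.encode("utf-8")
    if ord(ch) <= 0xFFFF:
        # Fits in one 16-bit char: tag 'u' followed by 1 to 3 UTF-8 octets.
        return b"u" + utf8
    # Needs two 16-bit chars (a surrogate pair), so write a string of length 2.
    return b's2"' + utf8 + b'"'
```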
Null represents a null pointer or null object. The octet 'n' represents the null value.
n # null
Empty represents an empty string or empty binary data. The octet 'e' represents the empty value.
e # empty
DateTime is a reference type in hprose, although it may be a value type in some programming languages. This does not cause any problems. The difference between reference types and value types is discussed in the Ref section.
A DateTime can represent local or UTC time. Local time ends with the tag ; and UTC time ends with the tag Z.
DateTime data can contain the year, month, day, hour, minute, second, millisecond, microsecond and nanosecond, but not all of them are required. It can contain only the year, month and day to represent a date, or only the hour, minute and second to represent a time.
The tag D represents the beginning of the date, and the tag T represents the beginning of the time.
For example:
D20121229; # local date 2012-12-29
D20121225Z # utc date 2012-12-25
T032159; # local time 03:21:59
T182343.654Z # utc time 18:23:43.654
D20121221T151435Z # utc datetime 2012-12-21 15:14:35
D20501228T134359.324543123; # local datetime 2050-12-28 13:43:59.324543123
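As a sketch of the combined date and time form, the Python function below serializes a `datetime`. The name `encode_datetime` is hypothetical. Two assumptions are made here: naive values are treated as local time, and timezone-aware values are converted to UTC first. Python's `datetime` has no nanosecond field, so only millisecond and microsecond fractions are produced.

```python
from datetime import datetime, timezone

def encode_datetime(dt: datetime) -> bytes:
    """Serialize a datetime as D<date>T<time> ending in ';' (local) or 'Z' (UTC)."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)  # assumption: aware values are sent as UTC
        end = "Z"
    else:
        end = ";"                         # assumption: naive values mean local time
    text = (f"D{dt.year:04d}{dt.month:02d}{dt.day:02d}"
            f"T{dt.hour:02d}{dt.minute:02d}{dt.second:02d}")
    if dt.microsecond:
        if dt.microsecond % 1000 == 0:
            text += f".{dt.microsecond // 1000:03d}"  # 3-digit milliseconds
        else:
            text += f".{dt.microsecond:06d}"          # 6-digit microseconds
    return (text + end).encode("ascii")
```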
The Bytes type represents binary data. It corresponds to byte[] or a stream object in programming languages.
The max length of bytes is 2147483647.
The binary data bytes (assuming its length is len) is represented like this:
b<len>"<bytes>"
The tag b represents the beginning of binary data. If <len> is 0, <len> can be omitted. <bytes> is the raw binary data. The tag " marks the beginning and the end of the binary data.
For example:
b"" # empty binary data
b10"!@#$%^&*()" # byte[10] { '!', '@', '#', '$', '%', '^', '&', '*', '(', ')' }
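Because the length comes before the data, a reader never has to scan the payload for the closing quote, which is why no escaping is needed. A small Python sketch with the hypothetical helpers `encode_bytes` and `decode_bytes` shows this:

```python
def encode_bytes(data: bytes) -> bytes:
    """Serialize raw binary data; the length is omitted when it is 0."""
    if len(data) > 2147483647:
        raise ValueError("binary data too long")
    length = str(len(data)).encode("ascii") if data else b""
    return b"b" + length + b'"' + data + b'"'

def decode_bytes(buf: bytes) -> bytes:
    """Read binary data that starts at the beginning of buf."""
    if buf[:1] != b"b":
        raise ValueError("not a bytes value")
    quote = buf.index(b'"')          # opening quote, right after the length
    n = int(buf[1:quote] or b"0")    # an omitted length means 0
    return buf[quote + 1:quote + 1 + n]
```

A payload that itself contains a `"` octet survives a round trip unchanged, since the reader takes exactly `n` octets after the opening quote.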
The String type represents a Unicode character string or character array, encoded in UTF-8.
The max length of a string is 2147483647. The length counts neither bytes nor Unicode code points; it is the number of 16-bit Unicode characters (UTF-16 code units).
The string str (assuming its length is len) is represented like this:
s<len>"<str>"
The tag s represents the beginning of the string. If <len> is 0, <len> can be omitted. <str> is the UTF-8 encoded string. The tag " marks the beginning and the end of the string.
For example:
s"" # empty string
s12"Hello world!" # string 'Hello world!'
s2"你好" # string '你好'
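The length rule is easy to get wrong, so here is a short Python sketch. The names `utf16_length` and `encode_string` are hypothetical. The length is computed from the UTF-16 form of the text, while the payload itself is written as UTF-8.

```python
def utf16_length(s: str) -> int:
    """Number of 16-bit units needed for s (code points above U+FFFF count as 2)."""
    return len(s.encode("utf-16-le")) // 2

def encode_string(s: str) -> bytes:
    """Serialize a string: tag 's', length in 16-bit units, UTF-8 payload in quotes."""
    n = utf16_length(s)
    length = str(n).encode("ascii") if n else b""  # the length is omitted when 0
    return b"s" + length + b'"' + s.encode("utf-8") + b'"'
```

Note that `'你好'` has length 2 even though its UTF-8 payload is 6 octets long.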
GUID is short for Globally Unique Identifier. It corresponds to GUID or UUID in programming languages.
A GUID is a 128-bit value. It is represented like this:
g{<GUID>}
The tag g represents the beginning of the GUID. The tags { and } mark the beginning and the end of the GUID. <GUID> is a string formatted as 32 hexadecimal digits with groups separated by hyphens, such as AFA7F4B1-A64D-46FA-886F-ED7FBCE569B6.
For example:
g{AFA7F4B1-A64D-46FA-886F-ED7FBCE569B6} # GUID 'AFA7F4B1-A64D-46FA-886F-ED7FBCE569B6'
GUID string is case-insensitive.
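In Python the standard `uuid` module already produces and parses the hyphenated form, so a sketch of the GUID rule is short. The helper names `encode_guid` and `decode_guid` are hypothetical; writing upper case is only a choice here, since the GUID string is case-insensitive.

```python
import uuid

def encode_guid(value: uuid.UUID) -> bytes:
    """Serialize a UUID as g{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}."""
    return b"g{" + str(value).upper().encode("ascii") + b"}"

def decode_guid(buf: bytes) -> uuid.UUID:
    """Parse a GUID that starts at the beginning of buf."""
    # The text between the braces is always 36 characters long.
    if buf[:2] != b"g{" or buf[38:39] != b"}":
        raise ValueError("not a GUID value")
    return uuid.UUID(buf[2:38].decode("ascii"))
```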
List is a recursive reference type. It corresponds to an array, list, set or collection in programming languages.
It can contain 0 or more elements. Each element can be any valid hprose type.
The max count of list elements is 2147483647.
A list with n elements is represented like this:
a<n>{<element_1><element_2><element_3>...<element_n>}
The tag a represents the beginning of the list. If <n> is 0, <n> can be omitted. The tags { and } mark the beginning and the end of the list. <element_i> is the serialized data of each element. There is no delimiter between elements, because the serialized data is self-describing.
For example:
a{} # 0 element array
a10{0123456789} # int[10] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
a7{s3"Mon"s3"Tue"s3"Wed"s3"Thu"s3"Fri"s3"Sat"s3"Sun"} # string[7] {'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'}
a3{a3{123}a3{456}a3{789}} # int[][] { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} }
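The recursive structure of a list can be sketched with a small Python encoder. The function name `encode` is hypothetical, only booleans, 32-bit integers, strings and nested lists are handled, and reference tracking is deliberately left out, so this is an illustration of the list layout rather than a complete writer.

```python
def encode(value) -> bytes:
    """Serialize booleans, small integers, strings and nested lists (no refs)."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return b"t" if value else b"f"
    if isinstance(value, int):
        if 0 <= value <= 9:
            return bytes([0x30 + value])
        return b"i" + str(value).encode("ascii") + b";"
    if isinstance(value, str):
        n = len(value.encode("utf-16-le")) // 2
        length = str(n).encode("ascii") if n else b""
        return b"s" + length + b'"' + value.encode("utf-8") + b'"'
    if isinstance(value, list):
        count = str(len(value)).encode("ascii") if value else b""
        # Elements are simply concatenated; no delimiter is needed.
        return b"a" + count + b"{" + b"".join(encode(v) for v in value) + b"}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```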
Map is another recursive reference type. It corresponds to a map, hashtable, dictionary or dynamic object in programming languages.
It can contain 0 or more key-value pairs. Each key and each value can be any valid hprose type.
The max count of map key-value pairs is 2147483647.
A map with n key-value pairs is represented like this:
m<n>{<key_1><value_1><key_2><value_2>...<key_n><value_n>}
The tag m represents the beginning of the map. If <n> is 0, <n> can be omitted. The tags { and } mark the beginning and the end of the map. <key_i> is the serialized data of each key, and <value_i> is the serialized data of each value. There is no delimiter between keys, values or key-value pairs, because the serialized data is self-describing.
For example:
m{} # 0 key-value pairs map
m2{s4"name"s5"Tommy"s3"age"i24;} # { "name": "Tommy", "age": 24 }
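To show why no delimiters are needed, here is a sketch of a reader in Python. The function name `parse` is hypothetical, and it only understands integers, strings, lists and maps (no refs, no other types). Each value tells the reader where it ends, so the reader can walk keys and values back to back.

```python
def parse(buf: bytes, pos: int = 0):
    """Parse one value starting at pos and return (value, next_pos)."""
    tag = buf[pos:pos + 1]
    if tag.isdigit():                        # single-digit integer
        return int(tag), pos + 1
    if tag == b"i":
        end = buf.index(b";", pos)
        return int(buf[pos + 1:end]), end + 1
    if tag == b"s":
        quote = buf.index(b'"', pos)
        n = int(buf[pos + 1:quote] or b"0")  # length in 16-bit units
        i, units = quote + 1, 0
        while units < n:
            lead = buf[i]
            size = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
            units += 2 if size == 4 else 1   # 4-octet sequences need two 16-bit units
            i += size
        return buf[quote + 1:i].decode("utf-8"), i + 1  # skip the closing quote
    if tag in (b"a", b"m"):
        brace = buf.index(b"{", pos)
        n = int(buf[pos + 1:brace] or b"0")  # an omitted count means 0
        pos = brace + 1
        if tag == b"a":
            items = []
            for _ in range(n):
                item, pos = parse(buf, pos)
                items.append(item)
            return items, pos + 1            # skip the closing brace
        result = {}
        for _ in range(n):
            key, pos = parse(buf, pos)
            value, pos = parse(buf, pos)
            result[key] = value
        return result, pos + 1
    raise ValueError(f"unsupported tag {tag!r} at offset {pos}")
```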
An object is similar to a map, but its keys must be strings and it has a fixed structure. This fixed structure is called a class. A class corresponds to a struct, class or object prototype in programming languages.
Object serialization is more complex than map serialization. It is separated into two parts:
- Class serialization
- Object instance serialization
Every class has an integer reference, which object instance serialization uses to refer to the class.
Class integer references start from 0. That is, when objects belonging to several different classes are serialized, the first serialized class is 0, the second is 1, and so on.
The form of class serialization is below:
c<type_name_length>"<type_name_string>"<field_count>{<field_1_name><field_2_name>...<field_n_name>}
The form of object instance serialization is below:
o<class_integer_reference>{<field_1_value><field_2_value>...<field_n_value>}
For example:
class Person {
    String name;
    int age;
    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
Person[] users = new Person[2];
users[0] = new Person("Tommy", 24);
users[1] = new Person("Jerry", 19);
The serialization data for users is:
a2{c6"Person"2{s4"name"s3"age"}o0{s5"Tommy"i24;}o0{s5"Jerry"i19;}}
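The two-part layout can be sketched in Python with a writer that remembers which classes it has already emitted. `ObjectWriter` and `_value` are hypothetical names. The sketch assumes ASCII names, string or integer field values, and no reference tracking for the values.

```python
def _value(v) -> bytes:
    """Encode a string or integer field value (ASCII only, no refs)."""
    if isinstance(v, str):
        return b's%d"%s"' % (len(v), v.encode("ascii"))
    if 0 <= v <= 9:
        return bytes([0x30 + v])
    return b"i%d;" % v

class ObjectWriter:
    def __init__(self):
        self.class_refs = {}  # class name -> class integer reference

    def write(self, class_name: str, fields: dict) -> bytes:
        """Emit the class definition on first use, then the object instance."""
        out = b""
        if class_name not in self.class_refs:
            # The first class written gets reference 0, the next 1, and so on.
            self.class_refs[class_name] = len(self.class_refs)
            names = b"".join(_value(name) for name in fields)
            out += b'c%d"%s"%d{%s}' % (
                len(class_name), class_name.encode("ascii"), len(fields), names)
        values = b"".join(_value(v) for v in fields.values())
        out += b"o%d{%s}" % (self.class_refs[class_name], values)
        return out
```

Writing two `Person` objects with the same writer emits the class once and then two `o0{...}` instances, matching the serialization shown above.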
JSON is a lightweight data format, but it cannot represent the following data:
var list = [];
list[0] = list;
This is because JSON does not support references. Hprose can serialize it like this:
a1{r0;}
Every piece of reference type data has an integer reference, which is independent of the class integer references.
Integer references for reference data start from 0. When the same data is serialized again, it is serialized as a ref.
Note that the ref itself does not have an integer reference.
The ref data is represented like this:
r<integer_reference>;
The tag r represents the beginning of the ref, and the tag ; represents the end of the ref.
For example:
var list = [
    { "name": "Tommy", "age" : 24 },
    { "name": "Jerry", "age" : 18 }
];
The serialization data for list is:
a2{m2{s4"name"s5"Tommy"s3"age"i24;}m2{r2;s5"Jerry"r4;i18;}}
Another example:
var a = [];
var b = [];
a[0] = a;
a[1] = b;
b[0] = a;
b[1] = b;
var c = [a,b];
The serialization data for c is:
a2{a2{r1;a2{r1;r2;}}r2;}
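The numbering in these examples can be reproduced with a small Python writer that assigns a reference to each list, map and string the first time it is written. `RefWriter` is a hypothetical name. Two assumptions are made: strings are matched by value and containers by identity, and edge cases such as empty strings are not modeled.

```python
class RefWriter:
    def __init__(self):
        self.refs = {}   # lookup key -> integer reference
        self.keep = []   # keeps containers alive so their id() values stay unique

    def _seen(self, key):
        """Return b'r<n>;' if key was written before; otherwise register it."""
        if key in self.refs:
            return b"r%d;" % self.refs[key]
        self.refs[key] = len(self.refs)
        return None

    def write(self, v) -> bytes:
        if isinstance(v, int):                 # integers are value types: no reference
            return bytes([0x30 + v]) if 0 <= v <= 9 else b"i%d;" % v
        if isinstance(v, str):
            ref = self._seen(("s", v))
            if ref:
                return ref
            n = len(v.encode("utf-16-le")) // 2
            return b"s" + self._count(n) + b'"' + v.encode("utf-8") + b'"'
        if isinstance(v, (list, dict)):
            # Register the container before its children so it can refer to itself.
            ref = self._seen(("o", id(v)))
            if ref:
                return ref
            self.keep.append(v)
            if isinstance(v, list):
                body = b"".join(self.write(x) for x in v)
                return b"a" + self._count(len(v)) + b"{" + body + b"}"
            body = b"".join(self.write(k) + self.write(x) for k, x in v.items())
            return b"m" + self._count(len(v)) + b"{" + body + b"}"
        raise TypeError(f"unsupported type: {type(v).__name__}")

    @staticmethod
    def _count(n: int) -> bytes:
        return b"%d" % n if n else b""         # a count of 0 is omitted
```

In the second example above, the outer list takes reference 0, the first map 1, and the strings "name", "Tommy" and "age" take 2, 3 and 4, which is why the second map writes `r2;` and `r4;` for its keys.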
The hprose RPC protocol supports multicall, passing arguments by reference, and both synchronous and asynchronous communication.
The hprose RPC protocol is very simple. It consists of a few segments of data, each of which contains at least a 1-octet tag. The hprose RPC protocol tags are:
- 0x46('F'): Function List
- 0x43('C'): RPC Call
- 0x52('R'): RPC Result
- 0x41('A'): RPC Arguments
- 0x45('E'): RPC Error
- 0x7A('z'): End
The RPC server can publish one or more functions/methods. Each function/method has a name. The name is expressed as a string. The function list is a list of the function/method names.
If the client sends only the End tag 'z' with no other data, the server should return the function list.
The hprose RPC protocol is dynamic and weakly typed. It supports dynamically typed arguments and functions/methods with variable-length argument lists. The function list is therefore only a list of function/method names and carries no argument information.
For example, if the server publishes two functions, 'hello' and 'MD5', the function list is returned in the form below:
Fa2{s5"hello"s3"MD5"}z
An hprose RPC server supports publishing a handler for missing functions/methods. That is, if the client invokes a function/method that does not exist on the server, the call can be processed by a unified handler. If the server sets up such a handler, the function list should contain a '*' item to represent it.
For example:
Fa1{s1"*"}z
The named functions/methods and the missing functions/methods unified handler can be published together.
For example:
Fa3{s1"*"s5"hello"s3"MD5"}z
An hprose client may omit support for the function list, but the server must implement it.
An RPC call is initiated by the client, for example:
Cs5"hello"a1{s5"world"}z # result = client.hello("world");
You can pass zero or more arguments, for example:
Cs3"sum"a3{012}z # result = client.sum(0, 1, 2);
If there are no arguments, the argument list can be omitted, for example:
Cs9"deleteAll"z # client.deleteAll();
You can also pass arguments by reference, for example:
Cs4"sort"a1{a10{2465318790}}tz # void Sort(ref int[] a); // defined in server
# int[] a = new int[] {2, 4, 6, 5, 3, 1, 8, 7, 9, 0};
# client.sort(ref a);
Function/method names published by hprose are case-insensitive. In the example above, the function/method is defined on the server as "Sort", but the client can invoke it as 'sort', 'Sort' or 'SORT'.
After the tag 'C', the tag 's' introduces the name string, the tag 'a' introduces the argument list, and the tag 't' indicates that the arguments are passed by reference. Do NOT use the tag 'f' to indicate passing by value, because that is the default behavior. The tag 'z' represents the end of the call.
The function/method name and the argument list are serialized independently, so each has its own integer references.
Hprose supports multicall, for example:
Cs5"hello"a1{s5"world"}Cs3"sum"a3{012}z # client.beginBatch();
# client.hello("world");
# client.sum(0, 1, 2);
# results = client.endBatch();
Multicall is an optional feature of the hprose client; a client does not have to implement it.
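The call layout described above can be sketched in Python as follows. The names `encode_call`, `encode_request` and `_enc` are hypothetical, the argument encoder only covers integers, strings and lists, and reference tracking is omitted.

```python
def _enc(v) -> bytes:
    """Encode an int, str or list argument (no reference tracking)."""
    if isinstance(v, int):
        return bytes([0x30 + v]) if 0 <= v <= 9 else b"i%d;" % v
    if isinstance(v, str):
        n = len(v.encode("utf-16-le")) // 2
        return b"s" + (b"%d" % n if n else b"") + b'"' + v.encode("utf-8") + b'"'
    if isinstance(v, list):
        return (b"a" + (b"%d" % len(v) if v else b"") + b"{"
                + b"".join(_enc(x) for x in v) + b"}")
    raise TypeError(f"unsupported type: {type(v).__name__}")

def encode_call(name: str, args=(), by_ref: bool = False) -> bytes:
    """Build one 'C' segment: name, optional argument list, optional 't'."""
    out = b"C" + _enc(name)
    if args or by_ref:
        out += _enc(list(args))
    if by_ref:
        out += b"t"      # by value is the default, so 'f' is never written
    return out

def encode_request(*calls: bytes) -> bytes:
    """Join one or more call segments (a multicall) and end with 'z'."""
    return b"".join(calls) + b"z"
```

A multicall is simply several call segments followed by a single End tag, for example `encode_request(encode_call("hello", ["world"]), encode_call("sum", [0, 1, 2]))`.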
The RPC reply is returned by the server, for example:
string hello(string str) {
    return "Hello " + str + "!";
}
int sum(int a, int b, int c) {
    return a + b + c;
}
Cs5"hello"a1{s5"world"}z # client call
Rs12"Hello world!"z # server reply
Cs3"sum"a3{012}z # client call
R3z # server reply
The return value can be any type supported by hprose.
If the published function/method has no return value, the return value is Null.
If the arguments were passed by reference, the returned argument list follows the return value, for example:
void Sort(ref int[] a) {
    Array.Sort(a);
}
Cs4"sort"a1{a10{2465318790}}tz # client call
RnAa1{a10{0123456789}}z # server reply
The serialized result follows the tag 'R', and the serialized argument list follows the tag 'A'. The reply ends with the tag 'z'.
The function/method result and the argument list are serialized independently, so each has its own integer references.
In the example above, the argument of the server-side function does not need to be declared as a ref argument; in fact, Array.Sort can be published directly. However, that concerns the protocol implementation, so it is not discussed further here.
For a multicall reply, here is an example:
Cs5"hello"a1{s5"world"}Cs3"sum"a3{012}z # client multicall
Rs12"Hello world!"R3z # server reply
If an error occurs during the execution of the remote function/method, the hprose server should return the error message in the form below:
E<error_message>z
<error_message> is the serialized error message string, for example:
void errorExample() {
    throw new Exception("This is a error example.");
}
Cs12"errorExample"z # client call
Es24"This is a error example."z # server reply
If an error occurs during a multicall, the multicall can be interrupted, for example:
Cs5"hello"a1{s5"world"}Cs12"errorExample"Cs3"sum"a3{012}z # client multicall
Rs12"Hello world!"Es24"This is a error example."z # server reply
or it can continue, for example:
Cs5"hello"a1{s5"world"}Cs12"errorExample"Cs3"sum"a3{012}z # client multicall
Rs12"Hello world!"Es24"This is a error example."R3z # server reply
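The two multicall error behaviors can be sketched on the server side in Python. The names `encode_reply` and `_enc` are hypothetical. Each outcome is either a return value or an `Exception`, the `stop_on_error` flag is an illustrative switch between the two behaviors, and returned by-reference arguments (the 'A' segment) are not covered.

```python
def _enc(v) -> bytes:
    """Encode None, an int or a str result (no reference tracking)."""
    if v is None:
        return b"n"      # a function with no return value replies with Null
    if isinstance(v, int):
        return bytes([0x30 + v]) if 0 <= v <= 9 else b"i%d;" % v
    if isinstance(v, str):
        n = len(v.encode("utf-16-le")) // 2
        return b"s" + (b"%d" % n if n else b"") + b'"' + v.encode("utf-8") + b'"'
    raise TypeError(f"unsupported type: {type(v).__name__}")

def encode_reply(outcomes, stop_on_error: bool = True) -> bytes:
    """Build a reply: 'R' per result, 'E' per error, then the End tag 'z'."""
    out = b""
    for item in outcomes:
        if isinstance(item, Exception):
            out += b"E" + _enc(str(item))
            if stop_on_error:
                break    # interrupted multicall: later calls get no segment
        else:
            out += b"R" + _enc(item)
    return out + b"z"
```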
Hprose RPC communication can use any underlying network protocol for transport, such as HTTP, TCP or UNIX socket.
When hprose RPC works over HTTP, the hprose RPC data is sent as the body of a POST request and returned as the body of the response.
Hprose RPC has no special requirements for HTTP headers.
Hprose RPC can also work over TCP or UNIX sockets. The length of the hprose RPC data is sent as a 4-byte big-endian head. The hprose data is sent as the body without any further encoding. For example:
<0x00,0x00,0x00,0x18>Cs5"hello"a1{s5"world"}z # result = client.hello("world");
<0x00,0x00,0x00,0x18> is the 4-byte big-endian encoding of the integer 24.
The response uses the same format. The top bit of the first byte is always `0`, which means the max length of the body is 2^31 - 1.
This head makes it quick and easy to split socket data into packets.
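A minimal Python sketch of this framing, using the standard `struct` module, is shown here. The names `frame` and `split_frames` are hypothetical. `split_frames` works on bytes already received from a socket and returns whatever is left over when the last packet is incomplete.

```python
import struct

def frame(body: bytes) -> bytes:
    """Prefix the body with its length as a 4-byte big-endian head."""
    if len(body) > 0x7FFFFFFF:
        raise ValueError("body longer than 2^31 - 1 octets")
    return struct.pack(">I", len(body)) + body

def split_frames(buffer: bytes):
    """Return (list of complete bodies, leftover bytes) from received data."""
    bodies = []
    pos = 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)
        if len(buffer) - pos - 4 < length:
            break                      # the body has not fully arrived yet
        bodies.append(buffer[pos + 4:pos + 4 + length])
        pos += 4 + length
    return bodies, buffer[pos:]
```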
The Half Duplex Socket Binding is easy to implement, but it does not use connections very efficiently. A Full Duplex Socket Binding is therefore supported as well. The length of the hprose RPC data is again sent as a 4-byte big-endian head, but the top bit of the first byte is always `1`. This bit is not the sign of the length, because the length is always a positive integer; it only distinguishes half duplex from full duplex. The length is followed by 4 bytes containing the request id. These 8 bytes form the head. The hprose data is sent as the body without any further encoding. For example:
<0x80,0x00,0x00,0x18><0x00,0x00,0x00,0x00>Cs5"hello"a1{s5"world"}z # client call
<0x80,0x00,0x00,0x13><0x00,0x00,0x00,0x00>Rs12"Hello world!"z # server reply
<0x80,0x00,0x00,0x10><0x00,0x00,0x00,0x01>Cs3"sum"a3{012}z # client call
<0x80,0x00,0x00,0x18><0x00,0x00,0x00,0x02>Cs5"hello"a1{s5"world"}z # client call
<0x80,0x00,0x00,0x13><0x00,0x00,0x00,0x02>Rs12"Hello world!"z # server reply
<0x80,0x00,0x00,0x03><0x00,0x00,0x00,0x01>R3z # server reply
<0x80,0x00,0x00,0x18> is the 4-byte big-endian encoding of the integer 24 with the top bit set to 1.
<0x00,0x00,0x00,0x00>, <0x00,0x00,0x00,0x01> and <0x00,0x00,0x00,0x02> are the request ids. The client uses them to match each response to its request.
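The 8-byte head can be sketched in Python like this. `frame_full_duplex`, `parse_head` and `FULL_DUPLEX_FLAG` are hypothetical names. Treating the request id as an unsigned big-endian integer is an assumption; the text only requires that the id be 4 bytes.

```python
import struct

FULL_DUPLEX_FLAG = 0x80000000  # top bit of the first head byte

def frame_full_duplex(request_id: int, body: bytes) -> bytes:
    """Build the 8-byte head (flagged length, request id) followed by the body."""
    if len(body) > 0x7FFFFFFF:
        raise ValueError("body longer than 2^31 - 1 octets")
    return struct.pack(">II", FULL_DUPLEX_FLAG | len(body), request_id) + body

def parse_head(head: bytes):
    """Return (is_full_duplex, body_length, request_id) from an 8-byte head."""
    word, request_id = struct.unpack(">II", head[:8])
    return bool(word & FULL_DUPLEX_FLAG), word & 0x7FFFFFFF, request_id
```

Because every reply carries the id of its request, a client can keep a table of pending requests keyed by id and accept replies in any order, as in the exchange above where request 2 is answered before request 1.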
Hprose RPC can also work over WebSocket. The hprose RPC data is sent and received as binary data.
Hprose RPC adds a 4-byte head containing the request id. The server does not need to care how the request id is encoded; it only needs to repeat it in the response.
For example:
<0x00,0x00,0x00,0x00>Cs5"hello"a1{s5"world"}z # client call
<0x00,0x00,0x00,0x00>Rs12"Hello world!"z # server reply
<0x00,0x00,0x00,0x01>Cs3"sum"a3{012}z # client call
<0x00,0x00,0x00,0x02>Cs5"hello"a1{s5"world"}z # client call
<0x00,0x00,0x00,0x02>Rs12"Hello world!"z # server reply
<0x00,0x00,0x00,0x01>R3z # server reply
<0x00,0x00,0x00,0x00>, <0x00,0x00,0x00,0x01> and <0x00,0x00,0x00,0x02> are the request ids. The client uses them to match each response to its request.
There are no other special requirements.
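On the server side, the WebSocket binding reduces to copying the first four bytes of each binary message into the response. A sketch in Python follows, where `handle_message` is a hypothetical name and `handler` stands for whatever function turns an hprose request body into a reply body.

```python
def handle_message(message: bytes, handler) -> bytes:
    """Echo the 4-byte request id and append the reply produced by handler."""
    if len(message) < 4:
        raise ValueError("message shorter than the 4-byte request id")
    request_id, body = message[:4], message[4:]
    # The id is opaque to the server: it is repeated as-is, never decoded.
    return request_id + handler(body)
```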
ABNF:
serialize-data = integer / long / double / nan / positive-infinity / negative-infinity / true / false / null / empty / utf8-char / bytes / string / datetime / guid / list / map / object / ref
UINT = "0" / %x31-39 *DIGIT
SINT = *("+" / "-") UINT
integer = DIGIT / %x69 SINT ";"
long = %x6C SINT ";"
double = %x64 SINT ["." 1*DIGIT ["E" SINT]] ";"
nan = %x4E
positive-infinity = %x49 "+"
negative-infinity = %x49 "-"
true = %x74
false = %x66
null = %x6E
empty = %x65
one-byte-utf8 = %x00-7F
two-byte-utf8 = %xC0-DF %x80-BF
three-byte-utf8 = %xE0-EF %x80-BF %x80-BF
four-byte-utf8 = %xF0-F7 %x80-BF %x80-BF %x80-BF
utf8 = one-byte-utf8 / two-byte-utf8 / three-byte-utf8 / four-byte-utf8
utf8-char = %x75 (one-byte-utf8 / two-byte-utf8 / three-byte-utf8)
bytes-length = UINT
bytes = %x62 DQUOTE DQUOTE / %x62 bytes-length DQUOTE <bytes-length>OCTET DQUOTE
string-length = UINT
string = %x73 DQUOTE DQUOTE / %x73 string-length DQUOTE <string-length>utf8 DQUOTE
year = 4DIGIT
month = "0" %x31-39 / "1" %x30-32
day = "0" %x31-39 / %x31-32 %x30-39 / "3" %x30-31
local = ";"
utc = %x5A
timezone = local / utc
date = %x44 year month day timezone
hour = %x30-31 DIGIT / "2" %x30-33
minute = %x30-35 DIGIT
second = %x30-35 DIGIT
millisecond = 3DIGIT
microsecond = 6DIGIT
nanosecond = 9DIGIT
time = %x54 hour minute second ["." (millisecond / microsecond / nanosecond)] timezone
datetime = date / time / %x44 year month day time
guid = %x67 "{" 8HEXDIG "-" 4HEXDIG "-" 4HEXDIG "-" 4HEXDIG "-" 12HEXDIG "}"
element = serialize-data
list-count = UINT
list = %x61 "{" "}" / %x61 list-count "{" <list-count>element "}"
key = serialize-data
value = serialize-data
keyvalue = key value
map-count = UINT
map = %x6D "{" "}" / %x6D map-count "{" <map-count>keyvalue "}"
classname-length = UINT
classname = classname-length DQUOTE <classname-length>utf8 DQUOTE
fieldname = string
field-count = UINT
class = %x63 classname field-count "{" <field-count>fieldname "}"
class-ref = UINT
fieldvalue = serialize-data
object = [class] %x6F class-ref "{" <field-count>fieldvalue "}"
ref = %x72 UINT ";"
string-list = %x61 list-count "{" <list-count>string "}"
end = %x7A
function-list = %x46 string-list end
function-name = string
function-arguments = list
passing-by-ref = true
rpc-call = 1*(%x43 function-name [[function-arguments] passing-by-ref]) end
rpc-result = %x52 serialize-data
rpc-arguments = %x41 list
rpc-error = %x45 string
rpc-reply = rpc-error end / 1*(rpc-result [rpc-arguments]) [rpc-error] end
Ma Bingyao <[email protected]>
© Copyright 2008-2015 Hprose.com. All Rights Reserved.
Any party may implement this protocol for any purpose under the terms of the MIT License, provided that the implementation conforms to this specification.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and these paragraphs are included on all such copies and derivative works.
THIS DOCUMENT AND THE INFORMATION CONTAINED HEREIN IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.