Before reading any element, the following characters are in many cases skipped: space, tab, newline, carriage return, comma and semicolon; they are essentially ignored as separators. Binary-serialized data should always start with a non-printable ascii character.
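For illustration, a reader could skip these separator characters with a small helper like the following (a minimal sketch in plain C++, not the actual PStream code; the function name is made up):

#include <istream>

// Hypothetical helper: consume the characters that the ascii format
// treats as ignorable separators before the next element.
void skip_blanks_and_separators(std::istream& in)
{
    int c = in.peek();
    while (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == ',' || c == ';')
    {
        in.get();
        c = in.peek();
    }
}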
TVec and TMat will be serialized differently depending on the implicit_storage flag of the PStream they are being written to.
If implicit_storage is set, serialization does not write the whole structure of the TVec or TMat, but only saves the size information and the elements, as a 1D or 2D sequence (see 6.1.4 and 6.1.5). For example:
4 [ 1.2 3.5 2.8 5.2 ] 3 2 [ 0.1 0.2 0.3 0.4 0.5 0.6 ]
If implicit_storage is false, the complete structure of the TVec or TMat is written explicitly, including the pointer to its storage (which may be shared with other objects). This corresponds to true, deep serialization.
For example:
TVec( 4 0 *1->Storage(4 [ 1.2 3.5 2.8 5.2 ]) ) TMat( 3 2 2 0 *2->Storage(6 [ 0.1 0.2 0.3 0.4 0.5 0.6 ] ) )
For a TVec, we have length offset followed by the storage pointer. For a TMat, we have length width mod offset followed by the storage pointer.
This preserves the structure, including storage sharing. For example, a sub-matrix viewing the second column of the previous TMat would be written as:
TMat( 3 1 2 1 *2 )
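To make the sharing concrete, here is a minimal sketch of how a reader can resolve the *id references (this is not the actual PLearn reader; the names and types are illustrative). The first time an id appears with a full Storage(...) body, the freshly read block is recorded under that id; a later bare *id reuses the recorded block, so the sub-matrix above ends up viewing the same data as the TMat it was taken from.

#include <map>
#include <memory>
#include <vector>

// Illustrative model: a Storage is a shared block of doubles.
using Storage = std::vector<double>;
using StoragePtr = std::shared_ptr<Storage>;

// Map from the serialized id (the number after '*') to the storage it names.
std::map<int, StoragePtr> seen_storages;

// Called when "*id->Storage( ... )" is read: record the freshly read block.
StoragePtr register_storage(int id, StoragePtr s)
{
    seen_storages[id] = s;
    return s;
}

// Called when a bare "*id" is read: reuse the block recorded earlier.
StoragePtr resolve_storage(int id)
{
    return seen_storages.at(id);
}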
To allow mixing of ascii and binary in a file, a non-printable ascii character is used as a one-byte header to identify any binary portion. Table 6.1 gives the header codes for all basic types.
Note that char is considered to be the same as signed char, and long is considered to be the same as int, i.e. 4 bytes long, which is the case on current architectures.
Base type | Byte order | Header byte | Number of bytes to follow |
char | - | 0x01 | 1 |
signed char | - | 0x01 | 1 |
unsigned char | - | 0x02 | 1 |
short | little-endian | 0x03 | 2 |
short | big-endian | 0x04 | 2 |
unsigned short | little-endian | 0x05 | 2 |
unsigned short | big-endian | 0x06 | 2 |
int | little-endian | 0x07 | 4 |
int | big-endian | 0x08 | 4 |
unsigned int | little-endian | 0x0B | 4 |
unsigned int | big-endian | 0x0C | 4 |
long | little-endian | 0x07 | 4 |
long | big-endian | 0x08 | 4 |
unsigned long | little-endian | 0x0B | 4 |
unsigned long | big-endian | 0x0C | 4 |
float | little-endian | 0x0E | 4 |
float | big-endian | 0x0F | 4 |
double | little-endian | 0x10 | 8 |
double | big-endian | 0x11 | 8 |
PRInt64 | little-endian | 0x16 | 8 |
PRInt64 | big-endian | 0x17 | 8 |
PRUint64 | little-endian | 0x18 | 8 |
PRUint64 | big-endian | 0x19 | 8 |
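As an example of this scheme, writing a single double in binary amounts to emitting the header byte 0x10 or 0x11 (depending on the machine's byte order) followed by the 8 raw bytes of the value. A minimal, self-contained sketch (the function names are illustrative, not the PStream API):

#include <cstdint>
#include <cstring>
#include <ostream>

// Detect the byte order of the running machine.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Write a double as described in Table 6.1: header byte 0x10 (little-endian)
// or 0x11 (big-endian), followed by the 8 bytes of the value in host order.
void binwrite_double(std::ostream& out, double x)
{
    const unsigned char header = host_is_little_endian() ? 0x10 : 0x11;
    out.put(static_cast<char>(header));
    out.write(reinterpret_cast<const char*>(&x), sizeof(double));
}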
We consider both one-dimensional sequences (array, vector, ...), which only have a length, and two-dimensional sequences, which have a length and a width.
Ascii-serialized one-dimensional sequences will have the following format:
length [ ... ... ... ]
with the elements of the sequence separated by a single space.
However, on reading, several variations of this format are also recognized.
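A minimal sketch of a writer producing the 1D ascii format (illustrative code, not the actual PStream implementation):

#include <ostream>
#include <vector>

// Write a 1D sequence as: length [ e1 e2 ... en ]
void write_ascii_sequence(std::ostream& out, const std::vector<double>& v)
{
    out << v.size() << " [ ";
    for (double e : v)
        out << e << ' ';
    out << ']';
}

Called on the four-element vector of the earlier example, this produces 4 [ 1.2 3.5 2.8 5.2 ].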
Ascii-serialized two-dimensional sequences will have the following format:
length width [
... ... ...
... ... ... ]
with the elements of each row separated by a tab, and the rows separated by a newline.
However, on reading, blanks, commas and semicolons between elements are completely ignored (skipped), so you may format the data as you wish.
2D Sequences are used exclusively for TMats. Notice that it's also possible to make a 1D sequence of 1D sequences, but that's different from a 2D sequence.
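To illustrate the lenient reading rule, here is a sketch of a 2D reader that accepts any layout of the elements inside the brackets by skipping blanks, commas and semicolons before each element (illustrative code; the function names are made up):

#include <cstddef>
#include <istream>
#include <vector>

// Skip the characters that are ignored between elements on reading:
// blanks, commas and semicolons.
void skip_element_separators(std::istream& in)
{
    int c = in.peek();
    while (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == ',' || c == ';')
    {
        in.get();
        c = in.peek();
    }
}

// Read "length width [ elements ... ]" into a row-major buffer.
// The layout of the elements inside the brackets does not matter.
std::vector<double> read_ascii_2d(std::istream& in, int& length, int& width)
{
    char bracket;
    in >> length >> width >> bracket;              // bracket should be '['
    std::vector<double> data(static_cast<std::size_t>(length) * width);
    for (double& e : data)
    {
        skip_element_separators(in);
        in >> e;
    }
    in >> bracket;                                 // closing ']'
    return data;
}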
Binary-serialized sequences cover the same two cases: one-dimensional sequences, which only have a length, and two-dimensional sequences, which have a length and a width.
The following table gives the corresponding header bytes:
Type of sequence | byte-order | Header byte |
one-dimensional | little-endian | 0x12 |
one-dimensional | big-endian | 0x13 |
two-dimensional | little-endian | 0x14 |
two-dimensional | big-endian | 0x15 |
Everything that follows is in the byte order implied by the header byte.
The first header byte is followed by an element-type byte giving the nature of the elements in the sequence. It can be the byte identifying a base type from Table 6.1 (the endianness must match), '0' = 0x30 to indicate a sequence of booleans (1 byte per boolean), or 0xFF to indicate a generic sequence.
These header bytes are followed by one (for 1D sequences) or two (for 2D sequences) 4-byte ints giving the length (and possibly the width) of the sequence. The total header size is thus 6 bytes for 1D sequences and 10 bytes for 2D sequences.
This header is followed by a dump of the elements of the sequence (in row-major order for 2D). Notice that a sequence of a base type may also be saved as a generic sequence (with the element-type byte 0xFF).
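For example, under the layout just described, a 2D sequence of doubles written on a little-endian machine starts with the bytes 0x14 (two-dimensional, little-endian) and 0x10 (element type double, little-endian), then the 4-byte length and width, then the row-major dump of the elements. A self-contained sketch, assuming the host is little-endian (the function name is illustrative):

#include <cstdint>
#include <ostream>
#include <vector>

// Write a 2D sequence of doubles in the binary sequence format,
// assuming the host is little-endian:
//   0x14  header byte: two-dimensional sequence, little-endian
//   0x10  element-type byte: double, little-endian
//   length and width as 4-byte ints,
//   then the elements in row-major order.
void binwrite_mat2d_le(std::ostream& out, const std::vector<double>& rowmajor,
                       std::int32_t length, std::int32_t width)
{
    out.put(0x14);
    out.put(0x10);
    out.write(reinterpret_cast<const char*>(&length), 4);
    out.write(reinterpret_cast<const char*>(&width), 4);
    out.write(reinterpret_cast<const char*>(rowmajor.data()),
              static_cast<std::streamsize>(rowmajor.size() * sizeof(double)));
}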
Type of sequence | Header byte | Followed by |
Generic, on little-endian | 0x12 | size as 4-byte little-endian int, then binary serialization of the elements |
Generic, on big-endian | 0x13 | size as 4-byte big-endian int, then binary serialization of the elements |
Sequence of a base-type, on little-endian | 0x14 | size as 4-byte little-endian int, base-type given by header byte in previous table, followed by binary dump of elements |
Sequence of a base-type, on big-endian | 0x15 | size as 4-byte big-endian int, base-type given by header byte in previous table, followed by binary dump of elements |