Passing parameters between C++, PHP, JavaScript, etc...






Easy to implement minimal format for simple data exchange, especially between new and obscure scripting languages.
Introduction
A common programming task (the only task?) is moving and manipulating data, sometimes between different languages. This can be a chore since language creators don't implement any standard serializing format for nested data such as arrays. In fact, many languages don't support any native serializing at all.
Scripting is all the rage these days. Perhaps, it's just the particular field I work in, but I seem to run into more and more scripting languages. Yes, there are the ubiquitous languages, JavaScript, ASP, PHP, etc... but there are countless other scripting languages for industrial robotics control, camera control, phone switching, on and on.
At some point, you want to get data into or out of these languages, and the choices can be vague or less than optimal. XML is standard, but XML data is bulky and hard to parse. An XML parser is not something I really want to add to ACME's Burger Flipper Scripting Language, just so you can get burger flipping statistics into your Total Meal Performance package. Ultra simple string formats are not extensible, and require you to constantly upgrade everything to handle new parameters. Other custom formats I see used are often inefficient, limited in usefulness, or buggy.
The goal of this article, then, is to provide an easy-to-implement, minimal format for simple data exchange, especially between new and obscure scripting languages.
Instead of calling it ETIMFFSDEEBNAOSL, which sounds German, I'm going to refer to this format formally as Simple Cross-language Serializing or SCS, so we have a shorthand, and later, in design reviews, we can use cool phrases like, 'I'll just SCS that data to you' and make nearby management people feel dumb.
Goals
Unfortunately, the section above already one-upped this one by not only mentioning the goal, but making a joke about it. So, instead of restating it, I'll just go into more detail.
Above all, simplicity and flexibility will be stressed. There are ways in which the protocol could be expanded to reduce the size of the serialized data, such as using binary encoding or compression. But the trade-off would be a more complex encoder / decoder, and thus longer implementation and debugging time and a higher risk of compatibility flaws and lazy implementations.
Since this data format is chiefly for scripting languages that are new and/or obscure, we want something that is quickly and easily implemented.
So, steps to obtaining our goal will be to...
- Define a compact portable data format for nested data.
- Outline an easy to implement encoder / decoder.
- Provide example encoder / decoder in C++, PHP, and JavaScript.
Use
To make this article a little easier to read and understand, I'm going to cover a few example uses before I go further, since I don't have a cool picture to put on top of the article. Here is a simple example of how SCS can be used to pass data from PHP to JavaScript:
// Create PHP array
$person_info = array(
    'Fred' => array(
        'hair' => 'blonde',
        'eye'  => 'blue',
        'age'  => 26
    )
);

// Send to JavaScript
echo 'var javaArray = {}; ScsDeserialize( "' . ScsSerialize( $person_info ) . '", javaArray );';
Or how about sending data from PHP to C++:
// PHP - Return data on fred
$person_info = QueryDatabase( "fred" );
echo ScsSerialize( $person_info );

// C++ - Use fred's age
CScsPropertyBag cPersonInfo( GetPhpReturnString() );
long lAge = cPersonInfo[ "Fred" ][ "age" ].ToLong();
Why another format?
If you Google 'pass array PHP JavaScript', substituting your favorite languages, you'll find various implementations including many lazy implementations of standards like XML. The goal here is to define the simplest possible practical implementation. In fact, ideally, a lazier implementation will not even be possible.
Note on XML. It seems I've run into many people who claim XML is the end-all format. I've never seen such strong claims of one-size-fits-all before, despite the many thousands of formats that have gone before. But it seems many are determined to make XML the only format for data exchange. Someone even once suggested to me that I base-64 encode streaming video data and wrap it in XML to make it 'standard'. I don't share, and am not going to consider, such a narrow view on any format or language. There is a clear trade-off between a format's feature set and the complexity of its implementation. I will offer what I hope is an objective comparison with XML for the challenges within the scope of this article.
What about other standards? There is a huge list of similar protocols; check out XML Alternatives for a start. After many hours of poring over many formats, I was unable to find a good fit for the objectives explained here. If I've missed an identical solution, you're welcome to rub it in. But I doubt you will be able to say it was obvious. At some point, one just has to bite the bullet and get things done; at least I'm sharing in an obvious place...
Custom solutions. Another thing I'm going to look at is a common runaway encoding technique many people use in custom implementations. It seems that their implementation started as a flat representation, such as a=1,b=2,c=3, and then recursion was added to handle nested data. Though the simplicity is hard to beat, the data expansion from nested encoding can be pretty significant. It also makes the nested values hard for a human to read. Because of these two factors, I will stray from simplicity to solve this one problem.
Here is such a runaway implementation in PHP:
function RunawayEncode( &$params, $sep = '&', $arr = '?' )
{
    $ret = '';
    foreach( $params as $key => $val )
    {
        if ( strlen( $key ) && $val )
        {
            // Continue link
            if ( strlen( $ret ) ) $ret .= $sep;

            // Multidimensional assignment: the nested result is encoded again
            if ( $arr && is_array( $val ) )
                $ret .= $key . '=' . $arr . rawurlencode( RunawayEncode( $val, $sep, $arr ) );

            // Flat assignment
            else $ret .= $key . '=' . rawurlencode( $val );

        } // end if

    } // end foreach

    return $ret;
}

function RunawayDecode( $params, $sep = '&', $arr = '?' )
{
    $ret = array();
    $parr = explode( $sep, $params );

    foreach( $parr as $val )
    {
        $kv = explode( '=', $val );

        // NULL param
        if ( !isset( $kv[ 0 ] ) || !strlen( $kv[ 0 ] ) );

        // One dimensional
        else if ( !isset( $kv[ 1 ] ) ) $ret[ $kv[ 0 ] ] = 1;

        // Flat assignment
        else if ( !$arr || substr( $kv[ 1 ], 0, 1 ) != $arr )
            $ret[ $kv[ 0 ] ] = rawurldecode( $kv[ 1 ] );

        // Multi dimensional assignment
        else $ret[ $kv[ 0 ] ] = RunawayDecode( rawurldecode( substr( $kv[ 1 ], 1 ) ), $sep, $arr );
    }

    return $ret;
}
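To get a feel for why the nested encoding runs away, here is a small sketch in JavaScript. The function name `reencode` is made up for this illustration, and `encodeURIComponent` stands in for PHP's `rawurlencode`. Each nesting level URL-encodes the already encoded text again, so every `%` becomes `%25` and a single separator grows by two characters per level.

```javascript
// Illustration only: apply URL encoding `depth` times, the way each
// nesting level of a runaway encoder re-encodes its children.
function reencode(text, depth) {
  let out = text;
  for (let i = 0; i < depth; i++) {
    out = encodeURIComponent(out);
  }
  return out;
}

for (let depth = 1; depth <= 4; depth++) {
  console.log(depth, reencode('=', depth));
}
// 1 %3D
// 2 %253D
// 3 %25253D
// 4 %2525253D
```

Four levels deep, a one-character separator already costs ten characters, which is the mangling visible in the runaway output shown in the comparison section.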
The data format
The encoding rules for SCS data:
- The equality character '=' will separate name from value.
- The comma character ',' will separate name/value pairs.
- The curly bracket characters '{' and '}' will enclose nested data.
- All data will be encoded as strings using the URL encoding scheme as described in RFC 1738.
Pretty simple? I chose RFC 1738 encoding because many scripting languages have built-in functions for it, and where they don't, it's easy to implement (see the C++ ScsSerialize.h header for an example). Additionally, much of the same logic behind using this encoding in URLs applies here. This format also gives us the advantage of being able to read most data rather easily. Here is an example of an encoded array:
Mary{Married=No,DOB=7-2-82}
The PHP declaration of the above array would be:
$test_array = array(
    'Mary' => array(
        'Married' => 'No',
        'DOB'     => '7-2-82',
    ),
);
Here's a slightly more complex example demonstrating nested encoding. Notice how data is only encoded once.
Mary{Married=No,DOB=7-2-82,Pets{Dog=1},Invalid%20Characters=%21%40%23%24%25%5E%26%2A%28%29}
Declared in PHP...
$test_array = array(
    'Mary' => array(
        'Married' => 'No',
        'DOB'     => '7-2-82',
        'Pets'    => array(
            'Dog' => 1,
        ),
        'Invalid Characters' => '!@#$%^&*()'
    ),
);
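The URL-encoding rule is the only part that may need hand-written code, either in a language with no URL-encoding built-in or in one whose built-in leaves extra characters alone. As a minimal sketch, assuming we want the same safe set as the `IsStdChar()` function in the C++ version later in this article (letters, digits, `_`, `-` and `.`), JavaScript's `encodeURIComponent` can be tightened up as below. The name `scsEscape` is invented for this example.

```javascript
// Sketch: percent-encode everything except letters, digits, '_', '-' and '.'.
// encodeURIComponent leaves ! ' ( ) * ~ untouched, so encode those by hand.
function scsEscape(text) {
  return encodeURIComponent(String(text)).replace(/[!'()*~]/g, function (ch) {
    return '%' + ch.charCodeAt(0).toString(16).toUpperCase();
  });
}

console.log(scsEscape('Invalid Characters')); // Invalid%20Characters
console.log(scsEscape('!@#$%^&*()'));         // %21%40%23%24%25%5E%26%2A%28%29
console.log(scsEscape('7-2-82'));             // 7-2-82
```

A looser encoder still decodes the same way, as long as the structural characters `,`, `=`, `{`, `}` and `%` never appear raw inside a key or value.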
Parsing
As mentioned, the big pro here is that this can be easily parsed. Here is an example PHP implementation. You'll notice that the encoding function is similar in complexity to the above runaway example; however, the decoding function is more complex. This extra complexity avoids the re-encoding of data. A few others and I attempted a simpler decoder and struck out. I'd be very interested if anyone is able to do better.
function ScsSerialize( &$params )
{
    $ret = '';
    foreach( $params as $key => $val )
    {
        if ( strlen( $ret ) ) $ret .= ',';

        // Save the key
        $ret .= rawurlencode( $key );

        // Save array
        if ( is_array( $val ) )
            $ret .= '{' . ScsSerialize( $val ) . '}';

        // Save single value if any
        else if ( strlen( $val ) )
            $ret .= '=' . rawurlencode( $val );

    } // end foreach

    return $ret;
}

function ScsDeserialize( $params, &$last = 0 )
{
    $arr = array();
    $l = strlen( $params ); $s = 0; $e = 0;

    while ( $e < $l )
    {
        switch( $params[ $e ] )
        {
            case ',' :
            case '}' :
            {
                // Any data here?
                if ( 1 < $e - $s )
                {
                    // Divide
                    $a = explode( '=', substr( $params, $s, $e - $s ) );

                    // Valid?
                    if ( !isset( $a[ 0 ] ) ) $a[ 0 ] = 0;
                    else $a[ 0 ] = rawurldecode( $a[ 0 ] );

                    // Single value?
                    if ( !isset( $a[ 1 ] ) ) $arr[ $a[ 0 ] ] = '';

                    // Key / value pair
                    else $arr[ $a[ 0 ] ] = rawurldecode( $a[ 1 ] );

                } // end if

                // Move start
                $s = $e + 1;

                // Punt if end of array
                if ( '}' == $params[ $e ] )
                {
                    if ( $last ) $last = $e + 1;
                    return $arr;
                }

            } break;

            case '{' :
            {
                $k = rawurldecode( substr( $params, $s, $e - $s ) );
                if ( strlen( $k ) )
                {
                    // Decode the sub array; $end_array receives its length
                    $end_array = 1;
                    $arr[ $k ] = ScsDeserialize( substr( $params, $e + 1 ), $end_array );
                    $e += $end_array;
                } // end if

                $s = $e + 1;

            } break;

        } // end switch

        // Next e
        $e++;

    } // end while

    return $arr;
}
Here's the JavaScript version. It is not quite one-to-one, since JavaScript doesn't pass primitive values by reference: the decoder fills in an array supplied by the caller and returns the number of characters it consumed.
function ScsSerialize( x_params )
{
    var ret = '';
    for ( var key in x_params )
    {
        if ( key && x_params[ key ] )
        {
            // Continue link
            if ( ret ) ret += ',';

            // Save the key
            ret += encodeURIComponent( key );

            // Save nested array / object
            if( x_params[ key ].constructor == Array ||
                x_params[ key ].constructor == Object )
            {
                ret += '{' + ScsSerialize( x_params[ key ] ) + '}';
            }

            // Save single value
            else ret += '=' + encodeURIComponent( x_params[ key ] );

        } // end if
    }
    return ret;
}

function ScsDeserialize( x_params, x_arr )
{
    var l = x_params.length, s = 0, e = 0;

    while ( e < l )
    {
        switch( x_params[ e ] )
        {
            case ',' :
            case '}' :
            {
                var a = x_params.substr( s, e - s ).split( '=' );
                if ( 1 < e - s )
                {
                    // Valid?
                    if ( null == a[ 0 ] ) a[ 0 ] = 0;

                    // Decode
                    else a[ 0 ] = decodeURIComponent( a[ 0 ] );

                    // Single value?
                    if ( null == a[ 1 ] ) x_arr[ a[ 0 ] ] = '';

                    // Key / value pair
                    else x_arr[ a[ 0 ] ] = decodeURIComponent( a[ 1 ] );

                } // end if

                // Next data
                s = e + 1;

                // Punt if end of array
                if ( '}' == x_params[ e ] ) return e + 1;

            } break;

            case '{' :
            {
                // Get the key
                var k = x_params.substr( s, e - s );
                if ( k.length )
                {
                    // Decode the key
                    k = decodeURIComponent( k );

                    // Decode array, starting just past the '{'
                    x_arr[ k ] = Array();
                    e += ScsDeserialize( x_params.substr( e + 1 ), x_arr[ k ] );

                } // end if

                // Next data
                s = e + 1;

            } break;

        } // end switch

        // Next e
        e++;

    } // end while

    return e;
}
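For comparison, here is one possible alternative decoder, sketched with an explicit stack instead of recursion. It is not the implementation above: the name `scsDecode` and the choice to return a fresh object are assumptions made for this example. It also accepts a final pair that has no trailing delimiter.

```javascript
// Sketch of a stack-based SCS decoder (hypothetical helper, not the
// article's ScsDeserialize). Values always come back as strings.
// decodeURIComponent throws a URIError on malformed %-sequences.
function scsDecode(text) {
  const root = {};
  const stack = [root];   // stack[stack.length - 1] is the open container
  let token = '';         // characters collected since the last delimiter

  // Store the pending "key=value" (or bare "key") in the open container.
  function flush() {
    if (token.length === 0) return;
    const eq = token.indexOf('=');
    const key = decodeURIComponent(eq < 0 ? token : token.slice(0, eq));
    const value = eq < 0 ? '' : decodeURIComponent(token.slice(eq + 1));
    stack[stack.length - 1][key] = value;
    token = '';
  }

  for (const ch of text) {
    if (ch === '{') {
      // The pending token is the name of a nested container.
      const child = {};
      stack[stack.length - 1][decodeURIComponent(token)] = child;
      stack.push(child);
      token = '';
    } else if (ch === '}') {
      flush();
      if (stack.length > 1) stack.pop();
    } else if (ch === ',') {
      flush();
    } else {
      token += ch;
    }
  }
  flush(); // final pair without a trailing ',' or '}'
  return root;
}

const data = scsDecode('Mary{Married=No,DOB=7-2-82,Pets{Dog=1}}');
console.log(data.Mary.Pets.Dog);     // 1
console.log(scsDecode('a=1,b=2').b); // 2
```

The trade-off is the same one the article describes: the structural characters are only recognized because keys and values are encoded exactly once, so a single pass over the string is enough.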
I know it's kinda long, but I'll go ahead and post the C++ version just so this article is as complete as possible. The mad cut-and-pasters will appreciate it, I'm sure. The actual encode / decode functions are about the same, but I added functions for converting from strings to integers, doubles, and so on, just to make things easy to use. C++ does not have built-in support of this type.
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <stdlib.h>
#include <map>
#include <string>

//==================================================================
// TScsPropertyBag
//
/// Implements a multi-dimensional property bag with nested serialization
/**
    This class provides functionality for a multi-dimensional
    property bag. It also provides automatic type conversions
    and, hopefully, easily ported serialization.

    Typical use

    CScsPropertyBag arr1, arr2;

    arr1[ "A" ][ "AA" ] = "Hello World!";
    arr1[ "A" ][ "AB" ] = (long)1;
    arr1[ "B" ][ "BA" ] = (double)3.14159;

    for ( long i = 0; i < 4; i++ )
        arr1[ "list" ][ i ] = i * 2;

    // Encode
    CScsPropertyBag::t_String str = arr1.serialize();

    // Let's have a look at the encoded string...
    TRACE( str.c_str() ); TRACE( _T( "\n" ) );

    // Decode
    arr2.deserialize( str );

    // 'Hello World!' check...
    TRACE( arr2[ "A" ][ "AA" ] ); TRACE( _T( "\n" ) );

    // Get long value
    long lVal = arr2[ "A" ][ "AB" ].ToLong();

    // Get double
    double dVal = arr2[ "B" ][ "BA" ].ToDouble();

    // Get string value (0L selects the long overload unambiguously)
    LPCTSTR pString = arr2[ "list" ][ 0L ];
*/
//==================================================================
template < class T > class TScsPropertyBag
{
public:

    //==================================================================
    // CAutoMem
    //
    /// Just a simple auto pointer
    /**
        This class is a simple auto pointer. It has properties that I
        particularly like for this type of job. I'll quit making my
        own when boost comes with VC...
    */
    //==================================================================
    template < class T_OBJ > class CAutoMem
    {
    public:

        /// Default constructor
        CAutoMem() { m_p = NULL; }

        /// Destructor
        ~CAutoMem() { release(); }

        /// Release allocated object
        void release() { if ( m_p ) { delete m_p; m_p = NULL; } }

        /// Returns a reference to the encapsulated object, creating it if needed
        T_OBJ& Obj() { if ( !m_p ) m_p = new T_OBJ; return *m_p; }

        /// Returns a reference to the encapsulated object
        operator T_OBJ&() { return Obj(); }

    private:

        /// Contains a pointer to the controlled object
        T_OBJ *m_p;
    };

    /// Unicode friendly string
    typedef std::basic_string< T > t_String;

    /// Our multi-dimensional string array type
    typedef std::map< t_String, CAutoMem< TScsPropertyBag< T > > > t_StringArray;

public:

    /// Default constructor
    TScsPropertyBag() { }

    //==============================================================
    // TScsPropertyBag()
    //==============================================================
    /// Constructs object from encoded string
    /**
        \param [in] sStr    -   Encoded array
    */
    TScsPropertyBag( t_String sStr )
    {   deserialize( sStr ); }

    //==============================================================
    // IsStdChar()
    //==============================================================
    /// Returns non-zero if the character does *not* need encoding
    /**
        \param [in] ch      -   Character to check
    */
    static bool IsStdChar( T ch )
    {   return  ( _T( 'a' ) <= ch && _T( 'z' ) >= ch ) ||
                ( _T( 'A' ) <= ch && _T( 'Z' ) >= ch ) ||
                ( _T( '0' ) <= ch && _T( '9' ) >= ch ) ||
                _T( '_' ) == ch || _T( '-' ) == ch ||
                _T( '.' ) == ch;
    }

    //==============================================================
    // urlencode()
    //==============================================================
    /// Returns URL encoded version of a string.
    /**
        \param [in] sStr    -   String to encode

        \return Encoded string
    */
    static t_String urlencode( t_String sStr )
    {
        t_String sRes;
        long lLen = sStr.length(), i = 0;

        T tmp[ 256 ];
        while ( i < lLen )
        {
            if ( IsStdChar( sStr[ i ] ) ) sRes += sStr[ i ];
            else
            {   _stprintf( tmp, _T( "%%%02lX" ), (long)sStr[ i ] );
                sRes += tmp;
            } // end else

            i++;

        } // end while

        return sRes;
    }

    //==============================================================
    // urldecode()
    //==============================================================
    /// Decodes URL encoded string
    /**
        \param [in] sStr    -   URL encoded string to decode

        \return Decoded string
    */
    static t_String urldecode( t_String sStr )
    {
        t_String sRes;
        long lLen = sStr.length(), i = 0;

        T tmp[ 256 ];
        while ( i < lLen )
        {
            if ( _T( '%' ) != sStr[ i ] ) sRes += sStr[ i ];
            else
            {   // Convert the two hex digits that follow the '%'
                tmp[ 0 ] = sStr[ ++i ]; tmp[ 1 ] = sStr[ ++i ]; tmp[ 2 ] = 0;
                sRes += (TCHAR)( _tcstoul( tmp, NULL, 16 ) );
            } // end else

            i++;

        } // end while

        return sRes;
    }

    //==============================================================
    // destroy()
    //==============================================================
    /// Releases all memory resources and prepares class for reuse.
    void destroy() { m_lstSub.clear(); m_str.release(); }

    //==============================================================
    // serialize()
    //==============================================================
    /// Serializes the array
    /**
        \return Serialized array.
    */
    t_String serialize()
    {
        t_String sRes;

        // Just return our value if we're not an array
        if ( !IsArray() ) return m_str.Obj();

        // Iterator
        typename t_StringArray::iterator pos = m_lstSub.begin();

        // For each array element
        while ( pos != m_lstSub.end() )
        {
            // Add separator if needed
            if ( sRes.length() ) sRes += _T( ',' );

            // Keys are URL encoded, just like values
            sRes += urlencode( pos->first );

            // Is it an array?
            if ( pos->second.Obj().IsArray() )
            {
                sRes += _T( '{' );
                sRes += pos->second.Obj().serialize();
                sRes += _T( '}' );
            }

            // Serialize the value
            else
            {   sRes += _T( '=' );
                sRes += urlencode( (LPCTSTR)pos->second.Obj() );
            } // end else

            // Next array element
            pos++;

        } // end while

        return sRes;
    }

    //==============================================================
    // deserialize()
    //==============================================================
    /// Deserializes an array from string
    /**
        \param [in]  sStr   -   Serialized array string.
        \param [in]  bMerge -   Non-zero if array should be merged
                                into current data. Set to zero to
                                replace current array.
        \param [out] pLast  -   Receives the number of bytes decoded.
        \param [in]  pPs    -   Property bag that receives any decoded
                                characters. We could also have just
                                called this function on the object,
                                but this way provides a little extra
                                flexibility for later.

        \return Number of items deserialized.
    */
    LONG deserialize(   t_String sStr, BOOL bMerge = FALSE,
                        LONG *pLast = NULL, TScsPropertyBag *pPs = NULL )
    {
        // Ensure object
        if ( !pPs ) pPs = this;

        // Do we want to merge?
        if ( !bMerge ) pPs->destroy();

        LONG lItems = 0;
        long lLen = sStr.length(), s = 0, e = 0;

        while ( e < lLen )
        {
            switch( sStr[ e ] )
            {
                case ',' :
                case '}' :
                {
                    if ( 1 < e - s )
                    {
                        // Find '='
                        long a = s;
                        while ( a < e && '=' != sStr[ a ] ) a++;

                        t_String sKey, sVal;

                        // First character is separator
                        if ( a == s )
                            sKey = urldecode( t_String( &sStr.c_str()[ s + 1 ], e - s - 1 ) );

                        else sKey = urldecode( t_String( &sStr.c_str()[ s ], a - s ) );

                        // Single token
                        if ( 1 >= e - a ) (*pPs)[ sKey ] = _T( "" );

                        // Both tokens present
                        else (*pPs)[ sKey ] =
                                urldecode( t_String( &sStr.c_str()[ a + 1 ], e - a - 1 ) );

                        // Count one item
                        lItems++;

                    } // end if

                    // Next element
                    s = e + 1;

                    // Time to exit?
                    if ( '}' == sStr[ e ] )
                    {   if ( pLast ) *pLast = e + 1;
                        return lItems;
                    } // end if

                } break;

                case '{' :
                {
                    // Get key
                    t_String sKey = urldecode( t_String( &sStr.c_str()[ s ], e - s ) );

                    // Do we have a key?
                    if ( sKey.length() )
                    {
                        // This will point to the end of the array we're about to decode
                        LONG lEnd = 0;

                        // Get the sub array
                        lItems += deserialize(  t_String( &sStr.c_str()[ e + 1 ] ),
                                                TRUE, &lEnd, &(*pPs)[ sKey ] );

                        // Skip the array we just decoded
                        e += lEnd;

                    } // end if

                    // Skip this token
                    s = e + 1;

                } break;

            } // end switch

            // Next character
            e++;

        } // end while

        return lItems;
    }

    //==============================================================
    // operator []()
    //==============================================================
    /// Indexes into sub array
    /**
        \param [in] pKey    -   Index key

        \return Reference to sub class.
    */
    TScsPropertyBag& operator []( LPCTSTR pKey ) { return m_lstSub[ pKey ]; }

    //==============================================================
    // operator []()
    //==============================================================
    /// Indexes into sub array
    /**
        \param [in] sKey    -   Index key

        \return Reference to sub class.
    */
    TScsPropertyBag& operator []( t_String sKey ) { return m_lstSub[ sKey.c_str() ]; }

    //==============================================================
    // operator []()
    //==============================================================
    /// Indexes into sub array
    /**
        \param [in] n   -   Index key

        \return Reference to sub class.
    */
    TScsPropertyBag& operator []( long n )
    {   TCHAR szKey[ 256 ] = _T( "" );
        _stprintf( szKey, _T( "%li" ), n );
        return m_lstSub[ szKey ];
    }

    //==============================================================
    // operator []()
    //==============================================================
    /// Indexes into sub array
    /**
        \param [in] n   -   Index key

        \return Reference to sub class.
    */
    TScsPropertyBag& operator []( unsigned long n )
    {   TCHAR szKey[ 256 ] = _T( "" );
        _stprintf( szKey, _T( "%lu" ), n );
        return m_lstSub[ szKey ];
    }

    //==============================================================
    // operator []()
    //==============================================================
    /// Indexes into sub array
    /**
        \param [in] n   -   Index key

        \return Reference to sub class.
    */
    TScsPropertyBag& operator []( double n )
    {   TCHAR szKey[ 256 ] = _T( "" );
        _stprintf( szKey, _T( "%g" ), n );
        return m_lstSub[ szKey ];
    }

    //==============================================================
    // operator = ()
    //==============================================================
    /// Conversion from string object
    t_String operator = ( t_String sStr )
    {   m_str.Obj() = sStr.c_str(); return m_str.Obj(); }

    //==============================================================
    // operator = ()
    //==============================================================
    /// Conversion from string
    t_String operator = ( LPCTSTR pStr )
    {   m_str.Obj() = pStr; return m_str.Obj(); }

    //==============================================================
    // operator = ()
    //==============================================================
    /// Conversion from long
    t_String operator = ( long lVal )
    {   T num[ 256 ] = _T( "" );
        _stprintf( num, _T( "%li" ), lVal );
        m_str.Obj() = num; return m_str.Obj();
    }

    //==============================================================
    // operator = ()
    //==============================================================
    /// Conversion from unsigned long
    t_String operator = ( unsigned long ulVal )
    {   T num[ 256 ] = _T( "" );
        _stprintf( num, _T( "%lu" ), ulVal );
        m_str.Obj() = num; return m_str.Obj();
    }

    //==============================================================
    // operator = ()
    //==============================================================
    /// Conversion from double
    t_String operator = ( double dVal )
    {   T num[ 256 ] = _T( "" );
        _stprintf( num, _T( "%g" ), dVal );
        m_str.Obj() = num; return m_str.Obj();
    }

    //==============================================================
    // LPCTSTR()
    //==============================================================
    /// Conversion to string
    operator LPCTSTR() { return ToStr(); }

    //==============================================================
    // ToStr()
    //==============================================================
    /// Returns local string object
    LPCTSTR ToStr() { return m_str.Obj().c_str(); }

    //==============================================================
    // ToLong()
    //==============================================================
    /// Converts to long
    long ToLong() { return _tcstol( ToStr(), NULL, 10 ); }

    //==============================================================
    // ToULong()
    //==============================================================
    /// Converts to unsigned long
    unsigned long ToULong() { return _tcstoul( ToStr(), NULL, 10 ); }

    //==============================================================
    // ToDouble()
    //==============================================================
    /// Converts to double
    double ToDouble() { return _tcstod( ToStr(), NULL ); }

    //==============================================================
    // IsArray()
    //==============================================================
    /// Returns non-zero if array elements are present
    BOOL IsArray() { return 0 < m_lstSub.size(); }

private:

    /// Our value
    CAutoMem< t_String >    m_str;

    /// Array of strings
    t_StringArray           m_lstSub;
};

/// Property bag type
/** \see TScsPropertyBag */
typedef TScsPropertyBag< TCHAR > CScsPropertyBag;
Comparison
In terms of simplicity, it's hard to get much simpler. The only simpler versions I have seen are of the runaway type, or language specific, such as manually outputting a JavaScript array. In that case, our work is lost if we later want to switch to another target language.
One way in which XML excels, as seen below, is human readability. Although it is possible to decipher the SCS string, it is not as clear unless you add new line characters. It would have been possible to make the decoder white space agnostic, but it would have required tokenizing the data. This would have just been something someone could leave out of the implementation, and thus we would have strayed from our goals. Also, the introduction of white space could potentially cause problems when pasting data as strings into source files. This has priority here as being closer to our goals of cross-language communication. Though we attempt to make it somewhat readable, take into account that human readability is not a priority for SCS when considering your options.
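If readability matters while debugging, one option is to leave the wire format alone and indent a copy for display only. The sketch below does that; `scsPretty` is a made-up helper, and its output is for human eyes, not for feeding back into the decoders in this article, which do not skip white space.

```javascript
// Sketch: produce an indented, display-only view of an SCS string.
// The result contains white space, so it is NOT valid input for the decoders.
function scsPretty(text, indent = '  ') {
  let depth = 0;
  let out = '';
  for (const ch of text) {
    if (ch === '{') {
      depth++;
      out += ' {\n' + indent.repeat(depth);
    } else if (ch === '}') {
      depth = Math.max(0, depth - 1);
      out += '\n' + indent.repeat(depth) + '}';
    } else if (ch === ',') {
      out += ',\n' + indent.repeat(depth);
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(scsPretty('Mary{Married=No,Pets{Dog=1}}'));
// Mary {
//   Married=No,
//   Pets {
//     Dog=1
//   }
// }
```

Because keys and values are URL encoded, the characters `{`, `}` and `,` can only be structural, so a character-by-character pass like this is safe.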
In terms of bandwidth, say for an AJAX project, consider the following array...
// Test array
$A = array( 'Department'=>
    array(
        'Accounting'=>
        array(
            'John'=>
            array(
                'Married'=>'Yes',
                'DOB'=>'1-14-78',
                'Pets'=>
                array(
                    'Fish'=>8,
                    'Dog'=>1,
                    'Cat'=>2
                ),
                'ValidCharacters'=>'.-_',
                'InvalidCharacters'=>'[,=]'
            ),
            'Mary'=>
            array(
                'Married'=>'No',
                'DOB'=>'7-2-82',
                'Pets'=>
                array(
                    'Dog'=>1,
                ),
                'InvalidCharacters'=>'!@#$%^&*()'
            ),
        ),
    ),
);
Our SCS implementation comes in at 218 bytes, about 42% of the size of the XML equivalent (roughly 58% smaller), but it is harder to read. It looks like this:
Department{Accounting{John{Married=Yes,DOB=1-14-78,Pets{Fish=8,Dog=1,Cat=2},\
ValidCharacters=.-_,InvalidCharacters=%5B%2C%3D%5D},\
Mary{Married=No,DOB=7-2-82,Pets{Dog=1},\
InvalidCharacters=%21%40%23%24%25%5E%26%2A%28%29}}}
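The arithmetic is easy to check. The sketch below joins the four display lines of the SCS string (the trailing backslashes are only line continuations), measures its length, and compares it with the 517 bytes quoted for the XML version. The 517 figure is taken from this article, not measured here.

```javascript
// The SCS string from the comparison, with the display line breaks removed.
const scs =
  'Department{Accounting{John{Married=Yes,DOB=1-14-78,Pets{Fish=8,Dog=1,Cat=2},' +
  'ValidCharacters=.-_,InvalidCharacters=%5B%2C%3D%5D},' +
  'Mary{Married=No,DOB=7-2-82,Pets{Dog=1},' +
  'InvalidCharacters=%21%40%23%24%25%5E%26%2A%28%29}}}';

const xmlBytes = 517; // figure quoted in the article for the XML output
const ratio = scs.length / xmlBytes; // the string is pure ASCII, so length == bytes

console.log(scs.length);                                   // 218
console.log(Math.round(ratio * 100) + '%');                // 42%
console.log(Math.round((1 - ratio) * 100) + '% smaller');  // 58% smaller
```

So the SCS string is about 42% of the XML size, which works out to a saving of roughly 58%.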
This typical XML output weighs in at 517 bytes. I struggled a little with whether or not to remove the header and formatting characters. I decided to leave them since this really is a lot of the argument for using XML, to be 'standard'. This is actually cheating a little since I used the same URL encoding instead of the more common base-64. But, XML allows me this.
<?xml version="1.0" encoding="UTF-8" ?>
<Department>
  <Accounting>
    <John>
      <Married>Yes</Married>
      <DOB>1-14-78</DOB>
      <Pets>
        <Fish>8</Fish>
        <Dog>1</Dog>
        <Cat>2</Cat>
      </Pets>
      <ValidCharacters>.-_</ValidCharacters>
      <InvalidCharacters>%5B%2C%3D%5D</InvalidCharacters>
    </John>
    <Mary>
      <Married>No</Married>
      <DOB>7-2-82</DOB>
      <Pets>
        <Dog>1</Dog>
      </Pets>
      <InvalidCharacters>%21%40%23%24%25%5E%26%2A%28%29</InvalidCharacters>
    </Mary>
  </Accounting>
</Department>
Finally, here is the output of the typical runaway implementation. It should be noted that this example particularly amplifies the redundant encoding issue. There are other instances where it would be competitive, though never significantly better. Notice the severe mangling due to the recursive encoding.
Department=?Accounting%3D%3FJohn%253D%253FMarried%25253DYes%252526DOB%25253D1-14-78%252526\
Pets%25253D%25253FFish%2525253D8%25252526Dog%2525253D1%25252526Cat%2525253D2%252526\
ValidCharacters%25253D.-_%252526InvalidCharacters%25253D%2525255B%2525252C%2525253D%\
2525255D%2526Mary%253D%253FMarried%25253DNo%252526DOB%25253D7-2-82%252526\
Pets%25253D%25253FDog%2525253D1%252526InvalidCharacters%25253D%25252521%25252540\
%25252523%25252524%25252525%2525255E%25252526%2525252A%25252528%25252529
Flexibility
We are not going to attempt to encode variable types or other properties such as minimum and maximum values at the parser level. But these things can still be done in the framework of the current protocol. For example, consider the following XML:
<variable name="x" type="float" min="-10" max="10">3.14</variable>
We can represent this type of information by just adding a sub array. In the case of XML, the content or value field is implicit between the tags. We will need to add an explicit 'value' field. And the result is actually shorter than the minimal XML.
variable{name=x,type=float,min=-10,max=10,value=3.14}
Or better still...
x{type=float,min=-10,max=10,value=3.14}
Also, the same name cannot appear twice at a given scope. For instance...
<table><tr><td>One</td><td>Two</td></tr>
<tr><td>Three</td></tr></table>
Would have to be represented as something like:
table{tr{0{td{0=One,1=Two}},1{td{0=Three}}}}
// For clarity
table
{
    tr
    {
        0
        {
            td
            {
                0 = One,
                1 = Two
            }
        },
        1
        {
            td
            {
                0 = Three
            }
        }
    }
}
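Both conventions fall out naturally if the encoder treats list positions as keys. As a rough sketch (the `scsEncode` helper is written for this example and is not the article's `ScsSerialize`), JavaScript's `Object.keys` already reports array indexes as '0', '1', and so on, so repeated elements and typed values need no special handling:

```javascript
// Sketch of a compact SCS encoder (hypothetical helper for illustration).
// Arrays are encoded like objects whose keys are their indexes.
function scsEncode(value) {
  return Object.keys(value).map(function (key) {
    const item = value[key];
    const name = encodeURIComponent(key);
    if (item !== null && typeof item === 'object') {
      return name + '{' + scsEncode(item) + '}';          // nested data
    }
    return name + '=' + encodeURIComponent(String(item)); // single value
  }).join(',');
}

// Repeated <tr> and <td> elements become numbered entries.
const table = { table: { tr: [{ td: ['One', 'Two'] }, { td: ['Three'] }] } };
console.log(scsEncode(table));
// table{tr{0{td{0=One,1=Two}},1{td{0=Three}}}}

// Type and range information travels as an ordinary sub array.
const variable = { x: { type: 'float', min: -10, max: 10, value: 3.14 } };
console.log(scsEncode(variable));
// x{type=float,min=-10,max=10,value=3.14}
```

The receiving side still sees only strings, so interpreting `type`, `min` and `max` remains a convention between the two programs rather than something the parser enforces.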
You'll find most data structures can be represented well enough in this protocol. It's usually just a matter of efficiency, especially when dealing with high-bandwidth, binary data like live video or audio. Then again, what format covers everything well?
Conclusion
I think that supplies a good idea of what was being attempted, and what was achieved. A few notes...
Notice that the supplied functions allow you to easily serialize parts of the array as well as the whole array. Also, you can decode one array into a larger array. This is a subtle but powerful construct.
The property bag concept achieved in the C++ implementation is a powerful addition to the language. It can severely cut development time when dealing with data. The nice thing about C++ is that you can describe how exactly you want operators to behave. I actually use a more advanced form of this class that allows serializing/deserializing into lots of formats like the Windows Registry, INI files, URL GET and POST variables, MIME formats, database, etc... This can be an enormously powerful way to handle generic data. I know I didn't invent this by the way, there are many examples out there...
I'd like to add more languages to this example. Perl, Python, and VB come to mind. If anyone wants to donate, please feel free.
Thanks everybody!