I am attempting to put a potentially large string into a Rendezvous message and was curious about size constraints. I understand there is a hard limit (64 MB?) on the message as a whole, but I'm curious about how some other variables could affect it. Specifically:

  • How big the keys are?
  • How the string is stored (in one field vs. multiple fields)

Any advice on any of the above topics or anything else that could be relevant would be greatly appreciated.

Note: I would like to keep the message as a raw string (as opposed to bytecode, etc).

+3  A: 

From the Tibco docs on Very Large Messages:

Rendezvous software can transport very large messages; it divides them into small packets, and places them on the network as quickly as the network can accept them. In some situations, this behavior can overwhelm network capacity; applications can achieve higher throughput by dividing large messages into smaller chunks and regulating the rate at which it sends those chunks. You can use the performance tool to evaluate chunk sizes and send rates for optimal throughput.

This example sends one message consisting of ten million bytes. Rendezvous software automatically divides the message into packets and sends them. However, this burst of packets might exceed network capacity, resulting in poor throughput:

sender> rvperfm -size 10000000 -messages 1

In this second example, the application divides the ten million bytes into one thousand smaller messages of ten thousand bytes each, and automatically determines the batch size and interval to regulate the flow for optimal throughput:

sender> rvperfm -size 10000 -messages 1000 -auto

By varying the -messages and -size parameters, you can determine the optimal message size for your applications in a specific network. Application developers can use this information to regulate sending rates for improved performance.
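For the application side of that advice, here is a minimal sketch of splitting one large buffer into fixed-size chunks. It deliberately does not call the Rendezvous API: `chunk_sink`, `chunk_count`, `send_in_chunks` and `count_bytes_sink` are hypothetical names invented for illustration, and the sink is where a real publish call (and any pacing between sends) would go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for whatever publishes one chunk.
   Returns 0 on success, non-zero on failure. */
typedef int (*chunk_sink)(const char *data, size_t len, size_t index, void *ctx);

/* Number of chunks needed to carry total_len bytes at chunk_size bytes each. */
size_t chunk_count(size_t total_len, size_t chunk_size)
{
    assert(chunk_size > 0);
    return total_len / chunk_size + (total_len % chunk_size != 0);
}

/* Walks the payload in chunk_size pieces and hands each piece to the sink.
   Returns the number of chunks delivered; stops early if the sink fails. */
size_t send_in_chunks(const char *payload, size_t total_len, size_t chunk_size,
                      chunk_sink sink, void *ctx)
{
    assert(chunk_size > 0);
    size_t off = 0, sent = 0;
    while (off < total_len) {
        size_t remaining = total_len - off;
        size_t len = remaining < chunk_size ? remaining : chunk_size;
        if (sink(payload + off, len, sent, ctx) != 0)
            break;
        off += len;
        ++sent;
    }
    return sent;
}

/* Example sink that only adds up the bytes it is given (ctx is a size_t*). */
int count_bytes_sink(const char *data, size_t len, size_t index, void *ctx)
{
    (void)data;
    (void)index;
    *(size_t *)ctx += len;
    return 0;
}
```

With the numbers from the docs, `chunk_count(10000000, 10000)` gives the one thousand messages used in the second rvperfm run. The receiver would need the chunk index (and the total count) to reassemble the string, so those would have to travel as extra fields.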

As to actual limits: the AddString function takes a C-style ANSI string, so it is theoretically unbounded, but given the signature of AddOpaque

tibrv_status tibrvMsg_AddOpaque( 
   tibrvMsg       message, 
   const char*    fieldName, 
   const void*    value, 
   tibrv_u32      size);

which takes a tibrv_u32 for the size, it seems sensible to conclude that the limit is likely to be 4 GB rather than 64 MB.
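The arithmetic behind that figure, as a small sketch. It assumes `tibrv_u32` is a 32-bit unsigned integer, which the name suggests but which I have not checked against the headers; `fits_in_u32_size` is a hypothetical helper, not part of the Rendezvous API.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit unsigned size tops out at 2^32 - 1 bytes, one byte short of 4 GiB.
   (static_assert is the C11 macro from <assert.h>.) */
static_assert(UINT32_MAX == 4294967295u, "u32 maximum is 2^32 - 1");

/* Returns 1 if a payload of this many bytes can be described by a u32 size
   argument, 0 otherwise. Assumes the size type really is 32 bits wide. */
int fits_in_u32_size(uint64_t payload_bytes)
{
    return payload_bytes <= UINT32_MAX;
}
```

This only shows what the size argument can express; the daemon or transport may well impose a lower practical ceiling.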

That said, using Tibco to transfer such large messages is likely to be a serious performance bottleneck, since it may have to buffer significant amounts of traffic while it tries to deliver them to all consumers. By default the rvd buffer only covers 60 seconds, so you may find yourself suffering message loss if these messages are a significant share of your traffic.

Message overhead within Tibco is largely as simple as:

  1. the fixed cost associated with each message (the header)
  2. All the fields (type info and the field id)
  3. Plus the cost of all variable length aspects including:
    1. the send and receive subjects (effectively limited to 256 bytes each)
    2. the field names. I can find no limit on the length of field names in the docs, but the smaller they are the better; better still, don't use them at all and use the numerical identifiers instead
    3. the array/string/opaque/user defined variable length fields in the message

Note: If you use nested messages simply recurse the above.
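The breakdown above can be turned into a rough size estimator. This is only a sketch of the bookkeeping: the per-message and per-field costs are parameters you would have to measure yourself, not real Rendezvous figures, and `cost_model`, `field_desc` and `estimate_message_size` are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed, caller-supplied costs; these are NOT documented Rendezvous values. */
typedef struct {
    size_t header_cost;     /* fixed cost of each message */
    size_t per_field_cost;  /* type info and field id for each field */
} cost_model;

typedef struct {
    const char *name;       /* NULL when only a numeric identifier is used */
    size_t value_len;       /* bytes of string/opaque/array payload; for a
                               nested message, pass its own estimate here */
} field_desc;

/* Length of an optional string, treating NULL as empty. */
static size_t opt_len(const char *s)
{
    return s ? strlen(s) : 0;
}

/* Sums header + subjects + per-field cost + field names + field payloads. */
size_t estimate_message_size(const cost_model *m,
                             const char *send_subject,
                             const char *reply_subject,
                             const field_desc *fields, size_t n)
{
    assert(m != NULL);
    size_t total = m->header_cost + opt_len(send_subject) + opt_len(reply_subject);
    for (size_t i = 0; i < n; ++i)
        total += m->per_field_cost + opt_len(fields[i].name) + fields[i].value_len;
    return total;
}
```

Plugging in a multi-megabyte `value_len` next to a handful of short names makes it obvious how little the names contribute.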

In your case the payload will be so vast in comparison to the names (so long as they are reasonable and simple) that there is little point attempting to optimize them at all.

You may find you can gain considerable efficiency on the wire and in buffers if you transmit the strings in compressed form, either through an rvrd with compression enabled or by changing your producer/consumer to use something fast but effective like deflate (or, if you're feeling esoteric, things like QuickLZ, FastLZ, LZO, etc., especially ones whose compress/decompress engines have a fixed memory footprint).

You don't say which platform API you are targeting (.NET, Java, C++ or C, for example), and this will colour things a little. On the wire, all string data will be 1 byte per character, regardless of Java/.NET using UTF-16 by default. However, you will incur a significant translation cost placing strings into, and reading them out of, the message, because the underlying buffer cannot be reused in those cases and a copy (with compaction or expansion respectively) must be performed. If you stick to opaque byte sequences you will still have the copy overhead in the naive implementations possible through the managed wrapper APIs, but this will at least be less overhead if you have no need to work with the data as a native string.

ShuggyCoUk