
Thanks for the feedback!

> 1. It appears from the encoding docs that length for TEXT fields are in bytes, not in characters.

Correct. Text is UTF-8 encoded, and the length sent on the wire is bytes.

You make a good point that this can be frustrating in languages where UTF-8 isn't the standard representation for text. Though, is it really a character count that you want, or is it a UTF-16 length? Depends on the language, again. Hmm, this could get ugly...
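To make the three candidate "lengths" concrete, here is a small sketch in Python. The helper name `text_lengths` is made up for illustration; only the UTF-8 byte count corresponds to what goes on the wire.

```python
def text_lengths(s):
    """Return the three lengths a binding might be tempted to report."""
    return {
        # Unicode code points: what Python's len() counts
        "code_points": len(s),
        # UTF-8 bytes: the length that is sent on the wire
        "utf8_bytes": len(s.encode("utf-8")),
        # UTF-16 code units: what Java/JavaScript-style string lengths count
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


# "h", e-acute, "llo", a space, and one emoji outside the BMP
print(text_lengths("h\u00e9llo \U0001F600"))
# {'code_points': 7, 'utf8_bytes': 11, 'utf16_units': 8}
```

All three numbers differ for the same string, which is why "length in characters" is ambiguous across languages.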

I think we can optimize around this problem by taking advantage of the fact that Cap'n Proto implementations will generally use arena allocation (since message objects must be allocated in contiguous segments). Just start by attempting to encode the string into whatever space is left in the current segment. If it turns out to be enough, which it usually will, great! Mark the space as allocated. If not, you have to back-track and try a slower path.
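A rough sketch of that idea in Python, under loud assumptions: `Segment`, `try_encode_in_place`, and `encode_text` are hypothetical names, not part of any Cap'n Proto implementation, and the sketch ignores word alignment and every other detail of the real encoding. It only shows the "try the leftover space first, fall back if it doesn't fit" shape.

```python
class Segment:
    """Hypothetical fixed-size arena segment (illustration only)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.used = 0  # bytes already allocated from the front of buf


def try_encode_in_place(seg, text):
    """Fast path: transcode straight into the segment's free space.

    Returns (offset, byte_length) on success. Returns None if the text
    did not fit; seg.used is left untouched in that case, so back-tracking
    costs nothing beyond the wasted encoding work.
    """
    pos = seg.used
    end = len(seg.buf)
    for ch in text:
        encoded = ch.encode("utf-8")
        if pos + len(encoded) > end:
            return None  # ran out of room; nothing was marked as allocated
        seg.buf[pos:pos + len(encoded)] = encoded
        pos += len(encoded)
    start = seg.used
    seg.used = pos  # commit: mark the space as allocated
    return start, pos - start


def encode_text(segments, text, segment_size=64):
    """Return (segment_index, offset, byte_length) for the stored text."""
    result = try_encode_in_place(segments[-1], text)
    if result is not None:
        return len(segments) - 1, result[0], result[1]
    # Slow path: measure the exact byte length first, then allocate a
    # fresh segment that is guaranteed to be big enough.
    data = text.encode("utf-8")
    seg = Segment(max(segment_size, len(data)))
    seg.buf[:len(data)] = data
    seg.used = len(data)
    segments.append(seg)
    return len(segments) - 1, 0, len(data)
```

The point of the design is that the common case needs only one pass over the string and never has to compute the UTF-8 length up front; only the rare overflow pays for a second pass.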

> 2. I'm unclear on first reading how references are supported.

Usually you send a whole "message" (composed of N "segments") at a time, and that message cannot have pointers to any data objects outside the message (but can have pointers to interfaces). A sophisticated protocol could send one segment at a time, and the receiver could request additional segments when the far pointers pointing into them are first accessed (this could all be transparent to the application). In general, though, if you want your message to reference things outside the message, you probably want to use interfaces.

http://kentonv.github.com/capnproto/language.html#interfaces

These let the receiver call back at some arbitrary time in the future. (The details of how these will be implemented are not specified yet, but it'll be something like what E does.)



To be clear, I think a character count isn't what you want. That's easy enough to do post-parse. A byte count is most useful so that you can make a single call to pull the value into memory. A buddy and I wrote Python bindings for Hessian, and the fact that they were encoding character counts and not byte counts caused us pain.
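A minimal sketch of why the byte count helps, in Python. The framing here (a 4-byte little-endian length followed by UTF-8 bytes) is invented for the example and is not the Cap'n Proto or Hessian wire format; `write_text` and `read_text` are hypothetical helpers.

```python
import io
import struct


def write_text(stream, text):
    """Write a byte-length prefix followed by the UTF-8 payload."""
    data = text.encode("utf-8")
    stream.write(struct.pack("<I", len(data)))  # assumed 4-byte LE prefix
    stream.write(data)


def read_text(stream):
    """Read the prefix, then pull the whole value in with one read call."""
    (nbytes,) = struct.unpack("<I", stream.read(4))
    return stream.read(nbytes).decode("utf-8")


buf = io.BytesIO()
write_text(buf, "h\u00e9llo \U0001F600")
write_text(buf, "next")
buf.seek(0)
first = read_text(buf)
second = read_text(buf)
```

With a character count instead, the reader cannot know how many bytes to request and has to decode incrementally until it has seen enough characters.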

Makes sense regarding pointers within messages, and interfaces.


Ohhhhhh, yeah, I misread what you were saying. Yes, the count on the wire is a byte count. :)



