Protocol Buffers
Ceyhan Kasap | Software Infrastructure
Data Serialization
● The process of translating an object into a format that can be stored in a memory buffer or file, or transmitted over a network.
● End goal: reconstruction in another computer environment.
● Reverse process: deserialization
Binary Serialization
● Many languages provide built-in support
● Language specific (interop issues)
● Example: Java's Serializable marker interface (increases the likelihood of bugs and security holes)
● Effective Java (2nd ed.), Item 74: Implement Serializable judiciously
● Effective Java (2nd ed.), Item 78: Consider serialization proxies instead of serialized instances
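To make the Java example concrete, here is a minimal sketch of a round trip through Java's built-in serialization. The `Person` class and its fields are hypothetical and only serve as an illustration; the stream classes are from the standard `java.io` package. The resulting bytes can only be read back by Java code that has the same class available, which is the interop issue noted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableDemo {

    // Hypothetical example type. Serializable declares no methods;
    // it only marks the class as eligible for Java serialization.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serialize the object graph into an in-memory byte array.
    static byte[] serialize(Person p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.toByteArray();
    }

    // Reconstruct the object; this requires the Person class on the classpath.
    static Person deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Person) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Person("Ada", 36));
        Person copy = deserialize(data);
        System.out.println(copy.name + " " + copy.age + " (" + data.length + " bytes)");
    }
}
```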
Binary Serialization
● Advantages: memory efficient, fast to emit and parse
● Disadvantages: not human readable, platform dependent
CROSS PLATFORM SOLUTIONS - XML (Extensible Markup Language)
● Design goals: simplicity, generality, and usability across the Internet
● Hierarchical structure, validation via schema (DTD, XSD, etc.)
● A common standard with wide acceptance
● Criticized for verbosity and complexity (especially when namespaces are involved)
CROSS PLATFORM SOLUTIONS - JSON (JavaScript Object Notation)
● Lightweight data-interchange format
● Uses human-readable text to transmit data objects consisting of attribute–value pairs
● Remember: XML is a markup language, whereas JSON is a data format
Google Data Encoding Solution Options
«At Google, our mission is organizing all of the world's information.
We use literally thousands of different data formats and most of these formats are structured, not flat»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
Google Data Encoding Solution Options
Option 1: Use XML
«Not efficient enough for this scale. Writing code to work with the DOM tree can sometimes become unwieldy.»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
Google Data Encoding Solution Options
Option 2: Write the raw bytes of in-memory data structures to the wire
«When we roll out a new version of a server, it almost always has to start out talking to older servers. Also, we use many languages, so we need a portable solution.»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
Google Data Encoding Solution Options
Option 3: Use hand-coded parsing and serialization routines for each data structure (the solution used before protocol buffers)
«... there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol ...»
https://opensource.googleblog.com/2008/07/protocol-buffers-googles-data.html
What are protocol buffers?
A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
Initially developed at Google to deal with an index server request/response protocol.
Designed and used at Google since 2001; open-sourced in 2008.
How do they work?
You define your structured data format in a descriptor file (.proto file)
You run the protocol buffer compiler for your application's language on your .proto file to generate data access classes.
You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
How do they work?
[Figure: .proto → Java]
Message Definition
● Messages are defined in .proto files
● Syntax: message <MessageName> { ... }
● Can be nested
● Will be converted to, e.g., a Java class
Message Contents
Each message may have:
● Messages
● Enums: enum <name> { valuename = value; }
● Fields
Each field is defined as: <rule> <type> <name> = <id> {[<options>]};
Rules: required, optional, repeated
Generated Code
MESSAGES
• Immutable (Person.java)
BUILDERS
• (Person.Builder.java)
ENUMS & NESTED CLASSES
• Person.PhoneType.MOBILE
• Person.PhoneNumber
PARSING & SERIALIZATION
• writeTo(final OutputStream output)
• parseFrom(byte[] data), parseFrom(java.io.InputStream input)
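The shape of the generated API can be illustrated with a hand-written stand-in. This is only a sketch and not real compiler output: `PersonSketch` and its two fields are hypothetical, and the byte encoding uses `DataOutputStream` as a placeholder rather than the actual protocol buffer wire format. The method names (`newBuilder`, setters, `build`, `writeTo`, `parseFrom`) mirror those of generated Java classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hand-written stand-in for a generated message class (NOT protoc output).
public final class PersonSketch {
    private final String name;
    private final int id;

    // Messages are immutable: fields are final and set only via the builder.
    private PersonSketch(Builder b) {
        this.name = b.name;
        this.id = b.id;
    }

    public String getName() { return name; }
    public int getId() { return id; }

    public static Builder newBuilder() { return new Builder(); }

    // The builder is the mutable counterpart used to assemble a message.
    public static final class Builder {
        private String name = "";
        private int id;

        public Builder setName(String name) { this.name = name; return this; }
        public Builder setId(int id) { this.id = id; return this; }
        public PersonSketch build() { return new PersonSketch(this); }
    }

    // Placeholder encoding; the real wire format is different.
    public void writeTo(OutputStream output) throws IOException {
        DataOutputStream out = new DataOutputStream(output);
        out.writeUTF(name);
        out.writeInt(id);
        out.flush();
    }

    public static PersonSketch parseFrom(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String name = in.readUTF();
        int id = in.readInt();
        return newBuilder().setName(name).setId(id).build();
    }

    public static void main(String[] args) throws IOException {
        PersonSketch p = newBuilder().setName("Ada").setId(7).build();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        p.writeTo(bytes);
        PersonSketch q = parseFrom(bytes.toByteArray());
        System.out.println(q.getName() + " " + q.getId());
    }
}
```

With real generated code the calling pattern is the same: build a message through its builder, serialize with `writeTo`, and reconstruct with `parseFrom`.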
Backward / Forward Compatibility
DO NOT change the tag numbers of any existing fields.
You can delete optional or repeated fields, but you must not add or delete any required fields.
Backward / Forward Compatibility
When adding a new field, you must use a fresh tag number (i.e., a tag number that was never used in this protocol buffer, not even by deleted fields).
A good practice: mark your deleted fields as reserved. The protocol buffer compiler complains if reserved fields are used.
Backward / Forward Compatibility
Changing a default value is generally OK, but remember that default values are never sent over the wire.
[Figure: Sender → Receiver]
The receiver reads the value as 20 (its own default) if the sender does not send it.
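The effect of mismatched defaults can be sketched without the protobuf library. In the simulation below, a `Map` from tag number to value stands in for the wire, and an unset field is simply absent from it. The field (a timeout), its tag number, and the sender's default of 10 are assumptions made for illustration; only the receiver's default of 20 comes from the slide.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultValueDemo {
    static final int TIMEOUT_TAG = 1;       // hypothetical tag number
    static final int SENDER_DEFAULT = 10;   // assumed default in the sender's schema
    static final int RECEIVER_DEFAULT = 20; // default in the receiver's schema

    // Sender side: a field that was never set is not written to the wire.
    static Map<Integer, Integer> encode(Integer timeout) {
        Map<Integer, Integer> wire = new HashMap<>();
        if (timeout != null) {
            wire.put(TIMEOUT_TAG, timeout);
        }
        return wire;
    }

    // What the sender itself sees when it reads the unset field.
    static int senderView(Integer timeout) {
        return timeout != null ? timeout : SENDER_DEFAULT;
    }

    // Receiver side: a missing field falls back to the receiver's own default.
    static int decode(Map<Integer, Integer> wire) {
        return wire.getOrDefault(TIMEOUT_TAG, RECEIVER_DEFAULT);
    }

    public static void main(String[] args) {
        Integer unset = null;
        System.out.println("sender sees:   " + senderView(unset));
        System.out.println("receiver sees: " + decode(encode(unset)));
    }
}
```

Both sides handle the same message, yet they disagree on the field's value. This is why a default change is only safe when no deployed program depends on the old default.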
Performance Comparison
http://homepages.lasige.di.fc.ul.pt/~vielmo/notes/2014_02_12_smalltalk_protocol_buffers.pdf
Possible Use Cases For Us?
● Java, C++, C#
● IBM MQ / Solace messages
● DB raw data
● Log messages to disk
● Show as XML / JSON
● exe utility associated with protobuf files
Use Cases at Barclays Investment Bank
http://www.slideshare.net/SergeyPodolsky/google-protocol-buffers-56085699
QUESTIONS?