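The "An Engineer Among Analysts" series
In this series I share the day-to-day work, techniques, and thinking from my job as a developer in the analytics department, for your reference.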

Why

Why use Protocol Buffers? Consider this question:
How do you serialize and retrieve structured data?

Use Python pickling. This is the default approach since it’s built into the language, but it doesn’t deal well with schema evolution, and also doesn’t work very well if you need to share data with applications written in C++ or Java.
You can invent an ad-hoc way to encode the data items into a single string – such as encoding 4 ints as “12:3:-23:67”. This is a simple and flexible approach, although it does require writing one-off encoding and parsing code, and the parsing imposes a small run-time cost. This works best for encoding very simple data.
Serialize the data to XML. This approach can be very attractive since XML is (sort of) human readable and there are binding libraries for lots of languages. This can be a good choice if you want to share data with other applications/projects. However, XML is notoriously space intensive, and encoding/decoding it can impose a huge performance penalty on applications. Also, navigating an XML DOM tree is considerably more complicated than navigating simple fields in a class normally would be.
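To make the ad-hoc option concrete, here is a minimal sketch of the "12:3:-23:67" encoding mentioned above. The function names `encode_ints` and `decode_ints` are hypothetical, chosen only for illustration. It shows the one-off encoding and parsing code that approach requires.

```python
def encode_ints(values):
    """Join integers into one colon-separated string, e.g. '12:3:-23:67'."""
    return ":".join(str(v) for v in values)


def decode_ints(text):
    """Parse the string back into a list of ints; raises ValueError on bad input."""
    return [int(part) for part in text.split(":")]


encoded = encode_ints([12, 3, -23, 67])
decoded = decode_ints(encoded)
print(encoded)  # 12:3:-23:67
print(decoded)  # [12, 3, -23, 67]
```

Nothing in the string says what each number means, so writer and reader must agree on field order out of band. That is the part a .proto description makes explicit.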

What

Protocol buffers are the flexible, efficient, automated solution to solve exactly this problem. With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.
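The point about extending the format over time can be illustrated with a hand-rolled reader. This is a simplified sketch in pure Python, not the library's implementation: it assumes only two wire types (varints and length-delimited values), and the helper names `read_varint` and `parse_known_fields` are hypothetical. Because every value on the wire is prefixed with its field number, a reader can ignore numbers it does not know and treat missing ones as unset.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known_fields(buf, known):
    """Return {name: value} for field numbers in `known`, skipping all others."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:    # length-delimited (strings, bytes, sub-messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        if field_number in known:
            fields[known[field_number]] = value
    return fields


# Old data: only name (tag 1) and id (tag 2) were written.
old_data = bytes.fromhex("0a05416c696365107b")
# New data: a later writer also sets email (tag 3).
new_data = bytes.fromhex("0a05416c696365107b1a03614062")

# A newer reader that knows about email can still read the old data.
print(parse_known_fields(old_data, {1: "name", 2: "id", 3: "email"}))
# {'name': b'Alice', 'id': 123}

# An older reader simply skips the tag it does not recognise.
print(parse_known_fields(new_data, {1: "name", 2: "id"}))
# {'name': b'Alice', 'id': 123}
```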

How

Simple - Defining Protocol Format

tutorial.proto

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}

men.proto

import "tutorial.proto";

message men {
    required tutorial.Person person = 1;
}

Notes:

Field modifiers:
- required
- optional
- repeated
Scalar Value Types (See the table below for details)

[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[2] In all cases, setting values to a field will perform type checking to make sure it is valid.

[3] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[4] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).

The assigned numbers 1, 2, 3 are unique numbered tags:
- Tags in the range 1 through 15 take one byte to encode
- Tags in the range 16 through 2047 take two bytes
- The smallest tag number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911. You also cannot use the numbers 19000 through 19999 (FieldDescriptor::kFirstReservedNumber through FieldDescriptor::kLastReservedNumber).
Default value: [default = HOME]
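The one-byte and two-byte ranges for tags follow from how a field key is written on the wire: the tag number is shifted left by three bits, combined with a 3-bit wire type, and stored as a base-128 varint. The sketch below reproduces that arithmetic in pure Python; `encode_varint` and `key_size` are hypothetical helper names, not part of the protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (7 bits per byte, low group first)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def key_size(field_number, wire_type=0):
    """Bytes needed for a field key: (field_number << 3) | wire_type, as a varint."""
    return len(encode_varint((field_number << 3) | wire_type))


for n in (1, 15, 16, 2047, 2048):
    print(n, key_size(n))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```

One byte carries 7 payload bits, 3 of which go to the wire type, leaving 4 bits for the tag (1 through 15). Two bytes carry 14 bits, leaving 11 for the tag (up to 2047). This is why the most frequently used fields are best given tags below 16.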

Compiling Protocol buffers

Installing the compiler

Download the latest release from the GitHub Releases page; the C++ compiler is recommended.

Install:

$ ./configure
$ make
$ make check
$ sudo make install
$ sudo ldconfig # refresh shared library cache.

Test:

$ protoc
Missing input file.

Compiling the file

$ protoc --python_out=. tutorial.proto 
$ ls
tutorial_pb2.py  tutorial.proto

The Protocol Buffer API

Standard Message Methods

IsInitialized(): checks if all the required fields have been set.
str(): returns a human-readable representation of the message, particularly useful for debugging. (Usually invoked as str(message) or print message.)
CopyFrom(other_msg): overwrites the message with the given message’s values.
Clear(): clears all the elements back to the empty state.

Parsing and Serialization

SerializeToString(): serializes the message and returns it as a string. Note that the bytes are binary, not text; we only use the str type as a convenient container.
ParseFromString(data): parses a message from the given string.
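To show what "binary, not text" means in practice, the following sketch hand-encodes a Person with name = "Alice" (tag 1) and id = 123 (tag 2) in the protocol buffer wire format. It is an illustration in pure Python under simplifying assumptions (non-negative integers and string fields only), with hypothetical helper names. In real code the generated class's SerializeToString() does this work.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number, text):
    """Wire type 2 (length-delimited): key, byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_int_field(field_number, value):
    """Wire type 0 (varint); this sketch handles non-negative values only."""
    return encode_varint(field_number << 3) + encode_varint(value)


# Person { name = "Alice" (tag 1), id = 123 (tag 2) }
payload = encode_string_field(1, "Alice") + encode_int_field(2, 123)
print(payload.hex())  # 0a05416c696365107b
print(len(payload))   # 9
```

The result is nine bytes, only five of which are readable ASCII. This is why the serialized value should be treated as opaque bytes rather than as text.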

Reading notes on: Protocol Buffers In Python