Protocol Definitions
ProtoPoke can decode raw frames into named, typed fields using a YAML or JSON protocol definition file. This gives you Wireshark-style display with per-field colour-coded hex dumps and a nested field tree — without writing any code.
Overview
A protocol definition describes:
- The endianness of integer fields (big or little)
- A list of message types, each with:
  - A match rule to identify which frames belong to this type
  - A list of fields that describe the byte layout
The parser evaluates message definitions top-to-bottom and uses the first match, so put specific rules before catch-all rules.
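The ordering rule can be shown with a minimal Python sketch of first-match dispatch. `MessageDef`, `select_message`, and the lambda predicates are hypothetical stand-ins for compiled match rules, not ProtoPoke's internals. The sketch only shows why a catch-all placed first would shadow every definition after it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MessageDef:
    name: str
    matches: Callable[[bytes], bool]  # hypothetical predicate built from a match rule


def select_message(defs: List[MessageDef], frame: bytes) -> Optional[str]:
    """Return the name of the first definition whose rule matches, scanning top to bottom."""
    for d in defs:
        if d.matches(frame):
            return d.name
    return None


defs = [
    MessageDef("LoginRequest", lambda f: f[:1] == b"\x01"),  # like magic 0x01 at offset 0
    MessageDef("GenericPacket", lambda f: True),             # like match type: always
]
```

If you reverse `defs`, every frame resolves to `GenericPacket`. That is why catch-all rules belong at the end of the list.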
File Format
Create a .yaml or .json file:
```yaml
protocol:
  name: "MyProto"
  version: "1.0"        # optional
  endianness: big       # big or little

messages:
  - name: "LoginRequest"
    match:
      type: magic
      offset: 0
      value: "0x01"
    fields:
      - { name: opcode, type: uint8, display: hex }
      - { name: username_len, type: uint16 }
      - { name: username, type: string, length: "{username_len}" }

  - name: "GenericPacket"
    match:
      type: always
    fields:
      - { name: data, type: bytes, length: -1, display: hex }
```
Loading a Definition
In the GUI, open the Config tab, set Protocol Definition File to the path of your .yaml or .json file, and click Apply.

Over MCP, only read-only tools are available, because loading is an operator action:
```
get_protocol_definition          # inspect what is loaded
get_protocol_definition_schema   # YAML spec + example
```
The MCP server exposes only read-only protocol-definition tools. When an AI client wants to propose or update a definition, it emits the YAML in chat for the operator to save and load.
Match Strategies
Three strategies identify which message definition applies to a frame.
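magic: Match Bytes at a Fixed Offset
This is the most common strategy. It identifies packets by a fixed opcode or magic byte sequence at a given offset, as in the LoginRequest rule above (type: magic, offset: 0, value: "0x01").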
Accepted value formats: "0x10", 16, [0x10], "0x10 0x00", [0x10, 0x00].
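All five accepted spellings describe the same thing: a byte string compared against the frame at `offset`. The Python sketch below shows one way to normalise them. Two points are assumptions, not documented behaviour: a bare integer means a single byte, and string tokens are hex. `normalize_magic` and `magic_matches` are hypothetical names.

```python
from typing import List, Union

MagicValue = Union[str, int, List[int]]


def normalize_magic(value: MagicValue) -> bytes:
    """Convert any accepted `value` spelling into the exact bytes to compare."""
    if isinstance(value, int):
        return bytes([value])  # 16 -> b"\x10" (assumed to be a single byte)
    if isinstance(value, str):
        # "0x10 0x00" -> b"\x10\x00"; each whitespace-separated token parsed as hex
        return bytes(int(tok, 16) for tok in value.split())
    return bytes(value)  # [0x10, 0x00] -> b"\x10\x00"


def magic_matches(frame: bytes, offset: int, value: MagicValue) -> bool:
    """True when the frame carries the magic bytes starting at `offset`."""
    magic = normalize_magic(value)
    return frame[offset:offset + len(magic)] == magic
```

A frame shorter than `offset + len(magic)` simply fails to match, because the slice comes back too short.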
sequence: Match by Stream Position
Match frames by their position in the stream (useful for handshakes/banners):
```yaml
match:
  type: sequence
  direction: server_to_client   # which direction to count
  index: 0                      # 0-based position
```
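Because `direction` names the direction to count, `index` is a 0-based position among frames flowing in that direction only. The Python sketch below illustrates that reading, under an assumption the page does not state: one tracker per connection. `SequenceTracker` and `sequence_matches` are hypothetical helpers.

```python
from collections import defaultdict


class SequenceTracker:
    """Count frames per direction (one tracker per connection, assumed)."""

    def __init__(self) -> None:
        self._seen = defaultdict(int)

    def observe(self, direction: str) -> int:
        """Record one frame and return its 0-based index within that direction."""
        index = self._seen[direction]
        self._seen[direction] += 1
        return index


def sequence_matches(frame_index: int, rule: dict, direction: str) -> bool:
    """A sequence rule fires only for its direction and its position in that direction."""
    return direction == rule["direction"] and frame_index == rule["index"]


banner_rule = {"type": "sequence", "direction": "server_to_client", "index": 0}
```

With `banner_rule`, only the first server-to-client frame matches (typically a banner). A client frame, or the second server frame, does not.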
always: Catch-All
Always matches. Use it as the last entry (type: always, as in GenericPacket above) to catch anything not handled by earlier definitions.
Field Types
All fields share these common keys:

| Key | Required | Description |
|---|---|---|
| name | yes | Unique identifier; referenced in length expressions |
| type | yes | Field type (see below) |
| display | no | Rendering hint: auto, hex, ascii, decimal, enum |
Integers
```yaml
- name: opcode
  type: uint8        # uint8 | uint16 | uint32 | uint64 | int8 | int16 | int32 | int64
  display: hex
```
Sizes: 1, 2, 4, or 8 bytes. Endianness follows the top-level endianness setting.
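The integer types map directly onto Python's standard `struct` format codes, so a hedged sketch of decoding one field looks like this. `read_int` is a hypothetical helper, not part of ProtoPoke. The `>`/`<` prefix applies the top-level `endianness` and also selects standard sizes (1, 2, 4, 8 bytes).

```python
import struct

# Documented integer types -> struct format codes (standard sizes with > or <)
_INT_FORMATS = {
    "uint8": "B", "int8": "b",
    "uint16": "H", "int16": "h",
    "uint32": "I", "int32": "i",
    "uint64": "Q", "int64": "q",
}


def read_int(frame: bytes, offset: int, type_name: str, endianness: str = "big"):
    """Decode one integer field and return (value, bytes_consumed)."""
    prefix = ">" if endianness == "big" else "<"
    fmt = prefix + _INT_FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, frame, offset)
    return value, struct.calcsize(fmt)
```

For example, the bytes `00 05` decode to 5 as a big-endian uint16 but to 1280 as a little-endian one.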
Floats
Bytes
Raw byte sequence with a required length, for example { name: data, type: bytes, length: -1, display: hex } to capture the rest of the frame.
String
Decoded text:
```yaml
- name: username
  type: string
  length: "{username_len}"   # length in bytes
  encoding: utf8             # utf8 (default) | ascii | utf16
```
Null-terminated variant: use null_terminated: true in place of length to read up to the first \x00 byte.
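The two string forms differ mainly in how many bytes they consume. The Python sketch below shows both. `read_string` is a hypothetical helper. Mapping `utf16` to Python's `utf-16` codec is an assumption, and the null-terminated branch scans for a single `\x00`, so it is only meaningful for single-byte-unit encodings such as utf8 and ascii.

```python
from typing import Optional

_CODECS = {"utf8": "utf-8", "ascii": "ascii", "utf16": "utf-16"}  # utf16 mapping assumed


def read_string(frame: bytes, offset: int, encoding: str = "utf8",
                length: Optional[int] = None, null_terminated: bool = False):
    """Return (text, bytes_consumed) for a fixed-length or null-terminated string."""
    codec = _CODECS[encoding]
    if null_terminated:
        end = frame.find(b"\x00", offset)  # single-byte terminator (utf8/ascii)
        if end == -1:
            raise ValueError("no terminator before end of frame")
        return frame[offset:end].decode(codec), end - offset + 1  # +1 consumes the \x00
    if length is None:
        raise ValueError("string needs a length or null_terminated: true")
    raw = frame[offset:offset + length]  # length counts bytes, not characters
    if len(raw) != length:
        raise ValueError("frame shorter than declared length")
    return raw.decode(codec), length
```

Because `length` is in bytes, a two-byte UTF-8 character such as "é" needs `length: 2`.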
Padding
Skips alignment or reserved bytes; padding is neither parsed nor displayed.
Enum
Any integer field can carry named values:
```yaml
- name: status
  type: uint8
  display: enum
  enum:
    0x00: "Success"
    0x01: "Not Found"
    0xFF: "Unknown Error"
```
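Rendering an enum amounts to a table lookup keyed by the decoded integer. The Python sketch below assumes two things: that keys such as `0xFF` arrive as integers after loading, and that unknown values fall back to hex. The exact display string ProtoPoke produces is not documented here, so `render_enum` and its output format are hypothetical.

```python
def render_enum(value: int, enum: dict) -> str:
    """Show the named value when known, otherwise fall back to plain hex (assumed)."""
    name = enum.get(value)
    return f"{name} (0x{value:02X})" if name is not None else f"0x{value:02X}"


STATUS = {0x00: "Success", 0x01: "Not Found", 0xFF: "Unknown Error"}
```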
Bitfield
An integer decoded as individually named bits. The integer width is inferred from the highest bit index, rounded up to the nearest whole byte.
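The width rule works out to `highest_index // 8 + 1` bytes: bits 0 to 7 fit in one byte, and bit 8 forces a second. The Python sketch below applies that rule and extracts each flag. The `{bit_index: name}` mapping shape, bit 0 being the least significant bit, and the flag names are all assumptions for illustration. `bitfield_width` and `decode_bitfield` are hypothetical helpers.

```python
from typing import Dict, Tuple


def bitfield_width(bits: Dict[int, str]) -> int:
    """Bytes needed to hold the highest named bit index, rounded up to a whole byte."""
    return max(bits) // 8 + 1


def decode_bitfield(frame: bytes, offset: int, bits: Dict[int, str],
                    endianness: str = "big") -> Tuple[Dict[str, bool], int]:
    """Return ({bit_name: is_set}, bytes_consumed); bit 0 is assumed to be the LSB."""
    width = bitfield_width(bits)
    raw = int.from_bytes(frame[offset:offset + width], endianness)
    flags = {name: bool((raw >> index) & 1) for index, name in bits.items()}
    return flags, width


FLAGS = {0: "SYN", 1: "ACK", 9: "URGENT"}  # hypothetical bit names
```

Here bit 9 is the highest index, so `FLAGS` implies a 2-byte integer.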
Array
Counted sequence of identical sub-structures:
```yaml
- name: records
  type: array
  array:
    count: "{record_count}"
    item:
      - { name: id, type: uint32 }
      - { name: name_len, type: uint8 }
      - { name: name, type: string, length: "{name_len}" }
```
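The array above is a loop that runs `record_count` times, where each item's `name` length comes from that same item's `name_len`. The Python sketch below hard-codes that item layout for clarity, assuming big-endian integers and that `{name_len}` resolves against the current item. `parse_records` is a hypothetical helper, not ProtoPoke's generic array parser.

```python
import struct


def parse_records(frame: bytes, offset: int, record_count: int):
    """Decode `record_count` items of {id: uint32, name_len: uint8, name: string}."""
    records = []
    for _ in range(record_count):
        (rec_id,) = struct.unpack_from(">I", frame, offset)  # id: uint32, big-endian
        name_len = frame[offset + 4]                          # name_len: uint8
        start = offset + 5
        name = frame[start:start + name_len].decode("utf-8")  # name uses this item's name_len
        records.append({"id": rec_id, "name_len": name_len, "name": name})
        offset = start + name_len                             # next item starts here
    return records, offset
```

The returned offset is where the field after the array would begin.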
TLV Sequence
Stream of Type-Length-Value triples:
```yaml
- name: attributes
  type: tlv_sequence
  length: "{total_length - 5}"
  tlv:
    type_size: 2
    length_size: 2
    endianness: big
    tags:
      0x0001:
        name: "UserID"
        value_type: uint32
      0x0002:
        name: "Username"
        value_type: string
        encoding: utf8
```
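A TLV sequence repeatedly reads a `type_size`-byte tag and a `length_size`-byte length, then takes that many value bytes. The outer `length` expression bounds the whole slice. The Python sketch below assumes the length counts value bytes only (not the header) and keeps unknown tags as raw bytes. `parse_tlv` and the `tag_0x....` naming are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_tlv(data: bytes, type_size: int = 2, length_size: int = 2,
              endianness: str = "big",
              tags: Optional[Dict[int, dict]] = None) -> List[Tuple[str, Any]]:
    """Split `data` (already bounded by the field's length) into (name, value) pairs."""
    tags = tags or {}
    items = []
    pos = 0
    while pos < len(data):
        header_end = pos + type_size + length_size
        if header_end > len(data):
            raise ValueError("truncated TLV header")
        tag = int.from_bytes(data[pos:pos + type_size], endianness)
        length = int.from_bytes(data[pos + type_size:header_end], endianness)
        value = data[header_end:header_end + length]  # length assumed to cover the value only
        if len(value) != length:
            raise ValueError("truncated TLV value")
        spec = tags.get(tag)
        if spec is None:
            items.append((f"tag_0x{tag:04X}", value))  # unknown tag: keep raw bytes
        elif spec["value_type"] == "string":
            items.append((spec["name"], value.decode(spec.get("encoding", "utf8"))))
        elif spec["value_type"].startswith("uint"):
            items.append((spec["name"], int.from_bytes(value, endianness)))
        else:
            items.append((spec["name"], value))
        pos = header_end + length
    return items


TAGS = {
    0x0001: {"name": "UserID", "value_type": "uint32"},
    0x0002: {"name": "Username", "value_type": "string", "encoding": "utf8"},
}
```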
Length Expressions
The length and count keys accept several formats:

| Format | Example | Meaning |
|---|---|---|
| Fixed integer | 4 | Always 4 bytes |
| Field reference | "{payload_len}" | Value of a previously parsed field |
| Arithmetic | "{total_length - 5}" | Computed from field values |
| Rest of frame | -1 | Consume all remaining bytes |
| Null terminated | null_terminated: true | Scan until \x00 |
Expressions support +, -, *, // and builtins min(), max(), abs(), int(). Field names in {} are substituted by their parsed integer value. Evaluation is sandboxed.
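One way to get the sandboxed behaviour described above is to parse the expression with Python's `ast` module and evaluate only a whitelist of node types, rather than calling `eval`. The sketch below does that. It is not ProtoPoke's evaluator: `eval_length` is hypothetical, and treating `{a - 5}` and `{a} - 5` as equivalent is an assumption. Anything outside `+ - * //`, integer literals, known field names, and the four builtins raises `ValueError`.

```python
import ast
import operator
import re

_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_BUILTINS = {"min": min, "max": max, "abs": abs, "int": int}


def eval_length(expr, fields):
    """Resolve a length/count value against already-parsed integer fields."""
    if isinstance(expr, int):
        return expr  # fixed length, or -1 for "rest of frame"
    # "{total_length - 5}" -> "(total_length - 5)"; "{a} * 2" -> "(a) * 2"
    source = re.sub(r"\{([^{}]*)\}", r"(\1)", expr)
    return _eval(ast.parse(source, mode="eval").body, fields)


def _eval(node, fields):
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.Name) and node.id in fields:
        return fields[node.id]  # substitute the parsed integer value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, fields), _eval(node.right, fields))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand, fields))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _BUILTINS and not node.keywords):
        return _BUILTINS[node.func.id](*(_eval(a, fields) for a in node.args))
    raise ValueError(f"disallowed expression element: {ast.dump(node)}")
```

Attribute access, unknown names, `/`, and calls such as `__import__(...)` are all rejected before anything executes.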
Working with Parsed Messages
```python
# Run inside an async context: get_next_intercepted_parsed() is awaited.
# Get the parsed message for the next intercepted frame
unit, msg = await api.get_next_intercepted_parsed()
print(msg.message_type)    # e.g. "LoginRequest"
print(msg.protocol_name)   # e.g. "MyProto"

# Access a field by name
f = msg.field_by_name("username")
print(f.value)             # Python value
print(f.display_value)     # rendered string
print(f.offset)            # byte offset in frame
print(f.size)              # bytes consumed

# All fields as a flat dict
print(msg.as_dict())       # {"opcode": 1, "username": "admin", ...}

# Forward with a field edit (length fields are recomputed automatically)
api.modify_field_and_forward(unit.id, {"username": "hacker"})
```
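"Length fields auto-recomputed" means that editing `username` must also update `username_len`, or the receiver would misparse the frame. The Python sketch below illustrates that idea for the LoginRequest layout from the File Format example (uint8 opcode, big-endian uint16 length, UTF-8 username). `encode_login` is a hypothetical helper showing the effect, not how `modify_field_and_forward` is implemented.

```python
import struct


def encode_login(opcode: int, username: str) -> bytes:
    """Serialise LoginRequest, deriving username_len from the encoded username."""
    user = username.encode("utf-8")
    return struct.pack(">BH", opcode, len(user)) + user


original = encode_login(0x01, "admin")   # username_len = 5
edited = encode_login(0x01, "hacker")    # username_len recomputed to 6
```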
Iterative Definition Building
Reverse engineering a protocol is incremental. Start minimal and expand:
Pass 1: split the opcode from the payload.
```yaml
protocol:
  name: "Unknown"
  endianness: big

messages:
  - name: "Packet"
    match:
      type: always
    fields:
      - { name: opcode, type: uint8, display: hex }
      - { name: rest, type: bytes, length: -1, display: hex }
```
Pass 2: add a specific message type to messages, ahead of the catch-all.
```yaml
- name: "LoginRequest"
  direction: client_to_server
  match:
    type: magic
    offset: 0
    value: "0x01"
  fields:
    - { name: opcode, type: uint8 }
    - { name: username_len, type: uint16 }
    - { name: username, type: string, length: "{username_len}" }
    - { name: password_len, type: uint16 }
    - { name: password, type: bytes, length: "{password_len}", display: hex }

- name: "Packet"
  match:
    type: always
  fields:
    - { name: opcode, type: uint8, display: hex }
    - { name: rest, type: bytes, length: -1, display: hex }
```
Pass 3: keep adding message types until no rest placeholders remain.
Examples
See the included example protocol definitions:
- examples/protocols/chat.proto.yaml: a fictional chat protocol covering all field types (enum, bitfield, array, TLV)
- examples/protocols/dns.proto.yaml: DNS protocol definition