IPFS ping protocol

A user on freenode#ipfs did it in Lua, exactly that: just ping.

Good find! That’s not just ping, though; it’s a libp2p implementation (or at least the beginnings of one) in Lua. And that’s also not just an enthusiastic IPFS user – that’s one of the core go-ipfs devs. So he probably had at least a little head start in terms of understanding how the go-ipfs implementation works :).

The readme summarizes nicely what you would need to do to reimplement it. However, it sounds like you’d need to implement a little multistream functionality first before you can ping.

Implement ping

Open stream, multistream negotiate /ipfs/ping/1.0.0
Write a random 32 byte value
read back that same 32 byte value
time that process
repeat as often as needed

Yeah. Well I hope I’ll manage (and find free time) to do it.
I think the protocol is just not that nicely documented, so I’ll try to also write it down.

I was thinking that my end goal would be a tiny/basic implementation for ARM devices (R.Pi, routers etc.) and maybe even IoT devices, e.g. ESP8266/ESP32. That would be interesting and useful, but that’s fairly ambitious.

If I just redo the ping so that I fully understand it and nicely document it, I think I would be fairly happy. :)

So, my progress so far …

I’m doing it first in PHP, because it’s simple (to me), then I’ll try to do it in C.

I start the communication, switch to unencrypted, but then I’m stuck with how to start multiplexing. I think I somehow write the wrong headers or something. :’(

The following works:

After “/plaintext/1.0.0” I send: \0b00001000 \0x01 \x0A
Which by my logic would be how to initiate the multiplexing:

  • first character (from the right): 000 = new stream, 00001 = stream id
  • second character (length of data): 1, i.e. 1 byte
  • third character (data): just a new line

I get back: \0x13/multistream/1.0.0\n

  • first character \0x13 is 19, the length of the data
  • next is the data

Anything I send after this with a header \b00001010 (message initiator, stream id=1) or \b00001001 (message receiver, stream id=1), or without a header, causes the connection to drop.
For example I tried both headers (or without one) + \0x13/multistream/1.0.0\n and it drops. I’ve tried also with /ipfs/ping/1.0.0, of course with the length before it.

What am I doing wrong?

I ran two IPFS nodes with encryption disabled (--disable-transport-encryption), added the nodes to the bootstrap list (otherwise it doesn’t work) and captured the traffic with Wireshark.

You add to bootstrap like this:
ipfs bootstrap add /ip4/[ip where to connect]/tcp/[port]/ipfs/[peer id]

Example:
ipfs bootstrap add /ip4/10.0.0.1/tcp/4001/ipfs/QmaRmsQ2BfJFqRkqgd3MEPTgP3vrvwh5FTWPxf1irF1jtR

I started sniffing before starting the IPFS nodes, then started them, ran ping while sniffing, and then stopped the capture.

I’ve saved the pcapng dump on IPFS, https://ipfs.io/ipfs/QmVYdhiH4XW32fdMWptwFVFhYKjsfn54xUPhXZTNZC8zPY

Seems that it’s a bit more complex.

I also don’t understand why /multistream/1.0.0 gets repeated so often. Seems like a lot of overhead.

Ok, according to this https://github.com/multiformats/multistream-select, you can also use ls\n at various points to get the available options.

I’ll use bold for received messages, italic for transmitted. Everything has a byte (varint) at the beginning, with the length of the message (including newline) written in binary. Lines are ended with newline i.e. \n.
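
As a rough sketch of this framing (Python; the function names `ms_encode` and `ms_decode` are made up for illustration, and the length is assumed to fit in a single byte, which holds for every message in this thread):

```python
def ms_encode(protocol: str) -> bytes:
    """Frame one multistream message as [length][protocol]\\n."""
    body = protocol.encode("ascii") + b"\n"
    if len(body) > 127:
        # Longer messages would need a multi-byte varint; not handled here.
        raise ValueError("message too long for a single length byte")
    return bytes([len(body)]) + body


def ms_decode(frame: bytes) -> str:
    """Inverse of ms_encode for one complete frame."""
    length = frame[0]
    body = frame[1:1 + length]
    if len(body) != length or not body.endswith(b"\n"):
        raise ValueError("truncated or malformed frame")
    return body[:-1].decode("ascii")


handshake = ms_encode("/multistream/1.0.0")
print(handshake.hex(" "))
```

The length byte counts the trailing newline, so `/multistream/1.0.0` (18 characters) is sent with a length of 19.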

So it seems that the handshake needs to be in form of:
/multistream/1.0.0
/multistream/1.0.0

Anything else fails: closes socket.

If we do the handshake described above, we can send ls. If we have the normal ipfs daemon, with encryption on, then we get the only option /secio/1.0.0; if not, we get /plaintext/1.0.0, which is what I got with my unencrypted tests.

We have to repeat that back.

After we repeat that back, we send ls again. We get the only option, which is (again) /multistream/1.0.0.

We have to repeat that back.

After we repeat that back, we send ls again. We get two options, each on its own line, but sent in one piece. After the first byte indicating length (like any other message so far) we get another byte indicating the number of entries. Then each entry has its own preceding byte for length.
/yamux/1.0.0
/mplex/6.7.0

With added extra bytes (including length, newlines):
\0x1D \0x02 \0x0D /yamux/1.0.0 \0x0A
\0x0D /mplex/6.7.0 \0x0A

  • \0x0A is new line
  • \0x1D is 29 which is the number of characters in the whole response (not counting this byte for length).
  • \0x02 is number of entries, 2 in this case
  • \0x0D is 13, which is the number of characters of /yamux/1.0.0 including new line.
  • \0x0D is the same thing just for mplex.

I didn’t notice the same behaviour with the earlier ls calls, which suggests that when there is only one option, the number of entries is not given.
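
A small sketch of decoding such a multi-entry ls reply, in Python. The function name `parse_ls` is hypothetical, and it assumes exactly the layout observed above (total length, entry count, then length-prefixed entries, all in single bytes); the single-option case is not covered.

```python
from typing import List


def parse_ls(response: bytes) -> List[str]:
    """Decode [total][count]([len][entry\\n])* into a list of protocol names."""
    total = response[0]
    body = response[1:1 + total]
    if len(body) != total:
        raise ValueError("truncated response")
    count = body[0]
    entries = []
    pos = 1
    for _ in range(count):
        n = body[pos]
        entry = body[pos + 1:pos + 1 + n]
        if len(entry) != n:
            raise ValueError("truncated entry")
        entries.append(entry.rstrip(b"\n").decode("ascii"))
        pos += 1 + n
    return entries


reply = b"\x1d\x02\x0d/yamux/1.0.0\n\x0d/mplex/6.7.0\n"
print(parse_ls(reply))
```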

Today’s research.

I’ll write again bold for received messages, italic for transmitted.
In the first part I’ll omit the preceding binary byte of message length for clarity. Every line also ends with \n, which I’ll omit for the same reason. I’m still using --disable-transport-encryption.

So the handshake so far goes like this:
/multistream/1.0.0
/multistream/1.0.0
/plaintext/1.0.0
/plaintext/1.0.0
/multistream/1.0.0
/multistream/1.0.0
/mplex/6.7.0
/mplex/6.7.0

After you execute the last line you get a bunch of data which is in a different format now. The format is found here: https://github.com/libp2p/specs/tree/master/mplex
Mplex in short is a protocol for stream multiplexing, which in simple terms is how to group a bunch of requests into a single connection.

First a bit of intro:

The format is such, that first you get an extra header byte, then the rest is like before. Note that there are then two data length bytes one after the other, because of “nesting” of protocols i.e. one in the other. One comes from mplex the other from multistream.

  • 1st byte is the header byte.
  • 2nd byte is the data length byte.
  • The data after that is data length bytes long.

Or in graphical form:
[header] [strlen(data) in binary] [data]

The header is in a special binary format where the 3 rightmost bits are the type (or flag) of the header and the 5 leftmost bits are the ID of the stream.

Visually this would be like this, where D stands for ID, T stands for type:
D D D D D T T T

The left part, ID, is easy to understand. The 5 bits just represent a number, which is used to identify a stream. 5 bits gives us 2^5 = 32 possible streams.

A few real examples, where I’ve replaced D with actual values. T aren’t important for now.
0 0 0 0 0 T T T is for example stream ID 0
0 0 0 0 1 T T T is for example stream ID 1
0 0 0 1 0 T T T is for example stream ID 2

When you get the header byte, the easiest way to extract the stream ID is to use the bitwise shift operator, which shifts bits by X places. In this case we need to shift right by 3 places, so we use header >> 3.

That gives us for the first 3 examples:
0 0 0 0 0 0 0 0 for stream ID 0
0 0 0 0 0 0 0 1 for stream ID 1
0 0 0 0 0 0 1 0 for stream ID 2

Now we’ll focus on the Ts, i.e. type or flag of the header.

There are 3 bits, that gives us 2^3 = 8 possible flags, but just 7 are defined in the protocol.
D D D D D 0 0 0 (0) NewStream initiating a new stream, that one that sends this becomes the initiator
D D D D D 0 0 1 (1) MessageReceiver data sent by the one who didn’t initiate the stream
D D D D D 0 1 0 (2) MessageInitiator data sent by the one who initiated the stream
D D D D D 0 1 1 (3) CloseReceiver closing the stream by the one who didn’t initiate the stream
D D D D D 1 0 0 (4) CloseInitiator closing the stream by the one who initiated the stream
D D D D D 1 0 1 (5) ResetReceiver resetting the stream by the one who didn’t initiate the stream
D D D D D 1 1 0 (6) ResetInitiator resetting the stream by the one who did initiate the stream

We can see a pattern where the rightmost bit is 0 for the initiator and 1 for the receiver. This also shows in the decimal numbers: the initiator always has even numbers and the receiver odd ones.

To get the type (flag) from the header we use a bitwise AND with the number 7, which is 111 in binary: header & 0x07
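
The two operations together, as a short Python sketch. The name `decode_header` and the `MPLEX_FLAGS` table are only illustrative; the flag names are the ones listed above.

```python
from typing import Tuple

MPLEX_FLAGS = {
    0: "NewStream",
    1: "MessageReceiver",
    2: "MessageInitiator",
    3: "CloseReceiver",
    4: "CloseInitiator",
    5: "ResetReceiver",
    6: "ResetInitiator",
}


def decode_header(header: int) -> Tuple[int, int]:
    """Split a one-byte mplex header into (stream_id, flag)."""
    stream_id = header >> 3   # drop the 3 flag bits
    flag = header & 0x07      # keep only the 3 flag bits
    return stream_id, flag


for h in (0b00000000, 0b00001000, 0b00001010):
    stream_id, flag = decode_header(h)
    print(f"{h:08b} -> stream {stream_id}, {MPLEX_FLAGS[flag]}")
```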

So what did I get?

IPFS, after doing the /mplex/6.7.0 handshake, returns 3 streams. This is manifested with getting the following things all at once:

initiate stream 0
initiate stream 1
initiate stream 2
stream0: /multistream/1.0.0
stream1: /multistream/1.0.0
stream2: /multistream/1.0.0

I’ll present it here also in raw, but I’ll present it like this:
1st byte in binary (header)
2nd byte in hexadecimal (length)
3rd byte in ASCII character (stream name)

initiate stream 0: 00000000 0x01 0
initiate stream 1: 00001000 0x01 1
initiate stream 2: 00010000 0x01 2

Now I’ll do it the same as before, just that the 3rd part will be length byte of the nested protocol + ASCII data (stream data). \n stands for new line character.
stream 0: 00000010 0x14 0x13 /multistream/1.0.0\n
stream 1: 00001010 0x14 0x13 /multistream/1.0.0\n
stream 2: 00010010 0x14 0x13 /multistream/1.0.0\n

As we can see we first got initiate stream type headers, i.e. type 0 with stream IDs 0, 1, 2. Then next we got message initiator type headers, i.e. type 2 with IDs 0, 1, 2.

We see that we first have 0x14, which is 20, i.e. strlen([length byte] + /multistream/1.0.0\n), and then the next byte is 0x13, which is 19, i.e. strlen(/multistream/1.0.0\n).
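
To check a capture like this by machine, the buffer can be walked frame by frame. Below is a hedged Python sketch: `parse_mplex_frames` is a made-up name, and both the header and the length are assumed to fit in one byte each, as they do in the data above.

```python
from typing import List, Tuple


def parse_mplex_frames(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Split a buffer into (stream_id, flag, payload) tuples."""
    frames = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated frame header")
        header, length = data[pos], data[pos + 1]
        payload = data[pos + 2:pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated frame payload")
        frames.append((header >> 3, header & 0x07, payload))
        pos += 2 + length
    return frames


# The six frames described above, concatenated into one buffer.
captured = (
    b"\x00\x01" b"0"
    b"\x08\x01" b"1"
    b"\x10\x01" b"2"
    b"\x02\x14" b"\x13/multistream/1.0.0\n"
    b"\x0a\x14" b"\x13/multistream/1.0.0\n"
    b"\x12\x14" b"\x13/multistream/1.0.0\n"
)

for stream_id, flag, payload in parse_mplex_frames(captured):
    print(stream_id, flag, payload)
```

The payload of the last three frames still carries the inner multistream length byte (0x13), which is the "nesting" mentioned earlier.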

I’ll write again bold for received messages, italic for transmitted.

This time I’ll not omit the length byte nor the new lines (which I’ll mark with \n), because the protocol seems to be more complex regarding the use of that. I’ll write binary bytes in hex form, e.g. 2 bytes one after the other I’ll write as 0x00 0x00.

I’m still using --disable-transport-encryption.

So the handshake so far goes like this:
0x13 /multistream/1.0.0\n
0x13 /multistream/1.0.0\n
0x11 /plaintext/1.0.0\n
0x11 /plaintext/1.0.0\n
0x13 /multistream/1.0.0\n
0x13 /multistream/1.0.0\n
0x0D /mplex/6.7.0\n
0x0D /mplex/6.7.0\n
0x00 0x01 0
0x08 0x01 1
0x10 0x01 2
0x02 0x14 0x13 /multistream/1.0.0\n
0x0A 0x14 0x13 /multistream/1.0.0\n
0x12 0x14 0x13 /multistream/1.0.0\n

What each part means is explained in the previous posts.

Now in order to ping we must open a new stream and we can’t use the 3 ones opened by the IPFS on the other side as each has a separate function. We can just ignore those streams in order to just ping, but I didn’t at first and instead I tried to play with them.

In order to communicate with each of the streams opened by the IPFS we have to generate messages with proper headers of type (flag) 1, i.e. MessageReceiver and the ID.
We can generate the header byte in the reverse of how we decode it, using bitwise operations: header = (id << 3) | flag. So we first shift the ID 3 bits to the left and then bitwise OR it with the type (flag).

So each message we send has to have the header byte, but also two different length data bytes, along with the data itself, i.e. [header] [strlen(data) + 1] [strlen(data)] [data].
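
A sketch of building such a negotiation frame in Python. The helper names are hypothetical. One assumption beyond the text: the mplex spec describes the header as a varint, so this sketch only accepts stream IDs below 16 to keep the header a plain single byte.

```python
NEW_STREAM = 0
MESSAGE_RECEIVER = 1
MESSAGE_INITIATOR = 2


def mplex_header(stream_id: int, flag: int) -> int:
    """Pack stream ID and flag into one header byte."""
    if not 0 <= stream_id < 16:
        raise ValueError("sketch only supports stream IDs 0..15")
    if not 0 <= flag <= 6:
        raise ValueError("unknown flag")
    return (stream_id << 3) | flag


def negotiation_frame(stream_id: int, flag: int, text: str) -> bytes:
    """Build [header][len(inner)][len(line)][line] for a multistream line."""
    line = text.encode("ascii") + b"\n"
    inner = bytes([len(line)]) + line      # multistream layer
    if len(inner) > 127:
        raise ValueError("too long for single-byte lengths")
    return bytes([mplex_header(stream_id, flag), len(inner)]) + inner


print(negotiation_frame(0, MESSAGE_RECEIVER, "ls").hex(" "))
```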

I sent first the ls\n command to see if I’ll get anything useful.
0x01 0x04 0x03 ls\n (for stream 0)
0x09 0x04 0x03 ls\n (for stream 1)
0x11 0x04 0x03 ls\n (for stream 2)

What I got back were 2-byte responses with a length byte of 0x00 and a header carrying each stream ID; in two cases the type (flag) was 6 (ResetInitiator) and in one case 4 (CloseInitiator). Which stream answered with which was random, but there was always exactly 1 close and 2 resets.

After that I’ve sent to each the standard /multistream/1.0.0 back (with correct headers).

What I got in reply was the following each from a different stream ID (I’ll omit the headers, data bytes, newlines for clarity).
/libp2p/circuit/relay/0.1.0
/ipfs/kad/1.0.0
/ipfs/id/1.0.0

Which stream ID had which response was purely random.

To each of those I sent back again ls\n and same thing happened as before, only now I could identify which of those returns type 4 and which type 6. Type 4 (close) is returned by KAD and type 6 by the other 2.

Now let’s go to the ping protocol.
Again, I have to emphasize, that we don’t have to handshake the other streams. I just did it out of curiosity.

We have to open a new stream with stream ID for example 3, i.e. send a header with ID 3 and type (flag) 0. We get a multistream in response. Now we’re the initiator and IPFS is the receiver.

0x18 0x01 3
0x19 0x14 0x13 /multistream/1.0.0\n

We must respond with multistream back, note that header type is 2 MessageInitiator as we’re the initiator now.
0x1A 0x14 0x13 /multistream/1.0.0\n

Fun thing is that if we now send ls\n we get ls\n back, so theoretically we could already use this as ping. If we send anything else ending with \n we get back na; if we send something without a newline, we get back headers with flag 3 (close) and one with flag 5 (reset).

But let’s do it the proper way.

We send the same way as multistream /ipfs/ping/1.0.0 and we get the same thing back:
0x1A 0x12 0x11 /ipfs/ping/1.0.0\n
0x19 0x12 0x11 /ipfs/ping/1.0.0\n

Now we have to send 32 bytes of ping data (anything we want), but with just one length byte.
Before we were sending [header] [strlen(data) + 1] [strlen(data)] [data] for negotiation.
Now that we’re in ping mode we send it as [header] [strlen(data)] [data].

If we send anything less than 32 bytes, even if we mark it correctly in the length byte, it will just stall. If we send anything more it will just read the first 32 bytes and send them back.

So what does it look like?
0x1A 0x20 12345678901234567890123456789012
0x19 0x20 12345678901234567890123456789012

We can repeat this last message for multiple pings and of course measure the time in between. :)
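
The ping exchange itself could be sketched like this in Python. This is not the PHP script mentioned in this thread; `send_and_receive` is a hypothetical callable that writes a frame to the already negotiated connection and returns the reply, so the example swaps in a fake echo to stay self-contained.

```python
import os
import time

PING_SIZE = 32
MESSAGE_RECEIVER = 1
MESSAGE_INITIATOR = 2


def ping_frame(stream_id: int, payload: bytes) -> bytes:
    """[header][0x20][32 bytes]: only one length byte in ping mode."""
    if len(payload) != PING_SIZE:
        raise ValueError("ping payload must be exactly 32 bytes")
    return bytes([(stream_id << 3) | MESSAGE_INITIATOR, PING_SIZE]) + payload


def is_valid_pong(stream_id: int, payload: bytes, reply: bytes) -> bool:
    """The reply must echo the payload with a MessageReceiver header."""
    expected = bytes([(stream_id << 3) | MESSAGE_RECEIVER, PING_SIZE]) + payload
    return reply == expected


def timed_ping(send_and_receive, stream_id: int = 3) -> float:
    """Send one random ping and return the round-trip time in seconds."""
    payload = os.urandom(PING_SIZE)
    start = time.perf_counter()
    reply = send_and_receive(ping_frame(stream_id, payload))
    elapsed = time.perf_counter() - start
    if not is_valid_pong(stream_id, payload, reply):
        raise ValueError("reply does not match the ping payload")
    return elapsed


def fake_echo(frame: bytes) -> bytes:
    # Stand-in for the remote node: same stream, flag switched to receiver.
    return bytes([(frame[0] & ~0x07) | MESSAGE_RECEIVER]) + frame[1:]


print(f"round trip: {timed_ping(fake_echo) * 1000:.3f} ms")
```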

Example code that pings a locally started IPFS daemon with disabled encryption, i.e. ipfs daemon --disable-transport-encryption

A real example output from the script is here: https://github.com/seba1337/php-ipfs-ping/blob/master/ping-output.txt

Now that it works in plaintext, I’ve decided to tackle the real, encrypted version.

Starting the handshake with encryption is at first fairly easy. The handshake begins similarly to the plaintext handshake.

/multistream/1.0.0
/multistream/1.0.0
/secio/1.0.0
/secio/1.0.0

But then it gets tricky. It switches to a binary format. I’ll write all in hex.

From what I’ve analysed so far, we get 6 bytes which are always the same, no matter the node or connection attempt.

00 00 01 7c 0a 10

Next we get 16 bytes which are always different with each attempt of connection.

24 bf fc 18 f4 3d 97 01 9c aa 77 80 0c e8 98 ac

Next are 41 bytes which are always the same, no matter the node or connection attempt:

12 ab 02 08 00 12 a6 02 30 82 01 22 30 0d 06 09 2a 86 48 86 f7 0d 01 01 01 05 00 03 82 01 0f 00 30 82 01 0a 02 82 01 01 00

Next are 256 bytes which are different from node to node, but same for each connection attempt (probably public key):

d4 4d ef ff ce a0 0b af c9 df 75 e4 cf 51 31 91 c4 f9 eb 72 57 54 cc 0f ee b7 17 fe 08 c5 d2 79 8d 66 3d 3f da ff 94 24 65 77 ad d4 11 e5 0c 0d bf be e8 bf 33 a8 f0 a2 b8 0a ec 76 96 f6 09 da 13 ab 7c 56 58 08 c0 90 0f 8d 1f 56 7a 7c 3a 81 91 1a 46 95 e7 4f ec 28 f3 0c 47 aa cc 77 78 58 c8 6a 00 48 5a 39 b6 b8 0c 0d ab cd 92 b1 88 fa 53 3a c4 fd f9 6c 9a 30 46 c1 b2 3c c6 8b ed fa a4 0b af d3 27 57 30 d3 a1 19 91 ab 8a f4 be ae 1f 12 d6 a8 30 45 14 42 61 43 71 bf 5d 51 a2 8d 90 f9 6b 3a 64 f0 36 7a 22 75 1b 86 42 57 3f bd 1a 6b 73 42 cb e7 9a a1 f8 3b d4 74 42 c7 e4 67 4a a0 87 b7 f1 45 06 1e d9 2a 65 05 1a da 0e 63 1a 8d 13 3a d0 69 0a c5 5b cd 57 24 f2 9b a8 6e 78 d3 be b4 ea a6 76 da 8f 1b 2c 31 e6 91 9c 51 71 fd 47 ff 61 c1 c3 58 71 fc 2f fa 4e 1a 15 b9

And at last 65 bytes which are always the same no matter the node or connection attempt:

02 03 01 00 01 1a 11 50 2d 32 35 36 2c 50 2d 33 38 34 2c 50 2d 35 32 31 22 18 41 45 53 2d 32 35 36 2c 41 45 53 2d 31 32 38 2c 42 6c 6f 77 66 69 73 68 2a 0d 53 48 41 32 35 36 2c 53 48 41 35 31 32

Thanks to a user on ipfs@freenode I’ve received a work-in-progress (not yet public?) spec for secio.

The binary format should be protobuf, which is explained on the “Encoding” page of the Protocol Buffers documentation.

The serialization should be of this format:

Propose {
Rand: 16 secure random bytes,
Pubkey: public key bytes,
Exchanges: comma separated string of supported key exchanges,
Ciphers: comma separated string of supported ciphers,
Hashes: comma separated string of supported hashes,
}

Hey @sebaseba !

I have started working on the mplex part and your post helped me a lot, thank you for sharing :) ! I can relate to everything you said when “talking” with a local IPFS daemon (with encryption on).

However, when talking to one of the bootstrap peers of IPFS I see different results, it opens more than 3 streams and doesn’t send data every time. Here is a sample of what I get with peer 104.236.179.241:4001 :

data: <<1>>, flag: 0, flag_label: :new_stream, stream_id: 0
data: nil, flag: 0, flag_label: :new_stream, stream_id: 6
data: <<1>>, flag: 0, flag_label: :new_stream, stream_id: 1
data: nil, flag: 1, flag_label: :message_receiver, stream_id: 6
data: <<1>>, flag: 0, flag_label: :new_stream, stream_id: 2
data: nil, flag: 2, flag_label: :message_initiator, stream_id: 6
data: <<1>>, flag: 0, flag_label: :new_stream, stream_id: 3
data: nil, flag: 3, flag_label: :close_receiver, stream_id: 6

It doesn’t seem to behave consistently. I’ll try to make sense of all of that later, I’ll keep you posted :)

That’s nice. It’s always a pleasure to find out that somebody appreciated what I’ve shared.

I’m not sure what I’m looking at, but it seems like the remote node opens 4 streams, instead of 3, and you open a new stream (id=6), where you first send something as a receiver (wrong) and then as an initiator and at the end you close it?

It could be also that the data is longer? I simplified it in my code/explanation regarding length that it’s just 1 byte, but it’s a varint. It’s 1 byte up to 254 chars, above it gets more complicated and becomes 2 bytes (or more).

Or maybe it’s just (slightly) different protocol with bootstrap nodes. Who knows.

I mean IPFS is so huge and done by so many devs without a consistent standard/documentation in one place, which causes a bunch of tiny differences or rather inconsistencies that show only when you rewrite the whole thing. Like for example a silly one: how ls\n works when there are fewer than 2 options.

I’ll get to it once I’ve analyzed and properly understood secio. :)
Right now I’m somewhat busy. I’m on IRC: seba- #ipfs@freenode.net.

Last time we saw that the first 6 bytes are always the same, but what do they mean?
00 00 01 7c 0a 10

I went further today in trying to understand secio. So, when we do the initial secio handshake, what we get back is first 4 bytes indicating the length of the data, and the rest is protobuf-serialized data of the Propose type, which is defined here https://github.com/libp2p/go-libp2p-secio/blob/master/pb/spipe.proto

So the first 4 bytes (in hex),

00 00 01 7c

actually mean 380 which is the length of the data after the first 4 bytes (we receive a total of 384 bytes).

How do we get 380 out of those 4 bytes?

We treat the 4 bytes as a single binary number (big-endian uint32).
00 = 00000000 = 0
00 = 00000000 = 0
01 = 00000001 = 1
7c = 01111100 = 124

(01 7c) = (00000001 01111100) = 380

Or put another (more math/code) way:
len = [byte1] * 256^3 + [byte2] * 256^2 + [byte3] * 256^1 + [byte4] * 256^0
len = ([byte1] << 3*8) + ([byte2] << 2*8) + ([byte3] << 8) + [byte4]

In our case
len = 0 * 256^3 + 0 * 256^2 + 1 * 256^1 + 124 * 256^0
len = 1 * 256 + 124 * 1 = 256 + 124 = 380
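
The same calculation as a Python sketch, once by hand and once with the standard `struct` module (`read_length_prefix` is just an illustrative name):

```python
import struct


def read_length_prefix(buf: bytes) -> int:
    """Interpret the first 4 bytes as a big-endian unsigned 32-bit length."""
    if len(buf) < 4:
        raise ValueError("need at least 4 bytes")
    (length,) = struct.unpack(">I", buf[:4])
    return length


prefix = bytes.fromhex("0000017c")

# Manual version of the formula above, with explicit parentheses.
manual = (prefix[0] << 24) + (prefix[1] << 16) + (prefix[2] << 8) + prefix[3]

print(manual, read_length_prefix(prefix))
```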

So that gives us the length of data.

We read that much (380 bytes) of data and what we get is Propose encoded in protobuf.

We’ll focus on the next two bytes:
0a 10

This is already the beginning of Propose encoded in protobuf.

It has 5 fields, numbered or ID-ed from 1 to 5 (field_number). All of them are length-delimited types (strings, bytes) which are marked in protobuf (wire_type) as 2.

In protobuf each field starts with a header byte followed by a byte (or more bytes as it’s varint) indicating length of data.

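The header is defined similarly to mplex. The rightmost 3 bits are the type (wire_type) and the bits to their left are the ID (field_number). The header is itself a varint, so in a single byte the leftmost bit is the continuation flag; for the small field numbers used here it is always 0.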

So like in mplex, if we draw it graphically, where ID is D and type is T:

D D D D D T T T

For example type = 2 (string) and ID = 1 would be:

0 0 0 0 1 0 1 0 , 00001 for ID, 010 for type

Codewise we do it exactly the same as in mplex, if we name the header byte h:
type = h & 0x07
id = h >> 3

In hex 00001010 is 0a, which is also what we’ve received.

The next byte is length of this data.

Hex 10 stands for 16 and we did notice last time that after the initial “static” bytes, we get 16 bytes of data, which are always different. In the Propose definition it’s defined as rand 16 bytes, which in crypto lang is usually named as nonce.

After we read those 16 bytes of nonce, we’ve finished with the first field, so the next byte 12 is already the header of the next field. I’ll write this header in binary and I’ll put a space between the type and ID part.

0x12 = 00010 010

We can see again that it’s of string type, i.e. 010 = 2, but this time its ID is 2 and not 1. In the definition of Propose it’s the pubkey or public key.

The next two bytes are length and not just one.
ab 02

How do we know it’s 2 and not 1? If we write the two bytes in binary:
0xab = 1 0 1 0 1 0 1 1
0x02 = 0 0 0 0 0 0 1 0

If the leftmost bit is 1, it means that the next byte should be taken into account as well. Which means that if the length of data is above 127, then 2 bytes are needed. (Probably it’s the same in mplex and the libp2p protocol and not 254 as I wrote before, but I haven’t yet delved into it.)

So how do we make sense out of this now?
We can’t just join the numbers as before. It’s a bit more complex.

First we remove the left-most bit in 0xab.
0xab = [1] 0 1 0 1 0 1 1 => 0x2b = 0 1 0 1 0 1 1

Then we put the next byte 0x02 in front:
(0x02 0x2b) = 00000010 0101011 = 299

We can do it in a code/math way:
length = [byte2] * 128 + ([byte1] & ~128)
length = ([byte2] << 7) + ([byte1] & ~128)
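
The two-byte formula generalises to any number of bytes: keep taking 7 bits per byte, least significant group first, until a byte with the leftmost bit cleared. A minimal Python sketch (the name `decode_varint` is mine):

```python
from typing import Tuple


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        result |= (byte & 0x7F) << shift   # low 7 bits carry the data
        pos += 1
        if not byte & 0x80:                # leftmost bit 0: last byte
            return result, pos
        shift += 7


print(decode_varint(bytes.fromhex("ab02")))
print(decode_varint(bytes.fromhex("10")))
```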

So the next 299 bytes are the public key, whose format I don’t yet understand: the key itself is 256 bytes long, and the remaining 43 bytes are unknown to me.

Next are 3 fields, all with a header byte and length byte, but inside they are comma-separated values, which are also ordered by preference.

From now on, I’ll leave out the hex/binary form and just write what we get.

First is exchanges.

We get the ID = 3, type = 2, length = 17. The data is:
P-256,P-384,P-521

After that we have the field ciphers.

We get the ID = 4, type = 2, length = 24. The data is:
AES-256,AES-128,Blowfish

At last we have the field hashes.

We get the ID = 5, type = 2, length = 13. The data is:
SHA256,SHA512
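
Putting the pieces together, a Propose message can be walked field by field. The sketch below is Python with hypothetical helper names; it handles only length-delimited fields (type 2), which is all Propose uses, and the sample uses a short placeholder instead of the real 299-byte pubkey field.

```python
from typing import Dict, Tuple

PROPOSE_FIELDS = {1: "rand", 2: "pubkey", 3: "exchanges", 4: "ciphers", 5: "hashes"}


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_propose(buf: bytes) -> Dict[str, object]:
    """Return the Propose fields; the three list fields are split on commas."""
    out: Dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError("only length-delimited fields are supported")
        length, pos = decode_varint(buf, pos)
        value = buf[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated field")
        pos += length
        name = PROPOSE_FIELDS.get(field_number, f"field{field_number}")
        if name in ("exchanges", "ciphers", "hashes"):
            out[name] = value.decode("ascii").split(",")
        else:
            out[name] = value
    return out


sample = (
    bytes.fromhex("0a10") + bytes(16)            # rand: 16 placeholder bytes
    + bytes.fromhex("1203") + b"key"             # pubkey: placeholder
    + bytes.fromhex("1a11") + b"P-256,P-384,P-521"
    + bytes.fromhex("2218") + b"AES-256,AES-128,Blowfish"
    + bytes.fromhex("2a0d") + b"SHA256,SHA512"
)
print(parse_propose(sample))
```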

Great stuff @sebaseba! I would love to have you review the current in-progress secio spec: https://github.com/libp2p/specs/pull/106

The public key data is also serialized using protobuf. How it was serialized, was a bit of a mystery at first as in the secio spec in progress the link points to https://godoc.org/github.com/libp2p/go-libp2p-crypto#Key, which doesn’t give us that much info.

Luckily another user wrote his own implementation of secio and from that we can see the protobuf definition of the public key serialization.

I’ll write here the first few bytes in hex of the pubkey field (ID = 2), which was protobuf decoded in the last post.
08 00 12 a6 02 30 82 01 22 30

It has two fields: key type and the actual public key.

The first field, key type, is an enum, which in protobuf has type 0. It can take 3 different values:

  • 0 = RSA
  • 1 = Ed25519
  • 2 = Secp256k1

As enum in protobuf is treated as varint, length byte is not needed. Also our values are less than 127, so the next byte is just directly the value in binary.

In our case the first two bytes were
08 00

If we write those in binary:
0x08 = 00001 000 (header, ID = 1, type = 0)
0x00 = 00000000 (data, enum value = 0)

So it’s a type 0 key, which, from the enum definition before, means it’s an RSA key.

The next field is the public key, it’s a protobuf type = 2 or string/byte array, so it has to have also length. In our case it has a length of 294 bytes. The relevant bytes for this are:
12 a6 02

In binary it would be:
0x12 = 00010 010 (header, ID = 2, type = 2)
0xa6 = 1 0100110 (length, leftmost bit is 1, we have to read the next byte as well)
0x02 = 0 0000010 (length, leftmost bit is 0, this is the end of length)

We convert the last two to the correct length (the exact explanation why/how is in the previous post):
(0x02 << 7) + (0xa6 & ~128) = 294

The next 294 bytes of data is in our case the RSA public key.
30 82 01 22 30 and so on.
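
A hedged Python sketch of decoding this PublicKey wrapper, covering the two wire types it uses (0 for the enum, 2 for the key bytes). Function names are made up, and the sample DER is zero-padded after the first five bytes rather than a real key.

```python
from typing import Optional, Tuple

KEY_TYPES = {0: "RSA", 1: "Ed25519", 2: "Secp256k1"}


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_public_key(buf: bytes) -> Tuple[str, Optional[bytes]]:
    """Return (key type name, key bytes) from a serialized PublicKey."""
    key_type, data = None, None
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                       # varint: value follows directly
            value, pos = decode_varint(buf, pos)
            if field_number == 1:
                key_type = value
        elif wire_type == 2:                     # length-delimited bytes
            length, pos = decode_varint(buf, pos)
            chunk = buf[pos:pos + length]
            if len(chunk) != length:
                raise ValueError("truncated field")
            pos += length
            if field_number == 2:
                data = chunk
        else:
            raise ValueError("unexpected wire type")
    return KEY_TYPES.get(key_type, "unknown"), data


der = bytes.fromhex("3082012230") + bytes(289)   # 294 bytes, mostly padding
wrapped = bytes.fromhex("0800") + bytes.fromhex("12a602") + der
kind, key_bytes = parse_public_key(wrapped)
print(kind, len(key_bytes))
```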

If we google those bytes we get for example this post, which talks about the first bytes of ASN.1 DER RSAPublicKey.

I extracted the 294 bytes (those which start with 30 82 01 22 30). To recap, the bytes are found inside the secio handshake. The handshake is protobuf serialized Propose with 5 fields, of which the field pubkey ID=2 is again protobuf serialized PublicKey with 2 fields, of which the field with again ID=2 contains the public key in DER format.

Graphically:

   Propose -> {nonce,
               pubkey -> PublicKey {KeyType,
                                    Data},
               exchanges,
               ciphers,
               hashes
   }

Once you save these bytes to, for example, publickey.der, you can run openssl to check whether you extracted the public key data correctly.

You do it by running the following command on the CLI:

openssl rsa -text -inform DER -in publickey.der -pubin

For example, I’m giving what I got (you’ll get of course something else):

Public-Key: (2048 bit)
Modulus:
    00:99:00:93:c8:b3:23:f8:19:31:2a:c6:08:07:1e:
    97:ff:09:f3:76:f9:eb:86:28:c0:86:5f:54:53:8f:
    e3:a7:65:28:a6:51:43:42:7e:73:cb:0f:3f:1c:7a:
    96:2f:32:ef:1a:91:7a:b0:48:1b:d9:94:05:88:dd:
    95:f9:fb:70:91:2d:71:82:80:99:54:68:70:0d:e7:
    12:ef:bd:65:de:7f:38:94:b8:74:e5:49:8c:1b:8c:
    da:ef:6b:6d:c2:56:92:a3:6c:0a:56:30:26:d3:ad:
    07:87:37:a4:33:11:a0:83:65:85:5a:ca:f8:8a:1e:
    cf:63:7f:e0:19:92:cc:e0:00:01:29:5d:eb:9f:9f:
    cd:a1:fb:5d:ca:9a:26:70:7f:98:84:95:a7:0c:0f:
    39:bf:ff:f6:ee:42:a5:b9:4d:01:6a:3a:d1:1a:61:
    ad:cb:5b:69:a5:c0:22:a2:c4:d5:4d:17:94:da:d7:
    d3:fa:4b:6f:aa:c8:d5:09:eb:c7:85:cd:2d:fb:19:
    a1:1d:75:49:7a:37:5f:b8:fc:68:b6:79:b1:39:b5:
    1a:81:35:a7:07:b4:aa:0d:c1:b7:17:0c:cc:df:b2:
    2d:e3:6e:b4:a0:a8:17:58:c6:bd:c4:13:b5:dc:c0:
    23:8a:2e:d1:35:4f:bb:26:d1:a6:f6:c0:0c:45:5c:
    0a:d5
Exponent: 65537 (0x10001)
writing RSA key
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAmQCTyLMj+BkxKsYIBx6X
/wnzdvnrhijAhl9UU4/jp2UoplFDQn5zyw8/HHqWLzLvGpF6sEgb2ZQFiN2V+ftw
kS1xgoCZVGhwDecS771l3n84lLh05UmMG4za72ttwlaSo2wKVjAm060HhzekMxGg
g2WFWsr4ih7PY3/gGZLM4AABKV3rn5/NoftdypomcH+YhJWnDA85v//27kKluU0B
ajrRGmGty1tppcAiosTVTReU2tfT+ktvqsjVCevHhc0t+xmhHXVJejdfuPxotnmx
ObUagTWnB7SqDcG3FwzM37It4260oKgXWMa9xBO13MAjii7RNU+7JtGm9sAMRVwK
1QIDAQAB
-----END PUBLIC KEY-----

From what I’ve understood so far, you have to answer in the same way, but in order to do that you first have to generate a new public/private key pair (and a nonce, 16 random bytes).

To generate the pair we can use openssl.

We generate first the private key.
openssl genrsa -out client-pk.pem

From the private key we can generate the public key in DER format.
openssl rsa -in client-pk.pem -pubout -outform DER > client-pubk.der

When you answer, you get a response in a similar way to before.
Again first 4 bytes are the length of the message, followed by protobuf serialized data Response.

This time we have just 2 fields, both of type 2 (string, bytes[]).

The first field (ID=1) is the so-called ephemeral public key, which is the public key of a key pair that you generate each time you form a new connection. You do this so that even if someone breaks the key, they break only that session, not all the communication before it; you still use the same long-term key for establishing who you are (peerID).

This key is supposed to be on the P-256 elliptic curve, but I’m not sure what that is. I suspect it’s secp256k1 or prime256v1. I think NIST named the latter P-256.

For some reason, the field is 65 bytes long and the first byte is always 0x04, the rest being the actual key bytes.

If I generate new elliptic curve keys using openssl:

openssl ecparam -name secp256k1 -genkey -noout -out ec-key.pem
openssl ec -in ec-key.pem -pubout -outform der -out ec-pub.der

I get the public key file ec-pub.der with 88 bytes (91 bytes with prime256v1).

Each time I generate new keys, the first 24 bytes (27 bytes with prime256v1) are always the same and the last 64 bytes are different. The byte before the bytes of the key is always 0x04, so it might be that it’s just a badly stripped header byte or it indicates the type of key. No idea.

The second field (ID=2) in the protobuf data is the signature. It’s 256 bytes long. It’s an RSA signature over the SHA-256 digest of what secio calls the corpus.

The corpus is composed of 3 strings concatenated together:

  • Propose protobuf serialized data of the server (i.e. what we’ve received before Response, without the first 4 bytes indicating length)
  • Propose protobuf serialized data of the client (i.e. what we’ve sent back, without the first 4 bytes indicating length)
  • Ephemeral public key (i.e. the 65 bytes we just received)

We can check if the signature corresponds to the corpus if we save the corpus as described to a file e.g. corpus.dat, the signature to e.g. sig.dat and the public key of the server (which we received in the previous message, i.e. Propose) to server-pubkey.der and then running the following:

openssl dgst -sha256 -verify server-pubkey.der -keyform der -signature sig.dat corpus.dat

It should return Verified OK.

You respond to this the same way, except that in the corpus you reverse the order of client/server and use your own ephemeral public key, i.e. you concatenate your Propose data, their Propose data, and your ephemeral public key, which you have to generate. I think you generate it as I described before, i.e. ec-pub.der, and you just take the last 65 bytes. You should use prime256v1, otherwise it doesn’t work.

We save this concatenation to a file, e.g. corpus2.dat.

Now we sign this using our private key.

openssl dgst -sha256 -sign client-pk.pem -out signature.dat corpus2.dat

The signature is in the file signature.dat, which will be included now in our protobuf response.

We protobuf serialize the data, so that ID=1 contains the 65 bytes of the generated ephemeral public key and ID=2 contains 256 bytes of the contents of signature.dat. Both fields are of type 2.

Now we send this back, together with 4 bytes at the beginning indicating the length of the message.
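
Serializing that reply might look like the following Python sketch. The function names are hypothetical, the key and signature in the example are zero-filled placeholders, and nothing here does any actual cryptography; it only shows the protobuf fields plus the 4-byte length prefix.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bytes_field(field_number: int, value: bytes) -> bytes:
    """Header (ID, type 2), varint length, then the raw bytes."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(value)) + value


def build_response_frame(ephemeral_pubkey: bytes, signature: bytes) -> bytes:
    """ID=1 ephemeral public key, ID=2 signature, prefixed with a uint32 length."""
    body = encode_bytes_field(1, ephemeral_pubkey) + encode_bytes_field(2, signature)
    return struct.pack(">I", len(body)) + body


placeholder_key = b"\x04" + bytes(64)   # 65 bytes, as observed above
placeholder_sig = bytes(256)            # 256 bytes, as observed above
frame = build_response_frame(placeholder_key, placeholder_sig)
print(len(frame), frame[:8].hex(" "))
```

Note that the 256-byte signature needs a two-byte length (0x80 0x02), for the same reason the pubkey field did earlier.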

We get the response, again with 4 leading bytes with the length and a 48-byte response, which is different every time.

It is supposed to be the received nonce, but now encrypted (16 bytes) and the SHA256 HMAC of the encrypted nonce (32 bytes).

Unfortunately I got kinda stuck here on how to proceed: how it’s even encrypted and how the HMAC is generated, i.e. what it uses for the shared secret.