Further Cryptosystem Improvements
My initial design for the message format exposed metadata for message sizes, segment sequence numbers and redundancy values. I have now changed this so that only the signature, message hash, the fingerprint of the recipient's public key and the encryption nonce are in the clear, and everything else is encrypted.
I have also had an idea for reducing correlation through the recipient's public key fingerprint. A short 4 bit prefix indicates how many times the public key was hashed to produce the fingerprint, and this count is randomised across the segments of a longer message, so that only the sender's public key connects the segments.
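The hash-count prefix idea can be sketched roughly as follows. This is only an illustration under stated assumptions: SHA-256 as the hash, an 8 byte truncated fingerprint, the 4 bit count stored in the low nibble of a whole byte for simplicity, and the names `chained_hash`, `make_tag` and `matches` are all hypothetical and not taken from the actual design.

```python
import hashlib
import hmac
import secrets

FINGERPRINT_LEN = 8  # assumed fingerprint size in bytes, not from the design


def chained_hash(pubkey: bytes, rounds: int) -> bytes:
    """Hash the public key `rounds` times in a chain (SHA-256 is an assumption)."""
    digest = pubkey
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def make_tag(pubkey: bytes) -> bytes:
    """Build a recipient tag for one segment: count prefix plus fingerprint.

    The round count is chosen at random per segment, so two segments for the
    same recipient usually carry different fingerprints.
    """
    rounds = secrets.randbelow(16) + 1  # 1..16, encoded as 0..15 in 4 bits
    fingerprint = chained_hash(pubkey, rounds)[:FINGERPRINT_LEN]
    return bytes([rounds - 1]) + fingerprint


def matches(tag: bytes, pubkey: bytes) -> bool:
    """Receiver side: recompute the chain for the advertised count and compare."""
    rounds = (tag[0] & 0x0F) + 1
    expected = chained_hash(pubkey, rounds)[:FINGERPRINT_LEN]
    return hmac.compare_digest(tag[1:], expected)
```

With only 16 possible counts, a receiver could also precompute all 16 fingerprints for its key once and match incoming segments with a table lookup.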
I also have to consider how to reduce that correlation. The receiver MUST be able to identify packets as belonging together by some means, AND they need a public key to combine with their own private key, since half of the encryption secret is derived from it.
The only really viable way is to use more than one private key to sign a string of messages. There is a time cost to generating each key, so it may be simplest, and create more of a fog of anonymity, to use each key for some minimum number of message segments, say 4, and then shuffle which key is used across the full message so that the time correlation is broken as well. It may be even better to randomise this distribution further, so that random numbers of segments, randomly distributed through the message, share a common public key in the signature. The best an attacker can then hope to do is to grab a subset of definitely related segments out of a message of unknown total size.
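A minimal sketch of this kind of key distribution, assuming keys are referred to by integer index and that each key covers a random run of between `min_run` and `max_run` segments before the assignment is shuffled. The function name `assign_keys` and the `max_run` parameter are hypothetical, introduced only for illustration.

```python
import random


def assign_keys(num_segments: int, min_run: int = 4, max_run: int = 8,
                rng=None) -> list[int]:
    """Return a signing-key index for every segment position.

    Each key signs a random number of segments (at least `min_run` whenever
    the message has that many segments), and the assignment is then shuffled
    so that segments sharing a key are scattered through the message.
    """
    rng = rng or random.SystemRandom()
    assignment = []
    key_id = 0
    remaining = num_segments
    while remaining > 0:
        run = rng.randint(min_run, max_run)
        # Fold a too-short tail into this run rather than leaving a key
        # that signs fewer than min_run segments.
        if remaining - run < min_run:
            run = remaining
        assignment.extend([key_id] * run)
        remaining -= run
        key_id += 1
    rng.shuffle(assignment)  # break the positional/time correlation
    return assignment
```

For example, `assign_keys(40)` returns a list of 40 key indices; the number of distinct keys varies from run to run, which is what keeps the full message size unknown to an observer who has linked only one key's segments.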
To improve this further, an All or Nothing Transform could perhaps be applied to the messages later. Then, unless the attacker correctly identifies enough pieces of the data (accounting for the concealed redundancy rate of the packet) and also breaches the public key used in the "to" section of the message, which is obscured by hash chaining, they still have nothing, and no positive mapping between sender and receiver, as this information is only exposed at the last hop when the receiver gets the message.
I have also started thinking about how the onion layers will work. The sender knows all of the public keys it can use, along with the session cookies for the relays, because it gathers these beforehand in order for the client to run.
It is easy to see how the forward path, the first three hops to the exit node, is encrypted, but the return path, the last three hops, is a little more complicated to understand.
The shared secrets of these individual return hops can be combined via the magic of XOR into an encryption key, which is provided to the exit node to encrypt the replies to the forward message it delivers to the server. The exit encrypts the reply payload with this key and forwards it to the first hop in the return path. Each subsequent hop then decrypts the packet, so that at the end the remaining encryption is known to the sender to be connected to the forward path, and is identifiable by the session cookie that the exit also knows, which is rolled over after a message circuit completes.
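The XOR combination step on its own might look like the sketch below. It assumes every per-hop shared secret is a byte string of the same length (32 bytes in the usage note); how the combined value is then used as a cipher key is not shown, and the names `xor_bytes` and `combine_secrets` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("secrets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def combine_secrets(hop_secrets: list[bytes]) -> bytes:
    """Fold the per-hop shared secrets into one value with XOR.

    XOR is commutative and associative, so the order of the hops does not
    change the result.
    """
    if not hop_secrets:
        raise ValueError("at least one secret is required")
    return reduce(xor_bytes, hop_secrets)
```

Given three 32 byte return-hop secrets, `combine_secrets([s1, s2, s3])` yields a 32 byte value that the sender can reproduce, since it already holds all three secrets.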
This also has implications for the traffic pattern of the system. The exit node cannot be granted an unlimited ability to send messages back; return paths have to be pre-authorised. This means that nodes will be in constant chatter with the other side if they are expecting the other side to initiate a return of data, in something like a polling process. Messages sent forward carry return information with them. If a relatively immediate reply is expected, the message will wait on the server reply, and the exit router will then use the provided multi-layered secret to encrypt the reply for the reverse path, combined with the layered encrypted header which contains the routing instructions.
If a message is intended to only be one way, and not have a return expected, the return will simply be a "message received" signal constructed by the client, and later messages will poll for a relevant response.
This messaging sequence architecture is designed to be flexible, so that it can be adapted to the multiple types of messaging patterns used by servers: the synchronous immediate return, as you see with web services, but also the asynchronous polling/gossip pattern that you see more with peer to peer network services. Bitcoin and Lightning Network traffic both contain both types of messaging pattern. Some messages are like "send me inventory", where the reply can be "ain't got none sorry" or an immediate "here is a bunch", and others, as on Lightning, are like "sending x satoshis in your direction, here's the transaction update", followed by an immediate acknowledgement and counter-transaction.
As such, it may be necessary for Indra clients to have some knowledge of the messaging pattern of the service in question, so that the correct prompting and timing can be performed for a protocol specific messaging pattern. This also means that adding types of exit traffic will need, for optimal compatibility, intelligence in the router to understand how the traffic works: whether it is sync or async, gossip, connection style TCP, etc. This is partly because one of the goals of Indra is to enable continuity of connections even if the intermediary node path changes. A path change should not matter as long as the endpoint stays the same, an issue particularly pertinent to TCP and QUIC connections, which have an expectation of a stable IP address baked into them. Reliable UDP type protocols that already do not care about a static exit path can even exploit this to scatter their traffic through multiple exits without disrupting the protocol, thus improving the randomness of traffic.
There are a lot of things to consider when building an anonymised messaging network. Every measure that makes it harder for surveillance to correlate messages is needed, balanced against the tradeoffs each requires in latency and connection stability. Spam prevention is also very important. Most of the traffic on the Tor network is spam or malicious messages. Requiring payment to all hops in a path raises the cost of spam, which should concomitantly reduce its occurrence. This is important, because the Tor network has been blacklisted by almost all internet services where it has been used as an attack vector. The network's lack of scaling has limited its use for denial of service attacks, as it only has so much bandwidth. Indra will be able to grow much more, so the spam must not grow with it, or the attackers will trigger hostile responses. Distributed Denial of Service attacks depend on free traffic, so payment should be a major factor in reducing this potential problem as the network scales up.