The 12th word
LLMs are unpredictable beasts. Working with OpenAI's public APIs is an exercise in trial and error: so much uncertainty and unpredictable behavior!
The idea behind the "39 Rhymes" rapper bot was to use words from the English BIP39 wordlist to compose rhymes, but feeding all 2048 words in a prompt would be too many tokens (I tested it in the playground) for the public APIs of ChatGPT and others. So I had to pivot.
My next best idea was to feed just a subset of the 2048 words. This subset could be a lot smaller, like 12 or 24 random words. The AI wouldn't have a very large vocabulary to work with, but the idea never was to create a song that uses only words from the list anyway. Since the bot will just sprinkle in a couple of keywords from the list mixed with its own knowledge, providing just a subset should be fine.
And to generate subsets of size 12 or 24, we can simply use a generateMnemonic function from any BIP39 library. Sounds easy enough.
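For the curious, here is a rough sketch of what a generateMnemonic-style function does under the hood, in Python with only the standard library. The names `entropy_to_indices` and `random_word_indices` are made up for this illustration and don't come from any BIP39 library. The sketch returns positions in the 2048-word list rather than the words themselves, so you would still need to load the wordlist to print them.

```python
import hashlib
import secrets


def entropy_to_indices(entropy: bytes) -> list[int]:
    """Turn 16 to 32 bytes of entropy into BIP39 word indices (0..2047)."""
    if len(entropy) not in (16, 20, 24, 28, 32):
        raise ValueError("entropy must be 128, 160, 192, 224 or 256 bits")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # 4 checksum bits for 12 words, 8 for 24
    # The checksum is the first cs_bits bits of SHA-256(entropy).
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    # Slice the combined bit string into 11-bit groups, most significant first.
    return [(combined >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


def random_word_indices(n_words: int = 24) -> list[int]:
    """Hypothetical helper: indices for a fresh random 12- or 24-word mnemonic."""
    n_bytes = {12: 16, 24: 32}[n_words]
    return entropy_to_indices(secrets.token_bytes(n_bytes))
```

Calling `random_word_indices(24)` gives 24 positions to look up in the wordlist, which is the kind of subset the prompt needs.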
Since we already generated a valid mnemonic, it would be a cool feature if the song contained a valid seed hidden inside it, Da Vinci Code style.
But feeding 12 words in a well-crafted prompt is not enough to make those unpredictable black boxes obey you. The machine can choose to use fewer than twelve, or use them in a different order, and then you don't have a valid seed anymore, since the order matters and the last word contains a checksum.
So the next strategy was to give the AI more than 12 words so it has more options (24 seems to be fine and economical) and ask it to use any 11 of them. After the reply comes back, we can check which ones were used and in what order, then generate the last word outside of the AI by calculating it.
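The "check which ones were used" step could look roughly like this in Python. This is a minimal sketch, not the bot's actual code: the function name `pick_used_words` is hypothetical, and it assumes that only exact, case-insensitive matches count and that a word is taken at its first appearance.

```python
import re


def pick_used_words(reply: str, candidates: list[str], needed: int = 11) -> list[str]:
    """Return the first `needed` distinct candidate words, in order of first appearance."""
    allowed = set(candidates)
    used: list[str] = []
    # Lowercase the reply and split it into alphabetic tokens, dropping punctuation.
    for token in re.findall(r"[a-z]+", reply.lower()):
        if token in allowed and token not in used:
            used.append(token)
            if len(used) == needed:
                break
    return used
```

If the result has fewer than 11 entries, the reply didn't use enough keywords and the prompt has to be retried. Note that inflected forms such as plurals won't match under this assumption.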
Here is the explanation of the last word from the excellent Bitcoiner Guide Seed Tool:
One offline method used to generate a mnemonic seed is to print the BIP39 list and pick them from a hat randomly, one at a time. However, this method of seed generation cannot calculate the BIP39 checksum (represented as the final word), which is where this tool comes in.
Randomly picking seed words using this method provides 11 bits of entropy per word. In the case of a 12 word seed which requires 128 bits of entropy, picking the first 11 words gives 11x11=121 bits of entropy. This means there are 7 bits of entropy (ones or zeros) left over that need to be set in order for the checksum (which in this case is 4 bits long) to be calculated. Final word = 7 random bits + 4 bit checksum.
This is why there are multiple valid final words for any given first 11. Each new iteration of those final 7 bits changes the checksum and subsequently the BIP39 it is mapped to. The length of the checksum changes with the length of the seed, but the principle outlined above is still true. Flip some bits and see what happens!
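The calculation described above can be sketched in a few lines of Python using only `hashlib`. The function name `final_word_indices` is made up for this example. It works on word positions in the BIP39 list (0 to 2047) and assumes a 12-word seed, so the checksum is the first 4 bits of the SHA-256 hash of the 128 entropy bits.

```python
import hashlib


def final_word_indices(first_eleven: list[int]) -> list[int]:
    """All 128 valid 12th-word indices for 11 given BIP39 word indices."""
    if len(first_eleven) != 11 or not all(0 <= i < 2048 for i in first_eleven):
        raise ValueError("expected 11 indices in the range 0..2047")
    # Pack the 11 words into 121 bits, most significant word first.
    prefix = 0
    for index in first_eleven:
        prefix = (prefix << 11) | index
    finals = []
    for extra in range(128):  # every choice for the 7 leftover entropy bits
        entropy = ((prefix << 7) | extra).to_bytes(16, "big")
        checksum = hashlib.sha256(entropy).digest()[0] >> 4  # top 4 bits of the hash
        finals.append((extra << 4) | checksum)  # 7 random bits + 4 checksum bits
    return finals
```

With eleven copies of the first word in the list (index 0, "abandon"), the first candidate comes out as index 3, "about", which matches the well-known all-zero test mnemonic. Any of the 128 results is a valid final word, so the bot is free to pick one at random.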
Of course, I don't expect people to use this online-hosted-that-sends-data-to-God-knows-where bot to create wallets. That would be incredibly dumb. But since we have a seed embedded in a poem, we might as well make it a valid seed while we're at it, just for the lulz 😂