฿TC Transcription Review
Transcribe the bitcoin world
Without Bryan Bishop, there would be little written record of conference talks. He has written over a thousand transcripts and we’ve enriched these with metadata at btctranscripts.com (stored in markdown files on GitHub).
With recent advances in speech-to-text AI, we can unlock the information trapped in the audio of talks and podcasts.
Transcription is the easy part
Transcription BTC (TSTBTC) is a command line tool that transcribes single-speaker YouTube videos and mp3s on a local machine using Whisper, then formats the result with metadata. (We also have Whisper diarization for multiple speakers, which delivers a diarized transcript but requires adding the metadata manually.) TSTBTC sends a JSON payload of the transcript to...
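To make the "formats it with metadata" step concrete, here is a minimal sketch of turning a raw transcript into a markdown file with a front-matter header and a JSON payload. The `TranscriptMetadata` fields, the helper names, and the payload shape are all hypothetical illustrations, not TSTBTC's actual schema, and the sketch says nothing about where the payload is sent.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TranscriptMetadata:
    # Hypothetical fields; the real btctranscripts.com schema may differ.
    title: str
    speakers: list[str]
    date: str  # ISO 8601 date, e.g. "2023-04-01"
    tags: list[str] = field(default_factory=list)
    media: str = ""  # source URL of the YouTube video or mp3, if known


def to_markdown(meta: TranscriptMetadata, text: str) -> str:
    """Render the transcript as markdown with a front-matter header."""
    # json.dumps output for strings and lists is also valid YAML flow syntax,
    # which avoids needing a YAML library for this simple header.
    lines = ["---", f"title: {json.dumps(meta.title)}"]
    lines.append(f"speakers: {json.dumps(meta.speakers)}")
    lines.append(f"date: {meta.date}")
    lines.append(f"tags: {json.dumps(meta.tags)}")
    if meta.media:
        lines.append(f"media: {meta.media}")
    lines.append("---")
    lines.append("")
    lines.append(text.strip())
    return "\n".join(lines) + "\n"


def to_payload(meta: TranscriptMetadata, text: str) -> str:
    """Bundle metadata and transcript text into one JSON string."""
    return json.dumps({"metadata": asdict(meta), "transcript": text.strip()})
```

For example, `to_markdown(TranscriptMetadata("Taproot", ["Alice"], "2023-04-01"), "hi")` yields a file whose header lists the title, speakers, date, and tags, followed by the transcript body.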
Human in the loop
Whisper is impressive on its own, but it lacks the training to recognize bitcoin technical jargon. So if we are going to transcribe en masse, we will need a way to enlist an army of reviewers. TSTBTC can send transcripts to a queue where they wait for human review. Reviewers claim a transcript and correct it in exchange for some sats. By capturing the diff of reviewers’ corrections, we can build a DB of jargon to improve our accuracy with embedding-style post-processing.
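One way to picture "capturing the diff of reviewers’ corrections" is a word-level diff between Whisper's output and the reviewed text, counting which phrases reviewers replace. The sketch below assumes that approach and uses hypothetical helper names. Note that `apply_corrections` is a simple exact-phrase substitution swapped in for illustration; it is not the embedding-style post-processing described above.

```python
import difflib
import re
from collections import Counter


def correction_pairs(machine: str, reviewed: str) -> list[tuple[str, str]]:
    """Return (misheard, corrected) phrase pairs where a reviewer replaced words."""
    a, b = machine.split(), reviewed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" opcodes represent a misheard phrase swapped for a fix;
        # pure insertions and deletions are ignored in this sketch.
        if tag == "replace":
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


def build_jargon_db(reviews: list[tuple[str, str]]) -> Counter:
    """Count how often each correction occurs across many reviewed transcripts."""
    db: Counter = Counter()
    for machine, reviewed in reviews:
        db.update(correction_pairs(machine, reviewed))
    return db


def apply_corrections(text: str, db: Counter, min_count: int = 2) -> str:
    """Replace whole-word phrases that reviewers corrected at least min_count times."""
    for (wrong, right), count in db.most_common():
        if count >= min_count:
            pattern = r"\b" + re.escape(wrong) + r"\b"
            # A function replacement keeps backslashes in `right` literal.
            text = re.sub(pattern, lambda _m, r=right: r, text)
    return text
```

Requiring a minimum count is a cheap guard against one reviewer's idiosyncratic edit being applied everywhere.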
Making audio and video searchable
After a transcript is reviewed, a pull request is opened against the bitcointranscripts repository. Once it is merged, the transcript is scraped and indexed in Elasticsearch, making it immediately searchable on bitcoinsearch.xyz.
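As an illustration of the indexing step, the sketch below builds a request body for Elasticsearch's `_bulk` API, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The index name `transcripts` and the document fields are hypothetical; the actual scraper and index mapping behind bitcoinsearch.xyz are not described here.

```python
import json
from typing import Iterable


def bulk_body(docs: Iterable[dict], index: str = "transcripts") -> str:
    """Build an NDJSON body for Elasticsearch's _bulk API that indexes each doc.

    Assumes each doc has a hypothetical "id" field (e.g. the transcript's path
    in the repository) used as the Elasticsearch _id.
    """
    lines = []
    for doc in docs:
        # An "index" action creates or replaces the document with this _id,
        # so re-scraping an updated transcript overwrites instead of duplicating.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps({k: v for k, v in doc.items() if k != "id"}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header.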
See you soon
We are hard at work getting something usable shipped. See you on the other side.