CASCDR Update #6: Engineering Flexible L402 Workflows on an Open Nostr Marketplace
Recently we've been speaking publicly about how our project enables privacy in spite of the ever-growing threat of mass surveillance and AI tyranny.

With CASCDR and the NIP-105 protocol spec, we use Lightning to stay anonymous while using state-of-the-art AI services that conventionally require KYC (article link + podcast link).

Quick Recap

In previous posts we've covered the numerous workflow examples we've built with FOSS service endpoints such as:

  • 🧠 General Purpose LLM (GPT)

  • 👁️ AI Image Analysis Capabilities

  • 🖼️ Text to Image Generation

  • 🗣️ AI Voice-to-Text Transcriptions

  • 🎥 YouTube Video Extraction

With these services enabled, we created several examples that can accomplish tasks ranging from transcribing a YouTube link to helping you appraise the value of your belongings in Satoshis with just a picture.

Example of the Image to Appraisal Workflow

Making Zapier-Style Workflows Possible with L402 + FOSS

What if it were possible to go beyond just using static workflows that are pre-programmed?

What if we could unleash the ingenuity and skill of entrepreneurs on a free market enabled by lightning + nostr?

What if we could transcend the walled gardens in workflow apps like Zapier and build anything our imaginations can fathom with some Sats & the service building blocks at our disposal?

Today, we're announcing our design proposal to do just that. We propose an open standard and development kit that will make it possible to chain custom, user-defined workflows across L402 services. Let's dig into the specifics.

Workflow Specifics: Solving the Challenges of Composability & Creating a Compiler!

Fundamentally, what we're talking about is making it so that a machine (the client application) can take loose instructions based on human language/input and pare them down into very specific instructions from which another group of machines (the service providers) can do their work.

What design examples can we draw from? Bingo: a compiler.

Fundamentally, we are compiling the workflow pipeline down into specific instructions for the service provider APIs. Let's illustrate our proposal with an example where we use an LLM like ChatGPT to generate a high-quality text-to-image prompt for image generation:

In this example, we have a Lightning-enabled ChatGPT API and a Text to Image API. Note that each has an HTTP request schema, a response schema, and a hashID for each schema. The hashID is simply the SHA-256 hash of the schema treated as a JSON-formatted string.
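As a concrete sketch, computing a hashID might look like the following. The exact canonicalization (sorted keys, compact separators) is our assumption, since the description above only specifies a SHA-256 over the schema as a JSON-formatted string; the schemas themselves are illustrative stand-ins for the two services:

```python
import hashlib
import json

def hash_id(schema: dict) -> str:
    """SHA-256 over the schema serialized as a JSON string.
    Sorted keys and compact separators are an assumed canonicalization,
    so the same schema always yields the same hashID."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical schemas for the two services in this example
llm_response_schema = {"result": "string"}
t2i_request_schema = {"prompt": "string", "model": "string"}
```

With a stable hashID, two independent clients can refer to the same endpoint schema without any coordination beyond agreeing on the serialization.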

That hashID makes it simple and straightforward to look up information about these API endpoints later, when a client app needs to map them together. We can then search either a local database or an external source (nostr, an API, or a website) for a mapping that takes the response (output) schema of Service A and produces a request (input) schema for Service B.
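Such a mapping could be as simple as a pair of hashIDs plus a field-name translation table. The record shape below is purely illustrative, not part of any published spec:

```python
# Illustrative mapping record: which two schemas it connects (by hashID),
# and how fields rename between them.
mapping = {
    "from_hash": "ab12...",  # hashID of Service A's response schema (placeholder)
    "to_hash": "cd34...",    # hashID of Service B's request schema (placeholder)
    "field_map": {"result": "prompt"},  # A's output field -> B's input field
}

def apply_mapping(response: dict, field_map: dict) -> dict:
    """Build a partial request for Service B out of Service A's response."""
    return {dst: response[src] for src, dst in field_map.items()}
```

A client that finds this record can wire the LLM's output into the image generator without either service knowing about the other.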

Ensuring No Stone Goes Unturned

For this to work, we need to make sure that all mandatory request fields in the workflow are fulfilled. Fundamentally, each field would fall into one of three classifications:

  1. Mappings - the rules for taking a field from the response (output) of one service and feeding it into a differently named field in the request (input) schema of another service.

  2. Presets - predetermined or pre-programmed fields set by the client. For example, setting the preface to the prompt: "Make me a high quality and descriptive text to image prompt describing..."

  3. User Defined - fields left completely up to the user. In this example, it might be something like "...the Amalfi coast in Italy during the summer", which is appended to the preset in #2 above.

The compiler must set rules stipulating that every mandatory field can be exhaustively classified into one of the three cases above. Or in other words: mandatory fields = mappings + presets + user-defined values, with nothing left over.
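That exhaustiveness rule is easy to sketch as a check the compiler could run before enabling a build. The function name, signature, and example field names here are ours, chosen to match the running example:

```python
def check_coverage(mandatory_fields, mappings, presets, user_defined):
    """Return the set of mandatory fields not covered by any of the
    three classifications; an empty set means the workflow 'compiles'."""
    covered = set(mappings) | set(presets) | set(user_defined)
    return set(mandatory_fields) - covered

# Hypothetical mandatory fields for the Text to Image request
mandatory = ["prompt", "model"]
missing = check_coverage(
    mandatory,
    mappings={"prompt": "result"},  # fed from the LLM's "result" output
    presets={},
    user_defined={"model": None},   # chosen by the user at build time
)
assert not missing  # every mandatory field is accounted for
```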

Dry Run Through the Example

Now that we've covered the basics, let's run through what an example would look like. The process would consist of five steps:

  1. The user or the client application would arrange the services it wants in order (LLM then Text to Image Generation)

  2. The user would press a "build" button

  3. The compiler + client would pull all relevant mappings, select one, and apply it. In this case, it would pull the mapping from the LLM's "result" field to the Text to Image service's "prompt" field

  4. All that would be left would be for the user to configure their presets & user defined values. In this case, that would be:

    1. Selecting the LLM model (e.g. ChatGPT-3.5, ChatGPT-4, Llama 2, etc.)

    2. Selecting the Text to Image model (e.g. beautiful-landscapes-v4.5)

    3. Filling in the LLM prompt (e.g. "Make me a beautiful landscape of the Amalfi Coast in Italy during the summertime")

  5. Once those criteria are fulfilled, the client enables the "run" button and the user can pay the BOLT11 invoices and watch the workflow progress through each stage in real time.
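The five steps above can be sketched end to end. Everything here is a stub under stated assumptions: the service calls and BOLT11 payment are faked, and the field names come from the running example rather than any published schema:

```python
def pay_invoice(bolt11: str) -> None:
    # Stub: a real client would pay this BOLT11 invoice via L402
    # to unlock the next service call.
    pass

def fake_llm(req: dict) -> dict:
    # Stand-in for the Lightning-enabled LLM API.
    return {"result": "A painterly view of the Amalfi Coast in summer"}

def fake_t2i(req: dict) -> dict:
    # Stand-in for the Text to Image API; echoes the prompt it received.
    return {"image_url": "https://example.com/img.png",
            "used_prompt": req["prompt"]}

# The "compiled" workflow: ordered stages, each with an invoice to pay,
# an API to call, and the mapping to apply to its output.
stages = [
    {"invoice": "lnbc1...", "call": fake_llm, "map": {"result": "prompt"}},
    {"invoice": "lnbc2...", "call": fake_t2i, "map": {}},
]

def run_workflow(stages: list, request: dict) -> dict:
    """Execute each stage in order, feeding mapped output into the next."""
    payload, response = request, {}
    for stage in stages:
        pay_invoice(stage["invoice"])      # step 5: settle the invoice
        response = stage["call"](payload)  # call the paid service
        # apply the compiled mapping to build the next stage's request
        payload = {stage["map"][k]: v for k, v in response.items()
                   if k in stage["map"]}
    return response

out = run_workflow(stages, {
    "prompt": "Make me a beautiful landscape of the Amalfi Coast "
              "in Italy during the summertime",
    "model": "ChatGPT4",
})
```

The key design choice is that the client only ever manipulates dictionaries shaped by the published schemas; swapping a service means swapping a hashID and a mapping, not rewriting the pipeline.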

Example of the Workflow Output :)

With this approach, we've laid the groundwork for how to create flexible, composable applications!