STORY
Reducing project scope for the deadline
AUTHOR
Joined 2023.07.01
DATE
VOTES
sats
COMMENTS

Reducing project scope for the deadline

On our call yesterday, we discussed whether or not we would be able to add a filter for submitted data by Monday. Let's recap the problem we were trying to solve for context.

The problem

Two problems people training LLM's run into are ensuring the quality of the data getting submitted and manually reviewing it. To address these, we wanted Bitcoin PAL to be a way for a project to incentivize crowd-sourcing high-quality data and to auto-accept / reject submissions. The way we wanted to achieve that was to create a filter.

If you look at our project charter, we wanted our filter to include two criteria at first:

  1. Verifying acceptable sources such as: stack exchange (must have at least 50 votes and will accept the highest answer), bitcoin optech, BIP’s, LIP’s, and updated versions of bitcoin core.

  2. Must be above 50% relevancy with current database wrt to how related it is. (AI can provide % guesstimate)

If a data submission got reject, a user has the ability to submit it out of band through a github PR. The community would then discuss whether or not it belongs in the data set and then get merged if it is.

The idea here was to reject early and often using the filter to preserve the quality of the data. But as people submit data out of band to the LLM, we would train the model on what is getting merged. Then as the LLM grows, we can loosen the requirements of the filter more and more.

(Also as a fun side note, on the PM call we had for hackathon projects someone highlighted that another important criteria to consider was copyrighted content. I was glad he mentioned that because that was something I hadn't considered before. One of the benefits of building in public!)

Scoping it down

When we had our call yesterday, we did a progress check and we quickly realized that we weren't in a place to implement the filter criteria. Couple of the issues that were raised were:

  1. Security concerns around auto-merging data

  2. Logistics of implementing the filter criteria

For #2, we had to figure out where the filtering would happen. Would it happen in the front end? Do we send the data to one LLM to check and then if it passes merge it into a second LLM that then sync's back to the first LLM?

We realized that we hadn't fully thought through how we would implement the filter and so we began brainstorming a more feasible solution to complete before our deadline.

We settled on pointing people to our github repo to submit a pull request. We would then run an agent to score the data and comment on the PR how relevant it was. Then we rely on the community to tACK or nACK the pr. After rough consensus we merge it.

We also settled on providing a PR template to specify that the submitter must leave their lightning address inside github.event.pull_request.title. In other words we would no longer collect their lightning address in the front end. We also no longer needed a front-end UI to accept data submissions which helped further scope down our work.

Although it was a little disappointing, I'm still impressed with what we've been able to build over the past couple of weeks. We also decided to simply mention the ideal end goal of having a filter in our presentation.