Can Bug Bounties Fix GenAI’s Security Problems?

Generative AI models have some massive safety issues. With the right prompt or jailbreak, bad actors can sidestep an AI vendor’s content moderation guidelines and produce harmful output, such as prejudiced text and phishing scams.

Now, Anthropic has announced that it is launching an invite-only bug bounty program in partnership with HackerOne, which will reward researchers with up to $15,000 for discovering universal jailbreak vulnerabilities.

Anthropic said: “The rapid progression of AI model capabilities demands an equally swift advancement in safety protocols. As we work on developing the next generation of our AI safeguarding systems, we’re expanding our bug bounty program to introduce a new initiative focused on finding flaws in the mitigations we use to prevent misuse of our models.”

Can bug bounty programs provide AI vendors with a solution? How can a bug bounty help?

Key Takeaways

Anthropic has announced a new bug bounty program to strengthen the safety of its AI models.
The AI startup hopes the program will help to identify universal jailbreaks.
Universal jailbreaks are challenging for AI vendors as they enable users to sidestep content moderation policies.
This new initiative has been launched in partnership with HackerOne, one of the biggest bug bounty providers.
The bug bounty market was valued at $1,130 million in 2023 and is expected to reach a value of $3,537 million by the end of 2030.

Table of Contents

Key Takeaways
How Bug Bounties Can Help Improve AI Safety
The Problem with Universal Jailbreaks
Bug Bounties & AI Development
The Bottom Line
References

How Bug Bounties Can Help Improve AI Safety

There are only so many hours in the day. Even the most well-resourced in-house team of AI and ML engineers will struggle to discover all of the potential vulnerabilities in their models.

Outsourcing safety testing to a crowd of third-party researchers augments the in-house team’s ability to identify vulnerabilities and can help make products safer overall.

The Problem with Universal Jailbreaks

Universal jailbreaks first came to the forefront of the AI conversation back in December 2022 after users discovered a jailbreak known as Do Anything Now or DAN.

This jailbreak instructed ChatGPT to take on the role of an AI assistant that wasn’t bound by ethical guidelines, an alter ego that could “do anything”, which enabled the chatbot to generate content that didn’t comply with OpenAI’s content moderation policies.

This enabled users to produce hateful content, phishing emails, and even malicious code. The ease with which this kind of content could be created raised serious questions about the safety of large language models (LLMs) as a whole and whether they were putting users at risk.

“If AI safety isn’t taken into account, AI models could be manipulated to generate harmful content, such as providing instructions on creating bombs or producing offensive language.

“Bug bounty programs focused on preventing these malicious usages embrace an approach called ‘red teaming for AI safety’, which aims to ensure responsible use of AI and adherence to ethical standards,” Prins said.

Bug Bounties & AI Development

In recent years, bug bounty programs have become increasingly common among software companies, as proprietary vendors have looked for ways to mitigate vulnerabilities throughout the software supply chain.

According to Verified Market Reports, the bug bounty market was valued at $1,130 million in 2023 and is expected to reach a value of $3,537 million by the end of 2030 as more businesses work with ethical hackers to discover vulnerabilities in their products.

The bug bounty market notably collided with generative AI back in April 2023, when OpenAI announced the launch of a bug bounty program with Bugcrowd, a crowdsourced security platform that currently serves over 1,000 customers. OpenAI’s program offers bounties ranging from $200 for low-severity findings up to $20,000 for exceptional discoveries.

At the time of writing, the program appears to have remained quite small, having paid out rewards for 112 vulnerabilities with an average payout of $503.76.

In any case, HackerOne’s partnership with Anthropic suggests that more vendors are eyeing bug bounty platforms as a tool to better secure their flagship models.

The Bottom Line

Bug bounty programs can give AI vendors’ overburdened in-house teams outside support to better identify vulnerabilities in their models. These initiatives offer a relatively cost-effective way to improve the security and performance of AI models in the future.

With something as delicate and transformative as AI, there is arguably no such thing as being too careful.