Should All General AI Service Providers Protect Their Users?

Jul 13, 2023

Pixis

By now, anyone keeping up with AI news has heard that OpenAI is facing a couple of lawsuits. One accuses the company of harvesting vast amounts of personal data from social media accounts, private messaging platforms, and even medical records in order to train its generative AI models, ChatGPT and DALL-E. Could you imagine your boss asking ChatGPT if you really like them, only for the AI to recite all the jokes you’ve made at their expense in private group chats with your colleagues? Talk about a snitch!

Another lawsuit claims that the company’s AI model plagiarized two authors’ books, using them without their permission. The risk of getting sued for copyright infringement can be a pretty big deterrent to businesses using such generative platforms, and that could put a ceiling on AI service providers’ revenues.

This is an issue on which Adobe, the photo editing platform, has chosen to take a stance. The company recently announced an indemnity clause designed to ease enterprise fears about the legal risks of using its generative AI art tool, Firefly. Adobe is confident in offering this guarantee because it trains the model using only its own stock library, openly licensed content, and public domain content where the copyright has expired. But the company also recognizes that the majority of us have severe trust issues, and it has boldly promised to indemnify enterprise customers who get sued over content generated on Firefly.

We are now left to ponder an important question: Should all AI service providers make it standard practice to train exclusively on proprietary data and to offer indemnification clauses? Let’s dive into this whimsical debate and explore the implications of such a decision.

The Protective Case

Proponents argue that using proprietary data is a necessary step in safeguarding AI creations and ensuring ethical and legal compliance. The indemnity clause introduced by Adobe is a prime example of addressing potential legal complexities. By using proprietary data to train their AI models, AI service providers can promise businesses greater control over the content generated, and can even make it more specific to their purpose. It also minimizes the risk of infringing on existing copyrighted material or exposing users’ private data.

AI service providers can offer enterprises more than protection from legal repercussions; they can also offer increased brand value. By using models trained exclusively on owned or licensed data, brands can cultivate a trustworthy image. Given the public’s constant concerns about the security of their data, this could prove to be the ideal way for generative AI models to evolve.

The Costly and Limited Pool

While embracing proprietary data can provide legal protection and make generative AI service providers the goody-two-shoes of the market, it could also increase their costs. Let’s keep in mind that AI models require large amounts of data to be highly effective. The cost of building a proprietary database would include not only collecting vast amounts of data, but also employing people to clean that data for training purposes.

A limited pool of data could also hinder the very creativity that AI promises. Without access to a wide range of diverse information, AI may struggle to produce inspired output. Think of it as confining an artist to a single color palette or a musician to a single note; surely it limits their creative capacity. After all, the beauty of AI lies in its ability to learn and adapt. The argument goes that if we rely solely on proprietary data, we risk creating a bubble where AI reproduces the same ideas and biases.

Final Thoughts

The question of whether AI service providers should exclusively use proprietary data to protect their users is one that requires a delicate balance. The trade-offs of relying solely on permissible data and introducing indemnity clauses may compel us to find a solution that gets the best of both worlds. AI service providers may not even need to offer indemnification clauses if they make it standard protocol to train their models on licensed data.

Instead of building their own databases, AI service providers could draft licensing contracts with enterprises to use their original content to train AI models. For example, a music-generating model could train on music owned by record labels or freelance composers, and an image-generating model could license work from photographers or art galleries.

Yes, it would be an added cost for AI providers, but they may find that more people are willing to pay for a subscription to a model trained on legally sourced, professional, and original data. This would be beneficial financially for AI service providers and for enterprises in different industries, creatively for AI users, and technically for the training of AI models.

Wait, did we just end this debate?