Community feedback on the BLOOM Responsible AI License

#1
Opened by yjernite (HF staff)
BigScience Workshop org

The BLOOM RAIL license in its current form is the product of several conversations among many BigScience participants about how best to release the BLOOM models in a way that combines our values of openness, transparency, and responsibility.

It is also meant as a starting point for further conversations about using RAIL in other settings and for feedback on its current and future iterations. Please comment here!

Reminder: please follow the code of conduct.

I have a serious concern about one of the clauses in the Use Restrictions of the BigScience RAIL License v1.0:
(e) To generate or disseminate information or content, in any context (e.g. posts, articles, tweets, chatbots or other kinds of automated bots) without expressly and intelligibly disclaiming that the text is machine generated;
This affects the entire field of linguistic steganography, which I have been working on for a couple of years.
https://arxiv.org/abs/2104.09833
https://arxiv.org/abs/2211.06662

Linguistic steganography involves a conflict of interest between two parties: those who want to censor media and those who want to evade detection. Linguistic steganography is a technology for the latter. Its goal is to conceal a message in some cover text so that an eavesdropper is not even aware that a secret message exists. In other words, generating content without expressly and intelligibly disclaiming that the text is machine generated is intrinsic to linguistic steganography.

A growing body of work uses large language models to enumerate natural variations of text into which secret messages are embedded. While existing systems are just proof-of-concept demonstrations without practical utility, they have the potential to become effective tools for countering censorship. If the use restriction clause in question comes into wide use, however, it threatens the very existence of the research field.

BigScience Workshop org

@murawaki, thank you so much for taking the time to provide this helpful feedback. We will take it into account: I will share it and discuss it with other BigScience members.

Hi everyone,

Paragraph 6 of the License says that "Licensor claims no rights in the Output You generate using the Model", but I have a question about the INPUTs I use to create a prompt: is the data I feed to the model shared with the Licensor in any way?

My concern is that we may use this model for tasks involving enterprise information that cannot be shared with any third party.

I apologize if this is a foolish question, but it's really crucial for me, and I couldn't find any information about it.
Thank you very much in advance.
