Why transparency is important for generative AIs
I gave a seminar to IPSANZ this week on generative AI and issues at the intersection of legal practice and copyright. One of the take-home principles I referred to was the need for transparency.
My talk was very much focussed on two narrow domains - first, the ways in which lawyers might use generative AI, whilst being aware of the limitations of those systems; and secondly, the way in which those systems are battling claims of copyright infringement.
Transparency has a role to play when lawyers use a generative AI system, because the ability to understand the reasoning behind a proposition put forward by an AI is central to a lawyer’s ability to judge the reliability and accuracy of that proposition. That’s why we lawyers footnote everything when writing articles, submissions and the like. This feature is notably absent from most generative AIs (the best I’ve seen is Bing’s AI search, which gives a couple of links at the bottom of its answer).
Transparency has a critical role to play in copyright litigation, because in order to understand whether there has been a substantial reproduction of the work of another, it is necessary to understand:
how the training of a system works - does it involve something akin to ‘copying’ a work it is trained on, or is it more akin to ‘learning’ (more on this in a later post); and
the relationship between a work used as training data and a particular output, in order to understand whether there has been a substantial reproduction of that work.
But outside of those narrow domains, transparency is important in helping us to understand whether the particular set of training data might introduce a bias into the system. On that note, I recommend this article on Gizmodo, entitled “GPT-4 Is a Giant Black Box and Its Training Data Remains a Mystery”.