With the rapid advances in AI over the past two years, every company is looking for guidance on how to manage the new risks and opportunities these technologies bring. A company that wants to leverage AI within its organization to build new products or make existing processes more efficient must understand how to use these technologies safely.
What is Generative AI?
Artificial Intelligence, or AI, is the application of a program to simulate human intelligence. The mathematical concepts used in AI developed over the course of hundreds of years, but the field itself is widely considered to have begun in the mid-1950s, when the term “artificial intelligence” was coined. The advances in artificial intelligence over the past two years have been primarily in the realm of “Generative AI”: systems that can generate high-quality text, images, and other content based on the data they were trained on.
Simplistically, think of a program that can write blog posts, generate financial reports, create images, or answer questions in a conversational format. AI models that work with text are known as language models, and ones trained on massive amounts of text are known as “Large Language Models,” or LLMs. The most popular example, ChatGPT from OpenAI, is powered by an LLM thought to have been trained on almost 300 billion words of text (the equivalent of almost 4 million novels). ChatGPT first made headlines because conversing with it felt so similar to conversing with a human, and its answers were generally correct. Before the launch of ChatGPT (initially powered by GPT-3.5), none of these models could generate text human-like enough to be broadly usable. Two years later, we have LLMs from several companies, both public and private, that can generate text across a variety of topics and tasks on par with, or better than, what a typical human can produce.
A generative AI system is given an input, known as a prompt, and generates an output. Prompt engineering is a rapidly growing field, because how a prompt is phrased and structured can drastically change the output, both in accuracy and in form.
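To make this concrete, here is a minimal sketch of that flow, assuming the openai Python package; the model name and prompts are illustrative, and the same request/response pattern applies to other providers.

```python
# A minimal sketch of the prompt -> output flow using the openai Python
# package (pip install openai). The model name and prompts are
# illustrative; other providers follow the same pattern.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def generate(prompt: str) -> str:
    """Send a prompt to the model and return the generated text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The same request, phrased two ways, can produce very different output:
print(generate("Summarize our Q3 results."))
print(generate(
    "You are a financial analyst. Summarize the Q3 report below in "
    "exactly three bullet points, each under 20 words.\n\n<report text here>"
))
```

Note how the second prompt constrains the role, format, and length of the response; that difference in output is prompt engineering in miniature.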
The Opportunity
Because it operates on written (and spoken) language, generative AI opens up a myriad of tasks that were previously extremely difficult to automate. The amount of data shared within an organization in the form of documents is vast, and the primary barrier to formalizing that information into automated (or streamlined) processes has been that it takes a human to read those documents and put the information into a usable, consistent format. With LLMs, we now have a widely available tool that can do this with a high degree of accuracy. Suddenly, companies can approach problems from a completely new perspective.
Some examples based on new products over the past year:
- Reviewing a contract for specific clauses
- Reviewing and analyzing responses to an RFP or IFB
- Building a question-and-answer chatbot with knowledge of current events
- Deploying a custom chatbot with knowledge of your company’s documents
The possibilities abound, and companies whose processes are grounded in unstructured data, such as PDFs and other documents, want to leverage these technologies to increase efficiency.
The Risks
There is also a general, and reasonable, fear of not understanding these technologies well enough to regulate their use within a company. And rest assured: whether officially approved or not, employees are likely already using them. Surveys suggest almost half of Americans are familiar with ChatGPT, and many use it regularly at work, with or without approval.
There are several fundamental risks associated with using Generative AI in a workplace without safeguards:
Data Privacy
The first, and most important, risk is data privacy. The consumer-grade versions of ChatGPT and other generative AI chatbots will store what you type and may use it for training purposes (unless you explicitly opt out). If data ends up in a training dataset, it can surface in a future model’s output, potentially exposing sensitive information to unauthorized users.
For some industries and companies that deal with private information, such as PII (personally identifiable information) or PHI (protected health information), a leak of data like this would be catastrophic.
False Information
A generative AI model generates text based on probabilities. This means it will sometimes produce something that is probable, given the preceding text, but incorrect. A large language model has no concept of “true” or “false,” “fact” or “fiction,” in the sense that a human does, and when a model generates something untrue, the result has become known as a “hallucination.” There are deliberate, ongoing efforts to reduce or eliminate hallucinations from LLMs, but some question whether they can ever be eliminated entirely.
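As a toy illustration of the underlying mechanism (this is not a real LLM, and the probabilities are invented for the example), consider sampling the next word purely by likelihood:

```python
# A toy illustration of probability-driven generation: the continuation
# is chosen because it is probable, not because it is true, which is how
# hallucinations arise. The probabilities below are invented.
import random

# Hypothetical next-token probabilities after "The company was founded in"
next_token_probs = {
    "1998": 0.45,      # plausible and, in this example, correct
    "2001": 0.35,      # plausible but wrong: a potential hallucination
    "Delaware": 0.15,
    "secret": 0.05,
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())
choice = random.choices(tokens, weights=weights, k=1)[0]
print("The company was founded in", choice)
```

Roughly a third of the time, this toy model confidently states the wrong year, with no internal signal that anything went wrong.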
Companies run the risk of an employee taking false information from an LLM and passing it along to a customer or using it to make an important decision. This has already happened in the legal field, where lawyers have been sanctioned for filing briefs containing case citations fabricated by a chatbot.
Traceability and Transparency
In heavily regulated environments and industries, having justification for decisions can be an essential part of ensuring compliance. Whether authorized or not, if employees at a company are using generative AI in decision making or in their daily processes, they must be intentional about that usage, or there will be no way to trace those decisions. No one fully understands why large language models output the text they do, which means the best someone may be able to do is point at the conversation that produced it. The point here is not to fear the tool, but to understand that, if pressed, it would be impossible to trace a specific piece of output to a specific cause.
Unknown Bias
All AI models produce results consistent with their training data. If the training data contains bias, the models will produce content that reflects those biases. This can be hard to quantify or understand; it may be less pronounced in models trained on massive, diverse datasets, but it can be very real in smaller models and difficult to trace.
How to Mitigate Risk with Generative AI
There are substantial opportunities and risks associated with using generative AI within an organization. The question becomes: how can an organization maximize the benefits of these new technologies while mitigating the risks? We’ve compiled a list of best practices that all organizations should require of any system they build or use that leverages generative AI.
Keep Humans in the Loop
The greatest opportunity from AI lies in the assistance it can provide to humans. Treating an AI system as a tool that dramatically increases the productivity and accuracy of the person doing a job is a perspective that inherently limits the risk of that system. An AI system that keeps a person involved in the process is known as “human in the loop.”
Be intentional about where this interaction is placed. Keep a trained professional as a fundamental part of your process and ensure they’re trained on both the subject matter and the system itself.
When evaluating a system, ensure that a human remains in the loop at key decision points, as in the sketch below.
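Here is a minimal sketch of what such a checkpoint can look like, with a placeholder standing in for the real model call:

```python
# A minimal sketch of a "human in the loop" checkpoint: the model drafts,
# and a trained reviewer approves or corrects the draft before it leaves
# the system. `draft_with_llm` is a stand-in for whatever generation step
# your system actually uses.

def draft_with_llm(request: str) -> str:
    # Placeholder for a real LLM call
    return f"DRAFT RESPONSE for: {request}"

def human_review(draft: str) -> str:
    """Block until a human approves or corrects the draft."""
    print("--- Draft for review ---")
    print(draft)
    if input("Approve as-is? [y/N] ").strip().lower() == "y":
        return draft
    return input("Enter the corrected text: ")

def handle_request(request: str) -> str:
    draft = draft_with_llm(request)
    return human_review(draft)  # the key decision point stays with a human

print(handle_request("Respond to customer complaint about a late shipment"))
```

The essential design choice is that nothing generated by the model reaches a customer or a decision without passing through `human_review` first.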
Ensure Data Privacy
As mentioned above, this represents one of the most dangerous risks for a company looking to use generative AI. There are several requirements one should expect from a system that leverages AI to ensure data privacy, including (but not limited to):
- Using a generative AI service that does not use your data for training purposes
- Masking all identifiable information before sending it to an LLM. This means removing names, emails, phone numbers, and similar details from text before sharing it with a generative AI model (see the sketch after this list).
- Limiting how users interact with models. Instead of giving your employees a chatbot where they can type anything, use a system that takes user input, processes it, and only then shares it with an AI model. This gives you full control over what is shared and provides traceability.
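As a sketch of the masking idea (the patterns and placeholders here are illustrative; production systems typically use a dedicated PII-detection library or service, since names and addresses need more than regexes):

```python
# A minimal sketch of masking identifiable information before text is
# sent to an LLM. These regexes only catch structured fields (emails,
# phone numbers, SSNs); names generally require a named-entity
# recognizer, and production systems should use a dedicated PII tool.
import re

MASKS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def mask_pii(text: str) -> str:
    """Replace identifiable fields with placeholders before any LLM call."""
    for pattern, placeholder in MASKS:
        text = pattern.sub(placeholder, text)
    return text

raw = "Reach Jane at jane.doe@example.com or (555) 123-4567."
print(mask_pii(raw))
# -> "Reach Jane at [EMAIL] or [PHONE]."
```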
Generate Justification for Output
Though no one completely understands why an LLM produces a specific output, there are several techniques that can help users verify the information produced. Every system that uses generative AI to produce content, evaluate content, or make decisions based on content should include a justification for those decisions in its output. Even if a system hallucinates information within a citation or justification, having that information directly tied to the output allows for easy auditing and verification.
For example, if you have a system that answers questions based on an internal knowledge base, then when the system generates an answer, it should also cite the knowledge used to generate it, such as document names and page numbers. For systems built on Retrieval Augmented Generation (RAG), this is straightforward and should be a requirement; a sketch of the pattern follows.
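In this sketch, a toy keyword retriever and a stubbed model call stand in for real vector search and generation; the point is that the retrieved sources travel with the answer:

```python
# A minimal sketch of the RAG citation pattern: whatever generates the
# answer, the retrieved sources are returned alongside it so the output
# can be audited. The knowledge base and retriever here are toys.
from dataclasses import dataclass

@dataclass
class Chunk:
    document: str
    page: int
    text: str

KNOWLEDGE_BASE = [
    Chunk("travel_policy.pdf", 4, "Employees may expense flights under $500."),
    Chunk("travel_policy.pdf", 7, "International travel requires VP approval."),
    Chunk("it_handbook.pdf", 12, "Laptops are refreshed every three years."),
]

def retrieve(question: str, k: int = 2) -> list[Chunk]:
    """Toy keyword retrieval; a real system would use vector search."""
    words = question.lower().split()
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda c: sum(w in c.text.lower() for w in words),
        reverse=True,
    )
    return scored[:k]

def answer(question: str) -> dict:
    chunks = retrieve(question)
    generated = "..."  # a real system would call an LLM with the chunks here
    return {
        "answer": generated,
        "sources": [f"{c.document}, p. {c.page}" for c in chunks],  # justification
    }

print(answer("Do I need approval for international travel?"))
```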
Provide Traceability and Auditability
The best way to ensure that generative AI is being used productively and properly within your organization is to have visibility into its interactions. At a minimum, any system your organization uses that leverages AI should track, in an easy-to-review fashion, all interactions with the generative AI model. At the most basic level, every input and output should be visible to the administrators of that application at the organization.
Realistically, unless an application is being developed in-house, the prompts used to provide a service will be the intellectual property of the vendor and will not be shared. However, all company data provided to the system should be visible and accessible on demand, so you can always know what data was used to produce a specific output.
For chat applications, this means a history of all chats. For analysis applications, it means the data that was analyzed, how it was analyzed, and the output. A sketch of a minimal audit wrapper is shown below.
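As a sketch (the JSONL file and field names are illustrative; a real deployment would write to a database or log pipeline), an audit wrapper like this makes every model interaction reviewable:

```python
# A minimal sketch of interaction-level auditability: every input and
# output that touches a generative AI model is recorded where
# administrators can review it. Here that record is an append-only
# JSONL file; the file name and fields are illustrative.
import json
import uuid
from datetime import datetime, timezone

AUDIT_LOG = "ai_interactions.jsonl"

def call_model_with_audit(user: str, prompt: str, call_model) -> str:
    """Wrap any model call so the full exchange is traceable later."""
    output = call_model(prompt)
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "input": prompt,
        "output": output,
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return output

# Usage: pass in whatever function actually calls your model.
reply = call_model_with_audit(
    "jdoe", "Summarize contract X", lambda p: f"(model reply to: {p})"
)
print(reply)
```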
Conclusion
Generative AI represents one of the most important advances in the history of technology. Companies that wish to capitalize on it can reap enormous benefits by incorporating these tools in ways that accelerate their business, but they should do so intentionally and with caution. Following the best practices above will allow companies to adopt these technologies while limiting risk.