Privacera has boosted its Privacera AI Governance (PAIG) solution with new access controls and data filtering functionality. The aim is to ensure that access controls on data retrieved through Retrieval-Augmented Generation (RAG) are maintained when that data is used in generative AI (GenAI) applications.
Don Bosco Durai, co-founder and CTO at Privacera, said, “In generative AI, Retrieval-Augmented Generation (RAG) systems operate by sourcing contextual information from a VectorDB, aggregating data from diverse origins such as Confluence Wiki pages, SharePoint, databases, support tickets, and other operational systems.
“These sources inherently possess their own access controls, so it’s crucial that the VectorDB inherits those and then maintains and enforces equivalent security measures when utilizing this data in generative AI applications.
“PAIG makes it easy to maintain distinct access controls aligned with the original source permissions, an essential part of leveraging robust user- and group-level policy enforcement within VectorDB.”
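To make the idea of inherited permissions concrete, here is a minimal Python sketch under stated assumptions. It is not PAIG's API: the `SourceDocument` and `Chunk` classes, the `allowed_groups` field and the `ingest` and `retrieve_for_user` functions are all hypothetical. Each chunk copies the access list of its source document when it is ingested, and retrieval only returns chunks that the querying user's groups are allowed to read. A simple keyword match stands in for vector similarity search so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDocument:
    """A document from an upstream system with its own access list (hypothetical)."""
    source: str                     # e.g. "confluence", "sharepoint"
    text: str
    allowed_groups: frozenset[str]  # groups allowed to read it in the source system


@dataclass(frozen=True)
class Chunk:
    """A VectorDB entry that inherits its parent document's access list (hypothetical)."""
    text: str
    source: str
    allowed_groups: frozenset[str]


def ingest(doc: SourceDocument, chunk_size: int = 40) -> list[Chunk]:
    """Split a document into fixed-size chunks, copying the source ACL onto each one."""
    pieces = [doc.text[i:i + chunk_size] for i in range(0, len(doc.text), chunk_size)]
    return [Chunk(p, doc.source, doc.allowed_groups) for p in pieces]


def retrieve_for_user(store: list[Chunk], query: str, user_groups: set[str]) -> list[Chunk]:
    """Return matching chunks, keeping only those shared with at least one of the user's groups.

    A case-insensitive keyword match stands in for vector similarity search.
    """
    return [
        c for c in store
        if query.lower() in c.text.lower() and c.allowed_groups & user_groups
    ]


# Usage: a finance-only Confluence page and a company-wide SharePoint page.
store = ingest(SourceDocument("confluence", "Q3 budget forecast for finance", frozenset({"finance"})))
store += ingest(SourceDocument("sharepoint", "Q3 holiday rota for all staff", frozenset({"everyone"})))
```

With this design, a user who is only in `everyone` gets the SharePoint chunk for the query "q3", while a finance team member gets both, because the permission check happens before anything reaches the LLM.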
What problem is Privacera addressing?
Large language models (LLMs) are trained on large datasets that may well include sensitive data. Because the initial model is built from a known set of data, its creator can take steps to secure and protect that data. This is necessary both to meet basic privacy expectations and to satisfy regulatory requirements. PAIG already provides the governance tools to do this.
Over time, however, an LLM will need to be enhanced and augmented. This might happen through fine-tuning, or when users ask questions that cannot be answered from the data the model already contains. In the latter case, techniques such as RAG retrieve additional data, typically from a vector database, and pass it to the LLM as context alongside the user's prompt.
When this new data is retrieved and passed to the LLM, it is not always clear which data is sensitive or how it needs to be protected. Organisations risk exposing sensitive data and breaching their privacy obligations.
The new controls that Privacera has added to PAIG are designed to examine the metadata associated with incoming data. Is it personally identifiable information (PII)? Does it have access controls? How do those access controls relate to the way data is protected in the LLM?
PAIG also produces audit trails so that organisations can monitor how sensitive data is being handled.
What are the new upgraded features of PAIG?
In its announcement, Privacera lists several upgraded features in PAIG. They include:
- Seamless Integration with Multiple Data Sources: Users can now merge data from platforms such as Confluence, SharePoint, databases, and support tickets into VectorDB, with the original access policies of each source accurately reflected for users and groups.
- Advanced Classification-Based Filtering: Users can implement security and compliance policies by classifying and tagging data segments in VectorDB. For example, access to finance-related data can be restricted to members of the finance team, and embeddings tagged “INTERNAL” can be withheld as context from the LLM when contractors or external users query GenAI applications.
- Fine-Grained Authorization Protocols: Users can apply dynamic metadata filtering to tailor access rights, with the aim of maintaining real-time compliance and tighter security. For example, GDPR and CCPA requirements can be enforced by filtering customer data based on geographic location or individual consent.
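The classification and metadata filtering described in the list above can be sketched as a filter applied to retrieved chunks before they are used as LLM context. The following Python is a hypothetical illustration, not Privacera's implementation: the tag names (`FINANCE`, `INTERNAL`, `CUSTOMER`), the `User` attributes and the policy rules are assumptions chosen to mirror the article's examples. The region check in particular is a crude stand-in for data-residency rules, not a statement of what GDPR or CCPA actually require.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    tags: set[str] = field(default_factory=set)                # classification tags (hypothetical)
    metadata: dict[str, object] = field(default_factory=dict)  # e.g. {"region": "EU", "consent": True}


@dataclass
class User:
    name: str
    groups: set[str]
    is_external: bool = False  # contractors and other external users
    region: str = "EU"


def is_allowed(chunk: Chunk, user: User) -> bool:
    """Apply hypothetical classification and metadata policies to a single chunk."""
    # Classification rule: finance-tagged data is for the finance team only.
    if "FINANCE" in chunk.tags and "finance" not in user.groups:
        return False
    # Classification rule: INTERNAL data is never passed to the LLM for external users.
    if "INTERNAL" in chunk.tags and user.is_external:
        return False
    # Metadata rule: customer records need recorded consent and a matching region.
    if "CUSTOMER" in chunk.tags:
        if not chunk.metadata.get("consent", False):
            return False
        if chunk.metadata.get("region") != user.region:
            return False
    return True


def build_context(chunks: list[Chunk], user: User) -> str:
    """Keep only the chunks this user may see and join them into the prompt context."""
    return "\n".join(c.text for c in chunks if is_allowed(c, user))


# Usage with made-up data.
chunks = [
    Chunk("Q3 revenue forecast", {"FINANCE"}),
    Chunk("Internal reorg memo", {"INTERNAL"}),
    Chunk("EU customer 1042 address", {"CUSTOMER"}, {"region": "EU", "consent": True}),
    Chunk("US customer 7 address", {"CUSTOMER"}, {"region": "US", "consent": True}),
    Chunk("EU customer 99 address", {"CUSTOMER"}, {"region": "EU", "consent": False}),
]
analyst = User("ana", {"finance"})
contractor = User("carl", {"support"}, is_external=True)
```

Here the analyst's context contains the forecast, the internal memo and the consented EU record, while the contractor only receives the consented EU record. Filtering before prompt assembly means the LLM never sees the withheld chunks at all.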
Enterprise Times: What does this mean?
Retaining access rights to data when it is merged into a new dataset has been a long-standing challenge for IT. As GenAI systems gather more and more data, there is a concern that they will exacerbate existing problems caused by weak security controls.
PAIG already allows an organisation to set its own adaptive access controls at both role and attribute level. This can provide protection that goes further than the controls applied at the original data source. For sensitive data, PAIG looks for patterns that match what it classifies as sensitive and can then apply access controls, encrypt or even redact the data.
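Pattern-based detection and redaction of the kind described above can be illustrated with a few regular expressions. This is a deliberately simple Python sketch under stated assumptions: the `PATTERNS` table and the `redact` function are hypothetical, and the two patterns (email addresses and US Social Security number formats) are examples only. They are not PAIG's classifiers, which would need to be far more thorough.

```python
import re

# Hypothetical detection patterns; production classifiers cover many more types.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, set[str]]:
    """Replace each match with a [LABEL] placeholder and report which types were found."""
    found: set[str] = set()
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            found.add(label)
            text = pattern.sub(f"[{label}]", text)
    return text, found


# Usage: the returned labels could also feed an audit record.
clean, kinds = redact("Contact jane.doe@example.com, SSN 123-45-6789.")
```

Returning the detected types alongside the redacted text reflects the point made about audit trails: the same detection step can both protect the data and record what was found.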
With these new features, Privacera is hardening those controls and extending them to cover the entire data acquisition process for LLMs. Although it talks about inheriting controls from the source data, this assumes that the source has controls and that they can be mapped to PAIG's existing policies. What Privacera doesn't say, but implies, is that PAIG will also fall back on the organisation's existing policies to ensure a baseline of trust and privacy.
It will be interesting to see how Privacera plans to extract the security controls from data added through RAG. What it finds, and what it does with that data, is likely to be recorded in the audit files PAIG produces. That places a duty of care on customers to review those files and identify potential mismatches. From a regulatory perspective, this is a good thing.
Is this the end of sensitive data leaking as it moves between systems? Probably not, but Privacera customers now have a tool that can significantly reduce the risk.