At AWS re:Invent 2022, Enterprise Times sat down with Mark Ryland, Director, Office of the CISO at AWS. It was an opportunity to talk about the announcement of Amazon Security Lake (ASL) and about post-quantum cryptography.
Collating all the data gathered by security tools and logs across an enterprise is a complex task. For years, vendors have used their own file formats to store the data they gather and generate. It means that when security teams want to use the data, they have to extract it, transform it, and then load it into their analytics tools. That ETL process, as it’s called, is time-consuming and has to be run constantly.
In June, Amazon, along with seventeen other vendors, announced the Open Cybersecurity Schema Framework (OCSF). It is a vendor-agnostic data taxonomy whose goal is to normalise security log and event data across a wide range of products, services and open-source code. That standardisation reduces the overhead of combining data from all those sources. While it is a welcome step forward, it doesn’t address the data silos those tools and services create.
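To make the idea of normalisation concrete, here is a minimal sketch in Python of what mapping a vendor-specific log entry into an OCSF-style record might look like. The raw log format is invented, and the field names are simplified illustrations rather than the exact OCSF schema.

```python
# Hypothetical example: map a vendor-specific firewall log entry into an
# OCSF-style record. Field names are simplified for illustration and do not
# reproduce the full OCSF schema.
from datetime import datetime, timezone

def normalise_firewall_event(raw: dict) -> dict:
    """Convert a made-up vendor log entry into a simplified OCSF-like event."""
    return {
        "class_name": "Network Activity",  # OCSF event class (simplified)
        "time": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
        "severity": raw.get("sev", "Informational"),
        "src_endpoint": {"ip": raw["src_ip"], "port": raw["src_port"]},
        "dst_endpoint": {"ip": raw["dst_ip"], "port": raw["dst_port"]},
        "disposition": "Blocked" if raw["action"] == "deny" else "Allowed",
        "metadata": {"product": raw["vendor"], "original_event": raw},
    }

# Example vendor log entry (invented for illustration)
raw_event = {
    "ts": 1669852800, "sev": "Low", "vendor": "ExampleFW",
    "src_ip": "10.0.0.5", "src_port": 51515,
    "dst_ip": "203.0.113.9", "dst_port": 443, "action": "deny",
}
print(normalise_firewall_event(raw_event))
```

Once every source emits records shaped like this, tools can query and join them without a bespoke ETL step per vendor.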
Amazon wants to be more than just a sponsor of OCSF
Amazon wants to play a bigger part in OCSF than just co-sponsoring the schema. It wants people to use its platform to store their security data. But rather than just use its existing data lake products, the company decided to create Amazon Security Lake. It wants to make it easy for all that data to be stored in one location so that it can be more easily analysed.
Ryland said, “The OCSF common interchange format is also the native format of the security data lake.”
Why is this important? Ryland continued, “it’s not even transformation to an intermediate language, you just put that in the data lake, and that’s the schema of the database. It’s a huge part of the story. When we did the OCSF launch a few months ago, there was a lot of interest from the community. And the primary story is security. People spend too much time data munging.”
What also makes this interesting is that AWS has announced its plans to move to a zero-ETL world. People can click on the data that they want and have the data pipelines created automatically. Those pipelines will continuously synchronise the data in near-real time. It means that Security Lake becomes an easy-to-deploy repository for security data.
Importantly, Ryland sees customers storing data from other vendors who are not OCSF compliant. “There is always a chance someone will come up with some new telemetry that doesn’t fit the schema. That’s fine, I can still put that in the data lake. It won’t be easily joined across other datasets. But it’s still present, and it can be searched on and queried.”
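As a rough illustration of what “searched on and queried” can look like in practice, the sketch below uses boto3 to run an Athena query over a Security Lake table. The database, table and bucket names are hypothetical, and the table layout in a real Security Lake deployment will differ.

```python
# Hypothetical sketch: query OCSF-formatted events held in a security data lake
# with Amazon Athena via boto3. Database, table and bucket names are invented
# for illustration; adjust them to your own deployment.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

QUERY = """
SELECT time, severity, src_endpoint.ip AS source_ip, dst_endpoint.ip AS dest_ip
FROM security_lake_db.network_activity
WHERE disposition = 'Blocked'
ORDER BY time DESC
LIMIT 100
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "security_lake_db"},  # hypothetical name
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("Query execution id:", response["QueryExecutionId"])
```

Non-OCSF telemetry landed in the same lake can be queried the same way; it simply won’t share the common columns that make cross-source joins easy.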
But what about the analytics tools and services?
AWS has an abundance of analytics offerings. If it can make ASL the obvious data lake for security data, that opens the door for people to use its analytics services and tools. It can also bring its machine learning and AI tools into the equation. After all, a lot of security teams are turning to ML and AI to analyse their security data.
We asked Ryland if he thought that ASL would change the efficiency of those tools.
Ryland replied, “I think it will have, over time, a very significant dramatic impact, precisely because of the centralization. I’m not making five copies of the data for five different tools.”
There is also a significant cost saving on storage, as Ryland points out. The more copies of the data you have to make, the more storage you need and the more money you spend.
Unsurprisingly, Ryland believes that money “can be spent on better tools, better analytics, and maybe storing it longer. It’s just a more efficient model to keep a single source of truth, and then have the tools all play against that.”
He sees other efficiencies beyond saving money on data storage. “We have all these disparate datasets, and you can’t literally do a join operation between multiple databases. Your analysts are doing cut and paste looking at different screens. It’s just not a very efficient system the way we tend to operate today.”
Can we scale the analytics to handle very large ASL instances?
If AWS is successful in making ASL the data lake of choice for those using OCSF-compliant tools, it will end up holding a vast amount of data. It will easily be in the petabyte range if you look at an MSP aggregating data from multiple customers. We asked Ryland how ASL would cope and what AWS could do to address that problem of scale.
Ryland pointed to other announcements made at re:Invent, such as the new Trainium-based instance types. These are powered by custom chips designed for machine learning and will handle much larger and more complex data sets.
The investments AWS has made in its inference services allow customers to find the instance type best suited to their application’s compute and memory needs. He believes that this will “drive down cost and drive performance, creating a virtuous cycle with centralised security data, allowing security practitioners to catch up with business analytics people who have been doing this now for several years.”
Where is AWS on post-quantum cryptography?
There has been a lot of work to create new cryptographic protocols that can withstand the perceived threat from quantum computing as it becomes a reality. NIST (the National Institute of Standards and Technology) has been proactive in finding solutions to the problem.
One of its actions has been to run competitions to get people to propose new cryptographic standards. Surprisingly, AWS has not been among those who have submitted ideas for consideration. We asked, what is AWS doing on post-quantum cryptography?
Ryland replied, “We’ve already implemented Kyber, which is the first winner. We have an open-source implementation, you can download our production-quality version of Kyber. We have it in production today with three of our services. You can do post-quantum TLS, and negotiate and exchange your cryptographic keys over a post-quantum hybrid key exchange.
“It means we encrypt it twice. We do an inner encryption with the PQC, then we do an ECDH, traditional encryption, of that. If someone finds a weakness in the post-quantum crypto, which is not unlikely, because cryptographic protocols take a long time to refine and perfect, then you’re still guarded, because the existing technology saves you.
“There’s already been a defect found in SIKE, which was one of the selected protocols. That’s going to go on for a few years.”
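The value of the hybrid approach Ryland describes is that the final key depends on both a classical and a post-quantum shared secret, so a break in either scheme alone does not expose the traffic. The sketch below is a conceptual illustration only: it combines an X25519 shared secret with a stand-in for a Kyber shared secret (simulated here with random bytes, since no particular Kyber binding is assumed) through a KDF. It is not AWS’s implementation, which lives inside its open-source TLS and crypto libraries.

```python
# Conceptual sketch of a hybrid (classical + post-quantum) key exchange.
# The Kyber shared secret is simulated with random bytes because this sketch
# does not assume any particular Kyber library; in a real hybrid TLS handshake
# it would come from the PQ KEM. Requires the 'cryptography' package.
import os
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

# Classical part: X25519 Diffie-Hellman between client and server.
client_priv = X25519PrivateKey.generate()
server_priv = X25519PrivateKey.generate()
classical_secret = client_priv.exchange(server_priv.public_key())

# Post-quantum part: placeholder for the shared secret a Kyber KEM would yield.
pq_secret = os.urandom(32)  # stand-in only, NOT a real Kyber exchange

# Hybrid key: derive the session key from the concatenation of both secrets,
# so an attacker must break both schemes to recover it.
session_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=None,
    info=b"hybrid-pq-classical-demo",
).derive(classical_secret + pq_secret)

print("Derived 256-bit session key:", session_key.hex())
```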
One use of PQC that Ryland mentioned was protecting secrets sent to services using the hybrid key exchange. Customers can already use post-quantum key exchanges in production with the open-source code AWS makes available, and can even use that code to add the same capability to their own endpoints.
Deploying PQC comes with challenges
The big challenge for PQC, Ryland says, is going to be the TLS part and updating all the endpoints. He highlights three hard parts.
“One hard part will be the whole public key infrastructure. All of the certificates that we all use to prove to one another that I really signed this, or this is really my input, involve traditional RSA or elliptic curve public key cryptography.
“Then, things like my browser have a chain of trust back to some root certificates. All that stuff has got to be upgraded to PQC, but we’re getting started on that.”
“The third, and probably the hardest part, is application authors using asymmetric encryption in their application code. We have been saying, don’t do personal cryptography; use a library like libcrypto or something, and we can upgrade it in the background. If you keep using that library, it’ll just get magically better. But there’s a whole bunch of source code out there that you have to find, modify and then redeploy to be quantum-safe. And that’ll take a long time.”
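The point about not “doing personal cryptography” can be illustrated with a small sketch: keep the asymmetric-encryption choice behind a single helper backed by a maintained library, so that moving to a post-quantum or hybrid scheme later means changing one module rather than hunting through application code. The helper names and the RSA-OAEP choice here are illustrative assumptions, not a recommended migration path.

```python
# Illustrative sketch: isolate the asymmetric scheme behind one helper so the
# application never touches primitives directly. Swapping RSA-OAEP for a
# post-quantum or hybrid scheme later only requires changing this module.
# Requires the 'cryptography' package; helper names are invented for the example.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

def generate_keypair():
    """Today: RSA. Tomorrow: a PQ or hybrid scheme, with no caller changes."""
    return rsa.generate_private_key(public_exponent=65537, key_size=2048)

def encrypt_for_recipient(public_key, plaintext: bytes) -> bytes:
    """Application code calls this; the algorithm choice stays in one place."""
    return public_key.encrypt(
        plaintext,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None,
        ),
    )

private_key = generate_keypair()
ciphertext = encrypt_for_recipient(private_key.public_key(), b"session secret")
print("Ciphertext length:", len(ciphertext))
```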
Ryland believes we probably have 10 years to do this. He also believes that the industry is getting on with it and we are not in bad shape.
How is commercial off-the-shelf software doing?
Dealing with the in-house code that a lot of organisations have is always going to be difficult. However, talk to commercial off-the-shelf software (COTS) vendors, and the picture isn’t much better. Many don’t appear to see PQC as a priority. Without those vendors updating their software, it’s hard to see how enterprise developers can be pressured to update their own code.
Ryland commented, “A lot of COTS software is just calling the Linux or Windows cryptography APIs. It means that they are insulated from the detail of those inner workings. If I’m using libcrypto on Linux, or if I’m using the Windows equivalent, the OS vendor can upgrade the behaviour of those APIs. And my COTS software will just get better without me making any code changes. That probably covers a lot of what you’re concerned about. But Linux and Windows will have to have those features.
“Today, [crypto on] Linux is part of the OpenSSL project. We decided, some years ago, about the same time we decided to build our own TLS implementation, to build our own crypto implementation. Both TLS and libcrypto are hundreds of thousands of lines of code that no one understands, and it’s underfunded. It’s painful that, as an industry, we all rely on code from volunteers. It’s not that great.
“So we’re just replacing that, and we’re open-sourcing it, and we welcome people to join in. But if they don’t help us, don’t worry. It’s not going to be a community effort, we’re going to fund the whole thing, but we’ll give it away. So Linux is in pretty good shape.
“The other good news is that all of the browser vendors care a lot about this stuff. Mozilla and Chrome and all these guys have been pretty good about modernising their TLS and keeping up with the latest cryptographic standards. They’ll push that out through their browsers.
“So even if my OS is old, my browser will just be modern and do the right thing. As it does today. A lot of browsers won’t support TLS 1.0, even if the operating system does. They just refuse to negotiate it, and that saves users from worrying about it.”
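Application code can take the same line as the browsers and simply refuse to negotiate old protocol versions. Here is a minimal client-side sketch using Python’s standard ssl module, assuming nothing beyond the standard library; the hostname is a placeholder.

```python
# Minimal sketch: refuse to negotiate anything older than TLS 1.2 from the
# client side, regardless of what the peer or the OS defaults would allow.
# Uses only the Python standard library; example.com is a placeholder host.
import socket
import ssl

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # reject TLS 1.0/1.1 outright

with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        print("Negotiated protocol:", tls.version())
```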
Conclusion
This was an interesting conversation with Ryland. With Amazon Security Lake, there is a lot to be positive about. AWS has the ability to scale the data lake to cope with very large sets of incoming data. It also has the tools, services and hardware to ensure that the analysis and visualisation of that data are possible.
It will be interesting to see how it goes about dealing with data sources from vendors who are not OCSF compliant. Can it adapt its plans for zero-ETL to make it easy to import and integrate that data? If not, security teams will still have to do ETL but on a much smaller scale.
In terms of post-quantum cryptography, it’s interesting that AWS didn’t put forward a protocol. However, it has implemented Kyber and already has it running in production. In addition, its open-source implementation of Kyber is good news for developers. It allows them to add PQC to their own environments in their own time.