Adversarial AI: Cybersecurity battles are coming

According to the RAND Corporation, a longtime mainstay of the military establishment, “cyber warfare involves the actions by a nation-state or international organization to attack and attempt to damage another nation’s computers or information networks through, for example, computer viruses or denial-of-service attacks.”

During my recent CXOTalk conversation (episode #324) with Stuart McClure, one of the top cybersecurity experts in the world, the discussion turned to cyber warfare and adversarial AI (also known as offensive AI).

Stuart is the author of the highly respected book Hacking Exposed and CEO of the security firm Cylance. The company uses AI and machine learning, rather than predefined malware signatures, to prevent cyber-attacks.

McClure says that a battle between AI systems in cybersecurity is not here yet but will come in the next three to five years. He describes three elements necessary to build an AI system, including one designed to bypass other AI systems:

  1. The first is the data itself, which must be created somehow.
  2. The second is security domain expertise: the ability to know what makes an attack successful or unsuccessful, and to label all of those elements properly.
  3. The third is the actual learning algorithms and platform: a dynamic learning system that can do all of this very quickly.

He explains that gaining security domain expertise is the hardest of these three challenges. It is adversaries’ lack of that domain expertise that serves as the first line of defense against foreign powers developing AI systems capable of bypassing our defensive capabilities.

The conversation offers a glimpse inside the mind of one of the world’s experts on security and is well worth your time if this topic interests you.

Listen to the full 45-minute discussion and read the complete transcript on the CXOTalk site. Edited excerpts that focus on adversarial AI are below.

Are other AI systems waging war against your models?

That’s what we call adversarial AI, or offensive AI, as it’s sometimes called. I just call it AI versus AI. We have yet to see an adversary of any sophistication leveraging AI in the wild today to defeat AI.

We know that that’s coming. We certainly have anticipated it for many, many years. We have a team dedicated to adversarial AI research so that we are prepared for that type of technique being used against us.

It will happen. We know that. But for now, we haven’t seen it and we are very, very ready for that and have anticipated that for quite some time.

The way that we do that is we actually try to break our own models, our own AI. By trying to break our own AI, we’re anticipating how the adversary would try to break us as well. We do this in real time in the cloud, on thousands of computers in Amazon AWS. By doing that, we can predict and prevent new forms of adversarial AI attacks.
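Cylance has not published how its self-testing works. What follows is only a minimal sketch of the general idea of attacking your own model, and it rests on stated assumptions:

- The detector is a hand-set logistic-regression model. The weights `WEIGHTS`, `BIAS`, and `THRESHOLD` are hypothetical.
- The attacker uses a fast-gradient-sign-style evasion step. For a linear model, this step nudges each feature against the sign of its weight.
- Every feature can be shifted freely. Real malware usually cannot do this without breaking.

The red-team loop searches for the smallest perturbation budget that flips a malicious verdict to benign.

```python
import math

# Hypothetical detector: a hand-set logistic-regression model over three
# numeric file features. The weights are illustrative, not from any product.
WEIGHTS = [2.0, -1.5, 3.0]
BIAS = -1.0
THRESHOLD = 0.5  # scores at or above this are flagged as malicious


def score(x):
    """Maliciousness score: a linear model followed by a sigmoid."""
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def evade(x, epsilon):
    """Fast-gradient-sign-style perturbation for a linear model.

    The gradient of z with respect to x is just WEIGHTS, so stepping each
    feature by -epsilon * sign(w) lowers the score fastest when every
    feature may move by at most epsilon (an L-infinity budget).
    """
    return [xi - epsilon * math.copysign(1.0, w) for xi, w in zip(x, WEIGHTS)]


def smallest_evasion_budget(x, budgets):
    """Red-team our own model: return the first budget that flips the verdict."""
    for eps in budgets:
        if score(evade(x, eps)) < THRESHOLD:
            return eps
    return None  # the model held up against every budget we tried


malicious_sample = [1.0, 0.2, 0.8]
budgets = [0.1 * k for k in range(1, 11)]
print("original score:", round(score(malicious_sample), 3))
print("smallest evading budget:", smallest_evasion_budget(malicious_sample, budgets))
```

Once a small budget like this is found, one common response is to add the perturbed samples back as correctly labeled training data and retrain. That retraining step is omitted here.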

When will these attacks happen?

In probably the next three to five years, I absolutely believe we will start to see AI systems successfully bypassing other technologies (hopefully not ours, but possibly ours) and gaining a foothold. Right now, we are years and years ahead of the adversary because of this technique. I would say we’re at least three years ahead.

Now, that window might shrink. When it does, we will have a challenge. But again, we’re investing more research, more time, and more effort to make sure that we understand all of the different adversarial techniques, and building that knowledge into our improving learning models will ultimately keep us ahead of the bad guys.

How do you fight attacks from countries with great resources?

It takes three things to build a proper AI or a bypass AI model.

  1. The first is the data itself. That’s what you might call resources; at least the first part of it is the data, meaning the examples of what would bypass us. That has to be created somehow.
  2. Now, the second thing is the security domain expertise: the ability to know which attacks are successful and which are not, and to label all of those elements properly.
  3. Then the last is the actual learning algorithms and the platform that you use, the dynamic learning system that you’ve created to be able to do this very, very quickly.

You need all three elements.
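To make the three-part recipe concrete, here is a toy sketch in which each element is a separate input:

1. a small data set;
2. a labeling function standing in for domain expertise;
3. a learning algorithm, here a one-dimensional nearest-centroid classifier.

Everything in it is hypothetical, including the single feature, the `ground_truth` rule, and both labelers. It holds the data and the algorithm fixed and swaps only the labeler, which isolates the effect of the second element.

```python
from statistics import mean

# Hypothetical one-dimensional feature per sample (a single "suspiciousness"
# measurement). Real systems use far richer features.
train_data = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11]  # element 1: the data
test_data = [5.5, 6.5, 7.0, 8.0]


def ground_truth(x):
    """What actually makes a sample malicious in this toy world (assumed)."""
    return 1 if x > 6 else 0


def expert_label(x):
    """Element 2 with real domain expertise: labels match reality."""
    return ground_truth(x)


def novice_label(x):
    """Element 2 without expertise: only the most blatant samples get flagged."""
    return 1 if x > 9.5 else 0


def learn_threshold(xs, ys):
    """Element 3, the learning algorithm: 1-D nearest-centroid classification.

    The decision threshold is the midpoint between the two class means.
    """
    benign_mean = mean([x for x, y in zip(xs, ys) if y == 0])
    malicious_mean = mean([x for x, y in zip(xs, ys) if y == 1])
    threshold = (benign_mean + malicious_mean) / 2
    return lambda x: 1 if x > threshold else 0


def build_and_evaluate(labeler):
    """Combine all three elements, then score the model against reality."""
    labels = [labeler(x) for x in train_data]
    model = learn_threshold(train_data, labels)
    correct = sum(model(x) == ground_truth(x) for x in test_data)
    return correct / len(test_data)


print("expert-labeled accuracy:", build_and_evaluate(expert_label))
print("novice-labeled accuracy:", build_and_evaluate(novice_label))
```

With expert labels, the learner gets every test sample right. With the inexpert labels, the same learner misses the subtler malicious cases and gets only half right, even though the data and the algorithm are unchanged.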

A nation-state could absolutely provide the first and the third without much struggle. The second, the domain expertise problem, is an age-old issue. If you look across the entire security industry today and ask what percentage of people (let’s say adversaries in security) actually know how to find a zero-day, exploit it, and use it, just to take a simple example of something that’s quite complex, you’re probably talking about 0.1 percent of the hackers in the world.

Similarly, in the world of defense, the folks who can actually detect a zero-day, prevent it, and then clean it up are probably a similarly small group. We’re in the low single digits. Domain expertise is a much more difficult problem to scale. A large country (China, Russia, what have you) with a lot of resources at hand and a lot of smart people could certainly start to catch up, but it becomes a very difficult scale problem because humans are not easily scalable.

The real limit on resources, and on scaling them, is simply this domain expertise. Not everybody truly understands the core foundational problems of cybersecurity: how attacks are carried out and how to mitigate or prevent them. That becomes a real challenge because it’s a very complex, multidimensional field of both attack surface area and defense capabilities.

What mathematics do you use to model threats?

We’ve gone through many evolutions of our algorithms, and we use many different types of techniques. Right now, we’ve settled on two broad groups of techniques. The first is traditional deep learning algorithms like neural networks. That’s our primary go-to. But we also use more anomaly-based algorithms, such as Gaussian and Bayesian methods. It just depends on the use.
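McClure does not say how Cylance applies Gaussian methods. As an illustration only, here is a minimal sketch of a classic Gaussian anomaly detector. It fits a mean and variance per feature on normal activity, then flags new observations whose log-density falls well below anything seen in training.

Several things here are assumptions:

- the two features;
- the baseline values;
- the safety margin of 1.0.

Treating the features as independent is also a simplification.

```python
import math
from statistics import fmean, pvariance

# Hypothetical baseline of "normal" behaviour for one endpoint:
# (CPU percent, outbound connections per minute). Values are made up.
normal_activity = [(10, 2), (12, 3), (11, 2), (9, 1), (13, 3), (11, 1)]


def fit_gaussian(samples):
    """Fit an independent Gaussian (mean, variance) to each feature column."""
    params = []
    for column in zip(*samples):
        # Floor the variance so a constant feature cannot cause division by zero.
        params.append((fmean(column), max(pvariance(column), 1e-9)))
    return params


def log_density(params, x):
    """Sum of per-feature Gaussian log-densities (features assumed independent)."""
    total = 0.0
    for (mu, var), xi in zip(params, x):
        total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
    return total


params = fit_gaussian(normal_activity)
# Assumed policy: anything noticeably less likely than the least likely
# training sample is an anomaly. The margin of 1.0 is an arbitrary choice.
threshold = min(log_density(params, s) for s in normal_activity) - 1.0


def is_anomalous(x):
    return log_density(params, x) < threshold


print(is_anomalous((11, 2)))   # typical activity
print(is_anomalous((60, 40)))  # CPU spike plus a burst of outbound connections
```

The detector needs only examples of normal behavior, not labeled attacks. That is what distinguishes anomaly-based methods from the supervised approach described in the next answer.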

We’ve applied AI mathematics to, I think, over a dozen different features inside the technology today to catch all kinds of different attacks. How these algorithms work is really, really simple. You take a large data set. You then take the characteristics of all of that data. Then you feed the characteristics, along with the labels, into these learning algorithms, and they tell you which features are most predictive of a classification.
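The recipe McClure describes takes characteristics plus labels as input and produces predictive features as output. It can be sketched with plain logistic regression trained by gradient descent. This is an illustrative stand-in, not Cylance’s model, and the data set is synthetic:

- characteristics are encoded as +1/-1;
- `feature_0` matches the label exactly;
- `feature_2` agrees with the label three times out of four;
- `feature_1` is noise by construction.

The learned weight magnitudes then rank the features by how predictive they are.

```python
import math

FEATURE_NAMES = ["feature_0", "feature_1", "feature_2"]

# Hypothetical labeled data (1 = malicious, 0 = benign), 8 samples per class.
samples = []
for label in (1, 0):
    f2_values = [1, 1, 1, -1] if label == 1 else [1, -1, -1, -1]
    for f1 in (1, -1):  # same distribution in both classes: carries no signal
        for f2 in f2_values:
            samples.append(([2 * label - 1, f1, f2], label))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(data, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


weights, bias = train_logistic_regression(samples)
# Rank characteristics by the magnitude of their learned weight.
ranking = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
for j in ranking:
    print(f"{FEATURE_NAMES[j]}: weight {weights[j]:+.2f}")
```

Ranking by absolute weight is meaningful here only because every feature shares the same +1/-1 scale. With features on different scales, you would standardize them first or use a different importance measure.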

One of the best examples I give is this. I usually tell people, “Just look outside or out your window at people walking by on the street. Now I’m going to give you a challenge. Think of three qualities of each person walking by that would give you a high-probability detection that they are a man or a woman.”

Of course, this is a controversial topic, but something that is, I think, quite interesting to talk about. You could look at them and say, “Well, look, long hair tends to be predictive of women, but not necessarily. It’s maybe only 90 percent. Facial hair might be highly predictive of men. Not 100 percent, but maybe 90 percent.” Adam’s apple, clothes, you name it: there are all kinds of qualities that you would probably come up with as you start to look through this.

Now, take those three or four features, these characteristics, and plot them on a three-dimensional graph, or a four-dimensional graph if you have four qualities. Then apply these learning algorithms to that matrix in memory and start to learn from it.

What happens is that you keep training on each new sample, labeled this is a woman, this is a man, this is a woman, this is a man, and you pull out all these features. You’ll start to learn that, yes, truly, these characteristics (hair length, Adam’s apple, things like dress) are highly predictive of a man versus a woman. Now, that doesn’t mean it’s 100 percent, but if you learn from enough people around the world, you can probably get to 99.99 percent, and that’s the same kind of concept.
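The arithmetic behind combining several imperfect clues can be sketched with naive Bayes, one of the Bayesian techniques mentioned earlier. The sketch makes three assumptions, taken from or modeled on the example:

- each quality alone points to the correct class 90 percent of the time;
- the two classes are equally common;
- the qualities are independent given the class. This is the big simplifying assumption.

Under those assumptions, the posterior when k qualities all agree is 0.9^k / (0.9^k + 0.1^k).

```python
def posterior_after_agreeing_features(k, accuracy=0.9, prior=0.5):
    """P(class | k independent features all pointing to that class).

    Naive Bayes assumption: each feature points to the true class with
    probability `accuracy`, independently of the others given the class.
    """
    likelihood_true = accuracy ** k
    likelihood_other = (1 - accuracy) ** k
    evidence = prior * likelihood_true + (1 - prior) * likelihood_other
    return prior * likelihood_true / evidence


for k in range(1, 5):
    print(k, round(posterior_after_agreeing_features(k), 5))
```

Three agreeing 90-percent clues already give about 99.86 percent, and four give about 99.98 percent. Real characteristics are rarely independent, so in practice each extra feature adds less than this idealized calculation suggests.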

Instead of three or four classification features, we mapped over two million features. That’s how advanced machine learning and feature extraction have become in our world.
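The interview doesn’t say how those two million features are stored or scored. One widely used way to handle a feature space that large is the hashing trick, shown here purely as an assumption. It works in three steps:

1. Map each named characteristic to one of a fixed number of slots with a stable hash.
2. Keep only the non-zero entries in a sparse vector.
3. Compute a linear score over those entries.

The feature names and weights below are hypothetical.

```python
import hashlib

# Assumed size of the feature space: 2**21 = 2,097,152 slots, on the order
# of the "two million features" mentioned in the interview.
N_FEATURES = 2 ** 21


def feature_index(name):
    """Map a named characteristic to a slot with a deterministic hash.

    Python's built-in hash() is randomized per process for strings, so a
    stable digest is used instead.
    """
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % N_FEATURES


def vectorize(raw_features):
    """Sparse vector: only the slots this sample actually touches are stored."""
    vec = {}
    for name in raw_features:
        idx = feature_index(name)
        vec[idx] = vec.get(idx, 0) + 1  # colliding names simply add up
    return vec


def sparse_score(vec, weights):
    """Linear score computed over the non-zero entries only."""
    return sum(value * weights.get(idx, 0.0) for idx, value in vec.items())


# Hypothetical characteristics extracted from one file, plus made-up weights.
sample = ["imports:CreateRemoteThread", "section:.upx0", "strings:http"]
weights = {feature_index("imports:CreateRemoteThread"): 1.5,
           feature_index("section:.upx0"): 0.7}

vec = vectorize(sample)
print(f"{len(vec)} non-zero slots out of {N_FEATURES:,}")
print(f"score = {sparse_score(vec, weights):.2f}")
```

Each sample touches only a handful of slots, so scoring costs time proportional to the number of characteristics present, not to the two million dimensions. The trade-off is that unrelated names can occasionally collide in the same slot.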

(Cross-posted @ ZDNet | Beyond IT Failure)

Well-known expert on why IT projects fail, CEO of Asuret, a Brookline, MA consultancy that uses specialized tools to measure and detect potential vulnerabilities in projects, programs, and initiatives. Also a popular and prolific blogger, writing the IT Project Failures blog for ZDNet.