Role Mining, and why it’s a fantasy

Over the years I’ve had a play with a few role mining tools, and while I can’t claim that amounts to any kind of industry review, it has left me with the general feeling that the whole concept is a fantasy.

The main problem I have is that role mining assumes there is a logical structure out there, just waiting to be discovered. I don’t know about you, but the access management landscapes I see in my consultancy work are far too complex for that, with a long legacy of hand-built, partially scripted, migrated and superstitiously maintained groups. The very environments where you’d most like to use a role mining tool are the ones where it is least likely to give you anything sensible.

Recently I completed Andrew Ng’s online Machine Learning course, which I took out of pure interest (it was interesting, and I recommend it!). Role mining looks like a good fit for a Classification algorithm, and RBAC’s progeny “Adaptive Access Control” is a clear Anomaly Detection problem – but first, you’d have a massive job enumerating, cleaning up, manipulating and converting the data before it’s in a form the algorithms can use.
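As a rough illustration of the anomaly detection side, here is a minimal sketch of the simple density-estimation approach covered in that course: fit an independent Gaussian to each feature of “normal” behaviour, then flag any observation whose combined density p(x) falls below a threshold epsilon. The features (login hour and distinct resources accessed per day), the sample history and the threshold are all hypothetical, chosen only to show the mechanics.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(samples):
    """Estimate a (mean, variance) pair for each feature from normal behaviour."""
    params = []
    for column in zip(*samples):
        # Floor the variance so a feature that never varied can't cause division by zero.
        params.append((fmean(column), max(pvariance(column), 1e-6)))
    return params


def density(x, params):
    """p(x) as the product of independent univariate Gaussian densities."""
    p = 1.0
    for value, (mu, var) in zip(x, params):
        p *= math.exp(-((value - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p


def is_anomalous(x, params, epsilon):
    """Flag an observation whose density falls below the chosen threshold."""
    return density(x, params) < epsilon


# Hypothetical history for one user: (login hour, distinct resources accessed that day).
history = [(9, 12), (10, 15), (9, 11), (8, 14), (10, 13), (9, 12)]
params = fit_gaussian(history)

print(is_anomalous((9, 13), params, 1e-4))  # False: a typical working day
print(is_anomalous((3, 40), params, 1e-4))  # True: 3am and far more resources than usual
```

In practice the hard part is everything before `fit_gaussian`: getting clean, comparable feature values out of real access data.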

For Role Mining, there are a number of data types we could look at collecting:

1. Group Membership

This is by far the easiest to get but also pretty useless. Users belong to all sorts of groups they don’t need (or no longer need), and the common practices of “access cloning” and “access creep” make this inevitable. A Role Mining tool that looks at nothing other than group membership (and some of them only look at direct group membership!) is not going to help in your typically messy IT environment.
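To show why “direct only” is such a limitation, here is a minimal sketch that expands a user’s direct memberships through nested groups to find their effective membership. The two input dictionaries and the group names are hypothetical stand-ins for whatever a directory export would actually give you; the `seen` set also guards against the circular nesting that tends to turn up in older environments.

```python
from collections import deque


def effective_groups(user, direct_membership, nested_groups):
    """Return every group a user belongs to, directly or through nesting.

    direct_membership: user -> set of groups the user is a direct member of
    nested_groups:     group -> set of groups that group is itself a member of
    """
    seen = set()
    queue = deque(direct_membership.get(user, ()))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already expanded; also stops infinite loops on circular nesting
        seen.add(group)
        queue.extend(nested_groups.get(group, ()))
    return seen


# Hypothetical data, including a circular nesting back to Finance-Readers.
direct = {"alice": {"Finance-Readers"}}
nested = {
    "Finance-Readers": {"Finance-All"},
    "Finance-All": {"All-Staff", "Finance-Readers"},
}

print(sorted(effective_groups("alice", direct, nested)))
# ['All-Staff', 'Finance-All', 'Finance-Readers']
```

A tool that only reads `direct` would see one group for alice; the effective view shows three.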

2. What users have access to

This is much harder to enumerate. Firstly you have to look at many different types of systems and applications, all of which may express access control differently. Then you have to link these permissions back to users, both directly and through all their group memberships. This is possible to do, and there are products out there that do this sort of thing. However, while invaluable for security audit, it’s not the right data set for a Role Mining exercise.
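The “link these permissions back to users” step can be sketched roughly as follows, assuming each system’s access control has already been flattened into a hypothetical `resource -> {principal: permission}` table and the user’s groups have already been expanded through any nesting. It records which principal grants each permission, since that “via” information is what an audit usually needs. The resource names and permission labels are illustrative only.

```python
def effective_access(user, user_groups, acls):
    """Map each resource a user can reach to the (permission, granting principal) pairs.

    user_groups: user -> set of groups, already expanded through nesting
    acls:        resource -> {principal: permission}; a principal is a user or a group
    """
    principals = {user} | user_groups.get(user, set())
    access = {}
    for resource, entries in acls.items():
        for principal, permission in entries.items():
            if principal in principals:
                access.setdefault(resource, []).append((permission, principal))
    return access


# Hypothetical, already-normalised data from several systems.
user_groups = {"alice": {"Finance-Readers", "All-Staff"}}
acls = {
    "finance-share": {"Finance-Readers": "read"},
    "payroll-app": {"alice": "approve"},   # granted directly to the user
    "intranet": {"All-Staff": "read"},
    "hr-share": {"HR-Admins": "write"},    # alice has no path to this one
}

for resource, grants in effective_access("alice", user_groups, acls).items():
    print(resource, grants)
```

Note that this tells you what alice *could* do, not what she actually does, which is exactly the gap the next data type is meant to fill.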

3. What users are actually using

This is, to me, the real data you’d want. If we could monitor what users actually access over, say, three months, then we might be in a position to start discovering something worthwhile. With that fantasy data set I could run my Classification algorithm, supplementing the access history with work-role attributes such as business unit, job type and location. Now that could turn up interesting results!
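One way to picture that classification step is a k-nearest-neighbour classifier, swapped in here purely for illustration: combine each user’s observed usage with their work-role attributes into a set of tokens, then suggest a role for a new user by majority vote among the most similar labelled users (Jaccard similarity). The role labels, resource names and attributes are hypothetical, and the sketch assumes someone has already labelled a training set, which in a real environment is its own sizeable job.

```python
from collections import Counter


def features(used_resources, attributes):
    """Combine observed usage with work-role attributes into one set of tokens."""
    return set(used_resources) | {f"{key}={value}" for key, value in attributes.items()}


def jaccard(a, b):
    """Size of the overlap divided by the size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_role(candidate, labelled, k=3):
    """Majority vote among the k labelled users most similar to the candidate."""
    nearest = sorted(labelled, key=lambda pair: jaccard(candidate, pair[0]), reverse=True)[:k]
    votes = Counter(role for _, role in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (usage + attributes, role label).
labelled = [
    (features({"ledger", "payroll-app"}, {"unit": "Finance", "job": "Clerk"}), "Finance Clerk"),
    (features({"ledger", "payroll-app", "reports"}, {"unit": "Finance", "job": "Clerk"}), "Finance Clerk"),
    (features({"ledger", "reports", "budget-tool"}, {"unit": "Finance", "job": "Manager"}), "Finance Manager"),
    (features({"crm", "quotes"}, {"unit": "Sales", "job": "Rep"}), "Sales Rep"),
    (features({"crm", "quotes", "reports"}, {"unit": "Sales", "job": "Rep"}), "Sales Rep"),
]

new_user = features({"ledger", "payroll-app"}, {"unit": "Finance", "job": "Clerk"})
print(predict_role(new_user, labelled))  # Finance Clerk
```

The interesting output would be the disagreements: users whose usage pulls them towards a different role from the one their HR attributes suggest.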

But how would I gather all this utilisation data in the first place? Access logging would have to be ramped up to capture all access, across all systems of interest. That volume of logs would have to be captured and stored continuously, and beyond that we’d also have to contend with different logging formats, different ways of identifying users, and different ways of expressing access. There will also be applications where such detailed logging is unavailable or can’t be turned on for performance or security reasons. And what about reads? Many applications don’t log reads, but they might be a significant part of a user’s job.
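To give a flavour of the normalisation problem, here is a minimal sketch that maps two log formats onto one common `(user, resource, action)` record. Both formats, the identity mapping and the action vocabulary are hypothetical; a real estate would have dozens of sources, and the mapping tables themselves would need ongoing maintenance. Timestamps are parsed past but dropped here for brevity.

```python
import json
from typing import NamedTuple


class AccessEvent(NamedTuple):
    user: str
    resource: str
    action: str


# Hypothetical mapping from each system's user identifier to one canonical ID.
IDENTITY_MAP = {"CORP\\alice": "alice", "alice@corp.example": "alice"}

# Hypothetical mapping of system-specific verbs onto a common vocabulary.
ACTION_MAP = {"READ": "read", "WRITE": "write", "GET": "read", "POST": "write"}


def normalise(user, resource, action):
    """Unmapped identifiers and verbs pass through (verbs lower-cased) rather than being dropped."""
    return AccessEvent(IDENTITY_MAP.get(user, user), resource, ACTION_MAP.get(action, action.lower()))


def parse_file_server(line):
    """Assumed format: timestamp,DOMAIN\\user,ACTION,path (path may contain commas)."""
    _timestamp, user, action, path = line.strip().split(",", 3)
    return normalise(user, path, action)


def parse_web_app(line):
    """Assumed format: one JSON object per line with upn, op and url keys."""
    record = json.loads(line)
    return normalise(record["upn"], record["url"], record["op"])


events = [
    parse_file_server("2024-03-01T09:15:00,CORP\\alice,READ,/finance/ledger.xlsx"),
    parse_web_app('{"ts": "2024-03-01T09:20:00Z", "upn": "alice@corp.example", "op": "GET", "url": "/crm/accounts"}'),
]
for event in events:
    print(event)
```

Letting unknown identifiers pass through keeps the sketch short, but a real pipeline would need to flag them, since every unmatched account is a hole in the usage picture.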

OK, but you could still do it, right?

Here’s my final reason why I think Role Mining is a fantasy. The data collection strategy I’ve outlined above would be time-consuming, and not just in elapsed time – it would take a concerted effort from different people to increase logging, monitor performance, collate the logs and interpret the data, and that’s before you can even start doing the analysis. It’s a big job, and my general feeling is that the organisations that need this the most are also the least likely to agree to the expense. They will continue to be seduced by the “quick fix”, or the product with the flashy dashboard, or the vague notion that the cloud/AI/blockchain/flavour-of-the-month will fix it.

It would be an interesting project, though … maybe some customer will prove me wrong one day.
