Why I care so much about identity data quality

I feel like I’m always trying to convince people that the quality and maintenance of identity data is important and worth putting effort into, while they nod and say “sure, sure” and think “this crazy lady knows nothing about reality”. But you know what? I’m not crazy – and here are some reasons why.

Do you know who all your users are?

It’s pretty easy to count the number of user accounts you have, and you may even be able to check when they last logged in, but do you actually know who they all are, including external people such as contractors? It amazes me how many organisations allow essentially unregistered externals a high level of access to their systems, including privileged account access and a corporate email address. Yes, someone “signed it off” at the time – but that sort of thing can be almost impossible to correlate later on, especially if you’re looking at hundreds or even thousands of contractor accounts.

This also links to the question of immutable identifiers. It’s one thing to know who your users are in a single system – but do you know across all systems? Can you itemise all of someone’s accounts and access? This might be possible to track down for one person – but what if you have to report on all users (while the auditor is looking over your shoulder)?
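
To make the point concrete, here is a minimal sketch of what itemising accounts by an immutable identifier could look like. Everything in it is an assumption for illustration: the `person_id` field, the system names and the sample accounts are all hypothetical, and a real environment would pull these records from each system’s own export.

```python
from collections import defaultdict

# Hypothetical account exports from several systems. "person_id" stands in
# for an immutable identifier issued by a trusted source (e.g. HR).
accounts = [
    {"system": "AD", "account": "jsmith", "person_id": "P001"},
    {"system": "CRM", "account": "john.smith", "person_id": "P001"},
    {"system": "VPN", "account": "jsmith2", "person_id": "P001"},
    {"system": "AD", "account": "contractor7", "person_id": None},
    {"system": "CRM", "account": "mlee", "person_id": "P002"},
]


def correlate(accounts):
    """Group accounts by person, and list accounts with no known owner."""
    by_person = defaultdict(list)
    unowned = []
    for acct in accounts:
        entry = (acct["system"], acct["account"])
        if acct["person_id"]:
            by_person[acct["person_id"]].append(entry)
        else:
            unowned.append(entry)
    return dict(by_person), unowned


by_person, unowned = correlate(accounts)
for person_id, owned in sorted(by_person.items()):
    print(person_id, owned)
print("No known owner:", unowned)
```

With the identifier in place, the “all accounts for one person” question is a lookup, and the accounts nobody can vouch for fall out as their own list. Without it, the same report means matching on names and guesswork.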

Do you know enough about your users?

One of the big problems here is that the definition of “enough” keeps expanding.

How often have you logged into some social media site, cloud service or website lately, only to have it ask you for extra information about yourself – perhaps a mobile phone number for password reset authentication? New features and functionality often mean information is just expected to be there, but because it wasn’t needed in the past no one bothered collecting it. I have seen this countless times with on-prem applications: someone has been sold on the great new feature that automates manager approval, only to discover that the manager attribute in AD is only sporadically populated, not trustworthy, and there’s no single, matchable source to get it updated – especially for those pesky externals. Suddenly the great new feature is only available if an arduous data cleanup operation is conducted first – something no one budgeted for.

Now IAM can’t magic non-existent data out of thin air – but at least if you know who your users are, and can link them to source data using proper identifiers, you can get that information from where it lives or, at the very least, come up with a workable plan to get the information.
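
As a rough illustration of the manager example, the sketch below checks which users are missing a manager and separates the gaps that a trusted source can fill from those that need a plan. The data, the field names and the `hr_managers` lookup are all hypothetical; this is not how any particular IAM product does it.

```python
# Hypothetical directory export; "manager" holds the manager's person_id.
directory_users = [
    {"person_id": "P001", "name": "J Smith", "manager": "P010"},
    {"person_id": "P002", "name": "M Lee", "manager": None},
    {"person_id": "P003", "name": "A Khan", "manager": ""},
    {"person_id": None, "name": "contractor7", "manager": None},
]

# Hypothetical trusted source (e.g. an HR extract): person_id -> manager.
hr_managers = {"P001": "P010", "P002": "P010"}


def manager_gaps(users, source):
    """Split users with no manager into fixable and needs-a-plan lists."""
    fixable, needs_plan = [], []
    for user in users:
        if user["manager"]:
            continue  # already populated
        pid = user["person_id"]
        if pid and pid in source:
            fixable.append((user["name"], source[pid]))
        else:
            needs_plan.append(user["name"])
    return fixable, needs_plan


fixable, needs_plan = manager_gaps(directory_users, hr_managers)
populated = sum(1 for u in directory_users if u["manager"])
print(f"Manager populated for {populated} of {len(directory_users)} users")
print("Can be filled from source:", fixable)
print("Needs a plan:", needs_plan)
```

The “needs a plan” list is the part no one budgets for: users with no identifier to match on, or with no record in the source at all.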

Spot the bad

Poor quality identity data leads directly to false alerts, errors and general “noise” that can easily mask genuine problems. If we can expect at least a certain baseline then abnormalities will be easier to spot. This is not just about security risks, but also being able to proactively fix issues before a whole lot of user and support time has to be wasted.

Provision, don’t Migrate

Continuous change is part of IT. Whether it be a replacement of an existing application, a migration to a cloud service, a merger or divestiture, we are regularly having to create new user accounts and provision new access. It would be interesting to get stats on what percentage of corporate user accounts are still created and permissioned by hand, but I’m prepared to bet it’s still the majority.

The other thing about these points of change is that they always seem to happen in such a rush. The licenses have been paid for, the service is up, we want the users in there NOW... so what happens? All the accounts get migrated from somewhere else – the application being replaced perhaps – along with all the poor quality data and accounts of dubious origin that were in the old system. I want to see a time where the user accounts in the new system are provisioned from trusted source data because it’s actually quicker and easier to do it that way. If you’ve had that experience then your work in cleaning up IAM data has just well and truly paid off!

We’ll clean it up later

Now who’s talking crazy! If you don’t have time to clean bad data now you won’t have time later either.

Data cleaning often falls into the “important but not urgent” quadrant, making it of no interest to the type of management that can only get interested in “urgent” (even when the addendum to that is arguably “but not important”). Unfortunately, sooner or later your poor quality identity data will either cause something urgent (a security breach) or lead to more shortcuts (migrating or ignoring bad data) that will only prolong and multiply the issues.

In summary

I know it’s boring (believe me, I really do) but I also know that true governance and management of identity data is one of those essential things, like patching servers, or keeping your house clean – it might seem easier today to skip it, but if you never get round to it the cost will be manifestly higher in the long run.

2 Replies to “Why I care so much about identity data quality”

Jef Kazimer says:

August 5, 2015 at 3:28 am

Another important aspect of data quality is that poor data can lead to inadvertent access. Computed Groups, SAML Claims, Kerberos Claims, can all be used to provide access based on metadata of a user. If that data is free form, not managed, and not maintained, users may be missing access they need to do their role, or worse, given access that is not appropriate for their role. As IT transitions from the “we maintain control of services” to “we connect to services”, the quality of the identity data is more important than ever. This is where a good Identity management strategy which includes a data management strategy is even more important! If you don’t know the source of the data, can you trust it’s the correct data to be used with access decisions?

Carol says:

August 5, 2015 at 5:59 am

That’s a great point Jef, and I think it’s another example of data being used (or needed) for new uses that didn’t exist previously – while at the same time no one budgets for actually doing the clean-up and the ongoing data maintenance to ensure the new functionality. People seem to believe that just because the AD schema contains a “department” attribute it will somehow be magically populated and correct.