FIM Best Practice: Use the best Data Sources

FIM is all about data. It’s identity data, sure – but it’s still just data. And it needs to come from somewhere.

Typically we will have multiple sources of data coming into FIM, but as with everything, there are good and bad ways to manage this.

Good vs Bad data sources

Not all data sources are created equal. Ideally we want to get as close as possible to data that is maintained by the people who are most interested in its correctness. So for example:

  • Employee information in the HR database,
  • Cost Codes in the Purchasing system,
  • Location details in the Asset Management system.

Data is entered haphazardly when there is no real penalty for it being wrong. Perhaps telephone numbers are put in AD as a convenience to users, but if they’re wrong the phones don’t break, and they only get changed when someone complains. So here AD is a bad source for telephone numbers.

Object sources

An object source is allowed to project a new object into the FIM Metaverse. It says “this object can exist as a unique identity in the IdM metadirectory”.

We should have only one object source per object category.

Note I’m not saying object type. It could be perfectly valid for person objects to be projected from HR and the Student Roster and the FIM Portal, being respectively Employees, Students and Contractors.

You get into trouble when the same identity can be projected from multiple sources. I have seen environments where people only show up in HR after they start work, so both AD and HR are declared as projection sources. This is a ludicrous design: you can’t guarantee reliable join criteria, and all you’ve done is build yourself a duplicates generator.
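
To make the duplicates problem concrete, here is a minimal sketch in plain Python. It is not FIM code and does not use any FIM API; the `Record` class, the `sync` function and the join on `employee_id` are all assumptions made up for illustration. It models an AD account created before the person exists in HR, so the AD record has nothing reliable to join on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A record arriving from a connected system (hypothetical model)."""
    source: str
    name: str
    employee_id: Optional[str] = None


def sync(records, projection_sources):
    """Join each record on employee_id, or project it if its source is allowed to.

    Returns the toy 'metaverse' as a list of objects, where each object is
    the list of records that have been joined to it.
    """
    metaverse = []
    for rec in records:
        match = None
        if rec.employee_id is not None:
            for obj in metaverse:
                if any(r.employee_id == rec.employee_id for r in obj):
                    match = obj
                    break
        if match is not None:
            match.append(rec)          # reliable join: same employee_id
        elif rec.source in projection_sources:
            metaverse.append([rec])    # no join found: project a new object
        # otherwise the record simply waits, unjoined, for a later run
    return metaverse


# The AD account is created first, before HR knows about the person.
records = [
    Record(source="AD", name="J. Smith"),
    Record(source="HR", name="Jane Smith", employee_id="1001"),
]

two_sources = sync(records, projection_sources={"HR", "AD"})
one_source = sync(records, projection_sources={"HR"})
print(len(two_sources), len(one_source))
```

With both AD and HR allowed to project, the same person ends up as two objects. With HR as the only object source there is one object, and the AD account stays unjoined until it carries something it can be matched on.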

Attribute sources

Again we should aim for one attribute source per attribute, per object category.

Say we have multiple email environments for different businesses within a conglomerate. Naturally each should be able to contribute email address – but only for people from that business.

While it’s possible to write all sorts of complex precedence rules, it is usually a bad idea. You’re weakening your system to accommodate poor data practices, and in the end no one will thank you for it.
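
As a rough illustration of “one attribute source per attribute, per object category”, the sketch below uses plain Python rather than anything FIM provides. The business names, the mail system names and the `flow_email` helper are all hypothetical. The point is that the rule is a simple lookup, not a stack of precedence logic.

```python
# Hypothetical mapping: for each business (object category), the single
# mail system allowed to contribute the email attribute.
EMAIL_SOURCE_BY_BUSINESS = {
    "Retail": "RetailExchange",
    "Mining": "MiningMail",
}


def flow_email(person, contributing_source, value):
    """Return a copy of person with email set, but only if the contributing
    source is the designated one for that person's business."""
    allowed = EMAIL_SOURCE_BY_BUSINESS.get(person["business"])
    if allowed == contributing_source:
        return {**person, "email": value}
    return person  # contribution ignored: wrong source for this category


jane = {"name": "Jane Smith", "business": "Retail", "email": None}

ignored = flow_email(jane, "MiningMail", "jane@mining.example")
accepted = flow_email(jane, "RetailExchange", "jane@retail.example")
print(ignored["email"], accepted["email"])
```

Each mail environment still contributes email address, but only for its own people, so there is never a question about which value wins.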

Clearly communicate the various data sources

People are going to have to get used to updating values “at the source”. If a user’s name is misspelt in the GAL there’s no point changing it in AD when FIM is updating the name from HR. It needs to be fixed in HR. This idea will take a while to filter through, but the good thing is, once FIM has changed a value back a few times, people usually get the idea.
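
The “FIM changed it back” behaviour can be sketched in a few lines of plain Python. This is a toy model under my own assumptions, not how the sync engine is actually implemented; `export_pass` and the dictionaries are invented for the example.

```python
def export_pass(authoritative, target, attributes):
    """Overwrite target values that differ from the authoritative source.

    Returns a dict of {attribute: (old_value, new_value)} for everything
    that was corrected, which is handy for reporting.
    """
    corrections = {}
    for attr in attributes:
        if target.get(attr) != authoritative.get(attr):
            corrections[attr] = (target.get(attr), authoritative.get(attr))
            target[attr] = authoritative.get(attr)
    return corrections


hr = {"displayName": "Jon Smyth"}   # misspelt at the source
ad = {"displayName": "Jon Smyth"}

# Someone "fixes" the name directly in AD...
ad["displayName"] = "John Smith"
first = export_pass(hr, ad, ["displayName"])
reverted_name = ad["displayName"]   # ...and the next run puts it back

# The fix only sticks once it is made in HR.
hr["displayName"] = "John Smith"
second = export_pass(hr, ad, ["displayName"])
print(reverted_name, "->", ad["displayName"])
```

The first pass undoes the manual AD edit because HR still holds the misspelt value. The second pass carries the corrected name through once HR itself has been fixed.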

Got something to add? Disagree? Comments are open!

4 Replies to “FIM Best Practice: Use the best Data Sources”

  1. If there is no ‘HR’ database as such, but the customer wants to use FIM to manage identity with Active Directory, would it be better to create a data source (i.e. a SQL database), populate it with the information and feed that data source into FIM? I.e. would this be better than using the FIM Portal itself as the ‘point of truth’ for data?

  2. In reality we always have to make compromises. You can populate the Portal from AD and then use it as a SOT from then on – I’ve been forced down that track myself with a current project I’m working on. But it’s not ideal. People are creating duplicates, and they’re not following the rules to deactivate people properly. I’m having to build a whole lot of data monitoring and reporting around it to alert to inconsistent data. Access to HR in this environment would only give us info about 60% of the users – but it’s better than nothing and we’re working through the politics now.

    So do what you need to do to get the project moving, but keep in mind that best practices like this one are there to be aimed for, because they really do save trouble in the long run.

  3. I constantly state to the HR department that data must be consistent and meet current business rules. It always falls on deaf ears.

  4. It is a struggle. I always put clearly stated assumptions at the top of my design: “Data source from HR is assumed to be correct”, “The HR database does not contain duplicate Person records”… It doesn’t mean it won’t but at least you’ve made it clear the assumptions under which you’ve done the design. I don’t even think it would be possible to do an IAM design without these assumptions…
