Anyone who has spent time on Identity and Access Management programmes will have confronted the challenge of designing and mining roles and access profiles that stay relevant throughout their lifecycle. It is a notoriously sticky end of any Identity and Access Management initiative and often gets abandoned or overlooked, partly because of its sheer complexity, but also because it is not really an IAM issue per se.
Enterprises deploy applications to support the delivery of their business processes, and these processes, applications and the access profiles within them have individual lifecycles of their own. Just because you are managing one of these components effectively does not mean the other wheels churn in tandem. Changes in business priorities, “transformation” programmes, re-platforming and the continuous updates triggered by evolving business and security requirements ratchet up a debt of the access control variety. Too often, programmes do not bother addressing this last mile. While champagne bottles are popped to celebrate the conclusion of a complex change or transformation programme, the wasteland of unnecessary access profiles registers a few hundred or thousand more entries, too often left with standing privileges.
Over the past few years, the adoption of PaaS has thankfully led to the progressive adoption of “just in time” access for infrastructure, increasingly embedded into working practices, tools and pipelines. Logic would dictate that we could extend these concepts to enterprise applications and access profiles. Save for some development in attribute-based access control, which allows for fine-grained access together with improved parameterisation, the fact remains that these parameters and attributes must still be maintained for relevance over time.
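To make that maintenance burden concrete, here is a minimal sketch of an attribute-based check in Python. The attribute names (department, data_classification, process) and the rule itself are purely illustrative assumptions; the point is that the policy is only as good as the attributes feeding it, and those attributes drift unless something keeps them current.

```python
from datetime import datetime, time

# Hypothetical ABAC rule: active finance users may access "confidential"
# order-to-cash data during business hours. The logic is trivial; keeping
# the attributes accurate over time is the hard part.
def is_access_permitted(user: dict, resource: dict, now: datetime) -> bool:
    business_hours = time(8, 0) <= now.time() <= time(18, 0)
    return (
        user.get("department") == "finance"
        and user.get("status") == "active"
        and resource.get("data_classification") == "confidential"
        and resource.get("process") == "order-to-cash"
        and business_hours
    )

print(is_access_permitted(
    {"department": "finance", "status": "active"},
    {"data_classification": "confidential", "process": "order-to-cash"},
    datetime(2024, 5, 1, 10, 30),
))
```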
In an ideal world, a mapping of enterprise-critical processes, extricated from a BPM or process-mining solution, can be linked to the mission-critical data that underpins those processes. Data, based on its criticality labels, could then be mapped to applications and access profiles, such that any change in the process triggers real-time, downstream “self-healing” or remediation activity. In this ideal world, the application owner should be able to pull together a dashboard that lets them navigate to any remediation activity in terms of the critical data access profiles, their composition, description and function. They should also be able to isolate outlier assignments, whether human or otherwise. How far are we from this ideal world?
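As a thought experiment, the process-to-data-to-application-to-profile chain can be expressed with a handful of records. The sketch below is my own illustration; the entity and field names are assumptions, not any product's schema, but they show how a change at the process level could surface profiles that deserve remediation rather than standing privileges.

```python
from dataclasses import dataclass, field

# Illustrative entities for the process -> data -> application -> access profile chain.
@dataclass
class DataAsset:
    name: str
    criticality: str  # e.g. "high", "medium", "low"

@dataclass
class AccessProfile:
    name: str
    assignments: list[str] = field(default_factory=list)  # human or service identities

@dataclass
class Application:
    name: str
    data_assets: list[DataAsset] = field(default_factory=list)
    access_profiles: list[AccessProfile] = field(default_factory=list)

@dataclass
class BusinessProcess:
    name: str
    applications: list[Application] = field(default_factory=list)

# A change to the process (say, an application retired from "Order to Cash")
# would flag its orphaned or unassigned profiles for review.
def profiles_without_assignments(process: BusinessProcess) -> list[str]:
    return [
        f"{app.name}:{profile.name}"
        for app in process.applications
        for profile in app.access_profiles
        if not profile.assignments
    ]
```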
The primary component of this solution is what I call the “Enterprise Data Model” (EDM). The EDM, in my mind, is the mapping of critical business processes to information assets and access profiles. It is an enterprise artefact and not platform or product dependent. On the killer assumption that a rudimentary model can be built using machine learning clustering and classification algorithms, I can then start to use the EDM as the source of truth for my far-from-perfect access profiles, whether they sit in an Identity Governance solution, Entra ID permission groups or elsewhere. Having reached this point, we can train the model for a number of use cases based on personas: Auditors, InfoSec, Process Owner, Application Owner, Data Owner, End User, Role Owner. The final frontier would be a generative interface, achieved either through RAG with the OpenAI playground or through API integration with other LLMs. The auto-heal processes would follow, with options either to let the updates happen by themselves or to require intervention by the respective personas. Snap back from that dream now. Reality bites.
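On that clustering assumption, a rudimentary grouping of entitlements into candidate profiles might start with something like the sketch below. The use of scikit-learn's KMeans over a one-hot identity-by-entitlement matrix is my own assumption for illustration, not a reference implementation, and the data is random.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are identities, columns are entitlements (1 = assigned). Dummy data only.
rng = np.random.default_rng(42)
assignment_matrix = (rng.random((200, 15)) > 0.7).astype(int)

# Cluster identities with similar entitlement patterns into candidate access profiles.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
candidate_profiles = kmeans.fit_predict(assignment_matrix)

# Each cluster centre hints at which entitlements define the candidate profile.
for idx, centre in enumerate(kmeans.cluster_centers_):
    common_entitlements = np.where(centre > 0.5)[0]
    print(f"Candidate profile {idx}: entitlements {common_entitlements.tolist()}")
```

In practice the feature set would need far more dimensions (application, data criticality, process context), which is exactly where the tagging and training questions discussed below come in.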
Baby steps. With my fledgling skills in Python, I was able to feed an LLM, via the OpenAI playground, context derived from a mimicked EDM (dummy data), aided by functions that support NLP tasks, all in under 30 lines of code. Besides answering questions like “Can you list the number of applications without a single assignment under the Order to Cash process?”, for levity's sake it was also able to write me a haiku on Enterprise Architecture!
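Something in the spirit of that experiment might look like the sketch below, which stuffs a dummy EDM into the prompt and asks the question. It assumes the official openai Python client and an OPENAI_API_KEY in the environment; the model name, the EDM structure and the prompt framing are illustrative assumptions rather than a reproduction of the original code.

```python
import json
from openai import OpenAI  # assumes the official openai Python package

# Dummy EDM snippet: a process mapped to applications and their access profiles.
edm = {
    "Order to Cash": {
        "SAP ECC": {"AR_CLERK": ["u123", "u456"], "AR_SUPERVISOR": []},
        "Salesforce": {"SALES_OPS": ["u789"]},
        "Legacy Billing": {"BILLING_ADMIN": []},
    }
}

client = OpenAI()  # expects OPENAI_API_KEY in the environment

question = ("Can you list the number of applications without a single "
            "assignment under the Order to Cash process?")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "Answer questions using only the Enterprise Data Model below.\n"
                    + json.dumps(edm, indent=2)},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```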
Questions around the tagging overheads, the multi-dimensionality and the training data sets loom. Performance costs are an important consideration. With a limited set of 1,200 records of dummy data ingested into a vector DB, it took 60 seconds to churn out results, which in my view is high. Although we are not talking about the hundreds of billions of tokens and parameters that LLMs are trained on, for enterprises of global scale I anticipate the parameters to run into the high thousands. The final frontier remains the “auto-heal” part, together with the security implications of exposing such a solution to the various personas. As a reality check, the working assumption is that building such a model would itself take multiple iterations, with probabilistic models throwing out sub-par results during the initial iterations. Overall costs are something to ponder over as well.
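For the vector DB leg, a sketch of the ingestion-and-query timing might look like the following. The choice of chromadb, its default embedding function and the record format are my assumptions for illustration, not a description of the original setup; the value is simply in measuring where the latency actually sits.

```python
import time
import chromadb  # assumes the chromadb package; any vector DB would serve

client = chromadb.Client()
collection = client.create_collection("edm_records")

# 1,200 dummy EDM records: process | application | profile | assignment count.
documents = [
    f"Order to Cash | App-{i % 40} | PROFILE_{i} | assignments={i % 3}"
    for i in range(1200)
]
collection.add(documents=documents, ids=[f"rec-{i}" for i in range(1200)])

# Time a retrieval to see how much of the 60 seconds is search versus generation.
start = time.perf_counter()
results = collection.query(
    query_texts=["applications without a single assignment under Order to Cash"],
    n_results=10,
)
print(f"Query took {time.perf_counter() - start:.2f}s")
print(results["documents"][0])
```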
For the initiated, it does not need repeating why the above work is necessary and why we cannot solve this equation with point solutions. But the promise here is not just IAM-related; it feeds into a number of other enterprise use cases that link to Enterprise Architecture, Data Security, Configuration and Change Management.
I would be keen to hear from people within my network or otherwise who have been working on this or similar use cases. There is still a long way to go, and a meeting of minds is in order!