The Risk Matrix is everywhere: in every risk management procedure, in JSEA and HAZOP documents, and on Start cards. The only way, apparently, that risk can be identified at all levels in an organisation is to use the risk matrix.
The world is very complex, and the future nigh unknowable. So, I use The Matrix to simplify how I make a decision, so that I’m able to make a decision at all. However, just because it’s a cognitively economical tool doesn’t make it a good one. When I’m using the Risk Matrix it often feels like I’m using a blunt object when I really need a precision (or organic) tool. But I don’t know what else to do, and so I just keep on using it!
The Matrix asks me to think about what could happen, in terms of the consequences of future events, and whether they will actually happen, in terms of assessing likelihood. Estimating consequence is often not too tricky; I can say with relative ease how bad (or good) something is. For example, a fatality is worse than a broken arm, which is worse than a cut finger. Or losing $10 million is worse than losing $5 million. But likelihood is underspecified. The Matrix shapes how I should think about likelihood in terms of guideline relative frequencies, words like “possible” or “unlikely”, or percentages representing chance. But how do I use these concepts when I am faced with assessing likelihood in my day-to-day workplace? Just because these conceptions of likelihood are everywhere, because The Matrix is everywhere, doesn’t mean that they should be.
What I mean is that there are certain conceptions of likelihood that make sense at the management level but are far less useful on the workshop floor. Because of the ubiquity of the Risk Matrix, though, people on the ground have no other guidance or mandate, and are instead forced to think about likelihood in terms of, for example, relative frequencies: that is, to assume that however often something happened in the past is how often it will occur in the future.
But does it matter if likelihood is thought about the same way by management and by operators? To explore this idea, let’s examine four ways to interpret what probability, or likelihood, means (from de Elía and Laprise, 2005); a small simulation sketch follows the list:
- Classical – ratio of favourable events to the total possible number of events. Linked with the symmetry or equiprobability between events occurring. For example, predicting that the probability of getting a 6 when rolling a die is the same as getting a 1 because of the symmetrical faces of a cube.
- Frequentist – counting the number of times an event occurs out of the total number of possible times the event could have occurred. For example, predicting that the probability of getting a 6 when rolling a die is 1/6, because out of 6,000,000 test rolls, I have gotten a 6 about 1,000,000 times.
- Subjective – the degree of belief of a person that an event will occur. Linked with experience and intuition. For example, I believe that when I roll a die I will get a 6 about 1/6 of the time, based on my experience of the ‘rolling a die’ scenario. Another person, who has had a different experience of rolling a die (say their die was slightly weighted!), may predict a different outcome.
- Propensity (Causal) – the degree to which the occurrence of an event is determined by its causes. For example, predicting that I will get a 6 when rolling a die because, given the initial state of the die in my hand, the angular momentum it was rolled with, the height from the table and other relevant factors, these conditions are, causally speaking, sufficient to produce an outcome where the 6 is face up on the table.
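To make the contrast between the four interpretations concrete, here is a minimal Python sketch; the `roll_outcome` “physics” is an invented toy, not a real model of dice dynamics:

```python
import random

# Classical: a symmetry argument -- six equivalent faces, so P(6) = 1/6.
p_classical = 1 / 6

# Frequentist: count how often a 6 comes up over many trials and take
# the relative frequency.
n = 600_000
rolls = [random.randint(1, 6) for _ in range(n)]
p_frequentist = rolls.count(6) / n

# Subjective: a degree of belief, here simply asserted from experience.
# Someone whose die was slightly weighted might hold a different belief.
p_subjective = 0.16

# Propensity: the outcome as a (grossly simplified) deterministic function
# of the causal setup -- initial orientation, throwing force, bounce.
# Given full knowledge of the conditions, "probability" collapses toward 0 or 1.
def roll_outcome(orientation: float, force: float, bounce: float) -> int:
    state = orientation + 3.7 * force + 1.9 * bounce  # toy dynamics, not physics
    return int(state * 1000) % 6 + 1

p_propensity = 1.0 if roll_outcome(0.42, 1.3, 0.8) == 6 else 0.0

print(f"classical   P(6) = {p_classical:.4f}")
print(f"frequentist P(6) = {p_frequentist:.4f}")
print(f"subjective  P(6) = {p_subjective:.4f}")
print(f"propensity  P(6) = {p_propensity:.1f}  (for these exact conditions)")
```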
Hindsight and Foresight
The four interpretations above can be classified according to whether they appeal to hindsight or foresight. The classical and propensity views lead people to seek foresight about an event that could happen in the future: the classical view makes use of a theoretical assumption about symmetry to do so, while the propensity view asks us to think about the causation built into the system to determine whether something is going to happen. In contrast, the frequentist and subjective views ask us to make use of our hindsight, whether calculated (frequentist) or intuited via experience (subjective).
From a management point of view, the frequentist view is wonderful. It’s all about counting numbers of events, which is easy and gives something so delightful – certainty. Representing the world in numbers feels very clear and concise. It abstracts the world into a world of symbols that I have control over, because deep down I know I don’t have control over the world, even though I want to. With the frequentist approach we appeal to hindsight, which we know gives us a kind of 20/20 vision of the past: “50 events of this type occurred each month for the last 20 years, so there is a high chance this will continue into the future”.
In a sense, isn’t this quite a static view of how real systems work? Isn’t it the case that in the minutiae of the lead-up to every event, even one that looks the same as a previous event of the same type, there are tiny differences that may contribute to a different outcome? For example, each time a die is thrown, the start point in the hand may be slightly different; the amount of force or angular momentum it is thrown with will be slightly different; the place on the table it first hits, and the part of its surface it first hits with, will be slightly different. The tiny differences (and they may not be tiny!) in each of these factors all contribute to the final outcome of the roll, despite the fact that there can be only one of six outcomes. In short, which number shows up actually depends on all of these factors, and the symmetry between the faces of the die is just one such factor. The classical approach therefore gives an estimate of what the outcome could be; so does the frequentist, and in this case both are much more useful than the subjective view. But even if we cannot compute it, the final state of the die is determined by these causal factors and physical processes.
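A quick way to see this sensitivity is to perturb the toy model from the sketch above (again, the dynamics are invented purely for illustration):

```python
# Tiny perturbations in the throw conditions can flip the face that lands up,
# even though the long-run frequency of each face stays near 1/6.
def roll_outcome(orientation: float, force: float, bounce: float) -> int:
    state = orientation + 3.7 * force + 1.9 * bounce  # invented toy dynamics
    return int(state * 1000) % 6 + 1

base = roll_outcome(0.42, 1.3, 0.8)
perturbed = roll_outcome(0.42, 1.3 + 1e-3, 0.8)  # a fraction more force
print(base, perturbed)  # different faces from a near-identical throw
```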
Where this takes us is to the question: aren’t all events fundamentally unique? Consider the die rolling above: in reality there are many, many little factors that contribute to the final outcome. The frequentist view ignores these and just counts the final outcomes; the diversity and depth present in real systems are not considered. If one were to approach the die rolling from a propensity viewpoint, a set of questions would no doubt emerge about how each little factor contributes to the outcome.
To bring this back to management and operators: management asks workers to apply hindsight thinking to foresight problems. Counting the occurrence of similar events gives an indication of what the past has been like for that type of event, but says nothing about the fine details and causal structures that may have led up to unique events. If every day presents new, unique events for workers to face, they should be thinking about these fine details, as far as their perspectives allow, in order to work out whether these particular unique events could occur. And if they are always facing unique events, even ones that look a bit like previous events, then applying frequentist thinking will not be appropriate.
So why does management insist upon using numbers to guide how their operations work? It’s because they’re looking for patterns – patterns to highlight where potential problems, or solutions, may be. And they are right to do so, since from their point of view management can see patterns across a whole company that no one else can, such as how many hand injuries are happening on one site compared to another. Such a pattern prompts the question, “what is different about these two sites?”, and may lead to crucial details emerging about what to do.
And if applying frequency rates is inappropriate for workers facing unique events, what should they do? They should look for patterns – but different patterns from the ones management looks for. Workers should perhaps be driven to look for causal patterns, taking advantage of the many pieces of information available to them. For example, if I’m going to move a forklift out of the way so I can access an electrical panel that needs maintenance, I should think about how the forklift moves, to avoid hurting myself, others or equipment; I should think about where the forklift should be placed, since there are 20 other people working in this shed too; and I should think about who needs the forklift back in the place I’m moving it from, and when they need it. There is much data, and there are many patterns, that people see every second and use to estimate the likelihood of an event based on the occurrence of its (possible) causes. This is how experience and intuition are developed: by going through this process many times over. It is also how differences from familiar situations are identified, and how unique events can be dealt with.
But the pattern-connection between managers and operators goes further than this. Managers (as opposed to management – the organisation level), just like operators, don’t naturally use the words likelihood, consequence or risk, in the risk matrix sense, when making decisions about these concepts in their everyday work environment. They naturally do think about patterns in terms of what could happen (consequence) and whether it could actually happen (likelihood), but not explicitly using a risk matrix approach. For example, if I am eating soup at my desk, there is a risk that I’ll spill it over my computer keyboard, so I consider how likely this is. But the way I go about this is by looking for the same kinds of patterns that operators do in a workshop when faced with their everyday risks.
In short, forcing people to think about likelihood and risk in terms of relative frequencies, so very much ingrained in the risk matrix, marginalises the richness of the patterns that people naturally see, whether managers or operators. But the higher-level patterns that management (as distinct from managers) sees shouldn’t be dismissed either, since workers simply cannot see them. Searching for patterns links the two worlds of management and operational work, but the patterns are of different types. Techniques and tools should be developed that explicitly link both types, to improve the overall pattern of work for all.
If the Risk Matrix is a blunt object that is only useful in some contexts, it shouldn’t be the over-arching meta-narrative that governs how we do things. What I am suggesting here is that we need to find a new, broader, more flexible set of governing conceptions to help us do work safely.
Searching for patterns is the doorway to this brave new world, unplugged from the matrix.
I enjoyed reading your article, Ben. I too have been thinking about how to assess risk (safety) differently for some time, and the risk matrix concept has been at the heart of this for me.
Since many factors actually influence the likelihood of these things eventuating, maybe it would be more effective to measure the factors that contribute, instead of an arbitrary or guessed sum of them. Contributors such as the level of supervision (rare to continuous), rigorous systems of work that are well tested and proven, the capability and competence of the people involved (greenie to jedi master), and the adequacy of the plant or equipment involved in the task, when considered in light of the foreseeable consequence, may give a more thorough indication of risk. Perhaps one could work through the items above, which would indicate a likelihood (or propensity) that can then be used in the matrix. I imagine someone smarter than I could quickly put together a formula and ‘neo matrix’ that could take all of these things into account – something like the sketch below.
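For what it’s worth, here is a minimal Python sketch of what such a ‘neo matrix’ formula might look like; the factor names, scores, and weights are all invented placeholders, not a validated model:

```python
# Score each contributing factor (0.0 = worst, 1.0 = best) and weight it,
# then combine the weighted shortfalls into a single likelihood estimate
# that could feed the usual matrix. All numbers are illustrative.
FACTORS = {
    # factor: (score, weight)
    "supervision":       (0.6, 0.30),  # rare (0) .. continuous (1)
    "system_of_work":    (0.8, 0.30),  # untested (0) .. well proven (1)
    "people_capability": (0.4, 0.25),  # greenie (0) .. jedi master (1)
    "plant_adequacy":    (0.7, 0.15),  # inadequate (0) .. fully adequate (1)
}

def likelihood_score(factors: dict) -> float:
    """Weighted shortfall: 0 = very unlikely to go wrong, 1 = very likely."""
    total_weight = sum(w for _, w in factors.values())
    shortfall = sum((1 - score) * w for score, w in factors.values())
    return shortfall / total_weight

print(f"likelihood ~ {likelihood_score(FACTORS):.2f}")  # 0.38 for these inputs
```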
This has been in my thoughts for some time so I am looking forward to the further discussion.
One of the problems with Risk Matrixes is that they underspecify what the likelihood and the consequence represent. For example, if I crash my car into a tree, there will be an actual outcome. The severity will be fatal, or serious, or minor. An airbag may reduce the severity of my particular outcome.
When we put it into the matrix, though, we are talking about a hypothetical crash. What is the severity of a thing that hasn’t happened? The reality is that the severity is a probability distribution. Crudely, there is a certain chance it will be fatal, a chance it will be negligible, and chances for every outcome in between. When we put it into the matrix we are simplifying by choosing the most likely, the worst credible, or some other point along the distribution. Leaving aside all the problems with uncertainty versus deterministic probability that Ben addresses, the matrix is only helpful if this simplification leads us to make better decisions.
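A crude Python sketch of this point: a hypothetical event’s severity is a distribution, and a matrix cell forces one point on it to stand in for the whole thing (the numbers are invented, not crash statistics):

```python
# Severity of a hypothetical crash as a probability distribution.
severity_dist = {
    "negligible": 0.50,
    "minor":      0.30,
    "serious":    0.15,
    "fatal":      0.05,
}
ORDER = ["negligible", "minor", "serious", "fatal"]  # mild .. severe

most_likely = max(severity_dist, key=severity_dist.get)
# 'Credible' here means above an arbitrary 1% threshold -- itself a judgement call.
worst_credible = max(
    (s for s in ORDER if severity_dist[s] > 0.01),
    key=ORDER.index,
)
print(f"most likely:    {most_likely}")     # negligible
print(f"worst credible: {worst_credible}")  # fatal
# Either choice collapses the whole distribution into a single matrix cell.
```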
I have seen lots of cases where it clearly hasn’t. One weird effect is that if you choose “worst credible” as the event whose likelihood and severity you are measuring, the severity of this event very seldom changes. This is counter-intuitive, since plenty of protections change the severity of actual events without changing the severity of the hypothetical worst-credible event. You only have to read the convoluted debates on LinkedIn to see that many people who use risk matrixes have never stopped to think about the underlying reality.
Another effect of the matrix is that it shapes future obligations. I’ve seen matrixes drawn where a ‘catastrophic’ event can never be ‘acceptable’ because, to quote the manager, “no fatality would ever be acceptable”. They think they are being conservative, but the end effect is that no one is willing to admit that a hazard might be catastrophic, because they will then be left unable to ever get it signed off, no matter how unlikely they can make it.
I’ve also seen risk mitigations discarded as ineffective because they don’t change the likelihood or severity of the worst credible event (“no change in the matrix, therefore no improvement”) despite the fact that the mitigation significantly reduces the likelihood of less severe outcomes.
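Continuing the sketch above, a mitigation can shift probability mass toward milder outcomes without removing the worst credible one, so the matrix cell never moves even though the expected loss drops sharply (all numbers invented):

```python
# Outcome distributions before and after a mitigation (e.g. an airbag).
before = {"negligible": 0.50, "minor": 0.30, "serious": 0.15, "fatal": 0.05}
after  = {"negligible": 0.65, "minor": 0.25, "serious": 0.08, "fatal": 0.02}

COST = {"negligible": 0, "minor": 1, "serious": 10, "fatal": 100}  # arbitrary units

def expected_loss(dist: dict) -> float:
    return sum(p * COST[s] for s, p in dist.items())

print(f"before: {expected_loss(before):.2f}")  # 6.80
print(f"after:  {expected_loss(after):.2f}")   # 3.05
# The worst credible outcome is still 'fatal' in both cases, so the matrix
# cell may not change -- yet the risk, as expected loss, has more than halved.
```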
A significant difficulty with these matrices is that they co-opt the sense of risk to include only those outcomes that are unwanted deviations from the intended path. If we take a fuller view we might define the decision-making challenge thus: risk is the exposure of capital to opportunity. If there is a positive return on the commitment of capital, then the upside risk is said to have exceeded the downside risk.
One way of explaining why the use of risk matrices in the front-line setting is counter-productive is that workers are generally oriented toward Doing Work Safely, as that outcome can be anticipated within the working Line of Sight.
Much of hazard exposure is immediately detectable, and avoidance of the hazard is the normal defense mechanism. Time, distance, or shielding/encapsulation (of the danger, or of the worker with PPE) are the tried and true methods of defense. In these situations it pays to focus on achieving the goal successfully, with good confidence that needless rework will be avoided or at least minimized.
It pays to remember what kinds of things went badly in the past and to safeguard against them. Sometimes the remembering is by way of a written procedure; other times it is by way of knowledge recall amongst work team members. Each is highly dependable if employed consistently.
Neither is foolproof; thus we encourage work to proceed with vigilance against the emergent need to modify the plan, or even to stop the work and retreat to a safe state. At the Sharp End, the mechanistic character of the risk matrix can actually work against the plan for success with high shared confidence.
In my experience the risk matrix only makes sense as a comparative measure of downside risk. Employed to assess the relative exposure to failure of a portfolio of scenarios for achieving a desired outcome, it can provide some basis for conversation about further refinement of plans.
But even at the managerial level, risk matrices only add value if they lead to insights about how to get a particular type or instance of specific work done safely, with good agreed-upon confidence and with sensitivity to the potential for variation in the actual circumstances. That there is nothing in a typical matrix which points explicitly to the particulars of the work to be done should serve as a reminder that these matrices are not to be applied in some rote manner, as if they provided objective insight all by themselves.
Ben, great article. I discovered systems theory and Safety-II not too long ago and am slowly reading through the ‘back catalogue’ of excellent content that is out there! I would add that predictability within a system seems very dependent on where it sits on the two axes of complexity and order. As the theory goes, it’s impossible to predict future states in complex disordered systems (even non-adaptive ones), which says something about the value of risk matrices in this regard. As I’ve heard Dave Snowden say in some of his lectures, we can only estimate the likely disposition of a system to respond in specific ways to any given ‘insult’, and then the best we can do is try to engineer the conditions to shape that response. Systems-theoretic tools (STPA, for example) may be better suited to this sort of hazard/vulnerability assessment application.