One Metric to Rule Them All and in the Darkness Bind Them
by Wes Williams
I think metrics and measurements are good when used in the correct way based on the context and team I am working with. I use metrics to help them see what their issues are. Once they see their issues, we use metrics to help us determine, as early as possible, if the changes we are making are having a positive or negative impact on those issues and the rest of the system.
Measurements ARE necessary to know that we are headed in the right direction.
There are plenty of articles out there about abusing metrics. I thought it well known that all metrics need to be balanced (e.g. code coverage going up and complexity going down), and of course they need to be trended to be useful.
Now I have a request to find one or two metrics to apply to all teams to determine how effective Agile and coaching are at improving the teams. Does someone really think that one or two metrics can be used to determine effectiveness?
Not all teams have the same highest-priority issues. A team with terrible user stories and acceptance criteria does not need the same metrics as a team trying to fix highly coupled code.
Ok, enough complaining! To help me, and I hope others, I want to write about 1) the goals of specific metrics, 2) the dangers and abuses of those metrics, and 3) how to balance those metrics against each other.
Average Velocity trend
Goals:
- Predictability!! What can be done by a specific date, or when something can be completed.
- Velocity is a capacity measure, NOT a productivity measure.
- Velocity allows a team to know how much business value they can deliver over time.
- Developing a consistent velocity allows for more accurate (i.e. predictable) release and iteration planning.
Dangers and abuses:
- Calling this a measure of productivity. Focusing on velocity alone could even hurt productivity. Teams can artificially increase velocity in many ways: stop writing unit tests or acceptance tests, increase estimates, stop fixing story defects, and reduce customer collaboration, just to name a few.
- Comparing velocity between teams. Velocity is a team value and not a global value. Many variables affect a team's velocity, including relative estimating base, support requirements, number of defects, political environment of the product or project, and more.
- Calculating velocity by individual. This leads to a focus on individual performance vs. team performance (i.e. sub-optimization).
- Using velocity to commit to the content of an iteration when the value is not valid. Velocity is a simple concept and provides a lightweight measure, but it is also a measure that demands maturity. To be useful it requires estimation maturity, applied consistently over a period of time by a stable team. Without these elements it is not usable at all, and the abuse can come from management or from the team itself, the latter when the team makes assumptions about the validity of the metric.
Balance with:
- Percentage of rework vs. stories done on average each iteration. This can help a team see how much of their work in each iteration is delivering new value to the team's customers.
- Planned work vs. unplanned work trend. A lot of unplanned work makes a team’s velocity less valuable because it hinders the team's ability to plan. Keeping unplanned work low makes the team’s planning more consistent and accurate.
- Code quality metrics such as code test coverage, cyclomatic complexity, static error checking, and performance. A team that is increasing their velocity by not focusing on code quality is making a short-term decision that will have a negative impact over time.
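As a rough illustration of trending velocity for predictability rather than productivity, the sketch below computes a rolling average and a simple measure of spread. The data, the three-iteration window, and the function names are all assumptions made up for the example, not values from a real team.

```python
from statistics import mean, pstdev

def rolling_average(values, window=3):
    """Average of the last `window` values at each iteration (shorter at the start)."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]

def variation(values):
    """Spread relative to the mean; a lower value suggests a more predictable team."""
    return pstdev(values) / mean(values)

# Hypothetical story points completed per iteration by ONE team.
velocity = [21, 25, 19, 24, 23, 22]

trend = rolling_average(velocity)
spread = variation(velocity)

print([round(v, 1) for v in trend])
print(round(spread, 2))
```

The numbers only mean something within the team that produced them, which is why the sketch takes a single team's history and offers no way to compare teams.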
Delivered Features vs. Rework Resolution trend
Goals:
- Makes _waste_ visible so that it can be eliminated.
- Gives the team a good understanding of how much of their iteration capacity is consumed by rework (i.e. _waste_).
- Lagging indicator of the team's quality.
Dangers and abuses:
- Not working on story defects until a regression period, which gives a short-term indication of fewer defects.
- Increasing story estimates and/or reducing defect estimates.
- Hiding defects as stories.
Balance with:
- An inconsistent velocity. Delaying defect correction until later will make the velocity trend erratic, with large spikes.
- Planned vs. unplanned scope. A team that is delaying defect correction will tend to have more unplanned work due to poor quality.
- Number of defects in the backlog. Ideally this number should be on a downward trend. An upward trend could indicate the team is delaying defect correction.
- Increasingly long regression periods at the end of each release.
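To make the rework share concrete, here is a minimal sketch that assumes each iteration is recorded as a pair of (feature points, rework points). The figures and names are hypothetical.

```python
def rework_share(feature_points, rework_points):
    """Fraction of delivered capacity that went to rework instead of new features."""
    total = feature_points + rework_points
    return rework_points / total if total else 0.0

# Hypothetical (feature points, rework points) per iteration.
iterations = [(20, 2), (18, 4), (15, 8), (12, 10)]

shares = [rework_share(f, r) for f, r in iterations]

# True when the rework share grew in every consecutive iteration.
rising = all(a < b for a, b in zip(shares, shares[1:]))

print([round(s, 2) for s in shares])
print(rising)
```

A steadily rising share is the kind of trend worth a conversation. A suspiciously flat, low share should be checked against the defect backlog, since defects can be deferred or relabelled as stories.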
Completed Work vs. Carryover trend
Goals:
- Show how well the team executes the iteration (i.e. delivers on their commitments).
Dangers and abuses:
- Planning less work than the team is capable of to allow for interruptions or poor estimating.
- Skipping refactoring (or other good practices such as TDD and unit testing) to complete work, so the code is not kept at a level that makes change cheaper and easier in the future.
Balance with:
- A velocity trend that is not improving or is going down could be caused by planning less than the real capacity of the team.
- Planned vs. unplanned work can indicate whether the team is being interrupted, and whether the resulting task switching could be the cause of the carryover.
- A downward test coverage trend and/or an upward cyclomatic complexity trend could indicate that the code is becoming more difficult to change and much more difficult to estimate accurately.
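The following sketch shows one way completed work vs. carryover could be tracked alongside the size of the commitment, so that a "perfect" completion ratio achieved by planning less is visible. The `Iteration` type and the sample numbers are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Iteration:
    committed: int  # points planned at the start of the iteration
    completed: int  # points of that planned work actually finished

    @property
    def carryover(self) -> int:
        return self.committed - self.completed

    @property
    def completion_ratio(self) -> float:
        return self.completed / self.committed

# Hypothetical history for one team.
history = [Iteration(30, 22), Iteration(28, 24), Iteration(20, 20), Iteration(18, 18)]

ratios = [round(it.completion_ratio, 2) for it in history]
carryovers = [it.carryover for it in history]

# Completion improved, but only while the commitment itself shrank.
commitment_shrinking = history[-1].committed < history[0].committed

print(ratios, carryovers, commitment_shrinking)
```

In this made-up history the ratio reaches 100% while the commitment drops from 30 to 18 points, which is exactly the pattern the velocity trend is meant to expose.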
Planned vs. Unplanned Scope trend
Goals:
- Show how good the team is at planning.
- Show how often the team is being interrupted within the iteration to work on something that wasn't originally planned.
Dangers and abuses:
- Large placeholders to allow unplanned work to come in and appear to be part of the planned work.
Balance with:
- Delivered Features vs. Rework Resolution trend
- Completed Work vs. Carryover trend
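A small sketch of the planned vs. unplanned calculation, assuming the team tags each piece of work as planned or unplanned when it enters the iteration. The 25% threshold is an arbitrary number chosen for the example, not a recommendation.

```python
def unplanned_share(planned_points, unplanned_points):
    """Fraction of the iteration's work that was not in the original plan."""
    total = planned_points + unplanned_points
    return unplanned_points / total if total else 0.0

# Hypothetical (planned points, unplanned points) per iteration.
iterations = [(24, 3), (20, 8), (22, 2)]

# Arbitrary example threshold; each team would pick its own.
THRESHOLD = 0.25

shares = [unplanned_share(p, u) for p, u in iterations]
flagged = [number for number, share in enumerate(shares, start=1) if share > THRESHOLD]

print([round(s, 2) for s in shares])
print(flagged)
```

This calculation cannot detect unplanned work hidden inside large placeholders, so it still needs the rework and carryover trends next to it.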
Code Coverage vs. Cyclomatic Complexity trend
Goals:
- Reduce the cost of change. Clean code tends to make the application easier to understand and safer to change.
- Indicate that the system is being tested at an appropriate level.
- Indicate that the code quality is good: loosely coupled, as simple as possible, etc.
Dangers and abuses:
- Focusing on only one code metric. For example, 100% code coverage achieved with generated tests will not make the code easier to understand or change.
- Focusing on code quality alone and not focusing on the business goals of the customer.
Balance with:
- Velocity trend
- Delivered Features vs. Rework Resolution trend
- Afferent and efferent coupling trends
- Abstractness trend
- Package dependency cycles
- Number of changes in class(es)
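To show what balancing two code metrics against each other might look like, this sketch fits a least-squares slope to each series and only reports a healthy trend when coverage is rising and complexity is falling together. The sample series and function names are invented for the example.

```python
def slope(values):
    """Least-squares slope of values over iteration index (needs at least two points)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator

def balanced(coverage, complexity):
    """True only when coverage trends up AND average complexity trends down."""
    return slope(coverage) > 0 and slope(complexity) < 0

# Hypothetical per-iteration readings: coverage in percent, average cyclomatic complexity.
healthy = balanced([62, 65, 69, 74], [9.1, 8.7, 8.8, 8.2])

# Coverage shoots up (e.g. generated tests) while the code keeps getting more complex.
gamed = balanced([62, 80, 95, 100], [9.1, 9.4, 9.9, 10.5])

print(healthy, gamed)
```

The second case is the one a single metric would miss: coverage alone looks excellent, but the paired trend says the code is not getting easier to change.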
This is far from an exhaustive list of metrics! But I hope the idea helps: think about a metric, what your goal is in measuring it, and how you can stop yourself or others from gaming it by balancing it with other metrics.