Epic remaining cost estimation based on team velocity

Tools

This post was entirely created using Jupyter notebook - if you don't know it I'd highly recommend to install it on your machine: http://jupyter.readthedocs.io/en/latest/install.html Then you could reproduce this example with your Jira data.

This and other ideas on how to use Python with Jira data are available on my GitHub: https://github.com/robert-krasinski/JiraAndPythonForManagers

Rationale

Feature cost estimation is very important information in development process. It should be always present during feature prioritization excercises. Using the estimated cost and projected profits should give you information which stories are most profitable and affect priorities for coming sprints.

In this approach I'm not taking into consideration costs of not implementing particular features. In some situations those will be much higher than development costs. For example not fixing security bug may cause data leak and legal costs or not improving performance of the service can increase infrastructure costs.

Calculations

I'd like to show you how quickly estimate cost for all open (not Completed, Rejected) epics in the project. There are few assumptions that are made to ensure that calculations will be possible:

  • epics are divided into stories that are linked to epics in Jira
  • stories are estimated in SP
  • we know the maximum, minimum and average velocity of the team. I'll show later how to obtain it from Jira report.
  • we know what's the sprint development cost for entire team
  • project is developed by one development team.

The model estimates only not completed stories that are assigned to epics in Jira. In this approach bugs are not estimated in sprints and they're affecting calculations only trough team velocity. More bugs - velocity is lower and the cost of the epic will be higher and vice versa.

I marked green sections that should be modified by when executing on different data source (e.g. Jira url, velocity).

Set Jira url
In [31]:
jira_url = 'https://kainos-evolve.atlassian.net'

Code below loads all open epics from Jira using Python API

In [32]:
import pandas as pd
from jira import JIRA
jira = JIRA(jira_url)

#load all open epics 
jql = 'project=VXT and type=epic and status not in (Completed, Rejected)'
#Jira credentials are stored in ~/.rcnet file 
epicsRaw = jira.search_issues(jql)

epics = pd.DataFrame()
epics['version'] = ''
epics['key'] = ''
epics['type'] = ''
epics['status'] = ''
epics['summary'] = ''

#add epics to dataframe
for issue in epicsRaw:
    #issue may have many versions - in this approach, one version per issue is recommended
    for fixVersion in issue.fields.fixVersions:
        epics = epics.append(
            {'version': fixVersion.name, 
             'key': issue.key,
             'type': issue.fields.issuetype.name,
             'status': issue.fields.status.name,
             'summary': issue.fields.summary,
            }, ignore_index=True)
            
epics
Out[32]:
version key type status summary
0 1.12 VXT-3952 Epic Tech. Scoping Summary...
1 1.14 VXT-3952 Epic Tech. Scoping Summary...
2 1.15 VXT-3773 Epic Idea Summary...

When we have all epics it's time to find all related issues.

In [33]:
epicKeys = epics['key'].tolist()
jql = '"Epic Link" in (' + ", ".join(epicKeys) + ')'
jql
Out[33]:
'"Epic Link" in (VXT-3952, VXT-3952, VXT-3773, VXT-3690, VXT-3577, VXT-3564, VXT-3546, VXT-3530, VXT-3527, VXT-3524, VXT-3507, VXT-3506, VXT-3497, VXT-3485, VXT-3484, VXT-3376, VXT-3351, VXT-3337, VXT-3292, VXT-3264, VXT-3261, VXT-3260)'

All issues related to epics are loaded from Jira in one batch and added to Pandas dataframe.

In [34]:
issuesRaw = jira.search_issues(jql)

issues = pd.DataFrame()

issues['epic'] = ''
issues['key'] = ''
issues['type'] = ''
issues['status'] = ''
issues['SP'] = 0
issues['summary'] = ''

for issue in issuesRaw:
    issues = issues.append(
        {
         'key': issue.key,
         'type': issue.fields.issuetype.name,
         'status': issue.fields.status.name,
         'SP': issue.fields.customfield_10005,
         'summary': issue.fields.summary,
         'team' : str(issue.fields.customfield_14200),
         'epic': issue.fields.customfield_10008
        }, ignore_index=True)

Stories that are rejected or completed should not be included in calculations.

In [35]:
#only open issues are calculated
issues = issues.loc[~(issues['status'].isin(['Completed', 'Rejected']))]
issues.sort_values(["epic", 'type', 'status'], inplace=True)

Cost can be calculated only for epics that are estimated in story points in 100%. To check the estimation factor we need to count estimated and not estimated stories.

In this approach bugs are not estimated - their affect the velocity directly. Fixing bugs in the sprint means that a team is not delivering the features - the velocity is lower. If you're planning reducing tech debt in near future you may like to reduce minimum velocity to reflect that in cost estimations.

In [36]:
#bugs are not required to be estimated
issues['emptySP'] = ((issues.type == 'Story') & issues.SP.isnull())
issues['notEmptySP'] = (~issues.emptySP)

issues
Out[36]:
epic key type status SP summary team emptySP notEmptySP
39 VXT-3337 VXT-3252 Story Tech Refinement 3.0 Summary... Custom Controls False True
23 VXT-3351 VXT-3494 Story Awaiting Prioritisation 3.0 Summary... Ops Team False True
33 VXT-3351 VXT-3356 Story Awaiting Prioritisation 8.0 Summary... Ops Team False True
In [37]:
import numpy as np
#in python we can treat True as 1 and False as 0 so simple sum suffice to calculate count of 
#estimated and not estimated issues in each epic
aggrIssues = issues.groupby(['epic']).agg({'emptySP':'sum','notEmptySP':'sum', 'SP': 'sum'})
aggrIssues['estimatedPerc'] = np.ceil(aggrIssues.notEmptySP / (aggrIssues.notEmptySP + aggrIssues.emptySP) * 100)
aggrIssues = aggrIssues.reset_index()

aggrIssues
Out[37]:
epic SP emptySP notEmptySP estimatedPerc
0 VXT-3337 3.0 0.0 1.0 100.0
1 VXT-3351 11.0 2.0 2.0 50.0
2 VXT-3376 0.0 2.0 0.0 0.0
3 VXT-3484 0.0 1.0 0.0 0.0
4 VXT-3506 36.0 0.0 3.0 100.0
5 VXT-3524 0.0 2.0 0.0 0.0
6 VXT-3527 0.0 2.0 0.0 0.0
7 VXT-3546 11.0 0.0 3.0 100.0
8 VXT-3564 1.0 2.0 4.0 67.0
9 VXT-3690 0.0 1.0 3.0 75.0
10 VXT-3773 0.0 2.0 0.0 0.0
11 VXT-3952 6.5 1.0 2.0 67.0
Set maximum, average and minimum estimated velocity of the team and team sprint costs
In [38]:
minVelocity = 10
avgVelocity = 15
maxVelocity = 20
#assuming that a team have 4 developers and each one salary is 1000 / week and we have 2W sprints
sprintCosts = 4 * 1000 * 2

Below are estimated cost calculations. Information that we can't estimate epic because insufficient data in related stories might also be useful.

In [39]:
del aggrIssues['emptySP']
del aggrIssues['notEmptySP']

#if the epic's stories are not estimated
aggrIssues['minCost'] = 'Data not sufficient to estimate'
aggrIssues['avgCost'] = 'Data not sufficient to estimate'
aggrIssues['maxCost'] = 'Data not sufficient to estimate'

#only calculate costs for fully estimated epics
aggrIssues.loc[((aggrIssues['estimatedPerc'] == 100)), ['minCost']] = np.ceil(aggrIssues.SP / maxVelocity * sprintCosts)
aggrIssues.loc[((aggrIssues['estimatedPerc'] == 100)), ['avgCost']] = np.ceil(aggrIssues.SP / avgVelocity * sprintCosts)
aggrIssues.loc[((aggrIssues['estimatedPerc'] == 100)), ['maxCost']] = np.ceil(aggrIssues.SP / minVelocity * sprintCosts)

#aggrIssues

For presentation purposes it's worth to add descriptions to epics.

In [40]:
#load epic descriptions from Jira
epicNames = []
for epicKey in aggrIssues['epic']:
    epic = jira.issue(epicKey)
    epicNames.append(epic.fields.summary)
    
epicNames = pd.Series(epicNames)
aggrIssues['summary'] = epicNames.values

#change column order
cols = ['epic', 'summary', 'SP', 'estimatedPerc', 'minCost', 'avgCost', 'maxCost']
aggrIssues = aggrIssues[cols]

aggrIssues
Out[40]:
epic summary SP estimatedPerc minCost avgCost maxCost
0 VXT-3337 Summary... 3.0 100.0 1200 1600 2400
1 VXT-3351 Summary... 11.0 50.0 Data not sufficient to estimate Data not sufficient to estimate Data not sufficient to estimate
2 VXT-3376 Summary... 0.0 0.0 Data not sufficient to estimate Data not sufficient to estimate Data not sufficient to estimate

Above table with cost estimations. If you're interested in more ideas on how to use data you already have in Jira for making smart decisions in your project please visit: https://github.com/robert-krasinski/JiraAndPythonForManagers