Epic remaining cost estimation based on team velocity¶

Tools¶

This post was entirely created using Jupyter notebook - if you don't know it I'd highly recommend to install it on your machine: http://jupyter.readthedocs.io/en/latest/install.html Then you could reproduce this example with your Jira data.

This and other ideas on how to use Python with Jira data are available on my GitHub: https://github.com/robert-krasinski/JiraAndPythonForManagers

Rationale¶

Feature cost estimation is very important information in development process. It should be always present during feature prioritization excercises. Using the estimated cost and projected profits should give you information which stories are most profitable and affect priorities for coming sprints.

In this approach I'm not taking into consideration costs of not implementing particular features. In some situations those will be much higher than development costs. For example not fixing security bug may cause data leak and legal costs or not improving performance of the service can increase infrastructure costs.

Calculations¶

I'd like to show you how quickly estimate cost for all open (not Completed, Rejected) epics in the project. There are few assumptions that are made to ensure that calculations will be possible:

epics are divided into stories that are linked to epics in Jira
stories are estimated in SP
we know the maximum, minimum and average velocity of the team. I'll show later how to obtain it from Jira report.
we know what's the sprint development cost for entire team
project is developed by one development team.

The model estimates only not completed stories that are assigned to epics in Jira. In this approach bugs are not estimated in sprints and they're affecting calculations only trough team velocity. More bugs - velocity is lower and the cost of the epic will be higher and vice versa.

I marked green sections that should be modified by when executing on different data source (e.g. Jira url, velocity).

Set Jira url

jira_url = 'https://kainos-evolve.atlassian.net'

Code below loads all open epics from Jira using Python API

import pandas as pd
from jira import JIRA
jira = JIRA(jira_url)

#load all open epics 
jql = 'project=VXT and type=epic and status not in (Completed, Rejected)'
#Jira credentials are stored in ~/.rcnet file 
epicsRaw = jira.search_issues(jql)

epics = pd.DataFrame()
epics['version'] = ''
epics['key'] = ''
epics['type'] = ''
epics['status'] = ''
epics['summary'] = ''

#add epics to dataframe
for issue in epicsRaw:
    #issue may have many versions - in this approach, one version per issue is recommended
    for fixVersion in issue.fields.fixVersions:
        epics = epics.append(
            {'version': fixVersion.name, 
             'key': issue.key,
             'type': issue.fields.issuetype.name,
             'status': issue.fields.status.name,
             'summary': issue.fields.summary,
            }, ignore_index=True)
            
epics

When we have all epics it's time to find all related issues.

epicKeys = epics['key'].tolist()
jql = '"Epic Link" in (' + ", ".join(epicKeys) + ')'
jql

'"Epic Link" in (VXT-3952, VXT-3952, VXT-3773, VXT-3690, VXT-3577, VXT-3564, VXT-3546, VXT-3530, VXT-3527, VXT-3524, VXT-3507, VXT-3506, VXT-3497, VXT-3485, VXT-3484, VXT-3376, VXT-3351, VXT-3337, VXT-3292, VXT-3264, VXT-3261, VXT-3260)'

All issues related to epics are loaded from Jira in one batch and added to Pandas dataframe.

issuesRaw = jira.search_issues(jql)

issues = pd.DataFrame()

issues['epic'] = ''
issues['key'] = ''
issues['type'] = ''
issues['status'] = ''
issues['SP'] = 0
issues['summary'] = ''

for issue in issuesRaw:
    issues = issues.append(
        {
         'key': issue.key,
         'type': issue.fields.issuetype.name,
         'status': issue.fields.status.name,
         'SP': issue.fields.customfield_10005,
         'summary': issue.fields.summary,
         'team' : str(issue.fields.customfield_14200),
         'epic': issue.fields.customfield_10008
        }, ignore_index=True)

Stories that are rejected or completed should not be included in calculations.

#only open issues are calculated
issues = issues.loc[~(issues['status'].isin(['Completed', 'Rejected']))]
issues.sort_values(["epic", 'type', 'status'], inplace=True)

Cost can be calculated only for epics that are estimated in story points in 100%. To check the estimation factor we need to count estimated and not estimated stories.

In this approach bugs are not estimated - their affect the velocity directly. Fixing bugs in the sprint means that a team is not delivering the features - the velocity is lower. If you're planning reducing tech debt in near future you may like to reduce minimum velocity to reflect that in cost estimations.

#bugs are not required to be estimated
issues['emptySP'] = ((issues.type == 'Story') & issues.SP.isnull())
issues['notEmptySP'] = (~issues.emptySP)

issues

import numpy as np
#in python we can treat True as 1 and False as 0 so simple sum suffice to calculate count of 
#estimated and not estimated issues in each epic
aggrIssues = issues.groupby(['epic']).agg({'emptySP':'sum','notEmptySP':'sum', 'SP': 'sum'})
aggrIssues['estimatedPerc'] = np.ceil(aggrIssues.notEmptySP / (aggrIssues.notEmptySP + aggrIssues.emptySP) * 100)
aggrIssues = aggrIssues.reset_index()

aggrIssues

Set maximum, average and minimum estimated velocity of the team and team sprint costs

minVelocity = 10
avgVelocity = 15
maxVelocity = 20
#assuming that a team have 4 developers and each one salary is 1000 / week and we have 2W sprints
sprintCosts = 4 * 1000 * 2

Below are estimated cost calculations. Information that we can't estimate epic because insufficient data in related stories might also be useful.

del aggrIssues['emptySP']
del aggrIssues['notEmptySP']

#if the epic's stories are not estimated
aggrIssues['minCost'] = 'Data not sufficient to estimate'
aggrIssues['avgCost'] = 'Data not sufficient to estimate'
aggrIssues['maxCost'] = 'Data not sufficient to estimate'

#only calculate costs for fully estimated epics
aggrIssues.loc[((aggrIssues['estimatedPerc'] == 100)), ['minCost']] = np.ceil(aggrIssues.SP / maxVelocity * sprintCosts)
aggrIssues.loc[((aggrIssues['estimatedPerc'] == 100)), ['avgCost']] = np.ceil(aggrIssues.SP / avgVelocity * sprintCosts)
aggrIssues.loc[((aggrIssues['estimatedPerc'] == 100)), ['maxCost']] = np.ceil(aggrIssues.SP / minVelocity * sprintCosts)

#aggrIssues

For presentation purposes it's worth to add descriptions to epics.

#load epic descriptions from Jira
epicNames = []
for epicKey in aggrIssues['epic']:
    epic = jira.issue(epicKey)
    epicNames.append(epic.fields.summary)
    
epicNames = pd.Series(epicNames)
aggrIssues['summary'] = epicNames.values

#change column order
cols = ['epic', 'summary', 'SP', 'estimatedPerc', 'minCost', 'avgCost', 'maxCost']
aggrIssues = aggrIssues[cols]

aggrIssues

Above table with cost estimations. If you're interested in more ideas on how to use data you already have in Jira for making smart decisions in your project please visit: https://github.com/robert-krasinski/JiraAndPythonForManagers

	epic	SP	emptySP	notEmptySP	estimatedPerc
0	VXT-3337	3.0	0.0	1.0	100.0
1	VXT-3351	11.0	2.0	2.0	50.0
2	VXT-3376	0.0	2.0	0.0	0.0
3	VXT-3484	0.0	1.0	0.0	0.0
4	VXT-3506	36.0	0.0	3.0	100.0
5	VXT-3524	0.0	2.0	0.0	0.0
6	VXT-3527	0.0	2.0	0.0	0.0
7	VXT-3546	11.0	0.0	3.0	100.0
8	VXT-3564	1.0	2.0	4.0	67.0
9	VXT-3690	0.0	1.0	3.0	75.0
10	VXT-3773	0.0	2.0	0.0	0.0
11	VXT-3952	6.5	1.0	2.0	67.0

	version	key	type	status	summary
0	1.12	VXT-3952	Epic	Tech. Scoping	Summary...
1	1.14	VXT-3952	Epic	Tech. Scoping	Summary...
2	1.15	VXT-3773	Epic	Idea	Summary...

	epic	key	type	status	SP	summary	team	emptySP	notEmptySP
39	VXT-3337	VXT-3252	Story	Tech Refinement	3.0	Summary...	Custom Controls	False	True
23	VXT-3351	VXT-3494	Story	Awaiting Prioritisation	3.0	Summary...	Ops Team	False	True
33	VXT-3351	VXT-3356	Story	Awaiting Prioritisation	8.0	Summary...	Ops Team	False	True