Context

Open source projects rely on libraries that are updated regularly. While not all updates require code changes, some do. Under semantic versioning, version numbers take the form MAJOR.MINOR.PATCH and are incremented according to the type of change. A “MAJOR version” signifies incompatible API changes, a “MINOR version” signifies added functionality in a backwards compatible manner, and a “PATCH version” signifies backwards compatible bug fixes. A version can also be labelled as a “pre-release”, which indicates “…that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version.”
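
To make the MAJOR/MINOR/PATCH rules concrete, here is a minimal sketch in shell (the language of the commands in the Methods section) that labels the step between two consecutive versions of a package. The function name `bump_type` is hypothetical, and the sketch simply ignores pre-release and build suffixes, which the semantic versioning specification handles separately.

```shell
# Hypothetical helper: label the step between two consecutive versions
# following the semantic versioning rules.
# Assumption: pre-release (-...) and build (+...) suffixes are ignored.
bump_type() {
  local old new
  old=${1%%[-+]*}   # "1.4.2-beta.1" -> "1.4.2"
  new=${2%%[-+]*}
  # Split MAJOR.MINOR.PATCH with plain parameter expansion
  local o_major=${old%%.*} n_major=${new%%.*}
  local o_rest=${old#*.} n_rest=${new#*.}
  local o_minor=${o_rest%%.*} n_minor=${n_rest%%.*}
  local o_patch=${o_rest#*.} n_patch=${n_rest#*.}
  if [ "$n_major" -ne "$o_major" ]; then
    echo major   # incompatible API changes
  elif [ "$n_minor" -ne "$o_minor" ]; then
    echo minor   # new, backwards compatible functionality
  elif [ "$n_patch" -ne "$o_patch" ]; then
    echo patch   # backwards compatible bug fixes
  else
    echo none    # same numbers, e.g. 1.0.0-beta.1 followed by 1.0.0
  fi
}
```

For example, `bump_type 1.4.2 2.0.0` prints `major`, `bump_type 1.4.2 1.5.0` prints `minor` and `bump_type 1.4.2 1.4.3` prints `patch`.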

In this project, I explored the number and frequency of dependency updates in a project that is currently under development, in order to characterize the maintenance it will require once it is “done” and in production. I used the dependency manifest file from the frontend of the Government of Canada’s Tracker project as a representative example of a modern software project. See the Methods section at the end of this post for details on how the data were gathered and cleaned.

Exploring the data

Number of packages

The dependency manifest file lists a total of 72 different packages. As these packages were created and updated over time, the annual number of updates increased from 2010 to 2018, as shown in Figure 1.
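
The package count can be checked against the CSV produced in the Methods section. A minimal sketch, assuming the three-column layout written by jq's `@csv` (package name, key, time stamp); `count_packages` is a hypothetical helper name:

```shell
# Hypothetical helper: count distinct package names (first CSV column).
# Assumption: rows look like "name","key","timestamp" as produced by @csv.
count_packages() {
  awk -F, '!seen[$1]++ { n++ } END { print n + 0 }' "$@"
}
```

Running `count_packages data/processed/nodejs.csv` would print the number of distinct packages for which the registry returned data.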

Number of versions

The 72 packages in the dependency manifest file released a total of 7,461 versions between December 29th, 2010 and December 31st, 2020. These included major versions, minor versions, patches and pre-release versions. Figure 2 shows that patches made up the largest share of those versions (40.4%), followed by pre-release versions (36.2%), minor versions (19.8%) and, finally, major versions (3.6%).
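
The shares in Figure 2 amount to counting each version type and dividing by the total. A sketch, assuming the types are available one per line (for instance the `version_type` column of the cleaned dataset written out as plain text); `type_shares` is a hypothetical name:

```shell
# Hypothetical helper: count each version type and its share of the total.
# Assumption: input has one type per line (major, minor, patch, pre-release).
type_shares() {
  awk '{ count[$1]++; total++ }
       END {
         for (t in count)
           printf "%s %d %.1f%%\n", t, count[t], 100 * count[t] / total
       }' "$@" |
    sort -k2,2nr   # most frequent type first
}
```

Fed the lines `patch`, `patch`, `minor` and `major`, it prints `patch 2 50.0%` first, followed by the two types at 25.0%.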

I then excluded pre-release versions from the rest of the analysis, as I was mainly interested in maintaining the dependency manifest file once in production. This left 4,763 stable versions (i.e., major, minor and patch versions). Figure 3 shows that the annual number of stable versions increased between 2010 and 2015 but has remained relatively steady since then.
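
Dropping pre-releases and counting stable versions per year can be sketched directly on the raw CSV. This is an approximation of the R cleaning steps under stated assumptions, not the author's code: a version counts as stable when it is exactly `MAJOR.MINOR.PATCH`, optionally followed by `+build` metadata, a rule that also discards the `created` and `modified` rows. `stable_per_year` is a hypothetical name.

```shell
# Hypothetical helper: keep stable versions only and count them per year.
# Assumption: input is the raw CSV ("name","key","timestamp") and a stable
# version is MAJOR.MINOR.PATCH with an optional +build suffix.
stable_per_year() {
  awk -F, '
    { gsub(/"/, "") }                                # strip the quotes added by @csv
    $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+(\+.*)?$/ {          # stable versions only
      year[substr($3, 1, 4)]++                        # year of the UTC time stamp
    }
    END { for (y in year) print y, year[y] }' "$@" | sort -n
}
```

`stable_per_year data/processed/nodejs.csv` prints one `year count` line per year, the kind of series plotted in Figure 3.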

Versions per packages

Figure 4 plots the number of stable versions (major, minor and patch) for each package since December 29th, 2010. The package with the most versions is at the top (webpack, with 319 versions) and the package with the fewest is at the bottom (webpack-config-utils, with 6 versions). Figure 4 also shows how much the number of versions varies among packages. Note that not all packages existed in December 2010, so differences in creation dates might account for some of this variability.
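
A ranking like the one in Figure 4 can be sketched with the same kind of stable-version filter, counting versions per package and sorting in descending order. Again, this is a hypothetical helper (`versions_per_package`) that assumes the raw CSV layout from the Methods section:

```shell
# Hypothetical helper: number of stable versions per package, most first.
# Assumption: raw CSV rows "name","key","timestamp"; stable means
# MAJOR.MINOR.PATCH with an optional +build suffix.
versions_per_package() {
  awk -F, '
    { gsub(/"/, "") }                                # strip the quotes added by @csv
    $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+(\+.*)?$/ { n[$1]++ }
    END { for (p in n) print n[p], p }' "$@" |
    sort -k1,1nr -k2,2   # by count (descending), then by name
}
```

`versions_per_package data/processed/nodejs.csv | head -n 1` shows the package with the most stable versions, and `tail -n 1` the one with the fewest.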

Figure 5 is a heatmap of the number of versions per year for each package, with packages in alphabetical order on the y-axis. The darker the color, the more versions a package had that year (see legend). This figure shows that packages were created in every year between 2011 and 2019, and that the number of versions per package per year varies widely among packages and years. Some packages had most of their versions in the early years after their creation and are no longer updated (e.g., babel-core), others had most of their versions in recent years (e.g., eslint-plugin-react) and others show no clear trend (e.g., d3). Also note that since 2018, some packages have moved to scoped packages; for example, babel-core is now @babel/core. For the purpose of this analysis, the old and new names are treated as different packages.

Implications for software maintenance

As the annual number of stable updates has been relatively steady since 2015, the rest of the analysis focuses on data from 2015 to 2020. The number of updates per year and type, with totals, is shown below.

year   major  minor  patch  total
2015      22    189    433    644
2016      48    207    380    635
2017      45    222    418    685
2018      50    242    440    732
2019      41    230    456    727
2020      45    214    372    631
total    251   1304   2499   4054

From these totals, I calculated the average number of updates per year (total divided by six years), per month (annual average divided by 12) and per week (annual average divided by 52):

version type  n (2015-2020)  annual average  monthly average  weekly average
major                   251            41.8              3.5             0.8
minor                  1304           217.3             18.1             4.2
patch                  2499           416.5             34.7             8.0
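
The averages follow from the 2015 to 2020 totals by simple division. The sketch below assumes, consistently with the values in the table, that the annual average is the total divided by six years, the monthly average is the annual average divided by 12, and the weekly average is the annual average divided by 52; `averages` is a hypothetical helper.

```shell
# Hypothetical helper: annual, monthly and weekly averages from 6-year totals.
# Assumption: annual = total / 6, monthly = annual / 12, weekly = annual / 52.
averages() {
  awk '{ annual = $2 / 6
         printf "%s %d %.1f %.1f %.1f\n", $1, $2, annual, annual / 12, annual / 52 }'
}

printf '%s\n' 'major 251' 'minor 1304' 'patch 2499' | averages
# major 251 41.8 3.5 0.8
# minor 1304 217.3 18.1 4.2
# patch 2499 416.5 34.7 8.0
```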

Assuming these averages are representative of the years to come, the Tracker project will require maintenance and continued development once in production, as new versions of its dependencies will keep being released. Figure 6 illustrates the number of new patches, minor versions and major versions, and their total, over a year.

Conclusions

Once completed, the Tracker project will require maintenance and continued development. For instance, if left unmaintained for three months (about 12 weeks), the project could be expected to fall behind by a total of 156 versions (96 patches, 50 minor versions and 10 major versions). However, assuming that the dependencies used in the project number their versions according to the semantic versioning guidelines, not every new version requires the immediate intervention of a programmer. Patches are expected to fix bugs and to be backwards compatible, and minor versions are expected to add functionality while remaining backwards compatible. Major versions, on the other hand, are expected to include incompatible API changes that would require the immediate intervention of a programmer. With an average of 0.8 major versions per week (roughly one every 1.25 weeks), maintenance would be required every one to two weeks. Note that this only accounts for the direct dependencies listed in the dependency manifest file from the frontend of the Government of Canada’s Tracker project.
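
The three-month projection and the one-to-two-week interval can be reproduced from the weekly averages. A sketch, assuming three months is approximated as 12 weeks (which matches the 96, 50 and 10 above); `lag` is a hypothetical helper taking a number of weeks:

```shell
# Hypothetical helper: versions accumulated over N weeks, using the rounded
# weekly averages from the table (8.0 patch, 4.2 minor, 0.8 major).
# Assumption: "three months" is approximated as 12 weeks.
lag() {
  awk -v w="$1" 'BEGIN {
    patch = 8.0 * w; minor = 4.2 * w; major = 0.8 * w
    printf "patch %.0f minor %.0f major %.0f total %.0f\n",
           patch, minor, major, patch + minor + major
    # average interval between two major versions, in weeks
    printf "weeks between major versions %.2f\n", 1 / 0.8
  }'
}

lag 12
# patch 96 minor 50 major 10 total 156
# weeks between major versions 1.25
```

A full 13-week quarter (`lag 13`) would instead give 104 patches, 55 minor versions and 10 major versions.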

Methods

Retrieving a list of dependencies

JavaScript projects declare their dependencies in a manifest file called “package.json”. The dependency manifest file from the frontend of the Government of Canada’s Tracker project was retrieved with curl using the following command:

curl -sL "https://raw.githubusercontent.com/canada-ca/tracker/003a3d6e0452fb9a248a7d51fd5556bc03dfbcc6/frontend/package.json" > data/raw/tracker-frontend-package.json

Extracting all versions to a CSV file

Using the following Bash command, the names of all packages (both production and development dependencies) were extracted from tracker-frontend-package.json, and the details of each package were retrieved from the npm registry API on January 1st, 2021.

The command uses jq to select the dependency names from the JSON file and to format each API response as CSV, with curl once again retrieving the data from the API. Each row of the generated CSV contains the package name, a version and a time stamp.

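mkdir -p data/processed
# List every dependency name (production and development) from the manifest
for pkg in $(jq -r '(.dependencies + .devDependencies) | keys[]' data/raw/tracker-frontend-package.json); do
  # Append one CSV row per entry of the package's "time" object
  curl -sL "https://registry.npmjs.org/$pkg" |
    jq -r --arg name "$pkg" '.time | to_entries[] | [$name, .key, .value] | @csv' >> data/processed/nodejs.csv
  sleep 1  # pause between requests to the registry
done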

Cleaning the CSV file

The data needed a little attention before the analysis. The reader is referred to the R Markdown file for the code and details.

Briefly, the data was cleaned in the following five main steps:

  1. read the CSV file and specified the column names;
  2. removed rows that did not contain version data: for each package, the “time” object returned a series of keys (version numbers, but also “created” and “modified”) with time stamps as values. It was assumed that “created” and the earliest version of a package fell on the same day, and that “modified” and the latest version of a package fell on the same day;
  3. extracted the date from the time stamp; extracted the year, the month and the day of the week as factors; and parsed the version number to keep only the first three numbers and the dots separating them;
  4. extracted the version numbers into separate columns for major, minor and patch versions using the semver library; and
  5. added a new column with the type of version (major, minor, patch or pre-release).
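
Step 5 does not spell out the rule used to assign a version type. One plausible rule, assumed here and not taken from the R Markdown file, is sketched below as a hypothetical shell function: a version with a pre-release suffix is `pre-release`; otherwise `x.0.0` is major, `x.y.0` (y > 0) is minor, and anything with a non-zero patch number is a patch.

```shell
# Hypothetical helper illustrating one possible labelling rule (an assumption,
# not necessarily the rule used in the original analysis).
version_type() {
  local v=${1%%+*} minor patch rest   # drop build metadata (+...)
  case $v in
    *-*) echo pre-release; return ;;  # e.g. 3.0.0-rc.1
  esac
  rest=${v#*.}        # "MINOR.PATCH"
  minor=${rest%%.*}
  patch=${rest#*.}
  if [ "$patch" -ne 0 ]; then
    echo patch        # x.y.z with z > 0
  elif [ "$minor" -ne 0 ]; then
    echo minor        # x.y.0 with y > 0
  else
    echo major        # x.0.0
  fi
}
```

For example, `version_type 2.1.0` prints `minor` and `version_type 3.0.0-rc.1` prints `pre-release`.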

The cleaned dataset contains the following 13 columns: package_name, version, time_stamp, date, time, year, month, major, minor, patch, prerelease, build and version_type.

Acknowledgments

Thank you to M. Williamson for the raw data and to L. Puts for help with some plots.