
Accelerating software development with AI: a controlled experiment

We all know that AI is transforming the way software is developed. But how many of us are clear on exactly what kinds of business benefits it can deliver?

We wanted to understand this better, both to make sure we maximise the productivity and quality gains AI can deliver and to pass our learnings on to our clients.

We therefore ran a controlled experiment, pitting a team of AI-assisted software engineers and quality analysts against a ‘human-powered’ group who were only allowed to use their own brains.

Here’s what we did and what we discovered…

How we ran the experiment

In all, 52 of our people took part in the experiment, across two hackathons.

We split them into two groups:

  • One assisted by AI tools, with 30 engineers and 8 automation quality analysts (QAs)
  • A control group that was purely human-powered, with 12 engineers and 2 automation QAs – allowing us to baseline the potential gains of using AI

We gave each group three hours to complete the same task:

  • Develop a .NET Web API to dynamically process mathematical expressions
  • Implement a custom PEMDAS-based algorithm (parentheses, exponents, multiplication/division, addition/subtraction) for expression evaluation, without using any third-party libraries
  • Write unit tests to validate the functionality of the solution
  • Test the implementation using five provided edge cases
  • Write three automated functional test scripts for specified scenarios
  • Run these scripts against the provided test web shop application

The AI-assisted group used four AI tools:

  • GitHub Copilot
  • Cursor IDE (using ChatGPT)
  • Qodo (formerly Codium)
  • Tabnine
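To give a feel for the core of the engineering task, here is a minimal sketch of a PEMDAS expression evaluator built without third-party libraries. It is written in Python purely for illustration (the teams built a .NET Web API), it is not any participant's solution, and the names `tokenize`, `Parser` and `evaluate` are hypothetical. It uses recursive descent, with one method per precedence level: parentheses first, then exponents, then multiplication and division (left to right), then addition and subtraction (left to right). Using `^` for exponents, treating exponents as right-associative, and binding a leading minus more loosely than `^` (so `-2 ^ 2` is `-4`) are our assumptions, not part of the brief.

```python
from __future__ import annotations

import re

# One token: an unsigned number, an operator or a parenthesis (surrounding whitespace is skipped).
TOKEN_RE = re.compile(r"\s*([0-9]+(?:\.[0-9]*)?|\.[0-9]+|[-+*/^()])\s*")


def tokenize(expr: str) -> list[str]:
    """Split an expression into number, operator and parenthesis tokens."""
    tokens: list[str] = []
    pos = 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser with one method per PEMDAS precedence level."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def expression(self) -> float:
        # Addition and subtraction: lowest precedence, evaluated left to right.
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:
        # Multiplication and division: equal precedence, evaluated left to right.
        value = self.unary()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.unary()
            if op == "/" and rhs == 0:
                raise ZeroDivisionError("division by zero")
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self) -> float:
        # Leading minus, e.g. "-3" or "2 * -3".
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.power()

    def power(self) -> float:
        # Exponents bind tighter than a leading minus and are right-associative.
        base = self.primary()
        if self.peek() == "^":
            self.take()
            result = base ** self.unary()
            if isinstance(result, complex):
                raise ValueError("result is not a real number")
            return result
        return base

    def primary(self) -> float:
        # A parenthesised sub-expression or a numeric literal.
        tok = self.take()
        if tok == "(":
            value = self.expression()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        try:
            return float(tok)
        except ValueError:
            raise ValueError(f"unexpected token {tok!r}") from None


def evaluate(expr: str) -> float:
    """Evaluate an arithmetic expression using PEMDAS precedence."""
    parser = Parser(tokenize(expr))
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value


if __name__ == "__main__":
    print(evaluate("8 / 2 * (2 + 2)"))  # prints 16.0
```

Because multiplication and division share a level and are applied left to right, `8 / 2 * (2 + 2)` evaluates as `(8 / 2) * 4`. Malformed input such as `1 / 0`, `(1 + 2` or `3 $ 4` raises an exception rather than returning a value, which is the kind of behaviour the unit tests and edge cases in the brief would exercise. A .NET Web API would wrap an equivalent evaluator in an HTTP endpoint; that wiring is outside this sketch.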

The results

We expected the results to favour the AI team, but we were still amazed by how much the AI tools improved both speed and quality.

Here are the overall outcomes we recorded… 

On average, when compared to our human-powered team, our AI-assisted engineers were able to:

• Complete the coding in 44% less time (nearly 2x faster)

• Complete the unit tests in 51% less time (just over 2x faster)

• Cover 83% more edge cases (nearly twice as many)

And when we compared our fastest human-powered engineer with our fastest AI-assisted engineer, the results were even more impressive: the AI-assisted engineer took 78% less time (nearly 5x faster).

We also compared a human-powered engineer with an AI-assisted engineer who already had experience using AI tools (in this case, GitHub Copilot). We found that:

• For the coding, the AI-assisted engineer took 75% less time (4x faster)

• For the unit tests, the AI-assisted engineer took 83% less time (6x faster)

This suggests that, as our engineers become more experienced with AI tools, our productivity gains will increase even further.

For the QAs, we also saw a significant improvement: on average, the AI-assisted analysts completed the task in 54% less time (just over 2x faster).

As with the engineers, we also compared the two fastest times: the first AI-assisted QA to finish took 89% less time than the fastest human-powered QA (9x faster).
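Each result above is quoted both as a multiplier and as a percentage. Assuming the percentage is the reduction in completion time relative to the human-powered baseline (a reading that is consistent with every figure), the speed-up factor is 1 / (1 - reduction). The short sketch below applies that conversion to the reported figures; the function name `speedup_from_time_reduction` and the `REPORTED` table are illustrative only.

```python
# Assumption: each percentage is the reduction in completion time relative to the
# human-powered baseline, i.e. reduction = 1 - t_ai / t_human.


def speedup_from_time_reduction(reduction_pct: float) -> float:
    """Return the speed-up factor t_human / t_ai for a time reduction given in percent."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be at least 0% and below 100%")
    return 1 / (1 - reduction_pct / 100)


# Time reductions reported in the results, paired with the multiplier quoted in the text.
REPORTED = [
    ("Coding, average", 44, "nearly 2x"),
    ("Unit tests, average", 51, "just over 2x"),
    ("Fastest engineers", 78, "nearly 5x"),
    ("Coding, experienced Copilot user", 75, "4x"),
    ("Unit tests, experienced Copilot user", 83, "6x"),
    ("QAs, average", 54, "just over 2x"),
    ("Fastest QAs", 89, "9x"),
]

# "83% more edge cases" is a simple count ratio, not a time reduction.
EDGE_CASE_RATIO = 1 + 83 / 100

if __name__ == "__main__":
    for label, pct, quoted in REPORTED:
        factor = speedup_from_time_reduction(pct)
        print(f"{label}: {pct}% less time -> {factor:.1f}x as fast (quoted: {quoted})")
```

Running it gives factors from about 1.8x (average coding time) up to about 9.1x (fastest QA), in line with the rounded multipliers quoted in the text; for example, an 83% time reduction works out at about 5.9x, which rounds to 6x.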

A comparison of AI tools

We also compared the AI tools used in the experiment, focusing on user experience, productivity gains, and security and IP protection.

For user experience: GitHub Copilot came out on top. Our developers rated it as a robust, mature tool well suited to .NET application development, offering consistent suggestions and responses as well as strong context management. Cursor and Qodo came joint second.

For productivity gains: Cursor came out on top, with developers completing the full task in 69% less time than human-powered developers (3.2x faster). GitHub Copilot came second, with a 62% reduction (2.7x faster).

For security and IP protection, we found the following:

GitHub Copilot: Copilot sends code snippets from the integrated development environment (IDE) to GitHub in real time to generate relevant suggestions. Once a suggestion is created, both the prompt and the suggested code snippet are discarded immediately. Note that this applies only to the Business and Enterprise licence options.

Cursor: A Privacy Mode can be enabled during onboarding, ensuring that no code is stored on Cursor’s servers or by its sub-processors.

Qodo: Data from paid-licence users is not used to train Qodo’s AI models and is deleted from its storage after 48 hours. Qodo also offers a zero-retention option, under which data is removed immediately if users specifically request it.

All three tools are SOC 2 compliant.

(Note that we didn’t include Tabnine in this comparison, as we felt its model wasn’t mature enough and its users struggled to complete the task.)

Conclusion and our next steps

Our experiment made it clear that AI can offer us huge benefits in productivity and quality. In every part of the task, from coding to unit tests to automated test scripts, there was a clear time saving, and in most cases a very significant one. AI will also help us improve the quality of our outputs, as AI-generated code gave us broader edge-case test coverage.

We also expect our efficiency gains to grow, as development speed clearly increases with experience of using AI tools. We found that even a short amount of training significantly accelerates output.

As for the future, our product owner colleagues have also run an experiment to understand how AI can accelerate and improve the product discovery process. We are now looking at how we can use their AI-generated requirements as prompts to build applications, ultimately with the possibility of using AI-assisted processes from an initial description of requirements right through to final outputs.

Meanwhile, we’re already starting to reap the benefits of AI-assisted development with some of our clients, delivering even greater value for them.

If you’d like to find out more about how AI-assisted software development can benefit your business, get in touch now.

Aleksandar Karavasilev, CTO at Damilah