...

We therefore ran a controlled experiment, pitting a team of AI-assisted software engineers and quality analysts against a ‘human-powered’ group who were only allowed to use their own brains.

Here’s what we did and what we discovered…

How we ran the experiment

In all, 52 of our people took part in the experiment, across two hackathons.

We split them into two groups:

  • One assisted by AI tools, with 30 engineers and 8 automation quality analysts (QAs)
  • A control group that was purely human-powered, with 12 engineers and 2 automation QAs – allowing us to baseline the potential gains of using AI

We gave each group three hours to perform the same set of tasks:

  • Develop a .NET Web API to dynamically process mathematical expressions
  • Implement a custom PEMDAS-based algorithm for expression evaluation, without using any third-party libraries (a minimal illustrative sketch follows these lists)
  • Write unit tests to validate the functionality of the solution
  • Test the implementation using five provided edge cases
  • Write three automated functional test scripts for specified scenarios
  • Test these functions using the provided test web shop application

The AI-assisted group could use the following AI tools:

  • GitHub Copilot
  • Cursor IDE (using ChatGPT)
  • Qodo (formerly Codium)
  • Tabnine
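
To give a feel for the coding portion of the task, here is a minimal sketch of the kind of PEMDAS evaluator the brief called for – our own illustrative example written for this article, not code produced during the hackathons. It uses a hand-rolled recursive-descent parser, with one parsing method per precedence tier and no third-party libraries:

    using System;
    using System.Globalization;

    // Minimal PEMDAS expression evaluator: recursive descent, one method per
    // precedence level (parentheses/unary > exponents > multiply/divide > add/subtract).
    public class ExpressionEvaluator
    {
        private readonly string _input;
        private int _pos;

        public ExpressionEvaluator(string input) => _input = input;

        public double Evaluate()
        {
            double value = ParseAddSub();
            SkipSpaces();
            if (_pos != _input.Length)
                throw new FormatException($"Unexpected character at position {_pos}");
            return value;
        }

        // Lowest precedence: addition and subtraction, left-associative.
        private double ParseAddSub()
        {
            double left = ParseMulDiv();
            while (true)
            {
                SkipSpaces();
                if (Match('+')) left += ParseMulDiv();
                else if (Match('-')) left -= ParseMulDiv();
                else return left;
            }
        }

        // Multiplication and division, left-associative.
        private double ParseMulDiv()
        {
            double left = ParseExponent();
            while (true)
            {
                SkipSpaces();
                if (Match('*')) left *= ParseExponent();
                else if (Match('/')) left /= ParseExponent();
                else return left;
            }
        }

        // Exponentiation binds tightest of the operators and is right-associative.
        private double ParseExponent()
        {
            double baseValue = ParsePrimary();
            SkipSpaces();
            return Match('^') ? Math.Pow(baseValue, ParseExponent()) : baseValue;
        }

        // Numbers, unary minus, and parenthesised sub-expressions.
        private double ParsePrimary()
        {
            SkipSpaces();
            if (Match('-')) return -ParsePrimary();
            if (Match('('))
            {
                double inner = ParseAddSub();
                SkipSpaces();
                if (!Match(')')) throw new FormatException("Missing closing parenthesis");
                return inner;
            }
            int start = _pos;
            while (_pos < _input.Length && (char.IsDigit(_input[_pos]) || _input[_pos] == '.'))
                _pos++;
            if (start == _pos) throw new FormatException($"Number expected at position {start}");
            return double.Parse(_input.Substring(start, _pos - start), CultureInfo.InvariantCulture);
        }

        private bool Match(char expected)
        {
            if (_pos < _input.Length && _input[_pos] == expected) { _pos++; return true; }
            return false;
        }

        private void SkipSpaces()
        {
            while (_pos < _input.Length && char.IsWhiteSpace(_input[_pos])) _pos++;
        }
    }

A unit test for one of the edge cases might then assert, for example, that new ExpressionEvaluator("3 + 4 * (2 - 1) ^ 2").Evaluate() returns 7, or that malformed input such as "(1 + 2" throws a FormatException.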

The results

We expected the results of this experiment to favour the AI team, but we were still amazed by the difference the AI tools made to both speed and quality.

Here are the overall outcomes we recorded… 

On average, when compared to our human-powered team, our AI-assisted engineers were able to do the following. (For the speed figures, the percentage shows the reduction in time taken – finishing the coding in 44% less time, for instance, equates to working nearly 2x faster.)

• Complete the coding nearly 2x (44%) faster

• Conduct the unit tests just over 2x (51%) faster

• Cover nearly twice as many (83% more) edge cases

And when we compared our fastest human-powered engineer with our fastest AI-assisted engineer, the results were even more impressive: the AI-assisted engineer was nearly 5x (78%) faster.

We also compared a human-powered engineer with one who already had experience using AI tools (in this case, GitHub Copilot). We found that:

• For the coding, the AI-assisted engineer was 4x (75%) faster

• For the unit tests, the AI-assisted engineer was 6x (83%) faster

This demonstrated to us that, as our engineers become more experienced with AI tools, our productivity gains will increase even further.

For the QAs, we also saw a significant improvement: on average, the AI-assisted analysts completed the task just over 2x (54%) faster.

And, as with the engineers, we compared the two fastest times, and found that the fastest AI-assisted QA was 9x (89%) faster.

A comparison of AI tools

We also compared the four AI tools used in the experiment, in particular with regard to user experience, productivity gains, and security and IP protection.

For user experience: GitHub Copilot came out on top. Our developers rated it as a robust and mature tool, well suited to .NET application development. It offered consistent suggestions and responses, as well as strong context management. Cursor and Qodo came in joint second place.

For productivity gains: Cursor came out on top, allowing our team to be 3.2x (69%) faster than human-only developers when it came to completing the full task. GitHub Copilot was in second place, making the team 2.7x (62%) faster.

For security and IP protection, we found the following:

GitHub Copilot transmits code snippets from the integrated development environment (IDE) to GitHub in real time to generate relevant suggestions. Once the suggestion is created, both the prompt and the suggested code snippet are immediately discarded – but note that this is only the case for the Business and Enterprise licence options.

Cursor provides a Privacy Mode that can be activated during the onboarding process, ensuring that no code is stored on their servers or by their sub-processors.

Qodo does not use paid-licence user data to train its AI models, and deletes the data from its storage after 48 hours. It also offers a zero-retention option, under which data is removed immediately at the user’s specific request.

All three tools are certified for SOC 2 compliance.

(Note that we didn’t assess Tabnine as we felt the model wasn’t mature enough and its users struggled to complete the task.)

Conclusion and our next steps

Our experiment made it clear that AI could offer us huge gains in productivity and quality. In every aspect of the tests we conducted, from coding to unit tests to automated test script production, there was a clear time saving – in most cases a very significant one. AI will also enable us to improve the quality of our outputs: the AI-generated code gave us broader edge-case test coverage.

Furthermore, we expect these efficiency gains to grow, since development speed with AI tools clearly increases with experience. We found that even a short amount of training significantly accelerates outputs.

As for the future… Our product owner colleagues also ran an experiment to understand how AI can accelerate and improve the product discovery process. We are now looking at how we can use their AI-generated requirements as prompts to build applications – ultimately with the possibility of using AI-assisted processes from an initial description of requirements right through to final outputs.

Meanwhile, we’re already starting to reap the benefits of AI-assisted development with some of our clients, delivering even greater value for them.

If you’d like to find out more about how AI-assisted software development can benefit your business, get in touch now.

Aleksandar Karavasilev, CTO at Damilah

But my husband, who at the time had more experience than me in using AI, suggested I was asking the wrong question. He recommended that I reframe it as: “Are there any scientific articles that prove placing a chopped onion in a room will help with a cough?”

This time, the response was far more credible and useful (and, it turns out, onions really do help).

I learned an important lesson here: although the potential of AI is enormous, most people’s first engagement with it is superficial, often leading to poor results. To gain maximum value from the tools and improve their outputs, it’s worth learning how best to provide context and specificity, and to ask for answers based on relevant sources.

Putting AI to the test

At Damilah, we’re very excited about the transformative potential of AI. We have, therefore, been exploring how it can augment our business activities and help deliver greater value to our clients. For example, our engineering teams ran a series of hackathons to calculate whether using AI tools could accelerate software development and improve quality (in a nutshell: yes, it can – hugely).

At the same time, we wanted to test whether AI could do the equivalent for our product discovery and inception processes – and if so, in what ways. So we ran an additional hackathon to examine this. In it, we asked three teams to work on a fictional brief, using a variety of AI tools, including ChatGPT and Perplexity (using Claude).

As with our engineering colleagues, the results were astounding.

We discovered that AI could significantly reduce the amount of time we spent on discovery – by anywhere from 20% to 50%, depending on the task and the tools being used – and with results that matched the quality of the work done without the assistance of AI.

And, most importantly of all, it revealed areas where AI could free us from routine, repetitive work, allowing our people to focus on higher-value activities.

Accelerating and improving workflows

Specifically, we identified two primary ways that AI can accelerate and improve workflows:

  1. Jump-starting a project: For example, when preparing for a client interview, we can use AI to generate an initial list of questions based on the context we provide. These AI-generated prompts serve as a springboard, helping us refine ideas faster, and ensure we’ve covered everything.
  2. Enhancing existing work: In other cases, we can input a draft of some work we’ve already created, prompting the tool to polish and improve it, as well as asking whether we may have missed something. This approach allows us to benefit from AI’s ability to enhance clarity and suggest useful amendments and additions.

Another of our most impactful findings from the hackathon was how AI could accelerate the creation of wireframes in Figma. This aided our conversations with developers, while producing outcomes of similar quality to those of our traditional discovery processes.

And a real game-changer has been using AI tools for writing acceptance criteria. Traditionally, creating detailed, actionable user stories (which list the requirements that a developer has to meet) is time-consuming and mentally draining. Now, however, by giving a well-crafted prompt to an AI tool, we can generate acceptance criteria in a matter of minutes. This not only saves time but also ensures consistency, freeing our teams to focus on other priorities (more on that later).
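
For illustration, a prompt along these lines works well (a made-up example, not one of our client briefs):

    “Acting as a product owner for an online shop, write acceptance criteria
    in Given/When/Then format for this user story: ‘As a returning customer,
    I want to save my card details so that I can check out faster.’ Cover the
    happy path, validation failures, and security considerations.”

The tool might then return criteria such as: “Given a returning customer with a saved card, when they reach the payment step, then the saved card is offered as the default payment method.”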

AI as an enabler

Despite these extremely encouraging results, we aren’t getting carried away with AI. While it accelerates and enhances many of our processes – and we’re already using it to speed up workflows in live environments with clients who have agreed to us using AI tools – we also believe there should always be a human involved every step of the way.

For us, quality is paramount, so our product owners will always review and refine the AI outputs to ensure they meet our standards and our clients’ needs.

We’re also conscious that an over-reliance on AI may lead to diminished problem-solving skills – a phenomenon akin to forgetting basic arithmetic because we’re accustomed to using calculators. To counter this, we view AI as an enabler, not the be-all and end-all, so will always ensure our people develop and maintain those key analytical capabilities. Above all, AI’s purpose is to enhance human creativity and decision-making, not replace them.

Furthermore, we’re wary of the common problem of ‘garbage in, garbage out’. That is, as I found out with my first experience of AI, it’s essential to take the time to learn how to craft well thought-through prompts and to train the model. AI tools can only become that valuable enabler and accelerator if we ensure we have the skills and patience to do this.

Focusing on value creation

By learning to use AI in the most effective ways to perform routine and time-consuming tasks, we’re enabling many of our people to spend more time focusing on high-value activities, such as deepening their market understanding, engaging more effectively with stakeholders and shaping product roadmaps.

And, perhaps most excitingly of all, it means we can experiment with bold ideas that we might not previously have risked testing for fear of the time they would consume. Instead, AI enables us to ‘fail fast’ in our search for innovative solutions that genuinely solve our clients’ problems and help them meet their objectives.

In fact, if I were a potential client looking for a software development partner, I’d always choose a firm that has already successfully embedded AI tools into its processes. That’s because its teams will be unburdened by mundane, repetitive work, and able to focus on building a highly creative partnership that delivers outstanding results.

To discuss how our AI-accelerated workflows enable us to deliver greater value for your business, get in touch now.

Iskra Ristovska, Principal Product Owner at Damilah