A new vendor survey reports a significant disconnect between major investments in generative AI and the adoption of quality assurance practices within the software development lifecycle (SDLC).
The “State of Digital Quality in AI Survey,” released by Boston-based digital quality and crowdsourced testing company Applause, states that considering the rapid, global rise of genAI apps and agentic AI, which enables autonomous decision-making and execution without human intervention, “rigorous crowdtesting throughout the SDLC is critical to mitigating expanding risks associated with the technology.”
More than 4,400 independent software developers, QA professionals and consumers worldwide participated in the survey, which explored common AI use cases, tools and challenges, as well as user experiences and preferences.
Findings:
Many organizations have been slow to adopt embedding AI throughouttheir integrated development environament:
- Over half of the software professionals surveyed believe genAI tools improve productivity significantly, with 25% estimating a boost of 25-49% and another 27% seeing increases of 50-74%.
- Yet, 23% of software professionals say their integrated development environment (IDE) lacks embedded Gen AI tools (e.g., GitHub Copilot, OpenAI Codex), 16% aren’t sure if the tools are integrated with their IDE, and 5% have no IDE.
- While red teaming, or adversarial testing, is a best practice to help mitigate risks of inaccuracy, bias, toxicity and worse, only 33% of respondents reported using this technique.
- The top AI testing activities involving humans include prompt and response grading (61%), UX testing (57%) and accessibility testing (54%). Humans are also essential in training industry-specific or niche models; 41% of developers and QA professionals lean on domain experts for AI training.
Businesses are investing heavily in AI to enhance customer experiences and reduce operational costs – but flaws still reach users.
- Over 70% of developers and QA professionals who responded said their organization is developing AI applications and features. Chatbots and customer support tools are the top AI-powered solutions being built (55%). And, just over 19% have started to build AI agents.
- Within the past three months, 65% of users reported that they have encountered problems using Gen AI, including responses that lacked detail (40%), misunderstood prompts (38%), showed bias (35%), contained hallucinations (32%), were clearly incorrect (23%) or included offensive content (17%). Only 6% fewer people experienced hallucinations since last year’s survey.
- Gen AI users are fickle, as 30% have swapped one service for another, and 34% prefer different Gen AI services for different tasks.
“The results of our annual AI survey underscore the need to raise the bar on how we test and roll out new generative AI models and applications,” said Chris Sheehan, EVP of High Tech & AI, Applause. “Given massive investment in the technology, we’d like to see more developers incorporate AI-powered productivity tools throughout the SDLC, and bolster reliability and safety through rigorous end-to-end testing. Agentic AI is ramping up at a speed and scale we could hardly have imagined, so the risks are now amplified. Our global clients are already ahead of the curve by baking broad AI testing measures into development earlier, from training models with diverse, high-quality datasets to employing testing best practices like red teaming.”
Additional insights:
- Consumer demand for multimodal capabilities has increased.
78% of consumers say multimodal functionality or the ability to interpret multiple types of media is important to them in a Gen AI tool, compared with 62% last year. - GitHub Copilot (37%) and OpenAI Codex (34%) are still the AI-powered coding tools of choice.
They were the favorites in 2024, too, but the gap between their usage is closing. Last year, GitHub Copilot was preferred by 41% of respondents, and OpenAI Codex by just 24%. - QA professionals are turning to AI for basic support of the testing process.
The top three use cases are test case generation (66%), text generation for test data (59%) and test reporting (58%).
Sheehan continued, “Enterprises best positioned to capture value with customer-facing generative AI applications understand the important role human intelligence can play. While every generative AI use case requires a custom approach to quality, human intelligence can be applied to many parts of the development process including model data, model evaluation and comprehensive testing in the real world. As AI seeps into every part of our existence, we need to ensure these solutions provide the exceptional experiences users demand while mitigating the risks that are inherent to the technology.”
The AI Survey is part of the State of Digital Quality content series from Applause. The annual State of Digital Quality Report draws on Applause’s experience serving global enterprises and technology leaders for more than 15 years, including many AI innovators. Based on in-depth analysis of testing platform data, survey results and interviews with customers and internal experts, the report provides guidance on how organizations investing in AI and other technologies can gain the most value.
Source: https://insideainews.com/