Friday, January 19, 2018

How well does testing work?
Mal Warwick

November, 2007

For direct mail fundraisers, testing is the stuff of which great success stories are built, brick by brick. Uncounted millions of dollars have been raised as a result of careful, step-by-step improvements in direct mail list selection, suggested gift levels, premium offers, packaging, postage choices, and myriad other elements.

But testing has its limits. And sometimes those limits can be downright discouraging.

One test conducted by a fast-growing West Coast environmental organization illustrates just how confusing and seemingly contradictory direct mail test results can be.

In a 149,000-piece membership acquisition mailing, the organization chose to test the impact of using a newly redesigned logotype. They selected at random two panels of 30,000 addresses each. One panel (the “control”) was mailed an acquisition package using the old logo; letters sent to the second panel (the “test”) used the new logo.

For an outsider, there was nothing very interesting in that. What made the test noteworthy was this: Following the advice of their direct mail fundraising counsel, the organization selected a third panel as a double-check. This third panel (a so-called “null test”) also consisted of 30,000 addresses; it was statistically indistinguishable from the control panel. All 60,000 addresses in those two groups were sent identical letters employing the organization’s old logo.

When all three panels were mailed simultaneously, the null test panel outpulled the control panel by a larger margin than the difference between the control and the test! The 30,000 letters comprising the null test panel generated 319 gifts, more than the control, which yielded just 281, even though the packages mailed to the two groups were totally identical.

But not so fast! As a statistician would tell you, the difference between those two numbers (281 and 319) is not statistically significant: a gap of that size can easily arise by chance between two identical panels of 30,000.

By contrast, the difference between the new logo and the old—when viewing the disparity between the test panel and the null panel (272 vs. 319 gifts)—is actually more significant. But even that difference doesn’t ring the bell of statistical certainty.

In other words, from a statistical perspective, none of the results of this test were meaningful.
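To see why a statistician shrugs at these numbers, here is a minimal sketch in Python that compares each pair of panels. It assumes a pooled two-proportion z-test, which is one common way to compare response rates; the article does not say which test was actually applied to this mailing. The function and variable names are illustrative only. The gift counts and the 30,000-address panel size come from the figures above.

```python
import math

def two_proportion_z_test(gifts_a, gifts_b, mailed_a, mailed_b):
    """Return (z, two-sided p-value) for the difference in response rates.

    Assumption: a pooled two-proportion z-test, a standard large-sample
    approximation. The original analysis may have used a different test.
    """
    rate_a = gifts_a / mailed_a
    rate_b = gifts_b / mailed_b
    # Pooled response rate under the hypothesis that both panels respond alike
    pooled = (gifts_a + gifts_b) / (mailed_a + mailed_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / mailed_a + 1 / mailed_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

PANEL_SIZE = 30_000
GIFTS = {"control": 281, "null": 319, "test": 272}  # counts quoted in the article

def compare(panel_a, panel_b):
    """Compare two named panels using the gift counts above."""
    return two_proportion_z_test(
        GIFTS[panel_a], GIFTS[panel_b], PANEL_SIZE, PANEL_SIZE
    )

for a, b in [("control", "null"), ("test", "null"), ("test", "control")]:
    z, p = compare(a, b)
    print(f"{a} vs. {b}: z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions, the control-versus-null comparison gives a p-value of roughly 0.12, and the test-versus-null comparison lands just above 0.05. Neither clears the conventional 95 percent threshold, which matches the conclusion above.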

But if you follow instinct and common sense rather than the precepts of statistics, you could be forgiven for wondering why there was such a seemingly big difference between the control panel (old logo) and the null test panel (also old logo). This clearly seems to be an anomaly. What, then, might explain it?

Take your pick of the following possible explanations:

The lettershop or the postal service screwed up—dropping the test panels on different days or improperly packaging or processing them.

Statistics is not the exact science it’s cracked up to be. What’s supposed to happen 95 out of 100 times actually happens less often than that.

Whatever the reason, these unpredictable results should drive home a fundamental lesson of direct mail testing: If it’s important enough to test once, test it again to be sure. Because what you see is not necessarily what you’ll get.


When direct mail fundraisers get together, we talk about testing. Because when you get right down to cases, testing is all we’ve got to talk about. Or so it seemed, at any rate, at a fundraising forum held in New York by Moore Response Marketing Services. A report on the discussion came to me from Karen Hill at Moore Response in the form of detailed notes shortly after the session, and they proved all over again that direct mail fundraisers shouldn’t take anything for granted.

For example, one major mailer reported improved results from eliminating Basic/Residual-sorted names from marginal lists: These names proved to have a lower response rate than names that qualified for greater postal discounts. The procedure reduced the organization’s mailing cost and boosted the overall response rate.

However, another mailer had the opposite experience: Basic/Residual-sorted names were very responsive. Eliminating them from its mailings would have caused a significant revenue loss.

Why the difference? One possible explanation identified by participants in the forum was that the first mailer worked primarily in urban areas, with names concentrated in relatively few ZIP codes, while the second mailer relied on more widely scattered rural respondents.

In other words, what works for one organization may not work for another—and blindly following another’s test results may be dangerous to your fiscal health.

Other highlights from the forum:

One experienced mailer advised caution in the headlong rush to downsize packages now classified as “flats” under USPS rules. Despite higher postage rates, flats sometimes still pay for themselves by yielding higher response (perhaps because they face less competition and stand out more in the mail). Another mailer found a second reason not to abandon flats: They yield higher-dollar donors, who have proven better lifetime givers for that organization, suggesting that the added investment in flats will pay off in the long run.

A merge-purge bureau had suggested deleting names of deceased persons from a customer’s file by using an updated proprietary file (presumably, a File of the Dead). But the customer re-keyed and mailed those names “just one more time”—and the results were equal to or better than those from the rest of the mailing!

A large charity with many local chapters tested whether donors would respond better to direct mail appeals identified as coming from a local chapter rather than from the national organization. As is typically the case in the commercial world, there was no discernible difference.

One participant wisely urged large mailers to track donors back to the sources of their original gifts, and to differentiate in particular between disaster-related sources and those acquired by less urgent appeals. Five years of testing showed that lists yielding higher average gifts (and lower response rates) were a better long-term investment than lists producing lower average gifts (and higher response rates).

One mailer experimented with recyclable paper and inks, and found that both can make a difference in response. Unfortunately, it wasn’t the response environmentalists would have wanted: Both changes decreased response. But another organization reported no meaningful difference as a result of using recyclable materials.

Twelve major direct mail fundraisers were represented at the forum, including UNICEF, National Jewish Center for Immunology, Father Flanagan’s Boys’ Home, American Red Cross, CARE, Smithsonian Institution, and Society for the Right to Die.

This article was excerpted from Testing, Testing, 1, 2, 3: Raise More Money with Direct Mail Tests by Mal Warwick (Jossey-Bass Publishers, 2003). Copyright © 2003 by Mal Warwick.

This article was reprinted with permission from Mal Warwick. Consultant, author, and public speaker Mal Warwick has been involved in the not-for-profit sector for more than 40 years. He has written or edited seventeen books of interest to nonprofit managers. He has taught fundraising on six continents to nonprofit executives from more than 100 countries. Copyright (c) by Mal Warwick. All rights reserved.

