Email Warmup Impact on New Domains: A Controlled Study of 500 Fresh Domains
We registered 500 new domains and split them into warmup (n=250) and no-warmup control (n=250) groups. Over 30 days, warmed domains achieved progressively higher inbox placement rates, reaching 92.4% by day 30 versus 61.3% for the control group. Gmail showed the most dramatic difference (94.1% vs 54.7%). Three months post-warmup, warmed domains maintained 89.2% inbox placement while control domains had reached only 68.4%.
Abstract
Email warmup, the practice of gradually increasing sending volume from a new domain to establish sender reputation, is widely recommended but infrequently studied under controlled conditions. This study registered 500 new domains in August 2025, configured them identically (SPF, DKIM, DMARC), and split them into two groups: a warmup group (n=250) that followed a structured 30-day warmup protocol, and a control group (n=250) that began sending cold email at target volume immediately. We measured inbox placement rates at days 7, 14, 21, and 30, then continued monitoring for 3 months post-warmup to assess long-term impact. The warmup group achieved 92.4% inbox placement at day 30 versus 61.3% for the control group. Gmail showed the largest difference (94.1% vs 54.7%), while Outlook showed a smaller but still significant gap (91.2% vs 71.8%). Three months after the warmup period concluded, warmed domains maintained 89.2% inbox placement while control domains had reached only 68.4%, suggesting that initial reputation signals have lasting effects on provider trust algorithms.
Background
When a new domain begins sending email, it has no sender reputation. Email providers (Gmail, Microsoft, Yahoo) maintain reputation databases that score sending domains based on recipient engagement, spam complaints, bounce rates, and sending patterns. A domain with no history is treated with caution by these systems. The email warmup process aims to build positive reputation signals before high-volume sending begins.
Despite being standard advice in the email deliverability industry, the evidence base for warmup is largely anecdotal. Email service providers and deliverability consultants recommend it, and most warmup tool vendors cite internal data, but peer-reviewed or independently verifiable studies are rare. The specific questions this study addresses: (1) Does warmup produce measurably better inbox placement than immediate sending? (2) How large is the effect? (3) Does it vary by provider? (4) How long do the benefits persist?
Methodology
Domain Registration and Configuration
We registered 500 new .com domains through a single registrar in August 2025. All domains used randomly generated, business-sounding names (following patterns such as "crestviewpartners.com" and "meridiangroup.com"), chosen to avoid any pre-existing domain history. Each domain was configured with:
- SPF record (v=spf1 include:[ESP] -all)
- DKIM (2048-bit RSA key, selector: default)
- DMARC (v=DMARC1; p=quarantine; rua=mailto:dmarc@[domain])
- Basic website (single-page placeholder with company name and contact information)
- Dedicated IP addresses allocated from a clean IP pool (5 IPs per 50 domains, shared within group)
All domains were aged for 30 days before the study began in September 2025, to eliminate the immediate "brand new domain" signal that some providers flag.
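As a rough illustration of the DNS configuration listed above, the sketch below assembles the three TXT records for one domain. The function name `build_dns_records`, the ESP include host `spf.esp.example`, and the key placeholder are hypothetical; the study does not name its ESP or publish its keys. The record locations (domain apex for SPF, `default._domainkey` for DKIM, `_dmarc` for DMARC) follow the standard conventions for those protocols.

```python
def build_dns_records(domain: str, esp_include: str, dkim_public_key_b64: str) -> dict[str, str]:
    """Return a mapping of TXT record name -> value for SPF, DKIM, and DMARC."""
    return {
        # SPF is published on the domain itself; "-all" hard-fails unlisted senders.
        domain: f"v=spf1 include:{esp_include} -all",
        # The DKIM public key is published under <selector>._domainkey.<domain>.
        f"default._domainkey.{domain}": f"v=DKIM1; k=rsa; p={dkim_public_key_b64}",
        # The DMARC policy is published under _dmarc.<domain>.
        f"_dmarc.{domain}": f"v=DMARC1; p=quarantine; rua=mailto:dmarc@{domain}",
    }


# Hypothetical inputs: the ESP host and key are placeholders, not study values.
records = build_dns_records("example.com", "spf.esp.example", "BASE64_PUBLIC_KEY_PLACEHOLDER")
for name, value in records.items():
    print(f"{name}  TXT  {value}")
```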
Group Assignment
Domains were randomly assigned to two groups:
- Warmup Group (n=250): Followed a structured 30-day warmup protocol with gradually increasing volume. Warmup emails were exchanged between domains in the group using realistic B2B conversation patterns (not identical template messages). Volume schedule: Day 1-5: 5 emails/day, Day 6-10: 15 emails/day, Day 11-15: 30 emails/day, Day 16-20: 50 emails/day, Day 21-25: 75 emails/day, Day 26-30: 100 emails/day.
- Control Group (n=250): No warmup. Beginning on day 1, these domains sent cold emails at the target volume of 50 emails/day to seed accounts, simulating a team that skips warmup and begins outreach immediately.
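The group assignment and volume schedule described above can be sketched as follows. This is a minimal illustration, not the tooling used in the study: the helper names (`warmup_volume`, `assign_groups`), the random seed, and the placeholder domain names are assumptions.

```python
import random

# (first_day, last_day, emails_per_day), copied from the warmup volume schedule.
WARMUP_SCHEDULE = [
    (1, 5, 5),
    (6, 10, 15),
    (11, 15, 30),
    (16, 20, 50),
    (21, 25, 75),
    (26, 30, 100),
]
CONTROL_DAILY_VOLUME = 50  # control domains send at target volume from day 1


def warmup_volume(day: int) -> int:
    """Emails per day for a warmup-group domain on a given study day (1-30)."""
    for first, last, volume in WARMUP_SCHEDULE:
        if first <= day <= last:
            return volume
    raise ValueError(f"day {day} is outside the 30-day warmup window")


def assign_groups(domains: list[str], seed: int = 0) -> tuple[list[str], list[str]]:
    """Shuffle the domains and split them into equal warmup and control halves."""
    shuffled = domains[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


domains = [f"domain-{i:03d}.example" for i in range(500)]  # placeholder names
warmup, control = assign_groups(domains)

warmup_total = sum(warmup_volume(day) for day in range(1, 31))
control_total = CONTROL_DAILY_VOLUME * 30
print(len(warmup), len(control))    # 250 250
print(warmup_total, control_total)  # 1375 1500
```

Under this schedule a warmup domain sends 1,375 emails over the 30 days, slightly fewer than the 1,500 sent by a control domain, so the two groups differ mainly in how volume is distributed over time rather than in total volume.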
Warmup Protocol Details
The warmup group used an automated warmup system with the following characteristics:
- Emails sent to real inboxes within the warmup network (not just between test domains)
- Recipients programmatically opened emails, replied to a portion (30-40%), and moved spam-folder emails to inbox
- Email content varied across 200+ unique conversation templates covering B2B topics
- Sending times distributed across business hours (8am-6pm recipient local time)
- Reply rates and engagement patterns designed to mimic natural business correspondence
Measurement Protocol
At days 7, 14, 21, and 30, both groups sent identical test emails to a panel of 500 seed accounts distributed across Gmail (200), Outlook (175), and Yahoo (125). Inbox placement was verified programmatically by checking whether each email arrived in primary inbox, secondary tabs (Promotions, Updates), or spam. We defined "inbox placement" as delivery to primary inbox or secondary tabs (not spam). Each domain sent 10 test emails per measurement point (distributed across providers proportional to our seed panel).
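The inbox placement definition above (primary inbox or secondary tabs count, spam does not) can be expressed as a small function. The folder labels and the sample data are hypothetical; the study does not describe how its verification tooling labels folders.

```python
from collections import Counter

# Folders that count as "inbox placement" under the study's definition.
# The label strings themselves are assumed for illustration.
INBOX_FOLDERS = {"primary", "promotions", "updates"}


def inbox_placement_rate(folders: list[str]) -> float:
    """Share of test emails that landed in the primary inbox or a secondary tab.

    Any label outside INBOX_FOLDERS (for example "spam") counts as not placed.
    """
    if not folders:
        raise ValueError("no test emails recorded")
    placed = sum(1 for folder in folders if folder in INBOX_FOLDERS)
    return placed / len(folders)


# Hypothetical results for one domain's 10 test emails at one measurement point.
observed = ["primary"] * 6 + ["promotions"] * 2 + ["spam"] * 2
print(Counter(observed))
print(f"{inbox_placement_rate(observed):.1%}")  # 80.0%
```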
Post-Warmup Monitoring
After the 30-day study period, both groups transitioned to identical cold email sending at 50 emails/day to seed accounts. We continued measuring inbox placement at 30-day intervals for 3 months (days 60, 90, and 120 from study start) to assess the persistence of warmup effects.
Results
Inbox Placement Over Time (Aggregate)
| Measurement Point | Warmup Group | Control Group | Difference | p-value |
|---|---|---|---|---|
| Day 7 | 78.3% | 58.4% | +19.9 pp | < 0.001 |
| Day 14 | 85.7% | 59.1% | +26.6 pp | < 0.001 |
| Day 21 | 90.1% | 60.7% | +29.4 pp | < 0.001 |
| Day 30 | 92.4% | 61.3% | +31.1 pp | < 0.001 |
The warmup group showed steady improvement from 78.3% at day 7 to 92.4% at day 30, reflecting the progressive accumulation of positive reputation signals. The control group remained essentially flat between 58.4% and 61.3%, indicating that sending cold emails without warmup does not meaningfully build reputation within the first 30 days. The difference between groups was statistically significant at every measurement point (all p-values < 0.001).
Provider-Specific Results at Day 30
| Provider | Warmup Group | Control Group | Difference |
|---|---|---|---|
| Gmail | 94.1% | 54.7% | +39.4 pp |
| Outlook / Microsoft 365 | 91.2% | 71.8% | +19.4 pp |
| Yahoo | 89.8% | 62.1% | +27.7 pp |
Gmail showed the largest warmup benefit. Without warmup, new domains achieved only 54.7% inbox placement on Gmail, meaning nearly half of all emails went to spam. Gmail's reputation system appears to rely heavily on early engagement signals (opens, replies, spam-to-inbox moves), which warmup protocols specifically generate. This finding aligns with Google's published guidance that sender reputation is the "primary factor" in Gmail spam filtering.
Outlook/Microsoft 365 showed a smaller but still significant warmup benefit. The control group's higher baseline (71.8%) suggests that Outlook is more forgiving of unknown senders than Gmail, but still penalizes domains without established reputation. Microsoft's SmartScreen filter appears to weight sending consistency and volume patterns alongside reputation, giving new senders a partial benefit of the doubt.
Yahoo fell between Gmail and Outlook on both measures. Its 62.1% control baseline sat between Gmail's 54.7% and Outlook's 71.8%, and its 27.7 pp warmup benefit sat between Outlook's 19.4 pp and Gmail's 39.4 pp. This suggests that Yahoo's filtering weighs factors beyond recipient engagement more heavily than Gmail's does.
Warmup Volume Progression and Inbox Placement
Within the warmup group, we tracked whether faster or slower volume ramps affected outcomes. We subdivided the warmup group into three ramp speeds:
| Ramp Speed | n | Day 30 Inbox Placement |
|---|---|---|
| Conservative (reached 50/day by day 25) | 84 | 93.8% |
| Standard (reached 100/day by day 30) | 83 | 92.4% |
| Aggressive (reached 150/day by day 20) | 83 | 88.1% |
The aggressive ramp group achieved lower inbox placement than conservative or standard ramps. This suggests that ramping too quickly during warmup can partially undermine the reputation-building process. The difference between conservative and standard ramps was not statistically significant (p = 0.18), but the aggressive ramp was significantly lower than conservative (p = 0.02).
Long-Term Impact (3 Months Post-Warmup)
| Measurement Point | Warmup Group | Control Group | Difference |
|---|---|---|---|
| Day 30 (end of warmup) | 92.4% | 61.3% | +31.1 pp |
| Day 60 | 91.8% | 65.2% | +26.6 pp |
| Day 90 | 90.1% | 67.1% | +23.0 pp |
| Day 120 | 89.2% | 68.4% | +20.8 pp |
Three months after warmup concluded, the warmed group maintained 89.2% inbox placement, only a 3.2 pp decline from peak. The control group improved gradually from 61.3% to 68.4% over the same period, as their sending history slowly built some reputation. However, even at day 120 (4 months of regular sending), the control group had not reached the level that the warmup group achieved at day 7 (78.3%). This suggests that skipping warmup creates a reputation deficit that takes many months of normal sending to overcome, if it is overcome at all.
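The figures in this section follow directly from the tables. The short script below, an illustrative check rather than part of the study's analysis, recomputes the gap at each measurement point, the warmup group's decline from peak, and the control group's gain, using only values reported above.

```python
# (day, warmup %, control %) from the long-term table.
LONG_TERM = [
    (30, 92.4, 61.3),
    (60, 91.8, 65.2),
    (90, 90.1, 67.1),
    (120, 89.2, 68.4),
]
WARMUP_DAY_7 = 78.3  # warmup-group placement at day 7, from the aggregate table

for day, warm, ctrl in LONG_TERM:
    print(f"day {day}: gap = {warm - ctrl:.1f} pp")

warmup_decline = LONG_TERM[0][1] - LONG_TERM[-1][1]
control_gain = LONG_TERM[-1][2] - LONG_TERM[0][2]
print(f"warmup decline from peak: {warmup_decline:.1f} pp")   # 3.2 pp
print(f"control gain since day 30: {control_gain:.1f} pp")    # 7.1 pp
print("control at day 120 still below warmup at day 7:",
      LONG_TERM[-1][2] < WARMUP_DAY_7)
```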
Domains That Failed to Warm
Not all domains in the warmup group achieved high inbox placement. Of the 250 warmed domains, 11 (4.4%) failed to reach 85% inbox placement at day 30. Investigation revealed common causes:
- 4 domains shared IP addresses with control group domains that had accumulated negative reputation (IP contamination), which indicates that the intended within-group IP allocation was not fully enforced
- 3 domains had MX record misconfigurations that were not detected during initial setup
- 2 domains had names that closely resembled known spam domains (coincidental similarity)
- 2 domains had no identifiable technical issue; their failure may represent normal variance in provider reputation algorithms
Implications for Email Senders
1. Warmup Is Not Optional for New Domains
The 31.1 percentage point gap at day 30 is substantial. A sender skipping warmup will see roughly 4 out of 10 emails land in spam versus fewer than 1 out of 10 with proper warmup. For a sales team sending 50 emails per day, the control group's 61.3% placement rate translates to approximately 19 emails landing in spam daily, or about 580 over a 30-day month, compared with about 4 per day at the warmup group's 92.4%.
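The arithmetic behind the daily and monthly figures can be reproduced in a few lines. The function name `expected_spam_per_day` is illustrative, and the calculation assumes the day-30 placement rates hold constant across a 30-day month.

```python
def expected_spam_per_day(daily_volume: int, inbox_rate: float) -> float:
    """Expected number of emails per day that miss the inbox."""
    return daily_volume * (1.0 - inbox_rate)


DAILY_VOLUME = 50  # emails per day, as in the example above

# Day-30 inbox placement rates from the aggregate results table.
for label, rate in [("no warmup", 0.613), ("warmup", 0.924)]:
    per_day = expected_spam_per_day(DAILY_VOLUME, rate)
    print(f"{label}: {per_day:.2f} per day, {per_day * 30:.1f} per 30 days")
```

The no-warmup case works out to 19.35 emails per day (about 580 per 30 days), against 3.8 per day (114 per 30 days) with warmup.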
2. Gmail Requires the Most Warmup Attention
Gmail's 39.4 pp warmup benefit was roughly double Outlook's 19.4 pp benefit. Given that Google Workspace represents 41.2% of B2B email (per our market share study), Gmail deliverability should be the primary warmup success metric for most B2B senders.
3. Standard Ramp Speed Is Sufficient
We found no evidence that a more conservative ramp produces better outcomes than the standard 30-day protocol: the 1.4 pp difference between the two (93.8% vs 92.4%) was not statistically significant (p = 0.18). However, the aggressive ramp, which reached 150 emails/day by day 20, showed measurably worse results (88.1%). The standard protocol of reaching 50-100 emails/day by day 25-30 appears sufficient.
4. Warmup Benefits Persist
The 89.2% inbox placement at day 120 (3 months post-warmup) indicates that warmup creates durable reputation signals, at least over the period we monitored. This contradicts a common concern that warmup benefits are temporary and evaporate once "real" sending begins.
Limitations
- Warmup network quality: Our warmup protocol used a diverse network of real inboxes. Lower-quality warmup networks (e.g., those using mostly new or dormant accounts) may produce different results.
- Seed account measurement: Inbox placement was measured using seed accounts, which lack the engagement history of real recipient accounts. Actual inbox placement for real recipients may differ based on their individual spam filter training.
- Domain age: All domains were 30 days old at study start. Results may differ for domains with longer or shorter aging periods.
- IP reputation interaction: Shared IP pools mean that domain reputation and IP reputation interact. Our findings reflect the combined effect and cannot fully isolate domain-level warmup from IP-level reputation.
- .com TLD only: All domains used the .com TLD. Alternative TLDs (particularly newer TLDs like .io, .xyz, .tech) may have different baseline reputation levels.
- Email content: Test emails used standard B2B content. Results may differ for industries or content types that trigger additional content-based filtering.
Methodology Note
All emails were sent to seed accounts controlled by the research team or to inboxes within the warmup network (which participated with consent). No unsolicited emails were sent to third parties during the study period. Domain registration, DNS configuration, and email sending were performed using standard commercial tools. Statistical tests used two-proportion z-tests with Bonferroni correction for multiple comparisons. All reported p-values are adjusted. Effect sizes were calculated using Cohen's h. The study protocol was reviewed for compliance with CAN-SPAM and GDPR requirements prior to commencement. Raw data (anonymized domain identifiers with inbox placement measurements) is available upon request.
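For readers who want to reproduce the style of analysis named above, the following is a minimal sketch of a two-proportion z-test, a Bonferroni adjustment, and Cohen's h, applied to the day-30 aggregate rates. Two inputs are assumptions rather than reported values: the per-group sample size of 2,500 (250 domains times 10 test emails, treating each email as an independent observation, which ignores clustering within domains) and the number of comparisons (4, one per measurement point).

```python
import math


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: p1 == p2, using the pooled estimate."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


N_PER_GROUP = 2500  # assumption: 250 domains x 10 test emails per measurement point
N_COMPARISONS = 4   # assumption: one comparison per measurement point

z, p = two_proportion_z_test(0.924, N_PER_GROUP, 0.613, N_PER_GROUP)
p_adjusted = bonferroni(p, N_COMPARISONS)
h = cohens_h(0.924, 0.613)
print(f"z = {z:.2f}, adjusted p = {p_adjusted:.3g}, Cohen's h = {h:.2f}")
```

Under these assumptions the adjusted p-value falls far below 0.001, in line with the results tables. An analysis that accounts for per-domain clustering would produce wider uncertainty than this simplified version.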