Plain-Text vs HTML E-Mail

Sizes and Errors: 2025 Analysis

Copyright © 2025 by David E. Ross

Definitions

bloat
the increase in the size of an HTML-formatted message to convey the same textual content as a plain-text message
bloat factor
a measure of bloat, computed by dividing the size of the HTML-formatted message by the size of the plain-text message that has the same content. If a 3,000 byte HTML-formatted message has the same content as a 600 byte plain-text message, the bloat factor is 3,000 ÷ 600 = 5.0.
KB
kilobyte, 1,000 bytes

History

In 2025, I noticed a number of newsgroup messages about HTML formatting of E-mail. To see whether there had been any improvement in how this worked, I collected 20 HTML-formatted E-mail messages in late September and early October 2025, many of which were spam.

Findings

Conclusions


Methodology

I collected 20 messages that were either HTML-formatted or were two-part (containing both plain text and HTML), making sure that no two messages were from the same source. I excluded any attachments (which, for HTML-formatted messages, means excluding any images or background). I then stored each message four times, as listed below. The first two stored files were used in my analysis. The last two were saved in case I needed further information.

  1. I stored the readable content from the first line to the end of the message into a plain-text file.
  2. I stored the raw message (the source file) excluding any <x-html></x-html> tags and any <!Document> tags into an HTML file. I excluded those tags because not all HTML-formatted E-mail messages contain them, and I did not want comparison of the sizes of HTML markup to be skewed by such tags. This is especially important with <!Document> tags, which often have very extensive sets of attributes.
  3. I saved the complete source file, including the entire header section. In some cases, this was a two-part message, that is, a message that included both plain-text and HTML formatting.
  4. I stored the source file without the header section.
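Parts of steps 2 through 4 could be automated. The following is only a minimal sketch, assuming Python's standard email package; it is not necessarily how the files for this study were produced. The function names and the sample message are hypothetical, and the sketch removes only <x-html> tags, not <!Document> tags.

```python
import re
from email import message_from_bytes, policy
from typing import Optional


def html_part(raw: bytes) -> Optional[str]:
    """Return the HTML markup of a message (step 2), or None if there is none."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return None
    # Drop <x-html> wrapper tags so they do not count toward the HTML size.
    return re.sub(r"</?x-html>", "", part.get_content(), flags=re.IGNORECASE)


def is_two_part(raw: bytes) -> bool:
    """True if the message has both a plain-text and an HTML body (step 3)."""
    msg = message_from_bytes(raw, policy=policy.default)
    return (msg.get_body(preferencelist=("plain",)) is not None
            and msg.get_body(preferencelist=("html",)) is not None)


def strip_header(raw: bytes) -> bytes:
    """Return the source without its header section (step 4).

    The header section ends at the first blank line of the source.
    """
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    return parts[1] if len(parts) == 2 else b""


# Hypothetical two-part message, used only to exercise the functions above.
SAMPLE_MESSAGE = (
    b"From: sender@example.com\r\n"
    b"Subject: Test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="b1"\r\n'
    b"\r\n"
    b"--b1\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello\r\n"
    b"--b1\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html><body><p>Hello</p></body></html>\r\n"
    b"--b1--\r\n"
)

if __name__ == "__main__":
    markup = html_part(SAMPLE_MESSAGE)
    print("two-part:", is_two_part(SAMPLE_MESSAGE))
    print("HTML size in bytes:", len(markup.encode("utf-8")))
```

The readable content of step 1 is what a person sees when the message is displayed, so it is deliberately left out of this sketch; the plain-text part of a two-part message is not guaranteed to match it.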

I recorded the sizes of the first two files in a spreadsheet, the first file as Plain and the second as HTML. Dividing the total of the HTML sizes by the total of the Plain sizes gave me the average bloat factor. Because bloat is meaningful in the gross context over many messages (in terms of bandwidth impacts, disc space occupied, etc.), this average is based on the total size of all messages compared with the total size of the equivalent plain-text content. If I had averaged the individual bloat factors of each message, the result would have been 16.9, greater than the 16.0 reported under "Findings".
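The two ways of averaging can give noticeably different answers. The sketch below illustrates both, using only messages 1 through 3 from the raw data, so its results show the method and are not the study's figures. The function names are hypothetical.

```python
# (plain_size, html_size) in bytes for messages 1-3 of the raw data.
SAMPLE_SIZES = [(759, 104_395), (1_271, 62_190), (2_535, 27_610)]


def aggregate_bloat(sizes):
    """Total HTML bytes divided by total plain-text bytes."""
    total_plain = sum(plain for plain, _ in sizes)
    total_html = sum(html for _, html in sizes)
    return total_html / total_plain


def mean_of_bloat_factors(sizes):
    """Unweighted mean of the per-message bloat factors."""
    return sum(html / plain for plain, html in sizes) / len(sizes)


if __name__ == "__main__":
    print(f"aggregate bloat factor: {aggregate_bloat(SAMPLE_SIZES):.1f}")
    print(f"mean of bloat factors:  {mean_of_bloat_factors(SAMPLE_SIZES):.1f}")
```

The aggregate form weights each message by its size, which suits questions about bandwidth and storage. The unweighted mean lets one short message with very heavy markup dominate the result.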

Note that the Plain files might not contain all the content intended for the message. This would be caused by placing text within images, which would make the message incomplete for a blind person using an audio browser. For commercial messages, this would be a violation of the Americans with Disabilities Act.

For HTML errors, I used the W3C Markup Validation Service. For each message, I input the content of its HTML file and recorded the number of errors on the same spreadsheet. Dividing the number of errors by the HTML size and multiplying by 1,000 gave me the number of HTML errors per KB of HTML for each message. Because the impact of HTML errors falls on individual messages, I then took the average of the individual rates. If I had instead divided the total number of errors by the total of the HTML sizes, the result would have been 6.4 HTML errors per KB of message size, fewer than the 7.3 reported under "Findings".
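The error-rate arithmetic can be sketched the same way. This example again uses only messages 1 through 3 from the raw data, so its results illustrate the calculation and are not the study's figures. The function names are hypothetical, and a KB is taken as 1,000 bytes, as defined above.

```python
# (html_size_in_bytes, html_errors) for messages 1-3 of the raw data.
SAMPLE_ERRORS = [(104_395, 431), (62_190, 63), (27_610, 90)]


def errors_per_kb(html_bytes, errors):
    """HTML errors per 1,000 bytes of HTML for one message."""
    return errors / html_bytes * 1_000


def mean_error_rate(rows):
    """Average of the per-message rates (each message counts equally)."""
    return sum(errors_per_kb(size, errs) for size, errs in rows) / len(rows)


def pooled_error_rate(rows):
    """Total errors divided by total HTML size (large messages count more)."""
    total_bytes = sum(size for size, _ in rows)
    total_errors = sum(errs for _, errs in rows)
    return errors_per_kb(total_bytes, total_errors)


if __name__ == "__main__":
    print(f"mean of per-message rates: {mean_error_rate(SAMPLE_ERRORS):.1f}")
    print(f"pooled rate:               {pooled_error_rate(SAMPLE_ERRORS):.1f}")
```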

NOTE WELL: In the table below, two-part messages are marked with an X. The total size of a two-part message is equal to the sum of the plain-text and HTML-formatted sizes plus about 4-6 KB for the header section. Thus, two-part messages might appear to increase bloat. In this latest study, bloat considered only the size of the HTML formatting; the size of the plain-text portion of two-part messages was excluded. Nevertheless, two-part messages require more bandwidth to send and receive and more disc space to store than messages that are only HTML-formatted.
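As a rough illustration of the size of a two-part message, the sketch below adds the plain-text size, the HTML size, and a header allowance of 4 to 6 KB. It assumes that the plain-text part is about the same size as the Plain file, which might not hold for every message. The function name and its default values are hypothetical.

```python
def two_part_size_range(plain_bytes, html_bytes,
                        header_low=4_000, header_high=6_000):
    """Estimate the low and high total size in bytes of a two-part message.

    The header allowance of 4,000 to 6,000 bytes follows the "about 4-6 KB"
    estimate in the text; it is an approximation, not a measured value.
    """
    body = plain_bytes + html_bytes
    return body + header_low, body + header_high


if __name__ == "__main__":
    # Message 5 from the raw data: 4,096 bytes plain, 60,805 bytes HTML.
    low, high = two_part_size_range(4_096, 60_805)
    print(f"estimated total size: {low:,} to {high:,} bytes")
```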

Raw Data
(sizes in bytes)
Msg #  Plain Size  HTML Size  Bloat Factor  HTML Errors  Errors per KB  2-Part Message
1             759    104,395         137.5          431            4.1  X
2           1,271     62,190          48.9           63            1.0
3           2,535     27,610          10.9           90            3.3
4           1,745     22,651          13.0          814           35.9
5           4,096     60,805          14.8          647           10.6  X
6           2,028     39,396          19.4           62            1.6
7           2,236     36,328          16.2           65            1.8  X
8           3,373      9,739           2.9          193           19.8  X
9           1,254     21,252          16.9          123            5.8  X
10          3,317     61,405          18.5          164            2.7
11          3,671     90,697          24.7          415            4.6  X
12          2,032    107,591          52.9           94            0.9
13          2,793     31,423          11.3          131            4.2  X
14          6,226     91,218          14.7           54            0.6  X
15            496      3,430           6.9           17            5.0
16          4,228     13,897           3.3          290           20.9
17          9,821    185,343          18.9           11            0.1
18          3,670     73,496          20.0          185            2.5  X
19          2,486      6,827           2.7          111           16.3  X
20          1,212     39,417          32.5          172            4.4  X

New study 7 October 2025

