In the fall of 2008, I examined 20 HTML-formatted E-mail messages. From this study, I concluded that the average bloat factor was 3.4 and that HTML-formatted messages contained an average of 7.4 HTML errors per KB.
Early in 2010, I decided to repeat this study for several reasons.
I again collected 20 HTML-formatted E-mail messages. This time, the average bloat factor was 4.6 (worse than in 2008), and HTML-formatted messages contained an average of 4.6 HTML errors per KB (lower than the ratio in 2008).
After two more years during which E-mail clients might have further evolved, I repeated the study with a fresh set of 20 HTML-formatted messages collected during January 2012. Unlike the 2010 study, however, I did not collect data on which E-mail clients were used. In 2010, half the messages failed to indicate any client; in 2012, that grew to three-fourths of the messages, which prevented any use of client identifications. The average bloat for HTML-formatted messages was 4.0 times the size of equivalent plain-text messages, with 5.0 HTML errors per KB.
I repeated the study with another 20 E-mail messages collected in June of 2015 to see what changes might have occurred in HTML-formatted E-mail in the prior three years. The average bloat factor for HTML-formatted messages increased to 10.0 times the size of the equivalent plain-text content, with 6.1 HTML errors per KB.
I repeated the study with another 20 E-mail messages collected in June of 2019 to see what changes might have occurred in HTML-formatted E-mail in the prior four years. The average bloat factor for HTML-formatted messages increased to 16.0 times the size of the equivalent plain-text content, with 7.3 HTML errors per KB.
In September 2021, I repeated the study, again with 20 HTML-formatted and multi-part (plain-text and HTML-formatted combined) messages, each from a different sender.
A significant source of bloat is the nesting of unnecessary <div>—</div> elements. I examined a message with a bloat factor of over 17. It had the following markup 19 times where one would have sufficed:
One problem that might not appear as an HTML error is excessive nesting of tables. I examined a message that had seven tables, each nested within another rather than standing separately. This is considered very poor HTML design.
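Nesting of this kind can be measured mechanically. The following is a minimal sketch, not the method used in these studies: it relies on Python's standard `html.parser` module, and the names `NestingDepth` and `max_nesting` are hypothetical. It assumes reasonably well-closed markup; unclosed tags would inflate the reported depth.

```python
from html.parser import HTMLParser


class NestingDepth(HTMLParser):
    """Tracks how deeply one kind of element is nested inside itself."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag          # element name to watch, e.g. "table" or "div"
        self.depth = 0          # current nesting level
        self.max_depth = 0      # deepest level seen so far

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == self.tag:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1


def max_nesting(html, tag="table"):
    """Return the deepest nesting level of `tag` found in `html`."""
    parser = NestingDepth(tag)
    parser.feed(html)
    parser.close()
    return parser.max_depth


# Illustrative markup only, not taken from any examined message.
sample = (
    "<table><tr><td>"
    "<table><tr><td>inner</td></tr></table>"
    "</td></tr></table>"
    "<table></table>"
)
print(max_nesting(sample))
```

Passing `tag="div"` applies the same check to nested `<div>` elements.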
I collected 20 messages that were either HTML-formatted or were two-part (containing both plain text and HTML). Making sure that no two messages were from the same source, I stored each message four times. I excluded any attachments (which, for HTML-formatted messages, means excluding any images or background). The first two of these were used in my analysis. The last two were saved in case I needed further information.
I recorded the sizes of the first two in a spreadsheet, the first file as Plain and the second as HTML. Dividing the total of the HTML sizes by the total of the Plain sizes gave me the average bloat factor. Because bloat is meaningful in the gross context over many messages (in terms of bandwidth impacts, disc space occupied, etc.), this average is based on the total size of all messages compared with the total size of the equivalent plain-text content. If I had averaged the individual bloat factors of each message, the result would have been 16.9, greater than the 16.0 reported under "Findings".
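The difference between the two ways of averaging can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the data are just messages 1 through 3 from the table, so the results do not reproduce the figures for the full set of 20.

```python
# (plain_size, html_size) in bytes for messages 1-3 of the table.
sizes = [(3339, 51501), (8159, 149673), (1457, 41767)]


def aggregate_bloat(sizes):
    """Total HTML size divided by total plain-text size."""
    total_plain = sum(plain for plain, _ in sizes)
    total_html = sum(html for _, html in sizes)
    return total_html / total_plain


def mean_of_bloat_factors(sizes):
    """Average of each message's own bloat factor."""
    return sum(html / plain for plain, html in sizes) / len(sizes)


print(f"aggregate bloat:       {aggregate_bloat(sizes):.1f}")
print(f"mean of bloat factors: {mean_of_bloat_factors(sizes):.1f}")
```

The aggregate form weights each message by its size, which suits questions of bandwidth and storage; the mean of individual factors gives a small message the same weight as a large one.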
Note that the Plain files might not contain all the content intended for the message. This would be caused by placing text within images, which would make the message incomplete for a blind person using an audio browser. For commercial messages, this would be a violation of the Americans with Disabilities Act.
For HTML errors, I used the W3C Markup Validation Service. I input the content of an HTML file. If the message's HTML included a <!DOCTYPE> declaration (which is required for Web pages), I selected the "Validate Full Document" option. Otherwise, I selected the "Validate HTML fragment" option, specifying "HTML 4.01 Transitional" (which is the least restrictive HTML 4.01 syntax), then repeated with "HTML5 (experimental)" and kept the lower of the two error counts. I recorded the number of errors on the same spreadsheet. Dividing the number of errors by the HTML size in bytes and multiplying by 1,000 gave me the number of HTML errors per KB of HTML for each message. Because the impact of HTML errors falls on individual messages, I then took the average of the individual ratios. If I had instead divided the total number of errors by the total of the HTML sizes, the result would have been 6.4 HTML errors per KB of message size, fewer than the 7.3 reported under "Findings".
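The error-rate arithmetic can be sketched the same way. The function names here are hypothetical, and the data are only messages 1 through 3 from the table, so the averages shown are for that subset rather than for the whole study.

```python
# (html_size_bytes, error_count) for messages 1-3 of the table.
rows = [(51501, 21), (149673, 20), (41767, 145)]


def errors_per_kb(html_size, errors):
    """Errors per KB for one message, taking 1 KB as 1,000 bytes."""
    return errors / html_size * 1000


def mean_errors_per_kb(rows):
    """Average of the per-message rates (each message counts equally)."""
    return sum(errors_per_kb(size, errs) for size, errs in rows) / len(rows)


def aggregate_errors_per_kb(rows):
    """Total errors over total HTML size (large messages dominate)."""
    total_size = sum(size for size, _ in rows)
    total_errors = sum(errs for _, errs in rows)
    return total_errors / total_size * 1000


for size, errs in rows:
    print(f"{size:>8} bytes, {errs:>4} errors: {errors_per_kb(size, errs):.1f} per KB")
print(f"mean of per-message rates: {mean_errors_per_kb(rows):.1f}")
print(f"aggregate rate:            {aggregate_errors_per_kb(rows):.1f}")
```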
NOTE WELL: In the chart below, two-part messages are indicated. The total size of a two-part message is equal to the sum of the plain-text and HTML-formatted sizes plus about 4-6 KB for the header section. Thus, two-part messages might appear to increase bloat. This was not considered during any of these studies, however, because part of that increase — the plain-text part — is actually valid text content. Nevertheless, two-part messages require more bandwidth to send and receive and more disc space to store than messages that are only HTML-formatted.
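As a rough illustration of that sum, the sketch below estimates the total size of a two-part message from its two parts. The function name is hypothetical, and it assumes that the "4-6 KB" header allowance means 4,000 to 6,000 bytes, consistent with the 1,000-byte KB used for the error rates.

```python
def two_part_size_range(plain_size, html_size, header_low=4000, header_high=6000):
    """Estimated (low, high) total size in bytes of a two-part message.

    The header bounds are an assumed reading of "about 4-6 KB".
    """
    body = plain_size + html_size
    return body + header_low, body + header_high


# Message 6 in the table is a two-part message: 2,122 plain and 35,399 HTML bytes.
low, high = two_part_size_range(2122, 35399)
print(f"estimated total: {low:,} to {high:,} bytes")
```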
Msg # | Plain Size (bytes) | HTML Size (bytes) | Bloat Factor | HTML Errors | Errors per KB | 2-Part Message |
---|---|---|---|---|---|---|
1 | 3,339 | 51,501 | 15.4 | 21 | 0.4 | |
2 | 8,159 | 149,673 | 18.3 | 20 | 0.1 | |
3 | 1,457 | 41,767 | 28.7 | 145 | 3.5 | |
4 | 2,215 | 9,233 | 4.2 | 277 | 30.0 | |
5 | 441 | 2,678 | 6.1 | 48 | 17.9 | |
6 | 2,122 | 35,399 | 16.7 | 561 | 15.8 | x |
7 | 324 | 1,976 | 6.1 | 23 | 11.6 | x |
8 | 3,306 | 52,671 | 15.9 | 829 | 15.7 | x |
9 | 768 | 10,210 | 13.3 | 38 | 3.7 | |
10 | 1,697 | 37,737 | 22.2 | 864 | 22.9 | |
11 | 919 | 15,998 | 17.4 | 57 | 3.6 | x |
12 | 3,575 | 45,933 | 12.8 | 495 | 10.8 | x |
13 | 977 | 3,004 | 3.1 | 6 | 2.0 | |
14 | 1,345 | 26,629 | 19.8 | 89 | 3.3 | x |
15 | 1,705 | 55,694 | 32.7 | 274 | 4.9 | x |
16 | 3,045 | 19,589 | 6.4 | 22 | 1.1 | x |
17 | 1,327 | 50,842 | 38.3 | 28 | 0.6 | |
18 | 5,646 | 56,783 | 10.1 | 212 | 3.7 | x |
19 | 241 | 1,533 | 6.4 | 73 | 47.6 | x |
20 | 4,195 | 54,494 | 13.0 | 69 | 1.3 | x |
New study 24 September 2021