Return to Nick Wallingford's CV



Findings, Results and Interpretation

Outline of Section

5.1  Frequency and Size of Messages
5.1.1  Frequency of Messages
5.1.2  Annual and Monthly "Views" of Posting Frequency
5.1.3  Daily Activity Measures of Maximum and Minimum
5.1.4  Annual Profile Measure
5.1.5  Message Size and Archive Size
5.2  Contributors to the List Activity
5.2.1  Contributors Within Each Year
5.2.2  "Single Posting" Contributors to a List
5.2.3  "Primary Contributors" to a List
5.2.4  Average Number of Messages per Contributor
5.2.5  Contributors Accounting for Half of the Total Messages
5.2.6  Relating Contributors to Email Addresses
5.3  Other Header Information
5.4  Subject: Line
5.4.1  Length of Subject: Lines
5.4.2  "Blank" and Long Subject: Lines
5.4.3  Messages Responding to Other Messages
5.5  Date and Time Considerations for the List Postings
5.5.1  Hour of the day for each contribution
5.5.2  Messages by Day of the Week
5.5.3  Number of Messages by Month of the Year
5.6  Countries
5.7  Domains
5.7.1  "Nature" of Domains Originating Messages
5.7.2  BITNET
5.7.3  "Web-based" Addresses
5.7.4  Number of Domains Originating Messages
5.7.5  Domain Lengths
5.7.6  Levels in Domain Name
5.8  Quotations
5.8.1  Nature of Quotes Within Messages
5.8.2  Quoted Material to Total Message Length
5.9  The Impact of Moderation on List Activity
5.9.1  Reduced Number of Messages After Introduction of Moderation
5.9.2  Reduced Size of Message Archive After Introduction of Moderation
5.9.3  Elimination of HTML in Messages and Binary Attachments
5.9.4  Reduced Quotes After Introduction of Moderation
5.9.5  Change in Quoted Material Compared to Previous Year


5.1 Frequency and Size of Messages

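Two of the measures of “activity” that are immediately apparent to most users of electronic mail distribution lists are the frequency of postings (the number of messages received over a given period) and the size of the messages.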

Some lists continue for long periods of time with extremely high activity, generating more than 100 messages each day. Instructions for subscribing to such lists often do not warn the unsophisticated user that the sheer volume of messages may be overwhelming.

Other lists allow the distribution of attached binary files, such as digital photographs or graphics, along with the messages to the list. Some such attachments can put an immediate strain on the dial-up Internet connection of a user.

Lists vary dramatically in numbers of postings over periods of time. They vary not only when compared to other lists, but also internally, over time and occasionally by season, depending upon the nature of the list in question.

A “snapshot” approach, providing the number of messages per day over any given week of a list's activity, risks misrepresenting the number of messages that might be generated over a longer period. Describing a list as having had a given average number of messages per day over a month or a year can be accurate, but is less useful than a combination of measures of activity.

5.1.1 Frequency of Messages

The activity on the BEE-L list developed slowly. During the first several years, it was unusual to have a day with more than two or three messages to the list. Even then, the periods of activity tended to be somewhat clustered within several days of a month. The list had been in existence for more than three years before the first day on which 10 messages were posted. It was common through these first few years to have several consecutive days with no messages at all.

These posting behaviours resulted in averages of 1 to 5 messages per day until early 1994, when activity increased dramatically (see Table 4). While there were still months with reduced activity, the list entered a period of growth that would continue until early 1998.

5.1.2 Annual and Monthly “Views” of Posting Frequency

Viewing list activity on an annual basis provides an overview of the periods of increased and decreased activity of a list, and can be one useful measure of any list’s activity over time, as shown in Figure 1.

  Number of Messages on an Annual Basis

This view does, however, tend to mask monthly and daily variations that may seem at odds with such summarised results. In the case of BEE-L, the annual totals would seem to indicate steady growth up to the 1996/97 period. As can be seen, however, monthly and daily views of the same data show that the growth in message numbers was neither steady nor consistent within that period.

By 1998, there were few days with no postings at all; on most days there were 15 or more messages (Table 4). After this peak in activity, message numbers have fallen steadily in all but one of the subsequent years.

  Message Numbers on a Monthly Basis

Examining the same data by charting the number of messages on a monthly basis as in Figure 2 indicates a considerable amount of variability. Even in consecutive months, list activity on the BEE-L list was far from consistent in message numbers.

In practice, this would appear to the list members as “high” and “low” traffic periods when viewed over this monthly timeframe.

  Message Numbers on a Monthly Basis with Moving Average

Applying running (or moving) averages to the monthly data brings out the overall trends in message numbers more clearly. Given the variability of the data (in this case, the varying numbers of messages from one month to the next), the use of an average over the eight months before and after a given month provides a “smoothing effect” that reveals the underlying trends over the 12-year period, as indicated in Figure 3.

Even the extremes of high and low activity months in the early 1998 period are evened out with this approach to a definition for “frequency of postings”.
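The smoothing described above can be sketched as a centred moving average. This is a minimal illustration, not the procedure actually used to produce Figure 3: it assumes that each window contains the month itself plus up to eight months on either side (17 months in all), and that near the start and end of the archive the window simply shrinks to the months available. The function name is hypothetical.

```python
from typing import List, Sequence


def centred_moving_average(monthly_counts: Sequence[float], half_window: int = 8) -> List[float]:
    """Smooth monthly message counts with a centred moving average.

    Assumptions (not confirmed by the source): each window holds the month
    itself plus up to `half_window` months either side, and the window is
    truncated at the ends of the archive rather than padded.
    """
    n = len(monthly_counts)
    smoothed = []
    for i in range(n):
        start = max(0, i - half_window)
        stop = min(n, i + half_window + 1)  # slice end is exclusive
        window = monthly_counts[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical counts for five months, smoothed with a one-month half-window
# so the effect can be checked by hand.
print(centred_moving_average([1, 2, 3, 4, 5], half_window=1))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

A shrinking window keeps the smoothed series the same length as the raw one; the alternative of leaving the first and last eight months blank would avoid giving the end points a less-smoothed value.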

5.1.3 Daily Activity Measures of Maximum and Minimum

Monthly sampling can still appear to misrepresent the activity of a list, however, as large day-to-day activity ranges within a single month would appear quite different to a list member.

  Maximum, Average and Minimum Daily Messages Within Each Month

In the case of BEE-L, within the framework of the monthly average number of messages, there would still be individual days within each month with considerably higher or lower message numbers (Table 4 and Figure 4).

Days of monthly maximum numbers of postings, in particular, are of interest to a list member, as such peak loads create an incentive to either “bulk delete” messages without reading them, or even unsubscribe entirely. Days of minimal activity might well lead a new member to expect that to be the normal activity level for a newly-joined list, if the measures of average and recent maximum messages per day had not been considered. Figure 4 illustrates some of these variations (minimum, average and maximum daily values for each month) in activity.
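The minimum, average and maximum daily values for each month can be derived from the posting dates alone, as in the sketch below (the function name is hypothetical). One detail matters: days with no postings never appear in an archive, so every calendar day of the month has to be counted explicitly, or the minimum could never be zero.

```python
import calendar
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def daily_range_by_month(post_dates: Iterable[date]) -> Dict[Tuple[int, int], Tuple[int, float, int]]:
    """Return {(year, month): (min, mean, max)} of daily message counts.

    Every calendar day in a month is included, so days without postings
    count as zero. Months with no postings at all are not reported.
    """
    per_day = Counter(post_dates)
    months = sorted({(d.year, d.month) for d in per_day})
    stats = {}
    for year, month in months:
        days_in_month = calendar.monthrange(year, month)[1]
        counts = [per_day.get(date(year, month, day), 0)
                  for day in range(1, days_in_month + 1)]
        stats[(year, month)] = (min(counts), sum(counts) / days_in_month, max(counts))
    return stats


# Hypothetical example: three messages on 1 February 1990 and one on the 2nd.
sample = [date(1990, 2, 1)] * 3 + [date(1990, 2, 2)]
print(daily_range_by_month(sample))  # minimum 0, mean 4/28 (about 0.14), maximum 3
```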

From the list member’s point of view, list activity could best be appreciated by a combination of the various measures discussed. To describe a list’s frequency of posting by a simple measure such as “number of messages per month”, or “average number of messages per day” might be appropriate if the list activity is consistent over the time period. Some consideration, however, should be given to extremes of high and low activity during the given period of time.

5.1.4 Annual Profile Measure

One measure that was relatively easy to obtain and also easy to interpret for a new user was used to provide a “profile” of list activity for each of four years of the archived material (the first full year and the last three), shown in Table 1.

  Days with Given Number of Messages

The number of days with extremely large numbers of messages, in particular, would be perceived by many users as significant in a decision to remain a member of such a distribution list.

The provision of this set of measures, the percentage of days from one year of archive with given numbers of messages, provides an effective measure for a list member to appropriately consider “frequency of postings”, eliminating the variations inherent in strictly monthly or annual frequency measures.
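A profile of this kind can be produced by counting, for each day of an archive year, how many messages were posted and tallying the days into ranges. The bucket boundaries below are hypothetical; the ranges actually used in Table 1 may differ. As with any daily measure, days without postings must be counted explicitly.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical inclusive ranges of messages per day; None means "and above".
BUCKETS: List[Tuple[int, Optional[int]]] = [(0, 0), (1, 5), (6, 10), (11, 20), (21, None)]


def bucket_label(low: int, high: Optional[int]) -> str:
    if high is None:
        return f"{low}+"
    return str(low) if low == high else f"{low}-{high}"


def annual_profile(post_dates: Iterable[date], first_day: date, last_day: date,
                   buckets=BUCKETS) -> Dict[str, float]:
    """Percentage of days in [first_day, last_day] falling into each bucket."""
    per_day = Counter(post_dates)
    tallies = Counter()
    total_days = (last_day - first_day).days + 1
    day = first_day
    while day <= last_day:
        n = per_day.get(day, 0)  # days without postings count as zero
        for low, high in buckets:
            if n >= low and (high is None or n <= high):
                tallies[(low, high)] += 1
                break
        day += timedelta(days=1)
    return {bucket_label(low, high): 100 * tallies[(low, high)] / total_days
            for low, high in buckets}


# Hypothetical ten-day period: 3 messages on the 1st, 25 on the 2nd, none after.
sample = [date(2000, 7, 1)] * 3 + [date(2000, 7, 2)] * 25
print(annual_profile(sample, date(2000, 7, 1), date(2000, 7, 10)))
# {'0': 80.0, '1-5': 10.0, '6-10': 0.0, '11-20': 0.0, '21+': 10.0}
```

In practice `first_day` and `last_day` would span a full archive year.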

The list in 2000/01 has stabilised at an average of about 10 messages per day but, as can be seen, no single “measure of activity” is likely to account fully for the range of daily, monthly and annual posting behaviours on any given list. Measures that “smooth” the inherent variations are more likely than simple averages to describe list activity effectively, while some reference to the extremes, their size and their nature is necessary for reliable prediction.

5.1.5 Message Size and Archive Size

The size of individual messages, as well as the total size of all messages received over a period of time, can dramatically affect a list member's perceptions of the value of remaining on a list.

Long text messages tend to be skimmed, or even deleted unread, when time for email is limited. Some lists allow the posting of binary files (including graphics) and of messages containing Hypertext Markup Language (HTML) elements, both of which can make messages considerably longer than an average plain-text message.

From an administrative point of view, the size of a list's archive can have several important impacts, including:

For most lists, particularly those with a relatively large number of postings, the overall number of messages will be closely related to the size of the archive. This is because, across a large number of messages, the average message length is dominated by the relatively short messages typical of email communication generally, so that the archive grows roughly in proportion to the number of messages.

  Average Size of Each Message on an Annual Basis

The average size per message for the BEE-L list has remained relatively stable, almost always between 1.5kb and 2kb when examined on an annual basis (see Figure 5). The largest message in the archives has a message body of 6,693 lines, nearly 500kb in size. This message (in fact, a message with a binary attachment) was sent shortly before the decision was made to introduce moderation.

  Average Message Size on a Monthly Basis

As with message frequency, however, viewing the same data on a monthly basis reveals the variability within the data, as shown in Figure 6. Because relatively few messages were posted in the early years of the list (Table 4), an occasional longer message had a far greater effect on the monthly average than it would have had in a busier month. The value for January 1991 in Figure 6, for instance, has not been shown to scale because it is abnormally large.

  Total Message Size on an Annual Basis

A measure of annual size in kilobytes of the archive does provide some valuable information about the BEE-L list history, as well as providing some predictive indications for the maintainers of the archive, illustrated in Figure 7.

  Total Message Size (kb) on a Monthly Basis

As with annual message numbers, the BEE-L activity indicates a slow increase until 1994, with dramatic increases in size (sometimes doubling on an annual basis) until 1997/98 when moderation was imposed.

Since 1998, the list archives have continued to grow steadily at approximately 7Mb per annum. Given this annualised view of archive size, a list owner or host computer administrator could make reasonable plans for required storage for some future period of time.
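The kind of storage planning described above can be sketched as a straight-line projection. The sketch fits an ordinary least-squares line to cumulative archive size by year and extrapolates it; it assumes growth stays roughly linear, as the archive's growth since 1998 suggests, and the function names and figures in the example are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_growth(years: Sequence[float], cumulative_mb: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of cumulative archive size (MB) against year.

    Returns (slope, intercept); the slope is the estimated growth in MB per year.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(cumulative_mb) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, cumulative_mb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_size(years: Sequence[float], cumulative_mb: Sequence[float], target_year: float) -> float:
    """Extrapolate the fitted line to `target_year` (assumes linear growth continues)."""
    slope, intercept = fit_linear_growth(years, cumulative_mb)
    return slope * target_year + intercept


# Hypothetical archive growing by 7 MB a year, with years counted from 1.
print(project_size([1, 2, 3], [7.0, 14.0, 21.0], target_year=5))  # 35.0
```

Counting years from a small origin rather than using calendar years keeps the intercept small and avoids needless floating-point error.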

The monthly view of that data in Figure 8 highlights several months of extreme variation in size that are significant to a fuller understanding of list archive sizing.

Months such as September 1997 and April 1998, taken in isolation, show why the list owner might have been concerned about the longer-term storage needs of the list.

  List Size in Line Numbers on a Monthly Basis

The activity measure of “lines” as in Figure 9, rather than total number of kilobytes, would appear to be similar in overall trend. In the case of the BEE-L archives, there are a very few messages that were stored without “hard” carriage returns and line feeds, which would tend to make “lines per message” a less accurate measure of size than overall kilobytes of message. These messages were sent with the anticipation that the receiver’s email client program would provide the line breaks as required. In the context of the list archive, they appear as extremely long lines that extend well off the right side of the screen, and required special treatment for viewing when compared to the other text files in the archive.
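The difference between the two size measures can be shown directly: the same text stored with and without hard line breaks has a similar byte count but a very different line count. The sketch below computes both and flags "lines" long enough to suggest the sender relied on the reader's client to wrap the text. The 1,000-character threshold and the function name are hypothetical, and bytes are counted as UTF-8, which for plain ASCII archives equals the character count.

```python
from typing import Dict

# Hypothetical threshold: a line longer than this suggests text sent without
# hard line breaks, leaving wrapping to the reader's email client.
LONG_LINE_CHARS = 1000


def size_measures(body: str) -> Dict[str, object]:
    lines = body.splitlines()
    return {
        "bytes": len(body.encode("utf-8")),
        "lines": len(lines),
        "has_unwrapped_text": any(len(line) > LONG_LINE_CHARS for line in lines),
    }


unwrapped = "word " * 300               # 300 words on a single 1,500-character line
wrapped = ("word " * 15 + "\n") * 20    # the same 300 words on 20 lines
print(size_measures(unwrapped))  # {'bytes': 1500, 'lines': 1, 'has_unwrapped_text': True}
print(size_measures(wrapped))    # {'bytes': 1520, 'lines': 20, 'has_unwrapped_text': False}
```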

5.2 Contributors to the List Activity

Over the 12 years of the BEE-L archive, it appears that a total of 2,778 individuals have posted to the list (some individuals may have used different names, and some names may have been used by more than one individual). Reflecting levels of contribution and participation, however, requires a more complex set of measures. As with frequency of postings, a number of factors complicate any single description that attempts to quantify the participation levels of individuals.

As well, it must be remembered that “participation” in this context refers to individuals who have actually written a message to the list. Two other categories of participants that are not approachable by this methodology of archive examination include:

For that reason, people who write messages are described in this dissertation as “participating” in the list activities, but in fact the word “contributing” is perhaps more descriptive.

5.2.1 Contributors Within Each Year

While it does not provide a comprehensive indication of “contribution”, the number of individuals who post messages in a given time period can be seen as one valid measure of a list's activity. This is shown in Figure 10.

  Number of People Making Postings

In the case of BEE-L, the number of individuals posting one or more messages in each year follows the general pattern of the other measures of list activity discussed in the previous section – frequency and size of postings. That is, the numbers increased steadily from list inception to peak in the 1996/97 year, with a tailing off to the end of the 12 year period.

During the early years of the BEE-L list, there was a relatively small number of participants in the discussions, reflecting the number of people using the Internet (and BITNET) in the late 1980s and early 1990s. As list activity and frequency of postings grew, so did the number of contributors.

The activity of each contributor, however, might vary widely in any given time span. If the consideration is only for the number of people making one or more postings, no reference is made to the actual numbers of messages written to the list. Other measures can and should be developed and utilised to determine the nature of the list based on contribution levels.

That is, two lists with the same number of postings by the same total number of people over a period of time might in fact be demonstrably different when viewed using other metrics. The difference would be extreme, for instance, if the messages on one list were spread evenly among its contributors, while almost all of those on the second list were written by a single individual. A measure of “primary contributors” would help to capture the difference in such a situation.

5.2.2 “Single Posting” Contributors to a List

A significant measure of the nature of any given list would be the number of people who only ever make a single posting to the list, as shown in Figure 11. These single postings might be in the form of a question to the other members of the list, with no subsequent follow-up from the original writer, or perhaps a single comment on a particular issue the writer may feel strongly about.

  Percentage of Messages from “Single Message” Contributors on a Monthly Basis

The single postings would not appear to contribute to longer-term list nature or stability, perhaps being better described by the term Smith (2001) used for such contributions to USENET groups:

“…the number of people who contributed one and only one message to the newsgroup in the time period selected. Sometimes referred to as drive-by posters.”

Any list, or any given time span of a list, that has an extremely large percentage of single postings by contributors would lack the give-and-take one commonly encounters in distribution lists and other such activities. If a participant posts only a single message – never writing to question, reply to, respond to or further develop ideas from any other messages – the on-going development of ideas and clarification of points would be limited and lacking in substance and quantity.

On the other hand, the nature of Internet activity is such that there will generally be a significant number of such participants. Smith (1999) found that, in one ten-week study, 42% of writers contributed only a single message to USENET groups. The equivalent measures for the BEE-L archives are similar in recent years, with “single posters” making up an even larger percentage of all writers in earlier years.

Of the 2,778 people who have written messages over the 12 years of the BEE-L archives, 1,026 were “single posting” contributors – apparently 37% of people who wrote to the BEE-L list made only the one written contribution.
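The count of single-posting contributors follows directly from one contributor name per message, as in the sketch below. It assumes that names have already been matched to people (the hard part, discussed in 5.2.6); the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def single_posters(authors: Iterable[str]) -> Tuple[int, int, float]:
    """Return (contributors, single-message contributors, percentage).

    `authors` holds one already-normalised contributor name per message.
    """
    per_author = Counter(authors)
    singles = sum(1 for count in per_author.values() if count == 1)
    return len(per_author), singles, 100 * singles / len(per_author)


# Check of the percentage reported above: 1,026 single posters of 2,778 writers.
print(round(100 * 1026 / 2778, 1))  # 36.9
```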

5.2.3 “Primary Contributors” to a List

Many electronic mail distribution lists develop the equivalent of a “personality” of their own, a series of socially acceptable ways of discussing issues, responding to other participants, and any number of other idiosyncratic mannerisms. It is little wonder that new members are encouraged to “lurk” (read, but not post messages) for a period of time before becoming active contributors to a list – many lists exhibit low tolerance for those who deviate from the list norms, as described by North (1994).

A core group of participants of a list at any given time would appear to be responsible for such a list nature. This group of contributors, characterised by combinations of perceived wisdom, length of time on the list and by simple measures of numbers of individual contributions, will to a large extent determine the overall “nature” of a list at any given time.

The quantitative measures of numbers of messages posted over given periods of time can be used to demonstrate “importance” of a member, or at the very least, a perception of “self-importance” of an individual contributor.

The primary contributors to the BEE-L list have remained reasonably constant over time. Of the top ten contributors, measured by the total number of messages written (shown in Table 6), six appear to have remained active at similar levels over periods of five years or more, while one seems either to have reduced their activity in the last year or to have left the list. Only one is considered a top contributor on the basis of a very large number of contributions over only the last two years. Of the two who are no longer active, one has died and the other appears to have left the list after some pointed comments about the nature of the list’s moderation.

Rather than viewing the single measure of “total messages over total archive”, however, some other measures of individual activity will provide better methods of measuring participation levels.

  Messages from “Top 10” and “Top 11-50” Contributors of Each Year

One alternative would be to determine the number of messages written by the ten contributors with the greatest number of messages within each year of a list’s archive. To extend this concept, the next 40 most prolific writers over time have been included in the chart shown as Figure 12. In the case of the BEE-L list, this provides a measure of the relative influence on list activity of these two groups of people.

The percentage of the total number of messages in a year that were contributed by the group of ten individuals can be a valuable measure of the nature of any particular list. An inordinately large percentage would indicate something of a “closed shop” in a list's activities, with contributions from “outsiders” either being dissuaded or discouraged or even in some way suppressed in terms of participation.
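The shares of a year's messages written by the ten most prolific contributors and by the next 40 can be computed from one contributor name per message, as sketched below (the function name is hypothetical). Sorting the per-person counts rather than the names means ties at the boundary of the top ten do not affect the result.

```python
from collections import Counter
from typing import Iterable, Tuple


def top_group_shares(authors_in_year: Iterable[str], top: int = 10,
                     next_group: int = 40) -> Tuple[float, float]:
    """Percentage of a year's messages written by the `top` most prolific
    contributors, and by the `next_group` contributors ranked after them."""
    counts = sorted(Counter(authors_in_year).values(), reverse=True)
    total = sum(counts)
    top_share = 100 * sum(counts[:top]) / total
    next_share = 100 * sum(counts[top:top + next_group]) / total
    return top_share, next_share


# Hypothetical year: "a" wrote 5 messages, "b" 3 and "c" 2; groups of one each.
print(top_group_shares(["a"] * 5 + ["b"] * 3 + ["c"] * 2, top=1, next_group=1))  # (50.0, 30.0)
```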

5.2.4 Average Number of Messages per Contributor

The average number of messages from each contributor over a given period of time is another valid measure of list activity, buffering to some extent the measures related to “single posters” and the “top contributors” described above.

  Average Number of Messages per Contributor Within Each Year

Over the 12 years of the BEE-L list, the average number of messages per contributor is nearly 16. Given that contributors have participated for varying numbers of years, however, examining each year of the list’s archive separately gives a better measure, shown as Figure 13.

Analysed in this manner, it is determined that the average number of messages written by each contributor within any given year has varied from 3 messages up to a maximum of over 9 messages per writer per year.
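The per-year figures can be sketched as below, assuming each message has already been assigned an archive year and a contributor (the function name is hypothetical). The sketch also covers the count of distinct contributors per year discussed in 5.2.1. Because a person active in several years is counted once in each of those years, but only once over the whole archive, the archive-wide average (nearly 16) can be well above every yearly average.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def contributors_per_year(messages: Iterable[Tuple[int, str]]) -> Dict[int, Tuple[int, int, float]]:
    """Map year -> (messages, distinct contributors, messages per contributor).

    `messages` yields one (archive_year, contributor) pair per message.
    """
    by_year = defaultdict(list)
    for year, author in messages:
        by_year[year].append(author)
    result = {}
    for year in sorted(by_year):
        authors = by_year[year]
        people = len(set(authors))
        result[year] = (len(authors), people, len(authors) / people)
    return result


# Hypothetical data for two years.
print(contributors_per_year([(1990, "a"), (1990, "a"), (1990, "b"), (1991, "c")]))
# {1990: (3, 2, 1.5), 1991: (1, 1, 1.0)}
```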

Following the introduction of moderation in 1998, the average number of messages per writer dropped by nearly two for the following two years. In the last year of the list's archives, it has again risen to just over 8 messages per writer per year.

5.2.5 Contributors Accounting for Half of the Total Messages

On the face of it, the period from 1995 to 1998 would seem to have been characterised by each writer providing more messages in each year. The danger in this conclusion is that the overall average can be inflated by a small number of writers who may have provided an extremely large number of messages. A different, and perhaps more revealing, measure is the smallest percentage of a year's writers responsible for, for instance, one half of the total messages written, shown as Figure 14.

  Smallest Number of Contributors Needed to Represent 50% of Contributions

For each of the twelve years of the BEE-L archives, fewer than 50 individuals generated fully half of the total number of messages. Viewed in light of the number of individuals posting messages in these years, only a small percentage of the people writing messages – less than 10% for all but three years – accounted for half of the total number of messages (see Figure 15).

  Percentage of People Accounting for 50% of Postings

In the last few years of the list, the concentration of postings to a relatively small group of individuals has become even more pronounced.
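The smallest group accounting for half of a year's messages can be found greedily: rank contributors by message count and add them, most prolific first, until the running total reaches half. Taking the largest counts first guarantees that no smaller group could reach the same share. The function name and the example data below are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def contributors_for_share(authors_in_year: Iterable[str], share: float = 0.5) -> Tuple[int, float]:
    """Smallest number of contributors whose messages make up at least `share`
    of the year's total, and that number as a percentage of all contributors."""
    counts = sorted(Counter(authors_in_year).values(), reverse=True)
    total = sum(counts)
    running = 0
    for people, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return people, 100 * people / len(counts)
    raise ValueError("no messages supplied, or share is greater than 1")


# Hypothetical year: one prolific writer (6 messages) and three occasional ones.
print(contributors_for_share(["a"] * 6 + ["b"] * 2 + ["c", "d"]))  # (1, 25.0)
```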

5.2.6 Relating Contributors to Email Addresses

While the archive data were processed considerably to produce the statistics for each “contributor” to the list, contributions in fact come from particular electronic mail addresses. A total of 3,490 discrete mail addresses have been used by the 2,778 identifiable individuals who have written messages to the list, an average of about 1.26 addresses each. More telling, however, is an examination of the individuals who have used more than one address over time.

  Number of Email Addresses Used by People

Table 2 shows that address use is far from evenly distributed: most writers only ever used a single address for posting to the BEE-L list. Given that 1,026 of the 2,304 people using a single address made only a single posting, that is not entirely surprising. Altogether, 17% of the people who have posted to the list have used more than one email address. Excluding the single posters, 27% of people have used more than one email address during the period examined.

With many people using multiple email addresses on the Internet for reasons of privacy and portability, the ratio of email addresses to individuals might well increase in future. The ratios for the twelve years examined are shown in Figure 16. Early years have relatively high ratios because of the relatively small number of messages, combined with the change from BITNET to the Internet, changing host naming practices and the general flux that accompanied the early years of Internet access for most people, many of whom changed Internet Service Providers (ISPs) during those first years.

  Ratio of Email Addresses to People Posting in Each Year
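The percentages in 5.2.6 follow from the reported totals: 2,778 − 2,304 = 474 people used more than one address, and 474 / 2,778 ≈ 17%. Excluding the 1,026 single posters, who necessarily used one address each, leaves 1,752 people, and 474 / 1,752 ≈ 27%. The sketch below computes the same measures from (person, address) pairs, one per message. It assumes people have already been identified, and it lower-cases addresses so that case variants are not counted twice; both are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def address_usage(messages: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """`messages` yields one (person, email_address) pair per message."""
    addresses = defaultdict(set)
    message_counts = defaultdict(int)
    for person, address in messages:
        addresses[person].add(address.lower())  # assumption: ignore letter case
        message_counts[person] += 1
    people = len(addresses)
    multi = sum(1 for used in addresses.values() if len(used) > 1)
    repeat_posters = [p for p in addresses if message_counts[p] > 1]
    multi_repeat = sum(1 for p in repeat_posters if len(addresses[p]) > 1)
    return {
        "addresses_per_person": sum(len(used) for used in addresses.values()) / people,
        "pct_multiple_addresses": 100 * multi / people,
        "pct_multiple_excluding_single_posters":
            100 * multi_repeat / len(repeat_posters) if repeat_posters else 0.0,
    }


# Arithmetic check against the totals reported above.
print(round(100 * 474 / 2778), round(100 * 474 / 1752))  # 17 27
```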

5.3 Other Header Information

Accompanying each message in the BEE-L archives are other lines of “header” information. That is, each message is prefaced by a series of lines consisting of a keyword such as To:, From:, Subject: or cc: (for example), as described by Resnick (2001).

The headers are significant in the analysis of list activity not only for the disk space that they consume, but also for the particular information that they can contain (Table 5).

  Average Number of Lines in Headers

From a historical perspective, it can be seen that the overall length of the header component of messages in the BEE-L archives has grown over time (particularly in the three years 1996/97, 1997/98 and 1998/99), with an increasing number of messages with large numbers of header lines (see Figure 17 and Table 7).

  Percentage of Messages of Given Header Length - 1989/90 and 2000/01

Comparing data from the first and last years of the list archives, as in Figure 18, makes this particularly evident.

Header lines such as:



MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

have tended to increase the overall header length, but have not been considered particularly meaningful in the analyses carried out for this dissertation.

As well, cross-posting of messages to other lists or individuals generates additional header lines beyond the minimal number actually required to deliver a simple message to the BEE-L list.

An “average” message over the 12 years had 9.8 lines of header information, with a standard deviation of 2.2. One message to the BEE-L list had a 70 line header, the result of a message that was being simultaneously sent to quite a lengthy “carbon copy” list that was included in the message's headers.
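A header-length measure of this kind can be sketched by counting the lines before the first blank line of each raw message. The sketch counts folded continuation lines (those starting with whitespace) as separate physical lines and uses the population standard deviation; both are assumptions, since the source does not say how lines were counted or which deviation was reported. Function names are hypothetical.

```python
import statistics
from typing import List, Tuple


def header_line_count(raw_message: str) -> int:
    """Count physical header lines: every line before the first blank line."""
    count = 0
    for line in raw_message.splitlines():
        if line.strip() == "":
            break
        count += 1
    return count


def header_stats(raw_messages: List[str]) -> Tuple[float, float]:
    """Mean and population standard deviation of header lengths."""
    counts = [header_line_count(m) for m in raw_messages]
    return statistics.mean(counts), statistics.pstdev(counts)


# Hypothetical messages with 3- and 5-line headers.
msgs = ["From: a\nTo: b\nSubject: x\n\nbody",
        "From: a\nTo: b\nSubject: y\nCc: c\nDate: d\n\nbody"]
print(header_stats(msgs))  # (4, 1.0)
```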

5.4 Subject: Line

5.4.1 Length of Subject: Lines

An analysis of individual header lines can provide interesting commentary on the changing nature of a list as well. The Subject: line, in particular, can be used to determine with some degree of certainty the general nature of interchanges on a list.

Subject: lines should be used to effectively convey the subject matter of a message or, if replying to another message, indicate which other message caused the message to be written. As described by the International Federation of Library Associations and Institutions (1999):

“The subject line of an article enables people to decide whether or not to read your article. Tell people what the article is about before they read it. A title like “Car for Sale” does not help as much as “66 MG Midget for sale: Beaverton OR.” Don't expect people to read your article to find out what it's about, many won't bother. Some sites truncate the length of the Subject: line to forty characters, so keep your subjects short and to the point.”

5.4.2 “Blank” and Long Subject: Lines

Blank Subject: lines or extremely long Subject: lines may well be counter-productive in attracting readers.

  Number of Characters in Message Subject: Line

While only 1% of the messages over the twelve years have been posted without a Subject: line, it is interesting to see in Table 8 and Figure 19 that about 9% have had Subject: lines more than 40 characters long, presumably causing truncation in some mail readers.

  Percentage of Messages with Different Subject: Lines

Over the 12 years of the BEE-L archives, the percentage of messages with no subject at all has decreased to nearly nil (Figure 20) – it would be a rare message now that the list moderators would allow through without some Subject: line to describe the message content.
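The blank and long categories above can be assigned with a simple classifier, sketched below under the assumption that length is measured on the whole Subject: value (including any Re: prefix) after trimming surrounding spaces. The 40-character limit is taken from the guideline quoted in 5.4.1; the function name is hypothetical. Counting the results per year with `collections.Counter` gives percentages of the kind shown in Table 8.

```python
from typing import Optional

# Limit taken from the IFLA netiquette guideline quoted in 5.4.1.
TRUNCATION_LIMIT = 40


def classify_subject(subject: Optional[str]) -> str:
    """Classify a Subject: value as 'blank', 'long' (over the limit) or 'normal'."""
    if subject is None or subject.strip() == "":
        return "blank"
    if len(subject.strip()) > TRUNCATION_LIMIT:
        return "long"
    return "normal"


print([classify_subject(s) for s in [None, "Swarm in the garden", "x" * 41]])
# ['blank', 'normal', 'long']
```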

5.4.3 Messages Responding to Other Messages

While this dissertation does not attempt to analyse the content of Subject: lines, the use of the Re: convention within email programs does bear some commentary. A near-universal feature of electronic mail client software is that Re: is prepended to the existing Subject: line when the software's “reply” function is invoked.

Messages with one (or more) instances of Re: in the subject can with reasonable certainty be assumed to be replies to a previous message, rather than entirely new messages. Given that Subject: lines can be, and often are, changed manually, however, the figures would be expected to understate the number of messages that were direct replies to a previous message.

Many messages within the BEE-L archives contain structured header lines such as:



In-Reply-To:   <508.f5PF8OP29015@listserv.albany.edu>

that would seem to indicate the ability to quantitatively analyse the number of messages that were direct replies. It does not appear, however, that all mailers have adhered to the convention. Accordingly, the analysis was not carried out for this list.

Apart from the first few years of the list, more than half of all messages have Re: in the Subject: line as shown in Table 9 and Figure 20, clearly indicating for this list the nature of “on-going” topical discussion. A list consisting primarily of announcements, on the other hand, would be expected to have only a small number of reply-type messages.
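Detecting the Re: convention can be sketched with a regular expression. The pattern below is an assumption about how "instances of Re:" might be matched: any "re:" in any letter case that begins a word, so "Re: Re: Swarms" and "Fwd: RE: Swarms" count while "Winter care: feeding" does not. As noted above, such a count will understate the true number of replies. The names are hypothetical.

```python
import re
from typing import Optional

# Assumption: "re:" in any case, at the start of a word, marks a reply.
REPLY_MARKER = re.compile(r"\bre:", re.IGNORECASE)


def looks_like_reply(subject: Optional[str]) -> bool:
    return bool(subject) and REPLY_MARKER.search(subject) is not None


print([looks_like_reply(s) for s in ["Re: Swarms", "Winter care: feeding", None]])
# [True, False, False]
```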

5.5 Date and Time Considerations for the List Postings

5.5.1 Hour of the day for each contribution

The Date: header of each message in the BEE-L archive records the time that the message was posted, as registered by the mail server of the person originating the message. The generally accepted convention is that the time be reported as local to the server, but accompanied by an indication of either the time zone involved or, more commonly now, a correction factor (plus or minus) of hours ahead of or behind Greenwich Mean Time (GMT).

Such header lines would appear as:



Date:       Mon, 5 Feb 90 15:29:00 EDT
Date:       Sat, 3 Jan 1998 13:35:57 -0800
Date:       Fri, 2 Aug 1991 14:27:53 TUR
Date:       Thu, 5 Sep 1991 14:20:49 +0200

The time of posting, then, can be parsed, extracted and analysed to determine what time of the day messages have been generated with some degree of confidence.
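Python's standard `email.utils.parsedate_to_datetime` can do this parsing, and the sketch below wraps it (the wrapper name is hypothetical). For numeric offsets and the zone names it recognises, such as EDT, it returns a timezone-aware datetime; for unrecognised names such as "TUR" it returns a naive one. In both cases `.hour` and `.weekday()` give the sender's local clock time as written in the header, which is what an analysis of posting habits needs; converting to GMT would instead measure when messages arrived worldwide.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def local_posting_time(date_header: str) -> Optional[datetime]:
    """Parse the value of a Date: header, keeping the sender's local time.

    Returns None for headers that cannot be parsed.
    """
    try:
        return parsedate_to_datetime(date_header)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None


for header in ["Mon, 5 Feb 90 15:29:00 EDT",
               "Sat, 3 Jan 1998 13:35:57 -0800",
               "Fri, 2 Aug 1991 14:27:53 TUR",
               "Thu, 5 Sep 1991 14:20:49 +0200"]:
    posted = local_posting_time(header)
    print(posted.hour, posted.strftime("%A"))
# prints 15 Monday, 13 Saturday, 14 Friday and 14 Thursday on separate lines
```

Two-digit years such as "90" are expanded by the library (to 1990 here).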

Initial consideration of the nature of the list might lead to the assumption that most list contributors would post in the evening hours. That is, given that the list consists predominantly of hobbyist beekeepers, or of commercial beekeepers who might be expected to be working outside during daylight hours, evening use of the Internet might be expected. The data, however, do not bear this out (see Table 10 and Figure 21).

  Number of Messages by Hour of the Day

Rather, it would appear that the morning hours from 8am to 11am are significantly favoured as a time period in which to write the emails to the BEE-L list. One explanation might be that, given the hobbyist orientation of many of the list contributors, the list is being received and written to at places of employment, rather than home. As discussed in the next section, this conclusion would concur with that of Smith (1999).

5.5.2 Messages by Day of the Week

Related to the time-of-day analysis above is the day of the week on which postings are made, as shown in Table 11 and Figure 22.

  All Messages by Day of the Week

Considered together with the hour-of-day analysis above, these figures provide further evidence that members of the BEE-L list may well be participating from work addresses. This supports the statement of Smith (1999):

“... the Usenet has a weekly cycle of activity that builds during the workweek and falls off over the weekends, suggesting that many people access the Usenet from their workplaces. This may challenge the belief that recent network growth has been predominantly driven by home consumer use.”

The percentage of postings made on weekdays is very consistent, at about 16% of all messages on each day (Table 11). On the two weekend days that percentage falls to 9% and 10%. One might have expected a substantial share of the activity to be concentrated at the weekend, but that does not appear to be the case.
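The weekday shares in Table 11 can be derived from the same Date: values. The sketch below is illustrative only: it computes the weekday from the calendar date as written rather than trusting the day-name field, because `parsedate_tz` does not return a real weekday (its seventh element is always 0). Skipping damaged headers is an assumption of the sketch.

```python
import datetime
from collections import Counter
from email.utils import parsedate_tz
from typing import Dict, Iterable

# Index matches datetime.date.weekday(): Monday == 0 ... Sunday == 6.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def weekday_percentages(date_values: Iterable[str]) -> Dict[str, float]:
    """Percentage of parseable messages posted on each day of the week,
    using the sender's local calendar date from the Date: header."""
    counts: Counter = Counter()
    for value in date_values:
        parsed = parsedate_tz(value)
        if parsed is None:
            continue
        year, month, day = parsed[:3]
        try:
            weekday = datetime.date(year, month, day).weekday()
        except ValueError:  # impossible calendar date in a damaged header
            continue
        counts[weekday] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * counts[i] / total for i, name in enumerate(DAY_NAMES)}


headers = [
    "Mon, 5 Feb 90 15:29:00 EDT",
    "Sat, 3 Jan 1998 13:35:57 -0800",
    "Fri, 2 Aug 1991 14:27:53 TUR",
    "Thu, 5 Sep 1991 14:20:49 +0200",
]
print(weekday_percentages(headers)["Monday"])  # 25.0
```

An even spread across the week would give 100/7, or about 14.3%, per day. That is the baseline against which the weekday figure of about 16% and the weekend figures of 9% and 10% can be read.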

  Percentage of Messages by Day of the Week - 1989/90 and 2000/01

The effect is even more marked when only two years of list activity are compared, the first (1989/90) and the most recent (2000/01), as shown in Figure 23. Given the limited availability of the Internet in 1989, it is not at all surprising to see an even stronger inclination towards weekday postings in that year: home access to the Internet was uncommon then, and postings would reflect the use of employers' email facilities.

Even in 2000/01, though, each of the two weekend days accounts for only about 10% of postings, below the roughly 14.3% per day that an even spread across the week would give. The differences from 1989/90 can be seen to reflect the growth of connectivity and Internet use in the home.

5.5.3 Number of Messages by Month of the Year

For a variety of reasons, some email distribution lists can be expected to exhibit seasonal cycles of activity. For BEE-L, the natural comparison is with the cycle of beekeeping activity (Tables 12 and 13).

  Percentage of Messages (Number and Size) by Month

List activity does, in fact, show a seasonal cycle that is consistent through the years, summarised in Figure 24, and this cycle is readily described in terms of the nature and quantity of beekeeping work in each season. An underlying assumption of this analysis, however, is that the list's membership and participation are predominantly from the Northern Hemisphere.

The beekeeping year is characterised by:

Both the monthly number of messages and their total size follow a similar cycle in almost every year of the archive (Tables 12 and 13).
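Tables 12 and 13 report each month's share of messages by number and by size. The following is a minimal sketch of that computation. It assumes each message has already been reduced to a (month, size in bytes) pair, and that record layout is an illustrative assumption. Running it on one year's records at a time gives the per-year cycles.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def monthly_shares(
    messages: Iterable[Tuple[int, int]],
) -> Dict[int, Tuple[float, float]]:
    """Map month (1-12) to (% of all messages, % of total bytes).

    `messages` is an iterable of (month, size_in_bytes) pairs; months
    with no messages are reported as (0.0, 0.0).
    """
    count_by_month: Dict[int, int] = defaultdict(int)
    bytes_by_month: Dict[int, int] = defaultdict(int)
    for month, size in messages:
        count_by_month[month] += 1
        bytes_by_month[month] += size
    total_count = sum(count_by_month.values())
    total_bytes = sum(bytes_by_month.values())
    shares: Dict[int, Tuple[float, float]] = {}
    for month in range(1, 13):
        pct_count = 100.0 * count_by_month[month] / total_count if total_count else 0.0
        pct_bytes = 100.0 * bytes_by_month[month] / total_bytes if total_bytes else 0.0
        shares[month] = (pct_count, pct_bytes)
    return shares
```

Comparing the two percentages month by month shows whether the busier months also carried proportionally larger messages.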

Viewed over all twelve years, list activity reflects the seasonal level of interest of beekeepers. It could be argued, however, that busy commercial beekeepers would not post to such a seasonal pattern. They might instead post in the reverse pattern, becoming more active on the list when the work “in the field” is less onerous. Given the preponderance of hobbyist beekeepers on the BEE-L list, however, it is not surprising that activity here reflects the seasonal nature of the list's subject.

5.6 Countries

The country of origin of a message can readily be obtained for many of the email addresses used to post to the BEE-L list (Table 14), but the nature of Internet addressing in general precludes completeness and absolute certainty.

  Number of Messages by Country/Generic Domain

Addresses that include a country code top level domain (ccTLD) such as .nz or .au can reasonably be assumed to originate from that country. Even then there can be no absolute confidence: the address may simply belong to a company that has registered a domain in that country, regardless of the physical location of the message writer.

For addresses in the BEE-L archives with a ccTLD, the list of country codes maintained by Crepin-Leblond (1994) was used to translate each domain code to a country name, with the results summarised in Figure 25.
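The translation step can be sketched as follows. The small code table is a hypothetical excerpt standing in for the full Crepin-Leblond (1994) list. The classification rules are also assumptions for illustration: a two-letter final label is treated as a ccTLD, the seven gTLDs described below are grouped separately, and BITNET addresses form their own category.

```python
from typing import Dict

# Hypothetical excerpt; the study used the full Crepin-Leblond (1994) list.
COUNTRY_CODES: Dict[str, str] = {
    "au": "Australia",
    "ca": "Canada",
    "de": "Germany",
    "nz": "New Zealand",
    "uk": "United Kingdom",
}
GENERIC_TLDS = {"com", "edu", "gov", "mil", "net", "org", "int"}


def origin_of(address: str) -> str:
    """Classify an address by its top level domain.

    Returns a country name for a listed ccTLD, "gTLD .xxx" for a generic
    TLD, "BITNET" for BITNET-routed mail, or "unknown" otherwise.
    """
    domain = address.rpartition("@")[2].strip().lower().rstrip(".")
    tld = domain.rpartition(".")[2]
    if tld == "bitnet":
        return "BITNET"
    if tld in GENERIC_TLDS:
        return "gTLD ." + tld
    if len(tld) == 2:
        return COUNTRY_CODES.get(tld, "unlisted ccTLD ." + tld)
    return "unknown"
```

Keeping the gTLD and BITNET categories separate, rather than forcing them into a country, mirrors the caution expressed above about addresses whose location cannot be inferred.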

More problematic within the data, however, are the generic top level domains (gTLDs), identified and described by Galperin and Gordin (1995):



Domain Name     Meaning
COM             Commercial organizations
EDU             Educational institutions
GOV             Government institutions
MIL             Military groups
NET             Major network support centers
ORG             Organizations other than those above
INT             International organizations

Zook (2001a) also describes these domains as “CONEs”, an acronym for the four primary gTLDs (.com, .org, .net and .edu), excluding .gov and the less frequently used .int and .mil. Given their nature and original allocation processes, it can be assumed that many of these addresses do, in fact, originate from the United States or Canada. Addresses ending in .edu, for instance, almost certainly belong to American educational institutions.

Addresses ending in .com, however, are of little use in an analysis by country. Zook (2000) attempted to extend the determination of country of origin to .com domains by referring to the recorded address of the person or organisation that registered each domain, and found that 66.9% of .com domains were highly likely to be located in the United States. He also noted that the growth rate of gTLDs was faster in 23 of the 25 countries with the largest numbers of domains registered.

Given that nearly one third of all messages to BEE-L originated from a .com address, and nearly two thirds from gTLDs as a whole, any analysis by country can only be preliminary and incomplete. Even with that caveat, messages were found from email addresses in 69 distinct country domains over the 12 years of the list archives, charted in Figure 26 by year of first posting to the list.
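Counting countries by year of first posting only requires remembering the first year in which each country appears. The sketch below also counts the distinct countries active in each year. It assumes messages are supplied in chronological order as (list year, country) pairs, with gTLD and other non-country origins already filtered out. The record layout is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def country_activity(
    messages: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (new countries per year, distinct countries per year).

    `messages` holds (year_label, country) pairs in chronological order;
    a country counts as "new" only in the year of its first message.
    """
    first_year: Dict[str, str] = {}
    seen_in_year: Dict[str, Set[str]] = {}
    for year, country in messages:
        first_year.setdefault(country, year)  # keeps the earliest year only
        seen_in_year.setdefault(year, set()).add(country)
    new_per_year = Counter(first_year.values())
    distinct_per_year = {y: len(c) for y, c in seen_in_year.items()}
    return dict(new_per_year), distinct_per_year
```

The values in the first dictionary sum to the total number of distinct countries seen, which for the BEE-L archive is the 69 reported above. Years with no new countries simply do not appear in it.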

  Number of New Countries Posting

As with Internet uptake generally, the BEE-L message archives reveal a steady increase through the early 1990s in the number of new countries from which messages were being received (see Figure 27), with Internet access available, though perhaps not widely, in many countries by the middle 1990s. The increase in 1995/96 reflects the significant expansion of Internet access that occurred about that time in many countries and, consequently, for many beekeepers, particularly those who did not already have access through education or government employment.

  Number of Countries Posting on Annual Basis

By 1997/98, the number peaked at nearly 50 countries represented in BEE-L messages (Figure 27).

  Number of Countries Contributing Messages on Monthly Basis

Viewing the same data on a monthly basis in Figure 28 indicates that participation levels by country have paralleled the overall list activity in such measures as number of messages and number of contributors.

In the last year of the archive, the number of countries fell to only 30. Some list members perceive this as reflecting the increasingly “North American” orientation of the list. Even though the list is open to membership and contribution from any country, the nature of the list discussions appears to have constrained some foreign contribution in recent years.

5.7 Domains

The domain of origin of each posting to the list is contained in the sender's email address: it is the portion to the right of the @ symbol.

As described in the commentary on the number of countries represented in the BEE-L distribution list, the Top Level Domain (TLD) will be either a generic TLD (gTLD) or a country code TLD (ccTLD).

5.7.1 “Nature” of Domains Originating Messages

Both Morris (2000) and Dick (2001a) described how the BEE-L list began as a discussion group for such people as beekeeping researchers, scientists, regulators and educators. Given who had access to electronic mail at that time, this is not unexpected.

The changing nature of the list participants can, to some extent, be quantified by examining elements of the domains in their email addresses. Emails originating from the gTLD .edu, for instance, can immediately be categorised as coming from someone with an email account at some form of educational institution. Domains under a ccTLD, however, use at least two conventions to indicate “educational”. In some countries, such as New Zealand, the second-level label is .ac, while in others, such as Australia, .edu is used as in the US, but in the second position of the domain name.

Addresses such as:



jsmith@tenet.edu
jsmith@monash.edu.au and 
jsmith@unitec.ac.nz

are all representative of the range of addresses related to education.

Similarly, the .gov gTLD, used in the US for the registration of government organisations, appears at the second level of country code addresses as either .gov or .govt, depending on the domain registration decisions of the country concerned.
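These conventions can be expressed as a short classifier. The following is a sketch only: the second-level labels are the ones named in the text (.ac, .edu, .gov, .govt), while treating every other domain as "other" and recognising a ccTLD purely by its two-letter length are simplifying assumptions.

```python
EDU_SECOND_LEVEL = {"ac", "edu"}
GOV_SECOND_LEVEL = {"gov", "govt"}


def sector(domain: str) -> str:
    """Classify a sender domain as "education", "government" or "other".

    gTLD form:  tenet.edu
    ccTLD form: monash.edu.au, unitec.ac.nz, govt.nz
    """
    labels = domain.lower().rstrip(".").split(".")
    tld = labels[-1]
    if tld == "edu":
        return "education"
    if tld == "gov":
        return "government"
    if len(tld) == 2 and len(labels) >= 2:  # assumed ccTLD
        second = labels[-2]
        if second in EDU_SECOND_LEVEL:
            return "education"
        if second in GOV_SECOND_LEVEL:
            return "government"
    return "other"
```

Applied to the sender domain of each message and tallied per month, a rule of this kind yields percentages of the sort plotted in Figure 29.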

  Percentage of Messages from Government and Educational Domains on a Monthly Basis

Applying these rules to the addresses of messages to the BEE-L list over time supports the view of Dick (2001a) that, over the 12 years of list activity, the origin of messages has shifted from research and governmental organisations to “ordinary” beekeepers (see Table 15 and Figure 29).

  Percentage of Messages from .com Domains

In the context of this analysis, the increasing use of the .com gTLD shown in Figure 30 can be seen as reflecting the overall uptake of the Internet not only by commercial organisations but also by Internet Service Providers (ISPs), for which .com has become the TLD of choice in many parts of the world, as described by Zook (2000), even in preference to the ccTLD of the country in which the organisation or ISP is based.

5.7.2 BITNET

BITNET (“Because It's Time NETWORK”) was a precursor to the Internet which, beginning in 1981, provided connectivity primarily for educational and research institutions in the United States and Europe, as described by Barnette (1994). BITNET “nodes” (computers connected to the network) were identified using an eight-character naming system, but gateways were ultimately created to move electronic mail between BITNET and Internet-addressed computers and accounts.

In the early days of BEE-L, as many as 90% of the messages originated from BITNET addresses, reflecting the origins of the list (and electronic mail communications in general) within the educational and scientific communities. The reduction over time is shown in Table 16 and Figure 31.

  Percentage of Messages Originating from BITNET

BITNET addressing persisted in the list headers, with decreasing frequency, until as late as 1996. When used, it was often being relayed (gatewayed) directly to an Internet address, as in the final appearance of BITNET in the archives, in November 1996:



IBRA%cardiff.ac.uk@ukacrl.BITNET

This address format indicated a message from the Internet address of the International Bee Research Association (IBRA), sent via a United Kingdom BITNET node. Ironically, the message would then have been routed back to the Internet, by then the predominant networking system in use.

Because BITNET addressing lacked the structured domain levels of Internet addressing, a BITNET address did not initially reveal the source and nature of a message's originator. As the Internet gained predominance, however, most institutions moved to Internet addressing, and “translation tables” from BITNET nodes to equivalent Internet addresses, such as that provided by Anonymous (1993), enabled the writer to determine the country and nature of the originators of messages.
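Two BITNET-era address forms from this section can be handled mechanically. The first is the relayed form shown above, where the Internet address is embedded before the gateway using a % sign. The second is a bare BITNET node, which must be looked up in a translation table. The sketch below assumes a single relay hop. Its translation table is a hypothetical placeholder, not an entry from Anonymous (1993).

```python
from typing import Dict, Optional

# Hypothetical placeholder entry; a real table (e.g. Anonymous, 1993)
# would map each BITNET node name to its Internet domain.
NODE_TABLE: Dict[str, str] = {"EXAMPLEVM": "example.edu"}


def internet_equivalent(address: str) -> Optional[str]:
    """Best-effort Internet form of a BITNET-era address, or None.

    user%host@gateway.BITNET      -> user@host  (relayed form, one hop)
    user@NODE or user@NODE.BITNET -> user@<domain from NODE_TABLE>
    """
    local, _, domain = address.rpartition("@")
    if not local:
        return None  # no @ sign at all
    if "%" in local:
        user, _, host = local.rpartition("%")
        return f"{user}@{host}"
    node = domain.upper()
    if node.endswith(".BITNET"):
        node = node[: -len(".BITNET")]
    mapped = NODE_TABLE.get(node)
    return f"{local}@{mapped}" if mapped else None
```

Used on the November 1996 example, the function recovers the IBRA address at cardiff.ac.uk, which can then be classified by country and sector like any other Internet address.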

5.7.3 “Web-based” Addresses

While most messages can still be identified as coming from a “traditional” Internet Service Provider (ISP) email account, there is a trend toward the use of “web-based” email addresses.

These accounts are widely available on the Internet, providing users with a number of identifiable advantages:

Three domains providing such web-based email services, Hotmail, Yahoo and Juno, were identified and examined in the context of messages to the BEE-L list, with the results displayed in Table 17 and Figure 32.

  Percentage of Messages Originating from Yahoo, Hotmail and Juno

Although they first appeared in the archives only in June 1996, these three domains now account for 3.8% of all messages, and in the 2000/01 year they accounted for almost 5% of all messages posted.
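The share of such messages is a simple ratio per year. The following sketch assumes each message is represented as a (year, sender domain) pair. It also assumes the three services are matched by the domains hotmail.com, yahoo.com and juno.com plus their subdomains. The study's exact matching rules are not stated, so regional variants would need to be added to the set if they were counted.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed matching set; extend with regional variants if required.
WEB_MAIL_DOMAINS = ("hotmail.com", "yahoo.com", "juno.com")


def is_web_mail(domain: str) -> bool:
    """True if the domain is, or is a subdomain of, a listed service."""
    d = domain.lower().rstrip(".")
    return any(d == w or d.endswith("." + w) for w in WEB_MAIL_DOMAINS)


def web_mail_share(messages: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of each year's messages sent from web-based services."""
    total: Counter = Counter()
    web: Counter = Counter()
    for year, domain in messages:
        total[year] += 1
        if is_web_mail(domain):
            web[year] += 1
    return {year: 100.0 * web[year] / total[year] for year in total}
```

Matching on the whole domain or a dot-separated suffix, rather than a plain substring test, avoids counting an unrelated domain that merely ends in the same letters.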

The use of these addresses can be expected to continue and increase in prominence as a proportion of all email addresses in use, given their perceived advantages listed above.

5.7.4 Number of Domains Originating Messages

Over time, the number of unique domains originating messages to the BEE-L list (shown as Figure 33) appears to parallel the overall number of messages posted to the list.

  Number of Domains Originating Messages

Bearing in mind the number of “single message” contributors to the list (people who only ever posted one message), this is not a surprising finding. The domains originating the most messages are identified in Table 18.
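Counting unique domains per year and ranking domains by message count are both direct tallies. The following is a sketch, assuming (year, sender domain) pairs with domains already lower-cased. The record layout and the default of ten top domains are illustrative choices.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def domain_summary(
    messages: Iterable[Tuple[str, str]], top_n: int = 10
) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Return (unique domains per year, top_n domains by message count)."""
    domains_by_year: Dict[str, Set[str]] = {}
    per_domain: Counter = Counter()
    for year, domain in messages:
        domains_by_year.setdefault(year, set()).add(domain)
        per_domain[domain] += 1
    unique_per_year = {y: len(d) for y, d in domains_by_year.items()}
    return unique_per_year, per_domain.most_common(top_n)
```

A contributor who posts only once still adds a domain to that year's set. Years with many such contributors therefore raise the unique-domain count along with the message count, which is consistent with the parallel noted above.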

5.7.5 Domain Lengths

Zook (2001b) has provided an analysis of the lengths of a sample of gTLD domain names:

“Although com/net/org domain names can be up to 63 characters long the median length of all gTLD domain names is eleven characters. The median length for net and org domain names is ten characters.”

  Average Length of Originating Domains by Year

For the messages to the BEE-L list originating from these domains, the equivalent averages are:

21,230 .com/.net/.org addresses: average length 9.6 characters
8,846 .net/.org addresses: average length 10.2 characters

Average annual figures are shown in Figure 34.

  Domain Lengths of Messages

The relatively large number of messages originating from domains of six characters in Table 19 and Figure 35 can be explained by examining the domains of greatest representation. The aol.com domain alone (see Table 18) has originated more than 10% of all BEE-L messages over the 12 years, and accounts for two thirds of the total number of six character domain messages.
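How domain length is counted matters when comparing with Zook (2001b). The text treats aol.com as a six-character domain, which fits a count of letters and digits with the dot excluded. The sketch below adopts that reading as an assumption and computes the mean separately for .com/.net/.org and for .net/.org senders. Note that these are means, whereas Zook reports medians.

```python
from statistics import mean
from typing import Dict, Iterable, List


def domain_length(domain: str) -> int:
    """Characters in the domain, excluding dots (assumed convention,
    under which aol.com counts as six characters)."""
    return len(domain.replace(".", ""))


def average_lengths(domains: Iterable[str]) -> Dict[str, float]:
    """Mean domain length for .com/.net/.org and for .net/.org senders."""
    com_net_org: List[int] = []
    net_org: List[int] = []
    for d in (x.lower().rstrip(".") for x in domains):
        tld = d.rpartition(".")[2]
        if tld in ("com", "net", "org"):
            com_net_org.append(domain_length(d))
            if tld != "com":
                net_org.append(domain_length(d))
    return {
        "com/net/org": mean(com_net_org) if com_net_org else 0.0,
        "net/org": mean(net_org) if net_org else 0.0,
    }
```

If the study's actual convention differed, for example by counting the dot or only the second-level label, only `domain_length` would need to change.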

5.7.6 Levels in Domain Name

For organisations with a large number of users, further levels can be added to a domain name, so long as the new names remain within the authority already delegated for an existing name.

The owner of the domain beekeeping.co.nz, for instance, can (subject only to the requirements of the Domain Name System (DNS) servers for that domain) create hosts such as:



hopkins.beekeeping.co.nz
bumby.beekeeping.co.nz
cotton.beekeeping.co.nz

So long as the domain owner configures the mail transport agent (MTA) to deliver mail to the correct user, such multi-level domains are fully accepted by Internet mail transport systems.

  Number of Levels in Domain Names Originating Messages

A domain consisting of only a TLD and one further level, then, would generally indicate a host allocated directly under a gTLD. Email from a domain within a ccTLD such as .nz or .au would typically have a minimum of three domain levels (apart from a very few exceptions such as the domain .govt.nz). Addresses with more levels indicate some degree of user-designated delegation of mail hosting, as described above. Summarised results of this analysis are shown in Figure 36.

Early postings to the BEE-L list were often from a “one level domain”, as with the BITNET addressing previously referred to. This consists of a simple account@host address such as jsmith@albnyvm1.
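Counting domain levels amounts to counting the dot-separated labels after the @ sign. The following is a sketch using example addresses from this section. Treating a trailing .BITNET as a routing pseudo-domain, so that BITNET nodes count as one level as the text describes, is an assumption of the sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def domain_levels(address: str) -> int:
    """Number of dot-separated labels in the domain part of an address.

    A trailing ".BITNET" is treated as a routing suffix, not a level.
    """
    domain = address.rpartition("@")[2].rstrip(".")
    if domain.lower().endswith(".bitnet"):
        domain = domain[: -len(".bitnet")]
    return len([label for label in domain.split(".") if label])


def level_percentages(addresses: Iterable[str]) -> Dict[int, float]:
    """Percentage of messages from domains with each number of levels."""
    counts = Counter(domain_levels(a) for a in addresses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: 100.0 * n / total for level, n in sorted(counts.items())}
```

Tallying the result separately for each year gives percentages of the kind shown in Table 20 and Figure 37.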

  Percentages of Domain Levels for Each Year

The early postings, including those from BITNET addresses, originated predominantly from one- and two-level domains (see Table 20 and Figure 37). Through the middle 1990s, three-level domains reached their maximum representation, accounting for approximately 30% of messages in each of three consecutive years. This can be understood, to some extent, as reflecting an approach commonly used by network managers of that time: the use of “highly qualified” addresses, with a specific mail host identified as part of the email address, was seen as the most expedient means of ensuring mail delivery.

The resurgence of two-level domains through the late 1990s, stabilising at just over 70% of all messages in the last two years analysed, indicates the widespread acceptance of two-level domains as “the norm” of email addressing. Even within larger organisations, internal email routing has been developed to deliver mail to the appropriate host without the host name having to appear as a third or further level of the email address.

5.8 Quotations

5.8.1 Nature of Quotes Within Messages

Email client programs generally provide a Reply To function for a message that has been received. BEE-L's LISTSERV settings have always directed the Reply To back to the list itself, as the originator of the message, though individual users can override this default with their own settings. In general, then, when a user chooses Reply To, the response is sent to the BEE-L host computer, which then distributes it to each of the list members.

Another email client feature is the ability to include a copy of the message being replied to. The default convention followed by many programs, and accepted as normal by most users, is to prefix each line quoted from a previous message with a > symbol. If quoted material is quoted again in a further reply, another > is added, giving an appearance such as:



>> My bees have produced honey this season.

> You didn't tell the list where you have your hives.

Oh, yes, I am based in the Red River Valley of Texas.

In that exchange, the three lines of text would be assumed to come from three separate email messages, with each reply copying material from the previous message(s).

Using the > character to mark quoted material is easy and generally helpful to email readers. It allows questions to be repeated, pertinent original material to be included and readers to “track” who said what in a particular series of exchanges.

In many cases, however, quoted material adds considerably to the size of a message. Quoting a long message in order to reply with only a few words creates a large message containing little new material. Repetition of material within the archive also makes searching for specific text more difficult, since the same text may appear several times in different messages.

While the convention of using > to mark a line of quoted material is common, it is not universal. Some users give no indication of quoted material other than its context. Other email clients use different characters, such as / or #, to mark blocks of quoted material. Still other users might mark an area of quoted material with something like:



snip --

A bee colony is not likely to do something in its self management 
that is harmful to its survival and prosperity.

snip --

In analysing the archives of the BEE-L list, lines beginning with one or more > characters followed by one or more spaces were identified and categorised as “quotes”. It is accepted that this rule understates the total amount of quoted material, given the lack of a universally accepted convention. Results are shown in Tables 21, 22 and 23, and illustrated in Figure 38.
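The quote rule just described can be written as a regular expression. The following is a sketch of it, applied to the lines of one message body. Counting blank lines in the total is an assumption. As noted, lines quoted with other markers, or with > but no following space, are not counted.

```python
import re
from typing import List, Tuple

# Rule as described: one or more '>' characters followed by one or more
# spaces at the start of a line.
QUOTE_LINE = re.compile(r"^>+ +")


def count_quoted(body: str) -> Tuple[int, int]:
    """Return (quoted lines, total lines) for a message body."""
    lines: List[str] = body.splitlines()
    quoted = sum(1 for line in lines if QUOTE_LINE.match(line))
    return quoted, len(lines)


example = (
    ">> My bees have produced honey this season.\n"
    "\n"
    "> You didn't tell the list where you have your hives.\n"
    "\n"
    "Oh, yes, I am based in the Red River Valley of Texas.\n"
)
print(count_quoted(example))  # (2, 5)
```

Dividing the first count by the second gives the per-message percentage of quoted lines used in the following analysis.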

  Total Message Lines and Quoted Lines

Over the 12 years of the BEE-L archive, 61% of all messages contained no quoted material under the rule described above. In these messages, the body was either entirely original or did not use > to mark any quoted material.

In one extreme case, a message appeared on the list with only 15 “new” lines out of a total of 657: the writer had used the Reply function on the daily digest (all of a day's messages contained in one message) and quoted it, adding minimal new content.

Over the years, a total of 36 messages were sent to the list that consisted of nothing but quotes. Although this was generally the result of user error, it still left messages in the archive that do not contribute to the “sum of knowledge” of the list. In practice, such messages might also provoke abusive responses from other list members, further degrading and diffusing the focus of the list's purported subject.

5.8.2 Quoted Material to Total Message Length

Rather than the number of lines of quoted material, a better measure of its impact on users' perception is the percentage of each message's total length that is quoted material, shown in Figure 39. As noted above, judicious use of quotes can provide context and increase readability, but excessive use is undesirable from both the user's and the archive's point of view.

  Percentage of Lines That Were Quotes

Of the 13,998 messages in the BEE-L archive that contain quoted material, 19% had half or more of their message body in quoted lines. This level of repetition suggests some degree of abuse of the ability to include quoted material in replies.

Of the 21,503 messages with Re: in the Subject: line, 42% contain no quoted lines in the message body. Writers evidently do not always feel obliged to include quoted material, even when responding to a particular message.
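Both proportions can be computed once each message has been reduced to its Subject: line and its quoted and total line counts, for instance with a line rule like the one described in Section 5.8.1. The following is a sketch. The record layout is an assumption. So is matching Re: as a case-insensitive word anywhere in the subject, which also catches forms such as "Fwd: Re:".

```python
import re
from typing import Dict, Iterable, NamedTuple


class MessageStats(NamedTuple):
    subject: str
    quoted_lines: int
    total_lines: int


REPLY_SUBJECT = re.compile(r"\bre:", re.IGNORECASE)


def quoting_summary(messages: Iterable[MessageStats]) -> Dict[str, float]:
    """Percentage of quoting messages that are half or more quotes, and
    percentage of Re: messages containing no quoted lines."""
    quoting = heavy = replies = replies_without_quotes = 0
    for m in messages:
        if m.quoted_lines > 0:
            quoting += 1
            if 2 * m.quoted_lines >= m.total_lines:  # half or more, no floats
                heavy += 1
        if REPLY_SUBJECT.search(m.subject):
            replies += 1
            if m.quoted_lines == 0:
                replies_without_quotes += 1
    return {
        "half_or_more_quoted_pct": 100.0 * heavy / quoting if quoting else 0.0,
        "replies_without_quotes_pct": (
            100.0 * replies_without_quotes / replies if replies else 0.0
        ),
    }
```

Note that the two percentages have different denominators: the first is taken over messages that contain any quoting, the second over messages whose subject marks them as replies. These correspond to the 13,998 and 21,503 messages in the figures above.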

The percentage of quoted material increased steadily until it peaked in 1996/97, when on average over 20% of each message consisted of quoted lines (see Figure 39). The introduction of moderation in April 1998 was intended to reduce this growing inclusion of duplicated material. The stabilisation at just over 8% for the last three years considered in this dissertation indicates the effectiveness of this policy initiative.

5.9 The Impact of Moderation on List Activity

The trends apparent in several of the tables and figures presented in this dissertation indicate, to some extent, the nature of the problem facing the list owner in early 1998. The number of messages had increased and was continuing to grow at an alarming rate; on a single day in February 1998, 74 messages were sent to the list. The average size of each message was also increasing, partly because of the inclusion of large amounts of material quoted from other messages.

In early 1998, the list owner convened a group of members he considered capable of bringing some sense of “moderation” to the list. The decision whether to moderate a list was described by Kovacs, McCarty & Kovacs (1991):

“Should the group be moderated or unmoderated? The role of the moderator more or less combines the duties of editing a newsletter or journal with leading a seminar. The advantages of a moderated group are chiefly focus and coherence. These benefits can be of prime importance in a very active group, but moderation takes care and time. An unmoderated group is completely subject to the vicissitudes of its members, but it requires almost no attention once it has been established.” (p. 132).

For some distribution lists, moderation involves editorial direction and near-total control of list activity. For the BEE-L list, the intentions of moderation have been stated in various ways over time, including in the BEE-L Guidelines for Posting provided by Dick (2001b), and include:

This dissertation addresses various aspects of two of these intentions:

5.9.1 Reduced Number of Messages After Introduction of Moderation

As discussed previously, the average number of messages distributed each day through the BEE-L list had risen to over 20 by early 1998, with as many as 74 sent on one day in February 1998 (see Table 4). In April 1998 there were 1,071 messages in total, the first and only time the list reached 1,000 messages in a month.

  Numbers of Messages Before and After Moderation

Immediately after moderation began in April 1998, the number of messages fell dramatically, with monthly message totals reduced by 50% to 70% over the following months (see Figure 40).

5.9.2 Reduced Size of Message Archive After Introduction of Moderation

The reduced frequency of messages was accompanied by a corresponding reduction in the overall archive size for each month. That is, the storage requirement for the BEE-L list archive decreased, as illustrated in Figure 41.

  Size of Message Archive Before and After Moderation

5.9.3 Elimination of HTML in Messages and Binary Attachments

In particular, the introduction of moderation eliminated the opportunity for messages such as the one with an attached binary file (497kb in size) received in September 1998, which contributed to an unusually large archive for that month. Messages containing HTML were also eliminated. HTML messages increased storage requirements because the email client generally included the message twice: once without HTML formatting and once with a number of formatting tags.
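As an illustration only, a hypothetical automated check for the two conditions named here, HTML parts and binary attachments, can be written with Python's standard `email` package. The function name and the shape of its result are assumptions, and nothing here describes how the BEE-L moderators actually screened messages.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def moderation_flags(raw: bytes) -> dict:
    """Report whether a raw message contains an HTML part or an attachment."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    flags = {"html": False, "attachment": False}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers such as multipart/alternative hold no content
        if part.get_content_type() == "text/html":
            flags["html"] = True
        if part.is_attachment():
            flags["attachment"] = True
    return flags


# A plain-text message and an HTML-alternative version of the same text.
plain = EmailMessage()
plain["Subject"] = "Spring build-up"
plain.set_content("Colonies are building up well this spring.\n")

html = EmailMessage()
html["Subject"] = "Spring build-up"
html.set_content("Colonies are building up well this spring.\n")
html.add_alternative(
    "<html><body><p>Colonies are building up well this spring.</p></body></html>",
    subtype="html",
)

print(moderation_flags(plain.as_bytes()))  # {'html': False, 'attachment': False}
print(moderation_flags(html.as_bytes()))   # {'html': True, 'attachment': False}
print(len(html.as_bytes()) > len(plain.as_bytes()))  # True
```

Because the HTML version carries the text twice, once plain and once marked up, it is always the larger of the two. This is the storage effect described above.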

In one example from the BEE-L archives, a message that could have been sent as 0.4kb of text was posted to the list with HTML formatting that required a total of 2.1kb, more than five times as much storage.

5.9.4 Reduced Quotes After Introduction of Moderation

As discussed previously, the relative importance of quoted material in the archive can be measured either in absolute terms (lines or kilobytes) or as a percentage of the total lines within the messages.

  Number of Lines of Quoted Material by Month

Both measures provide meaningful data when the impact of moderation on the BEE-L list is examined. Immediately after the introduction of moderation, both the number of quoted lines and the percentage of quoted material in messages fell, as shown in Figures 42 and 43 respectively.

  Percentage of Quoted Material Before and After Moderation

Moderators rejected or heavily edited messages that, in their opinion, contained too much repeated material. List members, for the most part, accepted the required changes. Where members did not comply or did not understand how to reduce the quoted material, the moderators either refused to forward the message to the list or, if time permitted, deleted the quoted material manually.

5.9.5 Change in Quoted Material Compared to Previous Year

The change in percentage from one year to the next is a useful measure of the changing use of quoted material. Rather than reporting only the percentage for a given year, it identifies the size and direction of the change from the previous year.

  Change in Percentage of Quoted Material from Previous Year

The resulting graph for the BEE-L list, shown in Figure 44, demonstrates the changes brought about by moderation. After moderation was introduced, the reduction was substantial (more than 4%) for the year that included only the first three months of list moderation. In the first full year of moderation, a further 6% reduction was achieved. The next two years showed only minor additional reductions, but viewed in this manner, successive years of reduction are cumulative, and the overall change is substantial.
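The year-on-year series is the first difference of the annual percentages, and summing those differences gives the cumulative change since the first year. The following is a sketch, assuming the changes are expressed in percentage points. The input values are illustrative only and are not BEE-L figures.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def yearly_changes(percentages: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (change from previous year, cumulative change), in
    percentage points, for a series of annual percentages."""
    changes = [b - a for a, b in zip(percentages, percentages[1:])]
    return changes, list(accumulate(changes))


illustrative = [10.0, 12.5, 9.0, 8.5]  # hypothetical annual % of quoted lines
changes, cumulative = yearly_changes(illustrative)
print(changes)     # [2.5, -3.5, -0.5]
print(cumulative)  # [2.5, -1.0, -1.5]
```

The last cumulative value always equals the final percentage minus the first. This is why several small annual reductions can add up to a substantial overall change.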
