Return to Nick Wallingford's CV


Abstract || Acknowledgements || Introduction || Literature Review || Methodology || Data Collection Methods || Findings, Results, Interpretation || Conclusions, Recommendations || References || Appendices

Conclusions and Recommendations

Outline of Section

6.1  Use of List Archival Data
6.2  Ethical Issues
6.3  Measures of List Activity
6.4  Measures of Contribution Levels
6.5  Motivation for Participation and Non-Contribution
6.6  Life Cycle of a List
6.7  Header Line Analysis – Subject, Date/Time
6.8  Domain Analysis
6.9  Domain/Host Naming
6.10  Nature of Contributors to BEE-L
6.11  Quoted Material and Impact of Moderation
6.12  Summary


6.1 Use of List Archival Data

The quantity of data associated with an Internet mailing list archive would indicate a rich source of material for research. In the case of the BEE-L list, this has been true, even without delving into the information contained within the postings. This dissertation has identified a wide range of analyses that can be conducted on the metadata contained in the message headers stored with each message in an archive. As well, it has revealed a number of areas of potential and useful related research both on this list and upon email distribution list archives in general.

Though the analysis of email distribution list archive material is not entirely new, many of the techniques and methods are still to some extent under development. Langner (1999) identified some of the key advantages available in such research, as well as some of the elements that need to be addressed before absolute reliance on this source of data. In particular, the issues related to authenticity and identity need further examination.

This research identified some of the problems of relating individuals to email addresses. Individuals posted messages with multiple “name forms” (“John Smith”, “J Smith”, “Smith the Beekeeper”) as well as using multiple email addresses either concurrently or over time. Methods for analysing and accumulating these identifiers were developed in the course of this project. Effective methods of dealing with such data need to be described and accepted for use in further such research, in order to ensure continued consistency.
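One way such accumulation of identifiers might be approached is sketched below. It is not the method developed in this project, which is not described in this section; it is a minimal illustration that assumes two identifiers belong together whenever a name form and an email address appear on the same message. The function name `group_identities` and the sample data are hypothetical.

```python
def group_identities(pairs):
    """Group name forms and addresses that are linked through shared messages.

    `pairs` is an iterable of (name_form, email_address) tuples, one per message.
    Assumption: a name and an address seen together belong to one person.
    """
    parent = {}

    def find(key):
        # Follow parent links to the representative of the group.
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # shorten the path as we go
            key = parent[key]
        return key

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for name, address in pairs:
        normalised_name = " ".join(name.lower().split())
        union(("name", normalised_name), ("addr", address.lower()))

    groups = {}
    for key in list(parent):
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())


# Invented sample data, for illustration only.
sample = [
    ("John Smith", "js@example.org"),
    ("J Smith", "js@example.org"),
    ("John Smith", "john@example.net"),
    ("Ann Lee", "ann@example.org"),
]
identity_groups = group_identities(sample)
```

A rule this simple will wrongly merge two different people who share a name form, so any grouping produced this way would still need manual review.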

The related issue of forged addresses may, with some mailing lists, become significant. While this research into a “beekeeping list” did not reveal particular problems in this regard, other lists referred to in the literature would appear to include contributors with considerably more ability and willingness to use email deceptively. Research that relies on the headers supplied should, at the very least, attempt to assess the degree of potential deception that might be contained within the data.

6.2 Ethical Issues

As this dissertation has primarily involved a descriptive statistical analysis of the data, the writer has not been obliged to address the particular ethical issues related to list archives. Extended research using list archival material would necessitate such an examination, however. The storage and availability of written communications on the Internet, generally, require an extensive and thoughtful consideration of issues of ownership, privacy and authenticity. The material in list archives might appear to be “public domain” in nature; any research that could be subsequently related back to particular individuals might well require extensive contemplation of the ethical issues related to the research.

Sixsmith and Murray (2001) and Langner (1999) have identified and discussed some of the issues involved. As the use of such distribution list archives is relatively recent, however, it might well be that perceptions and expectations of ethical behaviour are still in the process of formation and may lack adequate expression and description.

6.3 Measures of List Activity

The two measures of number of messages and size of messages have been identified as having significant impact on a list member’s experience of the list. In the case of the BEE-L list, both the number of messages and the total amount of storage required for them increased until early 1998, with a decrease following the introduction of moderation. The average size of any individual message, however, has remained relatively constant over the list’s history, due to the large number of short messages during any given time period. There have, however, been times when large messages have been posted, impacting both archive size and user experience.

The need for some acceptable measures of “list activity” has been identified. Deficiencies in “simple” measures of “number of messages per year” or “size of annual archive” were recognised and discussed as they related to the day-to-day user experience of receiving messages from the list.

The examination of list activity from an “annual view” compared to that of a “monthly view”, for instance, revealed the impact of variations of activity over time. For a clear understanding of activity on a list, summarisation is necessary, but it has an inherent potential to mislead through the simplification of the data involved. Other lists should be examined using tools similar to those employed in this research to determine whether there are any standards that could be described and accepted as generally useful.

In the process of this research, the writer identified and initiated the use of “moving averages” for message numbers over time. This measure was developed to overcome the shortcomings related to the simple measures referred to above, providing a more realistic description of the nature of list activity. Further work should be undertaken to examine the effect of different “smoothing factors” on this list’s activity, as well as the impact of these measures on lists that have extremes of activity over time.
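The moving average idea can be illustrated with a short sketch. The window length, the choice of a trailing rather than a centred window, and the sample counts are all assumptions made for illustration; this section does not state the parameters actually used in the research.

```python
from collections import deque


def moving_average(counts, window=3):
    """Return the trailing moving average of a sequence of message counts.

    Early positions, where fewer than `window` values exist, are averaged
    over the values available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the last `window` counts
    averages = []
    for count in counts:
        recent.append(count)
        averages.append(sum(recent) / len(recent))
    return averages


# Invented monthly message counts with one unusually busy month.
monthly_counts = [10, 20, 60, 30, 20]
smoothed = moving_average(monthly_counts, window=3)
```

Varying `window` is one simple way of exploring the different “smoothing factors” mentioned above: a longer window hides a single busy month more completely, at the cost of reacting more slowly to a genuine change in activity.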

Similarly, the quantification of minimum and maximum daily numbers of messages in a month enabled the writer to graphically provide indicators of list activity. Daily variations can have a significant impact on a list participant and the use of an email distribution list. Even if the “daily average” number of messages is acceptable, the percentage of days with extremely high or low numbers of messages can considerably impact upon a reader’s perception of activity levels. Other research should be considered to identify methods of expressing this form of variation in a manner that could be readily understood by list users.
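The minimum and maximum daily message numbers in a month could be computed along the following lines. This is a sketch under stated assumptions: the input is taken to be one calendar date per message, days with no messages are counted as zero, and the function name `daily_extremes_by_month` is hypothetical.

```python
import calendar
from collections import Counter
from datetime import date


def daily_extremes_by_month(message_dates):
    """Map (year, month) to the (minimum, maximum) messages received in a day.

    Days on which no message arrived count as zero, so a quiet month
    shows a minimum of 0 rather than the smallest non-empty day.
    """
    per_day = Counter(message_dates)
    months = {(d.year, d.month) for d in per_day}
    extremes = {}
    for year, month in sorted(months):
        days_in_month = calendar.monthrange(year, month)[1]
        counts = [
            per_day.get(date(year, month, day), 0)
            for day in range(1, days_in_month + 1)
        ]
        extremes[(year, month)] = (min(counts), max(counts))
    return extremes


# Invented posting dates, for illustration only.
dates = [date(1998, 1, 5)] * 3 + [date(1998, 1, 6)] + [date(1998, 2, 1)] * 2
extremes = daily_extremes_by_month(dates)
```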

There is a need for an understandable set of metrics that can readily describe the frequency and size of postings in a way that could help a user to anticipate the degree of commitment involved when becoming a member of a list. Such an overall measure need not be complex or mathematically intricate – it simply needs to convey to a reasonable degree of accuracy what a list subscriber might be expected to encounter once having joined the list. Such a measure, if successfully developed and generally accepted, could be published along with the list subscription information as a means of targeting the list to subscribers most likely to remain after the initial enthusiasm of joining.

6.4 Measures of Contribution Levels

Mailing lists have been shown to develop a sense of “virtual personality”, the term used by Zenhausern (1994) and Zenhausern & Wong (1998). To a great extent, this can be determined by the “primary contributors”, the regular writers of messages. In the case of the BEE-L list, analyses have quantified the postings of the most prolific message writers, indicating that even in periods with large numbers of messages, a relatively small group of individuals contribute a disproportionate number of the postings. Measures of “contribution levels” could be developed to indicate one aspect of the potential list personality at any given time, incorporating both the single message posters and the individuals who contribute large numbers of messages. Both have been identified as having an impact on a user’s perception of the list. A series of metrics to convey the relative contributions of high- and low-posters could be developed to provide an objective measure of expectation of list nature.
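As an indication of what such a metric might look like, the sketch below computes the share of postings written by the most prolific authors and the number of single-message posters. Both measures and both function names are illustrative assumptions, not metrics defined in this dissertation.

```python
from collections import Counter


def top_poster_share(authors, top_n=10):
    """Return the fraction of all messages written by the `top_n` busiest authors."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(n for _, n in counts.most_common(top_n))
    return top_total / total


def single_message_posters(authors):
    """Return the number of authors who posted exactly one message."""
    counts = Counter(authors)
    return sum(1 for n in counts.values() if n == 1)


# Invented author list: one entry per message.
authors = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
share_top_one = top_poster_share(authors, top_n=1)
share_top_two = top_poster_share(authors, top_n=2)
one_time_posters = single_message_posters(authors)
```

Computed per year or per month, the two figures together would give a rough picture of how concentrated the writing on a list is at any given time.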

6.5 Motivation for Participation and Non-Contribution

The reasons for participation and the perceptions of value provided and received by list members were examined by Rojo (1995) and Rafaeli and LaRose (1993). This dissertation has only quantified the contribution levels of individuals; further research could be undertaken to reveal the motivating factors and resulting perceptions of the individuals who contribute to an email distribution list. The motivations behind some messages might be considered somewhat obvious: self-aggrandisement, self-promotion, malice, innocent error, the sharing of information, altruism and many other such reasons. Research to characterise the nature of contributions over time, relating them to the individuals who made them, might well lead to measures and characterisations of the list personalities that individuals assume at different times and periods of list participation.

More involved research could include the motivations of those who do not contribute to list activity. Only a small percentage of the people who receive most email lists ever contribute significant numbers of messages. The people who only ever read, or who only rarely contribute, might have a different perception of received value than those writing the majority of the messages. In particular, how these “lurkers” perceive the “contributors” might provide a useful insight into the nature of the value of a list. Some value may accrue to an individual by simply reading a list’s messages. Without the value added by those who make postings, though, there would be nothing to read. The overall value of a list must be seen as a complex combination of assessments of worth by the individuals concerned – both in writing and in reading of messages. While such research would be complex and composite, it might provide a better understanding of “the collective good” that Rojo (1995) and Rafaeli and LaRose (1993) identified and considered.

6.6 Life Cycle of a List

Measures of overall activity levels, combined with measures of contribution levels, could be further developed in conjunction with Nagel’s (1994) concepts of “activity cycle”. The measures of activity levels and contribution levels by individuals could be examined in relation to the milestones in list development proposed by Nagel. It is reasonable to believe that Nagel’s stages of “initial enthusiasm” and “growth” could be identified through measures of activity and contribution. Other stages such as “community” and “discomfort with diversity” would require more qualitative analysis of the contents of messages.

6.7 Header Line Analysis – Subject, Date/Time

Other metadata such as the number of header lines in list archives can reveal information about the changes in Internet usage. The number of header lines per message in the BEE-L archives has increased over time, with the additional use of lines to indicate such MIME (Multipurpose Internet Mail Extensions) information as encoding and content type.

The Subject: lines of messages can convey information additional to the actual topic assigned to the message. In particular, the use of “blank” Subject: lines and those with a prepended Re:, indicating a reply to a previous message, provide some information about the list’s use by members. In the BEE-L list, only a small number of messages have been sent with no subject, but more than half of messages in most years had a subject line with a prepended Re:, which may have been added manually by the author or generated through use of a reply function.
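A classification of Subject: lines of the kind described above might be sketched as follows. The three category labels and the rule for recognising a reply prefix (any capitalisation of Re followed by a colon) are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches "Re:", "RE:", "re :" and similar at the start of a subject.
REPLY_PREFIX = re.compile(r"^\s*re\s*:", re.IGNORECASE)


def classify_subject(subject):
    """Classify a Subject: value as 'blank', 'reply' or 'new'."""
    if subject is None or not subject.strip():
        return "blank"
    if REPLY_PREFIX.match(subject):
        return "reply"
    return "new"


# Invented subject lines, for illustration only.
subjects = ["Re: Varroa treatment", "Requeening in autumn", "", "RE: Swarm traps"]
tally = Counter(classify_subject(s) for s in subjects)
```

As the text notes, a match only shows that the prefix is present; it cannot distinguish a Re: typed by the author from one generated by a reply function.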

Analyses of time of day of postings, day of week of postings and month of year of postings have revealed significant and consistent preferences for morning hours of weekdays, with two peaks of monthly activity that correspond to the most active times of the beekeeping year for the majority of the list’s members.
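The time-of-day and day-of-week tallies can be sketched using the Date: header of each message. The example assumes the headers are in standard Internet mail date format and uses the hour as recorded by the sender, including the sender's own time zone offset; whether the research used sender-local or normalised times is not stated in this section. The function name `posting_profile` is hypothetical.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def posting_profile(date_headers):
    """Tally messages by hour of day and by weekday (0 = Monday).

    Headers that cannot be parsed are skipped rather than guessed at.
    """
    hours, weekdays = Counter(), Counter()
    for raw in date_headers:
        try:
            sent = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            continue  # malformed Date: header
        hours[sent.hour] += 1
        weekdays[sent.weekday()] += 1
    return hours, weekdays


# Invented Date: headers, for illustration only.
headers = [
    "Mon, 05 Jan 1998 09:15:00 +1300",
    "Tue, 06 Jan 1998 09:40:00 -0500",
    "not a date",
]
hours, weekdays = posting_profile(headers)
```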

6.8 Domain Analysis

The domain component of the email address can in some instances be used to indicate country of origin, but with the increasing use of generic top level domains, the analysis can only be preliminary and incomplete. In the case of the BEE-L list, it has shown that a minimum of 69 countries have been represented in the postings of the 12 years of the list. In recent years, the participation levels from country code domains have decreased to some extent, but given the preponderance of generic domains now used, this may not in fact be indicative of significant changes to posting patterns. The reduction and ultimate demise of BITNET addresses on the list, and a rise in the use of web-based email domains for the BEE-L list, have been quantified in the dissertation.

Further research could develop a “lookup” database table to provide a better translation for gTLDs such as .com and .net to country of origin. While current online databases can be examined, as undertaken by Zook (1999, 2000), there are no methods described to readily utilise the data contained in the registration of domains files. Registered domains will always be something of a “moving target”, but this research identified that a majority of the email addresses involved came from a relatively small number of domains overall. Country of origin analyses of email messages would be considerably enhanced by reference to country origins of predominant .com domains. In the case of this research, those .com domain messages made up nearly one third of all messages. It is easy to see how significant they can be in any analysis that wishes to relate postings to countries.
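The proposed “lookup” table might be used roughly as sketched below. The table contents, the function name `country_hint` and the short list of generic top level domains are hypothetical; no such table was built in this research, and the two-letter test for country code domains is a simplification.

```python
# A small, assumed set of generic top level domains.
GENERIC_TLDS = {"com", "net", "org", "edu", "gov", "mil", "int"}


def country_hint(address, domain_lookup):
    """Return a country code hint for an email address, or None if unknown.

    Country code domains are returned directly; generic domains are
    resolved through `domain_lookup`, keyed on the last two domain labels.
    """
    domain = address.rsplit("@", 1)[-1].strip().lower()
    labels = domain.split(".")
    tld = labels[-1]
    if tld not in GENERIC_TLDS and len(tld) == 2:
        return tld  # treated as a ccTLD
    registered = ".".join(labels[-2:])
    return domain_lookup.get(registered)


# Hypothetical lookup table: registered generic domain -> country code.
lookup = {"example.com": "nz"}
hint_generic = country_hint("reader@mail.example.com", lookup)
hint_cctld = country_hint("keeper@bees.example.co.nz", lookup)
hint_unknown = country_hint("someone@unlisted.net", lookup)
```

Because a small number of domains accounted for a majority of addresses, even a short table covering the predominant .com domains could resolve a large share of otherwise unplaceable messages.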

6.9 Domain/Host Naming

The changing nature of domain and host naming conventions was examined in terms of domain levels. The use of mail with named individual hosts would appear to have diminished over time, with most domains now using only two or, at most, three domain levels (and that mostly for ccTLDs). Further research might be able to identify the reasons for this transition in naming conventions and use. The writer has speculated upon the increased capability of mail transport software, but issues relating to internal security and work patterns of individuals might have initiated the change.

In the early 1990s, for instance, one person would generally only ever wish to work (and receive email) at one particular computer. It might be that the demands of the office and out-of-office email access requirements precipitated the changes to domain levels employed by system operators.
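The notion of domain levels examined in this section can be made concrete with a short sketch that counts the dot-separated labels after the @ sign. The function name and the sample addresses are hypothetical.

```python
from collections import Counter


def domain_levels(address):
    """Return the number of dot-separated labels in an address's domain."""
    domain = address.rsplit("@", 1)[-1].strip().strip(".")
    return len([label for label in domain.split(".") if label])


# Invented addresses: a named host, a plain gTLD domain and a ccTLD domain.
addresses = [
    "user@host.dept.example.edu",
    "user@example.com",
    "user@example.co.nz",
]
level_counts = Counter(domain_levels(a) for a in addresses)
```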

6.10 Nature of Contributors to BEE-L

The analyses undertaken in this dissertation add weight to the hypothesis that the overall “nature” of the people who post to the BEE-L list has transitioned from education/scientific/regulatory personnel toward a wider range of “conventional” Internet users over the twelve years of the list’s existence.

Given the growth of the Internet and inherent changes to availability and access, this is not unexpected. However, this hypothesis remains unproved because it is not known how many education/scientific/regulatory personnel have used email addresses that do not contain the .ac, .edu, .gov or .govt domain labels. It is, nonetheless, the first time such a quantification has been attempted with the data available from this list.

6.11 Quoted Material and Impact of Moderation

Quoted material in messages can usefully provide context and support meaning, but can also provide unnecessary bulk, repetition and complications within a list archive. The use of quoted material in the BEE-L archives has been calculated, and the dramatic increase in the percentage of such repeated text in the period up until 1998 has been quantified.
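A simple estimate of the proportion of quoted material in a message body might be computed as follows. The sketch assumes that quoted lines are those beginning with the conventional > marker and measures the proportion by line count; the method actually used for the BEE-L archives is not described in this section and may differ.

```python
def quoted_fraction(body):
    """Return the fraction of non-blank lines in `body` that are quoted.

    Assumption: a quoted line is one whose first non-space character is '>'.
    """
    lines = [line for line in body.splitlines() if line.strip()]
    if not lines:
        return 0.0
    quoted = sum(1 for line in lines if line.lstrip().startswith(">"))
    return quoted / len(lines)


# Invented message body, for illustration only.
body = "> How do I requeen?\n> Any advice?\n\nUse a push-in cage.\nGood luck."
fraction = quoted_fraction(body)
```

Mail programs that mark quotations in other ways, or not at all, would be missed by this rule, so the result should be read as a lower bound.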

Moderation of a list can have both social and technical implications. In the case of the BEE-L list, the effect on message numbers and size of messages was examined, with the conclusion that moderation had a considerable influence on the reduction in frequency and size of messages. As well, the percentage of quoted material was significantly reduced, providing a more succinct set of archives that is easier to search for previous material.

The examination of the social impacts of moderation on a list would provide a range of other related research. The quantitative approach used in this dissertation could be extended and developed with an analysis of the nature of the messages before and after moderation.

Similarly, the perceived role(s) of different moderators could be related to the impact on a list, both quantitatively and qualitatively, to determine to what extent a “moderator” acts to facilitate, encourage, limit or in other ways influence the contribution and nature of messages.

6.12 Summary

Using metadata only, a mailing list archive can be objectively analysed to provide a wide range of information about the list’s history and current situation. Further comparative work with a range of similar and dissimilar lists could be used to develop the series of measures that could readily assist a user to evaluate any particular list. These metrics would provide a means of anticipating list activity and, to some extent, provide indications of the possible list personality in terms of contributors and nature of participation by members.

