Managing a student research project

Gathering data to an adequate standard

It is important that the researcher demonstrates that the data were properly collected. Ideally this means that others working at the same level would have been able to arrive at the same readings or observations. Where primary data are concerned this is usually not feasible. Instead, researchers must settle for following a procedure that will be judged adequate for the level of their research, in particular by the examiners to whom the research report will be sent for assessment. This question is taken up later in the broader context of the assessment of the research report. The following checklist of points can be used to secure adequate data gathering standards for both secondary and primary data. These are that:

  1. The data actually measure what they purport to measure.
  2. Proper attention was paid to measurement error and the reduction of its effects.
  3. A suitable sample was used; in particular that:
     • it provided a basis for generalization; and
     • it was large enough for the effects of interest to be detected.
  4. Data were properly recorded; in particular that:
     • the conditions under which the data were gathered were properly noted; and
     • suitable data recording methods were used and efforts were made to detect and eliminate errors arising during recording.

Not all of these points apply in every situation, and the full list is perhaps only appropriate in the situation where data are to be gathered in some systematic way and are of the nominal, ordinal, interval or ratio type. The researcher's own notes, which we view as textual data and an important data source, would probably need to be judged only against standard 4 above. Nonetheless, the list will now be reviewed point by point, with the greatest emphasis being placed on data recording.

Ensuring the data measure what they purport to measure

Very often it is difficult to measure the actual variable of interest and instead surrogate measures may be adopted. This is particularly likely to be a problem in secondary data gathering where the researcher may not know just how the data were derived.

Errors in measurement

Quantitative data are often subject to measurement error and the size of that error may have important implications for both the way the data are used and for the scale of the data gathering effort.

Aside from errors due to the malfunction of measuring equipment, which are of no interest in this context, error may take the form of bias, as in the under-reporting of small-company activities in many official statistics; of deliberate or instinctive falsehood, as in many answers to questionnaire surveys; or of distortion of one form or another, as in the response of a laboratory amplifier to a high-frequency signal.

The practical implication of all three possibilities is the same: information is lost and the data do not fully represent the phenomenon under study. It is often easier for the engineer to overcome such difficulties by employing more sophisticated measuring devices, but similar opportunities may well arise in the social sciences.

Another form of measurement error that is relevant to quantitative data is pure random error that is supposed on average to fluctuate about zero. Since this is relatively easy to cope with statistically, it is the usual (though not always the most accurate) model of error adopted.

In practice we can attempt to deal with measurement error in one of two ways. The first is to measure the phenomenon of interest by several different methods. Where each gives rise to random measurement error, a combination of the measurements can be expected to give a better estimate of the true value, provided the methods are not subject to the same error. Obviously this approach requires more data gathering effort, but it has much to commend it in fields where accurate measurements are difficult. It is therefore much used in social science research, often under the name of "triangulation".
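By way of a minimal sketch (not taken from the original text, with entirely invented method names and readings), the Python fragment below shows the idea behind combining several independent, error-prone measurements of the same quantity: provided the errors are not shared, a simple combination such as the mean tends to lie closer to the true value than any single reading.

```python
import statistics

# Hypothetical example: three independent methods used to estimate the same
# quantity (say, average weekly output of a small firm). Each reading carries
# its own random error around the unknown true value.
readings = {
    "survey_of_managers": 103.0,
    "inspection_of_records": 97.5,
    "direct_observation": 100.8,
}

# Provided the methods are not subject to the same error, a simple combination
# (here, the mean) can be expected to lie closer to the true value than a
# single reading chosen at random.
combined_estimate = statistics.mean(readings.values())
spread = statistics.stdev(readings.values())

print(f"Combined estimate: {combined_estimate:.1f} (spread across methods: {spread:.1f})")
```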

The second way of dealing with measurement error that is more or less random is to increase the size of the sample, and this is discussed below.

Choosing the sample

Data gathering normally involves some kind of sampling. The conclusions that can validly be drawn from the sample depend critically on both the population sampled and the procedures used for generating the sample. The first step in choosing the sample is, accordingly, to choose a target population that permits interesting conclusions to be drawn; the second is to select the sample in such a way that those conclusions are valid. Though this is unlikely to be a problem for the physical scientist, it certainly is in many other fields, particularly the social sciences. Very often, the sheer cost of data gathering pushes the student in the direction of some "convenience sample" that meets neither of these criteria.

Any statistical method requires a certain size of sample if it is to have a reasonable probability of detecting an effect of interest, and the collection of enough data may be quite beyond the resources available to the student researcher. Some social science projects, for example, are very unlikely to produce the hoped-for results because they are not based on enough observations; a larger sample would be needed if the effects were to be revealed. A crude rule of thumb applicable in a number of situations is that the sample size needed is proportional to the square of the accuracy of the estimates derived from the sample. Thus, to double the accuracy it is necessary to increase the sample size fourfold.
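To make the rule of thumb concrete, here is an illustrative sketch, assuming a simple random sample and an arbitrarily chosen population standard deviation of 10: the standard error of the sample mean halves only when the sample size is quadrupled.

```python
import math

population_sd = 10.0  # assumed value, for illustration only

def standard_error(n: int) -> float:
    """Standard error of the sample mean for a simple random sample of size n."""
    return population_sd / math.sqrt(n)

for n in (25, 100, 400):
    print(f"n = {n:4d}  standard error = {standard_error(n):.2f}")

# n =   25  standard error = 2.00
# n =  100  standard error = 1.00  (four times the sample, twice the accuracy)
# n =  400  standard error = 0.50
```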

There are many different procedures that can be used for sampling and the reader should consult a specialist text for further details.

Recording the data

An important aspect of the experimental design model is the idea of factors and, by implication at least, the values of all factor levels should be recorded along with the actual measurements of interest. This provides protection against the later discovery that further variables, and therefore further measurements, are relevant to the phenomenon in question. Equally, notes on the sources of data and the time and date of collection can be extremely useful when, many months later, the researcher is attempting to correct an error or to decide whether a set of figures whose origin has long since been forgotten can be used in analysis. In both cases the recording of adequate additional information will help to ensure that few of the data collected prove to be unusable.
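As an illustrative sketch only (the field names and values are invented), each observation might be recorded as a row that carries the measurement together with its factor levels, collection conditions, source and date, so that its origin can still be established months later.

```python
import csv
from datetime import date

# Hypothetical record layout: the measurement itself plus the factor levels
# and collection details that make it interpretable later.
fieldnames = ["measurement", "temperature_c", "operator", "source", "date_collected", "notes"]

observations = [
    {"measurement": 12.7, "temperature_c": 21, "operator": "JS",
     "source": "lab run 3", "date_collected": date(2002, 5, 14).isoformat(),
     "notes": "ambient humidity high"},
]

with open("observations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(observations)
```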

Researchers do waste effort by having to repeat data gathering activities because certain information was omitted originally. In practice, it is almost always straightforward to collect additional measurements at the time the initial data gathering takes place. In further discussions of data recording it will, therefore, be assumed that consideration has been given to exactly what data are to be recorded, and the focus now will be on how to record them.

In primary data gathering, recording may involve two processes. Firstly, data must be captured in some way that is feasible in the context in which they are to be gathered, following which it is often necessary to transcribe or convert the data into a form suited to computer input.

The main concern here is the reduction of data to a form suitable for computer analysis. Ratio and interval scaled data are already in this form and present no problem. Ordinal data can be input either as ranks or, equivalently, using letter codes – for example, A=1, B=2. Pictorial data need to be converted into numbers in some way or other, the most usual method nowadays being the use of a digital scanner. We note, though, that digitized picture data, although numeric, are not suitable for most analytical purposes; a picture generates too much numeric data. Nominal data may be recorded by using the number 1 to denote the presence of some attribute (for example, that the item is green) and zero to denote its absence. For pure textual data there is little choice but to input them as they stand.
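A minimal sketch of the codings just described, using invented variable names: ordinal responses entered as ranks via letter codes, and a nominal attribute recorded as a 1/0 indicator.

```python
# Ordinal data: letter codes entered as ranks (A=1, B=2, ...).
grade_rank = {"A": 1, "B": 2, "C": 3, "D": 4}
responses = ["B", "A", "C", "B"]
ranked = [grade_rank[r] for r in responses]          # [2, 1, 3, 2]

# Nominal data: 1 if the item possesses the attribute (e.g. it is green), else 0.
items = [{"colour": "green"}, {"colour": "red"}, {"colour": "green"}]
is_green = [1 if item["colour"] == "green" else 0 for item in items]   # [1, 0, 1]

print(ranked, is_green)
```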

Transcription may also be the major process involved when secondary data are being used. This two-stage process is at best somewhat inefficient and at worst may introduce errors at the transcription stage, so automatic data gathering methods that collect the data directly in a form suitable for computer analysis have obvious attractions. To that end it is normal to use a digital scanner in conjunction with a text reading (optical character recognition) package to transcribe secondary data that exist only on paper, such as tables in books. When this is done, however, it is rare for the transcription to be completely accurate, especially with older documents produced in non-computer typefaces. It is, therefore, necessary to undertake a careful correction process (which, for prose, is much facilitated by running the electronic text through a spell checker). Such processes do not, however, find all the errors without considerable effort. If the text is one that is useful to other researchers it may well be worth putting it on the Web. Such scholarly generosity has been the source of much material on the Web, especially in the arts and humanities.

The detection of errors at the data capture stage may, by analogy with data processing terminology, be dubbed validation. The ensuring of accurate transcription will similarly be referred to as verification.

Validation is primarily based on identifying implausible data: for example, a questionnaire that records a pregnant man or, more typically but more subtly, one anomalous liberal response from an individual amidst a host of authoritarian ones. Not all anomalies will, in fact, be errors and, conversely, such procedures will not identify data that could be correct but in fact are not. Successful validation is heavily dependent on experience, and this is one reason why training in the use of data gathering techniques is necessary.
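As a hedged illustration of the kind of plausibility checking described (the field names and rules are invented, not drawn from the original), validation can be expressed as a set of simple rules run over each record, with anomalies flagged for human review rather than rejected outright.

```python
# Each rule returns a message if the record looks implausible, otherwise None.
def check_sex_pregnancy(record):
    if record.get("sex") == "male" and record.get("pregnant") == "yes":
        return "male respondent recorded as pregnant"

def check_age_range(record):
    if not 0 <= record.get("age", 0) <= 120:
        return "age outside plausible range"

rules = [check_sex_pregnancy, check_age_range]

records = [
    {"id": 1, "sex": "male", "pregnant": "yes", "age": 34},
    {"id": 2, "sex": "female", "pregnant": "no", "age": 29},
]

for record in records:
    problems = [msg for rule in rules if (msg := rule(record))]
    if problems:
        # Flag for review: an anomaly is not necessarily an error.
        print(f"record {record['id']}: {'; '.join(problems)}")
```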

Verification lends itself to more mechanical methods. The traditional approach in data processing, for example, is for two different people to enter the same data into the computer system and then to accept the two sets of data if they are the same, but otherwise to examine them for transcription error. This approach relies on the reasonable assumption that the same mistake is unlikely to be made by two different individuals. However, the student researcher is unlikely to be able to afford this type of verification, which is increasingly confined to large-scale professional surveys, and so needs to think of ways either of approximating it or, better, of improving the quality of data entry.
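A minimal sketch of double-entry verification, under the assumption that the two transcriptions are held as parallel lists (the figures are invented): identical entries are accepted and disagreements are flagged for checking against the source.

```python
# Two independent transcriptions of the same column of figures.
entry_by_person_a = [12.4, 15.1, 9.8, 22.0, 17.3]
entry_by_person_b = [12.4, 15.1, 9.9, 22.0, 17.3]

for position, (a, b) in enumerate(zip(entry_by_person_a, entry_by_person_b), start=1):
    if a != b:
        # The same mistake is unlikely to be made twice, so a disagreement
        # points to a transcription error in one of the two versions.
        print(f"row {position}: values disagree ({a} vs {b}) - check against the source")
```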

The rejection of data at the validation or verification stage is a somewhat negative process. Though transcription errors are usually remediable, validation errors will not be unless thought is given to making them so. The only way in which this can be done is to introduce redundancy – that is, extra information – into the data gathered so that incorrect or missing data can be reconstructed. If, for instance, the aim is to measure a length, one way is to measure it in millimetres and record it. If this is done incorrectly, however, the complete set of measurements related to this length will have to be thrown away. On the other hand, if it is also measured in inches, any error will be evident when that measurement is converted to millimetres. This example also throws light on the role of "feel" in validation. Many people in the UK and USA have a far better intrinsic concept of imperial measurements than metric ones and a check of this sort will accordingly have a good chance of detecting the error at the time when the measurement is made.
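The length example might be sketched as follows (the tolerance is chosen arbitrarily for illustration): recording the same length in both millimetres and inches introduces redundancy, and a simple consistency check can catch a recording error at the time it is made.

```python
MM_PER_INCH = 25.4

def lengths_consistent(length_mm: float, length_in: float, tolerance_mm: float = 1.0) -> bool:
    """Return True if the two recordings of the same length agree within tolerance."""
    return abs(length_mm - length_in * MM_PER_INCH) <= tolerance_mm

# A correct pair and a mis-recorded one (e.g. a transposed digit in millimetres).
print(lengths_consistent(254.0, 10.0))   # True  - the recordings agree
print(lengths_consistent(245.0, 10.0))   # False - one of the two is wrong
```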

A related issue is the need to check what units a variable is measured in. The loss of NASA's Mars Climate Orbiter, which occurred because one team supplied measurements in imperial units that another assumed to be metric, illustrates the point neatly. More subtle versions of this problem also exist: a time series of, say, office space construction may have switched from being recorded in square feet to square metres.

In many social science applications it may well be possible to approach the respondents again; and in science and engineering studies the measurements can, in principle, be repeated. Nonetheless, both of these approaches require effort and in some cases may for all practical purposes be impossible. Therefore, if the researcher is to avoid throwing away hard-won data it is advisable to devote a little thought to how errors in them can be detected and eliminated.

Though the avoidance of error is a common theme in all types of data capture or transcription, there are many different methods that can be used for either or both of these purposes. These differ in the amount of equipment and preparation required to use them, in their costs and in their suitability for dealing with large volumes of data. Though the division is far from being clear-cut, it is useful to distinguish between methods that are primarily suited to data capture and those that are mainly used for transcription, and that approach will be followed here.

Data capture

The one form of data that will be gathered by all researchers is their own research notes, which are worthy of more attention than they are often afforded. Though the researcher who has pursued an almost uninterrupted academic career should have developed effective note-taking practice, this may need amendment when, as is often the case, the research is concerned with a new field of study. The problems of the part-time researcher, or of someone returning to academic study after a number of years, are likely to be greater.

The basic problem with research notes is that they arise from a variety of activities, from the researcher's own reading through to occasional flashes of inspiration. Usually, they eventually comprise a huge mass of data of many different types. Furthermore, there is no simple way of ensuring that two different pieces of data that should be juxtaposed will be.

As far as notes on books are concerned the most effective practice is probably to make them as the books are read and to produce photocopies of selected passages of particular interest that can be annotated as the student wishes. The selections should rarely amount to more than a few per cent of the work in question unless some form of textual study is being undertaken, so there should be no problem with copyright law. Certainly it is usually a sign that the researcher has not digested the contents of a work if it is found necessary to copy most of it, and proper cross-referencing soon becomes impossible if the practice is repeated wholesale. Moreover, problems of copyright law and – even more importantly as far as academic institutions are concerned – plagiarism, are likely to arise. An alternative approach that avoids these difficulties is for the researcher to compile notes as the text is read.

In normal circumstances, researchers will want to produce substantial notes of their own relating to projected analyses, the organization of the research report and so on. These are usually easier to deal with. For the unexpected insight it is worth carrying a pocketbook, notebook PC or electronic organizer in which sufficient information can be jotted down to enable the idea to be properly worked up into notes later.

Logbooks and journals are the simplest method of data recording available to the experimental scientist or the researcher conducting a field study. Their use is relatively straightforward and is often facilitated by employing a standard layout for each type of observation to be made. Appropriate blanks can be photocopied to be filled in and filed in a binder as required. It should be noted that, nowadays, the logbook will frequently be on a PC, since even in the field, notebook and sub-notebook computers provide a more convenient and flexible way of recording notes and observations. As research is a learning process some students find benefit in viewing the logbook as a chronological record; not a day-to-day diary but a record of key incidents. Examples of such would be: opinions expressed by others during the study; sudden insights gained; and more effective ways of conducting the research. Notes of this type could trigger action and, in some fields of study, assist in the production of the research report.

Interview notes and similar materials are rather more difficult to structure because it is hard to predetermine the course of an interview. Nevertheless, there will in most cases be an interview schedule listing those topics to be covered and this may well serve as the basis of a data gathering instrument with half a page, say, being allocated to each subject heading.

With a little experience it is usually possible for researchers to generate their own "shorthand" for recording, thus enabling them to come nearer to a verbatim record. In view of the high information content of pictorial data it makes obvious sense for the researcher to record data in that form, where possible.

Questionnaires provide a more structured approach to gathering data of this type. Where closed questions (those which provide for only a limited list of responses) are used, subsequent transcription is particularly easy. If questionnaires are eventually to be analysed by computer, it pays to design them from the outset with that processing in mind.

Tape or digital recorders are generally acceptable in most interviewing situations, subject perhaps to certain parts of the interview being "off the record". If using a tape recorder, it may well be worth carrying back-up tapes and batteries, and both "narrow" and "wide angle" microphones, so that the most appropriate type can be selected.

One aspect of tape recording which is frequently overlooked by student researchers is the cost of transcription. Six to eight hours of transcription may well be needed per hour of tape recording. Moreover, it needs special equipment and is best carried out by experienced staff.

For these reasons, and to cope with the situations where recording is not acceptable, the student still needs other methods of recording. Though the act of taking notes can be useful in pacing an interview, the ideal method is one that the student can carry out while still looking at the interviewee. At a minimum this will usually require some sort of shorthand or code with the ideal being the ability to recall every detail of the interview (remembering that non-verbal behaviour is often very important) an hour afterwards. It is useful to write notes on the interview as soon as possible after it has taken place.

Lightweight video cameras are easy to use and it is a straightforward matter to produce digital video that can be stored on a hard disk. Given the advent of video compression formats such as MPEG and the size of modern hard drives, substantial amounts of video material can easily be stored on a PC. As noted above, however, applying computer analysis to such video images remains, in general, a difficult task.

A special type of video image that is useful to many researchers, especially those involved in information systems research, is the image of PC screens. These can be stored as a video image or, in the case of web browser screens, as Hypertext Markup Language (HTML). This type of data has the further advantage that software packages exist that enable the interaction of a user with a website to be analysed.


This article is adapted by the authors from The Management of a Student Research Project by J.A. Sharp, J. Peters and K. Howard, Gower Press, 2002.
