Further Analysis of the Mercer “Benefits” Survey

Comments from Nancy Mathiowetz, Professor Emerita, UWM

Former President, American Association for Public Opinion Research

Former Editor, Public Opinion Quarterly

Introduction

It would be useful, in reviewing the survey, to understand the analytic objectives of the study. What empirical questions is the survey attempting to address, and how will the resulting data be used? That framework would provide a better lens for reviewing the instrument.

Both questionnaire design and sample design are important to review in assessing the quality of a survey. With respect to questionnaire design, one wants a well-designed questionnaire whose wording is easy to comprehend and does not bias the respondent. The structure of the questions (e.g., Likert scales, open-ended, multiple choice) is also important and contributes to the overall quality of the survey data. A poorly designed questionnaire yields data that may be misleading, biased, or inaccurate.

Similarly, it is important that the sample design (that is, the identification of the population of interest and the means by which members of that population are selected) be clearly specified and executed. Once people are selected for inclusion in a study, efforts should be made to encourage their participation so that the full diversity of the population of interest is represented. As with a poorly designed questionnaire, a poorly designed or executed sample can result in misleading, biased, or inaccurate estimates.

The Mercer Survey

The American Association for Public Opinion Research offers a series of recommended best practices, including recommendations about question wording (see: https://www.aapor.org/Standards-Ethics/Best-Practices.aspx).  Specifically with respect to question wording, the website states: 

Take great care in matching question wording to the concepts being measured and the population studied.

Based on the goals of a survey, questions for respondents are designed and arranged in a logical format and order to create a survey questionnaire. The ideal survey or poll recognizes that planning the questionnaire is one of the most critical stages in the survey development process, and gives careful attention to all phases of questionnaire development and design, including: definition of topics, concepts and content; question wording and order; and questionnaire length and format. One must first ensure that the questionnaire domains and elements established for the survey fully and adequately cover the topics of interest. Ideally, multiple rather than single indicators or questions should be included for all key constructs.

Beyond their specific content, however, the manner in which questions are asked, as well as the specific response categories provided, can greatly affect the results of a survey. Concepts should be clearly defined and questions unambiguously phrased. Question wording should be carefully examined for special sensitivity or bias. When dealing with sensitive subject matter, techniques should be used that minimize the discomfort or apprehension of respondents, or respondents and interviewers if the survey is interviewer administered. Ways should be devised to keep respondent mistakes and biases (e.g., memory of past events) to a minimum, and to measure those that cannot be eliminated. To accomplish these objectives, well-established cognitive research methods (e.g., paraphrasing and “think-aloud” interviews) and similar methods (e.g., behavioral coding of interviewer-respondent interactions) should be employed with persons similar to those to be surveyed to assess and improve all key questions along these various dimensions.

In self-administered surveys careful attention should be paid to the visual formatting of the questionnaire, whether that be the layout of a mail survey or a particular eye towards respondents completing a web survey on a mobile device. Effort should be taken to reduce respondent burden through a positive user experience in order to reduce measurement error and break offs.

In reviewing a hard copy version of the questionnaire, one that appears to have been written for faculty members given its reference to research[1], I see a questionnaire that consists of three distinct types of questions:

  • A partial ranking question (Question 1) that asks for the five most attractive aspects of the position at two points in time;
  • Five-point Likert rating scales, ranging from Strongly Agree to Strongly Disagree and including a “middle” category of “Neither agree or disagree;” and
  • Multiple sets of “maximum difference scales,” which ask respondents to examine multiple sets of employment or benefits attributes and to select the most important and least important within each set.

Some specific comments about each of these types of questions follow.

With respect to Question 1 (the partial ranking question), the format choice is not of major concern; this type of ranking is often used to determine respondents’ preferences. What is of concern is some mismatch and sloppiness in the question. First, the question references working for “UW,” but most of the employees answering this question do not work for the UW system as a whole but rather at a specific UW facility, so the wording is odd. Second, the question itself asks about what “interested” you most (in the first part of the question) and what is most “important” (in the second part), but the column headings use the term “attractive.” While not a critical inconsistency, it is a bit sloppy.

The 5-point Likert items have two sets of response options: strongly agree/strongly disagree (Questions 2-11, 15-20, 22-27) or very satisfied/very dissatisfied (Questions 14a through 14r). Of the 22 agree-disagree items, all but two are written in a positive frame; that is, the language indicates a positive point of view. This is not a best practice, and such an approach can lead to “straight-lining,” where individuals simply mark the items in a single column without carefully reading each item. More generally, the field of survey methodology recommends avoiding agree-disagree items because they often produce acquiescence bias, the tendency to agree with statements regardless of content, which leads to exaggerated estimates of endorsement for positively worded statements.
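To make the straight-lining concern concrete, a simple data-quality check can flag respondents who mark the identical answer for every item in an agree-disagree grid. The sketch below is illustrative only; the column names (q2, q3, q4) and responses are hypothetical, not the Mercer data.

```python
# Minimal sketch: flagging possible straight-lining in an agree-disagree grid.
# Hypothetical data: one row per respondent, items coded 1 (Strongly Disagree)
# through 5 (Strongly Agree). Column names are placeholders, not the Mercer items.
import pandas as pd

def flag_straightliners(df: pd.DataFrame, item_cols: list) -> pd.Series:
    """Return True for respondents who gave the identical answer to every item."""
    return df[item_cols].nunique(axis=1) == 1

# Made-up example responses for three respondents
responses = pd.DataFrame({
    "q2": [5, 4, 3],
    "q3": [5, 2, 3],
    "q4": [5, 4, 3],
})
print(flag_straightliners(responses, ["q2", "q3", "q4"]))
# Respondents 0 and 2 marked every item identically and would merit review.
```

A check like this identifies straight-lining only after the fact; it does not repair the underlying design problem of an all-positive frame.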

Although I am reviewing a hard copy questionnaire, I note that Question 14 has 18 sub-questions, all requiring the respondent to use the same five-point scale (as well as a “not applicable” option). Once again, if presented on a single screen, this would not follow best practice and leads to respondents not fully considering each item individually. In addition, it does not appear that these 18 items are rotated so as to avoid order effects. This, too, is in contrast to best practices, which recommend randomizing the order of long lists.
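Randomizing the presentation order of a long battery is straightforward in a web instrument. The sketch below is a minimal illustration, assuming each respondent can be assigned a reproducible random order; the item labels are placeholders rather than the actual Question 14 sub-items.

```python
# Minimal sketch: per-respondent randomization of an 18-item battery to
# mitigate order effects. Item labels are placeholders for 14a through 14r.
import random

ITEMS = [f"Item 14{chr(ord('a') + i)}" for i in range(18)]  # 14a ... 14r

def randomized_order(respondent_id: int) -> list:
    """Return the items in a random order that is reproducible for a given respondent."""
    rng = random.Random(respondent_id)  # seed on the respondent so the order is stable on reload
    order = ITEMS.copy()
    rng.shuffle(order)
    return order

print(randomized_order(respondent_id=42))
```

Seeding on a respondent identifier keeps each person’s order stable if the page is reloaded, while still varying the order across the sample.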

Finally, the survey includes two sets of maximum difference (maxdiff) scaling, an extension of the method of paired comparisons. In a typical maxdiff question, a respondent evaluates between four and six attributes of an entity/product/service at a time. Analysis of the data using a specific statistical technique, hierarchical Bayesian multinomial logit modeling, produces importance estimates for each attribute for each respondent.
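For readers unfamiliar with maxdiff data, the sketch below shows the structure of the responses and a simple best-minus-worst counting score. This is an approximation for illustration only; the hierarchical Bayesian multinomial logit analysis noted above is a more elaborate model, and the attributes and choices shown here are invented.

```python
# Illustrative sketch: simple best-minus-worst counting scores for maxdiff data.
# Maxdiff data are typically analyzed with hierarchical Bayesian multinomial
# logit, as noted above; counting scores are a common quick approximation.
# The data below are invented for illustration.
from collections import Counter

# Each record is one maxdiff set: the attributes shown, plus the respondent's
# "most important" and "least important" selections.
responses = [
    {"shown": ["Pay", "Sick leave", "Healthcare benefits", "Retirement savings"],
     "best": "Pay", "worst": "Sick leave"},
    {"shown": ["Pay", "Healthcare benefits", "Career advancement", "Retirement savings"],
     "best": "Healthcare benefits", "worst": "Career advancement"},
]

best = Counter(r["best"] for r in responses)
worst = Counter(r["worst"] for r in responses)
shown = Counter(item for r in responses for item in r["shown"])

# Score = (times chosen most important - times chosen least important) / times shown
scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item:22s} {score:+.2f}")
```

Even this simplified view makes the Question 12 concern visible: when pay and sick leave appear in the same sets, the resulting scores necessarily trade one off against the other.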

The redundant nature of maxdiff questionnaires is one of the drawbacks of the approach, since respondents often feel that they have just answered the question. In the present questionnaire, Question 12 consists of 11 sets and Question 13 consists of 20 sets, each requiring the identification of most important and least important.

What is odd and most disturbing about Question 12 is that the question states “some of these benefits or programs are not current benefits or programs at the university.” But the attributes listed in Question 12 are not all benefits or programs; they are attributes of the actual work environment or characteristics of employment. For example, the question includes attributes such as “Type/variety of work,” “Stable employment,” “Career advancement/professional development,” and “Pay.” These “attributes” are juxtaposed alongside benefits such as “Sick leave,” “Healthcare benefits,” and “Retirement savings plans.” In contrast, the attributes presented in Question 13 appear to be, for the most part, benefits.

I find the mixing of employment attributes and benefits attributes in Question 12 to be atypical of most maxdiff designs. It seems inappropriate to ask respondents to make tradeoff assessments between employment attributes such as pay and benefits attributes such as sick leave. The mix of items, which are attributes of two very different constructs, could result in a misleading set of empirical findings.

And placing two of these maxdiff questions next to each other, thereby forcing the respondent to answer 31 sets of these items consecutively, is not ideal with respect to overall questionnaire design or consideration of respondent fatigue.

Sample Design

It does not appear that a sample has been selected for participation; rather, a census of all benefits-eligible employees is being conducted. What methodology is being used to ensure diverse participation both across all UW system locations and throughout the ranks of faculty and staff? Although a census allows all members of the population to voice their opinions, it also means that resources to encourage participation must be spread across the entire population rather than focused on a specific scientific sample.

Final Notes

The survey included no request for demographic information, location, years working in the UW system, or position. Can we assume that this information will be imported from HR files, given the unique link sent to request participation? At a minimum, a few of these questions should have been asked to ensure that the data were collected from the person intended to be queried.

And why does the survey bear the UW system and UW-Madison logos, but not those of other universities? If a different methodology is involved for the Madison campus as compared to other campuses, how will this impact comparisons across campuses?


[1] It is unclear whether a different version of the questionnaire was sent to non-faculty staff members.