
Our Evolving Ecosystem: A Comprehensive Analysis of Survey Sourcing, Platforms, and Lengths

White Paper
By: Mark Menig, TrueSample, and Chuck Miller, Digital Marketing & Measurement (DM2)

Abstract

A primary research-on-research (RoR) project was conducted by TrueSample and DM2 to examine data quality and the potential for response differences due to (a) the type of device used to answer survey questions, (b) the length of the survey instrument, and (c) the source of the survey sample. Examining the combinations of these data collection ecosystem variables quantifies the potential for differing response quality and unintended bias. The questionnaire used for the RoR allowed respondents to be classified into a variety of attitudinal and behavioral segments. Analyzing whether these classifications deviated from the averages under different survey conditions provides insight into the potential effects of these variables on sample composition and response quality. Findings include recommended areas to monitor when one or more of these variables are present in surveys.

Background

US internet usage continues to evolve from a “proximity fixed” experience at a stationary desktop or laptop to a portable experience that allows internet connectivity nearly anywhere. As this experience evolves, consumers will continue to evolve as well, viewing internet access as “always on” and readily available to meet their information and communication needs – on increasingly diversified internet-enabled devices. For many subgroups of the US population, this mindset shift has already occurred.

Given this rapid shift, researchers must increasingly be prepared for respondents who choose to complete surveys on a variety of devices – including smartphones, tablets, and computers (laptop or desktop), to name a few. Current research (by Decipher, Kinesis et al.) suggests that the percentage of respondents accessing surveys via mobile devices is growing rapidly, currently at 20% – 33% or even more for some audiences. This has given rise to a situation where surveys become “accidentally mobile” as more and more people attempt completion on devices for which a survey isn’t optimized or well-suited. As such, failure to adapt certain surveys to the devices used to access them may create unintended data quality issues, which may be exacerbated among certain population subgroups. To complicate matters, mobile devices are increasingly taking a variety of forms, with screens generally ranging in size from 3 to 10+ inches, which creates vastly different “mobile” experiences. In addition, these devices are often actually used in a proximity-fixed fashion when taking surveys, creating further complexity.

The good news for researchers is that this new dynamic in survey-taking may broaden the exposure of survey research, engaging population subgroups that may never have participated in a computer-based experience. Survey portability not only engages new groups, but opens up a range of new methods for obtaining in-the-moment insight that extends the value of marketing research like never before. The research industry is demonstrating a spirit of innovation in simultaneously exploring and embracing mobile research. But two factors are critical to our long-term success: (1) quickly quantifying the impact of the new variables that have been added to our data collection equation, and (2) adapting our survey practices to mitigate issues that may arise as a result of our changing research ecosystem.

Methodology

The objectives of this research were to examine sample compositions for potential bias under a variety of conditions found in typical survey data today. To best analyze this, a pre-planned quota design ensured equal representation and readable base sizes for our dimensions of study, while controlling for demographics. Multiple online sample providers were used across each of three types of sample sources:

  1. Traditional research panel
  2. Virtual or managed panel (i.e., website relationships for “research and more” where respondent demos or characteristics are known)
  3. Affiliate or river sampling (i.e., real-time engagement without a pre-existing relationship)

Quotas were also established by three device types being studied:

  1. Mobile phone
  2. Tablet
  3. Computer

The question of “who uses which devices” has been previously researched – and as such, we held demos constant to best answer the question, “What roles do sample source, device type, and survey length play in survey data outcomes?” Understanding how these ecosystem variables might impact resulting survey data was the primary objective.

Respondents were allowed to complete the survey using their device of choice, with demographics controlled through the quota design. As a result, some quotas naturally filled more quickly than others (Age 50+ mobile phone respondents were the slowest to fill). All quotas were filled steadily and meticulously over an 8-week timeframe to help eliminate the potential for any anomalies.

Combinations of all of these elements were controlled using the following quota design:

[Figure: Quota design]

As respondents entered the survey instrument, 1 of 3 survey treatments was assigned at random. The instrument was constructed in modules, which allowed us to create surveys of varying lengths:

  1. Short Survey (~8 min) = Warm-up, Concept, Demos.
  2. Medium Survey (~13 min) = Warm-up, Concept, Habits, Demos.
  3. Long Survey (~18 min) = Warm-up, Concept, Technology deep dive, Habits, Demos.
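
For illustration only, the sketch below shows one way such modular construction and random treatment assignment could be implemented. The module names, function, and output are hypothetical and are not drawn from the study's actual fielding software.

```python
import random

# Survey modules from the design above; module contents are abbreviated.
TREATMENTS = {
    "short":  ["warm_up", "concept", "demos"],                                    # ~8 min
    "medium": ["warm_up", "concept", "habits", "demos"],                          # ~13 min
    "long":   ["warm_up", "concept", "technology_deep_dive", "habits", "demos"],  # ~18 min
}

def assign_treatment(respondent_id: str) -> list[str]:
    """Randomly assign one of the three treatments and return its module sequence.

    In the actual study, quotas were applied equally across treatments, so a
    production implementation would also balance assignment against quota fill.
    """
    treatment = random.choice(list(TREATMENTS))
    print(f"Respondent {respondent_id}: assigned the '{treatment}' survey")
    return TREATMENTS[treatment]

modules = assign_treatment("R-0001")
```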

By equally applying the quota design across the three survey treatments, our ending sample for each was:

  • n=600 in Total
  • n=200 for each of the (3) device types
  • n=200 for each of the (3) sample sources
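
For illustration, the sketch below enumerates the source-by-device quota cells for a single treatment. It assumes a balanced full cross (roughly 600 / 9 ≈ 67 per cell); the study's actual cell targets were specified in the quota design shown earlier, and the labels here are shorthand.

```python
from itertools import product

SOURCES = ["Traditional panel", "Virtual/managed panel", "Affiliate/river"]
DEVICES = ["Mobile phone", "Tablet", "Computer"]
TOTAL_PER_TREATMENT = 600  # n=600 per survey-length treatment

# Assumption for this sketch: an even allocation across the 3 x 3 cross.
target_per_cell = TOTAL_PER_TREATMENT // (len(SOURCES) * len(DEVICES))

quota_targets = {cell: target_per_cell for cell in product(SOURCES, DEVICES)}

for (source, device), n in quota_targets.items():
    print(f"{source:22} x {device:12}: target n = {n}")
```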

This design allowed us to focus on our key areas of examination, without introducing differences due to demographics or variations in sample recruitment.

Areas of Initial Focus

Our design allows for significant subsequent analysis of the effects of sample sourcing, device types, and survey lengths on survey data outcomes. The first areas of focus we sought to understand were:

  1. Which types of respondent attitudinal and behavioral segments might be more or less prevalent given the different combinations of our examined variables?
  2. How do these segments behave under different survey conditions?
  3. What are the unintended data outcomes (e.g., data quality issues) that may result through various combinations of our studied variables?

As part of the survey design, many standard psychographic and behavioral measures were included to test for the possibility of trait bias as well as method bias. Combinations of survey questions were used to create classifications for study. The initial classifications included:

  • Influentials (from stated activities and connections, online and offline)
  • Social Media usage (frequency and type of use, and interaction behaviors)
  • Shopping Affinity (frequency online and offline, and general attitudes)
  • Heavy Online users (hours per week)
  • Technology usage – three classifications (number and frequency of behaviors)
  • Mobile Enthusiasts (number and frequency of behaviors)

Creation of these classifications was part art and part science. Generally speaking, the distributions of the indices created for these items were evaluated for natural breaks that bifurcated respondents into two groups, or in some cases a break was established along the top quintile or quartile (or thereabouts). Stringent definitions for the classifications were not needed, as the main objective was to evaluate the behavior of the defined classes AND assess the proportions of the classes that appeared (and how they changed) across the ecosystem variables being studied.
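
To make this concrete, here is a minimal sketch of the quantile-style cutoffs described above. The data, column names, and exact cut points are hypothetical and for illustration only.

```python
import pandas as pd

# Hypothetical respondent-level indices; names and values are illustrative.
df = pd.DataFrame({
    "online_hours_per_week": [2, 5, 8, 12, 20, 35, 40, 55],
    "mobile_behavior_index": [1, 3, 4, 4, 6, 8, 9, 10],
})

# "Heavy Online users": break along roughly the top quartile of weekly hours.
hours_cutoff = df["online_hours_per_week"].quantile(0.75)
df["heavy_online_user"] = df["online_hours_per_week"] >= hours_cutoff

# "Mobile Enthusiasts": break along roughly the top quintile of a behavior index.
mobile_cutoff = df["mobile_behavior_index"].quantile(0.80)
df["mobile_enthusiast"] = df["mobile_behavior_index"] >= mobile_cutoff

print(df[["heavy_online_user", "mobile_enthusiast"]].mean())  # share classified
```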

Data Quality Definitions

The components of data quality are multidimensional. Survey data quality is generally a function of comprehension and engagement: the ability to comprehend the information in the questionnaire and the level of effort applied to providing valid and consistent responses. Other factors, such as the environment in which the questionnaire is administered, factor in as well. With this in mind, our survey instrument was constructed using a variety of question types, rating scales, and subject matter designed to assess consistency and engagement during the interviewing process. Among the items we measured and evaluated to determine the quality of responses were:

  • Item completion
  • Time to complete the survey
  • Time spent on various survey items (e.g., evaluating a concept)
  • Answer consistency (within a grid, and across the entire survey)
  • Evidence of straight-lining in grids
  • Rating consistency on “opposite” attributes (e.g., “I always buy brands” and “I always buy generics”)
  • Propensity to state many low incidence items
  • Open-ended response quality, including length and contextual relevance

From examination of these items a composite Quality Score was computed for each respondent. This score generally summed many binary (1,0) “poor data” flags, although some art was applied to categorize a few items on a continuum for quality (e.g., open-ends).
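
As an illustration of how such a composite might be assembled, the sketch below sums hypothetical binary flags and a small open-end continuum for each respondent. The flag names, thresholds, and data are assumptions for this sketch, not the study's actual rules.

```python
import pandas as pd

def quality_flags(row: pd.Series) -> int:
    """Sum 'poor data' flags for one respondent (illustrative rules only)."""
    flags = 0
    flags += int(row["missing_items"] > 2)         # item non-completion
    flags += int(row["total_minutes"] < 4)         # implausibly fast completion
    flags += int(row["concept_seconds"] < 5)       # sped past the concept screen
    flags += int(row["straightlined_grids"] > 0)   # straight-lining in any grid
    flags += int(row["opposite_items_agree"])      # agreed with contradictory items
    flags += int(row["low_incidence_claims"] > 3)  # claimed many rare behaviors
    # Open-ends scored on a small continuum rather than a single binary flag.
    flags += {"junk": 2, "very_short": 1, "ok": 0}[row["open_end_quality"]]
    return flags

# Hypothetical per-respondent checks; thresholds above are examples only.
checks = pd.DataFrame([
    {"missing_items": 0, "total_minutes": 9, "concept_seconds": 22,
     "straightlined_grids": 0, "opposite_items_agree": False,
     "low_incidence_claims": 1, "open_end_quality": "ok"},
    {"missing_items": 4, "total_minutes": 3, "concept_seconds": 2,
     "straightlined_grids": 2, "opposite_items_agree": True,
     "low_incidence_claims": 6, "open_end_quality": "junk"},
])
checks["quality_score"] = checks.apply(quality_flags, axis=1)
print(checks["quality_score"])  # higher = more poor-data flags
```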

Questionnaire administration: Other Details

  • Multiple grids were used, with the greatest number appearing in the Long Survey treatment to assess fatigue. Item response scales varied among 3, 5 and 7 points. Wider scales, such as 10 or 11 points, were not used in this study but may be employed in subsequent RoR.
  • A few commonly known, factual questions were also included in the survey instrument, although correct/incorrect answers were not used as part of the Quality Score. For these questions, subsets answering Correctly and Incorrectly were examined independently, and maybe not surprisingly, those answering correctly provided higher quality data across all of the survey treatments. Use of these types of questions to identify potentially lower quality data needs deeper examination, as removal of respondents based on their “book knowledge” could create more biases under certain circumstances.

Concept Stimulus

[Figure: Unbranded Google Glass concept stimulus]

A common element across all three survey treatments was the presentation of an unbranded Google Glass concept. This recently developed high-tech product would be familiar to some respondents and unfamiliar to others. In addition, discussion of the concept allowed us to investigate the role of technology and its potential intrusion into our lives – subjects that elicited a variety of open-end responses, including some that were very passionate.

Several types of survey items were used to assess respondent attitudes and behavioral intent towards the concept. These included purchase intent (with and without price), as well as attribute ratings, brand and product awareness, and a why/why not buy open-end. These items, as well as time spent viewing the concept, were compared across segment classifications and the ecosystem variables under examination.

An additional battery of questions was posed under the assumption that people could – in the near future – embed computer chips into various parts of the human body (e.g., surgically implanting mobile phone technology). This provocative proposition was anticipated to yield the most intense reaction – as it would be difficult to be indifferent to such a prospect. These items were added to investigate more extreme emotional reactions and, perhaps, to produce more distinctive differences among our dimensions of study.

Results

The survey itself provided many interesting attitudinal and behavioral measures for comparison. The analysis focused on the examination of:

  1. Individual item responses and response patterns across items
  2. Changes in composite Quality Scores for the segment classifications across the ecosystem variables

As might be expected, time spent viewing the original Google Glass concept was inversely proportional to screen size. Multiple factors may have contributed to this, including:

  • Due to the smaller screen, respondents may have spent more time reading and concentrating on the smaller print
  • Some respondents may have spent time with display settings, manipulating the concept on their smaller screens to make it more legible

Regardless, time spent reading the concept was correlated with response quality across the entire survey. A short amount of time spent on the concept screen was a strong indicator of lower quality overall. Interestingly, it was hypothesized that an unusually long time spent on the concept might indicate inattentiveness and subsequent poor quality, but this was not the case.

[Figure: Concept viewing time vs. overall response quality]

Knowing that some respondents would be familiar with Google Glass, we originally thought that among the Higher Quality respondents those familiar might view the concept for less time than those unfamiliar. That turned out to not be the case; prior awareness of the product did not impact time spent viewing the concept, possibly due to the viewing factors noted above.

[Figure: Concept viewing time by prior product awareness]

Among the other more interesting results:

Reactions to the surgical-implantation concept, unlike the Google Glass concept, required no special graphical representation.

Q26. Some futurists believe that an evolution of computing devices might include surgical implantation of small chips in the human body – in places such as your hand, wrist, eye or ear. One such application might be embedding a mobile phone chip in your body to simplify calling.

Generally speaking, and assuming such chips were federally approved, available and affordable, which of these statements best describes you? (“Buy” also assumes surgical installation)

  • I would be among the first to buy this product (7%)
  • I would buy this product after it has been on the market a while (11%)
  • I might buy this product if I decided it could benefit me (22%)
  • I doubt I would buy this product even after many were using it (22%)
  • I would not buy this product under any circumstance (38%)

The question above was basically the lead-in to the provocative open end:

Q27. How do you feel about a future where people might choose to implant computer chips in their bodies? (Please be as specific as possible)

As anticipated, this question generated an interesting and useful range of written text responses, allowing investigators to clearly define poor and high quality open-end responses. A breakout yielded the following:

  • 8% of people gave a “junk” response (invalid for the question): N/A, Yes, No, None, or gibberish
  • 8% of people gave a valid but very short response of six or fewer characters, such as: OK, Sad, Wary, Fine, 666, YOLO (“You Only Live Once”) and “1984” (referring to the novel)

Together, the above 16% of respondents were designated as giving a Low Quality response on the respective, surgical-implant open-end in this section of the questionnaire.
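
A minimal sketch of this Low/High breakout appears below; the junk list and the no-vowel gibberish check are rough stand-ins for the analysts' judgment, not the actual coding rules.

```python
import re

JUNK_RESPONSES = {"n/a", "na", "yes", "no", "none"}

def openend_quality(text: str) -> str:
    """Classify a surgical-implant open-end as 'low' or 'high' quality."""
    cleaned = text.strip().lower()
    if not cleaned or cleaned in JUNK_RESPONSES:
        return "low"   # junk / invalid for the question
    if not re.search(r"[aeiouy]", cleaned):
        return "low"   # crude gibberish heuristic (no vowels, e.g. "666")
    if len(cleaned) <= 6:
        return "low"   # valid but very short ("OK", "Wary", "YOLO", ...)
    return "high"

print(openend_quality("N/A"))                                           # low
print(openend_quality("YOLO"))                                          # low
print(openend_quality("It feels invasive, but could help medically."))  # high
```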

Researchers considered the rest of the open-end comments to be High Quality responses. Some natural groupings occurred:

[Figure: Natural groupings of High Quality open-end responses]

These distinctions of Low and High quality were useful in the deeper examinations across other variables.

Use of the Quality Score

The Quality Score was derived from the individual item quality measures (listed above) and other patterns of response (e.g., time spent on key items and in total). Here are some of the significant outcomes from using the Quality Score to group respondents:

  • Overall, 23% of the sample was classified in the “lowest quality” quartile
  • Males scored significantly worse than Females
  • In general, Quality Scores improve with Age. Notably, Ages 18-34 had the worst Quality Score, nearly twice as poor as that for Ages 50+
  • Quality Scores improved with time spent on the interview:

[Figure: Quality Scores by time spent on the interview]

  • Asian, Native American, Hawaiian and Aleutian ethnicities indexed worse on their Quality Scores than did others.
  • Individuals with household incomes around the US median appear to be the most conscientious survey takers.
  • Interestingly, respondents answering “Prefer not to answer” on any demographic question indexed much lower on Quality Score than average
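
The bullets above describe groups “indexing” better or worse on the Quality Score. One common way to compute such an index (not necessarily the authors' exact formula) is the group mean relative to the overall mean, scaled to 100, as sketched below with hypothetical data.

```python
import pandas as pd

# Hypothetical respondent-level Quality Scores (higher = more poor-data flags).
df = pd.DataFrame({
    "age_group":     ["18-34", "18-34", "35-49", "35-49", "50+", "50+"],
    "quality_score": [5, 4, 3, 2, 1, 2],
})

overall_mean = df["quality_score"].mean()

# Index of 100 = average; above 100 = more poor-data flags than average.
index_by_group = (df.groupby("age_group")["quality_score"].mean()
                  / overall_mean * 100).round(0)
print(index_by_group)
```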

The chart below summarizes the Quality Score indices across key demographics.

Quality Scores by Demo Group

[Figure: Quality Score indices by demographic group]

Interestingly, those with non-college “Studies after High School” provided the best data.

[Figure: Quality Scores by device type and survey length]

When examining quality by device type and survey length, those completing interviews on tablets appeared to be most engaged and provided the highest quality data. Those completing on PCs/computers provided the worst data on surveys of shorter lengths, potentially indicating this mode made it easiest for respondents to inattentively “fly” through the survey.

The best data outcome across all combinations of study resulted from: the Short survey, on a Tablet, among Females, with a Shopping Affinity.

Lower Quality Percentages by Key Dimensions

[Figures: Lower quality percentages by key dimensions]

Survey Completion Distributions by Treatment and Device Type

[Figure: Survey completion distributions by treatment and device type]

Reflections

This study has clearly demonstrated that certain subgroups are more predisposed to providing data of poor quality. While troubling (but not surprising), this confirms that our collective intent to examine practices in the pursuit of continuous improvement is well-founded. That said, with the continual diversification of internet-enabled devices and on-the-go survey taking, it may be time for us to step up our survey game:

  • Other research has shown it is reasonable to assume a large percentage of people will take your survey on a device that you didn’t necessarily intend/anticipate
  • Ongoing work to explore methods of dividing or “chunking” surveys into more manageable pieces should be accelerated with the goal of enhancing respondent experience and preserving respondent and response quality
  • When longer surveys are unavoidable, the lower quality of responses among smaller-screen users might be improved by providing voice-over narrative of longer text, or by using streaming video format
  • The finding that faster concept viewing and quicker survey completion are highly correlated with lower quality data may be mitigated to a large extent by including intermittent (and even entertaining) graphics and narrative

In short, our survey designs are not changing as quickly as the world around us.

Other, more well-known issues in survey data quality were confirmed here. These include problems that researchers have continued to struggle with for many years:

  • Respondents who choose “Prefer not to answer” on demographics are much more likely to demonstrate lower quality – they are not trying hard to provide thoughtful responses
  • Some subgroups appear predisposed to give poor data despite being given a shorter survey
  • Lack of attention, comprehension or analytical thinking, as well as clumsiness and distractions, are among the most difficult quality problems to solve, but they must be quantified or we risk generating inaccurate insights

At some point, simply dropping respondents from the sample becomes absolutely necessary, which requires either (a) interviewing more sample (often out-of-pocket for the researcher) or (b) diminishing the value of the study to the client, who ends up with a smaller than expected sample of acceptable interviews.

However, there is some hope in the capabilities of real-time survey monitoring by the survey software itself. For example, when a respondent straight-lines a battery of questions or fails a quality “trap,” the online survey program could, in principle, politely reject these responses and encourage revision. Instead of an abrupt bold red-font error warning, a pleasant voice-over narrative might ask a respondent to be more vigilant (as if “we’re watching you”). In short, stepping up our survey game.

Why do “Tech Enthusiasts” provide the worst data of all groups studied? Maybe because our plain, static-page surveys are immensely boring compared to the other technology with which they engage. The internet has become an incredibly rich environment filled with deeply engaging content; in comparison, online survey research for the most part remains antiquated, having yet to harness the unique capabilities of internet-enabled consumer devices.
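
As a sketch of the kind of in-flight check described above (assuming the survey platform exposes a hook for custom validation; the functions and message wording here are hypothetical):

```python
def straightlined(grid_answers: list[int], min_items: int = 5) -> bool:
    """Return True if every item in a grid battery received the same rating."""
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1

def review_grid(grid_answers: list[int]) -> str:
    """Politely ask for revision instead of throwing a hard error."""
    if straightlined(grid_answers):
        return ("It looks like you gave the same rating to every statement. "
                "If that truly reflects your views, please continue; otherwise, "
                "take another look before moving on.")
    return ""  # no prompt needed

print(review_grid([4, 4, 4, 4, 4, 4]))  # triggers the gentle prompt
print(review_grid([4, 2, 5, 3, 4, 1]))  # passes silently
```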

Considerations for Future Work

The study had four key items to be addressed in future work. These include:

  1. Use of 10- or 11-point scales, individually and in grids, will mirror surveys in use while providing clearer distinctions in the level of cognitive effort requested of respondents
  2. Collection of more time stamps on individual items or user-specified sections of the questionnaire will provide greater depth of understanding around survey completion
  3. Additional questions and other challenging items in the questionnaire will better differentiate the “Long” survey treatment from the others
  4. Assessment of more sample sources, along with increased granularity of device types, will enrich our understanding

Conclusion

We live in an era in which respondent participation is in decline, and other data sources (e.g., customer databases, web behavioral data) are sometimes used without any perceived need for survey data. These substitutes have emerged along with an increasing number of specialists who, despite their strong computing skills, lack grounding in marketing, sampling, and statistics. Consequently, traditional marketing research budgets have been in decline for years, while more money is poured into data science, pattern tracking, and big data management.

Contrary to some mistaken assumptions, primary research surveys remain the only way to produce certain insights – such as meaningful measures of the attitudes, beliefs, and emotions that drive brands and purchase behaviors. Such measures are clearly needed both now and in the foreseeable future, and as such, we should continue to refine the craft of survey research. Given that surveys must remain an integral part of the insights toolbox, we must vigilantly measure, analyze, and improve survey data quality to best meet client needs and better understand our changing ecosystem.



