The results are back from your online surveys. Now that you’ve collected your statistical survey results and have a data analysis plan, it’s time to dig in, start sorting, and analyze the data. Here’s how our Survey Research Scientists make sense of quantitative data (versus making sense of qualitative data), from looking at the answers and focusing on their top research questions and survey goals, to crunching the numbers and drawing conclusions.
Here are four steps aimed at showing you how to analyze data more effectively:
- Take a look at your top research questions.
- Cross-tabulate and filter your results.
- Crunch the numbers.
- Draw conclusions.
Take a look at your top research questions
First, let’s talk about how you analyze the results for your top research questions. Did you feature empirical research questions? Did you consider probability sampling? Remember that you should have outlined your top research questions when you set a goal for your survey.
For example, if you held an education conference and gave attendees a post-event feedback survey, one of your top research questions may look like this: How did the attendees rate the conference overall? Now take a look at the answers you collected for a specific survey question that speaks to that top research question:
Notice that in the responses, you’ve got some percentages (71%, 18%) and some raw numbers (852, 216).
The percentages are just that–the percent of people who gave a particular answer. Put another way, the percentages represent the number of people who gave each answer as a proportion of the number of people who answered the question. So, 71% of your survey respondents (852 of the 1,200 surveyed) plan on coming back next year.
This table also shows you that 18% say they are not planning to return and 11% say they are not sure.
The raw numbers are the number of individual survey respondents who gave each answer; they are simple counts and do not involve any sampling. So 852 people said, “Yes, I’m coming back next year!” If you assume that most of the people who said yes, and maybe some of those who said they were not sure, are coming next year, you can build a forecasting model to estimate the number of people who will attend next year’s conference. You can estimate this number with more confidence if you had a very high participation rate, meaning most of the people who attended the conference and received your survey filled it out.
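As a rough sketch of the arithmetic above, the percentages can be recomputed from the raw counts. The counts 852 and 216 and the total of 1,200 come from the text; the count of 132 for "not sure" is derived here (1,200 minus 852 minus 216), and the answer labels are assumed for illustration.

```python
# Raw counts per answer. 852 and 216 come from the text; 132 is derived
# (1200 - 852 - 216) and the answer labels are assumed for illustration.
counts = {"Yes": 852, "No": 216, "Not sure": 132}

total = sum(counts.values())  # number of people who answered the question

# Each percentage is the count as a share of everyone who answered.
percentages = {answer: round(100 * n / total) for answer, n in counts.items()}

for answer, pct in percentages.items():
    print(f"{answer}: {counts[answer]} responses ({pct}%)")
```

Dividing by the number of people who answered the question, rather than the number who received the survey, is what makes the percentages add up to 100.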
Cross-tabulating and filtering results
Recall that when you set a goal for your survey and developed your analysis plan, you thought about what subgroups you were going to analyze and compare. Now is when that planning pays off. For example, say you wanted to see how teachers, students, and administrators compared to one another in answering the question about next year’s conference. To figure this out, you want to delve into response rates by means of cross tabulation, where you show the results of the conference question by subgroup:
From this table you see that a large majority of the students (86%) and teachers (80%) plan to come back next year. However, the administrators who attended your conference look different, with under half (46%) of them intending to come back! Hopefully, some of your other questions will help you figure out why this is the case and what you can do to improve the conference for administrators so more of them will return year after year.
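A cross tabulation like the one described above can be sketched in a few lines. The responses below are hypothetical and far smaller than a real survey, so the percentages do not match the figures in the text; the function name `crosstab` is also just an illustrative choice.

```python
from collections import Counter

# Hypothetical responses as (attendee type, answer) pairs; a real survey
# would have one pair per respondent.
responses = [
    ("Student", "Yes"), ("Student", "Yes"), ("Student", "No"),
    ("Teacher", "Yes"), ("Teacher", "Yes"), ("Teacher", "Not sure"),
    ("Administrator", "Yes"), ("Administrator", "No"), ("Administrator", "No"),
]

def crosstab(pairs):
    """Return {subgroup: {answer: percent of that subgroup}}."""
    cell_counts = Counter(pairs)                        # count per (group, answer)
    group_totals = Counter(group for group, _ in pairs)  # respondents per group
    table = {}
    for (group, answer), n in cell_counts.items():
        table.setdefault(group, {})[answer] = round(100 * n / group_totals[group])
    return table

table = crosstab(responses)
for group, row in table.items():
    print(group, row)
```

Each percentage is computed within its own subgroup, which is what lets you compare students, teachers, and administrators even when the groups differ in size.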
Using a filter is another useful tool for modeling data. Filtering means narrowing your focus to one particular subgroup, and filtering out the others. So instead of comparing subgroups to one another, here we’re just looking at how one subgroup answered the question. For instance, you could limit your focus to just women, or just men, then re-run the crosstab by type of attendee to compare female administrators, female teachers, and female students. One thing to be wary of as you slice and dice your results: Every time you apply a filter or cross tab, your sample size decreases. To make sure your results are statistically significant, it may be helpful to use a sample size calculator.
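To illustrate how filtering narrows the data, here is a minimal sketch with hypothetical respondent records. The field names (`gender`, `role`, `returning`) and the helper `apply_filter` are assumptions made for this example.

```python
# Hypothetical respondent records; the field names are assumptions.
respondents = [
    {"gender": "Female", "role": "Teacher", "returning": "Yes"},
    {"gender": "Female", "role": "Student", "returning": "Yes"},
    {"gender": "Female", "role": "Administrator", "returning": "No"},
    {"gender": "Male", "role": "Teacher", "returning": "Yes"},
    {"gender": "Male", "role": "Student", "returning": "No"},
    {"gender": "Male", "role": "Administrator", "returning": "Not sure"},
]

def apply_filter(records, **criteria):
    """Keep only the records that match every field=value criterion."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]

women = apply_filter(respondents, gender="Female")
female_teachers = apply_filter(women, role="Teacher")

# Each additional filter leaves fewer respondents to analyze.
print(len(respondents), len(women), len(female_teachers))
```

The shrinking counts are the reason to check sample size after every filter: a subgroup of one or two people cannot support a reliable conclusion.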
Benchmarking, trending, and comparative data
Let’s say on your conference feedback survey, one key question is, “Overall how satisfied were you with the conference?” Your results show that 75% of the attendees were satisfied with the conference. That sounds pretty good. But wouldn’t you like to have some context? Something to compare it against? Is that better or worse than last year? How does it compare to other conferences?
Well, say you did ask this question in your conference feedback survey after last year’s conference. You’d be able to make a trend comparison. Professional pollsters make poor comedians, but one favorite line is “trend is your friend.”
If last year’s satisfaction rate was 60%, you increased satisfaction by 15 percentage points! What caused this increase in satisfaction? Hopefully the responses to other questions in your survey will provide some answers.
If you don’t have data from prior years’ conferences, make this the year you start collecting feedback after every conference. This is called benchmarking. You establish a benchmark or baseline number and, moving forward, you can see whether and how this has changed. You can benchmark not just attendees’ satisfaction, but other questions as well. You’ll be able to track, year after year, what attendees think of the conference. This is called longitudinal data analysis. Learn more about how SurveyMonkey Benchmarks can help give your survey results context.
What is longitudinal analysis?
Longitudinal data analysis (often called “trend analysis”) is basically tracking how findings for specific questions change over time. Once a benchmark is established, you can determine whether and how numbers shift. Suppose the satisfaction rate for your conference was 50% three years ago, 55% two years ago, 65% last year, and 75% this year. Congratulations are in order! Your longitudinal data analysis shows a solid, upward trend in satisfaction.
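The trend in this example can be checked with simple arithmetic. The sketch below uses the four satisfaction rates from the paragraph above; the year labels are relative placeholders, and treating "every year improved on the last" as the definition of an upward trend is a simplifying assumption.

```python
# Satisfaction rates from the text, oldest first (percent satisfied).
# The year labels are relative placeholders.
satisfaction = {
    "3 years ago": 50,
    "2 years ago": 55,
    "last year": 65,
    "this year": 75,
}

rates = list(satisfaction.values())

# Change in percentage points between consecutive years.
point_changes = [later - earlier for earlier, later in zip(rates, rates[1:])]

# Change relative to the baseline (benchmark) year, in percentage points.
change_since_baseline = rates[-1] - rates[0]

# A trend is "upward" here if every year improved on the one before.
steadily_rising = all(change > 0 for change in point_changes)

print(point_changes, change_since_baseline, steadily_rising)
```

Note that these differences are percentage points, not percent changes: moving from 65% to 75% is a gain of 10 points.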
You can even track data for different subgroups. Say for example that satisfaction rates are increasing year over year for students and teachers, but not for administrators. You might want to look at administrators’ responses to various questions to see if you can gain insight into why they are less satisfied than other attendees.
Crunching the numbers
You know how many people said they were coming back, but how do you know if your survey has yielded answers that you can trust and answers that you can use with confidence to inform future decisions? It’s important to pay attention to the quality of your data and to understand the components of statistical significance.
In everyday conversation, the word “significant” means important or meaningful. In survey analysis and statistics, significant means “an assessment of accuracy.” This is where the inevitable “plus or minus” comes into survey work. In particular, it means that survey results are accurate within a certain confidence level and not due to random chance. Drawing an inference based on results that are inaccurate (i.e., not statistically significant) is risky. The first factor to consider in any assessment of statistical significance is the representativeness of your sample—that is, to what extent the group of people who were included in your survey “look like” the total population of people about whom you want to draw conclusions.
You have a problem if 90% of conference attendees who completed the survey were men, but only 15% of all your conference attendees were male. The more you know about the population you are interested in studying, the more confident you can be when your survey lines up with those numbers. At least when it comes to gender, you’re feeling pretty good if men make up 15% of survey respondents in this example.
If your survey sample is a random selection from a known population, statistical significance can be calculated in a straightforward manner. A primary factor here is sample size. Suppose 50 of the 1,000 people who attended your conference replied to the survey. Fifty (50) is a small sample size and results in a broad margin of error. In short, your results won’t carry much weight.
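To see why 50 responses out of 1,000 attendees gives a broad margin of error, here is a minimal sketch using the standard formula for a proportion from a simple random sample. The 95% confidence level (z = 1.96), the conservative proportion of 0.5, and the finite population correction are assumptions of this example, not choices stated in the text.

```python
import math

def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes z = 1.96 (about 95% confidence) and the most conservative
    proportion of 0.5, with a finite population correction.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt(
        (population_size - sample_size) / (population_size - 1)
    )
    return z * standard_error * correction

small = margin_of_error(50, 1000)    # the example from the text
larger = margin_of_error(500, 1000)  # a hypothetical higher response

print(f"n=50:  +/- {100 * small:.1f} points")
print(f"n=500: +/- {100 * larger:.1f} points")
```

Under these assumptions the 50-person sample carries a margin of error of roughly 13 to 14 percentage points, so a reported 71% could plausibly sit anywhere from the high 50s to the mid 80s.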
Say you asked your survey respondents how many of the 10 available sessions they attended over the course of the conference. And your results look like this:
You might want to analyze the average. As you may recall, there are three different kinds of averages: mean, median and mode.
In the table above, the average number of sessions attended is 6.3. The average reported here is the mean, the kind of average that’s probably most familiar to you. To determine the mean you add up the data and divide that by the number of figures you added. In this example, you have 10 people saying they attended one session, 50 people for four sessions, 100 people for five sessions, etc. So, you multiply all of these pairs together, sum them up, and divide by the total number of people.
The median is another kind of average. The median is the middle value, the 50% mark. In the table above, we would locate the number of sessions where 500 people were to the left of the number and 500 to the right. The median is, in this case, 7 sessions. This can help you eliminate the influence of outliers, which may adversely affect your data.
The last kind of average is mode. The mode is the most frequent response. In this case the answer is six. 260 survey participants attended 6 sessions, more than attended any other number of sessions.
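The three averages can be computed directly from a frequency table. The table below is hypothetical and much smaller than the one discussed above (those figures are not reproduced here), so its results differ from the article's; it only illustrates the method.

```python
import statistics

# Hypothetical frequency table: sessions attended -> number of respondents.
# These are NOT the article's figures, which are not reproduced here.
frequency = {1: 1, 4: 2, 5: 3, 6: 4, 7: 2}

total_people = sum(frequency.values())

# Mean: multiply each value by its count, sum, divide by total respondents.
mean = sum(sessions * n for sessions, n in frequency.items()) / total_people

# Expand the table into one entry per respondent for median and mode.
expanded = [sessions for sessions, n in frequency.items() for _ in range(n)]

median = statistics.median(expanded)  # middle value of the sorted data
mode = statistics.mode(expanded)      # most frequent value

print(round(mean, 2), median, mode)
```

With an even number of respondents, `statistics.median` returns the midpoint of the two middle values, which is why the median need not be a whole number of sessions.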
Means–and other types of averages–can also be used if your results were based on Likert scales.
When it comes to reporting on survey results, think about the story the data tells.
Say your conference overall got mediocre ratings. You dig deeper to find out what’s going on. The data show that attendees gave very high ratings to almost all the aspects of your conference — the sessions and classes, the social events, and the hotel — but they really disliked the city chosen for the conference. (Maybe the conference was held in Chicago in January and it was too cold for anyone to go outside!) That is part of the story right there — great conference overall, lousy choice of locations. Miami or San Diego might be a better choice for a winter conference.
One aspect of data analysis and reporting you have to consider is causation vs. correlation.
What is the difference between correlation and causation?
Causation is when one factor causes another, while correlation is when two variables move together, but one does not necessarily influence or cause the other.
For example, drinking hot chocolate and wearing mittens are two variables that are correlated: they tend to go up and down together. However, one does not cause the other. In fact, they are both caused by a third factor, cold weather. Cold weather influences both hot chocolate consumption and the likelihood of wearing mittens. Cold weather is the independent variable and hot chocolate consumption and the likelihood of wearing mittens are the dependent variables. In the case of our conference feedback survey, cold weather likely influenced attendees’ dissatisfaction with the conference city and the conference overall. Finally, to further examine the relationship between variables in your survey you might need to perform a regression analysis.
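The hot chocolate and mittens example can be simulated to show how a third factor produces correlation without causation. Everything in this sketch is made up for illustration: the temperature range, the formulas linking temperature to each variable, and the noise levels are all assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated daily temperatures; colder days raise both quantities.
temperature = [rng.uniform(-10, 30) for _ in range(500)]
cocoa = [40 - t + rng.gauss(0, 5) for t in temperature]
mittens = [30 - t + rng.gauss(0, 5) for t in temperature]

raw = pearson(cocoa, mittens)

# Remove the temperature effect from each series, then correlate what is left.
# The true slope of -1 is known here only because the data are simulated.
cocoa_resid = [c + t for c, t in zip(cocoa, temperature)]
mittens_resid = [m + t for m, t in zip(mittens, temperature)]
adjusted = pearson(cocoa_resid, mittens_resid)

print(f"raw: {raw:.2f}, after removing temperature: {adjusted:.2f}")
```

The two series are strongly correlated in the raw data, but once the shared influence of temperature is removed the remaining correlation is close to zero, because neither variable was ever built to depend on the other.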
What is regression analysis?
Regression analysis is an advanced method of data visualization and analysis that allows you to look at the relationship between two or more variables. There are many types of regression analysis and the one(s) a survey scientist chooses will depend on the variables he or she is examining. What all types of regression analysis have in common is that they look at the influence of one or more independent variables on a dependent variable.
In analyzing our survey data we might be interested in knowing what factors most impact attendees’ satisfaction with the conference. Is it a matter of the number of sessions? The keynote speaker? The social events? The site? Using regression analysis, a survey scientist can determine whether and to what extent satisfaction with these different attributes of the conference contributes to overall satisfaction. This, in turn, provides insight into what aspects of the conference you might want to alter next time around.
Say, for example, you paid a high honorarium to get a top-flight keynote speaker for your opening session. Participants gave this speaker and the conference overall high marks. Based on these two facts you might think that having a fabulous (and expensive) keynote speaker is the key to conference success. Regression analysis can help you determine if this is indeed the case. You might find that the popularity of the keynote speaker was a major driver of satisfaction with the conference. If so, next year you’ll want to get a great keynote speaker again. But say the regression shows that, while everyone liked the speaker, this did not contribute much to attendees’ satisfaction with the conference. If that is the case, the big bucks spent on the speaker might be better spent elsewhere.
If you take the time to carefully analyze the soundness of your survey data, you’ll be on your way to using the answers to help you make informed decisions.
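To make the keynote speaker example concrete, here is a minimal sketch of an ordinary least squares regression with two predictors, written with the standard library only. The data are simulated, and overall satisfaction is deliberately constructed to depend weakly on the keynote rating and strongly on the sessions rating; the helper names `solve` and `least_squares` are illustrative, and a real analysis would normally use a statistics package.

```python
import random

def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - rest) / aug[r][r]
    return solution

def least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(1)
# Simulated 1-5 ratings for (keynote, sessions); overall satisfaction is
# built to depend weakly on the keynote and strongly on the sessions.
ratings = [(rng.randint(1, 5), rng.randint(1, 5)) for _ in range(200)]
overall = [1 + 0.1 * k + 0.7 * s + rng.gauss(0, 0.3) for k, s in ratings]

intercept, keynote_coef, sessions_coef = least_squares(ratings, overall)
print(f"keynote: {keynote_coef:.2f}, sessions: {sessions_coef:.2f}")
```

In this simulated case the sessions coefficient comes out much larger than the keynote coefficient, which is the pattern that would suggest the speaker budget is better spent elsewhere.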
3 quick tips to improve survey response rates
Here are some ideas to ensure that respondents will answer your surveys.
1. Be quick
If your survey is short and sweet, there's a greater chance that more respondents will complete it.
2. Offer incentives
Little incentives like a small discount or an entry into a drawing can help ensure respondents complete your survey.
3. Buy a targeted audience
With SurveyMonkey Audience, you can purchase access to an audience who meets specific demographic criteria for your survey. It's a great way to get targeted responses from a specific group.
I shall assume that the questionnaires were completed and submitted for analysis in paper form. Online questionnaires are discussed in section 4.1. Here is a summary of the key stages in the process of analysing the data with useful tips – more extensive discussion follows:
- Prepare a simple grid to collate the data provided in the questionnaires.
- Design a simple coding system – careful design of questions and the form that answers take can simplify this process considerably.
- It is relatively straightforward to code closed questions. For example, if answers are ranked according to a numerical scale, you will probably use the same scale as code.
- To evaluate open questions, review responses and try to categorise them into a sufficiently small set of broad categories, which may then be coded. (There is an example of this below.)
- Enter data on to the grid.
- Calculate the proportion of respondents answering for each category of each question.
- Many institutions calculate averages and standard deviations for ranked questions. Statistically, this is not necessarily a very sound approach (see the discussion on ‘evaluating data’ below).
- If your data allow you to explore relationships in the data – for example, between the perceived difficulties that students experience with the course and the degree programme to which they are attached – a simple Chi-squared test may be appropriate.
- For a review of this test and an example, see Munn and Drever (1999) and Burns (2000) – the page references are indexed.
- You may wish to pool responses to a number of related questions. In this case, answers must conform to a consistent numerical code, and it is often best simply to sum the scores over questions, rather than compute an average score.
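The Chi-squared test mentioned in the summary above can be sketched as follows. The counts are hypothetical (two degree programmes against whether students reported difficulty with the course), and the function computes the plain Pearson statistic without a continuity correction. The result is compared against 3.841, the standard 5% critical value for one degree of freedom.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows are two degree programmes, columns are
# "reported difficulty" and "no difficulty".
observed = [
    [30, 20],
    [15, 35],
]

statistic, dof = chi_squared(observed)

# 3.841 is the standard 5% critical value for 1 degree of freedom.
CRITICAL_5_PERCENT_DOF_1 = 3.841
print(round(statistic, 2), dof, statistic > CRITICAL_5_PERCENT_DOF_1)
```

A statistic above the critical value suggests that, in this made-up data, reported difficulty is related to degree programme. The test is usually considered unreliable when expected counts in any cell are very small.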
Preparing a grid
You will have a large number of paper questionnaires. To make it easier to interpret and store the responses, it is best to transfer data on to a single grid, which should comprise no more than two or three sheets depending on the number of questions and student respondents. A typical grid looks like this:
If the answers to a question are represented on the questionnaire as points on a scale from 1 to 5, usually you will enter these numbers directly into the grid. If the answers take a different form, you may wish to translate them into a numerical scale. For example, if students are asked to note their gender as male/female, you may ascribe a value of 1 to every male response and 0 to female responses – this will be helpful when it comes to computing summary statistics and necessary if you are interested in exploring correlations in the data. It will make it much easier to analyse the data if there is an entry for all questions. To do this, you will need to construct codes to describe ‘missing data’, ‘don’t know’ answers or answers that do not follow instructions – for example, if some respondents select more than one category.
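A coding scheme of this kind might be sketched as below. The codes 1 and 0 for gender follow the text; the special codes 8 (don't know) and 9 (missing or invalid), the function names and the sample rows are all assumptions made for illustration.

```python
# Hypothetical coding scheme. The codes 1 and 0 for gender follow the text;
# the special codes 8 and 9 are assumptions chosen to lie outside the scale.
GENDER_CODES = {"male": 1, "female": 0}
DONT_KNOW = 8
MISSING = 9  # blank answers and answers that do not follow instructions

def code_gender(answer):
    """Translate a raw gender answer into its numerical code."""
    if answer is None:
        return MISSING
    return GENDER_CODES.get(answer.strip().lower(), MISSING)

def code_scale(answer):
    """Code a 1-5 scale answer; anything else becomes a special code."""
    if answer == "don't know":
        return DONT_KNOW
    if isinstance(answer, int) and 1 <= answer <= 5:
        return answer
    return MISSING  # e.g. blank, or more than one category selected

# Each row of the grid is one questionnaire: (gender, Q1 rating).
raw_rows = [("Male", 4), ("female", "don't know"), (None, [2, 3]), ("Female", 5)]
grid = [(code_gender(g), code_scale(q)) for g, q in raw_rows]
print(grid)
```

Choosing special codes that fall outside the valid range makes them easy to spot, but they must be excluded before computing any summary statistic, otherwise an 8 or 9 would be counted as a rating.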
Coding open questions is not straightforward. You must first read through all of the comments made in response to the open questions and try to group them into meaningful categories. For example, if students are asked to ‘state what they least like about the course’, there are likely to be some very broad themes. A number may not find the subject matter interesting; others will have difficulties accessing reading material. It may be useful to have an ‘other’ category for those responses that you are unable to categorise meaningfully.
Often, it is sufficient and best simply to calculate the proportions of all respondents answering in each category. (An Excel spreadsheet is much quicker than using a calculator!) It is clear that having a category for all respondents who either don’t know or didn’t answer is very important, as it provides useful information on the strength of feeling over a particular question.
Questionnaire results are often used to compute mean scores for individual questions or groups of questions. For example, the questionnaire may ask students to rate their lecturer on a five-point scale, with 5 denoting excellent, 4 good, 3 average, 2 poor and 1 very poor. The mean score is then used as an index of the overall quality of a lecturer with high scores indicating good quality. This is not a particularly useful or legitimate approach as it assumes that you are working on an evenly spaced scale, so that, for example, ‘very poor’ is twice as bad as ‘poor’, and ‘excellent’ twice as good as ‘good’.
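A small hypothetical example shows what a mean score can hide. The two sets of ratings below are invented so that both lecturers have the same mean, even though students' opinions of them are very different; reporting the proportion in each category makes the difference visible.

```python
from collections import Counter

# Hypothetical ratings for two lecturers on the 1-5 scale from the text.
lecturer_a = [3] * 10            # everyone says "average"
lecturer_b = [1] * 5 + [5] * 5   # half "very poor", half "excellent"

def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)

def proportions(scores):
    """Share of respondents choosing each point on the 1-5 scale."""
    counts = Counter(scores)
    return {point: counts[point] / len(scores) for point in range(1, 6)}

print(mean(lecturer_a), mean(lecturer_b))  # identical means
print(proportions(lecturer_a))
print(proportions(lecturer_b))
```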
Often analysts add up scores over a number of related questions. For example, you may ask students ten questions related to a lecturer’s skills, all ranked from 1 to 5 with 5 indicating a positive response, and add up the scores to derive some index of the overall ability of the lecturer. Again, except in carefully designed questionnaires, this approach is inappropriate. It assumes that each question is relevant and of equal importance. Comparing scores across different lecturers and modules, this assumption is unlikely to hold. If you are interested in summative indices of quality, it may be best simply to ask the students to rate the lecturer themselves on a ranked scale.