Test: AB test

When you want to compare two (or more) versions of your design or prototype, you can conduct an AB test. An AB test can compare, for example, different layouts, text placements, or colors, but also larger workflows or whole designs.

Things to do before conducting the AB test:

  • Define which variables you are testing in your design (the difference between A and B)
  • Identify which metrics you want to focus on: performance metrics (which ones?) or self-reported metrics (which ones?)
  • Consider whether you need a within-subject or between-subject design for your test (see Within- & Between Subject)
  • For proper statistical analysis, you need to have at least 15 participants per version


How to set up an AB test in Preely
In Preely you’ll set up the AB test as you set up a regular test, see ‘Create test - step by step’ and ‘Test: Usability test’. You will need to set up two tests, one for each version. How you run the tests depends on whether the test design is within- or between-subject.

Analysis
For the analysis, you should focus on the metrics you defined before conducting the test and compare them to each other.

When working with statistics in AB testing, we distinguish between two kinds:

  • Descriptive statistics summarize the dependent variable (the metric you measure) for the different conditions.
  • Inferential statistics tell us how likely it is that differences between our experimental groups are “real” and not just random fluctuations due to chance.

For most tests, descriptive statistics will be enough.

How to perform descriptive statistics
The most common way to describe the differences between experimental groups is by describing the mean scores on the dependent variable for each group of participants. This is a simple way of conveying the effects of the independent variable on the dependent variable.
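As a minimal sketch of this comparison, the Python snippet below summarizes one metric per version. The completion times are made-up numbers and the names `completion_times` and `describe` are hypothetical; in practice you would use the values collected in your own test.

```python
from statistics import mean, median, stdev

# Hypothetical task completion times in seconds (made-up numbers).
completion_times = {
    "A": [42.0, 38.5, 51.2, 47.3, 40.1],
    "B": [35.4, 33.0, 41.8, 36.9, 30.2],
}

def describe(values):
    """Return basic descriptive statistics for one version."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
    }

summary = {version: describe(values)
           for version, values in completion_times.items()}

for version, stats in summary.items():
    print(version, {name: round(value, 2) for name, value in stats.items()})
```

Reporting the standard deviation next to the mean shows how much participants varied within each version, which matters when you judge whether a difference in means is meaningful.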

Other tools that can be used include:

  • Bar or pie diagrams
  • Boxplots
  • Scatterplots



Want more advanced statistics?

How to perform inferential statistics
Even if the mean scores of the experimental groups differ, the difference can be due to chance. So the question is: Is the difference big enough that we can rule out chance and assume the independent variable had an effect? Inferential statistics gives us the probability of seeing a difference at least this large if the independent variable had no effect, that is, by chance alone. If that probability is small enough, we conclude that the difference was due to the experimental manipulation.

Two versions: If you are comparing two versions (two levels of the independent variable), you can use a Student’s t-test. It gives a value t, and we identify the probability (p) of finding a t-value at least this extreme for that particular set of data if there were no effect or difference. Various online tools, as well as Excel, can calculate both the t-value and the p-value.
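To make the calculation concrete, here is a minimal Python sketch using only the standard library. It assumes a between-subject design (independent groups) with roughly equal variances. A within-subject design would call for a paired t-test instead. The data are made up, the function names are hypothetical, and the p-value is approximated by numerically integrating the t distribution, so treat it as an illustration rather than a replacement for a statistics package.

```python
import math
from statistics import mean, variance

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df):
    """Two-sided p-value: integrate the density over [0, |t|] (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    steps = 2 * max(1000, math.ceil(upper * 100))  # even number of intervals
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| <= T <= |t|)
    return max(0.0, 1.0 - central_area)

def students_t_test(a, b):
    """Independent two-sample t-test with pooled variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df, two_sided_p(t, df)

# Hypothetical completion times in seconds for versions A and B.
version_a = [42.0, 38.5, 51.2, 47.3, 40.1]
version_b = [35.4, 33.0, 41.8, 36.9, 30.2]

t_value, dof, p_value = students_t_test(version_a, version_b)
print(f"t = {t_value:.3f}, df = {dof}, p = {p_value:.4f}")
```

Five participants per version is used here only to keep the example short.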

More than two versions: If you are comparing more than two versions (more than two levels of the independent variable), you can use an ANOVA. It gives a value F, and we identify the probability (p) of finding an F-value at least this large for that particular set of data if there were no effect or difference. Various online tools, as well as Excel, can calculate both the F-value and the p-value.
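A one-way ANOVA for a between-subject design can be sketched in the same spirit. The snippet below uses only the Python standard library, made-up data, and hypothetical function names. The p-value is approximated by numerically integrating the incomplete beta function and assumes at least 2 within-group degrees of freedom, so it is an illustration under those assumptions, not a definitive implementation.

```python
import math
from statistics import mean

def f_upper_tail(f, df1, df2):
    """Approximate P(F >= f) for an F(df1, df2) distribution; assumes df2 >= 2."""
    if f <= 0:
        return 1.0
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f)  # upper tail equals I_x(a, b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(u):
        return u ** (a - 1) * (1 - u) ** (b - 1)

    steps = 2000  # even number of intervals for Simpson's rule
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return min(1.0, total * h / 3 / math.exp(log_beta))

def one_way_anova(groups):
    """Return (F, df_between, df_within, p) for a list of independent groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((value - m) ** 2
                    for g, m in zip(groups, group_means) for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, f_upper_tail(f_stat, df_between, df_within)

# Hypothetical completion times in seconds for three versions.
versions = [
    [42.0, 38.5, 51.2, 47.3, 40.1],  # version A
    [35.4, 33.0, 41.8, 36.9, 30.2],  # version B
    [44.0, 39.7, 49.5, 45.2, 41.6],  # version C
]

f_stat, df_b, df_w, p_anova = one_way_anova(versions)
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w}), p = {p_anova:.4f}")
```

A significant F only says that at least one version differs from the others; it does not say which one.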

Draw conclusions
The smaller the p-value is, the more significant our result is and the more confident we can be that the independent variable really did cause the difference. The p-value becomes smaller as the difference between the means gets larger, as the variability of the observations within a condition (the standard deviation) gets smaller, and as the sample size of the experiment increases (more participants or more measurements per participant). A larger sample size gives our experiment greater statistical power to find significant differences.
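For the two-version, between-subject case, the role of these three factors can be read off the standard pooled-variance t statistic. The symbols are introduced here for illustration: the means, standard deviations s, and group sizes n of versions A and B.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}},
\qquad
s_p^2 = \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2}
```

A larger difference between the means increases the numerator, while smaller standard deviations and larger group sizes shrink the denominator. Each of these makes |t| larger and therefore the p-value smaller.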

Within your organization, you need to decide when a p-value counts as significant. Often the threshold is 0.05: if the p-value is less than 0.05, we conclude that the results are not due to chance but are an effect of the independent variable.
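Applying the agreed threshold is then a simple comparison. In the sketch below, the threshold name `ALPHA`, the metric names, and the p-values are all hypothetical and only illustrate the decision rule.

```python
ALPHA = 0.05  # assumed threshold; agree on this within your organization

def is_significant(p_value, alpha=ALPHA):
    """Return True when the p-value is below the chosen threshold."""
    return p_value < alpha

# Hypothetical p-values for two metrics (illustrative only).
p_values = {"completion_time": 0.026, "satisfaction_score": 0.31}

decisions = {metric: is_significant(p) for metric, p in p_values.items()}
print(decisions)
```

Deciding on the threshold before you look at the data keeps the conclusion from being adjusted to fit the result.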