Give people beer. Get people to rate the beer and say which beer they think it is. Collect data. Look at data. Drink a beer (optional).
We created a “beer tasting book” so that people knew this was a serious endeavor. It included notes about aroma and color, even though the beers all look and smell pretty much the same; if the tasting had involved something other than fizzy yellow beers, this would have been more interesting. The book also gave people a place to take notes as they tasted the beers. Finally, after tasting all the beers (or as many as they felt like), each person received a data collection card, so that we knew how they rated each beer, along with their guesses as to which beer was which. If they were interested, we graded their guesses.
The number of participants was 37, but individual non-responses for each beer type varied between 7 and 11. That is, usable data points per beer type varied between 26 and 30.
The aim is to test the ratings of cheap beers. Beers you can get in cans. Beers you can bring to festivals. Beers that are available in SoCal. The beers are as follows:
B. The Tasting and Data Collection
1. Bar Setup
This experiment was run at a freestanding bar with space behind/under the bar to hide the beers. We set one can of each on top of the bar so that people knew which beers were being tasted. We then numbered the beers in the order listed above. The beers were set out, three cans at a time, behind the bar so that people could not see them. Four copies of the number-to-beer key were also hidden behind the bar for the bartenders to refer to.
When a new taster walked up, they were handed a book that contained instructions. This book also contained the place where they could take notes, rate each beer and guess what beer it was.
When each person was ready for their first or a subsequent beer, they handed their empty cup and booklet to a bartender. The bartender would find a number the taster had not yet tried, cross off that number, and pour 2–3 oz of that beer into the cup. The taster then received their cup and book back and was told which number they were now tasting.
To avoid any bias from the ordering of the beers, an attempt was made to randomize the order in which people tasted them. Each bartender was helping roughly six people at a time, so it was possible to give each taster a unique order. Some tasted 1–6 in order, some 6–1 in reverse. Other orders were used too: even numbers in order followed by odd numbers in order, for example. Some simply received a random order, and a few people requested their own.
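The order-assignment schemes above can be sketched in Python. The rotation through schemes by taster index is an assumption for illustration; the bartenders assigned orders informally rather than by any fixed rule:

```python
import random

BEERS = [1, 2, 3, 4, 5, 6]

def tasting_order(taster_index, rng=random):
    """Pick one of the tasting-order schemes described above.

    Scheme 0: 1-6 in order; scheme 1: 6-1 in reverse;
    scheme 2: evens in order then odds in order; scheme 3: random.
    """
    scheme = taster_index % 4
    if scheme == 0:
        return list(BEERS)
    if scheme == 1:
        return list(reversed(BEERS))
    if scheme == 2:
        return [b for b in BEERS if b % 2 == 0] + [b for b in BEERS if b % 2 == 1]
    shuffled = list(BEERS)
    rng.shuffle(shuffled)
    return shuffled

# Every scheme is a permutation of all six beers, so each taster
# still samples each beer exactly once, just in a different order.
orders = [tasting_order(i) for i in range(12)]
```

Whatever the scheme, each result is a full permutation, so randomizing the order never changes which beers a taster ends up rating.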
Three cans of each beer were pre-stocked from a cooler behind the bar. When a beer had its third can opened, two additional cans were added behind the last one. In this manner, it was possible to serve nearly all the beer cool, but not cold, and the temperature should have been roughly uniform across the beers.
The primary data of interest are a) the blind rating of each beer and b) the rating attached to the beer people thought they were drinking. For example, the mean rating across all pours that tasters guessed were Miller Lite. The mean and median of each of these ratings should provide some insight.
From these data, one can also examine how well participants performed at guessing, i.e., the percentage of correct guesses for each beer.
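A minimal sketch of those summary statistics, using the Python standard library. The card records and their layout here are hypothetical stand-ins for the real collection-card data:

```python
from statistics import mean, median

# Hypothetical toy records: (blind rating, guessed beer, actual beer).
# Real records would come from the data collection cards.
cards = [
    (3, "PBR", "PBR"),
    (4, "Coors Light", "PBR"),
    (2, "Bud Light", "Bud Light"),
    (3, "Tecate Light", "Coors Light"),
    (2, "Bud Light", "Tecate Light"),
]

def summarize(records, beer):
    """Blind mean/median rating of `beer`, mean rating of whatever
    people guessed was `beer`, and the fraction who guessed it right."""
    blind = [r for r, _, actual in records if actual == beer]
    perceived = [r for r, guess, _ in records if guess == beer]
    correct = [guess == actual for _, guess, actual in records if actual == beer]
    return {
        "blind_mean": mean(blind),
        "blind_median": median(blind),
        "perceived_mean": mean(perceived) if perceived else None,
        "pct_correct": sum(correct) / len(correct),
    }

stats = summarize(cards, "PBR")
```

Note the two rating lists are selected on different fields: the blind rating keys on what the pour actually was, while the perceived rating keys on what the taster guessed it was.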
The next question is “Do the differences between the (blind) ratings of each beer matter?”. This will be answered in two ways. The first is through a matrix of paired t-tests, each establishing a 95% confidence interval (α = 0.05) for the difference in mean rating between one beer and another. In standard statistical form:
H0 : μ1 – μ2 = 0
Ha : μ1 – μ2 ≠ 0
where H0 is the null hypothesis, Ha is the alternative hypothesis, and μ1 and μ2 are the mean ratings of the two beers being compared.
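Under these hypotheses, the paired t statistic is just the mean within-taster difference divided by its standard error, t = mean(d) / (sd(d) / sqrt(n)). A sketch with the Python standard library and hypothetical ratings; a real analysis would use R's t.test with paired = TRUE (or scipy.stats.ttest_rel) to also obtain the p-value:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for two equal-length rating vectors from
    the same tasters. Only the statistic is computed here; the
    p-value comes from the t distribution with n - 1 df."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n))

# Hypothetical ratings of two beers by the same six tasters (paired):
beer_a = [3, 4, 3, 2, 4, 3]
beer_b = [2, 3, 3, 2, 3, 2]
t_stat = paired_t(beer_a, beer_b)
```

Pairing matters here: each difference is taken within one taster, which removes taster-to-taster generosity from the comparison.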
The downside of the t-test is that one must assume the target population is normally distributed. As an alternative, a similar matrix of paired Wilcoxon signed-rank tests will be performed. (The closely related Mann–Whitney test, also called the Wilcoxon rank-sum test, is the unpaired analog.) This is a non-parametric test that does not require normality: it tests for a location shift between the two distributions and can therefore also provide an estimate of the difference.
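The signed-rank statistic itself is easy to sketch: drop zero differences, rank the absolute differences (averaging ranks for ties), and sum the ranks on each sign. The ratings below are hypothetical; in practice R's wilcox.test with paired = TRUE (or scipy.stats.wilcoxon) supplies the p-value and the shift estimate:

```python
def wilcoxon_w(x, y):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired data.
    Zero differences are dropped; tied |d| receive average ranks."""
    d = [a - b for a, b in zip(x, y) if a != b]
    by_abs = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(d):
        j = i
        while j + 1 < len(d) and abs(d[by_abs[j + 1]]) == abs(d[by_abs[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average rank for the tie group
        for k in range(i, j + 1):
            ranks[by_abs[k]] = avg
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    w_minus = sum(r for r, diff in zip(ranks, d) if diff < 0)
    return min(w_plus, w_minus)

# Hypothetical paired ratings: four positive differences and one negative.
w = wilcoxon_w([3, 4, 2, 5, 4], [2, 3, 3, 2, 3])
```

Because it works on ranks of differences rather than the differences themselves, the statistic is unaffected by whether the underlying ratings are normal.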
Normality will be checked using the Shapiro–Wilk test as well as examination of quantile–quantile (Q–Q) plots. Histograms may also prove useful.
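For the Q–Q plot half of that check, the plotted points pair each sorted observation with the normal quantile at the same probability; roughly linear points are consistent with normality. A sketch assuming the common (i − 0.5)/n plotting positions (R's qqnorm uses a similar rule):

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """(theoretical, observed) quantile pairs for a normal Q-Q plot.

    Each sorted observation is matched with the quantile of a normal
    distribution fit to the sample, at probability (i - 0.5) / n.
    """
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Hypothetical 1-5 ratings for one beer:
points = qq_points([2, 3, 3, 4, 2, 3, 4, 3])
```

With ratings confined to a 1–5 scale, the observed quantiles form flat steps, which is exactly the kind of departure from the line the plot is meant to reveal.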
Statistical analyses were performed using R software and associated packages (www.r-project.org).
V. Results and Discussion
The relationships between actual rating and perceived rating appear to have generally held up, with little discrepancy between the two (Figure 2). The superlatives were the same, with PBR taking the best category at a rating of approximately 3.0 and Bud Light hitting the bottom at around 2.15 (Table 1). Of note were the discrepancies for Tecate Light and Coors Light: people rated Coors Light higher than they rated what they thought was Coors Light, and rated Tecate Light lower than what they thought was Tecate Light. In other words, Coors Light was better than people expected, and Tecate Light was worse.