RAA 5: Assessing the Components of Skill Necessary for Playing Video Games

Background:

This article dives into the study of the effects of playing video games. More specifically, it focuses on finding a definitive list of skills that video games require, and a method to evaluate the skills required by a particular game. Many studies (Dorval and Pepin 1986; Mulligan, Dobson, and McCracken 2005; Green and Bavelier 2004; etc.) have assessed the skills related to playing video games. However, these studies failed to specify whether the effects apply to video games in general or to specific games. The authors of this article chose a psychometric method over task analysis and conceptual analysis to develop a set of scales that could be used to evaluate games.

Methods:

Two sets of people participated in this research. The first group consisted of 6 students and the author himself. The second group consisted of 30 undergraduate students in a course on the psychology of video games, who participated by rating games they had played. The rating was done through a questionnaire designed by the first group, with 24 items and a 9-point scale from “Not Necessary” to “Very Necessary”. The games used in this research included Brothers Brawl, Heavy Rain, Tetris, Fallout (2 and 3), etc.

Main findings:

Six factors were identified: Perceptual-Motor Abilities, Cognitive-Verbal Abilities, Problem-Solving Abilities, Information Utilization Abilities, Persistence, and Human-Human Interaction. Statistical analysis was done on the data to test for significance. Games of different genres were compared on these six factors to show the skills each genre requires. Similarities were found between FPS and TPS games, between action and RPG games, and between puzzle and platform games.
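The article doesn't include its analysis code, but the six components read like the output of an exploratory factor analysis of the 24 item ratings. Here is a rough sketch of how I imagine that kind of analysis could look in Python; the ratings matrix is made up and the use of scikit-learn's FactorAnalysis is my own assumption, not the authors' actual procedure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical ratings matrix: one row per (rater, game) pair,
# one column per questionnaire item (24 items, rated 1-9).
rng = np.random.default_rng(0)
ratings = rng.integers(1, 10, size=(300, 24)).astype(float)

# Standardize items so loadings are comparable across items.
ratings -= ratings.mean(axis=0)
ratings /= ratings.std(axis=0)

# Extract six latent factors, matching the six skill components
# reported in the article.
fa = FactorAnalysis(n_components=6, random_state=0)
fa.fit(ratings)

# Each row of components_ is one factor; large absolute loadings
# indicate which items define that factor.
for i, loadings in enumerate(fa.components_):
    top_items = np.argsort(np.abs(loadings))[::-1][:4]
    print(f"Factor {i + 1}: top items {top_items.tolist()}")
```

Averaging each game's factor scores and then grouping by genre would, I assume, give the kind of genre comparison the article reports.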

My thoughts:

First of all, I think letting the participants be part of developing the questionnaire was troublesome for the reliability of this research. Having the author also be one of the participants is inappropriate in the same sense. Participants should be randomly selected instead of picking the people who are most convenient. If we look past this, the results are quite interesting. Not only can they be used to passively evaluate games, but they could also be extremely helpful in developing games. I would like to see research on the relationship between these skill requirements and the difficulty levels of games. Some games lose their fun at either super easy or super hard settings. Players should be able to progress, starting from easy and slowly climbing up to insane. What some games do instead is take out some aspects of the game in the easy setting, so fewer skills are required. This doesn't help the player make progress at all.

Dylan

RAA 4: Expected Usability Magnitude Estimation

I found this article here.

Background:

Usability concerns the efficiency, effectiveness, and satisfaction of a product. Of the three, satisfaction is typically measured through a post-task / post-session questionnaire, most likely with Likert-style ratings. According to expectancy disconfirmation theory (Oliver, 1977), satisfaction occurs when quality is above expectations, and conversely dissatisfaction occurs when quality is below expectations. Expectation was measured through a pre-task questionnaire, similarly to satisfaction, also with Likert-style ratings. This proposal of comparing expectation ratings and experience ratings was made by Albert and Dixon (2003). The authors, Rich and McGee, argue that this method needs improvement in its reliance on Likert scales, its confusion over task classification, and its requirement for many users, and thus came up with their own design, Usability Magnitude Estimation (UME). In the article they state that “UME is a subjective assessment method where participants assign usability values to targets using ratio-based number assignment”. Through this article, they wish to show a significant improvement over Albert and Dixon (2003).
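To make the ratio idea concrete, here is a small sketch of how I imagine expectation and experience UME values could be compared per task; the task names, numbers, and the log-ratio scoring are my own invention, not the authors' method.

```python
import math

# Hypothetical UME ratings per task: (expected, experienced).
# Higher numbers mean better usability on the participant's own scale.
tasks = {
    "create report": (100.0, 20.0),
    "filter results": (50.0, 60.0),
    "export data": (80.0, 75.0),
}

def disconfirmation(expected: float, experienced: float) -> float:
    """Log of experienced/expected: negative values mean the task
    fell short of expectations (a candidate usability problem)."""
    return math.log(experienced / expected)

# Rank tasks by how far experience fell below expectation.
ranked = sorted(tasks.items(), key=lambda kv: disconfirmation(*kv[1]))
for name, (exp, act) in ranked:
    print(f"{name}: expected {exp:g}, experienced {act:g}, "
          f"log-ratio {disconfirmation(exp, act):+.2f}")
```

Tasks with the most negative log-ratio are the ones where experience fell furthest below expectation, which is the kind of prioritization the authors are after.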

Methods:

They conducted a usability test of a prototype Business Intelligence application with six participants. The application helps analysts select the optimal project to invest in under given scenarios. Participants were asked to complete 10 tasks. UME was used to measure both the expectation and the experience for each task, and the data collected through the tests were statistically analyzed. Note that the participants were given instruction and practice before the test started.

Main findings:

They found that their method is easy to administer and analyze, uses the same underlying ratio scale for expected usability ratings and experience ratings (allowing valid comparisons), and provides a theory-based, empirical strategy for prioritizing usability issues.

My thoughts:

I generally agree with the findings. The change brought by measuring ratios of scores that the participants hand-picked themselves definitely helps distinguish the severity of a problem. I could imagine giving scores as low as 1 and as high as 1000 for designs that are innovative in many ways but suffer from significant design flaws; one example of this is Windows 8. However, I think the article I read last week shows that this method of Rich and McGee hasn't overthrown the method developed by Albert and Dixon at all. It showed that it was hard for participants to get it right, especially in the early stage of testing. This means the results from the early stage and the rest could be incomparable due to the participants' learning curve. The problem may be a lack of “instructions” and “practice” given in the study. However, having to give instructions and practice to the participants just so that they know how to use UME raises questions about its ease of administration. The article suggested two practice tasks for learning to use UME, bringing the number of tasks participants need to perform from 10 to 12, and among those 12, only 10 are even considered reliable and useful. Therefore, the conclusion the authors draw is hardly convincing. Instead of just preaching the benefits of UME, I would prefer to see UME go head to head against a Likert scale, like they did in the last article I wrote about.

Dylan

RAA 3: Comparison of Three One-Question Post-Task Usability Questionnaires

This article is available here.

Background:

This article investigates three methods for post-task questionnaires or ratings that are widely used in usability testing: Usability Magnitude Estimation (UME), the Subjective Mental Effort Question (SMEQ), and the Likert scale.

For UME, participants create their own scale of difficulty ratings. They can assign a task any rating greater than zero, so judgements are made based on the ratios of the ratings across tasks. For example, a task rated 100 is judged to be twice as difficult as one rated 50.

For SMEQ, participants are asked to either draw a line through a vertical scale or move a scroller along it; the scale has nine labels from “Not at all hard to do” to “Tremendously hard to do”.

And for the Likert scale, participants are simply asked to choose from a fixed number of rating options.

Methods:

This study investigates the correlation among the three methods through separate experiments. In experiment one, six users were asked to perform seven tasks on a web-based Supply Chain Management application. Prior to the test, participants received practice in making judgements with UME. They were also asked to complete two 7-point Likert scale ratings and a UME rating. A think-aloud protocol was used in this test. In experiment two, 26 participants were recruited to test a travel and expense application, and 5 tasks were assigned to them. Besides UME and the Likert scale, experiment two also included SMEQ as part of the test. The researchers used the data collected from these experiments to test for statistical significance.
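The article reports correlations between the question types. As a rough sketch (with made-up numbers, not the authors' data), per-task ratings from two methods can be paired up and correlated like this; the log transform on UME is my own assumption since UME is an open-ended ratio scale.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-task ratings from the same sessions:
# a 7-point Likert difficulty rating and an open-ended UME value.
likert = np.array([2, 3, 5, 6, 4, 7, 1, 3, 5, 6])
ume = np.array([20, 35, 120, 300, 80, 500, 10, 40, 150, 250], dtype=float)

# UME is an open-ended ratio scale, so log-transform it before a
# linear (Pearson) correlation; Spearman uses ranks and needs no transform.
r, p_r = pearsonr(likert, np.log(ume))
rho, p_rho = spearmanr(likert, ume)
print(f"Pearson r (Likert vs log UME) = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rho (Likert vs UME)  = {rho:.2f} (p = {p_rho:.3f})")
```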

Main findings:

Through the experiments they designed and the statistical analysis of the data collected from them, they reached the conclusion that with sample sizes above 10-12, any of the three question types can yield reliable results, but below 10 participants, none of the question types has a high detection rate.
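I imagine a detection rate like this could be estimated by resampling: repeatedly draw smaller subsamples from the full data and count how often a known difference between two tasks still comes out significant. The sketch below is my own illustration with fabricated ratings, not the authors' procedure.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)

# Fabricated paired ratings for two tasks from 30 participants,
# where task B is genuinely harder than task A.
task_a = rng.normal(3.0, 1.0, size=30)
task_b = rng.normal(4.5, 1.0, size=30)

def detection_rate(n: int, trials: int = 2000, alpha: float = 0.05) -> float:
    """Fraction of random n-participant subsamples in which a paired
    t-test detects the A-vs-B difference at the alpha level."""
    hits = 0
    for _ in range(trials):
        idx = rng.choice(len(task_a), size=n, replace=False)
        _, p = ttest_rel(task_a[idx], task_b[idx])
        hits += p < alpha
    return hits / trials

for n in (5, 8, 10, 12, 20):
    print(f"n = {n:2d}: detection rate ~ {detection_rate(n):.2f}")
```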

The Likert question was easy for participants to use and easy for administrators to set up in electronic form. The SMEQ question showed good overall performance as well; with the scroller in the online version, it was easy to learn, though one drawback is having to build the widget. Participants had difficulty learning to use the UME question type. It was less sensitive than the other question types and had lower correlations with other measures such as the System Usability Scale (SUS). Based on these findings, they suggest that if you want the additional information and benefits of SMEQ or UME, SMEQ is a good choice but UME is not.

My thoughts:

I came across this article while I was looking for ways to do post-task questionnaires; I found it on a blog called measuringusability.com. I think it's quite interesting to see people testing the testing methods used in usability testing, and interesting that they support their points with statistical analysis as well.

This article convinced me that sometimes the simplest design can do the trick without overcomplicating things. That's why in UR 4 my group went with just one post-task question – “Overall, this task was”. In this article, I also found references to previous studies on the number of options for a Likert-style question. It turns out anything more than 7 only confuses participants without revealing more detail.

Overall, I think this is a great article. And I’d love to read more about this.

Dylan

RAA 2: A Perfect Combination of Quantitative and Qualitative Research Methods

Article: Using Wearable Sensors and Real Time Inference to Understand Human Recall of Routine Activities

Background:

The paper addresses the inaccuracy of self-report data, especially in the context of high-frequency, low-salience (routine) events. The study uses phone-based in situ surveys asking participants to recall their routine activities, and compares these data with the ground-truth data obtained from wearable sensors. Based on this method, they study the effects of changing the frequency of the surveys to find the optimal rate of survey presentation.

Methods:

The study recruited 20 participants from different professions. Participants were asked to carry both a Mobile Sensor Platform (MSP) for collecting ground-truth data and a smartphone running MyExperience for ESM (Experience Sampling Method) surveys. The study lasted 8 work days. On each day, participants were asked to perform three main tasks: (1) wearing the MSP from the time they started their day in the morning until 7pm, (2) answering an 8-question survey a varying number of times throughout the day, and (3) answering a survey in the evening about the surveys they had received during the day. The eight questions are divided into two groups by the two activities measured in the study, sitting and walking. The questions for each activity were (1) how many times the participant performed the target activity, (2) the longest and (3) shortest episodes of the activity, and (4) the total time spent performing the target activity. At the end of the study, an exit interview was conducted.
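The comparison at the heart of the study is between what participants report in the surveys and what the MSP actually sensed. Here is a minimal sketch of how I imagine a recall error could be computed per survey frequency; the numbers are invented and the paper doesn't publish its analysis code.

```python
# Hypothetical records: (surveys_per_day, reported_minutes, sensed_minutes)
# for the "total time spent sitting" question.
records = [
    (1, 240, 330),
    (1, 150, 285),
    (3, 200, 230),
    (3, 260, 290),
    (6, 210, 220),
    (6, 300, 315),
]

def relative_error(reported: float, sensed: float) -> float:
    """Absolute recall error relative to the sensed ground truth."""
    return abs(reported - sensed) / sensed

# Average recall error for each survey frequency.
by_frequency: dict[int, list[float]] = {}
for freq, reported, sensed in records:
    by_frequency.setdefault(freq, []).append(relative_error(reported, sensed))

for freq in sorted(by_frequency):
    errors = by_frequency[freq]
    print(f"{freq} surveys/day: mean relative error = {sum(errors) / len(errors):.2f}")
```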

Main findings:

The study shows that, in general, recall error declines with an increasing number of surveys, and the difference in error between 1 survey and 3 surveys is huge. By analyzing participants in two groups, office workers and non-office workers, the study shows that office workers make significantly lower recall errors than others. With the data collected from the evening survey, they found that the level of annoyance grows with the number of surveys. It was also found that participants prefer to receive surveys on a fixed schedule instead of having surveys randomly pop up.

My thoughts:

This study is a perfect example of integrating quantitative and qualitative research methods to answer the same research question. As shown in the study, the data collected using both methods were consistent with and complementary to each other. The paper goes back and forth between the two methods to illustrate the same points and the same findings, and together the results seem more convincing and engaging than the ones I've seen that use only one of them. I think this article can be a template for studies that aim to utilize both quantitative and qualitative research methods. I'll definitely come back to this when I write UR 4.

RAA 1: Effects of different ratios of worked solution steps and problem solving opportunities on cognitive load and learning outcomes

Worked Example (keyword): “a step-by-step demonstration of how to perform a task or how to solve a problem” (Clark, Nguyen, Sweller, 2006).

Purpose of the research: The purpose of this research is to address the question of how quickly worked examples should be faded and replaced by to-be-solved problems to be most beneficial for learning.

Background: The research is grounded in three things: the expertise reversal effect, the assistance dilemma, and the idea that the speed of fading worked examples should be adjusted according to the difficulty of the problem relative to the learner's background.

The expertise reversal effect suggests that the most appropriate student guidance is provided when a procedure offers a lot of support (e.g. worked examples) at the beginning of a learning phase and increasingly less support as learners proceed in skill acquisition (Kalyuga, Ayres, Chandler, & Sweller, 2003; Kalyuga & Hanham, 2011).

The assistance dilemma claims it is crucial to strike the right balance between giving students support (e.g. worked examples) and deliberately withholding it (e.g. to-be-solved problems) (Koedinger & Aleven, 2007).

Methods: 

The research sampled 125 German high-school students in a computer-aided cognitive tutor lesson on circle geometry, in which three geometry principles were taught. Within those three principles there were five learning opportunities / steps, each of which was either a worked example or a to-be-solved problem.

In the research procedure, demographic information was first collected by asking participants to fill out a questionnaire. Then a written introduction to the three mathematical principles was given to all participants to read. Before the learning phase started, participants worked on two introductory problems to familiarize themselves with the system. Both before and after the learning phase, a cognitive load questionnaire and a test of both procedural and conceptual problems at different difficulty levels were administered for the purpose of comparison. The data were analyzed using statistical methods to test for statistical significance.
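The paper doesn't include its analysis code, but comparing conditions with different worked/solved ratios presumably comes down to something like an analysis of variance on the test scores. The sketch below is my own illustration with fabricated gain scores; the condition labels and numbers are not from the study.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)

# Fabricated post-test gain scores for three hypothetical fading conditions,
# labeled by their ratio of worked steps to to-be-solved steps.
gains = {
    "4 worked / 1 solved": rng.normal(0.55, 0.15, size=40),
    "3 worked / 2 solved": rng.normal(0.60, 0.15, size=40),
    "1 worked / 4 solved": rng.normal(0.45, 0.15, size=40),
}

# One-way ANOVA: do the mean gains differ across the ratio conditions?
f_stat, p_value = f_oneway(*gains.values())
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")

for label, scores in gains.items():
    print(f"{label}: mean gain = {scores.mean():.2f}")
```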

Main findings:

Supported: learners generally rated the extraneous load induced by to-be-solved steps higher than the extraneous load induced by worked steps.

Supported: ratings of extraneous load were generally negatively related to learning outcomes.

Not supported: extraneous load would decrease with higher ratios of worked steps.

Supported: scores related to the easy principle show a significant difference for different ratios of worked steps and to-be-solved steps.

Not supported: no significant difference between different ratios of worked steps and to-be-solved steps with respect to conceptual knowledge related to the easy rule, or with respect to procedural or conceptual scores related to the difficult rule.

My thoughts: First of all, let me explain how I came across this paper. I don't really know where to find HCI papers. I tried Dr. V's publications, but it seems they are all studies on social networking, not HCI. I remembered that the University of Maryland has a department for HCI, and this is the first paper of the first person listed in UMD's HCII department. I used it not because I'm particularly interested in this area, but just so that I wouldn't waste more time searching for papers online.

The interesting thing about this article is that it clearly uses quantitative research methods. I see things familiar from the STAT 501 class I'm taking this semester. This shows me that we don't necessarily have to use qualitative research methods to study interaction; I don't see the authors of this paper mentioning interviews or observation of the participants. I just want to say: what a fresh change. Every Wednesday is a battle between my STAT 501 instructor defending quantitative research methods and Dr. V, my CGT 512 professor, selling qualitative research methods. I'm pro-quantitative by my engineering and science nature; qualitative research methods are like witchcraft to me. I had almost given up on further study in HCI, because until last week I hadn't seen anything in our textbook or readings about using quantitative methods to study interaction design.

I'm really interested in finding out more about what those numbers mean. The way the authors present the data is also quite different from what I've learned in my STAT 501 class, so it's worth exploring as well.