RAA 5: Assessing the Components of Skill Necessary for Playing Video Games

Background:

This article dives into the study of the effects of playing video games. More specifically, it focuses on finding a definitive list of skills that video games require, and a method to evaluate the skills required by a particular game. There have been many studies (Dorval and Pepin 1986; Mulligan, Dobson and McCracken 2005; Green and Bavelier 2004; etc.) assessing the skills related to playing video games. However, these studies failed to specify whether the effects apply to video games in general or to specific games. The authors of this article chose a psychometric method over task analysis and conceptual analysis to develop a set of scales that could be used to evaluate games.

Methods:

Two groups of people participated in this research. The first group consisted of six students and the author himself. The second group consisted of 30 undergraduate students in a course on the psychology of video games, who participated by rating games that they had played. The rating was done through a questionnaire that the first group designed, with 24 items and a 9-point scale from “Not Necessary” to “Very Necessary”. The games used in this research include Brothers Brawl, Heavy Rain, Tetris, Fallout (2 and 3), etc.

Main findings:

Six factors were identified: Perceptual-Motor Abilities, Cognitive-Verbal Abilities, Problem-Solving Abilities, Information Utilization Abilities, Persistence, and Human-Human Interaction. Statistical analysis was done on the data to test for significance. Games of different genres were compared on these six factors to show the skill requirements of each genre. Similarities were found between FPS and TPS games, between Action and RPG games, and between puzzle and platform games.

My thoughts:

First of all, I think letting the participants take part in developing the questionnaire was troublesome for the reliability of this research, and having the author also be one of the participants is inappropriate in the same sense. Participants should be randomly selected instead of picking the people who are most convenient. If we look past this, the result is quite interesting. Not only could these scales be useful for evaluating existing games, they could also be extremely helpful in developing new ones. I would like to see research on the relationship between these skill requirements and the difficulty levels of games. Some games lose their fun at either the super easy or the super hard setting. Players should be able to progress starting from easy and slowly climbing up to insane. What some games do is take out some aspects of the game in the easy setting, so fewer skills are required. That doesn't help the player make progress at all.

Dylan


Thoughts on Microsoft’s so called futuristic video

Many of you may remember that one of the speakers from Microsoft mentioned in his talk a futuristic video of what Microsoft has accomplished. I believe I've found the video here.

A lot of you may know that I'm not a huge fan of Microsoft, with the Xbox being the only exception. I've got nothing against them; I've been a Windows user since Windows 95, and my conversion to Apple only started when I bought my first iPod. I think the problem with Microsoft is not that they don't make good products. It's that their vision of the “future” doesn't reach as far as users' expectations, or as far as other industry pioneers'.

Let's take this video as an example. One may argue that there are indeed many cool features in it. But what I get from this video is that Microsoft imagines the future to be restricted to 2D interfaces, just like our PCs and tablets today. This is not surprising at all. Mankind has suffered from this restriction since the first cave painting. Because of it, our 3D thinking and imagination have not developed as well as our 2D thinking and imagination. We are able to perceive the world in 3D, but the virtual worlds we create in books, paintings, magazines, movies, TV shows and games exist only in 2D. I think it's time to fix this.

3D interfaces do exist in the area of augmented reality. The idea is to integrate the real world and the virtual world to provide more information to users. Several applications of this technology exist today, mostly military: tank drivers use it to measure the target area of an attack, and fighter pilots use it to track enemy aircraft.

The obvious advantage of augmented reality over the 2D design shown in this video is that we don't need to turn everything into a display. We only need one display per eye, two per person. Not only would we be able to accomplish what's shown in this video at a much lower cost, we could also do more in 3D. Imagine you are an architect testing out the design of your building while standing at the site where it will be built. With this technology, you would be able to overlay the 1:1 scale virtual model onto the site and see the exact result. It's way cooler than touching your fridge to check the weather.

However, there is one thing stopping augmented reality from becoming a “thing”: the display. Intuitively, you want to get the display as close to your eye as possible. Some people are working on building displays into contact lenses, and some are working on projecting images directly onto the retina, but there hasn't been any huge progress on these yet. Making very thin touchscreen displays, however, is much more promising. This is probably why Microsoft opted for that route.

Although design-wise I think a 3D interface with augmented reality is much better than what's shown in this video, in the end it all comes down to whichever arrives first and whichever is cheaper.

Dylan

UI Design Bad Example: Elevator in my building

People have posted about the UI design issues of elevators before. I've seen one UI designer write about a strange phenomenon in a building he visited: when people saw others coming toward the elevator, they would give them a friendly smile and then close the door on them just as they arrived. He later discovered that people were actually trying to press the button to hold the door, but because the locations of the close button and the hold button were switched, they did the opposite.

This blog post focuses on the design issues of the elevator in my building. They may not be unique to this building, but they will eat me from the inside if I don't write about them.

Issue 1: Visibility of system status

The system status for an elevator is intuitively simple: on every floor, it should show which floor the elevator is on right now, so people can estimate how long they will have to wait. This information is usually displayed above the elevator door on each floor. However, in this particular design it was replaced by a sign saying “Elevator B”. Not only does it not give you the information you expect, the information that is displayed prominently is completely useless.

Issue 2: “pressability” of the buttons

I was so confused when I saw this for the first time. I had no idea what was pressable and what wasn't. My first intuition was to press the black half of the button. It didn't work. Then I tried the white key, and that did the trick. But the button gave no tactile feedback; it doesn't go down at all when you press it, although it does light up afterwards. This is not a huge problem for me, but imagine you were blind: how would you find out where to press, and whether you had pressed the correct button? This design flaw turns the effort of including Braille labels for the floor numbers into a waste.

There are many more things about this elevator that I really don't like, but they don't hurt the design as much as the two above. I'm just going to leave a couple of pictures for you to make your own judgement.

Dylan

RAA 4: Expected Usability Magnitude Estimation

I found this article here.

Background:

Usability concerns the efficiency, effectiveness, and satisfaction of a product. Of the three, satisfaction is typically measured through a post-task / post-session questionnaire, most likely with Likert-style ratings. According to expectancy disconfirmation theory (Oliver, 1977), satisfaction occurs when quality is above expectations, and conversely dissatisfaction occurs when quality is below expectations. Expectation is measured through a pre-task questionnaire, similarly with Likert-style ratings. This proposal of comparing expectation ratings and experience ratings was made by Albert and Dixon (2003). The authors, Rich and McGee, argue that the method needs improvement in its reliance on Likert scales, its confusion over task classification, and its requirement for many users, and so they came up with their own design, Usability Magnitude Estimation (UME). In the article they state that “UME is a subjective assessment method where participants assign usability values to targets using ratio-based number assignment”. Through this article, they wish to show a significant improvement over Albert and Dixon (2003).

Methods:

They conducted a usability test of a prototype Business Intelligence application with six participants. The application helps analysts select the optimal project to invest in under given scenarios. Participants were asked to perform 10 tasks. UME was used to measure both the expectation and the experience for each task, and the data collected through the tests were statistically analyzed. Note that the participants were given instruction and practice before the test started.

Main findings:

They found their method to be easy to administer and analyze, to use the same underlying ratio scale for expected usability ratings and experience ratings (allowing valid comparisons), and to provide a theory-based and empirical usability issue prioritization strategy.

My thoughts:

I generally agree with the findings. The change of measuring ratios of scores that the participants pick themselves definitely helps distinguish the severity of problems. I would imagine giving scores as low as 1 and as high as 1000 for designs that are innovative in many ways but suffer from significant flaws; one example of this is Windows 8. However, I think the article I read last week shows that this method of Rich and McGee hasn't overthrown the method developed by Albert and Dixon at all. It showed that it was hard for participants to get UME right, especially in the early stage of testing. This means the results from the early stage and the rest could be incomparable due to the participants' learning curve. The problem may be a lack of “instructions” and “practice” in that study. Still, having to give instructions and practice to the participants just so they can use UME calls its ease of administration into question. The article suggested two practice tasks for learning UME, which brings the number of tasks participants need to perform from 10 to 12, and of those 12 only 10 are even considered reliable and useful. Therefore, the conclusion the authors draw is hardly convincing. Instead of just preaching the benefits of UME, I would prefer to see UME go head to head against a Likert scale, as in the last article I wrote about.

Dylan

RAA 3: Comparison of Three One-Question Post-Task Usability Questionnaires

This article is available here.

Background:

This article investigates three methods for post-task questionnaires or ratings that are widely used in usability testing: Usability Magnitude Estimation (UME), the Subjective Mental Effort Questionnaire (SMEQ), and the Likert scale.

For UME, participants create their own scale of difficulty ratings. They can assign a task any rating greater than zero; judgements are therefore made based on the ratios of ratings across tasks (I sketch this ratio idea right after these descriptions).

For SMEQ, participants are asked to either draw a line on, or move a slider along, a vertical scale with nine labels from “Not at all hard to do” to “Tremendously hard to do”.

And for the Likert scale, participants are simply asked to choose from a fixed number of ratings.
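
To make the UME idea concrete, here is a small sketch of my own (not from the article): because each participant invents their own scale, you can divide each participant's ratings by their geometric mean so that only the ratios matter before comparing tasks. The data and names below are hypothetical.

```python
import math

# Hypothetical UME ratings: each participant invents their own scale,
# so only the ratios between a participant's ratings carry meaning.
ume_ratings = {
    "p1": {"task_a": 10, "task_b": 50, "task_c": 100},
    "p2": {"task_a": 2,  "task_b": 9,  "task_c": 25},
}

def normalize_by_geometric_mean(ratings):
    """Divide each rating by the participant's geometric mean,
    putting all participants on a comparable ratio scale."""
    gm = math.exp(sum(math.log(v) for v in ratings.values()) / len(ratings))
    return {task: value / gm for task, value in ratings.items()}

for participant, ratings in ume_ratings.items():
    print(participant, normalize_by_geometric_mean(ratings))
```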

Methods:

This study investigates the correlation between the three methods through separate experiments. In experiment one, six users were asked to perform seven tasks on a web-based Supply Chain Management application. Prior to the test, participants received practice in making judgements through UME. For each task they were asked to complete both two 7-point Likert scale ratings and a UME rating. A think-aloud protocol was used in this test. In experiment two, 26 participants were recruited to test a travel and expenses application, and 5 tasks were assigned to them. Besides UME and Likert scales, experiment two also included SMEQ. The researchers used the data collected from these two experiments to test for statistical significance.

Main findings:

Through the experiments they designed and the statistical analysis of the data collected from them, they reached the conclusion that with sample sizes above 10-12, any of the three question types can yield reliable results, but with fewer than 10 participants none of the question types has a high detection rate.

The Likert question was easy for participants to use and easy for administrators to set up in electronic form. The SMEQ question showed good overall performance as well; with a slider in the online version, it was easy to learn, though one drawback is building the widget. Participants had difficulty learning to use the UME question type. It was less sensitive than the other question types and had lower correlations with other measures such as the System Usability Scale (SUS). Based on these findings, they suggest that if you want the additional information and benefits of SMEQ or UME, SMEQ is a good choice but UME is not.

My thoughts:

I came across this article while I was looking for ways to do post-task questionnaires, on a blog called measuringusability.com. I think it's quite interesting to see people testing the testing methods used in usability testing, and it's interesting to see that they prove their points with statistical analysis as well.

This article convinced me that sometimes the simplest design can do the trick without overcomplicating things. That's why in UR 4 my group went with just one post-task question: “overall, this task was”. In this article, I also found references to previous studies on the number of options for a Likert question. It turns out anything more than 7 will only confuse participants without revealing more detail.

Overall, I think this is a great article. And I’d love to read more about this.

Dylan

RAA2: a perfect combination of quantitative and qualitative research methods

Article: Using Wearable Sensors and Real Time Inference to Understand Human Recall of Routine Activities

Background:

The paper addresses the inaccuracy of self-report data, especially in the context of high-frequency, low-salience events (routine events). The study uses phone-based in situ surveys that ask participants to recall their routine activities, and compares this data with ground-truth data obtained from wearable sensors. Based on this method, the authors study the effects of changing the frequency of the surveys in order to arrive at an optimal rate of survey presentation.

Methods:

The study recruited 20 participants of different professions. The participants were asked to carry both a Mobile Sensor Platform (MSP) for collecting factual data and a smartphone running MyExperience for ESM (Experience Sampling Method) surveys. The study lasted 8 work days. On each day, participants were asked to perform three main tasks: (1) wear the MSP from the time they started their day in the morning until 7pm, (2) answer an 8-question survey a varying number of times throughout the day, and (3) answer a survey in the evening about the surveys they had received during the day. The eight questions are divided into two groups by the two activities measured in the study, sitting and walking. The questions for each were (1) how many times the participant performed the target activity, (2) the longest and (3) the shortest episodes of the activity, and (4) the total time spent performing the target activity. At the end of the study, an exit interview was conducted.

Main findings:

The study shows that, in general, recall error declines as the number of surveys increases, and the difference in error between 1 survey and 3 is huge. By analyzing participants in two groups, office workers and non-office workers, the study shows that office workers make significantly lower recall errors than the others. With the data collected from the evening survey, the authors found that the level of annoyance grows with the number of surveys. It was also found that participants prefer to receive surveys on a fixed schedule rather than having surveys pop up at random times.

My thoughts:

This study is a perfect example of integrating quantitative and qualitative research methods to answer the same research question. As the study shows, the data collected using the two kinds of methods were consistent with and complementary to each other. The paper goes back and forth between both methods to illustrate the same points and the same findings, and together the result seems more convincing and engaging than the ones I've seen that use only one of them. I think this article can be a template for studies that aim to use both quantitative and qualitative research methods. I'll definitely come back to this when I write UR 4.

UI Design Good Example: Starcraft 2 UI design – Micro vs Macro

I just bought Starcraft II and have been playing it for a while. Yes, it does run on a Mac!!! So it occurred to me that I should write a blog post about it.

For those of you who are less familiar with Starcraft, or with computer games, Starcraft II is a military science fiction real-time strategy game developed and released by Blizzard Entertainment for Microsoft Windows and Mac OS X. We all know strategy games; the most common example in Western culture is either Chess or Monopoly, depending on your generation. Starcraft II is a real-time strategy game, which means it's not turn-based: you don't have to wait for your opponent to make a move before you do anything. I would use the metaphor that Starcraft II is like Chess, but you're allowed to move all your pieces at any time you want. The goal in Starcraft II is simple: destroy all enemy units. And to achieve that goal, you need three things: collecting resources to build your economy, developing new technology to gain an edge in combat, and producing as many troops as possible, as quickly as possible, to crush the enemy forces.

In this blog post, I’m gonna talk about the interface design of Starcraft II in a PvP (Player versus Player) or PvA (Player versus A.I.) context and how it affects the gameplay of this game.

To win a game in Starcraft II, two sets of skills are needed: Micro and Macro. Micro, or micromanagement, refers to any action that applies to an individual unit or a small number of units. One example of Micro is hit-and-run: if your units have the range advantage, you can kill all the enemy units without taking damage by pulling your units back to keep a healthy distance between them and the enemy. Macro, or macro-management, is on the other hand related to large-scale matters, such as expanding your base, building your troops, developing new technology, and so on. The interface design of Starcraft II makes it possible for the information needed for both to be shown on screen at the same time.

The picture above is a screenshot of Starcraft II. I've marked all the parts of the interface related to gameplay with red squares. Square a (Macro related) shows the amount of resources, in numbers. Square b (Micro related) shows the abilities the selected units have (move, stop, attack, patrol, etc.). Square c (Micro related) shows the selected units: what they are, how many there are, and so on. Square d (Macro related) shows the control groups you have; most people use this feature to control their units in combat while producing units back at base at the same time. Square e (Macro related) is the mini-map of the battlefield, showing both your units and the enemy units that are visible to you.

From the design of this interface, we can see that the Macro-related information is designed to be glanced at very quickly, in a general sense, while the Micro-related information is shown in an excessive amount of detail, to the extent that not all of it is useful at the same time in a given situation. This emphasis on Micro-related information is evidence that the game is micro-focused: it wants you to spend most of your time and energy micro-managing the battle with the units you have, doing as much damage to your enemy as possible while preserving your own troops, instead of worrying too much about producing units and collecting resources back home. This pushes players to engage with the game in a fast-moving, high-pressure setup and lets them fight each other until one side is completely destroyed or surrenders.

Why does Blizzard do this to players, you ask? Are they promoting violence? It's because Blizzard games are built for competition. They intentionally complicate things and force you to practice, in the hope that you won't be humiliated by your opponent, or that you'll be able to dominate your opponent. As a result, players spend more time playing the game and grow loyal to the series and the brand.

Dylan

A summary of why UX book sucks and why I hated it


TO DR.V

PLEASE SHARE THIS WITH THE BOOK AUTHOR BUT DON’T FORGET TO OMIT MY NAME!!!

1. Keyword definition on the side margin of the page

Most books put keyword definitions in a colored box before the first appearance of the keyword, so the reader won't feel clueless or puzzled when they first see it. The UX book, however, puts them in the side margin, parallel to the keyword. Side margins are usually reserved for the reader to take notes; it's unexpected to find important information there.

2. Preface is way too long

The preface of this book is way too long, and it looks too much like a regular chapter, with subsections and bullet points. This actually makes finding the table of contents harder than usual, because the table of contents is expected to be within the first few pages.

3. Inserting articles looking like research papers in the middle of the chapter

Sometimes I find myself in those light blue pages, which I assume are research articles written by other people; sometimes I don't even know what these pages are. I'm not going to argue about whether they fit in the chapter (sometimes they don't). I just think these articles should be made available as references or resources at the end of each chapter. Readers could then choose whether or not to dig deeper into the material by finding and reading them; it should not be forced on every reader. Moreover, having these articles between subsections breaks the continuity of the content. If you read the book in a linear fashion, it's hard to recall what was in the last section after finishing the article.

4. The division and naming of sections and sub-sections in each chapter doesn't always make sense.

For example, take chapters 16 and 17: the analysis of quantitative data and the reporting of it, which follows closely after, are split to fit into two chapters, so the reader has to recall the information from chapter 16 in order to understand what's going on in chapter 17.

Also, in the section titles of chapters 16 and 17, we see the words “formative” and “qualitative” used interchangeably. But that does not justify names such as “Formative (qualitative) data analysis”, “Reporting qualitative formative results”, and “Formative reporting content” all appearing at the same time. The inconsistency in terminology causes confusion.

Moreover, staying with chapters 16 and 17, shouldn't “Formative reporting content” and “Formative reporting audience, needs, goals, and context of use” be sub-sections of “Reporting qualitative formative results”? This happens very often throughout the book, and the naming of sub-sections is actually worse: many subsections clearly should not be under the same section title. This has happened in every chapter I have read this semester, and it is the most unbearable issue I have with this book.

To sum up this problem: the reader should be able to look at the table of contents and know what to expect in each subsection of a chapter without actually reading it. There should be a clear logical thread running through the titles. If there isn't, then some of the content probably doesn't belong there.

5. The book goes into far too much detail about things we couldn't care less about.

Some of the suggestions are just an insult to the reader's ability to make sensible decisions and come up with valid ideas. By purchasing the book, we have not signed over the right for the author to point a finger at everything we do in UX. And I certainly don't agree that the way the author promotes is the best way of doing things, or, as the reading makes it seem, the only way.

6. Hard to distinguish between section titles, sub-section titles, and bullet points.

Section titles, sub-section titles, and bullet points all look the same to me. I noticed that some of them are bold and uppercase, and some of them are italic, but the differences are too subtle. I would expect large differences in font size, or even color.

Dylan

Week 12 Reading Notes and Reflections – This is a battle between UX and STAT

This week's reading is ridiculous. Not only does it trash statistics to an unbearable extent, it also justifies its wrongdoing in a shameless manner. One quote from the book says it all – “Because product design is not research but engineering, we are not concerned with getting at scientific “truth”; our is more practical and less business. Our evaluation drives our engineering judgement, which is also based on hunches and intuition that are, in turn, based on skill and experience.” It's this kind of attitude that leads to designs that kill people.

List of sins against statistics:

1. “It may help to include standard deviation values, for example, to indicate something about the rough level of confidence you should have in data.”

Standard deviation and level of confidence are very different things. The level of confidence shows how confident you are that the real value (the population mean) falls within an interval called the confidence interval (e.g., we're 95% confident that the average time it takes a user to print a report in the system is between 2 and 5 minutes). Our confidence is in our method, not in our data.
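
As a rough sketch of what I mean (the task times below are made up, and I'm assuming scipy is available):

```python
import numpy as np
from scipy import stats

# Hypothetical task completion times, in minutes, for 8 participants.
times = np.array([2.1, 3.4, 2.8, 4.9, 3.2, 2.6, 4.1, 3.7])

mean = times.mean()
sem = stats.sem(times)  # standard error of the mean
# 95% confidence interval for the population mean, based on the t-distribution.
ci_low, ci_high = stats.t.interval(0.95, len(times) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.2f} min, 95% CI = ({ci_low:.2f}, {ci_high:.2f}) min")
```

The interval is a statement about the procedure, not about any single data point, which is exactly why it shouldn't be conflated with the standard deviation.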

2. “Sometime it can mean that you should try to run a few more participants. ”

The number of participants should be decided according to your expectations for the data (for example, if you want a margin of error of 3%, you need roughly a thousand people to participate). You can't fix your results by running a few more participants.
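
A back-of-the-envelope sketch of that arithmetic (assuming a proportion estimate at 95% confidence with the worst-case p = 0.5):

```python
import math

def sample_size_for_margin(margin_of_error, z=1.96, p=0.5):
    """Minimum sample size so that a proportion estimate has the given
    margin of error at ~95% confidence (z = 1.96), worst case p = 0.5."""
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

print(sample_size_for_margin(0.03))   # ~1,068 participants for a 3% margin
print(sample_size_for_margin(0.003))  # over 100,000 for a 0.3% margin
```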

3. What are the UX goals based on?

The book says nothing about how the UX goals are actually set. It may well be that the goals are set too high, or even unrealistically high, in some cases. How can you tell?

4. “The quantitative data analysis for informal summative evaluation does not include inferential statistical analysis.”

So you have the data; why not run inferential statistical analysis on it? There is plenty of software for doing exactly that. It can be done in minutes, and it costs nothing.
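
For instance (a hypothetical example of mine, not something from the book), checking whether observed task times meet a benchmark UX goal takes a few lines with scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical: the UX goal says the task should take 3 minutes on average.
goal_minutes = 3.0
observed = np.array([2.4, 3.1, 2.7, 3.5, 2.2, 2.9, 3.3, 2.6])

# One-sample t-test: is the mean observed time different from the goal?
t_stat, p_value = stats.ttest_1samp(observed, popmean=goal_minutes)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```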

5. Why is this the only way to identify UX problems?

Statistical analysis can reveal associations too. In statistics we have both quantitative and categorical data. Say there is a question on the survey, “What do you think is the best way to get to the XXXX screen?”, with 4 possible answers. That data is categorical, not quantitative. We can definitely run a test on it to see whether there is an association between the answer chosen and the average time spent doing a task. Then again, it makes sense: since the book tells us not to do statistical analysis, I guess qualitative analysis is the only way left.
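
Here is a sketch of the kind of test I have in mind (the data are made up; a one-way ANOVA relating the categorical answer to task time):

```python
from scipy import stats

# Hypothetical task times (minutes), grouped by which of the 4 survey answers
# each participant picked for "the best way to get to the XXXX screen".
answer_a = [2.3, 2.9, 2.5, 3.1]
answer_b = [4.0, 3.6, 4.4, 3.9]
answer_c = [2.8, 3.0, 3.3, 2.7]
answer_d = [3.5, 3.8, 3.2, 3.6]

# One-way ANOVA: do mean task times differ across the answer groups?
f_stat, p_value = stats.f_oneway(answer_a, answer_b, answer_c, answer_d)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```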

I'm not denying the value of qualitative data analysis; I think it's great for finding UX issues. It's just that the book is unfair in the way it describes the power of statistics. If I want to make a very high-cost decision, I want to know the probability that I will be right. I can't just put my trust in the honesty of the participants and the ability of the evaluators to interpret the data.

There are questions that qualitative analysis can never answer. One I can think of is “Pepsi or Coke, which one is better?”. You will never convince me with a conclusion drawn from qualitative research alone; exploring the goals and emotions behind each brand won't tell you anything!!!

Dylan