

Here is my pretty much line-by-line commentary on this paper.

S. Hein
January 2005

Measuring Emotional Intelligence with the MSCEIT V2.0

ABOUT THIS ARTICLE: The actual article can be found in, and cited as: Mayer, J.D., Salovey, P., Caruso, D.R., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V2.0. Emotion, 3, 97-105.

ABOUT PUBLISHING THIS MANUSCRIPT ON THE WEB: A copyedited version of this manuscript is scheduled to appear in the journal Emotion. Copyright is by the American Psychological Association. According to APA guidelines on internet publishing (http://www.apa.org/journals/posting.html), the article can only be posted on the author's web site; "the posted article must carry an APA copyright notice and include a link to the APA journal home page; APA does not permit archiving with any other non-APA repositories; APA does not provide electronic copies of the APA published version for this purpose; and, authors are not permitted to scan in the APA published version." Please help us by abiding by these guidelines and not posting the article elsewhere. Please refer to that APA web site (http://www.apa.org ) for more information. Thank you!

abiding by these guidelines? doesn't that mean obeying the rules? and what happens if we don't? will we be punished? and if we are punished, are they really just "guidelines"? and if we could be punished, then are the people who obey the rules (or who "abide by the guidelines") really helping anyone, or are they just being threatened and, as a result, afraid of being punished!?

 See also my more detailed criticism of the APA "guidelines"


Measuring Emotional Intelligence with the MSCEIT V2.0

John D. Mayer
Peter Salovey
David R. Caruso
Gill Sitarenios


Does a recently introduced ability scale adequately measure emotional intelligence (EI) skills?

notice they are talking about skills, like goleman

Using the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT), we examined (a) whether members of a general standardization sample and emotions experts identified the same test answers as correct, (b) the test’s reliability, and (c) the possible factor structures of EI. Twenty-one emotions experts endorsed many of the same answers as did 2112 members of the standardization sample, and exhibited superior agreement particularly when research provides clearer answers to test questions (e.g., emotional perception in faces). The MSCEIT achieved reasonable reliability, and confirmatory factor analysis supported theoretical models of EI. These findings help clarify issues raised in earlier articles published in Emotion.


The past 12 years have seen a growing interest in emotional intelligence (EI), defined as a set of skills concerned with the processing of emotion-relevant information, and measured with ability-based scales.

saying ei is a set of skills is moving away from idea of intelligence as being potential. and moving towards something that can be taught. i can teach someone brick-laying skills. so does that mean there is something called brick-laying intelligence? see this article for a discussion of ability, skills, potential

A new ability test of EI, the Mayer-Salovey-Caruso Emotional Intelligence Test, Version 2.0 (MSCEIT), potentially improves upon earlier measures, and can inform the debate over the scoring, reliability, and factor validity of such scales.

they are still using the word "inform"; they are too worried about the "debate" over this stuff like scoring, reliability, and factor validity of such scales, and not worried enough about children and teenagers

do we want smart people who have studied psychology to be worried about numbers or about human beings?

The MSCEIT is intended to measure four branches, or skill groups, of emotional intelligence: (a) perceiving emotion accurately, (b) using emotion to facilitate cognitive activities, (c) understanding emotion, and (d) managing emotion.

first time i remember seeing "skill groups"

The MSCEIT is the most recent of a series of ability scales of emotional intelligence. Its immediate predecessor was the MSCEIT Research Version 1.1 (MSCEIT RV1.1), and before that, the Multifactor Emotional Intelligence Scale. Those tests, in turn, evolved out of earlier scales measuring related constructs such as emotional creativity, social intelligence, and non-verbal perception.

The MSCEIT and its predecessors are based on the idea that emotional intelligence involves problem solving with and about emotions.

now they get back to problem solving, but we aren't looking at how children solve emotional problems before they are taught the ways used in whatever society they were born into. the problem solving part of the test is measuring people against social standards, not differences in inborn potential. only the facial part is measuring inborn potential, i'd say.

Such ability tests measure something relatively different from, say, self-report scales of emotional intelligence, with which correlations are rather low. The ability to solve emotional problems is a necessary, although not sufficient, ingredient to behaving in an emotionally adaptive way.

emotionally adaptive way. isn't this a bit like saying "what do we call a well adjusted slave"? (a quote from Maslow, which Mayer used himself once in one of his articles on EI)

The MSCEIT V2.0 standardization process required collecting data sets relevant to the several issues concerning emotional intelligence. For example, in a recent exchange of papers in this journal, Roberts, Zeidner, and Mathews raised concerns about the earlier-developed MEIS test of emotional intelligence.

i want to keep in touch with roberts and the others. reminds me of my meeting with roberts in australia. i liked his style. very smart, non-conforming guy.

These concerns included whether there is one set of correct answers for an emotional intelligence test or whether expert and general (e.g., lay) opinions about answers diverge too much,

i don't really care if the experts and the general public agree. they have all been brainwashed in more or less the same system. i am pretty sure that the sample that jack and the others used which they call the general public was mostly university students and university graduates. if we really want to look at individual differences we have to get to the kids before they are brainwashed, trained, molded - whatever you want to call it

whether such tests could be reliable,

a test can be reliable and still not be telling us anything useful

and whether the factor structure of such tests was fully understood and consistent with theory.

i have no idea what a factor structure is and i don't really care if their test supports their theory if i don't like their theory. i am hardly even sure what their theory is anymore. they are so worried about proving that their test is "reliable, valid" etc. i don't know what other articles they have been writing, but i am not at all impressed with this one. and i feel discouraged they are not using their brainpower to write about something with more direct relevance to the problems in the world today. like understanding the feelings of a so called terrorist, and a so called president, who i would say is extremely dangerous, more dangerous than the so called terrorist. the president is more dangerous to our freedom, for example.

In that same issue of the journal we, and others, responded.

i feel sad they are wasting so much time on this test stuff and trying to defend themselves and prove their test is good. i wonder how many hours were spent on all of this.

Findings from the MSCEIT standardization data reported here have the promise of more directly informing the debate, through empirical findings.

pretty long winded statement, which didn't tell me much. but it sounds like they are feeling proud of themselves. It sounds like they are saying, we have the data to prove we are right.

The analyses we present address three questions: (a) Do general and expert criteria for correct answers to emotional intelligence test items converge?

as i said, i don't care if they converge. by the way, converge seems to mean that they have the same answers, in case you are a psych student in a country where English is not the first language - something which these PhD's never seem to think about. Anyhow, let me give a specific example of why I don't care if everyone agrees with the same answers. Here in peru nearly everyone thinks it is normal to make kids stand at attention in the hot sun in school and also to make them wear uniforms. so if they asked a bunch of college graduates and a bunch of psychologists questions about education, would we be surprised if both of them said it was a fine thing to have students stand in the hot sun at attention and wear uniforms? but would this mean it is good for humanity?

(b) What is the reliability of such tests?

like i said, a test can be reliable and useless. example, let's say i test rocks by seeing if they float. i put them in water and watch them sink. i find out that over and over again they don't float. the test is reliable. it tells me rocks don't float. but we already knew this. in other words, it is nice that it is reliable, but it has to be much more than that to impress me.

(c) And, is the factor structure of such tests consistent with theoretical models of EI?

factor structure again.... can't these psychologists use regular English? i hope they don't talk to their kids like that! but i am fairly sure they talk to their psych students like that. and that alone scares me. i would put money on the bet that all of the authors have spent more time talking about factor structure than feelings with their university students. and more time on test reliability than on children or suicidal teenagers. or the childhood, adolescence and development of the feelings of a "terrorist"



The Criteria for Correct Answers

One must know how to score a test’s items before one can settle such issues as the test’s reliability and factor structure. Our model of EI hypothesizes that emotional knowledge is embedded within a general, evolved, social context of communication and interaction.

right here i would disagree with them. i don't believe we have evolved to a healthy understanding of emotions, communication, interaction. take the example of someone asking the question "why didn't you..." this is common in many cultures. but it is not the best way to communicate. it puts the other person on the defensive. i would say what is commonly accepted around the world is messed up. i would not use the current standards to prove that someone is intelligent in any sense of the word.

to me an intelligent person, a really intelligent person is going to give much different answers to a lot of the questions on this kind of a test. a really intelligent person, i'd say, listens to his own mind, not to what is the social norm. he uses his own memory, and processing ability for example. he observes. remembers. thinks. he forms his own conclusions and comes up with his own answer and also his own ways of solving problems. he does not just try to solve them in the ways he has seen others trying to solve them. and he sees that the ways others are trying to solve problems are not working.

take the current president of the usa, bush. it seems pretty clear to a lot of intelligent people that bush's idea of solving problems is not the best we humans could come up with. yet bush, being not too intelligent, or being a product of a very thorough brain washing system, goes on spending billions of dollars to try to solve the problem, as he sees it, in the way he thinks is best. it actually doesn't even take someone very intelligent to see that his ideas are flawed, yet the elections in the usa were a kind of a test. and the americans chose bush again. so does this mean that the elections are a good test of intelligence, just because the majority chose bush? and i wonder if the majority of psychologists or so called emotions experts in the usa voted for bush. i would guess the majority of the people both psychologists and non-psychologists around the rest of the world would not have voted for bush. so what does this tell us? i'd say it tells us that what you consider are the best answers on a test, just like who wins an election in one part of the world, depends more on culture than intelligence.

Consequently, correct test answers often can be identified according to the consensus response of a group of unselected test-takers.

like i have been saying, the general consensus does not impress me.

For example, if the respondents identify a face as predominantly angry, then that can be scored as a correct answer.

maybe on the face identification part of the test, but not on the rest of it.

We have further hypothesized that emotions experts will identify correct answers with greater reliability than average, particularly when research provides relatively good methods for identifying correct alternatives, as in the case of facial expressions of emotion, and the meaning of emotion terms.

yeah this is pretty much what i just said myself!

If the general and the expert consensus diverge too far as to the correct answers on a test, a complication arises, because the two methods yield potentially different scores for each person.

true, so they are saying they feel better when the general public agrees with the so called experts. but i don't feel better. because these so called experts all are psychologists who went to the same kinds of schools and got the same kinds of training and had to follow the same kinds of rules in order to get their little piece of paper called a diploma. and they go to the same conferences held in different expensive hotels around the world each year and listen to the same speakers.

Roberts et al. used an expert criterion that we had developed based on only two experts,

the two experts were Jack and David...but they never tell the public that, as far as I ever remember seeing. they told me this in person. sorry if they feel betrayed, but isn't it the truth, guys? and why didn't you ever tell people this in the first place?

and a general consensus method to score the earlier MEIS measure, and found those methods did not always converge.

In other words, Jack and David's answers didn't always match those of the general public - but i think what they are calling, or what I am implying that they are calling the general public, was actually mostly college students -- but that is a different issue. Anyhow, let's say Jack and David's answers don't match the general consensus.... then what? what could this mean? it could mean Jack and David are worse at solving emotional problems. Or it could mean they are better. I would really like to see what their original answers were and give my commentary. And I believe it would be helpful to the people interested in all of this to provide the answers. And I challenge them to provide their answers and tell us exactly where their answers didn't match what they call the general consensus. Then we can judge for ourselves whether we think Jack and David's answers are better or worse.

I have said these guys, Jack and David, have a high level of integrity. And I like to think they are interested in the real truth. And I would like to think they are secure enough to give us their answers. But I am afraid they will come up with some lame excuse like "That is not the way researchers do things." Or, "That is not in the APA guidelines." So I challenge them to not give me any crap excuse and to show me their original answers. And if they tell me we don't have them anymore, then I would like them to go back and take the MEIS test again and give me their answers. And, beyond that I would like to know where Jack and David disagreed between themselves. I can't imagine they both agreed to everything. And if they did, then that scares me a bit. It is like the saying, if two people always give you the same answers, then you don't need one of them.

Some aggregation of experts beyond two is necessary, however, to achieve a reliable identification of answers. Twenty-one emotions experts were employed in the present study.

I am not so sure I like them giving all this responsibility away to 21 so called emotions experts. I feel more comfortable with Jack and David's answers, since I know both of them personally. Still, I am pretty sure I would disagree with some of their answers. But more importantly, I would have designed a different test with different questions and different possible answers. Jack, David and I have different values, different life experiences. So we are going to believe different kinds of people are emotionally intelligent. There is no way to get around that using a test like they have designed - at least with the non-facial parts; the facial part is the most objective part of the test.

Issues of Reliability

The MSCEIT V2.0 must exhibit adequate levels of reliability, as did the MEIS, MSCEIT RV1.0, and comparable psychological tests.

must? they make it sound like this is really important to the survival of the species. they sound like Maurice Elias. lol. In something I read once he kept on saying "must" this and "must" that.

As with its predecessor tests, the four MSCEIT V2.0 branch scores (e.g., Perception, Facilitating, Understanding, Management) draw on different tasks that include different item forms; that is, the items are non-homogeneous. Under such conditions, split-half reliability coefficients are the statistic of choice (relative to coefficient alphas), as they involve the orderly allocation of different item types to the two different halves of the test. The test-retest reliability of the total MSCEIT score has been reported elsewhere, at r(60) = .86.
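To make the statistic concrete, here is a minimal sketch of a split-half reliability computation with the Spearman-Brown step-up, assuming per-respondent item scores are already in hand. This is not the authors' analysis code, and the alternating odd/even split below is a simplification of the "orderly allocation of different item types" the paper describes.

```python
# Minimal sketch: split-half reliability with Spearman-Brown correction.
# NOTE: the odd/even split is an illustrative assumption; the paper
# allocates item *types* to halves rather than alternating items.

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_scores):
    """item_scores: list of per-respondent lists of scored items.
    Returns the Spearman-Brown corrected split-half coefficient."""
    half_a = [sum(items[0::2]) for items in item_scores]  # "odd" half
    half_b = [sum(items[1::2]) for items in item_scores]  # "even" half
    r_half = pearson_r(half_a, half_b)
    # Spearman-Brown: step the half-length correlation up to full length.
    return 2 * r_half / (1 + r_half)
```

The correction matters because correlating two half-tests understates the reliability of the full-length test.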

Issues of Factor Structure

The factor structure of a test indicates how many entities it plausibly measures. It is important to any debate over whether EI is a coherent, unified, concept. In this specific case, it indicates how many dimensions of EI the test is "picking up" -- one unified dimension, many related dimensions, or something else. We believe that the domain of EI is well described by 1-, 2-, and 4- oblique (correlated) factor models, as well as other equivalent models. If the MSCEIT V2.0 shows similar structure to the MEIS for both expert and general scoring, it would strengthen the argument that the theory of EI we employ works across tests. Using the standardization sample, we performed confirmatory factor analyses of the full scale MSCEIT V2.0, testing 1-, 2-, and 4- factor models to examine the range of permissible factor structures for representing the EI domain.



General Sample

The present sample consisted of 2,112 adult respondents, age 18 or older, who completed the MSCEIT V 2.0 in booklet or online forms prior to May, 2001.  The sample was composed of individuals tested by independent investigators in 36 separate academic settings from several countries. The investigators had requested pre-release versions of the MSCEIT booklet or online forms (depending on Internet availability and other criteria), and had submitted documentation of their research qualifications and of approval of their research from their sponsoring institution. Only basic demographic data were collected across samples due to the diverse nature of the research sites.

Of those reporting gender, 1,217 (58.6%) were women and 859 (41.4%) were men.  The mean age of the sample was M = 26.25; S = 10.51, with roughly half the sample college-aged (52.9%), and the rest ranging upward to 69 years old.  The participants were educationally diverse, with 0.6% reporting not completing high school, 10.3% having completed only high school, 39.2% having some college or university courses, 33.7% having completed college, and 16.1% holding Masters level or higher degrees.  The group was ethnically diverse as well, with 34.0% Asian, 3.4% Black, 2.0% Hispanic, 57.9% White, and 2.3% other or mixed ethnicity.  Most participants came from the United States (1240), with others from South Africa (231), India (194), the Philippines (170), the United Kingdom (115), Scotland (122), and Canada (37); all testing was in English.

Expert Sample

The expert sample was drawn from volunteer members of the International Society for Research on Emotions (ISRE) at its 2000 meeting. The Society was founded in 1984 with the purpose of fostering interdisciplinary scientific study of emotion. Membership is open to researchers and scholars who can demonstrate a serious commitment to the investigation of the emotions. Twenty-one experts, 10 male and 11 female, from eight Western countries, participated. The sample of experts had a mean age of 39.38 (S = 6.44; Range = 30 to 52); no data about their ethnicity were collected.


The MSCEIT V2.0 is a newly developed, 141-item scale designed to measure four branches (specific skills) of emotional intelligence: (a) Perceiving Emotions, (b) Using Emotions to Facilitate Thought, (c) Understanding Emotions, and (d) Managing Emotions. Each of the four branches is measured with two tasks. Perceiving Emotions is measured with the Faces and Pictures tasks; Facilitating Thought is measured with the Sensations and Facilitation tasks; Understanding Emotions is measured with Blends and Changes; and Managing Emotions is measured with Emotion Management and Emotional Relationships tasks.

Each of the 8 MSCEIT tasks is made up of a number of item parcels or individual items. A parcel structure occurs, for example, when a participant is shown a face (in the Faces task), and asked about different emotions in the face in five subsequent items. The five items make up an item parcel because they are related to the same face, albeit each asks about a different emotion. Other items involve one response per stimulus, and are, in that sense, free-standing. Response formats were intentionally varied across tasks so as to ensure that results generalized across response methods, and to reduce correlated measurement error. Thus, some tasks, such as Pictures, employed 5-point rating scales, whereas other tasks, such as Blends, employed a multiple-choice response format.

Briefly, in the Faces task (4 item parcels; 5 responses each), participants view a series of faces and for each, respond on a five point scale indicating the degree to which a specific emotion is present in a face. The Pictures task (6 parcels; 5 responses each) is the same as Faces except that landscapes and abstract designs form the target stimuli, and the response scale consists of cartoon faces (rather than words) of specific emotions. In the Sensations task (5 parcels; 3 responses each), respondents generate an emotion and match sensations to them. For example, they might generate a feeling of envy and decide how hot or cold it is. In the Facilitations task (5 item parcels; 3 responses each), respondents judge the moods that best accompany or assist specific cognitive tasks and behaviors, for example, whether joy might assist planning a party. In the Blends task (12 free-standing items), respondents identify emotions that could be combined to form other emotions. They might conclude, for example, that malice is a combination of envy and aggression. In the Changes task (20 free-standing items), respondents select an emotion that results from the intensification of another feeling. For example, they might identify depression as the most likely consequence of intensified sadness and fatigue. Respondents in the Emotion Management task (5 parcels; 4 responses each) judge the actions that are most effective in obtaining the specified emotional outcome for an individual in a story. They are asked to decide, for example, what a character might do to reduce her anger, or prolong her joy. Finally, in the Emotional Relationships task (3 item parcels; 3 responses each), respondents judge the actions that are most effective for one person to use in the management of another person’s feelings. See the test itself, and its manual, for more specific task information.

General and Expert Consensus Scoring

The MSCEIT yields a total score, two area scores (experiential and strategic), four branch scores corresponding to the four-branch model, and eight task scores. Each score can be calculated according to a general consensus method. In that method, each one of a respondent’s answers is scored against the proportion of the sample who endorsed the same MSCEIT answer. For example, if a respondent indicated that surprise was "definitely present" in a face, and the same alternative was chosen by 45% of the sample, the individual’s score would be incremented by the proportion, .45. The respondent’s total raw score is the sum of those proportions across the 141 items of the test. The other way to score the test is according to an expert scoring method. That method is the same, except that each of the respondent’s scores is evaluated against the criterion formed by the proportional responding of an expert group (in this case, the 21 ISRE members). One of the purposes of this study was to compare the convergence of these two methods.
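The proportion-based consensus scoring described above can be sketched in a few lines. This is a simplification, not the published scoring code, and the item and option labels in the usage below are invented:

```python
from collections import Counter

def build_consensus_weights(responses):
    """responses: list of answer lists, one per respondent in the
    criterion group (general sample or experts). Returns, per item,
    the proportion of the group endorsing each answer option."""
    n = len(responses)
    n_items = len(responses[0])
    weights = []
    for i in range(n_items):
        counts = Counter(resp[i] for resp in responses)
        weights.append({opt: c / n for opt, c in counts.items()})
    return weights

def consensus_score(answers, weights):
    """A respondent's raw score: the sum, over items, of the proportion
    of the criterion group who chose the same answer (e.g., +.45 if 45%
    of the sample also marked surprise as 'definitely present')."""
    return sum(weights[i].get(a, 0.0) for i, a in enumerate(answers))
```

Expert scoring is the same function with the weights built from the 21 experts' responses instead of the 2,112-person general sample, which is why the two criteria can yield different scores for the same answer sheet.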


The MSCEIT administration varied depending upon the research site at which the data were collected (see "Sample," above). The MSCEIT was given to participants to complete in large or small groups, or individually. Of the 2,112 participants, 1,368 took the test in a written form and 744 took the test in an on-line form that presented the exact same questions and response scales, by accessing a web-page. Those taking the pencil and paper version completed scannable answer sheets that were entered into a database. Web page answers were transmitted electronically. Prior research has suggested that booklet and on-line forms of tests are often indistinguishable.


Comparison of Test-Booklet versus On-Line Administration Groups

We first compared the MSCEIT V2.0 booklet and on-line tests. For each, there are 705 responses to the test (141 items X 5 responses each). The correlation between response frequencies for each alternative across the two methods was r(705) = .987. By comparison, a random split of the booklet sample alone, for which one would predict there would be no differences, yields almost exactly the same correlation between samples of r(705) = .998. In each case, a scatterplot of the data indicated that points fell close to the regression line throughout the full range of the joint distribution, and that the points were spread through the entire range (with more points between .00 and .50, than above .50). Even deleting 30 zero-response alternatives from the 705 lowered the correlation by only .001 (in the random split case). The booklet and on-line tests were, therefore, equivalent, and the samples were combined.

Comparison of General vs. Expert Consensus Scoring Criteria

We next examined the differences between answers identified by the experts and by the general consensus. We correlated the frequencies of endorsements to the 705 responses (141 items x 5 responses) separately for the general consensus group and the expert consensus group, and obtained an r(705) = .908 that, although quite high, was significantly lower than the r = .998 correlation for the random split (z(703) = 34.2, p < .01). The absolute difference in proportional responding for each of the 705 alternatives also was calculated. The mean value of the average absolute difference between the expert and general groups was M|D|(705) = .08, S = .086, which also was significantly greater than the difference between the booklet and on-line samples of M|D|(705) = .025, S = .027; z(705) = 16.3, p < .01.

We hypothesized that emotions experts would be more likely than others to possess an accurate shared social representation of correct test answers; their expertise, in turn, could provide an important criterion for the test. If that were the case, then experts should exhibit higher inter-rater agreement than the general group. To assess inter-rater agreement, we divided the expert group randomly into two subgroups of 10 and 11 experts each, and computed the modal response for each of the 705 responses for the two subgroups of experts. The Kappa representing agreement controlling for chance across the two expert subgroups for the 5 responses of the 141 items was κ = .84. We then repeated this process for two groups of 21 individuals, randomly drawn from the standardization (general) samples and matched to the expert group exactly on gender and age. Two control groups, rather than one, were used to enhance our confidence in the results; the education level of the comparison groups was comparable to that of the rest of the general sample. When we repeated our reliability analysis for the two matched control groups, we obtained somewhat lower Kappas of κ = .71 and .79.
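For readers who don't live in this statistical world: Cohen's kappa corrects the raw agreement rate between two raters (here, the modal responses of two subgroups) for the agreement expected by chance alone. A minimal sketch, not the authors' analysis code:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters' label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of positions where the raters match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: probability both pick the same label if each
    # chose independently at their own marginal rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

κ = 1 means perfect agreement, κ = 0 means no better than chance, so the experts' κ = .84 versus the matched controls' .71 and .79 is the basis for the claim that experts converge more.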

The same superiority of the expert group exists at an individual level as well, where disaggregated agreement will be lower. The average inter-rater Kappa coefficients of agreement across the 5 responses of the 141 items, for every pair of raters within the expert group, was κ = .43, which significantly exceeded the average Kappas of the two control groups, κ = .31 and .38 (z's = 4.8 and 1.85; p < .05 to .01, one-tailed test).

Because general and expert groups both chose similar response alternatives as correct, and experts have higher inter-rater reliability in identifying such correct alternatives, members of the standardization sample should obtain somewhat higher test scores when the experts’ criterion is used (before normative scaling corrections are applied). Moreover, the expert group should obtain the largest score advantages on skill branches where the experts most agree, owing to the experts’ greater convergence for those responses. For example, one might expect increased expert convergence for Branches 1 (emotional perception) and 3 (emotional understanding) because emotions experts have long focused on the principles of coding emotional expressions, as well as on delineating emotional understanding. By contrast, research on Branches 2 (emotional facilitation of thought) and 4 (emotion management) is newer and has yielded less consensus, and so experts might be more similar to the general sample in such domains.

To test this idea, we conducted a 4 (branch) X 2 (consensus versus expert scoring criterion) ANOVA on MSCEIT scores. The main effect for scoring criterion was significant (F(1,1984) = 3464, p < .001), indicating, as hypothesized, that participants obtained higher raw scores overall when scored against the expert criteria. The main effect for branch was significant as well (F(3,5952) = 1418, p < .001), indicating, unsurprisingly, that items on some branches were harder than others. Finally, there was a branch by scoring criterion interaction (F(3,5952) = 2611, p < .001).

Orthogonal contrasts indicated that participants scored according to the expert criterion on Branches 1 and 3 obtained significantly higher scores than when scored against the general consensus (see Table 1; F(1,1984) = 1631 and 5968, respectively; p’s < .001). Branch 2 (using emotions to facilitate thought) showed a significant difference favoring general consensus (F(1,1984) = 711, p’s < .001), and Branch 4 showed no difference (F(1,1984) = 1.57, n.s.). The advantage for expert convergence on Branches 1 and 3 may reflect the greater institutionalization of emotion knowledge among experts in these two areas.

In a final comparison of the two scoring criteria, participants’ tests were scored using the general criterion, on the one hand, and the expert criterion, on the other. The correlation between the two score sets ranged from r (2004-2028) = .96 to .98 across the Branches, Areas, and Total EIQ scores, as reported in Table 1.

The evidence from this study reflects that experts are more reliable judges, and converge on correct answers where research has established clear criteria for answers. If further studies bear out these results, the expert criteria may prove superior to the general consensus.

Reliability of the MSCEIT V2.0

The MSCEIT has two sets of reliabilities depending upon whether a general or expert scoring criterion is employed. That is because reliability analyses are based on participants’ scored responses at the item-level, and scores at the item-level vary depending upon whether responses are compared against the general or expert criterion. The MSCEIT full-test split-half reliability is r(1985) = .93 for general and .91 for expert consensus scoring. The two Experiential and Strategic Area score reliabilities are r(1998) = .90 and .90, and r(2003) = .88 and .86 for general and expert scoring, respectively. The four branch scores of Perceiving, Facilitating, Understanding, and Managing range between r(2004-2028) = .76 to .91 for both types of reliabilities (see Table 1). The individual task reliabilities ranged from a low of r(2004-2111) = .55 to a high of .88. However scored, reliability at the total scale and area levels was excellent. Reliability at the branch level was very good, especially given the brevity of the test. Compared to the MEIS, reliabilities were overall higher at the task level but were sometimes lower than is desirable. We therefore recommend test interpretation at the total scale, area, and branch levels, with cautious interpretations at the task level, if at all.

Correlational and Factorial Structure of the MSCEIT V2.0

As seen in Table 2, all tasks were positively intercorrelated using both general (reported below the diagonal) and expert consensus scoring (above the diagonal). The intercorrelations among tasks ranged from r(1995-2111) = .17 to .59, p’s < .01, but with many correlations in the mid .30’s.

Confirmatory Factor Analyses

A factor analysis of the MSCEIT V2.0 can cross-validate earlier studies that support 1-, 2-, and 4- factor solutions of the EI domain. The 1-factor, "g" model, should load all eight MSCEIT tasks. The 2-factor model divides the scale into an "Experiential" area (Perception and Facilitating Thought Branches) and a "Strategic" area (Understanding and Managing Branches). The 4-factor model loads the two designated Branch tasks on each of the 4 branches. These analyses are particularly interesting given that the MSCEIT V2.0 represents an entirely new collection of tasks and items.

We tested these models using AMOS, and cross-checked them using LISREL and STATISTICA. The confirmatory models shared in common that (a) error variances were uncorrelated, (b) latent variables were correlated, i.e., oblique, and (c) all other paths were set to zero. In the 4-factor solution only, the two within-area latent variable covariances (i.e., between Perceiving and Facilitating, and between Understanding and Managing) were additionally constrained to be equal so as to reduce a high covariance between the Perceiving and Facilitating branches.

There was a progressively better fit of models from the 1- to the 4-factor model, but all fit fairly well (4 vs. 2 factors: P