IP 200 Introduction
1. The - IP200
1. INTRODUCTION
Although INTEG specializes in Integrity in the World of Work in the South African
market, it has also developed other products that make a further contribution to
assessing Integrity from different angles and are dedicated to specific areas,
e.g. cognition, personality, etc.
The following aspects are covered:
I. A short description of the test – including the number of test-items it consists of,
the structure of the instrument, the time required to complete it, the measuring
areas (scales) it provides and the purpose served by the test.
II. Psychometric Properties of the instrument – including reliability, validity, bias,
fairness & readability.
III. General use and decision-making rules advised by the developer.
IV. An example of the Summarized Report generated by the test.
V. Technical Manual.
VI. Norms.
Considering the technical detail, uniqueness, comprehensiveness and volume of the
last two subjects, as well as the facets they have in common, it is considered
necessary to combine their description, i.e. Technical Manual & Norms, and to
present it first, together with a general introductory coverage of subject 'II',
'Psychometric Properties'.
2. TEST DEVELOPMENT MODEL
Before presenting the above specifics for the individual tests, it is necessary
to provide a brief outline of the seven steps of the traditional, well-established
model of test development that was used in developing the INTEG tests.
2.1 Conceptualizing : What are we looking for?
2.2 Operationalizing : How would this show itself?
2.3 Quantifying : How can we attach a value to what we have observed?
2.4 Pilot testing : How does the test behave in practice?
2.5 Item analysis : Does each item contribute properly to the total score?
2.6 Norm development and interpretation : What does this score mean? (Develop
and maintain norms).
2.7 Evaluation of test : Is the assessment process consistent and accurate? (Is it
reliable and valid?).
A detailed description of each step for each test does not fit the format of this
paper, but because INTEG specializes in Integrity, and the IP200 is the flagship in
this field, a more detailed application of these steps in the development of this
measuring instrument is provided in Addendum A, as an example of the composition
of the INTEG tests in more practical terms and of how the steps were applied in the
development of all the other tests. Since even the summarized information in this
addendum runs to some 40 pages, a condensed version thereof is provided here,
under the heading "Main Moments in Developing the IP200".
“MAIN MOMENTS IN DEVELOPING THE IP200
1. The IP200 was originally developed to serve as a Counseling Instrument.
2. The concept of Integrity was first studied at length, thoroughly and in depth –
by implication the conceptualizing phase.
3. The results flowing from this initial phase were used to differentiate between
behaviors, attributes, attitudes, etc. that present themselves as lower or higher
in terms of this description/concept – by implication establishing a criterion (of
Integrity).
4. The criterion so established was used to produce a 'normal' distribution of
employed people (irrespective of their jobs, experience, qualifications, industry,
race, gender, nationality, etc.) along the continuum of integrity.
5. Out of a total of 21,192 independent anecdotes/narratives produced by the
research team, 1,212 were submitted to statistical analysis and 202
phrases/items were generated that differentiated significantly between the poor-
and good-Integrity populations – reduced through a process of expert interpretation
and combination to 200 (the items the test consists of).
6. Each of these items proved, on independent analysis, to differentiate significantly
relative to an integrity-related criterion – as described above.
7. A Factor Analysis of the 200 selected, differentiating anecdotes produced 40
identifiable groups, and a further Factor Analysis of these 40 Areas produced 8
groups, representing the 8 Substructures the IP-test consists of.
8. During the statistical analysis steps of developing and introducing the IP200,
only valid and reliable information was used, i.e. only the test results produced
by testees who comprehended the test and did not try to manipulate its outcome –
i.e. a Consistency score above a 6-sten and a Lie-Detector score above a 7-sten.
9. In line with the so-called 'Full Service Policy', it was decided to reflect the 4
items with the highest Face Validity as the so-called 'Loading Items/Factors' under
each Area in the Categorization Document, to promote the effectiveness of the
counseling function amongst users of the instrument – notwithstanding the
'statistical results'.
10. Given that all the test items differentiate significantly on the Integrity
Construct, the RRA (Replacement Regression Analysis) was conducted on all 170
items to ensure an optimal loading on each Area.
11. The above introduces the Multiple Order Approach (MOA) of selecting and loading
items on the specific Areas to be assessed – according to which each of the
so-called four Loading Items functions as a multiple unit of items, each
contributing a different value/degree relative to its 'true' component for the
particular observation (item) and the degree to which it succeeds in declaring the
variance of the particular construct/area it loads on.
12. The MOA used during the development of the IP200 impacted significantly on Step
5 of the Development Process (i.e. conducting an Item Analysis to determine
whether each item contributes properly to the total score), in that each so-called
Loading Item consists of an integrated multiple set/unit of items, optimized in
terms of eliminating (so to speak) the Error Component of each relevant
item/observation – to a maximum of three observations/items per Loading Item.
13. The experimental instrument (IP200 test) was applied in practice to evaluate its
consistency and accuracy (i.e. the reliability and validity), and finally it was
submitted to the HPCSA for registration.”
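The item-selection logic sketched in steps 5 and 6 above – retaining only those items that differentiate significantly between the poor- and good-Integrity groups – can be illustrated in miniature. The item scores, group sizes and critical value below are invented for the example; this is not INTEG's actual analysis.

```python
# Illustrative sketch of criterion-group item selection (invented data):
# keep only items whose scores differ significantly between a 'good' and a
# 'poor' integrity group, as in steps 5-6 of the development narrative.
from statistics import mean, stdev

def t_statistic(a, b):
    """Two-sample t statistic (pooled variance) for one item's scores."""
    na, nb = len(a), len(b)
    sa, sb = stdev(a), stdev(b)
    pooled = (((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb) ** 0.5)

# Hypothetical item scores for the two criterion groups.
good = {"item_1": [4, 5, 4, 5, 4], "item_2": [3, 3, 2, 3, 3]}
poor = {"item_1": [2, 1, 2, 2, 1], "item_2": [3, 2, 3, 3, 2]}

CRITICAL_T = 2.31  # assumed two-tailed 5% critical value for df = 8
retained = [k for k in good if abs(t_statistic(good[k], poor[k])) > CRITICAL_T]
print(retained)  # only item_1 discriminates between the groups
```

In the real development process this screening was of course run over thousands of anecdotes against the integrity criterion, not two invented items.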
3. TECHNICAL MANUAL
The Technical Manuals of most of the tests are rather hefty documents that cover
the information summarized in this paper in great detail and run into hundreds of
pages. The technical research part thereof is covered in the Training Manual, and
it is suggested that it rather forms part of a fully-fledged training endeavour.
It is, nevertheless, available on request.
4. NORMS
In line with generally accepted international practice, the developers of the
INTEG measuring instruments use the integrated concept of norming the results in
order to compose a unitary norm for all their products. This concept is derived
from the so-called 'Integrated Multiple Composed Norm (IMCN)' initiated by the
European Psychometric Convention (EPC) of 1994.
The six steps required to accomplish the above involve an extensive and
longitudinal process, but this is justified by the obvious benefits derived from
the results.
The above does not eliminate the individual Norming Process, which is still
undertaken and reported on in the Technical Manual of each test. This is an
ongoing process depending on the different populations, circumstances and intended
uses of the test in question.
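Norming, in the conventional sense used throughout this paper, maps a raw score onto a standard scale against a norm group; the INTEG reports use the 1-10 sten scale. A minimal sketch of that conversion follows (the norm-group scores are invented, and the IMCN procedure itself is not reproduced here):

```python
# A conventional sten-norming step (illustrative only): standardise a raw
# score against a norm group and map it onto the 1-10 sten scale, which has
# a mean of 5.5 and a width of 0.5 standard deviations per sten.
from statistics import mean, stdev

def to_sten(raw, norm_mean, norm_sd):
    z = (raw - norm_mean) / norm_sd
    sten = round(2 * z + 5.5)
    return max(1, min(10, sten))  # stens are clipped to the 1-10 band

norm_scores = [42, 55, 47, 60, 51, 49, 53, 45, 58, 50]  # invented norm group
m, s = mean(norm_scores), stdev(norm_scores)
print(to_sten(51, m, s), to_sten(70, m, s))  # average score vs. extreme score
```

An average raw score lands near the middle of the scale, while extreme scores are clipped to the 1 and 10 bands.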
5. PSYCHOMETRIC PROPERTIES
The models used to determine the psychometric properties of the INTEG measuring
instruments form the subject of this generalized introduction; the results
obtained for each test are reported in general terms when the particular test is
presented in this paper.
5.1 Reliability
Reliability is a measure of the consistency with which a measuring instrument
measures. Reliability is thus the consistency with which a measure/test
achieves the same results under different conditions. If a measure achieves a
low degree of consistency, it is uncertain whether the particular instrument
really measures anything of substance. This is the first, primary 'acid test'
an instrument must pass among the successive hurdles of psychometric properties
that an accurate, successful and effective psychometric test must satisfy in
both a statistical and a practical sense.
With this as a general background, the following reliability models were applied
in the development of each INTEG-test:
5.1.1 Coefficient of Internal Consistency, where a split-half approach to the
test's items was used to determine how consistent the instrument is in an
internal sense.
5.1.2 Coefficient of Stability, where a test-retest approach was used to
determine the reliability/consistency of the instrument when applied to the
same group of people on two or more occasions – how stable the test is
over time.
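The split-half computation in 5.1.1 can be sketched as follows. Odd- and even-item half-scores are correlated, and the Spearman-Brown correction then estimates the reliability of the full-length test. The six-item responses below are invented for illustration:

```python
# A sketch of split-half internal consistency (invented data): correlate
# odd- and even-item half scores, then apply the Spearman-Brown correction
# to estimate the full-test reliability coefficient.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# Each row: one testee's item scores on a hypothetical 6-item mini-test.
responses = [
    [4, 4, 5, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
]
odd = [sum(r[0::2]) for r in responses]
even = [sum(r[1::2]) for r in responses]
r_half = pearson(odd, even)
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown step-up
print(round(r_full, 2))
```

Note that the step-up correction always raises a positive half-test correlation, since the full test is twice as long as either half.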
5.2 Validity
The validity of a measure is the extent to which the instrument measures what
it claims, or is supposed to measure/test. In other words, validity is concerned
with the extent to which the measure is free of irrelevant or contaminating
influences. Validity is thus the ratio of the relevant score to the total or
observed score. Therefore the larger the irrelevant component, the lower the
validity. Another name for this irrelevant component is ‘bias’. Logically this
leads to the conclusion that the validity of an instrument cannot be greater
than its reliability – justifying the primary importance placed on the
concept/property of reliability above.
With this as a general background, the following validity models were applied in
the development of each INTEG-test – all of which are important, although they
apply differently in different contexts and therefore require different kinds of
evidence:
5.2.1 Construct Validity
Determining the extent to which the instrument produces results that are in
line with what is already known in the particular field of study. A proven and
popular approach here is Discriminant Validity – showing that the instrument
does not correlate with measures known to be independent of it. Similarly,
Factor Analysis is used to determine the extent to which the instrument
exhibits a factorial structure similar to that of other techniques/tests of the
same (or a related) construct.
5.2.2 Content Validity
Determining to what extent the content of the instrument accurately reflects
the domain it assesses.
5.2.3 Criterion-Related Validity
Determining to what extent the results generated by the instrument relate to
some (sound, reliable and valid) external criterion of success in the particular
field. There are two forms of criterion-related validity model, namely:
Concurrent Validity
- determining the extent to which the instrument successfully distinguishes
between known groups relative to the criterion of success.
Predictive Validity
- determining the extent to which the instrument successfully predicts how
(unknown) groups may differ in the future regarding the selected criterion of
success.
5.2.4 Face Validity
Part of Content Validity, is the notion of Face Validity. Determining the extent
to which the instrument appears (especially to the uninformed) to be doing
what it claims to be doing – i.e. does the instrument, and the items it consist of,
seem to be appropriate?
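The criterion-related validity models above reduce, computationally, to correlating test scores with an external success criterion, and classical test theory bounds that coefficient by the square root of the test's reliability. A minimal sketch follows; the scores, criterion values and reliability figure are all invented:

```python
# Illustrative criterion-related validity check (invented numbers): the
# validity coefficient is the correlation between test scores and a later
# external success criterion; classical test theory bounds it by the square
# root of the test's reliability.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

test_scores = [52, 61, 45, 70, 58, 49, 66, 55]          # hypothetical testees
performance = [3.4, 3.1, 2.9, 4.2, 3.0, 3.3, 3.6, 3.8]  # later criterion ratings

validity = pearson(test_scores, performance)
reliability = 0.88  # assumed reliability of the hypothetical test
print(round(validity, 2), validity <= reliability ** 0.5)
```

With these invented figures the validity coefficient falls in the moderate range typical of criterion-related studies, well inside the reliability-imposed ceiling.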
5.3 Bias, Fairness & Discrimination
Bias can best be described as the systematic error in measurement or research
that affects one group (e.g. a race, age or gender group) more than another.
Unlike random error, bias can be controlled for.
Fairness on the other hand, is the extent to which assessment outcomes are
used in a way that does not discriminate against particular individuals or
groups.
It is clear that these two concepts overlap. In the development of the INTEG
tests the so-called 'Norming Process' was applied as a (statistical) practical
approach: a wide variety of factors 'known' to be 'sensitive' to bias, fairness
and/or discrimination (such as age, gender, ethnicity, language, etc.) were
each sub-divided into two categories (e.g. young and old) and the results of
the test were correlated with a multi-dimensional (external) success criterion.
If the obtained sets of correlations differ to a significant degree for a
particular sub-divided group, the probability that the instrument
measures/predicts unfairly on the specific factor (on which it was sub-divided)
is considered strong. The opposite is also true. The model used is commonly
known as the 'Sub-Division Norming Process'.
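One standard way to compare the two subgroup correlations produced by such a sub-division is Fisher's r-to-z test. The sketch below is illustrative, not INTEG's published procedure; the correlations, sample sizes and significance criterion are invented:

```python
# A hedged sketch of the sub-division check described above: compute the
# test-criterion correlation separately for the two halves of a 'sensitive'
# factor (e.g. young vs. old) and compare them with Fisher's r-to-z test.
# Correlations and sample sizes below are invented.
import math

def fisher_z_diff(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher r-to-z transform
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

z = fisher_z_diff(0.55, 150, 0.49, 140)  # young vs. old subgroup validities
biased = abs(z) > 1.96                   # assumed two-tailed 5% criterion
print(round(z, 2), biased)
```

Here the small difference between 0.55 and 0.49 does not reach significance, so this (invented) factor would not flag the instrument as measuring unfairly.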
5.4 Readability
Although language per se is not categorized as a psychometric property (except
for being known as a 'sensitive factor' in terms of the concept of fairness),
it can play a determining role in test administration and interpretation. In
addition to using language experts and doing practical trial runs with the
particular test, with the purpose of minimizing the differential effect of the
language used in the test (e.g. to the point of totally eliminating verbal
test items in the COPAS), the Fry Readability Graph was used during development
to ensure that the language used was at a low 'complexity' level – and, of
course, Language is always included as a 'known sensitive factor' in the
'Sub-Division Norming Process' during the seventh and last step of the Test
Development Model. Continuous attention is given to the language issue in
verbal/text-based tests: statistical analysis of all items is performed and
feedback is gathered from users of the test in different situations, to ensure
that the items, words and sentences in the test are properly comprehended and
serve their intended purpose.
In summary, the following seven actions are taken to ensure effective and
optimal 'Readability' in text-based tests:
- Using language experts to formulate texts during test development.
- Conducting practical trial runs with the test during development.
- Applying the Fry Readability Graph during test development.
- Using 'Language' as a given 'critical/sensitive' factor in the 'Sub-Division
Norming Process' during the Evaluation of the Test in the last (7th) step of
the Test Development Model.
- Performing continuous statistical analysis on items used in text-based tests.
- Gathering and implementing 'post-mortem' information on tests used in practice
– especially when tests are applied for the first time to particular groups and
under specific circumstances or conditions.
- Translating tests when necessary.
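The Fry Readability Graph works from two counts taken on a 100-word sample: the average number of sentences and the average number of syllables, plotted against a grade-level grid. The sketch below shows only those input counts; the vowel-group syllable counter is a crude heuristic, not the Fry procedure itself, and the sample text is invented and far shorter than 100 words:

```python
# A rough sketch of the inputs to the Fry Readability Graph: sentence count
# and syllable count for a word sample. The vowel-group syllable counter is
# a crude heuristic used only for illustration.
import re

def fry_coordinates(sample):
    words = sample.split()
    sentences = len(re.findall(r"[.!?]+", sample))
    # Count contiguous vowel groups per word as an approximate syllable count.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return sentences, syllables  # plot these on the Fry graph for a grade band

text = ("The test is short. Each item is plain. You read it and you mark it. "
        "No item is a trick.")  # invented sample, for illustration only
print(fry_coordinates(text))
```

Short sentences and mostly one-syllable words place a sample low on the Fry grid, which is the 'low complexity' target described above.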
I. GENERAL DESCRIPTION
The IP200 is the flagship of the dedicated Integrity tests. It consists of 10
substructures, each with 5 measuring areas. It is a very comprehensive diagnostic
and developmental instrument that provides the user with more than 60 scales with
which to measure the complex concept of Integrity, provide feedback to testees,
make predictions about future behaviour and/or develop Integrity on an individual
or corporate basis. It consists of 200 test items, declares approximately 88% of
the total variance, takes approximately an hour to administer, and includes Lie,
Consistency and Unnatural Exaggeration factors. The test can be completed using
either the 'pencil-and-paper' or the 'on-screen' approach; the scoring is
completely computerised, and the user may choose an 'on-line' approach for the
entire administrative process. The latter applies to all the INTEG tests.
● Measuring Areas – Scales
Ten Substructures of Integrity, plus the five measuring areas each of the ten
Substructures consists of:
1. Socialisation
2. Trustworthiness
3. Credibility
4. Work Ethics
5. Attitudes – Integrity Constraining
6. Functional vs. Dysfunctional Behaviour
7. Manipulative Abuse of Power – Non
8. Values
9. Transformation Commitment & Man Integrity
10. Monitor, Lie, Consistency & Exaggeration
● Purpose
Comprehensive Integrity Measure in the World of Work:
- Representative
- Detailed
- Diagnostic
- Clinical
- Development – (a Training Module is registered with the Service SETA in each
of the 10 Areas to Develop Integrity)
- Selection
- Rolling out the Culture of Integrity
- Organisational Development
- Investigative Orientation
- Career Planning & Development
II PSYCHOMETRIC PROPERTIES
Reliability : Ranging from 0.84 to 0.92 (Significant)
Validity : Ranging from 0.42 to 0.66 (Significant)
Fairness (Norming Process) : “Although the Standard Error of the Mean and the Standard
Error of Measurement were both calculated for all four variants of the above
demographics, no significant results were obtained” and “The T-test to
establish whether significant statistical differences exist rendered ‘no
significant differences’ at the lowest level – i.e. 0.001”.
Readability & Ease of Comprehension : “A maximum of the sixth grade on the Fry
Readability Graph was never exceeded.” There is also no time restriction
applicable to the completion of the test, and a trained ‘administrator’ is
always available to allow testees to clear up any doubts they may have
regarding the meaning of words or sentences.
The Seven Actions regarding ‘Readability’ (described earlier in this paper),
were/are applied in the IP200.
III GENERAL USE AND DECISION-MAKING RULES
- The IP200 can be used at a Grade 8 schooling level.
- It consists of 200 questions; hence the name.
- Consider a score below a 6-sten on the Lie-Detector scale as a knock-out score.
- The IP200 must be interpreted by a registered psychologist.
- If the IP200 is being interpreted for the first time, adhere to the so-called
‘5-Step Interpretation Process’.
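The knock-out rule above is mechanical enough to express as a one-line check. This is an illustrative sketch only; the function name and threshold parameter are assumptions, with the threshold taken from the rule as stated:

```python
# A minimal sketch of the knock-out rule quoted above (names assumed):
# a Lie-Detector sten below 6 disqualifies the profile from interpretation.
def passes_lie_detector(lie_sten, knockout_below=6):
    """Return True if the Lie-Detector sten clears the knock-out threshold."""
    return lie_sten >= knockout_below

print(passes_lie_detector(7), passes_lie_detector(4))
```

A sten of exactly 6 passes, since only scores below 6 are treated as knock-outs.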