Any opinions, findings, conclusions, or recommendations expressed in this dissertation are those of the authors and do not necessarily reflect the views of UKDiss.com.

Self-criticism self-report measures: Systematic Review

Info: 23354 words (93 pages) Dissertation
Published: 28th Jan 2022

Tagged: Psychology, Therapy

Abstract

Objectives

Self-criticism is a transdiagnostic process that has been attracting recent research and clinical interest. This systematic review identified and evaluated the measurement properties of self-report questionnaires of self-criticism.

Methods

A systematic review was performed using four databases and a search of the grey literature was undertaken. Studies were included when the main focus was to evaluate the measurement properties of English versions of scales or subscales that aimed to measure self-criticism in adults. Both the methodological quality of included studies and the specific measurement properties were evaluated; these ratings were then combined into a best evidence synthesis.

Results

Five scales and five subscales were identified, described in 14 papers. The scales were designed to measure different types of self-critical thinking including trait self-criticism, repetitive self-criticism and self-criticism in response to difficult situations or as a mood regulation strategy. The majority of the included studies were either rated as having poor methodological quality, or were given indeterminate or negative ratings for the measurement properties they reported.

Conclusions

Only tentative recommendations could be made about two measures of self-criticism based on existing evidence; future high quality research is required. Questionnaire choice should also include consideration of the type of self-criticism that the clinician or researcher wishes to assess.

Contents

1. Introduction

1.1 Self-criticism

1.2 Measures of self-criticism

1.3 Rationale of current systematic review

2. Objective

3. Method

3.1 Search strategy

3.2 Search terms

3.3 Selection criteria

3.5 Selection process

3.6 Data extraction

3.7 Quality assessment

3.7.1 Step one – assessment of the methodological quality of studies

3.7.2 Step two – quality assessment of instruments

3.7.3 Step three – Best Evidence Synthesis (BES)

3.8 Measurement properties

3.8.1 Measurement properties in this study

4. Results

4.1 Selection of studies

4.2 Questionnaires found in search

4.3 Self-criticism as a trait

4.3.1 Self-Critical Cognition Scale (SCCS)

Reliability

Validity

Best Evidence Synthesis (BES)

4.3.2 Levels of Self-Criticism Scale (LSCS)

Reliability

Validity

Best Evidence Synthesis (BES)

4.3.3 Attitudes Towards Self Scale (ATSS)

Reliability

Validity

Best Evidence Synthesis (BES)

4.3.4 Attitudes Towards Self Scale-Revised (ATSR)

Reliability

Validity

Best Evidence Synthesis (BES)

4.3.5 Temperament & Personality Questionnaire (TPQ)

Reliability

Validity

Best Evidence Synthesis (BES)

4.4 Self-criticism in response to difficult situations

4.4.1 Forms of Self-Criticising/Attacking and Self-Reassuring Scale (FSCRS)

Reliability

Validity

Best Evidence Synthesis (BES)

4.4.2 Self-Compassion Scale (SCS)

Reliability

Validity

Best Evidence Synthesis (BES)

4.5 Self-criticism as a mood regulation strategy

4.5.1 Inventory of Cognitive Affect Regulation Strategies (ICARUS)

Reliability

Validity

Best Evidence Synthesis (BES)

4.6 Measures of repetitive self-criticism

4.6.1 Habit Index of Negative Thinking (HINT)

Reliability

Validity

Best Evidence Synthesis (BES)

4.6.2 Self-Critical Rumination Scale (SCRS)

Reliability

Validity

Best Evidence Synthesis (BES)

5. Discussion

5.1 Self-criticism scales and subscales

5.1.1 Self-criticism as a trait

5.1.2 Self-criticism in response to difficult situations

5.1.3 Self-criticism as a mood regulation strategy

5.1.4 Measures of repetitive self-criticism

5.2 Assessing the methodological quality of included studies

5.2.1 Issues with content validity

5.2.2 Issues with reliability

5.2.3 A COSMIN ‘fair’ rating

5.2.4 Assessing face validity?

5.3 Recommendations

5.4 Limitations

5.5 Conclusions

References

Appendices contents page

Appendix 1. Table 2 Questionnaire Characteristics

Appendix 2. Table 3 Study Characteristics

Appendix 3. Table 4 Quality criteria for measurement properties assessed

Appendix 4. Table 5 Ratings for methodological quality and measurement properties

Appendix 5. Table 6 Construct validity – ratings for methodological quality and measurement property

List of Tables

Table 1 Best Evidence Synthesis (BES)

1. Introduction

1.1 Self-criticism

Self-criticism has been defined as a self-evaluative process where individuals judge aspects of themselves in a negative or harsh way (Shahar et al., 2015a). Experiencing self-criticism has been reported in the general population (Kupeli et al., 2013) and across a range of settings including sport (Anshel & Sutarso, 2010) and academia (Powers et al., 2011). It is thought to be closely related to shame (Smart et al., 2015), as well as with lower levels of self-compassion (Neff, 2003). A large amount of research has focused on its relationship with perfectionism; self-criticism is thought to be a central component of “perfectionistic concerns”, a negative form of perfectionism (Powers, Zuroff & Topciu, 2004; Bergman, Nyland & Burns, 2007). Furthermore, it has been suggested that self-critical elements of perfectionism are key to the link between perfectionism and depression (Gilbert, Durrant & McEwan, 2006).

As expected, higher levels of self-criticism have been reported in clinical populations compared with non-clinical populations (Baiao et al., 2014). Self-criticism is thought to be a transdiagnostic process as it has been associated with a number of different mental health problems. Previous research has particularly focused on its association with depression (Cox et al., 2004a; Luyten et al., 2007). Dunkley et al (2009) found that self-criticism predicted symptoms of depression and global psychosocial impairment across a 4-year period. Furthermore, self-criticism has been found to mediate the relationship between shame and depression (Pinto-Gouveia et al., 2013).

Self-critical individuals are also more likely to experience a range of other clinical difficulties including suicidality (O’Connor & Noyce, 2008), social anxiety (Cox, Fleet & Stein, 2004b; Shahar et al., 2015b), eating disorders (Fennig et al., 2008), compulsive exercise (Taranis & Meyer, 2010), binge eating disorder (Dunkley, Masheh & Grilo, 2010), Post-Traumatic Stress Disorder (Cox et al., 2004c; Harman & Lee, 2009) and persecutory delusions (Hutton et al., 2012).

In treatment studies, self-critical individuals have greater difficulties establishing and maintaining therapeutic relationships (Whelton, Paulson, & Marusiak, 2007), as well as worse therapeutic outcomes (Rector et al., 2000; Marshall et al., 2008). As self-critical individuals appear to be vulnerable to a wide range of mental health problems, and possibly have difficulties engaging in treatment, research has also begun to focus on treatments specifically targeting self-criticism (see Kannan & Levitt, 2013 for a review, as well as Shahar et al., 2012; 2015a for example treatment studies).

1.2 Measures of self-criticism

Different research groups have conceptualised self-criticism in different ways. As a consequence, a number of self-report questionnaires measuring self-criticism have been developed. These differ in terms of design, structure and content. Some questionnaires are designed to measure self-criticism as a single factor whereas others assess different forms of self-criticism. A number of questionnaires have been developed that contain a subscale measuring self-criticism as one component of a broader construct, such as personality traits associated with depression (Parker et al., 2006) or self-compassion (Neff, 2003). Furthermore, as well as focusing on the content of self-critical thinking, measures have been developed that define self-criticism, or negative self-thinking, as a mental habit, with more of a focus on its process, in terms of frequency or repetitiveness, controllability and level of awareness (Verplanken et al., 2007).

As no ‘gold standard’ questionnaire has been identified, some researchers have also attempted to measure self-criticism by using a mixture of items taken from different measures (for example, Cox et al., 2004a), or have used questionnaires that were not originally developed to measure self-criticism, such as the Dysfunctional Attitude Scale (DAS) (Weissman & Beck, 1978) or the original or revised versions of the Depressive Experiences Questionnaire (DEQ) (Blatt, 1976; Welkowitz, Lish & Bond, 1985; Bagby et al., 1994; Viglione et al., 1995; Santor, Zuroff & Fielding, 1997). Although the DEQ contains a factor called ‘self-criticism’, this factor aims to measure ‘introjective depression’, rather than the construct of self-criticism.

1.3 Rationale of current systematic review

The accurate measurement of clinical constructs with valid and reliable questionnaires is crucial. Having multiple measures of self-criticism creates a number of difficulties for both researchers and clinicians, especially when it is unclear which questionnaires are of adequate psychometric quality (de Boer et al., 2004). Firstly, it is difficult to choose an appropriate questionnaire to use in a research study. This may be particularly problematic when a questionnaire that has been validated in a non-clinical population is to be used with different patient groups. Secondly, if different measures are used, comparing results across research studies becomes difficult. Finally, if researchers use questionnaires that were not originally designed to measure self-criticism, or select items from different measures, it may lead to uncertainty about the interpretation of their findings and conclusions.

2. Objective

The purpose of this systematic review was to identify and evaluate the measurement properties of self-report questionnaires of self-criticism. The characteristics (for example, length, content area and response options) and psychometric properties are reviewed, and recommendations about the potential clinical and research utility of the different measures are made. This systematic review therefore allows an evaluation of current measures of self-criticism, as well as a direct comparison between these measures. It is hoped that it will help both researchers and clinicians to make “evidence-based decisions” (Abma et al., 2012, p. 6) about which questionnaire is most appropriate for a particular context.

3. Method

3.1 Search strategy

OvidSP and Web of Science (WoS) were used to search through a number of databases. In WoS, the Core collection & Medline were both selected, excluding case reports, and refined by English language. In OvidSP, PsycINFO, Ovid Medline (R) (1946 to date of search) and Embase Classic+Embase (1947 to date of search) were selected, with English language added as a limit. The initial search took place in June 2015 and the search was updated in February 2016.

In order to account for publication bias, an initial scoping search of some grey literature databases was completed in June 2015 (Mahood, Van Eerd & Irvin, 2014). The initial grey literature search meant that the author was fairly certain that no relevant unpublished papers were being excluded; therefore only published articles were included in the review. The grey literature search terms and databases are listed below:

Grey literature search terms:

“self criticism” AND psychometric

Grey literature databases:

3.2 Search terms

The search terms used were:

“self critic*” OR “inner critic*” OR “negative think*” OR “negative self statements” OR “self judg*” OR “self attitude*” OR “attitude* toward self”

AND

Psychometric* OR reliab* OR valid* OR reproducib* OR construct* OR develop* OR creat* OR assess*

The search terms related to self-criticism were chosen to maintain the specificity of this systematic review, but also included broader terms (for example “negative think*”) to reduce the risk of excluding potentially relevant papers. The use of broader search terms was also necessary because this systematic review included subscales of self-criticism and it was possible that authors may not have included all of the subscale names in the abstract. Of note, self-esteem was not included as a search term in this systematic review. Although it could be argued that the degree of self-criticism may be associated with one’s level of self-esteem, this review considered them to be related but distinct constructs.

In relation to the psychometric search terms, a number of these were chosen from a previously developed search filter (Terwee et al., 2009). A scope of previous systematic reviews focused on psychometric properties was also completed which informed the decision to add in some additional search terms. Furthermore, COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) recommends not using a search term for the “type of measurement instrument”, for example, questionnaire, inventory, scale, etc. It suggests that, because of the wide range of terminology used to describe questionnaires, this increases the risk of some being inappropriately excluded (COSMIN, 2016A).
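The way the two term blocks in this section combine can be shown with a short sketch. It is only an illustration: the function name `build_query` is hypothetical, straight quotation marks are used instead of the typographic ones above, and the exact truncation and phrase syntax accepted by OvidSP or Web of Science is not taken from the source and may differ.

```python
# Term blocks copied from Section 3.2. The helper below is illustrative only
# and is not the author's actual search script.
CONSTRUCT_TERMS = [
    '"self critic*"', '"inner critic*"', '"negative think*"',
    '"negative self statements"', '"self judg*"', '"self attitude*"',
    '"attitude* toward self"',
]
MEASUREMENT_TERMS = [
    "Psychometric*", "reliab*", "valid*", "reproducib*",
    "construct*", "develop*", "creat*", "assess*",
]


def build_query(construct_terms, measurement_terms):
    """OR the terms within each block, then AND the two blocks together."""
    construct_block = " OR ".join(construct_terms)
    measurement_block = " OR ".join(measurement_terms)
    return f"({construct_block}) AND ({measurement_block})"


query = build_query(CONSTRUCT_TERMS, MEASUREMENT_TERMS)
print(query)
```

A record is retrieved only if it matches at least one self-criticism term and at least one measurement term, which is why broad terms in either block widen the search without removing the requirement for the other block.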

3.3 Selection criteria

The inclusion and exclusion criteria were as follows:

Inclusion

  1. Full text original article published in English in a peer reviewed journal;
  2. The main focus of the article was to describe the development or evaluation of the measurement properties of the self-report questionnaire (or interview-schedule);
  3. The mentioned self-report questionnaire aimed to measure self-criticism (or a synonym of self-criticism) (either the whole measure, or a sub-scale);
  4. The measure or subscale focused on self-criticism in a general way across different life domains (rather than focusing on self-criticism about one particular activity, for example, sport or education);
  5. The measure or subscale that was used was the English version of the particular questionnaire;
  6. The article assesses the questionnaire using the adult population (either clinical or non-clinical).

Exclusion

  1. Studies assessing questionnaires of self-criticism with children, young people or more than one adult (for example, romantic couples);
  2. The items of the self-report questionnaire could not be extracted or located (after internet searches completed, inter-library loan requests made & authors directly contacted);
  3. Articles that focused on the psychometric properties of a questionnaire that had been translated into a different language;
  4. Opinions, reviews, editorials, conference summary posters & unpublished dissertations were also excluded.

The inclusion criteria are in line with the COSMIN guidelines, which state that the focus of the study for inclusion in such a systematic review “should be the development or evaluation of the measurement properties of a measurement instrument” (COSMIN, 2016A). Focusing only on psychometric studies is also in line with a number of previous systematic reviews that have used COSMIN (for example, Abma et al., 2012; Weldam et al., 2013).

Of note, only English versions of questionnaires were included; Schellingerhout et al (2012) point out that pooling results from original and translated versions of questionnaires could result in inconsistent findings.

3.5 Selection process

After the initial search, the references were exported into EndNote and then into Excel. The titles and abstracts were screened by the main author. After this, full texts were found through EndNote, OvidSP, inter-library loans, Senate House (a central library for the University of London) and by contacting the authors directly. Authors were also contacted if a copy of the questionnaire items was not included in the original development paper. At the full-text stage, where unclear, the inclusion or exclusion of particular questionnaires or studies was discussed further with the author’s supervisor. Reference checking and citation tracking (using OvidSP and Google Scholar) were then carried out on studies that met the inclusion criteria.

3.6 Data extraction

The following data was extracted from the included studies:

  1. Questionnaire characteristics;
  2. Study characteristics (as recommended by Mokkink et al (2012) this data was extracted through completing the Interpretability and Generalisability COSMIN boxes);
  3. Evaluated measurement properties of the questionnaires.

3.7 Quality assessment

The quality assessment was completed in three stages. To ensure that the included articles met the inclusion / exclusion criteria and that the quality assessment was accurate, five articles were double rated by two independent reviewers: the main author and another trainee clinical psychologist who was familiar with COSMIN. The strength of agreement between reviewers was ‘very good’ [κ = 0.88, p < 0.0005] (Altman, 1999).
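For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of how Cohen's kappa is computed from two raters' categorical judgements. It assumes the unweighted form of kappa (the source does not say whether a weighted version was used), and the function name and any ratings passed to it are hypothetical rather than the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings, for illustration only.
reviewer_1 = ["poor", "poor", "fair", "fair"]
reviewer_2 = ["poor", "fair", "fair", "fair"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

Kappa corrects raw percentage agreement for the agreement that would be expected by chance, so it is lower than the simple proportion of matching ratings whenever some chance agreement is possible.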

3.7.1 Step one – assessment of the methodological quality of studies

Before a questionnaire can be used, research should evaluate its measurement properties, and these studies should be of high methodological quality (Mokkink et al., 2010a). As Terwee et al (2012) point out, if the methodological quality of a study is adequate, its results can be deemed valid and appropriate conclusions can be drawn from them, i.e. the study can truly assess whether the instrument is useful. On the other hand, if the quality of a study is poor, it remains unclear what conclusions can be drawn from its results.

The methodological quality of the included studies was assessed using COSMIN (Mokkink et al., 2010a) (a copy of the COSMIN checklist was retrieved online – see COSMIN, 2016B). COSMIN is the only specific tool focused on methodological quality assessment. The following domains are covered by COSMIN: internal consistency, reliability, measurement error, content validity, construct validity (divided into structural validity, hypothesis testing and cross-cultural validity), criterion validity and responsiveness. There are three additional boxes within COSMIN; the first is only completed if a study uses Item Response Theory (IRT) methods, an assessment method that takes both the item characteristics and participants’ trait or ability levels into account (An & Yung, 2014). The final two boxes focus on Interpretability, i.e. the degree to which one can interpret qualitative meaning from quantitative scores, and Generalisability, i.e. how generalisable the results of a study are.

COSMIN was completed using a 4-step procedure (Mokkink et al., 2012). In step 1, the author determined which properties were assessed in the specific study, and therefore which COSMIN boxes needed to be completed. Sometimes, within one study the same measurement property had been assessed in multiple participant groups, or different participant groups had been used to assess different measurement properties; in both cases the relevant COSMIN boxes were completed more than once for that study. As pointed out by Mokkink et al (2012), this step required subjective judgement as studies used different terminology for measurement properties. In step 2, the author determined whether IRT methods had been used by the study and, if they had, the IRT box was completed. In steps 3 and 4, the author completed the boxes that corresponded to those marked out in step 1, as well as the Interpretability and Generalisability boxes.

For each study or sample, the methodological quality for a particular measurement property was rated by a series of items on a 4-point nominal rating scale: poor, fair, good, and excellent (Terwee et al., 2012). For each measurement property, an overall score is determined by the lowest rating of any item, i.e. the “worst score counts”.
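The “worst score counts” rule described above can be expressed in a few lines. This is a minimal sketch: the function name and the example item ratings are hypothetical, and only the four rating labels and the take-the-lowest rule come from the text.

```python
# Ratings ordered from lowest to highest, as listed in the text.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def cosmin_box_score(item_ratings):
    """Return the overall score for one COSMIN box: the lowest item rating."""
    if not item_ratings:
        raise ValueError("At least one item rating is required")
    # list.index raises ValueError for labels outside the 4-point scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one box, for illustration only.
print(cosmin_box_score(["excellent", "good", "fair", "excellent"]))
```

One consequence of this rule is that a single weak item, such as not reporting how missing items were handled, caps the rating for the whole measurement property regardless of how well the other items score.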

3.7.2 Step two – quality assessment of instruments

The assessment of the quality of each questionnaire was completed using the criteria proposed by Terwee et al (2007) (see Appendix 3). Of note, no criterion is provided for structural validity. Instead, for exploratory factor analyses, criteria outlined by previous systematic reviews were used (e.g. Schellingerhout et al., 2012), and for confirmatory factor analyses, criteria were devised by the author after consultation with two other trainee clinical psychologists familiar with systematic reviews of measurement properties (see Appendix 3).

3.7.3 Step three – Best Evidence Synthesis (BES)

A BES was completed to summarise the evidence of the measurement properties for each questionnaire. The results of different studies were combined taking account of the number and methodological quality of the studies, the results of the measurement properties that were evaluated, as well as the consistency of results across studies. Each questionnaire was given an overall rating using criteria similar to those proposed by the Cochrane Back Review Group (see Furlan et al., 2009; van Tulder et al., 2003). The criteria were adapted from Schellingerhout et al (2012) and have been used by a recent systematic review (Heinl et al., 2016).

3.8 Measurement properties

The terminology and definitions of measurement properties used in this systematic review are taken from Mokkink et al (2010b). The measurement properties are divided into 3 domains: reliability, validity and responsiveness. Reliability consists of internal consistency, reliability and measurement error. Validity contains content validity (including face validity), construct validity (further subdivided into structural validity, hypotheses testing, and cross-cultural validity) and criterion validity. The term responsiveness is used for both the domain and measurement property.

3.8.1 Measurement properties in this study

In this systematic review, only English versions of questionnaires were included; therefore cross-cultural validity was not assessed. Furthermore, since there is no ‘gold standard’ measure of self-criticism, criterion validity was not assessed. Of the included studies, no information was provided for responsiveness and measurement error (called ‘agreement’ in Terwee et al (2007) criteria). Focusing on the COSMIN interpretability box, information was only provided about how missing items were handled, and scores (i.e. means and standard deviations), meaning that the Terwee et al (2007) properties ‘floor and ceiling effects’ and ‘interpretability’ were not completed. Apart from those mentioned above, all other properties were assessed as part of steps 1 and 2 of the quality assessment.

4. Results

4.1 Selection of studies

The PRISMA flowchart is displayed in Figure 1. The database search resulted in a total of 4414 papers. The grey literature search resulted in an additional 11 papers. At this stage, an unpublished version of a questionnaire of self-criticism (the Self-Critical Rumination Scale) was found and, through contact with the author, the (very recently) published paper was included for screening; the total found through other sources was therefore 12.

Removing duplicates left 2693 papers that were screened. After screening these papers’ titles and abstracts, 2557 papers were excluded. The full texts of 136 papers were reviewed and 125 of these were excluded. Of the 11 included papers, reference checking and citation tracking each resulted in the addition of 1 paper.

One additional paper (the Temperament & Personality Questionnaire development paper) was found through manual searching, from a study that was screened at the full text stage. Although the study did not meet the inclusion criteria, through an internet search and contact with the author, the questionnaire’s original development paper was found and included.
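The counts reported in this section and in Figure 1 can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text and flowchart; the variable names are illustrative, and the number of duplicates removed is derived here rather than reported in the source.

```python
# Counts as reported in Section 4.1 and Figure 1.
database_records = 4414
other_sources = 12
after_duplicates = 2693
excluded_at_screening = 2557
full_text_assessed = 136
full_text_excluded = 125
exclusion_reasons = {
    "full text not in English": 1,
    "questionnaire does not measure self-criticism": 106,
    "not validation study": 13,
    "not English language version of questionnaire": 5,
}
additional_studies = {
    "reference checking": 1,
    "citation tracking": 1,
    "manual search": 1,
}

# Derived value (not reported in the source): duplicates removed.
duplicates_removed = database_records + other_sources - after_duplicates

# Each stage should follow from the previous one.
assert after_duplicates - excluded_at_screening == full_text_assessed
assert sum(exclusion_reasons.values()) == full_text_excluded

included_from_search = full_text_assessed - full_text_excluded
included_total = included_from_search + sum(additional_studies.values())
print(duplicates_removed, included_from_search, included_total)
```

The check confirms that the 11 papers retained after full-text review, plus the three papers found through reference checking, citation tracking and manual searching, give the 14 studies included in the synthesis.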

4.1.1 Figure 1 PRISMA Flow Chart

Identification

Records identified through database searching (n = 4414)

Additional records identified through other sources (n = 12)

Records after duplicates removed (n = 2693)

Screening

Records screened (n = 2693)

Records excluded (n = 2557)

Eligibility

Full-text articles assessed for eligibility (n = 136)

Full-text articles excluded, with reasons (n = 125)

  • 1 full text not in English
  • 106 questionnaire doesn’t measure self-criticism
  • 13 not validation study
  • 5 not English language version of questionnaire

Additional studies

  • 1 reference checking
  • 1 citation tracking
  • 1 manual search

Included

Studies included in qualitative synthesis (n = 14)

4.2 Questionnaires found in search

This systematic review identified five questionnaires solely measuring self-criticism and five subscales measuring self-criticism. The questionnaires were grouped into the following categories: self-criticism as a trait, self-criticism in response to difficult situations, self-criticism as a mood regulation strategy and measures of repetitive self-criticism.

The questionnaire characteristics, including a description of the focus of the questionnaire, a description of the included items, probe statements and example items, and response options, are displayed in Appendix 1. The characteristics of the included studies are displayed in Appendix 2. As stated earlier, the quality criteria for the measurement properties assessed are displayed in Appendix 3. Appendix 4 displays the ratings for the methodological quality and measurement properties for internal consistency, reliability, content validity and structural validity. Appendix 5 displays a separate table for construct validity (measured through hypothesis testing). In this Appendix table, correlation coefficients and between group comparisons are presented. Only correlations were extracted for constructs that were deemed to be the most relevant for self-criticism research. These were self-criticism using a different questionnaire, self-esteem, self-compassion, mental health (for example, depression and general measures of anxiety), perfectionism, shame and rumination.

In the next section, the results per instrument are described. In Table 1 the results for each questionnaire are summarised into a Best Evidence Synthesis (BES). The BES summarises the evidence for each questionnaire taking account of the number and methodological quality of the studies (using COSMIN), the results of the measurement properties that were assessed within each study (see Appendix 3 for criteria), and the consistency of results across different studies. The BES resulted in each questionnaire being given an overall rating determined by predefined criteria that took account of both the methodological quality rating and the measurement property rating.

Table 1 Best Evidence Synthesis (BES)

Type of questionnaire | Questionnaire | Internal consistency | Reliability | Content validity | Structural validity | Hypothesis testing
Trait | Self-Critical Cognition Scale | + | ? (limited) | Weak | + | Weak
Trait | Levels of Self-Criticism Scale | + | Not studied | Weak | ? (limited) | ? (limited)
Trait | Attitudes Towards Self Scale | – | Not studied | Weak | – | Weak
Trait | Attitudes Towards Self Scale-Revised | – | Weak | Weak | + | Weak
Trait | Temperament & Personality Questionnaire | ? (limited) | Weak | Weak | + | Not studied
Difficult situations | Forms of Self-Criticising/Attacking and Self-Reassuring Scale | ++ | Not studied | Weak | ++ | Conflicting findings
Difficult situations | Self-Compassion Scale | Conflicting findings | Weak | Weak | Conflicting findings | ++
Mood regulation | Inventory of Cognitive Affect Regulation Strategies | Conflicting findings | Weak | Weak | Conflicting findings | Conflicting findings
Repetitive self-criticism | Habit Index of Negative Thinking | Weak | Weak | Weak | Not studied | ++
Repetitive self-criticism | Self-Critical Rumination Scale | + | ? (limited) | +++ | ++ | +

Notes:

Overall rating (i) | Level of evidence (ii) | Criteria (iii)
+++ ; ? (strong) ; – – – | Strong | Consistent findings in multiple studies of good methodological quality OR in one study of excellent methodological quality
++ ; ? (moderate) ; – – | Moderate | Consistent findings in multiple studies of fair methodological quality OR in one study of good methodological quality
+ ; ? (limited) ; – | Limited | One study of fair methodological quality
Conflicting findings | Conflicting | Conflicting findings across studies
Weak | Unknown | Only studies of poor methodological quality

+ positive rating; ? indeterminate rating; – negative rating;

(i) Direction of rating (positive, indeterminate or negative) was based on the measurement property ratings (see Appendix 3);

(ii) Level of evidence was based on the methodological quality of studies;

(iii) Criteria were adapted from Schellingerhout et al (2012) and Heinl et al (2016).
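The criteria in the notes to Table 1 can be read as a small decision rule. The sketch below is one possible reading, not the author's procedure: the function name is hypothetical, and two points are assumptions that the notes do not settle, namely that poor-quality studies are ignored when better studies exist, and that a mix of study qualities is classified by counting studies at or above each quality level.

```python
QUALITY_RANK = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def best_evidence(studies):
    """Combine (methodological_quality, rating) pairs for one measurement
    property into (level_of_evidence, direction). Rating is '+', '?' or '-'.
    """
    # Assumption: studies of poor quality do not count towards the synthesis.
    usable = [(q, r) for q, r in studies if QUALITY_RANK[q] > 0]
    if not usable:
        return ("unknown", None)  # shown as "Weak" in Table 1
    ratings = {r for _, r in usable}
    if len(ratings) > 1:
        return ("conflicting", None)
    direction = ratings.pop()
    ranks = [QUALITY_RANK[q] for q, _ in usable]
    good_or_better = sum(rank >= 2 for rank in ranks)
    if max(ranks) == 3 or good_or_better >= 2:
        level = "strong"    # multiple good studies, or one excellent study
    elif good_or_better == 1 or len(ranks) >= 2:
        level = "moderate"  # multiple fair studies, or one good study
    else:
        level = "limited"   # a single study of fair quality
    return (level, direction)


# One fair-quality study with a positive rating.
print(best_evidence([("fair", "+")]))
```

Under this reading, a single fair-quality study with a positive rating yields limited positive evidence, which corresponds to a "+" entry in Table 1.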

4.3 Self-criticism as a trait

4.3.1 Self-Critical Cognition Scale (SCCS)

Reliability

The development paper assessed internal consistency and test-retest reliability using separate samples. For both, the methodological quality of the study was rated as fair as it was not explained how missing items were handled. In terms of the measurement properties, it received a positive rating for internal consistency.

Although it had a high test-retest correlation, it was given an indeterminate rating for test-retest reliability because it did not report an intraclass correlation (ICC). Further issues around test-retest reliability were related to the lack of details about the administration of the questionnaire at time 1 and time 2, including whether the testing conditions were similar, or whether participants were stable over the specified time period.
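A brief sketch may help show why a high test-retest correlation alone does not earn a positive rating. A Pearson correlation ignores systematic shifts between time 1 and time 2, whereas an agreement-type ICC penalises them. The example assumes a two-way, absolute-agreement, single-measure ICC, which is only one of several ICC forms and is not specified in the source, and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def icc_agreement(scores):
    """Two-way, absolute-agreement, single-measure ICC.
    scores: one row per participant, one column per occasion."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical data: every participant scores exactly 5 points higher at time 2.
time_1 = [10, 12, 14, 16, 18]
time_2 = [score + 5 for score in time_1]
r = pearson_r(time_1, time_2)
icc = icc_agreement([list(pair) for pair in zip(time_1, time_2)])
print(round(r, 3), round(icc, 3))
```

In this constructed case the rank order of participants is perfectly preserved, so the correlation is at its maximum, yet the agreement ICC is well below 1 because every score shifted between occasions.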

Validity

The methodological quality of the development paper for content validity was rated as poor and it was given an indeterminate rating for this measurement property. The target population was not stated and therefore there was no involvement of them in the item development or selection.

The methodological quality for structural validity was rated as fair and it was given a positive rating for this measurement property. For hypothesis testing the methodological quality was rated as poor because there was a poor description of the comparator instruments used in the study. It was given an indeterminate rating for this measurement property; no specific hypotheses were formulated and although for the majority of variables it was possible to deduce what was expected (for example, the relationship between self-criticism and self-esteem and depression), this was unclear for other variables (for example, shyness or social desirability).

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Limited positive evidence for internal consistency and structural validity;
    • Limited indeterminate evidence for reliability;
    • Unknown evidence for content validity and hypothesis testing due to the poor methodological quality.

4.3.2 Levels of Self-Criticism Scale (LSCS)

Reliability

The psychometric properties of the LSCS were assessed in the development paper, consisting of two studies. The first study assessed internal consistency; the methodological quality was rated as fair due to lack of information about how missing items were handled, with a positive rating for this psychometric property.

Validity

For content validity, the methodological quality was rated as poor and this measurement property was given an indeterminate rating, as there was no description of the target population and no involvement of the target population in the item development. The methodological quality for structural validity was rated as fair, but it was given an indeterminate measurement property rating as the variance explained by the final factors was not mentioned.

The methodological quality for construct validity was rated as fair and it was given an indeterminate rating for this measurement property as only vague hypotheses were formulated. For example, one might expect the two subscales to relate differently to two forms of perfectionism (‘self’ versus ‘other’), however, no details were given about the direction of the expected relationships.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Limited positive evidence for internal consistency;
    • Limited indeterminate evidence for structural validity and hypothesis testing;
    • Unknown evidence for content validity due to the poor methodological quality;
    • Reliability was not studied.

4.3.3 Attitudes Towards Self Scale (ATSS)

Reliability

The methodological quality of the study for internal consistency was rated as fair. However, the measurement property was given a negative rating because the Cronbach’s alpha of the self-criticism subscale was less than 0.70.
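For context, Cronbach’s alpha for a subscale can be computed from item-level responses and then compared against the 0.70 cut-off applied above. The following is a minimal sketch in Python. The function names and the example responses are hypothetical and are not taken from any of the included studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of item scores.

    `responses` is a list of rows, one per respondent; each row holds that
    respondent's score on every item of the (sub)scale.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


def meets_threshold(alpha, cutoff=0.70):
    # 0.70 is the cut-off described in this review; values below it
    # received a negative rating.
    return alpha >= cutoff


# Hypothetical responses from five respondents to a three-item subscale.
example = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(example)
print(round(alpha, 2), meets_threshold(alpha))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a subscale with loosely related items can fall below the cut-off even when each item is sensible on its own.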

Validity

The methodological quality for content validity was rated as poor and it was given an indeterminate rating for the measurement property; there was no clear description of the target population, and the target population was not involved in item development. Structural validity was rated as fair for the methodological quality but was given a negative rating for the measurement property as the factors only explained 40% of the variance. For hypothesis testing, the methodological quality was rated as poor and it was given an indeterminate rating for the measurement property. In the study, no specific hypotheses were made a priori and it was unclear what was expected.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Limited negative evidence for internal consistency and structural validity;
    • Unknown evidence for content validity and hypothesis testing due to the poor methodological quality;
    • Reliability was not studied.

4.3.4 Attitudes Towards Self Scale-Revised (ATSSR)

Reliability

The psychometric properties of the ATSSR were assessed in one study with multiple samples. As with the original version, internal consistency was rated as fair for methodological quality and negative for the measurement property (the Cronbach’s alpha for the self-criticism subscale was 0.65).

For reliability, the test-retest correlation was low. The methodological quality was rated as poor as COSMIN states that two measurements (for time 1 and time 2) should be included in the study’s results. This measurement property was given an indeterminate rating as it assessed reliability using a statistical test other than that recommended by COSMIN (ICCs not specified).

Validity

For content validity, as in the original version, the ATSSR was given a poor rating for methodological quality and an indeterminate rating for the measurement property. Structural validity was assessed using a confirmatory factor analysis; the methodological quality was rated as fair and it was given a positive rating for this measurement property.

The methodological quality for hypothesis testing was rated as poor and it was given an indeterminate rating for this measurement property; no specific hypotheses were formulated and it was not possible to deduce what was expected. There was also a poor description of comparator instruments used.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Limited positive evidence for structural validity;
    • Limited negative evidence for internal consistency;
    • Unknown evidence for reliability, content validity and hypothesis testing due to the poor methodological quality.

4.3.5 Temperament & Personality Questionnaire (TPQ)

Reliability

The psychometric properties of the TPQ were assessed in its development paper using two community samples. The methodological quality for internal consistency was rated as fair. The measurement property was given an indeterminate rating because of the small sample size. Furthermore, not all of the subscales’ Cronbach’s alphas were greater than 0.70, and because separate values were not presented for each subscale, it is unclear what the Cronbach’s alpha was for the self-criticism subscale.

Reliability was rated as poor for methodological quality and an indeterminate rating was given for the measurement property. There appeared to be a significant difference in depression scores between time 1 and time 2, suggesting that participants were not stable across the time interval. Furthermore, the time interval (ranging from 5 to 150 days) was not considered appropriate for all participants.

Validity

The development paper was rated as poor for content validity, and an indeterminate rating was given due to no target population being specified or involved with item development. Structural validity was rated as fair for the methodological quality and positive for the measurement property.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Limited positive evidence for structural validity;
    • Limited indeterminate evidence for internal consistency;
    • Unknown evidence for reliability and content validity due to the poor methodological quality;
    • Construct validity (hypothesis testing) was not studied.

4.4 Self-criticism in response to difficult situations

4.4.1 Forms of Self-Criticising/Attacking and Self-Reassuring Scale (FSCRS)

Reliability

The psychometric properties of the FSCRS were assessed in the original development paper (Gilbert et al., 2004), as well as in two additional studies that confirmed the factor structure of the questionnaire in a general population sample (Kupeli et al., 2013), and through secondary data analysis of data that had been collected by previous research studies (Baiao et al., 2015). In all three studies, the methodological quality for internal consistency was rated as fair and they were given a positive rating for this measurement property.

Validity

Content validity was assessed in the development paper; the methodological quality was rated as poor and it was given an indeterminate rating for the measurement property as there was no clear description of the target population and no target population involvement in the item development stage.

Structural validity was assessed in all three studies; the methodological quality was rated as fair with positive ratings for the measurement property. For hypothesis testing, the three studies’ methodological quality was rated as fair. Of note, the development paper was one of the only studies to include another measure of self-criticism. Gilbert et al (2004) and Kupeli et al (2013) were given positive ratings for this measurement property. Baiao et al (2015) investigated gender differences; however, because they did not give specific hypotheses about what was expected, and it was unclear what was expected based on their results, an indeterminate rating was given.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Moderate positive evidence for internal consistency and structural validity;
    • Conflicting findings for hypothesis testing;
    • Unknown evidence for content validity due to the poor methodological quality;
    • Reliability was not studied.

4.4.2 Self-Compassion Scale (SCS)

‘Self-judgement’ was considered to be a synonym of self-criticism and thus this subscale of the SCS met the inclusion criteria. The psychometric properties of the SCS were assessed in the development paper, divided into three studies. There has also been further examination of its factor structure by Williams et al (2014) using three community samples.

Reliability

In the development paper, for internal consistency, the methodological quality was rated as fair and it was given a positive rating for the measurement property. For internal consistency Williams et al (2014) referred to the original 6 factors found in Neff (2003). However, because they did not check unidimensionality themselves, the methodological quality was rated as fair and an indeterminate rating was given for the measurement property.

For reliability the test-retest correlation was high. However, the methodological quality of the development paper was rated as poor as COSMIN specifies that two measurements must be presented in the results. The measurement property was given an indeterminate rating as ICCs were not presented.

Validity

In the development paper, content validity was assessed in undergraduates using focus groups. The comprehensibility of the items was also checked by administering items to undergraduate participants. However, because the target population was not clearly defined, and there was not enough information to assume that undergraduates were the target population, it was categorised as no target population involvement. Thus, it was given a poor rating for the methodological quality and an indeterminate rating for this measurement property.

Structural validity was rated as fair for the methodological quality of the development paper but it was given an indeterminate rating for the measurement property because the amount of variance explained by the final factors was not presented. In Williams et al. (2014), the methodological quality for structural validity was rated as excellent for two samples and as good for the third sample. A negative rating was given for the measurement property as the original factor structure was not confirmed through confirmatory factor analyses; thus, the self-judgement subscale was not replicated.

Hypothesis testing was completed in all three of the studies in the development paper. The methodological quality was rated as fair and it was given a positive rating for the measurement property, although it is important to note that correlations were not presented for each subscale separately.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Moderate positive evidence for hypothesis testing;
    • Conflicting evidence for internal consistency and structural validity;
    • Unknown evidence for reliability and content validity due to the poor methodological quality.

4.5 Self-criticism as a mood regulation strategy

4.5.1 Inventory of Cognitive Affect Regulation Strategies (ICARUS)

Reliability

The psychometric properties of the ICARUS were assessed in the development paper which consisted of three studies, and used two different populations (undergraduates and outpatients). Internal consistency was assessed in all three studies. In the first and second study the methodological quality was rated as good, but in the third study it was rated as poor due to a small sample size. For all three studies the measurement property was given an indeterminate rating as the sample size did not equate to seven times the number of items in the questionnaire.

For reliability, the test-retest correlation was low. The methodological quality was rated as poor and it was given an indeterminate rating for this measurement property. These ratings were also due to the sample size used in the study (n = 28).

Validity

In relation to content validity, the target populations were clearly defined as individuals with a range of affect regulation styles including those who respond adaptively or poorly to small hassles or traumatic events, or who develop psychological disorders associated with affect dysregulation. However, as none of the target populations were used to assess whether all of the items were relevant for them, the methodological quality was rated as poor and it was given a negative rating.

For structural validity, study one was rated as good for the methodological quality but was given an indeterminate rating for the psychometric property. Study three also assessed structural validity; although it was given a positive rating for the psychometric property, the methodological quality was rated as poor because the sample size equated to less than five times the number of questionnaire items.

All of the studies in the development paper focused on hypothesis testing. Study two focused on hypothesis testing using both an experimental design and a between-group comparison, and because of this, they were rated separately (labelled as 2A and 2B in Appendices 2, 4, & 5). Overall, there were mixed results for hypothesis testing. Two studies were rated as poor for the methodological quality with indeterminate ratings for the measurement property because no specific hypotheses were made and it was not possible to deduce what was expected (studies 1 and 2B). Study 2A focused on the predictive validity of the ICARUS using a mood induction experiment. This study was rated as fair for methodological quality and an indeterminate rating was given for the measurement property. The study design made the results difficult to interpret; although a mood induction paradigm was used, the level of distress after the mood manipulation was relatively mild and participants were only given a very short period to employ affect-regulation strategies. In study 3, the methodological quality was rated as fair and it was given a positive rating for this measurement property. However, it is important to note that no specific hypotheses were made about the self-criticism/self-blame subscale.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Conflicting evidence for internal consistency, structural validity and hypothesis testing;
    • Unknown evidence for reliability and content validity due to the poor methodological quality.

4.6 Measures of repetitive self-criticism

4.6.1 Habit Index of Negative Thinking (HINT)

Reliability

The HINT is a measure of negative self-thinking, which is considered to be a synonym of self-criticism, and thus met the inclusion criteria. The psychometric properties of the HINT were explicitly assessed in four studies within the development paper. The internal consistency was also assessed in one additional study (Verplanken, 2006).

Although the Cronbach’s alphas for the HINT were consistently high, the methodological quality of the studies was rated as poor as factor analysis had not been used to confirm unidimensionality. This led to an indeterminate rating for this measurement property.

Despite the high test-retest correlation, the methodological quality for test-retest reliability was rated as poor and an indeterminate rating was given for the measurement property. These ratings were given because ICCs were not used, and the results highlighted that 45% of participants had experienced at least one life event, suggesting that participants’ level of negative self-thinking may not have been stable during the time period.

Validity

The methodological quality of the study for content validity was rated as poor and a negative rating was given for this measurement property. These ratings were given because, although the target population was defined as a “non-clinical population” (Verplanken et al., 2007, p. 527), the items of the HINT were adapted from the Self-Report Habit Index (Verplanken & Orbell, 2003) and there was no target population involvement in adapting the items.

Hypothesis testing was assessed in four studies; all of which were given a fair rating for methodological quality due to no information regarding how missing items were handled. A positive rating was given for this measurement property as specific hypotheses were outlined and the results were in line with these. Some of these studies specifically focused on the discriminant validity of the HINT, for example, whether negative self-talk differs from general negative thoughts.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Moderate positive evidence for hypothesis testing;
    • Unknown evidence for internal consistency, reliability and content validity due to the poor methodological quality;
    • Structural validity was not studied.

4.6.2 Self-Critical Rumination Scale (SCRS)

Reliability

The SCRS is the most recently developed questionnaire of self-criticism. The psychometric properties were assessed in its development paper which consisted of four separate studies.

For internal consistency, the methodological quality was rated as fair and a positive rating was given for this measurement property. The methodological quality for test-retest reliability was rated as fair. Although the test-retest correlation was high, it was given an indeterminate rating for the measurement property because test-retest reliability was assessed using a statistical test other than what COSMIN recommends (ICCs were not used). Also, because the questionnaire was completed through an online survey, it was unclear whether the test conditions were similar for both measurements.

Validity

In study one, content validity was assessed using both undergraduates and out-patients at a mental health clinic. As enough information was provided to assume that these were the target populations, the methodological quality was rated as excellent and it was given a positive rating for this measurement property.

Structural validity was assessed using both an exploratory and confirmatory factor analysis in separate studies. For both, the methodological quality was rated as fair and it was given a positive measurement property rating.

Construct validity was assessed through hypothesis testing using a large number of self-report questionnaires, including those that measured constructs particularly pertinent to the SCRS such as rumination. The methodological quality was rated as fair and it was given a positive rating for this measurement property; specific hypotheses were formulated and the majority of the results were in line with these hypotheses.

Best Evidence Synthesis (BES):

  • The BES resulted in:
    • Strong positive evidence for content validity;
    • Moderate positive evidence for structural validity;
    • Limited positive evidence for internal consistency and hypothesis testing;
    • Limited indeterminate evidence for test-retest reliability.

5. Discussion

The aim of this systematic review was to identify and evaluate the measurement properties of self-report measures of self-criticism. It took account of the methodological quality of the studies, and a Best Evidence Synthesis (BES) was completed. This review found five questionnaires that solely focused on self-criticism, and five that had a subscale measuring self-criticism. These ten questionnaires were further subdivided into categories based on the aim of their questionnaire.

The main theme that emerged from this systematic review was that the majority of studies were either rated as having poor methodological quality, or were given indeterminate or negative ratings for the measurement properties they studied. As well as this, two key issues emerged. Firstly, self-criticism was conceptualised differently by authors, leading to questionnaires with different content and structure. Furthermore, the way that self-criticism was defined was, at times, very broad or unclear. Thinking specifically about self-report measures, not having a clear or precise definition of self-criticism could impact on the item development and consequently its measurement properties, particularly those associated with the questionnaire structure, such as internal consistency and structural validity. It could also lead to a poor theoretical basis about the relationship self-criticism has with other constructs, affecting the quality of hypothesis testing.

Related to this, the second issue was the disparity between what a questionnaire aimed to measure, and the actual items used. As the focus of this systematic review was on questionnaires that “aimed to” measure self-criticism, the individual items were not formally evaluated. Nevertheless, on inspection, some items could be construed as measuring different affect or reactions to failure, high personal standards, and other distinct but overlapping constructs such as perfectionism, shame or self-esteem. This issue is discussed further below when discussing evidence for the different scales and in relation to face validity.

5.1 Self-criticism scales and subscales

5.1.1 Self-criticism as a trait

Two questionnaires and one subscale defined self-criticism as a dispositional tendency or broad personality construct. In terms of the BES, both the Self-Critical Cognition Scale (SCCS) and the Levels of Self-Criticism Scale (LSCS) had limited positive evidence for internal consistency and the SCCS had limited positive evidence for structural validity. However, the other measurement properties consisted of a mixture of limited indeterminate or weak evidence due to the poor methodological quality. There also appeared to be issues related to the questionnaire items; the LSCS did not mention the term self-criticism and the SCCS included broader items about the inability to keep a balanced perspective, and the exaggeration of negative aspects of oneself.

Authors of the Temperament and Personality Questionnaire (TPQ) defined self-criticism as a personality construct, specifically viewing it as predisposing individuals to depression. Apart from limited positive evidence for structural validity of their self-criticism subscale from the BES, the TPQ only had limited indeterminate or weak evidence. Furthermore, although the subscale items of the 109-item version appeared to have a more specific focus on self-criticism or being tough on oneself, the TPQ research team have cautioned the use of this version due to confusion over scoring (R. Graham, personal communication, January 15, 2016). It is also unclear which items are part of the self-criticism subscale within the other versions of the questionnaire.

The Attitudes Towards Self Scale (ATSS) and the ATSS Revised (ATSSR) conceptualised self-criticism as one of three potential self-regulatory vulnerabilities to depression. Of note, however, the actual items appeared to focus on reactions to failure, rather than specifically self-criticism. The BES highlighted that there was only limited positive evidence for the ATSSR’s structural validity, and other than this, there was a mixture of limited negative or weak evidence.

Looking at these results as a whole, it would suggest that there is limited positive evidence for questionnaires that view self-criticism as a trait. It could also be argued that there are issues with conceptualising self-criticism as a personality or self-regulatory dimension as it leads to a very broad definition and studies often lack detail about how to characterise this further. Furthermore, some questionnaires that view self-criticism as a personality construct (such as the LSCS), cite the research about Blatt’s depression vulnerability theory, or the Depressive Experiences Questionnaire (DEQ) (Blatt, D’Afflitti & Quinlan, 1976). This could lead to further confusion about the conceptualisation of self-criticism because the DEQ aims to measure ‘introjective depression’ rather than the construct self-criticism. Thus, the ‘self-criticism’ factor of the DEQ contains items that reflect a range of different constructs such as guilt, emptiness and hopelessness, as well as feeling unsatisfied, unable to assume responsibility and being threatened by change. A questionnaire that is influenced by the DEQ may therefore develop items that go beyond the construct of self-criticism, and may in turn affect the validity and reliability of the measure.

5.1.2 Self-criticism in response to difficult situations

This systematic review identified one scale and one subscale that focused on self-criticism when things go wrong for someone, or in difficult times. Firstly, the Forms of Self-Criticising/Attacking and Self-Reassuring Scale (FSCRS) included items about self-criticism and other negative feelings about oneself in relation to failure such as disappointment, inadequacy and disgust. The psychometric properties were assessed in multiple studies; the BES resulted in moderate positive evidence for internal consistency and structural validity.

In terms of hypothesis testing, different studies received different ratings for this measurement property, resulting in ‘conflicting findings’ in the BES. Two studies received positive ratings as the results were in line with hypotheses made a priori. Of note, the development paper was one of the only studies to include another measure of self-criticism, thus allowing a comparison between different measures. Baiao et al (2015) received an indeterminate rating as it did not state hypotheses about expected gender differences, and based on their results (significant gender differences in only the non-clinical population), it was unclear what was expected.

The Self-Compassion Scale (SCS) conceptualised self-judgement (considered a synonym of self-criticism) as a negative component of self-compassion. The items focus on being disapproving and intolerant of an individual’s flaws and other aspects of themselves that they do not like. The SCS had moderate positive evidence for hypothesis testing in the BES; the development paper included multiple hypotheses formulated a priori and the results were in accordance with these. It received a positive rating for internal consistency in the development paper; however, the BES resulted in conflicting findings due to the indeterminate rating given to Williams et al (2014) for this measurement property. Similarly, the SCS’s structural validity was summarised as having conflicting findings. The development paper received an indeterminate rating for this measurement property as the amount of variance explained by the factors was not reported. Williams et al (2014), whose methodological quality was rated as good and excellent for different samples, completed a series of confirmatory factor analyses which did not confirm Neff’s original six-factor structure, including the self-judgement subscale, resulting in a negative rating for this measurement property. This suggests that there are potential issues with the original structure of the SCS proposed by Neff. Finally, the SCS received a weak rating for content validity in the BES. Although the items were piloted in undergraduates, the development paper also used the questionnaire with Buddhist individuals, meaning that it was unclear what the target population was.

5.1.3 Self-criticism as a mood regulation strategy

The Inventory of Cognitive Affect Regulation Strategies (ICARUS) defined self-criticism/self-blame as one of many cognitive strategies that an individual might use when experiencing negative affect. Items focus on self-criticism but also include broader items such as concentrating on, or repetitively thinking about, negative emotions. The BES resulted in conflicting findings for internal consistency, structural validity and hypothesis testing, and weak evidence for reliability and content validity due to the poor methodological quality of the studies. One consistent theme with the ICARUS was that, because it has a total of 59 items, it fell down on COSMIN items related to sample size; samples were often too small in relation to the total number of items. Future research would therefore need to use a relatively large sample size to secure better methodological ratings for this measure.
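The sample-size issue can be made concrete with simple arithmetic. The sketch below, in Python, applies the two participant-to-item ratios mentioned in the results (seven times the number of items for internal consistency, and five times for structural validity) to the 59-item ICARUS. The function name is hypothetical, and the ratios are used only as they are described in this review.

```python
def minimum_sample_size(n_items, participants_per_item):
    """Smallest sample that reaches the given participants-per-item ratio."""
    return n_items * participants_per_item


ICARUS_ITEMS = 59  # total number of ICARUS items, as stated in this review

# Ratios as described in the results section of this review.
print(minimum_sample_size(ICARUS_ITEMS, 7))  # 413
print(minimum_sample_size(ICARUS_ITEMS, 5))  # 295
```

On these figures, a sample such as the test-retest sample of 28 participants falls far short of either ratio.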

5.1.4 Measures of repetitive self-criticism

Two measures of repetitive self-criticism were identified: the Self-Critical Rumination Scale (SCRS) and the Habit Index of Negative Thinking (HINT). The SCRS focused on both the process of self-criticism in terms of frequency and repetitiveness and its content, including feeling ashamed of oneself. Its psychometric properties were evaluated in a very comprehensive development paper. The BES resulted in moderate positive evidence for structural validity and limited positive ratings for internal consistency and hypothesis testing. Although the test-retest reliability was high, it received an indeterminate rating for this measurement property as ICCs were not used. The SCRS was the only questionnaire to receive a positive rating for content validity; because the items were piloted in both a non-clinical and clinical population there was enough information to assume that these were its target populations.

The HINT was the only measure that focused solely on the process of negative self-thinking (considered to be a synonym of self-criticism) as a habit, as opposed to focusing on the content. It measured different features of the concept of a habit, such as frequency, lack of conscious intent and lack of awareness of initiation. The BES resulted in weak evidence for content validity, internal consistency and reliability due to the poor methodological quality of the studies. Methodological issues included no target population involvement in the item development stage for content validity, and the lack of stability of participants between measurements for test-retest reliability. Furthermore, although the Cronbach’s alphas were consistently high for the HINT, no factor analyses were performed to confirm unidimensionality of the scale. Although the BES resulted in moderate positive evidence for hypothesis testing, this is difficult to interpret in the context of the issues described above.

5.2 Assessing the methodological quality of included studies

This systematic review used COSMIN to assess the methodological quality of the included studies. COSMIN uses a “worst score counts” method whereby an overall score is determined by the lowest rating of any item. A number of themes emerged regarding common areas in which studies were marked down.
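The “worst score counts” rule can be illustrated with a short sketch in Python. It assumes the four-point scale used throughout this review (poor, fair, good, excellent). The function name and the example item ratings are hypothetical and do not reproduce the COSMIN checklist itself.

```python
# Four-point quality scale, ordered from lowest to highest.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the checklist items for one property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
ratings = ["excellent", "good", "fair", "excellent"]
print(worst_score_counts(ratings))  # fair
```

Under this rule a single weak item, such as missing information about how missing items were handled, caps the overall rating regardless of how well the study performs on the remaining items.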

5.2.1 Issues with content validity

Firstly, apart from the SCRS, all measures were given a poor methodological quality rating for content validity. The COSMIN item on which all of these studies fell down was question two: “was there an assessment of whether all items are relevant for the study population?” In the majority of studies the items were developed by the authors. Although this is important as they would be considered as ‘experts’ in their field, according to COSMIN, it is also crucial for studies to define their target population and use individuals from this population to assess the included items. Furthermore, inspection of the study characteristics highlighted that the majority of studies used an undergraduate population, and it cannot be assumed that this was the intended target population of each questionnaire. This issue is particularly important for self-criticism, which researchers wish to measure in a wide range of non-clinical and clinical populations. Thus, defining the target population and making sure the items are relevant for them prior to using the questionnaire is crucial for the accurate measurement of self-criticism.

5.2.2 Issues with reliability

Secondly, for test-retest reliability, only one study (Parker et al., 2006) explicitly stated that Intraclass Correlation Coefficients (ICCs) were calculated. COSMIN states that ICCs are the preferred statistical method for test-retest reliability with continuous scores, as Pearson’s and Spearman’s correlation coefficients do not take account of systematic error (Mokkink et al., 2012). Because ICCs were not used by the majority of studies, the test-retest reliability correlation has been mentioned for each scale, i.e. whether it was high or low. This seemed appropriate because, although scales received the same COSMIN rating, some had test-retest reliability correlations that were less than 0.70, which is widely accepted as being an unacceptable test-retest reliability value (Test-Retest Coefficient, 2016).
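The point about systematic error can be shown with a small worked example in Python. The scores are hypothetical, and the choice of a two-way, single-measure, absolute-agreement ICC is an assumption made for illustration; the sources cited above are not quoted here as prescribing this exact form.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_absolute_agreement(x, y):
    """Two-way, single-measure, absolute-agreement ICC for two time points."""
    n, k = len(x), 2
    grand = mean(x + y)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]  # one per participant
    col_means = [mean(x), mean(y)]                   # one per time point
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical scores: every participant scores exactly 5 points higher at time 2.
time1 = [10, 14, 18, 22, 26]
time2 = [t + 5 for t in time1]

print(pearson_r(time1, time2))               # perfect linear association
print(icc_absolute_agreement(time1, time2))  # penalised for the uniform shift
```

In this example the rank order and spacing of participants are unchanged, so the correlation coefficient is at its maximum, whereas the ICC falls below 1 because the uniform increase between time points is treated as disagreement.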

In the BES there was weak evidence for the test-retest reliability for five questionnaires (the HINT, TPQ, ATSSR, SCS and the ICARUS). Methodological issues with reliability included the use of inappropriate time intervals, not presenting two measurements (i.e. means and SDs) in the study results, and not ensuring the stability of participants between time points. Because of the consistent weak ratings for reliability, it is recommended that this should be an area for high quality future research.

5.2.3 A COSMIN ‘fair’ rating

Finally, a large number of studies were given a fair rather than a good rating across measurement properties due to a lack of information about how missing items were handled. Whilst it is assumed that all researchers carefully consider how to handle missing items, not explicitly including this information does not allow for a full interpretation of the study’s findings. For example, if there were a large number of missing items, the decision to include or exclude these from the analysis may impact on the final results.

5.2.4 Assessing face validity?

A possible limitation of COSMIN is that it does not have criteria to evaluate the face validity of each questionnaire, defined as the degree to which items of a questionnaire look as though they are an adequate reflection of the construct to be measured (Mokkink et al., 2010b). The COSMIN manual states that, because face validity involves subjective judgement, no criteria have been developed (Mokkink et al., 2012). However, without criteria for face validity, it is possible that a questionnaire could be given positive ratings for both the methodological quality of a study and the measurement properties even though the items may not actually measure self-criticism. Therefore researchers and clinicians selecting a measure are urged to check that the scale probe question, items and response ratings appear to assess the construct of interest to them, rather than focusing purely on the findings from the BES. It would be helpful for future research to focus on the development of a set of criteria to formally assess face validity. This may be particularly pertinent for constructs such as self-criticism, for which there is no universally agreed definition.

5.3 Recommendations

Tentative recommendations are given based on the current level of evidence (future high quality studies may change these recommendations). It is also important to emphasise that, due to the different conceptualisations of self-criticism, the questionnaire of choice will ultimately depend on the particular research approach or question.

Since the SCRS had consistent positive ratings, this systematic review would recommend its use in future research if the focus is on frequent or repetitive self-critical thinking. With regard to the FSCRS, because of its moderate positive evidence for internal consistency and structural validity, this systematic review would also recommend this measure for researchers or clinicians wishing to assess self-criticism in response to things going wrong, particularly if they also want to assess self-hatred separately and/or self-reassurance. It would be important for future studies to focus on the test-retest reliability of the FSCRS, as this has not yet been assessed.

Due to the limited positive evidence for questionnaires that define self-criticism as a broad personality or self-regulatory dimension, it is recommended that future research either conduct high quality studies focused on their measurement properties or develop and evaluate alternative measures. Of note, because of the lack of positive evidence for the ATSS and ATSSR, it is recommended that these subscales are not used to measure self-criticism in future research. With regard to the other subscales of self-criticism, because the evidence for the TPQ, SCS and ICARUS was limited, conflicting or weak, this systematic review cannot make strong recommendations about their use to measure self-criticism.

The HINT has a unique focus solely on the process of habitual negative self-thinking, so researchers and clinicians may be keen to use this scale. However, the methodological quality of the studies was poor, so further high quality studies are required to assess its psychometric properties.

In terms of more general recommendations, it is suggested that future studies assessing construct validity include more than one measure of self-criticism to allow for better comparison between measures. Furthermore, none of the included studies assessed every measurement property. Of note, responsiveness, defined as the ability of a questionnaire to detect clinically important changes over time (see Terwee et al., 2007; Mokkink et al., 2010b), was not assessed in any study. This would be particularly important to explore in future studies, as self-criticism is being targeted through specific interventions, and therefore these questionnaires will potentially be used as outcome measures. Lastly, future research could consider using COSMIN to aid the development of new measures of self-criticism.

5.4 Limitations

This systematic review has a number of limitations. Firstly, only including studies that focused on the English version of self-report measures may have resulted in selection bias. However, the inclusion of translated versions could have resulted in inconsistent findings regarding the measurement properties of the same questionnaire; because of this, it has been suggested that separate systematic reviews are conducted for translated versions of measures (Schellingerhout et al., 2012).

Secondly, this systematic review only included studies that specifically focused on the evaluation of psychometric properties of self-report questionnaires. It could therefore have excluded some studies with an experimental design that evaluated measurement properties as part of a wider aim, for example by calculating Cronbach’s alpha for the study population.

Finally, this systematic review only included questionnaires that specifically aimed to measure self-criticism. As highlighted previously, some studies have used measures that were not originally designed to measure self-criticism such as the Dysfunctional Attitude Scale (DAS) (Weissman & Beck, 1978) and the DEQ (Blatt, D’Afflitti & Quinlan, 1976). However, it is hoped that excluding these measures from this systematic review will act as a strong caution to future researchers, as only scales or subscales specifically designed to measure a particular construct will lead to truly valid and reliable results.

5.5 Conclusions

Valid and reliable measures of self-criticism are needed by both researchers and clinicians. This systematic review evaluated the measurement properties of scales and subscales measuring self-criticism, as well as assessing the methodological quality of included studies. Five scales and five subscales were found across 14 studies. These scales were designed to measure four main types of self-critical thinking: trait self-criticism, repetitive self-criticism, self-criticism in response to difficult situations and self-criticism as a mood regulation strategy. Across all questionnaires, there were issues with content validity, specifically around defining target populations and involving them in item development. Furthermore, although not formally evaluated, there appeared to be issues with the final items included in the questionnaires; the majority appeared to measure ideas and constructs beyond the construct of self-criticism. This highlighted the need for standard criteria to be developed for assessing face validity. Finally, although tentative recommendations were made about the use of the SCRS and the FSCRS on the basis of existing evidence, further high quality research is needed into these and some of the other scales. Due to differences between the precise focus of measures, such as self-critical rumination or self-criticism at times of difficulty, the final decision about which questionnaire to use will ultimately depend on the goals of the researcher or clinician.

References

Abma, F. I., van der Klink, J. J., Terwee, C. B., Amick, B. C. I., & Bültmann, U. (2012). Evaluation of the measurement properties of self-reported health-related work-functioning instruments among workers with common mental disorders. Scandinavian journal of work, environment & health, 5-18.

Altman, D. G. (1999). Practical statistics for medical research. New York, NY: Chapman & Hall/CRC Press.

An, X. & Yung, Y. F. (2014). Item Response Theory: What it is and how you can use the IRT procedure to apply it. Retrieved from: https://support.sas.com/resources/papers/proceedings14/SAS364-2014.pdf.

Anshel, M. H., & Sutarso, T. (2010). Conceptualizing maladaptive sport perfectionism as a function of gender. Journal of Clinical Sport Psychology, 4(4), 263-281.

Baião, R., Gilbert, P., McEwan, K., & Carvalho, S. (2015). Forms of Self‐Criticising/Attacking & Self‐Reassuring Scale: Psychometric properties and normative study. Psychology and Psychotherapy: Theory, Research and Practice, 88(4), 438-452.

Bagby, R. M., Parker, J. D., Joffe, R. T., & Buis, T. (1994). Reconstruction and validation of the Depressive Experiences Questionnaire. Assessment, 1(1), 59-68.

Bergman, A. J., Nyland, J. E. & Burns, L. R. (2007). Correlates with perfectionism and the utility of a dual process model. Personality and Individual Differences, 43, 389-399.

Blatt, S. J., D’Afflitti, J. P., & Quinlan, D. M. (1976). Experiences of depression in normal young adults. Journal of Abnormal Psychology, 85(4), 383-389.

Blatt, S. (2004). Experiences of depression: Theoretical, clinical, and research perspectives. Washington, DC: American Psychological Association.

Blatt, S.J., & Zuroff, D.C. (1992). Interpersonal relatedness and self‐definition: Two prototypes for depression. Clinical Psychology Review, 12, 527–562.

Carver, C. S., & Ganellen, R. J. (1983). Depression and components of self-punitiveness: high standards, self-criticism, and overgeneralization. Journal of Abnormal Psychology, 92(3), 330.

Carver, C. S., La Voie, L., Kuhl, J., & Ganellen, R. J. (1988). Cognitive concomitants of depression: A further examination of the roles of generalization, high standards, and self-criticism. Journal of Social and Clinical Psychology, 7(4), 350.

COSMIN A – Systematic Reviews of Measurement properties (2016, January 1). Retrieved from: http://www.cosmin.nl/Systematic%20reviews%20of%20measurement%20properties.html.

COSMIN B – The COSMIN Checklist (2016, January 1). Retrieved from: http://www.cosmin.nl/COSMIN%20checklist.html.

Cox, B. J., McWilliams, L. A., Enns, M. W., & Clara, I. P. (2004a). Broad and specific personality dimensions associated with major depression in a nationally representative sample. Comprehensive Psychiatry, 45(4), 246-253.

Cox, B. J., Fleet, C., & Stein, M. B. (2004b). Self-criticism and social phobia in the US national comorbidity survey. Journal of Affective Disorders, 82(2), 227-234.

Cox, B. J., MacPherson, P. S., Enns, M. W., & McWilliams, L. A. (2004c). Neuroticism and self-criticism associated with posttraumatic stress disorder in a nationally representative sample. Behaviour research and therapy, 42(1), 105-114.

De Boer, M. R., Moll, A. C., De Vet, H. C., Terwee, C. B., Völker‐Dieben, H. J., & Van Rens, G. H. (2004). Psychometric properties of vision‐related quality of life questionnaires: a systematic review. Ophthalmic and Physiological Optics, 24(4), 257-273.

Dunkley, D. M., Sanislow, C. A., Grilo, C. M., & McGlashan, T. H. (2009). Self-criticism versus neuroticism in predicting depression and psychosocial impairment for 4 years in a clinical sample. Comprehensive psychiatry, 50(4), 335-346.

Dunkley, D. M., Masheb, R. M., & Grilo, C. M. (2010). Childhood maltreatment, depressive symptoms, and body dissatisfaction in patients with binge eating disorder: The mediating role of self‐criticism. International Journal of Eating Disorders, 43(3), 274-281.

Fennig, S., Hadas, A., Itzhaky, L., Roe, D., Apter, A., & Shahar, G. (2008). Self‐criticism is a key predictor of eating disorder dimensions among inpatient adolescent females. International Journal of Eating Disorders, 41(8), 762-765.

Furlan, A. D., Pennick, V., Bombardier, C., & van Tulder, M. (2009). 2009 updated method guidelines for systematic reviews in the Cochrane Back Review Group. Spine, 34(18), 1929-1941.

Gilbert, P., Clarke, M., Hempel, S., Miles, J. N. V., & Irons, C. (2004). Criticizing and reassuring oneself: An exploration of forms, styles and reasons in female students. British Journal of Clinical Psychology, 43(1), 31-50.

Gilbert, P., Durrant, R., & McEwan, K. (2006). Investigating relationships between perfectionism, forms and functions of self-criticism, and sensitivity to put-down. Personality and Individual Differences, 41(7), 1299-1308.

Harman, R., & Lee, D. (2010). The role of shame and self‐critical thinking in the development and maintenance of current threat in post‐traumatic stress disorder. Clinical psychology & psychotherapy, 17(1), 13-24.

Heinl, D., Prinsen, C. A. C., Deckert, S., Chalmers, J. R., Drucker, A. M., Ofenloch, R., Humphreys, R., Sach, T., Chamlin, S. L., Schmitt, J., & Apfelbacher, C. (2016). Measurement properties of adult quality-of-life measurement instruments for eczema: a systematic review. Allergy, 71, 358-370.

Hutton, P., Kelly, J., Lowens, I., Taylor, P. J., & Tai, S. (2013). Self-attacking and self-reassurance in persecutory delusions: A comparison of healthy, depressed and paranoid individuals. Psychiatry research, 205(1), 127-136.

Ishiyama, F. I., & Munson, P. A. (1993). Development and validation of a self-critical cognition scale. Psychological reports, 72(1), 147-154.

Kamholz, B. W., Hayes, A. M., Carver, C. S., Gulliver, S. B., & Perlman, C. A. (2006). Identification and evaluation of cognitive affect-regulation strategies: Development of a self-report measure. Cognitive Therapy and Research, 30(2), 227-262.

Kannan, D., & Levitt, H. M. (2013). A review of client self-criticism in psychotherapy. Journal of Psychotherapy Integration, 23(2), 166.

Kupeli, N., Chilcot, J., Schmidt, U. H., Campbell, I. C., & Troop, N. A. (2013). A confirmatory factor analysis and validation of the forms of self‐criticism/reassurance scale. British Journal of Clinical Psychology, 52(1), 12-25.

Luyten, P., Sabbe, B., Blatt, S. J., Meganck, S., Jansen, B., De Grave, C., … & Corveleyn, J. (2007). Dependency and self‐criticism: relationship with major depressive disorder, severity of depression, and clinical presentation. Depression and anxiety, 24(8), 586-596.

Mahood, Q., Van Eerd, D., & Irvin, E. (2014). Searching for grey literature for systematic reviews: challenges and benefits. Research synthesis methods, 5(3), 221-234.

Marshall, M. B., Zuroff, D. C., McBride, C., & Bagby, R. M. (2008). Self‐criticism predicts differential response to treatment for major depression. Journal of clinical psychology, 64(3), 231-244.

Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … & De Vet, H. C. (2010a). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Quality of Life Research, 19(4), 539-549.

Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … & de Vet, H. C. (2010b). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of clinical epidemiology, 63(7), 737-745.

Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., Bouter, L. M., & de Vet, H. C. W. (2012). COSMIN Checklist Manual. Retrieved from: http://www.cosmin.nl/COSMIN%20checklist.html.

Neff, K. D. (2003). The development and validation of a scale to measure self-compassion. Self and identity, 2(3), 223-250.

O’Connor, R. C., & Noyce, R. (2008). Personality and cognitive processes: Self-criticism and different types of rumination as predictors of suicidal ideation. Behaviour Research and Therapy, 46(3), 392-401.

Parker, G., Manicavasagar, V., Crawford, J. O., Tully, L., & Gladstone, G. (2006). Assessing personality traits associated with depression: the utility of a tiered model. Psychological medicine, 36(08), 1131-1139.

Pinto‐Gouveia, J., Castilho, P., Matos, M., & Xavier, A. (2013). Centrality of shame memories and psychopathology: The mediator effect of self‐criticism. Clinical Psychology: Science and Practice, 20(3), 323-334.

Powers, T. A., Zuroff, D. C. & Topciu, R. A. (2004). Covert and overt expressions of self‐criticism and perfectionism and their relation to depression. European Journal of Personality, 18, 61-72.

Powers, T. A., Koestner, R., Zuroff, D. C., Milyavskaya, M., & Gorin, A. A. (2011). The effects of self-criticism and self-oriented perfectionism on goal pursuit. Personality and Social Psychology Bulletin, 37(7), 964-975.

Rector, N. A., Bagby, R. M., Segal, Z. V., Joffe, R. T., & Levitt, A. (2000). Self-criticism and dependency in depressed patients treated with cognitive therapy or pharmacotherapy. Cognitive Therapy and Research, 24(5), 571-584.

Santor, D. A., Zuroff, D. C., & Fielding, A. (1997). Analysis and revision of the Depressive Experiences Questionnaire: Examining scale performance as a function of scale length. Journal of Personality Assessment, 69(1), 145-163.

Schellingerhout, J. M., Verhagen, A. P., Heymans, M. W., Koes, B. W., de Vet, H. C., & Terwee, C. B. (2012). Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review. Quality of Life Research, 21(4), 659-670.

Shahar, B., Carlin, E. R., Engle, D. E., Hegde, J., Szepsenwol, O., & Arkowitz, H. (2012). A pilot investigation of emotion‐focused two‐chair dialogue intervention for self‐criticism. Clinical Psychology & Psychotherapy, 19(6), 496-507.

Shahar, B., Szepsenwol, O., Zilcha‐Mano, S., Haim, N., Zamir, O., Levi‐Yeshuvi, S., & Levit‐Binnun, N. (2015a). A Wait‐List Randomized Controlled Trial of Loving‐Kindness Meditation Programme for Self‐Criticism. Clinical psychology & psychotherapy, 22(4), 346-356.

Shahar, B., Doron, G., & Szepsenwol, O. (2015b). Childhood Maltreatment, Shame‐Proneness and Self‐Criticism in Social Anxiety Disorder: A Sequential Mediational Model. Clinical psychology & psychotherapy, 22(6), 570-579.

Smart, L. M., Peters, J. R., & Baer, R. A. (2015). Development and validation of a measure of self-critical rumination. Assessment, 1073191115573300.

Taranis, L., & Meyer, C. (2010). Perfectionism and compulsive exercise among female exercisers: High personal standards or self-criticism? Personality and Individual differences, 49(1), 3-7.

Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., … & de Vet, H. C. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of clinical epidemiology, 60(1), 34-42.

Terwee, C. B., Jansma, E. P., Riphagen, I. I., & de Vet, H. C. (2009). Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Quality of Life Research, 18(8), 1115-1123.

Terwee, C. B., Mokkink, L. B., Knol, D. L., Ostelo, R. W., Bouter, L. M., & de Vet, H. C. (2012). Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Quality of Life Research, 21(4), 651-657.

Test-Retest Reliability Coefficient: Examples & Concept (2016, March 1). Retrieved from: http://study.com/academy/lesson/test-retest-reliability-coefficient-examples-lesson-quiz.html

Thompson, R., & Zuroff, D. C. (2004). The Levels of Self-Criticism Scale: comparative self-criticism and internalized self-criticism. Personality and individual differences, 36(2), 419-430.

Van Tulder, M., Furlan, A., Bombardier, C., Bouter, L., & Editorial Board of the Cochrane Collaboration Back Review Group. (2003). Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. Spine, 28(12), 1290-1299.

Verplanken, B. (2006). Beyond frequency: Habit as mental construct. British Journal of Social Psychology, 45(3), 639-656.

Verplanken, B., & Orbell, S. (2003). Reflections on Past Behavior: A Self‐Report Index of Habit Strength. Journal of Applied Social Psychology, 33(6), 1313-1330.

Verplanken, B., Friborg, O., Wang, C. E., Trafimow, D., & Woolf, K. (2007). Mental habits: Metacognitive reflection on negative self-thinking. Journal of Personality and Social Psychology, 92(3), 526.

Viglione Jr, D. J., Lovette, G. J., Gottlieb, R., & Friedberg, R. (1995). Depressive Experiences Questionnaire: An empirical exploration of the underlying theory. Journal of personality assessment, 65(1), 91-99.

Weissman, A. N., & Beck, A. T. (1978). Development and validation of the Dysfunctional Attitude Scale: A preliminary investigation. Presented at the Annual Meeting of the American Educational Research Association, Toronto, Ontario, Canada.

Weldam, S. W., Schuurmans, M. J., Liu, R., & Lammers, J. W. J. (2013). Evaluation of Quality of Life instruments for use in COPD care and research: a systematic review. International journal of nursing studies, 50(5), 688-707.

Welkowitz, J., Lish, J. D., & Bond, R. N. (1985). The depressive experiences questionnaire: Revision and validation. Journal of Personality Assessment, 49(1), 89-94.

Whelton, W. J., Paulson, B., & Marusiak, C. W. (2007). Self-criticism and the therapeutic relationship. Counselling Psychology Quarterly, 20(2), 135-148.

Williams, M. J., Dalgleish, T., Karl, A., & Kuyken, W. (2014). Examining the factor structures of the five facet mindfulness questionnaire and the self-compassion scale. Psychological Assessment, 26(2), 407.

Appendices contents page

Appendix 1. Table 2 – Questionnaire Characteristics

Appendix 2. Table 3 – Study characteristics

Appendix 3. Table 4 – Quality criteria for measurement properties assessed

Appendix 4. Table 5 – Ratings for methodological quality and measurement properties

Appendix 5. Table 6 – Ratings for construct validity

Appendix 1. Table 2 Questionnaire Characteristics

| Questionnaire | Type of questionnaire | Original reference | Questionnaire designed to measure | Description of items | Probe statements & example items | Items for scales or self-criticism subscales | Response options (Likert scales) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Self-Critical Cognition Scale | Trait | Ishiyama & Munson (1993) | To assess the “dispositional tendency” to process information about the self in a self-critical way (P. 148). It has two subscales: ‘Negative-self processing’ & ‘Failure in positive self-processing’. | Items focus on self-criticism, making negative social comparisons, an inability to keep a balanced perspective about oneself & exaggeration of negative aspects of oneself | Probe statement: unclear. Negative self-processing: “I tend to blow my weaknesses, limitations and mistakes out of proportion in my thinking”; Failure in positive self-processing: “I’m good at looking at myself critically while still remaining positive toward myself” (P. 150) | 13 | 6-point (agree-disagree scale) |
| Levels of Self-Criticism Scale | Trait | Thompson & Zuroff (2004) | Self-criticism is conceptualised as a broad personality construct consisting of two developmental levels (Comparative Self-Criticism (CSC) and Internalised Self-Criticism (ISC)). CSC is defined as a negative view of oneself compared with other people. ISC is defined as a negative view of oneself compared with internalised personal standards. | No mention of self-criticism. CSC items focus on social anxiety, concerns & dilemmas. ISC items focus on affect & reactions to failure, high personal standards & experience of shame | Probe statement: unclear. CSC: “I don’t spend much time worrying about what other people will think of me (Reversed)”; ISC: “When I don’t succeed, I find myself wondering how worthwhile I am” (P. 424) | 22 | 7-point (1 = not at all; 7 = very well) |
| Attitudes Towards Self Scale | Trait (subscale) | Carver & Ganellen (1983) | The ATSS measures three self-regulatory vulnerabilities to depression (high-standards, overgeneralisation and self-criticism). Self-criticism is defined as making harsh judgements of oneself for failing to attain a standard. | No mention of self-criticism. Items focus on affect & reactions to failure & high personal standards | Probe statement: unclear. “When my behaviour doesn’t live up to my standards, I feel I have let myself or someone else down” (P. 333) | 4 | 5-point (1 = extremely untrue, 5 = extremely true) |
| Attitudes Towards Self Scale-Revised | Trait (subscale) | Carver et al (1988) | As above – the ATSSR was developed to produce “cleaner” subscales (P. 352). | No mention of self-criticism. Items focus on affect & reactions to failure | Probe statement: unclear. “I get angry with myself if my efforts don’t lead to the results I wanted” (P. 353) | 3 | 5-point (‘I agree very strongly’ to ‘I disagree very strongly’, middle option ‘neither agree nor disagree’) |
| Temperament & Personality Questionnaire | Trait (subscale) | Parker et al (2006) | The TPQ measures personality traits or constructs thought to predispose individuals to depression. Self-criticism is defined as the tendency to be very tough on oneself. | Items focus on self-criticism, being tough/hard on oneself, high personal standards & a sense of satisfaction with oneself | Probe statement: “Please tick the option that best describes the way you usually or generally feel or behave (over the years and not just recently)”. Item: “I find it hard to measure up to my own standards” (available online) | Multiple versions of TPQ (81, 89 & 109-item versions). 4 or 8 in 109-item version | 4-point (3 = very true; 2 = moderately true; 1 = slightly true; 0 = not true at all) |
| Forms of Self-Criticising/Attacking and Self-Reassuring Scale | In response to difficult situations | Gilbert et al (2004) | To assess forms of self-attacking when things go wrong for people. Separated into two forms: ‘Inadequate self’ focuses on attending to failures and inadequacies, and ‘Hated self’ focuses on more aggressive/disgust based self-attacking. Also measures ‘self-reassurance’, defined as the ability to be reassuring to oneself when things go wrong. | Items focus on self-criticism, disliking oneself, not feeling good enough and other feelings about oneself associated with failure including disappointment, inadequacy, anger, frustration & disgust. There are also positively worded items about feeling good enough, loveable & acceptable | Probe statement: “When things go wrong for me…” Items: Inadequate self: “I remember and dwell on my failings”; Hated self: “I call myself names”; Reassure self: “I am able to remind myself of positive things about myself” (P. 37) | 22 (Kupeli et al (2013) developed 18-item version) | 5-point (0 = not at all like me, 4 = extremely like me) |
| Self-Compassion Scale | In response to difficult situations | Neff (2003) | The SCS assesses levels of self-compassion in terms of 3 main components (divided into 6 sub-scales): self-kindness vs self-judgement; common humanity vs isolation; mindfulness vs over-identification. Self-judgement is conceptualised as a negative component of self-compassion, and is defined as being disapproving or judging of one’s inadequacies and failures. | Items focus on self-judgement; being disapproving, intolerant & impatient about flaws, inadequacies & aspects of one’s personality that you don’t like | First probe statement: “How I typically act towards myself in difficult times” Item: “When I see aspects of myself that I don’t like, I get down on myself” (available online) | 5 | 5-point (1 = Almost never, 5 = Almost always) |
| Inventory of Cognitive Affect Regulation Strategies | Mood regulation strategy | Kamholz et al (2006) | The ICARUS assesses the deliberate and conscious cognitive affect-regulation strategies people use to reduce distressing emotions. Self-criticism/self-blame is defined as focusing on one’s own perceived weakness and inadequacy. | Items focus on self-criticism, self-blame & thoughts about one’s shortcomings, faults & mistakes. Also broader items focused on concentrating on negative emotions or repetitive thinking in response to negative emotions | First probe statement: “Indicate what you generally think about to make your mood better when you are sad…” Item: “I think about all my shortcomings, failings, faults and mistakes” (P. 231) | 6 | 4-point (1 = I don’t do this at all; 2 = I do this a little bit; 3 = I do this a medium amount; 4 = I do this a lot) |
| Habit Index of Negative Thinking | Repetitive self-criticism | Verplanken et al (2007) | A measure of the habit of negative self-thinking (adapted from the Self-Report Habit Index (Verplanken & Orbell, 2003)). Focuses on the way a person thinks (as opposed to the content of thoughts). | Items focus on aspects of negative self-thoughts including whether they are frequent, automatic, unintentional & difficult to disengage from. | First probe statement: “Thinking negatively about myself is something…” Item: “I do frequently” (P. 541) | 12 | Verplanken (2006): 5-point (1 = disagree completely, 5 = agree completely). Verplanken et al (2007) used both 7-point & 5-point (‘strongly disagree’ to ‘strongly agree’) |
| Self-Critical Rumination Scale | Repetitive self-criticism | Smart, Peters & Baer (2015) | To assess self-critical rumination. Self-criticism is conceptualised as a form of negative thinking that focuses on devaluing oneself. Items also focus on ruminative qualities of thinking: “frequent, prolonged, repetitive & difficult to control” (P. 2). | Items focus on frequency & repetitiveness of self-criticism. Items also explore the content of thoughts, for example, whether someone focuses on aspects of themselves that they are ashamed of | First probe statement: unclear. “My attention is often focused on aspects of myself that I’m ashamed of” (P. 6) | 10 | 4-point (1 = not at all, 2 = a little, 3 = moderately, 4 = very much) |

Notes: CSC: Comparative self-criticism; HS: Hated self; ISC: Internalised self-criticism; IS: Inadequate self; RS: Reassured self.

Appendix 2. Table 3 Study Characteristics

Questionnaire Author(s) N Population Diagnoses Age – Mean (SD) Demographic information Means & SDs (scales or self-criticism subscales) Country Missing items
SCCS Ishiyama & Munsun (1993) Sample (1) 561 Undergraduates N/A 22.3 (6.1) Victoria University; 27.1 (9.0) McGill university Total sample – 210 males; 350 females;1 unidentified sex. Victoria University – 182 males; 272 females. McGill university 28 males; 78 females; 1 identified sex Total sample = 40.3 (11.2); males = 39.6 (10.1); females = 40.8 (11.8). Canada NR
SCCS Ishiyama & Munsun (1993) Sample (2) 142 Unclear N/A NR 83 males; 59 females T1 = 39.1 (11.9). T2 = 38.3 (11.9) Unclear NR
LSCS Thompson & Zuroff (2004) Study (1) 282 Undergraduates N/A NR 144 females; 138 males N/A USA/Canada NR
LSCS Thompson & Zuroff (2004) Study (2) 144 Undergraduates N/A NR 75 females; 69 males NR USA/Canada NR
ATSS Carver & Ganellen (1983) Sample (1) 1083 Undergraduates N/A NR 594 males; 489 females N/A USA NR
ATSS Carver & Ganellen (1983) Sample (2) 502 Undergraduates N/A NR 260 males; 242 females See Appendix 5 – Construct validity USA NR
ATSSR Carver et al (1988) Study (1) 478 University students N/A NR NR N/A USA NR
ATSSR Carver et al (1988) Study (2 ) & (4 ) (data combined) Study 2 n = 170; Study 4 n = 219 (samples combined for analyses) University students N/A NR NR NR USA NR
ATSSR Carver et al (1988) Study (4) (subset of participants) 197 University students N/A NR NR NR USA NR
ATSSR Carver et al (1988) Study (5) (depression group) Depression group n = 5; Control group n = 11 Inpatients & hospital staff Depression NR NR NR USA NR
ATSSR Carver et al (1988) Study (5) (whole patient group) 70 Inpatients 24 Bipolar Disorder (12 in manic phase); 17 Schizophrenia; 7 SchizoAffective Disorder; 7 Atypical Psychosis; 5 Major Depression; 3 Dysthymic Disorder; 3 Adjustment Disorder; 2 Alcohol Dependence; 1 Schizophreniform Disorder; 1 Unspecified Nonpsychotic Mental Disorder 33.1 (9.43) NR NR USA NR
TPQ Parker et al (2006) Sample (1) 529 Community sample (recruited at GP surgery) N/A 35.5 (14.1) 54% females N/A Australia NR
TPQ Parker et al (2006) Sample (2) 52 Outpatients Depression 41.3 (NR) 51.9% females NR Australia NR
FSCRS Gilbert et al (2004) 246 Undergraduates N/A 27.7 (7.2) All females Total sample – IS = 16.75 (8.44); HS = 3.86 (4.58); RS = 19.81 (5.92) UK NR
FSCRS Kupeli et al (2013) Sample (1) 764 University students & community sample (recruited online) N/A 28.6 (10.6) Gender – 18.1% males (n = 138); 81.9% females (n = 626). Ethnicity – 76.2% White (n = 582). N/A UK NR
FSCRS Kupeli et al (2013) Sample (2) 806 As above N/A 28.3 (10.6) Gender – 17% males (n = 137); 83% females (n = 669). Ethnicity – 74.4% White (n = 600). N/A UK NR
FSCRS Kupeli et al (2013) Sample (3) 1224 (deduced by author) Community sample (recruited online) N/A NR NR See Appendix 5 – Construct validity UK NR
FSCRS Baião et al (2015) Non-clinical n = 887. Clinical n = 171 (after 4 excluded) Secondary analyses on data from 12 previous studies (7 non-clinical; 5 clinical groups) 100 (58.48%) Depression; 16 (9.36%) Personality Disorder; 13 (7.60%) Substance Abuse; 9 (5.26%) Anxiety; 3 (1.54%) Bipolar Disorder. (Missing data = 30) Non-clinical population = 24.13 (7.79). Clinical population = 44.22 (12.05) (missing data = 23 clinical participants) Non clinical population – 210 males; 676 females. Clinical population – 67 males; 91 females (missing data for 13 clinical participants) See Appendix 5 – Construct validity UK NR
SCS Neff (2003) Study (1) – content validity Focus group n = 68. Piloting of items n = 71 Undergraduates N/A Focus group= 21.7 (2.32). Piloting of items = 21. (2.03) Focus group – 30 males; 38 females. Piloting of items – 24 males; 47 females N/A USA N/A
SCS Neff (2003) Study (1) – main study 391 Undergraduates N/A 20.91 (2.27) Gender – 166 males; 225 females. Ethnicity – 58% White; 21% Asian; 11% Hispanic; 4% Black; 6% other Total sample = 3.14 (0.79); Males = 3.00 (0.81); Females = 3.24 (0.77) USA NR
SCS Neff (2003) Study (2) 232 Undergraduates N/A 21.31 (3.17) Gender – 87 males; 145 females. Ethnicity – 58% White; 22% Asian; 14% Hispanic; 3% Black; 3% other. NR USA NR
SCS Neff (2003) Study (3) Students n = 232; Buddhist n = 43 Community sample (recruited from Buddhist email list subscription) N/A Students = 21.31 (3.17); Buddhists = 47 (9.71) Students: Gender – 87 males; 145 females. Ethnicity – 58% White; 22% Asian; 14% Hispanic; 3% Black; 3% other. Buddhists: Gender – 16 males; 27 females. Ethnicity – 91% White; 5% Asian; 2% other. See Appendix 5 – Construct validity USA NR
SCS Williams et al (2014) Sample (1) 821 Community sample (recruited online) N/A 25.7 (9.8) Gender – 697 females (74.1%). Ethnicity – 800 (85.1%) White; 140 (14.9%) Other 12.10 (4.40) UK EX
SCS Williams et al (2014) Sample (2) 211 Community sample (recruited online) N/A 46.51 (13.1) Gender – 153 females (65.1%). Ethnicity – 216 (91.9%) White; 19 (8.1%) Other 17.15 (4.29) UK EX
SCS Williams et al (2014) Sample (3) 390 Community sample (recruited through MBCT trial) Recurrent Depressive Disorder 50.16 (11.8) Gender – 325 females (76.6%). Ethnicity – 410 (96.7%) White; 4 (0.9%) Other; 10 (2.4%) Missing 11.81 (3.93) UK EX
ICARUS Kamholz et al (2006) Study (1) Pilot study 1 n = 193; Pilot study 2 & main sample used n = 398 (after 28 excluded) Undergraduates N/A 86.2% = 21 years and younger Gender – 59% females. Ethnicity – 44.8% Caucasian; 30.7% Hispanic; 12.1% African American; 7.1% Asian; 5.3% Other or mixed 2.38 (0.64) USA EX
ICARUS Kamholz et al (2006) Study (2A) 132 Undergraduates N/A 20.27 (3.24) Gender – 62% females. Ethnicity – 46.8% Caucasian; 34.1% Hispanic; 10.3% African American; 8.7% Asian 1.49 (0.61) USA NR
ICARUS Kamholz et al (2006) Study (2B) 132 Undergraduates N/A As above As above As above USA NR
ICARUS Kamholz et al (2006) Study (3) 208 Outpatients 137 (66%) Substance-Use Disorder; 129 (62%) at least one Axis I psychiatric diagnosis; 93 (45%) two diagnoses; 62 (30%) three diagnoses. 91 (71%) mood disorder; 54 (42%) PTSD; 33 (26%) non-PTSD anxiety disorder; 48 (37%) Psychotic Disorder 49 (7.98) Gender – n = 201, 97% males. Ethnicity – n = 140, 67.3% Caucasian; n = 54, 26% African-American; n = 2, 1% Hispanic; n = 4, 1.9% Native American; n = 8, 3.8% Other. 2.38 (0.75) USA NR
HINT Verplanken (2006) Study (2) 194 University students N/A NR 123 females; 71 males 2.32 (1) Norway NR
HINT Verplanken et al (2007) Study (1) 157 University students N/A NR 95 females; 61 males (1 participant did not disclose) NR Norway NR
HINT Verplanken et al (2007) Study (4) 155 University students N/A NR 88 females; 66 males (1 participant did not disclose) 2.70 (1.05) Norway NR
HINT Verplanken et al (2007) Study (5) 125 University students N/A NR 79 females; 46 males 3.03 (1.36) USA NR
HINT Verplanken et al (2007) Study (8) T1: n = 1682. T2: n = 1102 Community sample (recruited via postal system) N/A 40.27 (8.23) T1: 939 females; 736 males (7 did not disclose). T2: 641 females; 461 males T1 = 2.72 (1.56). T2: NR Norway EX
SCRS Smart, Peters & Baer (2015) Study (1) Undergraduates n = 25; adult outpatient n = 13 Undergraduates & outpatients N/A NR NR N/A USA N/A
SCRS Smart, Peters & Baer (2015) Study (2) 420 (after 90 excluded) Undergraduates N/A 18.99 (1.44) Gender – 51.9% females. Ethnicity – 71.9% Caucasian. 2.17 (0.73) USA NR
SCRS Smart, Peters & Baer (2015) Study (3) 143 Undergraduates N/A 19.00 (1.46) Gender – 69.9% females. Ethnicity – 72.2% Caucasian. N/A USA NR
SCRS Smart, Peters & Baer (2015) Study (4) 70 Undergraduates N/A NR Gender – 89.9% female. Ethnicity – 91.3% Caucasian T1 = 1.90 (SE = 0.08); T2 = 1.83 (SE = 0.08) USA NR

Notes: ATSS: Attitudes Towards Self Scale; ATSSR: Attitudes Towards Self Scale-Revised; FSCRS: Forms of Self-Criticising/Attacking and Self-Reassuring Scale; HINT: Habit Index of Negative Thinking; HS: Hated self; ICARUS: Inventory of Cognitive Affect Regulation Strategies; IS: Inadequate self; LSCS: Levels of Self-Criticism Scale; N/A: Not applicable; NR: Not recorded; RS: Reassured self; SD: standard deviation; SCCS: Self-Critical Cognition Scale; SCRS: Self-Critical Rumination Scale; SCS: Self-Compassion Scale; TPQ: Temperament & Personality Questionnaire; T1: Time 1; T2: Time 2.

Appendix 3. Table 4 Quality criteria for measurement properties assessed

Property Definition Quality criteria based on Quality criteria

Internal consistency The extent to which items in a (sub)scale are intercorrelated, thus measuring the same construct Terwee et al (2007) + Factor analyses performed on adequate sample size (7 * # items and ≥100) AND Cronbach’s alpha(s) calculated per dimension AND Cronbach’s alpha(s) between 0.70 and 0.95;
? No factor analysis OR doubtful design or method;
– Cronbach’s alpha(s) <0.70 or >0.95, despite adequate design and method.

Reliability (test-retest) The extent to which scores for participants who have not changed are the same for repeated measures over time Terwee et al (2007) + ICC or weighted Kappa ≥ 0.70;
? Doubtful design or method (e.g., time interval not mentioned);
– ICC or weighted Kappa < 0.70, despite adequate design and method.

Content validity The extent to which the domain of interest is comprehensively sampled by the items in the questionnaire Terwee et al (2007) + A clear description is provided of the measurement aim, the target population, the concepts that are being measured, and the item selection AND target population and (investigators OR experts) were involved in item selection;
? A clear description of above-mentioned aspects is lacking OR only target population involved OR doubtful design or method;
– No target population involvement.

Construct validity (hypothesis testing) The extent to which scores on a particular questionnaire relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the concepts that are being measured Terwee et al (2007) + Specific hypotheses were formulated AND the majority of the results are in accordance with these hypotheses;
? Doubtful design or method (e.g., no hypotheses);
– Less than 75% of hypotheses were confirmed, despite adequate design and methods.

Structural validity The degree to which the scores of a (sub)scale are an adequate reflection of the dimensionality of the construct to be measured Exploratory factor analysis – Schellingerhout et al (2012) + Factors explain at least 50% of the variance;
? Explained variance not mentioned;
– Factors explain <50% of the variance.

Confirmatory factor analysis – devised by author + Factor structure confirmed;
? Unclear if factor structure confirmed;
– Factor structure not confirmed.

Notes: ICC: Intraclass correlation; + positive rating; ? indeterminate rating; – negative rating.
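
The numeric thresholds in Table 4 can be read as simple decision rules. The following is a minimal Python sketch of three of them (internal consistency, test-retest reliability and exploratory structural validity). The function names and arguments are hypothetical and are not part of the review. Treating an inadequate sample size as a "doubtful design" is an assumption, and judgements such as whether a design is adequate still have to be made by the rater.

```python
from typing import Optional

# Ratings are returned as "+", "?" or "-" (ASCII stand-ins for the
# positive, indeterminate and negative ratings used in Table 4).


def rate_internal_consistency(alpha: Optional[float], n_items: int,
                              sample_size: int,
                              factor_analysis_done: bool) -> str:
    """Apply the internal consistency criteria (Terwee et al, 2007)."""
    # Adequate sample size: at least 7 participants per item AND at least 100.
    adequate_sample = sample_size >= max(7 * n_items, 100)
    # Assumption: a missing alpha or an inadequate sample counts as
    # "doubtful design or method" and is therefore indeterminate.
    if alpha is None or not factor_analysis_done or not adequate_sample:
        return "?"
    return "+" if 0.70 <= alpha <= 0.95 else "-"


def rate_reliability(icc_or_kappa: Optional[float],
                     design_adequate: bool) -> str:
    """Apply the test-retest criteria; design_adequate is a rater judgement."""
    if icc_or_kappa is None or not design_adequate:
        return "?"
    return "+" if icc_or_kappa >= 0.70 else "-"


def rate_efa(explained_variance_pct: Optional[float]) -> str:
    """Apply the exploratory factor analysis criteria (50% of variance)."""
    if explained_variance_pct is None:
        return "?"
    return "+" if explained_variance_pct >= 50 else "-"


if __name__ == "__main__":
    # Hypothetical 10-item subscale, alpha 0.89, 200 participants, FA done.
    print(rate_internal_consistency(0.89, 10, 200, True))
    # High alpha reported without a factor analysis.
    print(rate_internal_consistency(0.95, 12, 194, False))
    print(rate_reliability(0.65, True))
    print(rate_efa(58.32))
```

Under these rules, an alpha reported without a factor analysis stays indeterminate however high it is, and a test-retest coefficient only earns a positive or negative rating once the design has been judged adequate.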

Appendix 4. Table 5 Ratings for methodological quality and measurement properties

Methodological quality rated using COSMIN and measurement properties rated using Appendix 3 Table 4

Questionnaire Author(s) Internal consistency (i): methodological quality Internal consistency (i): measurement property Reliability: methodological quality Reliability: measurement property Content validity: methodological quality Content validity: measurement property Structural validity (ii): methodological quality Structural validity (ii): measurement property
SCCS Ishiyama & Munsun (1993) Sample (1) Fair [+ Negative self-processing: 0.89; Failure in positive self-processing: 0.77]     Poor ? Fair [+ NSP = 43%; FPSP = 9.1%]
SCCS Ishiyama & Munsun (1993) Sample (2)     Fair [? $ Test-retest reliability – r138 = 0.83 for total sample; r81 = 0.82 for males; r57 = 0.86 for females. TI: 6.5 weeks]        
LSCS Thompson & Zuroff (2004) Study (1) Fair [+ CSC 0.81; ISC 0.87]     Poor ? Fair [?]
ATSS Carver & Ganellen (1983) Sample (1) Fair [- Self-criticism: 0.65]     Poor ? Fair [- 40%]
ATSSR Carver et al (1988) Study (1) Fair [- Self-criticism: 0.65]     Poor ? Fair [CFA +]
ATSSR Carver et al (1988) Study (2) & (4) (data combined)     Poor [? $ Test-retest correlations – Self-criticism: 0.59. TI: 6 weeks]        
TPQ Parker et al (2006) Sample (1) Fair [? Cronbach alphas – ranged from 0.62 to 0.91 (Individual subscales not reported)]     Poor ? Fair [+50%]
TPQ Parker et al (2006) Sample (2)     Poor [? ICCs recorded for each subscale – Self-criticism: 0.73 (p<0.001). TI: mean = 29 days (range 5-150 days)]        
FSCRS Gilbert et al (2004) Fair [+ IS: 0.90; RS: 0.86; HS: 0.86]     Poor ? Fair [+ 58.32%]
FSCRS Kupeli et al (2013) Sample (1)             Fair (Items 4, 18 & 20 removed due to low factor loadings) [+ IS = 47.52%; HS = 8.8%; RS = 6.74%]
FSCRS Kupeli et al (2013) Sample (2) Fair [+ 18-items – IS: 0.90; RS: 0.88; HS: 0.83. Original 22-items – IS: 0.91; RS: 0.88; HS: 0.86]         Fair (Items 4, 18, 20 & 22 removed due to low factor loadings) [CFA +]
FSCRS Baião et al (2015) Fair [+ Non-clinical – IS: 0.90; RS: 0.85; HS: 0.85. Clinical – IS: 0.91; RS: 0.85; HS: 0.87]         Fair [CFA +]
SCS Neff (2003) Study (1) – content validity         Poor ?    
SCS Neff (2003) Study (1) – main study Fair [+ Self-judgement = 0.77]         Fair [?]
SCS Neff (2003) Study (2)     Poor [? $ Test-retest reliability – Self-judgement = 0.88. TI: 3 weeks]     Fair [?]
SCS Williams et al (2014) Sample (1) Fair [? Self-judgement = 0.8]         Excellent [CFA -]
SCS Williams et al (2014) Sample (2) Fair [? Self-judgement = 0.82]         Excellent [CFA -]
SCS Williams et al (2014) Sample (3) Fair [? Self-judgement = 0.78]         Good [CFA -]
ICARUS Kamholz et al (2006) Study (1) Good [? Self-Criticism/Self-Blame = 0.81]     Poor Good [?]
ICARUS Kamholz et al (2006) Study (2A) Good [? Self-criticism/self-blame = 0.83]            
ICARUS Kamholz et al (2006) Study (3) Poor [? Self-criticism/self-blame = 0.85] Poor [? $ Test-retest reliability correlation coefficients – self-criticism/self-blame = 0.65 (p<0.001). TI: 1 month]     Poor [+73.5%]
HINT Verplanken (2006) Study (2) Poor [? 0.95. (No FA)]            
HINT Verplanken et al (2007) Study (1) Poor [? 0.943. (No FA)]     Poor ?    
HINT Verplanken et al (2007) Study (4) Poor [? 0.945. (No FA)]            
HINT Verplanken et al (2007) Study (5) Poor [? 0.947. (No FA)]            
HINT Verplanken et al (2007) Study (8) Poor [? 0.955. (No FA)] Poor [? $ Test-retest reliability = 0.801 (p<0.01). TI: 9 months]        
SCRS Smart, Peters & Baer (2015) Study (1)         Excellent +    
SCRS Smart, Peters & Baer (2015) Study (2) Fair [+ 0.92.]         Fair [+ 58.4%]
SCRS Smart, Peters & Baer (2015) Study (3)             Fair [CFA +]
SCRS Smart, Peters & Baer (2015) Study (4)     Fair [? $ Test-retest correlation = 0.86 (& no statistical difference found between scores). TI: 13-37 days]        

Notes: ATSS: Attitudes Towards Self Scale; ATSSR: Attitudes Towards Self Scale-Revised; CSC: Comparative self-criticism; FSCRS: Forms of Self-Criticising/Attacking and Self-Reassuring Scale; HINT: Habit Index of Negative Thinking; HS: Hated self; ICARUS: Inventory of Cognitive Affect Regulation Strategies; ISC: Internalised self-criticism; IS: Inadequate self; LSCS: Levels of Self-Criticism Scale; RS: Reassured self; SCCS: Self-Critical Cognition Scale; SCRS: Self-Critical Rumination Scale; SCS: Self-Compassion Scale; TPQ: Temperament & Personality Questionnaire; NSP: Negative self-processing; FPSP: Failure in positive self-processing; FA: Factor analysis; CFA: Confirmatory factor analysis; ICC: Intraclass correlation coefficient; TI: Time interval; T1: Time 1; T2: Time 2; $: Statistical test other than what COSMIN recommends; (i): Cronbach’s alpha presented; (ii): percentage of variance explained presented.
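
Note (i) indicates that the internal consistency figures in Table 5 are Cronbach’s alphas. As a reminder of what that statistic summarises, here is a small Python sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the toy data are illustrative assumptions; the included studies will have used their own statistical software.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both the items and the
    total scores.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item across respondents (columns of the data).
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Toy data: three items that rise and fall together across respondents.
    consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
    print(round(cronbach_alpha(consistent), 2))
```

Items that move together across respondents push alpha towards 1, while unrelated items pull it towards 0, which is why values below 0.70 are rated negatively in Table 4.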

Appendix 5. Table 6 Construct validity – ratings for methodological quality and measurement property

Methodological quality rated using COSMIN and measurement properties rated using Appendix 3 Table 4

Questionnaire Author(s) Methodological quality Measurement property (i) Results for scales or self-criticism subscales
SCCS Ishiyama & Munsun (1993) Sample (1) Poor ? Correlation coefficients (p<0.05) 1. Self-esteem (n = 416) = -0.71. 2. Depression (n = 168) = 0.42. 3. Between group comparison: higher count of negative self-descriptive adjectives in ‘high’ self-critical group (p<0.01).
LSCS Thompson & Zuroff (2004) Study (2) Fair ? Correlation coefficients (p<0.05) for CSC: 1. Distress = 0.53; 2. Self-esteem = -0.66; 3. Perfectionism-self = 0.21; Perfectionism-other = 0.21; Perfectionism-social = 0.46. For ISC: 1. Distress = 0.44; 2. Self-esteem = -0.52; 3. Perfectionism-self = 0.45; Perfectionism-other = 0.24; Perfectionism-social = 0.49.
ATSS Carver & Ganellen (1983) Sample (2) Poor ? Between group comparison (p<0.02): Gender – Males: Mean = 15.08 (SD = 3.19); Females: Mean = 15.79 (SD = 3.43)
ATSSR Carver et al (1988) Study (2) & (4) (data combined) Poor ? Correlation coefficients (*p<0.05 **p<0.01) Depression (study sample 2) = 0.15*; (study sample 4) = 0.26**.
ATSSR Carver et al (1988) Study (4) (subset of participants) Poor ? No results presented
ATSSR Carver et al (1988) Study (5) (depression group) Poor ? No results presented
ATSSR Carver et al (1988) Study (5) (whole patient group) Poor ? No results presented
FSCRS Gilbert et al (2004) Fair + Correlation coefficients (* = <0.05; ** =<0.01) for IS: 1. Depression = 0.52*; 2. ISC = -0.77**. CSC = 0.63**. For HS: 1. Depression = 0.57**. 2. ISC = 0.45**. CSC = 0.55**. For RS: 1. Depression = -0.51**. 2. ISC = -0.45**. CSC = -0.63**
FSCRS Kupeli et al (2013) Sample (3) Fair + Correlation coefficients (**p<0.001) 18-item FSCRS 1. Happiness – RS = -0.66**; HS = -0.66**; IS = -0.60**. 22-item FSCRS 1. Happiness – RS = -0.66**; HS = -0.66**; IS = -0.62**. 2. Between group comparison with 18-item version: Gender – females – **IS Mean = 18.3 (SD = 6.4); **RS Mean = 22.2 (SD = 6.8); HS Mean = 9.0 (SD = 4.9). Males – **IS Mean = 16.3 (SD = 6.5); **RS Mean = 20.6 (SD = 7.0); HS Mean = 8.5 (SD = 4.4)
FSCRS Baião et al (2015) Fair ? 1. Between group comparison (**p = 0.000): **RS – Clinical: Mean = 10.68 (SD = 6.51); Non-clinical: Mean = 20.27 (SD = 5.77). **IS Clinical: Mean = 27.47 (SD = 7.51); Non-clinical: Mean = 17.72 (SD = 8.29). **HS Clinical: Mean = 12.26 (SD = 5.67); Non-clinical: Mean = 3.88 (SD = 4.59). 2. Between group comparison: No significant differences found for gender in clinical population. Gender in non-clinical population – **RS – males: Mean = 21.20 (SD = 5.27); females: Mean = 19.98 (SD = 5.90) **IS – males: Mean = 16.42 (SD = 7.44); females: Mean = 18.11 (SD = 8.50). **HS – males: Mean = 3.36 (SD = 3.71); females Mean = 4.05 (SD = 4.83) (p = .058)
SCS Neff (2003) Study (1) – main study Fair + Between group comparison (p<0.005) – Gender – Males Mean = 3.00 (SD = 0.81). Females Mean = 3.24 (SD = 0.77)
SCS Neff (2003) Study (2) Fair + No results presented
SCS Neff (2003) Study (3) Fair + Between group comparison (p<0.001) – Buddhist Mean = 2.20 (SD = 0.65); Students Mean = 3.07 (SD = 0.82)
ICARUS Kamholz et al (2006) Study (1) Poor ? Between group comparison – Gender – 2×15 (gender by strategy) repeated measures ANOVA completed. No significant interactions found.
ICARUS Kamholz et al (2006) Study (2A) Fair ? Mood induction experiment to test predictive validity – no correlations presented.
ICARUS Kamholz et al (2006) Study (2B) Poor ? Between group comparison – Gender – 2×15 (gender by strategy) repeated measures ANOVA completed. No significant interactions found.
ICARUS Kamholz et al (2006) Study (3) Fair + Correlation coefficients (*p<0.05 **p<0.01 ***p<0.001) 1. Depression = 0.60***; 2. Anxiety = 0.57***
HINT Verplanken (2006) Study (2) Fair + Correlation coefficients (**p<0.001) 1. Past frequency of ‘negative self-thinking’ = 0.648**. 2. Self-esteem = -0.737**. 3. Depressive/anxiety symptoms = 0.571**.
HINT Verplanken et al (2007) Study (1) Fair + Task used to test hypotheses (Story & thought-listing protocol). HINT correlated significantly with negative self-thoughts (r = 0.295, p<0.001). Correlation between HINT and negative self-thoughts was significantly different to HINT and general negative thoughts (z = 2.02, p<0.05).
HINT Verplanken et al (2007) Study (4) Fair + Correlation coefficients (p<0.001) 1. Rumination = 0.665; 2. Self-esteem = -0.555
HINT Verplanken et al (2007) Study (5) Fair + Correlation coefficients (**p<0.01, ***p<0.001) 1. ‘Negative self-thinking’ = 0.537***; 2. Explicit self-esteem = -0.473***; 3. Implicit self-esteem = -0.279**
SCRS Smart, Peters & Baer (2015) Study (2) Fair + Correlation coefficients (*p<0.05; **p<0.01) 1. Rumination = 0.81**; 2. Brooding = 0.68**. 3. Rumination-anger = 0.67**; 4. Rumination-anxiety = 0.59**; 5. Rumination-interpersonal = 0.53**; 6. Rumination-social situations = 0.65**; 7. Self-criticism = 0.81**; 8. Shame (different measures) = 0.55**; 0.66**; 0.73**; 9. Self-compassion = -0.62**; 10. Depression/anxiety = 0.58**

Notes: ATSS: Attitudes Towards Self Scale; ATSSR: Attitudes Towards Self Scale-Revised; CSC: Comparative self-criticism; FSCRS: Forms of Self-Criticising/Attacking and Self-Reassuring Scale; HINT: Habit Index of Negative Thinking; HS: Hated self; ICARUS: Inventory of Cognitive Affect Regulation Strategies; ISC: Internalised self-criticism; IS: Inadequate self; LSCS: Levels of Self-Criticism Scale; RS: Reassured self; SCCS: Self-Critical Cognition Scale; SCRS: Self-Critical Rumination Scale; SCS: Self-Compassion Scale; TPQ: Temperament & Personality Questionnaire.

(i) + Specific hypotheses were formulated AND the majority of the results are in accordance with these hypotheses;

? Doubtful design or method (e.g., no hypotheses);

– Less than 75% of hypotheses were confirmed, despite adequate design and methods.
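
The construct validity key above can also be expressed as a short rule. The sketch below is a hedged illustration in Python. The function name and arguments are hypothetical, and it assumes that the 75% figure in the negative criterion is also the cut-off for a positive rating, which the key itself only describes as "the majority of the results".

```python
def rate_construct_validity(hypotheses_confirmed, hypotheses_stated=True,
                            design_adequate=True, threshold=0.75):
    """Return "+", "?" or "-" for construct validity (hypothesis testing).

    hypotheses_confirmed: list of booleans, one per tested hypothesis.
    threshold: assumed share of confirmed hypotheses needed for "+".
    """
    # No specific hypotheses, a doubtful design, or nothing to evaluate
    # all lead to an indeterminate rating.
    if not hypotheses_stated or not design_adequate or not hypotheses_confirmed:
        return "?"
    share = sum(hypotheses_confirmed) / len(hypotheses_confirmed)
    return "+" if share >= threshold else "-"


if __name__ == "__main__":
    # Three of four hypothetical hypotheses confirmed.
    print(rate_construct_validity([True, True, True, False]))
    # Correlations reported but no hypotheses stated in advance.
    print(rate_construct_validity([True, True], hypotheses_stated=False))
```

A study that reports correlations without stating hypotheses in advance is rated indeterminate under this rule, whatever the size of the coefficients.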
