The primary aim of this study was to develop and evaluate a state ER questionnaire specifically designed for adolescents in anxiety-inducing situations. To this end, we employed an experimental online methodology using case vignettes that simulated such situations (pre-registered at https://osf.io/8sahf). We pursued three main research objectives: First, we aimed to determine whether a confirmatory factor analysis (CFA) would support item loadings on superordinate factors corresponding to key ER strategies, including reappraisal, avoidance, rumination, acceptance, and distraction [24, 25]. Second, we evaluated whether scores derived from the questionnaire would demonstrate at least acceptable internal consistency at the scale level, with Cronbach's alpha coefficients exceeding 0.7 [26]. Third, we assessed the questionnaire's retest reliability, expecting moderate correlations of state ER responses across situations, reflecting the short-term stability of ER strategies within individuals [27]. Additional exploratory analyses, which deviated from the preregistration, are reported in Supplementary Appendix A.
Data were collected as part of the project "Gefühle im Gleichgewicht: Neue methodische Ansätze im Kindes- und Jugendalter (Feelings in Balance: New methodological approaches in childhood and adolescence)", conducted during the COVID-19 pandemic between January and December 2021. The study received ethics approval from the Institutional Review Board of the Department of Psychology at Humboldt-Universität zu Berlin (study number: 2020-65). Participants, along with their parents or legal guardians, received written information and provided informed consent prior to participation. The study was conducted in accordance with the principles outlined in the Declaration of Helsinki.
A structural equation model was computed for the CFA (psychometric testing of the ERQ State). Sample size recommendations vary, ranging from simple rules of thumb (e.g., five times the number of estimated parameters) to more complex simulation-based approaches. Given the pilot nature of the study, no simulations were conducted, and the commonly recommended sample size of n = 250 was adopted. Recruitment was carried out through schools, mailing lists, websites, student and parent committees, and social media platforms. As detailed in the pre-registration, data collection was limited by available time and personnel resources and was therefore concluded in December 2021. A total of N = 107 adolescents completed all questionnaires. Two participants were excluded because they consistently selected the same response during the second administration of the ERQ State items. The final analyzed sample comprised N = 105 adolescents. To capture developmental differences across adolescence, participants were recruited across a broad age range of 10 to 17 years. Exclusion criteria included incomplete informed consent and systematic response biases (e.g., consistently selecting the same response option). As a token of appreciation, participants were offered the chance to enter a raffle to win one of five 50€ gift cards.
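As an illustration of this exclusion criterion, the following R sketch flags participants whose ERQ State responses show zero variance, i.e., who selected the same option for every item. The data frame erq_v2 and its column names are hypothetical placeholders, not the actual study objects.

```r
# Hypothetical sketch: flag participants who selected the same response option
# for all ERQ State items of an administration (zero variance across items).
# "erq_v2" is a placeholder data frame with one row per participant and
# columns id, item_01, ..., item_30.
item_cols <- grep("^item_", names(erq_v2), value = TRUE)

invariant <- apply(erq_v2[, item_cols], 1,
                   function(x) isTRUE(var(x, na.rm = TRUE) == 0))

excluded_ids <- erq_v2$id[invariant]   # participants to exclude
erq_v2_clean <- erq_v2[!invariant, ]   # analysis sample
```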
The study employed a within-subjects design using the Humboldt-Universität zu Berlin version of LimeSurvey, an open-source survey platform (https://www.limesurvey.org/de). The primary within-subject factor was the type of vignette presented (see Fig. 1). Participants and their guardians received detailed information about the study, including confidentiality assurances and the right to withdraw. A contact address was provided for participant inquiries. Informed consent was obtained before participants provided sociodemographic data and completed various questionnaires assessing depressive symptoms, anxiety symptoms, and general psychopathology.
The entire procedure was conducted in a single experimental session lasting approximately one hour. The main experimental phase consisted of two blocks, each targeting different anxiety conditions through vignettes inspired by Carthy et al. In the first block, focused on social anxiety, participants were randomly assigned one vignette from a pool of five. They rated the vignette's valence and arousal before completing all 30 ERQ State items. In the second block, a second vignette was presented, randomly assigned to one of three anxiety conditions (separation anxiety, specific phobias, or generalized anxiety), each comprising five or six vignettes.
Building on the study by Carthy et al., we employed two sets of vignettes depicting potentially anxiety-inducing scenarios (see Supplementary Appendix B). Vignette 1 included situations related to social anxiety (e.g., "You are sitting in a group. The others introduce themselves briefly. You should introduce yourself in a moment."). Vignette 2 comprised scenarios with separation anxiety (e.g., "Your mother was supposed to come back from work, but she is late."), specific phobias (e.g., "A large dog is coming toward you on the street. It is heading straight for you."), or generalized anxiety (e.g., "On the way home from school, your stomach feels strange."). For each vignette, one scenario was randomly selected from a pool of five to six possibilities with equal probability. The vignettes were presented both auditorily and in written form.
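To make the randomization scheme concrete, the following R sketch illustrates the equal-probability selection of one scenario per vignette; the pool contents, and the number of scenarios per pool, are abbreviated placeholders rather than the actual study texts.

```r
# Illustrative equal-probability selection of scenarios; pool contents and
# counts are placeholders, not the actual vignette texts.
social_pool <- paste("Social anxiety scenario", 1:5)          # Vignette 1 pool

vignette2_pools <- list(                                      # Vignette 2 pools
  separation  = paste("Separation anxiety scenario", 1:5),
  phobia      = paste("Specific phobia scenario", 1:6),
  generalized = paste("Generalized anxiety scenario", 1:5)
)

set.seed(1)                                        # reproducible example only
vignette1 <- sample(social_pool, 1)                # one of five social scenarios
condition <- sample(names(vignette2_pools), 1)     # one of three anxiety conditions
vignette2 <- sample(vignette2_pools[[condition]], 1)
```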
We based the development of the ERQ State on a synthesis of prior empirical investigations and theoretical frameworks. Initial exploration of state ER strategies in children aged 9 to 13 years, using Reappraisal and Suppression scales adapted from Egloff et al., revealed insufficient psychometric properties. Following two pilot studies with university students, and drawing on established theoretical models, established trait questionnaires, and additional research findings, we developed an initial ER questionnaire. This questionnaire assessed five strategies -- acceptance, avoidance, distraction, reappraisal, and rumination -- with six items each (see Supplementary Appendix C for the full item list). This expanded item pool formed the basis for the ERQ State-short, which includes three items per strategy, totaling 15 items. Participants indicated their level of agreement with each item on a 5-point scale (1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; 5 = strongly agree).
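As a minimal sketch of how strategy scores can be derived from the 15-item short form, the following R code averages the three items per strategy; the column names and the item-to-strategy mapping are illustrative placeholders, since the actual assignment is documented in Supplementary Appendices C and D.

```r
# Illustrative scoring of the ERQ State-short: three items per strategy,
# each rated 1-5, averaged into five strategy scores. Column names and the
# item-to-strategy mapping are placeholders (see Supplementary Appendix D
# for the actual item assignment).
strategies <- list(
  acceptance  = c("erq_01", "erq_02", "erq_03"),
  avoidance   = c("erq_04", "erq_05", "erq_06"),
  distraction = c("erq_07", "erq_08", "erq_09"),
  reappraisal = c("erq_10", "erq_11", "erq_12"),
  rumination  = c("erq_13", "erq_14", "erq_15")
)

# "erq_short" is a placeholder data frame with one row per participant.
strategy_scores <- sapply(strategies,
                          function(items) rowMeans(erq_short[, items], na.rm = TRUE))
```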
Arousal and valence ratings were assessed using visual analogue scales accompanied by visual stimuli, following the procedure of Carthy et al. Participants rated their current arousal level ("How tense do you feel at this moment?") on a scale from 1 (very relaxed) to 7 (very tense). Simultaneously, they rated their emotional valence ("How do you feel at this moment?") on a corresponding 7-point scale from 1 (very good) to 7 (very bad).
General psychopathology was assessed using the German version of the Strengths and Difficulties Questionnaire (SDQ-Deu). The SDQ assesses behavioral symptoms across five scales: Emotional Symptoms, Conduct Problems, Hyperactivity/Inattention, Peer Problems, and Prosocial Behavior, each comprising five items. Participants respond on a 3-point scale (0 = not true; 1 = somewhat true; 2 = certainly true), with five reverse-coded items. Scale scores range from 0 to 10, where higher scores indicate greater difficulties, except for the Prosocial Behavior scale, where higher scores reflect strengths. The SDQ-Deu has demonstrated good validity, showing strong correlations with the German Child Behavior Checklist. In our sample, internal consistency for the total difficulties score was robust (α = 0.81), while individual scales ranged from α = 0.53 (Conduct Problems scale) to α = 0.79 (Emotional Symptoms scale). Previous research has reported a wide range of internal consistencies for SDQ subscales, possibly due to symptom heterogeneity.
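A brief R sketch of this scoring logic follows; the column names and the reverse-coded item set are illustrative placeholders, with the published SDQ scoring key remaining authoritative.

```r
# Sketch of SDQ scale scoring: items rated 0-2, five items per scale,
# scale scores 0-10. Column names and the reverse-coded item set are
# illustrative placeholders; the published SDQ scoring key is authoritative.
reverse_items <- c("sdq_07", "sdq_11", "sdq_14", "sdq_21", "sdq_25")
sdq[reverse_items] <- 2 - sdq[reverse_items]            # recode 0 <-> 2

emotional_items <- c("sdq_03", "sdq_08", "sdq_13", "sdq_16", "sdq_24")
emotional_score <- rowSums(sdq[, emotional_items])      # 0-10, higher = more difficulties

# The total difficulties score sums the four problem scales
# (all scales except Prosocial Behavior).
```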
Anxiety symptoms were assessed using the German short version of the Spence Children's Anxiety Scale (SCAS-S), a concise alternative to the original SCAS. It targets separation anxiety, social anxiety, panic, specific anxieties, and generalized anxiety. Participants rate the frequency of their anxiety symptoms on a 4-point scale (0 = never; 1 = sometimes; 2 = often; 3 = always), with higher scores indicating more anxiety symptoms. The SCAS-S has demonstrated convergent and discriminant validity across multiple translations and shows good to excellent internal consistency for the total score (α = 0.88; ω = 0.93). In our sample, the German self-report SCAS-S exhibited excellent internal consistency for the total score (α = 0.90). Internal consistency for the subscales ranged from α = 0.47 (Separation Anxiety scale) to α = 0.88 (Panic Disorder scale).
Depressive symptoms were assessed using the depression module of the Patient Health Questionnaire (PHQ-9). Respondents rate each of nine items based on their experiences over the past 2 weeks using a 4-point scale (0 = not at all; 1 = several days; 2 = more than half the days; 3 = nearly every day). The total score ranges from 0 to 27, with higher scores indicating greater severity of depressive symptoms. Validation studies have demonstrated good internal consistency (α = 0.86; α = 0.89) and excellent test-retest reliability. In a German sample, the PHQ-9 showed good internal consistency (α = 0.88). Its validity has been supported by numerous studies, including those involving adolescents. In our sample, the PHQ-9 demonstrated good internal consistency (α = 0.89).
All analyses were performed using R Statistical Software (v4.2.1) and RStudio (v2022.12.0.353), primarily utilizing the packages psych (v2.2.9), lavaan (v0.6.15), and MBESS (v4.9.2). Unless stated otherwise, a significance level of 0.05 was applied. For the second and third research objectives, analyses were conducted using the ERQ State-short, consisting of the final 15 items, to streamline computation without compromising analytical rigor.
To reduce the extended ERQ State item pool to the concise ERQ State-short, consisting of three items per strategy, we conducted CFAs using lavaan (v0.6.15). Measurement models were initially tested separately for each ER strategy to facilitate item reduction. The initial model for Vignette 1 included six ERQ State items per strategy along with their superordinate ER factors. When model fit was unsatisfactory, modification indices guided the identification of correlated residuals, and content-valid modifications were applied. Three items per strategy were selected based on criteria from Ziegler, including factor loadings, residual correlations, item difficulty, and theoretical fit. Priority was given to items with high loadings, while also considering wording, distribution, and item similarity. Subsequently, three-item models were tested, and item patterns were re-evaluated. A joint five-factor model was then specified, allowing correlations among factors, and fit indices and loadings were compared. Multiple three-item models were explored as needed, replacing items that showed cross-loadings. The final ERQ State-short version was chosen based on the best model fit after comparisons and modifications. Item selection was based on Vignette 1 data, and the resulting model was then tested against Vignette 2 data. For Vignette 2, anxiety conditions were collapsed into a single group due to limited observations per condition. Because Mardia's test indicated nonnormality of the data, lavaan's WLSMV estimator was used. To address potential biases, the MLR estimator was also employed for comparison. Model fit was assessed using the scaled χ² statistic, where significant values indicate poor fit. Alternative fit indices included the (scaled) Comparative Fit Index (CFI), the (scaled) Root Mean Square Error of Approximation (RMSEA, for MLR only), and the Standardized Root Mean Square Residual (SRMR; the Mplus-like SRMR for MLR), with thresholds of CFI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08 for good fit, and CFI ≥ 0.90, RMSEA ≤ 0.08, and SRMR ≤ 0.08 for acceptable fit. For the WLSMV estimator, the Weighted Root Mean Square Residual (WRMR) was also examined with a cutoff of < 1.0 but interpreted cautiously due to the small and nonnormally distributed sample. A detailed description of the item reduction procedure and its results is provided in Supplementary Appendix D. Reported results refer exclusively to the final ERQ State-short.
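A condensed sketch of such a five-factor specification and its estimation in lavaan is given below; the item names and their assignment to factors are placeholders (the actual assignment is reported in Supplementary Appendix D), and erq_v1 stands for a hypothetical data frame of Vignette 1 responses.

```r
library(lavaan)

# Placeholder five-factor model with three items per strategy; factor
# covariances are freely estimated by default in cfa().
model <- '
  acceptance  =~ erq_01 + erq_02 + erq_03
  avoidance   =~ erq_04 + erq_05 + erq_06
  distraction =~ erq_07 + erq_08 + erq_09
  reappraisal =~ erq_10 + erq_11 + erq_12
  rumination  =~ erq_13 + erq_14 + erq_15
'

# Primary estimation with WLSMV, treating the 5-point items as ordered ...
fit_wlsmv <- cfa(model, data = erq_v1, estimator = "WLSMV", ordered = TRUE)
# ... and MLR re-estimation as a robustness check.
fit_mlr   <- cfa(model, data = erq_v1, estimator = "MLR")

# Scaled fit indices used for model evaluation.
fitMeasures(fit_wlsmv, c("chisq.scaled", "pvalue.scaled", "cfi.scaled", "srmr", "wrmr"))
fitMeasures(fit_mlr,   c("chisq.scaled", "cfi.scaled", "rmsea.scaled", "srmr_mplus"))

# Modification indices inspected during item reduction.
modindices(fit_wlsmv, sort. = TRUE)
```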
To evaluate the internal consistencies of the short scales, we used the MBESS package (v4.9.2). In addition to Cronbach's alpha, we reported McDonald's omega, which is generally considered a more accurate estimate of reliability. Specifically, we used the categorical version of omega (categorical omega; ω), as proposed by Green and Yang, given its suitability for ordinal data. Thresholds for acceptable reliability were set at ≥ 0.70 for both α and ω. Confidence intervals for the reliability estimates were calculated using bias-corrected and accelerated (BCa) bootstrapping, following recommendations by Kelley and Pornprasertmanit, with 1,000 bootstrap iterations.
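A minimal sketch of these reliability estimates for one three-item scale is shown below; erq_v1 and the item columns are hypothetical placeholders.

```r
library(MBESS)
library(psych)

# Placeholder: responses to the three reappraisal items of the ERQ State-short.
items_reappraisal <- erq_v1[, c("erq_10", "erq_11", "erq_12")]

# Cronbach's alpha.
alpha(items_reappraisal)

# Categorical omega (Green & Yang) with a BCa bootstrap CI, 1,000 iterations.
ci.reliability(data = items_reappraisal,
               type = "categorical",
               interval.type = "bca",
               B = 1000)
```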
The test-retest reliability of the ERQ State-short scales was assessed across Vignettes 1 and 2, which were presented consecutively within the same experimental session, with all participants completing both the initial and follow-up measurements. Reliability was calculated using Spearman's rank-order correlations. Bootstrap confidence intervals (BCa bootstrapping, 1,000 iterations) were computed using the confintr package (v1.0.0). Holm adjustments were applied to correct for multiple testing.
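The following R sketch illustrates these computations for placeholder strategy scores from the two vignettes, assuming confintr's ci_cor() interface; scores_v1 and scores_v2 are hypothetical data frames with one column per ER strategy.

```r
library(confintr)

# Placeholder: data frames with one row per participant and one column per
# ER strategy, for Vignette 1 (scores_v1) and Vignette 2 (scores_v2).
strategy_names <- c("acceptance", "avoidance", "distraction",
                    "reappraisal", "rumination")

# Spearman correlation with a BCa bootstrap CI (1,000 iterations) for one scale.
ci_cor(scores_v1$reappraisal, scores_v2$reappraisal,
       method = "spearman", type = "bootstrap", boot_type = "bca", R = 1000)

# Holm adjustment across the five strategy-wise tests.
p_raw <- sapply(strategy_names, function(s)
  cor.test(scores_v1[[s]], scores_v2[[s]], method = "spearman")$p.value)
p.adjust(p_raw, method = "holm")
```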