eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Abstract

The Safety Attitudes Questionnaire has long been used in the healthcare industry to measure healthcare workers’ attitudes toward patient safety culture; as a result, it has been translated into a variety of languages, including Korean. Recently, with the help of item response theory, we realised we do not need the original 41 items of the questionnaire to guarantee accuracy, so we reduced the instrument to a 23-item survey. Except for the stress recognition domain, every domain functioned well. We suspect the stress recognition domain did not fare well due to cultural differences. Stress recognition refers to individuals understanding that significant stress can lead to a greater probability of making an error. However, healthcare workers, especially those in Asian countries such as Taiwan and Korea, do not accept such an idea. Rather, we found that such workers believe they should finish their work, regardless of how tired they are. They believe that admitting to stress makes them appear weak and can lead to them being fired. As the chasm between these two concepts cannot easily be crossed, we ultimately decided to remove the stress recognition domain from this second version of the survey. In sum, the new version of the Safety Attitudes Questionnaire contains 23 items across five domains. Its psychometric properties were tested using confirmatory factor analysis, and information function curves helped us determine which items should be retained in the new instrument by visualising the behaviour of items and domains.

Keywords: Patient Safety, Safety Culture, Culture Survey, 환자안전문화, 문화설문, SAQ

Introduction

The Safety Attitudes Questionnaire (SAQ) has been one of the most popular instruments for gauging safety culture among healthcare workers (HCWs) in hospitals around the world.¹ South Korea is one country that has benefited from the SAQ for years.² However, despite its positive impact on improving safety, the SAQ Korean version (SAQ-K) has a couple of weaknesses. First, considering HCWs’ large workload, the SAQ-K included too many items, leading respondents to answer carelessly or even abandon the questionnaire midway. In addition, several items contained unclear expressions due to the English-to-Korean translation. In this study, we set out to develop a newer version of the SAQ, tagged ‘-K2’, that resolves these problems with the previous instrument. SAQ-K2 is kinder to respondents, offering fewer items in a more explicit and more natural translation.

Many resources have been invested in this reform. To illustrate, immediately after the debut of SAQ-K in late 2012, we launched a plan to improve it. We published almost 30 articles on such improvements.^1–28 Many of them provided item-level information using item response theory (IRT).^4,18 Each of the studies added another cobblestone, paving the road to safer healthcare; such microscopic-level explorations of the instrument laid the groundwork for these updates. Furthermore, we found that Taiwanese researchers using the SAQ-Chinese version (SAQ-C) were experiencing very similar problems, which led us to suspect the issues arose from the similar Asian background of these two countries.²⁹ Working as a team, researchers from Taiwan and Korea actively collaborated, resolving issues in a shorter time than we expected. As a result, Taiwan currently enjoys a newer version of SAQ-C, known as the Taiwanese Patient Safety Culture survey instrument (TPSC),¹ whereas Korea has its SAQ-K2.

All updates were carefully applied and validated using a confirmatory factor analysis (CFA). As the methods and results sections show how all the items and domains achieved the tag ‘-K2’, we close this introduction here and move directly to the details. To ensure a better flow, some content from the discussion section has been dispersed to other sections.

Methods

Modification of the Previous Version of SAQ-K

This section describes, step by step, the many tasks that took place simultaneously or iteratively. We have divided the information into steps only to provide a clearer explanation.

Removal of a non-functioning domain

The original SAQ-K consists of 34 items in 6 domains. First, we removed the entire stress recognition (SR) domain (i.e., four items), leaving five domains. SR was designed to ask respondents to acknowledge that stressors influenced their performance. However, in some countries, including Korea, HCWs believe they should be able to overcome any stressful situation; thus, giving a high score to SR items may make them look weak (22, 24), potentially increasing the possibility of being laid off. We saw no reason to keep SR in the instrument.

Deleting unclear (non-translatable) items

Some English sentences or expressions can never be translated correctly into Korean; the nuances of words in the two languages cannot be mapped by a function, f(x), that makes word-for-word translation possible. It is particularly cumbersome for researchers that even a single word can ruin an item once translated. For example, in the item ‘Hospital management does not knowingly compromise patient safety’,² the words ‘knowingly’ and ‘compromise’ can be perceived in too many ways in Korean, with both positive and negative connotations, or cannot be translated at all. As such, this well-intended item had to be removed. Some may ask why we did not simply keep it, as it is only one item in a domain, but we do not recommend such an approach because it would lead the whole domain vector (or perhaps psychological tensor) in the wrong direction.

Reducing the number of items by combining similar ones

This step requires both quantitative and qualitative decision making. In the perception of management (PM) domain, both the original SAQ and SAQ-K1 included ten items: five items asked about each of two management levels, clinical management and hospital management (2). In the authors’ experience in the US, this set of items functioned well. However, HCWs using SAQ-K or SAQ-C experienced severe difficulties with the set because, in their minds, there was no clear distinction between clinical unit managers and hospital managers. HCWs rarely see hospital-level management for more than a passing glance and practically never interact with them. We do not intend to judge whether this phenomenon is right or wrong; it is simply the status quo. Thus, we decided to merge each pair of questions into one item that combined ‘hospital managers’ and ‘managers of your areas’ into ‘managers’. As a result, respondents felt the instrument was much more straightforward and found the items easier to answer. Table 2 summarizes the new version with far fewer items in the PM domain.

Fine-tuning of items to better fit the current Korean environment

This step focused on the subtle differences between English and Korean words, even those listed as synonyms in dictionaries. These differences arise primarily from changes in the nuances of words in both languages as well as from changes in hospitals’ safety culture itself. In addition, temporal change in culture requires word-level adjustment. What follows is a good example from the personal experience of one of the authors.

A few years ago, the Provider Behavior Research Group at Johns Hopkins Hospital decided to modify the SAQ that it had routinely administered every 18 months for years. The first issue was the very first item of the instrument: ‘Nurse input is well received in this clinical area.’ The original item (‘Doctor–nurse relationship is the most visible symbol of a power gradient in a healthcare setting’) was completely relevant, but this power play has been gradually dissipating in recent years and is even discouraged by management. Thus, the ‘doctor–nurse’ component was removed to ask simply ‘employees’ input is well received’. In this way, several minor changes were made to many words to make the items more clearly understood.

Final preparation before checking the validity: back translation

Finally, a bilingual professional translator translated the SAQ-K2 back into English and confirmed there were no items whose ideas differed from the original SAQ items.

Data collection

We administered SAQ-K2 in four different institutions: a tertiary hospital, a secondary hospital, a nursing home, and a large ophthalmology clinic. Data were collected from March 4 to March 16, 2019. All shifts (day, evening, and night) participated. The paper version was used for all respondents.

Analysis

With a total of 23 items in the five domains, a correlated factor model was developed to include all possible relationships between domains. Because we relied on the linear assumption for the 5-point Likert scale (i.e., responses were treated as continuous), we applied the same logic in this analysis step.
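As a rough check on the size of such a model, its degrees of freedom can be counted by hand. The sketch below assumes simple structure (each item loads on exactly one domain), one loading per domain fixed to 1 for identification, and uncorrelated residuals; these details are not stated in the text, and the function name is only illustrative. The item counts per domain (5, 5, 5, 4, 4) are those reported in the Results.

```python
def cfa_degrees_of_freedom(items_per_factor):
    """Degrees of freedom of a correlated-factor CFA (covariance structure only).

    Assumptions (not taken from the article): simple structure, one loading
    per factor fixed to 1, uncorrelated residuals, all factor covariances free.
    """
    p = sum(items_per_factor)          # number of observed items
    m = len(items_per_factor)          # number of latent factors (domains)

    moments = p * (p + 1) // 2         # unique variances and covariances

    free_loadings = p - m              # one loading per factor is fixed
    residual_variances = p
    factor_variances = m
    factor_covariances = m * (m - 1) // 2
    free_parameters = (free_loadings + residual_variances
                       + factor_variances + factor_covariances)

    return moments - free_parameters


if __name__ == "__main__":
    # TC, SC, JS have five items each; PM and WC have four each.
    print(cfa_degrees_of_freedom([5, 5, 5, 4, 4]))
```

Under these assumptions the count is 276 moments minus 56 free parameters, which agrees with the 220 degrees of freedom of the model chi-square reported in Table 4.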

Addendum: unidimensional IRT model and its information function curve

In addition to using a typical linear CFA for a model fit check, we added an IRT analysis to visually check how SAQ-K2 items functioned. Although the authors use multidimensional-IRT (MIRT) on a daily basis, we did not go to that level. Instead, we used a simple unidimensional IRT model for drawing information function curves. We will show some of the results in a later section.

All analyses were conducted using Stata/SE 15.1 (StataCorp, College Station, Texas).

Results

Characteristics of respondents

A total of 297 HCWs responded. In Korean hospitals, the predominant job type is nurse, and most nurses are female. The same pattern applied to our sample, which included only two pharmacists; this potential under-representation of pharmacists is not expected to influence the validation process, especially when backed up by IRT (Table 1).

Characteristics	N	%
Gender
Male	77	25.9
Female	220	74.1
Work Experience
6 months	35	11.8
7-11 months	25	8.4
1-2 years	59	19.9
3-4 years	63	21.2
5-10 years	71	23.9
11-20 years	35	11.8
> 20 years	9	3.0
Job Type
Physicians	24	8.1
Nurses	120	40.4
Pharmacists	2	0.7
Technicians	75	25.3
Administrative staff	56	18.9
Others	20	6.7
Total	297	100

Table 1 Characteristics of respondents

Table 2 summarizes the results from the CFA, presented by domain. The TC, SC, and JS domains each consist of five items; PM and WC have four items each. Standardized factor loadings ranged from 0.62 (WC1) to 0.88 (JS3 and JS4), indicating that the items represent the corresponding latent trait (i.e., domain) well.

Table 2 Factor loadings from the correlated factor model

Table 3 shows the variance/covariance matrix among domains. Although not shown here, we tried a model including the SR domain, and SR clearly showed a negative relationship with the other domains. Such results support the decision, made for SAQ-K2 and for SAQ versions from other countries, to remove the SR domain.^7,22,30,31

Table 3 Variance/covariance structure

Now we move on to the model fit statistics (Table 4). Except for chi-square, most of them were satisfactory compared not only to other safety culture instruments but also to general psychometric measurement tools in various fields.^28,32,33 We did not emphasize the modification indices, as this was beyond the scope of our study. In sum, the current safety culture measurement instrument is as valid as the previous version, albeit with a reduced number of items.

Fit statistics	Value	Description
Likelihood ratio
chi2_ms(220)	469.766	model vs. saturated
p > chi2	0.000
chi2_bs(253)	4304.128	baseline vs. saturated
p > chi2	0.000
Population error
RMSEA	0.065	Root mean squared error of approximation
90% CI, lower bound	0.057
upper bound	0.073
pclose	0.002	Probability RMSEA <= 0.05
Information criteria
AIC	12478.508	Akaike's information criterion
BIC	12762.419	Bayesian information criterion
Baseline comparison
CFI	0.938	Comparative fit index
TLI	0.929	Tucker-Lewis index
Size of residuals
SRMR	0.048	Standardized root mean squared residual
CD	1.000	Coefficient of determination

Table 4 Model fit indices
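The baseline comparison indices in Table 4 can be reproduced from the two likelihood ratio chi-square statistics alone. The following is a minimal sketch using the standard textbook definitions of CFI and TLI; it is not the authors' code (the analyses were run in Stata), and the function names are only illustrative.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)   # model noncentrality
    d_base = max(chi2_base - df_base, 0.0)      # baseline noncentrality
    denom = max(d_model, d_base)
    return 1.0 if denom == 0.0 else 1.0 - d_model / denom


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index from model and baseline chi-square statistics."""
    ratio_model = chi2_model / df_model
    ratio_base = chi2_base / df_base
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


# Values copied from Table 4.
CHI2_MS, DF_MS = 469.766, 220     # model vs. saturated
CHI2_BS, DF_BS = 4304.128, 253    # baseline vs. saturated

if __name__ == "__main__":
    print(round(cfi(CHI2_MS, DF_MS, CHI2_BS, DF_BS), 3))
    print(round(tli(CHI2_MS, DF_MS, CHI2_BS, DF_BS), 3))
```

Both values round to the CFI and TLI shown in Table 4 (0.938 and 0.929), which suggests the table is internally consistent for these indices.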

Discussion

Readers in the realm of quality and safety or psychology might think of this article as just another instrument validation using CFA. To a certain degree, it is. However, behind the scenes, our real value boils down to the phrase ‘saving lives by saving time’.⁹ We know all too well that in a hospital, just one minute might be enough time to make a patient’s silent heart begin to pump blood again, or the other way around. Therefore, we regard the efficiency of an instrument as our guiding star. The word ‘efficiency’ implies that the reform will lessen the burden of completing the survey as much as possible while the constructs that the instrument was intended to measure can still be quantified with high precision. Thus, just minimising the number of items is neither sufficient nor ideal. SAQ-K2 is not designed as Fishbein’s direct one-question method for a construct.³⁴ Yet increasing the number of items is not ideal either. Although more items lead to a higher alpha, we have to keep reminding ourselves that ‘time is life’ in a hospital. Thus, keeping a balance between the two is a difficult tightrope to walk.
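The trade-off between the number of items and alpha can be made concrete with the standardized form of Cronbach's alpha, which depends only on the number of items and the mean inter-item correlation. The sketch below is illustrative only: the mean correlation of 0.5 is a hypothetical value, not an estimate from the SAQ-K2 data, and the function name is ours.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha for n_items with a given mean correlation."""
    k, r = n_items, mean_inter_item_r
    return k * r / (1.0 + (k - 1) * r)


if __name__ == "__main__":
    HYPOTHETICAL_MEAN_R = 0.5   # assumed value, for illustration only
    for k in (1, 4, 5, 10):
        print(k, round(standardized_alpha(k, HYPOTHETICAL_MEAN_R), 3))
```

With a fixed mean correlation, alpha rises with every added item but with diminishing returns, which is why a domain of four or five well-chosen items can remain reliable while costing respondents far less time.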

The good news is that we already had SAQ’s original English version and SAQ-K1, so we did not have to consider what items to add. Rather, we only had to prioritize the existing items (although slight modifications were frequently required) and remove the less critical items one by one, stopping at the point where further removal would significantly reduce the amount of information the instrument provides about a construct. Of course, there are new approaches to survey efficiency. Jeong et al., through their randomised controlled trial with SAQ-K1, suggested reducing response options from a 5-point Likert scale to a 3-point Likert scale or even using dichotomized answers.^5,9 This approach worked quite well, especially when the focus is solely on the central tendency of a group while ignoring variance, as is usually the case. However, this new method is still premature, and SAQ-K2 is intended to be administered to all HCWs in Korea; therefore, we decided to stick to the conventional 5-point Likert scale, which left us with the single option of removing the less important items. At this point, we borrowed the visualising power of IRT’s graded response model.³⁵ Figure 1 shows a couple of the graphs we used, with the TC domain as an example.

Figure 1 Item information curves (IIC) and test information function (TIF) of the TC domain.

As seasoned readers of psychometrics may already know, IRT does not require CFA’s linear assumption; therefore, the graphs in Figure 1 (called information curves) are a very powerful tool for item selection. In Figure 1, the abscissa represents the trait level (domain level) in standard deviation units. In the left pane, each line denotes the amount of information (precision) that the item contributes to the total score. Note that the amount varies across the TC level, so we can see at which level of safety attitude each item is most useful. TC3 appears to provide the least information at most latent trait levels, while the curves for the other items nearly overlap. Thus, from this graph, we were not wholly sure that removing a specific item would improve the instrument’s efficiency. In the right pane, the solid line is the sum of the individual information curves, and the broken line is the error. Here, up to around 2 on the x-axis, information is quite high, which means that the TC domain of SAQ-K2 functions well up to around the 90th percentile of respondents. If we were focused on capturing high safety culture levels with a tiny error range, we would add another item that can capture high levels of TC attitudes. However, as of this writing, Korea’s safety culture falls around the middle of the scores, so we did not need an additional item for high safety attitudes.
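The relationship between the two panes can be sketched in a few lines: under a graded response model, each item has an information function, the test information is their sum, and the standard error is the reciprocal of the square root of the test information. The item parameters below are hypothetical and chosen only to mimic one weaker item among five; they are not the SAQ-K2 estimates, and the function names are ours.

```python
import math


def _logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def _cumulative(theta, a, thresholds):
    # Boundary curves P*(response >= k), padded with 1 and 0 at the ends.
    return [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities of a graded response model item."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    slope = [c * (1.0 - c) for c in cum]      # dP*/dtheta divided by a
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        info += (slope[k] - slope[k + 1]) ** 2 / p_k
    return a * a * info


def total_information(theta, items):
    """Test information: sum of the item information functions."""
    return sum(grm_item_information(theta, a, bs) for a, bs in items)


def standard_error(theta, items):
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical (discrimination, thresholds) pairs for five 5-point items.
HYPOTHETICAL_ITEMS = [
    (2.0, [-2.0, -1.0, 0.0, 1.0]),
    (2.2, [-2.2, -1.1, 0.1, 1.2]),
    (1.0, [-2.5, -1.0, 0.5, 1.5]),   # a weaker item with a flatter curve
    (2.1, [-1.9, -0.9, 0.2, 1.1]),
    (1.9, [-2.1, -1.2, 0.0, 1.3]),
]

if __name__ == "__main__":
    for theta in (-3, -2, -1, 0, 1, 2, 3):
        print(theta,
              round(total_information(theta, HYPOTHETICAL_ITEMS), 2),
              round(standard_error(theta, HYPOTHETICAL_ITEMS), 2))
```

Because information falls once the trait level moves beyond the highest thresholds, adding an item with higher thresholds is the usual way to extend precision toward the upper end of the scale, which is the option discussed above.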

The other four domains were similar, although for some domains, like PM, one item provided little information. Still, we kept these items. To explain why, we have to introduce differential item functioning (DIF). What if doctors and nurses share the same safety attitudes but answer differently? By analysing the data, we might conclude the two groups were different, despite being the same in terms of the latent trait level. To test for DIF and adjust for it across different groups of people, we usually need at least four items (assuming a 5-point Likert scale). Therefore, this study set four items per domain as the lower limit of the number of items.³⁶

The authors should confess that we prefer IRT or multidimensional IRT (MIRT) over classic CFA. However, almost all SAQs were validated using the traditional CFA, so we followed that approach while keeping MIRT as a sidearm.

Conclusion

We have described what we did to develop the second version of SAQ-K. Although the process mixed qualitative and quantitative approaches, one idea is still clear: too much burden on respondents eventually leads them to quit the survey in the middle or even not participate in it at all. Therefore, we must keep a balance between the precision of the survey instrument and the burden for HCWs to complete a survey. More often than not, we focus on the precision of data or use precision as an excuse for not doing our best to optimize the survey (i.e., make it as short as possible). When T. S. Eliot said, ‘If I had more time, I would have written a shorter letter,’ it was a great joke that made the reader laugh. Yet we do not have the right to say the same about a survey instrument in hospitals because somebody may die because of an inefficient instrument. SAQ-K2, we believe, will resolve this issue, although we know there will be SAQ-K3.