eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 8 Issue 3

An efficiency-oriented reform of safety attitudes questionnaire–Korean version (Development of SAQ-K2)

Heon-Jae Jeong,1 Wui-Chiang Lee,2 Deog Hyeon Son,3 Jung Hwa Lee,3 Su Sang Ryu,3 Shin Hye Yoo,4 Eun Jin Bae,5 Chulho Kim,6 Su Ha Han7

1President & CEO, The Care Quality Research Group; Advisor, Joint Commission Taiwan, Taiwan
2Department of Medical Affairs and Planning, Taipei Veterans General Hospital & National Yang-Ming University School of Medicine, Taipei, Taiwan
3Eson Convalescent Hospital, Korea
4Center for Palliative Care and Clinical Ethics, Seoul National University Hospital, Korea
5Moon's eye hospital, Korea
6Department of Neurology, Chuncheon Sacred Hospital, Korea
7Department of Nursing, SoonChunHyang University, Korea

Correspondence: Su Ha Han, Department of Nursing, SoonChunHyang University, 31 SoonChunHyang 6-gil, dongnam-gu, Cheonan-si, Chungcheongnam-do, 31151, South Korea, Tel +82-41-570-2487, Fax +82-41-570-2498

Received: April 30, 2019 | Published: May 16, 2019

Citation: Jeong HJ, Lee WC, Son DH, et al. An efficiency-oriented reform of safety attitudes questionnaire–Korean version (Development of SAQ-K2). Biom Biostat Int J. 2019;8(3):93-99. DOI: 10.15406/bbij.2019.08.00277



The Safety Attitudes Questionnaire has long been used in the healthcare industry to measure healthcare workers’ attitudes toward patient safety culture; as a result, it has been translated into a variety of languages, including Korean. Recently, with the help of item response theory, we realised we do not need the original 41 items of the questionnaire to guarantee accuracy, so we reduced the instrument to a 23-item survey. Except for the stress recognition domain, every domain functioned well. We suspect the stress recognition domain did not fare well due to cultural differences. Stress recognition refers to individuals understanding that significant stress can lead to a greater probability of making an error. However, healthcare workers, especially those in Asian countries such as Taiwan and Korea, do not accept such an idea. Rather, we found that such workers believe they should finish their work, regardless of how tired they are. They believe that admitting to stress makes them appear weak and can lead to them being fired. As the chasm between these two concepts cannot easily be crossed, we ultimately decided to remove the stress recognition domain from this second version of the survey. In sum, the new version of the Safety Attitudes Questionnaire contains 23 items across five domains. Its psychometric properties were tested using confirmatory factor analysis, and information function curves helped us determine which items should be retained in the new instrument by visualising the behaviour of items and domains.

Keywords: Patient Safety, Safety Culture, Culture Survey, 환자안전문화, 문화설문, SAQ


The Safety Attitudes Questionnaire (SAQ) has been one of the most popular instruments for gauging safety culture among healthcare workers (HCWs) in hospitals around the world.1 South Korea is one country that has benefited from the SAQ for years.2 However, despite its positive impact on improving safety, the SAQ Korean version (SAQ-K) has a couple of weaknesses. First, considering HCWs’ large workload, the SAQ-K included too many items, leading respondents to answer carelessly or even drop out in the middle of completing the questionnaire. In addition, several items contained unclear expressions due to the English-to-Korean translation. In this study, we developed a newer version of the SAQ, tagged ‘-K2’, that resolves these problems with the previous instrument. SAQ-K2 is kinder to respondents: it has fewer items, and their translation is more explicit and more natural.

Many resources have been invested in this reform. To illustrate, immediately after the debut of SAQ-K in late 2012, we launched a plan to improve it. We published almost 30 articles on such improvements.1–28 Many of them provided item-level information using item response theory (IRT).4,18 Each of the studies added another cobblestone, paving the road to safer healthcare; such microscopic-level explorations of the instrument laid the groundwork for these updates. Furthermore, we found that Taiwanese researchers using the SAQ-Chinese version (SAQ-C) were experiencing very similar problems, which led us to suspect the issues arose from the similar Asian background of these two countries.29 Working as a team, researchers from Taiwan and Korea actively collaborated, resolving issues in a shorter time than we expected. As a result, Taiwan currently enjoys a newer version of SAQ-C, known as the Taiwanese Patient Safety Culture survey instrument (TPSC),1 whereas Korea has its SAQ-K2.

All updates were carefully applied and validated using a confirmatory factor analysis (CFA). As the methods and results sections show how all the items and domains achieved the tag ‘-K2’, we close this introduction here and dive directly into the details. To ensure a better flow, some content from the discussion section has been distributed to other sections.


  1. Modification of the Previous Version of SAQ-K

This section describes, step by step, the many tasks that in practice took place simultaneously or iteratively. We have divided the information into steps only to provide a clearer explanation.

Removal of a non-functioning domain

The original SAQ-K consists of 34 items in six domains. First, we removed the entire stress recognition (SR) domain (i.e., four items), leaving five domains. SR was designed to ask respondents to acknowledge that stressors influence their performance. However, in some countries, including Korea, HCWs believe they should be able to overcome any stressful situation; thus, giving a high score to SR items may make them look weak,22,24 potentially increasing the possibility of being laid off. We saw no reason to keep SR in the instrument.

Deleting unclear (non-translatable) items

Some English sentences or expressions cannot be translated faithfully into Korean; the nuances of words in the two languages do not map one to one, so there is no function, f(x), that makes word-for-word translation possible. It is particularly cumbersome for researchers that even a single word can ruin an item once translated. For example, in the item ‘Hospital management does not knowingly compromise patient safety’,2 the words ‘knowingly’ and ‘compromise’ can be perceived in too many ways in Korean, including both positive and negative connotations, or may not be translatable at all. As such, this intended-to-be-good item should be removed. Some may ask why we do not simply keep it, as it is only one item in a domain, but we do not recommend such an approach because it would lead the whole domain vector (maybe psychological tensor) in the wrong direction.

Reducing the number of items by combining similar ones

This step requires both quantitative and qualitative decision making. In the perception of management (PM) domain, both the original SAQ and SAQ-K1 included ten items: five items asked about each of two management levels, clinical management and hospital management.2 In the authors’ experience in the US, this set of items functioned well. However, HCWs using SAQ-K or SAQ-C experienced severe difficulties with the set because, in their minds, there was no clear distinction between clinical unit managers and hospital managers. HCWs rarely see hospital-level management for more than a passing glance and practically never interact with them. We do not intend to judge whether this phenomenon is right or wrong; it is simply the status quo. Thus, we decided to merge each pair of questions into one item that combined ‘hospital managers’ and ‘managers of your areas’ into ‘managers’. As a result, respondents felt the instrument was much more straightforward and that they could respond to the items. Table 2 summarizes the new version, with far fewer items in the PM domain.

Fine-tuning of items to better fit the current Korean environment

This step focused on the subtle differences between English and Korean words, even those listed as synonyms in dictionaries. These differences arise primarily from shifts in the nuances of words in both languages as well as in the hospital’s safety culture itself. In addition, temporal change in culture requires word-level adjustment. What follows is an example from one of the authors’ personal experience.

A few years ago, the Provider Behavior Research Group at Johns Hopkins Hospital decided to modify the SAQ that it had routinely administered every 18 months for years. The first issue was the very first item of the instrument: ‘Nurse input is well received in this clinical area.’ The premise of the original item (‘Doctor–nurse relationship is the most visible symbol of a power gradient in a healthcare setting’) was completely relevant, but this power play has been gradually dissipating in recent years and is even discouraged by management. Thus, the ‘doctor–nurse’ component was removed to ask simply whether ‘employees’ input is well received’. In the same way, minor changes were made to many words so that the items would be more clearly understood.

Final preparation before checking the validity: back translation

Finally, a bilingual professional translator translated the SAQ-K2 back into English and confirmed there were no items whose ideas differed from the original SAQ items.

  2. Data collection

We administered SAQ-K2 in four different institutions: a tertiary hospital, a secondary hospital, a nursing home, and a large ophthalmology clinic. Data were collected from March 4 to March 16, 2019. All shifts (day, evening, and night) participated. The paper version was used for all respondents.

  3. Analysis

With a total of 23 items in the five domains, a correlated factor model was developed to include all possible relationships between domains. Because we relied on the linear assumption for the 5-point Likert scale (i.e., responses were treated as continuous), we applied the same logic throughout this analysis step.
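To make the size of such a model concrete, the following minimal sketch counts the degrees of freedom of a correlated factor model. It assumes a simple structure (each item loads on one domain only, no correlated residuals, first loading per factor fixed to 1); these identification choices and the function name are our assumptions for illustration, not a description of the fitted model. The item counts per domain are those reported for SAQ-K2.

```python
def cfa_degrees_of_freedom(items_per_factor):
    """Degrees of freedom of a correlated factor CFA (covariance structure only).

    Assumes simple structure: one loading per item, no correlated residuals,
    and the first loading of each factor fixed to 1 for identification.
    """
    p = sum(items_per_factor)          # number of observed items
    m = len(items_per_factor)          # number of latent factors (domains)
    moments = p * (p + 1) // 2         # unique variances and covariances
    loadings = p - m                   # free loadings
    factor_variances = m
    factor_covariances = m * (m - 1) // 2   # every pair of domains correlated
    residual_variances = p
    free_parameters = (loadings + factor_variances
                       + factor_covariances + residual_variances)
    return moments - free_parameters


# Items per domain as reported for SAQ-K2
saq_k2 = {"TC": 5, "SC": 5, "JS": 5, "PM": 4, "WC": 4}
df = cfa_degrees_of_freedom(list(saq_k2.values()))
print(df)  # 220
```

Under these assumptions, 23 items give 276 unique moments and 56 free parameters, hence 220 degrees of freedom. Fixing factor variances to 1 instead of the first loadings yields the same count.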

Addendum: unidimensional IRT model and its information function curve

In addition to using a typical linear CFA for a model fit check, we added an IRT analysis to visually check how SAQ-K2 items functioned. Although the authors use multidimensional-IRT (MIRT) on a daily basis, we did not go to that level. Instead, we used a simple unidimensional IRT model for drawing information function curves. We will show some of the results in a later section.

All analyses were conducted using Stata/SE 15.1 (StataCorp, College Station, Texas).


Characteristics of respondents

A total of 297 HCWs responded. In Korean hospitals, the predominant job type is nurse, and most nurses are female. The same pattern applied to our sample, which included only two pharmacists; this potential under-representation of pharmacists is not expected to influence the validation process, especially when backed up by IRT (Table 1).

Characteristic (category labels only; counts not recoverable)

Work Experience
     6 months
     7-11 months
     1-2 years
     3-4 years
     5-10 years
     11-20 years
     > 20 years

Job Type
     Administrative staff

Table 1 Characteristics of respondents

Table 2 summarizes the results from the CFA, presented by domain. The teamwork climate (TC), safety climate (SC), and job satisfaction (JS) domains each consist of five items; PM and working conditions (WC) have four items each. Standardized factor loadings ranged from 0.62 (WC1) to 0.88 (JS3 and JS4), indicating that items represent the corresponding latent trait (i.e., domain) well.

Table 2 Factor loadings from the correlated factor model

Table 3 presents the variance/covariance matrix among domains. Although not shown here, we also tried a model including the SR domain, and SR clearly showed a negative relationship with the other domains. Such results support the decision to remove the SR domain from SAQ-K2, as was done for SAQ versions in other countries.7,22,30,31

Table 3 Variance/covariance structure

Now we move on to the model fit statistics (Table 4). Except for chi-square, most of them were satisfactory compared not only to other safety culture instruments but also to general psychometric measurement tools in various fields.28,32,33 We did not emphasize the modification indices, as this was beyond the scope of our study. In sum, the current safety culture measurement instrument is as valid as the previous version, albeit with a reduced number of items.

Fit statistics

Likelihood ratio
     model vs. saturated
     p > chi2
     baseline vs. saturated
     p > chi2

Population error
     Root mean squared error of approximation (RMSEA)
     90% CI, lower bound
     90% CI, upper bound
     Probability RMSEA <= 0.05

Information criteria
     Akaike's information criterion
     Bayesian information criterion

Baseline comparison
     Comparative fit index
     Tucker-Lewis index

Size of residuals
     Standardized root mean squared residual
     Coefficient of determination

Table 4 Model fit indices
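
For readers less familiar with how the indices in Table 4 relate to the two likelihood ratio chi-squares, the sketch below computes RMSEA, CFI, and TLI from their textbook formulas. All chi-square values and degrees of freedom in the example are hypothetical placeholders chosen for illustration; they are not the results of this study. Only the sample size (297) comes from the text. Note also that software packages differ on whether RMSEA uses N or N - 1 in the denominator; N - 1 is assumed here.

```python
import math


def rmsea(chi2_model, df_model, n):
    """Root mean squared error of approximation (N - 1 convention assumed)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index: model misfit relative to baseline misfit."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index, based on chi-square to df ratios."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


# HYPOTHETICAL inputs for illustration only (not the values in Table 4)
chi2_m, df_m = 440.0, 220      # model vs. saturated
chi2_b, df_b = 4500.0, 253     # baseline vs. saturated
n = 297                        # respondents reported in the text

print(round(rmsea(chi2_m, df_m, n), 3))
print(round(cfi(chi2_m, df_m, chi2_b, df_b), 3))
print(round(tli(chi2_m, df_m, chi2_b, df_b), 3))
```

Because these indices rescale the chi-square by degrees of freedom, sample size, or the baseline model, they can be satisfactory even when the chi-square test itself rejects the model.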


Readers in the realm of quality and safety or psychology might think of this article as just another instrument validation using CFA. To a certain degree, it is. However, behind the scenes, our real value boils down to the phrase ‘saving lives by saving time’.9 We know all too well that in a hospital, just one minute might be enough time to make a patient’s silent heart begin to pump blood again, or the other way around. Therefore, we regard the efficiency of an instrument as our guiding star. The word ‘efficiency’ implies that the reform will lessen the burden of completing the survey as much as possible while the constructs that the instrument was intended to measure can still be quantified with high precision. Thus, just minimising the number of items is neither sufficient nor ideal. SAQ-K2 is not designed like Fishbein’s direct one-question method for a construct.34 Yet increasing the number of items is not ideal either. Although more items lead to a higher alpha, we have to keep reminding ourselves that ‘time is life’ in a hospital. Keeping a balance between the two is a difficult tightrope to walk.
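
The trade-off between the number of items and alpha can be illustrated with the standardized form of Cronbach's alpha, which depends only on the number of items k and the mean inter-item correlation. The sketch below is illustrative: the mean correlation of 0.5 is a hypothetical value, not an estimate from SAQ-K2 data.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


# HYPOTHETICAL mean inter-item correlation, for illustration only
mean_r = 0.5
for k in (1, 2, 4, 5, 10):
    print(k, round(standardized_alpha(k, mean_r), 3))
```

With a fixed mean correlation, each added item raises alpha by less than the one before it, whereas the time needed to respond grows roughly in proportion to the number of items.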

The good news is that we already had SAQ’s original English version and SAQ-K1, so we did not have to consider what items to add. Rather, we only had to prioritize the existing items (although slight modification was also frequently required) and remove the items less critical for quantifying a construct, one by one, down to the number at which the amount of information from the instrument is not significantly harmed. Of course, there are new approaches to survey efficiency. Jeong et al., through their randomised controlled trial with SAQ-K1, suggested reducing response options from a 5-point Likert scale to a 3-point Likert scale or even using dichotomized answers.5,9 This approach worked quite well, especially when we focus solely on the central tendency of a group while ignoring variance, as we usually do. However, this new method is still premature, and SAQ-K2 is intended to be administered to all HCWs in Korea; therefore, we decided to stick to the conventional 5-point Likert scale, which left us with the single option of removing the less important items. At this point, we borrowed the visualising power of IRT’s graded response model.35 Figure 1 shows a couple of the graphs we used, with the TC domain as an example.

Figure 1 Item information curves (IIC) and test information function (TIF) of the TC domain.

As seasoned readers of psychometrics may already know, IRT does not require CFA’s linear assumption; therefore, the graphs in Figure 1 (called information curves) are a very powerful tool for item selection. In Figure 1, the abscissa stands for trait level (domain level), with standard deviation as the unit. In the left pane, each line denotes the amount of information (precision) that the item contributes to the total score. Note that the amount varies across the TC level, so we can see at which level of safety attitude the item is the most useful. TC3 appears to provide the least information at most latent trait levels, while the curves for the other items largely overlapped. Thus, from this graph, we were not wholly sure that removing a specific item would improve the instrument’s efficiency. In the right pane, the solid line is the sum of the individual information curves, and the broken line is the standard error. Here, up to around 2 on the x-axis, information is quite high, which means that the TC domain of SAQ-K2 is functioning well up to around the 90th percentile of the respondents. If we were focused on capturing a high safety culture level with a tiny error range, we should add another item that can capture high levels of TC attitudes. However, as of this writing, Korea’s safety culture falls around the middle of the scores, so we did not need an additional item for high safety attitudes.

The other four domains were similar, although for some domains, such as PM, one item provided little information. Still, we kept these items. To explain why, we have to introduce differential item functioning (DIF). What if doctors and nurses share the same safety attitudes but answer differently? Analysing the raw data, we might conclude that the two groups differ even though they are the same in terms of latent trait level. To test for DIF and adjust for it across different groups of people, we usually need at least four items (assuming a 5-point Likert scale). Therefore, this study set four items per domain as the lower limit of the number of items.36

The authors should confess that we prefer IRT or multidimensional IRT (MIRT) over classic CFA. However, almost all SAQs were validated using the traditional CFA, so we followed that approach while keeping MIRT as a sidearm.


We have described what we did to develop the second version of SAQ-K. Although the process mixed qualitative and quantitative approaches, one idea is still clear: too much burden on respondents eventually leads them to quit the survey in the middle or even not participate in it at all. Therefore, we must keep a balance between the precision of the survey instrument and the burden on HCWs of completing the survey. More often than not, we focus on the precision of data, or use precision as an excuse for not doing our best to optimize the survey (i.e., make it as short as possible). When T. S. Eliot said, ‘If I had more time, I would have written a shorter letter,’ it was a great joke that made the reader laugh. Yet we do not have the right to say the same about a survey instrument in hospitals because somebody may die because of an inefficient instrument. SAQ-K2, we believe, will resolve this issue, although we know there will be an SAQ-K3.


  1. Jeong H-J, Lee W-C, Liao H-H. Item response theory-based validation of Taiwanese patient safety culture measurement instrument. Biom Biostat Int J. 2018;7(4):272–277.
  2. Jeong H-J, Jung SM, An EA, et al. Development of the Safety Attitudes Questionnaire–Korean Version (SAQ-K) and its novel analysis methods for safety managers. Biom Biostat Int J. 2015;2(1):00020.
  3. Jeong H-J, Kim M, An EA, et al. A strategy to develop tailored patient safety culture improvement programs with latent class analysis method. Biom Biostat Int J. 2015;2(2):00027.
  4. Jeong H-J, Lee W-C. Item response theory-based evaluation of psychometric properties of the safety attitudes questionnaire—Korean version (SAQ-K). Biom Biostat Int J. 2016;3(5):00079.
  5. Jeong H-J, Lee W-C. The level of collapse we are allowed: Comparison of different response scales in Safety Attitudes Questionnaire. Biom Biostat Int J. 2016;4(3):00100.
  6. Jeong H-J, Lee W-C. The pure and the overarching: An application of bifactor model to Safety Attitudes Questionnaire. Biom Biostat Int J. 2016;4(6):110.
  7. Jeong H-J, Lee W-C, Liao H-H. Importance of covariance in confirmatory factor analysis: Safety Attitudes Questionnaire-Chinese Version as an example. Biom Biostat Int J. 2017;6(2):00165.
  8. Jeong H-J, Lee W-C, Liao H-H. The road not taken: New methods to describe safety culture survey score. Biom Biostat Int J. 2017;6(3):00166.
  9. Jeong H-J, Park MJ, Kim C-H, et al. Saving lives by saving time: Association between measurement scale and time to complete Safety Attitudes Questionnaire. Biom Biostat Int J. 2016;4(5):00105.
  10. Chung A, Jeong H-J, Lee W-C. What though p-values conspire to cheat you: save your greatest failure. Biom Biostat Int J. 2016;3(2):63–64.
  11. Jeong H-J. Should we estop ourselves as an act of research? Biom Biostat Int J. 2017;6(4):175.
  12. Jeong H-J. 221β Baker Street, episode 2: the blind regressor-Part 1. Biom Biostat Int J. 2018;7(2):1.
  13. Jeong H-J, An EA, Kim SY, et al. Combinational effects of clinical area and healthcare workers' job type on the safety culture in hospitals. Biom Biostat Int J. 2015;2(2):00024.
  14. Jeong H-J, Jo H-S, Lee H-J, et al. Factors affecting hand hygiene behavior among health care workers of intensive care units in teaching hospitals in Korea: importance of cultural and situational barriers. J of Quality Improvement in Health Care. 2015;21(1):36–50.
  15. Jeong H-J, Kim M. A practical guide to behavioral theory-driven statistical development of quality and safety improvement program in health care. Biom Biostat Int J. 2014;1(1):1–6.
  16. Jeong H-J, Kim M. Triangulating safety: Applying social media analysis methods to revolutionize patient safety. Biom Biostat Int J. 2015;2(1):00018.
  17. Jeong H-J, Lee W-C. Ignorance or negligence: uncomfortable truth regarding misuse of confirmatory factor analysis. Journal of Biometrics & Biostatistics. 2016;7(3):298.
  18. Jeong H-J, Lee W-C. Does differential item functioning occur across respondents’ characteristics in Safety Attitudes Questionnaire? Biom Biostat Int J. 2016;4(3):00097.
  19. Jeong H-J, Lee W-C. A strategy to overcome under-reporting issues of voluntary medication error reporting system, part II: Changes in number of reports by a counter-error measure—Computerized prescriber order entry. Biom Biostat Int J. 2017;5(5):00146.
  20. Jeong H-J, Lee W-C. A very short and gentle review of mean and standard deviation as summary statistics of a sample. Biom Biostat Int J. 2017;6(2):00160.
  21. Jeong H-J, Lee W-C. Bayes and I: A gentle introduction to the bayesian approach. Biom Biostat Int J. 2017;5(2):00130.
  22. Jeong H-J, Pham JC, Kim M, et al. Major cultural-compatibility complex: considerations on cross-cultural dissemination of patient safety programmes. BMJ Quality & Safety. 2012;21(7):612–615.
  23. Jeong H-J, Yoon H. Hospital user’s manual: 33 rules for patients' safety. Seoul: Vitabooks; 2013.
  24. Lee G-S, Park MJ, Na H-R, et al. Are healthcare workers trained to be impervious to stress? Biom Biostat Int J. 2015;2(2):00028.
  25. Lee G-S, Park M-J, Na H-R, et al. A strategy for administration and application of a patient safety culture survey. J of Quality Improvement in Health Care. 2015;21(1):80–95.
  26. Lee W-C, Liao H-H, Jeong H-J. Considerations on survey validation: Focusing on international survey adaptation. Biom Biostat Int J. 2017;6(2):162.
  27. Liao H-H, Lee W-C, Jeong H-J. Can percent agreement be the scoring scheme for Safety Attitudes Questionnaire? Biom Biostat Int J. 2017;6(2):00161.
  28. Liao H-H, Lee W-C, You Y-L, et al. A practical approach to develop a parsimonious survey questionnaire—Taiwanese Patient Safety Culture Survey as an example. Biom Biostat Int J. 2017;6(3):00169.
  29. Hofstede G, Hofstede GJ, Minkov M. Cultures and organizations: Software of the mind. 3rd ed. New York, NY: McGraw-Hill; 2010. 576 p.
  30. Lee W-C, Chen S, Cheng Y, et al. Validation study of the Chinese Safety Attitudes Questionnaire in Taiwan. Taiwan J Public Health. 2008;27:6–15.
  31. Lee W-C, Wung H-Y, Liao H-H, et al. Hospital safety culture in Taiwan: a nationwide survey using Chinese version Safety Attitudes Questionnaire. BMC health services research. 2010;10(1):234.
  32. Hu L-T, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 1999;6(1):1–55.
  33. Acock AC. Discovering structural equation modeling using Stata. College Station, TX: Stata Press Books; 2013. 306 p.
  34. Fishbein M, Ajzen I. Predicting and changing behavior: The reasoned action approach. New York, NY: Psychology Press; 2011. 538 p.
  35. Samejima F. Estimation of latent ability using a response pattern of graded scores. Psychometrika monograph supplement. 1969;34(4, Pt. 2):100.
  36. Kim SH, Cohen AS. Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement. 1998;22(4):345–355.

Appendix. SAQ-K2

Creative Commons Attribution License

©2019 Jeong, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.