Econometric Methods for Assessing and Predicting the Results of Treatment Process in a Large Medical Hospital
Tatiana Tikhomirova1*, Nikolay Tikhomirov1
1 Plekhanov Russian University of Economics, Stremyanny lane 36, Moscow, 117997, Russia.
ABSTRACT
The article considers approaches and statistical methods for estimating the probability distribution of treatment outcomes for patients in large multidisciplinary hospitals as a function of patient gender and age, the profile and severity of the disease, the type of hospitalization, the treatment technology and its cost, and some other factors. Examples of such distributions are given, built with multinomial models of unordered multiple choice and with random forests from large arrays of data on real hospital patient flows and treatment results over a long period.
The resulting distributions are compared by quality indicators that reflect the precision and recall of the outcomes predicted by each method. The reasons for the differences in these quality estimates are discussed. Ways of improving the proposed approaches and methods are considered, including more detailed diagnoses and a broader set of initial information, especially for rare outcomes.
Keywords: Treatment outcome, Patient characteristics, Probability distribution of outcomes, Multiple-choice models, Random forest models, Latent variable.
INTRODUCTION
Large hospitals in Russia are multidisciplinary institutions that provide high-quality care using modern high-tech methods of diagnosis, treatment, rehabilitation, and prevention for a wide range of diseases, mainly on a paid basis. Their services are paid for both by insurance companies, under compulsory health insurance (CHI) and voluntary health insurance (VHI) contracts, and by the patients themselves under private contracts. The amount of payment for treatment is usually tied to its form, especially in health insurance contracts, and is more strongly differentiated in private contracts. This amount affects the composition of the medical services provided and, evidently, the result of treatment.
These results also depend largely on the type and severity of the disease, the type of hospitalization (planned or emergency), the gender and age of the patient, and several other factors. The results are usually recorded as one of several categories describing the health status of patients who have completed the corresponding course of treatment.
The outcome "transfer to another hospital" can also be included among the categories that characterize treatment outcomes; it may mean that the disease is outside the hospital's profile, that the hospital lacks the technological equipment to treat it, and so on.
The treatment result for each patient is a random variable: each outcome occurs with a certain probability that depends on the factors expressing the patient parameters listed above. Reliable dependences of the outcome probabilities on these factors can be used to estimate the load on a hospital given the expected flows of incoming patients and to justify decisions that increase the efficiency of its work [1-5].
In practice, the probabilities of the possible treatment outcomes (the probability distribution over outcomes) can be estimated for each hospital using econometric discrete choice models and machine learning methods for big data, applied to statistics accumulated in the hospital on the characteristics of patients admitted over some past period and on the conditions and results of their treatment [6-9].
In this paper, we consider how this problem can be solved using econometric multinomial models of unordered multiple choice and the random forest method.
MATERIALS AND METHODS
Multinomial models of unordered multiple choice make it possible to estimate the probability that a patient is in each of the considered states after treatment, based on a specific set of factor values $x_{jl}$ for each patient, where $j$ is the patient index and $l$ is the factor index. These probabilities can be represented by the following expression [6]:

$$P_{ij} = \frac{\exp(\beta_i x_j)}{\sum_{k=1}^{m} \exp(\beta_k x_j)}, \tag{1}$$

where $P_{ij}$ is the probability that patient $j$, characterized by the set of factors $x_j = (x_{j1}, \dots, x_{jn})'$, is in the $i$-th state after treatment; $m$ is the number of states; and $\beta_i x_j$ is the systematic part of the latent variable $z_{ij}$ of the $i$-th outcome for the $j$-th patient, determined according to the following expression:

$$z_{ij} = \beta_i x_j + \varepsilon_{ij}, \tag{2}$$

where $\beta_i$ is the row vector of parameters for the factors specific to each patient, and $\varepsilon_{ij}$ is the model error, which is distributed according to Gumbel's law:

$$F(\varepsilon_{ij}) = \exp\left(-e^{-\varepsilon_{ij}}\right). \tag{3}$$

Note that a normalization restriction is often imposed on the latent variable of the first outcome:

$$\beta_1 = 0. \tag{4}$$

The probability of the first outcome, obtained from expression (1), then takes the form

$$P_{1j} = \frac{1}{1 + \sum_{k=2}^{m} \exp(\beta_k x_j)}, \tag{5}$$

and the probabilities of the other outcomes are determined by the following modification:

$$P_{ij} = \frac{\exp(\beta_i x_j)}{1 + \sum_{k=2}^{m} \exp(\beta_k x_j)}, \quad i = 2, \dots, m. \tag{6}$$
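As a minimal sketch of expressions (5) and (6), the Python fragment below computes the outcome probabilities for one patient with the first outcome as the base category, and checks them against a simulation in which Gumbel errors are added to the latent variables and the largest one is selected. The function names, the three-outcome setup, and all coefficient values are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import math
import random

def systematic_parts(x, betas):
    """Systematic parts of the latent variables; the first outcome is fixed at 0."""
    return [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]

def outcome_probabilities(x, betas):
    """Probabilities of all outcomes for one patient, base category first."""
    v = systematic_parts(x, betas)
    v_max = max(v)  # subtracting the maximum avoids overflow, ratios are unchanged
    weights = [math.exp(vi - v_max) for vi in v]
    total = sum(weights)
    return [w / total for w in weights]

def gumbel_draw(rng):
    """Inverse-CDF draw from F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, where the log is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def simulated_shares(x, betas, n_draws=100_000, seed=0):
    """Share of draws in which each outcome has the largest latent variable."""
    rng = random.Random(seed)
    v = systematic_parts(x, betas)
    counts = [0] * len(v)
    for _ in range(n_draws):
        z = [vi + gumbel_draw(rng) for vi in v]
        counts[z.index(max(z))] += 1
    return [c / n_draws for c in counts]

# Hypothetical patient: constant term, gender = 1, emergency admission = 1.
x_demo = [1.0, 1.0, 1.0]
# Made-up coefficient rows for outcomes 2 and 3 (outcome 1 is the base).
betas_demo = [[0.2, -0.5, 0.1], [-1.0, 0.3, 0.8]]

probs = outcome_probabilities(x_demo, betas_demo)
shares = simulated_shares(x_demo, betas_demo)
print([round(p, 3) for p in probs])
print([round(s, 3) for s in shares])
```

The two printed lists should agree up to simulation noise, which is the sense in which the Gumbel assumption (3) leads to the closed-form probabilities.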
Parameter values that best fit the distribution over the set of outcomes can be estimated by the maximum likelihood method from data on the treatment results of a set of past patients [10, 11].
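The maximum likelihood step can be sketched as follows. This is an illustration on synthetic data, not the authors' estimation procedure: the optimizer (plain gradient ascent), the step size, the sample size, and the "true" coefficients are all assumptions made for the example. Outcomes are indexed from 0 here, with outcome 0 as the base category.

```python
import math
import random

def probabilities(x, betas):
    """Multinomial logit probabilities with outcome 0 as the base category."""
    v = [0.0] + [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    v_max = max(v)
    w = [math.exp(vi - v_max) for vi in v]
    s = sum(w)
    return [wi / s for wi in w]

def log_likelihood(X, y, betas):
    """Sum over patients of the log-probability of the observed outcome."""
    return sum(math.log(probabilities(x, betas)[yi]) for x, yi in zip(X, y))

def fit(X, y, n_outcomes, steps=300, rate=1.0):
    """Gradient ascent on the average log-likelihood (a simple stand-in optimizer)."""
    n, p = len(X), len(X[0])
    betas = [[0.0] * p for _ in range(n_outcomes - 1)]
    for _ in range(steps):
        grad = [[0.0] * p for _ in range(n_outcomes - 1)]
        for x, yi in zip(X, y):
            probs = probabilities(x, betas)
            for i in range(1, n_outcomes):
                # Derivative with respect to beta_i: (1[y = i] - P_i) * x
                resid = (1.0 if yi == i else 0.0) - probs[i]
                for k in range(p):
                    grad[i - 1][k] += resid * x[k]
        for i in range(n_outcomes - 1):
            for k in range(p):
                betas[i][k] += rate * grad[i][k] / n
    return betas

# Synthetic data only: a constant term plus one binary factor, three outcomes.
rng = random.Random(1)
true_betas = [[0.5, -1.5], [-0.5, 1.5]]  # made-up values for the illustration
X = [[1.0, float(rng.random() < 0.5)] for _ in range(600)]
y = [rng.choices(range(3), weights=probabilities(x, true_betas))[0] for x in X]

start = [[0.0, 0.0], [0.0, 0.0]]
fitted = fit(X, y, n_outcomes=3)
print(log_likelihood(X, y, start), log_likelihood(X, y, fitted))
```

In practice a library routine with a second-order optimizer would normally be used instead of hand-written gradient ascent; the sketch only shows what is being maximized.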
In the random forest model, the probability distribution of patient outcomes is formed by averaging the distributions produced by the individual trees, each of which yields its own distribution [12, 13]. Each tree is built on a sample of treatment outcomes and their corresponding factors drawn at random with replacement from the full set of observations, using decision rules based on multidimensional classification. In general, a decision tree is formed as a sequence of steps that leads to a prediction of the treatment outcome from the patient's characteristics and the indicators of the medical technologies applied.
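The averaging idea can be illustrated with a deliberately simplified stand-in for a random forest: each "tree" below is a single split on one randomly chosen binary factor, grown on a sample drawn with replacement, and the forest distribution is the average of the per-tree leaf distributions. Real random forests grow deeper trees and choose splits by an impurity criterion; the data, the labelling rule, and all names here are made up for the sketch.

```python
import random
from collections import Counter

OUTCOMES = [1, 2, 3, 4, 5]

def class_distribution(labels):
    """Relative frequency of each outcome; uniform if the leaf is empty."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[o] / n if n else 1.0 / len(OUTCOMES) for o in OUTCOMES]

def grow_stump(rows, labels, rng):
    """One-split tree grown on a bootstrap sample (drawn with replacement)."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
    factor = rng.randrange(len(rows[0]))        # randomly chosen binary factor
    leaves = {}
    for value in (0, 1):
        leaf_labels = [labels[i] for i in idx if rows[i][factor] == value]
        leaves[value] = class_distribution(leaf_labels)
    return factor, leaves

def forest_distribution(forest, row):
    """Average of the per-tree outcome distributions for one patient."""
    total = [0.0] * len(OUTCOMES)
    for factor, leaves in forest:
        for k, p in enumerate(leaves[row[factor]]):
            total[k] += p
    return [t / len(forest) for t in total]

# Made-up data: two binary factors (gender, emergency admission).
rng = random.Random(0)
rows = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(200)]
# Invented rule: emergency patients mostly end with outcome 3, others with 1.
labels = [3 if r[1] == 1 and rng.random() < 0.8 else 1 for r in rows]

forest = [grow_stump(rows, labels, rng) for _ in range(50)]
dist_emergency = forest_distribution(forest, [0, 1])
dist_planned = forest_distribution(forest, [0, 0])
print([round(p, 2) for p in dist_emergency])
print([round(p, 2) for p in dist_planned])
```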
The considered methods were used to estimate the probabilities of treatment outcomes for surgical patients in a large hospital, with five possible outcomes: (1) discharged with improvement; (2) discharged recovered; (3) discharged without changes in health; (4) transferred to another hospital; (5) fatal outcome. The following factors were taken into account: the total cost of treatment (rubles), which in practice varied from 3 thousand to 2,700 thousand rubles (the average cost of treatment was 81 thousand rubles); patient gender (0 = female, 1 = male); admission channel (0 = scheduled, 1 = emergency); use of high-tech care (0 = not used, 1 = used); payment under a CHI policy (0 = no, 1 = yes); direct payment to the medical institution or VHI (0 = no, 1 = yes). A patient is classified as surgical if he or she was treated in one of the surgical departments: neurosurgery, diagnostic and operative endoscopy, cardiovascular surgery, thoracic and vascular surgery, traumatology and orthopedics, urology, purulent and general surgery, maxillofacial surgery, and ophthalmology.
Both methods worked with balanced samples of the initial data, in which the numbers of patients with different outcomes differed less sharply than in the original array [14-16]. Patients' outcomes were recorded for the main surgical departments of the hospital as a whole, without differentiation by disease profile.
Each method formed the size and structure of its balanced sample according to its own criteria. The initial set of hospital patients included 24 thousand people; the balanced samples contained more than 4 thousand people for the multinomial model and more than 9.3 thousand for the random forest model.
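The paper does not state which balancing rule each method applied. The row totals of Table 1 (1,289, 1,289, 1,289, 101, and 49) are consistent with capping the larger classes at a common size, so the sketch below assumes random undersampling with a cap; the function name and the original class sizes are hypothetical.

```python
import random
from collections import defaultdict

def undersample(labels, cap, seed=0):
    """Indices of a sample in which no outcome class exceeds `cap` records."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        if len(indices) > cap:
            indices = rng.sample(indices, cap)  # draw without replacement
        kept.extend(indices)
    return sorted(kept)

# Made-up class sizes; only the cap of 1,289 mirrors the row totals of Table 1.
labels = [1] * 5000 + [2] * 3000 + [3] * 1289 + [4] * 101 + [5] * 49
sample = undersample(labels, cap=1289)
sizes = {c: sum(1 for i in sample if labels[i] == c) for c in range(1, 6)}
print(sizes, len(sample))
```

Rare classes are kept in full under this rule, which is why the fatal outcome remains a small share of the balanced sample.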
For the multinomial model, the following variants of expression (2) were obtained for the latent variables that determine the probabilities of a patient being in each of the considered states after treatment:

[Equations (7)-(10): estimated latent-variable expressions; the coefficient values are missing from the source text.]
RESULTS AND DISCUSSION
The results of predicting treatment outcomes using a multinomial econometric model are presented in Table 1.
Table 1. Results of predicting treatment outcomes of patients in the surgical department of the hospital by the multinomial model of unordered multiple choice

| Observed outcome | Predicted 1 | Predicted 2 | Predicted 3 | Predicted 4 | Predicted 5 | Subtotal |
|---|---|---|---|---|---|---|
| 1 | 633 | 301 | 344 | 7 | 4 | 1,289 |
| 2 | 280 | 732 | 267 | 9 | 1 | 1,289 |
| 3 | 74 | 171 | 1,042 | 1 | 1 | 1,289 |
| 4 | 4 | 2 | 0 | 91 | 4 | 101 |
| 5 | 16 | 9 | 3 | 5 | 16 | 49 |
| Subtotal | 1,007 | 1,215 | 1,656 | 113 | 26 | 4,017 |
The quality of the results presented in Table 1 can be assessed by their precision and recall [17]. The precision of the prediction of the $i$-th treatment outcome is the ratio of the number of patients correctly assigned to this category to the total number of patients that the model assigns to it:

$$\mathrm{Precision}_i = \frac{n_{ii}}{\sum_{j=1}^{m} n_{ji}}, \tag{11}$$

where $n_{ii}$ is the number of patients correctly assigned to the $i$-th category of treatment outcomes (the element on the main diagonal of Table 1), $n_{ji}$ is the number of patients assigned to the $i$-th category by the model but actually belonging to the $j$-th category, and $\sum_{j} n_{ji}$ is the sum of the elements of the $i$-th column of Table 1.
Recall is the ratio of the number of patients correctly assigned by the model (classifier) to the $i$-th category to the total number of patients actually belonging to it [17, p. 70]:

$$\mathrm{Recall}_i = \frac{n_{ii}}{\sum_{j=1}^{m} n_{ij}}, \tag{12}$$

where $\sum_{j} n_{ij}$ is the sum of the elements of the $i$-th row of Table 1.
Precision thus penalizes a classifier that assigns all patients to the same alternative: it reflects the ability to distinguish a given class from the other classes, whereas recall reflects the ability to detect the class at all [18].
Based on these two characteristics, we can evaluate a universal measure of the quality of the model's prediction of the $i$-th treatment outcome, the F-measure, which is the harmonic mean of precision and recall:

$$F_i = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i}. \tag{13}$$
The values of these characteristics of the quality of treatment outcomes predicted by the multinomial multiple-choice model, estimated based on the data in Table 1, are presented in Table 2.
Table 2. Quality characteristics of the treatment outcomes predicted by the multinomial multiple-choice model for patients in the surgical department of the hospital, %

| Treatment result | Precision | Recall | F-measure |
|---|---|---|---|
| Discharged with improvement | 62.86 | 49.11 | 55.14 |
| Discharged recovered | 60.25 | 56.79 | 58.47 |
| Discharged without changes in health | 62.92 | 80.84 | 70.76 |
| Transferred to another hospital | 80.53 | 90.10 | 85.05 |
| Fatal outcome | 61.54 | 32.65 | 42.66 |
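The values in Table 2 can be recomputed from the counts in Table 1 with a few lines of Python. The sketch assumes that rows hold observed outcomes and columns hold predicted outcomes, which is the orientation under which the published precision and recall values are reproduced; the function name is illustrative.

```python
# Counts from Table 1: rows are observed outcomes, columns are predicted outcomes.
TABLE_1 = [
    [633, 301, 344, 7, 4],
    [280, 732, 267, 9, 1],
    [74, 171, 1042, 1, 1],
    [4, 2, 0, 91, 4],
    [16, 9, 3, 5, 16],
]

def quality(matrix):
    """Per-outcome (precision, recall, F-measure) in percent."""
    m = len(matrix)
    result = []
    for i in range(m):
        correct = matrix[i][i]
        predicted = sum(matrix[j][i] for j in range(m))  # column total
        observed = sum(matrix[i])                         # row total
        precision = correct / predicted
        recall = correct / observed
        f_measure = 2 * precision * recall / (precision + recall)
        result.append(tuple(round(100 * v, 2) for v in (precision, recall, f_measure)))
    return result

metrics = quality(TABLE_1)
for outcome, row in enumerate(metrics, start=1):
    print(outcome, row)
```

The recomputed values agree with Table 2 to within rounding of the last digit.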
As shown in Table 2, the multinomial model predicts best the outcomes "transferred to another hospital" (F-measure 85.05%) and "discharged without changes in health" (F-measure 70.76%), and worst the "fatal outcome" category (F-measure 42.66%). The low F-measure for the fatal outcome is due to its low recall (only 32.65%), which, in turn, may be a consequence of the small number of patients with this outcome in the balanced sample. The precision criterion (the first column of Table 2) varies less across the considered outcomes. Only one value stands out, 80.53% for "transferred to another hospital"; for the other outcomes precision is at roughly the same level, slightly above 60%. Overall, the share of treatment outcomes correctly predicted by the multinomial model is over 62% (2,514 cases out of 4,017). At the same time, recall differs significantly across outcomes, ranging from 32.65% (fatal outcome) to 90.10% ("transferred to another hospital").
By these criteria, the results of predicting the treatment outcomes of patients in the surgical department of the hospital with the random forest model were to some extent the opposite of those of the multinomial model (Tables 3 and 4).
Table 3. Results of predicting the outcomes of patients' treatment in the surgical department of the hospital with the random forest model

| Observed outcome | Predicted 1 | Predicted 2 | Predicted 3 | Predicted 4 | Predicted 5 | Subtotal |
|---|---|---|---|---|---|---|
| 1 | 5,604 | 1,646 | 801 | 52 | 20 | 8,123 |
| 2 | 66 | 419 | 54 | 0 | 0 | 539 |
| 3 | 61 | 56 | 523 | 0 | 0 | 640 |
| 4 | 4 | 1 | 0 | 34 | 0 | 39 |
| 5 | 13 | 0 | 1 | 0 | 7 | 21 |
| Subtotal | 5,748 | 2,122 | 1,379 | 86 | 27 | 9,362 |
Table 4. Quality characteristics of the treatment outcomes predicted by the random forest model for patients in the surgical department of the hospital, %

| Treatment result | Precision | Recall | F-measure |
|---|---|---|---|
| Discharged with improvement | 97 | 69 | 81 |
| Discharged recovered | 20 | 78 | 31 |
| Discharged without changes in health | 38 | 82 | 52 |
| Transferred to another hospital | 40 | 87 | 54 |
| Fatal outcome | 26 | 33 | 29 |
This model is characterized by significant variation in all criteria across the considered treatment outcomes. Precision ranges from 26% for the fatal outcome to 97% for "discharged with improvement". Recall is also lowest for the fatal outcome (33%), while for all other outcomes it lies within a fairly narrow range of 18 percentage points (from 69% for "discharged with improvement" to 87% for "transferred to another hospital"). The spread of the F-measure is also significant: from 29% and 31% for the fatal outcome and "discharged recovered", respectively, to 81% for "discharged with improvement".
Overall, the random forest model predicts treatment outcomes more accurately than the multinomial multiple-choice model. It correctly predicts 6,587 cases, or 70.36% of the total sample of 9,362 observations, which is about 8 percentage points more than the multinomial model.
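The overall shares of correct predictions quoted here follow directly from the main diagonals of Tables 1 and 3. A short check, with the dictionary layout chosen only for this illustration:

```python
# Main-diagonal counts and sample sizes taken from Tables 1 and 3.
DIAGONALS = {
    "multinomial": ([633, 732, 1042, 91, 16], 4017),
    "random forest": ([5604, 419, 523, 34, 7], 9362),
}

# Share of correctly predicted outcomes, in percent.
accuracy = {
    name: 100 * sum(diag) / total for name, (diag, total) in DIAGONALS.items()
}
gap = accuracy["random forest"] - accuracy["multinomial"]

for name, value in accuracy.items():
    print(f"{name}: {value:.2f}%")
print(f"difference: {gap:.2f} percentage points")
```

Because the two models were evaluated on samples of different size and composition, the difference is indicative rather than a like-for-like comparison.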
At first glance, the estimated multiple-choice models show a rather paradoxical result: as the cost of treatment and the level of its technical support increase, the probability of more favorable outcomes for the patient decreases and the probability of unfavorable ones increases. This is indicated by the negative coefficients of the corresponding factors in the latent variable that characterizes the patient's recovery, and by the positive coefficients of the same factors in the latent variables that characterize less favorable outcomes (no change, transfer to another hospital, and fatal outcome).
At the same time, the coefficients of the factors "gender" and "type of admission to the hospital" generally correspond to the observed structure of outcomes: less favorable outcomes are more common in men than in women, and in emergency patients than in planned ones.
These results can, in general, be explained by the fact that treatment outcomes depend strongly on the severity of the disease, and the cost of treatment, as a rule, rises with severity. In addition, more severe (advanced) diseases are observed more often in men than in women, and in emergency patients than in planned ones.
In other words, patients with serious illnesses spend more money on treatment to stay alive, rather than completely recover from the disease. At the same time, patients with mild forms of disease recover more often, even with less significant treatment costs.
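One way such a pattern can arise is sketched below with invented counts: within each severity group the higher-cost patients recover slightly more often, yet in the pooled data higher cost goes with fewer recoveries, because most high-cost patients are severe cases. The paper does not report severity-specific rates, so every number and name here is hypothetical.

```python
# (severity, cost level) -> (patients, recovered); all counts are made up.
COUNTS = {
    ("mild", "low"): (800, 640),
    ("mild", "high"): (200, 170),
    ("severe", "low"): (200, 60),
    ("severe", "high"): (800, 320),
}

def recovery_rate(cost, severity=None):
    """Recovery share for a cost level, optionally within one severity group."""
    cells = [
        value
        for (sev, c), value in COUNTS.items()
        if c == cost and (severity is None or sev == severity)
    ]
    patients = sum(n for n, _ in cells)
    recovered = sum(r for _, r in cells)
    return recovered / patients

for severity in (None, "mild", "severe"):
    label = severity or "all patients"
    print(label, recovery_rate("low", severity), recovery_rate("high", severity))
```

A model that omits severity sees only the pooled rates, which is consistent with the negative cost coefficients discussed above.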
The multinomial model of unordered multiple choice and the random forest model differ in the quality of their predicted treatment outcomes. In terms of precision, the results of the multinomial model are more even: only the outcome "transferred to another hospital" stands out, at above 80%, while precision is slightly above 60% for all other outcomes. The random forest model, in turn, showed very high precision for "discharged with improvement" (97%), but its precision for the other outcomes cannot be considered high: only 20% for "discharged recovered" and no more than 40% for the three remaining outcomes (discharged without changes in health, transferred to another hospital, fatal outcome).
Meanwhile, the random forest model was better at detecting the classes of treatment outcomes, as indicated by the high and more even values of its recall, with the exception, as in the multinomial model, of the fatal outcome.
Overall, the share of correctly predicted treatment outcomes was almost 8 percentage points higher for the random forest model than for the multinomial model (70.4% and 62.6% of the respective samples), which may, however, be a consequence of the larger sample used to build the random forest model (more than 2.3 times the size of the sample used for the multinomial model).
CONCLUSION
The results presented in this paper generally indicate that the considered types of models, as well as some of their analogs, can be used to assess the outcomes of patients’ treatment in large medical hospitals if several conditions are met regarding the correctness of this task. These conditions include, in particular:
In this regard, it should be noted that for more reasonable comparability of the results obtained using different models and methods, the structures of their factors and sample sizes should not differ significantly.
ACKNOWLEDGMENTS: None
CONFLICT OF INTEREST: None
FINANCIAL SUPPORT: The study was carried out with the financial support of the Russian Foundation for Basic Research, project No. 20-010-00307 "Methodology for assessing health losses of the population and substantiation of directions for increasing the efficiency of health systems in the regions of the Russian Federation".
ETHICS STATEMENT: Authors are aware of, and comply with, best practice in publication ethics specifically with regard to authorship, dual submission, manipulation of figures, competing interests and compliance with policies on research ethics. Authors adhere to publication requirements that submitted work is original and has not been published elsewhere in any language.
REFERENCES
1. Lutsenko EV. Development of medical information technologies in the Russian Federation. Vyatka Med Bull. 2017;2(54):73-6.
2. Migunova YuV. Problems and contradictions in the staffing of medical organizations. Soc: Sociol, Psychol, Pedagogy. 2017;10:47-51.
3. Starodubov VI, Ulumbekova GE. Healthcare in Russia: problems and solutions. ORGZDRAV: News. Opinions. Training. VsHOZ Bull. 2015;1(1):18-9.
4. Tikhomirova TM. Quantitative methods for assessing the state and health losses of the population in the regions of Russia. Federalizm. 2016;1(81):43-64.
5. Toor MN, Baig MT, Shaikh S, Shahid U, Huma A, Ibrahim S, et al. Pharmacovigilance as an Essential Component of Pharmacotherapy at Tertiary Hospitals in Rural Areas of Pakistan. Pharmacophore. 2020;11(4):71-5.
6. Timofeev VS, Sanina AA. Building binary choice models based on a universal family of distributions. Bulletin of the Astrakhan State Technical University. Series: Manag, Comput Eng Inform. 2015;3:104-11.
7. Tikhomirov NP, Tikhomirova TM, Lebedev SA. Methods for assessing the demand for medical services and their resource provision in medical hospitals. RISK: Resour, Inf, Procurement, Compet. 2018;3:100-4.
8. Tikhomirova TM, Gordeeva VI. Assessment of health care cost-effectiveness considering a reduction in population health loss. J Pharm Sci Res. 2017;9(11):2204-11.
9. Noor SS, Keerio NH, Valecha NK, Qureshi MA. Total Hip Arthroplasty Outcome (Hospital Results). J Biochem Technol. 2020;11(3):92-6.
10. Kaftannikov IL, Parasich AV. Features of the use of decision trees in classification problems. Bull Yuurgu. Series: Comput Technol, Control, Electron. 2015;15(3):26-32.
11. Tikhomirova TM, Sukiasyan AG. Discrete choice models: A tutorial. Moscow: Ru-Science. 2018.
12. Kartiev SB, Kureichik VM. A classification algorithm based on the principles of a random forest for solving the forecasting problem. Softw Prod Syst. 2016;2(114):11-5.
13. Chistyakov SP. Random forests: an overview. Proc Karelian Sci Cent Russ Acad Sci. 2013;1:117-36.
14. Kavrin DA, Subbotin SA. Methods for quantitatively solving the problem of class imbalance. Radio Electron, Comput Sci, Control. 2018;1(44):83-9.
15. Nikulin VN, Kanishchev IS, Bagaev IV. Data balancing and normalization techniques to improve classification quality. Comput Tools Educ. 2016;3:16-24.
16. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9):1263-84.
17. Shunina YuS, Alekseeva VA, Klyachkin VN. Performance criteria for classifiers. Bull Ulyanovsk State Tech Univ. 2015;2(70):67-70.
18. Shung KP. Accuracy, precision, recall or F1. Towards Data Science. 2018. Retrieved from: https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9.