Risk Automatic Prediction for Social Economy Companies using Camels
The rest of this paper is organized as follows. Section 2 reviews related work on bankruptcy prediction and the CAMELS
system. Section 3 describes the data. Section 4 describes the proposed model. Section 5 presents the experimental
evaluation of the proposed model. Section 6 presents and discusses the results achieved with the proposed model.
Finally, Section 7 presents the conclusions.
2 Background and Related Work
Social economy enterprises are important for healthy societies: they provide jobs and sustain part of the
economy. Assessing their performance and risk of bankruptcy is therefore a crucial task for governments, and several
machine learning methods have been applied to predict bankruptcy [13, 15, 18]. These algorithms are based on financial
figures such as balance sheets and income statements, company-specific variables, and stock market data. In addition,
a more general evaluation of a company also requires evaluating its qualitative attributes [17].
Several types of methods for predicting bankruptcy have been proposed in the literature. In [6], the authors used
decision trees to predict whether a given firm will go bankrupt in the following year. They showed that decision trees
perform better than logit regression. In [12], the authors built a model to predict the distress of a given company. They
compared logit, probit, multivariate discriminant analysis, and artificial neural networks, and favored logit and probit
because their results are explainable and easy to understand. In contrast, artificial neural networks performed well but
were hard to explain and interpret. In [4], the authors evaluated 3,000 companies in Romania in eight
categories, from AAA to D. They tried to predict their probability of downward transition from one year to the next.
They used logit regression and artificial neural networks and showed that artificial neural networks perform better than
logit regression. In [19], the authors supplemented traditional data, such as financial figures, with qualitative data, such
as companies sharing directors or senior managers. Using these relationships, they built a neighborhood prediction
model. They showed that combining qualitative data with financial data improves the performance of the algorithm.
In [17], the authors evaluated four machine learning techniques: decision trees, neural networks, random forest, and
logistic regression. They showed that the random forest outperforms the other methods for bankruptcy prediction.
Financial datasets pose various problems. For example, the ratio of bankrupt to non-bankrupt companies is imbalanced,
i.e., there are more non-bankrupt companies. In this case, an algorithm cannot be evaluated with the accuracy metric
alone, and more robust metrics such as F1-score, precision, or recall should be used. In [21], the authors compare
three algorithms that deal with imbalance: probabilistic least-squares classification (LSDA) for outlier
detection, an isolation forest, and a one-class support vector machine. They show that LSDA outperforms the other two
methods. Company size is also an important factor to take into account: evaluating small, medium, and large companies
with the same model is a further problem. In [10], the authors showed that firm size is a good predictor variable for
the success or failure of a firm [6].
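The imbalance problem described above can be made concrete with a small sketch. The example below is not taken from any of the cited works: the 95/5 class split, the function name `classification_metrics`, and the trivial "always non-bankrupt" predictor are assumptions chosen purely for illustration. It shows how a classifier that never flags a bankruptcy can still reach high accuracy while its precision, recall, and F1-score are zero.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 95 non-bankrupt (0) and 5 bankrupt (1) firms.
y_true = [0] * 95 + [1] * 5

# A trivial predictor that labels every firm as non-bankrupt.
y_majority = [0] * 100

scores = classification_metrics(y_true, y_majority)
print(scores)
```

In this hypothetical sample the trivial predictor is right for every non-bankrupt firm and wrong for every bankrupt one, which is why accuracy alone hides the failure that recall and F1-score expose.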
The CAMELS model was developed in the United States in 1991 to assess the overall performance of banks through
capital adequacy, assets, management capacity, earnings, liquidity and sensitivity to market risk. The model rates the
overall health and performance of a bank on a scale from one to five, where one implies safe performance and five
implies unsatisfactory performance. Although it was designed for the banking system, it can be used to evaluate other
types of companies such as SEEs [1]. In [7], the authors developed an advanced CAMELS model with a Supervisory Risk Assessment and Early
Warning Systems to evaluate banks over a two-year period. In [20], the authors developed a new method to calculate the
probability of migrating from a low to a high level of risk based on CAMELS. In [3], the authors evaluated whether
changes in bank ratings in the following years can be predicted, and showed that some variables, such as
capital adequacy and leverage, could be predicted. In [5], the authors used the CAMELS system to evaluate the performance of Turkish
banks in the period from 2001 to 2008, showing that a strong liquidity ratio signifies overall good health. In [14], the
authors analyzed the development of the public and private banking sectors in India. They showed that CAMELS is a good
rating system for assessing bank performance. In [11], the authors evaluated several Indian banks using CAMELS and
showed that 95% of the change is given by debt-to-equity ratio, loan-to-deposit ratio, income per employee, capital
adequacy ratio, and total investment-to-total assets ratio.
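As a minimal sketch of the rating structure described above, the six CAMELS components can be represented as a record whose fields each take a value from one (safe) to five (unsatisfactory). The class name `CamelsRating`, the field names, and in particular the unweighted mean used as a summary are assumptions made here for illustration only; they are not the aggregation used by supervisors or by any of the cited works.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CamelsRating:
    """Hypothetical container for the six CAMELS component ratings (1 to 5)."""
    capital_adequacy: int
    assets: int
    management: int
    earnings: int
    liquidity: int
    sensitivity: int

    def __post_init__(self):
        # Each component must lie on the one-to-five scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def mean_score(self) -> float:
        """Unweighted mean of the components (an illustrative assumption)."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)


# Hypothetical component ratings for a single company.
example = CamelsRating(capital_adequacy=1, assets=2, management=2,
                       earnings=3, liquidity=1, sensitivity=3)
print(example.mean_score())
```

Lower values indicate safer performance, so a summary near one would correspond to a healthy company under this simplified reading of the scale.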
3 Data Description
The Colombian government collects data from each SEE at regular intervals. Some companies have to send information
monthly, others every six months. The data collected include total assets, total savers, total employees, total
associates, financial portfolio per debtor, and total income in the period, among others. Each company receives a risk
label ranging from one to five, where one indicates low risk and five indicates high risk. The process of labeling each