Abstract and keywords
Abstract (English):
The article develops mathematical support and algorithms for the economic diagnostics of enterprises. IT companies and start-ups (IT projects), which have special characteristics during the growth period, were selected as the object of research. Based on a system analysis of the data domain, a system of quantitative and qualitative characteristics has been developed to identify the economic state of IT companies and start-ups in the external and internal environment. Scales for indices of different nature have been determined. Methods for introducing order and equivalence relations on the set of peer companies are given so that their proximity to the analyzed company can be compared. Metrics for comparing companies are considered, taking both quantitative and qualitative characteristics into account. The possibilities of distributing innovative IT projects using fuzzy clustering algorithms are considered. A comparative analysis of two basic algorithms, the fuzzy c-means (FCM) algorithm and the Gustafson - Kessel algorithm, is given. The clustering procedure for each algorithm is shown, together with graphical results of their operation. Clustering quality was assessed using the partition coefficient, the classification entropy, and the Xie-Beni index. It is concluded that the Gustafson - Kessel algorithm provides better results for splitting IT projects into groups for their economic diagnostics.

Keywords: IT start-up, case-based reasoning, precedents, peer company, comparative method, fuzzy clustering, Gustafson - Kessel algorithm, FCM


Estimation is one of the poorly formalized tasks of managing economic systems under uncertainty. Under current economic conditions, the results of estimating various property objects underlie most decisions made in the private and public sectors. The analog method is one of the most effective estimation methods. It is based on comparing the company with the most suitable analogs, choosing the relevant prototype, and transferring its economic properties and trends to the object of research. Given the global trend towards the digitalization of economic sectors, informatization occupies a special place. There are now about 5000 small IT companies in Russia. Given the interest of venture funds and large IT companies in buying start-ups, the estimation task is very important for the development of both business and information technologies in Russia.

We can observe the rapid growth of start-ups that offer modern applied IT solutions accelerating economic, technological, service and other processes for both business and people. The high concentration of start-ups in the IT industry has led to the development of a system of venture capital funds, most of whose investments go to IT projects.

The reason is that R&D in IT projects now relies on rapid-results technology, which significantly shortens the time needed to bring the final product to the commercialization stage. All this makes the IT sector the most attractive for both developers and financial investors.

However, the development and implementation of a new IT project can be influenced by various external and internal factors that make the final result and the success of its commercial implementation uncertain. For a venture investor, an important aspect is an acceptable investment risk profile.

In this context, venture funds face the task of careful economic diagnosis of projects, aimed at determining the level of investment prospects and the investment risk of IT projects in order to make investment decisions.

At the same time, the use of the analog method for estimating the value of IT start-ups with information technology is constrained by the lack of information models and mechanisms that support this process.

As for estimating the investment attractiveness of IT projects, uncertainty and risk mean that applying traditional methods of investment analysis to projects of this kind can lead to unreliable results, since traditional methods do not take into account the innovative component of projects.

In this regard, developing mathematical methods and algorithms that provide a high-quality estimation of IT projects is an urgent scientific and practical task.

The purpose of the study is to develop and justify mathematical methods and algorithms that support decision making in estimating the value and investment attractiveness of IT companies (projects) using data mining tools.

This purpose divides the research into two main stages:

1. Development of a mathematical apparatus for IT start-up estimation, using the case-based reasoning method and comparison of analogues.

2. Development of a procedure for estimating the investment attractiveness of IT projects using cluster analysis tools.

The study focuses on an IT company or a start-up that, during the growth period, has values of economic characteristics that are not typical of ordinary enterprises. In the course of the study we treat an IT start-up and an IT project as identical concepts.


The estimation of IT start-up value

Identifying start-up characteristics. The set of criteria required for the economic diagnostics of an IT company was determined using a start-up as an example. The estimation method for a start-up depends on its stage: pre-seed, seed, or Series A.

At the pre-seed stage, the estimation takes place at a fixed rate of a business angel or an accelerator, whose main task is to speed up the delivery of early-stage projects to the first investor and to help refine them. It is rather difficult to structure the indicators at this stage, since the start-up does not yet have formal indicators that allow a financial model to be built; it only meets the following requirements: an achievable market volume of at least 300 million rubles within 3-5 years; a project team of at least two people; and a working MVP (minimum viable product).

At the seed stage, the objective is to scale the business (increase the number of customers, customer segments, geography, etc.). The estimation can be approached from two sides, determining how much investment is needed based on the team's monthly costs and on the investor's expectations over a specific time period. Indicators accepted in international practice for the analysis of investment projects can be used, for example NPV (Net Present Value).
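As a reminder of how the NPV indicator mentioned above is computed, the following minimal sketch discounts each period's cash flow back to the present. The cash flows and the 20 % discount rate are invented for illustration and are not taken from the study.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the (usually negative) outlay at t = 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical seed-stage project: 1,000,000 Rub. invested, three yearly inflows.
flows = [-1_000_000, 400_000, 500_000, 600_000]
value = npv(0.20, flows)
print(f"NPV at a 20% discount rate: {value:.2f} Rub.")
```

A positive value means the discounted inflows exceed the initial investment at the chosen rate; the result is sensitive to the discount rate, which is itself one of the quantitative characteristics listed for start-ups.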

Series A is the stage of active growth and expansion of the company. At this stage, the following indicators are highlighted: cash flow, multiplier, discount rate, and scaling limiters.

When comparing how estimates are formed at the three stages, it must be taken into account that, according to accelerators, systematizing the estimation for the pre-seed stage is impossible, since the subjective assessment formed after personal communication with the founders is more significant there. Therefore, we will consider the seed and Series A stages.

The papers of B. Payne [1, 2] and S. Nasser [3] are the most popular and most widely discussed online works in this area. They are devoted to the valuation of companies, including start-ups at various stages of investment.

To analyze the selected stages, we use five commonly used start-up estimation methods and summarize the indicators on which they are based. The methods were selected after studies undertaken in the largest business incubators in Russia, which note the feasibility and adaptability of the selected methods to Russian conditions. Most of the methods are based on data from comparable companies or on basic estimates: the Berkus method, the risk factor summation method, the venture capital method, the discounted cash flow method, and the comparison method.

The characteristics underlying the above methods are grouped into qualitative and quantitative ones for subsequent structuring and scaling. In total, 15 quantitative and 14 qualitative indicators were selected, including 9 types of risk (Table 1).

Table 1

Characteristics of start-up

| Quantitative characteristics | Qualitative characteristics |
| --- | --- |
| Customer Acquisition Cost (CAC), Rub. | Team evaluation |
| Cash-Flow, Rub. | Scaling drivers |
| | Scaling limiters |
| Market capitalisation, Rub. | Strategic relationship |
| Backlog, Rub. | Product introduction or sales start |
| Operating profit, Rub. | Quality of the prototype |
| Sensible idea (cost base), Rub. | Managerial risks |
| ROI (Return On Investment), % | Risks at different stages of business development |
| Discount rate, % | Political risks |
| Expected growth rate, % | Marketing risks |
| Regular monthly income, Rub. | Risks related to financing / raising of capital |
| Number of persons employed, persons | Litigation risks |
| EBITDA (Earnings before interest, taxes, depreciation and amortization), Rub. | International risks |
| Gross profit, Rub. | Reputational risks |
| | Risks associated with a potentially profitable exit from a startup |


Under the given set of characteristics, the estimation system of an IT company determines a set of points in the criteria space, each with a formal criterion representation. For one company to serve as a good analog in the evaluation of another, it is desirable that they be similar in many characteristics; at the same time, priorities can be set by increasing the weight of a particular characteristic.


Identifying a peer company selecting method

For the selection of peer companies, we apply one of the decision-making methods, the method of case-based reasoning, which uses knowledge of known situations or cases (precedents); in our case these are peer companies. We define the set IT of IT companies considered in the selection of analogues. The information about the set IT is represented in the form $IT = \{it_1, it_2, \dots, it_n\}$. To describe the properties of each IT company $it_i \in IT$ we associate with it a set of characteristics $X = \{x_1, x_2, \dots, x_m\}$. Then each IT company can be represented in the form $it_i = \{x_j \in X : \chi_i(x_j) = 1\}$, where $\chi_i$ is a characteristic function that defines the subset of characteristics of the i-th IT company.

Once the peer companies $it_i$ are extracted, their similarity to the precedent $it^*$ must be assessed, describing the degree of proximity by the formula

$$S(it_i, it^*) = \sum_{j=1}^{m} w_j \, \rho(x_{ij}, x_j^*),$$

where $\rho(x_{ij}, x_j^*)$ is a metric calculated over the m characteristics of the analog and the precedent, $x_{ij}$ and $x_j^*$; $w_j$ is the degree of importance of the j-th characteristic.

The choice of the metric is the most difficult problem. The inhomogeneity of the characteristics does not allow us to introduce an algebra of operations on the given set. The best known is the nearest neighbor method [4], which can measure the degree of proximity for any characteristic:

$$\rho(x_{ij}, x_j^*) = [x_{ij} \neq x_j^*],$$

where $[\cdot]$ is an error indicator that maps a logical value to a number by the rule [false] = 0, [true] = 1, and $x_{ij}$, $x_j^*$ are the values of the j-th characteristic of the analogue and the precedent.

For quantitative characteristics it is also possible to use Euclidean distance or the Manhattan metric, provided that all characteristics are reduced to a single measurement scale or normalized.
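The normalization requirement can be illustrated with a short sketch. It assumes min-max scaling to [0, 1], which is only one of several possible ways to bring characteristics to a single scale; the company figures are invented for illustration.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of a list of equal-length rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # A constant column has zero span; use 1.0 to avoid division by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(x - lo) / s for x, lo, s in zip(row, lows, spans)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical companies: (EBITDA in Rub., ROI in %, number of employees).
companies = [
    [2_000_000, 10.0, 5],
    [4_000_000, 30.0, 15],
    [3_000_000, 20.0, 10],
]
scaled = min_max_normalize(companies)
print(euclidean(scaled[0], scaled[1]), manhattan(scaled[0], scaled[1]))
```

Without the rescaling step the EBITDA column, measured in millions of rubles, would dominate both distances and the other characteristics would have almost no influence.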

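If an exact match of characteristics is not required (or is not attainable), the Zhuravlev metric can be used:

$$\rho(x_{ij}, x_j^*) = [\,|x_{ij} - x_j^*| > \varepsilon\,],$$

where $x_{ij}$ and $x_j^*$ are the values of the j-th characteristic of the analogue and the precedent, and ɛ is a given level of deviation of the j-th characteristic of the analogue and the precedent from each other.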
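The two indicator-type metrics can be combined with importance weights in a few lines. The sketch below is written under stated assumptions: both metrics are expressed in mismatch form (1 means the characteristic differs, so a smaller weighted sum means a closer analogue), and the characteristics, weights and thresholds are hypothetical. Counting agreements instead of mismatches would simply give the complementary score.

```python
def indicator_distance(a, b):
    """Nearest-neighbour style mismatch: [a != b] mapped to 0 or 1."""
    return int(a != b)


def threshold_distance(a, b, eps):
    """Zhuravlev-style mismatch: 1 when the values differ by more than eps."""
    return int(abs(a - b) > eps)


def weighted_proximity(analog, precedent, weights, metrics):
    """Weighted sum of per-characteristic distances (smaller means closer)."""
    return sum(
        w * rho(x, y)
        for w, rho, x, y in zip(weights, metrics, analog, precedent)
    )


# Hypothetical characteristics: ROI (%), discount rate (%), team evaluation (label).
precedent = (25.0, 18.0, "strong")
analog = (22.0, 25.0, "strong")
weights = (0.5, 0.3, 0.2)
metrics = (
    lambda a, b: threshold_distance(a, b, eps=5.0),
    lambda a, b: threshold_distance(a, b, eps=5.0),
    indicator_distance,
)
score = weighted_proximity(analog, precedent, weights, metrics)
print(score)
```

Here only the discount rate deviates by more than the threshold, so the score equals the weight of that single characteristic.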

The number of characteristics affects the output error, since the curse of dimensionality may arise: by the law of large numbers, sums of a large number of deviations are very likely to have very close values. This leads to the need to form a set of informative characteristics, which in turn requires retrospective observations to form a data sample and to reveal dependence or multicollinearity among them.

For qualitative characteristics, the Hamming similarity measure can be used, which counts the number of matching characteristics of a precedent and an analogue. If a metric cannot be introduced, various proximity measures are used.
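A minimal sketch of the Hamming similarity for qualitative characteristics follows. The analogue names and the "high" / "medium" / "low" labels are hypothetical.

```python
def hamming_similarity(a, b):
    """Number of positions at which two equal-length label vectors agree."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x == y for x, y in zip(a, b))


# Hypothetical qualitative profiles (for example, four risk levels).
precedent = ("high", "low", "medium", "low")
analogs = {
    "it_1": ("high", "low", "low", "low"),
    "it_2": ("low", "low", "medium", "high"),
    "it_3": ("high", "low", "medium", "low"),
}
best = max(analogs, key=lambda name: hamming_similarity(analogs[name], precedent))
print(best)
```

The analogue with the largest number of matching characteristics is taken as the closest one.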

After the database of precedents is formed, manually or automatically, order and equivalence relations can be defined for the objects in it [5]. Using a geometric approach to this problem, the importance of which was stressed by D. A. Pospelov [6], analogs and precedents can be represented as independent information objects and subsequently compared both by individual characteristics and as a whole.

When analogues are analyzed using the equivalence relation, the original set is divided into equivalence classes $[it_i]$; the class of an element $it_i \in IT$ is the subset of elements equivalent to $it_i$.

The classes of analogs can represent both nominal and ordinal scales. In the first case, they can be constructed in two ways: by clustering and using expert estimates. In the second case it is possible to use the partitioning of the original set into Pareto classes with subsequent ordering of these classes.

When analogues are analyzed using the order relation, precedents are ranked in the absence of an exact analog. Let us highlight the following decision-making tasks that use the ranking of analogues by proximity to the precedent:

  •  the task of ranking analogues based on knowledge of their states at a given time;
  •  the task of ranking analogues based on knowledge of their states at different times (for example, corresponding to the stages);
  •  the task of ranking analogues according to a given characteristic;
  •  the task of ranking analogues on aggregate characteristics.

In the latter case, the characteristics are treated as equally important when the decision maker cannot reliably establish priorities between them. With equally important characteristics, a set of incomparable non-dominated alternatives is formed: the Pareto set $IT^P$. Thus, the solution consists of not one but many peers, which makes the final decision difficult. In this case, mathematical methods that narrow the Pareto set are applied, for example the method of median distributions [7, 8]. The advantage of the method is that it combines qualitative and quantitative assessments.
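To make the notion of the Pareto set concrete, the sketch below selects the non-dominated analogues from a table of per-characteristic distances to the precedent. It assumes that a smaller distance is better on every characteristic; the analogue names and numbers are invented.

```python
def dominates(a, b):
    """a dominates b if it is no farther on every characteristic and closer on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(distances):
    """Return the names of analogues that are not dominated by any other analogue."""
    return [
        name
        for name, d in distances.items()
        if not any(
            dominates(other, d)
            for other_name, other in distances.items()
            if other_name != name
        )
    ]


# Hypothetical per-characteristic distances of four analogues to the precedent.
distances = {
    "it_1": (0.1, 0.4, 0.3),
    "it_2": (0.2, 0.5, 0.3),  # dominated by it_1
    "it_3": (0.3, 0.1, 0.2),
    "it_4": (0.1, 0.4, 0.6),  # dominated by it_1
}
front = pareto_set(distances)
print(front)
```

Two analogues remain and neither is better than the other on all characteristics, which is exactly the situation in which a further narrowing of the Pareto set is needed.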

It is also possible to construct various choice functions, $C^K(IT)$ and $C^D(IT)$, for the case when there is no information about the relative importance of characteristics and the characteristics are of both quantitative and qualitative type. They narrow the Pareto set and take into account only the mutual relations between the estimates of the analogs, ignoring the absolute values of the differences in the estimates by characteristic.

For two analogues $it_i, it_l \in IT$ we define the number of characteristics $q(it_i, it_l)$ on which $it_l$ is closer to $it^*$ than $it_i$; in other words, $q(it_i, it_l)$ is the number of characteristics on which $it_l$ exceeds the variant $it_i$. On the set IT we define a numerical function $K(it_i)$ that takes, for each analogue, the largest of these numbers.

A choice function $C^K$ was constructed that considers the number of dominant characteristics of the analogue that are close to the precedent: the maximum value in each row of the matrix $Q_{IT} = \lVert q(it_i, it_l) \rVert$ is taken, and then the analogues with the smallest of these maxima are selected:

$$C^K(IT) = \{\, it_i \in IT : K(it_i) = \min_{it_l \in IT} K(it_l) \,\},$$

where $K(it_i) = \max_{it_l \in IT} q(it_i, it_l)$.

As a result, a subset of analogues is formed that have the greatest number of characteristics close to the precedent. The resulting subset has lower cardinality than the Pareto set, and $C^K(IT) \subseteq IT^P$.
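The max-then-min selection described above can be sketched as follows. This is an illustrative reading of the choice function, not the authors' implementation: q counts the characteristics on which one analogue is strictly closer to the precedent than another, and the distances are hypothetical.

```python
def q(d_i, d_l):
    """Number of characteristics on which analogue l is strictly closer to the precedent than analogue i."""
    return sum(y < x for x, y in zip(d_i, d_l))


def choice_ck(distances):
    """Keep the analogues whose worst-case q (row maximum of the q-matrix) is minimal."""
    names = list(distances)
    worst = {
        i: max(q(distances[i], distances[l]) for l in names if l != i)
        for i in names
    }
    smallest = min(worst.values())
    return [i for i in names if worst[i] == smallest]


# Hypothetical per-characteristic distances to the precedent (all Pareto-incomparable).
distances = {
    "it_1": (0.1, 0.2, 0.3),
    "it_2": (0.3, 0.1, 0.4),
    "it_3": (0.2, 0.3, 0.1),
}
selected = choice_ck(distances)
print(selected)
```

In this example every other analogue beats it_1 on at most one characteristic, whereas it_2 and it_3 are each beaten on two characteristics by some rival, so only it_1 is kept.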

Consider the second method of generating analogues close to the precedent, which uses the matrix $Q_{IT}$. A dominant index is defined for the set IT. The value of the choice function $C^D(IT)$ is the subset of all variants of IT with the minimum dominant index.

A circular n-tournament selection function $C^T$ was also constructed.

This function also narrows the Pareto set, forming a subset of analogues close to the precedent.

The next stage is estimating the investment attractiveness of the resulting set of IT start-ups.


Investment attractiveness estimation

Cluster approach to estimating the investment attractiveness of IT projects. Let us consider the task of estimating IT project investment attractiveness in detail.

Practice and a review of the literature [9, 10] show that the investment indicators most frequently used for the economic diagnostics of the investment attractiveness of different projects are net present value (NPV), profitability index (PI), internal rate of return (IRR), and payback period (PP). Using such indicators for the economic diagnostics of an IT start-up is difficult, since an investment decision must take into account not only the financial component of the project but also risks, financing, marketing and other factors.

This means that an IT project needs to be evaluated according to certain groups of criteria. The multicriteria evaluation of projects is carried out by experts, subject to the consistency of their assessments [11]. Expert opinions are given as linguistic descriptions such as “high”, “medium”, “low”, which are expressed quantitatively on a scale from 0 to 1. The aggregated expert opinions can be used as classification features for the set of IT projects. Thus, a sample of IT projects can be divided into groups of projects with similar characteristics that allow one to judge the investment prospects of an IT project. Such a procedure can be carried out using the methods of cluster analysis.

There is a set of IT projects $P = \{p_1, \dots, p_N\}$, estimated by the indicators $L = \{L_1, \dots, L_6\}$ (L1 – novelty and relevance of the project, L2 – the degree of risk, L3 – the characteristic of the scientific and technical product, L4 – market potential, L5 – the evaluation of project feasibility, L6 – economic efficiency). The estimation is carried out by an expert group at discrete instants of time $t$. The mathematical statement of the task is as follows.

1. It is required to distribute the set of IT projects $P$, each of which is characterized by the six characteristics $L_1, \dots, L_6$, into three non-overlapping clusters (groups by investment prospects (IP)) K = {K1, …, K3} (K1 – IT projects with a high level of IP; K2 – IT projects with a medium level of IP, recommended for revision; K3 – IT projects with a low level of IP, for which refusal of financing is recommended).

2. Select the most appropriate clustering algorithm by evaluating the quality of clustering.

It should be noted that the fuzzy, multivariate nature of expert judgments in the expert evaluation procedure generates uncertainty that affects the structure of the clusters. In addition, it is difficult to assign the j-th IT project to only one of the clusters {K1, …, K3}.

This problem can be solved using the fuzzy clustering method [12], which determines the membership degree of the project pj in each cluster and is based on Zadeh's theory of fuzzy sets [13].


Analysis of fuzzy clustering algorithms

After analyzing the fuzzy clustering algorithms in the studies [14, 15], we concluded that they can be conditionally divided into two main groups. The first group consists of algorithms that form clusters of spherical shape. The second group consists of algorithms that form clusters shaped as hyperellipsoids of different orientations.

As the basic algorithms of these groups, we choose the fuzzy c-means (FCM) algorithm and the Gustafson - Kessel algorithm, respectively. All other fuzzy clustering algorithms are derived from them [16].

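With fuzzy clustering, the three selected groups {K1, …, K3} become fuzzy clusters; for convenience we denote them by $\tilde{K}_1, \tilde{K}_2, \tilde{K}_3$. The fuzzy clusters are then described by a fuzzy partition matrix of the following form [17]:

$$U = [\mu_{ik}], \quad \mu_{ik} \in [0, 1], \quad i = 1, \dots, c, \quad k = 1, \dots, N,$$

where $\mu_{ik}$ is the membership degree of the k-th IT project, described by the set of characteristics $L_1, \dots, L_6$, in the cluster $\tilde{K}_i$; c is the number of clusters and N is the number of projects.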

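It follows that every IT project can be assigned to each of the three clusters with a different membership degree. In this case, the following conditions must be fulfilled:

$$\sum_{i=1}^{c} \mu_{ik} = 1, \quad k = 1, \dots, N; \qquad 0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad i = 1, \dots, c,$$

where $\mu_{ik} \in [0, 1]$ is the membership degree of the k-th project in the i-th cluster, c is the number of clusters and N is the number of projects.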
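The conditions on a fuzzy partition can be checked mechanically. The sketch below assumes the usual requirements (memberships in [0, 1], memberships of every project summing to one over the clusters, and no cluster that is empty or contains everything); the matrix layout, with rows as clusters and columns as projects, and the numbers are assumptions made for illustration.

```python
def is_fuzzy_partition(U, tol=1e-9):
    """U[i][k] is the membership degree of project k in cluster i."""
    c, n = len(U), len(U[0])
    in_range = all(0.0 <= u <= 1.0 for row in U for u in row)
    columns_sum_to_one = all(
        abs(sum(U[i][k] for i in range(c)) - 1.0) <= tol for k in range(n)
    )
    no_empty_or_full_cluster = all(0.0 < sum(row) < n for row in U)
    return in_range and columns_sum_to_one and no_empty_or_full_cluster


# Hypothetical partition of four projects into three fuzzy clusters.
U = [
    [0.7, 0.2, 0.1, 0.3],
    [0.2, 0.5, 0.3, 0.3],
    [0.1, 0.3, 0.6, 0.4],
]
print(is_fuzzy_partition(U))
```

A check of this kind is a convenient sanity test after every update of the partition matrix in an iterative clustering procedure.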


Now let us show the main distinguishing characteristics of the algorithms under consideration.

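In the FCM method, the functional to be minimized has the form [18]:

$$J(P; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^{m} D_{ik}^{2}, \tag{1}$$

where $V = (v_1, \dots, v_c)$ is the vector of cluster centers and $D = [D_{ik}]$ is the matrix of distances from the projects to the cluster centers.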

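The quantities in (1) can be determined from the expressions

$$v_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^{m} p_k}{\sum_{k=1}^{N} (\mu_{ik})^{m}}, \qquad D_{ik}^{2} = \lVert p_k - v_i \rVert^{2}, \qquad \mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( D_{ik} / D_{jk} \right)^{2/(m-1)}},$$

where m is the exponential weight, $p_k$ is the vector of characteristics of the k-th project, $v_i$ is the center of the i-th cluster, and $\mu_{ik}$ is the membership degree of the k-th project in the i-th cluster.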

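The stopping condition of this fuzzy clustering algorithm is $\lVert U^{(t)} - U^{(t-1)} \rVert < \varepsilon$, where $U^{(t)}$ is the fuzzy partition matrix at iteration t and ε is specified by the decision maker.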
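The FCM iteration can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the partition matrix is stored with clusters as rows, the stopping rule uses the largest absolute change in the memberships, the initial memberships are random, and the input data are synthetic stand-ins for aggregated expert scores.

```python
import numpy as np


def fcm(P, c=3, m=2.0, eps=1e-6, max_iter=1000, seed=0):
    """Fuzzy c-means. P has shape (n_projects, n_features).

    Returns the partition matrix U of shape (c, n_projects) and centers V of shape (c, n_features).
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)  # memberships of each project sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ P) / W.sum(axis=1, keepdims=True)  # weighted cluster centers
        # Squared Euclidean distances between every center and every project, shape (c, n).
        D2 = ((P[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)  # guard against division by zero
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return U, V


# Synthetic data: 30 hypothetical projects scored on six criteria in [0, 1].
rng = np.random.default_rng(1)
levels = np.array([[0.2] * 6, [0.5] * 6, [0.8] * 6])
P = np.vstack([level + 0.03 * rng.standard_normal((10, 6)) for level in levels])
U, V = fcm(P, c=3, m=2.0, eps=1e-6)
labels = U.argmax(axis=0)  # hardened assignment, for inspection only
print(labels)
```

The defaults m = 2, c = 3 and ε = 1e-6 mirror the settings reported in the text; the random seed and the iteration cap are additional assumptions. Because the Euclidean norm is used, the clusters found this way tend to be spherical.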

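The Gustafson - Kessel algorithm differs in that each cluster has its own norm-inducing matrix $A_i$. In accordance with [19] we have the expression

$$D_{ik}^{2} = (p_k - v_i)^{T} A_i (p_k - v_i),$$

where $p_k$ is the vector of characteristics of the k-th project and $v_i$ is the center of the i-th cluster. Then the functional will have the form

$$J(P; U, V, A) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^{m} (p_k - v_i)^{T} A_i (p_k - v_i), \tag{2}$$

where $\mu_{ik}$ are the membership degrees and m is the exponential weight.

The functional in the form (2) cannot be minimized directly with respect to $A_i$, since it is linear in $A_i$. Therefore, to obtain an acceptable solution, it is necessary that $\det A_i = \rho_i$, $\rho_i > 0$; that is, the determinants of the matrices $A_i$ must be restricted. Then the fuzzy covariance matrix for the i-th cluster is determined as follows:

$$F_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^{m} (p_k - v_i)(p_k - v_i)^{T}}{\sum_{k=1}^{N} (\mu_{ik})^{m}}.$$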
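A compact NumPy sketch of the Gustafson - Kessel iteration is given below. It relies on the standard textbook form of the algorithm, in which the norm-inducing matrix is obtained from the fuzzy covariance matrix as A_i = (ρ_i det F_i)^(1/d) F_i^(-1) with d the number of characteristics; this update, the small regularization term, the random initialization and the synthetic data are assumptions of the sketch rather than details given in the text.

```python
import numpy as np


def gustafson_kessel(P, c=3, m=2.0, eps=1e-6, max_iter=1000, rho=1.0, reg=1e-8, seed=0):
    """Gustafson-Kessel fuzzy clustering. P has shape (n_projects, n_features).

    Returns the partition matrix U of shape (c, n_projects) and centers V of shape (c, n_features).
    """
    rng = np.random.default_rng(seed)
    n, d = P.shape
    U = rng.random((c, n))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        W = U ** m
        V = (W @ P) / W.sum(axis=1, keepdims=True)
        D2 = np.empty((c, n))
        for i in range(c):
            diff = P - V[i]  # shape (n, d)
            # Fuzzy covariance matrix of cluster i.
            F = (W[i][:, None] * diff).T @ diff / W[i].sum()
            F += reg * np.eye(d)  # assumed regularization, keeps F invertible
            # Norm-inducing matrix with determinant fixed to rho.
            A = (rho * np.linalg.det(F)) ** (1.0 / d) * np.linalg.inv(F)
            # Squared distance (p_k - v_i)^T A (p_k - v_i) for every project k.
            D2[i] = np.einsum("nd,de,ne->n", diff, A, diff)
        D2 = np.fmax(D2, 1e-12)
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < eps
        U = U_new
        if converged:
            break
    return U, V


# Synthetic data: 60 hypothetical projects scored on six criteria.
rng = np.random.default_rng(2)
levels = np.array([[0.2] * 6, [0.5] * 6, [0.8] * 6])
P = np.vstack([level + 0.05 * rng.standard_normal((20, 6)) for level in levels])
U, V = gustafson_kessel(P)
print(U.argmax(axis=0))
```

Because every cluster gets its own matrix A_i, the distance adapts to the shape and orientation of that cluster, which is what allows hyperellipsoidal clusters to be recovered. With few projects per cluster the covariance estimate can become nearly singular, which is why a regularization term is included here.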


For the next stage of the study, 50 IT projects were evaluated. The expert evaluations were checked for consistency, and the experts were not affiliated with one another. The input data for the algorithms are as follows: m = 2, c = 3, ε = 1e-6; the matrix P contains the aggregated expert evaluations for the criteria considered above.


Implementation of fuzzy clustering algorithms

The FCM algorithm. Formally, the FCM (fuzzy c-means) algorithm can be represented as a flowchart, shown in Fig. 1.





Fig. 1. Flowchart of the algorithm for clustering IT projects (FCM)


Fig. 2 shows the visualization of the results obtained using Principal Component Analysis (PCA, as implemented in the SOM Toolbox for the MATLAB environment) [20].



Fig. 2. Displaying FCM results using the PCA method
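A two-dimensional PCA projection of the kind used for such plots can be sketched with NumPy alone. This is not the SOM Toolbox routine: it is a plain SVD-based projection, and the 50 by 6 matrix is random placeholder data standing in for the expert evaluations.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the first principal components.

    Returns the projected coordinates and the explained-variance ratios.
    """
    Xc = X - X.mean(axis=0)  # center every characteristic
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Placeholder data: 50 projects, six characteristics.
rng = np.random.default_rng(0)
X = rng.random((50, 6))
Y, ratio = pca_project(X)
print(Y.shape, ratio)
```

Plotting the two columns of Y and colouring each point by its cluster of largest membership gives a picture comparable to the figures. The explained-variance ratios indicate how much of the six-dimensional structure survives the projection.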


The Gustafson - Kessel algorithm. Next, the Gustafson - Kessel algorithm is implemented; its flowchart is shown in Fig. 3.



Fig. 3. Flowchart of the algorithm for clustering IT projects (the Gustafson - Kessel algorithm)


Solving the fuzzy clustering task by the Gustafson - Kessel method took 141 iterations (until the stopping criterion was met).

Fig. 4 shows the results of clustering by the Gustafson - Kessel method using PCA.



Fig. 4. Displaying the results of the Gustafson - Kessel algorithm by the PCA method


Clustering quality assessment. The authors of [21] propose the following indicators for evaluating clustering quality.

1. The partition coefficient, calculated by the formula

$$R_1 = \frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^{2},$$

where $\mu_{ik}$ is the membership degree of the k-th project in the i-th cluster, N is the number of projects and c is the number of clusters. It is used as a measure of fuzziness (the higher it is, the crisper the partition and, indirectly, the better the clustering), but it does not take into account the pairwise distances needed to evaluate compactness and separation. Therefore, another indicator was proposed.
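The partition coefficient takes one line to compute. The sketch assumes the standard definition (the mean of the squared memberships over the projects) and a partition matrix stored with clusters as rows; the two small matrices are illustrative extremes.

```python
def partition_coefficient(U):
    """Mean over projects of the sum of squared memberships; U[i][k] is cluster i, project k."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n


crisp = [[1.0, 0.0], [0.0, 1.0]]    # every project belongs fully to one cluster
uniform = [[0.5, 0.5], [0.5, 0.5]]  # maximally fuzzy partition into two clusters
print(partition_coefficient(crisp), partition_coefficient(uniform))
```

The value ranges from 1/c for a completely fuzzy partition to 1 for a crisp one, which is why a larger value is read as a better result.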

2. The classification entropy

$$R_2 = -\frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik} \ln \mu_{ik}.$$

This indicator varies within $[0, \ln c]$. The main purpose of the indicators R1 and R2 is to search for the most acceptable number of clusters in a fuzzy partition. However, since both indicators depend on the number of clusters c, they are suitable only for comparing partitions with the same number of clusters.
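The classification entropy can be computed in the same way. The sketch assumes the standard definition with the natural logarithm and treats 0 · ln 0 as 0; the matrices are illustrative extremes.

```python
import math


def classification_entropy(U):
    """Average entropy of the membership distribution; U[i][k] is cluster i, project k."""
    n = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n


crisp = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(classification_entropy(crisp), classification_entropy(uniform))
```

A crisp partition gives zero and a uniform partition into c clusters gives ln c, so smaller values correspond to a less ambiguous assignment of projects to clusters.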

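3. The Xie-Beni index

$$R_3 = \frac{\sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^{2} \lVert p_k - v_i \rVert^{2}}{N \min_{i \neq j} \lVert v_i - v_j \rVert^{2}},$$

where $p_k$ is the vector of characteristics of the k-th project, $v_i$ is the center of the i-th cluster and $\mu_{ik}$ is the membership degree of the k-th project in the i-th cluster. This coefficient is most suitable for estimating the compactness and separability of clusters in a fuzzy partition (smaller values indicate more compact and better separated clusters) and makes it possible to judge the adequacy of the results obtained.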
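A direct computation of the Xie-Beni index is sketched below, assuming the original definition with squared memberships and Euclidean distances. The points, centers and memberships are a toy example chosen so that the value can be verified by hand.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(U, V, P):
    """Ratio of fuzzy within-cluster scatter to the smallest squared distance between centers.

    U[i][k] is the membership of point k in cluster i, V[i] the center of cluster i, P[k] point k.
    """
    c, n = len(U), len(P)
    compactness = sum(
        U[i][k] ** 2 * squared_distance(P[k], V[i])
        for i in range(c)
        for k in range(n)
    )
    separation = min(
        squared_distance(V[i], V[j]) for i in range(c) for j in range(i + 1, c)
    )
    return compactness / (n * separation)


# Toy example: two tight pairs of points far apart, assigned crisply.
P = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
V = [[0.0, 1.0], [10.0, 1.0]]
U = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(xie_beni(U, V, P))
```

Each point lies at squared distance 1 from its center and the centers are at squared distance 100, so the index is 4 / (4 · 100) = 0.01. Note that for the Gustafson - Kessel algorithm the distances could alternatively be measured in the cluster-specific norm; the Euclidean norm used here is an assumption.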

Table 2 shows the results of assessing the clustering quality of the two algorithms using the indicators considered.

Table 2

The results of the evaluation of the quality of clustering


FCM algorithm

The Gustafson - Kessel algorithm











Table 2 shows that FCM has a smaller value of R1, a larger value of the entropy R2, and a Xie-Beni index R3 that exceeds the corresponding value for the Gustafson - Kessel algorithm.

Thus, for dividing IT projects into groups according to the degree of investment attractiveness, Gustafson - Kessel fuzzy clustering is preferable.

In addition, an advantage of the Gustafson - Kessel algorithm is that it adapts the shape of each cluster, which makes it possible to assign objects to clusters more correctly.



The research produced the following results:

  • the issues of economic diagnostics of IT companies and start-ups in the task of business value estimation were considered, based on the case-based reasoning method and comparison of analogues;
  • characteristics were selected, and the choice of metrics and proximity measures for quantitative and qualitative characteristics of peer companies was considered;
  • mathematical methods were presented that order the set of peer companies by proximity to a precedent;
  • the need for fuzzy clustering was substantiated for the economic diagnostics of IT projects, in particular for determining the level of investment prospects;
  • two basic fuzzy clustering algorithms, Gustafson - Kessel and FCM, were analyzed, and the features of their functionals were considered;
  • the considered algorithms were implemented in practice for 50 IT projects with aggregated expert estimates;
  • clustering quality was evaluated, and the Gustafson - Kessel algorithm was found preferable.

The proposed approaches and mathematical apparatus make it possible to formalize uncertainty and risk in the economic diagnostics of IT projects and to improve the effectiveness of the financial decisions made by venture investment funds and other investment companies.


1. Payne B. Methods for Valuation of Seed Stage Startup Companies. Available at: www.angelcapitalassociation.org/blog/methods-for-valuation-of-seed-stage-startup-companies/ (accessed: 21.01.2020).

2. Payne B. Startup Valuations: The Risk Factor Summation Method. Available at: http://billpayne.com/2011/02/27/startup-valuations-the-risk-factor-summation-method-2.html (accessed: 21.01.2020).

3. Nasser S. Valuation For Startups – 9 Methods Explained. Available at: (accessed: 24.01.2020).

4. Anand S. S., Hughes J. G., Bell D. A., Hamilton P. Utilising Censored Neighbours in Prognostication. Workshop on Prognostic Models in Medicine. Denmark, Aalborg, 1999. Pp. 15-20.

5. Karpov L. E., Iudin V. N. Metody dobychi dannykh pri postroenii lokal'noi metriki v sistemakh vyvoda po pretsedentam [Data mining methods for constructing local metrics in systems of deduction by precedents]. Moscow, Izd-vo ISP RAN, preprint № 18, 2006. 21 p.

6. Pospelov D. A. Modelirovanie rassuzhdenii. Opyt analiza myslitel'nykh aktov [Modeling of reasoning. Practice in analysis of mental acts]. Moscow, Radio i sviaz' Publ., 1989. 184 p.

7. Kosmacheva I., Kvyatkovskaya I. Y., Sibikina I., Lezhnina Y. Algorithms of Ranking and Classification of Software Systems Elements. Knowledge-Based Software Engineering: Proceedings of 11th Joint Conference, JCKBSE 2014. Volgograd, Springer International Publishing, 2014. Pp. 400-409.

8. Pham Quang Hiep, Kvyatkovskaya I. Y., Shurshev V. F., Popov G. A. Methods and Algorithms of Alternatives Ranging in Managing the Telecommunication Services Quality. Journal of Information and Organizational Sciences, 2015, vol. 39, no. 1, pp. 65-74.

9. Kulikov D. L., Kucherov A. A. Stanovlenie i razvitie metodov otsenki effektivnosti innovatsionnykh proektov [Formation and development of methods for evaluating effectiveness of innovative projects]. Sovremennye problemy nauki i obrazovaniia, 2015, no. 1. Available at: (accessed: 30.01.2020).

10. Malova O. T. Podkhody k otsenke innovatsionnykh investitsionykh proektov [Approaches to the assessment of innovative investment projects]. Mezhdunarodnyj nauchnyj institut «Educatio», 2015, no. 3 (10), pp. 140-142.

11. Popov G. A., Kvyatkovskaya I. Y., Zholobova O. I., Kvyatkovskaya A. E., Chertina E. V. Making a choice of resulting estimates of characteristics with multiple options of their evaluation. Proceedings of 3rd Conference on Creativity in Intelligent Technologies and Data Science, CIT and DS 2019 (Volgograd, Russia, September 16–19, 2019). Part of the Communications in Computer and Information Science book series (CCIS, volume 1083). Springer, 2019. Part I. Pp. 89-104.

12. Bezdek J. C., Ehrlich R., Full W. FCM: The Fuzzy c-Means Clustering Algorithm. Computers & Geosciences, 1984, vol. 10, no. 2-3, pp. 191-203.

13. Zade L. A. Poniatie lingvisticheskoi peremennoi i ego primenenie k priniatiiu priblizhennykh reshenii [Concept of linguistic variable and its application to approximate decision making]. Moscow, Mir Publ., 1976. 165 p.

14. Neiskii I. M. Klassifikatsiia i sravnenie metodov klasterizatsii [Classification and comparison of clustering methods]. Available at: (accessed: 05.02.2020).

15. Jain A. K., Murty M. N., Flynn P. J. Data Clustering: A Review. ACM Computing Surveys, 1999, vol. 31, no. 3, pp. 264-323.

16. Rozilawati Binti Dollah, Aryati Binti Bakri, Mahadi Bin Bahari, Pm Dr. Naomie Binti Salim. Feasibility Study Of Fuzzy Clustering Techniques In Chemical Database For Compound Classification. Available at: (accessed: 17.12.2019).

17. Shtovba S. D. Proektirovanie nechetkikh sistem sredstvami MATLAB [Designing fuzzy systems using MATLAB software]. Moscow, Goriachaia liniia – Telekom Publ., 2007. 288 p.

18. Bezdek J. C., Dunn J. C. Optimal Fuzzy Partitions: A Heuristic for Estimating the Parameters in a Mixture of Normal Distributions. IEEE Transactions on Computers, 1975, pp. 835-838.

19. Gustafson D. E., Kessel W. C. Fuzzy clustering with fuzzy covariance matrix. Proceedings of the IEEE CDC. San Diego, 1979. Pp. 761-766.

20. Jolliffe I. T. Principal Component Analysis. Springer Series in Statistics, 2nd ed. NY, Springer, 2002. XXIX. 487 p.

21. Xie X. L., Beni G. A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991, vol. 13 (8), pp. 841-846.
