Escort Introduction
The effectiveness of software maintenance tasks depends heavily on the accuracy and reliability of software documentation, especially when the tasks are outsourced to third-party vendors. If the documentation is out of date, a considerable amount of time must be spent on software comprehension activities. Software clustering is often used as a remodularization and architecture recovery technique to help developers simplify software maintenance tasks and ease the burden of software comprehension. However, unsupervised clustering techniques tend to ignore prior knowledge from domain experts, leading to results that can be nonsensical to developers. Semi-supervised clustering (constrained clustering) can incorporate supervision from domain experts or side information to improve the results of classic unsupervised clustering techniques. These techniques, however, rely heavily on manual analysis to identify clustering constraints and hence do not scale well.
We propose an evolution-aware software clustering constraint derivation approach, Escort, which automatically derives clustering constraints from the evolutionary data of the analyzed software. Specifically, Escort can serve as an alternative way to derive implicit and explicit constraints in situations where domain experts are absent. In the subsequent constrained clustering process, Escort can be regarded as a framework that supplements and enhances various unsupervised clustering techniques to improve their accuracy and reliability. We evaluate Escort through both quantitative and qualitative analysis. For the quantitative validation, the experimental results show that our approach outperforms five other unsupervised clustering techniques. For the qualitative validation, we invited experienced developers working in five IT companies and students majoring in software engineering to take part in a survey evaluating the rationality of the generated clustering constraints. The survey shows that the participants agreed with the clustering constraints generated by Escort. Moreover, we evaluate the usefulness of refactoring suggestions based on the generated constraints. The validation indicates that Escort is capable of providing meaningful refactoring suggestions that are consistent with the real refactoring operations (obtained by Refactoring Miner from commit messages) performed by developers. In particular, for the 15 refactoring suggestions generated by Escort that have not yet been carried out by developers, we reported them to the respective developers on GitHub for further validation. Encouragingly, 60% of our reported refactoring suggestions have been acknowledged by the developers, who have either incorporated them directly or plan to incorporate them in future releases.
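To make the idea of deriving constraints from evolutionary data more concrete, the following is a minimal sketch only. This README does not specify Escort's derivation rules, so the heuristic here is an assumption made for illustration: classes that share a package in every analyzed version become must-link pairs, and classes that never share a package become cannot-link pairs. The function name `derive_constraints`, the snapshot format (a mapping from class name to package per version), and the sample class names are all hypothetical.

```python
from itertools import combinations


def derive_constraints(versions):
    """Derive pairwise constraints from per-version package assignments.

    `versions` is a list of dicts, one per release, mapping a class name
    to the package it lives in for that release (hypothetical format).
    Returns (must_link, cannot_link) as sets of sorted class-name pairs.
    NOTE: an illustrative heuristic, not Escort's actual derivation rules.
    """
    if not versions:
        return set(), set()

    # Only classes that exist in every version can be compared over time.
    common = set(versions[0])
    for snapshot in versions[1:]:
        common &= set(snapshot)

    must_link, cannot_link = set(), set()
    for a, b in combinations(sorted(common), 2):
        together = [snapshot[a] == snapshot[b] for snapshot in versions]
        if all(together):
            # Always co-located: keep in the same cluster.
            must_link.add((a, b))
        elif not any(together):
            # Never co-located: keep in different clusters.
            cannot_link.add((a, b))
        # Pairs co-located in only some versions yield no constraint.
    return must_link, cannot_link


# Two hypothetical releases; `Codec` moves from `core` to `util`.
history = [
    {"Broker": "core", "Queue": "core", "JmsClient": "client", "Codec": "core"},
    {"Broker": "core", "Queue": "core", "JmsClient": "client", "Codec": "util"},
]
must_link, cannot_link = derive_constraints(history)
print(sorted(must_link))
print(sorted(cannot_link))
```

In this sketch, a pair whose co-location changes between versions (such as `Broker` and `Codec`) produces no constraint, which leaves the decision to the underlying clustering algorithm. The resulting pair sets could then be passed to any constrained clustering technique that accepts must-link and cannot-link input.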
Studied Subject
ID | Project | # Versions | # Major Versions | # Stars | KLOC (Avg) | # Classes (Avg) | # Commits |
---|---|---|---|---|---|---|---|
1 | Activemq | 64 | 2 | 1,764 | 324.9 | 3,057 | 10,601 |
2 | Activemq-artemis | 32 | 2 | 602 | 518.3 | 3,324 | 7,502 |
3 | Aeron | 86 | 2 | 5,065 | 51.1 | 330 | 12,654 |
4 | Alluxio | 62 | 3 | 4,613 | 248.0 | 916 | 0,937 |
5 | Apktool | 34 | 2 | 10,220 | 16.6 | 179 | 1,648 |
6 | Assertj-core | 50 | 3 | 1,756 | 109.9 | 2,600 | 2,870 |
7 | Atmosphere | 204 | 3 | 3,430 | 40.6 | 259 | 5,931 |
8 | Atomix | 95 | 3 | 1,901 | 55.6 | 619 | 4,265 |
9 | AxonFramework | 99 | 4 | 2,020 | 93.0 | 724 | 5,951 |
10 | Beam | 83 | 2 | 3,998 | 389.6 | 1,063 | 27,132 |
11 | Bisq | 86 | 2 | 3,102 | 111.1 | 892 | 11,168 |
12 | Byte-buddy | 202 | 2 | 3,485 | 117.0 | 581 | 5,200 |
13 | Calcite | 52 | 2 | 1,894 | 211.5 | 869 | 4,175 |
14 | Camel | 154 | 3 | 3,242 | 680.0 | 7,981 | 45,096 |
15 | Cas | 218 | 4 | 7,620 | 91.1 | 1,219 | 16,869 |
16 | Cassandra | 241 | 4 | 5,950 | 189.2 | 775 | 25,297 |
17 | Conversations | 215 | 3 | 3,541 | 54.6 | 150 | 6,274 |
18 | Cxf | 153 | 2 | 642 | 527.7 | 4,618 | 15,722 |
19 | Dbeaver | 108 | 4 | 13,652 | 286.0 | 2,233 | 16,052 |
20 | Debezium | 73 | 2 | 3,265 | 75.5 | 363 | 3,125 |
21 | Discovery | 76 | 3 | 2,954 | 17.4 | 289 | 2,403 |
22 | Dropwizard | 147 | 3 | 7,657 | 44.0 | 509 | 5,430 |
23 | Eclim | 76 | 2 | 1,026 | 33.2 | 326 | 4,849 |
24 | Flink | 101 | 2 | 13,149 | 698.3 | 4,037 | 22,170 |
25 | Fresco | 40 | 2 | 16,207 | 89.2 | 547 | 2,531 |
26 | Grakn | 45 | 2 | 2,107 | 76.6 | 570 | 4,291 |
27 | Guacamole-client | 33 | 2 | 1,004 | 19.5 | 281 | 5,378 |
28 | Hadoop | 293 | 4 | 10,489 | 972.6 | 1,784 | 23,874 |
29 | Hawtio | 137 | 2 | 1,138 | 63.3 | 199 | 8,803 |
30 | Hive | 40 | 2 | 3,174 | 850.3 | 2,345 | 14,501 |
31 | Java-tron | 51 | 3 | 2,380 | 80.2 | 849 | 14,129 |
32 | Karaf | 82 | 3 | 480 | 80.0 | 655 | 8,197 |
33 | Maxwell | 170 | 2 | 2,141 | 68.8 | 123 | 3,110 |
34 | Nifi | 88 | 2 | 2,066 | 60.1 | 693 | 5,286 |
35 | Okhttp | 95 | 4 | 37,252 | 50.3 | 167 | 4,645 |
36 | Openapi-generator | 53 | 3 | 5,446 | 374.2 | 542 | 14,218 |
37 | Orientdb | 157 | 3 | 4,154 | 368.1 | 2,329 | 19,352 |
38 | Pdfbox | 52 | 2 | 1,162 | 134.7 | 939 | 8,962 |
39 | Pmd | 70 | 2 | 2,887 | 184.3 | 1,415 | 16,532 |
40 | Powermock | 42 | 2 | 3,121 | 36.8 | 590 | 1,607 |
41 | Redisson | 163 | 3 | 13,242 | 74.7 | 486 | 5,675 |
42 | Rest-assured | 56 | 3 | 4,748 | 20.0 | 180 | 1,959 |
43 | Speedment | 67 | 2 | 1,832 | 95.3 | 1,537 | 4,674 |
44 | Spotbugs | 41 | 2 | 1,894 | 227.6 | 1,891 | 16,206 |
45 | Spring-framework | 175 | 3 | 37,411 | 502.5 | 3,773 | 20,896 |
46 | Spring-security | 143 | 4 | 4,843 | 145.0 | 1,231 | 8,732 |
47 | Storm | 33 | 2 | 6,078 | 160.0 | 920 | 10,316 |
48 | Testcontainers-java | 73 | 2 | 3,805 | 8.3 | 175 | 2,008 |
49 | Tika | 56 | 2 | 1,002 | 82.0 | 526 | 4,747 |
50 | Traccar | 31 | 2 | 2,392 | 25.9 | 415 | 6,227 |
Quantitative evaluation (RQ1)
Number of clustering constraints derived from subjects.xlsx
The results of the application of ESCORT in different algorithms.xlsx
Qualitative evaluation (RQ2)
Questionnaire
The issues reported by Escort
ID | Project | Filed Issue ID | # Suggested refactorings | Status |
---|---|---|---|---|
1 | Activemq | #8583 | 1 | Fixed |
2 | Alluxio | #16439 | 2 | Pending |
3 | Atmosphere | #2475 | 1 | Pending |
4 | Beam | #23896 | 1 | Confirmed |
5 | Bisq | #6395 | 3 | Confirmed |
6 | Cxf | #8690 | 2 | Confirmed |
7 | Redisson | #4642 | 1 | Pending |
8 | Openapi-generator | #12200 | 1 | Pending |
9 | Orientdb | #9787 | 1 | Pending |