How are functionally similar code clones syntactically different? An empirical study and a benchmark

dc.contributor.author | Wagner, Stefan | de
dc.contributor.author | Abdulkhaleq, Asim | de
dc.contributor.author | Bogicevic, Ivan | de
dc.contributor.author | Ostberg, Jan-Peter | de
dc.contributor.author | Ramadani, Jasmin | de
dc.date.accessioned | 2016-03-10 | de
dc.date.accessioned | 2016-03-31T08:02:39Z
dc.date.available | 2016-03-10 | de
dc.date.available | 2016-03-31T08:02:39Z
dc.date.issued | 2016 | de
dc.date.updated | 2016-03-10 | de
dc.description.abstract | Background. Today, redundancy in source code caused by copy&paste, so-called "clones", can be found reliably using clone detection tools. However, redundancy can also arise independently, without copy&paste. At present, it is not clear how clones that are only functionally similar (FSCs) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactical differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We could detect syntactic similarity in part of the files in <16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs were in the categories algorithm, data structure, OO design, I/O and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research. | en
dc.identifier.other | 459358820 | de
dc.identifier.uri | http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-105676 | de
dc.identifier.uri | http://elib.uni-stuttgart.de/handle/11682/3656
dc.identifier.uri | http://dx.doi.org/10.18419/opus-3639
dc.language.iso | en | de
dc.rights | info:eu-repo/semantics/openAccess | de
dc.subject.classification | Quellcode, Analyse, Empirie, Benchmark | de
dc.subject.ddc | 004 | de
dc.subject.other | Codeklon, funktional ähnlicher Klon, empirische Studie | de
dc.subject.other | Code clone, Functionally similar clone, Empirical study, Benchmark | en
dc.title | How are functionally similar code clones syntactically different? An empirical study and a benchmark | en
dc.type | article | de
ubs.fakultaet | Fakultät Informatik, Elektrotechnik und Informationstechnik | de
ubs.institut | Institut für Softwaretechnologie | de
ubs.opusid | 10567 | de
ubs.publikation.source | PeerJ computer science (2016), 2:e49. URL http://dx.doi.org./10.7717/peerj-cs.49 | de
ubs.publikation.typ | Zeitschriftenartikel | de

Files

Original bundle

Name: peerj_cs_49_2.pdf
Size: 408.81 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 935 B
Format: Plain Text