How are functionally similar code clones syntactically different? An empirical study and a benchmark

Wagner, Stefan; Abdulkhaleq, Asim; Bogicevic, Ivan; Ostberg, Jan-Peter; Ramadani, Jasmin

dc.contributor.author	Wagner, Stefan	de
dc.contributor.author	Abdulkhaleq, Asim	de
dc.contributor.author	Bogicevic, Ivan	de
dc.contributor.author	Ostberg, Jan-Peter	de
dc.contributor.author	Ramadani, Jasmin	de
dc.date.accessioned	2016-03-10	de
dc.date.accessioned	2016-03-31T08:02:39Z
dc.date.available	2016-03-10	de
dc.date.available	2016-03-31T08:02:39Z
dc.date.issued	2016	de
dc.date.updated	2016-03-10	de
dc.description.abstract	Background. Today, redundancy in source code, so-called “clones” caused by copy&paste can be found reliably using clone detection tools. Redundancy can arise also independently, however, not caused by copy&paste. At present, it is not clear how only functionally similar clones (FSC) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactical differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We could detect a syntactic similarity in a part of the files in <16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs were in the categories algorithm, data structure, OO design, I/O and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.	en
dc.identifier.other	459358820	de
dc.identifier.uri	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-105676	de
dc.identifier.uri	http://elib.uni-stuttgart.de/handle/11682/3656
dc.identifier.uri	http://dx.doi.org/10.18419/opus-3639
dc.language.iso	en	de
dc.rights	info:eu-repo/semantics/openAccess	de
dc.subject.classification	Quellcode, Analyse, Empirie, Benchmark	de
dc.subject.ddc	004	de
dc.subject.other	Codeklon, funktional ähnlicher Klon, empirische Studie	de
dc.subject.other	Code clone, Functionally similar clone, Empirical study, Benchmark	en
dc.title	How are functionally similar code clones syntactically different? An empirical study and a benchmark	en
dc.type	article	de
ubs.fakultaet	Fakultät Informatik, Elektrotechnik und Informationstechnik	de
ubs.institut	Institut für Softwaretechnologie	de
ubs.opusid	10567	de
ubs.publikation.source	PeerJ computer science (2016), 2:e49. URL http://dx.doi.org/10.7717/peerj-cs.49	de
ubs.publikation.typ	Zeitschriftenartikel	de

Files

Original bundle

Name: peerj_cs_49_2.pdf
Size: 408.81 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 935 B
Format: Plain Text

Collections

05 Fakultät Informatik, Elektrotechnik und Informationstechnik