static analysis - Removing useless lines from a C++ file
There are many times when, while debugging or reusing some code, a file starts to acquire lines that don't do anything, though they may have done something at one point.
Things like vectors that get filled and then go unused, classes/structs that are defined but never used, and functions that are declared but never used.
I understand that in many cases some of these things are not superfluous, as they might be visible from other files, but in my case there are no other files, just extraneous code in my file.
I also understand that, technically speaking, invoking push_back does something, and therefore the vector is not unused per se, but in my case its result goes unused.
So: is there a way to do this, either using a compiler (clang, gcc, VS, etc.) or an external tool?
Example:

    #include <vector>
    using namespace std;

    void test() {
        vector<int> a;
        a.push_back(1);
    }

    int main() {
        test();
        return 0;
    }

should become:

    int main() { return 0; }
Our DMS Software Reengineering Toolkit with its C++11 front end could be used to do this; it presently does not do this off the shelf. DMS is designed to provide custom tool construction for arbitrary source languages, and contains full parsers, name resolvers, and various flow analyzers to support analysis, as well as the ability to apply source-to-source transformations on the code based on analysis results.
In general, you want a static analysis that determines whether every computation (result; there may be several, consider just "x++") is used or not. For each unused computation, in effect you want to remove the unused computation and repeat the analysis. For efficiency reasons, you want an analysis that determines all the (points of) usage of the result(s) just once; this is essentially a data flow analysis. When the usage set of a computation result goes empty, that computation result can be deleted (note that deleting the value result of "x++" may leave behind "x++" because the increment is still needed!), and the usage sets of the computations on which it depends can be adjusted to remove the references from the deleted one, possibly causing more removals.
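The usage-set idea can be sketched as a worklist algorithm. This is only an illustration under simplifying assumptions, not how DMS implements it: the `Computation` struct, its flags, and `find_dead` are hypothetical names, each computation is assumed to produce one result, and operand indices are assumed valid. Use counts stand in for usage sets.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one computation result.
struct Computation {
    std::vector<std::size_t> operands; // indices of the results this one reads
    bool has_side_effect = false;      // an effect the analysis cannot rule out
    bool is_program_output = false;    // e.g. the value returned from main
};

// Returns dead[i] == true when result i is never needed.
std::vector<bool> find_dead(const std::vector<Computation>& comps) {
    const std::size_t n = comps.size();

    // Compute the size of every usage set once, up front.
    std::vector<std::size_t> use_count(n, 0);
    for (const Computation& c : comps)
        for (std::size_t op : c.operands)
            ++use_count[op];

    auto must_keep = [&](std::size_t i) {
        return comps[i].has_side_effect || comps[i].is_program_output;
    };

    // Seed the worklist with results whose usage set is already empty.
    std::vector<bool> dead(n, false);
    std::vector<std::size_t> worklist;
    for (std::size_t i = 0; i < n; ++i)
        if (use_count[i] == 0 && !must_keep(i))
            worklist.push_back(i);

    // Deleting a result shrinks the usage sets of its operands,
    // which may empty them and cause further removals.
    while (!worklist.empty()) {
        std::size_t i = worklist.back();
        worklist.pop_back();
        if (dead[i]) continue;
        dead[i] = true;
        for (std::size_t op : comps[i].operands)
            if (--use_count[op] == 0 && !must_keep(op) && !dead[op])
                worklist.push_back(op);
    }
    return dead;
}
```

Modelling the question's example as three computations (construct `a`, `a.push_back(1)` reading `a`, and `return 0` marked as program output) reports the first two as dead, provided push_back is known to affect only its own object. Results that only feed each other in a cycle are never reported, which errs on the conservative side.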
To do this analysis for any language, you have to be able to trace results. For C (and C++) this can be pretty ugly; there are "obvious" uses where a computation result is used in an expression, and where it is assigned to a local/global variable (which is used somewhere else), and there are indirect assignments through pointers, object field updates, arbitrary casts, etc. To know these effects, your dead code analysis tool has to be able to read the entire software system and compute dataflows across it.
To be safe, you want that analysis to be conservative, e.g., if the tool does not have proof that a result is not used, then it must assume the result is used; you often have to do this with pointers (or array indexes, which are just pointers in disguise) because in general you can't determine precisely where a pointer "points". One can obviously build a "safe" tool by assuming all results are used :-} You will also end up with sometimes very conservative but necessary assumptions for library routines for which you don't have the source. In this case, it is helpful to have a set of precomputed summaries of the library side effects (e.g., "strcmp" has none, "sprintf" overwrites a specific operand, "push_back" modifies its object...). Since libraries can be pretty big, this list can be pretty big.
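A precomputed summary table of this kind might look roughly like the following. The `Effect` enum, `Summary` struct, and function names are hypothetical and chosen only for illustration; the three entries are the examples named above. The important part is the default: a routine with no summary is treated as if it may touch anything.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical classification of what a library routine may modify.
enum class Effect {
    None,            // no side effects: only the returned value matters
    WritesArgument,  // overwrites memory reached through one specific argument
    ModifiesObject,  // mutates the object it is called on
    Unknown          // no summary available: assume it may touch anything
};

struct Summary {
    Effect effect;
    int argument_index; // meaningful only for WritesArgument; -1 otherwise
};

// Three illustrative entries; a real table would be far larger.
const std::unordered_map<std::string, Summary>& summaries() {
    static const std::unordered_map<std::string, Summary> table = {
        {"strcmp",    {Effect::None, -1}},
        {"sprintf",   {Effect::WritesArgument, 0}}, // writes its first argument
        {"push_back", {Effect::ModifiesObject, -1}},
    };
    return table;
}

// Conservative lookup: anything not in the table is Unknown.
Summary lookup(const std::string& routine) {
    auto it = summaries().find(routine);
    if (it == summaries().end())
        return {Effect::Unknown, -1};
    return it->second;
}

// A call whose result is unused can be dropped outright only if it has no effects.
bool call_may_be_removed_if_result_unused(const std::string& routine) {
    return lookup(routine).effect == Effect::None;
}
```

For `WritesArgument` and `ModifiesObject`, whether the call is dead depends on whether the written argument or the modified object is used later, which is what the dataflow analysis has to answer.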
DMS in general can parse an entire source code base, build symbol tables (so it knows which identifiers are local/global and their precise type), do control and local dataflow analysis, build a local "side effects" summary per function, build a call graph and global side effects, and do a global points-to analysis, providing this "computation used" information with appropriate conservatism.
DMS has been used to do this computation on C code systems of 26 million lines of code (and yes, that's a really big computation; it takes 100 GB of VM to run). We did not implement the dead code elimination part (the project had another purpose), but that is straightforward once you have this data. DMS has done dead code elimination on large Java codes with a more conservative analysis (e.g., "no use mentions of an identifier", which means assignments to the identifier are dead), which causes a surprising amount of code removal in many real codes.
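The more conservative "no use mentions of an identifier" check is simple enough to sketch. The `Statement` record and `never_read` function below are hypothetical; they assume something else (for instance a symbol table) has already listed, per statement, which identifier is assigned and which identifiers are read.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one statement: what it assigns and what it reads.
struct Statement {
    std::string assigned;           // identifier written; empty if none
    std::vector<std::string> reads; // identifiers whose values are used
};

// Identifiers that are assigned somewhere but never read anywhere.
std::set<std::string> never_read(const std::vector<Statement>& program) {
    // Collect every identifier that is mentioned as a use.
    std::set<std::string> read;
    for (const Statement& s : program)
        read.insert(s.reads.begin(), s.reads.end());

    // An assigned identifier with no use mention has only dead assignments.
    std::set<std::string> result;
    for (const Statement& s : program)
        if (!s.assigned.empty() && read.count(s.assigned) == 0)
            result.insert(s.assigned);
    return result;
}
```

This misses a variable that is read only in order to update itself (such as a counter that is incremented but never inspected), which is the price of the simpler analysis. An assignment to a reported identifier can only be deleted whole if its right-hand side has no side effects of its own.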
DMS's C++ parser presently builds symbol tables and can do control flow analysis for C++98, with C++11 being close at hand. We still need local data flow analysis, which is some effort, but the global analyses pre-exist in DMS and are available to be used for this effect. (The "no uses of an identifier" information is easily available from symbol table data, if you don't mind a more conservative analysis.)
In practice, you don't want the tool to just silently rip things out; some might actually be computations you wish to preserve anyway. What the Java tool does is produce two results: a list of dead computations, which you can inspect to decide if you believe it, and a dead-code-removed version of the source code. If you believe the dead code report, you keep the dead-code-removed version; if you see a "dead" computation that you think shouldn't be dead, you modify the code to make it not dead and run the tool again. With a big code base, inspecting the dead code report itself can be trying; how do "you" know if some apparently dead code isn't valued by "somebody else" on your team? (Version control can be used to recover if you goof!)
A really tricky issue, which we do not handle (and no tool that I know of does), is "dead code" in the presence of conditional compilation. (Java does not have this problem; C has it in spades, C++ systems much less.) This can be really nasty. Imagine a conditional in which one arm has certain side effects and the other arm has different side effects, or another case in which one arm is interpreted by GCC's C++ compiler and the other arm is interpreted by MS, and the compilers disagree on what the constructs do (yes, C++ compilers do disagree in dark corners). At best we can be very conservative here.
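A small made-up case shows why conditional compilation is awkward. `LOG_ENABLED` is a hypothetical configuration macro and `process` a hypothetical function; neither comes from any real code base.

```cpp
#include <vector>

// LOG_ENABLED is a hypothetical configuration macro used only for illustration.
int process(std::vector<int>& log, int value) {
    int doubled = value * 2;
#ifdef LOG_ENABLED
    log.push_back(doubled);  // "doubled" is used only in this configuration
#else
    (void)log;               // with the macro undefined, "doubled" has no reader
#endif
    return value;
}
```

A tool that analyzes only the configuration with `LOG_ENABLED` undefined would report `doubled` as dead and remove it, breaking the build for the other configuration. The conservative choice is to keep any computation that is used in at least one arm.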
Clang has some ability to do flow analysis, and some ability to do source transformations, so it might be coerced into doing this. I don't know if it can do any global flow/points-to analysis. It seems to have a bias towards single compilation units, since its principal use is compiling a single compilation unit.