Syntax Commentary 1

> +// Class Squasher: A sample Oink tool that produces patches
> +// merge a derived class with its parent

You can write sentences starting with a capital letter and ending with
a period and two spaces, or you can write them in Scott's style with
no capital and separated with a semi-colon or a blank line, but please
don't just have one follow the other with no punctuation at all.

   // Class Squasher: A sample Oink tool that produces patches.
   // Merge a derived class with its parent.

or

   // Class Squasher: a sample Oink tool that produces patches;
   // merge a derived class with its parent

> Index: squash_global.h
> ===================================================================
> --- squash_global.h    (revision 0)
> +++ squash_global.h    (revision 0)
> @@ -0,0 +1,13 @@
> +// see License.txt for copyright and terms of use
> +
> +// All the globals for squash in one place.
> +
> +#ifndef STATICPRINT_GLOBAL_H
> +#define STATICPRINT_GLOBAL_H

Of course these should be renamed to SQUASH_GLOBAL_H.  It is very easy
to make copy paste bugs like this.  The way to prevent it is to copy
and paste and then systematically find-replace EVERYTHING.  It is too
easy to miss a spot like this and create a subtle bug as you
unfortunately have.

For such a strategy to work in general, each major aspect of the
program must have a name and all of the global identifiers of that
aspect must use that name.  Here we see that all of the global
identifiers of staticprint contain the string 'staticprint' with
various capitalizations, and these names carefully separated out from
the rest of the identifier so that it is safe to find-replace
'staticprint' with 'squash' and the resulting code with be isomorphic
with the original (nothing strange like two identifiers merging will
happen).  You have mostly taken advantage of this, I just wanted to
point it out.

> +#include "squash.h"        // this module
> +#include "squash_cmd.h"    // SquashCmd
> +#include "squash_global.h"
> +#include "oink.gr.gen.h"        // CCParse_Oink
> +#include "strutil.h"            // quoted

The level to which '//' is indented when there is a block of them
together should be the same.  Emacs will do this for you automatically
if you use "M-;".  In Elsa, Scott always does this manually so you
just have to do it manually as well.

#include "squash.h"             // this module
#include "squash_cmd.h"         // SquashCmd
#include "squash_global.h"
#include "oink.gr.gen.h"        // CCParse_Oink
#include "strutil.h"            // quoted

> +class MinMaxLineVisitor : public ASTVisitor {
> +public:
> +  int minLine;
> +  int maxLine;
> +  int minCol;

Put a blank line between the data and c/dtors for a class this big.

> +  MinMaxLineVisitor();

Also a blank line between ctors and methods for a class this big.

> +  void postvisitTopForm(TopForm *obj) { notice(obj->loc); }
> +  void postvisitFunction(Function *obj) { notice(obj->getLoc()); }
> +  void postvisitPQName(PQName *obj) { notice(obj->loc); }

These methods all override the virtual visitor methods so they are all
implicitly virtual; I think that should be an error in C++; anyway,
please make them explicitly virtual.

> +struct RewriteVisitor : public ASTVisitor {
> +
> +  RewriteVisitor(StringRef implementationClass);

No blank line between the top of the class and the first member.

> +  bool visitTypeSpecifier(TypeSpecifier *obj);
> +  void printPatch(const std::string &str, const MinMaxLineVisitor &mmlv, StringRef file);
> +
> +private:
> +  StringRef implementationClass;
> +  //on the second pass these will contain the interface name & definition file

Put a space after the '//'.

> +  foreachSourceFile {
> +    File *file = files.data();
> +    maybeSetInputLangFromSuffix(file);
> +    TranslationUnit *unit = file2unit.get(file);
> +    RewriteVisitor vis(class_string);
> +    //first time discover base class

This comment is just too terse; please say something like "During the
first pass we find the base class."  I suppose you might be able to
find examples where you think that I am not clear in my comments;
please feel free to let me know about them.

> +    unit->traverse(vis);
> +    //then merge

"During the second pass we merge the base class into the subclass" or
something a bit more suggestive.

> +MinMaxLineVisitor::MinMaxLineVisitor() :
> +    minLine(INT_MAX), maxLine(0),minCol(INT_MAX) {

Put a space after a comma.

> +void MinMaxLineVisitor::notice(SourceLoc loc) {
> +  int line = sourceLocManager->getLine(loc);
> +  if (line < minLine)
> +    minLine = line;
> +  if (line > maxLine)
> +    maxLine = line;

Arg!  I explicitly prohibit this indentation style in the Oink Coding
Guidelines!

This is ok:
   if (foo) {
     bar();
   }

This is ok in Oink, but not Elsa:
   if (foo) bar();

This is not ok:
   if (foo)
     bar();

> +  if(col < minCol)
> +    minCol = col;

Sometimes you put space after the 'if' and sometimes not.  Please be
consistent about such things (I try to be, but may fail occasionally).
I did not put it into the guide, but I will now: we generally put a
space after an 'if'.

> +/** if m1 pure virtual decl for m2 then true */

Do not omit verbs and other helpful words in the interest of
terseness.  Make it readable.  Are you saying: "If m1 is a pure
virtual declaration for m2 then return true." ?

> +bool sameMember(Member *m1, Member *m2) {
> +  if(!(m1->ifMR_decl() && m2->ifMR_decl()))
> +    return false;
> +  Declaration* dn1 = m1->ifMR_decl()->d;
> +  Declaration* dn2 = m2->ifMR_decl()->d;
> +  FakeList* d1 = dn1->decllist;
> +  FakeList* d2 = dn2->decllist;
> +  //should be a while loop, but I don't see when you you can have more than 1 decl

("you you" ??  I make these mistakes too, but please proofread.)

OUCH!  Make it a while loop anyway!  Or at the very least add an
assertion that it is a list of length at most one.  If you make an
assumption in your code upon which the correctness depends, add an
assertion for that assumption.

   // I think it is not possible for a declaration to have more than one
   // declarator under these circumstances.
   xassert(d1->butFirst()->isEmpty());

> +  if(d1->isNotEmpty() && d2->isNotEmpty()) {

Some people say everything should be down inside a nested 'if' but I
disagree: in general code is much easier to read if avoidable nesting
is avoided.  So in this case, we can dispense with the case where the
if condition fails early on (right now it is handed in the 'else'
clause far below).

   bool bothNotEmtpy = d1->isNotEmpty() && d2->isNotEmpty();
   if (!bothNotEmtpy) return false;
   // ... rest of block that was down inside the 'if' continues here.

> +    Type* t1 = d1->first()->type;
> +    Type* t2 = d2->first()->type;
> +    Variable* v1 = d1->first()->var;
> +    Variable* v2 = d2->first()->var;
> +    if(!(v1->name == v2->name && t1->isFunctionType() && t2->isFunctionType() && v1->isPureVirtualMethod()))

Can you really read this as easily as you can read written English?
It is too long; at least add some newlines and some whitespace to show
off the very important leading 'not'.

if (! (v1->name == v2->name &&
      t1->isFunctionType() &&
      t2->isFunctionType() &&
      v1->isPureVirtualMethod()))
{
 return false;
}

> +    const FunctionType *f1 = t1->asFunctionTypeC();
> +    const FunctionType *f2 = t2->asFunctionTypeC();

This isn't in the style guide yet, but it is a big pet peeve of mine:
const annotates the thing BEFORE it, not after it.  It is only an
exception that a leading const annotates the thing before it.  We
probably have violations of this here and there, but when you are
someone who works on type qualifiers it really starts to bug you.
Here is why.  Quick, what does this mean?

   int * const * x;

Is it a const pointer to a pointer to an int?  Or a pointer to a const
pointer to an int?  Or a pointer to a pointer to a const int?

This is the language you are using, so even if you can figure it out,
you really shouldn't have to think about it at all, otherwise you're
thinking is fighting the language.  The answer is the second one: it
is a pointer to a const pointer to an int, but many people probably
think it is the third: a const pointer to a pointer to an int.  This
is because they have been falsely trained to think that way by people
writing this:

   const int x;

when they really should be writing this;

   int const x;

> +    if(!(f1->retType->equals(f1->retType) && f1->params.count() == f2->params.count()))

Again, my mind is just fighting with the whitespace or lack thereof in
order to read that.

> +    SObjListIter it1(f1->params);
> +    SObjListIter it2(f2->params);
> +   > +    //skip first argument..this?

When referring to 'this' as the language construct, do please put it in
quotes, otherwise it can be confusing.  In general, write comments to
a person in your mind who really has no idea what is going on: when in
doubt, spell things out.

> +    if(it1.isDone() || it2.isDone()) {
> +      return false;
> +    } else {
> +      it1.adv();
> +      it2.adv();
> +    }
> +   > +    while(!it1.isDone()) {
> +      if(!it1.data()->type->equals(it2.data()->type)) {
> +        return false;
> +      }
> +      it1.adv();
> +      it2.adv();
> +    }
> +    //d1 = d1->butFirst();
> +    //d2 = d2->butFirst();

Why is this commented-out code left here?  I sometimes write code and
leave it commented out, but it is for a reason.  One reason is
context: an old way of doing something; I realize that could be
confusing and I should probably clean up such instances.  Another
reason is two write 'negative code': a comment that says "here is the
naive and wrong way of doing this; don't do it this way".  However,
your code above just seems to be left over from a previous way of
doing the iteration.

> +  } else {
> +    return false;
> +  }

See above: this 'else' case could be eliminated at the start instead of
being left here requiring the entire 'then' clause to be indented.

> +static int countNewlines(const std::string &str) {
> +  //if string ends with a newline, don't count it
> +  int count = 0;
> +  for(std::string::size_type n = str.find('\n');n != std::string::npos;n = str.find('\n', n + 1))
> +    count++;

As with 'if' this indentation style with 'for' is not ok:
   for(...)
     count++;

Use the curlies instead:
   for(...) {
     count++;
   }

I have this argument with Karl all of the time, so perhaps others
don't share my aesthetic on whitespace, but it goes like this: if you
separate two structures with whitespace then you have to separate all
the higher-level (in the AST) structures with whitespace as well.  In
the above for-loop, you have whitespace around the '=' but none after
the semi-colon, a higher-level break.

This whole thing is so complex, I would just use newlines.

for(std::string::size_type n = str.find('\n');
   n != std::string::npos;
   n = str.find('\n', n + 1))
{
  count++;
}

> +  if(!lines) {
> +    return;
> +  }

In Oink it is ok to write such simple lines as follows:

   if (!lines) return;

> +  do {
> +    std::string::size_type pos = str.find('\n', n + 1);
> +    //str might already contain patch lines, don't reprefix those

This is a compound sentence written ungrammatically.  Use a
connective, such as 'so', or use a semi-colon because you are starting
a new sentence.  Creating new words such as 're-prefix' is fine, but
use a hyphen to help with readability.

   // Note that 'str' might already contain patch lines; be careful to
   // not re-prefix those lines.

> +    if(pos == std::string::npos) {
> +      pos = str.size();
> +    }

Again, if you like it is ok to say this in one line:

  if (pos==std::string::npos) pos = str.size();

A matter of taste.

> +    cout << "? interface class is " << interfaceClass << " in " << interfaceFile << ":"<getLine(obj->loc)<< endl;

Please be consistent: you sometimes put space around '<<' and
sometimes you do not.

> +  /*The next 2 for loops are tightly connected
> +  The first for tries to replace functions defined in the interface with their implementation. It outputs the the stuff in a
> +  way such that the last replacement member does not get flushed out right away such that all of the private members in the following
> +  for loop can be attached to it*/

Please use '//' for multi-line comments and avoid using '/* */' style
comments as much a possible.

Please do not exceed column 72; emacs will do this for you
automatically if you type 'M-q' within a comment.  Karl sometimes
exceeds column 72 but then I have to fix it.

> +  for(ASTListIterNC interfaceMemberIt(*interfaceMembers); !interfaceMemberIt.isDone(); interfaceMemberIt.adv()) {

I appreciate the long and descriptive identifier names.

Sometimes if one word is showing up very often it is helpful to follow
a practice of defining a abbreviation for it, documenting that at the
top of the file, and then using it consistently everywhere:

   // We use the following abbreviations:
   //   'iface' for 'interface'
   //   ...

Since there are few and they are consistently used, it should not
cause confusion; such an abbreviation will the code much more compact
and easier to read:

   for(ASTListIterNC ifaceMemberIt(*ifaceMembers);
       !ifaceMemberIt.isDone();
       ifaceMemberIt.adv()) {
     // ...
   }

It's a judgment call.

> +void SquashCmd::readOneArg(int &argc, char **&argv) {
> +  int old_argc = argc;
> +  OinkCmd::readOneArg(argc, argv);
> +  if (old_argc != argc) return; // the superclass read one so we don't
> +
> +  char *arg = argv[0];
> +  // please prefix the names of flags with arguments with '-d-'
> +  if (streq(arg, "-d-squash")) {
> +    shift(argc, argv);
> +    class_string = strdup(shift(argc, argv)); // NOTE: use strdup!
> +    return;
> +  }

The basic idea is that each utility program in Oink shares the common
oink flags as well as having its own flags.  The way I want to do this
is that every utility has a prefix for its flags; in an effort to make
this as short as possible, I am trying to have each utility have its
own letter; eventually we may run out of letters, but you get the
idea.

The letter prefix 'd' is for dfgprint, but as you can see I neglected
to change it for staticprint.

cd ~/oink-stack2/oink/
grep -n -e '-d-' *.cc *.h /dev/null
dfgprint_cmd.cc:21:  // please prefix the names of flags with arguments with '-d-'
dfgprint_cmd.cc:22:  if (streq(arg, "-d-ben-string")) {
dfgprint_cmd.cc:47:     "  -d-ben-string      : set Ben's string\n"
staticprint_cmd.cc:22:  // please prefix the names of flags with arguments with '-d-'
staticprint_cmd.cc:23:  if (streq(arg, "-d-ben-string")) {
staticprint_cmd.cc:50:     "  -d-ben-string      : set Ben's string\n"

Anyway, choose a unique prefix for squash flags, such as 'sq' and use
that.

> +}
> +
> +void SquashCmd::printHelp() {
> +  OinkCmd::printHelp();
> +  printf
> +    ("squash flags that take an argument:\n"
> +     "  -d-squash          : move the class members up to the first base class\n"
> +     "");
> +}
Oink:

CQual++:

Syntax Commentary 1