Syntax Commentary 4

>  void GrammarLexer::popRecursiveFile()
>  {
> -  trace("lex") << "done processing " <<     
> -    sourceLocManager->getFile(fileState.loc) << endl;
> +//  trace("lex") << "done processing " <<     
> +//    sourceLocManager->getFile(fileState.loc) << endl;

Why did you comment this out?  Scott likes his tracing flags.

> +  // create a new fragment to continue the old file with
> +  fileState.frag = sourceLocManager->createFileFragment(
> +                       fileState.frag->fileInfo, loc);

You know, perhaps Scott formats code like that sometimes, but it
fights with emacs and again the break of the parenthesized part over
the line is strange.  Please format such thigns like this:

  fileState.frag = sourceLocManager->
    createFileFragment(fileState.frag->fileInfo, loc);

The newline break is at a much higher level in the AST.

>    // called when a newline is encountered
>    void newLine()
> -    { fileState.loc = sourceLocManager->advLine(fileState.loc); }
> +    { fileState.loc = sourceLocManager->advLine(fileState.loc);
> +      fileState.frag->updateLineInfo(fileState.loc - lineStartLoc);
> +      lineStartLoc = fileState.loc; }

When the block has gotten longer than one line, it is best to just
format it in a more standard way like this:

    void newLine() {
      fileState.loc = sourceLocManager->advLine(fileState.loc);
      fileState.frag->updateLineInfo(fileState.loc - lineStartLoc);
      lineStartLoc = fileState.loc;
    }

>  // baselexer.cc            see license.txt for copyright and terms of use
>  // code for baselexer.h
>  
> +// Portions of this file have been rewritten by Leo Savernik for
> +// Comneon electronics technology GmbH & Co. OHG, 4040 Linz, Austria, Europe.
> +// Comneon electronics technology GmbH & Co. OHG generously donated the
> +// changes for release into the public domain, as applicable by US
> +// law. This file is re-released under the Oink license (see License.txt).
> +

I think this would be best put in the checkin comment and perhaps on
the web page with a reference to the version number.  You are
contributing to many files and that method would enusre a more exact
record of what exactly you are contirbuting.  Many files will be
touched by many people, so these messages could get confusing if
associated with single files.  What do you think?

> +  for (; p < endp; p++, curLoc++) {
> +    if (*p == '\n')
> +      markStartOfNewLine(SourceLoc(curLoc));

Man, this naked-then-clause dies hard, doesn't it?

Ok:
    if (*p == '\n') markStartOfNewLine(SourceLoc(curLoc));

Ok:
    if (*p == '\n') {
      markStartOfNewLine(SourceLoc(curLoc));
    }

Not ok:
    if (*p == '\n')
      markStartOfNewLine(SourceLoc(curLoc));

>  // lexer object
>  class BaseLexer : public yyFlexLexer, public LexerInterface {
>  protected:  // data
> -  istream *inputStream;            // (owner) file from which we're reading
> -  SourceLocManager::File *srcFile; // (serf) contains the hash map we update
> +  // state of a file we were or are lexing
> +  struct FileState {

I don't think I've said this before but please put a blank line at the
start of an inner class.

> +    istream *source;               // (owner) source stream
> +    yy_buffer_state *bufstate;     // (serf) flex's internal buffer state
> +    SourceLocManager::FileFragment *frag; // (serf) file fragment being updated
> +    StringRef filename;            // physical file name or 
> +    int curLine;                   // current source line number; needed for new fragments
> +    bool tokenSeen;                // true as soon as we saw a non-comment token
> +    FileStateExtension *ext;       // extension for language specific file state

Commenting the meaning of every state variable: very good; Scott likes
that too.

> +  public:
> +    FileState(StringRef filename, istream *source);
> +    ~FileState();
> +
> +//    FileState(FileState const &obj);
> +//    FileState& operator= (FileState const &obj);

Is this because you want the compiler to generate them?  I think
that's what you said at the definition site for these that are now
commented out.

I think it would be best to just delete the definitions if you think
the compiler is going to generate the right copy ctor, and then leave
the declarations commented out but bring Scott's attention to them
with a comment above them:

    // The copy ctor and operator= have identical samantics to the
    // compiler-generated one, so we omit them; Scott, feel free to remove
    // this comment.
    //    FileState(FileState const &obj);
    //    FileState& operator= (FileState const &obj);

> +  void markStartOfNewLine(SourceLoc curLoc) {
> +    fileState.frag->updateLineInfo(curLoc + 1 - lineLoc);
> +    fileState.curLine++;

My nitpick: please prefer the prefix '++' to the postfix when only a
side-effect is desired.

> +      SourceLocManager::FileFragmentList &fragments = sourceLocManager->serializationOnly_fragments();

102 characters on that line: too long

>  # Some containers; I no longer care about order
> -#   ObjList
> +#   ObjArrayStack

The XML is only supposed to refer to generic containers, such as
Lists, String-to-Object maps, and Object-to-Object maps; we should not
refere to smbase-specific classes such as ObjArrayStack.  Please give
it a name that has a generic function and matches its semantics, not
its implementation.

>  List_files
> +List_fragments
>  
>  # FIX: are these in File or somewhere else?
>  loc
> Index: elsa/xml_file_reader.cc
> ===================================================================
> RCS file: /leo/uni/doktorarbeit/comneon/firmenrechner/rep/elsa/elsa/xml_file_reader.cc,v
> retrieving revision 1.1.1.1
> diff -u -p -r1.1.1.1 xml_file_reader.cc
> --- elsa/xml_file_reader.cc   2006/11/23 20:55:34     1.1.1.1
> +++ elsa/xml_file_reader.cc   2006/12/17 22:27:12
> @@ -1,8 +1,8 @@
>  // xml_file_reader.cc            see license.txt for copyright and terms of use
> +// largely rewritten by Leo Savernik to accomodate the new srcloc impl.

If you make such a comment, please put a date with it as well:

    // 2006-12-17 ls: largely re-written...

> -int SourceLocManager::File::lineLengthSum() const
> +void SourceLocManager::FileFragment::updateLineInfoSection(int lineLen)
>  {
> +  xassert(beingProcessed);
> +
> +  // FIXME: shouldn't the -1 go away and be issued in the caller
> +  // updateLineInfo?
> +  addLineLength(lineLengths, lineLen - 1);
> +  numChars += lineLen;

What do you plan on doing about such remaining questions?  Fixing them
before the final submission, or just leaving them there?

> +// allocates another new fragment to the given file
> +SourceLocManager::FileFragment *SourceLocManager::createFileFragment(FileInfo *fi, int lineno, int colofs)

Line too long.

> +  if (!this) {
> +    // it's quite common to forget to do this, and this function is 
> +    // almost always the one which segfaults in that case, so I'll
> +    // make the error message a bit nicer to save a trip through
> +    // the debugger
> +    xfailure("you have to create a SourceLocManager in your main() function");
> +  }

If this occurs often, please factor it out into a little function for
checking it.

> +  for (FileFragment *frag = f->firstFragment(); frag; frag = frag->next()) {
> +//     cerr << "frag " << frag << " in " << frag->fileInfo->name.c_str() << ": range " << frag->charOffset << " - " << (frag->charOffset + frag->numChars) << " P " << frag->isBeingProcessed() << endl;

Line too long.  Just split into multiple lines and use M-x
comment-region to comment and un-comment it when needed.

> +    if (frag->charOffset <= charOffset
> +        && (frag->isBeingProcessed()
> +            || charOffset < frag->charOffset + frag->numChars)) {

This is reasonable.  I had a thought though: it seems almost as if
there should be a method on Fragments called withinRange(offset) or
something, like this:

   // COMMENT HERE
    bool withinRange(int offset) {
      bool offsetAbove = frag->charOffset <= charOffset;
      bool offsetBelow = charOffset < frag->charOffset + frag->numChars;
      return offsetAbove && (frag->isBeingProcessed() || offsetBelow);
    }

Such a method should be written only if there is a clean way to write
the missing comment such that in a *bottom* *up* fashion (refering
only to the fragment and the offset) it makes a semantically clean
sentence.  If it weren't for the frag->isBeingProcessed() this should
be easy, but since I don't know what frag->isBeingProcessed() means,
such a clean sentence might not exist.  If it does not exist, don't do
the refactoring.

> +  xfailure("charOffset not contained in any fragment. Have you parsed the file already?");

Two spaces after a period in English prose.  Your English is pretty
good though, better than my German that's for sure :-) !

> +  for (FileFragment *frag = f->firstFragment(); frag; frag = frag->next()) {
> +    // is line contained within this fragment (lower boundary)
> +    if (frag->lineOffset < line
> +        // is line contained within this fragmen (upper boundary); if
> +        // lastNumCols is greater than 0, regard last line, too
> +        && line <= frag->lineOffset + frag->numLines + !!frag->lastNumCols
> +        // check whether column number is located within this fragment
> +        && (!frag->lastNumCols || line < frag->lineOffset + frag->numLines + 1
> +            // if it is the last line; in this case also take care of
> +            // fragments starting and ending on the same line
> +            || col <= frag->lastNumCols +
> +               (line == frag->lineOffset + 1
> +                ? frag->firstColOfs
> +                : 0
> +               )
> +           )
> +       ) {

The above is formatted pretty well for how complex it is.  I would
only suggest that when the predicate is so long, it is easier to see
the separation from the 'then' clause if the curly goes on its own
line.  That is, instead of this:

    if (long() &&
        predicate &&
        here()
       ) {
      do_stuff();
      more_stuff();
    }

write this:

    if (long() &&
        predicate &&
        here()
       )
    {
      do_stuff();
      more_stuff();
    }

> +  FOREACH_OBJARRAYSTACK_NC(FileFragment, fragments, iter) {
> +    FileFragment *frag = iter.data();
> +    if (line > globLine
> +        && (frag->isBeingProcessed()
> +            || line <= globLine + frag->numLines)) {

Again, put the curly on its own line.

> @@ -951,8 +931,19 @@ void testRoundTrip(SourceLoc loc)
>  {
>    char const *fname;
>    int line, col;

I would still rather you just did the verbose and clearer thing:

    int line;
    int col;

Multiple Declarators per Declaration is a bug in C and C++.

> +
> +  // DEBUG: quick check whether roundtrip is ok for an arbitrary location
> +  if (loc==10390) {
> +    cerr << __FUNCTION__ << ": loc(" << int(loc) << ") ";
> +  }

> +  // DEBUG: quick check whether roundtrip is ok for an arbitrary location
> +  if (loc==10390) {
> +    cerr << fname << ":" << line << ":" << col << endl;
> +  }

Nice!  Much clearer when debugging code is delimited clearly as such.

> +// calculates line information in one go. It also returns an alternative
> +// line offsets array for checking against correct calculation of
> +// line offsets.

This style is ok for short prose; 1) NO capitalizations, 2) sentences
SEPERATED by a SEMI-COLON and ONE space:

    // calculates line information in one go; it also returns an
    // alternative line offsets array for checking against correct
    // calculation of line offsets

This style is also ok and prefered for longer prose; 1) first word of
sentence CAPITALIZED, 2) sentences TERMINATED (not just separated) by
a PERIOD and TWO spaces (except last sentence needs no spaces after
it):

    // Calculates line information in one go.  It also returns an
    // alternative line offsets array for checking against correct
    // calculation of line offsets.

But please do not mix styles.  Further, in emacs please use M-q to
format comments just as you can with paragraphs in fundamental and
text mode.

> +  private:   // data
> +    // pointer to first fragment belonging to this file
> +    FileFragment *first;                     // (serf)
> +
> +    // most recently allocated FileFragment belonging to this file
> +    FileFragment *mraFragment;               // (serf)
> +  private:   // funcs
> +    FileInfo();                              // serial only
> +    FileInfo(FileInfo&);                     // disallowed
> +
> +  public:    // funcs
> +    FileInfo(char const *name);
> +    ~FileInfo();

Please put a blank line between the data members and the ctors/dtors
when the class is large enough to warrent blank lines between the data
members themselves.

For completeness, also put a blank line between the ctor/dtor section
and the methods, but you did that:

> +
> +    // first fragment belong to this file or 0 if none
> +    FileFragment *firstFragment() const { return first; }
> +
> +    FileFragment *&serializationOnly_first() { return first; }
> +    FileFragment *&serializationOnly_last() { return mraFragment; }
> +    bool serializationOnly_isComplete() { return !name.empty() && numChars>=0 && numLines>=0; }
> +    void serializationOnly_postProcess();
> +    static FileInfo *serializationOnly_new() { return new FileInfo; }
> +
> +    friend class FileFragment;
> +    friend class SourceLocManager;

In general it seems that we try to put friends in the ctor/dtor
section, above the first ctor, but this would be hard to discern from
our code and I think I need to fix some places where I violate it.