Code-Formatter PB 10

Started by Theo Gottwald, January 18, 2011, 07:58:28 AM


Peter Weis

Paul,

I'll try it tomorrow, as promised!

regards Peter

Paul Elliott

Theo,

You're the one going off target of code. Check most of your "comments" when you
bullied me into posting my code.

As near as I can tell ( after months of testing ), my code is bug free.

When I commented about the problems with other versions, you kept yelling at me.
Too bad there is not a "system monitor" around to keep you honest.

Paul Elliott

By the way, how can a Macro confuse a reformatter program?
There is always the possibility of altering the spaces within a Macro to the point
that the compiler won't produce what was originally intended ( unless you bypass
formatting Macros ). But the formatter program isn't really doing anything with
the Macro definition nor with the macro within the regular code.


Peter Weis

Hi Paul,

I have now used the code formatter and found the following error!
Note that the #IF has been removed and some lines are now doubled.

This input:

TYPE FILE_NOTIFY_INFORMATIONX
    NextEntryOffset AS LONG
    Action          AS LONG
    FileNameLength  AS LONG
    #IF %DEF(%WIDE_UNICODE)
        Filename        AS WSTRINGZ * 32000
    #ELSE
        Filename        AS WSTRINGZ * 1000
    #ENDIF
END TYPE   

comes out as this:


TYPE FILE_NOTIFY_INFORMATIONX
    NextEntryOffset    AS LONG
    Action             AS LONG
    FileNameLength     AS LONG
    FileNameLength     AS LONG
    Filename           AS WSTRINGZ * 32000
    Filename           AS WSTRINGZ * 32000
    Filename           AS WSTRINGZ * 1000
    Filename           AS WSTRINGZ * 1000
END TYPE
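
A minimal sketch of one way to avoid this kind of damage, written in Python for brevity rather than PowerBASIC. It is not the code of any formatter posted in this thread. It assumes the formatter is handed only the lines between TYPE and END TYPE, that compiler directives can be recognised by a leading "#", and that members have the form "name AS type". The function name align_type_members is made up for illustration. Directives are passed through untouched and every input line is emitted exactly once, so nothing can be dropped or doubled. It does not reproduce the extra indent inside #IF/#ELSE.

```python
import re

def align_type_members(body_lines, indent="    "):
    """Align the AS column of TYPE members; leave '#' directives untouched.

    body_lines: the physical lines between TYPE and END TYPE (assumption).
    Returns exactly one output line per input line.
    """
    stripped = [line.strip() for line in body_lines]
    # Split "name AS type" once, case-insensitively.
    parts = [re.split(r"\s+AS\s+", s, maxsplit=1, flags=re.IGNORECASE)
             for s in stripped]

    def is_member(s, p):
        return len(p) == 2 and not s.startswith("#")

    # Widest member name decides where the AS column goes.
    width = max((len(p[0]) for s, p in zip(stripped, parts) if is_member(s, p)),
                default=0)

    out = []
    for s, p in zip(stripped, parts):
        if is_member(s, p):
            out.append(f"{indent}{p[0].ljust(width)} AS {p[1]}")
        else:
            # Directives, comments, blank lines: re-indent only.
            out.append(indent + s if s else "")
    return out

body = [
    "NextEntryOffset AS LONG",
    "Action AS LONG",
    "FileNameLength AS LONG",
    "#IF %DEF(%WIDE_UNICODE)",
    "Filename AS WSTRINGZ * 32000",
    "#ELSE",
    "Filename AS WSTRINGZ * 1000",
    "#ENDIF",
]
for line in align_type_members(body):
    print(line)
```

Keeping a strict one-line-in, one-line-out rule makes this class of bug easy to catch: if the output has a different number of lines than the input, something went wrong.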
                   


regards Peter

In short: We're not yet ready with this.

Larry Charlton

I'm not sure what the goals of the project are, but here's a valid source from hell.  A few things I've discovered in my own wanderings on this topic:

A) It's not possible to indent all source files correctly.  Consider the case where a single file is included at two different points, each of which has a different indent value.  It is possible to format source streams correctly.

B) It's not possible to correctly indent/format a single source file.  Consider structures or functions that start in one file and end in another, either by being included or by running off the end of an included file.

C) It's not possible to format code by looking at a physical line; instead a logical line needs to be evaluated.

D) Some formatting probably needs some human intervention.  Consider aligning Dim's, =, multiple single-line statements, one parameter per line, etc.  Some things humans look at and say, this would look best this way (maybe compact); others they look at and think, this would be better aligned.  Sometimes it has to do with the context or purpose of the statements.  So in the long run, it would be nice if automatic formatting could be applied and then a human could easily change localized formatting when browsing through the file.

E) Finally, not all whitespace is created equal.  It can be spaces, tabs, continuations, CRs, LFs, and in some cases missing.

As a side effect of these considerations I decided tokenizing the input stream was preferable to other approaches.

I've included one possible correctly indented example.  My formatter is not there yet either; I need to fix statements and routines that start with continuation lines, and AsmData blocks.  It would probably be useful to the project to maintain a sample of all valid quirks and then run the app against the quirks file.  The included file is not a sample of all oddities in the PB language, just some things it took me a while to get right.

Hope it helps, and keep up the great work.

Theo Gottwald

I guess we need somebody to collect the versions here and bring things together: a project master
for the Code-Formatter. Is anybody interested in taking on that hat?

With the right to test, collect code and always post the latest version?
If so, mail me ...

Paul Elliott

Larry C.,

I agree that it may not be possible to please everybody at all times.
Frankly I used your GDIPlus program ( combined with all its Includes ) as a major test
case for my version. I learned a LOT about the COM parts/formatting from your code.
But found that it needed ALL the code in 1 file to make any sense for formatting.
You might consider putting all that code into 1 file & running it thru the latest formatter
and seeing if there are any parts that need tweaking.

Also found a problem in another program I'm working on by using your code. I have to
flag Type/End Type lines because you had an entry of argb in a Type and a Function ARGB()
that was giving me fits.

Will check out this latest code.

Thanks.


Larry Charlton

Here's a quirks file with a few more quirks.  It's a single compilable source file (doesn't do anything).  Another source that surprised me early on was zxRef by Patrice.  Had a few interesting things in there I hadn't seen before.  Quirks2.bas is my attempt at formatting.  Was interesting to note how you handle Try Catch blocks differently than I do, might have to change the way I indent them :)

Paul Elliott

Larry,

Ouch! Your quirks files are problems for the program ( if Split Variables option checked ).

Not really sure how to handle continuation lines in a Type first line or the End Type/Union.
Nor continuation lines that are nothing but continuations. Formatting that crosses over
multiple continued lines ( especially when it breaks up keyword pairs ) is tricky.

I've attached the output from another program that shows continued lines as 1 line.
It does no formatting but does interpret #IF/#ENDIF and handles #Include files ( even
nested ones ). But it would be tricky to recreate all sources with formatting and make sure
that any #Includes have backups ( it would probably be easiest to create a new output
directory ). It already creates an output file with all sources and would just need to break
out the different files.

Haven't looked at zxRef yet.


Paul Elliott

Larry,

Most of the problems with your quirks programs are because the formatter hasn't found
the end of the Type structure. It doesn't handle splitting the END TYPE onto 2 lines.

How would you like them indented? Especially if you add 2 or 3 continuation lines between
them. It probably could be done by first combining all continuation lines ( & removing the
parts from consideration ) then doing the formatting. But that would throw off all the
regular lines that splitting doesn't mess up. Might be able to just do it for the Type & Union
structures. Then again it would have to process any code-pair that started with END.
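
A rough sketch of the "combine first, then format" idea, in Python rather than PowerBASIC, under simplifying assumptions: a physical line continues when it is a lone "_" or ends in " _", and underscores inside string literals or trailing comments are ignored. The function name logical_lines is made up for illustration. Each logical line remembers which physical lines it came from, so a formatter could still write the untouched lines back exactly as they were split.

```python
def logical_lines(physical):
    """Join continuation lines into logical lines.

    Returns a list of (first_index, last_index, text) tuples, where the
    indexes refer to positions in the 'physical' list.
    Assumption: a line continues if it is "_" or ends with " _".
    """
    result = []
    parts = []      # pieces of the logical line collected so far
    start = None    # index of the first physical line of this logical line
    for i, line in enumerate(physical):
        text = line.strip()
        if start is None:
            start = i
        if text == "_" or text.endswith(" _"):
            parts.append(text[:-1].strip())   # drop the trailing underscore
            continue
        parts.append(text)
        result.append((start, i, " ".join(p for p in parts if p)))
        parts, start = [], None
    if parts:
        # Source ended in the middle of a continuation; keep what we have.
        result.append((start, len(physical) - 1,
                       " ".join(p for p in parts if p)))
    return result

source = [
    "TYPE Foo _",
    "_",
    "  BYTE",
    "a AS LONG",
    "END _",
    "TYPE",
]
for first, last, text in logical_lines(source):
    print(first, last, text)
```

With this, a split END TYPE shows up as the single logical line "END TYPE" spanning physical lines 4 to 5, so the block-end search can run on the joined text while the output stage decides separately how to re-split it.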

This needs way more thought.

Didn't see anything really strange after formatting zXRef.  Was there something you saw
that I missed?


Larry Charlton

Probably not in your code, but there were two things that were unusual after I'd run through a lot of other source code without problems.  I think one of them was:
IF nRemaining = 0 GOTO BufEnd
Hadn't seen that in a loooong time, don't recall the other, but it's what started me looking at the language spec instead of just opc.

Really the line continuation thing is weird.  I think as long as parameters are working in 99.999% of the code you won't see any of those.  I just opted to try and understand the language, block structures, and indenting and implement based on variations I could think of.  It's when I realized there were impossible indenting scenarios.  I think what you have is a valid formatter, might be just a matter of noting things like: x or y not implemented.  Someone else could always do them later if they felt the need.

Anyway the rule I used was: the first continuation _ adds an additional indent to subsequent lines.  When you reach a line without a continuation _ and you previously had one, remove an indent.  Note I said adds, because if you're doing a block sub for example, I also add an indent after emitting the sub if it's not a single line definition (for the body).  The net result is that single line constructs get indented once, block continuations such as parameters get two indents and then pop back out an indent when the body starts up, and then again at the end of the sub, if that made any sense.
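
The rule above can be sketched roughly as follows. This is an illustration in Python, not Larry's actual formatter. It assumes that a continuation is a line ending in " _", that only SUB/FUNCTION blocks exist, and that there are no single-line definitions; the names indent_lines, OPENERS and CLOSERS are made up for the example.

```python
INDENT = "    "
OPENERS = ("SUB ", "FUNCTION ")        # hypothetical, deliberately tiny subset
CLOSERS = ("END SUB", "END FUNCTION")

def continues(text):
    """Assumption: a line is continued if it is '_' or ends with ' _'."""
    return text == "_" or text.endswith(" _")

def indent_lines(lines):
    level = 0          # current indent level
    in_cont = False    # True if the previous line ended with a continuation
    out = []
    for raw in lines:
        text = raw.strip()
        upper = text.upper()
        # A block closer at the start of a statement pops the body indent.
        if not in_cont and upper.startswith(CLOSERS):
            level = max(level - 1, 0)
        out.append(INDENT * level + text if text else "")
        # A block opener adds an indent for the body after it is emitted.
        if not in_cont and upper.startswith(OPENERS):
            level += 1
        cont = continues(text)
        if cont and not in_cont:
            level += 1                 # first continuation: add an indent
        elif in_cont and not cont:
            level = max(level - 1, 0)  # continuation run ended: remove it
        in_cont = cont
    return out

sample = [
    "SUB Foo( _",
    "a AS LONG, _",
    "b AS LONG)",
    "x = 1 + _",
    "2",
    "END SUB",
]
for line in indent_lines(sample):
    print(line)
```

On the sample, the parameter lines end up two levels deep, the body one level deep, and the continued part of the assignment one level deeper than the statement it belongs to, which matches the description above.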

Paul Elliott

Ok, I think I understood that about the continuation lines but am not sure.
Kinda busy figuring out some other problem at the moment tho.

Does it currently work the way you want? Except for splitting keyword pairs, I mean.
Personally I don't want to handle keyword pairs on different lines. Just looks too weird
and raises questions of how to output it. You haven't said how you want it output yet.
Should I ALWAYS combine split lines with keyword pairs? That will involve extra checking.

I definitely need to work on those single-line functions/subs. Need to check on having
beginning & end pairs without extra routines within them. Then I also need to allow for
multiple single-line routines on a single line followed by the beginning of another routine
with the end on a subsequent line.

Also need to do something about the single-line IF with GOTO without THEN.
Got the same problem in another program I'm working on.


Larry Charlton

I'm not sure what you mean.  So far it's working as expected except for when things start with continuations.

Currently the indenting isn't messing with formatting.  I treat split keywords like anything else with a continuation.  i.e. the second word will be indented on the next line.

Paul Elliott

Larry,

For my version, that 2nd part is what causes problems. There are a couple of special routines
that work on blocks of code ( types & unions & variable definitions & macros are some ).
Once the program starts processing a block it needs to find the end of the block so that
it can go back to normal formatting. But it needs the ending keywords to be on the same
line. Parts of the program work with the original method of breaking a line into individual
pieces and other parts work on the line via instr or mid. I use the method that makes it
easiest to do the work.


Theo Gottwald

Peter told me yesterday that there are things that do not yet work. Maybe try his Library code from the other post. He said that after formatting, the code would not be usable anymore.

To me that means: "There is still quite a lot left to do".