RE: sed and matching braces in the source string

9 Aug 2012

      ...
Warning: Simplest method presented last.
On 09.08.12 14:13, Trent W. Buck wrote:
...
Jason White wrote:
...
James Harper <james.harper@bendigoit.com.au> wrote:
...
The problem is that my sed script says to start at the "(" and then
read up until a ")", but I really mean to say read up until a
matching ")". Can I do this with sed or should I be using something
else?
...
...
Thus you cannot use a regular expression, but you can use a parser
that accepts some subset of CFGs, such as LALR (e.g. yacc) or LL(k)
(e.g. parsec).
Just modelling the mechanics mentally, it seems that it might be quicker to
implement with just a lexer (e.g. lex) to pick out the open and closing braces,
then increment/decrement a counter (in the lexer) till a nesting level match
is detected, signalling the end of the token.
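The counter idea above can be sketched roughly as follows. This is a minimal Python stand-in for the lexer described, not lex code; the function name matching_paren_end and the trailing "NOT NULL" in the sample line are made up for illustration.

```python
def matching_paren_end(text, start):
    """Return the index of the ")" that closes the "(" at text[start]."""
    if text[start] != "(":
        raise ValueError("start must point at an opening parenthesis")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1          # one level deeper
        elif ch == ")":
            depth -= 1          # one level back out
            if depth == 0:      # nesting level matches: end of token
                return i
    raise ValueError("unbalanced parentheses")


# Hypothetical sample line; only the bracketed expression comes from the thread.
line = "DEFAULT ((newid(Ooh)de)elephants!) NOT NULL"
start = line.index("(")
end = matching_paren_end(line, start)
token = line[start:end + 1]
print(token)
```

The scan is a single pass with one integer of state, which is why a lexer action is enough and no grammar is needed.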
Hacking the other text rearrangements in C, either in the lexer or in a
grammar, can be a bit clumsy, though that is moderated by suitable lexer
regexes, possibly with the help of lexer states.
However, if the text really is as presented, then KISS ought to do it.
In awk, the line is by default seen as space-separated fields, so
newid() or (newid()) or ((newid(Ooh)de)elephants!) is always detected as
one field, making braces nesting irrelevant.
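To illustrate the field-splitting point, here is a small Python sketch. It assumes str.split() with no argument is a fair approximation of awk's default field splitting (both split on runs of whitespace); the helper name fields is hypothetical.

```python
def fields(line):
    # No-argument split() breaks on runs of whitespace and ignores
    # leading/trailing whitespace, roughly like awk's default FS.
    return line.split()


for expr in ("newid()", "(newid())", "((newid(Ooh)de)elephants!)"):
    # Each expression stays in one field, however deeply it nests.
    print(fields("DEFAULT " + expr + " "))
```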
If at some stage, input text with random spaces, e.g. "(newid ( ) )" is
encountered, then it is a simple matter to add a line or two of prefiltering to
the awk script, to effect repair. These operations on the line would cause the
fields to be automatically re-evaluated, allowing the rearrangements to then
be made on complete fields. (As I see it, repair merely involves detection of
/ +[)(]/ , and elision of the spaces matched by / +/ .)
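A rough Python sketch of that prefilter, under one added assumption: applied to the whole line, the pattern would also swallow the space between DEFAULT and the first "(", so this version only repairs the text after the first opening parenthesis. The helper name repair is hypothetical.

```python
import re


def repair(line):
    # Split at the first "(" so the space before it is left alone
    # (an assumption added here, not part of the suggestion above).
    head, paren, tail = line.partition("(")
    if not paren:
        return line  # no parenthesis, nothing to repair
    # Elide runs of spaces that immediately precede "(" or ")".
    return head + paren + re.sub(r" +(?=[()])", "", tail)


fixed = repair("DEFAULT (newid ( ) ) ")
print(repr(fixed))
print(fixed.split())   # fields are re-evaluated on the repaired line
```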
And if at some stage, arithmetic expressions with spaces crop up, then
detecting something along the lines of / +[)(0-9*/+-]/ might cover that use
case, still without having to delve into grammars or even lexer gymnastics.
The problem looks like fun. :-)
I've solved it for now within the limited scope of my problem, which covers these two cases:

"DEFAULT (newid()) "
"DEFAULT ('some string') "

The first never has any spaces, so I can do " *\([^ ]*\) *"

The second never has any nested braces, so I can do " *\(('[^']*')\) *" (or something like that)
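For reference, the two cases can be tried out like this. It is only a sketch in Python's regex syntax rather than sed's: the leading DEFAULT anchor is an assumption (the full sed command is not shown here), the quoted-string pattern is written with [^']* so it can span more than one character, and the names QUOTED, NO_SPACES and default_expr are made up.

```python
import re

# Case 2: a single-quoted string in parentheses, no nested parentheses.
QUOTED = re.compile(r"DEFAULT *(\('[^']*'\)) *")
# Case 1: an expression containing no spaces at all.
NO_SPACES = re.compile(r"DEFAULT *([^ ]*) *")


def default_expr(line):
    # Try the stricter quoted form first, then the no-spaces form.
    m = QUOTED.match(line) or NO_SPACES.match(line)
    return m.group(1) if m else None


print(default_expr("DEFAULT (newid()) "))
print(default_expr("DEFAULT ('some string') "))
```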

So it's working now and will get me through this conversion, even if it's a bit fragile for general use.

Thanks for the suggestions!

James