Greedy versus non-greedy repeat (was: Minor regexp oversight for
	setext headings)
    Allan Odgaard 
    29mtuz102 at sneakemail.com
       
    Mon Oct  9 12:44:01 EDT 2006
    
    
  
Greedy matching is generally more efficient (so it’s a good idea to  
use it when possible), but the reason it is faster is because of  
changed semantics.
Take this string:
    foo...«lots of text»...bar...«lots more text»...end
If we match it against ‘foo.*bar’ the regexp engine will first match  
‘foo’, then it will match to the end of the string and look for ‘bar’  
there, if there is no match, it will backtrack (i.e. going back one  
character), it will continue until it finds ‘bar’.
So if the string is 1,000 characters long and ‘bar’ is in the middle,  
we make around 1,500 compares.
If instead we search for ‘foo.*?bar’ it is non-greedy, basically  
meaning it will let ‘.*?’ match as few characters as possible (0)  
then look for ‘bar’, if that didn’t work, it will match another  
character, etc. until it finds ‘bar’. So here we only match around  
500 characters.
The problem is of course if the string contains ‘bar’ in two  
locations, then the two patterns will match a sequence of different  
length.
So the OP is right in that non-greedy matching is generally faster  
and should be preferred, but the user should also know why, otherwise  
he might get a wrong match -- coincidentally a common newbie mistake  
is actually the opposite, like doing: ‘<.*>’ to match the contents of  
a tag -- this should be ‘<.*?>’ (or <[^>]*>) to handle the case where  
the line contains multiple tags -- and coincidentally this was also  
sort of the problem with the Markdown heading pattern (‘.*’ matching  
too much).
On 9. Oct 2006, at 16:07, Angie Ahl wrote:
> Oi.
>
> All due respect,
>
> If you're going to correct/flame people publicly at least have to  
> politeness to point out/qualify what you're saying eg chapter or  
> page number otherwise you come across as an ignorant troll with no  
> social skills.
>
> Unless of course you are an ignorant troll in which case please  
> disregard this message. I was under the impression you weren't but  
> I could be wrong.
>
> And I've read the book about 5 times by now so I'd really  
> appreciate you qualifying what you say to the list for the benefit  
> of the list as I'm pretty sure you're mistaken.
>
> Angie
>
> On 7 Oct 2006, at 23:21, A. Pagaltzis wrote:
>
>> * Angie Ahl <alists at vertebrate.co.uk> [2006-10-07 22:35]:
>>> Not sure that it would but I've read in several places that non
>>> greedy should be used whereever it can as it's more efficient
>>> for the regex parser and processor. Don't have any benchmarks
>>> to back this up but I'm sure it says so in the mastering regex
>>> book.
>>
>> You are mistaken on all counts. Please don’t parrot silly cargo
>> cult advice. Read the book you mention instead of assuming what
>> it says.
>>
>> Regards,
>> -- 
>> Aristotle Pagaltzis // <http://plasmasturm.org/>
    
    
More information about the Markdown-Discuss
mailing list