[dcc2] Multi file sends

Riley White riley at algenta.com
Mon Apr 26 22:41:23 EDT 2004


On Apr 26, 2004, at 6:46 PM, <codemstr at ptdprolog.net> wrote:
> Well I've never heard of a strtok() exploit, I can't say that it
> doesn't exist, but I can't in my mind imagine how it would. Yes,
> people will try and find exploits in a token system as well. But, the
> parser will be far less complex and therefore should have less
> problems. I can think of tons of different things I'd try on a home
> built XML parser, and I'm sure some of them would indeed cause
> problems.

I believe that you are referring to my suggestion. You are certainly 
correct to worry about security on a home-built XML parser, and I would 
never recommend rolling your own full XML parser. What I would suggest 
is forgetting that the format is XML and just thinking of it as a 
string in a known format, failing if it strays from that format. Like I 
said, I've implemented this kind of thing before, and it's not 
difficult. It's not as robust as using a full XML parsing library since 
anything straying from valid syntax will cause the code to fail, but 
that's all that really needs to happen.
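
As a rough illustration of what I mean by a string in a known format, 
here is a minimal C sketch. The <send name="..." size="..."/> layout 
and the names parse_send() and expect() are made up for this example 
and are not taken from any DCC2 draft. The point is only that the code 
matches literal text and rejects anything else; it never tries to 
understand XML in general.

```c
#include <stddef.h>
#include <string.h>

/* Consume the literal `lit` at *p; return 1 and advance on match, else 0. */
static int expect(const char **p, const char *lit)
{
    size_t n = strlen(lit);
    if (strncmp(*p, lit, n) != 0)
        return 0;
    *p += n;
    return 1;
}

/*
 * Recognize exactly: <send name="NAME" size="DIGITS"/>
 * (a hypothetical message layout, assumed for this sketch only).
 * Returns 1 and fills name/size on success, 0 on any deviation.
 * On failure the contents of `name` are unspecified.
 */
int parse_send(const char *msg, char *name, size_t name_cap, unsigned long *size)
{
    const char *p = msg;
    size_t len = 0;
    unsigned long value = 0;
    int digits = 0;

    if (name_cap == 0 || !expect(&p, "<send name=\""))
        return 0;

    /* Copy the name up to the closing quote; no entities or markup allowed. */
    while (*p != '"') {
        if (*p == '\0' || *p == '<' || *p == '>' || *p == '&')
            return 0;
        if (len + 1 >= name_cap)
            return 0;               /* name too long for the caller's buffer */
        name[len++] = *p++;
    }
    if (len == 0)
        return 0;
    name[len] = '\0';

    if (!expect(&p, "\" size=\""))
        return 0;

    /* Accept 1 to 9 decimal digits so the value cannot overflow. */
    while (*p >= '0' && *p <= '9') {
        if (++digits > 9)
            return 0;
        value = value * 10 + (unsigned long)(*p - '0');
        p++;
    }
    if (digits == 0)
        return 0;

    /* The closing literal must be followed by the end of the string. */
    if (!expect(&p, "\"/>") || *p != '\0')
        return 0;

    *size = value;
    return 1;
}
```

A doubled space, a reordered attribute, or trailing junk all make 
parse_send() return 0, which is the "fail if it strays" behavior I 
described. A real XML library would accept some of those variations; 
this sketch deliberately does not.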

> Those are just some. An XML library would of course deal with all of
> those. But if someone is writing their own XML parser (as was
> suggested), then they have to deal with all those things. With a
> token based system, you just need to deal with missing/extra
> parameters, missing/extra quotes, and multiple spaces (maybe a couple
> others I forgot, but that's the majority of them). And, many of them
> can already be dealt with using the existing strtok() function. The
> only one that would need to be hand coded would be quote handling.
> With XML, there could be hundreds of potential illegal inputs that
> people try.

If the parser were limited to recognizing a string in one particular 
format, which also happens to be recognizable by XML parsers, I don't 
believe this would be much of an issue. There are bound to be some 
people who see XML and decide to write a generic parser, but I think 
that experienced developers know better than to try that.

> And, as for using a library, libxml is a rather large library. If I
> remember, the rule of thumb is 1 bug per 1000 lines of code. Well the
> XML parsing libraries have several thousand lines of code. A token
> based parser might have 1-2 hundred. Just from the sheer size of the
> XML parsing libraries, you are bound to have more bugs.

libxml is quite mature, and I would not worry about depending on it. 
Like anything else, we have to keep an eye out for exploits that are 
found, but it's pointless to avoid using something simply because it 
might contain errors that might be exploitable.

I'd like to go on record as saying that I am not interested in trying 
to force developers to use parsing techniques that minimize the 
possibility of errors. I believe that this discussion is going nowhere, 
and I think that continuing in the direction we are headed will cause 
problems. I agree that a tokenized string format is preferable if it 
can encompass all the needed data without turning into spaghetti. 
However, using XML certainly does not strike any kind of fear into my 
heart, and I think it's preferable to overly complicated tokenized 
strings.

I suggest that we begin with the assumption that we will use tokenized 
strings and try to find problem situations that will arise. If we can 
handle every problem situation in the context of tokenized strings, 
then we should just stick with that. Otherwise we should consider using 
XML. Maybe somebody can begin by suggesting a basic syntax for a 
tokenized string solution, and the rest of us will try to find 
situations where it won't work, and we'll go from there.
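
To give people something concrete to poke holes in, here is a minimal 
C sketch of a tokenizer for one possible syntax. Everything about the 
syntax is an assumption on my part, not an agreed format: tokens are 
separated by one or more spaces, a token that starts with a double 
quote runs to the next double quote, and there are no escape sequences. 
The function name tokenize() is likewise made up. It does the quote 
handling by hand rather than through strtok(), since that is the part 
strtok() cannot do.

```c
/*
 * Split `line` in place into at most `max` tokens.
 * Tokens are separated by one or more spaces; a token that starts with '"'
 * runs to the next '"' and may contain spaces. There are no escape sequences
 * (an assumption of this sketch). Returns the token count, or -1 on
 * malformed input. The buffer is modified even when -1 is returned.
 */
int tokenize(char *line, char *tokens[], int max)
{
    char *p = line;
    int count = 0;

    for (;;) {
        while (*p == ' ')           /* collapse runs of spaces */
            p++;
        if (*p == '\0')
            return count;
        if (count == max)
            return -1;              /* extra parameters */

        if (*p == '"') {
            tokens[count++] = ++p;  /* token starts after the opening quote */
            while (*p != '"') {
                if (*p == '\0')
                    return -1;      /* missing closing quote */
                p++;
            }
            *p++ = '\0';            /* overwrite the closing quote */
            if (*p != ' ' && *p != '\0')
                return -1;          /* text glued to a closing quote */
        } else {
            tokens[count++] = p;
            while (*p != ' ' && *p != '\0') {
                if (*p == '"')
                    return -1;      /* stray quote inside a bare token */
                p++;
            }
            if (*p == ' ')
                *p++ = '\0';        /* terminate the token in place */
        }
    }
}
```

The caller compares the returned count against the number of 
parameters the command expects, which covers the missing-parameter 
case. A file name that itself contains a double quote cannot be 
expressed in this syntax, so that is one problem situation already.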

--Riley


