[dcc2] Multi file sends
Riley White
riley at algenta.com
Mon Apr 26 22:41:23 EDT 2004
On Apr 26, 2004, at 6:46 PM, <codemstr at ptdprolog.net> wrote:
> Well I've never heard of a strtok() exploit, I can't say that it
> doesn't exist, but I can't in my mind imagine how it would. Yes,
> people will try and find exploits in a token system as well. But, the
> parser will be far less complex and therefore should have less
> problems. I can think of tons of different things I'd try on a home
> built XML parser, and I'm sure some of them would indeed cause
> problems.
I believe that you are referring to my suggestion. You are certainly
correct to worry about security on a home-built XML parser, and I would
never recommend rolling your own full XML parser. What I would suggest
is forgetting that the format is XML and just thinking of it as a
string in a known format, failing if it strays from that format. As I
said, I've implemented this kind of thing before, and it's not
difficult. It's less forgiving than a full XML parsing library, since
anything straying from the expected format will cause the parse to
fail, but rejecting such input is all that really needs to happen.
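
To make the "string in a known format" idea concrete, here is a minimal
sketch in C. The message shape <file name="..." size="..."/> and the names
expect() and parse_file_msg() are made up for illustration; nothing like
this has been agreed for dcc2. The code matches the literal pieces one by
one and returns failure on any deviation, including extra whitespace,
reordered attributes, or trailing text.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Consume the literal `lit` at *p; return 1 and advance on a match, else 0. */
static int expect(const char **p, const char *lit)
{
    size_t n = strlen(lit);

    if (strncmp(*p, lit, n) != 0)
        return 0;
    *p += n;
    return 1;
}

/*
 * Recognize exactly:  <file name="NAME" size="DIGITS"/>
 * This shape is hypothetical, chosen only to illustrate the technique.
 * Returns 1 and fills name/size on success, 0 on any deviation.
 * The outputs are only meaningful when 1 is returned.
 */
int parse_file_msg(const char *s, char *name, size_t name_cap,
                   unsigned long *size)
{
    const char *p = s;
    size_t n = 0;
    unsigned long v = 0;

    if (!expect(&p, "<file name=\""))
        return 0;

    /* Name: everything up to the closing quote; markup characters and
       names too long for the caller's buffer are rejected outright. */
    while (*p != '\0' && *p != '"') {
        if (*p == '<' || *p == '>' || *p == '&' || n + 1 >= name_cap)
            return 0;
        name[n++] = *p++;
    }
    if (n == 0)
        return 0;
    name[n] = '\0';

    if (!expect(&p, "\" size=\""))
        return 0;

    /* Size: one or more decimal digits, rejecting overflow. */
    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p)) {
        unsigned long d = (unsigned long)(*p - '0');

        if (v > (ULONG_MAX - d) / 10)
            return 0;
        v = v * 10 + d;
        p++;
    }

    /* The element must close immediately and nothing may follow it. */
    if (!expect(&p, "\"/>") || *p != '\0')
        return 0;

    *size = v;
    return 1;
}
```

A caller passes the received line plus a buffer for the name, and drops
the message whenever the function returns 0. Anything this code accepts
is also well-formed for a real XML parser, but the reverse is
deliberately not true.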
> Those are just some. An XML library would of course deal with all of
> those. But if someone is writing their own XML parser (as was
> suggested), then they have to deal with all those things. With a token
> based system, you just need to deal with missing/extra parameters,
> missing/extra quotes, and multiple spaces (maybe a couple others I
> forgot, but that's the majority of them). And, many of them can
> already be dealt with using the existing strtok() function. The only
> one that would need to be hand coded would be quote handling. With
> XML, there could be hundreds of potential illegal inputs that people
> try.
If the parser were limited to recognizing a string in a particular
format which also happens to be recognizable by XML parsers, I don't
believe this would be such an issue. There are bound to be some people
who see XML and decide to write a generic parser, but I think that
experienced developers know better than to try that.
> And, as for using a library, libxml is a rather large library. If I
> remember, the rule of thumb is 1 bug per 1000 lines of code. Well the
> XML parsing libraries have several thousand lines of code. A token
> based parser might have 1-2 hundred. Just from the sheer size of the
> XML parsing libraries, you are bound to have more bugs.
libxml is quite mature, and I would not worry about depending on it.
Like anything else, we have to keep an eye out for exploits that are
found, but it's pointless to avoid using something simply because it
might contain errors that might be exploitable.
I'd like to go on record as saying that I am not interested in trying
to force developers to use parsing techniques that minimize the
possibility of errors. I believe that this discussion is going nowhere,
and I think that continuing in the direction we are headed will cause
problems. I agree that a tokenized string format is preferable if it
can encompass all the needed data without turning into spaghetti.
However, using XML certainly does not strike any kind of fear into my
heart, and I think it's preferable to overly complicated tokenized
strings.
I suggest that we begin with the assumption that we will use tokenized
strings and try to find problem situations that will arise. If we can
handle every problem situation in the context of tokenized strings,
then we should just stick with that. Otherwise we should consider using
XML. Maybe somebody can begin by suggesting a basic syntax for a
tokenized string solution, and the rest of us will try to find
situations where it won't work, and we'll go from there.
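
As a starting point for that exercise, here is a minimal sketch in C of
the kind of tokenizer being discussed: space-separated tokens, runs of
spaces collapsed, and double quotes around any token that contains
spaces. The function name tokenize() and the quoting rule are
assumptions for illustration, not a proposed dcc2 syntax. The quote
handling is done by hand rather than with strtok(), and the line is
rejected on a missing closing quote, a stray quote, or too many tokens.

```c
#include <stddef.h>

/*
 * Split `line` in place into tokens separated by one or more spaces.
 * A token that begins with '"' runs to the next '"' and may contain
 * spaces; the quotes themselves are not part of the token.
 * Stores up to `max` token pointers in `tok`.
 * Returns the number of tokens, or -1 if the line is malformed:
 * an unterminated quote, a quote in the middle of a bare token,
 * text glued to a closing quote, or more than `max` tokens.
 */
int tokenize(char *line, char **tok, int max)
{
    int n = 0;
    char *p = line;

    for (;;) {
        while (*p == ' ')               /* collapse runs of spaces */
            p++;
        if (*p == '\0')
            return n;
        if (n == max)
            return -1;                  /* extra parameters */

        if (*p == '"') {
            tok[n++] = ++p;             /* token starts after the quote */
            while (*p != '\0' && *p != '"')
                p++;
            if (*p == '\0')
                return -1;              /* missing closing quote */
            *p++ = '\0';
            if (*p != '\0' && *p != ' ')
                return -1;              /* text glued to closing quote */
        } else {
            tok[n++] = p;
            while (*p != '\0' && *p != ' ') {
                if (*p == '"')
                    return -1;          /* stray quote in a bare token */
                p++;
            }
            if (*p != '\0')
                *p++ = '\0';
        }
    }
}
```

The caller compares the returned count against the number of parameters
the command expects, which covers the missing-parameter case. Note that
this sketch has no way to put a double quote inside a token, which is
exactly the kind of problem situation worth raising against any
proposed syntax.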
--Riley