[dcc2] Multi file sends
codemstr at ptdprolog.net
Mon Apr 26 23:01:32 EDT 2004
Riley White <riley at algenta.com> said:
> On Apr 26, 2004, at 6:46 PM, <codemstr at ptdprolog.net> wrote:
> > Well I've never heard of a strtok() exploit, I can't say that it doesn't
> > exist, but I can't in my mind imagine how it would. Yes, people will try
> > and find exploits in a token system as well. But, the parser will be far
> > less complex and therefore should have less problems. I can think of
> > tons of different things I'd try on a home built XML parser, and I'm
> > sure some of them would indeed cause problems.
>
> I believe that you are referring to my suggestion. You are certainly
> correct to worry about security on a home-built XML parser, and I would
> never recommend rolling your own full XML parser. What I would suggest
> is forgetting that the format is XML and just thinking of it as a
> string in a known format, failing if it strays from that format. Like I
> said, I've implemented this kind of thing before, and it's not
> difficult. It's not as robust as using a full XML parsing library since
> anything straying from valid syntax will cause the code to fail, but
> that's all that really needs to happen.
Yes, I agree. However, the spec can't call the format XML then, because it
isn't. Input that is valid XML will be rejected by the pseudo-XML-subset
parser. For example, if the parser requires that the beginning and end tags
be on the same line, then:
<name>
test
</name>
becomes invalid, even though as far as XML is concerned it is perfectly
valid. So you can't call it XML anymore. To the average person it is
obviously still XML-like, but a formal draft can't call the format something
it isn't. That will lead to ambiguity.
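
To make the mismatch concrete, here is a rough sketch in C of the kind of
fixed-format matcher being discussed. The function name parse_name_line()
and the one-tag-per-line rule are assumptions made up for illustration,
not anything from a draft. It accepts <name>test</name> on one line and
rejects the three-line form above, which a real XML parser would accept:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: accept exactly "<name>VALUE</name>" on one line.
 * Returns 0 and copies VALUE into out on success, -1 on any deviation. */
int parse_name_line(const char *line, char *out, size_t outlen)
{
    static const char open_tag[] = "<name>";
    static const char close_tag[] = "</name>";
    const size_t open_len = sizeof(open_tag) - 1;
    const size_t close_len = sizeof(close_tag) - 1;
    size_t len = strlen(line);
    size_t value_len;

    if (len < open_len + close_len)
        return -1;
    if (strncmp(line, open_tag, open_len) != 0)
        return -1;
    if (strcmp(line + len - close_len, close_tag) != 0)
        return -1;

    value_len = len - open_len - close_len;
    if (value_len == 0 || value_len >= outlen)
        return -1;

    /* The value must not contain markup characters or line breaks. */
    if (strcspn(line + open_len, "<>&\r\n") != value_len)
        return -1;

    memcpy(out, line + open_len, value_len);
    out[value_len] = '\0';
    return 0;
}
```

Anything this function rejects is simply an error to the client, whether or
not it is well-formed XML, which is exactly why the format could not be
called XML in the spec.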
I like the idea someone suggested of using the same format as the single-file
transfers. Why implement 2 formats? That also voids the argument that "token
parsers can have bugs too", because the token parser has to be there anyway
to support single-file transfers. So regardless of whether a token parser is
used for multi-file, any bugs in it will be present. Introducing XML parsing,
however, means there are now two parsing engines in which bugs can exist,
not just one.
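
For a sense of scale, here is a rough sketch in C of a token splitter with
quote handling, the one piece strtok() does not do for you. The function
name split_tokens() and the sample line are made up for illustration and
are not taken from any draft. It scans by hand instead of calling strtok(),
splits the buffer in place, collapses runs of spaces, and fails on a
missing closing quote, a stray quote, or too many tokens:

```c
#include <stddef.h>

/* Hypothetical helper: split line in place into at most max_tokens tokens.
 * Tokens are separated by one or more spaces; a token may be wrapped in
 * double quotes to include spaces. Returns the token count, or -1 on
 * malformed input (unterminated or stray quote, too many tokens). */
int split_tokens(char *line, char **tokens, int max_tokens)
{
    int count = 0;
    char *p = line;

    for (;;) {
        while (*p == ' ')               /* collapse multiple spaces */
            p++;
        if (*p == '\0')
            break;
        if (count == max_tokens)
            return -1;                  /* extra parameters */

        if (*p == '"') {
            p++;
            tokens[count++] = p;
            while (*p != '\0' && *p != '"')
                p++;
            if (*p == '\0')
                return -1;              /* missing closing quote */
            *p++ = '\0';
            if (*p != '\0' && *p != ' ')
                return -1;              /* junk right after closing quote */
        } else {
            tokens[count++] = p;
            while (*p != '\0' && *p != ' ') {
                if (*p == '"')
                    return -1;          /* stray quote inside a bare token */
                p++;
            }
            if (*p == ' ')
                *p++ = '\0';
        }
    }
    return count;
}
```

Given a writable buffer holding SEND "my file.txt" 1024, this yields the
three tokens SEND, my file.txt and 1024. Missing parameters are then caught
by the caller checking the returned count against what the command expects.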
-- codemastr
>
> > Those are just some. An XML library would of course deal with all of
> > those. But if someone is writing their own XML parser (as was
> > suggested), then they have to deal with all those things. With a token
> > based system, you just need to deal with missing/extra parameters,
> > missing/extra quotes, and multiple spaces (maybe a couple others I
> > forgot, but that's the majority of them). And, many of them can already
> > be dealt with using the existing strtok() function. The only one that
> > would need to be hand coded would be quote handling. With XML, there
> > could be hundreds of potential illegal inputs that people try.
>
> If the parser was limited to recognizing a string in a particular
> format which also happens to be recognizable by XML parsers, I don't
> believe this will be such an issue. There are bound to be some people
> who see XML and decide to write a generic parser, but I think that
> experienced developers know better than to try that.
>
> > And, as for using a library, libxml is a rather large library. If I
> > remember, the rule of thumb is 1 bug per 1000 lines of code. Well the
> > XML parsing libraries have several thousand lines of code. A token
> > based parser might have 1-2 hundred. Just from the sheer size of the
> > XML parsing libraries, you are bound to have more bugs.
>
> libxml is quite mature, and I would not worry about depending on it.
> Like anything else, we have to keep an eye out for exploits that are
> found, but it's pointless to avoid using something simply because it
> might contain errors that might be exploitable.
>
> I'd like to just go on record as saying that I am not worried about
> trying to force developers to use parsing techniques that minimize the
> possibility of errors. I believe that this discussion is going nowhere,
> and I think that continuing in the direction we are headed will cause
> problems. I agree that a tokenized string format is preferable if it
> can encompass all the needed data without turning into spaghetti.
> However, using XML certainly does not strike any kind of fear into my
> heart, and I think it's preferable to overly complicated tokenized
> strings.
>
> I suggest that we begin with the assumption that we will use tokenized
> strings and try to find problem situations that will arise. If we can
> handle every problem situation in the context of tokenized strings,
> then we should just stick with that. Otherwise we should consider using
> XML. Maybe somebody can begin by suggesting a basic syntax for a
> tokenized string solution, and the rest of us will try to find
> situations where it won't work, and we'll go from there.
>
> --Riley