[dcc2] Multi file sends

codemstr at ptdprolog.net
Mon Apr 26 23:01:32 EDT 2004


Riley White <riley at algenta.com> said:

> On Apr 26, 2004, at 6:46 PM, <codemstr at ptdprolog.net> wrote:
> > Well, I've never heard of a strtok() exploit. I can't say that one
> > doesn't exist, but I can't imagine how it would work. Yes, people will
> > try to find exploits in a token system as well. But the parser will be
> > far less complex and therefore should have fewer problems. I can think
> > of tons of different things I'd try on a home-built XML parser, and I'm
> > sure some of them would indeed cause problems.
> 
> I believe that you are referring to my suggestion. You are certainly 
> correct to worry about security on a home-built XML parser, and I would 
> never recommend rolling your own full XML parser. What I would suggest 
> is forgetting that the format is XML and just thinking of it as a 
> string in a known format, failing if it strays from that format. Like I 
> said, I've implemented this kind of thing before, and it's not 
> difficult. It's not as robust as using a full XML parsing library since 
> anything straying from valid syntax will cause the code to fail, but 
> that's all that really needs to happen.

Yes, I agree. However, the spec then can't say it is XML, because it isn't.
Input that is valid XML will be rejected as invalid by the pseudo-XML-subset
parser. For example, if the parser requires that "the beginning and end tags
must be on the same line," then:
<name>
test
</name>

becomes invalid, even though as far as XML is concerned it is perfectly
valid. So you can't call it XML anymore. To the average person it is
obviously still XML-like, but a formal draft can't call something what it
isn't. That will lead to ambiguity.
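To make the ambiguity concrete, here is a minimal Python sketch of a strict same-line matcher of the kind described above, set against a real XML parser. The function name match_element and the exact rule (tag, value and closing tag on one line, no '<' or newline in the value) are hypothetical illustrations, not anything from a draft.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical rule for the pseudo-XML subset: opening tag, value and
# closing tag on one line; the value may not contain '<' or a newline.
_ELEMENT = re.compile(r"<(?P<tag>[A-Za-z]+)>(?P<value>[^<\n]*)</(?P=tag)>")


def match_element(line: str, tag: str) -> Optional[str]:
    """Return the element's value, or None if the input strays from the subset."""
    m = _ELEMENT.fullmatch(line)
    if m is None or m.group("tag") != tag:
        return None
    return m.group("value")


same_line = "<name>test</name>"
multi_line = "<name>\ntest\n</name>"

print(match_element(same_line, "name"))      # test
print(match_element(multi_line, "name"))     # None: rejected by the subset
print(repr(ET.fromstring(multi_line).text))  # '\ntest\n': a real XML parser accepts it
```

The same document is accepted by ElementTree and rejected by the subset matcher, which is exactly why a spec built on such a matcher could not honestly call its format XML.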

I like the idea someone suggested of using the same format as the single-file
transfers. Why implement two formats? Then the argument that "token parsers
can have bugs too" is moot, because the token parser has to be there anyway
to support single-file transfers. Whether or not a token parser is also used
for multi-file sends, its bugs will be present. Introducing XML parsing,
however, means there are now two parsing engines in which bugs can exist, not
just one.
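As a rough illustration of how small such a token parser can be, here is a minimal Python sketch covering the cases listed in the quoted message below: multiple spaces, quoted parameters, missing or extra quotes, and missing or extra parameters. In C, strtok() with a " " delimiter already skips runs of spaces but cannot keep a quoted filename containing spaces together, which is why quote handling would be hand-coded; this sketch simply does both by hand. The quoting rules (double quotes, no escape sequences) and the 'SEND ...' field layout in the example are assumptions for illustration, not the actual single-file format.

```python
from typing import List


class TokenError(ValueError):
    """Raised when a line strays from the expected token format."""


def tokenize(line: str) -> List[str]:
    """Split on runs of spaces; a double-quoted token may contain spaces.

    Assumed rules: no escape sequences inside quotes, and a quote may only
    open at the start of a token.
    """
    tokens = []
    i, n = 0, len(line)
    while i < n:
        if line[i] == " ":            # collapse multiple spaces, like strtok()
            i += 1
            continue
        if line[i] == '"':            # quoted token: read up to the closing quote
            end = line.find('"', i + 1)
            if end == -1:
                raise TokenError("missing closing quote")
            if end + 1 < n and line[end + 1] != " ":
                raise TokenError("extra text after closing quote")
            tokens.append(line[i + 1:end])
            i = end + 1
        else:                         # bare token: read up to the next space
            end = line.find(" ", i)
            if end == -1:
                end = n
            token = line[i:end]
            if '"' in token:
                raise TokenError("stray quote inside bare token")
            tokens.append(token)
            i = end
    return tokens


def parse_fields(line: str, count: int) -> List[str]:
    """Tokenize, then reject missing or extra parameters."""
    tokens = tokenize(line)
    if len(tokens) != count:
        raise TokenError(f"expected {count} parameters, got {len(tokens)}")
    return tokens


# Hypothetical line layout: command, filename, address, port, size.
print(parse_fields('SEND "my file.txt"  127.0.0.1 5000 1024', 5))
```

Every malformed case ends in a single TokenError, so a caller only has to decide what to do with one kind of failure.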

-- codemastr
> 
> > Those are just some. An XML library would of course deal with all of
> > those. But if someone is writing their own XML parser (as was
> > suggested), then they have to deal with all those things. With a
> > token-based system, you just need to deal with missing/extra
> > parameters, missing/extra quotes, and multiple spaces (maybe a couple
> > of others I forgot, but that's the majority of them). And many of them
> > can already be dealt with using the existing strtok() function. The
> > only one that would need to be hand-coded would be quote handling.
> > With XML, there could be hundreds of potential illegal inputs that
> > people try.
> 
> If the parser was limited to recognizing a string in a particular 
> format which also happens to be recognizable by XML parsers, I don't 
> believe this will be such an issue. There are bound to be some people 
> who see XML and decide to write a generic parser, but I think that 
> experienced developers know better than to try that.
> 
> > And as for using a library, libxml is a rather large library. If I
> > remember correctly, the rule of thumb is 1 bug per 1000 lines of code.
> > The XML parsing libraries have several thousand lines of code; a
> > token-based parser might have one or two hundred. Just from the sheer
> > size of the XML parsing libraries, you are bound to have more bugs.
> 
> libxml is quite mature, and I would not worry about depending on it. 
> Like anything else, we have to keep an eye out for exploits that are 
> found, but it's pointless to avoid using something simply because it 
> might contain errors that might be exploitable.
> 
> I'd just like to go on record as saying that I am not worried about
> trying to force developers to use parsing techniques that minimize the 
> possibility of errors. I believe that this discussion is going nowhere, 
> and I think that continuing in the direction we are headed will cause 
> problems. I agree that a tokenized string format is preferable if it 
> can encompass all the needed data without turning into spaghetti. 
> However, using XML certainly does not strike any kind of fear into my 
> heart, and I think it's preferable to overly complicated tokenized 
> strings.
> 
> I suggest that we begin with the assumption that we will use tokenized 
> strings and try to find problem situations that will arise. If we can 
> handle every problem situation in the context of tokenized strings, 
> then we should just stick with that. Otherwise we should consider using 
> XML. Maybe somebody can begin by suggesting a basic syntax for a 
> tokenized string solution, and the rest of us will try to find 
> situations where it won't work, and we'll go from there.
> 
> --Riley