Heh, you lucked out in your request -- I've done a bunch of grad work on statistical and predictive modeling, so I'm very familiar with the kinds of questions that need to be asked and answered for this kind of task. I might have been a number theorist, and then what would you have done? :-P
I remain curious as to how or whether a test like this could be used to compare a half dozen known pastiches to Holmesian canon for "consistency of Watsonian style", but that's clearly a different proposition than the authorship question Allen raises.
Yup. The first task is to figure out exactly what you mean by "consistency of Watsonian style" -- and once you know what you mean by it, then you can sit down and figure out how to measure it. (Which might turn out to be an iterative process, going back and forth between measure and question and test, until you have all three aligned.) This measure and test is something like "Is the pastiche more like Holmes canon or more like (second reference text) in how it uses prepositions and such?" Maybe you could have the second reference text be the pastiche-author's non-pastiche work? "Is The Peerless Peer more like Holmes canon or more like the other Wold Newton books with respect to the frequency with which [list of reference words] (and to a lesser extent, the linguistic structures that can be inferred from those words) are used in the text?" All the critiques I wrote above would still apply -- you'd have to pick your reference words so that exactly ten of them are more frequently used in Holmes canon and the other ten are more frequently used in Wold Newton canon, etc. -- but you could use this test pretty much as written for that, especially if you were doing it just for fun.
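For anyone who wants to play with the "more like Holmes canon or more like the second reference text?" question, here is a minimal sketch of the idea in Python. It is not the test from the paper under discussion, just a crude nearest-rate vote under my own assumptions: the function names (`rates_per_thousand`, `closer_to`), the tokenizer, and the "closest rate wins" rule are all made up for illustration, and you would still have to choose your reference words with the care described above.

```python
import re
from collections import Counter

def rates_per_thousand(text, reference_words):
    """Rate of each reference word per 1000 tokens of text (hypothetical helper)."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {w: 1000.0 * counts[w] / total for w in reference_words}

def closer_to(test_text, canon_text, other_text, reference_words):
    """For each reference word, vote for whichever corpus has the nearer rate."""
    test = rates_per_thousand(test_text, reference_words)
    canon = rates_per_thousand(canon_text, reference_words)
    other = rates_per_thousand(other_text, reference_words)
    votes = {"canon": 0, "other": 0, "tie": 0}
    for w in reference_words:
        d_canon = abs(test[w] - canon[w])
        d_other = abs(test[w] - other[w])
        if d_canon < d_other:
            votes["canon"] += 1
        elif d_other < d_canon:
            votes["other"] += 1
        else:
            votes["tie"] += 1
    return votes
```

You would call it as `closer_to(pastiche, holmes_canon, wold_newton, words)` with each argument being the full text as a string. A raw vote count like this says nothing about significance, so treat it as a toy for fun rather than evidence.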
Yes, exactly: there is some merit to the technique, but room to question the results. If I were designing a test from scratch for this stylometric question, I don't think I'd use this measure and test at all, but I have a lot more computational power at my disposal than Mosteller and Wallace had in 1963. (Not that I feel like putting in the work of actually designing the measure and test -- and assuming of course someone else hasn't already invented and published one!)
You're right, that might make an enjoyable LCSS presentation! Certainly would spark an animated discussion in the corridors, I would think.