Hi Jim,
Your project interests me. Allow me to reflect a bit on the notion of "operationalizing" Lonergan's work on cognition and the functional specialties. To begin with, I'm not sure what you mean by "providing validity and reliability to the structure," and I hope that you will correct me if I have missed the boat.
What I mean by "operationalize" is to create a method for measuring a variable. By "measuring" I mean assigning a case to a category according to a rule. So to operationalize a variable, I have to specify the categories as well as the rule or rules for assigning cases to one or another of the categories. The set of categories I use in operationalizing any given variable may correspond to a ratio, interval, ordinal, or nominal scale (basic social science stats). My method of measurement of any variable must be evaluated, as you say, in terms of reliability (easier task) and validity (much harder task). My method of measurement can be described as a "structure," and it is in this sense that I interpret what you mean by the reliability and validity of the structure.
Lonergan's eight functional specialties constitute a set of categories, and it would be possible to assign instances of theological inquiry to one or another of these categories, taking the eight categories as constituting a nominal scale. Lonergan's descriptions of these specialties, and the further elaboration of these descriptions by followers of Lonergan, provide the rules for assigning cases to categories. The reliability of these assignments would probably have to be assessed by inter-scorer reliability, a high level of agreement between independent scorers in the way they assign a set of theological inquiries to these categories. This kind of reliability might be hard to achieve, especially if the individuals or groups who had conducted the inquiries had not thought of themselves as working within one of Lonergan's functional specialties. The judgment that a set of theological inquiries had been validly assigned to these categories would be contingent upon the judge's conviction that the categories themselves are a valid way of dividing up theological inquiries. There is no question that theological inquiries differ, but there does not appear to be general consensus among theologians that Lonergan's set of categories provides a valid "ruler" for describing the variation in theological inquiries. This means that only those who accept Lonergan's division of theology into these functional specialties would accept using them as a valid nominal scale.
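To make the idea of inter-scorer reliability on a nominal scale concrete, here is a minimal sketch in Python. It assumes just two independent scorers and uses Cohen's kappa, one common chance-corrected agreement statistic (my choice for illustration, not something Lonergan's work prescribes). The function name and the sample assignments are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on a nominal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of cases.")
    n = len(rater_a)
    # Proportion of cases on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters put every case in the same single category
    return (observed - expected) / (1 - expected)


# Hypothetical assignments of ten theological inquiries by two scorers.
scorer_1 = ["research", "interpretation", "history", "dialectic", "foundations",
            "doctrines", "systematics", "communications", "history", "research"]
scorer_2 = ["research", "interpretation", "history", "foundations", "foundations",
            "doctrines", "systematics", "communications", "interpretation", "research"]

print(round(cohens_kappa(scorer_1, scorer_2), 3))
```

A kappa near 1 indicates agreement well beyond chance, and a kappa near 0 indicates agreement no better than chance. With more than two scorers, a different statistic would be needed.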
There is another way of operationalizing the functional specialties. Each of the eight could be considered to be a dimension along which a theological inquiry could be measured, analogous to the way height and weight are two dimensions along which a person's body is frequently measured. This would most likely involve developing ordinal scales for each functional specialty. The result would be that each measured theological inquiry would receive a score on each specialty. If the scores were assigned ordinal numbers, say 1 to 5, a given study might be a 3 on research, a 3 on interpretation, a 2 on history, a 1 on dialectic, a 4 on foundations, etc. Again, there would probably be a need to assess inter-scorer reliability, and validity would still depend upon a judge's degree of acceptance of Lonergan's categories as pointing to the set of ways that theological inquiries can vary.
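The dimensional approach can be sketched the same way. The Python below is only an illustration under stated assumptions: each inquiry gets a 1-to-5 score on each of the eight specialties, and two scorers are compared specialty by specialty, counting scores as agreeing when they differ by no more than a chosen tolerance. The helper names and all scores beyond the five in the example above are hypothetical.

```python
SPECIALTIES = ["research", "interpretation", "history", "dialectic",
               "foundations", "doctrines", "systematics", "communications"]


def make_profile(*scores, low=1, high=5):
    """Build one inquiry's profile: an ordinal score for each specialty."""
    if len(scores) != len(SPECIALTIES):
        raise ValueError("Need exactly one score per functional specialty.")
    if any(not (low <= s <= high) for s in scores):
        raise ValueError(f"Scores must lie between {low} and {high}.")
    return dict(zip(SPECIALTIES, scores))


def agreement_by_specialty(profiles_a, profiles_b, tolerance=0):
    """Share of inquiries, per specialty, where two scorers differ by <= tolerance."""
    if len(profiles_a) != len(profiles_b) or not profiles_a:
        raise ValueError("Both scorers must rate the same non-empty set of inquiries.")
    n = len(profiles_a)
    return {
        s: sum(abs(a[s] - b[s]) <= tolerance
               for a, b in zip(profiles_a, profiles_b)) / n
        for s in SPECIALTIES
    }


# Hypothetical scores for two inquiries from two independent scorers.
scorer_a = [make_profile(3, 3, 2, 1, 4, 2, 3, 5), make_profile(1, 2, 3, 4, 5, 1, 2, 3)]
scorer_b = [make_profile(3, 4, 2, 1, 4, 2, 3, 5), make_profile(1, 2, 3, 4, 3, 1, 2, 3)]

print(agreement_by_specialty(scorer_a, scorer_b))               # exact agreement
print(agreement_by_specialty(scorer_a, scorer_b, tolerance=1))  # within one point
```

Allowing a tolerance respects the ordinal character of the scale: a 3 versus a 4 is a smaller disagreement than a 1 versus a 5, which a nominal measure of agreement would ignore.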
You are, however, not assessing theological studies, but judicial viewpoints on science in the context of the courtroom. You appear to envision either administering a paper-and-pencil test to participants in a study, or interviewing them. In either case, they would select one (or more?) of several options, and the options would be based on the functional specialties. I can imagine how you might translate Lonergan's categories for the functional specialties in theology to a set of categories for positions on legal aspects of science. I think that the same two approaches to measurement might still be relevant. Would the legal-scientific analogues to the theological categories constitute a nominal scale in themselves, or would you use each category as the basis for constructing an ordinal -- "more or less" -- kind of scale?
Another reflection on validity. There are differences among "criterion validity," "convergent validity," and "face validity." Criterion validity is seldom possible, because it requires independent knowledge of that which the instrument has been designed to measure. Thus, an intelligence test (the most commonly used example) can provide a score that is generally standardized with a mean of 100 and a standard deviation of 15 points on most current tests, so that "normal" intelligence is the roughly 68% of the population with IQ scores between 85 and 115. But it is impossible to know, independently of administering an intelligence test, what a person's IQ really is, so we cannot check whether or not the test measures what it is supposed to. But there still is convergent validity. If there are several independently constructed intelligence tests, the validity of each can be judged by the degree to which the scores of a set of individuals measured by one test "converge" upon the scores of the same set of individuals as measured by the other tests. But it is expensive to construct multiple tests, and the people who might be willing to take one are not likely to be willing to take the others, just so that the test makers can test the validity of their instruments. The result is that the most popular kind of validity for tests in the social sciences turns out to be face validity. This means that the test makers' arguments for the way they have operationalized the measurement of a variable persuade enough of the interested parties to accept the results as valid. This is made more plausible if the test turns out to be reliable; if the measurement is unreliable, it cannot be judged to have even face validity.
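Two pieces of arithmetic sit behind this paragraph, and a short Python sketch may help. First, for any normally distributed score, about 68% of cases fall within one standard deviation of the mean, whatever the size of the standard deviation. Second, convergence between two tests is usually summarized with a correlation coefficient; Pearson's r is used here as one common choice. The function names and the paired scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def share_within(mean, sd, low, high):
    """Proportion of a normal population scoring between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


def pearson_r(xs, ys):
    """Pearson correlation between paired scores from two tests."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need at least two paired scores.")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined when one test's scores do not vary.")
    return sxy / sqrt(sxx * syy)


# Share of the population within one standard deviation of the mean.
print(round(share_within(100, 15, 85, 115), 3))

# Hypothetical scores of the same six people on two independently built tests.
test_one = [92, 105, 118, 87, 100, 110]
test_two = [95, 101, 121, 90, 98, 113]
print(round(pearson_r(test_one, test_two), 3))
```

A correlation close to 1 would count as evidence of convergence. It would not show that either test measures what it claims to, only that the two agree with each other.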
Why have I droned on at such length about this? I did my dissertation, many years ago, on standardizing a set of measures of gender identity for subjects in India. I spent a great deal of time reviewing the literature on reliability and validity. I concluded that one of the measures (the Franck Drawing Completion Test) was invalid for Indians. I went back to the data for the standardization of the test for American subjects, and concluded that it was also invalid for Americans. I was terrified that my dissertation director, who had a considerable stake in the validity of the test for Americans, would cut me up into little pieces, but he was very gracious.
Best regards,
Dick