Content Automation, Speed of Content
Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Accurate estimates of these document category proportions have not been a goal of most work in the classification literature, which has focused instead on increasing the accuracy of classification into individual document categories. Methods tuned to maximize the percent of documents correctly classified can still produce substantial biases in the aggregate proportion of documents within each category. Individual document classifications, when available, provide additional information to social scientists, since they enable one to aggregate in unanticipated ways, serve as variables in regression-type analyses, and help guide deeper qualitative inquiries into the nature of specific documents.
We denote the document category variable as Di, which in general takes on the value Di = j, for possible categories j = 1, …, J. The second, larger set of documents is the population, which is the inferential target and in which each document has an unobserved classification Di.
Sometimes the labeled set is a sample from the population and so the two overlap; more often it is a nonrandom sample from a different source than the population, such as from earlier in time.
Existing Approaches
A simple way of estimating P(D) is direct sampling: identify a well-defined population of interest, draw a random sample from the population, hand code all the documents in the sample, and count the documents in each category.
DANIEL J. HOPKINS AND GARY KING
Optimizing for a Different Goal
Here we show that even optimal individual document classification that meets all the assumptions of the last section can lead to biased estimates of the document category proportions. Except at the extremes, there exists no necessary connection between low misclassification rates and low bias: it is easy to construct examples of learning methods that achieve a high percent of individual documents correctly predicted and large biases for estimating the aggregate document proportions, or other methods that have a low percent correctly predicted but nevertheless produce relatively unbiased estimates of the aggregate quantities.
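The claim that accuracy and aggregate bias are only loosely connected can be illustrated with a small constructed example. The sketch below is ours, not the authors': the population of 1,000 documents, the two hypothetical classifiers, and the helper name `summarize` are all assumptions chosen only to make the arithmetic transparent.

```python
def summarize(true_labels, predicted_labels, category=1):
    """Compare individual-level accuracy with the aggregate proportion estimate."""
    n = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    true_share = sum(t == category for t in true_labels) / n
    estimated_share = sum(p == category for p in predicted_labels) / n
    return {
        "accuracy": correct / n,
        "true_share": true_share,
        "estimated_share": estimated_share,
        "bias": estimated_share - true_share,
    }


# Hypothetical population: 200 documents in category 1, 800 in category 0.
truth = [1] * 200 + [0] * 800

# Hypothetical classifier A: misses half of category 1, never errs on category 0.
pred_a = [1] * 100 + [0] * 100 + [0] * 800

# Hypothetical classifier B: errs in both directions, and the errors cancel.
pred_b = [1] * 120 + [0] * 80 + [1] * 80 + [0] * 720

a = summarize(truth, pred_a)
b = summarize(truth, pred_b)

for name, result in (("A", a), ("B", b)):
    print(name, result)
```

In this constructed case, classifier A classifies 90% of documents correctly but estimates the category 1 share as 0.10 when the truth is 0.20, while classifier B is correct on only 84% of documents yet recovers the share of 0.20 exactly, because its false positives and false negatives offset.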
Statistically Consistent Estimates of Social Aggregates
We now introduce a method optimized for estimating document category proportions. To simplify the exposition, we first show how to correct aggregations of any existing classification method and then offer our standalone procedure, which does not require a method of individual document classification. We therefore offer here statistically consistent estimates of document category proportions, without having to improve individual classification accuracy and with no assumptions beyond those already made by the individual document classifier. We avoid parametric assumptions here too, by using direct tabulation to compute the proportion of documents observed to have each specific word profile among those in each document category.
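The first step described above, correcting the aggregated output of an existing classifier, can be sketched for the two-category case. The sketch rests on the standard accounting identity P(classified 1) = sensitivity × P(D = 1) + (1 − specificity) × P(D = 0), which is solved for P(D = 1). It assumes that the misclassification rates estimated by direct tabulation in the labeled set also hold in the population. The function names and all counts are hypothetical, and this is an illustration of the idea rather than the authors' implementation.

```python
def misclassification_rates(true_labels, predicted_labels):
    """Tabulate sensitivity and specificity directly from a hand-coded set."""
    pairs = list(zip(true_labels, predicted_labels))
    preds_for_ones = [p for t, p in pairs if t == 1]
    preds_for_zeros = [p for t, p in pairs if t == 0]
    sensitivity = sum(p == 1 for p in preds_for_ones) / len(preds_for_ones)
    specificity = sum(p == 0 for p in preds_for_zeros) / len(preds_for_zeros)
    return sensitivity, specificity


def corrected_share(raw_share, sensitivity, specificity):
    """Solve raw = sens * p + (1 - spec) * (1 - p) for the true share p."""
    false_positive_rate = 1 - specificity
    denominator = sensitivity - false_positive_rate
    if denominator == 0:
        raise ValueError("classifier output carries no information about the category")
    return (raw_share - false_positive_rate) / denominator


# Hypothetical labeled set: 50 documents per category, hand coded.
labeled_truth = [1] * 50 + [0] * 50
labeled_pred = [1] * 35 + [0] * 15 + [1] * 5 + [0] * 45

# Hypothetical population of 1,000 documents whose true share is 0.30.
# Only the classifier output would be observed in practice.
population_pred = [1] * 210 + [0] * 90 + [1] * 70 + [0] * 630

sens, spec = misclassification_rates(labeled_truth, labeled_pred)
raw = sum(p == 1 for p in population_pred) / len(population_pred)
corrected = corrected_share(raw, sens, spec)

print("raw share:", raw)
print("corrected share:", round(corrected, 4))
```

With J categories the same identity becomes a system of J linear equations in the J unknown proportions. The standalone procedure described in the text applies the same logic with word profiles in place of classifier output, so no individual classifications are needed.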
Rather than automating all 3 concourses into one integrated system, the system was used in a single concourse, by a single airline and only for outbound flights. Although the remnants of the system soldiered on for 10 years, the system never worked well and in August 2005, United Airlines announced that they would abandon the system completely.
The 3 bids are all rejected
Denver Airport Project Management team approach BAE directly requesting a bid for the project
Denver Airport contracts with BAE to expand the United Airlines baggage handling system into an integrated system handling all 3 concourses, all airlines, departing as well as arriving flights
Demonstration is a disaster as clothes are disgorged from crushed bags
Denver Mayor cancels 15 May target date and announces an indefinite delay in opening
Logplan Consulting engaged to evaluate the project
Fourth target for opening
BAE Systems denies system is malfunctioning
Scope of work is radically trimmed back and based on Logplan’s recommendation airport builds a manual tug and trolley system instead
City of Denver starts fining BAE $12K per day for further delays
Actual opening
In order to save costs the system is scrapped in favour of a fully manual system
As planned, the system was the most complex baggage system ever attempted.
From Slinger’s perspective:
a. Denver was to be a state-of-the-art airport, and the desire to have the most advanced baggage system would likely have been a factor behind Slinger’s willingness to proceed.
b. Slinger’s prior experience with baggage handling will have been based on simple conveyor belts combined with manual tug and trolley systems.
As such Slinger may have been inclined to make decisions on his own rather than seeking independent advice.
e. Slinger dealt with the discussions with BAE personally. Given that he was responsible for the complete airport, he will have had considerable other duties that would have limited the amount of time he had to focus on the baggage system.
f. On the surface the prototype may well have made it look as if BAE had overcome the technical challenges involved in building the system, and as such Slinger may have been lured into a false sense of security.
Key Decision 5 – Design of the physical building structure
Rather than being separate entities, the baggage system and the physical building represented a single integrated system. Largely because the design of the building was started before the baggage system design was known, the designers of the physical building made only general allowances for where they thought the baggage system would go. As a result, the baggage system had to accommodate sharp turns that were far from optimal and that increased the physical loads placed on the system.
Despite United Airlines’ insistence that the automated system be finished, the Mayor, acting on Logplan’s recommendation, slashed the project and ordered that a manual trolley system be built at an additional cost of $51M. While the Mayor was correct in taking action, the timing of the intervention again reveals something about the internal dynamics of the project.