Building and (Not) Using Tools in Digital Humanities

As I mentioned in my last post, the “Short Guide to Digital Humanities” (pages 121-136 of Digital_Humanities, by Anne Burdick, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp, MIT Press, 2012) includes the following stricture under the heading “What Isn’t the Digital Humanities?”:

The mere use of digital tools for the purpose of humanistic research and communication does not qualify as Digital Humanities.

I’m not going to speculate on the reasons that these authors make this declaration, or on why they feel they (or anyone else) should be authorized to decide what kinds of activities do and do not “qualify” as parts of the field.

Here I want only to reflect on the potential damage done to the field by adhering to this restriction.

First, I think it raises questions about the credibility of the field. Among the strongest justifications for the existence of DH is its formal resemblance to established sub-disciplines in which computers are used to address academic subjects whose existence predates the age of computerization. The field of this sort to which I am closest and about which I know the most is computational linguistics, really a name for a group of diverse sub-disciplines ranging from machine translation to natural language processing to corpus linguistics. While there are exceptions in some subfields, it is almost entirely the case that these fields do not distinguish between “building” tools and “using” tools when it comes to the evaluation or promulgation of scholarship. On the contrary, what is valued in all these fields is interesting results, results that can be widely used and shared among the community of computational linguists, especially results that expand our understanding of human language. There are a dozen or two world-famous computational linguists, all of whom use tools in any number of ways in their research. I know of no particular attention or approbation paid to them because they did or did not build the tools they use. I certainly know of no internal or external strictures dictating that the “building” part of their work is computational linguistics, while the use of those tools is not.

Not only that: all of these linguists (just a few examples are Douglas Biber, John Goldsmith, Dan Jurafsky, Mark Davies, and Christopher Manning, all of them known for a wide range of scholarly activities), despite their overt and declared interests in exploiting computers for the analysis of language, write substantial analyses of their work for others to read. All of this work “counts” as computational linguistics.

I know of no stricture within the field that says you must be dedicated to building things if you want to be a part of it, or that building things must constitute your primary investment in the field: on the contrary, you must be dedicated to computational analysis of languages that produces interesting results, just as the literal meaning of the phrase “computational linguistics” suggests. Just to be clear: that can and does entail building tools, if you want to do so; it just entails building as part of a suite of activities, which also includes using tools and generating analyses that compel the attention of other linguists. I think DH should and could be even more like computational linguistics (and more directly allied with it, as well), and the suggestion that only the building part “counts” moves in the contrary direction.

There is more to say about this analogy, because there certainly are some researchers who prefer to focus their work almost exclusively on building computational tools. For the most part, those folks live in another discipline entirely, usually computer science or electrical engineering. There are close and productive ties between computer science and computational linguistics. There are some similar ties between digital humanities and computer science, but I think there could be more. But the more of these there are, the less one might expect the humanist members of teams to be the right ones to do the heavy parts of building, since those are typically the skills that are treated in exquisite detail in computer science. So what? What I want are interesting results, period. I want them in DH the same way I want them in any other humanities specialization.

It’s worth reflecting on this just a bit more, because it speaks to a problem we hear a lot about especially in English Departments with regard to evaluation of DH scholars. Computational linguists often work in teams, just like DH teams. In these teams, the distribution of tasks varies. I don’t know of particular scrutiny being paid to who does or does not do the actual building on those teams–the interest is in the results, and in having made a significant contribution to them in any fashion, and being able to articulate that contribution.

Now if we reflect on significant DH projects, can we actually say with certainty that the significant figures in DH actually are builders, and if so, in what respect? One of the figures whose work is most often and rightly pointed to as indicative of the potential of DH is Franco Moretti. As far as I know, the work for which he is most famous, on historical trends in the development of the novel in Europe, can best be described as using tools and interpreting the results of that tool-use; Moretti famously expressed very little interest in the tools used. (In his more recent Stanford Literary Lab experiments there is more discussion of the tools, but not much emphasis on Moretti’s direct involvement in building them.) Curiously, the Digital_Humanities stricture would pretty much rule out Moretti’s work as DH, which strikes me as seriously counterproductive (and very hard to explain to outsiders). Moretti is by no means the only relatively well-known DHer whose direct, practical engagement with building–especially with actual coding–is somewhere between “unclear” and “nonexistent.” Only if you care more about boundaries than results would you want to try to distinguish whether these people “are” or “are not” DH, rather than looking at the results they produce.

Beyond the credibility issue, I think we already see the distorting effects that promoting the building of tools and demoting their use can have. One is to generate a series of what are effectively prototypes that never move beyond that phase; another is a lack of focus on the differences between prototype work and actual live supported software; related to that is a lack of focus on even the process that takes designs from prototypes to live use, so that too many prototypes simply get built and then pass out of awareness. The authors of an important recent report about digital projects in the UK, Sustaining Our Digital Future (Strategic Content Alliance, 2013), Nancy Maron, Jason Yun, and Sarah Pickle, write a lot about these problems:

For well over a decade, significant investment in creating digital resources has been spurred by government agencies and funders, as well as by private philanthropists. Even today, developing sustainability plans remains a challenge for many of these projects. While most will agree that, at the very least, early efforts to create digital content were valuable for increasing the capability and experience of those who engaged in them, some of these earlier projects have been criticised for not being “future-proofed” and indeed, not all are easily available today; a few may be entirely inaccessible and even those that do exist have lost value as their content and interfaces remain frozen in time or worse. A review of UK Digitisation Projects, funded by the New Opportunities Fund (NOF) Programme from 1999-2004, evaluated 154 grants and found that as of August 2009, 25 or 16% were found to have “no known URL or URL not available” and for 82 or 53%, it was noted that while the website exists, it “seems not to have changed since the launch”. The LAIRAH Project: Log Analysis of Digital Resources in the Arts and Humanities, a study that sought, among other things, to “determine the scale of use and neglect of digital resources in the humanities” reviewed usage logs of the 1255 projects in the Arts and Humanities Data Service, finding that “most of the projects that we studied are finished, [but] very few are being actively updated.” (11)

Such an unfortunate situation has many explanations, but it is clear that the demotion of using tools, both in the discipline of DH and in its funding streams, cannot be helping matters.

[Figure: Force-Directed Graph of Topic Correlation Network Layout, 1800-1849, from Michael Simeone, “Visualizing Topic Models with Force-Directed Graphs,” generated with the tool at http://isda.ncsa.illinois.edu/~mpsimeon/topics/FDE/indexvml.html]

One troubling dynamic I’ve seen over the decade-plus history of digital humanities and its frequent and often overt emphasis on building above and beyond most other forms of scholarly activity is this one, which is to some extent the converse of the prototype problem. This is that when tools or methods are proven useful and powerful, and therefore become widely distributed and used by a range of scholars who may or may not see themselves explicitly as “part” of DH, that very usefulness and ubiquity ends up disqualifying them as part of DH. An old example: in the early days of the web, plain old HTML was considered enough of “building” that many projects were said to qualify as DH simply by dint of using HTML. Then, when blogs started to become popular but blogging software had not been packaged up enough to make it easy for anyone to use it, blogging was seen as part of DH and assembling a blog was seen as “building.” Then HTML skills became more widespread, building blogs became easy, and of course many DHers use blogs, but having a blog and writing HTML are no longer considered good ways to qualify a project as DH.

Now let’s take a more pointed and more current example. One of the latest technologies being promulgated through many quarters in DH is topic modeling. We’ve already seen a number of truly insightful, analytical projects using topic modeling to draw interesting conclusions about various bodies of text. Just a few of the several recent projects that have gotten attention include: Andrew Goldstone and Ted Underwood, “What Can Topic Models of PMLA Teach Us About the History of Literary Scholarship?“; Jonathan Goodwin, “Topic Modeling Signs” and “Creating Topic Models with JSTOR’s Data for Research (DfR)“; Ted Underwood, “Topic Modeling Made Just Simple Enough” and “Visualizing Topic Models“; and Ben Schmidt, “Keeping the Words in Topic Models.”

I hope and trust that there are few people who would demur from the view that all of these are excellent, worthwhile, even exemplary instances of what DH can do. They tell us important things about the shape of both literary production and critical production, depending on the corpora to which they have been applied. As exemplars, it’s important to note what these projects have in common: building tools, using tools, and writing up the results. I would hope that some of those results would eventually be published in peer-reviewed journals or edited collections, but taken as a whole it is hard to imagine these not being the sorts of contributions that English and History departments would consider meaningful contributions to promotion and tenure files.

So, I think that topic modeling is here to stay, and there are reasons to suspect that major data providers–for example, JSTOR, which provides some of the data (via its Data for Research interface, described at some length in Goodwin’s “Creating Topic Models with JSTOR’s Data for Research (DfR)“) used in the analyses of critical writing–do too. But topic modeling is nothing more than a set of algorithmic processes. If those processes are valuable, I’d expect to see JSTOR, EEBO, and many others incorporate topic modeling tools into their interfaces: in fact, I presume they see the work of Profs. Underwood, Goldstone, Schmidt and others as in some sense building prototypes for future applications. That should make it possible for a wide range of scholars to use the tools to draw all kinds of interesting inferences about a wide range of subject matters. But if we follow the Digital_Humanities stricture, that work would no longer count as digital humanities. Yet continuing to see it as DH would encourage even more scholars to utilize it and generate interesting results with it. The deployment of topic modeling tools by data providers would also help to address the sustainability issue, since these major data providers are already in the business of supporting and maintaining tools like these and the data on which to do research with them.

I don’t deny that there will probably remain new topic modeling tools to build. What I am hoping to point out is that the very usefulness of topic modeling suggests it will become part of the scholar’s toolkit, and that if we then arbitrarily deem that success to mean it is no longer part of our research enterprise, we are cutting off our nose to spite our face. Wide adoption and use is success, and interesting results produced with digital tools deserve to be called digital humanities.

Next: a follow-up on exclusionary definitions of DH
