CESARE PASTORINO, TAMARA LOPEZ AND JOHN WALSH
The Digital Index Chemicus: toward a digital tool for studying Isaac Newton's Index Chemicus
In the 1680s, Isaac Newton
composed a remarkable alchemical reference work, which he called the
Index Chemicus (or ‘Chemical Index’). Newton left behind multiple
versions of the Index. In its final form, the work acquired features
both of an alchemical dictionary and thesaurus, and of an annotated
bibliography to the literature of early modern alchemy.
The extant versions of the Index Chemicus
have recently been edited and published for the first time as a part
of the digital humanities project The Chymistry of Isaac Newton,
based at Indiana University-Bloomington. The purpose of this project
is to transcribe, digitize, edit, and annotate all of Newton’s alchemical
manuscripts. While transcribing and editing the Index Chemicus,
we realized that the specific features this work possesses, and in particular
its alphabetically organized entries, could be presented to scholars
and students in a more useable form than the representation
of the manuscript within the main edition. This was the starting point
for the development of a newly designed, edited, and formatted digital
edition of the Index Chemicus. In previous work, we have devised
several strategies for the development of a Digital Index, among
which was its use as a reference and an entry point (in terms of access
via subjects and keywords) to the related content in The Chymistry
of Isaac Newton. This year, the NEH awarded the project a grant
that will integrate the Digital Index into the Chymistry of
Isaac Newton, alongside newly developed visualization and annotation
tools for the collection.
The research presented here explores
the features of a digital tool conceived for the study and understanding
of the Index Chemicus as an independent resource. In particular,
we demonstrate that the Index Chemicus presents research issues
and questions that cannot fully be explored and exploited within a more
traditional (if digital) published edition. The development of a separate
and specific form of publication for the Index Chemicus responds
to the exigency of dealing with research questions that traditional
formats for manuscript publications (both in paper and digital form)
can only address with difficulty.
In the following sections of this paper, we present other research that has been done on the Index Chemicus, and situate Newton's own work within the context of early modern reading practices. Next, we broadly describe the content and organization of the extant versions of the Index, and give some insights into the practices Newton followed in creating them. We then develop our research questions, and provide a deeper content and data analysis of the manuscripts. We conclude this discussion with some ideas about the direction to take in the technical implementation.
PAST AND PRESENT RESEARCH
In the past, the Index Chemicus
has primarily been studied by the late Richard Westfall, a prominent
Newtonian scholar and professor of history of science at Indiana University.
In the most comprehensive analysis of the work to date, Westfall (1975)
addressed the dating of the manuscripts, and the Index’s intellectual
role and purpose in connection with Newton’s alchemical studies. However,
for Westfall, the Index Chemicus primarily represented the smoking
gun of Newton’s alchemical interest: the sheer bulk and amount of
detail in the manuscript constituted incontrovertible evidence of Newton’s
efforts in alchemical matters. This fact was an uneasy novelty for the
Newtonian scholarship of the time, and it is largely to the credit
of scholars like Westfall that it now is an accepted aspect of Newtonian
Beyond the work done by Westfall, one
can approach the analysis of Newton’s Indexes by considering
more recent scholarship of early modern scholarly reading practices
(Blair, 2003; Blair, 2004; Burke, 2000; Yeo, 2001; Yeo, 2003). Though
alphabetical indexes came into practice as a way to organize information
as early as the 13th century, they became more commonly employed
by individual scholars in the early modern period as a note-taking
technique to facilitate ‘reading for action’. This reading strategy
was necessitated not only by the increased production of books, but
also by their greater availability and the consequent growth in size
of personal libraries. As personal libraries grew, so did the numbers
of books read. In addition to diligent reading of volumes, scholars
came to rely on published reference materials, and to use other
shortcuts such as reading books in bits and pieces, paying close attention
to passages of particular relevance to their direction of inquiry and
less so to others.
Ann Blair (2003) has recently suggested
that the ‘best evidence’ for such ‘consultation reading’ is
in the high incidence of indexing among scholars, either on their own
or as a way to correct, improve upon or add to indexes provided by publishers.
So used, marginal annotations in books became finding aids for readers,
helping them return to items of interest after the passage of time.
In the case of Newton, much of the evidence
for his reading strategies was put together by John Harrison (1978),
in his seminal research on Newton’s personal library. As well as developing
a comprehensive catalogue of the books and particular editions held
by Newton, Harrison made careful note of the nature of marginal notations
as well as their content.
Pairing this study of Harrison’s with
her own study of 17th century reading and note-taking practices,
Blair (2004: 423) has developed an initial assessment of Newton’s
own reading habits, suggesting that, like his contemporaries, Newton
corrected problems and provided cross-references, but did not include
marginal summaries or keywords. Similarly, Newton’s use of dog-earing
was unique, but his inclusion of marginal annotations unrelated to the
main body of the text was common among 17th century readers.
However, Blair concludes that while many early modern note-takers tracked
topics in the margins of books, and created indexes of topics and page
references on their fly-leaves, Newton did not.
Given the nature of the Index Chemicus and the clear relation its contents share with the marginal notations catalogued by Harrison, it is clear that, at least in the case of Newton’s alchemical readings, Blair’s initial analysis of Newton’s reading practices is not comprehensive. In order to give a fuller evaluation of these practices, Newton’s creation of a reading reference tool like the Index Chemicus must be taken into account1.
THE MANUSCRIPT, THE TEXT
The heading Index Chemicus was assigned by Newton to several texts, which constitute different stages of a single reference project.
The version of the Index
assumed to be the first, MS Keynes 30/3, is simply a list of headings
on a single sheet of paper. It consists of 115 terms, annotated in alphabetical
groups. This version includes no bibliographical citations within the
The likely candidate for the second
version of the Index, MS Keynes 30/2, is developed over 8 pages.
The entries in this version number 258 and, most significantly, Newton
begins to annotate references to books. Mostly, the references are very
simple, consisting of a jotted work and/or author and the referenced
page. However, a few entries show an early evolution of the text toward
the construction of a thesaurus. So, for instance, in the item ‘Eclipsis’,
Newton notes that ‘Eclipse’ represents ‘putrefaction’. Very
few entries are developed beyond this level, as in the case of
‘Calais et Zete’, which is constructed as a one-sentence summary
and interpretation of a mythological episode described in alchemical
terms (folio 1v).
It is possible that, when the eight pages of MS Keynes 30/2 were too full to accommodate new entries, Newton started a new version, MS Keynes 30/5. Developed over 12 folios, it comprises 24 pages. In this manuscript, the total number of entries is around 730. Almost all entries of Keynes 30/5 are now developed at least into a sentence, with more extensive references to authors and specific texts than in Keynes 30/2. Some entries extend beyond a single sentence, giving a more detailed and structured description. Items are sometimes repeated to describe different semantic contexts. The enumeration of synonyms and related concepts is also a constant feature of the entries.
In his final version, catalogued as Keynes MS 30/1, the Index Chemicus extended over 98 folios. The entries here number around 920, while the manuscript includes around 30,000 words. Within this version, Newton composed the main alphabetical listing on the rectos of the manuscript, reserving versos for the annotation of subsequent insertions. If the entries on the versos are included, Keynes MS 30/1 comprises 126 pages.
Again, the definition of a particular alchemical concept, compound, procedure, or mythological term is regularly followed by a list of authors and texts that refer to it, including precise references to volumes and pages; frequently, the initial definition is integrated with a list of synonyms. A remarkable case is found in the entry ‘Materia Prima’, where about 50 synonyms for the term are present. However, in this version of the Index, the size of the items can vary considerably, from a simple annotation, to the length of a short essay.
The most impressive feature of this version lies in the richness of its source apparatus. Overall in Keynes 30/1, the number of distinct authors and works cited by Newton runs into the hundreds, and there are references to thousands of different pages and passages of specific alchemical texts.
RESEARCH DIRECTIONS
From this overview, it is possible to
develop a group of research issues. A broad and general question regards
the intellectual role that the Index Chemicus played for
Newton. To properly investigate this issue, one must tackle several
different and more concrete points, namely the Index’s composition,
its content and structure, and its evolution through the
various known versions. As we have shown, a different but related area
of investigation is the context of early modern reading
strategies, of which the Index is a clear example.
The Intellectual Role
In his work on the Index Chemicus,
Westfall (1980: 359) suggested that this work was initially conceived
as Newton’s personal entry point to the literature of alchemy, and
that subsequently it was developed into a ‘general guide to the Art’.
Westfall’s first interpretation is echoed by the work of researchers
like Blair (2003) and Yeo (2001; 2003), who note that, during this time,
a shift in reading strategies occurred from close, deep consultation
of a single source or a few to ‘reading for action’ of a larger number
of texts. Within the publishing world, this period also marked the beginning
of the development of a market for selective, authoritative sources
for knowledge, which reached their apex in the scientific reference
works that were to follow in the 18th century. Given these
two early modern trends, is it possible that Newton intended his later
version of the Index to be read by others, as Westfall suggested?
More broadly still, the reference to
the umbrella term ‘alchemy’ for the content of the Index
is generic, as alchemical practices of the seventeenth century constituted
a variegated set of often rather dissimilar traditions. What kind of
material is referenced in the Index Chemicus? Which selections
did Newton make over time, and why?
Composition, content and evolution
These general theoretical questions
on the intellectual role of the Index can be approached from
a more practical standpoint, as it is clear that Newton’s goals and
aims are embedded within the various versions as represented in the
text and the methods used to select, arrange and organize it.
In general, the study of Newton’s compositional
strategies requires close comparison of the structure and content
of the various texts: the entries and the representational features
that were included over time. In this analysis, identification and marking
of the regularities and variations in the texts is essential. Given
the different organizational techniques employed within the Index,
it is also possible to think of different levels of analysis: one targeting
the single entry; a second examining the groups of entries belonging
to clusters of similar or like terms; and the third by considering the
full entry list.
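The cluster and full-list levels of comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the project: the function names `diff_headwords` and `cluster` are hypothetical, and the two short headword lists are placeholders rather than transcriptions of any manuscript version.

```python
def normalise(term: str) -> str:
    """Collapse whitespace and ignore case so spelling variants of case do not count as changes."""
    return " ".join(term.split()).casefold()


def diff_headwords(earlier, later):
    """Full-list level: which headwords are retained, added, or dropped between two versions."""
    a = {normalise(t) for t in earlier if t.strip()}
    b = {normalise(t) for t in later if t.strip()}
    return {
        "retained": sorted(a & b),
        "added": sorted(b - a),
        "dropped": sorted(a - b),
    }


def cluster(headwords, stem):
    """Cluster level: headwords whose first word matches the given stem (e.g. 'Aqua')."""
    wanted = normalise(stem)
    normalised = (normalise(h) for h in headwords if h.strip())
    return sorted(h for h in normalised if h.split()[0] == wanted)


# Placeholder lists (assumption): illustrative only, not the manuscript contents.
version_a = ["Aurum", "Argentum", "Aqua"]
version_b = ["Aurum", "Argentum", "Aqua ardens", "Aqua mercurialis", "Aqua Saturni"]

diff = diff_headwords(version_a, version_b)
aqua_cluster = cluster(version_b, "Aqua")
```

Matching on the first word is a deliberately crude way to form clusters; a real tool would more likely rely on editorial grouping of related terms.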
Structure and content: Entry variations
The first analytical entry point is to
examine all entries beginning with a specific letter, with particular
emphasis on their variation in the different versions of the Index.
Though a simple example of a textual feature, it allows the researcher
to follow a macro-level aspect of the manuscript’s structure and evolution,
illuminating some of Newton’s compositional choices and strategies.
In the example below, the list in the first column shows Newton’s
initial choice for terms belonging to the letter ‘A’. It is reasonable
to think that these entries represented Newton’s best choice, that
is to say the nucleus of concepts that, according to him, were most
relevant and important. The list of Keynes 30/3 is arranged into groupings
by letter, but is not strictly ordered alphabetically, as for instance
‘Aurum’ (‘gold’) precedes ‘Argentum’ (‘silver’). This seems
to indicate that Newton wrote the terms down in a rough order of importance,
perhaps as they came to his mind. Almost all items of this initial list
were retained in the second version, but for instance the item ‘Aqua’
(‘water’) was expanded into a cluster of semantically related
terms (‘Aqua ardens’, ‘Aqua mercurialis’, ‘Aqua Saturni’,
Structure and content: term clusters
This evolution of term clusters in the
various versions of the Index invites a different type of analysis.
Building on the previous example, it is possible to list and describe
the items developed around the concept ‘Aqua’. In particular, the
entry ‘Aqua sicca’ (‘dry water’) appears to have had an interesting
evolution over time. Newton introduced the concept in Keynes 30/3, and
developed it in depth in Keynes 30/5, where four different entries for
the term are present, each referring to semantically different alchemical
entities. In the passage from Keynes 30/5 to Keynes 30/1 (the final version
of the Index), these entries are again collapsed into a single one. Also
notice that in Keynes 30/5 Newton introduced the terms ‘Aqua salis
nitri’ and ‘Aqua vegetabilis’, which disappear as distinct entries
in Keynes 30/1 (and are actually combined in the item ‘Aqua maris’,
Structure and content: bibliometric analysis
Another angle for analysis is bibliometric,
examining in detail the scale and scope of the Index references
and sources. Again, an example can be given considering the evolution
of the references cited in the entries ‘Aqua sicca’. In this case,
Newton’s sources expanded over time, together with the number of pages
referenced. However, the process was not simply cumulative, as for instance
Newton eliminated from the last version of the Index the reference
to Eirenaeus Philalethes’ Fons chymicae philosophiae, which
he introduced in Keynes 30/5.
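A bibliometric comparison of this kind can be sketched as follows. The record layout and the function names are assumptions made for illustration. Apart from the presence of Fons chymicae philosophiae in Keynes 30/5 and its absence from Keynes 30/1, which the text above reports, every source name and page number is a placeholder and not manuscript data.

```python
from collections import defaultdict

# Each record: (version, headword, source, page).
# Placeholder data (assumption): "Hypothetical source ..." and all page
# numbers are invented purely to make the example run.
CITATIONS = [
    ("Keynes 30/5", "Aqua sicca", "Fons chymicae philosophiae", 1),
    ("Keynes 30/5", "Aqua sicca", "Hypothetical source A", 10),
    ("Keynes 30/1", "Aqua sicca", "Hypothetical source A", 10),
    ("Keynes 30/1", "Aqua sicca", "Hypothetical source A", 12),
    ("Keynes 30/1", "Aqua sicca", "Hypothetical source B", 3),
    ("Keynes 30/1", "Aqua sicca", "Hypothetical source C", 7),
]


def sources_by_version(citations, headword):
    """Map each version to the set of distinct sources cited under a headword."""
    result = defaultdict(set)
    for version, term, source, _page in citations:
        if term == headword:
            result[version].add(source)
    return dict(result)


def page_references_by_version(citations, headword):
    """Count distinct (source, page) pairs cited under a headword in each version."""
    pages = defaultdict(set)
    for version, term, source, page in citations:
        if term == headword:
            pages[version].add((source, page))
    return {version: len(refs) for version, refs in pages.items()}


by_version = sources_by_version(CITATIONS, "Aqua sicca")
dropped = by_version["Keynes 30/5"] - by_version["Keynes 30/1"]
added = by_version["Keynes 30/1"] - by_version["Keynes 30/5"]
page_counts = page_references_by_version(CITATIONS, "Aqua sicca")
```

Computing set differences in both directions makes visible that the process was not simply cumulative: a source can be dropped even while the overall number of sources and page references grows.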
As we have emphasized, the early modern culture of scholarly indexes and reference works represents the intellectual context to which the Index Chemicus itself belongs. We have stressed the idea that Newton’s reading strategies, at least in the case of alchemical material, suggest an integration between two different types of activities: on one hand, direct book annotation and cross-referencing, and on the other, the compilation of a standalone reference work. In this context, bibliometric analysis invites an examination of the extent to which volumes from Newton’s library are represented within Newtonian manuscripts.THE DIGITAL INDEX: TOWARD A DIGITAL TOOL FOR UNDERSTANDING THE INDEX CHEMICUS
From the research questions and entry
analyses given above, we can begin to develop a model of the Index
Chemicus from an informational perspective, as the first step toward
implementing its digital incarnation.
Though the intellectual and spatial arrangement
of the Index evolves between successive versions, Newton's base
unit of organization is consistently the entry. An entry always has
a term and belongs to a version. Sometimes it possesses a textual description
that can include synonyms, and one or more references. These features
can be schematized in the following way:
As this diagram shows, references may
be one of two types: cross references, or bibliographical references,
and our analysis indicates that cross references relate to other terms
within the same manuscript version. This relationship, too is easily
Similarly, using Harrison's seminal analysis
of Newton’s library as a starting point, we can build an authoritative
bibliographic database of alchemical sources to which individual references
within entries can be linked, represented in this expanded model:
At this stage, we have delineated a clear
information model that captures the salient aspects of Newton's own
structure, and links to authoritative data sources to normalize variants
in bibliographic data. Furthermore, we have developed an information
model that can be easily represented using the descriptive and linking
capabilities of a language like the Text Encoding Initiative.
In fact, the model depicted in the first figure above largely represents
the digital form of the Index Chemicus as it exists today.
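As a rough illustration of the information model described in this section, the sketch below expresses entries, versions, the two reference types, and linked bibliographic records as Python dataclasses. All class and field names are hypothetical, the sample entry carries placeholder content, and the sketch is not the project's actual encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Source:
    """Authority record for one work in the bibliographic database."""
    source_id: str
    author: str
    title: str


@dataclass(frozen=True)
class BibliographicReference:
    """Points from an entry to an authority record, with the locator as written."""
    source_id: str
    locator: str


@dataclass(frozen=True)
class CrossReference:
    """Points to another headword within the same manuscript version."""
    target_term: str


Reference = Union[BibliographicReference, CrossReference]


@dataclass
class Entry:
    """An entry always has a term and a version; everything else is optional."""
    term: str
    version: str
    description: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)


@dataclass
class IndexVersion:
    shelfmark: str
    entries: List[Entry] = field(default_factory=list)

    def add(self, entry: Entry) -> None:
        """Reject entries that claim to belong to a different version."""
        if entry.version != self.shelfmark:
            raise ValueError(f"{entry.term!r} belongs to {entry.version}, not {self.shelfmark}")
        self.entries.append(entry)

    def unresolved_cross_references(self) -> List[Tuple[str, str]]:
        """List (entry term, target) pairs whose target is not a headword in this version."""
        headwords = {e.term for e in self.entries}
        return [
            (e.term, r.target_term)
            for e in self.entries
            for r in e.references
            if isinstance(r, CrossReference) and r.target_term not in headwords
        ]


# Placeholder content (assumption): identifiers, target, and locator are invented.
sources: Dict[str, Source] = {
    "src-001": Source("src-001", "Hypothetical author", "Hypothetical title"),
}
version = IndexVersion("Keynes 30/2")
version.add(Entry(
    term="Eclipsis",
    version="Keynes 30/2",
    description="Placeholder description.",
    references=[CrossReference("Putrefactio"), BibliographicReference("src-001", "p. 1")],
))
unresolved = version.unresolved_cross_references()
```

Keeping cross references and bibliographic references as separate types mirrors the distinction drawn above, and lets version-internal links be checked independently of the bibliographic database.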
Within The Chymistry of Isaac Newton,
five of the Index Chemicus manuscripts have been transcribed
and edited. Marked up using the Text Encoding Initiative (TEI)
P4, Newton’s individual entries have been tagged as list items, and
editorial judgments have been made to structurally identify the headwords
for entries, which are encoded within <label> tags.
One way to move forward with the technical
implementation is to continue building upon the TEI data created for
The Chymistry of Isaac Newton. With some effort, the alchemical
bibliographic references could be isolated and specified using features
of the TEI defined for describing bibliographic material2.
Though this version of the TEI does include facilities for indicating
entries to be used in generating indexes, it does not offer a
specific tag set for modelling syndetic relationships of the sort usually
seen in subject indexes or thesauri3. Nevertheless, cross
references to other entries can be specified using the <ref> and
<ptr> tags4 and synonyms could be similarly approximated.
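To make the TEI-based approach concrete, the sketch below reads a small TEI-like fragment with Python's standard `xml.etree.ElementTree` module, pairs each `<label>` with the `<item>` that follows it, and collects the targets of `<ref>` and `<ptr>` elements. The fragment is an assumption: its `id` values, its text, and its exact layout are invented for illustration and do not reproduce the encoding used in The Chymistry of Isaac Newton.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment (assumption): placeholder ids and text only.
SAMPLE = """<list type="gloss">
  <label id="aqua-sicca">Aqua sicca</label>
  <item>Placeholder text. See <ref target="mercurius">Mercurius</ref>.</item>
  <label id="mercurius">Mercurius</label>
  <item>Placeholder text. <ptr target="sal-nitri"/></item>
</list>"""


def parse_entries(xml_text):
    """Pair each <label> headword with the following <item> and its link targets."""
    root = ET.fromstring(xml_text)
    entries = []
    current = None
    for child in root:
        if child.tag == "label":
            current = {
                "id": child.get("id"),
                "headword": "".join(child.itertext()).strip(),
                "text": "",
                "targets": [],
            }
            entries.append(current)
        elif child.tag == "item" and current is not None:
            # Flatten mixed content and normalise whitespace.
            current["text"] = " ".join("".join(child.itertext()).split())
            current["targets"] = [
                el.get("target")
                for el in child.iter()
                if el.tag in ("ref", "ptr") and el.get("target")
            ]
    return entries


def dangling_targets(entries):
    """Cross references whose target id matches no headword in the fragment."""
    ids = {e["id"] for e in entries}
    return [(e["headword"], t) for e in entries for t in e["targets"] if t not in ids]


entries = parse_entries(SAMPLE)
dangling = dangling_targets(entries)
```

A check for dangling targets is one simple way to test the observation that cross references relate to other terms within the same manuscript version.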
A slightly different approach might be
to represent the versions of the Index using one of a number of semantically oriented
technologies developed in recent years for representing conceptual models,
such as Topic Maps or the newer-generation RDF-based languages. As with
the TEI, there is much experience to draw upon from the digital humanities
community to support this approach. The Deeds project at the University
of Toronto, for example, has migrated its data from XML to the newer-generation
modelling language RDF in order to facilitate complex dating and attribution
analyses. At a different level in the modelling stack, the Canonical
Text Services Protocol developed by Neel Smith’s Holy Oak tackles
the issue of consistent, normalized bibliographic citation. Elsewhere
within the Centre for Computing in the Humanities (CCH), recent projects
like the Fine Rolls of Henry III have examined the use of other
“semantic web” technologies like RDF-expressed domain ontologies
for automatically deriving subject indexes (Gervers and Margolin, 2007; Ciula
et al., 2007).
However, these directions are less
attractive when the variations between different versions and the manuscript
transformations are considered. As we have shown in the previous
examples, Newton's ideas about what constituted a significant concept
were fluid: within one version he might have expanded the information
about a single idea into several entries, or conversely contracted multiple
items into a single entry. Newton’s lists of synonyms fluctuated in
similar ways between versions and clusters of entries. Moreover, terms
that could be considered an entry synonym in one version were elevated
to the rank of a concept in a subsequent version, and vice versa. These
features are not surprising. The focus on the different manuscript versions
and their transformations emphasizes, so to speak, Newton’s magmatic
work with the text, and not its crystallized form. The fluid quality
of term selection suggests that entries are too
representational to serve as units of analysis for the Digital
A more fruitful tack to take may be to consider each version of the Index as a systematically arranged set of textual analyses. Conceptually, this orientation is similar to that developed by John Bradley (2003) at the Centre for Computing in the Humanities (CCH). Framed within an analysis of famed user interface developer Engelbart's Augment System, Bradley suggests that the goal of technologists working on textual analysis projects should not be to create systems for presenting the texts themselves or the words within the texts, but instead to find ways to represent the conceptual model as it is developed by the analyst in working with the texts, to facilitate the work of the analyst in model building. In so doing, these systems must also reflect the ways in which temporality manifests itself within the modelling process. Bradley identifies four effects of time on models: an increase in complexity and richness; the encapsulation of distinct models introduced by the analyst’s own scholarly context; the development of sub-models that explain and support the larger work; and the evolution and development of more formalized structures or ways of representing the model.
CONCLUSION
Accepting that the work done by Newton in creating the Indexes is like that performed in textual analysis, one can use Bradley’s discussion to identify design goals for the Digital Index Chemicus. Within this framework, the different versions of the Index Chemicus can be understood to be products of modelling. But as has been shown, Newton’s products are the focus of the Chymistry of Isaac Newton, and have been ably represented there. The Digital Index Chemicus, by contrast, must build a tool that breaks apart the static representations left by Newton, presenting them to the scholar instead in a way that approximates the process by which Newton created them. We have observed that Newton’s models grew in richness and complexity over time, and suspect that he used successive versions to develop a more general conceptual model of Alchemy. The tool must therefore at once reveal Newton’s paths through alchemical texts, while assisting scholars in their own modelling processes.
This paper is the first exploratory step beyond the digital edition and publication of the manuscript within the Chymistry of Isaac Newton. That edition made the Index available to researchers for the first time, a significant contribution to Newtonian scholarship. However, as we have shown, the singularity of this manuscript calls for a richer treatment and invites the development of new approaches toward its study.
Cesare Pastorino is a Ph.D. candidate
in the Department of History and Philosophy of Science at Indiana
University, and a research and editorial assistant for the digital project
the Chymistry of Isaac Newton (http://www.dlib.indiana.edu
Tamara Lopez is a member of the
XML Team at the Centre for Computing in the Humanities, King's College London.
Holding degrees from the School of Library and Information Science
at Indiana University, Lopez's research interests include the
design and use of mixed-content model XML languages, socially
constructed web standards and software, and the creation of information
systems to support digital scholarship. While at IU, she was also
the programmer and technical analyst for the Chymistry of Isaac Newton
John A. Walsh is an assistant
professor in the School of Library and Information Science at Indiana University,
where he teaches and conducts research in the areas of digital
humanities and digital libraries. His work explores electronic
textuality; complex document structures; the nature of the document
in the digital age; and the evolution of the document, the book,
and the literary text--both born-digital new media texts and digital representations
of prior texts. Exploring current transformational developments
in textuality, Walsh studies the application of metadata and
semantic web technologies to facilitate new forms of close, distant,
and social reading and interpretation. Current projects include
the Swinburne Project (http://www.swinburneproject
Blair, Ann (2003) ‘Reading Strategies
for Coping With Information Overload ca. 1550-1700’, Journal of the
History of Ideas 64(1): 11-28.
Blair, Ann (2004) ‘Focus: Scientific
Readers: An Early Modernist's Perspective’, Isis 95: 420–430.
Burke, Peter (2000) A social history
of knowledge: from Gutenberg to Diderot. Cambridge, UK: Polity Press/Malden,
Mass.: Blackwell Publishers.
Bradley, John (2003) ‘Finding a Middle
Ground between “Determinism” and “Aesthetic Indeterminacy”:
A Model for Text Analysis Tools’, Literary and Linguistic Computing
Bradley, John (2005) ‘Documents and
Data: Modelling Materials for Humanities Research in XML and Relational
Databases’, Literary and Linguistic Computing 20(1): 133-151;
Bush, Vannevar (1945) ‘As we may think’,
Atlantic Monthly, Reprinted in ACM Interactions 3(2) (March 1996): 35-46.
Ciula, A., Spence, P., Vieira, J.M. and
Poupeau, G. (2007) ‘Expressing Complex Associations in Medieval Historical
Documents: the Henry the III Fine Rolls’, Digital Humanities 2007
Figala, K, Harrison, J, and Petzold, U (1992) ‘De Scriptoribus Chemicis: sources for the establishment of Isaac Newton’s (al)chemical library’ pp. 135-80 in Peter M. Harman and Alan E. Shapiro (eds) The investigation of difficult things. Essays on Newton and the history of exact sciences. Cambridge: Cambridge University Press.
Harrison, John R. (1978) The library
of Isaac Newton. Cambridge: Cambridge University Press.
Gervers, Michael and Margolin, Michael
(2007) ‘Managing Meta data in a Research Collection of Medieval
Latin Charters’, retrieved 04 September, 2007 from: http://res.deeds.utoronto.ca
McCarty, Willard (2005) Modelling In
Humanities Computing, Basingstoke (England)/ New York: Palgrave
Unsworth, John (2007) ‘Learning from
nora: distributed software development in the humanities’, Indiana
University, Bloomington, IN, March 29, 2007. Available at: http://www3.isrl.uiuc.edu/
Westfall, Richard (1975) ‘Isaac Newton's
Index Chemicus’, Ambix 22(3): 174-185.
Westfall, Richard (1979) ‘Newton's
Reading’ (Book review of J. Harrison, The Library of Isaac Newton).
Science, 204(4394): 745-746.
Westfall, Richard (1980) Never at Rest:
A Biography of Isaac Newton. Cambridge: Cambridge University Press.
Yeo, Richard (2001) Encyclopaedic Visions:
Scientific Dictionaries and Enlightenment Culture. Cambridge: Cambridge
Yeo, Richard (2003) ‘A Solution to
the Multitude of Books: Ephraim Chambers's Cyclopaedia (1728) as “the
Best Book in the Universe”’, Journal of the History of Ideas 64(1):
3 i.e. indicating lead-in terms. See also: "7.6 Back Matter", http://www.tei-c.org/cms/Guidelines/P4/html/DS.html#DSBACK