Pre-conference lecture

Pre-conference lecture (no participation fee)

HathiTrust Research Center: Pushing the Frontiers of Large Scale Text Analytics

Presenter: J. Stephen Downie (Associate Dean for Research, Graduate School of Library and Information Science,
University of Illinois Urbana-Champaign / Co-director of the HathiTrust Research Center (HTRC))
Moderator: Shunya Yoshimi (University of Tokyo)

15 September 17:30 - 19:00 @ Faculty of Engineering Bldg. 2 (Hongo Campus), University of Tokyo

This presentation will introduce the newly formed HathiTrust Research Center (HTRC) and its projects. The HTRC is affiliated with the HathiTrust, an online repository dedicated to providing access to a comprehensive body of published works for scholarship and education. Over 60 universities belong to the HathiTrust community, and over 10 million volumes have been ingested into its digital archive from sources including Google Books, member university libraries, the Internet Archive, and numerous private collections. The HTRC is dedicated to facilitating scholarship on this enormous corpus by enabling access to it, developing research tools, fostering research projects and communities, and providing additional resources, such as enhanced metadata and indices, that will help scholars exploit the HathiTrust corpus more easily. The presentation will outline the mission and goal of the HTRC, progress toward this goal to date, current and planned projects, and ways in which scholars can work with and through the HTRC.
Specific issues and related projects that will be discussed include:

(1) Development of approaches and tools for “non-consumptive research,” that is, research on corpora of documents that does not involve reading the documents. Of the 10.2 million volumes in HathiTrust, 62% are subject to the copyright laws of the United States and other countries, and the remaining 38%, which are in the public domain, are subject to Google terms of access. HTRC has entered into legal agreements with HathiTrust and Google that can provide computational access to the volumes as a service to researchers. In addition, HTRC researchers are currently developing computational tools for non-consumptive research that allow text mining and other tools to be applied to the copyrighted materials of the HathiTrust corpus in ways that do not violate fair use terms.

(2) Development of a tool suite for analysis of HathiTrust data through collaborations with the SEASR group and Project Bamboo.

(3) Access to high performance computing resources for analysis of HathiTrust data.

(4) Specific HTRC research project exemplars.

(5) Governance and community building.