
How does the digital change the nature of historical research?

 

The digital has brought both advantages and disadvantages to historical research. This view is reflected by digital scholars such as Daniel Cohen and Roy Rosenzweig, who state: ‘Even the ancient discipline of history has begun to metamorphose…new media and technologies have challenged historians to rethink the ways that they research…the past.’[1] This essay will discuss five beneficial qualities of digital media and show how they have changed historical research: capacity, accessibility, flexibility, manipulability and interactivity. It will also discuss the disadvantages of the digital for historical research and why they arise, focusing on quality, readability and inaccessibility. Overall, the digital seems to have brought more advantages than disadvantages to historical research, so the weight of this essay falls on the positives and what they bring to the discipline of history.

The effect of the digital on historical research can be seen clearly in the increased storage capacity now available. Storing large amounts of information is not always possible in a museum or archive where space is tight, whereas digital storage allows for greater research possibilities. The amount of data in Google Books, for example, amounts to: ‘…2 trillion words from 15 million books…’[2] Cohen and Rosenzweig show just how important increased storage capacity is where they state: ‘…digital media can condense unparalleled amounts of data into small spaces. A 120-gigabyte hard drive that sells for $95…can hold a 120,000-volume library.’

Instant access to online archives for academics and amateur researchers alike shows just how much the digital has changed historical research for the better. For amateur historians and those interested in family ancestry, increased access to online resources and genealogy websites such as Ancestry permits historical research in the comfort of their own home. Family history accessed through the internet is ever increasing in popularity, and it can only benefit the discipline of history as a whole because of the increased amount of research taking place. Hence the benefits of increased access are far-reaching and ‘The instantaneous access to primary and secondary sources…will likely alter historical research and writing in ways that we haven’t yet imagined.’[3] Another benefit is the depth of historical research now possible when tracing generations of family members or groups who have migrated. Cohen and Rosenzweig illustrate this benefit where they state: ‘A genealogical web page can bring together the descendents of a family who started out in County Cork, Ireland, but later scattered to London, Toronto, San Francisco, Cape Town, and Melbourne.’[4]

The flexibility of digital data has changed historical research because large amounts of data can be recorded in databases or in Extensible Markup Language (XML) and then used for research purposes. This is useful because: ‘…digital information organized into databases or marked up in XML…can be instantly reordered or combined into new forms. Acting on the pieces in a database or XML document…computer programs can pull together disparate materials in a way that compares, contrasts, and enhances them.’[5] This adds to research by allowing historians to see long-term patterns or trends over a period of time. It would take much longer to pull together all of the research physically and analyse it, hence the digital benefits historical research in this case.
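As a minimal sketch of what "instantly reordered or combined into new forms" can mean in practice, the Python example below parses a handful of XML records, re-sorts them by date and then groups them by place. The element names (`record`, `name`, `date`, `place`) and all of the data are hypothetical, invented purely for illustration; they are not taken from any real archive.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical records: the tags and values are invented for illustration.
SAMPLE = """
<records>
  <record><name>Mary Byrne</name><date>1851</date><place>Toronto</place></record>
  <record><name>John Byrne</name><date>1823</date><place>Cork</place></record>
  <record><name>Ann Byrne</name><date>1874</date><place>Melbourne</place></record>
  <record><name>Tom Byrne</name><date>1849</date><place>Cork</place></record>
</records>
"""

def load_records(xml_text):
    """Turn each <record> element into a plain dictionary of tag -> text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in rec} for rec in root.findall("record")]

def sort_by_date(records):
    """Reorder the same records chronologically."""
    return sorted(records, key=lambda r: int(r["date"]))

def group_by_place(records):
    """Combine the same records into a new form: names listed under each place."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["place"]].append(rec["name"])
    return dict(groups)

records = load_records(SAMPLE)
for rec in sort_by_date(records):
    print(rec["date"], rec["name"], rec["place"])
print(group_by_place(records))
```

The same four records are read once and then presented in two different ways without retyping anything, which is the kind of flexibility the quotation describes.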

The digital also benefits historical research through manipulability: ‘…the possibility of manipulating historical data with electronic tools as a way of finding things that were not previously evident.’[6] This is similar to the case where the Google Ngram Viewer was used to uncover previously unseen information in long-term trends. An important element of historical research provided by the digital is text searching using Boolean search capabilities through databases such as JSTOR.[7] JSTOR and other journal databases are somewhat problematic, though, because they charge subscription fees which are not always affordable for historians. The advanced search options of websites such as the Old Bailey Online also allow for increasingly refined and detailed searches, producing better results and better historical research.
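To make the idea of Boolean searching concrete, here is a small Python sketch of AND, OR and NOT searches over a tiny collection. The documents and function names are invented for illustration, and this is not how JSTOR or the Old Bailey Online actually implement their search; it only shows the logic behind the terms.

```python
import re

# Hypothetical mini-collection: the texts are invented for illustration.
DOCUMENTS = {
    "doc1": "Trial for highway robbery of a horse near London",
    "doc2": "Letter describing a theft of bread in Cork",
    "doc3": "Report on a robbery and a murder in London",
}

def words(text):
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search_and(*terms):
    """Documents containing every one of the terms."""
    return sorted(d for d, t in DOCUMENTS.items() if all(x in words(t) for x in terms))

def search_or(*terms):
    """Documents containing at least one of the terms."""
    return sorted(d for d, t in DOCUMENTS.items() if any(x in words(t) for x in terms))

def search_and_not(include, exclude):
    """Documents containing one term but not the other."""
    return sorted(
        d for d, t in DOCUMENTS.items()
        if include in words(t) and exclude not in words(t)
    )

print(search_and("robbery", "london"))
print(search_or("theft", "murder"))
print(search_and_not("london", "murder"))
```

Combining terms in this way narrows or widens a search far faster than reading through each document by hand.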

Digital history encourages interactivity between academics and amateurs, students and teachers through online blogs and websites. This allows for collaboration, debate, discussion and feedback on historical research and findings. Social media such as Twitter is also a very popular online forum where historians and amateurs can share research and discuss issues about digital history, among other things. One digital collaborative project between Google Books and a mathematician named Erez Lieberman Aiden has led to a breakthrough in digital research. Instead of having to gain copyright permission from all of the authors on Google Books (which is a consistent problem), the young scholar was able to take all of the data from the n-gram database. In his article John Bohannon states:

‘The researchers have revealed 500,000 English words missed by all dictionaries, tracked the rise and fall of ideologies and famous people, and…identified possible cases of political suppression unknown to…’[8]

 

In conclusion, the digital has much to offer the discipline of history and has changed the nature of historical research for the better. Its benefits include greatly increased storage capacity for academic and amateur researchers alike when storing research, resources and data. The possibilities of digital storage are wide and include not only text and paper documents but also images and scans of pictures, or even primary and secondary sources, as seen on the British Museum website. The digital also offers unrivalled accessibility for research and for the discipline of history as a whole; arguably nothing else gives comparable access to research materials and resources. This is reflected in the rise of online blogging and websites set up by historical researchers, as well as in the growing popularity of genealogy among amateur historians and those interested in their family ancestry.

The digital also brings increased flexibility to historical research, with the possibility of using XML or putting information into a database and using it to discover long-term trends or patterns. This kind of research would not be possible without the digital because of the large amounts of information which would need to be pulled together and analysed; in this area the digital has changed historical research for the better by opening up new methods of research and new results. This can be seen in the case of the Google Ngram Viewer and its research possibilities. The possibilities for manipulating data to benefit historical research are similar. Websites such as the Old Bailey Online can be designed with extensive search engines which use Boolean phrases or advanced search to enable more thorough or definitive results. Google has much to offer in this respect, giving researchers the possibility to research using Images, Maps, Translate, Books and Blogger, among other things. The digital also benefits both academic and amateur historical research because it encourages online interactivity, offering scholars the ability to further their research and knowledge through a process of feedback and debate.

Despite bringing many positives to historical research, the digital also has its flaws. In terms of quality, not all digital resources are completely reliable. Wikipedia, for example, can be edited by anyone, so it is not a dependable source for scholarly research which needs to be footnoted. In terms of inaccessibility, some digital resources require a subscription to access them. This can be limiting for historical researchers or amateurs who may be unable to afford the cost of accessing the journals. Overall, though, the digital has many positives and has improved the possibilities within historical research. This is reflected in: ‘The unprecedented number of sessions focusing on digital scholarship at the 126th Annual American Historical Association…indicates that historians are active participants in a digital revolution promoting…open and collaborative scholarship.’[9]


Bibliography

Books

Cohen, Daniel J. and Rosenzweig, Roy, Digital History (Pennsylvania, 2006).

 

Journal Articles

Cohen, Daniel J., ‘The Difference the Digital Makes’, Journal of Digital Humanities, Vol. 1, No. 3 (2012), pp. 1-2.

Galarza, Alex, Heppler, Jason and Seefeldt, Douglas, ‘A Call to Redefine Historical Scholarship in the Digital Turn’, Journal of Digital Humanities, Vol. 1, No. 4 (2012), pp. 1-5.

Torget, Andrew J. and Christensen, John, ‘Mapping Texts: Visualising American Historical Newspapers’, Journal of Digital Humanities, Vol. 1, No. 3 (2012), pp. 1-4.

 

Internet Resources

 

Bohannon, John, ‘Google Opens Books to New Cultural Studies’, Sciencemag.org, http://dericbownds.net/uploaded_images/Science-2010-Bohannon.pdf; consulted 1st March 2014.

Cohen, Daniel J., ‘Initial Thoughts on the Google Books Ngram Viewer and Datasets’, http://www.dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/; consulted 1st March 2014.

 

‘Collection Online’, The British Museum, https://www.britishmuseum.org/research/collection_online/search.aspx; consulted 22nd April 2014.

Sullivan, Danny, ‘When OCR Goes Bad: Google’s Ngram Viewer & the F-Word’, Search Engine Land, http://searchengineland.com/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181; consulted 28th February 2014.


[1] Daniel J. Cohen and Roy Rosenzweig, Digital History (Pennsylvania, 2006), p. 2.

[2] John Bohannon, ‘Google Opens Books to New Cultural Studies’, Sciencemag.org, http://dericbownds.net/uploaded_images/Science-2010-Bohannon.pdf; consulted 1st March 2014.

[3] Cohen and Rosenzweig, Digital, p. 4.

[4] Cohen and Rosenzweig, Digital, p. 5.

[5] Cohen and Rosenzweig, Digital, p. 5.

[6] Cohen and Rosenzweig, Digital, p. 7.

[7] Cohen and Rosenzweig, Digital, p. 7.

[8] Bohannon, ‘Google’, p. 1.

[9] Alex Galarza, Jason Heppler and Douglas Seefeldt, ‘A Call to Redefine Historical Scholarship in the Digital Turn’, Journal of Digital Humanities, Vol. 1, No. 4 (2012), pp. 1-5.


Reflections on Online Contribution for the Digital History Module

Student participation in online discussion through resources such as Studynet and Twitter is particularly valuable in this module. Firstly, these resources provide an arena for debate and discussion on the module topics, as well as the opportunity to clarify and confirm ideas and knowledge delivered in the weekly workshop. They allow students to show their own understanding of the course material and the views they may have on the historical debate surrounding digital history.

The online discussion elements of the Digital History module offered the opportunity to better understand the course material through interaction with peers. The discussion format via Studynet was helpful because it gave the chance to see the conclusions that other students had come to and to compare these with my own views. However, the format of the discussion meant that once someone had made the main points about the weekly reading or preparation, it seemed as though I would be repeating what they had said. It was for this reason that I felt less confident about commenting on the Studynet page.

I felt somewhat more confident in using Twitter for online discussion, perhaps because there was not too much opportunity to write something which could be argued against. I find it relatively difficult to critique the work of others and it was for this reason that I did not use Studynet very much. Twitter gave me the opportunity to connect with historians, such as Tim Hitchcock, and to tell him that I enjoyed his article on the Open Scholarship Project. This approach seems of much more benefit to historians than current publishing methods and Open Access. Current methods seem outdated and expensive for historians, as well as limiting access to their work. Being able to access peer reviews which are usually unavailable seems a positive step forward, which could even add more depth to historical debate and improve the resources available to students or researchers.

Although I found it difficult to take part in the online discussion forums mentioned above, I can see why they are of benefit to students and the module as a whole. It seems similar, to some extent, to the way in which historians peer review each other’s works. If I had contributed more it would have been possible to gain feedback on the information that I have gleaned from the module, thus allowing me to improve my own knowledge of digital history in general.

 

Google Ngrams Viewer: How good is it really?

Whether you are technologically minded or not, Google Books Ngram Viewer is a valuable digital tool. It is simple to use and easy to understand. The Ngram Viewer uses Big Data which has been collected from Google Books and puts it into simple graphs, as seen below. This information enables historians and other academics to find patterns or long-term trends through data mining.

Image source: Google Books Ngram Viewer

What does it do? And how accessible is it?

Basically, the Ngram Viewer uses graphs to visualise how often phrases appear in the Google Books collection over a particular period of time. (For a definition of an n-gram, see the Dictionary.com entry in the bibliography.) Search dates range from 1500 to 2008, and it allows you to search across twenty-two different corpora in languages including British and American English, Russian, Chinese and Hebrew. Once the user has searched for their key words, a graph shows how their use changes over the set period of time. The dated hyperlinks below the graph take the user to Google Books and the selected books related to their search, as seen here:

Image source: Google Books Ngram Viewer
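The idea behind such a graph can be sketched in a few lines of Python: split each year's text into n-grams (runs of n consecutive words), count how often the search phrase occurs, and divide by the total number of n-grams for that year. The two-year corpus below is invented for illustration, and the simple share-of-all-n-grams measure is an assumption made for this sketch rather than a description of Google's exact method.

```python
from collections import Counter

# Hypothetical corpus (year -> text), invented for illustration only.
CORPUS = {
    1900: "the detective solved the case and the detective went home",
    1950: "the inspector and the detective solved the case",
}

def ngrams(text, n):
    """Return every run of n consecutive words in the text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency(phrase, corpus):
    """For each year, the share of that year's n-grams which match the phrase."""
    n = len(phrase.split())
    result = {}
    for year, text in sorted(corpus.items()):
        grams = ngrams(text, n)
        counts = Counter(grams)
        result[year] = counts[phrase.lower()] / len(grams) if grams else 0.0
    return result

print(relative_frequency("the detective", CORPUS))
```

Plotting these yearly values as a line is, in essence, what the graph shows: a rising line means the phrase makes up a larger share of what was printed in that year.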

If the user clicks on the link at the top right-hand side of the graph they are able to embed that data onto their own website. That is pretty nifty! To top that, Google allow their Ngrams users to download the Big Data from their site and use it in personal research. All that Google ask is that the use of their work is attributed to them.

So, for someone who is usually more at home in a library than on a digital search engine, I am impressed…so far. In terms of visual design for accessibility, Google Ngrams is appealing because of its simple yet colourful graphs and easy-to-navigate layout. It is possible to manipulate the data, to some extent, through the search terms, and you can take the Big Data away and use it for your own purposes. It’s a takeaway without the calories and no washing up! Basically, they do all of the work for you. Now, that can’t be bad, can it?

How do they do it?

Optical Character Recognition or OCR is used by Google Books to digitize books and make the data available on Ngrams. A basic definition is below:

Image source: TechTerms.com

There are, however, some flaws associated with the use of OCR. These include:

  • Accuracy rates are not 100% – a combination of manual and OCR transcription may produce better results. Danny Sullivan, an expert on search engines, talks about this problem in ‘When OCR Goes Bad: Google’s Ngram Viewer & the F-Word’ (see bibliography).
  • Images are also affected – Image colour and detail may not be as precise when using OCR. Manual scanning may take longer but is possibly more cost-effective. Examples of mistakes can be seen below:

Image source: The Art of Google Books

Image source: The Art of Google Books

  • The case of the f-word – The long or medial s (ſ) has proven to be something of a problem for OCR when it is used to scan older source material: the letter is often read as an f, so words are partly or completely misread. Danny Sullivan highlights these problems in ‘When OCR Goes Bad: Google’s Ngram Viewer & the F-Word’ (see bibliography).
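The medial s problem can be illustrated with a short Python sketch. It first imitates the OCR error by turning every long s (ſ) into an f, and then attempts a naive repair by swapping each f back to s and checking the result against a word list. The word list and sample sentence are invented for illustration, and this is only one simple way such errors might be caught, not a description of how Google Books handles them.

```python
from itertools import product

LONG_S = "\u017f"  # the medial or "long" s: ſ

def simulate_ocr(text):
    """Mimic the common OCR error of reading the long s as an f."""
    return text.replace(LONG_S, "f")

def repair_word(word, vocabulary):
    """Try swapping each 'f' back to 's'; keep a spelling found in the vocabulary."""
    if word in vocabulary:
        return word
    positions = [i for i, ch in enumerate(word) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        letters = list(word)
        for pos, ch in zip(positions, choice):
            letters[pos] = ch
        candidate = "".join(letters)
        if candidate in vocabulary:
            return candidate
    return word  # no known spelling found, so leave the word untouched

# Hypothetical word list and sentence, invented for illustration.
VOCABULARY = {"the", "case", "was", "lost"}
printed = "the ca\u017fe was lo\u017ft"

scanned = simulate_ocr(printed)
repaired = " ".join(repair_word(w, VOCABULARY) for w in scanned.split())
print(scanned)
print(repaired)
```

Note that the misread words here ("cafe", "loft") are themselves real English words, which is exactly why this kind of error can slip through unnoticed and distort word counts for older books.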

A further aspect of Google Ngrams is its Extensible Markup Language or XML. XML shows the bare structure of what a document is made of. When creating an XML document, the person building the resource defines their own tags and so controls what information is available to the search engine. In comparison, an HTML document uses a fixed set of predefined tags which describe how a page is displayed rather than what its data means. For further comparison see ‘Introduction to XML’ at w3schools.com (in the bibliography). In Google Ngrams the XML schema has less to offer than, for example, that of the Old Bailey Online. This may be a consequence of the types of data used in each digital resource. The data in Google Ngrams Viewer is formed of groups of letters, as seen here in a sample:

Image source: Google Ngrams Viewer

By contrast, the XML schema in the Old Bailey Online is much easier to follow. The search terms, for example, are easier to understand:

Image source: The Old Bailey Online
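To show why descriptive tags make a resource easier to search, the Python sketch below compares a search on a tagged field with a plain text search. The markup is hypothetical: the tag names (`trial`, `offence`, `verdict`) and the three records are invented for illustration and are not the Old Bailey Online's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration; not the real Old Bailey schema.
SAMPLE = """
<trials>
  <trial id="a1"><offence>highway robbery</offence><verdict>guilty</verdict></trial>
  <trial id="a2"><offence>theft</offence><verdict>not guilty</verdict></trial>
  <trial id="a3"><offence>theft</offence><verdict>guilty</verdict></trial>
</trials>
"""

root = ET.fromstring(SAMPLE)

def trials_with(tag, value):
    """Ids of trials whose <tag> element contains exactly this text."""
    return [t.get("id") for t in root.findall("trial") if t.findtext(tag) == value]

def plain_text_matches(word):
    """Ids of trials whose text merely contains the word somewhere."""
    return [t.get("id") for t in root.findall("trial") if word in "".join(t.itertext())]

print(trials_with("verdict", "guilty"))
print(plain_text_matches("guilty"))
print(trials_with("offence", "theft"))
```

The tagged search returns only the trials with a guilty verdict, whereas the plain text search also picks up "not guilty", which is one reason meaningful tag names lead to more precise results.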

Copyright has been something of an issue for Google Books. Although it may impact the Ngram Viewer and its content in one way, this resource has made it possible to download the Big Data and use its XML schema for quantitative research. This removes messy legal problems and is discussed in John Bohannon’s article ‘Google Opens Books to New Cultural Studies’.

Overall, Google Ngram Viewer has a lot to offer historians. It allows them to see patterns or trends in data over a longer period than would be possible if they were researching through traditional methods. It stores a vast amount of data in a small space which can be accessed immediately. Finally, it offers historians a simple and manageable tool in the emerging and sometimes complicated discipline of Digital History, as Dan Cohen discusses in ‘Initial Thoughts on the Google Books Ngram Viewer and Datasets’.

Bibliography

 

‘Big Data definition’, PCMag Encyclopedia, http://www.pcmag.com/encyclopedia/term/62849/big-data; consulted 1st March 2014.

Bohannon, John, ‘Google Opens Books to New Cultural Studies’, Sciencemag.org, http://dericbownds.net/uploaded_images/Science-2010-Bohannon.pdf; consulted 1st March 2014.

Cohen, Dan, ‘Initial Thoughts on the Google Books Ngram Viewer and Datasets’, http://www.dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/; consulted 1st March 2014.

‘Data Mining definition’, Dictionary.com, http://dictionary.reference.com/browse/data+mining; consulted 1st March 2014.

Google Books Ngram Viewer, https://books.google.com/ngrams; consulted 27th February 2014.

‘Google Ngrams Dataset’, Google Books Ngram Viewer, http://storage.googleapis.com/books/ngrams/books/datasetsv2.html; consulted 28th March 2014.

‘Google “Sherlock Holmes” search’, Google Books Ngram Viewer, https://www.google.com/search?q=%22sherlock%20holmes%22&tbs=bks:1,cdr:1,cd_min:1910,cd_max:1977&lr=lang_en; consulted 27th February 2014.

‘Introduction to XML’, w3schools.com, http://www.w3schools.com/xml/xml_whatis.asp; consulted 1st March 2014.

‘Ngram definition’, Dictionary.com, http://dictionary.reference.com/browse/n-gram; consulted 1st March 2014.

‘OCR definition’, Techterms.com, http://www.techterms.com/definition/ocr; consulted 27th February 2014.

Sullivan, Danny, ‘When OCR Goes Bad: Google’s Ngram Viewer & the F-Word’, Search Engine Land, http://searchengineland.com/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181; consulted 28th February 2014.

The Art of Google Books, http://theartofgooglebooks.tumblr.com/; consulted 28th February 2014.

The Old Bailey Online,‘Violent Theft, highway robbery: reference no. t16740909-6’, version 7.1, http://www.oldbaileyonline.org/browse.jsp?id=t16740909-6&div=t16740909-6&terms=horse#highlight; consulted 28th March 2014.

 

Comparison of historical internet sites

Through this blog post I hope to show the elements below in relation to the following websites:

  • The Old Bailey Online
  • Connected Histories

The elements which will be searched for and discussed are:

  1. What is the catalogue system like? How is it structured?
  2. How detailed is the search page? Does it enable an advanced search?
  3. Is the XML schema visible?
  4. Is there an API? Is it possible to manipulate the data?
  5. Is OCR or manual transcription used?
  6. Is the website easy to use? Is it visually appealing?
  7. In terms of data size is the site manageable for historians?

The Old Bailey Online: 

This website enables access to transcribed trials from the Old Bailey in London, from the period 1674-1913. As well as being transcribed in detail, each trial, once opened, offers the opportunity to access the actual document via a link on the top right-hand side of the page. The website is fairly easy to navigate, although there is a lot of information to get through, and it is well laid out, colourful and interesting as a whole. The overall data size is 197,745 trials, which is quite substantial for a historical resource. The XML schema is available at the bottom of the page on each trial and it is possible to search for statistics via the advanced search engine.

The search engine itself has a lot to offer its users. The Old Bailey can be searched through a combination of verdict, punishment, time period, first name and surname, and by using Boolean terms or advanced search. This offers the website’s users increasingly refined access to the trials available, and saves time if users know what they are looking for, and perhaps even if they do not. The Old Bailey Online also has an API link which goes into a lot of detail, although this information is somewhat difficult to understand unless you are familiar with the terminology used.
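For readers new to the term, an API is a way for a program, rather than a person filling in a form, to send a search to a website and receive structured results back. The Python sketch below shows the general shape of such an exchange. Everything in it is hypothetical: the address `https://example.org/api/search`, the parameter names and the canned reply are invented for illustration and are not the Old Bailey Online's real API. No request is actually sent.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not the Old Bailey's API.
BASE_URL = "https://example.org/api/search"

def build_query(**criteria):
    """Combine search criteria into one URL, dropping any that are left blank."""
    filled = {key: value for key, value in sorted(criteria.items()) if value}
    return BASE_URL + "?" + urlencode(filled)

# A canned reply standing in for what such a service might send back.
CANNED_RESPONSE = '{"total": 2, "hits": ["t-0001", "t-0002"]}'

def parse_response(body):
    """Read the structured reply into a count and a list of record ids."""
    data = json.loads(body)
    return data["total"], data["hits"]

url = build_query(offence="theft", verdict="guilty", surname="", year_from=1700, year_to=1750)
total, hits = parse_response(CANNED_RESPONSE)
print(url)
print(total, hits)
```

The search criteria are the same kind a user would enter on an advanced search page; the difference is that a program can send thousands of such queries and collect the replies for statistical analysis.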

The trials in the Old Bailey Online have been transcribed through two methods. Some have been transcribed by hand, by five data developers from the University of Sheffield. The second method combined a system called GATE, developed at the University of Sheffield, with work by HRI Digital at the Humanities Research Institute, also at the University of Sheffield. Combining these methods means that each can be used to check the other, so they may offer a more accurate final product.

Connected Histories:

This is an online historical resource which is advertised on the Old Bailey Online. It provides information on British history from 1500 to 1900. Those who put together the Old Bailey Online have collaborated with the Institute of Historical Research to bring together twenty-two online resources into one search engine. Unlike the Old Bailey Online, not all of this material is free to access, hence a subscription is required for some of the sites.

Access to information on the Connected Histories project is found on a small link at the bottom of the home page, whereas the Old Bailey Online makes the detail of its project development much easier to find.  As the site brings together various other works, no new material was transcribed for Connected Histories. However, the use of OCR and manual checking is described and the failures of OCR are highlighted. API and XML are used but again this section of information is hard to get to grips with unless the language used is familiar to the reader.

Connected Histories offers statistics, as the Old Bailey Online does, but the search engine is simpler. This may be because of the wealth of different resources the site uses. There are sections which allow the user to narrow a search to secondary sources, ephemera, images, maps and so on, so this search engine still has a lot to offer. It is well laid out, with each of the twenty-two individual resources available to access on a scrolling bar. It seems that it is also possible to save and share information from Connected Histories.

Overall, both sources are interesting, of good quality and of use to historians and anyone else who wishes to access them. The subscription required by Connected Histories may be off-putting for all except serious researchers who are looking for something in particular. It has been difficult getting to grips with the technical language used in digital resources such as these and I look forward to gaining a better understanding of it all.