Memoir of a chart of the East Coast of Arabia – Project 1

In my post about crowdsourcing I mentioned a digitisation project we worked on as a class. We made accounts on the site 18thConnect, and together we digitised Memoir of a chart of the east coast of Arabia (1764). This manuscript is an account of a journey and a description of an old map. To access the site one needs to create an account with a password, and can then immediately start editing any text that is available on the site.

This manuscript, written back in 1764, is first read by OCR software and then opened in TypeWright, the site's editing tool. An editor, like me and my classmates, then has access to every line that the software has identified. These lines include many ink spots, paper folds and other marks mistakenly read as text, which should be removed so that they are not included as data. There are also spelling mistakes, and sometimes letters that are hard to read. This is the first level of editing that we had to do. Many other additions to the text and its metadata followed. As my final project I decided to finalise the document and prepare it for online publication.

In the following paragraphs I will describe the process, some problems and some solutions, and reflect more generally on the making of such a publication.

Turning the Memoir into a TEI XML file requires many additions to the original text as well as correcting the OCR’s mistakes. This format is associated with some specific tags, similar to those in HTML but more specific to literature. In order to add more information about the layout of the text, the fonts, and the size and purpose of the characters, I started using the TEI Guidelines. They are very complex because they deal with texts in enormous detail. There are notations for verse structures, rhymes, play structure, character information and many, many others. There are tags that give incredible amounts of metadata about the information in the text. In order to annotate a text properly in this manner one must know the specific tags, much like learning a programming language. I am far from competent with the many functions of the language, so my current edition of the Memoir isn’t very rich in metadata. However, diving into the depths of TEI showed me how much there is to learn and what precision of information one can achieve. I found that TEI is not very widely used outside the fairly small Digital Humanities community. I also noticed that many of the examples on the site are given in logographic scripts, which suggests involvement from East Asian countries.

Throughout the whole text there is a spelling peculiarity that comes from the age of the English in which it was written. Often, where a modern text would have “s”, the author uses the long “s” (ſ), which is easily mistaken for an “f”, according to the spelling of the time. In order to provide an accurate “translation”, editors have to replace the old letter with its modern equivalent to keep the meaning. Sometimes there are simply spelling mistakes where the OCR couldn’t read the text because of a blur, and we had to correct them as well. These were minor things that were fixed a couple of days after we started. There were also many symbols, such as pictograms of anchors, degree notations and fractions, that aren’t traditionally included on a keyboard. Those are initially noted with “@” and later replaced with the right symbol following the TEI Guidelines.

(Image: anchor pictogram)
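The long “s” replacement described above can be sketched in a few lines of Python. This is only an illustration, not how TypeWright works internally: the function name and the sample phrase are invented, and it assumes the transcription actually contains the long “s” character (Unicode U+017F). Where the OCR has read a long “s” as an ordinary “f”, a human still has to decide which letter was meant.

```python
LONG_S = "\u017f"  # the long s character, written "ſ"


def modernise_long_s(line: str) -> str:
    """Replace every long s in a transcribed line with a modern round s."""
    return line.replace(LONG_S, "s")


# Invented sample phrase, not a quotation from the Memoir.
sample = "the Ea\u017ft Coa\u017ft of Arabia"
print(modernise_long_s(sample))  # prints: the East Coast of Arabia
```

Lines without a long “s” pass through unchanged, so the function can be run safely over a whole page of text.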

Punctuation and emphasis such as italics, bold and underline are entered manually to correspond to the original text. This task first requires a closer look at the text and its meaning. Adding the different descriptive tags amounts to a further reading of the text on the editor’s part. The process has to be done attentively, first in order to copy the text and second to transfer the meaning correctly. Here the editor has some freedom of interpretation, but mostly the obligation to transfer the author’s intent fully. The same goes for page layouts. They should be recorded so that the original arrangement is kept even in digital format and can be recreated. These minor but important additions relate to the typography of the text. With time, understanding and interpretations change, so keeping as much as possible of what was intended is important. The software, TypeWright, keeps track of all changes made to each line, with the username, the time and the change made. This information is valuable and is kept in the XML file, because it bears directly on the meaning of the text.

I have used the tag “hi rend=“italics”/“bold”/etc” to mark text. The position of certain pieces of text I have marked with “hi rend=“center”/“right”/“left””. These tags are specific to TEI; however, I have also used other tags, such as “p” for marking a paragraph, that are typical of HTML but are used in TypeWright as well. Mistakes can creep into all of these small insertions that the editor makes, or at least there are enough different ways of writing them that differences may occur. For example, I know that not all commas that follow an italicised piece are italicised. Sometimes I would put the closing tag “/hi” before an additional comma because I hadn’t seen it. There are many minor variations like these, which come either from the inconsistency of one editor or from differences between many editors with different styles. I would also add that even in this text I don’t think I have managed to capture quite all of the elements on the page. There are lists, different indentations and stories within stories that I haven’t marked. This is partly because I don’t feel very comfortable using all of the TEI notations and partly because it requires a lot of time and screen staring.
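To make the comma problem concrete, here is a small Python sketch that builds a paragraph containing TEI’s “hi” element with a “rend” attribute, once with the comma outside the italics and once inside. It uses only Python’s standard xml.etree.ElementTree module. The function names and the sample sentence are invented for illustration, and the “italics” value simply follows the convention I describe above.

```python
import xml.etree.ElementTree as ET


def paragraph_with_italics(before: str, italic: str, after: str) -> str:
    """Build <p>before<hi rend="italics">italic</hi>after</p> as a string."""
    p = ET.Element("p")
    p.text = before
    hi = ET.SubElement(p, "hi", {"rend": "italics"})
    hi.text = italic
    hi.tail = after  # text that follows the closing </hi> tag
    return ET.tostring(p, encoding="unicode")


def plain_text(xml_fragment: str) -> str:
    """Strip the markup and return only the words of the fragment."""
    return "".join(ET.fromstring(xml_fragment).itertext())


# Invented sample sentence, not a quotation from the Memoir.
comma_outside = paragraph_with_italics("reported by ", "Captain Smith", ", who")
comma_inside = paragraph_with_italics("reported by ", "Captain Smith,", " who")

print(comma_outside)  # <p>reported by <hi rend="italics">Captain Smith</hi>, who</p>
print(comma_inside)   # <p>reported by <hi rend="italics">Captain Smith,</hi> who</p>
print(plain_text(comma_outside) == plain_text(comma_inside))  # True
```

Both versions read identically once the tags are stripped, which is exactly why this kind of inconsistency between editors is easy to miss.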

Having transcribed the text and all of the information contained on the page, the editor can add annotations, comments, footnotes, and any other type of tag to provide additional interpretations. In my edition of the Memoir I have not completed this phase. My interactions with the text end with the input of the actual information.

However, this is where academic interest begins, along with the need for more precise knowledge of TEI. The Memoir can be described in great detail, as there are some recurring characters such as Captain Smith, and there are many locations. I think that given the nature of the text – a description of an old map and a memoir of a traveler – it would be very appropriate for someone to georeference the locations and see if the distance estimates are correct. As we noticed during the GIS day, the coasts around the Arabian Peninsula change often, and it would be interesting to see the changes since the text was written about 250 years ago. Moreover, there are instances of interaction with groups living near the coasts. There are lists of stocks, plans for trip provisions, and accounts of valuables that the Captain encountered.
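Checking a distance estimate once two places have been georeferenced could look something like the sketch below. It uses the standard haversine formula for great-circle distance on a sphere. The function names are invented, the Earth is treated as a perfect sphere with a mean radius of 6371 km, and no coordinates from the Memoir are included, since I have not georeferenced any of its locations.

```python
import math

EARTH_RADIUS_KM = 6371.0      # mean Earth radius; a spherical approximation
KM_PER_NAUTICAL_MILE = 1.852  # exact by definition


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def km_to_nautical_miles(km: float) -> float:
    """Convert kilometres to nautical miles."""
    return km / KM_PER_NAUTICAL_MILE


# Sanity check: one degree of latitude along a meridian.
one_degree_km = great_circle_km(0.0, 0.0, 1.0, 0.0)
print(round(one_degree_km))                        # roughly 111 km
print(round(km_to_nautical_miles(one_degree_km)))  # roughly 60 nautical miles
```

A distance given in the text would first have to be converted into a modern unit before it could be compared with the computed value, and a sailing route is usually longer than the straight great-circle line between two points.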

The user interface of the site 18thConnect cannot be left without a comment, because in such crowdsourcing initiatives the experience for the user is very important. The site doesn’t look too welcoming from the home page, but the bigger issues appear when editing starts. The portion of the original text that is visible is very small, which would make sense if the space were given over to editing instead. But it isn’t. The OCRed text appears in a very small box below the original, which is not very easy to navigate with a mouse, as you have to click on the next line in order to start editing it. The problem is that the keyboard shortcuts provided for moving between lines, inserting lines, deleting and submitting don’t work properly. Most of the time when I have used them the site hasn’t responded, and even after clicking the “Insert line” button the page doesn’t refresh and no result shows. Not to mention that if you are using iOS the shortcut commands are completely inaccurate.

This is not meant to discourage you from editing texts; it is just a note to the creators. There is a certain satisfaction in contributing to the world that I associate with this finished digitisation and with others I have completed. Although small, it is a lasting impact that one can make on the world, and it will be recorded in the metadata of the work and remembered for as long as it exists. I think it is fulfilling for people to spend some time on a text, because it triggers both internal and external change.

NOTE: To anyone interested in this particular text, I should mention that I know of at least two instances where something must be added, but I am unable to do so.
Top of page 8 needs to have a line with: “hi rend=“center”” (2) “/hi” added.
On page 9, line 23: is a Boat “founded” or “sounded”?
Page 4 needs a table added.

Participants:

rindh.hosting.nyu.edu

whataboutlife.hosting.nyu.edu

themultilingualmuslimah.hosting.nyu.edu

digitalhumanties.hosting.nyu.edu

mlmidh.hosting.nyu.edu

AND Professor Wrisley in his course https://wp.nyu.edu/ahcad139/

Crowdsourcing

With the internet available to a large part of the human population, and larger than ever before, crowdsourcing has become a great way of gathering data and information. Given the sheer size of the Internet there is a lot of work to be done, and drawing on the newer social and psychological sciences to gather data directly from people, without too much involvement on their part, is a great way to get things done. Wikipedia is the biggest example of such a project. Everyone uses it as a source of information when wondering about something, and it is entirely the creation of users contributing information. Of course, a certain amount of editing must be done, but that is a lot easier once you have a page written out.

It works for many things: open source programs, digitisation of texts, conducting studies, and sharing information about different places and events all around the Earth. Crowdsourcing lets anyone become a part of something bigger than themselves and accomplish more as a part of humanity.

In the case of entering text and working with digital texts as a whole, that is, in “digital humanities”, crowdsourcing is a very good way to enlarge the world’s shared heritage. With so many texts written before it was possible to spread them at great speed to all corners of the Earth, there is a great need for help that doesn’t require any special skills. Checking texts for mistakes made by software, scanning old books and so on is easy for everyone, and there is a need for it.

Here in Abu Dhabi there is a great language barrier, as Arabic is poorly read by computers and many mistakes are made. Also, in the context of NYUAD, we have a grand library collection of ancient Arabic texts that most of the world hasn’t yet seen. Making a catalog of these texts would be one way to increase their availability, and it could be done using crowdsourcing from NYUAD. On the other hand, with many students fluent in Arabic, we could even transfer these texts into a digital format that is accurate and useful to scholars all around the world.

This work would be hard and long, and it would put great pressure solely on the people who know Arabic, so relying on the best part of crowdsourcing – that it is free – will not be an option for this project. Some sort of incentive should be offered in order to find enough participants, but even then the numbers will be small, and so is the chance of students carrying the whole workload. This is one of the biggest problems faced by projects like this, which have a limited target crowd. So instead we could work out something that would help the Western world get to know the Arab world, rather than serving only those already interested in Arabic.

Another suggestion for a crowdsourcing project that could be carried out in the UAE, with the special help of NYUAD, is compiling a “yellow pages” for Abu Dhabi. A lot of tourist guides are posted on the internet, but for the UAE they don’t include sufficient information and not many people are writing them. One option is expats living here, but they have to attend to their lives and work, so they don’t have enough time to explore; another is journalists, who only come to visit; and the last and least likely is locals who live here. This is a country with a very closed society, so having some way to understand more about it, presented in a pleasant, user-friendly manner, would be a great project.

NYUAD students, and other students who live here because they attend international universities, are the best option for crowdsourcing on such a project: we have a lot of free time and travel around a lot, we try to find different forms of entertainment, and we also have different perspectives on what is “fun”, “interesting”, or “cheap”, which will allow a wider audience to gather. Sharing one’s adventures is a good motivation for people to participate, because people love to talk about themselves. If there is a unified form for presenting information about plans, rating them, and so on, the information will also be easily categorised (in fact, as it is entered). Such a project may face editing problems, because young people can sometimes be too impulsive and write things that are not accurate, so, just like Wikipedia, there will be a need for editors who will not only check the entries but also, if possible, visit the places to give a more informed statement on a place, event or series of events.