A quick guide to transliterating Arabic, Persian or Urdu on your computer

Scholars in the West who rely on sources in languages written in Arabic script (such as Arabic, Persian, Ottoman Turkish or Urdu) often need to write that script in a transliterated or romanized form, if only to search library catalogues. This post offers a quick guide to transliterating or romanizing languages written in Arabic script. Transliteration and romanization are used interchangeably here to designate writing Arabic characters in Latin characters.

1. Transliteration systems

Transliteration and romanization systems are based on adding diacritic marks to Latin characters to render letters and sounds that don’t exist in English. Numerous transliteration standards are available (ALA-LC, ISO, IJMES, for example), which can be confusing, but the most important thing is to be consistent once you have chosen a system. It is also important to note that each language, even if written in Arabic script, has its own transliteration system. Most North American libraries use the ALA-LC (American Library Association and Library of Congress) romanization tables, whereas a number of European libraries use the ISO 233 transliteration standard. Knowing the differences between ALA-LC and ISO 233 will help you search library catalogues much more efficiently. Lastly, some journals and publishers have their own transliteration system which they require authors to use: knowing which standard a specific publication uses will often make it much easier to use.

2. Diacritic marks

The main challenge with romanization is the consistent encoding of letters with diacritic marks. Using a consistent encoding standard will ensure that the marked letters display properly regardless of the document format, type of device, or operating system you are working on. Inconsistent encoding alters the text: letters turn into different, often illegible, signs.
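As a rough illustration of how a letter "turns into different signs", the short Python sketch below saves the letter ḥ in one encoding and reads it back with another. The choice of Latin-1 as the mismatched encoding is an assumption made for the example; real-world mismatches can involve other legacy encodings.

```python
# Illustration only: Latin-1 is an assumed "wrong" encoding, chosen because
# it maps every byte to some character and so never raises an error.
original = "\u1e25"                  # the letter h with dot below

stored = original.encode("utf-8")    # the three bytes written to disk: E1 B8 A5

garbled = stored.decode("latin-1")   # read back with the wrong encoding
restored = stored.decode("utf-8")    # read back with the right encoding

print(repr(garbled))                 # three unrelated characters
print(restored == original)          # the letter survives when encodings match
```

The bytes themselves are never damaged; only their interpretation changes, which is why opening the file again with the right encoding usually recovers the text.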

3. Encoding standard

The computing standard for consistent encoding of non-Latin scripts is Unicode, whose characters are stored using one of the Unicode Transformation Formats (UTF). Developed since the early 1990s by a not-for-profit consortium whose members include large computing companies (Adobe, Apple, Google, IBM, Microsoft, Oracle) and governmental agencies, Unicode is regularly amended to include more characters. At present, it covers about 150 scripts, including the Arabic script used for Arabic, Persian, Ottoman Turkish and Urdu, and the Latin letters with diacritics needed for their romanized forms. Several UTF encodings are available, but the most commonly used are UTF-8 (in particular for HTML web documents) and UTF-16 (especially for text documents in both Windows and macOS environments).
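To make the difference between UTF-8 and UTF-16 concrete, here is a minimal Python sketch that counts how many bytes the same word occupies in each. The helper name byte_lengths and the sample word are chosen for illustration only.

```python
def byte_lengths(text: str) -> dict:
    """Count characters and encoded bytes for a piece of text."""
    return {
        "characters": len(text),
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte order mark so that only the text is counted
        "utf-16": len(text.encode("utf-16-le")),
    }

romanized = "Q\u0101j\u0101r"                        # "Qajar" with a macron on each a
arabic_script = "\u0642\u0627\u062c\u0627\u0631"     # the same word in Arabic script

print(byte_lengths(romanized))
print(byte_lengths(arabic_script))
```

In UTF-8, plain ASCII letters take one byte while the letters with macrons and the Arabic letters used here take two each; in UTF-16 every character in these two words takes two bytes. Both encodings represent exactly the same characters, so the choice affects file size and compatibility, not the text itself.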

4. Typefaces (fonts)

To display letters encoded in Unicode, you need to use one of the typefaces that support the relevant Unicode characters, such as Arial Unicode MS on PCs, and Times New Roman, Helvetica or Lucida Grande on a Mac. If they are not among the default typefaces available on your computer, these fonts can easily be downloaded for free from the internet.

5. Transliterated letters input

Once you have a typeface compatible with Unicode, you need a tool for entering characters and diacritic marks. Because regular keyboard layouts cannot accommodate key combinations for all characters with diacritics, operating systems offer alternative methods: the Microsoft Windows Character Map and the Extended Accent Codes for Mac will give you access to the entire repertoire of Unicode characters.
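For readers comfortable with a little scripting, the same repertoire can also be reached programmatically. The sketch below uses Python's standard unicodedata module to fetch a letter by its official Unicode name and to build one from a base letter plus a combining mark; the helper names by_name, code_point and compose are hypothetical, introduced only for this example.

```python
import unicodedata

def by_name(name: str) -> str:
    """Return the character that carries this official Unicode name."""
    return unicodedata.lookup(name)

def code_point(ch: str) -> str:
    """Format a character's code point the way character maps show it."""
    return f"U+{ord(ch):04X}"

def compose(base: str, mark_name: str) -> str:
    """Attach a combining mark to a base letter, then normalize to NFC
    so the result is a single precomposed character when one exists."""
    return unicodedata.normalize("NFC", base + unicodedata.lookup(mark_name))

a_macron = by_name("LATIN SMALL LETTER A WITH MACRON")
h_dot = compose("h", "COMBINING DOT BELOW")

print(a_macron, code_point(a_macron))
print(h_dot, code_point(h_dot), unicodedata.name(h_dot))
```

Normalizing to NFC after composing keeps the text consistent with letters typed through a character map, which are normally inserted in precomposed form.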

6. Additional information

The Arabic Macintosh website is a very valuable resource for Mac users interested in transliterating the Arabic script. The Digital Orientalist dedicated a lengthy post to keyboard layouts in both macOS and Windows environments.

Women’s Worlds in Qajar Iran Digital Archives

Women’s Worlds in Qajar Iran (WWQI) is a digital archive of materials related to the lives of women during the Qajar era, inclusive of the period immediately preceding and following the dynastic period (1786-1925). The goal of WWQI is to address a gap in scholarship and understanding of the lives of women during the Qajar era.

“Given the dearth of available primary-source materials related to women in the Qajar era, it is not surprising that, to date, the vast majority of Qajar social histories have focused almost exclusively on the struggles, achievements, and day-to-day realities of the men of that period. This is in part a matter of expediency; while men’s writing have been easily accessible in various national archives for decades (and many have in more recent years been published in edited volumes), most women’s writings, photographs, and other personal papers have to date remained sequestered in private family hands.”

WWQI aims to open up the documented social and cultural histories of Qajar women, thus allowing for the examination of broader patterns of life during this era.

The materials included in the archive are not only those contained in private archives and manuscripts but also published materials from the Middle Eastern Collection in Widener Library and other institutions. They consist of:

  • Writings: letters, prose, poetry, travel writings, essays, periodicals, and diaries
  • Legal documents: wedding contracts, dowry documents, settlements, endowments, powers of attorney, wills, sales, and other financial contracts
  • Artworks: calligraphy, painting, embroidery, weaving, other handicrafts, music, and film
  • Photographs
  • Everyday objects
  • Oral histories

You can begin your search by clicking on either “Collections” or “Browse”. All roads tend to lead to the search engine, where you can refine your search with keywords and filters.

The website uses the Elasticsearch full-text search engine, which supports both English and Persian language-specific searches. While the results should be consistent, their relevance ranking may vary slightly.

The website also includes a research platform which puts students and scholars in collaborative conversations and generates innovative scholarship on the cultural history of the Qajar period, focused on the lives of women and issues of gender and sexuality.

To learn more about how the Archive generates the digital holdings, see the documentary essay by Nicole Legnani, commissioned by the Office of the Digital Arts and Humanities at Harvard University.

The Harvard University Library (HUL) central infrastructure accommodates all image, text, and audio materials collected for this archive. All WWQI materials can be accessed through the following Harvard University Library catalogues as well: Visual Information Access (VIA) system and HOLLIS Catalog.

Fihrist: Union Catalogue of Manuscripts from the Islamicate World

FIHRIST is an evolving union catalogue for 11,015 Islamic and other Middle Eastern manuscripts.

The collective holdings of the contributing UK libraries are of substantial intellectual and cultural significance. All contributing libraries have been selectively collecting manuscripts from all subject areas and of various geographical origins, dating from the 7th to the 19th century CE.

“FIHRIST is a free on-line catalogue for manuscript descriptions.

FIHRIST is not a digital Library”

FIHRIST developed from a pilot project between Oxford and Cambridge to become a UK-wide union catalogue. The catalogue is constantly growing in volume, as libraries and research projects contribute manuscript descriptions.

The union catalogue provides basic and advanced search options. One can search in English, Arabic or Hebrew by using the additional keyboard in the search box. The advanced search offers more search options and a list of tips to improve the search results.

In terms of manuscript availability, “if a digital copy of a works exists on-line, a link is provided and maintained by the institution holding the manuscript. To request digital copies, or contact the institution directly, you may use the field Comment on this record at the bottom of every description.” The level of detail provided in each entry varies and changes over time as research progresses.

Sample of an Entry

The user can browse the catalogue by:

  • Classmarks (also called shelfmarks, classification number, etc.)
  • Works
  • People (personal names)
  • Subjects (basic LC subject headings)

While browsing, the limiters vary to best suit each category. For instance, if the user chooses to browse the catalogue by classmark, limiters such as language, century, physical form, materials, decoration, institution or collection are made available, whereas if the user chooses to browse by works, institution and language are the available limiters.