Institut für den Nahen und Mittleren Osten



The Arabic Papyrology Database Guidelines

The Arabic Papyrology Database Implementation Details

Guidelines for preparation and implementation of texts for the Arabic Papyrology Database (

1. Basics

1.1 What is a document?

1.2 Reeditions and emendations

2. Brackets and asterisks

2.1 Brackets

2.2 Lacunae

2.3 Symbols and checking marks

2.4 Other languages and scripts

3. Layers

3.1 Plain line السطر الاصلى

3.2 Plain words المڤرداپ

3.3 Words with full dots المفردات

3.4 Words with full dots and vowels الْمُفْرَدَاتُ

3.5 Transliteration

3.6 Lexicon / Lemmatisation

4. Standardforms and Crossreferences

4.1 Standardforms

4.2 Standardlemma

4.3 Invisible "mistakes"

4.4 Crossreferences

5. Decisions and reminders concerning vocalization / lemmatization

6. Instructions implemented into user interfaces

7. Obsolete instructions

1. Basics

1.1 What is a document?

If one sheet carries two texts written on different occasions, we tend to create two or more datasets, even if the editors publish them under the same title. We always separate texts in the case of reused scrap paper, where the two texts are not related at all; a letter and its reply, or a debt and a receipt related to it, will also be separated. Marginal notes which are relevant for a text’s function, like witness signatures, registration marks etc., are not separated. The mention of a new date in the second text is an important criterion for separating two texts. We label the two texts a) as the editor does, b) recto and verso if applicable, c) „primary use“ vs. „secondary use“, d) with small letters a, b, c. The same label appears in the „name“ of the edition and in the inventory number.

1.2 Reeditions and emendations

In general, we enter the editio princeps as basic text in the APD. Only if the editio princeps is unreliable (or skips parts of the text), we might treat the reedition as basic text. Editions published before 1863 may be completely ignored, unless they offer useful variant readings. For various reasons, we may occasionally replace the reading of the editio princeps by an emendation, and treat the editio princeps like a variant (d.Sept.21).

Team members may also implement their own corrections in a limited number of cases, mainly typographical errors of the edition. We assume a typographical error if a) the translation contradicts the edition b) the word or phrase is current c) the editor quotes the correct text in a footnote d) the plate is very clear. In these cases, we just type in the correct text, giving a remark or even without a remark. If a team member has a divergent scientific opinion, but this has not been published yet, we insert it as a variant with our member shortname.


2. Brackets and asterisks

2.1 Brackets

We use the system used by Diem and Khan in their recent publications. (d.17)

  • Single square brackets [ ]: text written by the scribe, but disappeared since, and completed by the editor.‬
  • Double angular brackets « »: Uncertain reading. (Unicode 00AB and 00BB)
  • Angular brackets < >: text omitted by the scribe, and completed by the editor.
  • Double square brackets 〚〛: erasures, deleted by the scribe (Unicode 301A and 301B)
  • Curly brackets { }: text written by mistake. In case a misspelt word (in curly brackets) was corrected by the scribe himself, the correct word follows the misspelt one without brackets. In that case, also give a remark. (d.61)
  • Round brackets / parentheses (): the solution of an abbreviation. The abbreviation itself is marked by curly brackets.

Please note: [[Erased words]] and {superfluous words} appear only in plain line السطر الاصلى - i.e. we do not list them in all layers nor are they lemmatised. (d.July06). In case of abbreviations, the abbreviation is given only in plain line, its solution is given in all layers and lemmatised, e.g. P.GrohmannProbleme 4.4 (d.Dec.06). In lists or accounts, crossing out a phrase is considered as a kind of checking mark, so the text is given in all layers. (d.Nov.07)
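The rule that erased and superfluous words appear only in plain line can be sketched as a small filter. This is a minimal illustration, not the APD's actual import code: the function name `strip_plain_line_only` is hypothetical, and the sketch does not handle the exception for crossed-out phrases in lists or accounts, which are kept in all layers.

```python
import re

# Erased text 〚…〛 (U+301A / U+301B, also typed as [[…]]) and text in curly
# brackets {…} (superfluous words, abbreviations) belong to "plain line" only.
# Assumption: brackets are not nested.
_PLAIN_LINE_ONLY = re.compile(r"〚[^〛]*〛|\[\[.*?\]\]|\{[^}]*\}")


def strip_plain_line_only(plain_line: str) -> str:
    """Drop erased/superfluous spans and normalise the remaining whitespace."""
    stripped = _PLAIN_LINE_ONLY.sub("", plain_line)
    return " ".join(stripped.split())
```

For an abbreviation, the curly-bracket part disappears while the solution in round brackets is kept, which matches the statement that only the solution is carried into the other layers.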

2.2 Lacunae

Destroyed or undeciphered letters, often represented by dots or blank spaces in a printed edition, are represented by asterisks in the database. If the approximate number of lost letters is known, we insert the same number of asterisks. Three asterisks may also stand for a gap of unknown length. If the lost word is a name, we do not insert „fulān“, even if the editor does so, but *** with a word category „n.“ and the domain „prop.pers.“, displayed in brackets: „n. (prop.pers.)“. (d.2, d.Dec.10)

2.3 Symbols and checking marks

Symbols and checking marks are represented by asterisks, just as undeciphered text. They will be differentiated by a remark to „plain words“ / المڤرداپ, like „place holder (dot)“ or „symbol (cross)“. They will be assigned a lemma with the word category „mark“ or „symbol“ respectively. Word categories are searchable in the tools „Text“ and „Lexikon“.

2.4 Other languages and scripts

  • Foreign text is represented by asterisks *** in all layers. The language is specified in a remark referring to „plain line“ / السطر الاصلى, e.g. "line in Greek" or "phrase in Judaeo-Arabic" (d. Sept. 10/Oct. 14). We do not create lexicon entries for foreign text. Instead, everything will be lemmatized with asterisks, the lemma-supplement indicating the language: 101 for Greek, 102 for Coptic, 103 for Judaeo-Arabic, 104 for Hebrew, 105 for Hebrew in Arabic script, 106 for Aljamiado, 107 for Castilian/Catalan words in Latin script, 108 for Latin in Arabic script, 109 for Latin, 110 for Persian, 111 for Turkish in Arabic script, 112 for Uighur script. (d. Oct. 14, update Nov. 17). Single words may be inserted in the Arabic layers, but will be transliterated as asterisks.
  • Foreign languages in Arabic script (e.g. „ego testis“) will be vocalized tentatively and transliterated, but the text will still be assigned to the lemma ***.
  • Greek numerals are replaced by the sign ÷ in the Arabic layers. They are transliterated by the number given by the editor, except that we use + instead of a space when dealing with fractions, e.g. 1 ½ will be transliterated as 1+1/2. Numerals are not lemmatised at all. Occasionally, you may find Greek numerals in the Arabic layers, mainly in texts entered before Oct. 2012.
  • In case of documents with a contemporary translation on the document itself (e.g. Arabic document with Latin interlinear translation on the same document), refer to this translation in "long remarks" and give references within the document, if possible (e.g. line 3: line in Latin). However don't offer this "translation" as translation of the document (in field translation) as it is not done by an editor and we cannot be sure about its status, being a "real" translation or not. (d.Jan.13)
  • For loanwords, we create Crossreferences (see below).
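Two of the conventions in this list are mechanical enough to sketch in code: the lemma-supplement codes for foreign text, and the transliteration of numerals with fractions. The names `LEMMA_SUPPLEMENT` and `transliterate_numeral` are hypothetical, and the set of fraction characters covered is an assumption; only the codes and the example 1 ½ → 1+1/2 come from the guidelines.

```python
# Lemma-supplement codes for foreign text, as listed above.
LEMMA_SUPPLEMENT = {
    101: "Greek",
    102: "Coptic",
    103: "Judaeo-Arabic",
    104: "Hebrew",
    105: "Hebrew in Arabic script",
    106: "Aljamiado",
    107: "Castilian/Catalan words in Latin script",
    108: "Latin in Arabic script",
    109: "Latin",
    110: "Persian",
    111: "Turkish in Arabic script",
    112: "Uighur script",
}

# Assumption: editors print fractions with these single characters.
_FRACTIONS = {"½": "1/2", "⅓": "1/3", "⅔": "2/3", "¼": "1/4", "¾": "3/4"}


def transliterate_numeral(editor_number: str) -> str:
    """Spell out fraction characters and join the parts with '+' instead of spaces."""
    for char, spelled in _FRACTIONS.items():
        editor_number = editor_number.replace(char, spelled)
    return "+".join(editor_number.split())
```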

3. Layers

3.1 Plain line السطر الاصلى

  • Dotless bā' , nūn, yā' etc. are represented by a bā' with three dots ( پ , normally combination Alt+F). (d.23) Dotless qāf or fā' are represented by a qāf or fā' with three dots ( ڤ , normally combination Alt+T). (d.23)
  • Final qāf, fāʾ and nūn will be represented by dotted letters in this and the following layer, because their shape is unambiguous. (d.8) [I am not sure whether "following layer" is clear to outsiders]
  • Punctuation of final nūn, qāf and fāʾ: give a reference in Plain Words, either "nūn with a dot above", "qaf with two dots above" or "fāʾ with a dot above". For "unusual" punctuation see below. (d.Aug.06)
  • Columns in a table (e.g. accounts) are marked by vertical bars | (alt+7 on the Swiss keyboard) in layer plain line (and only there). In case of text that is arranged in columns see chapter 1.2.b). (d.Dec.06)
  • In case we need to insert tatwīls to fix the position of brackets, asterisks etc., we give the tatwīl with a space (d.Sept.10), see

3.2 Plain words المڤرداپ

  • What we cannot display, we give as a comment to plain words. E.g. qāf with one dot above, sīn with dash above, šīn with three dots in a row above, ṭāʾ without vertical stroke, yāʾ with two dots above (considered as punctuated yāʾ). Alif with three dots below is considered as a kind of ornament. (d.49)
  • In case of misplaced dots, add a reference, e.g. "dot of ǧīm misplaced under yāʾ" (P.Cair.Arab. 71.20). (d. June 10)
  • If final qāf, fāʾ or nūn is punctuated, give a reference.
  • If the vast majority of final nūns, qāfs and fāʾs are dotted, give a remark in the metadata-field "Remark(s)": "Final nūn, qāf and fāʾ dotted until otherwise stated". (d.Feb.12)
  • If a whole document is punctuated in the maġribī way, give a remark in "long remark": Dotting of qāf and fāʾ: maġribī (d.Feb.12)
  • Words over two lines are assigned to the first line from layer plain words on. (d.25)

3.3 Words with full dots المفردات

  • Tašdīd and hamza are considered as equivalent with dots, so we display all tašdīds, hamza and madda in this layer. E.g. الشّيخ with tašdīd, الإسكندريّة with hamza.
  • We print dots on tāʾ marbūṭa and final yāʾ, regardless of whether the editor does or not.
  • We add all the hamzas needed to match the spelling given by Hans Wehr, e.g. براءة قبض واستيفاء. Words like ماية or بير will be replaced by مائة and بئر respectively, unless the yāʾ is dotted in the original. In that case, there will be two dots in all arabic layers. (the transliteration will then be bīr and miyaẗ).
  • In the rare case of floating hamza, we insert an additional tooth in the layers المفردات and الْمُفْرَدَاتُ, e.g. P.Marchands II 26:6 (d.June06).
  • 3.3.7 Problem of širāʾ / širà and similar cases: If both forms are admitted by Wehr, and both forms are attested, we use the form ending in -āʾ as lemma. If the rasm allows it, we add hamza from layer full dots on. In case of alif maqṣūra, no hamza is added, but we still link the string to the lemma with hamza. (d.July06) Create a crossreference (see below).

3.4 Words with full dots and vowels الْمُفْرَدَاتُ

  • Completion of alif superscriptum only in this layer. (d.March06, abolishment of d.13) We write alif superscriptum without fatḥa, except in Spanish documents where the fatḥa occurs in the original. In this case, we give the fatḥa in all layers (with alif superscriptum in this layer), and replace it by an ā in layer latinized. (d.Feb.12)
  • We insert "Hilfsvokale" (Fischer §53) at the end of words, like in اشْتَرَتِ الْمُبَارَكَةُ، مِنَ الْمَنْزِلِ but leave ʾalif al-waṣl unvocalised (d.Sept.21)

3.5 Transliteration

  • Vowel of alif al-waṣl: ă, ĭ, ŭ (d.12). This is the only point where the transliteration is not a mechanical transformation of the previous layer.
  • alif maqṣūra in context if written as a tooth without dot: à. E.g. akrà + hā: akràhā. If written as alif mamdūda: ā. (d.50)
  •  ‭‫عافوك‬ instead of ‫عافاك‬ is transcribed as ʿāfā ka, parallel to ‫صلوة‬. (d.Oct.06)‭
  •  "Hilfsvokale" after nunation like in muḥammad:ini ăn-nabiyyi have to be added manually after upload, because we cannot type them in Arabic script. We will not add them anymore (d.Sept.21, abolition of d.Feb.06)
  • Undeciphered Arabic words are nevertheless transliterated. An unpunctuated tooth is represented as "B", unpunctuated fāʾ or qāf as "F". (P.Köln.Kauf. :5). (d.Feb.06). Letter ح in an undeciphered word is transliterated ḥ and not Ḥ. (d.Sept. 10)

3.6 Lexicon / Lemmatisation

3.6.1 Search hint

Beginners are strongly advised to search this level. You will find all forms relating to the lemma, e.g. a search for the lemma „kitāb“ will also find all instances of „kutub“. There are also crossreferences for orthographical variants, e.g. if you search for "miʾaẗ", the results will include a suggestion to search also for "miyaẗ".

3.6.2 Composed lemmata

  • For composed place names like ʿayn šams, personal names like ʿabd ăl-lāh, the numbers from 11 to 19 (see corresponding chapter), titles like ʾamīr ăl-muʾminīna, we create separate lexical entries/lemmata. The strings belonging to composita will be assigned to the composed lemma only, i.e. the transliteration ʿayni will be linked only to the lemma "ʿayn šams" n. (prop.loc.). As composed lemmata have no inflection, there will be no standardforms. (d.Sept.21, abolition of previous guideline, where ʿayni would have been linked both to "ʿayn" n. [common noun] and "ʿayn šams" n. (prop.loc.))
  • The components of the kunya (i.e. ʾabū so-and-so) will only be lemmatised once. (d. Oct. 10). However, the second element of the kunya will be treated as a personal name, which has another lexical entry than the common noun.
  • The article will not be lemmatised in this way any more, e.g. ʿabdi ăl-muʾmini: only ʿabdi and muʾmini are linked to the lemma "ʿabd ăl-muʾmin", the article is lemmatised as article (d.May06, updated Sept.21)
  • Abolishment of (d.Oct.14): The roots of each component will be given with a comma in between (plus space after the comma). (d.July06) Replaced by: No roots are given for lemmata with multiple components (d.Oct.14).
  • The expression as a whole will be assigned the form ø (d. Aug. 14)
  • A composed word is something more than its elements. It often has a specific meaning which cannot be inferred from its elements. The domain may change from common noun to proper name, or, as is the case for numbers 11-19, their inflection is unusual. If none of this applies, we do not create a new lemma. Lists of titles like 'supporter of the religion, honour of the scholars' are not considered as lexical entities (d. Sept. 21, replacing: Double lemmatising only for non-trivial compositions).
  • Composed expressions with particles, like kaḏa, mimman, ʾinnamā are treated like one word. That means that they are not even split in the layer "Transliteration".
  • Currently (August 21), composed lemmata will show up twice in the search results, because both elements have been linked to the expression.
  • Give a crossreference (see Crossref, Type 3) (d.July06)

3.6.3 Shape?

4. Crossreferences, standard forms and standard lemmata

4.1 Standardforms

We use "standardforms" to tag strings that deviate from Classical Arabic as described in Fischer's grammar. See Hopkins to get an idea of what Middle Arabic might look like. Some developments can easily be represented this way, for example:

  • The omission of ʾalif tanwīn, or hyper-correct use of ʾalif tanwīn, e.g. "dīnār - ( -> dīnār:an -“. No vowel, not even sukūn, in layer الْمُفْرَدَات.
  • Gender issues in numbers
  • The word ʾabū frozen in nominative
  • Indifferent use of ăllaḏī
  • For duals and regular plurals, the nominative is often replaced by the acc./gen. form, e.g. dīnārayni instead of dīnārāni, muzāriʿīna instead of muzāriʿūna. The opposite, i.e. hyper-correct forms, also occurs.
  • Imperative written with prefix-verbform, i.e. taktub ʾilay-ya instead of ŭktub ʾilay-ya, will be displayed as follows: kataba, → ŭktub (, i.e. with a xxx "standardform" ŭktub and a "standardformform" imperative (d.April11).
  • Some strings are „wrong“, but this does not affect their syntactical function. A frequent example are forms of ĭbn where the ʾalif is dropped, although it is expected (Fischer §22b), as in ġilyālima bni ʾaḫī-nā 'William, the son of our brother', see
  • The same is true for non-contracted apocopate-forms: e.g. taqūm instead of taqum (lam taqūm la-nā) in P.Hamb.Arab. II 7, line 7. We put a sukūn at the end of taqūm and assign the form The xxx "expected string" is taqum with no "expected inflection". (d. April 11)
  • A double lemmatising is possible if a word is "wrong" concerning different aspects, e.g. ʾabṭà instead of ʾabṭaʾà instead of ʾabṭaʾat. The first entry will thus be ʾabṭà with "expected string" ʾabṭaʾa and form "", and the second entry ʾabṭà with expected string ʾabṭaʾat, form "" and expected inflection "" Another example: taʿlamū instead of taʿlamūna instead of taʿlamna. First entry will be taʿlamū, standardform taʿlamūna, form,, standardformform The second entry will be taʿlamū, standardform taʿlamna, form,, standardformform July 06)

"Expected inflection" is currently not searchable (as of September 2021). You can search the form „“ without case, but this will also give you results from accounting, where no inflection is expected.

4.2 Standardlemma

  • Some colloquial expressions will be tagged with a separate lemma, i.e. ṯalāṯaẗ realized as talātaẗ (with dots) will get its own entry in the lexicon. This lemma will have a „standard lemma“, which is indicated in the lexicon section in round brackets, e.g. talātaẗ (→ ṯalāṯaẗ) in Note that the non-standard-lemma may exhibit another root than the standard-lemma, in this case tlt instead of ṯlṯ
  • A string assigned to a sub-standard lemma will be considered syntactically correct (in general), so there is no reference to a standardform.
  • To ease search, we create crossreferences between the classic and the colloquial word, e.g. from ṯalāṯaẗ to talātaẗ and vice versa. If you search for one of them in the TEXT-tool, in the Layer „Lexicon“, you will find a suggestion to search for the other at the end of the result list.
  • verba mediae geminatae being treated as verba tertiae infirmae, example: xxx
  • Composita may also have a standard lemma, e.g. ǧumādà ăl ʾawwal is considered a lexical alternative to ǧumādà ăl ʾūlà.
  • ṯalāṯaẗa ʿašara which is written as ثلاثة اعشر is transcribed as ṯalāṯaẗa ĭʿšara (d.August06) and assigned a separate lemma, with crossreferences (d.Sept.21)

  • In case of plene writing of short vowels: We retain them and add a corresponding lemma (with standardlemma and crossreferences). (d.Dec.10).

4.3 Invisible „mistakes“

Other issues cannot be described by looking just at the single string.

  • The use of for negation where is expected. You may occasionally find a xxx „standardform“ , but APD members get used to Middle Arabic after a while, so they do not consider it sub-standard anymore. [the effort of a correction is really only worthwhile if one is writing an article on the topic]

  • In النّصف دينار, vocalised as ăn-niṣfu dīnār:in, the status of niṣfu is determined where we expect constructus, but the string looks the same in both states. It will be lemmatised twice, once as and once as This is searchable: search for two different forms, and select „at the same word position“ (d.Sept.06).
 The construction النّصف الدّينار is treated as apposition, so no genitive, vocalisation ăn-niṣfu ăd-dīnāru. Niṣf is only determined in that case (d.Sept.06).
In analogy to niṣf, in الثّلاثة دنانير transliterated ăṯ-ṯalāṯaẗu danānīra, ṯalāṯaẗu will be lemmatised twice, once as and once as, danānīra will be lemmatised as, while الثّلاثة الدّنانير is treated as an apposition, i.e. ăṯ-ṯalāṯaẗu ăd-danānīru (d.Sept.06)
  • defective writing of long ā: we insert ʾalif superscriptum. يأبا and يأخي are handled as follows: put an alif superscriptum on the initial yāʾ and add a hamza on the alif. In layer latinized thus yā ʾabā and yā ʾaḫī. (May11). In all cases, the omission is invisible in the latinised string, and searchable only in the arabic script layers.
  • The number hundred is written in many different ways. From layer words with full dots on we write it either as مئة or مائة , i.e. we insert the hamza. E.g. مایتین we write as مائتین and ماتین as مائتین. If the alif is omitted in the Arabic script, we omit it also. If there are not enough teeth to add hamza, we will add a tooth in this only case. (d.May06) In layer transliteration, we remove the orthographical ʾalif, so the difference between مائة and مئة disappears.
  • Omission of ʾalif al-waṣl or lām of article: the article is in general transliterated as ăl- (or ăt-, ăṯ- etc.). When the lām is omitted, the ʾalif is transliterated as ĭ, like in ادّراهم : ĭd-darāhim. (d.Oct.06). When the ʾalif is omitted (also after li-), it is not transliterated, like in لشّيخ : li-š-šayḫi.

  • When the editor supplements a letter the scribe had omitted, like ʾalif tanwīn, this will be displayed in angular brackets in layer السطر الاصلى. In the other layers, the text will look regular.

  • In case the rasm does not allow the insertion of hamza, we will write it without, add the lemmaunit to the corresponding lemma with hamza, and give a standardform in editlemmaunit pointing to the form with hamza. (d.May06) [E.g. ʾabraw instead of ʾabraʾū: a standardform is less work than a standardlemma with crossreference, but a standardlemma would actually be more appropriate here, and in some cases such lemmata have been created.]


4.4 Crossreferences

  • Type 1: Loanwords, like dirham (from δραχμη): Users may type the Greek word (all lower case, without accent) in the TEXT-tool, and be guided to dirham. For Coptic loanwords, like ⲑⲱⲟⲩⲧ, we use the Coptic Unicode section. We create crossreferences for Greek and Coptic personal names when the editor gives a Greek/Coptic spelling in the footnotes. Persian crossreferences (e.g. bērōn) are written in Latin transcription. We also create crossreferences from modern/English varieties of a name, like Venice, Michael. We try to be generous with crossreferences; the etymological relation may not be verified in all cases.
  • Type 2: Lemmata with different spellings (like xxx), or lemmata mentioned together in one paragraph in Wehr (e.g. muḏ and munḏu or sa and sawfa): Reference from both lemmata to each other. The reference comment will be "see also".
 In case there are more than two options, we define a main lemma, and create references from the main lemma to the variants and from all variants to the main lemma. In case it is not clear which lemma may be the main lemma, we will take the most frequent one. This applies also to Greek names which have different spellings both in Greek and in Arabic.
  • Type 3: Lemmata which are a combination of two or more elements (e.g. ʿabd ăr-raḥmān, mimman, ʾinnamā): Reference only from its elements to the lemma. The reference comment will be "see also".
  • Type 4: Lemmata which we have not found in dictionaries and obviously seem to be non-standard (e.g. talātaẗ vs. ṯalāṯaẗ): Reference from the attested lemma to the standard lemma and vice versa.
  • Type 5: Users who are not familiar with our transcription may search for tārīḫ or ism instead of taʾrīḫ or ĭsm. We occasionally give crossreferences for that, especially if editors quote a spelling which is different from the one given in first place in Wehr (and adopted by us). (d.May06)
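The direction of the references for Type 2 and Type 3 can be sketched as follows. This is an illustrative model only: the tuple layout `(from_lemma, to_lemma, comment)` and the function names are assumptions, not the database schema, and the frequency counts in the usage note are invented for the example.

```python
SEE_ALSO = "see also"


def pick_main_lemma(frequencies: dict[str, int]) -> str:
    """When no main lemma is obvious, take the most frequent one."""
    return max(frequencies, key=frequencies.get)


def type2_references(main: str, variants: list[str]) -> list[tuple[str, str, str]]:
    """Main lemma -> each variant, and each variant -> main lemma."""
    refs = []
    for variant in variants:
        refs.append((main, variant, SEE_ALSO))
        refs.append((variant, main, SEE_ALSO))
    return refs


def type3_references(composed: str, elements: list[str]) -> list[tuple[str, str, str]]:
    """Only from the elements to the composed lemma, never the other way round."""
    return [(element, composed, SEE_ALSO) for element in elements]
```

With two spellings only, `type2_references("muḏ", ["munḏu"])` yields the mutual pair described above; with invented counts such as `{"a": 3, "b": 7}`, `pick_main_lemma` returns `"b"`.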

5. Decisions and reminders concerning vocalization / lemmatization

5.1 Numbers from 11 to 19

  • Each component is lemmatised twice, as single word and as expression. As single components the first one is lemmatised as acc.constr. and the second component as gen.indet. (In case of 12: The first part is lemmatised as nom., acc. or gen.) [to be replaced by: both elements are linked to the composed lemma. There will thus be no reference to standardforms in case of erroneous gender. For variants like talātaẗ ĭʿšara, we create separate composed lemmata, with crossreferences (d.Sept.21)]
  • Numbers from three to ten with Ta-marbūṭa are lemmatised as f. and those without Ta-marbūṭa as m.. (d.March06)
  • The number as expression is assigned the form ø, hence no gender and no standardform (d. Aug. 14, replacing d.August06 and d.Nov.12)
  • ʿašara which is written as اعشر is transcribed as ĭʿšara (according to Hopkins). (d.August06)
  • Be aware of the different vocalisation of عشر and عشرة, thus ʿašru banāt:in but ṯalāṯaẗa ʿašara raǧul:an and ʿašaraẗu riǧāl:in but ṯalāṯa ʿašraẗa bint:an. (July10)

5.2 Asyndeton (sentences without appropriate conjunctions)

Verb serialisation where the first verb is in imperfect and no subordinating morpheme is found: We assign to the second verb imperfect (following Fischer §188 where the case of perfect - imperfect is covered). E.g. "fa-ŭnẓurī ʾan taqūlī li-zawǧi-ki yaqbiḍu min-hu mirwaḥaẗ:an wa ..." (P.Marchands II 28, recto 6) (d.July06)

5.3 Names

  • names as Šanūdah, that carry apparently a tāʾ marbūṭa: We assign a tāʾ marbūṭa to the lemma, thus šanūdaẗ (d.47). However, within the Arabic text, the version of the editor is retained, whether it is šanūdaẗ or šanūdah, both of which are assigned to the lemma šanūdaẗ. (May11)
  • Names of males are always masculine and names of females feminine. Names are always det., even if they show nunation. There are however special cases, where a name can be in status constructus, e.g. mīnā baǧūš, Menas, son of Pegoš. (d.Aug.06)
  • n.prop.loc. are normally feminine, unless they contain an element that is not yet lexically empty and masculine, e.g. šubrā (monastery). (d.July06)
  • n.prop.pers. of the form fiʿāl/faʿīl/mufʿil (e.g. Ziyād, Karīm, Muskin) are always triptotic. Feminine n.prop.pers., of whatever form, are always diptotic. For a comprehensive overview of whether a given personal name is diptotic or triptotic, cf. Fischer, Grammatik § 152-153 in combination with Brockelmann, Grammatik (Porta) § 72 (d. Aug. 10); n.prop.pers. of the form faʿʿāl are triptotic (if all the consonants belong to the root e.g. Ḥassān:un and Ḥabbār:un but ʿaffānu (root ʿ f f)), those in faʿāl are diptotic (e.g. Ḥanānu). (d.Feb.11)
  • Names with Nisba without article are treated as if they were triptotic: thus ḥusayniyy:un or ḥiǧāziyy:un. (d.March07)
  • Non-Arabic names are marked as such. Thus, in translation, put e.g. "Artemidoros (Greek)", or "Mina (Coptic)". I.e. without special characters (accents) and with the language denomination after the personal name and in brackets. (d.July10)
  • When lemmatising n.prop.pers. we try to give roots (radicals) as far as possible. This concerns personal names of Semitic and non-Semitic (i.e. Coptic, Iranian etc.) origins likewise (d. Aug. 10)

5.4 Magic

  • Combinations of characters as magical expressions (e.g. yā sīn): Lemmatising as if it was a word. E.g. Lemma "ys", Root "ys", Word category "n.", Translation "Yāsīn", Lemmaunit "ys", Form: "ø". (see P.Bad V 147.1) (d.Oct.06)
    Abolishment of (d.Oct.14): Magical signs (i.e. not identifiable but peculiar figures with a particular shape), magical illustrations (such as dogs, scorpions) or well-known "magical signs" such as pentacles or hexagrams will be represented by ◦ (Unicode white bullet, 25E6) in the respective line. Give as many bullets as there are signs (e.g. P.Bad. V 160.7). If the exact number is not known or identifiable, put the approximate number. The APD user will thus notice that there are special signs in a given line. Additionally, give a reference in "plain line": "scorpion", "dog", "magical sign" etc. (e.g. P.Bad. V 162) (d.Oct.06)
    Is being replaced by: Refer to a symbol or handmark (like crosses, circles, stars, triangles) with an asterisk * (supplement 2, word category: symbol). Give also a remark in plain words. (d.Oct.14)

5.5 Reminders

  • Difference of "li-" and "la-": Lemma "la": la as a particle (often after ʾinna in front of the predicate, in the apodosis to law and law lā, particle of oath); Lemma li 1: li as preposition (after nominal elements); Lemma li 2: li as conjunction with conjunctive ("so that, in order to"); Lemma li 3: li as conjunction with apocopate (command or request, li-yaktub, wa-l-yaktub, fa-l-yaktub)
  • The imperative of ʾamara is mur, in case of وأمر wa ʾmur. (d.Oct.06)
  • The transcription of īy is iyy in all cases. (d.June15)
  • We consider rabīʿ as diptotic, thus fī šahri rabīʿa ăl-ʾāḫari. (d.Dec.10) This is because fī rabīʿa ăl-ʾawwali is easier to type than fī rabīʿ:ini ăl-ʾawwali, but this does not mean that other months are diptotic too.
  • Imperative written with prefix-verbform, i.e. taktub ʾilay ya instead of ŭktub ʾilay ya. Assign apoc.! (d.April11)
  • In case of weekdays (for example in a list) written as a combination of yawm + a Greek numeral letter, yawm will be lemmatised twice: once as yawm and once as the corresponding weekday, e.g. yawm ăl ǧumʿaẗ. However, the numeral letter will not be lemmatised. (d.Oct.12)

6. Instructions implemented into user interfaces

6.1 Bibliography

6.1.1 Long abbreviation

  • Name both editors, e.g. Gaubert/Mouton, P.Fay.Villages (monograph) or Tillier/Vanthieghem, Registre (article). In case of three or more authors, use „et al.“, e.g. „Falkenhausen et al., Tròccoli“. Use special characters. For Arabic authors publishing in western scripts, use the same spelling as the publication (e.g. Ragheb vs. Rāġib). For reviews, use the Sigle, e.g. Diem, Review P.Khalili I p. 50 (NOT Diem, Khalili I p. 50). The long abbreviation will be used in the fields „Edition(s)“, „Emendations“, „Translation(s)“, „Further literature“ and „Image(s)“. In case of doubt, check existing metadata, either on in „search anywhere“ or in Filemaker. Report or correct any inconsistencies.
  • The case of CPR III (1,...): Grohmann's CPR III volume was designed as the first one of several subvolumes. However, the other volumes were never realised. Therefore, we refer to the edited documents in a short way without the sub-volumes, e.g. CPR III 57 or CPR III 130. However, when we refer to a page within one of the volumes, we will state the subvolume, e.g. CPR III 1,1 p. 50. If we have to refer to editions by page numbers, fill "1,1 p. ..." into ln4addition. (d.Nov.15)
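The author part of the long abbreviation follows a simple rule that can be sketched as below. The function name `long_abbreviation` is hypothetical; the short title is passed in as given, since the guidelines do not define how it is derived.

```python
def long_abbreviation(authors: list[str], short_title: str) -> str:
    """Join one or two authors with '/', use 'et al.' for three or more."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) >= 3:
        names = f"{authors[0]} et al."
    else:
        names = "/".join(authors)
    return f"{names}, {short_title}"
```

For instance, `long_abbreviation(["Gaubert", "Mouton"], "P.Fay.Villages")` gives `Gaubert/Mouton, P.Fay.Villages`, the monograph example cited above.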

6.1.2 Abbreviation APD edition

The „name“ of edited documents, previously also the Sigle on the ISAP checklist for monographs, e.g. P.Fay.Villages or P.TillierRegistre. Your input will be immediately added to the pulldown „Name“ on It is important for all publications including editions to have this field filled correctly. New „Names“ can be created by senior members in FileMaker. No special characters, e.g. P.RemondonPapyrus 1, not P.RémondonPapyrus 1 (d. March 16). There is however one exception to this rule: P.KölnKauf. has retained its Umlaut because P.Köln is also used as abbreviation in Greek papyrology.

6.2 Metadata (Filemaker)

6.2.1 Inventory number

  • Use the abbreviations of the Collections-List on the APD site, even if the editor may use another abbreviation.
  • Cambridge: We insert the two numbers separated by a dot into the numeric and the non-numeric field, but without dot, e.g. P.Cam.inv. TS H 10 173 or P.Cam.inv.TS Ar. 53 59. On you will have to search for H10.173 or H10 173, and Ar.53.59 or Ar.53 59 respectively, i.e. without whitespace after the letter. (d.May16)
  • You may use the non-numeric field if a document consists of two or more fragments, e.g. P.Cam.inv. TS Misc. 29 23 flesh side + P.Cam. inv.TS Misc. 38.31
  • Collection abbreviations: In abbreviations created before June 2012, we occasionally used spaces, e.g. P.Anawati inv. Spaces are not allowed anymore, use dots instead, e.g. P.Gran.Bibl.Uni.inv. (d.June12).
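The two search spellings for Cambridge shelfmarks can be derived mechanically from the stored form. The sketch below is based only on the two examples given above (TS H 10 173 and TS Ar. 53 59); the function name and the assumption that every shelfmark consists of a series label followed by exactly two numbers are hypothetical.

```python
def cambridge_search_keys(shelfmark: str) -> tuple[str, str]:
    """Return the dotted and the spaced search form, e.g. 'H10.173' and 'H10 173'."""
    parts = shelfmark.split()
    if parts and parts[0] == "TS":
        parts = parts[1:]  # the collection prefix is not part of the search key
    if len(parts) != 3:
        raise ValueError(f"unexpected shelfmark layout: {shelfmark!r}")
    series, first, second = parts
    # No whitespace between the series letter(s) and the first number.
    return f"{series}{first}.{second}", f"{series}{first} {second}"
```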

6.2.2 Recto and verso

  • In general, recto is the side which was used first. If this is unclear, script perpendicular to the fibres is recto.
  • Flesh side is recto for parchments, hair side is verso.

6.2.3 Catalogue

Some classics like PERF, P.Ryl.Arab., P.Khalili II include both editions and descriptions of papyri. They may appear either in the "edition" field (or "further edition" if applies), or in the "Cat." field.

6.2.4 Title / Content

  • Fill in title of the document as given by the editor. In case of several editions or descriptions, the sequence of titles is as the sequence of editions, with a semicolon ";" in between.
  • Titles from catalogues like PERF, P.Haram come last.
  • In case of English titles with capital letters: write them with small letters, thus like a normal text.
  • In case of titles like „Papyrus I“, you may create your own title in English. No brackets or names, thus no distinction from titles given by English editors.

6.2.5 CE Date

  • If the date is unknown, insert
    • for (Arabic) papyri = 1.1.632-31.12.1000
    • for paper = 1.1.800-31.12.1517
    • unknown material = 1.1.632-31.12.1517
  • Date estimated by the editor, mostly on paleographical grounds
    • "9th century CE" = 1.1.801-31.12.900.
    • "Early 9th century" = 1.1.801-31.12.830
    • "End of 9th century" = 1.1.871-31.12.900
    • "Mid of 9th century" = 1.1.831-31.12.870
    • "About 900" = 1.1.871-31.12.930.
    • "3rd century AH" = same as 9th century CE.
    Important: In all these cases, leave the numeric Hiǧra dating empty.
  • If the text mentions a person that appears on another, dated, document: time slot of 30 years before and after the dated document.
  • Opistography: If a dated document has been recycled to write an undated document, give a time slot of exactly sixty years after the known date. If an undated document has been recycled to write a dated document, exactly sixty years before that date.
  • If the preserved text is a dated copy, e.g. of a legal instrument, the date of the copy counts, but both dates should be mentioned in the free text field.
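
The conventional time slots listed above can be computed rather than looked up. The following is a minimal sketch, not part of the APD software: all function names are hypothetical, and Python's `date` type is used only for year arithmetic and for printing in the day.month.year form used in these guidelines.

```python
from datetime import date

# Default ranges for undated documents, keyed by writing material.
DEFAULT_RANGES = {
    "papyrus": (date(632, 1, 1), date(1000, 12, 31)),
    "paper": (date(800, 1, 1), date(1517, 12, 31)),
    "unknown": (date(632, 1, 1), date(1517, 12, 31)),
}

# Offsets (in years) from the first year of a century for each estimate.
PARTS = {"whole": (0, 99), "early": (0, 29), "mid": (30, 69), "end": (70, 99)}


def century_range(century, part="whole"):
    """Range for 'Nth century CE', optionally narrowed to early/mid/end."""
    first_year = (century - 1) * 100 + 1
    lo, hi = PARTS[part]
    return date(first_year + lo, 1, 1), date(first_year + hi, 12, 31)


def about(year):
    """Range for 'about <year>', e.g. about(900) gives 871-930."""
    return date(year - 29, 1, 1), date(year + 30, 12, 31)


def shift_years(d, years):
    """Move a date by whole years (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def person_link(dated):
    """Undated text mentioning a person known from a dated document."""
    return shift_years(dated, -30), shift_years(dated, 30)


def written_after_reuse(dated):
    """Undated text written on a recycled dated document."""
    return dated, shift_years(dated, 60)


def written_before_reuse(dated):
    """Undated text whose sheet was later recycled for a dated document."""
    return shift_years(dated, -60), dated


def fmt(span):
    """Format a (start, end) pair as d.m.yyyy-d.m.yyyy."""
    s, e = span
    return f"{s.day}.{s.month}.{s.year}-{e.day}.{e.month}.{e.year}"
```

For example, `fmt(century_range(9, "mid"))` yields `1.1.831-31.12.870`, and `fmt(about(900))` yields `1.1.871-31.12.930`, matching the table above.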

6.2.6 CE end date

If the exact day is known, begin and end will be the same.

6.2.7 Date as on document

  • Leave empty if the text is not explicitly dated. Take the date from the text and not from the editor's description.
  • APD transcription for Arabic months.
  • If the dating in the text is damaged, make this clear by inserting 411-419 or 41[.]; or if complemented by the editor [3]10 AH.
  • Transcription of Coptic months according to Thomann's conversion tool.

6.2.8 Dating comment

  • If a date is mentioned that is not the date of redaction, e.g. taxes are paid for a specific year, choose "(ca.)", or for a debt with a delivery date mentioned, choose "(before)". The CE timespan will still be the same as if this was the redaction date.
  • If the text is not dated, but mentions a Fatimid vizier or an Ayyubid currency or similar, state it here, if possible quoting your sources.

6.2.9 Numeric Hiǧra Date (also Coptic, Spanish etc.)

  • These datings may be incomplete. For example, you often find a Coptic month combined with a Hiǧra year. The end-date is only needed if the dating is ambiguous, like "ǧumādà […] 324 AH" -> [leave empty].5.324 - [leave empty].6.324, or "41[.] AH" -> [leave empty].[leave empty].411 - [leave empty].[leave empty].419.
  • We treat ḫarāǧī years exactly like Hiǧra years, but mention it in the text field if something like „sanaẗ ḫarāǧiyyaẗ“ is mentioned by the text.
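
The expansion of a damaged year into begin and end values can be sketched as follows. This is an illustration only: the function names are hypothetical, `None` stands for "[leave empty]", and only a single lost final digit written as `[.]` is handled. Treating an editorially completed year such as `[3]10` as the plain year 310 is an assumption made for this sketch.

```python
import re


def hijra_year_range(text):
    """Return (first, last) year for a year string as given on the document.

    '41[.]' -> (411, 419)   one lost final digit, as in the example above
    '[3]10' -> (310, 310)   digit completed by the editor (assumption)
    '324'   -> (324, 324)   fully preserved year
    """
    lost_unit = re.fullmatch(r"(\d+)\[\.\]", text)
    if lost_unit:
        base = int(lost_unit.group(1)) * 10
        return base + 1, base + 9
    digits = text.replace("[", "").replace("]", "")
    if digits.isdigit():
        year = int(digits)
        return year, year
    raise ValueError(f"cannot interpret year: {text!r}")


def numeric_fields(year_text, months=(None, None)):
    """Build (day, month, year) triples for the begin and end date fields."""
    first, last = hijra_year_range(year_text)
    return (None, months[0], first), (None, months[1], last)
```

With these definitions, `numeric_fields("324", months=(5, 6))` gives `((None, 5, 324), (None, 6, 324))` for "ǧumādà […] 324 AH", and `numeric_fields("41[.]")` gives `((None, None, 411), (None, None, 419))`.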

6.2.10 Weekday (if mentioned by the text)

We consider weekdays to be the most reliable part of the dating, so adapt the CE date to the weekday if necessary.

6.3 Replace names of lines

  • In case a document is written on recto and verso, „recto“ and „verso“ are mentioned for all lines.
  • Take the numbering from the printed edition. This will allow users to find text in an authorized publication. You may add a description after the number, e.g. „10 right margin 2“, or „1 (upper left corner)“. Avoid technical terms like „ʿalāma“, give physical descriptions instead. (d.Sept.21)
  • Seals: Documents with seals that contain a text: Text is added to the text of the document. Name it "Seal" by means of the operation "Replace Name of Lines". E.g. P.GrohmannUrkunden 8. (d.Oct.06)
  • Allot a line number to each line of text in seal. E.g. (1) muḥammadu bnu (2) sulaymāna. (d.July10) [see page "replace linename"]

6.4 Edit a line: western translation

  • Gaps "[…]" for lost or undeciphered text are filled with dots, not empty spaces (d.48).
  • In case of different editions and thus more than one translation: Give a reference with the name of the editor to the not-primary translation, e.g. „Rāġib“. If it is evident from the languages who has written which translation, abstain from a reference (d.Jan.07).
  • In case of documents without translation by the editor: The APD-team processing the document may add a translation of its own but is not expected to do so. Give your member shortname (e.g. LS or UB) in the metadata (d.Jan.13).
  • Historic translations do not belong here, but if they are on the same sheet, they can be represented as „lines in foreign script“ (see section „Brackets and asterisks“) (d.Jan.13).

6.5 Edit a line: Variants and emendations

6.5.1 Variant readings with uneven number of words

An Arabic word may be read in different ways such that the variant reading does not have the same number of elements as the basic reading, for example لما as lammā or li mā:

  • First case: lammā as basic reading and li mā as variant reading: lammā on position 1 and li as variant reading also on position 1. Create a position two with an o with stroke "ø" (Unicode 00f8) for the basic reading to indicate its emptiness and add mā as a variant reading on position 2.
  • Second case: li mā as basic reading and lammā as variant reading: li on position 1 and lammā as variant reading also on position 1. Mā on position 2 and a variant reading with an o with stroke "ø" for the empty variant reading on position 2. (d.Oct. 06)
  • Variant readings with uneven number of words by different editors: Apply the rules above. This implies "empty" positions either in the main or variant readings. Give a reference to the empty position if it is the variant by the second editor. Give this reference only once. E.g. from P.Khalili I 18.10: Khan reads (ترتع{)تبارك وتعالى} whereas Diem reads ربع at the same position. Give تبارك with variant reading ربع، then write وتعالى with an empty variant reading. Give a reference to the empty position: Diem. Give this empty position also in the other layers, the reference, however, only on its upmost level. On the same line, you may also see the opposite example: Khan reads قابلا whereas Diem reads فأنا في . There is thus an empty slot in the main reading (without reference) and then the variants by Diem. (May11)
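
The padding rule above can be illustrated with a short sketch. It is not the database's own code: the function name is hypothetical, and it assumes that the empty position always falls at the end of the shorter reading, as in the examples given.

```python
EMPTY = "\u00f8"  # "ø", marks an empty position


def align_readings(basic, variant):
    """Pair basic and variant words by position, padding the shorter with ø."""
    length = max(len(basic), len(variant))

    def pad(words):
        return list(words) + [EMPTY] * (length - len(words))

    return list(zip(pad(basic), pad(variant)))


# First case: lammā is the basic reading, li mā the variant.
first_case = align_readings(["lammā"], ["li", "mā"])

# Second case: li mā is the basic reading, lammā the variant.
second_case = align_readings(["li", "mā"], ["lammā"])
```

Here `first_case` is `[("lammā", "li"), ("ø", "mā")]` and `second_case` is `[("li", "lammā"), ("mā", "ø")]`, i.e. one (basic, variant) pair per position.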

6.5.2 Variant readings of erased words

Text in [[double square brackets]] appears only in plain line. Therefore, if a second editor has a variant within the erased part, give it as a line reference to plain line. However, do not give the whole line but only one word before and after the variant. Give the variant with full punctuation, as it appears in the edition of the second editor. (d.May11)

6.6 Remarks and references

  • The r-button has two functions: We insert a) references to publications (for translations, variant readings) and b) remarks for things that cannot be represented otherwise, e.g. symbols, tašdīd below the letter, a „misplaced“ dot. In case of misplaced dots, add a reference, as e.g. "dot of ǧīm misplaced under yāʾ“, like in
  • A very frequent kind of remark will be for final qāf, fāʾ and nūn which are actually dotted on the plate. Write „final nūn with dot above“. If a large majority of the text is dotted, you may change strategy: State that the text is dotted in metadata (in the field „remarks“) and add remarks to the words that are NOT dotted. (d.Feb.12)
  • Take care not to use any colon (:) in the remark fields, use rather round brackets. E.g. instead of saying „Diem: xxxx“, say „xxxx (Diem)“. (d.Dec.14)
  • In case of quotation marks, don't use " " but ‹› (Unicode 2039,203a). (d.Dec.14)
  • Rarely, you may need to add a reference in the following case: → 6.5.2 Variant readings of erased words
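
The two formal restrictions on remark fields (no colon, no straight double quotation marks) lend themselves to a simple check before saving. The sketch below is a hypothetical helper, not a feature of the APD interface. It rejects colons rather than rewriting them, since turning „Diem: xxxx“ into „xxxx (Diem)“ requires a human decision.

```python
OPEN_QUOTE, CLOSE_QUOTE = "\u2039", "\u203a"  # ‹ and ›


def clean_remark(remark):
    """Reject colons and replace straight double quotes with ‹ ›."""
    if ":" in remark:
        raise ValueError(
            "colon not allowed in remark fields; use round brackets instead"
        )
    result = []
    opening = True  # straight quotes alternate between opening and closing
    for char in remark:
        if char == '"':
            result.append(OPEN_QUOTE if opening else CLOSE_QUOTE)
            opening = not opening
        else:
            result.append(char)
    return "".join(result)
```

For example, `clean_remark('read as "kataba" (Diem)')` returns `read as ‹kataba› (Diem)`, while `clean_remark("Diem: xxxx")` raises a `ValueError`.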

6.7 Lemmatise an unknown string

  • For Coptic names ending on -ah, see 5.3 Names
  • Plural of adjectives for non-humans: The adjectives concerning animals are often attested in plural. We consider to be the most correct form, so give a standardform (d.July12).
  • Collectiva: For collectives, which are marked by the domain "collective", the nomen unitatis is no separate lemma. For example, wardaẗ is just an occurrence of ward. The collective noun is assigned the form, its broken plural, the nomen unitatis gets the form its sound plural (d.Feb.06)
  • General negation: Assign acc.det. and NOT acc.indet. (see Fischer §318c). E.g. lā yada []. (d.Nov.10)

6.8 Create a new lemma

6.8.1 General

  • see also 5.3 Names
  • see also 3.6.2 Composed lemmata
  • qāḍī: lemmatised as qāḍī (not qāḍin) (d.51)
  • Facultative commentary: This field may be used for all kinds of internal comments that ease our work. It will never appear on the "surface" of the database. This field should be used if a lemma is attested neither in Wehr, Diem nor other dictionaries. Give a commentary like "not attested in the usual dictionaries". (d.May06)
  • The field "standardlemma" will be used only in cases where the lemma is of the mentioned type 4 in the list "Crossreferences". (d.May06)
  • Lemma supplements: In case of homonyms, assign the most basic meaning supplement 1, and the other meaning supplement 2 (or 3 or higher with several homonyms). This concerns especially personal names, which will be assigned supplement 2, and their homonym adjective, participle or noun supplement 1 (even though the personal name is much more often attested in the database). Don't be too hesitant in assigning supplements, thus distinguishing between different meanings of a word (it's easier to merge later on than to differentiate). To change a supplement, go to "list of lemmata with variants".

6.8.2 Shape

  • Shape is assigned formally.
  • Assign a shape if the word is formed by an Arabic root (thus no Coptic or Spanish loanwords that coincidentally follow one of the Arabic patterns). Assign ø if the word is clearly not formed by an Arabic root. If you are uncertain whether a certain word (especially a name) is of Arabic origin or not, or if you don't know what shape to assign, leave the field empty.
  • Double lemmata (consisting of more than one element) are not included.
  • Be aware of the underlying patterns in case of words with weak or doubled roots! E.g. ʾakram, ʾaǧall and ʾaqṣà are all assigned ʾafʿal, maǧbūr and marʿī both mafʿūl, ḍaraba, ḥaǧǧa and ḫāṭa all faʿala. Have a look at Wright's grammar if you are not sure what to assign (or watch out for parallel cases).
  • Words with nisba are assigned "nisba", e.g. rubāʿī, ʿabbāsiyyaẗ

6.10 Typing a text (before upload)

7. Obsolete instructions

  • If the scribe of the document has punctuated in a wrong way, we put the misspelt word in Plain Line in curly brackets and give the corrected one in angular brackets. On Layer Plain Words we give a reference like: وقنیھ: yāʾ with three dots above (see as an example P.CairArab 138:5). Be aware of the fact that dots may shift. In this case follow the same procedure but give a reference like: القپال: dot of qāf (maġribī) shifted on bāʾ
  • alif maqṣūra: à, alif mamdūda: ā (d.11) [the computer does this]
  • names with plene- and non-plene spelling: plene-spelling as standardform. (d.40) [The only example I can think of is Ibrahim, but we insert an alif superscriptum anyway (see above), which makes a reference to a standardform unnecessary. Besides, a standardlemma would probably be more appropriate than a standardform]
  • e) Punctuation
    Check the punctuation of the original text on the plates and describe it in detail (under remarks) (e.g. "line 4, word 5: kataba, tā' with two dots above" or "all fā' are dotted with one dot above" etc.).
  • punctuation marks (sentence punctuation) are not omitted. From layer plain words on they count as a word for themselves. (d.36) [see section on symbols]
  • Names with plene- and non-plene spelling: plene-spelling as standardform. (d.40) [surely a standardlemma belongs here, doesn't it? Besides, we always insert an alif superscriptum for ā, and I am not aware of defective spellings for ī and ū.]
  • f) Non-plene writing
    In case of defective writing, alif superscriptum is automatically added and we do not refer to a standardform. (d.May06) [delete, because „automatically“ is not true anyway]
  • Problem of Greek Letters in Arabic text: The text will be displayed correctly if we insert at the beginning of the line the Symbol Right-to-left-override (Unicode 202E) manually, i.e. not before the implementing but after. (d.July06)
  • If there are Greek or Latin elements in an Arabic text, note the following instruction for correct representation: (d.34)
    Set at the beginning of the line a RLO-mark (Right-to-Left Override) to define the paragraph as being written in a right-to-left script (Arabic).
    In front of a left-to-right part (Greek or Latin) insert a LRE-mark (Left-to-Right Embedding).
    After the left-to-right part insert a PDF-mark (Pop Directional Format) in order to cancel the last direction-order (in our case Left-to-Right Embedding).
    On layers "plain words", "full dots" and "full dots and vowels", put RLO, LRE, Greek word, PDF into each field with Greek letters. Same procedure for fields with asterisks, if the line contains Greek signs. (d.Sept.11)
    The marks for RLO, LRE and PDF can be found in the Unicode-division "General Punctuation": RLO = 202E, LRE = 202A, PDF = 202C. In case of Arabic in a mainly Greek or Latin text, follow the same procedure, but replace RLO by LRO (Left-to-Right Override) and LRE by RLE (Right-to-Left Embedding). PDF remains the same (LRO = 202D, RLE = 202B).
  • The personal pronoun أنا is transcribed ʾanā (and not ʾana). (d. Oct. 10) [the computer does this anyway]
  • The Greek numeral koppa representing 90 has in Unicode two variants, either 03DF "Greek small letter koppa" or 03D9 "Greek small letter archaic koppa". We will always use 03D9 "Greek small letter archaic koppa" for better representation with Lucida Grande. (d.May07)
    The fraction sign ʹ is the "greek numeral sign", Unicode 0374. For fractions of carats, use ͵ "greek lower numeral sign", Unicode 0375. (d.Sept.11)
  • If an unknown letter is connected on one or both sides with preceding or following letters, ensure that the visible letters have the corresponding shape (initial, intermediate or final) by using taṭwīl before or after the asterisks or brackets. Only if the preserved letters are actually connected, of course. The Urdu characters ٹ and ڑ will not be needed anymore. (d. April 14, replacing d. 2) [belongs in the internal documentation; Emma has deleted all Urdu characters]
  • 3.1.8 Modifying and merging of lemmata
    For modifying a lemma: go to "List of Lemmata" (listlemmata.jsp) and change the lemma there. (d.June15)
    For merging: First change the wrong lemma on listlemmata.jsp. It should now be exactly like the other lemma to merge with (also same supplement number). Go then to "Edit double lemmata" (doublelemmata.jsp). The two lemmata should appear in the list. Click on "show" and check again before clicking on "Merge lemmaunits and delete second lemma". Check after that in lemmalookup.jsp whether all the details are correct and make sure the lemma does not have a supplement assigned if it is the only one now (if it has, go again to listlemmata.jsp and remove the supplement there). (d.June15)
  • In case of "wrong" concordance regarding gender that is applied systematically (e.g. ăd dār ăllaḏī etc.), lemmatise the noun according to the given concordance with a standardform showing the "right" gender. In this case [ ->]. (d.Nov.10) [this was originally my suggestion, but according to the dictionary dâr can indeed be masculine. For jumada the question becomes moot if we only mark the composed lemma.]
  • In case an unpublished document is referred to as if it were published (e.g. P.GrohmannBerlin 16 - 35, the manuscript has apparently been sent to the journal but got lost), we'll add (unpublished) in field P.Erg. (d.Dec.10) [the case has not recurred, and where the decision applies, it has already been implemented]
  • Field emendation and further literature: Reference to the mentioning of a document consists of: Last name of Author, Sigle (if it is a monograph) or short name of article, p. (for page) and the pagenumber. E.g. Diem, P.Berl.Arab. II p. 121 or Diem, Frühe Urkunden p. 149. Give semicolons between different page numbers of the same book or article and give also semicolons between different books or articles, e.g. Diem, P.Berl.Arab. II p. 94; 190; Khan, P.Khalili I p. 238. (d.June12) [this is in the layout; if it is not there, nobody will read it anyway]
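
For reference, the obsolete procedure for Greek or Latin elements in an Arabic line (d.34, listed above) amounts to prefixing the line with RLO and wrapping each left-to-right run in LRE … PDF. The sketch below only illustrates that procedure. The function name and the regular expression for detecting Greek or Latin runs are assumptions and were never part of the APD workflow.

```python
import re

RLO = "\u202e"  # Right-to-Left Override
LRE = "\u202a"  # Left-to-Right Embedding
PDF = "\u202c"  # Pop Directional Format

# One or more Greek or basic Latin words, separated by whitespace.
# The Greek block U+0370-U+03FF includes the numeral signs U+0374 and U+0375.
_LTR_WORD = r"[A-Za-z\u0370-\u03ff\u1f00-\u1fff]+"
LTR_RUN = re.compile(rf"{_LTR_WORD}(?:\s+{_LTR_WORD})*")


def mark_arabic_line(line):
    """Prefix the line with RLO and wrap each Greek/Latin run in LRE ... PDF."""
    wrapped = LTR_RUN.sub(lambda m: LRE + m.group(0) + PDF, line)
    return RLO + wrapped
```

For a mainly Greek or Latin line containing Arabic, the same idea applies with LRO (202D) in place of RLO and RLE (202B) in place of LRE, as described above.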