Latin alphabet
The Latin alphabet, also called the Roman alphabet, is the most widely used alphabetic writing system in the world today.
Collating sequence with extensions
Alphabets derived from the Latin alphabet have varying collating rules:
- In Breton, there is no letter "c" on its own, but there are the digraph "ch" and the trigraph "c'h", which are collated between "b" and "d". For example: « buzhugenn, chug, c'hoar, daeraouenn » (earthworm, juice, sister, teardrop).
- In Croatian, Serbian and related South Slavic languages, the five letters with diacritics and the three digraphs are sorted after their base letters: ..., C, Č, Ć, D, DŽ, Đ, E, ..., L, LJ, M, N, NJ, O, ..., S, Š, T, ..., Z, Ž.
- In Czech and Slovak, accented vowels have secondary collating weight: compared to other letters, they are treated as their unaccented forms (A-Á, E-É-Ě, I-Í, O-Ó-Ô, U-Ú-Ů, Y-Ý), but they are then sorted after the unaccented letters (for example, the correct lexicographic order is baa, baá, báa, bab, báb, bac, bác, bač, báč). Accented consonants (the ones with a caron) have primary collating weight and are collated immediately after their unaccented counterparts, with the exception of Ď, Ň and Ť, which again have secondary weight. CH is considered a separate letter and goes between H and I. In Slovak, DZ and DŽ are also considered separate letters and are positioned between Ď and E (A-Á-Ä-B-C-Č-D-Ď-DZ-DŽ-E-É-…).
- In the Danish and Norwegian alphabets, the same extra vowels as in Swedish (see below) are also present but in a different order and with different glyphs (..., X, Y, Z, Æ, Ø, Å). Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but nowadays "W" is considered a separate letter.
- In Dutch, the combination IJ (representing the letter Ĳ) was formerly collated as Y (or sometimes as a separate letter, Y < IJ < Z), but is now mostly collated as two letters (II < IJ < IK). Phone directories are an exception: there, IJ is always collated as Y, because many Dutch family names use Y where modern spelling would require IJ. Note that when a word starting with ij is capitalised, both the I and the J are written as capitals, for example the town IJmuiden (municipality of Velsen) and the river IJssel.
- In Esperanto, consonants with circumflex accents (ĉ, ĝ, ĥ, ĵ, ŝ), as well as ŭ (u with breve), are counted as separate letters and collated separately (c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ ... s, ŝ, t, u, ŭ, v, z).
- In Estonian, õ, ä, ö and ü are considered separate letters and collate after w. The letters š, z and ž appear only in loanwords and foreign proper names and follow the letter s in the Estonian alphabet, which otherwise does not differ from the basic Latin alphabet.
- The Faroese alphabet also has some of the extra Danish and Norwegian letters, namely Æ and Ø. Furthermore, it uses the Icelandic eth (Ð), which follows D. Five of the six vowels (A, I, O, U and Y) can take an acute accent, and the accented forms are then considered separate letters. The consonants C, Q, W, X and Z do not occur. Therefore the first five letters are A, Á, B, D and Ð, and the last five are V, Y, Ý, Æ and Ø.
- In Filipino and other Philippine languages, Ng is treated as a separate letter. It is pronounced as in sing, ping-pong, etc. By itself, it is pronounced nang, but in general Philippine orthography it is spelled as if it were two separate letters (n and g). Letter derivatives (such as Ñ) immediately follow their base letter. Filipino is also written with accents and other marks, but these are not widely used (except the tilde).
- The Finnish alphabet and collating rules are the same as in Swedish, except for the addition of the letters š and ž, which are considered variants of S and Z.
- In French and English, characters with diaeresis (ä, ë, ï, ö, ü, ÿ) are usually treated just like their unaccented versions. If two words differ only by an accent in French, the one with the accent sorts after the one without. (However, the Unicode 3.0 book specifies a more complex traditional French sorting rule for accented letters, in which accent differences are compared starting from the end of the word, giving the order cote, côte, coté, côté.)
- In German, letters with umlaut (Ä, Ö, Ü) are generally treated just like their non-umlauted versions, and ß is always sorted as ss. This gives the alphabetic order Arg, Ärgerlich, Arm, Assistent, Aßlar, Assoziation. For phone directories and similar lists of names, the umlauts are instead collated like the letter combinations "ae", "oe", "ue", which gives the order Udet, Übelacker, Uell, Ülle, Ueve, Üxküll, Uffenbach.
- In Hungarian, vowels can carry acute accents, umlauts and double acute accents. The acute accent is ignored in collating, and the double acute accent, which indicates a long umlauted vowel (ő, ű), is treated as equal to the umlaut (ö, ü).
- In Icelandic, Þ is added, and D is followed by Ð. Þ (called thorn; lowercase þ) is also a Runic letter, and Ð (called eth; lowercase ð) is the letter D with an added stroke. Both letters were also used by Anglo-Saxon scribes, who additionally used the Runic letter wynn to represent /w/.
- In Polish, the specifically Polish letters derived from the Latin alphabet are collated after their base letters: A, Ą, B, C, Ć, D, E, Ę, ..., L, Ł, M, N, Ń, O, Ó, P, ..., S, Ś, T, ..., Z, Ź, Ż.
- In Romanian, special characters derived from the Latin alphabet are collated after their originals: A, Ă, Â, ..., I, Î, ..., S, Ș, T, Ț, ..., Z.
- In the Swedish alphabet, "W" is seen as a variant of "V" and not a separate letter. It is however recognised and maintained in names, like in "William". The alphabet also has three extra vowels placed at its end (..., X, Y, Z, Å, Ä, Ö).
- Some languages have more complex rules. For example, Spanish treated "CH" and "LL" as single letters until 1994, giving the orderings CINCO, CREDO, CHISPA and LOMO, LUZ, LLAMA. This is no longer the case: in 1994 the RAE adopted the more conventional usage, so LL is now collated between LK and LM, and CH between CG and CI. The only Spanish-specific collation question that remains is Ñ (eñe), which is a separate letter collated after N.
- In Tatar, there are 9 additional letters. Five of them are vowels, paired with main-alphabet vowels as hard-soft pairs: a-ä, o-ö, u-ü, í-i, ı-e. The remaining four are consonants: ş is sh, ç is ch, ñ is ng and ğ is gh. Turkish shares several of these letters, such as ç, ğ, ö, ş and ü.
- Welsh also has complex rules: the combinations CH, DD, FF, NG, LL, PH and TH are all considered single letters, and each is listed after the letter which is the first character in the combination, with the exception of NG which is listed after G. However, the situation is further complicated by these combinations not always being single letters. An example ordering is LAWR, LWCUS, LLONG, LLOM, LLONGYFARCH: the last of these words is a juxtaposition of LLON and GYFARCH, and, unlike LLONG, does not contain the letter NG.
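The Danish/Norwegian and Swedish orders above differ mainly in which extra letters follow Z and in what order. The following Python sketch uses a hypothetical helper, `nordic_key`, that keeps a to z in their usual order and appends a language's extra letters after z. It is deliberately simplified: it handles only case-folded letters a to z plus the listed extras, applies the Danish/Norwegian "Aa" rule by naive text substitution (which would misfire on compounds where "aa" is not Å), and does not model W being treated as a variant of V.

```python
# A minimal sketch: basic Latin letters keep their order, and each
# language's extra letters follow z. nordic_key is a hypothetical helper.
SWEDISH_EXTRA = "åäö"
DANISH_NORWEGIAN_EXTRA = "æøå"


def nordic_key(word, extra, aa_as_aring=False):
    w = word.casefold()
    if aa_as_aring:
        # Danish/Norwegian: "aa" collates as "å" (naive string replacement).
        w = w.replace("aa", "å")
    # a..z map to 0..25; the extra letters follow in the given order.
    # Any other character raises ValueError from str.index.
    return [ord(c) - ord("a") if "a" <= c <= "z" else 26 + extra.index(c) for c in w]


swedish = sorted(["Örebro", "Ystad", "Älmhult", "Åre"],
                 key=lambda w: nordic_key(w, SWEDISH_EXTRA))
danish = sorted(["Aarhus", "Ølstykke", "Viborg", "Ærø"],
                key=lambda w: nordic_key(w, DANISH_NORWEGIAN_EXTRA, aa_as_aring=True))
print(swedish)
print(danish)
```

With these keys, Swedish places Å, Ä and Ö after Y, while the Danish call sorts "Aarhus" last, as if it were spelled with Å.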
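The two German conventions described above can be expressed as sort keys. The Python sketch below uses a hypothetical helper, `german_key`; the dictionary order and the phone-directory order are commonly known as DIN 5007-1 and DIN 5007-2, but this is only an illustration of the letter mappings, not a complete implementation of either standard.

```python
# A minimal sketch of the two German sort orders.
# german_key is a hypothetical helper, not a library function.

# Dictionary order: umlauts fold to the plain vowel.
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})
# Phone-directory order: umlauts expand to ae, oe, ue.
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def german_key(word: str, phonebook: bool = False) -> tuple:
    # casefold() makes the comparison case-insensitive and already turns ß into ss.
    folded = word.casefold()
    table = PHONEBOOK if phonebook else DICTIONARY
    # Primary key: the mapped spelling; tie-breaker: the folded original,
    # so an unaccented word sorts before an otherwise identical umlauted one.
    return (folded.translate(table), folded)


words = ["Assoziation", "Arm", "Aßlar", "Ärgerlich", "Assistent", "Arg"]
print(sorted(words, key=german_key))

names = ["Uffenbach", "Ülle", "Udet", "Üxküll", "Uell", "Übelacker", "Ueve"]
print(sorted(names, key=lambda w: german_key(w, phonebook=True)))
```

Both calls reproduce the two example orders given in the German entry.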
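Alphabets that treat letter combinations as single letters, such as traditional Spanish or Welsh, can be handled by first splitting a word into collation units and then ranking those units. The Python sketch below uses a hypothetical helper, `make_key`, that splits greedily (longest match first). The alphabets are written out for illustration only: the Spanish one follows the traditional order with CH and LL as letters, and the Welsh one uses only the combinations listed above. Because greedy splitting cannot tell that LLONGYFARCH contains N followed by G rather than the letter NG, the sketch also accepts a hand-made dictionary of exceptions.

```python
# A minimal sketch of collation with multi-character letters.
# make_key and the alphabets below are illustrative assumptions.

def make_key(alphabet, exceptions=None):
    """Return a sort key for an alphabet given as an ordered list of units."""
    rank = {unit: i for i, unit in enumerate(alphabet)}
    longest = max(len(unit) for unit in alphabet)
    exceptions = exceptions or {}

    def split(word):
        units, i = [], 0
        while i < len(word):
            # Greedy longest match: try "ll" before "l", "ch" before "c".
            for size in range(min(longest, len(word) - i), 0, -1):
                if word[i:i + size] in rank:
                    units.append(word[i:i + size])
                    i += size
                    break
            else:
                raise ValueError(f"{word[i]!r} is not in the alphabet")
        return units

    def key(word):
        folded = word.casefold()
        # Words listed as exceptions carry their own hand-made split.
        units = exceptions.get(folded) or split(folded)
        return [rank[u] for u in units]

    return key


traditional_spanish = make_key("a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split())
print(sorted(["LUZ", "CHISPA", "LLAMA", "CREDO", "LOMO", "CINCO"], key=traditional_spanish))

welsh_alphabet = "a b c ch d dd e f ff g ng h i j l ll m n o p ph r s t th u w y".split()
welsh_words = ["LLONGYFARCH", "LLOM", "LWCUS", "LLONG", "LAWR"]
greedy = make_key(welsh_alphabet)
print(sorted(welsh_words, key=greedy))  # greedy split puts LLONGYFARCH before LLOM
fixed = make_key(welsh_alphabet, {"llongyfarch": ["ll", "o", "n", "g", "y", "f", "a", "r", "ch"]})
print(sorted(welsh_words, key=fixed))
```

The greedy key yields LAWR, LWCUS, LLONG, LLONGYFARCH, LLOM; only with the exception entry does the order match the Welsh example above. Real-world Welsh collation therefore needs word-level knowledge, not just a letter table.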
For multilingual situations with no one preferred language or alphabet, the Unicode Collation Algorithm can be used.
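The Unicode Collation Algorithm compares strings in levels: base letters first, then accents, then case. The Python sketch below imitates only the first two levels using the standard `unicodedata` module, which is enough to reproduce the French and English accent rules described above: accents matter only when the base letters are identical. The hypothetical helper `accent_key` and its `french` flag are assumptions for illustration; with `french=True`, accent differences are compared starting from the end of the word, as in traditional French sorting. This is not an implementation of the UCA, whose weight tables are far more detailed.

```python
import unicodedata


def accent_key(word: str, french: bool = False) -> tuple:
    """Two-level key: base letters first, accents only to break ties."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    bases = []    # primary level: letters with diacritics removed
    accents = []  # secondary level: the marks attached to each base letter
    for ch in decomposed:
        if unicodedata.combining(ch):
            if accents:  # attach the mark to the preceding letter
                accents[-1] += ch
        else:
            bases.append(ch)
            accents.append("")
    if french:
        # Traditional French rule: the last accent difference decides.
        accents.reverse()
    # An unaccented letter ("") sorts before any accented one; different
    # marks compare by code point, an arbitrary but consistent choice.
    return ("".join(bases), tuple(accents))


words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=accent_key))
print(sorted(words, key=lambda w: accent_key(w, french=True)))
```

The first call gives cote, coté, côte, côté; the French variant gives cote, côte, coté, côté.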
Licensed under the GNU Free Documentation License.
Spiritus-Temporis.com ©2005.
