
Collation


 


Complications

Compound words and special characters

A complication in alphabetical sorting arises from disagreement over how groups of words (separated compound words, names, titles, etc.) should be ordered. One rule is to remove spaces for the purposes of ordering. Another is to treat the space as a character that sorts before digits and letters, which is consistent with ASCII ordering (the space, code 32, precedes both). A third is to sort the space after digits and letters. Given the strings "catch", "cattle" and "cat food", the first rule produces "catch", "cat food", "cattle"; the second produces "cat food", "catch", "cattle"; and the third produces "catch", "cattle", "cat food". The first rule is used in most (but not all) dictionaries and the second in telephone directories (so that Wilson, Jim K appears with the other people named Wilson, Jim and not after Wilson, Jimbo). The third rule is rarely used.
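The three space-handling rules can each be expressed as a sort key. The following is a minimal sketch in Python, assuming lowercase ASCII input; the function names are hypothetical. Python compares strings by code point, which for ASCII characters is exactly ASCII order, so the second rule needs no transformation at all.

```python
# One hypothetical sort key per space-handling rule.
# Assumes lowercase ASCII letters, digits and spaces only.

def key_ignore_spaces(s: str) -> str:
    """Rule 1: drop spaces entirely before comparing."""
    return s.replace(" ", "")

def key_space_first(s: str) -> str:
    """Rule 2: plain code-point (ASCII) order; ' ' (32) precedes digits and letters."""
    return s

def key_space_last(s: str) -> tuple:
    """Rule 3: rank a space after every digit and letter."""
    # Each character becomes a (rank, char) pair; spaces get the higher rank 1.
    return tuple((1, c) if c == " " else (0, c) for c in s)

words = ["catch", "cattle", "cat food"]
print(sorted(words, key=key_ignore_spaces))  # ['catch', 'cat food', 'cattle']
print(sorted(words, key=key_space_first))    # ['cat food', 'catch', 'cattle']
print(sorted(words, key=key_space_last))     # ['catch', 'cattle', 'cat food']
```

Under the second rule, "Wilson, Jim K" sorts before "Wilson, Jimbo" because the space after "Jim" compares lower than "b".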

A similar complication arises when special characters such as hyphens or apostrophes appear in words or names. Any of the rules above can be applied in this case as well; however, strict ASCII sorting no longer corresponds exactly to any of them, because in ASCII the apostrophe and the hyphen sort after the space but before digits and letters.
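The mismatch can be seen with a small set of names. This sketch assumes, for illustration only, that each of the three rules ignores apostrophes and hyphens and compares case-insensitively; the key functions are hypothetical. Python's default string comparison stands in for strict ASCII sorting.

```python
# Strict ASCII order vs. the three space rules once punctuation appears.
# Assumption: the rules strip apostrophes/hyphens and ignore case.

def strip_punct(s: str) -> str:
    return s.lower().replace("'", "").replace("-", "")

def rule1(s: str) -> str:    # remove spaces
    return strip_punct(s).replace(" ", "")

def rule2(s: str) -> str:    # space sorts before letters
    return strip_punct(s)

def rule3(s: str) -> tuple:  # space sorts after letters
    return tuple((1, c) if c == " " else (0, c) for c in strip_punct(s))

names = ["Oakes", "O'Neil", "O Brien"]
print(sorted(names))             # ['O Brien', "O'Neil", 'Oakes']
print(sorted(names, key=rule1))  # ['Oakes', 'O Brien', "O'Neil"]
print(sorted(names, key=rule2))  # ['O Brien', 'Oakes', "O'Neil"]
print(sorted(names, key=rule3))  # ['Oakes', "O'Neil", 'O Brien']
```

The plain ASCII order (space 32, apostrophe 39, "a" 97) differs from all three rule-based orders.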

Name/Surname ordering

The telephone directory example points to another complication. In cultures where family names are written after given names, it is usually still desirable to sort by family name first, so names must be reordered before they can be sorted properly. For example, Juan Hernandes and Brian O'Leary should be sorted as Hernandes, Juan and O'Leary, Brian even if they are not written this way. Capturing this rule in a computer collation algorithm is difficult, and simple attempts will necessarily fail. For example, unless the algorithm has access to an extensive list of family names, there is no way to decide whether "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".
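A naive reordering shows why a list of some kind is unavoidable. The sketch below assumes the family name is the last word plus any immediately preceding particles drawn from a small, hypothetical PARTICLES set; it is one heuristic, not a general solution.

```python
# Naive "Family, Given" reordering. PARTICLES is a hypothetical, incomplete
# list; any name outside its assumptions will be reordered wrongly.

PARTICLES = {"van", "der", "de", "den", "von", "la", "le", "da", "di"}

def family_first(full_name: str) -> str:
    words = full_name.split()
    if len(words) < 2:
        return full_name
    # Walk backwards from the last word, absorbing words found in PARTICLES,
    # but always leave at least one word as the given name.
    start = len(words) - 1
    while start > 1 and words[start - 1].lower() in PARTICLES:
        start -= 1
    family = " ".join(words[start:])
    given = " ".join(words[:start])
    return f"{family}, {given}"

for name in ["Juan Hernandes", "Brian O'Leary", "Gillian Lucille van der Waal"]:
    print(family_first(name))
# Hernandes, Juan
# O'Leary, Brian
# van der Waal, Gillian Lucille
```

This picks only the first of the three readings for "Gillian Lucille van der Waal"; a directory that files the name under "Waal" would need a different rule.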

In telephone directories in English-speaking countries, surnames beginning with Mc are sometimes sorted as if they began with Mac and placed between "Mabxxx" and "Madxxx". Under this rule, the telephone directory order of the following names would be: Maam, McAllan, Macbeth, MacCarthy, McDonald, Macy, Mboko.
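The Mc-as-Mac rule amounts to rewriting the prefix in the sort key only, leaving the displayed name unchanged. In this sketch, `directory_key` is a hypothetical name. It also lowercases the key, because in plain code-point order "MacCarthy" would sort before "Macbeth" (uppercase "C" precedes lowercase "b").

```python
def directory_key(surname: str) -> str:
    """Sort 'Mc' surnames as though spelled 'Mac', ignoring case."""
    s = surname.lower()
    if s.startswith("mc"):
        s = "mac" + s[2:]
    return s

names = ["Mboko", "Macy", "McDonald", "MacCarthy", "Macbeth", "McAllan", "Maam"]
print(sorted(names, key=directory_key))
# ['Maam', 'McAllan', 'Macbeth', 'MacCarthy', 'McDonald', 'Macy', 'Mboko']
```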

Abbreviations and common words

When abbreviations are used, it is sometimes desirable to expand them for sorting. In this case, "St. Paul" is sorted as "Saint Paul" and therefore comes before "Shanghai". Capturing this behavior in a collation algorithm requires a list of abbreviations, and in some cases it may be more practical to store two sets of strings, one for sorting and one for display. A similar problem arises when letters are replaced by numbers or special symbols in an irregular manner, for example 1337 for leet or the movie title Se7en. In this case, proper sorting requires keeping two sets of strings.
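Keeping a display string and a sort string side by side covers both cases: regular abbreviations can be expanded from a table, while irregular substitutions such as Se7en need a sort string supplied by hand. The sketch assumes a tiny, hypothetical ABBREVIATIONS table and whole-word matching; "Sedona" is just sample data.

```python
# Each entry pairs a display string with an optional explicit sort string.
# ABBREVIATIONS is a small hypothetical table; a real one would be far larger.

ABBREVIATIONS = {"St.": "Saint", "Mt.": "Mount"}

def expand(text: str) -> str:
    """Replace whole-word abbreviations with their expansions."""
    return " ".join(ABBREVIATIONS.get(w, w) for w in text.split())

entries = [
    ("Shanghai", None),
    ("St. Paul", None),
    ("Se7en", "Seven"),  # irregular substitution: sort string given by hand
    ("Sedona", None),
]

def sort_string(entry) -> str:
    display, explicit = entry
    return (explicit if explicit is not None else expand(display)).lower()

for display, _ in sorted(entries, key=sort_string):
    print(display)
# St. Paul
# Sedona
# Se7en
# Shanghai
```

Without the sort strings, "St. Paul" would land after "Shanghai", and "Se7en" would sort before "Sedona" because "7" precedes "d".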

In certain contexts, very common words (such as articles) at the beginning of a sequence of words are ignored for ordering or moved to the end. Thus "The Shining" is treated as "Shining" or "Shining, The" when alphabetizing and is therefore ordered before "Summer of Sam". This rule is fairly easy to capture in an algorithm, but many programs instead rely on simple lexicographic ordering.
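Ignoring a leading article takes only a few lines. This sketch assumes English titles and a hypothetical LEADING_ARTICLES list of "the", "a" and "an"; `inverted` produces the "Shining, The" display form.

```python
LEADING_ARTICLES = ("the ", "a ", "an ")  # assumption: English titles only

def title_key(title: str) -> str:
    """Drop a leading article (case-insensitively) before comparing."""
    t = title.lower()
    for article in LEADING_ARTICLES:
        if t.startswith(article):
            return t[len(article):]
    return t

def inverted(title: str) -> str:
    """Move a leading article to the end, e.g. 'Shining, The'."""
    for article in LEADING_ARTICLES:
        if title.lower().startswith(article):
            n = len(article)
            return f"{title[n:]}, {title[:n - 1]}"
    return title

titles = ["Summer of Sam", "The Shining", "A Clockwork Orange"]
print(sorted(titles))                 # ['A Clockwork Orange', 'Summer of Sam', 'The Shining']
print(sorted(titles, key=title_key))  # ['A Clockwork Orange', 'The Shining', 'Summer of Sam']
print(inverted("The Shining"))        # Shining, The
```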

Numerical sorting of strings

Sometimes it is desirable to order text with embedded numbers in proper numerical order. For example, "Figure 7b" should go before "Figure 11a", even though a character-by-character comparison puts "Figure 11a" first because "1" precedes "7". The same idea can be extended to Roman numerals. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow sorting down significantly.
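A common way to get this "natural" order for integers is to split each string into runs of digits and non-digits and compare the digit runs as numbers. The following is a minimal sketch; `natural_key` is a hypothetical name, and Roman numerals and decimals are not handled.

```python
import re

def natural_key(s: str) -> list:
    """Split into text and digit runs; compare digit runs as integers."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so odd indices are always digit runs and types line up between keys.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

figures = ["Figure 11a", "Figure 7b", "Figure 2"]
print(sorted(figures))                   # ['Figure 11a', 'Figure 2', 'Figure 7b']
print(sorted(figures, key=natural_key))  # ['Figure 2', 'Figure 7b', 'Figure 11a']
```

Python's `sorted` computes each key once per element rather than once per comparison, which limits the slowdown the text mentions.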

Windows XP, for example, sorts file names with embedded numbers in numerical order (much to the annoyance of some users accustomed to simple lexicographic ordering). Sorting decimals properly is more difficult, because different locales use different symbols for the decimal point, and the character used as a decimal point is sometimes also used as a separator, as in "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application-dependent.
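The ambiguity shows up as soon as "3.10" and "3.9" are compared: as section numbers 3.9 comes first, but as decimals 3.10 equals 3.1 and comes first. The sketch below implements both readings as hypothetical key functions; it assumes "." as the only separator and ignores locales that use a comma as the decimal point.

```python
# Two application-specific readings of dotted numbers.

def as_section(label: str) -> tuple:
    """Treat '.' as a separator: '3.2.5' -> (3, 2, 5)."""
    return tuple(int(p) for p in label.split()[-1].split("."))

def as_decimal(label: str) -> float:
    """Treat '.' as a decimal point; only valid when there is one dot."""
    return float(label.split()[-1])

sections = ["Section 3.10", "Section 3.9", "Section 3.2.5"]
print(sorted(sections, key=as_section))
# ['Section 3.2.5', 'Section 3.9', 'Section 3.10']

values = ["Value 3.10", "Value 3.9"]
print(sorted(values, key=as_decimal))
# ['Value 3.10', 'Value 3.9']
```

The decimal reading cannot handle "3.2.5" at all (`float` raises `ValueError`), which is one reason the choice of rule depends on the application.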
