Format Settings's List Separator
July 6th, 2016

Updating my mathematical expression parser for use with Free Pascal has led me to investigate the list separator used in different locales. Windows has such a symbol, set according to the user’s regional settings, but Linux has none. So how is the default list separator defined in an application running on Linux? The short answer is that the list separator is the comma ‘,’ as used in the USA, and it is not adjusted according to the user’s locale.

The Multiplatform Programming Guide wiki recommends including the clocale unit (clocale.pp) as one of the first units used in the application’s project file, as in the following example project1.lpr file:

    program project1;

    {$mode objfpc}

    uses
      {$IFDEF UNIX}{$IFDEF UseCThreads}
      cthreads,
      {$ENDIF}
      clocale,
      {$ENDIF}
      Interfaces, // this includes the LCL widgetset
      Forms, main
      { you can add units after this };

    {$R *.res}

    begin
      RequireDerivedFormResource := True;
      Application.Initialize;
      Application.CreateForm(TForm1, Form1);
      Application.Run;
    end.

The clocale unit pulls in the SysUtils.pas unit which contains the DefaultFormatSettings record. That record is an initialized variable:

    var
      DefaultFormatSettings : TFormatSettings = (
        CurrencyFormat: 1;
        NegCurrFormat: 5;
        ThousandSeparator: ',';
        DecimalSeparator: '.';
        CurrencyDecimals: 2;
        DateSeparator: '-';
        TimeSeparator: ':';
        ListSeparator: ',';
        ...

which is not modified in SysUtils initialization code for the Gtk2 widget set. The initialization code of unit clocale updates some fields of that record using information obtained from the operating system. Since Linux does not have a list separator, that field is unchanged.
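To check what the record actually holds on a given machine, a small console program along the following lines can print the relevant fields. This is only a sketch: the program name is made up for the example, and it assumes a Free Pascal installation where the clocale unit is available on Unix-like systems.

```pascal
program showseparators;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}
  clocale,   // its initialization code updates DefaultFormatSettings from the OS
  {$ENDIF}
  SysUtils;  // declares DefaultFormatSettings

begin
  // Print the separators as seen by the running application.
  WriteLn('DecimalSeparator:  "', DefaultFormatSettings.DecimalSeparator, '"');
  WriteLn('ThousandSeparator: "', DefaultFormatSettings.ThousandSeparator, '"');
  WriteLn('ListSeparator:     "', DefaultFormatSettings.ListSeparator, '"');
end.
```

Given the behaviour described above, running it on Linux under a locale that uses a decimal comma should show the comma for both the decimal and the list separator.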

As a consequence of this behaviour, the decimal and list separators will both be the comma ‘,’ for many users. This is what happens on my system, where the locale is fr_CA. My parser cannot function with those values. As an example, consider the expression max(1.2,24.54,32.1), which returns the biggest value, 32.1, with an en_GB locale. But what does max(1,2,24,54,32,1) return when the comma is both the decimal and the list separator? My parser will return 32,1 because, when creating tokens, it tries to create the biggest numbers possible. Hence 1,2 is treated as a single number and not as two separate integers. However, this is ambiguous, and the user could have expected 54 or 54,32 as the answer. In Windows the list separator for fr_CA is the semicolon. The expression would have been written max(1,2;24,54;32,1) and again the result would be an unambiguous 32,1.
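The greedy tokenizing described above can be illustrated with a short sketch. It is not the actual parser: PrintTokens is a hypothetical helper written for this example. It scans a bare argument list, always takes the longest run of characters that still reads as one number, and treats whatever character follows a number as the list separator.

```pascal
program greedytokens;

{$mode objfpc}{$H+}

// Hypothetical helper (not the author's parser): prints the number tokens
// found in S, always consuming the longest text that still forms one number.
procedure PrintTokens(const S: string; DecSep: Char);
var
  i, start: Integer;
  seenDec: Boolean;
begin
  i := 1;
  while i <= Length(S) do
  begin
    start := i;
    seenDec := False;
    // Consume digits, plus at most one decimal separator that sits
    // between two digits.
    while i <= Length(S) do
    begin
      if S[i] in ['0'..'9'] then
        Inc(i)
      else if (S[i] = DecSep) and (not seenDec) and (i > start)
              and (i < Length(S)) and (S[i + 1] in ['0'..'9']) then
      begin
        seenDec := True;
        Inc(i);
      end
      else
        Break;
    end;
    if i > start then
      WriteLn('  number: ', Copy(S, start, i - start));
    // Skip the character that ended the number (the list separator).
    if i <= Length(S) then
      Inc(i);
  end;
end;

begin
  WriteLn('1,2,24,54,32,1 with "," as decimal separator:');
  PrintTokens('1,2,24,54,32,1', ',');
  WriteLn('1.2,24.54,32.1 with "." as decimal separator:');
  PrintTokens('1.2,24.54,32.1', '.');
end.
```

With the comma as decimal separator, the first call yields the three tokens 1,2 then 24,54 then 32,1. This is one valid reading of the input, but not the only one a user might have intended.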

Consequently, I decided to modify the project file as follows:

    program project1;

    {$mode objfpc}

    uses
      {$IFDEF UNIX}{$IFDEF UseCThreads}
      cthreads,
      {$ENDIF}
      clocale,
      {$ENDIF}
      SysUtils, // needed here so that DefaultFormatSettings is visible
      Interfaces, // this includes the LCL widgetset
      Forms, main
      { you can add units after this };

    {$R *.res}

    begin
      // When the comma is the decimal separator, use the semicolon for lists
      if DefaultFormatSettings.DecimalSeparator = ',' then
        DefaultFormatSettings.ListSeparator := ';';
      RequireDerivedFormResource := True;
      Application.Initialize;
      Application.CreateForm(TForm1, Form1);
      Application.Run;
    end.

While that solves my problem, is it creating problems for others? I investigated that question in Windows where it is relatively easy to get to the various locales’ values. The decimal separator is one of three symbols: . , /; the list separator is one of four symbols: , ; ° ? (it is not clear to me if ? is the correct symbol or an indication that it is not known). The table below summarises the data found in a recently updated Windows 10 system:

| Category | Number of locales |
|---|---|
| Total number of locales | 258 |
| Decimal separator is ‘.’ | 130 |
| Decimal separator is ‘,’ | 127 |
| Decimal separator is ‘/’ | 1 |
| List separator is ‘,’ | 67 |
| List separator is ‘;’ | 188 |
| List separator is ‘°’ | 2 |
| List separator is ‘?’ | 1 |
| Decimal and list separators are ‘,’ & ‘;’ | 121 |
| Decimal and list separators are ‘.’ & ‘;’ | 67 |
| Decimal and list separators are ‘.’ & ‘,’ | 61 |
| Decimal and list separators are ‘,’ & ‘,’ | 6 |
| Other combinations | 3 |

The additional code will correctly change the list separator for the 121 locales that use the comma as decimal separator and the semicolon as list separator. Furthermore, it has no impact on the 130 locales that use the period as decimal separator or on the one locale that uses the slash. The correction could be seen as an “error” for the 6 locales (arn_CL, en_ZA, gn_PY, quz_BO, quz_EC, vi_VN) that actually use the comma as both list separator and decimal separator. Given the ambiguity of such a choice, I think it is an acceptable compromise.

Data: as a text file, as an XLS (Excel) file, as an ODS (LibreOffice.Calc) file.