[Development] Two-digit dates: what century should we use ?
Edward Welbourne
edward.welbourne at qt.io
Tue Nov 5 14:44:50 CET 2019
Hi all,
Prompted by [0], I'm looking at what century to use for years, when the
text being read is expected to be in a "short format" that only includes
two digits.
* [0] https://bugreports.qt.io/browse/QTBUG-74323
tl;dr - how do folk feel about (in Qt 6) a century-wide window, ending a
decade or three ahead of QDate::currentDate(), and placing any two-digit
year in that range ?
Before anyone says "Don't Do That" (or "why would anyone use two-digit
years after the mess of y2k ?"), bear in mind that CLDR (the Unicode
consortium's common locale data repository, on which QLocale's data is
based) provides short date formats, many of which use two-digit years.
We currently fail to round-trip dates via such formats because 1900 is
used as default year when no year is specified and (thus) 19 is used as
default century number when only the later digits are (understood to be)
specified. As we get further into the twenty-hundreds (as it were), this
shall grow to be an increasingly jarring flaw in date format handling.
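
To make the round-trip failure concrete, here is a minimal sketch in
Python rather than Qt code. The helpers format_short_year and
parse_with_fixed_century are hypothetical names that only mimic a
short format combined with a default century of 19; they are not
QLocale or QDate API.

```python
def format_short_year(year):
    """Return the last two digits of year, as a short date format would."""
    return f"{year % 100:02d}"


def parse_with_fixed_century(text, century=19):
    """Read a two-digit year, assuming a fixed default century (illustrative)."""
    return century * 100 + int(text)


original = 2019
round_tripped = parse_with_fixed_century(format_short_year(original))
print(original, "->", format_short_year(original), "->", round_tripped)
# prints: 2019 -> 19 -> 1919
```

Any year outside 1900--1999 comes back a whole number of centuries away
from where it started.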
I'm considering changing that: since it's a material behaviour change,
it clearly needs to happen as part of Qt 6, which at least gives me a
few months to discuss it and see what folk think is a better plan than
what we have.
It's notable that ECMAScript's Date constructor adds 1900 to any year
number from 0 through 99 (even if supplied as one of a sequence of
integer arguments, not a string), causing problems for the
representation of dates from 1 BCE through 99 CE. (I must remember to
tease my friend on the ECMA 262 committee about that - his excuse will
be that it was copied from an early version of Java, I suspect - and see
if he can coax them into changing it.) Likewise, C's struct tm (used by
mktime and friends) has a 1900 offset on its year number: that's
probably never going to change, perverse as it is and shall increasingly
be.
Folk still talk about "The fifties" and mean the 1950s; probably
likewise the forties, thirties and even twenties. That last, at least,
shall soon be something of a problem. Folk can see more of the past
than of the future, so perhaps it's not much of a surprise that common
nomenclature reserves short phrases for the past at the expense of the
future: "The sixties" shall be in the past for a few decades yet, I
think. So rather than having a default century, and maybe changing it
abruptly to 20 at some point in the next fifty years, I think it would
be better to have two-digit years coerced into a century-wide window
about the (forever moving) present.
Perhaps we should make that a narrower window and treat roughly a decade
near the wrap-around as error - e.g. using 1945--2035 as our year range,
with two-digit years 36 through 44 treated as undecodable.
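
A window with a buffer might be sketched as follows, in Python for
brevity. The function name resolve_two_digit_year and its defaults are
assumptions made for illustration, using the 1945--2035 example above;
returning None stands in for "undecodable".

```python
def resolve_two_digit_year(yy, first=1945, last=2035):
    """Map yy (0..99) to the year in first..last ending in those digits.

    Returns None when yy falls in the buffer between the two ends of the
    window. The window must span fewer than 100 years, so that at most
    one candidate year can match.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    if not 0 <= last - first < 100:
        raise ValueError("window must span fewer than 100 years")
    # Smallest year >= first whose last two digits are yy.
    candidate = first + (yy - first) % 100
    return candidate if candidate <= last else None


for yy in (45, 99, 0, 35, 36, 44):
    print(f"{yy:02d} -> {resolve_two_digit_year(yy)}")
```

With these defaults, 45 maps to 1945 and 35 to 2035, while 36 through 44
give None.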
The question then arises: what year-range should we use ?
Two things I'm fairly sure should be true are:
* the current year (i.e. QDate::currentDate().year(), naturally) should
be included in the range;
* the range should be contiguous.
So the interesting questions are:
* how far into the past and future should the range reach ?
* how wide a buffer (if any) should we leave ?
If we don't have a buffer, my inclination is to put the transition date
at a decade boundary, e.g. 49 -> 2049 but 50 -> 1950, as this shall feel
less perverse to most folk than having a mid-decade transition such as
44 -> 2044 but 45 -> 1945. However, with a buffer, this problem goes
away, as there aren't adjacent two-digit numbers that map to wildly
different years; instead, the intervening numbers that aren't handled
make the discontinuity seem more sensible. In principle a one year
buffer would suffice, but I'm inclined to make the gap a decade long, or
more, if we have one.
If QDate::currentDate().year() is C and (C / 10) * 10 is D, either of
these ranges strikes me as better than the 1900--1999 that we're
currently using:
* D - 70 <= year < D + 30 (all two-digit values handled)
* C - 65 <= year <= C + 25 (other two-digit values rejected)
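
The two candidate ranges can be sketched like this, in Python for
brevity. The parameter current_year stands in for
QDate::currentDate().year(), and the names gapless_range, buffered_range
and resolve are hypothetical, chosen only for this illustration.

```python
def gapless_range(current_year):
    """D - 70 <= year < D + 30, as an inclusive (first, last) pair."""
    decade = (current_year // 10) * 10  # D: current year rounded down
    return decade - 70, decade + 29


def buffered_range(current_year):
    """C - 65 <= year <= C + 25, as an inclusive (first, last) pair."""
    return current_year - 65, current_year + 25


def resolve(yy, first, last):
    """Return the year in first..last ending in yy, or None if there is none."""
    candidate = first + (yy - first) % 100
    return candidate if candidate <= last else None


current_year = 2019  # fixed here so that the results are reproducible
for name, (first, last) in (("gapless", gapless_range(current_year)),
                            ("buffered", buffered_range(current_year))):
    print(name, first, last, [resolve(yy, first, last) for yy in (39, 40, 50)])
```

For C = 2019 the first range is 1940 through 2039 and the second is 1954
through 2044, so the second rejects 45 through 53.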
So, to my questions:
* Does anyone want to make the case for keeping 1900--1999 as range ?
* Has anyone a better suggestion for how to choose a rolling range ?
* Should we have a buffer ? If so, how wide ?
* How far into the past and future should the range reach ?
Eddy.