[Development] Two-digit dates: what century should we use ?

Kari Oikarinen kari.oikarinen at qt.io
Tue Nov 5 15:46:50 CET 2019

On 5.11.2019 15.44, Edward Welbourne wrote:
 > Hi all,
 > Prompted by [0], I'm looking at what century to use for years, when the
 > text being read is expected to be in a "short format" that only includes
 > two digits.
 > * [0] https://bugreports.qt.io/browse/QTBUG-74323
 > tl;dr - how do folk feel about (in Qt 6) a century-wide window, ending a
 > decade or three ahead of QDate::currentDate(), and placing any two-digit
 > year in that range ?
 > Before anyone says "Don't Do That" (or "why would anyone use two-digit
 > years after the mess of y2k ?"), bear in mind that CLDR (the Unicode
 > consortium's common locale data repository, on which QLocale's data is
 > based) provides short date formats, many of which use two-digit years.
 > We currently fail to round-trip dates via such formats because 1900 is
 > used as default year when no year is specified and (thus) 19 is used as
 > default century number when only the later digits are (understood to be)
 > specified.  As we get further into the twenty-hundreds (as it were), this
 > shall grow to be an increasingly jarring flaw in date format handling.

I think even in the future many two-digit year formatted dates will
refer to 19xx (either because they are old or because that is the
widely held assumption). So correct handling of those formats will be
impossible anyway: they don't contain enough information. But it is of
course unfortunate if the dates you stored yourself can't be read back.

 > I'm considering changing that: since it's a material behaviour change,
 > it clearly needs to happen as part of Qt 6, which at least gives me a
 > few months to discuss it and see what folk think is a better plan than
 > what we have.
 > It's notable that ECMAScript's Date constructor adds 1900 to any year
 > number from 0 through 99 (even if supplied as one of a sequence of
 > integer arguments, not a string), causing problems for the
 > representation of dates from 1 BCE through 99 CE.  (I must remember to
 > tease my friend on the ECMA 262 committee about that - his excuse will
 > be that it was copied from an early version of Java, I suspect - and see
 > if he can coax them into changing it.)  Likewise, C's struct tm (used by
 > mktime and friends) has a 1900 offset on its year number: that's
 > probably never going to change, perverse as it is and shall increasingly
 > be.

Surely that's not a comprehensive list. I don't expect either of these
to change and staying in line with other software can avoid surprises.

I thought almost everyone would assume 1900-1999 as the range.
Apparently that's not true though. Python's strptime maps two-digit
years to the range 1969-2068, and C#'s range depends on locale but
appears to be 1930-2029 at least sometimes. Using two-digit years is
a much worse idea than I thought, since there doesn't seem to be
consensus.
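
To make the Python case concrete, here is a minimal sketch of how
the standard library's %y directive splits two-digit years. The
helper name parse_two_digit_year is made up for illustration; only
datetime.strptime is the real API, and the 68/69 pivot is the POSIX
convention that Python documents for %y.

```python
from datetime import datetime


def parse_two_digit_year(text):
    """Return the full year Python infers from a two-digit year string.

    Hypothetical helper: it only wraps datetime.strptime with "%y".
    """
    return datetime.strptime(text, "%y").year


# 00..68 land in 2000..2068, while 69..99 land in 1969..1999.
for yy in ("68", "69", "99", "00", "19"):
    print(yy, "->", parse_two_digit_year(yy))
```

So "68" and "69" end up 99 years apart, which is exactly the kind of
discontinuity any fixed or rolling window has to put somewhere.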

 > Folk still talk about "The fifties" and mean the 1950s; probably
 > likewise the forties, thirties and even twenties.  That last, at least,
 > shall soon be something of a problem.  Folk can see more of the past
 > than of the future, so perhaps it's not much of a surprise that common
 > nomenclature reserves short phrases for the past at the expense of the
 > future: "The sixties" shall be in the past for a few decades yet, I
 > think.  So rather than having a default century, and maybe changing it
 > abruptly to 20 at some point in the next fifty years, I think it would
 > be better to have two-digit years coerced into a century-wide window
 > about the (forever moving) present.
 > Perhaps we should make that a narrower window and treat roughly a decade
 > near the wrap-around as error - e.g. using 1945--2035 as our year range,
 > with two-digit years 36 through 44 treated as undecodable.
 > The question then arises: what year-range should we use ?
 > Two things I'm fairly sure should be true are:
 > * the current year (i.e. QDate::currentDate().year(), naturally) should
 >    be included in the range;

I don't think it's that obvious. Many systems start with their clock
at the UNIX epoch and may only get the current time from the network
later. That might result in the year-range changing during the
application's lifetime, which sounds horrible. You couldn't tell what
a given input would result in when reasoning statically.
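
A rough sketch of that concern, assuming a hypothetical rolling
window that ends 30 years after the current year (rolling_year and
years_ahead are illustrative names, not anything Qt provides). The
same two-digit input resolves differently before and after the
clock is corrected:

```python
def rolling_year(two_digit, current_year, years_ahead=30):
    """Place a two-digit year in the 100-year window that ends
    years_ahead after current_year (an assumed rule, for illustration)."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a value in 0..99")
    start = current_year + years_ahead - 99  # first year of the window
    return start + (two_digit - start) % 100


# Clock still at the UNIX epoch: the window is 1901..2000.
before_sync = rolling_year(19, 1970)
# Clock corrected from the network: the window is 1950..2049.
after_sync = rolling_year(19, 2019)
print(before_sync, after_sync)  # 1919 2019
```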

 > * the range should be contiguous.
 > So the interesting questions are:
 > * how far into the past and future should the range reach ?
 > * how wide a buffer (if any) should we leave ?
 > If we don't have a buffer, my inclination is to put the transition date
 > at a decade boundary, e.g. 49 -> 2049 but 50 -> 1950, as this shall feel
 > less perverse to most folk than having a mid-decade transition such as
 > 44 -> 2044 but 45 -> 1945.  However, with a buffer, this problem goes
 > away, as there aren't adjacent two-digit numbers that map to wildly
 > different years; instead, the intervening numbers that aren't handled
 > make the discontinuity seem more sensible.  In principle a one year
 > buffer would suffice, but I'm inclined to make the gap a decade long, or
 > more, if we have one.
 > If QDate::currentDate().year() is C and (C / 10) * 10 is D, either of
 > these ranges strikes me as better than the 1900--1999 that we're
 > currently using:
 > * D -70 <= year < D+30 (all two-digit values handled)
 > * C -65 <= year <= C +25 (other two-digit values rejected)

Rather than just stating the range, documentation would need to
outline the rules used to choose it. That would be much harder to
understand than a single static range.
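
For reference, the two candidate ranges quoted above could be
sketched roughly as follows. This is only an illustration of the
arithmetic, not a proposed implementation; the function names are
invented, and the current year is passed in explicitly so that the
result does not depend on the system clock.

```python
def decade_aligned_window(two_digit, current_year):
    """Map 0..99 into D-70 <= year < D+30, where D is current_year
    rounded down to a multiple of ten. Every value is accepted."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a value in 0..99")
    decade = (current_year // 10) * 10
    start = decade - 70
    return start + (two_digit - start) % 100


def buffered_window(two_digit, current_year):
    """Map 0..99 into C-65 <= year <= C+25, where C is current_year.
    Values that would fall in the gap are rejected by returning None."""
    if not 0 <= two_digit <= 99:
        raise ValueError("expected a value in 0..99")
    start = current_year - 65
    year = start + (two_digit - start) % 100
    return year if year <= current_year + 25 else None


# With 2019 as the current year the first window is 1940..2039 and
# the second is 1954..2044, leaving 45..53 undecodable.
print(decade_aligned_window(39, 2019), decade_aligned_window(40, 2019))
print(buffered_window(44, 2019), buffered_window(45, 2019))
```

Both depend on the current year, so the documentation would have to
describe the rule rather than quote a fixed pair of years.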

 > So, to my questions:
 > * Does anyone want to make the case for keeping 1900--1999 as range ?

The case for keeping it would be:

* Backwards compatibility
* More predictable than rolling ranges based on the current year
* No clearly correct alternative due to no consensus elsewhere

 > * Has anyone a better suggestion for how to choose a rolling range ?
 > * Should we have a buffer ?  If so, how wide ?
 > * How far into the past and future should the range reach ?
 > 	Eddy.

I would lean towards keeping 1900-1999, but it would mean that a lot
of people would need to do range handling like you outlined when
handling dates entered by users. A user who now writes 19 as the year
will mean 2019. So having a more intuitive default would have
benefits, as of course the original bug report proves: some people
are already expecting that behavior.

