[Development] Two-digit dates: what century should we use?

Fri Nov 8 11:15:59 CET 2019

André Somers (6 November 2019 17:20) wrote:
>>> I came to the conclusion that the sane behavior for interpreting
>>> dates depends on the semantics of what the date means. For instance,
>>> a birth date will always be a date in the past,

On 07-11-19 11:47, Edward Welbourne wrote:
>> ... except when it's the best-estimate date of birth of an expected
>> child, or part of a discussion of how (say) an education system will
>> handle the cohorts of children born in various date ranges.

André Somers (8 November 2019 09:14)
> ... neither of which are actual birth dates. The first is an expected
> birth date, the second something else entirely.

From the point of view of an education system, 2020-09-01 through
2021-08-31 are birth-dates of pupils they have to think about in their
long-term planning for (roughly) the academic years 2026 through 2039.
They need to ask actuaries to tell them how many children to expect in
that cohort, for example.

I tend to think of the eccentric uses of a classification because I'm
the one who's going to have to field the bug reports when someone uses
the "date of birth" mode for format-reading and gets results almost a
century ago where they expected results in the near future.

>> I'll agree, though, that birth dates are *usually* in the past ;^>
>>
>> Even when it is in the past, the range of past dates it may land in
>> is more than a century wide.  Some folk live for more than a century;
>> and records of dates of birth of folk can be relevant even after the
>> folk in question are dead.
>>
>> (All of which argues against using two-digit years in dates of birth,
>> common though that practice is.)

> True. But that does not preclude people from entering such dates. I
> guess it also depends on what use case you envision for this. For
> reading data stored in a 2-digit format, you are completely right.

Thankfully the use of two-digit years in storage formats is much less
fashionable than it used to be.  I still doubt it'll ever die out, though.

> But I was thinking more of making date entry work better. I have
> written controls backed by date parsing code based on logic like
> this. Yes, you can enter full data, but the control would do the
> expected thing even for shorthands like using a 2-digit
> year. What it would do would depend on the purpose of the date
> field. The examples above were not random: it was medical device
> software, so it was dealing with birth dates, appointments, etc. So
> for that one-in-200 patient over 100 years old, you'd use the full
> 4-digit year when entering the data. For the rest of them the 2-digit
> version would be enough.

but that's definitely a UI thing - if you're going to interpret what the
user typed as something else, you need to show them how you've
interpreted it, so that they can correct it if it's wrong.  So, by the
time you feed your data to Qt, you have ready-digested data that doesn't
need to be parsed by Qt with a two-digit year.  Of course, you *can* do
this, first trying to parse what the user typed with one format then, if
that fails or produces an implausible date, with another, but I suspect
you're setting yourself up for bad UX if you do that.

>>> while a date for an appointment would normally be a date in the
>>> future.

>> and usually not very far in the future, at that, which makes this one
>> of the cases where two-digit years aren't automatically a bad idea.

> True. It helps in the experience with the software if entering common
> things works quickly and smoothly. Making dates easier to enter can be
> a win in the time a user needs to enter data, and that can be _very_
> valuable, especially if that is something that needs to be done often.

Sounds like good UX, yes.

OTOH, as noted above, it's surely important to show the user promptly
how you're going to interpret their two-digit year, so that they can
correct it if that's not what they meant - as, for example, when a
doctor is filling in a form about a patient over a century old and needs
18 to mean 1918, not 2018 (or, the other way round for a child, if the
software defaults to 1900s).  Which means this happens at the UI level,
not in the QDateTime parsing code.
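
To make that concrete, here is a minimal sketch of what such a UI-level
helper might compute. It is plain C++ and assumes nothing about Qt; the
names YearReading and readTwoDigitYear are made up for illustration. It
returns the year the software would assume, plus the same two digits one
century either side, which the form can offer as corrections.

```cpp
#include <cassert>

// Hypothetical UI-side helper (not part of Qt): the year the software
// assumes for a two-digit entry, plus the neighbouring centuries that
// the UI can offer to the user as corrections.
struct YearReading {
    int assumed;  // year inside the window [startYear, startYear + 99]
    int earlier;  // same two digits, one century before
    int later;    // same two digits, one century after
};

YearReading readTwoDigitYear(int twoDigitYear, int startYear)
{
    assert(twoDigitYear >= 0 && twoDigitYear <= 99);
    assert(startYear >= 0);
    // Distance from startYear to the first year ending in these two digits.
    const int offset = (twoDigitYear - startYear % 100 + 100) % 100;
    const int assumed = startYear + offset;
    return { assumed, assumed - 100, assumed + 100 };
}
```

With a window starting in 1920, an entry of 18 is assumed to mean 2018
and the form can show 1918 as the alternative for the centenarian; with
a window starting in 1900 it is the other way round.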

>>> That alters the interpretation of the date. May I suggest adding an
>>> enum argument to any function doing the conversion from a string to a
>>> date that allows you to suggest the kind of date that is
>>> expected?

>> That would imply inventing (and documenting) how we're going to
>> implement each member of the enum; and, fundamentally, that's going to
>> boil down to specifying (per enum member) a range of (up to) 100 years
>> that each two-digit year value gets mapped into.  Then along comes some
>> user whose use-case we didn't think of and we need to extend the enum
>> and the enum grows endlessly.  I think it is easier to let the caller
>> just specify that year range (principally by its start date).  The
>> caller can then invent any flavour of year range they like.

> Do you really think it would get out of hand?

Yes.

> I can't see this growing to more than a handful,

I try not to let the fact that I can't think of more than a handful lull
me into forgetting how inventive users of a general-purpose library are
apt to get.  I also expect that, as we get further into the 21st
century, folk shall start to get "creative" about deciding how to
interpret two-digit years in dates; and we'll need to be compatible with
whatever client code is obliged to handle.

> and it would be much easier to use and read
> Qt::ExpectPastDate compared to something like
> QDate::currentDate().year() - 99 as an argument to that function.

I'm fairly confident there shall be applications that have weird
requirements not covered by our enum (if we have one), indeed some that
we'll never be willing to cover because only that one client actually
uses it, so we'll need the start-year-based version of the APIs.  If we
also provide an enum-based API for choice of reading of two-digit years,
that's two ways to specify this, which roughly doubles the number of
methods in each affected API.
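
For reference, the start-year-based reading boils down to very little
arithmetic. The following is only a sketch in plain C++, with a
hypothetical helper name (resolveTwoDigitYear is not an existing Qt
function): each two-digit value maps to the one year in the window
[startYear, startYear + 99] that ends in those digits.

```cpp
#include <cassert>

// Hypothetical helper, not a Qt API: map a two-digit year (0..99) to the
// unique year in the 100-year window [startYear, startYear + 99].
int resolveTwoDigitYear(int twoDigitYear, int startYear)
{
    assert(twoDigitYear >= 0 && twoDigitYear <= 99);
    assert(startYear >= 0);
    // Offset of the wanted year from the start of the window; adding 100
    // before taking the remainder keeps it in the range 0..99.
    const int offset = (twoDigitYear - startYear % 100 + 100) % 100;
    return startYear + offset;
}
```

A start year of "current year minus 99" (1920 when run in 2019) gives
the past-only window, so 18 reads as 2018 and 45 as 1945; a start year
equal to the current year gives a future-only window, so 20 reads as
2020 but 18 reads as 2118.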

We have enum-based formats (that adapt to the system locale) but also
let client code supply its own string formats for dates and times,
because inevitably some applications need formats that don't match the
CLDR-supplied formats for supported locales, or want to pass the format
from some fixed locale rather than the system one.  (If your software
processes data generated by a source in a different locale to (at least
some of) your users, you need to use that locale's date-formats when
reading the data, but the user's locale when displaying it.)

If we have both enum and start-year ways to tell the API how to handle
two-digit years, that's now four methods for each API (after
consolidation at Qt 6 - we already have four because of the string/enum
choice for format and the recent addition of a QCalendar optional
argument).

Of course, nothing stops an application defining its own wrapper for
relevant methods, that takes an enum defined by the application and maps
it to a suitable start-year.
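
Such a wrapper might look roughly like the sketch below. Everything in
it is application-side and hypothetical (DateKind, startYearFor and
resolveYearFor are illustrative names, not Qt API), and the choice of
window per enum member is just one plausible policy that an application
could pick for itself.

```cpp
#include <cassert>

// All names here are hypothetical and application-defined; none is Qt API.
enum class DateKind { BirthDate, Appointment };

// Choose the first year of the 100-year window for each kind of date.
int startYearFor(DateKind kind, int currentYear)
{
    switch (kind) {
    case DateKind::BirthDate:
        return currentYear - 99;  // window ends at the current year
    case DateKind::Appointment:
        return currentYear;       // window starts at the current year
    }
    return currentYear - 99;      // not reached for valid enum values
}

// Map a two-digit year (0..99) into the window chosen for this kind.
int resolveYearFor(DateKind kind, int twoDigitYear, int currentYear)
{
    assert(twoDigitYear >= 0 && twoDigitYear <= 99);
    const int startYear = startYearFor(kind, currentYear);
    assert(startYear >= 0);
    const int offset = (twoDigitYear - startYear % 100 + 100) % 100;
    return startYear + offset;
}
```

The application keeps the readable call site (a named kind of date),
while the library only needs to know about the start year.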

	Eddy.