Reconsider time zone canonicalization behavior given forking of IANA Time Zone Database #2509
Comments
I like option 4, but would rename it to something like "Canonicalization doesn't follow Links" and would also make clear that Link chains terminating in "Etc/UTC" or "Etc/GMT" are still followed and canonicalized to "UTC" (unless that option splits into e.g. 4a and 4b with different conclusions regarding this behavior).
That definitely gets into backwards compatibility territory, and it is plausible—if not likely—that some existing code already uses
tc39/ecma402#119 does propose exposing the IANA names, but I don't recall any existing text requiring implementations to do so (although they certainly could anyway; ECMA-402 gives formatters lots of flexibility).
I defer to @sffc.
I don't have a strong opinion here either.
Yes. Canonicalization still happens, it just doesn't follow Links (except for special-casing GMT and UTC).
No. Doing this would make the behavior less comprehensible and would sacrifice potential benefits.
There should definitely be some way to identify that there is a Link chain establishing equality between two time zones with different names, and ideally a way to determine its directionality (e.g., detecting that Atlantic/Reykjavik is a Link to Africa/Abidjan rather than the reverse).
Agreed; not needed at this time.
Note that current behavior also maps "Etc/GMT" to "UTC": https://tc39.es/proposal-temporal/#sup-canonicalizetimezonename
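To make the "canonicalization doesn't follow Links" idea concrete, here is a minimal sketch. It assumes a tiny hard-coded list of identifiers (`KNOWN_IDS`) in place of a real time zone database, and the function name `canonicalizeWithoutFollowingLinks` is hypothetical, not part of Temporal or ECMA-402. It only case-normalizes the input, except that the UTC-equivalent names discussed above collapse to "UTC".

```javascript
// Illustrative subset of identifiers; a real implementation would read these
// from its time zone data. This list is an assumption made for the sketch.
const KNOWN_IDS = ['UTC', 'Etc/UTC', 'Etc/GMT', 'Asia/Calcutta', 'Asia/Kolkata', 'Europe/Paris'];

// Names that are still special-cased to "UTC" (compared in lower case).
const UTC_EQUIVALENTS = new Set(['utc', 'etc/utc', 'etc/gmt']);

// Lower-cased ID -> ID in its conventional casing.
const byLowerCase = new Map(KNOWN_IDS.map((id) => [id.toLowerCase(), id]));

// Hypothetical helper: case-normalize, special-case UTC, never follow other Links.
function canonicalizeWithoutFollowingLinks(input) {
  const key = input.toLowerCase();
  if (UTC_EQUIVALENTS.has(key)) return 'UTC';
  const id = byLowerCase.get(key);
  if (id === undefined) throw new RangeError(`Unknown time zone: ${input}`);
  return id;
}
```

Under this sketch, an input spelled "asia/calcutta" keeps its Calcutta spelling rather than being rewritten to Kolkata, which is the round-tripping property the option is after.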
AFAICT, there's no explicit differentiation... rather, just a zone.tab file identifying for each ISO 3166-1 alpha-2 country code the corresponding time zones, which can themselves identify Links (as is currently the case for DE/IS/etc.).
Disagree on this; custom time zones should be compared by referential object identity. A custom time zone that happens to have
👍
I also disagree on these last points, although really it seems to be mostly a question of modeling: I don't think Temporal should classify Links as "alternate spelling" vs. "real", but rather just treat any Link relationship as establishing equality. In an implementation that uses the standard IANA time zone data, Atlantic/Reykjavik and Africa/Abidjan are equal until and unless a policy change causes them to diverge.
I agree with @gibson042, partly for a far more practical reason: maintenance. The IANA source doesn't have any API differences between "links due to similar clocks" and "links due to renames". The If Temporal were to distinguish between the two cases in an API, there would need to be a stable maintenance process for adding brand new links to the correct category.
My assumption is that, ideally, there'd be two categories of links:
The first type of link (let's call them "synonyms") conveys no semantic value. Programs will never behave differently depending on which ones you use (other than when comparing the

The second type of link (let's call them "merges") conveys semantically different information that could change the behavior of future programs beyond string comparison.

The particular use case I had in mind where it's helpful to know that difference is when a program has logic like this: "I want to do special processing for timestamps for X" (where "X" is a particular country like India or Sweden). Like this:

if (Temporal.TimeZone.from('Europe/Copenhagen').equals(zdt.timeZoneId)) {
  // do Denmark-specific stuff
} else {
  // non-Denmark-specific logic
}

It would be bad if future changes in the spelling of the desired English transliteration of "Copenhagen" caused the code above to break. So it's probably good practice for any code that checks for a specific time zone (or that wants to compare two ZDT timestamps to know if they're semantically identical) to use

But it'd also be bad if the price of protecting against future spelling changes meant that you'd mistakenly run jurisdiction-specific logic for other jurisdictions that coincidentally share the same time zone rules. It's true that, continuing that example above, if Denmark split into multiple time zones then the code above would break. But I think this is OK, because the change happened in Denmark so of course Denmark-specific code will need to change. My main concern is that if you treat all aliases the same, then

So I do think there's a case that being able to distinguish these cases is important. But...
One possible (needs validation) solution using existing data would be to use zone.tab which includes pre-merge data. If a link from
Agree, if the approach above won't work. We'd want to work with the IANA folks (or maybe ICU/CLDR?) to ensure that distinction is maintained in the future via some other solution. There are fewer than 300 total links so this isn't a lot of ongoing maintenance work (would probably add <1hr/year of work to someone's plate) but someone would have to be willing to commit to doing the work long-term. BTW, I'd volunteer to make an initial PR into TZDB, if it's decided that this split would be good to maintain AND if the data files need to change somehow.
You'd also need to consider backzone, because e.g. Africa/Timbuktu does not appear in zone.tab but is a "merge" (to use your term) of Africa/Abidjan in the primary data but (presumably) a synonym of Africa/Bamako in the pre-1970 data, and I think the same applies to everything in the "Non-zone.tab locations with timestamps since 1970 that duplicate those of an existing location" section mentioned below.
That seems like a goal that exceeds the scope of Temporal v1.
AFAICT tzdata Links are all created equal—the only existing data that could be used is unstructured section-heading comment text like "Pre-2013 practice, which typically had a Zone per zone.tab line" and "Non-zone.tab locations with timestamps since 1970 that duplicate those of an existing location". So I guess you'd be proposing something like a new merged file that exclusively contains the content from those section(s) and a Temporal equality comparison that ignores its contents? |
It's probably best to read this whole discussion thread first: https://mm.icann.org/pipermail/tz/2021-November/031074.html I would definitely like a change to the current format (I commented in that linked thread). But part of the reason the tzdb structure doesn't change often is the sheer number and variety of downstream consumers that have to be able to handle any new format. |
Yep, you're right:
Will share more results when I finish the investigation.
Initial investigation is complete. Results are here: https://4rylir.csb.app (full-screen view) and https://codesandbox.io/s/iana-vs-es-4rylir (source code). You can filter or sort to understand the various kinds of links.
Summary
Categorizing Synonyms vs. Merges
I took a first pass at classifying links as synonyms or merges based on the following algorithm:
I manually verified all 86 synonyms identified by the algorithm above. There were these patterns:
I also manually checked through the Links identified as merges, and I was unable to find any that looked like they should be synonyms.
My initial reaction is that it's not the job of Temporal to tell implementations what they can/should and can't/shouldn't do in this area. I can at least say that any solution that involves "don't canonicalize time zone names" likely means that ICU's time zone utilities can no longer be used for data storage; they can be used for calculations, but Temporal glue code will need to be implemented to conform to the spec rather than just following with ICU behavior as we've been doing for a long time. |
Before I did this research I probably would have agreed with you, but now that I've dug into the problem I'm quite concerned about the impact of canonicalization on the stability of ECMAScript code across engines and across time. From what I've seen, canonicalization changes very frequently, and implementations seem to vary quite a bit in how they apply canonicalization. This has really made me question the value of exposing canonicalized IDs to userland developers. We've already seen (in this repo, in Chrome's bugs, etc.) user complaints about canonicalization when differences are usually limited to only minor variations like Calcutta vs. Kolkata. And that's with almost 2/3 of Links in the current IANA TZDB not being followed by engines to IANA's canonical IDs. If engines start resolving Canadian time zones to Panama, Iceland to Cote d'Ivoire, and Stockholm to Berlin, we can expect many more complaints, user confusion, broken tests, etc. Who'd be a good person to talk with to understand how ICU currently approaches this problem? How do they determine which Links to follow and which to ignore?
I assume that implementations would need to store both the caller's (case-normalized) original string input as well as a pointer to the data structure that ICU uses to represent a canonicalized time zone. Is that what you mean by "storage"? The stored string would be used by #2482's If we also wanted to offer a Other than above, what other glue code would be needed? |
@yumaoka and @pedberg-icu know the most about ICU4C time zone handling. For ICU4X, we currently persist time zones by BCP-47 ID. We can (or will be able to) take IANA strings and map them to BCP-47, and then we lookup the canonical ID to go in the other direction. There is an issue (unicode-org/icu4x#2909) discussing which source of truth we should use for canonicalization. I'm currently neutral on the actual usability issue. I'm just pointing out that we're in effect moving more responsibility out of ICU[4X] and into the Temporal glue code. This logic about how to compare time zones for equality, what form of canonicalization to apply to them, etc., is not easy, as your OP shows. ICU/CLDR already solves these problems in its own way, as it has been doing for a long time. Moving these problems into Temporal glue code just makes Temporal harder to implement and harder to test. If the champions think that the problem is big enough to warrant the additional (nontrivial) implementation cost, so be it. |
I don't, for one! I think the TZDB fork is a problem which JS implementations can coordinate among themselves to solve. Pulling the responsibility for solving the problem into our domain will delay the proposal, while delivering an incomplete solution (because this is a problem that applies outside of Temporal as well, and those parts we can't solve.) |
Question. Can this behavior be changed as a Temporal V2 follow-up? Logistically, I think it's fair to say that moving forward with this change is going to delay Temporal implementations by another several months, given that we need to discuss this in various venues to achieve consensus, then write the spec text, then the tests, then the ICU functions discussed above, then in-flight implementations need to be updated. |
An appendix to the synonym vs. merge investigation above: CLDR helpfully provides synonym data here. Example:
"inccu": {
  "_description": "Kolkata, India",
  "_alias": "Asia/Calcutta Asia/Kolkata"
},
If CLDR is the source of truth for time zone identifiers, then it's easy to distinguish merges from aliases.
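As a rough illustration of how alias data of that shape could drive a synonym check, here is a sketch. The `cldrTimeZoneData` object contains only the single entry quoted above (a real table has many more), and the helper names `buildSynonymIndex` and `areSynonyms` are hypothetical, not an existing CLDR or ICU API.

```javascript
// Only the entry quoted in the thread; everything else is deliberately omitted.
const cldrTimeZoneData = {
  inccu: { _description: 'Kolkata, India', _alias: 'Asia/Calcutta Asia/Kolkata' },
};

// Map each (lower-cased) IANA name to the short ID whose alias list contains it.
function buildSynonymIndex(data) {
  const index = new Map();
  for (const [shortId, entry] of Object.entries(data)) {
    for (const name of entry._alias.split(' ')) {
      index.set(name.toLowerCase(), shortId);
    }
  }
  return index;
}

// Two names are synonyms only if both are known and share the same short ID.
function areSynonyms(index, a, b) {
  const idA = index.get(a.toLowerCase());
  const idB = index.get(b.toLowerCase());
  return idA !== undefined && idA === idB;
}
```

With this approach, names that IANA has merged but that sit under different short IDs would not compare as synonyms, which is the distinction being discussed.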
My concern is that implementations have had years to do this coordination... and haven't done it. With Temporal V1 we have a one-time opportunity to reduce churn in the ecosystem forever... and from what I've seen coming down the road from IANA, avoiding the whole "what's the right canonical ID?" question forever (at least for Temporal) seems appealing.
Is the current plan for V8 to implement Temporal using ICU4C or ICU4X?
zdt = Temporal.ZonedDateTime.from('2020-01-01T00:00[Europe/Copenhagen]');
zdt.equals('2020-01-01T00:00[Europe/Berlin]');
One approach that I think might be web-compatible would be to not canonicalize
Yep, agree. Although if we went with the "don't canonicalize IDs except UTC" solution above, that would require zero changes from ICU, and would only require a small change from implementers which could be bundled with the changes in #2482 which will already change how TimeZone slots are stored and used. The delta of additional implementer effort seems quite small. But I agree that once we start asking for any different canonicalization behavior, this would introduce delay, which might be an argument for the "no-canonicalize" solution or the "full canonicalize" status quo as the best options for V1.
If we let ICU keep canonicalizing the In other words, if we went with option 3 now, we could adopt options 1 or 2 (or even 4) later. Option 4 has implementation concerns just like options 1 and 2. The laundry list of 10 questions in the OP is well thought out, but they are questions we need to resolve if we were to implement option 4, and, again, Temporal needs to persist the user-specified time zone alongside the ICU time zone (unless it computes the ICU time zone on the fly when it is needed).
I don't think Temporal is the right vehicle to force this type of ecosystem change. Temporal is already a really tall order for implementations. I do hope that implementations would be more amenable to solving the problem if there were a future proposal narrowly focused on this problem space. |
Sharing more stuff I've learned: CLDR metadata, not IANA TZDB, is currently the source of time zone canonicalization mappings in ECMAScript engines, per this comment:
I think this means that we don't really care that much about the TZDB fork, as long as:
The last bullet is a problem! Currently the spec says this:
This language, combined with other spec text encouraging use of the latest TZDB, will force implementers to use IANA's canonicalization strategy because the spec text is very prescriptive about use of If we do want engines (and not Temporal) to decide how canonicalization should work, then this spec text needs to change. Right? |
Yeah, it makes a lot of sense to solve this in the section of 402 you're pointing to. I think there's already an issue open for it.
Given that this is already visible in 402, should Temporal be concerned with this issue specifically? Implementations already manage to choose to do something or other. We should just make sure that, whatever the result is, we apply it to 402 and Temporal equally.
@sffc Are you thinking of tc39/ecma402#272? That issue seems a bit wider than just canonicalization, although it touches on some of the same questions.
@littledan Currently the only way to know the canonical ID is quite hard to discover: In a Temporal world, canonical IDs will be highly visible in output of So although canonicalization exists in 402 today, it will have a lot more visibility and impact once Temporal ships in engines. Hence my concern!
@gibson042 After #2482, if an object is in a ZDT's [[TimeZone]] slot, will we know if it's a custom zone or not? I'm OK to use |
Based on discussion above, and given CLDR's synonym-only canonicalization strategy, I think we can narrow the decision to two basic choices below. Note that neither option requires any change to ICU or CLDR. A. Status quo: Follow Links + change 402 to codify existing CLDR practice.
Pro: Less spec churn; Somewhat easier to implement. B. Don't follow non-UTC Links when exposing time zone identifiers from Temporal objects
Pro: better web compatibility
Con: More spec churn; Somewhat harder to implement.
Unfortunately, I don't think that (B) above is possible in a V2. For example, it would not be web-compatible to stop considering Asia/Calcutta and Asia/Kolkata as equivalent in |
Firefox doesn't use CLDR time zone canonicalisation, but IANA canonicalisation (including
CLDR has a stable time zone id policy, which can be problematic for some time zone ids. For example
tc39/ecma402#272 (comment) has a link to this old bug report from bugs.ecmascript.org: https://tc39.es/archives/bugzilla/1892/. Some missing bits which aren't yet covered here:
The overall situation is more like:
There are probably more special cases, too. For example take
[1] The meta zone mapping uses optional date information to handle the case when time zone rules change. When no date information is present, ICU restricts the range from 1970-01-01 to 9999-12-31, so it's best not to use dates more than fifty years in the past or too far into the future when testing this.
js> var dtf = new Intl.DateTimeFormat("en", {timeZone: "Antarctica/McMurdo", timeZoneName:"long"})
js> dtf.format(Date.UTC(1970, 0, 1))
"1/1/1970, New Zealand Standard Time"
js> dtf.format(Date.UTC(1970, 0, -1))
"12/30/1969, GMT+12:00"
js> dtf.format(Date.UTC(9999, 11, 31))
"12/31/9999, New Zealand Daylight Time"
js> dtf.format(Date.UTC(9999, 11, 31+1))
"1/1/10000, GMT+13:00"
Thanks, this is very useful info.
@anba - What is Firefox planning to do with the recent changes in IANA to merge unrelated zones together, for example, Once Temporal ships, these merges will be very problematic because time zone strings will be much more visible and will be persisted (e.g. in databases) and re-used far in the future. For example, imagine a calendar app that stores meeting times in a database using |
Firefox examines the time zone information from For Using But just using |
That sounds like a good approach, and definitely better than the current main fork of TZDB. Do you know if what you're doing in FF varies from what https://github.com/JodaOrg/global-tz is doing? They sound quite similar.
From Temporal and 402 meetings 2023-03-09, we'll follow up on this issue in two ways:
In the meantime I'll close this issue to remove noise from the Temporal repo.
I think TZDB with The aforementioned If we want to do exact comparisons, it's necessary to explicitly define which configuration is tested:
[1] It's likely that ICU4C and ICU4X will also have slightly different behaviour, because if ICU4X uses BCP-47 ids to store time zone ids, it can't represent the old and deprecated SystemV time zone ids, because those don't have a BCP-47 id. It could use |
While working on #2493, I learned that the IANA Time Zone Database has been forked due to a disagreement between that database's maintainer and some prominent users of the database.
Background
The two forks differ as follows:
Europe/Copenhagen => Europe/Berlin and Atlantic/Reykjavik => Africa/Abidjan. There are many more examples like this. This fork is preferred by the TZDB maintainer, and therefore is exposed by the official IANA downloads of TZDB releases.
PACKRATLIST build option. That build option was added by the maintainer to ensure that both forks could be built out of the same repo. See discussion here and here.
You can read more about the fork in the TZDB mailing list archives. A few relevant threads:
The fork seems to represent a philosophical difference about the purpose of the TZDB. One camp (which includes the maintainer) sees the goal of TZDB as simply providing a way to convert post-1970 zoned timestamps into exact instants, and wants to reduce the TZDB size and maintenance hassle of dealing with pre-1970 data. The other camp (supporting the unmerged fork) adds additional use cases:
I'm not sure how much Temporal cares about pre-1970 dates, but the latter two issues seem quite important to Temporal users. The second one will make calendaring apps more resilient to country-level timezone/DST changes, while the third will prevent developer confusion and consternation.
Also, given the complaints about the changes, it's possible that the TZDB may revert these changes in the future, which would cause further churn.
Options
Anyway, now that we know this fork exists, we need to figure out what to do about it in the Temporal spec. Options include:
1. Recommend that implementers use the Primary Fork
2. Recommend that implementers use the Unmerged Fork
3. Don't recommend anything; implementers are free to choose.
4. Stop canonicalizing time zones (thanks to @pipobscure for this suggestion)
equals method; avoids triggering geopolitical sensitivities caused by modifying user input to point to an unexpected country or name.
Temporal.TimeZone.equals method to help users identify equivalent time zones like Asia/Calcutta vs. Asia/Kolkata; may require modifying existing ICU behavior (per this comment, it sounds like Firefox already does similar mods).
Discussion
Of the above options, my strong preference is for (4), because it solves both the forking issue as well as the existing canonicalization issues like Calcutta vs. Kolkata. Also, I think retaining user input as-is will be quite helpful to reduce confusion in cases where code takes input from some other source, modifies that data, and then sends or stores the modified data. If the time zone identifier varies a lot between the original and modified ZDT, I think that will generate user confusion that avoiding canonicalization would prevent.
If we want to go with (4), here are a few questions to answer:
Intl.DateTimeFormat.p.resolvedOptions().timeZone behave? Should it also stop canonicalizing? If yes, should it add a new canonicalTimeZone property?
Intl.DateTimeFormat.p.format or Date.p.toLocaleString? I suspect that the answer is "no" because localized descriptions of time zones don't usually surface the IANA identifiers, but not 100% sure about this.
Europe/Paris vs. europe/paris? My opinion: yes, we should canonicalize.
Asia/Calcutta vs. Asia/Kolkata. My opinion: no, because by not canonicalizing id in this case we can avoid user complaints like this chromium bug, and we can ensure future compatibility & round-trippability even if zones are renamed in the future. Note that equals should probably report these as true though. (See below.)
TimeZone.p.equals method? I think we should, both for consistency across Temporal types and to help code be robust in the face of past or future renames of cities, which seem to happen fairly often globally. JS code should be able to ask "Is this date in the India time zone" without having to worry that that code will be broken by a past or future rename.
equals should we also add a method that tests if all rules are the same across time zones, e.g. Atlantic/Reykjavik vs. Africa/Abidjan? I don't think this is needed. Userland code can always use getNextTransition in a loop to check for this kind of equality, and if there's user demand we could always add it in a later release.
Etc/UTC should resolve to UTC in ECMAScript, matching current behavior. There's no value in changing this existing behavior.
PACKRATLIST option to work, TZDB data must provide a way to differentiate "merged" links like Atlantic/Reykjavik => Africa/Abidjan from "renamed" links like Asia/Calcutta vs. Asia/Kolkata. How does this differentiation work, and does it work for all links or are there gaps? It sounds like @anba may know how this works.
If we add equals, here's a suggestion for its behavior:
id property.
Europe/Paris vs. europe/paris.
Asia/Calcutta vs. Asia/Kolkata, because they represent the same thing with different spelling.
Etc/UTC then treat them as equal.
Atlantic/Reykjavik vs. Africa/Abidjan) as equal, even if all their time zone transitions are the same, because future changes could make those locations have different time zone rules. Per above, if users want to evaluate "all rules are the same" they can do this in userland by comparing time zone transitions in a loop. Although honestly I'm skeptical that this will be a popular use case. Who cares if the rules are equal?
Pinging @jasonwilliams @ptomato @sffc @gibson042 @pipobscure for your opinions.
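For the userland "are the rules the same?" check mentioned above, here is a rough sketch. It does not walk transitions with getNextTransition. Instead it swaps in a simpler technique that runs in engines without Temporal: sampling UTC offsets through Intl.DateTimeFormat at fixed intervals. The helper names (`offsetMinutes`, `sameOffsetsInRange`) are hypothetical, the result depends on the engine's own time zone data, and sampling can miss differences shorter than the step size.

```javascript
// Cache one formatter per zone; constructing Intl.DateTimeFormat is relatively costly.
const formatters = new Map();

function formatterFor(timeZone) {
  if (!formatters.has(timeZone)) {
    formatters.set(timeZone, new Intl.DateTimeFormat('en-US', {
      timeZone,
      hourCycle: 'h23',
      year: 'numeric', month: 'numeric', day: 'numeric',
      hour: 'numeric', minute: 'numeric', second: 'numeric',
    }));
  }
  return formatters.get(timeZone);
}

// UTC offset of `timeZone` in minutes at `epochMs` (expected to be a whole second).
function offsetMinutes(timeZone, epochMs) {
  const fields = {};
  for (const { type, value } of formatterFor(timeZone).formatToParts(epochMs)) {
    fields[type] = Number(value); // literal separators become NaN and are ignored
  }
  // Reinterpret the zone's wall-clock reading as if it were UTC, then diff.
  const wallClockAsUtc = Date.UTC(
    fields.year, fields.month - 1, fields.day,
    fields.hour % 24, fields.minute, fields.second
  );
  return Math.round((wallClockAsUtc - epochMs) / 60000);
}

// Sampled comparison: true if both zones have the same offset at every sample.
function sameOffsetsInRange(zoneA, zoneB, startMs, endMs, stepMs) {
  for (let t = startMs; t <= endMs; t += stepMs) {
    if (offsetMinutes(zoneA, t) !== offsetMinutes(zoneB, t)) return false;
  }
  return true;
}
```

A caller might sample weekly between two dates, for example `sameOffsetsInRange('Atlantic/Reykjavik', 'Africa/Abidjan', Date.UTC(2000, 0, 1), Date.UTC(2030, 0, 1), 7 * 24 * 3600 * 1000)`. A transition-based loop would be more precise, at the cost of requiring the Temporal API.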