Improve unknown start of token error message for invisible characters #2637

CramBL · 2025-02-23T12:28:55Z

Resolves #1016

The approach I implemented displays the unknown token in single quotes; if it is not an ASCII character, it is also escaped and displayed in parentheses.

Invisible character

Before

error: Unknown start of token:
  |
1 | test:
  | ^

After

error: Unknown start of token '' (\u{200b}):
  |
1 | test:
  | ^

ASCII character

Before

error: Unknown start of token:
  |
1 | %test:
  | ^

After

error: Unknown start of token '%':
  |
1 | %test:
  | ^

casey

Nice, see comments!

casey · 2025-02-24T16:53:39Z

src/compile_error_kind.rs

@@ -153,7 +153,9 @@ pub(crate) enum CompileErrorKind<'src> {
  UnknownSetting {
    setting: &'src str,
  },
-  UnknownStartOfToken,
+  UnknownStartOfToken {
+    token: char,


Perhaps call this start? Since it's not really a whole token. Also, start is the name of the variable in lexer.rs, so you can use shorthand initializer syntax there.
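To illustrate the shorthand initializer point, here is a minimal sketch. The enum is a trimmed-down, hypothetical stand-in for the real `CompileErrorKind` (no lifetime, one variant), and `unknown_start_of_token` is a made-up helper, not a function from lexer.rs.

```rust
// Hypothetical, trimmed-down stand-in for the real error enum.
#[derive(Debug, PartialEq)]
enum CompileErrorKind {
  UnknownStartOfToken { start: char },
}

// Hypothetical helper: because the local variable is also named `start`,
// the field can be written as `{ start }` rather than `{ start: start }`.
fn unknown_start_of_token(start: char) -> CompileErrorKind {
  CompileErrorKind::UnknownStartOfToken { start }
}
```

With the field named `token`, the same construction would have to spell out `{ token: start }`.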

casey · 2025-02-24T16:54:30Z

src/compile_error.rs

+        if token.is_ascii() {
+          format!("'{token}'")
+        } else {
+          format!("'{token}' ({})", token.escape_unicode())


I think the escape_unicode format isn't super useful. Let's use the U+NNNN representation, which is commonly used to represent unicode codepoints. NNNN should be four hex digits. Leading zeros should not be elided.

Also, let's use char::is_ascii_graphic as the test for whether to print the unicode codepoint. This will cause it to be printed for ASCII control characters, as well as ascii whitespace, which I think is probably good.
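A small sketch of the two suggestions above, using only standard library calls. The helper names `codepoint` and `needs_codepoint` are hypothetical and chosen for illustration; they are not taken from the PR.

```rust
/// Format a character as `U+NNNN`: uppercase hex, zero-padded to at
/// least four digits, so leading zeros are kept.
fn codepoint(c: char) -> String {
  format!("U+{:04X}", u32::from(c))
}

/// Hypothetical predicate: print the codepoint for anything that is not
/// a visible ASCII character. `is_ascii_graphic` covers U+0021..=U+007E,
/// so spaces, tabs, control characters, and all non-ASCII characters
/// return `true` here.
fn needs_codepoint(c: char) -> bool {
  !c.is_ascii_graphic()
}
```

Under this test a tab would be reported as U+0009 and a zero-width space as U+200B, while `%` would be shown without a codepoint.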

Also, I think this should just be done with two calls to write:

write!(f, "Unknown start of token '{start}'")?;
if !start.is_ascii_graphic() {
  // print (U+NNNN)
}
Ok(())

Implemented. It needs three write! calls, however, since there's a colon at the end.
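A sketch of what the three-call version might look like at this point in the review, while the trailing colon was still present. The `UnknownStartOfToken` struct is a hypothetical stand-in for the real error type, and the exact spacing around the parenthesized codepoint is an assumption.

```rust
use std::fmt::{self, Display, Formatter};

// Hypothetical stand-in for the real error type.
struct UnknownStartOfToken {
  start: char,
}

impl Display for UnknownStartOfToken {
  fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
    let start = self.start;
    // First call: the message and the quoted character.
    write!(f, "Unknown start of token '{start}'")?;
    // Second call: the codepoint, only for characters that are not
    // visible ASCII (assumed format: a space, then `(U+NNNN)`).
    if !start.is_ascii_graphic() {
      write!(f, " (U+{:04X})", u32::from(start))?;
    }
    // Third call: the trailing colon.
    write!(f, ":")
  }
}
```

Without the trailing colon, the last call can be replaced by `Ok(())`, which brings this back to the two-call shape suggested in the review.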

casey

Whoops, missed something. New tests should use the Test struct directly. Also, you can remove the : at the end of the error message. I'm not sure why it's there, and other messages don't have it.

casey · 2025-02-24T20:35:16Z

tests/misc.rs

@@ -1679,6 +1679,36 @@ assembly_source_files = %(wildcard src/arch/$(arch)/*.s)
   status:   EXIT_FAILURE,
 }

+test! {


Sorry I didn't catch this before: the test! macro is kind of deprecated. I don't think it's very clear, and it's often inflexible, so anything custom tends to require workarounds. For new tests I try to use the Test struct and constructor, which the test! macro uses under the hood, so these should be converted to use the Test struct.

I know you said this for new tests, but I took the liberty of converting the other tests that this PR touches as well.

Nice, that's great. Maybe one day I'll get around to actually converting all the tests 😅 It strikes me that AI might actually be really good at this kind of tedious but complex refactoring.

casey · 2025-02-25T21:43:43Z

Excellent, thank you!

CramBL force-pushed the improve-start-of-token-err-msg branch 2 times, most recently from 4569d6d to d99c184 Compare February 23, 2025 12:45

casey requested changes Feb 24, 2025

CramBL added 5 commits February 25, 2025 10:53

Improve unknown start of token error message for invisible characters

92d2136

address review comments

9fdd062

add test to ensure control characters are printed correctly and leading zeros are not trimmed

f82911a

remove colon

047735a

adjust unknown_start_of_token tests to use the Test struct instead of the test! macro

0efb648

CramBL force-pushed the improve-start-of-token-err-msg branch from f22aca4 to 0efb648 Compare February 25, 2025 09:54

casey merged commit cb12c60 into casey:master Feb 25, 2025
5 checks passed
