-
Notifications
You must be signed in to change notification settings - Fork 516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve unknown start of token error message for invisible characters #2637
Conversation
4569d6d
to
d99c184
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, see comments!
src/compile_error_kind.rs
Outdated
@@ -153,7 +153,9 @@ pub(crate) enum CompileErrorKind<'src> { | |||
UnknownSetting { | |||
setting: &'src str, | |||
}, | |||
UnknownStartOfToken, | |||
UnknownStartOfToken { | |||
token: char, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps call this start
? Since it's not really a whole token. Also, start
is the name of the variable in lexer.rs
, so you can use shorthand initializer syntax there.
src/compile_error.rs
Outdated
if token.is_ascii() { | ||
format!("'{token}'") | ||
} else { | ||
format!("'{token}' ({})", token.escape_unicode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the escape_unicode
format isn't super useful. Let's use the U+NNNN
representation, which is commonly used to represent unicode codepoints. NNNN
should be four hex digits. Leading zeros should not be elided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's use char::is_ascii_graphic
as the test for whether to print the unicode codepoint. This will cause it to be printed for ASCII control characters, as well as ascii whitespace, which I think is probably good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think this should just be done with two calls to write
:
write!(f, "Unknown start of token '{start}')?;
if !start.is_ascii_graphic() {
// print (U+NNNN)
}
Ok(())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented. It needs 3 write!
however, since there's a colon at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, missed something. New tests should use the Test
struct directly. Also, you can remove the :
at the end of the error message. I'm not sure why it's there, and other messages don't have it.
tests/misc.rs
Outdated
@@ -1679,6 +1679,36 @@ assembly_source_files = %(wildcard src/arch/$(arch)/*.s) | |||
status: EXIT_FAILURE, | |||
} | |||
|
|||
test! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't catch this before: The test!
macro is kind of deprecated. I don't think it's very clear, and it's often inflexible, so anything custom often requires workarounds, so I try to use the Test
struct and constructor for new tests (which the test!
macro uses under the hood). So these should be converted to use the Test
struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you said for new tests but I took the liberty to convert the other tests that this PR touches as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, that's great. Maybe one day I'll get around to actually converting all the tests 😅 It strikes me that AI might actually be really good at this kind of tedious but complex refactoring.
…ng zeros are not trimmed
f22aca4
to
0efb648
Compare
Excellent, thank you! |
Resolves #1016
The approach I implemented is displaying the unknown token in single quotes, but if it's not an ASCII character then it is also escaped and displayed in parenthesis.
Invisible character
Before
After
ASCII character
Before
After