
But Unicode support is still nuclear. In Unicode, the same grapheme can be written in several different ways. And once you get to language-specific conventions that Unicode itself does not cover, like "ue" standing for "ü" (where the ü can itself be written in two different ways, precomposed or as a base letter plus combining mark), neither ripgrep nor ugrep will help you find those substrings. The same goes for the many Arabic subtleties, where there are not only mark combinations but also more ornate characters representing the same graphemes.
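
To make the point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The sample word and the helper name `raw_contains` are made up for illustration; the helper only mimics a plain code-point substring search and is not how ripgrep or ugrep work internally.

```python
import unicodedata

# Two spellings of the same word "für":
composed = "f\u00fcr"      # precomposed ü (U+00FC)
decomposed = "fu\u0308r"   # u followed by COMBINING DIAERESIS (U+0308)

def raw_contains(haystack: str, needle: str) -> bool:
    """Plain substring test on code points, with no normalization."""
    return needle in haystack

# The strings render the same but are different code point sequences.
print(composed == decomposed)                # False
print(len(composed), len(decomposed))        # 3 4
print(raw_contains(decomposed, "\u00fc"))    # False: precomposed needle misses

# Both spellings are canonically equivalent: NFC maps them to the same string.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The last line is the part a byte-oriented search never gets to see: the two inputs are canonically equivalent, but only after normalization.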


Idk what "still nuclear" means.

And yes, ripgrep doesn't do any kind of Unicode normalization. Few tools do.
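
For anyone who needs it today, one workaround is to normalize both the pattern and the text before searching. A rough sketch, assuming the input fits in memory; `nfkc_contains` is a hypothetical helper, not a ripgrep feature:

```python
import unicodedata

def nfkc_contains(haystack: str, needle: str) -> bool:
    """Substring test after applying NFKC to both sides.

    NFKC composes base letter + combining mark sequences and also folds
    compatibility characters (ligatures, Arabic presentation forms) to
    their ordinary counterparts.
    """
    norm_haystack = unicodedata.normalize("NFKC", haystack)
    norm_needle = unicodedata.normalize("NFKC", needle)
    return norm_needle in norm_haystack

# Decomposed text, precomposed pattern: matches after normalization.
print(nfkc_contains("fu\u0308r", "f\u00fcr"))   # True

# Arabic presentation form (ALEF ISOLATED FORM) vs. the plain letter ALEF.
print(nfkc_contains("\ufe8d", "\u0627"))        # True

# Language-specific conventions are NOT handled: "ue" does not match "ü".
print(nfkc_contains("fuer", "f\u00fcr"))        # False
```

Normalization only covers equivalences defined by Unicode itself. Spellings like "ue" for "ü" would need a language-specific mapping on top.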

But that doesn't mean what I said was wrong. ripgrep has a whole host of Unicode features that ag doesn't have.


Nuclear is nuclear, not unclear.

Read https://www.unicode.org/reports/tr10/#Searching for what Unicode has to say about string searching, as opposed to the byte search the usual grep tools do.


The GP probably meant "unclear".



