
Note to self:

If I need to publish something and have it remain completely anonymous, convert it to ASCII TEXT. That is one of the few formats I understand well enough to be CERTAIN that it is free of metadata.

Now all I have to worry about is how I can anonymously publish it, text style analysis, and how to include diagrams without resorting to ASCII drawings.



You could always just release the raw LaTeX source and let people compile it themselves or something.


I'd imagine you reveal too much information through your personal LaTeX writing style.


It's not foolproof, but that does require that you have released a sufficiently large sample of LaTeX source under your real name (or something that can be traced to your real name) for comparison. Additionally, if you really want to release something typeset like that, but you're worried about fingerprinting, my guess is that it's not that hard to deliberately change your personal style by using a minimal set of packages and when necessary restricting yourself to the most popular packages. Even if your new personal style still ends up unique, it wouldn't match the fingerprint of anything you've released under your own name.
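For what it's worth, checking which packages a source file pulls in is easy to script. Below is a rough sketch, not a vetted tool: the function names and the allow-list are made up for illustration, and what counts as a "popular" package is something you would have to decide yourself. It only looks at `\usepackage` lines and ignores commented-out ones.

```python
import re

# Matches \usepackage{a,b} and \usepackage[options]{a,b}.
USEPACKAGE = re.compile(r"\\usepackage\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")


def packages_used(source: str) -> list[str]:
    """Return the sorted, de-duplicated package names loaded via \\usepackage."""
    # Drop LaTeX comments (an unescaped % up to the end of the line).
    without_comments = re.sub(r"(?<!\\)%.*", "", source)
    names = []
    for match in USEPACKAGE.finditer(without_comments):
        names.extend(n.strip() for n in match.group(1).split(",") if n.strip())
    return sorted(set(names))


def unusual_packages(source: str, allowed: set[str]) -> list[str]:
    """Return packages that are not on the (hypothetical) allow-list."""
    return [name for name in packages_used(source) if name not in allowed]
```

Anything `unusual_packages` reports is a candidate for removal or replacement before publishing. This says nothing about macro style, spacing habits, or the prose itself.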


That's why I compose all my top secret documents in emoji. If anyone finds them they'll just assume it's gibberish from a 13yr old girl.


13-year-old girls are more fluent digital communicators than you or me.


Strongly disagree.

This is something you will hear from a lot of places, often from people who were never early adopters of email, SMS, Skype, etc.


Yeah well that's not me... I've been on BBSes since 1988 and got my first proper internet shell account in 1991 when, incidentally, I was 13.

The reason I made the comment is because it rubbed me the wrong way to cite 13-year-old girl talk as prototypical gibberish. Are 13-year-old girls generally excitable and potentially annoying? Yes. Do they talk in gibberish as a matter of course? No, that's a thoughtless sexist trope.

And I'll place a blind bet that they text faster than you.


Agree that the girl part should have been left out.

I still think the average HN-er should be able to type faster and more accurately on a normal keyboard than the average 13-year-old, regardless of gender.


You're stacking the deck. Physical keyboards are ancient technology ;)


Might Markdown be sufficiently basic to reduce style-based fingerprinting?

I suspect that the actual writing style apart from the markup would provide some clues.


Or build the LaTeX in a clean VM.


http://cm.bell-labs.com/who/ken/trust.html suggests you cannot trust anything!


I wonder if there is a tool to obfuscate text to prevent analysis. One way I can think of is to use an online translation tool to translate the text into a foreign language and back into English, then fix the grammar slightly. The structure of the text should hopefully be different enough that no tool or human would recognize it as your style. Is there an easier way to do this?


There was an interesting talk about this (adversarial stylometry) at 28C3:

https://www.youtube.com/watch?v=C9SgAOcCm0I

One of the methods they mentioned is machine translation, but they found that it wasn't terribly useful. It's a really neat talk and I highly recommend it. They also wrote some software to anonymize texts (Anonymouth) and their stylometry software (JStylo) is also freely available:

http://events.ccc.de/congress/2011/Fahrplan/events/4781.en.h...


There is some free software available[0] to do stylometry analysis. And some software which purports to assist in anonymizing writings[1]. I've not really played around with either, so I can't speak to their ease of use and/or effectiveness. But it's at least somewhere to start.

[0] http://evllabs.com/jgaap/w/index.php/Main_Page

[1] https://github.com/psal/anonymouth


A pretty neat idea, but of course if you use an online tool you would still be disclosing the original text to an untrusted third party...


Yeah, some work has been done in this area, but I only know about it in passing via a mention in one of Jacob Appelbaum's talks. This seems like it might be a decent place to start: https://www.youtube.com/watch?v=-b0Ta9h62_E


Reminds me of the concept of a "Google Translate fixed point", i.e. you translate back and forth between English and a foreign language until the translation stops changing.
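The fixed-point idea can be sketched as a small loop. This is only an illustration: `translate` is a placeholder callable you would have to supply yourself (no real translation API is assumed here), and the pivot language and round limit are arbitrary choices.

```python
from typing import Callable, Tuple


def round_trip_fixed_point(
    text: str,
    translate: Callable[[str, str, str], str],  # hypothetical: (text, src, dst) -> text
    pivot: str = "de",
    max_rounds: int = 20,
) -> Tuple[str, int]:
    """Translate English -> pivot -> English until the text stops changing.

    Returns the final text and the number of round trips performed.
    Gives up after max_rounds, since a real translator may never settle.
    """
    current = text
    for rounds in range(1, max_rounds + 1):
        foreign = translate(current, "en", pivot)
        back = translate(foreign, pivot, "en")
        if back == current:
            # A full round trip changed nothing: this is the fixed point.
            return current, rounds
        current = back
    return current, max_rounds
```

Whether the fixed point hides your style any better than a single round trip is a separate question, and the logging concern raised in the replies applies to every call.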


Interesting, however, I would not use google translate to obfuscate my writing, as it might appear in google logs :)


:). Speaking of which, can you recommend any non-Google online translation tool?



Don't be too sure about an ASCII text file either. It may still contain a bit of metadata such as a BOM (technically that makes it a Unicode encoding rather than pure ASCII, but it still means you have to check which encoding your favourite text editor uses).


The BOM isn't exactly identifying information, but it's things like that (and encodings) that made me specify ASCII.


In the case of UTF-8, the BOM isn't information at all: it's always the same 3 bytes (a byte-order mark is meaningless in a byte-based encoding). And since the default Windows plain text editor adds it, its usefulness for fingerprinting is rather limited.
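If you'd rather check than trust your editor, something like the following would do. It is a minimal sketch using only the standard library; the function names are made up for illustration.

```python
import codecs

# Known byte-order marks. UTF-32 comes before UTF-16 because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding name of a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def non_ascii_offsets(data: bytes) -> list[int]:
    """Return the offsets of every byte outside the 7-bit ASCII range."""
    return [i for i, byte in enumerate(data) if byte > 0x7F]


def is_plain_ascii(data: bytes) -> bool:
    """True if the data has no BOM and no byte above 0x7F."""
    return detect_bom(data) is None and not non_ascii_offsets(data)
```

Read the file in binary mode (`open(path, "rb").read()`) before passing it in, so that Python does not silently decode or strip anything first.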


I think you'd be okay with any format, such as HTML, that you can eyeball in a plain text editor; you just need to eschew binary formats. So diagrams in SVG in preference to PNG?
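As a supplement to eyeballing, here is a rough sketch that lists the parts of an SVG most likely to carry authoring metadata. It makes assumptions: the function name is invented, it flags only `<metadata>`, `<title>`, and `<desc>` elements plus anything outside the SVG namespace (editor-specific tags and attributes), and it does not see XML comments, which the default parser drops. It will also flag harmless things such as `xlink:href`, so treat the output as a list of places to look, not a verdict.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DESCRIPTIVE = ("metadata", "title", "desc")


def svg_metadata_hints(svg_text: str) -> list[str]:
    """List elements and attributes in an SVG that deserve a manual look."""
    hints = []
    for element in ET.fromstring(svg_text).iter():
        # ElementTree writes namespaced tags as "{namespace}localname".
        namespace, _, local = element.tag.rpartition("}")
        namespace = namespace.lstrip("{")
        if local in DESCRIPTIVE:
            hints.append(f"element <{local}>")
        elif namespace and namespace != SVG_NS:
            hints.append(f"foreign element {local} ({namespace})")
        for attribute in element.attrib:
            if attribute.startswith("{"):
                attr_ns, _, attr_local = attribute[1:].partition("}")
                if attr_ns != SVG_NS:
                    hints.append(f"foreign attribute {attr_local} ({attr_ns})")
    return hints
```

An empty list does not prove the file is clean; it only means this particular check found nothing.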


And that's why I created Bitcoin Megaphone! http://bitcoinmegaphone.com



