
[flagged]


It kind of is, though. Google doesn't randomly try to visit every URL on the internet. It follows links. Therefore, for these files to be indexed by Google, they need to be linked to from somewhere.
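To make the "it follows links" point concrete, here is a minimal sketch of link-based discovery. It is not how Google's crawler is implemented; the in-memory `PAGES` dict, the `example.test` and `other.test` URLs, and the `discover` helper are all hypothetical stand-ins for real HTTP fetches.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(seed, pages):
    """Breadth-first walk: a URL is found only if some fetched page links to it."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = pages.get(url)  # stands in for an HTTP GET (assumption)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy web: the PDF exists but nothing on example.test links to it.
PAGES = {
    "https://example.test/": '<a href="/about">About</a>',
    "https://example.test/about": '<a href="/">Home</a>',
    "https://example.test/files/secret-report.pdf": "",
    "https://other.test/readme": '<a href="https://example.test/files/secret-report.pdf">report</a>',
}
```

Starting from `https://example.test/`, the walk never reaches the PDF. Starting from `https://other.test/readme`, it does, because a third-party page links to it.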


Exactly, that's why "non-public" GitHub gists work. They are public, but not indexed anywhere "by default".


Good thing, otherwise they would have exposed countless photos via Google Photos.

Today, a photo file might be hosted at:

  photos.fife.usercontent.google.com/pw/[snip]=w[####]-h[####]-s-no-gm?authuser=0

But it used to be a little closer to:

  ...[google_site].com/[superLongAlphanumeric].jpg

And no auth required, URL only!


> Therefore, for these files to be indexed by Google, they need to be linked to from somewhere.

So? That’s indeed how Google works.

Google does not work how OP describes it.

I’ve investigated similar incidents in the past on other platforms; it was always user error causing links to be public.


Can you actually explain why the phrase you cited from OP is wrong? You say that roughly “files need to be linked to from somewhere” is correct. How is a file linked to from somewhere [on the internet] if it’s not being served on the part of the internet that Google crawls (i.e., HTML)? The only alternative is in… API calls? That Google probably isn’t crawling?

“Fiverr might be hosting public HTML somewhere” seems like an entirely reasonable alternative phrase to “these links must be linked from somewhere [that Google can crawl]”, at least to someone who is only superficially familiar with how search works.

The distinction you imply is obvious is not, and your point is thus rather confusing to someone who is not you.


It’s a huge mistake to assume these links have to originate from Fiverr-hosted HTML; it’s far more likely Google is finding them from places like GitHub repos used by Fiverr users.


That was my first thought, but is it logical to assume that 5+ unrelated people took their finished tax return URL and linked it on a website/tweet/etc? Who would do that?

Even still, Fiverr could very well have GDPR/CCPA/etc. liability as the host of these files, because they relate to its services; it's not just a generic file host.


> Who would do that?

Indian users; at least, that’s what GitHub data suggests.


The only thing that's user error here is the developers of Fiverr exposing files without proper session authentication.


That’s very often a deliberate design decision.

It’s bizarre UX if you link a file to someone and the link doesn’t work.


It's actually very common to link a file hosted in the cloud to a coworker or partner and it requires login.
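The two designs being argued over here can be sketched side by side. This is only an illustration under stated assumptions: the in-memory `FILES` and `SESSIONS` stores and the function names are hypothetical, and nothing here reflects how Fiverr actually serves files.

```python
import secrets

FILES = {}     # token -> (owner, content); hypothetical in-memory store
SESSIONS = {}  # session id -> logged-in user; hypothetical


def upload(owner, content):
    """Store a file under a long random token that becomes part of its URL."""
    token = secrets.token_urlsafe(32)
    FILES[token] = (owner, content)
    return token


def fetch_by_link(token):
    """Link-only model: anyone who knows the URL gets the file."""
    entry = FILES.get(token)
    return entry[1] if entry else None


def fetch_with_session(token, session_id):
    """Session model: the URL alone is not enough; the caller must be the owner."""
    entry = FILES.get(token)
    user = SESSIONS.get(session_id)
    if entry is None or user is None or user != entry[0]:
        return None
    return entry[1]
```

In the link-only model the file is protected solely by the URL staying private, so a URL pasted somewhere crawlable exposes it. In the session model a leaked URL returns nothing without a valid login, at the cost of the sharing friction described above.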


It's exactly how it works; pages don't just magically appear in Google's index.

You need links to pages, either from your own website or as backlinks from other websites. Alternatively, if the page is in your sitemap, Google will typically pick it up, or you can manually submit it for indexing. For important pages you would typically want internal links, backlinks, and a sitemap entry.
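For reference, a sitemap is just an XML list of URLs you want crawled. A minimal sketch of generating one with the standard library follows; the `build_sitemap` helper and the `example.test` URLs are hypothetical, and only the required `<loc>` element of the sitemap format is emitted.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc>...</loc></url> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://example.test/",
    "https://example.test/pricing",
])
```

Anything listed this way is being offered to the crawler explicitly, which is the opposite of a private file URL that is never linked or listed anywhere.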


Google indexes links from places other than Fiverr; odds are these links are mostly from places like GitHub.



