
One of the Flickr engineers, Cal Henderson, wrote a book with a title something like "Building Scalable Web Sites" that was published by O'Reilly. I'm pretty sure he covers that topic. You may be able to get access online via your public library's web site (you can in Seattle, at least).

There are the obvious issues with file uploads: they can take a lot of bandwidth and disk space. But there are a lot of less obvious problems, too.

1. File uploads take a lot longer than most web requests, both because of the size of the data and because most client connections download faster than they upload.

2. As a result, file upload requests hold server resources longer than other requests. This usually comes down to memory, but there can also be file handle and socket limits. Also, more in the past than in the present day, just the CPU overhead from dealing with lots of open sockets could get to be an issue.

3. File uploads often carry a lot of memory overhead. The brain-dead simple way of handling file uploads in PHP, etc. ends up buffering the whole file in memory until the upload is complete. That can really add up. Furthermore, the process handling the upload request has the memory overhead of the PHP (or Ruby, or Python...) interpreter, and any code and libraries associated with your application. This overhead is carried even though most of that code and those data structures are unnecessary for most of the request duration.

This memory usage really stacks up when each upload request lives for seconds or minutes, rather than the milliseconds required for most requests.
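To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It uses Little's law (average requests in flight = arrival rate x time per request). The request rate, durations, and 50 MB per-process footprint are made-up figures for illustration, not measurements from any real site.

```python
def concurrent_requests(arrival_rate_per_s, duration_s):
    """Little's law: average number of requests in flight at once."""
    return arrival_rate_per_s * duration_s


def memory_in_use_mb(arrival_rate_per_s, duration_s, per_request_mb):
    """Memory held if every in-flight request ties up one process."""
    return concurrent_requests(arrival_rate_per_s, duration_s) * per_request_mb


# Hypothetical numbers: 10 requests/s, 50 MB per interpreter process.
page_mb = memory_in_use_mb(10, 0.05, 50)   # ordinary page, ~50 ms each
upload_mb = memory_in_use_mb(10, 60, 50)   # upload, ~60 s each

print(f"pages:   ~{page_mb:.0f} MB")
print(f"uploads: ~{upload_mb:.0f} MB")
```

With those assumed figures the same request rate goes from tens of megabytes to tens of gigabytes, purely because each request lives about a thousand times longer.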

There are lots of ways to deal with the resource issues. Writing the upload to disk as it arrives is a big improvement. Having a separate app/server instance for uploads, tuned to minimize the size of each app/interpreter instance, is another.
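A minimal sketch of the "write it to disk as it arrives" idea, in Python. The function name `save_upload_streaming` and the 64 KB chunk size are my own choices for illustration, and `stream` is assumed to be any file-like object exposing the request body (how you get one depends on your framework). The point is only that memory use stays around one chunk no matter how big the upload is.

```python
import io
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your setup


def save_upload_streaming(stream, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a file-like upload body to dest_path one chunk at a time.

    Only one chunk is held in memory at once. Returns bytes written.
    """
    total = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty read means the body is finished
                break
            out.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    # io.BytesIO stands in for a real request body here.
    fake_body = io.BytesIO(b"x" * (200 * 1024 + 1))
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(save_upload_streaming(fake_body, path), "bytes written")
    finally:
        os.remove(path)
```

The standard library's `shutil.copyfileobj` does essentially the same loop, so in practice you may not need to write it yourself.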

There are also ways to take advantage of file upload features built into a front-end web server (like nginx) to buffer the whole upload to disk before your app has to get involved. Not to mention the Amazon examples already mentioned.

Turning to a specialized custom file upload server written in Java or C seems like an optimization you undertake only if you outgrow the other solutions (including more memory per server, or more servers).

Imple


