Code please.

kruhft · on April 8, 2011

Here is python code that I first used to solved to problem in the time allowed:

    #!/usr/bin/env python

    import sys
    import math

    count = int(sys.stdin.readline())
    for i in xrange(0, count):
        n = int(sys.stdin.readline())
        ndiv5 = math.floor(n / 5.0)
        if n >= 25:
            ndiv5 += math.floor(n / 25.0)
        if n >= 125:
	    ndiv5 += math.floor(n / 125.0)
        if n >= 625:
    	    ndiv5 += math.floor(n / 625.0)
        if n >= 3125:
    	    ndiv5 += math.floor(n / 3125.0)
        if n >= 15625:
    	    ndiv5 += math.floor(n / 15625.0)
        if n >= 78125:
    	    ndiv5 += math.floor(n / 78125.0)
        if n >= 390625:
    	    ndiv5 += math.floor(n / 390625.0)
        if n >= 1953125:
    	    ndiv5 += math.floor(n / 1953125.0)
        if n >= 9765625:
    	    ndiv5 += math.floor(n / 9765625.0)
        if n >= 48828125:
    	    ndiv5 += math.floor(n / 48828125.0)
        if n >= 244140625:
    	    ndiv5 += math.floor(n / 244140625.0)
        print int(ndiv5)

And here is my optimized Common Lisp version:

    (declaim (optimize (speed 3) (space 3) (safety 0)))
    (declaim (inline floor))
    
    (defun calc-trailing-zeros (n)
      (declare (type (integer 0 1000000000) n))
      (let ((ndiv5 (floor n 5)))
        (declare (fixnum ndiv5))
        (if (>= n 25)
    	    (setq ndiv5 (+ ndiv5 (floor n 25)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 125)
    	    (setq ndiv5 (+ ndiv5 (floor n 125)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 625)
    	    (setq ndiv5 (+ ndiv5 (floor n 625)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 3125)
    	    (setq ndiv5 (+ ndiv5 (floor n 3125)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 15625)
    	    (setq ndiv5 (+ ndiv5 (floor n 15625)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 78125)
    	    (setq ndiv5 (+ ndiv5 (floor n 78125)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 390625)
    	    (setq ndiv5 (+ ndiv5 (floor n 390625)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 1953125)
    	    (setq ndiv5 (+ ndiv5 (floor n 1953125)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 9765625)
    	    (setq ndiv5 (+ ndiv5 (floor n 9765625)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 48828125)
    	    (setq ndiv5 (+ ndiv5 (floor n 48828125)))
    	    (return-from calc-trailing-zeros ndiv5))
        (if (>= n 244140625)
    	    (setq ndiv5 (+ ndiv5 (floor n 244140625)))
    	    (return-from calc-trailing-zeros ndiv5))))
    
    (let ((count (parse-integer (read-line))))
      (declare (fixnum count))
      (dotimes (n count)
        (declare (fixnum n))
        (let ((factn (parse-integer (read-line))))
          (princ (calc-trailing-zeros factn)) (terpri))))

Notice the Python code doesn't even have the early returns that the Lisp code does. The Python code comes in at 5.42 seconds; the optimized Lisp version's best time was 3.39 seconds. I know the algorithm to solve the problem is probably not the most elegant, but it does get accepted. Any ideas on how to optimize the Lisp code further would be appreciated.

rayiner · on April 8, 2011

I don't know what you're expecting. Your algorithm is basically:

1) Read a line from a stream, parse it into an integer. 2) Do a little math. 3) Convert an integer into text and print it to a stream.

In Python, (1) and (3) are implemented in C. For Common Lisp, (1) and (3) are mostly implemented in Lisp.

Beyond that, (1) and (3) will completely dominate the runtime. (2) is probably just a couple of dozen CPU cycles, but the print and terpri will involve flushing stdout, which will involve a system call which will be thousands to tens of thousands of CPU cycles. Depending on how the stream is buffered, the read-line will likely involve a system call as well.

Basically in this test is just measuring how quickly the program can sit and wait for read()/write() system calls to complete. Since they're both calling into the same kernel, it's unsurprising they're the same speed.

ced · on April 8, 2011

Hey, I was curious about your claims, so I ran both versions on my computer, and SBCL came out 10X faster. For one million calls (count=1000000) and n fixed to 100000, SBCL took 0.6 seconds versus 8.6 seconds.

One thing I did, though, was to call both functions directly with arguments, and not rely on read-line/readline. That might be the difference, I wouldn't be surprised if Python was optimized for text processing, and SBCL was lazier about it (Lisp programs call read et al. infrequently). Your code is decently optimized, IMO. Cheers.

kruhft · on April 8, 2011

Thanks for checking. SPOJ uses SBCL 1.0.18 so that might be a bit of an issue, since it seems to be a couple of years old. I was thinking that the input routines might have been a bit of a problem and that CL isn't optmized as much for that type of processing. Maybe something to look into in SBCL code base for improvement.

waterhouse · on April 8, 2011

Mmm, this looks like "number of trailing zeroes in n! when written in base 10", or, equivalently, "number of factors of 5 in n!". If we let that be f(n), and u = floor(n/5), then f(n) = u + f(u). Then, instead of writing out all the powers of 5 up to whatever your integer limit is... if you think for a little bit, this will work:

  (defun calc-trailing-zeros-2 (n)
    (do ((n (floor n 5) (floor n 5))
         (total 0 (+ total n)))
        ((zerop n) total)))

Methinks this is much better; if there is a performance disadvantage, I wouldn't care about it (and you can add type-decs as you wish; if you do, note that you'd probably have to say (the fixnum (+ a b)) even if a and b are known fixnums). If you want to verify that it works...

  (dotimes (i 30) (print (= (calc-trailing-zeros i) (calc-trailing-zeros-2 i))))

It prints a bunch of T.

As for the performance comparison, I'd echo what others have said: I expect the I/O (read-line, princ, terpri) to dominate the run time. ... Now for some experimentation. "randomthing" is an SBCL image in my executable "mybin" directory that executes the "let ..." code at the bottom; achieved with (save-lisp-and-die "randomthing" :executable t :toplevel (lambda () (let ...))). Let's test it:

  ;Clipboard is "100000" followed by 100000 instances of "500000"
  $ time pbpaste | randomthing > /dev/null

  real	0m0.453s
  user	0m0.352s
  sys	0m0.096s

  ;Clipboard is "100000" followed by 100000 instances of "000000"
  $ time pbpaste | randomthing > /dev/null

  real	0m0.352s
  user	0m0.263s
  sys	0m0.091s

In the case of "000000", the "calc-trailing-zeros" part should take about zero time, and therefore the remaining time is all the I/O stuff. Depending on whether we go by the "real" or "user" thing, this tells us that the math part takes up either 2/7 or 2/9 of the time of the whole program.

And just to be sure that "pbpaste" isn't the main cause of slowness:

  $ time pbpaste > /dev/null

  real	0m0.024s
  user	0m0.012s
  sys	0m0.009s

Fairly insignificant. Likewise, startup time for this Lisp image can be experimentally determined by having my clipboard only contain "10" followed by 10 integers:

  $ time pbpaste | randomthing > /dev/null

  real	0m0.032s
  user	0m0.010s
  sys	0m0.027s

So, yeah, I/O dominates the run time, at least on my SBCL (SBCL 1.0.47).

kruhft · on April 7, 2011

I'll try and post the Python and CL code tonite.