Edge cases: overflow

Spread the love

Consider this (rather suboptimal) prime factorization algorithm:

    public static class MyMath
    {
        public static IEnumerable<KeyValuePair<int, int>> Factor(int n)
        {
            if (n < 2)
            {
                yield break;
            }

            // Handle factors of 2 first
            int p = 0;
            while ((n & 1) == 0)
            {
                n /= 2;
                ++p;
            }

            if (p != 0)
            {
                yield return new KeyValuePair<int, int>(2, p);
            }

            // Now check every odd number
            for (int f = 3; f <= n; f += 2)
            {
                p = 0;
                int r = 0;
                while (r == 0)
                {
                    int m = Math.DivRem(n, f, out r);
                    if (r == 0)
                    {
                        n = m;
                        ++p;
                    }
                }

                if (p != 0)
                {
                    yield return new KeyValuePair<int, int>(f, p);
                }
            }
        }
    }

You can run it like so and see that it returns the proper factorization for a variety of Int32 values:

            Console.WriteLine(string.Join(';', MyMath.Factor(1)));
            // Output: <empty string>

            Console.WriteLine(string.Join(';', MyMath.Factor(1024)));
            // Output: [2, 10]

            Console.WriteLine(string.Join(';', MyMath.Factor(555555)));
            // Output: [3, 1];[5, 1];[7, 1];[11, 1];[13, 1];[37, 1]

            Console.WriteLine(string.Join(';', MyMath.Factor(2048002048)));
            // Output: [2, 11];[101, 1];[9901, 1]

In fact, it works properly for any possible Int32 — except one very critical value. Can you spot the defect?

.

.

.

.

.

.

The title of the post may have given it away, but the problem is in the for loop boundary. If n == int.MaxValue, the check f <= n will always succeed. Because of integer overflow and the lucky coincidence that 231 - 1 (AKA int.MaxValue) is prime, f will be incremented beyond this value and wrap around to the negatives -- specifically, -231 + 1 since we increment by 2. From there, the loop will keep running, continuing to wrap around forever and ever.

I had originally written this code to benchmark some simple factoring algorithms. While I was prescient enough to write a unit test for this particular case, I was still initially confused and surprised by its exceedingly long (in fact, infinite) running time.

There are several possible fixes for this problem. Perhaps the best one here is to optimize the algorithm and change the upper bound to the square root of the number. (For example, when using trial division to factor the number 1,000,003, it is not necessary to check any factor larger than 1,000 since the product 1001 x 1001 will exceed the original number.) While that happens to work in this particular algorithm, there may not be an easy way to avoid the MaxValue comparison in other situations.

Another quick fix which works in most general cases is to use long (Int64) for the loop counter f instead of int. This pushes the overflow boundary far higher, at the cost of a few well-placed cast operations. If you are running this code on an x64 processor, it is highly likely that the JIT compiler is already using 64-bit registers anyway so there should be no major performance impact.

Moral of the story: boundary values are still important; try passing int.MaxValue in your unit tests.

Leave a Reply

Your email address will not be published. Required fields are marked *