Skip to content

Conversation

@abadams
Copy link
Member

@abadams abadams commented Nov 22, 2021

fast_integer_divide did two lookups, one for a multiplier, and one for a
shift. It turns out you can just use count leading zeros to compute a
workable shift instead of having to do a lookup. This PR speeds up use
of fast_integer_divide in cases where the denominator varies across
vector lanes by ~70% or so by avoiding one of the two expensive gathers.

fast_integer_divide did two lookups, one for a multiplier, and one for a
shift. It turns out you can just use count leading zeros to compute a
workable shift instead of having to do a lookup. This PR speeds up use
of fast_integer_divide in cases where the denominator varies across
vector lanes by ~70% or so by avoiding one of the two expensive gathers.
@abadams abadams requested a review from shoaibkamil November 22, 2021 23:54
std::lock_guard<std::mutex> lock_guard(initialize_lock);
{
static Buffer<uint8_t> im(256, 2);
static Buffer<uint8_t> im(256);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: since C++11 onwards default to promising thread-safe initialization of statics, we could rewrite this to be a bit terser like so:

Buffer<uint8_t> integer_divide_table_u8() {
    static Buffer<uint8_t> im(256) = []() {
        Buffer<uint8_t> im(256);
        for (uint32_t i = 0; i < 256; i++) {
            im(i) = table_runtime_u8[i][2];
            if (i > 1) {
                internal_assert(table_runtime_u8[i][3] == shift_for_denominator(i));
            }
        }
        return im;
    }();

    return im;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will fix.

@abadams
Copy link
Member Author

abadams commented Nov 23, 2021

No, that's compile-time code.

@abadams abadams merged commit 8b68f85 into master Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants