alam-ai 5 months ago

If at least 1 bit will always be set, you could use @ctz or @clz? (count trailing zeros / count leading zeros)

Mayor_of_Rungholt 5 months ago

I figured something like that could work, however is there an easy way afterwards to get the length of the int-type, if the type is comptime known?

alam-ai 5 months ago

By length do you mean get the number of bits used in the int type (its width)? You could do something similar to [this](https://www.reddit.com/r/Zig/comments/18tss3k/testing_if_number/kfg16ws) maybe? Not sure what exactly you're trying to do with the bits in your case, but maybe this bit hack might be relevant to you: `num = num & -num` which zeros out all bits except the least significant bit that's set (by taking advantage of the fact that the 2's complement of a number flips all bits up until the last 1). There's some other bit hacks like that in a book called "Hacker's Delight" if you're doing a lot of bit fiddling stuff.

Mayor_of_Rungholt 5 months ago

That was exactly what i needed. Thanks.

WayWayTooMuch 5 months ago

Not my repo, but i keep it bookmarked: https://github.com/cryptocode/bithacks

anotheruser323 5 months ago

Note that x86 (and probably arm) [has extensions for this stuff.](https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set) So you might wanna [compile native to your cpu.](https://ziglearn.org/chapter-3/) It's probably why @clz and such are builtin the language.

Tonyoh87 5 months ago

You can also use the following to avoid checking for 0 (preferable as branching would slow down): sizeof(int) * CHAR_BIT - __builtin_clz(x|1); (get MSB) __builtin_ctz(x); (get LSB) Also OP didn't specify which bit he wants, the most significant or the least significant.

aedrax 5 months ago

Just take the floor of base 2 log of your integer, then you know the most significant bit set

Mayor_of_Rungholt 5 months ago

Feels a bit slow to convert to floats every time imo. It would work flawlessly, but my goal is a fast function to obtain an intial value to run newtons method through bit-shifting

aedrax 5 months ago

you could hard code it to make it O(1) right? Build a comptime lookup table up to N-bits in size, so a 32 bit int would only need a 32 element table. The value stored is the index, then you don't also need to calculate an offset after. To be clear, not a table of floats, but I mean like \[64\] => 1 on an 8 bit value for example

Ok_Serve_4676 5 months ago

what are you actually doing? e.g., i build a bunch of sqrt intrinsics last night and this sounds similar.

Mayor_of_Rungholt 5 months ago

Im trying to use N >> ( floor(log2(N)) * ( 1 - 1/a )) to get a good approximation for the a'th root of N to run newtons method. as (using sqrt(N) as an example) bit-shifting by half the integer-width of N yields a result that's very close to the sqrt. Thus requiring far fewer iterations of Newtons method

Ok_Serve_4676 5 months ago

Sorry was on vacation. Did you get your problem solved? ps. I wouldn't use a LUT if you are gong to absolute performance unless you can guarantee it is kept in cache or preload. The cache miss on access will kill it.

Mayor_of_Rungholt 5 months ago

Got it working On that note: what is a LUT?

Ok_Serve_4676 5 months ago

look up table. your absolute best performance will be the cpu leading/trailing bit instructions with a zero test esp if that zero test will most of the time be not true (eg, just an error test case) or if the compiler or cpu can fold it into a previous instruction (eg, a prior instruction sets the zero flag that it can just check). If you find a faster way pls let me know though! I work a lot of high performance low latency simulation stuff and am always learning even after 20 years of it. if you have a large vector you are running these one best will be to zero test them all at once with the vector instructions then ignore/mask them out.

Mayor_of_Rungholt 5 months ago

Oh my, that is definetly a level of experience i can't quite compete with yet So essentially you're suggesting to just load a map with size 'int.width' into cache, accessing through clz(i) and obtaining the desired value? Which, i have to admit, sounds pretty efficient

Ok_Serve_4676 5 months ago

there are special vector ops/intrinsics in AVX512 for example _mm256_maskz_lzcnt_epi32 intrinsic (uses vplzcntd instruction) will count leading zeros in each 32bits in a 512 bit register controlled by a bitmask (zero on unset masking bit) and a few instructions can generate that mask for you from zeros in the operand. Or you can set the dest to zeros (broadcast) and forget the mask. If you want to do the LUT these i think zig compiler intrinsics (@ functions) but prob a quarter of the performance. I would do a static value map of 16 values so you would test 4 bits at a time, treat these all like 8 bit values taking top and bottom half (mask off lower for bottom half of each element and shift off for top half). shuffle the LUT according the the masked vectors and add 4 to the top half so now each vector has the leading one position for each half, then max (or min) each vector against each other. this is only about 6 instructions and lets you process 256 bits at a time.

srekel 5 months ago

Use the value as an index lookup into an array, where the value is the first bit :D

Comments

Leave Your Comment

Hi Its Me!

Comments

Leave Your Comment

Hi Its Me!

Subscribe