Pinned post

Finished my 6-second game trailer for my Asteroids (1979)-clone in virtual reality:

youtu.be/tGoc2T6ABUc

Boring Math (11.0)

I am probably gonna also just optimize around the small cases.

Don't actually use a bifurfactor (implementation-wise) until the size gets larger than a few kilobytes.

Use a "current chunk" mechanism where you are actually reading e.g. 64-256 or 4096 byte chunks at a time. Only need to do the expensive work of finding the next chunk infrequently (amortized analysis).

Boring Math (11.0)

Oh, right, another important aspect is using every item in a cache line before switching to a new strategy. E.g. if you were really unlucky you might have the following scenario:

"x- ::= X" access

"-" has 1 cache block
"x" has 3 cache blocks in the final 67%
"X" has 3 cache blocks in the final 67%
If the L1 cache can only store two lower level identical address bits, then each time a data member is read it will evict itself before reading all of the data it needs... a lot.

Boring Math (11.0)

It's worth noting the linear aspect is intentionally on the left because data access near array is more likely.

Eventually two bifurcators for read and write in "x- ::= X" will merge at the leftmost side, which would not happen as a guarantee for the right variant to my knowledge.

Additionally, the more useful aspect is that data will usually be accessed starting from the smallest chunk size, so statistically you should assume the rightmost bits are the least valuable.

Boring Math (11.0)

I think I have one good idea. Make the "bifurcator" non- symmetric but still satisfy the requirements of starting from the center and accessing the leftmost, center, and rightmost data on the final step.

This means I will likely modify the code's data access to look like:

23 21 19 17 15 13 11 9 7 5 3 1 24 2 20 4 16 6 12 8 10 14 18 22

Which is a data access pattern that is somewhat simple.

Boring Math (11.0)

I am somewhat concerned about efficiency of the algorithm:

1) I was mistaken about only needing 1 copy at a time because e.g.: "x- ::= X" requires space to copy to.
2) caching can't have multiple addresses with the same lowest level bits set (cache associativity?), which probably is the biggest threat to performance
3) cache lines are no longer 80% used, they should be below 50% utilization.

Boring Math (11.0)

...

class bifurcator:
def __iter__(itera):
return
def __next__(self):
return

a6 = splicer(array, 1, False)
b6 = splicer(array, 1, True)
a12 = splicer(array, 2, False)
b12 = splicer(array, 2, True)

for iterator in [a6, b6, a12, b12]:
for value in iterator:
print(value)
print()
print()
a_dash_sort(mutable_input):

Boring Math (11.0)

exponent = 0
chunk_size = 6 * 2**exponent

array = range(6*16)

class splicer:
def __init__(self, array, size, first):
self.array = array
self.size = size*chunk_size
self.first = first

def __iter__(self):
self.index = self.size*2 + (1 if self.first else 0)
self.end = self.index + 2*self.size
return self

def __next__(self):
x = self.index
if x >= self.end:
raise StopIteration
self.index += 2
return self.array[x]

...

Boring Math (11.0)

Fuck it, I'm calling it A- or A-Dash, despite possible confusion with A* or A-Star (graph theory).

The A actually can reference the pathing in a way X cannot.

Boring Math (11.0)

This weekend I will try to create a bitmap of

plot Boole[pi function(n)/pi function(d) = 2]

Boring Math (11.0)

pi function(n)/pi function(d) = 2

Has an interesting "graph" (artifacts):

Boring Math (11.0)

Primes used are:

2,3,5,7,11,13

&

19,47,53

For

Table[pi function(prime(n)*2.5)/n], n=1 to 30

Boring Math (11.0)

pi function(62)/pi function(25) = 2

stackoverflow.com/questions/21

Table[pi function(prime(n)*2.5)/n], n=1 to 30

Which has a "center" around 15 (I think).

Boring Math (11.0)

This reminds me of Tokuda's Shell Sort gap sequence on two levels now. The original 80% yield for a hexagon and the similar (5/28)/(1/7) = 5/4 = 9/4 - 1

Boring Math (11.0)

2a + 4b = 1, ((2a + 4b/2)*3 + 2)/5 = 1 - 1/(15/6)

-->

a=-1/6
b=+2/6

Where notably negative numbers are used.

Boring Math (11.0)

2a + 4b = 1, ((2a + 4b/2)*3 + 2)/5 = .8

-->

a=b=1/6

...

2a + 4b = 1, ((2a + 4b/2)*3 + 2)/5 = 1 - 1/(14/3)

-->

a=1/7, b=5/28

Boring Math (11.0)

I think cache performance is 80%.

3 X's
2 -'s

X's have cache utilization 2/3rds.
-'s have cache utilization 100%.

Boring Math (11.0)

I think for the standard (non-negative) version you can do:

"x" = 4, "-" = 4, "X" = 8

I don't think you can get away with non-X (a zipper).

"x" --> "_-"
"X" --> "__--"
"x-" --> "X"

Which means "X-" or "X-Dash" Sort is a pretty good name.

It has a straightforward tag system.

Boring Math (11.0)

But I think you can combine both?

Boring Math (11.0)

I think you can convert this to a (better?) variant without negative numbers?

Boring Math (11.0)

-x- ::= X
x ::= -_-
X ::= --_--

"x"=8
"-"=4
"X"=16

"-x-" --> "X"

"X" --> "--_--"

The only time you do comparisons is with the "-x-" step (the other is merely unpacking/converting.

Also, you don't need multiple arrays in parallel or spliced together or interwoven, which is really convenient.

Show older Mastodon server focused on game development and related topics.