Naive implementations of optimizations suggested by breznak on pull-request n. 769 · Pull Request #776 · htm-community/htm.core

ghost · 2020-02-20T20:16:45Z

With regards to the optimizations suggested by breznak toward the end of pull-request n. 769 ( #769 (comment) and subsequent posts), I've added naive implementations of the caching of numNeigbors (in wrapping neighborhoods) and of the caching of all the neighbors of each column (in wrapping neighborhoods).

My results are as follows: NOTE: I'm still only using dynamic_hotgym as a benchmark, and haven't looked at mnist_sp yet

Caching of numNeighbors resulted in a a drop of ~6.5% in total time spent in local inhibition (as compared to the original run time);
Caching of all neighbors of columns, when applied only to inhibitColumnsLocal_(), resulted in a drop of ~14% in total time spent in local inhibition (as compared to the original run time);
(*)Caching of all neighbors of columns, when applied to updateBoostFactorsLocal_(), resulted in a drop of ~55% in total time spent in local inhibition (as compared to the original run time).

(*) It should be noted that in dynamic_hotgym, updateBoostFactorsLocal_() could be skipped completely, as boostFactor_ is always = 0. However, I don't believe this has any bearing on the results.

I forgot to keep track of memory usage, but I did some back of hand calculations and figured it shouldn't be so bad if you have many GB of RAM and don't go over 10k columns. Maybe this speed/memory trade-off could be added as an option, instead of as mandatory usage?

Thanks again breznak.

For #123

test update

breznak · 2020-02-20T21:18:31Z

(*)Caching of all neighbors of columns, when applied to updateBoostFactorsLocal_(), resulted in a drop of ~55% in total time spent in local inhibition

wow! 6, 14 and 55% increase are very encouraging numbers!

in total time spent in local inhibition (as compared to the original run time)

does it translate well into the real speed? I mean, the local inh function is now faster, but other computations can be slower. Is the real speed (reported by the benchmark, eg) also better?

And something is crashing, I'll have a look if I find a bug..

breznak

This looks promising, let's cleanup the errors and then we can make the implementation nicer and maybe even faster!

src/htm/algorithms/SpatialPooler.cpp

breznak · 2020-02-20T21:34:52Z

src/htm/algorithms/SpatialPooler.cpp

+	neighborMap_.clear();
+	neighborMap_.reserve(wrapAroundNeighbors_ * numColumns_);
+	for(UInt column=0; column<numColumns_;column++) {
+		for(auto neighbor: WrappingNeighborhood(column, inhibitionRadius_,columnDimensions_)) { 


my idea was to spend a really lot of time, mem in init, and cache this for all of columns x inhibition_range where cols typically 1-4k, inh radius (I'll have to see if it can be fixed, or) varies in min(columns dim)/2. I think we can stuff that into RAM

Are you saying that there are more things that should be cached? What did I miss?

What did I miss?

yes, this depends on inhibitionRadius_ that's why you need to call the method each time it changes in https://github.com/htm-community/htm.core/pull/776/files/95b8dfee87a8caab33955e1f691ca4d6a89cd24a#diff-75db947f594e8a477099a6c8bb86daa0R484

the extreme idea is:

//in constructor, pseudocode: for inhRadius in <all ranges of inh radius>: for col in all-columns: for auto neighbor in WrappingNeighborhood(col, inhR, columnDimensions): this->neighborMap[inhR][col].push_back(neighbor);

How often does the inhibition radius change in real applications? I ran the "perf" tool against dynamic_hotgym with the commits in this PR and now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time (plus or minus the margin of error of perf). Among the spatialPooler functions, the big villains are still updateBoostFactorsLocal_() (27% of total runtime) and inhibitColumnsLocal_() (13% of total runtime), but this has more to do with the amount of calculations they have to perform, I think.

Also, how do you figure memory requirements would scale with regards to amount of columns and disposition of columns in dimensions?

Edit: I mean, because in wrapAroundNeighborhoods, there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult (I think? maybe I'm wrong there).

How often does the inhibition radius change in real applications?

every isUpdateRound_ which defaults to 500, so 1/500th of the time. We've almost never changed that default.

now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time

ok, I agree this is good enough and not worth complicating things. Leave as is.

the big villains are still updateBoostFactorsLocal_() (27% of total runtime)

this is where the cached Neighborhood should help, all the function does is just looping through the neighborhood and increasing counter to get numNeighbors, I'd like to get it in O(1)

there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult

I'd suggest not to worry about wrapAround=False case (unless you can do it easily too). If these patches work well, I'll suggest keeping only the WrappingNeighborhood (unless there are proven use-cases for the non-wrapping).

breznak · 2020-02-20T21:37:27Z

src/htm/algorithms/SpatialPooler.cpp

+
+        const UInt numActive_wrap = (UInt)(0.5f + (density * (numNeighbors + 1)));
+
+	const UInt mapOffset = column * (numNeighbors+1);


I'd like to shield off this logic of indexing in the map (of cached values) from the SP. Ideally later we extend Neighbourhood() to have param preCache=true and do all of this internally.

So it should be done in such a way that we'd still use the same (auto neighbor: WrappingNeighborhood) logic, is that it?

I think we'd like the data const vector<SynapseIdx>[][]& mapNeighbors accessible from WrappingNeighborhood and filled in constructor there.
Used as

vector& neighbors = map[radius][col];

I'll try. This would probably the hardest change to make, I think(?).

yes, but I think it can be done quite nicely. (I haven't looked into the details, but this would be my high-level idea:)

add a new class CachedWrappingNeighborhood:

CachedWrappingNeighborhood(Neighborhood): constructor(rangeCenter, rangeRadius, dimColumns) { //this takes a lot of time, once for r in rangeRadius { for col in rangeCenter { hood = WrappingNeighborhood(col, r, colDimensions); vector locals={}; for( neighbor : hood) { locals.append(neighbor); } this->cachedHood[r][col] = locals; }} }

breznak · 2020-02-20T21:42:02Z

Some tests are failing,

https://github.com/htm-community/htm.core/pull/776/checks?check_run_id=458711654#step:5:246

if you run ./build/Release/bin/unit_tests you'll get a complete test suite we use

ghost · 2020-02-20T22:37:37Z

Some tests are failing,

I saw that. This last commit should fix it.

ghost · 2020-02-20T23:01:00Z

does it translate well into the real speed? I mean, the local inh function is now faster, but other computations can be slower. Is the real speed (reported by the benchmark, eg) also better?

It does translate into real speed. Real speed is much improved (at least on my machine).

ghost · 2020-02-20T23:03:47Z

This last commit should fix it.

It didn't entirely fix things. It seems the C++ tests are passing, but the python tests fail at some point (in Pickle)? I may have a hard time figuring this one out, as I've never even looked at the python bindings. I'll post an update if I find out anything.

breznak · 2020-02-20T23:12:11Z

It didn't entirely fix things. It seems the C++ tests are passing, but the python tests fail at some point (in Pickle)? I may have a hard time figuring this one out, as I've never even looked at the python bindings. I'll post an update if I find out anything.

I'll have a look later, or we can as @dkeeney ?, he's been digging into the serializations.

EDIT: yes, it'll likely be serialization, you've added new member variables. But before focusing on fixing that, if other C++ tests pass, let's make this code clean and look at things like
#776 (comment)

dkeeney · 2020-02-21T01:11:32Z

I'll have a look later, or we can as @dkeeney ?, he's been digging into the serializations.

I am deep into the REST interface PR at the moment but if you cannot find the problem with serialization I can take a look at it later.

breznak · 2020-02-21T01:13:51Z

I am deep into the REST interface PR at the moment but if you cannot find the problem with serialization I can take a look at it later.

ok, thanks! Not urgent right now, I want to figure the implementation where the code would actually end up. So we'd just ignore failing serialization tests for now.

breznak · 2020-02-26T08:23:51Z

@dkeeney it's a bit strange but the user completely disappeared, I like the changes so-far but we cannot expect any requested fixes or answers.

I'll try to finish review and do changes for the code that is here
not sure if I have time to implement the biggest proposed change- cached Neighborhoods.

dkeeney · 2020-02-26T13:31:37Z

I am getting close to the having the first draft of the REST interface ready. Maybe a day or two more. Then I can look at the serialization.

breznak · 2020-06-03T21:36:31Z

Closing this branch, follow up in -> #783

kineyev and others added 2 commits February 20, 2020 16:44

Merge pull request #1 from htm-community/master

b9f8746

test update

Naive implementations of optimizations suggested by breznak

95b8dfe

ghost requested a review from breznak February 20, 2020 20:16

breznak added optimization SP labels Feb 20, 2020

breznak reviewed Feb 20, 2020

View reviewed changes

Fix a couple of mistakes.

8d580a7

breznak assigned ghost Feb 20, 2020

Even more performance improvements.

22a720f

breznak self-assigned this Feb 26, 2020

breznak mentioned this pull request Mar 17, 2020

SP local inhibition optimization #783

Merged

3 tasks

breznak closed this Jun 3, 2020


		const UInt numActive_wrap = (UInt)(0.5f + (density * (numNeighbors + 1)));

		const UInt mapOffset = column * (numNeighbors+1);

Conversation

ghost commented Feb 20, 2020 • edited by breznak Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

breznak commented Feb 20, 2020

Uh oh!

breznak left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ghost Feb 21, 2020 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

breznak commented Feb 20, 2020

Uh oh!

ghost commented Feb 20, 2020

Uh oh!

ghost commented Feb 20, 2020

Uh oh!

ghost commented Feb 20, 2020

Uh oh!

breznak commented Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dkeeney commented Feb 21, 2020

Uh oh!

breznak commented Feb 21, 2020

Uh oh!

breznak commented Feb 26, 2020

Uh oh!

dkeeney commented Feb 26, 2020

Uh oh!

breznak commented Jun 3, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ghost commented Feb 20, 2020 •

edited by breznak

Loading

ghost Feb 21, 2020 •

edited by ghost

Loading

breznak commented Feb 20, 2020 •

edited

Loading