Naive implementations of optimizations suggested by breznak on pull-request n. 769 (#776)
Conversation
wow! 6, 14 and 55% increases are very encouraging numbers!
Do they translate into real speed? I mean, the local inhibition function is now faster, but other computations could be slower. Is the real speed (reported by the benchmark, e.g.) also better? And something is crashing; I'll have a look and see if I can find the bug.
breznak left a comment:
This looks promising! Let's clean up the errors, and then we can make the implementation nicer and maybe even faster.
```cpp
neighborMap_.clear();
neighborMap_.reserve(wrapAroundNeighbors_ * numColumns_);
for (UInt column = 0; column < numColumns_; column++) {
  for (auto neighbor : WrappingNeighborhood(column, inhibitionRadius_, columnDimensions_)) {
```
my idea was to spend a lot of time and memory in init, and cache this for all of columns x inhibition_range, where the column count is typically 1-4k and the inhibition radius (I'll have to see if it can be fixed, or) varies up to min(column dims)/2. I think we can fit that into RAM.
Are you saying that there are more things that should be cached? What did I miss?
> What did I miss?
yes, this depends on inhibitionRadius_; that's why you need to call the method each time it changes, in https://github.com/htm-community/htm.core/pull/776/files/95b8dfee87a8caab33955e1f691ca4d6a89cd24a#diff-75db947f594e8a477099a6c8bb86daa0R484
the extreme idea is:
```cpp
// in constructor, pseudocode:
for (inhRadius : <all ranges of inh radius>)
  for (col : all columns)
    for (auto neighbor : WrappingNeighborhood(col, inhRadius, columnDimensions))
      this->neighborMap[inhRadius][col].push_back(neighbor);
```
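The pseudocode above can be sketched as self-contained, compilable C++. Note that `wrappingNeighborhood1D` and `buildNeighborMap` are hypothetical stand-ins (a 1-D simplification of the real N-dimensional `WrappingNeighborhood` and the SP member), used here only to make the idea concrete:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using UInt = std::uint32_t;

// Hypothetical stand-in for the real N-dimensional WrappingNeighborhood:
// a 1-D wrapping neighborhood of `center` with `radius` over `numColumns`.
static std::vector<UInt> wrappingNeighborhood1D(UInt center, UInt radius,
                                                UInt numColumns) {
  std::vector<UInt> out;
  for (int d = -(int)radius; d <= (int)radius; ++d) {
    out.push_back((UInt)(((int)center + d + (int)numColumns) % (int)numColumns));
  }
  return out;
}

// Build the whole cache once: neighborMap[inhRadius][column] -> neighbors.
// This is the "spend a lot of time and memory in init" idea from the comment.
static std::unordered_map<UInt, std::vector<std::vector<UInt>>>
buildNeighborMap(UInt numColumns, UInt maxRadius) {
  std::unordered_map<UInt, std::vector<std::vector<UInt>>> neighborMap;
  for (UInt r = 1; r <= maxRadius; ++r) {
    neighborMap[r].resize(numColumns);
    for (UInt col = 0; col < numColumns; ++col) {
      neighborMap[r][col] = wrappingNeighborhood1D(col, r, numColumns);
    }
  }
  return neighborMap;
}
```

After construction, a lookup is a plain indexed read instead of an iteration over the topology.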
How often does the inhibition radius change in real applications? I ran the "perf" tool against dynamic_hotgym with the commits in this PR and now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time (plus or minus the margin of error of perf). Among the spatialPooler functions, the big villains are still updateBoostFactorsLocal_() (27% of total runtime) and inhibitColumnsLocal_() (13% of total runtime), but this has more to do with the amount of calculations they have to perform, I think.
Also, how do you figure memory requirements would scale with regards to amount of columns and disposition of columns in dimensions?
Edit: I mean, because in wrapAroundNeighborhoods, there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult (I think? maybe I'm wrong there).
> How often does the inhibition radius change in real applications?

every isUpdateRound_, which defaults to 500, so 1/500th of the time. We've almost never changed that default.

> now mapAllNeighbors() (including the functions it calls) accounts for less than 1% of total run-time

ok, I agree this is good enough and not worth complicating things. Leave as is.

> the big villains are still updateBoostFactorsLocal_() (27% of total runtime)

this is where the cached neighborhood should help: all the function does is loop through the neighborhood and increase a counter to get numNeighbors. I'd like to get that in O(1).

> there can be easily a single map that goes from minInhibitionRadius_ to maxInhibitionRadius_, so RAM isn't really a concern, but without wrapAround this becomes more difficult

I'd suggest not worrying about the wrapAround=False case (unless you can do it easily too). If these patches work well, I'll suggest keeping only the WrappingNeighborhood (unless there are proven use-cases for the non-wrapping one).
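For the O(1) count mentioned above, one property of wrap-around topologies helps: every column has the same number of neighbors, so the count only needs recomputing when the radius changes. A minimal sketch, not htm.core code, assuming the conventional 2*radius+1 span per dimension, clamped to the dimension size:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

// In a wrap-around topology every column has the same number of neighbors,
// so the count can be computed once per radius change in O(ndims) instead
// of iterating the neighborhood per column.
// Assumption (mine, not from the PR): the neighborhood spans 2*radius+1
// positions per dimension, clamped to the dimension size.
static UInt wrappingNumNeighbors(UInt radius, const std::vector<UInt> &dims) {
  UInt n = 1;
  for (UInt d : dims)
    n *= std::min(2 * radius + 1, d);
  return n - 1; // exclude the center column itself
}
```

With this, updateBoostFactorsLocal_() would not need to iterate the neighborhood at all just to obtain numNeighbors.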
```cpp
const UInt numActive_wrap = (UInt)(0.5f + (density * (numNeighbors + 1)));
```

```cpp
const UInt mapOffset = column * (numNeighbors + 1);
```
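A small illustrative sketch of the flat layout that `mapOffset` implies; the struct and its names are hypothetical, not part of htm.core. Because every column has the same neighbor count in a wrapping neighborhood, all per-column neighbor lists can live contiguously in a single vector:

```cpp
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

// Hypothetical helper (not htm.core API) illustrating the flat layout
// implied by `mapOffset = column * (numNeighbors + 1)`: one contiguous
// vector holding `stride` cached entries per column.
struct FlatNeighborCache {
  UInt stride;            // numNeighbors + 1 entries per column
  std::vector<UInt> data; // numColumns * stride entries

  // First cached entry for `column`; entries [0, stride) belong to it.
  const UInt *neighborsOf(UInt column) const {
    return data.data() + column * stride;
  }
};
```

The flat layout avoids a vector-of-vectors and keeps each column's neighbors cache-friendly, which is presumably why the PR indexes with an offset rather than nested containers.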
I'd like to shield this map-indexing logic (over the cached values) off from the SP. Ideally we later extend Neighborhood() to take a param preCache=true and do all of this internally.
So it should be done in such a way that we'd still use the same (auto neighbor: WrappingNeighborhood) logic, is that it?
I think we'd like the data `const vector<SynapseIdx>[][]& mapNeighbors` accessible from WrappingNeighborhood and filled in its constructor. Used as:
`vector& neighbors = map[radius][col];`
I'll try. This would probably be the hardest change to make, I think(?).
yes, but I think it can be done quite nicely. (I haven't looked into the details, but this would be my high-level idea:)
- add a new class CachedWrappingNeighborhood:

```cpp
// pseudocode
class CachedWrappingNeighborhood : public Neighborhood {
  CachedWrappingNeighborhood(rangeCenter, rangeRadius, columnDimensions) {
    // this takes a lot of time, once
    for (r : rangeRadius) {
      for (col : rangeCenter) {
        hood = WrappingNeighborhood(col, r, columnDimensions);
        vector locals = {};
        for (neighbor : hood) {
          locals.push_back(neighbor);
        }
        this->cachedHood[r][col] = locals;
      }
    }
  }
};
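A compilable sketch of the read side of this idea; the class name and cache shape are illustrative, not htm.core API. The point is that exposing the cached vector's iterators keeps the existing `for (auto neighbor : ...)` call sites unchanged:

```cpp
#include <cstdint>
#include <vector>

using UInt = std::uint32_t;

// Hypothetical sketch (names are illustrative, not htm.core API): a view
// over the precomputed cachedHood table that is still range-for iterable,
// so call sites keep the familiar `for (auto neighbor : neighborhood)`
// shape while reads come from the cache instead of being recomputed.
class CachedNeighborhoodView {
public:
  CachedNeighborhoodView(
      const std::vector<std::vector<std::vector<UInt>>> &cachedHood,
      UInt radius, UInt column)
      : neighbors_(cachedHood[radius][column]) {}

  // Exposing the cached vector's iterators keeps range-for working.
  std::vector<UInt>::const_iterator begin() const { return neighbors_.begin(); }
  std::vector<UInt>::const_iterator end() const { return neighbors_.end(); }

private:
  const std::vector<UInt> &neighbors_; // borrowed from the cache
};
```

This addresses the earlier question of keeping the same `(auto neighbor : WrappingNeighborhood)` logic: only the construction site changes, not the loops.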
Some tests are failing if you run them.
I saw that. This last commit should fix it.
It does translate into real speed. Real speed is much improved (at least on my machine).
It didn't entirely fix things. It seems the C++ tests are passing, but the python tests fail at some point (in pickle?). I may have a hard time figuring this one out, as I've never even looked at the python bindings. I'll post an update if I find out anything.
I'll have a look later, or we can ask @dkeeney? He's been digging into the serializations. EDIT: yes, it'll likely be serialization; you've added new member variables. But before focusing on fixing that, if the other C++ tests pass, let's make this code clean and look at things like
I am deep into the REST interface PR at the moment, but if you cannot find the problem with serialization I can take a look at it later.
ok, thanks! Not urgent right now; I want to figure out the implementation, i.e. where the code would actually end up. So we'd just ignore the failing serialization tests for now.
@dkeeney it's a bit strange, but the user completely disappeared. I like the changes so far, but we cannot expect any requested fixes or answers.
I am getting close to having the first draft of the REST interface ready. Maybe a day or two more. Then I can look at the serialization.
Closing this branch, follow up in -> #783
With regards to the optimizations suggested by breznak toward the end of pull-request n. 769 ( #769 (comment) and subsequent posts), I've added naive implementations of the caching of numNeighbors (in wrapping neighborhoods) and of the caching of all the neighbors of each column (in wrapping neighborhoods).
My results are as follows (NOTE: I'm still only using dynamic_hotgym as a benchmark, and haven't looked at mnist_sp yet):
(*) It should be noted that in dynamic_hotgym, updateBoostFactorsLocal_() could be skipped completely, as boostFactor_ is always 0. However, I don't believe this has any bearing on the results.
I forgot to keep track of memory usage, but I did some back-of-the-envelope calculations and figured it shouldn't be so bad if you have many GB of RAM and don't go over 10k columns. Maybe this speed/memory trade-off could be added as an option, instead of as mandatory behavior?
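For reference, here is a hedged back-of-the-envelope helper for that memory estimate. Both assumptions are mine, not from the PR: a 1-D wrap-around layout, neighbors stored as 4-byte column indices, and every radius from rMin to rMax cached:

```cpp
#include <cstdint>

using UInt = std::uint32_t;

// Back-of-the-envelope memory estimate for the neighbor cache, under
// illustrative assumptions (mine, not from the PR): 1-D wrap-around
// layout, 4-byte column indices, radii rMin..rMax all cached.
static std::uint64_t cacheBytes1D(UInt numColumns, UInt rMin, UInt rMax) {
  std::uint64_t entries = 0;
  for (UInt r = rMin; r <= rMax; ++r)
    entries += (std::uint64_t)numColumns * (2ull * r + 1); // incl. center
  return entries * sizeof(UInt);
}
```

For example, caching a single radius of 500 over 10,000 columns comes to roughly 40 MB, which is consistent with the "fine below 10k columns" intuition as long as the range of cached radii stays small.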
Thanks again breznak.
For #123