happycoding/CountDistinctSlices.cpp at master · hellocomrade/happycoding · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
#include <cassert>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <iostream>

using namespace std;
/*
https://codility.com/demo/results/trainingGY4XYP-XSU/
https://codility.com/demo/results/training27GSKJ-JTQ/

Observation:
It requires a linear algorithm but allows you to use O(M) space and tells you that no elements in the
sequence is larger than M. This gives you the hint to take either a bitset or a hashmap. We will make
the decision later:

Examine the following sequence:

4, 3, 2, 1, 3, 5, 6, 5

If we put a pointer (base) at the start and use another pointer (iterator) to scan the sequence, we will stop at index 3,
since A[4] == A[1]. The subsequence [4, 3, 2, 1] could generate at most 1+2+3+4=10 unique slices. We could
easily generalize this to n*(n+1)/2, given n is the number of elements in the subsequence.

So, can we move the base pointer from index 0 to index 4? it looks we solved this? Wait a minute, if we
do so, we would miss distinct slices [2,1,3]! So, we actullay should move the base pointer to the position
just next to the duplicated element that we have scaned, which is index 2 in the example. OK, we could
only count the subsequence [4,3] using the formula we figured out. Then we moved the iterator back to index 2,
and continue this scan. However, this will not work! Because we are looking for a linear solution! Doing
so will generate a O(N^2) solution.

OK, it appears we have to try to make sure both the base and the iterator only moving forward, no backward, dude! All right,
no big deal!

Assuming we move the base pointer to index 2, and keep the itertor at index 4, then start to scan and stop at the next
duplicate at index 7. We got a subsequence [2,1,3,5,6]. If we use the formula to count the total again and add
them to the count we got at the first stop, what we miss here? We didn't miss, we double count the slices
for [2,1].

4, 3, 2, 1
2, 1, 3, 5, 6

Well, since we can count the total unique slices for any subsequence in O(1). We could do this:
count distinct slices for [4,3,2,1] first and then substract the duplicates count on [2,1] that will be
introduced in the next counting. This will only take O(1) time. Therefore, we can continue this process
and have a linear solution.

Go back to the O(M) space, how should we use it? bitset or hashtable? bitset may seems preferred at the first
glance, however, we soon realize that, in order to implement the algorithm we described above, we need to
track no only the value of the elements we scanned but also their indexes. This requirement rules out bitset,
and hashtable is the only option! So, base pointer moves to the index of the duplicate plus 1 every time scan
stops.

Another small concern: should we clear the hashtable everytime we scan a duplicate element? Not necessary.
Everytime we move the base pointer to its new location, we create a base line. If hashtable returns a hit,
but the index it returns is smaller than the base pointer's index, we know it was from the last run. We
could safely ignore it and put new key-value in.

OK, we now have a solution!

One thing I overlooked is: we have a possibility for integer overflow here. Since N <= 100000, if we do
N*(N+1)/2, it will be an overflow for integer! So, use long long to cover the rear.
*/
int solutionCountDistinctSlices(int M, const vector<int> &A)
{
	int len = A.size();
	assert(len > 0 && M >= 0 && M < 100001);
	unordered_map<int, int> map;
	unordered_map<int, int>::const_iterator end = map.end();
	unordered_map<int, int>::const_iterator valueItor;
	long long count = 0, i = 0, j = 0, k;
	while (j < len)
	{
		valueItor = map.find(A[j]);
		if (valueItor == end || valueItor->second < i)
		{
			map[A[j]] = j;
			++j;
		}
		else
		{
			count += (j - i)*(j - i + 1) / 2;
			k = j - valueItor->second - 1;
			count -= k*(k + 1) / 2;
			i = valueItor->second + 1;
			if (count >= 1000000000ll)return 1000000000;
		}
	}
	count += (j - i)*(j - i + 1) / 2;
	return (int)std::min(count, 1000000000ll);
}
/*
https://codility.com/demo/results/trainingHP9BHQ-E87/
https://codility.com/demo/results/training8BZMBM-AQ8/

I said the hashtable is the only option, I was wrong. You can actually just use a vector since
M is given up front. It makes the code look cleaner. While, for unordered_map declarations, you could
use keyword "auto" to make it shorter as well if using c++11 or newer.
*/
int solutionCountDistinctSlices1(int M, const vector<int> &A)
{
	int len = A.size(), lasti = 0;
	long long count = 0, i = 0, offset = 0;
	vector<int> memo = vector<int>(M + 1, -1);
	for (i = 0; i < len; ++i)
	{
		if (memo[A[i]] < lasti)
			memo[A[i]] = i;
		else
		{
			offset = i - memo[A[i]] - 1;
			/*
			The actual number of unique elements we count here is (i - 1 - lasti) + 1. i is the index
			of the current element that is a duplicate to the one at memo[A[i]], there comes (i - 1)
			as the index from the last unique elements in this run.
			*/
			count += (i - lasti + 1) * (i - lasti) / 2 - (offset + 1) * offset / 2;
			if (count > 1000000000ll)return 1e9;
			lasti = memo[A[i]] + 1;
			memo[A[i]] = i;
		}
	}
	return std::min(1000000000LL, count + (i - lasti + 1) * (i - lasti) / 2);
}
void testCountDistinctSlices()
{
	cout << "Expect 11: " << solutionCountDistinctSlices(4, vector<int>({ 3, 4, 3, 4, 3, 4 })) << endl;
	cout << "Expect 9: " << solutionCountDistinctSlices(6, vector<int>({ 3, 4, 5, 5, 2 })) << endl;
	cout << "Expect 5: " << solutionCountDistinctSlices(0, vector<int>({ 0, 0, 0, 0, 0 })) << endl;
	cout << "Expect 6: " << solutionCountDistinctSlices(3, vector<int>({ 1, 2, 3 })) << endl;
	cout << "Expect 10: " << solutionCountDistinctSlices(4, vector<int>({ 1, 2, 3, 4 })) << endl;
	cout << "Expect 1: " << solutionCountDistinctSlices(100, vector<int>({ 1 })) << endl;
	cout << "Expect 9: " << solutionCountDistinctSlices(10, vector<int>({ 3, 8, 8, 1, 2 })) << endl;
	cout << "Expect 9: " << solutionCountDistinctSlices(4, vector<int>({ 3, 4, 3, 4, 3 })) << endl;
	cout << "Expect 14: " << solutionCountDistinctSlices(4, vector<int>({ 3, 4, 5, 3, 4, 3 })) << endl;
	cout << "Expect 12: " << solutionCountDistinctSlices(5, vector<int>({ 3, 4, 5, 3, 4 })) << endl;
	cout << "Expect 25: " << solutionCountDistinctSlices(10, vector<int>({ 1, 10, 7, 6, 1, 9, 2 })) << endl;
	cout << "Expect 5: " << solutionCountDistinctSlices(10, vector<int>({ 10, 1, 10 })) << endl;
	cout << "Expect 7: " << solutionCountDistinctSlices(10, vector<int>({ 1, 10, 1, 10 })) << endl;
	cout << "Expect 23: " << solutionCountDistinctSlices(10, vector<int>({ 1, 10, 7, 6, 1, 10, 2 })) << endl;
	cout << "Expect 23: " << solutionCountDistinctSlices(100000, vector<int>({ 1, 100000, 7, 6, 1, 100000, 2 })) << endl;
	cout << "Expect 10: " << solutionCountDistinctSlices(6, vector<int>({ 3, 4, 5, 5, 5, 2 })) << endl;
	cout << "Expect 8: " << solutionCountDistinctSlices(6, vector<int>({ 3, 4, 5, 5, 5 })) << endl;
	cout << "Expect 12: " << solutionCountDistinctSlices(6, vector<int>({ 3, 4, 3, 5, 6 })) << endl;
	cout << "Expect 15: " << solutionCountDistinctSlices(6, vector<int>({ 4, 3, 2, 3, 5, 6 })) << endl;

	vector<int> vec;
	for (int i = 0; i < 100000; ++i)vec.push_back(i);
	cout << "Expect 1000000000: " << solutionCountDistinctSlices(100000, vec) << endl;
}