Add a "opaque_id" column to submissions. #1419

veluca93 · 2025-06-18T23:51:08Z

This gives a stable ID that can be used to refer to a specific submission.

IDs will be used for the submission API. We can't use submission IDs as they leak information on the total number of submissions.

Quick SQL snippet to update an existing database:

begin;
alter table submissions add column opaque_id BIGINT;
update submissions set opaque_id = id where opaque_id IS NULL;
ALTER TABLE submissions ADD CONSTRAINT participation_opaque_unique UNIQUE (participation_id, opaque_id);
commit;

wil93 · 2025-06-19T08:32:29Z

What is an opaque_id? Is it to avoid disclosing the submission.id to users?

veluca93 · 2025-06-19T08:40:10Z

What is an opaque_id? Is it to avoid disclosing the submission.id to users?

yes

wil93 · 2025-06-19T08:47:52Z

What is an opaque_id really? And why we need it? It would be good to update the PR's description.

gollux · 2025-06-19T09:01:06Z

We need submission identifiers for the API (PR #1335). But using database IDs assigned serially would leak information on the number of submissions by other contestants.

veluca93 · 2025-06-19T09:01:59Z

What is an opaque_id really? And why we need it? It would be good to update the PR's description.

I believe the PR description already explains:

This gives a stable ID that can be used to refer to a specific submission.

wil93 · 2025-06-19T09:04:06Z

@veluca93 not really (since the existing ID would also work, arguably).

If you can add Martin's sentence We need submission identifiers for the API (PR https://github.com/cms-dev/cms/pull/1335). But using database IDs assigned serially would leak information on the number of submissions by other contestants. to the PR's description it would be much better.

veluca93 · 2025-06-19T09:06:25Z

@veluca93 not really, if you can add Martin's sentence We need submission identifiers for the API (PR https://github.com/cms-dev/cms/pull/1335). But using database IDs assigned serially would leak information on the number of submissions by other contestants. to the PR's description it would be much better.

Not sure it adds that much info, but sure, done.
Also note that you can modify PR descriptions.

cms/server/contest/handlers/contest.py

prandla · 2025-06-19T09:29:25Z

i'm not sure i like this approach much, because of the possibility of collisions. even though it would take around 2^31 submissions in the DB before collisions become likely enough to be a problem, it still seems a bit unclean to me. (also, if there is collision-avoidance in the generate_opaque_id function, then we should also implement similar logic in the dumpupdater, to at least ensure the IDs don't collide within a dump.)

if we want to make collisions so unlikely as to make them really negligible, i think it'd be better to use UUIDs. alternatively we can use an encrypted form of the database ID (encrypted with e.g. the secret_key in cms.conf).

to be clear i'm not really opposed to the current implementation, it's probably easier than my proposed alternatives. (well, uuids actually shouldn't be more difficult.)

veluca93 · 2025-06-19T09:34:11Z

i'm not sure i like this approach much, because of the possibility of collisions. even though it would take around 2^31 submissions in the DB before collisions become likely enough to be a problem, it still seems a bit unclean to me. (also, if there is collision-avoidance in the generate_opaque_id function, then we should also implement similar logic in the dumpupdater, to at least ensure the IDs don't collide within a dump.)

if we want to make collisions so unlikely as to make them really negligible, i think it'd be better to use UUIDs. alternatively we can use an encrypted form of the database ID (encrypted with e.g. the secret_key in cms.conf).

to be clear i'm not really opposed to the current implementation, it's probably easier than my proposed alternatives. (well, uuids actually shouldn't be more difficult.)

IMO if IDs collide in a dump you can just retry, that's unlikely enough.

During normal operations collisions are indeed avoided.

@gollux mentioned that UUID had some issues in postgres, but I don't remember exactly what.

I am opposed to using a function of the submission ID, because it introduces the possibility of mistakes leaking information.

cmscontrib/AddSubmission.py

cmscontrib/updaters/update_45.py

cms/db/submission.py

gollux

I will add more notes as separate comments.

gollux · 2025-06-19T13:04:52Z

My original proposal was to have the opaque IDs scoped to the given participation, while you have them globally unique.

I still think that a more restricted scope (at least a contest if not a participation) is better. One reason is that it reduces the probability of collisions, the other that it allows preserving the IDs in contest dumps without fear of collisions when importing. Also, when a dump is upgraded from an older format, it is sufficient to avoid collisions in newly assigned IDs.

gollux · 2025-06-19T13:22:47Z

Regarding ID format: UUIDs are popular, but quite unwieldy for both people and machines. In particular, they make poor index fields - they consume lots of space and they are allocated randomly, so a B-tree based index grows at all places simultaneously, as opposed to local growth when using sequential IDs. The latter problem is solved by UUIDv7, but it is still impractically large.

Using 64-bit IDs saves space (and makes the IDs more human-friendly, especially if we decide to encode them in a base larger than 10), but if we allocate them at random, they suffer from the same performance issues in B-trees. Choosing 64-bit IDs whose upper 32-bits are a UNIX timestamp and the lower bits random would make index performance better. (If the IDs are scoped globally or to a contest; scoping per participation is more sensitive, you need to make the index on ID first and participation second.)

Still, it's a question of how much do we care: with 400 contestants, 3 tasks in a contest, and 30 submissions per task, it's just 36000 submissions in total and the index is going to be tiny. So even fully random IDs scoped to a contest are going to perform quite reasonably.)

wil93 · 2025-06-19T14:09:48Z

Just to cover all bases: how bad is it really to leak information on the total number of submissions?

gollux · 2025-06-19T15:54:07Z

Just to cover all bases: how bad is it really to leak information on the total number of submissions?

That's not a healthy approach to security :) Contestants are supposed to be separated, so all leaks are bad to some degree.

Of course there are some leaks that are almost unavoidable. One such example is evaluation delays. We can think about them and argue that they give no real advantage to contestants.

But when the number of leaks grows, you must argue that even a combination of all the leaks gives no advantage, which is fiendishly hard. So whenever a leak is easy to stop, we want to do it, so that the few remaining unavoidable leaks will be manageable.

wil93 · 2025-06-19T15:58:35Z

What I mean is: imagine that CMS included a random piece of stats in the user-visible page (let's say right next to the countdown in the page) which read "Total submissions: 1234" and you maybe even see the number grow in real time. It's not that crazy to think, there are a lot of assumptions that early versions of CMS made. (In one IOI, I remember we even experimented with showing a breakdown on where the points where being distributed, so that weaker contestants would know on which task to focus their attention on).

Now, in this scenario, would we be worried about this already-existing leak and would we be trying to remove this information? Or would we keep it as it is since it's there and it doesn't actually hurt?

gollux · 2025-06-19T16:02:48Z

In that case, I would want to make the visibility of such stats configurable.

wil93 · 2025-06-19T16:12:33Z

OK so you think it gives some kind of advantage to be able to see that information? That's fair.

As for this point made by @veluca93:

I am opposed to using a function of the submission ID, because it introduces the possibility of mistakes leaking information.

What kind of mistakes do you mean? If I understood correctly, the "function of the submission ID" approach would re-use the secret key configured in cms.toml to get some reversible hash, for which there is probably some standard function out there (likely already among our dependencies).

gollux · 2025-06-19T16:23:42Z

OK so you think it gives some kind of advantage to be able to see that information? That's fair.

I am not sure that it gives. But I am sure that contestants are much more ingenious than me, so I prefer minimizing leaks to quantifying advantages obtainable from the leaks :)

gollux · 2025-06-19T16:25:35Z

What kind of mistakes do you mean? If I understood correctly, the "function of the submission ID" approach would re-use the secret key configured in cms.toml to get some reversible hash, for which there is probably some standard function out there (likely already among our dependencies).

I think that a stronger argument against basing public IDs on internal sequential IDs is that you lose the possibility of dump imports preserving original public IDs.

veluca93 · 2025-06-19T20:41:24Z

My original proposal was to have the opaque IDs scoped to the given participation, while you have them globally unique.

I still think that a more restricted scope (at least a contest if not a participation) is better. One reason is that it reduces the probability of collisions, the other that it allows preserving the IDs in contest dumps without fear of collisions when importing. Also, when a dump is upgraded from an older format, it is sufficient to avoid collisions in newly assigned IDs.

Switched this to participation-scope, PTAL.

cmscontrib/updaters/update_from_1.5.sql

cmscontrib/updaters/update_45.py

cms/db/submission.py

This gives a stable ID that can be used to refer to a specific submission.

veluca93 requested a review from gollux June 18, 2025 23:51

veluca93 requested a review from wil93 as a code owner June 18, 2025 23:51

wil93 reviewed Jun 19, 2025

View reviewed changes

cms/server/contest/handlers/contest.py Show resolved Hide resolved

gollux reviewed Jun 19, 2025

View reviewed changes

cmscontrib/AddSubmission.py Outdated Show resolved Hide resolved

gollux reviewed Jun 19, 2025

View reviewed changes

cmscontrib/updaters/update_45.py Outdated Show resolved Hide resolved

gollux reviewed Jun 19, 2025

View reviewed changes

cms/db/submission.py Show resolved Hide resolved

gollux reviewed Jun 19, 2025

View reviewed changes

veluca93 force-pushed the submission-ids branch 2 times, most recently from 7000118 to bf03a00 Compare June 19, 2025 20:40

veluca93 force-pushed the submission-ids branch from bf03a00 to 5a0fc8e Compare June 19, 2025 20:44

wil93 reviewed Jun 19, 2025

View reviewed changes

cmscontrib/updaters/update_from_1.5.sql Show resolved Hide resolved

veluca93 force-pushed the submission-ids branch from 5a0fc8e to f0c8f65 Compare June 19, 2025 22:37

gollux approved these changes Jun 21, 2025

View reviewed changes

cmscontrib/updaters/update_45.py Outdated Show resolved Hide resolved

cms/db/submission.py Outdated Show resolved Hide resolved

Add a "opaque_id" column to submissions.

ac421ac

This gives a stable ID that can be used to refer to a specific submission.

veluca93 force-pushed the submission-ids branch from f0c8f65 to ac421ac Compare June 21, 2025 15:46

veluca93 merged commit 7d7eb3c into cms-dev:main Jun 21, 2025
3 checks passed

wil93 mentioned this pull request Jun 25, 2025

Remove CODEOWNERS. #1421

Merged

wil93 mentioned this pull request Aug 29, 2025

Add opaque_id to user tests, use opaque_id in functional tests, remove encryption-related dead code #1526

Open

Add a "opaque_id" column to submissions. #1419

Add a "opaque_id" column to submissions. #1419

Uh oh!

Conversation

veluca93 commented Jun 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wil93 commented Jun 19, 2025

Uh oh!

veluca93 commented Jun 19, 2025

Uh oh!

wil93 commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

veluca93 commented Jun 19, 2025

Uh oh!

wil93 commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

veluca93 commented Jun 19, 2025

Uh oh!

Uh oh!

prandla commented Jun 19, 2025

Uh oh!

veluca93 commented Jun 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gollux left a comment

Choose a reason for hiding this comment

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

wil93 commented Jun 19, 2025

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

wil93 commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

wil93 commented Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

gollux commented Jun 19, 2025

Uh oh!

veluca93 commented Jun 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

veluca93 commented Jun 18, 2025 •

edited

Loading

wil93 commented Jun 19, 2025 •

edited

Loading

wil93 commented Jun 19, 2025 •

edited

Loading

wil93 commented Jun 19, 2025 •

edited

Loading

wil93 commented Jun 19, 2025 •

edited

Loading