Skip to content

Conversation

@ipundit
Copy link
Contributor

@ipundit ipundit commented Sep 4, 2023

This is a draft non-breaking solution for #45 based on the Pipes and Filter Pattern. It:

  • Passes regression thus demonstrating that no API changes for ActiveRecord 1.x are necessary. Nothing breaks!
  • Adds a new static Model::where() function that replaces Model::find(). Everything that Model::find() used to do can be done by Model::where(), but has a saner and type-checked input parameter list.
  • The above point is proved with an API-documenting test suite called SQLExecutionPlanTest.php
  • where() will always return a Model, or null if not found, while all() will always return a Model[], no matter what is passed into the input lists of both functions

To understand this code:

  • Ignore the PR composer test is now more convenient to use #48 changes that'll be removed in the final version (composer.json and TestCommand.php)
  • Read SQLExecutionPlanTest::testWhereChained() and SQLExecutionPlanTest::testAllChained() to see the beauty of this chaining method, and how it's more like Ruby and easier to use than the old Model::find() and Model::all()
  • Read Model.php and note SQLExecutionPlan is a State class that is repeated passed around for chaining purposes. By splitting this functionality from Model, a user can setup an SQLExecutionPlan and reuse it among multiple invocations and even cache it

To do if this design is approved:

  • Complete documentation
  • Refactor Model::find() to move that functionality into SQLExecutionPlan.php. Model::all() has already been rewritten to replace Model::find() with SQLExecutionPlan::all()
  • Should the use of Model::find() emit a deprecation notice? If where() can do all that find() can do but is easier to use and is type checked, why have find()?

ipundit added 18 commits August 31, 2023 18:10
…at capitalization for attribute names are retained. So the short form for New York is now NY instead of ny. test/model/Author, Book, and Venue now have those strtoupper() workarounds in getter/setter methods removed as it's done in Model.php
…rcased in Model.php in order to be case insensitive.
…at capitalization for attribute names are retained. So the short form for New York is now NY instead of ny. test/model/Author, Book, and Venue now have those strtoupper() workarounds in getter/setter methods removed as it's done in Model.php
…rcased in Model.php in order to be case insensitive.
…cord

# Conflicts:
#	test/ActiveRecordTest.php
#	test/models/Book.php
…Disable it so that regression does not timeout.
…d#35:

Authors table now has a column that is camel cased: firstName
Added tests that demonstrate case insensitivity when accessing this firstName column
- Correct link to contributing.md
@shmax
Copy link
Contributor

shmax commented Sep 4, 2023

Hey, looks pretty cool. I'll have to analyze this in detail, but just a quick note:

and is type checked, why have find()?

Because it aligns with the conventional Active Record pattern, and it IS type-checked. Please familiarize yourself with PHPStan and the extensions I've been laboring over for the past week.

@ipundit
Copy link
Contributor Author

ipundit commented Sep 4, 2023

The options parameter though is not checked at compile time. I've written bugs like these before:

Author::find(3, ['orderBy'=>'author_id DESC']);

gives a run time error in a line later than where the bug resides. It should be:

Author::find(3, ['order'=>'author_id DESC']);

In the new way, this runs fine:

Author::orderBy('author_id DESC'])->where(3);

while this fails on the line where the typo bug is at.

Author::order('author_id DESC'])->where(3);

The old way still works, but I prefer the new way as typo-related errors are found where they are made.

@shmax
Copy link
Contributor

shmax commented Sep 4, 2023

Right, so with your new design we wouldn't pass that stuff to find, which sounds good, but I think you're leaping a bit too far to argue that find should go away. If you want to draw more inspiration from RoR, why not go all the way?

Author::order('author_id DESC'])->where(["name" => "bob"])->find([1,3, 4]);

…cord

# Conflicts:
#	composer.json
#	scripts/TestCommand.php
…cord into RubyLikeFind

# Conflicts:
#	composer.json
#	scripts/TestCommand.php
@ipundit
Copy link
Contributor Author

ipundit commented Sep 4, 2023

All I did I is rebase after the merge of #48. PHPUnit 8.1 passes, but PHPUnit 8.2 gives an error I've never seen before:

Error: Can't use 'tar -xzf' extract archive file: /home/runner/work/_actions/_temp_c3b3d108-0339-45f2-9f31-530b9a59f47f/443e5230-3b4f-41a4-906e-4bd890ddd139.tar.gz. return code: 2.

in the setup job phase. Isn't that a build environment error rather than a coding error? shmax, is this something you can fix or is this something fixable on my end?

@shmax
Copy link
Contributor

shmax commented Sep 4, 2023

Hmm, beats me, but I don't think it's your fault--I re-ran the jobs, and they're green, now.

@ipundit
Copy link
Contributor Author

ipundit commented Sep 4, 2023

If you want to draw more inspiration from RoR, why not go all the way?

Author::order('author_id DESC'])->where(["name" => "bob"])->find([1,3, 4]);

This is a good question, a philosophical one. I really don't like find() returning a Model or a Model[] depending on the input parameters. My solution was to split it into two functions, find() always returns a Model or null, and all() always returns a Model[].

So the above would be (simplified) to:

$query = Author::where(["name" => "bob"])->find(1);
$queries = Author::where(["name" => "bob"])->all([1,3]);

Most of the time though, you're not filtering on the primary keys (find) and the non-primary keys at the same time (where):

$query = Author::where(["name" => "bob"])->find();
$queries = Author::where(["name" => "bob"])->all();

to the point where you would always be ending every call with a ->find() and ->all() call

In the existing design, you would do the above with:

$query = Author::where(["name" => "bob"]);
$queries = Author::all(["name" => "bob"]);

which is easier to read and write for the common cases, and

$query = Author::where(["name" => "bob", "id" => 1]);
$queries = Author::all(['name = (?) and id in (?)', 'bob', [1, 3]]);

for the non-common cases where you're filtering on key and non-key fields at the same time. The where() is not too bad, but the all() looks pretty ugly here.

A philosophical question. Your thoughts?

@shmax
Copy link
Contributor

shmax commented Sep 4, 2023

I read you loud and clear, but my instinct is still not to diverge too far from the basic behavior of find. It's sort of the main API for the active record pattern, and I don't want to upset more than a decade of developer familiarity with the core tenets even though it is a bit weird. Again, it is being statically checked, so if you're using something like PHPStan it will catch any points in your code where you are getting single and array model results mixed-up.

But let me look into it in more depth when I get done with the phpstan stuff I'm currently toiling away on, and then I'll be in a better position to discuss this.

@ipundit
Copy link
Contributor Author

ipundit commented Sep 4, 2023

In my work on this PR, I ran into two types of name space collisions:

A problem with last and find()

Say you have a table with the primary key as a string. Now does:

$query = Author::where(["name" => "bob"])->find('last');

mean you're searching on the primary key='last' or you want to find the last row with name=bob? The existing design fixes the problem with:

$query = Author::where(["name" => "bob"], false); // use true, which is the default, to return the first row

A problem with raw where statements and all()

Say again you have a table with the primary key as a string.

$queries = Author::all(["a", "b"]);

is obvious what the string[] input parameter does, but what about?

$queries = Author::all(['name = (?) and publisher <> (?)', 'Bill Clinton', 'Random House']);

That's also a string[], so how based on the type of the input parameter, do you know to do a primary key search or a conditions search? The existing solution to try the primary key search first, catch the RecordNotFound Exception, and then try the conditions search. Not elegant, but it did pass regression.

The new design would solve this problem by:

$queries = Author::all(["a", "b"]);
$queries = Author::where(['name = (?) and publisher <> (?)', 'Bill Clinton', 'Random House'])->all();

These corner cases are hard to imagine and find, and only by working deeply with the all() function while writing this PR did I realize the weakness with the existing design. What I recommend is that move and refactor Model::find() functionality to use SQLExecutionPlan::where(), just like it currently does for SQLExecutionPlan::all(). This is a win for PHPStan:

as:

  /**
   * Alias for self::find('all').
   *
   * @see find
   *
   * @return static[]
   */
public static function all(/* ... */)

is now:

/**
  * @param array<number|mixed|string> $needle An array containing values for the pk
  *
  * @return array<Model> All the rows that matches query. If no rows match, returns []
  */
public static function all(array $needle = []): array

but that was hard work to figure that out. I expect that I might be able to do something like that for find() if I can get that to work for the new design and pass regression. I'm willing to do that, but only if we agree that SQLExecutionPlan is a good idea and thus the refactoring part of this PR is accepted in principle. I would separate that part of the PR into a separate one as infrastructure work under the old API that's good to do regardless of the ultimate fate of this PR which proposes a new API for users worthy of a version 2.0.

As for the the final fate of ->find() and ->all(), I'm not yet willing to make a position on it until other corner cases are found.

lib/Relation.php Outdated
*
* @return Model|null The single row that matches query. If no rows match, returns null
*/
public function where(int|string|array $needle): Model|null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, should return a Relation.

*
* @return array<Model> All the rows that matches query. If no rows match, returns []
*/
public static function all(array $needle = []): array
Copy link
Contributor

@shmax shmax Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all doesn't take any arguments; I think it's only meant as a means to get a handle to the Relation without executing anything:

$allBooks = Books::all();

...
$books = $allBooks->where(['title'=>'foo'].to_a()'

lib/Relation.php Outdated
return $this;
}

public function orderBy(string $order): Relation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we still doing snake_case in the lib code?

*/
public function all(array $needle = []): array
{
$list = [];
Copy link
Contributor

@shmax shmax Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all doesn't take any arguments or do anything; all it needs to do is return itself.

…her than joins()

- Add from chainable function. It doesn't exist in Rails4, but it is in the PHP ActiveRecord spec
- Move implementation of find() from Model to Relation
- Remove find_by_pk() as the regression tests and documentation says you should call find() instead
- Fix bugs in ActiveRecordFindTest which expected RecordNotFound exceptions as opposed to TypeError exceptions when passing invalid input
@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #49 (3170e85) into master (4bec272) will decrease coverage by 0.25%.
The diff coverage is 92.77%.

❗ Current head 3170e85 differs from pull request most recent head 0b81ba0. Consider uploading reports for the commit 0b81ba0 to get more accurate results

@@             Coverage Diff              @@
##             master      #49      +/-   ##
============================================
- Coverage     95.23%   94.99%   -0.25%     
- Complexity      935      986      +51     
============================================
  Files            32       33       +1     
  Lines          2246     2378     +132     
============================================
+ Hits           2139     2259     +120     
- Misses          107      119      +12     
Files Changed Coverage Δ
lib/Model.php 95.58% <82.35%> (-1.94%) ⬇️
lib/Relation.php 99.10% <99.10%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ipundit
Copy link
Contributor Author

ipundit commented Sep 6, 2023

Ok, the next step is done. Model::find() is now calling Relation::where() and Relation::all() and Model::find*() are calling Model::find(). That means all the find() are going through Relation's implementation and this major refactoring passes regression.

Model::count() is also calling Relation's implementation.

Model does the static::extract_and_validate_options($args) work to be version 1 compatible and then sends the cleaned up data to Relation. RoR4's input arguments are much less permissive than version 1's input arguments, and so I'd like to track Relation's input argument list to RoR4 while keeping the conditions[] processing for backwards compatibility in Model.

One issue that caused great complexity in the code is the version 1 inconsistent handling of queries that don't return a record. If it's a find by primary key, then it throws an exception. If it's a find by condition, then it returns null, or [].

However, this is not even internally consistent.

$thisIsNull = Author::find_by_author_id(999999);
$butThisThrowsAnError = Author::find(999999);

The reason is that dynamic finders are always converted to a find by condition even though that condition is searching on the primary key. If we're allowed to break things for version 2, I would like the return value to always be null when expecting a single Model when you can't find it, and a zero-length Model[] when expecting multiple models.

@shmax
Copy link
Contributor

shmax commented Sep 6, 2023

This is still a draft so I assume it's very much a work in progress, but I still see some architectural rough edges that need to be smoothed out:

  • Relation::where needs to return a Relation
  • Relation::all should not take any arguments. All it needs to do is return itself.
  • Relation needs a find method. Once you have that, Model::find can delegate to it.
  • Relation needs a to_a method.

And so on.

If we're allowed to break things for version 2, I would like the return value to always be null when expecting a single Model when you can't find it, and a zero-length Model[] when expecting multiple models.

Don't worry about breaking compatibility with 1.x (that ship has sailed); I'm more interested in aligning with RoR wherever possible. If you need to drop find_by_pk you won't hear me complaining.

Keep up the good work. 👍

@shmax
Copy link
Contributor

shmax commented Sep 6, 2023

Also, I'm out of other stuff to do, so consider me available to help. If you need me to tune up the dynamic static checking stuff or otherwise contribute just say the word.

…n the test suite and it's obviously an internal function that should have never been made public. Use find() instead.
@ipundit
Copy link
Contributor Author

ipundit commented Sep 6, 2023

Ok, where() and find() are now implemented! I've tried to hew closely to the Ruby on Rails standard. To do:

  • Have Model::find() call Relation::find()
  • Input checking of where statements to deny SQL injection attacks
  • all() and to_a()
  • Documentation

Unfortunately, I'm going to have to pass the rest of this feature to you. I have to go back to work now, so it'll be at least about a week before I can check in again. Please feel free to take it anywhere from here without consulting me. Since Ruby on Rails has a published spec and we're following it, it's now just a matter of implementation rather than design. The heavy lifting has been done though, and I think the larger community will appreciate the new chainable interface once it's released to Packagist. It's been nice working with you as you're really quick to answer, and I've learned lots on this project. Thanks for everything.

@shmax
Copy link
Contributor

shmax commented Sep 6, 2023

That's definitely some great progress! I'm happy to tag-in on this, and will begin pushing changes soon. Thanks so much for taking the lead on this, and we'll catch up again in a week or so. 👏

@shmax
Copy link
Contributor

shmax commented Sep 7, 2023

Just thought I'd give a little update:

  • implemented Relation::all
  • implemented Relation::to_a
  • Relation::find now only takes primary keys, arrays, of pks, or a list of pks (no more arrays with "conditions", etc)

The tricky part is Relation::where, as I think you've probably already discovered. I spent the evening with the debugger, and the problem is that the conditions-parsing code we have always assumes that we have a single string, or an array with a string and replacement values, or a hash, and isn't prepared to do more complex aggregations. The other troubling thing is that despite mention of it here and there in the documentation we don't actually seem to support using named replacement variables, ie:

[
  'conditions' => [ 
    'select id from books where name = :name', [
        'name' => 'bob'
     ]
  ]
];

That will quickly become complicated when we start trying to aggregate multiple where clauses because you have to manage name conflicts. Fortunately, this is something I've worked out in my own codebase so I should be able to get all this working, but it will probably take some time.

@shmax
Copy link
Contributor

shmax commented Sep 7, 2023

I did a little experimenting with Ruby on Rails, and it does allow you to mix named and unnamed parameters:

@res = User.where("admin = ?", 1).where("name = :name", { name: "Michael Example" }).to_a

I'd like to see how they handle that, so today's project is going to be to download the source for rails and do some deep debugging...

@ipundit
Copy link
Contributor Author

ipundit commented Sep 7, 2023

Thanks for the update. It sounds like the project is is good hands. I only had enough time to demonstrate that the chainable implementation was doable while maintaining complete backwards compatibility, but finishing off all the corner cases is a lot of detail work and I'm so I'm glad you're willing to do it. I have four thoughts that have come to mind since the last commit:

  1. take() is trivial to implement since we have find(), but I find it confusing as to why you need both until I read the Ruby documentation in detail. It may be simpler for users to not offer the nuance, but the dynamic finders use take(), so it would have to be implemented anyway.
  2. I moved count() from Model to Relation in my last checkin, so Author::where(name='Bill').count() will be trivial to implement. Since the version 1 code has count, I would advocate for implementing count() in version 2, especially since the implementing logic is already there.
  3. last() currently returns a Relation when it should return a Model or a Model[] when using last(n). This is easy to fix since we already have find() and last() can call find()
  4. If we implement last() and take(), then you may as well implement first(), especially since version 1 has it.

Really, first(), last(), take() are all variations of find(). Since since we have find(), especially now that it's been disassociated from where() (which was a design weakness in RoR < 3 and therefore ActiveRecord), it would easy, and I think helpful to implement them all according to the RoR4 standard.

Finally, there's a question of how much of Ruby on Rails to replicate before you release version 2 of ActiveRecord. My 2 cents is to completely implement version 1 functionality in version 2 (and we're pretty close already, especially if you implement the above 4 points) as the minimum standard to release. Also, a LOT of documentation has to be written, but it could point to working test cases. As a user, I find the test suite the most helpful way to understand ActiveRecord and how to actually do something.

For example, Author::find(99999) throws a RecordNotFound exception, which is not what I want in my code, but it's what Ruby specifies. But I can do Author::find_by_author_id(99999) to get null, or Author::where(99999)->find to get null too. Putting that in the documentation and test cases would be useful.

@shmax
Copy link
Contributor

shmax commented Sep 7, 2023

Really, first(), last(), take() are all variations of find().

We'll get 'em! Actually, it seems that find( :first, ...)-style is still a thing, so I guess I'll have to bring that back.

I haven't come across take yet... not sure I understand exactly what it does, but I'll look into it more when I can.

Finally, there's a question of how much of Ruby on Rails to replicate before you release version 2 of ActiveRecord. My 2 cents is to completely implement version 1 functionality in version 2

If by version 1 you're talking about php-activerecord, then agreed, I think, with the understanding that the concept of passing "conditions" into find is now replaced by using where clauses (assuming I can get that fully-working).

I did a little more RoR debugging, here, and I think I can now answer the question of how it handles mixing of named and unnamed params: it sanitizes them on the spot. So, if you do something like People::where("name = ?", "bob"), then it immediately converts it to name = 'bob' before sending it to the query builder. This seems to be a rare case where PHP is ahead of the curve, here, in that we have the PDO abstraction to do all our escaping for us, so I don't think there's too much to be gleaned from the Ruby code, and I'll just have to write my own crazy algorithm for normalizing mixed vars.

@ipundit
Copy link
Contributor Author

ipundit commented Sep 7, 2023

#and I'll just have to write my own crazy algorithm for normalizing mixed vars.

:) Yes, I actually know how crazy that is. Exposing the conditions internal variable made life a lot more difficult to moving to version 2 as that was an implementation detail rather than a spec one, and now we want to change the implementation while maintaining backwards compatibility and removing conditions from the spec. The conditions regularization code is in SQLBuilder, but it doesn't handle all the cases and so I had to write where()'s ANDing in Relation.

I have two thoughts on where():

  1. Currently, the implementation of ANDing where statements in SQL code is in Relation. It should probably live in SQLBuilder instead and any escaping of input variables should live all in one class.
  2. where()'s and algorithm currently converts statements into an array of conditions like:
Author::where(['name' => 'bob', 'parentId=4'])
    ->where("mixedCaseName='Bill'")
    ->find(2);

into:

[0] => ['name' => 'bob', 'parentId=4']
[1] => ['mixedCaseName=Bill']

and then

[(name=('bob') and parentId=(4) AND mixedCaseName = 'Bill') AND id = (?), 2]

The brackets are to specify that all of the where()s are executed before find()'s id=2. I realized now that's not going to handle the or case. It should have brackets around every where() predicate, ie:

[((name=('bob') and parentId=(4)) AND (mixedCaseName = 'Bill')) AND (id = (?)), 2]

That means

Author::where(['name' => 'bob', 'parentId=4'])
    ->where("mixedCaseName='Bill'")
    ->or->where("name<>'charlie' and name <> 'david'")
    ->find(2);

then converts to:

[((name=('bob') and parentId=(4)) AND (mixedCaseName = 'Bill') OR (name <> 'charlie' and name <> 'david')) AND (id = (?)), 2]

@shmax
Copy link
Contributor

shmax commented Sep 8, 2023

I have two thoughts on where():

Right on both counts. I'm busily rewriting the sucker, and I came to the same conclusions you did before I even got started. With luck will have an update fairly soon....

@shmax shmax mentioned this pull request Sep 10, 2023
4 tasks
@ipundit
Copy link
Contributor Author

ipundit commented Sep 11, 2023

Replaced by pull request #63

@ipundit ipundit closed this Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants