Skip to content

#3: updated deprecated df.append() syntax#18

Open
j-miko wants to merge 3 commits intomasterfrom
3-replace-pddataframeappend-with-pdconcat
Open

#3: updated deprecated df.append() syntax#18
j-miko wants to merge 3 commits intomasterfrom
3-replace-pddataframeappend-with-pdconcat

Conversation

@j-miko
Copy link
Copy Markdown
Member

@j-miko j-miko commented Mar 25, 2026

No description provided.

@j-miko j-miko requested a review from Copilot March 25, 2026 13:29
@j-miko j-miko linked an issue Mar 25, 2026 that may be closed by this pull request
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates deprecated pandas.DataFrame.append() usage to supported pandas patterns to keep transcript profiling and assay ordering utilities working on newer pandas versions.

Changes:

  • Replaced DataFrame.append(Series) with pd.concat(...)-based row additions in SAM transcript profile generation.
  • Replaced repeated append() calls in IDT bulk input generation with a list-accumulation approach and a single DataFrame construction.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
trxtools/sam/SAMtranscripts.py Swaps deprecated DataFrame.append() calls for pd.concat() when accumulating per-transcript profiles.
trxtools/folding/assays.py Reworks bulk IDT output construction to avoid deprecated append() by collecting rows and building the DataFrame once.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread trxtools/sam/SAMtranscripts.py Outdated
Comment on lines +149 to +158
# output_df = output_df.append(s1.rename(n))
output_df = pd.concat([output_df, s1.rename(n).to_frame().T])

# deletions
s2 = transcript2profileDeletions(l, length=length)
outputDel_df = outputDel_df.append(s2.rename(n))
outputDel_df = pd.concat([outputDel_df, s2.rename(n).to_frame().T])

# deletions ext
s3 = transcript2profileDeletions(l, length=length, expand=expand)
outputDelExt_df = outputDelExt_df.append(s3.rename(n))
outputDelExt_df = pd.concat([outputDelExt_df, s3.rename(n).to_frame().T])
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.concat(...) is executed once per transcript inside the loop for output_df/outputDel_df/outputDelExt_df, which repeatedly copies/reallocates the growing DataFrames. To keep this efficient on large inputs, accumulate the per-transcript rows in lists and concatenate once after the loop (or concatenate each list once per output).

Copilot uses AI. Check for mistakes.
Comment thread trxtools/sam/SAMtranscripts.py Outdated
Comment on lines 179 to +184
output = pd.DataFrame()
rows = []
for i, row in working.iterrows():
output = output.append(pd.Series(index=['sequence'], data=row["template"], name=str(i) + "_template"))
output = output.append(pd.Series(index=['sequence'], data=row["non-template"], name=str(i) + "_non-temp"))
rows.append(pd.Series(index=['sequence'], data=row["template"], name=str(i) + "_template"))
rows.append(pd.Series(index=['sequence'], data=row["non-template"], name=str(i) + "_non-temp"))
output = pd.DataFrame(rows)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output = pd.DataFrame() is immediately overwritten by output = pd.DataFrame(rows) after the loop, so the initial assignment is redundant. Consider removing the first initialization (or directly construct output from rows) to reduce clutter.

Copilot uses AI. Check for mistakes.
Comment on lines 103 to +105
s1 = transcript2profile(l, length=length)
output_df = output_df.append(s1.rename(n))
# output_df = output_df.append(s1.rename(n))
output_df = pd.concat([output_df, s1.rename(n).to_frame().T])
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building output_df by calling pd.concat(...) on every groupby iteration repeatedly reallocates/copies the whole DataFrame, which can get very slow for many transcripts (same scaling problem as df.append). Collect the per-transcript rows/frames in a list and pd.concat once after the loop (keeping n as the row index) to avoid O(n²) behavior.

Copilot uses AI. Check for mistakes.
Comment thread trxtools/sam/SAMtranscripts.py Outdated
j-miko and others added 2 commits March 25, 2026 14:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace pd.DataFrame.append() with pd.concat()

2 participants