When extracting images from a PPTX file, you can specify the output location for the images. #1501

@readmagic

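I don't see why this feature isn't supported, so I modified my local copy of the source. The change is in _pptx_converter.py, at around line 144. When keep_data_uris is not set, the image is written to the directory given by a new img_dir keyword argument (defaulting to the current directory), and the Markdown links to that file.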

                    # If keep_data_uris is True, use base64 encoding for images
                    if kwargs.get("keep_data_uris", False):
                        blob = shape.image.blob
                        content_type = shape.image.content_type or "image/png"
                        b64_string = base64.b64encode(blob).decode("utf-8")
                        md_content += f"\n![{alt_text}](data:{content_type};base64,{b64_string})\n"
                    else:
                        # A placeholder name
                        filename = re.sub(r"\W", "", shape.name) + ".jpg"
                        filepath = os.path.join(kwargs.get("img_dir", "."), filename)
                        with open(filepath, "wb") as f:
                            f.write(shape.image.blob)
                        md_content += f"\n![{alt_text}]({filepath})\n"
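
The patch above always appends ".jpg" and assumes the target directory already exists. A minimal sketch of how the file-writing step could be factored out is shown below. The helper name `save_image_blob` and the extension table are hypothetical and not part of markitdown; the sketch assumes the caller passes in the image bytes, shape name, and content type that the patch reads from `shape`.

```python
import os
import re

# Hypothetical mapping from MIME type to file extension; extend as needed.
_EXT_BY_CONTENT_TYPE = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/bmp": ".bmp",
    "image/tiff": ".tiff",
}


def save_image_blob(blob, shape_name, content_type, img_dir="."):
    """Write image bytes into img_dir and return the path that was written.

    Hypothetical helper for illustration; not part of markitdown.
    """
    # Create the output directory if it does not exist yet.
    os.makedirs(img_dir, exist_ok=True)
    # Same sanitising rule as the patch: drop every non-word character.
    stem = re.sub(r"\W", "", shape_name) or "image"
    # Pick the extension from the content type instead of assuming JPEG.
    ext = _EXT_BY_CONTENT_TYPE.get(content_type, ".bin")
    filepath = os.path.join(img_dir, stem + ext)
    with open(filepath, "wb") as f:
        f.write(blob)
    return filepath
```

In the else branch of the patch, this could be called as `save_image_blob(shape.image.blob, shape.name, shape.image.content_type, kwargs.get("img_dir", "."))`. Note that two shapes with the same sanitised name would still overwrite each other; the sketch does not address that.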

This is my test case, which runs fine with the change above:

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("/home/xxx/xxx.pptx", img_dir="/home/xxx/imgs/")
print(result.markdown)
