Unify Operators using Python Interface, Remove CMake#37
Conversation
hunhoffe
left a comment
There was a problem hiding this comment.
I think this is a great step forward!
I'd recommend subsequent PRs to:
- Simplify test setup/config, probably using a library such as pytest
- Simplify the python import/path setup in this repo
However, this PR is already large/chaotic, so I'd review my comments but push these tasks to future work.
| @@ -0,0 +1,177 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
this could be a good intermediary step, but is there a reason we can't use something like pytest to find tests within a directory? This seems a little like reinventing the wheel when there are mature python testing libraries.
Another option could be to find classes with tests during module load time in the operator top level init file (e.g., do introspection to find which of the loaded things are testable).
I'm fine with this for now -- just brainstorming :)
There was a problem hiding this comment.
Claude whipped this together with me so these tests could fit into the existing testing framework that is a weird handmade solution of CMake+Python. Once it's all migrated to Python, I agree that we should move to something more established. Created issue #42 for this which I can tackle after merging this.
| from operators.common.test_utils import run_test | ||
|
|
||
|
|
||
| regular_test_cases = [ |
There was a problem hiding this comment.
This could (in a later PR probably) be addressed with something like a pytest.parametrize with M, K, etc. parameters.
There was a problem hiding this comment.
Agreed. This is ugly and for now just mirrors what was in the CMake directly and I didn't like doing it but as you already pointed out, the PR is already getting large.
|
|
||
| def set_up_artifacts(self): | ||
| operator_dir = Path(__file__).parent | ||
| file_name_base = f"relu_{self.num_aie_columns}c_{self.num_channels}ch_{self.size}_{self.tile_size}t" |
There was a problem hiding this comment.
I wonder if both file_name_base and mlir_artifact could have a default impl in the operator class that simplifies this setup for most operators (could. be overridden if needed).
I also wonder if the set_up_artifacts should be part of initialization, but that's probably a question for another day.
There was a problem hiding this comment.
It'd be great if there was some customization capability to be able to change the name / directory of object files and MLIR files.
| python_cmd += "print(str(ctx.module))" | ||
| else: | ||
| python_cmd += "print(str(mlir_code))" | ||
| self.dry_run.append(f"python3 -c '{python_cmd}' > {artifact.path}") |
There was a problem hiding this comment.
You can get the python executable with sys.executable
| class AieccCompilationRule(CompilationRule): | ||
| def __init__(self, build_dir, peano_dir, mlir_aie_dir, *args, **kwargs): | ||
| self.build_dir = build_dir | ||
| self.aiecc_path = Path(mlir_aie_dir) / "bin" / "aiecc.py" |
There was a problem hiding this comment.
You can probably reuse some of the logic in https://github.com/Xilinx/mlir-aie/blob/9db8a590303c59bd88be58f0596f15afb4656af7/python/iron/compile/compile.py#L79 to avoid an extra shell.
| ) | ||
|
|
||
| def compile(self, artifacts): | ||
| clang_path = Path(self.peano_dir) / "bin" / "clang++" |
There was a problem hiding this comment.
Peano compiler, linker, etc. are exposed in https://github.com/Xilinx/mlir-aie/blob/9db8a590303c59bd88be58f0596f15afb4656af7/python/utils/config.py#L13
|
It's kind of big to review exhaustively, but I think it's a good step towards a reusable IRON framework. We have a number of functions for compilation, linking etc. in MLIR-AIE that can be reused. |
|
Thanks for the comments @ypapadop-amd. I agree it's large. The goal was to limit the scope of this PR to only removing the duplication of build scripts for the operators. Instead of having the CMake build scripts under The Erika and I are planning for this to be a multi-step refactor. Reusing existing code from elsewhere (more code deduplication) and coming to a clean, reusable operator interface are the next steps. Perhaps addressing things like operator artifact naming, and reusing existing IRON and JIT code should be part of those next steps. |
📊 Test Results for Small Benchmark/Test Suite6b841b9 (2025_12_03_21_43_03) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite6b841b9 (2025_12_03_21_43_03) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_16_cores_2_chans_2048_tile_128_False
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_1_cores_1_chans_2048_tile_2048_False
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_2_cores_1_chans_2048_tile_1024_False
mem_copy_2_cores_2_chans_2048_tile_1024_False
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_4_cores_1_chans_2048_tile_512_False
mem_copy_4_cores_2_chans_2048_tile_512_False
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mem_copy_8_cores_1_chans_2048_tile_256_False
mem_copy_8_cores_2_chans_2048_tile_256_False
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
sigmoid_1_cols_1_channels_2048_tile_2048
sigmoid_2_cols_1_channels_2048_tile_1024
sigmoid_4_cols_1_channels_2048_tile_512
sigmoid_8_cols_1_channels_2048_tile_256
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. swiglu_decode_1x2048x2048
tanh_1_cols_1_channels_2048_tile_2048
tanh_2_cols_1_channels_2048_tile_1024
tanh_4_cols_1_channels_2048_tile_512
tanh_8_cols_1_channels_2048_tile_256
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
…n Python (missing warmup iterations); merge Curt's GEMM fixes
📊 Test Results for Test Example Applications98350dc (2025_12_03_22_19_28) IRONCLADTested on
📈 Trends (vs main branch) for Test Example Applications98350dc (2025_12_03_22_19_28) IRONCLAD Trendsllama_3.2_1b
|
📊 Test Results for Test Example Applicationsa02e460 (2025_12_03_22_40_30) IRONCLADTested on
📈 Trends (vs main branch) for Test Example Applicationsa02e460 (2025_12_03_22_40_30) IRONCLAD Trendsllama_3.2_1b
|
📊 Test Results for Test Example Applications79c1558 (2025_12_03_22_52_18) IRONCLADTested on
📈 Trends (vs main branch) for Test Example Applications79c1558 (2025_12_03_22_52_18) IRONCLAD Trendsllama_3.2_1b
|
📊 Test Results for Small Benchmark/Test Suite79c1558 (2025_12_03_23_09_12) IRONCLADTested on
📈 Trends (vs main branch) for Small Benchmark/Test Suite79c1558 (2025_12_03_23_09_12) IRONCLAD Trendsaxpy_1_cols_2_channels_2048_tile_2048_3.0
axpy_2_cols_2_channels_2048_tile_1024_3.0
axpy_4_cols_2_channels_2048_tile_512_3.0
axpy_8_cols_2_channels_2048_tile_256_3.0
dequant_1_cols_1_channels_2048_tile_2048
dequant_1_cols_2_channels_2048_tile_1024
dequant_2_cols_1_channels_2048_tile_1024
dequant_2_cols_2_channels_2048_tile_512
dequant_4_cols_1_channels_2048_tile_512
dequant_4_cols_2_channels_2048_tile_256
dequant_8_cols_1_channels_2048_tile_256
dequant_8_cols_2_channels_2048_tile_128
eltwise_add_1_cols_2_channels_2048_tile_2048
eltwise_add_2_cols_2_channels_2048_tile_1024
eltwise_add_4_cols_2_channels_2048_tile_512
eltwise_add_8_cols_2_channels_2048_tile_256
eltwise_mul_1_cols_2_channels_2048_tile_2048
eltwise_mul_2_cols_2_channels_2048_tile_1024
eltwise_mul_4_cols_2_channels_2048_tile_512
eltwise_mul_8_cols_2_channels_2048_tile_256
gelu_1_cols_1_channels_2048_tile_2048
gelu_1_cols_2_channels_2048_tile_1024
gelu_2_cols_1_channels_2048_tile_1024
gelu_2_cols_2_channels_2048_tile_512
gelu_4_cols_1_channels_2048_tile_512
gelu_4_cols_2_channels_2048_tile_256
gelu_8_cols_1_channels_2048_tile_256
gelu_8_cols_2_channels_2048_tile_128
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x32_8_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_0_bcolmaj_1_ccolmaj_0
gemm_2048x2048x2048_64x64x64_2_cols_1_bcolmaj_0_ccolmaj_0
gemm_2048x2048x2048_64x64x64_8_cols_0_bcolmaj_0_ccolmaj_0
layer_norm_1_cols_1_channels_2048_tile_2048
layer_norm_1_cols_2_channels_2048_tile_1024
layer_norm_2_cols_1_channels_2048_tile_1024
layer_norm_2_cols_2_channels_2048_tile_512
layer_norm_4_cols_1_channels_2048_tile_512
layer_norm_4_cols_2_channels_2048_tile_256
layer_norm_8_cols_1_channels_2048_tile_256
layer_norm_8_cols_2_channels_2048_tile_128
matrix_vector_mul_128x128_32_1col
matrix_vector_mul_2048x8192_1_1col
matrix_vector_mul_2048x8192_1_2col
matrix_vector_mul_2048x8192_1_4col
matrix_vector_mul_2048x8192_1_8col
matrix_vector_mul_8192x2048_4_1col
matrix_vector_mul_8192x2048_4_2col
matrix_vector_mul_8192x2048_4_4col
matrix_vector_mul_8192x2048_4_8col
mem_copy_16_cores_2_chans_2048_tile_128_False
mem_copy_1_cols_1_channels_2048_tile_2048
mem_copy_1_cols_2_channels_2048_tile_1024
mem_copy_1_cores_1_chans_2048_tile_2048_False
mem_copy_2_cols_1_channels_2048_tile_1024
mem_copy_2_cols_2_channels_2048_tile_512
mem_copy_2_cores_1_chans_2048_tile_1024_False
mem_copy_2_cores_2_chans_2048_tile_1024_False
mem_copy_4_cols_1_channels_2048_tile_512
mem_copy_4_cols_2_channels_2048_tile_256
mem_copy_4_cores_1_chans_2048_tile_512_False
mem_copy_4_cores_2_chans_2048_tile_512_False
mem_copy_8_cols_1_channels_2048_tile_256
mem_copy_8_cols_2_channels_2048_tile_128
mem_copy_8_cores_1_chans_2048_tile_256_False
mem_copy_8_cores_2_chans_2048_tile_256_False
mha
relu_1_cols_1_channels_2048_tile_2048
relu_2_cols_1_channels_2048_tile_1024
relu_4_cols_1_channels_2048_tile_512
relu_8_cols_1_channels_2048_tile_256
rms_norm_1_cols_1_channels_2048_tile_2048
rms_norm_1_cols_2_channels_2048_tile_1024
rms_norm_2_cols_1_channels_2048_tile_1024
rms_norm_2_cols_2_channels_2048_tile_512
rms_norm_4_cols_1_channels_2048_tile_512
rms_norm_4_cols_2_channels_2048_tile_256
rms_norm_8_cols_1_channels_2048_tile_256
rms_norm_8_cols_2_channels_2048_tile_128
rope_1_cols_2_channels_4096_tile_4096_0
rope_2_cols_2_channels_4096_tile_2048_0
rope_4_cols_2_channels_4096_tile_1024_0
rope_8_cols_2_channels_4096_tile_512_0
sigmoid_1_cols_1_channels_2048_tile_2048
sigmoid_2_cols_1_channels_2048_tile_1024
sigmoid_4_cols_1_channels_2048_tile_512
sigmoid_8_cols_1_channels_2048_tile_256
silu_1_cols_1_channels_2048_tile_2048
silu_2_cols_1_channels_2048_tile_1024
silu_4_cols_1_channels_2048_tile_512
silu_8_cols_1_channels_2048_tile_256
softmax_1_cols_2_channels_4096_tile_2048
softmax_2_cols_2_channels_4096_tile_1024
softmax_2_cols_2_channels_4096_tile_512
swigluNo metrics available. swiglu_decode_1x2048x2048
tanh_1_cols_1_channels_2048_tile_2048
tanh_2_cols_1_channels_2048_tile_1024
tanh_4_cols_1_channels_2048_tile_512
tanh_8_cols_1_channels_2048_tile_256
transpose_2048_M_64_N_1_cols_1_channels_64_m_64_n_8_s
transpose_2048_M_64_N_1_cols_2_channels_64_m_64_n_8_s
weighted_rms_norm_1_cols_2_channels_2048_weights_2048
weighted_rms_norm_2_cols_2_channels_2048_weights_1024
weighted_rms_norm_4_cols_2_channels_2048_weights_512
weighted_rms_norm_8_cols_2_channels_2048_weights_256
|
CMakeLists.txtto wrapping all operators in a Python AIE base operator.test.cpps totest.pys, aiming to factor out shared/previously duplicated code.operatorsdirectory, where each operator has a subdirectory with adesign.py(IRON code),op.py(operator wrapper/Python interface),reference.pyandtest.py.Add a "dry run" facility to the Python interface that allows printing the compilation commands for artifact generation. I added this primarily because it was useful to verify against the CMake compilation commands.Removed, commits were 0fa2fca, 350dd5f, 2250e79 and 69994ff.test.pys in CI.Remaining to do:
bf16in the Python operator interface -- needed for thedequantkernel.