[CIR][MERGE-SPLIT-COMPILATION] Adds a compilation step that merges Device and Host CIR into a single module and can co-optimize their execution #2097
base: main
This commit introduces a new cc1 frontend action, -cir-combine, which combines separately generated host and device CIR modules into a single CIR container for downstream processing.

Key changes
* Add a new frontend program action frontend::CIRCombine and corresponding CIRCombineAction.
* Introduce typed cc1 options:
  * -cir-host-input <file>: specifies the host CIR module.
  * -cir-device-input <file>: specifies the device CIR module.
* Enforce strict argument validation in ParseFrontendArgs:
  * -cir-combine requires -fclangir.
  * Exactly one host and one device CIR input must be provided.
  * User-provided positional inputs are rejected.
* Extend FrontendOptions to store CIR host/device inputs.
* Ensure correct cc1 argument round-trip by regenerating -cir-combine, -cir-host-input, and -cir-device-input.
* Treat -cir-combine as a non-source action:
  * Suppress implicit stdin input and language (-x) emission.
  * Inject a synthetic precompiled input to drive frontend execution without parsing source.
* Wire the new action through frontend action creation so it executes correctly.

This lays the groundwork for split compilation workflows in ClangIR, allowing host and device CIR to be analyzed and transformed together before lowering, while preserving existing offload and bundling flows.
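The argument rules listed for -cir-combine can be sketched as a small standalone check. This is only an illustration under stated assumptions: `CombineArgs`, `validateCombineArgs`, the diagnostic strings, and the file names are hypothetical and are not the code in ParseFrontendArgs or the fields actually added to FrontendOptions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the CIR-combine state kept in FrontendOptions.
struct CombineArgs {
  bool ClangIR = false;                       // -fclangir was passed
  bool Combine = false;                       // -cir-combine was passed
  std::vector<std::string> HostInputs;        // one entry per -cir-host-input
  std::vector<std::string> DeviceInputs;      // one entry per -cir-device-input
  std::vector<std::string> PositionalInputs;  // user-provided positional files
};

// Returns a diagnostic message when the arguments break one of the rules,
// and std::nullopt when they are acceptable.
std::optional<std::string> validateCombineArgs(const CombineArgs &Args) {
  if (!Args.Combine)
    return std::nullopt; // the rules only apply to -cir-combine
  if (!Args.ClangIR)
    return std::string("-cir-combine requires -fclangir");
  if (Args.HostInputs.size() != 1)
    return std::string("expected exactly one -cir-host-input");
  if (Args.DeviceInputs.size() != 1)
    return std::string("expected exactly one -cir-device-input");
  if (!Args.PositionalInputs.empty())
    return std::string("positional inputs are not allowed with -cir-combine");
  return std::nullopt;
}

// Example: a well-formed invocation passes; dropping -fclangir is diagnosed.
void exampleCombineArgs() {
  CombineArgs Args;
  Args.ClangIR = true;
  Args.Combine = true;
  Args.HostInputs = {"host.cir"};      // hypothetical file name
  Args.DeviceInputs = {"device.cir"};  // hypothetical file name
  assert(!validateCombineArgs(Args).has_value());
  Args.ClangIR = false;
  assert(validateCombineArgs(Args).has_value());
}
```

Returning the first violated rule keeps the check easy to test in isolation; the real option parser would presumably report through the diagnostics engine instead.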
* Reads 2 CIR code files and creates a 'combined' region of these files
* Improves CLI arg checking
* Adds CLI test
@RiverDave and @bcardosolopes I will keep this PR open so that both of you can have an idea of the progress.
graph between actions. They do not properly carry the Input suffix and
their device.
* Notes for next sprint:
1. CombineCIRActions needs to look more like an offloadAction
* It will contain a Host Input and a Device Input (One of each)
* It needs to explicitly set Device and Host Kinds
2. SplitCIRAction needs to follow the CombineCIR paradigm.
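The first note above (one host input, one device input, kinds set explicitly) could be modeled roughly as follows. This is a standalone sketch, not the clang::driver API: `OffloadKind`, `CIRInput`, `CombineCIRModel`, and the file names are hypothetical names chosen for illustration. Carrying the arch on the device input is one possible way to avoid losing it between actions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical standalone model; none of these names are clang::driver classes.
enum class OffloadKind { Host, Device };

struct CIRInput {
  std::string File;  // path to a CIR module
  OffloadKind Kind;  // set explicitly, never inferred
  std::string Arch;  // e.g. "gfx942" for a device input, empty for host
};

// Holds exactly one host input and one device input, in that order.
class CombineCIRModel {
public:
  CombineCIRModel(CIRInput HostIn, CIRInput DeviceIn)
      : Host(std::move(HostIn)), Device(std::move(DeviceIn)) {
    if (Host.Kind != OffloadKind::Host)
      throw std::invalid_argument("first input must be a host CIR module");
    if (Device.Kind != OffloadKind::Device)
      throw std::invalid_argument("second input must be a device CIR module");
  }

  const CIRInput &host() const { return Host; }
  const CIRInput &device() const { return Device; }

private:
  CIRInput Host;
  CIRInput Device;
};

// Example: the device arch stays attached to the combined node.
void exampleCombineModel() {
  CIRInput Host{"host.cir", OffloadKind::Host, ""};
  CIRInput Device{"device.cir", OffloadKind::Device, "gfx942"};
  CombineCIRModel Combine(Host, Device);
  assert(Combine.device().Arch == "gfx942");
}
```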
Currently the clang driver action graph looks like this:
We are losing information regarding the device arch.
* There is still room for improvement, and for reusing more of the
existing base-class Action capabilities
Output looks like this:
+- 0: input, "vecadd.cu", hip, (host-hip)
+- 1: preprocessor, {0}, hip-cpp-output, (host-hip)
+- 2: compiler, {1}, cir, (host-hip)
| +- 3: input, "vecadd.cu", hip, (device-hip, gfx942)
| +- 4: preprocessor, {3}, hip-cpp-output, (device-hip, gfx942)
|- 5: compiler, {4}, cir, (device-hip, gfx942)
+- 6: comebinecir, "host-hip (x86_64-unknown-linux-gnu)" {2}, "device-hip (amdgcn-amd-amdhsa:gfx942)" {5}, cir
+- 7: splitcir, {6}, cir, (host-hip)
| +- 8: splitcir, {6}, cir, (device-hip, gfx942)
| +- 9: backend, {8}, assembler, (device-hip, gfx942)
| +- 10: assembler, {9}, object, (device-hip, gfx942)
| +- 11: linker, {10}, image, (device-hip, gfx942)
| +- 12: offload, "device-hip (amdgcn-amd-amdhsa:gfx942)" {11}, image
|- 13: linker, {12}, hip-fatbin, (device-hip)
+- 14: offload, "host-hip (x86_64-unknown-linux-gnu)" {7}, "device-hip (amdgcn-amd-amdhsa)" {13}, cir
+- 15: backend, {14}, assembler, (host-hip)
+- 16: assembler, {15}, object, (host-hip)
17: linker, {16}, image, (host-hip)
let summary = "Container for host and device CIR modules";
let description = [{
  `cir.offload.container` is a top-level container used to keep host and device
  CIR modules together for joint analysis and transformation.
I was originally thinking about something like:
module {
  cir.host {
    ...
  }
  cir.device {
    ...
  }
}
Any reason why you need the cir.offload.container wrapping both of them?
Based on our experience optimizing across libraries with the LLVM IR dialect, we had to flatten the namespace and put everything in the same module, while adding an extra attribute to indicate each symbol's origin so that we could split them back after optimizations. In our case, many problems came from symbol definitions being available only within another cir.library and MLIR not being able to properly handle symbol tables across them. Is this a problem for this approach?
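The flatten-then-split idea described in this comment can be illustrated with a minimal standalone model. It does not use MLIR at all: `Origin`, `TaggedSymbol`, `flatten`, and `split` are hypothetical names, symbols are reduced to plain strings, and the symbol names in the example are made up. Rejecting duplicate names is an assumption made to keep the sketch short; a real design would need some renaming or mangling scheme.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Origin { Host, Device };

struct TaggedSymbol {
  std::string Name;
  Origin From; // plays the role of the extra "symbol origin" attribute
};

using SymbolList = std::vector<std::string>;

// Put host and device symbols into one flat namespace, tagging each with its
// origin. Returns false and leaves Out empty if any name occurs twice.
bool flatten(const SymbolList &Host, const SymbolList &Device,
             std::vector<TaggedSymbol> &Out) {
  Out.clear();
  std::set<std::string> Seen;
  auto Add = [&](const SymbolList &Syms, Origin From) {
    for (const std::string &Name : Syms) {
      if (!Seen.insert(Name).second)
        return false; // name already present in the flat namespace
      Out.push_back({Name, From});
    }
    return true;
  };
  if (Add(Host, Origin::Host) && Add(Device, Origin::Device))
    return true;
  Out.clear();
  return false;
}

// Recover the two original lists from the origin tags.
std::pair<SymbolList, SymbolList> split(const std::vector<TaggedSymbol> &Flat) {
  std::pair<SymbolList, SymbolList> Result; // first = host, second = device
  for (const TaggedSymbol &Sym : Flat)
    (Sym.From == Origin::Host ? Result.first : Result.second)
        .push_back(Sym.Name);
  return Result;
}

// Example: flattening followed by splitting restores both sides.
void exampleRoundTrip() {
  SymbolList Host = {"main", "launch_stub"};  // made-up symbol names
  SymbolList Device = {"vecadd_kernel"};
  std::vector<TaggedSymbol> Flat;
  assert(flatten(Host, Device, Flat));
  std::pair<SymbolList, SymbolList> Parts = split(Flat);
  assert(Parts.first == Host);
  assert(Parts.second == Device);
}
```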
StaticLibJobClass,
BinaryAnalyzeJobClass,
BinaryTranslatorJobClass,
CIRCombineJobClass,
CIRCombineJobClass -> CIRCombineHostDeviceJobClass
This PR is a draft / working skeleton that documents my current progress on CIR host–device combine. The goal here is not to land this as-is, but to outline the individual components that need to change to enable the desired functionality.
Once I have an end-to-end pipeline working for PolyBench on AMD, I plan to split this into smaller, focused, reviewable PRs. Until then, this PR serves as a reference point for understanding the overall direction and moving pieces.
Current status
- GPUBinaryHandle
- ConstDataArrays
TY_CIR type that is intentionally non-mergeable with earlier Actions.
- This separation is required so we can hook the follow-up Action that consumes both host and device CIR.
Next steps