ioctl ECREATE error -5 when running on virtual machine #955
Comments
|
Actually I have tried two different distributions, CentOS 8 and Ubuntu 20.04. On the CentOS one, I built and installed Linux kernel 5.18.15 with the SGX feature turned on; the Ubuntu one runs kernel 5.15.0. Both of them have |
Hm. Can you try to install Intel SGX SDK and run some examples from there? For example, https://github.com/intel/linux-sgx/tree/master/SampleCode/SampleEnclave. I currently don't understand what's going wrong with your machine. |
Thanks for your reply. I tried running some examples from there and they seem to work fine. For SampleEnclave,
When it comes to mitigation, |
Interesting. Error This seems to only happen on exceptions during the ECREATE instruction. Out of all the possible reasons for exceptions, I can only see this one happening: May I ask which Intel CPU you are using? What is the output of |
The output about the CPU in the VM is as follows:
The CPU info of the host machine is almost the same, but with more cores. However, Gramine runs perfectly well on the host. |
@Zebartin Could you also show the output of |
I wonder if @mythi can provide any insights on running a VM that supports Intel SGX. I currently don't understand what is going wrong with EENTER. |
The output of
|
I haven't seen any problems with it. @Zebartin which version tag of qemu did you use from that Intel repository? I guess it's worth pointing out that upstream Qemu has supported SGX since 6.2. I've only used Qemu 6.2+ from the distros (e.g., Ubuntu 22.04 has it). Do you see anything strange in the guest's |
I'm in agreement with Mikko... I haven't seen any issues using SGX in a QEMU 6.2+ guest VM. Not sure what's going on here. One question I didn't see asked... What is the host OS distro and kernel version? I assume it has the SGX kernel module and the /dev/sgx_vepc device? |
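The device-node question above can be checked with a few lines of Python. This is only a sketch: the helper name `missing_nodes` is made up for illustration, and the node lists assume the in-kernel SGX driver (`/dev/sgx_enclave`, `/dev/sgx_provision`) plus a host kernel built with SGX virtualization (`/dev/sgx_vepc`).

```python
import os

# Assumption: in-kernel SGX driver naming. /dev/sgx_vepc is only expected on a
# host whose kernel was built with SGX virtualization support.
HOST_NODES = ("/dev/sgx_enclave", "/dev/sgx_provision", "/dev/sgx_vepc")
GUEST_NODES = ("/dev/sgx_enclave", "/dev/sgx_provision")


def missing_nodes(nodes, exists=os.path.exists):
    """Return the device nodes from `nodes` that are not present.

    `exists` is injectable so the check can be exercised without SGX hardware.
    """
    return [node for node in nodes if not exists(node)]


if __name__ == "__main__":
    # Run on the host with HOST_NODES, inside the VM with GUEST_NODES.
    for label, nodes in (("host", HOST_NODES), ("guest", GUEST_NODES)):
        print(label, "missing:", missing_nodes(nodes))
```

An empty list for the matching role only shows that the nodes exist; it says nothing about whether ECREATE will succeed.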
I am aware of that. I am using Qemu built from the official gitlab repository, with options I can not figure out anything strange in
|
The host OS distro is CentOS 8. The kernel version is 5.18 and was built with SGX and SGX virtualization enabled, so there is Maybe I should try reinstalling my Qemu... I did see others like you using Qemu with SGX normally, so I am confused now. It also strikes me that I had to modify some of Qemu's code in order to make it run, according to this. Did you do so? |
If you want to use libvirt with qemu, yes there are some changes that have to be made like this. That part is still being worked on. But, if you use qemu 6.2+ only and directly (no libvirt), I have personally verified this works. |
I've also used a manually built Qemu and everything worked as expected. I did not have to modify anything SGX-related to make it work. Update: I used qemu directly without libvirt |
AFAIK, it was libvirt that was failing. My setup is using vanilla Ubuntu 22.04 but with libvirt 8.6.0 installed from Kinetic repo. |
Maybe a silly question, but have you tried "make clean" then "make SGX=1" in the VM? If the .token file has some flag mismatch, it will also cause error -5. |
It turns out that Qemu 7.1.0 is the problem. I tried 6.2.0 as @boryspoplawski suggested, and also tried 7.0.0; both work well with Gramine. I also tried integrating Qemu 7.0.0 with libvirt, and it works fine. The ioctl error -5 comes up only when I use Qemu 7.1.0, built either from the official download page or from the latest version of the git repo. I cannot figure out whether it is Qemu's fault or not. Thank you all for your suggestions and guidance! |
Indeed. With Qemu 7.1 on an ICX, Gramine works only when I remove this line:
If either avx or avx512 bit is set in the token, gramine will fail with error -5. But with Qemu 6.2 and the same HW + image, it works fine. I'm not familiar with Qemu and don't know how to fix it. This is the command line I used to launch Qemu 7.1: ./qemu-system-x86_64 -enable-kvm -cpu host,+sgx-provisionkey -object memory-backend-epc,id=mem1,size=8G,prealloc=on -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0 -smp 8 -m 16384 -drive ... -netdev ...
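To see which features a given XFRM value (as stored in the token) actually requests, the bits can be decoded by hand. The sketch below assumes the standard XCR0 bit layout, which SECS.ATTRIBUTES.XFRM shares; the function name `decode_xfrm` is illustrative and not part of Gramine.

```python
# XCR0 / XFRM state-component bits (standard x86 layout).
XFRM_BITS = {
    0: "x87",
    1: "SSE",
    2: "AVX",
    3: "MPX BNDREGS",
    4: "MPX BNDCSR",
    5: "AVX-512 opmask",
    6: "AVX-512 ZMM_Hi256",
    7: "AVX-512 Hi16_ZMM",
    9: "PKRU",
    17: "AMX TILECFG",
    18: "AMX TILEDATA",
}


def decode_xfrm(xfrm):
    """Return the names of the features whose bits are set in `xfrm`."""
    names = [name for bit, name in sorted(XFRM_BITS.items()) if xfrm & (1 << bit)]
    known_mask = sum(1 << bit for bit in XFRM_BITS)
    unknown = xfrm & ~known_mask
    if unknown:
        # Bits this table does not name are reported rather than dropped.
        names.append(f"unknown bits {unknown:#x}")
    return names


if __name__ == "__main__":
    print(decode_xfrm(0xE7))  # x87, SSE, AVX and the three AVX-512 components
```

A value of `0x3` (x87 and SSE only) corresponds to the case described above where neither the avx nor the avx512 bit is set.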
|
Looks like the issue is solved on the Gramine side (solution: do not use the latest QEMU v7.1). It would be interesting to debug why QEMU started failing, and how this is related to AVX/AVX512 |
would this also be an additional argument for not forcing
But I agree it'd make sense to understand what's going on. We could submit an issue to https://github.com/intel/qemu-sgx |
I think that the fact that a change in |
No. We do create a dummy token: gramine/python/graminelibos/sgx_get_token.py Lines 148 to 149 in d5599d5
But the attributes (SECS.ATTRIBUTES.XFRM) are unconditionally taken from the host system's available CPU features: gramine/python/graminelibos/sgx_get_token.py Line 123 in d5599d5
So both EPID and DCAP populate the |
I know we create it, but I thought that "dummy" means that we don't use it later (except to just preserve the interface). I don't see how we use it on DCAP, maybe only to initialize starting attributes of an enclave? Even so, why do they differ from the ones we'd choose when taking them from the host? |
Yes, I will try to contact them. |
This was also my question and I was referring to #363. |
Not true. We use the "dummy" token (
Yes, you're exactly correct. Even if we use DCAP, we still generate a
Not sure I understand this question. Since the Of course, we could move the logic of determining the "starting attributes of the enclave" to our Gramine untrusted-PAL startup C code. Currently this logic is in the Python code (I gave links above). When we discussed #363, I mentioned it somewhere, that the only thing we should "lift" from the dummy token generated by Python is this "what are the starting attributes" logic (that queries |
So, isn't this @Zebartin's bug here, not qemu's/gramine's? The token should be generated on the machine where you run the enclave, not where you build it. And I think he's generating it on the host, and then uses it inside a VM (which is a different machine, technically). I agree that moving this logic to startup code could make this case easier, but this may be complicated: we'd need separate logic for EPID, where we have to take the attributes from the token. If this doesn't end up super complex then we could do it, but it's hard for me to say whether that's the case. |
Oh, is this how @Zebartin does it? This wasn't clear to me from the issue description. I thought that all the files are generated inside the VM, i.e. the whole Gramine testing/building/tweaking happens inside a VM at all times. But if it's indeed that @Zebartin generates the |
Well, the only things I did were all from Quick start part of the docs. I know neither what the It seems that @lejunzhu reproduced the same result, and I believe that @lejunzhu would not make the same mistakes as mine, if there is any. |
Not really. Even when I generate the token file on the same VM, the issue still happens. There are two different findings here:
|
Wow, this is clearly a bug. Isn't it affecting not only SGX, but any VM on any machine? |
In this QEMU 7.1 situation, it is most likely a bug.
|
A patch has been sent to the Qemu community to fix this issue. @lejunzhu also verified this issue in his environment, thanks for reporting it to me!
@lejunzhu We hit this problem again, on QEMU 8.0.
How do SDK apps "learn" about XCR0? I didn't find any way to get XCR0 values from Linux. |
It uses XGETBV instruction. I think this is the function that does it: https://github.com/intel/linux-sgx/blob/master/psw/urts/se_detect.cpp#L97 |
Thanks @lejunzhu. I created two PRs that align with the SGX SDK style (which is the correct way):
Could you maybe also do a (light) review of these PRs? Does the logic seem right to you? |
#1403 looks correct. I have only one comment: in case this issue happens again in the future, should we print a warning message when CPU and OS report things differently? I'm not familiar with how XSAVE is used in Gramine, so I have no idea about #1402. |
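The warning suggested above could look roughly like the following. This is a hypothetical sketch, not the logic of #1403: `choose_xfrm` and its parameters are invented names, and the two inputs are assumed to have been read elsewhere (the XFRM mask from CPUID leaf 0x12 subleaf 1, XCR0 via XGETBV in native code).

```python
# SGX requires the x87 and SSE bits (bits 0 and 1) to be set in XFRM.
XFRM_LEGACY = 0x3


def choose_xfrm(cpuid_xfrm_mask, xcr0):
    """Pick an enclave XFRM and collect notes when CPU and OS disagree.

    cpuid_xfrm_mask: XFRM bits the CPU reports as allowed for enclaves.
    xcr0:            state components the OS has enabled.
    Returns (xfrm, notes); `notes` is empty when both sources agree.
    """
    notes = []
    only_cpuid = cpuid_xfrm_mask & ~xcr0
    only_os = xcr0 & ~cpuid_xfrm_mask
    if only_cpuid:
        notes.append(f"CPUID allows XFRM bits {only_cpuid:#x} that XCR0 does not enable")
    if only_os:
        notes.append(f"XCR0 enables bits {only_os:#x} that CPUID does not allow in XFRM")
    # Use only what both sides agree on, plus the mandatory legacy bits.
    xfrm = (cpuid_xfrm_mask & xcr0) | XFRM_LEGACY
    return xfrm, notes


if __name__ == "__main__":
    xfrm, notes = choose_xfrm(0xE7, 0x7)
    print(hex(xfrm))
    for note in notes:
        print("warning:", note)
```

Taking the intersection is one conservative choice among several; the point of the sketch is only that a disagreement between the two sources is cheap to detect and report.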
Just for info. The patch that fixes the XCR0 and CPUID.12.1 issue was merged in QEMU in April 2023 and appears in QEMU starting from v8.0.1. See for details:
|
Description of the problem
I tried running Gramine on a virtual machine but failed to run the helloworld example.
Steps to reproduce
Steps 1 to 3 are based on this article, though part of it is somewhat outdated.
virt-manager
and install Ubuntu 20.04 in it.
Expected results
Actual results
Gramine commit hash
Gramine was installed via Ubuntu's apt: