Missing SYSRETs #13
Comments
Hi @shlomopongartz, how did you make the counts? Yes, if a syscall exit is missing, the corresponding stack will keep growing with each subsequent syscall entering the kernel.
I don't see why such a thing would happen. Let's investigate this issue; I want to make sure that the kernel code is behaving as it should. |
I just added counters in the if statement "if event.direction == SyscallDirection.exit:" and in the corresponding "else:" statement, and printed the balance. |
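The counting described above can be sketched roughly as follows. This is a minimal, hypothetical reconstruction: the `SyscallDirection` enum here is a stand-in defined locally for illustration, and `syscall_balance` is an assumed helper name, not part of the project's API.

```python
from collections import Counter
from enum import Enum


class SyscallDirection(Enum):
    # Local stand-in for the client library's direction enum (assumption).
    enter = 0
    exit = 1


def syscall_balance(directions):
    """Count enter/exit events and return the counts plus their difference."""
    counts = Counter(enter=0, exit=0)
    for direction in directions:
        if direction == SyscallDirection.exit:
            counts["exit"] += 1
        else:
            counts["enter"] += 1
    # A positive balance means some SYSCALLs had no matching SYSRET.
    return {
        "enter": counts["enter"],
        "exit": counts["exit"],
        "balance": counts["enter"] - counts["exit"],
    }
```

A balance that keeps growing over time is the symptom reported in this issue.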
Hi, |
I monitored the
So we have more If we take a look at
static int em_sysret(struct x86_emulate_ctxt *ctxt)
{
    const struct x86_emulate_ops *ops = ctxt->ops;
    struct desc_struct cs, ss;
    u64 msr_data, rcx;
    u16 cs_sel, ss_sel;
    u64 efer = 0;
    struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt);
    /* syscall is not available in real mode */
    if (ctxt->mode == X86EMUL_MODE_REAL ||
        ctxt->mode == X86EMUL_MODE_VM86)
        return emulate_ud(ctxt);
    if (!(em_syscall_is_enabled(ctxt)))
        return emulate_ud(ctxt);
    if(ctxt->ops->cpl(ctxt) != 0){
        return emulate_gp(ctxt,0);
    }
    //check if RCX is in canonical form
    rcx = reg_read(ctxt, VCPU_REGS_RCX);
    if(( (rcx & 0xFFFF800000000000) != 0xFFFF800000000000) &&
       ( (rcx & 0x00007FFFFFFFFFFF) != rcx)){
        return emulate_gp(ctxt,0);
    }
    ops->get_msr(ctxt, MSR_EFER, &efer);
    setup_syscalls_segments(ctxt, &cs, &ss);
    if (!(efer & EFER_SCE) && !nitro_is_trap_set(vcpu->kvm, NITRO_TRAP_SYSCALL))
        return emulate_ud(ctxt);
    ops->get_msr(ctxt, MSR_STAR, &msr_data);
    msr_data >>= 48;
    //setup code segment, atleast what is left to do.
    //setup_syscalls_segments does most of the work for us
    if (ctxt->mode == X86EMUL_MODE_PROT64){ //if longmode
        cs_sel = (u16)((msr_data + 0x10) | 0x3);
        cs.l = 1;
        cs.d = 0;
    }
    else{
        cs_sel = (u16)(msr_data | 0x3);
        cs.l = 0;
        cs.d = 1;
    }
    cs.dpl = 0x3;
    //setup stack segment, atleast what is left to do.
    //setup_syscalls_segments does most of the work for us
    ss_sel = (u16)((msr_data + 0x8) | 0x3);
    ss.dpl = 0x3;
    ops->set_segment(ctxt, cs_sel, &cs, 0, VCPU_SREG_CS);
    ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
    ctxt->eflags = (reg_read(ctxt, VCPU_REGS_R11) & 0x3c7fd7) | 0x2;
    ctxt->_eip = reg_read(ctxt, VCPU_REGS_RCX);
    if(nitro_is_trap_set(vcpu->kvm, NITRO_TRAP_SYSCALL)){
        vcpu->nitro.event.present = true;
        vcpu->nitro.event.type = SYSCALL;
        vcpu->nitro.event.direction = EXIT;
        kvm_arch_vcpu_ioctl_get_regs(vcpu, &(vcpu->nitro.event.regs));
        kvm_arch_vcpu_ioctl_get_sregs(vcpu, &(vcpu->nitro.event.sregs));
    }
    return X86EMUL_CONTINUE;
}
The emulation has 5 occasions to fail and return before setting the event as present, compared to
static int em_syscall(struct x86_emulate_ctxt *ctxt)
{
    const struct x86_emulate_ops *ops = ctxt->ops;
    struct desc_struct cs, ss;
    u64 msr_data;
    u16 cs_sel, ss_sel;
    u64 efer = 0;
    struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.emulate_ctxt);
    if(nitro_is_trap_set(vcpu->kvm, NITRO_TRAP_SYSCALL)){
        vcpu->nitro.event.present = true;
        vcpu->nitro.event.type = SYSCALL;
        vcpu->nitro.event.direction = ENTER;
        kvm_arch_vcpu_ioctl_get_regs(vcpu, &(vcpu->nitro.event.regs));
        kvm_arch_vcpu_ioctl_get_sregs(vcpu, &(vcpu->nitro.event.sregs));
    }
    /* syscall is not available in real mode */
    if (ctxt->mode == X86EMUL_MODE_REAL ||
        ctxt->mode == X86EMUL_MODE_VM86)
        return emulate_ud(ctxt);
    if (!(em_syscall_is_enabled(ctxt)))
        return emulate_ud(ctxt);
    ops->get_msr(ctxt, MSR_EFER, &efer);
    setup_syscalls_segments(ctxt, &cs, &ss);
    if (!(efer & EFER_SCE) && !nitro_is_trap_set(vcpu->kvm, NITRO_TRAP_SYSCALL))
        return emulate_ud(ctxt);
    ops->get_msr(ctxt, MSR_STAR, &msr_data);
    msr_data >>= 32;
    cs_sel = (u16)(msr_data & 0xfffc);
    ss_sel = (u16)(msr_data + 8);
    if (efer & EFER_LMA) {
        cs.d = 0;
        cs.l = 1;
    }
    ops->set_segment(ctxt, cs_sel, &cs, 0, VCPU_SREG_CS);
    ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
    *reg_write(ctxt, VCPU_REGS_RCX) = ctxt->_eip;
    if (efer & EFER_LMA) {
#ifdef CONFIG_X86_64
        *reg_write(ctxt, VCPU_REGS_R11) = ctxt->eflags;
        ops->get_msr(ctxt,
                     ctxt->mode == X86EMUL_MODE_PROT64 ?
                     MSR_LSTAR : MSR_CSTAR, &msr_data);
        ctxt->_eip = msr_data;
        ops->get_msr(ctxt, MSR_SYSCALL_MASK, &msr_data);
        ctxt->eflags &= ~msr_data;
        ctxt->eflags |= X86_EFLAGS_FIXED;
#endif
    } else {
        /* legacy mode */
        ops->get_msr(ctxt, MSR_STAR, &msr_data);
        ctxt->_eip = (u32)msr_data;
        ctxt->eflags &= ~(X86_EFLAGS_VM | X86_EFLAGS_IF);
    }
    return X86EMUL_CONTINUE;
}
I think the fix here is that we should move the code which configures the event from the beginning of the function to the end of I will try this solution and post the results! |
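One of the five early returns in em_sysret is the canonical-address check on RCX. As a rough illustration of what that check tests, here is a small Python sketch of the same condition, assuming 48-bit virtual addresses (bits 63..47 must be all zeros or all ones). The function name `is_canonical` is hypothetical and chosen only for this example.

```python
def is_canonical(addr: int) -> bool:
    """Return True if a 64-bit address is canonical for 48-bit virtual addressing."""
    addr &= (1 << 64) - 1      # treat the input as an unsigned 64-bit value
    upper = addr >> 47         # bits 63..47, i.e. 17 bits
    # Canonical means the upper 17 bits are a sign extension of bit 47:
    # either all clear (lower half) or all set (upper half).
    return upper == 0 or upper == 0x1FFFF
```

In the C code above, a non-canonical RCX leads to emulate_gp() being returned before the exit event is configured.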
Actually it's the reverse, we should move the code that configures the event in |
But if you put the code at the beginning of the sysret, then it runs before set_segment is called. |
So, if we want to monitor all syscall attempts (good or bad), we have to place our event at the beginning of the function, and if we want only the successful ones, place it at the end like you said. I implemented your solution on a new branch. The results are better, but we are still missing some
|
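The trade-off discussed above (an event set at the start of the handler records every attempt, an event set at the end records only successful emulations) can be modelled with a toy sketch. Everything here is hypothetical: `emulate` and its parameters are illustrative names and do not mirror the kernel code, which is C.

```python
def emulate(checks_pass, event_first):
    """Toy model of an emulation handler with early-return checks.

    checks_pass: iterable of booleans, one per validity check.
    event_first: if True, the event is recorded before any check runs;
                 otherwise it is recorded only after all checks pass.
    Returns (event_recorded, emulation_succeeded).
    """
    event_recorded = False
    if event_first:
        event_recorded = True
    for ok in checks_pass:
        if not ok:
            # Early return, like emulate_ud()/emulate_gp() in the handlers.
            return event_recorded, False
    if not event_first:
        event_recorded = True
    return event_recorded, True
```

With a failing check, `event_first=True` still reports an event while `event_first=False` does not, which is why mixing the two placements between the enter and exit handlers can unbalance the counts.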
So it seems that for some successful syscalls there are failed sysrets. |
The trust should also go for the syscall. |
If I place the event at the beginning:
{
    "enter": 64289,
    "exit": 64198
}
Almost the same results. |
This is odd. I wonder whether em_syscall and em_sysret are called the same number of times. |
Hi, I think that maybe some of the syscalls are not really syscalls. I used the value in the GS register, which holds the pointer to the Thread Information Block in 64-bit guests (FS in 32-bit guests), and tried to read it with vmi_read_addr_va. Sometimes the read failed, which I think may imply that it wasn't a real syscall. S.P. |
Hi, I just found out that all the missing SYSRETs belong to calls to NtContinue! S.P. |
Hi, I think I found why. After a bit of googling, I saw this comment:
We have to change our logic, and not systematically associate a |
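One possible way to change the pairing logic on the client side is sketched below. It assumes, based on the observation in this thread, that calls such as NtContinue never produce a matching SYSRET, so they are not pushed on the per-thread stack. The class name `SyscallTracker`, its methods, the `thread_id` key, and the `NON_RETURNING` set are all hypothetical and not taken from the project.

```python
from collections import defaultdict

# Assumption drawn from this thread: these syscalls have no matching SYSRET.
NON_RETURNING = frozenset({"NtContinue"})


class SyscallTracker:
    """Per-thread syscall stack that tolerates syscalls without a SYSRET."""

    def __init__(self, non_returning=NON_RETURNING):
        self.stacks = defaultdict(list)
        self.non_returning = set(non_returning)

    def on_enter(self, thread_id, name):
        # Do not wait for an exit that is not expected to come.
        if name in self.non_returning:
            return
        self.stacks[thread_id].append(name)

    def on_exit(self, thread_id):
        # Return the matching syscall name, or None for an unmatched exit.
        stack = self.stacks[thread_id]
        return stack.pop() if stack else None

    def depth(self, thread_id):
        return len(self.stacks[thread_id])
```

With this approach the stack depth no longer grows each time a non-returning syscall is seen, at the cost of maintaining a list of such syscalls by hand.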
Hi, Best regards S.P. |
Hi,
I was counting the number of SYSCALLs and SYSRETs, and I see that from time to time a SYSRET is missing, so I assume that the depth of the stack increases.
Is it possible that the kernel module sometimes fails to capture a SYSRET, or that it is ignored for some reason?