Fix VMware layer tag reading #429
Conversation
The VMware layer did not handle tags having a different data size. This fixes the majority of cases, since the tags needed for determining the memory regions are usually among the regular-sized tags.

Fixes #422
Looks good, I've just got a couple of questions (mostly about corner cases). Also, if you know of any way to tell whether a vmsn or vmss file is needed, that would be extremely handy so we can provide better error messaging... 5:)
```python
# TODO: Read special data sizes (signalling a longer data stream) properly instead of skipping them
if data_len in (62, 63):
    data_len = 4 if version == 0 else 8
offset += 2 + name_len + (indicies_len * index_len) + 2 * data_len
```
So this confuses me, since it looks like we'd be doing this twice? Can you walk me through what's going on here please?
I tried to clarify this a bit more in the next commit. Most tags have a `data_len` of 4 or 8, which are the regular cases and easy to handle. The layout looks as follows:

| flags | name_len | name | indices | data |
|---|---|---|---|---|
| 1 | 1 | name_len | 4 * indices_len | data_len |

Some tags, however, have a `data_len` of 62 or 63, which indicates a longer data stream. The layout is then as follows:

| flags | name_len | name | indices | data_size | data_mem_size | padding? | data |
|---|---|---|---|---|---|---|---|
| 1 | 1 | name_len | 4 * indices_len | 4 or 8 | 4 or 8 | 2 | data_size |

The 4 or 8 (which I set as `data_len`) is version dependent, so this is where the version comes in. A difference between `data_size` and `data_mem_size` should occur in cases where the data is compressed (`data_size < data_mem_size`).

I based this mostly on the Volatility 2.x code and the original parser by the same author.
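To make the two cases above concrete, here is a minimal sketch (in Python, since the layer itself is Python) of how such a tag could be walked. This is not the actual layer code: all names are illustrative, and the decoding of the flags byte (low six bits as `data_len`, top two bits as the number of indices) is an assumption carried over from the Volatility 2.x parser mentioned above.

```python
import struct

def read_tag(buf: bytes, offset: int, version: int, index_len: int = 4):
    """Walk one tag starting at `offset`; return (name, data_offset, data_size, next_tag_offset)."""
    flags = buf[offset]
    name_len = buf[offset + 1]
    data_len = flags & 0x3F              # assumed: low 6 bits of the flags byte
    indices_len = (flags >> 6) & 0x3     # assumed: top 2 bits of the flags byte
    name = buf[offset + 2:offset + 2 + name_len].decode("ascii", errors="replace")
    pos = offset + 2 + name_len + indices_len * index_len

    if data_len in (62, 63):
        # Irregular tag: data_size and data_mem_size fields (4 bytes each for
        # version 0, otherwise 8), 2 bytes of padding, then data_size bytes of data.
        field_len = 4 if version == 0 else 8
        fmt = "<I" if field_len == 4 else "<Q"
        data_size = struct.unpack_from(fmt, buf, pos)[0]
        data_mem_size = struct.unpack_from(fmt, buf, pos + field_len)[0]
        pos += 2 * field_len + 2
        # data_size < data_mem_size would indicate compressed data.
        return name, pos, data_size, pos + data_size

    # Regular tag: data_len bytes of data follow the indices directly.
    return name, pos, data_len, pos + data_len
```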
Fix magic headers and properly read irregular-sized tags
I've had a look into this and it seems to me to be as follows:
So, how to determine based on the
However, I am not sure whether
Ah, ok. I guess all the ancient vmware files I had didn't require the vmware layer? I may increase the logging level on vmware files that seem valid up until the region count, and then tell people they need the metadata file...
Not really, since the two cases in which the
Looks good, lemme know if you're happy for me to merge this into develop (we've changed the way we work with git, so now development happens on the develop branch, which won't go into the upcoming release, but will go into the next one)...
Happy to merge this one, so please go ahead 😃
Ah, that's a shame. 5:S I've just had some users trying to read vmem files and not understanding why they break because there's no great way of highlighting to the user that a required vmsn/vmss wasn't found. I guess we just include it in the documentation and point people there until it becomes well known...
Awesome, thanks very much!!! Really glad to see people starting to contribute, it's much appreciated! 5:D
I would try to find out if
Hmmm, I've got three vmem files that don't require a metadata file (seemingly?) that all start with those bytes. It seems the bytes appear at offsets 00000000, 0000000c, 0000006c in all of them. I've also got a vmem file with a vmss that doesn't start with those bytes, so I think that may just be what the OS decides to put in the zero page? 5:S Shame, but I guess there's no way of differentiating a raw file from a vmem one without vmems having some structure (and it's probably best they don't, qemu captures are extremely slow to interact with)...
Hello @cstation, I am running VMware Workstation 16.1.0, trying to analyze some dumps coming from different Ubuntus (14.04, 16.04, 18.04, 20.04) but getting
I created the right symbols for each kernel; all dumps are reporting the
Hi @garanews! VMware sometimes stores memory in several regions in a
If so, and you still encounter errors, then the symbols could be incompatible. Did you create the symbols for the Ubuntu 14, 16 and 18 using the matching OS for each of them (so Ubuntu 14 symbols are generated on an Ubuntu 14 installation, etc)?
Hi @cstation ,
I will be happy to share with you the dumps and the symbols if you have time to help me investigate.
No problem! If you could upload them somewhere and share a link, I will have a look.
Give me an email address, I already have them on my gdrive.
👍, you can send it to [email protected]!
@cstation done!
Seems like it isn't a VMware-related issue, but an issue with the symbols. Output of
Output of
Did you generate the symbols for Ubuntu 18 on the Ubuntu 18 VM, or on the Ubuntu 20 VM? You need to generate the symbol files on a set-up identical to the one you want to analyse (same OS, same OS-version, same kernel-version).
I generated all symbols on the Ubuntu 20.04 VM: @ikelos told me that the OS you are running dwarf2json on doesn't matter :)
Ok, so to avoid some confusion:
I hope that clarifies the situation somewhat? @garanews, you need to provide
Do you think that letting vol.py be less strict in this check is an option?
@garanews, you did not use the debug kernel and system.map that match the OS you're targeting exactly, since you fed an Ubuntu 20 kernel and System.map to dwarf2json, whilst you want to analyse an Ubuntu 18 system. The kernel and system.map files seem to be OS-dependent, so installing a certain kernel-version on Ubuntu 20 produces different files than installing the same kernel on Ubuntu 18. When both your OS-version and kernel-version match, the timestamp will always be the same. So, you should generate the ISF for Ubuntu 18 on an Ubuntu 18 VM. You can use the timestamp to verify if you did things right, not to find the right kernel. As long as OS-version and kernel-version exactly match the target system, the timestamp will be the same.
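Since the check Volatility 3 performs is effectively an exact match on the kernel banner (which contains the build timestamp discussed above), here is a small, hypothetical helper (not part of Volatility) that prints the banner recorded in an uncompressed ISF file, so it can be compared by eye with `/proc/version` on the target. The JSON path used here (`symbols["linux_banner"]["constant_data"]`, base64-encoded) is an assumption based on current dwarf2json output and should be verified against your own files.

```python
import base64
import json
import sys

def isf_banner(isf_path: str) -> str:
    """Return the Linux banner string stored in a plain-JSON ISF file."""
    with open(isf_path, "r", encoding="utf-8") as fh:
        isf = json.load(fh)
    raw = base64.b64decode(isf["symbols"]["linux_banner"]["constant_data"])
    # The stored banner usually ends in a newline and/or NUL byte.
    return raw.rstrip(b"\x00\n").decode(errors="replace")

if __name__ == "__main__":
    print(isf_banner(sys.argv[1]))
```

If the printed string differs in any detail (kernel version, build user, compiler, build date) from the banner of the dumped system, Volatility will not select that ISF for the image.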
I'm afraid not. Since kernels can be compiled with different options creating different structures, even a minor difference in the build string could mean a major difference in the structures available. Less accurate checking of the kernel would be forensically less sound, and when an issue does arise, it would likely take up developer time simply to reach the conclusion that the profile wasn't close enough. We don't really have data to tell us how often these structures change, so I'm willing to change my mind in the future if someone can find data that supports it having a negligible impact, but until then accuracy over convenience is definitely the course we want to take.
@cstation I know I need to pass exactly what I want to analyze; in fact on my Ubuntu 20.04 (with kernel 5.8.0) I did for example
when trying to match the memory dumped from an Ubuntu 18.04 that is running kernel 5.0.0-23.
@ikelos this way, generating symbols without having the system that is going to be analyzed up and running will be very hard. For this issue (which is related to VMware) I will follow what @cstation suggested and try to generate the symbols directly on the machine that I dumped, so that the vmlinuz and System.map are the real ones running where the memory was dumped, and let's see if it will work (skipping the error "VMware VMEM is not split into regions"). But this is not what I planned, because it is just a coincidence that I have both the system and the memory to be analyzed; normally I will have only the memory...
@garanews: the easiest way in such a case is to set up a new VM with a matching OS-version, and install the right kernel-dbgsym (e.g. for Ubuntu follow https://gist.github.com/NLKNguyen/2fd920e2a50fd4b9701f and replace the kernel version with the one of the target system). You could try to omit using the same OS-version by passing a custom banner value to dwarf2json (volatilityfoundation/dwarf2json#12), but chances are very high that the resulting Volatility output will be incorrect / corrupted.
@garanews I understand that, and I'm sorry that it's difficult. Unfortunately not every memory image can be analyzed, and people analyzing Windows images are in the same boat, dependent upon Microsoft publishing the debugging information. In a similar way, a lot of the large distributions create debug packages for their kernels. There may be a lot of them and they may not all be archived, but they usually exist/existed at some point. You can download them and process them on any system just by knowing their URLs (see https://github.com/volatilityfoundation/volatility3/blob/develop/development/stock-linux-json.py for code that can help do this). As an open source tool, you're entirely welcome to modify your copy to somehow accept the profile you've created, and I hope this returns you valid and useful results, but we can't support this within the official tool for the reasons [I've mentioned](#429 (comment)).