Fix VMware layer tag reading #429

Merged: 2 commits into develop (Jan 21, 2021)

Conversation

cstation (Contributor)

The VMware layer did not handle tags with a non-standard data size. This fixes the majority of cases, since the tags needed for determining the memory regions are usually regular-sized.

Fixes #422

cstation marked this pull request as draft (January 19, 2021, 21:19)

ikelos (Member) left a comment

Looks good, I've just got a couple of questions (mostly about corner cases). Also, if you know of any way to tell whether a vmsn or vmss file is needed, that would be extremely handy so we can provide better error messaging... 5:)

# TODO: Read special data sizes (signalling a longer data stream) properly instead of skipping them
if data_len in (62, 63):
    data_len = 4 if version == 0 else 8
    offset += 2 + name_len + (indicies_len * index_len) + 2 * data_len

ikelos (Member)

So this confuses me, since it looks like we'd be doing this twice? Can you walk me through what's going on here please?

cstation (Contributor, Author) commented Jan 21, 2021

I tried to clarify this a bit more in the next commit. Most tags have a data_len of 4 or 8, which are the regular cases and easy to handle. The layout looks as follows:

field          flags   name_len   name       indices           data
size (bytes)   1       1          name_len   4 * indices_len   data_len

Some tags, however, have a data_len of 62 or 63, which indicates a longer data stream. The layout is then as follows:

field          flags   name_len   name       indices           data_size   data_mem_size   padding?   data
size (bytes)   1       1          name_len   4 * indices_len   4 or 8      4 or 8          2          data_size

The 4 or 8 (which I set as data_len) is version-dependent, so this is where the version comes in. A difference between data_size and data_mem_size should occur in cases where the data is compressed (data_size < data_mem_size).

I based this mostly on the Volatility 2.x code and the original parser by the same author.
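
Putting the two layouts together, a minimal sketch of the parsing in Python (illustrative only, not the actual volatility3 code; the flags-byte decoding, low 6 bits for the data size and top 2 bits for the index count, is an assumption based on the Volatility 2.x parser mentioned above):

import struct

def read_tag(buf: bytes, offset: int, version: int):
    """Parse one tag starting at `offset`; return (name, data, next_offset)."""
    flags = buf[offset]
    name_len = buf[offset + 1]
    data_len = flags & 0x3F                  # 62 and 63 signal an extended data stream
    indices_len = (flags >> 6) & 0x3
    pos = offset + 2
    name = buf[pos:pos + name_len].decode("ascii", errors="replace")
    pos += name_len + 4 * indices_len        # skip the 4-byte indices

    if data_len in (62, 63):
        # Extended tag: data_size and data_mem_size (4 bytes each in version 0,
        # 8 bytes otherwise) plus 2 padding bytes precede the payload.
        fmt, size_len = ("<I", 4) if version == 0 else ("<Q", 8)
        (data_size,) = struct.unpack_from(fmt, buf, pos)
        # data_mem_size sits at pos + size_len; it differs from data_size
        # when the payload is compressed (data_size < data_mem_size)
        pos += 2 * size_len + 2
        data = buf[pos:pos + data_size]
        pos += data_size
    else:
        data = buf[pos:pos + data_len]
        pos += data_len

    return name, data, pos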

Review thread on volatility/framework/layers/vmware.py (outdated, resolved)
Fix magic headers and properly read irregular-sized tags

cstation (Contributor, Author)

> Looks good, I've just got a couple of questions (mostly about corner cases). Also, if you know of any way to tell whether a vmsn or vmss file is needed, that would be extremely handy so we can provide better error messaging... 5:)

I've had a look into this, and it seems to work as follows:

  • When a vmem does not have any regions, it is a plain representation of the memory at the moment of the snapshot. It does not require any action from the VMware layer. (This is already handled in the code: an exception is raised when 0 regions are encountered, causing Volatility to continue without the VMware layer.) So it does not require the presence of a vmsn or vmss file.

  • In cases where the vmem is divided into regions, a vmsn or vmss is needed to determine where those regions are located in the vmem (see the sketch below).
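
A rough sketch of what that region lookup amounts to (illustrative only; the tuple layout is my own simplification, the real logic lives in volatility3's VMware layer):

from typing import List, Tuple

# One entry per region from the vmsn/vmss metadata:
# (physical_start, vmem_file_offset, length)
Region = Tuple[int, int, int]

def vmem_position(phys_addr: int, regions: List[Region]) -> int:
    """Map a physical address to the offset in the vmem file that backs it."""
    for phys_start, file_offset, length in regions:
        if phys_start <= phys_addr < phys_start + length:
            return file_offset + (phys_addr - phys_start)
    raise ValueError(f"address {phys_addr:#x} is not backed by any region")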

So, how can one determine, based on the vmem alone, whether it is divided into regions or not?

  • I've looked into a few cases where the vmem was divided into regions, and all vmems started with the following bytes: 53 ff 00 f0 53 ff 00 f0.
  • I encountered two occasions in which the vmem was not divided into regions; in both cases the vmem started with 00 00 00 00 00 00 00 00. This seems logical to me, since the chance that the memory is filled at the very beginning isn't that big.

However, I am not sure whether 53 ff 00 f0 53 ff 00 f0 are proper magic bytes, since I only tested this on my system. One could check for the absence of 00 00 00 00 00 00 00 00 to determine whether a vmsn or vmss is needed, but that seems a bit tricky to me.

cstation marked this pull request as ready for review (January 21, 2021, 09:52)

ikelos (Member) commented Jan 21, 2021

Ah, ok. I guess all the ancient vmware files I had didn't require the vmware layer? I may increase the logging level on vmware files that seem valid up until the region count, and then tell people they need the metadata file...

ikelos changed the base branch from master to develop (January 21, 2021, 10:09)

cstation (Contributor, Author) commented Jan 21, 2021

> Ah, ok. I guess all the ancient vmware files I had didn't require the vmware layer? I may increase the logging level on vmware files that seem valid up until the region count, and then tell people they need the metadata file...

Not really, since the two cases in which the vmem was not divided into regions were created using the latest version of VMware. Moreover, the region-count field is located in the vmsn or vmss, not in the vmem, so that wouldn't work.

ikelos (Member) commented Jan 21, 2021

Looks good, lemme know if you're happy for me to merge this into develop (we've changed the way we work with git, so development now happens on the develop branch, which won't go into the upcoming release, but will go into the next one)...

cstation (Contributor, Author)

Happy to merge this one, so please go ahead 😃

ikelos (Member) commented Jan 21, 2021

> Not really, since the two cases in which the vmem was not divided into regions were created using the latest version of VMware. Moreover, the region-count field is located in the vmsn or vmss, not in the vmem, so that wouldn't work.

Ah, that's a shame. 5:S I've just had some users trying to read vmem files and not understanding why they break because there's no great way of highlighting to the user that a required vmsn/vmss wasn't found. I guess we just include it in the documentation and point people there until it becomes well known...

ikelos merged commit ae87259 into volatilityfoundation:develop (January 21, 2021)

ikelos (Member) commented Jan 21, 2021

Awesome, thanks very much!!! Really glad to see people starting to contribute, it's much appreciated! 5:D

cstation (Contributor, Author)

> Ah, that's a shame. 5:S I've just had some users trying to read vmem files and not understanding why they break because there's no great way of highlighting to the user that a required vmsn/vmss wasn't found. I guess we just include it in the documentation and point people there until it becomes well known...

I would try to find out whether 53 ff 00 f0 53 ff 00 f0 is indeed a magic value indicating that a vmem needs a metadata file. This seems probable, since it is always the case on my system, but some extra testing by others to substantiate this would be good.

ikelos (Member) commented Jan 21, 2021

Hmmm, I've got three vmem files that don't require a metadata file (seemingly?) that all start with those bytes. The bytes appear at offsets 00000000, 0000000c, and 0000006c in all of them. I've also got a vmem file with a vmss that doesn't start with those bytes, so I think that may just be what the OS decides to put in the zero page? 5:S Shame, but I guess there's no way of differentiating a raw file from a vmem one without vmems having some structure (and it's probably best they don't; qemu captures are extremely slow to interact with)...

garanews (Contributor) commented Feb 22, 2021

Hello @cstation, I am running VMware Workstation 16.1.0, trying to analyze some dumps coming from different Ubuntu versions (14.04, 16.04, 18.04, 20.04), but I'm getting VMware VMEM is not split into regions.
All dumps start with

53ff 00f0 53ff 00f0 c3e2 00f0 53ff 00f0

I created the right symbols for each kernel. All dumps report the VMware VMEM is not split into regions error, but with Ubuntu 20.04 I still get plugin results (for example pslist), while the other dumps give:

Unsatisfied requirement plugins.PsList.primary: Memory layer for the kernel
Unsatisfied requirement plugins.PsList.vmlinux: Linux kernel symbols

garanews (Contributor)

[screenshot]

cstation (Contributor, Author)

Hi @garanews!

VMware sometimes stores memory in several regions in a vmem file. The accompanying vmsn or vmss file contains information on what these regions look like (start, size, etc.). The message VMware VMEM is not split into regions could indicate two things:

  1. The memory is not split into several regions; the vmem is a plain representation of the memory at that time.
  2. The related vmsn or vmss file is not in the same folder as the vmem, or has a different name, so Volatility is not able to read what the regions look like. Could you check whether the vmsn or vmss files are in the same location and have the same base name as the vmem (see the sketch below)?
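
For example, a quick standalone check (not part of volatility3; names are illustrative):

from pathlib import Path
from typing import Optional

def find_metadata(vmem_path: str) -> Optional[Path]:
    """Look for a vmsn/vmss file with the same base name next to the vmem,
    which is what the VMware layer expects to find."""
    base = Path(vmem_path)
    for suffix in (".vmss", ".vmsn"):
        candidate = base.with_suffix(suffix)
        if candidate.exists():
            return candidate
    return None

# e.g. find_metadata("/dumps/machine.vmem") -> Path("/dumps/machine.vmss") or None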

If so, and you still encounter errors, the symbols could be incompatible. Did you create the symbols for Ubuntu 14, 16 and 18 using the matching OS for each of them (so Ubuntu 14 symbols generated on an Ubuntu 14 installation, etc.)?

garanews (Contributor)

Hi @cstation,
The vmem and vmsn are in the same folder; I am running Volatility passing the path of the vmem file. The vmem and vmsn have the same name.

[screenshot]

I generated the symbols on another machine using dwarf2json.

I would be happy to share the dumps and the symbols with you if you have time to help me investigate. Thanks!

cstation (Contributor, Author)

No problem! If you could upload them somewhere and share a link, I will have a look.

garanews (Contributor)

Give me an email address; I already have them on my Google Drive.

cstation (Contributor, Author) commented Feb 25, 2021

👍, you can send it to [email protected]!

garanews (Contributor)

@cstation done!

cstation (Contributor, Author) commented Feb 26, 2021

Seems like it isn't a VMware-related issue, but an issue with the symbols.

Output of vol.py -f {Ubuntu18xxxx}.vmem banner (symbols required for analyzing this vmem):

Linux version 5.0.0-23-generic (buildd@lgw01-amd64-030) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #24~18.04.1-Ubuntu SMP Mon Jul 29 16:12:28 UTC 2019 (Ubuntu 5.0.0-23.24~18.04.1-generic 5.0.15)

Output of vol.py isfinfo (symbols known to volatility):

Linux version 5.0.0-23-generic (buildd@lcy01-amd64-017) (gcc version 8.3.0 (Ubuntu 8.3.0-6ubuntu1)) #24-Ubuntu SMP Mon Jul 29 15:36:44 UTC 2019 (Ubuntu 5.0.0-23.24-generic 5.0.15)

Did you generate the symbols for Ubuntu 18 on the Ubuntu 18 VM, or on the Ubuntu 20 VM? You need to generate the symbol files on an identical set-up to the one you want to analyse (same OS, same OS version, same kernel version).

garanews (Contributor)

I generated all symbols on an Ubuntu 20.04 VM: @ikelos told me that the OS you run dwarf2json on doesn't matter :)

ikelos (Member) commented Feb 27, 2021

Ok, so to avoid some confusion:

  • dwarf2json can be run on any operating system
  • dwarf2json will generate specific JSON files based on the files you hand to it (the debug kernel and System.map)
  • the debug kernel and System.map must match the OS you're targeting exactly
  • the banner and isfinfo output that cstation provided show the same Linux kernel version, but compiled with a different version of gcc at different times of day; thus it is not an exact match and won't be found by volatility

I hope that clarifies the situation somewhat? @garanews, you need to provide dwarf2json with a debug kernel whose banner matches the output precisely.

garanews (Contributor)

  1. done
  2. done
  3. done
  4. ehm, it will be a bit challenging to find exactly the same kernel compiled with the same gcc on the same day of this epoch...

Do you think letting vol.py be less strict in this check would be an option?

cstation (Contributor, Author) commented Feb 27, 2021

@garanews, you did not use the debug kernel and System.map that match the OS you're targeting exactly, since you fed an Ubuntu 20 kernel and System.map to dwarf2json, whilst you want to analyse an Ubuntu 18 system.

The kernel and System.map files seem to be OS-dependent, so installing a certain kernel version on Ubuntu 20 produces different files than installing the same kernel on Ubuntu 18. When both your OS version and kernel version match, the timestamp will always be the same. So you should generate the ISF for Ubuntu 18 on an Ubuntu 18 VM.

You can use the timestamp to verify that you did things right, not to find the right kernel. As long as the OS version and kernel version exactly match the target system, the timestamp will be the same.

ikelos (Member) commented Feb 27, 2021

I'm afraid not. Since kernels can be compiled with different options creating different structures, even a minor difference in the build string could mean a major difference in the structures available.

Less accurate checking of the kernel would be forensically less sound, and when an issue did arise, it would likely take up developer time simply to reach the conclusion that the profile wasn't close enough. We don't really have data telling us how often these structures change, so I'm willing to change my mind in the future if someone can find data showing a negligible impact, but until then, accuracy over convenience is definitely the course we want to take.

garanews (Contributor)

@cstation I know I need to pass exactly what I want to analyze; in fact, on my Ubuntu 20.04 (with kernel 5.8.0) I did, for example:

./dwarf2json linux --elf /home/gara/Documents/5.0.0-23/usr/lib/debug/boot/vmlinux-5.0.0-23-generic --system-map /home/gara/Documents/5.0.0-23/System.map-5.0.0-23-generic | xz -c > 5.0.0-23-both.json.xz

when trying to match the memory dumped from an Ubuntu 18.04 system that is running kernel 5.0.0-23.

garanews (Contributor)

@ikelos generating symbols this way, without having the system that is going to be analyzed up and running, will be very hard.
It is already not easy to find the exact vmlinux available online, but if the compiler and compile date also need to be the same...

For this issue (which is related to VMware) I will follow what @cstation suggested and try to generate the symbols directly on the machine that I dumped; that way the vmlinuz and System.map are the real ones running where the memory was dumped. Let's see if that works (getting past the VMware VMEM is not split into regions error).

But this is not what I planned, because it is just a coincidence that I have both the system and the memory to be analyzed; normally I will only have the memory...

cstation (Contributor, Author)

@garanews: the easiest way in such cases is to set up a new VM with a matching OS version and install the right kernel dbgsym (e.g. for Ubuntu, follow https://gist.github.com/NLKNguyen/2fd920e2a50fd4b9701f and replace $(uname -r) with the desired kernel). After this, you run dwarf2json on that VM. This should always leave you with the right kernel version / timestamp. You could write a script to automate this (e.g. https://github.com/karmatr0n/dwarf2json-centos7 or a custom bash script).

You could try to avoid using the same OS version by passing a custom banner value to dwarf2json (volatilityfoundation/dwarf2json#12), but the chances are very high that the resulting Volatility output will be incorrect / corrupted.

ikelos (Member) commented Feb 27, 2021

@garanews I understand that, and I'm sorry that it's difficult. Unfortunately, not every memory image can be analyzed, and people analyzing Windows images are in the same boat, dependent on Microsoft publishing the debugging information.

In a similar way, a lot of the large distributions create debug packages for their kernels. There may be a lot of them and they may not all be archived, but they usually exist/existed at some point. You can download them and process them on any system just by knowing their URLs (see https://github.com/volatilityfoundation/volatility3/blob/develop/development/stock-linux-json.py for code that can help do this).

As an open source tool, you're entirely welcome to modify your copy to somehow accept the profile you've created, and I hope this returns you valid and useful results, but we can't support this within the official tool for the reasons I've mentioned above (#429 (comment)).
