Apex failing with error -110 (No /dev/apex_0) #343
Comments
@dakota we don't actually support virtual machines, |
Works fine on the host system with Ubuntu installed. So definitely something that ESXi is doing (or not doing). Any suggestions on where to look? Or do I just scratch the whole idea :/ |
Can corroborate the same results here. Same Coral and same adapter. Probably many people are looking into this since the USB versions are difficult to find. Could you shed some light on what the apex error messages mean with respect to resource management? I'm sure with enough hints we can figure out how to tweak ESXi. |
Same error messages here with ESXi 7.0 Update 1 and Debian 10 on a Supermicro SYS-E300-8D. |
Also trying to work through this. I got this far but as mentioned in OP I don't have /dev/apex_0 with the same dmesg you have above.
Out of curiosity, are you guys running this? |
Someone mentioned, either in this thread or a different one, that ESXi was displaying the Coral as Global Unichip Corp. via one PCI-USB adapter, but when using a different adapter it showed as Google, and then it worked.
So this may be:
- totally irrelevant
- specific to usb vs pci/m2
- or a clue that the apex driver is looking for a specific identifier, not seeing it, and failing to load
Is there a way to hack this identifier in ESXi? Can we change the driver to not look for an exact string?
Grasping for anything...
On Thursday, 22 April 2021, KillahB33 wrote:
lspci -nn | grep 089a
0b:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
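On the identifier question: as far as I know, the vendor name that lspci prints is looked up from the pci.ids database, while Linux PCI drivers normally bind on the numeric vendor:device pair in the trailing brackets. A minimal Python sketch, assuming the apex driver follows that usual pattern, that pulls the numeric pair out of an `lspci -nn` line so it can be compared between a working host and the VM (the helper name `pci_ids` is made up for illustration):

```python
import re

# Matches the trailing "[vendor:device]" pair that `lspci -nn` appends to a line,
# optionally followed by a "(rev xx)" suffix.
_ID_RE = re.compile(
    r"\[([0-9a-f]{4}):([0-9a-f]{4})\]\s*(?:\(rev [0-9a-f]+\))?\s*$",
    re.IGNORECASE,
)


def pci_ids(lspci_nn_line):
    """Return (vendor_id, device_id) as lowercase hex strings, or None if absent."""
    match = _ID_RE.search(lspci_nn_line.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


# The line quoted in this thread.
line = "0b:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]"
print(pci_ids(line))  # ('1ac1', '089a')
```

If the numeric pair is the same in both environments, the displayed vendor string is unlikely to be what stops the driver from loading.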
|
Also grasping. I spent the morning looking at what people were doing to get GPU passthrough to work, hoping there would be some overlap. Do you have any info on the adapter? I would be curious to see what his device looks like in ESXi because mine is just showing as |
Would be a good thing to know. I have had other things that didn't work on ESXi but passed through fine, so this seems odd. I tried a few things today with no luck. I updated passthru.map with device info |
I also tested a fresh install with UEFI instead of legacy BIOS.
Additionally setting |
Hm, interesting. Well, I ordered a USB3 card and an enclosure, will update if that works. |
Anyone have any luck? I'm stuck at the same spot. Using the Ableconn adapter, I'm able to pass the PCI device through and see it in the Ubuntu VM, but the apex device isn't created. |
I couldn't get it to work and have instead resorted to a small standalone computer with the Coral in it instead of a VM. |
Which did you go with? I also did the same, went with a Jetson Nano |
An old AMD Athlon 5150 based computer that I had lying around. Busy shopping around for a Dell Optiplex SFF (or similar) with a more modern CPU. |
I don't think this is it since it displays exactly the same on my standalone AMD Athlon based system. |
I fixed my issues... first comment on this thread. Running on an HP ProLiant Gen8, Proxmox passing through to Ubuntu, but I had it working with HassOS as well. I was not successful with ESXi and migrated hypervisors because of it. There was also some HP-specific bug I had to resolve. Getting this working was a deep, dark rabbit hole, but it's working great now with Frigate. Here's the HP-specific bit: |
I also just found this guide https://www.reddit.com/r/Proxmox/comments/n34f8q/proxmox_vm_ubuntu_2004_frigate_2x_google_coral_tpu/ It seems like the root of the issue is that ESXi's implementation of MSI-X is broken, or at least causes issues with the Apex driver. |
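For anyone wanting to check the MSI-X angle from inside the guest, here is a rough Python sketch that reads the MSI-X capability line out of `lspci -vvv` text. It only reports what the guest sees and is not a fix. The sample line is illustrative and not taken from anyone's output in this thread, and `msix_status` is a made-up helper name.

```python
import re

# `lspci -vvv` prints the capability as e.g. "MSI-X: Enable+ Count=128 Masked-".
_MSIX_RE = re.compile(r"MSI-X:\s+Enable([+-])\s+Count=(\d+)")


def msix_status(lspci_vvv_text):
    """Return {'enabled': bool, 'vectors': int} for the first MSI-X capability, or None."""
    match = _MSIX_RE.search(lspci_vvv_text)
    if match is None:
        return None
    return {"enabled": match.group(1) == "+", "vectors": int(match.group(2))}


# Illustrative input only; paste the real `lspci -vvv` block for the Coral instead.
sample = "\tCapabilities: [d0] MSI-X: Enable- Count=128 Masked-\n"
print(msix_status(sample))
```

Comparing the result on bare metal against the result inside the VM would at least show whether MSI-X ends up enabled for the device in both cases.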
@jayburkard I am actually just looking at migrating as well because of all of this. My Jetson Nano isn't working nearly as well as I thought it would, so I would rather use it for something specific to its use case and have this in my main Docker setup. Will probably do the migration this weekend. Luckily only two VMs for me so it shouldn't take too long. |
Same here, 2 VMs (had 3 but migrated HassOS to run supervised in ubuntu - every time I provisioned >1TB for it hassos would crash/not boot, known bug with no resolution). I think you can absolutely get the hardware working on proxmox. I had to dig to find all that but now that it's working it's been flawless and if you have anything but a proliant g8 you should have way fewer steps :) I wanted ESXi to keep my skills up for industry but I get plenty of experience at work and really like proxmox. |
I am also experiencing the same issue. Has there been any movement on getting the issue resolved on either the VMware side or Coral device? |
I am also experiencing the same issue. Trying to get it to work with VMware ESXI 8.0 |
Ah crap. I'm another +1 on this. Shame I didn't see this post before I found the error |
Yet another +1. Did not see this before I bought a few M.2 Corals. Would really like to get this running on my servers. Migration is not an option due to size. |
In reading this post about the USB version of this device https://williamlam.com/2023/05/google-coral-usb-edge-tpu-accelerator-on-esxi.html |
@smallsam possibly ... If someone can get me the device, happy to take a look when I get some time |
@k1n6b0b sure. I don't think GH allows DM but you can use our HQ address with attention to William Lam 3401 Hillview Ave. Palo Alto, CA 94304 |
Darn. Wish I would have found this before I bought two of these m.2 TPU devices. Still had time to cancel my backorder for two more thankfully. |
@lamw did you get hold of one of these devices yet? If not I can get one to you fairly quickly (& don't need it back). |
@chris20 No, I never received the device |
@lamw Ok, I ordered one via Amazon (though it’s a third-party seller so the delivery won’t be from them). In the delivery instructions I said “mailroom”; if it should go to front office or reception please let me know and I’ll amend the order. You should have it on Wed or Thu this week. (I’d have sent you one of the M.2s I had here but I’m in Australia so this seemed easier :) |
Thanks. The address is our main shipping address, no need for instructions as long as you've got my full name as the receiver. JFYI - I've been heads down preparing for our upcoming conference, so won't be able to do anything until after that at the earliest |
@chris20 just want to ACK that I've received the M.2 TPU |
🙌🙌🙌 apologies I never sent my TPU, it's still deep in the supermicro. Awesome to see someone else was able to contribute one. 🤞 you find the resolution! |
Really hope the PCIe card can be made to work, but I'm going to pass through an M.2 ASMedia USB3 card instead to run the USB Coral device. Passing just the device via ESXi proved to be unreliable. Lots of USB bus errors after running Frigate for a while, and it would take down the entire VM and make ESXi unresponsive until I physically unplugged the USB Coral device from the server. |
I didn’t believe everything I read, but the cable was my problem. You need an excellent-quality cable, and the ones from Amazon did not work. When I used the one from my M.2-to-USB enclosure, I never had any more problems with ESXi. I also tried a native port and then added a PCI card; both failed because of the cable :-(
|
@lamw |
@Real-Taz Sorry, for some reason I thought I had responded but it looks like it was another GH thread. Please see blakeblackshear/frigate#3604 (comment). Ultimately, the fix needs to come from the Google/Coral team; I filed a bug back in Aug, google-coral/libedgetpu#48, with no response :( |
Hi, I had the same problem on ESXi.
First I tried the USB Coral. I somehow managed to work around all the issues on ESXi, but after a few weeks it got unstable and crashed and the whole setup needed to be repeated, then after a few days the same again. So I ordered the mini PCIe Coral variant and tried that on a higher-end PC via a mini PCIe to PCIe adapter card, but with no luck. I was able to pass it through to the VM, but in the VM the drivers wouldn't load, as in the first comment in this thread. Then I also tried it on a mini PC (this one https://www.aliexpress.com/item/1005004848553416.html) because it has a direct mini PCIe slot, but I ended up at the same place as with the higher-end PC, with the same issue. So I ordered a new 128GB NVMe drive to try Proxmox instead of ESXi (I didn't want to purge the old 128GB one in case this turned out to be the same dead end) BUT! It seems I got beyond this problem. :) I will update you guys if I manage to get it working with Frigate. Didn't get that far yet 😉 |
I've been using an M.2 Accelerator B+M key on Proxmox for months now. The only difference from @jakubsuchybio is that I'm not using a VM, but LXC. I never tried a VM, but in case it doesn't work there is a fallback scenario. |
I have read about the LXC way, but as I'm more familiar with VMs, I started this way. Great to know that LXC works fine 👍 |
The issue is when using PCI pass through to a virtual machine. |
The only problem I'm having with LXC is that with every kernel update of Proxmox I have to reinstall the kernel headers, because the driver is on the host (as you stated). I wish I had never started with ESXi, I will never switch back! |
I jumped ship from ESXi to Proxmox and it's working for me too. |
The only problem is that I have an Intel N5105, which has some instruction bug, and the VM in Proxmox is freezing. More about it here: https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/page-1 Nothing I tried from that thread helped, so I'm stuck with this freezing and it is not usable for my case. |
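A small Python sketch of a check for the headers problem, assuming the usual Linux layout where an out-of-tree module build needs `/lib/modules/<release>/build` to resolve to the installed headers. The function name and the `modules_root` parameter are made up for illustration; it could be run after a kernel update to see whether the headers for the running kernel are present before the driver is rebuilt.

```python
import os
from pathlib import Path


def headers_installed(release=None, modules_root="/lib/modules"):
    """Return True if the build tree for the given kernel release is present.

    `release` defaults to the running kernel (same value as `uname -r`).
    Path.exists() follows symlinks, so a dangling `build` link counts as missing.
    """
    release = release or os.uname().release
    return (Path(modules_root) / release / "build").exists()


if __name__ == "__main__":
    kernel = os.uname().release
    state = "present" if headers_installed() else "missing"
    print(f"kernel headers for {kernel}: {state}")
```

This only detects the situation; installing the matching headers package and rebuilding the driver still has to be done separately.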
Seems my decision to move from ESXi to proxmox came just in time https://kb.vmware.com/s/article/2107518?lang=en_US |
I'm using the mini PCIe version with this Ableconn adapter that has multiple reports of working. It is in a Dell PowerEdge T420 running ESXi, and is being passed through to an Ubuntu 20.04 VM. I followed the official guide to install the drivers.
The /dev/apex_0 device is however not showing. Any ideas (bunch of debug info below)?
lscpu
uname -a
dmesg | grep apex
lspci
lspci -vvv
modinfo gasket
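For reference, on Linux errno 110 is ETIMEDOUT, and kernel drivers conventionally report failures as negative errno values, so "error -110" most likely means some operation timed out. A minimal Python sketch that decodes such a code and lists any apex device nodes (both function names are made up for illustration):

```python
import errno
import os


def describe_kernel_errno(code):
    """Map a (usually negative) kernel error code to (symbolic name, message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


def apex_nodes(dev_dir="/dev"):
    """Return the sorted names of apex_* entries found in dev_dir."""
    return sorted(n for n in os.listdir(dev_dir) if n.startswith("apex_"))


if __name__ == "__main__":
    print(describe_kernel_errno(-110))
    if os.path.isdir("/dev"):
        # An empty list means the driver did not create /dev/apex_0.
        print(apex_nodes())
```

The errno numbering is platform-specific, so the decoded name is only meaningful when this runs on the same kind of system (Linux) that produced the dmesg line.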