.lemminx-maven directory double the size needed for dependencies #545

Open
Kiiv opened this issue Jan 15, 2024 · 9 comments

@Kiiv

Kiiv commented Jan 15, 2024

#497 introduced a second Maven repository to avoid polluting the main repository with non-existent dependencies (originally reported in #213).

Unfortunately, both repositories grow in parallel, taking twice the space that should be needed. When working with a lot of dependencies and projects, this is an important issue. My Maven repository takes almost 40 GB, and .lemminx-maven takes approximately the same size.

If I understand correctly, Maven and lemminx-maven now use different repositories to resolve dependencies. This approach seems strange to me because it breaks the logic and optimization of having one place where the same dependency is shared between different builds.

It really sounds like a workaround to me rather than a clean solution.
I don't understand why some directories are created when resolving dependencies, but I'm probably missing some implementation details here.

@angelozerr
Contributor

I don't understand why some directories are created when resolving dependencies, but I'm probably missing some implementation details here.

I did that because I didn't want to break the existing behavior, which resolves dependencies as you type in the XML editor in order to validate them. When dependencies are resolved, Maven Aether downloads some metadata files (for each character you type in the artifact coordinates), and we have to avoid polluting the real Maven repository with those Aether metadata files.

It seems some users (like you) are not happy with this behavior, so I think a safe solution would be to provide a setting to enable/disable resolving dependencies as you type.

@mickaelistria
Contributor

we have to avoid polluting the real Maven repository with those Aether metadata files.

Can't we find a better way to achieve that, such as adding a cleanup action to lemminx-maven that removes the useless extra files it may create?

It seems some users (like you) are not happy with this behavior, so I think a safe solution would be to provide a setting to enable/disable resolving dependencies as you type.

This is not good, as it would drastically reduce the usability of lemminx-maven: some features would no longer be available while typing. We don't want to reduce features here; we just want to avoid duplicated repos or tons of useless files.
A DirectoryWatch on the m2 repo could be a good way to clean up extra files generated by lemminx-maven without losing functionality or duplicating the m2 repo.
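A cleanup pass of the kind suggested above could look like the minimal Java sketch below. It assumes the noise consists of resolver `*.lastUpdated` marker files (left behind when a lookup fails) plus the directories that become empty once those markers are gone; `RepoNoiseCleaner` and `clean` are hypothetical names, not lemminx-maven API. A `java.nio.file.WatchService` (the DirectoryWatch idea) could trigger this pass on change events instead of running it on demand.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical cleanup helper; not part of lemminx-maven.
final class RepoNoiseCleaner {

    // Assumption: failed lookups leave "*.lastUpdated" marker files behind.
    static boolean isNoise(Path file) {
        return file.getFileName().toString().endsWith(".lastUpdated");
    }

    /** Deletes marker files under repoRoot, then prunes directories left empty. Returns the deleted files. */
    static List<Path> clean(Path repoRoot) throws IOException {
        List<Path> deleted = new ArrayList<>();
        Files.walkFileTree(repoRoot, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                if (isNoise(file)) {
                    Files.delete(file);
                    deleted.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // Children are already processed here; never delete the repository root itself
                if (!dir.equals(repoRoot)) {
                    boolean empty;
                    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                        empty = !entries.iterator().hasNext();
                    }
                    if (empty) {
                        Files.delete(dir);
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return deleted;
    }
}
```

One trade-off of this design: deleting a `.lastUpdated` marker for an artifact that really exists but failed to download only makes Maven retry the download, so the pass errs on the side of removing too much rather than too little.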

@laeubi
Contributor

laeubi commented Feb 13, 2024

@mickaelistria I wonder why lemminx is actually downloading things; maybe it would be better to use the resolver API directly?

For example look at

https://github.com/apache/maven-resolver/blob/master/maven-resolver-demos/maven-resolver-demo-snippets/src/main/java/org/apache/maven/resolver/examples/FindAvailableVersions.java

This operates on metadata only, and it might even be possible to specify a (temporary) working directory that gets cleaned up afterwards. Maybe @cstamas can give better advice on whether it is possible to not persist data to disk at all and keep it in memory instead.

Another issue I see with lemminx is that it even tries to resolve invalid GAVs, e.g. ones that contain (unresolved) variables or even < ...
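Rejecting such coordinates before they ever reach the resolver would avoid those pointless lookups. Below is a minimal Java sketch. It assumes a GAV is worth resolving only when groupId, artifactId and version are all non-empty and contain no unresolved `${...}` placeholder, no `<`/`>` residue from half-typed XML, no `:` and no whitespace. `GavFilter` and `isResolvable` are hypothetical names, and this rule is an illustration, not Maven's own validation.

```java
import java.util.regex.Pattern;

// Hypothetical pre-check run before handing a GAV to the resolver.
final class GavFilter {
    // Characters treated as never valid in a resolvable coordinate segment (an assumption):
    // '$', '{', '}' (unresolved property), '<', '>' (XML residue), ':' and whitespace.
    private static final Pattern ILLEGAL = Pattern.compile("[${}<>:\\s]");

    static boolean isValidSegment(String segment) {
        return segment != null && !segment.isEmpty() && !ILLEGAL.matcher(segment).find();
    }

    static boolean isResolvable(String groupId, String artifactId, String version) {
        return isValidSegment(groupId) && isValidSegment(artifactId) && isValidSegment(version);
    }
}
```

Version ranges such as `[1.0,2.0)` still pass, because brackets, parentheses and commas are not in the rejected set.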

Just looking at my disk, the .lemminx cache already takes up 3 GB of data, which is really quite a lot... if it now also doubles the size of the local repository, that is way too much...

@cstamas

cstamas commented Feb 13, 2024

From the discussion above it is unclear to me: are artifacts downloaded as well? Both parties mention "When dependencies are resolved, Maven Aether downloads some metadata files", and @laeubi also points out that "only metadata is needed".

I took a cursory look at the sources, and while not everything is clear to me yet, I do see a huge overlap in functionality with MIMA; I'm unsure why that is not used.

Another issue I spotted is the use of deprecated classes from Maven (especially the maven-artifact classes for layout), which will clearly break if a Maven 3.9 user uses something like "split local repository" and so on (see https://issues.apache.org/jira/browse/MNG-7706).

@mickaelistria
Contributor

I wonder why lemminx is actually downloading things; maybe it would be better to use the resolver API directly?

IIRC, it has used the resolver API since the beginning, but then some people complained that searching for artifacts was creating "noise" (mostly Aether cache marker files) in the .m2 repository for GAVs that do not resolve to anything. So some other contributors decided to take a whole different approach in lemminx-maven: use a totally separate local repository that people wouldn't look at, where lemminx-maven can mess things up (at the cost of duplication).

and it might even be possible to specify a (temporary) working directory that gets cleaned up afterwards

That is the interesting bit no one has considered for lemminx-maven, as far as I know. If this can be made to work for lemminx-maven, it would be a good solution to the duplicated repo contents.
Note that the real problem was the cache folders/files created by the Maven resolver in the repo. If we can have the resolver keep that cache in memory or in some disposable directory, that would solve everything.
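The "disposable directory" half of that idea can be sketched in plain Java: give each lookup a throwaway directory and delete it recursively when the lookup ends. This is a minimal sketch that assumes the resolver can be pointed at an arbitrary directory for its cache files. `DisposableWorkDir` is a hypothetical helper, and wiring it into a resolver session is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a temporary directory that is deleted recursively on close().
final class DisposableWorkDir implements AutoCloseable {
    private final Path dir;

    DisposableWorkDir(String prefix) throws IOException {
        this.dir = Files.createTempDirectory(prefix);
    }

    Path path() {
        return dir;
    }

    @Override
    public void close() throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parents, so directories are empty when deleted
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Each lookup would run inside `try (DisposableWorkDir work = new DisposableWorkDir("lookup-")) { ... }`, so the directory disappears even if the lookup throws. Whether the resolver then writes its marker files only there, and not also into the real repository, is what would still need to be verified.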

@mickaelistria
Contributor

I took a cursory look at the sources, and while not everything is clear to me yet, I do see a huge overlap in functionality with MIMA; I'm unsure why that is not used.

I see the first commit of MIMA was on Mar 28, 2023, while the first commit on this repo is 3 years older (and the project is actually even older, as it was incubating in another Git repo before joining Eclipse). So this code predates MIMA and had to work without it for a long time, and it happened to work well enough that there was no need to adopt another technology.

@cstamas

cstamas commented Feb 13, 2024

Yes, I realized while typing the comment that this is much older than MIMA. Anyway, the overlap is still there and is problematic: settings are not decrypted, system properties are collected incorrectly (where are the environment variables?), legacy APIs are used, MavenProject is used instead of Model plus some context, the whole project (all of its dependencies) is resolved when that is really not needed, etc.

Will try to fix these step by step once I get there...

@cstamas

cstamas commented Feb 14, 2024

As a start, I created a PR for Resolver: apache/maven-resolver#430
Once polished and merged, this will make it possible to run Resolver against any NIO2 FileSystem, for example Google JIMFS, as the demo shows. Of course, this project will still need to get rid of actual artifact downloading, but baby steps...

@cstamas

cstamas commented Jul 2, 2024

apache/maven-resolver#526

The PR above uses Resolver 2 and Maven 4 classes with Google JIMFS and runs all the "demo" snippets on it. Resolver 2.0.0 is up for a vote, while Maven 4.0.0-beta-3 is out.

5 participants