
Large 834 file parse #3

Open
Lucier82 opened this issue Jun 5, 2020 · 5 comments

Comments

@Lucier82

Lucier82 commented Jun 5, 2020

I'm not even sure whether this project is still supported, but I recently started using this parser in my application and have found parsing large files to be time-consuming and possibly error-prone. In my case, a 1GB 834 EDI file causes a heap space error on this line of code: this.x12 = (X12) parser.parse(this.content); where this.content is a String containing the file data. I know this is probably the worst possible approach; however, I'd like to know whether parsing from an InputStream or a File object has the same memory consumption, or whether those methods use buffered readers to keep heap usage lower.
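On the buffered-reader question: a BufferedReader only lowers peak memory if the code consuming it processes the input incrementally; if the whole input is accumulated before parsing, heap use still grows with the file size. Below is a minimal sketch of incremental reading, assuming a single-character segment terminator such as '~' (in X12 the actual terminator is declared in the ISA segment). SegmentStreamer is a hypothetical helper for illustration, not part of this parser's API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper, not part of the parser's API: yields X12 segments one at a time
 * so only the current segment is held in memory, regardless of file size.
 * Assumes a single-character segment terminator such as '~'.
 */
final class SegmentStreamer implements AutoCloseable {
    private final BufferedReader reader;
    private final char terminator;

    SegmentStreamer(Reader source, char terminator) {
        // BufferedReader pulls from the source in chunks rather than one char per underlying read
        this.reader = new BufferedReader(source);
        this.terminator = terminator;
    }

    /** Returns the next segment with surrounding whitespace trimmed, or null at end of input. */
    String nextSegment() throws IOException {
        StringBuilder segment = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == terminator) {
                return segment.toString().trim();
            }
            segment.append((char) c);
        }
        // End of input: return any trailing, unterminated text, or null if there is none
        String rest = segment.toString().trim();
        return rest.isEmpty() ? null : rest;
    }

    @Override
    public void close() throws IOException {
        // Closing the BufferedReader also closes the underlying source
        reader.close();
    }
}
```

For a file on disk, the source could be Files.newBufferedReader(path) opened in a try-with-resources block, with a loop calling nextSegment() until it returns null. Building the parser's X12 object model from those segments would still need the parser's own logic, which this sketch does not attempt.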

@hzbarcea
Owner

hzbarcea commented Jun 5, 2020

Great that you're using this parser, @Lucier82, many thanks! There have been virtually no contributions in a long time, mainly because it's stable and parsing standard formats is not an evolving field. On top of that, EDI is increasingly being replaced by XML and JSON.

Since this.content is a String, the entire file is held in memory at once. Using an InputStream should decrease memory use significantly. Reducing the scope of this.content, so that it can be garbage collected as soon as possible, should also help. The JVM is usually pretty good at optimizing GC. And nowadays it's pretty common for production machines running memory-intensive applications to have over 100GB of RAM.
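The advice above combines two ideas: hand the parser a stream instead of a String, and keep any file-sized data in a local scope so it becomes unreachable as soon as parsing returns. A minimal sketch of that pattern; StreamParser and parseFile are hypothetical names standing in for whichever parse(InputStream) overload is used, not this library's API.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ScopedParse {
    /** Hypothetical stand-in for a parse(InputStream) call; not the library's own type. */
    @FunctionalInterface
    interface StreamParser<T> {
        T parse(InputStream in) throws IOException;
    }

    /**
     * Opens the file, hands the stream to the parser, and returns only the parsed result.
     * No String copy of the file is stored in a field, so nothing file-sized outlives this call.
     */
    static <T> T parseFile(Path path, StreamParser<T> parser) throws IOException {
        // try-with-resources closes the stream even if parsing throws
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return parser.parse(in);
        }
    }
}
```

One caveat on the design: a stream only lowers peak memory if the code consuming it processes input incrementally. A parser that first copies the whole stream into a String will use roughly as much heap as passing the String directly.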

I hope this helps.

@Lucier82
Author

Lucier82 commented Jun 5, 2020

I work in the health care industry, and as far as I can tell, EDI isn't going anywhere any time soon (for me at least; I can't speak for other industries and companies). I appreciate this parser, which I commonly refer to as the "Peanut Butter" parser because for a while I couldn't remember what pb stood for. Anyways... I am trying all the parse() implementations today to see if any of them work for my larger ~1GB file. I didn't like using the String, but it was an easy crutch to fall back on during development and has worked fine until now.

Thank you for the quick feedback; I wasn't certain anyone would respond, given the age of the project. I will update this issue when I reach some sort of resolution.

@hzbarcea
Owner

hzbarcea commented Jun 5, 2020

@Lucier82 Peanut Butter sounds good, and yummy 😃. If you have a patch, I can release an update (not a problem) so you can keep getting your dependencies from Maven Central.

@Lucier82
Author

Update: I am using this project through Maven Central, which is nice because I didn't have to import and manage the jar manually.
I'm modifying a few pieces of my application that were also using a decent amount of memory, and from what I can tell, even when I call the parser as (X12) parser.parse(Files.newInputStream(path)); I'm still running into a heap space error. I'm running on a Windows 10 64-bit system with 32GB of RAM, and while watching the application's resource consumption in Task Manager, the process was using 5,701.8MB of memory before Java threw the heap space error. I will try again with a different implementation of parse() and post a new update after that.
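Task Manager reports the memory of the whole Java process, which includes more than the heap, and the heap limit the JVM enforces is not the same as the 32GB of installed RAM: unless -Xmx is set, the JVM picks a default maximum heap as a fraction of physical memory (commonly one quarter on current JDKs). A small diagnostic sketch for printing the limit the application is actually running with; the HeapInfo class is hypothetical.

```java
final class HeapInfo {
    /** Formats a byte count as whole mebibytes for printing. */
    static String toMiB(long bytes) {
        return (bytes / (1024 * 1024)) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap this JVM will attempt to use (reflects -Xmx or the default)
        System.out.println("max heap:   " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved; freeMemory(): the unused part of it
        System.out.println("total heap: " + toMiB(rt.totalMemory()));
        System.out.println("used heap:  " + toMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it with, for example, java -Xmx12g HeapInfo should report the larger limit. Whether a larger heap is enough for a 1GB file depends on how the parser represents that file in memory.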

@Lucier82
Author

So I tried again with the parse(File) implementation and got the same results, with memory allocation still reaching up to 5,600MB. I've made some changes to my process: I run a SHA hash on the EDI file, and before I cleaned that up I was using a StringBuilder to capture my byte stream. Now I'm not keeping anything more than I have to, so the only other cause would be the X12 parser holding on to objects it no longer needs. So I got curious, cloned the project locally to really dig into it, and found that the BufferedReader in the parse(InputStream) function does not appear to be closed, and that the StringBuilder is assigned to a String variable that's not used past the call to parse(String).
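For the hashing step mentioned above, the digest can be fed in fixed-size chunks so the file content never has to be collected in a StringBuilder. The comment doesn't say which SHA variant is used, so SHA-256 below is an assumption, and StreamingHash is a hypothetical class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingHash {
    /** Hashes a file in 64 KiB chunks; memory use is bounded by the buffer, not the file size. */
    static String sha256Hex(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256"); // assumed algorithm
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Only the bytes actually read in this pass are added to the digest
                digest.update(buffer, 0, n);
            }
        }
        return toHex(digest.digest());
    }

    /** Lowercase hex encoding, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

If the same stream is also being handed to the parser, java.security.DigestInputStream can wrap it so the hash is computed as a side effect of that single read instead of reading the file twice.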
