Is it possible to read comments and chart titles? #478

Open
KfirAlfa opened this issue Nov 6, 2024 · 1 comment

KfirAlfa commented Nov 6, 2024

Hello,

Thanks for the awesome crate!!

I'm using calamine to parse XLSX files, and it works great. However, I couldn't find a way to extract comments and chart titles with calamine, so I had to parse them myself with quick-xml (see the code below).

@tafia Can this functionality be added to calamine (if it doesn't already exist)? I don't want to unzip the file twice.
I'd be happy to open a PR if you could point me to the places in the code that would need to change.

Thanks again for providing the crate!

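// Imports inferred from the calls below; `XmlReader` is assumed to alias quick-xml's `Reader`.
// `Comments`, `Result`, and `XlsxParseError` are the poster's own types and are not shown.
use quick_xml::{events::Event, Reader as XmlReader};
use std::io::{Read, Seek};
use zip::ZipArchive;

fn extract_comments(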
    archive: &mut ZipArchive<impl Read + Seek>,
) -> Result<Option<Comments>> {
    let comments_xmls: Vec<String> = archive
        .file_names()
        .filter(|name| name.starts_with("xl/comments"))
        .map(String::from)
        .collect();
    if comments_xmls.is_empty() {
        return Ok(None);
    }
    let mut comments = Comments {
        authors: None,
        comments: None,
    };
    for comment_file in comments_xmls {
        let mut file = archive.by_name(&comment_file)?;
        let mut contents = String::new();
        std::io::Read::read_to_string(&mut file, &mut contents)?;

        let mut reader = XmlReader::from_str(&contents);
        let mut buf = Vec::new();
        let mut in_authors = false;
        let mut in_author_text = false;
        let mut in_comments = false;
        let mut in_comment_text = false;

        loop {
            match reader.read_event_into(&mut buf) {
                Ok(Event::Start(e)) => match e.name().as_ref() {
                    b"authors" => in_authors = true,
                    b"author" => in_author_text = true,
                    b"comments" => in_comments = true,
                    b"comment" => in_comment_text = true,
                    _ => {}
                },
                Ok(Event::Text(e)) => {
                    if in_author_text && in_authors {
                        let author = e.unescape()?.into_owned();
                        comments.authors.get_or_insert(vec![]).push(author);
                    }
                    if in_comment_text && in_comments {
                        let comment = e.unescape()?.into_owned();
                        comments.comments.get_or_insert(vec![]).push(comment);
                    }
                }

                Ok(Event::End(ref e)) => match e.name().as_ref() {
                    b"authors" => in_authors = false,
                    b"author" => in_author_text = false,
                    b"comments" => in_comments = false,
                    b"comment" => in_comment_text = false,
                    _ => {}
                },
                Ok(Event::Eof) => break,
                Err(e) => return Err(XlsxParseError::Xml(e)),
                _ => {}
            }
            // Clear the buffer after every event so it is reused instead of growing.
            buf.clear();
        }
    }

    Ok(Some(comments))
}
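
The post mentions chart titles as well as comments, but only the comments path is shown. As a rough sketch of how both kinds of parts could be picked out in a single pass over the archive's entry names, the filter from `extract_comments` can be generalized as below. The names `PartKind`, `classify_part`, and `partition_parts` are hypothetical, and the `xl/charts/chart` prefix is an assumption based on the file names Excel conventionally writes; the format itself locates parts through relationship files, so a robust implementation would follow those instead.

```rust
/// Hypothetical classification of the archive entries of interest.
#[derive(Debug, PartialEq)]
enum PartKind {
    Comments,
    Chart,
}

/// Classify an entry by path prefix.
/// `xl/comments` is the prefix used in the post; `xl/charts/chart` is an
/// assumed convention, not something the file format guarantees.
fn classify_part(name: &str) -> Option<PartKind> {
    if name.starts_with("xl/comments") {
        Some(PartKind::Comments)
    } else if name.starts_with("xl/charts/chart") {
        Some(PartKind::Chart)
    } else {
        None
    }
}

/// Split entry names into (comment parts, chart parts) in one pass,
/// so the list of archive entries only has to be walked once.
fn partition_parts<'a>(
    names: impl IntoIterator<Item = &'a str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut comments = Vec::new();
    let mut charts = Vec::new();
    for name in names {
        match classify_part(name) {
            Some(PartKind::Comments) => comments.push(name),
            Some(PartKind::Chart) => charts.push(name),
            None => {}
        }
    }
    (comments, charts)
}
```

With the `zip` crate, the input would be `archive.file_names()`, exactly as in the function above; each returned name can then be opened with `archive.by_name(..)` and handed to the matching XML parser.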

tafia (Owner) commented Dec 18, 2024

Thanks for the code!
It can indeed be added. I expect to have more time next year; the end of the year is a bit tricky.
