New tool: xlsx2tsv #76

Merged 12 commits on Dec 16, 2024
8 changes: 8 additions & 0 deletions tools/wrangling/.shed.yml
@@ -0,0 +1,8 @@
categories:
- Text Manipulation
description: Convert an xlsx file to a tabular file
long_description: Extract one sheet from an xlsx/xls file and convert it to a tabular format
name: excel_to_tabuler
owner: ufz
homepage_url: https://github.com/Helmholtz-UFZ/galaxy-tools
remote_repository_url: https://github.com/bernt-matthias/mb-galaxy-tools/tools/tox_tools/wrangling
Binary file added tools/wrangling/test-data/excel_test.xlsx
Binary file not shown.
4 changes: 4 additions & 0 deletions tools/wrangling/test-data/output_sheet_1.tsv
@@ -0,0 +1,4 @@
column0 column1
test1 value1
test2 value2
test3 value3
4 changes: 4 additions & 0 deletions tools/wrangling/test-data/output_sheet_2.tsv
@@ -0,0 +1,4 @@
column2 column3
test4 value4
test5 value5
test6 value6
32 changes: 32 additions & 0 deletions tools/wrangling/xlsx2tsv.py
@@ -0,0 +1,32 @@
import argparse

import pandas as pd


def convert_xlsx_to_tsv(input_file, sheet_name, output):
    try:
        # Read the specified sheet and write it out as tab-separated values
        df = pd.read_excel(input_file, sheet_name=sheet_name)
        df.to_csv(output, sep='\t', index=False)
        print(f"Extracted sheet '{sheet_name}' from {input_file}")

    except Exception as e:
        # Exit with a non-zero status so a failed conversion is not reported as success
        raise SystemExit(f"Failed to convert sheet '{sheet_name}' from {input_file}: {e}")


def main():
    parser = argparse.ArgumentParser(description="Convert one sheet of an .xlsx file to .tsv format.")
    parser.add_argument("--input-file", type=str, required=True, help="Path to the input .xlsx file.")
    parser.add_argument("--sheet-names", type=str, required=True, help="Name of the sheet to convert.")
    parser.add_argument("--output", type=str, default="extracted_sheet.tsv", required=False, help="Path of the output .tsv file.")
    args = parser.parse_args()

    # The value is passed to pandas as a single sheet name; it is not split on commas
    sheet_name = args.sheet_names

    # Call the conversion function with the provided arguments
    convert_xlsx_to_tsv(args.input_file, sheet_name, args.output)


if __name__ == "__main__":
    main()
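
Note: `pd.read_excel` raises an error when the requested sheet does not exist. For .xlsx inputs, the available sheet names can be checked up front with the standard library alone, because an .xlsx file is a zip package whose `xl/workbook.xml` part lists the sheets. The following is a minimal sketch, not part of this PR: `list_sheet_names` and `check_sheet_exists` are hypothetical helpers, the code assumes the transitional SpreadsheetML namespace, and it does not apply to legacy .xls files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main SpreadsheetML parts (assumption: transitional OOXML).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_sheet_names(xlsx_file):
    """Return the sheet names declared in an .xlsx workbook, in workbook order.

    Hypothetical helper: `xlsx_file` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        with archive.open("xl/workbook.xml") as handle:
            root = ET.parse(handle).getroot()
    # Each <sheet> element under <sheets> carries the name in its "name" attribute.
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]


def check_sheet_exists(xlsx_file, sheet_name):
    """Raise ValueError listing the available names if `sheet_name` is missing."""
    available = list_sheet_names(xlsx_file)
    if sheet_name not in available:
        raise ValueError(
            f"Sheet '{sheet_name}' not found; available sheets: {', '.join(available)}"
        )
```

If pandas is already loaded, `pd.ExcelFile(path).sheet_names` returns the same list and also covers other engines.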
50 changes: 50 additions & 0 deletions tools/wrangling/xlsx2tsv.xml
@@ -0,0 +1,50 @@
<tool id="xlsx2tsv" name="Excel to Tabular" version="0.1.0+galaxy0" license="MIT" profile="23.0">
<description>with pandas</description>
<requirements>
<requirement type="package" version="2.2.1">pandas</requirement>
<requirement type="package" version="3.1.5">openpyxl</requirement>
</requirements>
<command detect_errors="aggressive"><![CDATA[
python '$__tool_directory__/xlsx2tsv.py'
--input-file '$input_file'
--sheet-names '$sheet_names'
--output '$output'
]]></command>
<inputs>
<param name="input_file" type="data" format="excel.xls,xlsx" optional="false" label="Input excel file" help="Input XLS/XLSX file"/>
<param name="sheet_names" type="text" optional="false" label="Name of the excel sheet" help="Excel sheet to convert to tsv"/>
</inputs>
<outputs>
<data name="output" format="tabular"/>
</outputs>
<tests>
<test>
<param name="input_file" value="excel_test.xlsx"/>
<param name="sheet_names" value="Sheet1"/>
<output name="output" value="output_sheet_1.tsv" ftype="tabular">
<assert_contents>
<has_text text="column0"/>
<has_n_columns n="2"/>
</assert_contents>
</output>
</test>
<test>
<param name="input_file" value="excel_test.xlsx"/>
<param name="sheet_names" value="Sheet2"/>
<output name="output" value="output_sheet_2.tsv" ftype="tabular">
<assert_contents>
<has_text text="column2"/>
<has_n_columns n="2"/>
</assert_contents>
</output>
</test>
</tests>
<help>
Description
-----------
Extract a single sheet from an XLS/XLSX file and write it to a tabular file.
</help>
<citations>
<citation type="doi">10.5281/zenodo.13819579</citation>
</citations>
</tool>
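
The two `assert_contents` checks used in the tests can be approximated locally in plain Python. This is a rough sketch and not Galaxy's implementation: `has_text` and `has_n_columns` are stand-in functions named after the assertions, and the sample content assumes the expected file is tab-delimited.

```python
import csv
import io


def has_text(content, text):
    """True if `text` occurs anywhere in the output."""
    return text in content


def has_n_columns(content, n, sep="\t"):
    """True if the output has rows and every non-empty row has exactly `n` fields."""
    rows = [row for row in csv.reader(io.StringIO(content), delimiter=sep) if row]
    return bool(rows) and all(len(row) == n for row in rows)


# Assumed tab-delimited content of output_sheet_1.tsv
expected = "column0\tcolumn1\ntest1\tvalue1\ntest2\tvalue2\ntest3\tvalue3\n"

assert has_text(expected, "column0")
assert has_n_columns(expected, 2)
```

Running the same two checks against the file written by `xlsx2tsv.py` gives a quick sanity check before invoking the full tool test.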