Error "not sequential unbroken integers" for Line plot #20

Open
yeli7068 opened this issue Dec 30, 2021 · 1 comment


Dear Dr. Bloom,

I tried the line plot in dmslogo with toydata.csv, but it fails with the error "not sequential unbroken integers".

I then turned to the example. Even after reading the instructions, I am still confused, especially because there is a gap between the original and new numbering in BG505_to_HXB2.csv (e.g. site: 141, 142; isite: 142, 151).

What does "not sequential unbroken integers" mean? How do I get the isite for SARS2?

Thanks in advance.

Code here:

import pandas as pd
import dmslogo

# load data
toydata = pd.read_csv("toydata.csv")

# logo plot check 
fig, ax = dmslogo.draw_logo(toydata.query('show_site'),
                            x_col='site',
                            letter_col='mutation',
                            letter_height_col='escape_score',
                            xtick_col='wt_site',
                            title='AZD8895',
                            addbreaks=False)

# line plot failed

fig, ax = dmslogo.draw_line(toydata,
                            x_col='site', # how to get the isite in SARS2?what is "not sequential unbroken integers"?
                            height_col='tot_escape_score',
                            xtick_col='site',
                            show_col='show_site',
                            title='AZD8895',
                            widthscale=2)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/mv/v7pv40mn6d3gwx8g563lpclm0000gn/T/ipykernel_14414/3092297124.py in <module>
----> 1 fig, ax = dmslogo.draw_line(toydata,
      2                             x_col='site', # how to get the isite in SARS2?what is "sequential unbroken integers"?
      3                             height_col='tot_escape_score',
      4                             xtick_col='site',
      5                             show_col='show_site',

~/anaconda3/envs/SARS2_RBD_Ab_escape_maps/lib/python3.8/site-packages/dmslogo/line.py in draw_line(data, x_col, height_col, height_col2, xtick_col, show_col, xlabel, ylabel, title, color, color2, show_color, linewidth, widthscale, heightscale, axisfontscale, hide_axis, ax, ylim_setter, fixed_ymin, fixed_ymax)
    162     if (xlen != data[x_col].nunique()) or any(list(range(xmin, xmax + 1)) !=
    163                                               data[x_col].unique()):
--> 164         raise ValueError('`x_col` not sequential unbroken integers')
    165 
    166     if len(data[x_col]) != len(data[x_col].unique()):

ValueError: `x_col` not sequential unbroken integers

OS: macOS Catalina 10.15.7
Python: 3.8.12
dmslogo: 0.6.2

jbloom (Member) commented Dec 31, 2021

The line plot requires x_col to contain sequential, unbroken integers because it draws a value for every site. The logo plot has no such requirement because it can break the axis to show only certain sites of interest.
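
As a rough illustration of that requirement, the sketch below lists which integers are missing from a column of site numbers. The helper name `describe_site_gaps` is hypothetical and not part of dmslogo, and the site numbers are made up; it only loosely mirrors the check shown in the traceback.

```python
def describe_site_gaps(sites):
    """Return the integers missing between min(sites) and max(sites).

    Hypothetical helper for illustration only (not part of dmslogo).
    An empty result means the distinct sites form an unbroken integer range.
    """
    unique_sites = sorted(set(sites))
    if not unique_sites:
        return []
    present = set(unique_sites)
    first, last = unique_sites[0], unique_sites[-1]
    return [s for s in range(first, last + 1) if s not in present]


# Made-up site numbers with a jump from 142 to 151
print(describe_site_gaps([140, 141, 142, 151, 152]))
```

A non-empty result points at the kind of gap that would make a site column unsuitable as `x_col` for the line plot.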

The x_col (or isite) column can be any index that goes 1, 2, 3, and so on. If your protein is already numbered that way, then it is just the site. But some proteins are not sequentially numbered. For instance, Omicron has some indels in the NTD but is still normally numbered using Wuhan-Hu-1 site numbering.
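
One possible way to build such an index with pandas is sketched below. The helper `add_isite` and the example values are assumptions for illustration, not part of dmslogo or of the data in this issue; the sketch also assumes the site column holds values that sort in the intended order.

```python
import pandas as pd


def add_isite(df, site_col="site", isite_col="isite"):
    """Return a copy of df with a 1, 2, 3, ... index over the distinct sites.

    Hypothetical helper (not part of dmslogo). Rows sharing a site get the
    same isite value.
    """
    ordered_sites = sorted(df[site_col].unique())
    site_to_isite = {site: i for i, site in enumerate(ordered_sites, start=1)}
    out = df.copy()
    out[isite_col] = out[site_col].map(site_to_isite)
    return out.sort_values(isite_col).reset_index(drop=True)


# Made-up example with a gap in the site numbering (not real data)
example = pd.DataFrame({"site": [331, 332, 335, 336],
                        "tot_escape_score": [0.1, 0.4, 0.2, 0.3]})
with_isite = add_isite(example)
print(with_isite)
```

With a column like this, the `draw_line` call from the question could presumably pass `x_col='isite'` while keeping `xtick_col='site'`, so the tick labels still show the original site numbers.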
