[BUG] executor would stuck and the task would never stop #697

Open
lianneli opened this issue Dec 17, 2024 · 1 comment

lianneli commented Dec 17, 2024

Describe the bug
Under the conditions below, the Blaze executor gets stuck and the task never finishes, so the job hangs indefinitely.

To Reproduce

  1. The SQL task:
with 
ao_data as (
    SELECT
        vender.vender_id
        ,ao.shop_code as store_code
        ,ao.goods_code as matnr
    from (select * from blaze_t.mid_dp_snap  where dt = '20241212' and yn=1 and biz_type=0 and replenish_type in(1,2,4) and  range_ind  =1 ) ao
    left join blaze_t.dv vender on ao.mt=vender.mart_code
    union ALL
    SELECT
        vender.vender_id
        ,ao_dc.shop_code as store_code
        ,ao_dc.gcode as matnr
    from (select * from blaze_t.mid_dc_snap where dt = '20241212' and yn=1 and biz_type=0 and replenish_type in(1,2,4) and shop_code in ('701') and mart='aaa6' and  range_ind  =1) ao_dc
    inner join blaze_t.dv vender on ao_dc.mart=vender.mart_code
)

select
  d.matnr as matnr
  ,d.status_code
  ,d.status_name
  ,d.delivery_type
  ,d.supplier_code
  ,d.supplier_name
  ,d.sell_type
  ,d.group_kvi_1
  ,d.group_kvi_2
  ,d.group_kvi_3
  ,'20241212' as dt
FROM
   blaze_t.sw d
left join ao_data ao on d.vender_id = ao.vender_id and d.store_code = ao.store_code and d.matnr = ao.matnr
DISTRIBUTE BY
    dt, CAST(RAND() * 2 AS INT)
  2. Environment
  • Executors: 1 (2 cores), memory=4g, memoryOverhead=4g; driver: 1 (1 core), memory=2g
  • Spark on k8s, Spark 3.3.4
  • The most important condition: a single data file of blaze_t.mid_dp_snap needs to be about 180 MB (the data generation class is below)
  3. Tables and data
CREATE TABLE blaze_t.mid_dp_snap (
    id BIGINT,
    mt STRING,
    goods_code STRING,
    shop_code STRING,
    store_id BIGINT,
    sku_id BIGINT,
    status STRING,
    dtype INT,
    biz_type INT,
    tcategory STRING,
    scategory STRING,
    category STRING,
    osc STRING,
    sc STRING,
    replenish_type INT,
    tq DECIMAL(38, 10),
    rquantity DECIMAL(38, 10),
    rvalue DECIMAL(38, 10),
    sdisplay INT,
    says INT,
    ustock INT,
    minterval BIGINT,
    mindisplay DECIMAL(38, 10),
    yn INT,
    update_id STRING,
    modified STRING,
    create_id STRING,
    created STRING,
    maxdisplay DECIMAL(38, 10),
    rtype BIGINT,
    rbatch BIGINT,
    pdisplay DECIMAL(38, 10),
    displayt BIGINT,
    mquantity DECIMAL(38, 10),
    omq DECIMAL(38, 10),
    odq DECIMAL(38, 10),
    ddays BIGINT,
    dfactor DECIMAL(38, 10),
    sdays BIGINT,
    dm DECIMAL(38, 10),
    do DECIMAL(38, 10),
    ss DECIMAL(38, 10),
    pd DECIMAL(38, 10),
    pi STRING,
    ei STRING,
    rdt STRING,
    red STRING,
    rgcode STRING,
    range_ind INT,
    dt STRING
) USING parquet PARTITIONED BY (dt) ;

-- The data for this table can be generated by the Java class below

CREATE TABLE blaze_t.mid_dc_snap (
    id BIGINT,
    mart STRING,
    gcode STRING,
    gname STRING,
    shop_code STRING,
    sname STRING,
    sid BIGINT,
    skuid BIGINT,
    status STRING,
    dtype INT,
    biz_type INT,
    tcategory STRING,
    scategory STRING,
    category STRING,
    tcat_name STRING,
    scat_name STRING,
    cat_name STRING,
    supplier_code STRING,
    supplier_name STRING,
    os_code STRING,
    os_name STRING,
    replenish_type INT,
    tquantity DECIMAL(38, 10),
    rquantity DECIMAL(38, 10),
    sfactor DECIMAL(38, 10),
    msafety_stock DECIMAL(38, 10),
    bunit STRING,
    ounit STRING,
    o_type INT,
    yn INT,
    update_id STRING,
    modified STRING,
    create_id STRING,
    created STRING,
    range_ind INT,
    dt STRING
) USING parquet PARTITIONED BY (dt) ;

insert into blaze_t.mid_dc_snap values (1828949,'aaa6','1573','1660','701','8921',8,5671,'2',7,0,'2644','5244','4424','8347','8388','1780','7798','1124','8617','9605',2,5,8,5,2,'2','3',7,1,'8448','2024-12-12 22:31:29','9807','2024-12-12 22:31:29',1,'20241212');


CREATE TABLE blaze_t.dv (
    vender_id BIGINT,
    vn STRING,
    mart_code STRING,
    status INT,
    create_time STRING,
    update_time STRING,
    lcurrency STRING,
    tcurrency STRING,
    gcurrency STRING,
    tz STRING,
    zname STRING
) USING parquet ;

insert into blaze_t.dv values (470,'test-470','aaa1',0,'2024-12-12 22:31:29','2024-12-12 22:31:29','c2','8d','c3','3d','54');
insert into blaze_t.dv values (1540,'test-1540','aaa2',1,'2024-12-12 22:31:29','2024-12-12 22:31:29','24','40','d6','23','79');
insert into blaze_t.dv values (4270,'test-4270','aaa3',2,'2024-12-12 22:31:29','2024-12-12 22:31:29','e5','8a','d4','a3','43');
insert into blaze_t.dv values (6290,'test-6290','aaa4',7,'2024-12-12 22:31:29','2024-12-12 22:31:29','7c','12','e3','ae','fd');
insert into blaze_t.dv values (9550,'test-9550','aaa5',7,'2024-12-12 22:31:29','2024-12-12 22:31:29','f2','86','9f','c0','88');
insert into blaze_t.dv values (2520,'test-2520','aaa6',4,'2024-12-12 22:31:29','2024-12-12 22:31:29','ef','d9','58','ab','11');

CREATE TABLE blaze_t.sw (
    vender_id BIGINT,
    store_code STRING,
    matnr STRING,
    vname STRING,
    sname STRING,
    sid INT,
    inum STRING,
    wname STRING,
    oware_name STRING,
    oo_price DECIMAL(30, 10),
    oc_price DECIMAL(30, 10),
    status_code STRING,
    status_name STRING,
    sell_type INT,
    stype_name STRING,
    corder INT,
    creturn INT,
    csale INT,
    csale_return INT,
    cexhibit INT,
    create_time STRING,
    update_time STRING,
    wcreate_time STRING,
    sflag INT,
    sbanner STRING,
    delivery_type INT,
    supplier_code STRING,
    supplier_name STRING,
    group_kvi_1 INT,
    group_kvi_2 INT,
    group_kvi_3 INT,
    isupplier_code STRING,
    isupplier_name STRING,
    iitem_flag INT
) USING parquet ;

insert into blaze_t.sw values (49,'701','aaa6','123','123',123,'123','123','123',123,123,'123','123',123,'123',123,123,123,123,123,'123','123','123',123,'123',123,'123','123',123,123,123,'123','123',123);

Data generator for blaze_t.mid_dp_snap:

package csv;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;
import java.util.UUID;

public class GenerateData3cx {
  static Random random = new Random();

  public static void main(String[] args) {

    String outputFilePath = "/Downloads/mid_dp_snap.csv";

    // Column names match the blaze_t.mid_dp_snap table definition, in order.
    String header = "id,mt,goods_code,shop_code,store_id,sku_id,status,dtype,biz_type,tcategory,scategory,category,osc,sc,replenish_type,tq,rquantity,rvalue,sdisplay,says,ustock,minterval,mindisplay,yn,update_id,modified,create_id,created,maxdisplay,rtype,rbatch,pdisplay,displayt,mquantity,omq,odq,ddays,dfactor,sdays,dm,do,ss,pd,pi,ei,rdt,red,rgcode,range_ind,dt";

    String[] mt = new String[6];
    mt[0] = "aaa1";
    mt[1] = "aaa2";
    mt[2] = "aaa3";
    mt[3] = "aaa4";
    mt[4] = "aaa5";
    mt[5] = "aaa6";

    String[] shopCode = new String[3];
    shopCode[0] = "101";
    shopCode[1] = "501";
    shopCode[2] = "701";

    try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFilePath))) {
      writer.write(header);
      writer.newLine();

      int i = 1;
      for (; i <= 5000000; i++) {
        String builder = i + "," +
            mt[random.nextInt(6)] + "," +
            shopCode[random.nextInt(3)] + "," + // shop code
            getRandomId() + "," +
            getRandomId() + "," +
            getRandomId() + "," +
            getRandomId() + "," +
            random.nextInt(10) + "," +
            random.nextInt(2) + "," + // biz_type
            getRandomId() + "," +
            getRandomId() + "," +
            getRandomId() + "," +
            getRandomId() + "," +
            getRandomId() + "," +
            random.nextInt(5) + "," + // replenish_type
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            "1" + "," + // yn
            getRandomId() + "," +
            "2024-12-12 22:31:29" + "," +
            getRandomId() + "," +
            "2024-12-12 22:31:29" + "," + // created
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            UUID.randomUUID().toString() + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            random.nextInt(10) + "," +
            "20241212";

        writer.write(builder);
        writer.newLine();
      }

      System.out.println("Done: the new CSV file was written successfully.");
    } catch (IOException e) {
      e.printStackTrace();
      System.out.println("An error occurred while reading or writing the file: " + e.getMessage());
    }
  }

  // Returns a random ID in the range 0..9999. The original loop ran six times,
  // but (int) Math.pow(10, 3 - i) is 0 for i >= 4, so only four digits ever contributed.
  private static int getRandomId() {
    return random.nextInt(10000);
  }
}

This Java class generates 5,000,000 rows of data in CSV format, which you can then insert into mid_dp_snap. In my tests, the hang occurs when a single data file is about 180 MB, so you may need to repartition before writing.
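
The number of files to write for a given target file size can be estimated with simple arithmetic. The sketch below is a minimal, hypothetical helper (the class name, method name, and the 540 MB figure are illustrative and not from this issue). It assumes you have already measured the on-disk size of the written partition, since Parquet output is usually a different size from the source CSV.

```java
class PartitionEstimator {
    // Target size per data file reported in this issue (about 180 MB).
    public static final long TARGET_FILE_BYTES = 180L * 1024 * 1024;

    /**
     * Returns how many files to write so that each one is close to targetBytes.
     * totalBytes is the measured size of the written data (an input you supply).
     */
    public static int estimateFileCount(long totalBytes, long targetBytes) {
        if (totalBytes <= 0 || targetBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Round to the nearest whole file, but always write at least one.
        long count = Math.round((double) totalBytes / targetBytes);
        return (int) Math.max(1L, count);
    }

    public static void main(String[] args) {
        long measuredBytes = 540L * 1024 * 1024; // hypothetical: 540 MB measured on disk
        System.out.println(estimateFileCount(measuredBytes, TARGET_FILE_BYTES)); // prints 3
    }
}
```

The resulting count could then be used in the repartition step before the write, for example with Spark SQL's `REPARTITION(n)` hint. Checking the actual file sizes afterwards is still advisable, because compression makes the estimate approximate.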

Log from the stuck executor (screenshot attachment: img_v3_02hl_fba638d2-6257-4149-8d15-56b2b878a32g)

I spent about a week looking for the cause and finally found that the hang is in the Rust code, which I am not familiar with. Please help. Thanks.
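
For anyone trying to confirm where a JVM is blocked, a thread dump is a common first step. The following is only a generic sketch using standard JDK calls; the class and method names are hypothetical, and it is not part of Blaze. It lists every live thread in the current JVM with its state and Java stack frames. Frames that call into native code print as `(Native Method)`, which shows the point where Java hands over to native code, but the Rust frames themselves do not appear in a Java stack trace.

```java
import java.util.Map;

class ThreadDumpSketch {
    /** Builds a text dump of every live JVM thread and its Java stack frames. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                // StackTraceElement.toString() renders native frames as "...(Native Method)".
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

For a running executor, the same information is usually easier to get from outside the process, for example with the JDK's `jstack <pid>` tool or the thread dump link on the Executors page of the Spark UI.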

gy11233 (Collaborator) commented Dec 17, 2024

Thank you for your feedback!

We tried to reproduce the issue using the configuration and data settings you described (the mid_dp_snap table contains 5,000,000 rows), but the hang did not occur at the same point.

If you have any other questions or need further assistance, please let us know.

[Screenshot: Execution Plan]
