Problem: error: return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Job failed with org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 22.0 failed 4 times, most recent failure: Lost task 0.3 in stage 22.0 (TID 57, hadoop03, executor 2): UnknownReason

Solution: the order used in collect_set(named_struct(...)) in the statement was wrong

Problem: a Hive column has type array<struct>; how do you convert its data to a string?

Solution: break it down step by step

Method:
Hive column

`page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT 'page visit statistics'

How the data is generated

select
    mid_id,
    collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
from
(
    select
        mid_id,
        page_id,
        count(*) page_count,
        sum(during_time) during_time
    from dwd_page_log
    where dt='2020-06-14'
    group by mid_id,page_id
)t2
group by mid_id

Problem: how do you switch npm to a different registry?

Method:

1. Check the current registry

npm config get registry

2. Switch the registry

npm config set registry http://registry.npmmirror.com

Problem: how do you remove duplicate elements from a list while keeping the original order?

Method:

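a = [2,3,2,4,2,1,2]
# Common ways to remove duplicates
list(set(a))  # Output: [1, 2, 3, 4]; the original order is lost (set order is not guaranteed)
[*dict.fromkeys(a)]  # Output: [2, 3, 4, 1] on Python 3.7+, where dict keeps insertion order; older versions may scramble the order
sorted(set(a), key=a.index)  # Output: [2, 3, 4, 1]; correct, but slow on large lists because a.index is a linear scan
from collections import OrderedDict
[*OrderedDict.fromkeys(a)]   # Output: [2, 3, 4, 1]; correct on every version, recommended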
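
The same idea can also be written as an explicit loop, which makes it easier to see why the order is kept and why it stays fast on large lists. This is only a minimal sketch: the function name `dedupe_keep_order` is made up for illustration, and it assumes every element is hashable.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_keep_order(items: Iterable[T]) -> List[T]:
    """Return the items with duplicates removed, keeping first-seen order."""
    seen = set()   # elements already emitted; membership checks are O(1) on average
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


a = [2, 3, 2, 4, 2, 1, 2]
print(dedupe_keep_order(a))  # [2, 3, 4, 1]
```

Each element is visited once, so the cost grows linearly with the list length, unlike the `a.index` variant.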

Problem: how do you delete the Redis keys that match a pattern?

Solution: pipe the matched keys into xargs to delete them

Method:

./redis-cli -h [$Addr] -a [$Password] -p [$Port] -n [$db] keys "[$Key]*" | xargs ./redis-cli -h [$Addr] -a [$Password] -p [$Port] -n [$db] del

For example:

redis-cli -n 1 keys "school*" | xargs redis-cli -n 1 del
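
KEYS scans the whole keyspace in a single call and can block a busy server, so on large instances it may be worth iterating with SCAN instead. The sketch below is one possible way to do that from Python, not the method used above. It assumes a redis-py style client object that provides `scan_iter` and `delete`; the function name `delete_matching` and the `batch_size` parameter are made up for illustration.

```python
from typing import Any, List


def delete_matching(client: Any, pattern: str, batch_size: int = 500) -> int:
    """Delete every key matching `pattern`; return how many keys were removed."""
    deleted = 0
    batch: List[Any] = []
    # scan_iter walks the keyspace incrementally with SCAN instead of KEYS.
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += client.delete(*batch)
            batch = []
    if batch:  # flush the final partial batch
        deleted += client.delete(*batch)
    return deleted


# Usage, assuming redis-py is installed and a server is reachable:
#   import redis
#   client = redis.Redis(host="localhost", port=6379, db=1)
#   print(delete_matching(client, "school*"))
```

Deleting in batches keeps each DEL call small, and an empty match simply returns 0 instead of calling DEL with no arguments.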
