Thanks for the help. The same calculation for 10000 settings took 20 min on my laptop with my post-processing script; now it takes 1 min using LAMMPS. I'm very grateful. Here is the code in case anyone has the same concern:
variable cuad_vx atom vx*vx
compute sum_cuad_vx all reduce sum v_cuad_vx
compute sum_vx all reduce sum vx
compute mean_vx all reduce ave vx
variable std_vx equal sqrt((c_sum_cuad_vx-2*c_mean_vx*c_sum_vx+${N}*(c_mean_vx)^(2))/(${N}-1))
fix 1 all ave/time 1 1 1 v_std_vx file std.dump mode scalar ave running overwrite
Here ${N} is the number of particles.
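
For anyone who wants to check the std_vx expression outside LAMMPS, here is a small Python sketch. It builds the sample standard deviation from the same three reductions (sum of vx*vx, sum of vx, mean of vx) and compares it with the standard library. The function name std_from_sums and the random test velocities are my own assumptions for illustration; they are not part of the LAMMPS input.

```python
import math
import random
import statistics


def std_from_sums(values):
    """Sample standard deviation built from the three reductions in the script."""
    n = len(values)                       # plays the role of ${N}
    sum_sq = sum(v * v for v in values)   # c_sum_cuad_vx
    total = sum(values)                   # c_sum_vx
    mean = total / n                      # c_mean_vx
    # sum((v - mean)^2) expanded as sum_sq - 2*mean*total + n*mean^2
    return math.sqrt((sum_sq - 2 * mean * total + n * mean ** 2) / (n - 1))


# Made-up velocities, only to exercise the formula.
random.seed(0)
vx = [random.gauss(0.0, 1.5) for _ in range(1000)]

print(std_from_sums(vx), statistics.stdev(vx))
```

The two printed numbers should agree to within floating-point rounding. Since c_sum_vx equals N times c_mean_vx, the numerator also reduces to sum_sq - n*mean^2, but I kept the three-term form so it matches the variable definition line by line.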