CH uses blocking I/O. That is the probable cause of the weak copy performance when copying between two relatively slow media.
In that scenario, copy_time = read_time + write_time + overhead.
With asynchronous I/O, it should be possible to reduce the copy time by executing reads and writes in parallel, which should give copy_time = max(read_time, write_time) + read_time_of_first_chunk + write_time_of_last_chunk + overhead.
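The overlap described above can be sketched as follows. This is only an illustration, not CH code: it uses a reader thread and a bounded queue instead of true OS-level asynchronous I/O, and the names pipelined_copy, CHUNK_SIZE and QUEUE_DEPTH, as well as their values, are assumptions made for the example. While one chunk is being written, the next ones are already being read, so only the first read and the last write are not overlapped.

```python
import queue
import threading

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB), not taken from CH
QUEUE_DEPTH = 4           # assumed number of chunks buffered between read and write


def pipelined_copy(src_path, dst_path, chunk_size=CHUNK_SIZE, depth=QUEUE_DEPTH):
    """Copy src_path to dst_path while overlapping reads with writes.

    Returns the number of bytes copied.
    """
    chunks = queue.Queue(maxsize=depth)  # bounded, so memory use stays limited
    stop = threading.Event()             # set when the writer is done or has failed
    errors = []

    def put(item):
        # Wait for room in the queue, but give up once the writer has stopped.
        while not stop.is_set():
            try:
                chunks.put(item, timeout=0.1)
                return True
            except queue.Full:
                continue
        return False

    def reader():
        try:
            with open(src_path, "rb") as src:
                while True:
                    data = src.read(chunk_size)
                    if not data or not put(data):
                        break
        except OSError as exc:
            errors.append(exc)
        finally:
            put(None)  # sentinel: tells the writer there is nothing more to read

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    copied = 0
    try:
        with open(dst_path, "wb") as dst:
            while True:
                data = chunks.get()
                if data is None:
                    break
                dst.write(data)  # the reader keeps filling the queue meanwhile
                copied += len(data)
    finally:
        stop.set()
        thread.join()

    if errors:
        raise errors[0]
    return copied
```

A call such as pipelined_copy("source.bin", "target.bin") copies the file and returns its size in bytes. The bounded queue is the main design choice here: it lets the faster side run ahead by at most QUEUE_DEPTH chunks, so the slower medium determines the total time without the buffer growing without limit.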
A prototype needs to be created to investigate what asynchronous I/O makes possible.