Max Alekseyev on Wed, 01 Mar 2023 03:16:48 +0100


Re: unable to load a large vector from a file


Hi Nicolas,

My data fits in a Vecsmall() and your suggestion works nicely. However, I don't think the file size is the issue here: my original file is ~10GB, while the one created by writebin() is ~8GB, which is not much smaller. So the file size alone does not explain the failure of read(). My impression is that read() on binary files does a better job of parsing the data and allocating memory properly. Or maybe it was the Vecsmall() type that saved the day. read() was also quite fast here.
Anyway, it'd be helpful to understand why my initial approach did not work.

Regards,
Max

On Tue, Feb 28, 2023 at 2:41 PM Nicolas Mascot <mascotn@maths.tcd.ie> wrote:
Hi Max,

You could try to use writebin() instead of write(); this usually results
in smaller files.

If your integers are small enough to fit in a long (I am not 100% sure,
but I think that on a 64-bit machine the range is -2^63 to 2^63-1), you
would also save a lot of space by converting your vector to a vecsmall
(= vector of small integers):
writebin("myvector",Vecsmall(v));
If needed, you can then turn the result back into a (regular) vector
with Vec():
Vec(read("myvector"))
but in most cases that should not be necessary.
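
The round trip described above can be tried on a small scale first. This is only a sketch: the file name "myvector.bin", the size 10^5 and the random test data are placeholders rather than anything from the original problem, and it assumes a 64-bit build so that the entries fit in a long. Since writebin() appends to an existing file, the file name should not exist yet.

```gp
\\ Small-scale round trip through writebin()/read().
\\ Assumptions: 64-bit GP; "myvector.bin" does not exist yet
\\ (writebin appends to an existing file).
v = vector(10^5, i, random(2^40));   \\ placeholder data, entries below 2^40
f = "myvector.bin";                  \\ placeholder file name
writebin(f, Vecsmall(v));            \\ store the data as a t_VECSMALL, in binary
w = read(f);                         \\ load the binary object back
type(w)                              \\ check the type of the loaded object
Vec(w) == v                          \\ nonzero if the round trip was lossless
```

If the two checks look right on the small case, the same calls can be applied to the full vector.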

Best regards,
Nicolas

On 28/02/2023 19:30, Max Alekseyev wrote:
> I have a vector v of size 10^9 with moderately-sized integers, which I
> saved to a file with
> write("myvector.txt",v);
> However, I'm unable to read it back. PARI/GP either runs out of memory
> (even with 96GB memory allocation!) or is just killed by the system.
> I tried both
> v = read("myvector.txt");
> and
> \r myvector.txt
>
> Is there a way around this?
>
> PS. The resulting file size is just 10GB and so it should not be an
> issue to read it as a whole into memory and then parse it, but PARI/GP
> somehow fails to do that.
>
> Regards,
> Max