Max Alekseyev on Wed, 01 Mar 2023 03:16:48 +0100
Re: unable to load a large vector from a file
- To: Nicolas Mascot <email@example.com>
- From: Max Alekseyev <firstname.lastname@example.org>
- Date: Tue, 28 Feb 2023 21:15:01 -0500
- Cc: email@example.com
- In-reply-to: <firstname.lastname@example.org>
- References: <CAJkPp5N+U6Fw=5URL4TsLC_rKMwPxWV30wNG25E2OXOKq1Fyag@mail.gmail.com> <email@example.com>
My data fits in a Vecsmall(), and your suggestion works nicely. However, I don't think the file size is the issue here. My original file is ~10GB, while the one created by writebin() is ~8GB, which is not much smaller, so the file size alone does not explain the failure of read(). I got the impression that read() on binary files does a better job of parsing the data and allocating memory properly. Or maybe it's the Vecsmall() type that saved the day. read() was also quite fast here.
Anyway, it'd be helpful to understand why my initial approach did not work.
You could try to use writebin() instead of write(); this usually results
in smaller files.
If your integers are small enough to fit in a long (I am not 100% sure,
but I think that the range is -2^63 to 2^63-1), you would also save a
lot of space by converting your vector to a vecsmall (= vector of small
integers).
If needed, you can then turn the result back into a (regular) vector
but in most cases that should not be necessary.
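
The round trip described above can be sketched in GP as follows. This is only an illustration under stated assumptions: the vector is a small random stand-in for the real 10^9-entry data, the filename "myvector.bin" is hypothetical and is assumed not to exist yet, and every entry is assumed to fit in a long. The sizebyte() calls are there only to compare the in-memory footprint of the two representations.

```gp
\\ Small stand-in for the large vector from the thread (assumption).
v = vector(10^5, i, random(2^40));

\\ Convert to a t_VECSMALL; this only works if every entry fits in a long.
w = Vecsmall(v);

\\ Compare the memory used by the t_VEC and the t_VECSMALL (in bytes).
print(sizebyte(v), " bytes as t_VEC, ", sizebyte(w), " bytes as t_VECSMALL");

\\ Store the object in binary format. The filename is hypothetical;
\\ use a file that does not exist yet, since writebin() adds to existing files.
writebin("myvector.bin", w);

\\ read() on a binary file gives back the stored object without reparsing text.
w2 = read("myvector.bin");

\\ Optional: turn the result back into a regular vector and check the round trip.
v2 = Vec(w2);
print(v2 == v);
```

On a 64-bit build, a t_VECSMALL stores one machine word per entry, which would be consistent with the ~8GB binary file reported for 10^9 entries.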
On 28/02/2023 19:30, Max Alekseyev wrote:
> I have a vector v of size 10^9 with moderately-sized integers, which I
> saved to a file with
> However, I'm unable to read it back. PARI/GP either runs out of memory
> (even with 96GB memory allocation!) or is just killed by the system.
> I tried both
> v = read("myvector.txt");
> \r myvector.txt
> Is there a way around?
> PS. The resulting file size is just 10GB and so it should not be an
> issue to read it as a whole into memory and then parse it, but PARI/GP
> somehow fails to do that.