| Ilya Zakharevich on Mon, 24 Jun 2024 01:00:52 +0200 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
| Re: ABI of “return objects on the bottom of the stack, continuously” (Re: Two stacks at once) |
On Sun, Oct 29, 2023 at 04:08:32PM -0700, Ilya Zakharevich wrote:
> The quoted message was written ∼19 years ago. — And then I found a
> solution “quite soon” — but could not find toits to write this down.
>
> The old discussion is on
> https://pari.math.u-bordeaux.fr/archives/pari-dev-0412/msg00016.html
The quoted message is on
https://pari.math.u-bordeaux.fr/archives/pari-dev-2310/msg00018.html
It discusses how to avoid gerepile()ing (cheaply!). (A copy at end.)
⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ New development
I found
https://jackmott.github.io/programming/2016/08/20/when-bigo-foolsya.html
Very short summary:
With contemporary memory architectures, the price of ONE memory access
to “a region of random access memory of size S” is not O(1), but
O(√S). (With pretty good precision!)
What is changed by this:
My initial thought:
The price of gerepile()ing is O(S) where S is the size of the
object. To create this object one needs ≥O(S) time, so optimizing
away gerepile() may only improve a constant, and only when “inside”
a linear-time algorithm.
(Of course, the improvement of this constant may be rather big
if one needs a long change of gerepile()s.)
My current thought:
The price of gerepile() may be O(S√S) if the object is very “mixed
up” in its memory representation. On the other hand, if CREATION
of this object was “much more memory-local”, then the time to
create this object might still have been O(S).
CONCLUSION: without further investigation, one cannot exclude a
possibility of cases where removing gerepile() would make a
super-linear-time algorithm into a linear-time.
Hope this helps,
Ilya
> > On Thu, Dec 16, 2004 at 11:49:19PM +0100, Bill Allombert wrote:
> > > That said, the idea of several stacks has a lot of merit, but I don't
> > > think we can retrofit it at this stage without major though.
>
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ grepile() is considered harmful
>
> I remind that the idea was to (almost?) completely eliminate
> grepile() — which is the most confusing — and thoroughly
> unneeded and wasteful — part of the implementation of PARI.
>
> The **need** for grepile() comes from the PARI’s ABI, whose idea is
> that
>
> [[[ objects returned from a subroutine call are allocated on stack ]]]
>
> Moreover, these returned objects should better (for the purpose of
> efficiency of the stack usage) be allocated continuously. Likewise,
> they should better be at the bottom of the stack.
>
> However, the components of recursively built objects are often
> constructed using OTHER ancillary subroutine calls. After such a
> call to construct (say) the 3rd component, the stack looks like
>
> BEFORE … COMPONENT1 COMPONENT2 RESULT-OF-CALL
>
> Now we may need to post-process the RESULT-OF-CALL, and when
> COMPONENT3 is finally returned by yet another ancillary call, the
> stack becomes
>
> BEFORE … COMPONENT1 COMPONENT2 RESULT-OF-CALL COMPONENT3
>
> — and this means that our 3 components are now not adjacent.
>
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ The proposed solution of 2004
>
> Use two stacks alternately: when our caller expects our result to be
> on STACKα, we call our ancillary routines when being switched to
> STACKβ, and only the FINAL ancillary call during the calculation of
> COMPONENT3 is made switching back to STACKα.
>
> Of course, there is a disadvantage: this may IN SOME CASES result in a
> waste of memory: it may happen that we need a lot of memory on
> STACKβ — while STACKα has a lot of unused memory.
>
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ WORKAROUND — of about 2005 ;—(
>
> Have both stacks use the same chunk of memory! To avoid collisions,
> one of them should grow-down-from-top, the other one
> grow-up-from-bottom.
>
> ⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜⁜ IMPLEMENTATION
>
> • Have a new variable stack_dir.
> • Make the stack allocation routines pay attention to stack_dir.
> • (As a stopgap measure) make gerepile() (and friends) pay attention
> to stack_dir.
> • Make the starting value for stack_dir to be settable at the start
> of runtime.
>
> After this, check that the current code works with the “non-standard”
> value of stack_dir. If this ACTUALLY turns out to be feasible:
>
> • Add a new function flip_stack_dir().
> • Use this function in new code (as described above).
> • (Leisurely,) insert flip_stack_dir() in suitable places in old code
> and remove gerepile() code.
>
> The bookkeeping data for the stack is currently
> stack_bot avma stack_top
> (if I did not mix up the names bot/top!). avma grows when one
> allocates objects on stack. The free area is between avma and
> stack_top.
>
> The proposed data for the stack is
> stack_min avma_min avma_max stack_max
> The free area is between avma_min and avma_max. Depending on
> stack_dir, avma is aliased to one of avma_min and avma_max, and
> stack_top is aliased to the other one of these two.
>
> (The code taking into account stack_bot needs to be rewritten.)
>
> (One way to do this is that flip_stack_dir() flips stack_dir, and
> exchanges avma and stack_top.)
>
> Hope this helps,
> Ilya