Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the result was computed repeat when in a search with two aggregations in one field #90

Open
hengfeiyang opened this issue Jan 19, 2022 · 3 comments

Comments

@hengfeiyang
Copy link

the search query like this:

query := bluge.NewBooleanQuery().AddMust(bluge.NewMatchAllQuery())
searchRequest := bluge.NewTopNSearch(1, query).WithStandardAggregations()
searchRequest.AddAggregation("my-agg-min", aggregations.Min(search.Field("rating")))
searchRequest.AddAggregation("my-agg-avg", aggregations.Avg(search.Field("rating")))
searchRequest.AddAggregation("my-agg-sum", aggregations.Sum(search.Field("rating")))

in this case, my-agg-sum result value large than the real value 3 multiple. equal to rating field appear times. it seems computed repeat.

@mschoch
Copy link
Member

mschoch commented Jan 19, 2022

Yes, you are correct, that if you compute multiple aggregations on the same field, the code we have to try and accumulate the list of field we care about will contain duplicates.

Do you know if this actually causes problematic behavior? Like, do we do extra work 3 times because it appears multiple times in that list?

I'm fine with the fix, but I think we should try to add a test case to illustrate the problematic behavior.

@mschoch
Copy link
Member

mschoch commented Jan 19, 2022

I did remember one more detail related to this.

One concern I had at the time I originally wrote this part, was that because aggregations can be nested, enumerating the fields required involves enumerating the fields required for all child aggregations as well. I didn't like the idea that we de-duplicate this list at every level, especially when we immediately merge it with another list, and the have de-duplicate that again. Based on that thinking, I figured, so what if this method returns duplicates, if any caller is going to use this list to do real work, they should do one de-duplication pass at the very end.

So, I'm pretty sure that's why this method does not do any de-duplication today. It would help to know more precisely how this current behavior is problematic, as maybe there is some other solution...

@hengfeiyang
Copy link
Author

i saw the flow on aggregation, one documentMatch will deeply flow in every nested aggregation, so one field should be compute one time, this is still a problem.
i fixed in my pull cetainly will de-duplicate at every level, i will try to find other better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants