HTTP retrieval proposal #747
base: main
Thank you @hsanjuan, it would be extremely nice if we can pull this off with such a small set of changes.
Once we have HTTP basics like user-agent, status code metrics, 503/429/Retry-After (details inline), this is worth testing on Rainbow staging (do A/B test with bitswap-only box and bitswap+http).
ps. Whatever we do, HTTP should be opt-in, with a big EXPERIMENTAL warning.
Self-review.
To me, Bitswap is the name of the protocol (and client behaviour) and it can use either HTTP or libp2p to communicate with remote peers. But then HTTP servers don't exactly follow the Bitswap spec since they don't comply with CANCEL messages (see comment).
My suggestion would be to name the folders network/libp2p and network/http.
Note about the two scenarios regarding peerIDs:
Resolving A issues implies:
Resolving B issues implies:
Thank you everyone for the reviews! I am addressing comments and resolving as I go, but if I don't react to anything let me know (there are many comments, and replies to comments attached to outdated code etc).
This and subsequent commits introduce an httpnet module at what is known as the "bitswap network layer". The bitswap network layer connects bitswap peers, sends bitswap messages and receives responses. Bitswap messages are basically a wantlist: a list of CIDs that should be sent if available.
httpnet does the same, except that instead of sending the bitswap message over the bitswap protocol, it triggers http requests for the requested blocks. httpnet is a drop-in addon so that we can request blocks over http, and not only via bitswap. As httpnet is a network, it benefits from all existing wantlist management logic. Any http/2 endpoint should benefit from streamlined requests on a single http connection.
A router-network ensures that messages are correctly handled by bitswap or by http requests depending on what the peers are advertising. HTTP requests are given priority in the presence of both.
Here are some of the httpnet features:
* Peers are marked as Connected when they are able to handle http requests.
* Peers are marked as Disconnected when http requests fail repeatedly (MaxRetries).
* Server errors trigger backoffs, preventing more requests to the same url for a period (Retry-After header or configuration value).
* We support several urls per peer, meaning a peer can provide alternative http endpoints which are tried based on number of failures or existing cooldowns.
* We translate HAVE requests to HTTP-HEAD requests and BLOCK requests to HTTP-GETs.
* We support cancellations: ongoing or soon-to-happen requests for a CID can be cancelled using a "cancel" entry in the wantlist.
* We record latency information for peers by pinging regularly.
* We discriminate between different errors so that we know whether to move to the next block in a wantlist, to retry with a different url, or to abort completely.
* Options to configure user-agent, max retries etc. are supported.
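Two of the features above (the HAVE/BLOCK translation and the Retry-After backoff) can be illustrated with a minimal sketch. This is not the httpnet code: `WantType`, `methodFor` and `cooldownFor` are hypothetical names chosen for illustration, and the fallback duration stands in for the "configuration value" mentioned above.

```go
package main

import (
	"net/http"
	"strconv"
	"time"
)

// WantType is a hypothetical stand-in for the two kinds of wantlist entries.
type WantType int

const (
	WantHave  WantType = iota // "do you have this block?"
	WantBlock                 // "send me this block"
)

// methodFor maps a wantlist entry type to the HTTP method used to serve it:
// HAVE becomes HEAD, BLOCK becomes GET.
func methodFor(w WantType) string {
	if w == WantHave {
		return http.MethodHead
	}
	return http.MethodGet
}

// cooldownFor returns how long a URL should be avoided after a server error.
// retryAfter is the raw Retry-After header value; it may be a number of
// seconds or an HTTP date. If it is missing or unparseable, the configured
// fallback is used instead.
func cooldownFor(retryAfter string, fallback time.Duration) time.Duration {
	if retryAfter == "" {
		return fallback
	}
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	if when, err := http.ParseTime(retryAfter); err == nil {
		if d := time.Until(when); d > 0 {
			return d
		}
		return 0 // the date is already in the past: no cooldown needed
	}
	return fallback
}
```

A caller would pass `resp.Header.Get("Retry-After")` from a 429 or 503 response and store the resulting duration against the URL that produced it.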
Get rid of intermediary channel and WaitGroup and related goroutines.
Co-Authored-By: Marcin Rataj <[email protected]>
Missing:
Include 502 and 504s. Reduce retry counts. Add comments.
Metrics should now record the "interpreted" status code.
Accept 410 as a valid response.
Before: when a url is in cooldown, we sleep for the rest of the cooldown and then proceed with the request. Now: we return an error and try the next url until there are no more to try. The reason is that we should not block a worker. We can avoid contacting servers during cooldowns, but we cannot schedule requests to be tried later or have workers waiting. Scheduling is a footgun, as it requires leaving some requests hanging for later, and those queues would need to have a limit. Blocking prevents usage of resources by others, and also prevents the message sender from returning as soon as possible with whatever results it obtained.
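The "Now" behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this commit: `cooldowns`, `tryURLs` and `errAllInCooldown` are hypothetical names, and the tracker is not safe for concurrent use.

```go
package main

import (
	"errors"
	"time"
)

// errAllInCooldown is returned when no URL could even be attempted.
var errAllInCooldown = errors.New("all urls are in cooldown")

// cooldowns remembers, per URL, the time until which it must not be contacted.
type cooldowns struct {
	until map[string]time.Time
}

func newCooldowns() *cooldowns {
	return &cooldowns{until: make(map[string]time.Time)}
}

// set puts a URL in cooldown for the given duration, starting now.
func (c *cooldowns) set(url string, d time.Duration) {
	c.until[url] = time.Now().Add(d)
}

// active reports whether a URL is still in cooldown.
func (c *cooldowns) active(url string) bool {
	return time.Now().Before(c.until[url])
}

// tryURLs calls fetch on each URL in order and returns the first URL that
// succeeds. URLs in cooldown are skipped immediately rather than slept on,
// so the calling worker never blocks waiting for a cooldown to expire.
func tryURLs(urls []string, cd *cooldowns, fetch func(url string) error) (string, error) {
	lastErr := errAllInCooldown
	for _, u := range urls {
		if cd.active(u) {
			continue
		}
		if err := fetch(u); err != nil {
			lastErr = err
			continue
		}
		return u, nil
	}
	return "", lastErr
}
```

The caller gets an error back as soon as every URL has been skipped or has failed, which lets the message sender return with whatever results it already has.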
This is a proposal to add HTTP retrieval to Boxo. The current state is highly WIP, but I successfully retrieved something over HTTP, so I am posting it to initiate a discussion about the approach and whether we want to pursue it to the end.
Approach
The high-level idea is that most of what lives in bitswap/client is actually an "exchange" implementation, with the only real "Bitswap" thing being that bitswap/network sends HAS/GET requests over bitswap-protocol streams. As such, we should be able to complement bitswap/network with an HTTP-retrieval implementation which, instead of fetching things over the bitswap protocol, calls HTTP endpoints as indicated by the provider's /http address entries.
Note that conceptually at least, this is not adding HTTP retrieval into bitswap, but promoting most of the bitswap code to be a reference "Exchange" implementation, which is re-usable for different retrieval protocols (bitswap, http...). That is, we would be talking of an "exchange network" component and not a "bitswap network" component. Renames to this extent are still missing.
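As a rough illustration of the idea, the sketch below picks between two "exchange networks" based on whether a provider advertises an /http address. Everything here is hypothetical: the `Network` interface is reduced to a name, addresses are plain strings inspected with the standard library (real code would parse multiaddrs properly), and preferring HTTP when both are available is an assumption taken from the commit message rather than a settled design.

```go
package main

import "strings"

// Network is a heavily reduced, hypothetical stand-in for an "exchange
// network": something that can deliver wantlists to a peer.
type Network interface {
	Name() string
}

// namedNetwork is a trivial Network used only for illustration.
type namedNetwork string

func (n namedNetwork) Name() string { return string(n) }

// hasHTTPAddr reports whether any address contains an "http" or "https"
// component. This is a naive string check; it does not validate the address.
func hasHTTPAddr(addrs []string) bool {
	for _, a := range addrs {
		for _, part := range strings.Split(a, "/") {
			if part == "http" || part == "https" {
				return true
			}
		}
	}
	return false
}

// route selects the network used to reach a peer: the HTTP network when the
// peer advertises an HTTP endpoint, the bitswap network otherwise.
func route(addrs []string, httpNet, bitswapNet Network) Network {
	if hasHTTPAddr(addrs) {
		return httpNet
	}
	return bitswapNet
}
```

Because both networks sit behind the same interface, the wantlist management code above them does not need to know which one ends up handling a message.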
Implementation
In order to introduce an http-retrieval "exchange network" we need to:
* […] /http provider.
To this end:
* […] /http addresses in the peerstore of the given peer.
* […] /http endpoints when handling a WANT.
In my tests plugging it into Kubo, the http-network can be used to retrieve content from a gateway over http. 🥳
The main advantage of this approach is that it is relatively clean to incorporate into the codebase, and keeps most of the code untouched, without having to duplicate any of the complex areas.
Challenges
Bitswap places a lot of importance on managing connectivity events to peers. We avoid requesting things from peers that have not signaled connectivity, we clean up peers that have disconnected and re-queue things for peers that disconnect. Thus it seems we must support http-connectivity events. When a libp2p peer connects for bitswap, we know that the connection is set up, the handshake has been performed and protocol negotiation has happened. For HTTP these things may not exist, so we need to define what "Connected" means (e.g. in the case of https it would mean we have completed the TLS handshake).
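One possible definition, sketched here only to make the question concrete, is to derive connectivity events from request outcomes: a peer becomes Connected on its first successful HTTP request and Disconnected after a number of consecutive failures. The type and method names are hypothetical, and the threshold is an assumed configuration value.

```go
package main

// connTracker derives Connected/Disconnected transitions from the outcomes
// of HTTP requests. It is a sketch and is not safe for concurrent use.
type connTracker struct {
	maxRetries int             // consecutive failures before disconnecting
	failures   map[string]int  // consecutive failures per peer
	connected  map[string]bool // current state per peer
}

func newConnTracker(maxRetries int) *connTracker {
	return &connTracker{
		maxRetries: maxRetries,
		failures:   make(map[string]int),
		connected:  make(map[string]bool),
	}
}

// success records a successful request and resets the failure count.
// It returns true only when the peer transitions to Connected, which is
// when a "connected" event would be emitted.
func (t *connTracker) success(peer string) bool {
	t.failures[peer] = 0
	if t.connected[peer] {
		return false
	}
	t.connected[peer] = true
	return true
}

// failure records a failed request. It returns true only when a connected
// peer reaches maxRetries consecutive failures and transitions to
// Disconnected, which is when a "disconnected" event would be emitted.
func (t *connTracker) failure(peer string) bool {
	t.failures[peer]++
	if t.connected[peer] && t.failures[peer] >= t.maxRetries {
		t.connected[peer] = false
		return true
	}
	return false
}
```

Returning the transition rather than the state keeps event emission idempotent: repeated successes or failures do not produce duplicate events.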
Apart from that, the question is which elements in the current bitswap/client stack do not apply to HTTP (peerqueues, messagequeues, broadcast, wantsending, prioritization etc.)... and why not? What if a peer disconnects from bitswap but not from http or vice-versa? What if latency is much worse for bitswap than for http? Perhaps this is all logic for the network-router, which would need to know how to choose which network to use to send messages.
Otherwise perhaps it is not possible to have a satisfactory implementation this way and we need to start thinking about what to copy-paste into a separate "http-exchange" (at least the client part).
Related: #608