HTTP retrieval proposal #747

Open: wants to merge 85 commits into main


@hsanjuan (Contributor) commented Dec 9, 2024

This is a proposal to add HTTP retrieval to Boxo. The current state is highly WIP, but I successfully retrieved something over HTTP, so I am posting it to start a discussion about the approach and whether we want to pursue it to completion.

Approach

The high-level idea is that most of what lives in bitswap/client is actually an "exchange" implementation, with the only real "Bitswap" thing being that bitswap/network sends HAS/GET requests over bitswap-protocol streams. As such, we should be able to complement bitswap/network with an HTTP-retrieval implementation which, instead of fetching things over the bitswap protocol, calls HTTP endpoints as indicated by the provider's /http address entries.

Note that, conceptually at least, this is not adding HTTP retrieval into bitswap, but promoting most of the bitswap code to be a reference "Exchange" implementation that is reusable for different retrieval protocols (bitswap, HTTP...). That is, we would be talking about an "exchange network" component and not a "bitswap network" component. Renames to that effect are still missing.

Implementation

In order to introduce an http-retrieval "exchange network" we need to:

  • Know when something should be retrieved via HTTP, that is, when an item has an /http provider.
  • Use the HTTP network for that.

To this end:

  • We have a router which selects the http-network or the bitswap-network (or both) based on the existence of /http addresses in the peerstore for the given peer.
  • We have implemented an http-network as a PoC that performs GET requests to /http endpoints when handling a WANT.
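To make the routing idea concrete, here is a minimal Go sketch of a router that picks a network per peer by looking for an http/https component in the peer's addresses, preferring HTTP when one is present. The `Network` interface, `Router`, `hasHTTPAddr`, and the string-based address matching are hypothetical simplifications, not Boxo's API; real code would parse multiaddrs properly instead of splitting strings.

```go
package main

import (
	"fmt"
	"strings"
)

// Network is a hypothetical, much-reduced stand-in for an "exchange network":
// something that can deliver a wantlist to a peer.
type Network interface {
	Name() string
	SendWants(peer string, cids []string) error
}

// logNetwork is a toy Network that only records what it was asked to send.
type logNetwork struct {
	name string
	sent []string
}

func (n *logNetwork) Name() string { return n.name }

func (n *logNetwork) SendWants(peer string, cids []string) error {
	n.sent = append(n.sent, peer+":"+strings.Join(cids, ","))
	return nil
}

// hasHTTPAddr reports whether any multiaddr (kept as a plain string here)
// contains an "http" or "https" protocol component.
func hasHTTPAddr(addrs []string) bool {
	for _, a := range addrs {
		for _, part := range strings.Split(a, "/") {
			if part == "http" || part == "https" {
				return true
			}
		}
	}
	return false
}

// Router picks a network per peer based on the addresses in its peerstore,
// preferring HTTP whenever the peer advertises an HTTP endpoint.
type Router struct {
	peerstore map[string][]string // peer ID -> multiaddr strings
	http      Network
	bitswap   Network
}

func (r *Router) pick(peer string) Network {
	if hasHTTPAddr(r.peerstore[peer]) {
		return r.http
	}
	return r.bitswap
}

// SendWants forwards the wantlist to whichever network pick selects.
func (r *Router) SendWants(peer string, cids []string) error {
	return r.pick(peer).SendWants(peer, cids)
}

func main() {
	httpNet := &logNetwork{name: "http"}
	bsNet := &logNetwork{name: "bitswap"}
	r := &Router{
		peerstore: map[string][]string{
			"peerA": {"/dns4/gw.example.com/tcp/443/tls/http"},
			"peerB": {"/ip4/192.0.2.1/tcp/4001"},
		},
		http:    httpNet,
		bitswap: bsNet,
	}
	_ = r.SendWants("peerA", []string{"cid1"})
	_ = r.SendWants("peerB", []string{"cid2"})
	fmt.Println(httpNet.sent, bsNet.sent) // [peerA:cid1] [peerB:cid2]
}
```

Keeping the choice inside a single `pick` method means the rest of the client never needs to know which transport served a given peer.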


In my tests plugging it to Kubo, the http-network can be used to retrieve content from a gateway over http. 🥳

The main advantage of this approach is that it is relatively clean to incorporate into the codebase and leaves most of the code untouched, without having to duplicate any of the complex areas.

Challenges

  • Connectivity tracking is not implemented yet, and we will have to see to what extent it can be implemented (I'm guessing we can plug into the TCP dialer directly).
  • Options like timeouts etc. are not implemented.
  • We use a single HTTP client rather than a pool.
  • Of course, tests are entirely missing.

Bitswap places a lot of importance on managing connectivity events to peers. We avoid requesting things from peers that have not signaled connectivity, we clean up peers that have disconnected, and we re-queue things for peers that disconnect. Thus it seems we must support HTTP connectivity events. When a libp2p peer connects for bitswap, we know that the connection is set up, the handshake has been performed, and protocol negotiation has happened. For HTTP these things may not exist, so we need to define what "Connected" means (e.g. in the case of HTTPS it would mean we have completed the TLS handshake).

Apart from that, the question is which elements in the current bitswap/client stack do not apply to HTTP (peerqueues, messagequeues, broadcast, wantsending, prioritization, etc.), and why not? What if a peer disconnects from bitswap but not from HTTP, or vice versa? What if latency is much worse for bitswap than for HTTP? Perhaps all of this is logic the network router needs in order to choose which network to use for sending messages.

Otherwise, perhaps it is not possible to have a satisfactory implementation this way, and we need to start thinking about what to copy-paste into a separate "http-exchange" (at least the client part).

Related: #608

@hsanjuan self-assigned this Dec 9, 2024
@hsanjuan requested a review from a team as a code owner December 9, 2024 19:44
@lidel (Member) previously requested changes Dec 13, 2024 and left a comment:

Thank you @hsanjuan, it would be extremely nice if we can pull this off with such a small set of changes.

Once we have HTTP basics like user-agent, status code metrics, and 503/429/Retry-After handling (details inline), this is worth testing on Rainbow staging (do an A/B test with a bitswap-only box and a bitswap+HTTP one).

P.S. Whatever we do, HTTP should be opt-in, with a big EXPERIMENTAL warning.

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 57.08447% with 630 lines in your changes missing coverage. Please review.

Project coverage is 60.29%. Comparing base (9e4f046) to head (bb9597d).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bitswap/network/httpnet/httpnet.go | 54.75% | 185 Missing and 10 partials |
| bitswap/network/router.go | 0.00% | 142 Missing |
| bitswap/network/httpnet/msg_sender.go | 70.32% | 98 Missing and 13 partials |
| bitswap/network/httpnet/pinger.go | 15.57% | 102 Missing and 1 partial |
| bitswap/network/httpnet/cooldown.go | 65.57% | 20 Missing and 1 partial |
| bitswap/network/bsnet/ipfs_impl.go | 48.71% | 20 Missing |
| bitswap/network/http_multiaddr.go | 77.90% | 13 Missing and 6 partials |
| bitswap/network/httpnet/metrics.go | 87.95% | 10 Missing |
| bitswap/message/message.go | 55.55% | 3 Missing and 1 partial |
| bitswap/network/httpnet/request_tracker.go | 95.38% | 2 Missing and 1 partial |
| ... and 1 more | | |


@@            Coverage Diff             @@
##             main     #747      +/-   ##
==========================================
- Coverage   60.55%   60.29%   -0.27%     
==========================================
  Files         244      253       +9     
  Lines       31136    32563    +1427     
==========================================
+ Hits        18855    19634     +779     
- Misses      10604    11216     +612     
- Partials     1677     1713      +36     
| Files with missing lines | Coverage Δ |
|---|---|
| bitswap/client/client.go | 82.69% <100.00%> (-2.08%) |
| bitswap/client/internal/peermanager/peermanager.go | 91.79% <100.00%> (-0.07%) |
| bitswap/network/bsnet/metrics.go | 100.00% <100.00%> (ø) |
| bitswap/network/bsnet/options.go | 50.00% <ø> (ø) |
| bitswap/network/connecteventmanager.go | 88.54% <100.00%> (+2.29%) |
| bitswap/server/server.go | 56.85% <100.00%> (+0.80%) |
| bitswap/testinstance/testinstance.go | 86.44% <ø> (ø) |
| bitswap/testnet/peernet.go | 38.46% <100.00%> (ø) |
| examples/bitswap-transfer/main.go | 41.21% <ø> (ø) |
| ...uting/providerquerymanager/providerquerymanager.go | 87.53% <100.00%> (-0.49%) |
| ... and 11 more | |

... and 6 files with indirect coverage changes

@hsanjuan force-pushed the http-retr2 branch 2 times, most recently from 5a73303 to c6a1b06, January 16, 2025 17:54
@hsanjuan requested a review from a team January 16, 2025 17:54
@hsanjuan (Contributor, Author) left a comment:

Self-review.

@guillaumemichel (Contributor) left a comment:

To me, Bitswap is the name of the protocol (and client behaviour), and it can use either HTTP or libp2p to communicate with remote peers. But then HTTP servers don't exactly follow the Bitswap spec, since they don't handle CANCEL messages (see comment).

My suggestion would be to name the folders network/libp2p and network/http.

@hsanjuan (Contributor, Author) commented:

Note about the two scenarios regarding peerIDs:

  • A: The HTTP endpoint has a separate, unique peer ID, different from the bitswap one. In this model (web3storage), the bitswap endpoint counts as a different peer altogether, and therefore it gets a separate p2p connection, wantlist, etc. In this case, even if both endpoints belong to the same provider, they both get WANT-discovery requests and are considered entirely different providers throughout the system.

  • B: The HTTP endpoint uses the same peer ID as bitswap. Assuming the provider records contain bitswap and HTTP entries under the same peer ID (something Kubo could do, for example), the network router will prioritize HTTP for all operations. Other peers can speak bitswap to us, and our bitswap server can send responses with SendMessage(), but we will default to HTTP endpoints for wantlists, latency, pings, disconnects, etc. This means we will not be using the Bitswap client on our side, and we will not be sending bitswap traffic to other peers. Our server will still work and respond to bitswap requests. Other peers' bitswap servers, however, will not learn about our wantlists via bitswap if they offer an HTTP endpoint, because we don't attempt to establish a libp2p connection at all. In an ideal scenario where every Kubo node in the network offers an HTTP endpoint, we can imagine no bitswap traffic at all.

Resolving the issues in scenario A implies:

  • Identifying that two peer IDs correspond to the same provider.
  • Prioritizing the HTTP peer ID.
  • Failing over to the bitswap peer ID when HTTP fails (?).
  • This possibly needs to be done at the routing layer, but it is difficult since there is no indication that two providers are the same, other than perhaps matching DNS names.

Resolving the issues in scenario B implies:

  • Ensuring that the bitswap server is initialized with a bitswap network, for safety: there is no reason for the bitswap server to use the network router. DisconnectFrom() should close p2p streams rather than guessing whether it should do HTTP cleanups. (Currently DisconnectFrom() is only called from the server, but still.)
  • We could add logic to fail over to bitswap when HTTP fails. This is easy for the initial Connect(). It is trickier when HTTP worked for some content records and failed badly for others. We need to be careful to store bitswap addresses in the peerstore, and to keep them when deleting HTTP addresses after errors. There are also cases where Connect() works but retrieval fails. How do we know that on the next Connect() we should not attempt HTTP? The connect/disconnect logic, including the results from the message sender, needs to be fine-tuned.
  • We cannot Connect() over both bitswap and HTTP at the same time, since that can trigger competing connection-manager events: everything else only cares about the peer ID. So a bitswap failure would stop HTTP wantlists for that peer.

@hsanjuan (Contributor, Author) commented:

Thank you everyone for the reviews! I am addressing comments and resolving them as I go, but if I don't react to something, let me know (there are many comments, replies to comments attached to outdated code, etc.).

This and subsequent commits introduce an httpnet module at what is known as
the "bitswap network layer". The bitswap network layer connects bitswap-peers,
sends bitswap messages and receives responses.

Bitswap messages are basically a wantlist, a list of CIDs that should be sent
if available.

httpnet does the same, except that instead of sending the bitswap message over
the bitswap protocol, it triggers http requests for the requested blocks. httpnet
is a drop-in add-on so that we can request blocks over http, and not only via bitswap.

As httpnet is a network, it benefits from all existing wantlist management
logic. Any http/2 endpoint should benefit from streamlined requests on a
single http connection. A router-network ensures that messages are correctly
handled by bitswap or by http requests depending on what the peers are
advertising. HTTP requests are given priority in the presence of both.

Here are some of the httpnet features:

* Peers are marked as Connected when they are able to handle http requests.
* Peers are marked as Disconnected when http requests fail repeatedly (MaxRetries).
* Server errors trigger backoffs that prevent further requests to the same
  url for a period (Retry-After header or configuration value).
* We support several urls per peer, meaning a peer can provide alternative
  http endpoints which are tried based on number of failures or existing cooldowns.
* We translate HAVE requests to HTTP-HEAD requests and BLOCK requests to HTTP-GETs
* We support cancellations: ongoing or soon to happen requests for a CID
  can be cancelled using a "cancel" entry in the wantlist.
* We record latency information for peers by pinging regularly.
* We discriminate between different errors so that we know whether to
  move to the next block in a wantlist, or to retry with a different url,
  or to completely abort.
* Options to configure user-agent, max retries etc. are supported.
@hsanjuan (Contributor, Author) commented:

Missing:

  • Collect metrics using labels. Set the endpoint as a label too, to have stats for different http providers. Pending "Feat: Add CounterVec type" (go-metrics-interface#19).
  • Clarify HTTP/3 and UDP support.
  • Clarify acceptable responses for Connect probe.
  • Check code paths and potentially add some more tests
  • Decide if we remove request_tracker.go altogether or not.

Before: when a url is in cooldown, we sleep for the rest of the cooldown and
then proceed with the request.

Now: we return an error and try with the next url until no more to try.

The reason is we should not block a worker. We can avoid contacting servers during cooldowns, but we cannot schedule requests to be tried later or have workers waiting.

Scheduling is a footgun, as it requires leaving some requests hanging for later, and those queues would need a limit.

Blocking prevents usage of resources by others, and also prevents the message sender from returning as soon as possible with whatever results it obtained.
@hsanjuan dismissed lidel’s stale review February 26, 2025 15:04

Comments mostly addressed.

4 participants