for a very long time, if a remote server responded to us with
a valid but unsuccessful (HTTP 4xx) response and the caller was the
`send_federation_request` function, we could find ourselves with a
warning message containing only the destination's server name, which
was very unhelpful. the true error was buried away in trace logs.
this was primarily noticeable with our server key fetch requests.
conduit has been throwing away the ruma request error: https://gitlab.com/famedly/conduit/-/blame/next/src/utils/error.rs#L62
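a minimal sketch of the fix, using a simplified stand-in for conduit's
error type (the real variant wraps the destination's server name and a
ruma request error; names here are illustrative):

    use std::fmt;

    // simplified stand-in, not conduit's real type: keep the ruma
    // request error next to the destination instead of discarding it
    struct FederationError {
        destination: String,
        inner: String, // stands in for the ruma error's Display output
    }

    impl fmt::Display for FederationError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // before: only `self.destination` was printed here, leaving
            // the actual cause buried in trace logs
            write!(f, "{}: {}", self.destination, self.inner)
        }
    }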
before: 2024-05-23T04:45:02.930224Z WARN router:{path=/_matrix/client/v3/publicRooms}:handle: conduit_api::client_server::directory: Failed to return our /publicRooms: matrix.org
after: 2024-05-23T05:05:02.435272Z WARN router:{path=/_matrix/client/v3/publicRooms}:handle: conduit_api::client_server::directory: Failed to return our /publicRooms: matrix.org: [401 / M_UNAUTHORIZED] Failed to find any key to satisfy: _FetchKeyRequest(server_name='your.server.name', minimum_valid_until_ts=1716440702337, key_ids=['ed25519:RQB3XPQX'])
Signed-off-by: strawberry <strawberry@puppygock.gay>
this matrix-react-sdk PR (and the sliding sync MSC it cites)
says that they intend to check for sliding sync support
via this unstable feature flag at /versions until the CORS
header handling is specced
https://github.com/matrix-org/matrix-react-sdk/pull/12498
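a minimal sketch of advertising such a flag; the exact flag name that
react-sdk checks is an assumption here:

    use std::collections::BTreeMap;

    // hedged sketch: expose the sliding sync unstable feature flag in
    // the /versions response; "org.matrix.msc3575" is an assumed name
    fn unstable_features() -> BTreeMap<String, bool> {
        let mut features = BTreeMap::new();
        features.insert("org.matrix.msc3575".to_owned(), true);
        features
    }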
Signed-off-by: strawberry <strawberry@puppygock.gay>
Previously we were relying on NIX_CFLAGS_COMPILE, but this is not being
set in static devshells. A cleaner solution for complement would likely
be to build the tests in their own nix derivation instead of building
them in the devshell, but this change unblocks CI for now.
This is causing build failures on Mac:
> In file included from /tmp/nix-build-rocksdb-static-aarch64-apple-darwin-9.1.1.drv-0/source/memory/memory_allocator.cc:8:
> In file included from /tmp/nix-build-rocksdb-static-aarch64-apple-darwin-9.1.1.drv-0/source/memory/jemalloc_nodump_allocator.h:11:
> /tmp/nix-build-rocksdb-static-aarch64-apple-darwin-9.1.1.drv-0/source/port/jemalloc_helper.h:63:36: warning: unknown attribute '_rjem_malloc' ignored [-Wunknown-attributes]
> mallocx(size_t, int) JEMALLOC_ATTR(malloc) JEMALLOC_ALLOC_SIZE(1)
> ^~~~~~
> /nix/store/3bix0kzy670dyhhizri3dwb1qfj3sdpa-jemalloc-static-aarch64-apple-darwin-5.3.0/include/jemalloc/jemalloc.h:412:18: note: expanded from macro 'malloc'
> # define malloc je_malloc
> ^~~~~~~~~
> /nix/store/3bix0kzy670dyhhizri3dwb1qfj3sdpa-jemalloc-static-aarch64-apple-darwin-5.3.0/include/jemalloc/jemalloc.h:75:21: note: expanded from macro 'je_malloc'
> # define je_malloc _rjem_malloc
> ^~~~~~~~~~~~
> /nix/store/3bix0kzy670dyhhizri3dwb1qfj3sdpa-jemalloc-static-aarch64-apple-darwin-5.3.0/include/jemalloc/jemalloc.h:183:43: note: expanded from macro 'JEMALLOC_ATTR'
> # define JEMALLOC_ATTR(s) __attribute__((s))
Full build log at <https://girlboss.ceo/~strawberry/pb/ygJ3>. This is
likely fixable with patches to rocksdb, but not worth it since darwin is
only a dev platform.
CI is running `cargo build --all-features`, so we should be passing all
the features to nix as well.
The only thing this currently affects is the jemalloc_prof feature, but
if we add any non-default features that affect nix in the future, they
will now be handled correctly as well.
Dynamically-linked jemalloc doesn't work due to link-order issues, and we
want CI to be testing a static binary anyway since that's what we're
publishing in releases.
Some of the features affect nix dependencies, so we need to have a
full feature list available when constructing the nix derivation. This
incidentally fixes the bug where we weren't enabling jemalloc on rocksdb
in CI/devshells, because jemalloc is now a default feature. It does not
fix the more general class of that issue, where CI is performing an
`--all-features` build in a nix devshell built for default-features.
I am now passing `--no-default-features` to cargo, and having it use our
unified feature list rather than duplicating the unification inside cargo.
Without setting JEMALLOC_OVERRIDE, we end up linking two different
jemalloc builds: once dynamically, as a transitive dependency through
rocksdb, and a second time statically, via the jemalloc that
tikv-jemalloc-sys builds.
Previously, we were returning redundant member count updates or encrypted
device updates from the /sync endpoint in some cases. The extra member
count updates are spec-compliant, but unnecessary, while the extra
encrypted device updates violate the spec.
The refactor necessary to fix this bug is also necessary to support
filtering on state events in sync.
Details:
Joined room incremental sync needs to examine state events for four
purposes:
1. determining whether we need to return an update to room member counts
2. determining the set of left/joined devices for encrypted rooms
(returned in `device_lists`)
3. returning state events to the client (in `rooms.joined.*.state`)
4. tracking which member events we have sent to the client, so they can
be omitted on future requests when lazy-loading is enabled.
The state events that we need to examine for the first two cases are the
member events in the delta between `since` and the end of `timeline`. For the
second two cases, we need the delta between `since` and the start of
`timeline`, plus contextual member events for any senders that occur in
`timeline`. The second list is subject to filtering, while the first is
not.
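A minimal sketch of the two sets, using hypothetical inputs and helper
names rather than conduwuit's real types:

    // Hypothetical sketch: the two state-event sets come from different
    // deltas, and only the client-facing one is filtered.
    struct Pdu {
        kind: String,
    }

    /// Cases 1/2: member events in the delta between `since` and the
    /// *end* of `timeline`. Never filtered.
    fn membership_changes(delta_to_timeline_end: &[Pdu]) -> Vec<&Pdu> {
        delta_to_timeline_end
            .iter()
            .filter(|e| e.kind == "m.room.member")
            .collect()
    }

    /// Cases 3/4: the delta between `since` and the *start* of
    /// `timeline`, plus contextual member events for timeline senders.
    /// Subject to the client's filter.
    fn client_state<'a>(
        delta_to_timeline_start: &'a [Pdu],
        sender_member_events: &'a [Pdu],
        filter: impl Fn(&Pdu) -> bool,
    ) -> Vec<&'a Pdu> {
        delta_to_timeline_start
            .iter()
            .chain(sender_member_events)
            .filter(|e| filter(*e))
            .collect()
    }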
Before this change, we were using the same set of state events that we are
returning to the client (cases 3/4) to do the analysis for cases 1/2.
In a compliant implementation, this would result in us missing some
relevant member events in 1/2 in addition to seeing redundant member
events. In current conduwuit this is not the case because the set of
events that we return to the client is always a superset of the set that
is needed for cases 1/2. This is because we don't support filtering, and
we have an existing bug[1] where we are returning the delta between
`since` and the end of `timeline` rather than the start.
[1]: https://github.com/girlbossceo/conduwuit/issues/361
Fixing this is necessary to implement filtering because otherwise
we would start missing some member events for member count or encrypted
device updates if the relevant member events are rejected by the filter.
This would be much worse than our current behavior.
This cache can serve invalid responses, and has an extremely low hit
rate.
It serves invalid responses because it's only keyed off
the `since` parameter, while many of the other request parameters also
affect the response or its side effects. This will become worse once we
implement filtering, because there will be a wider space of parameters
with different responses. This problem is fixable, but not worth it
because of the low hit rate.
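If a response cache were kept, its key would need to cover everything
that changes the response; a sketch with illustrative field names, not
conduwuit's real types:

    // Illustrative only: a key that actually distinguishes requests.
    #[derive(Clone, PartialEq, Eq, Hash)]
    struct SyncCacheKey {
        user_id: String,
        device_id: String,
        since: Option<String>,
        filter: Option<String>, // filter ID or inline filter JSON
        full_state: bool,
        // `set_presence` produces side effects, so even this key would
        // not make caching fully safe
    }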
The low hit rate is because normal clients will always issue the next
sync request with `since` set to the `prev_batch` value of the previous
response. The only time we expect to see multiple requests with the same
`since` is when the response is empty, but we don't cache empty
responses.
This was confirmed experimentally by logging cache hits and misses over
15 minutes with a wide variety of clients. This test was run on
matrix.computer.surgery, which has only a few active users, but a
large volume of sync traffic from many rooms. Over the test period, we
had 3 hits and 5309 misses. All hits occurred in the first minute, so I
suspect that they had something to do with client recovery from an
offline state. The clients that were connected during the test are:
- element web
- schildichat web
- iamb
- gomuks
- nheko
- fractal
- fluffychat web
- fluffychat android
- cinny web
- element android
- element X android
Fixes: #336