KEFCore: troubleshooting

`SingletonOptionChanged` at startup

Symptom: An InvalidOperationException whose message contains SingletonOptionChanged is thrown when a DbContext is created.

Cause: You are using UseInternalServiceProvider and two DbContext instances have been configured with different singleton options (e.g. different ApplicationId, BootstrapServers, or serialization types) on the same shared Service Provider.

Solution: Ensure all singleton options are identical across all DbContext instances that share the same Service Provider. See options for the full list. If you need different singleton options, use separate Service Providers.

`ClusterId currently not available`

Symptom: InvalidOperationException: ClusterId currently not available from <BootstrapServers>.

Cause: KEFCore resolves ClusterId by connecting to the Kafka broker when the Service Provider is first initialized. If the broker is not reachable at that point, the resolution fails.

Solution:

Verify the broker address in BootstrapServers is correct and reachable from the application host.
If you need to build the model outside a live cluster (e.g. in integration tests or standalone model inspection), use KEFCoreConventionSetBuilder.Build() or KEFCoreConventionSetBuilder.CreateModelBuilder() — these methods handle cluster suspension internally.

Topic name changed after namespace refactoring

Symptom: After moving entity classes to a different namespace, the application no longer reads existing data from the cluster, or creates new topics instead of reusing existing ones.

Cause: Without [Table] or [KEFCoreTopicAttribute], the topic name is derived from the EF Core entity type Name property which includes the full CLR namespace. Renaming the namespace changes the topic name.

Solution: Always decorate entity classes with [Table("name", Schema = "schema")] or [KEFCoreTopicAttribute("name")] to make the topic name independent of the CLR namespace. See conventions and migration for details.
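The effect of a namespace move can be sketched without a cluster. In the example below the two namespaces and the Order classes are hypothetical; the only real API used is the standard [Table] attribute from System.ComponentModel.DataAnnotations.Schema. How KEFCore combines Name and Schema into the final topic name is not shown and should be checked in the conventions page.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;
using System.Reflection;

namespace Shop.Models            // hypothetical namespace before the refactoring
{
    public class Order { public int Id { get; set; } }
}

namespace Shop.Domain.Sales      // hypothetical namespace after the refactoring
{
    [Table("orders", Schema = "sales")]   // explicit name, unaffected by namespace moves
    public class Order { public int Id { get; set; } }
}

public static class Program
{
    public static void Main()
    {
        // The CLR full name includes the namespace, so it changes when the class moves.
        Console.WriteLine(typeof(Shop.Models.Order).FullName);        // Shop.Models.Order
        Console.WriteLine(typeof(Shop.Domain.Sales.Order).FullName);  // Shop.Domain.Sales.Order

        // The [Table] attribute carries a name that stays constant across refactorings.
        var table = typeof(Shop.Domain.Sales.Order).GetCustomAttribute<TableAttribute>();
        Console.WriteLine($"{table.Schema}.{table.Name}");            // sales.orders
    }
}
```

Any name derived from the CLR full name differs between the two classes, while the attribute values are identical wherever the class lives.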

`ComplexType must implement IEquatable or override Equals`

Symptom: InvalidOperationException at startup identifying a specific ComplexType class.

Cause: KEFCoreComplexTypeEquatableConvention detected a ComplexType that uses reference equality (the .NET default). KEFCore relies on value equality to detect changes — without it, two logically identical instances are treated as different, causing unnecessary Kafka writes.

Solution: Implement IEquatable<T> or override Equals(object) on the ComplexType:

```
[ComplexType]
public class Address : IEquatable<Address>
{
    public string Street { get; set; }
    public string City { get; set; }

    // Value equality: two addresses are equal when all their properties match.
    public bool Equals(Address other)
        => other != null && Street == other.Street && City == other.City;

    // Keep object.Equals and GetHashCode consistent with the typed Equals.
    public override bool Equals(object obj) => Equals(obj as Address);
    public override int GetHashCode() => HashCode.Combine(Street, City);
}
```

If equality is guaranteed by other means, apply [KEFCoreIgnoreEquatableCheckAttribute] to suppress the check. See conventions.
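The difference the convention checks for can be seen in plain .NET, with no KEFCore involved. This is a minimal sketch: PlainAddress is a hypothetical class added only for contrast, and the [ComplexType] attribute is omitted so the snippet compiles on its own.

```csharp
using System;

// Hypothetical type WITHOUT value equality: Equals falls back to reference equality.
public class PlainAddress
{
    public string Street { get; set; }
    public string City { get; set; }
}

// Same shape as the Address example, WITH value equality.
public class Address : IEquatable<Address>
{
    public string Street { get; set; }
    public string City { get; set; }

    public bool Equals(Address other)
        => other != null && Street == other.Street && City == other.City;

    public override bool Equals(object obj) => Equals(obj as Address);
    public override int GetHashCode() => HashCode.Combine(Street, City);
}

public static class Program
{
    public static void Main()
    {
        var p1 = new PlainAddress { Street = "Main St", City = "Rome" };
        var p2 = new PlainAddress { Street = "Main St", City = "Rome" };
        Console.WriteLine(p1.Equals(p2));   // False: distinct instances, reference equality

        var a1 = new Address { Street = "Main St", City = "Rome" };
        var a2 = new Address { Street = "Main St", City = "Rome" };
        Console.WriteLine(a1.Equals(a2));                          // True: same property values
        Console.WriteLine(a1.GetHashCode() == a2.GetHashCode());   // True: hash codes agree
    }
}
```

With PlainAddress, a change detector based on Equals would see every freshly materialized instance as modified, which is the situation the startup check is meant to prevent.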

`ApplicationId` conflict across processes

Symptom: Two processes share the same ApplicationId and cluster, but neither has a complete view of the entity data: queries return partial results.

Cause: Apache Kafka™ Streams assigns partitions across all consumers sharing the same ApplicationId. Each process receives only a subset of partitions and therefore has an incomplete local state store.

Solution: Use a distinct ApplicationId for each process. The ApplicationId is a singleton option — all DbContext instances within the same process share it, but different processes must use different values. See options.
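The partition split can be illustrated with a small simulation. This sketch assumes a simple round-robin distribution as a stand-in for the real Kafka assignor, which is more elaborate; the Assign helper and the process names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Simplified round-robin stand-in for the group assignment (NOT the real Kafka assignor).
    public static Dictionary<string, List<int>> Assign(int partitionCount, IReadOnlyList<string> instances)
    {
        var result = instances.ToDictionary(i => i, _ => new List<int>());
        for (int p = 0; p < partitionCount; p++)
            result[instances[p % instances.Count]].Add(p);
        return result;
    }

    public static void Main()
    {
        // Two processes with the SAME ApplicationId form one group and split 4 partitions.
        var members = new[] { "process-A", "process-B" };
        var shared = Assign(4, members);
        foreach (var member in members)
            Console.WriteLine($"{member}: partitions {string.Join(",", shared[member])}");
        // process-A: partitions 0,2
        // process-B: partitions 1,3

        // With DISTINCT ApplicationIds each process is its own group and owns every partition.
        var own = Assign(4, new[] { "process-A" });
        Console.WriteLine($"process-A: partitions {string.Join(",", own["process-A"])}");
        // process-A: partitions 0,1,2,3
    }
}
```

In the shared case each process materializes only the records of its own partitions, which is why queries against the local state store return partial results.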

Post-`SaveChanges` synchronization timeout

Symptom: Operations after SaveChanges read stale data, or DefaultSynchronizationTimeout expires.

Cause: KEFCore waits for the Streams state store to catch up with the latest produced offset after each SaveChanges. If the store is under load or the timeout is too short, this wait expires.

Solutions:

Increase DefaultSynchronizationTimeout (in milliseconds) or set it to Timeout.Infinite to wait indefinitely.
Verify that event management is enabled for the affected entity (KEFCoreIgnoreEventsAttribute disables synchronization for that entity).
If synchronization is not needed (read-only consumers), set DefaultSynchronizationTimeout = 0 to disable it.
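The catch-up wait described above can be pictured as a polling loop. The following is only an illustrative sketch, not KEFCore's code: WaitForCatchUp and its parameters are hypothetical, and the timeout semantics (0 skips the wait, Timeout.Infinite never gives up) simply mirror the options listed above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Program
{
    // Hypothetical helper: waits until the store offset reaches the produced offset.
    public static bool WaitForCatchUp(Func<long> storeOffset, long producedOffset, int timeoutMs)
    {
        if (timeoutMs == 0) return true;             // synchronization disabled: do not wait
        var clock = Stopwatch.StartNew();
        while (storeOffset() < producedOffset)
        {
            // Timeout.Infinite (-1) means the loop only ends when the store catches up.
            if (timeoutMs != Timeout.Infinite && clock.ElapsedMilliseconds >= timeoutMs)
                return false;                        // timeout expired before the store caught up
            Thread.Sleep(1);
        }
        return true;
    }

    public static void Main()
    {
        long offset = 0;
        // Simulated store that advances by one offset on every poll.
        Console.WriteLine(WaitForCatchUp(() => ++offset, 5, 1000));   // True

        // Simulated store that never advances: the 50 ms timeout expires.
        Console.WriteLine(WaitForCatchUp(() => 0L, 5, 50));           // False
    }
}
```

A store under load behaves like the second case for longer, so raising the timeout gives it more polls before the wait is abandoned.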

`StreamsManager` not starting / state errors

Symptom: InvalidOperationException mentioning Streams state (PENDING_ERROR, ERROR, NOT_RUNNING).

Cause: The Kafka Streams topology failed to start, often due to broker connectivity issues, incompatible StreamsConfig, or a previous unclean shutdown leaving corrupt RocksDB state.

Solutions:

Check broker connectivity and StreamsConfig.BootstrapServers.
If using UsePersistentStorage = true, the RocksDB state directory may be corrupt — delete it and let the store rebuild from the topics.
Check the application logs for the StreamsUncaughtExceptionHandler message which identifies the root cause.

`StreamsException: Fatal user code error in TimestampExtractor callback` — `NullPointerException: Cannot invoke "java.lang.Long.longValue()" because "retVal" is null`

Symptom: Kafka Streams reports a fatal error in the TimestampExtractor callback:

org.apache.kafka.streams.errors.StreamsException: Fatal user code error in TimestampExtractor callback
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Long.longValue()" because "retVal" is null

Cause: The JVM↔CLR callback invoked by the TimestampExtractor returns null instead of a long timestamp. This belongs to a broader class of non-deterministic failures at the JVM↔CLR boundary under sustained call pressure, tracked in JCOBridgePublic#24. The root cause involves GC interactions between the JVM and CLR that application-level workarounds cannot fully mitigate.

Mitigations (in order of effectiveness):

JCOBridge HPA edition — the definitive solution. The HPA (High Performance Application) edition of JCOBridge addresses the non-deterministic GC-boundary failures at the interop layer, eliminating this class of errors entirely under sustained load. See jcobridge.com for details.
Automatic recovery — the StreamsManager error handler already catches this exception and responds with REPLACE_THREAD, which restarts the affected stream thread automatically. For most workloads the recovery is transparent.
Disable event management per entity — removes the TimestampExtractor entirely for the affected entities, eliminating the error at the cost of real-time tracking and post-SaveChanges synchronization for those entities:
```
[KEFCoreIgnoreEventsAttribute]
[Table("HeavyEntity")]
public class HeavyEntity { ... }
```
Note that EnsureSynchronized will not be available for entities with event management disabled.

Note: Related issues are KEFCore#448, KNet#1058 and JNet#856. All are manifestations of the same underlying JVM↔CLR boundary issue documented in JCOBridgePublic#24.

See conventions for how to configure event management per entity.

`EnsureSynchronized` never returns when using transactional producers

Symptom: After tx.Commit(), EnsureSynchronized blocks indefinitely or returns false even though all data was written correctly.

Cause: Kafka transactional producers write control records (commit/abort markers) to the topic in addition to data records, and the latest offset returned by ListOffsets counts these markers. For example, 5 data records occupy offsets 0-4 and the commit marker takes offset 5, so ListOffsets returns 6. If EnsureSynchronized used ListOffsets to set the expected offset, the local Streams store would never report offset 6 (the marker is a control record, not a data record) and the synchronization wait would loop indefinitely.
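The offset arithmetic can be checked with a toy model of a partition log. This is a sketch under simplifying assumptions: the log is a list of flags (false for a data record, true for a control record), and BuildLog, EndOffset and LastDataOffset are hypothetical helpers, not Kafka or KEFCore APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Builds one committed transaction: N data records followed by a commit marker.
    public static List<bool> BuildLog(int dataRecords)
    {
        var log = Enumerable.Repeat(false, dataRecords).ToList();  // false = data record
        log.Add(true);                                             // true = commit marker
        return log;
    }

    // Latest offset as a ListOffsets-style query would report it: one past the last entry.
    public static long EndOffset(List<bool> log) => log.Count;

    // Highest offset held by an actual data record.
    public static long LastDataOffset(List<bool> log) => log.FindLastIndex(isControl => !isControl);

    public static void Main()
    {
        var log = BuildLog(5);
        Console.WriteLine($"Log end offset: {EndOffset(log)}");         // Log end offset: 6
        Console.WriteLine($"Last data offset: {LastDataOffset(log)}");  // Last data offset: 4
    }
}
```

A wait keyed to the log end offset expects progress that a store tracking only data records never reports, which is why the expected offset has to come from the data record offsets themselves.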

Solution: KEFCore automatically skips the LatestOffsetForEntity call in EnsureSynchronized for entity types that belong to a transaction group. For those types the expected offset is set by CommitPendingOffsets via PartitionOffsetWritten immediately after CommitTransaction(), using only the actual data record offsets. No user action is required.

If you still see this issue, verify that isolation.level = read_committed is set in the StreamsConfig of the consuming application. Without it, the Streams consumer will not correctly process the commit marker and the local offset tracking may diverge.

Persistent storage state directory not found after upgrade

Symptom: After upgrading to a version that includes the StorageIdForTable change, the application rebuilds the entire state from Kafka topics instead of resuming from the existing RocksDB checkpoint.

Cause: The storage directory identifier now includes the ClusterId in addition to the topic name (Table_{topicName}_{clusterId}). Previous versions used only the topic name (Table_{topicName}). The existing RocksDB directory is not found under the new identifier, causing a full rebuild.

Solution: This is a one-time cost at the first startup after the upgrade. The state is rebuilt correctly from the Kafka topics. If the topic contains a large amount of data and rebuild time is a concern, plan the upgrade during a maintenance window.
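Why the old directory is missed follows directly from the two naming patterns given above. In this sketch the helper names, the topic name and the cluster id are all hypothetical; only the two patterns come from the description of the change.

```csharp
using System;

public static class Program
{
    // Identifier pattern used by previous versions: topic name only.
    public static string OldStorageId(string topicName) => $"Table_{topicName}";

    // Identifier pattern after the change: topic name plus ClusterId.
    public static string NewStorageId(string topicName, string clusterId) => $"Table_{topicName}_{clusterId}";

    public static void Main()
    {
        const string topic = "orders";        // hypothetical topic name
        const string clusterId = "abc123";    // hypothetical ClusterId

        Console.WriteLine(OldStorageId(topic));             // Table_orders
        Console.WriteLine(NewStorageId(topic, clusterId));  // Table_orders_abc123

        // The identifiers differ, so a lookup under the new one finds no existing state.
        Console.WriteLine(OldStorageId(topic) == NewStorageId(topic, clusterId));  // False
    }
}
```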
