Dislog — distributed logging system powered by Raft
year
2023
Dislog is a logging system implemented in Go. It is designed to be scalable, fault-tolerant, and easy to use. It allows you to collect and store logs from multiple sources in real time.
A complete log holds not only the latest state but every state that has existed, which makes it possible to build features that would otherwise be complicated to build. Logs are simple, and that’s why they’re good. That's why Dislog exists.
Dislog follows a distributed system architecture. Nodes communicate over gRPC, an RPC framework that uses binary encoding to reduce network overhead, and every connection is secured with mutual TLS (mTLS). The nodes form a leader and follower relationship using a consensus algorithm called Raft, which is also used by etcd, the key-value store that Kubernetes relies on. A service discovery library maintains cluster membership, using an efficient and lightweight gossip protocol to communicate between nodes. Dislog is containerized and can be orchestrated with Kubernetes, which makes it easy to integrate with existing systems.
During my internship as a Java developer at Inossem Canada Inc., I gained experience with microservices architecture and worked on a logging system that collected operation logs in a warehouse. Through this work, I faced challenges in integrating the logging microservice with other services, such as deciding where to store the logs and how to group them. I worked on multiple solutions, but each required coupling with other microservices to function properly. Unfortunately, this coupling introduced the potential for cascading failure, but we had no other solution at the time, and so we had to deal with the technical debt.
After my internship, I decided to create Dislog to solve this problem for others who may encounter similar logging challenges. Dislog provides only two simple log operations: add and get. I ensured that Dislog doesn't cause any coupling between components in an already complex distributed system, and it's easy to use. Simply deploy it and send requests to it.
Since my internship at Inossem Canada Inc., I've become quite fascinated with distributed systems and the problems they solve. While they do bring new challenges, the guarantees they provide make developing them worth it. This fascination led me to learn more about distributed systems and build one from the ground up.
Challenge
Chronological relationships in distributed systems
I needed a mechanism that made it clear which server, whether follower or leader, held the most recently appended logs. Initially, I planned to use real-time clocks to determine which server had the latest log in case the leader failed and the followers needed to hold an election. However, I soon realized that real-time clocks complicate things too much: they can drift out of sync, which would leave the logs out of sync as well.
When relying on real-time clocks, the time on each server may not be synchronized, which can result in inconsistencies in the logs. For example, suppose Server A appends a log at 10:00:00 and Server B appends a log one second later, but due to clock drift or other issues, Server B's clock is two seconds behind Server A's. Server B then stamps its log 09:59:59, so the log appended by Server B appears to be earlier than the log appended by Server A, even though that's not the case.
Instead, I decided to use something called a term, which is a monotonically increasing integer that indicates how current a server is. It functions as a logical clock: each appended log entry is tagged with the term in which it was written, so servers can compare how current their logs are without consulting wall-clock time. When a candidate begins an election, it increments its term. If the candidate wins the election and becomes the leader, the followers update their terms to match, and the term remains unchanged until the next election.
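The term rules described above can be sketched as follows. This is a simplified illustration, not Dislog's code or any Raft library's API: Node and its methods are hypothetical, and the log comparison follows the rule from the Raft paper (compare the term of the last entry first, then the index of the last entry).

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down server; a real Raft node tracks far more state.
type Node struct {
	CurrentTerm  uint64 // latest term this server has seen
	LastLogTerm  uint64 // term of the last entry in its log
	LastLogIndex uint64 // index of the last entry in its log
}

// StartElection increments the term, as a candidate does before requesting votes.
func (n *Node) StartElection() uint64 {
	n.CurrentTerm++
	return n.CurrentTerm
}

// ObserveTerm adopts a higher term seen from another server; terms never decrease.
func (n *Node) ObserveTerm(term uint64) {
	if term > n.CurrentTerm {
		n.CurrentTerm = term
	}
}

// LogAtLeastAsUpToDate reports whether n's log is at least as current as other's:
// a later last term wins, and equal terms fall back to the longer log.
func (n *Node) LogAtLeastAsUpToDate(other *Node) bool {
	if n.LastLogTerm != other.LastLogTerm {
		return n.LastLogTerm > other.LastLogTerm
	}
	return n.LastLogIndex >= other.LastLogIndex
}

func main() {
	candidate := &Node{CurrentTerm: 3, LastLogTerm: 3, LastLogIndex: 7}
	follower := &Node{CurrentTerm: 3, LastLogTerm: 2, LastLogIndex: 9}

	term := candidate.StartElection() // candidate moves to term 4
	follower.ObserveTerm(term)        // follower catches up to term 4

	fmt.Println(term, follower.CurrentTerm, candidate.LogAtLeastAsUpToDate(follower))
	// prints: 4 4 true
}
```

Note that the follower's log is longer (index 9 versus 7) but ends in an older term, so the candidate's log still counts as more current. No wall-clock time is involved in the comparison.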
Future ideas & what’s next
In terms of future ideas for Dislog, I'm considering adding support for more log operations and extending the system to support more advanced use cases. One possible addition is the ability to search logs based on specific criteria or filters. Additionally, I'm looking into implementing a more efficient way of storing and retrieving logs, such as using a distributed file system like HDFS or a NoSQL database like Cassandra.
As for what's next, I'm planning to continue improving and refining Dislog, while also exploring other distributed systems and learning more about the challenges and solutions in this field. I believe there is still a lot of untapped potential in distributed systems, and I'm excited to see how Dislog can contribute to this area of technology. I also hope to engage with the community and receive feedback from users to improve and evolve Dislog over time.