Open Source and Open Data: Storj DCS Network Statistics

Written by Brandon Iglesias

You might often see or hear us reference our company values. The fact of the matter is that our values, including openness, transparency, and empowering our community, are what drive us as a company and as individuals. Our values are our north star, so when faced with decisions or when we find ourselves at a crossroads, we often reexamine the situation through the lens of our company values.

Our company value of Open means we’re committed to the free and open sharing of software, information, knowledge, and ideas. It’s been shown this kind of openness yields better results in the long run—not just for the company but for the industry and community as well. Open source software has been the cornerstone for innovations such as containers and microservices, private web browsing, and new databases that enable other powerful services. This is why we are committed to open source software.

Since the launch of Storj DCS, our community has been asking for more statistics and data on the network. Some folks in our community have even found ways of reverse-engineering the network to derive statistics about it. A great example of this ingenuity is Storj Net Info. Providing these statistics has always been a goal of ours, but the task has been lower on our roadmap than delivering other critical features that Storj DCS customers need.

We recently began publicly exposing more data about the network in a way that can be used on demand and programmatically. If you missed it, we have started publishing what we think are the most important network statistics on our new Storj DCS Public Network Statistics page. Now, if you’re a non-technical person, this may not be what you expected. Here’s an explanation of why we took this approach.

New members of our community often ask why we don’t build a service like Dropbox or Google Drive instead of a cloud object storage service like Storj DCS. This is because we’re focused on providing the building blocks (the underlying storage layer) for others to build those kinds of applications. By doing this, we can enable dozens of companies to build Dropbox-like services on Storj DCS (or easily migrate their existing applications to the service).

We decided we wanted to take a similar approach with these statistics, so we’re exposing the data in JSON format instead of just providing a dashboard for people to view. On this page, you’ll find statistics such as the amount of data stored and transferred across the network and information about the Nodes on the network. The data on this page is automatically updated every hour so you can make time-series charts.
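Because the statistics are published as JSON and refreshed hourly, a time series can be built by saving one timestamped sample per update. The Python sketch below is only an illustration: the flat JSON layout, the sample values, and the helper names parse_snapshot and append_sample are assumptions rather than the documented format of the statistics page, so check the actual response before relying on it.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload: the field names come from the list in this post,
# but the flat layout and the values are made up for illustration.
SAMPLE_PAYLOAD = '{"storage_total_bytes": 1000, "active_nodes": 10}'


def parse_snapshot(raw: str) -> dict:
    """Decode one JSON snapshot and keep only the numeric fields."""
    data = json.loads(raw)
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def append_sample(series: list, snapshot: dict, when: datetime) -> None:
    """Add a timestamped sample; intended to be called once per hourly update."""
    series.append({"timestamp": when.isoformat(), **snapshot})


series = []
append_sample(
    series,
    parse_snapshot(SAMPLE_PAYLOAD),
    datetime(2021, 1, 1, 0, 0, tzinfo=timezone.utc),
)
print(series[0]["timestamp"], series[0]["storage_total_bytes"])
```

In practice the raw string would come from an HTTP request to the statistics page, and the series would be written to a file or database so that a charting tool can read it.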

You’ll also start seeing these statistics appear on various pages across the site, including our homepage and Node Operator page. These pages will be updated every hour when new data is published on the network statistics page.

The data we are exposing include the following statistics:

Statistics about stored and transferred data

  • bandwidth_bytes_downloaded - number of bytes downloaded (egress) from the network for the last 30 days
  • bandwidth_bytes_uploaded - number of bytes uploaded (ingress) to the network for the last 30 days
  • storage_inline_bytes - number of bytes stored in inline segments on the Satellite
  • storage_inline_segments - number of segments stored inline on the Satellite
  • storage_median_healthy_pieces_count - median number of healthy pieces per segment stored on Storage Nodes
  • storage_min_healthy_pieces_count - minimum number of healthy pieces per segment stored on Storage Nodes
  • storage_remote_bytes - number of bytes stored on Storage Nodes (it does not take into account the expansion factor of erasure encoding)
  • storage_remote_segments - number of segments stored on Storage Nodes
  • storage_remote_segments_lost - number of irreparable segments lost from Storage Nodes
  • storage_total_bytes - total number of bytes (both inline and remote) stored on the network
  • storage_total_objects - total number of objects stored on the network
  • storage_total_pieces - total number of pieces stored on Storage Nodes
  • storage_total_segments - total number of segments stored on Storage Nodes
  • storage_free_capacity_estimate_bytes - a statistical estimate of free Storage Node capacity, with suspicious values removed
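
The byte counters above are raw numbers, so a small helper can make them easier to read. The following Python sketch formats a byte count in decimal units and checks that storage_total_bytes equals the sum of the inline and remote figures, as the definitions above suggest. The helper names are hypothetical, and whether the published values always add up exactly is an assumption.

```python
def human_bytes(n: float) -> str:
    """Format a byte count using decimal (SI) units, e.g. 1500 -> '1.50 KB'."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    for unit in units:
        # Stop at the first unit that keeps the value below 1000,
        # or at the largest unit available.
        if abs(n) < 1000 or unit == units[-1]:
            return f"{n:.2f} {unit}"
        n /= 1000


def total_matches_parts(stats: dict) -> bool:
    """Check storage_total_bytes == inline + remote, per the definitions above."""
    return stats["storage_total_bytes"] == (
        stats["storage_inline_bytes"] + stats["storage_remote_bytes"]
    )
```

For example, human_bytes(stats["storage_remote_bytes"]) gives a display-friendly string for a dashboard, while total_matches_parts(stats) can serve as a quick sanity check on a downloaded snapshot.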

Statistics about Storage Nodes

  • active_nodes - number of Storage Nodes that were successfully contacted within the last 4 hours, excludes disqualified and exited Nodes
  • disqualified_nodes - number of disqualified Storage Nodes
  • exited_nodes - number of Storage Nodes that gracefully exited the Satellite, excludes disqualified Nodes
  • offline_nodes - number of Storage Nodes that were not successfully contacted within the last 4 hours, excludes disqualified and exited Nodes
  • suspended_nodes - number of suspended Storage Nodes, excludes disqualified and exited Nodes
  • total_nodes - total number of unique Storage Nodes that ever contacted the Satellite
  • vetted_nodes - number of vetted Storage Nodes, excludes disqualified and exited Nodes
  • full_nodes - number of Storage Nodes without free disk
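
Since active_nodes and offline_nodes both exclude disqualified and exited Nodes, one derived metric is the share of remaining Nodes that are currently offline. The Python sketch below assumes that every Node that is neither disqualified nor exited is counted as either active or offline; the function name is hypothetical.

```python
def offline_share(stats: dict) -> float:
    """Fraction of non-disqualified, non-exited Nodes that are offline.

    Assumes active_nodes + offline_nodes covers all such Nodes.
    """
    pool = stats["active_nodes"] + stats["offline_nodes"]
    if pool == 0:
        # Avoid dividing by zero when no Nodes are reported.
        return 0.0
    return stats["offline_nodes"] / pool
```

Tracking this ratio over hourly snapshots gives a rough view of Node availability over time.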

Statistics about user accounts

  • registered_accounts - number of registered user accounts

Since we launched this, one of our community members built this really cool Grafana dashboard. Check it out. We’ll be sharing more about this and other community-built dashboards in the coming weeks, but we hope that exposing this data will continue to enable others to build amazing things like this!

As we continue to expand on the data points we expose, we’ll be adding more of this data to our website as well. If you have any ideas or suggestions on what else we should be exposing, please open a GitHub issue in the repository for this project.
