Proposal: Telemetry

Motivation

The Pulp development team struggles with many significant decisions because we lack basic data. This came up at Pulpcon 2021. Here is a sampling of the decisions we struggle with:

  • We launch a lot of our features as “tech preview” to get feedback from users before declaring them stable, since from a semver perspective a stable declaration prevents further changes. Yet we have almost no data on whether users are even using these features. Knowing when to declare an API stable is really hard.

  • Similarly, what about the single container or the installer? How broadly are they being used? It would be valuable for everyone to know whether they are heavily used or not.

  • What about dropping EL7 support (eventually, no plans anytime soon)? When we did this for EL6 we basically guessed. If we knew 3% were using EL7 versus 30% we would be able to serve our users much better.

  • Or what about database versions: when do we raise the minimum supported database version? If, hypothetically, raising the minimum version would affect 23% of the install base, that is key information to know.

Proposal

We prototype some basic telemetry data gathering in pulpcore. I’d like to form a working group to collaborate on this effort, and use this thread for asynchronous discussion around it.

Privacy and Trust

Maintaining the trust of our users is of the utmost importance and should be the prevailing, guiding principle throughout this effort. I hope the working group discusses in detail how this can be done responsibly and with great respect for our users’ privacy and autonomy. I believe transparency and user choice are good strategies to keep in mind.

Would you like to join the working group?
Would you like to follow along with the group’s discussions and give feedback on the more detailed plan it produces?


I’d be interested in joining this working group.


I’m interested as well

The installer supports a lot of different operating systems and their various configs. Collecting some basic information about the system running Pulp would help the installer team decide which features to prioritize. I would be interested in participating.

I’d be interested as well.

I’d be interested to follow along the discussions and progress.

The working group will meet weekly, and we’ll take good notes, so if you can’t make it, or just want to follow along via Discourse updates, that’s fine.

The meeting will be at 9am ET on Thursdays, starting Jan 6th. Everyone who expressed interest received an invite, and the call details are also below. The agenda is here; feel free to add to it. Questions and comments are also welcome here, and we’ll look at them during the meeting.

Telemetry Working Group
Thursday, January 6, 2022 · 9:00 – 10:00am
Google Meet joining info
Video call link: https://meet.google.com/vyw-xbip-vid
Or dial: (US) +1 440-732-0623 PIN: 461 291 013#
More phone numbers: https://tel.meet/vyw-xbip-vid?pin=6000191772917
Or join via SIP: sip:6000191772917@gmeet.redhat.com


Thanks to everyone who joined today’s working group call! The full minutes are posted here (thanks @ggainey for such great notes).

The next meeting will be Friday, January 14 · 9:00 – 10:00am EST (see in your timezone), on the same Google Meet bridge as earlier in the thread.


Thanks to everyone who joined today’s call. The full minutes are posted here. Thanks @ggainey for the great notes!

FYI, we’ve started writing the list of required questions each metric must answer before being added to telemetry. You can see that at the top of the notes too.

One action item is for me to create a very basic PoC against Cloudflare’s key-value store for data collection. More details are in the notes. I hope to share this by next week’s meeting.

Speaking of the next meeting, it’s on Thursday, January 20 · 9:00 – 10:00am (see in your timezone), on the same Google Meet bridge as earlier in the thread.


Thanks to everyone who joined today’s call. The full minutes are posted here. Thanks @ggainey again for the great notes!

The PoC (half done-ish) is available here for comment. As of right now it doesn’t post data to or read data from Cloudflare, but it does dispatch periodic tasks and generate the system ID (a UUID).
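
For anyone following along without reading the PR, here is a minimal sketch of what generating and persisting a system ID could look like. This is not the PoC code: the file-based storage, the function names, and the payload shape are all assumptions made for illustration.

```python
import json
import uuid
from pathlib import Path


def get_or_create_system_id(path: Path) -> str:
    """Return a stable ID for this installation, creating it on first use.

    Hypothetical helper: storing the ID in a file is an assumption; the
    real PoC may well keep it somewhere else (for example, the database).
    """
    if path.exists():
        return path.read_text().strip()
    # uuid4 is random, so the ID is not derived from the hostname,
    # MAC address, or any user data.
    system_id = str(uuid.uuid4())
    path.write_text(system_id)
    return system_id


def build_report(system_id: str, versions: dict) -> str:
    """Serialize one telemetry report as JSON (hypothetical payload shape)."""
    return json.dumps({"system_id": system_id, "versions": versions}, sort_keys=True)
```

Because the ID is read back on every later call, repeated reports from one installation can be deduplicated on the collecting side without knowing anything else about the system.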

This week we mostly focused on the list of questions that any metric to be reported needs to answer. @ggainey took the action item of proposing answers for gathering a redacted version of the status API data. We’ll likely dig into this in the next meeting.

Going forward the meetings will be 30 minutes. The next meeting is Jan 27th 9:30 - 10am (see in your timezone), on the same Google Meet bridge as earlier in the thread.

I’m feeling like I need to wrap up some other work, and I suspect others do too. I’m cancelling tomorrow’s telemetry meeting; our next meeting will be Feb 3rd.

Thanks to everyone who joined today’s call. The full minutes are posted here. Thanks @ggainey again for the great notes!

This week we reviewed and approved a proposal for basic metrics collection of anonymized status-like data. Although this is approved, please share additional thoughts or concerns at any point.

The PoC PR has not progressed further. My goal is to do more on it within two weeks.
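
To make “anonymized status-like data” a bit more concrete, here is a rough sketch of an allow-list approach. The input shape and field names below are assumptions for illustration only; the approved proposal defines the actual fields, and this is not its implementation.

```python
def redact_status(status: dict) -> dict:
    """Keep only allow-listed, non-identifying fields from a status-like dict.

    The input shape (a "versions" list plus lists of online workers and
    content apps) is assumed for this sketch. Anything not named here,
    such as hostnames or storage details, is dropped rather than masked.
    """
    return {
        # Component name -> version string.
        "versions": {v["component"]: v["version"] for v in status.get("versions", [])},
        # Report only counts, never worker or content app names.
        "online_workers": len(status.get("online_workers", [])),
        "online_content_apps": len(status.get("online_content_apps", [])),
    }
```

An allow-list means a new field added to the status data later is not reported until someone deliberately opts it in, which fits the idea of answering the required questions for each metric first.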

Also, @ggainey is going to create a HackMD document to brainstorm questions that telemetry could help us answer.

The next meeting is Feb 10th 9:30 - 10am (see in your timezone), on the same Google Meet bridge as earlier in the thread.

Here’s a HackMD to gather questions we want telemetry to help answer: Suggestions for Things We'd LIke Telemetry On - HackMD

I want to cancel tomorrow’s meeting to allow us to focus on the 3.18 deliverables. I plan to host it on Feb 17th though at the usual time.

The only update I have is that the scheduled task work is being scheduled for pulpcore 3.19 and is now moved to its own PR.

I am cancelling tomorrow’s meeting (Feb 17th) and the week after (Feb 24th) to accommodate several folks on PTO and the 3.18 release. Our next meeting is March 3rd, when I hope we can pick up with the regular weekly meetings.

Hey folks, I’m going to cancel once again. I believe the next step is on me: to port the PoC to use the new scheduled tasks in 3.18. Also, I’d like to see that working before we go too much further in metrics planning. So I’m going to cancel two weeks once more, with the plan to meet on March 17th.

We met today; you can see the notes here: Telemetry Working Group - HackMD

The action item for next week (also in the notes) is to think about how we expect the summarization and representation to work, specifically for the “status-like telemetry data” proposal we already reviewed. That proposal is here: Proposal: gather /status/ telemetry - HackMD

We’re planning to meet again next week, March 24th at 9:30 ET (same bridge info as earlier in the thread).

We met today, and you can see the minutes about what was discussed. Also we identified some graph goals for this first set of data to help inform the PoC.

We’ll be meeting again next week, March 31 ET on the same bridge.

We met today, and you can see the minutes about what was discussed. We primarily discussed details around summarization of data, and making data available or not.
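
As one possible illustration of summarization, here is a sketch that turns raw reports into the share of systems running each version of a component. The report shape (a `system_id` plus a `versions` mapping) and the function name are assumptions, not something the group has decided.

```python
from collections import Counter


def version_share(reports: list, component: str) -> dict:
    """Return the percentage of distinct systems running each version.

    Assumes reports arrive in chronological order, so a later report from
    the same system ID replaces its earlier ones and each system counts once.
    """
    latest = {}
    for report in reports:
        version = report["versions"].get(component)
        if version is not None:
            latest[report["system_id"]] = version
    counts = Counter(latest.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: 100 * n / total for version, n in counts.items()}
```

Only the aggregated percentages would need to be published, which is one way to separate summarizing the data from making the raw reports available.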

One open question is: do we have dev and production installs all post data to one backend, or have two?

I’m hoping to work on the PoC some more to show next week (hopefully finishing the first MVP PoC). We’ll be meeting again next week, April 14 ET on the same bridge.

I’m cancelling tomorrow’s meeting to ensure that when we next meet, we have a working PoC. The 3.20 plugin API breaking changes took higher priority, so that work got the time instead. I plan to work on the telemetry PoC early next week. We’ll meet again on April 27th at the usual time and place (see links and info in the earlier posts).