
039 – How PEX Fingerprinted 20 Billion Audio and Video Files and Turned It Into a Product to Help Musicians, Artists and Creators Monetize their Work

Experiencing Data with Brian T. O'Neill

Every now and then, I like to insert a music-and-data episode into the show since hey, I’m a musician, and I’m the host 😉 Today is one of those days!

Rasty Turek is founder and CEO of Pex, a leading analytics and rights management platform used for discovering and tracking video and audio content using data science.

Pex’s AI crawls the internet for user-generated content (UGC), identifies copyrighted audio/visual content, indexes the media, and then enables rights holders to understand where their art is being used so it can be monetized. Pex’s goal is to help its customers understand who is using their licensed content, and what they are using it for — along with key insights to support monetization initiatives and negotiations with UGC platform providers.

In this episode of Experiencing Data, we discuss:

  • How the data science behind Pex works in terms of being able to fingerprint actual songs (the underlying IP of a composition) vs. masters (actual audio recordings of songs)
  • The challenges Pex has in identifying complex, audio-rich user-generated content and cover recordings, and ensuring it is indexing as many usages as possible.
  • The transitioning UGC market, and how Pex is trying to facilitate change. One item that Rasty discusses is Europe’s new Copyright Directive law, and how it’s impacting UGC from a licensing standpoint.
  • How analytics are empowering publishers, giving them key insights and firepower to negotiate with UGC platforms over licensed content.
  • Key product design and UX considerations that Pex has taken to make their analytics useful to customers
  • What Rasty learned through his software iteration journey at Pex, including a memorable example about bias that influenced future iterations of the design/UI/UX
  • How Pex predicts and prioritizes monetization opportunities for customers, and how they surface infringements.
  • Why copyright education is the “last bastion of the internet” — and the role that Pex is playing in streamlining copyrighted material.

Brian also challenges Rasty directly, asking him how the Pex platform balances flexibility with complexity when dealing with extremely large data sets.

Quotes from Today’s Episode

“I will say, 80 to 90 percent of the population eventually will be rights owners of some sort, since this is how copyright works. Everybody that produces something is immediately a rights owner, but I think most of us will eventually generate our livelihood through some form of IP, especially if you believe that the machines are going to take the manual labor from us.” - Rasty

“When people ask me how it is to run a big data company, I always tell them I wish we were not [a big data company], because I would much rather have “small data,” and have a very good business, rather than big data.” - Rasty

“There's a lot of these companies that [have operated] in this field for 20 to 30 years, we just took it a little bit further. We adjusted it towards the UGC world, and we focused on simplicity” - Rasty

“We don't follow users, we follow content. And so, at some point [during our design process] we were exploring if we could follow users [of our customers’ copyrighted content].... As we explored this more, we started noticing that [our customers] started making incorrect decisions because they were biased towards users [of their copyrighted content].” - Rasty

“If you think that your general customer is a coastal elite, but the reality is that they are Midwest farmers, you don't want to see that as the reality and you start being biased towards that. So, we immediately started removing that data and really focused on the content itself—because that content is not biased.” - Rasty

“[Re: Pex’s design process] We always started with the guiding principles. What is the task that you're trying to solve? So, for instance, if your task is to monetize your content, then obviously you want to monetize the most obvious content that will get the most views, right?” - Rasty



Brian: Welcome back to Experiencing Data. Today, we have a special edition of the show. I think I've told listeners before, I have a background in music originally, and I'm a professional musician as well as my consulting work. And so, whenever I have the chance to do a music and data, or music and analytics oriented episode, I always jump on those because they're super fun for me, and I think it's just a fun topic that everyone can, kind of, enjoy from an entertainment perspective, but also from a learning perspective. So, today, I'm happy to have Rasty Turek on the line, who's the CEO of Pex. Hey Rasty, how's it going?

Rasty: Good. Thank you for having me.

Brian: Excellent. What the heck is Pex?

Rasty: [laughs]. Well, we call it—

Brian: Oh, it's not a candy thing right with the little head that flips out, you know?

Rasty: That's Pez. [laughs]. Actually, Pex is from a Czech word, Pexeso, which stands for [00:01:20 “pekelně se soustřed!”] It's an acronym. And the word is the Memory Game, or Game of Memory, as it is known in English. And our name stems from the idea of what we are doing, so we can identify two pieces of content against each other, and the game is based on the fact that you're flipping through images that are on their underside or essentially on their face. And so, you don't see the images and you have to identify the same ones. That was the origin of this. The problem of the name is that most Americans cannot pronounce it. And so, after nine months of negotiations with the original owner of the [domain], we were able to acquire the domain and shortened our name.

Brian: Cool, cool. So, tell me about what Pex does for our listeners. You’re in the music analytics and rights management, but what does that mean to someone who doesn't really work in the music business?

Rasty: Well, we call it Google for audio/visual content. In reality, we, kind of, work the same way. So, our services crawl the internet for audio/visual content, index it, and are able to search through it, and then we expose that to our customers, which are usually rights owners, on both the digital rights management—meaning we identify content that was shared without their permissions, or without the licenses and/or we help them to understand the general what is called UGC, User Generated Content world of YouTube, Facebook, TikTok, and many others.

Brian: So, when you talk about rights management then here are we talking about—so in the music world, we have what's called the song, or the IP, right? So, this is, like “Let It Be” by The Beatles, right? But I can go and record “Let It Be” on my own record, and my recording of that song is called a master. So, are you searching all the masters and helping the rights holders of recordings, or are you helping the composers that actually wrote the songs? Who's your customer in that sense?

Rasty: We actually do both. So, we are able to technically identify four distinguished copyrights. There is five: text, images, video, sound recording, and composition, we can do four of five. So, we do everything but text, and so we are able to separately identify melody, which is essentially what comes from the composer, or the writer and separately identify the sound recordings. And so, by these, we are able to serve both the labels, and the publishers—or essentially that side of the world—and the technology is built to deal with a lot of distortion, so when people re-record songs in a different style—so for instance, you do a piano version of something that was done in a guitar, or you have A cappella or something similar, our systems are able to connect these dots between each other.

And so, that's on the music side, on the video side also, we deal with a lot of different distortions, like horizontally swapped images, and cropped, and a lot of other things. And we are able to identify somewhere from one second of the content, so fairly short segments, and these allows us to identify also mixes and remixes, so when people moosh together multiple songs, or when they are making a longer set of songs, we are pretty good at that too.

Brian: Got it. So, I'm curious, without getting too into the technical details here, but how does the technology work in terms of identifying, say that the song, the copyright in the actual intellectual property of the song? I assume you're using machine learning for this, but I would think it would take a lot of versions of the same song before you could identify the song. Is that correct, or that's not really the approach that you took to actually identify these recordings?

Rasty: Actually, so the system is very flexible. So, where it's maybe easier to illustrate this is that there is—in video where, essentially, we can use a camera recorded version from cinema, and then identify any other subsequent version of it, be it DVD, Blu-ray, a lot of other distortion. And so, the quality doesn't necessarily matter for the system and something similar is applied to the audio. And this is achieved through something that is called perceptual hashing. It's a very concrete type of mathematical formula that essentially looks for perceptual changes rather than changes of content, meaning the mathematical.

So, as you know, everything is zeros and ones in computers, so what is a change to a computer is not necessarily a change to a person. So, there are these things called codecs that allow different forms of compression, of content. To computer, these things are inherently different. To humans, they are not. And so, perceptual hashing exists to teach computers about differences that humans perceive rather than the other way around.

And so, we utilize these kinds of concepts of perceptual hashing, and then we built on top of that internally, of our own algorithms that are correctly identified as a machine learning, not necessarily deep learning. So, these are not what is currently called AI. These are what the industry calls signal processing. And so, there are formulas that essentially allow other algorithms to identify changes that humans will perceive, and through these changes is able to identify the content. And so, there's a lot of distortions that can come in like sped up, slow down, high pitch, low pitch, and a lot of other things with the music, and this is what the perceptual algorithm is built [00:07:38 around].

So, the idea is—and this is why we are able to identify things that are, when they change instruments or when, for instance, someone whistles the melody, because the algorithm is not looking for is there a difference between those two particular files, but what it’s looking for is, is there a difference between the logic of them, the way that a human’s ear will approach this. And so, this is an understood sector, I will say. Most things started in the 60s. The first significant companies out of Yamaha came like 90s. Shazam is 2000.

So, there's a lot of these companies that operate in this field for 20 to 30 years, we just took it a little bit further. We adjusted it towards the UGC world, and we focused on simplicity, in the operational parameters. So, that means there is a difference when you have 60 million songs and you're trying to identify those, and, as we have, 20 billion videos, and you're trying to identify those, because the difference between the song and video is quite significant, especially on the audio side, where normal video, like, if you go out and you start recording on your phone, there will be thousands of different sounds coming in. While song is built around the fact that it is shielded from every other interference. And so, you have these very narrow sounds within and so songs are significantly easier than for instance videos that are from the normal world, and so we had to adjust to those. And we did, I will say, pretty good job over the last six years.
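
To make the perceptual hashing idea Rasty describes above a little more concrete, here is a minimal sketch in Python. It uses a simple "average hash" over a tiny grayscale grid, a textbook technique chosen only for illustration; it is not Pex's actual algorithm, which the conversation does not detail. The pixel values and the function names `average_hash` and `hamming` are invented for the example.

```python
def average_hash(pixels):
    """Return a bit string with 1 wherever a pixel is brighter than the image mean.

    Illustrative stand-in for a perceptual hash, not Pex's method.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Count the positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# A made-up 4x4 grayscale "frame" with alternating dark and bright columns.
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 190, 40, 215],
    [8, 205, 35, 225],
]

# Simulated re-encoding: every pixel gets slightly brighter, with a little noise.
# The raw values all change, but the picture looks the same to a person.
reencoded = [
    [min(255, p + 7 + (i + j) % 3) for j, p in enumerate(row)]
    for i, row in enumerate(original)
]

# A genuinely different picture: the same frame with brightness inverted.
inverted = [[255 - p for p in row] for row in original]

print(original == reencoded)                                       # raw data differs
print(hamming(average_hash(original), average_hash(reencoded)))    # perceptual distance
print(hamming(average_hash(original), average_hash(inverted)))     # perceptual distance
```

The point of the sketch is the contrast Rasty draws: the re-encoded frame is a different file as far as the computer is concerned, yet its hash matches the original, while the inverted frame produces a hash that differs in every bit.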

Brian: Yeah, I would think it's quite good at identifying master recording use, right? Like someone else's recording being played in the background of a video. Is it safe to assume that the platform—the analytics are more accurate on identifying that, than it is on identifying say a cover version of someone else's song, or it's about equal in terms of the accuracy?

Rasty: It's slightly less on the cover versions, mostly because we are running into troubles. For instance, we have, as a test case, a regular version of Beatles song, which you will, as a person, be able to identify quite easily; a computer is not, because what you are listening to is not the melody, you are listening to the lyrics, you are listening to all of these things while the computer is really narrow. And so, computers are these expert systems rather than they are these general systems like humans are. And so, this is the discussion between AGI versus AI, or Narrow AI.

And so, this is kind of the same thing for us. So, the cover versions are a little bit slightly less on the identification side, but at the current world, we are able to still identify around 70 to 80 percent, and that's usually good enough when you are talking about 20 billion videos and songs that are uploaded to the internet today, with an additional 50 million being uploaded every single day. And so, at the end of the day, the quantity always trumps the quality when it comes to the massive scale like these.

Brian: Right. So, I'm going to just summarize what I think the customer and the use cases are here, and then we can kind of jump into what some of those are because I'm curious to understand how the end-users actually perceive your analytics, how they use them, how did you decide what to present? So, in short, the reason I would buy Pex, or subscribe to Pex if I'm the user is, I'm a publisher, probably. I’m in charge of monetizing an artist’s catalog. So, they've got 10 CDs out, 100 songs, and that artist isn’t spending all day chasing down every YouTube and trying to monetize, someone used my song, and I need to monetize that. That's really the job of the publisher to do placements, and whatever, commercials on TV, and this kind of thing.

So, the question is when there's user-generated content, it's really hard to keep up with all the usages of quote, “My song.” And so, my publisher, if I'm the artist, they're the person that's going to use Pex to go out and figure out where digitally—you guys are tracking—where are all of my copyrights being used that I don't know about? And then, I have the option, I guess, to either monetize, or request takedown, or what's the use? What's the business value there? Is it the ability to monetize it immediately, or to take it down, or to do something else? What would I do as the publisher in that case?

Rasty: Well, that depends on the platform. So, some platforms allow additional monetization, some platforms don’t. What is a pretty big use case in general, is also general licensing. And so, it's not necessarily you as a singular composer or singular artist. The publisher represents a lot of them, and so mooshing them into a bucket, we can tell them what's their makeup of the platform. So, this is your catalog against this platform, you represent, let's say 14 percent of all traffic to that platform. And then, the publisher goes to the platform and says, “Look, I can remove all my content. That's my legal right. Or we can find a way how to financially benefit both.”

And so, this is more and more popular. This is also a thing that is happening now in Europe. So, Europe passed a new law middle of last year, called Copyright Directive, more specifically Article 17, which changes the paradigm how UGC platforms operate. So, under DMCA, Digital Millennium Copyright Act in the United States, they get something called Safe Harbor, which essentially allows them to have any content they want without any legal ramifications. The Copyright Directive is changing this 180 degrees, saying, from now on, you have to have license to a hundred percent of your content, and so this is where our technology can help out platforms with identification, licenses, and other things.

But there's always a point of analytics and inherent action. So, a lot of creators, rights holders don't want their content in certain settings. So, for instance, Disney doesn't want to have their content on adult platforms, or there's a lot of these kind of decisions that already go through themselves, versus a lot of musicians don’t want to have their songs to be played in certain podcasts that are maybe, you know, in certain themes that are political or others and so, we give them control over these decisions, and then they make the further decisions based on data that we surface.

Brian: Got it, got it. So, would you qualify it as, then, the analytics are largely providing a form of decision support? Like ammunition to go in and have a conversation with a large platform, and say, “We know what's going on, we know where our catalog’s being used, we know what that's worth to you.” Is that the kind of the value Pex provides, and then they go and have a separate conversation about how to do a deal, to formalize that?

Rasty: We definitely facilitate those. There is so much more. Our analytics is much wider, and so this is one of the use cases. We have plenty of others, and it really depends on their interest at a time. And so, what we tend to see a lot is repeated business with most of the rights holders, because they realize that we can support from a basic level, when you are doing a promo for a certain song or certain album or certain video, up to the ladder of a whole platform or category.

So, for instance, which podcasts are booming these days, and what kind of music is being used, and what gaming content being popular, and a lot of other stuff that essentially correlates with this. And so, the analytics is not only wider but is much deeper. And it really depends on the creator and their interest at a time. But we tend to cover most of the holes over a few months once they realize what we can do.

Brian: So, I would assume you probably have a small number of publishers who represent giant catalogs and then maybe a long tail of people with small to medium-sized catalogs. Is that a safe assumption, or…?

Rasty: We currently work mostly with the large ones. Through trade organizations and groups, we gave access to the small ones, too, and we are hoping to open up more widely within a few months. But for now, the largest rights holders are especially our current customers.

Brian: Got it. So, how did you go about deciding—okay, you log into Pex—I'm a publisher for a large publishing company, I have a really large catalog. What do I need to see, and how did you guys decide what information to present, what gets priority in that experience? How do you make sure that the analytics aren't overwhelming, and it's not just, like, here's every placement everywhere in the world and what time it happened at because that's probably overwhelming? So, tell me about your process of figuring out how to make all of this stuff digestible, when there's so much user-generated content being produced all the time.

Rasty: Well, what is funny, middle of February we just passed six years as a company. And last year, we had a large on-site in our headquarters in Los Angeles where all of our employees flew in. And I fished out the wireframes that I put together sometime before the company was established, and the current system looks almost exactly like the original wireframes. And so, I will say we got quite lucky with the thinking there. But we, in fact, surface every single identification, so 10s of millions, hundreds of millions of them to every single rights owner, but we build around that is very advanced filtering, where they are able to narrow down, based on a series of maybe 50 filters, what they want to really see at this point, and then build tools around that to take an action from the system.

And so, rather than going off summaries, we went with the specifics because what differentiates us from a lot of other competitors is there's a lot of analytical companies that do very similar things, and one of the most famous is Nielsen, where they do a lot of extrapolation, so they have a few sensors within the population, what they call people meters, and then based on those, they will generate essentially the whole population makeup. What we do is the complete opposite. We see every single piece of content on the internet or almost every single piece of content, and because of that, you don't have to extrapolate anything, but that also puts us in a place where any summarization becomes very tricky because now there are extremes, right? And so, what we decided to go with is to give the raw data with a lot of tools to help them to summarize them the way that they want, flexibly. And this was quite successful for us because, a) our capabilities are very hard to replicate within other companies, so we were able to operate for the last six years unopposed without any significant competitor, but b) it also helped the rights holders to realize what kind of data is out there, and what the capabilities and possibilities are. And so, this is within the DRM.

And then, for the analytics, what we did is we spent three years building these POCs with a lot of different customers. So, we went from rights holders to brands, through ad agencies, and a lot of different companies that essentially touch in some form or shape audio/visual content. And then, we produced these reports for them to see how they react. And as the time went, and as we learned more and more, and we saw how to utilize these, we were able to figure out what matters and what doesn't, and build these and productize these into a system that is just becoming public now. So, we spent almost three years in more like a physical A/B test world rather than putting something out and hoping for the best.


Brian: So, can you tell us about some of the things that maybe you learned about the different versions that—I would call those prototypes effectively—so you had multiple prototypes, you got user feedback on them. Is there a particular story, or an aspect of the design where you were maybe like, “Wow, we never would have thought to do that. But now that we showed them some different versions, they brought back some surprising feedback for us and it changed our trajectory.” Did you have any learnings like that you can think of?

Rasty: Yeah, there's a couple. So, one of the big ones is that we only see content, we don't see usage. That means we don't follow users, we follow content. And so, at some point we were exploring if we could moosh usage data to our own system, and there are other providers that essentially, kind of, sneakily follow users, and we thought this will be an interesting addition to our system. And as we explored this more, we started noticing that the recipients started making incorrect decisions because they were biased towards users. If they had some idea about who their customer is, and once they saw the customer, they started becoming immediately biased towards it.

So, one of my favorite stories is unrelated to us, but it's essentially this fishing rod company in 60s and 70s, was trying to advertise for men, and someone suggested a women's magazine. And, essentially, all of the executives said, “That's stupid because women don't buy fishing rods,” but the reality is women bought them for the men. And so, they became one of the most successful companies because they were advertising in places that others didn't. And so, this is, kind of, similar, we see this a lot. Like, once there is a user exposed, the bias immediately kicks in.

So, for instance, if you think that your general customer is a coastal elite, but the reality is that they are Midwest farmers, you don't want to see that as the reality and you start being biased towards that. So, we immediately started removing that data and really focused on the content itself because that content is not biased. Content has a usage, and that usage is whatever it is, and so that was a big learning. The other thing was, we are able to surface usage through approximation, in a form. So, for instance, if you put out a video, we know what pieces of that video are being used in other pieces of content, and that way you're able to connect back to which parts of your content are clicking with which audiences. And whatever you are able to surface in these measurements is something that most of the rights holders, or the recipients don't ever expect. So, it's parts that they thought, “Oh, well, why are they interested in this?” And we never—we don't know the answers, we just are able to show the data. So, that tends to be quite interesting.

And then, one of the curious ones is because we can seclude or separate the owned channels versus what would we call editorial channels versus a true UGC, people get very surprised by how the true UGC, meaning the person that you don't have any relationship with, reacts to the content differently than the channels that you are in control of. And so, these were pretty interesting learnings. The other one is there's 400 plus platforms in the world. Most people can name maybe 10. And most of them don't compete with each other, so we were very surprised about this.

The only big one that actually, kind of, tries to go after all of the content is YouTube. But every other platform outside of that is very narrow, very specific, and only certain type of content, and certain message works there. And we were very surprised to see such a divided world; be it geographically, so there are platforms that are, for instance, Russian or Chinese; be it language, or be it type of content. And this information, when it comes to the [00:25:58 unintelligible]. Again, it can be a rights holder, it can be a creator, or ad agency, tends to be shocking, because, again, most people cannot name more than 10 platforms.

Brian: Got it. Got it. I wanted to challenge you on one thing you said here. So, in my experience, working on decision support applications, and analytic solutions, typically speaking when you're dealing with really large data sets like this and, in your case, the customer’s data being placements, or usages of their copyrighted material and user-generated content, simply displaying every single record, like in a table or something.

That can be pretty daunting in terms of figuring out what do I do with this data, unless I'm simply taking an export, and then I bring it into another system that's going to then do some kind of analysis on it. So, how do you balance the flexibility that that provides with the potential complexity, the need for the user to figure out, “What do I do with it?” You know, if I log in on Monday and there's 52,000 new placements of the hit single that my artists dropped last week, all over the place, does that put a tax on them in terms of the amount of effort required for them to take the next step, or how do you balance those two things?

Rasty: What is interesting is it's always—so we always started with the guiding principles. What is the task that you're trying to solve? So, for instance, if your task is to monetize your content, then obviously you want to monetize the most obvious content that will get the most views, right? That's how monetization works, at least on YouTube and a few other platforms. More views generates more ad views; more ad views generates more money. So, the task for that particular person is I want to identify any content that has the most potential to earn me the most money.

And so, we build a lot of these algorithms internally that essentially surface or try to predict the future. Obviously, that is not possible, but there is some simple ways how to use velocity from the past, then extrapolate into the future and few other things. And so, we offer these priority filters; we offer these other things. The other interesting thing is we started hiding a lot of information from the customer. So, for instance, historically we always surfaced views, and engagement counts, and a lot of these other things. And we realized that people again—because that's what we, as a human do, we get biased.

And so, people will immediately start looking at these numbers and start using them. So, if there is a video that we didn't find that has already a million views, you will never get paid for that because the million views is gone. But people somehow instinctively go to that and say this is the largest number, that's the number I have to operate on. But there is a new video that was uploaded three minutes ago. And in three minutes, it got 4000 views, it has much higher probability of gaining the next million than the one that already has a million.

And so, we started hiding a lot of information from people. We started putting them in the kind of ways or scopes, where they are very forced to take a certain action. And so, all of them are designed to reflect the usefulness of what they supposed to surface. So, a customer that is trying to learn about the general overview of the platform is not necessarily bothered about a single video, so they don't have to see that, while the customer that is trying to take an action on very particular videos, is trying to go after those.

And so, for instance, as I said, there is customers that refuse to have their content on adult platforms, and for that is very simple. It's a filter for adult platforms and then everything within there is something that they dislike and don't want there, so they can issue takedown notices. There is other type of customer that wants to monetize the most, and so essentially goes after the content that has the most likelihood of earning them the most money. There are other customers that are trying to see the most innovative stuff that happens to the content. And that means they want to distinguish between the identification because what happens a lot is someone uses content in their own video, and that video that they produced is copied all over the place.

So, it's essentially—it's indirect virality, right? You as a musician, let's say, you didn't do it. You were just in the video that was used a lot. And so, we see this plenty with the big, big artists where they use someone unknown's content and this happens all over the place where they not necessarily steal, they might have a deal, they might have, you know, this might be a promotion for someone that they really like. These might be very legitimate, and in many cases it is. What becomes the challenge there is, for the creator, for the rights holders, they really need to know what they are going after.

And so, our system tries to adjust to these requirements, and those are semi-reflected in the design. And as I said, we've been doing this for six years, and so by now, we saw most of the possible use cases, and so the system is well optimized for all of them. The funny thing is, and I think this is just a gut feeling that I had when I was starting this company, I've really focused on the first principle, so I've really asked myself, “What I think is the end goal for every of our customers, and what shall I do about it? Or how would I want to see this, if I will be in their shoes?” and so I tried to put something together and somehow I was semi-right, so we didn't have to change too much over the years and seems to be working, but this is not necessarily the case, and that's why I said, we spent three years on the general analytics product, testing with the larger market just to make sure that we get it right.
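
The velocity idea from Rasty's answer above (a video uploaded three minutes ago with 4,000 views versus one that already has a million) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Video` class, the `projected_new_views` function, the one-hour horizon, and the one-year age given to the older video are all hypothetical, and the naive linear extrapolation here is not a description of Pex's internal prediction algorithms.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    views: int
    age_minutes: float


def projected_new_views(video, horizon_minutes=60.0):
    """Naively extrapolate past view velocity (views per minute) over a horizon.

    Assumption for illustration: lifetime average velocity stands in for
    recent velocity, and growth is treated as linear.
    """
    velocity = video.views / max(video.age_minutes, 1.0)
    return velocity * horizon_minutes


videos = [
    # Hypothetical older video: a million views accumulated over a year.
    Video("old hit", views=1_000_000, age_minutes=60 * 24 * 365),
    # The example from the conversation: 4,000 views in three minutes.
    Video("fresh upload", views=4_000, age_minutes=3),
]

# Rank by expected future views instead of by the raw view count.
ranked = sorted(videos, key=projected_new_views, reverse=True)
for v in ranked:
    print(v.title, round(projected_new_views(v)))
```

Sorting by raw views would put the older video first; sorting by projected views puts the fresh upload first, which is the bias Rasty says the hidden view counts were meant to counter.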

Brian: So, does it predict—when I log in—because it sounds like the use cases you spelled out there, there's some very concrete and specific ones. So, for example, let's call it opportunities, monetization opportunities. Pex predicts that the following 10 videos are going to be the most monetizable in the next week or something like that. Do I have to go and dig for that, or do you actually, like, promote that content? And maybe there's another widget that says, “Here's violations of your use: Playboy TV, or whatever is using Disney’s Lady and the Tramp,” or I don’t know, [laughs], something like that. Do you provide that stuff right away, or does the user go in and configure all that stuff themselves, or how does that work?

Rasty: Well, we definitely surface the infringements. We don't know if this infringement is really an infringement, or it's licensed use. We don't see into the people's minds, and we don't know what they are up to, right? And so, if you had a deal, you need to know this yourself. We connect to systems on the publisher side to like sync databases or something similar, where we can help them to do these things. We have a lot of tools like whitelists, where you can say, “All of these accounts are licensed by me, or my friends,” or whatever that is, “And you don't touch those.” Or there is a lot of tools that are built to help them. But at the end of the day, it's up to the rights holder, or the user of our system to figure these things out.
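
[Editor's sketch: the whitelist idea Rasty describes above can be illustrated roughly as follows. This is a minimal, hypothetical example, not Pex's actual system or API. The `Match` record, the `needs_review` function, and the example accounts and URLs are all assumptions made for illustration. It only shows the general pattern: detected uses from accounts the rights holder has marked as licensed are set aside, and everything else is surfaced for a human to judge.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One detected use of a rights holder's content (hypothetical shape)."""
    account: str   # uploader's account name
    platform: str  # where the upload was found
    url: str


def needs_review(matches, whitelist):
    """Return the matches whose uploader is NOT on the whitelist.

    The whitelist holds accounts the rights holder says are licensed.
    Comparison is case-insensitive. The result is only a review queue:
    deciding whether a use is an infringement stays with the rights holder.
    """
    allowed = {name.lower() for name in whitelist}
    return [m for m in matches if m.account.lower() not in allowed]


# Made-up example data.
matches = [
    Match("official_label", "video-site", "https://example.com/v/1"),
    Match("random_uploader", "video-site", "https://example.com/v/2"),
]
flagged = needs_review(matches, whitelist={"Official_Label"})
```

[In this sketch, `flagged` contains only the upload from `random_uploader`. The licensed account is skipped, which mirrors the "you don't touch those" behavior described in the answer.]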

Brian: Got it, got it. So, did you feel like it's relatively easy, then, to figure out what that experience should be like for the customer, or is that an ongoing thing? How do you think about it now? Because it sounds like it's, kind of like, done in a way. Like you said, “We spent three years on it.” Do you see that as an ongoing challenge, or how do you see it going forward?

Rasty: Yeah, I mean, the challenge for us is right now changing with the paradigm of the market. And so, that's our biggest focus is the market is changing, because the society is now forcing a change on all of the UGC platforms. So, we want to be there to facilitate it more. The other thing is, what is missing in the general, I will say, population is any kind of education around copyright. Most people just don't know what copyright means, what the rights are. We see people challenging copyright strikes, essentially, takedown notices and others, saying this is a public domain because I bought these in a public place. A lot of people have very hard times figuring out the basic laws, and I think there is some education very similar to financial education that had to happen in, let's call it, 80s and 90s where people couldn’t distinguish between debit and credit cards, and now it's not a big deal, right? It's something that most of the population understands very well.

I think this is coming. I think this is the last bastion of the internet. We have now identity that is exposed, we have now payments that are functional, and the last thing that is missing is attribution. And I think most of us, I will say, 80 to 90 percent of population eventually will be rights owners of some sort, this is how copyright works. Everybody that produces something is immediately a rights owner, but I think most of us will eventually generate our livelihood through some form of IP, especially if you believe that the machines are going to take the manual labor from us. Then the IP is the only thing that we are left with. And so, through some form of education that has to occur, and a lot of other things. And things have to settle, these are quite turbulent times in copyright right now, and I think the markets are paying a lot of attention to it.

And so, for us as a company, we just want to be in the middle of it, we want to facilitate as much as possible, and we want to expand those markets. So, I told all my employees that the goal, the vision for this company is that twenty years from now, nobody will ever even think about copyright. It will be the same thing as when you go to Amazon and you pay online with credit card, you don’t think about it. Twenty years ago, people were laughing at Amazon, saying, “Who the hell is going to pay you over the internet?” And so, we definitely hope that this is going to be the massive change in the society, but there's a pretty long way to go. And so, that's why it's, okay, about twenty years, rather than next week.

Brian: Yeah. I find it—I mean, I'm an artist and it's very complicated, even if you just focus on music and audio, right? Just masters and songs, then you get into video, and then you get into mashups. And it's crazy there's not standards for everything, and you have international law and yet the web that has no borders, right. So, it's a mess. Unfortunately, we all still need lawyers some of the time for certain things. But it can get in the way, sometimes of the creativity, but hopefully, it'll get better.

Cool. It's been a great chat. I'm curious, our audience tends to be people that are in leadership positions of data products, decision support applications, analytics solutions. You're sitting on a huge pile of data, recordings, audio, and video, and you've turned that into a successful business. Do you have any advice for leaders trying to do the same thing? They're trying to manage large datasets and turn that into useful insight for their users, how do they do that? Or, do you have some closing advice for them?

Rasty: I always start with the first principle. What is the goal that someone is trying to achieve? So, every data—that's the problem of this world. There's too much data. When people ask me, how is it to run a big data company? I always tell them, I wish we will not be, because I will much rather have a small data and have a very good business than a big data. Because with big data comes not only just a lot of work, but also a lot of responsibility, especially when you have data on usages and other things because that is touching privacy, which we fortunately don't.

But the first principle always works. What is the task that I'm trying to achieve, or the customer is trying to achieve? Where can I help? And so, I learned the basics, let’s say, [00:38:60 unintelligible]. There is only two reasons why people act on something: fear or greed. And so, they can be always translated in, people will pay you only because of one of the two things, you make them more money or you save them money. Either way is good. Obviously making more money is always better.

And so, that's the first principle that you start with. How can my data be helpful to whoever the recipient is? And then, zeroing down. So, one of our biggest challenges as a company is every technology can be used in myriads of ways. We, over the last six years, got asked by hundreds of organizations to work with them, from identification of radios within cars. So, essentially, car manufacturers wanted to have our technology to identify what people are listening to. The TV manufacturers wanted to put this on the TV set-top boxes, and stuff like that. And many technologies can be used in many ways. We decided that we have a certain goal, we have a certain vision for the company. And we mapped the path how to get to the vision, regardless if it's going to be successful or not. This is what we want. This is who we are as a people, these are the moral values and standards that we have, and as we were gaining traction with the technology, and with the company, and with the products, we stayed focused.

So, now every day one of my employees will show up with some new opportunity that someone else is talking about. There is a new hot industry, there's a new something happening, maybe we can help. Of course, we can help, but it's a distraction. So, staying focused, and delivering everything on the first principle keeps you in the business. And so, one of the interesting things as we are heading into a quite massive recession, is that our business, because it produces or generates more money for our customer, is quite safe, but would we have gone in any other direction in the past—and there were a lot of directions that were more lucrative—we would not be able to survive this.

And so, there is a very important value in staying focused, and really, really starting from the first principle of the customer. So, what the customer wants, what the customer needs, and how can I fulfill the need? And funny enough, there is no marketing for our company. We barely talk publicly, and if you really truly believe in the fact that if you build it and it's valuable, they will come. It took a very long time for us to fulfill that, but it worked.

Brian: Well, congratulations on your success and it's good to know there's a company like yours and that's helping artists and publishers out there monetize their content and get paid for doing their art. So, thank you for that. And thanks for sharing all these great ideas. Where can people follow you: websites, Twitter, social media? What's the best way to get in touch if they wanted to check your stuff out?

Rasty: Yeah, I'm only actually on Twitter. The rest, social media is not for me. On Twitter, I am @synopsi. Otherwise, our domain is, and that's, kind of, it.

Brian: Awesome. Well, Rasty, thank you so much. We've been talking to Rasty Turek, the CEO of Pex, a multimedia analytics and rights management platform. Thanks for coming on Experiencing Data and best of luck.

Rasty: Thank you very much, Brian.

Brian: All right, cheers.
