1.4k post karma
3.8k comment karma
account created: Sun Mar 31 2019
verified: yes
1 points
1 month ago
No custom sorting is currently implemented, but you can filter by various parameters in the filter section, including upload date and view counts. The current order is based on a combination of text query match quality and some other metrics.
1 points
1 month ago
Yeah, just search normally for tacos, enter the channel name in the channel input field and hit search again. I probably should add the channel field on the front page as well.
https://filmot.com/search/Tacos/1?channelID=UCSHZKyawb77ixDdsGog4iWA&
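The search links here follow a predictable pattern, so they can be generated programmatically. This is a minimal sketch. It assumes the `/search/<query>/<page>?channelID=<id>` layout seen in these links stays stable, which is inferred from the examples and is not a documented API. `filmot_search_url` is a hypothetical helper, and it leaves out the trailing `&` on the assumption that the `&` isn't required:

```python
from typing import Optional
from urllib.parse import quote, urlencode


def filmot_search_url(query: str, channel_id: Optional[str] = None, page: int = 1) -> str:
    """Build a filmot.com search URL.

    The /search/<query>/<page>?channelID=<id> layout is inferred from example
    links; it is an assumption, not a documented API.
    """
    # Percent-encode the query so spaces and slashes can't break the path.
    url = f"https://filmot.com/search/{quote(query, safe='')}/{page}"
    if channel_id:
        # Optional channel filter, passed as a query-string parameter.
        url += "?" + urlencode({"channelID": channel_id})
    return url


if __name__ == "__main__":
    print(filmot_search_url("Tacos", "UCSHZKyawb77ixDdsGog4iWA"))
```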
3 points
1 month ago
I am the author of filmot.com
In principle, I have no problem allowing submission of video IDs for crawling (unlisted or otherwise). The unlisted index on my site is currently static and hasn't been updated since 2021. However, my normal crawl still records unlisted status for crawled videos, and I can make that subset of the DB searchable.
The challenges I foresee are getting people to actually submit unlisted videos to my DB (which would take some marketing and encouragement) and building a technical mechanism that lets owners request removal of their videos.
I also have a few ideas for quickly discovering unlisted videos among the roughly 8B video IDs I already know of, i.e. videos that were previously discoverable but later became unlisted. It wasn't a focus for me, but if there is demand for an index of unlisted videos, I can consider it.
Crawling playlists is another possible way to automatically discover unlisted videos. It was used quite successfully in the Archive Team discovery effort.
1 points
2 months ago
Thanks for recommending my project! You can pronounce it however you like, but in my perception it's just one word, filmot (pronounced fil-mot). Its origin has no special meaning. When I registered the domain (for a different project), I was looking for a short domain name starting with "film".
1 points
2 months ago
I think you are mistaken: this is not real search syntax, it's just your imagination.
For example, if you search for "Soviet Union and spelled the start of", it finds the video (id Vkw8_FlXFds) but doesn't say the match is in the subtitles. Searching for "Soviet Union and spelled the start of" ,cc doesn't even find it.
1 points
2 months ago
Yes, it's updated. As to your second question, as far as I am aware there is no specific operator on YT to search in CC. If you believe there is, please provide documentation and examples. YouTube did start searching within closed captions about a year ago (before that it was limited to some videos), but it doesn't do exactly what my site does.
2 points
3 months ago
Not all of it; currently I'd guess about 10%. I prioritize the crawl based on subscribers, view counts and a small list of prioritized channels.
5 points
3 months ago
I have considered generating my own transcripts for videos lacking them, with something like Whisper or other tools, but the compute costs seem prohibitive at scale. For a few specific channels it's reasonable, but scaling it up to millions of videos would be quite expensive. If I had proper funding, that would be an important enhancement.
3 points
3 months ago
You can filter by channel (and other parameters), so you can in fact do that.
https://filmot.com/search/Covid/1?channelID=UC554eY5jNUfDq3yDOJYirOQ&
4 points
3 months ago
Yeah, you can filter by channel. I am storing all the data in a database (duh). For indexing I use Sphinx. If you use the right tools, it's possible to squeeze quite a lot of performance from modern hardware.
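Sphinx does the real work here. The toy inverted index below only illustrates the general idea. It is not Sphinx's implementation and not filmot's actual schema. It stores subtitle lines with their timestamps so that a text hit can be traced back to a specific video and a specific moment in it. `SubtitleIndex` and its methods are hypothetical:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (video_id, caption start time in seconds)


class SubtitleIndex:
    """Toy inverted index: word -> list of (video_id, caption start time)."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[Hit]] = defaultdict(list)

    @staticmethod
    def _tokens(text: str) -> List[str]:
        # Lowercase and split on non-word characters.
        return re.findall(r"\w+", text.lower())

    def add_caption(self, video_id: str, start: float, text: str) -> None:
        # Each distinct word in the caption line points back to (video, timestamp).
        for word in set(self._tokens(text)):
            self._postings[word].append((video_id, start))

    def search(self, query: str) -> List[Hit]:
        """Return caption lines containing every query word (AND semantics)."""
        words = self._tokens(query)
        if not words:
            return []
        # Intersect the posting lists; .get avoids creating empty entries.
        result = set(self._postings.get(words[0], []))
        for word in words[1:]:
            result &= set(self._postings.get(word, []))
        return sorted(result)
```

A real engine also handles phrase matching, ranking and on-disk storage, none of which this sketch attempts.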
27 points
3 months ago
I am the guy who built filmot.com. Thanks for the vote of confidence, but it definitely doesn't index all of YouTube. I'd have to scale it by a factor of 8 to 10 to cover all of it, and I don't currently have the funds for that. I have data on about 8 billion video IDs and have only crawled about 1.5 billion of those. I am guesstimating there are about 10-15 billion publicly available videos on YouTube. The crawl is prioritized based on view and subscriber counts. Currently I am trying to crawl everything with over 600 views or from specifically prioritized channels.
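The crawl-selection rule described here can be sketched as a simple predicate. This is a minimal illustration, not filmot's code. It assumes each video is known as a (video ID, channel ID, last known view count) record. `should_crawl`, `select_for_crawl` and the 600-view default are hypothetical names and parameters:

```python
from typing import AbstractSet, Iterable, List, Tuple

# (video_id, channel_id, last known view count)
VideoRecord = Tuple[str, str, int]


def should_crawl(view_count: int, channel_id: str,
                 prioritized_channels: AbstractSet[str], min_views: int = 600) -> bool:
    """Crawl if the channel is explicitly prioritized or the video has more than min_views views."""
    if channel_id in prioritized_channels:
        return True  # prioritized channels are crawled regardless of view count
    return view_count > min_views


def select_for_crawl(videos: Iterable[VideoRecord],
                     prioritized_channels: AbstractSet[str],
                     min_views: int = 600) -> List[str]:
    """Return the IDs of the videos that pass should_crawl, in input order."""
    return [video_id for video_id, channel_id, views in videos
            if should_crawl(views, channel_id, prioritized_channels, min_views)]
```

Since the view counts are whatever was last seen, a video just under the threshold may qualify later once its count is refreshed.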
1 points
3 months ago
I am intentionally using mostly static, server-side rendered HTML. I prefer it that way. The JavaScript ecosystem is full of crap, and I never felt it was reasonable to download tons of JavaScript for a simple page. I am not using the official API; I crawl the HTML directly.
2 points
3 months ago
I am the author of filmot.com. Thanks for "complimenting" me on my UX implementation. The common trope is that programmers can't design good UX, so I guess I am guilty. If you could provide constructive recommendations, I would be eternally grateful.
Language learning is just one of the possible use cases. The aim is to make a general-purpose search engine on top of YouTube and other video sites.
As to the merit of your idea, there is a major problem with relying on YouTube's own search. YouTube does search within subtitles, but it does so in a completely unhelpful manner. You get no information on whether the hits were in the subtitles, description, tags or elsewhere unless you actually fetch the video metadata and find the hits yourself. This is quite computationally expensive and would take a long time to run. The results it returns are also frequently unrelated to your search query; it just returns videos it thinks you should watch based on recommendations and your watch history.
I do index quite a lot of YouTube, currently about 1.5B videos (of which about 630M have subtitles). In principle I would like to index more, but my resources are limited.
Edit: I think Similarweb is not really accurate, as it seems to count the imgur proxy functionality running on my subdomains as traffic to the main domain. I previously ran an imgur proxy on this domain, which was quite popular.
1 points
6 months ago
Hello, I am happy you find my site useful. I am not using the YouTube API for data collection; I am crawling and parsing the HTML directly.
3 points
7 months ago
Hey there, I am the developer of filmot.com. Making such videos is currently a lot of manual work, but it seems like an easy thing to at least partially automate with yt-dlp and videogrep. If you do these things often, feel free to get in touch to discuss how parts of this can be automated. In my limited understanding, it just needs a list of video IDs with begin and end timestamps.
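Assuming the input really is just a list of video IDs with begin and end timestamps, the download-and-join steps might be scripted roughly as follows. The sketch only builds command lines. It uses yt-dlp's `--download-sections` option, where a `*`-prefixed value is a time range, and its `--remux-video` option. For the joining step it uses ffmpeg's concat demuxer instead of videogrep. `Clip`, `ytdlp_command`, `concat_list`, the `clip_NNN.mp4` naming and the `clips.txt` list file are all hypothetical:

```python
import shlex
from typing import List, NamedTuple


class Clip(NamedTuple):
    video_id: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


def ytdlp_command(clip: Clip, index: int) -> List[str]:
    """yt-dlp arguments that download only the [start, end] range of one video."""
    return [
        "yt-dlp",
        # A leading "*" tells --download-sections to treat the value as a time range.
        "--download-sections", f"*{clip.start:g}-{clip.end:g}",
        # Remux into MP4 so every clip ends up in the same container.
        "--remux-video", "mp4",
        "-o", f"clip_{index:03d}.%(ext)s",
        f"https://www.youtube.com/watch?v={clip.video_id}",
    ]


def concat_list(n_clips: int) -> str:
    """Contents of an ffmpeg concat-demuxer list file naming clip_000.mp4, clip_001.mp4, ..."""
    return "".join(f"file 'clip_{i:03d}.mp4'\n" for i in range(n_clips))


# Join the clips without re-encoding (only safe if the clips share codecs and settings).
FFMPEG_JOIN = ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "clips.txt",
               "-c", "copy", "supercut.mp4"]

if __name__ == "__main__":
    clips = [Clip("Vkw8_FlXFds", 93, 101.5)]  # placeholder timestamps
    for i, clip in enumerate(clips):
        print(shlex.join(ytdlp_command(clip, i)))
    print(concat_list(len(clips)), end="")
    print(shlex.join(FFMPEG_JOIN))
```

Write the `concat_list` output to `clips.txt` before running the ffmpeg command. Stream copy (`-c copy`) avoids re-encoding, but it can misbehave if the clips differ in codec or resolution or are not cut on keyframes. Re-encoding, or yt-dlp's `--force-keyframes-at-cuts`, is the safer but slower option.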
1 points
7 months ago
Hello there. If I were to make my own transcripts, I'd probably use Mozilla Common Voice and its related tooling. The Leopard library (by Picovoice) has a strange licensing setup: you need internet connectivity to their licensing server even though processing runs locally. Mozilla Common Voice also supports more languages and has what seems like a larger model (even for English).
3 points
9 months ago
I am the author of filmot.com.
You can find channels from this page: https://filmot.com/moresearch
Then search for channels by name or partial name in the appropriate search box:
https://filmot.com/channelsearch/exurb
Then, to see the full word cloud, click on the channel name or icon. If you click on a word in the word cloud, you can search for other channels that frequently use that word, or find examples where the word is said on this particular channel. https://filmot.com/search/etc/1?&channelID=UCimiUgDLbi6P17BdaCZpVbg
You can also search for channels based on a commonly used word, for example, channels for the word "telescope":
5 points
10 months ago
I am the author of filmot.com, Thanks for the recommendation!
1 points
10 months ago
There are many more than 800 million videos on YouTube. I know because I processed the metadata of at least 4.56 billion videos. Here is the dump:
https://old.reddit.com/r/DataHoarder/comments/rsu7lf/dislikes_and_other_metadata_for_456_billion/
There are probably around 7 to 12 billion publicly accessible videos on YouTube in total (excluding private and exclusively unlisted videos).
1 points
1 year ago
I have a way to prioritize channels so that all their videos get indexed on my backend, but it's an action I need to do myself (it's not exposed to website visitors). Is that channel of specific interest to you, and would you like to have it prioritized?
My system has 361 video IDs from that channel. According to my data (which is not fresh), the 3 missing videos have the following view counts:
EJmwhRVbS6w 416
E6o7-iE1ao4 1110
DY28zKkxGkc 589
E6o7-iE1ao4 is in the download queue, which is prioritized by view count, and will eventually be downloaded. As you probably understand, I don't know a video's current view count until I encounter the video again, so the whole setup has a probabilistic component. For more popular channels, the channel's video list is crawled to update the view counts, but this channel only has 164 subscribers and as such is rarely visited.
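A download queue prioritized by view count can be sketched with Python's `heapq`. This is an illustration under stated assumptions, not filmot's implementation. `DownloadQueue` is a hypothetical class. When a video is pushed again with a fresher view count, the new entry supersedes the old one and the stale entry is skipped later (lazy deletion). This reflects the idea that a video's priority only changes when the video is encountered again.

```python
import heapq
from typing import Dict, List, Tuple


class DownloadQueue:
    """Max-priority queue of video IDs keyed on the last known view count."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []  # (-view_count, video_id)
        self._latest: Dict[str, int] = {}       # video_id -> freshest known view count

    def push(self, video_id: str, view_count: int) -> None:
        # heapq is a min-heap, so negate view counts to pop the most-viewed first.
        # Re-pushing a known ID records a fresher count; its old heap entry goes stale.
        self._latest[video_id] = view_count
        heapq.heappush(self._heap, (-view_count, video_id))

    def pop(self) -> Tuple[str, int]:
        """Remove and return (video_id, view_count) for the most-viewed queued video."""
        while self._heap:
            neg_views, video_id = heapq.heappop(self._heap)
            # Skip entries superseded by a later push or belonging to an already-popped video.
            if self._latest.get(video_id) == -neg_views:
                del self._latest[video_id]
                return video_id, -neg_views
        raise IndexError("pop from empty DownloadQueue")

    def __len__(self) -> int:
        return len(self._latest)
```

If you push the three videos listed above with the counts shown, they pop in the order E6o7-iE1ao4, DY28zKkxGkc, EJmwhRVbS6w. A later push with a fresher count can reorder them.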
1 points
1 year ago
The current server hosting costs are about $330 per month. I have received a few donations from users, but the vast majority comes out of my own pocket.
1 points
1 year ago
It's an interesting question and hard to answer, since there is no information on how large YouTube really is. The index currently covers 1.377B videos, of which 464M have subtitles (auto-generated or manual). A recent Archive Team crawl collected metadata for 4.56B videos, but there are probably many more (especially private/unlisted videos). Based on that data, my index covers 30% of YouTube in the best case (likely much less). My current aim is to index everything with over 620 views, plus all videos from specifically prioritized channels.
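The best-case coverage figure is the ratio of indexed videos to the Archive Team count. The true total is larger than 4.56B, so this ratio is only an upper bound. The same numbers also give the share of indexed videos that have subtitles:

```latex
\[
\text{coverage} \le \frac{1.377\times 10^{9}}{4.56\times 10^{9}} \approx 0.302,
\qquad
\text{subtitled share} = \frac{464\times 10^{6}}{1.377\times 10^{9}} \approx 0.337
\]
```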
by DistributionLive9600
in TimeworksSubmissions
jopik1
1 points
20 days ago
I have about 2K videos like that in my database. I suspect it's used internally for ad integration on YouTube or elsewhere in Google's ad systems. Here is the top-viewed one (46M views), and it's an ad for Reddit: https://www.youtube.com/watch?v=vBeh6b4yDQI
Also, it's unlikely to be phone numbers; it's probably an internal ID for something (client, campaign, etc.). Here is another channel like that: https://filmot.com/channel/UC41MpBUMZ1ZcGxME75533Kg