Thank you for providing this, you are a hero!!! I'm gonna try to do cool stuff with it!
tgv 5 hours ago [-]
It probably also got swamped in real-time...
linmer 5 hours ago [-]
Do you mean it's not updated? You gotta sort by update_time column. Looks sorted, but you gotta sort it with a query like:
SELECT * FROM hackernews_history
ORDER BY update_time DESC
LIMIT 100;
And yeah, I got that from deepseek because I don't have a brain.
GeoAtreides 5 hours ago [-]
oh hey, per HN terms and conditions I license my HN data only to HN. Can you please remove my data from the set? Thank you!
snowwrestler 5 hours ago [-]
Not sure if joking, but if this product is not republishing the text of your contributions (to which you hold copyright), you’re probably not going to convince a court to do anything here.
Generally speaking it is not a violation to scrape, index, and analyze web content as long as you don’t republish copyrighted content without a license, or violate access controls. For example: search engine indexes.
moralestapia 5 hours ago [-]
By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.
@zX41ZdbW, you can safely ignore this guy.
@GeoAtreides, next time read the actual terms of service before hallucinating.
codingdave 5 hours ago [-]
> for any Y Combinator-related purpose
That is actually the key phrase. HN can provide the API, no problem. People can consume the API, no problem.. But I'd ask an attorney if API consumers can then re-release the data for purposes not related to YC. By my reading, they cannot.
While a literal reading of the MIT license refers to "software", many datasets have been released under it.
In particular, if someone releases something that is only a dataset along with an MIT license file, the most reasonable interpretation is that the rights holder intended to release the data under the terms of that license.
I looked for copyright cases involving this specific distinction, whether "data" versus "software" makes a legal difference, but didn’t find anything.
So the question remains open (for you, for me it's pretty clear the dataset is released under MIT).
You might want to sue and find out. It sounds like an interesting experiment.
yes, and per HN terms and conditions only YC and YC affiliated (as you quoted) can use the api legally. I don't license my content to anyone else and so it shouldn't be used by anyone else, even if it's available on a free-for-all API (nice move HN, btw).
It's right there, you just have to click the link I shared ...
GeoAtreides 5 hours ago [-]
that's the license for the API, not the content/data the API serves
jupr 4 hours ago [-]
>including without limitation the rights
to use
'use'...arguably the sole purpose of the API is to fetch the data.
You are grasping at straws.
fartcoin67 4 hours ago [-]
[dead]
linmer 5 hours ago [-]
Wait, so I have to ask for every single person's permissions to use this data?
uhhhhhhhhhhhhhhhhhhhhhh
pelagicAustral 5 hours ago [-]
You must be fun at parties
jrflowers 1 hours ago [-]
Steve Carell yelling “I DECLARE BANKRUPTCY!!” in The Office dot gif
Aachen 6 hours ago [-]
Google Trends is about searches
This is about published text. More like if Google Trends counted word occurrences on webpages. Or if Google Ngrams counted webpages instead of books
People don't write much about non-newsworthy things whereas many people search "burger" anytime they want a burger delivery. The datasets aren't usable in the same way
Edit: not to say it's not a cool product! Just keep this in mind and enjoy using it :)
hn_throwaway_99 1 hours ago [-]
> The datasets aren't usable in the same way
I strongly disagree, especially since this tool aggregates both posts and comments. While they don't measure the exact same thing, HN posts and comments are quite similar to searches from the standpoint of "What are people interested in finding more about and discussing" - stories that get popular usually have a lot of comments, thus boosting relevant terms, while posts about topics that don't trend score low because they don't get any relevant discussion comments.
Heck, just try it yourself - I compared "blockchain" to "OpenAI" with this tool, and got a predictable result (blockchain had some spikes up until the late teens, then OpenAI took over with the launch of ChatGPT). Interestingly, the Google Trends plot for these two terms looks very similar.
Aachen 5 hours ago [-]
Someone asked an imo good question (that I was going to vouch for, idk why it was dead), but deleted it. Not sure why, but so I'll not credit the username in case they don't want that and changed some words for stylometrics avoidance
> The concept seems pretty comparable. From the title I had a good idea of what it was; when clicking on it, the visual presentation felt familiar & intuitive. \n\n Being a little less literal can be useful!
That's why I'm pointing it out: the title leads you to think they're the same metric, the page looks visually similar, and so you treat it as the same data type; but when you read the data through this lens, you draw wrong conclusions. It took me a while, scrolling down the examples, before I realised why it felt so off and that my mindset is wrong. It's what's being written about currently, not what people on HN are actually looking for
It's indeed not about being nonliteral, it's for me about having been confused about the data being shown
john_strinlai 5 hours ago [-]
>Someone asked an imo good question but deleted it. Not sure why
it was me, and i deleted it because i realized my last sentence "being a little less literal can be useful" came across as unnecessarily blunt, which i didn't want. but i wasnt sure how to express what i wanted to say without it being that way. so i deleted it while rethinking my phrasing, and rethinking your comment.
in the end, i kind of came around to understand where you were coming from, so i didnt bother to recomment.
Aachen 5 hours ago [-]
Thanks! Didn't come across like that to me though, all good
ytkimirti 3 hours ago [-]
Yeah I feel like hackernews trends is alright but the post title is a bit misleading, noted
Now if Algolia had a dataset of what people are searching for on HN that'd be it
Aachen 5 hours ago [-]
Was considering that as well, but I doubt that people use Algolia in the same way that they use Google
5 hours ago [-]
pabzu 5 minutes ago [-]
That's so cool! is it possible to add a linux distro trends ?
simonpure 7 hours ago [-]
Hug of death
`
/api/hn -> 504 An error occurred with your deployment FUNCTION_INVOCATION_TIMEOUT cle1::c8vgv-1782399959042-aeba3cae05ff
`
docheinestages 7 hours ago [-]
If this project is an ad for their product (Upstash, promising "Highly Available, Infinitely Scalable"), then the last thing they'd want is a hug of death :/
ryan_n 7 hours ago [-]
Oof that would be hilarious/tragic
steve1977 6 hours ago [-]
Downstash
y1n0 6 hours ago [-]
Must stash
superxpro12 6 hours ago [-]
/api/hn -> 502 {"error":"Your database has been temporarily rate-limited, please contact support@upstash.com for further details."}
6 hours ago [-]
jjordan 7 hours ago [-]
back in my day we called this a good ole' fashioned slashdotting.
lysace 7 hours ago [-]
Our startup (~20 people) got slashdotted in 1998 or so. I was the only one randomly awake at the time. Remember watching all the logs from our web server in realtime, ready to immediately kill anything or anyone threatening the overall availability.
512 kbps uplink, I think. Even accidental DoS was trivial. We had a self-hosted little data center at our office with the only available stupidly expensive commercial connection.
Felt some dread having to restart the main (async, single-process) web server a few times to keep things going due to bugs in our code. So many* people on dial-up patiently waiting for the page to load.
It was exhilarating though :).
*) Surely at least a hundred!
mysterydip 6 hours ago [-]
One of the things I love about HN is having stories like this in the comments from otherwise random unassuming usernames
Onavo 6 hours ago [-]
It's funny that these days the bottleneck is usually the data layer. Servers are so powerful now that even your average $5 server can handle HN levels of load if configured correctly.
Roonerelli 7 hours ago [-]
I get
/api/hn -> 502 {"error":"Search entry should have an initialized schema, command was: [\"SEARCH.AGGREGATE\",\"hn\",\"{\\\"$or\\\":[{\\\"title\\\":{\\\"$eq\\\":\\\"anthropic\\\",\\\"$boost\\\":5}},{\\\"text\\\":{\\\"$eq\\\":\\\"anthropic\\\"}}]}\",\"{\\\"by_month\\\":{\\\"$dateHistogram\\\":{\\\"field\\\":\\\"time\\\",\\\"fixedInterval\\\":\\\"30d\\\"}},\\\"top_authors\\\":{\\\"$terms\\\":{\\\"field\\\":\\\"by\\\",\\\"size\\\":6}},\\\"by_type\\\":{\\\"$terms\\\":{\\\"field\\\":\\\"type\\\",\\\"size\\\":4}}}\"]"}
arikrahman 3 hours ago [-]
I'm also getting /api/hn -> 504 An error occurred with your deployment FUNCTION_INVOCATION_TIMEOUT cle1::48fnt-1782412720840-4855b2b75b5a after a few lookups
ytkimirti 7 hours ago [-]
We will be with you shortly :)
aNapierkowski 7 hours ago [-]
yeah we killed it :(
7 hours ago [-]
kaelyx 6 hours ago [-]
Hello, /api/hn -> 502 {"error":"Your database has been temporarily rate-limited, please contact support@upstash.com for further details."}
For some reason the results cut off at 2018-10 even though "Popular Comparisons" preview shows more.
ytkimirti 3 hours ago [-]
Fixed
smalltorch 7 hours ago [-]
Reminds me of this side project I'm working on.
https://gitlab/here_forawhile/torum
It's a HN clone, that syncs with HN that allows you to basically establish smaller private communities who can discuss anything that's on HN without actually being on HN.
It also indexes and lets you search through the DB, which I find is really useful to find things that peak my interest.
'peak' refers to the top of a thing, commonly mountains
smalltorch 5 hours ago [-]
*find things that align with my intrest peaks
all2 3 hours ago [-]
Perfect. I love it.
kpw94 6 hours ago [-]
The huge spike of "lk-99" in science & frontier tech is amusing...
This is cool concept, would love a positive/negative sentiment computed for each comment that refers to a given word, so you can see trends of "cloudflare (positive)" vs "cloudflare (negative)" where first one counts comments only if sentiment confidence is greater than say 0.6 and the other one counts comments only if sentiment is less than 0.4 (assuming [0,1] sentiment score)
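A minimal sketch of the thresholded counting described above, assuming each comment already carries a sentiment score in [0, 1] from some classifier. The `bucket_sentiment` function, the tuple layout, and the sample data are all hypothetical and not taken from the site:

```python
from collections import Counter


def bucket_sentiment(comments, term, pos_min=0.6, neg_max=0.4):
    """Count comments mentioning `term` per month, split into sentiment buckets.

    `comments` is an iterable of (month, text, score) tuples, where score is an
    assumed sentiment value in [0, 1] produced elsewhere.
    """
    counts = Counter()
    needle = term.lower()
    for month, text, score in comments:
        if needle not in text.lower():
            continue  # comment does not mention the term
        if score > pos_min:
            counts[(month, "positive")] += 1
        elif score < neg_max:
            counts[(month, "negative")] += 1
        # scores between the two thresholds are treated as neutral and skipped
    return counts


# Made-up sample data, for illustration only.
sample = [
    ("2024-01", "Cloudflare saved our launch", 0.9),
    ("2024-01", "cloudflare outage again", 0.1),
    ("2024-01", "we use Cloudflare", 0.5),
    ("2024-02", "Cloudflare pricing is fair", 0.7),
    ("2024-02", "unrelated comment", 0.95),
]
result = bucket_sentiment(sample, "cloudflare")
```

Each bucket could then be plotted as its own series, e.g. "cloudflare (positive)" next to "cloudflare (negative)".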
arjie 6 hours ago [-]
One useful feature would be to normalize by total so that I can see changes in something as opposed to just total site growth. Right now I have to chart a single generic parameter but if I pick poorly it’ll confuse the issue.
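A rough sketch of what that normalization could look like, assuming you have per-month counts for the term and per-month totals for all posts and comments. The function name and the numbers are made up for illustration:

```python
def normalize_by_total(term_counts, total_counts):
    """Return the term's share of all activity per month.

    Both arguments are dicts keyed by month. Months with zero total
    activity are skipped to avoid dividing by zero.
    """
    shares = {}
    for month, total in total_counts.items():
        if total == 0:
            continue
        shares[month] = term_counts.get(month, 0) / total
    return shares


# Illustrative numbers only: raw mentions double, but the share halves.
term_counts = {"2015-01": 50, "2020-01": 100}
total_counts = {"2015-01": 1000, "2020-01": 4000, "2021-01": 0}
shares = normalize_by_total(term_counts, total_counts)
```

In this example the absolute count goes up while the share of site activity goes down, which is exactly the case a raw-count chart hides.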
apitman 5 hours ago [-]
I'd love to see the opposite as well, ie how much has HN grown over time.
I also have a separate page for the "Who is Hiring?" posts, here is the distribution of programming languages over each monthly "Who is hiring?" post in HN ever.
https://hackernewstrends.com/who-is-hiring
A minor suggestion - I'd like to be able to render the current graph taller (full height of my browser window).
Also some sentiment analysis on the "people" graphs would be very insightful (particularly for the likes of Edward Snowden, Julian Assange, Elon Musk and Sam Altman). Perhaps colour the area under the graph red-orange-green based on the sentiment?
ytkimirti 6 hours ago [-]
Thanks for the feedback, noted the full-screen request.
The sentiment analysis is very interesting, I can do that easily. Could be a new page as well. Did you see this anywhere else or just your idea?
cbeach 6 hours ago [-]
Just my idea. I'm working on a side project https://newsavista.com/invite/ASAD68923E that aggregates news and tracks news trends and changing sentiment on the major stories. With cheap cloud LLMs (and "free" local LLMs) it turns out to be a trivial feature to build.
bluecoconut 6 hours ago [-]
Very cool!
one subtle consistency bug that made it hard for me to interpret when I was clicking around: the small thumbnail plot vs the full plot often (always?) seem to use different colors.
The blue / orange gets assigned to the opposite labels in the A vs. B when you click, which made it confusing to understand.
sinuhe69 7 hours ago [-]
IMO, using AI to assign keywords to a broader group of strict synonymous keywords would make the comparison much more helpful.
Because in general we want to know the trend of categories more than of a word, asking for “auto pilot” for ex. should include “self driving”, FSD etc.
marky1991 6 hours ago [-]
I would not like this. This is the kind of change that made google search so annoying. (Eg what if I want to track the history of 'self-driving' vs 'auto pilot' in sales pitches? Or more basically, what if the system interprets me wrongly?) Better to support | or similar old-fashioned search engine syntax and dwis and not dwim.
Pikamander2 6 hours ago [-]
Synonym functionality is good as long as there's an easy way to disable it, either globally or by wrapping the term in quotes.
linmer 5 hours ago [-]
Cool! I want to suggest something. Imagine I want to go to a specific date where some topic was hot: I can read it from your website and then go to that date. But it would be better if I could click on some sort of button, or on the points on the graph, to go to that date. It would be easy to implement, you just need links like this:
https://news.ycombinator.com/front?day=2026-05-24
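Something like this would build that link from a clicked data point. It is only a sketch: `front_page_url` is a hypothetical helper, and the only thing taken from the thread is the `front?day=YYYY-MM-DD` URL shape shown above:

```python
from datetime import date
from urllib.parse import urlencode


def front_page_url(day):
    """Build the HN front-page link for a given calendar day."""
    query = urlencode({"day": day.isoformat()})
    return "https://news.ycombinator.com/front?" + query


url = front_page_url(date(2026, 5, 24))
```

A monthly bucket would still need to pick a specific day, for instance the day with the most matching posts inside that bucket.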
ytkimirti 3 hours ago [-]
Good idea, noted
dom96 7 hours ago [-]
Very cool idea. Shows programming language trends pretty well.
It looks like some of these terms aren't indexed (or the site is just too hug of deathed right now), but I'd like to see the graph of like, social media, iot, cryptocurrency, ai.
/api/hn -> 504 An error occurred with your deployment FUNCTION_INVOCATION_TIMEOUT sfo1::rbqk2-1782415926647-577d5c5ed030
zarlss43 3 hours ago [-]
I thought Grunt was still popular after all these years. But I'm pretty sure this is picking up the trend of "grunt work" instead of the task manager.
dwoosley 5 hours ago [-]
Almost all of the major vulnerabilities and hacks are just single spikes at the time they happened, tailing off after that… except Stuxnet. Stuxnet was much more interesting than most other attacks since it was very political and openly published. Of course, the thing that attack was about is still a news headline today as well
5 hours ago [-]
Petersipoi 5 hours ago [-]
It's funny how "trump" dwarfs just about any other term. Truly a hacker forum.
onestay42 2 hours ago [-]
You could say... it trumps them
throwaway29812 5 hours ago [-]
[dead]
cloudkj 7 hours ago [-]
This is great, I was just hoping to find a tool like this and specifically scoped to "Show HN" posts? Is there a way to do that?
ytkimirti 7 hours ago [-]
Great idea actually, I'll add that as well for sure
scarecrw 7 hours ago [-]
Very cool!
I'd love to have some sort of normalization option to separate more subtle positive trends from the general increase in number of posts.
Insanity 5 hours ago [-]
This looks quite nice! But suspiciously absent data points.. no Java or Go for the languages? Seems odd. No Amazon in companies, yet I think it's often mentioned.
I wondered if "go" got filtered out because it's also just a regular word.
Either way, very cool!
toozitax 1 hours ago [-]
used it to query finance and tax launches and how they did and it was helpful. thanks
arikrahman 3 hours ago [-]
Despite being for trends this is actually a good tool to find articles that are interesting but sometimes buried.
ytkimirti 3 hours ago [-]
Exactly, I found lots of weird moments in history on the most random topics ever.
I let the LLM generate hundreds of terms and ran a “shock value” metric script to discover the interesting ones.
aberrahmane_b 5 hours ago [-]
Great project. The popular comparisons are probably the most useful part because they show the relay race between tools pretty clearly.
One thing I’d like to see is normalization by total HN activity over time.
ytkimirti 7 hours ago [-]
We had to take the site down for a second, it'll be online in a few minutes. Thanks for trying it out
chfritz 5 hours ago [-]
great idea! Now, you are running into the same issue Google Trends had to solve: term disambiguation. For instance, "atom" is ambiguous in a comparison of editors like this: https://hackernewstrends.com/?q=sublime&q=atom&q=vscode. Given LLMs it might be possible to use an embedding vector (with context) instead of a text string for indexing, and if you do, this problem might go away.
linzhangrun 6 hours ago [-]
Great job! I've also been wanting to do similar statistics recently, wanting to know when LLMs became the absolute dominant topic on HN. Now it seems like half of the posts are about LLMs.
corv 6 hours ago [-]
The 'flash vs html5' chart looks strange juxtaposed with that conclusion
al_borland 6 hours ago [-]
There are a few technologies with pretty generic names which don’t lend themselves so well to this kind of trend analysis.
I was curious about Atom. According to the trend it’s still neck and neck with VS Code. But are people really talking about Atom the text editor that much still, or other types of atoms?
fg137 5 hours ago [-]
I think Google Trends is actually smart enough to suggest which topic you want to see for the same keywords -- it understands the semantics.
linmer 5 hours ago [-]
I think atom is no longer being developed, so it must not be a that popular topic. is that what you meant?
al_borland 3 hours ago [-]
That’s my assumption, yes.
jazzpush2 5 hours ago [-]
This is a great project. It'd be fun to look at some of the more popular startups over time, both those that ended up successful and those that didn't.
stopachka 5 hours ago [-]
Nice! Would love a brief explanation of the infrastructure. I see the Powered by "Upstash Redis Search", but why choose Upstash Redis Search vs something else?
ltrg 5 hours ago [-]
It would be super interesting to see if HN mentions serve as a leading indicator of company performance/valuations -- I wouldn't be surprised.
jianfenglin 5 hours ago [-]
Glad to see that the raw data is also shared. Very cool, but why the openai vs anthropic graph has no data post 2019?
ytkimirti 5 hours ago [-]
Yeah we had to refill the dataset due to an error, it will be fixed in a few minutes
Does the trend only show absolute numbers? Because I think it should be divided by the number of posts during the time frame (day?).
7 hours ago [-]
jasonjmcghee 3 hours ago [-]
Terms with spaces seem to be an "or"
maxignol 4 hours ago [-]
Funny one x)
Though I ain’t sure if even more data is useful on hackernews
flakiness 7 hours ago [-]
The example comparisons made me smile. Well done!
rightbyte 7 hours ago [-]
Nice. Is the data points y-axis normalized by total amount of comments at that time?
Edit:
Nvm seems like absolute count if you click the graph.
upmostly 4 hours ago [-]
Looking at this makes me think HN is peak design aesthetic.
jahala 6 hours ago [-]
Really cool! Where would you get the data for something like this? Is it open, or is it scraped?
igcorreia 6 hours ago [-]
The colors of the lines of the big graph are inverted compared to the smaller ones.
chris_money202 7 hours ago [-]
Love this, seems to struggle with newly indexed words. Will try again when the FP load is gone
SoKamil 6 hours ago [-]
Are those raw numbers or adjusted for active users at given point in time?
NooneAtAll3 6 hours ago [-]
I'd be interested in "google ngram for hacker news" instead
ytkimirti 6 hours ago [-]
What is missing from it? I've used ngrams as well and this was partly inspired by that.
dacox 5 hours ago [-]
very cool! not sure if something is broken, but there seems to be no data past 2019 on any of the queries that i can see
joelres 7 hours ago [-]
Really beautiful, informative, and functional layout. Great work!
docheinestages 7 hours ago [-]
But can it discover new trends without having to type the keywords?
Cider9986 5 hours ago [-]
Scrolling is totally broken for me.
GL26 7 hours ago [-]
insane ! I don't know if it's possible but it would be huge if we had access to the localisation of the trends
mkgeorge7 6 hours ago [-]
This is actually very cool@
mkgeorge7 6 hours ago [-]
This is actually very cool!
drchaim 7 hours ago [-]
too slow or broken right now
WhitneyLand 5 hours ago [-]
First great work.
Reminds me that I wish there was a modern way to do this for the words people speak and write online with. I want to literally know when people started putting literally twice in sentences.
Ngram seems out of date and piecemeal. Now Corpus seems like they try, but the UX is terrible.
lazystar 7 hours ago [-]
nice. i guess AWS still had nothing to fear from GCP/Azure. ty for this
This is the only HN submission I ever upvoted because it is amazing
ytkimirti 7 hours ago [-]
Thanks, it was my first ever post here as well, would you look at that
fragmede 7 hours ago [-]
If more people spent time on /new looking for awesome stuff and vouching for dead items, HN would be a better place.
linmer 5 hours ago [-]
Has anyone tried to make some sort of algorithm to find cool stuff on HN or sort by upvotes etc? I know it's cool and intended that such things don't exist, but has anyone tried?
frankzero 7 hours ago [-]
I know right
joe_the_user 5 hours ago [-]
The topic comparisons are pretty boring and search is disabled. Perhaps I'll remember to return to this. But I can't think of much it gives that plain Google nGram viewer doesn't.
thomasgeelens 5 hours ago [-]
oeeh hug of death, congrats!
k33n 6 hours ago [-]
This is quite useful at-a-glance
clacker-o-matic 7 hours ago [-]
ooh this is sick! really nice ui too!
ProofHouse 6 hours ago [-]
Yup your upstash is rate limited
nailer 5 hours ago [-]
> API design, era by era: REST becomes the web's default 2012–15, then the post-REST generation splits: gRPC for service-to-service from 2016, GraphQL for the client from 2017.
No. Looking at the diagram, REST is the default until 2017, GraphQL is briefly popular around the early 2020s, then the web returns to REST.
So you can create any sort of similar services in a single SQL query and an HTML page.
I also hosted it as a publicly accessible data lake, which you can query from everywhere: https://github.com/ClickHouse/ClickHouse/issues/29693#issuec...
It is also updated in real-time.
https://opensource.org/license/mit
is zX41ZdbW either?
I didn't consider you might not know about:
https://github.com/hackernews/api
This was a small project of mine after I found out that I can simply download the whole hackernews archive (~48GB) and play around with it.
You can compare terms just like in google trends and you can also see the exact posts & comments from that time.
I like that you can discover what went crazy in the timeline, they just come up as small bursts of activity, it's quite fun to play around with it. https://hackernewstrends.com/?q=litecoin&q=dogecoin&q=solana...
I also have a separate page for the "Who is Hiring?" posts, here is the distribution of programming languages over each monthly "Who is hiring?" post in HN ever. https://hackernewstrends.com/who-is-hiring
Any kind of feedback is welcome.
Currently it says "no job-post mentions in this window" for everything. Transient error?
Where is this archive located you speak of?
https://hackernewstrends.com/?q=Nim&q=Rust&q=Zig
The transition between crypto and ai on the graphs is already pretty funny. https://hackernewstrends.com/?q=crypto&q=chatgpt
I am really liking the trend for "linux": https://hackernewstrends.com/?q=linux
https://hackernewstrends.com/?q=linux&q=windows
Hmm, did I break something?