Technically speaking, the organizations in this report are no longer just startups, but I hope you people won’t mind and aren’t gonna launch a drone on me.
I think something is clear from the name itself, isn’t it? Well, it should be.
This report tries to plot all the involved organizations on the Open-Source portal. It tries to show what these organizations, in the race to achieve their goals, are doing there for and in the community.
It’s pretty biased though, because this report uses only one platform of the Open-Source community, GitHub.
This report doesn’t measure the success of the involved organizations; it simply can’t. They are all doing well in their fields; that’s why they are here. :tophat:
I think it was around mid-December last year when I saw the interview with Flipkart’s CTO Amod Malviya in a YourStory article. I started reading and kept reading till the end. At the end my reaction was: wow, this man is awesome, and he indeed is. I have watched many of his talks since reading that interview.
That interview left a different impression on me. I liked his words about building top-class internet infrastructure in India. I don’t know what you people think of Flipkart, Myntra etc., but I think they are evolving continuously, at least on the technical side. That’s why they are in the marathon, and Amazon itself is in the race with them.
So, after a while I found myself on Flipkart’s GitHub organization, scrolling through their projects. Then the idea of this report popped up in my mind, and here I am, struggling with it.
The earth will keep rotating without this report, but it’s kinda necessary for technical organizations to be a part of the current Open-Source era. I mean, as they say in Group Dynamics: if you’re part of a group, then you learn from other members and they learn from you.
Do you remember something named Facebook? Let’s take an example from them.
Maybe you take PHP as a language for kids, but keep in mind that The Social Network was initially developed in that same PHP. But as they started growing and hitting glitches with it, and seeing that :santa: was not coming to help them, they attempted building something of their own. Today we know those inventions as HHVM and the Hack language.
So, the thing is: don’t wait for santa, and build cool things that matter. Big organizations are already doing it, be it hhvm and react by Facebook, typeahead.js by Twitter, web-starter-kit by Google, or many more by others.
I do believe that the organization selection part was a bit biased, as I wanted to have my favorite organizations first on the list, like HackerEarth, HasGeek, Housing, Flipkart, Wingify and Zomato.
It was disappointing to see that Housing was not on GitHub at that time and Zomato’s organization had zero public activity.
Finally, I selected 15 startups, giving priority to my favorite ones.
There is a section in this report that uses last year’s GitHub activity of the organizations, so I killed my idea of replacing Zomato with someone else, as the year was gone and it was kinda tough to jump the traditional API bumper and collect the data.
As I said, Zomato had zero public activity last year, but that doesn’t mean they are not good. They are doing pretty well: acquiring it all at hurricane wind speed and serving in more cities than you’ve ever been to in your life. Maybe they are using some other platform, a local Git hosting or something.
You’d better zoom in on the images or open them in a different tab.
Do you know when all of these organizations were founded? Not sure?
This plot shows the relative appearance of the selected organizations, both in the public world and in the open-source world.
Add legend text in the image.
Well! In case you’re thinking that this information is all chatter, let me present something interesting.
Go back and look at the image carefully, and you’ll notice something different from the others for Cucumbertown and HasGeek.
Yes! The GitHub organizations for these two were created before their public launch itself. Sounds interesting, right?
I can’t say anything for Cucumbertown right now, but I can present a supporting theory for HasGeek.
Do you guys remember what the first event organised by HasGeek was? It was DocType HTML5, you silly. The event was held in October 2010, and HasGeek was publicly launched in December 2010. You can fly to their GitHub account and check that they have been developing hasgeek/doctypehtml5 since then.
Maybe organising this event was the inspiration behind launching HasGeek; I need to hear HasGeek founder Kiran’s words on it, though.
As we all know, the repository is an important component of GitHub’s ecosystem.
This section deals with the no. of public repositories for each involved organization.
Cloud services provider ShepHertz has the maximum no. of public repositories there, mainly based on their App42 service stack. Flipkart and HasGeek also have a significant no. of repositories; the rest of the organizations are building their store gradually.
The no. of repositories on GitHub is not the right thing to measure by, though.
As I said, having a larger number of repositories doesn’t by itself show your popularity. It’s not one of those old wars between states where the king with more elephants was supposed to be the winner.
But the no. of stars on a GitHub repository can represent its vogue, leaving aside the cases where they’re fake.
This graph represents the distribution of stars across all the repositories of the involved organizations.
Top 10 repositories according to no. of stars
You can see Wingify, Flipkart and HasGeek are ruling the leader-board here.
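For the curious, here is a minimal Python sketch of how star totals and a top-N list could be computed. It is not the code behind the plots above. It assumes repository records shaped like GitHub's repository JSON (the `full_name` and `stargazers_count` fields); the organizations and numbers in it are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: field names follow GitHub's repository JSON,
# but the organizations and star counts are invented for this example.
repos = [
    {"full_name": "org-a/alpha", "stargazers_count": 120},
    {"full_name": "org-a/beta", "stargazers_count": 30},
    {"full_name": "org-b/gamma", "stargazers_count": 75},
]


def stars_per_org(repos):
    """Sum the stars of every repository, grouped by owning organization."""
    totals = defaultdict(int)
    for repo in repos:
        org = repo["full_name"].split("/")[0]
        totals[org] += repo["stargazers_count"]
    return dict(totals)


def top_repos(repos, n=10):
    """Return the n most-starred repositories, highest first."""
    return sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)[:n]
```

Calling `top_repos(repos, 10)` on the full list of repositories would give a leaderboard like the one above.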
GitHub provides a feature named fork; using it, you can contribute to the awesome projects of others as if they were your own.
This section deals with attributes of the repositories, counting which of them are forked repositories and which are source repositories.
This plot shows which organizations have only their own source repositories and which ones have forked repositories.
During development, I also calculated the active and inactive percentages of the forked repositories. You can have a look here at how this was calculated.
We can see that HasGeek is doing fairly well here, having a larger share of source repositories than forked ones. A large portion of Flipkart’s and Freshdesk’s repositories are inactive-forked.
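A rough Python sketch of this kind of split is given below. The rule for "active" is an assumption made only for this example (a fork counts as active if it was pushed to after it was created); it is not necessarily the rule used for the plot. The field names `fork`, `created_at` and `pushed_at` follow GitHub's repository JSON, and the sample data is made up.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in GitHub's JSON

# Hypothetical sample repositories with invented dates.
repos = [
    {"name": "alpha", "fork": False,
     "created_at": "2013-01-10T00:00:00Z", "pushed_at": "2014-06-01T00:00:00Z"},
    {"name": "beta", "fork": True,
     "created_at": "2013-03-05T00:00:00Z", "pushed_at": "2013-03-01T00:00:00Z"},
    {"name": "gamma", "fork": True,
     "created_at": "2013-05-20T00:00:00Z", "pushed_at": "2014-02-11T00:00:00Z"},
    {"name": "delta", "fork": False,
     "created_at": "2014-01-02T00:00:00Z", "pushed_at": "2014-01-03T00:00:00Z"},
]


def classify(repo):
    """Label a repository as source, active-forked or inactive-forked."""
    if not repo["fork"]:
        return "source"
    created = datetime.strptime(repo["created_at"], FMT)
    pushed = datetime.strptime(repo["pushed_at"], FMT)
    # ASSUMPTION: a push after the fork was created means someone worked on it.
    return "active-forked" if pushed > created else "inactive-forked"


def shares(repos):
    """Return the percentage of repositories in each category."""
    counts = {"source": 0, "active-forked": 0, "inactive-forked": 0}
    for repo in repos:
        counts[classify(repo)] += 1
    total = len(repos)
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: 100.0 * n / total for label, n in counts.items()}
```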
All the involved organizations have something for the community: projects born as solutions to some problems, projects born in some hackathons and so on. They’re gradually building things to enhance their infrastructure and market position.
This section deals with the creation of repositories by all the organizations.
Again, if you think that it’s general knowledge, then let me show you the magic.
Go back and look at the image carefully, and you’ll notice something weird for HackerEarth, won’t you?
Yes! You see there, HackerEarth’s first repository was created before the creation of their GitHub organization itself. How is this even possible?
Well, ladies and gentlemen, this is possible. Let me introduce a new theory in support of this.
HackerEarth’s oldest repository in the time series is django-storages, and it’s this repository that is creating the confusion. The fact is that it was initially forked by HackerEarth’s co-founder Vivek on his own GitHub account. After the creation of a separate organization for HackerEarth, he transferred that repository to the organization.
That’s why this repository’s creation date is before the creation of their organization. Well! Again, I need Vivek’s approval on this.
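Spotting such a case is easy to automate. The snippet below is a small Python sketch, assuming each record carries a `created_at` timestamp in the format GitHub's JSON uses. The organization date and repository names are hypothetical, not HackerEarth's real data.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used in GitHub's JSON


def parse(timestamp):
    """Turn a GitHub-style timestamp string into a datetime."""
    return datetime.strptime(timestamp, FMT)


def older_than_org(org_created_at, repos):
    """Return the names of repositories created before the organization itself."""
    org_time = parse(org_created_at)
    return [r["name"] for r in repos if parse(r["created_at"]) < org_time]


# Hypothetical data: one repository predates the organization, as a
# repository moved in from a personal account would.
org_created_at = "2012-06-01T00:00:00Z"
repos = [
    {"name": "moved-in-repo", "created_at": "2012-02-15T00:00:00Z"},
    {"name": "fresh-repo", "created_at": "2013-01-20T00:00:00Z"},
]
early = older_than_org(org_created_at, repos)
```

A repository keeps its original creation date when it changes owner, so a non-empty result usually points at something that started life elsewhere.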
This section deals with the commit activity of all the organizations.
This plot shows the weekly commit activity of all the organizations. It’s pretty mixed-up, but this was the only plot type in my mind at the time I was developing this.
You can see relatively more development activity at the start of the year.
The Flipkart development team keeps a fork of linux; it’s not a forked repo, though. I removed its activity because it was making the plot even more cluttered. You can check that plot as well, though.
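As an illustration, weekly totals for an organization could be summed across its repositories like this, with a way to leave out a noisy repository such as a linux copy. This is a Python sketch under assumptions: the per-repository lists are shaped like the weekly entries returned by GitHub's commit activity statistics (`week` as a Unix timestamp, `total` as the commit count), and all the numbers are invented.

```python
from collections import Counter

# Hypothetical weekly commit data per repository; values are made up.
activity = {
    "alpha": [{"week": 1388880000, "total": 5}, {"week": 1389484800, "total": 2}],
    "beta": [{"week": 1388880000, "total": 3}],
    "linux": [{"week": 1388880000, "total": 900}, {"week": 1389484800, "total": 850}],
}


def weekly_totals(activity, exclude=()):
    """Sum commits per week over all repositories not listed in exclude."""
    totals = Counter()
    for name, weeks in activity.items():
        if name in exclude:
            continue
        for entry in weeks:
            totals[entry["week"]] += entry["total"]
    # Sort by week so the result is ready for a time-series plot.
    return dict(sorted(totals.items()))


with_linux = weekly_totals(activity)
without_linux = weekly_totals(activity, exclude=("linux",))
```

Comparing `with_linux` and `without_linux` shows how a single huge repository can flatten everything else in the plot.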
Different organizations are working in different fields of technology, be it medical services, developer events, online shopping, food, cloud services, online payments etc., so they encounter different problems along the way and manage them accordingly.
This section deals with the use of different programming languages in the involved organizations’ infrastructure.
This plot uses colors from GitHub’s linguist for different programming languages.
This helps us understand the tech stack of all the organizations.
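A language breakdown like this can be derived by adding up the per-repository language sizes. The Python sketch below assumes each repository contributes a mapping from language name to bytes of code, which is the shape GitHub reports for a repository's languages; the sample figures are made up.

```python
from collections import Counter

# Hypothetical per-repository language sizes in bytes.
repo_languages = [
    {"Python": 600, "JavaScript": 200},
    {"Python": 150, "Shell": 50},
]


def language_share(repo_languages):
    """Return each language's percentage of the organization's total code size."""
    totals = Counter()
    for languages in repo_languages:
        totals.update(languages)  # adds the byte counts language by language
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        language: round(100.0 * size / grand_total, 1)
        for language, size in totals.most_common()
    }


share = language_share(repo_languages)
```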
This section deals with the fields the different organizations are working in.
To calculate the results, I have used repository names and their descriptions here. Actually, I wanted to get the relative share of each field of work for all the organizations.
So, initially, my plan was to use Latent Dirichlet Allocation on the repository-description-text corpus for Topic Modeling.
I had used the concatenated repository descriptions of each organization as one document, but then I dropped this idea because of the asymmetrical repository distribution. It was resulting in a corpus of only 14 documents (Zomato excluded).
You can get a brief introduction to LDA here.
Then I changed the plan, moved to a Naive Bayes classifier and used word frequencies only.
So, some of the topic results from the classifier for the organizations are:
Here we can see that Flipkart’s stack includes things related to Distributed computing, Networking and Databases; on the other hand, Wingify’s stack includes things related to Frontend, Data and Networking.
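To give a feel for the approach, here is a tiny word-frequency Naive Bayes classifier in plain Python. It is only a sketch of the general technique (multinomial Naive Bayes with add-one smoothing), not the classifier used for the results above. The training descriptions and topic labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word frequencies with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> frequency
        self.doc_counts = Counter()              # topic -> training documents
        self.vocab = set()

    def train(self, text, topic):
        words = tokenize(text)
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, float("-inf")
        for topic in self.doc_counts:
            # Log prior plus smoothed log likelihood of every word.
            score = math.log(self.doc_counts[topic] / total_docs)
            denominator = sum(self.word_counts[topic].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[topic][word] + 1) / denominator)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Hypothetical labeled descriptions, invented for this example.
classifier = NaiveBayes()
classifier.train("distributed job scheduler for a cluster", "Distributed computing")
classifier.train("key value database storage engine", "Databases")
classifier.train("javascript ui widget for the browser", "Frontend")

topic = classifier.classify("a browser ui library")
```

Running every repository name and description of an organization through `classify` and counting the resulting topics would give a per-organization topic share.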
So, this is it. Open Source Presence Infographic of Indian Startups. :octocat:.
If you’re thinking that santa helped me in all this, then you are wrong, my friend. I was all alone every time: thinking about it, collecting the data, managing R source files in RStudio, writing Python for it and all that.
If you’re feeling that you can do something much more awesome than this, you can do whatever you want; it’s hosted on GitHub, pravj/ospi.
Open Source Presence Infographic : Terms and Conditions