Eclipse Yoxos Services Downloads Blogs About
Home > Blogs >

Posts Tagged ‘git’

on Jun 10th, 2011You don’t have to use git to access code on github

I guess a lot of people would agree that github is the current kick-ass platform for developing software. Many platforms showed up fast and with the same speed they disappeared. Github is different. It’s also genuinely innovative. For several months I use github to share small projects (widgets, tools, small plug-ins). When I write a blog about something new I always link the associated github repository.

A few days ago some people mentioned to me that it’s great that the sources from my posts are open but they can’t install git on their machines due to security restrictions in their company. They aren’t even allowed to install egit as an Eclipse plug-in. But there is  good news. You don’t have to use git when you want to get sources from github. You can download every branch from every (public) github repository without git. When you browse to a repository you can simply press the download button on the right and download the latest version of the repository as zip or tar.gz. Is this simple? As I said, it’s a kick-ass platform icon wink You dont have to use git to access code on github

github You dont have to use git to access code on github

 

on May 29th, 2011Git Lesson: Be mindful of a detached head

A severed head is never fun, and in git this is no different. In fact, a detached head can cause quite the headache.  In this article I will discuss what a detached head is, how it can happen, and most importantly, what you can do about it.  But before I begin, let’s rehash a bit about how git works.

A git repository is a direct acyclic graph of commits.  Here is a very simple git repository.

 Git Lesson: Be mindful of a detached head

In the EGit history view it would look like this:

screenshot 078 Git Lesson: Be mindful of a detached head

Now, let’s assume that core and ui bugs (commits E and F) were not actually bugs, but rather controversial design decisions. And the the fixes broke other parts of the application.  After a quick meeting, someone suggests a new way of addressing the problems that will make everyone happy. One problem, they’re not sure if it will actually work.   Since they are using git, they checkout the code before the controversial bugs were addressed (Commit B) and they start hacking away.  Now here’s the crux of the problem: a branch is a linear set of commits, so where do these new commits live? The truth is, head is not actually visible from any branch now.  This is a detached head.

 

 Git Lesson: Be mindful of a detached head

In the EGit history view you can see you have no head.

screenshot 079 Git Lesson: Be mindful of a detached head

The good news is that you can quickly fix this problem by creating a new branch.  In egit, this is as simple as Team -> Switch To -> New Branch. Now, all these ‘detached commits’ will live on the new branch (new_idea).

 Git Lesson: Be mindful of a detached head

You can now see the new commits (and new branch) in the EGit history view:

screenshot 080 Git Lesson: Be mindful of a detached head

What if you accidentally  switched branches while you were on a detached head.  At this point, there is no way to access the commits because they are not attached to any branch. Luckily, git remembers all the commits, even the ones that happened while your head was detached.  Simply drop the command line and issue: git reflog

screenshot 081 Git Lesson: Be mindful of a detached head

You can now see the missing commits.  If you return to Eclipse and use Navigate -> Open Git Commit you can open the commit 73df399

screenshot 082 Git Lesson: Be mindful of a detached head

And directly create a branch from here

screenshot 083 Git Lesson: Be mindful of a detached head

Another very common way to lose your head is to checkout a remote branch. Again, if you checkout a remote branch, immediately create a new local branch for your new commits; otherwise you’ll be working with a detached head.

There is more information about detached heads here.  Remember, don’t fight your tools, just git ‘er done.

on Apr 5th, 2011How to blog using GitHub and Eclipse

If this is not the first post by me that you’re reading, you may know that I try to blog regularly. Previously, I had 2 or 3 private blogs which, you also might know, were not that successful icon wink How to blog using GitHub and Eclipse . Since I started at EclipseSource, I publish on our company blog.

Anyway, I started my first blog 5 years ago and used some horrible, long forgotten php software. For my other blogs and the EclipseSource blog, WordPress is used. WordPress is probably the winner when it comes to blogging software. It’s widely accepted and heavily used by thousands of bloggers. While I like WordPress a lot, it has some drawbacks. Currently I’m sitting in the train writing this post and what tool do I use? A simple Text Editor. When I arrive at my company I have to paste this post into our WordPress, do a little formatting and publish it. What I really miss in this workflow with the text editior is the history of the post.

As a developer I like trying out the new stuff. As a result I’ve been using Git for a while now and I’m quite happy with it. I’m also an Eclipse committer and the Eclipse IDE is my home. Thanks to my colleagues I’m quite quick with all the shortcuts in the IDE. So, using the IDE as my blog editor and Git as the version control system (aka. history) would feel quite natural to me. But, how can we do this?

pages How to blog using GitHub and EclipseLuckily there is GitHub, probably your choice also for a Git hosting service too. Anyone can create a public Git repository for free. In the same way as WordPress is the winner for blog systems, GitHub is the winner when it comes to Git hosting services. What is less well known is that GitHub provides another service called GitHub pages. With pages you can use a Git repository to publish web contents. All you need to do is create a Git repository with a special naming (your-username.github.com) and everything pushed to this repository will be published under http://your-username.github.com (also good for publishing p2 repositories).

What the GitHub folks have also implemented is a Jekyll integration. Jekyll is a blog system that transforms your articles using static templates. You can add a blog post by adding a text file. After writing your posts, all you have to do is push the files to your GitHub pages repository and GitHub automatically starts the Jekyll transformation to create the blog. Isn’t this awesome? You get a blog system with Git as a history and a web hoster for free – in one provider icon wink How to blog using GitHub and Eclipse .

How does Eclipse come into the game? After cloning your repository to a local destination you can use linked workspace resources to edit the blog post in your IDE. All you have to do is create a new project and change the location to your Git repository root (see screenshot below).

projectWizard How to blog using GitHub and Eclipse

After editing your posts you have the option to use EGit (the Eclipse Git Integration) to push your changes to GitHub (don’t forget to share the project). The only piece missing is an Eclipse Jekyll integration (GSoC Students where are you?). With this I mean, when I save or commit a local blog post, it would be nice if a Jekyll transformation could be triggered to provide a local preview of the blog. Currently I do this by executing a command on the shell.

I’ve decided already that when I have to create a new blog I will use this technique. If you are not convinced yet, check out these example blogs which use GitHub and Jekyll.

 

on Jun 22nd, 2010Git Support, Top Eclipse Helios Feature #2

Only 1 more day until Eclipse Helios is release and we are down to my Top 2 features.

Over the life of Eclipse (Jeff McAffer tells me that he’s been working on Eclipse since 1999) a lot has changed. Eclipse started its life inside OTI/IBM. In November 2001 the Eclipse Consortium was announced and Eclipse was released as ‘Open Source’. For the next few years Eclipse grew, but was still mostly supported by a few large companies. New projects were proposed, new committers came on board, and Eclipse became the dominate player in the IDE space.  But as the popularity of Eclipse grew, so did its diversification. Then in April 2010, David Carver noticed that the number of active individual committers (those not associated with any particular company) was tied with IBM for the top spot.

Committers Git Support, Top Eclipse Helios Feature #2

What does all this mean and what does this have to do with the Eclipse Helios release? Well, as Eclipse continues to diversify, the Eclipse foundation will need a software revision control system that supports this diversification. The Eclipse Helios release marks the beginning of this transformation. Number 2 on my Top 10 List is: Git Support at Eclipse.

Three important components make up the Git support at Eclipse: JGit, EGit and the Git Infrastructure. JGit is a pure Java library implementation of Git version control system. JGit is licensed under the EDL has a number of users, including the Netbeans Git support.

EGit is the Eclipse tooling, and is build on JGit. There is currently support for a number of Git features:

Egitmenu 0.8.0 Git Support, Top Eclipse Helios Feature #2

History view:

Egit 0.8 history view Git Support, Top Eclipse Helios Feature #2

Repository View:

Egitrepositoriesview Git Support, Top Eclipse Helios Feature #2

Patch Support:

PatchContextMenu Git Support, Top Eclipse Helios Feature #2

The JGit / EGit team has excellent documentation and there is some great information on Git in general.  Git is being worked on by Matthias Sohn, Shawn Pearce, Chris Aniszczyk, Mathias Kinzler, Stefan Lay, Robin Rosenberg and Christian Halstrick.  However, a really big thank-you goes out to the past (and present) committer reps for bringing Git to Eclipse.  The initial Git contribution provided a number of unique licensing challenges that required unanimous approval from the Eclipse board of directors.  Git at Eclipse would not have been possible without their hard work.

In addition to the tool support, Eclipse.org has rolled out Git infrastructure for the community to make use of. There are Git mirrors for Eclipse projects and even Git repositories that some projects have started to migrate too. The big thank-you goes out to Denis Roy and Wayne Beaton for this.  Git really is the future of Eclipse, and if all goes as planned, Git will be on my Top 10 List again next year.

on Apr 22nd, 2010Eclipse DemoCamp 2010 in Mannheim

Ever been to Mannheim? If not – this is your chance to visit this lovely city. For the Helios release, the guys behind the majug² (Mannheimer Java user Group) invite everybody to the Helios Democamp in June. And as Ian already found out: Yes, we love our DemoCamps! It’s always great to have technical discussions over a frosty beverage!

2455008482 b1def65090 Eclipse DemoCamp 2010 in Mannheim

Watertower by flamouroux

At the moment, the attendee list is still pretty empty but save yourself a seat while it’s not booked out – they only have 100 seats available. Topics this year include EGit, EclipseRT, Android and Roo. Do you think a cool topic is missing? Step up and give a demo about what you’re doing! I’m really looking forward to see more demos of how people use Eclipse as IDE or runtime.

Eclipse camp Eclipse DemoCamp 2010 in Mannheim

Hope to see you there for another great DemoCamp and ad-hoc Stammtisch!

on Mar 19th, 2010Helios M6 RCP package

The new EPP packages for Helios M6 are uploaded to the download area and just need some more hours to be distributed to the Eclipse download mirrors until we can make them available for the public from eclipse.org/downloads. The mirroring is important, because otherwise the eclipse.org uplink would be entirely saturated and no one could get the Helios M6 bits in time before EclipseCon.

In the meantime, I’d like to highlight some additions that I recently did as a package maintainer of the RCP package. (If you don’t know what a package maintainer is you should consider joining my talk on Monday about ‘Building EPP packages‘.)

  • git is becoming more and more popular at Eclipse and EGit is always one of the first plug-ins that I am installing whenever I unpack a new Eclipse milestone on my computer. The logical step: Include EGit in my RCP package because I think that I am not the only one who needs this tool.
  • Another addition that I recently made is the RAP tooling. My daily work has changed and in the last months I am doing more RAP development than RCP development. I am not entirely sure if one needs both in one package, maybe RAP needs to go into its own package, but so far I think both technologies  complement each other. I am happy to get feedback – see bug 230357.
  • Last but not least: The Marketplace Client (MPC) is included to allow early feedback – the developers of this nice tool need your feedback to bring it into the best possible shape for Helios!

Now let’s wait until the packages are available… and I need to go back preparing my EclipseCon slides.

on Dec 13th, 2009Persistent Trees in git, Clojure and CouchDB

This is a tale of three images. I found these images while investigating the internals of several different applications. There are some really neat software projects emerging at the moment, and as a developer I always find it interesting to take a look at the implementation details, because there is often a lot to be learned. It’s not always something you might need right now, but maybe a few years down the line you may be confronted with a similar problem. Plus – in my opinion – knowing a bit about the internals of a program helps reasoning about its behaviour.

Exhibit A: git repositories

git trees Persistent Trees in git, Clojure and CouchDB

Let’s start with the first image. This one is taken from Scott Chacon’s talk “Getting Git”. (Actually I had to mix slides 138 and 142 together to better fit the blog format.) I’ll try not to go to much into the details of git here, listen to the talk instead of you want to know more. The thing I do want to talk about though is git’s tree structures. These tree structures describe the project contents (both directory and file) for one specific commit. These trees are completely immutable (this is ensured by SHA-1 hashes). A new commit creates what amounts to a completely new tree. But usually the changes in each commit are only a fraction of the whole tree. Since a lot of the sub-trees of the original tree have not changed (and are immutable), these can be safely recycled. Only files and folders that have changed need to be added. Note that this means parent nodes of changed nodes have to be created as well, recursively up to the tree root. But even though the tree itself is new (as evidenced by the new root) the additional space requirements are quite low, on the order of approximately O(log n). This makes for extremely compact repository format, and an astonishingly simple one.
Another thing I want to point out that the only element that needs to be mutated is the reference to the root of the tree, everything else is immutable.

Exhibit B: Clojure’s persistent data structures

clojure trees Persistent Trees in git, Clojure and CouchDB

Next up is a picture lifted from Rich Hickey’s “Clojure Concurrency” talk. If you are interested in figuring out ways to deal with the impending multi-core age, I highly recommend you take a look at this talk. The concepts and techniques are not exclusive to Clojure, so even if you are not into Lisp’s you might want to watch it.

The way clojure deals with concurrency (in a nutshell) is that every core data structure (list, trees, hash maps, etc) is immutable. This has one big perceived drawback: Immutable data structures incur a huge overhead because of all the copying involved in creating new instances. But this is only a problem if the immutability is implemented naively. Analysis shows that usually only a part of a data structure is actually modified at any given time. This again allows recycling large parts of the the “old” data structure and only creating part of it new, and just pointing to the bits of the old structure that have not changed. As these are immutable as well, it is completely safe to do so. In the picture the two green boxes are the root nodes of two data structures sharing large parts of their trees.

Since a program that only deals with static, immutable data is pretty boring, there are mechanisms for introducing changes in a program. These mechanisms are called references, and these do mutate. But the runtime ensures that these changes happen in a controlled and well behaved manner, e.g. in the form of Software Transactional Memory (STM) or agents. But everything else is essentially immutable. Note that this has the neat side effect that reading process are inherently thread-safe since nothing can suddenly change while you are looking at it, i.e. no more ConcurrentModificationExceptions.

Exhibit C: CouchDB’s append-only file format

couchdb trees Persistent Trees in git, Clojure and CouchDB

The third picture is from a blog post by Ricky Ho titled “NoSQL patterns”. CouchDB has made quite a few waves recently, so I wanted to see what makes it tick under the hood. In keeping with the Erlang philosophy of robustness and failure tolerance, couchdb uses an append-only file format. This means that all data written is not modified any further and there is thus little chance for corruption. CouchDB uses a B+ Tree to index its entries, so again we have a tree structure. And because the tree entries live inside an append only file, these trees are immutable as well. When making a change to the database, the new data is written along with all changes to nodes in the B+ Tree. Since the B+ Tree is very flat, that means even for databases with millions of entries the number of updated tree nodes is fairly small. The most recent root of the index tree can always be found at the very end of the file, and thus the database file is read from the end backwards. If any kind of failure occurs while writing changes, the software can see that the last entry is corrupt and can seek further back until it finds a “good” entry and proceed from there. Another interesting benefit is that reading processes don’t block writing processes and vice versa. A reading process can just find the latest root and then work from there. Because all references go backward in the file, and everything is immutable there is no contention. Meanwhile a writing process can happily append at the end of the file without disrupting any readers.

Further exhibits

There are probably more examples of this pattern to be found. The oldest reference I could find is “The Design and Implementation of a Log-Structured File System” written by Mendel Rosenblum and John K. Ousterhout in 1991. The most recent file system effort seems to be NILFS. The flash-based SSDs that are currently entering the market are also rumored to use something along these lines internally. I also suspect that a few of the traditional DBMS might use similar data structures under the hood

Shared characteristics

Although all the applications do very different things and come from completely different backgrounds, they share a common data structure underneath. Unfortunately I have not been able to find a well published moniker for this pattern. If anyone does, please let me know.

Update: As FuncFan pointed out in the comments the moniker I was looking for was “Purely functional trees“. Thanks for the quick reply!

Immutable, recycling trees

All three systems have these immutable trees in some way way or form that allow sharing structure. In git these are tree objects, in Clojure they are called persistent data structures, and in CouchDB they are part of the internal index tree.

Mutable references

To allow for changes in an otherwise immutable world, all systems allow for mutable reference constructs. All change is encapsulated in these references, which are called exactly that in both git and Clojure. In CouchDB the story is a little different. Here the “reference” to the latest tree is simply the end of the database file

More common ground

Apart from these “primary” characteristics there are other shared features among these three systems, that are a direct result of the underlying data structure.

Garbage collection

Because data may be shared between data structures, it is often not safe to delete all children when a root node is no longer needed. Instead a garbage collection mechanism is needed to free unused structures. Git does have a special command that does exactly that, Clojure uses the Garbage Collection in the JVM implicitly, and CouchDB offers a compact operation, which removes old versions and tree indices.

Versioning

Because no structure is changed, it is trivially easy to keep old versions around, and since a lot of data is shared it is usually fairly cheap in terms of memory to do so. Versioning is the primary application of git, so no big surprises on that end. In Clojure it is also fairly easy to add versioning for things like an “undo” buffer: Just keep a list of the old objects around. CouchDB also offers some lightweight versioning out-of-box, but it is mostly used for replication. But it should be fairly simple to add more sophisticated versioning features.

Concurrency Safety

The immutability properties of all three system make reasoning about concurrent changes and processes a lot simpler. Reading is trivial because all that is read is set in stone, so to speak. Writing is made easier by the fact that there is only a single point – the reference – that is modified to effect a specific change. This critical point can then be guarded with traditional synchronization primitives, without having to worry about the rest of the data structure.

Conclusion

I have to admit I found it a bit eerie seeing a single pattern pop up in so many different places. It may be just an idea whose time has come (similar to Garbage Collection a few years back). Or it may be just that now we simply have enough memory and computing resources at our disposal, which allows us to never have to delete something from the record, but instead only add incrementally. And oftentimes the history of data is just as interesting as the data itself. The benefits of immutability may also be a good way to tame the concurrency beast in the years to come.

I hope you enjoyed this comparative trip into the internals of these software systems. Again, check out the two talks and the blog post, they are well worth the time.

on Nov 3rd, 2009Git Mirrors at Eclipse.org

Good news everyone, Git mirrors are going live at Eclipse.org

gitmirrors 300x93 Git Mirrors at Eclipse.org

Please give them a whirl.

If you find any issues, please state them on this bug.

on Sep 22nd, 2009Git at Eclipse

Git has been gaining some traction in the Eclipse community as of late.

githeader 300x40 Git at Eclipse

From the birth of the EGit project at Eclipse and the recent approval of JGit to be hosted at Eclipse as a sub project of the EGit project, good things are coming. Why should you care about Git?

Git is pretty popular these days as evident by some of the open source projects out there using Git:

  • Linux
  • KDE
  • Qt
  • Android
  • X.org
  • Wine
  • VLC
  • OLPC
  • OpenAFS
  • Ruby
  • Perl

Apache is even rumored to be switching, at the moment they have a public GIT mirror.

Git is also fast and efficient. In some of my testing, Git produced significantly smaller repositories than SVN did. In terms of speed… I think Git’s ability to do branching cheaply is one of its biggest assets. In the end, I think the most important feature of Git is that it significantly lowers the barrier to contribution. People are able to easily branch your work and you can pull at a later time. I’m not a Git expert by any means yet, but here are some things that have helped me along my Git journey:

In the Eclipse world, I see a move towards Git as the smart thing to do. It would make it easier for the Eclipse community to contribute code versus our current model. It would also help the many companies out there that maintain their own copies of Eclipse and patch things as necessary because of their release cycles. The more I look at it, I can’t come up with many reasons why Eclipse shouldn’t move to Git. Here are some current happenings:

  • A vserver is being provisioned with Git and Gerrit installed
  • A read only GIT mirror of the Eclipse codebase is being setup

If Git is important to you at Eclipse, I encourage you to get involved with the EGit project via their mailing list.

on Mar 2nd, 2009Git BoF @ EclipseCon

EclipseCon is coming up, and to my big suprise the Git BoF got accepted.

Initially, this BoF proposal was just a way to get the ball rolling on distributed version control systems at eclipse. In the recent weeks I have learned that the ball has been rolling for some time already and has gained quite a momentum, especially among the committers – just take a look at the comments on Bug 257706.

I’d like to get all stakeholders involved in the discussion. Denis Roy will be there to represent the technical and infrastructure side of things. Apart from downstream consumers, committers, contributors and potential contributors, it would also be really great to have some members of the legal team and the board present to get as many viewpoints as possible.

I know that git support was put on hold at the board meeting in December. The cited reason was the resource cost to deploy and support yet another VCS. While I can understand that concern, especially after the recent investment in subversion, I think there is another issue to consider: The opportunity cost of not deploying a DVCS in the near future. Contributions from developers are the life blood of any open source project, and as such, it should be as easy as possible to get people involved in the development process. I experienced first hand how cumbersome it can be to work with the current infrastructure. Centralized versioning may be fine for corporate software development, but for a project like Eclipse I fear it results in too many hurdles for developers. That’s why I’m hoping we can really get this off the ground sooner rather than later.

Regarding infrastructure costs, it has been my understanding that the effort to setup a repository is relatively low. Which I guess makes sense since each developer has to have his own repository. A possibility might even be to use a third party provider like github, at least during the transition period.

Another interesting issue is what work flow model to adopt if we decided to support a DVCS. Options include a “lieutenant” style process as practiced by the Linux kernel, or a more traditional approach with a central master repository that committers commit to.

These are some of the issues I’d like to discuss in this BoF. I am absolutely stoked to see this happening and am really looking forward to making some progress on this front. See you all there!

© EclipseSource 2008 - 2011