Posts Tagged ‘git’

on Jun 22nd, 2010: Git Support, Top Eclipse Helios Feature #2

Only 1 more day until Eclipse Helios is released and we are down to my Top 2 features.

Over the life of Eclipse (Jeff McAffer tells me that he’s been working on Eclipse since 1999) a lot has changed. Eclipse started its life inside OTI/IBM. In November 2001 the Eclipse Consortium was announced and Eclipse was released as ‘Open Source’. For the next few years Eclipse grew, but was still mostly supported by a few large companies. New projects were proposed, new committers came on board, and Eclipse became the dominant player in the IDE space. But as the popularity of Eclipse grew, so did its diversification. Then in April 2010, David Carver noticed that the number of active individual committers (those not associated with any particular company) was tied with IBM for the top spot.

[Image: Committers]

What does all this mean and what does this have to do with the Eclipse Helios release? Well, as Eclipse continues to diversify, the Eclipse Foundation will need a software revision control system that supports this diversification. The Eclipse Helios release marks the beginning of this transformation. Number 2 on my Top 10 List is: Git Support at Eclipse.

Three important components make up the Git support at Eclipse: JGit, EGit and the Git Infrastructure. JGit is a pure Java library implementation of the Git version control system. JGit is licensed under the EDL and has a number of users, including the NetBeans Git support.

EGit is the Eclipse tooling, and is built on JGit. There is currently support for a number of Git features:

[Image: EGit 0.8.0 menu]

History view:

[Image: EGit 0.8 history view]

Repository View:

[Image: EGit repositories view]

Patch Support:

[Image: Patch context menu]

The JGit / EGit team has excellent documentation and there is some great information on Git in general.  Git is being worked on by Matthias Sohn, Shawn Pearce, Chris Aniszczyk, Mathias Kinzler, Stefan Lay, Robin Rosenberg and Christian Halstrick.  However, a really big thank-you goes out to the past (and present) committer reps for bringing Git to Eclipse.  The initial Git contribution provided a number of unique licensing challenges that required unanimous approval from the Eclipse board of directors.  Git at Eclipse would not have been possible without their hard work.

In addition to the tool support, Eclipse.org has rolled out Git infrastructure for the community to make use of. There are Git mirrors for Eclipse projects and even Git repositories that some projects have started to migrate to. A big thank-you goes out to Denis Roy and Wayne Beaton for this. Git really is the future of Eclipse, and if all goes as planned, Git will be on my Top 10 List again next year.

on Apr 22nd, 2010: Eclipse DemoCamp 2010 in Mannheim

Ever been to Mannheim? If not – this is your chance to visit this lovely city. For the Helios release, the guys behind the majug² (Mannheimer Java user Group) invite everybody to the Helios Democamp in June. And as Ian already found out: Yes, we love our DemoCamps! It’s always great to have technical discussions over a frosty beverage!

[Image: Watertower by flamouroux]

At the moment, the attendee list is still pretty empty, but save yourself a seat while it’s not booked out – they only have 100 seats available. Topics this year include EGit, EclipseRT, Android and Roo. Do you think a cool topic is missing? Step up and give a demo about what you’re doing! I’m really looking forward to seeing more demos of how people use Eclipse as an IDE or runtime.


Hope to see you there for another great DemoCamp and ad-hoc Stammtisch!

on Mar 19th, 2010: Helios M6 RCP package

The new EPP packages for Helios M6 are uploaded to the download area and need a few more hours to propagate to the Eclipse download mirrors before we can make them publicly available from eclipse.org/downloads. The mirroring is important, because otherwise the eclipse.org uplink would be entirely saturated and no one could get the Helios M6 bits in time before EclipseCon.

In the meantime, I’d like to highlight some additions that I recently made as a package maintainer of the RCP package. (If you don’t know what a package maintainer is, you should consider attending my talk on Monday about ‘Building EPP packages’.)

  • Git is becoming more and more popular at Eclipse, and EGit is always one of the first plug-ins that I install whenever I unpack a new Eclipse milestone on my computer. The logical step: include EGit in my RCP package, because I think that I am not the only one who needs this tool.
  • Another addition that I recently made is the RAP tooling. My daily work has changed, and in recent months I have been doing more RAP development than RCP development. I am not entirely sure if one needs both in one package, maybe RAP needs to go into its own package, but so far I think both technologies complement each other. I am happy to get feedback – see bug 230357.
  • Last but not least: the Marketplace Client (MPC) is included to allow early feedback – the developers of this nice tool need your feedback to bring it into the best possible shape for Helios!

Now let’s wait until the packages are available… and I need to go back to preparing my EclipseCon slides.

on Dec 13th, 2009: Persistent Trees in git, Clojure and CouchDB

This is a tale of three images. I found these images while investigating the internals of several different applications. There are some really neat software projects emerging at the moment, and as a developer I always find it interesting to take a look at the implementation details, because there is often a lot to be learned. It’s not always something you might need right now, but maybe a few years down the line you may be confronted with a similar problem. Plus – in my opinion – knowing a bit about the internals of a program helps reasoning about its behaviour.

Exhibit A: git repositories

[Image: git trees]

Let’s start with the first image. This one is taken from Scott Chacon’s talk “Getting Git”. (Actually I had to mix slides 138 and 142 together to better fit the blog format.) I’ll try not to go too much into the details of git here; listen to the talk instead if you want to know more. The thing I do want to talk about, though, is git’s tree structures. These tree structures describe the project contents (both directories and files) for one specific commit. These trees are completely immutable (this is ensured by SHA-1 hashes). A new commit creates what amounts to a completely new tree. But usually the changes in each commit touch only a fraction of the whole tree. Since a lot of the sub-trees of the original tree have not changed (and are immutable), these can be safely recycled. Only files and folders that have changed need to be added. Note that this means parent nodes of changed nodes have to be created as well, recursively up to the tree root. But even though the tree itself is new (as evidenced by the new root), the additional space requirements are quite low, on the order of approximately O(log n). This makes for an extremely compact repository format, and an astonishingly simple one.

Another thing I want to point out is that the only element that needs to be mutated is the reference to the root of the tree; everything else is immutable.
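To make the recycling of sub-trees concrete, here is a minimal Python sketch of a content-addressed tree store. It is not git’s actual object format: the `store` dictionary, the `put_blob`, `put_tree` and `update` helpers and the way the hashes are computed are all assumptions made up for this illustration.

```python
import hashlib

# Hypothetical in-memory object store: object id -> immutable object.
store = {}

def put_blob(data: bytes) -> str:
    """Store file contents under the hash of those contents."""
    oid = hashlib.sha1(b"blob:" + data).hexdigest()
    store[oid] = data
    return oid

def put_tree(entries: dict) -> str:
    """Store a directory as a sorted tuple of (name, object id) pairs."""
    items = tuple(sorted(entries.items()))
    oid = hashlib.sha1(("tree:" + repr(items)).encode()).hexdigest()
    store[oid] = items
    return oid

def update(tree_id: str, path: list, data: bytes) -> str:
    """Return the id of a NEW tree in which the file at `path` holds `data`.

    Only the nodes along `path` are recreated; every other entry keeps
    pointing at the existing, unchanged objects.
    """
    entries = dict(store[tree_id])
    name = path[0]
    if len(path) == 1:
        entries[name] = put_blob(data)
    else:
        entries[name] = update(entries[name], path[1:], data)
    return put_tree(entries)

# "Commit" 1: two directories with one file each.
src = put_tree({"main.c": put_blob(b"int main(){}")})
docs = put_tree({"readme.txt": put_blob(b"hello")})
root1 = put_tree({"src": src, "docs": docs})
count_before = len(store)

# "Commit" 2: change a single file under src/.
root2 = update(root1, ["src", "main.c"], b"int main(){return 0;}")
new_objects = len(store) - count_before
```

In this sketch the second version adds only three objects (the new file, the new `src` tree and the new root), while the `docs` sub-tree is shared by both roots and the first root remains readable as it was.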

Exhibit B: Clojure’s persistent data structures

[Image: clojure trees]

Next up is a picture lifted from Rich Hickey’s “Clojure Concurrency” talk. If you are interested in figuring out ways to deal with the impending multi-core age, I highly recommend you take a look at this talk. The concepts and techniques are not exclusive to Clojure, so even if you are not into Lisps you might want to watch it.

The way Clojure deals with concurrency (in a nutshell) is that every core data structure (lists, trees, hash maps, etc.) is immutable. This has one big perceived drawback: immutable data structures incur a huge overhead because of all the copying involved in creating new instances. But this is only a problem if the immutability is implemented naively. Analysis shows that usually only a part of a data structure is actually modified at any given time. This again allows recycling large parts of the “old” data structure, creating only the changed part anew and pointing to the bits of the old structure that have not changed. As these are immutable as well, it is completely safe to do so. In the picture the two green boxes are the root nodes of two data structures sharing large parts of their trees.
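The same idea can be sketched without any hashing. The following Python snippet uses a plain binary search tree with path copying; it is only an illustration of structural sharing, not how Clojure implements its collections, and the `Node` type and `insert` function are names invented for this example.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """An immutable tree node; tuples cannot be modified after creation."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node: Optional[Node], key: int) -> Node:
    """Return a new tree containing `key`; the input tree is left untouched.

    Only the nodes on the path from the root to the insertion point are
    copied. Every sub-tree off that path is reused as-is.
    """
    if node is None:
        return Node(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present, nothing to copy

old = None
for k in (50, 30, 70, 20, 40):
    old = insert(old, k)

new = insert(old, 60)  # copies the nodes 50 and 70, shares the rest
```

After the last line, `old` and `new` are two distinct roots (the two green boxes in the picture, so to speak), and `new.left is old.left` holds because the whole left sub-tree is shared between them.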

Since a program that only deals with static, immutable data is pretty boring, there are mechanisms for introducing changes in a program. These mechanisms are called references, and these do mutate. But the runtime ensures that these changes happen in a controlled and well-behaved manner, e.g. in the form of Software Transactional Memory (STM) or agents. But everything else is essentially immutable. Note that this has the neat side effect that reading processes are inherently thread-safe since nothing can suddenly change while you are looking at it, i.e. no more ConcurrentModificationExceptions.
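As a rough sketch of such a reference, here is a tiny Python class that holds an immutable value and replaces it atomically. This is an assumption-laden stand-in: Clojure’s reference types rely on STM or compare-and-set retries, whereas this hypothetical `Ref` class simply uses a lock.

```python
import threading

class Ref:
    """A single mutable cell pointing at an immutable value (hypothetical)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def deref(self):
        # Readers need no lock: the value they get can never change.
        return self._value

    def swap(self, fn, *args):
        # Writers serialize on the lock and install a brand-new value.
        with self._lock:
            self._value = fn(self._value, *args)
            return self._value

inventory = Ref(("apple",))
snapshot = inventory.deref()                       # a reader takes a snapshot
inventory.swap(lambda items, x: items + (x,), "pear")

counter = Ref(0)

def worker():
    for _ in range(1000):
        counter.swap(lambda n: n + 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The reader’s `snapshot` still contains only the apple after the swap, and the four writer threads never lose an update because the only thing they contend for is the reference itself.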

Exhibit C: CouchDB’s append-only file format

[Image: couchdb trees]

The third picture is from a blog post by Ricky Ho titled “NoSQL patterns”. CouchDB has made quite a few waves recently, so I wanted to see what makes it tick under the hood. In keeping with the Erlang philosophy of robustness and failure tolerance, CouchDB uses an append-only file format. This means that data, once written, is not modified any further, and there is thus little chance for corruption. CouchDB uses a B+ Tree to index its entries, so again we have a tree structure. And because the tree entries live inside an append-only file, these trees are immutable as well. When making a change to the database, the new data is written along with all changes to nodes in the B+ Tree. Since the B+ Tree is very flat, even for databases with millions of entries the number of updated tree nodes is fairly small. The most recent root of the index tree can always be found at the very end of the file, and thus the database file is read from the end backwards. If any kind of failure occurs while writing changes, the software can see that the last entry is corrupt and can seek further back until it finds a “good” entry and proceed from there.

Another interesting benefit is that reading processes don’t block writing processes and vice versa. A reading process can just find the latest root and then work from there. Because all references go backward in the file, and everything is immutable, there is no contention. Meanwhile a writing process can happily append at the end of the file without disrupting any readers.
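The recovery idea can be sketched in a few lines of Python. This is emphatically not CouchDB’s file format: the record layout (a CRC-32 checksum followed by a JSON body), the in-memory `bytearray` standing in for the file, and a simple backward-linked list standing in for the B+ Tree are all assumptions chosen to keep the example short.

```python
import json
import zlib

def append(log: bytearray, key, value, prev):
    """Append one record and return its offset.

    `prev` is the offset of the previous root (or None), so every
    reference points backward in the log.
    """
    body = json.dumps({"key": key, "value": value, "prev": prev}).encode()
    offset = len(log)
    log.extend(b"%08x %s\n" % (zlib.crc32(body), body))
    return offset

def parse(line: bytes):
    """Return the decoded record, or None if the line is torn or corrupt."""
    try:
        crc, body = line.split(b" ", 1)
        if int(crc, 16) != zlib.crc32(body):
            return None
        return json.loads(body)
    except ValueError:
        return None

def latest_root(log: bytearray):
    """Return the offset of the last intact record, checking newest first.

    A real implementation would seek from the end of the file instead of
    splitting the whole log as done here.
    """
    offsets, pos = [], 0
    for line in bytes(log).split(b"\n"):
        offsets.append((pos, line))
        pos += len(line) + 1
    for pos, line in reversed(offsets):
        if line and parse(line) is not None:
            return pos
    return None

def lookup(log: bytearray, key):
    """Walk backward from the latest root until `key` is found."""
    offset = latest_root(log)
    while offset is not None:
        end = log.index(b"\n", offset)
        record = parse(bytes(log[offset:end]))
        if record["key"] == key:
            return record["value"]
        offset = record["prev"]
    return None

log = bytearray()
r1 = append(log, "a", 1, None)
r2 = append(log, "b", 2, r1)
r3 = append(log, "a", 3, r2)
log.extend(b'deadbeef {"key": "b", "val')  # simulated crash mid-write
```

Even with the torn record at the end, `latest_root(log)` falls back to `r3`, and lookups return the newest intact value for each key.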

Further exhibits

There are probably more examples of this pattern to be found. The oldest reference I could find is “The Design and Implementation of a Log-Structured File System” written by Mendel Rosenblum and John K. Ousterhout in 1991. The most recent file system effort seems to be NILFS. The flash-based SSDs that are currently entering the market are also rumored to use something along these lines internally. I also suspect that a few of the traditional DBMSs might use similar data structures under the hood.

Shared characteristics

Although all the applications do very different things and come from completely different backgrounds, they share a common data structure underneath. Unfortunately I have not been able to find a well published moniker for this pattern. If anyone does, please let me know.

Update: As FuncFan pointed out in the comments, the moniker I was looking for was “Purely functional trees”. Thanks for the quick reply!

Immutable, recycling trees

All three systems have these immutable trees in some way, shape or form that allows sharing structure. In git these are tree objects, in Clojure they are called persistent data structures, and in CouchDB they are part of the internal index tree.

Mutable references

To allow for changes in an otherwise immutable world, all systems allow for mutable reference constructs. All change is encapsulated in these references, which are called exactly that in both git and Clojure. In CouchDB the story is a little different. Here the “reference” to the latest tree is simply the end of the database file.

More common ground

Apart from these “primary” characteristics there are other shared features among these three systems that are a direct result of the underlying data structure.

Garbage collection

Because data may be shared between data structures, it is often not safe to delete all children when a root node is no longer needed. Instead a garbage collection mechanism is needed to free unused structures. Git does have a special command that does exactly that, Clojure uses the Garbage Collection in the JVM implicitly, and CouchDB offers a compact operation, which removes old versions and tree indices.
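Why a plain “delete all children” is unsafe, and what a collector does instead, can be shown with a small reachability sketch in Python. The object names and the `reachable` and `collect` helpers are invented for this example; real collectors such as `git gc` or the JVM’s do considerably more.

```python
def reachable(store: dict, roots) -> set:
    """Mark phase: collect every object id reachable from the live roots."""
    seen, stack = set(), list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(store[oid])
    return seen

def collect(store: dict, roots) -> dict:
    """Sweep phase: keep only the objects that are still reachable."""
    live = reachable(store, roots)
    return {oid: kids for oid, kids in store.items() if oid in live}

# Two versions of a tree that share the "docs" sub-tree.
store = {
    "root1": ["src1", "docs"],
    "root2": ["src2", "docs"],
    "src1": [],
    "src2": [],
    "docs": [],
}

# root1 is no longer needed, but "docs" must survive: root2 still uses it.
after = collect(store, roots={"root2"})
```

Here `root1` and `src1` are dropped, while `docs` is kept because the surviving root still points to it.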

Versioning

Because no structure is changed, it is trivially easy to keep old versions around, and since a lot of data is shared it is usually fairly cheap in terms of memory to do so. Versioning is the primary application of git, so no big surprises on that end. In Clojure it is also fairly easy to add versioning for things like an “undo” buffer: just keep a list of the old objects around. CouchDB also offers some lightweight versioning out of the box, but it is mostly used for replication. It should be fairly simple, though, to add more sophisticated versioning features.
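The “undo” buffer idea translates almost literally into code. The sketch below uses Python rather than Clojure and represents each version as an immutable linked list of nested tuples; the `push` and `to_list` helpers are hypothetical names for this illustration.

```python
def push(version, item):
    """Return a new version: a new head pointing at the unchanged old list."""
    return (item, version)

def to_list(version) -> list:
    """Flatten a version into a plain list, oldest item first."""
    out = []
    while version is not None:
        item, version = version
        out.append(item)
    return out[::-1]

# The undo buffer is just a list of all versions seen so far.
history = [None]  # None is the empty document
for word in ("git", "clojure", "couchdb"):
    history.append(push(history[-1], word))

current = history[-1]
previous = history[-2]  # "undo" means stepping back one entry
```

Each version costs one extra tuple, because `current[1] is previous`: the newer version literally contains the older one instead of copying it.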

Concurrency Safety

The immutability properties of all three systems make reasoning about concurrent changes and processes a lot simpler. Reading is trivial because all that is read is set in stone, so to speak. Writing is made easier by the fact that there is only a single point – the reference – that is modified to effect a specific change. This critical point can then be guarded with traditional synchronization primitives, without having to worry about the rest of the data structure.

Conclusion

I have to admit I found it a bit eerie seeing a single pattern pop up in so many different places. It may be just an idea whose time has come (similar to Garbage Collection a few years back). Or it may be just that now we simply have enough memory and computing resources at our disposal, which allows us to never have to delete something from the record, but instead only add incrementally. And oftentimes the history of data is just as interesting as the data itself. The benefits of immutability may also be a good way to tame the concurrency beast in the years to come.

I hope you enjoyed this comparative trip into the internals of these software systems. Again, check out the two talks and the blog post, they are well worth the time.

on Nov 3rd, 2009: Git Mirrors at Eclipse.org

Good news everyone, Git mirrors are going live at Eclipse.org.

Eclipse Git Mirrors

Please give them a whirl.

If you find any issues, please state them on this bug.

on Sep 22nd, 2009: Git at Eclipse

Git has been gaining some traction in the Eclipse community as of late.


With the birth of the EGit project at Eclipse and the recent approval of JGit to be hosted at Eclipse as a subproject of the EGit project, good things are coming. Why should you care about Git?

Git is pretty popular these days, as evidenced by some of the open source projects out there using Git:

  • Linux
  • KDE
  • Qt
  • Android
  • X.org
  • Wine
  • VLC
  • OLPC
  • OpenAFS
  • Ruby
  • Perl

Apache is even rumored to be switching; at the moment they have a public Git mirror.

Git is also fast and efficient. In some of my testing, Git produced significantly smaller repositories than SVN did. In terms of speed… I think Git’s ability to do branching cheaply is one of its biggest assets. In the end, I think the most important feature of Git is that it significantly lowers the barrier to contribution. People are able to easily branch your work and you can pull at a later time. I’m not a Git expert by any means yet, but here are some things that have helped me along my Git journey:

In the Eclipse world, I see a move towards Git as the smart thing to do. It would make it easier for the Eclipse community to contribute code versus our current model. It would also help the many companies out there that maintain their own copies of Eclipse and patch things as necessary because of their release cycles. The more I look at it, I can’t come up with many reasons why Eclipse shouldn’t move to Git. Here are some current happenings:

  • A vserver is being provisioned with Git and Gerrit installed
  • A read-only Git mirror of the Eclipse codebase is being set up

If Git is important to you at Eclipse, I encourage you to get involved with the EGit project via their mailing list.

on Mar 2nd, 2009: Git BoF @ EclipseCon

EclipseCon is coming up, and to my big surprise the Git BoF got accepted.

Initially, this BoF proposal was just a way to get the ball rolling on distributed version control systems at Eclipse. In recent weeks I have learned that the ball has been rolling for some time already and has gained quite some momentum, especially among the committers – just take a look at the comments on Bug 257706.

I’d like to get all stakeholders involved in the discussion. Denis Roy will be there to represent the technical and infrastructure side of things. Apart from downstream consumers, committers, contributors and potential contributors, it would also be really great to have some members of the legal team and the board present to get as many viewpoints as possible.

I know that git support was put on hold at the board meeting in December. The cited reason was the resource cost to deploy and support yet another VCS. While I can understand that concern, especially after the recent investment in Subversion, I think there is another issue to consider: the opportunity cost of not deploying a DVCS in the near future. Contributions from developers are the lifeblood of any open source project, and as such, it should be as easy as possible to get people involved in the development process. I experienced firsthand how cumbersome it can be to work with the current infrastructure. Centralized versioning may be fine for corporate software development, but for a project like Eclipse I fear it results in too many hurdles for developers. That’s why I’m hoping we can really get this off the ground sooner rather than later.

Regarding infrastructure costs, it has been my understanding that the effort to set up a repository is relatively low, which I guess makes sense since each developer has to have his own repository. A possibility might even be to use a third-party provider like GitHub, at least during the transition period.

Another interesting issue is what workflow model to adopt if we decide to support a DVCS. Options include a “lieutenant” style process as practiced by the Linux kernel, or a more traditional approach with a central master repository that committers commit to.

These are some of the issues I’d like to discuss in this BoF. I am absolutely stoked to see this happening and am really looking forward to making some progress on this front. See you all there!

on Jan 2nd, 2009: Making history

I used what little free time I had over the holidays to catch up on the recent developments in source control management systems, which have been quite interesting to follow. Especially the arrival of Distributed Version Control Systems has caused quite a buzz in the software development industry.

Somehow Eclipse as a whole totally missed the SVN boat. Only in the last year has there been any uptake of Subversion within the Eclipse Foundation itself. And although there are two separate plugins (and by my count at least 3 different adapter libraries) it still doesn’t feel quite as solid as the CVS counterpart. But the writing is on the wall: CVS does not fulfill all the needs we expect from a contemporary source code management system. And while I have been a long-time fan of Subversion (atomic commits and rename tracking, yay!) I still feel that maybe the time of centralized source control is over, especially in an open source environment. Doug Schaefer seems to agree. Distributed source control makes it much easier for potential contributors to get started. Experimental and “investigative” branches are easier to manage, and things like feature branches are more easily merged again. It’s not all roses either (partial checkouts and IP-taints anyone?), but it does make the development process quite a bit better. And isn’t that one of the main goals of the Eclipse Foundation?

So over the holiday I checked out (quite literally) the latest egit sources and played around with them a little. I must say that I came away quite impressed. Although there are still some features missing, the core looks very solid and usable.

The history of the egit project

Because the source for egit is openly available, it was very easy for me to “grok” the innards of the git system. My C is getting a little rusty these days, so a Java implementation made a much better reading experience for me. Debugging and Ctrl-clicking as well, for that matter. With the core jgit library in hand, I even tried to implement a versioned POJO persistence store, but that’s a story for another time. What struck me most about git was the raw and brutal simplicity and the resulting elegance of its design. Believe it or not, git only knows four different types of objects. Java’s “hello world” uses about that many! Another neat thing is the hash value that is calculated for virtually everything you do. When I first read about this feature, I was quite skeptical to be honest: for starters, I don’t want to type 40-character strings anytime I want to reference a revision, and secondly, what about collisions? Well, I worked out the math on that one, and what it comes down to is this: by the time I create my first collision I will have been struck by lightning three times over and have survived the sun going supernova. And as for typing in those hashes, well, you need it less than you would think, and even then you usually only need the first few letters to make it unambiguous.
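For the curious, both the hash and the collision math fit into a few lines of Python. The four object types are blob, tree, commit and tag, and a blob’s id is the SHA-1 of a short header followed by the file contents. The collision estimate uses the standard birthday approximation for purely accidental collisions; the function names and the figure of one billion objects are assumptions picked for illustration, not numbers from the original calculation.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Hash file contents the way git does for a blob object."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def collision_probability(num_objects: int, bits: int = 160) -> float:
    """Birthday approximation: p is roughly n * (n - 1) / 2^(bits + 1)."""
    return num_objects * (num_objects - 1) / 2 ** (bits + 1)

empty = git_blob_id(b"")       # id of the empty file
short = empty[:7]              # a prefix is usually enough to be unambiguous

# Chance of any accidental collision among a billion objects (assumed figure).
p = collision_probability(10**9)
```

The result for `p` is vanishingly small, far below 1e-30, which is the kind of number behind the lightning comparison above.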

So kudos to the egit team for the great and continued work on this project. There even is an egit project proposal up. Although git support was postponed in the last board meeting, I still think this is a step in the right direction and I’m looking forward to future developments.

© EclipseSource 2008 - 2009