Data Mining and Weka 3 Installation

27 Sep

I am sorry, but this post will have no pictures, mostly because nothing here needs a picture to be understood; I assume you all know how to download and install a program from the internet. However, not everyone knows what data mining is, so I feel a good definition and an analogy make a good base to begin with. Data mining is the process of finding new, previously undiscovered patterns in large data sets using various methods, including but not limited to artificial intelligence and statistical methods. For more on the topic you can find the wiki here. However, I feel that this definition of data mining fits the purpose. Basically, if you picture a large data set as a gold mine, you are trying to “mine” for the “gold” that is a data pattern. This pattern is usually represented as some variety of model, which I will discuss later, either next week or later this week.
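
To make the gold mine analogy a little more concrete, here is a minimal sketch in plain Python (not Weka). The basket data, the function name `frequent_pairs`, and the `min_support` threshold are all made up for illustration; the idea is only that a "pattern" can be as simple as two items that keep showing up together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data set: each row is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(rows, min_support):
    """Count how often each pair of items appears together and keep
    only the pairs seen in at least `min_support` rows."""
    counts = Counter()
    for row in rows:
        # sorted() makes ("bread", "milk") and ("milk", "bread") the same key
        for pair in combinations(sorted(row), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# The "gold" here is any pair that co-occurs in at least 3 of the 5 baskets.
patterns = frequent_pairs(baskets, min_support=3)
for pair, n in sorted(patterns.items()):
    print(pair, n)
```

A real data mining tool works on far larger data sets and builds richer models than a pair count, but the goal is the same: surface regularities that nobody wrote down in advance.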

However, first you are going to need a tool for data mining. This is where Weka 3 comes in: Weka 3.6.4, to be exact. At least that is what I am going to be doing all of my work in, so if you wish to follow my blog you will probably need to install that. You can find the Weka download site here. In order to download Weka 3.6.4 you need to find your OS on the download page under the heading “Stable book 2nd ed. version”. From there, the installer will download onto your computer, and you need only follow its instructions to install Weka. Then you only need to find a data set and you can start mining it. I will find an interesting one for my next post, where I will discuss models.

 

Protege API and Eclipse.

27 Sep

This was the fourth assignment for my semantic web class. We were supposed to recreate the Hello Semantic Web World example from Chapter 2 of our textbook using Protege rather than Jena, which I discussed below. I attempted this assignment, but to be honest, it was like wading through mud. First, unlike Jena, the functions in the Protege API are neither very descriptive of their functionality nor very simple. In Jena, one could just read a file into a model. In Protege it is not that simple. There is a create-model function that lets you create an empty model, a model from a reader, a model from a URI, and so on. But from there I could not find the function to append instances from a file onto an existing model, which made it difficult to add friends to the FOAF model. I did, however, find a SPARQL query function, but I could not get it to work on the generated model. On top of that, whenever I tried to even generate the model I got a warning that I could not find a way to remove. So, in my honest opinion, I cannot recommend using the Protege API over Jena. While it seems that Protege is able to do anything Jena can, I can only confirm that from tutorials. I even attempted to make a simple model to try the query function, and it was a no go. The picture above shows what happens when I run the code for creating and querying my simple model, and the code is attached here: Model Code

I apologize that I cannot be more helpful this week, and I apologize for not having uploaded anything on Weka yet. I am trying to figure out how best to present it.

Creation of OWL Ontologies with Protege 3.4.7 and SPARQL Querying

21 Sep

While going through some documentation for Protege 3.4.7, I came across the documentation for creating an OWL ontology using Protege. After finding this, I decided to revisit the assignment from my previous post. First, I would like to comment on the OWL ontology creation process in Protege 3.4.7. It is not entirely intuitive, but once you get into it, it is actually quite easy. The first thing you must do is create your class in the OWL Classes tab and rename it in the “For Class” text window, as shown in the posted picture. From there, you need to go into the Properties tab and, in the properties box, select the Datatype box. Then you add the datatype properties for the class of your choice. For the Person class I added values such as name, nickname, job, etc. After this you add any conditions that are necessary for the datatype properties you added to the class, for example: name can only be a string, and a person can only have one name. From here I was able to successfully query the ontology, as I did in the previous post. One thing to be sure of: if you add multiple statements to the “where” clause of your query, add a “.” after each individual statement. Here is the code for my ontology: FOAFJPelican.
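
As a rough illustration of what the two conditions on name mean, here is a small sketch in plain Python. It is not the Protege API and not how an OWL reasoner actually works (a reasoner draws logical inferences rather than running a checklist like this); the dictionary layout and the names `PERSON_RESTRICTIONS` and `check_individual` are assumptions made up for this example.

```python
# Hypothetical stand-in for the Person class conditions described above:
# each property maps to (allowed Python type, maximum number of values).
PERSON_RESTRICTIONS = {
    "name": (str, 1),         # must be a string, at most one per person
    "nickname": (str, None),  # must be a string, any number allowed
}


def check_individual(properties, restrictions):
    """Return a list of readable violations for one individual.
    `properties` maps a property name to the list of values given for it."""
    problems = []
    for prop, values in properties.items():
        if prop not in restrictions:
            continue  # no condition was declared for this property
        allowed_type, max_count = restrictions[prop]
        for value in values:
            if not isinstance(value, allowed_type):
                problems.append(f"{prop}: {value!r} is not a {allowed_type.__name__}")
        if max_count is not None and len(values) > max_count:
            problems.append(f"{prop}: {len(values)} values, at most {max_count} allowed")
    return problems


good = {"name": ["Ann"], "nickname": ["A", "Annie"]}
bad = {"name": ["Ann", 42]}
print(check_individual(good, PERSON_RESTRICTIONS))
print(check_individual(bad, PERSON_RESTRICTIONS))
```

The second individual breaks both conditions at once: one of its name values is not a string, and it has more than one name.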

Please read this in conjunction with the previous post.

SPARQL Querying With Protege 3.4.7

20 Sep

First, I have to be upfront: the assignment this time was to create your own FOAF file and then query it. Unfortunately, while I created the FOAF file, for some reason I could not query it correctly, so I will attach it to this blog. It would help me out immensely if someone could look at this file and point out my mistakes. However, I did run the Protege SPARQL querying example here on the data here. Rather than following the tutorial exactly, I only ran a query on name. As you can see from the image of my results above, my query returned a table with only the names of the elements, whereas the tutorial's query asked for more elements and so its table returned more columns. This is actually a significant difference between SPARQL and SQL, where the entirety of an element would be returned if one used a ‘select *’ statement. The ‘where’ in the statement seems to be where the element elimination takes place. My one comment on the syntax of this language is that it would be a bit easier to use if it were more consistent with SQL syntax. A bare Select *, which works in SQL and returns all the elements of a table, does not work on its own in SPARQL. You need a ‘where’ statement at least. Also, in the ‘where’ statement you need {?classname:elementname ?elementname} in order to select members from the file and the specific element of each member that your query should return. With a simple retooling, however, I believe SPARQL would be a lot more user friendly.
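
To show why the ‘where’ part decides what comes back, here is a toy sketch in plain Python. It is not Protege's query engine and it does not parse real SPARQL; the triples, the `foaf:name` and `foaf:nick` labels used as plain strings, and the function names are assumptions for illustration. The data is a list of (subject, predicate, object) triples, a pattern is a triple in which anything starting with “?” is a variable, and the select step keeps only the variables you asked for.

```python
# Hypothetical data: (subject, predicate, object) triples, roughly what a
# FOAF file describing two people might contain.
TRIPLES = [
    ("person1", "foaf:name", "Alice"),
    ("person1", "foaf:nick", "Al"),
    ("person2", "foaf:name", "Bob"),
]


def is_variable(term):
    return term.startswith("?")


def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits `pattern`.
    Terms starting with '?' are variables; anything else must match exactly."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if is_variable(term):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # Reached only when no term caused a break, i.e. the triple fits.
            yield binding


def select(variables, pattern, triples):
    """Keep only the requested variables from each binding (the 'select' part)."""
    return [tuple(b[v] for v in variables) for b in match(pattern, triples)]


# Roughly: SELECT ?name WHERE { ?person foaf:name ?name }
names = select(["?name"], ("?person", "foaf:name", "?name"), TRIPLES)
print(names)

# Asking for more variables returns more columns from the same pattern.
both = select(["?person", "?name"], ("?person", "foaf:name", "?name"), TRIPLES)
print(both)
```

The nickname triple never appears in either result because it does not fit the pattern, which is the "element elimination" happening in the ‘where’ part.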

To change gears a bit, in order to use SPARQL in Protege 3.4.7, you need to enable the Pellet reasoner in the reasoning menu of Protege. Then, in order to run a SPARQL query, you need to go back into the reasoning menu and select the “open SPARQL query panel” option. Also, when running this tutorial, you need to eliminate the table from the query statement inside the ‘where’ brackets: for example, rather than ‘?element table:name ?name’ you need to use ‘?element:name ?name’. Otherwise you will get a syntax error.

Two final notes. First, if I am able to get my FOAF file working, I will update this post or explain how I corrected it in another post. Second, I will be posting on the use of Weka in this blog in an effort to combine Weka, which is a data mining tool, with the semantic web tools and techniques we will also be learning. You can follow this link to my FOAF file: jpelicanFOAF

Some Work with Protege and Some Comments on a Few Ontologies

12 Sep

This past week or so I was introduced to a few sites that use semantic web ontologies in their functionality, and I was also able to use Protege for the first time in order to understand these ontologies. I suppose the first thing I will comment on is the use of Protege. If you get this tool, you need to make sure you have all the plug-ins you need for it as well as GraphViz, so that you can properly access the OWLViz visualizer used by Protege. But once you have everything installed, it is a nice, user-friendly way of translating ontologies into something easily understood; in the case of Protege, flowcharts.

This was handy for understanding the ontologies used in Geonames and FOAF (Friend of a Friend). Actually, the featured picture for this entry is the Geonames add-a-location GUI, so I want to talk about Geonames first, starting with the hierarchy of its ontology. At the most basic layer everything is a Thing, as with all the flowcharts I have seen with Jena. However, after that there are four classes of Thing for Geonames: Concept, Concept Scheme, Document, and Spatial Thing. There is one Concept (Code, which is the feature ID), one Concept Scheme (Class, which is the classification of the feature, such as a park), three Documents (Map, Wikipedia article, and RDFdata, which I believe are self-explanatory), and one Spatial Thing (Feature, which is a location on the map). In the end, each of these Things contains its own set of qualities that describe it, but this is the general hierarchy. This is much more complex than the hierarchy of FOAF. If you create a FOAF file and put it into Protege, in the beginning there is Thing, as with Geonames, but there are only two classes of Thing: Person and PersonalProfileDocument. Person describes you and contains the data that you entered into the profile, while PersonalProfileDocument is the FOAF file itself.
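
To compare the two hierarchies side by side, here is a minimal sketch in plain Python that writes them down as nested dictionaries, using only the class names listed above. It is an illustration of the structure, not output from Protege or OWLViz, and the helper names (`count_classes`, `depth`, `outline`) are made up for this example.

```python
# The two class trees as described in this post. Each key is a class and
# each value is a dict holding its subclasses (empty dict = no subclasses).
GEONAMES = {
    "Thing": {
        "Concept": {"Code": {}},
        "Concept Scheme": {"Class": {}},
        "Document": {"Map": {}, "Wikipedia article": {}, "RDFdata": {}},
        "Spatial Thing": {"Feature": {}},
    }
}
FOAF = {"Thing": {"Person": {}, "PersonalProfileDocument": {}}}


def count_classes(tree):
    """Total number of classes in the tree, including the root."""
    return sum(1 + count_classes(children) for children in tree.values())


def depth(tree):
    """Number of levels in the tree (0 for an empty tree)."""
    return 0 if not tree else 1 + max(depth(children) for children in tree.values())


def outline(tree, indent=0):
    """Return the tree as indented text lines, one class per line."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * indent + name)
        lines.extend(outline(children, indent + 1))
    return lines


print("\n".join(outline(GEONAMES)))
print("Geonames:", count_classes(GEONAMES), "classes,", depth(GEONAMES), "levels")
print("FOAF:", count_classes(FOAF), "classes,", depth(FOAF), "levels")
```

Counted this way, the Geonames tree has more classes and one more level than the FOAF tree, which matches the impression that it is the more complex of the two.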

However, from a strictly user point of view, I couldn’t help but feel that Geonames was lacking what I felt to be necessary functionality. While making a new point on the map, it was somewhat difficult to mark a location, since you needed to be able to identify the location from an aerial view. Also, it feels like the GUI allows you to input a lot of information, but none of it is really useful or meaningful to the average person. If it allowed input of information that was more geared towards the average person, as well as a few different search options for finding unmarked points, it would be a bit better. I like FOAF’s UI for creating a FOAF file a lot, though. You had to enter a bunch of information, but none of the intrusive information was required, and it encrypted email addresses if you wanted it to. The only complaint I had was that, since a FOAF file is supposed to resemble a business card, the option to include a few more lines of contact information would be nice, and maybe a keyword option for keyword searching. But none of this was detrimental to my user experience.

I actually visited one more site, Google recipes, but didn’t have the opportunity to go through its ontology file. From what I can see in the UI, each recipe is divided into three classes: ingredients, cooking time, and calories. From a user standpoint, I really like how simple this is to use. You can just search for an ingredient and it will give you a huge list of recipes that you can then refine based on other ingredients, cook time, etc. If they had a way of rating the difficulty of the recipe, and if the recipe site was easier to find, I think it would be nearly perfect. Unfortunately, because I only have the UI to go on when working out the ontology, I cannot comment on that part as much, so I will have to leave this as mostly a user review of the site.

Installation of Tools

5 Sep

The tools that we will be using at this point in the class are the Eclipse Helios Java Development Environment (http://www.eclipse.org/downloads/packages/release/helios/sr2), the Jena Semantic Web Framework (http://sourceforge.net/projects/jena/files/Jena/Jena-2.6.4/), the Protege Ontology Tool (http://protege.stanford.edu/), and the Pellet Reasoning Plug-in for Protege. These tools can be obtained from the accompanying links. To install Eclipse, go to the link listed above; it will download as a zip file. Unpack this file to the location of your choice (in my case, the Program Files folder on my C drive) and, if you like, create a shortcut to the exe file on your desktop. To install Jena, go to the listed link and unpack that zip file to the location of your choice, preferably somewhere easily found; in my case, directly onto the C drive. Protege comes with its own installer, so when you go to the accompanying link, you only need to download the installer exe and run it, installing with your own preferences. Finally, to download the Pellet plug-in, all you need to do is run Protege. Click on the File menu, then click “check for plug-ins”. A menu will pop up; on this menu, click on the downloads tab. Check the check box and click install.

The above image is the sample output for an example of using Eclipse with the Jena libraries. Take the code from eclipse example and insert it into an Eclipse project as a Java file. Then right-click the project, click on Properties, click on Java Build Path, and click Libraries. Click Add External JARs and import the jar files from your Jena folder, wherever you happened to install it. When you run this code, your output should appear as in the above image. If it does not, ensure that you have imported the proper Jena libraries.

What is semantic web?

5 Sep

Greetings all. This is the first post in a continuous blog that will be maintained throughout my work with semantic web technologies at Florida Atlantic University. Here is the link to our site: semanticweb.fau.edu.

There is a question that must be answered prior to proceeding with any work, however, and the question is what that work is. In this case: what is semantic web? Wikipedia, in the article found here, defines the semantic web as “a ‘man-made woven web of data’ that facilitates machines to understand the semantics, or meaning, of information on the World Wide Web”. More directly, it is a set of standard notations, or ontologies, that allow for the standardization of information on the internet. This standardization allows machines to more easily access and understand information on human-readable websites, in effect making the website machine readable. This allows a machine to access information on behalf of users for any purpose that the user may need.