Googless Calendar

If the only thing that keeps you from closing your Google Account is Calendar, here's the solution: set up your own CalDAV server. I've chosen Radicale. The setup is easy, but PAM support is broken in Debian wheezy, so I had to fix it the following way:

# get the pypi installer
apt-get install python-stdeb
patch <<EOF
--- /usr/bin/pypi-install.ori   2014-05-11 21:32:24.884512975 +0200
+++ /usr/bin/pypi-install       2014-05-10 20:23:34.427058833 +0200
@@ -16,7 +16,7 @@
 
 USER_AGENT = 'pypi-install/0.6.0+git ( http://github.com/astraw/stdeb )'
 
-def find_tar_gz(package_name, pypi_url = 'http://python.org/pypi',verbose=0):
+def find_tar_gz(package_name, pypi_url = 'http://pypi.python.org/pypi',verbose=0):
     transport = xmlrpclib.Transport()
     transport.user_agent = USER_AGENT
     pypi = xmlrpclib.ServerProxy(pypi_url, transport=transport)
EOF
# install the latest pam library from pypi
pypi-install  pam
patch <<EOF
--- /usr/lib/python2.7/dist-packages/radicale/acl/PAM.py.ori    2014-05-11 21:35:36.441065840 +0200
+++ /usr/lib/python2.7/dist-packages/radicale/acl/PAM.py        2014-05-10 20:27:12.771722350 +0200
@@ -50,7 +50,7 @@
 
     # Check whether the group exists
     try:
-        members = grp.getgrnam(GROUP_MEMBERSHIP)
+        members = grp.getgrnam(GROUP_MEMBERSHIP)[3]
     except KeyError:
         log.LOGGER.debug(
             "The PAM membership required group (%s) doesn't exist" %
EOF
apt-mark hold python-pam
apt-get install radicale
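Why the `[3]` matters: `grp.getgrnam()` returns a four-field record (name, password, gid, member list), so the member list is the element at index 3, also reachable as `gr_mem`. Here is a minimal sketch of the lookup; the helper name `group_members` is made up for illustration and is not part of Radicale, and it is written for a current Python rather than the 2.7 that wheezy ships.

```python
import grp


def group_members(group_name):
    """Return the member list of a Unix group, or None if the group is missing.

    Hypothetical helper for illustration only; Radicale's PAM.py is organised
    differently.
    """
    try:
        entry = grp.getgrnam(group_name)
    except KeyError:
        return None
    # entry[3] and entry.gr_mem are the same field: the list of member names.
    return list(entry[3])


# The group name below is only an example; it may not exist on your system.
print(group_members("users"))
```

Testing a user name against the whole record instead of this list is, presumably, why the unpatched membership check never matched.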

This one appears high in Google SERPs on phpbb2drupal.

{syntaxhighlighter brush:sql}
---
--- Written by Feodor (feodor [at] mundo.ru)
---
--- Modified for drupal 4.5.2 and phpbb 2.0.11-2.0.13
--- by Alexander Mikhailian
---
--- This script makes an assumption that phpbb and drupal tables are kept in
--- one and the same database. Phpbb tables are expected to have the prefix
--- phpbb_ and drupal tables are expected to have the prefix drupal_
---
--- If the phpBB forum uses CP1251 (or another) encoding, the tables must be converted
--- into UTF8. If the version of MySQL is less than 4.1, the "iconv" command can be used
--- to convert the exported tables into UTF8.
---
--- Example:
--- iconv -fcp1251 -tutf8 phpbb2_utf-8.sql
---
--- Here is a list of phpbb tables used by script for import into Drupal:
---
--- phpbb_categories
--- phpbb_forums
--- phpbb_posts
--- phpbb_posts_text
--- phpbb_users
--- phpbb_vote_desc
--- phpbb_vote_results
---
--- You should probably edit the two variables below to match your result:

---
--- The name of the forums taxonomy as it appears on the site
---
SELECT @forum_title:='Forums';

---
--- Start importing users from this id. Do not forget that uid 1 is always
--- the Administrator in Drupal. Depending on whether you already created
--- the Administrator user in Drupal or not, you may want to change this
--- variable into 2.
---
SELECT @first_phpbb_user_id:=1; # uid 1 is always an administrator in Drupal

---
--- Drupal variables for node counts
---
SELECT @phpbb_terms:=MAX(forum_id) FROM phpbb_forums;
SELECT @phpbb_cat:=MAX(cat_id) FROM phpbb_categories;
SELECT @drupal_term_data_tid:=id FROM drupal_sequences WHERE name = 'drupal_term_data_tid';
SELECT IF( @drupal_term_data_tid>0, @drupal_term_data_tid, @drupal_term_data_tid:=0);
SELECT @drupal_comments_cid:=id FROM drupal_sequences WHERE name = 'drupal_comments_cid';
SELECT IF( @drupal_comments_cid>0, @drupal_comments_cid, @drupal_comments_cid:=0);
SELECT @drupal_vocabulary_vid:=id FROM drupal_sequences WHERE name = 'drupal_vocabulary_vid';
SELECT IF( @drupal_vocabulary_vid>0, @drupal_vocabulary_vid, @drupal_vocabulary_vid:=0);
SELECT @drupal_node_nid:=id FROM drupal_sequences WHERE name = 'drupal_node_nid';
SELECT IF( @drupal_node_nid>0, @drupal_node_nid, @drupal_node_nid:=0);

---
--- Import users from phpbb_users
---
INSERT INTO drupal_users (uid,name,pass,mail,mode,sort,threshold,theme,signature,created,changed,status,timezone,language,picture,init,data)
SELECT user_id,username,user_password,user_email,0,0,0,'',user_sig,user_regdate,user_lastvisit,1,0,'',user_avatar,user_email,'a:1:{s:5:"roles";a:1:{i:0;s:1:"2";}}'
FROM phpbb_users
WHERE user_id>=@first_phpbb_user_id;

---
--- Create a vocabulary (called "Forum")
---
DELETE FROM drupal_vocabulary WHERE nodes='forum';
INSERT INTO drupal_vocabulary (vid,name,description,help,relations,hierarchy,multiple,required,nodes,weight)
SELECT @drupal_vocabulary_vid+1,@forum_title,'','','0','2','0','1','forum','0';

---
--- Set up the vocabulary "Forum" as the forum vocabulary
---
DELETE FROM drupal_variable WHERE name = 'forum_nav_vocabulary';
INSERT INTO drupal_variable (name, value) VALUES ('forum_nav_vocabulary', CONCAT('s:1:\"',@drupal_vocabulary_vid+1,'\";'));

---
--- The default comment order in phpbb2 is "oldest first"
---
DELETE FROM drupal_variable WHERE name = 'comment_default_order';
INSERT INTO drupal_variable (name, value) VALUES ('comment_default_order', 's:1:\"2\";');

---
--- Import categories from phpbb_categories as terms
---
INSERT INTO drupal_term_data (tid,vid,name,weight)
SELECT @drupal_term_data_tid+cat_id,@drupal_vocabulary_vid+1,cat_title,cat_order
FROM phpbb_categories;

---
--- Import terms from phpbb_forums
---
INSERT INTO drupal_term_data (tid,vid,name,description,weight)
SELECT @drupal_term_data_tid+@phpbb_cat+forum_id,@drupal_vocabulary_vid+1, forum_name,forum_desc,forum_order
FROM phpbb_forums;

ALTER TABLE drupal_term_data ORDER BY tid;

---
--- Import forum hierarchy
--- Drupal allows topics to be created at this level while phpbb2 does not. If you want to disallow topics below
--- categories, mark them as containers in the forum configuration dialog
---
INSERT INTO drupal_term_hierarchy (tid,parent)
SELECT @drupal_term_data_tid+cat_id,'0'
FROM phpbb_categories;

INSERT INTO drupal_term_hierarchy (tid,parent)
SELECT @drupal_term_data_tid+@phpbb_cat+forum_id, @drupal_term_data_tid+cat_id
FROM phpbb_forums;

ALTER TABLE drupal_term_hierarchy ORDER BY tid;

---
--- Create temporary tables for sorting topics and comments.
---

DROP TABLE IF EXISTS temp_posts;
CREATE TABLE temp_posts (
post_id mediumint(8) UNSIGNED NOT NULL auto_increment,
topic_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
forum_id smallint(5) UNSIGNED DEFAULT '0' NOT NULL,
poster_id mediumint(8) DEFAULT '0' NOT NULL,
post_time int(11) DEFAULT '0' NOT NULL,
post_edit_time int(11),
post_subject char(512),
post_text text,
PRIMARY KEY (post_id),
KEY forum_id (forum_id),
KEY topic_id (topic_id),
KEY poster_id (poster_id),
KEY post_time (post_time)
);
DROP TABLE IF EXISTS temp_node;
CREATE TABLE temp_node (
post_id mediumint(8) UNSIGNED NOT NULL auto_increment,
topic_id mediumint(8) UNSIGNED DEFAULT '0' NOT NULL,
PRIMARY KEY (post_id),
KEY topic_id (topic_id)
);

---
--- Copy into temporary table topics without comments
---
INSERT INTO temp_node (post_id,topic_id)
SELECT MIN(post_id), topic_id
FROM phpbb_posts
GROUP BY topic_id;
INSERT INTO temp_posts (post_id, topic_id,forum_id,poster_id, post_time,post_edit_time,post_subject,post_text)
SELECT c.post_id, c.topic_id, a.forum_id, IF(a.poster_id='-1','0',a.poster_id), a.post_time, a.post_edit_time,
REPLACE(b.post_subject, CONCAT(':',b.bbcode_uid),''), REPLACE(b.post_text, CONCAT(':',b.bbcode_uid),'')
FROM phpbb_posts AS a, phpbb_posts_text AS b, temp_node AS c
WHERE c.post_id=a.post_id AND c.post_id=b.post_id;

---
--- Insert nid and tid from temp_posts into drupal_term_node
---
INSERT INTO drupal_term_node (nid,tid)
SELECT @drupal_node_nid+topic_id,@drupal_term_data_tid+@phpbb_cat+forum_id
FROM temp_posts;

ALTER TABLE drupal_term_node ORDER BY nid;

---
--- Insert forum topics from temp_posts into drupal_node
---
INSERT INTO drupal_node (nid,type,title,uid,created,comment,body,changed)
SELECT @drupal_node_nid+topic_id,'forum',post_subject,poster_id,post_time,'2',post_text,
IF(post_edit_time IS NOT NULL,post_edit_time,post_time)
FROM temp_posts;

ALTER TABLE drupal_node ORDER BY nid;

---
--- Insert nid into drupal_forum
---
DELETE FROM drupal_forum;
INSERT INTO drupal_forum (nid,tid)
SELECT @drupal_node_nid+topic_id,@drupal_term_data_tid+@phpbb_cat+forum_id
FROM temp_posts;

---
--- Insert comments into drupal_comments for topics from temp_posts
---
INSERT INTO drupal_comments (nid,uid,subject,comment,hostname,timestamp,users)
SELECT @drupal_node_nid+a.topic_id,
CASE WHEN a.poster_id='-1' THEN '0' ELSE a.poster_id END,
REPLACE(c.post_subject, CONCAT(':',c.bbcode_uid),''),
REPLACE(c.post_text, CONCAT(':',c.bbcode_uid),''),
CONCAT_WS('.',CONV(SUBSTRING(a.poster_ip,1,2),16,10),
CONV(SUBSTRING(a.poster_ip,3,2),16,10),
CONV(SUBSTRING(a.poster_ip,5,2),16,10),
CONV(SUBSTRING(a.poster_ip,7,2),16,10)),
a.post_time,'a:1:{i:0;i:0;}'
FROM phpbb_posts AS a LEFT JOIN temp_posts AS b ON a.post_id=b.post_id,phpbb_posts_text AS c
WHERE b.post_id IS NULL AND a.post_id=c.post_id;
ALTER TABLE drupal_comments ORDER BY cid;

UPDATE drupal_comments,drupal_node
SET drupal_comments.subject=IF(drupal_comments.subject='',
CONCAT('Re:',drupal_node.title),drupal_comments.subject)
WHERE drupal_comments.nid=drupal_node.nid;

---
--- Update thread in drupal_comments
---
DROP TABLE IF EXISTS drupal_comments_tmp;
CREATE TABLE drupal_comments_tmp (
cid int(10) NOT NULL default '0',
nid int(10) NOT NULL default '0',
thread int(10) NOT NULL auto_increment,
PRIMARY KEY (nid,thread)
);

INSERT INTO drupal_comments_tmp (cid,nid)
SELECT cid,nid
FROM drupal_comments
WHERE cid>@drupal_comments_cid;

UPDATE drupal_comments,drupal_comments_tmp
SET drupal_comments.thread=CONCAT(CONCAT(REPEAT(9,
LEFT(drupal_comments_tmp.thread,LENGTH(drupal_comments_tmp.thread)-1)),
RIGHT(drupal_comments_tmp.thread,1)),'/')
WHERE drupal_comments.cid=drupal_comments_tmp.cid;

---
--- Update drupal_history
---
---INSERT INTO drupal_history (uid, nid, timestamp)
--- SELECT a.uid, b.nid, a.timestamp
--- FROM drupal_users AS a, drupal_node AS b
--- WHERE a.uid>0 AND b.nid>@drupal_node_nid;

---
--- update topic statistics
---

INSERT INTO drupal_node_comment_statistics
(nid,last_comment_timestamp,last_comment_name,last_comment_uid,comment_count)
SELECT @drupal_node_nid+pt.topic_id, pp.post_time, ppt.post_subject, pp.poster_id,
pt.topic_replies
FROM drupal_node dn, phpbb_topics pt, phpbb_posts pp, phpbb_posts_text ppt
WHERE dn.nid = @drupal_node_nid+pt.topic_id
AND pt.topic_last_post_id = pp.post_id
AND pp.post_id = ppt.post_id;

---
--- Delete all temp tables
---
DROP TABLE IF EXISTS temp_posts;
DROP TABLE IF EXISTS drupal_comments_tmp;
DROP TABLE IF EXISTS temp_node;

---
--- Update Drupal variables
---
SELECT @drupal_term_data_tid:=MAX(tid) FROM drupal_term_data;
SELECT @drupal_comments_cid:=MAX(cid) FROM drupal_comments;
SELECT @drupal_node_nid:=MAX(nid) FROM drupal_node WHERE type = 'forum';
SELECT @drupal_users_uid:=MAX(uid) FROM drupal_users;

DELETE FROM drupal_sequences WHERE name="drupal_term_data_tid";
DELETE FROM drupal_sequences WHERE name="drupal_comments_cid";
DELETE FROM drupal_sequences WHERE name="drupal_node_nid";
DELETE FROM drupal_sequences WHERE name="drupal_users_uid";

INSERT INTO drupal_sequences (name,id) SELECT "drupal_term_data_tid", @drupal_term_data_tid;
INSERT INTO drupal_sequences (name,id) SELECT "drupal_comments_cid",@drupal_comments_cid;
INSERT INTO drupal_sequences (name,id) SELECT "drupal_node_nid",@drupal_node_nid;
INSERT INTO drupal_sequences (name,id) SELECT "drupal_users_uid",@drupal_users_uid;

---
--- this is for porting personal messages to privatemsg drupal module
---
--- phpbb_privmsgs.privmsgs_type has the following codes:
--- written=1, seen in inbox = 5, read = 0, saved = 3
---
--- attention, drupal_privatemsg limits the title to 64 chars, this is too
--- little
---
INSERT INTO drupal_privatemsg ( id, author, recipient,subject,message,
timestamp,new,hostname,folder,author_del,recipient_del )
SELECT p.privmsgs_id, p.privmsgs_from_userid, p.privmsgs_to_userid,
p.privmsgs_subject,pt.privmsgs_text,p.privmsgs_date,
IF(p.privmsgs_type = 1 OR p.privmsgs_type = 5, 1, 0),
p.privmsgs_ip,0,0,0
FROM phpbb_privmsgs p
LEFT JOIN (phpbb_privmsgs_text pt) ON p.privmsgs_id=pt.privmsgs_text_id;

---
--- converting polls
---

INSERT INTO poll(nid,runtime,polled,active)
SELECT pvd.topic_id+41,0,'',0
FROM phpbb_vote_desc pvd;

INSERT INTO poll_choices(nid,chtext,chvotes,chorder)
SELECT pvd.topic_id+41,
pvr.vote_option_text,
pvr.vote_result,
vote_option_id # TODO chorder should be autoincremented from 1 every
# unique pvd.topic_id;
FROM phpbb_vote_results pvr, phpbb_vote_desc pvd
WHERE pvr.vote_id=pvd.vote_id;

UPDATE node n, phpbb_vote_desc pvd SET type='poll'
WHERE n.nid=pvd.topic_id+41;
{/syntaxhighlighter}
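One step in the script that is easy to miss is the hostname conversion: phpBB 2 stores `poster_ip` as eight hexadecimal characters, and the `CONCAT_WS`/`CONV` expression turns each pair of characters into one decimal octet. The same decoding, sketched in Python for checking a few values by hand (the function name `decode_phpbb_ip` is hypothetical):

```python
def decode_phpbb_ip(hex_ip):
    """Decode a phpBB 2 hex-encoded IPv4 address into dotted-quad notation.

    Mirrors CONCAT_WS('.', CONV(SUBSTRING(ip,1,2),16,10), ...) from the SQL.
    """
    # Take the 8 hex characters two at a time and read each pair as base 16.
    octets = [int(hex_ip[i:i + 2], 16) for i in range(0, 8, 2)]
    return ".".join(str(octet) for octet in octets)


print(decode_phpbb_ip("7f000001"))  # 127.0.0.1
```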

Three days ago, I got my hands on the HTC Trophy 7 running Windows Phone 7. Until now, I played occasionally with Android, but never used a smartphone.

Here's a list of things I'd like to do on the smartphone:

  • Transfer the contacts from the old Nokia phone
  • Listen to music and audio books from the local collection
  • Read books from Flibusta

In the process, I'd rather not reveal too much personal information to Microsoft. The wishlist looks simple, huh? Let's see what I managed to do.

Transfer the contacts from the old Nokia phone

I approached the contacts synchronisation in a stupid way, I agree. Knowing that the smartphone is a powerful Windows computer which can probably do whatever a desktop does, I exported the contacts to a vCard file using the Nokia PC Suite and started looking for ways to import them into my Windows phone. It turned out to be impossible. A workaround would be to import the contacts into a Google account or a Windows Live ID account using another smartphone, and then to log in with the same account from the new phone, but I had no smartphone previously, and wanted to avoid sharing the contacts on Windows Live ID anyway.

Luckily, my old Nokia phone could export all the contacts to the SIM card, so I just ended up doing that, and imported the contacts from the SIM on the Windows phone. I lost some extra fields that existed on the Nokia phone, but nothing substantial. I remember importing contacts from an old Nokia phone to an Android phone a few months ago. Android offered to connect to the Nokia phone via Bluetooth and, after pairing the devices, transferred all the contacts wirelessly. It took me 5 minutes back then, and well over 1 hour now.

Verdict: importing contacts is hard and you are likely to lose bits of information in the process.

Listen to music and audio books from the local collection

It did not come as a surprise that Windows Phone 7 enforces Zune, Microsoft's counterpart of iTunes, for copying files to the phone. I have no clue what iTunes looks like, but the UI of Zune is fairly basic. I could index my audio collection, but navigating it turned out to be close to impossible. I never cared about the correctness of the tags, so the files came out ugly in Zune. There was apparently no way to navigate the collection using the file hierarchy, as I used to do, so locating the music turned out to be painfully difficult. Some of my mp3s still have ID3v1 tags in windows-cp1251 encoding, which Zune displayed as question marks.

Oh, just a side note. After I connected my phone to the PC and started Zune, it asked for an update, then took 15 minutes to do the update, rebooting the phone along the way, then asked for another update-reboot-wait cycle. I did not wait for the last update to end and went to sleep. Luckily, the update did not ask questions and I had an updated phone in the morning.

Verdict: barely usable unless you kept your ID3v2 (not ID3v1!) tags in perfect order. No way to navigate the file structure from Zune.

Read books from Flibusta

Flibusta provides downloadable texts in .fb2, .epub or .mobi formats. The Windows Phone 7 Marketplace is tiny compared to the Google market, but I found one free ebook reader called Freda that is able to read .epub files. Buried deep in the UI of Freda, there is a Website page where you can paste a URL into a text box. The same page also embeds Internet Explorer. I figured that the text box is meant for downloading an .epub document to the device. However, pasting a URL of an .epub file distributed by Flibusta did not work. I tried another ebook reader, called iSilo. It was a very usable and feature-rich program on my old Palm TT, but it could not open my .epub file either.

An inspection of the HTTP session between the browser and Flibusta pinpointed the problem. Flibusta issues two HTTP 302 redirects: one to ensure that the .fb2 document is converted to .epub and the other to compress the .epub document, if needed. The Internet Explorer used in Windows Phone 7 did not want to follow the redirects, but manually typing the URL given in the last redirect into the Freda search box worked.

Verdict: Epic failure. I had to use my software engineering skills before I understood how to download an ebook to the phone, and that was a real pain.

Bonus: Privacy considerations

Just like Android, Windows Phone 7 is less usable if you do not log in with your Windows Live ID. You cannot get to the Marketplace, you cannot go to Xbox Live, etc. Once you log in, all your contacts are synchronized with the Windows Live ID, and you will be able to remotely activate Location services for your phone and track its location, block it or reset it to factory defaults. If the phone is provided by an employer, the employer can order a reset of your phone to factory defaults at any time via a corporate account web interface. I have not tried the GPS navigation yet, but it apparently requires Location services to be enabled in order to operate.

I read the Windows® Phone 7 Privacy Statement and it clearly wins over the Android Privacy Policy. The latter is way too generic, while the Windows® Phone 7 Privacy Statement covers real-world use cases and provides useful information to customers.

Verdict: Slightly better than the other two vendors.

I bought a Desire S smartphone a little more than 2 months ago, but have not had a chance to really use it yet. It came with an elusive and odd bug. The touch screen stopped responding once in a while. This could occur at any time, but seemed to happen less frequently after 5-10 minutes of active use.

The first time I sent the phone for repair, it came back with a reflashed ROM, but the problem stayed.

The second time I sent the phone for repair, it came back with a new touch screen, but the problem stayed.

The third time, the repair shop replaced the motherboard. The initial problem disappeared, but the phone gained a new one. Part of the screen was not reacting to touch. Or rather, every time you touched a certain area on screen to select an item, items around it were selected or nothing happened.

I filed a 4th support request today.

Overall, I spent no less than an hour talking to the call center, wrote several emails and paper notes to HTC and to the repair shop, recorded a video to demonstrate the bug and uploaded it to HTC, filed two complaints online, and spent a good hour each time the phone returned from the repair shop setting it up and finding ways to reproduce the problem.

As a reminder for myself, mostly:

  1. Only Mavericks works; Yosemite is not yet working under KVM.
  2. KVM SLiRP networking does not work with MacOS X, and bridging is hard to set up for wireless networks, so it is better to use NAT versions of qemu-ifup and qemu-ifdown.
  3. It looks so cool on screenshots

Data-mining users in a screenful of code

Objective

Select like-minded users from a local community website.

Pre-requisites

  1. A Drupal website with the votingapi module enabled and at least a few dozen votes by registered users.
  2. A working installation of the R language.

Extract data

For each user, select all other users that voted on the same nodes and comments:

SELECT v1.uid uid1, v2.uid uid2, u1.name name1, u2.name name2,
  v2.entity_id entity_id, v1.value value1, v2.value value2
FROM votingapi_vote v1
JOIN (votingapi_vote v2, users u1, users u2)
  ON (v1.uid != v2.uid AND v1.entity_id=v2.entity_id
   AND v1.entity_type=v2.entity_type AND v1.uid=u1.uid AND v2.uid=u2.uid)
WHERE v1.uid < v2.uid AND v1.uid != 0 AND v2.uid != 0
ORDER BY v1.uid,v2.uid;

This produces a table

uid1    uid2    name1           name2   value1  value2
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     20      20
1       2       Administrator   Bob     40      40
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     20      100
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     80      80
1       2       Administrator   Bob     100     20
1       2       Administrator   Bob     20      20
1       2       Administrator   Bob     60      60
1       2       Administrator   Bob     100     100
1       2       Administrator   Bob     100     100

with the following columns (entity_id is not shown):

  1. first user id
  2. second user id
  3. first user's name
  4. second user's name
  5. vote of the first user
  6. vote of the second user

The important parts in the SQL are

  1. the JOIN on the same table, which generates all permutations of uid1 and uid2
  2. the WHERE clause on v1.uid < v2.uid which reduces permutations to combinations.

The uid of 0 is skipped, because it is the uid of the anonymous user. Every anonymous vote is attributed to it.
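To see the effect of the `v1.uid < v2.uid` filter, here is a small sketch that runs a simplified version of the self-join against an in-memory SQLite database. The vote rows are made up, the `users` table is left out, and SQLite stands in for the MySQL database behind Drupal, so treat it as an illustration of the idea rather than the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votingapi_vote "
    "(uid INTEGER, entity_type TEXT, entity_id INTEGER, value INTEGER)"
)
# Made-up votes: three users and one anonymous vote on node 10,
# two users on comment 5.
conn.executemany(
    "INSERT INTO votingapi_vote VALUES (?, ?, ?, ?)",
    [
        (1, "node", 10, 100),
        (2, "node", 10, 100),
        (3, "node", 10, 20),
        (0, "node", 10, 60),
        (1, "comment", 5, 40),
        (2, "comment", 5, 40),
    ],
)

# Self-join without the uid1 < uid2 filter: every pair shows up twice.
SELF_JOIN = """
    SELECT v1.uid, v2.uid, v1.entity_type, v1.entity_id, v1.value, v2.value
    FROM votingapi_vote v1
    JOIN votingapi_vote v2
      ON v1.uid != v2.uid
     AND v1.entity_id = v2.entity_id
     AND v1.entity_type = v2.entity_type
    WHERE v1.uid != 0 AND v2.uid != 0
"""

ordered_pairs = conn.execute(SELF_JOIN).fetchall()
unordered_pairs = conn.execute(
    SELF_JOIN + " AND v1.uid < v2.uid ORDER BY v1.uid, v2.uid"
).fetchall()

print(len(ordered_pairs), len(unordered_pairs))  # 8 4
```

The filter halves the result: (1, 2) is kept and its mirror image (2, 1) is dropped, which is all the correlation step needs.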

Calculate similarity

It can be done in PHP, but why bother? Here's a handy R script that takes the above table as in.tsv and produces, for each user, a file with the following columns:

  1. id of the other user
  2. username
  3. number of votes in common
  4. Pearson's correlation coefficient between votes
  5. a p-value that indicates how certain the algorithm was.

#!/usr/bin/env Rscript
d <- read.delim("in.tsv")
unique1 <- unique(c(d$uid1, d$uid2))
for (id1 in unique1) {
  if (file.exists(as.character(id1))) {
    file.remove(as.character(id1))
  }
  temp1 <- d[d$uid1==id1 | d$uid2==id1, ]
  unique2 <- unique(c(temp1$uid1, temp1$uid2))
  unique2 <- unique2[!unique2 == id1] # remove id1
  for (id2 in unique2) {
    if (id1 < id2) {
      result <- temp1[temp1$uid1==id1 & temp1$uid2==id2, ]
      name <- as.character(result$name2[1])
    } else {
      result <- temp1[temp1$uid1==id2 & temp1$uid2==id1, ]
      name <- as.character(result$name1[1])
    }
    n = nrow(result)
    if (n > 7) {
      x <- result$value1
      y <- result$value2
      pvalue <- cor.test(x,y)$p.value
      if (is.finite(pvalue) && pvalue < 0.05) {
        correlation <- cor(x,y)
        cat(id2, name, n, correlation, pvalue, "\n", sep = "\t", file = paste(id1, sep = ""), append = T)
      }
    }
  }
}

Notice the use of the cor(x,y) function that calculates the correlation and cor.test(x,y) that produces additional metrics for the correlation, including the p-value. By convention, a correlation with a p-value of 0.05 or more is considered uncertain, so we only print lines where p-value < 0.05. The juggling with id1 and id2 in the if-else block is there to select pairs of users in any order.

Here's the output from the above data:

2       Bob     15      0.6039604       0.01710946
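The coefficient in that line can be checked by hand. Below is a minimal sketch that recomputes Pearson's correlation from the fifteen vote pairs in the sample table, using the textbook formula rather than R's `cor()`. The p-value is left out because it needs the t-distribution, which the Python standard library does not provide.

```python
from math import sqrt

# Votes of Administrator (value1) and Bob (value2), copied from the table above.
value1 = [100, 20, 40, 100, 20, 100, 100, 100, 100, 80, 100, 20, 60, 100, 100]
value2 = [100, 20, 40, 100, 100, 100, 100, 100, 100, 80, 20, 20, 60, 100, 100]


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


r = pearson(value1, value2)
print(len(value1), round(r, 7))  # 15 0.6039604
```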

Display results

The rest is fairly obvious. I've chosen to display the data as a tag cloud on user profiles. With a hook on hook_menu,

/**
 * Hook into the user menu
 */
function mymodule_menu() {
  $items['user/%user/likeminded'] = array(
    'access callback' => TRUE,
    'access arguments' => array(1),
    'page callback' => 'mymodule_likeminded', // function defined below
    'page arguments' => array(1),
    'title' => 'Likeminded',
    'weight' => 5,
    'type' => MENU_LOCAL_TASK,
  );
  return $items;
}

I fetch the user's data file as generated by the R script above and display the data from it in a bag of words of varying sizes:

/**
 * Display likeminded users
 */
function mymodule_likeminded($arg){
  if (is_object($arg) && !$arg->uid) {
    return;
  }
  # this is my path to the results, your path may be different
  $path =  drupal_get_path('module', 'mymodule') . '/pearsons/' . $arg->uid;
  $lines = array();
  $min = 0; $max = 0;
  if ($handle = @fopen($path, 'r')) {
    while($line = fgets($handle)) {
      $line = explode("\t", $line);
      if ($line[2] >= $max) { $max = $line[2]; }
      if ($line[2] <  $min) { $min = $line[2]; }
      $lines[] = $line;
    }
  }
  $output = '';

  // Likeminded
  $output .= '<h1>' .t('Likeminded') .'</h1>' ;
  $output .= '<div class="likeminded">';
  foreach($lines as &$line) {
    if ($line[3] > 0 ) {
      $size =mymodule_font_size($min, $max, $line[2]);
      $opacity = $line[3];
      $output .= "<span style=\"font-size:" . $size . "pt;opacity:" . $opacity . "\">";
      $output .= l($line[1], 'user/' . $line[0]);
      $output .= "</span>";
    }
  }
  $output .= '</div>';

  // Adversaries
  $output .= '<h1>' .t('Adversaries') .'</h1>' ;
  $output .= '<div class="adversaries">';
  foreach($lines as &$line) {
    if ($line[3] < 0 ) {
      $size =mymodule_font_size($min, $max, $line[2]);
      $opacity = abs($line[3]);
      $output .= "<span style=\"font-size:" . $size . "pt;opacity:" . $opacity . "\">";
      $output .= l($line[1], 'user/' . $line[0]);
      $output .= "</span>";
    }
  }
  $output .= '</div>';
  return $output;
}

/**
 * calculate the font size in proportion to the maximum and minimum of common votes
 */
function mymodule_font_size($min_count, $max_count, $cur_count, $min_font_size=11, $max_font_size=36) {
  if ($min_count == $max_count) # avoid DivideByZero exception
  {
    return $min_font_size;
  }
  return ( ($max_font_size - $min_font_size) / ($max_count - $min_count) * ($cur_count - $min_count) + $min_font_size);
}

That's it.

This approach scales fairly well. It takes around one minute to extract the data and around 30 minutes to calculate similarity on a database of 100 000 users, 1 000 000 posts and 4 500 000 votes, all on the same server that runs the website.

The lead image shows a real user profile page with a selection of like-minded users and adversaries.

P.S. If there's enough interest, I will rewrite the above code as a Drupal module.

P.P.S. Want to datamine your own data and receive an understandable explanation afterwards? Drop me a line at mikhailian@mova.org.
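For readers who want to play with the tag-cloud sizing outside Drupal, the linear interpolation done by mymodule_font_size can be sketched in a few lines of Python. The default sizes of 11 and 36 points come from the PHP function; the sample counts in the usage line are made up.

```python
def font_size(min_count, max_count, cur_count, min_font_size=11, max_font_size=36):
    """Map a vote count linearly onto the range [min_font_size, max_font_size]."""
    if min_count == max_count:
        # All users share the same count: avoid dividing by zero.
        return min_font_size
    scale = (max_font_size - min_font_size) / (max_count - min_count)
    return scale * (cur_count - min_count) + min_font_size


# With counts ranging from 0 to 50, a user with 25 common votes lands halfway.
print(font_size(0, 50, 25))  # 23.5
```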

Remember the standard structure of Subversion repositories? The one that you create with mkdir project/{trunk,tags,branches}? I have now figured out why people create so few branches and tags in this configuration: they check out at the project/trunk level and not at the project level, for fear of getting essentially the same code multiple times. And if you are at project/trunk, you can't really work with project/branches or project/tags easily.

But there's a solution! Use the --depth and --set-depth options of the svn checkout and svn update commands. For instance, when checking out a repository, do it in two steps. First, check out only the {trunk, tags, branches} folders, but nothing below them:

svn co --depth immediates http://example.com/svn/project

then, change to project/trunk and get the rest of the codebase from trunk:

cd project/trunk
svn up --set-depth infinity

See how it helps? You can now cherry-pick only the branches you want. And get rid of them by setting depth back to immediates.

Just stumbled upon a fancy banner by Microsoft that advertises its Embrace and Extend from the childhood program.

For the record: the only reason Microsoft supports this "Coding in classroom initiative" is that they want to push their products through kids. It's a problem, but a bigger problem is that Microsoft have long striven to make computing an elite profession by introducing inconsistencies and complexity for the most basic abstractions: a character, a file, a block device... their products are designed to fail pupils who want to understand how computers work. And this design is intentional, because the fewer people understand computing, the less competition their business has... and the higher the profits.

Thus, taking money from Microsoft to promote coding in the classroom is akin to taking money from Philip Morris to promote healthy lifestyle. Shameful.

Today my kid brought back from school an assignment to guess words from a bag of letters... it took a mum and a programmer to solve all six. I left one for you, though. Guess what C D E E I M R R stands for.

P.S. It's a classic programming interview question about all permutations of a string. Generate all permutations, grep -f them against /usr/share/dict/french and you'll get the answer.
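The same idea fits in a few lines of Python. This is only a sketch: the tiny word list stands in for /usr/share/dict/french, the function name `unscramble` is made up, and it uses a different bag of letters so as not to spoil the puzzle above.

```python
from itertools import permutations


def unscramble(letters, words):
    """Return the dictionary words that are permutations of the given letters."""
    wanted = {word.lower() for word in words}
    # Generate every ordering of the letters; a set removes duplicates that
    # arise from repeated letters.
    candidates = {"".join(p) for p in permutations(letters.lower())}
    return sorted(candidates & wanted)


# Hypothetical mini word list; the real thing would be read from a dictionary file.
SAMPLE_WORDS = ["lundi", "mardi", "jeudi", "samedi"]

print(unscramble("DILNU", SAMPLE_WORDS))  # ['lundi']
```

Eight letters give 40 320 permutations, which is still instant; for longer words, comparing sorted letters of each dictionary word would avoid the factorial blow-up.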
